One Wide Feedforward is All You Need

Source: Apple The Transformer architecture has two main non-embedding components: Attention and the Feed Forward Network (FFN). Attention captures interdependencies between words regardless of their position, while the FFN non-linearly transforms each input token independently. In this work we explore the role of the FFN, and find that despite taking up a significant fraction of the […]
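The per-token behavior described above is the defining property of the Transformer FFN: the same two-layer network is applied to every position, with no mixing across tokens. A minimal NumPy sketch (a standard ReLU FFN, not the paper's specific variant) makes this concrete:

```python
import numpy as np

def ffn(x, W1, b1, W2, b2):
    # Position-wise feed-forward: the same weights transform each
    # token vector independently; no information flows between rows.
    h = np.maximum(0.0, x @ W1 + b1)  # expand to d_ff with a ReLU
    return h @ W2 + b2                # project back to d_model

rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 8, 32, 5
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

x = rng.normal(size=(seq_len, d_model))
y = ffn(x, W1, b1, W2, b2)

# Token independence: running a single token alone yields the same output row.
assert np.allclose(y[0], ffn(x[:1], W1, b1, W2, b2)[0])
```

Because the FFN never mixes tokens, its weights can in principle be shared or widened across layers, which is the design space the paper explores.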

Text-To-4D Dynamic Scene Generation

Source: Meta We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions. Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency by querying a Text-to-Video (T2V) diffusion-based model. The dynamic video output generated from the provided text can be viewed […]

RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Source: Google Reinforcement learning from human feedback (RLHF) is effective at aligning large language models (LLMs) to human preferences, but gathering high-quality human preference labels is a key bottleneck. We conduct a head-to-head comparison of RLHF vs. RL from AI Feedback (RLAIF) – a technique where preferences are labeled by an off-the-shelf LLM in […]
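The core move in RLAIF is replacing the human annotator with an LLM judge that picks the preferred response from a pair. A hedged sketch of that labeling step (the `judge` callable is a hypothetical stand-in for an off-the-shelf LLM call; the actual prompting and scoring in the paper differ):

```python
def label_preference(prompt, resp_a, resp_b, judge):
    """Build one preference record using an AI judge (RLAIF-style labeling).

    judge: callable (prompt, resp_a, resp_b) -> "A" or "B".
    A real system would call an off-the-shelf LLM here.
    """
    verdict = judge(prompt, resp_a, resp_b)
    chosen, rejected = (resp_a, resp_b) if verdict == "A" else (resp_b, resp_a)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

# Toy judge preferring the longer response, standing in for an LLM call.
toy_judge = lambda p, a, b: "A" if len(a) >= len(b) else "B"

pair = label_preference(
    "Explain RLHF in one sentence.",
    "RLHF fine-tunes an LLM with a reward model trained on human preference labels.",
    "It aligns models.",
    toy_judge,
)
```

Records of this shape then feed the same reward-model training and RL loop as RLHF, with the AI judge substituting for the human labeler.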