Research Papers

Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

Source: Meta. Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text. However, these pre-trained models often face challenges when it comes to generating highly aesthetic images. This creates the need for aesthetic alignment post pre-training. In this paper, we propose quality-tuning to effectively guide […]

September 27, 2023

The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”

We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form “A is B”, it will not automatically generalize to the reverse direction “B is A”. This is the Reversal Curse. For instance, if a model is trained on “Olaf Scholz was […]

September 22, 2023

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

We present LongLoRA, an efficient fine-tuning approach that extends the context sizes of pre-trained large language models (LLMs), with limited computation cost. Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. For example, training on the context length of 8192 needs 16x computational costs in self-attention layers as […]

September 21, 2023

Chain-of-Verification Reduces Hallucination in Large Language Models

Source: Meta. Generation of plausible yet incorrect factual information, termed hallucination, is an unsolved issue in large language models. We study the ability of language models to deliberate on the responses they give in order to correct their mistakes. We develop the Chain-of-Verification (CoVe) method whereby the model first (i) drafts an initial response; then […]

September 20, 2023

Agents: An Open-source Framework for Autonomous Language Agents

Recent advances in large language models (LLMs) enable researchers and developers to build autonomous language agents that can automatically solve various tasks and interact with environments, humans, and other agents using natural language interfaces. We consider language agents as a promising direction towards artificial general intelligence and release Agents, an open-source library with the goal […]

September 14, 2023

Textbooks Are All You Need II: phi-1.5 technical report

Source: Microsoft. We continue the investigation into the power of smaller Transformer-based language models as initiated by TinyStories (a 10 million parameter model that can produce coherent English) and the follow-up work on phi-1, a 1.3 billion parameter model with Python coding performance close to the state-of-the-art. The latter work proposed to use […]

September 11, 2023

MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

Source: Google. We introduce MADLAD-400, a manually audited, general domain 3T token monolingual dataset based on CommonCrawl, spanning 419 languages. We discuss the limitations revealed by self-auditing MADLAD-400, and the role data auditing had in the dataset creation process. We then train and release a 10.7B-parameter multilingual machine translation model on 250 billion tokens covering […]

September 9, 2023

From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting

Source: Salesforce. Selecting the “right” amount of information to include in a summary is a difficult task. A good summary should be detailed and entity-centric without being overly dense and hard to follow. To better understand this tradeoff, we solicit increasingly dense GPT-4 summaries with what we refer to as a “Chain of Density” (CoD) […]

September 8, 2023

Large-Scale Automatic Audiobook Creation

An audiobook can dramatically improve a work of literature’s accessibility and improve reader engagement. However, audiobooks can take hundreds of hours of human effort to create, edit, and publish. In this work, we present a system that can automatically generate high-quality audiobooks from online e-books. In particular, we leverage recent advances in neural text-to-speech to […]

September 7, 2023

Provably safe systems: the only path to controllable AGI

We describe a path to humanity safely thriving with powerful Artificial General Intelligences (AGIs) by building them to provably satisfy human-specified requirements. We argue that this will soon be technically feasible using advanced AI for formal verification and mechanistic interpretability. We further argue that it is the only path which guarantees safe controlled AGI. We […]

September 5, 2023

One Wide Feedforward is All You Need

Source: Apple. The Transformer architecture has two main non-embedding components: Attention and the Feed Forward Network (FFN). Attention captures interdependencies between words regardless of their position, while the FFN non-linearly transforms each input token independently. In this work we explore the role of the FFN, and find that despite taking up a significant fraction of the […]

September 4, 2023

Text-To-4D Dynamic Scene Generation

Source: Meta. We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions. Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency by querying a Text-to-Video (T2V) diffusion-based model. The dynamic video output generated from the provided text can be viewed […]

September 4, 2023

RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Source: Google. Reinforcement learning from human feedback (RLHF) is effective at aligning large language models (LLMs) to human preferences, but gathering high quality human preference labels is a key bottleneck. We conduct a head-to-head comparison of RLHF vs. RL from AI Feedback (RLAIF) – a technique where preferences are labeled by an off-the-shelf LLM in […]

September 1, 2023

Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

August 22, 2023

Reinforced Self-Training (ReST) for Language Modeling

Source: Google DeepMind. Reinforcement learning from human feedback (RLHF) can improve the quality of the outputs of large language models (LLMs) by aligning them with human preferences. We propose a simple algorithm for aligning LLMs with human preferences inspired by growing batch reinforcement learning (RL), which we call Reinforced Self-Training (ReST). Given an initial LLM policy, ReST […]

August 21, 2023

Graph of Thoughts: Solving Elaborate Problems with Large Language Models

August 21, 2023

BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user defined tasks. We release both the model weights and code, and have also deployed the model on a public web page to interact […]

August 10, 2023

Brain2Music: Reconstructing Music from Human Brain Activity

Source: Google. The process of reconstructing experiences from human brain activity offers a unique lens into how the brain interprets and represents the world. In this paper, we introduce a method for reconstructing music from brain activity, captured using functional magnetic resonance imaging (fMRI). Our approach uses either music retrieval or the MusicLM music generation […]

July 20, 2023

LongNet: Scaling Transformers to 1,000,000,000 Tokens

Source: Microsoft. Scaling sequence length has become a critical demand in the era of large language models. However, existing methods struggle with either computational complexity or model expressivity, rendering the maximum sequence length restricted. In this work, we introduce LongNet, a Transformer variant that can scale sequence length to more than 1 billion tokens, without […]

July 5, 2023

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present […]

June 27, 2023

Fast Segment Anything

The recently proposed segment anything model (SAM) has had a significant influence on many computer vision tasks. It is becoming a foundation step for many high-level tasks, like image segmentation, image captioning, and image editing. However, its huge computation costs prevent it from wider applications in industry scenarios. The computation mainly comes from the Transformer […]

June 21, 2023

STEVE-1: A Generative Model for Text-to-Behavior in Minecraft

Source: University of Toronto. Constructing AI models that respond to text instructions is challenging, especially for sequential decision-making tasks. This work introduces an instruction-tuned Video Pretraining (VPT) model for Minecraft called STEVE-1, demonstrating that the unCLIP approach, utilized in DALL-E 2, is also effective for creating instruction-following sequential decision-making agents. STEVE-1 is trained in two […]

June 5, 2023

Let’s Verify Step by Step

Source: OpenAI. In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to outcome supervision, which provides feedback for a final result, or process supervision, which provides feedback for each […]

May 31, 2023

ImageBind: One Embedding Space To Bind Them All

Source: Meta. We present ImageBind, an approach to learn a joint embedding across six different modalities – images, text, audio, depth, thermal, and IMU data. We show that all combinations of paired data are not necessary to train such a joint embedding, and only image-paired data is sufficient to bind the modalities together. ImageBind can […]