GPT-Neo

GPT-Neo is the name of our codebase for transformer-based language models loosely styled around the GPT architecture. One of our goals is to use GPT-Neo to replicate a GPT-3-sized model and release it as open source, free of charge. Along the way, we will run experiments with alternative architectures and attention types, release any intermediate models, and write up our findings on our blog. Our models are built in Mesh TensorFlow, which will allow us to scale to GPT-3 sizes and beyond by combining model parallelism and data parallelism.
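As a rough illustration of what combining model and data parallelism means, the sketch below computes how large each device's slice of a tensor is when named tensor dimensions are split across the axes of a device mesh. It is plain Python and does not call the Mesh TensorFlow API. The function `shard_shape`, the dimension names, the mesh sizes, and the tensor sizes are all hypothetical and chosen only for the example.

```python
def shard_shape(tensor_dims, mesh_shape, layout):
    """Return the per-device shape of a tensor placed on a device mesh.

    tensor_dims: dict mapping tensor dimension name -> size
    mesh_shape:  dict mapping mesh axis name -> number of devices on that axis
    layout:      dict mapping tensor dimension name -> mesh axis it is split over
    """
    local = {}
    for name, size in tensor_dims.items():
        axis = layout.get(name)  # None means the dimension is replicated
        parts = mesh_shape[axis] if axis is not None else 1
        if size % parts != 0:
            raise ValueError(f"dimension {name!r} ({size}) is not divisible by {parts}")
        local[name] = size // parts
    return local


# Hypothetical 4 x 2 mesh: 8 devices in total.
mesh_shape = {"data": 4, "model": 2}

# Split the batch over the "data" axis (data parallelism) and the
# hidden dimension over the "model" axis (model parallelism).
layout = {"batch": "data", "hidden": "model"}

activations = {"batch": 32, "seq": 128, "hidden": 1024}
weights = {"embed": 512, "hidden": 1024}

act_local = shard_shape(activations, mesh_shape, layout)
w_local = shard_shape(weights, mesh_shape, layout)

print(act_local)
print(w_local)
```

In this sketch, each device holds a quarter of the batch and half of the hidden dimension. The weight matrix has no batch dimension, so it is split only along the "model" axis and replicated along the "data" axis.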