The complete OpenAlex database as gzip-compressed JSON Lines files, updated monthly. Includes works, authors, sources, institutions, topics, publishers, funders, and more.
Best for: Full database replication, data warehousing, comprehensive analysis
Size: ~330 GB compressed, ~1.6 TB decompressed
Learn more about the snapshot format
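Because the snapshot is gzip-compressed JSON Lines, a file can be streamed one record at a time without decompressing it to disk first. The following is a minimal sketch using only the Python standard library. The function names and any file path passed in are illustrative assumptions, not part of the OpenAlex tooling.

```python
import gzip
import json
from typing import Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a gzip-compressed JSON Lines file."""
    # "rt" opens the gzip stream in text mode, so lines are decoded on the fly
    # and the decompressed file never has to exist on disk.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)


def count_records(path: str) -> int:
    """Count records in one file while holding only a single line in memory."""
    return sum(1 for _ in iter_records(path))
```

Streaming this way keeps memory use flat regardless of file size, which matters when the full dataset decompresses to roughly 1.6 TB.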
Download PDFs and TEI XML for about 60 million works. Requires an API key; content downloads cost $0.01 per file.
Best for: Text mining, content analysis, building corpora
Full-text PDF documentation
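Since content downloads are billed per file, it can help to estimate the cost of a corpus before starting. This sketch assumes the flat $0.01 per-file price stated above applies to every file with no volume discounts or other fees. The function name is hypothetical.

```python
from decimal import Decimal

# Price taken from the text above; assumed to be flat for every file.
PRICE_PER_FILE_USD = Decimal("0.01")


def estimate_download_cost(file_count: int) -> Decimal:
    """Return the estimated cost in USD of downloading `file_count` content files."""
    if file_count < 0:
        raise ValueError("file_count must be non-negative")
    # Decimal avoids the rounding drift that binary floats introduce for cents.
    return PRICE_PER_FILE_USD * file_count
```

At this rate, 10,000 files would cost $100, and one file for each of the roughly 60 million works would cost about $600,000.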
Do you need the complete database?
├── Yes → Download the snapshot
│         (/download/snapshot-format)
└── No
    ├── Do you need filtered metadata or content files?
    │   ├── Yes → Use the OpenAlex CLI
    │   │         (/download/openalex-cli)
    │   └── No → Use the REST API
    │            (/api-reference/introduction)
    └── Do you need bulk full-text PDFs?
        ├── Yes → See full-text PDF options
        │         (/download/full-text-pdfs)
        └── No → Use the REST API
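The decision tree can also be read as a short chain of checks. The sketch below is one possible interpretation: it assumes the two questions under "No" are evaluated in the order shown, and that the final "Use the REST API" leaf points to the same page as the earlier one. The function and parameter names are hypothetical.

```python
def recommend_access_method(
    need_complete_database: bool,
    need_filtered_metadata_or_content: bool,
    need_bulk_pdfs: bool,
) -> str:
    """Return the documentation path for the access method the tree suggests."""
    if need_complete_database:
        return "/download/snapshot-format"    # download the snapshot
    if need_filtered_metadata_or_content:
        return "/download/openalex-cli"       # use the OpenAlex CLI
    if need_bulk_pdfs:
        return "/download/full-text-pdfs"     # see full-text PDF options
    return "/api-reference/introduction"      # default: use the REST API
```

For example, `recommend_access_method(False, False, False)` falls through to the REST API, matching the advice to start there when unsure.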
For the CLI: Install with pip install openalex-official and run openalex download --help
For PDFs: See full-text PDFs for the three download options
Working with the full snapshot is challenging. The dataset is large (330 GB+ compressed) and complex. If you’re unsure, start with the REST API — it can answer most questions with much less setup.
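Before committing to the snapshot, a quick free-space check can save a failed download. This sketch uses the sizes quoted on this page as lower bounds and assumes decimal units (1 GB = 10^9 bytes), which the page does not specify. The constant and function names are illustrative.

```python
import shutil

GB = 10**9   # assumption: decimal gigabytes
TB = 10**12  # assumption: decimal terabytes

# Approximate sizes from this page; the snapshot grows over time.
SNAPSHOT_COMPRESSED_BYTES = 330 * GB
SNAPSHOT_DECOMPRESSED_BYTES = 16 * TB // 10  # ~1.6 TB


def has_room_for(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem containing `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes
```

Calling `has_room_for(".", SNAPSHOT_COMPRESSED_BYTES)` checks the current directory's volume for the compressed download. Use `SNAPSHOT_DECOMPRESSED_BYTES` instead if the files will be unpacked.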