OpenAlex offers multiple ways to access data beyond the REST API. Choose the approach that best fits your use case.

When to use the API vs. downloads

Use the REST API

  • Quick lookups and searches
  • Building applications
  • Real-time data needs
  • Most use cases

Use data downloads

  • Large-scale analysis
  • Machine learning training
  • Building local search indexes
  • Offline access requirements

Download options

OpenAlex Snapshot

The complete OpenAlex database as gzip-compressed JSON Lines files, updated monthly. Includes works, authors, sources, institutions, topics, publishers, funders, and more.
Best for: Full database replication, data warehousing, comprehensive analysis
Size: ~330 GB compressed, ~1.6 TB decompressed
Learn more about the snapshot format (/download/snapshot-format)
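Because the snapshot is described as gzip-compressed JSON Lines, each file can be streamed one record at a time without decompressing it to disk first. The following is a minimal sketch using only the Python standard library. The file name and the "id" field are hypothetical placeholders; "publication_year" is borrowed from the CLI filter shown on this page, and real snapshot records should be checked against the snapshot format documentation.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Iterator


def iter_records(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a gzip JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def write_sample(path: Path, records: list) -> None:
    """Write records as gzip-compressed JSON Lines (used only to build demo data)."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    # Hypothetical records standing in for a real snapshot part file.
    sample = [
        {"id": "W1", "publication_year": 2024},
        {"id": "W2", "publication_year": 2023},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        part = Path(tmp) / "part_000.gz"  # hypothetical file name
        write_sample(part, sample)
        recent = [r["id"] for r in iter_records(part) if r["publication_year"] == 2024]
        print(recent)
```

Streaming line by line keeps memory use flat, which matters when the decompressed data is on the order of 1.6 TB.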

OpenAlex CLI

The official command-line tool for downloading filtered subsets of OpenAlex data.
pip install openalex-official
openalex download --api-key YOUR_KEY --filter "publication_year:2024" --output ./data
Best for: Downloading specific subsets without the full snapshot
OpenAlex CLI documentation (/download/openalex-cli)

Full-text PDFs

Download PDFs and TEI XML for about 60 million works. Requires an API key; content downloads cost $0.01 per file.
Best for: Text mining, content analysis, building corpora
Full-text PDF documentation (/download/full-text-pdfs)
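Since content downloads are priced per file, it can help to estimate the cost of a corpus before starting. The sketch below assumes a flat $0.01 per file, as stated above, with no discounts, free tier, or other charges; the function name is illustrative and not part of any OpenAlex tool.

```python
from decimal import Decimal

# Per-file price quoted on this page; treated here as a flat rate (assumption).
PRICE_PER_FILE_USD = Decimal("0.01")


def estimate_download_cost(file_count: int) -> Decimal:
    """Return the estimated cost in USD for downloading file_count content files."""
    if file_count < 0:
        raise ValueError("file_count must be non-negative")
    return PRICE_PER_FILE_USD * file_count


if __name__ == "__main__":
    for count in (1_000, 250_000, 60_000_000):
        print(f"{count:>12,} files -> ${estimate_download_cost(count):,.2f}")
```

At this rate, a 250,000-file corpus comes to $2,500, and fetching one file for each of the roughly 60 million works would come to about $600,000, so filtering to the works you need first is usually worthwhile.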

Decision tree

Do you need the complete database?
├── Yes → Download the snapshot
│         (/download/snapshot-format)
└── No
    ├── Do you need filtered metadata or content files?
    │   ├── Yes → Use the OpenAlex CLI
    │   │         (/download/openalex-cli)
    │   └── No → Use the REST API
    │             (/api-reference/introduction)
    └── Do you need bulk full-text PDFs?
        ├── Yes → See full-text PDF options
        │         (/download/full-text-pdfs)
        └── No → Use the REST API
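The same tree can be written as a small function, which makes the fallback explicit. This is one possible reading, not an official rule: it assumes the two questions under "No" are asked in the order shown and that the REST API is the answer only when both are answered "No". The function and parameter names are illustrative.

```python
def choose_access_method(
    need_complete_database: bool,
    need_filtered_subset: bool,
    need_bulk_pdfs: bool,
) -> str:
    """Map the three yes/no questions of the decision tree to a documentation path."""
    if need_complete_database:
        return "/download/snapshot-format"
    # Assumption: questions under "No" are checked top to bottom.
    if need_filtered_subset:
        return "/download/openalex-cli"
    if need_bulk_pdfs:
        return "/download/full-text-pdfs"
    # Neither a full copy, a filtered subset, nor bulk PDFs is needed.
    return "/api-reference/introduction"


if __name__ == "__main__":
    print(choose_access_method(False, False, False))
    print(choose_access_method(False, True, False))
```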

Getting started

  1. For the snapshot: Follow the download instructions to get the data to your machine
  2. For the CLI: Install with pip install openalex-official and run openalex download --help
  3. For PDFs: See full-text PDFs for the three download options

Working with the full snapshot is challenging. The dataset is large (about 330 GB compressed, 1.6 TB decompressed) and complex. If you're unsure, start with the REST API; it can answer most questions with much less setup.
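For comparison, a REST API request needs little more than a URL. The sketch below only builds the request URL and does not send it. The host https://api.openalex.org, the /works endpoint, and the "per-page" parameter are not stated on this page and are assumptions here; authentication is left out, so check the API reference before relying on any of it. The filter value reuses the one from the CLI example.

```python
from urllib.parse import urlencode

# Assumption: public API host; confirm against /api-reference/introduction.
BASE_URL = "https://api.openalex.org"


def build_works_url(filter_expr: str, per_page: int = 5) -> str:
    """Build a works query URL for the given filter expression (no request is sent)."""
    query = urlencode({"filter": filter_expr, "per-page": per_page})
    return f"{BASE_URL}/works?{query}"


if __name__ == "__main__":
    # Same filter as the CLI example: works published in 2024.
    print(build_works_url("publication_year:2024"))
```

The resulting URL can be opened in a browser or passed to any HTTP client, which is the "much less setup" path compared with downloading and loading the snapshot.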