OpenAlex offers multiple ways to access data beyond the REST API. Choose the approach that best fits your use case.

When to use the API vs. downloads

Use the REST API

  • Quick lookups and searches
  • Building applications
  • Real-time data needs
  • Most use cases

Use data downloads

  • Large-scale analysis
  • Machine learning training
  • Building local search indexes
  • Offline access requirements
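
For the quick-lookup case, a request can be as small as one URL. The sketch below only builds such a URL and sends nothing over the network. The base URL `https://api.openalex.org/works` and the `filter` and `per-page` parameter names are assumptions here, and `build_works_url` is a hypothetical helper, so check them against the API reference before relying on them.

```python
from urllib.parse import urlencode

# Assumed base URL for the works endpoint; confirm it in the API reference.
BASE_URL = "https://api.openalex.org/works"


def build_works_url(filter_expr: str, per_page: int = 25) -> str:
    """Return a works-query URL for one filter expression (hypothetical helper)."""
    if not filter_expr:
        raise ValueError("filter_expr must not be empty")
    # Parameter names are assumptions; ':' is kept unescaped for readability.
    params = {"filter": filter_expr, "per-page": per_page}
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


url = build_works_url("publication_year:2024")
print(url)
```

If a query like this would have to be paged through millions of results, that is usually the point at which one of the download options becomes the better fit.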

Download options

OpenAlex Snapshot

The complete OpenAlex database, available in two formats: gzip-compressed JSON Lines and Apache Parquet. The free public snapshot is updated quarterly; paid plans get daily-refreshed snapshots and daily change files. Includes works, authors, sources, institutions, topics, publishers, funders, awards, and more.

Best for: Full database replication, data warehousing, comprehensive analysis

Size: The JSON Lines snapshot is ~330 GB compressed (~1.6 TB decompressed). Parquet is provided alongside it as a separate copy of the same data, so downloading both roughly doubles the transfer.

Learn more about the snapshot format (/download/snapshot-format)
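
Because the JSON Lines files are gzip-compressed, they can be read as a stream without decompressing them to disk first. The following is a minimal sketch, not the official loading procedure. It writes a tiny stand-in file with made-up records so that it runs anywhere; the file name `part_000.gz` and the record contents are assumptions for illustration only.

```python
import gzip
import json
import tempfile
from pathlib import Path


def iter_records(path):
    """Yield one parsed JSON object per non-empty line of a .gz JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Stand-in data: two made-up records, not real OpenAlex entries.
sample = [
    {"id": "W1", "publication_year": 2023},
    {"id": "W2", "publication_year": 2024},
]
sample_path = Path(tempfile.mkdtemp()) / "part_000.gz"  # hypothetical file name
with gzip.open(sample_path, "wt", encoding="utf-8") as handle:
    for record in sample:
        handle.write(json.dumps(record) + "\n")

# Filter while streaming, so only one record is held in memory at a time.
ids_2024 = [r["id"] for r in iter_records(sample_path) if r["publication_year"] == 2024]
print(ids_2024)
```

Streaming this way keeps memory use flat regardless of file size, which matters when the decompressed data runs to around 1.6 TB.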

OpenAlex CLI

The official command-line tool for downloading filtered subsets of OpenAlex data.
pip install openalex-official
openalex download \
  --api-key YOUR_KEY \
  --filter "publication_year:2024" \
  --output ./results
Best for: Downloading specific subsets without the full snapshot

OpenAlex CLI documentation (/download/openalex-cli)

Full-text PDFs

Download PDFs and TEI XML for about 60 million works. Requires an API key; content downloads cost $0.01 per file.

Best for: Text mining, content analysis, building corpora

Full-text PDF documentation (/download/full-text-pdfs)
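
At $0.01 per file, the bill scales linearly with corpus size, so it is worth estimating before starting a large job. The sketch below is plain arithmetic on the price quoted above. `estimate_cost` is a hypothetical helper, and it counts files rather than works, since the text does not say whether a PDF and its TEI XML are billed separately.

```python
from decimal import Decimal

PRICE_PER_FILE = Decimal("0.01")  # USD per downloaded file, as quoted above


def estimate_cost(file_count: int) -> Decimal:
    """Return the estimated download cost in USD for file_count files."""
    if file_count < 0:
        raise ValueError("file_count must be non-negative")
    # Decimal avoids the rounding drift that float currency math can introduce.
    return PRICE_PER_FILE * file_count


for count in (1_000, 50_000, 1_000_000):
    print(f"{count:>9,} files: ${estimate_cost(count):,.2f}")
```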

Decision tree

Do you need the complete database?
├── Yes → Download the snapshot
│         (/download/snapshot-format)
└── No
    ├── Do you need filtered metadata or content files?
    │   ├── Yes → Use the OpenAlex CLI
    │   │         (/download/openalex-cli)
    │   └── No → Use the REST API
    │             (/api-reference/introduction)
    └── Do you need bulk full-text PDFs?
        ├── Yes → See full-text PDF options
        │         (/download/full-text-pdfs)
        └── No → Use the REST API
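
The tree above can also be written as a small function, which makes the order of the checks explicit. This is one possible reading rather than an official rule: the two questions under "No" are siblings in the tree, and the sketch assumes they are asked top to bottom. `choose_access_method` and its parameter names are hypothetical; the returned paths are the ones shown in the tree.

```python
def choose_access_method(
    need_full_database: bool,
    need_filtered_subset: bool = False,
    need_bulk_pdfs: bool = False,
) -> str:
    """Return the documentation path for the access method the tree suggests."""
    if need_full_database:
        return "/download/snapshot-format"
    # Assumption: the sibling questions are evaluated in the order they appear.
    if need_filtered_subset:
        return "/download/openalex-cli"
    if need_bulk_pdfs:
        return "/download/full-text-pdfs"
    # Every remaining branch of the tree ends at the REST API.
    return "/api-reference/introduction"


print(choose_access_method(need_full_database=False, need_filtered_subset=True))
```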

Getting started

  1. For the snapshot: Follow the download instructions to get the data to your machine
  2. For the CLI: Install with pip install openalex-official and run openalex download --help
  3. For PDFs: See full-text PDFs for the three download options

Working with the full snapshot is challenging. The dataset is large (330 GB+ compressed) and complex. If you’re unsure, start with the REST API, which can answer most questions with much less setup.
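
To put the 330 GB figure in perspective, a rough transfer-time estimate helps. The sketch below assumes decimal units (1 GB = 8,000 megabits) and a link that sustains its nominal speed, which real connections rarely do, so treat the results as lower bounds. `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(size_gb: float, link_mbit_per_s: float) -> float:
    """Hours needed to move size_gb (decimal GB) over a link of the given speed."""
    if size_gb < 0 or link_mbit_per_s <= 0:
        raise ValueError("size must be non-negative and speed must be positive")
    size_megabits = size_gb * 1000 * 8  # GB -> MB -> megabits
    return size_megabits / link_mbit_per_s / 3600


SNAPSHOT_GB = 330  # compressed JSON Lines size quoted above

for speed in (100, 1000):
    print(f"{speed} Mbit/s: about {transfer_hours(SNAPSHOT_GB, speed):.1f} h")
# prints:
# 100 Mbit/s: about 7.3 h
# 1000 Mbit/s: about 0.7 h
```

Decompressing to roughly 1.6 TB afterwards needs its own disk budget on top of the download.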