Overview

Download options

OpenAlex Snapshot

The complete OpenAlex database, available in two formats: gzip-compressed JSON Lines and Apache Parquet. The free public snapshot is updated quarterly; paid plans get daily-refreshed snapshots and daily change files. Includes works, authors, sources, institutions, topics, publishers, funders, awards, and more.
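As a rough illustration of working with the gzip-compressed JSON Lines format, the sketch below streams one such file record by record instead of loading it into memory. It uses only the Python standard library. The file name and the `id` field are made up for the demo; `publication_year` is borrowed from the CLI filter example on this page and is assumed, not confirmed, to appear as a record field.

```python
import gzip
import json
import os
import tempfile
from typing import Iterator


def iter_jsonl_gz(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a gzip-compressed JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def write_sample(path: str) -> None:
    """Write two made-up records so the example runs without any download."""
    records = [
        {"id": "W1", "publication_year": 2024},  # hypothetical record shape
        {"id": "W2", "publication_year": 2023},
    ]
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.jsonl.gz")
        write_sample(sample)
        # Filter while streaming; only matching records are kept in memory.
        recent = [r for r in iter_jsonl_gz(sample) if r["publication_year"] == 2024]
        print(len(recent))
```

Streaming line by line matters at this scale, because a single decompressed file may not fit in memory.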

Best for: Full database replication, data warehousing, comprehensive analysis

Size: The JSON Lines snapshot is ~330 GB compressed (~1.6 TB decompressed). Parquet is provided alongside it as a separate copy of the same data, so downloading both roughly doubles the transfer.
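To put these sizes in perspective, here is a back-of-the-envelope estimate of transfer time. It assumes decimal gigabytes, a sustained link speed that you supply, and that the Parquet copy is about the same size as the JSON Lines copy (the text above says only "roughly doubles"). The function name is illustrative.

```python
def transfer_hours(size_gb: float, mbit_per_s: float) -> float:
    """Hours needed to move size_gb (decimal GB) at a sustained rate in megabits per second."""
    bits = size_gb * 8e9                   # 1 GB = 8e9 bits
    seconds = bits / (mbit_per_s * 1e6)    # 1 Mbit/s = 1e6 bits per second
    return seconds / 3600


JSONL_GB = 330        # compressed JSON Lines size quoted above
BOTH_GB = 2 * 330     # assumption: the Parquet copy is about the same size

if __name__ == "__main__":
    for label, size in (("JSON Lines only", JSONL_GB), ("JSON Lines + Parquet", BOTH_GB)):
        print(f"{label}: {transfer_hours(size, 100):.1f} h at 100 Mbit/s")
```

Real transfers rarely sustain the nominal link speed, so treat the result as a lower bound. Also budget about 1.6 TB of disk if you plan to decompress the JSON Lines files.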

OpenAlex CLI

The official command-line tool for downloading filtered subsets of OpenAlex data.

pip install openalex-official
openalex download \
  --api-key YOUR_KEY \
  --filter "publication_year:2024" \
  --output ./results

Best for: Downloading specific subsets without the full snapshot
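If you want to drive the CLI from a script, one option is to assemble the argument list in Python and hand it to `subprocess`. The sketch below only builds and prints the command. It uses the three flags shown above and nothing else; the environment variable name `OPENALEX_API_KEY` is a hypothetical convention for this example, not something the CLI is documented here to read.

```python
import os
import shlex
from typing import List


def build_download_command(api_key: str, filter_expr: str, output_dir: str) -> List[str]:
    """Return the argv list for the download command shown above."""
    return [
        "openalex", "download",
        "--api-key", api_key,
        "--filter", filter_expr,
        "--output", output_dir,
    ]


if __name__ == "__main__":
    # Hypothetical variable name; keeps the key out of shell history.
    key = os.environ.get("OPENALEX_API_KEY", "YOUR_KEY")
    command = build_download_command(key, "publication_year:2024", "./results")
    print(shlex.join(command))
```

To actually run it, pass the list to `subprocess.run(command, check=True)`. Passing a list rather than a single string avoids shell-quoting problems with filter expressions.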

Full-text PDFs

Download PDFs and TEI XML for about 60 million works. Downloading requires an API key, and each content file costs $0.01.

Best for: Text mining, content analysis, building corpora
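Because pricing is per file, it is worth estimating the cost of a corpus before starting. The sketch below multiplies the $0.01 per-file price stated above by a file count. It assumes one file per work; requesting both the PDF and the TEI XML for a work would presumably count as two files, which is a guess and not confirmed by this page.

```python
COST_PER_FILE_CENTS = 1  # $0.01 per file, as stated above


def content_cost_usd(n_files: int) -> float:
    """Estimated cost in US dollars for downloading n_files content files."""
    return n_files * COST_PER_FILE_CENTS / 100


if __name__ == "__main__":
    for n in (10_000, 1_000_000, 60_000_000):
        print(f"{n:>12,} files: ${content_cost_usd(n):,.2f}")
```

At this rate, a 10,000-file corpus costs $100, while one file for each of the roughly 60 million works would cost $600,000.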

Decision tree

Do you need the complete database?
├── Yes → Download the snapshot
│         (/download/snapshot-format)
└── No
    ├── Do you need filtered metadata or content files?
    │   ├── Yes → Use the OpenAlex CLI
    │   │         (/download/openalex-cli)
    │   └── No → Use the REST API
    │             (/api-reference/introduction)
    └── Do you need bulk full-text PDFs?
        ├── Yes → See full-text PDF options
        │         (/download/full-text-pdfs)
        └── No → Use the REST API
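The tree above can also be read as a short chain of checks. The following sketch is one possible reading, in which the two questions under "No" are asked in the order they appear. The function and parameter names are illustrative; the returned paths are the ones listed in the tree.

```python
def recommend(need_full_db: bool, need_filtered_subset: bool, need_bulk_pdfs: bool) -> str:
    """Return the documentation path suggested by the decision tree."""
    if need_full_db:
        return "/download/snapshot-format"
    if need_filtered_subset:           # filtered metadata or content files
        return "/download/openalex-cli"
    if need_bulk_pdfs:
        return "/download/full-text-pdfs"
    return "/api-reference/introduction"  # default: the REST API


if __name__ == "__main__":
    print(recommend(need_full_db=False, need_filtered_subset=True, need_bulk_pdfs=False))
```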

Getting started

For the snapshot: Follow the download instructions to get the data onto your machine.

For the CLI: Install with pip install openalex-official, then run openalex download --help.

For PDFs: See full-text PDFs for the three download options.

Working with the full snapshot is challenging: the dataset is large (330 GB+ compressed) and complex. If you're unsure whether you need it, start with the REST API, which can answer most questions with much less setup.

When to use the API vs. downloads

Use the REST API

Use data downloads
