Many thanks to the AWS Open Data program, which covers the data-transfer fees (about $70 per download) so users don’t have to.
Prerequisites
Install the AWS CLI. All commands in this guide use it. No account or credentials are needed: the --no-sign-request flag provides anonymous access.
You can also browse the snapshot in your browser: openalex.s3.amazonaws.com/browse.html
Download the full snapshot
This copies everything in the openalex S3 bucket to a local folder.
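A minimal sketch of the full copy using aws s3 sync. The function name download_full_snapshot and the default folder name openalex-snapshot are hypothetical, chosen for illustration. The command is wrapped in a function so nothing is transferred until you call it.

```shell
# Sketch: mirror the whole public bucket into a local folder.
# download_full_snapshot and the default folder name are illustrative only.
download_full_snapshot() {
  # $1 (optional): destination folder, defaults to ./openalex-snapshot
  # --no-sign-request gives anonymous access, so no credentials are needed.
  aws s3 sync "s3://openalex" "${1:-openalex-snapshot}" --no-sign-request
}
```

Calling download_full_snapshot ./openalex-snapshot starts the copy. Because sync only transfers files that are new or changed, rerunning the same call after an interruption should pick up the remaining files rather than start over.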
Check the current size
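One way to see the total size without downloading anything is a summarizing listing of the bucket. This is a sketch: the function name snapshot_size is hypothetical, and it assumes the public bucket is s3://openalex as described above.

```shell
# Sketch: report the object count and total size of the public bucket.
# snapshot_size is an illustrative name, not part of any tool.
snapshot_size() {
  # --recursive walks every key, --summarize appends totals at the end,
  # --human-readable prints sizes in KiB/MiB/GiB instead of bytes.
  # tail keeps only the two summary lines (Total Objects, Total Size).
  aws s3 ls "s3://openalex/" --recursive --summarize --human-readable --no-sign-request | tail -n 2
}
```

The listing itself can take a while, since it enumerates every object before printing the totals.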
The snapshot size changes over time, so check it before downloading.
File structure
After downloading, the data is split into two top-level format folders, jsonl/ and parquet/, each holding a complete copy of every entity.
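To inspect the layout directly, you can list the bucket one level at a time. A small sketch follows; list_snapshot is a hypothetical helper name, and only the jsonl/ and parquet/ prefixes are taken from the description above.

```shell
# Sketch: list one level of the public bucket.
# list_snapshot is an illustrative name.
list_snapshot() {
  # $1 (optional): key prefix ending in "/", e.g. "jsonl/" or "parquet/".
  # With no argument, the bucket root is listed.
  aws s3 ls "s3://openalex/${1:-}" --no-sign-request
}
```

For example, list_snapshot shows the top-level folders, and list_snapshot jsonl/ shows what sits under the JSON Lines copy.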
Download a single format or entity type
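A partial download can be sketched by syncing a single key prefix instead of the whole bucket. The helper name download_prefix and the default destination openalex-snapshot are hypothetical. The format prefixes jsonl and parquet come from the structure described above; any deeper prefix for a single entity type is an assumption that you should confirm from a bucket listing first.

```shell
# Sketch: copy only one prefix of the public bucket.
# download_prefix and the default destination root are illustrative only.
download_prefix() {
  # $1: key prefix without trailing slash, e.g. "jsonl" or "parquet"
  # $2 (optional): local destination root, defaults to ./openalex-snapshot
  if [ -z "${1:-}" ]; then
    echo "usage: download_prefix PREFIX [DEST_ROOT]" >&2
    return 1
  fi
  # Mirror the prefix into a matching subfolder so local paths match the bucket.
  aws s3 sync "s3://openalex/$1" "${2:-openalex-snapshot}/$1" --no-sign-request
}
```

For example, download_prefix parquet copies only the Parquet files into openalex-snapshot/parquet.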
If you only need one format, sync only its prefix.
Alternatives to local download
If you don’t want to download files locally, some services can read directly from S3:
- Amazon Redshift: Load from S3 using the manifest files
- ETL tools with S3 connectors (Xplenty, Airbyte, etc.)
Download with an enterprise API key
Enterprise users can download a daily-refreshed snapshot. Each day’s full snapshot is published to dated folders in the openalex-snapshots staging bucket, in both JSON Lines and Parquet.
- Add this to ~/.aws/config (replace YOUR_KEY with your OpenAlex API key):
- Browse and download
full/ is a complete snapshot built that day, so you can pull a fresh full copy daily rather than waiting for the quarterly public release. Both formats are included; Parquet is also being added to the free quarterly public snapshot beginning June 2026. If you’re interested in a daily-refreshed enterprise snapshot, contact sales@openalex.org.