# Download to your machine

> Get the OpenAlex snapshot files onto your local machine using the AWS CLI

The OpenAlex snapshot is hosted on [Amazon S3](https://aws.amazon.com/s3/) and is **free to download**. You don't need an AWS account.

<Info>
  Many thanks to the [AWS Open Data program](https://aws.amazon.com/opendata/), which covers the data-transfer fees (about \$70 per download) so users don't have to.
</Info>

## Prerequisites

Install the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html). All commands in this guide use it. The public snapshot needs no account or credentials, because the `--no-sign-request` flag sends unsigned (anonymous) requests.

You can also browse the snapshot in your browser: [openalex.s3.amazonaws.com/browse.html](https://openalex.s3.amazonaws.com/browse.html)

## Download the full snapshot

This copies everything in the `openalex` S3 bucket to a local folder.

```bash theme={"dark"}
aws s3 sync "s3://openalex" "openalex-snapshot" --no-sign-request
```

<Warning>
  This pulls **both** formats (JSON Lines under `data/jsonl/` and Parquet under `data/parquet/`) plus the `legacy-data/` prefix, so it's well over **660 GB**. Most users want a single format — see [Download a single format or entity type](#download-a-single-format-or-entity-type) below.
</Warning>

<Warning>
  If you're downloading into a folder that already holds a previous snapshot, add the `--delete` flag to remove outdated files. Otherwise, entities that have moved between partitions will appear twice: once in their old file and once in their new one.

  ```bash theme={"dark"}
  aws s3 sync "s3://openalex" "openalex-snapshot" --no-sign-request --delete
  ```
</Warning>
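Before running a `--delete` sync over an existing copy, you may want to preview what it would remove. The following is a minimal sketch built on the CLI's `--dryrun` flag. The helper names `planned_deletions` and `preview_delete_sync` are hypothetical. The parsing assumes the CLI prints each planned removal as a line starting with `(dryrun) delete: `.

```bash theme={"dark"}
# Hypothetical helpers: preview a --delete sync without changing anything.

# Keep only the planned removals from `aws s3 sync --dryrun` output on stdin,
# stripping the "(dryrun) delete: " prefix so only the local paths remain.
planned_deletions() {
  grep '^(dryrun) delete: ' | sed 's/^(dryrun) delete: //'
}

# Run the same sync as above in dry-run mode and list what it would delete.
preview_delete_sync() {
  aws s3 sync "s3://openalex" "openalex-snapshot" --no-sign-request --delete --dryrun \
    | planned_deletions
}
```

For example, `preview_delete_sync | wc -l` counts the files that a real `--delete` sync would remove.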

## Check the current size

The snapshot size changes over time. Check before downloading:

```bash theme={"dark"}
aws s3 ls --summarize --human-readable --no-sign-request --recursive "s3://openalex/"
```
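The command above totals the whole bucket. To size only the format or entity you plan to download, point the same command at that prefix instead. The sketch below wraps it in a hypothetical `snapshot_size` helper and keeps only the summary's size value. It assumes the CLI ends its `--summarize` output with a line of the form `Total Size: <value>`.

```bash theme={"dark"}
# Hypothetical helpers. `prefix` is any path inside the public bucket,
# e.g. data/parquet/ or data/jsonl/works/.

# Print only the value of the "Total Size" line from `aws s3 ls --summarize`.
extract_total_size() {
  sed -n 's/^ *Total Size: *//p'
}

# Summarize one prefix of the public bucket and print its total size.
snapshot_size() {
  local prefix="${1:-}"
  aws s3 ls --summarize --human-readable --no-sign-request --recursive \
    "s3://openalex/${prefix}" | extract_total_size
}
```

For example, `snapshot_size data/parquet/` reports the size of the Parquet copy on its own.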

## File structure

After downloading, you'll have a structure like this. Inside `data/`, the files are split into two format folders, `jsonl/` and `parquet/`, each holding a complete copy of every entity:

```text theme={"dark"}
openalex-snapshot/
├── LICENSE.txt
├── RELEASE_NOTES.txt
├── legacy-data/              # pre-2026 flat layout + merged_ids (kept for back-compat)
└── data
    ├── jsonl
    │   ├── manifest.json      # all entities
    │   ├── works
    │   │   ├── manifest.json
    │   │   └── updated_date=2026-06-24
    │   │       ├── part_0000.gz
    │   │       └── part_0001.gz
    │   ├── authors
    │   │   └── ...
    │   ├── institutions
    │   ├── sources
    │   ├── publishers
    │   ├── funders
    │   ├── awards
    │   └── ...               # topics, keywords, concepts, countries, etc.
    └── parquet
        ├── manifest.json
        ├── works
        │   ├── manifest.json
        │   └── updated_date=2026-06-24
        │       ├── part_0000.parquet
        │       └── part_0001.parquet
        └── ...
```

See [Snapshot data format](/download/snapshot-format) for the full entity list, the partition structure, and how to keep your copy up to date.
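Once a JSON Lines folder is on disk, standard tools are enough for a quick sanity check. The helpers below print the first record of one part file and count records across all of an entity's `updated_date=...` partitions. The names `peek_first_record` and `count_records` are illustrative and are not part of OpenAlex. The count assumes one JSON record per line, as in any JSON Lines file.

```bash theme={"dark"}
# Hypothetical helpers for a downloaded JSON Lines entity folder.

# Print the first record of one gzipped JSON Lines part file.
peek_first_record() {
  gunzip -c "$1" | head -n 1
}

# Count records (one per line) across every .gz part under a folder,
# including all updated_date=... partitions below it.
count_records() {
  find "$1" -type f -name '*.gz' -exec gunzip -c {} + | wc -l | tr -d '[:space:]'
}
```

For example, `count_records openalex-snapshot/data/jsonl/authors`. Counting the works folder decompresses every part file, so expect it to take a while.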

## Download a single format or entity type

If you only need one format, sync its prefix:

```bash theme={"dark"}
# JSON Lines only
aws s3 sync "s3://openalex/data/jsonl" "openalex-snapshot/data/jsonl" --no-sign-request

# Parquet only
aws s3 sync "s3://openalex/data/parquet" "openalex-snapshot/data/parquet" --no-sign-request
```

To narrow further to a single entity type, add it to the prefix:

```bash theme={"dark"}
aws s3 sync "s3://openalex/data/jsonl/works" "openalex-snapshot/data/jsonl/works" --no-sign-request
```
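To fetch a few entity types in one format, you can loop over the per-entity prefix. The following is a minimal sketch. The helper names `snapshot_prefix` and `sync_entities` are hypothetical, and the S3 and local paths follow the examples above.

```bash theme={"dark"}
# Hypothetical helpers for syncing several entity types at once.

# Build the S3 prefix for one format/entity pair, rejecting unknown formats.
snapshot_prefix() {
  local format="$1" entity="$2"
  case "$format" in
    jsonl|parquet) ;;
    *) echo "unknown format: $format" >&2; return 1 ;;
  esac
  printf 's3://openalex/data/%s/%s\n' "$format" "$entity"
}

# Sync each requested entity into the matching local folder.
sync_entities() {
  local format="$1"; shift
  local entity src
  for entity in "$@"; do
    src="$(snapshot_prefix "$format" "$entity")" || return 1
    aws s3 sync "$src" "openalex-snapshot/data/$format/$entity" --no-sign-request
  done
}
```

For example, `sync_entities parquet authors institutions sources`.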

## Alternatives to local download

If you don't want to download files locally, some services can read directly from S3:

* **Amazon Redshift:** [Load from S3](https://docs.aws.amazon.com/redshift/latest/dg/tutorial-loading-data.html) using the manifest files
* **ETL tools** with S3 connectors (Xplenty, Airbyte, etc.)

For these approaches, the [snapshot data format](/download/snapshot-format) documentation should have enough detail to get started.

## Download with an enterprise API key

Enterprise users can download a daily-refreshed snapshot. Each day's full snapshot is published to dated folders in the `openalex-snapshots` staging bucket, in both JSON Lines and Parquet.

1. Add this profile to `~/.aws/config`, replacing `YOUR_KEY` with your OpenAlex API key:

```ini theme={"dark"}
[profile openalex]
credential_process = curl -sf -X POST "https://api.openalex.org/snapshots/credentials?api_key=YOUR_KEY"
```

The AWS CLI runs this command whenever it needs credentials, so they are fetched and refreshed automatically.

2. Browse the available snapshots and download one:

```bash theme={"dark"}
aws s3 ls s3://openalex-snapshots/full/ --profile openalex

aws s3 sync s3://openalex-snapshots/full/2026-04-29/jsonl/ ./openalex-snapshot-jsonl --profile openalex

aws s3 sync s3://openalex-snapshots/full/2026-04-29/parquet/ ./openalex-snapshot-parquet --profile openalex
```

Each dated folder under `full/` is a complete snapshot built that day, so you can pull a fresh full copy daily rather than waiting for the quarterly public release. Both formats are included; Parquet is also being added to the free quarterly public snapshot beginning June 2026. If you're interested in a daily-refreshed enterprise snapshot, contact [sales@openalex.org](mailto:sales@openalex.org).
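Because each folder under `full/` is a dated snapshot, you can script a daily pull of the newest one. This sketch assumes that the folder names are ISO dates (`YYYY-MM-DD/`, as in the example above), so a plain text sort orders them chronologically. It also assumes that `aws s3 ls` lists them as `PRE <name>/` lines. The names `latest_snapshot_date` and `sync_latest` are hypothetical.

```bash theme={"dark"}
# Hypothetical helpers for the enterprise bucket; assumes the `openalex`
# profile from step 1 is configured.

# Read `aws s3 ls` output on stdin and print the newest YYYY-MM-DD folder name.
latest_snapshot_date() {
  awk '$1 == "PRE" && $2 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\/$/ {
    sub(/\/$/, "", $2); print $2
  }' | sort | tail -n 1
}

# Sync one format (jsonl or parquet) from the newest full snapshot.
sync_latest() {
  local format="${1:-parquet}" latest
  case "$format" in
    jsonl|parquet) ;;
    *) echo "unknown format: $format" >&2; return 1 ;;
  esac
  latest="$(aws s3 ls s3://openalex-snapshots/full/ --profile openalex | latest_snapshot_date)"
  if [ -z "$latest" ]; then
    echo "no dated snapshot folders found" >&2
    return 1
  fi
  aws s3 sync "s3://openalex-snapshots/full/$latest/$format/" "./openalex-snapshot-$format" \
    --profile openalex
}
```

For example, `sync_latest jsonl` or `sync_latest parquet`. Because `aws s3 sync` only transfers changed files, re-running it into the same local folder avoids downloading unchanged parts again.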
