# Key Concepts

> Understand entities, IDs, and data structures in OpenAlex

Before diving into the API, it helps to understand a few core concepts that underpin everything in OpenAlex.

## Entity Types

OpenAlex describes scholarly research as a graph of interconnected **entities**. There are eight entity types:

<Note>
  Counts are approximate and change as new data is added. For a current total, request the endpoint with `per_page=1` (for example, `/works?per_page=1`) and read `meta.count` from the response.
</Note>

| Entity                                      | Description                                                  | Approx. Count        |
| ------------------------------------------- | ------------------------------------------------------------ | -------------------- |
| [Works](/api-reference/works)               | Scholarly documents (articles, books, datasets, theses)      | hundreds of millions |
| [Authors](/api-reference/authors)           | Researchers who create works                                 | tens of millions     |
| [Sources](/api-reference/sources)           | Where works are hosted (journals, repositories, conferences) | 250K+                |
| [Institutions](/api-reference/institutions) | Organizations where authors are affiliated                   | 110K+                |
| [Topics](/api-reference/topics)             | Subject classifications (4-level hierarchy)                  | 4.5K                 |
| [Publishers](/api-reference/publishers)     | Organizations that distribute works                          | 10K+                 |
| [Funders](/api-reference/funders)           | Organizations that fund research                             | 35K+                 |
| [Countries](/api-reference/countries)       | Geographic information (countries, continents)               | —                    |

Each entity type has its own API endpoint (e.g., `/works`, `/authors`).
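The count lookup from the note above can be scripted. This is a minimal Python sketch, not an official client: the helper names are hypothetical, the endpoint paths come from the table, and it assumes list responses carry the total in `meta.count`.

```python theme={"dark"}
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"

# Endpoint paths for the eight entity types in the table above.
ENTITY_ENDPOINTS = [
    "works", "authors", "sources", "institutions",
    "topics", "publishers", "funders", "countries",
]


def count_url(endpoint: str) -> str:
    """List URL that returns a single record; only the total is of interest."""
    return f"{API_BASE}/{endpoint}?{urlencode({'per_page': 1})}"


def read_count(list_response: dict) -> int:
    """Total number of matching records, taken from the response's meta block."""
    return list_response["meta"]["count"]


def fetch_count(endpoint: str) -> int:
    """Network call: fetch and return the current count for one endpoint."""
    with urllib.request.urlopen(count_url(endpoint)) as resp:
        return read_count(json.load(resp))


# Usage (one request per entity type):
#   counts = {name: fetch_count(name) for name in ENTITY_ENDPOINTS}
```

Keeping URL building and response parsing separate from the network call makes both easy to check without hitting the API.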

## Topic Hierarchy

Topics are organized in a four-level hierarchy:

| Level | Name                                 | Count   | Example                     |
| ----- | ------------------------------------ | ------- | --------------------------- |
| 1     | [Domain](/api-reference/domains)     | 4       | Physical Sciences           |
| 2     | [Field](/api-reference/fields)       | 26      | Computer Science            |
| 3     | [Subfield](/api-reference/subfields) | 254     | Artificial Intelligence     |
| 4     | [Topic](/api-reference/topics)       | \~4,500 | Natural Language Processing |

Works are assigned a `primary_topic` that includes its full path through the hierarchy (domain, field, subfield). You can filter by any level: `filter=primary_topic.domain.id:1`, `filter=primary_topic.field.id:17`, etc.
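Here is a small Python sketch of building those hierarchy filters. The `domain`, `field`, and topic (`primary_topic.id`) paths appear on this page. `primary_topic.subfield.id` is an assumption that subfields follow the same naming pattern, and the function name is hypothetical.

```python theme={"dark"}
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"

# Filter path for each hierarchy level. The subfield path is assumed to
# follow the same pattern as the domain and field paths.
LEVEL_FILTERS = {
    "domain": "primary_topic.domain.id",
    "field": "primary_topic.field.id",
    "subfield": "primary_topic.subfield.id",
    "topic": "primary_topic.id",
}


def works_by_level_url(level: str, level_id) -> str:
    """Build a /works URL filtered to one level of the primary-topic hierarchy."""
    field = LEVEL_FILTERS[level]  # raises KeyError for an unknown level name
    # safe=":" keeps the field:value separator readable in the URL.
    query = urlencode({"filter": f"{field}:{level_id}"}, safe=":")
    return f"{API_BASE}/works?{query}"
```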

## OpenAlex IDs

Every entity in OpenAlex has a unique **OpenAlex ID**. It's a URL formatted like this:

```
https://openalex.org/W2741809807
```

### ID Structure

The ID has two parts:

1. **Base**: Always `https://openalex.org/`
2. **Key**: A letter prefix + numeric ID (e.g., `W2741809807`)

The letter prefix indicates the entity type:

| Prefix | Entity               |
| ------ | -------------------- |
| W      | Work                 |
| A      | Author               |
| S      | Source               |
| I      | Institution          |
| T      | Topic                |
| K      | Keyword              |
| P      | Publisher            |
| F      | Funder               |
| G      | Award (Grant)        |
| C      | Concept (deprecated) |

IDs are case-insensitive: `W2741809807` and `w2741809807` are equivalent.

### Using IDs in the API

You can use just the key portion when making API calls:

```bash theme={"dark"}
# Full URL
curl "https://api.openalex.org/works/https://openalex.org/W2741809807"

# Just the key (recommended)
curl "https://api.openalex.org/works/W2741809807"
```
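The ID rules above (fixed base, letter prefix, numeric key, case-insensitive) fit in a few lines of Python. This is a sketch with hypothetical helper names. The endpoint mapping covers only the entity types from the table at the top of this page, so `entity_url` rejects keys with other prefixes (`K`, `G`, `C`).

```python theme={"dark"}
API_BASE = "https://api.openalex.org"
OPENALEX_BASE = "https://openalex.org/"

# Prefix -> entity name, from the prefix table above.
PREFIX_TO_ENTITY = {
    "W": "Work", "A": "Author", "S": "Source", "I": "Institution",
    "T": "Topic", "K": "Keyword", "P": "Publisher", "F": "Funder",
    "G": "Award", "C": "Concept",
}

# Prefix -> API endpoint, only for entity types listed in the entity table.
PREFIX_TO_ENDPOINT = {
    "W": "works", "A": "authors", "S": "sources", "I": "institutions",
    "T": "topics", "P": "publishers", "F": "funders",
}


def to_key(openalex_id: str) -> str:
    """Reduce a full OpenAlex ID URL (or a bare key) to an upper-case key."""
    key = openalex_id.strip()
    if key.lower().startswith(OPENALEX_BASE):
        key = key[len(OPENALEX_BASE):]
    key = key.upper()  # IDs are case-insensitive, so normalize the case
    if len(key) < 2 or key[0] not in PREFIX_TO_ENTITY or not key[1:].isdigit():
        raise ValueError(f"not an OpenAlex ID: {openalex_id!r}")
    return key


def entity_type(openalex_id: str) -> str:
    """Entity name implied by the key's letter prefix."""
    return PREFIX_TO_ENTITY[to_key(openalex_id)[0]]


def entity_url(openalex_id: str) -> str:
    """Single-entity API URL built from just the key (the recommended form)."""
    key = to_key(openalex_id)
    endpoint = PREFIX_TO_ENDPOINT.get(key[0])
    if endpoint is None:
        raise ValueError(f"no endpoint mapping for prefix {key[0]!r}")
    return f"{API_BASE}/{endpoint}/{key}"
```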

### Resolve IDs

<Warning>
  **Don't filter by names directly.** Entity names are ambiguous—"MIT" could match multiple institutions, "Smith" matches thousands of authors. Always resolve names to IDs first.
</Warning>

When you want to find works by a specific author, institution, journal, or topic, use the **two-step pattern**:

1. **Search** for the entity to get its OpenAlex ID
2. **Filter** works using that ID

This avoids guessed or invented filter values and ensures you match the right entity.

#### The Two-Step Pattern

**Step 1: Search for the Entity**

Use the search endpoint for the entity type:

```bash theme={"dark"}
# Find an author
curl "https://api.openalex.org/authors?search=Albert+Einstein"

# Find an institution
curl "https://api.openalex.org/institutions?search=MIT"

# Find a journal
curl "https://api.openalex.org/sources?search=Nature"

# Find a topic
curl "https://api.openalex.org/topics?search=machine+learning"
```

The response includes IDs you can use:

```json theme={"dark"}
{
  "results": [
    {
      "id": "https://openalex.org/A5012345678",
      "display_name": "Albert Einstein",
      "works_count": 272
    }
  ]
}
```

**Step 2: Filter Works by ID**

Use the ID to filter works:

```bash theme={"dark"}
# Works by this author
curl "https://api.openalex.org/works?filter=authorships.author.id:A5012345678"

# Works from this institution
curl "https://api.openalex.org/works?filter=authorships.institutions.id:I136199984"

# Works in this journal
curl "https://api.openalex.org/works?filter=primary_location.source.id:S137773608"

# Works on this topic
curl "https://api.openalex.org/works?filter=topics.id:T12345"
```
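Chaining the two steps in Python could look like the sketch below. It assumes the search response shape shown in Step 1 and simply takes the top result, which is the weakest part of the pattern (see Handling Ambiguous Results below). Helper names are hypothetical, and only `get_json` touches the network.

```python theme={"dark"}
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"

# Works filter to use for an ID found via each search endpoint.
WORKS_FILTER_FOR = {
    "authors": "authorships.author.id",
    "institutions": "authorships.institutions.id",
    "sources": "primary_location.source.id",
    "topics": "topics.id",
}


def search_url(endpoint: str, name: str) -> str:
    """Step 1: search an entity endpoint by name."""
    return f"{API_BASE}/{endpoint}?{urlencode({'search': name})}"


def first_key(search_response: dict) -> str:
    """Key (e.g. 'A5012345678') of the top-ranked search result."""
    results = search_response.get("results", [])
    if not results:
        raise LookupError("search returned no results")
    return results[0]["id"].rsplit("/", 1)[-1]


def filtered_works_url(endpoint: str, key: str) -> str:
    """Step 2: filter works by the resolved ID."""
    flt = f"{WORKS_FILTER_FOR[endpoint]}:{key}"
    return f"{API_BASE}/works?{urlencode({'filter': flt}, safe=':')}"


def get_json(url: str) -> dict:
    """Network call: GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Usage (two network requests):
#   key = first_key(get_json(search_url("authors", "Albert Einstein")))
#   works = get_json(filtered_works_url("authors", key))
```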

#### Common Lookups

**Find Works by Author Name**

```bash theme={"dark"}
# Step 1: Search for the author
curl "https://api.openalex.org/authors?search=Jason+Priem"
# Response: id = "https://openalex.org/A5023888391"

# Step 2: Get their works
curl "https://api.openalex.org/works?filter=authorships.author.id:A5023888391"
```

**Find Works by Institution Name**

```bash theme={"dark"}
# Step 1: Search for the institution
curl "https://api.openalex.org/institutions?search=Stanford+University"
# Response: id = "https://openalex.org/I97018004"

# Step 2: Get works from that institution
curl "https://api.openalex.org/works?filter=authorships.institutions.id:I97018004"
```

**Find Works by Journal Name**

```bash theme={"dark"}
# Step 1: Search for the journal (source)
curl "https://api.openalex.org/sources?search=Nature"
# Response: id = "https://openalex.org/S137773608"

# Step 2: Get works published there
curl "https://api.openalex.org/works?filter=primary_location.source.id:S137773608"
```

**Find Works by Topic**

```bash theme={"dark"}
# Step 1: Search for the topic
curl "https://api.openalex.org/topics?search=CRISPR"
# Response: id = "https://openalex.org/T10234"

# Step 2: Get works on that topic
curl "https://api.openalex.org/works?filter=topics.id:T10234"
```

#### When You Have External IDs

If you already have an external identifier (DOI, ORCID, ROR, ISSN), you can usually skip the name search. ORCIDs and RORs work directly in filters, while an ISSN first needs a source lookup:

```bash theme={"dark"}
# By ORCID (author)
curl "https://api.openalex.org/works?filter=authorships.author.id:https://orcid.org/0000-0003-1613-5981"

# By ROR (institution)
curl "https://api.openalex.org/works?filter=authorships.institutions.id:https://ror.org/042nb2s44"

# By ISSN (source/journal)
curl "https://api.openalex.org/sources/issn:0028-0836"
# Then use the returned OpenAlex ID
```

#### Decision Guide

| Input Type       | What to Do                                                      |
| ---------------- | --------------------------------------------------------------- |
| Name (ambiguous) | Search first, then filter by ID                                 |
| ORCID            | Use directly: `authorships.author.id:https://orcid.org/...`     |
| ROR              | Use directly: `authorships.institutions.id:https://ror.org/...` |
| DOI              | Get work directly: `/works/https://doi.org/...`                 |
| ISSN             | Get source by ISSN, then filter by source ID                    |
| OpenAlex ID      | Use directly: `authorships.author.id:A123456`                   |
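The decision guide can be approximated in code: classify an input string by its format, then return the suggested next request. This is a hedged sketch. The regular expressions are simplified (no checksum validation), the function names are hypothetical, and the OpenAlex-ID branch covers only `W`, `A`, `I`, `S`, and `T` keys.

```python theme={"dark"}
import re
from typing import Optional

API_BASE = "https://api.openalex.org"

# Simplified format checks, tried in order; real identifiers have more rules.
PATTERNS = [
    ("orcid", re.compile(r"^https://orcid\.org/\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")),
    ("ror", re.compile(r"^https://ror\.org/0[a-z0-9]{8}$")),
    ("doi", re.compile(r"^(https://doi\.org/)?10\.\d{4,9}/\S+$", re.IGNORECASE)),
    ("issn", re.compile(r"^\d{4}-\d{3}[\dX]$")),
    ("openalex", re.compile(r"^[WAIST]\d+$", re.IGNORECASE)),
]

# Works filter for an OpenAlex key, chosen by its letter prefix.
FILTER_BY_PREFIX = {
    "A": "authorships.author.id",
    "I": "authorships.institutions.id",
    "S": "primary_location.source.id",
    "T": "topics.id",
}


def classify(value: str) -> str:
    """Decision-guide row for an input: orcid, ror, doi, issn, openalex, or name."""
    v = value.strip()
    for kind, pattern in PATTERNS:
        if pattern.match(v):
            return kind
    return "name"


def next_request(value: str) -> Optional[str]:
    """Suggested next request, or None when a name search is needed first."""
    v = value.strip()
    kind = classify(v)
    if kind == "orcid":
        return f"{API_BASE}/works?filter=authorships.author.id:{v}"
    if kind == "ror":
        return f"{API_BASE}/works?filter=authorships.institutions.id:{v}"
    if kind == "doi":
        doi = v if v.lower().startswith("https://doi.org/") else f"https://doi.org/{v}"
        return f"{API_BASE}/works/{doi}"
    if kind == "issn":
        # First of two steps: fetch the source, then filter works by its ID.
        return f"{API_BASE}/sources/issn:{v}"
    if kind == "openalex":
        key = v.upper()
        if key[0] == "W":
            return f"{API_BASE}/works/{key}"
        return f"{API_BASE}/works?filter={FILTER_BY_PREFIX[key[0]]}:{key}"
    return None  # ambiguous name: search the matching entity endpoint first
```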

#### Handling Ambiguous Results

When a search returns multiple matches, pick the right one:

**Use Additional Filters**

Narrow down the search:

```bash theme={"dark"}
# Author named "Smith" affiliated with MIT
curl "https://api.openalex.org/authors?search=Smith&filter=last_known_institution.id:I63966007"
```

**Check Display Name and Metadata**

Look at `display_name`, `works_count`, `cited_by_count`, and institutional affiliations to identify the right entity.
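When manual inspection is not practical, a ranking heuristic can shortlist a candidate. The rule in this sketch (exact name match first, then more works, then more citations) is an assumption, not OpenAlex behavior, and it can still pick the wrong entity. The function name is hypothetical.

```python theme={"dark"}
def pick_candidate(results: list, name: str) -> dict:
    """Choose one entity from search results using an assumed heuristic.

    Exact (case-insensitive) display_name matches rank above partial ones;
    ties are broken by works_count, then by cited_by_count.
    """
    if not results:
        raise LookupError("no candidates to choose from")
    target = name.strip().casefold()

    def score(entity: dict) -> tuple:
        exact = (entity.get("display_name") or "").casefold() == target
        return (exact, entity.get("works_count") or 0, entity.get("cited_by_count") or 0)

    return max(results, key=score)
```

Combining this with a filter (as in the MIT example above) usually narrows the candidate list far more than any ranking rule.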

**Use Autocomplete for Interactive UIs**

The autocomplete endpoint is fast and returns ranked results:

```bash theme={"dark"}
curl "https://api.openalex.org/autocomplete/authors?q=einst"
```

#### Filter Field Reference

| To find works by...      | Filter field                                |
| ------------------------ | ------------------------------------------- |
| Author                   | `authorships.author.id`                     |
| Author's institution     | `authorships.institutions.id`               |
| Primary source (journal) | `primary_location.source.id`                |
| Any source               | `locations.source.id`                       |
| Topic                    | `topics.id` or `primary_topic.id`           |
| Publisher                | `primary_location.source.host_organization` |
| Funder                   | `funders.id`                                |

#### Example: Complete Workflow

Find highly cited machine learning papers from MIT published after 2022:

```bash theme={"dark"}
# 1. Get MIT's ID
curl "https://api.openalex.org/institutions?search=MIT"
# Result: I63966007

# 2. Get a machine learning topic ID
curl "https://api.openalex.org/topics?search=machine+learning"
# Result: a topic key (topic IDs start with "T"); choose the best match

# 3. Filter works with all criteria (replace TOPIC_ID with the key from step 2)
curl "https://api.openalex.org/works?filter=authorships.institutions.id:I63966007,topics.id:TOPIC_ID,publication_year:>2022&sort=cited_by_count:desc"
```
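Step 3 can also be assembled programmatically. This hypothetical helper joins `field:value` pairs with commas, the same way the URL above combines criteria, and keeps `:`, `,`, and `>` unencoded so the result stays readable (a percent-encoded URL should be equivalent for the server).

```python theme={"dark"}
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"


def works_query_url(filters: dict, sort: Optional[str] = None,
                    per_page: Optional[int] = None) -> str:
    """Combine several field:value filters into one comma-separated filter."""
    params = {"filter": ",".join(f"{field}:{value}" for field, value in filters.items())}
    if sort:
        params["sort"] = sort
    if per_page:
        params["per_page"] = per_page
    # Leave ':', ',' and '>' unescaped for readability.
    return f"{API_BASE}/works?{urlencode(params, safe=':,>')}"


# Usage, once the topic key from step 2 is known (topic_key is hypothetical):
#   works_query_url({"authorships.institutions.id": "I63966007",
#                    "topics.id": topic_key,
#                    "publication_year": ">2022"},
#                   sort="cited_by_count:desc")
```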

### External IDs

You can also retrieve entities using external IDs like DOIs, ORCIDs, and RORs:

```bash theme={"dark"}
# By DOI
curl "https://api.openalex.org/works/https://doi.org/10.7717/peerj.4375"

# By ORCID
curl "https://api.openalex.org/authors/https://orcid.org/0000-0003-1613-5981"

# By ROR
curl "https://api.openalex.org/institutions/https://ror.org/02y3ad647"

# Shorthand format
curl "https://api.openalex.org/works/doi:10.7717/peerj.4375"
```

### Canonical External IDs

Each entity type has a "canonical" external ID—the most widely adopted identifier for that type:

| Entity       | Canonical ID |
| ------------ | ------------ |
| Works        | DOI          |
| Authors      | ORCID        |
| Sources      | ISSN-L       |
| Institutions | ROR          |
| Topics       | Wikidata ID  |
| Publishers   | Wikidata ID  |

### Merged IDs

Sometimes we merge duplicate entities (e.g., two author records for the same person). If you request a merged ID, you'll be redirected to the new ID:

```bash theme={"dark"}
$ curl -i https://api.openalex.org/authors/A5092938886
HTTP/1.1 301 MOVED PERMANENTLY
Location: https://api.openalex.org/authors/A5006060960
```

Most HTTP clients follow redirects automatically. With curl, add `-L` (`--location`) to follow the redirect.
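In Python, `urllib.request.urlopen` follows a 301 on its own, so the final URL reveals the surviving ID. This is a sketch with hypothetical helper names; only `current_key` makes a request.

```python theme={"dark"}
import urllib.request
from urllib.parse import urlsplit

API_BASE = "https://api.openalex.org"


def key_from_api_url(url: str) -> str:
    """Trailing key of an API entity URL, ignoring any query string."""
    return urlsplit(url).path.rstrip("/").rsplit("/", 1)[-1]


def current_key(entity_path: str) -> str:
    """Network call: request e.g. 'authors/A5092938886' and return the key
    of the URL the redirects ended at (unchanged if the ID was never merged)."""
    with urllib.request.urlopen(f"{API_BASE}/{entity_path}") as resp:
        return key_from_api_url(resp.url)
```

Comparing the returned key with the one you requested is a simple way to detect merges and update stored IDs.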

## Dehydrated Objects

When entities are nested inside other entities, they're often returned in **dehydrated** form—a stripped-down version with only essential fields.

For example, a Work's `authorships` field contains dehydrated Author objects:

```json theme={"dark"}
{
  "authorships": [
    {
      "author": {
        "id": "https://openalex.org/A5023888391",
        "display_name": "Jason Priem",
        "orcid": "https://orcid.org/0000-0001-6187-6610"
      },
      "institutions": [
        {
          "id": "https://openalex.org/I4200000001",
          "display_name": "OurResearch",
          "ror": "https://ror.org/02nr0ka47",
          "country_code": "US",
          "type": "nonprofit"
        }
      ]
    }
  ]
}
```

To get the full entity, make a separate request using the ID:

```bash theme={"dark"}
curl "https://api.openalex.org/authors/A5023888391"
```
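Collecting the IDs of dehydrated objects takes only a short loop. This sketch assumes the `authorships` shape shown above, builds one full-entity URL per distinct author and institution, and leaves fetching to the caller. The function names are hypothetical.

```python theme={"dark"}
API_BASE = "https://api.openalex.org"


def _key(openalex_id: str) -> str:
    """'https://openalex.org/A5023888391' -> 'A5023888391'."""
    return openalex_id.rsplit("/", 1)[-1]


def hydration_urls(work: dict) -> list:
    """Full-entity URLs for every distinct author and institution in a work."""
    urls, seen = [], set()
    for authorship in work.get("authorships") or []:
        author = authorship.get("author") or {}
        candidates = [("authors", author.get("id"))]
        candidates += [("institutions", inst.get("id"))
                       for inst in authorship.get("institutions") or []]
        for endpoint, openalex_id in candidates:
            # Skip missing IDs and entities already seen in earlier authorships.
            if openalex_id and openalex_id not in seen:
                seen.add(openalex_id)
                urls.append(f"{API_BASE}/{endpoint}/{_key(openalex_id)}")
    return urls
```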

## XPAC (Expansion Pack)

In November 2025, OpenAlex added 190+ million new works as part of an expansion called **XPAC** (part of the [Walden rewrite](https://blog.openalex.org/openalex-rewrite-walden-launch/)). These works are primarily datasets and repository records, drawn from:

* All of DataCite
* Thousands of institutional and subject-area repositories

### Why XPAC Works Are Excluded by Default

XPAC works have lower data quality on average (improving over time). To avoid surprising users with sudden changes in result counts and quality, **XPAC works are excluded by default**.

### Including XPAC Works

Add `include_xpac=true` to any works endpoint:

```bash theme={"dark"}
# Without XPAC (default)
curl "https://api.openalex.org/works"

# With XPAC (roughly doubles the count)
curl "https://api.openalex.org/works?include_xpac=true"
```

### Filtering by XPAC

Each work has an `is_xpac` boolean field:

```bash theme={"dark"}
# Get only XPAC works
curl "https://api.openalex.org/works?include_xpac=true&filter=is_xpac:true"
```
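A sketch of switching between the three views (default, with XPAC, XPAC only) when building URLs. It mirrors the example above by sending `include_xpac=true` alongside the `is_xpac` filter. The helper name is hypothetical. Adding `per_page=1` and reading the total is one way to compare the counts of each view.

```python theme={"dark"}
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org"


def xpac_works_url(include_xpac: bool = False, xpac_only: bool = False,
                   per_page: Optional[int] = None) -> str:
    """Works URL for the default view, the view with XPAC, or XPAC works only."""
    params = {}
    if include_xpac or xpac_only:
        # XPAC works are excluded unless include_xpac=true is sent.
        params["include_xpac"] = "true"
    if xpac_only:
        params["filter"] = "is_xpac:true"
    if per_page:
        params["per_page"] = per_page
    query = urlencode(params, safe=":")
    return f"{API_BASE}/works" + (f"?{query}" if query else "")
```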

## Query Parameter Naming

<Note>
  OpenAlex uses **snake\_case** for all query parameters: `filter`, `sort`, `group_by`, `per_page`, `api_key`, etc.
</Note>

## What's Next?

* [Filtering](/guides/filtering) — Filter results by any field
* [Searching](/guides/searching) — Full-text and semantic search
* [Authentication](/guides/authentication) — API keys and pricing
