This page is optimized for LLM agents and AI applications. For human-readable guides, see Getting Started.

Base URL and Authentication

Base: https://api.openalex.org
Auth: API key required (free at openalex.org/settings/api)
Rate: $1/day free usage with key, $0.01/day without

Entity Endpoints

/works          - Hundreds of millions of scholarly documents (articles, books, datasets)
/authors        - Researcher profiles with disambiguated identities
/sources        - Journals, repositories, conferences
/institutions   - Universities, research organizations
/topics         - Subject classifications (3-level hierarchy)
/publishers     - Publishing organizations
/funders        - Funding agencies

Special Endpoints

content.openalex.org/works/{id}.pdf - Download PDFs ($0.01 each)
/text                               - DEPRECATED, do not use

Critical: Two-Step ID Lookup

Never filter by entity name directly: names are ambiguous. Always resolve the name to an ID first, then filter by that ID.

# WRONG - will fail or return wrong results
/works?filter=author_name:Einstein

# CORRECT - two steps
# 1. Get ID
/authors?search=Einstein
# Response: id = "A5012345678"

# 2. Filter by ID
/works?filter=authorships.author.id:A5012345678

This applies to: authors, institutions, sources, topics, publishers, funders.
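
The two-step lookup can be sketched in Python as below. This is a minimal illustration, not an official client: the helper names are hypothetical, and the response shape (a "results" list whose items carry an "id") is an assumption. Taking the top search hit blindly can still pick the wrong person, so check the match before using the ID.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org"


def author_search_url(name, api_key):
    """Step 1: URL that searches /authors for a name."""
    return f"{BASE_URL}/authors?" + urlencode({"search": name, "api_key": api_key})


def first_result_id(response_json):
    """Return the ID of the top hit, or None if there are no hits.

    Assumes a list response shaped like {"results": [{"id": ...}, ...]}.
    """
    results = response_json.get("results", [])
    if not results:
        return None
    # If the ID is a full URL, keep only the trailing short ID.
    return results[0]["id"].rsplit("/", 1)[-1]


def works_by_author_url(author_id, api_key):
    """Step 2: URL that filters /works by the resolved author ID."""
    params = {"filter": f"authorships.author.id:{author_id}", "api_key": api_key}
    # safe=":" keeps the field:value separator literal in the query string.
    return f"{BASE_URL}/works?" + urlencode(params, safe=":")
```

The same pattern should carry over to the other entity types by swapping the search endpoint and the filter field.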

Query Parameters

api_key=        - Required (get free at openalex.org/settings/api)
filter=         - Filter results (see syntax below)
search=         - Full-text search across title/abstract/fulltext
sort=           - Sort results (e.g., cited_by_count:desc)
per_page=       - Results per page (default: 25, max: 100)
page=           - Page number for pagination
sample=         - Random results (e.g., sample=50)
seed=           - Seed for reproducible sampling
select=         - Limit returned fields (e.g., select=id,title)
group_by=       - Aggregate results by a field

OpenAlex uses snake_case for all parameters: per_page, group_by, api_key.
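
As a rough sketch of how these parameters combine into one request URL, the helper below assembles a query string. The function name build_url is hypothetical, and leaving ':', ',' and '|' unescaped is an assumption about what the API accepts.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org"


def build_url(endpoint, api_key, **params):
    """Assemble a request URL from snake_case query parameters."""
    per_page = params.get("per_page")
    if per_page is not None and per_page > 100:
        raise ValueError("per_page cannot exceed 100")
    # Drop parameters the caller left unset.
    query = {key: value for key, value in params.items() if value is not None}
    query["api_key"] = api_key
    # Keep ':' ',' '|' literal so filter, sort and select values stay readable.
    return f"{BASE_URL}/{endpoint.strip('/')}?" + urlencode(query, safe=":,|")


url = build_url(
    "works",
    "YOUR_KEY",
    filter="publication_year:2024",
    sort="cited_by_count:desc",
    per_page=100,
    select="id,title",
)
```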

Filter Syntax

# Single filter
?filter=publication_year:2024

# Multiple filters (AND)
?filter=publication_year:2024,is_oa:true

# Multiple values (OR) - up to 100 values
?filter=type:article|book|dataset

# Negation
?filter=type:!paratext

# Comparison
?filter=cited_by_count:>100
?filter=publication_year:<2020

# Range
?filter=publication_year:2020-2024
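
The syntax above can be generated from a plain dictionary. This is a sketch only: build_filter is a hypothetical helper, and negation or comparison operators are simply passed through as part of the value string.

```python
def build_filter(conditions):
    """Build a filter string from a dict.

    Keys are joined with ',' (AND); list or tuple values are joined with '|' (OR).
    """
    parts = []
    for field, value in conditions.items():
        if isinstance(value, (list, tuple)):
            if len(value) > 100:
                raise ValueError("at most 100 OR values per filter")
            value = "|".join(str(item) for item in value)
        elif isinstance(value, bool):
            # Booleans are written in lowercase, as in is_oa:true.
            value = "true" if value else "false"
        parts.append(f"{field}:{value}")
    return ",".join(parts)


and_filter = build_filter({"publication_year": 2024, "is_oa": True})
or_filter = build_filter({"type": ["article", "book", "dataset"]})
gt_filter = build_filter({"cited_by_count": ">100"})
```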

Common Filter Fields

Works

authorships.author.id         - Author's OpenAlex ID
authorships.institutions.id   - Institution's OpenAlex ID
primary_location.source.id    - Journal/source ID
topics.id                     - Topic ID
publication_year              - Year (integer)
cited_by_count                - Citations (integer)
is_oa                         - Open access (boolean)
type                          - article, book, dataset, etc.
has_fulltext                  - Has searchable fulltext (boolean)

Authors

last_known_institutions.id    - Current institution
works_count                   - Number of works
cited_by_count                - Total citations

Common Patterns

Get works by author

# Step 1: Find author
/authors?search=Heather+Piwowar
# Step 2: Get works
/works?filter=authorships.author.id:A5023888391

Get works from institution

# Step 1: Find institution
/institutions?search=MIT
# Step 2: Get works
/works?filter=authorships.institutions.id:I63966007

Bulk DOI lookup (up to 100)

/works?filter=doi:10.1234/a|10.1234/b|10.1234/c&per_page=100
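
For lists longer than 100 DOIs, one approach is to split them into batches that respect the OR limit. The helper below is a hypothetical sketch that only builds the filter strings; sending the requests is left to the caller.

```python
def doi_batch_filters(dois, batch_size=100):
    """Split DOIs into filter strings of at most batch_size OR values each."""
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be between 1 and 100")
    # Remove duplicates while keeping the original order.
    unique = list(dict.fromkeys(dois))
    return [
        "doi:" + "|".join(unique[start:start + batch_size])
        for start in range(0, len(unique), batch_size)
    ]


filters = doi_batch_filters(["10.1234/a", "10.1234/b", "10.1234/c"])
```

Each returned string goes into its own /works?filter=...&per_page=100 request.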

Random sample

/works?sample=100&seed=42

Aggregate by field

/works?filter=publication_year:2024&group_by=topics.id

Pricing

Endpoint                    Cost
Singleton (/works/W123)     Free
List (/works?filter=...)    $0.0001
Search (?search=)           $0.001
Content download (PDF)      $0.01
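
To see what the $1/day free allowance covers, the arithmetic can be sketched as below. This assumes the listed costs are charged per request, which the table does not state explicitly; the names COST, estimate_cost and max_calls are illustrative.

```python
from decimal import Decimal

# Listed prices, assumed here to be per request.
COST = {
    "singleton": Decimal("0"),
    "list": Decimal("0.0001"),
    "search": Decimal("0.001"),
    "content": Decimal("0.01"),
}
DAILY_FREE = Decimal("1")  # $1/day free usage with a key


def estimate_cost(calls):
    """Total cost in dollars for a dict of {request kind: number of calls}."""
    return sum((COST[kind] * count for kind, count in calls.items()), Decimal("0"))


def max_calls(kind, budget=DAILY_FREE):
    """How many calls of one kind fit in the budget (None if the kind is free)."""
    price = COST[kind]
    return None if price == 0 else int(budget // price)


planned = estimate_cost({"list": 500, "search": 200, "content": 10})
```

Under that assumption, $1 covers 10,000 list requests, 1,000 searches, or 100 PDF downloads.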

Query Limits

Limit                   Value
OR values per filter    100
per_page max            100
sample max              10,000
Basic paging limit      10,000 results
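
The per_page and basic paging limits together bound how many page= requests are useful. The sketch below computes the page numbers to request; page_numbers is a hypothetical helper, and total_results stands for a result count the caller already has.

```python
import math


def page_numbers(total_results, per_page=100, paging_limit=10_000):
    """Page numbers reachable with basic paging for a given result count."""
    if not 1 <= per_page <= 100:
        raise ValueError("per_page must be between 1 and 100")
    # Results beyond the paging limit are not reachable with page=.
    reachable = min(total_results, paging_limit)
    return range(1, math.ceil(reachable / per_page) + 1)


pages = list(page_numbers(250))
```

With per_page=100, a query matching a million works still yields only 100 reachable pages.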

Error Handling

import time

import requests


def fetch_with_retry(url, max_retries=5):
    """GET a URL, retrying on 429 and 500 with exponential backoff."""
    for attempt in range(max_retries):
        response = requests.get(url, timeout=30)
        if response.status_code == 200:
            return response.json()
        if response.status_code in (429, 500):
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s, ...
            continue
        response.raise_for_status()  # Other error statuses are not retried
    raise RuntimeError("Max retries exceeded")

Common Mistakes

Mistake                  Fix
Filter by name           Resolve to ID first
Default page size        Use per_page=100
Sequential ID lookups    Batch with | operator
No error handling        Implement exponential backoff
Fetching all fields      Use select= for needed fields

Deprecated Features

See Deprecations for the full list. Key items:
  • Concepts → Use Topics instead
  • /text endpoint → Do not use
  • host_venue → Use primary_location
  • grants → Use funders and awards