AlphaFold 3 input schema models

This documentation covers the public API of alphafold3_input.

Package

AlphaFold 3 input models.

This package provides models for constructing AlphaFold 3 input files.

It offers a Pythonic, object-oriented interface for defining AlphaFold 3 jobs, abstracting the underlying JSON input format into typed models and validated structures. The implementation closely follows the official AlphaFold 3 input specification provided by DeepMind.

For full details on the expected input format and supported features, refer to the official AlphaFold 3 input specification.

Exports

JSON_SCHEMA_URL: canonical JSON Schema URL for editor validation.
Job, Dialect, Version: top-level job model and input format enums.
DNA, RNA, Protein, Ligand: entity models used under Job.entities.
Modification, Entity: residue modification model and its entity-scope enum.
Template: structural template specification for proteins.
Atom, Bond: covalent bond specification models.
Operation, trace(), reindex(), realign(): operation trace generation, reindexing, and realignment utilities.
ccd(), component(): generation of custom chemical component dictionaries.

Metadata

Package metadata for AlphaFold 3 input models.

Provides a normalized interface to distribution metadata for the package.

Exports

__title__: Package title.
__description__: Package summary.
__author__: Package author.
__version__: Installed version.
__package__: Distribution name.
__module_name__: Import path.
__repository__: Repository URL.
__documentation__: Documentation URL.
__changelog__: Changelog URL.
__issues__: Issue tracker URL.

Reference

Constants

JSON_SCHEMA_URL = 'https://cdn.jsdelivr.net/gh/igor-koop/alphafold3_input@main/alphafold3-input.schema.json'

Canonical JSON schema URL for AlphaFold 3 input files.

Use Job.save() with schema=True to include this URL as the top-level $schema field for editor validation. The upstream AlphaFold 3 parser rejects unknown top-level keys, so keep the default schema=False for runnable input files.
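A minimal sketch of the difference between the two output modes, using only the standard library. The helper `with_schema` is hypothetical: it prepends a `$schema` key to an already exported mapping, which is roughly what we assume `save(..., schema=True)` does. It is not part of alphafold3_input.

```python
import json
from typing import Any

JSON_SCHEMA_URL = (
    "https://cdn.jsdelivr.net/gh/igor-koop/alphafold3_input@main/"
    "alphafold3-input.schema.json"
)


def with_schema(mapping: dict[str, Any], schema: bool = False) -> dict[str, Any]:
    """Return a copy of mapping, optionally led by a $schema key.

    Hypothetical helper illustrating Job.save(schema=True); not the package API.
    """
    if not schema:
        return dict(mapping)
    # Place $schema first; dicts preserve insertion order.
    return {"$schema": JSON_SCHEMA_URL, **mapping}


job = {"name": "example", "modelSeeds": [1], "sequences": [], "dialect": "alphafold3", "version": 4}
editor_copy = json.dumps(with_schema(job, schema=True), indent=2)  # for editors only
runnable_copy = json.dumps(with_schema(job), indent=2)  # what the upstream parser should see
```

Only the editor copy carries `$schema`; because the upstream parser rejects unknown top-level keys, the runnable copy is the one to pass to AlphaFold 3.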

Models

class Job(*, name, schema='https://cdn.jsdelivr.net/gh/igor-koop/alphafold3_input@main/alphafold3-input.schema.json', dialect=Dialect.LOCAL, version=Version.IV, seeds=<factory>, entities=<factory>, bonds=None, ccd=None)

AlphaFold 3 job specification.

A job contains one or more entities (Protein, RNA, DNA, or Ligand) and may include explicit covalent bonds and a custom chemical components dictionary (ccd).

The number of predicted structures is controlled by seeds, which may be given either as an integer count or as an explicit sequence of integer seeds.

The selected version must support every feature used by the job. The dialect selects the AlphaFold 3 input format; currently only Dialect.LOCAL is supported.

name: str – Job name.

schema: str – JSON Schema URI for editor validation.

dialect: Dialect – Input format dialect.

version: Version – Input format version.

seeds: int | Sequence[int] – Random seeds or their total number.

entities: Sequence[Protein | RNA | DNA | Ligand] – Entities included in the job.

bonds: Sequence[Bond] | None – Covalent bonds between atom pairs.

ccd: str | Path | None – Custom chemical components dictionary.

Examples

Job with a protein and a covalently linked ligand.

>>> job = Job(name="example")
>>> ((carboxylase,), (biotin,)) = job.add(
...     Protein(sequence="VLSAMKMETVV"),
...     Ligand(definition=["BTN"]),
... )
>>> job.bonds = (
...     Bond(
...         source=Atom(entity=biotin, residue=1, name="C11"),
...         target=Atom(entity=carboxylase, residue=6, name="NZ"),
...     ),
... )

Job with multiple entity copies and multiple model seeds.

>>> Job(
...     name="multimer",
...     seeds=5,
...     entities=[
...         Protein(
...             sequence="ACDE",
...             description="homotrimer",
...             copies=3,
...         ),
...     ],
... )
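Since seeds accepts either a count or explicit values, both forms must end up as the upstream `modelSeeds` list. A minimal sketch of that normalization; the helper `normalize_seeds` is hypothetical, and it assumes a count `n` expands to the consecutive seeds `1..n`, which may differ from what this package actually generates.

```python
from __future__ import annotations

from collections.abc import Sequence


def normalize_seeds(seeds: int | Sequence[int]) -> list[int]:
    """Expand a seed count or validate explicit seeds (hypothetical helper)."""
    # bool is a subclass of int; reject it so seeds=True is not read as 1.
    if isinstance(seeds, bool):
        raise TypeError("seeds must be an int or a sequence of ints")
    if isinstance(seeds, int):
        if seeds < 1:
            raise ValueError("at least one seed is required")
        # Assumption: a count n becomes the seeds 1, 2, ..., n.
        return list(range(1, seeds + 1))
    values = list(seeds)
    if not values:
        raise ValueError("at least one seed is required")
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        raise TypeError("explicit seeds must be integers")
    return values
```

Under this assumption, `seeds=5` in the multimer example would serialize as `[1, 2, 3, 4, 5]`, while explicit seeds pass through unchanged.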

add(*entities)

Append entities to the job.

Parameters: *entities (Protein | RNA | DNA | Ligand) – One or more entities to add.
Returns: tuple[tuple[str, ...], ...] – Chain identifiers of the added entities, one tuple per entity.
Raises: TypeError – If no entities were provided.
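add() returns one tuple of chain identifiers per entity, with one identifier per copy. A minimal sketch of how such identifiers could be assigned, assuming sequential uppercase letters (A to Z, then AA, AB, ...) that skip identifiers already in use. The helpers `chain_id` and `allocate` are hypothetical; the package's actual naming scheme is not documented here.

```python
from __future__ import annotations

from collections.abc import Iterable
from itertools import count


def chain_id(index: int) -> str:
    """Map 0 -> 'A', 25 -> 'Z', 26 -> 'AA', 27 -> 'AB' (assumed scheme)."""
    letters = ""
    index += 1
    while index:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def allocate(copies: Iterable[int], taken: Iterable[str] = ()) -> tuple[tuple[str, ...], ...]:
    """Assign fresh identifiers, one tuple per entity (hypothetical helper)."""
    used = set(taken)
    # Infinite stream of identifiers not already used elsewhere in the job.
    fresh = (name for name in map(chain_id, count()) if name not in used)
    return tuple(tuple(next(fresh) for _ in range(n)) for n in copies)
```

With this scheme, the first Job example would yield `(("A",), ("B",))`, which matches the shape of its `((carboxylase,), (biotin,))` unpacking.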

classmethod load(path, *, encoding='utf-8')

Load a job from an AlphaFold 3 input file.

Parameters:

path (Path) – Path to the JSON input file.
encoding (str) – Text encoding used to read the file.

Returns:

Job – Parsed and validated job instance.
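A minimal sketch of the first header checks such a loader might run on the raw JSON, using only the standard library. The function `check_header` is hypothetical; the accepted values come from the Dialect and Version enums documented on this page rather than from the package's source.

```python
import json
from typing import Any

SUPPORTED_VERSIONS = (1, 2, 3, 4)  # values of the documented Version enum


def check_header(text: str) -> dict[str, Any]:
    """Parse input JSON and check dialect and version (hypothetical helper)."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    # Only Dialect.LOCAL ('alphafold3') is currently supported.
    if data.get("dialect") != "alphafold3":
        raise ValueError(f"unsupported dialect: {data.get('dialect')!r}")
    version = data.get("version")
    # bool is a subclass of int (True == 1), so reject it explicitly.
    if not isinstance(version, int) or isinstance(version, bool) or version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported version: {version!r}")
    return data
```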

export()

Export the job as an AlphaFold 3 input mapping.

Returns: dict[str, Any] – AlphaFold 3 input mapping.
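For orientation, the literal below is our reading of what export() plausibly returns for the first Job example, written with the upstream AlphaFold 3 field names (`modelSeeds`, `sequences`, `bondedAtomPairs`, `dialect`, `version`). The chain identifiers, the seed list, and the omission of unset optional fields are assumptions; compare against the real export() output rather than relying on this literal.

```python
import json

# Assumed export() result for the biotinylated-peptide example (illustrative only).
expected = {
    "name": "example",
    "modelSeeds": [1],  # assumption: the default seeds are not documented here
    "sequences": [
        {"protein": {"id": "A", "sequence": "VLSAMKMETVV"}},
        {"ligand": {"id": "B", "ccdCodes": ["BTN"]}},
    ],
    # Each atom is [chain id, 1-based residue index, atom name].
    "bondedAtomPairs": [[["B", 1, "C11"], ["A", 6, "NZ"]]],
    "dialect": "alphafold3",
    "version": 4,
}

print(json.dumps(expected, indent=2))
```

Residue 6 of the peptide is the lysine whose NZ atom forms the amide bond with the biotin carboxyl carbon C11.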

save(path, *, indent=2, ensure_ascii=False, encoding='utf-8', schema=False)

Save the job to an AlphaFold 3 input file.

Parameters:

path (Path) – Destination path for the JSON file.
indent (int | None) – JSON indentation level.
ensure_ascii (bool) – Whether to escape non-ASCII characters in the JSON output.
encoding (str) – Text encoding used to write the file.
schema (bool) – Whether to include the JSON Schema URI as the top-level $schema field.

Returns:

Path – The written path.

class Dialect(*values)

AlphaFold 3 input format dialect.

LOCAL = 'alphafold3' – AlphaFold 3 dialect.

SERVER = 'alphafoldserver' – AlphaFold Server dialect.

class Version(*values)

AlphaFold 3 input format version.

I = 1 – Input format version 1.

II = 2 – Input format version 2.

III = 3 – Input format version 3.

IV = 4 – Input format version 4.
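The Job description requires the selected version to support every feature in use. A sketch of that rule as a table of minimum versions per serialized field. Only the `description` requirement (Version.IV) is stated on this page; the other entries reflect our understanding of the upstream AlphaFold 3 version history (path fields in version 2, `userCCDPath` in version 3) and should be checked against the upstream documentation. `MINIMUM` and `minimum_version` are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable
from enum import IntEnum


class Version(IntEnum):
    """Stand-in for the documented Version enum."""

    I = 1
    II = 2
    III = 3
    IV = 4


# Assumed minimum input format version per field (see caveats above).
MINIMUM = {
    "unpairedMsaPath": Version.II,
    "pairedMsaPath": Version.II,
    "mmcifPath": Version.II,
    "userCCDPath": Version.III,
    "description": Version.IV,
}


def minimum_version(fields: Iterable[str]) -> Version:
    """Return the lowest version supporting every field (hypothetical helper)."""
    return max((MINIMUM.get(field, Version.I) for field in fields), default=Version.I)
```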

class Protein(*, id=None, description=None, sequence, modifications=<factory>, alignment=None, templates=None, copies=1)

Protein chain entity specification.

A protein chain is defined by its amino acid sequence, optional residue modifications, optional multiple sequence alignment, and optional structural templates.

Post-translational modifications can be provided through Modification entries. Additional entries may be appended using modify().

A multiple sequence alignment may be provided either inline as a string or as a filesystem path. When alignment is present, paired MSA output is serialized as an empty string.

Multiple copies of a protein chain can be defined either by setting copies or by providing multiple explicit identifiers in id. The optional description field is supported only when Job.version is set to Version.IV.

id: str | Sequence[str] | None – Protein chain identifier(s).

description: str | None – Free-text protein chain description.

sequence: str – Protein chain amino acid sequence.

modifications: Sequence[Modification] – Protein chain post-translational modifications.

alignment: str | Path | None – Multiple sequence alignment.

templates: Sequence[Template] | None – Structural templates.

copies: int – Number of protein chain copies.

Examples

Protein chain with a description.

>>> Protein(
...     description="AviTag for BirA-mediated biotinylation",
...     sequence="GLNDIFEAQKIEWHE",
... )

Multiple copies of a protein chain.

>>> Protein(
...     sequence="RMKQLEDKVEELLSKKYHLENEVARLKKLVGER",
...     copies=2,
... )

Protein chain with modified residues.

>>> peptide = Protein(sequence="PVLSCGEWQL")
>>> peptide.modify(
...     Modification(type="HY3", position=1),
...     Modification(type="P1L", position=5),
... )

Protein chain with an alignment file.

>>> Protein(
...     id=["A", "B"],
...     sequence="KRRWKKNFIAVSAANRFKKISSSGAL",
...     alignment=Path("alignment.a3m"),
... )

Protein chain with a structural template.

>>> Protein(
...     id=["C"],
...     sequence="RPACQLW",
...     templates=[
...         Template(
...             structure=Path("template.cif.gz"),
...             indexes={0: 0, 1: 1, 2: 2, 4: 3, 5: 4, 6: 8},
...         ),
...     ],
... )
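A minimal sketch of how a protein chain might map onto the upstream `protein` entry, including the rule stated above that a present alignment also emits an empty paired MSA. The helper `protein_entry` is hypothetical. The keys `unpairedMsa`, `unpairedMsaPath`, `pairedMsa`, `ptmType`, and `ptmPosition` follow the upstream AlphaFold 3 input format; mapping a Path to the `...Path` field and a single id to a plain string are our assumptions.

```python
from __future__ import annotations

from collections.abc import Sequence
from pathlib import Path
from typing import Any


def protein_entry(
    ids: Sequence[str],
    sequence: str,
    modifications: Sequence[tuple[str, int]] = (),
    alignment: str | Path | None = None,
) -> dict[str, Any]:
    """Build an upstream-style protein entry (hypothetical helper)."""
    if not ids:
        raise ValueError("at least one chain identifier is required")
    body: dict[str, Any] = {
        # Assumption: one copy serializes as a string, several as a list.
        "id": ids[0] if len(ids) == 1 else list(ids),
        "sequence": sequence,
    }
    if modifications:
        body["modifications"] = [
            {"ptmType": code, "ptmPosition": position} for code, position in modifications
        ]
    if isinstance(alignment, Path):
        body["unpairedMsaPath"] = str(alignment)
        body["pairedMsa"] = ""  # alignment present: paired MSA left empty
    elif alignment is not None:
        body["unpairedMsa"] = alignment
        body["pairedMsa"] = ""
    return {"protein": body}
```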

modify(*modifications)

Append residue modifications to the protein chain.

Parameters: *modifications (Modification) – One or more modifications to add.
Returns: Protein – Protein chain with appended modifications.
Raises: TypeError – If no modifications were provided.

class DNA(*, id=None, description=None, sequence, modifications=<factory>, copies=1)

DNA chain entity specification.

A DNA chain is defined by its nucleotide sequence, optional modifications, and one or more chain identifiers via id. Multiple copies can be defined either by setting copies or by providing multiple explicit identifiers in id.

The optional description field is supported only when Job.version is set to Version.IV.

id: str | Sequence[str] | None – DNA chain identifier(s).

description: str | None – Free-text DNA chain description.

sequence: str – DNA chain nucleotide sequence.

modifications: Sequence[Modification] – DNA chain residue modifications.

copies: int – Number of DNA chain copies.

Examples

DNA chain with a description.

>>> DNA(
...     description="Promoter for bacteriophage T7 RNA polymerase",
...     sequence="TAATACGACTCACTATAGG",
... )

Multiple copies of a DNA chain.

>>> DNA(
...     sequence="GAATTC",
...     copies=2,
... )

DNA chain with modified residues.

>>> heptamer = DNA(sequence="GACCTCT")
>>> heptamer.modify(
...     Modification(type="6OG", position=1),
...     Modification(type="6MA", position=2),
... )

modify(*modifications)

Append residue modifications to the DNA chain.

Parameters: *modifications (Modification) – One or more modifications to add.
Returns: DNA – DNA chain with appended modifications.
Raises: TypeError – If no modifications were provided.

class RNA(*, id=None, description=None, sequence, modifications=<factory>, alignment=None, copies=1)

RNA chain entity specification.

An RNA chain is defined by its nucleotide sequence, optional residue modifications, and optional multiple sequence alignment.

Modified residues can be provided through Modification entries. Additional entries may be appended using modify().

A multiple sequence alignment may be provided either inline as a string or as a filesystem path.

Multiple copies of an RNA chain can be defined either by setting copies or by providing multiple explicit identifiers in id. The optional description field is supported only when Job.version is set to Version.IV.

id: str | Sequence[str] | None – RNA chain identifier(s).

description: str | None – Free-text RNA chain description.

sequence: str – RNA chain nucleotide sequence.

modifications: Sequence[Modification] – RNA chain residue modifications.

alignment: str | Path | None – Multiple sequence alignment.

copies: int – Number of RNA chain copies.

Examples

RNA chain with a description.

>>> RNA(
...     description="Ribosome-binding site from T7 phage, gene 10",
...     sequence="UUAACUUUAAGAAGGAG",
... )

Multiple copies of an RNA chain.

>>> RNA(
...     sequence="AAGGACGGGUCC",
...     copies=2,
... )

RNA chain with modified residues.

>>> tetramer = RNA(sequence="AGCU")
>>> tetramer.modify(
...     Modification(type="2MG", position=1),
...     Modification(type="5MC", position=4),
... )

RNA chain with an alignment file.

>>> RNA(
...     id=["A", "B"],
...     sequence="ACAUGAGGAUCACCCAUGU",
...     alignment=Path("alignment.a3m"),
... )

modify(*modifications)

Append residue modifications to the RNA chain.

Parameters: *modifications (Modification) – One or more modifications to add.
Returns: RNA – RNA chain with appended modifications.
Raises: TypeError – If no modifications were provided.

class Ligand(*, id=None, description=None, definition, copies=1)

Ligand entity specification.

A ligand is defined either by CCD code(s) or by a SMILES string.

CCD codes (definition as a sequence of CCD codes) are preferred when available. Multiple codes can represent composite ligands such as glycans, with covalent connectivity specified separately via Job.bonds. Custom CCD entries may be provided through Job.ccd.

A SMILES string (definition as a string) can be used for ligands not present in the CCD, but such ligands cannot be referenced in Job.bonds.

Multiple copies of a ligand can be defined either by setting copies or by providing multiple explicit identifiers in id. The optional description field is supported only when Job.version is set to Version.IV.

id: str | Sequence[str] | None – Ligand identifier(s).

description: str | None – Free-text ligand description.

definition: str | Sequence[str] – Ligand definition as SMILES or CCD code(s).

copies: int – Number of ligand copies.

Examples

Ligand defined by a CCD code.

>>> Ligand(
...     description="Adenosine triphosphate",
...     definition=["ATP"],
... )

Multiple copies of an ion.

>>> Ligand(
...     definition=["MG"],
...     copies=2,
... )

Ligand defined by a SMILES string, with an explicit identifier.

>>> Ligand(
...     id=["LIG"],
...     description="Aceclidine",
...     definition="CC(=O)OC1C[NH+]2CCC1CC2",
... )
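The two definition forms serialize differently upstream: CCD codes become a `ccdCodes` list and a SMILES string becomes `smiles`. A minimal sketch with the hypothetical helpers `ligand_entry` and `bondable`; the second encodes the rule above that SMILES-defined ligands cannot be referenced in Job.bonds. Serializing a single id as a plain string is an assumption.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Any


def ligand_entry(ids: Sequence[str], definition: str | Sequence[str]) -> dict[str, Any]:
    """Build an upstream-style ligand entry (hypothetical helper)."""
    if not ids:
        raise ValueError("at least one ligand identifier is required")
    body: dict[str, Any] = {"id": ids[0] if len(ids) == 1 else list(ids)}
    if isinstance(definition, str):
        # A plain string is read as SMILES.
        body["smiles"] = definition
    else:
        codes = list(definition)
        if not codes:
            raise ValueError("at least one CCD code is required")
        body["ccdCodes"] = codes
    return {"ligand": body}


def bondable(entry: dict[str, Any]) -> bool:
    """Only CCD-defined ligands may appear in bonded atom pairs."""
    return "ccdCodes" in entry["ligand"]
```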

class Entity(*values)

AlphaFold 3 entity type identifier.

PROTEIN = 'protein' – Protein polymer entity.

RNA = 'rna' – RNA polymer entity.

DNA = 'dna' – DNA polymer entity.

LIGAND = 'ligand' – Ligand non-polymer entity.

class Modification(*, scope=None, type, position)

Residue modification specification.

A modification is defined by its CCD type and a 1-based position within the parent entity sequence.

The scope selects the polymer context used for serialization. It is typically assigned automatically by the parent entity model, such as DNA, RNA, or Protein.

scope: Entity | None – Modification polymer context.

type: str – Modification CCD code.

position: int – Modification position in the parent sequence.

Examples

Base modification.

>>> methylation = Modification(type="5MC", position=4)

Post-translational modification.

>>> ptm = Modification(type="HY3", position=1)
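The scope matters because the upstream format names modification fields differently for proteins (`ptmType`, `ptmPosition`) and for nucleic acids (`modificationType`, `basePosition`). A minimal sketch of that dispatch using a stand-in for the documented Entity enum; `serialize_modification` is hypothetical, and `code` stands in for the documented `type` field to avoid shadowing the builtin.

```python
from __future__ import annotations

from enum import Enum
from typing import Any


class Entity(str, Enum):
    """Stand-in for the documented Entity enum."""

    PROTEIN = "protein"
    RNA = "rna"
    DNA = "dna"
    LIGAND = "ligand"


def serialize_modification(scope: Entity | None, code: str, position: int) -> dict[str, Any]:
    """Serialize one modification for its polymer context (hypothetical helper)."""
    if position < 1:
        raise ValueError("modification positions are 1-based")
    if scope is Entity.PROTEIN:
        return {"ptmType": code, "ptmPosition": position}
    if scope in (Entity.RNA, Entity.DNA):
        return {"modificationType": code, "basePosition": position}
    # Ligands have no residue modifications, and an unset scope is ambiguous.
    raise ValueError(f"cannot serialize a modification with scope {scope!r}")
```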

class Template(*, structure, indexes)

Structural template specification.

A template is defined by an mmCIF structure, provided either inline as a string or as a filesystem path, together with a 0-based mapping between query and template residue indexes.

structure: str | Path – Template structure.

indexes: dict[int, int] – Query-to-template residue index mapping.

Examples

Template provided by path.

>>> Template(
...     structure=Path("template.cif.gz"),
...     indexes={0: 0, 1: 1, 2: 2, 4: 3, 5: 4, 6: 8},
... )
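Upstream, a template stores the index mapping as two parallel lists, `queryIndices` and `templateIndices`, next to either an inline `mmcif` string or an `mmcifPath`. A minimal sketch of converting the `indexes` dict; `template_entry` is hypothetical, and sorting by query index is our choice for stable output.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def template_entry(structure: str | Path, indexes: dict[int, int]) -> dict[str, Any]:
    """Build an upstream-style template entry (hypothetical helper)."""
    if any(q < 0 or t < 0 for q, t in indexes.items()):
        raise ValueError("template indexes are 0-based and non-negative")
    pairs = sorted(indexes.items())  # stable order by query index
    entry: dict[str, Any] = (
        {"mmcifPath": str(structure)} if isinstance(structure, Path) else {"mmcif": structure}
    )
    entry["queryIndices"] = [q for q, _ in pairs]
    entry["templateIndices"] = [t for _, t in pairs]
    return entry
```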

class Atom(*, entity, residue, name)

Atom specification within an entity.

An atom is defined by an entity identifier, a 1-based residue index, and an atom name.

entity: str – Entity identifier.

residue: int – Residue index within the entity.

name: str – Atom name within the residue.

Examples

Atom definition.

>>> Atom(entity="A", residue=1, name="CB")

class Bond(*, source, target)

Covalent bond specification.

Defines a covalent bond between the source and the target atoms as an AlphaFold 3 bonded atom pair.

Bonds are intended for ligands: covalently linked multi-residue Ligand entities such as glycans, or a ligand attached to a polymer residue as in the Job example. Covalent bonds within or between polymer entities such as DNA, RNA, or Protein are not supported by AlphaFold 3.

source: Atom – Source atom address.

target: Atom – Target atom address.

Examples

Covalent bond between two entities.

>>> Bond(
...     source=Atom(entity="A", residue=1, name="CA"),
...     target=Atom(entity="G", residue=1, name="CHA"),
... )

Covalent bond within a multi-residue entity.

>>> Bond(
...     source=Atom(entity="I", residue=1, name="O6"),
...     target=Atom(entity="I", residue=2, name="C1"),
... )
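Each Bond corresponds to one entry of the upstream `bondedAtomPairs` list, where an atom is written as `[entity id, residue index, atom name]`. A minimal stand-alone sketch using dataclass stand-ins for Atom and Bond; the `export` methods here are hypothetical and not the package's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Stand-in for the documented Atom model."""

    entity: str
    residue: int  # 1-based
    name: str

    def export(self) -> list[str | int]:
        if self.residue < 1:
            raise ValueError("residue indexes are 1-based")
        return [self.entity, self.residue, self.name]


@dataclass(frozen=True)
class Bond:
    """Stand-in for the documented Bond model."""

    source: Atom
    target: Atom

    def export(self) -> list[list[str | int]]:
        return [self.source.export(), self.target.export()]


glycosidic = Bond(
    source=Atom(entity="I", residue=1, name="O6"),
    target=Atom(entity="I", residue=2, name="C1"),
)
bonded_atom_pairs = [glycosidic.export()]
```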

Utilities

class Operation(*values)

Per-residue alignment operation.

REF = 'match' – Exact residue match.

SUB = 'substitution' – Substitution mutation.

INS = 'insertion' – Insertion mutation.

DEL = 'deletion' – Deletion mutation.

trace(reference, query)

Generate an operation trace for alignment of query to reference.

Interprets an A3M-style query sequence against a canonical FASTA reference and emits a per-position sequence of alignment operations.

Parameters:

reference (str) – Canonical reference sequence in FASTA format.
query (str) – Aligned query sequence in A3M format.

Returns:

tuple[Operation, ...] – Per-position sequence of alignment operations.

Raises:

ValueError – If either input sequence fails validation.
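A minimal sketch of the trace semantics under standard A3M conventions: uppercase letters are match columns aligned to a reference residue, lowercase letters are insertions relative to the reference, and `-` marks a deleted reference residue. Reading an uppercase letter that differs from the reference as a substitution is our interpretation. `trace_sketch` is a hypothetical reimplementation, and the package's validation may be stricter.

```python
from __future__ import annotations

from enum import Enum


class Operation(Enum):
    """Stand-in for the documented Operation enum."""

    REF = "match"
    SUB = "substitution"
    INS = "insertion"
    DEL = "deletion"


def trace_sketch(reference: str, query: str) -> tuple[Operation, ...]:
    """Emit one operation per query symbol (hypothetical reimplementation)."""
    if not reference.isalpha() or not reference.isupper():
        raise ValueError("reference must be an ungapped uppercase sequence")
    operations: list[Operation] = []
    column = 0  # next reference position to consume
    for symbol in query:
        if symbol.islower():
            operations.append(Operation.INS)  # query-only residue
            continue
        if column >= len(reference):
            raise ValueError("query has more match columns than the reference")
        if symbol == "-":
            operations.append(Operation.DEL)  # reference residue missing in query
        elif symbol.isupper():
            same = symbol == reference[column]
            operations.append(Operation.REF if same else Operation.SUB)
        else:
            raise ValueError(f"unexpected symbol {symbol!r} in query")
        column += 1
    if column != len(reference):
        raise ValueError("query has fewer match columns than the reference")
    return tuple(operations)
```

For example, the query `AgCY-` against reference `ACDE` reads as match, insertion, match, substitution, deletion.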

reindex(template, operations)

Reindex template residues to a new query sequence.

Consumes an operation trace describing how a query sequence aligns to the reference and returns a template whose indexes are remapped from reference to query coordinates.

Parameters:

template (Template) – Template with indexes defined in reference coordinates.
operations (Sequence[Operation]) – Per-position alignment operations.

Returns:

Template – Template with indexes updated to query coordinates.

Raises:

ValueError – If an unexpected alignment operation is encountered.
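A sketch of the reindexing step under the same assumed semantics. Walking the trace maps each reference position to a query position: matches and substitutions advance both, insertions advance only the query, and deletions advance only the reference. Template pairs whose reference residue was deleted are dropped. Keeping substituted residues aligned to the template is our assumption, and `reindex_sketch` works on a plain dict rather than a Template.

```python
from __future__ import annotations

from collections.abc import Sequence
from enum import Enum


class Operation(Enum):
    """Stand-in for the documented Operation enum."""

    REF = "match"
    SUB = "substitution"
    INS = "insertion"
    DEL = "deletion"


def reindex_sketch(indexes: dict[int, int], operations: Sequence[Operation]) -> dict[int, int]:
    """Re-key a reference-to-template mapping to query coordinates (hypothetical)."""
    ref_to_query: dict[int, int] = {}
    ref = query = 0
    for op in operations:
        if op in (Operation.REF, Operation.SUB):
            ref_to_query[ref] = query
            ref += 1
            query += 1
        elif op is Operation.INS:
            query += 1  # query residue without a reference counterpart
        elif op is Operation.DEL:
            ref += 1  # reference residue absent from the query
        else:
            raise ValueError(f"unexpected operation {op!r}")
    return {ref_to_query[r]: t for r, t in indexes.items() if r in ref_to_query}


ops = (Operation.REF, Operation.INS, Operation.REF, Operation.SUB, Operation.DEL)
```

With `ops` above, the mapping `{0: 0, 1: 1, 2: 2, 3: 3}` becomes `{0: 0, 2: 1, 3: 2}`: the inserted residue shifts later positions by one, and the deleted last residue drops out.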

realign(alignment, operations)

Realign an A3M alignment to a new query sequence.

Applies an operation trace describing how a query sequence aligns to the reference to each sequence in an A3M alignment.

Parameters:

alignment (str) – Sequence alignment in A3M format.
operations (Sequence[Operation]) – Per-position alignment operations.

Returns:

str – Realigned alignment in A3M format.

Raises:

ValueError – If headers or sequences are invalid, if a sequence is too short or too long for the operation trace, or if trailing insertions are encountered.
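A sketch of realigning a single A3M row under the same assumptions. Match and substitution columns are copied; a query-only residue (insertion) becomes a gap column; a reference-only column (deletion) is demoted to a lowercase insertion, or dropped if it was a gap. Existing lowercase insertions travel with the following column. `realign_row` is hypothetical and does not handle headers or whole files.

```python
from __future__ import annotations

from collections.abc import Sequence
from enum import Enum


class Operation(Enum):
    """Stand-in for the documented Operation enum."""

    REF = "match"
    SUB = "substitution"
    INS = "insertion"
    DEL = "deletion"


def realign_row(row: str, operations: Sequence[Operation]) -> str:
    """Realign one A3M row from reference to query columns (hypothetical)."""
    out: list[str] = []
    i = 0
    for op in operations:
        if op is Operation.INS:
            # The query residue has no reference column, so this row gets a gap.
            out.append("-")
            continue
        # Carry over lowercase insertions that precede the next match column.
        while i < len(row) and row[i].islower():
            out.append(row[i])
            i += 1
        if i == len(row):
            raise ValueError("row is too short for the operation trace")
        symbol = row[i]
        i += 1
        if op in (Operation.REF, Operation.SUB):
            out.append(symbol)
        elif op is Operation.DEL:
            # The reference column is gone: keep residues as insertions, drop gaps.
            if symbol != "-":
                out.append(symbol.lower())
        else:
            raise ValueError(f"unexpected operation {op!r}")
    rest = row[i:]
    if any(c.isupper() or c == "-" for c in rest):
        raise ValueError("row is too long for the operation trace")
    if rest:
        raise ValueError("trailing insertions are not supported")
    return "".join(out)


ops = (Operation.REF, Operation.INS, Operation.REF, Operation.SUB, Operation.DEL)
```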

ccd(*components)

Export chemical component dictionaries for one or more components.

For each input component, yields a CCD definition containing component metadata, atoms, and bonds. Input molecules must provide molecule properties comp_id and comp_name and an atom_name property for every atom.

Parameters:

*components (Mol) – Chemical components for CCD export.

Yields:

str – CCD definition of each specified component.

Raises:

KeyError – If required molecule or atom properties are missing.
ValueError – If a component has no conformer or coordinates.
TypeError – If an unsupported bond type or stereochemical configuration is encountered.
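For orientation, a CCD definition is an mmCIF data block named after the component, with `_chem_comp` metadata followed by `_chem_comp_atom` and `_chem_comp_bond` loops. A stdlib-only sketch that writes such a block from plain tuples; `ccd_block` is hypothetical, and its field selection is an assumption that is likely smaller than what ccd() emits or what AlphaFold 3 expects from a user CCD.

```python
from __future__ import annotations


def ccd_block(
    code: str,
    name: str,
    atoms: list[tuple[str, str, float, float, float]],
    bonds: list[tuple[str, str, str]],
) -> str:
    """Write a minimal CCD-style mmCIF block (hypothetical helper)."""
    lines = [
        f"data_{code}",
        "#",
        f"_chem_comp.id {code}",
        f"_chem_comp.name '{name}'",  # assumes the name contains no single quotes
        "_chem_comp.type non-polymer",
        "#",
        "loop_",
        "_chem_comp_atom.comp_id",
        "_chem_comp_atom.atom_id",
        "_chem_comp_atom.type_symbol",
        "_chem_comp_atom.pdbx_model_Cartn_x_ideal",
        "_chem_comp_atom.pdbx_model_Cartn_y_ideal",
        "_chem_comp_atom.pdbx_model_Cartn_z_ideal",
    ]
    for atom_id, symbol, x, y, z in atoms:
        lines.append(f"{code} {atom_id} {symbol} {x:.3f} {y:.3f} {z:.3f}")
    lines += [
        "#",
        "loop_",
        "_chem_comp_bond.comp_id",
        "_chem_comp_bond.atom_id_1",
        "_chem_comp_bond.atom_id_2",
        "_chem_comp_bond.value_order",
    ]
    for first, second, order in bonds:
        lines.append(f"{code} {first} {second} {order}")
    lines.append("#")
    return "\n".join(lines) + "\n"


# Illustrative water-like component with approximate coordinates.
text = ccd_block(
    "WTR",
    "water",
    [("O", "O", 0.0, 0.0, 0.0), ("H1", "H", 0.957, 0.0, 0.0), ("H2", "H", -0.240, 0.927, 0.0)],
    [("O", "H1", "SING"), ("O", "H2", "SING")],
)
```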

component(smiles, code, name)

Construct an embedded chemical component from SMILES for CCD export.

Parses smiles and sanitizes the molecule, adds explicit hydrogens, embeds a 3D conformer, optimizes its geometry, assigns deterministic atom names, and stores component metadata required for CCD export.

Parameters:

smiles (str) – SMILES string describing a chemical component.
code (str) – Chemical component identifier.
name (str) – Human-readable component name.

Returns:

Mol – Embedded molecule annotated for CCD export.

Raises:

ValueError – If code is invalid, or if smiles is syntactically or chemically invalid.
RuntimeError – If conformer embedding fails.
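The atom-naming scheme behind "deterministic atom names" is not documented here. One common choice, sketched below as an assumption, is element symbol plus a per-element counter in atom order (C1, C2, O1, ...), checked against the four-character atom-name limit inherited from the PDB format. `atom_names` is hypothetical and takes element symbols instead of an RDKit molecule.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def atom_names(symbols: Iterable[str]) -> list[str]:
    """Name atoms by element plus a per-element counter (hypothetical scheme)."""
    counts: Counter[str] = Counter()
    names: list[str] = []
    for symbol in symbols:
        element = symbol.upper()
        counts[element] += 1
        name = f"{element}{counts[element]}"
        if len(name) > 4:
            raise ValueError(f"atom name {name!r} exceeds four characters")
        names.append(name)
    return names
```

Because names depend only on atom order, the same sanitized molecule with the same hydrogen placement always gets the same names.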