AlphaFold 3 input schema models
This documentation covers the public API of alphafold3_input.
Package
AlphaFold 3 input models.
This package provides models for constructing AlphaFold 3 input files.
It offers a Pythonic, object-oriented interface for defining AlphaFold 3 jobs, abstracting the underlying JSON input format into typed models and validated structures. The implementation closely follows the official AlphaFold 3 input specification provided by DeepMind.
For full details on the expected input format and supported features, refer to the official AlphaFold 3 input specification.
Exports
- Job, Dialect, Version: top-level job model and input format enums.
- DNA, RNA, Protein, Ligand: entity models used under Job.entities.
- Modification, Entity: residue modification model and its entity-scope enum.
- Template: structural template specification for proteins.
- Operation, trace(), reindex(), realign(): operation trace generation, reindexing, and realignment utilities.
- ccd(), component(): generation of custom chemical component dictionaries.
Metadata
Package metadata for AlphaFold 3 input models.
Provides a normalized interface to distribution metadata for the package.
Exports
- __title__: Package title.
- __description__: Package summary.
- __author__: Package author.
- __version__: Installed version.
- __package__: Distribution name.
- __module_name__: Import path.
- __repository__: Repository URL.
- __documentation__: Documentation URL.
- __issues__: Issue tracker URL.
Reference
Models
- class Job(*, name, dialect=Dialect.LOCAL, version=Version.IV, seeds=<factory>, entities=<factory>, bonds=None, ccd=None)
AlphaFold 3 job specification.
A job contains one or more sequence entities (Protein, RNA, DNA, or Ligand) and may include explicit covalent bonds and a custom ccd.
The number of predicted structures is controlled by seeds, which may be given either as an integer count or as an explicit sequence of integer seeds.
The selected version must support the features used by the job. The dialect selects the AlphaFold 3 input format; only Dialect.LOCAL is currently supported.
Examples
Job with a protein and a covalently linked ligand.
>>> job = Job(name="example")
>>> ((carboxylase,), (biotin,)) = job.add(
...     Protein(sequence="VLSAMKMETVV"),
...     Ligand(definition=["BTN"]),
... )
>>> job.bonds = (
...     Bond(
...         source=Atom(entity=biotin, residue=1, name="C11"),
...         target=Atom(entity=carboxylase, residue=6, name="NZ"),
...     ),
... )
Job with multiple entity copies and multiple model seeds.
>>> Job(
...     name="multimer",
...     seeds=5,
...     entities=[
...         Protein(
...             sequence="ACDE",
...             description="homotrimer",
...             copies=3,
...         ),
...     ],
... )
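For orientation, the multimer job above corresponds roughly to the following JSON in the official AlphaFold 3 input format. This is a hand-written sketch using only the standard library, not output produced by this package; the seed values and chain identifiers are assumptions chosen for illustration.

```python
import json

# Hand-written approximation of an AlphaFold 3 input file for a homotrimer.
# Seed values and chain IDs are illustrative assumptions, not package output.
job = {
    "name": "multimer",
    "modelSeeds": [1, 2, 3, 4, 5],  # one entry per model seed
    "sequences": [
        {
            "protein": {
                "id": ["A", "B", "C"],  # three copies share one sequence
                "sequence": "ACDE",
                "description": "homotrimer",  # needs input format version 4
            }
        }
    ],
    "dialect": "alphafold3",
    "version": 4,
}

text = json.dumps(job, indent=2)
print(text)
```

Listing several identifiers under one entity is how the JSON format expresses copies, which is why copies and explicit id lists are interchangeable in the models.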
- class Dialect(*values)
AlphaFold 3 input format dialect.
- LOCAL = 'alphafold3'
AlphaFold 3 dialect.
- SERVER = 'alphafoldserver'
AlphaFold Server dialect.
- class Version(*values)
AlphaFold 3 input format version.
- I = 1
Input format version 1.
- II = 2
Input format version 2.
- III = 3
Input format version 3.
- IV = 4
Input format version 4.
- class Protein(*, id=None, description=None, sequence, modifications=<factory>, alignment=None, templates=None, copies=1)
Protein chain entity specification.
A protein chain is defined by its amino acid sequence, optional residue modifications, optional multiple sequence alignment, and optional structural templates.
Post-translational modifications can be provided through Modification entries. Additional entries may be appended using modify().
A multiple sequence alignment may be provided either inline as a string or as a filesystem path. When alignment is present, paired MSA output is serialized as an empty string.
Multiple copies of a protein chain can be defined either by setting copies or by providing multiple explicit identifiers in id. The optional description field is supported only when Job.version is set to Version.IV.
- modifications: Sequence[Modification]
Protein chain post-translational modifications.
Examples
Protein chain with a description.
>>> Protein(
...     description="AviTag for BirA-mediated biotinylation",
...     sequence="GLNDIFEAQKIEWHE",
... )
Multiple copies of a protein chain.
>>> Protein(
...     sequence="RMKQLEDKVEELLSKKYHLENEVARLKKLVGER",
...     copies=2,
... )
Protein chain with modified residues.
>>> peptide = Protein(sequence="PVLSCGEWQL")
>>> peptide.modify(
...     Modification(type="HY3", position=1),
...     Modification(type="P1L", position=5),
... )
Protein chain with an alignment file.
>>> Protein(
...     id=["A", "B"],
...     sequence="KRRWKKNFIAVSAANRFKKISSSGAL",
...     alignment=Path("alignment.a3m"),
... )
Protein chain with a structural template.
>>> Protein(
...     id=["C"],
...     sequence="RPACQLW",
...     templates=[
...         Template(
...             structure=Path("template.cif.gz"),
...             indexes={0: 0, 1: 1, 2: 2, 4: 3, 5: 4, 6: 8},
...         ),
...     ],
... )
- modify(*modifications)
Append residue modifications to the protein chain.
- Parameters:
*modifications (Modification) – One or more modifications to add.
- Returns:
Protein – Protein chain with appended modifications.
- Raises:
TypeError – If no modifications were provided.
- class DNA(*, id=None, description=None, sequence, modifications=<factory>, copies=1)
DNA chain entity specification.
A DNA chain is defined by its nucleotide sequence, optional modifications, and one or more chain identifiers via id. Multiple copies can be defined either by setting copies or by providing multiple explicit identifiers in id.
The optional description field is supported only when Job.version is set to Version.IV.
- modifications: Sequence[Modification]
DNA chain residue modifications.
Examples
DNA chain with a description.
>>> DNA(
...     description="Promoter for bacteriophage T7 RNA polymerase",
...     sequence="TAATACGACTCACTATAGG",
... )
Multiple copies of a DNA chain.
>>> DNA(
...     sequence="GAATTC",
...     copies=2,
... )
DNA chain with modified residues.
>>> heptamer = DNA(sequence="GACCTCT")
>>> heptamer.modify(
...     Modification(type="6OG", position=1),
...     Modification(type="6MA", position=2),
... )
- modify(*modifications)
Append residue modifications to the DNA chain.
- Parameters:
*modifications (Modification) – One or more modifications to add.
- Returns:
DNA – DNA chain with appended modifications.
- Raises:
TypeError – If no modifications were provided.
- class RNA(*, id=None, description=None, sequence, modifications=<factory>, alignment=None, copies=1)
RNA chain entity specification.
An RNA chain is defined by its nucleotide sequence, optional residue modifications, and optional multiple sequence alignment.
Modified residues can be provided through Modification entries. Additional entries may be appended using modify().
A multiple sequence alignment may be provided either inline as a string or as a filesystem path.
Multiple copies of an RNA chain can be defined either by setting copies or by providing multiple explicit identifiers in id. The optional description field is supported only when Job.version is set to Version.IV.
- modifications: Sequence[Modification]
RNA chain residue modifications.
Examples
RNA chain with a description.
>>> RNA(
...     description="Ribosome-binding site from T7 phage, gene 10",
...     sequence="UUAACUUUAAGAAGGAG",
... )
Multiple copies of an RNA chain.
>>> RNA(
...     sequence="AAGGACGGGUCC",
...     copies=2,
... )
RNA chain with modified residues.
>>> tetramer = RNA(sequence="AGCU")
>>> tetramer.modify(
...     Modification(type="2MG", position=1),
...     Modification(type="5MC", position=4),
... )
RNA chain with an alignment file.
>>> RNA(
...     id=["A", "B"],
...     sequence="ACAUGAGGAUCACCCAUGU",
...     alignment=Path("alignment.a3m"),
... )
- modify(*modifications)
Append residue modifications to the RNA chain.
- Parameters:
*modifications (Modification) – One or more modifications to add.
- Returns:
RNA – RNA chain with appended modifications.
- Raises:
TypeError – If no modifications were provided.
- class Ligand(*, id=None, description=None, definition, copies=1)
Ligand entity specification.
A ligand is defined either by CCD code(s) or by a SMILES string.
CCD codes (definition as a sequence of CCD codes) are preferred when available. Multiple codes can represent composite ligands such as glycans, with covalent connectivity specified separately via Job.bonds. Custom CCD entries may be provided through Job.ccd.
A SMILES string (definition as a string) can be used for ligands not present in the CCD, but such ligands cannot be referenced in Job.bonds.
Multiple copies of a ligand can be defined either by setting copies or by providing multiple explicit identifiers in id. The optional description field is supported only when Job.version is set to Version.IV.
Examples
Ligand defined by a CCD code.
>>> Ligand(
...     description="Adenosine triphosphate",
...     definition=["ATP"],
... )
Multiple copies of an ion.
>>> Ligand(
...     definition=["MG"],
...     copies=2,
... )
Custom ligand with explicit identifier defined by SMILES.
>>> Ligand(
...     id=["LIG"],
...     description="Aceclidine",
...     definition="CC(=O)OC1C[NH+]2CCC1CC2",
... )
- class Entity(*values)
AlphaFold 3 entity type identifier.
- PROTEIN = 'protein'
Protein polymer entity.
- RNA = 'rna'
RNA polymer entity.
- DNA = 'dna'
DNA polymer entity.
- LIGAND = 'ligand'
Ligand non-polymer entity.
- class Modification(*, scope=None, type, position)
Residue modification specification.
A modification is defined by its CCD type and a 1-based position within the parent entity sequence.
The scope selects the polymer context used for serialization. It is typically assigned automatically by the parent entity model, such as DNA, RNA, or Protein.
Examples
Base modification.
>>> methylation = Modification(type="5MC", position=4)
Post-translational modification.
>>> ptm = Modification(type="HY3", position=1)
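The scope matters because the official AlphaFold 3 JSON format uses different keys for protein and nucleic acid modifications. The sketch below illustrates that difference with a hypothetical helper, serialize_modification, which is not part of this package; how the package itself performs serialization is not shown in this documentation.

```python
def serialize_modification(scope: str, code: str, position: int) -> dict:
    """Hypothetical helper: emit AlphaFold 3 JSON keys for one modification."""
    if position < 1:
        raise ValueError("position is 1-based")
    if scope == "protein":
        # Proteins use post-translational modification keys.
        return {"ptmType": code, "ptmPosition": position}
    if scope in ("rna", "dna"):
        # Nucleic acids use base modification keys.
        return {"modificationType": code, "basePosition": position}
    raise ValueError(f"modifications are not defined for scope {scope!r}")


ptm_json = serialize_modification("protein", "HY3", 1)
methylation_json = serialize_modification("rna", "5MC", 4)
print(ptm_json)
print(methylation_json)
```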
- class Template(*, structure, indexes)
Structural template specification.
A template is defined by an mmCIF structure, provided either inline as a string or as a filesystem path, together with a 0-based mapping between query and template residue indexes.
Examples
Template provided by path.
>>> Template(
...     structure=Path("template.cif.gz"),
...     indexes={0: 0, 1: 1, 2: 2, 4: 3, 5: 4, 6: 8},
... )
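In the official AlphaFold 3 JSON format, the residue mapping is written as two parallel lists, queryIndices and templateIndices. The sketch below shows how a mapping like the one above could be split into those lists. It assumes the dictionary keys are query indexes and the values are template indexes, which matches the 7-residue query used elsewhere on this page but is not stated explicitly; split_indexes is a hypothetical helper, not a package function.

```python
def split_indexes(indexes: dict[int, int]) -> tuple[list[int], list[int]]:
    """Hypothetical helper: split a query->template map into parallel lists."""
    pairs = sorted(indexes.items())  # order by query index
    query = [q for q, _ in pairs]
    template = [t for _, t in pairs]
    return query, template


indexes = {0: 0, 1: 1, 2: 2, 4: 3, 5: 4, 6: 8}
query_indices, template_indices = split_indexes(indexes)

# Query residue 3 has no template counterpart, so it is simply absent.
template_json = {
    "mmcifPath": "template.cif.gz",
    "queryIndices": query_indices,
    "templateIndices": template_indices,
}
print(template_json)
```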
- class Atom(*, entity, residue, name)
Atom specification within an entity.
An atom is defined by an entity identifier, a 1-based residue index, and an atom name.
Examples
Atom definition.
>>> Atom(entity="A", residue=1, name="CB")
- class Bond(*, source, target)
Covalent bond specification.
Defines a covalent bond between the source and the target atoms as an AlphaFold 3 bonded atom pair.
Bonds are intended for covalently linked multi-residue Ligand entities, for example glycans, and for attaching a ligand to a polymer residue. Covalent bonds within or between polymer entities such as DNA, RNA, or Protein are not supported by AlphaFold 3.
Examples
Covalent bond between two entities.
>>> Bond(
...     source=Atom(entity="A", residue=1, name="CA"),
...     target=Atom(entity="G", residue=1, name="CHA"),
... )
Covalent bond within a multi-residue entity.
>>> Bond(
...     source=Atom(entity="I", residue=1, name="O6"),
...     target=Atom(entity="I", residue=2, name="C1"),
... )
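As a point of reference, the official AlphaFold 3 JSON format stores each bond under bondedAtomPairs as a pair of [entity ID, residue, atom name] triples. The sketch below writes the two bonds above in that shape using plain tuples; bonded_atom_pair is a hypothetical helper for illustration and does not reflect how this package serializes Bond objects.

```python
import json


def bonded_atom_pair(source: tuple, target: tuple) -> list:
    """Hypothetical helper: each atom is (entity_id, residue, atom_name)."""
    return [list(source), list(target)]


pairs = [
    # Bond between two entities.
    bonded_atom_pair(("A", 1, "CA"), ("G", 1, "CHA")),
    # Bond between residues 1 and 2 of the same multi-residue ligand.
    bonded_atom_pair(("I", 1, "O6"), ("I", 2, "C1")),
]

bonds_json = json.dumps({"bondedAtomPairs": pairs})
print(bonds_json)
```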
Utilities
- class Operation(*values)
Per-residue alignment operation.
- REF = 'match'
Exact residue match.
- SUB = 'substitution'
Substitution mutation.
- INS = 'insertion'
Insertion mutation.
- DEL = 'deletion'
Deletion mutation.
- trace(reference, query)
Generate an operation trace for alignment of query to reference.
Interprets an A3M-style query sequence against a canonical FASTA reference and emits a per-position sequence of alignment operations.
- Parameters:
- Returns:
tuple[Operation, ...] – Per-position sequence of alignment operations.
- Raises:
ValueError – If either input sequence fails validation.
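To make the idea of an operation trace concrete, here is a minimal standalone sketch. It assumes the usual A3M conventions (uppercase letters are aligned columns, lowercase letters are insertions, and "-" marks a deleted reference residue) and uses plain strings in place of Operation members. sketch_trace is a hypothetical function; the validation and edge-case handling of the real trace() may differ.

```python
def sketch_trace(reference: str, query: str) -> list[str]:
    """Illustrative only: derive per-position operations from an A3M-style query."""
    operations = []
    ref_pos = 0  # next reference residue to be consumed
    for symbol in query:
        if symbol.islower():
            # Lowercase residues are insertions and consume no reference residue.
            operations.append("insertion")
            continue
        if ref_pos >= len(reference):
            raise ValueError("query has more aligned columns than reference")
        if symbol == "-":
            operations.append("deletion")
        elif symbol == reference[ref_pos]:
            operations.append("match")
        else:
            operations.append("substitution")
        ref_pos += 1
    if ref_pos != len(reference):
        raise ValueError("query has fewer aligned columns than reference")
    return operations


ops = sketch_trace("ACDEF", "AC-EgY")
print(ops)
```

In this example the query drops reference residue D, inserts a residue after E, and replaces F with Y.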
- reindex(template, operations)
Reindex template residues to a new query sequence.
Consumes an operation trace describing how a query sequence aligns to the reference and updates Template.indexes to query coordinates.
- Parameters:
- Returns:
Template – Template with indexes updated to query coordinates.
- Raises:
ValueError – If an unexpected alignment operation is encountered.
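The coordinate shift behind reindexing can be sketched without the package. The example below assumes that the template mapping is keyed by 0-based reference positions, that matched and substituted residues keep their template counterpart, and that deleted reference residues lose theirs. It uses plain strings for operations, and sketch_reindex is a hypothetical function rather than the actual reindex() implementation.

```python
def sketch_reindex(indexes: dict[int, int], operations: list[str]) -> dict[int, int]:
    """Illustrative only: move a reference->template map to query coordinates."""
    ref_to_query = {}
    ref_pos = 0
    query_pos = 0
    for op in operations:
        if op in ("match", "substitution"):
            # Residue exists in both sequences: record where it moved to.
            ref_to_query[ref_pos] = query_pos
            ref_pos += 1
            query_pos += 1
        elif op == "insertion":
            query_pos += 1  # extra query residue, no reference counterpart
        elif op == "deletion":
            ref_pos += 1  # reference residue missing from the query
        else:
            raise ValueError(f"unexpected alignment operation: {op!r}")
    return {
        ref_to_query[ref]: tmpl
        for ref, tmpl in indexes.items()
        if ref in ref_to_query
    }


old_indexes = {0: 0, 1: 1, 2: 2, 3: 5}
trace_ops = ["match", "insertion", "match", "deletion", "match"]
new_indexes = sketch_reindex(old_indexes, trace_ops)
print(new_indexes)
```

Here the insertion pushes reference residue 1 to query position 2, and the entry for the deleted reference residue 2 is dropped.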
- realign(alignment, operations)
Realign an A3M alignment to a new query sequence.
Applies an operation trace describing how a query sequence aligns to the reference to each sequence in an A3M alignment.
- Parameters:
- Returns:
str – Realigned alignment in A3M format.
- Raises:
ValueError – If headers or sequences are invalid, if a sequence is too short or too long for the operation trace, or if trailing insertions are encountered.
- ccd(*components)
Export chemical component dictionaries for one or more components.
For each input component, yields a CCD definition containing component metadata, atoms, and bonds. Input molecules must provide the molecule properties comp_id and comp_name, and an atom_name property for every atom.
- Parameters:
*components (Mol) – Chemical components for CCD export.
- Yields:
str – CCD definition of each specified component.
- Raises:
KeyError – If required molecule or atom properties are missing.
ValueError – If a component has no conformer or coordinates.
TypeError – If an unsupported bond type or stereochemical configuration is encountered.
- component(smiles, code, name)
Construct an embedded chemical component from SMILES for CCD export.
Parses smiles, sanitizes the molecule, adds explicit hydrogens, embeds a 3D conformer, optimizes its geometry, assigns deterministic atom names, and stores the component metadata required for CCD export.
- Parameters:
- Returns:
Mol – Embedded molecule annotated for CCD export.
- Raises:
ValueError – If code is invalid, or if smiles is syntactically or chemically invalid.
RuntimeError – If conformer embedding fails.