This document shows how data from legacy formats is represented in the data model.
Examples make use of a shorthand syntax.
For an accurate and complete listing of all classes and properties you can consult
the core.owl file that is distributed with the data.
The original flat text format is described in the UniProt user manual.
ID CASA1_SHEEP Reviewed; 214 AA.
AC P04653; Q69B23; Q9TS03; Q9TS48;
uniprot:P04653
reviewed true
mnemonic "CASA1_SHEEP"
replaces uniprot:Q69B23
replaces uniprot:Q9TS03
replaces uniprot:Q9TS48
Changes:
Although the mnemonic CASA1_SHEEP can be used to refer to an entry, it is not a stable identifier.
The Reviewed and Unreviewed markers are replaced by a boolean value that indicates whether an entry was reviewed or not.
DT 01-NOV-1997, integrated into UniProtKB/Swiss-Prot.
DT 01-NOV-1996, sequence version 1.
DT 16-DEC-2008, entry version 93.
uniprot:Q15848
created 1997-11-01
modified 2008-12-16
version 93
sequence <http://purl.uniprot.org/isoforms/Q15848-1>
rdf:type Simple_Sequence
modified 1996-11-01
version 1
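The DT mapping above can be sketched in Python. This is a minimal illustration, not the converter that produced the data: the function names `dt_to_iso` and `parse_dt_lines` and the dictionary layout are hypothetical, and only the three DT line forms shown above are handled.

```python
import re

# Month abbreviations as they appear in flat-file dates.
MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}


def dt_to_iso(date):
    """Convert a flat-file date such as '01-NOV-1997' to '1997-11-01'."""
    day, month, year = date.split("-")
    return f"{int(year):04d}-{MONTHS[month.upper()]:02d}-{int(day):02d}"


def parse_dt_lines(lines):
    """Split DT lines into entry-level and sequence-level properties."""
    entry, sequence = {}, {}
    for line in lines:
        match = re.match(r"DT\s+(\S+), (.+?)\.$", line.strip())
        if match is None:
            continue  # not a DT line in one of the known forms
        date, what = dt_to_iso(match.group(1)), match.group(2)
        if what.startswith("integrated into"):
            entry["created"] = date
        elif what.startswith("sequence version"):
            sequence["modified"] = date
            sequence["version"] = int(what.split()[-1])
        elif what.startswith("entry version"):
            entry["modified"] = date
            entry["version"] = int(what.split()[-1])
    return entry, sequence
```

The entry dictionary corresponds to the properties attached to the protein, and the sequence dictionary to those attached to the sequence resource.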
Changes:
DE RecName: Full=Arginine biosynthesis bifunctional protein argJ;
DE Includes:
DE RecName: Full=Glutamate N-acetyltransferase;
DE EC=2.3.1.35;
DE AltName: Full=Ornithine acetyltransferase;
DE Short=OATase;
DE AltName: Full=Ornithine transacetylase;
DE Includes:
DE RecName: Full=Amino-acid acetyltransferase;
DE EC=2.3.1.1;
DE AltName: Full=N-acetylglutamate synthase;
DE Short=AGS;
DE Contains:
DE RecName: Full=Arginine biosynthesis bifunctional protein argJ alpha chain;
DE Contains:
DE RecName: Full=Arginine biosynthesis bifunctional protein argJ beta chain;
uniprot:Q07908
recommendedName
rdf:type Structured_Name
fullName "Arginine biosynthesis bifunctional protein argJ"
component
rdf:type Part
recommendedName
rdf:type Structured_Name
fullName "Arginine biosynthesis bifunctional protein argJ alpha chain"
component
rdf:type Part
recommendedName
rdf:type Structured_Name
fullName "Arginine biosynthesis bifunctional protein argJ beta chain"
domain
rdf:type Part
recommendedName
rdf:type Structured_Name
fullName "Glutamate N-acetyltransferase"
enzyme enzyme:2.3.1.35
alternativeName
rdf:type Structured_Name
fullName "Ornithine acetyltransferase"
shortName "OATase"
alternativeName
rdf:type Structured_Name
fullName "Ornithine transacetylase"
domain
rdf:type Part
recommendedName
rdf:type Structured_Name
fullName "Amino-acid acetyltransferase"
enzyme enzyme:2.3.1.1
alternativeName
rdf:type Structured_Name
fullName "N-acetylglutamate synthase"
shortName "AGS"
Changes:
description construct.
Contains maps to component.
Includes maps to domain.
GN Name=cysA1; Synonyms=cysA; OrderedLocusNames=Rv3117, MT3199;
GN ORFNames=MTCY164.27;
GN and
GN Name=cysA2; OrderedLocusNames=Rv0815c, MT0837; ORFNames=MTV043.07c;
uniprot:O05793
encodedBy
rdf:type Gene
name "cysA1"
name "cysA"
locusName "Rv3117"
locusName "MT3199"
orfName "MTCY164.27"
encodedBy
rdf:type Gene
name "cysA2"
locusName "Rv0815c"
locusName "MT0837"
orfName "MTV043.07c"
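The grouping of GN lines into one gene record per `encodedBy` value can be sketched as follows. This is an illustrative sketch only: the function name `parse_gn_lines` and the dictionary layout are hypothetical, and the key table follows the example above, where both Name and Synonyms appear as `name`.

```python
def parse_gn_lines(lines):
    """Group GN lines into one record per gene; 'GN and' starts a new gene."""
    # Flat-file key -> shorthand property, as used in the example above.
    keys = {"Name": "name", "Synonyms": "name",
            "OrderedLocusNames": "locusName", "ORFNames": "orfName"}
    genes, current = [], {}
    for line in lines:
        body = line.strip()[2:].strip()  # drop the 'GN' line code
        if body == "and":
            genes.append(current)
            current = {}
            continue
        for field in filter(None, (f.strip() for f in body.split(";"))):
            key, _, values = field.partition("=")
            prop = keys[key]
            current.setdefault(prop, []).extend(v.strip() for v in values.split(","))
    if current:
        genes.append(current)
    return genes
```

Each returned dictionary corresponds to one resource of type Gene.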
OS Human immunodeficiency virus type 1 (isolate BRU/LAI group M subtype
OS B) (HIV-1).
OC Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae;
OC Lentivirus; Primate lentivirus group.
OX NCBI_TaxID=11686;
OH NCBI_TaxID=9606; Homo sapiens (Human).
uniprot:P03377
organism taxon:11686
Changes:
OG Mitochondrion.
OG Plasmid pAL2-1.
uniprot:Q01529
encodedIn Mitochondrion
encodedIn
rdfs:subClassOf Plasmid
rdfs:label "pAL2-1"
Changes:
OG (organelle) is renamed to encodedIn to avoid misunderstandings (this is not the subcellular location of the protein, but the location of the gene).
Issues:
encodedIn should be a property of a Gene, but automatic mapping is not always possible.
RN [38]
RP VARIANTS PD GLU-82; CYS-256; TRP-275; GLU-328 AND 441-ARG.
RC TISSUE=Brain;
RX MEDLINE=22108846; PubMed=12116199; DOI=10.1002/ajmg.10525;
RG French Parkinson's disease genetics study group;
RG European consortium on genetic susceptibility on Parkinson's disease;
RA West A., Periquet M., Lincoln S., Luecking C.B., Nicholl D.,
RA Bonifati V., Rawal N., Gasser T., Lohmann E., Deleuze J.-F.,
RA Maraganore D., Levey A., Wood N.W., Duerr A., Hardy J., Brice A.,
RA Farrer M.;
RT "Complex relationship between parkin mutations and Parkinson
RT disease.";
RL Am. J. Med. Genet. 114:584-591(2002).
RN [39]
RP ERRATUM.
RG French Parkinson's disease genetics study group;
RG European consortium on genetic susceptibility on Parkinson's disease;
RA West A., Periquet M., Lincoln S., Luecking C.B., Nicholl D.,
RA Bonifati V., Rawal N., Gasser T., Lohmann E., Deleuze J.-F.,
RA Maraganore D., Levey A., Wood N.W., Duerr A., Hardy J., Brice A.,
RA Farrer M.J.;
RL Am. J. Med. Genet. 114:992-992(2002).
uniprot:O60260
citation <http://purl.uniprot.org/citations/12116199>
rdf:type Journal_Citation
title "Complex relationship between parkin mutations and Parkinson disease."
author "West A."
author "Periquet M."
author "Lincoln S."
author "Luecking C.B."
author "Nicholl D."
author "Bonifati V."
author "Rawal N."
author "Gasser T."
author "Lohmann E."
author "Deleuze J.-F."
author "Maraganore D."
author "Levey A."
author "Wood N.W."
author "Duerr A."
author "Hardy J."
author "Brice A."
author "Farrer M."
group "French Parkinson's disease genetics study group"
group "European consortium on genetic susceptibility on Parkinson's disease"
owl:sameAs <http://purl.uniprot.org/medline/22108846>
owl:sameAs <http://purl.uniprot.org/pubmed/12116199>
<http://purl.org/dc/elements/1.1/identifier> "doi:10.1002/ajmg.10525"
date 2002
erratum :1
rdf:type Journal_Citation
author "West A."
author "Periquet M."
author "Lincoln S."
author "Luecking C.B."
author "Nicholl D."
author "Bonifati V."
author "Rawal N."
author "Gasser T."
author "Lohmann E."
author "Deleuze J.-F."
author "Maraganore D."
author "Levey A."
author "Wood N.W."
author "Duerr A."
author "Hardy J."
author "Brice A."
author "Farrer M.J."
group "French Parkinson's disease genetics study group"
group "European consortium on genetic susceptibility on Parkinson's disease"
date 2002
name "Am. J. Med. Genet."
volume 114
pages 992
name "Am. J. Med. Genet."
volume 114
pages "584-591"
+rdf:type Citation_Statement
+scope "VARIANTS PD GLU-82; CYS-256; TRP-275; GLU-328 AND 441-ARG"
+source
rdf:type Tissue_Source
name "Brain"
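The journal RL line carries the name, volume, pages and date properties of a Journal_Citation. A minimal sketch of that split is given below. The regular expression and the function name `parse_journal_rl` are assumptions made for illustration, all values are returned as strings, and a page range whose first and last page are equal is collapsed to a single page, as in the erratum example above.

```python
import re

# Hypothetical pattern for 'RL <journal> <volume>:<first>-<last>(<year>).'
RL_JOURNAL = re.compile(
    r"^RL\s+(?P<name>.+?)\s+(?P<volume>[^\s:]+):"
    r"(?P<first>\w+)-(?P<last>\w+)\((?P<year>\d{4})\)\.$"
)


def parse_journal_rl(line):
    """Split a journal RL line into name, volume, pages and date."""
    match = RL_JOURNAL.match(line.strip())
    if match is None:
        raise ValueError(f"not a journal RL line: {line!r}")
    first, last = match.group("first"), match.group("last")
    # '992-992' is written as a single page.
    pages = first if first == last else f"{first}-{last}"
    return {
        "name": match.group("name"),
        "volume": match.group("volume"),
        "pages": pages,
        "date": match.group("year"),
    }
```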
Changes:
Issues:
scope property.
Sources belong together when there are several.
How are errata represented?
uniprot:O60260
citation <http://purl.uniprot.org/citations/12116199>
rdf:type Journal_Citation
erratum :1
rdf:type Journal_Citation
Note that there is no direct connection between the protein and the erratum citation, though the erratum citation may be referenced from parts of the protein. This will simplify maintenance of citations once they are stored in a separate database.

What happened to plasmids in the RC line?
OG Plasmid F.
..
RC STRAIN=K12 / CR63; PLASMID=F;
citation :1
+context
rdf:type Strain
rdfs:label "K12 / CR63"
encodedIn
rdfs:subClassOf Plasmid
rdfs:label "F"
+citation :1
Books:
RA Kato I., Kohr W.J., Laskowski M. Jr.;
RT "Evolution of avian ovomucoids.";
RL (In) Magnusson S., Ottesen M., Foltmann B., Dano K., Neurath H.
RL (eds.);
RL Regulatory proteolytic enzymes and their inhibitors, pp.197-206,
RL Pergamon Press, New York (1978).
uniprot:P68128
citation
rdf:type Book_Citation
title "Evolution of avian ovomucoids."
author "Kato I."
author "Kohr W.J."
author "Laskowski M. Jr."
date 1978
editor "Magnusson S."
editor "Ottesen M."
editor "Foltmann B."
editor "Dano K."
editor "Neurath H."
name "Regulatory proteolytic enzymes and their inhibitors"
pages "197-206"
publisher "Pergamon Press"
place "New York"
Database submissions:
RL Submitted (NOV-1989) to the EMBL/GenBank/DDBJ databases.
uniprot:P0C002
citation :3
rdf:type Submission_Citation
date 1989-11
submittedTo "EMBL/GenBank/DDBJ"
Unpublished observations:
RL Unpublished observations (MAR-1996).
uniprot:P52697
citation
rdf:type Observation_Citation
date 1996-03
Patents:
RL Patent number EP0290986, 17-NOV-1988.
uniprot:P31668
citation
rdf:type Patent_Citation
date 1988-11-17
owl:sameAs <http://purl.uniprot.org/patents/EP0290986>
Theses:
RL Thesis (1979), University of Stanford, United States.
uniprot:P02283
citation
rdf:type Thesis_Citation
date 1979
institution "University of Stanford"
place "United States"
CC -!- PTM: Characterization of O-linked glycan was studied in Bowes
CC melanoma cell line.
...
FT PROPEP 33 35 Removed by plasmin.
FT /FTId=PRO_0000028349.
uniprot:P00750
annotation
rdf:type PTM_Annotation
rdfs:comment
"Characterization of O-linked glycan was studied in Bowes melanoma
cell line."
annotation <http://purl.uniprot.org/annotation/PRO_0000028349>
rdf:type Propeptide_Annotation
rdfs:comment "Removed by plasmin"
range
rdf:type Range
begin 33
end 35
Changes:
Sequence annotation ranges:
| 1 10? | :r rdf:type Range begin 1 end 10 +rdf:type Endpoint_Statement +certain false |
| <185 >230 | :r rdf:type Range begin 185 +rdf:type Endpoint_Statement +limit false end 230 +rdf:type Endpoint_Statement +limit false |
| 1 ? | :r rdf:type Range begin 1 |
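The endpoint notation in the table above can be decoded with a few string checks. The following is a sketch under stated assumptions: the function names `parse_endpoint` and `parse_range` and the nested dictionary layout are hypothetical, a leading `<` or `>` is taken to mean `limit false`, a trailing `?` to mean `certain false`, and a bare `?` to mean that the endpoint is omitted.

```python
def parse_endpoint(token):
    """Return (position or None, qualifiers) for one FT endpoint token."""
    qualifiers = {}
    if token.startswith(("<", ">")):
        qualifiers["limit"] = False    # endpoint lies beyond the given position
        token = token[1:]
    if token.endswith("?"):
        qualifiers["certain"] = False  # position is uncertain or unknown
        token = token[:-1]
    position = int(token) if token else None
    return position, qualifiers


def parse_range(begin_token, end_token):
    """Build a range description; unknown endpoints are left out."""
    rng = {}
    for name, token in (("begin", begin_token), ("end", end_token)):
        position, qualifiers = parse_endpoint(token)
        if position is not None:
            rng[name] = {"value": position, **qualifiers}
    return rng
```

In the shorthand, the qualifiers correspond to the properties of the Endpoint_Statement attached to the begin or end statement.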
Sequence change annotations:
There are several types of annotations that indicate differences to other versions of the sequence.
MUTAGEN:
FT MUTAGEN 137 137 H->N: Binds copper. Forms dimer.
FT MUTAGEN 717 717 V->C,S: Unchanged beta-APP42/total APP-
FT beta ratio.
uniprot:P05067
annotation
rdf:type Mutagenesis_Annotation
rdfs:comment "Binds copper. Forms dimer"
range
rdf:type Range
begin 137
end 137
substitution "N"
annotation
rdf:type Mutagenesis_Annotation
rdfs:comment "Unchanged beta-APP42/total APP-beta ratio"
range
rdf:type Range
begin 717
end 717
substitution "C"
substitution "S"
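A MUTAGEN description yields one substitution value per replacement residue, plus a comment. A minimal sketch follows, assuming the continuation lines have already been joined into one description string. The function name `parse_mutagen` and the pattern are hypothetical, and the trailing period is dropped to match the comments shown above.

```python
import re


def parse_mutagen(description):
    """Split 'H->N: text.' into substitution values and a comment."""
    match = re.match(
        r"(?P<original>[A-Z]+)->(?P<replacement>[A-Z,]+):\s*(?P<comment>.*)$",
        description,
    )
    if match is None:
        raise ValueError(f"unexpected MUTAGEN description: {description!r}")
    return {
        # 'C,S' lists alternative replacements, one substitution each.
        "substitution": match.group("replacement").split(","),
        "comment": match.group("comment").rstrip("."),
    }
```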
VAR_SEQ:
FT VAR_SEQ 781 790 ALMRPGRIDR -> VPPSQTFLLL (in isoform 2).
FT /FTId=VSP_033048.
FT VAR_SEQ 791 893 Missing (in isoform 2).
FT /FTId=VSP_033049.
uniprot:Q8NB90
annotation <http://purl.uniprot.org/annotation/VSP_033048>
rdf:type Alternative_Sequence_Annotation
rdfs:comment "In isoform 2."
range :3
rdf:type Range
begin 781
end 790
substitution "VPPSQTFLLL"
annotation <http://purl.uniprot.org/annotation/VSP_033049>
rdf:type Alternative_Sequence_Annotation
rdfs:comment "In isoform 2."
range :4
rdf:type Range
begin 791
end 893
substitution ""
VARIANT:
FT VARIANT 296 296 D -> G (in allele TF*D1;
FT dbSNP:rs8177238).
FT /FTId=VAR_007544.
uniprot:P02787
annotation <http://purl.uniprot.org/annotation/VAR_007544>
rdf:type Natural_Variant_Annotation
rdfs:comment "In allele TF*D1."
range
rdf:type Range
begin 296
end 296
substitution "G"
rdfs:seeAlso <http://purl.uniprot.org/dbsnp/rs8177238>
CONFLICT:
FT CONFLICT 541 541 R -> K (in Ref. 1; CAA75675/CAA75676 and
FT 4; AAH02508).
uniprot:O95671
annotation
rdf:type Sequence_Conflict_Annotation
range
rdf:type Range
begin 541
end 541
substitution "K"
rdfs:seeAlso <http://purl.uniprot.org/embl/CAA75675>
rdfs:seeAlso <http://purl.uniprot.org/embl/CAA75676>
+citation <http://purl.uniprot.org/citations/9736779>
annotation
rdf:type Sequence_Conflict_Annotation
range
rdf:type Range
begin 541
end 541
substitution "K"
rdfs:seeAlso <http://purl.uniprot.org/embl/AAH02508>
+citation <http://purl.uniprot.org/citations/15489334>
Changes:
Crosslinks:
FT CROSSLNK 95 218 Tryptophyl-tyrosyl-methioninium (Trp-Tyr)
FT (with M-244).
FT CROSSLNK 218 244 Tryptophyl-tyrosyl-methioninium (Tyr-Met)
FT (with W-95).
uniprot:O59651
annotation
rdf:type Cross-link_Annotation
rdfs:comment "Tryptophyl-tyrosyl-methioninium (Trp-Tyr) (with M-244)"
range
rdf:type Range
begin 95
end 95
range
rdf:type Range
begin 218
end 218
annotation
rdf:type Cross-link_Annotation
rdfs:comment "Tryptophyl-tyrosyl-methioninium (Tyr-Met) (with W-95)"
range
rdf:type Range
begin 218
end 218
range
rdf:type Range
begin 244
end 244
Mass spectrometry comments:
CC -!- MASS SPECTROMETRY: Mass=23638.14; Mass_error=3.0;
CC Method=Electrospray; Range=16-214 (P04653-2); Note=Allele A, with
CC 11 phosphate groups; Source=PubMed:7601973;
uniprot:P04653
annotation
rdf:type Mass_Spectrometry_Annotation
rdfs:comment "Allele A, with 11 phosphate groups."
measuredValue 23638.14
measuredError 3.0
method ESI
range
rdf:type Range
begin 16
end 214
sequence <http://purl.uniprot.org/isoforms/P04653-2>
+citation <http://purl.uniprot.org/citations/7601973>
RNA editing comments:
CC -!- RNA EDITING: Modified_positions=367, 379, 383, 405; Note=Partially
CC edited.
uniprot:P21521
annotation
rdf:type RNA_Editing_Annotation
rdfs:comment "Partially edited."
position 367
position 379
position 383
position 405
CC -!- RNA EDITING: Modified_positions=Not_applicable; Note=Some
CC positions are modified by RNA editing via nucleotide insertion or
CC deletion. The initiator methionine is created by RNA editing.
uniprot:P14548
annotation
rdf:type RNA_Editing_Annotation
rdfs:comment
"Some positions are modified by RNA editing via nucleotide insertion
or deletion. The initiator methionine is created by RNA editing."
frameshift true
Issues:
Sequence caution:
CC -!- SEQUENCE CAUTION:
CC Sequence=X07863; Type=Frameshift; Positions=Several;
uniprot:P0A7B3
annotation
rdf:type Frameshift_Annotation
sequence <http://purl.uniprot.org/embl/X07863>
rdf:type Genomic_DNA
CC -!- SEQUENCE CAUTION:
CC Sequence=AAA39943.1; Type=Miscellaneous discrepancy; Note=Several frameshifts and contaminating sequence;
CC Sequence=Ref.3; Type=Frameshift; Positions=697;
uniprot:P27612
annotation
rdf:type Sequence_Caution_Annotation
rdfs:comment "Several frameshifts and contaminating sequence."
sequence <http://purl.uniprot.org/embl-cds/AAA39943.1>
annotation
rdf:type Frameshift_Annotation
range
rdf:type Range
begin 697
end 697
sequence
rdf:type External_Sequence
citation <http://purl.uniprot.org/citations/7665086>
Changes:
Biophysicochemical properties:
CC -!- BIOPHYSICOCHEMICAL PROPERTIES:
CC Absorption:
CC Abs(max)=~596 nm;
CC Note=In the presence of anions, the maximum absorption shifts to
CC about 575 nm;
uniprot:Q48314
annotation
rdf:type Absorption_Annotation
rdfs:comment "In the presence of anions, the maximum absorption shifts to about 575 nm."
maximum 596
certain false
CC -!- BIOPHYSICOCHEMICAL PROPERTIES:
CC Redox potential:
CC E(0) is -450 mV for the 3Fe-4S, and -645 mV for the 4Fe-4S
CC clusters;
uniprot:P00214
annotation
rdf:type Redox_Potential_Annotation
rdfs:comment "E(0) is -450 mV for the 3Fe-4S, and -645 mV for the 4Fe-4S clusters."
CC -!- BIOPHYSICOCHEMICAL PROPERTIES:
CC Kinetic parameters:
CC KM=3.1 mM for N-succinyl-Ala-Ala-Pro-Phe p-nitroanilide (at pH
CC 6.0 with dimethylsulfoxide);
CC KM=2.3 mM for N-succinyl-Ala-Ala-Pro-Phe p-nitroanilide (at pH
CC 9.0 with dimethylsulfoxide);
CC KM=1.1 mM for N-succinyl-Ala-Ala-Pro-Phe p-nitroanilide (at pH
CC 9.0 without dimethylsulfoxide);
CC pH dependence:
CC Optimum pH is 8.3-9.6;
CC Temperature dependence:
CC Optimum temperature is 40 degrees Celsius;
uniprot:P83610
annotation
rdf:type Kinetics_Annotation
measuredAffinity
"3.1 mM for N-succinyl-Ala-Ala-Pro-Phe p-nitroanilide (at pH 6.0 with
dimethylsulfoxide)"
measuredAffinity
"2.3 mM for N-succinyl-Ala-Ala-Pro-Phe p-nitroanilide (at pH 9.0 with
dimethylsulfoxide)"
measuredAffinity
"1.1 mM for N-succinyl-Ala-Ala-Pro-Phe p-nitroanilide (at pH 9.0
without dimethylsulfoxide)"
annotation
rdf:type PH_Dependence_Annotation
rdfs:comment "Optimum pH is 8.3-9.6."
annotation
rdf:type Temperature_Dependence_Annotation
rdfs:comment "Optimum temperature is 40 degrees Celsius."
Changes:
Alternative products:
CC Event=Alternative splicing; Named isoforms=2;
CC Name=Long;
CC IsoId=P51650-1; Sequence=Displayed;
CC Name=Short;
CC IsoId=P51650-2; Sequence=VSP_001284;
CC Note=No experimental confirmation available;
..
FT VAR_SEQ 107 134 Missing (in isoform Short).
FT /FTId=VSP_001284.
SQ SEQUENCE 523 AA; 56131 MW; 4CA521139C9FA98F CRC64;
MATCFLLRNF CAARPALRPP GRLLREPAGA QRRSYVGGPA DLHADLLRGD SFVGGRWLPT
PATFPVYDPA SGAKLGT...
uniprot:P51650
annotation
rdf:type Alternative_Splicing_Annotation
sequence <http://purl.uniprot.org/isoforms/P51650-1>
sequence <http://purl.uniprot.org/isoforms/P51650-2>
annotation <http://purl.uniprot.org/annotation/VSP_001284>
rdf:type Alternative_Sequence_Annotation
rdfs:comment "In isoform Short."
range :2
rdf:type Range
begin 107
end 134
substitution ""
annotation
rdf:type Caution_Annotation
rdfs:comment "No experimental confirmation available."
sequence <http://purl.uniprot.org/isoforms/P51650-2>
sequence <http://purl.uniprot.org/isoforms/P51650-1>
rdf:type Simple_Sequence
modified 2008-04-08
version 2
precursor true
mass 56131
name "Long"
rdf:value
"MATCFLLRNF CAARPALRPP GRLLREPAGA QRRSYVGGPA DLHADLLRGD SFVGGRWLPT
PATFPVYDPA SGAKLGT..."
sequence <http://purl.uniprot.org/isoforms/P51650-2>
rdf:type Modified_Sequence
name "Short"
basedOn <http://purl.uniprot.org/isoforms/P51650-1>
modification <http://purl.uniprot.org/annotation/VSP_001284>
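A Modified_Sequence does not carry its own residues, so a consumer has to derive them from the `basedOn` sequence and the referenced modifications. One way to do this is sketched below. The function name `apply_modifications` and the tuple layout `(begin, end, substitution)` are hypothetical, and positions are assumed to be 1-based and inclusive, as in the ranges above.

```python
def apply_modifications(base, modifications):
    """Apply (begin, end, substitution) edits to a base sequence.

    Positions are 1-based and inclusive; an empty substitution deletes
    the range. The modifications are assumed not to overlap.
    """
    residues = "".join(base.split())  # drop the spaces and line breaks of rdf:value
    # Work from the highest position down so earlier coordinates stay valid.
    for begin, end, substitution in sorted(modifications, reverse=True):
        residues = residues[:begin - 1] + substitution + residues[end:]
    return residues
```

For the example above, the Short isoform would be obtained by applying the single modification `(107, 134, "")` to the full sequence of the Long isoform.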
Some alternative products comments contain embedded variant, conflict or caution annotations. These are extracted and represented as proper annotation objects that reference the isoforms they are relevant for.
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing, Alternative initiation; Named isoforms=8;
CC Comment=Additional isoforms seem to exist;
CC Name=SV6; Synonyms=p110;
CC IsoId=Q9UQ88-1; Sequence=Displayed;
CC Name=SV1; Synonyms=Pbeta21, Beta 2-1;
CC IsoId=Q9UQ88-2; Sequence=VSP_008286, VSP_008288;
CC Note=Ref.1 (AAA19594) sequence is in conflict in positions:
CC 109:C->R, 112:H->R;
..
uniprot:Q9UQ88
annotation
rdf:type Alternative_Splicing_Annotation
rdf:type Alternative_Initiation_Annotation
rdfs:comment "Additional isoforms seem to exist."
sequence <http://purl.uniprot.org/isoforms/Q9UQ88-1>
sequence <http://purl.uniprot.org/isoforms/Q9UQ88-2>
...
annotation
rdf:type Sequence_Conflict_Annotation
range
rdf:type Range
begin 109
end 109
substitution "R"
rdfs:seeAlso <http://purl.uniprot.org/embl/AAA19594>
sequence <http://purl.uniprot.org/isoforms/Q9UQ88-2>
+citation <http://purl.uniprot.org/citations/8195233>
annotation
rdf:type Sequence_Conflict_Annotation
range
rdf:type Range
begin 112
end 112
substitution "R"
rdfs:seeAlso <http://purl.uniprot.org/embl/AAA19594>
sequence <http://purl.uniprot.org/isoforms/Q9UQ88-2>
+citation <http://purl.uniprot.org/citations/8195233>
...
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=2;
CC Name=1; Synonyms=TAF-I alpha;
CC IsoId=Q01105-1; Sequence=Displayed;
CC Name=2; Synonyms=TAF-I beta;
CC IsoId=Q01105-2; Sequence=VSP_009868;
CC Note=Acetylated on Lys-11. Phosphorylated on Ser-15 and Thr-23.
CC Variant in position: 4:P->Q (in dbSNP:rs1141138);
uniprot:Q01105
annotation
rdf:type Alternative_Splicing_Annotation
sequence <http://purl.uniprot.org/isoforms/Q01105-1>
sequence <http://purl.uniprot.org/isoforms/Q01105-2>
annotation
rdf:type Natural_Variant_Annotation
range
rdf:type Range
begin 4
end 4
substitution "Q"
rdfs:seeAlso <http://purl.uniprot.org/dbsnp/rs1141138>
sequence <http://purl.uniprot.org/isoforms/Q01105-2>
annotation
rdf:type Caution_Annotation
rdfs:comment "Acetylated on Lys-11. Phosphorylated on Ser-15 and Thr-23."
sequence <http://purl.uniprot.org/isoforms/Q01105-2>
Alternative promoter usage and initiation:
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative promoter usage, Alternative splicing; Named isoforms=4;
CC Name=A;
CC IsoId=Q8WXS8-1; Sequence=Displayed;
CC Note=Produced by alternative promoter usage;
CC Name=B;
CC IsoId=Q8WXS8-2; Sequence=VSP_006958;
CC Note=Produced by alternative promoter usage;
CC Name=C;
CC IsoId=Q8WXS8-3; Sequence=VSP_006958, VSP_005501;
CC Note=Produced by alternative splicing of isoform B;
CC Name=D;
CC IsoId=Q8WXS8-4; Sequence=VSP_005501;
CC Note=Produced by alternative splicing of isoform A;
uniprot:Q8WXS8
annotation
rdf:type Alternative_Promoter_Usage_Annotation
rdf:type Alternative_Splicing_Annotation
sequence <http://purl.uniprot.org/isoforms/Q8WXS8-1>
sequence <http://purl.uniprot.org/isoforms/Q8WXS8-2>
sequence <http://purl.uniprot.org/isoforms/Q8WXS8-3>
sequence <http://purl.uniprot.org/isoforms/Q8WXS8-4>
annotation
rdf:type Caution_Annotation
rdfs:comment "Produced by alternative promoter usage."
sequence <http://purl.uniprot.org/isoforms/Q8WXS8-1>
annotation
rdf:type Caution_Annotation
rdfs:comment "Produced by alternative promoter usage."
sequence <http://purl.uniprot.org/isoforms/Q8WXS8-2>
annotation
rdf:type Caution_Annotation
rdfs:comment "Produced by alternative splicing of isoform B."
sequence <http://purl.uniprot.org/isoforms/Q8WXS8-3>
annotation
rdf:type Caution_Annotation
rdfs:comment "Produced by alternative splicing of isoform A."
sequence <http://purl.uniprot.org/isoforms/Q8WXS8-4>
Alternative splicing with several unspecified sequences:
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=5;
CC Comment=Experimental confirmation may be lacking for some
CC isoforms;
CC Name=APP(770);
CC IsoId=P29216-1; Sequence=Displayed;
CC Name=APP(395);
CC IsoId=P29216-2; Sequence=Not described;
CC Name=APP(563);
CC IsoId=P29216-3; Sequence=Not described;
..
SQ SEQUENCE 76 AA; 8527 MW; 492BF3069AB082A1 CRC64;
EVCSEQAETG PCRAMISRWY FDVTEGKCAP FFYGGCGGNR NNFDTEEYCM AVCGSVMSQS
LRKTTREPLT RDPVKL
//
uniprot:P29216
annotation
rdf:type Alternative_Splicing_Annotation
rdfs:comment "Experimental confirmation may be lacking for some isoforms."
sequence <http://purl.uniprot.org/isoforms/P29216-1>
sequence <http://purl.uniprot.org/isoforms/P29216-2>
sequence <http://purl.uniprot.org/isoforms/P29216-3>
...
sequence <http://purl.uniprot.org/isoforms/P29216-1>
rdf:type Simple_Sequence
name "APP(770)"
rdf:value
"EVCSEQAETG PCRAMISRWY FDVTEGKCAP FFYGGCGGNR NNFDTEEYCM AVCGSVMSQS
LRKTTREPLT RDPVKL"
sequence <http://purl.uniprot.org/isoforms/P29216-2>
rdf:type Unknown_Sequence
name "APP(395)"
sequence <http://purl.uniprot.org/isoforms/P29216-3>
rdf:type Unknown_Sequence
name "APP(563)"
...
Pathways:
CC -!- PATHWAY: Amino-acid biosynthesis; L-lysine biosynthesis via AAA
CC pathway; L-alpha-aminoadipate from 2-oxoglutarate: step 2/4.
uniprot:P49367
annotation
rdf:type Pathway_Annotation
rdfs:comment
"Amino-acid biosynthesis; L-lysine biosynthesis via AAA pathway;
L-alpha-aminoadipate from 2-oxoglutarate: step 2/4."
+source <http://purl.uniprot.org/unipathway/402.33.12.29>
Notes:
Web resources:
CC -!- WEB RESOURCE: Name=Wikipedia; Note=Factor IX entry;
CC URL="http://en.wikipedia.org/wiki/Factor_IX";
CC -!- WEB RESOURCE: Name=HAEMB; Note=Hemophilia B mutation database;
CC URL="http://www.kcl.ac.uk/ip/petergreen/haemBdatabase.html";
uniprot:P00740
rdfs:seeAlso <http://en.wikipedia.org/wiki/Factor_IX>
rdf:type Resource
rdfs:comment "Wikipedia; Factor IX entry"
rdfs:seeAlso <http://www.kcl.ac.uk/ip/petergreen/haemBdatabase.html>
rdf:type Resource
rdfs:comment "HAEMB; Hemophilia B mutation database"
Changes:
Interactions:
CC -!- INTERACTION:
CC Self; NbExp=1; IntAct=EBI-77797, EBI-77797;
CC Q61824:Adam12 (xeno); NbExp=2; IntAct=EBI-77797, EBI-77785;
CC P03950:ANG; NbExp=3; IntAct=EBI-77797, EBI-525291;
..
DR IntAct; P35609; 8.
uniprot:P35609
interaction
rdf:type Interaction
xeno false
experiments 1
participant <http://purl.uniprot.org/intact/EBI-77797>
rdf:type Participant
owl:sameAs uniprot:P35609
interaction
rdf:type Interaction
xeno true
experiments 2
participant <http://purl.uniprot.org/intact/EBI-77797>
participant <http://purl.uniprot.org/intact/EBI-77785>
rdf:type Participant
rdfs:label "Adam12"
owl:sameAs uniprot:Q61824
interaction
rdf:type Interaction
xeno false
experiments 3
participant <http://purl.uniprot.org/intact/EBI-77797>
participant <http://purl.uniprot.org/intact/EBI-525291>
rdf:type Participant
rdfs:label "ANG"
owl:sameAs uniprot:P03950
rdfs:seeAlso <http://purl.uniprot.org/intact/P35609>
rdf:type Resource
database "IntAct"
rdfs:comment 8
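The way one INTERACTION line turns into an Interaction with its participants can be sketched as follows. The function name `parse_interaction` and the dictionary layout are hypothetical. The sketch assumes the three-field layout shown above and treats a `Self` partner as the entry itself, which is why the duplicate IntAct identifier is dropped.

```python
def parse_interaction(line, entry_accession):
    """Turn one INTERACTION line into a dictionary describing the interaction."""
    body = line.strip()
    if body.startswith("CC"):
        body = body[2:].strip()  # drop the 'CC' line code
    partner, nbexp, intact = [f.strip() for f in body.rstrip(";").split(";")]
    xeno = partner.endswith("(xeno)")
    if xeno:
        partner = partner[: -len("(xeno)")].strip()
    if partner == "Self":
        accession, label = entry_accession, None
    else:
        accession, _, label = partner.partition(":")
    ids = [i.strip() for i in intact.split("=", 1)[1].split(",")]
    return {
        "xeno": xeno,
        "experiments": int(nbexp.split("=")[1]),
        # dict.fromkeys removes the repeated id of a self interaction
        # while keeping the original order.
        "participants": list(dict.fromkeys(ids)),
        "partner": accession,
        "label": label,
    }
```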
Changes:
Simple example, applies to most types of links:
DR FlyBase; FBgn0010339; 128up.
uniprot:P32234
rdfs:seeAlso <http://purl.uniprot.org/flybase/FBgn0010339>
rdf:type Resource
database "FlyBase"
rdfs:comment "128up"
For PIR and StyGene, the primary and secondary identifiers are reversed, because only the latter is unique:
DR PIR; S07102; DEBOHS.
uniprot:P14893
rdfs:seeAlso <http://purl.uniprot.org/pir/DEBOHS>
rdf:type Resource
database "PIR"
rdfs:comment "S07102"
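The simple DR mapping, including the reversal for PIR and StyGene, can be sketched as follows. The function name `parse_dr` is hypothetical, and the URL pattern (lower-cased database name under `http://purl.uniprot.org/`) is an assumption inferred from the two examples above; it may not hold for every database.

```python
# Databases whose secondary identifier is the unique one (see the text above).
REVERSED = {"PIR", "StyGene"}


def parse_dr(line):
    """Map a three-field DR line to a seeAlso URL, database and comment."""
    fields = [f.strip() for f in line.strip()[2:].rstrip(".").split(";")]
    database, primary, secondary = fields[:3]
    if database in REVERSED:
        primary, secondary = secondary, primary
    return {
        # Assumed URL pattern, inferred from the examples only.
        "seeAlso": f"http://purl.uniprot.org/{database.lower()}/{primary}",
        "database": database,
        "comment": secondary,
    }
```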
Links to domain databases:
DR PROSITE; PS00211; ABC_TRANSPORTER_1; FALSE_NEG.
DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.
DR PROSITE; PS51245; MALK; PARTIAL.
DR PROSITE; PS00010; ASX_HYDROXYL; UNKNOWN_3.
uniprot:P18813
rdfs:seeAlso <http://purl.uniprot.org/prosite/PS00211>
rdf:type Resource
database "PROSITE"
rdfs:comment "ABC_TRANSPORTER_1"
+rdf:type Domain_Assignment_Statement
+falseNegative true
rdfs:seeAlso <http://purl.uniprot.org/prosite/PS50893>
rdf:type Resource
database "PROSITE"
rdfs:comment "ABC_TRANSPORTER_2"
+rdf:type Domain_Assignment_Statement
+hits 1
rdfs:seeAlso <http://purl.uniprot.org/prosite/PS51245>
rdf:type Resource
database "PROSITE"
rdfs:comment "MALK"
+rdf:type Domain_Assignment_Statement
uniprot:Q96DN2
rdfs:seeAlso <http://purl.uniprot.org/prosite/PS00010>
rdf:type Resource
database "PROSITE"
rdfs:comment "ASX_HYDROXYL"
+rdf:type Domain_Assignment_Statement
+hits 3
+certain false
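The last field of a PROSITE DR line determines which properties are attached to the Domain_Assignment_Statement. The following sketch reproduces the four cases shown above. The function name `parse_domain_status` is hypothetical, and PARTIAL is mapped to no extra property only because the example shows none.

```python
import re


def parse_domain_status(status):
    """Translate the DR status field into statement properties."""
    if status == "FALSE_NEG":
        return {"falseNegative": True}
    if status == "PARTIAL":
        return {}  # the example above attaches no further property
    match = re.fullmatch(r"(UNKNOWN_)?(\d+)", status)
    if match is None:
        raise ValueError(f"unexpected status: {status!r}")
    properties = {"hits": int(match.group(2))}
    if match.group(1):
        properties["certain"] = False  # 'UNKNOWN_n' marks uncertain hits
    return properties
```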
Links to EMBL:
DR EMBL; AF053231; AAC08355.1; ALT_FRAME; mRNA.
DR EMBL; BX842680; CAE81952.1; -; Genomic_DNA.
DR EMBL; AABX02000002; EAA34926.2; ALT_SEQ; Genomic_DNA.
uniprot:O59942
rdfs:seeAlso <http://purl.uniprot.org/embl-cds/AAC08355.1>
rdf:type Nucleotide_Resource
database "EMBL"
locatedOn <http://purl.uniprot.org/embl/AF053231>
rdf:type MRNA
+rdf:type Nucleotide_Mapping_Statement
+rdfs:comment "Frameshift."
rdfs:seeAlso <http://purl.uniprot.org/embl-cds/CAE81952.1>
rdf:type Nucleotide_Resource
database "EMBL"
locatedOn <http://purl.uniprot.org/embl/BX842680>
rdf:type Genomic_DNA
rdfs:seeAlso <http://purl.uniprot.org/embl-cds/EAA34926.2>
rdf:type Nucleotide_Resource
database "EMBL"
locatedOn <http://purl.uniprot.org/embl/AABX02000002>
rdf:type Genomic_DNA
+rdf:type Nucleotide_Mapping_Statement
+rdfs:comment "Sequence problems."
DR EMBL; L06463; -; NOT_ANNOTATED_CDS; mRNA.
uniprot:P56593
rdfs:seeAlso
rdf:type Nucleotide_Resource
database "EMBL"
locatedOn <http://purl.uniprot.org/embl/L06463>
rdf:type MRNA
DR EMBL; BC027387; AAH27387.1; ALT_TERM; mRNA.
DR EMBL; BC052513; AAH52513.1; ALT_INIT; mRNA.
rdfs:seeAlso <http://purl.uniprot.org/embl-cds/AAH27387.1>
rdf:type Nucleotide_Resource
database "EMBL"
locatedOn <http://purl.uniprot.org/embl/BC027387>
rdf:type MRNA
+rdf:type Nucleotide_Mapping_Statement
+rdfs:comment "Different termination."
rdfs:seeAlso <http://purl.uniprot.org/embl-cds/AAH52513.1>
rdf:type Nucleotide_Resource
database "EMBL"
locatedOn <http://purl.uniprot.org/embl/BC052513>
rdf:type MRNA
+rdf:type Nucleotide_Mapping_Statement
+rdfs:comment "Different initiation."
DR EMBL; X14907; CAA33034.1; -; Genomic_DNA.
DR EMBL; X14908; CAA33034.1; JOINED; Genomic_DNA.
uniprot:P02668
rdfs:seeAlso <http://purl.uniprot.org/embl-cds/CAA33034.1>
rdf:type Nucleotide_Resource
database "EMBL"
locatedOn <http://purl.uniprot.org/embl/X14907>
rdf:type Genomic_DNA
locatedOn <http://purl.uniprot.org/embl/X14908>
rdf:type Genomic_DNA
Changes:
dbxref.txt is integrated into schema.
Issues:
hits and status properties.
rdfs:comment property is not particularly helpful.
But: Probably not worth the effort of defining more accurately.
Gene Ontology (GO) terms are listed along with the keywords. Also, they have GO-specific evidence codes attached to them:
DR GO; GO:0016514; C:SWI/SNF complex; IDA:UniProtKB.
uniprot:O14497
classifiedWith go:0016514
+citation <http://purl.uniprot.org/citations/12200431>
+status IDA
+database "UniProtKB"
Notes:
Multiple references to the same GenomeReviews entry are collapsed:
DR GenomeReviews; AE005674_GR; S2143.
DR GenomeReviews; AE014073_GR; S1668.
DR GenomeReviews; AE014073_GR; S2143.
uniprot:Q7UCD5
rdfs:seeAlso <http://purl.uniprot.org/genomereviews/AE005674_GR>
rdf:type Resource
database "GenomeReviews"
rdfs:comment "S2143"
rdfs:seeAlso <http://purl.uniprot.org/genomereviews/AE014073_GR>
rdf:type Resource
database "GenomeReviews"
rdfs:comment "S1668"
rdfs:comment "S2143"
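Collapsing repeated references amounts to grouping the DR lines by database and identifier and collecting the comments. A minimal sketch follows; the function name `collapse_references` and the returned mapping are hypothetical.

```python
def collapse_references(dr_lines):
    """Group three-field DR lines by (database, identifier)."""
    collapsed = {}
    for line in dr_lines:
        fields = [f.strip() for f in line.strip()[2:].rstrip(".").split(";")]
        database, identifier, comment = fields
        # Each distinct resource is kept once and accumulates its comments.
        collapsed.setdefault((database, identifier), []).append(comment)
    return collapsed
```

Each key of the result corresponds to one `rdfs:seeAlso` resource, and each list entry to one `rdfs:comment` on it.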
KW ATP-binding; Chaperone; Membrane; Mitochondrion;
KW Mitochondrion inner membrane; Nucleotide-binding; Transmembrane.
uniprot:Q5E9H5
classifiedWith keyword:67
classifiedWith keyword:143
classifiedWith keyword:999
classifiedWith keyword:812
Changes:
If a keyword is implied by a more specific keyword that is also assigned (e.g. Membrane and Mitochondrion inner membrane), it is removed.
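Removing keywords that are implied by more specific ones can be sketched as a walk up the keyword hierarchy. The function name `prune_implied` and the `parents` mapping are hypothetical. In practice the mapping would be built from the HI lines of the keyword list, and the ATP-binding parent used in the usage note below is an assumption that is not shown in this document.

```python
def prune_implied(keywords, parents):
    """Drop every keyword that is an ancestor of another assigned keyword."""
    implied = set()
    for keyword in keywords:
        stack = list(parents.get(keyword, ()))
        while stack:
            ancestor = stack.pop()
            if ancestor not in implied:
                implied.add(ancestor)
                stack.extend(parents.get(ancestor, ()))
    # Keep the original order of the remaining keywords.
    return [k for k in keywords if k not in implied]
```

With `parents` mapping "Mitochondrion inner membrane" to Membrane and Mitochondrion, and (as an assumption) "ATP-binding" to Nucleotide-binding, the seven keywords of the KW example reduce to the four that remain in the shorthand.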
DT 01-NOV-1996, sequence version 1.
..
DE Flags: Precursor;
..
SQ SEQUENCE 244 AA; 26414 MW; 64D8C6C1204B1018 CRC64;
MLLLGAVLLL LALPGHDQET TTQGPGVLLP LPKGACTGWM AGIPGHPGHN GAPGRDGRDG
TPGEKGEKGD PGLIGPKGDI GETGVPGAEG PRGFPGIQGR KGEPGEGAYV YRSAFSVGLE
TYVTIPNMPI RFTKIFYNQQ NHYDGSTGKF HCNIPGLYYF AYHITVYMKD VKVSLFKKDK
AMLFTYDQYQ ENNVDQASGS VLLHLEVGDQ VWLQVYGEGE RNGLYADNDN DSTFTGFLLY
HDTN
uniprot:Q15848
sequence <http://purl.uniprot.org/isoforms/Q15848-1>
rdf:type Simple_Sequence
modified 1996-11-01
version 1
precursor true
mass 26414
rdf:value
"MLLLGAVLLL LALPGHDQET TTQGPGVLLP LPKGACTGWM AGIPGHPGHN GAPGRDGRDG
TPGEKGEKGD PGLIGPKGDI GETGVPGAEG PRGFPGIQGR KGEPGEGAYV YRSAFSVGLE
TYVTIPNMPI RFTKIFYNQQ NHYDGSTGKF HCNIPGLYYF AYHITVYMKD VKVSLFKKDK
AMLFTYDQYQ ENNVDQASGS VLLHLEVGDQ VWLQVYGEGE RNGLYADNDN DSTFTGFLLY
HDTN"
Notes:
See section on alternative products for more examples.
Evidence tags are attached directly to statements:
uniprot:P15529
annotation
rdf:type PTM_Annotation
rdfs:comment
"Extensively O-glycosylated in the Ser/Thr-rich domain.
O-glycosylation is required for Neisseria binding but not for
Measles virus or human adenovirus binding."
+source <http://purl.uniprot.org/citations/12112588>
+source <http://purl.uniprot.org/citations/15307194>
Following is a sample entry in XML syntax (see DTD of the original XML format).
<entry id="UniRef50_Q8WZ42" updated="2008-12-16">
<name>Cluster: Titin</name>
<property type="member count" value="82" />
<property type="common taxon" value="Eukaryota" />
<property type="common taxon ID" value="2759" />
<representativeMember>
<dbReference type="UniProtKB ID" id="TITIN_HUMAN">
<property type="UniProtKB accession" value="Q8WZ42" />
<property type="UniParc ID" value="UPI0000D7E631" />
<property type="UniRef100 ID" value="UniRef100_Q8WZ42" />
<property type="UniRef90 ID" value="UniRef90_Q8WZ42" />
<property type="protein name" value="Titin" />
<property type="source organism" value="Homo sapiens (Human)" />
<property type="NCBI taxonomy" value="9606" />
<property type="length" value="34350" />
</dbReference>
<sequence length="34350" checksum="2558BCFE2922B347">
MTTQAPTFTQPLQSVVVLEGSTATFEA..."
</sequence>
</representativeMember>
<member>
<dbReference type="UniProtKB ID" id="Q8WZ42-8">
<property type="UniProtKB accession" value="Q8WZ42-8" />
<property type="UniParc ID" value="UPI0000D7E637" />
<property type="UniRef100 ID" value="UniRef100_Q8WZ42-8" />
<property type="UniRef90 ID" value="UniRef90_Q8WZ42" />
<property type="protein name" value="Isoform 8 of Titin" />
<property type="source organism" value="Homo sapiens (Human)" />
<property type="NCBI taxonomy" value="9606" />
<property type="length" value="34474" />
</dbReference>
</member>
...
uniref:UniRef50_Q8WZ42
rdf:type Cluster
rdfs:label "Cluster: Titin"
similarity 0.5
identity 0.5
modified 2008-12-16
commonTaxon <http://purl.uniprot.org/taxonomy/2759>
member
rdf:type Sequence
owl:sameAs <http://purl.uniprot.org/uniparc/UPI0000D7E631>
sequenceFor <http://purl.uniprot.org/uniprot/Q8WZ42>
mnemonic "TITIN_HUMAN"
reviewed true
rdfs:label "Titin"
memberOf uniref:UniRef100_Q8WZ42
memberOf uniref:UniRef90_Q8WZ42
organism <http://purl.uniprot.org/taxonomy/9606>
length 34350
representativeFor uniref:UniRef50_Q8WZ42
sequence
"MTTQAPTFTQ PLQSVVVLEG STATFEA..."
member
rdf:type Sequence
owl:sameAs <http://purl.uniprot.org/uniparc/UPI0000D7E637>
owl:sameAs <http://purl.uniprot.org/isoforms/Q8WZ42-8>
reviewed true
rdfs:label "Isoform 8 of Titin"
memberOf uniref:UniRef100_Q8WZ42-8
memberOf uniref:UniRef90_Q8WZ42
organism <http://purl.uniprot.org/taxonomy/9606>
length 34474
...
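The `similarity` and `identity` values in the example match the number embedded in the cluster identifier (50 for UniRef50). Assuming that relation holds in general, which this document does not state explicitly, the value can be derived as in the sketch below. The function name `cluster_identity` is hypothetical.

```python
import re


def cluster_identity(cluster_id):
    """Derive the identity threshold from an id such as 'UniRef50_Q8WZ42'."""
    match = re.match(r"UniRef(\d+)_", cluster_id)
    if match is None:
        raise ValueError(f"not a UniRef cluster id: {cluster_id!r}")
    return int(match.group(1)) / 100
```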
Changes:
Here is a sample entry in the original XML format (currently not distributed):
<entry accession="UPI0000000001">
<dbReferenceList>
<dbReference db="EMBL" id="AAF63732" version="1" version_i="1" active="Y" created="12-Mar-2003" last="02-Sep-2008" NCBI_GI="7546898" NCBI_taxonomy_id="10245"/>
<dbReference db="EMBL" id="CAD90637" version="1" version_i="1" active="Y" created="16-Jun-2003" last="02-Sep-2008" NCBI_GI="30519462" NCBI_taxonomy_id="10243"/>
...
</dbReferenceList>
<sequence length="250" crc64="28FE89850863372D">
MGAAASIQTTVNTLSERISSKLEQEANASAQTKCDIEIGNFYIRQNHGCNLTVKNMCSAD
ADAQLDAVLSAATETYSGLTPEQKAYVPAMFTAALNIQTSVNTVVRDFENYVKQTCNSSA
VVDNKLKIQNVIIDECYGAPGSPTNLEFINTGSSKGNCAIKALMQLTTKATTQIAPKQVA
GTGVQFYMIVIGVIILAALFMYYAKRMLFTSTNDKIKLILANKENVHWTTYMDTFFRTSP
MVIATTDMQN
</sequence>
</entry>
Here is the RDF version of the same data:
uniparc:UPI0000000001
rdf:type Sequence
sequenceFor <http://purl.uniprot.org/embl-cds/AAF63732.1>
organism <http://purl.uniprot.org/taxonomy/10245>
rdfs:seeAlso <http://purl.uniprot.org/gi/7546898>
+version 1
+created 2003-03-12
+modified 2008-09-02
sequenceFor <http://purl.uniprot.org/embl-cds/CAD90637.1>
organism <http://purl.uniprot.org/taxonomy/10243>
rdfs:seeAlso <http://purl.uniprot.org/gi/30519462>
+version 1
+created 2003-06-16
+modified 2008-09-02
...
rdf:value
"MGAAASIQTT VNTLSERISS KLEQEANASA QTKCDIEIGN FYIRQNHGCN LTVKNMCSAD
ADAQLDAVLS AATETYSGLT PEQKAYVPAM FTAALNIQTS VNTVVRDFEN YVKQTCNSSA
VVDNKLKIQN VIIDECYGAP GSPTNLEFIN TGSSKGNCAI KALMQLTTKA TTQIAPKQVA
GTGVQFYMIV IGVIILAALF MYYAKRMLFT STNDKIKLIL ANKENVHWTT YMDTFFRTSP
MVIATTDMQN"
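Two mechanical conversions are visible when comparing the XML and RDF versions: dates such as 12-Mar-2003 become 2003-03-12, and the sequence is grouped into blocks of ten residues. The following Python sketch illustrates both. The function names are hypothetical, and the layout of six blocks per line is read off the example above rather than taken from a specification:

```python
# Month abbreviations as they appear in the legacy XML attributes.
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}

def to_iso_date(legacy: str) -> str:
    """Convert a legacy date such as '12-Mar-2003' to '2003-03-12'."""
    day, month, year = legacy.split("-")
    return f"{int(year):04d}-{MONTHS[month]:02d}-{int(day):02d}"

def group_sequence(seq: str, block: int = 10, per_line: int = 6) -> str:
    """Group residues into space-separated blocks, several blocks per line."""
    residues = "".join(seq.split())  # drop line breaks and spaces
    blocks = [residues[i:i + block] for i in range(0, len(residues), block)]
    lines = [
        " ".join(blocks[i:i + per_line])
        for i in range(0, len(blocks), per_line)
    ]
    return "\n".join(lines)

print(to_iso_date("02-Sep-2008"))
print(group_sequence("MGAAASIQTTVNTLSERISS"))
```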
Notes:
ID Membrane.
AC KW-0472
DE Protein which is membrane-bound or membrane-associated. A membrane is
DE the layer which forms the boundary of cells and intracellular
DE organelles. It is composed of two oriented lipid layers in which
DE proteins are embedded and acts as a selective permeability barrier.
GO GO:0016020; membrane
HI Cellular component: Membrane.
CA Cellular component.
//
ID Mitochondrion inner membrane.
AC KW-0999
DE Protein found in or associated with the inner membrane of a
DE mitochondrion, the membrane which separates the mitochondrial matrix
DE from the intermembrane space.
SY Mitochondrial inner membrane; Inner mitochondrial membrane.
HI Cellular component: Membrane; Mitochondrion inner membrane.
HI Cellular component: Mitochondrion; Mitochondrion inner membrane.
CA Cellular component.
//
IC Cellular component.
AC KW-9998
DE Keywords assigned to proteins because they are found in a specific
DE cellular or extracellular component.
//
keyword:472
rdf:type Concept
rdfs:label "Membrane"
rdfs:comment
"Protein which is membrane-bound or membrane-associated. A membrane is
the layer which forms the boundary of cells and intracellular
organelles. It is composed of two oriented lipid layers in which
proteins are embedded and acts as a selective permeability barrier."
rdfs:subClassOf keyword:9998
owl:sameAs <http://purl.uniprot.org/go/0016020>
keyword:999
rdf:type Concept
rdfs:label "Mitochondrion inner membrane"
rdfs:label "Mitochondrial inner membrane"
rdfs:label "Inner mitochondrial membrane"
rdfs:comment
"Protein found in or associated with the inner membrane of a
mitochondrion, the membrane which separates the mitochondrial matrix
from the intermembrane space."
rdfs:subClassOf keyword:472
rdfs:subClassOf keyword:496
keyword:9998
rdf:type Concept
rdfs:label "Cellular component"
rdfs:comment
"Keywords assigned to proteins because they are found in a specific
cellular or extracellular component."
Notes:
Note that keywords are both classes and instances. This allows us to make the
hierarchy explicit through the use of rdfs:subClassOf, without losing
the ability to create restrictions such as someProperty rdfs:range Keyword.
Having classes that are also instances
is allowed in OWL Full.
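Because the hierarchy is stated with subclass links, all broader keywords of a given keyword can be collected by following those links. The Python sketch below illustrates this with the three keywords from the example. The function names and the dictionary layout are assumptions made for illustration, and keyword:496 appears only as a parent because its own record is not shown above:

```python
def keyword_id(accession: str) -> str:
    """Map a flat-file accession such as 'KW-0472' to 'keyword:472'."""
    prefix, digits = accession.split("-")
    if prefix != "KW":
        raise ValueError(f"not a keyword accession: {accession!r}")
    return f"keyword:{int(digits)}"  # int() drops the leading zeros

# Subclass links copied from the RDF example above.
SUBCLASS_OF = {
    "keyword:472": ["keyword:9998"],
    "keyword:999": ["keyword:472", "keyword:496"],
    "keyword:9998": [],
}

def ancestors(keyword: str, subclass_of: dict) -> set:
    """Return every keyword reachable by following subclass links upward."""
    seen = set()
    stack = list(subclass_of.get(keyword, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(subclass_of.get(parent, []))
    return seen

print(sorted(ancestors(keyword_id("KW-0999"), SUBCLASS_OF)))
```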
[enzclass.txt]
1. -. -.- Oxidoreductases.
1. 1. -.- Acting on the CH-OH group of donors.
1. 1. 1.- With NAD(+) or NADP(+) as acceptor.
[enzyme.dat]
ID 1.1.1.1
DE Alcohol dehydrogenase.
AN Aldehyde reductase.
CA An alcohol + NAD(+) = an aldehyde or ketone + NADH.
CF Iron or zinc.
CC -!- Acts on primary or secondary alcohols or hemiacetals.
CC -!- The animal, but not the yeast, enzyme acts also on cyclic secondary
CC alcohols.
PR PROSITE; PDOC00058;
PR PROSITE; PDOC00059;
PR PROSITE; PDOC00060;
DR P07327, ADH1A_HUMAN; P28469, ADH1A_MACMU; Q5RBP7, ADH1A_PONAB;
..
enzyme:1.-.-.-
rdf:type Enzyme
name "Oxidoreductases"
enzyme:1.1.-.-
rdf:type Enzyme
name "Acting on the CH-OH group of donors"
rdfs:subClassOf enzyme:1.-.-.-
enzyme:1.1.1.-
rdf:type Enzyme
name "With NAD(+) or NADP(+) as acceptor"
rdfs:subClassOf enzyme:1.1.-.-
enzyme:1.1.1.1
rdf:type Enzyme
name "Alcohol dehydrogenase"
name "Aldehyde reductase"
activity "An alcohol + NAD(+) = an aldehyde or ketone + NADH."
cofactor "Zinc or Iron."
rdfs:comment "Acts on primary or secondary alcohols or hemiacetals."
rdfs:comment "The animal, but not the yeast, enzyme acts also on cyclic secondary alcohols."
rdfs:subClassOf enzyme:1.1.1.-
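In the example, each rdfs:subClassOf target is the enzyme's own number with its last specified level replaced by a dash. A short Python sketch of that rule follows. The function name is hypothetical, and the rule is inferred from this one example rather than from a specification:

```python
def parent_class(ec_number: str):
    """Return the next broader EC class, or None for a top-level class.

    Assumption: an EC number has exactly four dot-separated levels and
    unspecified levels are written as '-'.
    """
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four levels: {ec_number!r}")
    # Walk from the most specific level towards the most general one.
    for index in range(3, -1, -1):
        if parts[index] != "-":
            if index == 0:
                return None  # a top-level class such as 1.-.-.- has no parent
            parts[index] = "-"
            return ".".join(parts)
    return None

# Follow the chain upward from alcohol dehydrogenase.
current = "1.1.1.1"
while current is not None:
    print(current)
    current = parent_class(current)
```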
Changes:
go:GO:0016209
rdf:type term
accession GO:0016209
name "antioxidant activity"
definition "Inhibition of the reactions brought about by ..."
synonym "some synonym"
is_a go:GO:0003674
dbxref
database_symbol "InterPro"
reference "IPR000866"
...
go:GO:0009492
rdf:type term
accession GO:0009492
name "2Fe-2S electron transfer carrier"
definition "OBSOLETE (was not defined before being made obsolete)."
is_a go:obsolete_molecular_function
go:0016209
rdf:type Concept
rdfs:label "antioxidant activity"
rdfs:comment "Inhibition of the reactions brought about by ..."
rdfs:subClassOf go:0003674
...
go:0009492
rdf:type Concept
obsolete true
rdfs:label "2Fe-2S electron transfer carrier"
Even though GO is already distributed in RDF format, some changes were made:
The GO: prefix was removed from identifiers,
as colons have a special meaning within URNs.
taxon:11686
rdf:type Taxon
reviewed true
mnemonic "HV1BR"
scientificName "Human immunodeficiency virus type 1 (isolate BRU/LAI group M subtype B)"
commonName "HIV-1"
otherName "Human immunodeficiency virus type 1 (BRU ISOLATE)"
host taxon:9606
rdfs:subClassOf taxon:540993
partOfLineage false
Note that taxa with partOfLineage false are omitted from the
lineage displayed in flat text UniProt entries.
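The filtering described here can be sketched in Python as a walk up the rdfs:subClassOf chain that skips taxa flagged with partOfLineage false. Everything in the sample data except the link from taxon:11686 to taxon:540993 and the flag on taxon:11686 is hypothetical: the child taxon is invented, and the flag and missing parent of taxon:540993 are assumed only to keep the example short:

```python
# Hypothetical in-memory view of a few taxa. Only the parent link and the
# False flag of taxon:11686 come from the example above.
TAXA = {
    "taxon:example-child": {"parent": "taxon:11686", "part_of_lineage": True},
    "taxon:11686": {"parent": "taxon:540993", "part_of_lineage": False},
    "taxon:540993": {"parent": None, "part_of_lineage": True},  # assumed
}

def displayed_lineage(taxon: str, taxa: dict) -> list:
    """List the ancestors of a taxon, root first, omitting hidden taxa."""
    lineage = []
    current = taxa[taxon]["parent"]
    while current is not None:
        record = taxa[current]
        if record["part_of_lineage"]:
            lineage.append(current)
        current = record["parent"]
    lineage.reverse()  # ancestors were collected from nearest to root
    return lineage

print(displayed_lineage("taxon:example-child", TAXA))
```

In this sketch taxon:11686 is skipped when the lineage of its hypothetical child is built, while its own parent still appears.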
pathway:402.33.12.29
rdf:type Pathway
rdfs:label
"Amino-acid biosynthesis; L-lysine biosynthesis via AAA pathway;
L-alpha-aminoadipate from 2-oxoglutarate: step 2/4"
rdfs:subClassOf pathway:402.33.12
This data is extracted and referenced from the main UniProt data set.