HPP Scientific Terms, Definitions & Abbreviations*

*Definitions of terms commonly used by HPP researchers, based on past literature or consensus reached by the HPP community (e.g., HPP publications, HPP guidelines, neXtProt, PeptideAtlas, Human Protein Atlas, GPMdb, ProteomeXchange). Updated: Aug 30, 2018.

Term Definition Additional Information, Web links and References
HPP The Human Proteome Project (HPP) is an international project organized by the Human Proteome Organization (HUPO) that aims to map, annotate, and functionally characterize the entire human proteome in a systematic way using mass spectrometry complemented by antibody and affinity-based techniques and many other protein methods. The HPP extends and is a direct counterpart to the Human Genome Project. HPP annotation of the human genome gene products adds significant value and insights about human biology. The HPP is composed of two complementary initiatives: The Chromosome-centric HPP (C-HPP) and Biology/Disease HPP (B/D-HPP). The former focuses on the completion of the “parts list” for proteins and their proteoforms whereas the latter aims to make proteomics an integral part of multi-omics research throughout the life sciences and biomedical research communities. Both initiatives are supported by 4 resource pillars: (i) mass spectrometry (MS), (ii) affinity reagents (Ab), (iii) knowledge base (Kb), and (iv) pathology.

Note: Completion of the HPP will generate a protein-based map of the molecular architecture of human cells and the human body, enhance our understanding of human biology at the cellular level and lay a foundation for the development of novel diagnostic, prognostic, therapeutic, and preventive medical applications. The HPP is governed by the HPP Executive Committee.

Legrain P, Aebersold R, Archakov A et al., The human proteome project: current state and future direction. Mol Cell Proteomics 2011 Jul;10(7):M111.009993. doi: 10.1074/mcp.M111.009993.

Omenn GS, Lane L, Overall CM, Corrales FJ, Schwenk JM, Paik YK, Van Eyk JE, Liu S, Snyder M, Baker MS, Deutsch EW. Progress on identifying and characterizing the Human Proteome: 2018 Metrics from the HUPO Human Proteome Project. J Proteome Res 2018. doi: 10.1021/acs.jproteome.8b00441.
C-HPP The Chromosome-Centric HPP (C-HPP) is an international collaborative initiative of the HPP that aims to map, annotate, and characterize the human proteome on a chromosome-by-chromosome basis. The 25 international teams from 20 countries use various proteomics technologies to study how the proteome is encoded in chromosomes 1-22, X, and Y and in mitochondrial DNA. Currently, the major foci of the C-HPP are to map all remaining missing proteins (PE2, PE3, and PE4 proteins; 2,186 in neXtProt release 2018-01-17) and to characterize the 1,260 PE1 proteins with no function annotated in neXtProt release 2018-01-17 (uPE1).
Note: The initial goal of the C-HPP is to identify, for each of the ca. 20,300 human protein-encoding genes, at least one representative protein together with three post-translational modifications (PTMs; phosphorylation, glycosylation, and acetylation) and an alternative splicing isoform, along with tissue localization and quantitative studies using MS and/or antibody reagents.
Paik YK, Jeong SK, Omenn GS et al., The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome. Nat Biotechnol 2012 Mar 7;30(3):221-3. doi: 10.1038/nbt.2152.

Paik YK, Omenn GS, Hancock WS et al., Advances in the Chromosome-Centric Human Proteome Project: looking to the future. Expert Rev Proteomics 2017 Dec 14(12):1059-1071. doi: 10.1080/14789450.2017.1394189.
B/D-HPP The Biology/Disease HPP (B/D-HPP) is an international collaborative initiative of the HPP that focuses on mapping, annotating, and characterizing the proteome using proteomics technologies in relation to human biology and/or diseases. The B/D-HPP provides a framework for the coordination of 19 initiatives. A popular proteins strategy has been developed to stimulate the use of targeted proteomics by each of the B/D-HPP initiatives and throughout the life sciences and biomedical community. Together, the C-HPP, the B/D-HPP, and the Resource Pillars comprise 50 international teams.
Note: The goals of B/D-HPP are to conduct experimental studies of specific organs and biofluids in health and disease and to assemble publicly accessible prioritized panels of proteins relevant to biological processes, organs (e.g., cardiovascular, cerebral, hepatic, renal, pulmonary, and intestinal systems) and organelles (e.g. mitochondria). More broadly, it aims to develop standardized methods for protein detection and quantification by proteomics to promote translation into clinical settings.
Van Eyk JE, Corrales FJ, Aebersold R et al., Highlights of the Biology and Disease-driven Human Proteome Project, 2015-2016. J Proteome Res 2016 Nov 4;15(11):3979-3987. doi: 10.1021/acs.jproteome.6b00444.
PE Protein existence (PE) levels indicate the degree of evidence for the existence of a human protein based on curated information. The levels PE1 to PE5 are assigned by UniProtKB/Swiss-Prot and neXtProt as follows:
  • PE1: evidence at the protein level (identified by mass spectrometry (MS) according to HPP guidelines, or curated from multiple other experimental protein methods).
  • PE2: evidence at the transcript level (detection by RNAseq or presence of expressed sequence tag).
  • PE3: inferred by gene homology (assigned membership of a defined protein family).
  • PE4: predicted protein (not yet assigned membership of a defined protein family).
  • PE5: uncertain or dubious sequences (such as erroneous translation products or pseudogenes). In 2013, the HPP excluded PE5 entries from the search for missing proteins.
The HPP publishes annual HPP Metrics for worldwide progress identifying and characterizing the Human Proteome, based on the PE levels, e.g., Lane et al and Omenn et al (see HPP box above).
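
The five PE levels and the exclusion of PE5 from the missing-protein search can be sketched in a few lines of Python. This is only an illustration of the classification described above, not code from neXtProt or UniProtKB: the enum member names, the `is_missing_protein` helper, and the accession-like keys in the example are all hypothetical.

```python
from collections import Counter
from enum import IntEnum


class ProteinExistence(IntEnum):
    """PE levels as listed above; member names are illustrative."""
    PROTEIN_LEVEL = 1      # PE1: evidence at the protein level
    TRANSCRIPT_LEVEL = 2   # PE2: evidence at the transcript level
    HOMOLOGY = 3           # PE3: inferred by gene homology
    PREDICTED = 4          # PE4: predicted protein
    UNCERTAIN = 5          # PE5: uncertain or dubious sequence


def is_missing_protein(level: ProteinExistence) -> bool:
    """PE2, PE3 and PE4 entries count as missing; PE1 and PE5 do not."""
    return level in (
        ProteinExistence.TRANSCRIPT_LEVEL,
        ProteinExistence.HOMOLOGY,
        ProteinExistence.PREDICTED,
    )


def count_by_level(entries: dict) -> Counter:
    """Tally entries per PE level, as an annual metrics table would."""
    return Counter(entries.values())


if __name__ == "__main__":
    # Made-up identifiers, used only to exercise the functions.
    toy_entries = {
        "TOY_0001": ProteinExistence.PROTEIN_LEVEL,
        "TOY_0002": ProteinExistence.TRANSCRIPT_LEVEL,
        "TOY_0003": ProteinExistence.UNCERTAIN,
    }
    missing = [k for k, v in toy_entries.items() if is_missing_protein(v)]
    print(missing)
    print(count_by_level(toy_entries))
```

Using an integer-backed enum keeps the levels comparable with the numeric PE values while making the PE5 exclusion explicit in one place.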

Lane L, Bairoch A, Beavis RC et al., Metrics for the Human Proteome Project 2013-2014 and strategies for finding missing proteins. J Proteome Res 2014 Jan 3;13(1):15-20. doi: 10.1021/pr401144x.

Deutsch EW, Overall CM, Van Eyk JE et al., Human Proteome Project Mass Spectrometry Data Interpretation Guidelines v2.1. J Proteome Res 2016 Nov 4;15(11):3961-3970. doi: 10.1021/acs.jproteome.6b00392.
uPE1 Proteins Uncharacterized PE1 proteins (uPE1s) devoid of any functional annotation in neXtProt or annotated only with broad Gene Ontology Molecular Function/Biological Process terms not linked to any specific function. As of neXtProt release 2018-01-17, there were 1,260 uPE1 proteins. The current list of uPE1 proteins can be retrieved at:

* Currently neXtProt excludes these 11 broad Gene Ontology (GO) function terms: GO:0005509, calcium ion binding; GO:0008270, zinc ion binding; GO:0005515, protein binding; GO:0042802, identical protein binding; GO:0051260, protein homooligomerization; GO:0005524, ATP binding; GO:0000287, magnesium ion binding; GO:0003676, nucleic acid binding; GO:0003824, catalytic activity; GO:0007165, signal transduction; GO:0035556, intracellular signal transduction. In the next neXtProt release this query will be refined by adding three terms: GO:0046914, transition metal ion binding; GO:0046872, metal ion binding; and GO:0035556, intracellular signal transduction.
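
The uPE1 rule above amounts to a set difference: a PE1 entry is treated as uncharacterized when nothing remains after the broad GO terms are removed from its functional annotations. The Python sketch below illustrates that reading under simplifying assumptions. It is not the neXtProt query itself, the function name `is_upe1` is hypothetical, and it considers only GO identifiers, whereas the real query may take other annotation types into account.

```python
# The 11 broad GO terms listed in the footnote above.
BROAD_GO_TERMS = frozenset({
    "GO:0005509",  # calcium ion binding
    "GO:0008270",  # zinc ion binding
    "GO:0005515",  # protein binding
    "GO:0042802",  # identical protein binding
    "GO:0051260",  # protein homooligomerization
    "GO:0005524",  # ATP binding
    "GO:0000287",  # magnesium ion binding
    "GO:0003676",  # nucleic acid binding
    "GO:0003824",  # catalytic activity
    "GO:0007165",  # signal transduction
    "GO:0035556",  # intracellular signal transduction
})


def is_upe1(pe_level: int, go_terms: set) -> bool:
    """Return True for a PE1 entry whose GO annotations are all broad.

    Assumption: `go_terms` holds the entry's GO Molecular Function and
    Biological Process identifiers as strings such as "GO:0005515".
    """
    if pe_level != 1:
        return False
    specific_terms = set(go_terms) - BROAD_GO_TERMS
    return not specific_terms


if __name__ == "__main__":
    print(is_upe1(1, set()))                         # no annotation at all
    print(is_upe1(1, {"GO:0005515"}))                # only a broad term
    print(is_upe1(1, {"GO:0005515", "GO:0004672"}))  # has a specific term
```

Keeping the excluded terms in a single frozen set makes it straightforward to extend the list when a later release adds further broad terms.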

Paik YK, Omenn GS, Hancock WS et al., Advances in the Chromosome-Centric Human Proteome Project: looking to the future. Expert Rev Proteomics 2017 Dec;14(12):1059-1071. doi: 10.1080/14789450.2017.1394189.
Missing Proteins Missing proteins (MPs) are defined as those protein entries that belong to categories PE2,3,4 in neXtProt. They correspond to confidently predicted proteins that lack sufficient experimental data from mass spectrometry or other direct protein methods to qualify as PE1.
The annual HPP Metrics papers in the special issues of the Journal of Proteome Research assess progress on identifying MPs and strategies needed to enrich and detect MPs.
The current list of MPs, based on neXtProt curation, can be retrieved at:

Paik YK, Jeong SK, Omenn GS et al., The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome. Nat Biotechnol 2012 Mar 7;30(3):221-3. doi: 10.1038/nbt.2152.

Lane L, Bairoch A, Beavis RC et al., Metrics for the Human Proteome Project 2013-2014 and strategies for finding missing proteins. J Proteome Res 2014;13(1):15-20. doi: 10.1021/pr401144x.

Baker MS et al., Accelerating the search for the missing proteins in the human proteome. Nat Commun 2017;8:14271. doi: 10.1038/ncomms14271.
HPP Guidelines The HPP Mass Spectrometry Data Interpretation Guidelines version 2.1.0 ("the Guidelines") provide a set of expectations for data interpretation of MS data that is contributed to the HPP. There are broadly two sections, one that applies to all datasets, including data deposition requirements and false discovery rate thresholds, and a second that provides enhanced expectations for evidence of detections of missing proteins or translation products not currently listed in neXtProt.
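
To make the notion of a false discovery rate threshold concrete, the sketch below estimates FDR with the common target-decoy approach (decoy matches divided by target matches above a score cutoff) and finds the lowest cutoff that stays within a chosen limit. This is a generic illustration rather than a procedure taken from the Guidelines: the function names are hypothetical, the 1% default is an assumed example value, and the actual thresholds and the levels at which they apply should be taken from the Guidelines themselves.

```python
def estimated_fdr(matches, cutoff):
    """Estimate FDR as decoys / targets among matches scoring >= cutoff.

    `matches` is an iterable of (score, is_decoy) pairs.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= cutoff and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= cutoff and is_decoy)
    if targets == 0:
        # No targets pass: treat as fully false if any decoy passes.
        return 1.0 if decoys else 0.0
    return decoys / targets


def passing_cutoff(matches, max_fdr=0.01):
    """Return the lowest observed score whose estimated FDR <= max_fdr.

    The 0.01 default is an assumed example value. Returns None when no
    cutoff satisfies the limit.
    """
    matches = list(matches)
    best = None
    for cutoff in sorted({score for score, _ in matches}, reverse=True):
        if estimated_fdr(matches, cutoff) <= max_fdr:
            best = cutoff
    return best


if __name__ == "__main__":
    # Made-up scores: four target matches and one decoy match.
    toy = [(10, False), (9, False), (8, False), (7, True), (6, False)]
    print(passing_cutoff(toy))
    print(passing_cutoff(toy, max_fdr=0.25))
```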

Deutsch EW, Overall CM, Van Eyk JE, Baker MS, Paik YK, Weintraub ST, Lane L, Martens L, Vandenbrouck Y, Kusebauch U, Hancock WS, Hermjakob H, Aebersold R, Moritz RL, Omenn GS. Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 2.1. J Proteome Res 2016 Nov 4;15(11):3961-3970. doi: 10.1021/acs.jproteome.6b00392.
Dark Proteome The dark proteome is a colloquial term that includes missing proteins (PE2 - PE4), uncertain/dubious predicted proteins (PE5), uPE1 proteins, small proteins encoded by small open reading frames (smORFs), and any proteins translated from long non-coding RNAs or uncharacterized transcripts, including those arising from non-coding regions of DNA and/or novel alternative splicing.
Proteoforms Alternative protein products from the same gene resulting from genomic sequence alterations, alternative splicing, RNA editing, post-translational modifications of amino acid side chains, and proteolytic processing events.
Smith LM, Kelleher NL, Consortium for Top Down Proteomics. Proteoform: a single term describing protein complexity. Nat Methods 2013 Mar;10(3):186-7. doi: 10.1038/nmeth.2369.

LeDuc RD, Schwammle V, Shortreed MR, et al., ProForma: A Standard Proteoform Notation. J Proteome Res 2018 Mar 2;17(3):1321-1325. doi: 10.1021/acs.jproteome.7b00851.

Aebersold, R, Agar, JN, et al. How Many Proteoforms are there? Nature Chemical Biology 2018; 14:206-214. doi: 10.1038/nchembio.2576
neXt-MP50 A specific two-year C-HPP initiative, announced in September 2016, that aims to accelerate the identification and validation of the existence of 50 currently missing proteins per chromosome team while incorporating progress from the entire international proteomics community.
Paik YK, Omenn GS, Hancock WS et al., Advances in the Chromosome-Centric Human Proteome Project: looking to the future. Expert Rev Proteomics 2017 Dec;14(12):1059-1071. doi: 10.1080/14789450.2017.1394189.
Omenn GS, Lane L, Lundberg EK, Overall CM, Deutsch EW. Progress on the HUPO Draft Human Proteome: 2017 Metrics of the Human Proteome Project. J Proteome Res. 2017 Dec 1;16(12):4281-4287. doi: 10.1021/acs.jproteome.7b00375. Epub 2017 Oct 9.
neXt-CP50 A specific C-HPP initiative, announced in September 2017, in which more than 14 C-HPP working groups aim to characterize one or more cellular functions of 50 uPE1 proteins within 3 years.
Paik YK, Overall CM, Deutsch EW et al., Progress and Future Direction of Chromosome-Centric Human Proteome Project. J Proteome Res 2017 Dec 1;16(12):4253-4258. doi: 10.1021/acs.jproteome.7b00734.
Popular Proteins Popular Proteins is a B/D-HPP initiative to identify the proteins most cited in PubMed in relation to health and disease, and thereby to stimulate wide use of targeted proteomics in the life sciences and biomedical community.

Note: Two algorithms (references 1 and 2) can be used to assist B/D-HPP initiatives and researchers with a particular interest in a disease, state, or organ.
The aim is to develop mass spectrometry-based assays that allow easier and more accurate quantification, across all fields of science, of the proteins that are currently most studied. Planned expansion covers popular PTMs and the identification of the proteins and specific amino acid residues most cited in PubMed for particular PTMs.

1. Lam MP, Venkatraman V, Xing Y et al., Data-Driven Approach To Determine Popular Proteins for Targeted Proteomics Translation of Six Organ Systems. J Proteome Res 2016 Nov 4;15(11):4126-4134. doi: 10.1021/acs.jproteome.6b00095.

2. Yu KH, Lee TM, Wang CS et al., Systematic Protein Prioritization for Targeted Proteomics Studies through Literature Mining. J Proteome Res 2018 Apr 6;17(4):1383-1396. doi: 10.1021/acs.jproteome.7b00772.
HPA The Human Protein Atlas (HPA) is a Sweden-based program, started in 2003, that aims to map all human proteins in cells, tissues, and organs, primarily using antibodies for immunohistochemistry and immunofluorescence. The HPA has been expanded to incorporate transcriptomes and systems biology. The HPA has provided the Affinity-based Resource Pillar for the HPP from the outset.

Uhlen M et al., Tissue-based map of the human proteome. Science 2015;347(6220):1260419. doi: 10.1126/science.1260419.

Thul PJ, Akesson L, Wiking M et al., A subcellular map of the human proteome. Science 2017 May 26;356(6340). doi: 10.1126/science.aal3321.
ProteomeXchange ProteomeXchange is a consortium of proteomics data repositories that support public submission of data, stimulated by the HPP and built by the founding members PRIDE at the European Bioinformatics Institute and PeptideAtlas at the Institute for Systems Biology. ProteomeXchange aims to coordinate globally the submission of mass spectrometry proteomics data to the main existing proteomics repositories and to facilitate optimal dataset dissemination and access. It includes PRIDE, PeptideAtlas, MassIVE, jPOST, iProX, and Panorama Public.

Vizcaino JA, Deutsch EW, Wang R et al., ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat Biotechnol 2014 Mar 32(3):223-6. doi: 10.1038/nbt.2839.

Deutsch EW, Csordas A, Sun Z, et al., The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition, Nucleic Acids Res. 2017 Jan 4;45(D1):D1100-D1106. doi: 10.1093/nar/gkw936. Epub 2016 Oct 18.
PeptideAtlas PeptideAtlas is a public data repository developed by the Institute for Systems Biology that accepts submissions of proteomics mass spectrometry datasets from laboratories all over the world, reprocesses them with the Trans-Proteomic Pipeline suite of software tools, collates all datasets into a global view of the human proteome as observed in MS datasets, informs neXtProt of HPP Guidelines-compliant findings, and makes all the results publicly available to the community.

Desiere F, Deutsch EW, King NL et al., The PeptideAtlas project. Nucleic Acids Res 2006 Jan 1;34(Database issue):D655-8. doi: 10.1093/nar/gkj040.

Deutsch EW, Sun Z, Campbell D, Kusebauch U, Chu CS, Mendoza L, Shteynberg D, Omenn GS, Moritz RL. State of the Human Proteome in 2014/2015 As Viewed through PeptideAtlas: Enhancing Accuracy and Coverage through the AtlasProphet. J Proteome Res. 2015 Sep 4;14(9):3461-73. doi: 10.1021/acs.jproteome.5b00500.

Schwenk JM, Omenn GS, Sun Z, Campbell DS, Baker MS, Overall CM, Aebersold R, Moritz RL, Deutsch EW. The Human Plasma Proteome Draft of 2017: Building on the Human Plasma PeptideAtlas from Mass Spectrometry and Complementary Assays. J Proteome Res. 2017 Dec 1;16(12):4299-4310. doi: 10.1021/acs.jproteome.7b00467.
neXtProt neXtProt is an online knowledge platform with extensive information on human proteins. It integrates data from UniProtKB/Swiss-Prot, PeptideAtlas, SRMAtlas, HPA and many other resources and offers an advanced search capacity based on the SPARQL technology and an API that allows users to programmatically extract the data stored in the resource. It also provides a peptide uniqueness checker that takes >5 million validated single amino acid variants (SAAVs) into account.

The HPP uses the PE levels assigned by neXtProt to monitor progress made collectively by the scientific community toward the complete experimental validation of the “parts list” of the human proteome.
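
The idea behind a peptide uniqueness check that accounts for single amino acid variants can be illustrated with a toy example: a peptide that maps to one entry when only canonical sequences are searched may map to a second entry once that entry's variant form is considered. The sketch below is a simplified illustration and does not reproduce the neXtProt tool or its API. The function names, entry names, sequences, and the variant are all made up.

```python
def apply_saav(sequence, position, ref, alt):
    """Return `sequence` with the residue at 1-based `position` changed.

    Raises ValueError if the reference residue does not match.
    """
    if sequence[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return sequence[:position - 1] + alt + sequence[position:]


def matching_entries(peptide, canonical, variants=None):
    """Return the set of entries whose canonical or variant form
    contains `peptide` as an exact substring.

    `canonical` maps entry name -> sequence.
    `variants` maps entry name -> list of (position, ref, alt) tuples,
    each applied on its own to the canonical sequence.
    """
    variants = variants or {}
    hits = set()
    for entry, seq in canonical.items():
        forms = [seq]
        forms += [apply_saav(seq, *v) for v in variants.get(entry, [])]
        if any(peptide in form for form in forms):
            hits.add(entry)
    return hits


if __name__ == "__main__":
    # Made-up sequences and a made-up V6I variant in ENTRY_B.
    canonical = {"ENTRY_A": "MKTAYIAKQR", "ENTRY_B": "MSTAYVAKGG"}
    variants = {"ENTRY_B": [(6, "V", "I")]}
    print(matching_entries("TAYIAK", canonical))
    print(matching_entries("TAYIAK", canonical, variants))
```

In this toy case the peptide appears unique to ENTRY_A until the variant of ENTRY_B is taken into account, which is the situation a variant-aware check is meant to catch.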

Lane L, Argoud-Puy G, Britan A et al., neXtProt: a knowledge platform for human proteins. Nucleic Acids Res 2012 Jan 40(Database issue):D76-83. doi: 10.1093/nar/gkr1179.

Gaudet P, Argoud-Puy G, Cusin I et al., neXtProt: organizing protein knowledge in the context of human proteome projects. J Proteome Res 2013 Jan 4;12(1):293-8. doi: 10.1021/pr300830v.

Gaudet P, Michel PA, Zahn-Zabal M et al., The neXtProt knowledgebase on human proteins: 2017 update. Nucleic Acids Res 2017 Jan 4;45(D1):D177-D182. doi: 10.1093/nar/gkw1062.

Schaeffer M, Gateau A, Teixeira D, Michel PA, Zahn-Zabal M, Lane L. The neXtProt peptide uniqueness checker: a tool for the proteomics community. Bioinformatics 2017 Nov 1;33(21):3471-3472. doi: 10.1093/bioinformatics/btx318.
SRMAtlas SRMAtlas is a compendium of the best-available selected reaction monitoring (SRM) transitions for nearly every human protein, drawn from data generated on multiple QTOF and QQQ instruments. The data were produced by targeted proteomics and are used to validate spectral evidence for the detection of missing proteins. SRMAtlas is also the basis of the popular proteins strategy of the B/D-HPP.

Kusebauch U, Campbell DS, Deutsch EW et al., Human SRMAtlas: A Resource of Targeted Assays to Quantify the Complete Human Proteome. Cell 2016 Jul 28;166(3):766-778. doi: 10.1016/j.cell.2016.06.041.