C-HPP

A proposed short- and long-term plan to characterize human dark proteins of unknown function. The terms neXt-CP50 and neXt-CP2000 stand for characterizing 50 uPE1 proteins, a small set of dark proteins, in 3 years (a pilot), and then ~2000 dark proteins (1260 uPE1 plus 677 uMPs) over a longer period. 'CP' stands for characterization of proteins with unknown functions. (From Paik et al., J. Proteome Res. 2018, 17, 4042–4050)

We believe that high-quality, extensive proteome maps are achievable within a planned 10-year period. During Phase I (6 years), the C-HPP group plans to map all proteins lacking good-quality mass spectrometry (MS) evidence, three major classes of PTMs (phosphorylation, glycosylation, acetylation), one representative alternative splicing transcript (AST) product [Menon, R. et al., Methods Mol Biol. 2011, 696, 319-26] and one non-synonymous SNP (nsSNP) product per protein, and protein distribution in a major organ/tissue of interest. C-HPP will use all high-quality consortium-generated proteomic datasets for focused analysis of individual chromosomes. In Phase II (4 years), identified proteins will be further characterized and validated at the genomic/transcriptomic and cellular levels in the 4 selected tissues of interest. C-HPP outputs will also be integrated with all biology/disease-driven HPP (B/D-HPP) research. We will also correlate C-HPP and B/D-HPP study results with recent SNP and haplotype mapping studies [Im, KM. et al., Hum Genet. 2011, 130, 685-99].

Table. The two proposed phases of C-HPP and its short-term and longer-term challenges [Paik, YK. et al., J Proteome Res. 2012, Just Accepted]

Phase	Phase I	Phase II
Years	6	4
Milestones	Organization, SOP and guidelines; Mapping and characterization of 6,000 missing proteins having no MS evidence^a; Mapping the predicted three major PTMs (phosphoryl-, glycosyl-, acetyl-) of 10,000 well-known proteins^b; One representative AST and nsSNP of predicted ca. 14,300 well-known proteins; Genomic/transcriptomic/proteomic validation of predicted ca. 14,300 well-known proteins; Cellular localization and quantitation of newly identified proteins in at least three tissues	Further characterization of whole human proteins (~20,300) with respect to gene location on each chromosome, cellular distribution and quantitation; Validation of three major PTMs present in 20,300 human proteins; Validation of one representative AST and nsSNP of 20,300 well-known proteins; Genomic/transcriptomic validation of whole proteins in at least three representative tissues; Development of drug targets and biomarker candidates of interest; Functional studies of gene families/clusters in each chromosome
Coping with short-term and longer-term challenges	Short-Term	Solutions
	Sustainable funding/cost savings; Effective cross-analysis and integration of different datasets; Handling of data variability; Harmonization with biology-driven projects; Procurement of affinity capture reagents	Sharing resources, data, reagents (AB), and reference specimens; Employing neXtProt, dbSNP, GPMDB and PeptideAtlas; Standard data submission system/criteria; Sharing data through the C-HPP portal and other public DBs; Close collaborations between providers and C-HPP groups
	Longer-Term	Solutions
	Enhanced detection limit for low-abundance (rare) proteins; Improved pretreatment of clinical specimens for characterization and SRM analysis; Inclusion of PTM information in different datasets; Sample bio-banking and maintenance	Miniaturization of sample preparation/efficient fractionation; Continued refinement of the non-redundant protein list; Development of new algorithms for inclusion of PTMs; Collaboration with government agencies

^aLess well-known: proteins that have only transcriptomic evidence and no proteomic MS data (about 6,000 proteins).
^bWell-known: proteins that have both transcriptomic and proteomic MS data. The data for proteins under investigation will be integrated into one common C-HPP portal through contributions from each chromosome team.