A proposed short- and long-term plan to characterize human dark proteins of unknown function. neXt-CP50 denotes a 3-year pilot to characterize 50 uPE1 proteins, a small set of dark proteins. neXt-CP2000 denotes the subsequent characterization of ~2,000 dark proteins (1,260 uPE1 plus 677 uMPs, 1,937 in total) over a longer period. 'CP' stands for characterization of proteins of unknown function.
(from Paik et al., J. Proteome Res. 2018, 17, 4042–4050)
We believe that high-quality, extensive proteome maps are achievable within a planned 10-year period.
During Phase I (6 years), the C-HPP group plans to map all proteins lacking good-quality MS evidence, three major classes of post-translational modifications (PTMs), one representative alternative splicing transcription (AST) product [Menon, R. et al., Methods Mol. Biol. 2011, 696, 319–326] and one non-synonymous SNP (nsSNP) product, and protein distribution in a major organ or tissue of interest. C-HPP will use all high-quality, consortium-generated proteomic datasets for focused analysis of individual chromosomes. In Phase II (4 years), the identified proteins will be further characterized and validated at the genomic/transcriptomic and cellular levels in four selected tissues of interest. C-HPP outputs will also be integrated with all biology/disease-driven HPP (B/D-HPP) research, and the results of C-HPP and B/D-HPP studies will be correlated with recent SNP and haplotype mapping studies [Im, K. M. et al., Hum. Genet. 2011, 130, 685–699].
Table. The two proposed phases of C-HPP and their short- and long-term challenges [Paik, Y. K. et al., J. Proteome Res. 2012, just accepted]
Phase I (6 years):
- Organization, SOPs and guidelines
- Mapping and characterization of 6,000 missing proteins lacking MS evidenceᵃ
- Mapping the three major predicted PTMs (phosphorylation, glycosylation, acetylation) of 10,000 well-known proteinsᵇ
- One representative AST and nsSNP of the ca. 14,300 predicted well-known proteins
- Genomic/transcriptomic/proteomic validation of the ca. 14,300 predicted well-known proteins
- Cellular localization and quantitation of newly identified proteins in at least three tissues
Phase II (4 years):
- Further characterization of all ~20,300 human proteins with respect to gene location on each chromosome, cellular distribution and quantitation
- Validation of the three major PTMs in all 20,300 human proteins
- Validation of one representative AST and nsSNP of the 20,300 well-known proteins
- Genomic/transcriptomic validation of all proteins in at least three representative tissues
- Development of drug targets and biomarker candidates of interest
- Functional studies of gene families/clusters on each chromosome
Coping with short- and long-term challenges:
Short-term challenges:
- Sustainable funding/cost savings
- Effective cross-analysis and integration of different datasets
- Handling of data variability
- Harmonization with biology-driven projects
- Procurement of affinity capture reagents
Solutions:
- Sharing resources, data, reagents (antibodies) and reference specimens
- Employing neXtProt, dbSNP, GPMDB and PeptideAtlas
- Standard data submission system/criteria
- Sharing data through the C-HPP portal and other public databases
- Close collaborations between providers and C-HPP groups
Long-term challenges:
- Improved detection limits for low-abundance (rare) proteins
- Improved pretreatment of clinical specimens for characterization and SRM analysis
- Inclusion of PTM information in different datasets
- Sample biobanking and maintenance
Solutions:
- Miniaturization of sample preparation/efficient fractionation
- Continued refinement of the non-redundant protein list
- Development of new algorithms for inclusion of PTMs
- Collaboration with government agencies
ᵃ Less well-known (missing) proteins: proteins with transcriptomic evidence but no proteomic MS data (about 6,000 proteins).
ᵇ Well-known proteins: proteins with both transcriptomic and proteomic MS data.
Data for the proteins under investigation, contributed by each chromosome team, will be integrated into one common C-HPP portal.
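The two footnote definitions reduce to a simple rule on two kinds of evidence. The following is a minimal sketch only, assuming each protein is summarized by two hypothetical boolean flags. The names `ProteinEvidence`, `evidence_category` and `count_categories`, and the placeholder identifiers, are illustrative and do not come from C-HPP, neXtProt or any other resource.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ProteinEvidence:
    """Hypothetical evidence record for one protein entry (illustrative only)."""
    accession: str                 # placeholder identifier, not a real accession
    has_transcript_evidence: bool  # transcript-level evidence exists
    has_ms_evidence: bool          # proteomic MS evidence exists


def evidence_category(p: ProteinEvidence) -> str:
    """Apply the footnote definitions: both kinds of evidence -> 'well-known';
    transcript evidence without MS data -> 'less well-known'.
    Any other combination falls outside the two definitions."""
    if p.has_transcript_evidence and p.has_ms_evidence:
        return "well-known"
    if p.has_transcript_evidence:
        return "less well-known"
    return "not covered"


def count_categories(proteins: Iterable[ProteinEvidence]) -> Dict[str, int]:
    """Tally proteins per evidence category, in first-seen order."""
    return dict(Counter(evidence_category(p) for p in proteins))


if __name__ == "__main__":
    sample = [
        ProteinEvidence("PROT_A", True, True),
        ProteinEvidence("PROT_B", True, False),
        ProteinEvidence("PROT_C", False, True),
    ]
    print(count_categories(sample))
    # {'well-known': 1, 'less well-known': 1, 'not covered': 1}
```

Under these definitions the Phase I figures are complementary: about 6,000 less well-known plus ca. 14,300 well-known proteins give the ~20,300 human proteins targeted in Phase II. Proteins with MS data but no transcript evidence are not addressed by either definition, so the sketch reports them separately rather than assigning them a category.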