SimChrom - an interactive resource to explore function, composition and localization of human chromatin proteins


Supplementary material to

"(Re)defining the human chromatome:
an integrated meta-analysis of localization, function, abundance, physical properties and domain composition of chromatin proteins"


(by A.K. Gribkova, G.A. Armeev, M.P. Kirpichnikov, A.K. Shaytan)

The resource provides interactive tools to analyze human nuclear and chromatin proteins with respect to their subnuclear localization (based on data from UniProt, HPA, OpenCell), chromatin category according to the developed simplified chromatin protein classification (SimChrom), protein abundance (from PaxDb), and domain architecture and composition (based on Pfam and predicted in TED). Preprocessed MS-based chromatome datasets and reference sets of nuclear and non-nuclear proteins are also available for download.

Interactive Figure 1. Overview of nuclear structure and composition.

Interactive Figure 2. Comparative analysis of protein localization ontologies and their content between UniProt, the Human Protein Atlas (HPA) and OpenCell.
Clicking the checkboxes of individual localization terms selects the proteins shown in the table below. When several checkboxes are selected, the Boolean operation chosen with the toggle above is applied.

Interactive Figure 3. The SimChrom chromatin proteins set and its classification according to the SimChrom ontology.

Clicking the categories selects them for display in the list of proteins in the table below. When several categories are selected, the Boolean operation chosen with the toggle above is applied. The number of classified proteins is displayed next to each category.
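
The selection logic described above can be sketched as a set operation over category membership. This is a minimal illustration, not the resource's actual implementation: the function name `select_proteins`, the category names and the protein identifiers are all hypothetical, and the toggle is assumed to offer AND (intersection) and OR (union).

```python
def select_proteins(category_members, selected, mode="OR"):
    """Combine the protein sets of the selected categories.

    category_members: dict mapping a category name to a set of protein IDs
    selected: names of the categories that are currently ticked
    mode: "AND" keeps proteins present in every selected category,
          "OR" keeps proteins present in at least one of them
    """
    sets = [category_members[name] for name in selected]
    if not sets:
        return set()  # nothing ticked, nothing to show
    if mode == "AND":
        return set.intersection(*sets)
    if mode == "OR":
        return set.union(*sets)
    raise ValueError(f"unknown mode: {mode!r}")


# Toy data with hypothetical category names and protein IDs.
toy_categories = {
    "category_a": {"P1", "P2", "P3"},
    "category_b": {"P2", "P3", "P4"},
    "category_c": {"P3", "P5"},
}

both = select_proteins(toy_categories, ["category_a", "category_b"], mode="AND")
either = select_proteins(toy_categories, ["category_a", "category_b"], mode="OR")
print(sorted(both))    # ['P2', 'P3']
print(sorted(either))  # ['P1', 'P2', 'P3', 'P4']
```

The same function applies unchanged to the localization checkboxes of Interactive Figure 2 if localization terms are used as the dictionary keys.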


Classification:
Standard SimChrom
Single label SimChrom-SL
Interactive Figure 4. The most representative protein domains/families (according to Pfam) in different categories of chromatin proteins.
The classification system (standard or single label) and the threshold for the data point display are selected below.
Interactive Figure 5. Co-occurrence of Pfam protein domains in chromatin/epigenetic regulator proteins.

From the list of Pfam domain type pairs that were found in at least three chromatin regulator proteins, domains involved in various histone post-translational modifications, chromatin remodeling, histone binding, DNA binding and protein dimerization/oligomerization were manually selected and grouped into the respective categories based on the information currently available in the literature. The conditional probability of finding a corresponding domain A in a chromatin protein given that another domain B is already present was estimated and is presented below (columns and rows correspond to domains A and B, respectively). The number of proteins with the corresponding domain pair is given for all SimChrom proteins. Clicking on individual squares selects the respective proteins for display in the table below.
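
To make the quantity concrete, the sketch below estimates P(A | B) as the number of proteins containing both domains divided by the number of proteins containing domain B. The exact estimator used for the figure is not stated here, so this count ratio is an assumption; the function name, the `min_pair_count` parameter (mirroring the "at least three proteins" filter) and the toy domain names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def conditional_cooccurrence(protein_domains, min_pair_count=3):
    """Estimate P(A | B) for every ordered domain pair (A, B).

    protein_domains: dict mapping a protein ID to the set of its domain types
    min_pair_count: pairs found in fewer proteins than this are dropped
    Returns a dict {(A, B): (probability, number of proteins with both)}.
    """
    domain_counts = Counter()  # proteins containing each domain
    pair_counts = Counter()    # proteins containing both domains of a pair
    for domains in protein_domains.values():
        domain_counts.update(domains)
        pair_counts.update(permutations(sorted(domains), 2))

    result = {}
    for (a, b), n_both in pair_counts.items():
        if n_both >= min_pair_count:
            result[(a, b)] = (n_both / domain_counts[b], n_both)
    return result


# Toy input with hypothetical protein IDs and domain names.
toy_proteins = {
    "P1": {"X", "Y"},
    "P2": {"X"},
    "P3": {"X", "Y"},
    "P4": {"Y", "Z"},
}

probs = conditional_cooccurrence(toy_proteins, min_pair_count=1)
print(probs[("Y", "Z")])  # (1.0, 1): every protein with Z also has Y
```

Note that the matrix is not symmetric: in the toy data P(Y | Z) is 1, whereas P(Z | Y) is 1/3, because Y also occurs in proteins without Z.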

Interactive Table 3. The list of novel structural domains identified in chromatin proteins via AlphaFold-based predictions in the TED resource.

| Reference | Chromatin state specificity | Chromatin purification methods (short) and additional computational filtering | Type of cells | Proteins identified (by authors) | Processed number of proteins | Download protein list |
| --- | --- | --- | --- | --- | --- | --- |
| Torrente et al., 2011 | Total chromatin, euchromatin, heterochromatin | (1) Total chromatin extraction with hypotonic lysis, Triton X-100 permeabilization, low-speed centrifugation, and EDTA-mediated nuclear lysis. (2) Salt extraction using high salt buffer (420 mM KCl), sonication, centrifugation, followed by dialysis. (3) Micrococcal nuclease (MNase) digestion, including total digestion and partial digestion to separate euchromatin and heterochromatin fractions. | HeLa S3 | 1038 (total chromatin extraction), 1388 (salt), 949 (MNase); 751 (partial MNase); 1912 (all identified chromatin proteins) | 1501 | torrente_2011 |
| Kustatscher et al., 2014 | Total interphase chromatin | Chromatin Enrichment for Proteomics (ChEP): in vivo crosslinking with 1% formaldehyde, followed by differential extraction under denaturing conditions (SDS, urea), RNase A treatment, centrifugation-based chromatin pelleting, and sonication. To assess chromatin association, the study applied Multiclassifier Combinatorial Proteomics (MCCP), which integrates SILAC-based quantitative proteomics from 35 biochemical and biological perturbation experiments. A random forest machine learning algorithm was trained on curated chromatin and non-chromatin reference proteins to assign each detected protein an interphase chromatin probability score (ICP). | HeLa, MCF-7, HepG2, HEK293, U2OS, DT40 | 1980 (chromatin proteins with ICP>0.5); 7635 (total chromatin proteins with ICP values) | 1956 | kustatscher_2014 |
| Alabert et al., 2014 | Nascent vs. mature chromatin | Nascent Chromatin Capture (NCC): biochemical isolation of newly replicated chromatin using biotin–dUTP incorporation. Cells were pulse-labelled with biotin–dUTP during DNA replication and fixed after either 20 min (nascent chromatin) or 2 h (mature chromatin). Chromatin was crosslinked with 2% formaldehyde, nuclei were isolated, and chromatin was sheared to 2–3 kb by sonication. Biotin-labelled DNA–protein complexes were isolated using streptavidin beads. For proteomic analysis, nascent and mature chromatin were metabolically labeled by SILAC and processed together. | HeLa S3 | 426 (nascent-enriched); 3995 (all identified chromatin proteins) | 3861 | alabert_2014 |
| Itzhak et al., 2016 | Nuclear proteome | Cells were metabolically labeled with SILAC and gently lysed under hypo-osmotic conditions to preserve organelle integrity. Post-nuclear supernatants were fractionated by differential centrifugation into five sub-organellar fractions plus cytosolic and nuclear pellets. Protein abundance profiles across SILAC fractions were processed by PCA and classified using a supervised SVM algorithm trained on curated organelle markers. In parallel, total intensities in the nuclear, cytosolic, and organellar fractions (from label-free MS) were used to assign each protein to global classes such as "mostly nuclear", based on relative signal distribution. | HeLa | 1133 (nuclear); 672 (nucleo-cytosolic); 8710 (total proteome) | 1092 | itzhak_2016 |
| Ginno et al., 2018 | Total chromatin: time-resolved (G1, S, M) | Density-based enrichment for mass spectrometry analysis of chromatin (DEMAC): formaldehyde-fixed cells were sonicated, subjected to cesium chloride (CsCl) gradient ultracentrifugation to isolate DNA–protein complexes by buoyant density (1.39 g/cm³). Chromatin fractions were collected, dialyzed, decrosslinked, digested with DNase I. | Human T98G (glioblastoma) | 3065 (chromatome); 6242 (total proteome) | 3051 | ginno_2018 |
| Shi et al., 2021 | Promoter-proximal chromatin | Hi-MS (Hi-C-based proteomics, adapted from BL-Hi-C): cells crosslinked with 1% formaldehyde; genomic DNA digested with HaeIII (GGCC sites); ligated with biotinylated bridge linkers; nuclei lysed in 0.2% SDS; chromatin sonicated; chromatin-DNA complexes captured on streptavidin beads. Quantified sensitivity to 1,6-hexanediol evaluated via AICAP index (Anti-1,6-Hexanediol Index of Chromatin-Associated Proteins). | K562 | 3228 | 2848 | shi_2021 |
| Ugur et al., 2023 | Total chromatin | Chromatin Aggregation Capture (ChAC): nuclei fixed with 1% formaldehyde, lysed with SDS and urea, sonicated, and purified by protein aggregation capture (PAC) on magnetic beads. DIA-MS with DIA-NN used for quantification. | Human ESCs (H9) | 2487 | 1730 | ugur_2023 |
| Alvarez et al., 2023 | Time-resolved (nascent, G2/M, early and late G1) | Nascent Chromatin Capture (NCC) method, which relies on pulse-labeling newly replicated DNA with biotin-dUTP, followed by formaldehyde crosslinking and sonication-based chromatin fragmentation. Biotinylated DNA-protein complexes were affinity-purified using streptavidin magnetic beads. HeLa S3 cells were synchronized and harvested at five post-replication time points (Nasc, Late S, G2/M, early G1, late G1) across six biological replicates. | HeLa S3 | 1454 (present at all time points in all 6 replicates; from total of 5770) | 1478 (2894 total) | alvarez_2023 |
| Alvarez et al., 2023 | Time-resolved (nascent, G2/M, early and late G1) | isolation of Proteins On Nascent DNA (iPOND): formaldehyde crosslinking (1%), EdU labeling for 15 minutes, click chemistry with biotin-azide, chromatin fragmentation, streptavidin bead enrichment. | TIG-3 fibroblasts | 2351 (detected in 4 to 5 of 5 replicates) | 2397 (2894 total) | alvarez_2023 |

| Dataset name (download protein list) | Definition of dataset | Number of proteins |
| --- | --- | --- |
| NULOC_CS | entries annotated as nuclear in both databases: UniProt (provided evidence code is available) AND HPA (with evidence tags: Enhanced, Supported, Approved), excludes proteins labeled only as non-nuclear in the OpenCell database (annotation grade 2 or 3) | 3296 |
| NULOC_CS_NECF | entries annotated as nuclear in both databases: UniProt AND HPA, excludes proteins labeled only as non-nuclear in the OpenCell database | 3988 |
| NULOC_CS_UL | entries annotated only as nuclear in both databases: UniProt (provided evidence code is available) AND HPA (with reliability score: Enhanced, Supported, Approved), excludes proteins labeled only as non-nuclear in the OpenCell database (annotation grade 2 or 3) | 1322 |
| NULOC_CS_UL_NECF | entries annotated only as nuclear in both databases: UniProt AND HPA, excludes proteins labeled only as non-nuclear in the OpenCell database | 1322 |
| NULOC_JT | entries annotated as nuclear in at least one database: UniProt (provided evidence code is available), HPA (with evidence tags: Enhanced, Supported, Approved), OpenCell (annotation grade 2 or 3) | 8048 |
| NULOC_JT_NECF | entries annotated as nuclear in at least one database: UniProt, HPA, OpenCell | 8912 |
| NULOC_JT_UL | entries annotated only as nuclear in at least one database: UniProt (provided evidence code is available) OR HPA (with reliability score: Enhanced, Supported, Approved) OR OpenCell database (annotation grade 2 or 3) | 4292 |
| NULOC_JT_UL_NECF | entries annotated only as nuclear in at least one database: UniProt OR HPA, excludes proteins labeled only as non-nuclear in the OpenCell database | 4292 |
| NON_NULOC_CS | proteins whose localization annotations exclude nuclear localization in both databases: UniProt (provided evidence code is available) AND HPA (with evidence tags: Enhanced, Supported, Approved) | 3674 |
| NON_NULOC_JT | proteins whose localization annotations exclude nuclear localization in at least one database: UniProt (provided evidence code is available) OR HPA (with evidence tags: Enhanced, Supported, Approved) OR OpenCell (annotation grade 2 or 3) | 11479 |
| CYTLOC_CS_UL | proteins with only aggregate cytoplasm annotation in both databases: UniProt (provided evidence code is available) AND HPA (with evidence tags: Enhanced, Supported, Approved) | 2026 |
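
The AND/OR wording in the definitions above maps naturally onto set operations. The following is a minimal sketch of how a consensus (CS-style) and a joint (JT-style) nuclear set could be derived from per-database annotation sets. It is an illustration under assumptions rather than the pipeline actually used: the function names, argument names and protein identifiers are hypothetical, and the evidence-based filtering is assumed to have been applied before the sets are passed in.

```python
def consensus_nuclear(uniprot_nuclear, hpa_nuclear, opencell_non_nuclear_only):
    """CS-style set: nuclear in UniProt AND HPA, minus proteins that
    OpenCell labels only as non-nuclear."""
    return (uniprot_nuclear & hpa_nuclear) - opencell_non_nuclear_only


def joint_nuclear(uniprot_nuclear, hpa_nuclear, opencell_nuclear):
    """JT-style set: nuclear in at least one of the three databases."""
    return uniprot_nuclear | hpa_nuclear | opencell_nuclear


# Toy annotation sets with hypothetical protein IDs.
uniprot_nuclear = {"P1", "P2", "P3"}
hpa_nuclear = {"P2", "P3", "P4"}
opencell_nuclear = {"P3", "P5"}
opencell_non_nuclear_only = {"P2"}

cs = consensus_nuclear(uniprot_nuclear, hpa_nuclear, opencell_non_nuclear_only)
jt = joint_nuclear(uniprot_nuclear, hpa_nuclear, opencell_nuclear)
print(sorted(cs))  # ['P3']
print(sorted(jt))  # ['P1', 'P2', 'P3', 'P4', 'P5']
```

By construction the consensus set is a subset of the joint set, which is consistent with the CS datasets in the table being smaller than their JT counterparts.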
