Supplementary material to

"(Re)defining the human chromatome:
an integrated meta-analysis of localization, function, abundance, physical properties and domain composition of chromatin proteins"

(by A.K. Gribkova, G.A. Armeev, M.P. Kirpichnikov, A.K. Shaytan)

The resource provides interactive tools to analyze human nuclear and chromatin proteins with respect to their subnuclear localization (based on data from UniProt, HPA, OpenCell), chromatin category according to the developed simplified chromatin protein classification (SimChrom), protein abundance (from PaxDb), and domain architecture and composition (based on Pfam and predicted in TED). Preprocessed MS-based chromatome datasets and reference sets of nuclear and non-nuclear proteins are also available for download.

Interactive Figure 1. Overview of nuclear structure and composition.

Interactive Figure 2. Comparative analysis of protein localization ontologies and their content between UniProt, the Human Protein Atlas (HPA) and OpenCell.
Clicking the checkboxes of individual localization terms selects the proteins shown in the table below. When several checkboxes are selected, the Boolean logic operation chosen with the toggle above is applied.
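The selection logic described above can be illustrated with a minimal sketch. It assumes the toggle switches between AND (intersection) and OR (union); the function name `select_proteins`, the term names and the protein identifiers are hypothetical and are not taken from the resource.

```python
def select_proteins(term_to_proteins, selected_terms, mode="OR"):
    """Combine the protein sets of the selected terms.

    mode="AND" keeps proteins annotated with every selected term,
    mode="OR" keeps proteins annotated with at least one of them.
    """
    sets = [set(term_to_proteins[term]) for term in selected_terms]
    if not sets:
        return set()
    if mode == "AND":
        return set.intersection(*sets)
    if mode == "OR":
        return set.union(*sets)
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical annotations (placeholder identifiers, not real data).
annotations = {
    "Term X": {"P1", "P2", "P3"},
    "Term Y": {"P2", "P3", "P4"},
}

both = select_proteins(annotations, ["Term X", "Term Y"], mode="AND")
either = select_proteins(annotations, ["Term X", "Term Y"], mode="OR")
```

Here `both` contains only the proteins shared by the two terms, while `either` contains every protein annotated with at least one of them.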

Interactive Figure 3. The SimChrom chromatin proteins set and its classification according to the SimChrom ontology.

Clicking the categories selects them for display in the list of proteins in the table below. When several categories are selected, the Boolean logic operation chosen with the toggle above is applied. The number of classified proteins is displayed next to each category.

Classification:

Standard SimChrom

Single label SimChrom-SL

Interactive Figure 4. The most representative protein domains/families (according to Pfam) in different categories of chromatin proteins.
The classification system (standard or single label) and the threshold for the data point display are selected below.

Interactive Figure 5. Co-occurrence of Pfam protein domains in chromatin/epigenetic regulator proteins.

From the list of Pfam domain type pairs found in at least three chromatin regulator proteins, domains involved in various histone post-translational modifications, chromatin remodeling, histone binding, DNA binding and protein dimerization/oligomerization were manually selected and grouped into the respective categories based on the information currently available in the literature. The conditional probability of finding domain A in a chromatin protein, given that domain B is already present, was estimated and is presented below (columns and rows correspond to domains A and B, respectively). The number of proteins with the corresponding domain pair is given for all SimChrom proteins. Clicking on individual squares selects the respective proteins for display in the table below.
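As an illustration of the estimate described above, the following minimal sketch computes P(A | B) as the number of proteins containing both domains divided by the number of proteins containing domain B. This counting estimator is an assumption, as are the function name, the input format (protein identifier mapped to a set of domain identifiers) and the toy data; the exact procedure used for the figure is described in the paper.

```python
def conditional_probabilities(protein_domains, min_proteins=3):
    """Estimate P(A | B) for domain pairs found in >= min_proteins proteins.

    protein_domains maps a protein identifier to the set of its domains.
    Returns a dict keyed by (A, B) with the value N(A and B) / N(B).
    """
    # Invert the mapping: domain -> set of proteins containing it.
    domain_to_proteins = {}
    for protein, domains in protein_domains.items():
        for domain in set(domains):
            domain_to_proteins.setdefault(domain, set()).add(protein)

    result = {}
    for b, proteins_b in domain_to_proteins.items():
        for a, proteins_a in domain_to_proteins.items():
            if a == b:
                continue
            shared = proteins_a & proteins_b
            if len(shared) >= min_proteins:
                result[(a, b)] = len(shared) / len(proteins_b)
    return result


# Hypothetical toy data (placeholder protein and domain names).
toy = {
    "P1": {"DomA", "DomB"},
    "P2": {"DomA", "DomB"},
    "P3": {"DomA", "DomB"},
    "P4": {"DomA"},
    "P5": {"DomC"},
}

probs = conditional_probabilities(toy)
```

In this toy example every protein with DomB also carries DomA, so P(DomA | DomB) is 1.0, whereas only three of the four DomA proteins carry DomB, so P(DomB | DomA) is 0.75. The asymmetry is why rows and columns of the matrix are not interchangeable.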

Interactive Table 3. The list of novel structural domains identified in chromatin proteins via AlphaFold-based predictions in the TED resource.

Reference	Chromatin state specificity	Chromatin purification methods (short) and additional computational filtering	Type of cells	Proteins identified (by authors)	Processed number of proteins	Download protein list
Torrente et al., 2011	Total chromatin, euchromatin, heterochromatin	(1) Total chromatin extraction with hypotonic lysis, Triton X-100 permeabilization, low-speed centrifugation, and EDTA-mediated nuclear lysis. (2) Salt extraction using high salt buffer (420 mM KCl), sonication, centrifugation, followed by dialysis. (3) Micrococcal nuclease (MNase) digestion, including total digestion and partial digestion to separate euchromatin and heterochromatin fractions.	HeLa S3	1038 (total chromatin extraction), 1388 (salt), 949 (MNase); 751 (partial MNase); 1912 (all identified chromatin proteins)	1501	torrente_2011
Kustatscher et al., 2014	Total interphase chromatin	Chromatin Enrichment for Proteomics (ChEP): in vivo crosslinking with 1% formaldehyde, followed by differential extraction under denaturing conditions (SDS, urea), RNase A treatment, centrifugation-based chromatin pelleting, and sonication. To assess chromatin association, the study applied Multiclassifier Combinatorial Proteomics (MCCP), which integrates SILAC-based quantitative proteomics from 35 biochemical and biological perturbation experiments. A random forest machine learning algorithm was trained on curated chromatin and non-chromatin reference proteins to assign each detected protein an interphase chromatin probability score (ICP).	HeLa, MCF-7, HepG2, HEK293, U2OS, DT40	1980 (chromatin proteins with ICP>0.5); 7635 (total chromatin proteins with ICP values)	1956	kustatscher_2014
Alabert et al., 2014	Nascent vs. mature chromatin	Nascent Chromatin Capture (NCC): biochemical isolation of newly replicated chromatin using biotin–dUTP incorporation. Cells were pulse-labelled with biotin–dUTP during DNA replication and fixed after either 20 min (nascent chromatin) or 2 h (mature chromatin). Chromatin was crosslinked with 2% formaldehyde, nuclei were isolated, and chromatin was sheared to 2–3 kb by sonication. Biotin-labelled DNA–protein complexes were isolated using streptavidin beads. For proteomic analysis, nascent and mature chromatin were metabolically labeled by SILAC and processed together.	HeLa S3	426 (nascent-enriched); 3995 (all identified chromatin proteins)	3861	alabert_2014
Itzhak et al., 2016	Nuclear proteome	Cells were metabolically labeled with SILAC and gently lysed under hypo-osmotic conditions to preserve organelle integrity. Post-nuclear supernatants were fractionated by differential centrifugation into five sub-organellar fractions plus cytosolic and nuclear pellets. Protein abundance profiles across SILAC fractions were processed by PCA and classified using a supervised SVM algorithm trained on curated organelle markers. In parallel, total intensities in the nuclear, cytosolic, and organellar fractions (from label-free MS) were used to assign each protein to global classes such as "mostly nuclear", based on relative signal distribution.	HeLa	1133 (nuclear); 672 (nucleo-cytosolic); 8710 (total proteome)	1092	itzhak_2016
Ginno et al., 2018	Total chromatin: time-resolved (G1, S, M)	Density-based enrichment for mass spectrometry analysis of chromatin (DEMAC): formaldehyde-fixed cells were sonicated, subjected to cesium chloride (CsCl) gradient ultracentrifugation to isolate DNA–protein complexes by buoyant density (1.39 g/cm³). Chromatin fractions were collected, dialyzed, decrosslinked, digested with DNase I.	Human T98G (glioblastoma)	3065 (chromatome); 6242 (total proteome)	3051	ginno_2018
Shi et al., 2021	Promoter-proximal chromatin	Hi-MS (Hi-C-based proteomics, adapted from BL-Hi-C): cells crosslinked with 1% formaldehyde; genomic DNA digested with HaeIII (GGCC sites); ligated with biotinylated bridge linkers; nuclei lysed in 0.2% SDS; chromatin sonicated; chromatin-DNA complexes captured on streptavidin beads. Quantified sensitivity to 1,6-hexanediol evaluated via AICAP index (Anti-1,6-Hexanediol Index of Chromatin-Associated Proteins).	K562	3228	2848	shi_2021
Ugur et al., 2023	Total chromatin	Chromatin Aggregation Capture (ChAC): nuclei fixed with 1% formaldehyde, lysed with SDS and urea, sonicated, and purified by protein aggregation capture (PAC) on magnetic beads. DIA-MS with DIA-NN used for quantification.	Human ESCs (H9)	2487	1730	ugur_2023
Alvarez et al., 2023	Time-resolved (nascent, G2/M, early and late G1)	Nascent Chromatin Capture (NCC) method, which relies on pulse-labeling newly replicated DNA with biotin-dUTP, followed by formaldehyde crosslinking and sonication-based chromatin fragmentation. Biotinylated DNA-protein complexes were affinity-purified using streptavidin magnetic beads. HeLa S3 cells were synchronized and harvested at five post-replication time points (Nasc, Late S, G2/M, early G1, late G1) across six biological replicates.	HeLa S3	1454 (present at all time points in all 6 replicates; from total of 5770)	1478 (2894 total)	alvarez_2023
Alvarez et al., 2023	Time-resolved (nascent, G2/M, early and late G1)	isolation of Proteins On Nascent DNA (iPOND): formaldehyde crosslinking (1%), EdU labeling for 15 minutes, click chemistry with biotin-azide, chromatin fragmentation, streptavidin bead enrichment.	TIG-3 fibroblasts	2351 (detected in 4 to 5 of 5 replicates)	2397 (2894 total)	alvarez_2023


Dataset name (download protein list)	Definition of dataset	Number of proteins
NULOC_CS	entries annotated as nuclear in both databases: UniProt (provided evidence code is available) AND HPA (with evidence tags: Enhanced, Supported, Approved), excludes proteins labeled only as non-nuclear in the OpenCell database (annotation grade 2 or 3)	3296
NULOC_CS_NECF	entries annotated as nuclear in both databases: UniProt AND HPA, excludes proteins labeled only as non-nuclear in the OpenCell database	3988
NULOC_CS_UL	entries annotated only as nuclear in both databases: UniProt (provided evidence code is available) AND HPA (with reliability score: Enhanced, Supported, Approved), excludes proteins labeled only as non-nuclear in the OpenCell database (annotation grade 2 or 3)	1322
NULOC_CS_UL_NECF	entries annotated only as nuclear in both databases: UniProt AND HPA, excludes proteins labeled only as non-nuclear in the OpenCell database	1322
NULOC_JT	entries annotated as nuclear in at least one database: UniProt (provided evidence code is available), HPA (with evidence tags: Enhanced, Supported, Approved), OpenCell (annotation grade 2 or 3)	8048
NULOC_JT_NECF	entries annotated as nuclear in at least one database: UniProt, HPA, OpenCell	8912
NULOC_JT_UL	entries annotated only as nuclear in at least one database: UniProt (provided evidence code is available) OR HPA (with reliability score: Enhanced, Supported, Approved) OR OpenCell database (annotation grade 2 or 3)	4292
NULOC_JT_UL_NECF	entries annotated only as nuclear in at least one database: UniProt OR HPA, excludes proteins labeled only as non-nuclear in the OpenCell database	4292
NON_NULOC_CS	proteins whose localization annotations exclude nuclear localization in both databases: UniProt (provided evidence code is available) AND HPA (with evidence tags: Enhanced, Supported, Approved)	3674
NON_NULOC_JT	proteins whose localization annotations exclude nuclear localization in at least one database: UniProt (provided evidence code is available) OR HPA (with evidence tags: Enhanced, Supported, Approved) OR OpenCell (annotation grade 2 or 3)	11479
CYTLOC_CS_UL	proteins with only aggregate cytoplasm annotation in both databases: UniProt (provided evidence code is available) AND HPA (with evidence tags: Enhanced, Supported, Approved)	2026
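The AND/OR logic in the dataset definitions above can be sketched with plain set operations. This is a simplified illustration, not the pipeline used to build the datasets: it ignores evidence codes, reliability tags and annotation grades, the function and variable names are hypothetical, and the identifiers are placeholders.

```python
def nuclear_both_databases(uniprot_nuclear, hpa_nuclear, opencell_non_nuclear_only):
    """AND-type set: nuclear in UniProt AND HPA, minus proteins that
    OpenCell labels only as non-nuclear (cf. the NULOC_CS definition)."""
    return (uniprot_nuclear & hpa_nuclear) - opencell_non_nuclear_only


def nuclear_any_database(uniprot_nuclear, hpa_nuclear, opencell_nuclear):
    """OR-type set: nuclear in at least one of the three databases
    (cf. the NULOC_JT definition)."""
    return uniprot_nuclear | hpa_nuclear | opencell_nuclear


# Hypothetical inputs (placeholder identifiers, not real accessions).
uniprot_nuclear = {"A", "B", "C"}
hpa_nuclear = {"B", "C", "D"}
opencell_nuclear = {"E"}
opencell_non_nuclear_only = {"C"}

strict_set = nuclear_both_databases(uniprot_nuclear, hpa_nuclear, opencell_non_nuclear_only)
broad_set = nuclear_any_database(uniprot_nuclear, hpa_nuclear, opencell_nuclear)
```

The AND-type set is always a subset of the OR-type set, which is consistent with the smaller protein counts reported for the CS datasets than for the JT datasets.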

SimChrom - an interactive resource to explore function, composition and localization of human chromatin proteins

Interactive Table 1. Representative list of nuclear and chromatome datasets from MS-based experimental studies.

Interactive Table 2. Constructed reference datasets of nuclear and non-nuclear proteins at different levels of confidence and uniqueness of localization (uniquely localized or shared localization with other cellular compartments).