The regions identified by ChIP-seq, known as ChIP-seq peaks, are expected to be enriched for TFBSs. The ReMap database has compiled and uniformly reprocessed thousands of public ChIP-seq datasets. It provides access to millions of ChIP-seq peaks related to the binding of approximately 800 human TFs in 602 different human cell and tissue types. Based on ReMap, the UniBind database stores reliable TFBS predictions from four different computational models, including position weight matrices (PWMs), for the ChIP-seq peaks of 231 human TFs in 315 different human cell and tissue types. Despite large-scale data generation efforts by public consortia such as ENCODE, delineating the binding regions of each human TF in the genome remains incomplete.
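To make the PWM idea concrete, here is a minimal, self-contained sketch of how a PWM scores candidate binding sites by sliding along a sequence. The 4-bp matrix and its log-odds values are invented for illustration; they do not come from UniBind or any real motif model.

```python
# Illustrative sketch: scoring DNA windows with a position weight matrix (PWM).
# TOY_PWM is a hypothetical log-odds matrix for a made-up 4-bp motif (ACGT),
# with one column of per-base scores for each motif position.

TOY_PWM = [
    {"A": 1.2, "C": -1.0, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": 1.1, "G": -0.8, "T": -1.0},
    {"A": -0.9, "C": -1.0, "G": 1.3, "T": -1.0},
    {"A": -0.5, "C": -1.0, "G": -1.0, "T": 1.2},
]

def score_window(window, pwm=TOY_PWM):
    """Sum the per-position log-odds scores for one window of len(pwm) bases."""
    return sum(col[base] for base, col in zip(window, pwm))

def best_hit(sequence, pwm=TOY_PWM):
    """Slide the PWM along the sequence; return (best_start, best_score)."""
    w = len(pwm)
    hits = [(i, score_window(sequence[i:i + w], pwm))
            for i in range(len(sequence) - w + 1)]
    return max(hits, key=lambda h: h[1])

# The toy motif's preferred bases (ACGT) begin at index 2 of this sequence,
# so that window receives the highest score.
start, score = best_hit("TTACGTAA")
```

Real TFBS scanners additionally threshold these scores (e.g., against a background model) rather than simply taking the best window, but the sliding-window scoring shown here is the core operation.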
We assess a transfer learning strategy for TF binding prediction consisting of a pre-training step, wherein we train a multi-task model with multiple TFs, and a fine-tuning step, wherein we initialize single-task models for individual TFs with the weights learned by the multi-task model, after which the single-task models are trained at a lower learning rate. We corroborate that transfer learning improves model performance, especially if in the pre-training step the multi-task model is trained with biologically relevant TFs. We show the effectiveness of transfer learning for TFs with ~500 ChIP-seq peak regions. Using model interpretation techniques, we demonstrate that the features learned in the pre-training step are refined in the fine-tuning step to resemble the binding motif of the target TF (i.e., the recipient of transfer learning in the fine-tuning step). Moreover, pre-training with biologically relevant TFs allows single-task models in the fine-tuning step to learn useful features other than the motif of the target TF. Our results confirm that transfer learning is a powerful technique for TF binding prediction.

A subset of human DNA-binding transcription factors (TFs) controls gene expression at the transcriptional level by recognizing and binding to specific sequence motifs within cis-regulatory regions known as TF binding sites (TFBSs). The disruption of TF genes and TFBSs is associated with rare genetic disorders and cancer. Therefore, delineating the regions to which TFs bind in the genome could indicate potential regulatory regions on which to focus analyses and help to broaden our understanding of how genes are regulated in health and disease. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an experimental assay that enables the identification of TF-bound regions in vivo at a resolution of a few hundred base pairs (bp).
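The pre-train/fine-tune strategy can be sketched schematically. The pure-Python stand-in below (names such as `pretrain_multitask` and the placeholder weights are hypothetical, not taken from the authors' code) shows only the two mechanics the strategy relies on: copying the shared trunk weights from the multi-task model into a fresh single-task model, and continuing training at a lower learning rate.

```python
# Schematic sketch of the transfer learning strategy: a multi-task model shares
# a feature-extracting trunk across TFs; a single-task model for one target TF
# is initialized from that trunk and fine-tuned at a lower learning rate.
# All weights here are placeholder floats, not a real network.

LR_PRETRAIN = 1e-3   # learning rate assumed for the multi-task model
LR_FINETUNE = 1e-4   # lower learning rate assumed for fine-tuning

def pretrain_multitask(tfs):
    """Stand-in for the pre-training step: returns shared trunk weights plus
    one classification head per TF in the training set."""
    trunk = {"conv1": [0.5, -0.2, 0.1]}       # shared feature extractor
    heads = {tf: {"w": 0.0} for tf in tfs}    # one output head per TF
    return {"trunk": trunk, "heads": heads, "lr": LR_PRETRAIN}

def init_single_task(multitask_model, target_tf):
    """Fine-tuning step: build a single-task model for `target_tf` whose trunk
    is a copy of the multi-task trunk; training itself is not shown."""
    return {
        "tf": target_tf,
        "trunk": {k: list(v) for k, v in multitask_model["trunk"].items()},
        "head": {"w": 0.0},   # fresh head for the target TF
        "lr": LR_FINETUNE,    # lower LR so pre-trained features shift slowly
    }

multi = pretrain_multitask(["CTCF", "GATA1", "TAL1"])
single = init_single_task(multi, "CTCF")
```

In a real deep learning framework the trunk copy would be done by loading the pre-trained layer weights into the new model (e.g., via a state-dict mechanism); the point of the sketch is only that the single-task model starts from the multi-task weights rather than from random initialization.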
Deep learning has proven to be a powerful technique for transcription factor (TF) binding prediction but requires large training datasets. Transfer learning can reduce the amount of data required for deep learning, while improving overall model performance, compared to training a separate model for each new task.