Hi-C datasets can be binned at different resolutions. Generally, the user chooses a resolution value (i.e., bin size) based on the sequencing depth of the dataset, striking a balance between detail and the sparsity that results from high-resolution analysis. All tools in this review support visualization of Hi-C matrices at different resolutions. Datasets for each tool are stored at different resolution values, typically from 1 Mb to 5 kb. For user-uploaded datasets, the user is responsible for generating contact matrices at different resolutions, except for the .hic format, which stores multiple resolutions in a single file.

After the resolution is set by the user, Hi-C data can be transformed to focus on different features of the data. The three most common transformations are matrix balancing to remove bin-specific biases, calculation of a correlation matrix for visualization of A and B compartments, and calculation of the ratio of observed over expected Hi-C counts to account for the so-called “genomic distance effect” (the density of interactions close to the diagonal in the Hi-C matrix). Hi-Browse can transform a raw Hi-C contact matrix into a (log) correlation matrix, whereas my5C generates the expected Hi-C signal and the ratio of observed to expected Hi-C signal. Juicebox indirectly performs all three transformations through the Juicer software. Other tools require the user to externally apply the transformations to the raw Hi-C data prior to upload. Several software tools are available to carry out these external transformations. Juicer is the complementary software package to Juicebox that processes sequencing reads from a Hi-C experiment into .hic files that contain contact matrices at different resolutions and in various transformations.
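To make the observed-over-expected and correlation transformations mentioned above more concrete, here is a minimal sketch in Python with NumPy. It assumes a small, dense, symmetric contact matrix for a single chromosome. The function names and the toy counts are illustrative only, and this is not how Juicer, my5C, or Hi-Browse implement these steps. Matrix balancing is left out to keep the example short.

```python
import numpy as np

def expected_by_distance(matrix):
    """Mean contact count at each genomic distance (one value per diagonal offset)."""
    n = matrix.shape[0]
    return np.array([np.diagonal(matrix, offset=d).mean() for d in range(n)])

def observed_over_expected(matrix):
    """Divide every entry by the mean of its diagonal to damp the distance effect."""
    n = matrix.shape[0]
    expected = expected_by_distance(matrix)
    # distance[i, j] is the separation |i - j| between bins i and j
    distance = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    expected_matrix = expected[distance]
    out = np.zeros(matrix.shape, dtype=float)
    # Entries whose diagonal has zero mean are left at 0 instead of dividing by zero
    np.divide(matrix, expected_matrix, out=out, where=expected_matrix > 0)
    return out

def correlation_matrix(oe):
    """Pearson correlation between the rows of an observed/expected matrix."""
    return np.corrcoef(oe)

# Toy 4-bin contact matrix (hypothetical counts, not real data)
counts = np.array([
    [20, 10,  4,  2],
    [10, 20,  8,  6],
    [ 4,  8, 20, 12],
    [ 2,  6, 12, 20],
], dtype=float)

oe = observed_over_expected(counts)
corr = correlation_matrix(oe)
print(np.round(oe, 2))
print(np.round(corr, 2))
```

In the toy matrix every main-diagonal entry of `oe` becomes 1, because each bin's self-contact count equals the diagonal mean. On real data, rows with no contacts would need to be masked before calling `np.corrcoef`, since a constant row has undefined correlation.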
The Epigenome Browser supports a total of 19 genomes, and the 3D Genome Browser supports human and mouse genomes. Hi-Browse, Juicebox, and my5C can be used with any genome.

Hi-C datasets are accumulating rapidly, and many users will need the capability to upload new datasets into these tools. All five visualization tools can upload user data or data downloaded from repositories such as 3DGD or 4DGenome. Most tools accept files that represent contact matrices; however, the file format requirements differ by tool (Table 1). The Epigenome Browser represents Hi-C matrices using tab-delimited text files, similar to the browser extensible data (BED) files often used in genomics. Hi-Browse and my5C also use tab-delimited text files, but unlike the Epigenome Browser format, the my5C and Hi-Browse formats require that every entry be explicitly represented in the input file, including pairs of loci with zero contacts. The 3D Genome Browser uses its own sparse matrix representation in binary format, which can be created using the BUTLRTools software package. Juicebox uses a complementary software package, Juicer, to build .hic files that store binary contact matrices at different resolutions. These .hic files are built from sequenced read pair files from a Hi-C experiment. The Epigenome Browser also supports the .hic format.

As Hi-C datasets continue to accumulate, the scientific community will likely come to a consensus on standardized file formats to represent Hi-C datasets. Most of the present file formats are very similar to one another, and conversion between most formats is straightforward using command line tools. An important tradeoff between different formats is file size: sparse representations, and especially the binary BUTLR and .hic formats, require less disk space than uncompressed versions of other file formats.
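The difference between a sparse representation and one that lists every entry, zeros included, can be illustrated with a short Python sketch. The `(bin_i, bin_j, count)` triple layout used here is a hypothetical stand-in chosen for illustration. It is not the actual on-disk layout of the Epigenome Browser, my5C, Hi-Browse, BUTLR, or .hic formats.

```python
import numpy as np

def sparse_to_dense(triples, n_bins):
    """Expand (bin_i, bin_j, count) triples into a full symmetric matrix with explicit zeros."""
    dense = np.zeros((n_bins, n_bins), dtype=int)
    for i, j, count in triples:
        dense[i, j] = count
        dense[j, i] = count  # contacts are symmetric
    return dense

def dense_to_sparse(dense):
    """Keep only the nonzero entries of the upper triangle as (bin_i, bin_j, count) triples."""
    rows, cols = np.nonzero(np.triu(dense))
    return [(int(i), int(j), int(dense[i, j])) for i, j in zip(rows, cols)]

# Hypothetical contacts among 4 bins; pairs that are not listed have zero contacts
triples = [(0, 0, 20), (0, 1, 10), (1, 1, 20), (2, 3, 12), (3, 3, 20)]

dense = sparse_to_dense(triples, n_bins=4)
roundtrip = dense_to_sparse(dense)

print(dense)
print(f"sparse entries: {len(triples)}, dense entries: {dense.size}")
```

Even in this tiny example the dense form stores 16 values where the sparse form stores 5. The gap widens at higher resolutions, where most pairs of bins have no contacts at all.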
Four of the five visualization software tools come with publicly available datasets, but my5C does not. Available datasets include three influential studies that performed Hi-C experiments on several cell types, which we refer to using the last name of the first author on the respective publications: Lieberman-Aiden, Dixon, and Rao. These three studies include nine human cell types from different lineages and tissues (IMR90, H1, GM06990, HMEC, NHEK, K562, HUVEC, HeLa, and KBM7), which makes them useful for many types of analyses. Datasets available for each tool are summarized in Table 1. Juicebox also offers datasets from 27 other studies, which include data from a variety of organisms (Additional file 1). Most of these datasets are from Hi-C experiments performed on human cells, but each tool supports genomes of other organisms.