.. _file_formats:

************
File Formats
************

This guide describes the file formats used in TopiaryExplorer.

Tree File (Newick string .txt)
==============================

The tree file is formatted as a Newick string and can be generated by many phylogenetic tree-building programs. An example Newick string with branch lengths and tip labels is shown below::

    (A:0.1,B:0.2,(C:0.3,D:0.4):0.5);

Tip Data (Tab-delimited .txt)
=============================

The tip data file is a matrix with rows corresponding to tip IDs and columns corresponding to tip metadata, such as a taxonomy assignment. An example tip metadata file is shown below::

    #OTU ID	Greengenes taxonomy
    0	k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Xanthomonadales;f__Xanthomonadaceae
    1	k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales
    10	k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales
    100	k__Bacteria;p__Bacteroidetes;c__Flavobacteria;o__Flavobacteriales;f__Flavobacteriaceae
    1000	k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae

If a header line is supplied, it must start with ``#``; if no header line is supplied, one will be generated automatically. Any taxonomic assignment information to be used with the `Consensus Lineage <./tree_toolbar.html>`_ determination must be present in this file.

OTU Table (Tab-delimited .txt)
==============================

The OTU table file is a matrix with rows corresponding to OTU (tip) IDs and columns corresponding to sample IDs. This file acts as a map between the OTUs and the samples in which they appear, allowing the user to color a tree built from OTUs using the related sample metadata.
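
To make the OTU-to-sample mapping concrete, here is a minimal Python sketch that reads a tab-delimited OTU table and lists, for each OTU, the samples with a nonzero count. The function name ``samples_per_otu`` is hypothetical, the input is a shortened, illustrative table, and the handling of a trailing ``Consensus Lineage`` column is an assumption; none of this is TopiaryExplorer's own code.

```python
import csv
import io


def samples_per_otu(handle):
    """Map each OTU ID to the sample IDs in which its count is nonzero."""
    header = None
    rows = []
    for fields in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields or not any(f.strip() for f in fields):
            continue  # skip blank lines
        if fields[0].startswith("#"):
            if not rows:
                header = fields  # the last leading comment line is the header
            continue
        rows.append(fields)
    if header is None:
        raise ValueError("this sketch expects a header line starting with '#'")
    # Assumption: an optional final "Consensus Lineage" column holds taxonomy, not counts.
    has_lineage = header[-1].strip().lower() == "consensus lineage"
    sample_ids = header[1:-1] if has_lineage else header[1:]
    mapping = {}
    for fields in rows:
        counts = fields[1:1 + len(sample_ids)]
        mapping[fields[0]] = [s for s, c in zip(sample_ids, counts) if float(c) > 0]
    return mapping


# Shortened, illustrative table (three samples only).
example = (
    "#Full OTU Counts\n"
    "#OTU ID\tPC.354\tPC.355\tPC.356\tConsensus Lineage\n"
    "3\t2\t1\t0\tRoot;Bacteria;Firmicutes\n"
    "7\t0\t0\t2\tRoot;Bacteria;Firmicutes\n"
)
otu_to_samples = samples_per_otu(io.StringIO(example))
print(otu_to_samples)
```

With a mapping like this, the metadata of the listed samples could in principle be looked up for each tip of the tree.
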
An example OTU table is shown below::

    #Full OTU Counts
    #OTU ID	PC.354	PC.355	PC.356	PC.481	PC.593	PC.607	PC.634	PC.635	PC.636	Consensus Lineage
    0	0	0	0	0	0	0	0	1	0	Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
    1	0	0	0	0	0	1	0	0	0	Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
    2	0	0	0	0	0	0	0	0	1	Root;Bacteria;Bacteroidetes;Bacteroidetes;Bacteroidales;Porphyromonadaceae;Parabacteroides
    3	2	1	0	0	0	0	0	0	0	Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae";"Lachnospiraceae Incertae Sedis"
    4	1	0	0	0	0	0	0	0	0	Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
    5	0	0	0	0	0	0	0	0	1	Root;Bacteria;Firmicutes;"Clostridia";Clostridiales
    6	0	0	0	0	0	0	0	1	0	Root;Bacteria;Actinobacteria;Actinobacteria
    7	0	0	2	0	0	0	0	0	1	Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Ruminococcaceae"
    8	1	1	0	2	4	0	0	0	0	Root;Bacteria;Firmicutes;"Bacilli";"Lactobacillales";Lactobacillaceae;Lactobacillus
    9	0	0	2	0	0	0	0	0	0	Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"

The header line must start with ``#`` and is taken to be the last of the comment lines, all of which start with ``#``. If no header line is supplied, one will be generated automatically.

Sample Data (Tab-delimited .txt)
================================

The sample data file is created by the user. It contains all of the information about the samples that is needed to perform the data analysis. In general, you should include in this file any metadata that relates to the samples (for instance, health status or sampling site). An example sample data file is shown below::

    #SampleID	COMMON_NAME	DESCRIPTION	KeyHand	DigitHand	Hand	Individual
    M2Akey217.141030	keyboard	Akey	Left	NA	Left	M2
    M2Bkey217.141063	keyboard	Bkey	Ambiguous	NA	Ambiguous	M2
    M2Ckey217.141092	keyboard	Ckey	Left	NA	Left	M2
    M2Dkey217.140994	keyboard	Dkey	Left	NA	Left	M2
    M2Ekey217.141011	keyboard	Ekey	Left	NA	Left	M2
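
A sample data file like the one above can be read into per-sample records with a few lines of Python. The sketch below is only an illustration: ``read_sample_data`` is a hypothetical helper, and it assumes the first line is a header starting with ``#`` and that the first column holds the sample ID.

```python
import csv
import io


def read_sample_data(handle):
    """Return {sample_id: {column_name: value}} from a tab-delimited sample data file."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    if not header[0].startswith("#"):
        raise ValueError("expected a header line starting with '#'")
    # Drop the leading '#' so the first column is named "SampleID".
    columns = [header[0].lstrip("#")] + header[1:]
    records = {}
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        records[fields[0]] = dict(zip(columns, fields))
    return records


# Two rows taken from the example sample data file.
example = (
    "#SampleID\tCOMMON_NAME\tDESCRIPTION\tKeyHand\tDigitHand\tHand\tIndividual\n"
    "M2Akey217.141030\tkeyboard\tAkey\tLeft\tNA\tLeft\tM2\n"
    "M2Bkey217.141063\tkeyboard\tBkey\tAmbiguous\tNA\tAmbiguous\tM2\n"
)
samples = read_sample_data(io.StringIO(example))
print(samples["M2Bkey217.141063"]["KeyHand"])
```

Keying the records by sample ID makes it straightforward to look up a metadata value, such as ``KeyHand``, for any sample named in an OTU table column.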