File Formats

This guide describes the file formats used in TopiaryExplorer.

Tree File (Newick string .txt)

The tree file is formatted as a Newick string and can be generated using any number of phylogenetic tree generation programs. An example Newick string with branch lengths and tip labels is shown below:

(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);
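As a rough illustration of the format, the tip labels of a simple Newick string can be pulled out with a regular expression. This is a minimal sketch in Python, not the way TopiaryExplorer itself parses trees. The function name tip_labels is hypothetical, and the pattern assumes unquoted labels that contain no parentheses, commas, colons, or whitespace.

```python
import re

# A tip label directly follows "(" or ","; internal node labels and
# branch lengths follow ")" or ":" and are therefore not matched.
_TIP_PATTERN = re.compile(r"(?<=[(,])\s*([^():,;\s]+)")


def tip_labels(newick):
    """Return the tip labels of a simple Newick string, in file order."""
    return _TIP_PATTERN.findall(newick)


print(tip_labels("(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);"))
# ['A', 'B', 'C', 'D']
```

These labels are the tip IDs that the tip data and OTU table files refer to, so a quick listing like this can help check that the files agree.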

Tip Data (Tab-delimited .txt)

The tip data file is a matrix with rows corresponding to tip IDs and columns corresponding to some related tip metadata, such as a taxonomy assignment. An example tip metadata file is shown below:

#OTU ID      Greengenes taxonomy
0    k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Xanthomonadales;f__Xanthomonadaceae
1    k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales
10   k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales
100  k__Bacteria;p__Bacteroidetes;c__Flavobacteria;o__Flavobacteriales;f__Flavobacteriaceae
1000 k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae

The header line must start with a #. If no header line is supplied, one is generated automatically. Any taxonomic assignment information to be used for Consensus Lineage determination must be present in this file.
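The header rule can be sketched in Python as follows. This is an illustrative reader under stated assumptions, not TopiaryExplorer's own code: the function name read_tip_data is hypothetical, and the generated column names ("ID", "column_1", ...) are placeholders, since the names the program actually generates are not specified here.

```python
def read_tip_data(lines):
    """Read tab-delimited tip data into (header, {tip_id: [values]})."""
    header = None
    rows = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if line.startswith("#"):
            # Header line: drop the leading "#" from the first column name.
            header = [fields[0].lstrip("#")] + fields[1:]
            continue
        rows[fields[0]] = fields[1:]
    if header is None:
        # No header supplied: generate placeholder names (assumed naming).
        width = max((len(values) for values in rows.values()), default=0)
        header = ["ID"] + ["column_%d" % i for i in range(1, width + 1)]
    return header, rows


example = [
    "#OTU ID\tGreengenes taxonomy",
    "1\tk__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales",
    "10\tk__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales",
]
header, rows = read_tip_data(example)
print(header)  # ['OTU ID', 'Greengenes taxonomy']
```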

OTU Table (Tab-delimited .txt)

The OTU table file is a matrix with rows corresponding to OTU (tip) IDs and columns corresponding to sample IDs. This file acts as a map between the OTUs and the samples in which they appear, allowing the user to color a tree built from OTUs using the related sample metadata. An example OTU table is shown below:

#Full OTU Counts
#OTU ID PC.354 PC.355 PC.356 PC.481 PC.593 PC.607 PC.634 PC.635 PC.636 Consensus Lineage
0 0 0 0 0 0 0 0 1 0 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
1 0 0 0 0 0 1 0 0 0 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
2 0 0 0 0 0 0 0 0 1 Root;Bacteria;Bacteroidetes;Bacteroidetes;Bacteroidales;Porphyromonadaceae;Parabacteroides
3 2 1 0 0 0 0 0 0 0 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae";"Lachnospiraceae Incertae Sedis"
4 1 0 0 0 0 0 0 0 0 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"
5 0 0 0 0 0 0 0 0 1 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales
6 0 0 0 0 0 0 0 1 0 Root;Bacteria;Actinobacteria;Actinobacteria
7 0 0 2 0 0 0 0 0 1 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Ruminococcaceae"
8 1 1 0 2 4 0 0 0 0 Root;Bacteria;Firmicutes;"Bacilli";"Lactobacillales";Lactobacillaceae;Lactobacillus
9 0 0 2 0 0 0 0 0 0 Root;Bacteria;Firmicutes;"Clostridia";Clostridiales;"Lachnospiraceae"

The header line must start with a #. Comment lines also start with #, and the last line beginning with # is taken to be the header. If no header line is supplied, one is generated automatically.
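To make the "last comment line is the header" rule concrete, here is a minimal Python sketch of an OTU table reader. It is an illustration only, not TopiaryExplorer's implementation. The names read_otu_table and samples_containing are hypothetical, and for simplicity the sketch requires a header line rather than generating one.

```python
def read_otu_table(lines):
    """Return (sample_ids, counts, lineages) from tab-delimited OTU table lines."""
    comments, data = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            comments.append(line)
        else:
            data.append(line.split("\t"))
    if not comments:
        raise ValueError("this sketch requires a header line starting with #")
    # The last line starting with "#" is the header.
    header = comments[-1].lstrip("#").split("\t")
    has_lineage = header[-1].strip().lower() == "consensus lineage"
    sample_ids = header[1:-1] if has_lineage else header[1:]
    counts, lineages = {}, {}
    for fields in data:
        otu_id = fields[0]
        values = fields[1:1 + len(sample_ids)]
        counts[otu_id] = {s: float(v) for s, v in zip(sample_ids, values)}
        if has_lineage:
            lineages[otu_id] = fields[-1]
    return sample_ids, counts, lineages


def samples_containing(counts, otu_id):
    """List the samples in which the given OTU has a nonzero count."""
    return [s for s, n in counts[otu_id].items() if n > 0]


example = [
    "#Full OTU Counts",
    "#OTU ID\tPC.354\tPC.355\tPC.356\tPC.481\tPC.593\tPC.607\tPC.634"
    "\tPC.635\tPC.636\tConsensus Lineage",
    "2\t0\t0\t0\t0\t0\t0\t0\t0\t1\tRoot;Bacteria;Bacteroidetes;Bacteroidetes;"
    "Bacteroidales;Porphyromonadaceae;Parabacteroides",
    "6\t0\t0\t0\t0\t0\t0\t0\t1\t0\tRoot;Bacteria;Actinobacteria;Actinobacteria",
    '8\t1\t1\t0\t2\t4\t0\t0\t0\t0\tRoot;Bacteria;Firmicutes;"Bacilli";'
    '"Lactobacillales";Lactobacillaceae;Lactobacillus',
]
sample_ids, counts, lineages = read_otu_table(example)
print(samples_containing(counts, "8"))
# ['PC.354', 'PC.355', 'PC.481', 'PC.593']
```

The list returned by samples_containing is the OTU-to-sample mapping described above: it tells you which sample metadata rows are relevant when coloring a given tip.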

Sample Data (Tab-delimited .txt)

The sample data file is generated by the user. This file contains all of the information about the samples necessary to perform the data analysis. In general, you should include in this file any metadata that relates to the samples (for instance, health status or sampling site). An example sample data file is shown below:

#SampleID    COMMON_NAME     DESCRIPTION     KeyHand DigitHand       Hand    Individual
M2Akey217.141030     keyboard        Akey    Left    NA      Left    M2
M2Bkey217.141063     keyboard        Bkey    Ambiguous       NA      Ambiguous       M2
M2Ckey217.141092     keyboard        Ckey    Left    NA      Left    M2
M2Dkey217.140994     keyboard        Dkey    Left    NA      Left    M2
M2Ekey217.141011     keyboard        Ekey    Left    NA      Left    M2
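A sample data file like the one above can be loaded and grouped by a metadata column with a few lines of Python. This is a minimal sketch, assuming a tab-delimited file whose first line is the # header. The function names read_sample_data and group_by are hypothetical and are not part of TopiaryExplorer.

```python
def read_sample_data(lines):
    """Return {sample_id: {column_name: value}} from tab-delimited lines."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    # First line is the header; drop the leading "#" from "#SampleID".
    header = [rows[0][0].lstrip("#")] + rows[0][1:]
    return {row[0]: dict(zip(header[1:], row[1:])) for row in rows[1:]}


def group_by(samples, column):
    """Group sample IDs by their value in one metadata column."""
    groups = {}
    for sample_id, metadata in samples.items():
        groups.setdefault(metadata[column], []).append(sample_id)
    return groups


example = [
    "#SampleID\tCOMMON_NAME\tDESCRIPTION\tKeyHand\tDigitHand\tHand\tIndividual",
    "M2Akey217.141030\tkeyboard\tAkey\tLeft\tNA\tLeft\tM2",
    "M2Bkey217.141063\tkeyboard\tBkey\tAmbiguous\tNA\tAmbiguous\tM2",
    "M2Ckey217.141092\tkeyboard\tCkey\tLeft\tNA\tLeft\tM2",
]
samples = read_sample_data(example)
print(group_by(samples, "Hand"))
# {'Left': ['M2Akey217.141030', 'M2Ckey217.141092'], 'Ambiguous': ['M2Bkey217.141063']}
```

Grouping by a column such as Hand gives one set of samples per category, which is the kind of lookup needed to assign one color per metadata value.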
