Manual & Documentation

Abundance profiles from metagenomic sequencing data synthesize information from billions of sequenced DNA reads from thousands of microbial genomes. Interpreting these profiles can be a challenge since the data they represent is very complex.

Particularly challenging is their visualization, as existing techniques are inadequate when the taxa number in the thousands. Microbiome Maps are visualizations of abundance profiles using a space-filling curve.

Jasper is a tool for visualizing abundance profiles from metagenomic whole-genome DNA sequencing and 16S sequencing. It creates easy-to-understand images using a Hilbert curve. Jasper is free, and the current version runs on macOS.

 

Jasper Software

Where do I get it?

The Jasper software is a set of easy-to-use graphical and command-line applications designed for macOS, Python, and R. You can download them from the Mac App Store or from GitHub.

How do I install it?

Installation is simple: just search for “Microbiome Maps” or “Jasper” in the Mac App Store and then install Jasper as you would any other application. For the Python and R scripts, follow the instructions at the repository.

How much is it?

Jasper is completely free. 😀

How do I use it?

Jasper is very easy to use. When you start the macOS version, you can experiment with the interface to create sample Hilbert curves and get an idea of how space-filling curves work. The “Hilbert Curve” slider in the app lets you create curves up to level 10. Curves of a higher level partition the image into sections too small to distinguish on current monitors, and the result is a uniformly grey image. For the Python script, run it with the parameters that apply to your data.

Abundance Profile

To get started with Jasper, you'll need a metagenomic abundance profile from either whole-genome sequencing data or 16S sequencing. The same profile format works for both the GUI and CLI versions. The abundance profile is a simple “.txt” file with 5 or 6 tab-delimited fields. These fields are:

  1. Taxa ID

  2. Accession Number (Genome Assembly)

  3. Kingdom

  4. Taxa Name

  5. Abundance

  6. Condition Label

The last field is optional if you are using the “Taxonomic” scheme, but required for the “Labeled” scheme. After Jasper loads your profile, you can click on any part of the image to open a popover with more information about the taxon you clicked on.

Input abundance profile format. The input “.txt” file should be tab-delimited (“\t”), UTF-8 encoded, and use “Unix (LF)” line endings.
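The format described above can be read with a few lines of Python. The following is a minimal sketch, not Jasper's own loader: the names `ProfileRow` and `read_profile` are hypothetical, and it assumes the file has no header row and that the abundance field always holds a number.

```python
import csv
import io
from typing import List, NamedTuple, Optional


class ProfileRow(NamedTuple):
    """One line of an abundance profile (hypothetical container)."""
    taxa_id: str
    accession: str
    kingdom: str
    taxa_name: str
    abundance: float
    condition: Optional[str]  # None when the profile has only 5 fields


def read_profile(text: str) -> List[ProfileRow]:
    """Parse tab-delimited profile text into rows of 5 or 6 fields."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, fields in enumerate(reader, start=1):
        if not fields:  # skip blank lines
            continue
        if len(fields) not in (5, 6):
            raise ValueError(
                f"line {line_no}: expected 5 or 6 fields, got {len(fields)}"
            )
        condition = fields[5] if len(fields) == 6 else None
        rows.append(
            ProfileRow(fields[0], fields[1], fields[2], fields[3],
                       float(fields[4]), condition)
        )
    return rows
```

To read a file from disk, something like `read_profile(open(path, encoding="utf-8").read())` should work, where `path` points to your own profile.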

 

Fields & Integrations

Jasper integrates with several annotation authorities such as Ensembl, NCBI, and UniProt. After you load your profile, you can click on any region of the image to open a popover with links to websites that provide more information about the taxon you clicked on. The specifics of each field are discussed below.

⚠️ Note: If you do not have a “Taxa ID”, add “00” (a double zero) in its place. For the remaining fields, if you do not have data, substitute “NA”.

Fields

• Taxa ID : Numeric

A numeric identifier of up to seven digits. If you do not have a Taxa ID, substitute a double zero (“00”).

• Accession Number : Alphanumeric

An Ensembl genome assembly identifier. It should be prefixed with “GCA_”. If you do not have an Accession Number, substitute “NA”.

• Kingdom : String

A plain-text string that denotes the top-most taxonomic level. Three are supported: “Bacteria”, “Fungi”, and “Virus”. When the “Taxonomic” ordering scheme is selected, Jasper places all the taxa of a given group within the same region. If you do not have a Kingdom label, substitute “NA”.

• Taxa Name : String

A plain-text string containing the full name of the taxon. The taxa name string should contain at least three names: “Genus”, “Species”, and “Strain”. Jasper parses this field and uses the first token as the genus identifier, the second token as the species identifier, and the remaining tokens as the strain name.

• Abundance : Floating-Point

A floating-point scalar value that represents the relative abundance of the given taxon.

• Condition Label : String

A user-defined plain-text label that defines a biological condition or biological interpretation. Jasper groups taxa around these labels and then orders them taxonomically within each region.
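The field rules above can be checked before loading a profile into Jasper. The sketch below is only an illustration of those rules as stated in this manual; the function names are hypothetical, and Jasper's own parser may be stricter or more lenient (for example, about whitespace inside the taxa name).

```python
import re
from typing import Tuple

SUPPORTED_KINGDOMS = {"Bacteria", "Fungi", "Virus"}


def check_taxa_id(value: str) -> bool:
    """True for the "00" placeholder or any numeric ID of one to seven digits."""
    return re.fullmatch(r"\d{1,7}", value) is not None


def check_accession(value: str) -> bool:
    """True for "NA" or an identifier that carries the "GCA_" prefix."""
    return value == "NA" or value.startswith("GCA_")


def check_kingdom(value: str) -> bool:
    """True for "NA" or one of the three supported kingdom labels."""
    return value == "NA" or value in SUPPORTED_KINGDOMS


def split_taxa_name(name: str) -> Tuple[str, str, str]:
    """Split a taxa name into (genus, species, strain).

    The first token is the genus, the second the species, and all
    remaining tokens are joined back together as the strain name.
    """
    tokens = name.split()
    genus = tokens[0] if tokens else ""
    species = tokens[1] if len(tokens) > 1 else ""
    strain = " ".join(tokens[2:])
    return genus, species, strain
```

For example, `split_taxa_name("Escherichia coli str. K-12 substr. MG1655")` yields the genus `Escherichia`, the species `coli`, and the strain `str. K-12 substr. MG1655`.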

 

Example Profiles

To get you started with formatting your profiles, you can download these examples.

By fixing the orderings of the taxa, a microbiome map can be used to present groups of metagenomic samples that can be partitioned temporally (longitudinal studies), spatially (body or environmental sites), by disease type (and subtype), by disease stage, and by developmental stages.
Additionally, it is readily possible to create average maps, aggregate maps, and differential maps showing either average, aggregate, or differential abundances, respectively.

Differential Abundances

To visualize differential abundances, you will need to format the profile so that it reflects not the abundances of a single sample, but the processed results of a differential analysis of many samples across multiple biological conditions. How to analyze microbiome data for differential abundance is outside the scope of this manual, but the paper by Quinn et al., “A Field Guide for the Compositional Analysis of Any-Omics Data”, is a good place to start:

Figure 1 from Quinn et al., A Field Guide for the Compositional Analysis of Any-Omics Data. “Colored boxes indicate procedures that would apply to any relative data set.” Green boxes describe the log-ratio transformation dependent methods described in the section “Transformation Dependent Analyses”, and include the centered log-ratio (CLR) procedure.

Visualizing a compositional analysis of two conditions using a microbiome map requires that the input profile contain the taxa found to be differentially abundant. The abundance field then holds not a raw abundance value but, for example, the CLR-transformed values of the sample or the adjusted p-value of a statistical test. The file format is the same as before.
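As an illustration of the centered log-ratio (CLR) transformation mentioned above, here is a minimal sketch in plain Python. Each value is replaced by its logarithm minus the mean logarithm over all parts of the sample. The function name `clr` and the `pseudocount` parameter are assumptions made for this example; how to handle zero abundances is a modeling decision that this manual does not prescribe.

```python
import math
from typing import List


def clr(abundances: List[float], pseudocount: float = 1e-6) -> List[float]:
    """Centered log-ratio: log(x_i) minus the mean of log(x) over all parts.

    The pseudocount (an assumption of this sketch) is added to every value
    so that zero abundances do not produce log(0).
    """
    if not abundances:
        return []
    logs = [math.log(x + pseudocount) for x in abundances]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]
```

The transformed values of one sample sum to zero (up to rounding), and they can be written into the abundance field of the profile in place of the raw abundances.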

 

 

Microbiome Maps

We use a technique called the Hilbert curve visualization (HCV) to visualize the microbial community abundance profiles of a large number of genomes. These profiles contain the relative abundance measurements of thousands of genomes, and they are ordered along a space-filling curve in a 2D square using the Hilbert curve, making it possible to visualize the profile of a single metagenomic sample. In the resulting Hilbert image, each position is a genome from the reference database, and the intensity of the position's color value represents the abundance of a genome in the sample.
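To get a feel for how many genomes fit in an image, note that each level of the curve partitions every square into four, so a level k curve visits 4^k positions on a grid with 2^k cells per side. The sketch below picks the smallest level that can hold a given number of genomes and scales abundances to 8-bit intensities. The function names and the linear scaling are assumptions for illustration; Jasper's actual color mapping may differ.

```python
from typing import List


def cells_at_level(level: int) -> int:
    """Number of positions visited by a Hilbert curve of the given level."""
    return 4 ** level


def min_level_for(n_genomes: int) -> int:
    """Smallest level whose curve has at least n_genomes positions."""
    if n_genomes < 1:
        raise ValueError("n_genomes must be at least 1")
    level = 1
    while cells_at_level(level) < n_genomes:
        level += 1
    return level


def to_intensity(abundances: List[float]) -> List[int]:
    """Linearly scale abundances to 0..255 relative to the largest value."""
    peak = max(abundances, default=0.0)
    if peak <= 0:
        return [0] * len(abundances)
    return [round(255 * a / peak) for a in abundances]
```

A level 10 curve has 4^10 = 1,048,576 positions on a 1024 by 1024 grid, which is roughly one position per pixel on many displays.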

Depending on the ordering of the genomes that is selected, different microbial neighborhoods are created, allowing for different interpretations of the clusters of bright segments, i.e., hotspots, of abundant genomes in the images. Fixing the position of a genome results in visualizations that allow for quick comparisons of the abundance of the same genome or sets of genomes in multiple microbiome samples.

The color intensity of each position in the image represents the abundance of one microbial genome. Groups of segments are labeled by the common taxonomic groups induced by the ordering of the taxa with the Hilbert curve.

(A) The first five iterations of the Hilbert curve: the Level 1 curve is obtained by connecting the centers of the four initial squares as shown; the Level k curve is obtained by a recursive partitioning of each square from Level k-1.
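The recursive construction in the caption above can be turned into a mapping from a position along the curve to a cell of the square. The following sketch uses the widely known iterative index-to-coordinate conversion for the Hilbert curve. It is not taken from Jasper's source, so the orientation and starting corner of Jasper's curve may differ, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def hilbert_d2xy(level: int, d: int) -> Tuple[int, int]:
    """Map position d along a level-`level` Hilbert curve to (x, y)."""
    side = 1 << level  # the grid has 2**level cells per side
    if not 0 <= d < side * side:
        raise ValueError("d is outside the curve")
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate or flip the quadrant built so far
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def place_on_curve(values: Sequence[float], level: int) -> List[List[float]]:
    """Lay an ordered list of values along the curve into a 2D grid."""
    side = 1 << level
    if len(values) > side * side:
        raise ValueError("too many values for this level")
    grid = [[0.0] * side for _ in range(side)]
    for d, value in enumerate(values):
        x, y = hilbert_d2xy(level, d)
        grid[y][x] = value  # row index is y, column index is x
    return grid
```

Because consecutive positions along the curve always land in adjacent cells, taxa that are neighbors in the linear ordering stay neighbors in the image.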

Microbial Neighborhoods

Different linear orderings produce different Hilbert visualizations, with each resulting in clusters of related microbes along neighboring regions in the 2D plane. The clustering creates unique areas that resemble community neighborhoods, and they represent microbes belonging to either the same taxonomic group or the same biological condition; the idea is that they cluster around a common scheme (taxonomic or biological).

Multiple 1D linear orderings can exist, but the current version of the software uses two: 1) a taxonomic ordering, and 2) a user-defined biological condition ordering. The taxonomic ordering is based on each genome's taxonomic lineage. In this ordering, taxa belonging to the same taxonomic group are placed close to each other along the curve and, consequently, close to each other in the Hilbert image. The user-defined ordering lets users define their own orderings based on their experimental conditions.
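The two orderings can be sketched as sort keys over the profile rows. This is an approximation under stated assumptions: a full taxonomic lineage is not part of the profile format, so the sketch sorts by kingdom and then by the genus, species, and strain tokens of the taxa name. The `Taxon` tuple layout and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical row layout: (kingdom, taxa_name, abundance, condition_label)
Taxon = Tuple[str, str, float, Optional[str]]


def taxonomic_key(taxon: Taxon) -> Tuple[str, Tuple[str, ...]]:
    """Sort key: kingdom first, then genus, species, and strain tokens."""
    kingdom, name = taxon[0], taxon[1]
    return kingdom, tuple(name.split())


def taxonomic_order(taxa: List[Taxon]) -> List[Taxon]:
    """Place taxa of the same group next to each other along the curve."""
    return sorted(taxa, key=taxonomic_key)


def labeled_order(taxa: List[Taxon]) -> List[Taxon]:
    """Group taxa by condition label, then order taxonomically within each."""
    return sorted(taxa, key=lambda t: (t[3] or "", taxonomic_key(t)))
```

The position of each taxon in the sorted list is its position along the curve, so keeping the same ordering across samples keeps every genome at a fixed place in every image.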

Neighborhoods are drawn based on a taxonomic tree for microbial classification. Here, we show the distribution of Genera in the reference genome database, and the relative size of each neighborhood is consistent with the number of genomes that belong to it.