ARB - "A comprehensive software package for phylogenetic analysis"

Dr. Johanna Wesnigk and Prof. Dr. Frank Oliver Glöckner

Max Planck Institute for Marine Microbiology, Celsiusstraße 1, 28359 Bremen, Germany

Molecular phylogeny provides a stable framework for addressing the diversity of organisms. An understanding of the basic principles of sequence alignment and phylogenetic tree reconstruction is therefore essential. To link phylogeny with community composition and structure in the environment, the application of group- or taxon-specific molecular probes has become a standard technique in recent years.

To facilitate sequence database maintenance, phylogenetic analysis and molecular probe design, the integrated software package ARB was developed at the Technical University of Munich. After 10 years of continuous improvement, it has become a standard tool used by experts around the world. To use the system efficiently, participation in a training workshop is highly recommended. The workshops give a thorough introduction to the theoretical background of phylogenetic tree reconstruction and probe design, followed by hands-on experience in building phylogenetic trees with the freely available ARB software package.

Description of ARB

ARB (Latin: arbor, tree) is a comprehensive software package originally developed for ribosomal RNA (rRNA) data. Nowadays, it can be used for any nucleic acid or amino acid sequence data. The central idea is to arrange a database of sequences and associated descriptive information according to the phylogenetic relationships of the corresponding organisms. This phylogenetic tree is visualised and can be used to navigate the database via simple mouse clicks.

ARB imports raw data and primary structure data from existing sources such as GenBank and EBI. For rRNA data, ARB already provides databases with processed primary structures (aligned sequences). Any additional data related to the individual sequences can be stored in structured database fields or linked via local or worldwide networks. Furthermore, to facilitate in-depth analysis of molecular data, a comprehensive selection of software tools is integrated into ARB. These are controlled via a common graphical user interface and interact directly with one another as well as with the database (see Figure 1). Software and start-up databases are freely available at http://www.arb-home.de.
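As an illustration of the import step, the following is a minimal sketch of how FASTA-formatted sequence data could be read into records that pair each sequence with descriptive fields. It is not ARB's importer: the `SequenceRecord` class, the `parse_fasta` function and the `description` field name are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceRecord:
    """One hypothetical database entry: a sequence plus descriptive fields."""
    name: str
    sequence: str
    fields: dict = field(default_factory=dict)


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of SequenceRecord objects."""
    records = []
    name, description, chunks = None, "", []

    def flush():
        # Store the entry collected so far, if any.
        if name is not None:
            records.append(
                SequenceRecord(name, "".join(chunks), {"description": description})
            )

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            description = header[1] if len(header) > 1 else ""
            chunks = []
        elif name is not None:
            # Sequence lines may be wrapped; join them and normalise case.
            chunks.append(line.upper())
    flush()
    return records
```

Further fields (for example a publication or a sampling site) could be added to the `fields` dictionary of each record after import.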

Technical details and features

1. ARB-Database

The designation and hierarchy of database fields can be customised by each user. Links to other databases are possible, with several levels of protection to facilitate security management. All information stored in the database along with the sequence data, such as bibliography, user-made entries, or information calculated on-line from the database entries, can be shown at the terminal nodes of the tree.

Furthermore, all integrated and second-party tools are linked to the ARB-database. Thus, any changes or rearrangements are immediately known to all software components without user intervention.

The following tools are integrated:

  • Tools for importing and exporting aligned and unaligned nucleic acid and protein sequences in various database formats;
  • Tools for defining sequence profiles and for finding and evaluating organism or group specific identifiers (probes or primers);
  • Database management with full text search in any database field, as well as complex search options;
  • A script language for writing small programs to facilitate recurring tasks.
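The idea of a full-text search over any database field can be sketched as follows. This is a simplified illustration and not ARB's search engine or script language: records are modelled as plain dictionaries, and the function name `search_records` and the field names in the usage note are hypothetical.

```python
def search_records(records, query, field_name=None):
    """Return the records whose fields contain `query` (case-insensitive).

    `records` is a list of dicts mapping field names to values.
    If `field_name` is None, every field of each record is searched;
    otherwise only the named field is searched.
    """
    needle = query.lower()
    hits = []
    for record in records:
        if field_name is None:
            values = record.values()
        else:
            values = [record.get(field_name, "")]
        if any(needle in str(value).lower() for value in values):
            hits.append(record)
    return hits
```

For example, `search_records(records, "sediment")` would search all fields, while `search_records(records, "sediment", field_name="habitat")` would restrict the search to a single (hypothetical) `habitat` field.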

2. Sequence editor and alignment

A sequence editor for nucleic acid and amino acid sequences supports sequence editing, string search, and visualisation of base pairing, positional variability or likelihood. All colours and symbols are user-defined, and separate "align" and "edit" modes prevent erroneous changes. A special feature for rRNA sequences is the simultaneous secondary structure editor, which helps the scientist to evaluate probe targets. The following features are implemented:

  • Automated alignment of a sequence according to its nearest relative (only for nucleic acids);
  • Synchronisation of nucleic acid and amino acid sequence alignments;
  • Implemented secondary structure editor (only for rRNA);
  • Visualisation of patterns and probes.
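To make the notion of positional variability concrete, here is a minimal sketch that scores each column of an alignment. The measure used (one minus the frequency of the most common residue in the column) is an assumption chosen for simplicity; the text does not specify how ARB computes variability, and the function name is hypothetical.

```python
from collections import Counter


def positional_variability(aligned, gap_chars="-."):
    """Return one variability score per alignment column.

    The score is 1 - (frequency of the most common residue), so a fully
    conserved column scores 0.0. Gap characters are ignored, and a column
    consisting only of gaps also scores 0.0.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for col in range(length):
        residues = [seq[col].upper() for seq in aligned if seq[col] not in gap_chars]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(1.0 - top_count / len(residues))
    return scores
```

Columns with low scores are conserved across the aligned sequences, while columns with high scores vary between them.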

3. Phylogenetic reconstruction

For phylogenetic tree reconstruction, several alternative methods are available as part of the package, ranging from simple character counting to a special maximum parsimony approach that allows the reconstruction and evaluation of large trees. One prominent feature is the possibility to add sequences to an existing tree without changing the overall topology. Optimisation can be applied to the complete tree or to user-selected sub-trees, and intermediate stages can be stored. Various programs for phylogenetic analysis cooperate directly within ARB; for example, some programs of the PHYLIP package for phylogeny inference are incorporated.

Useful features include:

  • Distance matrix, maximum parsimony and maximum likelihood methods;
  • Confidence tests (bootstrapping);
  • Partial- and full-length sequences can be added to existing trees without altering the general topology;
  • Extended possibilities for tree visualisation and optimisation.
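As a rough illustration of the distance matrix and bootstrapping ideas listed above, the sketch below computes pairwise distances from an alignment and draws one bootstrap replicate by resampling columns. It is not ARB's or PHYLIP's implementation: the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), is one common choice assumed here, and all function names are hypothetical.

```python
import math
import random


def p_distance(a, b, gap_chars="-."):
    """Fraction of differing positions, skipping columns with a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gap_chars or y in gap_chars:
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed difference p."""
    if p >= 0.75:
        return math.inf  # correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(aligned):
    """Symmetric matrix of corrected pairwise distances."""
    n = len(aligned)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = jukes_cantor(p_distance(aligned[i], aligned[j]))
            matrix[i][j] = matrix[j][i] = d
    return matrix


def bootstrap_replicate(aligned, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(aligned[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in aligned]
```

In a bootstrap analysis, a tree would be rebuilt from the distance matrix of each replicate, for example with `bootstrap_replicate(aligned, random.Random(1))`, and the support for a branch would be the fraction of replicate trees that contain it. The tree-building step itself is outside the scope of this sketch.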

4. Probe design and evaluation

Probe design takes just three steps (selection of the target organisms, probe design, and probe match), and the results can be visualised. For fluorescence in situ hybridisation (FISH), the ARB multi-probe software is especially useful. A peculiarity is the position tree (PT) server, which has to be set up locally by every user. Analyses with the PT-server do not rely upon aligned sequences and can be used for fast and reliable probe design and probe check. The advantages of ARB in this respect are:

  • Easy graphical selection of target sequences;
  • Fast and reliable probe design and probe check based on suffix trees;
  • Graphical visualisation of results in the tree, alignment and secondary structure.
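The principle behind a suffix-based probe check can be sketched with a deliberately naive index. This is not the PT-server: a sorted list of suffixes stands in for a suffix tree, only exact matches are found, and the function names are hypothetical. As in the text, the search works on unaligned sequences.

```python
from bisect import bisect_left


def build_suffix_index(sequences):
    """Build a sorted list of (suffix, name, offset) over all sequences.

    `sequences` maps a sequence name to its unaligned sequence. Storing
    every suffix as a string is simple but memory-hungry; a real suffix
    tree or suffix array would be far more compact.
    """
    index = []
    for name, seq in sequences.items():
        seq = seq.upper()
        for offset in range(len(seq)):
            index.append((seq[offset:], name, offset))
    index.sort()
    return index


def find_matches(index, target):
    """Return sorted (name, offset) pairs where `target` occurs exactly."""
    target = target.upper()
    # All suffixes that begin with `target` form one contiguous run in the
    # sorted index, starting at the first entry not smaller than `target`.
    start = bisect_left(index, (target,))
    hits = []
    for suffix, name, offset in index[start:]:
        if not suffix.startswith(target):
            break
        hits.append((name, offset))
    return sorted(hits)
```

A candidate target site is specific for a group if `find_matches` reports hits in every sequence of the group and in no sequence outside it. A practical probe check would also need to tolerate mismatches, which this sketch does not attempt.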

Anyone with an interest in sequence alignment, phylogenetic analysis and/or probe design and hybridisation should be able to benefit from using ARB. ARB sequence databases are currently available for small subunit rRNA. Ongoing developments focus on:

  1. Sequence databases for further phylogenetic markers
  2. A tool for visualisation of accessibility maps
  3. Further extensions so that ARB can handle complete genomes.

MarBEF EU Network of Excellence, funded under the Sixth Framework Programme of the European Union
Principal investigators: Chris Emblow and Roisin Nash