Format of Input and Output Sequence Files

 

1. List of orthologous genes and their locations in 26 conserved operons

This table lists predicted operons in Geobacter sulfurreducens, with links to operon sequence data. It gives the names and locations of the open reading frames (ORFs) predicted to lie within each operon, along with their GenBank GI identification numbers for G. sulfurreducens. For each ORF in G. sulfurreducens, links are also provided to the sequences of its putative orthologs in Geobacter metallireducens and Desulfovibrio vulgaris, along with the E-values for their BLASTP similarity to the G. sulfurreducens ORF.

                                                                                                                                 

2. List of files with sequence data for 26 operons

This table provides links to sequence files for each operon, AlignACE input files, and AlignACE output files.

The name of each file with sequence data for individual operons is OperonNumber_SpeciesName, where

OperonNumber is the operon name (see List of orthologous genes and their locations in 26 conserved operons) assigned by FGENESB (Solovyev and Salamov, unpublished).

SpeciesName is gsul for Geobacter sulfurreducens, gmet for Geobacter metallireducens and dvul for Desulfovibrio vulgaris.

 

Examples:

1256_gsul contains sequence data for operon 1256 in G. sulfurreducens;

1256_gmet contains sequence data for operon 1256 in G. metallireducens;

1256_dvul contains sequence data for operon 1256 in D. vulgaris.
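
The naming convention above can be split back into its parts programmatically. The following is a minimal Python sketch, not part of the original data set; the function name parse_operon_file_name and the SPECIES table are hypothetical helpers introduced here for illustration, and the sketch assumes that file names carry no extension.

```python
# Species codes exactly as listed in this document.
SPECIES = {
    "gsul": "Geobacter sulfurreducens",
    "gmet": "Geobacter metallireducens",
    "dvul": "Desulfovibrio vulgaris",
}


def parse_operon_file_name(name):
    """Split 'OperonNumber_SpeciesName' into (operon, code, species name).

    Hypothetical helper: assumes the species code follows the last
    underscore and that the file name has no extension.
    """
    operon, sep, code = name.rpartition("_")
    if not sep or not operon or code not in SPECIES:
        raise ValueError(f"not an OperonNumber_SpeciesName file name: {name!r}")
    return operon, code, SPECIES[code]


operon, code, species = parse_operon_file_name("1256_gmet")
print(operon, code, species)
```

Splitting on the last underscore keeps the sketch working even if an operon name itself were to contain an underscore.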

 

The sequence files for individual operons are in FASTA format.
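
Because the operon sequence files are in FASTA format, they can be read with a few lines of code. The sketch below is a minimal Python reader written for illustration only; read_fasta is a hypothetical name, the sample records are made up, and the sketch assumes plain FASTA with ">" header lines followed by one or more sequence lines.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Hypothetical helper: header text is returned without the leading '>',
    and sequence lines belonging to one record are joined together.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records, used only to show the call.
sample = [">example_1", "ACGT", "TTGA", ">example_2", "GGCC"]
for header, sequence in read_fasta(sample):
    print(header, len(sequence))
```

To read one of the files, pass an open file object in place of the sample list, for example `read_fasta(open("1256_gsul"))`.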

 

3. Input sequence files for AlignACE

 

The files are of the type *.in, where * is the operon name.

 

Examples:

2.in is the AlignACE input file for operon 2;

17.in is the AlignACE input file for operon 17.

 

All sequences in AlignACE input files are in FASTA format.

 

AlignACE was developed by Roth et al. (Roth, F. P., Hughes, J. D., Estep, P. W., Church, G. M. 1998. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nature Biotechnol. 16, 939-945).

 

 

4. Motifs predicted by comparative genomics analysis (AlignACE output files)

 

The files are of the type *.out, where * is the operon name.

Listed are motifs predicted for the noncoding regions of the 26 conserved operons.

 

Examples of files:

2.out is the file with predicted motif sequences for operon 2;

17.out is the file with predicted motif sequences for operon 17.
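
Since AlignACE input and output files share the operon name and differ only in the extension (*.in and *.out), they can be matched up by operon. The following Python sketch is illustrative only; pair_alignace_files is a hypothetical helper, and it assumes the operon name is everything before the final extension.

```python
from pathlib import PurePath


def pair_alignace_files(file_names):
    """Map operon name -> {'in': file name, 'out': file name}.

    Hypothetical helper: files without a .in or .out extension
    (such as the FASTA operon files) are ignored.
    """
    pairs = {}
    for name in file_names:
        path = PurePath(name)
        if path.suffix in (".in", ".out"):
            # path.stem is the operon name; suffix[1:] drops the leading dot.
            pairs.setdefault(path.stem, {})[path.suffix[1:]] = name
    return pairs


pairs = pair_alignace_files(["2.in", "2.out", "17.in", "17.out", "1256_gsul"])
print(pairs["17"]["out"])
```

An operon that has an input file but no output file simply lacks the "out" key, which makes missing runs easy to spot.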

 

The output files provided here are in the AlignACE output format (Roth et al. 1998). These files were used in further analyses to identify motifs that are likely to be biologically significant. The AlignACE output format is described in detail in the help file on the George M. Church Laboratory Analysis Software for mRNA Abundance Data Web site.