How to assess the confidence in this analysis unit?

The results shown in these pages are obtained by an automatic analysis using PDA (Casillas & Barbadilla 2004). We highly recommend that users follow the steps below to assess the confidence in any analysis unit.

1. Review the parameters describing the QUALITY OF THE ALIGNMENT (Figure 1a, Figure 2a):

To assess the quality of an alignment we used three criteria: the number of sequences included in the alignment, the percentage of gaps or ambiguous bases within the alignment, and the percentage of difference in length between the shortest and the longest sequences. For each criterion three qualitative categories were defined (low, medium and high quality):

Number of sequences:
  Low quality: 2-5
  Medium quality: 6-10
  High quality: >10

Percentage of gaps / ambiguous bases within the alignment:
  Low quality: ≥30%
  Medium quality: ≥10% and <30%
  High quality: <10%

Percentage of difference in length between the shortest and the longest sequences:
  Low quality: ≥30%
  Medium quality: ≥10% and <30%
  High quality: <10%
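
The thresholds above can be expressed as a small classifier. The following Python sketch is only an illustration and is not the code used by PDA. The function names are hypothetical, and two details are assumptions because the text does not define them: gaps and ambiguous bases are counted as every character other than A, C, G or T over all alignment positions, and the length difference is taken relative to the longest ungapped sequence.

```python
def classify_num_sequences(n):
    """Quality category for the number of sequences in the alignment."""
    if n > 10:
        return "high"
    if n >= 6:
        return "medium"
    return "low"  # 2-5 sequences


def classify_percentage(pct):
    """Quality category shared by the two percentage criteria."""
    if pct < 10:
        return "high"
    if pct < 30:
        return "medium"
    return "low"  # 30% or more


def alignment_quality(aligned_seqs):
    """Classify an alignment (list of equal-length strings) on the three criteria.

    Assumptions (not stated in the source): any character other than
    A, C, G, T counts as a gap or ambiguous base, and the length
    difference is measured relative to the longest ungapped sequence.
    """
    if len(aligned_seqs) < 2:
        raise ValueError("an alignment needs at least two sequences")

    # Ungapped length of every sequence.
    lengths = [len(s.replace("-", "")) for s in aligned_seqs]
    longest, shortest = max(lengths), min(lengths)
    if longest == 0:
        raise ValueError("the alignment contains no bases")

    total = sum(len(s) for s in aligned_seqs)
    bad = sum(1 for s in aligned_seqs for ch in s.upper() if ch not in "ACGT")

    pct_bad = 100.0 * bad / total
    pct_len = 100.0 * (longest - shortest) / longest

    return {
        "num_sequences": classify_num_sequences(len(aligned_seqs)),
        "gaps_or_ambiguous": classify_percentage(pct_bad),
        "length_difference": classify_percentage(pct_len),
    }
```

Calling `alignment_quality` on a list of aligned sequences returns one category per criterion, so a single low-quality criterion remains visible instead of being averaged away.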


2. Check the ALIGNMENT, the DND TREE FILE and the CLUSTAL SCORES (Figure 2b):

The ALIGNMENT is given in CLUSTAL, FASTA and JALVIEW formats. JALVIEW is recommended because it lets you view the alignment in color, edit it manually, export it in different formats, etc. However, if you just want to download the alignment to use it in another program, we recommend downloading the FASTA file. The Clustal log file reports the alignment scores.
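
If the downloaded FASTA file is to be processed with your own scripts, a few lines are enough to load it. The following Python sketch is a minimal reader written for illustration. The function name `read_fasta` is hypothetical, and the sketch assumes that each record starts with a `>` header line and that header lines are unique.

```python
def read_fasta(handle):
    """Read FASTA records from an open text file (or any iterable of lines).

    Returns a dict mapping each header (without the leading '>') to its
    sequence. Assumes headers are unique; a repeated header would
    overwrite the earlier record.
    """
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {header: "".join(parts) for header, parts in records.items()}
```

For example, `with open("alignment.fasta") as fh: seqs = read_fasta(fh)` loads the alignment, where `alignment.fasta` stands for whatever name you gave the downloaded file.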

You can open the DND Tree File as text, but if TREEVIEW is installed on your computer, you will be able to see it graphically. The unrooted tree can be very useful to check any clustering of sequences.


3. Review the parameters describing the QUALITY OF THE DATA SOURCE (Figure 1b):

The following four criteria were used to determine whether the source study was designed as a polymorphism study:

  1. One or more sequences from the alignment can be found in the Popset database.

  2. All the sequences from the alignment have consecutive GenBank accession numbers (for example, AF254110-AF254111-AF254112-AF254113-AF254114-AF254115-AF254116-AF254117-AF254118-etc.).

  3. All the sequences from the alignment share one or more references (in which they were published).

  4. At least one of their references (shared or not) is from one of these journals, which typically publish polymorphism studies: Genetics, Mol. Biol. Evol., J. Mol. Evol., Mol. Phylogenet. Evol. or Mol. Ecol.

Each criterion is assigned one of two values: true (the requirement is met) or false (the requirement is not met).
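
The four criteria can be sketched as boolean checks. The Python code below is only an illustration of the logic and is not the code behind these pages. The record layout (keys `accession`, `in_popset`, `references`) and the function names are hypothetical. It also assumes that an accession number consists of a letter prefix followed by digits, as in the example above.

```python
import re

# Journals named in criterion 4.
POLYMORPHISM_JOURNALS = {
    "Genetics", "Mol. Biol. Evol.", "J. Mol. Evol.",
    "Mol. Phylogenet. Evol.", "Mol. Ecol.",
}


def consecutive_accessions(accessions):
    """True if all accessions share one prefix and their numbers form a run."""
    parsed = []
    for acc in accessions:
        match = re.fullmatch(r"([A-Z]+)(\d+)", acc)
        if match is None:
            return False  # unexpected format (assumption: letters + digits)
        parsed.append((match.group(1), int(match.group(2))))
    if len({prefix for prefix, _ in parsed}) != 1:
        return False
    numbers = sorted(number for _, number in parsed)
    return all(b - a == 1 for a, b in zip(numbers, numbers[1:]))


def data_source_criteria(records):
    """Evaluate the four criteria for a list of sequence records.

    Each record is a dict (hypothetical layout) with:
      'accession'  : accession number, e.g. 'AF254110'
      'in_popset'  : True if the sequence is found in the Popset database
      'references' : list of (reference_id, journal) tuples
    """
    if not records:
        raise ValueError("no sequence records given")

    reference_ids = [{ref_id for ref_id, _ in r["references"]} for r in records]
    journals = {journal for r in records for _, journal in r["references"]}

    return {
        "popset": any(r["in_popset"] for r in records),
        "consecutive_accessions": consecutive_accessions(
            [r["accession"] for r in records]),
        "shared_reference": bool(set.intersection(*reference_ids)),
        "polymorphism_journal": bool(journals & POLYMORPHISM_JOURNALS),
    }
```

The result maps each criterion to true or false, mirroring the two values described above.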


4. Review the ORIGIN OF THE SEQUENCES (Figure 2c):

On the main results page, three parameters are given when available in the GenBank annotations: the country, strain and population variant of each sequence. For a complete description of the sequences, you can follow the links to the DPDB, GenBank and EMBL databases.


5. Check the RESULTS OF THE ANALYSES (Figure 2d):

Check the results of polymorphism, linkage disequilibrium and codon bias, especially when they show extreme values. In those cases, the program may have grouped together sequences from different origins, or the alignment may be of poor quality.


6. REANALYZE THE DATA if needed:

If needed, reanalyze your data using PDA, specifying the input sequences or changing the default parameters. Alternatively, you can use the SNPs-Graphic tool of DPDB (Figure 2e) to reanalyze your data and obtain graphics. This web module estimates several measures of DNA sequence polymorphism, can perform these analyses by the sliding window method, and produces graphic representations. Aligned DNA sequences are supplied as input in FASTA format. See the SNPs-Graphic help for more information.
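
To make the sliding window idea concrete, the Python sketch below computes two common polymorphism measures per window: the number of segregating sites (S) and nucleotide diversity (pi, the average number of pairwise differences per site). It is an illustration only and does not reproduce SNPs-Graphic or PDA. The function name, the default window and step sizes, and the choice to skip every column containing a gap or an ambiguous base are assumptions.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def sliding_window_polymorphism(seqs, window=100, step=25):
    """Return S and pi for each window of an alignment.

    seqs   : list of aligned sequences of equal length
    window : window width in alignment columns (assumed default)
    step   : offset between consecutive windows (assumed default)

    Columns containing gaps or ambiguous bases are skipped (assumption).
    Window coordinates are 0-based, with 'end' exclusive.
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    seqs = [s.upper() for s in seqs]

    results = []
    for start in range(0, length - window + 1, step):
        sites = 0          # columns actually analyzed
        segregating = 0    # columns with more than one base
        diversity = 0.0    # summed per-site pairwise differences
        for i in range(start, start + window):
            column = [s[i] for s in seqs]
            if not VALID_BASES.issuperset(column):
                continue
            sites += 1
            counts = Counter(column)
            if len(counts) > 1:
                segregating += 1
                # Fraction of sequence pairs that differ at this site.
                same = sum(c * c for c in counts.values())
                diversity += (n * n - same) / (n * (n - 1))
        results.append({
            "start": start,
            "end": start + window,
            "sites": sites,
            "S": segregating,
            "pi": diversity / sites if sites else None,
        })
    return results
```

Plotting `pi` or `S` against the window start position gives the kind of profile that a sliding window analysis is meant to show. Windows with few analyzed sites should be interpreted with caution.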


Figure 1  

Figure 2  
