1. Review the parameters about the QUALITY OF THE ALIGNMENT (Figure 1a, Figure 2a):
To assess the quality of an alignment we used three criteria: the number of sequences included in the alignment, the percentage of gaps or ambiguous bases within the alignment, and the percentage difference in length between the shortest and the longest sequences. For each criterion, three qualitative categories were defined: low quality, medium quality and high quality:
Number of sequences | Low (2-5 = !) | Medium (6-10 = K) | High (>10 = J) |
Percentage of gaps / ambiguous bases within the alignment | Low quality (≥30% = !) | Medium quality (≥10%-<30% = K) | High quality (<10% = J) |
Percentage of difference in length between the shortest and the longest sequences | Low quality (≥30% = !) | Medium quality (≥10%-<30% = K) | High quality (<10% = J) |
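The thresholds in the table can be applied to an alignment as in the following minimal sketch. The function names are hypothetical, and two details are assumptions not stated here: gaps and ambiguous bases are counted as any character other than A, C, G or T, and the length difference is expressed relative to the longest ungapped sequence.

```python
VALID_BASES = set("ACGT")

def count_category(n_sequences):
    """Category for the number of sequences (2-5, 6-10, >10)."""
    if n_sequences > 10:
        return "high"
    if n_sequences >= 6:
        return "medium"
    return "low"

def percent_category(percent):
    """Category for a percentage criterion (<10%, 10% to <30%, >=30%)."""
    if percent < 10:
        return "high"
    if percent < 30:
        return "medium"
    return "low"

def alignment_quality(aligned_seqs):
    """aligned_seqs: list of aligned sequences as strings, gaps written as '-'."""
    total_cells = sum(len(s) for s in aligned_seqs)
    # Assumption: anything that is not A, C, G or T is a gap or ambiguous base.
    bad_cells = sum(1 for s in aligned_seqs for c in s.upper() if c not in VALID_BASES)
    gap_percent = 100.0 * bad_cells / total_cells

    # Assumption: sequence length is measured without gap characters,
    # and the difference is taken relative to the longest sequence.
    lengths = [len(s.replace("-", "")) for s in aligned_seqs]
    length_diff_percent = 100.0 * (max(lengths) - min(lengths)) / max(lengths)

    return {
        "number_of_sequences": count_category(len(aligned_seqs)),
        "gaps_or_ambiguous": percent_category(gap_percent),
        "length_difference": percent_category(length_diff_percent),
    }

if __name__ == "__main__":
    example = ["ACGTACGTAC", "ACGT------", "ACGTACNNAC"]
    print(alignment_quality(example))
```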
2. Check the ALIGNMENT, the DND TREE FILE and the CLUSTAL SCORES (Figure 2b):
The ALIGNMENT is given in CLUSTAL, FASTA and JALVIEW formats. JALVIEW is recommended because it lets you view the alignment in color, edit it manually, export it in different formats, etc. However, if you just want to download the alignment for use in another program, we recommend downloading the FASTA file. The Clustal log file reports the alignment scores.
You can open the DND Tree File as text, but if TREEVIEW is installed on your computer, you will be able to view it graphically. The unrooted tree can be very useful for checking any clustering of sequences.
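If the FASTA file is downloaded for use in another program, it can be read with a few lines of code. The following is a minimal sketch, assuming the usual FASTA layout (a header line starting with ">" followed by one or more sequence lines); the function name `parse_fasta` is hypothetical.

```python
def parse_fasta(lines):
    """Return a dict mapping each FASTA header to its full sequence string."""
    records = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()  # header text without the leading '>'
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequences may span several lines
    return {header: "".join(parts) for header, parts in records.items()}

if __name__ == "__main__":
    text = ">seq1\nACGT--AC\nGGTA\n>seq2\nACGTTTAC\nGGTA\n"
    for header, sequence in parse_fasta(text.splitlines()).items():
        print(header, len(sequence), sequence)
```

Passing an open file object instead of a list of lines also works, since the function only iterates over its argument.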
3. Review the parameters about the QUALITY OF THE DATA SOURCE (Figure 1b):
The following four criteria were used to determine whether the study had a polymorphism goal:
- One or more sequences from the alignment can be found in the PopSet database.
- All the sequences from the alignment have consecutive GenBank accession numbers (for example, AF254110-AF254111-AF254112-AF254113-AF254114-AF254115-AF254116-AF254117-AF254118-etc.)
- All the sequences from the alignment share one or more references (in which they were published).
- At least one of their references (shared or not) is from one of the journals that typically publish polymorphism studies: Genetics, Mol. Biol. Evol., J. Mol. Evol., Mol. Phylogenet. Evol. or Mol. Ecol.
Each criterion is assigned one of two values: true (meets the requirement) or false (does not meet the requirement).
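The four true/false criteria above can be sketched as follows. This is an illustration only: the record layout (a dict with `accession`, `references` and `journals` keys), the function names and the way PopSet membership is supplied are all assumptions, and accession numbers are assumed to be letters followed by digits with no version suffix.

```python
import re

POLYMORPHISM_JOURNALS = {
    "Genetics", "Mol. Biol. Evol.", "J. Mol. Evol.",
    "Mol. Phylogenet. Evol.", "Mol. Ecol.",
}

def consecutive_accessions(accessions):
    """True if all accessions share a prefix and their numbers form a run."""
    parsed = []
    for acc in accessions:
        match = re.fullmatch(r"([A-Z]+)(\d+)", acc)
        if match is None:
            return False
        parsed.append((match.group(1), int(match.group(2))))
    prefixes = {prefix for prefix, _ in parsed}
    numbers = sorted(number for _, number in parsed)
    steps_ok = all(b - a == 1 for a, b in zip(numbers, numbers[1:]))
    return len(prefixes) == 1 and steps_ok

def data_source_criteria(records, popset_accessions):
    """records: list of dicts (hypothetical layout); popset_accessions: set of str."""
    accessions = [r["accession"] for r in records]
    reference_sets = [set(r["references"]) for r in records]
    shared = set.intersection(*reference_sets) if reference_sets else set()
    return {
        "in_popset": any(a in popset_accessions for a in accessions),
        "consecutive_accessions": consecutive_accessions(accessions),
        "shared_reference": bool(shared),
        "polymorphism_journal": any(
            set(r["journals"]) & POLYMORPHISM_JOURNALS for r in records
        ),
    }

if __name__ == "__main__":
    records = [
        {"accession": "AF254110", "references": {"ref1"}, "journals": {"Genetics"}},
        {"accession": "AF254111", "references": {"ref1"}, "journals": {"Genetics"}},
        {"accession": "AF254112", "references": {"ref1", "ref2"}, "journals": {"Genetics"}},
    ]
    print(data_source_criteria(records, popset_accessions=set()))
```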
4. Review the ORIGIN OF THE SEQUENCES (Figure 2c):
On the main results page, three parameters are given when available in the GenBank annotations: the country, strain and population variant of each sequence. For a complete description of the sequences, you can follow the links to the DPDB, GenBank and EMBL databases.
5. Check the RESULTS OF THE ANALYSES (Figure 2d):
Check the results of polymorphism, linkage disequilibrium and codon bias, especially when they show extreme values. In those cases, the program may have grouped together sequences from different origins, or the alignment may be poor.
6. REANALYZE THE DATA if needed:
If needed, reanalyze your data using PDA, specifying the input sequences or changing the default parameters. Alternatively, you can use the SNPs-Graphic tool of DPDB (Figure 2e) to reanalyze your data and obtain graphics. This is a web module that estimates several measures of DNA sequence polymorphism and can perform these analyses by the sliding-window method, producing graphic representations. Aligned DNA sequences are supplied as input in FASTA format. See the SNPs-Graphic help for more information.
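To illustrate what a sliding-window analysis does, the sketch below computes one simple polymorphism measure, the mean proportion of pairwise differences, in successive windows along an alignment. It is not the SNPs-Graphic implementation: the function names, the default window and step sizes, and the choice to skip sites where either sequence has a gap or ambiguous base are all assumptions made for the example.

```python
from itertools import combinations

VALID_BASES = set("ACGT")

def pairwise_diversity(seqs):
    """Mean proportion of differing sites over all pairs of sequences."""
    proportions = []
    for a, b in combinations(seqs, 2):
        compared = 0
        differences = 0
        for x, y in zip(a.upper(), b.upper()):
            # Assumption: only sites with an unambiguous base in both
            # sequences are compared.
            if x in VALID_BASES and y in VALID_BASES:
                compared += 1
                if x != y:
                    differences += 1
        if compared:
            proportions.append(differences / compared)
    return sum(proportions) / len(proportions) if proportions else 0.0

def sliding_window_diversity(seqs, window=100, step=25):
    """Return (window start, diversity) pairs along the alignment."""
    length = min(len(s) for s in seqs)
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start, pairwise_diversity(chunk)))
    return results

if __name__ == "__main__":
    alignment = ["AAAAAAAA", "AAAATAAA", "AAAAAAAA"]
    for start, value in sliding_window_diversity(alignment, window=4, step=4):
        print(start, round(value, 4))
```

Plotting the returned values against the window start positions gives the kind of profile a sliding-window graphic shows; smaller steps give a smoother curve at the cost of more windows.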