Current version: 2.0

 Index

1. Introduction to PDA

2. The PDA Search form

Input data
Main parameters
Alignment parameters

3. Algorithm of Maximization of the Number of Informative Sites in the aligned sequences (AMNIS)

4. PDA Output

HTML output
MySQL database
Quality parameters
Description of all the diversity estimations

5. Histogram Maker tool

6. Managing your submissions

7. Download PDA

 

 

1. Introduction to PDA:

PDA (Pipeline Diversity Analysis) is a collection of programs and modules, mainly written in Perl, that can automatically:

  1. search a large source of heterogeneous DNA sequences for potentially polymorphic sequences,

  2. extract and sort them out by gene, species and extent of similarity,

  3. align the different groups, and

  4. estimate the genetic diversity in different functional regions.

PDA has a user-friendly, web-based interface where the user can select the sequences to be analyzed and the parameters to be used. Sequences can be retrieved from either Genbank (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide; see NCBI's Disclaimer and Copyright notice) or the DPDB database (http://dpdb.uab.es) by providing a list of organisms, genes or accession numbers. Low-quality sequences from large-scale sequencing projects (i.e. working-draft sequences), which account for most of the missing data, are excluded from the analysis. Alternatively, sequences can be entered manually in Fasta or Genbank format. All sequences are grouped by organism and gene, and each group is aligned using one of the three available alignment programs. Finally, different analyses are performed on the different functional regions of the genes.

 
Typical usage of this program:

The researcher only needs to go to the PDA main page, choose Genbank or DPDB as the database to search, and type the species/gene names or accession numbers. Optionally, the user can change some parameter values, such as the analyses to be performed, the gene regions to be studied separately, or some alignment quality parameters. The alignment parameters, such as the alignment program to be used, can also be changed.

 

When the job is submitted, it is added to the batch queue, and the user has to wait until the results are ready. The results include a database containing all the sequences and measures of DNA diversity, and all the alignments performed, in different formats, including one for the Java applet visualizer and editor Jalview. Each sequence included in each alignment is linked to the Genbank and EMBL databases. In addition, the output includes HTML pages with summary statistics and the estimations, and a histogram maker tool for graphical display of the results.


The PDA Menu allows you to navigate through the different pages of the Web site:

  1. The PDA Search page, from which you input your data and submit your jobs.

  2. The Previous IDs page, from which you can manage your submissions.

  3. The Help page (this page)

  4. The Example page, which contains a submission to PDA searching for polymorphic sequences in the Cetacean group.

  5. The Download Source Code page, from which you can download the source code of PDA and install it locally on your computer.

  6. The Bugs in Older Versions page.

  7. The Contact us page, where you can find information about the authors and how to cite PDA.

A second menu (PDA Databases) contains quick links to the secondary databases that have been created with PDA. Currently, these are:

  1. The DPDB database: the Drosophila Polymorphism DataBase contains all the polymorphic sequences in the Drosophila genus.

  2. The MamPol database: contains all the polymorphic sequences in mammalian species (excluding humans).

 

2. The PDA Search form:

Input data:

First, you have to indicate whether you want the sequences:

  1. to be retrieved from a database, or

  2. to be provided by yourself in Fasta or Genbank format.

If you want the sequences to be retrieved from a database, go to the Input from database layer and choose the source database from which the sequences will be obtained. Currently, two options are available: Genbank or DPDB. Note that DPDB is a database of nucleotide sequences from the Drosophila genus, so use it only if you are interested in sequences from this taxonomic group. Then, choose whether you will enter a list of organisms and/or genes, or a list of accession numbers from the source database. Each item must be on a separate line. For example:

 

Otherwise, if you want to enter the sequences yourself, go to the Input sequences manually layer and paste them into the form, or use the file browser to select the appropriate file on your computer. The program can read two formats, Fasta and Genbank. Follow the instructions given below:


Fasta format:

Each new sequence begins with a line >HEADER. The sequence follows on the next lines until the next >HEADER line is found. Note that you cannot specify any sequence annotation using this format. However, you can specify the organism and gene names using the following syntax in the header:

>Organism|gene

Example:
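The header convention can be illustrated with a minimal Python reader. This is a sketch, not PDA's own Perl parser: the function name read_fasta is hypothetical, the sequences in the usage lines are toy data, and keeping the whole header as the organism name (with an empty gene) when no | is present is an assumption.

```python
def read_fasta(text):
    """Parse FASTA text into (organism, gene, sequence) tuples (sketch).

    A header ">Organism|gene" is split on the first "|". Headers without "|"
    keep the whole header as the organism and an empty gene (assumption).
    """
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:  # sequence lines before any header are ignored
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))

    parsed = []
    for head, seq in records:
        organism, _, gene = head.partition("|")
        parsed.append((organism.strip(), gene.strip(), seq))
    return parsed


# Toy input: the sequence can span several lines after each header.
example = """>Drosophila auraria|histone H2A
ACGTACGTAC
ACGT
>Drosophila auraria|COII
TTGACCA
"""
for organism, gene, seq in read_fasta(example):
    print(organism, "/", gene, "/", len(seq), "bp")
# Drosophila auraria / histone H2A / 14 bp
# Drosophila auraria / COII / 7 bp
```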

 

Genbank format:

You can retrieve sequences in Genbank format from the Genbank database. Note that each record must end with a line containing only two slashes (//), as in the example:

 
LOCUS       AY147419                 486 bp    DNA     linear   INV 12-JAN-2004
DEFINITION  Drosophila auraria isolate DPAJ1325 histone H2A gene, partial cds;
            H2A/H2B intergenic spacer region, complete sequence; and histone
            H2B gene, partial cds.
ACCESSION   AY147419
VERSION     AY147419.1  GI:27368158
KEYWORDS    .
SOURCE      Drosophila auraria
  ORGANISM  Drosophila auraria
            Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota;
            Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha;
            Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 486)
  AUTHORS   Yang,Y., Zhang,Y.P., Qian,Y.H. and Zeng,Q.T.
  TITLE     Phylogenetic relationships of Drosophila melanogaster species group
            deduced from spacer regions of histone gene H2A-H2B
  JOURNAL   Mol. Phylogenet. Evol. 30 (2), 336-343 (2004)
   PUBMED   14715225
REFERENCE   2  (bases 1 to 486)
  AUTHORS   Yang,Y. and Zhang,Y.P.
  TITLE     Direct Submission
  JOURNAL   Submitted (03-SEP-2002) Life Science, Hubei University, Wuhan,
            Hubei 430062, China
FEATURES             Location/Qualifiers
     source          1..486
                     /organism="Drosophila auraria"
                     /mol_type="genomic DNA"
                     /isolate="DPAJ1325"
                     /db_xref="taxon:47315"
     mRNA            complement(<1..>132)
                     /product="histone H2A"
     CDS             complement(<1..132)
                     /codon_start=1
                     /product="histone H2A"
                     /protein_id="AAN87198.1"
                     /db_xref="GI:27368159"
                     /translation="MSGRGKGGKVKGKAKSRSNRAGLQFPVGRIHRLLRKGNYAERVG
                     "
     misc_feature    133..348
                     /note="histone H2A/H2B intergenic spacer"
     mRNA            <349..>486
                     /product="histone H2B"
     CDS             349..>486
                     /codon_start=1
                     /product="histone H2B"
                     /protein_id="AAN87199.1"
                     /db_xref="GI:27368160"
                     /translation="MPPKTSGKAAKKAGKAQKTSPRTTRRRSGRGRRALLIYIYKVLK
                     QV"
ORIGIN     
        1 accaacgcgc tcggcatagt tgcccttgcg gagcagacga tgaatacggc cgactgggaa
       61 ctggagtccg gcacggtttg agcgggactt tgcctttccc tttactttgc caccttttcc
      121 acgaccagac attttctttt atttcacttt attcacttca cacagacgaa gaacgaatgt
      181 tggtgcaacc caagttgtca cgaatttata cttttaggtc tgcttgcgcg ttcagtttgg
      241 ggtgggtcga cttagacctg aaaacattgc tggaaaaaaa gtataagagc gaacaccaaa
      301 actcgtctac catattaagt gaatcgtcaa gtgaagtgaa gtgaaataat gccgccgaaa
      361 actagtggaa aggcagccaa gaaggctggc aaggctcaga agacatcacc aagaacgaca
      421 agaagaagaa gcggaagagg aaggagagct ttgcttatct acatttacaa ggtcctgaag
      481 caggtc
//

LOCUS       AY147418                 486 bp    DNA     linear   INV 12-JAN-2004
DEFINITION  Drosophila auraria isolate YY001325 histone H2A gene, partial cds;
            H2A/H2B intergenic spacer region, complete sequence; and histone
            H2B gene, partial cds.
ACCESSION   AY147418
VERSION     AY147418.1  GI:27368155
KEYWORDS    .
SOURCE      Drosophila auraria
  ORGANISM  Drosophila auraria
            Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota;
            Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha;
            Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 486)
  AUTHORS   Yang,Y., Zhang,Y.P., Qian,Y.H. and Zeng,Q.T.
  TITLE     Phylogenetic relationships of Drosophila melanogaster species group
            deduced from spacer regions of histone gene H2A-H2B
  JOURNAL   Mol. Phylogenet. Evol. 30 (2), 336-343 (2004)
   PUBMED   14715225
REFERENCE   2  (bases 1 to 486)
  AUTHORS   Yang,Y. and Zhang,Y.P.
  TITLE     Direct Submission
  JOURNAL   Submitted (03-SEP-2002) Life Science, Hubei University, Wuhan,
            Hubei 430062, China
FEATURES             Location/Qualifiers
     source          1..486
                     /organism="Drosophila auraria"
                     /mol_type="genomic DNA"
                     /isolate="YY001325"
                     /db_xref="taxon:47315"
     mRNA            complement(<1..>132)
                     /product="histone H2A"
     CDS             complement(<1..132)
                     /codon_start=1
                     /product="histone H2A"
                     /protein_id="AAN87196.1"
                     /db_xref="GI:27368156"
                     /translation="MSGRGKGGKVKGKAKSRSNRAGLQFPVGRIHRLLRKGNYAERVG
                     "
     misc_feature    133..348
                     /note="histone H2A/H2B intergenic spacer"
     mRNA            <349..>486
                     /product="histone H2B"
     CDS             349..>486
                     /codon_start=1
                     /product="histone H2B"
                     /protein_id="AAN87197.1"
                     /db_xref="GI:27368157"
                     /translation="MPPKTSGKAAKKAGKAQKTSPRTTRRRSGRGRRALLIYIYKVLK
                     QV"
ORIGIN     
        1 accaacgcgc tcggcatagt tgcccttgcg gagcagacga tgaatacggc cgactgggaa
       61 ctggagtccg gcacggtttg agcgggactt tgcctttccc tttactttgc caccttttcc
      121 acgaccagac attttctttt atttcacttt attcacttca cacagacgaa gaacgaatgt
      181 tggtgcaacc caagttgtca cgaatttata cttttaggtc tgcttgcgcg ttcagtttgg
      241 ggtgggtcga cttagacctg aaaacattgc tggaaaaaaa gtataagagc gaacaccaaa
      301 actcgtctac catattaagt gaatcgtcaa gtgaagtgaa gtgaaataat gccgccgaaa
      361 actagtggaa aggcagccaa gaaggctggc aaggctcaga agacatcacc aagaacgaca
      421 agaagaagaa gcggaagagg aaggagagct ttgcttatct acatttacaa ggtcctgaag
      481 caggtc
//

LOCUS       AF461290                 384 bp    DNA     linear   INV 30-SEP-2003
DEFINITION  Drosophila auraria cytochrome oxidase II (COII) gene, partial cds;
            mitochondrial gene for mitochondrial product.
ACCESSION   AF461290
VERSION     AF461290.1  GI:20805483
KEYWORDS    .
SOURCE      mitochondrion Drosophila auraria
  ORGANISM  Drosophila auraria
            Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota;
            Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha;
            Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 384)

...


 

Main parameters:

By default, PDA analyzes the alignments in terms of Polymorphism, Synonymous and non-synonymous substitutions (1), Linkage disequilibrium (2) and Codon bias (3). However, you can disable the last three analyses. Linkage disequilibrium is estimated in non-overlapping sliding windows whose length is measured in segregating sites. The default window length is 50 segregating sites; you can change it, but the web server uses at most 100 consecutive segregating sites per window. Be aware that windows with the same number of segregating sites may differ in length in nucleotides.
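The windowing rule can be sketched in Python as follows. This is an illustration under stated assumptions, not PDA's code: windows are consecutive, non-overlapping blocks of segregating sites, a final shorter window is kept, and gaps are simply ignored when deciding whether a column is segregating (PDA itself excludes gapped and ambiguous positions). The function names are hypothetical.

```python
def segregating_sites(alignment):
    """0-based columns where the sequences carry more than one base (gaps ignored)."""
    sites = []
    for i in range(len(alignment[0])):
        bases = {s[i] for s in alignment if s[i] != "-"}
        if len(bases) > 1:
            sites.append(i)
    return sites


def ld_windows(sites, window=50, max_window=100):
    """Split segregating-site positions into non-overlapping windows.

    Each window holds `window` consecutive segregating sites; the last one may
    hold fewer (assumption). `max_window` mirrors the web limit of 100 sites.
    """
    if not 1 <= window <= max_window:
        raise ValueError(f"window must be between 1 and {max_window}")
    return [sites[i:i + window] for i in range(0, len(sites), window)]


# Windows with the same number of segregating sites can span very different
# numbers of nucleotides:
windows = ld_windows([3, 10, 11, 40, 41, 90], window=2)
print(windows)                              # [[3, 10], [11, 40], [41, 90]]
print([w[-1] - w[0] + 1 for w in windows])  # [8, 30, 50]
```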

You can also choose which Gene regions you want PDA to analyze separately (4). Regions are those defined in the Genbank annotations, which are also used in the DPDB database. CDS and exon are selected by default, as are introns if you enable the Linkage disequilibrium analysis.

The Minimum number of sequences per category is the minimum number of sequences required to build an alignment (5). The default value is 2.

The Minimum Identity Score for pairwise comparisons is the minimum percentage identity required between every pair of sequences in a final alignment (excluding gapped positions) (6). This parameter is intended to separate fragmented or incorrectly annotated sequences into different subgroups, but it is also very useful to place sequences of the same organism that come from well-separated populations into different alignments. A sequence is never used in more than one subgroup. The default value is 95%.
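The identity score can be illustrated with a short Python sketch. Assumptions: a column is skipped when either sequence of the pair has a gap there, identity is the percentage of matching bases among the remaining columns, and the comparison is case-insensitive. How PDA splits failing sequences into subgroups is not shown; the hypothetical passes_identity only checks whether an alignment meets the threshold.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Percent identity of two aligned sequences over columns where neither has a gap."""
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def passes_identity(alignment, min_identity=95.0):
    """True if every pair of sequences reaches the minimum identity score."""
    return all(pairwise_identity(a, b) >= min_identity
               for a, b in combinations(alignment, 2))


print(pairwise_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
print(pairwise_identity("ACGT-", "ACGTA"))            # 100.0 (gapped column skipped)
```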

The Minimum sequences length in the analyses parameter makes the program exclude sequences shorter than the value set, in nucleotides (7). The default value is 100.

You can choose whether or not to use the Algorithm of Maximization of the Number of Informative Sites in the aligned sequences (AMNIS) (8); see below.

Finally, you can enter a Title to help you remember the contents of the analysis (9).

 

Alignment parameters:

In this section, you can choose which alignment program to use, or modify the default ClustalW parameters, although these have already been optimized for polymorphism analyses.

Muscle and T-Coffee were added in PDA version 2 because they have been shown to perform better on alignments with many gaps. We suggest using them when analyzing non-coding data (introns, UTRs, etc.).

Fast pairwise alignment:

  • K-tup: Can be 1 or 2 for proteins; 1 to 4 for DNA. Increase this to increase speed; decrease to improve sensitivity.
     

  • Window length: The number of diagonals around each "top" diagonal that are considered. Decrease for speed; increase for greater sensitivity.
     

  • Score type: The similarity scores may be expressed as raw scores (number of identical residues minus a "gap penalty" for each gap) or as percentage scores. If the sequences are of very different lengths, percentage scores make more sense.
     

  • Topdiag: The number of best diagonals in the imaginary dot-matrix plot that are considered. Decrease (must be greater than zero) to increase speed; increase to improve sensitivity.
     

  • Pairgap: The number of matching residues that must be found in order to introduce a gap. This should be larger than the K-tup value. It has little effect on speed or sensitivity.
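These fast pairwise parameters control a Wilbur and Lipman style word search: exact matches of length K-tup are located in the dot-matrix plot, the best diagonals are kept (Topdiag), and a band of diagonals around each of them is examined (Window length). The Python sketch below illustrates only that diagonal selection step. It is a simplification with hypothetical function names, not ClustalW's implementation, and it ignores Pairgap and the Score type.

```python
from collections import Counter


def ktuple_diagonals(a, b, ktup=2):
    """Count exact k-tuple (word) matches on each diagonal (offset i - j) of the dot plot."""
    positions = {}
    for j in range(len(b) - ktup + 1):
        positions.setdefault(b[j:j + ktup], []).append(j)
    hits = Counter()
    for i in range(len(a) - ktup + 1):
        for j in positions.get(a[i:i + ktup], []):
            hits[i - j] += 1
    return hits


def best_diagonals(a, b, ktup=2, topdiag=4, window=4):
    """Keep the `topdiag` richest diagonals and widen each by +/- `window` diagonals."""
    hits = ktuple_diagonals(a, b, ktup)
    top = [d for d, _ in hits.most_common(topdiag)]
    return sorted({d + w for d in top for w in range(-window, window + 1)})


# A larger ktup finds far fewer word hits (faster, but less sensitive):
print(sum(ktuple_diagonals("ACGTACGTTA", "ACGAACGTTA", ktup=1).values()))  # 26
print(sum(ktuple_diagonals("ACGTACGTTA", "ACGAACGTTA", ktup=4).values()))  # 4
```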
     

Multiple alignment:

  • Gap open: Reduce this to encourage gaps of all sizes; increase it to discourage them. Terminal gaps are penalized the same as all other gaps unless End gaps is deselected. BEWARE of making this too small (about 5 or below); if the penalty is too small, the program may prefer to align each sequence opposite one long gap.
     

  • End gaps: Here you can select whether or not terminal gaps are penalized.
     

  • Gap extension: Reduce this to encourage longer gaps; increase it to shorten them. Terminal gaps are penalized the same as all other gaps. BEWARE of making this too small (about 5 or below); if the penalty is too small, the program may prefer to align each sequence opposite one long gap.
     

  • Gap distances: Penalty applied according to the distance between gaps.
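For readers who run ClustalW themselves, the parameters above correspond to ClustalW command-line options. The Python sketch below only builds the argument list. The executable name clustalw, the mapping of PDA's End gaps option to ClustalW's -ENDGAPS switch, the input file name and the example values are all assumptions; they are not PDA's optimized defaults or its actual invocation.

```python
def clustalw_args(infile, *, ktuple, window, score, topdiags, pairgap,
                  gapopen, gapext, gapdist, endgaps):
    """Build a ClustalW command line as a list (sketch; all values must be supplied)."""
    score = score.upper()
    if score not in ("PERCENT", "ABSOLUTE"):
        raise ValueError("score must be PERCENT or ABSOLUTE")
    args = [
        "clustalw",            # executable name is an assumption (may be clustalw2)
        f"-INFILE={infile}",
        "-ALIGN",
        # Fast pairwise alignment
        f"-KTUPLE={ktuple}",
        f"-WINDOW={window}",
        f"-SCORE={score}",
        f"-TOPDIAGS={topdiags}",
        f"-PAIRGAP={pairgap}",
        # Multiple alignment
        f"-GAPOPEN={gapopen}",
        f"-GAPEXT={gapext}",
        f"-GAPDIST={gapdist}",
    ]
    if endgaps:
        # Assumed mapping of the End gaps option; check your ClustalW help text.
        args.append("-ENDGAPS")
    return args


args = clustalw_args("group1.fasta", ktuple=2, window=4, score="percent",
                     topdiags=4, pairgap=5, gapopen=15.0, gapext=6.66,
                     gapdist=4, endgaps=False)
print(" ".join(args))
# With ClustalW installed: subprocess.run(args, check=True)
```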

 

3. AMNIS: Algorithm of Maximization of the Number of Informative Sites in the aligned sequences:

After the sequences are grouped and aligned, an optional further step is taken before estimating the polymorphism parameters. It is referred to here as the Algorithm of Maximization of the Number of Informative Sites in the aligned sequences (AMNIS):

  1. First, PDA groups the sequences by length, so that sequences in the same group do not differ by more than 20% in length.

  2. It then calculates the number of informative sites in each cumulative set of groups (e.g. group 1 (the longest sequences), groups 1 + 2, groups 1 + 2 + 3, etc.).

  3. Finally, PDA uses the set of sequences with the largest number of informative sites (in some cases discarding the shortest sequences).

Example:

>LDseq000001
AGCATCGATCATCATCTACGTACGTACGATCAGCCGATGCGCGGGGTTTT   50
>LDseq000002
AGCATCGATCATCGTGTACGTACGTACGATCAGCCGATGCGCGGGGTTTT   50
>LDseq000003
AGCATCGATCATCATCTACGTACGTACGATCAGCCGATGCGCGGGG----   46
>LDseq000004
AGCATCGATCATCATCTACGTACGTACGATCAGCCGATGCGC--------   42
>LDseq000005
AGCATCG-------------------------------------------    7
>LDseq000006
AGCATCG-------------------------------------------    7
>LDseq000007
AGCATCG-------------------------------------------    7
>LDseq000008
AGCATCG-------------------------------------------    7

In this example, the first four sequences would be assigned to group 1, and the last four sequences to group 2. The number of informative sites (excluding gaps) using the first four sequences (group 1) is:

Informative sites group 1 = 42 non-gapped positions * 4 sequences = 168

Using the cumulative set of sequences of group 1 + group 2, we have more sequences, but fewer non-gapped positions:

Informative sites group 1+2 = 7 non-gapped positions * 8 sequences = 56

Therefore, we obtain more informative sites by using only the four long sequences and discarding the short ones. PDA would show the alignment with all the sequences, but would use only the four long sequences to calculate the polymorphism estimates (n = 4 in the results).
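The steps above can be sketched in Python and checked against this example. The grouping rule is an assumption: sequences are walked from longest to shortest (by number of non-gap characters), and a new group starts when a sequence is more than 20% shorter than the longest sequence of the current group. Informative sites are counted as in the example: columns without a gap in any sequence of the set, times the number of sequences. Function names are hypothetical.

```python
def ungapped_length(seq):
    """Number of non-gap characters in an aligned sequence."""
    return sum(1 for c in seq if c != "-")


def group_by_length(alignment, max_diff=0.20):
    """Greedy length grouping (assumed rule), longest sequences first."""
    ordered = sorted(alignment, key=ungapped_length, reverse=True)
    groups = []
    for seq in ordered:
        longest = ungapped_length(groups[-1][0]) if groups else 0
        if groups and ungapped_length(seq) >= (1 - max_diff) * longest:
            groups[-1].append(seq)
        else:
            groups.append([seq])
    return groups


def informative_sites(seqs):
    """Columns with no gap in any sequence, times the number of sequences."""
    complete = sum(1 for i in range(len(seqs[0])) if all(s[i] != "-" for s in seqs))
    return complete * len(seqs)


def amnis(alignment, max_diff=0.20):
    """Return the cumulative set of groups with the most informative sites."""
    best, best_score, cumulative = [], -1, []
    for group in group_by_length(alignment, max_diff):
        cumulative = cumulative + group
        score = informative_sites(cumulative)
        if score > best_score:  # ties keep the smaller, earlier set
            best, best_score = cumulative, score
    return best, best_score


FULL = "AGCATCGATCATCATCTACGTACGTACGATCAGCCGATGCGCGGGGTTTT"
EXAMPLE = [
    FULL,
    "AGCATCGATCATCGTGTACGTACGTACGATCAGCCGATGCGCGGGGTTTT",
    FULL[:46] + "-" * 4,
    FULL[:42] + "-" * 8,
] + ["AGCATCG" + "-" * 43] * 4

used, score = amnis(EXAMPLE)
print(len(used), score)  # 4 168
```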

To distinguish which sequences were used in the analyses from those which were discarded, PDA uses a color code:

    for sequences that were included in the estimates, and
    for sequences that were NOT included in the estimates.

You can find this information in the ALIGNMENTS page.

If you always want to use all the sequences in the alignments to calculate the estimates, deselect the following box in the initial form:

    Use the Algorithm of Maximization of the Number of Informative Sites in the aligned sequences (AMNIS)

 

4. PDA output:

The output of the program is stored on our server. It includes a set of HTML pages with the results of the alignments and analyses, as well as a MySQL database containing the same information plus all the sequences used and their annotations.
 

HTML output:

The main HTML results page is divided into several sections:

  1. The input parameters are described in the first section (organism/gene, input database, analyses performed, etc.)
     

  2. The second section allows you to download the database and view the LOG files (where the program records any errors that occur during the analyses)
     

  3. Then, you can see a table with a summary of all the polymorphic sets that have been analyzed, and the quality parameters for each of the analyses.

    From this table you can access all the analyses performed. If you click on a gene region of a polymorphic set, all the information for that analysis unit is displayed in different sections:
     

    • Polymorphic Set information

      • Setcode ID

      • Organism

      • Gene
         

    • Analysis unit information

      • Accession number

      • Region

      • Number of sequences

      • % excluded sites in the alignment (due to gaps or ambiguous positions).

      • Minimum/Maximum sequence lengths (% difference)

      • Date
         

    • Performed analyses

      • Polymorphism

      • Synonymous and non-synonymous substitutions

      • Linkage disequilibrium

      • Codon bias
         

    • Alignment

      • Clustal align (text file)

      • Fasta align (text file)

      • Jalview align (graphic viewer / editor)

      • Log file

      • DND Tree file (phylogeny)
         

    • Sequences used

      • Assigned sequence ID (identification number used in alignments)

      • Genbank / EMBL accession number (and links to both databases)

      • Location (corresponding to the original sequence from Genbank/EMBL)

      • Information on the source of the sequences (country, strain, population variant)
         

    • All diversity estimates

      • Polymorphism, SNPs list and SNPs-Graphic tool

      • Synonymous and non-synonymous substitutions

      • Linkage disequilibrium

      • Codon bias

     

  4. Histogram Maker Tool
     


 

MySQL database:

The database can be downloaded as a compressed file from the main HTML results page, or requested directly from our server (see the quick help). Its table structure is represented in the following figure:


 

Quality parameters:

PDA v.2 provides several measures of the quality of each data set so that the user can assess the confidence in the data source and in the estimates. A quick guide is also supplied explaining how to use these quality measures and how to easily reanalyze the data.

Quality assessment of the alignments

To assess the quality of an alignment we use three criteria:

  1. The number of sequences included in the alignment

  2. The percentage of gaps or ambiguous bases within the alignment

  3. The percent difference between the shortest and the longest sequences

Three qualitative categories (high, medium and low quality) are defined for each criterion and are shown in the main output table, so that the confidence in the results can be assessed at a glance. You should pay special attention to the quality of the alignments and review them after each analysis.
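The raw numbers behind the three alignment criteria can be computed as in the Python sketch below. The thresholds that separate high, medium and low quality are not given here, so the sketch stops at the raw values. It also assumes that a column counts as excluded when any sequence has a gap or a non-ACGT character there, and that the length difference is (longest - shortest) / longest, using ungapped lengths. The function name is hypothetical.

```python
def alignment_quality(alignment):
    """Return (n, % excluded columns, % length difference) for an alignment."""
    n, m = len(alignment), len(alignment[0])
    # Assumed rule: a column is excluded if any sequence has a gap or non-ACGT base.
    excluded = sum(1 for i in range(m)
                   if any(s[i].upper() not in "ACGT" for s in alignment))
    # Assumed rule: lengths are counted without gaps; difference relative to the longest.
    lengths = [sum(1 for c in s if c != "-") for s in alignment]
    pct_excluded = 100.0 * excluded / m
    pct_length_diff = 100.0 * (max(lengths) - min(lengths)) / max(lengths)
    return n, pct_excluded, pct_length_diff


print(alignment_quality(["ACGTACGTAC", "ACGTACGT--", "ACGNACGTAC"]))  # (3, 30.0, 20.0)
```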

Quality assessment of the data sources

Regarding the data source, we use four criteria to determine whether the sequences of a polymorphic set come from a population study:

  1. One or more sequences from the alignment are stored in the PopSet database

  2. All the sequences have consecutive GenBank accession numbers

  3. All the sequences share at least one reference, and

  4. One or more references are from journals that typically publish polymorphism studies (Genetics, Molecular Biology and Evolution, Journal of Molecular Evolution, Molecular Phylogenetics and Evolution or Molecular Ecology)

A tick mark indicates that a data set satisfies the corresponding criterion.
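Criterion 2 (consecutive accession numbers) can be checked mechanically. The Python sketch below assumes the usual Genbank accession layout of a letter prefix followed by digits (such as AY147418) and ignores version suffixes such as .1; the function name is hypothetical, and PDA's own check may treat mixed prefixes or duplicates differently.

```python
import re


def consecutive_accessions(accessions):
    """True if all accessions share one letter prefix and form an unbroken numeric run."""
    parsed = []
    for acc in accessions:
        # Assumed layout: letters followed by digits, optional version suffix (.1, .2, ...)
        match = re.fullmatch(r"([A-Z]+)(\d+)(?:\.\d+)?", acc.strip().upper())
        if match is None:
            return False
        parsed.append((match.group(1), int(match.group(2))))
    if len({prefix for prefix, _ in parsed}) != 1:
        return False
    numbers = sorted(number for _, number in parsed)
    return all(b - a == 1 for a, b in zip(numbers, numbers[1:]))


print(consecutive_accessions(["AY147419", "AY147418.1"]))  # True
print(consecutive_accessions(["AY147418", "AF461290"]))    # False
```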

All these quality parameters are shown in the main output table for each analysis unit:

 

Description of all the parameters:
 

Polymorphism
G+C: G+C percentage
n: Number of sequences
m: Number of nucleotides in the alignment (alignment length)
start: First nucleotide in the analysis
end: Last nucleotide in the analysis
excluded: Number of excluded positions in the analysis (gap or ambiguous positions)
analyzed: Number of analyzed positions
S: Number of segregating sites, S (Nei 1987)
S / m: Number of segregating sites per nucleotide
Theta (from S): θ per site, estimated from S
Eta: Minimum number of mutations, η (Tajima 1996)
Eta / m: Minimum number of mutations η per site
Theta (from Eta): θ per site, estimated from η
Theta: θ per DNA sequence, estimated from S (Tajima 1993)
V Theta (norec): Variance of θ per DNA sequence (estimated from S), without recombination
dV Theta (norec): Standard deviation of θ per DNA sequence (estimated from S), without recombination
V Theta (rec): Variance of θ per DNA sequence (estimated from S), free recombination
dV Theta (rec): Standard deviation of θ per DNA sequence (estimated from S), free recombination
Theta per site: θ per site, estimated from S (Nei 1987)
V Theta_site (norec): Variance of θ per site (estimated from S), without recombination
dV Theta_site (norec): Standard deviation of θ per site (estimated from S), without recombination
V Theta_site (rec): Variance of θ per site (estimated from S), free recombination
dV Theta_site (rec): Standard deviation of θ per site (estimated from S), free recombination
FSM Theta (from pi): θ per site, estimated from π under a Finite Sites Model (Tajima 1996)
FSM Theta (from S): θ per site, estimated from S under a Finite Sites Model
FSM Theta (from eta): θ per site, estimated from η under a Finite Sites Model
k: Average number of nucleotide differences (Tajima 1983)
Vst k (norec): Stochastic variance of k, without recombination
Vs k (norec): Sampling variance of k, without recombination
V k (norec): Total variance of k, without recombination
Vst k (rec): Stochastic variance of k, free recombination
Vs k (rec): Sampling variance of k, free recombination
V k (rec): Total variance of k, free recombination
Pi: Average number of nucleotide differences per site, π (Nei 1987)
Pi_JC: Average number of nucleotide differences per site (π) with Jukes and Cantor correction (Nei 1987; Jukes and Cantor 1969)
V Pi_JC: Variance of π (Jukes and Cantor)
Tajima: Tajima's D (Tajima 1989)
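Several of the estimators in this table follow standard formulas that can be illustrated in a few lines. The Python sketch below computes S, θ per site from S (Watterson's estimator, S / a1 / m), k, π = k / m and Tajima's D for a toy alignment. Assumptions: only columns with an unambiguous base in every sequence are analyzed, and the variances, Jukes and Cantor corrections, η-based and finite-sites estimators are not reproduced; PDA's own implementation may differ in these details. Function names are hypothetical.

```python
import math
from itertools import combinations


def analyzed_columns(alignment):
    """Columns with an unambiguous base (A, C, G, T) in every sequence."""
    return [i for i in range(len(alignment[0]))
            if all(s[i].upper() in "ACGT" for s in alignment)]


def diversity(alignment):
    """S, theta per site (from S), k, pi and Tajima's D for n >= 2 sequences (sketch)."""
    cols = analyzed_columns(alignment)
    n, m = len(alignment), len(cols)
    seqs = ["".join(s[i].upper() for i in cols) for s in alignment]

    # S: segregating sites among the analyzed columns
    S = sum(1 for i in range(m) if len({s[i] for s in seqs}) > 1)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_seq = S / a1          # Watterson's estimator per sequence
    theta_site = theta_seq / m  # ... and per analyzed site

    # k: mean number of differences over all pairs of sequences; pi = k / m
    pairs = list(combinations(seqs, 2))
    k = sum(sum(x != y for x, y in zip(p, q)) for p, q in pairs) / len(pairs)
    pi = k / m

    # Tajima's D with the constants from Tajima (1989)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    var = e1 * S + e2 * S * (S - 1)
    D = (k - theta_seq) / math.sqrt(var) if var > 0 else float("nan")
    return {"S": S, "theta_site": theta_site, "k": k, "pi": pi, "D": D}


aln = ["A" * 10, "A" * 9 + "T", "A" * 8 + "CT", "A" * 8 + "CT"]
print(diversity(aln))
```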

Synonymous and non-synonymous changes
# Stop codons: Number of STOP codons
# Codons: Total number of codons
# Sites: Total number of sites (nucleotides)
# SS: Number of synonymous sites
Pi (SS): π in synonymous sites
Pi_JC (SS): π (with Jukes and Cantor correction) in synonymous sites
V Pi_JC (SS): Variance of π (with Jukes and Cantor correction) in synonymous sites
# NS: Number of non-synonymous sites
Pi (NS): π in non-synonymous sites
Pi_JC (NS): π (with Jukes and Cantor correction) in non-synonymous sites
V Pi_JC (NS): Variance of π (with Jukes and Cantor correction) in non-synonymous sites

Pairwise comparisons (between pairs of sequences)
SynDif: Number of synonymous differences (Nei and Gojobori 1986)
NSynDif: Number of non-synonymous differences
SynPos: Number of synonymous positions
NSynPos: Number of non-synonymous positions
Ks: Number of synonymous polymorphisms per synonymous site
Ka: Number of non-synonymous polymorphisms per non-synonymous site
Ks_JC: Number of synonymous polymorphisms per synonymous site, with Jukes and Cantor correction
Ka_JC: Number of non-synonymous polymorphisms per non-synonymous site, with Jukes and Cantor correction
V Ks_JC: Variance of Ks (Jukes and Cantor)
V Ka_JC: Variance of Ka (Jukes and Cantor)
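Given the counts in this table, Ks and Ka are simple ratios (SynDif / SynPos and NSynDif / NSynPos), and the Jukes and Cantor correction is d = -3/4 ln(1 - 4p/3). The Python sketch below applies it to hypothetical counts for one pair of sequences. Counting sites and differences along Nei and Gojobori pathways, and the variance formulas, are not reproduced; the function names are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes and Cantor (1969) correction of a proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ks_ka(syn_dif, nsyn_dif, syn_pos, nsyn_pos):
    """Ks, Ka and their Jukes-Cantor values for one pair of coding sequences."""
    ks = syn_dif / syn_pos
    ka = nsyn_dif / nsyn_pos
    return {"Ks": ks, "Ka": ka, "Ks_JC": jukes_cantor(ks), "Ka_JC": jukes_cantor(ka)}


# Hypothetical counts: 3 synonymous differences over 30 synonymous positions,
# 1 non-synonymous difference over 90 non-synonymous positions.
print(ks_ka(3, 1, 30.0, 90.0))
```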

Linkage disequilibrium
# Polym sites: Number of polymorphic sites
# Pairwise comparisons: Number of pairwise comparisons (between pairs of polymorphic sites)
ZnS: Average of R² over all pairwise comparisons (Kelly 1997)

Pairwise comparisons (between pairs of polymorphic sites)
Dist: Distance between the two compared sites (in nucleotides, considering alignment gaps)
D: D (Lewontin and Kojima 1960)
D': D' (Lewontin 1964)
R: R (Hill and Robertson 1968)
R2: R²
SChi2: χ² test
Sig (SChi2): Significance of the χ² test
Fisher: Fisher test
Sig (Fisher): Significance of the Fisher test
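The pairwise statistics in this table follow standard definitions, which the Python sketch below reproduces for two biallelic sites. Assumptions: each site is given as the alleles carried by the sequences, the allele of the first sequence is the reference allele (so the signs of D, D' and R depend on that choice), D' keeps its sign, and the χ² value is computed as n·R², which equals the Pearson χ² of the 2 × 2 haplotype table without continuity correction. The Fisher test and significance levels are not included, the function names are hypothetical, and PDA's own code may differ.

```python
import math
from itertools import combinations


def ld_pair(site1, site2):
    """D, D', R, R2 and chi-square (n * R2) between two biallelic sites.

    site1 and site2 hold the allele of each sequence at each site; the allele
    of the first sequence is the reference allele (assumption).
    """
    n = len(site1)
    ref1, ref2 = site1[0], site2[0]
    p1 = sum(x == ref1 for x in site1) / n
    p2 = sum(y == ref2 for y in site2) / n
    p12 = sum(x == ref1 and y == ref2 for x, y in zip(site1, site2)) / n
    D = p12 - p1 * p2
    if D >= 0:
        d_max = min(p1 * (1 - p2), (1 - p1) * p2)
    else:
        d_max = min(p1 * p2, (1 - p1) * (1 - p2))
    d_prime = D / d_max if d_max > 0 else float("nan")
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    r = D / math.sqrt(denom) if denom > 0 else float("nan")
    return {"D": D, "D'": d_prime, "R": r, "R2": r * r, "Chi2": n * r * r}


def zns(sites):
    """Kelly's ZnS: mean R2 over all pairs of polymorphic sites."""
    r2 = [ld_pair(s1, s2)["R2"] for s1, s2 in combinations(sites, 2)]
    return sum(r2) / len(r2)


# Four sequences, three polymorphic sites (alleles listed per sequence).
sites = ["AAGG", "CCTT", "CTCT"]
print(ld_pair(sites[0], sites[1]))  # complete association: D' = R2 = 1.0
print(zns(sites))
```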

Codon Bias
# Codons: Total number of codons
# Sites: Total number of sites (nucleotides)
# Stop Codons: Number of STOP codons
# Codons (no !): Number of codons excluding STOP codons
# Codons (no !, W, M): Number of codons excluding STOP codons and those coding for an amino acid with a single codon (W = Trp and M = Met; nuclear universal genetic code)
ENC: Effective Number of Codons (Wright 1990)
CAI: Codon Adaptation Index, taking the submitted sequences themselves as the reference set (Sharp and Li 1987)
SChi2: Scaled χ² (Shields 1988)
G+Cc: G+C content in all coding positions (Wright 1990)
G+C2: G+C content in second positions of codons
G+C3: G+C content in third positions of codons
RSCU: Relative Synonymous Codon Usage (Sharp 1986)
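RSCU and G+C3 are simple enough to sketch directly. The Python below uses the standard (nuclear universal) genetic code, reads codons in frame from the first base, skips stop codons and codons containing non-ACGT characters, and computes RSCU as the observed count of a codon divided by the mean count over its synonymous codon family. ENC, CAI and the scaled χ² are not reproduced, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard (nuclear universal) genetic code; codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
FAMILY = {}
for _codon, _aa in CODE.items():
    FAMILY.setdefault(_aa, []).append(_codon)


def sense_codons(seq):
    """In-frame codons from the first base, keeping only sense codons (no stops, no N)."""
    seq = seq.upper()
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return [c for c in triplets if c in CODE and CODE[c] != "*"]


def rscu(seq):
    """RSCU = observed count / mean count of the codon's synonymous family."""
    counts = Counter(sense_codons(seq))
    result = {}
    for codon, observed in counts.items():
        family = FAMILY[CODE[codon]]
        family_total = sum(counts[c] for c in family)
        result[codon] = observed * len(family) / family_total
    return result


def gc3(seq):
    """G+C content at third positions of sense codons."""
    thirds = [c[2] for c in sense_codons(seq)]
    return sum(b in "GC" for b in thirds) / len(thirds)


print(rscu("TTTTTTTTCATG"))  # TTT ~ 1.333, TTC ~ 0.667, ATG = 1.0
print(gc3("TTTTTTTTCATG"))   # 0.5
```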

 

5. Histogram maker tool:

This tool allows you to create customized histograms for any estimated parameter. You can:

  1. Restrict the information to an organism and/or gene

  2. Choose the type of distribution (which parameter you want to represent in the histogram)

  3. Restrict the information to specific gene regions

  4. Choose the order of the categories: histogram or frequency

  5. Set a number of categories (number of bars in the histogram or frequency representation)
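The Histogram Maker builds the plots on the server; the Python sketch below only illustrates the binning idea behind setting a number of categories. Equal-width categories between the minimum and maximum value, and the conversion of counts to relative frequencies, are assumptions about how the tool works, and the function names are hypothetical.

```python
def histogram(values, categories=10):
    """Equal-width binning: return (bin_edges, counts); the maximum goes in the last bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / categories or 1.0  # all values equal: use a dummy width
    counts = [0] * categories
    for v in values:
        counts[min(int((v - lo) / width), categories - 1)] += 1
    edges = [lo + i * width for i in range(categories + 1)]
    return edges, counts


def frequencies(counts):
    """Relative frequencies (proportions) corresponding to histogram counts."""
    total = sum(counts)
    return [c / total for c in counts]


edges, counts = histogram([0.0, 1.0, 2.0, 3.0, 4.0], categories=2)
print(edges, counts, frequencies(counts))  # [0.0, 2.0, 4.0] [2, 3] [0.4, 0.6]
```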

 

6. Managing your submissions:

When submitting a job, PDA v.2 can optionally store user information, allowing users to enter the Previous IDs section of the Web site and manage their previous analyses, either to revisit or to delete them. This new feature extends the Request by ID option of PDA v.1, which is still available.

To use this feature, you have to create a new PDA account with your e-mail address and a password. Go to the end of the PDA Search form, follow the link Help / Create a new account, and fill in the form as follows:

After that, each time you submit a job, remember to fill in your e-mail details in the PDA Search form, so that the results are saved to your account:

To manage your submitted jobs, go to the Previous IDs section of the Web site and enter your e-mail and password in the appropriate boxes:

You will obtain a list of all your submitted jobs. From this list you can either revisit your analyses or delete them from the PDA server:

You can also retrieve your previous analyses by identification number (especially useful if you did not enter your e-mail address, in which case they are not registered to your account):

 

7. Download PDA:

The source code of PDA can be downloaded from our FTP site under the GNU General Public License (GPL) and used locally. This is highly recommended for large analyses. Please go to the "Download Source Code" section and register to get access to the FTP server. See the Installation file that comes with the distribution, which can also be downloaded separately (1,238 KB), for specific instructions on how to use the program locally.



UAB       DGM