We have developed grid services for the applications listed below. The table indicates the platforms on which the grid service runs and how many CPU Years of computation have been performed using the service.
Application | Ported To BOINC | Ported To Linux X86 | Ported To Windows | Ported To Mac OS X | CPU Years^{0} |
BLAST^{1} | — | ✓ | — | — | N/A |
Clustal W | ✓ | ✓ | ✓ | ✓ | N/A |
CNS | ✓ | ✓ | ✓ | — | 25.43 |
Complab^{2} | — | ✓ | ✓ | ✓ | 6.66 |
GARLI | ✓ | ✓ | ✓ | ✓ | 48191.37 |
GARLI-partition | — | ✓ | ✓ | ✓ | 117.86 |
gsi^{3} | — | ✓ | ✓ | ✓ | 142.64 |
HMMPfam | ✓ | ✓ | ✓ | ✓ | 8193.54 |
IM | ✓ | ✓ | ✓ | ✓ | 0.18 |
LAMARC | ✓ | ✓ | ✓ | ✓ | N/A |
MARXAN | ✓ | — | ✓ | — | 5248.17 |
MDIV | ✓ | ✓ | ✓ | ✓ | 13.25 |
Migrate-N | ✓ | ✓ | ✓ | ✓ | 0.00 |
Modeltest | ✓ | ✓ | ✓ | ✓ | N/A |
MrBayes | ✓ | ✓ | ✓ | ✓ | N/A |
ms | ✓ | ✓ | ✓ | ✓ | N/A |
Muscle | ✓ | ✓ | ✓ | ✓ | N/A |
PAUP*^{4} | — | ✓ | — | — | 0.02 |
Phyml | ✓ | ✓ | ✓ | ✓ | N/A |
Pknots | ✓ | ✓ | ✓ | ✓ | N/A |
Seq-gen | ✓ | ✓ | ✓ | ✓ | N/A |
S_{nn} | ✓ | ✓ | ✓ | ✓ | N/A |
ssearch | ✓ | ✓ | ✓ | ✓ | N/A |
Structure | — | ✓ | ✓ | ✓ | 2.64 |
^{1}BLAST has not been ported to BOINC because it requires large pre-staged databases.
^{2}Complab has not been ported to BOINC because it is implemented in Java.
^{3}gsi has not been ported to BOINC because it is implemented in R.
^{4}PAUP has not been ported to BOINC because of licensing restrictions.
BLAST
BLAST (Basic Local Alignment Search Tool) is a package of applications that quickly compares a specified nucleotide or amino acid sequence (or a batch of such sequences) against an existing database of known sequences, translating between nucleotides and amino acids where necessary, and determines the similarities among them. Various builds of BLAST are specialized for certain types of searches. This implementation uses a build of the general version, blastall. Visit the BLAST web site for more information and source distribution.
Clustal W
A program for multiple DNA or protein sequence alignment based on a progressive alignment strategy, in which the most similar sequences are aligned first to produce groups of aligned sequences, and these groups are then aligned together. Initial pairwise alignments provide the basis for calculating a distance between sequences. These distances are used to produce a neighbor-joining guide tree that directs the progressive alignment. An affine gap penalty formula with independently selectable weights for gap opening and gap extension is used for scoring gaps, and a choice of several user-selectable weight matrices is available for scoring matches. Both slow/accurate and fast/approximate alignment algorithms are available. A neighbor-joining tree can be produced from the multiple alignment, and bootstrap analysis can be performed. The program accommodates several input formats and also produces output in several formats.
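The affine gap scoring mentioned above can be illustrated with a minimal sketch. The function name and the default weights are hypothetical, chosen only for illustration; they are not Clustal W's code or defaults. The sketch assumes one common convention, in which the opening weight is charged once per gap and the extension weight once per gapped position.

```python
def affine_gap_penalty(gap_length, gap_open=10.0, gap_extend=0.5):
    """Penalty for one contiguous gap of `gap_length` positions.

    Assumed convention: the opening weight is charged once, and the
    extension weight is charged for every position in the gap.
    Both default weights are illustrative values only.
    """
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return 0.0
    return gap_open + gap_extend * gap_length


if __name__ == "__main__":
    # One long gap is cheaper than several short gaps of the same total length.
    print(affine_gap_penalty(4))
    print(4 * affine_gap_penalty(1))
```

Because the opening weight is paid only once, a single gap of four positions costs less than four separate gaps of one position each, which is the behavior that separate open and extension weights are meant to produce.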
CNS
Crystallography & NMR System (CNS) is the result of an international collaborative effort among several research groups. The program has been designed to provide a flexible multi-level hierarchical approach for the most commonly used algorithms in macromolecular structure determination. Highlights include heavy atom searching, experimental phasing (including MAD and MIR), density modification, crystallographic refinement with maximum likelihood targets, and NMR structure calculation using NOEs, J-coupling, chemical shift, and dipolar coupling data. For more information visit the CNS web site.
Complab
Complab uses agent-based simulation models to study the geographic spread of Avian Influenza across the United States, to quantify the relative pandemic risk of US cities and determine optimal intervention strategies.
GARLI
GARLI is a phylogenetic analysis program that performs heuristic searches for the maximum likelihood tree. For more information, visit the GARLI web site.
GARLI-partition
GARLI is a phylogenetic analysis program that performs heuristic searches for the maximum likelihood tree. For more information, visit the GARLI web site.
gsi
The genealogical sorting index (gsi) is a statistic to quantify the common ancestry of labeled tips on a tree. For more information, visit genealogicalsorting.org.
HMMPfam
hmmpfam is part of the HMMER package. HMMER is an implementation of profile hidden Markov models (profile HMMs) for biological sequence analysis. Profile HMMs are statistical models of multiple sequence alignments. They capture position-specific information about how conserved each column of the alignment is, and which residues are likely. For more information, visit the HMMER web site.
IM
IM is a program, written with Rasmus Nielsen, for the fitting of an isolation model with migration to haplotype data drawn from two closely related species or populations. IM is based on a method originally developed by Rasmus Nielsen and John Wakeley (Nielsen and Wakeley 2001 GENETICS 158:885). Large numbers of loci can be studied simultaneously, and different mutation models can be used. For more information, visit the IM web site.
LAMARC
LAMARC is a package of programs for computing population parameters, such as population size, population growth rate and migration rates by using likelihoods for samples of data (sequences, microsatellites, and electrophoretic polymorphisms) from populations. It approximates the summation of likelihood over all possible gene genealogies that could explain the observed sample. For more information visit the LAMARC web site.
MARXAN
MARXAN is software that delivers decision support for reserve system design. MARXAN finds reasonably efficient solutions to the problem of selecting a system of spatially cohesive sites that meet a suite of biodiversity targets. Given reasonably uniform data on species, habitats and/or other relevant biodiversity features and surrogates for a number of planning units (as many as 20,000) MARXAN minimizes the cost (a weighted sum of area and boundary length, Possingham, Ball and Andelman 2001) while meeting user-defined biodiversity targets. For more information visit the MARXAN web site.
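The objective described above (a weighted sum of area and boundary length, subject to biodiversity targets) can be sketched as follows. This is a simplified illustration under stated assumptions, not MARXAN's implementation: the function names, the dictionary-based data layout, and the treatment of boundary length (only boundaries shared between a selected and an unselected planning unit are counted) are all hypothetical.

```python
def reserve_cost(selected, area, shared_boundary, boundary_weight=1.0):
    """Weighted sum of total area and exposed boundary length.

    selected        -- set of planning-unit ids in the candidate reserve
    area            -- dict: unit id -> area
    shared_boundary -- dict: (unit_a, unit_b) -> length of their common edge
    boundary_weight -- relative weight of boundary length (assumed parameter)
    """
    total_area = sum(area[unit] for unit in selected)
    # An edge is exposed when exactly one of its two units is selected.
    exposed = sum(
        length
        for (unit_a, unit_b), length in shared_boundary.items()
        if (unit_a in selected) != (unit_b in selected)
    )
    return total_area + boundary_weight * exposed


def targets_met(selected, amount, targets):
    """True if every feature reaches its target within the selected units.

    amount  -- dict: (unit id, feature) -> amount of the feature in that unit
    targets -- dict: feature -> required total amount
    """
    for feature, target in targets.items():
        held = sum(amount.get((unit, feature), 0.0) for unit in selected)
        if held < target:
            return False
    return True
```

A search procedure would then look for a set of planning units with low `reserve_cost` among those for which `targets_met` returns true. A larger boundary weight favors more compact, spatially cohesive reserves.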
MDIV
MDIV is a program that simultaneously estimates divergence times and migration rates between two populations under the infinite sites model or under a finite sites model. The author also provides a web interface to the program; please note that this resource is not considered part of The Lattice Project.
Migrate-N
Migrate estimates population parameters, effective population sizes and migration rates of n populations, using genetic data. It is a maximum likelihood estimator and uses a coalescent theory approach taking into account history of mutations and uncertainty of the genealogy. For more information visit the Migrate web site.
Modeltest
Modeltest is a program that assists in evaluating the fit of a range of nucleotide substitution models to DNA sequence data through a hierarchical series of hypothesis tests. Two test statistics, likelihood ratio and Akaike information criterion (AIC), are provided to compare model pairs that differ in complexity. The program is used together with PAUP, typically as part of data exploration prior to more extensive phylogenetic analysis.
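The two statistics named above have simple standard forms, sketched here for reference. The function names are hypothetical and the code is not taken from Modeltest; it only shows the textbook definitions applied to log-likelihood scores.

```python
def likelihood_ratio_statistic(lnl_simple, lnl_complex):
    """Likelihood ratio statistic for a pair of nested models.

    lnl_simple  -- maximized log-likelihood of the simpler (null) model
    lnl_complex -- maximized log-likelihood of the more complex model
    """
    return 2.0 * (lnl_complex - lnl_simple)


def aic(lnl, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2.0 * n_params - 2.0 * lnl
```

For nested models, the likelihood ratio statistic is usually compared against a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. AIC does not require nesting: the model with the lowest value is preferred.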
MrBayes
MrBayes is a program for phylogenetic analysis of nucleotide or amino acid sequence data using a Bayesian approach. A Metropolis-coupled Markov Chain Monte Carlo (MCMCMC) algorithm is used with multiple chains, all but one of which are heated. The chains are used to sample model space through a process of parameter modification proposal and acceptance/rejection steps (also called cycles or generations). The heating raises the posterior probability to a power, beta, which has the effect of increasing the magnitude of change between steps in the Markov Chain. After each cycle, an exchange between a heated and an unheated chain is evaluated in a manner similar to the other proposal and acceptance/rejection mechanisms. The motivation for MCMCMC is to increase mixing. After the process becomes stationary, the frequency with which parameter values are visited represents an estimate of their underlying posterior probability. A choice of several commonly used likelihood models is available, as are choices for starting tree (user-defined and random), data partitions (e.g., by codon position), and Markov Chain Monte Carlo parameters. For more information, visit the MrBayes web site.
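The chain-exchange step described above can be sketched as follows. This is a minimal illustration that assumes the standard formulation of Metropolis coupling, in which a chain with heat beta targets the posterior raised to the power beta. The function names, the incremental heating schedule, and the default `temp` value are assumptions made for the example; this is not MrBayes source code.

```python
import math
import random


def chain_heat(index, temp=0.2):
    """Heat (beta) of chain `index`; index 0 is the unheated (cold) chain.

    The incremental schedule and the default `temp` are illustrative assumptions.
    """
    return 1.0 / (1.0 + temp * index)


def swap_log_ratio(log_post_i, log_post_j, beta_i, beta_j):
    """Log of the acceptance ratio for exchanging the states of chains i and j.

    log_post_i and log_post_j are the unheated log posterior values of the
    current states of the two chains.
    """
    return (beta_i - beta_j) * (log_post_j - log_post_i)


def accept_swap(log_post_i, log_post_j, beta_i, beta_j, rng=random):
    """Metropolis decision: accept the exchange with probability min(1, ratio)."""
    log_r = swap_log_ratio(log_post_i, log_post_j, beta_i, beta_j)
    if log_r >= 0.0:
        return True
    return rng.random() < math.exp(log_r)
```

Under these assumptions, an exchange is always accepted when the hotter chain currently holds the state with the higher posterior. This is how good states discovered by the freely moving heated chains are passed to the cold chain, from which samples are recorded.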
ms
ms is an application which generates random independent sequence samples according to a simple Wright-Fisher neutral model. If invoked with a minimum of options, it produces samples under a panmictic, equilibrium model without recombination. By specifying various options on the command line the model can include recombination, island-model type structure, gene conversion and simple population size changes in the past. Visit the ms web site for more information and source distribution.
Muscle
MUSCLE is a program for creating multiple alignments of amino acid or nucleotide sequences. A range of options is provided that give you the choice of optimizing accuracy, speed, or some compromise between the two. For more information, visit the Muscle web site.
PAUP*
Phylogenetic Analysis Using Parsimony (*and Other Methods) is a program for phylogenetic analysis using parsimony, maximum likelihood, and distance methods. The program features an extensive selection of analysis options and model choices, and accommodates DNA, RNA, protein and general data types. Among the many strengths of the program are the rich array of options for dealing with phylogenetic trees including importing, combining, comparing, constraining, rooting and testing hypotheses. For more information, visit the PAUP* web site.
Phyml
PHYML is software implementing a fast and accurate heuristic for estimating maximum likelihood phylogenies from alignments of homologous sequences. Large DNA and protein sequence data sets can be analysed under a broad range of substitution models. Extensive simulations showed that the topological accuracy of PHYML compares favourably with that of other existing programs, while being much faster. Visit the Phyml web site for more information.
Pknots
PKNOTS implements a dynamic programming algorithm for predicting optimal RNA secondary structure, including pseudoknots. The implementation generates the optimal minimum energy structure for a single RNA sequence, using standard RNA folding thermodynamic parameters augmented by a few parameters describing the thermodynamic stability of pseudoknots. Although the time and memory demands of the algorithm are steep, it is believed to be the first algorithm to be able to fold optimal minimum energy pseudoknotted RNAs with the accepted RNA thermodynamic model. Visit the Pknots web site for more information.
Seq-Gen
Seq-Gen is a program that will simulate the evolution of nucleotide or amino acid sequences along a phylogeny, using common models of the substitution process. A range of models of molecular evolution are implemented including the general reversible model. State frequencies and other parameters of the model may be given and site-specific rate heterogeneity may also be incorporated in a number of ways. Any number of trees may be read in and the program will produce any number of data sets for each tree. Thus large sets of replicate simulations can be easily created. It has been designed to be a general purpose simulator that incorporates most of the commonly used (and computationally tractable) models of molecular sequence evolution. Visit the Seq-Gen web site for more information and source distribution.
S_{nn}
A program that performs the "nearest neighbor test" to detect genetic differentiation. Given a matrix of pairwise differences between sampled sequences, S_{nn} can determine genetic differentiation among local sample groups. Visit the Permtest web site for more information and source distribution.
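The statistic itself can be sketched from a pairwise difference matrix. This follows one common definition of the nearest-neighbor statistic, in which each sequence contributes the fraction of its nearest neighbors that come from the same locality and ties are shared equally. The function name and input layout are hypothetical, and the permutation step that assesses significance is omitted.

```python
def nearest_neighbor_statistic(distances, localities):
    """Average fraction of nearest neighbors sampled from the same locality.

    distances  -- square matrix (list of lists) of pairwise differences
    localities -- locality label of each sequence, in matrix order
    """
    n = len(localities)
    if n < 2:
        raise ValueError("at least two sequences are required")
    total = 0.0
    for j in range(n):
        others = [k for k in range(n) if k != j]
        nearest = min(distances[j][k] for k in others)
        # All sequences tied at the minimum distance count as nearest neighbors.
        tied = [k for k in others if distances[j][k] == nearest]
        same = sum(1 for k in tied if localities[k] == localities[j])
        total += same / len(tied)
    return total / n
```

Values near 1 indicate that sequences tend to be most similar to others from the same locality, which suggests differentiation. In practice, the observed value is compared with values obtained after randomly reassigning the locality labels.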
SSEARCH
SSEARCH performs a rigorous Smith-Waterman search for similarity between a query sequence and a group of sequences of the same type (nucleic acid or protein). This may be the most sensitive method available for similarity searches, but compared to BLAST and FastA it can be very slow. SSEARCH uses William Pearson's implementation of the method of Smith and Waterman (Advances in Applied Mathematics 2; 482-489 (1981)) and is available as part of the FASTA package.
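The core of a Smith-Waterman search is a dynamic programming recurrence over the two sequences. The sketch below is a deliberately simplified illustration, not the SSEARCH implementation: it assumes a fixed match/mismatch score and a linear gap penalty in place of a substitution matrix and affine gaps, and all names and score values are hypothetical.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences.

    Simplifying assumptions: fixed match/mismatch scores and a linear
    gap penalty. Only the score is returned, not the alignment itself.
    """
    cols = len(subject) + 1
    prev = [0] * cols  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            pair = match if query[i - 1] == subject[j - 1] else mismatch
            # A cell never drops below zero, which is what makes the alignment local.
            curr[j] = max(0, prev[j - 1] + pair, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_database(query, database):
    """Score the query against every entry and sort by decreasing score.

    database -- dict: sequence name -> sequence (hypothetical layout)
    """
    scores = {name: smith_waterman_score(query, seq) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Every cell of the matrix is filled for every database sequence, so the running time grows with the product of the query and database lengths. This explains why the method is slow relative to heuristic searches.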
Structure
Structure is a free software package for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed. It can be applied to most of the commonly-used genetic markers, including microsatellites, RFLPs and SNPs. For more information, visit the Pritchard Lab web site.