Network 4.5.1.6 User Guide
Version date: 31 December 2009
Copyright © 2009 Fluxus Technology Ltd. All rights reserved.

Legal Disclaimer: This user guide shall not be interpreted as a warranty of any kind. Use of the software is subject to the terms under www.fluxus-engineering.com/network_terms.htm

Table of Contents
1. Overview
1.1 Scope of application
1.2 Network building options
1.3 Further complexity reduction options
1.4 Complementary options
2. Work Flow
2.1 Overview of the general work flow and the RM-MJ work flow
2.1.1 Variable data
2.1.2 Preparation of variable data sets for Network
2.1.3 Weights
2.1.4 Frequency
2.1.5 Epsilon (in MJ), Connection Cost / Greedy FHP (in MJ)
2.1.6 Reduction threshold r and out file option (in RM network option)
2.1.7 MP option to clean up networks
2.1.8 Star Contraction option: Use for network simplification, or for identification of population expansion events
2.1.9 "Frequency>1" Criterion for networks with large number of taxa
2.1.10 RM-MJ network calculation for reduced complexity
2.2 DNA nucleotide sequence data
2.2.1 Data entry
2.2.2 Initial analyses using the MJ option
2.2.3 Discussing, analysing, and interpreting network results (MJ and RM)
2.2.4 Graphical layout of results
2.2.4.1 Node and pie chart colouring in Network Publisher 1.1.0.6
2.2.5 Verification using the RM option
2.3 RNA nucleotide sequence data
2.3.1 Data entry
2.4 Amino acid nucleotide sequence data
2.4.1 Data entry
2.4.2 Network calculation, analysis, interpretation, and graphics
2.5 STR data (short tandem repeat, microsatellite data)
2.5.1 Data entry
2.5.2 Network calculation, analysis, interpretation, and graphics
2.6 Endonuclease data (RFLP, restriction fragment length data)
2.6.1 Data entry
2.6.2 Network calculation, analysis, interpretation, and graphics
2.7 Binary data
2.7.1 Data entry
2.7.2 Network calculation, analysis, interpretation, and graphics
2.8 Time estimates
2.8.1 Calibration of network mutation rate to a known event
2.8.2 Age estimation of a node in the network
3. Software Limits in Network 4.5.1.6
4. Network 4.5.1.6: Present and Future
5. Feedback: Bug Reports and Enhancement Requests
6. Updates to Network 4.5.1.6 User Guide
7. Updates to Network 4.5.1.0 User Guide (compared to Network 4.5.0.1 User Guide of 24 June 2008)
8. Updates to Network 4.5.0.1 User Guide (compared to Network 4.5.0.0 User Guide of 31 December 2007)
9. Updates to Network 4.5.0.0 User Guide (compared to Network 4.2.0.1 User Guide of 19 September 2007)
10. Updates to Network 4.2.0.1 User Guide (compared to 3 April 2007)

1. Overview

1.1 Scope of application
Network is used to reconstruct phylogenetic networks and trees, to infer ancestral and potential types, evolutionary branchings and variants, and to estimate dates. The algorithms are designed for non-recombining biomolecules. Successful applications include mtDNA, Y-STR, amino acid sequence, RNA, viral DNA and bacterial DNA data, some effectively non-recombining autosomal DNA, and non-biomolecular data such as linguistic data. By contrast, recombining biomolecules yield high-dimensional networks that are difficult to interpret. The work flow, including data preparation and the interpretation of results, is described in detail in the following chapters.

1.2 Network building options
The Network software was developed to reconstruct all possible shortest, least complex phylogenetic trees (i.e. all maximum parsimony, or MP, trees) from a given data set. It includes two network-building options, which can be used independently of each other.
The reduced median (RM) network algorithm requires binary data (example: at nucleotide position 16092, each taxon must have either T or C).
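Before choosing between RM (binary data only) and MJ (multi-state data allowed), it can help to check how many states each position actually has. The Python sketch below is a hypothetical pre-processing helper, not part of Network or DNA Alignment: it counts the distinct states per aligned position and reports whether a data set is binary throughout. It assumes the sequences are already aligned and, as a simplification, treats ambiguity codes such as N as ordinary states.

```python
from typing import Dict, List


def classify_positions(seqs: Dict[str, str], positions: List[int]) -> Dict[int, str]:
    """Label each position 'invariant', 'binary' or 'multi-state'.

    Hypothetical helper: `seqs` maps taxon names to aligned sequences whose
    i-th character belongs to positions[i]. Ambiguity codes such as N are
    counted as ordinary states (an assumption of this sketch).
    """
    if any(len(s) != len(positions) for s in seqs.values()):
        raise ValueError("every sequence must have one character per position")
    labels = {}
    for i, pos in enumerate(positions):
        # Number of distinct states observed at this position across all taxa.
        n_states = len({s[i] for s in seqs.values()})
        if n_states <= 1:
            labels[pos] = "invariant"
        elif n_states == 2:
            labels[pos] = "binary"
        else:
            labels[pos] = "multi-state"
    return labels


def suitable_for_rm(seqs: Dict[str, str], positions: List[int]) -> bool:
    """True if no position has more than two states (invariant ones are ignored)."""
    return "multi-state" not in classify_positions(seqs, positions).values()


if __name__ == "__main__":
    pos = [16091, 16092, 16093, 16094, 16095]
    # Taxa from Example 1 of section 2.1.1.
    taxa = {"Alice": "TCGAG", "Brenda": "TTCAC", "Chris": "GTGTG", "Doug": "TCGTC"}
    print(suitable_for_rm(taxa, pos))  # True
    # A made-up extra taxon with A at np 16092 makes that position multi-state.
    taxa["Eve"] = "TAGAG"
    print(classify_positions(taxa, pos)[16092], suitable_for_rm(taxa, pos))  # multi-state False
```

A data set for which `suitable_for_rm` returns False can still be analysed with MJ; for RM, the multi-state positions would first have to be recoded or removed.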
To allow interpretation of complex data, RM provides a reduction parameter, the reduction threshold r. If r is set to a sufficiently high value, RM yields a full median network containing all MP trees.
The median-joining (MJ) network algorithm accepts multi-state data (example: at nucleotide position 16092 there can be A, C, G, T, and ambiguities such as N). For larger data sets, the parameter epsilon can be set low to calculate sparse networks quickly, or increased step by step to calculate higher-resolution networks at the cost of longer run times and greater network complexity. If epsilon is set to a sufficiently high value, MJ yields a full median network.
We recommend MJ as the first choice for general use. If the MJ results need to be verified, we recommend also running RM on suitably prepared data (nucleotide FASTA data are easily prepared with the DNA Alignment software).

1.3 Further complexity reduction options
The star contraction option can simplify complex data. The MP option deletes non-MP links from the network, i.e. links that are not used by the shortest trees in the network. For STR, RFLP, or binary data, a combined RM-MJ calculation may be performed to simplify the network.

1.4 Complementary options
Network includes a data editor and a graphics program. FASTA files can be imported and prepared for Network using Fluxus' DNA Alignment software. Higher-quality graphics of Network's results files can be prepared using Fluxus' Network Publisher software.

2. Work Flow

2.1 Overview of the general work flow and the RM-MJ work flow

Fig. 1a: General overview of the work flow
1. Prepare your variable data (DNA Alignment) as an rdf file or in another format (ami, ych, tor, nex, phy). Network will ignore loci (e.g. nucleotide positions) that are invariant throughout your data set.
2. Calculate Network. The MJ method is recommended for new users. Weights: default (= 10); epsilon = 0 (for MJ) or r = 2 (for RM). The calculation writes an out file.
3. Draw Network from the out file.
If you see high-dimensional cubes or large cycles when exploring the network, re-calculate it: change epsilon to 10, 20, 30, etc. (for MJ; for RM change r to 3, 4, 5, etc.) to explore whether and how the network changes. Then re-calculate with changed weights (see detailed notes). If this also leads to poor networks, use only those taxa that contain >1 individuals.
4. When the network is clean and tree-like, or when you have finished exploring, run the MP Option* to purge superfluous links and median vectors from the network (sto file).
5. Draw Network / Network Publisher: lay out the final network graphics in high quality as wmf, emf, bmp or pdf from the sto or out file. Import the emf picture into MS PowerPoint, or the wmf picture into publication/layout software.
* Kill MP if the run time is too long.

Fig. 1b: Specific work flow for the RM-MJ network calculation
1. Prepare your binary variable data (DNA Alignment) as rdf (binary rdf only), ych or tor. Network will ignore loci (e.g. nucleotide positions) that are invariant throughout your data set.
2. Calculate the RM-MJ network: run RM (switch off out file generation), then run MJ on the rmf file. Weights: default (= 10); r = 2 and no out file (for RM); epsilon = 0 (for MJ). The MJ calculation writes an out file.
3. Draw Network from the out file.
If you see high-dimensional cubes or large cycles when exploring the network, re-calculate the RM-MJ network: change epsilon to 10, 20, 30, etc. (for MJ) to explore whether and how the network changes. Then re-calculate with changed weights (see detailed notes). If this also leads to poor networks, use only those taxa that contain >1 individuals.
4. When the network is clean and tree-like, or when you have finished exploring, run the MP Option* to purge superfluous links and median vectors from the network (sto file).
5. Draw Network / Network Publisher: lay out the final network graphics in high quality as wmf, emf, bmp or pdf from the sto or out file. Import the emf picture into MS PowerPoint, or the wmf picture into publication/layout software.
* Kill MP if the run time is too long.

2.1.1 Variable data
Network uses only the variable data from your data file or manually entered data set; any non-variable data it contains are ignored. What do we mean by variable data?

Definition of variable data: By variable data we mean a genetic nucleotide position, a genetic locus, a trait, a linguistic feature, or more generally a "character", that allows you to separate your individuals into at least two groups.

Example 1, variable data: You have an mtDNA data set, and your sequencing range included nucleotide position (np) 16092 for all individuals. In your data, some individuals have C and others T at np 16092. This means that np 16092 holds variable data (for your set of data).

            16091  16092  16093  16094  16095
Alice         T      C      G      A      G
Brenda        T      T      C      A      C
Chris         G      T      G      T      G
Doug          T      C      G      T      C

Example 2, some non-variable data: All individuals in your data set have C at np 16092, so np 16092 is useless for differentiating between the individuals in your data set. This means that np 16092 holds non-variable data for your set of data. You can omit np 16092 and enter only nps 16091 and 16093-16095.

            16091  16092  16093  16094  16095
Alice         T      C      G      A      G
Bruce         T      C      C      A      C
Clarissa      G      C
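Network removes invariant positions on its own, but it can be useful to see the effect of this rule before importing a data set. The Python sketch below is a hypothetical helper (not part of Network or DNA Alignment) that applies the definition above: a position is kept only if at least two different states occur across the taxa. The demonstration data are made up for illustration: they are the Example 1 table with np 16092 set to C for every individual, which leaves nps 16091 and 16093-16095, as in Example 2.

```python
from typing import Dict, List, Tuple


def drop_invariant(seqs: Dict[str, str],
                   positions: List[int]) -> Tuple[Dict[str, str], List[int]]:
    """Remove positions at which every taxon has the same state.

    Hypothetical helper: `seqs` maps taxon names to aligned sequences whose
    i-th character belongs to positions[i].
    """
    if any(len(s) != len(positions) for s in seqs.values()):
        raise ValueError("every sequence must have one character per position")
    # A position is variable if it splits the taxa into at least two groups,
    # i.e. if at least two distinct states occur there.
    keep = [i for i in range(len(positions))
            if len({s[i] for s in seqs.values()}) >= 2]
    reduced = {name: "".join(s[i] for i in keep) for name, s in seqs.items()}
    return reduced, [positions[i] for i in keep]


if __name__ == "__main__":
    # Made-up data: Example 1 with C at np 16092 for all taxa.
    taxa = {"Alice": "TCGAG", "Brenda": "TCCAC", "Chris": "GCGTG", "Doug": "TCGTC"}
    reduced, kept = drop_invariant(taxa, [16091, 16092, 16093, 16094, 16095])
    print(kept)     # [16091, 16093, 16094, 16095]
    print(reduced)  # {'Alice': 'TGAG', 'Brenda': 'TCAC', 'Chris': 'GGTG', 'Doug': 'TGTC'}
```

Keeping the position labels alongside the shortened sequences matters: after filtering, the remaining characters no longer sit at consecutive nucleotide positions.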