-- Version history --

4.3   1. Icarus updates:
       - predicted genes (see --gene-finding option) are now displayed in contigs (on both
         Contig Size and Contig Alignment Viewers);
       - zoom buttons added to read coverage panels instead of log/normal scale switch;
       - small fixes for better performance on low resolution screens.

      2. setup.py added for simplifying QUAST installation process.

      3. Heavy output files (lists of SNPs, GFF with predicted genes) are now compressed
         with gzip. This default behaviour may be cancelled by using --no-gzip option.

      4. Automatic suggestion to use --scaffolds option if assembly contains long continuous
         fragments of N's.

      5. Embedded samtools replaced with sambamba which is significantly faster.

      6. Embedded bowtie2 replaced with BWA-MEM which is more accurate.

      7. E-MEM is also working under OS X now (in addition to Linux).

      8. Removed limitation of not moving QUAST installation dir after the first use.

      9. Fixed bug causing creation of error.log file in current working directory.

      10. Fixed improper Duplication ratio calculation in some MetaQUAST runs.

      11. Requests to NCBI switched to HTTPS which will be mandatory after September 30, 2016.

      12. SILVA database release updated from 119 to 123 (downloaded on the first use).

      13. Fixed several minor bugs.


4.2   1. Icarus update -- improved read coverage panel: 
       - physical coverage added (the coverage of the reference by the paired-end fragments,
         counting the reads and the gap between the paired-end reads as covered);
       - normal/log scale switcher added;
       - samtools max coverage depth increased, so now even very large coverages are shown.
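
         A minimal Python sketch of how physical coverage, as defined above, could be
         computed from read pair coordinates. It is an illustration only, not the
         QUAST/Icarus implementation: the function name physical_coverage and the 0-based
         half-open (start, end) read coordinates are assumptions.

```python
def physical_coverage(ref_len, read_pairs):
    """Per-base physical coverage of a reference of length ref_len.

    read_pairs: list of ((start1, end1), (start2, end2)) tuples, one per
    paired-end fragment, in 0-based half-open reference coordinates.
    """
    # Difference array: +1 where a fragment starts, -1 where it ends.
    diff = [0] * (ref_len + 1)
    for (s1, e1), (s2, e2) in read_pairs:
        # The fragment spans both reads AND the gap between them.
        start = max(0, min(s1, s2))
        end = min(ref_len, max(e1, e2))
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # A running sum turns the difference array into per-base coverage.
    coverage = []
    depth = 0
    for delta in diff[:-1]:
        depth += delta
        coverage.append(depth)
    return coverage
```

         For example, physical_coverage(10, [((0, 2), (4, 6))]) counts positions 2 and 3
         (the gap between the two reads) as covered, although no read overlaps them.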

      2. Icarus update -- search window added:
       - searching contigs and genes/operons;
       - tooltip shows possible suggestions as one types.

      3. Icarus update -- Contig Alignment Viewer changes:
       - coordinates are counted for each chromosome separately;
       - all good alignments of ambiguous contigs are shown if --ambiguity-usage=all.

      4. Read coverage histograms added for assemblies with SPAdes/Velvet-like contig 
         naming style (i.e. all names are ..._length_X_cov_Y_...).
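
         A short Python sketch of extracting length and coverage from such contig names.
         The regular expression and the function name parse_contig_name are assumptions
         made for illustration; the pattern actually used by QUAST may differ.

```python
import re

# Matches the "..._length_X_cov_Y_..." part of a contig name.
NAME_PATTERN = re.compile(r"_length_(\d+)_cov_(\d+(?:\.\d+)?)")

def parse_contig_name(name):
    """Return (length, coverage) parsed from the name, or None if the
    name does not follow the SPAdes/Velvet-like style."""
    match = NAME_PATTERN.search(name)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))
```

         Names that do not contain both fields return None, so a caller can check that
         all names of an assembly follow the style before drawing a histogram.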

      5. Several new options added:
       - "--references-list" for specifying list with reference names to be searched 
         and downloaded from NCBI (MetaQUAST only);
       - "--min-identity" for setting threshold on minimal IDY% of alignments (Nucmer 
         parameter, default is 95%);
       - "--ambiguity-score" for choosing how close good ambiguous alignments should be to
         the best one (default is 0.99);
       - "--sam" and "--bam" for specifying SAM and BAM files instead of/in addition to 
         raw reads. 
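
         A hedged Python sketch of what the --ambiguity-score threshold could mean in
         practice. It assumes alignments are scored as length times IDY% and that an
         alignment is kept when its score is at least ambiguity_score times the best
         score; the function name and the tuple layout are hypothetical.

```python
def select_close_alignments(alignments, ambiguity_score=0.99):
    """alignments: list of (length, identity_percent) tuples for one contig.

    Returns the alignments whose score (length * identity) is within
    the given factor of the best score, best first.
    """
    if not alignments:
        return []
    ranked = sorted(alignments, key=lambda a: a[0] * a[1], reverse=True)
    best = ranked[0][0] * ranked[0][1]
    return [a for a in ranked if a[0] * a[1] >= ambiguity_score * best]
```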

      6. Several new metrics added:
       - Total aligned length (total size of contig fragments aligned to reference);
       - Icarus similarity statistics (in HTML report only, at least 2 assemblies are
         needed).

      7. MetaQUAST updates:
       - requests to NCBI database in no-reference mode are now repeated if error occurs;
       - Krona updated to v2.6, total aligned length is used for building Krona charts 
         (instead of Total length in previous versions);
       - possible interspecies translocations are not reported if unaligned parts consist
         mostly of Ns;
       - Total length and Largest contig are substituted with their "aligned" versions in
         summary HTML report (i.e. calculated for contig aligned fragments rather than 
         original contigs).

      8. HTML report updated: 
       - Genome statistics are moved to the top, Statistics without reference are moved to
         the bottom; 
       - reference line added to GC content plot.

      9. GeneMark now outputs nucleotide and protein sequences for all found genes.

      10. Best set selection improved (misassembly detection algorithm in case of 
         multiple ambiguous mappings of contig fragments). Now BSS can find not only
         the very best set but all sets with scores close to the best one (controlled by 
         --ambiguity-usage and --ambiguity-score options). Also several minor bugfixes
         were made and BSS became deterministic in case of equivalent scores.

      11. Embedded E-MEM aligner updated to version 2.

      12. Significant refactoring of the source code. Most changes are in Icarus, 
          Contigs Analyzer, QUAST and MetaQUAST running scripts.

      13. Fixed several minor bugs.

      14. Icarus citation updated.


4.1   1. Icarus update -- new visualization of contigs in both viewers:
       - all blocks are drawn with transparency, so overlaps are more visible;
       - controls for switching between overlapped blocks are added to Contig info panel;
       - y-coordinates offsets and shades of colors for neighbouring contigs are removed.

      2. Icarus update -- integration of Contig Size Viewer with Contig Alignment Viewer
         (if reference genome is available and thus alignment information is present):
       - contigs are colored according to their mapping on the reference (correct,
         misassembled, unaligned);
       - positions of the breakpoints in the misassembled contigs are shown;
       - full scale Contig info is now available with links to Contig Alignment Viewer.

      3. Nucmer aligner is replaced with significantly faster E-MEM (for Linux only).

      4. Significant speed up of misassembly detection algorithm in case of complicated
         genomes (with many short repeats).

      5. New option added:
       - "--no-sv" for skipping structural variants calling and processing. Makes sense
         only if reads are specified: in this case they will be used for building Icarus
         read coverage histograms only.

      6. Embedded Manta updated to version 0.29.6.

      7. Reference line on cumulative plot is redesigned to be more consistent with
         assemblies lines (starts at 0, depends on number of chromosomes/plasmids).

      8. Fixed several minor bugs.


4.0   1. Icarus added! Icarus stands for Interactive Contig Assessment browseR.
         Icarus complements QUAST output with interactive visualisations of assembly
         alignments to the reference genome, and all assembly features detected by 
         QUAST (e.g. misassemblies, genes). Icarus generates two types of viewers:
       - Contig alignment viewer (available if a reference genome is provided). Shows
         contig alignments to the reference, misassemblies, similarities between assemblies,
         genome annotations (if genes/operons are provided), read coverage along the genome
         (if reads are provided).
       - Contig size viewer. Draws contigs ordered from largest to smallest. Allows hiding
         of short contigs. Explicitly shows contigs of length N50, N75 (and NG50, NG75).

      2. Improved misassembly detection algorithm in case of multiple ambiguous mappings of
         contig fragments. The algorithm identifies the best set of non-overlapping contig
         fragment mappings. The set minimises the possible number of misassemblies, i.e. this
         algorithm reduces the number of false positives (erroneously detected misassemblies).

      3. Several new options added:
       - "--significant-part-size" for setting threshold on detecting partially unaligned 
         contigs with both significant aligned and unaligned parts (default: 500bp as earlier).
       - "--fragmented" for notifying QUAST about fragmented reference genome (e.g. scaffold
         reference). QUAST will try to detect misassemblies caused by the fragmentation and
         mark them "fake".
       - "--unique-mapping" for forcing -a=one in QUAST run on the combined reference (MetaQUAST only).
       - "--sv-bedpe" for specifying BEDPE file with structural variations. Note that QUAST may
         create such BEDPE automatically using Manta SV caller if reads are specified.
         However, this is rather slow because it includes aligning the reads to the reference.

      4. Fixed bug with skipping "broken" scaffolds in per reference runs of MetaQUAST.

      5. Slightly rearranged MetaQUAST summary HTML report. New order is: statistics,
         plots, table with links to per reference runs (the table was at the beginning previously).

      6. Fixed several minor bugs.
 
      7. Embedded samtools updated to version 1.3.

      8. Citation for MetaQUAST paper updated, citation for Icarus paper added.


3.2   1. The tool now accepts raw reads for improving quality assessment (experimental
      feature, try with care):
       - reads should be provided with --reads1 (or -1), --reads2 (or -2) options,
       - reads are aligned to reference genome using bowtie2 (embedded),
       - Manta structural variation (SV) calling tool (embedded) is run on bowtie2 output,
       - found SVs are used for classifying QUAST misassemblies into true ones and fake
         ones (caused by structural differences between reference sequence and
         sequenced organism). Fake misassemblies are excluded from "# misassemblies" metric
         and reported in novel "# structural variants" metric.

      2. HTML reports content reformatted, especially in MetaQUAST reports:
       - GC % and all metrics based on genome length (NG50, NGA50, LGA75, etc) are excluded
         from the combined reference statistics (they don't make sense there);
       - N50 is replaced with more fair and comparable metrics such as "Total length >= 1000 bp",
         "Total length >= 10000 bp" in the main MetaQUAST report. Extended version still has N50.
       - N50 is also hidden in single-genome QUAST reports, and NGA50 is exposed instead.

      3. Scaffold gap size misassemblies are introduced. They are reported only when --scaffolds
         is used.

      4. Several new options added:
       - "--memory-efficient" for running QUAST with minimal memory consumption (but significantly slower);
       - "--test-sv" for testing structural variants mode (see 1.);
       - "--silent" for minimal output to stdout (full verbose output is saved in the logs anyway).

      5. MetaQUAST output directory content is reformatted. Per reference reports are saved
         inside the directory <output_dir>/runs_per_reference, summary reports are under summary directory.
         The only top-level files are metaquast.log and report.html (summary HTML report with links to all subreports).

      6. Colors in HTML reports and PDF/PNG plots are synchronized, now they are the same
         for the same assemblies.

      7. Additional check for Matplotlib v1.1 (needed for drawing PDF/PNG plots since QUAST v3.1).

      8. Fixed several minor and major bugs.

      9. Citation for MetaQUAST paper added.


3.1   1. MetaQUAST:
       - more specific algorithm for reference searching and downloading, particularly
         only Bacteria and Archaea are downloaded;
       - Krona charts are added for showing taxonomic profile based on found references;
       - metric-level and misassemblies plots are added to summary HTML report;
       - better structure for text reports and plots in summary folder.

      2. Significantly reduced size of the installation package. This is done by removing
         BLAST binaries which are needed only for MetaQUAST run without references. On the
         first such run, BLAST binary for target OS will be automatically downloaded.

      3. Heatmaps added to HTML reports.

      4. Separate install.sh and more complex install_full.sh scripts for installing regular
         versions of QUAST and MetaQUAST or extended one (with ability to run MetaQUAST without
         reference genomes).

      5. New option "--plots-format" for selecting output format for plots (PDF by default).

      6. Default number of threads changed from 100% CPUs to 25% of CPUs.
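
         As an illustration, a default of 25% of CPUs could be derived as in the Python
         sketch below. The function name default_threads, the integer rounding and the
         lower bound of one thread are assumptions, not a statement about QUAST's code.

```python
import multiprocessing

def default_threads(cpu_count=None):
    """Return a quarter of the available CPUs, but never less than 1."""
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    return max(1, cpu_count // 4)
```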

      7. Changes in one-letter options, including replacing confusing -T with -t for --threads
         and -M with -m for --min-contig.

      8. Broken versions of scaffolds are now skipped in the analysis if they are identical
         to the original ones (see --scaffolds option for details).

      9. Fixed several minor bugs.


3.0   1. Significant changes in MetaQUAST functionality:
       - if no references are provided, MetaQUAST downloads references from NCBI database
         based on best hits of assemblies alignments vs SILVA 16S rRNA database 
         (included in QUAST package);
       - multiple summarising reports are added: plots and text tables for each metric 
         (all assemblies vs all references in one file), histograms of # misassemblies 
         per reference for all assemblies separately, summary HTML-report with all metrics 
         per each assembly and each reference (expandable lines);
       - new metrics: interspecies translocations and # possibly misassembled contigs;
       - fixed handling of reference with more than one entry in FASTA file 
         (chromosome plus plasmids or multiple chromosomes);
       - option --reference now accepts directory (takes all references from this dir);
       - fixed handling of closely related species by using --ambiguity-usage 'all' on 
         combined reference run.
       
      2. Speed ups:
       - 5x-100x speed up in case of running QUAST without reference;
       - processing of large (> 50 Mbp) multi-chromosome references in parallel (per chromosome);
       - new speed up options: --no-check, --no-gc, --no-snps, --fast (a combination of the others).

      3. More reports about misassemblies (in <output_dir>/contigs_reports/):
       - misassemblies_plot.pdf with histogram of misassembly types distribution per assembly.
       - contigs_report_<assembly_name>.mis_contig.info with brief details about misassembled 
         contigs only. 

      4. Improved and updated misassemblies detection algorithm:
       - more accurate algorithm for processing multiple ambiguous alignments;
       - using --ambiguity-usage value for processing internal overlaps between adjacent aligned
         blocks of a misassembled contig;
       - marking of local misassemblies with small inconsistency (<= 85 bp) as fake misassemblies
         (indels or mismatches) if gap on the contig is filled mostly with Ns;
       - option --extensive-mis-size/-x to set extensive misassembly size, i.e. min inconsistency
         size for relocations (default is 1000 as earlier);
       - option --min-alignment/-i to set minimum alignment length (Nucmer's parameter), default
         is 0.

      5. Fixed HTML-reports issues:
       - Y-axis coordinates in interactive hover tooltips on plots;
       - NAx plot small bug.

      6. GeneMark-ES is used for predicting genes in eukaryotic genomes instead of Glimmer-HMM.
      For using Glimmer-HMM, a new option --glimmer added.

      7. GeneMarkS incorporation fixed. Previously it was run on predefined heuristic models based
      on GC content. Now it uses self-training module for getting the correct model.
      
      8. Fixed several minor bugs.

      9. GeneMark licenses updated.

      10. Updated LICENSE (third-party tools details) and Manual.


2.3   1. Changed logic in misassembly computation. Fixed several minor bugs in misassembly
      detection algorithm and one major bug caused by linear representation of circular
      references and contigs in fasta format.

      2. Added contig alignment plots. See details in manual and in the QUAST paper (Fig. 1).

      3. Genome analyzer module (computation of genome fraction, duplication ratio,
      number of genes and operons) is parallelized.

      4. Option --test now doubles as an installation utility. It compiles all required binaries
      and checks correctness of QUAST and MetaQUAST execution on test datasets.

      5. Former plots.pdf upgraded with report tables and renamed to report.pdf. Now it is 
      a file with all tables and plots generated by QUAST.

      6. A new option --no-plots added for speeding up computation if plots are not needed.
    
      7. GeneMark license updated, instructions for manual updating added.

      8. Generation of misleading single-columns histograms removed (when only single assembly
      file was specified).

      9. More error and exception handlers added.

      10. Fixed bug with indel counting (caused slightly overestimated indels rate in some cases).
      
      11. Fixed several minor bugs.
       
      12. Code refactored.


2.2   1. The tool now supports metagenomic assemblies. It accepts multiple references
      and produces several reports:
       - for all contigs and all input genomes merged into one,
       - separate reports for only contigs aligned to a particular genome,
       - for the contigs not aligned to any reference provided.

      Usage:
         metaquast.py contigs_1 contigs_2 ... -R reference_1,reference_2,reference_3,...

      All other options for metaquast.py are the same as for quast.py.

      2. MetaGeneMark is used to find genes in metagenomic assemblies.
      In metaquast.py by default, in quast.py with --meta option.

      3. In place of --allow-ambiguity, a new option --ambiguity-usage (-a) was introduced.
      The new option lets you specify how to process ambiguous regions:
      -a one, -a all or -a none.

      4. A new option --labels (or -l) allows you to provide human-readable assembly names.
      Those names will be used in reports, plots and logs, instead of file names.
      For example:
         -l SPAdes,IDBA-UD

      If your labels include spaces, use quotes:
         -l "SPAdes 2.5, SPAdes 2.4, IDBA-UD"

         -l SPAdes,"Assembly 2",Assembly3
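
      The Python sketch below illustrates why both quoting styles work: the shell removes
      the quotes and passes a single comma-separated value, which can then be split into
      labels. The function name parse_labels and the stripping of spaces around commas
      are assumptions made for illustration, not QUAST's actual parsing code.

```python
import shlex

def parse_labels(command_line):
    """Split a command line the way a POSIX shell would, then turn the
    value following -l into a list of labels."""
    argv = shlex.split(command_line)
    value = argv[argv.index("-l") + 1]
    return [label.strip() for label in value.split(",")]
```

      Both parse_labels('-l "SPAdes 2.5, SPAdes 2.4, IDBA-UD"') and
      parse_labels('-l SPAdes,"Assembly 2",Assembly3') return three labels.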

      5. Minor improvements of HTML reports.

      6. Fixed bugs in misassemblies detection algorithm.


2.1   Option --strict-NA added to control computation of NAx/NGAx metrics. 
      This option forces QUAST to break contigs by any misassembly event, 
      including local misassemblies (like in v.2.0). By default, QUAST v.2.1
      breaks contigs only by extensive misassemblies to compute NAx/NGAx 
      (like in v.1.*).

      Improvement of indels computation. QUAST now counts consecutive single nucleotide
      indels as one indel. Total length of all indels is also reported (equal to
      # indels metric evaluated with previous versions). Short (<= 5 bp) and long (>5 bp)
      indels are reported.
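
      A minimal Python sketch of this counting rule, assuming the input is a list of
      positions of single nucleotide indels within one alignment. The function name
      summarize_indels and the returned keys are hypothetical.

```python
def summarize_indels(positions):
    """Merge consecutive single nucleotide indel positions into indels
    and report their count, total length, and short/long split."""
    runs = []
    for pos in sorted(set(positions)):
        if runs and pos == runs[-1][1] + 1:
            runs[-1][1] = pos          # extends the current indel
        else:
            runs.append([pos, pos])    # starts a new indel
    lengths = [end - start + 1 for start, end in runs]
    return {
        "indels": len(lengths),
        "total_length": sum(lengths),
        "short": sum(1 for n in lengths if n <= 5),   # <= 5 bp
        "long": sum(1 for n in lengths if n > 5),     # > 5 bp
    }
```

      Here "total_length" corresponds to what earlier versions reported as # indels.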

      Option --est-ref-size added to set estimated reference size for computing NGx 
      metrics in case a reference genome is not available.
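
      A small Python sketch of an NGx computation with an estimated reference size, using
      the common definition (the length of the contig at which contigs of that length or
      longer cover at least x% of the reference). The function name ngx is hypothetical
      and the code is not taken from QUAST.

```python
def ngx(contig_lengths, reference_size, x=50):
    """Return NGx for the given contig lengths, or None if the contigs
    together do not reach x% of reference_size."""
    threshold = reference_size * x / 100.0
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total >= threshold:
            return length
    return None
```

      With x=50 and reference_size set to the --est-ref-size value, this gives NG50.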

      GAGE mode is parallelized.

      Fixed bugs in misassemblies detection algorithm.

      Fixed bugs in SNPs detection algorithm.

      Fixed bugs in processing circular chromosomes (affects Genome fraction, # genes,
      # operons).

      Fixed several minor bugs.


2.0   Significantly improved assessment of large genomes. Current limit on size of a
      reference genome is 536 Mbp PER CHROMOSOME instead of 536 Mbp TOTAL in the
      previous versions. Alignment to different chromosomes is performed in parallel.

      Changes in algorithm for evaluating Genome fraction, # genes and operons.
      Filtration of short, ambiguous, and redundant alignments is performed before
      the evaluation. Option --use-all-alignments is added for compatibility
      with 1.* versions.

      New algorithm for finding SNPs and indels.

      Ability to change colors, line styles, etc. in plots and content, metric names
      in reports.

      GlimmerHMM for predicting genes in eukaryotes.

      Gene Finding is parallelized and its run is controlled by --gene-finding option.

      Improvement of HTML-reports and plotting units.

      Fixed several bugs.

1.3   QUAST is now a multi-threaded tool: the most time-consuming step (alignment to
      a reference genome) is computed in parallel.

      A MacOS version of GeneMark.

      Significantly improved HTML-reports.

      More informative error messages.

      A simple logic for evaluating scaffolds.

      New metrics: duplication ratio and largest alignment.

      More careful counting of misassemblies.

      Min contig threshold changed from 200 to 500.

      Fixed several bugs.


1.2   Indels and N's counting.

      More detailed statistics on misassemblies (classification into inversions,
      relocations, translocations, local misassemblies).

      More detailed statistics on partially unaligned contigs.

      Text reports now also available in LaTeX format.

      Python 2.5 now supported.

      Fixed bug in reading gene annotations in GFF and NCBI formats.

      QUAST can now be rerun on existing Nucmer alignment files.


1.1   Mismatches counting.

      Fixed bug in misassemblies counting (some inversions were omitted).

      GC content plot is logarithmically scaled.

      ORFs are not counted, GeneMark added instead (for gene finding, only on Linux).

      Nucmer aligner setting changed (from IDY% = 80 to 95,
      i.e. now all alignments are more robust).


1.0   Initial open source release!
