Commands

CCT includes several scripts that can be used to download sequences and prepare maps.

cgview_comparison_tool.pl

Map creation with the cgview_comparison_tool.pl script and its wrappers, build_blast_atlas.sh and build_blast_atlas_all_vs_all.sh, is a two-step process: creating the project and building the maps.

Creating a New CCT Project

A new CCT project is created with the '-p' option and the name of the project:

cgview_comparison_tool.pl -p my_project

This will create a new CCT project directory with the following structure:

CCT New Project Directory Tree

my_project/
├── reference_genome              # Place reference genome here
├── comparison_genomes            # Place comparison genomes here
├── analysis                      # Place analysis GFF files here
├── features                      # Place feature GFF files here
├── blast                         # BLAST results are created here
├── maps                          # Maps will be created here
├── project_settings.conf         # Alter to modify project
├── cgview_xml_builder.pl         # Script that generates CGview XML file
└── log.txt                       # Log of project progress

Generating Maps

After adding reference and comparison genomes and editing the project_settings.conf file (see Customizing CCT maps), simply rerun the command to generate maps.

cgview_comparison_tool.pl -p my_project
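
Before rerunning the command, it can help to confirm that the input directories are populated. The following is a minimal sketch and not part of CCT: the function name cct_project_ready is hypothetical, and it relies only on the directory names shown in the project tree above.

```shell
# Hypothetical helper (not part of CCT): succeed only if the project
# contains at least one reference file and at least one comparison file.
cct_project_ready() (
    project=$1
    for dir in reference_genome comparison_genomes; do
        if [ ! -d "$project/$dir" ]; then
            echo "missing directory: $project/$dir" >&2
            exit 1
        fi
        # Count the regular files found in the directory.
        count=$(find "$project/$dir" -type f | wc -l | tr -d ' ')
        if [ "$count" -eq 0 ]; then
            echo "no files in: $project/$dir" >&2
            exit 1
        fi
    done
)
```

For example, `cct_project_ready my_project && cgview_comparison_tool.pl -p my_project` would start map generation only when both directories contain files.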

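Map creation proceeds in three stages: performing the BLAST comparisons, generating the CGView XML files, and drawing the maps (the --start_at_xml and --start_at_map options described below skip the earlier stages). Depending on the number and size of sequences to compare, this can take minutes or hours to complete.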

Rerunning CCT

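After maps have been created, CCT can be run again to make changes. Depending on what has changed, the rerun can repeat the whole process or start at a later stage with the --start_at_xml or --start_at_map option (see Documentation below).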
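
The rerun choices can be sketched as a small command builder. This is an illustration only: cct_rerun_cmd and its stage names (blast, xml, map) are hypothetical, and the function prints the command instead of running it. The options themselves (--start_at_xml, --start_at_map, --map_size) are the ones listed under Documentation below.

```shell
# Hypothetical helper: print the rerun command for a given starting stage.
#   $1 = stage (blast | xml | map), $2 = project directory,
#   $3 = optional map sizes, e.g. small,large
cct_rerun_cmd() (
    stage=$1
    project=$2
    sizes=${3:-}
    cmd="cgview_comparison_tool.pl -p $project"
    case "$stage" in
        blast) ;;                              # full rerun, BLAST included
        xml)   cmd="$cmd --start_at_xml" ;;    # reuse existing BLAST results
        map)   cmd="$cmd --start_at_map" ;;    # reuse existing XML files
        *)     echo "unknown stage: $stage" >&2; exit 1 ;;
    esac
    if [ -n "$sizes" ]; then
        cmd="$cmd --map_size $sizes"
    fi
    printf '%s\n' "$cmd"
)
```

Calling `cct_rerun_cmd xml my_project small,large` prints a command that regenerates XML and maps in two sizes after a change to project_settings.conf.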

Documentation

USAGE:
   perl cgview_comparison_tool.pl -p DIR [options]

DESCRIPTION:
   Run this command once to generate a project directory. After the project is
   created place a reference genome in the reference_genome directory and any
   genomes to compare with the reference in the comparison_genomes directory.

   [Optional] Make changes to the project_settings.conf file to configure how
   the maps will be drawn. Add additional GFF files to the features and
   analysis directories.

   Draw maps by running this command again with the '-p' option pointing to
   the project directory.

REQUIRED ARGUMENTS:
   -p, --project DIR
      If no project exists yet, a blank project directory will be created.
      If the project exists, maps will be created.

OPTIONAL ARGUMENTS:

   -s, --settings FILE
      The settings file. If none is provided, the default settings file will be
      copied from $CCT_HOME/conf/project_settings.conf to the project
      directory.
   -g, --config FILE
      The configuration file. The default is to use the
      $CCT_HOME/conf/global_settings.conf file.
   -z, --map_size STRING
      Size of custom maps to create. For quickly generating new map sizes, use
      this option with the --start_at_xml option. Possible sizes include
      small/medium/large/x-large or a combination separated by commas (e.g.
      small,large). The size(s) provided will override the size(s) in the
      configuration files.
   -x, --start_at_xml
      Jump to XML generation. Skips performing BLAST, which can
      speed map generation if BLAST has already been done. This option is for
      creating new maps after making changes to the .conf files. Note that
      any changes in the .conf files related to BLAST will be ignored.
   -r, --start_at_map
      Start at map generation. Skips performing BLAST and
      generating XML. Useful if manual changes to the XML files have 
      been made or if creating new map sizes (see --map_size).
   -f, --map_prefix STRING
      Prefix to be added to map names (Default is to add no additional
      prefix).
   -b, --max_blast_comparisons INTEGER
      The maximum number of BLAST result sets to be passed to the XML
      creation phase (Default is 100).
   -t, --sort_blast_tracks
      Sort BLAST results such that genomes with highest similarity are plotted
      first.
   --cct
      Colour BLAST results based on percent identity of hit instead of by
      source genome, and ignore 'use_opacity' setting in configuration file.
   -m, --memory STRING
      Memory string to pass to Java's '-Xmx' option (Default is 1500m).
   -c, --custom STRINGS
      Settings used to customize the appearance of the map.
   -h, --help
      Provide list of arguments and exit.

EXAMPLE: 
   perl cgview_comparison_tool.pl -p my_project -b 50 -t \
     --custom tickLength=20 labelFontSize=15 --map_size medium,x-large

build_blast_atlas.sh

The build_blast_atlas.sh script is a wrapper for cgview_comparison_tool.pl that automatically creates maps for nucleotide (blastn) comparisons and translated coding sequence (blastp) comparisons. It also generates multiple maps for each comparison type, differing in terms of size and detail. Like cgview_comparison_tool.pl, it involves two steps: creating the project and building the maps.

Creating a New BLAST Atlas Project

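There are several ways to create a new project: provide a GenBank file with the '-i' option (the project directory is named after the file), provide a project name with the '-p' option to create a blank project, or use both options to create the project from the GenBank file at the location given by '-p'.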
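
As a rough sketch, the input file can be checked before a project is created from it. The helper is_gbk_input is hypothetical and not part of CCT; it only encodes the documented expectation that the '-i' file is in GenBank format with a .gbk extension. The file name reference.gbk in the comments is a placeholder.

```shell
# Hypothetical check: the argument must be an existing file ending in .gbk.
is_gbk_input() {
    case "$1" in
        *.gbk) [ -f "$1" ] ;;
        *)     return 1 ;;
    esac
}

# The three ways to create a project, shown as comments rather than run here:
#   build_blast_atlas.sh -i reference.gbk                 # named after the file
#   build_blast_atlas.sh -p my_project                    # blank project
#   build_blast_atlas.sh -i reference.gbk -p my_project   # created at my_project
```

A call such as `is_gbk_input reference.gbk && build_blast_atlas.sh -i reference.gbk` would skip project creation when the file is missing or has a different extension.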

Generating Maps

After adding reference and comparison genomes and editing the project_settings_cds_vs_cds.conf and project_settings_dna_vs_dna.conf files (see Customizing CCT maps), simply rerun the command, providing the '-p' option and the path to the project directory:

build_blast_atlas.sh -p my_project

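Map creation performs the BLAST comparisons, generates the XML files, and draws the maps for both the nucleotide (blastn) and translated coding sequence (blastp) comparisons. Depending on the number and size of sequences to compare, this can take minutes or hours to complete.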

Rerunning build_blast_atlas.sh

After maps have been created, build_blast_atlas.sh can be run again to make changes. There are several ways to do this and they are the same as outlined for CCT.

Documentation

USAGE:
   build_blast_atlas.sh -i FILE [-p DIR] [Options]
   build_blast_atlas.sh -p DIR [Options]

DESCRIPTION:
   This command is used to first create a BLAST atlas project directory and
   then again to generate maps.  Run this command with the '-i' option and a
   GenBank file to create a new project using the GenBank file as the reference
   genome. Alternatively, a blank project can be created using the '-p'
   option, in which case a reference GenBank file will have to be placed in the
   reference_genome directory. After the project has been created, place the
   genomes to compare with the reference in the comparison_genomes directory.

   [Optional] Make changes to the *.conf files to configure how the maps will
   be drawn. Add additional GFF files to the features and analysis directories.

   Draw maps by running this command again with the '-p' option pointing to the
   project directory.

REQUIRED ARGUMENTS:
   -i, --input FILE
      Sequence file in GenBank format, with a .gbk extension. This option is
      only required when first creating a BLAST atlas project. The project
      directory will be named after this file unless the '-p' option is
      provided.
   -p, --project DIR
      Initiates map creation for the project. If no project exists yet, a blank
      project will be created. When used with the '-i' option, this will be
      where the project is created. This option is only required when creating
      the BLAST atlas maps.

OPTIONAL ARGUMENTS:
   -m, --memory STRING
      Memory value for Java's -Xmx option (Default: 1500m).
   -c, --custom STRING
      Custom settings for map creation.
   -b, --max_blast_comparisons INTEGER
      Maximum number of comparison genomes to display (Default: 100).
   -z, --map_size STRING
      Size of custom maps to create. For quickly regenerating new map sizes,
      use this option with the --start_at_xml option. Possible sizes include
      small/medium/large/x-large or a combination separated by commas (e.g.
      small,large). The size(s) provided will override the size(s) in the
      configuration files.
   -x, --start_at_xml
      Jump to XML generation. Skips performing BLAST, which can speed map
      generation if BLAST has already been done. This option is for creating
      new maps after making changes to the .conf files.  Note that any changes
      in the .conf files related to BLAST will be ignored. This option will be
      ignored if the --start_at_map option is also provided.  
   -r, --start_at_map
      Start at map generation. Skips performing BLAST and
      generating XML. Useful if manual changes to the XML files have 
      been made or if creating new map sizes (see --map_size).
   -h, --help
      Show this message.

NOTE:
   This script will likely not work if there are spaces in the path to the
   project directory because the NCBI tool 'formatdb' cannot handle such
   paths.
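
A simple guard for the limitation noted above might look like the following. This is a sketch with a hypothetical function name (project_path_ok); it checks the full path, since a space in any parent directory also ends up in the path that is passed along.

```shell
# Hypothetical guard: fail if the full project path contains a space.
project_path_ok() (
    case "$1" in
        /*) full=$1 ;;          # already absolute
        *)  full=$PWD/$1 ;;     # relative: prefix the current directory
    esac
    case "$full" in
        *" "*)
            echo "path contains a space: $full" >&2
            exit 1 ;;
    esac
)
```

For example, `project_path_ok my_project && build_blast_atlas.sh -p my_project` would refuse to run from a working directory whose path contains a space.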

build_blast_atlas_all_vs_all.sh

The build_blast_atlas_all_vs_all.sh wrapper script generates several CCT projects automatically, and then combines the results into a single montage map. The montage consists of a separate map for each sequence of interest, allowing each sequence in a group of sequences to be visualized as the reference sequence. Like cgview_comparison_tool.pl, using this script involves two steps: creating the project and building the maps.

Note that this script requires ImageMagick to be installed.

Creating a New BLAST Atlas All vs All Project

A new project is created with the '-p' option and the name of the project:

build_blast_atlas_all_vs_all.sh -p my_project

This will create a new project directory with the following structure:

BLAST Atlas All vs All New Project Directory Tree

my_project/
├── comparison_genomes                  # Place comparison genomes here
├── cct_projects                        # Will contain CCT sub project for each comparison
└── project_settings_multi.conf         # Alter to modify project

Generating Maps

After adding comparison genomes and editing the project_settings_multi.conf file (see Customizing CCT maps), simply rerun the command, providing the '-p' option and the path to the project directory:

build_blast_atlas_all_vs_all.sh -p my_project

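The creation of a montage map is very time-consuming, as a separate CCT map is first created for each sequence.

Map creation proceeds in four stages: performing BLAST, generating the XML files, drawing the individual maps, and assembling the montage (see the --start_at_xml, --start_at_map, and --start_at_montage options below).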

Rerunning build_blast_atlas_all_vs_all.sh

After maps have been created, build_blast_atlas_all_vs_all.sh can be run again to make changes.
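
A common change is the montage layout. The sketch below estimates how many rows a given column count would produce, assuming the montage is filled as a regular grid with one map per sequence (the grid assumption is not stated in the CCT documentation). The function name montage_rows is hypothetical.

```shell
# Hypothetical helper: rows needed for $1 maps laid out in $2 columns
# (integer ceiling of maps / columns).
montage_rows() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Rebuilding only the montage with a new column count, shown as a comment:
#   build_blast_atlas_all_vs_all.sh -p my_project --start_at_montage --columns 3
```

With ten comparison genomes, `montage_rows 10 4` and `montage_rows 10 3` can be compared before choosing a value for --columns.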

Documentation

USAGE:
   build_blast_atlas_all_vs_all.sh -p DIR [Options]

DESCRIPTION:
   This script generates several CCT projects automatically, and then it
   combines the results into a single montage map. The montage consists
   of a separate map for each sequence of interest. This allows each sequence
   in a group of sequences to be visualized as the reference sequence.

   This command is used to first create a BLAST atlas all vs all project
   directory and then again to generate the montage. After the project has
   been created, place the genomes to compare in the comparison_genomes
   directory.

   [Optional] Make changes to the project_settings_multi.conf file to
   configure how the maps will be drawn. Add additional GFF files to the
   features and analysis directories.

   Draw maps by running this command again with the '-p' option pointing to the
   project directory.

REQUIRED ARGUMENTS:
   -p, --project DIR
      If no project exists yet, creates a new project directory. Otherwise,
      initiates map creation for the project.

OPTIONAL ARGUMENTS:
   -m, --memory STRING
      Memory value for Java's -Xmx option (Default: 1500m).
   -c, --custom STRING
      Custom settings for map creation.
   -b, --max_blast_comparisons INTEGER
      Maximum number of comparison genomes to display (Default: 100).
   -z, --map_size STRING
      Size of custom maps to create. For quickly regenerating new map sizes,
      use this option with the --start_at_xml option. Possible sizes include
      small/medium/large/x-large or a combination separated by commas (e.g.
      small,large). The size(s) provided will override the size(s) in the
      configuration files.
   -x, --start_at_xml
      Jump to XML generation. Skips performing BLAST, which can
      speed map generation if BLAST has already been done. This option is for
      creating new maps after making changes to the .conf files or if creating
      new map sizes (see --map_size). Note that any changes in the .conf files
      related to BLAST will be ignored. This option will be ignored if
      the --start_at_map or --start_at_montage option is also provided.
   -r, --start_at_map
      Start at map generation. Skips performing BLAST and
      generating XML. Useful if manual changes to the XML files have
      been made. This option will be ignored if the --start_at_montage
      option is also provided.
   -g, --start_at_montage
      Start at montage generation. Skips creating the individual maps.
      Useful if changing how many columns the montage should have.
   -y, --columns INTEGER
      The number of columns to use in the montage image (Default: 4). If the
      maps have already been drawn once, it is best to use this option with
      the --start_at_montage option.
   -h, --help
      Show this message.

NOTES:
   This script will likely not work if there are spaces in the path to the
   project directory because the NCBI tool 'formatdb' cannot handle such
   paths.

convert_vcf_to_features.pl

To visualize the positions of SNPs or other variants described in a VCF file use the convert_vcf_to_features.pl script, as in the following example:

perl convert_vcf_to_features.pl -i variants.vcf -o variants.gff

Where 'variants.vcf' is the name of the VCF file. The resulting .gff file can be placed in the 'features' directory inside a CCT project directory.

Documentation

convert_vcf_to_features.pl - convert VCF file to tab-delimited file for
the CGView Comparison Tool.

DISPLAY HELP AND EXIT:

usage:

  perl convert_vcf_to_features.pl -help

CONVERT VCF TO TAB-DELIMITED

usage:

  perl convert_vcf_to_features.pl -i <file> -o <file>

required arguments:

-i - Input file in VCF format.

-o - Output file in tab-delimited format for CGView Comparison Tool. This name
will have the chromosome name as read from the VCF file added to the end,
before the file extension if one is present. If multiple chromosomes are
present in the VCF file then multiple output files will be generated, each with
a different suffix.

example usage:

  perl convert_vcf_to_features.pl -i input.vcf -o output.gff
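
Because one output file is written per chromosome, it can be useful to list the chromosome names in the VCF file first. The following sketch uses standard text tools only; the function name vcf_chromosomes is hypothetical, and it assumes an uncompressed, tab-delimited VCF file.

```shell
# Hypothetical helper: print the distinct chromosome names in a VCF file.
vcf_chromosomes() {
    # Drop header lines (starting with '#'), keep column 1, de-duplicate.
    grep -v '^#' "$1" | cut -f 1 | sort -u
}
```

The number of lines printed by `vcf_chromosomes variants.vcf` is the number of output files to expect from the conversion.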

create_zoomed_maps.sh

To add zoomed versions of maps to a completed CCT project, use the create_zoomed_maps.sh script, as in the following example:

create_zoomed_maps.sh -p my_project -c 10000 -z 10 -f svg

Where 'my_project' is the name of the CCT project directory, '10000' is the nucleotide position to center the map on, '10' is the zoom multiplier to use when generating the map, and 'svg' is the desired image format.

Documentation

USAGE:
   create_zoomed_maps.sh -p DIR -c INTEGER -z INTEGER [Options]

DESCRIPTION:
   Creates a zoomed map for a completed CCT project.

REQUIRED ARGUMENTS:
   -p, --project DIR
      Path to a completed CCT project.
   -c, --center INTEGER
      Nucleotide position to center the zoomed map on.
   -z, --zoom INTEGER
      Zoom multiplier.

OPTIONAL ARGUMENTS:
   -f, --format STRING
      Image format for output map. Options are png, jpg, svg, svgz.
      (Default: png)
   -m, --memory STRING
      Memory value for Java's -Xmx option (Default: 1500m).
   -h, --help
      Show this message

EXAMPLE:
   create_zoomed_maps.sh -p my_project -c 10000 -z 10 -f svg
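
To zoom in on several regions of the same project, the command can be repeated once per center position. The sketch below only prints the commands; zoom_cmds is a hypothetical name, and the positions in the usage note are arbitrary examples.

```shell
# Hypothetical helper: print one create_zoomed_maps.sh command per position.
#   $1 = project directory, $2 = zoom multiplier, remaining args = centers
zoom_cmds() (
    project=$1
    zoom=$2
    shift 2
    for center in "$@"; do
        printf 'create_zoomed_maps.sh -p %s -c %s -z %s\n' \
            "$project" "$center" "$zoom"
    done
)
```

For example, `zoom_cmds my_project 10 10000 250000` prints two commands, one centered on each position, which can be reviewed and then piped to `sh` to run them.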

fetch_all_refseq_bacterial_genomes.sh

To download all RefSeq records (in GenBank format) for bacterial species, use the fetch_all_refseq_bacterial_genomes.sh script, as in the following example:

fetch_all_refseq_bacterial_genomes.sh -o db/comparison_genomes

Where 'db/comparison_genomes' is the directory to contain the downloaded GenBank files.

Documentation

USAGE:
   fetch_all_refseq_bacterial_genomes.sh -o DIR 

DESCRIPTION:
   Downloads all bacterial RefSeq sequences from NCBI in GenBank format.
   The --min and --max options can be used to restrict the size of the 
   returned sequences.

REQUIRED ARGUMENTS:
   -o, --output DIR
      The output directory to contain the downloaded GenBank files.

OPTIONAL ARGUMENTS:
   -m, --min INTEGER
      Records with a sequence length shorter than this value will be ignored.
   -x, --max INTEGER
      Records with a sequence length longer than this value will be ignored.
   -h, --help
      Show this message.

EXAMPLE:
   fetch_all_refseq_bacterial_genomes.sh -o my_project/comparison_genomes
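
The --min and --max options can be combined to keep only sequences within a size range. The following sketch prints such a command after checking that the range is sensible; fetch_bacterial_cmd is a hypothetical name and the lengths in the usage note are arbitrary.

```shell
# Hypothetical helper: print a size-restricted download command.
#   $1 = output directory, $2 = minimum length, $3 = maximum length
fetch_bacterial_cmd() (
    outdir=$1
    min=$2
    max=$3
    if [ "$min" -gt "$max" ]; then
        echo "min ($min) is greater than max ($max)" >&2
        exit 1
    fi
    printf 'fetch_all_refseq_bacterial_genomes.sh -o %s --min %s --max %s\n' \
        "$outdir" "$min" "$max"
)
```

For example, `fetch_bacterial_cmd my_project/comparison_genomes 1000000 6000000` prints a command that ignores records shorter than 1,000,000 or longer than 6,000,000 bases.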

fetch_all_refseq_chloroplast_genomes.sh

To download all RefSeq records (in GenBank format) for chloroplast genomes, use the fetch_all_refseq_chloroplast_genomes.sh script, as in the following example:

fetch_all_refseq_chloroplast_genomes.sh -o db/comparison_genomes

Where 'db/comparison_genomes' is the directory to contain the downloaded GenBank files.

Documentation

USAGE:
   fetch_all_refseq_chloroplast_genomes.sh -o DIR 

DESCRIPTION:
   Downloads all chloroplast RefSeq sequences from NCBI in GenBank format.

REQUIRED ARGUMENTS:
   -o, --output DIR
      The output directory to contain the downloaded GenBank files.

OPTIONAL ARGUMENTS:
   -h, --help
      Show this message.

EXAMPLE:
   fetch_all_refseq_chloroplast_genomes.sh -o my_project/comparison_genomes 

fetch_all_refseq_mitochondrial_genomes.sh

To download all RefSeq records (in GenBank format) for mitochondrial genomes, use the fetch_all_refseq_mitochondrial_genomes.sh script, as in the following example:

fetch_all_refseq_mitochondrial_genomes.sh -o db/comparison_genomes

Where 'db/comparison_genomes' is the directory to contain the downloaded GenBank files.

Documentation

USAGE:
   fetch_all_refseq_mitochondrial_genomes.sh -o DIR 

DESCRIPTION:
   Downloads all mitochondrial RefSeq sequences from NCBI in GenBank format.

REQUIRED ARGUMENTS:
   -o, --output DIR
      The output directory to contain the downloaded GenBank files.

OPTIONAL ARGUMENTS:
   -h, --help
      Show this message.

EXAMPLE:
   fetch_all_refseq_mitochondrial_genomes.sh -o my_project/comparison_genomes 

fetch_genome_by_accession.sh

To download a GenBank record using an accession number use the fetch_genome_by_accession.sh script, as in the following example:

fetch_genome_by_accession.sh -a NC_007719 -o my_project/reference_genome

Where 'NC_007719' is the accession of the record to download and 'my_project/reference_genome' is the directory to contain the downloaded GenBank file.

Documentation

USAGE:
   fetch_genome_by_accession.sh -a STRING -o DIR 

DESCRIPTION:
   Downloads a GenBank record using the accession number.

REQUIRED ARGUMENTS:
   -a, --accession STRING
      Accession number of the sequence to download.
   -o, --output DIR
      The output directory to download the GenBank file into.

OPTIONAL ARGUMENTS:
   -h, --help
      Show this message.

EXAMPLE:
   fetch_genome_by_accession.sh -a NC_007719 -o my_project/reference_genome
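
When several records are needed, the accessions can be kept in a text file, one per line. The sketch below prints one download command per accession and does not run them; accession_cmds and the list-file convention (blank lines and lines starting with '#' are skipped) are assumptions for illustration.

```shell
# Hypothetical helper: print one download command per accession in a file.
#   $1 = file with one accession per line, $2 = output directory
accession_cmds() (
    list=$1
    outdir=$2
    # The '|| [ -n "$line" ]' part handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        printf 'fetch_genome_by_accession.sh -a %s -o %s\n' "$line" "$outdir"
    done < "$list"
)
```

The printed commands can be reviewed and then piped to `sh`, for example `accession_cmds accessions.txt my_project/comparison_genomes | sh`.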

fetch_refseq_bacterial_genomes_by_name.sh

To download RefSeq records (in GenBank format) using a partial or complete bacterial species name, use the fetch_refseq_bacterial_genomes_by_name.sh script, as in the following example:

fetch_refseq_bacterial_genomes_by_name.sh -n 'Escherichia*' -o db/comparison_genomes

Where 'Escherichia*' refers to all RefSeq records from organisms with a name beginning with 'Escherichia' and 'db/comparison_genomes' is the directory to contain the downloaded GenBank files.

Documentation

USAGE:
   fetch_refseq_bacterial_genomes_by_name.sh -n STRING -o DIR 

DESCRIPTION:
   Downloads GenBank records using a partial or complete bacterial species name.
   The --min and --max options can be used to restrict the size of the 
   returned sequences.

REQUIRED ARGUMENTS:
   -n, --name STRING
      Complete or partial name of the bacterial species.
   -o, --output DIR
      The output directory to download the GenBank files into.

OPTIONAL ARGUMENTS:
   -m, --min INTEGER
      Records with a sequence length shorter than this value will be ignored.
   -x, --max INTEGER
      Records with a sequence length longer than this value will be ignored.
   -h, --help
      Show this message.

EXAMPLE:
   fetch_refseq_bacterial_genomes_by_name.sh -n 'Escherichia*' -o my_project/comparison_genomes
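
The quotes around 'Escherichia*' matter: without them the shell may replace the pattern with matching file names before the script sees it. The short demonstration below uses a hypothetical show_args function and throwaway files in a temporary directory to show the difference.

```shell
# Print each argument on its own line, as a script would receive it.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

(
    demo=$(mktemp -d)
    cd "$demo" || exit 1
    touch Escherichia_a.gbk Escherichia_b.gbk
    show_args 'Escherichia*'   # quoted: the pattern is passed unchanged
    show_args Escherichia*     # unquoted: replaced by the two file names
    cd / && rm -rf "$demo"
)
```

The first call prints the pattern itself, while the second prints the two file names, which is not what the '-n' option should receive.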

ncbi_search.pl

To download collections of sequences for use as custom BLAST databases, use the ncbi_search.pl script, as in the following example:

  perl ncbi_search.pl -q 'bacteriophage AND Escherichia coli M863[ORGANISM]' \
  -o bacteriophage.faa -d protein -r fasta -v

Where 'bacteriophage AND Escherichia coli M863[ORGANISM]' is the Entrez query to use, 'bacteriophage.faa' is the name of the output file to create, 'protein' is the NCBI database to search, and 'fasta' is the desired output format. The '-v' option causes the script to provide progress information.

Documentation

ncbi_search.pl - search NCBI databases.

DISPLAY HELP AND EXIT:

usage:

  perl ncbi_search.pl -help

PERFORM NCBI SEARCH

usage:

  perl ncbi_search.pl -q <string> -o <file> -d <string> [Options]

required arguments:

-q - Entrez query text.

-o - Output file to create. If the -s option is used this is the output
directory to create.

-d - Name of the NCBI database to search, such as 'nuccore', 'protein', or
'gene'.

optional arguments:

-r - Type of information to download. For sequences, 'fasta' is typically
specified. The accepted formats depend on the database being queried. The
default is to specify no format.
  
-m - The maximum number of records to download. Default is to download all
records.
  
-s - Save each record as a separate file. This option is only supported for -r
values of 'gb' and 'gbwithparts'.

-v - Provide progress messages.

example usage:

  perl ncbi_search.pl -q 'NC_045512[Accession]' -o NC_045512.gbk -d nuccore \
  -r gbwithparts

redraw_maps.sh

To redraw maps in a completed project (after editing the CGView XML files for example), use the redraw_maps.sh script, as in the following example:

redraw_maps.sh -p my_project -f svgz

Where 'my_project' is the name of the CCT project directory and 'svgz' is the desired image format.

Documentation

USAGE:
   redraw_maps.sh -p DIR [Options]

DESCRIPTION:
   Used to redraw the maps. This can be used after editing the CGView XML file
   or to change the output image formats.

REQUIRED ARGUMENTS:
   -p, --project DIR
      Path to a completed CCT project.

OPTIONAL ARGUMENTS:
   -f, --format STRING
      Image format for output map. Options are png, jpg, svg, svgz. 
      (Default: png)
   -m, --memory STRING
      Memory value for Java's -Xmx option (Default: 1500m).
   -h, --help
      Show this message

EXAMPLE:
   redraw_maps.sh -p my_project -f svg
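
To produce the same maps in more than one image format, the command can be repeated with different -f values. The sketch below prints the commands and rejects formats outside the documented list; redraw_cmds is a hypothetical name.

```shell
# Hypothetical helper: print one redraw_maps.sh command per image format.
#   $1 = project directory, remaining args = formats
redraw_cmds() (
    project=$1
    shift
    for fmt in "$@"; do
        case "$fmt" in
            png|jpg|svg|svgz)
                printf 'redraw_maps.sh -p %s -f %s\n' "$project" "$fmt" ;;
            *)
                echo "unsupported format: $fmt" >&2
                exit 1 ;;
        esac
    done
)
```

For example, `redraw_cmds my_project png svg` prints two commands, one per format.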

remove_long_seqs.sh

To remove GenBank sequences that are longer than a specific length from a directory use the remove_long_seqs.sh script, as in the following example:

remove_long_seqs.sh -i db/comparison_genomes/ -l 1000000

Where 'db/comparison_genomes' is the name of the directory containing the GenBank files to filter and 1000000 is the maximum desired sequence length.

Documentation

USAGE:
   remove_long_seqs.sh -i DIR -l INTEGER

DESCRIPTION:
   Removes GenBank files that are longer than the specified length from the
   provided directory.

REQUIRED ARGUMENTS:
   -i, --input DIR
      Input directory of GenBank files with .gbk extensions.
   -l, --length INTEGER
      Remove GenBank files that describe sequences longer than this length.

OPTIONAL ARGUMENTS:
   -h, --help
      Show this message

EXAMPLE:
   remove_long_seqs.sh -i my_project/comparison_genomes -l 100000

remove_short_seqs.sh

To remove GenBank sequences that are shorter than a specific length from a directory use the remove_short_seqs.sh script, as in the following example:

remove_short_seqs.sh -i db/comparison_genomes/ -l 1000000

Where 'db/comparison_genomes' is the name of the directory containing the GenBank files to filter and 1000000 is the minimum desired sequence length.

Documentation

USAGE:
   remove_short_seqs.sh -i DIR -l INTEGER

DESCRIPTION:
   Removes GenBank files that are shorter than the specified length from the
   provided directory.

REQUIRED ARGUMENTS:
   -i, --input DIR
      Input directory of GenBank files with .gbk extensions.
   -l, --length INTEGER
      Remove GenBank files that describe sequences shorter than this length.

OPTIONAL ARGUMENTS:
   -h, --help
      Show this message

EXAMPLE:
   remove_short_seqs.sh -i my_project/comparison_genomes -l 100000
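
The two removal scripts can be combined to keep only sequences within a length range. The following sketch prints the pair of commands instead of running them, since both scripts delete files; length_filter_cmds is a hypothetical name.

```shell
# Hypothetical helper: print the commands that keep only GenBank files
# describing sequences between $2 and $3 bases long.
#   $1 = directory of .gbk files, $2 = minimum length, $3 = maximum length
length_filter_cmds() (
    dir=$1
    min=$2
    max=$3
    if [ "$min" -gt "$max" ]; then
        echo "min ($min) is greater than max ($max)" >&2
        exit 1
    fi
    printf 'remove_short_seqs.sh -i %s -l %s\n' "$dir" "$min"
    printf 'remove_long_seqs.sh -i %s -l %s\n' "$dir" "$max"
)
```

Because the removal cannot be undone, it may be worth copying the directory first and reviewing the printed commands before running them.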