Bowtie2AlignOptions

Options to map reads to reference sequence

Description

A Bowtie2AlignOptions object contains options to run the bowtie2 function, which aligns reads to a reference sequence.

Creation

Description

alignOptions = Bowtie2AlignOptions creates a Bowtie2AlignOptions object with default property values.

Bowtie2AlignOptions requires the Bioinformatics Toolbox™ Interface for Bowtie Aligner support package. If this support package is not installed, then the function provides a download link.

Note

Bowtie2AlignOptions is supported on Mac and UNIX® platforms only.

alignOptions = Bowtie2AlignOptions(Name,Value) sets properties using one or more name-value pair arguments. Enclose each property name in quotes. For example, alignOptions = Bowtie2AlignOptions('Trim5',10) specifies to trim 10 residues from the 5' end.

alignOptions = Bowtie2AlignOptions(S) specifies optional parameters in a character vector S.

Input Arguments

Alignment parameters, specified as a character vector. S must be in the Bowtie 2 option syntax (prefixed by one or two dashes) [1].
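As a minimal sketch of the two non-default creation syntaxes, the following code builds one options object from name-value pairs and another from a Bowtie 2 option string. The property names come from the Properties section of this page. The flags --trim5, --local, and -p are taken from the Bowtie 2 manual, and treating them as the counterparts of Trim5, Mode, and NumThreads is an assumption. The code requires the support package.

```matlab
% Name-value syntax: each property name is enclosed in quotes.
optsNV = Bowtie2AlignOptions('Trim5',10,'Mode','Local','NumThreads',4);

% Option-string syntax: similar settings written as Bowtie 2 flags.
% Assumption: --trim5, --local, and -p correspond to Trim5, Mode, and NumThreads.
optsS = Bowtie2AlignOptions('--trim5 10 --local -p 4');

% Inspect two of the values stored by the name-value form.
disp(optsNV.Trim5)
disp(optsNV.Mode)
```

The name-value form is usually easier to read and to modify later through property assignment. The option-string form can be convenient when you already have a working Bowtie 2 command line.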

Properties

Flag to allow dovetail configurations, specified as true or false. This property specifies whether the alignment of one mate can extend past the beginning of the alignment of the other mate and be considered concordant.

This property applies to paired-end reads only.

Example: 'AllowDovetail',true

Data Types: logical

Penalty for positions with ambiguous characters on the read sequence, reference sequence, or both, specified as a nonnegative integer.

Example: 'AmbiguousPenalty',2

Data Types: double

Encoding format of the base quality in the input files, specified as one of the following: 'Phred33', 'Phred64', or 'Solexa'.

Example: 'Encoding','Phred64'

Data Types: char | string

Flag to exclude alignments in which one mate alignment contains the alignment of the other mate, specified as true or false. If true, such alignments are not considered concordant.

This property applies to paired-end reads only.

Example: 'ExcludeContain',true

Data Types: logical

Flag to exclude discordant alignments, specified as true or false. A discordant alignment is an alignment where both mates align uniquely, but not in a way that satisfies the paired-end constraints.

Example: 'ExcludeDiscordant',true

Data Types: logical

Flag to exclude mixed alignments, specified as true or false. A mixed alignment consists of mate reads that are not concordant or discordant, but align individually.

This property applies to paired-end reads only.

Example: 'ExcludeMixed',true

Data Types: logical

Flag to exclude alignments in which the alignment of one mate overlaps the alignment of the other mate, specified as true or false. If true, such alignments are not considered concordant.

Example: 'ExcludeOverlap',true

Data Types: logical

Flag to exclude reads that failed to align, specified as true or false.

Example: 'ExcludeUnaligned',true

Data Types: logical

Additional options not included in the object properties, specified as a character vector. The character vector must be in the Bowtie 2 option syntax (prefixed by one or two dashes). The default value is an empty character vector ''.

Example: 'ExtraBowtie2Command','--version'

Data Types: char | string

Flag to ignore the actual read position quality when a mismatch occurs, specified as true or false. If true, the function treats the quality value at each mismatched position as the highest possible value, regardless of the actual value.

Example: 'IgnoreQuality',true

Data Types: logical

Reward added to the alignment score when a position in the read matches a position in the reference, specified as a nonnegative integer.

Example: 'MatchBonus',5

Data Types: double

Function governing the maximum number of ambiguous characters allowed in a read, specified as a character vector or string.

The function has the format 'f,B,A', where f is a function type, B is a constant term, and A is a coefficient. Available function types are:

  • 'C' – Constant

  • 'L' – Linear

  • 'S' – Square root

  • 'G' – Natural log

The resulting function is H(x) = B + A * f(x), where x is the read length.

The default function is 'L,0,0.15', that is, H(x) = 0 + 0.15 * x.

Example: 'MaxAmbiguousFunction','L,-0.4,-0.6'

Data Types: char | string
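The 'f,B,A' format can be illustrated with a short sketch that parses a specification and evaluates H(x) = B + A * f(x) for one read length. This is plain MATLAB arithmetic and does not call the aligner. The read length of 100 is a hypothetical value, and the handling of the 'C' type (H(x) = B regardless of x) is an assumption about how the constant function behaves.

```matlab
% Parse a function specification of the form 'f,B,A' and evaluate H(x).
spec = 'L,0,0.15';            % default MaxAmbiguousFunction from this page
readLength = 100;             % hypothetical read length

parts = strsplit(spec, ',');  % split into function type, B, and A
funcType = parts{1};
B = str2double(parts{2});     % constant term
A = str2double(parts{3});     % coefficient

switch funcType
    case 'C'
        f = @(x) zeros(size(x));  % assumption: constant type ignores x, so H(x) = B
    case 'L'
        f = @(x) x;               % linear
    case 'S'
        f = @(x) sqrt(x);         % square root
    case 'G'
        f = @(x) log(x);          % natural log
    otherwise
        error('Example:unknownType', 'Unknown function type: %s', funcType);
end

H = @(x) B + A * f(x);
maxAmbiguous = H(readLength);
fprintf('H(%d) = %g\n', readLength, maxAmbiguous);
```

With the default specification, a 100-base read gives a value of about 15. Whether and how the aligner rounds this value is not stated on this page.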

Flag to use memory mapping (instead of file I/O) when loading the index, specified as true or false. Memory mapping allows many concurrent processes to share the memory image of the index, resulting in a more efficient parallelization of the task.

Example: 'MemoryMappedIndex',true

Data Types: logical

Function governing the minimum score threshold of an alignment, specified as a character vector or string.

The function has the format 'f,B,A', where f is a function type, B is a constant term, and A is a coefficient. Available function types are:

  • 'C' – Constant

  • 'L' – Linear

  • 'S' – Square root

  • 'G' – Natural log

The resulting function is H(x) = B + A * f(x), where x is the read length.

For the 'EndToEnd' alignment mode, the default function is 'L,-0.6,-0.6'. For the 'Local' mode, the default function is 'G,20,8'.

Example: 'MinScoreFunction','L,-0.4,-0.6'

Data Types: char | string
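To see what the two default thresholds mean in practice, the following sketch evaluates them for a hypothetical 100-base read. It only reproduces the arithmetic of H(x) = B + A * f(x) and does not call the aligner.

```matlab
readLength = 100;   % hypothetical read length

% 'EndToEnd' default 'L,-0.6,-0.6': linear in the read length.
minScoreEndToEnd = -0.6 + (-0.6) * readLength;

% 'Local' default 'G,20,8': natural log of the read length.
minScoreLocal = 20 + 8 * log(readLength);

fprintf('EndToEnd threshold: %.1f\n', minScoreEndToEnd);
fprintf('Local threshold:    %.2f\n', minScoreLocal);
```

The end-to-end threshold is negative (about -60.6 here) and the local threshold is positive (about 56.84 here), so a threshold written for one mode is generally not meaningful in the other.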

Maximum and minimum values to compute the mismatch penalty during alignment, specified as a two-element vector. The first element is the maximum value and the second element is the minimum value.

For each position where a read character aligns to a different reference character and neither character is N, the function subtracts from the alignment score a number between the minimum and maximum values, inclusive.

Example: 'MismatchPenalty',[5 3]

Data Types: double
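This page does not say how the subtracted number is chosen between the two bounds. The Bowtie 2 manual gives the rule MN + floor((MX - MN) * min(Q,40)/40), where Q is the Phred quality at the mismatched position. The sketch below applies that rule to the example vector [5 3]. Assuming that the toolbox interface follows the same rule is an assumption, and the quality values are hypothetical.

```matlab
mismatchPenalty = [5 3];          % [maximum minimum], as in the example above
MX = mismatchPenalty(1);
MN = mismatchPenalty(2);

% Rule from the Bowtie 2 manual (assumed to apply here):
% higher base quality gives a larger penalty, capped at Q = 40.
penalty = @(Q) MN + floor((MX - MN) .* (min(Q, 40) ./ 40));

Q = [2 10 20 30 40];              % hypothetical Phred quality values
penalties = penalty(Q);
disp([Q; penalties])              % first row: qualities, second row: penalties
```

Under this rule, low-quality mismatches cost the minimum value and mismatches at quality 40 or above cost the maximum value.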

Alignment mode, specified as 'EndToEnd' or 'Local'.

In the 'Local' mode, only part of the read must align to the reference, and some residues can be omitted (soft-clipped) to achieve the best alignment score. In the 'EndToEnd' mode, the entire read must align without any soft-clipping.

Example: 'Mode','Local'

Data Types: char | string

Flag to reinitialize the pseudo-random generator for each read using the current time, specified as true or false. If true, the alignments reported for two identical reads can be different. The default value is false, in which case the pseudo-random generator is reinitialized using a seed derived from the read information and the seed number.

Example: 'Nondeterministic',true

Data Types: logical

Number of positions at the beginning or end of each read where gaps are not allowed, specified as a nonnegative integer.

Example: 'NoGapPositions',5

Data Types: double

Maximum number of valid alignments to report before terminating the search, specified as a positive integer, 'Best', or 'All'. If you specify a positive integer N, the function searches for up to N distinct, valid alignments for each read. 'Best' reports the best alignment for each read. 'All' reports all the valid alignments for each read sorted by alignment scores.

The alignment score for a paired-end alignment equals the sum of the alignment scores of individual mates.

Example: 'NumAlignments','All'

Data Types: double | char | string

Maximum number of reseeding attempts for reads with repetitive seeds, specified as a nonnegative integer. During reseeding, the function chooses a new set of seeds at different offsets to find more alignments.

Example: 'NumReseedings',5

Data Types: double

Maximum number of consecutive seed extension attempts before getting a new seed, specified as a nonnegative integer. A seed extension fails if it does not yield an alignment with the best (or second-best) score.

Example: 'NumSeedExtensions',10

Data Types: double

Number of allowed mismatches in a seed alignment during the multiseed alignment, specified as 0 or 1.

Example: 'NumSeedMismatches',1

Data Types: double

Number of parallel threads to perform the alignment, specified as a positive integer. Threads run on separate processors or cores. Increasing the number of threads provides a significant increase in speed (close to linear) but also increases the memory footprint.

Example: 'NumThreads',4

Data Types: double

Offrate to use when reading the index to reduce the memory footprint, specified as a positive integer. The offrate must be greater than the offrate used to build the index.

Example: 'Offrate',20

Data Types: double

Position in the reference sequence where the alignment for each sequence begins, specified as a nonnegative integer.

Example: 'PadPositions',10

Data Types: double

Gap costs for opening and extending a gap on the read, specified as a two-element vector of nonnegative integers. The first element is the cost of opening a gap, and the second element is the cost of extending a gap. Given the cost vector [GO GE], a read gap of length N is assigned a penalty of GO + N * GE.

Example: 'ReadGapCosts',[4 2]

Data Types: double
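The penalty formula GO + N * GE can be checked with a few lines of arithmetic. The sketch below uses the example cost vector [4 2] and a hypothetical range of gap lengths. It does not call the aligner.

```matlab
readGapCosts = [4 2];             % [GO GE], as in the example above
GO = readGapCosts(1);             % cost of opening a gap
GE = readGapCosts(2);             % cost of extending a gap

gapPenalty = @(N) GO + N .* GE;   % penalty for a gap of length N

gapLengths = 1:4;                 % hypothetical gap lengths
penalties = gapPenalty(gapLengths);
disp([gapLengths; penalties])     % first row: lengths, second row: penalties
```

Note that a gap of length 1 already costs GO + GE, not GO alone, because the extension cost is charged for every gap position.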

Read group ID to add on the @RG header line in the output SAM report, specified as a character vector or string. If you specify any read group ID, the function prints the @RG header line with the tag ID: followed by the specified group ID.

Example: 'ReadGroupID','ID1'

Data Types: char | string

Read group information to add as a field on the @RG header line in the output SAM report, specified as a character vector or string. This property applies only if you specify 'ReadGroupID'.

Example: 'ReadGroup','Control'

Data Types: char | string

Gap costs for opening and extending a gap on the reference, specified as a two-element vector of nonnegative integers. The first element is the cost of opening a gap, and the second element is the cost of extending a gap. Given the cost vector [GO GE], a reference gap of length N is assigned a penalty of GO + N * GE.

Example: 'RefGapCosts',[4 2]

Data Types: double

Flag to reorder SAM records to maintain the same order as in the input files, specified as true or false. This property applies only when the number of parallel threads is greater than one. When you use one thread, the order of the records in the output is the same as the order of the input.

Example: 'Reorder',true

Data Types: logical
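Because the Reorder property matters only with more than one thread, it is typically set together with NumThreads. The following sketch shows that combination. The thread count of 4 is a hypothetical value, and the code requires the support package.

```matlab
alignOpt = Bowtie2AlignOptions;

% Use several threads to speed up the alignment (hypothetical count).
alignOpt.NumThreads = 4;

% Keep the SAM records in the same order as the reads in the input files.
alignOpt.Reorder = true;

disp(alignOpt.NumThreads)
disp(alignOpt.Reorder)
```

Keeping the input order can make it easier to compare outputs from runs that use different thread counts.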

Number to set the seed in the pseudo-random number generator, specified as a nonnegative integer.

Example: 'Seed',3

Data Types: double

Function governing the distance between seed substrings during the multiseed alignment, specified as a character vector or string.

The function has the format 'f,B,A', where f is a function type, B is a constant term, and A is a coefficient. Available function types are:

  • 'C' – Constant

  • 'L' – Linear

  • 'S' – Square root

  • 'G' – Natural log

The resulting function is H(x) = B + A * f(x), where x is the read length.

For the 'EndToEnd' alignment mode, the default function is 'S,1,1.15'. For the 'Local' mode, the default function is 'S,1,0.75'.

Example: 'SeedIntervalFunction','S,2,2.15'

Data Types: char | string
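As an illustration of how the default seed interval grows with read length, the sketch below evaluates both default functions for a few hypothetical read lengths. It reproduces only the arithmetic of H(x) = B + A * f(x). How the aligner rounds the result is not covered on this page.

```matlab
readLengths = [50 100 150];                        % hypothetical read lengths

% 'EndToEnd' default 'S,1,1.15' and 'Local' default 'S,1,0.75'.
intervalEndToEnd = 1 + 1.15 * sqrt(readLengths);
intervalLocal    = 1 + 0.75 * sqrt(readLengths);

% One row per read length: length, end-to-end interval, local interval.
disp([readLengths; intervalEndToEnd; intervalLocal]')
```

For a 100-base read, the intervals are about 12.5 for 'EndToEnd' and 8.5 for 'Local'. The smaller local interval means that seeds are spaced more closely, so more seeds are taken from each read.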

Seed substring length to align during the multiseed alignment, specified as a positive integer.

Example: 'SeedLength',25

Data Types: double

Number of reads to ignore from the beginning of the input files, specified as a nonnegative integer.

Example: 'Skip',5

Data Types: double

Number of residues to trim from the 3' end of each read before aligning, specified as a nonnegative integer.

Example: 'Trim3',5

Data Types: double

Number of residues to trim from the 5' end of each read before aligning, specified as a nonnegative integer.

Example: 'Trim5',5

Data Types: double

Number of reads to consider from the beginning of input files, specified as a positive integer. The default value is Inf, that is, all reads are considered.

Example: 'UpTo',1000

Data Types: double

Object Functions

getBowtie2Command - Translate object properties to Bowtie 2 options
getBowtie2Table - Retrieve table with object properties and equivalent Bowtie 2 options
preset - Set combination of alignment options
run - Map sequence reads to reference sequence using Bowtie 2
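As a hedged sketch of how the first two object functions might be used, the following code creates an options object and asks for its Bowtie 2 equivalents. The calling form, with the object as the only input, and the types of the returned values are assumptions, because this page lists the functions without their syntax. The code requires the support package.

```matlab
alignOpt = Bowtie2AlignOptions('Trim3',4);

% Assumption: each function accepts the options object as its only input.
cmd = getBowtie2Command(alignOpt);   % object properties as Bowtie 2 options
tbl = getBowtie2Table(alignOpt);     % properties and equivalent Bowtie 2 options

disp(cmd)
disp(tbl)
```

Inspecting the translated options can help confirm what is passed to the aligner before you start a long alignment run.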

Examples

Build a set of index files for the Drosophila genome. An error message appears if you do not have the Bioinformatics Toolbox Interface for Bowtie Aligner support package installed when you run the function. Click the provided link to download the package from the Add-on menu.

For this example, the reference sequence Dmel_chr4.fa is already provided with the toolbox.

status = bowtie2build('Dmel_chr4.fa', 'Dmel_chr4');

If the index build is successful, the function returns 0 and creates the index files (*.bt2) in the current folder. The files have the prefix 'Dmel_chr4'.

Sometimes the index files exist, and you want to know the reference sequence used to build the index. In this case, use the bowtie2inspect function to get more information about the reference.

bowtie2inspect('Dmel_chr4', 'Dmel_chr4_retrieved.fa');

By default, the output file Dmel_chr4_retrieved.fa contains the sequence of the reference. You can also get summary information about the reference names and lengths instead of the actual sequence. For details on the available options, see Bowtie2InspectOptions.

Once the index is ready, map the read sequences to the reference using the bowtie2 function. The paired-end read files (SRR6008575_10k_1.fq and SRR6008575_10k_2.fq) are already provided with the toolbox.

bowtie2('Dmel_chr4','SRR6008575_10k_1.fq','SRR6008575_10k_2.fq','SRR6008575_10k_chr4.sam');

The output is a SAM-formatted file that contains the mapping results.

You can specify different alignment options by passing in a Bowtie 2 syntax string or using a Bowtie2AlignOptions object.

Suppose you want to trim some residues from the 3' end before aligning. First, create a Bowtie2AlignOptions object.

alignOpt = Bowtie2AlignOptions;

Trim four residues from the 3' end before aligning.

alignOpt.Trim3 = 4;

Map reads to the reference using the specified alignment option.

flag = bowtie2('Dmel_chr4','SRR6008575_10k_1.fq','SRR6008575_10k_2.fq','SRR6008575_10k_chr4_trimmed.sam',alignOpt);

References

[1] Langmead, B., and S. Salzberg. "Fast gapped-read alignment with Bowtie 2." Nature Methods. 9, 2012, 357–359.

Introduced in R2018a