nwalign

Globally align two sequences using the Needleman-Wunsch algorithm

Syntax

Score = nwalign(Seq1,Seq2)

Score = nwalign(Seq1,Seq2,Name=Value)

[Score,Alignment] = nwalign(Seq1,Seq2,___)

[Score,Alignment,Start] = nwalign(Seq1,Seq2,___)

Description

Score = nwalign(Seq1,Seq2) returns the optimal global alignment score in bits after aligning two sequences Seq1 and Seq2. The scale factor used to calculate the score is provided by ScoringMatrix.

Score = nwalign(Seq1,Seq2,Name=Value) uses additional options specified by one or more name-value arguments.

[Score,Alignment] = nwalign(Seq1,Seq2,___) also returns a character array Alignment showing the alignment of Seq1 and Seq2.

[Score,Alignment,Start] = nwalign(Seq1,Seq2,___) also returns Start, a vector of indices indicating the starting point in each sequence for the alignment. Because the alignment is global, Start is always [1;1].

Examples

Perform global alignment of two sequences

Globally align two amino acid sequences using the BLOSUM50 (default) scoring matrix and the default values for the GapOpen and ExtendGap name-value arguments. Return the optimal global alignment score in bits and the alignment character array.

seq1 = "VSPAGMASGYD";
seq2 = "IPGKASYD";
[Score, Alignment] = nwalign(seq1,seq2)

Score = 
7.3333

Alignment = 3x11 char array
    'VSPAGMASGYD'
    ': | | || ||'
    'I-P-GKAS-YD'

Specify the PAM250 scoring matrix and a gap open penalty of 5.

[Score,Alignment] = nwalign(seq1,seq2,ScoringMatrix="PAM250",GapOpen=5)

Score = 
6

Alignment = 3x11 char array
    'VSPAGMASGYD'
    ': | |:|| ||'
    'I-P-GKAS-YD'

Return the Score in nat units (nats) by specifying a scale factor of log(2).

[Score,Alignment] = nwalign(seq1,seq2,Scale=log(2))

Score = 
5.0831

Alignment = 3x11 char array
    'VSPAGMASGYD'
    ': | | || ||'
    'I-P-GKAS-YD'

Input Arguments

`Seq1` — Amino or nucleotide sequence to align
character vector | string scalar | vector of integers | structure

Amino or nucleotide sequence to align, specified as a character vector or string scalar, vector of integers, or structure.

You can specify:

Character vector or string scalar representing an amino acid or nucleotide sequence, such as the output from int2aa or int2nt.
Vector of integers representing an amino acid or nucleotide sequence, such as the output from aa2int or nt2int.
Structure containing a Sequence field.

Tip

For help with letter and integer representations of amino acids and nucleotides, see Amino Acid Lookup or Nucleotide Lookup.

Data Types: char | string | double | struct
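
The three input forms described above can be sketched as follows. This is a minimal illustration, not taken from the original page: the variable names are arbitrary, the sequences are the ones used in the Examples section, and the expectation that all three calls give the same score is an assumption based on the forms representing the same sequences.

```matlab
% Same two amino acid sequences, supplied in three different forms.
seq1 = 'VSPAGMASGYD';
seq2 = 'IPGKASYD';

% 1) Character vectors (string scalars work the same way).
scoreChar = nwalign(seq1, seq2);

% 2) Integer vectors, as produced by aa2int.
scoreInt = nwalign(aa2int(seq1), aa2int(seq2));

% 3) Structures containing a Sequence field.
s1 = struct('Sequence', seq1);
s2 = struct('Sequence', seq2);
scoreStruct = nwalign(s1, s2);

% The three scores are expected to agree (assumption, see lead-in).
disp([scoreChar scoreInt scoreStruct])
```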

`Seq2` — Amino or nucleotide sequence to align
character vector | string scalar | vector of integers | structure

Amino or nucleotide sequence to align, specified as a character vector or string scalar, vector of integers, or structure. For details, see Seq1.

Data Types: char | string | double | struct

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: [s,a] = nwalign("HEAGAWGHEE","PAWHEAE",GapOpen=5,ShowScore=true) specifies to use the value of 5 as a penalty for gap opening and to show the scoring space and winning path.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: [s,a] = nwalign("HEAGAWGHEE","PAWHEAE",'GapOpen',5,'ShowScore',true)

`Alphabet` — Type of sequence
`"AA"` (default) | `"NT"`

Type of sequence, specified as "AA" (amino acid) or "NT" (nucleotide).

Data Types: char | string
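
As a brief sketch of a nucleotide alignment, the call below sets Alphabet to "NT". The two short sequences are made up for illustration. No scoring matrix is specified, so the default for nucleotide sequences applies.

```matlab
% Two illustrative nucleotide sequences (hypothetical data).
nt1 = "ACGTTAGCA";
nt2 = "ACGTAGCA";

% Tell nwalign that the inputs are nucleotides rather than amino acids.
[score, alignment] = nwalign(nt1, nt2, Alphabet="NT");

disp(score)
disp(alignment)
```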

`ScoringMatrix` — Scoring matrix for global alignment
`"BLOSUM50"` (for amino acid sequences) (default) | character vector | string scalar | numeric matrix

Scoring matrix for the global alignment, specified as a character vector, string scalar, or numeric matrix.

You can specify a scoring matrix name. Valid choices are:

"BLOSUM50" (default for amino acid sequences)
"NUC44" (default for nucleotide sequences)
"BLOSUM62"
"BLOSUM30" increasing by 5 up to "BLOSUM90"
"BLOSUM100"
"PAM10" increasing by 10 up to "PAM500"
"DAYHOFF"
"GONNET"

Note

The above scoring matrices, provided with the software, also include a scale factor that converts the units of the output score to bits. You can also specify the Scale name-value argument to specify an additional scale factor to convert the output score from bits to another unit.

You can also specify a numeric matrix, such as the one returned by the blosum, pam, dayhoff, gonnet, or nuc44 function.

Note

If you use a scoring matrix that you created or was created by one of these scoring matrix functions, the matrix does not include a scale factor. The output score will be returned in the same units as the scoring matrix. You can use the Scale name-value argument to specify a scale factor to convert the output score to another unit.
If you need to compile nwalign into a standalone application or software component using MATLAB® Compiler™, use a numeric matrix instead of the scoring matrix name.

Data Types: double | char | string
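
The difference between naming a scoring matrix and passing a numeric matrix might look like the sketch below. It assumes that blosum(62) returns the numeric BLOSUM62 matrix in the row and column order that nwalign expects. The variable names are illustrative.

```matlab
seq1 = "VSPAGMASGYD";
seq2 = "IPGKASYD";

% Named matrix: the score is converted to bits by the built-in scale factor.
scoreBits = nwalign(seq1, seq2, ScoringMatrix="BLOSUM62");

% Numeric matrix: no scale factor is attached, so the score stays in the
% units of the matrix itself unless you also pass Scale.
B62 = blosum(62);
scoreRaw = nwalign(seq1, seq2, ScoringMatrix=B62);

fprintf('Named matrix: %g, numeric matrix: %g\n', scoreBits, scoreRaw);
```

Because the two scores can be in different units, they are not directly comparable until the numeric-matrix score is rescaled.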

`Scale` — Scale factor applied to output score
`1` (default) | numeric scalar | numeric vector

Scale factor applied to the output score, specified as a numeric scalar or vector. If you specify a vector, the function returns Score as a vector of the same length. By default, there is no scaling or change in the units of the output score.

Use this argument to control the units of the output scores. For example, if the output score is initially determined in bits, you can specify Scale=log(2) to return the output score in nats instead.

Note

If the ScoringMatrix argument also specifies a scale factor, then the function uses it first to scale the output score, then applies the scale factor specified by the Scale argument to rescale the output score.
Before comparing alignment scores from multiple alignments, ensure that the scores are in the same units.

Data Types: double
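
A short sketch of a vector-valued Scale follows. It requests the same score in bits and in nats in one call, using the sequences from the Examples section. The variable names are illustrative.

```matlab
seq1 = "VSPAGMASGYD";
seq2 = "IPGKASYD";

% One scale factor per requested unit: 1 keeps bits, log(2) converts to nats.
scores = nwalign(seq1, seq2, Scale=[1 log(2)]);

scoreBits = scores(1);
scoreNats = scores(2);

% The second entry should be the first one rescaled by log(2).
fprintf('%.4f bits, %.4f nats\n', scoreBits, scoreNats);
```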

`GapOpen` — Penalty for opening gap
`8` (default) | positive scalar

Penalty for opening a gap, specified as a positive scalar.

Data Types: double

`ExtendGap` — Penalty for extending gap
positive scalar

Penalty for extending a gap using the affine gap penalty scheme, specified as a positive scalar.

If you specify this value, the function uses the affine gap penalty scheme: it scores the first position of a gap using the GapOpen value and each subsequent position of that gap using the ExtendGap value. If you do not specify this value, the function scores all gap positions equally, using the GapOpen penalty.

Data Types: double
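
To make the two schemes concrete, the arithmetic below compares the total penalty for a single gap of length k. It assumes the standard reading of the affine scheme (GapOpen for the first gap position, ExtendGap for each further position). The ExtendGap value of 1 is chosen only for illustration.

```matlab
gapOpen   = 8;   % default GapOpen
extendGap = 1;   % illustrative value, not a documented default

k = 1:5;         % gap lengths to compare

% Without ExtendGap: every gap position costs GapOpen.
linearPenalty = gapOpen .* k;

% With ExtendGap: first position costs GapOpen, the rest cost ExtendGap.
affinePenalty = gapOpen + extendGap .* (k - 1);

disp(table(k', linearPenalty', affinePenalty', ...
    'VariableNames', {'GapLength', 'LinearPenalty', 'AffinePenalty'}))
```

Under these assumptions a long gap is much cheaper with the affine scheme, which tends to favor a few long gaps over many short ones.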

`Glocal` — Flag to perform semiglobal alignment
`false` or `0` (default) | `true` or `1`

Flag to perform a semiglobal alignment, specified as a numeric or logical 1 (true) or 0 (false).

In a semiglobal alignment, gaps at the ends of the sequences are not penalized.
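
The effect of this flag can be explored with a sketch like the one below, which aligns a short sequence against a longer one with and without Glocal. The shorter sequence is a hypothetical fragment chosen for illustration.

```matlab
longSeq  = "VSPAGMASGYD";
shortSeq = "GMASG";        % hypothetical fragment of the longer sequence

% Ordinary global alignment: the overhanging ends are scored as gaps.
[scoreGlobal, alnGlobal] = nwalign(longSeq, shortSeq);

% Semiglobal alignment: end gaps carry no penalty.
[scoreGlocal, alnGlocal] = nwalign(longSeq, shortSeq, Glocal=true);

disp(alnGlobal)
disp(alnGlocal)
fprintf('Global: %g, semiglobal: %g\n', scoreGlobal, scoreGlocal);
```

Since dropping the end-gap penalties can only help the best alignment, the semiglobal score should not be lower than the global one.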

`Showscore` — Flag to display scoring space and winning path of alignment
`false` or `0` (default) | `true` or `1`

Flag to display the scoring space and winning path of the alignment, specified as a numeric or logical 1 (true) or 0 (false).

The scoring space is a heat map displaying the best scores for all the partial alignments of two sequences. The color of each (n1,n2) coordinate in the scoring space represents the best score for the pairing of subsequences Seq1(1:n1) and Seq2(1:n2), where n1 is a position in Seq1 and n2 is a position in Seq2. The best score for a pairing of specific subsequences is determined by scoring all possible alignments of the subsequences by summing matches and gap penalties.

The winning path is represented by black dots in the scoring space, and it illustrates the pairing of positions in the optimal global alignment. The color of the last point (lower right) of the winning path represents the optimal global alignment score for the two sequences and is the Score output.

Note

The scoring space visually indicates if there are potential alternate winning paths, which is useful when aligning sequences with big gaps. Visual patterns in the scoring space can also indicate a possible sequence rearrangement.
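
The scoring space described above can be sketched in plain MATLAB. This is a simplified illustration, not the nwalign implementation: it uses a hypothetical match/mismatch score in place of a scoring matrix and a single linear gap penalty, and all names are illustrative.

```matlab
% Simplified Needleman-Wunsch scoring space (illustrative scores only).
s1 = 'GATTACA';
s2 = 'GCATGCU';
matchScore    =  1;
mismatchScore = -1;
gapPenalty    =  1;   % subtracted for every gap position

n1 = numel(s1);
n2 = numel(s2);

% F(i+1,j+1) holds the best score for aligning s1(1:i) with s2(1:j).
F = zeros(n1 + 1, n2 + 1);
F(:, 1) = -gapPenalty * (0:n1)';   % s1 prefix aligned against gaps only
F(1, :) = -gapPenalty * (0:n2);    % s2 prefix aligned against gaps only

for i = 1:n1
    for j = 1:n2
        if s1(i) == s2(j)
            sub = matchScore;
        else
            sub = mismatchScore;
        end
        F(i+1, j+1) = max([F(i, j) + sub, ...          % pair s1(i) with s2(j)
                           F(i, j+1) - gapPenalty, ... % gap in s2
                           F(i+1, j) - gapPenalty]);   % gap in s1
    end
end

bestScore = F(end, end);   % lower-right cell: score of the global alignment
% imagesc(F); colorbar     % uncomment to view the matrix as a heat map
```

The lower-right cell plays the role of the Score output, and tracing back from it through the cells that produced each maximum gives the winning path.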

Output Arguments

`Score` — Optimal global alignment score
numeric scalar | numeric vector

Optimal global alignment score, returned as a numeric scalar or vector. It is returned as a vector when you specify a numeric vector for the Scale name-value argument.

`Alignment` — Aligned sequences
character array

Aligned sequences, returned as a character array. The first and third rows are Seq1 and Seq2, respectively. The second row shows symbols representing the optimal global alignment for two sequences. The symbol | indicates amino acids or nucleotides that match exactly. The symbol : indicates amino acids or nucleotides that are related as defined by the scoring matrix (nonmatches with a zero or positive scoring matrix value).
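
One way to work with this output is to count the symbols in the second row. The sketch below hard-codes the Alignment array shown in the first example so that it runs without calling nwalign. The summary variables are illustrative names.

```matlab
% Alignment array copied from the first example on this page.
Alignment = ['VSPAGMASGYD'
             ': | | || ||'
             'I-P-GKAS-YD'];

aligned1 = Alignment(1, :);   % Seq1 with gap characters
markers  = Alignment(2, :);   % match symbols
aligned2 = Alignment(3, :);   % Seq2 with gap characters

numIdentical = sum(markers == '|');   % exact matches
numRelated   = sum(markers == ':');   % related residues
numGaps      = sum(aligned1 == '-') + sum(aligned2 == '-');

% Percent identity over the full alignment length.
pctIdentity = 100 * numIdentical / size(Alignment, 2);

fprintf('%d identical, %d related, %d gaps, %.1f%% identity\n', ...
    numIdentical, numRelated, numGaps, pctIdentity);
```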

`Start` — Starting point in each sequence for alignment
`[1;1]`

Starting point in each sequence for the alignment, returned as a vector of indices. Because the function performs a global alignment, Start is always returned as [1;1]. The function returns this output to be consistent with the swalign function.
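
A brief sketch of why this output exists: the same three-output call pattern works for both nwalign and swalign, so code that handles local alignments can be reused for global ones. The sequences are those from the Examples section.

```matlab
seq1 = "VSPAGMASGYD";
seq2 = "IPGKASYD";

% Global alignment: Start is always [1;1].
[~, ~, startGlobal] = nwalign(seq1, seq2);

% Local alignment: Start reports where the aligned region begins
% in each sequence, which need not be the first position.
[~, ~, startLocal] = swalign(seq1, seq2);

disp(startGlobal')
disp(startLocal')
```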

References

[1] Durbin, Richard, Sean R. Eddy, Anders Krogh, and Graeme Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. 1st ed. Cambridge University Press, 1998.

Version History

Introduced before R2006a
