Directory

Encyclopedia

NodeWorks
                              ENCYCLOPEDIA

Link Checker

Home
Encyclopedia : N : NE : NEE :

Needleman-Wunsch algorithm

 

Needleman-Wunsch algorithm

The Needleman-Wunsch algorithm performs a global alignment on two sequences (called A and B here). It is commonly used in bioinformatics to align protein or nucleotide sequences. The algorithm was proposed in 1970 by Saul Needleman and Christian Wunsch in their paper A general method applicable to the search for similarities in the amino acid sequence of two proteins.

The Needleman-Wunsch algorithm is an example a of dynamic programming, and is guaranteed to find the alignment with the maximum score. Needleman-Wunsch is the first instance of dynamic programming being applied to biological sequence comparison.

Scores for aligned characters are specified by a similarity matrix. Here, is the similarity of characters i and j. It uses a linear gap penalty, here called d.

For example, if the similarity matrix was
-AGCT
A10-1-3-4
G-17-5-3
C-3-590
T-4-308

then the alignment:
AGACTAGTTAC
CGA---GACGT
with a gap penalty of -5, would have the following score...

To find the alignment with the highest score, a two-dimensional array (or matrix) is allocated. This matrix is often called the F matrix, and its (i,j)th entry is often denoted There is one column for each character in sequence A, and one row for each character in sequence B. Thus, if we are aligning sequences of sizes n and m, the amount of memory used by the algorithm is in O(nm). (However, there is a modified version of the algorithm which uses only O(m + n) space, at the cost of a higher running time. This modification is in fact a general technique which applies to many dynamic programming algorithms; this method was introduced in Hirschberg's algorithm for solving the longest common subsequence problem.)

As the algorithm progresses, the will be assigned to be the optimal score for the alignment of the first i characters in A and the first j characters in B. The principle of optimality is then applied as follows.
Basis:
Recursion, based on the principle of optimality:

The pseudo-code for the algorithm to compute the F matrix therefore looks like this (array indexes start at 0):
for i=0 to length(A)-1
F(i,0) <- 0
for j=0 to length(B)-1
F(0,j) <- 0
for i=1 to length(A)-1
for j = 1 to length(B)-1
{
Choice1 <- F(i-1,j-1) + S(A(i-1), B(j-1))
Choice2 <- F(i-1, j) - d
Choice3 <- F(i, j-1) - d
F(i,j) <- max(Choice1, Choice2, Choice3)
}
Once the F matrix is computed, note that the bottom right hand corner of the matrix is the maximum score for any alignments. To compute which alignment actually gives this score, you can start from the bottom left cell, and compare the value with the three possible sources(Choice1, Choice2, and Choice3 above) to see which it came from. If it was Choice1, then A(i) and B(i) are aligned, if it was Choice2 then A(i) is aligned with a gap, and if it was Choice3, then B(i) is aligned with a gap.
AlignmentA <- ""
AlignmentB <- ""
i <- length(A) - 1
j <- length(B) - 1
while (i > 0 AND j > 0)
{
Score <- F(i,j)
ScoreDiag <- F(i - 1, j - 1)
ScoreUp <- F(i, j - 1)
ScoreLeft <- F(i - 1, j)
if (Score - S(A(i), B(j))

ScoreDiag)
{
AlignmentA <- A(i) + AlignmentA
AlignmentB <- B(j) + AlignmentB
i <- i - 1
j <- j - 1
}
else if (Score - d

ScoreLeft)
{
AlignmentA <- A(i) + AlignmentA
AlignmentB <- "-" + AlignmentB
i <- i - 1
}
otherwise (Score - d == ScoreUp)
{
AlignmentA <- "-" + AlignmentA
AlignmentB <- B(j) + AlignmentB
j <- j - 1
}
}
while (i >= 0)
{
AlignmentA <- A(i) + AlignmentA
AlignmentB <- "-" + AlignmentB
i <- i - 1
}
while (j >= 0)
{
AlignmentA <- "-" + AlignmentA
AlignmentB <- B(j) + AlignmentB
j <- j - 1
}



NodeWorks boosts web surfing!
Page Returned in 0.150 seconds - HTML Compressed 68.6%

This article is from Wikipedia. All text is available
under the terms of the GNU Free Documentation License.
 GNU Free Documentation License
© 2008 Chamas Enterprises Inc.