string - Is there an error in Gusfield's description of the dynamic programming algorithm for finding global alignments with constant gap penalty? -
string - Is there an error in Gusfield's description of the dynamic programming algorithm for finding global alignments with constant gap penalty? -
gusfield (algorithms on strings, trees, , sequences, section 11.8.6) describes dynamic programming algorithm finding best alignment between 2 sequences , b under assumption penalty assigned gap of length x in 1 of aligned sequences of form r+sx constants r , s. in special case s=0, there penalty origin gap no penalty lengthening it. seems me there error in gusfield's formulae , hope can help me understand true state of affairs.
gusfield defines 4 functions v(i,j), g(i,j), e(i,j) , f(i,j) next properties:
v(i,j) best score possible alignments of prefixes a[1..i] , b[1..j]. e(i,j) best possible score alignments of these prefixes under status b[j] paired against gap in a. f(i,j) best possible score alignments of these prefixes under status a[i] paired against gap in b. g(i,j) best possible score alignments pair a[i] b[j].the recurrences , j greater or equal 1 are:
v(i,j)=max[e(i,j),f(i,j),g(i,j)] g(i,j)=v(i,j)+w(a[i],b[j]) w score match/mismatch. e(i,j)=max(e(i,j-1),v(i,j-1)-r] f(i,j)=max(f(i-1,j),v(i-1,j)-r]
finally boundary conditions given as:
v(i,0)=e(i,0)=-r v(0,j)=f(0,j)=-r
with preliminaries, consider situation have 2 sequences each of length one, a=p , b=q. there 3 possible alignments in situation:
p p- -p q -q q-
these have scores respectively w(p,q), -2r, -2r. in particular should have e(0,1)=f(1,0)=-2r. however, recurrences give e(0,1) , f(1,0) both bigger or equal -r.
this error in boundary conditions has consequences. suppose illustration a=pppppp...p , b=qqqq...q p , q different. alignment sets off b:
a- -b
should score -2r (it has 2 gaps) , alignment optimal assuming w(p,q)<0.
gusfield's algorithm not seem handle case correctly.
in practical situations doesn't matter, because typical strings have lot of matches , boundary cases don't contribute solution.
comments/criticisms welcome.
yes, 2 of equations incorrect. should be
f(i,j)=max(f(i,j-1),v(i,j-1)-r] e(i,j)=max(e(i-1,j),v(i-1,j)-r]
since e(i,j) best possible score alignments of these prefixes under status a[i] paired against gap in b, alignment made of optimal alignment of a[1:i-l] against b[1:j] , l-long gap (the indels against a[i-l+1:i]).
string algorithm bioinformatics sequence-alignment
Comments
Post a Comment