string - Is there an error in Gusfield's description of the dynamic programming algorithm for finding global alignments with constant gap penalty? -



string - Is there an error in Gusfield's description of the dynamic programming algorithm for finding global alignments with constant gap penalty? -

gusfield (algorithms on strings, trees, , sequences, section 11.8.6) describes dynamic programming algorithm finding best alignment between 2 sequences , b under assumption penalty assigned gap of length x in 1 of aligned sequences of form r+sx constants r , s. in special case s=0, there penalty origin gap no penalty lengthening it. seems me there error in gusfield's formulae , hope can help me understand true state of affairs.

gusfield defines 4 functions v(i,j), g(i,j), e(i,j) , f(i,j) next properties:

v(i,j) best score possible alignments of prefixes a[1..i] , b[1..j]. e(i,j) best possible score alignments of these prefixes under status b[j] paired against gap in a. f(i,j) best possible score alignments of these prefixes under status a[i] paired against gap in b. g(i,j) best possible score alignments pair a[i] b[j].

the recurrences , j greater or equal 1 are:

v(i,j)=max[e(i,j),f(i,j),g(i,j)] g(i,j)=v(i,j)+w(a[i],b[j]) w score match/mismatch. e(i,j)=max(e(i,j-1),v(i,j-1)-r] f(i,j)=max(f(i-1,j),v(i-1,j)-r]

finally boundary conditions given as:

v(i,0)=e(i,0)=-r v(0,j)=f(0,j)=-r

with preliminaries, consider situation have 2 sequences each of length one, a=p , b=q. there 3 possible alignments in situation:

p p- -p q -q q-

these have scores respectively w(p,q), -2r, -2r. in particular should have e(0,1)=f(1,0)=-2r. however, recurrences give e(0,1) , f(1,0) both bigger or equal -r.

this error in boundary conditions has consequences. suppose illustration a=pppppp...p , b=qqqq...q p , q different. alignment sets off b:

a- -b

should score -2r (it has 2 gaps) , alignment optimal assuming w(p,q)<0.

gusfield's algorithm not seem handle case correctly.

in practical situations doesn't matter, because typical strings have lot of matches , boundary cases don't contribute solution.

comments/criticisms welcome.

yes, 2 of equations incorrect. should be

f(i,j)=max(f(i,j-1),v(i,j-1)-r] e(i,j)=max(e(i-1,j),v(i-1,j)-r]

since e(i,j) best possible score alignments of these prefixes under status a[i] paired against gap in b, alignment made of optimal alignment of a[1:i-l] against b[1:j] , l-long gap (the indels against a[i-l+1:i]).

string algorithm bioinformatics sequence-alignment

Comments

Popular posts from this blog

php - Android app custom user registration and login with cookie using facebook sdk -

django - Access session in user model .save() -

php - .htaccess Multiple Rewrite Rules / Prioritizing -