Protein Sequence Alignment in HTML5

Last semester, I had an assignment to write a parallel algorithm for protein sequence alignment. I've translated the code to JavaScript so it can run in the browser.

Banner credit: Peter Allen, University of Montreal

Sequence Alignment

Illustration credit: Institut für Biochemie, Charité Berlin

I'll be frank: I am not at all familiar with bioinformatics, but like all other programmers, I know my way around strings, which are just sequences of characters. DNA, RNA and protein sequences in biology can be represented in much the same way, which means you and I are just as well equipped to write algorithms that help biologists align sequences longer than they can practically handle by hand.

Smith-Waterman Algorithm

The Smith-Waterman Algorithm performs local alignment between two DNA, RNA or protein sequences: it finds the pair of subsequences that match best, allowing for matches, substitutions, insertions and deletions. Every aligned pair and every gap contributes to a score, and the higher the score (that is, the fewer and milder the changes), the more similar the sequences are likely to be functionally, structurally and evolutionarily.

Unsurprisingly, the Smith-Waterman Algorithm appears superficially similar to other string-based algorithms like the Levenshtein Distance (Edit Distance) Algorithm.
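For comparison, here is a minimal sketch of the Levenshtein distance in JavaScript. It is not part of the demo code; the function name `levenshtein` is my own choice for illustration. Notice that it minimises a count of edits over the whole of both strings, whereas Smith-Waterman maximises a score over the best-matching substrings.

```javascript
// Levenshtein (edit) distance using a full (m + 1) x (n + 1) matrix.
// d[i][j] holds the distance between the first i characters of a
// and the first j characters of b.
function levenshtein(a, b) {
  const m = a.length;
  const n = b.length;
  const d = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));

  // Turning a prefix into the empty string takes one deletion per character.
  for (let i = 0; i <= m; i++) d[i][0] = i;
  for (let j = 0; j <= n; j++) d[0][j] = j;

  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      const substitution = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,               // deletion
        d[i][j - 1] + 1,               // insertion
        d[i - 1][j - 1] + substitution // match or substitution
      );
    }
  }
  return d[m][n];
}

console.log(levenshtein("kitten", "sitting")); // 3
```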

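One really neat feature of both Smith-Waterman and Levenshtein is the use of a matrix "workspace" for dynamic programming. Each cell is computed once from its neighbouring cells, so the running time grows with the product of the two sequence lengths instead of with the number of possible alignments.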
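To make the matrix idea concrete, here is a simplified sketch of Smith-Waterman in JavaScript. It is not the code behind the demo: it assumes a plain match/mismatch score and a fixed penalty per gap position rather than BLOSUM62 and affine gaps, and the name `smithWaterman` and its default parameter values are my own choices for illustration.

```javascript
// Simplified Smith-Waterman: match/mismatch scoring, linear gap penalty.
// H[i][j] is the best score of any local alignment ending at a[i-1], b[j-1].
function smithWaterman(a, b, match = 2, mismatch = -1, gap = -1) {
  const m = a.length;
  const n = b.length;
  const H = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
  let best = 0, bestI = 0, bestJ = 0;

  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      const s = a[i - 1] === b[j - 1] ? match : mismatch;
      H[i][j] = Math.max(
        0,                   // start a fresh local alignment here
        H[i - 1][j - 1] + s, // align a[i-1] with b[j-1]
        H[i - 1][j] + gap,   // gap in b
        H[i][j - 1] + gap    // gap in a
      );
      if (H[i][j] > best) { best = H[i][j]; bestI = i; bestJ = j; }
    }
  }

  // Trace back from the best cell until a zero cell is reached.
  let i = bestI, j = bestJ, top = "", bottom = "";
  while (i > 0 && j > 0 && H[i][j] > 0) {
    const s = a[i - 1] === b[j - 1] ? match : mismatch;
    if (H[i][j] === H[i - 1][j - 1] + s) {
      top = a[i - 1] + top; bottom = b[j - 1] + bottom; i--; j--;
    } else if (H[i][j] === H[i - 1][j] + gap) {
      top = a[i - 1] + top; bottom = "-" + bottom; i--;
    } else {
      top = "-" + top; bottom = b[j - 1] + bottom; j--;
    }
  }
  return { score: best, top, bottom };
}

console.log(smithWaterman("XXACGTXX", "YYACGTYY"));
// { score: 8, top: 'ACGT', bottom: 'ACGT' }
```

The `0` inside `Math.max` is what makes the alignment local: a poorly matching prefix is simply dropped instead of dragging the score down.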

Demonstration

This particular example uses the BLOSUM62 similarity scoring matrix and affine gap penalties of 10.0 for gap opening and 5.0 for gap extension.
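An affine gap penalty charges more for opening a gap than for lengthening one. The sketch below shows how that changes the recurrence, computing the score only. It is an illustration under stated assumptions, not the demo's code: I assume the opening penalty covers the first gap position (some tools charge the extension on top of it), and `toyScore` is a made-up stand-in for BLOSUM62. The names `gapCost` and `affineLocalScore` are hypothetical as well.

```javascript
// Cost of a gap of the given length, assuming the opening penalty
// already pays for the first gap position.
function gapCost(length, open = 10, extend = 5) {
  return length <= 0 ? 0 : open + (length - 1) * extend;
}

// Local alignment score with affine gaps (Gotoh-style recurrence).
// Only the previous row is kept, since the score needs nothing older.
function affineLocalScore(a, b, score, open = 10, extend = 5) {
  const n = b.length;
  let prevH = new Array(n + 1).fill(0);         // best score ending at each cell
  let prevF = new Array(n + 1).fill(-Infinity); // best score ending in a gap in b
  let best = 0;

  for (let i = 1; i <= a.length; i++) {
    const H = new Array(n + 1).fill(0);
    const F = new Array(n + 1).fill(-Infinity);
    let E = -Infinity;                          // best score ending in a gap in a

    for (let j = 1; j <= n; j++) {
      E = Math.max(H[j - 1] - open, E - extend);           // open or extend
      F[j] = Math.max(prevH[j] - open, prevF[j] - extend); // open or extend
      H[j] = Math.max(0, prevH[j - 1] + score(a[i - 1], b[j - 1]), E, F[j]);
      if (H[j] > best) best = H[j];
    }
    prevH = H;
    prevF = F;
  }
  return best;
}

// Stand-in substitution score; the demo uses BLOSUM62 instead.
const toyScore = (x, y) => (x === y ? 5 : -4);

console.log(gapCost(2));                                           // 15
console.log(affineLocalScore("ABCDEFGH", "ABCDXXEFGH", toyScore)); // 25
```

In the second call, eight matches earn 40 and the two-position gap costs 15, so bridging the gap (25) beats stopping after four matches (20).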

The text boxes are already pre-filled with the complete coding sequences for the mRNA of Xiphophorus helleri strain Rio Sarabia tumor protein p53 (TP53) and Xiphophorus maculatus strain Rio Jamapa tumor protein p53 (TP53).

You can use software such as JAligner to verify the results; just make sure the gap penalties are configured to match.

Left Sequence Right Sequence

Output

The top bar represents the left sequence and the bottom bar represents the right sequence. They are both aligned horizontally at the local subsequence match.
Identity
Similar
Mismatch
Deletion
Insertion

Identity / (%)
Similarity / (%)
Mismatches / (%)
Insertions / (%)
Deletions / (%)
Gaps: / (%)
Score:
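
Statistics like these can be derived by walking the two aligned strings column by column. The sketch below is only an illustration, not the demo's code. It assumes a "-" in the top row counts as an insertion and a "-" in the bottom row as a deletion, and that similarity includes identical pairs; `alignmentStats`, `percent` and the `isSimilar` predicate are hypothetical names.

```javascript
// Count identities, similar pairs, mismatches and gaps in an alignment.
// top and bottom are the aligned strings, padded with "-" for gaps.
// isSimilar(x, y) decides whether two different residues count as similar,
// for example because their substitution score is positive.
function alignmentStats(top, bottom, isSimilar = () => false) {
  if (top.length !== bottom.length) {
    throw new Error("aligned strings must have equal length");
  }
  const stats = {
    identity: 0, similarity: 0, mismatches: 0,
    insertions: 0, deletions: 0, gaps: 0, length: top.length
  };
  for (let k = 0; k < top.length; k++) {
    const x = top[k];
    const y = bottom[k];
    if (x === "-") stats.insertions++;      // assumption: gap in top row
    else if (y === "-") stats.deletions++;  // assumption: gap in bottom row
    else if (x === y) { stats.identity++; stats.similarity++; }
    else if (isSimilar(x, y)) stats.similarity++;
    else stats.mismatches++;
  }
  stats.gaps = stats.insertions + stats.deletions;
  return stats;
}

// Percentage of the alignment length, guarding against an empty alignment.
const percent = (count, length) => (length === 0 ? 0 : (100 * count) / length);

const stats = alignmentStats("ACG-T", "AC-TT");
console.log(stats.identity, stats.gaps, percent(stats.identity, stats.length));
// 3 2 60
```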


Relevant Modules in NUS

Interested in delving into the world of bioinformatics?

Some helpful modules include:

  • LSM3231 Protein Structure and Function
  • CS3225 Combinatorial Methods in Bioinformatics
  • LSM4241 Functional Genomics

Unfortunately, I don't have any insight into any of these modules because I don't do biology.

That said, if this is what you want to do as a career, you should consider applying for the Bachelor of Computing in Computational Biology. There is a lot of overlap between this programme and CS/CM, so transfers are also not out of the question.


I hope you've enjoyed this demonstration.

If you have any feedback/questions or if you noticed any mistakes in my article, please contact me at fazli[at]sapuan[dot]org.

Comment section for placebo effect only. Please use email to contact the author