NCBI BLAST is the most popular sequence comparison algorithm, but it was not built with intellectual property (IP) sequence search in mind. Here we discuss three problems with relying on NCBI BLAST alone for such searches.
You want to search the entire sequence, not just a piece of it.
NCBI BLAST is a so-called local alignment algorithm, which means that it tries to find short stretches of your query that match part of a database sequence with very high similarity. This is ideal in a biological context, where one is looking for conserved regions. But in patents we often want to answer a different question: “What are all of the sequences that are 70% identical to my query across its full length?” A local alignment cannot answer that, because the percent identity it reports describes only the aligned stretch, not the whole query.
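A small arithmetic sketch makes the gap concrete. The numbers and function names here are hypothetical, chosen only for illustration: a 100-residue query produces a local hit covering 30 residues, 29 of which match.

```python
def identity_over_alignment(matches: int, alignment_length: int) -> float:
    """Percent identity measured only over the aligned stretch (a local view)."""
    return 100.0 * matches / alignment_length


def identity_over_query(matches: int, query_length: int) -> float:
    """Percent identity measured over the entire query."""
    return 100.0 * matches / query_length


# Hypothetical hit: 29 matching residues in a 30-residue local alignment
# taken from a 100-residue query.
local_view = identity_over_alignment(29, 30)
full_view = identity_over_query(29, 100)
print(f"identity over the aligned stretch: {local_view:.1f}%")
print(f"identity over the whole query:     {full_view:.1f}%")
```

The same hit looks nearly perfect in the local view and falls far short of a 70% threshold once the unaligned remainder of the query is counted.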
You need objective and repeatable results.
NCBI BLAST is a heuristic algorithm, and it does not report every alignment it finds: a complicated statistical model decides whether each match is significant. This decision is based on the length of the alignment and the size of the database, so as the database grows, previously reported hits can disappear. Shouldn’t you require an objective and repeatable search result for IP?
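The database-size effect can be illustrated with the standard Karlin-Altschul relation behind BLAST E-values, E = m * n * 2^(-S'), where S' is the bit score, m the query length, and n the total database length. The sketch below is a simplification: BLAST itself works with effective lengths and further corrections, and the bit score, query length, and database sizes used here are hypothetical. The cutoff of 10 is the default E-value threshold in BLAST+.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Simplified E-value: chance hits expected at or above this bit score."""
    return query_len * db_len * 2.0 ** (-bit_score)


E_VALUE_CUTOFF = 10.0  # BLAST+ default reporting threshold

# Same query, same alignment, same bit score; only the database has grown.
for db_len in (10**9, 10**12):
    e_value = expected_hits(bit_score=40.0, query_len=20, db_len=db_len)
    reported = e_value <= E_VALUE_CUTOFF
    print(f"database of {db_len:.0e} residues: E = {e_value:.3g}, reported = {reported}")
```

With these assumed numbers the E-value rises from roughly 0.018 to roughly 18, so an alignment that was reported against the smaller database is dropped against the larger one even though the alignment itself has not changed.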
Searching for short sequences is tricky.
NCBI BLAST makes it harder because it uses algorithmic shortcuts to go faster. The most important heuristic is its word size parameter, which by default requires an uninterrupted stretch of eleven identical nucleotides, or three identical amino acids, before it even attempts to align two sequences… This makes it less than ideal for searching short sequences like primers, small RNA molecules, and antibody CDR regions.
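The effect of the word size on a short query can be sketched as follows. This is only an illustration of exact-word seeding, not BLAST's actual implementation; the function name and the 20-nucleotide sequences are made up for the example.

```python
def has_seed(query: str, subject: str, word_size: int) -> bool:
    """Return True if query and subject share at least one exact word of the given size."""
    subject_words = {
        subject[i:i + word_size] for i in range(len(subject) - word_size + 1)
    }
    return any(
        query[i:i + word_size] in subject_words
        for i in range(len(query) - word_size + 1)
    )


# Hypothetical 20-nt primer and a target that differs at a single central position.
primer = "ATGCGTACCTGAAGTCCATG"
target = "ATGCGTACCAGAAGTCCATG"

matches = sum(a == b for a, b in zip(primer, target))
print(f"identity: {100.0 * matches / len(primer):.0f}%")
print("seed with word size 11:", has_seed(primer, target, 11))
print("seed with word size 7: ", has_seed(primer, target, 7))
```

The two sequences agree at 19 of 20 positions, yet the single mismatch splits the primer into exact runs of nine and ten nucleotides, so no eleven-letter word is shared and a seed-based search with that word size would never attempt the alignment. A smaller word size recovers the seed, at the cost of speed.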
Here is a better way to search: GenePAST
To solve all the problems discussed above, GenomeQuest developed and published the GenePAST “percentage identity” algorithm. This algorithm aligns the entire sequence while minimizing the number of mismatches, insertions, and deletions. No statistical models are used, and scores do not vary with the size of the databases searched.
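The general idea of a full-query percentage identity can be sketched with a textbook edit-distance alignment. This is not GenomeQuest's implementation: the function names, the scoring (one unit per mismatch, insertion, or deletion), and the choice to divide by the query length are all assumptions made for illustration.

```python
def min_edits_full_query(query: str, subject: str) -> int:
    """Fewest mismatches, insertions, and deletions needed to align the
    whole query against the best-matching stretch of the subject."""
    # An empty query prefix can start anywhere in the subject at no cost.
    prev = [0] * (len(subject) + 1)
    for i, q in enumerate(query, start=1):
        curr = [i]  # query prefix aligned against nothing: i deletions
        for j, s in enumerate(subject, start=1):
            curr.append(min(
                prev[j - 1] + (q != s),  # match or mismatch
                prev[j] + 1,             # query residue left unaligned
                curr[j - 1] + 1,         # extra subject residue
            ))
        prev = curr
    # The alignment may end anywhere in the subject.
    return min(prev)


def percent_identity(query: str, subject: str) -> float:
    """Share of the full query that aligns without an edit (assumed definition)."""
    if not query:
        raise ValueError("query must not be empty")
    edits = min_edits_full_query(query, subject)
    return 100.0 * (len(query) - edits) / len(query)


# Hypothetical 8-nt query; the subject contains it with one mismatch.
print(percent_identity("ACGTACGT", "TTACGAACGTTT"))
```

Because the score depends only on the two sequences being compared, the same query and subject give the same percentage no matter how many other sequences are in the database.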