We are proud to announce that there are now over 300 million sequences in GQ-Pat, including 256 million nucleotide sequences and over 45 million protein sequences. And these protein sequences aren’t just automated translations of nucleotide sequences, as in TrEMBL. All of these sequences are in fact found in patents and patent applications from patent authorities around the world.

To put this accomplishment in perspective, when the Human Genome Project formally began in 1990, there were fewer than 40,000 sequences in GenBank, before being transferred from Stanford to the newly created National Center for Biotechnology Information (NCBI).

As of the summer of 2015, according to the NCBI’s GenBank Statistics page, there are 185 million nucleotide sequences in the GenBank/EMBL/DDBJ consortium, the world’s gold standard for sequence databases.* That makes GenBank only slightly more than half the size of GQ-Pat!


Not only is GQ-Pat far larger than GenBank, its sequences help searchers in other ways as well:

  1. The sheer size of the database helps organizations save money, both through efficiency and because important search results won’t be missed.
  2. Sequences in GQ-Pat are well annotated because all of them have been found in patents, making them more valuable to researchers. Patent information includes descriptions of a particular invention, including the way in which the invention is used, the inventors, the owners, biological information about the sequence, its function, and so on.
  3. Researchers using GQ-Pat can obtain results much sooner than those using public databases like GenBank because patents are typically filed before publications are drafted.

We’re so pleased that more and more researchers are turning to GQ-Pat to search sequences for a huge variety of life-science-related projects. Our clients point out that when it comes to researching or protecting their own intellectual property, the quality of the results is often only as good as the size of the database they come from. That’s why we’re dedicated to maintaining the world’s largest. Of course, we’re also adding rich annotations and making sure updates are added on a weekly basis.

If you’ve never tried it, and you’re interested in searching our 300 million sequence (and growing) database for yourself, there’s never been a better time to get in touch about a free trial.

* There are another 50 million protein sequences in UniProt, the leading protein sequence database, although 98% of UniProt is TrEMBL, which consists of 49 million unreviewed computer-generated protein translations of nucleotide sequences already in the nucleotide databases.

Integrated Full-Text and Sequence Search

Prior art searching in the life sciences is complicated. Tens of millions of patents and applications, dozens of file formats, and handfuls of search platforms force the professional patent searcher to jump through hoops to get answers.

You know GenomeQuest as a leader in intellectual property sequence searching.

But we’re branching out to bring you an integrated keyword search platform that is designed for life science patent searchers like you. It’s a full-text search product that we’re calling LifeQuest.


LifeQuest has deep integration with life science ontologies, combined with the most modern search indices, fast navigation of results using keystrokes, set operations on search results, and tight integration with sequence search. It’s being launched in April 2015.

Download our white paper or start a free trial of LifeQuest today.


Because GenomeQuest’s GQ-Pat is a document database rather than a family database, you might hit the same sequence more than once because it occurs in both document A and document B of the same family. That’s actually incredibly useful because it allows you to examine how sequences found in patents change from one patent family member to the next.


What if you could collapse this down so that each unique sequence in a family was represented only once? That would unlock lots of use cases, for instance:

  • removing all redundancy and showing you the top hits to your query for each family, rather than by document
  • breaking down a sequence’s legal status, SEQ ID NO, or claim information as it moves through different family members
  • examining a non-redundant view of the unique projection of sequences across an entire family
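
As a rough illustration of that collapsing step, here is a minimal Python sketch. It is not GenomeQuest’s UFS implementation: the record fields (`family_id`, `document`, `sequence`, `identity`) and the sample hits are hypothetical, chosen only to show the idea of keeping one row per unique sequence per family while remembering which documents it came from.

```python
from collections import defaultdict


def collapse_by_family(hits):
    """Return one row per unique (family, sequence) pair, best hit first.

    `hits` is a list of dicts with the hypothetical keys
    family_id, document, sequence and identity.
    """
    grouped = defaultdict(list)
    for hit in hits:
        grouped[(hit["family_id"], hit["sequence"])].append(hit)

    collapsed = []
    for (family_id, sequence), members in grouped.items():
        best = max(members, key=lambda h: h["identity"])
        collapsed.append({
            "family_id": family_id,
            "sequence": sequence,
            "best_identity": best["identity"],
            # Keep every document the sequence occurs in, so the
            # family-member-by-family-member view is still available.
            "documents": sorted(m["document"] for m in members),
        })

    collapsed.sort(key=lambda row: row["best_identity"], reverse=True)
    return collapsed


# Made-up document-level hits: the same sequence appears in two
# documents of family F1, plus a variant and a second family.
hits = [
    {"family_id": "F1", "document": "DOC-A", "sequence": "ACGTACGT", "identity": 98.0},
    {"family_id": "F1", "document": "DOC-B", "sequence": "ACGTACGT", "identity": 98.0},
    {"family_id": "F1", "document": "DOC-B", "sequence": "ACGTTCGT", "identity": 87.5},
    {"family_id": "F2", "document": "DOC-C", "sequence": "ACGTACGT", "identity": 98.0},
]

collapsed = collapse_by_family(hits)
for row in collapsed:
    print(row["family_id"], row["sequence"], row["best_identity"], row["documents"])
```

Four document-level hits become three family-level rows here, and the duplicated sequence in family F1 still lists both of its source documents.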

This video explains GenomeQuest’s Unique Family Sequence (UFS) capability and how to maximize your use of it.



Start a Free Trial!

Each year, pharmaceutical companies, biotech organizations, academic institutions and law firms make costly errors because of poorly informed IP portfolio decisions. When it comes to biological sequence search, here are nine serious mistakes we hate to see life science companies make.

1. Overlooking patent sequence data.

Serious sequence searching requires specific, organized effort, and searching GenBank is not enough. GenBank had 180 million sequences as of its December 2014 build, only 32 million of which are identified in its patent division. By contrast, GenomeQuest’s GQ-Pat had over 280 million sequences, all found in patents: almost nine times as many as GenBank’s patent division.


2. Under-utilizing annotation information.

Ascertaining the legal or biological importance of the similarity between any two sequences requires a clean, curated database with organized annotation fields and content. Additional fields, such as bibliographic references, date of earliest publication, and date of sequence disclosure add analytical speed and precision when used with a rapid search result filtering function.

3. Forgetting the Dark Genome.

Public BLAST portals search only the most readily accessible elements of the entire universe of genome data. The remaining information is sometimes referred to as the “dark genome.” Poorly annotated data in a readily accessible database may be considered part of the dark genome, in that it is “hiding in plain sight.” Additional data with low search accessibility includes the information held in proprietary databases, desktop hard drives, graphic images and illustrations, and print document collections. Searching the “dark genome” requires access to proprietary data and full-time, multiple-media genome information searching and database curation procedures.

4. Taking too much time.

Taking too much time to do a patent-related search is a root cause of research project and intellectual property decision delays. Researchers might spend weeks scouring the internet for new data related to a query sequence, or developing lists of databases holding separate or overlapping sets of genomic information. Unintuitive search software user interfaces cause “learning curve” delays, and sequence search outsourcing can cause vendor transaction and project scheduling delays of weeks or months.

5. Hoping for the best.

Moving forward with a research project without properly searching and evaluating sequences can prove to be costly in the long run. An incomplete evaluation of the data early in the research cycle can be costly once a completed project is found to have yielded unusable results.

6. Making decisions based on yesterday’s results.

Genome sequence information is extremely dynamic. In addition to the steady addition of recorded primary sequence data, scientific and patent information about both new and previously existing sequences also grows and changes on a daily basis. A sequence data query affecting important scientific research and business decisions might not yield the same answer one week from now. The more sequences involved in the decision, the greater the risk. Research groups and businesses without access to an automated and continuous search-and-report system are particularly vulnerable.


7. Using the wrong algorithm.

Even the most experienced analysts can choose the wrong algorithm for a sequence search. For example, using BLAST for short sequences will miss many approximate hits. GenePAST is a better algorithm to use in many sequence search cases.

8. Too many gatekeepers.

Restricted access rights to proprietary databases, cumbersome search software user interfaces, and outdated business practices often prohibit direct utilization of sequence data search systems by the person asking the question, who must instead work through one or more gatekeepers. A well-defined project submission process can prevent intended queries from getting “lost in translation,” but when sequence searches are outsourced, queries are often composed broadly in order to prevent potentially relevant results from being excluded from the search report returned by the service. This results in an oversized report and a long manual search process for the sequence records of real interest. Gatekeeper delays also inhibit creative sequence data exploration, where hunches and hypotheses can be quickly formed and investigated using fast, iterative database queries.

9. Ignoring workflow issues.

Commercially licensed or in-house bioinformatics solutions often become very popular within organizations as researchers learn to use them to great advantage. But an effort to provide genome search capability to the user base that does not consider workflow issues can result in the installation of an isolated, standalone information “silo” with an unfamiliar interface. The standalone solution is itself likely to be underutilized, and also fails to take advantage of organizational knowledge built up around previously existing bioinformatics applications.



Antibody Patent Searching Made Easy

Researchers often neglect to search antibody patents because it seems complex and because they perceive that there is nothing to be gained from it. Dangerous thinking!

Antibody searching, with the right tools, is in fact quite easy. And the gain is compelling: searching patents is the best way to learn about the competitive landscape because patents are published before scientific papers.


All antibodies have significant sequence homology to each other since they all have the Immunoglobulin (Ig) domain. The functional part of an antibody query, its uniqueness, comes from short (5-20 amino acid) loops in the Ig fold called Complementarity Determining Regions (CDRs). Determining the similarity between the CDRs of different antibodies is the key to proper antibody searching.

This short video shows how easy it is to do antibody similarity searching in GenomeQuest.




4 Reasons why people pay for IP Sequence Search

Everyone likes free biological sequence search.

A free IP sequence search usually involves the following steps: search the GenBank patent divisions on the NCBI BLAST web site, go through the alignments one by one, and look up related patent information on the web.

Bad idea. Here are four reasons why.

1. Free sources are incomplete.

Why waste your time searching at all if you aren’t searching the complete register? The GenBank December 2014 release (205) had just under 180 million sequences. The GQ-Pat database had over 280 million sequences at that time, and can be searched right alongside the 180 million sequences in GenBank.

2. Free searches are stuck in the wrong algorithms.

The most common error in IP sequence searching is using a publicly available algorithm like BLAST to determine legal relevance. It’s important to understand that BLAST, the most popular sequence search algorithm, was created with biology in mind. It answers questions like: “is there an equivalent sequence in another species?” It is the wrong algorithm for IP-related requests such as, “find all sequences in the database that are 70% or more identical to my query sequence.” For that, an algorithm like GenePAST is more appropriate.

3. Free searches take too much effort.

Public services present the outcome of a search as a long, static list of alignments. From that list there is no easy way to filter the relevant hits and retrieve information about the related IP documents. A common solution is to print out everything, go through the hits one by one, and look up related patent information on the web. Findings are scribbled onto the printout or put into a spreadsheet. This approach is labor-intensive, error-prone, and very inflexible. When the question changes, the entire procedure has to be repeated from scratch.

GenomeQuest presents the outcome of a search as an interactive and fully queryable list that contains information about the alignments, the sequences, and the related IP documents. With a couple of clicks, users can find all hits with 70% identity or more over at least 500 nucleotides, where the sequence is claimed in a granted patent with the filing date before January 2010.
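
To make that kind of query concrete, here is a small Python sketch of the same filter applied to a list of hits. It does not use any GenomeQuest API: the `Hit` record, its field names, and the sample data are all hypothetical and stand in for whatever a real result export would contain.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Hit:
    """One search hit with a few hypothetical annotation fields."""
    document: str
    percent_identity: float
    alignment_length: int
    claimed: bool
    granted: bool
    filing_date: date


def filter_hits(hits, min_identity=70.0, min_length=500,
                filed_before=date(2010, 1, 1)):
    """Keep hits that are similar enough, long enough, claimed in a
    granted patent, and filed before the cutoff date."""
    return [
        h for h in hits
        if h.percent_identity >= min_identity
        and h.alignment_length >= min_length
        and h.claimed
        and h.granted
        and h.filing_date < filed_before
    ]


# Made-up example hits.
hits = [
    Hit("DOC-1", 92.4, 812, True, True, date(2008, 6, 3)),
    Hit("DOC-2", 71.0, 350, True, True, date(2007, 2, 14)),    # too short
    Hit("DOC-3", 88.1, 640, True, True, date(2011, 9, 30)),    # filed too late
    Hit("DOC-4", 99.0, 1200, False, True, date(2005, 1, 20)),  # not claimed
]

selected = filter_hits(hits)
print([h.document for h in selected])
```

Because the criteria are ordinary parameters, changing the question (say, 80% identity or a different cutoff date) means rerunning one call rather than rereading a printout.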

4. Free searches are not private.

Anyone can see the communication between your web browser and an unencrypted server like NCBI’s. And here you are running searches of the most confidential nature on the open Internet! Use a trusted, HIPAA-compliant, SAS II compliant service to ensure the security of your inventions.


Avoid the pitfalls of using free solutions for IP sequence searching. Start a free trial today!



BLAST is the most popular sequence comparison algorithm, but it was not built with intellectual property sequence search in mind. Here we discuss three problems with using BLAST alone for such sequence alignments.

You want to search the entire sequence, not just a piece of it.

BLAST is a so-called local alignment algorithm, which means that it tries to find small stretches of your query that match a database sequence with very high similarity. This is ideal in a biological context where one is looking for conserved sequences. But in patents, we often want to answer a different question: “what are all of the sequences that are at least 70% identical to my query?” For that question, local alignments are the wrong tool.

You need objective and repeatable results.

BLAST is a heuristic algorithm: it does not report every alignment it finds, because a complicated statistical model decides whether each match is significant. That decision is based on the length of the alignment and the size of the database, so as the database grows there is a chance that previous findings disappear. Shouldn’t you require an objective and repeatable search result for IP?
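
The database-size effect can be sketched with the standard Karlin-Altschul expectation formula, E = K · m · n · e^(−λS), where m is the query length, n the database length, and S the alignment score. The Python below is only an illustration: the values of `k` and `lam`, the score, the database sizes, and the reporting threshold of 10 are assumptions picked for the example, not settings taken from any particular BLAST run.

```python
import math


def expected_chance_hits(score, query_length, database_length,
                         k=0.71, lam=1.37):
    """Karlin-Altschul E-value: E = K * m * n * exp(-lambda * S).

    k and lam are illustrative constants; real values depend on the
    scoring scheme in use.
    """
    return k * query_length * database_length * math.exp(-lam * score)


threshold = 10.0   # assumed reporting cutoff
score = 18         # assumed score of one fixed alignment
query_length = 20  # a short, primer-sized query

# The same alignment, scored against a database that grows 100-fold.
small = expected_chance_hits(score, query_length, 1e9)
large = expected_chance_hits(score, query_length, 1e11)

for label, e_value in (("smaller database", small), ("larger database", large)):
    verdict = "reported" if e_value <= threshold else "dropped"
    print(f"{label}: E = {e_value:.3g} -> {verdict}")
```

Nothing about the two sequences changed between the two calls. Only the database length did, and with these made-up inputs that alone moves the hit from under the threshold to over it.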

Searching for short sequences is tricky

BLAST makes it harder still because it uses algorithmic shortcuts to go faster. The most important heuristic is its word size parameter: it requires an uninterrupted stretch of eleven identical nucleotides, or three identical amino acids, before it even attempts to align two sequences. This makes it less than ideal for searching short sequences like primers, small RNA molecules and antibody CDRs.
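
A toy example shows why an exact-word requirement hurts short queries. The Python below is a simplified stand-in for the seeding idea only, not BLAST’s actual code, and the primer sequences are made up. The two 20-mers differ at a single position in the middle, so they are 95% identical, yet they share no uninterrupted run of eleven matching nucleotides.

```python
def shares_exact_word(query, subject, word_size):
    """True if query and subject share at least one exact word of
    length word_size (a simplified model of word-based seeding)."""
    words = {query[i:i + word_size]
             for i in range(len(query) - word_size + 1)}
    return any(subject[j:j + word_size] in words
               for j in range(len(subject) - word_size + 1))


# Hypothetical 20-nt primer and a variant with one mismatch at
# position 11 of 20 (C -> G).
primer = "ACGTTGCAAGCTTGGATCCA"
variant = "ACGTTGCAAGGTTGGATCCA"

matches = sum(a == b for a, b in zip(primer, variant))
print(f"identical positions: {matches}/{len(primer)}")
print("seed found with word size 11:", shares_exact_word(primer, variant, 11))
print("seed found with word size 7: ", shares_exact_word(primer, variant, 7))
```

With a word size of eleven the pair never gets as far as an alignment, even though a searcher asking for hits at 90% identity or better would certainly want to see it.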

Here is a better way to search: GenePAST

To solve all the problems discussed above GenomeQuest developed and published the GenePAST “percentage identity” algorithm. This algorithm aligns the entire sequence, while minimizing the number of mismatches, insertions, and deletions. No statistical models are used, and scores do not vary based on the changing sizes of the databases searched.
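
The general idea of a whole-sequence percentage identity can be sketched in a few lines of Python. To be clear, this is not the GenePAST algorithm itself: it swaps in a plain unit-cost edit distance (each mismatch, insertion, or deletion costs one), and the identity formula, dividing by the longer sequence’s length, is an assumption made for this example.

```python
def edit_distance(a, b):
    """Minimum number of mismatches, insertions and deletions needed
    to turn a into b, computed over the full length of both."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b), # match or mismatch
            ))
        previous = current
    return previous[-1]


def percent_identity(query, subject):
    """Assumed definition: 100 * (L - edits) / L, where L is the
    length of the longer of the two sequences."""
    longest = max(len(query), len(subject))
    if longest == 0:
        return 100.0
    return 100.0 * (longest - edit_distance(query, subject)) / longest


# Hypothetical short sequences differing at one position.
query = "ACGTTGCAAGCTTGGATCCA"
subject = "ACGTTGCAAGGTTGGATCCA"
identity = percent_identity(query, subject)
print(f"{identity:.1f}% identical")
```

The score depends only on the two sequences being compared. Searching a database ten times larger would not change it, which is the repeatability property described above.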


Contact Us For a Personalized Demo

Chinese GMO Approvals Make for IP Opportunities

China’s recent approval of imports of certain genetically modified corn and soybean represents the beginning of a hopeful trend toward access to its enormous market. According to the U.S. Commerce Department, the Chinese market for U.S. soybeans was $14 billion in 2013, and corn was a significant $3.5 billion.

But inter-governmental agreements aren’t enough if you’re infringing intellectual property. That’s why it’s critical to be able to search the entire Chinese IP landscape to ensure you have freedom to operate as this large market begins to open up. The number of Chinese patents in the GenomeQuest GQ-Pat Platinum database has grown substantially over the past five years as companies vie for space in China. Download a white paper on GQ-Pat Platinum.


Intellectual property sequence search trials are available for free in GenomeQuest for those wishing to evaluate the growing China file in GQ-Pat Platinum.



3 Reasons why you should search Brazilian IP

Brazil has a rapidly growing economy and is highly invested in biotechnology and the life sciences.

Understanding the Intellectual Property landscape of Brazil is key for a global life science company’s success.

Our whitepaper talks more about biotech patents and gives an example of how to BLAST-search them.

Here are three reasons why Brazil is a critical component of your IP strategy:

  • World Leader in Agriculture: Brazil is the world’s second-largest agricultural market.1 It is a major exporter of maize and the world leader in production of soybean, sugar, and coffee.
  • Focus on Biofuels: The country emphasizes the production of biofuels, mainly sugarcane ethanol. (Pretty sweet.)
  • Licensing and Collaborations: Brazil is a major licensing and collaborations partner in all areas of biotechnology.2 Nearly three quarters of all of its licensed patents originate from its universities and research centers.

Let’s cut to the chase: patents are the most important tools for protecting R&D capabilities in Brazil’s highly competitive biotech sector. US entities, who are responsible for over half the biotech patent filings in Brazil, clearly value protecting their intellectual property there.

We’re here to help. GenomeQuest is pleased to announce a highly comprehensive Brazilian DNA and protein sequences patent archive that is available through GQ-Pat Platinum. GenomeQuest offers a collection of manually-curated documents ensuring the highest accuracy for sensitive biological sequence IP searching, facilitating your freedom to operate.


Download Brazilian IP Whitepaper