We are proud to announce that there are now over 300 million sequences in GQ-Pat, including 256 million nucleotide sequences and over 45 million protein sequences. And these protein sequences aren’t just automated translations of nucleotides, like those in TrEMBL. All of these sequences are in fact found in patents and patent applications from patent authorities around the world.
To put this accomplishment in perspective, when the Human Genome Project formally began in 1990, there were fewer than 40,000 sequences in GenBank, which had not yet been transferred from Stanford to the newly created National Center for Biotechnology Information (NCBI).
As of the summer of 2015, according to the NCBI’s GenBank Statistics page, there are 185 million nucleotide sequences in the GenBank/EMBL/DDBJ consortium, the world’s gold standard for sequence databases.* That is only about 60% of the size of GQ-Pat!
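For readers who want to check the comparison themselves, here is a minimal sketch of the arithmetic in Python. It uses only the counts quoted in this post; the constant names and the `share` helper are ours, chosen for illustration, and the protein figure is the lower bound of "over 45 million".

```python
# Counts in millions, as quoted in this post (summer 2015).
# The names below are illustrative, not part of any GQ-Pat or NCBI tool.
GQ_PAT_NUCLEOTIDE = 256
GQ_PAT_PROTEIN = 45        # the post says "over 45 million", so a lower bound
GENBANK_NUCLEOTIDE = 185


def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


gq_pat_total = GQ_PAT_NUCLEOTIDE + GQ_PAT_PROTEIN

print("GQ-Pat total (millions):", gq_pat_total)
print("GenBank vs. all of GQ-Pat (%):", share(GENBANK_NUCLEOTIDE, gq_pat_total))
print("GenBank vs. GQ-Pat nucleotides only (%):",
      share(GENBANK_NUCLEOTIDE, GQ_PAT_NUCLEOTIDE))
```

The second ratio compares nucleotide sequences with nucleotide sequences only, which is the stricter like-for-like comparison.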
Not only does GQ-Pat hold about 1.6 times as many sequences as GenBank, but these sequences also help searchers in other ways:
- The sheer size of the database helps organizations save money, both through efficiency and by reducing the risk that important search results are missed.
- Sequences in GQ-Pat are well annotated because all of them have been found in patents, making them more valuable to researchers. Patent information includes descriptions of a particular invention, including the way in which the invention is used, the inventors, the owners, biological information about the sequence, its function, and so on.
- Researchers using GQ-Pat can obtain results much sooner than those using public databases like GenBank because patents are typically filed before publications are drafted.
We’re so pleased that more and more researchers are turning to GQ-Pat to search sequences for a huge variety of life-science-related projects. Our clients point out that when it comes to researching or protecting their own intellectual property, the quality of the results is often only as good as the size of the database they come from. That’s why we’re dedicated to maintaining the world’s largest. Of course, we’re also adding rich annotations and making sure updates are added on a weekly basis.
If you’ve never tried it, and you’re interested in searching our 300-million-sequence (and growing) database for yourself, there’s never been a better time to get in touch about a free trial.
* There are another 50 million protein sequences in UniProt, the leading protein sequence database, although 98% of UniProt is TrEMBL, which consists of 49 million unreviewed computer-generated protein translations of nucleotide sequences already in the nucleotide databases.