Finding More Needles in the Life Science Patent Haystack

When you start searching for something, digital or tangible, one of the first tactics is usually to narrow the scope to improve your chances of success, or at least to speed up the process. Why search upstairs if you know your car keys are probably somewhere in the kitchen?

It’s the same with data. There’s a reason lawyers are known to overwhelm their counterparts with a deluge of discovery documents: it makes it harder to find what’s important.

As someone who has likely searched for specific patent data, you’re already intimately familiar with the challenges posed by the massive scale of the information involved. Based on data provided by the major IP content aggregators, I estimate the total number of patent applications (that is, anything ever filed anywhere in the world) to be between 80 and 100 million.

As our company name suggests, we’re all about Life Sciences, so we asked ourselves how much more we could eliminate to make life science patent searches more efficient. According to WIPO [1], a little less than 90% of the 17.8 million applications it handled from 2000 to 2013 are unrelated to life sciences. The remaining 10-11% are what life scientists really care about (analysis of biological materials, biotechnology, medical technology, or pharmaceuticals). By narrowing the focus, we can avoid things like electrical machinery, audio-visual technology, communication, IT management, computers, semiconductors, measurement, and optics.

So we set out to create a full-text patent search application just for life scientists. Of course, this is more challenging than it sounds, for a variety of reasons.

Like us, your first thought might have been “just use classification codes.” However, it’s important to note that life sciences was still a relatively small field when the International Patent Classification (IPC) was created in the early 1970s. As the field grew, new topics were added to whichever branches seemed to make the most sense at the time.

Unfortunately, this kind of organic growth of a taxonomy has scattered life sciences applications throughout the IPC, unlike, for example, chemistry, where all topics are neatly organized in a couple of branches.

It became clear that the only way to find all the life sciences related classification codes would be to go through the CPC, IPCR and ECLA classifications one entry at a time and make a list of codes that could be life sciences related.

That was the easy part. Our team then retrieved all documents containing at least one of these codes, and added every member of their patent family, regardless of their classification codes, to our database.
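The two-step build described above (select by code, then expand by patent family) can be sketched roughly as follows. This is only an illustration, not our production pipeline: the PatentDoc record, the build_life_science_corpus function, the document and family IDs, and the short code list are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatentDoc:
    doc_id: str
    family_id: str
    codes: frozenset  # classification codes assigned to this document


def build_life_science_corpus(docs, life_science_codes):
    """Return IDs of code-matched documents plus all their family members."""
    # Step 1: a family qualifies if at least one member carries a listed code.
    seed_families = {d.family_id for d in docs if d.codes & life_science_codes}
    # Step 2: keep every member of a qualifying family,
    # regardless of that member's own classification codes.
    return {d.doc_id for d in docs if d.family_id in seed_families}


# Hypothetical sample data; the codes are illustrative only.
docs = [
    PatentDoc("US-1", "FAM-A", frozenset({"C12N"})),
    PatentDoc("EP-1", "FAM-A", frozenset({"G06F"})),  # same family, other code
    PatentDoc("US-2", "FAM-B", frozenset({"H01L"})),  # unrelated family
]
corpus = build_life_science_corpus(docs, frozenset({"C12N", "A61K"}))
print(sorted(corpus))  # ['EP-1', 'US-1']
```

Note that EP-1 is kept even though its own code is not on the list, because it shares a family with US-1.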

To further complicate the proverbial haystack, search terms like gene names and common abbreviations often have meanings outside of life sciences, which means searching everything is likely to return false positive hits.

Take the common abbreviation “NIR,” which in biology refers to nitrite reductases. Unfortunately, searching for that term will also return patent applications in optics, where the same abbreviation stands for near infrared. This kind of ambiguity also makes search alerts frustratingly difficult.

Fortunately, with industry-specific search tools like LifeQuest, you don’t have to weed out these irrelevant documents yourself, which could cause you to miss good search results as well.

Our team also had the opportunity to leverage an extensive list of customers that represents almost the entire life sciences industry. Using their recent patent applications (filed over the last couple of years), we reviewed the list of classification codes for correctness and completeness. That turned up a couple of classes that you wouldn’t suspect are about biology just from looking at their descriptions, such as applications on tractors that are good at harvesting specific crops, or computer algorithms for analyzing biological data. As a result, we ended up with a little over 15 million documents, all relevant to the life sciences.

As an independent test for completeness, we compared the resulting database to GQ-Pat, another database of almost 700,000 documents that we know for certain are all life sciences related because they contain biological sequences. The result? Every document in GQ-Pat is included in LifeQuest.
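A completeness test of this kind boils down to a set difference: anything in the reference collection that is absent from the new database is a gap. Here is a minimal sketch, assuming both databases can be reduced to sets of document IDs. The missing_from function and the sample IDs are hypothetical and only stand in for the real comparison.

```python
def missing_from(reference_ids, candidate_ids):
    """Return the reference documents that the candidate database lacks."""
    return set(reference_ids) - set(candidate_ids)


# Hypothetical IDs standing in for the two databases.
reference = {"WO-1", "US-7", "EP-3"}          # known life sciences documents
candidate = {"WO-1", "US-7", "EP-3", "JP-9"}  # the newly built database
gaps = missing_from(reference, candidate)
print(len(gaps))  # 0 means every reference document is covered
```

An empty result corresponds to the outcome reported above, where no reference document was missing.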

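This final number (15 million) represents about 15-20% of the total number of applications ever filed, which seems about right when compared to the 10-11% share of life sciences documents in the recent WIPO applications I mentioned earlier. A somewhat higher share is plausible, since we include every patent family member regardless of its own classification codes.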
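For readers who want to check the arithmetic, the range follows from dividing the roughly 15 million documents by the 80 to 100 million total estimated earlier. The variable names below are just for illustration, and 15 million is rounded down from "a little over 15 million."

```python
corpus_size = 15_000_000                         # documents in the database
total_low, total_high = 80_000_000, 100_000_000  # estimated applications ever filed

share_low = corpus_size / total_high   # smallest share: largest denominator
share_high = corpus_size / total_low   # largest share: smallest denominator
print(f"{share_low:.1%} to {share_high:.1%}")
```

That works out to between 15% and just under 19%, which sits inside the 15-20% range quoted above.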

So if your organization is involved with life sciences, why keep searching patents in areas that aren’t related to your work? Want to see how much faster you can generate a complete list of results? Get your LifeQuest Trial and let us know what you think.

[1] accessed on November 23, 2015
