Data Mining PubMed
The National Institutes of Health provides a full programming interface for searching PubMed, called E-Utilities. You interact with the PubMed database through simple HTTP requests, and the article metadata is returned as XML. Every article in PubMed has a title, author, abstract, journal, year, volume, issue, pages, and keywords, among other metadata. Getting the metadata from PubMed, however, involves two separate queries. Very simply, the first query returns a list of PubMed IDs (PMIDs) for articles matching the search criteria, and the second query returns the article data for a given PMID.
The workflow is divided into two parts.
First, query E-Search with your search term. It returns a list of PMIDs, which are then used to query E-Fetch for the article metadata.
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=electrical+stimulation&retmax=10&tool=pmquery&db=pubmed
E-Search returns a list of PMIDs:
<eSearchResult>
<Count>157380</Count>
<RetMax>10</RetMax>
<RetStart>0</RetStart>
<IdList>
<Id>23858010</Id>
<Id>23856563</Id>
<Id>23856146</Id>
<Id>23855510</Id>
<Id>23839460</Id>
<Id>23839375</Id>
<Id>23853340</Id>
<Id>23853339</Id>
<Id>23853324</Id>
<Id>23853296</Id>
</IdList>
</ ...
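To pull the PMIDs out of a response like this one, a few lines of Python are enough. This is only a minimal sketch, not the code from any of my scripts: the function name `parse_esearch` is made up for illustration, and the embedded sample is a trimmed, well-formed copy of the response above.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the E-Search response shown in this post.
SAMPLE_ESEARCH_XML = """<eSearchResult>
  <Count>157380</Count>
  <RetMax>10</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>23858010</Id>
    <Id>23856563</Id>
    <Id>23856146</Id>
  </IdList>
</eSearchResult>"""


def parse_esearch(xml_text):
    """Return (total_count, pmids) from an E-Search XML response.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    # Count is the total number of matches, not the number of IDs returned.
    total = int(root.findtext("Count", default="0"))
    # Each <Id> under <IdList> holds one PMID as text.
    pmids = [node.text for node in root.findall("./IdList/Id")]
    return total, pmids


total, pmids = parse_esearch(SAMPLE_ESEARCH_XML)
print(total, pmids)
```

Note that `Count` is much larger than the number of IDs returned, because `retmax` caps how many PMIDs come back per request.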
Next, query E-Fetch for the article data. You can request multiple PMIDs at once and even choose the return type (XML, text, JSON). The API also supports pagination, so you can iteratively retrieve many thousands of results.
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=23856563,23858010&retmode=xml
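The two URLs and the pagination can be sketched in Python roughly as follows. The helper names (`esearch_url`, `efetch_url`, `page_starts`, `batched`) are my own for this example. The `retstart` parameter is the E-Search offset that matches the `RetStart` field in the response. The sketch only builds URLs and makes no network requests.

```python
from urllib.parse import urlencode

# Host and path as used in this post.
EUTILS_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retstart=0, retmax=10, tool="pmquery"):
    """Build an E-Search URL for one page of results (illustrative helper)."""
    params = {"db": "pubmed", "term": term, "retstart": retstart,
              "retmax": retmax, "tool": tool}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmids, retmode="xml"):
    """Build an E-Fetch URL for several PMIDs at once (illustrative helper)."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": retmode}
    # safe="," keeps the commas between PMIDs readable instead of %2C.
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params, safe=',')}"


def page_starts(total, page_size):
    """Offsets (retstart values) needed to walk through `total` results."""
    return range(0, total, page_size)


def batched(items, size):
    """Yield consecutive chunks of `items`, each at most `size` long."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


# Walk a 25-result search in pages of 10, then fetch PMIDs two at a time.
for start in page_starts(25, 10):
    print(esearch_url("electrical stimulation", retstart=start))
for chunk in batched(["23858010", "23856563", "23856146"], 2):
    print(efetch_url(chunk))
```

Fetching in batches keeps each URL short and lets a long-running job resume from the last offset if a request fails.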
I’ve written several interfaces to access the PubMed API, including in PHP, Python, and C#. For instance, the Python script was written specifically to data-mine PubMed. Given a search term, pmquery.py will query PubMed and save each article to a text file. For some search terms, like “transcranial magnetic stimulation”, this results in over 9000 articles returned by PubMed, so the process is iterative and can take some time (minutes). The PHP implementation provides a web-based search interface. For desktop-based applications, see the C# code and the Scholared app for a working example.
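To give a flavor of the "save each article to a text file" step, here is a rough Python sketch. It is not the pmquery.py source: the function names and the one-file-per-PMID layout are assumptions for illustration, and the embedded XML is a placeholder record (not a real article) that follows the `PubmedArticleSet` layout E-Fetch returns for PubMed.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Placeholder record in the shape of an E-Fetch PubMed response (not real data).
SAMPLE_EFETCH_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000000</PMID>
      <Article>
        <ArticleTitle>Placeholder title</ArticleTitle>
        <Abstract>
          <AbstractText>Placeholder abstract text.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def iter_articles(xml_text):
    """Yield a dict with pmid, title, and abstract for each article."""
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        if citation is None:
            continue
        # An abstract may be split into several AbstractText sections.
        sections = citation.findall("Article/Abstract/AbstractText")
        yield {
            "pmid": citation.findtext("PMID", default=""),
            "title": citation.findtext("Article/ArticleTitle", default=""),
            "abstract": " ".join("".join(s.itertext()) for s in sections),
        }


def save_articles(xml_text, out_dir):
    """Write one <pmid>.txt file per article and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for art in iter_articles(xml_text):
        path = out_dir / f"{art['pmid']}.txt"
        path.write_text(f"{art['title']}\n\n{art['abstract']}\n", encoding="utf-8")
        paths.append(path)
    return paths


# Demo: write the placeholder record into a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_articles(SAMPLE_EFETCH_XML, tmp)
    print([p.name for p in saved])
```

Real records carry many more fields (journal, authors, volume, pages, keywords), and the same pattern extends to them by adding more lookups inside `iter_articles`.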