PubMed

Utility functions to parse and store PubMed searches via BioServices.EUtils

Import Module

using BioMedQuery.PubMed

This module provides utility functions to parse, store and export queries to PubMed via the NCBI EUtils and its Julia interface BioServices.EUtils. For many purposes you may interact with the higher level pipelines in [BioMedQuery.Processes]. Here, some of the lower level functions are discussed in case you need to assemble different pipelines.

Basics of searching PubMed

We are often interested in searching PubMed for all articles related to a search term, possibly restricted by other search criteria. To do so we use BioServices.EUtils. The example below illustrates how the functions esearch and efetch can be combined to accomplish this task.

using BioServices.EUtils
using XMLDict

search_term = "obstructive sleep apnea[MeSH Major Topic]"

#esearch
esearch_response = esearch(db="pubmed", term = search_term,
retstart = 0, retmax = 20, tool ="BioJulia")

#convert xml to dictionary
esearch_dict = parse_xml(String(esearch_response.data))

#convert ids to an array of numbers
ids = [parse(Int64, id_node) for id_node in esearch_dict["IdList"]["Id"]]

#efetch
efetch_response = efetch(db = "pubmed", tool = "BioJulia", retmode = "xml", rettype = "null", id = ids)

#convert xml to dictionary
efetch_dict = parse_xml(String(efetch_response.data))
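
One detail worth guarding against when collecting the ids: dictionary-style XML parsers such as XMLDict typically return a single child element as a plain value and repeated child elements as an array, so a search with exactly one hit may yield a string rather than a list under "Id". The sketch below assumes that behavior. The helper collect_pmids is hypothetical (it is not part of BioMedQuery or BioServices), and the inputs are made-up stand-ins for esearch_dict["IdList"]["Id"].

```julia
# Hypothetical helper, not part of BioMedQuery: turn the "Id" entry of an
# esearch dictionary into a Vector{Int64}, whether it holds one id or many.
function collect_pmids(id_entry)
    # Assumption: a single <Id> element arrives as one string,
    # several <Id> elements arrive as a collection of strings.
    id_nodes = id_entry isa AbstractString ? [id_entry] : id_entry
    return Int64[parse(Int64, strip(id_node)) for id_node in id_nodes]
end

# Made-up stand-ins for esearch_dict["IdList"]["Id"]
many_ids = collect_pmids(["111", "222", "333"])
single_id = collect_pmids("444")

println(many_ids)
println(single_id)
```

With a real response, the call would be collect_pmids(esearch_dict["IdList"]["Id"]). A search with no hits may not contain an "Id" key at all, which this sketch does not handle.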

Handling XML responses

Many APIs return responses in XML form.

To parse an XML response into a Julia dictionary, we can use the XMLDict package:

using XMLDict
dict = parse_xml(String(response.data))  

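You can also save the XML string directly to a file using LightXML:

using LightXML
# parse_string expects the XML text, not the esearch function itself
xdoc = parse_string(String(esearch_response.data))
save_file(xdoc, "./file.xml")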

Save esearch/efetch responses

Save PMIDs to MySQL

If we are only interested in saving the list of PMIDs associated with a query, we can do so as follows:

dbname = "entrez_test"
host = "127.0.0.1";
user = "root"
pwd = ""

#Collect PMIDs from esearch result
ids = Array{Int64,1}()
for id_node in esearch_dict["IdList"]["Id"]
    push!(ids, parse(Int64, id_node))
end

# Initialize or connect to database
using BioMedQuery.DBUtils
const conn = DBUtils.init_mysql_database(host, user, pwd, dbname)

# Create `article` table to store pmids
PubMed.create_pmid_table!(conn)

#Save pmids
PubMed.save_pmids!(conn, ids)

#query the article table to explore list of pmids
all_pmids = BioMedQuery.PubMed.all_pmids(conn)

Export efetch response as EndNote citation file

We can export the information returned by efetch as an EndNote/BibTeX library file:

verbose = false
citation = PubMed.CitationOutput("endnote", "./citations_temp.endnote", true)
nsuccesses = PubMed.save_efetch!(citation, efetch_dict, verbose)

Save efetch response to MySQL database

Save the information returned by efetch to a MySQL database

dbname = "efetch_test"
host = "127.0.0.1";
user = "root"
pwd = ""

# Save results of efetch to database
const conn = DBUtils.init_mysql_database(host, user, pwd, dbname)
PubMed.create_tables!(conn)
PubMed.save_efetch!(conn, efetch_dict)

Save efetch response to SQLite database

Save the information returned by efetch to a SQLite database

using SQLite

db_path = "./test_db.db"

const conn = SQLite.DB(db_path)
PubMed.create_tables!(conn)
PubMed.save_efetch!(conn, efetch_dict)

Exploring output databases

The following schema has been used to store the results. If you are interested in having this module store additional fields, feel free to open an issue

[Figure: database schema]

We can also explore the tables using BioMedQuery.DBUtils, e.g.

tables = ["author", "author2article", "mesh_descriptor",
"mesh_qualifier", "mesh_heading"]

for t in tables
    query_str = "SELECT * FROM "*t*" LIMIT 10;"
    q = DBUtils.db_query(conn, query_str)
    println(q)
end
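
Because the query string above is assembled by concatenation, it can help to check the table name before it reaches the database. The following is a minimal sketch of that idea, using only the table names listed above. The helper preview_query and the constant KNOWN_TABLES are hypothetical and not part of BioMedQuery.DBUtils.

```julia
# Hypothetical helper, not part of BioMedQuery.DBUtils: build the preview
# query for one table, rejecting names that are not on a known list.
const KNOWN_TABLES = ["author", "author2article", "mesh_descriptor",
                      "mesh_qualifier", "mesh_heading"]

function preview_query(table::AbstractString; limit::Integer = 10)
    # Only allow table names we expect, since the name is interpolated into SQL
    table in KNOWN_TABLES || throw(ArgumentError("unknown table: $table"))
    limit > 0 || throw(ArgumentError("limit must be positive"))
    return "SELECT * FROM $table LIMIT $limit;"
end

# Build one query string per known table and print them
queries = [preview_query(t) for t in KNOWN_TABLES]
foreach(println, queries)
```

Each resulting string can then be passed to DBUtils.db_query together with an open connection, as in the loop above.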

Index

Structs and Functions

# BioMedQuery.PubMed.abstracts - Method.

abstracts(db; local_medline=false)

Return all abstracts related to PMIDs in the database. If the local_medline flag is set to true, it is assumed that db contains an article table with only PMIDs and that all other information is available in a (same host) medline database.

source

# BioMedQuery.PubMed.abstracts_by_year - Method.

abstracts_by_year(db, pub_year; local_medline=false)

Return all abstracts of articles published in the given year. If the local_medline flag is set to true, it is assumed that db contains an article table with only PMIDs and that all other information is available in a (same host) medline database.

source

# BioMedQuery.PubMed.all_pmids - Method.

all_pmids(db)

Return all PMIDs stored in the article table of the input database

source

# BioMedQuery.PubMed.citations_bibtex - Function.

citations_bibtex(article::PubMedArticle, verbose=false)

Transforms a PubMedArticle into text corresponding to its BibTeX citation

source

# BioMedQuery.PubMed.citations_endnote - Function.

citations_endnote(article::PubMedArticle, verbose=false)

Transforms a PubMedArticle into text corresponding to its EndNote citation

source

# BioMedQuery.PubMed.create_pmid_table! - Method.

create_pmid_table!(conn; tablename="article")

Creates a database, using either MySQL or SQLite, with all necessary tables to store Entrez related searches. All tables are empty at this point

source

# BioMedQuery.PubMed.create_tables! - Method.

create_tables!(conn)

Create and initialize tables to save results from an Entrez/PubMed search. Caution, all related tables are dropped if they exist

source

# BioMedQuery.PubMed.get_article_mesh - Method.

get_article_mesh(db, pmid)

Get all MeSH descriptors associated with a given article

source

# BioMedQuery.PubMed.get_article_mesh_by_concept - Method.

get_article_mesh_by_concept(db, pmid, umls_concepts...; local_medline)

Get all MeSH descriptors associated with a given article

Arguments:

  • query_string: "" - assumes full set of results were saved by BioMedQuery directly from XML

source

# BioMedQuery.PubMed.save_efetch! - Function.

save_efetch!(conn, efetch_dict)

Save the results (dictionary) of an entrez-pubmed fetch to the input database.

source

# BioMedQuery.PubMed.save_efetch! - Function.

save_efetch!(output::CitationOutput, efetch_dict, verbose=false)

Save the results of an Entrez efetch to a bibliography file, with format and file path given by output::CitationOutput

source

# BioMedQuery.PubMed.save_pmids! - Function.

save_pmids!(conn, pmids::Vector{Int64}, verbose::Bool=false)

Save a list of PMIDs into the input database.

Arguments:

  • conn: Database connection (MySQL or SQLite)
  • pmids: Array of PMIDs
  • verbose: Boolean to turn on extra print statements

source

# BioMedQuery.PubMed.all_mesh - Method.

all_mesh(db)

Return all MeSH descriptors stored in the input database

source