PubMed · BioMedQuery.jl

Utility functions to parse and store PubMed searches via BioServices.EUtils

Import Module

using BioMedQuery.PubMed

This module provides utility functions to parse, store, and export the results of PubMed queries made via the NCBI EUtils and its Julia interface, BioServices.EUtils. For many purposes, the higher-level pipelines in [BioMedQuery.Processes] are sufficient. This page covers some of the lower-level functions in case you need to assemble different pipelines.

Basics of searching PubMed

We are often interested in searching PubMed for all articles related to a search term, possibly restricted by other search criteria. To do so, we use BioServices.EUtils. The basic example below shows how the functions esearch and efetch can be used to accomplish this task.

using BioServices.EUtils
using XMLDict
using LightXML

search_term = "obstructive sleep apnea[MeSH Major Topic]"

#esearch
esearch_response = esearch(db="pubmed", term = search_term,
retstart = 0, retmax = 20, tool ="BioJulia")

#convert xml to dictionary
esearch_dict = parse_xml(String(esearch_response.body))

#convert ids to an array of numbers
ids = [parse(Int64, id_node) for id_node in esearch_dict["IdList"]["Id"]]

#efetch
efetch_response = efetch(db = "pubmed", tool = "BioJulia", retmode = "xml", rettype = "null", id = ids)

#convert xml to xml node tree
efetch_doc = root(parse_string(String(efetch_response.body)))
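
The example above retrieves only the first 20 matches (retstart = 0, retmax = 20). To walk through a larger result set, esearch can be repeated with increasing values of retstart. The following is a minimal sketch of the offset arithmetic only, assuming the total number of matches has already been read from the Count field of the esearch result. The helper name `page_offsets` is hypothetical and not part of BioMedQuery or BioServices; no network call is made.

```julia
# Hypothetical helper (not part of BioMedQuery or BioServices):
# compute the zero-based `retstart` offsets needed to cover `total`
# results when each esearch call returns at most `retmax` IDs.
function page_offsets(total::Integer, retmax::Integer)
    retmax > 0 || throw(ArgumentError("retmax must be positive"))
    return collect(0:retmax:total-1)
end

# Example: 45 matches fetched 20 at a time need three calls.
offsets = page_offsets(45, 20)
```

Each offset would then be passed as retstart in a separate esearch call, keeping retmax fixed.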

Handling XML responses

Many APIs return responses in XML form.

To parse an XML response into a Julia dictionary, we can use the XMLDict package

using XMLDict
dict = parse_xml(String(response.body))

You can also save the XML string directly to a file using LightXML

using LightXML
xdoc = parse_string(String(esearch_response.body))
save_file(xdoc, "./file.xml")
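
If the parsed XML tree is not needed, the response text can also be written to disk with Base functions alone. This is a minimal sketch, not part of BioMedQuery: `save_xml_text` is a hypothetical helper, and `xml_text` is a stand-in for String(esearch_response.body).

```julia
# Stand-in for String(esearch_response.body); the content is illustrative only.
xml_text = """<?xml version="1.0"?><eSearchResult><Count>1</Count></eSearchResult>"""

# Hypothetical helper (not part of BioMedQuery): write the raw XML text to `path`.
function save_xml_text(path::AbstractString, xml_text::AbstractString)
    open(path, "w") do io
        write(io, xml_text)
    end
    return path
end

# Write to a temporary directory so the example leaves the working directory untouched.
out_path = save_xml_text(joinpath(mktempdir(), "esearch.xml"), xml_text)
```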

Save esearch/efetch responses

Save PMIDs to MySQL

If we are only interested in saving a list of PMIDs associated with a query, we can do so as follows

using BioMedQuery.DBUtils

dbname = "entrez_test"
host = "127.0.0.1"
user = "root"
pwd = ""

#Collect PMIDs from esearch result
ids = Array{Int64,1}()
for id_node in esearch_dict["IdList"]["Id"]
    push!(ids, parse(Int64, id_node))
end

# Initialize or connect to database
const conn = DBUtils.init_mysql_database(host, user, pwd, dbname)

# Create `article` table to store pmids
PubMed.create_pmid_table!(conn)

#Save pmids
PubMed.save_pmids!(conn, ids)

#query the article table to explore list of pmids
all_pmids = PubMed.all_pmids(conn)
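
The ID-collection loop above assumes that esearch_dict["IdList"]["Id"] is a collection of ID strings. As far as we understand XMLDict, a tag that occurs only once is typically returned as a single value instead of a vector, so a query matching exactly one article may yield a bare string, and the loop would then iterate over its characters. The sketch below normalizes both cases; `normalize_pmids` is a hypothetical helper, not part of BioMedQuery or XMLDict, and the literal IDs are illustrative stand-ins.

```julia
# Hypothetical helper (not part of BioMedQuery or XMLDict).
# Accepts either one ID string or a vector of ID strings and
# always returns a Vector{Int64}.
normalize_pmids(id::AbstractString) = Int64[parse(Int64, id)]
normalize_pmids(ids::AbstractVector) = Int64[parse(Int64, id) for id in ids]

# Stand-ins for the two shapes esearch_dict["IdList"]["Id"] is assumed to take.
many_ids = normalize_pmids(["29031735", "28991782"])
single_id = normalize_pmids("29031735")
```

The resulting vector can be passed to PubMed.save_pmids! in place of the ids array built by the loop.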

Export efetch response as EndNote citation file

We can export the information returned by efetch as an EndNote or BibTeX library file

verbose = false
citation = PubMed.CitationOutput("endnote", "./citations_temp.endnote", true) # overwrite = true
nsuccesses = PubMed.save_efetch!(citation, efetch_doc, verbose)

Save efetch response to MySQL database

Save the information returned by efetch to a MySQL database

dbname = "efetch_test"
host = "127.0.0.1"
user = "root"
pwd = ""

# Save results of efetch to database and cleanup intermediate CSV files
const conn = DBUtils.init_mysql_database(host, user, pwd, dbname)
PubMed.create_tables!(conn)
PubMed.save_efetch!(conn, efetch_doc, false, true) # verbose = false, drop_csv = true

Save efetch response to SQLite database

Save the information returned by efetch to a SQLite database

using SQLite

db_path = "./test_db.db"

const conn = SQLite.DB(db_path)
PubMed.create_tables!(conn)
PubMed.save_efetch!(conn, efetch_doc)

Return efetch response as a dictionary of DataFrames

The information returned by efetch can also be returned as DataFrames. The DataFrames match the format of the tables created by the SQL saving functions (see the schema below), and they can easily be saved to CSV files.

dfs = PubMed.parse(efetch_doc)

PubMed.dfs_to_csv(dfs, "my/path", "my_file_prefix_")

Exploring output databases

The following schema is used to store the results. If you would like this module to store additional fields, feel free to open an issue.

[Figure: database schema]

We can also explore the tables using BioMedQuery.DBUtils, e.g.

tables = ["author_ref", "mesh_desc",
          "mesh_qual", "mesh_heading"]

# `conn` is the database connection created in the examples above
for t in tables
    query_str = "SELECT * FROM $t LIMIT 10;"
    q = DBUtils.db_query(conn, query_str)
    println(q)
end

Index

BioMedQuery.PubMed.abstracts
BioMedQuery.PubMed.abstracts_by_year
BioMedQuery.PubMed.add_mysql_keys!
BioMedQuery.PubMed.all_mesh
BioMedQuery.PubMed.all_pmids
BioMedQuery.PubMed.citations_bibtex
BioMedQuery.PubMed.citations_endnote
BioMedQuery.PubMed.create_pmid_table!
BioMedQuery.PubMed.create_tables!
BioMedQuery.PubMed.db_insert!
BioMedQuery.PubMed.db_insert!
BioMedQuery.PubMed.db_insert!
BioMedQuery.PubMed.dfs_to_csv
BioMedQuery.PubMed.dict_to_array
BioMedQuery.PubMed.drop_mysql_keys!
BioMedQuery.PubMed.get_article_mesh
BioMedQuery.PubMed.get_article_mesh_by_concept
BioMedQuery.PubMed.parse_MedlineDate
BioMedQuery.PubMed.parse_articles
BioMedQuery.PubMed.parse_author
BioMedQuery.PubMed.parse_month
BioMedQuery.PubMed.parse_orcid
BioMedQuery.PubMed.parse_year
BioMedQuery.PubMed.remove_csvs
BioMedQuery.PubMed.remove_csvs
BioMedQuery.PubMed.save_efetch!
BioMedQuery.PubMed.save_efetch!
BioMedQuery.PubMed.save_pmids!
BioMedQuery.PubMed.strip_newline

Structs and Functions

BioMedQuery.PubMed.abstracts — Method.

abstracts(db; local_medline=false)

Return all abstracts related to the PMIDs in the database. If the local_medline flag is set to true, it is assumed that db contains a basic table with only PMIDs and that all other information is available in a MEDLINE database on the same host.