PubMed

Utility functions to parse and store PubMed searches via BioServices.EUtils

Import Module

using BioMedQuery.PubMed

This module provides utility functions to parse, store and export queries to PubMed via the NCBI EUtils and its Julia interface, BioServices.EUtils. For many purposes you can work with the higher-level pipelines in BioMedQuery.Processes. This page covers some of the lower-level functions in case you need to assemble different pipelines.

Basics of searching PubMed

We are often interested in searching PubMed for all articles related to a search term, possibly restricted by other search criteria. To do so we use BioServices.EUtils. The example below shows a basic use of the functions esearch and efetch for this task.

using BioServices.EUtils
using XMLDict

search_term = "obstructive sleep apnea[MeSH Major Topic]"

#esearch
esearch_response = esearch(db="pubmed", term = search_term,
retstart = 0, retmax = 20, tool ="BioJulia")

#convert xml to dictionary
esearch_dict = parse_xml(String(esearch_response.data))

#convert the IDs to an array of integers
ids = [parse(Int64, id_node) for id_node in esearch_dict["IdList"]["Id"]]

#efetch
efetch_response = efetch(db = "pubmed", tool = "BioJulia", retmode = "xml", rettype = "null", id = ids)

#convert xml to dictionary
efetch_dict = parse_xml(String(efetch_response.data))
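
One detail worth guarding against when extracting the IDs: XML-to-dictionary converters such as XMLDict typically return a single string, rather than an array, when an element occurs only once, so a search with exactly one hit may need special handling. The sketch below normalizes both cases. `pmids_from_idlist` is a hypothetical helper, not part of BioMedQuery or XMLDict, the single-string behaviour is an assumption about the parsed response, and the PMIDs are made up. Plain values stand in for `esearch_dict["IdList"]["Id"]`.

```julia
# Hypothetical helper (not part of BioMedQuery or XMLDict).
# Assumes the "Id" field is either one string or a collection of strings.
function pmids_from_idlist(id_field)
    # Wrap a lone string so it is not iterated character by character.
    id_strings = id_field isa AbstractString ? [id_field] : id_field
    return [parse(Int64, strip(s)) for s in id_strings]
end

# Stand-ins for esearch_dict["IdList"]["Id"]; the PMIDs are made up.
one_hit   = "11111111"
many_hits = ["11111111", "22222222"]

ids_one  = pmids_from_idlist(one_hit)
ids_many = pmids_from_idlist(many_hits)
```

Either result can then be passed to efetch as the `id` argument in the same way as `ids` above.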

Handling XML responses

Many APIs return responses in XML form.

To parse an XML response into a Julia dictionary we can use the XMLDict package

using XMLDict
dict = parse_xml(String(response.data))  

You can also save the XML string directly to a file using LightXML

using LightXML
xdoc = parse_string(String(esearch_response.data))
save_file(xdoc, "./file.xml")
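
If the response does not need to be parsed first, the raw text can also be written with Julia's built-in `write`, which skips any XML validation. A minimal sketch, where `xml_text` is a made-up stand-in for `String(esearch_response.data)` and the output goes to a temporary directory:

```julia
# Stand-in for String(esearch_response.data); the content is made up.
xml_text = "<eSearchResult><Count>0</Count></eSearchResult>"

# Write the raw text into a temporary directory.
out_path = joinpath(mktempdir(), "file.xml")
write(out_path, xml_text)

# Read the file back to confirm the round trip.
saved_text = read(out_path, String)
```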

Save esearch/efetch responses

Save PMIDs to MySQL

If we are only interested in saving a list of PMIDs associated with a query, we can do so as follows

dbname = "entrez_test"
config = Dict(:host=>"127.0.0.1", :dbname=>dbname, :username=>"root",
:pswd=>"", :overwrite=>true)
con = PubMed.save_pmid_mysql(ids, config, false)

# get array of PMIDs stored in the database
all_pmids = BioMedQuery.PubMed.all_pmids(con)

Export efetch response as EndNote citation file

We can export the information returned by efetch as an EndNote/BibTeX library file

verbose = false
config = Dict(:type => "endnote", :output_file => "./citations_temp.endnote", :overwrite=>true)
nsuccesses = BioMedQuery.PubMed.save_article_citations(efetch_dict, config, verbose)

Save efetch response to MySQL database

Save the information returned by efetch to a MySQL database

dbname = "entrez_test"
config = Dict(:host=>"127.0.0.1", :dbname=>dbname, :username=>"root",
:pswd=>"", :overwrite=>true)
@time db = BioMedQuery.PubMed.save_efetch_mysql(efetch_dict, config, verbose)

Save efetch response to SQLite database

Save the information returned by efetch to a SQLite database

verbose = false
db_path = "./test_db.db"

config = Dict(:db_path=> db_path, :overwrite=>true)
db = BioMedQuery.PubMed.save_efetch_sqlite(efetch_dict, config, verbose)

Exploring output databases

The following schema has been used to store the results. If you are interested in having this module store additional fields, feel free to open an issue

[Figure: database schema diagram]

We can also explore the tables using BioMedQuery.DBUtils, e.g.

tables = ["author", "author2article", "mesh_descriptor",
"mesh_qualifier", "mesh_heading"]

for t in tables
    query_str = "SELECT * FROM "*t*" LIMIT 10;"
    q = BioMedQuery.DBUtils.db_query(db, query_str)
    println(q)
end
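
Because the table name is spliced directly into the SQL text, it can help to check it against a list of known tables before building the query. The sketch below is one way to do that in plain Julia. `preview_query` is a hypothetical helper, not part of BioMedQuery.DBUtils, and its default list of allowed tables simply repeats the names used above.

```julia
# Hypothetical helper (not part of BioMedQuery.DBUtils).
# Builds a preview query only for table names found in `allowed`.
function preview_query(table::AbstractString; limit::Integer = 10,
                       allowed = ("author", "author2article", "mesh_descriptor",
                                  "mesh_qualifier", "mesh_heading"))
    table in allowed || throw(ArgumentError("unknown table: $table"))
    limit > 0 || throw(ArgumentError("limit must be positive"))
    return "SELECT * FROM $table LIMIT $limit;"
end

# One query string per table, ready to pass to a query function.
queries = [preview_query(t) for t in ("author", "mesh_heading")]
```

Each resulting string could be passed to `BioMedQuery.DBUtils.db_query(db, query_str)` as in the loop above.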

Index

Structs and Functions

# BioMedQuery.PubMed.abstracts - Method.

abstracts(db; local_medline=false)

Return all abstracts related to PMIDs in the database. If the local_medline flag is set to true, db is assumed to contain an article table with only PMIDs, with all other information available in a MEDLINE database on the same host.

# BioMedQuery.PubMed.abstracts_by_year - Method.

abstracts_by_year(db, pub_year; local_medline=false)

Return all abstracts of articles published in the given year. If the local_medline flag is set to true, db is assumed to contain an article table with only PMIDs, with all other information available in a MEDLINE database on the same host.

# BioMedQuery.PubMed.all_pmids - Method.

all_pmids(db)

Return all PMIDs stored in the article table of the input database

# BioMedQuery.PubMed.get_article_mesh - Method.

get_article_mesh(db, pmid)

Get all MeSH descriptors associated with a given article

# BioMedQuery.PubMed.get_article_mesh_by_concept - Method.

get_article_mesh_by_concept(db, pmid, umls_concepts...; local_medline)

Get all MeSH descriptors associated with a given article

Arguments:

  • query_string: "" - assumes full set of results were saved by BioMedQuery directly from XML

# BioMedQuery.PubMed.save_efetch_mysql - Function.

save_efetch_mysql(efetch_dict, con::MySQL.MySQLHandle, clean_efetch_tables = false, verbose=false)

Save the results (dictionary) of an entrez fetch to a MySQL database.

Arguments:

  • efetch_dict: Response dictionary from efetch
  • con::MySQL.MySQLHandle: Connection to MySQL database
  • clean_efetch_tables: If true, all tables related to efetch results are dropped
  • verbose: Boolean to turn on extra print statements

Example

db_config = Dict(:host=>"localhost", :dbname=>"test", :username=>"root",
:pswd=>"", :overwrite=>true)
db = save_efetch_mysql(efetch_dict, db_config)

# BioMedQuery.PubMed.save_efetch_mysql - Method.

save_efetch_mysql(efetch_dict, db_config, verbose)

Save the results (dictionary) of an entrez fetch to a MySQL database.

Arguments:

  • efetch_dict: Response dictionary from efetch
  • db_config::Dict{Symbol, T}: Configuration dictionary for initializing the MySQL database. Must contain symbols :host, :dbname, :username, :pswd, and :overwrite
  • verbose: Boolean to turn on extra print statements

Example

db_config =  Dict(:host=>"localhost", :dbname=>"test", :username=>"root",
:pswd=>"", :overwrite=>true)
db = save_efetch_mysql(efetch_dict, db_config)
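
Since the configuration dictionary must carry a fixed set of symbols, a quick check before connecting can catch a missing entry early. The following is only a sketch: `missing_config_keys` is a hypothetical helper, not part of BioMedQuery, and the required keys are taken from the argument description above.

```julia
# Hypothetical helper (not part of BioMedQuery).
# Returns the required keys that are absent from `config`, in the order given.
function missing_config_keys(config::AbstractDict, required)
    return [k for k in required if !haskey(config, k)]
end

# Keys listed in the argument description for the MySQL configuration.
mysql_keys = (:host, :dbname, :username, :pswd, :overwrite)

db_config  = Dict(:host=>"localhost", :dbname=>"test", :username=>"root",
                  :pswd=>"", :overwrite=>true)
incomplete = Dict(:host=>"localhost", :dbname=>"test")

missing_ok  = missing_config_keys(db_config, mysql_keys)
missing_bad = missing_config_keys(incomplete, mysql_keys)
```

An empty result means every required symbol is present. For the SQLite variant the same helper could be called with `(:db_path, :overwrite)`.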

# BioMedQuery.PubMed.save_efetch_sqlite - Method.

save_efetch_sqlite(efetch_dict, db_config, verbose)

Save the results (dictionary) of an entrez fetch to a SQLite database.

Arguments:

  • efetch_dict: Response dictionary from efetch
  • db_config::Dict{Symbol, T}: Configuration dictionary for initializing the SQLite database. Must contain symbols :db_path and :overwrite
  • verbose: Boolean to turn on extra print statements

Example

db_config =  Dict(:db_path=>"test_db.sqlite", :overwrite=>true)
db = save_efetch_sqlite(efetch_dict, db_config)

# BioMedQuery.PubMed.save_pmid_mysql - Method.

save_pmid_mysql(pmids, db_config, verbose)

Save a list of PMIDs into the input database.

Arguments:

  • pmids: Array of PMIDs
  • db_config::Dict{Symbol, T}: Configuration dictionary for initializing the MySQL database. Must contain symbols :host, :dbname, :username, :pswd, and :overwrite
  • verbose: Boolean to turn on extra print statements

# BioMedQuery.PubMed.all_mesh - Method.

all_mesh(db)

Return all MeSH descriptors stored in the input database

# BioMedQuery.PubMed.pubmed_save_efetch! - Function.

pubmed_save_efetch(efetch_dict, db_path)

Save the results (dictionary) of an entrez-pubmed fetch to the input database.
