PubMed
Utility functions to parse and store PubMed searches via BioServices.EUtils
Import Module
using BioMedQuery.PubMed
This module provides utility functions to parse, store and export queries to PubMed via the NCBI EUtils and its Julia interface BioServices.EUtils. For many purposes you may interact with the higher-level pipelines in BioMedQuery.Processes. Here, some of the lower-level functions are discussed in case you need to assemble different pipelines.
Basics of searching PubMed
We are often interested in searching PubMed for all articles related to a search term, possibly restricted by other search criteria. To do so we use BioServices.EUtils. A basic example of how we may use the functions esearch and efetch to accomplish this task is illustrated below.
using BioServices.EUtils
using XMLDict

search_term = "obstructive sleep apnea[MeSH Major Topic]"

# esearch
esearch_response = esearch(db="pubmed", term=search_term, retstart=0, retmax=20, tool="BioJulia")

# convert XML to dictionary
esearch_dict = parse_xml(String(esearch_response.data))

# convert IDs to an array of numbers
ids = [parse(Int64, id_node) for id_node in esearch_dict["IdList"]["Id"]]

# efetch
efetch_response = efetch(db="pubmed", tool="BioJulia", retmode="xml", rettype="null", id=ids)

# convert XML to dictionary
efetch_dict = parse_xml(String(efetch_response.data))
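The call above retrieves only the first 20 matches (retstart = 0, retmax = 20). To walk through a larger result set, the same search could be repeated with increasing retstart values. The following is a minimal sketch of how those offsets might be computed; page_offsets and the total of 45 records are hypothetical and not part of BioMedQuery or BioServices.

```julia
# Hypothetical helper (not part of BioMedQuery or BioServices):
# compute the retstart value for each page of a search that
# matched `total` records, requesting `retmax` records per page.
function page_offsets(total::Integer, retmax::Integer)
    retmax > 0 || throw(ArgumentError("retmax must be positive"))
    total >= 0 || throw(ArgumentError("total must be non-negative"))
    return collect(0:retmax:(total - 1))
end

# Assumed example: a search that matched 45 records, 20 per page.
offsets = page_offsets(45, 20)
```

Each offset would then be passed as retstart to esearch, with term and retmax unchanged.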
Handling XML responses
Many APIs return responses in XML form.
To parse an XML to a Julia dictionary we can use the XMLDict package
using XMLDict
dict = parse_xml(String(response.data))
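One detail worth checking when working with the parsed dictionary: as far as I understand XMLDict, an element that occurs once is returned as a single value, while repeated elements are returned as an array. If that holds, a search with exactly one hit would yield a single string under "Id" rather than an array. The sketch below normalizes both cases; as_vector and parse_pmids are hypothetical helpers, not part of XMLDict or BioMedQuery, and the PMIDs are made-up values.

```julia
# Hypothetical helpers (not part of XMLDict or BioMedQuery).
# Wrap a lone value in a vector so downstream code can always iterate.
as_vector(x::AbstractVector) = x
as_vector(x) = [x]

# Parse PMIDs whether the parsed XML held one Id or many.
parse_pmids(id_field) = [parse(Int64, String(id)) for id in as_vector(id_field)]

single = parse_pmids("29300000")                  # assumed single-hit case
several = parse_pmids(["29300000", "29300001"])   # assumed multi-hit case
```

With such a helper, the ID extraction step could read parse_pmids(esearch_dict["IdList"]["Id"]) without depending on the number of hits.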
You can save the XML string directly to a file using LightXML
using LightXML
xdoc = parse_string(String(esearch_response.data))
save_file(xdoc, "./file.xml")
Save esearch/efetch responses
Save PMIDs to MySQL
If we are only interested in saving a list of PMIDs associated with a query, we can do so as follows
dbname = "entrez_test"
config = Dict(:host=>"127.0.0.1", :dbname=>dbname, :username=>"root", :pswd=>"", :overwrite=>true)
con = PubMed.save_pmid_mysql(ids, config, false)

# get array of PMIDs stored in the database
all_pmids = BioMedQuery.PubMed.all_pmids(con)
Export efetch response as EndNote citation file
We can export the information returned by efetch as an EndNote/BibTeX library file
verbose = false
config = Dict(:type => "endnote", :output_file => "./citations_temp.endnote", :overwrite=>true)
nsuccesses = BioMedQuery.PubMed.save_article_citations(efetch_dict, config, verbose)
Save efetch response to MySQL database
Save the information returned by efetch to a MySQL database
dbname = "entrez_test"
config = Dict(:host=>"127.0.0.1", :dbname=>dbname, :username=>"root", :pswd=>"", :overwrite=>true)
@time db = BioMedQuery.PubMed.save_efetch_mysql(efetch_dict, config, verbose)
Save efetch response to SQLite database
Save the information returned by efetch to a SQLite database
verbose = false
db_path = "./test_db.db"
config = Dict(:db_path=> db_path, :overwrite=>true)
db = BioMedQuery.PubMed.save_efetch_sqlite(efetch_dict, config, verbose)
Exploring output databases
The following schema has been used to store the results. If you are interested in having this module store additional fields, feel free to open an issue
We can also explore the tables using BioMedQuery.DBUtils, e.g.
tables = ["author", "author2article", "mesh_descriptor", "mesh_qualifier", "mesh_heading"]
for t in tables
    query_str = "SELECT * FROM "*t*" LIMIT 10;"
    q = BioMedQuery.DBUtils.db_query(db, query_str)
    println(q)
end
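Because the query string above is assembled by concatenating the table name, it can help to validate the name first if it ever comes from outside the script. The following is a minimal sketch under that assumption; preview_query is a hypothetical helper and not part of BioMedQuery.DBUtils.

```julia
# Hypothetical helper (not part of BioMedQuery.DBUtils): build a preview
# query, accepting only plain identifiers as table names.
function preview_query(table::AbstractString; limit::Integer = 10)
    occursin(r"^[A-Za-z_][A-Za-z0-9_]*$", table) ||
        throw(ArgumentError("invalid table name: $table"))
    limit > 0 || throw(ArgumentError("limit must be positive"))
    return "SELECT * FROM $table LIMIT $limit;"
end

# Build preview queries for two of the tables listed above.
queries = [preview_query(t) for t in ["author", "mesh_descriptor"]]
```

Each resulting string could be passed to db_query in place of query_str in the loop above.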
Index
BioMedQuery.PubMed.abstracts
BioMedQuery.PubMed.abstracts_by_year
BioMedQuery.PubMed.all_mesh
BioMedQuery.PubMed.all_pmids
BioMedQuery.PubMed.get_article_mesh
BioMedQuery.PubMed.get_article_mesh_by_concept
BioMedQuery.PubMed.pubmed_save_efetch!
BioMedQuery.PubMed.save_efetch_mysql
BioMedQuery.PubMed.save_efetch_mysql
BioMedQuery.PubMed.save_efetch_sqlite
BioMedQuery.PubMed.save_pmid_mysql
Structs and Functions
BioMedQuery.PubMed.abstracts
— Method.
abstracts(db; local_medline=false)
Return all abstracts related to PMIDs in the database. If the local_medline flag is set to true, it is assumed that db contains an article table with only PMIDs and that all other information is available in a MEDLINE database on the same host.
BioMedQuery.PubMed.abstracts_by_year
— Method.
abstracts_by_year(db, pub_year; local_medline=false)
Return all abstracts of articles published in the given year. If the local_medline flag is set to true, it is assumed that db contains an article table with only PMIDs and that all other information is available in a MEDLINE database on the same host.
BioMedQuery.PubMed.all_pmids
— Method.
all_pmids(db)
Return all PMIDs stored in the article table of the input database
BioMedQuery.PubMed.get_article_mesh
— Method.
get_article_mesh(db, pmid)
Get all MeSH descriptors associated with a given article.
BioMedQuery.PubMed.get_article_mesh_by_concept
— Method.
get_article_mesh_by_concept(db, pmid, umls_concepts...; local_medline)
Get all MeSH descriptors associated with a given article, restricted to the given UMLS concepts.
Arguments:
- query_string: "" - assumes the full set of results was saved by BioMedQuery directly from XML
BioMedQuery.PubMed.save_efetch_mysql
— Function.
save_efetch_mysql(efetch_dict, con::MySQL.MySQLHandle, clean_efetch_tables = false, verbose=false)
Save the results (dictionary) of an entrez fetch to a MySQL database.
Arguments:
- efetch_dict: Response dictionary from efetch
- con::MySQL.MySQLHandle: Connection to MySQL database
- clean_efetch_tables: If true, all tables related to efetch results are dropped
- verbose: Boolean to turn on extra print statements
Example
db_config = Dict(:host=>"localhost", :dbname=>"test", :username=>"root", :pswd=>"", :overwrite=>true)
db = save_efetch_mysql(efetch_dict, db_config)
BioMedQuery.PubMed.save_efetch_mysql
— Method.
save_efetch_mysql(efetch_dict, db_config, verbose)
Save the results (dictionary) of an entrez fetch to a MySQL database.
Arguments:
- efetch_dict: Response dictionary from efetch
- db_config::Dict{Symbol, T}: Configuration dictionary for initializing the MySQL database. Must contain symbols :host, :dbname, :username, :pswd, and :overwrite
- verbose: Boolean to turn on extra print statements
Example
db_config = Dict(:host=>"localhost", :dbname=>"test", :username=>"root", :pswd=>"", :overwrite=>true)
db = save_efetch_mysql(efetch_dict, db_config)
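Since the configuration dictionary must contain a fixed set of symbols, it may be convenient to check it before opening a connection. The following is a minimal sketch of such a check; missing_keys and MYSQL_KEYS are hypothetical names and not part of BioMedQuery, and the incomplete dictionary is an assumed example.

```julia
# Hypothetical helper (not part of BioMedQuery): list the required
# keys that are absent from a configuration dictionary.
const MYSQL_KEYS = (:host, :dbname, :username, :pswd, :overwrite)

missing_keys(config::AbstractDict, required = MYSQL_KEYS) =
    [k for k in required if !haskey(config, k)]

# Assumed example: a configuration that omits :pswd and :overwrite.
incomplete_config = Dict(:host => "localhost", :dbname => "test", :username => "root")
absent = missing_keys(incomplete_config)
```

An empty result means every required symbol is present; the same helper could be called with (:db_path, :overwrite) for a SQLite configuration.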
BioMedQuery.PubMed.save_efetch_sqlite
— Method.
save_efetch_sqlite(efetch_dict, db_config, verbose)
Save the results (dictionary) of an entrez fetch to a SQLite database.
Arguments:
- efetch_dict: Response dictionary from efetch
- db_config::Dict{Symbol, T}: Configuration dictionary for initializing the SQLite database. Must contain symbols :db_path and :overwrite
- verbose: Boolean to turn on extra print statements
Example
db_config = Dict(:db_path=>"test_db.sqlite", :overwrite=>true)
db = save_efetch_sqlite(efetch_dict, db_config)
BioMedQuery.PubMed.save_pmid_mysql
— Method.
save_pmid_mysql(pmids, db_config, verbose)
Save a list of PMIDs into the input database.
Arguments:
- pmids: Array of PMIDs
- db_config::Dict{Symbol, T}: Configuration dictionary for initializing the MySQL database. Must contain symbols :host, :dbname, :username, :pswd, and :overwrite
- verbose: Boolean to turn on extra print statements
BioMedQuery.PubMed.all_mesh
— Method.
all_mesh(db)
Return all MeSH descriptors stored in the input database
BioMedQuery.PubMed.pubmed_save_efetch!
— Function.
pubmed_save_efetch!(efetch_dict, db_path)
Save the results (dictionary) of an entrez-pubmed fetch to the input database.