This module provides common processes/workflows that use the BioMedQuery utilities. For instance, searching PubMed requires calling the NCBI e-utils in a particular order, and after the search the results are often saved to a database. This module contains pre-assembled functions that perform all the necessary steps. To see sample scripts that use these processes, refer to the following section.
## Import
```julia
using BioMedQuery.Processes
```
## Index

* BioMedQuery.Processes.close_cons
* BioMedQuery.Processes.export_citation
* BioMedQuery.Processes.export_citation
* BioMedQuery.Processes.get_file_name
* BioMedQuery.Processes.get_ftp_con
* BioMedQuery.Processes.get_ml_file
* BioMedQuery.Processes.init_medline
* BioMedQuery.Processes.load_medline!
* BioMedQuery.Processes.map_mesh_to_umls_async
* BioMedQuery.Processes.map_mesh_to_umls_async!
* BioMedQuery.Processes.parse_ml_file
* BioMedQuery.Processes.pubmed_search_and_parse
* BioMedQuery.Processes.pubmed_search_and_save!
* BioMedQuery.Processes.umls_semantic_occurrences
* BioMedQuery.Processes.umls_semantic_occurrences
## Functions
BioMedQuery.Processes.export_citation — Function

`export_citation(pmid::Int64, citation_type, output_file, verbose)`

Export, to an output file, the citation for the PubMed article identified by the given pmid.

Arguments

* `citation_type::String` : At the moment supported types include: "endnote" and "bibtex"
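For example, a minimal sketch (the PMID, output file, and format are illustrative):

```julia
using BioMedQuery.Processes

# Export the citation for one (illustrative) PMID as a BibTeX entry
export_citation(11748933, "bibtex", "citation.bib", true)
```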
BioMedQuery.Processes.export_citation — Function

`export_citation(pmids::Vector{Int64}, citation_type, output_file, verbose)`

Export, to an output file, the citations for the collection of PubMed articles identified by the given pmids.

Arguments

* `citation_type::String` : At the moment supported types include: "endnote" and "bibtex"
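The vector method works the same way; a minimal sketch with illustrative PMIDs:

```julia
using BioMedQuery.Processes

# Export citations for several (illustrative) PMIDs into a single EndNote file
pmids = [11748933, 11700088]
export_citation(pmids, "endnote", "citations.enw", true)
```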
BioMedQuery.Processes.load_medline! — Method

`load_medline!(db_con, output_dir; start_file=1, end_file=972, year=2019, test=false)`

Given a MySQL connection and, optionally, the start and end files, fetches the MEDLINE files, parses the XML, and loads the result into a MySQL database (assumes the tables already exist). The raw (xml.gz) and parsed (csv) files are stored in output_dir.

Arguments

* `db_con` : A MySQL connection to a database (tables must already be created - see `PubMed.create_tables!`)
* `output_dir` : root directory where the raw and parsed files should be stored
* `start_file` : which MEDLINE file the loading should start at
* `end_file` : which MEDLINE file the loading should end at (default is the last file in the 2019 baseline)
* `year` : which MEDLINE year to load (current is 2019)
* `test` : if true, a sample file will be downloaded, parsed, and loaded instead of the baseline files
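A minimal sketch based on the signature above; the connection settings, database name, and output directory are placeholders, the tables are assumed to be created with `PubMed.create_tables!`, and the `MySQL.connect` form follows the older MySQL.jl API used in the BioMedQuery examples:

```julia
using MySQL
using BioMedQuery.Processes
using BioMedQuery.PubMed

# Placeholder connection settings - replace host, user, password, and database name
db_con = MySQL.connect("localhost", "root", "mysql_password", db = "medline_db")
PubMed.create_tables!(db_con)

# Trial run: download, parse, and load only the first two baseline files
load_medline!(db_con, "./medline_output"; start_file = 1, end_file = 2)

MySQL.disconnect(db_con)
```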
BioMedQuery.Processes.map_mesh_to_umls_async! — Function

`map_mesh_to_umls_async!(db, c::Credentials; timeout, append_results, verbose)`

Build (using async UMLS-API calls) and store in the given database a map from MeSH descriptors to UMLS semantic concepts. For large queries this function is faster than its synchronous counterpart.

Arguments

* `db` : Database. Must contain TABLE:mesh_descriptor. For each of the descriptors in that table, search and insert the associated semantic concepts into a new (cleared) TABLE:mesh2umls
* `user` : UMLS username
* `psswd` : UMLS password
* `append_results::Bool` : If false, a NEW and EMPTY mesh2umls database table is created
* `batch_size` : Number of
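A minimal sketch with placeholder MySQL and UMLS credentials; the call below uses the username/password form from the argument list (some versions instead take a UMLS `Credentials` object, as in the signature line):

```julia
using MySQL
using BioMedQuery.Processes

# Placeholder connection to a database that already contains a mesh_descriptor table
db = MySQL.connect("localhost", "root", "mysql_password", db = "pubmed_db")

# Placeholder UMLS account - replace with your own credentials
umls_user = "umls_username"
umls_psswd = "umls_password"

map_mesh_to_umls_async!(db, umls_user, umls_psswd; append_results = false, verbose = true)
```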
BioMedQuery.Processes.map_mesh_to_umls_async — Function

`map_mesh_to_umls_async(mesh_df, user, psswd; timeout, append_results, verbose)`

Build (using async UMLS-API calls) and return a map from MeSH descriptors to UMLS semantic concepts. For large queries this function is faster than its synchronous counterpart.

Arguments

* `mesh_df` : DataFrame containing MeSH descriptors. This is the dataframe with the key `mesh_desc` that is returned from `pubmed_search_and_parse`.
* `user` : UMLS username
* `psswd` : UMLS password
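A minimal sketch chaining this with `pubmed_search_and_parse` (the email, search term, and UMLS credentials are placeholders; the `"mesh_desc"` key follows the pubmed schema naming):

```julia
using BioMedQuery.Processes

# Fetch a small set of articles (placeholder email, illustrative search term)
dfs = pubmed_search_and_parse("your_email@example.com", "asthma[MeSH Terms]", 10)

# Map the returned MeSH descriptors to UMLS semantic types (placeholder credentials)
mesh2umls_df = map_mesh_to_umls_async(dfs["mesh_desc"], "umls_username", "umls_password")
```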
BioMedQuery.Processes.pubmed_search_and_parse — Function

`pubmed_search_and_parse(email, search_term::String, article_max, verbose=false)`

Search PubMed and parse the results into a dictionary of DataFrames. The dataframes have the same names and fields as the pubmed database schema (e.g. df_dict["basic"] returns a dataframe with the basic article info).

Arguments

* `email` : valid email address (otherwise PubMed may block you)
* `search_term` : search string to submit to PubMed, e.g. `(asthma[MeSH Terms]) AND ("2001/01/29"[Date - Publication] : "2010"[Date - Publication])`; see http://www.ncbi.nlm.nih.gov/pubmed/advanced for help constructing the string
* `article_max` : maximum number of articles to return
* `verbose` : if true, the NCBI xml response files are saved to the current directory
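A minimal sketch (the email address and search term are illustrative):

```julia
using BioMedQuery.Processes

email = "your_email@example.com"   # placeholder - use a real address
search_term = """(asthma[MeSH Terms]) AND ("2001/01/29"[Date - Publication] : "2010"[Date - Publication])"""
max_articles = 5

dfs = pubmed_search_and_parse(email, search_term, max_articles)
dfs["basic"]   # basic article information as a DataFrame
```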
BioMedQuery.Processes.pubmed_search_and_save! — Function

`pubmed_search_and_save!(email, search_term::String, article_max, conn, verbose=false)`

Search PubMed and save the results into a database. The database is expected to exist and to have the appropriate pubmed-related tables. You can create such tables using `PubMed.create_tables!(conn)`.

Arguments

* `email` : valid email address (otherwise PubMed may block you)
* `search_term` : search string to submit to PubMed, e.g. `(asthma[MeSH Terms]) AND ("2001/01/29"[Date - Publication] : "2010"[Date - Publication])`; see http://www.ncbi.nlm.nih.gov/pubmed/advanced for help constructing the string
* `article_max` : maximum number of articles to return
* `conn` : database connection
* `verbose` : if true, the NCBI xml response files are saved to the current directory
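A minimal sketch with a placeholder MySQL connection; the tables are created first, following the `PubMed.create_tables!` reference above:

```julia
using MySQL
using BioMedQuery.Processes
using BioMedQuery.PubMed

# Placeholder connection settings - replace with your own
conn = MySQL.connect("localhost", "root", "mysql_password", db = "pubmed_db")
PubMed.create_tables!(conn)

# Search PubMed and store the (illustrative) results directly in the database
pubmed_search_and_save!("your_email@example.com", "asthma[MeSH Terms]", 5, conn)

MySQL.disconnect(conn)
```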
BioMedQuery.Processes.umls_semantic_occurrences — Function

`umls_semantic_occurrences(db, umls_semantic_type)`

Return a sparse matrix indicating the presence of MeSH descriptors associated with a given UMLS semantic type in all articles of the input database.

Output

* `des_ind_dict` : Dictionary matching row number to descriptor names
* `disease_occurances` : Sparse matrix. The columns correspond to a feature vector, where each row is a MeSH descriptor. There are as many columns as articles. The occurrence/absence of a descriptor is labeled as 1/0
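A minimal sketch, assuming the database already holds articles plus the mesh2umls map built above (connection settings are placeholders; "Disease or Syndrome" is an illustrative UMLS semantic type):

```julia
using MySQL
using BioMedQuery.Processes

# Placeholder connection to a populated pubmed database
db = MySQL.connect("localhost", "root", "mysql_password", db = "pubmed_db")

des_ind_dict, disease_occurances = umls_semantic_occurrences(db, "Disease or Syndrome")
```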
BioMedQuery.Processes.umls_semantic_occurrences — Function

`umls_semantic_occurrences(dfs, mesh2umls_df, umls_semantic_type)`

Return a sparse matrix indicating the presence of MeSH descriptors associated with a given UMLS semantic type in all articles of the input dataframes.

Output

* `des_ind_dict` : Dictionary matching row number to descriptor names
* `disease_occurances` : Sparse matrix. The columns correspond to a feature vector, where each row is a MeSH descriptor. There are as many columns as articles. The occurrence/absence of a descriptor is labeled as 1/0
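The dataframe method fits the in-memory workflow; a minimal sketch reusing the placeholder calls from the examples above:

```julia
using BioMedQuery.Processes

# Search PubMed and map MeSH descriptors to UMLS types (placeholder email and credentials)
dfs = pubmed_search_and_parse("your_email@example.com", "asthma[MeSH Terms]", 10)
mesh2umls_df = map_mesh_to_umls_async(dfs["mesh_desc"], "umls_username", "umls_password")

des_ind_dict, disease_occurances = umls_semantic_occurrences(dfs, mesh2umls_df, "Disease or Syndrome")
```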
BioMedQuery.Processes.close_cons — Method

`close_cons(ftp_con)`

Closes the FTP connection and cleans up.
BioMedQuery.Processes.get_file_name — Function

`get_file_name(fnum::Int, year::Int = 2018, test = false)`

Returns the MEDLINE file name given the file number and year.
BioMedQuery.Processes.get_ftp_con — Function

`get_ftp_con(test = false)`

Get an FTP connection.
BioMedQuery.Processes.get_ml_file — Function

`get_ml_file(fname::String, conn::ConnContext, output_dir)`

Retrieves the file with the given fname and puts it in medline/raw_files. Returns the HTTP response.
BioMedQuery.Processes.init_medline — Function

`init_medline(output_dir, test=false)`

Sets up the environment (folders), connects to the MEDLINE FTP server, and returns the connection.
BioMedQuery.Processes.parse_ml_file — Method

`parse_ml_file(fname::String, output_dir::String)`

Parses the MEDLINE XML file into a dictionary of dataframes and saves the resulting CSV files to medline/parsed_files.
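Taken together, these low-level helpers can be chained by hand when only a few files are needed; a minimal sketch (file number, year, and output directory are illustrative):

```julia
using BioMedQuery.Processes

output_dir = "./medline_output"               # illustrative root directory

conn = init_medline(output_dir)               # create the folders and open the FTP connection
fname = get_file_name(1, 2018)                # name of the first baseline file for the given year
resp = get_ml_file(fname, conn, output_dir)   # download the raw xml.gz into medline/raw_files
dfs = parse_ml_file(fname, output_dir)        # parse the XML into dataframes and write CSVs
close_cons(conn)                              # close the FTP connection
```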