This module provides common processes/workflows for using the BioMedQuery utilities. For instance, searching PubMed requires calling the NCBI E-utils in a particular order, and after the search the results are often saved to a database. This module contains pre-assembled functions that perform all the necessary steps. To see sample scripts that use these processes, refer to the following section.
## Import
`using BioMedQuery.Processes`

## Index
- `BioMedQuery.Processes.close_cons`
- `BioMedQuery.Processes.export_citation`
- `BioMedQuery.Processes.get_file_name`
- `BioMedQuery.Processes.get_ftp_con`
- `BioMedQuery.Processes.get_ml_file`
- `BioMedQuery.Processes.init_medline`
- `BioMedQuery.Processes.load_medline!`
- `BioMedQuery.Processes.map_mesh_to_umls_async`
- `BioMedQuery.Processes.map_mesh_to_umls_async!`
- `BioMedQuery.Processes.parse_ml_file`
- `BioMedQuery.Processes.pubmed_search_and_parse`
- `BioMedQuery.Processes.pubmed_search_and_save!`
- `BioMedQuery.Processes.umls_semantic_occurrences`
## Functions
BioMedQuery.Processes.export_citation — Function

`export_citation(pmid::Int64, citation_type, output_file, verbose)`

Export, to an output file, the citation for the PubMed article identified by the given pmid.
Arguments
- citation_type::String: currently supported types include "endnote" and "bibtex"
BioMedQuery.Processes.export_citation — Function

`export_citation(pmids::Vector{Int64}, citation_type, output_file, verbose)`

Export, to an output file, the citations for the collection of PubMed articles identified by the given pmids.
Arguments
- citation_type::String: currently supported types include "endnote" and "bibtex"
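A minimal usage sketch for both methods; the PMIDs, output file names, and the value of the positional `verbose` flag below are illustrative assumptions, not values from this documentation:

```julia
using BioMedQuery.Processes

# Hypothetical PMIDs and output files -- substitute your own
export_citation(11056410, "bibtex", "single_citation.bib", true)
export_citation([11056410, 11056411], "endnote", "citations.enw", true)
```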
BioMedQuery.Processes.load_medline! — Method

`load_medline!(db_con, output_dir; start_file=1, end_file=972, year=2019, test=false)`

Given a MySQL connection and, optionally, the start and end files, fetches the Medline files, parses the XML, and loads them into a MySQL DB (assumes the tables already exist). The raw (xml.gz) and parsed (csv) files are stored in output_dir.
Arguments
- db_con: a MySQL connection to a DB (tables must already be created; see PubMed.create_tables!)
- output_dir: root directory where the raw and parsed files should be stored
- start_file: which Medline file the loading should start at
- end_file: which Medline file the loading should end at (default is the last file in the 2018 baseline)
- year: which Medline year (current is 2018)
- test: if true, a sample file will be downloaded, parsed, and loaded instead of the baseline files
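A hedged sketch of an end-to-end load, assuming MySQL.jl's `MySQL.connect` for the connection and placeholder local credentials:

```julia
using MySQL
using BioMedQuery.Processes
using BioMedQuery.PubMed

# Placeholder credentials -- point these at your own MySQL server
db_con = MySQL.connect("localhost", "medline_user", "medline_pwd", db = "medline")

PubMed.create_tables!(db_con)                           # tables must exist before loading
load_medline!(db_con, "./medline_output"; test = true)  # test=true loads only a sample file
```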
BioMedQuery.Processes.map_mesh_to_umls_async! — Function

`map_mesh_to_umls_async!(db, c::Credentials; timeout, append_results, verbose)`

Build (using async UMLS-API calls) and store in the given database a map from MeSH descriptors to UMLS semantic concepts. For large queries this function will be faster than its synchronous counterpart.
Arguments
- db: database. Must contain TABLE: mesh_descriptor. For each of the descriptors in that table, search and insert the associated semantic concepts into a new (cleared) TABLE: mesh2umls
- user: UMLS username
- psswd: UMLS password
- append_results::Bool: if false, a NEW and EMPTY mesh2umls database table is created
- batch_size: Number of
BioMedQuery.Processes.map_mesh_to_umls_async — Function

`map_mesh_to_umls_async(mesh_df, user, psswd; timeout, append_results, verbose)`

Build (using async UMLS-API calls) and return a map from MeSH descriptors to UMLS semantic concepts. For large queries this function will be faster than its synchronous counterpart.
Arguments
- mesh_df: DataFrame containing MeSH descriptors. This is the dataframe with the key `mesh_desc` that is returned from pubmed_search_and_parse
- user: UMLS username
- psswd: UMLS Password
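A minimal sketch combining this function with `pubmed_search_and_parse` (documented below). The email, search term, and UMLS credentials are placeholders, and the `mesh_desc` dictionary key is assumed from the mesh_df argument description above:

```julia
using BioMedQuery.Processes

# Placeholder email, search term, and UMLS credentials
dfs = pubmed_search_and_parse("my_email@example.com", "asthma[MeSH Terms]", 20)

# Map the returned MeSH descriptors to UMLS semantic concepts
mesh2umls_df = map_mesh_to_umls_async(dfs["mesh_desc"], "umls_user", "umls_password")
```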
BioMedQuery.Processes.pubmed_search_and_parse — Function

`pubmed_search_and_parse(email, search_term::String, article_max, verbose=false)`

Search PubMed and parse the results into a dictionary of DataFrames. The dataframes have the same names and fields as the pubmed database schema (e.g. df_dict["basic"] returns a dataframe with the basic article info).
Arguments
- email: valid email address (otherwise PubMed may block you)
- search_term: search string to submit to PubMed, e.g. (asthma[MeSH Terms]) AND ("2001/01/29"[Date - Publication] : "2010"[Date - Publication]); see http://www.ncbi.nlm.nih.gov/pubmed/advanced for help constructing the string
- article_max: maximum number of articles to return
- verbose: if true, the NCBI XML response files are saved to the current directory
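A minimal usage sketch; the email address, search term, article limit, and the `mesh_desc` key are placeholders or assumptions:

```julia
using BioMedQuery.Processes

# Placeholder email and search term
dfs = pubmed_search_and_parse("my_email@example.com",
    "(asthma[MeSH Terms]) AND (\"2001/01/29\"[Date - Publication] : \"2010\"[Date - Publication])",
    50)

dfs["basic"]      # basic article info
dfs["mesh_desc"]  # MeSH descriptors, as referenced by map_mesh_to_umls_async above
```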
BioMedQuery.Processes.pubmed_search_and_save! — Function

`pubmed_search_and_save!(email, search_term::String, article_max, conn, verbose=false)`

Search PubMed and save the results into a database connection. The database is expected to exist and have the appropriate pubmed-related tables. You can create such tables using PubMed.create_tables!(conn).
Arguments
- email: valid email address (otherwise PubMed may block you)
- search_term: search string to submit to PubMed, e.g. (asthma[MeSH Terms]) AND ("2001/01/29"[Date - Publication] : "2010"[Date - Publication]); see http://www.ncbi.nlm.nih.gov/pubmed/advanced for help constructing the string
- article_max: maximum number of articles to return
- conn: database connection
- verbose: if true, the NCBI XML response files are saved to the current directory
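A hedged end-to-end sketch, again assuming MySQL.jl's `MySQL.connect` with placeholder credentials and a placeholder search term:

```julia
using MySQL
using BioMedQuery.Processes
using BioMedQuery.PubMed

# Placeholder credentials and database name
conn = MySQL.connect("localhost", "pubmed_user", "pubmed_pwd", db = "pubmed_demo")
PubMed.create_tables!(conn)   # create the expected pubmed-related tables

# Placeholder email and search term
pubmed_search_and_save!("my_email@example.com", "asthma[MeSH Terms]", 100, conn)
```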
BioMedQuery.Processes.umls_semantic_occurrences — Function

`umls_semantic_occurrences(db, umls_semantic_type)`

Return a sparse matrix indicating the presence of MeSH descriptors associated with a given UMLS semantic type in all articles of the input database.
Output
- des_ind_dict: dictionary matching row number to descriptor names
- disease_occurances: sparse matrix. The columns correspond to feature vectors, where each row is a MeSH descriptor; there are as many columns as articles. The occurrence/absence of a descriptor is labeled as 1/0
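A brief sketch against a database previously populated by `pubmed_search_and_save!` and `map_mesh_to_umls_async!`; the connection details and the semantic type string are placeholders:

```julia
using MySQL
using BioMedQuery.Processes

# Placeholder connection to a previously populated database
db = MySQL.connect("localhost", "pubmed_user", "pubmed_pwd", db = "pubmed_demo")

des_ind_dict, disease_occurances = umls_semantic_occurrences(db, "Disease or Syndrome")
```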
BioMedQuery.Processes.umls_semantic_occurrences — Function

`umls_semantic_occurrences(dfs, mesh2umls_df, umls_semantic_type)`

Return a sparse matrix indicating the presence of MeSH descriptors associated with a given UMLS semantic type in all articles of the input dataframes.
Output
- des_ind_dict: dictionary matching row number to descriptor names
- disease_occurances: sparse matrix. The columns correspond to feature vectors, where each row is a MeSH descriptor; there are as many columns as articles. The occurrence/absence of a descriptor is labeled as 1/0
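An in-memory sketch of this DataFrame-based variant, chaining the functions documented above; the email, search term, UMLS credentials, and the `mesh_desc` key are placeholders or assumptions:

```julia
using BioMedQuery.Processes

# Placeholder email, search term, and UMLS credentials
dfs = pubmed_search_and_parse("my_email@example.com", "asthma[MeSH Terms]", 20)
mesh2umls_df = map_mesh_to_umls_async(dfs["mesh_desc"], "umls_user", "umls_password")

des_ind_dict, disease_occurances =
    umls_semantic_occurrences(dfs, mesh2umls_df, "Disease or Syndrome")
```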
BioMedQuery.Processes.close_cons — Method

`close_cons(ftp_con)`

Closes the connection and cleans up.
BioMedQuery.Processes.get_file_name — Function

`get_file_name(fnum::Int, year::Int = 2018, test = false)`

Returns the Medline file name given the file number and year.
BioMedQuery.Processes.get_ftp_con — Function

`get_ftp_con(test = false)`

Gets an FTP connection.
BioMedQuery.Processes.get_ml_file — Function

`get_ml_file(fname::String, conn::ConnContext, output_dir)`

Retrieves the file with fname and puts it in medline/raw_files. Returns the HTTP response.
BioMedQuery.Processes.init_medline — Function

`init_medline(output_dir, test=false)`

Sets up the environment (folders), connects to the Medline FTP server, and returns the connection.
BioMedQuery.Processes.parse_ml_file — Method

`parse_ml_file(fname::String, output_dir::String)`

Parses the Medline XML file into a dictionary of dataframes. Saves the resulting CSV files to medline/parsed_files.
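A hedged sketch of how these lower-level helpers chain together (this is roughly what `load_medline!` automates, minus the database load). The output directory and file number are placeholders, and treating `test` as a positional argument is an assumption based on the signatures shown above:

```julia
using BioMedQuery.Processes

# Placeholder output directory; test=true points at the sample file
output_dir = "./medline_demo"

ftp_con = init_medline(output_dir, true)   # set up folders and open the FTP connection
fname   = get_file_name(1, 2018, true)     # name of the first (sample) Medline file
get_ml_file(fname, ftp_con, output_dir)    # download into medline/raw_files
dfs = parse_ml_file(fname, output_dir)     # parse the XML, write CSVs to medline/parsed_files
close_cons(ftp_con)                        # clean up the connection
```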