|
EDA_trainingData()
|
Performs a simple visualization of multilabel training data using the mldr package |
|
PerfVisMultilabel()
|
Make a performance plot for multilabel classification |
|
QA.AuthKeywords()
|
Quality analysis of the retrieved author keywords |
|
QA_EndNoteIdCorpusLDA()
|
Performs a quick QA on both sources for document ID |
|
QA_alignedData()
|
Perform QA/QC on aligned data |
|
QA_oldXnewPredictions()
|
Performs a quick quality analysis and comparison against historical predictions |
|
VizSpots()
|
Produces a chord diagram visualization with country cluster colors and topic categories |
|
add_abstractsToCorpus()
|
Add retrieved abstracts to the corpus |
|
align_dataWithEndNoteIdLDA()
|
Align databases based on shared ID |
|
align_dataWithEndNoteIdcorpus()
|
Align databases based on shared ID |
|
align_englishCorpus()
|
Align englishCorpus with matching records in both databases and assign webscraped abstracts |
|
align_humanReadingTopicModel()
|
Identifies the subset of papers with validation data and aligns the databases |
|
article_selection()
|
Randomly select articles for human-reading |
|
assign_articles_to_players()
|
Create a .csv file with the information needed to download the documents |
|
assign_articles_to_readers()
|
Split the selected documents among a given number of human readers and copy them into a given directory |
|
check_duplicate_row()
|
Return the unique rows of a data.frame. |
|
check_duplicate_title()
|
Return a data.frame with unique Titles. |
|
clStab()
|
Evaluates the cluster stability |
|
consolidate_LDA_results()
|
Consolidates LDA results by adding year and country predictions to the topicDocs |
|
count_nas()
|
Count the number of NAs |
|
diversity_LAC()
|
Calculates a diversity over the entire LAC region |
|
diversity_country()
|
Calculates the diversity |
|
filter_by_country()
|
Filter rows of a data.frame by country |
|
filter_columns()
|
Filter columns of a data.frame |
|
filter_dfm()
|
Filter the complete document-feature matrix to retain all features with occurrence higher than the lowest occurrence of country tokens.
This function mainly serves to limit the size of the document-feature matrix |
|
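The thresholding rule behind `filter_dfm()` can be sketched in base R on a plain count matrix. This is a hedged illustration, not the package code: `filter_dfm_sketch` and the toy matrix are made up for the example, and `>=` is used so that the country tokens themselves survive the filter.

```r
# Hedged sketch (not the package implementation): keep every feature whose
# total count is at least the lowest total count among the country tokens.
filter_dfm_sketch <- function(dfm, country_tokens) {
  totals <- colSums(dfm)
  threshold <- min(totals[colnames(dfm) %in% country_tokens])
  dfm[, totals >= threshold, drop = FALSE]
}

# Toy 2-document x 4-feature matrix; "brazil" stands in for a country token
m <- matrix(c(3, 2, 1, 0, 0, 0, 4, 4), nrow = 2,
            dimnames = list(NULL, c("forest", "brazil", "rare", "water")))
filtered <- filter_dfm_sketch(m, "brazil")  # drops the rarer "rare" column
```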
fix_names()
|
Fix the format of the document names from an existing database |
|
format_data4get_ps()
|
Format the data to the format expected by get_ps |
|
generate_label_df()
|
Generate a data.frame of labels for a violin plot |
|
get_DocTermMatrix()
|
Read the document-term matrix created by the text mining code |
|
get_EndNoteIdLDA()
|
Get document IDs from the LDA corpus database |
|
get_EndNoteIdcorpus()
|
Get document IDs from the EndNote query corpus database |
|
get_JSd_corpus()
|
Calculates the Jensen-Shannon distance for countries |
|
get_JSd_country()
|
Calculates the Jensen-Shannon distance for countries |
|
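Both `get_JSd_*()` helpers rely on the Jensen-Shannon distance, i.e. the square root of the Jensen-Shannon divergence between two probability distributions. A minimal base-R version for illustration (`js_distance` is not a package function):

```r
# Jensen-Shannon distance between two probability vectors p and q.
# With log base 2 the distance is bounded in [0, 1].
js_distance <- function(p, q, base = 2) {
  p <- p / sum(p)
  q <- q / sum(q)
  m <- (p + q) / 2
  kl <- function(a, b) sum(ifelse(a > 0, a * log(a / b, base = base), 0))
  sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))
}

js_distance(c(1, 0), c(0, 1))          # disjoint distributions: distance 1
js_distance(c(0.5, 0.5), c(0.5, 0.5))  # identical distributions: distance 0
```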
get_MLDR()
|
Internal function to get the MLDR object |
|
get_allAuthKeywords()
|
Extracts all author keywords from the metadata results |
|
get_allMetadata()
|
Extract metadata from Scopus or Web of Science identifiers |
|
get_binaryRelevanceLearners()
|
Convenience legacy function to create binary relevance wrappers from MLR |
|
get_boolean_AuthKeywords()
|
Transform the author keywords into a multilabel dataset |
|
get_chainingOrder()
|
Get chaining order from MLDR |
|
get_citation_dataframe()
|
Read .csv files to create a citation data.frame |
|
get_countries()
|
Extract the list of possible countries |
|
get_country_distance()
|
Get the Jensen-Shannon distance across countries |
|
get_csv_files()
|
List .csv files in a directory |
|
get_dfm()
|
Retrieve or create a document-feature matrix (dfm) from hard-coded options relevant to the current project |
|
get_dtm()
|
Get document-term matrix from a document-feature matrix and a list of tokens |
|
get_endnote_titles()
|
Retrieve the titles of the documents in the English corpus |
|
get_endnote_xml()
|
Parse the .xml database from EndNote |
|
get_ind_hasCountryTag()
|
Identifies whether a document has a country tag |
|
get_language()
|
Interactive prompt to select language |
|
get_language_dfs()
|
Read the citation data frames and store them in a named list |
|
get_mail()
|
Extract email addresses from PDF documents
You probably want to execute this code on a Linux server to avoid issues with special character handling on Windows |
|
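The email-extraction step can be sketched with a regular expression over already-extracted text. This is a hedged sketch: `extract_emails` is hypothetical, and the package function additionally handles reading the text out of the PDFs.

```r
# Pull unique email addresses out of character vectors with a simple regex.
extract_emails <- function(text) {
  pattern <- "[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}"
  unique(unlist(regmatches(text, gregexpr(pattern, text))))
}

extract_emails("Contact jane.doe@example.org or bob@lab.edu.")
```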
get_meta_df()
|
Binds the separate language data.frames into a meta data.frame |
|
get_n_players()
|
Prompt to get the number of players |
|
get_network()
|
Creates the adjacency matrix of a bipartite network: countries to topics |
|
get_non_duplicate_pdfs()
|
Get the indices of duplicated documents |
|
get_optimk()
|
Display the optimum number of clusters for a given clustering method |
|
get_pdf_files()
|
Get the list of files in a given directory |
|
get_ps()
|
Get the parameter set to tune for a given learner |
|
get_relevantCountries()
|
Extract country names to be searched for in the author keywords |
|
get_rootdir()
|
Convenience function to allow for inter-operability between different systems |
|
get_samples()
|
Select the documents for downloading while correcting for bias in terms of year and sources |
|
get_scopusAbstract()
|
Parse the HTML page from Scopus to retrieve the abstract of a document |
|
get_short_long_term_pred()
|
Legacy function to assess the performance of a learner trained at the fine temporal scale against the aggregated temporal scale |
|
get_titleDocs()
|
Read topic model file titles |
|
get_titleInd()
|
Performs the cross-walk between the human reading and topic model databases using document titles |
|
get_topColors()
|
Get the dominant colors from country flags
The country flags are retrieved from http://hjnilsson.github.io/country-flags/ |
|
get_topicDocs()
|
Read topic model data |
|
get_topic_distance()
|
Get the Jensen-Shannon distance across topics |
|
get_tuningPar()
|
Retrieve the most frequent best-tuned hyper-parameters
Ties are broken by taking the minimum |
|
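The tie-breaking rule described above (most frequent value, minimum on ties) can be sketched for a numeric hyper-parameter as follows; this is illustrative only, and `most_frequent_min` is not a package function:

```r
# Most frequent value of a numeric vector; ties are broken by the minimum.
most_frequent_min <- function(x) {
  tab <- table(x)
  winners <- as.numeric(names(tab)[tab == max(tab)])
  min(winners)
}

most_frequent_min(c(10, 10, 500, 500, 1000))  # 10 and 500 tie; returns 10
```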
get_validationHumanReading()
|
Read human reading database |
|
get_webscrapped_trainingLabels()
|
Read webscraped training labels |
|
get_webscrapped_validationDTM()
|
Read training data (document-term matrix corresponding to the webscraped labels) |
|
get_wosAbstract()
|
Use the Elsevier API to retrieve the abstract of a document |
|
get_wosAuthKeywords()
|
Use the Elsevier API to retrieve the author keywords of a document |
|
get_wosFullResult()
|
Use DOI to extract metadata |
|
get_wrappedLearnersList()
|
Create a list of wrapped learners |
|
join_database_shapefile()
|
Assign the data from the country database to the country shapefile |
|
make_AUCPlot()
|
Create a comparison plot of AUC |
|
make_corrmatrix()
|
Make a correlation matrix |
|
make_countryNetwork()
|
Creates a weighted adjacency matrix of probability of citation between countries |
|
make_country_tokens()
|
Tokenize labels, here a list of relevant countries |
|
make_dendrogram()
|
Make a dendrogram |
|
make_df_docs()
|
Subsets the LDA topics data.frame into a theme data.frame |
|
make_hist()
|
Makes a density plot of a selected attribute |
|
make_humanReadingTrainingLabels()
|
Create training labels from aligned human reading database |
|
make_map()
|
Makes a map for a given attribute |
|
make_predictions()
|
Make randomForest predictions |
|
make_pretty_str()
|
This function takes care of some formatting issues that appeared when aligning the databases coming from the query and from EndNote.
The function removes alphanumeric characters and some special characters, trims whitespace, and concatenates the strings. |
|
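That kind of string normalisation might look like the following in base R. This is a loose sketch under my own assumptions about which characters are stripped and about case-folding; `make_pretty_str_sketch` is hypothetical, not the package function:

```r
# Normalise a character vector: strip non-alphanumeric characters,
# trim whitespace, lower-case, and concatenate into one string.
make_pretty_str_sketch <- function(x) {
  x <- gsub("[^[:alnum:] ]", "", x)  # keep only letters, digits, and spaces
  x <- trimws(tolower(x))            # trim and normalise case
  paste(x, collapse = " ")
}

make_pretty_str_sketch(c("  Costa-Rica:", "biodiversity! "))
```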
make_targetData()
|
Create the target data to predict from |
|
make_task()
|
Make an MLR task |
|
make_topicNetwork()
|
Creates a weighted adjacency matrix of probability of citation between topics |
|
make_trainingData()
|
Create training data for a multilabel classification |
|
make_trainingDataMulticlass()
|
Create training data for a multiclass classification |
|
make_webscrapped_trainingData()
|
Create a data.frame of multilabels using the webscraped author keywords |
|
melt_df_country()
|
Manually melt a data.frame |
|
multiclassBenchmark()
|
Perform a benchmark between non-tuned algorithm adaptation methods, multilabel wrappers, and binary relevance wrappers |
|
multilabelBenchmark()
|
Perform a benchmark between algorithm adaptation methods, multilabel wrappers, and binary relevance wrappers |
|
normalize_adj_matrix()
|
Normalize an adjacency matrix by its rows (i.e., "from") |
|
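Row normalisation of an adjacency matrix is a one-liner with `sweep()`. The sketch below (not the package code; `normalize_rows` is made up for the example) also guards against all-zero rows:

```r
# Divide each row of A by its row sum so that rows become
# outgoing ("from") probability distributions.
normalize_rows <- function(A) {
  rs <- rowSums(A)
  rs[rs == 0] <- 1  # leave all-zero rows untouched instead of dividing by 0
  sweep(A, 1, rs, "/")
}

A <- matrix(c(1, 3, 1, 1), nrow = 2)  # rows: (1, 1) and (3, 1)
normalize_rows(A)                     # rows: (0.5, 0.5) and (0.75, 0.25)
```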
order_data()
|
Order data to LDA order |
|
plot_df_country()
|
Plot each variable in a melted data.frame in a faceted ggplot |
|
plot_estimates()
|
Plot a time-estimate matrix for different numbers of downloaders |
|
print_estimate()
|
Print the time estimate for manual downloading |
|
query_QA_plots()
|
Make QA/QC plots for a given data.frame by comparing the corpus present in the query with the one actually collected |
|
read_citation_dataframe()
|
Read the citation data.frame exported by the query processing |
|
read_countries_database()
|
Read the country database file |
|
read_countries_shapefile()
|
Read the country shapefile |
|
read_survey_df()
|
Read the survey data |
|
reduce_docs_for_JSd()
|
Reduce the documents before calculating the Jensen-Shannon distance |
|
remove_country()
|
Remove country column |
|
remove_irrelevant()
|
Removes "Irrelevant" entries from the country column of a data.frame |
|
remove_year()
|
Remove year column |
|
remove_year_country()
|
Remove year and country columns |
|
rgb2hex()
|
Convert RGB to hex colors |
|
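Converting RGB triplets to hex strings needs no dependency; `sprintf()` does it directly, and `grDevices::rgb(..., maxColorValue = 255)` is the built-in alternative. The sketch below is illustrative, not the package code:

```r
# Format 0-255 RGB components as a "#RRGGBB" hex colour string.
rgb2hex_sketch <- function(r, g, b) sprintf("#%02X%02X%02X", r, g, b)

rgb2hex_sketch(255, 0, 128)  # "#FF0080"
```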
select_data()
|
Select the type of data to display |
|
select_list()
|
Convenience function for the shinyApp |
|
source_plot()
|
Produce a barplot of the different sources present in the query and in the collected corpus |
|
transform_DTM()
|
This function formats the DTM to be used by the ML models.
In particular, it merges terms corresponding to one country, e.g., "costa" and "rica". |
|
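One plausible way to merge a multi-word country term in a document-term matrix is to sum the component columns into a single combined column. This is an assumption for illustration only; the package may instead take, e.g., the minimum of the component counts, and `merge_terms` is hypothetical:

```r
# Replace the columns named in `parts` with one summed column named `merged`.
merge_terms <- function(dtm, parts, merged) {
  combined <- rowSums(dtm[, parts, drop = FALSE])
  out <- cbind(dtm[, !(colnames(dtm) %in% parts), drop = FALSE], combined)
  colnames(out)[ncol(out)] <- merged
  out
}

dtm <- matrix(c(1, 0, 1, 0, 2, 3), nrow = 2,
              dimnames = list(NULL, c("costa", "rica", "forest")))
dtm_merged <- merge_terms(dtm, c("costa", "rica"), "costa_rica")
```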
transform_data()
|
Transform the data with centering, scaling and Box-Cox transformations |
|
update_database()
|
Update the database by marking the manually downloaded articles as in corpus |
|
write_citation_dataframe()
|
Save a data.frame. |