Create a data.frame of multilabels from the web-scraped author keywords

make_webscrapped_trainingData(
  boolean_AuthKeywords,
  ind_hasCountryTag,
  englishCorpus,
  englishCorpus_file
)

Arguments

boolean_AuthKeywords

data.frame of multilabels built from the web-scraped author keywords

ind_hasCountryTag

list of logical values indicating whether each entry has at least one label

englishCorpus

database (corpus) of documents with abstracts

englishCorpus_file

file path to the complete corpus

Value

list with 3 elements: country_tokens, the tokenized country labels; webscrapped_validationDTM, a document-term matrix derived from the tokenized country labels; webscrapped_trainingLabels, the web-scraped multilabels
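
The argument descriptions above can be illustrated with a small sketch of plausible input shapes. Everything in it is an assumption made for illustration: the country column names, the toy abstracts, the `Abstract` column, and the file path are invented, and deriving `ind_hasCountryTag` with `rowSums()` is only one way such an indicator could be built. The call itself is left commented out because it requires the package and a real corpus file.

```r
# Toy multilabel table: one row per document, one logical column per label.
# Column names and values are invented for illustration.
boolean_AuthKeywords <- data.frame(
  Brazil = c(TRUE, FALSE, FALSE),
  Kenya  = c(FALSE, FALSE, TRUE),
  Nepal  = c(TRUE, FALSE, FALSE)
)

# Assumed derivation: an entry has a country tag if any of its labels is TRUE.
ind_hasCountryTag <- unname(rowSums(boolean_AuthKeywords) > 0)

# Hypothetical corpus with one abstract per document (column names assumed).
englishCorpus <- data.frame(
  id = 1:3,
  Abstract = c(
    "River restoration in Brazil and Nepal.",
    "A general review of sediment transport.",
    "Groundwater monitoring in Kenya."
  ),
  stringsAsFactors = FALSE
)

# Hypothetical path to the complete corpus file.
englishCorpus_file <- "path/to/englishCorpus.csv"

# With the package loaded and a real corpus file, the call would look like:
# res <- make_webscrapped_trainingData(
#   boolean_AuthKeywords,
#   ind_hasCountryTag,
#   englishCorpus,
#   englishCorpus_file
# )
# names(res)
```

In this toy input the second document carries no country label, so its entry in `ind_hasCountryTag` is `FALSE`.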