make_webscrapped_trainingData
Create a data.frame of multilabels from the web-scraped author keywords.
make_webscrapped_trainingData(boolean_AuthKeywords, ind_hasCountryTag, englishCorpus, englishCorpus_file)
| Argument | Description |
|---|---|
| boolean_AuthKeywords | data.frame of multilabels built from the web-scraped author keywords |
| ind_hasCountryTag | list of booleans indicating whether each entry has at least one label |
| englishCorpus | database holding the corpus of documents with abstracts |
| englishCorpus_file | file path to the complete corpus |
A list with 3 elements: country_tokens, the tokenized country labels; webscrapped_validationDTM, a document-term matrix derived from the tokenized country labels; and webscrapped_trainingLabels, the web-scraped multilabels.
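
The first two arguments can be illustrated with a minimal base R sketch. It does not call make_webscrapped_trainingData itself, and it makes several assumptions: the country columns and their values are invented toy data, the multilabel table is assumed to hold one logical column per label, and ind_hasCountryTag is represented as a logical vector derived from the table with rowSums, which may differ from how the package actually builds it.

```r
# Toy multilabel table: one row per document, one logical column per label.
# Column names and values are made up for illustration only.
boolean_AuthKeywords <- data.frame(
  Brazil = c(TRUE, FALSE, FALSE),
  Chile  = c(TRUE, FALSE, TRUE),
  Peru   = c(FALSE, FALSE, FALSE)
)

# Assumed derivation: TRUE where a document carries at least one label.
ind_hasCountryTag <- rowSums(boolean_AuthKeywords) > 0

# Keep only the documents that have at least one label.
training_labels <- boolean_AuthKeywords[ind_hasCountryTag, , drop = FALSE]

print(ind_hasCountryTag)
print(nrow(training_labels))
```

In this toy case the second document has no label, so it is the only row dropped from training_labels.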