make_webscrapped_trainingData.Rd
Create a data.frame of multilabels from the web-scraped author keywords.
make_webscrapped_trainingData( boolean_AuthKeywords, ind_hasCountryTag, englishCorpus, englishCorpus_file )
Argument | Description |
---|---|
boolean_AuthKeywords | data.frame of multilabels built from the web-scraped author keywords |
ind_hasCountryTag | list of booleans indicating whether each entry has at least one label |
englishCorpus | database of corpus documents with abstracts |
englishCorpus_file | file path to the complete corpus |
A list with 3 elements: country_tokens (the tokenized country labels), webscrapped_validationDTM (a document-term matrix derived from the tokenized country labels), and webscrapped_trainingLabels (the web-scraped multilabels).
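
The argument shapes described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the column names, the toy values, the corpus layout, and the file path are hypothetical, and deriving ind_hasCountryTag with rowSums() is only one plausible way to obtain it. The call itself is guarded so the snippet runs even when the package that defines the function is not loaded; with real data, substitute your own objects.

```r
# Toy multilabel table: one row per document, one logical column per label.
# The column names are illustrative assumptions.
boolean_AuthKeywords <- data.frame(
  France = c(TRUE, FALSE, FALSE),
  Brazil = c(FALSE, FALSE, TRUE)
)

# TRUE where a document carries at least one label (assumed derivation).
ind_hasCountryTag <- rowSums(boolean_AuthKeywords) > 0

# Hypothetical corpus layout: one row per document, with an abstract column.
englishCorpus <- data.frame(
  id = 1:3,
  abstract = c(
    "River restoration in France.",
    "A general review of sediment transport.",
    "Floodplain dynamics in Brazil."
  ),
  stringsAsFactors = FALSE
)

# Placeholder path; the expected file format is not documented here.
englishCorpus_file <- file.path(tempdir(), "english_corpus.csv")

# Call the function only if it is available in the current session.
if (exists("make_webscrapped_trainingData")) {
  res <- make_webscrapped_trainingData(
    boolean_AuthKeywords, ind_hasCountryTag, englishCorpus, englishCorpus_file
  )
  # The three documented elements of the returned list.
  str(res$country_tokens)
  str(res$webscrapped_validationDTM)
  str(res$webscrapped_trainingLabels)
}
```

The toy inputs only show the expected shapes; they are not guaranteed to be sufficient for the function to run to completion.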