bipartite_networks.Rmd
This vignette clarifies the calculation of the citation network. Unfortunately, this vignette requires a large file containing the entire data of the corpus which cannot be distributed with the package. For that reason, the code of this vignette is not executed.
The citation network was extracted from article metadata which stores identifiers for the cited articles and citing article (i.e. the article to which the metadata are attached to). A total of 29,900 citations were found between 4,603 unique articles of the English corpus. The resulting bibliographic network is defined by its \(4,603\times4,603\)-adjacency matrix \(B\) and is filtered to remove edges between node external to the collected corpus. The citation network between countries, \(B_C\) is obtained by:
\[ B_C = C^T \times B \times C \]
where \(C\) is a \(4,603\times23\)-matrix containing the output of the machine learning probabilistic predictions of the location of study between \(23\) possible country of study. In consequence, \(B_C\) is a weighted directed adjacency matrix. Similar reasoning leads to the citation network between specific research topics, method topics and water budget topics by changing the matrix \(C\) in the previous equation for the LDA probabilistic predictions of topics.
library(wateReview)
englishCorpus_file <- "F:/hguillon/research/exploitation/R/latin_america/data/english_corpus.Rds"
englishCorpus <- readRDS(englishCorpus_file)
in_corpus <- readRDS(system.file("extdata", "in_corpus.Rds", package = "wateReview"))
source_ids <- readRDS(system.file("extdata", "source_ids.Rds", package = "wateReview")) # aligned with corpus
citation_network <- readRDS(system.file("extdata", "citingDf.Rds", package = "wateReview"))
The following code chunks perform a cross-walk between the topic model and EndNote databases using document id: EndNoteIdcorpus
and EndNoteIdLDA
. This ensures that the two databases can communicate and are aligned.
EndNoteIdcorpus <- get_EndNoteIdcorpus(in_corpus)
EndNoteIdLDA <- get_EndNoteIdLDA(englishCorpus)
QA.EndNoteIdCorpusLDA(EndNoteIdLDA, EndNoteIdcorpus)
in_corpus <- align_dataWithEndNoteIdcorpus(in_corpus, EndNoteIdcorpus, EndNoteIdLDA)
source_ids <- align_dataWithEndNoteIdcorpus(source_ids, EndNoteIdcorpus, EndNoteIdLDA)
englishCorpus <- align_dataWithEndNoteIdLDA(englishCorpus, EndNoteIdLDA, EndNoteIdcorpus)
EndNoteIdLDA <- align_dataWithEndNoteIdLDA(EndNoteIdLDA, EndNoteIdLDA, EndNoteIdcorpus)
EndNoteIdcorpus <- align_dataWithEndNoteIdcorpus(EndNoteIdcorpus, EndNoteIdcorpus, EndNoteIdLDA)
QA_EndNoteIdCorpusLDA(EndNoteIdLDA, EndNoteIdcorpus)
source_ids <- order_data(source_ids, EndNoteIdLDA, EndNoteIdcorpus)
in_corpus <- order_data(in_corpus, EndNoteIdLDA, EndNoteIdcorpus)
predCountryMembership <- readRDS(system.file("extdata", "predCountryMembership.Rds", package = "wateReview"))
colnames(predCountryMembership) <- gsub("prob.", "", colnames(predCountryMembership))
For countries, the citation network is obtained using make_countryNetwork()
For topics, citation network are performed with make_topicNetwork()
for six categories of topics: methods
, NSF_general
, NSF_specific
, spatial scale
, theme
, and water budget
.
countryNetwork <- make_countryNetwork(source_ids, citation_network)
type_list <- c("methods", "NSF_general", "NSF_specific", "spatial scale", "theme", "water budget")
for (tt in type_list) `make_topicNetwork`(source_ids, citation_network, type = tt, percentile.threshold = 0.90)