bib list print to a character string in R

I'm reading a bib file extracted from Google Scholar with the BIB <- bibtex::read.bib("file.bib") command, which creates a list object. If I use paste(BIB) or as.character(BIB), the console shows lines like this for every item in the list:
"list(title = "A Lealdade no Sistema Financeiro Portugu{\\^e}s", author = list(list(given = c("Francisco", "José", "dos", "Santos", "Mota", "Ferreira"), family = "Guerra", role = NULL, email = NULL, comment = NULL)), year = "2017", school = "Universidade de Coimbra")"
If I use print(), it shows:
Guerra FJdSMF (2017). A Lealdade no Sistema Financeiro Português. Ph.D. thesis,
Universidade de Coimbra.
I need to extract the second kind of output into a new character string, but none of the commands I try work. I've tried A <- paste(print(BIB)), A <- as.character(print(BIB)) or just A <- print(BIB), but I only get the first kind of line or an identical object.
I have already tried opening the same file with bib2df::bib2df(), but it has problems with the encoding and with the data frame's columns and rows.

Try format(BIB). For example:
library(bibtex)
bib <- read.bib(package = "bibtex")
x <- format(bib)
x
# [1] "R Development Core Team (2009). _R: A Language and Environment for\nStatistical Computing_. R Foundation for Statistical Computing, Vienna,\nAustria. ISBN 3-900051-07-0, <http://www.R-project.org>."
I found this by running class(BIB), which returned "bibentry", and then listing every method that accepts that class with methods(class = "bibentry"); format seemed like a good candidate.
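To try format() without a .bib file, here is a minimal base-R sketch. It assumes that utils::bibentry() produces the same "bibentry" class that bibtex::read.bib() returns, and it reuses the thesis fields from the question with a shortened author (accented characters are written as \u escapes). It also shows capture.output(), a generic fallback that collects whatever print() would have written to the console, for any object with a print method.

```r
# Stand-in for one entry returned by bibtex::read.bib("file.bib")
entry <- utils::bibentry(
  bibtype = "PhdThesis",
  title   = "A Lealdade no Sistema Financeiro Portugu\u00eas",
  author  = utils::person(given = c("Francisco", "Jos\u00e9"), family = "Guerra"),
  year    = "2017",
  school  = "Universidade de Coimbra"
)

# format() returns the formatted citation as a character vector
# (one element per entry) instead of printing it
txt <- format(entry, style = "text")
class(txt)

# Generic fallback: capture the lines that print() writes to the console
txt2 <- capture.output(print(entry))
```

format() is the more direct choice for bibentry objects because it returns one string per entry; capture.output() returns one string per printed line.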

Related

How does R Markdown automatically format printed output into data frames? Or how can I access special print methods?

I'm working with the WRS2 package, and in some cases an analysis function (bwtrim) returns its results as a list with a special class named after the analysis, class = "bwtrim". I can't convert it with as.data.frame(), but I found that there is a custom print method, print.bwtrim, associated with it.
As an example, let's say this is the output: bwtrim.out <- bwtrim(...). When I print the analysis output in an R Markdown chunk, it seems to "steal" part of the text output and turn it into a data frame.
So here's my question: how can I either access print.bwtrim, or how does R Markdown automatically format certain outputs into data frames? I'd like to take this data frame and use it for other purposes.
Update: Here is a minimal working example; put the following in a chunk in an Rmd file:
```{r}
library(WRS2)
df <- data.frame(
  subject   = rep(c(1:100), each = 2),
  group     = rep(c("treatment", "control"), each = 2),
  timepoint = rep(c("pre", "post"), times = 2),
  dv        = rnorm(200, mean = 2)
)
analysis <- WRS2::bwtrim(dv ~ group * timepoint,
                         id = subject,
                         data = df,
                         tr = .2)
analysis
```
With this, a data.frame automatically shows up in the chunk output, and it shows all the values very nicely. My main question is how I can get this data.frame for my own use. If you run str(analysis), you see that it's a list. If you run class(analysis), you get "bwtrim". If you run methods(class = "bwtrim"), you get the print method, and methods(print) has a line that says print.bwtrim*. But I can't figure out how to call print.bwtrim myself.
Regarding what Rmarkdown is doing, compare the following
If you run this in a chunk, it actually steals the data.frame part and puts it into a separate figure.
```{r}
capture.output(analysis)
```
However, if you run the same line in the console, the entire output comes out properly. What's also interesting is that if you try to assign it to another object, the output will be stolen before it can be assigned.
Compare x when you run the following in either a chunk or the console.
```{r}
x<-capture.output(analysis)
```
This is what I get from the chunk approach when I call x
[1] "Call:"
[2] "WRS2::bwtrim(formula = dv ~ group * timepoint, id = subject, "
[3] " data = df, tr = 0.2)"
[4] ""
[5] ""
This is what I get when I do it all in the console
[1] "Call:"
[2] "WRS2::bwtrim(formula = dv ~ group * timepoint, id = subject, "
[3] " data = df, tr = 0.2)"
[4] ""
[5] " value df1 df2 p.value"
[6] "group 1.0397 1 56.2774 0.3123"
[7] "timepoint 0.0001 1 57.8269 0.9904"
[8] "group:timepoint 0.5316 1 57.8269 0.4689"
[9] ""
My question is: what does RStudio/R Markdown call to turn this output into data.frames, so that I can easily get a data.frame myself?
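Independently of what RStudio does, a generic (if fragile) way to turn printed output back into a data frame is to capture it with capture.output() and parse the table lines with read.table(). The sketch below assumes the layout shown in the console output above: a header line containing p.value followed by one line per effect. The captured lines are pasted in literally so the example runs without WRS2; with a real object you would use out <- capture.output(analysis).

```r
# Lines as returned by capture.output(analysis) in the console
out <- c("Call:",
         "WRS2::bwtrim(formula = dv ~ group * timepoint, id = subject, ",
         "    data = df, tr = 0.2)",
         "",
         "                 value df1     df2 p.value",
         "group           1.0397   1 56.2774  0.3123",
         "timepoint       0.0001   1 57.8269  0.9904",
         "group:timepoint 0.5316   1 57.8269  0.4689",
         "")

# Keep only the table: the header line plus the non-empty lines after it
start <- grep("p.value", out, fixed = TRUE)
rows  <- out[start:length(out)]
rows  <- rows[nzchar(trimws(rows))]

# read.table() uses the first column as row names when the header
# has one field fewer than the data rows
tab <- read.table(text = rows, header = TRUE)
tab
```

This breaks as soon as the print method changes its layout, so it is best treated as a fallback when the underlying values are not reachable any other way.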
Update 2: This is probably not a bug, as discussed here https://github.com/rstudio/rmarkdown/issues/1150.
Update 3: You can call the print method directly with WRS2:::print.bwtrim(analysis), though I'm still interested in what R Markdown is doing.
Update 4: It might not be the case that R Markdown is stealing the output and automatically making data frames from it, as you can see when you call x after you've already captured the output. Looking at WRS2:::print.bwtrim, it prints a data frame that it creates, which I'm guessing R Markdown recognizes and then formats.
See below for the print.bwtrim.
function (x, ...)
{
    cat("Call:\n")
    print(x$call)
    cat("\n")
    dfx <- data.frame(value = c(x$Qa, x$Qb, x$Qab), df1 = c(x$A.df[1],
        x$B.df[1], x$AB.df[1]), df2 = c(x$A.df[2], x$B.df[2],
        x$AB.df[2]), p.value = c(x$A.p.value, x$B.p.value, x$AB.p.value))
    rownames(dfx) <- c(x$varnames[2], x$varnames[3], paste0(x$varnames[2],
        ":", x$varnames[3]))
    dfx <- round(dfx, 4)
    print(dfx)
    cat("\n")
}
<bytecode: 0x000001f587dc6078>
<environment: namespace:WRS2>
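Because print.bwtrim() builds the table and then only prints it, one workaround is to rebuild the same data frame yourself. bwtrim_table() below is a hypothetical helper that copies the construction from the print method above and returns the data frame instead of printing it; it assumes the bwtrim object keeps the field names used there (Qa, Qb, Qab, A.df, A.p.value, varnames and so on). A stand-in object with those fields, filled with the rounded values from the console output, is used so the example runs without WRS2.

```r
# Hypothetical helper: same table as WRS2:::print.bwtrim(), but returned
bwtrim_table <- function(x, digits = 4) {
  dfx <- data.frame(
    value   = c(x$Qa, x$Qb, x$Qab),
    df1     = c(x$A.df[1], x$B.df[1], x$AB.df[1]),
    df2     = c(x$A.df[2], x$B.df[2], x$AB.df[2]),
    p.value = c(x$A.p.value, x$B.p.value, x$AB.p.value)
  )
  rownames(dfx) <- c(x$varnames[2], x$varnames[3],
                     paste0(x$varnames[2], ":", x$varnames[3]))
  round(dfx, digits)
}

# Stand-in object with the fields the print method reads;
# with real output you would call bwtrim_table(analysis)
mock <- structure(list(
  Qa = 1.0397, Qb = 0.0001, Qab = 0.5316,
  A.df = c(1, 56.2774), B.df = c(1, 57.8269), AB.df = c(1, 57.8269),
  A.p.value = 0.3123, B.p.value = 0.9904, AB.p.value = 0.4689,
  varnames = c("dv", "group", "timepoint")
), class = "bwtrim")

tab <- bwtrim_table(mock)
tab
```

Rebuilding from the list fields avoids parsing printed text, but it depends on WRS2 keeping these internal field names.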
In R Markdown documents, automatic printing is done by knitr::knit_print rather than print. I don't think there's a knit_print.bwtrim method defined, so it will use the default method, which is defined as
function (x, ..., inline = FALSE)
{
    if (inline)
        x
    else normal_print(x)
}
and normal_print will call print().
You are asking why the output is different. I don't see that when I knit the document to html_document, but I do see it with html_notebook. I don't know the details of what is being done, but if you look at https://rmarkdown.rstudio.com/r_notebook_format.html you can see a discussion of "output source functions", which manipulate chunks to produce different output.
The fancy output you're seeing looks a lot like what knitr::knit_print does for a dataframe, so maybe html_notebook is substituting that in place of print.

CleanNLP package in R: metadata data frame?

Let's assume my dataframe looks like this:
bio_text <- c("Georg Aemilius, eigentlich Georg Oemler, andere Namensvariationen „Aemylius“ und „Emilius“ (* 25. Juni 1517 in Mansfeld; † 22. Mai 1569 in Stolberg (Harz))...", "Johannes Aepinus auch: Johann Hoeck, Huck, Hugk, Hoch oder Äpinus (* um 1499 in Ziesar; † 13. Mai 1553 in Hamburg) war ein deutscher evangelischer Theologe und Reformator.\nAepinus wurde als Sohn des Ratsherrn Hans Hoeck im brandenburgischen Ziesar 1499 geboren...")
doc_id <- c("1", "2")
url <- c("https://de.wikipedia.org/wiki/Georg_Aemilius", "https://de.wikipedia.org/wiki/Johannes_Aepinus")
name <- c("Aemilius, Georg", "Aepinus, Johannes")
place_of_birth <- c("Mansfeld", "Ziesar")
full_wikidata <- data.frame(bio_text, doc_id, url, name, place_of_birth)
I want to carry out Named Entity Recognition with the cleanNLP package in R. To do so, I initialize the tokenizers and the spaCy backend, and everything works fine:
options(stringsAsFactors = FALSE)
library(cleanNLP)
cnlp_init_tokenizers()
require(reticulate)
cnlp_init_spacy("de")
wikidata <- full_wikidata[,c("doc_id", "bio_text")]
wikimeta <- full_wikidata[,c("url", "name", "place_of_birth")]
spacy_annotatedWikidata <- cleanNLP::cnlp_annotate(wikidata, as_strings = TRUE, meta = wikimeta)
My only problem is the metadata. When I run it like this, I get the following warning message: In cleanNLP::cnlp_annotate(full_wikidata, as_strings = TRUE, meta = wikimeta) : data frame input given along with meta; ignoring the latter. To be honest, I don't understand the documentation of meta in cnlp_annotate: "an optional data frame to bind to the document table". This means that I should supply a data frame containing the metadata, right? Later on, I want to be able to do something like this, e.g. filter out all person entities in document no. 3:
cnlp_get_entity(spacy_annotatedWikidata) %>%
filter(doc_id == 3, entity_type == "PER") %>%
count(entity)
Therefore, I have to find a way to access the metadata. Any help would be highly appreciated!
Fortunately, in the meantime I got some help and the advice to take a closer look at the source code of cnlp_annotate on GitHub: https://github.com/statsmaths/cleanNLP/blob/master/R/annotate.R
It says that you can only pass in a metadata data frame if the input itself is not a data frame but a file path. So if you do want to pass in a data frame, the first column has to be doc_id and the second the text; the remaining columns are automatically treated as metadata. So in my example, only the column order in full_wikidata has to be changed:
full_wikidata <- data.frame(doc_id, bio_text, url, name, place_of_birth)
This way, it can be used directly as input to cnlp_annotate:
spacy_annotatedWikidata <- cleanNLP::cnlp_annotate(full_wikidata, as_strings = TRUE)
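The same reordering can be done on an existing data frame without rebuilding it. A minimal base-R sketch, using a shortened stand-in for full_wikidata (the bio_text values are truncated placeholders). It relies on the answer's description of cnlp_annotate() (doc_id first, text second, everything else metadata), which may not hold for other cleanNLP versions, so the annotation call itself is left as a comment.

```r
# Shortened stand-in for full_wikidata, in the question's original column order
full_wikidata <- data.frame(
  bio_text = c("Georg Aemilius ...", "Johannes Aepinus ..."),
  doc_id   = c("1", "2"),
  url      = c("https://de.wikipedia.org/wiki/Georg_Aemilius",
               "https://de.wikipedia.org/wiki/Johannes_Aepinus"),
  name     = c("Aemilius, Georg", "Aepinus, Johannes"),
  place_of_birth = c("Mansfeld", "Ziesar"),
  stringsAsFactors = FALSE
)

# Move doc_id and the text column to the front; all other columns keep
# their relative order and are treated as metadata
first_cols <- c("doc_id", "bio_text")
full_wikidata <- full_wikidata[, c(first_cols,
                                   setdiff(names(full_wikidata), first_cols))]
names(full_wikidata)

# Then, as in the answer:
# spacy_annotatedWikidata <- cleanNLP::cnlp_annotate(full_wikidata, as_strings = TRUE)
```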

Convert R JSON Twitter data to list

When using searchTwitteR, I converted the results to a data frame and then exported them to JSON. However, all the text ends up on one line (sample below). I need to separate it so that each tweet is its own record.
phish <- searchTwitteR('phish', n = 5, lang = 'en')
phishdf <- do.call("rbind", lapply(phish, as.data.frame))
exportJson <-toJSON(phishdf)
write(exportJson, file = "phishdf.json")
json_phishdf <- fromJSON(file="phishdf.json")
I tried converting to a list and am wondering if maybe converting to a data frame is a mistake.
However, for a list, I tried:
newlist['text']=phish[[1]]$getText()
But this will just give me the text for the first tweet. Is there a way to iterate over the entire data set, maybe in a for loop?
{"text":["#ilazer #abbijacobson I do feel compelled to say that I phind phish awphul... sorry, Abbi!","#phish This on-sale was an embarrassment. Something needs to change.","FS: Have 2 Tix To Phish In Chula Vista #Phish #facevaluetickets #phish #facevalue GO: https://t.co/dFdrpyaotp","RT #WKUPhiDelt: Come unwind from a busy week of class and kick off the weekend with a Phish Fry! 4:30-7:30 at the Phi Delt house. Cost is $\u2026","RT #phish: Tickets for Phish's July 15 & 16 shows at The Gorge go on sale in fifteen minutes at 1PM ET: https://t.co/tEKLNjI5u7 https://t.c\u2026"],
"favorited":[false,false,false,false,false],
"favoriteCount":[0,0,0,0,0],
"replyToSN":["rAlexandria","phish","NA","NA","NA"],
"created":[1456521159,1456521114,1456521022,1456521016,1456520988],
"truncated":[false,false,false,false,false],
"replyToSID":["703326502629277696","703304948990222337","NA","NA","NA"],
"id":["703326837720662016","703326646074343424","703326261045829632","703326236722991105","703326119328686080"],
"replyToUID":["26152867","14503997","NA","NA","NA"],"statusSource":["Mobile Web (M5)","Twitter for iPhone","CashorTrade - Face Value Tickets","Twitter for iPhone","Twitter for Android"],
"screenName":["rAlexandria","adamgelvan","CashorTrade","Kyle_Smith1087","timogrennell"],
"retweetCount":[0,0,0,2,5],
"isRetweet":[false,false,false,true,true],
"retweeted":[false,false,false,false,false],
"longitude":["NA","NA","NA","NA","NA"],
"latitude":["NA","NA","NA","NA","NA"]}
I followed your code and don't have the issue you're describing. Are you using library(twitteR) and library(jsonlite)?
Here is the code that works for me:
library(twitteR)
library(jsonlite)
phish <- searchTwitteR('phish', n = 5, lang = 'en')
phishdf <- do.call("rbind", lapply(phish, as.data.frame))
exportJson <-toJSON(phishdf)
write(exportJson, file = "./../phishdf.json")
## note the `txt` argument, as opposed to `file` used in the question
json_phishdf <- fromJSON(txt="./../phishdf.json")
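The column-oriented sample in the question suggests that its toJSON()/fromJSON() came from a package such as rjson (whose fromJSON() takes a file argument), whereas jsonlite::toJSON() writes a data frame as an array of row objects by default. If you already have the column-oriented result, a base-R sketch can turn it into one record per tweet. It assumes the parsed JSON is a named list of equal-length vectors, as in the sample; a small hand-built stand-in is used here.

```r
# Stand-in for the parsed JSON: one vector per field, one element per tweet
json_phishdf <- list(
  text         = c("first tweet", "second tweet", "third tweet"),
  screenName   = c("rAlexandria", "adamgelvan", "CashorTrade"),
  retweetCount = c(0, 0, 2)
)

# Element i of the result collects the i-th value of every field
n_tweets <- length(json_phishdf$text)
tweets <- lapply(seq_len(n_tweets), function(i) lapply(json_phishdf, `[[`, i))

tweets[[2]]$text   # "second tweet"

# The same result with an explicit for loop, as asked in the question
tweets_loop <- vector("list", n_tweets)
for (i in seq_len(n_tweets)) {
  tweets_loop[[i]] <- lapply(json_phishdf, `[[`, i)
}
```

With the original list of status objects, the same lapply() pattern applies to the question's own accessor, e.g. lapply(phish, function(tw) tw$getText()), instead of indexing only phish[[1]].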

Input ID in Shiny selectInput() not refreshing the results

I am trying to provide an organization name as input in ui.R as follows:
selectInput("Organization", "Enter an Org:", choices = c("Blenheim Palace", "Chatsworth", "Gloucester Cathedral", "Manchester Cathedral", "Royal Albert Hall", "StPauls Cathedral"))
I try to use this input to refresh my wordcloud. Essentially when I select an organization, say Blenheim Palace, wordcloud should change with the comments for that organization from tripadvisor.com.
My server.R code is as follows:
OrganizationInput <- reactive({
  switch(input$Organization,
         "Blenheim Palace" = Blenheim_Palace,
         "Chatsworth" = Chatsworth,
         "Gloucester Cathedral" = Gloucester_Cathedral,
         "Manchester Cathedral" = Manchester_Cathedral,
         "Royal Albert Hall" = Royal_Albert_Hall,
         "StPauls Cathedral" = StPauls_Cathedral)
})
rawData <- reactive(function(){
  some_txt <- sqlQuery(dbhandle, 'SELECT REVIEW_COMMENTS FROM XXXXXX.tripadvisor_data where brand_name = "OrganizationInput()"')
  some_txt <- data.frame(some_txt)
})
I try to use rawData() as input for wordcloud, but I get the following error:
Error: invalid 'cex' value
If I hard-code an individual name (say Blenheim Palace) in rawData, it works.
Any help/clarification will be highly appreciated.
If you want OrganizationInput to be a character string, you need to quote the values in switch, like this:
OrganizationInput <- reactive({switch(input$Organization, "Blenheim Palace" = "Blenheim_Palace", "Chatsworth" = "Chatsworth", ...
Otherwise, you're trying to reference a variable named Blenheim_Palace, which likely doesn't exist, right?
It was essentially a problem of parameterizing input$Organization in sqlQuery(). I used the idea from
R Shiny error 'closure' not subsettable
and it worked. Thanks for looking into my question.
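A minimal sketch of that kind of fix, assuming RODBC's sqlQuery() as in the question: build the SQL string from the value of the input instead of embedding the literal text "OrganizationInput()". build_review_query() is a hypothetical helper; the table name is left as XXXXXX exactly as in the question, and which string to pass depends on what the brand_name column actually stores. Doubling single quotes is only a basic safeguard, not a replacement for a driver's parameterized queries (for example the params argument accepted by DBI::dbGetQuery()).

```r
# Hypothetical helper: SQL text for one organization's reviews
build_review_query <- function(org) {
  # Double single quotes so a name like "St Paul's" cannot break
  # (or inject into) the SQL string literal
  org_escaped <- gsub("'", "''", org, fixed = TRUE)
  sprintf(
    "SELECT REVIEW_COMMENTS FROM XXXXXX.tripadvisor_data WHERE brand_name = '%s'",
    org_escaped
  )
}

build_review_query("Blenheim Palace")

# In server.R (requires shiny, RODBC and the question's dbhandle):
# rawData <- reactive({
#   data.frame(sqlQuery(dbhandle, build_review_query(OrganizationInput())))
# })
```

Single quotes are used because they are the standard SQL string delimiter; the double quotes in the original query are read as identifiers by some databases.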

Get function's title from documentation

I would like to get the title of a base function (e.g.: rnorm) in one of my scripts. That is included in the documentation, but I have no idea how to "grab" it.
I mean the line given in the RD files as \title{} or the top line in documentation.
Is there a simple way to do this without calling the Rd_db function from tools and parsing all Rd files, which has a very big overhead for such a simple task? I also tried parse_Rd, but:
I do not know which Rd file holds my function,
I have no Rd files on my system (just rdb, rdx and rds).
So a function to parse the (offline) documentation would be the best :)
POC demo:
> get.title("rnorm")
[1] "The Normal Distribution"
If you look at the code for help, you see that the function index.search seems to be what pulls in the location of the help files, and that the default for the associated find.package() function is NULL. It turns out that this function has no help page and is not exported, so I tested the usual suspects for which package it was in (base, tools, utils), and ended up with utils:
utils:::index.search("+", find.package())
#[1] "/Library/Frameworks/R.framework/Resources/library/base/help/Arithmetic"
So:
ghelp <- utils:::index.search("+", find.package())
gsub("^.+/", "", ghelp)
#[1] "Arithmetic"
ghelp <- utils:::index.search("rnorm", find.package())
gsub("^.+/", "", ghelp)
#[1] "Normal"
What you are asking for is \title{Title}, but here I have shown you how to find the specific Rd file to parse, and it sounds as though you already know how to do that.
EDIT: @Hadley has provided a method for getting all of the help text once you know the package name, so applying that to the index.search() value above:
target <- gsub("^.+/library/(.+)/help.+$", "\\1", utils:::index.search("rnorm",
                                                                    find.package()))
doc.txt <- pkg_topic(target, "rnorm") # assuming both of Hadley's functions are here
print(doc.txt[[1]][[1]][1])
#[1] "The Normal Distribution"
It's not completely obvious what you want, but the code below will get the Rd data structure corresponding to the topic you're interested in; you can then manipulate that to extract whatever you want.
There may be simpler ways, but unfortunately very little of the needed code is exported and documented. I really wish there was a base help package.
pkg_topic <- function(package, topic, file = NULL) {
  # Find "file" name given topic name/alias
  if (is.null(file)) {
    topics <- pkg_topics_index(package)
    topic_page <- subset(topics, alias == topic, select = file)$file
    if (length(topic_page) < 1)
      topic_page <- subset(topics, file == topic, select = file)$file
    stopifnot(length(topic_page) >= 1)
    file <- topic_page[1]
  }
  rdb_path <- file.path(system.file("help", package = package), package)
  tools:::fetchRdDB(rdb_path, file)
}
pkg_topics_index <- function(package) {
  help_path <- system.file("help", package = package)
  file_path <- file.path(help_path, "AnIndex")
  if (length(readLines(file_path, n = 1)) < 1) {
    return(NULL)
  }
  topics <- read.table(file_path, sep = "\t",
    stringsAsFactors = FALSE, comment.char = "", quote = "", header = FALSE)
  names(topics) <- c("alias", "file")
  topics[complete.cases(topics), ]
}
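Building on the same idea of reading one topic from the installed help database, here is a sketch of the get.title() helper from the question. It relies on the unexported utils:::.getHelpFile(), which reads a single Rd object from the .rdb/.rdx files, so it avoids parsing every Rd file but may break in future R versions. The \title element is located through the "Rd_tag" attribute that each top-level element of an Rd object carries.

```r
get.title <- function(topic) {
  # do.call() passes the topic as a string, sidestepping help()'s
  # non-standard evaluation of its first argument
  h <- do.call(utils::help, list(topic))
  if (length(h) == 0L) stop("no help page found for ", topic)
  rd <- utils:::.getHelpFile(h[1L])

  # Tag of every top-level element, e.g. "\\name", "\\title", "TEXT"
  tags <- vapply(rd, function(el) {
    tag <- attr(el, "Rd_tag")
    if (is.null(tag)) "" else tag
  }, character(1))

  title <- rd[[which(tags == "\\title")[1L]]]
  trimws(paste(unlist(title), collapse = ""))
}

get.title("rnorm")
```

Titles that contain Rd markup (for example \code{}) come back as the concatenated text of that markup, so they may need further cleanup.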
