Scraping comment replies with Rfacebook in R

I am using the Rfacebook package to scrape a list of public pages that are relevant to my research question. Authentication works properly, and I can get data frames of all public posts, the reactions to those posts, and the comments made on them.
However, I'm running into an issue when I try to extract the replies to comments under the public posts. This is the code I'm using:
BSBKB <- getPage("bersenbrueckerkreisblatt", token = my_OAuth, feed = TRUE,
                 reactions = TRUE, verbose = TRUE, n = 1000)
# Getting comments for post no. 4
Comments <- getPost(BSBKB$id[4], token = my_OAuth, reactions = TRUE, n = 180, likes = TRUE)
# Getting replies to comment no. 4 under post no. 4
replies <- getCommentReplies(Comments$comments$id[4], token = my_OAuth, n = 500,
                             replies = FALSE, likes = TRUE)
This code throws the following error:
Error in data.frame(from_id = json$from$id, from_name = json$from$name, : arguments imply differing number of rows: 0, 1
Strangely enough, the same error occurs when I try to run the example code from the ?getCommentReplies help page:
## Not run:
## See examples for fbOAuth to know how token was created.
## Getting information about Facebook's Facebook Page
load("fb_oauth")
fb_page <- getPage(page="facebook", token=my_OAuth)
## Getting information and likes/comments about most recent post
post <- getPost(post=fb_page$id[1], n=2000, token=my_OAuth)
## Downloading list of replies to first comment
replies <- getCommentReplies(comment_id=post$comments$id[1], token=my_OAuth)
## End(Not run)
Resulting in:
Error in data.frame(from_id = json$from$id, from_name = json$from$name, :
arguments imply differing number of rows: 0, 1
Is this a systematic error in the package, a recent change in the API, or did I make a mistake somewhere? Any suggestions on how to work around this and extract comment replies (and ideally the reactions to them) would be great!

The source code of the getCommentReplies function is published on GitHub: https://github.com/yanturgeon/R_Script/blob/master/getCommentReplies_dev.R
Load this code into your own environment, but before you do, comment out the line:
out[["reply"]] <- replyDataToDF(content)
The result will then still be a list rather than a data frame.
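For illustration, here is a minimal sketch of that workaround, assuming you have saved the linked file locally as getCommentReplies_dev.R and commented out the line above; the structure of the returned list is not guaranteed, so inspect it before converting anything:
# Source the patched function (with the replyDataToDF() line commented out)
# so it returns the raw API content as a list instead of a data frame.
source("getCommentReplies_dev.R")

replies <- getCommentReplies(Comments$comments$id[4], token = my_OAuth,
                             n = 500, replies = FALSE, likes = TRUE)

# See which elements the API actually returned
str(replies, max.level = 1)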

Related

rscopus scopus_search() only returns first author. Need full author list

I am performing a bibliometric analysis, and have chosen to use rscopus to automate my document searches. I performed a test search, and it worked; the documents returned by scopus_search() exactly matched a manual check that I performed. Here's my issue: rscopus returned only information on the first author (and their affiliation) of each article, but I need information on all authors/affiliations for each article pulled for my particular research questions. I've scoured the rscopus documentation, as well as Elsevier's Developer notes for API use, but can't figure this out. Any ideas on what I'm missing?
query1 <- 'TITLE-ABS-KEY ( ( recreation ) AND ( management ) AND ( challenge ) )'
run1 <- scopus_search(query = query1, api_key = apikey, count = 20,
                      view = c('STANDARD', 'COMPLETE'), start = 0, verbose = TRUE,
                      max_count = 20000, http = 'https://api.elsevier.com/content/search/scopus',
                      headers = NULL, wait_time = 0)
I wanted to post an update since I figured out what was going wrong. I was using the university VPN to access the Scopus API, but the IP address associated with that VPN was not within the range of addresses included in my institution's Scopus license. So, I did not have permission to get "COMPLETE" results. I reached out to Elsevier and very quickly got an institution key that I could add to the search. My working search looks as follows...
query1 <- 'TITLE-ABS-KEY ( ( recreation ) AND ( management ) AND ( challenge ) )'
run1 <- scopus_search(query = query1, api_key = apikey, count = 20,
                      view = 'COMPLETE', start = 0, verbose = TRUE,
                      max_count = 20000, http = 'https://api.elsevier.com/content/search/scopus',
                      headers = inst_token_header(insttoken), wait_time = 0)
Just wanted to reiterate Brenna's comment - I had the same issue using the VPN to access the API (which can be resolved by being on campus). Elsevier were very helpful and provided an institutional token very quickly - problem solved.
Otherwise, another workaround I found was to use CrossRef data via library(rcrossref).
I used the doi column from the scopusdata returned by my original Scopus search:
library(rcrossref)
library(purrr)
library(dplyr)

crossrefdata <- scopusdata %>%
  pmap(function(doi, ...) {   # any extra columns of scopusdata are absorbed by ...
    cr_works(dois = doi)      # returns CrossRef metadata for each DOI
  }) %>%
  map("data") %>%             # cr_works() returns a list; keep only the 'data' element
  bind_rows()
You can then manipulate the CrossRef metadata however you need, with the full author list.
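For example, a rough sketch of getting one row per author, assuming cr_works() returned an author list-column of data frames (the exact columns can vary by record):
library(dplyr)
library(tidyr)

# Unnest the author list-column to get one row per author per paper.
# Column names here are assumptions based on typical cr_works() output.
authors_long <- crossrefdata %>%
  select(doi, title, author) %>%
  unnest(author)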

How do I fix the warning message "Closing open result set, cancelling previous query" when querying a PostgreSQL database in R?

Below is a snippet of the code I use in R to extract IDs from a PostgreSQL database. When I run the function, I get the following warning message:
In result_create(conn@ptr, statement) :
  Closing open result set, cancelling previous query
How do I avoid this warning without resorting to options(warn = -1) at the beginning of my code, which would only suppress the warning instead of addressing its cause?
con <- dbConnect(RPostgres::Postgres(),
                 user = "postgres",
                 dbname = "DataBaseName",
                 password = "123456",
                 port = 5431)

get_id <- function(connection, table){
  query <- toString(paste("SELECT id FROM ", table, sep = ""))
  data_extract_query <- dbSendQuery(connection, query)
  data_extract <- dbFetch(data_extract_query)
  return(data_extract)
}

get_id(con, "users")
I found a method for solving the problem.
There is a thread on GitHub for RSQLite at https://github.com/r-dbi/RSQLite/issues/143. In that thread, they explicitly set n = -1 in the dbFetch() call.
Editing the code as follows solved my problem, and the warning message no longer shows up:
data_extract <- dbFetch(data_extract_query, n = -1)
Here n is the maximum number of rows the fetch should return; setting it to -1 retrieves all pending rows. The default is already n = -1, but for some reason, with this build of R (3.6.3) the warning is still shown unless it is passed explicitly.
Calling ?dbFetch in R shows more information on this; here is a snippet from the help page:
Usage

dbFetch(res, n = -1, ...)
fetch(res, n = -1, ...)

Arguments

res   An object inheriting from DBIResult, created by dbSendQuery().
n     Maximum number of records to retrieve per fetch. Use n = -1 or n = Inf
      to retrieve all pending records. Some implementations may recognize other
      special values.
...   Other arguments passed on to methods.
This issue also comes up with other database backends if a result set is not cleared before a new query is submitted. From the documentation of DBI::dbSendQuery():
Usage
dbSendQuery(conn, statement, ...)
...
Value
dbSendQuery() returns an S4 object that inherits from DBIResult. The result set can be used with dbFetch() to extract records. Once you have finished using a result, make sure to clear it with dbClearResult(). An error is raised when issuing a query over a closed or invalid connection, or if the query is not a non-NA string. An error is also raised if the syntax of the query is invalid and all query parameters are given (by passing the params argument) or the immediate argument is set to TRUE.
To get rid of the warning, the get_id() function must be modified as follows:
get_id <- function(connection, table){
  query <- toString(paste("SELECT id FROM ", table, sep = ""))
  data_extract_query <- dbSendQuery(connection, query)
  data_extract <- dbFetch(data_extract_query)
  # Here we clear whatever remains on the server
  dbClearResult(data_extract_query)
  return(data_extract)
}
See the Examples section of the help page for more.
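If you do not need to fetch results incrementally, an alternative sketch is to use DBI's dbGetQuery(), which runs dbSendQuery(), dbFetch() and dbClearResult() in a single call, so no open result set is left behind:
get_id <- function(connection, table){
  query <- paste0("SELECT id FROM ", table)
  # dbGetQuery() sends the query, fetches all rows and clears the result itself
  dbGetQuery(connection, query)
}

get_id(con, "users")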

Failed to authenticate Google Translate in R

I am trying to use the gl_translate function on about 500,000 characters in RStudio, which means I have to authenticate against the Google Translate API. The problem is that I set this up about two months ago with my old Google account, and now I am using a new one.
When I tried to authenticate a new client_id with my new Google account, I got an error message saying that my API had not been enabled yet, even though I had enabled it. I restarted RStudio and now I get this error message:
2020-01-22 19:01:24 -- Translating html: 147 characters -
2020-01-22 19:01:24> Request Status Code: 403
Error: API returned: Request had insufficient authentication scopes.
It is very frustrating: I then tried to re-enable the old Google account, which required me to enter my credit card number. I did that as well, and now they have asked me to wait several days.
Can anyone figure out what the problem is?
Here is my R code for authentication:
install.packages("googleAnalyticsR", dependencies = TRUE)
library(googleAnalyticsR)
install.packages("googleLanguageR")
library(googleLanguageR)
install.packages("dplyr")
library(dplyr)
library(tidyverse)
install.packages("googleAuthR")
library(googleAuthR)
client_id <- "107033903887214478396"
private_key <- "-----BEGIN PRIVATE KEY-----\nMIIEvAIBADANBgkqhkiG9w0BAQEFAASCBKYwggSiAgEAAoIBAQChPmvib1v9/CFA\nX7fG8b8iXyS370ivkufMmX30C6/rUNOttA+zMhamS/EO0uaYtPw44B4QdNzRsSGq\nm+0fQ5Sp1SHJaVPkoImuUXZdMlLO73pvY48nMmEFg7deoOZI+CNWZYgIvPY8whMo\nk4vKE2iyuG+pl9MT7dT6dwWNmXDWr8kgxAfryfVEUIeqaf+57Z9g3FfVPLARz4iH\nCaTu55hhbmo/XknUx0hPsrwBMPaNLGl2+o5MU1ZuIkl47EJvdL8CdUuEmb9qJDtv\nUyKqANlwFa7cVW8ij2tjFpjJ7bigRVJsI8odbsEbwmx1b5SLckDOQ7t4l8zERtmo\nUv5fxyNNAgMBAAECggEAApeLyWyL2IXcjPnc7OxG68kGwJQuoW/lnQLcpPcpIUm/\n1Vt/IxzLg2nWGqxmO48xPMLRiOcwA4jq5yCxi56c/avo6qFwUU0JWY2CrxXXge8U\nk0TQ8MrdB2cqI/HHMeYXP1TLfoR3GtvtzemtRhbQyIqxdNL1eC0LDump47BTQYg0\nVPuCxU3zXVIj+Qt0FZa+Pa/nAOGHf5b4ye56L7vxL2BCeRncuHdDcE6Ilkpz79Gv\nkXP1K5j22uEVCmobe1qRlq3BLx2Qimj4h8MI8CKiNS40tGR/oewJ5uMgmeCePYKv\nqSFOwCDvRkw9V2KdGu40WQFEq21mczlv9gMWhp2/EQKBgQDRmBZZM7ugIpo64wx6\nDFYhZo05LmMiwymIfWs2CibzKDeXPuy3OSytvTPVFCkG+RlcYthxAIAn1Z/qJ4UI\n+8c8Zwfg+toYtEa2gTYM2185vmnqQwqmAsaK+4xKZzgfqxie/CBuPzUOZO41q6P8\ni7A2KqXHcDb4SMqnkdGGLk/7+QKBgQDE8dBesgx4DsHFYg1sJyIqKO4d2pnLPkDS\nAzx5xvQuUcVCNTbugOC7e0vGxWmQ/Eqal5b3nitH590m8WHnU9UiE4HciVLe+JDe\nDe5CWzBslnncBjpgiDudeeEubhO7MHv/qZyZXMh73H2NBdO8j0uiGTNbBFoOSTYq\nsFACiCZu9QKBgE2KjMoXn5SQ+KpMkbMdmUfmHt1G0hpsRZNfgyiM/Pf8qwRjnUPz\n/RmR4/ky6jLQOZe6YgT8gG08VVtVn5xBOeaY34tWgxWcrIScrRh4mHROg/TNNMVS\nRY3pnm9wXI0qyYMYGA9xhvl6Ub69b3/hViHUCV0NoOieVYtFIVUZETJRAoGAW/Y2\nQCGPpPfvD0Xr0parY1hdZ99NdRQKnIYaVRrLpl1UaMgEcHYJekHmblh8JNFJ3Mnw\nGovm1dq075xDBQumOBU3zEzrP2Z97tI+cQm3oNza5hyaYbz7aVsiBNYtrHjFTepb\nT1l93ChnD9SqvB+FR5nQ2y07B/SzsFdH5QbCO4kCgYBEdRFzRLvjdnUcxoXRcUpf\nfVMZ6fnRYeV1+apRSiaEDHCO5dyQP8vnW4ewISnAKdjKv/AtaMdzJ5L3asGRWDKU\n1kP/KDBlJkOsOvTkmJ4TxbIhgcSI62/wqDBi5Xqw1ljR2mh8njzRwqDRKs12EtQ0\n9VaUDm7LCNTAskn2SR/o4Q==\n-----END PRIVATE KEY-----\n"
options(googleAuthR.client_id = client_id)
options(googleAuthR.client_secret = private_key)
devtools::reload(pkg = devtools::inst("googleAnalyticsR"))
ga_auth()
In case you need to see what my translation code looks like:
translate <- function(tibble) {
  tibble <- tibble
  count <- data.frame(nchar = 0, cumsum = 0)  # create count file to stay within API limits
  for (i in 1:nrow(tibble)) {
    des <- pull(tibble[i, 2])  # extract description as single character string
    if (count$cumsum[nrow(count)] >= 80000) {  # API limit check
      print("nearing 100000 character per 100 seconds limit, pausing for 100 seconds")
      Sys.sleep(100)
      count <- count[1, ]  # reset count file
    }
    if (grepl("^\\s*$", des) == TRUE) {  # if description is only whitespace then skip
      trns <- tibble(translatedText = "", detectedSourceLanguage = "", text = "")
    } else {  # else request translation from API
      trns <- gl_translate(des, target = 'en', format = 'html')  # request in html format to anticipate html descriptions
    }
    tibble[i, 3:4] <- trns[, 1:2]  # add to tibble
    nchar = nchar(pull(tibble[i, 2]))  # count number of characters
    req <- data.frame(nchar = nchar, cumsum = nchar + sum(count$nchar))
    count <- rbind(count, req)  # add to count file
    if (nchar > 20000) {  # additional API request limit safeguard for large descriptions
      print("large description (>20,000), pausing to manage API limit")
      Sys.sleep(100)
      count <- count[1, ]  # reset count file
    }
  }
  return(tibble)
}
I figured it out after 24 hours.
Apparently it is really easy; I just followed the steps from this link.
But yesterday I made a mistake: the JSON file I had downloaded was the one for the client ID, while I actually needed the JSON key file for the service account.
Then I installed the googleLanguageR package with this code:
remotes::install_github("ropensci/googleLanguageR")
library(googleLanguageR)
and then simply passed the file location of my downloaded Google project JSON key file to gl_auth() like this:
gl_auth("G:/My Drive/0. Thesis/R-Script/ZakiServiceAccou***************kjadjib****.json")
and now I'm happy :)
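As an alternative (following the googleLanguageR setup instructions; the path below is just a placeholder), you can point the GL_AUTH environment variable at the service-account JSON so the package authenticates automatically when it is loaded, instead of calling gl_auth() in every script:
# In ~/.Renviron (restart R afterwards):
# GL_AUTH=/path/to/service-account-key.json

library(googleLanguageR)  # picks up GL_AUTH and authenticates on load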

Error in Rfacebook getPost - Argument has length 0

Whenever I try to get a post with a lot of comments from Facebook with Rfacebook's getPost function, I get the following error:
Error in while (n.l < n.likes & length(content$data) > 0 & !is.null(url <- content$paging$`next`)) { :
Argument has length 0
The code I'm trying to run looks like this:
post <- getPost(post = "Post-ID", token = token, n = 200)
I've also tried playing around with the different arguments of the function, but nothing has worked so far. Does anyone have an idea what could be causing this error? Any help is greatly appreciated!
Here's the link to the documentation of the getPost function: https://www.rdocumentation.org/packages/Rfacebook/versions/0.6.15/topics/getPost
I have a way that attacks your problem from a slightly different angle.
Instead of tackling the post ID directly, you can approach it from the 'Page' side, which is also an easier way of getting the post ID.
Step 1:
See which page the post is on, then extract the posts from that page, making sure to use the time parameters. For example:
"If you want to extract a post from the Nike FB page that has a massive amount of comments, which happened to fall on June 6th, 2016:"
nike_posts <- getPage("nike", token = fboauth, n=100000, since = '2016/06/05', until = '2016/06/07')
Step 2:
You will then have a data frame of posts, say 7 observations for that window (pages often post multiple times a day).
If the post you are looking for is observation #3, extract its comments with:
Comments <- getPost(nike_posts$id[3], token = fboauth, n = 10000, comments = TRUE,
                    likes = FALSE, n.likes = 1, n.comments = 100000)
To convert this output to a data frame:
library(plyr)
Comments <- ldply(Comments, data.frame)

Error in open.connection(con, "rb") : Timeout was reached: Resolving timed out after 10000 milliseconds

I've got a list of "Player" objects called players, each with an ID, and I'm trying to fetch a JSON page with information related to each ID, using jsonlite.
The URL stem is: 'https://fantasy.premierleague.com/drf/element-summary/'
I need to access each player's respective page.
I'm trying to do so as follows:
playerDataURLStem <- 'https://fantasy.premierleague.com/drf/element-summary/'
for (player in players) {
  player_data_url <- paste(playerDataURLStem, player@id, sep = "")
  player_data <- fromJSON(player_data_url)
  # DO SOME STUFF #
}
When I run it, I get the error Error in open.connection(con, "rb") : Timeout was reached: Resolving timed out after 10000 milliseconds. The error is produced at a different position in my list of players each time I run the code, and when I check the web page that caused the error, I can't see anything wrong with it. This leads me to believe that sometimes the pages just take longer than 10,000 milliseconds to respond, but using
options(timeout = x)
for some x, doesn't seem to make it wait longer for a response.
For a minimal working example, try:
playerDataURLStem <- 'https://fantasy.premierleague.com/drf/element-summary/'
ids <- c(1:540)
for (id in ids) {
  player_data_url <- paste(playerDataURLStem, id, sep = "")
  player_data <- fromJSON(player_data_url)
  print(player_data$history$id[1])
}
options(timeout = 4000000) is working for me. Try increasing the timeout value to a higher number.
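If a larger timeout alone does not help, another workaround is to retry each request a few times with a pause, so a single transient timeout does not abort the whole loop. This is only a sketch; fetch_with_retry is a hypothetical helper, not part of jsonlite:
library(jsonlite)

# Hypothetical helper: retry fromJSON() a few times, pausing between attempts,
# and only fail once all attempts have been exhausted.
fetch_with_retry <- function(url, tries = 3, pause = 5) {
  for (attempt in seq_len(tries)) {
    result <- tryCatch(fromJSON(url), error = function(e) NULL)
    if (!is.null(result)) return(result)
    Sys.sleep(pause)  # wait before the next attempt
  }
  stop("Failed to fetch ", url, " after ", tries, " attempts")
}

for (id in ids) {
  player_data <- fetch_with_retry(paste0(playerDataURLStem, id))
  print(player_data$history$id[1])
}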
