Appending rows to existing Dataframe in R

I am extracting Twitter data using the twitteR package and storing it in a data frame x.
I first created the data frame:
x <- data.frame(
name = character(),
screen_name = character(),
id = integer(),
description = character(),
statuses_count = integer(),
followersCount = integer(),
favoritesCount = integer(),
friendsCount = integer(),
url = character(),
created = integer(),
verified = integer(),
profile_image_url = character(),
stringsAsFactors=FALSE
)
Then I created a function to return the data for a specific user:
adduserdata <- function(username = ""){
user <- getUser(username)
userdata = c(name = user$name,
screen_name = user$screenName,
id = user$id,
description = user$description,
statuses_count = user$statusesCount,
followersCount = user$followersCount,
favoritesCount = user$favoritesCount,
friendsCount = user$friendsCount,
url = user$url,
created = user$created,
verified = user$verified,
profile_image_url = user$profileImageUrl)
return(userdata)
}
I now want to get the data of each user in the list ns and append them to the dataframe x
ns <- c("realDonaldTrump","BarackObama")
for (n in ns) {
user <- adduserdata(n)
x <- rbind(x, user)
}
But I get an error stating 'invalid factor level'. I'm not sure why.

Return a data frame from the adduserdata function instead of a vector built with c():
adduserdata <- function(username = ""){
user <- getUser(username)
userdata = data.frame(name = user$name,
screen_name = user$screenName,
id = user$id,
description = user$description,
statuses_count = user$statusesCount,
followersCount = user$followersCount,
favoritesCount = user$favoritesCount,
friendsCount = user$friendsCount,
url = user$url,
created = user$created,
verified = user$verified,
profile_image_url = user$profileImageUrl)
return(userdata)
}
and try :
result <- do.call(rbind, lapply(ns, adduserdata))
Or
result <- purrr::map_df(ns, adduserdata)
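The difference between the two versions of adduserdata can be seen without calling Twitter at all. The sketch below is only an illustration: make_user is a hypothetical stand-in for adduserdata, and the follower counts are made-up values derived from the name length. It shows that c() coerces mixed values to a single character vector, while one-row data frames keep their column types and stack cleanly with rbind.

```r
# c() coerces mixed types to one type, so the integer becomes a string.
row_vec <- c(name = "alice", followersCount = 10L)
is_char <- is.character(row_vec)

# Hypothetical stand-in for adduserdata(): returns a one-row data frame.
make_user <- function(username) {
  data.frame(
    screen_name = username,
    followersCount = nchar(username),  # made-up value, for illustration only
    stringsAsFactors = FALSE
  )
}

ns <- c("alice", "bob")

# Build one data frame per user, then stack them in a single call.
result <- do.call(rbind, lapply(ns, make_user))
print(result)
```

Stacking once at the end also avoids growing the data frame inside a loop, which copies it on every iteration.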

Related

Using reactive input within reactiveValue() function

I am new to shiny and trying to figure out some reactive stuff.
Currently this works for a static csv.
## function to return random row from twitter csv
tweetData <- read.csv('twitterData1.csv')
## stores reactive values
appVals <- reactiveValues(
tweet = tweetData[sample(nrow(tweetData), 1), ],
ratings = data.frame(tweet = character(), screen_name = character(), rating = character())
)
I need the same block of reactive values to work, but with a CSV file selected via input$file.
appVals <- reactiveValues(
csvName <- paste0('../path/', input$file),
tweetData <- read.csv(csvName),
tweet = tweetData[sample(nrow(tweetData), 1), ],
ratings = data.frame(tweet = character(), screen_name = character(), rating = character())
)
I get the error:
Warning: Error in : Can't access reactive value 'file' outside of reactive consumer.
I've tried moving things around but I keep getting stuck, help appreciated!

The error is telling you that a reactive input such as input$file can only be read inside a reactive context (for example reactive(), observe(), or observeEvent()).
First initialize the reactive values:
tweetData <- read.csv('twitterData1.csv')
appVals <- reactiveValues()
appVals$tweet <- tweetData[sample(nrow(tweetData), 1), ]
appVals$ratings <- data.frame(tweet = character(), screen_name = character(), rating = character())
Then update them inside an observer:
observeEvent(input$file,{
csvName <- paste0('../path/', input$file)
if (file.exists(csvName)) {
tweetData <- read.csv(csvName)
appVals$tweet = tweetData[sample(nrow(tweetData), 1), ]
appVals$ratings = data.frame(tweet = character(), screen_name = character(), rating = character())
}
})

How do I get attributes of list elements recursively in R?

I have a nested list structure with some elements (not all) having attributes that I want to keep (I've converted some xml output to a list). I'm trying to flatten it into a data.frame. The structure is something like this:
myList <- structure(list(address = structure(list(Address = list(Line = list("xxxxxxx"),
Line = list("xxxxxxx"), Line = list("xxxxxxx"), PostCode = list(
"XXX XXX"))), type = "Residential", verified = "Unverified"),
amount = structure(list(paymentAmount = list(maxAmount = list(
amountPart = structure(list(Amount = list("0.00")), component = "Standard"),
amountPart = structure(list(Amount = list("0.00")), component = "Thing1"),
amountPart = structure(list(Amount = list("0.00")), component = "Thing2"),
amountPart = structure(list(Amount = list("0.00")), component = "Thing3"),
amountPart = structure(list(Amount = list("0.00")), component = "Thing4"),
amountPart = structure(list(Amount = list("100.00")), component = "Thing5"),
amountPart = structure(list(Amount = list("0.00")), component = "Thing6")),
otherAmount = list(Amount = list("0.00")),
discount = list("0.00"),
transition = list(
"0.00"), discounts = list(), regularPayment = list(
"200.00")),
paymentInfo = list(income = structure(list(
net = list("0")), refNumber = "xxxxxxx"))),
paymentDate = "2021-03-22", startDate = "2021-02-16", endDate = "2021-03-15")),
type = "Normal")
I've tried rapply(myList, attributes) but that just seems to return NULL.
I've also tried using a loop in a recursive function:
get_attributes <- function(myList, attribute_list = NULL) {
if (is.null(attribute_list)) attribute_list <- list()
for (i in seq_along(myList)) {
if (is.list(myList[[i]])) {
attribute_list <- c(attribute_list, sapply(myList[[i]], attributes))
attribute_list <- get_attributes(myList[[i]], attribute_list)
} else {
attribute_list <- c(attribute_list, attributes(myList[[i]]))
}
}
attribute_list
}
Once I've got the list of attributes, I then want to put them in a one-row data.frame, something like data.frame(address.type = "Residential", address.verified = "Unverified", component.1 = "Standard", component.2 = "Thing1", ...).
The function with a loop is a bit messy and not very 'R', and it also seems to spit out lots of repeated elements that I don't want. Does anyone have any idea how to implement this more elegantly?
UPDATE
I've refined the loop implementation to this, which seems to work, but I just couldn't figure out how to use either purrr or one of the *apply functions in place of the loop:
get_attributes <- function(myList, attribute_list = NULL, prefix = NULL) {
if (is.null(attribute_list)) {
attribute_list <- list()
}
if (is.null(prefix)) {
prefix <- ""
}
for (i in seq_along(myList)) {
name <- names(myList)[i]
attrs <- attributes(myList[[i]])
if (!is.null(attrs)) {
names(attrs) <- paste0(prefix, name, ".", names(attrs))
attrs <- attrs[!grepl("\\.names$", names(attrs))]
attribute_list <- c(attribute_list, attrs)
}
if (is.list(myList[[i]])) {
attribute_list <- get_attributes(myList[[i]],
attribute_list,
paste0(prefix, name, "."))
}
}
attribute_list
}
do.call(data.frame, get_attributes(myList))

You can gather all the available attributes and then keep only the ones you are interested in.
library(purrr)
map_df(myList, ~map_chr(attributes(.x), toString))
#   names                      type        verified   paymentDate startDate  endDate
#   <chr>                      <chr>       <chr>      <chr>       <chr>      <chr>
# 1 Address                    Residential Unverified NA          NA         NA
# 2 paymentAmount, paymentInfo NA          NA         2021-03-22  2021-02-16 2021-03-15
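The purrr call above only looks at the top level of the list. For the nested attributes the question asks about, the loop can be replaced by a recursive Map() in base R. This is a minimal sketch under stated assumptions: get_attrs and the small toy list are hypothetical names introduced here, not part of the original answer, and the "names" attribute is dropped because it only repeats the list structure.

```r
# Recursively collect attributes (except "names") from a nested list,
# prefixing each with the path of element names that leads to it.
get_attrs <- function(x, prefix = "") {
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))  # unnamed lists get empty labels
  pieces <- Map(function(el, nm) {
    a <- attributes(el)
    a <- a[names(a) != "names"]
    if (length(a)) names(a) <- paste0(prefix, nm, ".", names(a))
    kids <- if (is.list(el)) get_attrs(el, paste0(prefix, nm, ".")) else list()
    c(a, kids)
  }, x, nms)
  # unname() stops c() from adding the element names a second time.
  do.call(c, unname(pieces))
}

# Toy input shaped like a small slice of the list in the question.
toy <- list(
  address = structure(list(Line = list("x")),
                      type = "Residential", verified = "Unverified"),
  amount = list(part = structure(list(Amount = list("0.00")),
                                 component = "Standard"))
)

res <- get_attrs(toy)
one_row <- as.data.frame(res, stringsAsFactors = FALSE)
print(one_row)
```

Repeated element names (such as the several amountPart entries in the question) would produce repeated column names, so a real implementation would probably need to make them unique first.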

Create argument list using lapply for do.call

I'm trying to pass a set of modified arguments from a larger function to arguments in a nested function. This is an argument supplied from the larger function:
time_dep_covariates_list = c(therapy_start = "Start of Therapy",
therapy_end = "End of Therapy")
I have these sets of constant arguments:
tmerge_args_1 <- alist(data1 = analytic_dataset,
data2 = analytic_dataset,
id = patientid,
tstop = adv_dx_to_event,
death_censor = event(adv_dx_to_event))
And I want to append these modified arguments to that argument list:
tmerge_args_2 <- lapply(1:length(time_dep_covariates_list), function(x){
tmerge_args <<- c(tmerge_args, alist('var' = tdc(var)) )
paste0(names(time_dep_covariates_list[x]), " = tdc(", names(time_dep_covariates_list[x]), ")")
})
> tdc_args
[[1]]
[1] "therapy_start = tdc(therapy_start)"
[[2]]
[1] "therapy_end = tdc(therapy_end)"
I want to create a do.call that handles the arguments like so:
count_process_form <- do.call(tmerge, args = c(tmerge_args_1,
tmerge_args_2))
That would be identical to the following:
tmerge(data1 = analytic_dataset, data2 = analytic_dataset,
id = patientid, tstop = adv_dx_to_event,
therapy_start = tdc(therapy_start), therapy_end = tdc(therapy_end))
It works fine with tmerge_args_1 by itself, but as the args_2 are character and not language elements, I get this error:
Error in (function (data1, data2, id, ..., tstart, tstop, options) :
all additional argments [sic] must have a name:
How can I modify the list I'm creating for args_2 so they're stored as arguments that do.call can understand? Or am I approaching this all wrong?
Thanks!
Here is a reproducible example:
analytic_dataset= data_frame(patientid = sample(1:1000,5),
adv_dx_to_event = sample(100:200, 5),
death_censor = sample(0:1,5, replace = T),
therapy_start = sample(1:20,5),
therapy_stop = sample(40:100,5))
The below would be passed in from a function:
time_dep_covariates_list = c(therapy_start = "Start of Therapy",
therapy_end = "End of Therapy")
tmerge_args_1 <- alist(data1 = analytic_dataset,
data2 = analytic_dataset,
id = patientid,
tstop = adv_dx_to_event,
death_censor = event(adv_dx_to_event))
do.call(tmerge,tmerge_args_1) #this works
tmerge_args_2 <- lapply(1:length(time_dep_covariates_list), function(x){
tmerge_args <<- c(tmerge_args, alist('var' = tdc(var)) )
paste0(names(time_dep_covariates_list[x])," = tdc(",names(time_dep_covariates_list[x]), ")")
})
do.call(tmerge, c(tmerge_args_1, tmerge_args_2)) # this doesn't
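One possible direction, sketched here without the survival package: build each extra argument as an unevaluated call with call() and as.name() rather than as a string, and name the list so that do.call sees named arguments. In this sketch show_args is a hypothetical toy function that only reports the expressions it receives; it stands in for tmerge, and tdc is never actually evaluated.

```r
time_dep_covariates_list <- c(therapy_start = "Start of Therapy",
                              therapy_end = "End of Therapy")

# Constant arguments, kept unevaluated as in the question.
base_args <- alist(id = patientid, tstop = adv_dx_to_event)

# One language object per covariate: tdc(therapy_start), tdc(therapy_end).
extra_args <- lapply(names(time_dep_covariates_list),
                     function(nm) call("tdc", as.name(nm)))
names(extra_args) <- names(time_dep_covariates_list)

# Hypothetical stand-in for tmerge: returns each argument expression as text.
show_args <- function(...) {
  vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
}

out <- do.call(show_args, c(base_args, extra_args))
print(out)
```

Because the list elements are calls rather than character strings, do.call inserts them into the constructed call as if they had been typed there, which is what a function using non-standard evaluation expects.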

Scraping Reddit in R with RedditExtractoR

I'm trying to scrape Reddit data (I'm pretty new to web scraping and half decent at R). The RedditExtractoR package has a nice function that does 90% of what I need, but it doesn't grab the "flair" associated with users who make comments. I'm trying to play around with the package's function but I'm a bit over my head.
There are examples of Reddit threads with flairs here. I think I'm looking for the text in these bits of XML:
<span class="flair flair-orthodox" title="Eastern Orthodox">Eastern Orthodox</span>
I've pasted the code from the reddit_content() function along with comments where I think the extra code should go, but I'm not quite sure where to go from here. At the moment the function returns a data frame with columns for the comment, time stamp, user, etc. I need it to also produce a column with user flairs if they exist. Thanks in advance!
redd_content_flair <- function (URL, wait_time = 2)
{
if (is.null(URL) | length(URL) == 0 | !is.character(URL)) {
stop("invalid URL parameter")
}
GetAttribute = function(node, feature) {
Attribute = node$data[[feature]]
replies = node$data$replies
reply.nodes = if (is.list(replies))
replies$data$children
else NULL
return(list(Attribute, lapply(reply.nodes, function(x) {
GetAttribute(x, feature)
})))
}
get.structure = function(node, depth = 0) {
if (is.null(node)) {
return(list())
}
filter = is.null(node$data$author)
replies = node$data$replies
reply.nodes = if (is.list(replies))
replies$data$children
else NULL
return(list(paste0(filter, " ", depth), lapply(1:length(reply.nodes),
function(x) get.structure(reply.nodes[[x]], paste0(depth,
"_", x)))))
}
data_extract = data.frame(id = numeric(), structure = character(),
post_date = as.Date(character()), comm_date = as.Date(character()),
num_comments = numeric(), subreddit = character(), upvote_prop = numeric(),
post_score = numeric(), author = character(), user = character(),
comment_score = numeric(), controversiality = numeric(),
comment = character(), title = character(), post_text = character(),
link = character(), domain = character(),
#flair = character(),
URL = character())
pb = utils::txtProgressBar(min = 0, max = length(URL), style = 3)
for (i in seq(URL)) {
if (!grepl("^https?://(.*)", URL[i]))
URL[i] = paste0("https://www.", gsub("^.*(reddit\\..*$)",
"\\1", URL[i]))
if (!grepl("\\?ref=search_posts$", URL[i]))
URL[i] = paste0(gsub("/$", "", URL[i]), "/?ref=search_posts")
X = paste0(gsub("\\?ref=search_posts$", "", URL[i]),
".json?limit=500")
raw_data = tryCatch(RJSONIO::fromJSON(readLines(X, warn = FALSE)),
error = function(e) NULL)
if (is.null(raw_data)) {
Sys.sleep(min(1, wait_time))
raw_data = tryCatch(RJSONIO::fromJSON(readLines(X,
warn = FALSE)), error = function(e) NULL)
}
if (is.null(raw_data) == FALSE) {
meta.node = raw_data[[1]]$data$children[[1]]$data
main.node = raw_data[[2]]$data$children
if (min(length(meta.node), length(main.node)) > 0) {
structure = unlist(lapply(1:length(main.node),
function(x) get.structure(main.node[[x]], x)))
TEMP = data.frame(id = NA, structure = gsub("FALSE ",
"", structure[!grepl("TRUE", structure)]),
post_date = format(as.Date(as.POSIXct(meta.node$created_utc,
origin = "1970-01-01")), "%d-%m-%y"),
comm_date = format(as.Date(as.POSIXct(unlist(lapply(main.node,
function(x) {
GetAttribute(x, "created_utc")
})), origin = "1970-01-01")), "%d-%m-%y"),
num_comments = meta.node$num_comments,
subreddit = ifelse(is.null(meta.node$subreddit),
"UNKNOWN", meta.node$subreddit), upvote_prop = meta.node$upvote_ratio,
post_score = meta.node$score, author = meta.node$author,
user = unlist(lapply(main.node, function(x) {
GetAttribute(x, "author")
})),
comment_score = unlist(lapply(main.node,
function(x) {
GetAttribute(x, "score")
})),
controversiality = unlist(lapply(main.node,
function(x) {
GetAttribute(x, "controversiality")
})),
comment = unlist(lapply(main.node, function(x) {
GetAttribute(x, "body")
})),
title = meta.node$title, post_text = meta.node$selftext,
link = meta.node$url, domain = meta.node$domain,
#flair = unlist(lapply(main.node, function(x) {GetAttribute(x, "flair")})),
URL = URL[i], stringsAsFactors = FALSE)
TEMP$id = 1:nrow(TEMP)
if (dim(TEMP)[1] > 0 & dim(TEMP)[2] > 0)
data_extract = rbind(TEMP, data_extract)
else print(paste("missed", i, ":", URL[i]))
}
}
utils::setTxtProgressBar(pb, i)
Sys.sleep(min(2, wait_time))
}
close(pb)
return(data_extract)
}
Edit: I'd also like to grab the URL for the "parent" comment, which looks like it's in tags like
<p class="parent"><a name="d3t1p1r"></a></p>

I managed to come up with an ad hoc solution. I'll post it here for posterity. The issue is the function as-is wasn't set up to handle NULL JSON values. It was a quick fix.
About midway down there are two raw_data = lines. You need to add the nullValue = 'your null text' argument to the fromJSON function. Then you can add whatever metadata you wanted to both the empty data frame and the TEMP data frame, using the same construction as elsewhere. In the function below I've added both the user's flair text and the ID of the parent comment.
(Note, the wonky indenting is from the original function...I've left it as is to prevent accidentally changing something.)
reddit.fixed <- function (URL, wait_time = 2)
{
if (is.null(URL) | length(URL) == 0 | !is.character(URL)) {
stop("invalid URL parameter")
}
GetAttribute = function(node, feature) {
Attribute = node$data[[feature]]
replies = node$data$replies
reply.nodes = if (is.list(replies))
replies$data$children
else NULL
return(list(Attribute, lapply(reply.nodes, function(x) {
GetAttribute(x, feature)
})))
}
get.structure = function(node, depth = 0) {
if (is.null(node)) {
return(list())
}
filter = is.null(node$data$author)
replies = node$data$replies
reply.nodes = if (is.list(replies))
replies$data$children
else NULL
return(list(paste0(filter, " ", depth), lapply(1:length(reply.nodes),
function(x) get.structure(reply.nodes[[x]], paste0(depth,
"_", x)))))
}
data_extract = data.frame(id = numeric(), structure = character(),
post_date = as.Date(character()), comm_date = as.Date(character()),
num_comments = numeric(), subreddit = character(), upvote_prop = numeric(),
post_score = numeric(), author = character(), user = character(),
comment_score = numeric(), controversiality = numeric(),
comment = character(), title = character(), post_text = character(),
link = character(), domain = character(), URL = character(), flair = character(), parent = character())
pb = utils::txtProgressBar(min = 0, max = length(URL), style = 3)
for (i in seq(URL)) {
if (!grepl("^https?://(.*)", URL[i]))
URL[i] = paste0("https://www.", gsub("^.*(reddit\\..*$)",
"\\1", URL[i]))
if (!grepl("\\?ref=search_posts$", URL[i]))
URL[i] = paste0(gsub("/$", "", URL[i]), "/?ref=search_posts")
X = paste0(gsub("\\?ref=search_posts$", "", URL[i]),
".json?limit=500")
raw_data = tryCatch(RJSONIO::fromJSON(readLines(X, warn = FALSE), nullValue = "none"),
error = function(e) NULL)
if (is.null(raw_data)) {
Sys.sleep(min(1, wait_time))
raw_data = tryCatch(RJSONIO::fromJSON(readLines(X,
warn = FALSE), nullValue = "none"), error = function(e) NULL)
}
if (is.null(raw_data) == FALSE) {
meta.node = raw_data[[1]]$data$children[[1]]$data
main.node = raw_data[[2]]$data$children
if (min(length(meta.node), length(main.node)) > 0) {
structure = unlist(lapply(1:length(main.node),
function(x) get.structure(main.node[[x]], x)))
TEMP = data.frame(id = NA, structure = gsub("FALSE ",
"", structure[!grepl("TRUE", structure)]),
post_date = format(as.Date(as.POSIXct(meta.node$created_utc,
origin = "1970-01-01")), "%d-%m-%y"), comm_date = format(as.Date(as.POSIXct(unlist(lapply(main.node,
function(x) {
GetAttribute(x, "created_utc")
})), origin = "1970-01-01")), "%d-%m-%y"),
num_comments = meta.node$num_comments, subreddit = ifelse(is.null(meta.node$subreddit),
"UNKNOWN", meta.node$subreddit), upvote_prop = meta.node$upvote_ratio,
post_score = meta.node$score, author = meta.node$author,
user = unlist(lapply(main.node, function(x) {
GetAttribute(x, "author")
})), comment_score = unlist(lapply(main.node,
function(x) {
GetAttribute(x, "score")
})), controversiality = unlist(lapply(main.node,
function(x) {
GetAttribute(x, "controversiality")
})), comment = unlist(lapply(main.node, function(x) {
GetAttribute(x, "body")
})), title = meta.node$title, post_text = meta.node$selftext,
link = meta.node$url, domain = meta.node$domain,
URL = URL[i],
flair = unlist(lapply(main.node, function(x) {
GetAttribute(x, "author_flair_text")
})),
parent = unlist(lapply(main.node, function(x) {GetAttribute(x, "parent_id")})),
stringsAsFactors = FALSE)
TEMP$id = 1:nrow(TEMP)
if (dim(TEMP)[1] > 0 & dim(TEMP)[2] > 0)
data_extract = rbind(TEMP, data_extract)
else print(paste("missed", i, ":", URL[i]))
}
}
utils::setTxtProgressBar(pb, i)
Sys.sleep(min(2, wait_time))
}
close(pb)
return(data_extract)
}
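To see why NULL JSON values matter in a function like this, note that unlist() silently drops NULL elements, so a column built from a field that is sometimes missing ends up shorter than the others. The base R sketch below uses made-up user names and flairs and does not call RJSONIO; it only illustrates the length mismatch and the effect of substituting a placeholder, which is presumably what nullValue = "none" achieves during parsing.

```r
# Three comments; the second author has no flair (parsed as NULL).
authors <- list("alice", "bob", "carol")
flairs <- list("Eastern Orthodox", NULL, "Catholic")

# unlist() drops the NULL, so the two columns no longer line up.
n_authors <- length(unlist(authors))
n_flairs <- length(unlist(flairs))

# Building the data frame directly fails on the length mismatch.
built_ok <- tryCatch({
  data.frame(user = unlist(authors), flair = unlist(flairs))
  TRUE
}, error = function(e) FALSE)

# Replacing NULL with a placeholder keeps one value per comment.
flairs_filled <- lapply(flairs, function(f) if (is.null(f)) "none" else f)
df <- data.frame(user = unlist(authors),
                 flair = unlist(flairs_filled),
                 stringsAsFactors = FALSE)
print(df)
```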

Twitter GET not working with since_id

Working in R, but that shouldn't really matter.
I want to gather all tweets after : https://twitter.com/ChrisChristie/status/663046613779156996
So Tweet ID : 663046613779156996
base = "https://api.twitter.com/1.1/statuses/user_timeline.json?"
## count (value taken from the resulting URL shown below)
count = "count=200"
## contributor_details
contributor_details = "contributor_details=true"
## include_rts
include_rts = "include_rts=true"
## exclude_replies
exclude_replies = "exclude_replies=false"
queryName = "chrischristie"
query = paste("q=", queryName, sep="")
secondary_url = paste(query, count, contributor_details,include_rts,exclude_replies, sep="&")
final_url = paste(base, secondary_url, sep="")
timeline = GET(final_url, sig)
This (the above) works. There is no since_id. The URL comes out to be
"https://api.twitter.com/1.1/statuses/user_timeline.json?q=chrischristie&count=200&contributor_details=true&include_rts=true&exclude_replies=false"
The below does not, just by adding in the following
cur_since_id_url = "since_id=663046613779156996"
secondary_url = paste(query, count,
contributor_details,include_rts,exclude_replies,cur_since_id_url, sep="&")
final_url = paste(base, secondary_url, sep="")
timeline = GET(final_url, sig)
The URL for the above is
"https://api.twitter.com/1.1/statuses/user_timeline.json?q=chrischristie&count=200&contributor_details=true&include_rts=true&exclude_replies=false&since_id=663046613779156992"

This seems to work:
require(httr)
myapp <- oauth_app(
"twitter",
key = "......",
secret = ".......")
twitter_token <- oauth1.0_token(oauth_endpoints("twitter"), myapp)
req <- GET("https://api.twitter.com/1.1/statuses/user_timeline.json",
query = list(
screen_name="chrischristie",
count=10,
contributor_details=TRUE,
include_rts=TRUE,
exclude_replies=FALSE,
since_id=663046613779156992),
config(token = twitter_token))
content(req)
Have a look at GET statuses/user_timeline
