Lists to Dataframe in R - r

I am using the following API endpoint to pull a data table.
"https://statsapi.web.nhl.com/api/v1/game/2022020002/feed/live"
I am able to make a successful connection to the API (RESPONSE = 200) using the code below:
LIVE <- GET("https://statsapi.web.nhl.com/api/v1/game/2022020002/feed/live")
The API provides JSON data which I flatten with the following:
LIVE2 <- rawToChar(LIVE$content) %>% jsonlite::fromJSON(., flatten = TRUE)
The result is a number of lists, and when I try to convert it to a data frame I am unsuccessful.
LIVE2 <- rawToChar(LIVE$content) %>% jsonlite::fromJSON(., flatten = TRUE) %>% as.data.frame(.)
Here is the error I get:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 1, 0
If someone would be able to help me figure out how to solve this final step of converting the lists of data from the API to an R dataframe I would be very grateful.
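A likely cause, assuming the parsed response mixes scalars, empty lists, and nested tables: as.data.frame() treats every top-level element as a column, so elements that would give 1 row and elements that would give 0 rows cannot be combined. The sketch below uses a small made-up list (parsed, officials, and plays are illustrative stand-ins, not the real NHL payload) to show the failure and one way around it, which is to convert only the rectangular sub-element you need.

```r
# Hypothetical stand-in for the list returned by jsonlite::fromJSON()
parsed <- list(
  gamePk = 2022020002L,            # scalar: would become 1 row
  officials = list(),              # empty list: would become 0 rows
  plays = data.frame(              # already rectangular: 2 rows
    event = c("Faceoff", "Shot"),
    period = c(1L, 1L)
  )
)

# Converting the whole list fails because the implied row counts differ
whole <- tryCatch(as.data.frame(parsed), error = function(e) e)
inherits(whole, "error")

# Inspect the top-level structure, then convert just the piece of interest
str(parsed, max.level = 1)
plays_df <- as.data.frame(parsed$plays)
```

With the real response, str(LIVE2, max.level = 1) should show which elements are tables and which are scalars or empty lists.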

Related

Data frame with different number of columns in a for loop

I am new to R and got stuck on a for loop for my web scraping project using rvest.
I am trying to extract notes (nested within scorecard URL) from ESPN cricinfo website. My code is this;
library(dplyr)
library(rvest)
get_notes = function(score){
score_page = read_html(score)
score_notes = score_page %>% html_nodes(".ds-mt-3 .ds-mb-4 .ds-p-4,
.ds-mb-4~ .ds-mb-4+ .ds-mb-4 .ds-p-4,
.ds-mt-3 .ds-text-typo-title .ds-text-tight-s") %>% html_text()
}
notesdata = data.frame()
for (page_result in c(2019,2020,2021)){
link = paste0("https://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=",
page_result,";type=year")
pages = read_html(link)
scorecard = pages %>% html_nodes("td:nth-child(7) .data-link") %>% html_text()
match_url = pages %>% html_nodes("td:nth-child(7) .data-link") %>%
html_attr("href") %>%
paste("https://www.espncricinfo.com/",., sep="")
notes = sapply(match_url, FUN = get_notes, USE.NAMES = FALSE)
notesdata = rbind(notesdata,
data.frame(t(notes),
desperse.level = 0)
)
print(paste("page:", page_result))
}
When I run this code, I get the following error message;
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
Can someone help me create a data frame (or anything that I can turn into csv file)? Thanks a lot!
It is not easy to run your "reproducible" example because the oget_notes function returns very large matrices. However, I think your problem is related to the following line:
onotes = sapply(omatch_url, FUN = oget_notes, USE.NAMES = FALSE)
That line returns a matrix of dimensions 3xL, where L is the length of omatch_url.
The other objects (oteam_one, oteam_two, oMatch_dates, oground and oscorecard) are of length L.
When you run data.frame(oteam_one,oteam_two,oMatch_dates,oground,oscorecard,onotes), the data.frame function expects vectors of the same length (L) or matrices with as many rows as the length of the vectors (L x #).
So, my suggestion would be to change your line 32 from this:
onotesdata = rbind(onotesdata,
data.frame(oteam_one,
oteam_two,
oMatch_dates,
oground,
oscorecard,
onotes))
To this:
onotesdata = rbind(onotesdata,
data.frame(oteam_one,
oteam_two,
oMatch_dates,
oground,
oscorecard,
t(onotes)),
deparse.level = 0)
Again, I couldn't run your script because of the output of oget_notes, so I don't know whether this will solve your issue.
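To make the shape problem concrete, here is a minimal offline sketch. The function get_three and the vectors urls and scorecard are hypothetical stand-ins for oget_notes, omatch_url, and oscorecard; the only assumption is that each call returns exactly three strings.

```r
# Hypothetical stand-in: returns three "notes" per URL
get_three <- function(u) paste0(u, c("-toss", "-series", "-player"))

urls <- c("u1", "u2", "u3", "u4")          # L = 4
scorecard <- c("s1", "s2", "s3", "s4")     # also length L

# sapply() binds each result as a COLUMN, giving a 3 x L matrix
notes <- sapply(urls, FUN = get_three, USE.NAMES = FALSE)
dim(notes)

# 4 rows (scorecard) cannot be combined with 3 rows (notes)
bad <- tryCatch(data.frame(scorecard, notes), error = function(e) e)
inherits(bad, "error")

# Transposing gives L x 3, which lines up with the other vectors
good <- data.frame(scorecard, t(notes))
dim(good)
```

If the function can return a different number of strings per URL, sapply() gives back a list rather than a matrix, and the transpose alone will not be enough.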

How can I transform JSON to DF?

I'm working in RStudio with 450 JSON files, all in my workspace. Some of the JSON files work fine, but with others, like this one (https://drive.google.com/file/d/1DsezCmN8_8iLNCAsLZiRnxTrwnWu6LkD/view?usp=sharing , a 296 KB JSON), when I try to convert the field tm to a data frame I get this error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 0, 1
The code that I use is
JSONList <- rjson::fromJSON(file = "2.json", simplify = F)
DF <- as.data.frame(JSONList$tm)
With the files that are OK I obtain 1 observation of 5168 variables.
How can I avoid this problem with some files?
Thanks
Another possibility I thought of is to select only the rows that I need
candidatos = list(
"name",
"score",
"tot_sFieldGoalsMade",
"tot_sFieldGoalsAttempted",
"tot_sTwoPointersMade",
"tot_sTwoPointersAttempted",
"tot_sThreePointersMade",
"tot_sThreePointersAttempted",
"tot_sFreeThrowsMade",
"tot_sFreeThrowsAttempted",
"tot_sReboundsDefensive",
"tot_sReboundsOffensive",
"tot_sReboundsTotal",
"tot_sAssists",
"tot_sBlocks",
"tot_sTurnovers",
"tot_sFoulsPersonal",
"tot_sPointsInThePaint",
"tot_sPointsSecondChance",
"tot_sPointsFromTurnovers",
"tot_sBenchPoints",
"tot_sPointsFastBreak",
"tot_sSteals"
)
ListColum<-map(candidatos, function(x){
as.data.frame(data$tm$"2"$x)
} )
But R gives me a list of 23 data frames with no elements
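Two things are probably going on here, sketched below on a made-up list (team, coach, and fill_empty are hypothetical; the real tm element is much larger). First, a NULL or otherwise empty element counts as 0 rows, which is what as.data.frame() complains about, so one option is to replace empty elements with NA before converting. Second, data$tm$"2"$x looks for an element literally named "x" rather than using the value of x; indexing with [[x]] uses the value.

```r
# Hypothetical, much smaller stand-in for JSONList$tm[["2"]]
team <- list(name = "Example", score = 80, coach = NULL, tot_sAssists = 17)

# The NULL element implies 0 rows, the others 1 row
raw_try <- tryCatch(as.data.frame(team), error = function(e) e)
inherits(raw_try, "error")

# Recursively replace zero-length elements (NULL, empty lists) with NA
fill_empty <- function(x) {
  if (length(x) == 0) return(NA)
  if (is.list(x)) return(lapply(x, fill_empty))
  x
}
DF <- as.data.frame(fill_empty(team))

# `$x` searches for the name "x"; `[[x]]` uses the string stored in x
wanted <- c("name", "score", "tot_sAssists")
by_dollar  <- lapply(wanted, function(x) team$x)     # every result is NULL
by_bracket <- lapply(wanted, function(x) team[[x]])  # the actual values

# Subset by name first, then convert
picked <- as.data.frame(fill_empty(team[wanted]))
```

Whether NA is the right placeholder depends on what the empty fields mean in your files.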

Using API in R and converting to dataframe format

I'm basically trying to call an API to retrieve weather information from a government website.
library(data.table)
library(jsonlite)
library(httr)
base<-"https://api.data.gov.sg/v1/environment/rainfall"
date1<-"2020-01-25"
call1<-paste(base,"?","date","=",date1,sep="")
get_rainfall<-GET(call1)
get_rainfall_text<-content(get_rainfall,"text")
get_rainfall_json <- fromJSON(get_rainfall_text, flatten = TRUE)
get_rainfall_df <- as.data.frame(get_rainfall_json)
I'm getting an error
"Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 52, 287, 1"
Not too sure how to resolve this. I'm trying to format the retrieved data into a data frame so I can make sense of the readings.
Your "get_rainfall_json" object comes back as a "list". Trying to turn this into a data frame is where you are getting the error. If you specify the "items" object within the list, your error is resolved! (The outcome of this looks like it has some more embedded data within objects... So you'll have to parse through that into a format you're interested in.)
get_rainfall_df <- as.data.frame(get_rainfall_json$items)
Update
To loop through the nested data, here is one way you could do it. The code loops through each row, extracts the list in that row, turns it into a data frame, and appends it to "df". You are then left with one final df with all the data in one place.
library(data.table)
library(jsonlite)
library(httr)
library(dplyr)
base <- "https://api.data.gov.sg/v1/environment/rainfall"
date1 <- "2020-01-25"
call1 <- paste(base, "?", "date", "=", date1, sep = "")
get_rainfall <- GET(call1)
get_rainfall_text <- content(get_rainfall,"text")
get_rainfall_json <- fromJSON(get_rainfall_text, flatten = TRUE)
get_rainfall_df <- as.data.table(get_rainfall_json$items)
df <- data.frame()
for (row in 1:nrow(get_rainfall_df)) {
new_date <- get_rainfall_df[row, ]$readings[[1]]
colnames(new_date) <- c("stationid", "value")
date <- get_rainfall_df[row, ]$timestamp
new_date$date <- date
df <- bind_rows(df, new_date)
}
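As an alternative to growing df inside the loop, the same reshaping can be sketched without the API call. The items object below is a made-up stand-in for get_rainfall_json$items (two timestamps, two stations each), built by hand so the example runs offline; the column names are assumptions based on the code above.

```r
# Hypothetical stand-in for get_rainfall_json$items
items <- data.frame(timestamp = c("2020-01-25T00:05", "2020-01-25T00:10"),
                    stringsAsFactors = FALSE)
items$readings <- list(
  data.frame(station_id = c("S1", "S2"), value = c(0.0, 0.2)),
  data.frame(station_id = c("S1", "S2"), value = c(0.4, 0.0))
)

# Tag each nested table with its timestamp, then stack them once
tagged <- Map(function(r, ts) {
  r$date <- ts
  r
}, items$readings, items$timestamp)
long <- do.call(rbind, tagged)
```

Stacking once at the end avoids copying the accumulated data frame on every iteration, which can matter when there are a few hundred timestamps.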

How do I iterate over a range of revision IDs when querying WikipediR?

I am using WikipediR to query revision IDs to see if the very next edit is a 'rollback' or an 'undo'.
I am interested in the tag and revision comment to identify whether the edit was undone or rolled back.
My code for a single revision ID is:
library(WikipediR)
wp_diff<- revision_diff("en", "wikipedia", revisions = "883987486", properties = c("tags", "comment"), direction = "next", clean_response = T, as_wikitext=T)
I then convert the output of this to a df using the code
library(dplyr)
library(tibble)
diff <- do.call(rbind, lapply(wp_diff, as.data.frame, stringasFactors=FALSE))
This works great for a single revision id.
I am wondering how I would loop or map over a vector of many revision IDs.
I tried
vec <- c("883987486","911412795")
for (i in 1:length(vec)){
wp_diff[i]<- revision_diff("en", "wikipedia", revisions = i, properties = c("tags", "comment"), direction = "next", clean_response = T, as_wikitext=T)
}
But this creates the error
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 1, 0
When I try to convert the output list to a dataframe.
Does anybody have any suggestions? I am not sure how to proceed.
Thanks.
Try the following code:
# Make a function
make_diff_df <- function(rev){
wp_diff <- revision_diff("en", "wikipedia", revisions = rev,
properties = c("tags", "comment"),
direction = "next", clean_response = TRUE,
as_wikitext = TRUE)
DF <- do.call(rbind, lapply(wp_diff, as.data.frame, stringsAsFactors=FALSE))
# Define the names of the DF
names(DF) <- c("pageid","ns","title","revisions.diff.from",
"revisions.diff.to","revisions.diff..",
"revisions.comment","revisions..mw.rollback.")
return(DF)
}
vec <- c("883987486","911412795")
# Use do.call and lapply with the function
do.call("rbind",lapply(vec,make_diff_df))
Note that you have to fix the names of the DF inside the make_diff_df function so that "rbind" inside do.call can work. The names for the two revisions from the example are pretty similar.
Hope this helps.
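The general pattern can be sketched offline as follows. fetch_one is a hypothetical stand-in for the revision_diff() call plus conversion; it only echoes its input. The loop also shows why passing i instead of vec[i] (as in the question) sends the position 1, 2, ... rather than the revision ID.

```r
# Hypothetical stand-in for a function that queries one revision ID
fetch_one <- function(rev_id) {
  data.frame(revid = rev_id, id_length = nchar(rev_id),
             stringsAsFactors = FALSE)
}

vec <- c("883987486", "911412795")

# `i` is the position; `vec[i]` is the revision ID itself
for (i in seq_along(vec)) {
  message("index: ", i, "  id: ", vec[i])
}

# One data frame per ID, then stack them; rbind() needs matching column names
results <- lapply(vec, fetch_one)
combined <- do.call(rbind, results)
```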

Fuzzy lookup for two different data tables in r by using stringdist_left_Join

I am getting the error below
> Error in stringdist::stringdist(v1, v2, method = method, ...) :
> unused argument (distance_col = "dist")
while running this:
stringdist_left_join(data1, data2, by = "ADDRESS", max_dist = 0.50, ignore_case = FALSE, method = "jw",distance_col ="dist")
Without distance_col = "dist" the call runs and I get output, but without the matching score.
So I need the matching score for the ADDRESS column of data1 and data2 through the distance_col = "dist" argument.
Please help me get the matching score.
