How can I transform JSON to a data frame? - r

I'm working in RStudio with 450 JSON files, all of them in my workspace. Some of the files are fine, but with others, like this one (https://drive.google.com/file/d/1DsezCmN8_8iLNCAsLZiRnxTrwnWu6LkD/view?usp=sharing, a 296 KB JSON), I get this error when I try to convert the field tm to a data frame:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 0, 1
The code that I use is:
JSONList <- rjson::fromJSON(file = "2.json", simplify = F)
DF <- as.data.frame(JSONList$tm)
With the files that are OK, I obtain 1 observation of 5168 variables.
How can I avoid this problem with the other files?
Thanks

Another possibility I thought of is to select only the fields that I need:
candidatos = list(
"name",
"score",
"tot_sFieldGoalsMade",
"tot_sFieldGoalsAttempted",
"tot_sTwoPointersMade",
"tot_sTwoPointersAttempted",
"tot_sThreePointersMade",
"tot_sThreePointersAttempted",
"tot_sFreeThrowsMade",
"tot_sFreeThrowsAttempted",
"tot_sReboundsDefensive",
"tot_sReboundsOffensive",
"tot_sReboundsTotal",
"tot_sAssists",
"tot_sBlocks",
"tot_sTurnovers",
"tot_sFoulsPersonal",
"tot_sPointsInThePaint",
"tot_sPointsSecondChance",
"tot_sPointsFromTurnovers",
"tot_sBenchPoints",
"tot_sPointsFastBreak",
"tot_sSteals"
)
ListColum<-map(candidatos, function(x){
as.data.frame(data$tm$"2"$x)
} )
But R gives me a list of 23 data frames with no elements.
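Two things may be going on here, and the sketch below illustrates both on a toy list (the structure and values are invented, not taken from the linked file). First, as.data.frame fails with "differing number of rows" when some field is NULL or an empty list, because that field contributes 0 rows while the others contribute 1. Second, inside the function, data$tm$"2"$x looks for a field literally called "x" instead of using the value of x; [[x]] does the latter. The helper fill_empty is a hypothetical name introduced only for this example.

```r
# Toy stand-in for the parsed JSON; field names and values are invented
data <- list(tm = list(
  `1` = list(name = "A", score = 80, coach = list()),   # empty field
  `2` = list(name = "B", score = 75, coach = "Smith")
))

# Hypothetical helper: recursively replace zero-length elements
# (NULL or empty lists) with NA so every leaf has length 1
fill_empty <- function(x) {
  if (length(x) == 0) return(NA)
  if (is.list(x)) return(lapply(x, fill_empty))
  x
}

# Whole-structure conversion: one row, one column per leaf value
DF <- as.data.frame(fill_empty(data$tm))

# Field selection: [[x]] uses the value of x, $x looks for a field named "x"
candidatos <- c("name", "score", "coach")
picked <- lapply(candidatos, function(x) fill_empty(data$tm$"1"[[x]]))
names(picked) <- candidatos
DF_sel <- as.data.frame(picked)
```

Whether NA is the right replacement for an empty field depends on the data; it is used here only so that every column has exactly one row.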

Related

Lists to Dataframe in R

I am using the following API endpoint to pull a data table.
"https://statsapi.web.nhl.com/api/v1/game/2022020002/feed/live"
I am able to make a successful connection to the API (RESPONSE = 200) using the code below:
LIVE <- GET("https://statsapi.web.nhl.com/api/v1/game/2022020002/feed/live")
The API provides JSON data which I flatten with the following:
LIVE2 <- rawToChar(LIVE$content) %>% jsonlite::fromJSON(., flatten = TRUE)
The result is a number of lists, and when I try to convert it to a data frame I am unsuccessful.
LIVE2 <- rawToChar(LIVE$content) %>% jsonlite::fromJSON(., flatten = TRUE) %>% as.data.frame(.)
Here is the error I get:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 1, 0
If someone would be able to help me figure out how to solve this final step of converting the lists of data from the API to an R dataframe I would be very grateful.
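A minimal sketch of one way to approach this, assuming the response mixes metadata with one or more tabular parts: instead of coercing the whole parsed list, check which elements are already data frames and extract those. The JSON string below is a made-up toy, not the NHL response, and the names meta and plays are assumptions for illustration.

```r
library(jsonlite)

# Toy JSON with a metadata object and an array of records (invented data)
txt <- '{"meta": {"id": 1, "notes": []},
         "plays": [{"event": "Shot", "period": 1},
                   {"event": "Goal", "period": 2}]}'

parsed <- fromJSON(txt, flatten = TRUE)

# The parsed object is a list; only some elements are tabular
is_df <- vapply(parsed, is.data.frame, logical(1))

# Extract the tabular element instead of calling as.data.frame on everything
plays <- parsed[["plays"]]
```

With a real response, str(parsed, max.level = 2) is a quick way to see where the tabular parts are before choosing what to extract.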

Trying to use multimerge in R and get error message

I am trying to do a [multimerge][1] but I keep getting an error message:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 119, 75, 78, 71, 74
I thought merge would be able to handle different row numbers, which is why I am using merge rather than cbind (and because I need each row in each column to match a specific rat ID).
Here is what I've got:
datalist = list()
modeldf <- d %>%
  select("Dam", "Rat", "Group", "Sex", "CS.NCS")
datalist <- list(modeldf) #adding to list
for (i in colnames(d[c(6:47)])) {
  #STUFF IN FOR LOOP
  resultcolumn <- newdf %>% #final result dataframe
    select("Rat", all_of(i))
  datalist[[i]] <- resultcolumn #adding result dataframe to list
}
resultsdf <- merge(datalist, by="Rat", all.x=TRUE, sort = FALSE)
The datalist is a perfect list of data frames, and the resultcolumn outputs a perfect data frame (I checked with class()).
What is the problem and how do I fix it?
edit: typo
[1]: https://cran.r-project.org/web/packages/Orcs/vignettes/merge.html
Turns out I'm an idiot. Solved by installing this package and using multimerge.
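A likely explanation, assuming base R's merge was what got called: base merge works on two data frames, and a plain list passed as its first argument is coerced with as.data.frame, which tries to bind the data frames column-wise and fails when their row counts differ. If installing an extra package is not an option, the same result can be sketched in base R by folding merge over the list with Reduce. The three data frames below are invented toy data.

```r
# Toy data frames sharing a "Rat" key, with different numbers of rows
df1 <- data.frame(Rat = 1:4, a = c(10, 20, 30, 40))
df2 <- data.frame(Rat = c(1, 3), b = c("x", "y"))
df3 <- data.frame(Rat = c(2, 3, 4), c = c(TRUE, FALSE, TRUE))
datalist <- list(df1, df2, df3)

# Merge pairwise from left to right, keeping every row of the first data frame
resultsdf <- Reduce(
  function(x, y) merge(x, y, by = "Rat", all.x = TRUE, sort = FALSE),
  datalist
)
```

Rats that are missing from a later data frame get NA in that data frame's columns.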

Why doesn't dataframe set string column to non-factor despite setting the option, stringsAsFactors = F?

Hi, I would like to generate a frequency table with the strings as non-factors, but I get this result:
> str ( data.frame ( table ( iris$Species) , stringsAsFactors = F ) )
'data.frame': 3 obs. of 2 variables:
$ Var1: Factor w/ 3 levels "setosa","versicolor",..: 1 2 3
$ Freq: int 50 50 50
The only way I can do this now is to save the data frame (as temp, for example) and convert the Var1 column with as.character().
In addition, setting options(stringsAsFactors = FALSE) does not remove the factors either.
Thanks!
You can use as.data.frame instead to get the desired behavior (as already mentioned). But this does not answer your question why data.frame fails.
The reason is not obvious and I had to look inside the source code of data.frame to find it. The reason is that table returns a single named numeric vector. I.e. the "Var1" column is inferred from the names of a numeric vector, not from a character vector.
The following lines in data.frame then come into play
xi <- if (is.character(x[[i]]) || is.list(x[[i]]))
as.data.frame(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors)
else as.data.frame(x[[i]], optional = TRUE)
Here we see that as.data.frame gets called without invoking stringsAsFactors = F, because it has detected a numeric vector.
Addendum:
An additional question arose in the comments. Namely that "using as.data.frame without explicitly setting stringAsfactors = F still fails despite setting options(stringsAsFactors = FALSE)"
The reason for this is separate, and could arguably be described as a bug in the R source code. Here is the function definition header for as.data.frame.table which is the method that gets dispatched for a table object:
function (x, row.names = NULL, ..., responseName = "Freq",
stringsAsFactors = TRUE,
sep = "", base = list(LETTERS))
Notice that stringsAsFactors = TRUE is taken as the default value - the value set in options never gets checked. Compare this to other as.data.frame methods which typically defer to the value set in options as their default. For example, as.data.frame.list is defined like this:
function (x, row.names = NULL, optional = FALSE, ..., cut.names = FALSE,
col.names = names(x), fix.empty.names = TRUE,
stringsAsFactors = default.stringsAsFactors())
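For reference, the workaround mentioned at the start of this answer can be sketched as follows. Passing stringsAsFactors = FALSE explicitly to as.data.frame reaches the table method directly, so the names are kept as character.

```r
# Convert the table directly and ask for character columns explicitly
freq <- as.data.frame(table(iris$Species), stringsAsFactors = FALSE)

# Var1 is now a character column, Freq is the integer count
str(freq)
```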

How do I iterate over a range of revision IDs when querying WikipediR?

I am using WikipediR to query revision IDs to see if the very next edit is a 'rollback' or an 'undo'.
I am interested in the tag and revision comment to identify whether the edit was undone or rolled back.
My code for a single revision ID is:
library(WikipediR)
wp_diff<- revision_diff("en", "wikipedia", revisions = "883987486", properties = c("tags", "comment"), direction = "next", clean_response = T, as_wikitext=T)
I then convert the output to a data frame using:
library(dplyr)
library(tibble)
diff <- do.call(rbind, lapply(wp_diff, as.data.frame, stringsAsFactors=FALSE))
This works great for a single revision ID.
I am wondering how I would loop or map over a vector of many revision IDs.
I tried:
vec <- c("883987486","911412795")
for (i in 1:length(vec)){
wp_diff[i]<- revision_diff("en", "wikipedia", revisions = i, properties = c("tags", "comment"), direction = "next", clean_response = T, as_wikitext=T)
}
But this creates the error
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 1, 0
This happens when I try to convert the output list to a data frame.
Does anybody have any suggestions? I am not sure how to proceed.
Thanks.
Try the following code:
# Make a function that queries one revision and returns a data frame
make_diff_df <- function(rev){
  wp_diff <- revision_diff("en", "wikipedia", revisions = rev,
                           properties = c("tags", "comment"),
                           direction = "next", clean_response = TRUE,
                           as_wikitext = TRUE)
  DF <- do.call(rbind, lapply(wp_diff, as.data.frame, stringsAsFactors=FALSE))
  # Define the names of the DF
  names(DF) <- c("pageid","ns","title","revisions.diff.from",
                 "revisions.diff.to","revisions.diff..",
                 "revisions.comment","revisions..mw.rollback.")
  return(DF)
}
vec <- c("883987486","911412795")
# Use do.call and lapply with the function
do.call("rbind",lapply(vec,make_diff_df))
Note that you have to fix the column names of the DF inside the make_diff_df function so that rbind inside do.call can work. The names returned for the two revisions in the example are very similar.
Hope this helps.
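The loop from the question can also be made to work with the same idea. Two details in it look problematic: revisions = i passes the loop index (1, 2) rather than the revision ID vec[i], and wp_diff[i] <- assigns into an existing object with single brackets. A minimal sketch of the corrected pattern is below; fetch_one is a hypothetical stand-in for the real revision_diff call plus conversion, so that the example runs without network access.

```r
# Hypothetical stand-in for the query; not part of WikipediR
fetch_one <- function(rev) {
  data.frame(revid = rev, comment = paste("comment for", rev),
             stringsAsFactors = FALSE)
}

vec <- c("883987486", "911412795")

# Pre-allocate a list and fill it by position
results <- vector("list", length(vec))
for (i in seq_along(vec)) {
  results[[i]] <- fetch_one(vec[[i]])   # pass the ID vec[[i]], not the index i
}

# Stack the per-revision data frames
diff <- do.call(rbind, results)
```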

Reading Spss Data file in R

I am using the expss package.
df<-read_spss("test.SAV")
It shows the following:
Warning message: In foreign::read.spss(enc2native(file),
use.value.labels = FALSE, : Tally.SAV: Very long string record(s)
found (record type 7, subtype 14), each will be imported in
consecutive separate variables
It shows 4174 variables in the Environment panel. The actual number of variables in the data file is around 400.
Can anyone please help me with this?
As mentioned in the comment, foreign::read.spss splits long SPSS character variables (>255 chars) into several columns. If such columns are empty, you can drop them without any issues.
Convenience function for this:
remove_empty_characters_after_foreign = function(data){
    # character columns that contain only NA
    empty_chars = vapply(data, FUN = function(column) is.character(column) & all(is.na(column)), FUN.VALUE = logical(1))
    # columns whose names end in 00<digit>, treated here as split-off parts
    additional_chars = grepl("00\\d$", colnames(data), perl = TRUE)
    to_remove = empty_chars & additional_chars
    if(any(to_remove)){
        message(paste0("Removing ", paste(colnames(data)[to_remove], collapse = ", "),"..."))
    }
    data[,!to_remove, drop = FALSE]
}
df = remove_empty_characters_after_foreign(df)
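To see what the two conditions select, here is the same logic applied to a small invented data frame (the column names are assumptions chosen to exercise both conditions). Only a column that is both entirely NA and named like a split-off part is dropped; a column that merely ends in 00<digit> but holds data is kept.

```r
# Toy data: comment001 is an empty split-off part, q1002 only looks like one
toy <- data.frame(
  id = 1:2,
  comment = c("a", "b"),
  comment001 = c(NA_character_, NA_character_),
  q1002 = c("x", NA),
  stringsAsFactors = FALSE
)

# Condition 1: character column with nothing but NA
is_empty_char <- vapply(toy, function(col) is.character(col) && all(is.na(col)),
                        logical(1))
# Condition 2: name ends in 00<digit>
looks_split <- grepl("00\\d$", colnames(toy))

# Drop only columns that satisfy both conditions
cleaned <- toy[, !(is_empty_char & looks_split), drop = FALSE]
```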
