This is the second time I have run into this recently, so I wanted to ask whether there is a better way to parse data frames returned by jsonlite when one of the elements is an array, which ends up stored as a list column in the data frame.
I know that this is part of the power of jsonlite, but I am not sure how to work with this nested structure. In the end, I suppose I could write my own custom parsing, but given that I am almost there, I wanted to see how to work with this data.
For example:
## options
options(stringsAsFactors=F)
## packages
library(httr)
library(jsonlite)
## setup
gameid="2015020759"
SEASON = '20152016'
BASE = "http://live.nhl.com/GameData/"
URL = paste0(BASE, SEASON, "/", gameid, "/PlayByPlay.json")
## get the data
x <- GET(URL)
## parse
api_response <- content(x, as="text")
api_response <- jsonlite::fromJSON(api_response, flatten=TRUE)
## get the data of interest
pbp <- api_response$data$game$plays$play
colnames(pbp)
And exploring what comes back:
> class(pbp$aoi)
[1] "list"
> class(pbp$desc)
[1] "character"
> class(pbp$xcoord)
[1] "integer"
From above, the column pbp$aoi is a list. Here are a few entries:
> head(pbp$aoi)
[[1]]
[1] 8465009 8470638 8471695 8473419 8475792 8475902
[[2]]
[1] 8470626 8471276 8471695 8476525 8476792 8477956
[[3]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[4]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[5]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[6]]
[1] 8469619 8471695 8473492 8474625 8475727 8475902
I don't really care whether these lists are parsed within the same data frame, but what are my options for parsing out the data?
I would prefer to take the data out of the lists and put it into a data frame that can be "related" back to the original record it came from.
Thanks in advance for your help.
Following the suggestion from hrbmstr above, I was able to get what I wanted using tidyr's unnest():
select(pbp, eventid, aoi) %>% unnest(aoi) %>% head
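For anyone without tidyr installed, the same long table can be sketched in base R. The toy `pbp` below is a stand-in invented for illustration (the event IDs are made up, and the player IDs are copied from the printed output above), not the real API response.

```r
# Toy stand-in for the play-by-play data: one list column of player IDs.
pbp <- data.frame(eventid = c(101L, 102L))
pbp$aoi <- list(c(8465009L, 8470638L, 8471695L),
                c(8470626L, 8471276L))

# Repeat each eventid once per element of its aoi vector, then flatten aoi.
aoi_long <- data.frame(
  eventid = rep(pbp$eventid, times = lengths(pbp$aoi)),
  aoi     = unlist(pbp$aoi)
)
aoi_long
```

Because `eventid` is carried along on every row, `aoi_long` can be joined back to the original records on that column.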
One similar question had a similar type of issue, but it looks like there were some typos involved and the wrong functions were being used; the asker did not realise the object was not a data frame. Mine is being read in as a list from a table-structured .csv file. I also attempted the rbind option in do.call suggested in this question, but that didn't work, so it is commented out; the object still showed up as a list under typeof().
Here's a public Google spreadsheet of the dataset and here's my reproducible code:
# Read study file
getwd()
bank <- read.csv("attemptCSV.csv")
bank
typeof(bank)
bank <- data.frame(bank)
typeof(bank)
colnames(bank)
#do.call(rbind.data.frame, bank)
#typeof(bank)
#> typeof(bank)
#[1] "list"
#> bank <- data.frame(bank)
#> typeof(bank)
#[1] "list"
#> colnames(bank)
# [1] "age" "job" "marital" "education" "default" "balance" "housing" "loan"
# [9] "contact" "day" "month" "duration" "campaign" "pdays" "previous" "poutcome"
#[17] "y"
You should be checking class(), not typeof(), because according to ?typeof:
typeof determines the (R internal) type or storage mode of any object
class(bank)
Checking with the built-in iris data:
> data(iris)
> typeof(iris)
[1] "list"
> class(iris)
[1] "data.frame"
> is.list(iris)
[1] TRUE
> is.data.frame(iris)
[1] TRUE
The reason typeof() returns "list" is that a data.frame is itself a list whose elements (the columns) all have equal length.
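A minimal sketch of the same distinction, using a small data frame and a plain list that are both invented here for illustration:

```r
# A small data frame invented for illustration.
df <- data.frame(age = c(30, 41), job = c("admin", "tech"))

typeof(df)         # storage mode: a data frame is stored as a list
class(df)          # the class attribute is what marks it as a data frame
is.data.frame(df)  # TRUE
lengths(df)        # every column has the same length

# A plain list with unequal element lengths is a list but not a data frame.
plain <- list(age = c(30, 41), job = "admin")
is.data.frame(plain)  # FALSE
```

So an object read with read.csv() can report "list" from typeof() and still be a perfectly ordinary data frame.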
I have a list that looks like this:
[[1]]
[[1]][[1]]
[[1]][[1]]$p1est.z
[1] 2.890829
[[1]][[1]]$p1se.z
[1] 0.1418367
[[1]][[2]]
[[1]][[2]]$p2est.w
[1] 4.947014
[[1]][[2]]$p2se.w
[1] 0.5986682
[[2]]
[[2]][[1]]
[[2]][[1]]$p1est.z
[1] 3.158164
[[2]][[1]]$p1se.z
[1] 0.138770
[[2]][[2]]
[[2]][[2]]$p2est.w
[1] 5.052874
[[2]][[2]]$p2se.w
[1] 0.585608
How can I extract the values of "p1est.z" from both top-level elements? I need to compute their average.
Thanks!
Actually the unlist() function out of the box should probably work here:
output <- unlist(your_list)
output[names(output) == "p1est.z"]
p1est.z p1est.z
2.890829 3.158164
Data:
your_list <- list(
list(list(p1est.z=2.890829, p1se.z=0.1418367),
list(p2est.w=4.947014, p2se.w=0.5986682)),
list(list(p1est.z=3.158164, p1se.z=0.138770),
list(p2est.w=5.052874, p2se.w=0.585608)))
One way to do this, using Tim Biegeleisen's representation of your data, is to write a function that extracts p1est.z and apply it. Your top-level list has two elements, and in both of them the first element contains p1est.z, so you could do
fn <- function(x) { x[[1]]$p1est.z }
and then apply it
sapply(your_list, fn)
# [1] 2.890829 3.158164
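Since the question asks for the average, the extraction can be followed directly by mean(). This is a sketch with the data rebuilt from the printed list in the question; vapply() is swapped in for sapply() only so that a non-numeric result would raise an error.

```r
your_list <- list(
  list(list(p1est.z = 2.890829, p1se.z = 0.1418367),
       list(p2est.w = 4.947014, p2se.w = 0.5986682)),
  list(list(p1est.z = 3.158164, p1se.z = 0.138770),
       list(p2est.w = 5.052874, p2se.w = 0.585608)))

# vapply is like sapply but checks that each result is a single number.
p1est <- vapply(your_list, function(x) x[[1]]$p1est.z, numeric(1))
mean(p1est)
```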
This might be a job for purrr, or maybe I just need to change the layout of my function or my data here.
I've got a function that takes 2 arguments, which I'm trying to apply across a named list. One of the arguments should be the name of the list element, whereas the other should be a component of that element (a value in the list).
Book List - Book title and Chapters
my_list <- list(Book1 = c("ABC", "DEF", "GHI"), Book2 = c("ABB", "BCC"), Book3 = c("AAA", "BBB", "CCC", "DDD"))
Function with arg for list component and value
my_function <- function(Book, Chapter) {
path <- paste("www.fake-website.com", "ctp:", Book, "/", Chapter, sep = "")
##Would have API call here, but let's just print path
path
}
I can easily call this on an individual item by specifying the arguments in map
map(my_list$Book1, function(Chapter) my_function(Chapter =
Chapter, Book = "Book1"))
Output:
[[1]]
[1] "www.fake-website.comctp:Book1/ABC"
[[2]]
[1] "www.fake-website.comctp:Book1/DEF"
[[3]]
[1] "www.fake-website.comctp:Book1/GHI"
But how do I apply the function to every list element, calling it with each Book name and its chapter values?
I'm hoping for something like
[[1]]
[1] "www.fake-website.comctp:Book1/ABC"
[[2]]
[1] "www.fake-website.comctp:Book1/DEF"
[[3]]
[1] "www.fake-website.comctp:Book1/GHI"
[[4]]
[1] "www.fake-website.comctp:Book2/ABB"
[[5]]
[1] "www.fake-website.comctp:Book2/BCC"
[[6]]
[1] "www.fake-website.comctp:Book3/AAA"
[[7]]
[1] "www.fake-website.comctp:Book3/BBB"
[[8]]
[1] "www.fake-website.comctp:Book3/CCC"
[[9]]
[1] "www.fake-website.comctp:Book3/DDD"
My function actually isn't simply pasting Books and Chapters, but getting a bunch of info from the API and parsing it.
However, what I need help with is mapping across the list and pairing the book arg with the chapter arg.
You can use purrr::imap, which passes the names of the list as the second argument to the function:
library(purrr)
imap(my_list, ~ my_function(..2, ..1))
# or imap(my_list, ~ my_function(.y, .x))
$Book1
[1] "www.fake-website.comctp:Book1/ABC" "www.fake-website.comctp:Book1/DEF"
[3] "www.fake-website.comctp:Book1/GHI"
$Book2
[1] "www.fake-website.comctp:Book2/ABB" "www.fake-website.comctp:Book2/BCC"
$Book3
[1] "www.fake-website.comctp:Book3/AAA" "www.fake-website.comctp:Book3/BBB"
[3] "www.fake-website.comctp:Book3/CCC" "www.fake-website.comctp:Book3/DDD"
And if you switch the arguments of your function, Book and Chapter, you can simply do:
my_function <- function(Chapter, Book) {
path <- paste("www.fake-website.com", "ctp:", Book, "/", Chapter, sep = "")
path
}
imap(my_list, my_function)
$Book1
[1] "www.fake-website.comctp:Book1/ABC" "www.fake-website.comctp:Book1/DEF"
[3] "www.fake-website.comctp:Book1/GHI"
$Book2
[1] "www.fake-website.comctp:Book2/ABB" "www.fake-website.comctp:Book2/BCC"
$Book3
[1] "www.fake-website.comctp:Book3/AAA" "www.fake-website.comctp:Book3/BBB"
[3] "www.fake-website.comctp:Book3/CCC" "www.fake-website.comctp:Book3/DDD"
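The question asked for a single flat list of nine paths rather than one element per book. One possible way to get that shape, sketched here in base R with Map() instead of purrr (this is an alternative technique, not the imap approach above; `by_book` and `flat` are names made up for the example):

```r
my_list <- list(Book1 = c("ABC", "DEF", "GHI"),
                Book2 = c("ABB", "BCC"),
                Book3 = c("AAA", "BBB", "CCC", "DDD"))

my_function <- function(Book, Chapter) {
  paste("www.fake-website.com", "ctp:", Book, "/", Chapter, sep = "")
}

# Map() walks the book names and the chapter vectors in parallel.
by_book <- Map(my_function, names(my_list), my_list)

# Drop the per-book grouping to get one path per list element.
flat <- as.list(unlist(by_book, use.names = FALSE))
length(flat)
```

The same flattening step should also work on the result of imap(), since that is likewise a named list of character vectors.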
When I read the csv file into df, SoftwareOwner is a character column
> df
Software SoftwareOwner
<chr> <chr>
1 I-DEAS Siemens
2 TeamViewer Autodesk, TeamViewer, Siemens
3 Inventor PTC, Google, SpaceClaim, Bricys
4 AutoCAD Autodesk
I want to make SoftwareOwner a list within this data frame so I tried the simple solution
> df$SoftwareOwner <- as.list(df$SoftwareOwner)
But all this did was make each entry in the column a list with one entry
> df$SoftwareOwner[2]
[[1]]
[1] "Autodesk, TeamViewer, Siemens"
I've tried adding parameters like sep = "," and all.names = TRUE to as.list but neither worked. Is there any way to access just Autodesk or TeamViewer or Siemens when calling something like what I have just above?
Might I recommend making Siemens, Autodesk, Teamviewer, etc. their own columns and coding a 1 or 0 to indicate ownership? In my experience this is a far more flexible approach.
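A rough sketch of what that indicator-column layout might look like in base R. The data frame is retyped from the question, and `owners`, `all_owners`, `flags` and `wide` are names invented for this example.

```r
df <- data.frame(
  Software = c("I-DEAS", "TeamViewer", "Inventor", "AutoCAD"),
  SoftwareOwner = c("Siemens", "Autodesk, TeamViewer, Siemens",
                    "PTC, Google, SpaceClaim, Bricys", "Autodesk"),
  stringsAsFactors = FALSE
)

# Split each entry on commas and trim surrounding whitespace.
owners <- lapply(strsplit(df$SoftwareOwner, ","), trimws)
all_owners <- sort(unique(unlist(owners)))

# One 0/1 column per owner: 1 if that owner appears in the row.
flags <- sapply(all_owners, function(o) {
  as.integer(vapply(owners, function(v) o %in% v, logical(1)))
})
wide <- cbind(df["Software"], flags)
wide
```

With this layout, a question such as "which software does Siemens own?" becomes a plain column filter, for example `wide$Software[wide$Siemens == 1]`.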
A possible solution:
# recreate your data.frame
df <- read.csv(text=
"Software;SoftwareOwner
I-DEAS;Siemens
TeamViewer;Autodesk, TeamViewer, Siemens
Inventor;PTC, Google, SpaceClaim, Bricys
AutoCAD;Autodesk",sep=";")
df$SoftwareOwner <- lapply(strsplit(as.character(df$SoftwareOwner),split=','),trimws)
# > df$SoftwareOwner
# [[1]]
# [1] "Siemens"
#
# [[2]]
# [1] "Autodesk" "TeamViewer" "Siemens"
#
# [[3]]
# [1] "PTC" "Google" "SpaceClaim" "Bricys"
#
# [[4]]
# [1] "Autodesk"
# > df$SoftwareOwner[[2]][3]
# [1] "Siemens"
# > df$SoftwareOwner[[3]][2]
# [1] "Google"
I have two matrices with the same number of columns, but with different numbers of rows:
a <- cbind(runif(5), runif(5))
b <- cbind(runif(8), runif(8))
I want to combine these in a single list, so that the first columns of a and b are paired with each other, and so on:
my_result <- list(list(a[,1], b[,1]), list(a[,2], b[,2]))
So the result would look like this:
> print(my_result)
[[1]]
[[1]][[1]]
[1] 0.9440956 0.7259602 0.7804068 0.7115368 0.2771190
[[1]][[2]]
[1] 0.4155642 0.1535414 0.6983123 0.7578231 0.2126765 0.6753884 0.8160817
[8] 0.6548915
[[2]]
[[2]][[1]]
[1] 0.7343330 0.7751599 0.4463870 0.6926663 0.9692621
[[2]][[2]]
[1] 0.5708726 0.1234482 0.2875474 0.4760349 0.2027653 0.5142006 0.4788264
[8] 0.7935544
I can't figure out how to do that without a for loop, but I'm pretty sure some *pply magic could be used here.
Any directions would be much appreciated.
I'm not sure how general a solution you're looking for (arbitrary number of matrices, ability to pass a list of matrices, etc.) but this works for your specific example:
lapply(1:2,function(i){list(a[,i],b[,i])})
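If the number of columns is not always two, the hard-coded 1:2 can be replaced with seq_len(ncol(a)). A small sketch, where `pair_columns` is a hypothetical helper name and the seed is arbitrary:

```r
set.seed(1)  # arbitrary seed so the example is reproducible
a <- cbind(runif(5), runif(5))
b <- cbind(runif(8), runif(8))

# Pair up column i of a with column i of b, for every column.
pair_columns <- function(a, b) {
  stopifnot(ncol(a) == ncol(b))
  lapply(seq_len(ncol(a)), function(i) list(a[, i], b[, i]))
}

my_result <- pair_columns(a, b)
str(my_result)
```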