jsonlite array of arrays - r

When using jsonlite to import a JSON document that has an array inside another array, I get an undesired unnamed list. Example below:
myjson=jsonlite::fromJSON('{
"class" : "human",
"type" : [{
"shape":"thin",
"face":[{"eyes":"blues","hair":"brown"}]
}]
}')
str(myjson)
List of 2
$ class: chr "human"
$ type :'data.frame': 1 obs. of 2 variables:
..$ shape: chr "thin"
..$ face :List of 1
.. ..$ :'data.frame': 1 obs. of 2 variables:
.. .. ..$ eyes: chr "blues"
.. .. ..$ hair: chr "brown"
I would like to access the "eyes" field as below (however it doesn't work):
myjson[["type"]][["face"]][["eyes"]]
NULL
Instead, I need to add "[[1]]" to make it work:
myjson[["type"]][["face"]][[1]][["eyes"]]
[1] "blues"
Any ideas how I could format the JSON to get rid of this unnamed list?

The thing is, jsonlite::fromJSON has simplifyDataFrame = TRUE by default (flatten defaults to FALSE), so the outer "type" array of objects becomes a data frame with one row per object. Its "face" field is itself an array of objects, so each row of "type" holds a whole data frame. Those cannot be stored as an ordinary column, so jsonlite keeps them in an unnamed list column with one entry per row of "type". That is why you need [[1]]: it selects the data frame belonging to the first (and here only) row.
A workaround, assuming each nested array holds a single object, is to apply a function that replaces each such unnamed list with its first (only) element.
myjson <- lapply(myjson, function(x) {
  if (is.list(x)) {
    # x is a list (a data frame is a list of columns):
    # replace each of its elements with that element's first entry
    lapply(x, function(y) y[[1]])
  } else {
    x
  }
})
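If "type" can contain more than one object, keeping only the first element silently drops data. A minimal alternative sketch, assuming every inner "face" data frame has the same columns, is to stack the per-row data frames with base R's rbind(). The two-object JSON below is made up for illustration.

```r
library(jsonlite)

# Made-up input: two "type" objects, each with a one-element "face" array
myjson2 <- fromJSON('{
  "class": "human",
  "type": [
    {"shape": "thin", "face": [{"eyes": "blues", "hair": "brown"}]},
    {"shape": "tall", "face": [{"eyes": "green", "hair": "black"}]}
  ]
}')

# myjson2$type$face is an unnamed list column: one data frame per row of "type"
str(myjson2$type$face)

# Stack those data frames into one (assumes they all share the same columns)
faces <- do.call(rbind, myjson2$type$face)
faces$eyes
# [1] "blues" "green"
```

Unlike the lapply() workaround, this keeps every row, at the cost of returning a separate data frame rather than reshaping myjson in place.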

Related

complex nested list to a clean matrix in R?

I have struggled for two days to find a way to create a specific matrix from a nested list.
First of all, I am sorry if I don't explain my issue correctly; I am one week new to StackOverflow* and R (and programming...)!
I use a file that you can find here:
original link: https://parltrack.org/dumps/ep_mep_activities.json.lz
Uncompressed by me here: https://wetransfer.com/downloads/701b7ac5250f451c6cb26d29b41bd88020200808183632/bb08429ca5102e3dc277f2f44d08f82220200808183652/666973
The first 3 lists and the last one (out of 23905) are pasted here: https://pastebin.com/Kq7mjis5
With rjson, I have a nested list like this:
Nested list of MEP Votes
List of 23905
$ :List of 7
..$ ts : chr "2004-12-16T11:49:02"
..$ url : chr "http://www.europarl.europa.eu/RegData/seance_pleniere/proces_verbal/2004/12-16/votes_nominaux/xml/P6_PV(2004)12-16(RCV)_XC.xml"
..$ voteid : num 7829
..$ title : chr "Projet de budget général 2005 modifié - bloc 3"
..$ votes :List of 3
.. ..$ +:List of 2
.. .. ..$ total : num 45
.. .. ..$ groups:List of 6
.. .. .. ..$ ALDE :List of 1
.. .. .. .. ..$ : Named num 4404
.. .. .. .. .. ..- attr(*, "names")= chr "mepid"
.. .. .. ..$ GUE/NGL:List of 25
.. .. .. .. ..$ : Named num 28469
.. .. .. .. .. ..- attr(*, "names")= chr "mepid"
.. .. .. .. ..$ : Named num 4298
.. .. .. .. .. ..- attr(*, "names")= chr "mepid"
Then my goal is to have something like this:
[image: final matrix]
First, I would like to keep only the lists (from [[1]] to [[23905]]) that contain $vote$+$groups$Renew, $vote$-$groups$Renew or $vote$'0'$groups$Renew. Each of the 23905 elements of the main list is a recorded vote. My work is on the Renew group, so I am only interested in votes where the Renew group exists, in order to compare it with the other groups.
After that, my goal is to create a matrix like this from all the [[x]] where groups$Renew exists:
final matrix:
V1             V2 (not mandatory)   V3 = [[x]]$voteid
[mepid==666]   GUE/NGL              +   (mepid 666 is found in [[1]]$vote$+$groups$GUE/NGL)
[mepid==777]   Renew                -   (mepid 777 is found in [[1]]$vote$-$groups$Renew)
I want to create a matrix so I can process the votes of each MEP (identified by their mepid). Their votes are either + (for yea), - (for nay) or 0 (for abstain). Moreover, I would like each MEP's political group displayed in the column next to their mepid. We can find their political group from the place where their votes are stored: if the mepid is shown in the list [[x]]$vote$+$groups$GUE/NGL, she or he belongs to the GUE/NGL group.
What I want to do might look like this:
# Clean the nested list (pseudocode)
Keep Vote[[x]] if the Vote[[x]] list contains
  $vote$+$groups$Renew,
  or $vote$-$groups$Renew,
  or $vote$'0'$groups$Renew
# Create the matrix (or a data.frame if it is easier)
VoteMatrix <- as.matrix(
  V1 = all "mepid" values found in the nested list
  V2 = groups (name of the list where we can find the mepid) (not mandatory)
  V3 to Vy = ifelse(mepid is in [[x]]$vote$+ then "+",
                    mepid is in [[x]]$vote$- then "-", "0")
)
Thank you in advance,
*Nevertheless, I have been actively reading this website since I started with R!
You can see that the 'votes' sublist is composed of three items, each holding lists of member numbers stored under what I think are party designators. Here's how you might "straighten" the positive voters' 'mepid' values by party:
str( unlist( sapply(names(jlis[[1]]$votes$'+'$groups), function(x) unlist(jlis[[1]]$votes$'+'$groups[[x]]) ) ) )
Named num [1:104] 28268 4514 28841 28314 28241 ...
- attr(*, "names")= chr [1:104] "ALDE.mepid" "ALDE.mepid" "ALDE.mepid" "ALDE.mepid" ...
You get a named numeric vector with 104 entries. Perhaps this will show what sort of terminology to use when describing your desired result more precisely. (Giving only a partial schema for the desired result leaves far too much ambiguity to support a fully formed answer.)
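Taking that "straightening" one step further, here is a minimal sketch that turns each vote into long-format rows (mepid, group, voteid, position) and then spreads them into a matrix with one row per mepid and one column per voteid, holding "+", "-" or "0". It assumes the structure shown in the question's str() output (votes, then "+"/"-"/"0", then groups, then a list of named numbers). The mock data and the helper name votes_long are hypothetical; with the real data you would first subset MEPVotes to the votes that include Renew.

```r
# Hypothetical mock data mimicking the rjson structure shown in the question
MEPVotes <- list(
  list(voteid = 1, votes = list(
    "+" = list(total = 2, groups = list("GUE/NGL" = list(c(mepid = 666)),
                                        Renew = list(c(mepid = 777)))),
    "-" = list(total = 1, groups = list(Renew = list(c(mepid = 888)))))),
  list(voteid = 2, votes = list(
    "+" = list(total = 1, groups = list("GUE/NGL" = list(c(mepid = 666)))),
    "-" = list(total = 1, groups = list(Renew = list(c(mepid = 777)))),
    "0" = list(total = 1, groups = list(Renew = list(c(mepid = 888))))))
)

# Hypothetical helper: one vote -> data frame with one row per MEP
votes_long <- function(vote) {
  rows <- list()
  for (pos in intersect(c("+", "-", "0"), names(vote$votes))) {
    groups <- vote$votes[[pos]][["groups"]]
    for (g in names(groups)) {
      ids <- unlist(groups[[g]], use.names = FALSE)  # drop the "mepid" names
      rows[[length(rows) + 1]] <- data.frame(
        mepid = ids, group = g, voteid = vote$voteid, position = pos,
        stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, rows)
}

long <- do.call(rbind, lapply(MEPVotes, votes_long))

# Spread into a matrix: rows = mepid, columns = voteid, cells = "+", "-", "0"
mepids <- sort(unique(long$mepid))
voteids <- unique(long$voteid)
vote_matrix <- matrix(NA_character_, nrow = length(mepids), ncol = length(voteids),
                      dimnames = list(as.character(mepids), as.character(voteids)))
# Index the matrix by (row name, column name) pairs
vote_matrix[cbind(as.character(long$mepid), as.character(long$voteid))] <- long$position

# Group of each MEP (an MEP who changed group would appear more than once)
mep_groups <- unique(long[c("mepid", "group")])

vote_matrix
```

MEPs who did not take part in a given vote keep NA in that column, which keeps absence distinct from an explicit "0" abstention.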
I do NOT see the number 23905 anywhere in what I downloaded from your link, so we are clearly looking at different data. I see this for the timestamp: chr "2004-12-01T15:20:31". I'm not going to cut you any slack for not knowing R, since the task needs to be fully explained in natural language. I will cut you slack on grammar if English is not your native tongue, but you definitely need to make a better effort at explanation. This is what I see for the names within the votes$'+'$groups sublists of the first three items; since RENEW is not in any of them, there's not much that could be demonstrated about picking items:
> names( jlis[[1]]$votes$'+'$groups)
[1] "ALDE" "GUE/NGL" "IND/DEM" "NI" "PPE-DE" "PSE" "UEN"
> names( jlis[[2]]$votes$'+'$groups)
[1] "GUE/NGL" "IND/DEM" "NI" "PPE-DE"
> names( jlis[[3]]$votes$'+'$groups)
[1] "ALDE" "GUE/NGL" "IND/DEM" "NI" "PPE-DE" "PSE" "UEN" "Verts/ALE"
Furthermore, when I looked at all of the possible votes values using this method (for all three of the items you made available) I still see no RENEW names.
sapply( jlis[[1]]$votes[c("+","-","0")], function(x) names(x$groups) )
After the second edit: here's the next step, isolating those votes that contain a "Renew" value. I'm assuming that it's possible to have a "Renew" value in only one of the three possible 'votes' values (+, -, 0). If not (that is, if there are always "Renew" values in all of them whenever there is one in any of them), you might be able to simplify the logic. We make three logical vectors:
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['0']][['groups']]) } )
#[1] FALSE FALSE FALSE TRUE
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['+']][['groups']]) } )
#[1] FALSE FALSE FALSE TRUE
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['-']][['groups']]) } )
#[1] FALSE FALSE FALSE TRUE
Then wrap them in a matrix() call with 3 columns, take the maximum of each row (the maximum of c(TRUE, FALSE) is 1), and convert back to logical:
selection_vec = as.logical( apply( matrix( c(
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['0']][['groups']]) } ),
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['+']][['groups']]) } ),
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['-']][['groups']]) } ) ),
ncol=3 ), 1,max))
> selection_vec
[1] FALSE FALSE FALSE TRUE
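The three sapply() calls plus the matrix/apply(max) step can be collapsed into a single vapply() over the votes, because any() already returns a logical. A minimal sketch: has_group is a hypothetical helper name, and the two-vote mock list only imitates the structure from the question.

```r
# Hypothetical helper: TRUE if `group` appears under any of "+", "-", "0"
has_group <- function(vote, group = "Renew") {
  any(vapply(c("+", "-", "0"),
             # a missing position gives NULL, and group %in% NULL is FALSE
             function(pos) group %in% names(vote$votes[[pos]][["groups"]]),
             logical(1)))
}

# Mock votes (made up): only the second one includes Renew
MEPVotes <- list(
  list(voteid = 1, votes = list(
    "+" = list(groups = list("GUE/NGL" = list(c(mepid = 666)))))),
  list(voteid = 2, votes = list(
    "-" = list(groups = list(Renew = list(c(mepid = 777))))))
)

selection_vec <- vapply(MEPVotes, has_group, logical(1))
selection_vec
# [1] FALSE  TRUE

renew_votes <- MEPVotes[selection_vec]
```

Passing a different group name, e.g. has_group(vote, "GUE/NGL"), reuses the same logic for other parties.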

all_equal: Can't join because of incompatible types (list / list)

I have a data.frame created from JSON. It contains arrays of simple types, which are by default represented as a list column whose elements are plain vectors. In unit tests I try to check it with dplyr::all_equal (I don't care about column order for now; the columns get reordered by jsonlite::flatten and I don't know how to deal with that).
But all_equal fails with an unhelpful message: Can't join on 'arr' x 'arr' because of incompatible types (list / list). I don't know what is wrong: are list columns not acceptable at all? Should I convert this column to something else (and if so, what)?
Minimal example showing what I'm trying to do:
library(dplyr)
f <- function() {
  current <- jsonlite::fromJSON('{"objs":[{"arr":["l1", "l2"], "key":"1"}]}')$objs
  expected <- data.frame("key" = "1", stringsAsFactors = FALSE) %>%
    mutate("arr" = list(c("l1", "l2")))
  str(current)
  str(expected)
  all_equal(current, expected, ignore_col_order = TRUE)
}
f()
Output:
'data.frame': 1 obs. of 2 variables:
$ key: chr "1"
$ arr:List of 1
..$ : chr "l1" "l2"
'data.frame': 1 obs. of 2 variables:
$ key: chr "1"
$ arr:List of 1
..$ : chr "l1" "l2"
Error: Can't join on 'arr' x 'arr' because of incompatible types (list / list)
Note: I've seen How to fix “Error: Can't join on '.rows' x '.rows' because of incompatible types (list / list)” using censReg, but I don't think I have this issue, because the data frames are not grouped. I tried to ungroup them both anyway, there was no change in output.
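One possible workaround (an assumption, not a confirmed fix for dplyr::all_equal) is to use base R's all.equal() instead, which compares list columns element by element, and to ignore column order by putting both data frames' columns in the same sorted order first. The helper name same_ignoring_col_order is hypothetical, and the frames are built by hand to mirror the str() output above.

```r
# Hypothetical helper: compare two data frames, ignoring column order
same_ignoring_col_order <- function(x, y) {
  # all.equal() returns TRUE or a character vector describing differences
  isTRUE(all.equal(x[sort(names(x))], y[sort(names(y))]))
}

# Frames mirroring the str() output above, with columns in different orders
current <- data.frame(key = "1", stringsAsFactors = FALSE)
current$arr <- list(c("l1", "l2"))   # list column, one vector per row

expected <- data.frame(key = "1", stringsAsFactors = FALSE)
expected$arr <- list(c("l1", "l2"))
expected <- expected[c("arr", "key")]

same_ignoring_col_order(current, expected)   # TRUE

wrong <- expected
wrong$arr <- list(c("l1", "l3"))
same_ignoring_col_order(current, wrong)      # FALSE
```

This still compares row order and row names, so rows would need sorting too if their order is not guaranteed.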

Use purrr on a position of element in nested list?

Situation: I have a nested list, shown in the image below. I want to use purrr to iterate over the second element of each nested list and apply a date conversion function.
Problem: I can easily write a for loop for this, but I want to use purrr. My attempts with the nested list have not worked out: mapping over a normal list is fine, but mapping by position inside a nested list is not.
Reproducible example code from Maurits Evers (Thank you!)
lst <- list(
list("one", "12345", "2019-01-01"),
list("two", "67890", "2019-01-02"))
Any assistance appreciated!
Please see the comment above to understand how to provide a reproducible example including sample data.
Since you don't provide sample data, let's create some minimal mock data similar to what is shown in your screenshot.
lst <- list(
list("one", "12345", "2019-01-01"),
list("two", "67890", "2019-01-02"))
To convert the third element of every list element with as.Date, we can then do
library(purrr)
lst <- map(lst, ~{.x[[3]] <- as.Date(.x[[3]]); .x})
We can confirm that the third element of every list element is now an object of class Date
str(lst)
#List of 2
# $ :List of 3
# ..$ : chr "one"
# ..$ : chr "12345"
# ..$ : Date[1:1], format: "2019-01-01"
# $ :List of 3
# ..$ : chr "two"
# ..$ : chr "67890"
# ..$ : Date[1:1], format: "2019-01-02"
Update
A more purrr/tidyverse-canonical approach would be to use modify_at (thanks #H1)
lst <- map(lst, ~modify_at(.x, 3, as.Date))
The result is the same as before.
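For comparison, the same position-based transformation can be written in base R without purrr, using lapply() and an anonymous function. A minimal sketch on the mock data above; the position (3) is just an index, so changing it to 2 would target the second element instead.

```r
lst <- list(
  list("one", "12345", "2019-01-01"),
  list("two", "67890", "2019-01-02"))

# Base R equivalent of map(lst, ~modify_at(.x, 3, as.Date))
lst <- lapply(lst, function(x) {
  x[[3]] <- as.Date(x[[3]])  # convert only the element at position 3
  x                          # return the modified inner list
})

str(lst)
```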

Adding attributes to a named list erases the names of that list

Simple question based on an unexpected behavior I observed. I have a named list in R to which I add attributes with the attributes<- call. This erases the names of the list. Why, and how can I prevent that?
Example:
ll <- list(a=1:4, b="der")
str(ll)
List of 2
$ a: int [1:4] 1 2 3 4
$ b: chr "der"
attributes(ll) <- list(attr1 = "my_attr")
str(ll)
List of 2
$ : int [1:4] 1 2 3 4
$ : chr "der"
- attr(*, "attr1")= chr "my_attr"
There are no names anymore.
I can get them back doing this:
names(ll) <- c("a", "b")
str(ll)
List of 2
$ a: int [1:4] 1 2 3 4
$ b: chr "der"
- attr(*, "attr1")= chr "my_attr"
However I would like not to have to record the names before and reapply them after. I have a feeling the original names are an attribute that gets overwritten by attributes<- call. Any idea how to get over that?
I think this (i.e., setting a single new attribute, or modifying an existing one, while leaving existing attributes in place) is exactly what attr()<- is for:
> attr(ll,"attr1") <- "my_attr"
> ll
$a
[1] 1 2 3 4
$b
[1] "der"
attr(,"attr1")
[1] "my_attr"
From the documentation for attributes:
Assigning attributes first removes all attributes, then sets any dim
attribute and then the remaining attributes in the order given: this
ensures that setting a dim attribute always precedes the dimnames
attribute.
I think capturing names beforehand may indeed be the only way, if you must use attributes. But I would consider changing the attribute with a more targeted function, if possible. What are you trying to set?
You may, for instance, consider adding a comment instead. See the documentation in ?comment.
A good way to add attributes to an existing object is to do:
attributes(ll) <- append(attributes(ll), list(attr1 = "my_attr"))
This is more robust, as it works for attributes on a list AND on a data.frame, and it requires only one line.
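A short sketch comparing the approaches from this thread, plus base R's structure(), which is not mentioned above but also adds attributes on top of the existing ones. Only attributes<- with a bare list drops the names.

```r
ll <- list(a = 1:4, b = "der")

# attr<- sets one attribute and leaves the others (including names) alone
ll1 <- ll
attr(ll1, "attr1") <- "my_attr"

# structure() adds attributes on top of the existing ones
ll2 <- structure(ll, attr1 = "my_attr")

# attributes<- replaces everything, so pass the old attributes back in
ll3 <- ll
attributes(ll3) <- append(attributes(ll3), list(attr1 = "my_attr"))

# attributes<- with only the new attribute drops the names
ll4 <- ll
attributes(ll4) <- list(attr1 = "my_attr")

names(ll1)  # "a" "b"
names(ll2)  # "a" "b"
names(ll3)  # "a" "b"
names(ll4)  # NULL
```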

Can't write data frame to database

I can't really create a code example because I'm not quite sure what the problem is and my actual problem is rather involved. That said it seems like kind of a generic problem that maybe somebody's seen before.
Basically, I'm constructing 3 different data frames and rbinding them together, which, as expected, is all smooth sailing. But when I try to write that merged frame back to the DB, I get this error:
Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol, :
unimplemented type 'list' in 'EncodeElement'
I've tried manually coercing them using as.data.frame() before and after the rbinds and the returned object (the same one that fails to write with the above error message) exists in the environment as class data.frame so why does dbWriteTable not seem to have got the memo?
Sorry, I should have said: I'm connecting to a MySQL DB using RMySQL. Looking a little closer while trying to explain myself, I think the problem is that the columns of my data frame are themselves lists (all of the same length), which sort of makes sense of the error. I'd think (or like to think, anyway) that a call to as.data.frame() would take care of that, but I guess not?
A portion of my str() output, since it's long:
.. [list output truncated]
$ stcong :List of 29809
..$ : int 3
..$ : int 8
..$ : int 4
..$ : int 2
I guess I'm wondering if there's an easy way to force this coercion?
Hard to say for sure, since you provided so little concrete information, but this would be one way to convert a list column to an atomic vector column:
> d <- data.frame(x = 1:5)
> d$y <- as.list(letters[1:5])
> str(d)
'data.frame': 5 obs. of 2 variables:
$ x: int 1 2 3 4 5
$ y:List of 5
..$ : chr "a"
..$ : chr "b"
..$ : chr "c"
..$ : chr "d"
..$ : chr "e"
> d$y <- unlist(d$y)
> str(d)
'data.frame': 5 obs. of 2 variables:
$ x: int 1 2 3 4 5
$ y: chr "a" "b" "c" "d" ...
This assumes that each element of your list column is only a length one vector. If any aren't, things will be more complicated, and you'd likely need to rethink your data structure anyhow.
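If the data frame has several such list columns (like stcong above), the same unlist() idea can be applied to every list column before calling dbWriteTable(). A minimal sketch with a hypothetical helper name: it stops if any element is not exactly length one (the case the answer warns about), and it assumes each column's elements share a type, since unlist() would otherwise coerce them to a common type.

```r
# Hypothetical helper: turn every list column of length-one elements
# into an ordinary atomic column
flatten_list_columns <- function(df) {
  for (col in names(df)) {
    if (is.list(df[[col]])) {
      if (any(lengths(df[[col]]) != 1L)) {
        stop("column '", col, "' has elements that are not length one")
      }
      df[[col]] <- unlist(df[[col]], use.names = FALSE)
    }
  }
  df
}

d <- data.frame(x = 1:3)
d$y <- as.list(c("a", "b", "c"))
d$stcong <- list(3L, 8L, 4L)

d2 <- flatten_list_columns(d)
str(d2)
```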
