error in export CSV in R Using write.csv() - r

I got this error after runing write.csv(),
how I can fix it?
Thanks
write.csv (res_basic,"ceRNA_basic_result", row.names=TRUE)
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 43131, 10499
str(res_basic):
List of 2
$ cesig :'data.frame': 43131 obs. of 5 variables:
..$ targetce : chr [1:43131] "AASDHPPT" "AASDHPPT" "AASDHPPT" "AASDHPPT" ...
..$ anotherce : chr [1:43131] "ADGRG1" "AFAP1" "BCL3" "C1orf147" ...
..$ miRNAs : chr [1:43131] "hsa-miR-6837-3p" "hsa-miR-1185-1-3p" "hsa-miR-6837-3p" "hsa-miR-1185-1-3p" ...
..$ miRNAs_num: num [1:43131] 1 1 1 1 1 1 1 1 1 1 ...
..$ ratio : num [1:43131] 0.5 1 1 1 1 0.5 1 1 1 1 ...
$ cenotsig:'data.frame': 10499 obs. of 5 variables:
..$ targetce : chr [1:10499] "AASDHPPT" "AASDHPPT" "AASDHPPT" "AASDHPPT" ...
..$ anotherce : chr [1:10499] "ARCN1" "BACH1" "CDK6" "DNAJA1" ...
..$ miRNAs : chr [1:10499] "hsa-miR-1185-1-3p" "hsa-miR-1185-1-3p" "hsa-miR-7849-3p" "hsa-miR-6837-3p" ...
..$ miRNAs_num: num [1:10499] 1 1 1 1 1 1 1 1 1 1 ...
..$ ratio : num [1:10499] 0.333 0.25 0.2 0.333 0.333 ...

We have a list of data.frame. Thus, we need to loop if we want to write as two separate datasets
Map(function(x, y) write.csv(x, paste0(y, ".csv"), row.names = TRUE),
res_basic, names(res_basic))
If it needs to be a single file, bind them together into a single data and then write it back
library(dplyr)
write.csv(bind_rows(res_basic, .id = 'grp'),
"ceRNA_basic_result.csv", row.names=TRUE)

Related

How to select multiple, inner list elements with lapply in R?

as a beginner I am grateful for every hint and explanation. I have a nested list with redditors (I was scraping with RedditExtractor) and their about, comments and threads.
With lapply I could select the necessary elements, that I need for further analysis.
df_raw_comments_threads <- lapply(all_authors_2019_14_content, `[`,
c("comments", "threads"))
So this worked and next I wanted to select only date_utc and subreddit,
df_test_comments_threads <- lapply(df_raw_comments_threads, `[`,
c("subreddit", "date_utc"))
which did not worked because it turned out as "NULL"
I thought I could do something like this:
lapply(df_raw_comments_threads[[x]][[i]][[c("date_utc", "subreddit"]]) since the code says: df_raw_comments_threads[["The_Wombles"]][["comments"]][["date_utc"]], though I want date_utc und subreddit for both, comments and threads and for every redditor.
I also tried:
df_test_comments_threads <- lapply(df_raw_comments_threads[[]][[i]],
function(i) "subreddit")
my_list_subset <- df_raw_comments_threads[sapply(df_raw_comments_threads,
function(x) "subreddit")]
df_test_comments_threads <- map(df_raw_comments_threads,
"date_utc")
df_test_comments_threads <- lapply(df_raw_comments_threads[[]][["threads"]][["date_utc"]])
df_test_comments_threads <- lapply(all_authors_2019_14_content, `[`,
c("date_utc", "subreddit"))
None worked and I am confused why it turned out as NULL since both redditors have entries in every section.
str(df_raw_comments_threads)
$ The_Wombles :List of 2
..$ comments:'data.frame': 1000 obs. of 12 variables:
.. ..$ url : chr [1:1000] "https://www.reddit.com/r/nursing/comments/uwk2wj/deleted_by_user/" "https://www.reddit.com/r/nursing/comments/uwk2wj/deleted_by_user/" "https://www.reddit.com/r/nursing/comments/uwk2wj/deleted_by_user/" "https://www.reddit.com/r/LifeProTips/comments/uwd8d2/lpt_let_your_daughters_paint_your_nails_have_a/" ...
.. ..$ date_utc : chr [1:1000] "2022-05-24" "2022-05-24" "2022-05-24" "2022-05-24" ...
.. ..$ timestamp : num [1:1000] 1.65e+09 1.65e+09 1.65e+09 1.65e+09 1.65e+09 ...
.. ..$ subreddit : chr [1:1000] "nursing" "nursing" "nursing" "LifeProTips" ...
.. ..$ thread_author : chr [1:1000] "[deleted]" "[deleted]" "[deleted]" "ckayfish" ...
.. ..$ comment_author: chr [1:1000] "The_Wombles" "The_Wombles" "The_Wombles" "The_Wombles" ...
.. ..$ thread_title : chr [1:1000] "[deleted by user]" "[deleted by user]" "[deleted by user]" "LPT: Let your daughters paint your nails, have a tea party with them, and help them set up a lemon-aid stand. B"| __truncated__ ...
.. ..$ comment : chr [1:1000] "Every place is going to be different. because you have EAP you may (depending on your job and the offense) be a"| __truncated__ "Get off your high horse. I suppose you have never made a mistake in your life. Not everything is so black and w"| __truncated__ "For being a sub about nursing a lot of people here don\031t seem to understand addiction." "Lol when you know you know" ...
.. ..$ score : num [1:1000] 1 3 1 1 1 1 3 51 10 3 ...
.. ..$ up : num [1:1000] 1 3 1 1 1 1 3 51 10 3 ...
.. ..$ downs : num [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
.. ..$ golds : num [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
..$ threads :'data.frame': 34 obs. of 11 variables:
.. ..$ url : chr [1:34] "https://www.reddit.com/gallery/uwivn2" "https://i.redd.it/amr5yi8acjv81.jpg" "https://www.reddit.com/r/copypasta/comments/t8rvfe/i_stopped_smoking_weed_recently_and_guy_fieri_has/" "https://m.youtube.com/watch?v=_Z7603OvpO0" ...
.. ..$ date_utc : chr [1:34] "2022-05-24" "2022-04-24" "2022-03-07" "2022-01-30" ...
.. ..$ timestamp: num [1:34] 1.65e+09 1.65e+09 1.65e+09 1.64e+09 1.64e+09 ...
.. ..$ subreddit: chr [1:34] "Miata" "stihl" "copypasta" "UnsolvedMysteries" ...
.. ..$ author : chr [1:34] "The_Wombles" "The_Wombles" "The_Wombles" "The_Wombles" ...
.. ..$ title : chr [1:34] "A Diamond in the Dirt" "This Farmboss is almost 30 years old and starts fist pull." "I stopped smoking weed recently and Guy Fieri has been attacking me in my dreams since" "In February 2018, a Toronto firefighter enjoying the last day of his ski trip in New York with friends and coll"| __truncated__ ...
.. ..$ text : chr [1:34] "" "" "I stopped smoking weed recently and Guy Fieri has been attacking me in my dreams since\n\nI wasn\031t sure wher"| __truncated__ "" ...
.. ..$ golds : num [1:34] 0 0 0 0 0 0 0 0 0 0 ...
.. ..$ score : num [1:34] 22 29 7 1 3 ...
.. ..$ ups : num [1:34] 22 29 7 1 3 ...
.. ..$ downs : num [1:34] 0 0 0 0 0 0 0 0 0 0 ...
$ Europa_Teles_BTR:List of 2
..$ comments:'data.frame': 1000 obs. of 12 variables:
.. ..$ url : chr [1:1000] "https://www.reddit.com/r/Warthunder/comments/grilnr/ww2_german_uboat_submarine_development_19331945/" "https://www.reddit.com/r/HistoriaEmPortugues/comments/grhtvq/sabia_que_madrid_chegou_a_ser_ocupada_pelos/" "https://www.reddit.com/r/Warthunder/comments/grilnr/ww2_german_uboat_submarine_development_19331945/" "https://www.reddit.com/r/Warthunder/comments/gr4zmg/ww2_history_luftwaffe_pilots_studying_the/" ...
.. ..$ date_utc : chr [1:1000] "2020-05-27" "2020-05-27" "2020-05-27" "2020-05-26" ...
.. ..$ timestamp : num [1:1000] 1.59e+09 1.59e+09 1.59e+09 1.59e+09 1.59e+09 ...
.. ..$ subreddit : chr [1:1000] "Warthunder" "HistoriaEmPortugues" "Warthunder" "Warthunder" ...
.. ..$ thread_author : chr [1:1000] "[deleted]" "fan_of_the_pikachu" "[deleted]" "Europa_Teles_BTR" ...
.. ..$ comment_author: chr [1:1000] "Europa_Teles_BTR" "Europa_Teles_BTR" "Europa_Teles_BTR" "Europa_Teles_BTR" ...
.. ..$ thread_title : chr [1:1000] "WW2 German U-boat (submarine) development, 1933-1945" "Sabia que Madrid chegou a ser ocupada pelos portugueses?" "WW2 German U-boat (submarine) development, 1933-1945" "[WW2 History] Luftwaffe pilots studying the defensive angles of hostile bombers" ...
.. ..$ comment : chr [1:1000] "In reply of u/RhodieRanger ; u/aintme_mustbeyou ; u/quietbob515\n\n&#x200B;\n\nTake in mind this post was m"| __truncated__ "Muito interessante, obrigado pela partilha!\n\n&#x200B;\n\ndevia ser partilhado com a malta do r/PORTUGALCARALHO ahah" "In the ['STARFIGHTERS UPDATE / WAR THUNDER'](https://youtu.be/_4s6xOWwXQM?t=349) video by the official War Thun"| __truncated__ "Thank you man =)" ...
.. ..$ score : num [1:1000] -1 4 -1 5 62 10 6 1 8 1 ...
.. ..$ up : num [1:1000] -1 4 -1 5 62 10 6 1 8 1 ...
.. ..$ downs : num [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
.. ..$ golds : num [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
..$ threads :'data.frame': 699 obs. of 11 variables:
.. ..$ url : chr [1:699] "https://www.reddit.com/r/quotes/comments/9ppwro/faith_is_the_sword_forged_against_fate_europa/" "https://www.reddit.com/r/EuropaTelesBTR/comments/9ppgge/5_reasons_why_you_must_quit_gaming_today/" "https://www.reddit.com/r/EuropaTelesBTR/comments/9ppgge/5_reasons_why_you_must_quit_gaming_today/" "https://www.youtube.com/watch?v=m5zN5niz7X0" ...
.. ..$ date_utc : chr [1:699] "2018-10-20" "2018-10-20" "2018-10-20" "2018-10-19" ...
.. ..$ timestamp: num [1:699] 1.54e+09 1.54e+09 1.54e+09 1.54e+09 1.54e+09 ...
.. ..$ subreddit: chr [1:699] "quotes" "StopGaming" "EuropaTelesBTR" "portugal" ...
.. ..$ author : chr [1:699] "Europa_Teles_BTR" "Europa_Teles_BTR" "Europa_Teles_BTR" "Europa_Teles_BTR" ...
.. ..$ title : chr [1:699] "\"Faith is the sword forged against fate\" - Europa_Teles_BTR" "5 REASONS WHY you must quit gaming TODAY (x-post from r/EuropaTelesBTR)" "5 REASONS WHY you must quit gaming TODAY" "Enfermeiras Pára-quedistas | Guerra colonial Portuguesa [Vídeo] (x-post de r/HistoriaEmPortugues)" ...
.. ..$ text : chr [1:699] "" "" "Playing videogames can be seen as a hobby, but it can easily build up into something very destructive and addic"| __truncated__ "" ...
.. ..$ golds : num [1:699] 0 0 0 0 0 0 0 0 0 0 ...
.. ..$ score : num [1:699] 3 12 9 14 9 70 26 1 7 17 ...
.. ..$ ups : num [1:699] 3 12 9 14 9 70 26 1 7 17 ...
.. ..$ downs : num [1:699] 0 0 0 0 0 0 0 0 0 0 ...
With the list structure clarified, package {purrr} offers a very concise solution.
library(purrr)
## example structure:
all_authors <- list(`The Wombles` = list(comments = structure(list(url = "foo-url1",
date_utc = "2022-06-07", score = 3), class = "data.frame", row.names = c(NA,
-1L)), threads = structure(list(url = "foo-url2", date_utc = "2022-04-07",
title = "title wombles"), class = "data.frame", row.names = c(NA,
-1L))), `The Who` = list(comments = structure(list(url = "foo-url3",
date_utc = "2022-06-17", score = 3), class = "data.frame", row.names = c(NA,
-1L)), threads = structure(list(url = c("foo-url4", "foo-url5"
), date_utc = c("2022-03-12", "2015-02-04"), title = c("title who",
"title2 who")), class = "data.frame", row.names = c(NA, -2L))))
map2_dfr(
all_authors |> map('comments', ~ pmap(.x, c('url', 'date_utc'))),
all_authors |> map('threads', ~ pmap(.x, c('url', 'author', 'title'))),
~ list(comments = .x, threads = .y)
)
output:
# A tibble: 3 x 2
comments$url $date_utc $score threads$url $date_utc $title
<chr> <chr> <dbl> <chr> <chr> <chr>
1 foo-url1 2022-06-07 3 foo-url2 2022-04-07 title wombles
2 foo-url3 2022-06-17 3 foo-url4 2022-03-12 title who
3 foo-url3 2022-06-17 3 foo-url5 2015-02-04 title2 who
>
Since you're using package {purrr} anyway, you could apply map variants:
example data structure (you forgot to dput one):
all_authors <- list(The_Wombles = list(about = list(), comments = structure(list(
url = c("url1", "url2"), date_utc = c("2022-05-24",
"2022-05-25")), class = "data.frame", row.names = c(NA, -2L
))), Talking_Foo = list(about = list(), comments = structure(list(
url = c("url3", "url4"), date_utc = c("2022-05-24",
"2022-05-25")), class = "data.frame", row.names = c(NA, -2L
))))
use map:
library(dplyr) ## provides `select`
all_authors |>
map(~ .x$comments |> select(url, date_utc))
(where ~is short for the function you apply to each list element and .x is short for the list item currently fed into the function: ~ .x is equivalent to function(x){x})
output:
+ $The_Wombles
url date_utc
1 url1 2022-05-24
2 url2 2022-05-25
$Talking_Foo
url date_utc
1 url3 2022-05-24
2 url4 2022-05-25
or imap_dfr (dfr = dataframe row) to obtain a dataframe:
all_authors |>
imap_dfr(~ list(author =.y ,
comments = .x$comments |>
select(url, date_utc))
)
(where .y is short for the name, and .x for the content of the list item currently being mapped)
output:
# A tibble: 4 x 2
author comments$url $date_utc
<chr> <chr> <chr>
1 The_Wombles url1 2022-05-24
2 The_Wombles url2 2022-05-25
3 Talking_Foo url3 2022-05-24
4 Talking_Foo url4 2022-05-25

Get elements of a list with the same name in R

I have a list of elements where several elements have the same name (i.e. Name or S)
This a part of the list:
> str(MKtf)
List of 160
$ Name : chr "S09489500"
$ S : num 0
$ Var : num 34147
$ Z : num 0
$ Significance: chr "X"
$ Name : chr "S09489499"
$ S : num -1
$ Var : num 4957
$ Z : num 0
$ Significance: chr "X"
$ Name : chr "S09511300"
$ S : num 1
$ Var : num 11890
$ Z : num 0
$ Significance: chr "X"
$ Name : chr "S09498400"
$ S : num 0
$ Var : num 7367
$ Z : num 0
$ Significance: chr "X"
$ Name : chr "S09498500"
$ S : num 0
$ Var : num 134177
How is possible to access all the elements of the list with the same name? for example I want to get all the elements ($S) of the list
You can use :
MKtf[names(MKtf) == "S"]
Using a reproducible example :
MKtf <- list(S = 1:4, B = 2:3, S = 4:5, B = 21:23)
MKtf[names(MKtf) == "S"]
#$S
#[1] 1 2 3 4
#$S
#[1] 4 5

Building a table/dataframe/something exportable from Desc function output in R

I'm definitely a noob, though I have used R for various small tasks for several years.
For the life of me, I cannot figure out how to get the results from the "Desc" function into something I can work with. When I save the x<-Desc(mydata) the class(x) shows up as "Desc." In R studio it is under Values and says "List of 1." Then when I click on x it says ":List of 25" in the first line. There is a list of data in this object, but I cannot for the life of me figure out how to grab any of it.
Clearly I have a severe misunderstanding of the R data structures, but I have been searching for the past 90 minutes to no avail so figured I would reach out.
In short, I just want to pull certain aspects (N, mean, UB, LB, median) of the descriptive statistics provided from the Desc results for multiple datasets and build a little table that I can then work with.
Thanks for the help.
Say you have a dataframe, x, where:
x <- data.frame(i=c(1,2,3),j=c(4,5,6))
You could set:
desc.x <- Desc(x)
And access the info on any given column like:
desc.x$i
desc.x$i$mead
desc.x$j$sd
And any other stats Desc comes up with. The $ is the key here, it's how you access the named fields of the list that Desc returns.
Edit: In case you pass a single column (as the asker does), or simply a vector to Desc, you are then returned a 1 item list. The same principle applies but the usual syntax is different. Now you would use:
desc.x <- Desc(df$my.col)
desc.x[[1]]$mean
In the future, the way to attack this is to either look in the environment window in RStudio and play around trying to figure out how to access the fields, check the source code on github or elsewhere, or (best first choice) use str(desc.x), which gives us:
> str(desc.x)
List of 1
$ :List of 25
..$ xname : chr "data.frame(i = c(1, 2, 3), j = c(4, 5, 6))$i"
..$ label : NULL
..$ class : chr "numeric"
..$ classlabel: chr "numeric"
..$ length : int 3
..$ n : int 3
..$ NAs : int 0
..$ main : chr "data.frame(i = c(1, 2, 3), j = c(4, 5, 6))$i (numeric)"
..$ unique : int 3
..$ 0s : int 0
..$ mean : num 2
..$ meanSE : num 0.577
..$ quant : Named num [1:9] 1 1.1 1.2 1.5 2 2.5 2.8 2.9 3
.. ..- attr(*, "names")= chr [1:9] "min" ".05" ".10" ".25" ...
..$ range : num 2
..$ sd : num 1
..$ vcoef : num 0.5
..$ mad : num 1.48
..$ IQR : num 1
..$ skew : num 0
..$ kurt : num -2.33
..$ small :'data.frame': 3 obs. of 2 variables:
.. ..$ val : num [1:3] 1 2 3
.. ..$ freq: num [1:3] 1 1 1
..$ large :'data.frame': 3 obs. of 2 variables:
.. ..$ val : num [1:3] 3 2 1
.. ..$ freq: num [1:3] 1 1 1
..$ freq :Classes ‘Freq’ and 'data.frame': 3 obs. of 5 variables:
.. ..$ level : Factor w/ 3 levels "1","2","3": 1 2 3
.. ..$ freq : int [1:3] 1 1 1
.. ..$ perc : num [1:3] 0.333 0.333 0.333
.. ..$ cumfreq: int [1:3] 1 2 3
.. ..$ cumperc: num [1:3] 0.333 0.667 1
..$ maxrows : num 12
..$ x : num [1:3] 1 2 3
- attr(*, "class")= chr "Desc"
"List of 1" means you access it by desc.x[[1]], and below that follow the $s. When you see something like num[1:3] that means it's an atomic vector so you access the first member like var$field$numbers[1]

Convert Nested List into data.frame with different column length

I was trying to convert below nested list into data.frame but without luck. There are a few complications, mainly the column "results" of position 1 is inconsistent with position 2, as there is no result in position 2.
item length inconsistent across different positions
[[1]]
[[1]]$html_attributions
list()
[[1]]$results
geometry.location.lat geometry.location.lng
1 25.66544 -100.4354
id place_id
1 6ce0a030663144c8e992cbce51eb00479ef7db89 ChIJVy7b7FW9YoYRdaH2I_gOJIk
reference
1 CmRSAAAATdtVfB4Tz1aQ8GhGaw4-nRJ5lZlVNgiOR3ciF4QjmYC56bn6b7omWh1SJEWWqQQEFNXxGZndgEwSgl8sRCOtdF8aXpngUY878Q__yH4in8EMZMCIqSHLARqNgGlV4mKgEhDlvkHLXLiBW4F_KQVT83jIGhS5DJipk6PAnpPDXP2p-4X5NPuG9w
[[1]]$status
[1] "OK"
[[2]]
[[2]]$html_attributions
list()
[[2]]$results
list()
[[2]]$status
[1] "ZERO_RESULTS"
I tried the following codes but they aint' working.
#1
m1 <- do.call(rbind, lapply(myDataFrames, function(y) do.call(rbind, y)))
relist(m1, skeleton = myDataFrames)
#2
relist(matrix(unlist(myDataFrames), ncol = 4, byrow = T), skeleton = myDataFrames)
#3
library(data.table)
df<-rbindlist(myDataFrames, idcol = "index")
df<-rbindlist(myDataFrames, fill=TRUE)
#4
myDataFrame <- do.call(rbind.data.frame, c(myDataFrames, list(stringsAsFactors = FALSE)))
I think I have enough of the original JSON to be able to create a reproducible example:
okjson <- '{"html_attributions":[],"results":[{"geometry":{"location":{"lat":25.66544,"lon":-100.4354},"id":"foo","place_id":"quux"}}],"status":"OK"}'
emptyjson <- '{"html_attributions":[],"results":[],"status":"ZERO_RESULTS"}'
jsons <- list(okjson, emptyjson, okjson)
From here, I'll step (slowly) through the process. I've included much of the intermediate structure for reproducibility, I apologize for the verbosity. This can easily be grouped together and/or put within a magrittr pipeline.
lists <- lapply(jsons, jsonlite::fromJSON)
str(lists)
# List of 3
# $ :List of 3
# ..$ html_attributions: list()
# ..$ results :'data.frame': 1 obs. of 1 variable:
# .. ..$ geometry:'data.frame': 1 obs. of 3 variables:
# .. .. ..$ location:'data.frame': 1 obs. of 2 variables:
# .. .. .. ..$ lat: num 25.7
# .. .. .. ..$ lon: num -100
# .. .. ..$ id : chr "foo"
# .. .. ..$ place_id: chr "quux"
# ..$ status : chr "OK"
# $ :List of 3
# ..$ html_attributions: list()
# ..$ results : list()
# ..$ status : chr "ZERO_RESULTS"
# $ :List of 3
# ..$ html_attributions: list()
# ..$ results :'data.frame': 1 obs. of 1 variable:
# .. ..$ geometry:'data.frame': 1 obs. of 3 variables:
# .. .. ..$ location:'data.frame': 1 obs. of 2 variables:
# .. .. .. ..$ lat: num 25.7
# .. .. .. ..$ lon: num -100
# .. .. ..$ id : chr "foo"
# .. .. ..$ place_id: chr "quux"
# ..$ status : chr "OK"
goodlists <- Filter(function(a) "results" %in% names(a) && length(a$results) > 0, lists)
goodresults <- lapply(goodlists, `[[`, "results")
str(goodresults)
# List of 2
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ geometry:'data.frame': 1 obs. of 3 variables:
# .. ..$ location:'data.frame': 1 obs. of 2 variables:
# .. .. ..$ lat: num 25.7
# .. .. ..$ lon: num -100
# .. ..$ id : chr "foo"
# .. ..$ place_id: chr "quux"
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ geometry:'data.frame': 1 obs. of 3 variables:
# .. ..$ location:'data.frame': 1 obs. of 2 variables:
# .. .. ..$ lat: num 25.7
# .. .. ..$ lon: num -100
# .. ..$ id : chr "foo"
# .. ..$ place_id: chr "quux"
goodresultsdf <- lapply(goodresults, function(a) jsonlite::flatten(as.data.frame(a)))
str(goodresultsdf)
# List of 2
# $ :'data.frame': 1 obs. of 4 variables:
# ..$ geometry.id : chr "foo"
# ..$ geometry.place_id : chr "quux"
# ..$ geometry.location.lat: num 25.7
# ..$ geometry.location.lon: num -100
# $ :'data.frame': 1 obs. of 4 variables:
# ..$ geometry.id : chr "foo"
# ..$ geometry.place_id : chr "quux"
# ..$ geometry.location.lat: num 25.7
# ..$ geometry.location.lon: num -100
We now have a list-of-data.frames, a good place to be.
do.call(rbind.data.frame, c(goodresultsdf, stringsAsFactors = FALSE))
# geometry.id geometry.place_id geometry.location.lat geometry.location.lon
# 1 foo quux 25.66544 -100.4354
# 2 foo quux 25.66544 -100.4354

How to effectively flatten nested lists and dataframes into a single dataframe?

I have some data that is formatted in a way that's difficult to use, so I'm trying to flatten it out. The minimum reproducible example is here.
> str(sampleData)
List of 4
$ Events :'data.frame': 2 obs. of 3 variables:
..$ CateringOptions:List of 2
.. ..$ :'data.frame': 1 obs. of 3 variables:
.. .. ..$ Agreed : logi TRUE
.. .. ..$ Tnc :'data.frame': 1 obs. of 5 variables:
.. .. .. ..$ Identity : chr "SpicyOWing"
.. .. .. ..$ Schema : logi NA
.. .. .. ..$ ElementId : chr "105031"
.. .. .. ..$ ElementType : logi NA
.. .. .. ..$ ElementVersion: logi NA
.. .. ..$ Address: chr "New York"
.. ..$ :'data.frame': 1 obs. of 3 variables:
.. .. ..$ Agreed : logi TRUE
.. .. ..$ Tnc :'data.frame': 1 obs. of 5 variables:
.. .. .. ..$ Identity : chr "BaconEggs"
.. .. .. ..$ Schema : logi NA
.. .. .. ..$ ElementId : chr "105032"
.. .. .. ..$ ElementType : logi NA
.. .. .. ..$ ElementVersion: logi NA
.. .. ..$ Address: chr "Seattle"
..$ Action : num [1:2] 1 1
..$ Volume : num [1:2] 1000 2000
$ Host :List of 5
..$ Identity : chr "John"
..$ Schema : logi NA
..$ ElementId : chr "101505"
..$ ElementType : logi NA
..$ ElementVersion: logi NA
$ Sender :List of 5
..$ Identity : chr "Jane"
..$ Schema : logi NA
..$ ElementId : chr "101005"
..$ ElementType : logi NA
..$ ElementVersion: logi NA
$ CompletedDate: chr "/Date(1490112000000)/"
Expected
> expectedOutcome
Events.CateringOptions.Agreed Events.CateringOptions.Tnc.Identity Events.CateringOptions.Tnc.Schema Events.CateringOptions.Tnc.ElementId
1 NA SpicyOWing TRUE 105031
2 NA BaconEggs TRUE 105032
Events.CateringOptions.Tnc.ElementType Events.CateringOptions.Tnc.ElementVersion Events.CateringOptions.Address Events.Action Events.Volume Host.Identity
1 NA NA New York 1 1000 John
2 NA NA Seattle 1 2000 John
Host.Schema Host.ElementId Host.ElementType Host.ElementVersion Sender.Identity Sender.Schema Sender.ElementId Sender.ElementType Sender.ElementVersion
1 NA 101505 NA NA Jane NA 101005 NA NA
2 NA 101505 NA NA Jane NA 101005 NA NA
CompletedDate
1 /Date(1490112000000)/
2 /Date(1490112000000)/
The check function
check<-function(li){
areDF<-sapply(1:length(li), function(i) class(li[[i]]) == "data.frame")
areList<-sapply(1:length(li), function(i) class(li[[i]]) == "list")
tmp1 <- NULL
tmp2 <- NULL
if(any(areDF)){
for(j in which(areDF)){
columns <- jsonlite::flatten(li[[j]])
li[[j]] <- check(columns)
}
tmp1<-plyr::rbind.fill(li[areDF])
#return(tmp1)
}
if(any(areList)){
for(j in which(areList)){
li[[j]]<-check(li[[j]])
}
tmp2<-do.call(cbind,li)
#return(tmp2)
}
if(!is.null(tmp1) & !is.null(tmp2)){
return (cbind(tmp1,tmp2))
}
else if(!is.null(tmp1)){
return (tmp1)
}
else if(!is.null(tmp2)){
return (tmp2)
}
return(li)
}
Results
> str(check(sampleData))
'data.frame': 2 obs. of 29 variables:
$ CateringOptions.Agreed : logi TRUE TRUE
$ CateringOptions.Address : chr "New York" "Seattle"
$ CateringOptions.Tnc.Identity : chr "SpicyOWing" "BaconEggs"
$ CateringOptions.Tnc.Schema : logi NA NA
$ CateringOptions.Tnc.ElementId : chr "105031" "105032"
$ CateringOptions.Tnc.ElementType : logi NA NA
$ CateringOptions.Tnc.ElementVersion : logi NA NA
$ Action : num 1 1
$ Volume : num 1000 2000
$ Events.CateringOptions.Agreed : logi TRUE TRUE
$ Events.CateringOptions.Address : chr "New York" "Seattle"
$ Events.CateringOptions.Tnc.Identity : chr "SpicyOWing" "BaconEggs"
$ Events.CateringOptions.Tnc.Schema : logi NA NA
$ Events.CateringOptions.Tnc.ElementId : chr "105031" "105032"
$ Events.CateringOptions.Tnc.ElementType : logi NA NA
$ Events.CateringOptions.Tnc.ElementVersion: logi NA NA
$ Events.Action : num 1 1
$ Events.Volume : num 1000 2000
$ Host.Identity : Factor w/ 1 level "John": 1 1
$ Host.Schema : logi NA NA
$ Host.ElementId : Factor w/ 1 level "101505": 1 1
$ Host.ElementType : logi NA NA
$ Host.ElementVersion : logi NA NA
$ Sender.Identity : Factor w/ 1 level "Jane": 1 1
$ Sender.Schema : logi NA NA
$ Sender.ElementId : Factor w/ 1 level "101005": 1 1
$ Sender.ElementType : logi NA NA
$ Sender.ElementVersion : logi NA NA
$ CompletedDate : Factor w/ 1 level "/Date(1490112000000)/": 1 1
I almost have it, but the nested dataframe is being duped. Also, my code takes fairly long. Does anyone have any idea how I can go about flattening this?
Edit:
I added my solution in the end in the gist
Here is my take at it, with help from purrr.
The idea is similar to yours, only with a different syntax: flatten() the most nested dataframes, then rbind() them.
If I understand your code properly, mine is slightly different at the end, since I'll try to get a more "jsonlite::flatten-friendly" structure to apply it once more to the end result:
library(jsonlite)
library(purrr)
res <-
sampleData %>%
modify_if(
is.list,
.f = ~ modify_if(
.x,
.p = function(x) all(sapply(x, is.data.frame)),
.f = ~ do.call("rbind", lapply(.x, jsonlite::flatten))
)
) %>%
as.data.frame() %>%
jsonlite::flatten()
str(res)
# 'data.frame': 2 obs. of 20 variables:
# $ Events.Action : num 1 1
# $ Events.Volume : num 1000 2000
# $ Host.Identity : chr "John" "John"
# $ Host.Schema : logi NA NA
# $ Host.ElementId : chr "101505" "101505"
# $ Host.ElementType : logi NA NA
# $ Host.ElementVersion : logi NA NA
# $ Sender.Identity : chr "Jane" "Jane"
# $ Sender.Schema : logi NA NA
# $ Sender.ElementId : chr "101005" "101005"
# $ Sender.ElementType : logi NA NA
# $ Sender.ElementVersion : logi NA NA
# $ CompletedDate : chr "/Date(1490112000000)/" "/Date(1490112000000)/"
# $ Events.CateringOptions.Agreed : logi TRUE TRUE
# $ Events.CateringOptions.Address : chr "New York" "Seattle"
# $ Events.CateringOptions.Tnc.Identity : chr "SpicyOWing" "BaconEggs"
# $ Events.CateringOptions.Tnc.Schema : logi NA NA
# $ Events.CateringOptions.Tnc.ElementId : chr "105031" "105032"
# $ Events.CateringOptions.Tnc.ElementType : logi NA NA
# $ Events.CateringOptions.Tnc.ElementVersion: logi NA NA
I've got one mismatch with your expectedOutcome but if I may, it might be on your side:
all.equal(expectedOutcome[sort(names(expectedOutcome))], res[sort(names(res))])
# [1] "Component “Events.CateringOptions.Agreed”: 'is.NA' value mismatch: 0 in current 2 in target"
Not sure if this over-simplifies your problem, but with the sample you shared, it seems to work. Basically, if the column is not already a vector when you do data.frame(your_list), it unlists the data and makes a matrix.
FLAT <- function(inlist) {
A <- data.frame(inlist)
out <- lapply(A, function(y) {
if (is.list(y)) {
y <- unlist(y)
m <- matrix(y, nrow(A), byrow = TRUE, dimnames = list(NULL, unique(names(y))))
y <- data.frame(m, stringsAsFactors = FALSE)
y[] <- lapply(y, type.convert)
}
y
})
do.call(cbind, out)
}
FLAT(sampleData)
Here's the str on your sample data:
str(FLAT(sampleData))
## 'data.frame': 2 obs. of 20 variables:
## $ Events.CateringOptions.Agreed : logi TRUE TRUE
## $ Events.CateringOptions.Tnc.Identity : Factor w/ 2 levels "BaconEggs","SpicyOWing": 2 1
## $ Events.CateringOptions.Tnc.Schema : logi NA NA
## $ Events.CateringOptions.Tnc.ElementId : int 105031 105032
## $ Events.CateringOptions.Tnc.ElementType : logi NA NA
## $ Events.CateringOptions.Tnc.ElementVersion: logi NA NA
## $ Events.CateringOptions.Address : Factor w/ 2 levels "New York","Seattle": 1 2
## $ Events.Action : num 1 1
## $ Events.Volume : num 1000 2000
## $ Host.Identity : Factor w/ 1 level "John": 1 1
## $ Host.Schema : logi NA NA
## $ Host.ElementId : Factor w/ 1 level "101505": 1 1
## $ Host.ElementType : logi NA NA
## $ Host.ElementVersion : logi NA NA
## $ Sender.Identity : Factor w/ 1 level "Jane": 1 1
## $ Sender.Schema : logi NA NA
## $ Sender.ElementId : Factor w/ 1 level "101005": 1 1
## $ Sender.ElementType : logi NA NA
## $ Sender.ElementVersion : logi NA NA
## $ CompletedDate : Factor w/ 1 level "/Date(1490112000000)/": 1 1

Resources