Build JSON from nested dataframe in R

I'm struggling to build a JSON document.
I have a dataframe with 3 rows and two columns: "id" (a list of ids) and "text" (tweets).
df$id= c(78198310004451, 78198310004451, 88198310004453)
df$text = c("I love you", "I just got married!", "I just got a new job!")
There are also four other fixed variables whose values are static:
Models = c(1:7)
orgId= 1
and two attributes of the Twitter id:
include_outcome: logi FALSE
twitterId = 70051429
I pulled down the template JSON and converted it into a dataframe (see below) for an example of 3 tweets.
I cannot figure out how to generate such a structure from the pre-existing dataframe I mentioned above, which I will then convert to JSON with toJSON.
List of 3
$ Models: num [1:7] 1 2 3 4 5 6 7
$ orgId : num 1
$ userData :List of 1
..$ :List of 3
.. ..$ tweets :List of 3
.. .. ..$ :List of 2
.. .. .. ..$ text: chr "I love you"
.. .. .. ..$ id : num 78198310004451
.. .. ..$ :List of 2
.. .. .. ..$ text: chr "I just got married!"
.. .. .. ..$ id : num 78198310004452
.. .. ..$ :List of 2
.. .. .. ..$ text: chr "I just got a new job!"
.. .. .. ..$ id : num 88198310004453
.. ..$ twitterId : num 70051429
.. ..$ include_outcome: logi FALSE
Here's the dput output:
list(Models = c(1, 2, 3, 4, 5, 6, 7), orgId = 1, userData = list(
list(tweets = list(list(text = "I love you", id = 78198310004451),
list(text = "I just got married!", id = 78198310004452),
list(text = "I just got a new job!", id = 88198310004453)),
twitterId = 70051429, include_outcome = FALSE)))
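One way to get from the flat data frame to this template is to build the nested list by hand and pass it to jsonlite::toJSON. The following is only a minimal sketch under some assumptions: it relies on the jsonlite package, it copies the df values exactly as given in the question, and the names tweets, payload and json are illustrative rather than part of any template.

```r
library(jsonlite)

# The data frame from the question (values copied as given there).
df <- data.frame(
  id = c(78198310004451, 78198310004451, 88198310004453),
  text = c("I love you", "I just got married!", "I just got a new job!"),
  stringsAsFactors = FALSE
)

# One list(text, id) per row; lapply() returns an unnamed list,
# which toJSON() serialises as a JSON array.
tweets <- lapply(seq_len(nrow(df)), function(i) {
  list(text = df$text[i], id = df$id[i])
})

# Wrap the per-user block in an extra list() so userData becomes an array.
payload <- list(
  Models = as.numeric(1:7),
  orgId = 1,
  userData = list(
    list(tweets = tweets, twitterId = 70051429, include_outcome = FALSE)
  )
)

# auto_unbox = TRUE turns length-1 vectors into scalars;
# digits = NA asks for maximum precision for the large numeric ids.
json <- toJSON(payload, auto_unbox = TRUE, digits = NA, pretty = TRUE)
cat(json, "\n")
```

Running str(payload) should show the same nesting as the template. Whether the receiving API expects the ids as numbers or as strings is not stated in the question, so that part may need adjusting.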

Related

R: Loaded tweets structure is untidy when str()

Unlike my colleague, when I load the tweets in R and inspect the structure with str(), the output is messy, with a lot of dots, rather than organized as a table, which is what my colleague sees on their computer even though the code is the same. I can't figure out what the problem is: we have the same packages installed and the same R version.
library(rtweet)
library(ggplot2)
library(dplyr)
library(tibble)
library(tidytext)
library(stringr)
library(stringi)
library(igraph)
library(ggraph)
library(readr)
library(lubridate)
library(zoo)
appname <- ""
key <- ""
secret <- ""
twitter_token <- create_token( app = "", consumer_key = "", consumer_secret = "", access_token = "", access_secret = "")
tweets <- search_tweets(q = "#water + #climatechange", n = 10000, lang = "en", include_rts = FALSE)
str(tweets)
.. ..$ media :'data.frame': 1 obs. of 11 variables:
.. .. ..$ id : num 1.57e+18
.. .. ..$ id_str : chr "1573815153484759040"
.. .. ..$ indices :List of 1
.. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
.. .. .. .. ..$ start: int 241
.. .. .. .. ..$ end : int 264
.. .. .. ..- attr(*, "class")= chr "AsIs"
.. .. ..$ media_url : chr "http://pbs.twimg.com/media/FddQiy2WAAAl59Q.jpg"
.. .. ..$ media_url_https: chr "https://pbs.twimg.com/media/FddQiy2WAAAl59Q.jpg"
.. .. ..$ url : chr "https
.. .. ..$ display_url : chr "pic.twitter.com/iFJTkF1S9S"
.. .. ..$ expanded_url : chr "https://twitter.com/TreeBanker/status/1573815156768968706/photo/1"
.. .. ..$ type : chr "photo"
.. .. ..$ sizes :List of 1
.. .. .. ..$ :'data.frame': 4 obs. of 4 variables:
.. .. .. .. ..$ w : int [1:4] 1096 680 150 1096
.. .. .. .. ..$ h : int [1:4] 733 455 150 733
.. .. .. .. ..$ resize: chr [1:4] "fit" "fit" "crop" "fit"
.. .. .. .. ..$ type : chr [1:4] "large" "small" "thumb" "medium"
.. .. ..$ ext_alt_text : logi NA
..$ :List of 5
.. ..$ media :'data.frame': 1 obs. of 11 variables:
.. .. ..$ id : num 1.57e+18
.. .. ..$ id_str : chr "1573815153484759040"
.. .. ..$ indices :List of 1
.. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
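Whatever the cause of the difference turns out to be, deeply nested output like this can be made easier to scan by limiting how far str() descends. The sketch below uses a small made-up object (nested is hypothetical and only loosely mimics the media entry above), not real rtweet output.

```r
# Hypothetical stand-in for one nested tweet entry.
nested <- list(
  id_str = "1573815153484759040",
  media = list(
    sizes = data.frame(w = c(1096L, 680L), h = c(733L, 455L))
  )
)

# max.level = 1 prints only the top level instead of every nested field.
str(nested, max.level = 1)

# A compact overview of what each top-level element is.
top_level <- vapply(nested, function(x) class(x)[1], character(1))
print(top_level)
```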

R - How to build a nested list without duplicated sub-list declaration?

Is there a standard approach to building nested lists whose sub-lists share the same structure, so that the structure is not declared repeatedly?
Dummy example structure:
a "top" list (wordsList), containing one sub-list (languageType)
this sub-list contains "sub-sub-lists" (french, english, falty), each of which contains another list with the same structure:
sentenceType, whose elements are each a sub-list (leBien, leMal)
How can such a structure be built without duplication (that is, by assigning the sentenceType list to each element of languageType)?
Example expected output:
wordsList <- list(
languageType = list(
french = list(
sentenceType = list(
leBien = c("Bien", "le", "bonjour", "messieurs", "dames"),
leMal = c("Mal", "la", "bonsoir", "mesdames", "sieurs")
)
),
english = list(
sentenceType = list(
leBien = c("Well", "hello", "my", "good", "sire", "or", "madame"),
leMal = c("Bad", "goodbye", "your", "bad", "madame", "and", "sire")
)
),
falty = list(
sentenceType = list(
leBien = c(1:10),
leMal = list("this", "is", 0, 5, "not correct !")
)
)
)
)
str(wordsList)
List of 1
$ languageType:List of 3
..$ french :List of 1
.. ..$ sentenceType:List of 2
.. .. ..$ leBien: chr [1:5] "Bien" "le" "bonjour" "messieurs" ...
.. .. ..$ leMal : chr [1:5] "Mal" "la" "bonsoir" "mesdames" ...
..$ english:List of 1
.. ..$ sentenceType:List of 2
.. .. ..$ leBien: chr [1:7] "Well" "hello" "my" "good" ...
.. .. ..$ leMal : chr [1:7] "Bad" "goodbye" "your" "bad" ...
..$ falty :List of 1
.. ..$ sentenceType:List of 2
.. .. ..$ leBien: int [1:10] 1 2 3 4 5 6 7 8 9 10
.. .. ..$ leMal :List of 5
.. .. .. ..$ : chr "this"
.. .. .. ..$ : chr "is"
.. .. .. ..$ : num 0
.. .. .. ..$ : num 5
.. .. .. ..$ : chr "not correct !"
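One possible way to avoid repeating the sentenceType declaration is a small constructor function. This is a sketch rather than an established idiom; make_language is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper: wraps two word collections in the shared
# sentenceType structure so the nesting is declared only once.
make_language <- function(leBien, leMal) {
  list(sentenceType = list(leBien = leBien, leMal = leMal))
}

wordsList <- list(
  languageType = list(
    french = make_language(
      c("Bien", "le", "bonjour", "messieurs", "dames"),
      c("Mal", "la", "bonsoir", "mesdames", "sieurs")
    ),
    english = make_language(
      c("Well", "hello", "my", "good", "sire", "or", "madame"),
      c("Bad", "goodbye", "your", "bad", "madame", "and", "sire")
    ),
    falty = make_language(
      1:10,
      list("this", "is", 0, 5, "not correct !")
    )
  )
)

str(wordsList)
```

If the structure changes later (for example, a third sentence type), only make_language needs to be edited.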

handling lists in lists to Dataframe in R

I'm new to R and I'm having trouble handling a list and transforming it into a dataframe.
I have a list "ddt"
str(ddt)
List of 4
$ id : chr "18136"
$ comments.data:List of 3
..$ :List of 3
.. ..$ timestamp: chr "2020-05-25T16:17:32+0000"
.. ..$ text : chr "Mocaaa"
.. ..$ id : chr "18096"
..$ :List of 3
.. ..$ timestamp: chr "2020-05-25T16:00:00+0000"
.. ..$ text : chr "Capucchino"
.. ..$ id : chr "17846"
..$ :List of 3
.. ..$ timestamp: chr "2020-05-25T14:42:53+0000"
.. ..$ text : chr "Mocachino"
.. ..$ id : chr "18037"
$ id : chr "17920"
$ comments.data:List of 1
..$ :List of 3
.. ..$ timestamp: chr "2020-05-24T15:31:30+0000"
.. ..$ text : chr "Hello"
.. ..$ id : chr "18054"
And I need this result:
id timestamp text id2
1 18136 2020-05-25T16:17:32+0000 Mocaaa 18096
2 18136 2020-05-25T16:00:00+0000 Capucchino 17846
3 18136 2020-05-25T14:42:53+0000 Mocachino 18037
4 17920 2020-05-24T15:31:30+0000 Hello 18054
I think this can be done well with data.table.
set.seed(42)
df <- replicate(2, list(id = sample(1e5, 1), comments = replicate(3, list(tm = as.character(Sys.time() + sample(10, 1)), text = sample(LETTERS, 1), id = sample(1e5, 1)), simplify = FALSE)), simplify = FALSE)
str(df)
# List of 2
# $ :List of 2
# ..$ id : int 91481
# ..$ comments:List of 3
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:08"
# .. .. ..$ text: chr "H"
# .. .. ..$ id : int 83045
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:05"
# .. .. ..$ text: chr "N"
# .. .. ..$ id : int 73659
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:00"
# .. .. ..$ text: chr "R"
# .. .. ..$ id : int 70507
# $ :List of 2
# ..$ id : int 45775
# ..$ comments:List of 3
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:06"
# .. .. ..$ text: chr "Y"
# .. .. ..$ id : int 25543
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:03"
# .. .. ..$ text: chr "Y"
# .. .. ..$ id : int 97823
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:00"
# .. .. ..$ text: chr "M"
# .. .. ..$ id : int 56034
One thing we'll have to contend with is that you have id on the top-level as well as internally within each list.
library(data.table)
library(magrittr) # for %>%, demonstrative only, can be done without
data.table::rbindlist(df) %>%
.[, comments := lapply(comments, as.data.table) ] %>%
# we have a duplicate name 'id', rename in the inner ones
.[, comments := lapply(comments, setnames, "id", "innerid") ] %>%
.[, unlist(comments, recursive = FALSE), by = seq_len(nrow(.)) ]
# seq_len tm text innerid
# 1: 1 2020-05-26 14:49:21 H 83045
# 2: 2 2020-05-26 14:49:18 N 73659
# 3: 3 2020-05-26 14:49:13 R 70507
# 4: 4 2020-05-26 14:49:19 Y 25543
# 5: 5 2020-05-26 14:49:16 Y 97823
# 6: 6 2020-05-26 14:49:13 M 56034
I suspect that by=seq_len(nrow(.)) will not scale well to larger data. Since Rdatatable/data.table#3672 is still open, an alternative is to replace the last line (including unlist and seq_len) with just %>% tidyr::unnest(comments). Combining data.table and tidyr is at times contentious, but I suggest that this non-partisan approach capitalizes on the strengths of both.
The structure looks just like a JavaScript object.
You could do:
library(jsonlite)
library(tidyr)
unnest(unnest(fromJSON(toJSON(df))))
# A tibble: 6 x 4
id tm text id1
<int> <chr> <chr> <int>
1 92345 2020-05-26 14:53:53 X 6730
2 92345 2020-05-26 14:53:56 Q 92812
3 92345 2020-05-26 14:53:56 D 25304
4 9847 2020-05-26 14:53:56 E 82734
5 9847 2020-05-26 14:54:01 I 75079
6 9847 2020-05-26 14:54:02 H 89373
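Both answers above work on simulated data. For the alternating id / comments.data layout shown in the question, a base R sketch could look like the following. It swaps in plain lapply and rbind instead of data.table or tidyr, and it assumes every id element is followed by exactly one comments.data element; ddt is rebuilt here from the str() output in the question.

```r
# The list from the question, rebuilt from its str() output.
ddt <- list(
  id = "18136",
  comments.data = list(
    list(timestamp = "2020-05-25T16:17:32+0000", text = "Mocaaa", id = "18096"),
    list(timestamp = "2020-05-25T16:00:00+0000", text = "Capucchino", id = "17846"),
    list(timestamp = "2020-05-25T14:42:53+0000", text = "Mocachino", id = "18037")
  ),
  id = "17920",
  comments.data = list(
    list(timestamp = "2020-05-24T15:31:30+0000", text = "Hello", id = "18054")
  )
)

# Split the alternating elements by name.
ids <- unlist(ddt[names(ddt) == "id"], use.names = FALSE)
comments <- unname(ddt[names(ddt) == "comments.data"])

# One data frame per post; the outer id is recycled across its comments,
# and the inner id is renamed to id2 to avoid the name clash.
per_post <- lapply(seq_along(ids), function(i) {
  rows <- lapply(comments[[i]], function(cm) {
    data.frame(timestamp = cm$timestamp, text = cm$text, id2 = cm$id,
               stringsAsFactors = FALSE)
  })
  data.frame(id = ids[i], do.call(rbind, rows), stringsAsFactors = FALSE)
})

result <- do.call(rbind, per_post)
rownames(result) <- NULL
print(result)
```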

Insert elements into a list based on depth and `if` conditions using modify_depth and modify_if (purrr)

I'm learning some purrr commands, specifically the modify_* family of functions. I'm attempting to add price bins to items found in a grocery store (see below for my attempt and error code).
library(tidyverse)
Data
easybuy <- list(
"5520 N Division St, Spokane, WA 99208, USA",
list("bananas", "oranges"),
canned = list("olives", "fish", "jam"),
list("pork", "beef"),
list("hammer", "tape")
) %>%
map(list) %>%
# name the sublists
set_names(c("address",
"fruit",
"canned",
"meat",
"other")) %>%
# except for address, names the sublists "items"
modify_at(c(2:5), ~ set_names(.x, "items"))
Take a peek:
glimpse(easybuy)
#> List of 5
#> $ address:List of 1
#> ..$ : chr "5520 N Division St, Spokane, WA 99208, USA"
#> $ fruit :List of 1
#> ..$ items:List of 2
#> .. ..$ : chr "bananas"
#> .. ..$ : chr "oranges"
#> $ canned :List of 1
#> ..$ items:List of 3
#> .. ..$ : chr "olives"
#> .. ..$ : chr "fish"
#> .. ..$ : chr "jam"
#> $ meat :List of 1
#> ..$ items:List of 2
#> .. ..$ : chr "pork"
#> .. ..$ : chr "beef"
#> $ other :List of 1
#> ..$ items:List of 2
#> .. ..$ : chr "hammer"
#> .. ..$ : chr "tape"
My Attempt
Idea: go to a depth of two, look for "items", and append a "price". I'm not sure if I can nest the modify functions like this.
easybuy %>%
modify_depth(2, ~ modify_at(., "items", ~ append("price")))
#> Error: character indexing requires a named object
Desired
I would like the following structure (note the addition of "price" under each item):
List of 5
$ address:List of 1
..$ : chr "5520 N Division St, Spokane, WA 99208, USA"
$ fruit :List of 1
..$ items:List of 2
.. ..$ :List of 2
.. .. ..$ : chr "bananas"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "oranges"
.. .. ..$ : chr "price"
$ canned :List of 1
..$ items:List of 3
.. ..$ :List of 2
.. .. ..$ : chr "olives"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "fish"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "jam"
.. .. ..$ : chr "price"
$ meat :List of 1
..$ items:List of 2
.. ..$ :List of 2
.. .. ..$ : chr "pork"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "beef"
.. .. ..$ : chr "price"
$ other :List of 1
..$ items:List of 2
.. ..$ :List of 2
.. .. ..$ : chr "hammer"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "tape"
.. .. ..$ : chr "price"
This seems to work. The map_if call with the predicate function(x) !is.null(names(x)) makes sure the change only happens if the element's names are not NULL. ~modify_depth(.x, 2, function(y) list(y, "price")) creates the list you need.
library(tidyverse)
easybuy2 <- easybuy %>%
map_if(function(x) !is.null(names(x)),
~modify_depth(.x, 2, function(y) list(y, "price")))
Here is what the second item looks like.
easybuy2[[2]][[1]]
# [[1]]
# [[1]][[1]]
# [1] "bananas"
#
# [[1]][[2]]
# [1] "price"
#
#
# [[2]]
# [[2]][[1]]
# [1] "oranges"
#
# [[2]][[2]]
# [1] "price"
Alternatively, this also works.
easybuy3 <- easybuy %>%
modify_at(2:5, ~modify_depth(.x, 2, function(y) list(y, "price")))
identical(easybuy2, easybuy3)
# [1] TRUE
Update
easybuy4 <- easybuy %>%
map_if(function(x){
name <- names(x)
if(is.null(name)){
return(FALSE)
} else {
return(name %in% "items")
}
},
~modify_depth(.x, 2, function(y) list(y, "price")))
identical(easybuy2, easybuy4)
# [1] TRUE
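For comparison, the same transformation can be sketched in base R without purrr. This is a different technique from the answers above (plain lapply instead of map_if and modify_depth); easybuy is written out literally here in the shape shown by glimpse(), and add_price is a hypothetical helper name.

```r
# easybuy written out literally, matching the structure shown by glimpse().
easybuy <- list(
  address = list("5520 N Division St, Spokane, WA 99208, USA"),
  fruit   = list(items = list("bananas", "oranges")),
  canned  = list(items = list("olives", "fish", "jam")),
  meat    = list(items = list("pork", "beef")),
  other   = list(items = list("hammer", "tape"))
)

# Hypothetical helper: if a section has an "items" element, pair every
# item with "price"; otherwise return the section unchanged.
add_price <- function(section) {
  if ("items" %in% names(section)) {
    section$items <- lapply(section$items, function(item) list(item, "price"))
  }
  section
}

easybuy_base <- lapply(easybuy, add_price)
str(easybuy_base$fruit)
```

The address element has no names, so "items" %in% names(section) is FALSE for it and it passes through untouched.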

Extracting elements of nested list by contained value using purrr

I have a nested list where each nested list has the same kinds of elements, but not in the same order. The elements are not named explicitly, but each has a name value within the list.
As you can see in the structure, the list containing the 'Date' field appears at the second position in the first list and the third position in the second list, so I can't extract by position.
I would like to extract the lists whose name is "Date" and keep the value associated with each, using the purrr package.
STRUCTURE
dplyr::glimpse(my_list)
List of 2
$ :List of 10
..$ :List of 2
.. ..$ name : chr "MIME-Version"
.. ..$ value: chr "1.0"
..$ :List of 2
.. ..$ name : chr "Date"
.. ..$ value: chr "Wed, 13 Feb 2019 15:20:40 -0800"
..$ :List of 2
.. ..$ name : chr "References"
.. ..$ value: chr "<CAE1g-C7AmC3zoJRG_UgdwwkkSiJMuEuDYLU1j4ni0MZJXNrGNQ#mail.gmail.com>"
..$ :List of 2
.. ..$ name : chr "In-Reply-To"
.. ..$ value: chr "<CAE1g-C7AmC3zoJRG_UgdwwkkSiJMuEuDYLU1j4ni0MZJXNrGNQ#mail.gmail.com>"
..$ :List of 2
.. ..$ name : chr "Message-ID"
.. ..$ value: chr "<CAPApPh+WZszKg_bjQBPrS8TOvLA23hQkaa9Hocb_cgrQYs2R1w#mail.gmail.com>"
..$ :List of 2
.. ..$ name : chr "Subject"
.. ..$ value: chr "Re:"
..$ :List of 2
.. ..$ name : chr "From"
.. ..$ value: chr ""
..$ :List of 2
.. ..$ name : chr "To"
.. ..$ value: chr ""
..$ :List of 2
.. ..$ name : chr "Cc"
.. ..$ value: chr ""
..$ :List of 2
.. ..$ name : chr "Content-Type"
.. ..$ value: chr "multipart/alternative; boundary=\"000000000000f3d8810581cec99d\""
$ :List of 7
..$ :List of 2
.. ..$ name : chr "MIME-Version"
.. ..$ value: chr "1.0"
..$ :List of 2
.. ..$ name : chr "Message-ID"
.. ..$ value: chr "<CAPApPh+THbnCg2e5WmKEwQwHEEjKDHq3V6LkYV9oL88DbHE9Pg#mail.gmail.com>"
..$ :List of 2
.. ..$ name : chr "Date"
.. ..$ value: chr "Wed, 13 Feb 2019 12:18:32 -0800"
..$ :List of 2
.. ..$ name : chr "Subject"
.. ..$ value: chr ""
..$ :List of 2
.. ..$ name : chr "From"
.. ..$ value: chr "Daniel Seneca <senecad#gene.com>"
..$ :List of 2
.. ..$ name : chr "To"
.. ..$ value: chr "Daniel Seneca <seneca.daniel#gene.com>"
..$ :List of 2
.. ..$ name : chr "Content-Type"
.. ..$ value: chr "multipart/mixed; boundary=\"000000000000f11ad10581cc3e85\""
DATA
my_list <-list(list(list(name = "MIME-Version", value = "1.0"), list(name = "Date", value = "Wed, 13 Feb 2019 15:20:40 -0800"), list(name = "References", value = "<CAE1g-C7AmC3zoJRG_UgdwwkkSiJMuEuDYLU1j4ni0MZJXNrGNQ#mail.gmail.com>"), list(name = "In-Reply-To", value = "<CAE1g-C7AmC3zoJRG_UgdwwkkSiJMuEuDYLU1j4ni0MZJXNrGNQ#mail.gmail.com>"),list(name = "Message-ID", value = "<CAPApPh+WZszKg_bjQBPrS8TOvLA23hQkaa9Hocb_cgrQYs2R1w#mail.gmail.com>"), list(name = "Subject", value = "Re:"), list(name = "From", value = ""),list(name = "To", value = ""),list(name = "Cc", value = ""),list(name = "Content-Type", value = "multipart/alternative; boundary=\"000000000000f3d8810581cec99d\"")),
list(list(name = "MIME-Version", value = "1.0"), list(name = "Message-ID",value = "<CAPApPh+THbnCg2e5WmKEwQwHEEjKDHq3V6LkYV9oL88DbHE9Pg#mail.gmail.com>"),list(name = "Date",value = "Wed, 13 Feb 2019 12:18:32 -0800"), list(name = "Subject", value = ""), list(name = "From",value = "Daniel Seneca <senecad#gene.com>"), list(name = "To", value = "Daniel Seneca <seneca.daniel#gene.com>"),list(name = "Content-Type", value = "multipart/mixed; boundary=\"000000000000f11ad10581cc3e85\"")))
The question does not define what the output is supposed to look like so we will assume that it should be a list of lists. Remove the first layer using flatten as shown and then filter its elements using keep.
library(purrr)
my_list %>%
flatten %>%
keep(~ .x$name == "Date")
In base R this could be written:
Filter(function(x) x$name == "Date", do.call("c", my_list))
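If only the date strings are wanted, one per message, the filter can be applied inside each message instead of flattening first. The sketch below is base R and assumes every message contains exactly one "Date" entry; my_list is a shortened copy of the data from the question, and date_values is an illustrative name.

```r
# Shortened copy of the question's data: the Date entry sits at a
# different position in each message.
my_list <- list(
  list(
    list(name = "MIME-Version", value = "1.0"),
    list(name = "Date", value = "Wed, 13 Feb 2019 15:20:40 -0800"),
    list(name = "Subject", value = "Re:")
  ),
  list(
    list(name = "MIME-Version", value = "1.0"),
    list(name = "Message-ID", value = "<CAPApPh+THbnCg2e5WmKEwQwHEEjKDHq3V6LkYV9oL88DbHE9Pg#mail.gmail.com>"),
    list(name = "Date", value = "Wed, 13 Feb 2019 12:18:32 -0800")
  )
)

# For each message, find the entry named "Date" and return its value.
date_values <- vapply(my_list, function(headers) {
  is_date <- vapply(headers, function(h) identical(h$name, "Date"), logical(1))
  headers[is_date][[1]]$value
}, character(1))

print(date_values)
```

A message without a Date entry would make this fail with a subscript error, so real data may need a guard for that case.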
