convert named list with mixed content to data frame - r

Is there a better, cleaner way to convert a named list with mixed content to a data frame?
The working example:
my_list <- list("a" = 1.0, "b" = "foo", "c" = TRUE)
my_df <- data.frame(
"key" = names(my_list),
stringsAsFactors = FALSE
)
my_df[["value"]] <- unname(my_list)
Is it possible to do this conversion in one step?

We can use stack from base R (note that the mixed values are coerced to a single type, here character)
stack(my_list)
According to ?stack
The stack function is used to transform data available as separate columns in a data frame or list into a single column that can be used in an analysis of variance model or other linear model. The unstack function reverses this operation.
Or with enframe
library(tidyverse)
enframe(my_list) %>% # creates the 'value' as a `list` column
mutate(value = map(value, as.character)) %>% # change to single type
unnest(cols = value) # recent tidyr versions expect the columns to be named explicitly
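Both approaches above end up with every value as character. If the original types matter, a one-step base R alternative is to keep value as a list column by wrapping it in I(). This is only a minimal sketch; the column names key and value are taken from the question.

```r
my_list <- list("a" = 1.0, "b" = "foo", "c" = TRUE)

# I() stops data.frame() from splitting the list into separate columns,
# so 'value' becomes a single list column with one element per row
my_df <- data.frame(
  key = names(my_list),
  value = I(unname(my_list)),
  stringsAsFactors = FALSE
)

str(my_df)
```

Each element of my_df$value keeps its own type, at the cost of a list column being less convenient in later vectorised code.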

You can use dplyr::as_tibble to coerce the list into a data frame / tibble. This creates a one-row data frame in which the list's names become the column names and each list item becomes the value in its column.
library(dplyr)
library(tidyr)
my_list <- list("a" = 1.0, "b" = "foo", "c" = TRUE)
as_tibble(my_list)
#> # A tibble: 1 x 3
#> a b c
#> <dbl> <chr> <lgl>
#> 1 1 foo TRUE
To reshape into the two-column format you have, pipe it into tidyr::gather, whose default column names are key and value. Because the values have different data types, they are all coerced to character in the value column.
as_tibble(my_list) %>%
gather()
#> # A tibble: 3 x 2
#> key value
#> <chr> <chr>
#> 1 a 1
#> 2 b foo
#> 3 c TRUE
Created on 2018-11-09 by the reprex package (v0.2.1)

Related

Extracting JSON data with asymmetric content from a dataframe column in R

I loaded a table from a database which contains a column that has JSON data in each row.
The table looks something like the example below. (I was not able to replicate the data.frame I have, due to the format of the column data)
dataframe_example <- data.frame(id = c(1,2,3),
name = c("name1","name2","name3"),
JSON_col = c('{"_inv": [10,20,30,40], "_person": {"_personid": "green"}}',
'{"_inv": [15,22], "_person": {"_personid": "blue"}}',
'{"_inv": [], "_person": {"_personid": "red"}}'))
I have the following two issues:
Some of the items (e.g. "_inv") sometimes have the full 4 numeric entries, sometimes less, and sometimes nothing. Some of the other items (e.g. "_person") usually contain another header, but only one character data point.
My goal is to preserve the existing data frame's columns (such as id and name) and spread the data in the JSON column so that I have new columns containing each piece of information. The target data frame would look a little like this:
data.frame(id = c(1,2,3),
name = c("name1","name2","name3"),
`_inv_1` = c(10,15,NA),
`_inv_2` = c(20,22,NA),
`_inv_3` = c(30,NA,NA),
`_inv_4` = c(40,NA,NA),
`_person_id` = c("green","blue","red"),
check.names = FALSE) # keep the leading underscores in the column names
Please bear in mind that I have very little experience handling JSON data and no experience dealing with uneven JSON data.
Using purrr I got:
frame <- purrr::map(dataframe_example$JSON_col, jsonlite::fromJSON)
This gave me a large list with n elements, where n is the number of rows in the original data frame. The "Name" item contains n lists [[1]], each one with its own type of object, ranging from double to data.frame. The double objects contain up to four numeric observations (such as _inv); some of the objects are lists themselves (such as _person), which in turn contain "_personid" and then a single entry. The data frame contains the datetime stamps for each observation in the JSON data (each _inv item has a timestamp).
Is there a way to obtain the solution above, either by extracting the data from my "frame" object, or an altogether different solution?
library(tidyverse)
library(jsonlite)
#>
#> Attaching package: 'jsonlite'
#> The following object is masked from 'package:purrr':
#>
#> flatten
dataframe_example <-
data.frame(
id = c(1, 2, 3),
name = c("name1", "name2", "name3"),
JSON_col = c(
"{\"_inv\": [10,20,30,40], \"_person\": {\"_personid\": \"green\"}}",
"{\"_inv\": [15,22], \"_person\": {\"_personid\": \"blue\"}}",
"{\"_inv\": [], \"_person\": {\"_personid\": \"red\"}}"
)
)
dataframe_example %>%
as_tibble() %>%
mutate(
JSON_col = JSON_col %>% map(parse_json)
) %>%
unnest_wider(JSON_col) %>%
unnest(`_inv`) %>%
unnest(`_inv`) %>%
unnest(`_person`) %>%
unnest(`_person`) %>%
group_by(id, name) %>%
mutate(inv_id = row_number()) %>%
pivot_wider(names_from = inv_id, values_from = `_inv`, names_prefix = "_inv_")
#> # A tibble: 2 x 7
#> # Groups: id, name [2]
#> id name `_person` `_inv_1` `_inv_2` `_inv_3` `_inv_4`
#> <dbl> <chr> <chr> <int> <int> <int> <int>
#> 1 1 name1 green 10 20 30 40
#> 2 2 name2 blue 15 22 NA NA
Created on 2021-11-25 by the reprex package (v2.0.1)
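Note that the output above has only two rows: id 3, whose "_inv" array is empty, is missing, whereas the target in the question keeps it with NA values. A base R sketch that keeps every row is to pad each "_inv" vector with NA up to the longest length. It assumes the JSON has already been parsed into a list; the hand-written list parsed below is a hypothetical stand-in for that parsed result.

```r
# Hypothetical stand-in for the parsed JSON column (one element per row)
parsed <- list(
  list(`_inv` = c(10, 20, 30, 40), `_person` = list(`_personid` = "green")),
  list(`_inv` = c(15, 22),         `_person` = list(`_personid` = "blue")),
  list(`_inv` = numeric(0),        `_person` = list(`_personid` = "red"))
)
base <- data.frame(id = c(1, 2, 3), name = c("name1", "name2", "name3"),
                   stringsAsFactors = FALSE)

inv <- lapply(parsed, function(p) as.numeric(p[["_inv"]]))
n_max <- max(lengths(inv))

# Extending a vector with `length<-` fills the new positions with NA;
# vapply() returns one column per row of data, so transpose afterwards
inv_mat <- t(vapply(inv, function(v) { length(v) <- n_max; v }, numeric(n_max)))
colnames(inv_mat) <- paste0("_inv_", seq_len(n_max))

result <- data.frame(
  base,
  inv_mat,
  `_person_id` = vapply(parsed, function(p) p[["_person"]][["_personid"]],
                        character(1)),
  check.names = FALSE,   # keep the leading underscores in the names
  stringsAsFactors = FALSE
)
result
```

The number of "_inv_" columns is taken from the longest array in the data, so nothing is hard-coded to four entries.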

filter data frame based on multiple dynamic conditions, which depend on a subset of the data, e.g. via applying a loop

I have a data frame with hundreds of names and hundreds of values per name. Now I want to filter out some of the values based on a mathematical rule that is applied only to a certain subset of the data. A simplified example would be filtering out the max value for each name.
I can hard code it as shown below, but would love to avoid it.
library(dplyr)
##
names <- c('A', 'A', 'B', 'B')
values <- c(1,2,3,4)
df <- data.frame(names, values)
##
df %>%
  filter(names != 'A' | values != max(subset(df, names == 'A')$values),
         names != 'B' | values != max(subset(df, names == 'B')$values))
Desired output:
names values
1 A 1
2 B 3
I would consider creating a loop within a dplyr filter, that calculates the max value per name and then applies both conditions within the filter, if possible.
Filtering out max value for each name:
df %>%
group_by(names) %>%
filter(values != max(values))
# # A tibble: 2 x 2
# # Groups: names [2]
# names values
# <chr> <dbl>
# 1 A 1
# 2 B 3
Or if you mean removing the max values per name from the entire data frame, whenever they occur:
df %>%
group_by(names) %>%
slice_max(values) %>%
select(values) %>%
anti_join(df, ., by = "values")
# # A tibble: 2 x 2
# # Groups: names [2]
# names values
# <chr> <dbl>
# 1 A 1
# 2 B 3
An option in base R
subset(df, values != ave(values, names, FUN = max))
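To see why the base R one-liner works: ave() returns a vector as long as values in which each entry is replaced by the result of FUN for its group, so the comparison is done row by row against the group's own maximum. A small sketch using the question's data (the names group_max and kept are only illustrative):

```r
df <- data.frame(names = c("A", "A", "B", "B"),
                 values = c(1, 2, 3, 4),
                 stringsAsFactors = FALSE)

# One group maximum per row, aligned with df
group_max <- ave(df$values, df$names, FUN = max)
group_max

# Keep the rows that are not their group's maximum
kept <- df[df$values != group_max, ]
kept
```

Any other per-group rule (for example median or a quantile) can be swapped in through FUN without writing a loop.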

Map readr::type_convert to specific columns only

readr::type_convert guesses the class of each column in a data frame. I would like to apply type_convert to only some columns in a data frame (to preserve other columns as character). MWE:
# A data frame with multiple character columns containing numbers.
df <- data.frame(A = letters[1:10],
B = as.character(1:10),
C = as.character(1:10))
# This works
df %>% type_convert()
Parsed with column specification:
cols(
A = col_character(),
B = col_double(),
C = col_double()
)
A B C
1 a 1 1
2 b 2 2
...
However, I would like to only apply the function to column B (this is a stylised example; there may be multiple columns to try and convert). I tried using purrr::map_at as well as sapply, as follows:
# This does not work
map_at(df, "B", type_convert)
Error in .f(.x[[i]], ...) : is.data.frame(df) is not TRUE
# This does not work
sapply(df["B"], type_convert)
Error in FUN(X[[i]], ...) : is.data.frame(df) is not TRUE
Is there a way to apply type_convert selectively to only some columns of a data frame?
Edit: @ekoam provides an answer for type_convert. However, applying this answer to many columns would be tedious. It might be better to use the base R function type.convert (from utils), which can be mapped:
purrr::map_at(df, "B", type.convert, as.is = TRUE) %>%
bind_cols()
# A tibble: 10 x 3
A B C
<chr> <int> <chr>
1 a 1 1
2 b 2 2
Try this:
df %>% type_convert(cols(B = "?", C = "?", .default = "c"))
This guesses the types of B and C; any other character column stays as is. The tricky part is that type_convert only touches character columns, so a column that is not of character type is also left as is. If you really need type_convert to re-guess such a column, you may first have to convert it to character.
type_convert does not seem to support this directly. One trick I have used a few times is a combination of select and bind_cols, as shown below (note that the converted columns end up first in the result).
df %>%
select(B) %>%
type_convert() %>%
bind_cols(df %>% select(-B))
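The type.convert idea from the question's edit can also be written in base R alone, which keeps the original column order and scales to any number of columns. A minimal sketch, where cols_to_convert is a hypothetical name for the columns you want re-guessed:

```r
df <- data.frame(A = letters[1:10],
                 B = as.character(1:10),
                 C = as.character(1:10),
                 stringsAsFactors = FALSE)

# Hypothetical selection of columns whose type should be guessed
cols_to_convert <- c("B")

# as.is = TRUE keeps non-numeric strings as character instead of factor
df[cols_to_convert] <- lapply(df[cols_to_convert], type.convert, as.is = TRUE)

str(df)
```

Here B becomes integer while A and C stay character.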

extract identically named vectors from nested lists, where the list names vary? Using purrr?

I have to work with some data that is in recursive lists like this (simplified reproducible example below):
groups
#> $group1
#> $group1$countries
#> [1] "USA" "JPN"
#>
#>
#> $group2
#> $group2$countries
#> [1] "AUS" "GBR"
Code for data input below:
chars <- c("USA", "JPN")
chars2 <- c("AUS", "GBR")
group1 <- list(countries = chars)
group2 <- list(countries = chars2)
groups <- list(group1 = group1, group2 = group2)
groups
I'm trying to work out how to extract the vectors that are in the lists, without manually having to write a line of code for each group. The code below works, but my example has a large number of groups (and the number of groups will change), so it would be great to work out how to extract all of the vectors in a more efficient manner. This is the brute force way, that works:
countries1 <- groups$group1$countries
countries2 <- groups$group2$countries
In the example, the bottom level vector I'm trying to extract is always called countries, but the lists they're contained in change name, varying only by numbering.
Would there be an easy purrr solution? Or tidyverse solution? Or other solution?
Add some additional cases to your list
groups[["group3"]] <- list()
groups[["group4"]] <- list(foo = letters[1:2])
groups[["group5"]] <- list(foo = letters[1:2], countries = LETTERS[1:2])
Here's a function that maps any list to just the element named "countries"; it returns NULL if there is no such element
fun = function(x)
x[["countries"]]
Map your original list to contain just the elements you're interested in
interesting <- Map(fun, groups)
Then transform these into a data.frame using a combination of unlist() and rep()
df <- data.frame(
country = unlist(interesting, use.names = FALSE),
name = rep(names(interesting), lengths(interesting))
)
Alternatively, use tidy syntax, e.g.,
interesting %>%
tibble(group = names(.), value = .) %>%
unnest("value")
The output is
# A tibble: 6 x 2
group value
<chr> <chr>
1 group1 USA
2 group1 JPN
3 group2 AUS
4 group2 GBR
5 group5 A
6 group5 B
If there are additional problems parsing individual elements of groups, then modify fun, e.g.,
fun = function(x)
as.character(x[["countries"]])
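For the simple case, the helper fun above is equivalent to indexing every group with `[[`, which lapply can do directly. The sketch below rebuilds a small version of groups with two of the extra cases and then drops the groups that have no "countries" element.

```r
groups <- list(
  group1 = list(countries = c("USA", "JPN")),
  group2 = list(countries = c("AUS", "GBR")),
  group3 = list(),
  group4 = list(foo = letters[1:2])
)

# `[[` with a missing name returns NULL for a list, just like fun()
interesting <- lapply(groups, `[[`, "countries")

# Drop the groups that had no "countries" element
interesting <- Filter(Negate(is.null), interesting)
interesting
```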
This will put the output in a list which will handle any number of groups
countries <- unlist(groups, recursive = FALSE)
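# \D+ matches only the non-digit prefix, so multi-digit suffixes such as "group12" are kept whole
# (a greedy \w+ would swallow all but the last digit)
names(countries) <- sub("^\\D+(\\d+)\\.(\\w+)", "\\2\\1", names(countries), perl = TRUE)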
> countries
$countries1
[1] "USA" "JPN"
$countries2
[1] "AUS" "GBR"
You can simply transform your nested list to a data.frame and then unnest the country column.
library(dplyr)
library(tidyr)
groups %>%
tibble(group = names(groups),
country = .) %>%
unnest(country) %>%
unnest(country)
#> # A tibble: 4 x 2
#> group country
#> <chr> <chr>
#> 1 group1 USA
#> 2 group1 JPN
#> 3 group2 AUS
#> 4 group2 GBR
Created on 2020-01-15 by the reprex package (v0.3.0)
Since the countries are hidden 2 layers deep, you have to run unnest twice. Otherwise I think this is straightforward.
If you actually want to have each vector as an object in your global environment, a combination of purrr::map2/walk and list2env will work. To make this work, we have to give the country entries in the list individual names first; otherwise list2env just overwrites the same object over and over again.
library(purrr)
groups <-
map2(groups, seq_along(groups), ~ setNames(.x, paste0(names(.x), .y)))
walk(groups, ~ list2env(.x, envir = .GlobalEnv))
This would create the exact same results you are describing in your question. I am not sure though, if it is the best solution for a smooth workflow, since I don't know where you are going with this.

Accessing (attributes of) a list of variables from a data frame based on a vector

I have a (big) data frame with variables which each have a comment attribute.
# Basic sample data
df <- data.frame(a = 1:5, b = 5:1, c = 5:9, d = 9:5, e = 1:5)
comment(df$a) <- "Some explanation"
comment(df$b) <- "Some description"
comment(df$c) <- "etc."
I would like to extract the comment attributes for some of those variables, as well as a list of possible values.
So I start by defining the list of variables I want to extract:
variables_to_extract = c("a", "b", "e")
I would normally work on a subset of the data frame, but then I cannot access the attributes (e.g., comment) nor the list of possible values of each variable.
library(tidyverse)
df %>% select(one_of(variables_to_extract)) %>% comment()
# accesses only the 'comment' attribute of the whole data frame (df), hence NULL
I also tried to access through df[[variables_to_extract]], but it generates an error...
df[[variables_to_extract]]
# Error: Recursive Indexing failed at level 2
I wanted to extract everything into a data frame, but because of the recursive indexing error, it doesn't work.
meta <- data.frame(variable = variables_to_extract,
description = comment(df[[variables_to_extract]]),
values = df[[variables_to_extract]] %>%
unique() %>% na.omit() %>% sort() %>% paste(collapse = ", "))
# Error: Recursive Indexing failed at level 2
Since a data.frame is a list, you can use lapply or purrr::map to apply a function (e.g. comment) to each vector it contains:
library(tidyverse)
df %>% select(one_of(variables_to_extract)) %>% map(comment) # in base R, lapply(df[variables_to_extract], comment)
#> $a
#> [1] "Some explanation"
#>
#> $b
#> [1] "Some description"
#>
#> $e
#> NULL
To put it in a data.frame,
tibble(variable = variables_to_extract, # tibble() replaces the deprecated data_frame()
# iterate over variable, subset df and if NULL replace with NA; collapse to chr
description = map_chr(variable, ~comment(df[[.x]]) %||% NA_character_),
values = map(variable, ~df[[.x]] %>% unique() %>% sort()))
#> # A tibble: 3 x 3
#> variable description values
#> <chr> <chr> <list>
#> 1 a Some explanation <int [5]>
#> 2 b Some description <int [5]>
#> 3 e <NA> <int [5]>
This leaves values as a list column, which is usually more useful, but if you'd rather, add in toString to collapse it and use map_chr to simplify.
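For comparison, the collapsed variant can be sketched in base R alone, with vapply in place of map_chr and toString doing the collapsing. The data are the question's sample data; meta is the name the question used for the result.

```r
df <- data.frame(a = 1:5, b = 5:1, c = 5:9, d = 9:5, e = 1:5)
comment(df$a) <- "Some explanation"
comment(df$b) <- "Some description"
comment(df$c) <- "etc."
variables_to_extract <- c("a", "b", "e")

meta <- data.frame(
  variable = variables_to_extract,
  # comment() returns NULL when no comment is set, so substitute NA
  description = vapply(variables_to_extract, function(v) {
    cm <- comment(df[[v]])
    if (is.null(cm)) NA_character_ else cm
  }, character(1), USE.NAMES = FALSE),
  # sorted unique values collapsed into one comma-separated string
  values = vapply(variables_to_extract, function(v) {
    toString(sort(unique(df[[v]])))
  }, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
meta
```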
We can use Map from base R
Map(comment, df[variables_to_extract])
#$a
#[1] "Some explanation"
#$b
#[1] "Some description"
#$e
#NULL
