Split dataframe into multiple dataframes and save in environment

This is a follow-up to this question:
split into multiple subset of dataframes with dplyr:group_by?
Reproducible example:
test <- data.frame(a = c(1,1,1,2,2,2,3,3,3), b = c(1:9))
I'm interested in how to save the dataframes from the following output:
test %>%
group_by(a) %>%
nest() %>%
select(data) %>%
unlist(recursive = F)
as separate dataframes in the environment? The desired output is the following:
data1 <- data.frame(a = c(1,1,1), b = c(1:3))
data2 <- data.frame(a = c(2,2,2), b = c(4:6))
data3 <- data.frame(a = c(3,3,3), b = c(7:9))
There are many groups, so this needs to be automated to produce the dataframes data1, data2, data3, ..., data(n).

If you want the dataframe names to be created automatically as well, you could try something like this:
test <- data.frame(a = c(1,1,1,2,2,2,3,3,3), b = c(1:9))
test
n <- length(unique(test$a))
# build and evaluate one "dataK <- <group K>" assignment per group
eval(parse(text = paste0("data", seq_len(n), " <- ", split(test, test$a))))
# convert each dataK back to a data frame (use n here, not a hard-coded 3)
eval(parse(text = paste0("data", seq_len(n), " <- as.data.frame(data", seq_len(n), ")")))
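A possible alternative that avoids building code as text is to name the pieces returned by split() and hand them to list2env(). This is only a sketch: the names pieces and target_env are made up for illustration, and a fresh environment is used so the example does not touch your workspace (swap in .GlobalEnv if that is really where the dataframes should go).

```r
test <- data.frame(a = c(1, 1, 1, 2, 2, 2, 3, 3, 3), b = 1:9)

# one data frame per value of a
pieces <- split(test, test$a)

# reset row names so each piece matches the desired output exactly
pieces <- lapply(pieces, function(d) {
  rownames(d) <- NULL
  d
})

# data1, data2, ..., one name per group
names(pieces) <- paste0("data", seq_along(pieces))

# hypothetical target environment; use .GlobalEnv to create global objects
target_env <- new.env()
list2env(pieces, envir = target_env)

ls(target_env)
```

Keeping the dataframes in the named list pieces is often enough on its own, since pieces$data2 or pieces[["data2"]] gives the same object without creating many loose variables.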

Related

How to mutate the same variables across two or more dataframes?

I am looking to mutate the same variables with two or more dataframes. What is the best way to implement to reduce redundant code?
library(dplyr)
df1 <- tibble(a = 0.125068, b = 0.144623)
df2 <- tibble(a = 0.226018, b = 0.423600)
df1 <- df1 %>%
mutate(a = round(a, 1),
b = round(b, 2))
df2 <- df2 %>%
mutate(a = round(a, 1),
b = round(b, 2))
It may be useful to put the dataframes in a list first:
my_dfs <- list(df1, df2)
Then use a loop-apply function like lapply:
lapply(my_dfs, \(x) mutate(x, a = round(a, 1),
b = round(b, 2)))
If we really need the dataframes in the global environment, instead of in a dedicated list, we can name the list elements and pass the result to list2env(), which requires a named list:
lapply(my_dfs, \(x) mutate(x, a = round(a, 1),
b = round(b, 2))) |>
setNames(c("df1", "df2")) |>
list2env(envir = .GlobalEnv)
You could also write a function:
rnd <- function(x) {
x %>%
mutate(a = round(a, 1),
b = round(b, 2))
}
df1 %>% rnd()

Can you pipe data into a pairwise.t.test?

I'm wondering if the following code can be simplified to allow the data to be piped directly from the summarise command to the pairwise.t.test, without creating the intermediary object?
data_for_PTT <- data %>%
group_by(subj, TT) %>%
summarise(meanRT = mean(RT))
pairwise.t.test(x = data_for_PTT$meanRT, g = data_for_PTT$TT, paired = TRUE)
I tried x = .$meanRT, but that failed with:
Error in match.arg(p.adjust.method) :
'arg' must be NULL or a character vector
You can use curly braces, which stop %>% from also inserting the data as the first argument:
data_for_PTT <- data %>%
group_by(subj, TT) %>%
summarise(meanRT = mean(RT)) %>%
{pairwise.t.test(x = .$meanRT, g = .$TT, paired = TRUE)}
Reproducible example:
library(magrittr) # provides %>%
df <- data.frame(X1 = runif(1000), X2 = runif(1000), subj = rep(c("A", "B")))
df %>%
{pairwise.t.test(.$X1, .$subj, paired = TRUE)}
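Another option, sketched here with base R only, is to pipe into with(), which evaluates the call inside the data frame so the columns can be used by name. The example assumes R 4.1 or later for the native |> pipe; the seed and the names res_piped and res_direct are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed so the example is repeatable
df <- data.frame(X1 = runif(1000), subj = rep(c("A", "B"), times = 500))

# with() receives df as its first argument and evaluates the call inside it,
# so X1 and subj are found as columns of df
res_piped <- df |> with(pairwise.t.test(X1, subj, paired = TRUE))

# the same test written without a pipe, for comparison
res_direct <- pairwise.t.test(df$X1, df$subj, paired = TRUE)

all.equal(res_piped$p.value, res_direct$p.value)
```

The same with() call should also work after %>%, because there the data frame is likewise passed as the first argument.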

Rules to change words

I have a dataframe like this:
df <- data.frame(id = c(1,2), keywords = c("google, yahoo, air, cookie", "cookie, air"))
I would like to implement rules like the following:
stocks <- c("google, yahoo")
climate <- c("air")
cuisine <- c("cookie")
and get a result like this:
df_ne <- data.frame(id = c(1,2), keywords = c("stocks, climate, cuisine", "climate, cuisine"))
How can I do this?
You can use str_replace_all from the stringr package:
library(dplyr)
library(stringr)
df <- data.frame(id = c(1,2), keywords = c("google, yahoo, air, cookie", "cookie, air"))
df %>%
mutate(keywords = str_replace_all(keywords,
c("google, yahoo" = "stocks","air" = "climate", "cookie" = "cuisine")))
I liked cholland's answer (+1), but you can also use tidytext::unnest_tokens(), which is going to be easier, IMHO, if you're going to have many more than six words.
First, create a mapping df:
mapped <- rbind(data.frame(word_a = stocks, type = "stock", stringsAsFactors = FALSE),
data.frame(word_a = climate, type = "climate", stringsAsFactors = FALSE),
data.frame(word_a = cuisine, type = "cuisine", stringsAsFactors = FALSE))
Now you can use that function to unnest both dataframes and join them to reach the goal:
library(tidytext)
library(stringr)
library(tidyverse)
mapped <- mapped %>% unnest_tokens(word, word_a)
df %>%
unnest_tokens(word, keywords) %>% # split words
left_join(mapped) %>% # join to map
group_by(id) %>% # group
summarise(keywords = str_c(unique(type), collapse = ",")) # collapse the word (unique)
# A tibble: 2 x 2
     id keywords
  <dbl> <chr>
1     1 stock,climate,cuisine
2     2 cuisine,climate
Note that the second row lists the words in the opposite order from your expected output, because that is the order in which the corresponding keywords appear in the original df.
With data:
df <- data.frame(id = c(1,2), keywords = c("google, yahoo, air, cookie", "cookie, air"), stringsAsFactors = F)
stocks <- c("google, yahoo")
climate <- c("air")
cuisine <- c("cookie")
Here is a naïve solution to start with:
key <- list(
stocks = c("google", "yahoo"),
climate = "air",
cuisine = "cookie"
)
df2 <- df
# replace each keyword with the name of its category
for (k in seq_along(key)) {
for (sk in key[[k]]) {
df2$keywords <- gsub(sk, names(key)[k], df2$keywords, fixed = TRUE)
}
}
# remove duplicated items and collapse each row back into a single string
df2$keywords <- sapply(strsplit(df2$keywords, ", "), function(l) paste(unique(l), collapse = ", "))
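The same idea can also be written with a named lookup vector, which matches whole keywords instead of substrings. This is a minimal base R sketch; lookup and recode_keywords are hypothetical names introduced here, and keywords without a rule are left unchanged by assumption.

```r
df <- data.frame(id = c(1, 2),
                 keywords = c("google, yahoo, air, cookie", "cookie, air"),
                 stringsAsFactors = FALSE)

# hypothetical lookup: keyword -> category
lookup <- c(google = "stocks", yahoo = "stocks", air = "climate", cookie = "cuisine")

recode_keywords <- function(x, lookup) {
  # split one comma-separated string into its keywords
  words <- strsplit(x, ",\\s*")[[1]]
  # map known keywords to their category, keep unknown ones as they are
  mapped <- ifelse(words %in% names(lookup), lookup[words], words)
  # drop repeated categories and rebuild the string
  paste(unique(mapped), collapse = ", ")
}

df$keywords <- vapply(df$keywords, recode_keywords, character(1),
                      lookup = lookup, USE.NAMES = FALSE)
df$keywords
# "stocks, climate, cuisine" "cuisine, climate"
```

Because each keyword is looked up as a whole word, a rule for "air" would not touch a keyword such as "airline", which the gsub() loop with fixed = TRUE would rewrite.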

How to flatten a list with tibbles and tibbles within lists to have all tibbles on the same level?

I have a list where the list elements are tibbles or lists that contain multiple tibbles. I would like to get a list where all the tibbles are on the same level.
How would I do that?
library(tibble)
tib_1 <- tibble(a = 1:4, b = LETTERS[1:4])
tib_2 <- tibble(c = 1:4, d = LETTERS[1:4])
tib_3 <- tibble(e = 1:4, f = LETTERS[1:4])
tib_4 <- tibble(g = 1:4, h = LETTERS[1:4])
my_list <- list(tib_1, tib_2, list(tib_3, tib_4))
desired_list <- list(tib_1, tib_2, tib_3, tib_4)
We can just use flatten:
library(rlang)
out <- flatten(my_list)
Checking:
identical(desired_list, out)
#[1] TRUE
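As far as I know, newer rlang releases mark flatten() as deprecated and point to purrr::list_flatten() instead, so it may be worth having a version that does not depend on either. The following is a base R sketch under that assumption; flatten_dfs is a hypothetical helper, and plain data frames are used in place of tibbles so the example runs without extra packages (tibbles are data frames too, so is.data.frame() also accepts them).

```r
df_1 <- data.frame(a = 1:4, b = LETTERS[1:4])
df_2 <- data.frame(c = 1:4, d = LETTERS[1:4])
df_3 <- data.frame(e = 1:4, f = LETTERS[1:4])
df_4 <- data.frame(g = 1:4, h = LETTERS[1:4])

my_list <- list(df_1, df_2, list(df_3, df_4))
desired_list <- list(df_1, df_2, df_3, df_4)

# hypothetical helper: walk the list and collect every data frame,
# however deeply it is nested
flatten_dfs <- function(x) {
  if (is.data.frame(x)) {
    return(list(x))  # wrap so c() keeps the data frame as one element
  }
  do.call(c, lapply(x, flatten_dfs))
}

out <- flatten_dfs(my_list)
identical(desired_list, out)
```

Unlike a single-level flatten, this helper recurses through all levels, so it returns the same flat list if the data frames are nested two or more lists deep.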

Apply a function to multiple datasets using lapply

I have a large number of datasets for which I want to create the same variable. I would like to create a function to avoid having to repeat the same code many times.
I tried the code below: the first 3 lines describe the creation of the variable that I am trying to apply through the function created below.
data1 <- data1 %>%
dplyr::group_by(id)%>%
dplyr::mutate(new_var = sum(score))
list_data <- c(data1, data2, data3)
my_func <- function(x) {
x <- x %>%
dplyr::group_by(id) %>%
dplyr::mutate(new_var = sum(score))
}
lapply(list_data, my_func)
I obtain the error message:
no applicable method for 'group_by' applied to an object of class "character".
Could you please help me figure this out?
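This works fine for me once the datasets are combined with list() rather than c(). c() merges the data frames into one flat list of columns, so lapply() passes single vectors to the function: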
my_func <- function(x) {
x <- x %>%
dplyr::group_by(id) %>%
dplyr::mutate(new_var = sum(score))
}
data1 <- data.frame(id = rep(1:3, each = 3), score = 1:9)
data2 <- data.frame(id = rep(1:3, each = 3), score = 11:19)
data3 <- data.frame(id = rep(1:3, each = 3), score = 21:29)
list_data <- list(data1, data2, data3)
lapply(list_data, my_func)
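The difference between c() and list() can be checked directly. The sketch below uses base R only, with ave() standing in for the group_by() and mutate() step so that it runs without dplyr; add_group_sum, combined_c and combined_list are hypothetical names used for illustration.

```r
data1 <- data.frame(id = rep(1:3, each = 3), score = 1:9)
data2 <- data.frame(id = rep(1:3, each = 3), score = 11:19)

# c() concatenates the columns of both data frames into one flat list
combined_c <- c(data1, data2)
length(combined_c)       # 4 elements: id, score, id, score
class(combined_c[[1]])   # a plain vector, not a data frame

# list() keeps each data frame as a single element
combined_list <- list(data1 = data1, data2 = data2)
length(combined_list)    # 2 elements

# hypothetical base R equivalent of the grouped sum
add_group_sum <- function(x) {
  x$new_var <- ave(x$score, x$id, FUN = sum)
  x
}

results <- lapply(combined_list, add_group_sum)
results$data1$new_var
```

Naming the list elements means the results keep the names data1 and data2, which makes it easy to pick out one dataset afterwards.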
