Using lapply over a list and adding a column with data frame name - r

I have a list containing two data frames:
sample_list <- list("tables" = data.frame(weight = sample(1:50, 20, replace = T)),
"chairs" = data.frame(height = sample(1:50, 20, replace = T)))
I would like to use lapply to run a function over all the data frames in this list. In the output of each function, I need to create another column with the name of the source data frame (see mutate):
lapply(sample_list, function(x) {
x %>%
filter(x >= 20) %>%
mutate(groupName = names(x))
})
For some reason, I can't figure out how to make this work. How do I pass the name of the data frame into mutate? Right now it is returning the name of the first column in that data frame, rather than the name of the data frame itself.
Thanks!

We can loop through names of sample_list instead of looping through the list
lapply(names(sample_list), function(x) {
sample_list[[x]] %>%
filter_at(vars(1),~. >= 20) %>%
mutate(groupName = x)
})
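To see why the original attempt returns the column name, it helps to check what lapply actually passes to the function. A minimal sketch with small fixed data frames (the values are made up for illustration):

```r
# Small stand-in for sample_list; the values are illustrative only.
sample_list <- list(
  tables = data.frame(weight = c(10, 25, 40)),
  chairs = data.frame(height = c(5, 30, 45))
)

# lapply() passes each element without its list name, so names(x)
# inside the function returns the data frame's column names.
col_names <- lapply(sample_list, function(x) names(x))
print(col_names$tables)

# Looping over names(sample_list) makes the list name available as a string.
list_names <- lapply(names(sample_list), function(nm) nm)
print(unlist(list_names))
```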
Update Sep-2021
cleaner way using purrr::map
purrr::map(names(sample_list), ~sample_list[[.x]] %>%
filter_at(vars(1),~. >= 20) %>%
mutate(groupName = .x)
)
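One side effect of looping over the names is that the result is an unnamed list, since a plain character vector carries no names. A sketch in base R that keeps the names, assuming (as in the answer above) that the first column is the one to filter on; setNames(nm = ...) turns the vector of names into a named vector before looping:

```r
set.seed(1)
sample_list <- list(
  tables = data.frame(weight = sample(1:50, 20, replace = TRUE)),
  chairs = data.frame(height = sample(1:50, 20, replace = TRUE))
)

# Iterate over a *named* vector of names so the output list keeps them.
result <- lapply(setNames(nm = names(sample_list)), function(nm) {
  d <- sample_list[[nm]]
  d <- d[d[[1]] >= 20, , drop = FALSE]  # keep rows with first column >= 20
  d$groupName <- rep(nm, nrow(d))       # rep() also copes with zero rows
  d
})
print(names(result))
```

The same effect could be had by calling names(result) <- names(sample_list) after the original lapply call.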

You can try purrr::imap() to map over both the elements and their names.
# purrr::imap
purrr::imap(sample_list, function(element,name){
head(mutate(element,groupName = name))
})
# or mapply, but you need to specify names of the list
myfun <- function(element,name){
head(mutate(element,groupName = name))
}
mapply(myfun,sample_list,names(sample_list),SIMPLIFY = FALSE)
$tables
weight groupName
1 42 tables
2 24 tables
3 13 tables
4 31 tables
5 9 tables
6 27 tables
$chairs
height groupName
1 18 chairs
2 6 chairs
3 34 chairs
4 37 chairs
5 36 chairs
6 49 chairs

Using Map from base R
Map(function(dat, grp) cbind(dat, group_name = grp)[dat[[1]] >= 20, ],
sample_list, names(sample_list))

You can use Map with the function data.frame to add the names.
Map(`data.frame`, sample_list, groupName = names(sample_list))
#Map(`[<-`, sample_list, "groupName", value = names(sample_list)) #Alternative
#$tables
# weight groupName
#1 22 tables
#2 12 tables
#3 9 tables
#4 26 tables
#5 39 tables
#6 6 tables
#7 31 tables
#8 9 tables
#9 39 tables
#10 4 tables
#11 37 tables
#12 30 tables
#13 20 tables
#14 35 tables
#15 31 tables
#16 46 tables
#17 44 tables
#18 30 tables
#19 12 tables
#20 46 tables
#
#$chairs
# height groupName
#1 12 chairs
#2 17 chairs
#3 35 chairs
#4 40 chairs
#5 23 chairs
#6 21 chairs
#7 48 chairs
#8 24 chairs
#9 20 chairs
#10 41 chairs
#11 43 chairs
#12 45 chairs
#13 47 chairs
#14 13 chairs
#15 35 chairs
#16 32 chairs
#17 26 chairs
#18 34 chairs
#19 33 chairs
#20 8 chairs
In case it should also be subset to those >= 20:
lapply(sample_list, function(x) x[x[,1] >= 20,, drop = FALSE])
When it should be done in one step, I would use the approach already posted by @akrun.

Related

Apply loop for rollapply windows

I currently have a dataset with 50,000+ rows of data for which I need to find rolling sums. I have done this using rollapply, which has worked perfectly. I need to apply these rolling sums across a range of widths (600, 1200, 1800...6000), which I have done by copying and pasting each line of the script and changing the width. While it works, I'd like to tidy my script by applying a loop, or similar, if possible, so that once the rollapply function has completed its first 'pass' at width 600, it then does the same with 1200, and so on. Example:
Var1 Var2 Var3
1 11 19
43 12 1
4 13 47
21 14 29
41 15 42
16 16 5
17 17 16
10 18 15
20 19 41
44 20 27
width_2 <- rollapply(x$Var1, FUN = sum, width = 2)
width_3 <- rollapply(x$Var1, FUN = sum, width = 3)
width_4 <- rollapply(x$Var1, FUN = sum, width = 4)
Is there a way to run widths 2, 3, then 4 in a simpler way than copying and pasting, particularly when I have up to 10 widths and then need to run this across other columns? Any help would be appreciated.
We can use lapply in base R
library(zoo)
lst1 <- lapply(2:4, function(i) rollapply(x$Var1, FUN = sum, width = i))
names(lst1) <- paste0('width_', 2:4)
list2env(lst1, .GlobalEnv)
NOTE: It is not recommended to create multiple objects in the global environment. Keeping the results in the list is usually better.
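A sketch of the list-based alternative recommended in the note. To stay self-contained it uses a small hypothetical helper, roll_sum, built on cumsum instead of zoo; rollapply(v, width = w, FUN = sum) could be dropped in at the same place.

```r
x <- data.frame(Var1 = c(1, 43, 4, 21, 41, 16, 17, 10, 20, 44))

# Hypothetical helper (not part of zoo): sums over every full window of
# the given width, computed from differences of the cumulative sum.
roll_sum <- function(v, width) {
  cs <- c(0, cumsum(v))
  cs[(width + 1):length(cs)] - cs[1:(length(cs) - width)]
}

widths <- 2:4
# One list element per width, named like the objects in the question.
roll_list <- setNames(
  lapply(widths, function(w) roll_sum(x$Var1, w)),
  paste0("width_", widths)
)
print(roll_list$width_2)
```

Individual results are then available as roll_list$width_2, roll_list[["width_3"]], and so on, without touching the global environment.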
Or with a for loop
for(v in 2:4) {
assign(paste0('width_', v), rollapply(x$Var1, FUN = sum, width = v))
}
Create a function to do this for multiple columns
f1 <- function(col1, i) {
rollapply(col1, FUN = sum, width = i)
}
lapply(x[c('Var1', 'Var2')], function(x) lapply(2:4, function(i)
f1(x, i)))
Instead of creating separate vectors in the global environment, you can probably add these as new columns to the existing data frame.
Note that rollapply(..., FUN = sum) is the same as rollsum.
library(dplyr)
library(zoo)
bind_cols(x, purrr::map_dfc(2:4,
~x %>% transmute(!!paste0('Var1_roll_', .x) := rollsumr(Var1, .x, fill = NA))))
# Var1 Var2 Var3 Var1_roll_2 Var1_roll_3 Var1_roll_4
#1 1 11 19 NA NA NA
#2 43 12 1 44 NA NA
#3 4 13 47 47 48 NA
#4 21 14 29 25 68 69
#5 41 15 42 62 66 109
#6 16 16 5 57 78 82
#7 17 17 16 33 74 95
#8 10 18 15 27 43 84
#9 20 19 41 30 47 63
#10 44 20 27 64 74 91
You can use seq to generate the variable window size.
seq(600, 6000, 600)
#[1] 600 1200 1800 2400 3000 3600 4200 4800 5400 6000
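Putting the pieces together, the widths from seq can drive a loop over several columns. This is a base R sketch, not the zoo-based code above: roll_sum_right is a hypothetical helper that mimics a right-aligned rolling sum padded with NA (the role rollsumr(..., fill = NA) plays in the answer), and the widths are kept small to fit the 10-row example.

```r
x <- data.frame(
  Var1 = c(1, 43, 4, 21, 41, 16, 17, 10, 20, 44),
  Var2 = 11:20
)

# Hypothetical helper: right-aligned rolling sum, padded with NA so the
# result has the same length as the input.
roll_sum_right <- function(v, width) {
  cs <- c(0, cumsum(v))
  sums <- cs[(width + 1):length(cs)] - cs[1:(length(cs) - width)]
  c(rep(NA, width - 1), sums)
}

# Small widths for the example; seq(600, 6000, 600) for the real data.
widths <- seq(2, 4, by = 1)
for (col in c("Var1", "Var2")) {
  for (w in widths) {
    x[[paste0(col, "_roll_", w)]] <- roll_sum_right(x[[col]], w)
  }
}
print(x)
```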

Fill in empty values in column of dataframe by condition

I have the following data frame. I want to fill in the empty values in "product" based on the value of "code": 44 should be "shirt" and 90 should be "sweater".
What's the best way to do this? With a for loop?
data = data.frame("code" = c(44,78,21,90,100,44,90), "product" = c("","hat","shoe","","umbrella","",""))
> data
code product
1 44
2 78 hat
3 21 shoe
4 90
5 100 umbrella
6 44
7 90
Using dplyr, first convert the product variable to character (from factor), then use case_when
library(dplyr)
data %>%
mutate_if(is.factor, as.character) %>%
mutate(product = case_when(product == "" & code == 44 ~ "shirt",
product == "" & code == 90 ~ "sweater",
TRUE ~ product))
code product
1 44 shirt
2 78 hat
3 21 shoe
4 90 sweater
5 100 umbrella
6 44 shirt
7 90 sweater
Using base R, same idea: first convert factors to character, then use ifelse
i <- sapply(data, is.factor)
data[i] <- lapply(data[i], as.character)
data$product[data$product == ""] <- ifelse(data$code[data$product == ""] == 44, "shirt", "sweater")
data
code product
1 44 shirt
2 78 hat
3 21 shoe
4 90 sweater
5 100 umbrella
6 44 shirt
7 90 sweater
Also worth noting: if you use data.frame with stringsAsFactors = FALSE (the default since R 4.0.0), all the factor converting becomes unnecessary.
You can use match and use the indices for subsetting.
i <- match(data$code, c(44, 90))
j <- !is.na(i)
data$product[j] <- c("shirt", "sweater")[i[j]]
data
# code product
#1 44 shirt
#2 78 hat
#3 21 shoe
#4 90 sweater
#5 100 umbrella
#6 44 shirt
#7 90 sweater
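For this data set all of the approaches above give the same result, but they differ at the edges: the match version overwrites product for every row with code 44 or 90, and the ifelse version labels every empty row that is not code 44 as "sweater". A sketch that fills only rows that are both empty and have a known code, using a named vector as a lookup table (the name lookup is just for illustration):

```r
data <- data.frame(
  code = c(44, 78, 21, 90, 100, 44, 90),
  product = c("", "hat", "shoe", "", "umbrella", "", ""),
  stringsAsFactors = FALSE
)

# Lookup table keyed by code (as character); extend it for more codes.
lookup <- c("44" = "shirt", "90" = "sweater")

# Only rows with an empty product and a known code are filled in.
to_fill <- data$product == "" & as.character(data$code) %in% names(lookup)
data$product[to_fill] <- unname(lookup[as.character(data$code[to_fill])])
print(data)
```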

How to generate z-scores separately for each factor level using a loop in R?

I would like to convert a variable to z-scores. How can I do that separately for each level of the factor Cell using a loop?
Example DATA:
df = data.frame(Cell = c(rep("13a",5),rep("1b",5),rep("5b",5)),
condition = rep(c("a","b","c","d","e"),3),
variable = c(58,55,36,29,53,57,53,54,52,52,45,49,48,46,45))
Is this a good start? Maybe the loop is not necessary, but I would like to learn how to write loops...
# Final data frame containing the results of all loops
df_z = data.frame()
# Loop through by cell
for (i in 1:unique(df$Cell)) {
df_z$myZ <- scale(variable)
}
It can be done with a group_by operation
library(dplyr)
df %>%
group_by(Cell) %>%
mutate(myZ = as.numeric(scale(variable)))
Or with data.table
library(data.table)
setDT(df)[, myZ := as.numeric(scale(variable)), by = Cell][]
With a for loop, we can subset the data in each iteration and assign the scaled values to the created 'myZ' variable
un1 <- unique(df$Cell)
df$myZ <- NA
for(un in un1) {
i1 <- df$Cell == un
df$myZ[i1] <- as.numeric(scale(df$variable[i1]))
}
Or with split
df$myZ <- unsplit(lapply(split(df$variable, df$Cell), scale), df$Cell)
We can use ave in base R :
df$myZ <- with(df, ave(variable, Cell, FUN = scale))
df
# Cell condition variable myZ
#1 13a a 58 0.917
#2 13a b 55 0.684
#3 13a c 36 -0.792
#4 13a d 29 -1.336
#5 13a e 53 0.528
#6 1b a 57 1.640
#7 1b b 53 -0.289
#8 1b c 54 0.193
#9 1b d 52 -0.772
#10 1b e 52 -0.772
#11 5b a 45 -0.881
#12 5b b 49 1.321
#13 5b c 48 0.771
#14 5b d 46 -0.330
#15 5b e 45 -0.881
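As a sanity check on any of these approaches, the z-score can be written out by hand and verified per group. A minimal sketch (z_score is a name chosen here, not a base R function); it should agree with the scale-based results, since scale centers on the mean and divides by the standard deviation by default:

```r
df <- data.frame(
  Cell = rep(c("13a", "1b", "5b"), each = 5),
  variable = c(58, 55, 36, 29, 53, 57, 53, 54, 52, 52, 45, 49, 48, 46, 45)
)

# z-score written out: subtract the group mean, divide by the group sd.
z_score <- function(v) (v - mean(v)) / sd(v)
df$myZ <- ave(df$variable, df$Cell, FUN = z_score)

# Per-group check: means should be (numerically) 0 and sds should be 1.
print(round(tapply(df$myZ, df$Cell, mean), 10))
print(tapply(df$myZ, df$Cell, sd))
```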

How to change column names for mrset in R?

I am trying to create crosstabs. I have a data frame containing multiple-select questions, which I import from an SPSS file using the foreign and expss packages. I create the multiple-select questions using the mrset function. Here's demo code to make this clear.
Banner1 = w %>%
tab_cells(mrset(as.category( temp1,counted_value = "Checked"))) %>%
tab_cols(total(),mrset(as.category( temp2, counted_value = "Checked"))) %>%
tab_stat_cases(total_row_position = "none",label = "")
tab_pivot(Banner1)
The datatable imported looks like this
Total Q12_1 Q12_2 Q12_3 Q12_4 Q12_5
A B C D E F
Total Cases 803 34 18 14 38 37
Q13_1 64 11 7 8 9 7
Q13_2 12 54 54 43 13 12
Q13_3 67 54 23 21 6 4
Sorry about the alignment here. This is the imported dataset.
Coming to the problem: as you can see, this dataset has question numbers as column labels rather than variable labels. For single-select questions everything works fine. Is there a function I can use to change the column names for mrset dynamically?
The desired output should be something like this, for example:
Total Apple Mango Banana Orange Grapes
A B C D E F
Total Cases 803 34 18 14 38 37
Apple 64 11 7 8 9 7
Mango 12 54 54 43 13 12
banana 67 54 23 21 6 4
Any help would be greatly appreciated.

Split a vector list with M elements into 2 lists of N and M-N elements

I created a vector list, aa, with 50 elements. And I need to split aa into two vector lists called bb and cc. bb has the first 20 elements of aa while cc has the last 30 elements of aa. How do I do it?
Creation of original vector list
aa <- list (sample (1:50))
aa
#[[1]]
# [1] 29 30 39 45 17 11 43 14 24 34 3 1 28 2 21 23 6 31 5 27 44 7 4 46 49 22 33 38 50 36 15 48 8 16 25 42 13 41 47
#[40] 37 26 32 35 9 18 10 20 40 19 12
Sorry all, I know my question is really basic. Maybe it is because the question is too simple and the solution is thus not easily found from the internet.
Since I couldn't find a direct question answering this, I'm adding an answer. We can first extract the vector from the list using [[ and then select individual elements of it with [.
bb <- aa[[1]][1:20]
cc <- aa[[1]][21:50]
We can also use head and tail to select first 20 and last 30 elements respectively.
bb <- head(aa[[1]], 20)
cc <- tail(aa[[1]], 30)
We can use split to create a list of vectors
lst1 <- split(aa[[1]], rep(1:2, c(20, 30)))
and extract the vector with [[
lst1[[1]]
lst1[[2]]
It can be extended to any number of splits (i.e. a generalized version), where we just need to change the arguments to rep.
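A sketch of that generalized version, assuming the chunk sizes are given as a vector (sizes is a name introduced here) that sums to the length of the vector being split:

```r
aa <- list(sample(1:50))

# Desired chunk sizes; they must add up to the length of the vector.
sizes <- c(10, 15, 25)
stopifnot(sum(sizes) == length(aa[[1]]))

# rep() builds the grouping vector 1,1,...,2,2,...,3,3,... for split().
parts <- split(aa[[1]], rep(seq_along(sizes), times = sizes))
print(lengths(parts))
```

With sizes <- c(20, 30) this reproduces bb and cc from the question as parts[[1]] and parts[[2]].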
