How to pass argument into user defined function when using lapply - r

I have a list of data frames and I want to apply a custom function to each of them using lapply.
Here is my function:
rename_cols_pattern <- function (df, pattern, replacement = "") {
names(df) <- gsub(names(df), pattern = pattern, replacement = replacement)
}
How do I use this function with lapply? The call below does not work because the df argument is missing. How do I pass in df, which should be each of the data frames in the di_data list?
di_data <- lapply(di_data, rename_cols_pattern(pattern = "X"))
I can get this to work like so:
di_data <- lapply(di_data, function(x) {
names(x) <- gsub(names(x), pattern = "X", replacement = "")
x
})
However, I want to keep the function separate and would like to understand how to achieve this.

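Your function is probably missing its return value. Its last expression is the assignment to names(df), so it returns the new names rather than the modified data frame. Return df explicitly: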
rename_cols_pattern <- function(df, pattern, replacement="") {
names(df) <- gsub(names(df), pattern=pattern, replacement=replacement)
return(df)
}
Normal usage:
rename_cols_pattern(dat, pattern="X", replacement="COL")
# COL1 COL2 COL3 COL4
# 1 1 4 7 10
# 2 2 5 8 11
# 3 3 6 9 12
Using lapply (pass the function itself; any further arguments are forwarded to each call):
lapply(list(dat, dat), rename_cols_pattern, pattern="X", replacement="COL")
# [[1]]
# COL1 COL2 COL3 COL4
# 1 1 4 7 10
# 2 2 5 8 11
# 3 3 6 9 12
#
# [[2]]
# COL1 COL2 COL3 COL4
# 1 1 4 7 10
# 2 2 5 8 11
# 3 3 6 9 12
Data:
dat <- structure(list(X1 = 1:3, X2 = 4:6, X3 = 7:9, X4 = 10:12), class = "data.frame", row.names = c(NA,
-3L))
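As a rough illustration of why the original call failed, the sketch below compares the forwarded-argument form with an explicit anonymous function. The list di_data here is a small stand-in built for the example, not the asker's actual data.

```r
# Stand-in data (assumption: two small data frames with "X"-prefixed columns)
di_data <- list(
  a = data.frame(X1 = 1:3, X2 = 4:6),
  b = data.frame(X10 = 7:9, Xfoo = 10:12)
)

rename_cols_pattern <- function(df, pattern, replacement = "") {
  names(df) <- gsub(pattern, replacement, names(df))
  df  # return the modified data frame, not the value of the assignment
}

# Form 1: pass the function itself; the named arguments after it are
# forwarded, so each call is rename_cols_pattern(di_data[[i]], pattern = "X")
res1 <- lapply(di_data, rename_cols_pattern, pattern = "X")

# Form 2: the same thing written with an explicit anonymous function
res2 <- lapply(di_data, function(df) rename_cols_pattern(df, pattern = "X"))

identical(res1, res2)
lapply(res1, names)
```

Writing rename_cols_pattern(pattern = "X") inside lapply() calls the function immediately, before lapply() has supplied a data frame, which is why df was reported as missing.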

rename_with was created to solve these kinds of problems:
library(tidyverse)
mtcars %>%
rename_with(.fn = ~ str_remove_all(.x,"X"))
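To connect this back to the list of data frames in the question, here is a minimal sketch that applies rename_with to every element via lapply. It assumes the dplyr package is installed; di_data is a made-up stand-in, and the renaming function is written with base gsub rather than stringr.

```r
library(dplyr)

# Stand-in list (assumption: columns carry an "X" prefix to be removed)
di_data <- list(
  data.frame(X1 = 1:3, X2 = 4:6),
  data.frame(X3 = 7:9, X4 = 10:12)
)

# rename_with() takes the data frame as its first argument, so lapply()
# can supply each data frame and forward .fn unchanged
di_data <- lapply(
  di_data,
  rename_with,
  .fn = function(nm) gsub("X", "", nm, fixed = TRUE)
)

lapply(di_data, names)
```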

Related

fill list of list by summing rows in each data frame in all combinations

I have 4 data frames saved as .csv files or .txt (df1.csv,df2.csv,df3.txt,df4.txt).
I would like to sum all the rows of df1 with df3 and df4 and of df2 with df3 and df4 in a loop. I would like to store the results in two separate sublists (one for df1 and one for df2) of a main list.
Example:
df1 df3
colA colA
5 1
1 4
3 1
df2 df4
colA colA
0 0
2 0
1 9
Output:
I would like to have a list_ALL which contains listDF1 with the results of the sum of df1 with df3 and df4 and a listDF2 which contains results of df2 with df3 and df4
LISTDF1
df5 df6
colA colA
6 5
5 1
4 12
LISTDF2
df7 df8
colA colA
1 0
6 2
2 10
list_ALL<-list()
files.csv<-list.files(pattern = "*.csv")
files.txt<-list.files(pattern = "*.txt")
for (i in 1:length(files.csv)) {
list_ALL[[i]]<-list()
}
names(listALL)<-files.csv
for (i in 1:length(files.csv))
for (j in 1:length(files.txt))
{{ list_ALL[[i]][[j]] <- rowSums([[i]][[j]])}}
I tried this however, only the first one of the sublist gets filled up.
Something like this?
lst1 <- list(df1 = c(5,1,3), df2 = c(0,2,1))
lst2 <- list(df3 = c(1,4,1), df4 = c(0,0,9))
res_lst <- list()
for(i in seq_along(lst1)){
for(j in seq_along(lst2)){
res <- lst1[[i]]+lst2[[j]]
res_lst <- append(res_lst, list(res))
}
}
splt_lst <- split(x = res_lst, f = rep(1:2, each=2))
$`1`
$`1`[[1]]
[1] 6 5 4
$`1`[[2]]
[1] 5 1 12
$`2`
$`2`[[1]]
[1] 1 6 2
$`2`[[2]]
[1] 0 2 10
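The same nested structure can also be built directly, without appending and splitting afterwards. The following is only a sketch using two nested lapply calls on the example vectors above; the list names df1 to df4 are taken from the question, and real data would first have to be read from the files.

```r
lst1 <- list(df1 = c(5, 1, 3), df2 = c(0, 2, 1))
lst2 <- list(df3 = c(1, 4, 1), df4 = c(0, 0, 9))

# Outer lapply: one sublist per element of lst1.
# Inner lapply: add that element to every element of lst2.
list_ALL <- lapply(lst1, function(a) lapply(lst2, function(b) a + b))

list_ALL$df1$df4  # df1 + df4
str(list_ALL)
```

Because lapply keeps the names of its input, each result can be looked up by the pair of inputs that produced it.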
Another possible solution:
set.seed(123)
df1 <- data.frame(colA = sample(1:10, 3))
df2 <- data.frame(colA = sample(1:10, 3))
df3 <- data.frame(colA = sample(1:10, 3))
df4 <- data.frame(colA = sample(1:10, 3))
lapply(list(df1, df2),
\(x) asplit(sapply(list(df3, df4), \(y) rowSums(cbind(x,y))), 2))
#> [[1]]
#> [[1]][[1]]
#> [1] 8 14 8
#>
#> [[1]][[2]]
#> [1] 9 19 4
#>
#>
#> [[2]]
#> [[2]][[1]]
#> [1] 7 10 9
#>
#> [[2]][[2]]
#> [1] 8 15 5
for (j in 1:length(files))
{
cog_out_dat[[j]]<- lapply(meta_exp_dat,function(df) try(read_outcome_data(
snps = df$SNP,
filename = files[[j]],
gene_col = "anno",
sep = " ",
chr_col = "CHR",
pos_col = "BP",
snp_col = "SNP",
beta_col = "BETA",
se_col = "SE",
effect_allele_col = "ALLELE1",
other_allele_col = "ALLELE0",
eaf_col = "EAF",
pval_col = "P",
phenotype_col = "DATA"
)))
}

Split df column of integers into individual digits in R

I have a df where one variable is an integer. I'd like to split this column into its individual digits. See my example below.
Group Number
A 456
B 3
C 18
To
Group Number Digit1 Digit2 Digit3
A 456 4 5 6
B 3 3 NA NA
C 18 1 8 NA
We can use read.fwf from base R. Find the maximum number of characters (nchar) in the 'Number' column (mx). Read the 'Number' column after converting it to character (as.character), specify the 'widths' as 1 repeated mx times, and assign the output to new 'Digit' columns in the data.
mx <- max(nchar(df1$Number))
df1[paste0("Digit", seq_len(mx))] <- read.fwf(textConnection(
as.character(df1$Number)), widths = rep(1, mx))
-output
df1
# Group Number Digit1 Digit2 Digit3
#1 A 456 4 5 6
#2 B 3 3 NA NA
#3 C 18 1 8 NA
data
df1 <- structure(list(Group = c("A", "B", "C"), Number = c(456L, 3L,
18L)), class = "data.frame", row.names = c(NA, -3L))
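For comparison, the digits can also be extracted arithmetically instead of by parsing text. This is a different technique from read.fwf, shown only as a sketch, and it assumes the numbers are non-negative integers.

```r
df1 <- data.frame(Group = c("A", "B", "C"), Number = c(456L, 3L, 18L))

n_digits <- nchar(as.character(df1$Number))  # number of digits per row
mx <- max(n_digits)

for (k in seq_len(mx)) {
  # k-th digit from the left: drop the trailing digits with integer
  # division, then keep the last remaining digit with modulo 10
  digit <- (df1$Number %/% 10^(n_digits - k)) %% 10
  digit[k > n_digits] <- NA  # the number has fewer than k digits
  df1[[paste0("Digit", k)]] <- digit
}

df1
```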
Another base R option (I think @akrun's approach using read.fwf is much simpler)
cbind(
df,
with(
df,
type.convert(
`colnames<-`(do.call(
rbind,
lapply(
strsplit(as.character(Number), ""),
`length<-`, max(nchar(Number))
)
), paste0("Digit", seq(max(nchar(Number))))),
as.is = TRUE
)
)
)
which gives
Group Number Digit1 Digit2 Digit3
1 A 456 4 5 6
2 B 3 3 NA NA
3 C 18 1 8 NA
Using splitstackshape::cSplit
splitstackshape::cSplit(df, 'Number', sep = '', stripWhite = FALSE, drop = FALSE)
# Group Number Number_1 Number_2 Number_3
#1: A 456 4 5 6
#2: B 3 3 NA NA
#3: C 18 1 8 NA
Updated
I realized I could use the max function to count the character limit in each row and include it in my map2 call, which saves a few lines of code. The inspiration came by accident from @ThomasIsCoding.
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
df %>%
rowwise() %>%
mutate(map2_dfc(Number, 1:max(nchar(Number)), ~ str_sub(.x, .y, .y))) %>%
unnest(cols = !c(Group, Number)) %>%
rename_with(~ str_replace(., "\\.\\.\\.", "Digit"), .cols = !c(Group, Number)) %>%
mutate(across(!c(Group, Number), as.numeric, na.rm = TRUE))
# A tibble: 3 x 5
Group Number Digit1 Digit2 Digit3
<chr> <dbl> <dbl> <dbl> <dbl>
1 A 456 4 5 6
2 B 3 3 NA NA
3 C 18 1 8 NA
Data
df <- tribble(
~Group, ~Number,
"A", 456,
"B", 3,
"C", 18
)
Two base R methods:
no_cols <- max(nchar(as.character(df1$Number)))
# Using `strsplit()`:
cbind(df1, setNames(data.frame(do.call(rbind,
lapply(strsplit(as.character(df1$Number), ""),
function(x) {
length(x) <- no_cols
x
}
)
)
), paste0("Digit", seq_len(no_cols))))
# Using `regmatches()` and `gregexpr()`:
cbind(df1, setNames(data.frame(do.call(rbind,
lapply(regmatches(df1$Number, gregexpr("\\d", df1$Number)),
function(x) {
length(x) <- no_cols
x
}
)
)
), paste0("Digit", seq_len(no_cols))))

Change the column names of a list of dataframes in R [duplicate]

I have a list of dataframes in this form.
d1 <- data.frame(i = c("a","b","c"), var = 1:3, stringsAsFactors=FALSE)
d2 <- data.frame(i = c("b","c","d"), var = 5:8, stringsAsFactors=FALSE)
d3 <- data.frame(i = c("c","d","a"), var = 2:4, stringsAsFactors=FALSE)
dfList <- list(d1,d2,d3)
I want to rename the var variables to var_d1, var_d2, var_d3 respectively so that I can do a full join later. How do I implement this? How do I retrieve the names of the data frames as strings?
Start with naming the list
names(dfList) <- paste0('d', seq_along(dfList))
Once you do that you can use Map to rename columns :
Map(function(x, y) {names(x)[-1] <- paste(names(x)[-1], y, sep = "_");x},
dfList, names(dfList))
#$d1
# i var_d1
#1 a 1
#2 b 2
#3 c 3
#$d2
# i var_d2
#1 b 5
#2 c 6
#3 d 7
#$d3
# i var_d3
#1 c 2
#2 d 3
#3 a 4
Or in tidyverse :
library(dplyr)
library(purrr)
imap(dfList, function(x, y) x %>% rename_with(~paste(., y, sep = "_"), -1))
dfList <- mget(paste0("d", 1:3))
mapply(function(df, name) {
names(df)[names(df) == "var"] <- paste0("var_", name)
df
}, dfList, names(dfList), SIMPLIFY = FALSE)
#> $d1
#> i var_d1
#> 1 a 1
#> 2 b 2
#> 3 c 3
#>
#> $d2
#> i var_d2
#> 1 b 5
#> 2 c 6
#> 3 d 7
#>
#> $d3
#> i var_d3
#> 1 c 2
#> 2 d 3
#> 3 a 4
To change the variable names and then save them in a list of strings, you can do something like this.
(I think you made a mistake in d2 so I changed it)
d1 <- data.frame(i = c("a","b","c"), var = 1:3, stringsAsFactors=FALSE)
d2 <- data.frame(i = c("b","c","d"), var = 5:7, stringsAsFactors=FALSE)
d3 <- data.frame(i = c("c","d","a"), var = 2:4, stringsAsFactors=FALSE)
dfList <- list(d1,d2,d3)
column_names <- list()
for (i in 1:length(dfList)){
colnames(dfList[[i]]) <- c("i",paste0("var_d",i))
column_names[[i]] <- names(dfList[[i]])
}
# they are stored here
column_names
[[1]]
[1] "i" "var_d1"
[[2]]
[1] "i" "var_d2"
[[3]]
[1] "i" "var_d3"
Maybe we can try the code below
> Map(function(k) setNames(dfList[[k]],c("i",paste0("var_d",k))),seq_along(dfList))
[[1]]
i var_d1
1 a 1
2 b 2
3 c 3
[[2]]
i var_d2
1 b 6
2 c 7
3 d 8
[[3]]
i var_d3
1 c 2
2 d 3
3 a 4
An approach quite similar to the ones proposed using Map, that uses lapply instead:
dfList <- lapply(
1:length(dfList),
function(x) setNames(dfList[[x]],
c('i', paste0('var_d', x))
)
)
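Since the stated goal is a full join afterwards, here is a minimal sketch of the whole path in base R: rename with Map, then fold the list together with Reduce and merge. It uses var = 5:7 for d2 so that the column lengths match; using merge with all = TRUE for the join is an assumption about what the asker intends.

```r
d1 <- data.frame(i = c("a", "b", "c"), var = 1:3, stringsAsFactors = FALSE)
d2 <- data.frame(i = c("b", "c", "d"), var = 5:7, stringsAsFactors = FALSE)
d3 <- data.frame(i = c("c", "d", "a"), var = 2:4, stringsAsFactors = FALSE)
dfList <- list(d1 = d1, d2 = d2, d3 = d3)

# Suffix the "var" column of each data frame with its list name
renamed <- Map(function(df, nm) {
  names(df)[names(df) == "var"] <- paste0("var_", nm)
  df
}, dfList, names(dfList))

# Full (outer) join on "i": all = TRUE keeps keys missing from either side
joined <- Reduce(function(x, y) merge(x, y, by = "i", all = TRUE), renamed)
joined
```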

How to do a mutate_all within a list of dataframes to remove bad data [duplicate]

Sample data:
dat1 <- structure(list(id = 1:3, des.1 = 4:6, x = 7:9, not = 10:12), class = "data.frame", row.names = c(NA,-3L))
dat2 <- structure(list(id = 1:3, descript = 4:6, y = 7:9, yes = 10:12), class = "data.frame", row.names = c(NA,-3L))
dat3 <- structure(list(id = 1:3, description = 4:6, x = 7:9, X4 = 10:12), class = "data.frame", row.names = c(NA,-3L))
dat1[1,2] <- "ERROR"
dat2[2,1] <- "ERROR"
dat_list <- list(dat1, dat2, dat3)
How can I set all instances of 'ERROR' to 0 within this list of dataframes? If possible, a plyr solution would be preferred.
Many thanks.
You can use map to iterate over list :
library(dplyr)
library(purrr)
map(dat_list, ~.x %>% mutate_all(~replace(., . == 'ERROR', 0)) %>% type.convert)
In new dplyr you can use across :
map(dat_list, ~.x %>%
mutate(across(everything(), ~replace(., . == 'ERROR', 0))) %>%
type.convert)
In base R, we can use lapply :
lapply(dat_list, function(x) {x[x == 'ERROR'] <- 0;type.convert(x)})
#[[1]]
# id des.1 x not
#1 1 0 7 10
#2 2 5 8 11
#3 3 6 9 12
#[[2]]
# id descript y yes
#1 1 4 7 10
#2 0 5 8 11
#3 3 6 9 12
#[[3]]
# id description x X4
#1 1 4 7 10
#2 2 5 8 11
#3 3 6 9 12
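The base R version can also be written as a named function whose extra arguments are forwarded by lapply. This is only a sketch; replace_value and its parameters bad and replacement are hypothetical names, and the data is a reduced stand-in for dat_list.

```r
# Stand-in data: "ERROR" has turned one column in each data frame into character
dat1 <- data.frame(id = 1:3, des.1 = c("ERROR", "5", "6"),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(id = c("1", "ERROR", "3"), y = 7:9,
                   stringsAsFactors = FALSE)
dat_list <- list(dat1, dat2)

# Hypothetical helper: replace every cell equal to `bad`, then let
# type.convert() turn the affected character columns back into numbers
replace_value <- function(df, bad, replacement) {
  df[df == bad] <- replacement
  type.convert(df, as.is = TRUE)
}

# lapply() supplies each data frame as `df` and forwards the rest
cleaned <- lapply(dat_list, replace_value, bad = "ERROR", replacement = 0)
str(cleaned)
```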

Apply na.fill to every column

I have a dataset that looks like this:
Col1 Col2 Col3 Col4 Col5
A B 4 5 7
G H 5 6 NA
H I NA 9 8
K F 9 NA NA
E L NA 8 9
H I 1 0 10
How do I apply the na.fill() function to all the columns after Col2?
If I were to do it individually, it would be something like this:
df$Col3<-na.fill(df$Col3, c(NA, "extend", NA))
df$Col4<-na.fill(df$Col4, c(NA, "extend", NA))
df$Col5<-na.fill(df$Col5, c(NA, "extend", NA))
The problem is that my actual dataframe has over 100 columns. Is there a quick way to apply this function to all the columns after the first 2?
na.fill does handle multiple columns. Really no need to use lapply, mutate, etc. Just replace the relevant columns with the result of running na.fill on those same columns. If you know what ix is then you could replace the first line with it so that in this example we could alternately use ix <- 3:5 or ix <- -(1:2) .
library(zoo)  # provides na.fill and na.approx
ix <- sapply(DF, is.numeric)
replace(DF, ix, na.fill(DF[ix], c(NA, "extend", NA)))
giving:
Col1 Col2 Col3 Col4 Col5
1 A B 4 5.0 7.0
2 G H 5 6.0 7.5
3 H I 7 9.0 8.0
4 K F 9 8.5 8.5
5 E L 5 8.0 9.0
6 H I 1 0.0 10.0
Note that you could alternately use na.approx:
replace(DF, ix, na.approx(DF[ix], na.rm = FALSE))
Note
Lines <- "Col1 Col2 Col3 Col4 Col5
A B 4 5 7
G H 5 6 NA
H I NA 9 8
K F 9 NA NA
E L NA 8 9
H I 1 0 10"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE, strip.white = TRUE)
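If zoo is not available, a similar interior interpolation can be sketched in base R with approx. This swaps in a different function from na.fill, so treat it as an approximation of the approach above; interpolate_interior is a hypothetical helper name. With approx's default settings, leading and trailing NAs stay NA, and each column needs at least two non-missing values.

```r
DF <- data.frame(
  Col1 = c("A", "G", "H", "K", "E", "H"),
  Col2 = c("B", "H", "I", "F", "L", "I"),
  Col3 = c(4, 5, NA, 9, NA, 1),
  Col4 = c(5, 6, 9, NA, 8, 0),
  Col5 = c(7, NA, 8, NA, 9, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: linearly interpolate NAs between known values,
# using the row position as the x coordinate
interpolate_interior <- function(x) {
  idx <- seq_along(x)
  approx(idx, x, xout = idx)$y
}

ix <- sapply(DF, is.numeric)                    # numeric columns only
DF[ix] <- lapply(DF[ix], interpolate_interior)  # replace them in place
DF
```

On this data the result agrees with the na.fill output shown above.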
The mutate_-family of functions in the dplyr package would do the trick.
There are a few ways to do this. Some may work better than others depending on what your other columns look like. Here are three versions that would work better in different circumstances.
library(dplyr)    # %>% and the mutate_ family
library(stringr)  # str_subset
# Make dummy data.
df <- data.frame(
Col1 = LETTERS[1:6],
Col2 = LETTERS[7:12],
Col3 = c(4, 5, NA, 9, NA, 1),
Col4 = c(5,6,9,NA,8,0),
Col5 = c(7,NA,8,NA,9,10)
)
You can apply the na.fill function to columns specified by name vector. This is useful if you want to use a regular expression to select columns with certain name parts.
cn <- names(df) %>%
str_subset("[345]") # Column names with 3, 4 or 5 in them.
result_1 <- df %>%
mutate_at(vars(cn),
zoo::na.fill, c(NA, 'extend', NA)
)
You can apply the na.fill function to any numeric column.
result_2 <- df %>%
mutate_if(is.numeric, # First argument is function that returns a logical vector.
zoo::na.fill, c(NA, 'extend', NA)
)
You can apply the function to columns specified in a numeric index vector.
result_3 <- df
result_3[ , 3:5] <- result_3[ , 3:5] %>% # Just replace columns 3 through 5
mutate_all(
zoo::na.fill, c(NA, 'extend', NA)
)
In this case, all three versions should have done the same thing.
all.equal(result_1, result_2) # TRUE
all.equal(result_1, result_3) # TRUE

Resources