I need an efficient way to convert the column names of a number of data frames to lowercase.
Suppose we have:
df1 <- data.frame(VAR1=c(1,2), VAR2=c("a", "b"))
df2 <- data.frame(VAR1=c(TRUE,FALSE), VAR2=c("foo", "bar"))
A simple way to get what I want is:
names(df1) <- tolower(names(df1))
names(df2) <- tolower(names(df2))
A little tedious if you have a large number of data frames, though.
I need something better.
I thought I could use get() in a loop:
my.files <- ls()
for(i in 1:2) names(get(my.files[i])) <- tolower(names(get(my.files[i])))
but it doesn't work. I couldn't find a solution using lapply() either.
Any suggestion to modify the column names of a large number of data frames without too much coding?
Here's a one-liner that uses setNames, a handy function that sets the "names" attribute of an object and returns the result in a single expression, so no temporary variable is needed.
for(i in ls(pattern = "df")) assign(i, setNames(get(i), tolower(names(get(i)))))
df1
# var1 var2
# 1 1 a
# 2 2 b
df2
# var1 var2
# 1 TRUE foo
# 2 FALSE bar
Generally doing this kind of get and assign routine is discouraged. It's better to just put your data.frames in a list rather than a bunch of named objects in the .GlobalEnv. In your case, you could do something like the following:
a <- list(df1 = df1, df2 = df2)
a
# $df1
# VAR1 VAR2
# 1 1 a
# 2 2 b
#
# $df2
# VAR1 VAR2
# 1 TRUE foo
# 2 FALSE bar
lapply(a, function(x) setNames(x, tolower(names(x))))
# $df1
# var1 var2
# 1 1 a
# 2 2 b
#
# $df2
# var1 var2
# 1 TRUE foo
# 2 FALSE bar
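If the renamed data frames also need to end up back in the global environment, one base-R sketch is to gather them with mget(), rename them inside the list, and write them back with list2env(). This assumes the objects follow the df1, df2, ... naming used here; the pattern passed to ls() is an assumption you would adapt.

```r
# Example data frames from the question
df1 <- data.frame(VAR1 = c(1, 2), VAR2 = c("a", "b"))
df2 <- data.frame(VAR1 = c(TRUE, FALSE), VAR2 = c("foo", "bar"))

# Gather every object named df<digits> into a named list
dfs <- mget(ls(pattern = "^df[0-9]+$"))

# Lower-case the column names of each data frame in the list
dfs <- lapply(dfs, function(x) setNames(x, tolower(names(x))))

# Optionally overwrite the originals in the global environment
invisible(list2env(dfs, envir = .GlobalEnv))

names(df1)
# [1] "var1" "var2"
```

Continuing to work with dfs rather than the loose copies usually keeps later steps simpler.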
Consider the following list structure:
AA <- data.frame("variable1" = c("a", "b"), "variable2" = 1:2)
BB <- data.frame("variable1" = c("a", "b"), "variable2" = 3:4)
my_list <- list(AA=AA, BB=BB)
> my_list
$AA
variable1 variable2
1 a 1
2 b 2
$BB
variable1 variable2
1 a 3
2 b 4
Even though both list elements in my_list contain the same variable1 values, a and b, those values must be treated as unique to each list element (the real data has similarly duplicated variable names and values). So I have two functions, each designed to manipulate one specific list element:
library(dplyr)  # provides %>% and recode()
AA_recoding <- function(x) {
  x$variable1 <- x$variable1 %>%
    recode("a" = "hello")
  return(x)
}
BB_recoding <- function(x) {
  x$variable1 <- x$variable1 %>%
    recode("a" = "goodbye")
  return(x)
}
My objective is to apply the AA_recoding function to the AA list-element, and the BB_recoding function to BB, to achieve an output like so:
$AA
variable1 variable2
1 hello 1
2 b 2
$BB
variable1 variable2
1 goodbye 3
2 b 4
This seems like a job for purrr functions like map/imap, but I can't see a way to point each function at its respective list element by name. My attempts using glue (and paste0) produce the following errors:
> my_list %>% imap(~.x %>% glue("{.y}_recoding"))
Error: All unnamed arguments must be length 1
> my_list %>% map(~.x %>% paste0(.y,"_recoding"))
Error in paste0(., .y, "_recoding") :
the ... list contains fewer than 2 elements
Am I fundamentally approaching this problem in the correct way?
We can use map2, which applies each function to the corresponding list element, after wrapping the functions in a list:
library(purrr)
map2(my_list, list(AA_recoding, BB_recoding), ~ .y(.x))
#$AA
# variable1 variable2
#1 hello 1
#2 b 2
#$BB
# variable1 variable2
#1 goodbye 3
#2 b 4
Note that the list above (list(AA_recoding, BB_recoding)) was created manually in the same order as the names of my_list, but it can also be built automatically with paste/str_c and mget (which returns the objects themselves rather than their names):
library(stringr)
map2(my_list, mget(str_c(names(my_list), '_recoding')), ~ .y(.x))
Or, to look up the function from the list names inside imap, either wrap the constructed name in match.fun
my_list %>%
imap(~ match.fun(str_c(.y, '_recoding'))(.x))
or use get
my_list %>%
imap(~ get(str_c(.y, '_recoding'))(.x))
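A base-R variant that avoids building function names as strings is to store the recoders in a named list and look them up by the names of my_list, so the pairing no longer depends on order. In this sketch, AA_recoding and BB_recoding are hypothetical base-R stand-ins for the dplyr::recode versions in the question, so it runs without dplyr; the question's own functions could be dropped in unchanged.

```r
my_list <- list(
  AA = data.frame(variable1 = c("a", "b"), variable2 = 1:2, stringsAsFactors = FALSE),
  BB = data.frame(variable1 = c("a", "b"), variable2 = 3:4, stringsAsFactors = FALSE)
)

# Hypothetical base-R stand-ins for the dplyr::recode versions in the question
AA_recoding <- function(x) {
  x$variable1[x$variable1 == "a"] <- "hello"
  x
}
BB_recoding <- function(x) {
  x$variable1[x$variable1 == "a"] <- "goodbye"
  x
}

# Named list of recoders; its order does not matter because we index it by name
recoders <- list(BB = BB_recoding, AA = AA_recoding)

# Map() pairs each data frame with the recoder of the same name and keeps the names
result <- Map(function(df, f) f(df), my_list, recoders[names(my_list)])
```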
Depending on how many conditions you have, you could combine your functions into a single recoding function and then use lapply to apply it conditionally based on the names of the items in the list.
This is a bit hacky, since lapply doesn't pass the names of the list elements to the function. So, create a column in each data frame that holds the name of its list element, and then apply the combined function using lapply.
new_list <- my_list
list_names <- names(my_list)  # c("AA", "BB")
for(i in seq_along(my_list)){
  new_list[[i]]$name <- list_names[[i]]
}
> new_list # Looks like this
$AA
variable1 variable2 name
1 a 1 AA
2 b 2 AA
$BB
variable1 variable2 name
1 a 3 BB
2 b 4 BB
# Combined function
AA_BB_recoding <- function(x){
  x$variable1 <- ifelse(x$name == "AA",
                        x$variable1 %>% recode("a" = "hello"),
                        x$variable1 %>% recode("a" = "goodbye"))
  return(x)
}
> lapply(new_list, function(f) AA_BB_recoding(f))
# returns
$AA
variable1 variable2 name
1 hello 1 AA
2 b 2 AA
$BB
variable1 variable2 name
1 goodbye 3 BB
2 b 4 BB
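Instead of adding a name column, the list name can be passed to the combined function as a second argument with Map(), which iterates over the list and its names in parallel. A minimal sketch: recode_by_name() and its lookup vector are hypothetical, and base-R subsetting replaces recode() so it runs without dplyr.

```r
my_list <- list(
  AA = data.frame(variable1 = c("a", "b"), variable2 = 1:2, stringsAsFactors = FALSE),
  BB = data.frame(variable1 = c("a", "b"), variable2 = 3:4, stringsAsFactors = FALSE)
)

# Hypothetical combined recoder: the replacement for "a" depends on the element name
recode_by_name <- function(x, nm) {
  replacement <- c(AA = "hello", BB = "goodbye")[nm]
  if (is.na(replacement)) return(x)  # unknown name: leave the data frame unchanged
  x$variable1[x$variable1 == "a"] <- unname(replacement)
  x
}

# Each data frame arrives together with its own name; no extra column is needed
result <- Map(recode_by_name, my_list, names(my_list))
```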
Is it possible in R to create argument names in a function call dynamically?
For example, if we start with
name <- "variable"
I would like to create a new data frame like this
a.new.data.frame <- data.frame(name = c(1, 2))
which of course does not work.
The only solution I could come up with was
arg <- list(c(1, 2))
names(arg) <- name
a.new.data.frame <- do.call(data.frame, arg)
a.new.data.frame
# variable
#1 1
#2 2
I don't like this code, since it doesn't seem elegant.
Is there a better way to do it?
PS Important! This is a more general problem I have when writing R programs (e.g. when I use ggplot, and in many other cases), so I am looking for general solutions (creating a data.frame is only an example).
A more compact version of the do.call approach builds the named argument list in one step with setNames:
df <- do.call(data.frame, setNames(list(c(1, 2)), name))
You could use the ... (dots) argument to wrap the do.call in a generic function like this, which saves the noisy list() part of the call:
call.with.dyn.args <- function(f, ...) {
args <- list(...)
do.call(f, args)
}
df1 <- call.with.dyn.args(data.frame, a = 1:2, b = letters[1:2])
df1
# a b
# 1 1 a
# 2 2 b
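To make the argument name itself dynamic, which is what the question asks for, the helper can take the name as a string and build the argument list with setNames(). call_with_named_arg() below is a hypothetical helper, a sketch rather than an established API.

```r
# Call f with one argument whose name is given as a string, plus any other arguments
call_with_named_arg <- function(f, arg_name, value, ...) {
  args <- c(setNames(list(value), arg_name), list(...))
  do.call(f, args)
}

name <- "variable"
df <- call_with_named_arg(data.frame, name, c(1, 2))
names(df)
# [1] "variable"

# The same helper works for other functions, e.g. a dynamic "sep" argument to paste()
call_with_named_arg(paste, "sep", "-", "a", "b")
# [1] "a-b"
```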
But there are also other options for passing arguments dynamically without do.call, e.g.:
dyn.values <- c(1:2)
name = "dyn.values"
df2 <- data.frame(dyn.values, # values from a variable
name = dyn.values, # values from a variable + new name
static.arg = letters[1:2], # usual direct passing of an arg
name.from.variable = get(name)) # get the values from a variable whose name is stored in another variable
df2
# dyn.values name static.arg name.from.variable
# 1 1 1 a 1
# 2 2 2 b 2
An option using the tidyverse (the !! and := operators come from rlang):
library(tibble)
library(dplyr)
tibble(!! name := c(1, 2))
# A tibble: 2 x 1
# variable
# <dbl>
#1 1
#2 2
I need to merge several different dataframes.
On the one hand, I have several data frames with metadata (A) and, on the other hand, the corresponding information (B).
A.
[1] "LOJun_Meta" "LOMay_Meta" "VOJul_Meta" "VOJun_Meta" "VOMay_Meta" "ZOJun_Meta"
[7] "ZOMay_Meta"
B.
[1] "LOJun_All." "LOMay_all." "VOJul_All." "VOJun_all." "VOMay_all." "ZOJun_all."
[7] "ZOMay_all."
The names of the data frames are already stored in two lists (list1 and list2), and the data frames themselves are already imported into R.
My aim is to write a loop that merges each pair of respective data frames with dplyr's left_join. For example:
LOJun_Meta + LOJun_All; LOMay_Meta + LOMay_all; etc.
What I'm struggling with is writing the loop that would "synchronize" the merging procedure.
I am unsure whether I should create a function that takes two inputs and does this merging.
It would be something like
merging <- function(list1, list2) {
  for (i in seq_along(list1)) {
    left_join(list1[[i]], list2[[i]], by = c("PrimaryKey" = "ForeignKey"))
  }
}
I reckon the problem is that the function should refer to data frames which are not list1 & list2 values but data frame names stored in list1 & list2.
Any ideas?
Thanks a lot! Cheers
A diagram of what I intend to achieve is presented below:
[Diagram: loop merging several dataframes with dplyr]
An example of what I am keen to automate would be this action:
ZOMay<- left_join(ZOMay_Meta, ZOMay_all., by = c("Primary Key" = "Foreign key"))
ZOJun<- left_join(ZOJun_Meta, ZOJun_all., by = c("Primary Key" = "Foreign Key"))
write.csv(ZOMay, file = "ZOMay_Consolidated.csv")
write.csv(ZOJun, file = "ZOJun_Consolidated.csv")
Here's an example of how you could build a reproducible example for your situation:
library(tidyverse)
df1a <- tibble(id = 1:3, var1 = LETTERS[1:3])  # data_frame() is deprecated in favour of tibble()
df2a <- tibble(id = 1:3, var1 = LETTERS[4:6])
df1b <- tibble(id = 1:3, var2 = LETTERS[7:9])
df2b <- tibble(id = 1:3, var2 = LETTERS[10:12])
list1 <- list(df1a, df2a)
list2 <- list(df1b, df2b)
Now as I understand it you want to do a left_join for df1a and df1b, as well as df2a and df2b. Instead of a loop, you can use map2 from the purrr package. This will iterate over two lists and apply a function to each pair of elements.
map2(list1, list2, left_join)
# [[1]]
# # A tibble: 3 x 3
# id var1 var2
# <int> <chr> <chr>
# 1 1 A G
# 2 2 B H
# 3 3 C I
#
# [[2]]
# # A tibble: 3 x 3
# id var1 var2
# <int> <chr> <chr>
# 1 1 D J
# 2 2 E K
# 3 3 F L
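For the question's actual setup, where list1 and list2 hold object names rather than data frames and the key columns differ, a base-R sketch is to turn the names into data frames with mget() and pair them with Map(); merge(..., all.x = TRUE) plays the role of left_join(). The toy data and the PrimaryKey/ForeignKey column names are assumptions standing in for the real ones.

```r
# Hypothetical stand-ins for the question's *_Meta and *_all. data frames
ZOMay_Meta <- data.frame(PrimaryKey = 1:3, site = c("x", "y", "z"))
ZOJun_Meta <- data.frame(PrimaryKey = 1:2, site = c("x", "y"))
ZOMay_all. <- data.frame(ForeignKey = c(1, 3), value = c(10, 30))
ZOJun_all. <- data.frame(ForeignKey = 2, value = 20)

# Character vectors of object names, in matching order
list1 <- c("ZOMay_Meta", "ZOJun_Meta")
list2 <- c("ZOMay_all.", "ZOJun_all.")

# mget() looks the objects up by name; Map() joins the i-th pair
joined <- Map(function(meta, info) {
  merge(meta, info, by.x = "PrimaryKey", by.y = "ForeignKey", all.x = TRUE)  # left join
}, mget(list1), mget(list2))

# Name the results by prefix, e.g. "ZOMay", "ZOJun"
names(joined) <- sub("_Meta$", "", list1)

# Write one CSV per result (into tempdir() here, to keep the sketch side-effect free)
for (nm in names(joined)) {
  write.csv(joined[[nm]], file.path(tempdir(), paste0(nm, "_Consolidated.csv")),
            row.names = FALSE)
}
```

Pairing by position assumes both name vectors are in the same order; in the question the suffixes also differ in case ("_All." vs "_all."), so building list2 from list1 with sub() would need to account for that.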
I have several data.frames in my Global Environment that I need to merge. Many of the data.frames have identical column names. I want to append a suffix to each column that marks its originating data.frame. Because I have many data.frames, I wanted to automate the process as in the following example.
df1 <- data.frame(id = 1:5,x = LETTERS[1:5])
df2 <- data.frame(id = 1:5,x = LETTERS[6:10])
obj <- ls()
for(o in obj){
s <- sub('df','',eval(o))
names(get(o))[-1] <- paste0(names(get(o))[-1],'.',s)
}
# Error in get(o) <- `*vtmp*` : could not find function "get<-"
But the individual pieces of the assignment work fine:
names(get(o))[-1]
# [1] "x"
paste0(names(get(o))[-1],'.',s)
# [1] "x.1"
I've used get in a similar way to write.csv each object to a file:
for(o in obj){
  write.csv(get(o), file = paste0(o, '.csv'), row.names = FALSE)
}
Any ideas why it's not working in the assignment to change the column names?
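The error "could not find function get<-" arises because R rewrites names(get(o))[-1] <- ... into a call to a replacement function `get<-`, which doesn't exist, so you can't use <- to update an object retrieved with get(). You could probably use assign, but this code is already difficult enough to read. The better solution is to use a list.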
From your example:
df1 <- data.frame(id = 1:5,x = LETTERS[1:5])
df2 <- data.frame(id = 1:5,x = LETTERS[6:10])
# put your data frames in a list
df_names = ls(pattern = "df[0-9]+")
df_names # make sure these are the objects you want
# [1] "df1" "df2"
df_list = mget(df_names)
# now we can use a simple for loop (or lapply, mapply, etc.)
for(i in seq_along(df_list)) {
names(df_list[[i]])[-1] =
paste(names(df_list[[i]])[-1],
sub('df', '', names(df_list)[i]),
sep = "."
)
}
# and the column names of the data frames in the list have been updated
df_list
# $df1
# id x.1
# 1 1 A
# 2 2 B
# 3 3 C
# 4 4 D
# 5 5 E
#
# $df2
# id x.2
# 1 1 F
# 2 2 G
# 3 3 H
# 4 4 I
# 5 5 J
It's also now easy to merge them:
Reduce(f = merge, x = df_list)
# id x.1 x.2
# 1 1 A F
# 2 2 B G
# 3 3 C H
# 4 4 D I
# 5 5 E J
For more discussion and examples, see How do I make a list of data frames?
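For completeness, the assign() route mentioned at the start of this answer can be made to work by copying each object into an ordinary variable, renaming there, and writing it back. A minimal sketch that hard-codes the object names instead of using ls():

```r
df1 <- data.frame(id = 1:5, x = LETTERS[1:5])
df2 <- data.frame(id = 1:5, x = LETTERS[6:10])

for (o in c("df1", "df2")) {
  tmp <- get(o)                                      # copy the object into a real variable
  s <- sub("df", "", o)                              # suffix from the object name: "1", "2"
  names(tmp)[-1] <- paste0(names(tmp)[-1], ".", s)   # ordinary replacement works on tmp
  assign(o, tmp)                                     # write the modified copy back as df1, df2
}
```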
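Using setnames from library(data.table) you can do this without an assignment, because setnames renames the columns of a data.frame by reference:
library(data.table)
for(o in obj) {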
oldnames = names(get(o))[-1]
newnames = paste0(oldnames, ".new")
setnames(get(o), oldnames, newnames)
}
You can use eval, which evaluates an R expression (here one built from a string with parse) in a specified environment.
df1 <- data.frame(id = 1:5,x = LETTERS[1:5])
df2 <- data.frame(id = 1:5,x = LETTERS[6:10])
obj <- ls()
for(o in obj) {
s <- sub('df', '', o)
new_name <- paste0(names(get(o))[-1], '.', s)
eval(parse(text = paste0('names(', o, ')[-1] <- ', substitute(new_name))))
}
This modifies df1 and df2 in place; df1, for example, becomes:
id x.1
1 1 A
2 2 B
3 3 C
4 4 D
5 5 E
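A variant of the same idea that avoids parse(): bquote() can splice the object name (as a symbol) and the new names (as a value) into an assignment call, which is then evaluated. A sketch using the same df1/df2 setup:

```r
df1 <- data.frame(id = 1:5, x = LETTERS[1:5])
df2 <- data.frame(id = 1:5, x = LETTERS[6:10])

for (o in c("df1", "df2")) {
  s <- sub("df", "", o)
  new_name <- paste0(names(get(o))[-1], ".", s)
  # Builds e.g. names(df1)[-1] <- "x.1" and evaluates it in the current environment
  eval(bquote(names(.(as.name(o)))[-1] <- .(new_name)))
}
```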
Z = data.frame(var1 = c(1,2,3,4,5), var2 = LETTERS[1:5])
testfun <- function(x){
print(x) # prints the data
# but how to get names of the list coming in?
return(NULL)
}
res = lapply(Z, testfun)
I want to access the names "var1" and "var2" inside testfun. How do I retrieve them inside testfun? Does lapply even pass that information? colnames(x) does not work.
No, lapply doesn't pass this information to the function. You could lapply along the names and use subsetting to get the list content inside the function.
testfun <- function(nam, mylist){
print(nam) # prints the names
mylist[[nam]] #get list content using subsetting
}
res <- lapply(names(Z), testfun, mylist=Z)
# [1] "var1"
# [1] "var2"
res
# [[1]]
# [1] 1 2 3 4 5
#
# [[2]]
# [1] A B C D E
# Levels: A B C D E
Similar to @Roland's answer, I would just apply over seq_along(Z), passing the list and the list names to the function:
nams <- names(Z)
testfun <- function(i, Z, nams){
  print(Z[[i]])
  print(nams[i])
}
res <- lapply(seq_along(Z), testfun, Z = Z, nams = nams)
If you just want to preserve the labels, you can use llply from the plyr package:
library(plyr)
testfun <- function(x){x}
res <- llply(Z, testfun)
res
The result is:
> res
$var1
[1] 1 2 3 4 5
$var2
[1] A B C D E
Levels: A B C D E
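Another base-R option is Map(), which walks the columns of Z and names(Z) in parallel, so the function receives each column together with its name, and the result keeps the names. The two-argument testfun below is a hypothetical variant for illustration.

```r
Z <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 = LETTERS[1:5])

# Hypothetical two-argument version of testfun: the column and its name
testfun <- function(x, nam) {
  paste(nam, "has", length(x), "values")
}

# Map() passes the i-th column and the i-th name together
res <- Map(testfun, Z, names(Z))
res$var1
# [1] "var1 has 5 values"
```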