I have three data frames and I would like to perform some operations on them in a loop (transpose and assign names to columns). The problem with my code is that the data frames are not updated; the result is a completely new data frame.
df1 = data.frame(A = c(1, 2), B = c(1, 2))
df2 = data.frame(A = c(1, 2), B = c(1, 2))
df3 = data.frame(A = c(1, 2), B = c(1, 2))
names = c("df1", "df2", "df3")
for(df in names) {
  df = get(names)
  df = t(df)
  colnames(df) = df[1, ]
  df = df[-1, ]
}
My recommendation is to place all of the data frames in a list and then work with the list instead of the individual data frames.
df1 = data.frame(A = c(1, 2), B = c(1, 2))
df2 = data.frame(A = c(1, 2), B = c(1, 2))
df3 = data.frame(A = c(1, 2), B = c(1, 2))
names = list(df1, df2, df3)
names <- lapply(names, function(df) {
  df = t(df)
  colnames(df) = df[1, ]
  df = df[-1, ]
  df  # return the modified object explicitly
})
Note that the list "names" now holds the updated data frames, while the original data frames are untouched.
EDIT
To address your comment about reducing data redundancy, I tweaked your code and used the assign() function to update the data frames in the global environment.
df1 = data.frame(A = c(1, 2), B = c(1, 2))
df2 = data.frame(A = c(1, 2), B = c(1, 2))
df3 = data.frame(A = c(1, 2), B = c(1, 2))
names = c("df1", "df2", "df3")
for(name in names) {
  df = get(name)
  df = t(df)
  colnames(df) = df[1, ]
  df = df[-1, ]
  assign(name, df)
}
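To see why the original loop changed nothing, here is a minimal sketch (the name df_copy is only for illustration): get() hands back the value of the variable, not a reference to it, so edits to the result stay local until they are written back with assign().

```r
df1 <- data.frame(A = c(1, 2), B = c(1, 2))

# get() returns the value stored under the name "df1", i.e. a copy
df_copy <- get("df1")
df_copy$A <- df_copy$A * 10

# The original is unchanged at this point
unchanged <- identical(df1$A, c(1, 2))

# Writing the modified copy back under the same name updates df1
assign("df1", df_copy)
df1$A
```

After the assign() call, df1$A holds the modified values.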
Related
I have >100 dataframes loaded into R. I want to remove all the columns from all data frames containing a certain pattern, in the example case below "abc".
df1 <- data.frame(`abc_1` = rep(3, 5), `b` = seq(1, 5, 1), `c` = letters[1:5])
df2 <- data.frame(`d` = rep(5, 5), `e_abc` = seq(2, 6, 1), `f` = letters[6:10])
df3 <- data.frame(`g` = rep(5, 5), `h` = seq(2, 6, 1), `i_a_abc` = letters[6:10])
I would thus like to remove the column abc_1 in df1, e_abc in df2 and i_a_abc in df3. How could this be done?
Do all of your dataframes start with or contain a shared string (e.g., df)? If yes, then it might be easier to put all your dataframes in a list by using that shared string and then apply the function to remove the abc columns in every dataframe in that list.
You can then write your data frames back into your environment with list2env(), but it is probably in your interest to keep everything in a list for convenience.
library(dplyr)
df1 <- data.frame(`abc_1` = rep(3, 5), `b` = seq(1, 5, 1), `c` = letters[1:5])
df2 <- data.frame(`d` = rep(5, 5), `e_abc` = seq(2, 6, 1), `f` = letters[6:10])
df3 <- data.frame(`g` = rep(5, 5), `h` = seq(2, 6, 1), `i_a_abc` = letters[6:10])
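# Anchor the pattern so helper objects such as dfpattern and dflist are not matched on a rerun
dfpattern <- grep("^df[0-9]+$", names(.GlobalEnv), value = TRUE)
dflist <- mget(dfpattern, envir = .GlobalEnv)  # mget() already returns a named list
dflist <- lapply(dflist, function(x) select(x, !contains("abc")))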
list2env(dflist, envir = .GlobalEnv)
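If you would rather not depend on dplyr, a base R sketch of the same idea could look like the following. The helper name drop_matching is my own, and I am assuming the pattern should be matched as a literal string anywhere in the column name.

```r
df1 <- data.frame(abc_1 = rep(3, 5), b = seq(1, 5, 1), c = letters[1:5])
df2 <- data.frame(d = rep(5, 5), e_abc = seq(2, 6, 1), f = letters[6:10])
dflist <- list(df1 = df1, df2 = df2)

# Hypothetical helper: keep only the columns whose names do not contain `pattern`
drop_matching <- function(x, pattern) {
  # drop = FALSE keeps a data frame even if a single column is left
  x[, !grepl(pattern, names(x), fixed = TRUE), drop = FALSE]
}

cleaned <- lapply(dflist, drop_matching, pattern = "abc")
lapply(cleaned, names)
```

Because the list is named, list2env(cleaned, envir = .GlobalEnv) would write the results back in the same way as above.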
I am looking to mutate the same variables with two or more dataframes. What is the best way to implement to reduce redundant code?
library(dplyr)
df1 <- tibble(a = 0.125068, b = 0.144623)
df2 <- tibble(a = 0.226018, b = 0.423600)
df1 <- df1 %>%
  mutate(a = round(a, 1),
         b = round(b, 2))
df2 <- df2 %>%
  mutate(a = round(a, 1),
         b = round(b, 2))
It may be interesting to put the dataframes in a list first:
my_dfs <- list(df1, df2)
Then use a loop-apply function like lapply:
lapply(my_dfs, \(x) mutate(x, a = round(a, 1),
                           b = round(b, 2)))
If we really need the dataframes in the global environment, instead of in a dedicated list, we can simply call list2env(), as in:
# list2env() needs a named list, so name the elements first
names(my_dfs) <- c("df1", "df2")
lapply(my_dfs, \(x) mutate(x, a = round(a, 1),
                           b = round(b, 2))) |>
  list2env(envir = .GlobalEnv)
You could make a function
rnd <- function(x) {
  x %>%
    mutate(a = round(a, 1),
           b = round(b, 2))
}
df1 %>% rnd()
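The helper-function idea also combines with the list approach. Below is a rough sketch in base R, using transform() instead of mutate() so that it runs without dplyr; rnd_base is a made-up name, and plain data frames stand in for the tibbles.

```r
df1 <- data.frame(a = 0.125068, b = 0.144623)
df2 <- data.frame(a = 0.226018, b = 0.423600)

# Hypothetical base R counterpart of rnd(): round a to 1 digit and b to 2 digits
rnd_base <- function(x) {
  transform(x, a = round(a, 1), b = round(b, 2))
}

# Apply the helper once per data frame instead of repeating the rounding code
rounded <- lapply(list(df1 = df1, df2 = df2), rnd_base)
rounded$df1
```

The rounding rule then lives in one place, so changing the number of digits only needs one edit.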
I have a bunch of lists, and I want to rename the rows using a master data.frame.
Here's an example list I created:
df <- data.frame(Value = c(1, 0, 6, 3, 4))
rownames(df) <- c("B144","B211","B678","B111", "B7090")
df2 <- data.frame(Value = c(0, 0, 1, 4, 2))
rownames(df2) <- c("V2","V543","V6577","V4322", "V55")
List<-list(df,df2)
And here's the master data.frame:
master=data.frame(ID=c("B144", "B211", "B7090","B111", "B678", "B242", "V4322", "V2694", "V4399", "V543","V6577","V2", "V55", "V554", "V2009"),
rename=c("S1","S2", "S3","N1","N2","S4","S5","N3","N4","N5","S6","S7","S8", "N6", "N7"))
Is there a way to rename the row names in the lists using this master?
Here's example output I want:
df <- data.frame(Value = c(1, 0, 6, 3, 4))
rownames(df) <- c("S1","S2","N2","N1", "S3")
df2 <- data.frame(Value = c(0, 0, 1, 4, 2))
rownames(df2) <- c("S7","N5","S6","S5", "S8")
List<-list(df,df2)
The function is too long to fit comfortably on one line, so I define it separately for better readability:
func <- function (DF) {
  lookup <- match(row.names(DF), master$ID)
  row.names(DF) <- master$rename[lookup]
  DF
}
List <- lapply(List, func)
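One caveat: match() returns NA for any row name that is missing from the master, and a data frame does not accept missing row names, so the assignment would fail in that case. Here is a sketch of a more defensive variant that keeps the old name when there is no match. The function name rename_rows and the ID "X999" are made up for illustration, and the master is cut down to two rows.

```r
master <- data.frame(ID = c("B144", "B211"),
                     rename = c("S1", "S2"),
                     stringsAsFactors = FALSE)

df <- data.frame(Value = c(1, 0, 6))
rownames(df) <- c("B144", "B211", "X999")  # "X999" is deliberately absent from master

# Hypothetical variant of func(): fall back to the old row name when no match exists
rename_rows <- function(DF, master) {
  lookup <- match(row.names(DF), master$ID)
  new_names <- as.character(master$rename)[lookup]
  unmatched <- is.na(new_names)
  new_names[unmatched] <- row.names(DF)[unmatched]
  row.names(DF) <- new_names
  DF
}

out <- rename_rows(df, master)
row.names(out)
```

It can be used the same way as before, e.g. lapply(List, rename_rows, master = master).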
I'm trying to loop through a list of data frames, dropping columns that don't match some condition. I want to change the data frames such that they're missing 1 column essentially. After executing the function, I'm able to change the LIST of data frames, but not the original data frames themselves.
df1 <- data.frame(
a = c("John","Peter","Dylan"),
b = c(1, 2, 3),
c = c("yipee", "ki", "yay"))
df2 <- data.frame(
a = c("Ray","Bob","Derek"),
b = c(4, 5, 6),
c = c("yum", "yummy", "donuts"))
df3 <- data.frame(
a = c("Bill","Sam","Nate"),
b = c(7, 8, 9),
c = c("I", "eat", "cake"))
l <- list(df1, df2, df3)
drop_col <- function(x) {
  x <- x[, !names(x) %in% c("e", "b", "f")]
  return(x)
}
l <- lapply(l, drop_col)
When I call the list l, I get a list of data frames with the changes I want. When I call an element in the list, df1 or df2 or df3, they do not have a dropped column.
I've looked at this solution and many others, I'm obviously missing something.
The list l and the data frames df1, df2, etc. are independent; they have nothing to do with each other. One way to get the changed data frames back is to assign names to the list elements and then write them to the global environment with list2env().
l <- lapply(l, drop_col)
names(l) <- paste0("df", 1:3)
list2env(l, .GlobalEnv)
The problem is that when you are creating l, you are filling it with copies of your data frames df1, df2, df3.
In R, it is not generally possible to pass references to variables. One workaround is to write the modified list back into the environment, as @Ronak Shah does.
Another is to use get() and <<- to change the variable within the function.
drop_cols <- function(x) {
  for(iter in x)
    do.call("<<-", list(iter, drop_col(get(iter))))
}
drop_cols(c("df1","df2","df3"))
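If the do.call("<<-", ...) construction feels too opaque, a similar effect can be sketched with assign() and an explicit environment. This is an alternative I am suggesting rather than the approach above: drop_cols_in is a hypothetical name, the data frame is shortened, and drop = FALSE is added to drop_col so that a single remaining column stays a data frame.

```r
df1 <- data.frame(
  a = c("John", "Peter", "Dylan"),
  b = c(1, 2, 3),
  c = c("yipee", "ki", "yay"))

drop_col <- function(x) {
  x[, !names(x) %in% c("e", "b", "f"), drop = FALSE]
}

# Hypothetical helper: look each name up in `envir`, drop the columns,
# and write the result back under the same name in that environment
drop_cols_in <- function(df_names, envir = parent.frame()) {
  for (nm in df_names) {
    assign(nm, drop_col(get(nm, envir = envir)), envir = envir)
  }
  invisible(df_names)
}

drop_cols_in("df1")
names(df1)
```

Making the environment an argument keeps it visible where the side effect happens, which is harder to see with <<-.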
df1 <- data.frame(
a = c("John","Peter","Dylan"),
b = c(1, 2, 3),
c = c("yipee", "ki", "yay"))
df2 <- data.frame(
a = c("Ray","Bob","Derek"),
b = c(4, 5, 6),
c = c("yum", "yummy", "donuts"))
df3 <- data.frame(
a = c("Bill","Sam","Nate"),
b = c(7, 8, 9),
c = c("I", "eat", "cake"))
# Name the list elements:
l <- list(df1 = df1, df2 = df2, df3 = df3)
drop_col <- function(x) {
  x <- x[, !names(x) %in% c("e", "b", "f")]
  return(x)
}
l <- lapply(l, drop_col)
# View altered dfs:
View(l[["df1"]])  # [[ ]] extracts the data frame itself rather than a one-element list
I have a list of SpatialPolygonsDataFrame objects, which for this question can be treated like ordinary data frames:
df.1 <- data.frame(A = c(1:10), B = c(1, 2, 2, 2, 5:10))
df.2 <- data.frame(A = c(1:10), B = c(1, 2, 2, 2, 2, 2, 7:10))
df.3 <- data.frame(A = c(1:10), B = c(1, 2, 2, 4:10))
list.df <- list(df.1, df.2, df.3)
I would like to get a list of a subset of each dataframe based on condition (list.df.sub is the result I am looking for):
df.1.sub <- subset(df.1, df.1$B != 2)
df.2.sub <- subset(df.2, df.2$B != 2)
df.3.sub <- subset(df.3, df.3$B != 2)
list.df.sub <- list(df.1.sub, df.2.sub, df.3.sub)
I would like to apply my subset directly to list.df. I know that I have to use the lapply function, but I don't know how.
This should work:
lapply(list.df, function(x)x[x$B!=2,])
or with subset:
lapply(list.df, subset, B!=2)
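One difference between the two forms is worth knowing if B can contain missing values (the example data has none, so this is an assumption about your real data): logical indexing returns an all-NA row wherever the condition is NA, while subset() drops such rows. A small sketch with a made-up data frame:

```r
# Made-up example with a missing value in B
df <- data.frame(A = 1:4, B = c(1, 2, NA, 4))

by_index  <- df[df$B != 2, ]          # the NA comparison yields an all-NA row
by_subset <- subset(df, B != 2)       # subset() treats NA as FALSE
by_which  <- df[which(df$B != 2), ]   # which() also discards the NA

c(nrow(by_index), nrow(by_subset), nrow(by_which))
```

With no NAs in B, both versions select the same rows.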
If you only want to extract one column, you can also use the "[[" function:
example_list <- list(iris, iris, iris)
lapply(example_list, "[[", "Species")
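Note that "[[" returns the column as a plain vector. If you want to keep one-column data frames instead, "[" can be passed in the same way. A short sketch (the names as_vectors and as_frames are only illustrative):

```r
example_list <- list(iris, iris, iris)

# "[[" extracts the column itself (here a factor)
as_vectors <- lapply(example_list, "[[", "Species")

# "[" keeps a data frame with a single column
as_frames <- lapply(example_list, "[", "Species")

class(as_vectors[[1]])
class(as_frames[[1]])
```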