Rename 1 Column in every dataframe in a List [duplicate] - r

This question already has an answer here:
Rename Columns of Data.frame in list
Closed 4 years ago.
I'm trying to rename one column in every data frame in a list.
my_list <- list(data.frame(a = 1:5, b = 1:5), data.frame(a = 1:5, b = 1:5))
[[1]]
a b
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
[[2]]
a b
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
lapply(my_list, function(x){
k <- my_list[[ x ]]
# set 2nd column to a new name
names(k)[2] <- "NEW COLUMN"
# return
})
This is the output I hope to achieve:
[[1]]
a NEW COLUMN
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
[[2]]
a NEW COLUMN
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
However, my lapply call does not work. It fails with this error:
Error in my_list[[x]] : invalid subscript type 'list'

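lapply loops over the elements of the list itself, not over its indices. So inside the anonymous function, 'x' is the value, i.e. the data.frame element of the list, and my_list[[x]] fails because a data.frame is not a valid subscript. Modify x directly and return it: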
lapply(my_list, function(x) {names(x)[2] <- "NEW COLUMN"; x})
If we loop over the sequence of indices instead, the OP's approach works, as long as the function returns the modified data frame:
lapply(seq_along(my_list), function(i) {
k <- my_list[[ i ]] # extracted the list element
names(k)[2] <- "NEW COLUMN"
k
})
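The same rename can also be written without the replacement-function assignment, by building the new name vector and attaching it in one expression. A minimal sketch using base R's replace() and setNames() (an alternative style, not the answer's own code):

```r
my_list <- list(data.frame(a = 1:5, b = 1:5), data.frame(a = 1:5, b = 1:5))

# replace() returns a copy of names(x) with position 2 changed;
# setNames() returns a copy of x carrying those names.
res <- lapply(my_list, function(x) setNames(x, replace(names(x), 2, "NEW COLUMN")))

names(res[[1]])       # "a" "NEW COLUMN"
names(my_list[[1]])   # "a" "b": the input list is unchanged until you reassign it
```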

Related

I have a list of data frames and a character vector. I want to rename the second column of each data frame by iterating through the vector. How can I do this?

I have a list of dataframes. Each of these dataframes has the same number of columns and rows, and has a similar data structure:
df.list <- list(data.frame1, data.frame2, data.frame3)
I have a vector of characters:
charvec <- c("a","b","c")
I want to replace the name of the second column in each data frame by iterating through the above character vector. For example, the first data frame's second column should be named "a", and the second data frame's second column should be named "b".
[[1]]
col1 a
1 1 2
2 2 3
[[2]]
col1 b
1 1 2
2 2 3
A reproducible example:
charvec <- c("a","b","c")
df_list <- list(df1 = data.frame(x = seq_len(3), y = seq_len(3)), df2 = data.frame(x = seq_len(4), y = seq_len(4)), df3 = data.frame(x = seq_len(5), y = seq_len(5)))
for(i in seq_along(df_list)){
names(df_list[[i]])[2] <- charvec[i]
}
> df_list
$df1
x a
1 1 1
2 2 2
3 3 3
$df2
x b
1 1 1
2 2 2
3 3 3
4 4 4
$df3
x c
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
You can also use map2 from purrr. Thanks to @ismirsehregal for the example data.
library(purrr)
map2(
df_list,
charvec,
\(x, y) {
names(x)[2] <- y
x
}
)
Output
$df1
x a
1 1 1
2 2 2
3 3 3
$df2
x b
1 1 1
2 2 2
3 3 3
4 4 4
$df3
x c
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
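If you would rather not add a purrr dependency, base R's Map() walks two lists in parallel the same way map2 does. A minimal sketch on the same example data; Map() keeps the names of its first list argument, so the result is still named df1, df2, df3:

```r
charvec <- c("a", "b", "c")
df_list <- list(df1 = data.frame(x = seq_len(3), y = seq_len(3)),
                df2 = data.frame(x = seq_len(4), y = seq_len(4)),
                df3 = data.frame(x = seq_len(5), y = seq_len(5)))

# Pair the i-th data frame with the i-th new name
res <- Map(function(x, y) {
  names(x)[2] <- y
  x
}, df_list, charvec)

names(res)                                # "df1" "df2" "df3"
unname(sapply(res, function(d) names(d)[2]))  # "a" "b" "c"
```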

Remove the last column of dataframe in R in a function

I need to remove the last column of 10 data frames, so I decided to use lapply(). I wrote a function to remove the column, like below:
remove_col <- function(mydata){
mydata = subset(mydata, select=-c(24))
}
and created mylist <- list(data1, data2, ..., data10), then I called lapply as
lapply(mylist, FUN = remove_col)
It did return a list of data frames with the column removed; however, when I checked the original data frames, the last column was still there.
How should I change the code to change the original dataset?
You need to assign the result of the function call to the input list on the LHS:
mylist <- lapply(mylist, FUN = remove_col)
Had you defined your function with an explicit return value, this might have been more obvious:
remove_col <- function(mydata) {
mydata <- subset(mydata, select=-c(24))
return(mydata) # return the modified list/data frame
}
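The underlying reason is that R functions work on copies of their arguments (copy-on-modify), so lapply() never alters mylist or data1 in place. A small sketch that makes this visible; the toy data and the helper name remove_last are hypothetical stand-ins for the OP's objects:

```r
# Hypothetical stand-ins for data1 ... data10 (3 columns each)
data1 <- data.frame(a = 1:3, b = 4:6, c = 7:9)
data2 <- data.frame(a = 1:3, b = 4:6, c = 7:9)
mylist <- list(data1, data2)

# Hypothetical helper: drop the last column, whatever its position
remove_last <- function(mydata) mydata[-ncol(mydata)]

lapply(mylist, remove_last)      # result is printed, then discarded
ncol(mylist[[1]])                # 3: the input list was not modified

mylist <- lapply(mylist, remove_last)  # reassign to keep the result
ncol(mylist[[1]])                # 2
ncol(data1)                      # 3: data1 is a separate object from mylist[[1]]
```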
Instead of hardcoding the column number to remove, you can use ncol to remove the last column from each data frame.
remove_col <- function(mydata){
# drop = FALSE keeps the result a data frame even if only one column remains
mydata[, -ncol(mydata), drop = FALSE]
}
mylist <- lapply(mylist, remove_col)
To see the changes in the original data frames, name the list elements after them and use list2env to write them back:
names(mylist) <- paste0('data', seq_along(mylist))
list2env(mylist, .GlobalEnv)
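A variation on the same idea: base R's mget() looks objects up by name and returns a list already named after them, which saves the separate names(mylist) <- ... step. A minimal sketch with two hypothetical stand-ins for data1 ... data10; note that list2env() overwrites the existing objects of the same names:

```r
# Hypothetical stand-ins for data1 ... data10 (only two, for brevity)
data1 <- data.frame(a = 1:3, b = 4:6)
data2 <- data.frame(a = 7:9, b = 10:12)

# mget() returns list(data1 = ..., data2 = ...), named after the objects
mylist <- mget(paste0("data", 1:2), envir = environment())
mylist <- lapply(mylist, function(x) x[-ncol(x)])

# Write each element back over the object with the same name
invisible(list2env(mylist, envir = environment()))
names(data1)  # "a"
```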
Using base R and lapply. Note: ", drop = F" keeps a one-column result as a data frame instead of simplifying it to a vector, so you can omit it only if every data frame in the list has more than 2 columns.
> d1
c1 c2
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
> d2
c1 c2
1 5 10
2 4 9
3 3 8
4 2 7
5 1 6
> mylist <- list(d1, d2)
> mylist
[[1]]
c1 c2
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
[[2]]
c1 c2
1 5 10
2 4 9
3 3 8
4 2 7
5 1 6
> lapply(mylist, function(x) x[,1:(ncol(x)-1), drop = F] )
[[1]]
c1
1 1
2 2
3 3
4 4
5 5
[[2]]
c1
1 5
2 4
3 3
4 2
5 1
>
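Another way to sidestep drop entirely: indexing a data frame with a single argument, x[-ncol(x)], treats it as a list of columns, and that form always returns a data frame. A sketch that rebuilds d1 and d2 from the printed values above:

```r
d1 <- data.frame(c1 = 1:5, c2 = 6:10)
d2 <- data.frame(c1 = 5:1, c2 = 10:6)
mylist <- list(d1, d2)

# List-style indexing (no comma) never simplifies to a vector,
# so a single remaining column is still a data frame.
res <- lapply(mylist, function(x) x[-ncol(x)])
str(res[[2]])
```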

Changing a subset of column names in a list of data frames in R

This question is an extension of Changing Column Names in a List of Data Frames in R.
That post addresses changing names of all columns of a data.frame.
But how do you change the names of only a selected number of columns?
Example:
I want to change the name of the first column only in each data.frame in my list:
dat <- data.frame(Foo = 1:5,Bar = 1:5)
lst <- list(dat,dat)
print(lst)
[[1]]
Foo Bar
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
[[2]]
Foo Bar
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
(Failed) Attempts:
lapply(1:2, function(x) names(lst[[x]])[names(lst[[x]]) == 'Foo'] <- 'New')
lapply(1:2, function(x) names(lst[[x]])[names(lst[[x]]) == 'Foo']) <- rep('New',2)
lapply(1:2, function(x) setNames(lst[[x]][names(lst[[x]]) == 'Foo'],'New'))
Here is one possibility using setNames and gsub:
# Sample data
dat <- data.frame(Foo = 1:5,Bar = 1:5)
lst <- list(dat,dat[, 2:1])
# Replace Foo with FooFoo
lst <- lapply(lst, function(x) setNames(x, gsub("^Foo$", "FooFoo", names(x))) )
#[[1]]
# FooFoo Bar
#1 1 1
#2 2 2
#3 3 3
#4 4 4
#5 5 5
#
#[[2]]
# Bar FooFoo
#1 1 1
#2 2 2
#3 3 3
#4 4 4
#5 5 5
Two problems with your attempts:
It's awkward to use lapply(1:2, ...) instead of lapply(lst, ...). This makes your anonymous function more convoluted.
Your anonymous function doesn't return the data frame. A function returns the value of its last expression (in the absence of a return() statement). In your first attempt, the last expression is an assignment, whose value is just the assigned value, 'New'; we need to return the whole data frame with the modified name.
Solution:
lapply(lst, function(x) {names(x)[names(x) == 'Foo'] <- 'New'; x})
# [[1]]
# New Bar
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5
#
# [[2]]
# New Bar
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5
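To rename an arbitrary subset of columns by name (not just one), the == test generalizes to match(). A minimal sketch; rename_cols is a hypothetical helper, and names that are absent from a given data frame are simply skipped:

```r
# Hypothetical helper: rename the columns listed in `old` to `new`
rename_cols <- function(df, old, new) {
  idx <- match(old, names(df))  # position of each old name, NA if absent
  keep <- !is.na(idx)           # ignore names this data frame doesn't have
  names(df)[idx[keep]] <- new[keep]
  df
}

dat <- data.frame(Foo = 1:5, Bar = 1:5)
lst <- list(dat, dat[, 2:1])  # same columns, different order
lst <- lapply(lst, rename_cols, old = c("Foo", "Bar"), new = c("New", "Other"))
names(lst[[1]])  # "New" "Other"
names(lst[[2]])  # "Other" "New"
```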
Here is a way to change the name of the column by column index.
lapply(lst, function(x, pos = 1, newname = "New"){
# x: data frame, pos: column index, newname: new name of the column
column <- names(x)
column[pos] <- newname
names(x) <- column
return(x)
})
# [[1]]
# New Bar
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5
#
# [[2]]
# New Bar
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5
I posted this answer before I saw an updated comment from the OP saying that the index of the target column may differ between data frames, which the original post did not mention. Please see the other answers; mine only works if the column index is the same in every data frame.
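If the positions do differ but are known, one option is to pass them alongside the list with Map(), so each data frame gets its own index. A minimal sketch; the pos vector is a hypothetical example of such known positions:

```r
dat <- data.frame(Foo = 1:5, Bar = 1:5)
lst <- list(dat, dat[, 2:1])  # "Foo" is column 1 in the first, column 2 in the second

# Hypothetical per-data-frame positions of the target column
pos <- c(1, 2)

res <- Map(function(x, p) {
  names(x)[p] <- "New"
  x
}, lst, pos)

names(res[[1]])  # "New" "Bar"
names(res[[2]])  # "Bar" "New"
```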
My solution is more complicated than the others, but here it is.
The main difference is that instead of == it uses grep (with argument ignore.case = TRUE).
lapply(lst, function(DF) {
inx <- grep("^foo$", names(DF), ignore.case = TRUE)
names(DF)[inx] <- "New"
DF
})
#[[1]]
# New Bar
#1 1 1
#2 2 2
#3 3 3
#4 4 4
#5 5 5
#
#[[2]]
# New Bar
#1 1 1
#2 2 2
#3 3 3
#4 4 4
#5 5 5
Using tidyverse (note that rename_at() is superseded by rename_with() in dplyr 1.0.0 and later):
library(tidyverse)
map(lst,rename_at,"Foo",~"New")
# [[1]]
# New Bar
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5
#
# [[2]]
# New Bar
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5
Using data.table:
library(data.table)
lst2 <- copy(lst)
lapply(lst2,setnames,"Foo","New")
# [[1]]
# New Bar
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5
#
# [[2]]
# New Bar
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5
setnames() modifies its input by reference, so we work on a copy to leave lst untouched.
Note: without the assignment, the original object is not changed.
lst <- purrr::map(lst, ~setNames(.x, c('new', names(.x)[-1])))

How to get the index column in a data frame in R [duplicate]

Starting with a data.frame...
df = data.frame(k=c(1,5,4,7,6), v=c(3,1,4,1,5))
> df
k v
1 1 3
2 5 1
3 4 4
4 7 1
5 6 5
I might run some number of arbitrary manipulations...
> foo1 = df[df$k>3,]
> foo2 = head(foo1[order(foo1$v),], 2)
> foo2
k v
2 5 1
4 7 1
At this point foo2 has somehow retained the original row numbers from df (in this case 2 and 4).
How do I extract these?
> insert_magic_function_here(foo2)
[1] 2 4
I think you're looking for rownames.
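rownames() returns the row names as a character vector; when they are R's default integer row names, as.integer() converts them back to numbers. A sketch reproducing the question's steps:

```r
df <- data.frame(k = c(1, 5, 4, 7, 6), v = c(3, 1, 4, 1, 5))
foo1 <- df[df$k > 3, ]
foo2 <- head(foo1[order(foo1$v), ], 2)

rownames(foo2)              # "2" "4" (character)
as.integer(rownames(foo2))  # 2 4 (only meaningful for default numeric row names)
```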

Return row number(s) for a particular value in a column in a dataframe

I have a data frame (df) and I was wondering how to return the row number(s) for a particular value (2585) in the 4th column (height_chad1) of the same data frame?
I've tried:
row(mydata_2$height_chad1, 2585)
and I get the following error:
Error in factor(.Internal(row(dim(x))), labels = labs) :
a matrix-like object is required as argument to 'row'
Is there an equivalent line of code that works for data frames instead of matrix-like objects?
Any help would be appreciated.
Use which(mydata_2$height_chad1 == 2585)
Short example
df <- data.frame(x = c(1,1,2,3,4,5,6,3),
y = c(5,4,6,7,8,3,2,4))
df
x y
1 1 5
2 1 4
3 2 6
4 3 7
5 4 8
6 5 3
7 6 2
8 3 4
which(df$x == 3)
[1] 4 8
length(which(df$x == 3))
[1] 2
plyr::count(df, vars = "x")
x freq
1 1 2
2 2 1
3 3 2
4 4 1
5 5 1
6 6 1
df[which(df$x == 3),]
x y
4 3 7
8 3 4
As Matt Weller pointed out, you can use the length function.
The count function in plyr can be used to return the count of each unique column value.
which(df==my.val, arr.ind=TRUE)
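With arr.ind = TRUE, which() searches every column at once and returns a matrix with "row" and "col" columns, one row per match. A sketch using the example data from the previous answer, with 3 standing in for my.val:

```r
df <- data.frame(x = c(1, 1, 2, 3, 4, 5, 6, 3),
                 y = c(5, 4, 6, 7, 8, 3, 2, 4))

# df == 3 is a logical matrix; matches are reported in column-major order
hits <- which(df == 3, arr.ind = TRUE)
hits  # rows 4 and 8 in column 1 (x), row 6 in column 2 (y)
```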
