Remove the last column of dataframe in R in a function - r

I need to remove the last column of 10 dataframes, so I decided to put it in lapply(). I wrote a function to remove the col, like below,
remove_col <- function(mydata){
mydata = subset(mydata, select=-c(24))
}
and created a list, mylist <- list(data1, data2, ..., data10), then called lapply as
lapply(mylist, FUN = remove_col)
It did give me a list of data frames with the column removed; however, when I checked the original data frames, the last column was still there.
How should I change the code so that the original data frames are modified?

You need to assign the result of the function call to the input list on the LHS:
mylist <- lapply(mylist, FUN = remove_col)
Had you defined your function with an explicit return value, this might have been more obvious:
remove_col <- function(mydata) {
mydata <- subset(mydata, select=-c(24))
return(mydata) # return the modified data frame
}
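A minimal sketch of why the reassignment matters. The data frames data1 and data2 and their three columns are made up for illustration (they are not the asker's data), so the column index is 3 rather than 24. lapply() returns a new list and leaves its input untouched:

```r
# Hypothetical stand-ins for the asker's data frames (3 columns instead of 24).
data1 <- data.frame(a = 1:3, b = 4:6, c = 7:9)
data2 <- data.frame(a = 3:1, b = 6:4, c = 9:7)
mylist <- list(data1, data2)

remove_col <- function(mydata) {
  # Drop column 3, which is the last column of these toy frames.
  mydata <- subset(mydata, select = -c(3))
  return(mydata)
}

# lapply() builds and returns a new list; mylist itself is not modified.
trimmed <- lapply(mylist, remove_col)

cols_before <- ncol(mylist[[1]])   # column count in the input list
cols_after  <- ncol(trimmed[[1]])  # column count in the returned list
```

Writing mylist <- lapply(mylist, remove_col) instead overwrites the list with the trimmed copies; data1 and data2 in the workspace still keep all their columns.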

Instead of hardcoding the column number to remove, you can use ncol to remove the last column from each data frame, whatever its width.
remove_col <- function(mydata){
mydata[, -ncol(mydata)]
}
mylist <- lapply(mylist, remove_col)
To see the changes in the original data frames, you can assign names to the list of data frames and use list2env.
names(mylist) <- paste0('data', seq_along(mylist))
list2env(mylist, .GlobalEnv)
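A small sketch of the list2env step, under the assumption of two toy data frames named data1 and data2. To keep the example side-effect free it writes into a fresh environment (env, a name chosen here) rather than .GlobalEnv; passing .GlobalEnv as in the answer would overwrite the workspace objects of the same names.

```r
# Hypothetical toy data; the names data1/data2 mirror the question.
data1 <- data.frame(a = 1:3, b = 4:6, c = 7:9)
data2 <- data.frame(a = 3:1, b = 6:4, c = 9:7)

# A named list: the names decide which objects list2env() creates.
mylist <- list(data1 = data1, data2 = data2)

# Drop the last column of each data frame; lapply() keeps the list names.
mylist <- lapply(mylist, function(d) d[, -ncol(d), drop = FALSE])

# Copy each list element into an environment under its name.
env <- new.env()
list2env(mylist, envir = env)

trimmed_cols  <- ncol(env$data1)  # the copy written by list2env()
original_cols <- ncol(data1)      # the workspace object, untouched here
```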

Using base R and lapply. Note: drop = F keeps a one-column result as a data frame instead of a vector. You can remove ", drop = F" from your script if every data frame in the list has more than 2 columns.
> d1
c1 c2
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
> d2
c1 c2
1 5 10
2 4 9
3 3 8
4 2 7
5 1 6
> mylist <- list(d1, d2)
> mylist
[[1]]
c1 c2
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
[[2]]
c1 c2
1 5 10
2 4 9
3 3 8
4 2 7
5 1 6
> lapply(mylist, function(x) x[,1:(ncol(x)-1), drop = F] )
[[1]]
c1
1 1
2 2
3 3
4 4
5 5
[[2]]
c1
1 5
2 4
3 3
4 2
5 1
>
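The effect of drop can be checked directly. This sketch reuses the two-column d1 from the output above; one_col is a hypothetical single-column frame added to show an edge case of the 1:(ncol(x)-1) idiom.

```r
d1 <- data.frame(c1 = 1:5, c2 = 6:10)

# Without drop = FALSE a single remaining column collapses to a plain vector.
without_drop <- d1[, 1:(ncol(d1) - 1)]
with_drop    <- d1[, 1:(ncol(d1) - 1), drop = FALSE]

# Edge case: for a one-column data frame, 1:(ncol(x) - 1) is 1:0, i.e. c(1, 0),
# which still selects column 1. seq_len() yields an empty index instead.
one_col   <- data.frame(c1 = 1:5)
kept_col  <- one_col[, 1:(ncol(one_col) - 1), drop = FALSE]
empty_col <- one_col[, seq_len(ncol(one_col) - 1), drop = FALSE]
```

Here without_drop is an integer vector, with_drop is a one-column data frame, kept_col still has its column, and empty_col has zero columns.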

Related

select columns after named columns

I have a data frame of the following form in R
First  a  b  c  Second  a  b  c
       3  8  1          7  6  8
       5  9  4          2  8  5
I'm trying to write something that selects the three columns following "First" & "Second", and puts them into new data frames titled "First" & "Second" respectively. I'm thinking of using the strategy below (where df is the dataframe I outline above), but am unsure how to make it such that R takes the columns that follow the ones I specify
names <- c("First", "Second")
for (i in c){
i <- (something to specify the 3 columns following df$i)
}
One option is to use split.default to split the data.frame by columns into a list of data.frames
split.default(df, cumsum(names(df) %in% names))
#$`1`
# First a b c
#1 NA 3 8 1
#2 NA 5 9 4
#
#$`2`
# Second a b c
#1 NA 7 6 8
#2 NA 2 8 5
The expression cumsum(...) creates the indices according to which to group and split columns.
Sample data
df <- read.table(text = "First a b c Second a b c
'' 3 8 1 '' 7 6 8
'' 5 9 4 '' 2 8 5", header = T, check.names = F)
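To make the cumsum(...) grouping in this answer concrete, here is a sketch that prints the intermediate values. The data frame is rebuilt with data.frame() rather than read.table(), and marker_names and grp are names chosen here for illustration.

```r
# Same layout as the sample data: two marker columns, each followed by a, b, c.
df <- data.frame(First = NA, a = c(3, 5), b = c(8, 9), c = c(1, 4),
                 Second = NA, a = c(7, 2), b = c(6, 8), c = c(8, 5),
                 check.names = FALSE)

marker_names <- c("First", "Second")

# TRUE at each marker column, FALSE elsewhere.
is_marker <- names(df) %in% marker_names

# The running count of markers seen so far gives one group id per column.
grp <- cumsum(is_marker)

# split.default() treats the data frame as a list of columns and splits it by grp.
parts <- split.default(df, grp)
```

Each element of parts still contains its marker column; parts[[1]][-1] would drop it.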
You can get the positions of the names vector in the column names of the data and subset the next 3 columns after each one.
names <- c("First", "Second")
inds <- which(names(df) %in% names)
result <- Map(function(x, y) df[x:y], inds + 1, inds + 3)
result
#[[1]]
# a b c
#1 3 8 1
#2 5 9 4
#[[2]]
# a b c
#1 7 6 8
#2 2 8 5
To create separate dataframes you can name the list and use list2env
names(result) <- names
list2env(result, .GlobalEnv)
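If separate global objects are not required, the named list can be used directly. A self-contained sketch of the Map approach, with the sample data rebuilt via data.frame() and marker_names as an assumed variable name (the answer calls it names):

```r
df <- data.frame(First = NA, a = c(3, 5), b = c(8, 9), c = c(1, 4),
                 Second = NA, a = c(7, 2), b = c(6, 8), c = c(8, 5),
                 check.names = FALSE)

marker_names <- c("First", "Second")

# Column positions of the markers.
inds <- which(names(df) %in% marker_names)

# For each marker, take the three columns that follow it.
result <- Map(function(from, to) df[from:to], inds + 1, inds + 3)

# Label each piece with its marker, in the order the markers occur in df.
names(result) <- names(df)[inds]

second_a <- result$Second$a  # column a of the block after "Second"
```

Naming from names(df)[inds] rather than from the lookup vector keeps labels and pieces aligned even if the markers appear in a different order in the data.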

Rename 1 Column in every dataframe in a List [duplicate]

I'm trying to rename one column in a dataframe list.
my_list <- list(data.frame(a = 1:5, b = 1:5), data.frame(a = 1:5, b = 1:5))
[[1]]
a b
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
[[2]]
a b
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
lapply(my_list, function(x){
k <- my_list[[ x ]]
# set 2nd column to a new name
names(k)[2] <- "NEW COLUMN"
# return
})
This is the output I hope to achieve
[[1]]
a NEW COLUMN
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
[[2]]
a NEW COLUMN
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
However, my lapply does not seem to work. The error code is below:
Error in my_list[[x]] : invalid subscript type 'list'
lapply loops over the elements of the list, not over its indices. So, in the anonymous function, 'x' is the value, i.e. the data.frame element itself, and my_list[[ x ]] fails. Rename the column on 'x' directly and return 'x':
lapply(my_list, function(x) {names(x)[2] <- "NEW COLUMN"; x})
If we instead loop over the sequence of indices, the OP's approach works, provided the modified data.frame is returned:
lapply(seq_along(my_list), function(i) {
k <- my_list[[ i ]] # extracted the list element
names(k)[2] <- "NEW COLUMN"
k
})
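A runnable check of the first form, using the my_list from the question. The second variant, which renames by the old column name instead of by position, is an addition for illustration and not part of the answer.

```r
my_list <- list(data.frame(a = 1:5, b = 1:5), data.frame(a = 1:5, b = 1:5))

# Rename by position: x is each data frame in turn, and must be returned.
renamed <- lapply(my_list, function(x) {
  names(x)[2] <- "NEW COLUMN"
  x
})

# Variant: rename by the old name, which does not depend on column order.
renamed_by_name <- lapply(my_list, function(x) {
  names(x)[names(x) == "b"] <- "NEW COLUMN"
  x
})
```

As with any lapply call, my_list itself is unchanged until the result is assigned back to it.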

Subset columns of data frames contained in list based on matrix of indices

I have a list that contains many data frames, and I have a matrix representing the index positions of columns of interest, with each row for each successive data frame. I am trying to subset each of the data frames within that list based on the matrix.
df1 <- data.frame(id=letters[1:4], result1=1:4, result2=1:4, result3=1:4)
df2 <- data.frame(id=letters[1:4], result1=5:8, result2=1:4, result3=1:4)
df3 <- data.frame(id=letters[1:4], result1=9:12, result2=1:4, result3=1:4)
df4 <- data.frame(id=letters[1:4], result1=13:16, result2=1:4, result3=1:4)
dflist <- list(df1, df2, df3, df4)
indices <- matrix(c(1,1,1,1,2,2,4,3),nrow=4, ncol=2)
So the data frames look like this:
[[1]]
id result1 result2 result3
1 a 1 1 1
2 b 2 2 2
3 c 3 3 3
4 d 4 4 4
[[2]]
id result1 result2 result3
1 a 5 1 1
2 b 6 2 2
3 c 7 3 3
4 d 8 4 4
[[3]]
id result1 result2 result3
1 a 9 1 1
2 b 10 2 2
3 c 11 3 3
4 d 12 4 4
[[4]]
id result1 result2 result3
1 a 13 1 1
2 b 14 2 2
3 c 15 3 3
4 d 16 4 4
and the index matrix looks like this
[,1] [,2]
[1,] 1 2
[2,] 1 2
[3,] 1 4
[4,] 1 3
From the first data frame, I want to subset columns 1 and 2, from the second dataframe I want columns 1, 2, from the third, I want columns 1 and 4, etc.
I can achieve this one by one using:
dflist[[1]][indices[1,]]
But I can't figure out a way to do it for all of them at once (I tried lapply() and sapply() without luck).
You could loop on the indices
lapply(1:4, function(i) dflist[[i]][indices[i,]]) # or 1:nrow(indices), as @bgoldst suggests
Or, using mapply to operate on the rows of indices and the dflist
mapply(function(a, b) a[,b], dflist, split(indices, row(indices)), SIMPLIFY = F)
This could be simplified further, as suggested by @Frank, using Map (a wrapper for mapply) and removing the anonymous function
Map(`[`, dflist, split(indices,row(indices)))
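The Map form relies on split(indices, row(indices)) turning the matrix into one index vector per row. A sketch with the question's data that exposes that intermediate step (row_indices and subsets are names chosen here):

```r
df1 <- data.frame(id = letters[1:4], result1 = 1:4,   result2 = 1:4, result3 = 1:4)
df2 <- data.frame(id = letters[1:4], result1 = 5:8,   result2 = 1:4, result3 = 1:4)
df3 <- data.frame(id = letters[1:4], result1 = 9:12,  result2 = 1:4, result3 = 1:4)
df4 <- data.frame(id = letters[1:4], result1 = 13:16, result2 = 1:4, result3 = 1:4)
dflist <- list(df1, df2, df3, df4)
indices <- matrix(c(1, 1, 1, 1, 2, 2, 4, 3), nrow = 4, ncol = 2)

# row(indices) labels every cell with its row number, so split() groups the
# cells row by row: one vector of column positions per data frame.
row_indices <- split(indices, row(indices))

# Pair each data frame with its index vector and subset its columns.
subsets <- Map(`[`, dflist, row_indices)
```

This assumes the matrix has exactly one row per data frame, in the same order as dflist.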

How to assign new values from lapply to new column in dataframes in list

I have a list of dataframes. I want to perform an operation on columns of the dataframes and then create a new column in the dataframes with the resulting new column.
a <- data.frame(c(1,2,3), c(2,3,4))
b <- data.frame(c(7,8,9), c(5,6,2))
l <- list(a, b)
lapply(l, function(x) x[,2]*2)
What I want is for 4 6 8 and 10 12 4 to be assigned to the third columns of the first and second dataframes, respectively.
This does not seem to work:
lapply(l, function(x) x[,2]*2 -> x$new)
You can use cbind to add the new column to the data frames in the list:
lapply(l, function(x) cbind(x, x[,2]*2))
# [[1]]
# c.1..2..3. c.2..3..4. x[, 2] * 2
# 1 1 2 4
# 2 2 3 6
# 3 3 4 8
#
# [[2]]
# c.7..8..9. c.5..6..2. x[, 2] * 2
# 1 7 5 10
# 2 8 6 12
# 3 9 2 4
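The attempt in the question fails because the function changes only its local copy of x and then returns the value of the assignment, which is the computed vector. A sketch of an alternative that returns the whole data frame and gives the new column a clean name; the column names x and y and the name new are assumptions for readability, since the question's frames use default names.

```r
a <- data.frame(x = c(1, 2, 3), y = c(2, 3, 4))
b <- data.frame(x = c(7, 8, 9), y = c(5, 6, 2))
l <- list(a, b)

# The question's attempt: each result is the numeric vector, not a data frame.
attempt <- lapply(l, function(x) x[, 2] * 2 -> x$new)

# Add the column to the local copy, then return that copy.
l_new <- lapply(l, function(d) {
  d$new <- d[, 2] * 2
  d
})
```

cbind(x, new = x[, 2] * 2) inside the function gives the same result as the cbind answer above while also naming the column.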

Combine list of vectors to a data.frame, with list number

We have a list of vectors (of different lengths):
foo <- list(1:3,NULL,2:7)
What we need is a data.frame with two columns: item and list number, like below:
data.frame(Item=c(1:3,2:7), List=c(1,1,1,3,3,3,3,3,3))
Here the Item column is the vector of items in foo, and the List column shows which element of foo each item belongs to.
This can be done like below:
data.frame(Item=unlist(foo),
List=unlist(lapply(seq_along(foo), function(i) rep(i, length(foo[[i]])))))
But I am looking for more creative and efficient solutions. Do you have better ideas?
This answer sort of depends on the type of data in "foo", but you can try stack after adding names to your list:
names(foo) <- seq_along(foo)
stack(foo)
# values ind
# 1 1 1
# 2 2 1
# 3 3 1
# 4 2 3
# 5 3 3
# 6 4 3
# 7 5 3
# 8 6 3
# 9 7 3
# Warning message:
# In stack.default(foo) : non-vector elements will be ignored
A slightly more compact version of your current approach would be to use sapply instead of lapply:
> foo <- list(1:3,NULL,2:7)
> data.frame(Item = unlist(foo), List = rep(seq_along(foo), sapply(foo, length)))
Using plyr you can get a more readable solution:
library(plyr)
ldply(seq_along(foo),
function(x)data.frame(Item=foo[[x]],
List=rep(x,length(foo[[x]]))))
Item List
1 1 1
2 2 1
3 3 1
4 2 3
5 3 3
6 4 3
7 5 3
8 6 3
9 7 3
I would do:
data.frame(Item = unlist(foo),
List = rep(seq_along(foo), sapply(foo, length)))
Replacing sapply(foo, length) with vapply(foo, length, integer(1)) or unlist(lapply(foo, length)) will also be a little more efficient. And I don't think you can go faster.
Less efficient but somewhat creative is:
i <- sapply(foo, Negate(is.null))
do.call(rbind, Map(data.frame, Item = foo[i], List = seq_along(foo)[i]))
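The rep()-based answers above can also be written with lengths(), which base R provides (since R 3.2.0) for getting the length of every list element. A sketch, not a benchmark:

```r
foo <- list(1:3, NULL, 2:7)

# lengths(foo) is the per-element length, including 0 for the NULL element,
# so rep() repeats each list position once per item it contains.
res <- data.frame(Item = unlist(foo),
                  List = rep(seq_along(foo), lengths(foo)))
```

The NULL element contributes zero rows, so position 2 never appears in the List column.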
