How to assign multiple columns to data.frame without repeating function call - r

Why doesn't this work, for example? Every row gets the same value, and there is a warning as well.
data <- data.frame(id = 1:10)
slowCall <- function(id) data.frame(b = rep(id, 3), c = runif(3))
data[,c("d", "e")] <- sapply(data$id, function(id) {
tmp <- slowCall(id)
list(sum(tmp$b), min(tmp$c))
})
Warning message:
In `[<-.data.frame`(`*tmp*`, , c("d", "e"), value = list(3L, 0.104784948984161, :
provided 20 variables to replace 2 variables
print(data)
id d e
1 1 3 0.1047849
2 2 3 0.1047849
3 3 3 0.1047849
4 4 3 0.1047849
5 5 3 0.1047849
6 6 3 0.1047849
7 7 3 0.1047849
8 8 3 0.1047849
9 9 3 0.1047849
10 10 3 0.1047849

You could try something like this. First, vectorize the assign function (per @Joran's answer here), then modify your code slightly.
# vectorize
assignVec <- Vectorize("assign",c("x","value"))
library(plyr)
set.seed(1) # this is just here for reproducibility
data <- data.frame(id = 1:10)
slowCall <- function(id) data.frame(b = rep(id, 3), c = runif(3))
# I store this as `tmp` just to make the code below look cleaner
tmp <- mlply(sapply(data$id, function(id) {
tmp <- slowCall(id)
list(sum(tmp$b), min(tmp$c))
}), c)
# here's the key part:
data <- within(data, assignVec(c('d','e'), tmp, envir=environment()))
Output:
> data
id e d
1 1 0.26550866 3
2 2 0.20168193 6
3 3 0.62911404 9
4 4 0.06178627 12
5 5 0.38410372 15
6 6 0.49769924 18
7 7 0.38003518 21
8 8 0.12555510 24
9 9 0.01339033 27
10 10 0.34034900 30
Note: I invoke plyr::mlply to get your sapply output into a list.
The simpler answer, though, is to change the right-hand side of your assignment to:
data[,c("d", "e")] <- as.data.frame(t(sapply(data$id, function(id) {
tmp <- slowCall(id)
list(sum(tmp$b), min(tmp$c))
})))
which would give you the same result.

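The problem here is that the matrix returned by your sapply contains one-element lists instead of numeric values. It has 2 rows and 10 columns, so the assignment sees a list of 20 elements (hence "provided 20 variables to replace 2 variables") and recycles only the first two of them down every row. Change your list to a c and transpose the output, then it will work.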
data[, c("d", "e")] <- t(sapply(data$id, function(id) {
tmp <- slowCall(id)
c(sum(tmp$b), min(tmp$c))
}))
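As a hedged variant of the fix above, vapply can be used in place of sapply so that the shape and type of each result are checked up front. This is only a sketch: the names res, d and e inside c() are illustrative choices, and set.seed is there purely for reproducibility.

```r
set.seed(1)  # only for reproducibility
data <- data.frame(id = 1:10)
slowCall <- function(id) data.frame(b = rep(id, 3), c = runif(3))

# vapply requires every call to return a numeric vector of length 2,
# so a stray list() would fail immediately instead of producing a list matrix
res <- vapply(data$id, function(id) {
  tmp <- slowCall(id)          # slowCall runs once per id
  c(d = sum(tmp$b), e = min(tmp$c))
}, numeric(2))

dim(res)                       # 2 rows (d, e), one column per id
data[, c("d", "e")] <- t(res)  # transpose to one row per id
```

Because c() coerces both results to a common type, d ends up as double rather than integer here.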

Here's a generic method for adding two columns of different data types (e.g. character and numeric). It builds one list per row and then transposes that into one list per column (via this answer).
Unlike the matrix approach, it preserves the integer and numeric types of the two outputs.
rowwise <- lapply(data$id, function(id) {
tmp <- slowCall(id)
list(sum(tmp$b), min(tmp$c))
})
colwise <- lapply(seq_along(rowwise[[1]]), function(i) lapply(rowwise, "[[", i))
data[,c("d", "e")] <- colwise
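One thing to be aware of: with the inner lapply, each element of colwise is itself a list, so d and e would most likely be stored as list-columns. If plain atomic columns are wanted, a possible sketch (an assumption about the intent, not part of the original answer) is to unlist each transposed column:

```r
set.seed(1)  # only for reproducibility
data <- data.frame(id = 1:10)
slowCall <- function(id) data.frame(b = rep(id, 3), c = runif(3))

# one list(sum, min) per row, so each value keeps its own type
rowwise <- lapply(data$id, function(id) {
  tmp <- slowCall(id)
  list(sum(tmp$b), min(tmp$c))
})

# transpose: collect the i-th element of every row, then flatten it
# into an atomic vector with unlist()
colwise <- lapply(seq_along(rowwise[[1]]), function(i) {
  unlist(lapply(rowwise, "[[", i))
})

data[, c("d", "e")] <- colwise
str(data)
```

Here d stays integer and e stays double, because the two columns are never combined into a single vector or matrix.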

Related

How do I use tidyverse select when the column is input as a character object?

I'm trying to create a function which selects columns based on the input to a function:
f <- function(string) {
quosure <- quo(!!sym(string))
dplyr::select(data, !!quosure)
}
temp <- f("id") # returns " Error in !quosure : invalid argument type"
Strangely, this very similar looking code seems to work.
g <- function(string) {
quosure <- quo(!!sym(string))
dplyr::pull(data, !!quosure)
}
temp <- g("id") # Works fine
What is the difference between the first and the second function which means that the first fails and the second works?
It works fine for me with dplyr version '0.8.0.1'.
library(dplyr)
packageVersion("dplyr")
'0.8.0.1'
data <- data.frame(id= 1:10, othervariable= 11:20)
f <- function(string) {
quosure <- quo(!!sym(string))
dplyr::select(data, !!quosure)
}
temp <- f("id")
temp
id
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
And if you need to select (multiple) column(s) from a dataframe with a vector of characters I would rather do
df <- data.frame(id= 1:10, othervariable= 11:20, x= 21:30)
f <- function(data, string) {
data[ , string]
}
temp <- f(data= df, string= c("id", "x"))
temp
id x
1 1 21
2 2 22
3 3 23
4 4 24
5 5 25
6 6 26
7 7 27
8 8 28
9 9 29
10 10 30
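One caveat with the base R version above: when string holds a single name, data[ , string] returns a vector rather than a data frame. A minimal sketch that always returns a data frame, assuming that is the desired behaviour (select_cols is a hypothetical helper name):

```r
df <- data.frame(id = 1:10, othervariable = 11:20, x = 21:30)

# drop = FALSE keeps the result a data frame even for a single column
select_cols <- function(data, cols) {
  data[, cols, drop = FALSE]
}

one <- select_cols(df, "id")          # one-column data frame
two <- select_cols(df, c("id", "x"))  # two-column data frame
```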

Using lapply to change column names of a list of data frames

I'm trying to use lapply on a list of data frames; but failing at passing the parameters correctly (I think).
List of data frames:
df1 <- data.frame(A = 1:10, B= 11:20)
df2 <- data.frame(A = 21:30, B = 31:40)
listDF <- list(df1, df2,df3) #multiple data frames w. way less columns than the length of vector todos
Vector with columns names:
todos <-c('col1','col2', ......'colN')
I'd like to change the column names using lapply:
lapply (listDF, function(x) { colnames(x)[2:length(x)] <-todos[1:length(x)-1] } )
but this doesn't change the names at all. Am I not passing the data frames themselves, but something else? I just want to change names, not to return the result to a new object.
Thanks in advance, p.
You can also use setNames if you want to replace all columns
df1 <- data.frame(A = 1:10, B= 11:20)
df2 <- data.frame(A = 21:30, B = 31:40)
listDF <- list(df1, df2)
new_col_name <- c("C", "D")
lapply(listDF, setNames, nm = new_col_name)
## [[1]]
## C D
## 1 1 11
## 2 2 12
## 3 3 13
## 4 4 14
## 5 5 15
## 6 6 16
## 7 7 17
## 8 8 18
## 9 9 19
## 10 10 20
## [[2]]
## C D
## 1 21 31
## 2 22 32
## 3 23 33
## 4 24 34
## 5 25 35
## 6 26 36
## 7 27 37
## 8 28 38
## 9 29 39
## 10 30 40
If you need to replace only a subset of column names, then you can use the solution of @Jogo
lapply(listDF, function(df) {
names(df)[-1] <- new_col_name[-ncol(df)]
df
})
A last point, in R there is a difference between a:b - 1 and a:(b - 1)
1:10 - 1
## [1] 0 1 2 3 4 5 6 7 8 9
1:(10 - 1)
## [1] 1 2 3 4 5 6 7 8 9
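To connect this to the index used in the question, todos[1:length(x)-1], here is a small sketch with a made-up three-element todos. The colon operator binds more tightly than binary minus, so the index starts at 0, and R silently drops a zero index when subsetting:

```r
todos <- c("col1", "col2", "col3")  # illustrative values only
n <- 3

idx_a <- 1:n - 1    # parsed as (1:n) - 1, i.e. 0 1 2
idx_b <- 1:(n - 1)  # 1 2

a <- todos[idx_a]   # the 0 is ignored, leaving elements 1 and 2
b <- todos[idx_b]

# seq_len() avoids both the precedence trap and the 1:0 trap when n is 1
safe <- todos[seq_len(n - 1)]
```

So the question's index happens to pick the intended names. The names are lost for a different reason: the anonymous function never returns the modified data frame.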
EDIT
If you want to change the column names of the data frames in the global environment from a list, you can use list2env, but I'm not sure it is the best way to achieve what you want. You also need to use a named list, where each name matches the name of the data frame it should replace.
listDF <- list(df1 = df1, df2 = df2)
new_col_name <- c("C", "D")
listDF <- lapply(listDF, function(df) {
names(df)[-1] <- new_col_name[-ncol(df)]
df
})
list2env(listDF, envir = .GlobalEnv)
str(df1)
## 'data.frame': 10 obs. of 2 variables:
## $ A: int 1 2 3 4 5 6 7 8 9 10
## $ C: int 11 12 13 14 15 16 17 18 19 20
Try this:
lapply (listDF, function(x) {
names(x)[-1] <- todos[-length(x)]
x
})
You will get a new list with changed data frames. If you want to manipulate listDF directly:
for (i in 1:length(listDF)) names(listDF[[i]])[-1] <- todos[-length(listDF[[i]])]
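To illustrate why the lapply call in the question appears to do nothing, here is a small sketch with a single made-up data frame. The function works on a copy of each element, and the value of an assignment expression is its right-hand side, so without a final x the function returns the new names instead of the data frame:

```r
df1 <- data.frame(A = 1:3, B = 4:6)
listDF <- list(df1)

# last expression is the assignment, so each result is just "C"
bad <- lapply(listDF, function(x) { names(x)[-1] <- "C" })

# returning x gives back the renamed copy
good <- lapply(listDF, function(x) { names(x)[-1] <- "C"; x })

names(good[[1]])
names(listDF[[1]])  # the original list is untouched either way
```

To keep the change, assign the result back, for example listDF <- good.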
I was not able to get the code used in these answers to work. I found some code from another forum which did work. This will assign the new column names into each dataframe, the other methods created a copy of the dataframes. For anyone else here is the code.
# Create some dataframes
df1 <- data.frame(A = 1:10, B= 11:20)
df2 <- data.frame(A = 21:30, B = 31:40)
listDF <- c("df1", "df2") #Notice this is NOT a list
new_col_name <- c("C", "D") #What do you want the new columns to be named?
# Assign the new column names to each dataframe in "listDF"
for(df in listDF) {
df.tmp <- get(df)
names(df.tmp) <- new_col_name
assign(df, df.tmp)
}

data.frame() : make object's string value the object name to use for columns [duplicate]

Is there a way in R to have a variable evaluated as a column name when creating a data frame (or in similar situations like using cbind)?
For example
a <- "mycol";
d <- data.frame(a=1:10)
this creates a data frame with one column named a rather than mycol.
This is less important than the case that would help me remove quite a few lines from my code:
a <- "mycol";
d <- cbind(some.dataframe, a=some.sequence)
My current code has the tortured:
names(d)[dim(d)[2]] <- a;
which is aesthetically barftastic.
> d <- setNames( data.frame(a=1:10), a)
> d
mycol
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
Is structure(data.frame(1:10),names="mycol") aesthetically pleasing to you? :-)
Just use colnames after creation, e.g.:
a <- "mycolA"
b<- "mycolB"
d <- data.frame(a=1:10, b=rnorm(1:10))
colnames(d)<-c(a,b)
d
mycolA mycolB
1 1 -1.5873866
2 2 -0.4195322
3 3 -0.9511075
4 4 0.2259858
5 5 -0.6619433
6 6 3.4669774
7 7 0.4087541
8 8 -0.3891437
9 9 -1.6163175
10 10 0.7642909
Simple solution:
df <- data.frame(1:5, letters[1:5])
logics <- c(T,T,F,F,T)
cities <- c("Warsaw","London","Paris","NY","Tokio")
m <- as.matrix(logics)
m2 <- as.matrix(cities)
name <- "MyCities"
colnames(m) <- deparse(substitute(logics))
colnames(m2) <- eval(name)
df<-cbind(df,m)
cbind(df,m2)
X1.5 letters.1.5. logics MyCities
1 1 a TRUE Warsaw
2 2 b TRUE London
3 3 c FALSE Paris
4 4 d FALSE NY
5 5 e TRUE Tokio
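For the cbind case in the question, two further possibilities are sketched below. The contents of some.dataframe and some.sequence are made up for illustration, since the question does not show them:

```r
a <- "mycol"
some.dataframe <- data.frame(id = 1:3)  # hypothetical stand-in
some.sequence <- c(10, 20, 30)          # hypothetical stand-in

# option 1: [[<- takes the column name from the string in a
d <- some.dataframe
d[[a]] <- some.sequence

# option 2: give cbind a one-element list that is already named
d2 <- cbind(some.dataframe, setNames(list(some.sequence), a))

names(d)
names(d2)
```

Both forms avoid renaming the last column after the fact, which is what the names(d)[dim(d)[2]] <- a line in the question does.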

retain improper name with indexing

I need to name the columns of a data.frame with duplicate names. Inside data.frame you can use check.names = FALSE to do the naughty name deed, but if you then index the result you lose the naughty names. I want to retain those names. Below is an example with the output I get and the output I'd like to get:
x <- data.frame(b= 4:6, a =6:8, a =6:8, check.names = FALSE)
x[, -1]
I get:
a a.1
1 6 6
2 7 7
3 8 8
I'd like:
a a
1 6 6
2 7 7
3 8 8
How about this:
subdf <- function(df, ii) {
do.call("data.frame", c(as.list(df)[ii], check.names=FALSE))
}
subdf(x, -1)
# a a
# 1 6 6
# 2 7 7
# 3 8 8
subdf(x, 2:3)
# a a
# 1 6 6
# 2 7 7
# 3 8 8
Here's an ugly solution
> tmp <- data.frame(b=4:6, a=6:8, a=6:8, check.names=FALSE)
> setNames(tmp[, -1], names(tmp)[-1])
a a
1 6 6
2 7 7
3 8 8
Looking at the code for [.data.frame gives this as part of the code
if (anyDuplicated(cols))
names(y) <- make.unique(cols)
and I couldn't see anything in the code that would allow one to skip that check. So it looks like we'll just have to write our own function. It's not very safe though and I'm sure a much better version could be created...
dropCols <- function(x, cols){
nm <- colnames(x)
x <- x[, -cols]
colnames(x) <- nm[-cols]
x
}
x <- data.frame(b= 4:6, a =6:8, a =6:8, check.names = FALSE)
#x[, -1]
dropCols(x, 1)
# a a
#1 6 6
#2 7 7
#3 8 8
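One of the unsafe spots in dropCols is that x[, -cols] collapses to a plain vector when only one column remains, after which colnames<- fails. A hedged sketch of a slightly more defensive variant (keep_names_subset is a hypothetical name, and j is assumed to be a numeric or logical index, not a character one):

```r
keep_names_subset <- function(df, j) {
  out <- df[, j, drop = FALSE]  # always stay a data frame
  names(out) <- names(df)[j]    # restore the original, possibly duplicated, names
  out
}

x <- data.frame(b = 4:6, a = 6:8, a = 6:8, check.names = FALSE)

both <- keep_names_subset(x, -1)  # two columns, both named "a"
one  <- keep_names_subset(x, 2)   # single column, still a data frame
```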
Per Dirk's tongue-in-cheek comment:
safe.data.frame <- function(dat, index) {
colnam <-colnames(dat)[index]
dat2 <- dat[, index]
colnames(dat2) <- colnam
dat2
}
safe.data.frame(x, -1)
I was hoping for something better :)
