I'm trying to call a vector "a" from a data frame "df" using a function. I know I could do this just fine with the following:
> df$a
[1] 1 2 3
But I'd like to use a function where both the data frame and vector names are input separately as arguments. This is the best that I've come up with:
show_vector <- function(data.set, column) {
data.set$column
}
But here's how it goes when I try it out:
> show_vector(df, a)
NULL
How could I change this function in order to successfully reference vector df$a where the names of both are input to a function as arguments?
It's actually possible to do this without passing the column name as a string; in other words, you can pass in the unquoted column name:
show_vector <- function(data.set, column) {
eval(substitute(column), envir = data.set)
}
Usage example:
df <- data.frame(a = 1:3, b = 4:6)
show_vector(df, b)
# 4 5 6
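For context on why the original attempt returned NULL: `$` does not evaluate its right-hand side, so data.set$column looks for a column literally named "column". Here is a minimal sketch contrasting that with `[[`, which does evaluate its argument. The function names show_vector_dollar and show_vector_bracket are made up for this illustration.

```r
df <- data.frame(a = 1:3, b = 4:6)

# `$` takes its right-hand side literally: this looks up a column named "column"
show_vector_dollar <- function(data.set, column) {
  data.set$column
}

# `[[` evaluates its argument, so a column name held in a variable works
show_vector_bracket <- function(data.set, column) {
  data.set[[column]]
}

res_dollar  <- show_vector_dollar(df, "a")   # NULL, even with a quoted name
res_bracket <- show_vector_bracket(df, "a")  # the integer vector 1 2 3
```

The `[[` form only accepts a quoted name; the eval(substitute()) approach is what allows the unquoted form.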
I've wondered about this kind of thing a lot in the past and haven't found an easy fix. The best I've come up with is this:
df <- data.frame(c(1, 2, 3), c(4, 5, 6))
colnames(df) <- c("A", "B")
test <- function(dataframe, columnName) {
return(dataframe[, match(columnName, colnames(dataframe))])
}
test(df, "A")
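Your code would work with the column name in quotes, i.e. show_vector(df, "a"), provided the function body uses data.set[[column]] instead of data.set$column; the $ operator does not evaluate its argument, so it returns NULL either way.
Other ways to do this: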
Using base functionality
func <- function(df, cname){
return(df[, grep(cname, colnames(df))])
}
Or even
func <- function(df, cname){
return(df[, cname])
}
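One caveat with the grep() version: grep() treats cname as a regular expression, so it can select more than one column. A small sketch, with made-up column names, comparing it with exact matching:

```r
df <- data.frame(a = 1:3, ab = 4:6, b = 7:9)

# grep() matches "a" anywhere in a name, so both "a" and "ab" are selected
by_grep <- df[, grep("a", colnames(df))]

# match() (or plain df[, "a"]) compares whole names only
by_match <- df[, match("a", colnames(df))]
```

Here by_grep is a two-column data frame, while by_match is the single vector for column a.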
You can use substitute() to capture the column name exactly as it was typed, then as.character() to turn it into a character string.
show_vector <- function(data.set, column) {
data.set[,as.character(substitute(column))]
}
Now let's take a look:
(dat=data.frame(a=1:3,b=4:6,c=10:12))
a b c
1 1 4 10
2 2 5 11
3 3 6 12
show_vector(dat,a)
[1] 1 2 3
show_vector(dat,"a")
[1] 1 2 3
It works.
We can also write a simpler version that only accepts a character string:
show_vector1 <- function(data.set, column) {
data.set[,column]
}
show_vector1(dat,"a")
[1] 1 2 3
However, this will not work if the column name is passed unquoted:
show_vector1(dat,a)
Error in `[.data.frame`(data.set, , column) : undefined columns selected
I am currently developing an application and I need to loop through the columns of the data frame. For instance, if the data frame has the columns
char_set <- data.frame(character(),character(),character(),character(),stringsAsFactors = FALSE)
names(char_set) <- c("a","b","c","d")
If the input is "a", then the next column name, "b", should be assigned to a variable, say promote.
It throws the error: Error in `[.data.frame`(char_set, i + 1) : undefined columns selected. Is there any solution?
char_name <- "a"
char_set <- data.frame(character(),character(),character(),character(),stringsAsFactors = FALSE)
names(char_set) <- c("a","b","c","d")
for (i in 1:ncol(char_set)) {
promote <- ifelse(names(char_set) == char_name,char_set[i+1], "-")
print(promote)
}
Thanks in advance!!!
This is actually quite interesting. I would suggest doing something along these lines:
char_name <- "a"
char_set <- data.frame(
a = 1:2,
b = 3:4,
c = 5:6,
d = 8:9,
stringsAsFactors = FALSE
)
res_dta <- data.frame(matrix(nrow = 2, ncol = 3))
for (i in wrapr::seqi(1, NCOL(char_set) - 1)) {
print(i)
if (names(char_set)[i] == char_name) {
res_dta[i] <- char_set[i + 1]
} else {
res_dta[i] <- char_set[i]
}
}
Results
char_set
a b c d
1 1 3 5 8
2 2 4 6 9
res_dta
X1 X2 X3
1 3 3 5
2 4 4 6
A few general points:
When looping through columns, be careful not to fall outside the data frame's dimensions: evaluating i + 1 at i = 4 asks for column 5, which is an error for a data frame with four columns. You can either stop one column early or break at a specific value of i.
I'm not sure I understood your request correctly: for the column named a you want to take the values of column b, and column b then stays as it was?
Broadly speaking, I think the names(char_set)[i] == char_name comparison needs more thought, but this answer gives you a start. Updating your post with the desired results would help in designing a solution.
The problem in your code is that you loop from 1 to the number of columns of char_set and then access char_set[i+1].
Thus, when i reaches its maximum value, char_set[i+1] returns an error because there is no column with that index.
You can try with this solution:
char_name <- "a"
idx <- which(names(char_set) == char_name) + 1  # position of the next column
# use <= so that the last column can still be returned as a "next" name
promote <- ifelse(idx <= ncol(char_set), names(char_set)[idx], "-")
promote
[1] "b"
char_name <- "d"
idx <- which(names(char_set) == char_name) + 1
promote <- ifelse(idx <= ncol(char_set), names(char_set)[idx], "-")
promote
[1] "-"
When char_name is "a", promote takes the name of the column that follows "a" in char_set.
When char_name is "d", there is no column after it in char_set, so promote falls back to "-".
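The same lookup can be wrapped in a small function. This is only a sketch: next_column_name is a hypothetical helper name, and it assumes column names are unique.

```r
# Hypothetical helper: name of the column that follows `char_name`, or "-" if none
next_column_name <- function(data, char_name) {
  idx <- match(char_name, names(data))        # position of the column, NA if absent
  if (is.na(idx) || idx == ncol(data)) {
    return("-")                               # unknown name or last column
  }
  names(data)[idx + 1]
}

char_set <- data.frame(a = character(), b = character(),
                       c = character(), d = character(),
                       stringsAsFactors = FALSE)

next_column_name(char_set, "a")  # "b"
next_column_name(char_set, "d")  # "-"
```

Using match() and a plain if avoids the loop entirely and also covers the case where the name is not a column at all.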
It is of course possible to store functions in a list and call them from there.
It is also possible to name the list entries for easier access later.
Now I need the list item name to be a regular expression like this:
funcList <- list("^\\+[0-9]{1,3}$"=lead, "^\\-[0-9]{1,3}$"=lag)
a <- funcList$"+12"(a,12) # this will fire function "lead"
a <- funcList$"-4"(a,-4) # this will fire function "lag"
a <- funcList$"^\\+[0-9]{1,3}$"(a,12) # this works of course but is not what I want...
Of course this is not working correctly and I am getting the error "Error: attempt to apply non-function" because it is not used as regex but as a normal string value.
Is it possible to do what I need?
You could use the names of the list as patterns for grepl (lead and lag here are the dplyr functions):
library(dplyr)
funcList <- list("^\\+[0-9]{1,3}$"=lead, "^\\-[0-9]{1,3}$"=lag)
f1 <- funcList[sapply(names(funcList), function(x) grepl(x,"+12"))][[1]]
f2 <- funcList[sapply(names(funcList), function(x) grepl(x,"-4"))][[1]]
> f1(seq(1,10))
[1] 2 3 4 5 6 7 8 9 10 NA
> f2(seq(1,10))
[1] NA 1 2 3 4 5 6 7 8 9
I think you can map strings like "+4" and "-12" to lead/lag more straightforwardly, like this:
library(dplyr) # provides lead(), lag(), mutate() and %>%
set.seed(123)
df = data.frame(
x = sample(1:20, 10)
)
shifted = function(x, shift) {
direction = substr(shift, 1, 1)
amount = as.integer(substr(shift, 2, nchar(shift)))
if (direction == "+") {
return(lead(x, amount))
} else {
return(lag(x, amount))
}
}
df %>%
mutate(
plus4 = shifted(x, "+4"),
minus3 = shifted(x, "-3")
)
You could use regex within the shifted function if you need to do more validation of the "+4" strings, but I prefer not to go for complicated regexes unless they're definitely needed.
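If validation is wanted, a single anchored regex check is probably enough. The sketch below is self-contained and does not use dplyr: shift_vec is a hypothetical base-R stand-in for lead()/lag(), and shifted_checked is a made-up name for the validating wrapper.

```r
# Hypothetical base-R stand-in for dplyr's lead()/lag(): shift x by n positions,
# padding with NA (n > 0 behaves like lead, n < 0 behaves like lag)
shift_vec <- function(x, n) {
  len <- length(x)
  k <- min(abs(n), len)
  pad <- rep(NA, k)
  if (n >= 0) {
    c(x[seq_len(len - k) + k], pad)
  } else {
    c(pad, x[seq_len(len - k)])
  }
}

# Validate strings such as "+4" or "-12" with a regex before shifting
shifted_checked <- function(x, shift) {
  if (!grepl("^[+-][0-9]{1,3}$", shift)) {
    stop("shift must look like '+4' or '-12', got: ", shift)
  }
  shift_vec(x, as.integer(shift))  # as.integer() accepts a leading sign
}

shifted_checked(1:5, "+2")  # 3 4 5 NA NA
shifted_checked(1:5, "-1")  # NA 1 2 3 4
```

The pattern mirrors the ones in the question, merged into one character class for the sign.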
I am having a problem with get() in R.
I have a set of data.frames with a common structure in my environment. I want to loop through these data frames and change the name of the 2nd column so that the name of the 2nd column contains a prefix from the 1st column.
For example, if column 1 = A_cat and column 2 is dog, I want column 2 to be changed to A_dog.
Below is an example of the R code I am using:
df <- data.frame('A_cat'= 1:10 , 'dog' = 11:20)
for( element in grep('^df$', names(environment()), value=TRUE) ) {
colnames(get(element))[2] <- paste(strsplit(colnames(get(element)) [1], '_')[[1]][1],
colnames(get(element))[2], sep='_')
}
The arguments within the for loop, on either side of the assignment operator, both give the expected result if I run them separately but when run together produce the following error.
Error in colnames(get(element))[2] <- paste(strsplit(colnames(get(element))[1], :
could not find function "get<-"
Any help with this problem would be greatly appreciated.
This does the same thing as the code in the question without using get:
df <- data.frame('A_cat'= 1:10 , 'dog' = 11:20)
e <- environment() ##
df.names <- grep("^df$", names(e), value = TRUE)
# nm is the current data frame name and nms are its column names
for(nm in df.names) {
nms <- names(e[[nm]])
names(e[[nm]])[2] <- paste0(sub("_.*", "_", nms[1]), nms[2])
}
giving:
> df
A_cat A_dog
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
Keeping the data.frames in a named list as suggested in a comment to the question might be even better. For example, if instead of keeping the data.frames in an environment they were in a list called e
e <- list(df = df)
then omit the line marked ## and the rest works as is.
Here would be one way to accomplish this goal if the data.frames have systematic names (here, df1 df2 df3, etc) and the prefix ends with "_" as in the example:
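# suggested by @roland: roll them up in a list
myDfList <- mget(ls(pattern="^df"))
# change names (this modifies the copies in the list, not the original objects)
for(dfName in names(myDfList)) {
# keep everything up to and including the first "_" of column 1 as the prefix
prefix <- sub("^([^_]*_).*", "\\1", names(myDfList[[dfName]])[1])
names(myDfList[[dfName]])[2] <- paste0(prefix, names(myDfList[[dfName]])[2])
}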
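As for the error in the question itself: colnames(get(element))[2] <- ... is a nested replacement call, and R expands it into a call to a function named `get<-`, which does not exist. A minimal sketch of the usual workaround, assuming the data frames live in the current environment: fetch a copy with get(), modify it, and write it back with assign().

```r
df <- data.frame(A_cat = 1:10, dog = 11:20)

for (element in "df") {
  tmp <- get(element)                         # copy of the data frame
  prefix <- sub("_.*", "_", names(tmp)[1])    # "A_cat" -> "A_"
  names(tmp)[2] <- paste0(prefix, names(tmp)[2])
  assign(element, tmp)                        # write the modified copy back
}

names(df)  # "A_cat" "A_dog"
```

Keeping the data frames in a list, as suggested above, avoids get() and assign() altogether.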
I am struggling to make my apply() work: I have two dataframes:
from <- c(1,2,3)
to <- c(2,3,4)
df1 <- data.frame(from, to)
long <-c(9,9.2,9.4,9.6)
lat <- c(45,45.2,45.4,45.6)
id <- c(1,2,3,4)
df2 <- data.frame(long, lat, id)
Now I want something like this:
myFunction <- function(arg){
# >>> How do I access arg$from and arg$to? <<<
}
apply(df1,1,myFunction)
In myFunction I need to make some calculations and return a value for each from-to pair. I don't understand how to access parts of the arg, since arg[0] gives me numeric(0) and arg$from just crashes.
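The problem is that apply(...) requires a matrix or array as its first argument. If you pass a data frame, it is coerced to a matrix, and each row is then handed to your function as a plain vector. R indexing is 1-based, so the first element is arg[1], not arg[0] (and the upper-left element of a matrix is [1,1], not [0,0]). Also, the $ notation does not work on matrices or atomic vectors, which is why arg$from fails.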
So,
f <- function(x) {
from <- x[1]
to <- x[2]
# do stuff with from and to...
}
apply(df1,1,f)
would work.
One other thing to watch out for is that if your dataframe has (other) columns that have character strings, the conversion will make everything character (including the numbers!). This is because, by definition, all elements of a matrix must have the same data type. Your example does not have that problem, though.
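Because the column names survive the conversion, each row can also be indexed by name. A short sketch using the data from the question; lon_diff and its calculation are illustrative assumptions only, since the question does not say what should be computed for each pair:

```r
df1 <- data.frame(from = c(1, 2, 3), to = c(2, 3, 4))
df2 <- data.frame(long = c(9, 9.2, 9.4, 9.6),
                  lat  = c(45, 45.2, 45.4, 45.6),
                  id   = c(1, 2, 3, 4))

# Each row arrives as a named numeric vector, so index by name with [[ ]]
lon_diff <- function(row) {
  from_id <- row[["from"]]
  to_id   <- row[["to"]]
  # Illustrative calculation only: difference in longitude between the two ids
  df2$long[df2$id == to_id] - df2$long[df2$id == from_id]
}

result <- apply(df1, 1, lon_diff)  # one value per from-to pair
```

With this data each pair gives a difference of roughly 0.2, up to floating-point rounding.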
Try mapply(). It's a multivariate version of sapply(). For example:
> myFunction <- function(arg1, arg2){
+ return(sum(arg1, arg2))
+ }
>
> mapply(myFunction, df1$from, df1$to)
[1] 3 5 7
You can also use it to make a new variable in your data frame.
> df1$newvar <- mapply(myFunction, df1$from, df1$to)
> df1
from to newvar
1 1 2 3
2 2 3 5
3 3 4 7
In my code, I am filling the columns of a dataframe with vectors, as so:
df1[columnNum] <- barWidth
This works fine, except for one thing: I want the name of the vector variable (barWidth above) to be retained as the column header, one column at a time. Furthermore, I do not wish to use cbind. This slows the execution of my code down considerably. Consequently, I am using a pre-allocated dataframe.
Can this be done in the vector-to-column assignment? If not, then how do I change it after the fact? I can't find the right syntax to do this with colnames().
TIA
It's being done by the `[<-.data.frame` function. It could conceivably be replaced by one that looked at the name of the argument but it's such a fundamental function I would be hesitant. Furthermore there appears to be an aversion to that practice signaled by this code at the top of the function definition:
> `[<-.data.frame`
function (x, i, j, value)
{
if (!all(names(sys.call()) %in% c("", "value")))
warning("named arguments are discouraged")
nA <- nargs()
if (nA == 4L) {
<snipped rest of rather long definition>
I don't know why that is there, but it is. Maybe you should either be thinking about using names<- after the column assignment, or using this method:
> dfrm["barWidth"] <- barWidth
> dfrm
a V2 barWidth
1 a 1 1
2 b 2 2
3 c 3 3
4 d 4 4
This can be generalized to a list of new columns:
dfrm <- data.frame(a=letters[1:4])
barWidth <- 1:4
newcols <- list(barWidth=barWidth, bw2 =barWidth)
dfrm[names(newcols)] <- newcols
dfrm
a barWidth bw2
1 a 1 1
2 b 2 2
3 c 3 3
4 d 4 4
If you have the list of names of vectors you want to apply you could do:
namevec <- c(...,"barWidth"...,)
columnNums <- c(...,10,...)
df1[columnNums[i]] <- get(namevec[i])
names(df1)[columnNums[i]] <- namevec[i]
or even
columnNums <- c(barWidth=4,...)
for (i in seq_along(columnNums)) {
df1[columnNums[i]] <- get(names(columnNums)[i])
}
names(df1)[columnNums] <- names(columnNums)
but the deeper question would be where this set of vectors is coming from in the first place: could you have them in a list all along?
I'd simply use cbind():
df1 <- cbind( df1, barWidth )
which retains the name. It will, however, end up as the last column in df1.
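If the goal is for a pre-allocated column to pick up the variable's name automatically, one option is a small wrapper that captures the argument's name with deparse(substitute()). This is only a sketch: set_named_column is a hypothetical helper, not a base R function, and it assumes the vector is passed as a plain variable name rather than an expression.

```r
# Hypothetical helper: put `vec` into column `pos` of `data` and
# name that column after the variable that was passed in
set_named_column <- function(data, pos, vec) {
  col_name <- deparse(substitute(vec))  # e.g. "barWidth"
  data[[pos]] <- vec
  names(data)[pos] <- col_name
  data
}

# Pre-allocated data frame with placeholder columns X1, X2, X3
df1 <- data.frame(matrix(NA_real_, nrow = 4, ncol = 3))
barWidth <- c(0.5, 1, 1.5, 2)

df1 <- set_named_column(df1, 2, barWidth)
names(df1)  # "X1" "barWidth" "X3"
```

Whether this is any faster than cbind() in a tight loop is not something the sketch establishes; it only shows how the name can be retained.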