ave function in R: First argument a vector

I'm trying to use the following code in R:
ID=seq(1,11)
g=c(1,2,3,1,1,2,3,4,4,1,3)
x <- sample(11)
d <- data.frame(ID,g, x)
Ranking_Categoria<-function(d,var,category)
{
d$rank<-ave(d$var,d$category,FUN=rank)
return(d)
}
and I get the following error message:
Error in split.default(x, g) : first argument must be a vector.
The variables var and category (character strings) name columns of the data frame d that the user needs to specify to get the desired result. As you can see, I need to refer to these names when I call ave().

You need to use [[ to get the var and category columns by name:
Ranking_Categoria<-function(d,var,category)
{
d$rank<-ave(d[[var]],d[[category]],FUN=rank)
return(d)
}
... because d$var does not evaluate var: it looks for a column literally named "var", and there is none.
UPDATE
> Ranking_Categoria(d, "x", "g")
ID g x rank
1 1 1 10 3
2 2 2 9 2
3 3 3 4 1
4 4 1 11 4
5 5 1 1 1
6 6 2 8 1
7 7 3 6 2
8 8 4 2 1
9 9 4 5 2
10 10 1 3 2
11 11 3 7 3
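Because x comes from sample(11), the ranks above will differ from run to run. A minimal reproducible sketch, assuming x is fixed to the values shown in the printed output; the last lines also show why the original d$var version failed.

```r
# x is fixed to the values in the printed output instead of sample(11)
d <- data.frame(ID = 1:11,
                g  = c(1, 2, 3, 1, 1, 2, 3, 4, 4, 1, 3),
                x  = c(10, 9, 4, 11, 1, 8, 6, 2, 5, 3, 7))

Ranking_Categoria <- function(d, var, category) {
  # [[ evaluates its index, so var and category can be strings
  d$rank <- ave(d[[var]], d[[category]], FUN = rank)
  d
}

res <- Ranking_Categoria(d, "x", "g")
print(res$rank)  # 3 2 1 4 1 1 2 1 2 2 3

# $ does not evaluate its argument: d$var looks for a column named "var",
# finds none, and returns NULL, which is what ave() then choked on
var <- "x"
print(is.null(d$var))  # TRUE
```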

The best solution would be not to use names at all:
Ranking_Categoria<-function(d,var,category)
{
d$rank<-ave(var,category,FUN=rank)
return(d)
}
Then call it as
Ranking_Categoria(d,d$x,d$g)
The reason the function in your question didn't work as you expected is partly that R's syntax and DWIM-ness for string manipulation suck. Here's a hacky, fragile solution using eval and parse:
Ranking_Categoria<-function(d,var,category)
{
string=paste('d$rank<-ave(d$',var,',d$',category,',FUN=rank)',sep="")
eval(parse(text=string))
return(d)
}
However, you still have to call it as
Ranking_Categoria(d,"x","g")
And if you already have objects with the names of x and g, then may the gods help you if you try to do Ranking_Categoria(d,x,g)... Crap like this is why I've gone from using Perl and R equally to sticking with Perl (my first and native programming language) and using R only when necessary.
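If you want Ranking_Categoria(d, x, g) with unquoted column names to work, one option (a sketch, not taken from the answers above) is to capture each argument unevaluated with substitute() and turn it into a column name with deparse(). Ranking_Categoria_nse is a hypothetical name used only for illustration.

```r
d <- data.frame(ID = 1:11,
                g  = c(1, 2, 3, 1, 1, 2, 3, 4, 4, 1, 3),
                x  = c(10, 9, 4, 11, 1, 8, 6, 2, 5, 3, 7))

# Hypothetical variant of Ranking_Categoria that takes bare column names
Ranking_Categoria_nse <- function(d, var, category) {
  # substitute() returns the symbol the caller typed (e.g. x), unevaluated;
  # deparse() turns that symbol into the string "x"
  var <- deparse(substitute(var))
  category <- deparse(substitute(category))
  d$rank <- ave(d[[var]], d[[category]], FUN = rank)
  d
}

# Objects called x and g in the calling environment are never evaluated,
# so they cannot interfere
x <- "something unrelated"
g <- 42
res <- Ranking_Categoria_nse(d, x, g)
print(res$rank)
```

The trade-off is that quoted names stop working: deparse(substitute("x")) yields the string "\"x\"", which is not a column name.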

Related

Apply function that return data.frame/tibble on vector/data.frame column and bind results

I have a function that fetches some data from a database. It takes a single parameter and returns a data.frame. I would like to take an input vector of these parameters and pipe it to map or a similar function that takes each element and returns the db results. The results can differ in number of rows, but the columns are always the same. How do I do this without looping and row-binding (for i in ..)?
I tried the following route:
library(dplyr)  # for %>% and select()
library(purrr)  # for map_dfr()
myfuncSingleRow <- function(nbr) {
  data.frame(a = nbr, b = nbr^2, c = nbr^3)
}
myfuncMultipleRow <- function(nbr) {
  data.frame(a = rep(nbr, 3), b = rep(nbr^2, 3), c = rep(nbr^3, 3))
}
a <- data.frame(count = c(1, 2, 3))
myfuncSingleRow(2)
myfuncMultipleRow(2)
a %>% select(count) %>% map_dfr(.f = myfuncSingleRow)    # output as expected
a %>% select(count) %>% map_dfr(.f = myfuncMultipleRow)  # output not as expected
Now this does not work as intended either. With myfuncMultipleRow, I was expecting the first 3 rows to be equal, the next 3 to be equal, and likewise for the final 3. Output of the myfuncMultipleRow example:
Getting
a b c
1 1 1 1
2 2 4 8
3 3 9 27
4 1 1 1
5 2 4 8
6 3 9 27
7 1 1 1
8 2 4 8
9 3 9 27
Wanting:
a b c
1 1 1 1
2 1 1 1
3 1 1 1
4 2 4 8
5 2 4 8
6 2 4 8
7 3 9 27
8 3 9 27
9 3 9 27
As usual, I am probably not using the functions correctly, but I am a bit stuck here and do not want to resort to the old loop and rbind, which would probably be a performance bottleneck. Any takers?
EDIT: As pointed out, the "each" argument of "rep" does fix this example, but it does not solve the main issue. If map iterated and called the function once per element, then using the parameters "each" and "times" of "rep" should yield the same result. The function passed to map is not vectorized; it assumes a single parameter of length 1.
The solution needs to do the equivalent of:
res<-data.frame()
for(i in a$count) res<-rbind(res,myfuncMultipleRow(i))
So, after looking at the latest purrr 0.3.0 (I was on an older version), map_depth() pointed me in the right direction:
a %>% select(count)%>% map_depth(.depth=2,.f=myfuncMultipleRow) %>% map_dfr(.f=bind_rows)
Alternatively, dropping map_depth() and bind_rows() and nesting map_dfr() calls instead:
a %>% select(count)%>% map_dfr(~map_dfr(.,myfuncMultipleRow))
a %>% select(count)%>% map_dfr(.f=function(x) map_dfr(x,.f=myfuncMultipleRow))
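The unexpected "Getting" output arises because a data frame is a list of columns, so map_dfr() over a %>% select(count) calls myfuncMultipleRow once with the whole column c(1, 2, 3). A minimal base R sketch that iterates over the elements of the column instead and binds the pieces in one call:

```r
myfuncMultipleRow <- function(nbr) {
  data.frame(a = rep(nbr, 3), b = rep(nbr^2, 3), c = rep(nbr^3, 3))
}
a <- data.frame(count = c(1, 2, 3))

# Iterate over the elements of the column, not over the data frame,
# so the function is called once per value: f(1), f(2), f(3)
pieces <- lapply(a$count, myfuncMultipleRow)

# Bind all pieces in a single call instead of growing res in a loop
res <- do.call(rbind, pieces)
print(res)
```

With purrr, the same idea is map_dfr(a$count, myfuncMultipleRow), or a %>% pull(count) %>% map_dfr(myfuncMultipleRow); in purrr 1.0.0 and later, map_dfr() is marked superseded in favour of map() followed by list_rbind().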

Using Strings to Identify Sequence of Column Names in R

I am currently trying to use predefined strings to identify multiple column names in R.
To be more explicit, I am using the ave function to create identification variables for subgroups of a data frame. The twist is that I want the identification variables to be flexible, so that I can just pass them as a generic string.
A sample code would be:
ids = with(df,ave(rep(1,nrow(df)),subcolumn1,subcolumn2,subcolumn3,FUN=seq_along))
I would like to run this code in the following fashion (code below does not work as expected):
subColumnsString = c("subcolumn1","subcolumn2","subcolumn3")
ids = with(df,ave(rep(1,nrow(df)),subColumnsString ,FUN=seq_along))
I tried something with eval, but still did not work:
subColumnsString = c("subcolumn1","subcolumn2","subcolumn3")
ids = with(df,ave(rep(1,nrow(df)),eval(parse(text=subColumnsString)),FUN=seq_along))
Any ideas?
Thanks.
EDIT: Working code example of what I want:
df = mtcars
id_names = c("vs","am")
idDF_correct = transform(df,idItem = as.numeric(interaction(vs,am)))
idDF_wrong = cbind(df,ave(rep(1,nrow(df)),df[id_names],FUN=seq_along))
Note how in idDF_correct, the unique combinations are correctly mapped into unique values of idItem. In idDF_wrong this is not the case.
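For the group identifier itself (as opposed to a within-group counter), interaction() can take the columns selected by a character vector directly, because it also accepts a single list of factors and a data frame is a list. A minimal sketch, reusing the names from the EDIT above; idDF_generic is a hypothetical name:

```r
df <- mtcars
id_names <- c("vs", "am")

# interaction() also accepts a single list of factors, and df[id_names]
# is a data frame, i.e. a list of the selected columns
idDF_generic <- transform(df, idItem = as.numeric(interaction(df[id_names])))
idDF_correct <- transform(df, idItem = as.numeric(interaction(vs, am)))

print(identical(idDF_generic$idItem, idDF_correct$idItem))  # TRUE
```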
I think this achieves what you requested. Here I use the mtcars dataset that ships with R:
subColumnsString <- c("cyl","gear")
ids = with(mtcars, ave(rep(1,nrow(mtcars)), mtcars[subColumnsString], FUN=seq_along))
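Just index your data.frame with the subcolumn names. This returns a list (a data frame is a list of columns), which works naturally with ave, because ave passes its grouping arguments to interaction(), and interaction() accepts a single list of factors.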
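A quick sketch to check that passing the selected columns as one data frame groups exactly like passing each column separately (cols, by_list and by_args are illustrative names):

```r
cols <- c("cyl", "gear")
n <- nrow(mtcars)

# Grouping by a data frame of columns vs. by the columns as separate arguments
by_list <- ave(rep(1, n), mtcars[cols], FUN = seq_along)
by_args <- ave(rep(1, n), mtcars$cyl, mtcars$gear, FUN = seq_along)

print(identical(by_list, by_args))  # TRUE
```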
EDIT
ids = ave(rep(1,nrow(mtcars)), mtcars[subColumnsString], FUN=seq_along)
You can omit the with and just call plain ol' ave, as G. Grothendieck stated, and you should also use their answer, as it is much more general.
This defines a function whose arguments are:
data, the input data frame
by, a character vector of column names in data
fun, a function to use in ave
Code--
Ave <- function(data, by, fun = seq_along) {
do.call(function(...) ave(rep(1, nrow(data)), ..., FUN = fun), data[by])
}
# test
Ave(CO2, c("Plant", "Treatment"), seq_along)
giving:
[1] 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3
[39] 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6
[77] 7 1 2 3 4 5 6 7

Arguments for Subset within a function in R colon v. greater or equal to

Suppose I have the following data.
x<- c(1,2, 3,4,5,1,3,8,2)
y<- c(4,2, 5,6,7,6,7,8,9)
data<-cbind(x,y)
x y
1 1 4
2 2 2
3 3 5
4 4 6
5 5 7
6 1 6
7 3 7
8 8 8
9 2 9
Now, if I subset this data to select only the observations with "x" between 1 and 3, I can do:
s1<- subset(data, x>=1 & x<=3)
and obtain my desired output:
x y
1 1 4
2 2 2
3 3 5
4 1 6
5 3 7
6 2 9
However, if I subset using the colon operator, I obtain a different result:
s2<- subset(data, x==1:3)
x y
1 1 4
2 2 2
3 3 5
This time it only includes the first three rows, in which "x" happens to be 1, 2 and 3. Why?
I would like to use the ":" operator because I am writing a function in which the user inputs a range of values over which she wants the average of the "y" variable calculated. I would prefer that users can use the ":" operator to pass this argument to the subset function inside my function, but I don't understand why subsetting with ":" gives different results.
I'd appreciate any suggestions in this regard.
You can use %in% instead of ==
subset(data, x %in% 1:3)
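In general, when checking whether each element of one vector occurs in another vector of a different length, %in% is the operator to use. With ==, the shorter vector is recycled: here x == 1:3 compares x position by position with 1 2 3 1 2 3 1 2 3, so only rows 1 to 3 match. There are cases where we can take advantage of recycling (it can fail too), when the length of the longer vector is a multiple of that of the shorter one. Some examples with descriptions are here.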
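A small sketch contrasting the two operators on the question's data, plus a hypothetical helper (mean_y_in_range, not from the question) for the stated goal of averaging y over a user-supplied range; it uses a data frame rather than the cbind() matrix:

```r
x <- c(1, 2, 3, 4, 5, 1, 3, 8, 2)
y <- c(4, 2, 5, 6, 7, 6, 7, 8, 9)
dat <- data.frame(x, y)

# `==` recycles 1:3 to length 9 and compares position by position
print(which(x == 1:3))     # 1 2 3

# %in% asks, for each element of x, whether it occurs anywhere in 1:3
print(which(x %in% 1:3))   # 1 2 3 6 7 9

# Hypothetical helper: mean of y over the rows whose x is in `values`
mean_y_in_range <- function(data, values) {
  mean(data$y[data$x %in% values])
}
print(mean_y_in_range(dat, 1:3))  # 5.5
```

Because %in% does not care about the length or order of `values`, users can pass 1:3, c(3, 1) or any other set.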

Passing a variable name to a function in R

I've noticed that quite a few packages allow you to pass symbol names that may not even be valid in the context where the function is called. I'm wondering how this works and how I can use it in my own code?
Here is an example with ggplot2:
a <- data.frame(x=1:10,y=1:10)
library(ggplot2)
qplot(data=a,x=x,y=y)
x and y don't exist in my namespace, but ggplot understands that they are part of the data frame and postpones their evaluation to a context in which they are valid. I've tried doing the same thing:
b <- function(data,name) { within(data,print(name)) }
b(a,x)
However, this fails miserably:
Error in print(name) : object 'x' not found
What am I doing wrong? How does this work?
Note: this is not a duplicate of Pass variable name to a function in r
I've recently discovered what I think is a better approach to passing variable names.
a <- data.frame(x = 1:10, y = 1:10)
b <- function(df, name){
eval(substitute(name), df)
}
b(a, x)
[1] 1 2 3 4 5 6 7 8 9 10
Update: The approach uses non-standard evaluation. I started explaining it but quickly realized that Hadley Wickham does it much better than I could. Read this: http://adv-r.had.co.nz/Computing-on-the-language.html
You can do this using match.call, for example:
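A minimal sketch of what eval(substitute(name), df) does: substitute() captures the expression the caller wrote, and eval() evaluates it with the data frame's columns in scope, so columns take precedence over global objects and whole expressions work too.

```r
a <- data.frame(x = 1:10, y = 1:10)

b <- function(df, name) {
  # substitute() returns the expression the caller wrote for `name`
  # (here the symbol x or the call x + y) without evaluating it;
  # eval() then evaluates that expression with df's columns in scope
  eval(substitute(name), df)
}

x <- "a global object called x"
print(b(a, x))      # the column a$x is found before the global x
print(b(a, x + y))  # 2 4 6 ... 20
```

One caveat worth checking in your own code: names that are not columns are looked up starting from b()'s own environment, not the caller's, which can surprise you when b() is called from inside another function.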
b <- function(data,name) {
## match.call return a call containing the specified arguments
## and the function name also
## I convert it to a list , from which I remove the first element(-1)
## which is the function name
pars <- as.list(match.call()[-1])
data[,as.character(pars$name)]
}
b(mtcars,cyl)
[1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
Explanation:
match.call returns a call in which all of the specified arguments are specified by their full names.
So here, as.list(match.call()[-1]) is a list of 2 symbols:
b <- function(data,name) {
str(as.list(match.call()[-1])) ## I am using str to get the type and name
}
b(mtcars,cyl)
List of 2
$ data: symbol mtcars
$ name: symbol cyl
Then I use the first symbol, mtcars, and convert the second to a string:
mtcars[,"cyl"]
or, equivalently:
eval(pars$data)[,as.character(pars$name)]
Very old thread, but you can also use get(). It seems to work better for me.
a <- data.frame(x = 1:10, y = 11:20)
b <- function(df, name){
get(name, df)
}
b(a, "x")
[1] 1 2 3 4 5 6 7 8 9 10
If you put the variable name between quotes when you call the function, it runs without error, although print() then receives the string "x" rather than the column values:
> b <- function(data,name) { within(data,print(name)) }
> b(a, "x")
[1] "x"
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10

recursive replacement in R

I am trying to clean some data and would like to replace zeros with the value from the previous date. I was hoping the following code would work, but it doesn't:
temp = c(1,2,4,5,0,0,6,7)
temp[which(temp==0)]=temp[which(temp==0)-1]
returns
1 2 4 5 5 0 6 7
instead of
1 2 4 5 5 5 6 7
which is what I was hoping for.
Is there a nice way of doing this without looping?
The operation is called "Last Observation Carried Forward" (LOCF) and is usually used to fill data gaps. It's a common operation for time series and is therefore implemented in the zoo package:
temp = c(1,2,4,5,0,0,6,7)
temp[temp==0] <- NA
library(zoo)
na.locf(temp)
#[1] 1 2 4 5 5 5 6 7
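If you would rather avoid the zoo dependency, a vectorized base R sketch (a swapped-in technique, not from either answer; locf_zero is a hypothetical name) builds, for each position, the index of the most recent non-zero element with cummax() and then indexes into the vector once:

```r
# Hypothetical helper: carry the last non-zero value forward over zeros
locf_zero <- function(v) {
  idx <- seq_along(v)
  idx[v == 0] <- 0L      # zeros do not point at themselves
  idx <- cummax(idx)     # index of the most recent non-zero (0 if none yet)
  out <- v
  filled <- idx > 0      # leading zeros have nothing to carry forward
  out[filled] <- v[idx[filled]]
  out
}

print(locf_zero(c(1, 2, 4, 5, 0, 0, 6, 7)))  # 1 2 4 5 5 5 6 7
print(locf_zero(c(1, 2, 4, 5, 0, 0, 6, 0)))  # 1 2 4 5 5 5 6 6
print(locf_zero(c(0, 3, 0)))                 # 0 3 3
```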
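Your code fails because temp[which(temp==0)-1] is computed once from the original vector, so the second zero copies the first zero rather than the 5. You could use essentially the same logic, except that you apply it to the values vector returned by rle, where each run of zeros is a single element: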
temp = c(1,2,4,5,0,0,6,0)
o <- rle(temp)
o$values[o$values == 0] <- o$values[which(o$values == 0) - 1]
inverse.rle(o)
#[1] 1 2 4 5 5 5 6 6
