How to use apply() with my function - r

bmi<-function(x,y){
(x)/((y/100)^2)
}
bmi(70,177) works,
but it doesn't work with apply():
apply(Student,1:2,bmi(Student$weight,Student$height))
Error in match.fun(FUN) :
'bmi(Student$weight, Student$height)' is not a function, character or symbol

It's a bit unclear what the goal is. If it's just to get an answer, then the comments do answer it. If on the other hand, the goal is to understand what you are doing wrong, then read on. I'd say the first error going from left to right is passing the whole dataframe. I would have only passed the 'height' and 'weight' columns.
The next error, again going from left to right, is the use of 1:2 as the second argument to apply. You obviously want to do this "by rows", which means you should use only 1, i.e. the first dimension of the dataframe.
And the third error is using a function call rather than the function name. Functions with arguments in parentheses don't work when an R function (meaning apply in this case) is expecting a function name or an anonymous function as illustrated in comments.
The fourth error is not assigning the value to a column in your dataframe. Something like this would probably have succeeded in making the desired extra column via the apply method, although, as noted in the comments, it is not the most efficient method:
Student$bmi_val <- apply(Student[ , c("weight", "height")], 1, function(r) bmi(r[1], r[2]))
# didn't want my column name to be the same as the function name
The apply function was actually designed to work with matrices and arrays, so for many purposes it is ill-suited to dataframes. In this case, where all the arguments to the bmi function are numeric and you can control the order of the columns in the first argument so that they match the x and y positions, it's arguably an acceptable strategy, but not the most R-ish method. When working with dates or factor variables, you should definitely avoid apply.
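For comparison, here is a minimal sketch of the vectorized call next to two row-wise versions. The Student data is made up, since the original dataframe is not shown, and the column names bmi_vec, bmi_apply and bmi_mapply are only for illustration.

```r
# Hypothetical data; the real 'Student' dataframe is not shown in the question.
Student <- data.frame(weight = c(70, 55, 82), height = c(177, 160, 185))

bmi <- function(x, y) {
  x / ((y / 100)^2)
}

# Vectorized: arithmetic operators already work element-wise on whole columns.
Student$bmi_vec <- bmi(Student$weight, Student$height)

# Row-wise with apply(): MARGIN = 1, and the function receives one row as a vector.
Student$bmi_apply <- apply(Student[, c("weight", "height")], 1,
                           function(r) bmi(r[["weight"]], r[["height"]]))

# mapply() walks the two columns in parallel and calls bmi() on each pair.
Student$bmi_mapply <- mapply(bmi, Student$weight, Student$height)
```

All three columns should hold the same values. The vectorized form is usually the one to prefer, because it avoids converting the dataframe to a matrix and calling the function once per row.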

Related

Replace for loop with vectorized call of a function returning multiple values

I have the following function: problema_firma_emprestimo(r,w,r_emprestimo,posicao,posicao_banco), where all inputs are scalars.
This function returns three different matrices, using
return demanda_k_emprestimo,demanda_l_emprestimo,lucro_emprestimo
I need to run this function for a series of values of posicao_banco that are stored in a vector.
I'm doing this using a for loop, because I need three separate arrays, each storing one of the three outputs of the function, where the first dimension of each array corresponds to the index of posicao_banco. My code for this part is:
demanda_k_emprestimo = zeros(num_bancos,na,ny);
demanda_l_emprestimo = similar(demanda_k_emprestimo);
lucro_emprestimo = similar(demanda_k_emprestimo);
for i in eachindex(posicao_bancos)
demanda_k_emprestimo[i,:,:] , demanda_l_emprestimo[i,:,:] , lucro_emprestimo[i,:,:] = problema_firma_emprestimo(r,w,r_emprestimo[i],posicao,posicao_bancos[i]);
end
Is there a fast and clean way of doing this using vectorized functions? Something like problema_firma_emprestimo.(r,w,r_emprestimo[i],posicao,posicao_bancos) ? When I do this, I got a tuple with the result, but I can't find a good way of unpacking the answer.
Thanks!
Unfortunately, it's not easy to use broadcasting here, since then you will end up with output that is an array of tuples, instead of a tuple of arrays. I think a loop is a very good approach, and has no performance penalty compared to broadcasting.
I would suggest, however, that you organize your output array dimensions differently, so that i indexes into the last dimension instead of the first:
for i in eachindex(posicao_bancos)
demanda_k_emprestimo[:, :, i] , ...
end
This is because Julia arrays are column major, and this way the output values are filled into the output arrays in the most efficient way. You could also consider making the output arrays into vectors of matrices, instead of 3D arrays.
On a side note: since you are (or should be) creating an MWE for the sake of the people answering, it would be better if you used shorter and less confusing variable names. In particular for people who don't understand Portuguese (I'm guessing), your variable names are super long and confusing, and they make the code visually dense. Telling the difference between demanda_k_emprestimo and demanda_l_emprestimo at a glance is hard. The meaning of the variables is not important either, so it's better to just call them A and B or X and Y, and the function foo or something.

not error, but not results either in R

I am trying to write a function in R that calculates the mean of nitrate, sulfate and ID. My original dataframe has 4 columns (date, nitrate, sulfate, ID). So I wrote the following code:
prueba<-read.csv("C:/Users/User/Desktop/coursera/001.csv",header=T)
columnmean<-function(y, removeNA=TRUE){ #y will be a matrix
whichnumeric<-sapply(y, is.numeric)#which columns are numeric
onlynumeric<-y[ , whichnumeric] #selecting just the numeric columns
nc<-ncol(onlynumeric) #number of numeric columns
means<-numeric(nc)#empty vector for the means
for(i in 1:nc){
means[i]<-mean(onlynumeric[,i], na.rm = TRUE)
}
}
columnmean(prueba)
When I run the steps line by line on my data, outside the function, I get the mean values. But when I call the function so that it performs all the steps itself, it raises no error and it also doesn't return any value; my environment contains only the dataframe 'prueba' and the columnmean function.
What am I doing wrong?
A reproducible example would be nice (although not absolutely necessary in this case).
You need a final line return(means) at the end of your function. (Some old-school R users maintain that means alone is OK - R automatically returns the value of the last expression evaluated within the function whether return() is specified or not - but I feel that using return() explicitly is better practice.)
colMeans(y[sapply(y, is.numeric)], na.rm=TRUE)
is a slightly more compact way to achieve your goal (although there's nothing wrong with being a little more verbose if it makes your code easier for you to read and understand).
The result of an R function is the value of the last expression. Your last expression is:
for(i in 1:nc){
means[i]<-mean(onlynumeric[,i], na.rm = TRUE)
}
It may seem strange that the value of that expression is NULL, but that's the way it is with for-loops in R. The means vector does get filled in step by step, but it is never returned, which is why BenBolker's advice to use return(.) is correct (as his advice almost always is). For-loops in R are a notable exception to the functional programming paradigm. They provide a mechanism for looping (as do the various *apply functions), but the commands inside the loop act through side effects on the environment in which the loop runs (unlike the apply functions).
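A minimal sketch of the corrected function follows. The dataframe is a made-up stand-in for 001.csv (only the four column names come from the question). Beyond the final return(means), the sketch also adds drop = FALSE, seq_len(), and names on the result; these are optional extras, not part of the original code.

```r
columnmean <- function(y, removeNA = TRUE) {
  whichnumeric <- sapply(y, is.numeric)            # which columns are numeric
  onlynumeric <- y[, whichnumeric, drop = FALSE]   # keep a dataframe even with one column
  nc <- ncol(onlynumeric)
  means <- numeric(nc)                             # empty vector for the means
  for (i in seq_len(nc)) {
    means[i] <- mean(onlynumeric[, i], na.rm = removeNA)
  }
  names(means) <- names(onlynumeric)
  return(means)                                    # without this, the function returns NULL
}

# Hypothetical data standing in for 001.csv.
prueba <- data.frame(Date = c("2003-01-01", "2003-01-02", "2003-01-03"),
                     sulfate = c(3, NA, 5),
                     nitrate = c(1, 2, NA),
                     ID = c(1L, 1L, 1L))

result <- columnmean(prueba)
```

Here result should match colMeans(prueba[sapply(prueba, is.numeric)], na.rm = TRUE), the compact alternative mentioned above.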

Working with "..." input in R function

I am putting together an R function that takes some undefined input through the ... argument described in the docs as:
"..." the special variable length argument ***
The idea is that the user will enter a number of column names here, each belonging to a dataset also specified by the user. These columns will then be cross-tabulated against the dependent variable by tapply. The function is to return a table (independent variable x independent variable).
Thus, I tried:
plotter=function(dataset, dependent_variable, ...)
{
indi_variables=list(...); # making a list of the ... input as described in the docs
result=with (dataset, tapply(dependent_variable, indi_variables, mean)); # this fails
}
I figured this should work as tapply can take a list as input.
But it does not in this case ('Error in tapply...arguments must have same length') and I think it is because indi_variables is a list of strings.
If I input the contents of the list by hand and leave out the quotation marks, everything works just fine.
However, if the user feeds the function the column names as non-strings, R will interpret them as variable names; and I cannot figure out how to transform the list indi_variables in the right way, unsuccessfully trying things like this:
indi_variables=lapply(indi_variables, as.factor)
So I am wondering
What causes the error described above? Is my interpretation correct?
How would one go about transforming the list created through ... in the right way?
Is there an overall better way of doing this, in the input or the implementation of tapply?
Any help is much appreciated!
Thanks to Joran's helpful reading, I have come up with these improvements that make things work out:
indi_variables=substitute(list(...));
result=with (dataset, tapply(dependent_variable, eval(indi_variables, dataset), FUN=mean));
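Putting those two lines into a complete function might look like the sketch below. It uses the built-in mtcars data purely as an example. Capturing dependent_variable with substitute() as well is an assumption added here so that it too can be passed unquoted; the original only treats the ... columns this way.

```r
plotter <- function(dataset, dependent_variable, ...) {
  # Capture the grouping columns unevaluated, as the call list(col1, col2, ...).
  indi_variables <- substitute(list(...))
  # Assumption: capture the dependent variable the same way.
  dep <- substitute(dependent_variable)
  # Evaluate both inside the dataframe so the bare names resolve to its columns.
  groups <- eval(indi_variables, dataset, parent.frame())
  values <- eval(dep, dataset, parent.frame())
  tapply(values, groups, FUN = mean)
}

# Mean mpg cross-tabulated by number of cylinders and transmission type.
tab <- plotter(mtcars, mpg, cyl, am)
```

The error in the first attempt comes from list(...) evaluating its arguments immediately: strings become length-one character vectors, which do not match the length of the dependent variable, and bare names are looked up outside the dataset.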

ggplot iterate several columns

lapply(7:12, function(x) ggplot(mydf)+geom_histogram(aes(mydf[,x])))
gives the error: Error in `[.data.frame`(mydf, , x) : undefined columns selected.
I have used several SO questions (e.g. this) as guidance, but can't figure out my error.
The code below works with the mtcars dataset. Just replace mtcars with mydf.
library(ggplot2)
lapply(1:3,function(i) {
ggplot(data.frame(x=mtcars[,i]))+
geom_histogram(aes(x=x))+
ggtitle(names(mtcars)[i])
})
Notice how the reference to i (the column index) was moved from the mapping argument (the call to aes(...)), to the data argument.
Your problem is actually quite subtle. ggplot evaluates the arguments to aes(...) first in the context of your data, e.g. it looks for column names in mydf. If that fails, it jumps to the global environment. It does not look in the function's environment. (This describes the ggplot2 versions current when this was written; from ggplot2 3.0.0 on, aes() captures the environment it is called from.) See this post for another example of this behavior and some discussion.
The bottom line is that it is a really bad idea to use external variables in a call to aes(...). However, the data=... argument does not suffer from this. If you must refer to a column number, etc., do it in the call to ggplot(data=...).
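If you would rather loop over column names than column numbers, one option is the .data pronoun inside aes(). This is a sketch that assumes ggplot2 3.0.0 or later, again with mtcars standing in for mydf; bins = 10 is an arbitrary choice to silence the default bin-width message.

```r
library(ggplot2)

cols <- names(mtcars)[1:3]   # "mpg", "cyl", "disp"

plots <- lapply(cols, function(nm) {
  # .data[[nm]] looks up the column called nm in the plot's data,
  # so nothing depends on where the variable nm itself can be found.
  ggplot(mtcars, aes(x = .data[[nm]])) +
    geom_histogram(bins = 10) +
    ggtitle(nm)
})
```

Each element of plots is a separate histogram; print them one at a time, e.g. plots[[1]].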

Naming columns of coefficient matrix in a VAR

I am searching for a fast and simple way to give comprehensible names to the columns of a VAR-coefficient matrix.
What I would like to use is the function VAR.names, which is used in the function VAR.est() in the VAR.etp-package. When I use the function VAR.est(), this works perfectly, but as soon as I modify VAR.est (by adding another element to the list of values which are returned), I receive an error message stating "could not find function VAR.names".
I could not find any information on the function VAR.names.
Example:
library(VAR.etp)
data(dat)
M=VAR.est(dat,p=2,type="const")
M$coef
Another possibility would be to use a loop as in the function VAR() from the vars package, but if VAR.names actually worked, that would be a lot more elegant!

Resources