Use the multiple variables in function in r - r

I have this function
ANN<-function (x,y){
DV<-rep(c(0:1),5)
X1<-c(1:10)
X2<-c(2:11)
ANN<-neuralnet(x~y,hidden=10,algorithm='rprop+')
return(ANN)
}
I need the function run like
formula=X1+X2
ANN(DV,formula)
and get result of the function. So the problem is to say the function USE the object which was created during the run of function. I need to run trough lapply more combinations of x,y, so I need it this way. Any advices how to achieve it? Thanks

I've edited my answer, this still works for me. Does it work for you? Can you be specific about what sort of errors you are getting?
New response:
ANN<-function (y){
X1<-c(1:10)
DV<-rep(c(0:1),5)
X2<-c(2:11)
dat <- data.frame(X1,X2)
ANN<-neuralnet(DV ~y,hidden=10,algorithm='rprop+',data=dat)
return(ANN)
}
formula<-X1+X2
ANN(formula)

If you want so specify the two parts of the formula separately, you should still pass them as formulas.
library(neuralnet)
ANN<-function (x,y){
DV<-rep(c(0:1),5)
X1<-c(1:10)
X2<-c(2:11)
formula<-update(x,y)
ANN<-neuralnet(formula,data=data.frame(DV,X1,X2),
hidden=10,algorithm='rprop+')
return(ANN)
}
ANN(DV~., ~X1+X2)
And assuming you're using neuralnet() from the neuralnet library, it seems the data= is required so you'll need to pass in a data.frame with those columns.
Formulas as special because they are not evaluated unless explicitly requested to do so. This is different than just using a symbol, where as soon as you use it is evaluated to something in the proper frame. This means there's a big difference between DV (a "name") and DV~. (a formula). The latter is safer for passing around to functions and evaluating in a different context. Things get much trickier with symbols/names.

Related

Using Surv function repeatedly with different data frames

I'm running some Surv() functions, and one thing I do not like, or understand, is why this function does not take a "data=" argument. This is annoying because I want to perform the same Surv() function on the same data frame but filtered by different criteria each time.
So for example, my data frame is called "ikt" and I want to filter by "donor_type2=='LD'" and also use a strata variable "plan 2". I tried the following but it didn't work:
library(survival)
library(dplyr)
ikt<-data.frame(organ_yrs=(seq(1,20)),
organ_status=rep(c(0,0,1,1),each=5),
plan2=rep(c('A','B','A','B'),each=5),
donor_type2=rep(c('LD','DD'),each=10) )
organ_surv_func<-function(data,criteria,strata) {
data2<-filter(data,criteria)
Surv(data2$organ_yrs,data2$organ_status)~data2$strata
}
organ_surv_func(ikt,donor_type2=='LD',plan2)
Error in filter_impl(.data, quo) : object 'donor_type2' not found
I'm coming from a SAS background so that's probably why I'm thinking this should work and it doesn't...
I looked up something about sapply(), but I don't think that works when the function doesn't have the data= option.
Also the reason I need the Surv() object and not just survfit(Surv()) (which would let me use data=) is because I'm also using survdiff() for log-rank tests, which takes in the Surv() object as it's main argument:
lr<-function (surv) {
round(1-pchisq(survdiff(surv)$chisq,length(survfit(surv)$strata)-1),3)
}
Thanks for any help you can provide.
I'm writing this "answer" to caution you against proceeding down the path you seem to be following. The Surv function is really intended to be used as the LHS of a formula defined within one of the survival package functions. You should avoid using constructions like:
Surv(data2$organ_yrs,data2$organ_status)~data2$strata
For one thing it's needlessly verbose, but more importantly, it will prevent the use of predict when it comes time to match up names to formals. The survdiff and the other survival functions all have both a "data" argument as well as a "subset" argument. The subset function should allow you to avoid using filter.
organ_surv_func<-function(data, covar) {
form = as.formula(substitute( Surv(organ_yrs, organ_status) ~ covar, list(covar=covar) ) )
survdiff(form, data=data)
}
# although I think running surdiff in a for-loop might be easier,
# as it would involve fewer tricky language constructs
organ_surv_func( subset(ikt, (donor_type2=='LD')), covar=quote(plan2))
If you assign the output of survfit to a named variable, you will be able to more economically access chisq and strata:
myfit <- organ_surv_func( subset(ikt, (donor_type2=='LD')), covar=quote(plan2))
my.lr.test<-function (myfit) {
round(1-pchisq(myfit$chisq, length(myfit$strata)-1), 3)
}
my.lr.test(myfit) # not going to be useful with that dataset.

Using For-Loop With Strings

I'm learning R and trying to use it for a statistical analysis at the same time.
Here, I am in the first part of the work: I am writing matrices and doing some simple things with them, in order to work later with these.
punti<-c(0,1,2,4)
t1<-matrix(c(-8,36,-8,-20,51,-17,-17,-17,57,-19,-19,-19,35,-8,-19,-8,0,0,0,0,-20,-20,-20,60,
-8,-8,-28,44,-8,-8,39,-23,-8,-19,35,-8,57,-8,-41,-8,-8,55,-8,-39,-8,-8,41,-25,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),ncol=4,byrow=T)
colnames(t1) <- c("20","1","28","19")
r1<-matrix(c(12,1,19,9,20,20,11,20,20,11,20,28,0,0,0,12,19,19,20,19,28,15,28,19,11,28,1,
33,20,28,31,1,19,17,28,19,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),ncol=3,byrow=T)
pt1<-rbind(sort(colSums(t1)),sort(punti))
colnames(r1)<-c("Valore","Vincitore","Perdente")
r1<-as.data.frame(r1)
But I have more matrices t_ and r_ so I would like to run a for-loop like:
for (i in 1:150)
{
pt[i]<-rbind(sort(colSums(t[i])),sort(punti))
colnames(r[i])<-c("Valore","Vincitore","Perdente")
r[i]<-as.data.frame(r[i])
}
This one just won't work because r_, t_ and pt_ are strings, but you get both the idea and that I would not like to copy-paste these three lines and manually edit the [i] 150 times. Is there a way to do it?
personally i don't advise dynamically and automatically creating lots of variables in the global environment, and would advise you to think about how you can accomplish your goals without such an approach. with that said, if you feel you really need to dynamically create all these variables, you may benefit from the assign function.
it could work like so:
for (i in 1:150)
{
assign(paste0('p',i),rbind(sort(colSums(t[i])),sort(punti)))
}
the first argument in the assign function is the formula for the variable name and how it is created; the second argument is what you wish to assign to the variable being created.

Working with "..." input in R function

I am putting together an R function that takes some undefined input through the ... argument described in the docs as:
"..." the special variable length argument ***
The idea is that the user will enter a number of column names here, each belonging to a dataset also specified by the user. These columns will then be cross-tabulated in comparison to the dependent variable by tapply. The function is to return a table (independent variable x indedependent variable).
Thus, I tried:
plotter=function(dataset, dependent_variable, ...)
{
indi_variables=list(...); # making a list of the ... input as described in the docs
result=with (dataset, tapply(dependent_variable, indi_variables, mean); # this fails
}
I figured this should work as tapply can take a list as input.
But it does not in this case ('Error in tapply...arguments must have same length') and I think it is because indi_variables is a list of strings.
If I input the contents of the list by hand and leave out the quotation marks, everything works just fine.
However, if the user feeds the function the column names as non-strings, R will interpret them as variable names; and I cannot figure out how to transform the list indi_variables in the right way, unsuccessfully trying things like this:
indi_variables=lapply(indi_variables, as.factor)
So I am wondering
What causes the error described above? Is my interpretation correct?
How would one go about transforming the list created through ... in the right way?
Is there an overall better way of doing this, in the input or the implementation of tapply?
Any help is much appreciated!
Thanks to Joran's helpful reading, I have come up with these improvements than make things work out...
indi_variables=substitute(list(...));
result=with (dataset, tapply(dependent_variable, eval(indi_variables, dataset), FUN=mean));

returning different data frames in a function - R

Is it possible to return 4 different data frames from one function?
Scenario:
I am trying to read a file, parse it, and return some parts of the file.
My function looks something like this:
parseFile <- function(file){
carFile <- read.table(file, header=TRUE, sep="\t")
carNames <- carFile[1,]
carYear <- colnames(carFile)
return(list(carFile,carNames,carYear))
}
I don't want to have to use list(carFile,carNames,carYear). Is there a way return the 3 data frames without returning them in a list first?
R does not support multiple return values. You want to do something like:
foo = function(x,y){return(x+y,x-y)}
plus,minus = foo(10,4)
yeah? Well, you can't. You get an error that R cannot return multiple values.
You've already found the solution - put them in a list and then get the data frames from the list. This is efficient - there is no conversion or copying of the data frames from one block of memory to another.
This is also logical, the return from a function should conceptually be a single entity with some meaning that is transferred to whatever function is calling it. This meaning is also better conveyed if you name the returned values of the list.
You could use a technique to create multiple objects in the calling environment, but when you do that, kittens die.
Note in your example carYear isn't a data frame - its a character vector of column names.
There are other ways you could do that, if you really really want, in R.
assign('carFile',carFile,envir=parent.frame())
If you use that, then carFile will be created in the calling environment. As Spacedman indicated you can only return one thing from your function and the clean solution is to go for the list.
In addition, my personal opinion is that if you find yourself in such a situation, where you feel like you need to return multiple dataframes with one function, or do something that no one has ever done before, you should really revisit your approach. In most cases you could find a cleaner solution with an additional function perhaps, or with the recommended (i.e. list).
In other words the
envir=parent.frame()
will do the job, but as SpacedMan mentioned
when you do that, kittens die
The zeallot package does what you need in a similar that Python can unpack variables from a function. Reproducible example below.
parseFile <- function(){
carMPG <- mtcars$mpg
carName <- rownames(mtcars)
carCYL <- mtcars$cyl
return(list(carMPG,carName,carCYL))
}
library(zeallot)
c(myFile, myName, myYear) %<-% parseFile()

Subsetting within a function

I'm trying to subset a dataframe within a function using a mixture of fixed variables and some variables which are created within the function (I only know the variable names, but cannot vectorise them beforehand). Here is a simplified example:
a<-c(1,2,3,4)
b<-c(2,2,3,5)
c<-c(1,1,2,2)
D<-data.frame(a,b,c)
subbing<-function(Data,GroupVar,condition){
g=Data$c+3
h=Data$c+1
NewD<-data.frame(a,b,g,h)
subset(NewD,select=c(a,b,GroupVar),GroupVar%in%condition)
}
Keep in mind that in my application I cannot compute g and h outside of the function. Sometimes I'll want to make a selection according to the values of h (as above) and other times I'll want to use g. There's also the possibility I may want to use both, but even just being able to subset using 1 would be great.
subbing(D,GroupVar=h,condition=5)
This returns an error saying that the object h cannot be found. I've tried to amend subset using as.formula and all sorts of things but I've failed every single time.
Besides the ease of the function there is a further reason why I'd like to use subset.
In the function I'm actually working on I use subset twice. The first time it's the simple subset function. It's just been pointed out below that another blog explored how it's probably best to use the good old data[colnames()=="g",]. Thanks for the suggestion, I'll have a go.
There is however another issue. I also use subset (or rather a variation) in my function because I'm dealing with several complex design surveys (see package survey), so subset.survey.design allows you to get the right variance estimation for subgroups. If I selected my group using [] I would get the wrong s.e. for my parameters, so I guess this is quite an important issue.
Thank you
It's happening right as the function is trying to define GroupVar in the beginning. R is looking for the object h by itself (not within the dataframe).
The best thing to do is refer to the column names in quotes in the subset function. But of course, then you'd have to sidestep the condition part:
subbing <- function(Data, GroupVar, condition) {
....
DF <- subset(Data, select=c("a","b", GroupVar))
DF <- DF[DF[,3] %in% condition,]
}
That will do the trick, although it can be annoying to have one data frame indexing inside another.

Resources