update function in R - r

I am trying to use update function on survey.design object. For instance, I want to create a variable that is the mean of 4 other variables, as follows
x1<-runif(3)
x2<-runif(3)
x3<-runif(3)
population=10000
testdf<-data.frame(x1,x2,x3,population)
testsvy<-svydesign(id=~1,weights=c(30,30,30),data=testdf)
testsvy<-update(testsvy,avg=mean(c(x1,x2,x3)))
However this returns a vector of the same number for every person. There must be something wrong. Alternatively I can modify on test$variables, but I don't feel that this is the easiest way...

OK I got the answer myself... Hope that it could be simpler since I type the object names three times...
testsvy<-update(testsvy,avg2=rowMeans(testsvy$variables[,c("x1","x2","x3")],na.rm=TRUE))

Related

Keep the ID variable when creating seqelist in TraMineR

I am creating a list of sequences with TraMineR with the following code:
cj.seqe <- seqecreate(id=cj$party_id, time=cj$DATE_IN_num, event=cj$EVT_CD)
However, this list only contains the events and drops the id variable. I would like to merge the event sequences back to the original data. Is there a way to do this? I couldn't find anything in the docs. Thanks!
Maybe you could try with the following:
cj.seqe <- seqecreate(data=cj, id="party_id", time=cj$DATE_IN_num, event=cj$EVT_CD)
I suggest this because, to my experience, when I tried defining the id in seqcreate function in this way
id=dataset.name$variable_id didn't work!
Whereas trying defining id with id="variabile_id" worked!
I hope this could help despite I have no rigorous explanation.

Working with "..." input in R function

I am putting together an R function that takes some undefined input through the ... argument described in the docs as:
"..." the special variable length argument ***
The idea is that the user will enter a number of column names here, each belonging to a dataset also specified by the user. These columns will then be cross-tabulated in comparison to the dependent variable by tapply. The function is to return a table (independent variable x indedependent variable).
Thus, I tried:
plotter=function(dataset, dependent_variable, ...)
{
indi_variables=list(...); # making a list of the ... input as described in the docs
result=with (dataset, tapply(dependent_variable, indi_variables, mean); # this fails
}
I figured this should work as tapply can take a list as input.
But it does not in this case ('Error in tapply...arguments must have same length') and I think it is because indi_variables is a list of strings.
If I input the contents of the list by hand and leave out the quotation marks, everything works just fine.
However, if the user feeds the function the column names as non-strings, R will interpret them as variable names; and I cannot figure out how to transform the list indi_variables in the right way, unsuccessfully trying things like this:
indi_variables=lapply(indi_variables, as.factor)
So I am wondering
What causes the error described above? Is my interpretation correct?
How would one go about transforming the list created through ... in the right way?
Is there an overall better way of doing this, in the input or the implementation of tapply?
Any help is much appreciated!
Thanks to Joran's helpful reading, I have come up with these improvements than make things work out...
indi_variables=substitute(list(...));
result=with (dataset, tapply(dependent_variable, eval(indi_variables, dataset), FUN=mean));

Use the multiple variables in function in r

I have this function
ANN<-function (x,y){
DV<-rep(c(0:1),5)
X1<-c(1:10)
X2<-c(2:11)
ANN<-neuralnet(x~y,hidden=10,algorithm='rprop+')
return(ANN)
}
I need the function run like
formula=X1+X2
ANN(DV,formula)
and get result of the function. So the problem is to say the function USE the object which was created during the run of function. I need to run trough lapply more combinations of x,y, so I need it this way. Any advices how to achieve it? Thanks
I've edited my answer, this still works for me. Does it work for you? Can you be specific about what sort of errors you are getting?
New response:
ANN<-function (y){
X1<-c(1:10)
DV<-rep(c(0:1),5)
X2<-c(2:11)
dat <- data.frame(X1,X2)
ANN<-neuralnet(DV ~y,hidden=10,algorithm='rprop+',data=dat)
return(ANN)
}
formula<-X1+X2
ANN(formula)
If you want so specify the two parts of the formula separately, you should still pass them as formulas.
library(neuralnet)
ANN<-function (x,y){
DV<-rep(c(0:1),5)
X1<-c(1:10)
X2<-c(2:11)
formula<-update(x,y)
ANN<-neuralnet(formula,data=data.frame(DV,X1,X2),
hidden=10,algorithm='rprop+')
return(ANN)
}
ANN(DV~., ~X1+X2)
And assuming you're using neuralnet() from the neuralnet library, it seems the data= is required so you'll need to pass in a data.frame with those columns.
Formulas as special because they are not evaluated unless explicitly requested to do so. This is different than just using a symbol, where as soon as you use it is evaluated to something in the proper frame. This means there's a big difference between DV (a "name") and DV~. (a formula). The latter is safer for passing around to functions and evaluating in a different context. Things get much trickier with symbols/names.

R: Average value of vector

I'm playing around with R and trying to get the average of a column. Just mean(V1) doesn't work.
Could anybody give me an advice?
Thank you!
A: mean(D$V1) ... column names are not first class objects. They are part of a data.frame with a name that needs to be used.
Q: es, thanks. So $ is like the dot-operator? – user1170330 4 mins ago
A: Perhaps (depending on which language is being compared.) it is possible to construct list objects with cascaded calls to $<- and then obj$V1$subV1 to extract. Review ?Extract very carefully.
if you just want mean(V1), you can attach dataframe with attach(D), but probably I don't want to recommend it as later you may have other dataframes with same variable names and a mess withattach and detach commands. So, mean(D$V1) is the best way.

using value of a function & nested function in R

I wrote a function in R - called "filtre": it takes a dataframe, and for each line it says whether it should go in say bin 1 or 2. At the end, we have two data frames that sum up to the original input, and corresponding respectively to all lines thrown in either bin 1 or 2. These two sets of bin 1 and 2 are referred to as filtre1 and filtre2. For convenience the values of filtre1 and filtre2 are calculated but not returned, because it is an intermediary thing in a bigger process (plus they are quite big data frame). I have the following issue:
(i) When I later on want to use filtre1 (or filtre2), they simply don't show up... like if their value was stuck within the function, and would not be recognised elsewhere - which would oblige me to copy the whole function every time I feel like using it - quite painful and heavy.
I suspect this is a rather simple thing, but I did search on the web and did not find the answer really (I was not sure of best key words). Sorry for any inconvenience.
Thxs / g.
It's pretty hard to know the optimum way of achieve what you want as you do not provide proper example, but I'll give it a try. If your variables filtre1 and filtre2 are defined inside of your function and you do not return them, of course they do not show up on your environment. But you could just return the classification and make filtre1 and filtre2 afterwards:
#example data
df<-data.frame(id=1:20,x=sample(1:20,20,replace=TRUE))
filtre<-function(df){
#example function, this could of course be done by bins<-df$x<10
bins<-numeric(nrow(df))
for(i in 1:nrow(df))
if(df$x<10)
bins[i]<-1
return(bins)
}
bins<-filtre(df)
filtre1<-df[bins==1,]
filtre2<-df[bins==0,]

Resources