Can I add a condition/criteria argument to R's table function? - r

How can I incorporate criteria into a table(x,y) function? Instead of subsetting the main data frame and then running table() on each subset, can I save a step or two and just write table() with some sort "if" functionality written into it?

An example of the data set would be great. Hard to imagine what you're after at the moment.
But yes, you can write an if/else statement:
if(grepl("^1[.]0{3}, dataset)==TRUE)) {
print("1.000")
} else {
print("not 1.000")
}
Here, I'm asking it to find the pattern 1,000 in the dataset named dataset. If not, it'll return the else statement.
You can see other examples around too. Maybe look at the function summary of if, and how to use regex (regular expressions).
Hope that helps.

Related

Using For-Loop With Strings

I'm learning R and trying to use it for a statistical analysis at the same time.
Here, I am in the first part of the work: I am writing matrices and doing some simple things with them, in order to work later with these.
punti<-c(0,1,2,4)
t1<-matrix(c(-8,36,-8,-20,51,-17,-17,-17,57,-19,-19,-19,35,-8,-19,-8,0,0,0,0,-20,-20,-20,60,
-8,-8,-28,44,-8,-8,39,-23,-8,-19,35,-8,57,-8,-41,-8,-8,55,-8,-39,-8,-8,41,-25,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),ncol=4,byrow=T)
colnames(t1) <- c("20","1","28","19")
r1<-matrix(c(12,1,19,9,20,20,11,20,20,11,20,28,0,0,0,12,19,19,20,19,28,15,28,19,11,28,1,
33,20,28,31,1,19,17,28,19,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),ncol=3,byrow=T)
pt1<-rbind(sort(colSums(t1)),sort(punti))
colnames(r1)<-c("Valore","Vincitore","Perdente")
r1<-as.data.frame(r1)
But I have more matrices t_ and r_ so I would like to run a for-loop like:
for (i in 1:150)
{
pt[i]<-rbind(sort(colSums(t[i])),sort(punti))
colnames(r[i])<-c("Valore","Vincitore","Perdente")
r[i]<-as.data.frame(r[i])
}
This one just won't work because r_, t_ and pt_ are strings, but you get both the idea and that I would not like to copy-paste these three lines and manually edit the [i] 150 times. Is there a way to do it?
personally i don't advise dynamically and automatically creating lots of variables in the global environment, and would advise you to think about how you can accomplish your goals without such an approach. with that said, if you feel you really need to dynamically create all these variables, you may benefit from the assign function.
it could work like so:
for (i in 1:150)
{
assign(paste0('p',i),rbind(sort(colSums(t[i])),sort(punti)))
}
the first argument in the assign function is the formula for the variable name and how it is created; the second argument is what you wish to assign to the variable being created.

ifelse with no else

Basically in SAS I could just do an if statement without an else. For example:
if species='setosa' then species='regular';
there is no need for else.
How to do it in R? This is my script below which does not work:
attach(iris)
iris2 <- iris
iris2$Species <- ifelse(iris2$Species=='setosa',iris2$Species <- 'regular',iris2$Species <- iris2$Species)
table(iris2$Species)
A couple options. The best is to just do the replacement, this is nice and clean:
iris2$Species[iris2$Species == 'setosa'] <- 'regular'
ifelse returns a vector, so the way to use it in cases like this is to replace the column with a new one created by ifelse. Don't do assignment inside ifelse!
iris2$Species <- ifelse(iris2$Species=='setosa', 'regular', iris2$Species)
But there's rarely need to use ifelse if the else is "stay the same" - the direct replacement of the subset (the first line of code in this answer) is better.
New factor levels
Okay, so the code posted above doesn't actually work - this is because iris$Species is a factor (categorical) variable, and 'regular' isn't one of the categories. The easiest way to deal with this is to coerce the variable to character before editing:
iris2$Species <- as.character(iris2$Species)
iris2$Species[iris2$Species == 'setosa'] <- 'regular'
Other methods work as well, (editing the factor levels directly or re-factoring and specifying new labels), but that's not the focus of your question so I'll consider it out of scope for the answer.
Also, as I said in the comments, don't use attach. If you're not careful with it you can end up with your columns out of sync creating annoying bugs. (In the code you post, you're not using it anyway - the rest runs just as well if you delete the attach line.)
I would recommend looking at the base R documentation for help with this. You can find the documentation of if, else, and ifelse here. For use of if and else, refer to ?Control.
Regular control flow in code is done with the basic if and else statements, as in most languages. ifelse() is used for vectorized operations--it will return the same shape as your vector based on the test. Regular if and else expressions do not necessarily have those properties.

Saving data from replicated simulations in R

I created a small function that generates a table of Data. Then, I want R to replicate this function many times, so I included my function inside a "replicate()" function.
This seems to work great.
Now, I want to save the results of every simulation in one table. I've tried rbind, data.table etc... but clearly I am doing it wrong.
replicate(100,{object<-function{....
results<-(table of results)
}})
I don't know if I understand corectly but the easiest thing for me is use some condinional function. For example:
i<-1
main.table<-data.frame()
while (i<=100) {
object<-function (...)
main.table<-rbind(main.table,object)
i<-i+1
}

Use the multiple variables in function in r

I have this function
ANN<-function (x,y){
DV<-rep(c(0:1),5)
X1<-c(1:10)
X2<-c(2:11)
ANN<-neuralnet(x~y,hidden=10,algorithm='rprop+')
return(ANN)
}
I need the function run like
formula=X1+X2
ANN(DV,formula)
and get result of the function. So the problem is to say the function USE the object which was created during the run of function. I need to run trough lapply more combinations of x,y, so I need it this way. Any advices how to achieve it? Thanks
I've edited my answer, this still works for me. Does it work for you? Can you be specific about what sort of errors you are getting?
New response:
ANN<-function (y){
X1<-c(1:10)
DV<-rep(c(0:1),5)
X2<-c(2:11)
dat <- data.frame(X1,X2)
ANN<-neuralnet(DV ~y,hidden=10,algorithm='rprop+',data=dat)
return(ANN)
}
formula<-X1+X2
ANN(formula)
If you want so specify the two parts of the formula separately, you should still pass them as formulas.
library(neuralnet)
ANN<-function (x,y){
DV<-rep(c(0:1),5)
X1<-c(1:10)
X2<-c(2:11)
formula<-update(x,y)
ANN<-neuralnet(formula,data=data.frame(DV,X1,X2),
hidden=10,algorithm='rprop+')
return(ANN)
}
ANN(DV~., ~X1+X2)
And assuming you're using neuralnet() from the neuralnet library, it seems the data= is required so you'll need to pass in a data.frame with those columns.
Formulas as special because they are not evaluated unless explicitly requested to do so. This is different than just using a symbol, where as soon as you use it is evaluated to something in the proper frame. This means there's a big difference between DV (a "name") and DV~. (a formula). The latter is safer for passing around to functions and evaluating in a different context. Things get much trickier with symbols/names.

returning different data frames in a function - R

Is it possible to return 4 different data frames from one function?
Scenario:
I am trying to read a file, parse it, and return some parts of the file.
My function looks something like this:
parseFile <- function(file){
carFile <- read.table(file, header=TRUE, sep="\t")
carNames <- carFile[1,]
carYear <- colnames(carFile)
return(list(carFile,carNames,carYear))
}
I don't want to have to use list(carFile,carNames,carYear). Is there a way return the 3 data frames without returning them in a list first?
R does not support multiple return values. You want to do something like:
foo = function(x,y){return(x+y,x-y)}
plus,minus = foo(10,4)
yeah? Well, you can't. You get an error that R cannot return multiple values.
You've already found the solution - put them in a list and then get the data frames from the list. This is efficient - there is no conversion or copying of the data frames from one block of memory to another.
This is also logical, the return from a function should conceptually be a single entity with some meaning that is transferred to whatever function is calling it. This meaning is also better conveyed if you name the returned values of the list.
You could use a technique to create multiple objects in the calling environment, but when you do that, kittens die.
Note in your example carYear isn't a data frame - its a character vector of column names.
There are other ways you could do that, if you really really want, in R.
assign('carFile',carFile,envir=parent.frame())
If you use that, then carFile will be created in the calling environment. As Spacedman indicated you can only return one thing from your function and the clean solution is to go for the list.
In addition, my personal opinion is that if you find yourself in such a situation, where you feel like you need to return multiple dataframes with one function, or do something that no one has ever done before, you should really revisit your approach. In most cases you could find a cleaner solution with an additional function perhaps, or with the recommended (i.e. list).
In other words the
envir=parent.frame()
will do the job, but as SpacedMan mentioned
when you do that, kittens die
The zeallot package does what you need in a similar that Python can unpack variables from a function. Reproducible example below.
parseFile <- function(){
carMPG <- mtcars$mpg
carName <- rownames(mtcars)
carCYL <- mtcars$cyl
return(list(carMPG,carName,carCYL))
}
library(zeallot)
c(myFile, myName, myYear) %<-% parseFile()

Resources