FOR loops giving no result or error in R

I am running the following code:
disc<-for (i in 1:33) {
m=n[i]
xbar<-sum(data[i,],na.rm=TRUE)/m
Sx <- sqrt(sum((data[i,]-xbar)^2,na.rm=TRUE)/(m-1))
Sx
i=i+1}
Running it:
>disc
NULL
Why is it giving me NULL?

This is from the documentation for for, accessible via ?`for`:
‘for’, ‘while’ and ‘repeat’ return ‘NULL’ invisibly.
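You can check this in any R session. The snippet below is a minimal illustration that does not depend on your data. It shows that the value of the loop itself is NULL, and that only the loop variable is left behind:

```r
# The value of a for loop is NULL, whatever its body computes
res <- for (i in 1:3) i^2
print(res)  # NULL
# The loop variable survives the loop and keeps its last value
print(i)    # 3
```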
Perhaps you are looking for something along the following lines:
library(plyr)
disc <- llply(1:33, function(i) {
m=n[i]
xbar<-sum(data[i,],na.rm=TRUE)/m
Sx <- sqrt(sum((data[i,]-xbar)^2,na.rm=TRUE)/(m-1))
Sx
})
Other variants exist: the ll in llply stands for "list in, list out". If your intended final result is a data frame or an array, appropriate functions exist (ldply, laply and so on).
The code above is a plain transformation of your example. We might be able to do better by splitting data right away and forgetting the otherwise useless count variable i (untested, as you have provided no data):
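disc <- daply(data.frame(data, n = n, row = seq_along(n)), .(row), function(data.i) {
m <- data.i$n
# keep only the measurement columns, not the helper columns n and row
x <- unlist(data.i[setdiff(names(data.i), c("n", "row"))])
xbar <- sum(x, na.rm=TRUE)/m
sqrt(sum((x-xbar)^2, na.rm=TRUE)/(m-1))
})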
See also the plyr website for more information.
Related (if not a duplicate): R - How to turn a loop to a function in R

krlmlr's answer shows you how to fix your code. To explain your original problem in more abstract terms: a for loop lets you run the same piece of code multiple times, but it doesn't store the results of running that code for you. You have to do that yourself.
Your current code only really assigns a single value, Sx, for each run of the for loop. On the next run, a new value is put into the Sx variable, so you lose all the previous values. At the end, you'll just end up with whatever the value of Sx was on the last run through the loop.
To save the results of a for loop, you generally need to add them to a vector as you go through, e.g.
# Create the empty results vector outside the loop
results = numeric(0)
for (i in 1:10) {
current_result = 3 + i
results = c(results, current_result)
}
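Growing a vector with c() works, but every call to c() builds a new, longer vector, which gets slow for long loops. A common alternative is sketched here with the same toy calculation: allocate a vector of the final length before the loop, then fill it by index:

```r
n_iter <- 10
# Allocate the full-length results vector once, before the loop
results <- numeric(n_iter)
for (i in seq_len(n_iter)) {
  # Store this iteration's result in its own slot
  results[i] <- 3 + i
}
results  # 4 5 6 ... 13
```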

In R, a for loop can't return a value (it always returns NULL). The usual way to get a value back is to wrap your loop in a function that returns the result. For example:
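getSx <- function(data, n){
# preallocate one slot per row so earlier results are not overwritten
Sx <- numeric(nrow(data))
for (i in seq_len(nrow(data))) {
m <- n[i]
xbar <- sum(data[i,], na.rm=TRUE)/m
Sx[i] <- sqrt(sum((data[i,]-xbar)^2, na.rm=TRUE)/(m-1))
}
Sx
}
Then you call it:
getSx(data, n)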
Of course, you can avoid the side effects of a for loop by using lapply or by writing a vectorized version, but that is another problem. You should perhaps give a reproducible example and explain a bit more about what you are trying to compute.
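As a sketch of the vectorized route for this particular computation, assume that data can be treated as a numeric matrix with one case per row, and that n holds the number of non-missing values in each row. The small data set below is made up for illustration. With those assumptions, rowSums() gives every row's mean and standard deviation without a loop:

```r
# Hypothetical example data: 3 rows (cases), one missing value
data <- matrix(c(1, 2, 3,
                 4, NA, 6,
                 7, 8, 10), nrow = 3, byrow = TRUE)
# Number of non-missing values per row (plays the role of n in the question)
n <- rowSums(!is.na(data))

# One mean per row
xbar <- rowSums(data, na.rm = TRUE) / n
# data - xbar recycles xbar down each column, so row i is centred on xbar[i]
Sx <- sqrt(rowSums((data - xbar)^2, na.rm = TRUE) / (n - 1))
Sx
```

The result matches apply(data, 1, sd, na.rm = TRUE) for this input. The vectorized form is shown only because it is the shortest way to express this formula; for more complex per-row work, sapply() or vapply() over row indices is the usual next step.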

Related

Using strings from loops as parts of function commands and variable names in R

How does one use the string coming from a loop
- to generate new variables
- as a part of functions commands
- as functions' arguments
- as a part of if statements
in R?
Specifically, as an example (the code obviously doesn't work, but I'd like to end up with something no less readable than what is below):
list_dist <- c("unif","norm")
for (dist in list_dist){
paste("rv",dist,sep="") = paste("r",dist,sep="")(100,0,1)
paste("meanrv",dist,sep="") = mean(paste("rv",dist,sep=""))
if (round(paste("meanrv",dist,sep=""),3) != 0){
print("Not small enough")
}
}
Note: This is an example, and I do need to use some kind of loop to avoid writing huge scripts.
I managed to use strings as in the example above, but only with eval/parse/text/paste, by putting the whole statement (i.e. the whole "line") inside paste instead of pasting only the variable name or the function name. This makes the code ugly and hard to read, and writing it inefficient.
The replies to similar questions that I've seen don't address specifically how to deal with this kind of use of strings from loops.
I'm sure there must be a more efficient and flexible way to deal with this, as there is in some other languages.
Thanks in advance!
Resist the temptation of creating variable names programmatically. Instead, structure your data properly into lists:
list_dist = list(unif = runif, norm = rnorm)
distributions = lapply(list_dist, function (f) f(100, 0, 1))
means = unlist(lapply(distributions, mean))
# … etc.
As you can see, this also gets rid of the loop, by using list functions instead.
Your last step can also be vectorised:
if (any(round(means, 3) != 0))
warning('not small enough')
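If you only have the distribution names as strings, match.fun() turns a string such as "runif" into the function itself. That way the strings are pasted once, when the list of functions is built, and everything afterwards works on a named list. This is a minimal sketch. set.seed() is only there to make the run repeatable:

```r
set.seed(1)
list_dist <- c("unif", "norm")
# Name the vector by itself so every later result is labelled "unif"/"norm"
names(list_dist) <- list_dist
# paste0("r", "unif") gives "runif"; match.fun() returns that function
generators <- lapply(list_dist, function(d) match.fun(paste0("r", d)))
samples <- lapply(generators, function(f) f(100, 0, 1))
means <- vapply(samples, mean, numeric(1))
means
# Same vectorised check as above (fires here, since runif has mean ~0.5)
if (any(round(means, 3) != 0)) warning("not small enough")
```

Individual results are then reached by name, e.g. samples[["norm"]], instead of via get() on a pasted variable name.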
Try this:
list_dist <- list(unif = runif, norm = rnorm)
for (i in seq_along(list_dist)) {
dist <- names(list_dist)[i]
assign(paste0("rv", dist), list_dist[[i]](100, 0, 1))
assign(paste0("meanrv", dist), mean(get(paste0("rv", dist))))
if (round(get(paste0("meanrv", dist)), 3) != 0) {
print("Not small enough")
}
}

How do I include helper functions inside a function to find and alter data, using prefixes, from a data.frame? Also need to count and use loop

This is a question for school, but I have been working on it for some time and just need a point in the right direction. I am not asking for the full answer.
I was given a data frame with student grades for various assessments. I have to write a function that will result in the weight (as part of a total grade of 100%) for an assessment whose name is provided. This function needs to include at least one helper function.
I was first provided with the following lines of code to run, which class() defines as a function:
assessmentTypeWeights <- c(5,15,5,3,2,10,10,10,40)
names(assessmentTypeWeights) <- c("quiz","hw","term1","term2","term3",
"exam1","exam2","exam3","final")
Then I was also provided with the following helper function:
assessmentPrefix <- function(assessmentName,assessmentTypeWeights)
{
if(assessmentName %in% names(assessmentTypeWeights))
{
return(assessmentName)
}else
{
# find the prefix of the assessment name
# by removing the last digit
prefix <- substring(assessmentName,1,nchar(assessmentName)-1)
return(prefix)
}
}
Finally, I was provided with the following framework for my answer:
assessmentWeight <- function(df, assessmentName, assessmentTypeWeights)
{
#includes helper function "assessmentPrefix"
}
Additionally, I had already written the following function for a previous question:
library(stringr)
assessmentCount <- function(df, assessmentNamePrefix)
{
sum(str_detect(names(df), assessmentNamePrefix))
}
I need to be able to write the code for the assessmentWeight function, in the framework given above, to get the exact results below when the following lines of code are executed:
assessmentWeight(df,"quiz1",assessmentTypeWeights)
# quiz
# 0.8333333
and
assessmentWeight(df,"term1",assessmentTypeWeights)
# term1
# 5
This is what I have written:
assessmentWeight <- function(df, assessmentName, assessmentTypeWeights)
{
assessmentPrefix(assessmentName, assessmentTypeWeights)
#determines prefix from name of assessment
assessmentCount(assessmentPrefix)
#determines number of columns in data.frame starting with that prefix
assessmentTypeWeights(assessmentPrefix)
#determines the assessment type weight based on the prefix
myAssessmentWeight <- (assessmentTypeWeights / assessmentCount) * 100
#adjusts the assessment type weight to be percentage of 100%
return(assessmentPrefix, myAssessmentWeight)
#returns the assessment prefix and its weight as part of 100%
}
However, when I run this code, I get the following error message:
Error in type(pattern) :
argument "assessmentNamePrefix" is missing, with no default
Of course, I don't know if that's the only problem with the code, and I don't know how to fix it. I have scoured the internet and three different books but have been unable to figure out what I need to change.
I'm thinking that I might need to include the function(?) names() in there somewhere? Or perhaps I'm completely off track?
Any help would be greatly appreciated. Thank you in advance for your time.
A couple of things I can see:
assessmentPrefix is returning a value, yet you're not assigning that value (using <-) to anything to use again.
Your assessmentCount function requires two arguments, and you are only providing one, so assessmentPrefix is passed as the df argument. Since you use this result again later, you need to assign the value that function returns to an object that you can reference in myAssessmentWeight.
Also, based on what you described, assessmentTypeWeights is a vector, not a function, so that line of code will also return an error once you fix the one above.
Hope that helps and can get you going in the right direction!
FYI, you'll get more help if you can include some detail on your input data structure and what you think your output should look like.
I finally figured it out. I doubt it's the most concise or best formatted code, but it works.
assessmentWeight <- function(df, assessmentName, assessmentTypeWeights)
{
myPrefix <- assessmentPrefix(assessmentName, assessmentTypeWeights)
myWeight <- assessmentTypeWeights[myPrefix]
myCols <- assessmentCount(df, myPrefix)
myAssessmentWeight <- (myWeight / myCols)
return(myAssessmentWeight)
}
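The original data frame isn't shown, so here is a check against a made-up data frame whose column names follow the same pattern: six quizzes, two homeworks and three term tests. These names and counts are assumptions for illustration only. The count helper is a base-R variant that uses grepl() with a ^ anchor instead of stringr::str_detect(), so the prefix has to match at the start of a column name:

```r
assessmentTypeWeights <- c(5, 15, 5, 3, 2, 10, 10, 10, 40)
names(assessmentTypeWeights) <- c("quiz", "hw", "term1", "term2", "term3",
                                  "exam1", "exam2", "exam3", "final")

# Helper from the assignment: drop the trailing digit unless the name
# is already one of the weight categories
assessmentPrefix <- function(assessmentName, assessmentTypeWeights) {
  if (assessmentName %in% names(assessmentTypeWeights)) {
    assessmentName
  } else {
    substring(assessmentName, 1, nchar(assessmentName) - 1)
  }
}

# Base-R variant of assessmentCount(): columns whose names start with the prefix
assessmentCount <- function(df, assessmentNamePrefix) {
  sum(grepl(paste0("^", assessmentNamePrefix), names(df)))
}

assessmentWeight <- function(df, assessmentName, assessmentTypeWeights) {
  myPrefix <- assessmentPrefix(assessmentName, assessmentTypeWeights)
  assessmentTypeWeights[myPrefix] / assessmentCount(df, myPrefix)
}

# Hypothetical grade sheet: 2 students, 11 assessment columns
df <- as.data.frame(matrix(0, nrow = 2, ncol = 11,
  dimnames = list(NULL, c(paste0("quiz", 1:6), paste0("hw", 1:2),
                          paste0("term", 1:3)))))
assessmentWeight(df, "quiz1", assessmentTypeWeights)  # quiz: 0.8333333
assessmentWeight(df, "term1", assessmentTypeWeights)  # term1: 5
```

Because the weight is looked up with assessmentTypeWeights[myPrefix], the result keeps the category name, which is why the output is labelled quiz or term1.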

R: IF object is TRUE then assign object NOT WORKING

I am trying to write a very basic IF statement in R and am stuck. I thought I'd find someone with the same problem, but I can't. I'm sorry if this has been solved before.
I want to check whether a variable/object has been assigned and, if so, execute a function that is part of an R package. First I wrote
FileAssignment <- function(x){
if(exists("x")==TRUE){
print("yes!")
x <- parse.vdjtools(x)
} else { print("Nope!")}
}
I assign a filename as x
FILENAME <- "FILENAME.txt"
I run the function
FileAssignment(FILENAME)
I use print("yes!") and print("Nope!") to check if the IF-Statement works, and it does. However, the parse.vdjtools(x) part is not assigned. Now I tested the same IF-statement outside of the function:
if(exists("FILENAME1")==TRUE){
FILENAME1 <- parse.vdjtools(FILENAME1)
}
This works. I read here that it might be because the function uses {} and the if-statement does too. So I should remove the brackets from the if-statement.
FileAssignment <- function(x){
if(exists("x")==TRUE)
x <- parse.vdjtools(x)
else { print("Nope!")
}
Did not work either.
I thought it might be related to the specific parse.vdjtools(x) function, so I just tried assigning a normal value to x with x <- 20. Also did not work inside the function, however, it does outside.
I don't really know what you are trying to achieve, but I would say that the use of exists in this context is wrong. x will always exist inside the function, because it is an argument of the function. See this example:
# All this does is report if x exists
f <- function(x){
if(exists("x"))
cat("Found x!", fill = TRUE)
}
f()
f("a")
f(iris)
# All will be found!
Perhaps investigate file.exists instead? It is vectorised, so several files can be checked at the same time.
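If the real goal is to check whether a file is on disk, here is a sketch of the vectorised call. It uses throwaway files from tempfile() as stand-ins for your real file names:

```r
# Two temporary file names; only the first file is actually created
present <- tempfile(fileext = ".txt")
absent <- tempfile(fileext = ".txt")
writeLines("example", present)

# One call checks both paths and returns one logical per path
file.exists(c(present, absent))  # TRUE FALSE
```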
The question that you are asking is less trivial than you seem to believe. There are two points that should be addressed to obtain the desired behavior, and especially the first one is somewhat tricky:
As pointed out by @NJBurgo and @KonradRudolph, the variable x will always exist within the function, since it is an argument of the function. In your case, exists() should therefore not check whether the variable x is defined. Instead, it should be used to verify whether a variable exists whose name matches the argument expression passed in as x (for example FILENAME in FileAssignment(FILENAME)).
This is achieved by combining deparse() and substitute():
if (exists(deparse(substitute(x)))) { …
Since x is defined only within the scope of the function, the superassignment operator <<- would be required to make a value assigned to x visible outside the function, as suggested by @thothai. However, functions should not have such side effects. Problems with this kind of programming include possible conflicts with another variable named x that could be defined in a different context outside the function body, as well as a lack of clarity concerning the operations performed by the function.
A better way is to return the value instead of assigning it to a variable.
Combining these two aspects, the function could be rewritten like this:
FileAssignment <- function(x){
if (exists(deparse(substitute(x)))) {
print("yes!")
return(parse.vdjtools(x))
} else {
print("Nope!")
return(NULL)}
}
In this version of the function, the scope of x is limited to the function body and the function has no side effects. The return value of FileAssignment(a) is either parse.vdjtools(a) or NULL, depending on whether a exists or not.
Outside the function, this value can be assigned to x with
x <- FileAssignment(a)
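To see both branches without installing the package that provides parse.vdjtools(), here is the same function with a hypothetical stand-in parser. toupper() is used only because it is built in; it does not parse anything. Because R evaluates arguments lazily, x is never evaluated when the named object does not exist, so no "object not found" error is raised:

```r
# Hypothetical stand-in for parse.vdjtools(), for demonstration only
parse_stub <- function(x) toupper(x)

FileAssignment <- function(x) {
  # substitute(x) recovers the expression passed in (e.g. the symbol a);
  # deparse() turns it into the string "a" for exists()
  if (exists(deparse(substitute(x)))) {
    print("yes!")
    return(parse_stub(x))
  } else {
    print("Nope!")
    return(NULL)
  }
}

a <- "FILENAME.txt"
x <- FileAssignment(a)                   # prints "yes!"
x                                        # "FILENAME.TXT"
y <- FileAssignment(not_defined_object)  # prints "Nope!"
y                                        # NULL
```

One caveat of this approach: if the object you pass in is itself named x, exists("x") finds the function's own argument and is always TRUE.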

Does an object need to be initialized before a for loop in R?

I am wondering if I can create an object within a for loop, i.e. without having to initialize it first. I have tried it the way one might do it in MATLAB. Please see the following R code:
> for (i in 1:nrow(snp.ids)) {
+ snp.fasta[i]<-entrez_fetch(db="protein", id=snp.ids[i,], rettype="xml",retmode="text")
+ snp.seq[i]<-xpathSApply(xmlParse(snp.fasta[i]), "//Seq-data_iupacaa",xmlValue)
+ }
Error in snp.fasta[i] <- entrez_fetch(db = "protein", id = snp.ids[i, :
object 'snp.fasta' not found
It obviously does not find snp.fasta, but as you can see from the code, I am trying to create snp.fasta. Can anyone shed light on why it is not created within the for loop, and what the proper way to initialize snp.fasta would be if I cannot create it within the loop?
Thanks
Generally, yes. That would be an acceptable way to loop over a vector of ids: just assign to a non-indexed object.
for (i in 1:nrow(snp.ids)) {
snp.fasta <- entrez_fetch(db="protein", id=snp.ids[i,], rettype="xml",retmode="text")
snp.seq <- xpathSApply(xmlParse(snp.fasta), "//Seq-data_iupacaa",xmlValue)
}
(You would then still need to assign any useful result to an indexable object, build up a sequence of such results within the loop, or print some result. As it stands, this example overwrites snp.seq on every iteration and leaves only the last value.)
It's a bit confusing to see id=snp.ids[i,]. That would imply that snp.ids is two-dimensional. I would have expected a column name or number to be used: id=snp.ids[i,"id"]. You should provide dput(head(snp.ids)) so we can do some realistic testing rather than this half-assed guesswork.
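In R, subset assignment is also a function call, so assigning a value to an element of a vector:
a[1] = 123
is evaluated as
a = `[<-`(a, 1, value = 123)
Here `[<-` is an ordinary (replacement) function, and it needs the current value of a as its first argument. If a is not defined, there is an error.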
Before the loop:
snp.fasta <- NULL
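Here is a self-contained illustration of both points. fake_fetch() is a made-up stand-in for the entrez_fetch()/xpathSApply() step, which needs network access. Indexing into an undefined object fails. Starting from a vector of the final length lets the loop fill it in (starting from NULL, as above, also works):

```r
ids <- c("id1", "id2", "id3")
# Hypothetical stand-in for the real download + parsing step
fake_fetch <- function(id) paste0("sequence-for-", id)

# Without initialization, the first indexed assignment fails
res <- tryCatch({
  for (i in seq_along(ids)) never_created[i] <- fake_fetch(ids[i])
  "no error"
}, error = function(e) conditionMessage(e))
res  # "object 'never_created' not found"

# Preallocate a character vector of the right length, then fill it
snp.seq <- character(length(ids))
for (i in seq_along(ids)) {
  snp.seq[i] <- fake_fetch(ids[i])
}
snp.seq
```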

R: Storing data within a function and retrieving without using "return"

The following simple example will help me address a problem in my program implementation.
fun2<-function(j)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod<-prod(x,y)
return(Sum)
}
j=1:10
Try<-lapply(j,fun2)
#
I want to store "Prod" at each iteration so that I can access it after running fun2. I tried using assign() to create space first, with assign("Prod",numeric(10),pos=1),
and then assigning Prod at the j-th iteration to Prod[j], but it does not work.
#
Any idea how this can be done?
Thank you
You can add anything you like in the return() command. You could return a list return(list(Sum,Prod)) or a data frame return(data.frame("In"=j,"Sum"=Sum,"Prod"=Prod))
I would then convert that list of data.frames into a single data.frame
Try2 <- do.call(rbind,Try)
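Putting the data-frame variant together, here is a runnable sketch. It is the question's fun2 changed to return a one-row data frame, and set.seed() only makes the run repeatable:

```r
fun2 <- function(j) {
  x <- rnorm(10)
  y <- runif(10)
  # Return both quantities, tagged with the iteration number
  data.frame(In = j, Sum = sum(x, y), Prod = prod(x, y))
}

set.seed(42)
Try <- lapply(1:10, fun2)    # a list of ten one-row data frames
Try2 <- do.call(rbind, Try)  # stacked into one 10 x 3 data frame
Try2$Prod                    # one Prod value per iteration
```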
Maybe re-think the problem in a more vectorized way, taking advantage of the implied symmetry to represent intermediate values as a matrix and operating on that:
ni = 10; nj = 20
x = matrix(rnorm(ni * nj), ni)
y = matrix(runif(ni * nj), ni)
sums = colSums(x + y)
prods = apply(x * y, 2, prod)
Thinking in vectorized terms applies to your 'real' problem just as much as to the sum/prod example. In practice, when a vectorized formulation fails, I've never used the environment or concatenation approaches from the other answers, but rather the simple solution of returning a list or vector.
I have done this before, and it works. It's good for a quick fix, but it's kind of sloppy. The <<- operator assigns outside the function: it modifies the variable in the first enclosing environment where it exists (here the global environment, where Prod is created), and creates it in the global environment if it is not found anywhere.
fun2<-function(j){
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod[j]<<-prod(x,y)
}
j=1:10
Prod <- numeric(length(j))
Try<-lapply(j,fun2)
Prod
thelatemail and JeremyS's solutions are probably what you want. Using lists is the normal way to pass back a bunch of different data items and I would encourage you to use it. Quoted here so no one thinks I'm advocating the direct option.
return(list(Sum,Prod))
Having said that, suppose that you really don't want to pass them back, you could also put them directly in the parent environment from within the function using either assign or the superassignment operator. This practice can be looked down on by functional programming purists, but it does work. This is basically what you were originally trying to do.
Here's the superassignment version
fun2<-function(j)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod[j] <<- prod(x,y)
return(Sum)
}
j=1:10
Prod <- numeric(10)
Try<-lapply(j,fun2)
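Note that superassignment searches back through the enclosing environments for the first one in which the variable exists and modifies it there. If none is found, it creates the variable in the global environment, so it is not appropriate for creating new variables one level above where you are.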
And an example version using the environment directly
fun2<-function(j,env)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
env$Prod[j] <- prod(x,y)
return(Sum)
}
j=1:10
Prod <- numeric(10)
Try<-lapply(j,fun2,env=parent.frame())
Notice that if you had called parent.frame() from within the function you would need to go back two frames because lapply() creates its own. This approach has the advantage that you could pass it any environment you want instead of parent.frame() and the value would be modified there. This is the seldom-used R implementation of writeable passing by reference. It's safer than superassignment because you know where the variable is that is being modified.
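Because env can be any environment, one option is to write into a dedicated environment created with new.env() instead of the caller's frame. This keeps the results out of the global workspace entirely. Here is a sketch (the name results is an illustrative choice):

```r
fun2 <- function(j, env) {
  x <- rnorm(10)
  y <- runif(10)
  Sum <- sum(x, y)
  # Environments are modified in place, so the caller sees this change
  env$Prod[j] <- prod(x, y)
  Sum
}

results <- new.env()
results$Prod <- numeric(10)
Try <- lapply(1:10, fun2, env = results)
results$Prod  # ten products, stored in results rather than in the workspace
```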
