Stopping a loop in R when NA referenced - r

I'm an inexperienced R programmer, trying to make a piece of code I have written work. This is probably an elementary problem. I want this code to check each value against its predecessor in a vector and, if it exceeds the predecessor by at least a certain threshold, to return which element of the vector satisfies this criterion. Once it has found one case, I'd like it to stop.
At present my code half-functions as I'd like it to, but it goes through the whole vector and once it reaches the end it checks a[i+1] which is NA and gives me an error message.
testdata<-c(0,0.1,0.2,0.3,0.45,0.5,0.6,0.7,0.8,0.9,1.0)
MLD<-function(a,...){
x<-NULL
y<-NULL
for(i in seq(along=a)){
if(a[i+1]>=a[i]+0.125)
{x=c(x,a[i+1]); y=which(a==x); print(y)}
}
}
try(MLD(testdata),silent=TRUE) # code finds right element
MLD(testdata) # but continues looking until it runs out of data
I know I need a break() or a stop() somewhere but I can't seem to work it out, I hope you can help me.

You can simplify your code to:
which(diff(testdata) > 0.125) + 1
Which you could put in a function:
MLD = function(a) which(diff(a) > 0.125) + 1
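If you would rather keep an explicit loop, here is a minimal sketch (the name MLD_first and the threshold argument are made up for illustration). It stops the index one short of the end, so a[i + 1] is never NA, and it leaves the function as soon as the first jump is found:

```r
# Hypothetical loop version: index of the first element that exceeds
# its predecessor by at least `threshold`.
MLD_first <- function(a, threshold = 0.125) {
  # Stop one element early so that a[i + 1] always exists
  for (i in seq_len(max(length(a) - 1, 0))) {
    if (a[i + 1] >= a[i] + threshold) {
      return(i + 1)  # returning from the function also ends the loop
    }
  }
  NA_integer_  # no jump found
}

testdata <- c(0, 0.1, 0.2, 0.3, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)
MLD_first(testdata)  # 5
```

Taking only the first match from the vectorised version, (which(diff(a) > 0.125) + 1)[1], should give the same index for this data.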

Related

Why do we need to assign an empty vector for the output before for looping?

I'm trying to use a for loop to write a function that extracts the data of one column from two CSV files and works out the mean of these data. I'm wondering why the output of the for loop needs to be initialised as an empty vector c() to make the function work. I printed the intermediate output of the for loop, trying to figure out the reason. When I left out means<-c(), I got an unexpected 1,2,3,4,5,6 in "means" on each pass of the loop. Could anyone kindly explain what is going on?
Thank you very much.
pollutantmean <- function(directory, pollutant, id = 1:332) {
means<-c()
for(monitor in id){
path <- paste(getwd(), "/", directory, "/", sprintf("%03d", monitor), ".csv", sep = "")
monitor_data <- read.csv(path)
interested_data <- monitor_data[pollutant]
means <- c(means, interested_data[!is.na(interested_data)])
print(monitor_data)
print(means)
print(interested_data)
}
mean(means)
}
pollutantmean("specdata", "sulfate", 1:2)
correct output with assigning means<-c()
wrong output without means<-c() before "for loop"
I think I found the essence of the question, which is that I don't understand the logic behind:
> x=c(x,1)
> x
[1] 9 1
> x=c()
> x=c(x,1)
> x
[1] 1
This can represent a single pass of the for loop, such as the first one. I'm confused about how the value of x is assigned in each of the two runs above: why does the first one give 9 1 while the second gives the expected result? Can anyone explain what happens behind those two runs? I'd be really grateful for an answer.
In the case that doesn't work, you still have the line
means <- c(means, interested_data[!is.na(interested_data)])
that refers to means on the right hand side. If you didn't set means <- c(), then you'll get different results depending on whatever happened to be stored in that variable before executing this code. If there's no such variable, you'll get an error.
By the way, this isn't a great way to write R code, even with the means <- c() line. The problem is that every time you execute the line above you need to make a slightly longer vector to hold means. In your case you don't know how many values you'll be adding, so it's excusable, but in the more common case where you will always be adding one more entry, it's a lot more efficient to set up the result vector to the right length in advance and assign values using something like
means[i] <- newValue
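A small sketch of the two styles, using a made-up task (storing the squares of 1 to n) rather than the pollutant data:

```r
n <- 5

# Growing: every c() call builds a new, slightly longer vector
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, i^2)
}

# Preallocating: create the full-length vector once, then fill it by index
filled <- numeric(n)
for (i in seq_len(n)) {
  filled[i] <- i^2
}

identical(grown, filled)  # TRUE: same result, but `filled` is never copied to grow
```

For five elements the cost is negligible; the gap only matters once the loop runs many thousands of times.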

Problem with checking logical within for loop

Inspired by the LeetCode "Two Sum" challenge, I wanted to solve it in R. But while trying to solve it by brute force I ran into an issue with my for loop.
The basic idea is: given a vector of integers, find which two integers in the vector sum to a given target integer.
First I create 10000 integers:
set.seed(1234)
n_numbers <- 10000
nums <- sample(-10^4:10^4, n_numbers, replace = FALSE)
Then I do a for loop within a for loop to check every single element against each other.
# ensure that it is actually solvable
target <- nums[11] + nums[111]
test <- 0
for (i in 1:(length(nums)-1)) {
for (j in 1:(length(nums)-1)) {
j <- j + 1
test <- nums[i] + nums[j]
if (test == target) {
print(i)
print(j)
break
}
}
}
My problem is that it starts wildly printing numbers before ever getting to the right condition of test == target. And I cannot seem to figure out why.
I think there are several issues with your code:
First, you don't have to increase your j manually, you can do this within the for-statement. So if you really want to increase your j by 1 in every step you can just write:
for (j in 2:(length(nums)))
Second, break only exits the inner for loop; the outer loop keeps running. See the question "Breaking out of nested loops in R" for further information on that.
Third, there are several pairs of entries in nums that sum to target. Your if-condition therefore works as written and prints every combination of nums[i] + nums[j] that is equal to target.
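One way to stop at the first match, sketched here with a hypothetical helper first_pair() and a tiny made-up vector, is to wrap both loops in a function and call return() from inside them, which leaves both loops at once. Starting j at i + 1 also avoids pairing an element with itself:

```r
# Hypothetical helper: positions of the first pair that sums to `target`
first_pair <- function(nums, target) {
  n <- length(nums)
  for (i in seq_len(max(n - 1, 0))) {
    for (j in (i + 1):n) {          # only look at elements after position i
      if (nums[i] + nums[j] == target) {
        return(c(i, j))             # exits the inner and the outer loop
      }
    }
  }
  NULL                              # no pair found
}

first_pair(c(2, 7, 11, 15), 9)      # 1 2
```

This is still the brute-force approach; it only changes where the search stops.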

Inconsistency between the result of nested for loop and the independent function

I'm making a loop for calculating the power of Hardy-Weinberg test for a variant called 'snp1' (with HWPower command, from HardyWeinberg package). This command needs the inputs n (sample size) and pA (minor allele frequency). I have to do the calculation many separate times with lots of ns and pAs, because they represent different population samples, so I did the first two manually, but now I want to make a for loop for all the others.
I started with a simple loop with the first two, so I can easily check that the results are ok (and, thus, that the loop is working fine). But I encountered a problem when comparing the results of both calculations, which brings me to think that my code is not totally fine.
install.packages("HardyWeinberg")
library(HardyWeinberg)
snp1n=c(661,503)
snp1pA=c(0.006051,0.174)
HWpowersnp1<-numeric(2)
for(i in seq_along(snp1n)) {
for(j in seq_along(snp1pA)) {
HWpowersnp1[i]<-HWPower(n=snp1n[i],pA=snp1pA[j])
}
}
HWpowersnp1
This gives me the following vector:
HWpowersnp1
[1] 0.04109278 0.04253145
But when I calculate each of them using the function alone, I get:
HWPower(n = 661,pA = 0.006051)
[1] 0.02107572
HWPower(n = 503, nA = 175)
[1] 0.04253145
I don't know what is causing the inconsistency. It's strange because it only affects the first result, not the second (the second calculated power is fine and matches, but the first one doesn't).
Your double loop is wrong. If you run this
for(i in seq_along(snp1n)) {
for(j in seq_along(snp1pA)) {
print(paste(i,j))
HWpowersnp1[i]<-HWPower(n=snp1n[i],pA=snp1pA[j])
}
}
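You will see that your function runs 4 times, not twice as you expect. Because the inner loop assigns to HWpowersnp1[i] for every j, each entry ends up holding the result for the last pA (0.174), which is why only the first value is wrong. You want to iterate over snp1n and snp1pA simultaneously, so you should use mapply or Map. Map returns a list; mapply with the same arguments simplifies the result to a numeric vector. Try this instead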
HWpowersnp1 <- Map(HWPower, n=snp1n, pA=snp1pA)
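To see the pairing behaviour without installing HardyWeinberg, here is a sketch with a stand-in function, toy_power(), which is purely hypothetical and just multiplies its arguments. The input values are made up as well:

```r
# Stand-in for HWPower(); hypothetical, only there to make the pairing visible
toy_power <- function(n, pA) n * pA

ns  <- c(10, 40)
pAs <- c(0.5, 0.25)

# Nested loop: nested[i] is overwritten for every j, so only the last pA survives
nested <- numeric(2)
for (i in seq_along(ns)) {
  for (j in seq_along(pAs)) {
    nested[i] <- toy_power(n = ns[i], pA = pAs[j])
  }
}
nested   # 2.5 10.0  (both entries used pA = 0.25)

# Paired iteration: element i of ns goes with element i of pAs
paired <- mapply(toy_power, n = ns, pA = pAs)
paired   # 5 10
```

As in the question, only the first entry differs, because the last pair happens to be the one the nested loop evaluates last.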

Struggling creating a difference function

So I have a homework problem that I am really struggling to code in R.
This is the problem: Write a function difference() that takes a vector X as a parameter and returns a vector of the difference between each element and the next element:
X[2]-X[1], X[3]-X[2], X[4]-X[3], etc.
Thus difference(c(5,2,9,4,8)) would return c(-3,7,-5,4)
And so far I have this:
difference<-function(X) {
for (i in X)
X.val<-X[i]-X[i-1]
return(X.val)
}
difference(c(5,2,9,4,8))
I can't seem to get the function to subtract X[2]-X[1], and it is returning one more number than it should when I run the function. Can anyone help me?
You're having a couple of problems with your code. Since this is homework, I'm not going to provide the correct code, but I'll help highlight where you're going wrong to help you get closer. The only reason I'm not providing the answer is because these are good learning experiences. If you comment with updated attempts, I'll continue to update my answer to guide you.
The issue is that you're using for (i in X), which will actually loop through the values of X and not its index. So, in your example, i will equal 5 and then 2 and then 9 and then 4 and then 8. If we start with i == 5, the code is doing this: X.val <- X[5] - X[5 - 1]. At this point you'd assign X.val to be 4 because X[5] is equal to 8 and X[4] is equal to 4. At the next iteration, i == 2. So this will set X.val to -3 because X[2] is 2 and X[1] is 5.
To fix this issue, you'd want to loop through the index of X instead. You can do this by using for (i in 1:length(X)) where length(X) will give you a number equal to the number of elements in X.
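The difference between the two kinds of loop is easy to see on a small made-up vector (deliberately not the homework data):

```r
v <- c(10, 20, 30)

for (i in v) print(i)              # i takes the values 10, 20, 30
for (i in seq_along(v)) print(i)   # i takes the positions 1, 2, 3

# seq_along(v) gives the same positions as 1:length(v) for a non-empty vector
identical(seq_along(v), 1:length(v))
```

seq_along() is offered here as a slightly safer spelling of 1:length(v), since it yields an empty sequence when the vector is empty.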
The next issue you've found is that you're getting one extra number. It's important to think about how many numbers you should have in your output and what this means in terms of where i should start. Hint: should you really be starting at 1?
Lastly, you overwrite X.val in each iteration. It surprises me that you were getting an extra number in your results: you should have received only NA, because the last value of i is 8 and X does not have 8 elements. Nevertheless, you'll need to rewrite your code so that you don't overwrite X.val but instead append to it on each iteration.
I hope that helps.
UPDATE #1
As noted in the comments below, your code now looks like this:
difference <- function(X) {
for (i in 2:length(X)) {
X[i] <- X[i] - X[i-1]
}
return(X)
}
difference(c(5, 2, 9, 4, 8))
We are now very, very close to a final solution. We just need to address a quick problem.
The problem is that we're now overwriting our value of X, which is bad. Since our numbers, c(5,2,9,4,8), are passed into the function as the variable X, the line X[i] <- X[i] - X[i-1] will start to overwrite our values. So, stepping through one iteration at a time, we get the following:
Step 1:
i gets set to 2
X[2] is currently equal to 2
We then run the line X[i] <- X[i] - X[i-1], which gets evaluated like this: X[2] <- X[2] - X[1] --> X[2] <- 2 - 5 --> X[2] <- -3
X[2] is now set to -3
Step 2:
i gets set to 3
X[3] is currently equal to 9
We then run the X[i] <- X[i] - X[i-1], which gets evaluated like this: X[3] <- X[3] - X[2] --> X[3] <- 9 - -3 --> X[3] <- 12
X[3] is now set to 12
As you can see from the first two iterations, we're overwriting our X variable, which is directly impacting the differences we get when we run our function.
To solve this, we simply go back to using X.val, like we were before. Since this variable has no values, there's nothing to be overwritten. Our function now looks like this:
difference <- function(X) {
for (i in 2:length(X)) {
X.val[i] <- X[i] - X[i-1]
}
return(X.val)
}
Now, for each iteration, nothing is overwritten and our values of X stay intact. There are two problems that we're going to have, though. If we run this new code, we'll end up with an error telling us that X.val doesn't exist. Earlier, I told you that you can index a variable that you're making, which is true. We just have to tell R that the variable we're making is a variable first. There are several ways to do this, but the second best way is to create a variable with the same class as our expected output. Since we know we want our output to be a vector of numbers, we can just make X.val a numeric vector. Our code now looks like this:
difference <- function(X) {
X.val <- numeric()
for (i in 2:length(X)) {
X.val[i] <- X[i] - X[i-1]
}
return(X.val)
}
Notice that the assignment of X.val happens before we enter the for loop. As an exercise, you should think about why that's the case and then try moving it inside of the for loop and seeing what happens.
So this solves our first problem. Try running the code and seeing what you get. You'll notice that the first element of the output is NA. Why might this be the case, and how can we fix it? Hint: it has to do with the value of i.
UPDATE #2
So now that we have the correct answer, let's look at a couple of tips and tricks that are available thanks to R. R has some built-in features that it can use on vectors. To see this in action, run the following example:
a <- 1:10
b <- 11:20
a + b
a - b
a * b
a / b
As you can see, R will automatically perform what is called "element wise" operations for vectors. You'll notice that a - b is pretty similar to what we were trying to do here. The difference is that a and b are two different vectors and we were dealing with one vector at a time. So how do we set up our problem to work like this? Simple: we create two vectors.
x <- c(5, 2, 9, 4, 8)
y <- x[2:length(x)]
z <- x[1:(length(x)-1)]
y - z
You should notice that y - z now gives us the answer that we wanted from our function. We can apply that to our difference function like so:
difference <- function(X) {
y <- X[2:length(X)]
z <- X[1:(length(X)-1)]
return(y-z)
}
Using this trick, we no longer need to use a for loop, which can be incredibly slow in R, and instead use the vectorized operation, which is incredibly fast in R. As was stated in the comments, we can actually skip the step of assigning those values to y and z and can instead just directly return what we want:
difference <- function(X) {
return(X[2:length(X)] - X[1:(length(X)-1)])
}
We've now just successfully created a one-line function that does what we were hoping to do. Let's see if we can make it even cleaner. R comes with two functions that are very handy for looking at data: head() and tail(). head allows you to look at the first n number of elements and tail allows you to look at the last n number of elements. Let's see an example.
a <- 1:50
head(a) # defaults to 6 elements
tail(a) # defaults to 6 elements
head(a, n=20) # we can change how many elements to return
tail(a, n=20)
head(a, n=-1) # returns all but the last element
tail(a, n=-1) # returns all but the first element
Those last two are the most important for what we want to do. In our newest version of difference we were looking at X[2:length(X)], which is another way of saying "all elements in X except the first element". We were also looking at X[1:(length(X)-1)], which is another way of saying "all elements in X except the last element". Let's clean that up:
difference <- function(X) {
return(tail(X, -1) - head(X, -1))
}
As you can see, that's a much cleaner way of defining our function.
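As a sanity check, base R already ships a diff() function that computes these lagged differences, so the hand-written version can be compared against it. A minimal sketch:

```r
difference <- function(X) {
  return(tail(X, -1) - head(X, -1))
}

x <- c(5, 2, 9, 4, 8)
difference(x)                       # -3  7 -5  4
all.equal(difference(x), diff(x))   # TRUE
```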
So those are the tricks. Let's look at a couple of tips. The first is to drop the return from simple functions like this. R automatically returns the value of the last expression in a function, and prints it at the console as long as that expression is not an assignment. To see this in action, try running the two different functions:
difference_1 <- function(X) {
x.diff <- tail(X, -1) - head(X, -1)
}
difference_1(1:10)
difference_2 <- function(X) {
tail(X, -1) - head(X, -1)
}
difference_2(1:10)
In difference_1 you'll notice that nothing is printed. This is because the last expression is an assignment, whose value is returned invisibly. You could make the result print by returning x.diff explicitly with the return command.
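A short check, offered as a sketch, that the value of the assignment still comes back from the function even though the console shows nothing:

```r
difference_1 <- function(X) {
  x.diff <- tail(X, -1) - head(X, -1)
}

difference_1(1:10)             # shows nothing at the console
result <- difference_1(1:10)   # but the value can still be captured
result                         # nine 1s
(difference_1(1:10))           # wrapping the call in parentheses also prints it
```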
The next tip is something you won't need for a while, but it's important. Going back to the current version of difference that we have (the code you're using now, not anything I've mentioned in this update), we assign values to X.val, which causes it to "grow" over time. To see what this means, run the following code:
x.val <- numeric()
length(x.val)
x.val[1] <- 1
length(x.val)
x.val[2] <- 2
length(x.val)
You'll see that the length keeps growing. This is often a point of huge slowdowns in R code. The proper way to do this is to create x.val with a length equal to how big we need it. This is much, much faster and will save you some pains in the future. Here's how it would work:
difference <- function(X) {
x.val <- numeric(length=(length(X) - 1))
for (i in 2:length(X)) {
x.val[i-1] <- X[i] - X[i-1]
}
return(x.val)
}
In our current code, this doesn't make a real difference. But if you're dealing with very large data in the future, this can save you hours or even days of computing time.
I hope this all helps you better understand some of the functionality in R. Good luck with everything!

R mapply() on specific function with recursive form (using for)

I am working with some R code that I'm sure could be written using one of the apply family of functions, but I can't work out how. I have a data frame with multiple columns and I want to call a function whose inputs come from several of those columns. Let's say I have this data and a function f:
data<- data.frame(T=c(1,2,3,4), S=c(3,7,8,4), K=c(5,6,11,9))
data
V<-c(0.1,0.2,0.3,0.4,0.5,0.6)
f<-function(para_h,S,T,a,t,b){
r<- V
steps<-T
# Recursive form: Terminal condition for the A and B at time T
A_T=0
B_T=0
A=c()
B=c()
# A and B a time T-1
A[1]= r[steps]*a
B[1]= a*para_h[5]+ ((para_h[4])^(-2))
# Recursion back to time t
for (i in 2:steps){
A[i]= A[i-1]+ r[steps-i+1]*a + para_h[1]*B[i-1]
B[i]= para_h[2]*B[i-1]+a*para_h[5]+ (para_h[4]^(-2))
}
f = exp(log(S)*a + A[t] + B[t]*b )
return(f)
}
This function works well for some specific values :
> para_h<-c(0.1,0.2,0.3,0.4,0.5,0.7)
> f(para_h,S=3,T=2,a=0.4,t=1,b=0.1)
[1] 3.204144
I want to apply a function to each column S and T in a data frame. So, my code looks like:
mapply(function(para_h,S,T,a,t,b) f(para_h,S,T,a,t,b) ,para_h,S=data$S,T=data$T,a=0.4,t=1,b=0.1)
This gives an error:
> mapply(function(para_h,S,T,a,t,b) f(para_h,S,T,a,t,b) ,para_h,S=data$S,T=data$T,a=0.4,t=1,b=0.1)
Error in A[i] = A[i - 1] + r[steps - i + 1] * a + para_h[1] * B[i - 1] :
replacement has length zero
I'm pretty sure the problem is that "steps" is a vector. I would really appreciate an elegant solution.
I hope this has made some sort of sense, any advice would be greatly appreciated.
Couple of things:
1) Each call of your function expects the full para_h vector, but in your mapply code it receives only one value at a time, so you probably want something like this:
mapply(function(S,T) f(para_h,S,T,a=0.4,t=1,b=0.1), data$S, data$T)
or this:
apply(data,1,function(d) f(para_h,d['S'],d['T'],a=0.4,t=1,b=0.1))
2) Your function throws error when T==1 (which is the case in the first row of data), so you might need to modify your sample data set to be able to run this code.
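The T == 1 failure comes from the loop header for (i in 2:steps). When steps is 1, 2:1 counts down (2, 1) instead of being empty, so the loop body runs and indexes r[steps - i + 1], which is r[0], an empty vector. A minimal sketch of the pitfall and one possible guard (seq_len(steps)[-1] is a suggestion, not part of the original code):

```r
steps <- 1

2:steps               # 2 1: counts down, so the loop body would still run
seq_len(steps)[-1]    # integer(0): empty, so the loop body would be skipped

r <- c(0.1, 0.2, 0.3)
length(r[0])          # 0, hence "replacement has length zero" on assignment

steps <- 3
seq_len(steps)[-1]    # 2 3: same as 2:steps whenever steps >= 2
```

Whether skipping the recursion is the right behaviour for T == 1 depends on the model, so treat the guard as a starting point only.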
