Explaining a for loop in R - r

I'm very new to R, and even newer to programming in R. I have the following question and its answer (which is not mine). I'm trying to understand where some of the values come from and why they are used.
Question: Make the vector 3 5 7 9 11 13 15 17 with a for loop. Start
with x=numeric() and fill this vector with the for loop
I know I have to create x=numeric() so I can fill it with the result obtained from the loop.
The answer from a classmate was:
x <- numeric()
for(i in 1:8){
  if(i==1){ ## Why ==1 and not 0, or any other value
    x[i] <- 3
  }else{
    x[i] <- x[i-1]+2 ### And why i-1
  }
}
I'm having similar problems in questions like:
Make a for loop that adds the second element of a vector to the first,
subtracts the third element from the result, adds the fourth again and
so on for the entire length of the vector
So far, I created the vector and a variable to hold the result:
y = c(5, 10, 15, 20, 25, 30)
answer <- 0
And then, when I try to do the for loop, I get stuck here:
for(i in 1:length(y)){
if(i...){ ### ==1? ==0?
answer = y[i] ###and here I really don't know how to continue.
}else if()
}
Believe me when I tell you I've read several replies to questions here, like in How to make a vector using a for loop, plus pages and pages about for loop, but cannot really figure how to solve these (and other) problems.
I repeat, I'm very new, so I'm struggling trying to understand it. Any help would be much appreciated.

First, I will annotate the loop to answer what the loop is doing.
# Initialize the vector
x <- numeric()
for(i in 1:8){
# Initialize the first element of the vector, x[1]. Remember, R indexes start at 1, not 0.
if(i==1){
x[i] <- 3
} else {
# Define each additional element in terms of the previous one;
# x[i - 1] is the element of x just before the current one.
x[i] <- x[i-1]+2
}
}
A better solution that still uses a loop and grows the vector (as the instructions state) is something like this:
x <- numeric()
for(i in 1:8){
x[i] <- 2 * i + 1
}
This is still not a good way to do things because growing a vector inside a loop is very slow. To fix this, you can preallocate the vector by telling numeric the length of the vector you want:
x <- numeric(8)
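Putting the preallocation together with the loop might look like the following minimal sketch. It only combines the two snippets above; seq_along() is used here so the loop runs over exactly the positions that were allocated.

```r
# Preallocate all 8 slots up front, then fill each one in place.
x <- numeric(8)
for (i in seq_along(x)) {
  x[i] <- 2 * i + 1
}
print(x)
```

Because x already has its final length, no element is ever appended, so R does not need to copy the vector on each iteration.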
The best way to solve this would be:
2 * 1:8 + 1
using vectorized operations.
To help you solve your other problem, I suggest writing out each step of the loop as a table. For example, for my solution, the table would be
i | x[i]
------------------
1 | 2 * 1 + 1 = 3
2 | 2 * 2 + 1 = 5
and so on. This will give you an idea of what the for loop is doing at each iteration.

This is intentionally not an answer because there are better ways to solve the alternating sign summation problem than a for-loop. I suppose there could be value in getting comfortable with for-loops but the vectorized approaches in R should be learned as well. R has "argument recycling" for many of its operations, including the "*" (multiplication) operation: Look at:
(1:10)*c(1,-1)
Then take an arbitrary vector, say vec and try:
sum( vec*c(1,-1) )
The more correct answer after looking at that result would be:
vec[1] + sum( vec[-1]*c(1,-1) )
Which has the educational advantage of illustrating R's negative indexing. Look up "argument recycling" in your documentation. The shorter objects are automagically duplicated/triplicated/however-many-needed-cated to match the length of the longest vector in the mathematical or logical expression.
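For the alternating-sum question, the loop and the vectorized approach can be sketched side by side. This is a minimal sketch, assuming the y vector from the question; the names signs and vectorized are illustrative, and rep_len() is used instead of plain recycling only to avoid the warning R gives when the longer length is not a multiple of the shorter one.

```r
y <- c(5, 10, 15, 20, 25, 30)

# Loop version: start from the first element, then add the elements at
# even positions and subtract the elements at odd positions.
answer <- y[1]
for (i in seq_along(y)[-1]) {
  if (i %% 2 == 0) {
    answer <- answer + y[i]
  } else {
    answer <- answer - y[i]
  }
}

# Vectorized version: build the +1, -1, +1, ... pattern explicitly.
signs <- rep_len(c(1, -1), length(y) - 1)  # illustrative helper vector
vectorized <- y[1] + sum(y[-1] * signs)

print(answer)
print(vectorized)
```

Both versions compute 5 + 10 - 15 + 20 - 25 + 30 for this input.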

Related

Iteration equation [duplicate]

I am very new to R and am in need of some help. I am trying to write code for the following:
suppose x[0]=1 and
x[j]=x[j-1]+(2/x[j-1])
for j=1,2,...
Write a program to find the first 10 values, i.e. x[0],x[1],...x[9]
I believe I have to write a for()
loop but I am struggling to get the right combination. Any help you can provide would be greatly appreciated.
Here is where I'm at right now:
x=1
for(j in 1:10){
x=x[j-1]+(2/x[j-1])
print(x)
}
Yes, this is for homework. The x[0] is supposed to be x (subscript) 0. I'm unsure how to write that any other way.
Some pointers:
1) The goal should probably be to create a vector x with 10 elements
2) In R, vector indices start at 1 (instead of 0), so you have that x[1] = 1.
3) In R, a single number is in fact a vector of length 1, so you can initiate this vector by writing x <- 1
4) Since you already have the first element and the loop uses the preceding element to create the next element, the loop should start at j = 2.
5) In R, when you assign an element to a vector outside its length, R will expand the vector to the necessary length. I.e. you can write
x <- 1
x[2] <- 3.14
and have a vector x = [1, 3.14]
So the setup can look like this:
x <- 1
for(j in 2:10){
#do stuff to generate the x vector
}
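One way the setup above could be filled in is sketched below. It is only a sketch under the pointers given: the mathematical x[0] is stored in x[1], and the vector is preallocated with numeric(10) rather than grown, which is a substitution for the x <- 1 start shown above.

```r
# x[1] holds the mathematical x[0]; x[j] holds the mathematical x[j-1].
x <- numeric(10)
x[1] <- 1
for (j in 2:10) {
  x[j] <- x[j - 1] + 2 / x[j - 1]
}
print(x)
```

Each element depends only on the one before it, which is why the loop has to start at j = 2.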

Add element in a vector while looping in R

I have a problem to solve in R where I may need to add elements to a vector while I am looping over it with a for loop, but the loop does not go through the new values.
I made a simple loop to explain the type of problem I have.
Here is the code:
c=c(1,2)
for(i in c){
c=c(c,i+2)
print(i)
}
And the result:
[1] 1
[1] 2
I would like this result:
[1] 1
[1] 2
[1] 3
[1] 4
It continues until I reach a condition.
Can someone tell me whether this is possible, perhaps in another way?
Thank you,
Robin
You could use a while loop instead:
test <- c(1,2)
n <- 1
while(n <= length(test)){
if(n == 5){
print(test)
break
}
print(test[n])
test <- c(test, n+2)
n <- n + 1
}
Note that without a stopping condition this loop would run forever, because test grows on every iteration, so you should add some other condition to stop the loop at some point (here I quit it at 5).
Sidenote: You use c as a name for c(1,2). That's generally a bad idea, because c is the base R function for combining values into vectors. It's always a good idea to avoid using names that are already used for other things by R itself.
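As a variation on the while loop above, the stopping rule can be tied to the values themselves. The sketch below is an assumption about what "until I reach a condition" might mean: limit, queue, and visited are hypothetical names, and a new element is appended only while it stays at or below limit.

```r
queue <- c(1, 2)        # hypothetical name, replaces the vector called c
limit <- 4              # hypothetical stopping condition
visited <- numeric(0)
n <- 1
while (n <= length(queue)) {
  current <- queue[n]
  visited <- c(visited, current)
  # Only append a new element while it stays within the limit.
  if (current + 2 <= limit) {
    queue <- c(queue, current + 2)
  }
  n <- n + 1
}
print(visited)
```

Because length(queue) is re-evaluated on every pass, elements appended inside the loop are visited too, which a for loop over the original vector does not do.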

Struggling creating a difference function

So I have a homework problem that I am really struggling to code in R.
This is the problem: Write a function difference() that takes a vector X as a parameter and returns a vector of the
difference between each element and the next element:
X[2]-X[1], X[3]-X[2], X[4]-X[3], etc.
Thus difference(c(5,2,9,4,8)) would return c(-3,7,-5,4)
And so far I have this:
difference<-function(X) {
for (i in X)
X.val<-X[i]-X[i-1]
return(X.val)
}
difference(c(5,2,9,4,8))
I can't seem to get the function to subtract X[2]-X[1], and it is returning one more number than it should when I run the function. Can anyone help me?
You're having a couple of problems with your code. Since this is homework, I'm not going to provide the correct code, but I'll help highlight where you're going wrong to help you get closer. The only reason I'm not providing the answer is because these are good learning experiences. If you comment with updated attempts, I'll continue to update my answer to guide you.
The issue is that you're using for (i in X), which will actually loop through the values of X and not its index. So, in your example, i will equal 5 and then 2 and then 9 and then 4 and then 8. If we start with i == 5, the code is doing this: X.val <- X[5] - X[5 - 1]. At this point you'd assign X.val to be 4 because X[5] is equal to 8 and X[4] is equal to 4. At the next iteration, i == 2. So this will set X.val to -3 because X[2] is 2 and X[1] is 5.
To fix this issue, you'd want to loop through the index of X instead. You can do this by using for (i in 1:length(X)) where length(X) will give you a number equal to the number of elements in X.
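The difference between looping over values and looping over indices can be seen in a small sketch. The names by_value and by_index are illustrative only, and seq_along(X) is used as a stand-in for 1:length(X); the two give the same indices for a non-empty vector.

```r
X <- c(5, 2, 9, 4, 8)

# Looping over X itself: i takes the values 5, 2, 9, 4, 8.
by_value <- numeric(0)
for (i in X) {
  by_value <- c(by_value, i)
}

# Looping over the index of X: i takes the positions 1, 2, 3, 4, 5.
by_index <- integer(0)
for (i in seq_along(X)) {
  by_index <- c(by_index, i)
}

print(by_value)
print(by_index)
```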
The next issue you've found is that you're getting one extra number. It's important to think about how many numbers you should have in your output and what this means in terms of where i should start. Hint: should you really be starting at 1?
Lastly, you overwrite X.val in each iteration. It surprises me that you were getting an extra number in your results, because you should have received only NA: the last value of i is 8, and X does not have 8 elements. Nevertheless, you'll need to rewrite your code so that you don't overwrite X.val, but instead append to it on each iteration.
I hope that helps.
UPDATE #1
As noted in the comments below, your code now looks like this:
difference <- function(X) {
for (i in 2:length(X)) {
X[i] <- X[i] - X[i-1]
}
return(X)
}
difference(c(5, 2, 9, 4, 8))
We are now very, very close to a final solution. We just need to address a quick problem.
The problem is that we're now overwriting our value of X, which is bad. Since our numbers, c(5,2,9,4,8), are passed into the function as the variable X, the line X[i] <- X[i] - X[i-1] will start to overwrite our values. So, stepping through one iteration at a time, we get the following:
Step 1:
i gets set to 2
X[2] is currently equal to 2
We then run the line X[i] <- X[i] - X[i-1], which gets evaluated like this: X[2] <- X[2] - X[1] --> X[2] <- 2 - 5 --> X[2] <- -3
X[2] is now set to -3
Step 2:
i gets set to 3
X[3] is currently equal to 9
We then run the X[i] <- X[i] - X[i-1], which gets evaluated like this: X[3] <- X[3] - X[2] --> X[3] <- 9 - -3 --> X[3] <- 12
X[3] is now set to 12
As you can see from the first two iterations, we're overwriting our X variable, which is directly impacting the differences we get when we run our function.
To solve this, we simply go back to using X.val, like we were before. Since this variable has no values, there's nothing to be overwritten. Our function now looks like this:
difference <- function(X) {
for (i in 2:length(X)) {
X.val[i] <- X[i] - X[i-1]
}
return(X.val)
}
Now, for each iteration, nothing is overwritten and our values of X stay intact. There are two problems that we're going to have, though. If we run this new code, we'll end up with an error telling us that X.val doesn't exist. Earlier, I told you that you can index a variable that you're making, which is true. We just have to tell R that the variable exists first. There are several ways to do this, but the second best way is to create a variable with the same class as our expected output. Since we know we want our output to be a vector of numbers, we can just make X.val a numeric vector. Our code now looks like this:
difference <- function(X) {
X.val <- numeric()
for (i in 2:length(X)) {
X.val[i] <- X[i] - X[i-1]
}
return(X.val)
}
Notice that the assignment of X.val happens before we enter the for loop. As an exercise, you should think about why that's the case and then try moving it inside of the for loop and seeing what happens.
So this solves our first problem. Try running the code and seeing what you get. You'll notice that the first element of the output is NA. Why might this be the case, and how can we fix it? Hint: it has to do with the value of i.
UPDATE #2
So now that we have the correct answer, let's look at a couple of tips and tricks that are available thanks to R. R has some inherent features that it can use on vectors. To see this in action, run the following example:
a <- 1:10
b <- 11:20
a + b
a - b
a * b
a / b
As you can see, R will automatically perform what is called "element wise" operations for vectors. You'll notice that a - b is pretty similar to what we were trying to do here. The difference is that a and b are two different vectors and we were dealing with one vector at a time. So how do we set up our problem to work like this? Simple: we create two vectors.
x <- c(5, 2, 9, 4, 8)
y <- x[2:length(x)]
z <- x[1:(length(x)-1)]
y - z
You should notice that y - z now gives us the answer that we wanted from our function. We can apply that to our difference function like so:
difference <- function(X) {
y <- X[2:length(X)]
z <- X[1:(length(X)-1)]
return(y-z)
}
Using this trick, we no longer need to use a for loop, which can be incredibly slow in R, and instead use the vectorized operation, which is incredibly fast in R. As was stated in the comments, we can actually skip the step of assigning those values to y and z and can instead just directly return what we want:
difference <- function(X) {
return(X[2:length(X)] - X[1:(length(X)-1)])
}
We've now just successfully created a one-line function that does what we were hoping to do. Let's see if we can make it even cleaner. R comes with two functions that are very handy for looking at data: head() and tail(). head allows you to look at the first n number of elements and tail allows you to look at the last n number of elements. Let's see an example.
a <- 1:50
head(a) # defaults to 6 elements
tail(a) # defaults to 6 elements
head(a, n=20) # we can change how many elements to return
tail(a, n=20)
head(a, n=-1) # returns all but the last element
tail(a, n=-1) # returns all but the first element
Those last two are the most important for what we want to do. In our newest version of difference we were looking at X[2:length(X)], which is another way of saying "all elements in X except the first element". We were also looking at X[1:(length(X)-1)], which is another way of saying "all elements in X except the last element". Let's clean that up:
difference <- function(X) {
return(tail(X, -1) - head(X, -1))
}
As you can see, that's a much cleaner way of defining our function.
So those are the tricks. Let's look at a couple of tips. The first is to drop the return from simple functions like this. R automatically returns the value of the last expression in a function; if that expression is an assignment, the value is returned invisibly, so nothing is printed. To see this in action, try running the two different functions:
difference_1 <- function(X) {
x.diff <- tail(X, -1) - head(X, -1)
}
difference_1(1:10)
difference_2 <- function(X) {
tail(X, -1) - head(X, -1)
}
difference_2(1:10)
In difference_1 you'll notice that nothing is printed. This is because the last expression is an assignment, so its value is returned invisibly. You could force it to return a visible value by using the return command.
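To check what difference_1 actually hands back, base R's withVisible() can be used. This is a small sketch; res is just an illustrative name.

```r
difference_1 <- function(X) {
  x.diff <- tail(X, -1) - head(X, -1)
}

# withVisible() reports both the value and whether it would auto-print.
res <- withVisible(difference_1(1:10))
print(res$visible)
print(res$value)
```

The value is there; it is only the automatic printing at the console that is suppressed, which is why assigning the call to a variable or wrapping it in print() also works.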
The next tip is something you won't need for a while, but it's important. Going back to the current version of difference that we have (the code you're using now, not anything I've mentioned in this update), we assign values to X.val, which causes it to "grow" over time. To see what this means, run the following code:
x.val <- numeric()
length(x.val)
x.val[1] <- 1
length(x.val)
x.val[2] <- 2
length(x.val)
You'll see that the length keeps growing. This is often a point of huge slowdowns in R code. The proper way to do this is to create x.val with a length equal to how big we need it. This is much, much faster and will save you some pains in the future. Here's how it would work:
difference <- function(X) {
x.val <- numeric(length=(length(X) - 1))
for (i in 2:length(X)) {
x.val[i-1] <- X[i] - X[i-1]
}
return(x.val)
}
In our current code, this doesn't make a real difference. But if you're dealing with very large data in the future, this can save you hours or even days of computing time.
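As a sanity check, the loop version and the vectorized version can be compared against base R's diff(), which computes the same lagged differences. The sketch below uses the illustrative names difference_loop and difference_vec, and adds a guard for inputs shorter than two elements, since 2:length(X) would count backwards in that case; that guard is an addition, not part of the code above.

```r
difference_loop <- function(X) {
  n <- length(X)
  if (n < 2) return(numeric(0))   # added guard: 2:1 would loop backwards
  x.val <- numeric(n - 1)         # preallocate the result
  for (i in 2:n) {
    x.val[i - 1] <- X[i] - X[i - 1]
  }
  x.val
}

difference_vec <- function(X) tail(X, -1) - head(X, -1)

X <- c(5, 2, 9, 4, 8)
print(difference_loop(X))
print(difference_vec(X))
print(diff(X))
```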
I hope this all helps you better understand some of the functionality in R. Good luck with everything!

Make nested loops more efficient?

I'm analyzing large sets of data using the following script:
M <- c_alignment
c_check <- function(x){
if (x == c_1) {
1
}else{
0
}
}
both_c_check <- function(x){
if (x[res_1] == c_1 && x[res_2] == c_1) {
1
}else{
0
}
}
variance_function <- function(x,y){
sqrt(x*(1-x))*sqrt(y*(1-y))
}
frames_total <- nrow(M)
cols <- ncol(M)
c_vector <- apply(M, 2, max)
freq_vector <- matrix(nrow = sum(c_vector))
co_freq_matrix <- matrix(nrow = sum(c_vector), ncol = sum(c_vector))
insertion <- 0
res_1_insertion <- 0
for (res_1 in 1:cols){
for (c_1 in 1:c_vector[res_1]){
res_1_insertion <- res_1_insertion + 1
insertion <- insertion + 1
res_1_subset <- sapply(M[,res_1], c_check)
freq_vector[insertion] <- sum(res_1_subset)/frames_total
res_2_insertion <- 0
for (res_2 in 1:cols){
if (is.na(co_freq_matrix[res_1_insertion, res_2_insertion + 1])){
for (c_2 in 1:max(c_vector[res_2])){
res_2_insertion <- res_2_insertion + 1
both_res_subset <- apply(M, 1, both_c_check)
co_freq_matrix[res_1_insertion, res_2_insertion] <- sum(both_res_subset)/frames_total
co_freq_matrix[res_2_insertion, res_1_insertion] <- sum(both_res_subset)/frames_total
}
}
}
}
}
covariance_matrix <- (co_freq_matrix - crossprod(t(freq_vector)))
variance_matrix <- matrix(outer(freq_vector, freq_vector, variance_function), ncol = length(freq_vector))
correlation_coefficient_matrix <- covariance_matrix/variance_matrix
A model input would be something like this:
1 2 1 4 3
1 3 4 2 1
2 3 3 3 1
1 1 2 1 2
2 3 4 4 2
What I'm calculating is the binomial covariance for each state found in M[,i] with each state found in M[,j]. Each row is the state found for that trial, and I want to see how the state of the columns co-vary.
Clarification: I'm finding the covariance of two multinomial distributions, but I'm doing it through binomial comparisons.
The input is a 4200 x 510 matrix, and the c value for each column is about 15 on average. I know for loops are terribly slow in R, but I'm not sure how I can use the apply function here. If anyone has a suggestion as to properly using apply here, I'd really appreciate it. Right now the script takes several hours. Thanks!
I thought of writing a comment, but I have too much to say.
First of all, if you think apply goes faster, look at Is R's apply family more than syntactic sugar? . It might be, but it's far from guaranteed.
Next, please don't grow matrices as you move through your code; that slows it down incredibly. Preallocate the matrix and fill it up, which can speed up your code more than tenfold. You're growing different vectors and matrices throughout your code, and that's insane (forgive the strong language).
Then, look at the help page of ?subset and the warning given there:
This is a convenience function intended for use interactively. For
programming it is better to use the standard subsetting functions like
[, and in particular the non-standard evaluation of argument subset
can have unanticipated consequences.
Always. Use. Indices.
Further, you recalculate the same values over and over again. freq_res_2, for example, is calculated for every res_2 and state_2 as many times as you have combinations of res_1 and state_1. That's just a waste of resources. Move out of your loops whatever you don't need to recalculate, and save it in matrices you can just access again.
Heck, now I'm at it: Please use vectorized functions. Think again and see what you can drag out of the loops : This is what I see as the core of your calculation:
cov <- (freq_both - (freq_res_1)*(freq_res_2)) /
(sqrt(freq_res_1*(1-freq_res_1))*sqrt(freq_res_2*(1-freq_res_2)))
As I see it, you can construct a matrix freq_both, freq_res_1 and freq_res_2 and use them as input for that one line. And that will be the whole covariance matrix (don't call it cov, cov is a function). Exit loops. Enter fast code.
Given the fact I have no clue what's in c_alignment, I'm not going to rewrite your code for you, but you definitely should get rid of the C way of thinking and start thinking R.
Let this be a start: The R Inferno
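One way the matrices for that one-line formula could be built without loops is sketched below on the small sample input from the question. It is only a sketch under an assumption: states in each column are coded as integers 1..max, as the sample suggests. The names indicator, freq, co_freq, and scale are illustrative and do not come from the original script.

```r
M <- matrix(c(1, 2, 1, 4, 3,
              1, 3, 4, 2, 1,
              2, 3, 3, 3, 1,
              1, 1, 2, 1, 2,
              2, 3, 4, 4, 2), nrow = 5, byrow = TRUE)

# One 0/1 indicator column per (column, state) pair.
indicator <- do.call(cbind, lapply(seq_len(ncol(M)), function(j) {
  states <- seq_len(max(M[, j]))
  outer(M[, j], states, "==") * 1
}))

freq <- colMeans(indicator)                  # frequency of each state
co_freq <- crossprod(indicator) / nrow(M)    # joint frequency of each pair
covariance <- co_freq - outer(freq, freq)
scale <- sqrt(freq * (1 - freq))
correlation <- covariance / outer(scale, scale)

print(dim(correlation))
```

Any state that never occurs, or that occurs in every row, has zero variance, so its row and column of the correlation matrix would come out as NaN and would need separate handling.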
It's not really the 4 way nested loops but the way your code is growing memory on each iteration. That's happening 4 times where I've placed # ** on the cbind and rbind lines. Standard advice in R (and Matlab and Python) in situations like this is to allocate in advance and then fill it in. That's what the apply functions do. They allocate a list as long as the known number of results, assign each result to each slot, and then merge all the results together at the end. In your case you could just allocate the correct size matrix in advance and assign into it at those 4 points (roughly speaking). That should be as fast as the apply family, and you might find it easier to code.

missing value where TRUE/FALSE needed error in R

I have got a column with different numbers (from 1 to tt) and would like to use looping to perform a count on the occurrence of these numbers in R.
count = matrix(ncol=1,nrow=tt) #creating an empty matrix
for (j in 1:tt)
{count[j] = 0} #initiate count at 0
for (j in 1:tt)
{
for (i in 1:N) #for each observation (1 to N)
{
if (column[i] == j)
{count[j] = count[j] + 1 }
}
}
Unfortunately I keep getting this error.
Error in if (column[i] == j) { :
missing value where TRUE/FALSE needed
So I tried:
for (i in 1:N) #from obs 1 to obs N
if (column[i] = 1) print("Test")
I basically got the same error.
I tried to do a bit of research on this kind of error, and a lot of what I found talks about "debugging", which I'm not familiar with.
Hopefully someone can tell me what's happening here. Thanks!
As you progress with your learning of R, one feature you should be aware of is vectorisation. Many operations that (in C, say) would have to be done in a loop can be done all at once in R. This is particularly true when you have a vector/matrix/array and a scalar, and want to perform an operation between them.
Say you want to add 2 to the vector myvector. The C/C++ way to do it in R would be to use a loop:
for ( i in 1:length(myvector) )
myvector[i] = myvector[i] + 2
Since R has vectorisation, you can do the addition without a loop at all, that is, add a scalar to a vector:
myvector = myvector + 2
Vectorisation means the loop is done internally. This is much more efficient than writing the loop within R itself! (If you've ever done any Matlab or python/numpy it's much the same in this sense).
I know you're new to R so this is a bit confusing but just keep in mind that often loops can be eliminated in R.
With that in mind, let's look at your code:
The initialisation of count to 0 can be done at creation, so the first loop is unnecessary.
count = matrix(0,ncol=1,nrow=tt)
Secondly, because of vectorisation, you can compare a vector to a scalar.
So for your inner loop in i, instead of looping through column and doing if column[i]==j, you can do idx = (column==j). This returns a vector that is TRUE where column[i]==j and FALSE otherwise.
To find how many elements of column are equal to j, we just count how many TRUEs there are in idx. That is, we do sum(idx).
So your double-loop can be rewritten like so:
for ( j in 1:tt ) {
idx = (column == j)
count[j] = sum(idx) # no need to add
}
Now it's even possible to remove the outer loop in j by using the function sapply:
sapply( 1:tt, function(j) sum(column==j) )
The above line of code means: "for each j in 1:tt, call the function with j", and it returns a vector where the j'th element is the result of the function.
So in summary, you can reduce your entire code to:
count = sapply( 1:tt, function(j) sum(column==j) )
(Although this doesn't explain your error, which I suspect is to do with the construction or class of your column).
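One common way to get this error is an NA in the condition of an if statement, for example when column contains missing values or when N is larger than length(column). Whether that is the cause here is an assumption, since the question doesn't show column. The sketch below uses made-up data to reproduce the message and shows na.rm = TRUE as one way to count around missing values.

```r
column <- c(1, 2, NA, 2, 3)   # made-up data with one missing value
tt <- 3

# if (NA == 1) cannot be decided, so R raises an error; capture its message.
result <- tryCatch(
  if (column[3] == 1) "match" else "no match",
  error = function(e) conditionMessage(e)
)
print(result)

# The vectorised count can simply skip missing values.
count <- sapply(1:tt, function(j) sum(column == j, na.rm = TRUE))
print(count)
```

Checking anyNA(column) and length(column) before the loop would be a quick way to confirm or rule out this explanation.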
I suggest not using for loops, and instead using the count function from the plyr package. This function does exactly what you want in one line of code.
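If adding a package is not an option, base R's tabulate() gives the same kind of count for positive integer codes. This is a minimal sketch with made-up data; column and tt mirror the names in the question.

```r
column <- c(1, 2, 2, 3, 3, 3)   # made-up data
tt <- 4

# tabulate() counts how often each of 1..nbins occurs, including zeros.
count <- tabulate(column, nbins = tt)
print(count)
```

Unlike table(), tabulate() keeps a slot for every value from 1 to tt even when that value never occurs, which matches the tt-row count matrix in the question.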
