i not showing up as number in loop - r

So I have a loop that finds the position in the matrix where there is the largest difference between consecutive elements. For example, if thematrix[8] and thematrix[9] have the largest difference between any two consecutive elements, the number given should be 8.
I made the loop in a way that it will ignore comparisons where one of the elements is NaN (because I have some of those in my data). The loop I made looks like this.
thenumber = 0 #will store the difference
for (i in 1:nrow(thematrix) - 1) {
if (!is.na(thematrix[i]) & !is.na(thematrix[i + 1])) {
if (abs(thematrix[i] - thematrix[i + 1]) > thenumber) {
thenumber = i
}
}
}
This looks like it should work, but whenever I run it I get:
Error in if (!is.na(thematrix[i]) & !is.na(thematrix[i + 1])) { :
argument is of length zero
I tried this thing but with a random number in the brackets instead of i and it works. For some reason it only doesn't work when I use the i specified in the beginning of the for-loop. It doesn't recognize that i represents a number. Why doesn't R recognize i?
Also, if there's a better way to do this task I'd appreciate it greatly if you could explain it to me

You are pretty close, but 1:nrow(thematrix) - 1 is evaluated as (1:nrow(thematrix)) - 1, so the sequence starts at 0 and the first iteration runs with i = 0, which is what causes this issue. I would suggest either using i in 1:nrow(thematrix) or i in 2:nrow(thematrix) - 1 to start your loop at i = 1. I think your approach is generally pretty intuitive, but one suggestion would be to use print() frequently to check how i changes over the course of the loop.

The issue is that the : operator has higher precedence than -; you just need to use parentheses around (nrow(thematrix)-1). For example,
thematrix <- matrix(1:10, nrow = 5)
##
wrong <- 1:nrow(thematrix) - 1
right <- 1:(nrow(thematrix) - 1)
##
R> wrong
#[1] 0 1 2 3 4
R> right
#[1] 1 2 3 4
The error message comes from trying to access the zeroth element of thematrix:
R> thematrix[0]
integer(0)
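For what it's worth, here is a minimal sketch of the loop with the parentheses fixed. It is not the asker's exact code: the function name largest_jump_loop and the variables best_diff and best_index are made up for illustration, and it assumes the data is a plain numeric vector. It also keeps the largest difference and its position in separate variables, since the original stores i in the same variable it compares differences against.

```r
# Hypothetical helper: position i where abs(v[i] - v[i + 1]) is largest,
# skipping pairs that involve NA/NaN
largest_jump_loop <- function(v) {
  best_diff <- -Inf          # largest difference seen so far
  best_index <- NA_integer_  # position where it was seen
  # 1:(n - 1) with explicit parentheses; empty when v has fewer than 2 elements
  for (i in seq_len(max(length(v) - 1, 0))) {
    d <- abs(v[i] - v[i + 1])
    if (!is.na(d) && d > best_diff) {  # is.na() is TRUE for NaN as well
      best_diff <- d
      best_index <- i
    }
  }
  best_index
}

largest_jump_loop(c(1, 2, NaN, 10, 3, 4))
# [1] 4
```

Using && instead of & in the if condition is a small extra safeguard, since if expects a single TRUE/FALSE.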

The other two answers address your question directly, but I must say this is about the worst possible way to solve this problem in R.
set.seed(1) # for reproducible example
x <- sample(1:10,10) # numbers 1:10 in random order
x
# [1] 3 4 5 7 2 8 9 6 10 1
which.max(abs(diff(x)))
# [1] 9
The diff(...) function calculates sequential differences, and which.max(...) identifies the element number of the maximum value in a vector.
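Since the question mentions NaN values, here is a small sketch of how the same one-liner behaves with them. The vector v is made up, and this assumes the data is a plain vector (or a single column pulled out of the matrix), because diff() on a matrix works column by column. which.max() discards missing and NaN values, so pairs involving NaN are skipped without any extra code.

```r
# Made-up vector with NaN entries, standing in for the asker's data
v <- c(1, 2, NaN, 10, 3, 4)

d <- abs(diff(v))    # 1 NaN NaN 7 1
idx <- which.max(d)  # NaN entries are ignored by which.max()
idx
# [1] 4
```

The result is the position of the first element of the pair, so here v[4] and v[5] have the largest difference.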

Related

Multiplying Elements of a Vector one more each time

I am trying to create a vector from another vector where I multiply the numbers in the vector one more each time.
For example if I had (1,2,3) the new vector would be (1, 1 x 2, 1 x 2 x 3)=(1,2,6)
I tried to create a loop for this as seen below. It seems to work for whole numbers but not decimals. I am not sure why.
x <- c(0.99,0.98,0.97,0.96,0.95)
for(i in 1:5){x[i]=prod(x[1:i])}
The result given is 0.9900000 0.9702000 0.9316831 0.8590845 0.7303385
which is incorrect, as prod(x) = 0.8582777, which is not the same as the last element of the vector.
Does anyone know why this is the case, or have a suggestion for improving my code to get the correct answer?
test<-c(1,2,3)
cumprod(test)
[1] 1 2 6
As @akrun suggests, one can achieve the same with:
Reduce("*", test, accumulate = TRUE)
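As for why the loop in the question goes wrong: it overwrites x in place, so by the time it computes prod(x[1:i]) the earlier elements already hold running products rather than the original values. With (1, 2, 3) this happens to give the right answer because the first element is 1. A minimal sketch, writing into a separate vector (y is a name I made up) and comparing with cumprod():

```r
x <- c(0.99, 0.98, 0.97, 0.96, 0.95)

# Write results into a new vector so x itself is never modified
y <- numeric(length(x))
for (i in seq_along(x)) {
  y[i] <- prod(x[1:i])
}

all.equal(y, cumprod(x))   # the loop now agrees with cumprod()
all.equal(y[5], prod(x))   # and the last element matches prod(x)
```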

Function in R does not return vector

I am sure this is a really dumb question with a simple answer, but I have been banging my head against the desk for an hour now. The goal is to write a simple function that returns a vector of length n consisting of integers spaced as evenly as possible, from 1 to k. So:
place_in_groups <- function(n, k){
rate = (n - 1) / (k - 1)
vect <- round(seq(from = 1, to = n, by = rate), 0)
return(vect)
}
When I run the lines inside the function on the outside of the function, it does what I want it to do: creates a vector with the appropriate values. But when I run it inside the function, I get the actual values, not the vector:
place_in_groups(4,5)
[1] 1 2 2 3 4
As I said, I'm sure it is something obvious I'm doing wrong, but it is also something I'm obviously in need of learning.
I'm not sure I understand the question correctly. Try str() on your results. What you are getting is a vector.
vect <- place_in_groups(4,5)
str(vect)
num [1:5] 1 2 2 3 4
What do you want to do with the vector, or what is your challenge?
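In case it helps, a few more checks along the same lines. R prints a vector as its values preceded by an index like [1], so the printed values are the vector. This sketch reuses the function from the question unchanged apart from dropping the explicit return():

```r
place_in_groups <- function(n, k) {
  rate <- (n - 1) / (k - 1)
  round(seq(from = 1, to = n, by = rate), 0)
}

vect <- place_in_groups(4, 5)
is.vector(vect)   # TRUE
length(vect)      # 5
vect[2:3]         # it can be indexed like any other vector
```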

R: Produce Index Values to Group Increasing Values in Vector

I have a list of increasing year values that occasionally has breaks in it and I want to create a grouping value for each unbroken sequence. Think of a vector like this one (missing 2005,2011):
x <- c(2001,2002,2003,2004,2006,2007,2008,2009,2010,2013,2014,2015,2016)
I would like to produce an equal length vector that numbers every value in a run with the same index to end up with something like this.
[1] 1 1 1 1 2 2 2 2 2 3 3 3 3
I would like to do this using best R practices so I am trying to avoid falling back to a for loop but I am not sure how to get from Vector A to Vector B. Does anyone have any suggestions?
Some things I know I can do:
I can flag the record before or after a gap as true with an ifelse
I can get the index of when the counter should change by wrapping that in a which statement
This is the code to do each
ifelse(!is.na(lag(x)) & x == lag(x)+1, FALSE, TRUE)
which(ifelse(!is.na(lag(x)) & x == lag(x)+1, FALSE, TRUE))
I think there are a couple of solutions to this problem. One, as d.b posted in the comment above, will produce a sequence that increments every time there is a break in the sequence.
cummax(c(1, diff(x)))
There is a similar solution that I chose to use, with ifelse() flagging breaks and cumsum(). I chose this solution because additional information, like other vectors, can be included in the decision, and diff seems to have problems with very erratic up and down values.
cumsum(ifelse(!is.na(lag(x)) & x == lag(x) + 1, FALSE, TRUE))
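One caveat: the lag() calls above only behave this way with dplyr::lag(); base R's stats::lag() does not shift a plain vector. Below is a base-R sketch of the same idea (the names starts, groups and y are mine), flagging the start of each run with diff() and counting the flags with cumsum(). The second vector is made up to show a case where the cummax() version does not number the runs 1, 2, 3.

```r
x <- c(2001,2002,2003,2004,2006,2007,2008,2009,2010,2013,2014,2015,2016)

# TRUE marks the first element of each unbroken run
starts <- c(TRUE, diff(x) != 1)
groups <- cumsum(starts)
groups
# [1] 1 1 1 1 2 2 2 2 2 3 3 3 3

# Made-up example where the gaps do not grow in size
y <- c(2001, 2002, 2005, 2006, 2008, 2009)
cummax(c(1, diff(y)))
# [1] 1 1 3 3 3 3
cumsum(c(TRUE, diff(y) != 1))
# [1] 1 1 2 2 3 3
```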

ifelse with for loop

I would like to traverse through rows of a matrix and perform some operations on data entries based on a condition.
Below is my code
m = matrix(c(1,2,NA,NA,5,NA,NA,1,NA,NA,NA,NA,4,5,NA,NA,NA,NA,NA,NA), nrow = 5, ncol = 4)
if (m[,colSums(!is.na(m)) > 1, drop = FALSE]){
for(i in 1:4){
a = which(m[i,] != "NA") - mean(which(!is.na(m[i,])))
for(j in 2:5){
b = which(m[j,] != "NA") - mean(which(!is.na(m[j,])))
prod(a,b)
}
}
}
I get a warning message as below in my "if" condition
Warning message:
In if (m[, colSums(!is.na(m)) > 1, drop = FALSE]) { :
the condition has length > 1 and only the first element will be used
I know it returns a vector and I should be using ifelse block. How to incorporate for loops inside ifelse block? It seems to be a basic question, I am new to R.
Based on your description, you want to check the number of non-NA values in the matrix by column and then do something depending on the result (that's why you need an "if"/"ifelse" statement). So you can implement it as below, and write the inner loops in a separate function.
yourFunc <- function(x, data) {
# do what you want / your loops on "data"
# sample, you can check the result in here
if(x > 1) 1
else 0
}
m = matrix(c(1,2,NA,NA,5,NA,NA,1,NA,NA,NA,NA,4,5,NA,NA,NA,NA,NA,NA), nrow = 5, ncol = 4)
# use "apply" series function in here
sapply(colSums(!is.na(m)), yourFunc, data=m)
#[1] 1 0 1 0
Actually, I think you need to re-organize your problem and optimize the code, the "ifelse with for loop" may be totally unnecessary.
As you are new to R, I assume that some of the terminology may be a bit confusing, so here is a short explanation of the if statement.
Let's look at the if condition:
m[,colSums(!is.na(m)) > 1, drop = FALSE]
     [,1] [,2]
[1,]    1   NA
[2,]    2   NA
[3,]   NA    4
[4,]   NA    5
[5,]    5   NA
This is not something if can work with, since an if condition has to be boolean (evaluate to a single TRUE/FALSE). So why this result? Well, the result of
colSums(!is.na(m))
[1] 3 1 2 0
is a vector of counts of entries that are not NA (= the number of TRUEs in each column). Be careful, as this is not the same as
colSums(m, na.rm = TRUE)
[1] 8 1 9 0
which returns a vector of sums over all five rows for each column, excluding NA's. My guess is that the latter is what you are looking for. In any case: be aware of the difference!
By asking which of those sums is greater than 1 you do get a boolean vector
colSums(!is.na(m)) > 1
[1] TRUE FALSE TRUE FALSE
However, using that boolean vector as a criterion for selecting columns, you correctly get a matrix, which is obviously not boolean:
m[,colSums(!is.na(m)) > 1]
Note: drop = FALSE is unnecessary here, as more than one column is selected and so no dimension can be dropped. See ?"[" or ?drop. You can verify this using identical:
identical(m[,colSums(!is.na(m)) > 1, drop = FALSE],
m[,colSums(!is.na(m)) > 1])
Now to the loop. You will find tons of discussions on avoiding for loops and using the apply family of functions. I suspect you will have to take some time to go through all that. Note, however, that using apply is, contrary to common belief, not necessarily superior to a for loop in terms of speed, as it is actually just a fancy wrapper around a for loop (check the source code!). It is, however, clearly superior in terms of code clarity, as it is compact and clear about what it is doing. So do try to use apply functions if possible!
In order to rewrite your loop, it would be helpful if you could describe in words what you actually want to do, since I assume that what the loop is doing right now is probably not what you want. As which() returns the index/position of an element in a vector or matrix, what you are basically doing is:
indices of the i'th row that are not NA (for a given column) - mean over these indices
While this is technically possible, it usually doesn't make much sense. So, with all my notes at hand: clearly state your problem so we can think of a fix.
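To make the points about the if condition and apply a bit more concrete, here is a rough sketch. It is not a fix for the original loop, since the goal is still unclear. It only shows one way to collapse the logical vector to a single TRUE/FALSE with any(), and that apply and a for loop give the same result for a simple per-row count. The names keep, sub, counts_apply and counts_loop are made up. As far as I know, recent R versions (4.2.0 and later) turn the length > 1 condition from a warning into an error.

```r
m <- matrix(c(1,2,NA,NA,5,NA,NA,1,NA,NA,NA,NA,4,5,NA,NA,NA,NA,NA,NA),
            nrow = 5, ncol = 4)

keep <- colSums(!is.na(m)) > 1   # logical vector, one entry per column

if (any(keep)) {                 # any() gives a single TRUE/FALSE
  sub <- m[, keep, drop = FALSE]

  # Number of non-NA entries per row, via apply ...
  counts_apply <- apply(sub, 1, function(row) sum(!is.na(row)))

  # ... and via an explicit for loop
  counts_loop <- integer(nrow(sub))
  for (i in seq_len(nrow(sub))) {
    counts_loop[i] <- sum(!is.na(sub[i, ]))
  }
}

all(counts_apply == counts_loop)
# [1] TRUE
```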

How to improve performance on counting columns in a matrix which are below a threshold?

In my code I am subtracting one column of a matrix from every other column of the same matrix.
Then I count how many of the new columns have only elements that are smaller than r.
I'm doing this for each column of the matrix. You can see my code below. I left out the part where I put values into the matrix.
Is there any way to improve the performance of this code? I can't seem to figure out a way to make this faster
B = matrix(NA,(m),(window_step))
B_m_r = c(1:(window_step))
for (i in 1:(window_step)){
B_m_r[i] = sum(apply(abs(B[,-i]-B[,i]), 2,function(x) max(x) < r))
}
Solution
B = matrix(NA,(m),(window_step))
B_m_r = c(1:(window_step))
buffer_B = matrix(NA,(window_step-1),(window_step-1))
for (i in 1:(window_step-2)){
buffer_B[i,c(i:(window_step-1))] = apply(abs(B[,-c(1:i)]-B[,i]),2,function(x) max(x) < r)
B_m_r[i] = (sum(buffer_B[i,c(i:(window_step-1))])+sum(buffer_B[1:i,i]))
}
B_m_r[window_step] = sum(buffer_B[1:(window_step-1),(window_step-1)])
B_m_r[window_step-1] = sum(buffer_B[1:(window_step-2),(window_step-2)])
Ok so based on the help from Яaffael I found a solution, that doesn't calculate the differences twice.
Instead I save the result of the comparison with r from previous loops in the matrix buffer_B and use them in the next loop to calculate the sum of all columns that are smaller than r.
Now the code takes only half the time to finish.
Thanks!
You can for example reduce the calculation time by 50% by only checking "< r" for half of the column differences because they are effectively symmetric.
You are calculating abs(first of B - last of B) and abs(last of B - first of B).
PLUS you can precalculate the handled difference matrix instead of using a for loop to set it up step by step.
# I am using single-row matrices to keep it simple
> A <- matrix(1:4,ncol=4)
> A[,1:ceiling(ncol(A)/2)]
[1] 1 2
> A[,ncol(A):(floor(ncol(A)/2)+1)]
[1] 4 3
> A <- matrix(1:5,ncol=5)
> A[,1:ceiling(ncol(A)/2)]
[1] 1 2 3
> A[,ncol(A):(floor(ncol(A)/2)+1)]
[1] 5 4 3
> abs(A[,1:ceiling(ncol(A)/2)] - A[,ncol(A):(floor(ncol(A)/2)+1)])
[1] 4 2 0
When you want to speed up code in R, the first thing you should try is to turn loops into vectorized expressions using R functions. A loop is executed step by step by the R interpreter, whereas a vectorized function call lets R do the work in what is essentially compiled C code.
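One possible vectorized version, as a sketch rather than a drop-in replacement: dist() from the stats package with method = "maximum" computes the largest absolute difference between every pair of rows, so applying it to t(B) gives the same quantity for every pair of columns in one call. The data, sizes and threshold below are made up, and the minus 1 assumes r > 0, so that each column's comparison with itself always counts and has to be removed.

```r
set.seed(42)   # made-up data for illustration
m <- 4
window_step <- 6
r <- 0.8
B <- matrix(runif(m * window_step), m, window_step)

# Loop from the question, kept for comparison
loop_counts <- integer(window_step)
for (i in seq_len(window_step)) {
  loop_counts[i] <- sum(apply(abs(B[, -i] - B[, i]), 2, function(x) max(x) < r))
}

# Pairwise maximum absolute difference between columns
# (dist() works on rows, hence the transpose)
D <- as.matrix(dist(t(B), method = "maximum"))
vec_counts <- unname(rowSums(D < r)) - 1   # drop the self-comparison on the diagonal

all(loop_counts == vec_counts)
```

Note that the full distance matrix has window_step x window_step entries, so for very large window_step memory may become the limiting factor.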

Resources