Using the following code:
a <- seq(1, 10, 1)
b <- seq(2, 20, 2)
I would like to subtract a[i - 1] from b[i] for each i, in order to obtain something like
c <- NULL
for(i in 1:length(b)) {
c[i] <- b[i] - a[i - 1]
}
but I would like to do this without using a for() loop.
Does anyone know how to do it in a single command?
Since your a and b are the same length, I've assumed you'd like to first trim the last element off of b. (Try b - a[-1] to see why that's probably desirable.)
b[-length(b)] - a[-1]
# [1] 0 1 2 3 4 5 6 7 8
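Read literally, the loop in the question pairs b[i] with a[i - 1], which shifts the vectors the other way round. A minimal sketch of that variant, assuming the first element of b (which has no partner a[0]) should simply be dropped:

```r
a <- seq(1, 10, 1)
b <- seq(2, 20, 2)

# b[i] - a[i - 1] for i = 2, ..., length(b):
# drop the first element of b and the last element of a so the pairs line up
res <- b[-1] - a[-length(a)]
res
# [1]  3  4  5  6  7  8  9 10 11
```

Which of the two alignments is wanted depends on how the question is read, so it is worth checking the first pair by hand.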
You can do this with time series:
a <- ts(seq(1, 10, 1))
b <- ts(seq(2, 20, 2))
b - lag(a, 1)
##-----
Time Series:
Start = 1
End = 9
Frequency = 1
[1] 0 1 2 3 4 5 6 7 8
Not that I am necessarily recommending this. The base time-series formalism is a widely feared source of confusion. Most people avoid it, giving preference to the zoo and xts classed objects.
I am attempting to create a loop that runs a function with specific values of i in a vector:
For example, I would like to save i + 2 when i is 1 and 5:
test<- c()
for(i in c(1,5)){
test[i] <- i + 2
}
This ends up printing NA for positions 2, 3 and 4:
[1] 3 NA NA NA 7
while the result I would like is:
[1] 3 7
This is probably very elementary but I cannot seem to figure this out.
R is vectorized, which means you can do this:
c(1, 5) + 2
# [1] 3 7
for loops in R can be slow, especially when they grow an object on every iteration. Functions of the *apply family run the loop in internal code and are often more compact, e.g.
sapply(c(1, 5), \(i) i + 2)
# [1] 3 7
If you really need to rely on a for loop, you may want to loop over the indices rather than the values (using the values as indices is quite a common mistake!):
v <- c(1, 5)
test <- vector('numeric', length(v))
for (i in seq_along(v)) {
test[i] <- v[i] + 2
}
test
# [1] 3 7
Use append
test<- c()
for(i in c(1,5)){
test<-append(test,i+2)
}
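The NA values in the question come from indexing by value: assigning to position 5 of a shorter vector pads the gap with NA. A small sketch of that behaviour in base R (the variable names v and dense are only illustrative):

```r
test <- c()
test[1] <- 1 + 2   # test now has length 1
test[5] <- 5 + 2   # positions 2 to 4 do not exist yet, so R fills them with NA
test
# [1]  3 NA NA NA  7

# Indexing by position instead keeps the result dense
v <- c(1, 5)
dense <- vapply(v, function(i) i + 2, numeric(1))
dense
# [1] 3 7
```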
I am working with the R programming language. I am trying to build a loop that performs the following :
Step 1: Keep generating two random numbers "a" and "b" until both "a" and "b" are greater than 12
Step 2: Track how many random numbers had to be generated before Step 1 was completed
Step 3: Repeat Step 1 and Step 2 100 times
Since I do not know how to keep generating random numbers until a condition is met, I tried to generate a large amount of random numbers hoping that the condition is met (there is probably a better way to write this):
results <- list()
for (i in 1:100){
# do until break
repeat {
# repeat many random numbers
a = rnorm(10000,10,1)
b = rnorm(10000,10,1)
# does any pair meet the requirement
if (any(a > 12 & b > 12)) {
# put it in a data.frame
d_i = data.frame(a,b)
# end repeat
break
}
}
# select all rows until the first time the requirement is met
# it must be met, otherwise the loop would not have ended
d_i <- d_i[1:which(d_i$a > 10 & d_i$b > 10)[1], ]
# prep other variables and only keep last row (i.e. the row where the condition was met)
d_i$index = seq_len(nrow(d_i))
d_i$iteration = as.factor(i)
e_i = d_i[nrow(d_i),]
results[[i]] <- e_i
}
results_df <- do.call(rbind.data.frame, results)
Problem: When I look at the results, I noticed that the loop is incorrectly considering the condition to be met, for example:
head(results_df)
a b index iteration
4 10.29053 10.56263 4 1
5 10.95308 10.32236 5 2
3 10.74808 10.50135 3 3
13 11.87705 10.75067 13 4
1 10.17850 10.58678 1 5
14 10.14741 11.07238 1 6
For instance, in each one of these rows - both "a" and "b" are smaller than 12.
Does anyone know why this is happening and can someone please show me how to fix this problem?
Thanks!
How about this way? Since you tagged the question while-loop, I tried using one.
res <- matrix(0, nrow = 0, ncol = 3)
for (j in 1:100){
a <- rnorm(1, 10, 1)
b <- rnorm(1, 10, 1)
i <- 1
while(a < 12 | b < 12) {
a <- rnorm(1, 10, 1)
b <- rnorm(1, 10, 1)
i <- i + 1
}
x <- c(a,b,i)
res <- rbind(res, x)
}
head(res)
[,1] [,2] [,3]
x 12.14232 12.08977 399
x 12.27158 12.01319 1695
x 12.57345 12.42135 302
x 12.07494 12.64841 600
x 12.03210 12.07949 82
x 12.34006 12.00365 782
dim(res)
[1] 100 3
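As for why the original loop reports rows below 12: the repeat block tests a > 12 & b > 12, but the row selection afterwards uses a > 10 & b > 10, so it stops at the first pair above 10. A hedged sketch of one way to make the two agree while keeping the batch approach; first_hit, threshold, n and the seed are names and values chosen here for illustration:

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Draw batches of n pairs until one pair exceeds `threshold` in both
# coordinates; return that pair and how many pairs were drawn in total.
first_hit <- function(threshold, n = 10000) {
  offset <- 0  # pairs drawn in earlier batches that contained no hit
  repeat {
    a <- rnorm(n, 10, 1)
    b <- rnorm(n, 10, 1)
    hit <- which(a > threshold & b > threshold)  # same threshold as the stop rule
    if (length(hit) > 0) {
      k <- hit[1]
      return(data.frame(a = a[k], b = b[k], index = offset + k))
    }
    offset <- offset + n
  }
}

results <- lapply(1:100, function(i) {
  res <- first_hit(12)
  res$iteration <- i
  res
})
results_df <- do.call(rbind, results)
head(results_df)
```

Every row of results_df should now have both a and b above 12, and index counts all pairs drawn, including those from batches without a hit.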
Good morning,
I have the following problem.
My Data.frame "data" has the format:
Type amount
1 2
2 0
3 3
I would like to create a vector with the format:
1
1
3
3
3
This means I would like to transform my data.
I created a vector and wrote the following code for my transformation in R:
vector <- numeric(5)
for (i in 1:3){
k <- 1
while (k <= data[i,2]){
vector[k] <- data[i,1]
k <- k+1
}
}
The problem is, I get the following results and I have no idea where I go wrong…
3
3
3
0
0
There might be many different ways in solving this particular problem in R but I am curious why my solution doesn't work. I am thankful for alternatives, but really would like to know what my mistake is.
Thanks for your help!
Try this solution:
df <- data.frame(type = c(1, 2, 3), amount = c(2, 0, 3))
result <- unlist(mapply(function(x, y) rep.int(x, y), df[, "type"], df[, "amount"]))
result
The output is as follows:
# [1] 1 1 3 3 3
Your code is indeed buggy. The corrected code should look like this:
df <- data.frame(type = c(1, 2, 3), amount = c(2, 0, 3))
vector <- numeric(5)
k <- 1
for (i in 1:3) {
j <- 1
while (j <= df[i, 2]) {
vector[k] <- df[i, 1]
k <- k + 1
j <- j + 1
}
}
vector
# [1] 1 1 3 3 3
Probably the fastest and most elegant way to obtain this result has been posted before in a comment by @akrun:
with(data, rep(Type, amount))
[1] 1 1 3 3 3
However, if you want to do this with for/while loops, it could be helpful to use a list for such cases, where the number of entries is not known at the beginning.
Here is an example with minimal modifications of your code:
my_list <- vector("list", 3)
for (i in 1:3) {
k <- 1
while (k <= data[i,2]){
my_list[[i]][k] <- data[i,1]
k <- k + 1
}
}
vector <- unlist(my_list)
#> vector
#[1] 1 1 3 3 3
The reason why your code didn't work was essentially that you were trying to put too much information into a single variable, k. It cannot serve both as an index into your output vector and as a counter for the individual entries in the first column of data, because that counter is reset to 1 each time the while loop starts for a new row.
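To make the double role of k visible, this sketch reruns the loop from the question and records every position that gets written. The data frame is rebuilt from the table in the question; written is a helper variable added only for the trace:

```r
data <- data.frame(Type = c(1, 2, 3), amount = c(2, 0, 3))
vector <- numeric(5)
written <- c()                     # positions of `vector` that were assigned
for (i in 1:3) {
  k <- 1                           # reset for every row: this is the problem
  while (k <= data[i, 2]) {
    vector[k] <- data[i, 1]
    written <- c(written, k)
    k <- k + 1
  }
}
written
# [1] 1 2 1 2 3
vector
# [1] 3 3 3 0 0
```

Positions 1 and 2 are written twice and positions 4 and 5 never, which matches the output reported in the question.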
I have two vectors, A and B. For every element in A I want to find the index of the first element in B that is greater and has higher index. The length of A and B are the same.
So for vectors:
A <- c(10, 5, 3, 4, 7)
B <- c(4, 8, 11, 1, 5)
I want a result vector:
R <- c(3, 3, 5, 5, NA)
Of course I can do it with two loops, but it's very slow, and I don't know how to use apply() in this situation, when the indices matter. My data set has vectors of length 20000, so the speed is really important in this case.
A few bonus questions:
1) What if I have a sequence of numbers (like seq = 2:10), and I want to find the first number in B that is higher than a+s for every a of A and every s of seq?
2) Like question 1), but I want to know the first greater and the first lower value, and create a matrix that stores which one came first. So for example, I have a of A and 10 from seq. I want to find the first value of B that is higher than a+10 or lower than a-10, and then store its index and value.
sapply(sapply(seq_along(A), function(x) which(B[-seq(x)] > A[x]) + x), "[", 1)
[1] 3 3 5 5 NA
This is a great example of when sapply is less efficient than loops.
Although the sapply does make the code look neater, you are paying for that neatness with time.
Instead you can wrap a while loop inside a for loop inside a nice, neat function.
Here are benchmarks comparing a nested-apply loop against nested for-while loop (and a mixed apply-while loop, for good measure). Update: added the vapply..match.. mentioned in comments. Faster than sapply, but still much slower than while loop.
BENCHMARK:
test elapsed relative
1 for.while 0.069 1.000
2 sapply.while 0.080 1.159
3 vapply.match 0.101 1.464
4 nested.sapply 0.104 1.507
Notice you save a third of your time; the savings will likely be larger when you start adding the sequences to A.
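The benchmarked functions are not shown at this point, so here is a rough sketch of how such a comparison could be set up with system.time. first_greater_loop and first_greater_sapply are hypothetical names, the random test data is an assumption, and timings will differ by machine:

```r
# For each A[j], index of the first B[i] with i > j and B[i] > A[j], else NA
first_greater_loop <- function(A, B) {
  n <- length(A)
  ret <- rep(NA_integer_, n)
  for (j in seq_len(max(n - 1L, 0L))) {
    i <- j + 1L
    while (i <= n && B[i] <= A[j]) i <- i + 1L
    if (i <= n) ret[j] <- i
  }
  ret
}

# The nested sapply one-liner, wrapped in a function
first_greater_sapply <- function(A, B) {
  sapply(sapply(seq_along(A), function(x) which(B[-seq(x)] > A[x]) + x), "[", 1)
}

A <- c(10, 5, 3, 4, 7)
B <- c(4, 8, 11, 1, 5)
first_greater_loop(A, B)
# [1]  3  3  5  5 NA

set.seed(1)  # arbitrary seed
A2 <- runif(2000)
B2 <- runif(2000)
system.time(r_loop <- first_greater_loop(A2, B2))
system.time(r_sapply <- first_greater_sapply(A2, B2))
all.equal(r_loop, r_sapply)
```

Checking that both versions return the same result before comparing their timings guards against benchmarking two functions that do different things.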
For the second part of your question:
If you have this all wrapped up in a nice function, it is easy to add a seq to A
# Sample data
A <- c(10, 5, 3, 4, 7, 100, 2)
B <- c(4, 8, 11, 1, 5, 18, 20)
# Sample sequence
S <- seq(1, 12, 3)
# matrix with all index values (with names cleaned up)
indexesOfB <- t(sapply(S, function(s) findIndx.gt(A+s, B)))
dimnames(indexesOfB) <- list(S, A)
Lastly, if you want to instead find values of B less than A, just swap the operation in the function.
(You could include an if-clause in the function and use only a single function. I find it more efficient
to have two separate functions)
findIndx.gt(A, B) # [1] 3 3 5 5 6 NA NA
findIndx.lt(A, B) # [1] 2 4 4 NA NA 7 NA
Then you can wrap it up in one nice package
rangeFindIndx(A, B, S)
# A S indxB.gt indxB.lt
# 10 1 3 2
# 5 1 3 4
# 3 1 5 4
# 4 1 5 NA
# 7 1 6 NA
# 100 1 NA 7
# 2 1 NA NA
# 10 4 6 4
# 5 4 3 4
# ...
FUNCTIONS
(Notice they depend on reshape2)
rangeFindIndx <- function(A, B, S) {
  # For each s in S, and for each a in A,
  # find the first value of B, which is higher than a+s, or lower than a-s
  require(reshape2)
  # Create gt & lt matrices; add dimnames for melting function
  indexesOfB.gt <- sapply(S, function(s) findIndx.gt(A+s, B))
  indexesOfB.lt <- sapply(S, function(s) findIndx.lt(A-s, B))
  dimnames(indexesOfB.gt) <- dimnames(indexesOfB.lt) <- list(A, S)
  # melt the matrices and combine into one
  gtltMatrix <- cbind(melt(indexesOfB.gt), melt(indexesOfB.lt)$value)
  # clean up their names
  names(gtltMatrix) <- c("A", "S", "indxB.gt", "indxB.lt")
  return(gtltMatrix)
}
findIndx.gt <- function(A, B) {
  lng <- length(A)
  # start with NA everywhere; the last element never has a later partner
  ret <- rep(NA_integer_, lng)
  for (j in seq_len(max(lng - 1L, 0L))) {
    i <- j + 1L
    # skip elements of B that are still below A[j]
    # (the loop stops at the first B[i] >= A[j]; use <= here for strictly greater)
    while (i <= lng && B[[i]] < A[[j]]) {
      i <- i + 1L
    }
    # i == lng is still a valid hit; only i > lng means nothing was found
    if (i <= lng) ret[j] <- i
  }
  ret
}
findIndx.lt <- function(A, B) {
  lng <- length(A)
  ret <- rep(NA_integer_, lng)
  for (j in seq_len(max(lng - 1L, 0L))) {
    i <- j + 1L
    while (i <= lng && B[[i]] > A[[j]]) { # this line contains the only difference from findIndx.gt
      i <- i + 1L
    }
    if (i <= lng) ret[j] <- i
  }
  ret
}
I'm trying to find the point at which participants reach 8 contiguous responses in a row that are greater than 3. For example:
x <- c(2,2,4,4,4,4,5,5,5,5,7)
I want to return the value 10.
I tried the code (thanks @DWin):
which( rle(x)$values>3 & rle(x)$lengths >= 8)
sum(rle(x)$lengths[ 1:(min(which(rle(x)$lengths >= 8))-1) ]) + 8
The problem with the above code is that it only works if the responses are all identical and greater than 3. Thus the code returns a zero.
if:
x <- c(2,2,4,4,4,4,4,4,4,4,7)
the code works fine. but this isn't how my data looks.
Thanks in advance!
Why don't you create a new vector that contains the identical values that rle needs to work properly? You can use ifelse() for this and put everything into a function:
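FUN <- function(x, value, runlength) {
  x2 <- ifelse(x > value, 1, 0)
  r <- rle(x2)
  # first run of 1s that is long enough; NA if there is none
  i <- which(r$values == 1 & r$lengths >= runlength)[1]
  if (is.na(i)) return(NA)
  # elements before that run, plus the run length
  ret <- sum(r$lengths[seq_len(i - 1)]) + runlength
  return(ret)
}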
> FUN(x, value = 3, runlength = 8)
[1] 10
You could just recode your data so that the responses only distinguish the measure of interest (greater than 3); your code will then work as it is once you replace x with x1.
x1 <- ifelse( x > 3, 4, 0 )
But if I were already doing this, I might rewrite the code slightly more clearly this way.
runl <- rle(x1)
i <- which( runl$lengths >= 8 & runl$values > 3 )[1]
sum( runl$lengths[seq_len(i - 1)] ) + 8
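Both rle-based answers share two edge cases: a long run that starts at position 1, and data with no qualifying run at all. A sketch of one way to cover them, using run start positions instead of summing the earlier lengths; first_run_end is a hypothetical helper name, not something taken from the answers above:

```r
# Position at which the first run of `runlength` consecutive values
# greater than `value` is completed, or NA if there is no such run
first_run_end <- function(x, value, runlength) {
  r <- rle(x > value)                 # runs of TRUE/FALSE
  ends <- cumsum(r$lengths)           # last position of each run
  starts <- ends - r$lengths + 1L     # first position of each run
  i <- which(r$values & r$lengths >= runlength)[1]
  if (is.na(i)) return(NA_integer_)
  as.integer(starts[i] + runlength - 1)
}

first_run_end(c(2,2,4,4,4,4,5,5,5,5,7), value = 3, runlength = 8)
# [1] 10
first_run_end(c(4,4,4,4,4,4,4,4,1), value = 3, runlength = 8)
# [1] 8
first_run_end(c(1, 5, 1, 5), value = 3, runlength = 8)
# [1] NA
```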
Here's a vectorized way of doing it with just cumsum and cummax. Let's take an example that has a short (less than length 8) sequence of elements greater than 3 as well as a long one, just to make sure it's doing the right thing.
> x <- c(2,2,4,5,6,7,2,2,4,9,8,7,6,5,4,5,6,9,2,2,9)
> x3 <- x > 3
> cumsum(x3) - cummax(cumsum(x3)*(!x3))
[1] 0 0 1 2 3 4 0 0 1 2 3 4 5 6 7 8 9 10 0 0 1
> which( cumsum(x3) - cummax(cumsum(x3)*(!x3)) == 8)[1]
[1] 16
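To see why this works: cumsum(x3) counts all hits so far, and cummax(cumsum(x3) * (!x3)) remembers that count as of the most recent miss, so their difference is the length of the current run. A small sketch wrapping this in a function (run_counter is a name made up here), applied to the vector from the question:

```r
run_counter <- function(flag) {
  cs <- cumsum(flag)                  # total TRUEs seen so far
  last_miss <- cummax(cs * (!flag))   # value of cs at the latest FALSE (0 before any)
  cs - last_miss                      # length of the current run of TRUEs
}

x <- c(2,2,4,4,4,4,5,5,5,5,7)
run_counter(x > 3)
# [1] 0 0 1 2 3 4 5 6 7 8 9
which(run_counter(x > 3) == 8)[1]
# [1] 10
```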