I'm looking for a way to identify a growing season, defined as a run of more than, say, 60 days between the last frost day of spring and the first frost day of fall. A general version of the problem: given a vector of numbers like testVec, I want the positions of the first and last elements of each run that contains 5 or more items, all of them greater than 0.
testVec <- c(1,3,4,0, 1, -5, 6, 0, 1,3,4,6,7,5,9, 0)
In this example, the relevant range is 1,3,4,6,7,5,9 which is testVec[9] to testVec[15]
One option could be:
testVec[with(rle(testVec > 0), rep(lengths * values >= 5, lengths))]
[1] 1 3 4 6 7 5 9
The idea is as follows. First, rle splits the vector into runs of values that are greater than zero and runs of values that are less than or equal to zero. Second, the code checks which of the greater-than-zero runs have length 5 or more. Finally, it subsets the original vector to the elements belonging to those runs.
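The three steps can be made visible by keeping the intermediate results. This is only a sketch; the names r, long_run, keep, starts and ends are introduced here for illustration and are not part of the answer above.

```r
testVec <- c(1, 3, 4, 0, 1, -5, 6, 0, 1, 3, 4, 6, 7, 5, 9, 0)

# Step 1: runs of "greater than zero" (TRUE) and "not greater than zero" (FALSE)
r <- rle(testVec > 0)

# Step 2: flag the TRUE runs that contain at least 5 elements
long_run <- r$values & r$lengths >= 5

# Step 3: expand the per-run flags to one flag per element and subset
keep <- rep(long_run, r$lengths)
testVec[keep]

# Start and end positions of each qualifying run (what the question asks for)
ends <- cumsum(r$lengths)
starts <- ends - r$lengths + 1
cbind(start = starts[long_run], end = ends[long_run])
```

For testVec the last expression has a single row, with start 9 and end 15.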
1) rleid This handles any number of qualifying sequences, including none. rleid (from data.table) applied to ok returns a vector of the same length as ok in which the elements of the first run of identical values are replaced with 1, those of the second run with 2, and so on. The result of getSeq is a list of vectors where each vector has its positions in the original input as its names.
library(data.table)
getSeq <- function(x) {
names(x) <- seq_along(x)
ok <- x > 0
s <- split(x[ok], rleid(ok)[ok])
unname(s)[lengths(s) >= 5]
}
getSeq(testVec)
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
getSeq(numeric(16))
## list()
getSeq(c(testVec, 10 * testVec))
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
##
## [[2]]
## 25 26 27 28 29 30 31
## 10 30 40 60 70 50 90
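If adding data.table only for rleid is undesirable, a run id can probably be built from base R's rle instead. The helper name rleid_base below is hypothetical, and its treatment of NA may differ from data.table's rleid, so treat it as a sketch for NA-free logical input such as ok.

```r
# Hypothetical base-R stand-in for rleid on NA-free input:
# number the runs 1, 2, 3, ... and repeat each number over its run length
rleid_base <- function(v) {
  r <- rle(v)
  rep(seq_along(r$lengths), r$lengths)
}

ok <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
rleid_base(ok)
```

Here the result is 1 1 2 3 4 4, so split(x[ok], rleid_base(ok)[ok]) would group the positive values by run in the same way as in getSeq.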
If a data frame were desired then following gives the values and which sequence the row came from. The row names indicate the positions in the original input.
gs <- getSeq(c(testVec, 10 * testVec))
names(gs) <- seq_along(gs)
if (length(gs)) stack(gs) else gs
## values ind
## 9 1 1
## 10 3 1
## 11 4 1
## 12 6 1
## 13 7 1
## 14 5 1
## 15 9 1
## 25 10 2
## 26 30 2
## 27 40 2
## 28 60 2
## 29 70 2
## 30 50 2
## 31 90 2
2) gregexpr Replace each element that is > 0 with 1 and each other element with 0 pasting the 0's and 1's into a single character string. Then use gregexpr to look for sequences of 1's at least 5 long and for the ith such nonoverlapping sequence return the first positions, g, and lengths, attr(g, "match.length"). Define a function vals which extracts the values at the required positions from testVec of the ith such nonoverlapping sequence returning a list such that the ith component of the list is the ith such sequence. The names in the output vector are its positions in the input.
getSeq2 <- function(x) {
g <- gregexpr("1{5,}", paste(+(x > 0), collapse = ""))[[1]]
vals <- function(i) {
ix <- seq(g[i], length = attr(g, "match.length")[i])
setNames(x[ix], ix)
}
if (length(g) == 1 && g == -1) list() else lapply(seq_along(g), vals)
}
getSeq2(testVec)
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
The above handles any number of sequences including 0 but if we knew there were exactly one sequence (which is the case for the example in the question) then it could be simplified to the following where the return value is just that vector:
g <- gregexpr("1{5,}", paste(+(testVec > 0), collapse = ""))[[1]]
ix <- seq(g, length = attr(g, "match.length"))
setNames(testVec[ix], ix)
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
You could "fix" @tmfmnk's solution like this:
f1 <- function(x, threshold, n) {
range(which(with(rle(x > threshold), rep(lengths * values >= n, lengths))))
}
x <- c(1, 3, 4, 0, 1, -5, 6, 0, 1,3,4,6,7,5,9, 0)
f1(x, 0, 5)
#[1] 9 15
But that does not work well when there are multiple runs, because range() returns the start of the first run and the end of the last one:
xx <- c(x, x)
f1(xx, 0, 5)
#[1] 9 31
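When every qualifying run is wanted rather than a single range, the start and end of each run can be read off the rle result directly. The function name all_runs is made up for this sketch, and treating NA as "not above the threshold" is an assumption.

```r
# Sketch: one row per run of values above `threshold` with length >= n
all_runs <- function(x, threshold, n) {
  y <- x > threshold
  y[is.na(y)] <- FALSE                 # assumption: NA breaks a run
  r <- rle(y)
  ends <- cumsum(r$lengths)            # last position of every run
  starts <- ends - r$lengths + 1       # first position of every run
  keep <- r$values & r$lengths >= n    # TRUE runs that are long enough
  cbind(start = starts[keep], end = ends[keep])
}

x <- c(1, 3, 4, 0, 1, -5, 6, 0, 1, 3, 4, 6, 7, 5, 9, 0)
all_runs(x, 0, 5)          # one row: 9 15
all_runs(c(x, x), 0, 5)    # two rows: 9 15 and 25 31
```

If no run qualifies, the result is a matrix with zero rows, which is usually easier to handle downstream than c(NA, NA).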
Here is another, not so concise approach that returns the start and end of the longest run (the first one if there are ties).
f2 <- function(x, threshold, n) {
y <- x > threshold
y[is.na(y)] <- FALSE
a <- ave(y, cumsum(!y), FUN=cumsum)
m <- max(a)
if (m < n) return (c(NA, NA))
i <- which(a == m)[1]
c(i-m+1, i)
}
f2(x, 0, 5)
#[1] 9 15
f2(xx, 0, 5)
#[1] 9 15
or with rle
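f3 <- function(x, threshold, n) {
y <- x > threshold
y[is.na(y)] <- FALSE
r <- rle(y)
# set the length of FALSE runs to 0 so that only runs above the threshold compete
len <- r$lengths * r$values
m <- max(len)
if (m < n) return (c(NA, NA))
i <- sum(r$lengths[1:which.max(len)])
c(i-m+1, i)
}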
f3(x, 0, 5)
#[1] 9 15
f3(xx, 0, 5)
#[1] 9 15
If you want the first run of length at least n, ignoring any later run even if it is longer, you could do
f4 <- function(x, threshold, n) {
y <- with(rle(x > threshold), rep(lengths * values >= n, lengths))
i <- which(y)[1]
# no qualifying run at all
if (is.na(i)) return(c(NA, NA))
# the appended FALSE ends a run that reaches the end of x
j <- i + which(!c(y[-seq_len(i)], FALSE))[1] - 1
c(i, j)
}
I'm struggling to understand the source of difference in these outputs for a function I wrote that lengthens a vector to a desired length. In the first instance of the function I used variable assignment for current_length <- length(x):
lengthen_vector <- function(x, target_length){
repeat{
current_length <- length(x)
x <- append(x, current_length + 1, after = current_length)
current_length <- current_length + 1
if(current_length == target_length) {
return(x)
break
}
}
}
Which results as expected for a target length of 20 from a starting length of 10:
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
However, when I change from variable assignment to calling the length() function throughout the lengthen_vector() function as shown below:
lengthen_vector <- function(x, target_length){
repeat{
x <- append(x, length(x) + 1, after = length(x))
length(x) <- length(x) + 1
if(length(x) == target_length) {
return(x)
break
}
}
}
...results in the following:
[1] 1 2 3 4 5 6 7 8 9 10 11 NA 13 NA 15 NA 17 NA 19 NA
What is the difference between these two that is causing this? I can't seem to locate it.
The statement length(x) <- n sets the length of x to n, either truncating x or extending it with NAs. For example,
x <- 1:3
length(x) <- 4
x
## [1] 1 2 3 NA
so if in your second version x has 10 elements then after the first append is performed x will have 11 elements and then the length(x) <- length(x) + 1 will extend it to 12 elements by appending an NA.
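Tracing the first iteration and a half of the second version by hand may make this concrete. The snippet below only replays the statements from the question outside the loop; the vector y is added purely to show the truncating case.

```r
x <- 1:10

# first pass through the loop body
x <- append(x, length(x) + 1, after = length(x))  # 11 elements, the last one is 11
length(x) <- length(x) + 1                        # 12 elements, x[12] is NA

# start of the second pass: the appended value is now 13, not 12
x <- append(x, length(x) + 1, after = length(x))
x

# length<- can also shorten a vector
y <- 1:5
length(y) <- 2
y
```

After these statements x is 1 to 11 followed by NA and 13, which is the start of the alternating pattern shown in the question.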
Just omit the length(x) <- length(x) + 1 statement giving:
lengthen_vector1 <- function(x, target_length){
repeat{
x <- append(x, length(x) + 1, after = length(x))
if(length(x) == target_length) {
return(x)
break
}
}
}
There are still some additional improvements that can be made:
Remove the break statement, since it can never be reached given that it comes after a return statement. Alternatively, move the return statement to after the loop.
If target_length is less than or equal to the length of x, the function loops forever. This leaves open what it should do in that case. Let us assume that x should then be returned unchanged. To achieve this, place the if statement before the append statement and change its condition so that it returns unless target_length exceeds the length of x. Once that is done, the if and repeat can be consolidated into a while statement.
Since the extra numbers are added to the end of x, we can use c instead of append and avoid the third argument.
Thus we can write:
lengthen_vector2 <- function(x, target_length) {
while(length(x) < target_length) {
x <- c(x, length(x) + 1)
}
x
}
lengthen_vector2(1:10, 15)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
lengthen_vector2(1:10, 3)
## [1] 1 2 3 4 5 6 7 8 9 10
Also it could be done without loops by concatenating the required sequence to the end of x. We specify that the sequence ends in target_length and the length of the sequence is target_length - length(x) or 0 if negative.
lengthen_vector3 <- function(x, target_length) {
c(x, seq(to = target_length, length = max(target_length - length(x), 0)))
}
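The two ingredients of lengthen_vector3 can be checked separately. In the sketch below length.out is the full name of the argument that the answer abbreviates as length; n_missing is a hypothetical helper used only for this illustration.

```r
# a sequence specified by its end point and its length
seq(to = 15, length.out = 5)    # the five values ending in 15
seq(to = 3, length.out = 0)     # zero-length result: nothing to append

# number of elements that still have to be added (never negative)
n_missing <- function(x, target_length) max(target_length - length(x), 0)
n_missing(1:10, 15)
n_missing(1:10, 3)
```

Because the second call returns a zero-length vector, concatenating it to x leaves x unchanged, which is why no explicit if is needed for a target shorter than x.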
If we wanted to be able to shrink the length as well as expand it then call lengthen_vector3 using head(x, target_length) instead of x.
lengthen_vector4 <- function(x, target_length) {
lengthen_vector3(head(x, target_length), target_length)
}
lengthen_vector4(1:10, 15)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
lengthen_vector4(1:10, 3)
## [1] 1 2 3
or combine the last two into a single function:
lengthen_vector5 <- function(x, target_length) {
c(head(x, target_length),
seq(to = target_length, length = max(target_length - length(x), 0)))
}
I have got this matrix below
k
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 4 9 16 25 36
[2,] 1 3 7 13 21 31
[3,] 2 2 5 10 17 26
[4,] 4 2 4 8 14 22
[5,] 7 3 3 6 11 18
[6,] 11 5 3 5 9 15
and I want to loop through starting from k[1,1] and ending at k[6,6]. My looping criteria is based on min(k[i,j+1], k[i+1,j], k[i+1, j+1]) and the answer I hope to get is something like 1+1+2+2+3+3+5+9+15 = 41 (travelling through the minimum path)
So pretty much it calculates the minimum starting from k[1,1] and then continues downwards till k[6,6]
warpingDist = function(x, y, z){
mincal = numeric(length(k))
m = nrow(k)
n = ncol(k)
i=1
j=1
mincal = which(k == min(k[i, j+1], k[i+1, j], k[i+1, j+1]), arr.ind = TRUE)
indx = data.frame(mincal)
i= indx$row
j= indx$col
if(i != m || j!=n)
{
warpingDist(k[i, j+1], k[i+1, j], k[i+1, j+1])
}
warpSum = sum(mincal)
return(warpSum)
}
value = apply(k, c(1,2), warpingDist)
value
When I run this code it displays the below:
Error: object 'value' not found
Not sure why this is happening...
As you don't provide a minimal reproducible example, I can only guess:
warpingDist = function(x, y, z, k){
# browser() # This is a good option to activate, if you run your script in RStudio
...
return(warpSum)
}
# your code
k <- whatever it is
result <- warpingDist(x, y, z, k)
I hope that helps.
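For what it is worth, the walk described in the question can be written as a plain loop instead of recursion plus apply. The sketch below makes several assumptions that are not in the question: the function name greedy_path_sum is made up, ties are broken in favour of the diagonal move (then down, then right), and moves that would leave the matrix are dropped. It is a greedy walk, so it is not guaranteed to find the globally cheapest path.

```r
k <- matrix(c( 1, 4, 9, 16, 25, 36,
               1, 3, 7, 13, 21, 31,
               2, 2, 5, 10, 17, 26,
               4, 2, 4,  8, 14, 22,
               7, 3, 3,  6, 11, 18,
              11, 5, 3,  5,  9, 15), nrow = 6, byrow = TRUE)

greedy_path_sum <- function(m) {
  i <- 1
  j <- 1
  total <- m[i, j]
  while (i < nrow(m) || j < ncol(m)) {
    # candidate moves; diagonal is listed first so that ties prefer it
    moves <- rbind(c(i + 1, j + 1), c(i + 1, j), c(i, j + 1))
    inside <- moves[, 1] <= nrow(m) & moves[, 2] <= ncol(m)
    moves <- moves[inside, , drop = FALSE]
    best <- which.min(m[moves])        # a two-column matrix indexes by (row, col)
    i <- moves[best, 1]
    j <- moves[best, 2]
    total <- total + m[i, j]
  }
  total
}

greedy_path_sum(k)
```

With this tie-breaking rule the walk visits 1, 1, 2, 2, 3, 3, 5, 9, 15 and returns 41, the value given in the question; a different tie-breaking order can give a different sum.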
I'm glad I was finally able to solve the problem. The code runs fast as well.
Problem: To find the minimum cost for a matrix. For clarity, let's assume I have the matrix given below:
[1,] 1 4 6 7 8 9 0
[2,] 10 12 1 3 11 2 0
[3,] 11 12 2 8 17 1 0
[4,] 20 1 18 4 28 1 0
[5,] 5 20 80 6 9 3 0
My goal is to add up the minimum path distance starting from Data[1,1] in the first row down to the last row, which the path reaches at Data[5,4]. So effectively, I want to have something like 1 + 4 + 1 + 2 + 4 + 6 + 9 + 3.
Below is the R code which I have used to implement this. It implements two functions:
# Function that calculates minimum of three values. Returns the Value.
minFUN <- function(Data, a, b){
d = (min(Data[a, b+1], Data[a+1, b], Data[a+1, b+1]))
return(d)
}
# Function that calculates the index of the minimum value, from which
# the next iteration begins
NextRC <- function(Data, a, b){
d = min(Data[a, b+1], Data[a+1, b], Data[a+1, b+1])
if(d == Data[a, b+1]){
c = cbind(a, b+1)
}else
if(d == Data[a+1, b]){
c = cbind(a+1, b)
} else
if(d == Data[a+1, b+1]){
c = cbind(a+1, b+1)
}
return(c)
}
Je <- c()
NewRow = 1
NewCol = 1
# Warping Function that uses both functions above to loop through the dataset
WarpDist <- function(Data, a = NewRow, b = NewCol){
for(i in 1:5) { # five moves are needed to reach the last row in this example
Je[i] = minFUN(Data, a, b)
# Next Start Point
NSP = NextRC(Data, a,b)
NewRow = as.numeric(NSP[1,1])
NewCol = as.numeric(NSP[1,2])
a = NewRow
b = NewCol
}
return(Je)
}
Value=WarpDist(Data = Data, a = NewRow, b = NewCol)
warpo = Data[1,1] + sum(Value)
w = sqrt(warpo)
The result is the minimum path from the first row to the last row
Value
[1] 4 1 2 4 6
The result omits 9 and 3 because they are already on the last row.
Time:
Time difference of 0.08833408 secs
Now I have a lot of matrices with different numbers of rows, and I want to sum the elements of the odd-numbered rows and of the even-numbered rows separately, like below:
o <- matrix(rep(c(1,2,3,4,5,6),6),ncol = 6)
o2 <- matrix(rep(c(1,2,3,4,5,6),12),ncol = 6)
#I want to sum the odd-number rows and even number rows element respectively
i=1
kg <- NULL
while(i <= 2){
op<-unlist(Map(sum,o[i,],o[i+2,],o[i+4,]))
kg <- c(kg,op)
i=i+1
}
i=1
kg2 <- NULL
while(i <= 2){
op2<-unlist(Map(sum,o2[i,],o2[i+2,],o2[i+4,],o2[i+6,],o2[i+8,],o2[i+10,]))
kg2 <- c(kg2,op2)
i=i+1
}
kg
kg2 #the result should be a vector sequence like kg and kg2
> kg2
[1] 18 18 18 18 18 18 24 24 24 24 24 24
This is what I can do now. But my matrices have many different numbers of rows. Is there a method to do this quickly?
And how can I generate a string like "o2[i,],o2[i+2,],o2[i+4,],o2[i+6,],o2[i+8,],o2[i+10,])" automatically according to the input number? Thank you for your help :)
Perhaps something like this?
o <- matrix(rep(c(1,2,3,4,5,6),6),ncol = 6)
o2 <- matrix(rep(c(1,2,3,4,5,6),12),ncol = 6)
# logical recycling picks every second row and also works for an odd number of rows
even <- function(x) seq_len(nrow(x))[c(FALSE, TRUE)]
odd <- function(x) seq_len(nrow(x))[c(TRUE, FALSE)]
colSums(o[even(o), ]);
#[1] 12 12 12 12 12 12
colSums(o[odd(o), ]);
#[1] 9 9 9 9 9 9
colSums(o2[even(o2), ]);
#[1] 24 24 24 24 24 24
colSums(o2[odd(o2), ]);
#[1] 18 18 18 18 18 18
Explanation: even/odd return even/odd row indices of a matrix/data.frame; we can then use colSums to sum entries by column.
Update
To sum entries from rows 3, 6, 9, 12 (or any other sequence) you just need to define a corresponding function, e.g.
another_seq <- function(x) 3 * seq(1, nrow(x) / 3)
colSums(o2[another_seq(o2), ]);
#[1] 18 18 18 18 18 18
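The same grouping idea can also be expressed with base R's rowsum, which sums the rows of a matrix within groups. In the sketch below grp and sums are illustrative names; the group label 0 stands for the odd-numbered rows and 1 for the even-numbered rows.

```r
o2 <- matrix(rep(c(1, 2, 3, 4, 5, 6), 12), ncol = 6)

# 0 for rows 1, 3, 5, ... and 1 for rows 2, 4, 6, ...
grp <- (seq_len(nrow(o2)) - 1) %% 2

# one row of column sums per group, with the group labels as row names
sums <- rowsum(o2, group = grp)
sums

# flattened in the order used by kg2 in the question: odd-row sums, then even-row sums
as.vector(t(sums))
```

Replacing %% 2 with %% 3 would give three groups (rows 1, 4, 7, ...; rows 2, 5, 8, ...; rows 3, 6, 9, ...) without writing a separate index function for each.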
In the OP's loop, if we want to change the Map to make it more automatic
unlist(do.call(Map, c(f = sum, as.data.frame(t(o2[seq(i, i+10, by = 2),])))))
Using the full code
o <- matrix(rep(c(1,2,3,4,5,6),6),ncol = 6)
o2 <- matrix(rep(c(1,2,3,4,5,6),12),ncol = 6)
#I want to sum the odd-number rows and even number rows
i=1
kg <- NULL
while(i <= 2){
#op<-unlist(Map(sum,o[i,],o[i+2,],o[i+4,]))
op <- unlist(do.call(Map, c(f = sum,
as.data.frame(t(o[seq(i, i+4, by = 2),]))))) # change here
kg <- c(kg,op)
i=i+1
}
i=1
kg2 <- NULL
while(i <= 2){
#op2<-unlist(Map(sum,o2[i,],o2[i+2,],o2[i+4,],o2[i+6,],o2[i+8,],o2[i+10,]))
op2 <- unlist(do.call(Map, c(f = sum,
as.data.frame(t(o2[seq(i, i+10, by = 2),]))))) # change here
kg2 <- c(kg2,op2)
i=i+1
}
kg
#[1] 9 9 9 9 9 9 12 12 12 12 12 12
kg2
#[1] 18 18 18 18 18 18 24 24 24 24 24 24
In the OP's code, if we analyze the individual arguments of Map with just two arguments i.e. the first and 3rd row of 'o'
i <- 1
Map(function(x, y) c(x, y), o[i,], o[i+2,])
#[[1]]
#[1] 1 3
#[[2]]
#[1] 1 3
#[[3]]
#[1] 1 3
#[[4]]
#[1] 1 3
#[[5]]
#[1] 1 3
#[[6]]
#[1] 1 3
Here, each element of the list holds the values of one column concatenated with c. To get a similar structure when subsetting the odd rows, we transpose the subset of rows and convert it to a data.frame, so that each column of the data.frame corresponds to one of the original rows that were subsetted
do.call(Map, c(f=c, as.data.frame(t(o[c(i, i+2),]))))
#[[1]]
#V1 V2
# 1 3
#[[2]]
#V1 V2
# 1 3
#[[3]]
#V1 V2
# 1 3
#[[4]]
#V1 V2
# 1 3
#[[5]]
#V1 V2
# 1 3
#[[6]]
#V1 V2
# 1 3
Keeping it as a matrix will not solve it, because c() flattens the matrix into its individual elements (a matrix is a vector with a dimension attribute), and each element is then passed to Map as a separate length-one argument
do.call(Map, c(f=c, o[c(i, i+2),]))
#[[1]]
#[1] 1 3 1 3 1 3 1 3 1 3 1 3
while using Map directly will loop through each element of the matrix (vector) instead of each column
Map(c, o[c(i, i+2),]) # check the output
Another option would be to split the object by col and then do the sum
onew <- o[seq(i, i+4, by = 2),]
Map(sum, split(onew, col(onew)))
The above approach is loopy, but we can also use a vectorized approach (just like in @Maurits Evers's post). Instead of seq, here we use recycling of a logical vector to subset the rows and then apply colSums
i1 <- c(TRUE, FALSE)
colSums(cbind(o[i1,], o[!i1,]))
#[1] 9 9 9 9 9 9 12 12 12 12 12 12
colSums(cbind(o2[i1,], o2[!i1,]))
#[1] 18 18 18 18 18 18 24 24 24 24 24 24
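To see what the recycled logical index selects, it can be expanded explicitly. This is only an illustration; rep_len is used here to mimic the recycling that matrix subsetting performs implicitly.

```r
o <- matrix(rep(c(1, 2, 3, 4, 5, 6), 6), ncol = 6)
i1 <- c(TRUE, FALSE)

# c(TRUE, FALSE) is recycled along the rows: TRUE, FALSE, TRUE, FALSE, ...
which(rep_len(i1, nrow(o)))     # rows kept by o[i1, ]
which(rep_len(!i1, nrow(o)))    # rows kept by o[!i1, ]

# so the logical subset is the same as naming the odd rows explicitly
identical(o[i1, ], o[c(1, 3, 5), ])
```

The advantage over writing the row numbers out is that the same two-element index works for a matrix with any number of rows.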
add <- c( 2,3,4)
for (i in add){
a <- i +3
b <- a + 3
z <- a + b
print(z)
}
# Result
[1] 13
[1] 15
[1] 17
In R, it can print the result, but I want to save the results for further computation in a vector, data frame or list
Thanks in advance
Try something like:
add <- c(2, 3, 4)
z <- rep(0, length(add))
idx = 1
for(i in add) {
a <- i + 3
b <- a + 3
z[idx] <- a + b
idx <- idx + 1
}
print(z)
This is simple algebra; there is no need for a for loop at all
res <- (add + 3)*2 + 3
res
## [1] 13 15 17
Or if you want a data.frame
data.frame(a = add + 3, b = add + 6, c = (add + 3)*2 + 3)
# a b c
# 1 5 8 13
# 2 6 9 15
# 3 7 10 17
Though in general, when you are trying to do something like that, it is better to create a function, for example
myfunc <- function(x) {
a <- x + 3
b <- a + 3
z <- a + b
z
}
myfunc(add)
## [1] 13 15 17
In cases when a loop is actually needed (unlike in your example) and you want to store its results, it is better to use the *apply family for such tasks. For example, use lapply if you want a list back
res <- lapply(add, myfunc)
res
# [[1]]
# [1] 13
#
# [[2]]
# [1] 15
#
# [[3]]
# [1] 17
Or use sapply if you want a vector back
res <- sapply(add, myfunc)
res
## [1] 13 15 17
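A stricter relative of sapply is vapply, which takes a template for the result of each call and fails if a call returns something else. The sketch below repeats myfunc from above so that it runs on its own.

```r
add <- c(2, 3, 4)

myfunc <- function(x) {
  a <- x + 3
  b <- a + 3
  a + b
}

# FUN.VALUE = numeric(1) declares that every call returns one number
res <- vapply(add, myfunc, FUN.VALUE = numeric(1))
res
```

The result is the same vector 13 15 17; the difference only shows up when a call returns an unexpected type or length, in which case vapply stops with an error instead of silently changing the shape of the result.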
For a data.frame to keep all the info
add <- c( 2,3,4)
results <- data.frame()
for (i in add){
a <- i +3
b <- a + 3
z <- a + b
#print(z)
results <- rbind(results, cbind(a,b,z))
}
results
a b z
1 5 8 13
2 6 9 15
3 7 10 17
If you just want z then use a vector, no need for lists
add <- c( 2,3,4)
results <- vector()
for (i in add){
a <- i +3
b <- a + 3
z <- a + b
#print(z)
results <- c(results, z)
}
results
[1] 13 15 17
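Growing a data frame with rbind inside the loop copies the accumulated result on every iteration. A common alternative, sketched here under the assumption that the number of iterations is known in advance, is to store each row in a preallocated list and bind once at the end; the name rows is illustrative.

```r
add <- c(2, 3, 4)

# one list slot per iteration, filled inside the loop
rows <- vector("list", length(add))
for (k in seq_along(add)) {
  a <- add[k] + 3
  b <- a + 3
  rows[[k]] <- data.frame(a = a, b = b, z = a + b)
}

# bind all rows in a single step
results <- do.call(rbind, rows)
results
```

For three iterations the difference is irrelevant, but for long loops binding once tends to be noticeably faster than calling rbind each time.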
It might be instructive to compare these two results with those of @dugar:
> sapply(add, function(x) c(a=x+3, b=a+3, z=a+b) )
[,1] [,2] [,3]
a 5 6 7
b 10 10 10
z 17 17 17
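The repeated 10s and 17s are not what was intended. Inside c(a=x+3, b=a+3, z=a+b) the names a, b and z only label the elements of the result; they do not create variables. The a and b on the right-hand sides are therefore looked up in the global environment, where the earlier for loop left a = 7 and b = 10. This sometimes trips us up when computing with intermediate values. Writing each expression in terms of x gives the expected result: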
> sapply(add, function(x) c(a=x+3, b=(x+3)+3, z=(x+3)+((x+3)+3)) )
[,1] [,2] [,3]
a 5 6 7
b 8 9 10
z 13 15 17
Those results are the transpose of @dugar's. Using sapply or lapply often saves you the effort of setting up a zeroth-case object and then incrementing counters.
> lapply(add, function(x) c(a=x+3, b=(x+3)+3, z=(x+3)+((x+3)+3)) )
[[1]]
a b z
5 8 13
[[2]]
a b z
6 9 15
[[3]]
a b z
7 10 17
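To see where the repeated 10 and 17 in the first sapply result come from, the leftover loop variables can be set explicitly. In this sketch a and b are assigned by hand to the values the for loop leaves behind after its last iteration; stale and fixed are illustrative names.

```r
add <- c(2, 3, 4)

# values left in the global environment by the last loop iteration (i = 4)
a <- 7
b <- 10

# "a" and "b" on the right-hand sides are the globals, not the elements being built
stale <- sapply(add, function(x) c(a = x + 3, b = a + 3, z = a + b))
stale

# assigning the intermediates inside the function makes them local to each call
fixed <- sapply(add, function(x) {
  a <- x + 3
  b <- a + 3
  c(a = a, b = b, z = a + b)
})
fixed
```

In stale the b row is 10 10 10 and the z row is 17 17 17, while fixed has b equal to 8 9 10 and z equal to 13 15 17.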