Minimum Cost Distance in Matrix - r

I have got this matrix below
k
[,1] [,2] [,3] ,4][,5] [,6]
[1,] 1 4 9 16 25 36
[2,] 1 3 7 13 21 31
[3,] 2 2 5 10 17 26
[4,] 4 2 4 8 14 22
[5,] 7 3 3 6 11 18
[6,] 11 5 3 5 9 15
and I want to loop through starting from k[1,1] and ending at k[6,6]. My looping criteria is based on min(k[i,j+1], k[i+1,j], k[i+1, j+1]) and the answer I hope to get is something like 1+1+2+2+3+3+5+9+15 = 41 (travelling through the minimum path)
So pretty much it calculates the minimum starting from k[1,1] and then continues downwards till k[6,6]
warpingDist = function(x, y, z){
mincal = numeric(length(k))
m = nrow(k)
n = ncol(k)
i=1
j=1
mincal = which(k == min(k[i, j+1], k[i+1, j], k[i+1, j+1]), arr.ind = TRUE)
indx = data.frame(mincal)
i= indx$row
j= indx$col
if(i != m || j!=n)
{
warpingDist(k[i, j+1], k[i+1, j], k[i+1, j+1])
}
warpSum = sum(mincal)
return(warpSum)
}
value = apply(k, c(1,2), warpingDist)
value
When I run this code it displays the below:
Error: object 'value' not found
Not sure why this is happening...

As you don't provide a minimal reproducible example, I can only guess:
warpingDist = function(x, y, z, k){
# browser() # This is a good option to activate, if you run your script in RStudio
...
return(warpSum)
}
# your code
k <- whatever it is
result <- warpingDist(x, y, z, k)
I hope that helps.

Am glad, I was finally able to solve the problem...The code runs fast as well
Problem: To find the minimum cost for a matrix. For clarity, let's assume I have the matrix given below:
[1,] 1 4 6 7 8 9 0
[2,] 10 12 1 3 11 2 0
[3,] 11 12 2 8 17 1 0
[4,] 20 1 18 4 28 1 0
[5,] 5 20 80 6 9 3 0
My goal is to add the minimum path distance starting from kata[1,1] first row to the last row K[5,4]. So effectively, I want to have something like 1 + 4 + 1 + 2 + 4 + 6 + 9 + 3.
Below is the R code which I have used to implement this. It implements two functions:
# Function that calculates minimum of three values. Returns the Value.
minFUN <- function(Data, a, b){
d = (min(Data[a, b+1], Data[a+1, b], Data[a+1, b+1]))
return(d)
}
# Function that calculates the index of the minimum value, from which the
# The next iteration begins
NextRC <- function(Data, a, b){
d = min(Data[a, b+1], Data[a+1, b], Data[a+1, b+1])
if(d == Data[a, b+1]){
c = cbind(a, b+1)
}else
if(d == Data[a+1, b]){
c = cbind(a+1, b)
} else
if(d == Data[a+1, b+1]){
c = cbind(a+1, b+1)
}
return(c)
}
Je <- c()
NewRow = 1
NewCol = 1
# Warping Function that uses both functions above to loop through the dataset
WarpDist <- function(Data, a = NewRow, b = NewCol){
for(i in 1:4) {
Je[i] = minFUN(Data, a, b)
# Next Start Point
NSP = NextRC(Data, a,b)
NewRow = as.numeric(NSP[1,1])
NewCol = as.numeric(NSP[1,2])
a = NewRow
b = NewCol
}
return(Je)
}
Value=WarpDist(Data = Data, a = NewRow, b = NewCol)
warpo = Data[1,1] + sum(Value)
w = sqrt(warpo)
The result is the minimum path from the first row to the last row
Value
[1] 4 1 2 4 6
The result omits 9 and 3 because its already on the last row.
Time:
Time difference of 0.08833408 secs

Related

R how to find a series of common values in a vector (identifying growing season)

I'm looking for a way to identify a growing season which consists of a number of days greater than say 60 between the last frost day of spring and the first frost day in the fall. A general version of this problem is this. If I have a vector of numbers like testVec, I want the item numbers of the beginning and end range of values where the number of items is 5 or greater and all of them are greater than 0.
testVec <- c(1,3,4,0, 1, -5, 6, 0, 1,3,4,6,7,5,9, 0)
In this example, the relevant range is 1,3,4,6,7,5,9 which is testVec[9] to testVec[15]
One option could be:
testVec[with(rle(testVec > 0), rep(lengths * values >= 5, lengths))]
[1] 1 3 4 6 7 5 9
Here, the idea is to, first, create runs of values that are smaller or equal to zero and bigger than zero. Second, it checks whether the runs of values bigger than zero are of length 5 or more. Finally, it subsets the original vector for the runs of values bigger than zero with length 5 or more.
1) rleid This also handles any number of sequences including zero. rleid(ok) is a vector the same length as ok such that the first run of identical elements is replaced with 1, the second run with 2 and so on. The result is a list of vectors where each vector has its positions in the original input as its names.
library(data.table)
getSeq <- function(x) {
names(x) <- seq_along(x)
ok <- x > 0
s <- split(x[ok], rleid(ok)[ok])
unname(s)[lengths(s) >= 5]
}
getSeq(testVec)
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
getSeq(numeric(16))
## list()
getSeq(c(testVec, 10 * testVec))
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
##
## [[2]]
## 25 26 27 28 29 30 31
## 10 30 40 60 70 50 90
If a data frame were desired then following gives the values and which sequence the row came from. The row names indicate the positions in the original input.
gs <- getSeq(c(testVec, 10 * testVec))
names(gs) <- seq_along(gs)
if (length(gs)) stack(gs) else gs
## values ind
## 9 1 1
## 10 3 1
## 11 4 1
## 12 6 1
## 13 7 1
## 14 5 1
## 15 9 1
## 25 10 2
## 26 30 2
## 27 40 2
## 28 60 2
## 29 70 2
## 30 50 2
## 31 90 2
2) gregexpr Replace each element that is > 0 with 1 and each other element with 0 pasting the 0's and 1's into a single character string. Then use gregexpr to look for sequences of 1's at least 5 long and for the ith such nonoverlapping sequence return the first positions, g, and lengths, attr(g, "match.length"). Define a function vals which extracts the values at the required positions from testVec of the ith such nonoverlapping sequence returning a list such that the ith component of the list is the ith such sequence. The names in the output vector are its positions in the input.
getSeq2 <- function(x) {
g <- gregexpr("1{5,}", paste(+(x > 0), collapse = ""))[[1]]
vals <- function(i) {
ix <- seq(g[i], length = attr(g, "match.length")[i])
setNames(x[ix], ix)
}
if (length(g) == 1 && g == -1) list() else lapply(seq_along(g), vals)
}
getSeq2(testVec)
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
The above handles any number of sequences including 0 but if we knew there were exactly one sequence (which is the case for the example in the question) then it could be simplified to the following where the return value is just that vector:
g <- gregexpr("1{5,}", paste(+(testVec > 0), collapse = ""))[[1]]
ix <- seq(g, length = attr(g, "match.length"))
setNames(testVec[ix], ix)
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
You could "fix" #tmfmnk's solution like this:
f1 <- function(x, threshold, n) {
range(which(with(rle(x > threshold), rep(lengths * values >= n, lengths))))
}
x <- c(1, 3, 4, 0, 1, -5, 6, 0, 1,3,4,6,7,5,9, 0)
f1(x, 0, 5)
#[1] 9 15
But that does not work well when there are multiple runs
xx <- c(x, x)
f1(xx, 0, 5)
#[1] 9 31
Here is another, not so concise approach that returns the start and end of the longest run (the first one if there are ties).
f2 <- function(x, threshold, n) {
y <- x > threshold
y[is.na(y)] <- FALSE
a <- ave(y, cumsum(!y), FUN=cumsum)
m <- max(a)
if (m < n) return (c(NA, NA))
i <- which(a == m)[1]
c(i-m+1, i)
}
f2(x, 0, 5)
#[1] 9 15
f2(xx, 0, 5)
#[1] 9 15
or with rle
f3 <- function(x, threshold, n) {
y <- x > threshold
r <- rle(y)
m <- max(r$lengths)
if (m < n) return (c(NA, NA))
i <- sum(r$lengths[1:which.max(r$lengths)[1]])
c(i-max(r$lengths)+1, i)
}
f3(x, 0, 5)
#[1] 9 15
f3(xx, 0, 5)
#[1] 9 15
If you wanted the first run that is at least n, that is you do not want a next run, even if it is longer, you could do
f4 <- function(x, threshold, n) {
y <- with(rle(x > threshold), rep(lengths * values >= n, lengths))
i <- which(y)[1]
j <- i + which(!y[-c(1:i)])[1] - 1
c(i, j)
}

R: loop exercise without NAs on output

I am practicing a simple R loop. From a vector "m" with values 1 to 20, i want to create a loop that save a selected value in a object"a" and the remaining values in object "b".
This is what i did:
a=NULL
b=NULL
m <- c(1:20)
for (i in m)
if (i == 4){
a[[i]] <- i
} else {
b[[i]] <- i
}
This is the output:
> a
[1] NA NA NA 4
> b
[1] 1 2 3 NA 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
My question is: how can i improve my loop code so the output does not show NAs, and without using function "na.omit"?
Thanks
a=NULL
b=NULL
m <- c(1:20)
for (i in m){
if (i == 4){
a <- i
} else {
b <- append(b, i)
}
}
This will put a single value (in this case 4) in object a, and will consecutively add the other values to b.
Result:
> a
[1] 4
> b
[1] 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Another way to solve it is with vector operations. We doesn't need to do a loop to solve some problems about classification.
In your case, we can use:
m = c(1:20)
subset_with = m[m == 4] # It returns the values with the maching (m == 4)
subset_without = m[m != 4] # It returns the values with the maching (m != 4)
I hope this helps you.

iterative cumsum where sum determines the next position to be added

I have a data.table as follows
set.seed(5)
x <- data.table(x=sample(1:20,15))
> x
x
1: 5
2: 14
3: 17
4: 20
5: 2
6: 11
7: 8
8: 15
9: 12
10: 16
11: 3
12: 18
13: 10
14: 4
15: 13
and I would like to start at 1 and cumulate values iteratively such that the value of cumsum() determines the next number to be added to the sum.
In the example I want to add the first value of x, here 5, then jump to value number 5 and add that, here 2, then jump to value number 5+2=7, here 8, then value number 5+2+8=15, here 13.
That is, I want to get a vector
> res
[1] 1 5 7 15
Has anyone any idea for this problem?
We can use Reduce with accumulate = TRUE
accum <- Reduce(function(i, j) i + x$x[i], x$x, accumulate = TRUE)
c(1, accum[!is.na(accum)])
# [1] 1 5 7 15 28
or purrr::accumulate
library(purrr)
accum <- accumulate(x$x, ~ .x + x$x[.x])
c(1, accum[!is.na(accum)])
# [1] 1 5 7 15 28
A base R solution:
i = 1
v = i
sum = 0
while (i <= nrow(x)) {
v = c(v, i)
sum = sum + x$x[i]
i = sum
}
Here's a function that takes how long you want your vector to be and produces a vector of that length:
recursiveadd<-function(x, n) {k<-x$x[1]
for (i in 1:(n-1)) {
k[i+1]<-sum(x$x[k[i]],k[i])
}
k
}
recursiveadd(x,4)
[1] 5 7 15 28

storing results of a for function in list or

add <- c( 2,3,4)
for (i in add){
a <- i +3
b <- a + 3
z <- a + b
print(z)
}
# Result
[1] 13
[1] 15
[1] 17
In R, it can print the result, but I want to save the results for further computation in a vector, data frame or list
Thanks in advance
Try something like:
add <- c(2, 3, 4)
z <- rep(0, length(add))
idx = 1
for(i in add) {
a <- i + 3
b <- a + 3
z[idx] <- a + b
idx <- idx + 1
}
print(z)
This is simple algebra, no need in a for loop at all
res <- (add + 3)*2 + 3
res
## [1] 13 15 17
Or if you want a data.frame
data.frame(a = add + 3, b = add + 6, c = (add + 3)*2 + 3)
# a b c
# 1 5 8 13
# 2 6 9 15
# 3 7 10 17
Though in general, when you are trying to something like that, it is better to create a function, for example
myfunc <- function(x) {
a <- x + 3
b <- a + 3
z <- a + b
z
}
myfunc(add)
## [1] 13 15 17
In cases when a loop is actually needed (unlike in your example) and you want to store its results, it is better to use *apply family for such tasks. For example, use lapply if you want a list back
res <- lapply(add, myfunc)
res
# [[1]]
# [1] 13
#
# [[2]]
# [1] 15
#
# [[3]]
# [1] 17
Or use sapply if you want a vector back
res <- sapply(add, myfunc)
res
## [1] 13 15 17
For a data.frame to keep all the info
add <- c( 2,3,4)
results <- data.frame()
for (i in add){
a <- i +3
b <- a + 3
z <- a + b
#print(z)
results <- rbind(results, cbind(a,b,z))
}
results
a b z
1 5 8 13
2 6 9 15
3 7 10 17
If you just want z then use a vector, no need for lists
add <- c( 2,3,4)
results <- vector()
for (i in add){
a <- i +3
b <- a + 3
z <- a + b
#print(z)
results <- c(results, z)
}
results
[1] 13 15 17
It might be instructive to compare these two results with those of #dugar:
> sapply(add, function(x) c(a=x+3, b=a+3, z=a+b) )
[,1] [,2] [,3]
a 5 6 7
b 10 10 10
z 17 17 17
That is the result of lazy evaluation and sometimes trips us up when computing with intermediate values. This next one should give a slightly more expected result:
> sapply(add, function(x) c(a=x+3, b=(x+3)+3, z=(x+3)+((x+3)+3)) )
[,1] [,2] [,3]
a 5 6 7
b 8 9 10
z 13 15 17
Those results are the transpose of #dugar. Using sapply or lapply often saves you the effort off setting up a zeroth case object and then incrementing counters.
> lapply(add, function(x) c(a=x+3, b=(x+3)+3, z=(x+3)+((x+3)+3)) )
[[1]]
a b z
5 8 13
[[2]]
a b z
6 9 15
[[3]]
a b z
7 10 17

Combining vectors of unequal length and non-unique values

I would like to do the following:
combine into a data frame, two vectors that
have different length
contain sequences found also in the other vector
contain sequences not found in the other vector
sequences that are not found in other vector are never longer than 3 elements
always have same first element
The data frame should show the equal sequences in the two vectors aligned, with NA in the column if a vector lacks a sequence present in the other vector.
For example:
vector 1 vector 2 vector 1 vector 2
1 1 a a
2 2 g g
3 3 b b
4 1 or h a
1 2 a g
2 3 g b
5 4 c h
5 c
should be combined into data frame
1 1 a a
2 2 g g
3 3 b b
4 NA h NA
1 1 or a a
2 2 g g
NA 3 NA b
NA 4 NA h
5 5 c c
What I did, is to search for merge, combine, cbind, plyr examples but was not able to find solutions. I am afraid I will need to start write a function with nested for loops to solve this problem.
Note - this was proposed as an answer to the first version of the OP. The question has been modified since then but the problem is still not well-defined in my opinion.
Here is a solution that works with your integer example and would also work with numeric vectors. I am also assuming that:
both vectors contain the same number of sequences
a new sequence starts where value[i+1] <= value[i]
If your vectors are non-numeric or if one of my assumptions does not fit your problem, you'll have to clarify.
v1 <- c(1,2,3,4,1,2,5)
v2 <- c(1,2,3,1,2,3,4,5)
v1.sequences <- split(v1, cumsum(c(TRUE, diff(v1) <= 0)))
v2.sequences <- split(v2, cumsum(c(TRUE, diff(v2) <= 0)))
align.fun <- function(s1, s2) { #aligns two sequences
s12 <- sort(unique(c(s1, s2)))
cbind(ifelse(s12 %in% s1, s12, NA),
ifelse(s12 %in% s2, s12, NA))
}
do.call(rbind, mapply(align.fun, v1.sequences, v2.sequences))
# [,1] [,2]
# [1,] 1 1
# [2,] 2 2
# [3,] 3 3
# [4,] 4 NA
# [5,] 1 1
# [6,] 2 2
# [7,] NA 3
# [8,] NA 4
# [9,] 5 5
I maintain that your problem might be solved in terms of the shortest common supersequence. It assumes that your two vectors each represent one sequence. Please give the code below a try.
If it still does not solve your problem, you'll have to explain exactly what you mean by "my vector contains not one but many sequences": define what you mean by a sequence and tell us how sequences can be identified by scanning through your two vectors.
Part I: given two sequences, find the longest common subsequence
LongestCommonSubsequence <- function(X, Y) {
m <- length(X)
n <- length(Y)
C <- matrix(0, 1 + m, 1 + n)
for (i in seq_len(m)) {
for (j in seq_len(n)) {
if (X[i] == Y[j]) {
C[i + 1, j + 1] = C[i, j] + 1
} else {
C[i + 1, j + 1] = max(C[i + 1, j], C[i, j + 1])
}
}
}
backtrack <- function(C, X, Y, i, j) {
if (i == 1 | j == 1) {
return(data.frame(I = c(), J = c(), LCS = c()))
} else if (X[i - 1] == Y[j - 1]) {
return(rbind(backtrack(C, X, Y, i - 1, j - 1),
data.frame(LCS = X[i - 1], I = i - 1, J = j - 1)))
} else if (C[i, j - 1] > C[i - 1, j]) {
return(backtrack(C, X, Y, i, j - 1))
} else {
return(backtrack(C, X, Y, i - 1, j))
}
}
return(backtrack(C, X, Y, m + 1, n + 1))
}
Part II: given two sequences, find the shortest common supersequence
ShortestCommonSupersequence <- function(X, Y) {
LCS <- LongestCommonSubsequence(X, Y)[c("I", "J")]
X.df <- data.frame(X = X, I = seq_along(X), stringsAsFactors = FALSE)
Y.df <- data.frame(Y = Y, J = seq_along(Y), stringsAsFactors = FALSE)
ALL <- merge(LCS, X.df, by = "I", all = TRUE)
ALL <- merge(ALL, Y.df, by = "J", all = TRUE)
ALL <- ALL[order(pmax(ifelse(is.na(ALL$I), 0, ALL$I),
ifelse(is.na(ALL$J), 0, ALL$J))), ]
ALL$SCS <- ifelse(is.na(ALL$X), ALL$Y, ALL$X)
ALL
}
Your Example:
ShortestCommonSupersequence(X = c("a","g","b","h","a","g","c"),
Y = c("a","g","b","a","g","b","h","c"))
# J I X Y SCS
# 1 1 1 a a a
# 2 2 2 g g g
# 3 3 3 b b b
# 9 NA 4 h <NA> h
# 4 4 5 a a a
# 5 5 6 g g g
# 6 6 NA <NA> b b
# 7 7 NA <NA> h h
# 8 8 7 c c c
(where the two updated vectors are in columns X and Y.)

Resources