I am trying to create a function that takes a vector and creates two sliding matrices, like below:
Input, Output
[d01, d02, d03, d04, d05, d06, d07], [d08, d09, d10, d11, d12, d13, d14]
[d02, d03, d04, d05, d06, d07, d08], [d09, d10, d11, d12, d13, d14, d15]
...
I tried to adapt some Python code to R, but I am running into problems and cannot find the error (I am not used to R).
This is the R code:
create_dataset = function(data, n_input, n_out){
  dataX = c()
  dataY = c()
  in_start = 0
  for (i in 1:range(length(data))) {
    # define the end of the input sequence
    in_end = in_start + n_input
    out_end = in_end + n_out
    if(out_end <= length(data)){
      x_input = data[in_start:in_end, 1]
      X = append(x_input)
      y = append(data[in_end:out_end], 1)
    }
    # move along one time step
    in_start = in_start + 1
  }
  X; Y
}
I got this error when calling this function
> create_dataset(data, n_input = 5, n_out = 5)
Error in data[in_start:in_end, 1] : incorrect number of dimensions
In addition: Warning message:
In 1:range(length(data)) :
numerical expression has 2 elements: only the first used
EDIT:
Adding the Python code I am trying to adapt to R:
# convert history into inputs and outputs
def to_supervised(train, n_input, n_out):
    X, y = list(), list()
    in_start = 0
    # step over the entire history one time step at a time
    for _ in range(len(data)):
        # define the end of the input sequence
        in_end = in_start + n_input
        out_end = in_end + n_out
        # ensure we have enough data for this instance
        if out_end <= len(data):
            x_input = data[in_start:in_end, 0]
            x_input = x_input.reshape((len(x_input), 1))
            X.append(x_input)
            y.append(data[in_end:out_end, 0])
        # move along one time step
        in_start += 1
    return array(X), array(y)
Here are two approaches. Also see Lagging time series data
1) Normally in R one takes a whole-object approach rather than iterating over indexes. Assuming inputs v, k1 and k2, we compute e as the sliding matrix with k1 + k2 columns. The first k1 columns form the first matrix and the remaining columns the second.
# inputs
v <- 1:12 # 1, 2, ..., 12
k1 <- k2 <- 3
k <- k1 + k2
e <- embed(v, k)[, k:1]
ik1 <- 1:k1
e[, ik1]
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 2 3 4
## [3,] 3 4 5
## [4,] 4 5 6
## [5,] 5 6 7
## [6,] 6 7 8
## [7,] 7 8 9
e[, -ik1]
## [,1] [,2] [,3]
## [1,] 4 5 6
## [2,] 5 6 7
## [3,] 6 7 8
## [4,] 7 8 9
## [5,] 8 9 10
## [6,] 9 10 11
## [7,] 10 11 12
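If you want (1) packaged with the same signature as the question's function, a minimal wrapper could look like this (my addition; create_dataset_embed is a hypothetical name, and the sketch assumes data is a plain vector):

```r
# wrap the embed() idiom from (1) in a function taking the question's arguments
create_dataset_embed <- function(data, n_input, n_out) {
  k <- n_input + n_out
  e <- embed(data, k)[, k:1, drop = FALSE]   # sliding windows, oldest first
  list(X = e[, 1:n_input, drop = FALSE],     # first n_input columns
       Y = e[, -(1:n_input), drop = FALSE])  # remaining n_out columns
}
create_dataset_embed(1:12, 3, 3)
```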
2) Regarding the R code in the question:
- in R the range function takes a vector and returns a 2-element vector of its minimum and maximum, so it is not what is wanted in the for loop; use seq_along instead
- indexes in R start at 1 rather than 0
- the return value of a function must be a single object; we return a two-element list of matrices
- iteratively appending to an object is inefficient in R; this can be addressed by preallocating the result or by not using a loop, but we don't address it below since we already have a better implementation in (1)
- there was inconsistent naming of variables in the question's code
Although this entire approach is not how one would normally write R software, in order to make the minimal changes to get it to work we can write the following.
# data is plain vector, n_input and n_out are scalars
# result is 2 element list of matrices
create_dataset <- function(data, n_input, n_out){
  X <- matrix(nrow = 0, ncol = n_input)
  Y <- matrix(nrow = 0, ncol = n_out)
  in_start <- 0
  for (i in seq_along(data)) {
    # define the end of the input sequence
    in_end <- in_start + n_input
    out_end <- in_end + n_out
    if (out_end <= length(data)) {
      X <- rbind(X, data[(in_start+1):in_end])
      Y <- rbind(Y, data[(in_end+1):out_end])
    }
    # move along one time step
    in_start <- in_start + 1
  }
  list(X, Y)
}
# inputs defined in (1)
create_dataset(v, k1, k2)
giving this two-element list of matrices:
[[1]]
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 4
[3,] 3 4 5
[4,] 4 5 6
[5,] 5 6 7
[6,] 6 7 8
[7,] 7 8 9
[[2]]
[,1] [,2] [,3]
[1,] 4 5 6
[2,] 5 6 7
[3,] 6 7 8
[4,] 7 8 9
[5,] 8 9 10
[6,] 9 10 11
[7,] 10 11 12
I have a n x 3 x m array, call it I. It contains 3 columns, n rows (say n=10), and m slices. I have a computation that must be done to replace the third column in each slice based on the other 2 columns in the slice.
I've written a function insertNewRows(I[,,simIndex]) that takes a given slice and replaces the third column. The following for-loop does what I want, but it's slow. Is there a way to speed this up by using one of the apply functions? I cannot figure out how to get them to work in the way I'd like.
for (simIndex in 1:m) {
  I[,, simIndex] = insertNewRows(I[,, simIndex])
}
I can provide more details on insertNewRows if needed, but the short version is that it takes a probability based on the columns I[,1:2, simIndex] of a given slice of the array, and generates a binomial RV based on the probability.
It seems like one of the apply functions should work just by using
I = apply(FUN = insertNewRows, MARGIN = c(1,2,3)) but that just produces gibberish?
Thank you in advance!
The question has not defined the input, the transformation, or the result, so we can't really answer it, but here is an example of adding a row of ones to a[,,i] for each i; maybe that will suggest how you could solve the problem yourself.
This is how you could do it with sapply, apply, plyr::aaply, reshaping using matrix/aperm, and abind::abind.
# input array and function
a <- array(1:24, 2:4)
f <- function(x) rbind(x, 1) # append a row of 1's
aa <- array(sapply(1:dim(a)[3], function(i) f(a[,,i])), dim(a) + c(1,0,0))
aa2 <- array(apply(a, 3, f), dim(a) + c(1,0,0))
aa3 <- aperm(plyr::aaply(a, 3, f), c(2, 3, 1))
aa4 <- array(rbind(matrix(a, dim(a)[1]), 1), dim(a) + c(1,0,0))
aa5 <- abind::abind(a, array(1, dim(a)[2:3]), along = 1)
dimnames(aa3) <- dimnames(aa5) <- NULL
sapply(list(aa2, aa3, aa4, aa5), identical, aa)
## [1] TRUE TRUE TRUE TRUE
aa[,,1]
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
## [3,] 1 1 1
aa[,,2]
## [,1] [,2] [,3]
## [1,] 7 9 11
## [2,] 8 10 12
## [3,] 1 1 1
aa[,,3]
## [,1] [,2] [,3]
## [1,] 13 15 17
## [2,] 14 16 18
## [3,] 1 1 1
aa[,,4]
## [,1] [,2] [,3]
## [1,] 19 21 23
## [2,] 20 22 24
## [3,] 1 1 1
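Since the question mentions a binomial draw based on columns 1:2, here is a sketch with a made-up insertNewRows (my invention, as the real function is not shown) illustrating that the call you want is apply(..., MARGIN = 3), one call per slice, rather than MARGIN = c(1,2,3):

```r
# hypothetical insertNewRows: replace column 3 with a binomial draw whose
# probability is derived from columns 1 and 2 (stand-in for the real function)
insertNewRows <- function(slice) {
  p <- plogis(slice[, 1] + slice[, 2])   # some probability from cols 1:2
  slice[, 3] <- rbinom(nrow(slice), 1, p)
  slice
}

set.seed(1)
I <- array(runif(10 * 3 * 4), c(10, 3, 4))       # n = 10 rows, 3 cols, m = 4 slices
I2 <- array(apply(I, 3, insertNewRows), dim(I))  # MARGIN = 3: one call per slice
```

apply(I, 3, f) flattens each transformed slice into a column of the result, so wrapping it back in array(..., dim(I)) restores the original shape.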
I am trying to solve the following problem in R. Generically, given an interval [a,b], I want to generate sequences of length n that start at a and end at b, whose consecutive elements differ by at least d.
I was thinking of using seq() but you can only create evenly-spaced sequences using this function.
This may be what you are after: generate all permutations of the possible gap values for a sequence of size n, then keep those whose terminal value is b.
This is quite intensive and slow for larger vectors, but it should return all possible valid sequences (unless I've made a mistake).
# the sequence has length n and includes both a and b,
# so we need n - 1 differences (then check that the last value of cumsum == b)
# each difference must be at least d
# a difference is largest when all the other n - 2 differences are at the minimum d
library(gtools)
library(matrixStats)
# parameters
a = 1
b = 20
n = 5
d = 2
# possible values that differences can be
poss_diffs <- d:(b - a - ((n - 2) * d))
# generate all possible permutations of differences
diff_perms_n <- permutations(n = length(poss_diffs), r = n - 1, v = poss_diffs)
# turn differences into sequences, add column for the a value
seqs_n <- matrixStats::rowCumsums(cbind(a, diff_perms_n))
# filter to only valid sequences, last column == b
valid_seqs <- seqs_n[seqs_n[, ncol(seqs_n)] == b, ]
# check that diffs are all greater than d
valid_seqs_diffs <- matrixStats::rowDiffs(valid_seqs)
print(head(valid_seqs))
print(head(valid_seqs_diffs))
# > print(head(valid_seqs))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 3 6 10 20
# [2,] 1 3 6 11 20
# [3,] 1 3 6 12 20
# [4,] 1 3 6 14 20
# [5,] 1 3 6 15 20
# [6,] 1 3 6 16 20
# > print(head(valid_seqs_diffs))
# [,1] [,2] [,3] [,4]
# [1,] 2 3 4 10
# [2,] 2 3 5 9
# [3,] 2 3 6 8
# [4,] 2 3 8 6
# [5,] 2 3 9 5
# [6,] 2 3 10 4
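For comparison, here is a base-R sketch of the same idea using expand.grid instead of gtools::permutations (my addition; note that unlike permutations(..., repeats.allowed = FALSE) above, this version also allows repeated gap sizes):

```r
a <- 1; b <- 20; n <- 5; d <- 2
poss_diffs <- d:(b - a - (n - 2) * d)  # each of the n - 1 gaps is at least d
# all (n - 1)-tuples of gaps, repeats allowed
gaps <- as.matrix(do.call(expand.grid, rep(list(poss_diffs), n - 1)))
seqs <- t(apply(cbind(a, gaps), 1, cumsum))    # prepend a, accumulate the gaps
valid <- seqs[seqs[, n] == b, , drop = FALSE]  # keep sequences ending at b
head(valid)
```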
I've seen a few solutions to similar problems, but they all require iteration over the number of items to be added together.
Here's my goal: from a list of numbers, find all of the combinations (without replacement) that add up to a certain total. For example, if I have numbers 1,1,2,3,5 and total 5, it should return 5,2,3, and 1,1,3.
I was trying to use combn but it required you to specify the number of items in each combination. Is there a way to do it that allows for solution sets of any size?
This is precisely what comboGeneral/permuteGeneral from RcppAlgos (I am the author) were built for. Since we have repetition of specific elements in our sample vector, we will be finding combinations of multisets that meet our criteria. Note that this is different from the more common case of generating combinations with repetition, where each element is allowed to be repeated m times. For many combination-generating functions, multisets pose problems, as duplicates are introduced and must be dealt with; this can become a bottleneck in your code if the data is decently large. The functions in RcppAlgos handle these cases efficiently without creating any duplicate results. I should also mention that there are a couple of other great libraries that handle multisets quite well: multicool and arrangements.
Moving on to the task at hand, we can utilize the constraint arguments of comboGeneral to find all combinations of our vector that meet a specific criteria:
vec <- c(1,1,2,3,5) ## using variables from #r2evans
uni <- unique(vec)
myRep <- rle(vec)$lengths
ans <- 5
library(RcppAlgos)
lapply(seq_along(uni), function(x) {
  comboGeneral(uni, x, freqs = myRep,
               constraintFun = "sum",
               comparisonFun = "==",
               limitConstraints = ans)
})
[[1]]
[,1]
[1,] 5
[[2]]
[,1] [,2]
[1,] 2 3
[[3]]
[,1] [,2] [,3]
[1,] 1 1 3
[[4]]
[,1] [,2] [,3] [,4] ## no solutions of length 4
These functions are highly optimized and extend well to larger cases. For instance, consider the following example, which involves over 30 million total combinations:
## N.B. Using R 4.0.0 with new updated RNG introduced in 3.6.0
set.seed(42)
bigVec <- sort(sample(1:30, 40, TRUE))
rle(bigVec)
Run Length Encoding
lengths: int [1:22] 2 1 2 3 4 1 1 1 2 1 ...
values : int [1:22] 1 2 3 4 5 7 8 9 10 11 ...
bigUni <- unique(bigVec)
bigRep <- rle(bigVec)$lengths
bigAns <- 199
len <- 12
comboCount(bigUni, len, freqs = bigRep)
[1] 32248100
All 280,000+ results are returned very quickly:
system.time(bigTest <- comboGeneral(bigUni, len, freqs = bigRep,
                                    constraintFun = "sum",
                                    comparisonFun = "==",
                                    limitConstraints = bigAns))
user system elapsed
0.273 0.004 0.271
head(bigTest)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 1 2 3 4 25 26 26 26 27 28 30
[2,] 1 1 2 3 5 24 26 26 26 27 28 30
[3,] 1 1 2 3 5 25 25 26 26 27 28 30
[4,] 1 1 2 3 7 24 24 26 26 27 28 30
[5,] 1 1 2 3 7 24 25 25 26 27 28 30
[6,] 1 1 2 3 7 24 25 26 26 26 28 30
nrow(bigTest)
[1] 280018
all(rowSums(bigTest) == bigAns)
[1] TRUE
Addendum
I must mention that generally when I see a problem like "find all combinations that sum to a particular number," my first thought is integer partitions. For example, the related problem Getting all combinations which sum up to 100 in R can easily be solved with the partitions library. However, this approach does not extend to the general case (as we have here) where the vector contains specific repetition, or where the vector contains values that don't easily convert to an integer equivalent (e.g. the vector (0.1, 0.2, 0.3, 0.4) can easily be treated as 1:4, but treating c(3.98486, 7.84692, 0.0038937, 7.4879) as integers and applying an integer-partitions approach would require an extravagant amount of computing power, rendering the method useless).
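For completeness, here is a package-free base-R enumerator for the small example (my addition; a plain recursive subset-sum over the multiset, which assumes strictly positive values so the search can be pruned once the running sum reaches the target):

```r
# enumerate all distinct sub-multisets of x (positive values) summing to target
subset_sums <- function(x, target) {
  res <- list()
  recurse <- function(i, chosen, s) {
    if (s == target) res[[length(res) + 1]] <<- chosen
    if (i > length(x) || s >= target) return(invisible(NULL))  # prune
    recurse(i + 1, c(chosen, x[i]), s + x[i])  # take x[i]
    recurse(i + 1, chosen, s)                  # skip x[i]
  }
  recurse(1, numeric(0), 0)
  unique(lapply(res, sort))  # drop duplicates caused by repeated elements
}
subset_sums(c(1, 1, 2, 3, 5), 5)
```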
I took your combn idea and looped over the possible sizes of the sets.
func = function(x, total){
  M = length(x)
  y = NULL
  for (m in 1:M){
    tmp = combn(x, m)
    ind = which(colSums(tmp) == total)
    if (length(ind) > 0){
      for (j in 1:length(ind))
        y = c(y, list(tmp[, ind[j]]))
    }
  }
  return(unique(lapply(y, sort)))
}
x = c(1,1,2,3,5,8,13)
x = c(1,1,2,3,5,8,13)
> func(x, 15)
[[1]]
[1] 2 13
[[2]]
[1] 1 1 13
[[3]]
[1] 2 5 8
[[4]]
[1] 1 1 5 8
[[5]]
[1] 1 1 2 3 8
Obviously, this will have problems as M grows, since tmp gets big pretty quickly and the length of y can't (maybe?) be pre-determined.
Similar to mickey's answer, we can use combn inside another looping mechanism. I'll use lapply:
vec <- c(1,1,2,3,5)
ans <- 5
Filter(length, lapply(seq_len(length(vec)), function(i) {
  v <- combn(vec, i)
  v[, colSums(v) == ans, drop = FALSE]
}))
# [[1]]
# [,1]
# [1,] 5
# [[2]]
# [,1]
# [1,] 2
# [2,] 3
# [[3]]
# [,1]
# [1,] 1
# [2,] 1
# [3,] 3
You can omit the Filter(length, ...) portion, though the result may then contain a number of empty matrices. They're easy enough to deal with and ignore; I just thought removing them would be aesthetically preferable.
This method gives you a matrix per size with one candidate in each column, so
ans <- 4
Filter(length, lapply(seq_len(length(vec)), function(i) {
  v <- combn(vec, i)
  v[, colSums(v) == ans, drop = FALSE]
}))
# [[1]]
# [,1] [,2]
# [1,] 1 1
# [2,] 3 3
# [[2]]
# [,1]
# [1,] 1
# [2,] 1
# [3,] 2
If duplicates are a problem, you can always do:
Filter(length, lapply(seq_len(length(vec)), function(i) {
  v <- combn(vec, i)
  v <- v[, colSums(v) == ans, drop = FALSE]
  v[, !duplicated(t(v)), drop = FALSE]
}))
# [[1]]
# [,1]
# [1,] 1
# [2,] 3
# [[2]]
# [,1]
# [1,] 1
# [2,] 1
# [3,] 2
Now here is a solution involving gtools:
# Creating lists of all permutations of the vector x
df1 <- gtools::permutations(n=length(x),r=length(x),v=1:length(x),repeats.allowed=FALSE)
ls1 <- list()
for(j in 1:nrow(df1)) ls1[[j]] <- x[df1[j,1:ncol(df1)]]
# Taking all cumulative sums and filtering entries equaling our magic number
sumsCum <- t(vapply(1:length(ls1), function(j) cumsum(ls1[[j]]), numeric(length(x))))
indexMN <- which(sumsCum == magicNumber, arr.ind = T)
finalList <- list()
for (j in 1:nrow(indexMN)){
  magicRow <- indexMN[j, 1]
  magicCol <- 1:indexMN[j, 2]
  finalList[[j]] <- ls1[[magicRow]][magicCol]
}
finalList <- unique(finalList)
where x = c(1,1,2,3,5) and magicNumber = 5. This is a first draft, I am sure it can be improved here and there.
Not the most efficient but the most compact so far. Padding x with n - 1 zeros lets a single combn(..., n) call cover subsets of every size up to n; non-matching combinations yield numeric(0), which unique() collapses to one entry that the final [-1] drops:
x <- c(1,1,2,3,5)
n <- length(x)
res <- 5
unique(combn(c(x,rep(0,n-1)), n, function(x) x[x!=0][sum(x)==res], FALSE))[-1]
# [[1]]
# [1] 1 1 3
#
# [[2]]
# [1] 2 3
#
# [[3]]
# [1] 5
#
I have a matrix of 7000 rows x 160 columns. I want to average each block of 20 values in a row down to one value, i.e. avg(1:20) = 1st new value, avg(21:40) = 2nd new value, ..., avg(141:160) = 8th and last new value for row 1, and the same for all rows, so at the end my matrix will be 7000 x 8 (160/20 = 8). What is the fastest way to achieve this in R?
e.g. at the end:
1st 2 4 5 6 7 7 9 4
2nd 3 6 5 3 6 7 4 3
...............
7000th 5 6 7 4 5 6 7 6
I tried this; it works, but it is too slow!
res <- matrix(T, nrow = 7000, ncol = 8)
for (i in 1:nrow(m)){
  s <- 0
  k <- 1
  for (j in 1:ncol(m)){
    s <- s + m[i, j]
    if (j %% 20 == 0){
      a <- s/20
      res[i, k] <- a
      k <- k + 1
      s <- 0
    }
  }
}
Thank you.
# example - you have this already
set.seed(1) # for reproducible example
M <- matrix(rnorm(7000*160),nc=160)
# you start here...
indx <- rep(1:8, each = 20)
result <- sapply(1:8, function(i) rowMeans(M[, which(indx == i)]))
head(result)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,] 0.2127915 -0.38038950 -0.087656347 0.05933375 -0.23819112 0.03943897 -0.008970226 0.03841767
# [2,] 0.3548025 0.31491967 0.144773998 -0.05972595 -0.17191220 0.04243383 0.047314127 -0.16848104
# [3,] -0.2559990 0.35942642 0.003344486 0.23424747 0.09022379 0.58685507 -0.157652263 -0.25611335
# [4,] 0.3723693 0.23901787 -0.304657019 0.41620451 0.26005406 0.09726225 -0.434833656 0.07112657
# [5,] 0.4457805 0.08682639 0.048011727 0.15753612 0.30271061 -0.05484104 -0.103921787 -0.12066903
# [6,] -0.2823111 -0.00243217 0.055399402 0.31365508 0.17940294 0.26896135 -0.439110424 -0.30403590
Another way using rowsum and jlhoward's data ("M"):
n = 20
ans = t(rowsum(t(M), rep(seq_len(ncol(M) / n), each = n))) / n
head(ans)
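One more equivalent (my addition, base R only): block-averaging the columns is a single matrix multiplication by a 0/1 grouping matrix divided by the block size, which tends to be very fast for dense numeric data:

```r
set.seed(1)
M <- matrix(rnorm(7000 * 160), ncol = 160)       # same shape as the question
n <- 20
grp <- rep(seq_len(ncol(M) / n), each = n)       # 1,1,...,1, 2,2,... group labels
G <- outer(grp, seq_len(ncol(M) / n), "==") / n  # 160 x 8 averaging matrix
result2 <- M %*% G                               # 7000 x 8 of block means
```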
I have a 7x7 matrix:
Mat<-matrix(nrow=7,ncol=7)
With certain elements:
Mat[2,2]<-37
Mat[2,4]<-39
Mat[2,6]<-24
Mat[4,2]<-35
Mat[4,4]<-36
Mat[4,6]<-26
Mat[6,2]<-26
Mat[6,4]<-31
Mat[6,6]<-39
I am generating random elements and want to test whether they add up to the specified values.
I have written the following code:
TF<-c()
TF[1]<-isTRUE(Mat[2,2]==sum(Mat[1,1],Mat[1,2],Mat[1,3],Mat[2,1],Mat[2,3],Mat[3,1],Mat[3,2],Mat[3,3]))
TF[2]<-isTRUE(Mat[2,4]==sum(Mat[1,3],Mat[1,4],Mat[1,5],Mat[2,3],Mat[2,5],Mat[3,3],Mat[3,4],Mat[3,5]))
TF[3]<-isTRUE(Mat[2,6]==sum(Mat[1,5],Mat[1,6],Mat[1,7],Mat[2,5],Mat[2,7],Mat[3,5],Mat[3,6],Mat[3,7]))
TF[4]<-isTRUE(Mat[4,2]==sum(Mat[3,1],Mat[3,2],Mat[3,3],Mat[4,3],Mat[4,5],Mat[5,1],Mat[5,2],Mat[5,3]))
TF[5]<-isTRUE(Mat[4,4]==sum(Mat[3,3],Mat[3,4],Mat[3,5],Mat[4,3],Mat[4,5],Mat[5,3],Mat[5,4],Mat[5,5]))
TF[6]<-isTRUE(Mat[4,6]==sum(Mat[3,5],Mat[3,6],Mat[3,7],Mat[4,5],Mat[4,7],Mat[5,5],Mat[5,6],Mat[5,7]))
TF[7]<-isTRUE(Mat[6,2]==sum(Mat[5,1],Mat[5,2],Mat[5,3],Mat[6,1],Mat[6,3],Mat[7,1],Mat[7,2],Mat[7,3]))
TF[8]<-isTRUE(Mat[6,4]==sum(Mat[5,3],Mat[5,4],Mat[5,5],Mat[6,3],Mat[6,5],Mat[7,3],Mat[7,4],Mat[7,5]))
TF[9]<-isTRUE(Mat[6,6]==sum(Mat[5,5],Mat[5,6],Mat[5,7],Mat[6,5],Mat[6,7],Mat[7,5],Mat[7,6],Mat[7,7]))
Now I am trying to make it more efficient with a nested for loop:
O <- c(2, 4, 6)
for (G in O) {
  for (H in O) {
    TF[] <- isTRUE(Mat[G,H] == sum(Mat[G-1,H-1], Mat[G-1,H], Mat[G-1,H+1],
                                   Mat[G,H-1], Mat[G,H+1],
                                   Mat[G+1,H-1], Mat[G+1,H], Mat[G+1,H+1]))
  }
}
The problem is that the vector element gets overwritten on every iteration, and it does not make sense to add another for loop.
I also cannot find a way to rerun the simulation if a FALSE is found.
Let's start first by answering the following question:
How do you compute the sum of every surrounding cell for each cell in a matrix?
This is actually not trivial as far as I can tell (curious to see if anyone else comes up with something cool). Here is a potential solution, though not even close to being succinct. Let's start by looking at the results of the function. We create matrices of only 1s so we can check that the results make sense (corners should add to 3 since they have only three contiguous cells, interior cells to 8, etc.):
> compute_neighb_sum(matrix(1, nrow=3, ncol=3))
[,1] [,2] [,3]
[1,] 3 5 3
[2,] 5 8 5
[3,] 3 5 3
> compute_neighb_sum(matrix(1, nrow=3, ncol=5))
[,1] [,2] [,3] [,4] [,5]
[1,] 3 5 5 5 3
[2,] 5 8 8 8 5
[3,] 3 5 5 5 3
> compute_neighb_sum(matrix(1, nrow=7, ncol=7))
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 3 5 5 5 5 5 3
[2,] 5 8 8 8 8 8 5
[3,] 5 8 8 8 8 8 5
[4,] 5 8 8 8 8 8 5
[5,] 5 8 8 8 8 8 5
[6,] 5 8 8 8 8 8 5
[7,] 3 5 5 5 5 5 3
This works!
Now, let's answer your actual question:
compute_neighb_sum(mx) == mx
and this should return TRUE for every cell that equals the sum of its surroundings. Let's confirm:
mx <- matrix(1, nrow = 7, ncol = 7)
mx[cbind(c(3, 6), c(3, 6))] <- 8 # make two interior cells equal 8, the sum of their surroundings
which(compute_neighb_sum(mx) == mx, arr.ind = TRUE) # look at `mx` to see what's going on
Sure enough, we get back the coordinates that we expect:
row col
[1,] 3 3
[2,] 6 6
Now, here is the function:
compute_neighb_sum <- function(mx) {
  mx.ind <- cbind(                 # a 2-wide matrix of all possible indices in the input
    rep(seq.int(nrow(mx)), ncol(mx)),
    rep(seq.int(ncol(mx)), each = nrow(mx))
  )
  sum_neighb_each <- function(x) {
    near.ind <- cbind(             # for each x, y coord, index all surrounding values
      rep(x[[1]] + -1:1, 3),
      rep(x[[2]] + -1:1, each = 3)
    )
    near.ind.val <- near.ind[      # drop out-of-bound values and the x, y coord itself
      !(
        near.ind[, 1] < 1 | near.ind[, 1] > nrow(mx) |
        near.ind[, 2] < 1 | near.ind[, 2] > ncol(mx) |
        (near.ind[, 1] == x[[1]] & near.ind[, 2] == x[[2]])
      ),
    ]
    sum(mx[near.ind.val])          # now sum the surrounding cell values
  }
  `dim<-`(                         # return in the same matrix format as the input
    sapply(
      split(mx.ind, row(mx.ind)),  # for each x, y coordinate in input mx
      sum_neighb_each              # compute the neighbor sum
    ),
    c(nrow(mx), ncol(mx))          # dimensions of input
  )
}
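For what it's worth, here is a shorter alternative sketch (my addition, not part of the answer above): zero-pad the matrix and sum the eight shifted copies, which avoids the per-cell index bookkeeping entirely:

```r
# neighbor sums via zero-padding and summing the 8 shifted submatrices
neighb_sum_shift <- function(mx) {
  p <- matrix(0, nrow(mx) + 2, ncol(mx) + 2)  # pad with a border of zeros
  p[2:(nrow(mx) + 1), 2:(ncol(mx) + 1)] <- mx
  out <- matrix(0, nrow(mx), ncol(mx))
  for (dr in -1:1) for (dc in -1:1) {
    if (dr == 0 && dc == 0) next              # skip the cell itself
    out <- out + p[2:(nrow(mx) + 1) + dr, 2:(ncol(mx) + 1) + dc]
  }
  out
}
neighb_sum_shift(matrix(1, 3, 3))
```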