Sliding window function in Julia

I am looking to take a collection and slide a window of length 'w' and step size 's' over it to get many sub collections.
I have seen Base.Iterators.partition but that does not allow sliding by less than the window (or partition) length.
I have written something myself that works but I expect there is already a function that does this and I just haven't found it yet.

Assuming z is your Vector, s is your step size, and w is your window size, simply do:
((@view z[i:i+w-1]) for i in 1:s:length(z)-w+1)
Example:
z = collect(1:10)
for e in ((@view z[i:i+4]) for i in 1:2:length(z)-4)
    # do something, e.g. display(e)
end
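For comparison, the same index arithmetic can be sketched in Python. This is only an illustration of the start-index rule used above, not a Julia API: the name sliding_windows is made up for this example, and Python slices copy the data, whereas @view in Julia does not.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(seq: Sequence[T], w: int, s: int = 1) -> Iterator[Sequence[T]]:
    """Yield every full window of length w, advancing the start by s each time."""
    if w <= 0 or s <= 0:
        raise ValueError("window length and step must be positive")
    # The last valid 0-based start is len(seq) - w, which mirrors the
    # 1-based range 1:s:length(z)-w+1 in the Julia generator.
    for start in range(0, len(seq) - w + 1, s):
        yield seq[start:start + w]


z = list(range(1, 11))
windows = list(sliding_windows(z, w=5, s=2))
print(windows)  # [[1, 2, 3, 4, 5], [3, 4, 5, 6, 7], [5, 6, 7, 8, 9]]
```

Trailing elements that do not fill a complete window are dropped, just as in the Julia generator.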

I just found IterTools.jl; it has a partition function with a custom step size.
julia> using IterTools
julia> for i in partition(1:9, 3, 2)
           @show i
       end
i = (1, 2, 3)
i = (3, 4, 5)
i = (5, 6, 7)
i = (7, 8, 9)

Have you looked at RollingFunctions? It seems to do what you're looking for: it has rolling and running functions, which take a function, a vector, and a window size as input and return the result of applying the function over successive windows.

Related

How to find the index of an array, where the element has value x, in R

I have a very large array (RFO_2003; dim = c(360, 180, 13, 12)) of numeric data. I made the array using a for-loop that does some calculations based on another array. I am trying to check some samples of data in this array to ensure I have generated it properly.
To do this, I want to apply a function that returns the index of the array where that element equals a specific value. For example, I want to start by looking at a few examples where the value == 100.
I tried
which(RFO_2003 == 100)
That returned (first line of results)
[1] 459766 460208 460212 1177802 1241374 1241498 1241499 1241711 1241736 1302164 1302165
match gave the same results. What I was expecting was something more like
[8, 20, 3, 6], [12, 150, 4, 7], [16, 170, 4, 8]
Is there a way to get the indices in that format?
My searches have turned up solutions in other languages, plenty of material on vectors, and cases where the index is never output but is immediately fed into another part of a custom function, so I can't tell which part would produce the index in a form I understand. One such question also returns dimnames rather than an index.
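The numbers returned by which() are linear indices into the array, which R stores in column-major order (the first dimension varies fastest). In R itself, which(RFO_2003 == 100, arr.ind = TRUE) returns one row of subscripts per match. The conversion behind that can be sketched as follows; the function name linear_to_subscripts is hypothetical and the code is written in Python purely to show the arithmetic.

```python
from typing import Sequence, Tuple


def linear_to_subscripts(index: int, dims: Sequence[int]) -> Tuple[int, ...]:
    """Convert a 1-based, column-major linear index into 1-based subscripts."""
    total = 1
    for d in dims:
        total *= d
    if not 1 <= index <= total:
        raise IndexError("linear index out of range")
    remainder = index - 1  # switch to 0-based arithmetic
    subscripts = []
    for d in dims:
        # The first dimension varies fastest in column-major order.
        remainder, position = divmod(remainder, d)
        subscripts.append(position + 1)
    return tuple(subscripts)


dims = (360, 180, 13, 12)
print(linear_to_subscripts(459766, dims))  # (46, 18, 8, 1)
```

So the first value in the output above, 459766, corresponds to the element RFO_2003[46, 18, 8, 1].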

Is there a native R syntax to extract rows of an array?

Imagine I have an array in R with N dimensions (a matrix would be an array with 2 dimensions) and I want to select the rows from 1 to n of my array. I was wondering if there was a syntax to do this in R without knowing the number of dimensions.
Indeed, I can do
x = matrix(0, nrow = 10, ncol = 2)
x[1:5, ] # to take the 5 first rows of a matrix
x = array(0, dim = c(10, 2, 3))
x[1:5, , ] # to take the 5 first rows of a 3D array
So far I haven't found a way to use this kind of writing to extract rows of an array without knowing its number of dimensions (obviously if I knew the number of dimensions I would just have to put as many commas as needed). The following snippet works but does not seem to be the most native way to do it:
x = array(0, dim = c(10, 2, 3, 4))
apply(x, 2:length(dim(x)), function(y) y[1:5])
Is there a more R way to achieve this?
Your apply solution is the best, actually.
apply(x, 2:length(dim(x)), `[`, 1:5)
or even better, as @RuiBarradas pointed out (please vote for his comment too!):
apply(x, -1, `[`, 1:5)
Coming from Lisp, I can say that R is very lispy.
The apply solution is a very lispy solution,
and therefore it is very R-ish (it follows the functional programming paradigm).
Function slice.index() is easily overlooked (as I know to my cost! see magic::arow()) but can be useful in this case:
x <- array(runif(60), dim = c(10, 2, 3))
array(x[slice.index(x,1) %in% 1:5],c(5,dim(x)[-1]))
HTH, Robin

I have a list and I want to print a range of its content with range and a for loop

I have the following list in Python:
items = [5,4,12,7,15,9]
and I want to print in this form:
9,15,7,12,4
How can I do that?
numbers_list = [5, 4, 12, 7, 15, 9]
for index in range(len(numbers_list)):
    # -(index + 1) runs through -1, -2, ..., -len(numbers_list)
    print(numbers_list[-(index + 1)])
Not sure if it's very "Pythonic".
Because the list indices are negated, you can access the elements in reverse order.
The last element of a Python list is at index -1, and so on down to the first element at index -len(list). One is added to the index before negating because negative indices start at -1, not at 0.
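The output requested in the question, 9,15,7,12,4, leaves out the first element (5). A minimal sketch that produces exactly that with range and a for loop, assuming the intent is "everything except index 0, in reverse":

```python
items = [5, 4, 12, 7, 15, 9]

selected = []
# Walk the indices from the last one down to 1. The stop value 0 is
# excluded by range, so the first element (5) is skipped.
for i in range(len(items) - 1, 0, -1):
    selected.append(items[i])

print(",".join(str(n) for n in selected))  # 9,15,7,12,4
```

Changing the stop value from 0 to -1 would include the first element as well.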
Using reversed and str.join:
numbers = [5, 4, 12, 7, 15, 9]
print(",".join(str(n) for n in reversed(numbers))) # 9,15,7,12,4,5
str.join performs far better than building your own string with mystring += "something". The question "How slow is Python's string concatenation vs. str.join?" provides interesting insights about this.
I could also write a list comprehension to build an intermediate list like this:
reversed_string = [str(n) for n in reversed(numbers)]
print(",".join(reversed_string))
but a list comprehension keeps two lists in memory at once (the original one and the "stringified" one). A generator expression computes the elements on demand for str.join, much as a classic iterator would. In CPython, str.join builds an internal list from a generator argument anyway, so the practical memory saving is small.

converting a for cycle in R

I would like to convert a for cycle into a faster operation such as apply.
Here is my code
for (a in 1:dim(k)[1]) {
    for (b in 1:dim(k)[2]) {
        if ((k[a, b, 1, 1] == 0) && (k[a, b, 1, 2] == 0) && (k[a, b, 1, 3] == 0)) {
            k[a, b, 1, 1] <- 1
            k[a, b, 1, 2] <- 1
            k[a, b, 1, 3] <- 1
        }
    }
}
It's simple code that checks each element of the multidimensional array k and, if the three values are all equal to 0, assigns them the value 1.
Is there a way to make it faster? The matrix k has 1,444,000 elements and it takes too long to run. Can anyone help?
Thanks
With apply you can return all your 3-combinations as a numeric vector and then check for your specific condition:
# This creates an array with the same properties as yours
array <- array(data = sample(c(0, 1), 81, replace = TRUE,
                             prob = c(0.9, 0.1)), c(3, 3, 3, 3))
# This loops over all vectors in the fourth dimension and returns a
# vector of ones if your condition is met
apply(array, MARGIN = c(1, 2, 3), FUN = function(x) {
    if (sum(x) == 0 && length(unique(x)) == 1)
        return(c(1, 1, 1))
    else
        return(x)
})
Note that the MARGIN argument specifies the dimensions over which to loop. You want the fourth dimension vectors so you specify c(1, 2, 3).
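If you then assign this newly created array back to the old one, every vector that meets the condition is replaced with ones. Be aware that apply places the values returned by FUN along the first dimension of its result, so the dimensions have to be permuted back before assigning, e.g. with aperm(result, c(2, 3, 4, 1)), where result is the value returned by apply.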
You should first use the filter function twice (composed), and then the apply (lapply?) function on the filtered array. Maybe you can also reduce the array, because it looks like you're not very interested in the third dimension (always accessing the 1st item). You should probably do some reading about functional programming in R here http://adv-r.had.co.nz/Functionals.html
Note that I'm not an R programmer, but I'm quite familiar with functional programming (Haskell etc.), so this might give you an idea. It might be faster, but that depends on how R is designed (lazy or eager evaluation, etc.).

Completely stumped about this error: Error in seq.default(start.at, NROW(data), by = by) : wrong sign in 'by' argument

When I run this code, it works for about 100 iterations of the for loop and then throws this error:
Error in seq.default(start.at, NROW(data), by = by) :
wrong sign in 'by' argument
Here is the data that I used, and here is my code...
library(igraph)
library(zoo)
# import network data as edgelist
fake.raw.data <- read.csv("fakedata.csv")
fake.raw.data <- fake.raw.data[, 2:3]
as.matrix(fake.raw.data)
# create igraph object from edgelist data
fgraph <- graph_from_data_frame(fake.raw.data, directed = TRUE)
# finding the shortest paths that go through "special chain"
POI <- list()
df.vertices <- get.data.frame(fgraph, what = "vertices")
list.vertices <- as.list(df.vertices[, 1])
AverageEBForPath <- function(graph = fgraph, from, to, mode = "out", chain) {
    browser()
    asp <- all_shortest_paths(graph, from = from, to = to, mode)$res
    for (i in seq_along(asp)) {
        if (sum(rollapply(names(asp[[i]]), length(chain), identical, chain)) == 1) {
            print(names(asp[[i]]))
        }
    }
}
AverageEBForPath(from = 32, to = V(fgraph), chain = c(32, 15, 9))
If anybody could help that would be extremely appreciated. I have been working on this for days, and I am really stuck.
Looking through the code of rollapply, there's a bit where it works out where in the array to start the rolling. The code it uses is:
start.at <- if (partial < 0)
    max(-min(width[[1]]), 0) + 1
else 1
Note that in the function itself, width is a list generated from the window width that you're trying to use and the alignment you want. Given that you're passing a window width of 3 and a default alignment of "centre", the width list the function has generated for the code above is a list of three integers: [-1, 0, 1]
This means that, using the code above, it has decided that for a centre-aligned window of width 3, the place to start is the second value in the data (because max(-min(width[[1]]), 0) + 1 in the above code evaluates to 2).
All very reasonable, but whilst all of the other instances of asp[[i]] have either 2 or 3 vertices, asp[[100]] has only one vertex (as you rightly pointed out), so it throws a bit of a fit trying to find the second one in order to start rolling through it: seq is asked to count from 2 up to 1 with a positive step, hence the "wrong sign in 'by' argument" error.
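The failure can be reproduced in miniature. The sketch below is a Python illustration, not zoo's actual code: start_index mirrors the quoted start.at rule, r_like_seq is a hypothetical stand-in for R's seq(from, to, by), and it assumes that partial = FALSE corresponds to the partial < 0 branch, as the explanation above implies.

```python
from typing import List, Sequence


def start_index(offsets: Sequence[int], partial: bool) -> int:
    """Mirror the quoted start.at rule; offsets stands in for width[[1]]."""
    # Assumption: partial = FALSE corresponds to the `partial < 0` branch.
    if partial:
        return 1
    return max(-min(offsets), 0) + 1


def r_like_seq(start: int, stop: int, by: int = 1) -> List[int]:
    """Hypothetical stand-in for R's seq(from, to, by); stop is inclusive."""
    if by == 0:
        raise ValueError("this sketch requires a non-zero step")
    if (stop - start) / by < 0:
        raise ValueError("wrong sign in 'by' argument")
    return list(range(start, stop + (1 if by > 0 else -1), by))


centre = (-1, 0, 1)  # offsets for a centre-aligned window of width 3
left = (0, 1, 2)     # offsets for a left-aligned window of width 3

for n_vertices in (3, 1):
    start = start_index(centre, partial=False)
    try:
        print(n_vertices, r_like_seq(start, n_vertices))
    except ValueError as err:
        print(n_vertices, "error:", err)

# With left alignment the start index is 1, so a one-vertex path is fine.
print(r_like_seq(start_index(left, partial=False), 1))
```

A path with three vertices gives the positions [2, 3], while a path with a single vertex raises the same message as the R error, because the start (2) lies beyond the end (1).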
I'm not entirely sure what your function is eventually going to do, so the ball's a bit in your court to work out how best to handle this. I think you've got two options given what you're seeing:
Option 1
Use the partial = TRUE setting on your rollapply, which will just always start at the first vertex no matter what (see the code snippet above!)
Option 2
Use align="left" in your rollapply. In this case, the width list we saw in the rollapply function itself would be [0, 1, 2] for a window width of 3 and start.at would evaluate to 1.
Hope that rambling and convoluted attempt at an answer helps!
