I want to create a double sliding window in a for loop. An example data set might look like:
a <- structure(list(a = c(0.0961136, 0.1028192, 0.1106424, 0.1106424,
0.117348, 0.117348, 0.117348, 0.122936, 0.1307592, 0.1307592,
0.1318768, 0.1318768, 0.1385824, 0.1385824, 0.1318768, 0.1251712,
0.1251712, 0.1251712, 0.1251712, 0.1251712)), .Names = "a", row.names = c(NA,
-20L), class = "data.frame")
The code I have so far looks like this:
windowSize <- 5
windowStep <- 1
dat <- list()
for (i in seq(from = 1, to = nrow(a), by = windowStep)){
window1 <- a[i:windowSize, ]
window2 <- a[i:windowSize + windowSize, ]
if (median(window1) <= 0.12 && (median(window1) >= 0.08)) {
p <- "True"
} else
p <- "not"
dat[[i]] <- c(p)
}
result <- as.data.frame(do.call(rbind, dat))
I need two windows of 5 data points each, one directly ahead of the other, sliding along by 1 data point at a time. The example does not use window2 yet because it doesn't work (I will need it to work eventually). Using just window1 to calculate the median (in this case) at each step runs, but the output is incorrect. The if statement outputs "True" if the median of window1 is between 0.08 and 0.12, and "not" otherwise.
Output of my for loop:
1 True
2 True
3 True
4 True
5 True
6 True
7 True
8 True
9 True
10 not
11 not
12 not
13 not
14 not
15 not
16 not
17 not
18 not
19 not
20 not
The correct output can be checked using rollapply from the zoo package (and can be seen by eye):
rollapply(a, 5, FUN = median, by = 1, by.column = TRUE, partial = TRUE, align = c("left"))
It should be:
1 True
2 True
3 True
4 not
5 not
6 not
7 not
8 not
9 not
10 not
11 not
12 not
13 not
14 not
15 not
16 not
17 not
18 not
19 not
20 not
Could the solution remain a for loop if possible? I have much more to add but need to get this right first. Thanks.
This gets close (modified from https://stats.stackexchange.com/questions/3051/mean-of-a-sliding-window-in-r):
windowSize <- 10
windowStep <- 1
Threshold <- 0.12
# extract the numeric column; as.vector() on a data frame does not give a numeric vector
data <- a$a
slideFunct <- function(data, windowSize, windowStep){
  total <- length(data)
  dataLength <- seq(from=1, to=(total-windowSize), by=windowStep)
  result <- vector(length = length(dataLength))
  for(i in 1:length(dataLength)){
    # note: dataLength[i]:(dataLength[i]+windowSize) spans windowSize + 1 points
    result[i] <- if (median(data[dataLength[i]:(dataLength[i]+windowSize)]) <= Threshold)
      "True"
    else
      "not"
  }
  return(result)
}
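For the original for loop, a likely cause of the wrong output is the indexing: a[i:windowSize, ] always ends at row 5, and i:windowSize + windowSize is parsed as (i:windowSize) + windowSize because : binds more tightly than +. Below is a minimal sketch of two adjacent windows in a for loop. It assumes the data is taken as a plain numeric vector (called x here, a name introduced for illustration) and that only start positions where both windows fit completely are wanted; window2 is built but not yet used, as in the question.

```r
# The 20 values from the question, as a plain numeric vector
x <- c(0.0961136, 0.1028192, 0.1106424, 0.1106424, 0.117348, 0.117348,
       0.117348, 0.122936, 0.1307592, 0.1307592, 0.1318768, 0.1318768,
       0.1385824, 0.1385824, 0.1318768, 0.1251712, 0.1251712, 0.1251712,
       0.1251712, 0.1251712)

windowSize <- 5
windowStep <- 1

# Assumption: keep only start positions where both windows fit in the data
starts <- seq(from = 1, to = length(x) - 2 * windowSize + 1, by = windowStep)
result <- character(length(starts))

for (k in seq_along(starts)) {
  i <- starts[k]
  # Parentheses matter: the end of each window moves together with i
  window1 <- x[i:(i + windowSize - 1)]
  window2 <- x[(i + windowSize):(i + 2 * windowSize - 1)]  # directly ahead of window1
  m1 <- median(window1)
  result[k] <- if (m1 >= 0.08 && m1 <= 0.12) "True" else "not"
}

print(data.frame(start = starts, result = result))
```

If partial windows at the end of the data are also needed, the upper limit of starts and the window ends would have to be capped at length(x).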
Brief description:
I have a matrix based on actions and states (1:25, nrow = 5), and I want to be able to select the upcoming row: whenever I am on some row, no matter the position, I want all the positions in the next row as output (for example, input 8 gives output 4 9 14 19 24). I came up with a function, but whenever I run it I get an error in environment_mat$cellnum: $ operator is invalid for atomic vectors.
Can you maybe help a lad out here?
states <- seq(1,5, by = 1)
actions <- seq(1,5, by = 1)
state_sequence <- cbind(merge(states,states), state = seq(1, length(states)*length(actions)))
environment_mat <- matrix(state_sequence$state, nrow = length(states), ncol= length(actions))
rewards_mat <- matrix(data = c(-100,10,50,16,32,40,-100,80,41,7,50,1,-100,
85,2,16,98,4,-100,8,32,45,95,78,-100), nrow = 5)
environment_mat
nextCells <- function(curCell) {
nexSta <- seq(0, max(states)-1, by = 1)*max(states)+
environment[environment_mat$cellnum == curCell,]$y
return(nexSta)
}
nextCells(24)
As explained above, I tried multiple things, but I cannot come up with any function other than this one.
You may be looking for the modulo operator %%.
nextCells <- function(curCell) {
environment_mat[(which(environment_mat == curCell) - 1L) %% nrow(environment_mat) + 2L,]
}
nextCells(24)
#> [1] 5 10 15 20 25
nextCells(8)
#> [1] 4 9 14 19 24
Or, more simply:
nextCells <- function(curCell) {
environment_mat[(curCell - 1L) %% nrow(environment_mat) + 2L,]
}
nextCells(24)
#> [1] 5 10 15 20 25
nextCells(8)
#> [1] 4 9 14 19 24
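Both versions index one row past the end of the matrix when the current cell is already in the last row. Here is a hedged sketch of a guarded variant: the helper names rowOf and nextCellsSafe are made up for illustration, and returning an empty vector for the last row is an assumption, since the question does not say what should happen there.

```r
# Same layout as in the question: cells 1..25 filled column by column
environment_mat <- matrix(1:25, nrow = 5)

# Row of a cell in a column-major matrix (hypothetical helper)
rowOf <- function(curCell) {
  (curCell - 1L) %% nrow(environment_mat) + 1L
}

# Next row, or an empty vector when there is no next row (assumed behaviour)
nextCellsSafe <- function(curCell) {
  r <- rowOf(curCell)
  if (r == nrow(environment_mat)) return(integer(0))
  environment_mat[r + 1L, ]
}

print(nextCellsSafe(8))   # cell 8 sits in row 3, so row 4 is returned
print(nextCellsSafe(25))  # cell 25 sits in the last row
```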
I know how to generate 100 random numbers in R (without replacement):
random_numbers = sample.int(100, 100, replace = FALSE)
I was now curious about learning how to generate 100 "non random" numbers (without replacement). The first idea that comes to mind is to generate a random number, and then let each next number be the previous number + 1 with probability 0.5, or a fresh random number with probability 0.5. Thus, these numbers are not "fully random".
This was my attempt to write this code for numbers in a range of 0 to 100 (suppose I want to repeat this procedure 100 times):
library(dplyr)
all_games <- vector("list", 100)
for (i in 1:100){
index_i = i
guess_sets <- 1:100
prob_i = runif(n=1, min=1e-12, max=.9999999999)
guess_i = ifelse(prob_i> 0.5, sample.int(1, 100, replace = FALSE), guess_i + 1)
guess_sets_i <- setdiff(guess_sets_i, guess_i)
all_games_i = as.list(index_i, guess_i, all_games_i)
all_games[[i]] <- all_games_i
}
all_games <- do.call("rbind", all_games)
I tried to make a list that stores all guesses such that the range for the next guess automatically excludes numbers that have already been guessed, but I get this error:
Error in sample.int(1, 100, replace = FALSE) :
cannot take a sample larger than the population when 'replace = FALSE'
Ideally, I am trying to get the following results (format doesn't matter):
index_1 : 5,6,51,4,3,88,87,9 ...
index_2 77,78,79,2,65,3,1,99,100,4...
etc.
Can someone please show me how to do this? Are there easier ways in R to generate "non-random numbers"?
Thank you!
Note: I think an extra line of logic needs to be added. Suppose I guess the number 100: the next guess must be a new random number, since 100+1 is outside the original range. Also, if I guess 5, 17, then 4, and the loop then tells me to guess 4+1, this is impossible because 5 has already been guessed. In such a case, I would also have to guess a new random number?
It would be tricky to make your algorithm very efficient in R... it doesn't lend itself nicely to vectorization. Here's how I'd write it directly as a for loop:
semirandom = function(n) {
safe_sample = function(x, ...) {
if(length(x) == 1) return(x)
sample(x, ...)
}
result = numeric(n)
result[1] = sample.int(n, size = 1)
for(i in 2:length(result)) {
if(runif(1) < .5 &&
result[i - 1] < n &&
!((result[i - 1] + 1) %in% result)) {
result[i] = result[i - 1] + 1
} else {
result[i] = safe_sample(x = setdiff(1:n, result), size = 1)
}
}
result
}
# generate 10 semirandom numbers 5 times
replicate(semirandom(10), n = 5)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 6 4 4 2 6
# [2,] 3 5 5 3 7
# [3,] 4 3 6 4 5
# [4,] 5 1 2 5 2
# [5,] 7 9 3 6 3
# [6,] 9 10 10 1 1
# [7,] 10 2 8 9 4
# [8,] 2 8 1 8 10
# [9,] 1 7 9 10 9
# [10,] 8 6 7 7 8
You get the error cannot take a sample larger than the population when 'replace = FALSE' because sample.int(1, 100, replace = FALSE) attempts to draw 100 values from a population of size one without replacement.
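The argument order of sample.int is the population size first, then the sample size. A small sketch of the difference (the seed and the variable names are arbitrary choices for illustration):

```r
set.seed(42)  # arbitrary seed, only for reproducibility

one_draw  <- sample.int(100, 1)                     # n = 100, size = 1
all_draws <- sample.int(100, 100, replace = FALSE)  # a permutation of 1..100

# Swapping the arguments asks for 100 values out of a population of 1,
# which is what triggers the error; tryCatch captures its message
swapped <- tryCatch(sample.int(1, 100, replace = FALSE),
                    error = function(e) conditionMessage(e))

print(one_draw)
print(swapped)
```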
The following draws numbers between 1 and 100, each at most once. If the previous number + 1 has not been drawn yet, there is a 50 percent chance of drawing the previous number + 1 and a 50 percent chance of drawing another random number. If the previous number + 1 has already been drawn, another random number is always drawn.
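i <- sample.int(100, 1)
j <- i
for(x in 1:99) {
  # numbers not drawn yet; index into them so that a single remaining
  # value is not treated by sample() as the range 1:value
  rest <- (1:100)[-j]
  if((i + 1L) %in% j) {
    i <- rest[sample.int(length(rest), 1L)]
  } else {
    if(runif(1L) > 0.5 || i == 100L) {
      i <- rest[sample.int(length(rest), 1L)]
    } else {
      i <- i + 1L
    }
  }
  j <- c(j, i)
}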
Here is how I see the implementation of this algorithm. I divided it into two steps:
step one: sequence search
step two: check break rules
set.seed(123)
dat <- as.data.frame(matrix(sample(10,60,replace = T),ncol = 3))
colnames(dat) <- LETTERS[1:ncol(dat)]
dat
rule <- c("A==0","A==10 & B==4","C==9","A>10","B<0","C==0","A==5","A>10",
"B<0","C==0","A==9 & B==9","A>10","B<0","A==10","A==7 & B==5")
action <- c("break","next","next",rep("break",3),"next",rep("break",3),
"next",rep("break",3) ,"next")
rule <- data.frame(rule, action)  # a data frame, so that rule$rule and rule$action work
I think this works -
seq_rule <- function(dat, rule, res.only = TRUE) {
value = rule$action
rule <- rule$rule
m <- with(dat, lapply(rule, function(r) eval(str2expression(r))))
fu <- function(x, y) {
k <- which(y)
ifelse(all(k <= x), NA, min(k[k > x]))
}
idx <- Reduce(fu , m,init = 0, accumulate = TRUE)[-1]
if (!res.only) {
idx <- na.omit(idx)
fidx <- head(idx, length(rule))
debug.vec <- replace(rep("no", nrow(dat)), fidx, rule[seq_along(fidx)])
return(cbind(dat, debug.vec))
}
if(any(value[!is.na(idx)] == 'break')) return(FALSE)
idx <- na.omit(idx)
length(idx) >= length(rule)
}
Here are some checks -
rule <- data.frame(rule= c("A==9","B==4","C==4","A==4", "B==10","C==4") ,
action= c(rep("next",3),"break","break","next"))
seq_rule(dat = dat,rule = rule)
#[1] FALSE
rule <- data.frame(rule= c("C==9","B==3","C==4"),
action= c(rep("next",3)))
seq_rule(dat = dat,rule = rule)
#[1] TRUE
seq_rule(dat = dat,rule = rule, res.only = FALSE)
# A B C debug.vec
#1 3 5 9 C==9
#2 3 3 3 B==3
#3 10 9 4 C==4
#4 2 9 1 no
#5 6 9 7 no
#6 5 3 5 no
#7 4 8 10 no
#8 6 10 7 no
#9 9 7 9 no
#10 10 10 9 no
rule <- data.frame(rule= c("C==9","B==3","C==4", "A == 1"),
action= c(rep("next",3), 'break'))
seq_rule(dat = dat,rule = rule)
#[1] FALSE
rule <- data.frame(rule= c("C==9","B==3","C==4", "A == 6"),
action= c(rep("next",3), 'break'))
seq_rule(dat = dat,rule = rule)
#[1] FALSE
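The core of this function is the Reduce step, which walks through the rules in order and, for each one, looks for the first row after the previous match. Here is a reduced sketch of just that step on toy logical vectors. The name first_after and the toy data are made up for illustration; an explicit NA check replaces the ifelse call.

```r
# For one rule: first row index where the rule holds after row `prev`,
# or NA when there is none (or when an earlier rule already failed)
first_after <- function(prev, hits) {
  k <- which(hits)
  if (is.na(prev) || all(k <= prev)) NA else min(k[k > prev])
}

# Each list element marks the rows where one rule is TRUE (toy data)
m <- list(c(FALSE, TRUE, FALSE, TRUE),
          c(TRUE, FALSE, TRUE, FALSE),
          c(FALSE, FALSE, FALSE, TRUE))

# accumulate = TRUE keeps every intermediate match; [-1] drops the init value
idx <- Reduce(first_after, m, init = 0, accumulate = TRUE)[-1]
print(idx)

# A sequence that cannot be completed: the second rule never fires after row 2
m2 <- list(c(FALSE, TRUE), c(TRUE, FALSE))
idx2 <- Reduce(first_after, m2, init = 0, accumulate = TRUE)[-1]
print(idx2)
```

The sequence counts as found when no entry of the result is NA, which matches the length comparison after na.omit in the function above.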
Since the logic of your question is a bit complicated, I guess a straightforward way, e.g., using loops, might be more efficient and readable. Here is one version of seq_rule
seq_rule <- function(dat, rule, res.only = TRUE) {
m <- with(dat, as.data.frame(sapply(rule$rule, function(r) eval(str2expression(r)))))
rule_next <- with(rule, rule[action == "next"])
m_next <- m[rule_next]
idx <- na.omit(
Reduce(
function(x, y) {
k <- which(y)
ifelse(all(k <= x), NA, min(k[k > x]))
}, m_next,
init = 0, accumulate = TRUE
)
)[-1]
fidx <- head(idx, length(rule_next))
debug.vec <- replace(rep("no", nrow(dat)), fidx, rule_next[seq_along(fidx)])
trgs <- do.call(
rbind,
Map(
function(p, q) {
u <- as.matrix(m[p, ][q[q %in% with(rule, rule[action == "break"])]])
k <- which(u, arr.ind = TRUE)
data.frame(breakRowID = row.names(u)[k[, "row"]], breakTrigger = colnames(u)[k[, "col"]])
},
split(1:nrow(dat), cut(1:nrow(dat), c(0, idx, Inf))),
split.default(names(m), cumsum(rule$action != "break"))
)
)
triggerBreaks <- replace(rep("no", nrow(dat)), debug.vec != "no", NA)
if (!res.only) {
cbind(dat, debug.vec, trigger.break = with(trgs, replace(triggerBreaks, as.numeric(breakRowID), breakTrigger)))
} else {
nrow(trgs) == 0
}
}
and you will see
> seq_rule(dat = dat, rule = rule)
[1] FALSE
> seq_rule(dat = dat, rule = rule, res.only = FALSE)
A B C debug.vec trigger.break
1 3 9 2 no no
2 3 3 1 no no
3 10 4 9 A==10 & B==4 <NA>
4 2 1 9 C==9 <NA>
5 6 7 6 no no
6 5 5 5 A==5 <NA>
7 4 10 9 no no
8 6 7 10 no no
9 9 9 4 A==9 & B==9 <NA>
10 10 9 6 no A==10
11 5 10 8 no no
12 3 7 6 no no
13 9 5 6 no no
14 9 7 7 no no
15 9 5 1 no no
16 3 6 6 no no
17 8 9 2 no no
18 10 2 1 no A==10
19 7 5 2 A==7 & B==5 <NA>
20 10 8 4 no no
I want to say a huge thank you to everyone who tried to help me, and for your unlimited patience.
But it was impossible to help me, because I myself did not fully understand what I wanted. Instead of breaking the question into several parts and asking them separately (as I should have), I asked one big, difficult question that I could hardly explain to myself.
I am very, very sorry for that.
Here is my answer; this is what I wanted to get in the end.
seq_rule2 <- function(dat , rule ,res.only = TRUE){
# This is a fast function written by Thomas here
# https://stackoverflow.com/questions/68625542/match-all-logic-rules-with-a-dataframe-need-super-fast-function
# as an answer to my earlier question.
# It takes the rules as a vector and looks for the sequence
seq_rule <- function(dat, rule, res.only = TRUE) {
m <- with(dat, lapply(rule, function(r) eval(str2expression(r))))
fu <- function(x, y) {
k <- which(y)
ifelse(all(k <= x), NA, min(k[k > x]))
}
idx <- na.omit(Reduce( fu, m,init = 0, accumulate = TRUE ))[-1]
if (!res.only) {
fidx <- head(idx, length(rule))
debug.vec <- replace(rep("no", nrow(dat)), fidx, rule[seq_along(fidx)])
return(cbind(dat, debug.vec))
}
length(idx) >= length(rule)
}
# if there is at most one "next" rule, there is no sequence to look for: return FALSE and finish completely
if( length(rule$rule[rule$action=="next"]) <= 1 ) return(FALSE)
# STEP 1
# run seq_rule
yes.next.rule.seq <- seq_rule(dat = dat , rule = rule$rule[rule$action=="next"] , res.only = T)
if(res.only==FALSE & yes.next.rule.seq==FALSE) {
Next <- rep("no",nrow(dat))
Break <- rep("no",nrow(dat))
dat <- cbind(dat,Next=Next, Break=Break)
return(dat)
}
if(res.only==TRUE & yes.next.rule.seq==FALSE) return(FALSE)
# if the seq_rule found the sequence (TRUE) but there are no "break rules" in the "rule",
# then there is no point in searching for "break rules". Return TRUE and finish completely
if( length(rule$rule[rule$action=="break"]) == 0 & yes.next.rule.seq == TRUE) return(TRUE)
# STEP 2
#looking for break rules in the range between next rules
if(yes.next.rule.seq){
#get indices where the "next rules" triggered in dat
deb.vec <- seq_rule(dat = dat , rule = rule$rule[rule$action=="next"] , res.only = F)[,"debug.vec"]
idx.next.rules <- which(deb.vec!="no")
#get indices where the "break rules" triggered in dat
m <- with(dat, lapply(rule$rule[rule$action=="break"], function(r) eval(str2expression(r))))
idx.break.rules <- unlist(lapply(m,which))
# RES the final result is equal to TRUE,
# but if a "break rule" is found between the "next rules",
# then the RES will be false
RES <- TRUE
# sliding window of two "next rules" http://prntscr.com/1qhnzae
for(i in 2:length(idx.next.rules)){
temp.range <- idx.next.rules[ (i-1):i ]
# Check if there is any "break rule" index between the "next rule" indexes
break.detect <- any( idx.break.rules > temp.range[1] & idx.break.rules < temp.range[2] )
# braces are needed: without them, break runs on every iteration
if( break.detect ) { RES <- FALSE ; break }
}
}
if(!res.only) {
Next <- rep("no",nrow(dat)) ; Next[idx.next.rules] <- "yes"
Break <- rep("no",nrow(dat)) ; Break[idx.break.rules] <- "yes"
dat <- cbind(dat,Next=Next, Break=Break)
return(dat)
}
return(RES)
}
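The check in step 2 can also be tried in isolation. Below is a minimal sketch, assuming the row indices of the "next" matches and of the "break" matches are already known; the function name has_break_between and the example indices are made up for illustration.

```r
# TRUE if any break index lies strictly between two consecutive next indices
has_break_between <- function(idx_next, idx_break) {
  if (length(idx_next) < 2) return(FALSE)
  lower <- head(idx_next, -1)   # start of each pair of consecutive matches
  upper <- tail(idx_next, -1)   # end of each pair
  any(vapply(seq_along(lower), function(i) {
    any(idx_break > lower[i] & idx_break < upper[i])
  }, logical(1)))
}

print(has_break_between(c(2, 5, 9), c(1, 10)))  # breaks only outside the sequence
print(has_break_between(c(2, 5, 9), 7))         # a break between rows 5 and 9
print(has_break_between(c(2, 5, 9), 5))         # a break on a "next" row itself
```

The comparisons are strict, so a break rule that fires on the same row as a "next" rule does not count, as in the loop above.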
Data to check with:
set.seed(963)
dat <- as.data.frame(matrix(sample(10,30,replace = T),ncol = 3))
colnames(dat) <- LETTERS[1:ncol(dat)]
rule <- cbind.data.frame(rule= c("A==9","B==4","C==4","A==4") ,
action= c("next","break","break","next"))
rule <- as.data.frame(rule,stringsAsFactors = F)
seq_rule2(dat = dat, rule = rule)
dat
rule
For example, with no breaks, set.seed(963): http://prntscr.com/1qhprxq
With a break, set.seed(930): http://prntscr.com/1qhpv2h
I have a question about replacing values within a vector.
The algorithm should find the number to replace when a certain condition is met. In this case, that means finding the number whose difference from the previous number is -20, so I would prefer to use the diff function.
Here is what I mean:
x <- c(20,20,0,20,0,5)
> diff(x)
[1] 0 -20 20 -20 5
So in this case each 0 makes the difference -20, and I want to change those 0s to 20.
I know the easiest solution is to assign directly, x[3] <- 20 or x[5] <- 20.
However, the location of the 0s is always different, so I need an automated process that can do that. Thanks!
EDIT:
If we need to do this in a grouped data.frame:
> df
x gr
1 20 1
2 20 1
3 0 1
4 20 1
5 0 1
6 5 1
7 33 2
8 0 2
9 20 2
10 0 2
11 20 2
12 0 2
How can we implement this?
modify <- function(x){
value_search = c(0, 33)
value_replacement = c(20, 44)
for (k in 1:length(value_search)) {
index_position = which(x %in% value_search[k])
replacement = value_replacement[k]
for (i in index_position) {
x[i] = replacement
}
}
}
df%>%
group_by(gr)%>%
mutate(modif_x=modify(x))
Error in mutate_impl(.data, dots) :
Evaluation error: 'match' requires vector arguments.
You can do it using which to get the position, i.e.
x[which(diff(x) == -20)+1] <- 20
x
#[1] 20 20 20 20 20 5
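For the grouped case in the edit, the same one-liner can be wrapped in a function and applied per group. A minimal base R sketch using ave follows; the helper name fix_drops and its default arguments are assumptions made for illustration, and the data frame is rebuilt from the values shown in the question.

```r
# Data from the question's edit
df <- data.frame(x  = c(20, 20, 0, 20, 0, 5, 33, 0, 20, 0, 20, 0),
                 gr = rep(1:2, each = 6))

# Replace every value that sits exactly `drop` below its predecessor
fix_drops <- function(v, drop = -20, value = 20) {
  v[which(diff(v) == drop) + 1] <- value
  v
}

# ave() applies the function within each group and keeps the row order
df$modif_x <- ave(df$x, df$gr, FUN = fix_drops)
print(df)
```

Note that the 0 in row 8 follows 33, so its difference is -33 and it is left unchanged under this rule.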
If you want a generic way to replace values of a vector based on particular values, I would approach it this way:
x = c(20,20,0,20,0,5)
value_search = 0
value_replacement = 20
index_position = which(x %in% value_search)
for (i in index_position) {
x[i] = value_replacement
}
But this works for a single value only. If you want to look for multiple values, you can use a nested loop as below:
x = c(20,20,0,20,0,5,33)
value_search = c(0, 33)
value_replacement = c(20, 44)
for (k in 1:length(value_search)) {
index_position = which(x %in% value_search[k])
replacement = value_replacement[k]
for (i in index_position) {
x[i] = replacement
}
}
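The nested loop can also be written without loops by looking up each element's position in the search vector. A short sketch using match, with the same toy values as above (x_new, pos and hit are names introduced here):

```r
x <- c(20, 20, 0, 20, 0, 5, 33)
value_search <- c(0, 33)
value_replacement <- c(20, 44)

# Position of each element of x in value_search, NA where there is no match
pos <- match(x, value_search)
hit <- !is.na(pos)

# Replace only the matched elements with their paired replacement value
x_new <- x
x_new[hit] <- value_replacement[pos[hit]]
print(x_new)
```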
In response to OP's edits:
There are any number of ways to do this:
x = c(20,20,0,20,0,5,33)
gr = c(1,1,1,1,2,2,2)
df = data.frame(x, gr)
func_replace <- function(source, value_search, value_replacement) {
  # loop over the search values (not over the elements of source)
  for (k in seq_along(value_search)) {
    # look inside the function's own argument, not the global x
    index_position = which(source %in% value_search[k])
    replacement = value_replacement[k]
    for (i in index_position) {
      source[i] = replacement
    } # for i loop
  } # for k loop
  return(source)
} # func_replace
value_search = c(0, 33)
value_replacement = c(20, 44)
gr_value = 1
# replaced values for rows in group gr_value, NA for all other rows
df$replacement = with(df, ifelse(gr == gr_value, func_replace(x, value_search, value_replacement), NA))
I have hundreds of TimeSeries lines, each corresponding to unique values of a set of parameters. I put all the data in one large dataframe. The data looks like this (containing 270 TimeSeries):
> beginning
TimeSeriesID TimeSeries Par1 Par2 Par3 Par4 Par5
1 1 3936.693 51 0.05 1 1 True
2 1 3936.682 51 0.05 1 1 True
3 1 3945.710 51 0.05 1 1 True
4 1 3937.385 51 0.05 1 1 True
5 1 3938.050 51 0.05 1 1 True
6 1 3939.387 51 0.05 1 1 True
> end
TimeSeriesID TimeSeries Par1 Par2 Par3 Par4 Par5
3600452 270 -16.090 190 0.025 5 5 False
3600453 270 -21.120 190 0.025 5 5 False
3600454 270 -14.545 190 0.025 5 5 False
3600455 270 -23.950 190 0.025 5 5 False
3600456 270 -4.390 190 0.025 5 5 False
3600457 270 -3.180 190 0.025 5 5 False
What I am trying to achieve is a Shiny app that lets the user vary the parameters they want, reads the user input, and plots all the TimeSeries that satisfy those values in one plot. The plot will therefore display a different number of lines depending on the user's input, ranging from one (when all parameters are set to a specified value) to 270 (when no parameters are chosen and all TimeSeries are plotted).
I have had no success so far, so there is nothing I can share that may help solve the problem, although I have spent many days on and off on it. So far I have been trying to use reactivePlot() and specify the lines by adding geom_line() in ggplot2. Now I am looking into the aes() parameter to see whether it can achieve what I need. I have also read about converting data into long format with reshape2, but I am not sure that is what I need, since I am working with TimeSeries data.
Thank you in advance.
In the end I went for a base R solution. It is not perfect, but it suited my needs:
equityplot.IDs <- function()
{
bounds <- c(-6000, 100000) #c(min(sapply(eq.list, min)), max(sapply(eq.list, max)))
colors <- rainbow(length(outputIDs()[[2]]))
j <- 1
indexy <- c(0, 6000)
# Plot
plot(NULL,xlim=indexy,ylim=bounds)
for (i in 1:length(equitieslist))
{
if(i %in% outputIDs()[[2]])
{
profit <- rev(equitieslist[[i]][,1]) #$Profit1)
lines(1:length(profit), profit, col=colors[j])
j <- j + 1
}
}
}
After more experimenting, currently working with this:
ggpokus <- function(n) {
mymin <- function(N = n){
m <- Inf
for (i in 1:N)
{
g <- length(equitieslist[[i]][,1])
if (g < m) {m <- g}
}
return (m)
}
mylength <- mymin()
# t <- paste("qplot(1:", mylength, ", rev(equitieslist[[", 1, "]][,1])[1:", mylength, "], geom = \"line\")", sep = "")
t <- paste("qplot(1:", mylength, ", rev(equitieslist[[", 1, "]][,1])[1:", mylength, "], geom = \"line\", ylim = c(0, 5000))", sep = "")
cols <- rainbow(n)
for (i in 1:n) {
p <- paste("rev(equitieslist[[", i+1, "]][,1])[1:", mylength, "])", sep = "")
c <- paste("\"", cols[i+1], "\"", sep = "") # paste("cols[", i, "]", sep = "")
t <- c(paste(t, " + geom_line(aes(y = ", p,", colour = ", c, ")", sep = ""))
}
# cat(t)
# cat("\n")
return (t)
}
options(expressions=10000)
z <- ggpokus(1619)
eval(parse(text=z))
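Since the data is already in long format (one row per observation, with TimeSeriesID identifying the line), an alternative to pasting a plot call together is to filter the rows first and then draw one line per remaining ID. Below is a rough base R sketch under stated assumptions: ts_data is a small made-up stand-in for the real data frame, filter_series and plot_series are hypothetical helper names, and in a Shiny app the filters list would be built from the inputs. In ggplot2, the same grouping is usually expressed with aes(group = TimeSeriesID) and a single geom_line().

```r
# Toy stand-in for the large data frame (column names follow the question)
ts_data <- data.frame(
  TimeSeriesID = rep(1:4, each = 3),
  TimeSeries   = c(10, 12, 11, 20, 19, 21, -5, -4, -6, 0, 1, 2),
  Par1         = rep(c(51, 51, 190, 190), each = 3),
  Par5         = rep(c("True", "False", "True", "False"), each = 3)
)

# Keep rows matching every parameter that is set; NULL means "not chosen"
filter_series <- function(data, filters = list()) {
  keep <- rep(TRUE, nrow(data))
  for (nm in names(filters)) {
    if (!is.null(filters[[nm]])) keep <- keep & data[[nm]] %in% filters[[nm]]
  }
  data[keep, , drop = FALSE]
}

# Draw one line per TimeSeriesID; returns the number of lines drawn
plot_series <- function(data) {
  pieces <- split(data$TimeSeries, data$TimeSeriesID)
  if (length(pieces) == 0) return(invisible(0L))
  cols <- rainbow(length(pieces))
  plot(NULL, xlim = c(1, max(lengths(pieces))), ylim = range(data$TimeSeries),
       xlab = "Index", ylab = "TimeSeries")
  for (j in seq_along(pieces)) {
    lines(seq_along(pieces[[j]]), pieces[[j]], col = cols[j])
  }
  invisible(length(pieces))
}

# Only Par1 is chosen here, so two of the four toy series remain
selected <- filter_series(ts_data, list(Par1 = 51, Par5 = NULL))
print(unique(selected$TimeSeriesID))
```

Calling plot_series(selected) inside the plot output then draws however many lines survive the filter, from one up to all of them.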