Rolling queue size - r

I want to calculate the number of items waiting in the queue (carried over) in each hour. Let's say I have a fixed capacity of 102 items/hour and a varying number of incoming items over 9 hours.
As a data.table:
library(data.table)
dt<-data.table(hour = c(1,2,3,4,5,6,7,8,9),
incoming = c(78,102,115,117,105,99,91,80,71),
capacity = rep(102,9))
I want to calculate the number of queued items in each period.
In hours 1 and 2, capacity is sufficient and the queue is 0.
In hour 3, 13 items are queued.
In hour 4, 15 new items plus the 13 backlogged items are queued (28).
In hour 5, 3 more items are added, so 31 are queued.
In hour 6, there were 31 backlogged items and 3 items are deducted, so 28 are queued.
I have tried several options but could not figure out how to calculate this.
The result should be:

Explicit looping in R won't get you far, and I don't see a vectorized solution for this, but this is trivial to solve using Rcpp:
library(Rcpp)
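cppFunction("NumericVector queue(NumericVector x) {
  // x = net inflow per period (incoming - capacity)
  NumericVector res(x.size());
  if (x.size() == 0) return res;  // avoid reading x[0] of an empty vector
  // the queue can never be negative: unused capacity is lost
  res[0] = std::max<double>(0, x[0]);
  for (int i = 1, size = x.size(); i < size; ++i) {
    // carry over the previous backlog and add this period's net inflow
    res[i] = std::max<double>(0, res[i-1] + x[i]);
  }
  return res;
}")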
dt[, queued := queue(incoming - capacity)][]
# hour incoming capacity queued
#1: 1 78 102 0
#2: 2 102 102 0
#3: 3 115 102 13
#4: 4 117 102 28
#5: 5 105 102 31
#6: 6 99 102 28
#7: 7 91 102 17
#8: 8 80 102 0
#9: 9 71 102 0
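A loop-free version does exist for this particular recursion, though. It is Lindley's recursion from queueing theory: with S = cumsum(x), the queue after period n equals S[n] - min(0, S[1], ..., S[n]), i.e. the running net inflow minus the lowest point it has dropped to so far. A base-R sketch, where queue_vec is a hypothetical name and x is assumed to be incoming - capacity:

```r
# Hypothetical helper: queue length per period without an explicit loop.
# x: net inflow per period (assumed to be incoming - capacity).
queue_vec <- function(x) {
  s <- cumsum(x)            # running net inflow S[n]
  s - pmin(0, cummin(s))    # subtract the lowest level reached so far (never above 0)
}

incoming <- c(78, 102, 115, 117, 105, 99, 91, 80, 71)
capacity <- rep(102, 9)
queue_vec(incoming - capacity)
# [1]  0  0 13 28 31 28 17  0  0
```

With the data.table from the question this would be dt[, queued := queue_vec(incoming - capacity)]. With non-integer data on long series, expect tiny floating-point differences from the loop versions, since the cumulative sums can grow large.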

I'd create a separate function to get the queued number, as @sebastian-c did, but with @R.S.'s logic, like this:
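get_queue <- function(x){
  n <- length(x)
  # first period: no backlog yet, so the queue is max(0, net inflow)
  y <- c(max(0, x[[1]]), rep(0, n - 1))
  # seq_len(n)[-1] is empty when n == 1 (unlike 2:n, which would give c(2, 1))
  for(i in seq_len(n)[-1]){
    y[i] <- max(0, y[i - 1] + x[i])
  }
  y
}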
And then
dt[,incoming_capacity := incoming - capacity]
dt[,queued := get_queue(incoming_capacity)]
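The same recursion can also be written without an explicit for loop using base R's Reduce() with accumulate = TRUE, starting from an empty queue (init = 0). It is still an R-level loop under the hood, so this is about compactness rather than speed; get_queue_reduce is a hypothetical name.

```r
# Hypothetical helper: same recursion as get_queue(), expressed with Reduce().
get_queue_reduce <- function(x) {
  # each step: carry over the previous backlog, add the net inflow, floor at 0
  step <- function(q, xi) max(0, q + xi)
  # accumulate = TRUE keeps every intermediate queue; [-1] drops the initial 0
  Reduce(step, x, init = 0, accumulate = TRUE)[-1]
}

incoming <- c(78, 102, 115, 117, 105, 99, 91, 80, 71)
get_queue_reduce(incoming - 102)
# [1]  0  0 13 28 31 28 17  0  0
```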

Another alternative:
library(data.table)
dt<-data.table(hour = c(1,2,3,4,5,6,7,8,9),
incoming = c(78,102,115,117,105,99,91,80,71),
capacity = rep(102,9))
dt$incoming_capacity <- dt$incoming - dt$capacity
dt$carriedover <- 0
# first hour: nothing is carried over yet
dt$carriedover[1] <- max(0, dt$incoming_capacity[1])
for (i in seq_len(nrow(dt))[-1]) {
  dt$carriedover[i] <- max(0, dt$incoming_capacity[i] + dt$carriedover[i-1])
}
dt


Splitting days into episode identifiers and start days

I have a dataset of event days (in a date format), and each event belongs to an (unknown) episode. I want to categorize the events into episodes in such a way that all events within 180 days of the first day of the episode are considered part of the same episode, and the first day of the episode is assigned to all events part of that episode. For example, given a vector of event dates
event_dates <- c(34, 102, 190, 202, 245, 460, 500, 517)
I'm hoping to get a vector of episode IDs
c(1, 1, 1, 1, 2, 3, 3, 3)
and a vector of episode start days
c(34, 34, 34, 34, 245, 460, 460, 460)
The 5th entry begins a new episode because it is more than 180 days after the first date of the first episode; the 6th entry begins a new episode because it is more than 180 days after the first date of the second episode, etc.
I have to perform this operation on millions of separate patients, so ideally I would prefer a vectorized solution that works with by= in a data.table or with a grouped tibble, even if it is a bit opaque, over the readable but slow solution I currently have. Thanks!
Using event_dates from the Note at the end (copied from the question), here are three approaches.
1) Reduce: use Reduce to loop through the events:
# keep the current episode start unless x is more than 180 days past it
f <- function(base, x) if (x > base + 180) x else base
st <- Reduce(f, event_dates, init = -Inf, accumulate = TRUE)[-1]; st
## [1] 34 34 34 34 245 460 460 460
as.numeric(factor(st))
## [1] 1 1 1 1 2 3 3 3
2) for loop: loop through the values, maintaining a variable base that holds the start day of the current episode.
base <- -Inf
st <- event_dates
for(i in seq_along(event_dates)) {
if (st[i] > base + 180) base <- st[i]
st[i] <- base
}
st
## [1] 34 34 34 34 245 460 460 460
as.numeric(factor(st))
## [1] 1 1 1 1 2 3 3 3
3) C++
Create a file called event_dates.cpp in the current directory containing:
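// To build & load: library(Rcpp); sourceCpp("event_dates.cpp")
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector grouper(NumericVector x) {
  NumericVector y(clone(x));  // copy so the input vector is not modified
  int n = y.size();
  double base = R_NegInf;     // no episode started yet (also safe for empty input)
  for(int i = 0; i < n; i++) {
    // start a new episode when more than 180 days past the current start
    if (y[i] > base + 180.0) base = y[i];
    y[i] = base;
  }
  return y;
}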
and then run this.
library(Rcpp)
sourceCpp("event_dates.cpp")
st <- grouper(event_dates); st
## [1] 34 34 34 34 245 460 460 460
as.numeric(factor(st))
## [1] 1 1 1 1 2 3 3 3
Note
event_dates <- c(34, 102, 190, 202, 245, 460, 500, 517)
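Since the question needs this per patient, here is a sketch of applying approach 1 within groups using data.table's by=. It assumes a hypothetical long table events with columns patient and day, sorted by day within each patient; episode_start is a hypothetical helper wrapping the Reduce step, and data.table's rleid() numbers the episodes within each patient.

```r
library(data.table)

# Hypothetical helper wrapping approach 1: start day of each event's episode.
episode_start <- function(days, window = 180) {
  f <- function(base, x) if (x > base + window) x else base
  Reduce(f, days, init = -Inf, accumulate = TRUE)[-1]
}

# Hypothetical example data: two patients
events <- data.table(
  patient = c(rep(1L, 8), rep(2L, 3)),
  day     = c(34, 102, 190, 202, 245, 460, 500, 517, 10, 200, 250)
)
setorder(events, patient, day)                      # the rule assumes sorted days
events[, start := episode_start(day), by = patient]
events[, episode := rleid(start), by = patient]     # 1, 2, 3, ... per patient
events
```

Reduce is still an R-level loop, so for millions of patients the compiled grouper() from approach 3 (events[, start := grouper(day), by = patient]) is likely to be faster, though I have not benchmarked it.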
Using a while loop
event_grp <- event_dates
tmp <- event_dates
index <- rep(1, length(event_dates))
i <- 1
while(TRUE) {
to_compare <- event_dates[i]
i1 <- which((tmp - to_compare) > 180)[1] -1
if(is.na(i1)) i1 <- length(event_dates)
event_grp[i:i1] <- to_compare
if(i > 1) index[i:i1] <- index[i-1] + 1
tmp[i:i1] <- NA
if(i1 == length(event_dates)) break
i <- i1+1
}
Output:
> event_grp
[1] 34 34 34 34 245 460 460 460
> index
[1] 1 1 1 1 2 3 3 3

Fibonacci sequence less than 1000 in R

I'm trying to print the Fibonacci sequence below 1000 using a while loop in R.
So far,
fib <- c(1,1)
counter <-3
while (fib[counter-1]<1000){
fib[counter]<- fib[counter-2]+fib[counter-1]
counter = counter+1
}
fib
I have this code. Only the first two numbers (1, 1) are given. It prints:
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
How do I fix my code to print only less than 1000?
Instead of comparing the last element with 1000, you should check the sum of the last two elements (the next value to be added), like so:
fib <- c(1,1)
counter <-3
while (fib[counter-2]+fib[counter - 1]<1000){
fib[counter]<- fib[counter-2]+fib[counter-1]
counter = counter+1
}
fib
The issue with your approach is that by the time the condition fib[counter-1] < 1000 in the while loop becomes FALSE, you have already added a number greater than 1000 to fib.
You could return fib[-length(fib)] to remove the last number, or check the next number before inserting it into fib:
fib <- c(1,1)
counter <-3
while (TRUE){
temp <- fib[counter-2] + fib[counter-1]
if(temp < 1000)
fib[counter] <- temp
else
break
counter = counter+1
}
fib
#[1] 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
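For completeness, the first option mentioned above (keeping the original loop and trimming afterwards) could look like this. Because the loop stops right after appending the first value that is not below 1000, exactly one element has to be dropped.

```r
fib <- c(1, 1)
counter <- 3
# Original loop: it only stops after a value >= 1000 has been appended
while (fib[counter - 1] < 1000) {
  fib[counter] <- fib[counter - 2] + fib[counter - 1]
  counter <- counter + 1
}
fib <- fib[-length(fib)]  # drop the single overshooting value (1597)
fib
# [1]   1   1   2   3   5   8  13  21  34  55  89 144 233 377 610 987
```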
You could change the while condition to sum the last 2 answers instead of just the last one:
fib <- c(1,1)
counter <-3
while (sum(fib[counter - 1:2]) < 1000){
fib[counter]<- fib[counter-2]+fib[counter-1]
counter = counter+1
}
fib
#> [1] 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
Or just get rid of counter completely:
fib <- c(1,1)
while (sum(fib[length(fib) - 0:1]) < 1000) fib <- c(fib, sum(fib[length(fib) - 0:1]))
fib
#> [1] 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
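If you need other limits, the check-before-append idea can be wrapped in a small function. fib_below is a hypothetical name; this sketch uses repeat with an explicit break instead of while, and also copes with limits of 1 or less, where even the seed values 1, 1 are too big.

```r
# Hypothetical helper: Fibonacci numbers strictly below `limit`
fib_below <- function(limit) {
  fib <- c(1, 1)
  repeat {
    nxt <- sum(tail(fib, 2))  # next Fibonacci number
    if (nxt >= limit) break   # check before appending, so nothing >= limit is added
    fib <- c(fib, nxt)
  }
  fib[fib < limit]            # drops the seeds too when limit <= 1
}

fib_below(1000)
# [1]   1   1   2   3   5   8  13  21  34  55  89 144 233 377 610 987
```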

Replacing NA with mean using loop in R

I have to solve this problem using loop in R (I am aware that you can do it much more easily without loops, but it is for school...).
So I have a vector with NAs like this:
trades<-sample(1:500,150,T)
trades<-trades[order(trades)]
trades[sample(10:140,25)]<-NA
and I have to create a for loop that replaces each NA with the mean of the 2 numbers before it and the 2 numbers after it.
I am able to do this with a loop like this:
for (i in 1:length(trades)) {
if (is.na(trades[i])==T) {
trades[i] <- mean(c(trades[c(i-1:2)], trades[c(i+1:2)]), na.rm = T)
}
}
But there is another part to the homework. If there is an NA among the 2 previous or 2 following numbers, then you have to replace the NA with the mean of the 4 previous and 4 following numbers (presumably with the NAs removed). But I just can't crack it. I have had the best results with this loop:
for (i in 1:length(trades)) {
if (is.na(trades[i])==T && is.na(trades[c(i-1:2)]==T || is.na(trades[c(i+1:2)]==T))) {
trades[i] <- mean(c(trades[c(i-1:4)], trades[c(i+1:4)]), na.rm = T)
}else if (is.na(trades[i])==T){
trades[i] <- mean(c(trades[c(i-1:2)], trades[c(i+1:2)]))
}
}
But it still misses some NAs.
Thank you for your help in advance.
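We can use na.approx from zoo (note that it fills NAs by linear interpolation rather than with a neighbourhood mean, and it needs no loop):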
library(zoo)
na.approx(trades)
Here is another solution using a loop. I shortened some of the code by using lead and lag from dplyr. First, we use 2 recursive functions to calculate the sums of the 2 leading and 2 lagging values. Then we use conditional statements to check for missing data. Lastly, we fill each missing value using either the output of the recursive functions (the mean of the 2 previous and 2 following values) or the mean of the 4 previous and 4 following values (with NAs removed). I would note that this is not how I would approach this problem, but I tried it with a loop as requested.
library(dplyr)
r.lag <- function(x, n){
if (n == 1) return(lag(x = x, n = 1))
else return( lag(x = x, n = n) + r.lag(x = x, n = n-1))
}
r.lead <- function(x, n){
if (n == 1) return(lead(x = x, n = 1))
else return( lead(x = x, n = n) + r.lead(x = x, n = n-1))
}
lead.vec <- r.lead(trades, 2)
lag.vec <- r.lag(trades, 2)
output <- vector(length = length(trades))
for(i in 1:length(trades)){
if(!is.na(trades[[i]])){
output[[i]] <- trades[[i]]
}
else if(is.na(trades[[i]]) & !is.na(lead.vec[[i]]) & !is.na(lag.vec[[i]])){
output[[i]] <- (lead.vec[[i]] + lag.vec[[i]])/4
}
else
output[[i]] <- mean(
c(trades[[i-4]], trades[[i-3]], trades[[i-2]], trades[[i-1]],
trades[[i+4]], trades[[i+3]], trades[[i+2]], trades[[i+1]]),
na.rm = T
)
}
tibble(
original = trades,
filled = output
)
#> # A tibble: 150 x 2
#> original filled
#> <int> <dbl>
#> 1 7 7
#> 2 7 7
#> 3 12 12
#> 4 18 18
#> 5 30 30
#> 6 31 31
#> 7 36 36
#> 8 NA 40
#> 9 43 43
#> 10 50 50
#> # … with 140 more rows
So it seems that posting to StackOverflow helped me solve the problem.
trades<-sample(1:500,25,T)
trades<-trades[order(trades)]
trades[sample(1:25,5)]<-NA
which gives us:
[1] NA 20 24 30 NA 77 188 217 238 252 264 273 296 NA 326 346 362 368 NA NA 432 451 465 465 490
and if you run this loop:
for (i in 1:length(trades)) {
if (is.na(trades[i])== T) {
test1 <- c(trades[c(i+1:2)])
if (any(is.na(test1))==T) {
test2 <- c(trades[abs(c(i-1:4))], trades[c(i+1:4)])
trades[i] <- round(mean(test2, na.rm = T),0)
}else {
test3 <- c(trades[abs(c(i-1:2))], trades[c(i+1:2)])
trades[i] <- round(mean(test3, na.rm = T),0)
}
}
}
it changes the NAs to this:
[1] 22 20 24 30 80 77 188 217 238 252 264 273 296 310 326 346 362 368 387 410 432 451 465 465 490
So it works pretty much as expected.
Thank you for all your help.
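For comparison, here is a sketch that follows the two-step rule from the question more literally: it checks both the 2 previous and the 2 following values with any(), instead of the && / || combination in the question's second loop (those operators only work on single values, and R 4.3.0 and later raise an error for longer vectors). The question leaves some details open, so these are assumptions: neighbours are read from the original vector (already-filled values are not reused), positions past either end are skipped rather than treated as NA, and fill_na_mean is a hypothetical name.

```r
# Hypothetical helper: fill each NA with the mean of 2 neighbours on each side,
# widening to 4 on each side (NAs dropped) if any of the 4 nearest is NA.
# Assumption: neighbours come from the original x; out-of-range positions are skipped.
fill_na_mean <- function(x) {
  out <- x
  n <- length(x)
  for (i in seq_len(n)) {
    if (!is.na(x[i])) next
    idx2 <- c(i - 2:1, i + 1:2)
    idx2 <- idx2[idx2 >= 1 & idx2 <= n]      # keep only valid positions
    if (any(is.na(x[idx2]))) {
      idx4 <- c(i - 4:1, i + 1:4)
      idx4 <- idx4[idx4 >= 1 & idx4 <= n]
      out[i] <- mean(x[idx4], na.rm = TRUE)  # NaN if all 8 neighbours are NA
    } else {
      out[i] <- mean(x[idx2])
    }
  }
  out
}

fill_na_mean(c(10, 20, NA, 40, 50, NA, NA, 80, 90, 110))
# [1]  10  20  30  40  50  65  74  80  90 110
```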

Normalizing the values in a data table using the values stored in another data table

I am trying to normalize the values in a data table (dt) using the baseline values stored in another data table (dt.base). Below are sample contents of these tables and the code to generate the example:
> dt
Bench Config Part Power
1: A 10 P 171
2: A 10 Q 125
3: A 100 P 139
4: A 100 Q 109
5: B 10 P 196
6: B 10 Q 101
7: B 100 P 157
8: B 100 Q 176
> dt.base
Bench Config Part Power
1: A Base P 187
2: A Base Q 104
3: B Base P 166
4: B Base Q 188
Example generation code:
set.seed(13)
dt <- data.table(
Bench = c(rep('A', 4), rep('B', 4)),
Config = rep(c(10, 10, 100, 100), 2),
Part = rep(c('P', 'Q'), 4),
Power = round(runif(8, 100, 200)))
dt.base <- data.table(
Bench = c(rep('A', 2), rep('B', 2)),
Config = c('Base', 'Base', 'Base', 'Base'),
Part = rep(c('P', 'Q'), 2),
Power = round(runif(4, 100, 200)))
The idea is to divide each Power value in dt by the corresponding Power value in dt.base (matching on Bench and Part). Therefore, the table would become:
Bench Config Part Power
1: A 10 P 171 / 187
2: A 10 Q 125 / 104
3: A 100 P 139 / 187
4: A 100 Q 109 / 104
5: B 10 P 196 / 166
6: B 10 Q 101 / 188
7: B 100 P 157 / 166
8: B 100 Q 176 / 188
I thought the solution for this was quite straightforward, but I am running into some issues. This is my current attempt:
normalize.power <- function(pwr, base.pwr) {
pwr / base.pwr
}
dt.norm <- dt[,
Power <- normalize.power(
.SD, dt.base[Bench == Bench & Config == 'Base' & Part == Part,
'Power', with = F]
), by = list(Bench, Config, Part)]
The problem is that normalize.power is not receiving a single value in its second parameter (base.pwr), but rather a vector containing all the power values in dt.base. However, when I directly execute from the command line
dt.base[Bench == 'A' & Config == 'Base' & Part == 'P', 'Power', with = F]
then I obtain a single power value, as expected.
I would appreciate any help that solves my problem or leads me to the solution.
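Inside dt.base[...], the expression Bench == Bench is evaluated in dt.base's own scope, so it compares dt.base's Bench column with itself (always TRUE); the same happens with Part == Part, which is why every base value comes back. A keyed join avoids this. You can try something like this: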
setkey(dt, Bench, Part)
setkey(dt.base, Bench, Part)
dt[dt.base, Power := Power / i.Power]
dt
## Bench Config Part Power
## 1: A 10 P 0.91444
## 2: A 100 P 0.74332
## 3: A 10 Q 1.20192
## 4: A 100 Q 1.04808
## 5: B 10 P 1.18072
## 6: B 100 P 0.94578
## 7: B 10 Q 0.53723
## 8: B 100 Q 0.93617
Thanks @Arun for the useful i.Power syntax.
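data.table also accepts an on= argument for the same update join, so setkey() is not needed; that leaves dt in its original row order (setkey() sorts it, which is why the output above is reordered). A sketch using the values printed in the question rather than regenerating them with runif():

```r
library(data.table)

# Values as printed in the question
dt <- data.table(
  Bench  = rep(c("A", "B"), each = 4),
  Config = rep(c(10, 10, 100, 100), 2),
  Part   = rep(c("P", "Q"), 4),
  Power  = c(171, 125, 139, 109, 196, 101, 157, 176)
)
dt.base <- data.table(
  Bench  = rep(c("A", "B"), each = 2),
  Config = "Base",
  Part   = rep(c("P", "Q"), 2),
  Power  = c(187, 104, 166, 188)
)

# Update join: match rows on Bench and Part, then divide dt's Power by the
# matching baseline, which data.table exposes as i.Power.
dt[dt.base, on = c("Bench", "Part"), Power := Power / i.Power]
dt
```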

if 13* D = 1 mod 60 then D = 37 how?

I am solving an example problem on the RSA algorithm.
I have been given two prime numbers 7 and 11. Let's say p=7 and q=11
I have to calculate the decryption key, d, for some encryption key, e.
Firstly I calculated n=p*q which implies that n=77.
Suppose that e = 13.
To calculate d, I used the formula d*e ≡ 1 (mod φ),
where φ = (p-1)(q-1), so φ = 60.
The final equation becomes 13*d ≡ 1 (mod 60).
According to a solved example, d is 37. How is this result obtained?
Any help would be appreciated.
I think this is what you are looking for.
Verifying the answer is easy; finding it in the first place takes a little more work.
Verification:
13 * 37 = 481
481 = 8 * 60 + 1
Hence if you divide 13 * 37 by 60, you get a remainder of 1.
Alternate answers:
Any integer of the form (37 + 60 k) where k is any integer is also a solution. (97, -23, etc.)
To find the solution you can proceed as follows:
Solve:
13 d = 1 + 60 k
Reduce mod 13 (60 ≡ 8 mod 13):
0 ≡ 1 + 8k (mod 13)
8k ≡ -1 (mod 13)
Add 13s to -1 until a multiple of 8 is found:
8k = 12 or 25 or 38 or 51 or 64 ... aha, a multiple of 8!
k = 64 / 8 = 8
Substitute k = 8 back into 13 d = 1 + 60 k
13 d = 1 + 8 * 60 = 481
481 /13 = 37
and that is the answer.
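For a modulus this small, a brute-force search in R confirms both the result and the alternate answers of the form 37 + 60k:

```r
e <- 13
phi <- 60
# Try every candidate d in 1..59 and keep those with e*d ≡ 1 (mod phi)
d <- which((e * seq_len(phi - 1)) %% phi == 1)
d
# [1] 37
# Any 37 + 60k also works, e.g. -23 and 97 (R's %% returns a non-negative result here)
(e * c(-23, 37, 97)) %% phi
# [1] 1 1 1
```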
Use the extended Euclidean algorithm to compute integers x and y such that
13*x+60*y=1
Then x is the answer you're looking for, mod 60.
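A minimal sketch of the iterative extended Euclidean algorithm in R; ext_gcd and mod_inverse are hypothetical names. For 13 and 60 it finds x = -23, y = 5 (13*(-23) + 60*5 = 1), and -23 mod 60 is 37.

```r
# Hypothetical helper: returns g = gcd(a, b) and x, y with a*x + b*y = g
ext_gcd <- function(a, b) {
  old_r <- a; r <- b
  old_x <- 1; x <- 0
  old_y <- 0; y <- 1
  while (r != 0) {
    q <- old_r %/% r
    # update (remainder, x, y) pairs in step, keeping the previous values
    tmp <- r; r <- old_r - q * r; old_r <- tmp
    tmp <- x; x <- old_x - q * x; old_x <- tmp
    tmp <- y; y <- old_y - q * y; old_y <- tmp
  }
  c(g = old_r, x = old_x, y = old_y)
}

# Hypothetical helper: inverse of e modulo m (exists only when gcd(e, m) == 1)
mod_inverse <- function(e, m) {
  res <- ext_gcd(e, m)
  if (res[["g"]] != 1) stop("e and m are not coprime")
  res[["x"]] %% m   # map x into 0..m-1
}

ext_gcd(13, 60)      # g = 1, x = -23, y = 5
mod_inverse(13, 60)  # 37
```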
