Adding Values in a Stochastic Simulation using R

Here I've managed to extract time in discrete form (1, 2, 3, ..., 50) from each simulation with the help of other users. But since there is no value for the interval 20-21 and beyond, is there any way to add the values in myself? If there is no reading for a time interval, the readings stay the same until the next interval. The code I used is below:
library(GillespieSSA)
parms <- c(beta=0.591,sigma=1/8,gamma=1/7)
x0 <- c(S=50,E=0,I=1,R=0)
a <- c("beta*S*I","sigma*E","gamma*I")
nu <- matrix(c(-1,0,0, 1,-1,0, 0,1,-1, 0,0,1),nrow=4,byrow=TRUE)
set.seed(12345)
out <- lapply(X=1:1, FUN=function(x) ssa(x0, a, nu, parms, tf=50)$data)
a <- as.data.frame(out)
idx <- diff(ceiling(a$V1)) == 1
a[idx,]

## change ==1 to >0
idx <- diff(ceiling(a$V1)) > 0
## get discrete time series
discrete.data <- a[idx,]
## get the last time step value
end.time <- ceiling(tail(discrete.data$V1,1))
## create an empty data frame with all time steps
new.df <- data.frame(t=0:end.time, S=0, E=0, I=0, R=0)
## replace only those time steps that have valid values
new.df[new.df$t %in% ceiling(discrete.data$V1),2:5] <- discrete.data[,2:5]
If necessary, missing values can be replaced by NA instead; it depends on how you want to handle them.
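If the gaps should instead repeat the previous reading (as the question describes), a minimal sketch building on the frame above: initialize with NA, fill in the observed steps, then carry the last observation forward. Seeding t=0 with x0 is my assumption for the case where no event maps to it.
## start from NA and carry the last observation forward over the gaps
new.df <- data.frame(t=0:end.time, S=NA, E=NA, I=NA, R=NA)
new.df[new.df$t %in% ceiling(discrete.data$V1), 2:5] <- discrete.data[, 2:5]
## seed t=0 with the initial state x0 in case no event maps to it
if (is.na(new.df$S[1])) new.df[1, 2:5] <- c(50, 0, 1, 0)
for (j in 2:nrow(new.df)) {
  if (is.na(new.df$S[j])) new.df[j, 2:5] <- new.df[j-1, 2:5]
}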

Related

R - Create subvectors iteratively

I currently have this for-loop which I want to vectorize. It calculates the proportion of 6's in a for different subvectors, starting with a[1:100], a[1:200], ..., always in steps of 100.
rolls.max <- 100000
a <- sample(1:6, size=rolls.max, replace=TRUE)
sixes.ratio <- c()
for (i in 1:(rolls.max/100)) {
  sixes.count <- table(a[1:(i*100)])[6]
  ratio <- sixes.count/(i*100)
  sixes.ratio <- c(sixes.ratio, ratio)
}
I think the most difficult part is to get the count of 6's from a for each subvector. I tried this:
rolls.max <- 100000
a <- matrix(sample(1:6, size=rolls.max, replace=TRUE))
subset.creator <- function(x, c) if (c!=0 && c%%100==0) { as.vector(table(x[1:(rolls[c/100])]))[6] }
sixes.count <- mapply(subset.creator, a, col(a))
# Converting the other lines won't be difficult I think
What I wanted to achieve with this was to create a subvector of a for every 100th call of the function subset.creator, then create a table and take the sixth element to get the count of 6's, extracting only the count by using as.vector().
But this just gives me rubbish instead of a vector with counts of 6's.
If you want to create a "rolling tally" at every hundredth chunk of your simulated rolls, one way to solve the problem is to create a vector of "stops" that represents your cutoff points, then use sapply to perform the calculation (in this case, counting up the 6s) at each stop:
rolls.max <- 100000
a <- sample(1:6, size=rolls.max, replace=TRUE)
# a vector of "stops" at every hundredth entry of 'a'
stops <- seq(0, rolls.max, 100)[-1]
# counts of 6s from the first entry in 'a' to the values in 'stops'
count.6 <- sapply(stops, function(x) sum(a[1:x] == 6))
# ...or just as easily, the rolling proportion of 6s
prop.6 <- sapply(stops, function(x) mean(a[1:x] == 6))
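A further refinement (my addition, not part of the original answer): because every stop just extends the previous one, a single cumulative sum over the whole vector avoids rescanning a at each stop:
## running count of 6s, read off at each stop position
count.6 <- cumsum(a == 6)[stops]
prop.6  <- count.6 / stops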

storing lubridate intervals in a dataframe (R)

I want to create a dataframe of 15-minute intervals over 24 hours, starting with a certain interval on several dates. I use a loop for that, but instead of the actual intervals it stores the number of seconds, which is not useful in my case. Is there any way to avoid this? I need the intervals to check how often timed events happen in them. I found one similar question, but the answer concentrated on using lapply instead of apply, which isn't applicable here.
So here is a basic example:
begin<-as.POSIXct(rbind("2016-03-31 09:00:00","2016-04-12 09:00:00"))
end<-as.POSIXct(rbind("2016-03-31 09:15:00","2016-04-12 09:15:00"))
int<-as.interval(begin,end)
aufl<-duration(15, "mins")
Intervall=data.frame()
for (j in 1:length(int)) {
  for (i in 1:96) {
    Intervall[j,i] <- int_shift(int[j], aufl*(i-1))
  }
}
Intervall
I created an answer; I hope this is what you are looking for. If not, please comment:
library(lubridate)
begin <- as.POSIXct(rbind("2016-03-31 09:00:00","2016-04-12 09:00:00"))
# copy begin time for loop
begin_new <- begin
# create duration object
aufl <- duration(15, "mins")
# count times for loop
times <- 24*60/15
# create dataframe with begin time
Intervall <- data.frame(begin,stringsAsFactors = FALSE)
for (i in 1:times){
  cat("test", i, "\n")
  # save old time for interval calculation
  begin_start <- begin_new
  # add 15 minutes to the running time
  begin_new <- begin_new + aufl
  cat(format(begin_new), "\n")  # format() so cat prints a readable timestamp
  # create an interval object between the two times
  new_dur <- interval(begin_start, begin_new)
  # bind to original dataframe
  Intervall <- cbind(Intervall, new_dur)
}
# Add column names
vec_names <- paste0("v",c(1:(times+1)))
colnames(Intervall) <- vec_names
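To then count how often timed events fall in these intervals (the asker's stated goal), lubridate's %within% operator can be applied per interval. A minimal sketch, assuming the interval columns kept their Interval class as in the answer above; the events vector is hypothetical:
## hypothetical event timestamps -- replace with your real data
events <- as.POSIXct(c("2016-03-31 09:05:00", "2016-03-31 10:20:00"))
## count events falling in each 15-minute interval of the first row
counts <- sapply(2:ncol(Intervall),
                 function(k) sum(events %within% Intervall[1, k]))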

Difftime Error using Looping Regressions in R

With the code below I am getting the error Error in Ops.difftime((f - mean(f)), 2) : '^' not defined for "difftime" objects.
This error only occurs when the line r_sq[[counter-lookback]] <- summary(temp_lm)$r.squared, located towards the end of the loop, is included. I cannot find any similar error solutions online. Thank you for your help.
#Import necessary packages
require(quantmod)
require(ggplot2)
#Measure time used in processing
ptm <- proc.time()
#########
#Write in the ticker symbols of the pair
tickers <- c("GS","JPM")
########
#Pull data down for symbols
A <- getSymbols(tickers[1],auto.assign=FALSE)
B <- getSymbols(tickers[2],auto.assign=FALSE)
#Strip data such as high and low prices
A <- A[,4]
B <- B[,4]
#Create data frame of both price series
AB_DF <- data.frame(A,B)
#Create a time series of the spread & rename header
S <- A-B
colnames(S) <- "Spread.Close"
#Separate the index of times from the spread data for regression
TS <- index(S)
SP <- coredata(S)
#Perform regressions of past 'lookback' days of the spread, incrementing by 1, beginning at T = lookback+1
########
# Change below variable to alter length of data in regression
lookback <- 250
#######
#Initialize a counter, and lists to hold data from the spread regressions
counter <- lookback+1
res_store <- list()
spread_coef <- list()
r_sq <- list()
while (counter < length(SP)) {
  temp_lm <- lm(TS[(counter-lookback):counter] ~ SP[(counter-lookback):counter])
  res_store[[counter-lookback]] <- residuals(temp_lm)
  spread_coef[[counter-lookback]] <- coefficients(temp_lm)[[2]]
  r_sq[[counter-lookback]] <- summary(temp_lm)$r.squared
  counter <- counter + 1
}
Ok all, I have figured it out. The issue is that R cannot compute R^2 values for a response indexed by time: when time values are used in the regression, the squaring inside summary() hits a difftime object and fails. I solved this by changing the index from time values to a standard integer index, and everything ran fine.
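A minimal sketch of that fix (my illustration of the description above, not the original code): define the index as plain numbers before the loop, so summary() never has to square a difftime.
## integer index in place of the POSIXct/Date values
TS <- seq_along(SP)
## ...or keep the actual time scale, just as plain numeric:
## TS <- as.numeric(index(S))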

Represent a vector in R for sampling that is too large for existing memory

I am generating a data vector to sample from with sample without replacement.
If the dataset I am generating from is large enough, the vector exceeds the limits of R.
How can I represent these data in such a way that I can sample without replacement but can still handle huge datasets?
Generating the vector of counts:
counts <- vector()
for (i in 1:1024) {
counts <- c(counts, rep(i, times=data[i,]$readCount))
}
Sampling:
trial_fn <- function(counts) {
replicate(num_trials, sample(counts, size=trial_size, replace=F), simplify=F)
}
trials <- trial_fn(counts)
Error: cannot allocate vector of size 32.0 Mb
Is there a more sparse or compressed way I can represent this and still be able to sample without replacement?
If I understand correctly, your data has 1024 rows with different readCount.
The vector you build has the first readCount value repeated once, the second readCount repeated twice and so on.
Then you want to sample from this vector without replacement. So basically, you're sampling the first readCount with a probability of 1 / sum(1:1024), the second readCount with a probability of 2 / sum(1:1024) and so on, and each time you extract one value, it is removed from the set.
Of course the fastest and easiest approach is yours, but you can also do it with much less memory, at a significant cost in speed. This can be done by passing extraction probabilities to the sample function, extracting one value at a time and manually "removing" the extracted value.
Here's an example :
# an example of your data
data <- data.frame(readCount=1:1024)
# custom function to sample
mySample <- function(values, size, nElementsPerValue){
  nElementsPerValue <- as.integer(nElementsPerValue)
  if(sum(nElementsPerValue) < size)
    stop("Total number of elements per value is lower than the sample size")
  if(length(values) != length(nElementsPerValue))
    stop("nElementsPerValue must have the same length as values")
  if(any(nElementsPerValue < 0))
    stop("nElementsPerValue cannot contain negative numbers")
  # remove values having zero elements inside
  # (compute the index once, then subset both vectors with it)
  keep <- which(nElementsPerValue > 0)
  values <- values[keep]
  nElementsPerValue <- nElementsPerValue[keep]
  # pre-allocate the result vector
  res <- rep.int(0.0, size)
  for(i in 1:size){
    # draw one index, weighted by the remaining element counts
    idx <- sample(1:length(values), size=1, replace=FALSE, prob=nElementsPerValue)
    res[i] <- values[idx]
    # remove sampled value from nElementsPerValue
    nElementsPerValue[idx] <- nElementsPerValue[idx] - 1
    # if zero elements remain, remove also from values
    if(nElementsPerValue[idx] == 0){
      values <- values[-idx]
      nElementsPerValue <- nElementsPerValue[-idx]
    }
  }
  return(res)
}
# just for reproducibility
set.seed(123)
# sample 100k values from readCount
system.time(
  a <- mySample(data$readCount, 100000, 1:1024),
  gcFirst=TRUE)
# on my machine it gives :
# user system elapsed
# 10.63 0.00 10.67
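As a quick sanity check (my addition, not part of the original answer), the frequency of each sampled value should be roughly proportional to its weight:
## sampled frequencies should grow roughly linearly in readCount
freq <- table(a)
plot(as.integer(names(freq)), as.vector(freq),
     xlab="readCount value", ylab="times sampled")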

Prepend xts rows to a subset

Supposing I need to apply an MA(5) to a batch of market data stored in an xts object, I can easily pull the subset of data I want smoothed with xts subsetting:
x['2013-12-05 17:00:01/2013-12-06 17:00:00']
However, I need an additional 5 observations prior to the first one in my subset to "prime" the filter. Is there an easy way to do this?
The only thing I have been able to figure out is really ugly, with explicit row numbers (here using xts sample data):
require(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)
x$rn <- row(x[,1])
frst <- first(x['2007-05-18'])$rn
finl <- last(x['2007-06-09'])$rn
ans <- x[(frst-5):finl,]
Can I just say bleah? Somebody help me.
UPDATE: by popular request, a short example that applies an MA(5) to the daily data in sample_matrix:
require(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)$Close
calc_weights <- function(x) {
##replace rnorm with sophisticated analysis
wgts <- matrix(rnorm(5,0,0.5), nrow=1)
xts(wgts, index(last(x)))
}
smooth_days <- function(x, wgts) {
w <- wgts[index(last(x))]
out <- filter(x, w, sides=1)
xts(out, index(x))
}
set.seed(1.23456789)
wgts <- apply.weekly(x, calc_weights)
lapply(split(x, f='weeks'), smooth_days, wgts)
For brevity, only the final week's output:
[[26]]
[,1]
2007-06-25 NA
2007-06-26 NA
2007-06-27 NA
2007-06-28 NA
2007-06-29 -9.581503
2007-06-30 -9.581208
The NAs here are my problem. I want to recalculate my weights for each week of data, and apply those new weights to the upcoming week. Rinse, repeat. In real life, I replace the lapply with some ugly stuff with row indexes, but I'm sure there's a better way.
In an attempt to define the problem clearly, this appears to be a conflict between the desire to run an analysis on non-overlapping time periods (weeks, in this case) but requiring overlapping time periods of data (2 weeks, in this case) to perform the calculation.
Here's one way to do this using endpoints and a for loop. You could still use the which.i=TRUE suggestion in my comment, but integer subsetting is faster.
y <- x*NA # pre-allocate result
ep <- endpoints(x,"weeks") # time points where parameters change
set.seed(1.23456789)
for(i in seq_along(ep)[-(1:2)]) {
  rng1 <- ep[i-1]:ep[i]  # obs to calc weights
  rng2 <- ep[i-2]:ep[i]  # "prime" obs
  wgts <- calc_weights(x[rng1])
  # calc smooth_days on rng2, but only keep rng1 results
  y[rng1] <- smooth_days(x[rng2], wgts)[index(x[rng1])]
}
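For the original "five rows before my subset" itch, the which.i=TRUE option mentioned above also gives a loop-free answer without the explicit row-number column; a minimal sketch, with the dates taken from the question:
## integer positions of the subset, then widen the window by 5 rows
i <- x['2007-05-18/2007-06-09', which.i=TRUE]
ans <- x[max(1, i[1] - 5):i[length(i)], ]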
