This is my code where I explain some parts where I think there is a problem.
set.seed(5623)
t_llegada <- (1:30)
t_viaje <- (1:30)
t_intervalo<- (1:30)
#Prob. morning
pllegadaM <- rexp(30,rate=0.81) #Prob
pviajeM <- rexp(30,rate=30.47) #Prob
pinterM<- rexp(30,rate=0.12) #Prob
#Prob afternoon
pllegadaT <- rexp(30,rate=0.096) #Prob
pviajeT <- rexp(30,rate=31.80) #Prob
pinterT<- rexp(30,rate=0.97) #Prob
#Prob night
pllegadaN <- rexp(30,rate=0.12) #Prob
pviajeN <- rexp(30,rate=32.12) #Prob
pinterN<- rexp(30,rate=0.9) #Prob
sim <-NULL
##Dimension variables:
minutos.dia<-numeric(600)
min.llegada <- minutos.dia
min.salida <- minutos.dia
tinterval <- minutos.dia
tservicio.llegada<-minutos.dia
##Sample time with probs
tintervalM <- sample(t_intervalo,size=1, replace=TRUE, prob= pinterM)
tllegadasM <- sample(t_llegada,size=1,replace = TRUE,prob=pllegadaM)
tviajeM <- sample(t_viaje,size=1,replace = TRUE,prob=pviajeM)
tintervalT <- sample(t_intervalo,size=1, replace=T, prob= pinterT)
tllegadasT <- sample(t_llegada,size=1,replace = TRUE,prob=pllegadaT)
tviajeT <- sample(t_viaje,size=1,replace = TRUE,prob=pviajeT)
tintervalN <- sample(t_intervalo,size=1, replace=T, prob= pinterN)
tllegadasN <- sample(t_llegada,size=1,replace = TRUE,prob=pllegadaN)
tviajeN <- sample(t_viaje,size=1,replace = TRUE,prob=pviajeN)
##Count first person
min.llegada[1]<- 1
tinterval[1]<- 1
min.salida[1]<-tinterval[1]+tviajeM[1]
###Save in data frame "Sim"
uno <- data.frame (caso = 1,
minuto_llegada = min.llegada[1],
minuto_inicio_del_viaje = tinterval[1],
Tiempo_viaje = tviajeM [1],
minuto_salida_del_cliente = min.salida[1])
sim <- rbind(sim, uno)
##Loop to assign probs according to the case number
for (c in 2:600){
tllegadasM[c] <- if(c <300){sample(t_llegada,size=1,replace =TRUE,prob=pllegadaM)#VAL 2
} else{
sample(t_llegada,size=1,replace = TRUE,prob=pllegadaT)}
tviajeM[c] <- if(c<300){sample(t_viaje,size=1,replace = TRUE,prob=pviajeM)#VAL 3
}
else{
sample(t_viaje,size=1,replace = TRUE,prob=pviajeT)}
tintervalM[c]<-if(c <300){sample(t_intervalo,size=1, replace=TRUE, prob=pinterM)#VAL 2
} else{
sample(t_intervalo,size=1, replace=T, prob= pinterT)
}}
I previously assigned the random numbers to the variable tintervalM, and in the second loop I am supposed to just pick the numbers from those random variables and sum them. I hope this is explained well enough for someone to help.
#Loop for times
for (c in 3:600){
min.llegada[c]<-min.llegada[c-1]+tllegadasM[c] #VAL 1
tinterval[c]<-if(min.llegada[c-1]>tinterval[c-1]){
tinterval[c-1]+ tintervalM[c]+tintervalM[c+1]} #VAL 2 HERE IS THE PROBLEM
min.salida[c]<-tinterval[c]+tviajeM[c] #VAL 4
nuevo <- data.frame (caso = c,
minuto_llegada = min.llegada[c],#1
minuto_inicio_del_viaje = tinterval[c],#2
Tiempo_viaje = tviajeM [c],#3
minuto_salida_del_cliente = min.salida[c])#4
sim <- rbind(sim, nuevo)
}
I want to assign to tinterval[c] the sum of the previous value tinterval[c-1], the number generated in tintervalM[c], and the next number generated in tintervalM, if min.salida[c] is greater than tinterval[c], but I receive the error "replacement has length zero".
Despite not understanding what your code does, I think I can point to the error. This part will fail if the condition within the if is not TRUE.
tinterval[c] <- if(min.llegada[c-1]>tinterval[c-1]){
tinterval[c-1] + tintervalM[c] + tintervalM[c+1]
} #VAL 2 HERE IS THE PROBLEM
If the condition is FALSE, the if clause returns NULL and it cannot be assigned to tinterval[c].
My guess is that it should be written as follows:
if( min.llegada[c-1] > tinterval[c-1] ){
tinterval[c] <- tinterval[c-1] + tintervalM[c] + tintervalM[c+1]
}
Now if the condition is FALSE, nothing happens.
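The failure can be reproduced in isolation. The following is a minimal sketch (the vector x and the variables res and msg are made up for illustration): an if without an else evaluates to NULL when its condition is FALSE, and assigning NULL to a single element of a vector raises the error.

```r
x <- numeric(3)

# An `if` with no `else` yields NULL when the condition is FALSE.
res <- if (FALSE) 1
is.null(res)   # TRUE

# Assigning that NULL into one element of a vector is an error.
msg <- tryCatch({
  x[2] <- if (FALSE) 1
  "no error"
}, error = function(e) conditionMessage(e))
msg            # "replacement has length zero"

# Supplying an else branch (here: reuse the previous value) avoids the error.
x[2] <- if (FALSE) 1 else x[1]
```

Note also that tintervalM has 600 elements, so tintervalM[c+1] is NA on the last iteration (c = 600).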
Good evening,
I asked a question earlier and found it hard to implement the solution, so I am going to ask it again more clearly.
My problem is that I want to add a column to a data frame of daily returns of a stock. Let's say the returns are normally distributed, and I would like to add a column that contains the historical value at risk, computed by a function I wrote myself.
The restriction is that each observation should be passed to my function together with the last 249 observations.
So when the next observation is calculated, it should likewise take only the 249 observations of the days before it. The input values should move as time goes on; in other words, I want values from 251 days ago to be excluded. Hopefully I explained myself well enough. If not, maybe the code speaks for me:
df<- data.frame(Date=seq(ISOdate(2000,1,1), by = "days", length.out = 500), Returns=rnorm(500))
#function
VaR.hist<- function(x, n=250, hd=20, q=0.05){
width<-nrow(x)
NA.x<-na.omit(x)
quantil<-quantile(NA.x[(width-249):width],probs=q)
VaR<- quantil*sqrt(hd)%>%
return()
}
# Run the function on the dataframe
df$VaR<- df$Returns%>%VaR.hist()
Error in (width - 249):width : argument of length 0
This is the error I get instead of my new variable...
Thanks !!
As wibom wrote in the comment, nrow(x) does not work for vectors: it returns NULL. What you need is length() instead. Also, you do not need return() in the last line, as R automatically returns the value of the last evaluated expression of a function if there is no earlier return().
library(dplyr)
df<- data.frame(Date=seq(ISOdate(2000,1,1), by = "days", length.out = 500), Returns=rnorm(500))
#function
VaR.hist <- function(x, n=250, hd=20, q=0.05){
width <- length(x) # here you need length as x is a vector, nrow only works for data.frames/matrices
NA.x <- na.omit(x)
quantil <- quantile(NA.x[(width-249):width], probs = q)
quantil*sqrt(hd)
}
# Run the function on the dataframe
df$VaR <- df$Returns %>% VaR.hist()
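The difference between nrow() and length() on a plain vector can be checked directly. This is a small sketch with a made-up vector x, showing where the "argument of length 0" message comes from:

```r
x <- rnorm(10)

nrow(x)     # NULL: a plain vector has no dim attribute
NROW(x)     # 10: treats the vector as a one-column matrix
length(x)   # 10

# With width = NULL the index expression cannot be built.
width <- nrow(x)
msg <- tryCatch((width - 249):width,
                error = function(e) conditionMessage(e))
msg         # "argument of length 0"
```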
It's a bit hard to understand what you want to do exactly.
My understanding is that you wish to compute a new variable VaR, calculated based on the current and previous 249 observations of df$Returns, right?
Is this about what you wish to do?:
library(tidyverse)
set.seed(42)
df <- tibble(
Date = seq(ISOdate(2000, 1, 1), by = "days", length.out = 500),
Returns=rnorm(500)
)
the_function <- function(i, mydata, hd = 20, q = .05) {
r <-
mydata %>%
filter(ridx <= i, ridx > i - 249) %>%
pull(Returns)
quantil <- quantile(r, probs = q)
VaR <- quantil*sqrt(hd)
}
df <-
df %>%
mutate(ridx = row_number()) %>%
mutate(VaR = map_dbl(ridx, the_function, mydata = .))
If you are looking for a base-R solution:
set.seed(42)
df <- data.frame(
Date = seq(ISOdate(2000, 1, 1), by = "days", length.out = 500),
Returns = rnorm(500)
)
a_function <- function(i, mydata, hd = 20, q = .05) {
r <- mydata$Returns[mydata$ridx <= i & mydata$ridx > (i - 249)]
quantil <- quantile(r, probs = q)
VaR <- quantil*sqrt(hd)
}
df$ridx <- 1:nrow(df) # add index
df$VaR <- sapply(df$ridx, a_function, mydata = df)
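A variant worth considering is to return NA until a full window is available, instead of computing the quantile on a shorter window for the first rows. The sketch below is only one way to do that; the name rolling_var and the choice of a 250-observation window (the current value plus the previous 249) are assumptions, not part of the answers above.

```r
# Rolling historical VaR over a trailing window of `n` observations.
# Rows that do not yet have a full window behind them get NA.
rolling_var <- function(x, n = 250, hd = 20, q = 0.05) {
  vapply(seq_along(x), function(i) {
    if (i < n) return(NA_real_)
    window <- x[(i - n + 1):i]          # current value plus the n - 1 before it
    unname(quantile(window, probs = q, na.rm = TRUE)) * sqrt(hd)
  }, numeric(1))
}

set.seed(42)
returns <- rnorm(500)
v <- rolling_var(returns)
sum(is.na(v))   # 249: the first full window ends at row 250
```

The result has the same length as the input, so it can be assigned straight to a column, for example df$VaR <- rolling_var(df$Returns).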
I am using the package depmixS4 to fit a HMM on time-series data. Here is an example with some high vol and low vol data.
In the getpars function we can see the parameter value estimates.
What is happening is that sometimes the first two values in the density are the low vol state and sometimes the second two values are the low vol state. Is there any way to fix this (maybe by setting initial priors)?
set.seed(1)
a <- data.frame(v1 = c(rnorm(n = 100, sd = 10), rnorm(n=100, sd = 1)))
a <- sample(a)
my_model <- depmixS4::depmix(response = v1 ~ 1 , nstates = 2 , data = a)
fitted_model <- depmixS4::fit(my_model)
depmixS4::getpars(fitted_model)
for (i in 100:200) {
my_model2 <- depmixS4::depmix(response = v1 ~ 1 , nstates = 2 , data = a[1:i, , drop = FALSE])
fitted_model2 <- depmixS4::fit(my_model2)
pars <- depmixS4::getpars(fitted_model2)
if (pars[8] > 8) {
print(i)
}
}
This is called label switching.
Models in which you swap the label of states (e.g., relabel state 1 as state 2 and state 2 as state 1) have the same likelihood and hence are both valid maximum likelihood solutions.
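The invariance is easy to verify on a simpler model. The sketch below uses a plain two-component Gaussian mixture rather than a full HMM (a deliberate simplification), and the helper mix_loglik and the parameter values are made up for illustration: exchanging the two component labels leaves the log-likelihood unchanged.

```r
# Log-likelihood of a two-component Gaussian mixture.
mix_loglik <- function(x, w, mu, sigma) {
  dens <- w[1] * dnorm(x, mu[1], sigma[1]) + w[2] * dnorm(x, mu[2], sigma[2])
  sum(log(dens))
}

set.seed(1)
x <- c(rnorm(100, sd = 10), rnorm(100, sd = 1))

# High-vol component labelled 1, low-vol component labelled 2.
ll_a <- mix_loglik(x, w = c(0.4, 0.6), mu = c(0, 0), sigma = c(10, 1))
# Same parameters with the two labels exchanged.
ll_b <- mix_loglik(x, w = c(0.6, 0.4), mu = c(0, 0), sigma = c(1, 10))

all.equal(ll_a, ll_b)   # TRUE
```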
You can try to "fix" this issue by:
setting initial values for the parameters (which make it more likely that the EM algorithm will converge to a particular solution, although this is not guaranteed!);
or by setting order constraints (e.g. forcing the mean for state 1 to be larger than the mean for state 2). Such constraints can be supplied to the fit method in depmixS4 (see examples in ?fit);
a final option is to switch the labels of a fitted depmixS4 object.
Here is a function to relabel a fitted depmix object I have used before (not tested well though!):
label_switch <- function(mod,labels) {
  # labels is a vector: the first element is the new integer label for original state 1, the second is the new integer label for original state 2, etc.
  if(!is(mod,"depmix") || !is(mod,"depmix.fitted")) stop("this function is for depmix models")
  n_states <- mod@nstates
  if(length(labels) != n_states || length(unique(labels)) != n_states || !all(labels %in% 1:n_states)) {
    stop("labels needs to be a vector of unique integers between 1 and ", n_states)
  }
  # inv_labels[k] is the original state that receives the new label k
  inv_labels <- sapply(1:n_states,function(x) which(labels == x))
  tmp <- mod
  # relabel prior
  ppars <- getpars(mod@prior)
  fpars <- getpars(mod@prior,which="fixed")
  out_pars <- as.numeric(t(matrix(ppars,nrow=length(ppars)/n_states,byrow = TRUE)[,inv_labels]))
  out_fixed <- as.logical(t(matrix(fpars,nrow=length(fpars)/n_states,byrow = TRUE)[,inv_labels]))
  if(!tmp@prior@family$link=="identity") tmp@prior@family$base <- labels[tmp@prior@family$base]
  # relabel transition
  for(i in 1:n_states) {
    ppars <- getpars(mod@transition[[inv_labels[i]]])
    fpars <- getpars(mod@transition[[inv_labels[i]]],which="fixed")
    out_pars <- c(out_pars,as.numeric(t(matrix(ppars,nrow=length(ppars)/n_states,byrow = TRUE)[,inv_labels])))
    out_fixed <- c(out_fixed,as.logical(t(matrix(fpars,nrow=length(fpars)/n_states,byrow = TRUE)[,inv_labels])))
    tmp@transition[[i]] <- mod@transition[[inv_labels[i]]]
    if(!tmp@transition[[i]]@family$link=="identity") tmp@transition[[i]]@family$base <- labels[tmp@transition[[i]]@family$base]
    #out_pars <- c(out_pars,getpars(mod@transition[[inv_labels[i]]]))
  }
  # relabel response
  for(i in 1:n_states) {
    out_pars <- c(out_pars,unlist(lapply(mod@response[[inv_labels[i]]],getpars)))
    out_fixed <- c(out_fixed,unlist(lapply(mod@response[[inv_labels[i]]],getpars,which="fixed")))
  }
  tmp <- setpars(tmp,out_fixed,which="fixed")
  tmp <- setpars(tmp,out_pars)
  # recompute the posterior state sequence under the new labels
  if(is(tmp,"depmix.fitted")) tmp@posterior <- viterbi(tmp)
  return(tmp)
}
I'm trying to "pseudo-randomize" a vector in R using a while loop.
I have a vector delays with the elements that need to be randomized.
I am using sample on a vector value to index randomly into delays. I cannot have more than two identical values in a row, so I am trying to use an if/else statement. If the conditions are met, the value should be added to random and removed from delays.
When I run the individual lines outside the loop they all work, but when I try to run the loop, one of the vectors is populated as NA_real_, and that stops the logical operators from working.
I'm probably not great at explaining this, but can anyone spot what I'm doing wrong? :)
delay_0 <- rep(0, 12)
delay_6 <- rep(6, 12)
delays <- c(delay_6, delay_0)
value <- c(1:24)
count <- 0
outcasts <- c()
random <- c(1,2)
while (length(random) < 27) {
count <- count + 1
b <- sample(value, 1, replace = FALSE)
a <- delays[b]
if(a == tail(random,1) & a == head(tail(random,2),1)) {
outcast <- outcasts + 1
}
else {
value <- value[-(b)]
delays <- delays[-(b)]
random <- c(random,a)
}
}
Two problems with your code:
b can take a value that is greater than the number of elements in delays. I fixed this by using sample(1:length(delays), 1, replace = FALSE)
The loop continues when delays is empty. You could either change length(random) < 27 to length(random) < 26 I think or add length(delays) > 0.
The code:
delay_0 <- rep(0, 12)
delay_6 <- rep(6, 12)
delays <- c(delay_6, delay_0)
value <- c(1:24)
count <- 0
outcasts <- 0 # number of rejected draws
random <- c(1, 2)
while (length(random) < 27 && length(delays) > 0) {
count <- count + 1
# draw an index that is valid for the remaining delays
b <- sample(1:length(delays), 1, replace = FALSE)
a <- delays[b]
if (a == tail(random, 1) && a == head(tail(random, 2), 1)) {
# same value as the last two entries: reject the draw
outcasts <- outcasts + 1
} else {
value <- value[-b]
delays <- delays[-b]
random <- c(random, a)
}
}
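The first problem can be reproduced on its own. In this small sketch (the three-element delays vector is made up), indexing past the end of a vector returns NA, and an if on an NA condition is what stops the loop:

```r
delays <- c(6, 6, 0)   # only three elements left
a <- delays[5]         # index past the end of the vector
a                      # NA
a == 0                 # NA, neither TRUE nor FALSE

msg <- tryCatch(if (a == 0) "equal" else "different",
                error = function(e) conditionMessage(e))
msg                    # "missing value where TRUE/FALSE needed"
```

Be aware that the loop can still run forever if every remaining value in delays equals the last two entries of random; capping count at some maximum and restarting is one way to guard against that.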
I have a vector like this:
objetosDisponibles <- c(1,2,3,4)
I pick one element at random like this:
objetoAleatorio <- sample(objetosDisponibles,size = 1, replace = F)
Then I delete the chosen element from the vector:
objetosDisponibles <- objetosDisponibles[objetosDisponibles!=objetoAleatorio]
If I do this 4 times, I expect each element to be chosen once, in random order. But when sample is applied to the vector once it has only 1 element left, it returns a different number. You can check this with the following code:
cont <- 0
objetosDisponibles <- c(1,2,3,4)
while(cont < 4){
objetoAleatorio <- sample(objetosDisponibles,size = 1, replace = F)
print(objetoAleatorio)
objetosDisponibles <- objetosDisponibles[objetosDisponibles!=objetoAleatorio]
print(objetosDisponibles)
cont <- cont +1
}
The error, a number with "L":
Print output:
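A likely cause, offered as a sketch rather than a confirmed diagnosis of the output above: when its first argument is a single number n >= 1, sample() draws from 1:n instead of from that one value. The help page ?sample describes this and suggests indexing with sample.int(length(x)); the helper name resample below follows that help page.

```r
# sample() treats a single number n >= 1 as "sample from 1:n".
set.seed(1)
draws <- replicate(1000, sample(4, size = 1))
sort(unique(draws))   # values from 1:4, not only 4

# Safer: sample positions, then index into the vector.
resample <- function(x, ...) x[sample.int(length(x), ...)]
resample(4, size = 1) # always 4

objetosDisponibles <- c(1, 2, 3, 4)
picked <- numeric(0)
while (length(objetosDisponibles) > 0) {
  objetoAleatorio <- resample(objetosDisponibles, size = 1)
  picked <- c(picked, objetoAleatorio)
  objetosDisponibles <- objetosDisponibles[objetosDisponibles != objetoAleatorio]
}
sort(picked)          # 1 2 3 4: every element chosen exactly once
```

If only the random order is needed, sample(c(1, 2, 3, 4)) returns a full permutation in one call.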
I'm subsampling rows from a dataframe with c("x","y","density") columns at a variety of c("s_size","reps"). Reps= replicates, s_size= number of rows subsampled from the whole dataframe.
> head(data_xyz)
x y density
1 6 1 0
2 7 1 17600
3 8 1 11200
4 12 1 14400
5 13 1 0
6 14 1 8000
#Subsampling###################
subsample_loop <- function(s_size, reps, int) {
tm1 <- system.time( #start timer
{
subsample_bound = data.frame()
#Perform Subsampling of the general
for (s_size in seq(1,s_size,int)){
for (reps in 1:reps) {
subsample <- sample.df.rows(s_size, data_xyz)
assign(paste("sample" ,"_","n", s_size, "_", "r", reps , sep=""), subsample)
subsample_replicate <- subsample[,] #temporary variable
subsample_replicate <- cbind(subsample, rep(s_size,(length(subsample_replicate[,1]))),
rep(reps,(length(subsample_replicate[,1]))))
subsample_bound <- rbind(subsample_bound, subsample_replicate)
}
}
}) #end timer
colnames(subsample_bound) <- c("x","y","density","s_size","reps")
subsample_bound
} #end function
Here's the function call:
source("R/functions.R")
subsample_data <- subsample_loop(s_size=206, reps=5, int=10)
Here's the row subsample function:
# Samples a number of rows in a dataframe, outputs a dataframe of the same # of columns
# df Data Frame
# N number of samples to be taken
sample.df.rows <- function (N, df, ...)
{
df[sample(nrow(df), N, replace=FALSE,...), ]
}
It's way too slow, I've tried a few times with apply functions and had no luck. I'll be doing somewhere around 1,000-10,000 replicates for each s_size from 1:250.
Let me know what you think! Thanks in advance.
=========================================================================
UPDATE EDIT: Sample data from which to sample:
https://www.dropbox.com/s/47mpo36xh7lck0t/density.csv
Joran's code in a function (in a sourced function.R file):
foo <- function(i,j,data){
res <- data[sample(nrow(data),i,replace = FALSE),]
res$s_size <- i
res$reps <- rep(j,i)
res
}
resampling_custom <- function(dat, s_size, int, reps) {
ss <- rep(seq(1,s_size,by = int),each = reps)
id <- rep(seq_len(reps),times = s_size/int)
out <- do.call(rbind,mapply(foo,i = ss,j = id,MoreArgs = list(data = dat),SIMPLIFY = FALSE))
}
Calling the function
set.seed(2)
out <- resampling_custom(dat=retinal_xyz, s_size=206, int=5, reps=10)
outputs data, unfortunately with this warning message:
Warning message:
In mapply(foo, i = ss, j = id, MoreArgs = list(data = dat), SIMPLIFY = FALSE) :
longer argument not a multiple of length of shorter
I put very little thought into actually optimizing this, I was just concentrating on doing something that's at least reasonable while matching your procedure.
Your big problem is that you are growing objects via rbind and cbind. Basically, any time you see someone create an empty data.frame() or c() and then expand that object using rbind, cbind or c, you can be very sure that the resulting code will essentially be the slowest possible way of doing whatever task is being attempted.
This version is around 12-13 times faster, and I'm sure you could squeeze some more out of this if you put some real thought into it:
s_size <- 200
int <- 10
reps <- 30
ss <- rep(seq(1,s_size,by = int),each = reps)
id <- rep(seq_len(reps),times = s_size/int)
foo <- function(i,j,data){
res <- data[sample(nrow(data),i,replace = FALSE),]
res$s_size <- i
res$reps <- rep(j,i)
res
}
out <- do.call(rbind,mapply(foo,i = ss,j = id,MoreArgs = list(data = dat),SIMPLIFY = FALSE))
The best part about R is that not only is this way, way faster, it's also way less code.
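The warning reported in the question's update appears to come from the length of id: s_size/int is not always the number of sample sizes produced by seq(). A short check with the values from that call (the names sizes, id_bad and id_good are made up here):

```r
s_size <- 206
int <- 5
reps <- 10

sizes <- seq(1, s_size, by = int)
length(sizes)   # 42
s_size / int    # 41.2, which rep() truncates to 41

ss      <- rep(sizes, each = reps)
id_bad  <- rep(seq_len(reps), times = s_size / int)
id_good <- rep(seq_len(reps), times = length(sizes))

c(length(ss), length(id_bad), length(id_good))   # 420 410 420
```

Using times = length(sizes) keeps ss and id the same length, so mapply has nothing to recycle.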