I have the following time series:
ts <- cbind(data.frame(date=seq(as.Date("2017/11/01"), by = "day", length.out = 30)),value=rep(5,30))
ts <- ts[order(ts$date, decreasing=T),]
I would like to adjust it by the below cumulative factor that has a value on some given dates:
cf <- cbind(data.frame(date=as.Date(c("2017/11/28", "2017/11/25","2017/11/04","2017/09/25"))),cumfactor=c(0.8,0.7,0.6,.05))
The value on each date in ts should be multiplied (adjusted) by the cumfactor in cf for the corresponding date, and that cumfactor should keep being applied to subsequent (earlier) dates until the next cumfactor appears at an earlier date. The first (latest) dates in ts should not be adjusted if they are later than the first (latest) cumfactor date.
I am looking for the following result:
result <- cbind(data.frame(date=seq(as.Date("2017/11/01"), by = "day", length.out = 30)),value=c(3,3,3,3,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,4,4,4,5,5))
result <- result[order(result$date, decreasing=T),]
My guess is that a for loop could be the best option but I haven't been successful at obtaining this result.
Merge ts and cf, carry forward the factors and multiply them.
library(zoo)
m <- merge(ts, cf, all.x = TRUE)[nrow(ts):1, ]
transform(m, value = value * na.fill(na.locf0(cumfactor), 1))
We have retained your descending sequence of dates from the question but note that in R normally time series are represented in ascending order of date.
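As a cross-check that does not need zoo, here is a minimal base R sketch of the same adjustment. It assumes the rule described in the question: each date takes the factor of the nearest cumfactor date on or after it, and dates later than every cumfactor date keep a factor of 1. The names n_before, factor_by_date and adjusted are only illustrative.

```r
# Data from the question, kept in ascending date order for simplicity
ts <- data.frame(date = seq(as.Date("2017-11-01"), by = "day", length.out = 30),
                 value = 5)
cf <- data.frame(date = as.Date(c("2017-11-28", "2017-11-25",
                                  "2017-11-04", "2017-09-25")),
                 cumfactor = c(0.8, 0.7, 0.6, 0.05))
cf <- cf[order(cf$date), ]

# For each ts date, count the cumfactor dates strictly before it
n_before <- findInterval(as.numeric(ts$date), as.numeric(cf$date),
                         left.open = TRUE)

# The next cumfactor date on or after the ts date supplies the factor;
# the appended 1 is used when no such date exists (no adjustment)
factor_by_date <- c(cf$cumfactor, 1)[n_before + 1]
adjusted <- ts$value * factor_by_date
```

Sorting ts in descending order afterwards, as in the question, only changes the row order and not the adjusted values.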
I have a data.frame in R that includes two variables, a Start-Date and an End-Date. I would like to add a new column with the number of days between the two dates, reduced by the number of Sundays in each interval. I tried it as shown below, but it doesn't work:
Data$Start <- as.Date(Data$Start, "%d.%m.%y")
Data$End <- as.Date(Data$End,"%d.%m.%y")
interval <- difftime(Data$Start, Data$End, units = "days")
sundays <- seq(from = Data$Start, to = Data$End, by = "days")
number.sundays <- length(which(wday(sundays)==1))
Data$DaysAhead <- interval - number.sundays
I get an error message from the seq() function saying that the argument has to have length 1, but I don't understand how to handle this. Can someone help me out with that?
Here's an example that works:
Data <- data.frame(
Start = c("01.01.2020", "01.06.2020"),
End = c("01.03.2020", "01.09.2020")
)
Data$Start <- as.Date(Data$Start, "%d.%m.%Y")
Data$End <- as.Date(Data$End,"%d.%m.%Y")
interval <- difftime(Data$End, Data$Start, units = "days")
sundays <- lapply(1:nrow(Data), function(i)seq(from = Data$Start[i], to = Data$End[i], by = "days"))
number.sundays <- sapply(sundays, function(x)length(which(lubridate::wday(x)==1)))
Data$DaysAhead <- interval - number.sundays
The problem is that seq() isn't vectorized: it assumes a single start point and a single end point. If you put it inside a loop (such as lapply()), it will work and generate the relevant sequence for each pair of start and end dates. You can then use sapply() to count the Sundays in each sequence. Since each count is a scalar, sapply() returns a vector of the same length as interval.
With an updated data set I realized that there is a problem with the solution above when Start-Date and End-Date aren't in the same year. I still want to count the days except Sundays, for example from 20.12.2020 until 10.01.2021. The error message in that case says that the sign of the "by" argument is wrong, and I just can't get it running. If I swap the dates, the output makes no sense and the number of days is too high. What do I have to do to get this running across the year-end?
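One possible cause, assuming the dates were still parsed with "%y" as in the original question rather than "%Y" as in the answer: "%y" reads only the first two digits of the year and as.Date() ignores the trailing characters, so 10.01.2021 would become 2020-01-10, which lies before the start date. The sketch below uses the two dates from the follow-up as hypothetical data and counts Sundays with as.POSIXlt()$wday in place of lubridate::wday(); it is an illustration, not a diagnosis of the actual data set.

```r
# Hypothetical row spanning the year-end, in dd.mm.YYYY layout
Data <- data.frame(Start = "20.12.2020", End = "10.01.2021")

# With "%y" only "20" is read as the year, so End would land in January 2020
end_two_digit <- as.Date("10.01.2021", "%d.%m.%y")

# "%Y" reads the full four-digit year
Data$Start <- as.Date(Data$Start, "%d.%m.%Y")
Data$End   <- as.Date(Data$End,   "%d.%m.%Y")

# Count Sundays row by row; as.POSIXlt()$wday is 0 for Sunday
number.sundays <- sapply(seq_len(nrow(Data)), function(i) {
  days <- seq(Data$Start[i], Data$End[i], by = "day")
  sum(as.POSIXlt(days)$wday == 0)
})

# Days between the two dates minus the Sundays in the interval
Data$DaysAhead <- as.numeric(Data$End - Data$Start) - number.sundays
```

For this row the interval is 21 days and contains 4 Sundays (20 and 27 December, 3 and 10 January), so DaysAhead comes out as 17.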
I'm calculating the difference in seconds between two consecutive rows with the following code:
set.seed(79)
library(outbreaks)
library(lubridate)
# Import data
disease_df <- measles_hagelloch_1861[, 3, drop = FALSE]
# Generate a random time for each day
disease_df$time <- sample(1:86400, nrow(disease_df), replace = TRUE)
disease_df$time <- hms::as.hms(disease_df$time)
# Combine date and time
disease_df$time1 <- with(disease_df, ymd(date_of_prodrome) + hms(time))
# Sort data
disease_df <- disease_df[order(disease_df$time1), ]
# Difference in days between two consecutive rows
disease_df$diff <- as.numeric(difftime(disease_df$date_of_prodrome,
dplyr::lag(disease_df$date_of_prodrome, 1), units = 'days'))
# Difference in seconds between two consecutive rows
disease_df$diff1 <- as.numeric(difftime(disease_df$time1,
dplyr::lag(disease_df$time1, 1), units = 'secs'))
Running this produces the error message "longer object length is not a multiple of shorter object length".
Could you please explain why difftime works fine for days but results in an error for seconds? Thank you so much!
The time1 column is of class "POSIXlt". I am not really sure why difftime with units = 'secs' doesn't work on it, but if you convert the column to POSIXct, it works without any error.
disease_df$time1 <- as.POSIXct(disease_df$time1)
disease_df$diff1 <- as.numeric(difftime(disease_df$time1,
dplyr::lag(disease_df$time1, 1), units = 'secs'))
Apparently dplyr was not happy with the line dplyr::lag(disease_df$time1, 1) because of the class of disease_df$time1.
Converting it to POSIXct works, so just update this part of your code:
# Combine date and time and convert to POSIXct
disease_df$time1 <- as.POSIXct(with(disease_df, ymd(date_of_prodrome) + hms(time)))
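A likely reason, offered as an assumption rather than a confirmed diagnosis: a POSIXlt value is stored as a list of components (seconds, minutes, hours and so on), whereas POSIXct is a plain numeric vector with one value per observation, which is the shape that functions such as dplyr::lag() expect. The sketch below uses made-up timestamps, not the measles data, to show the difference in storage and a base R way to get consecutive differences in seconds.

```r
# Hypothetical timestamps, not taken from the measles data
ct <- as.POSIXct(c("1861-11-01 08:00:00", "1861-11-01 08:00:30",
                   "1861-11-02 09:00:00"), tz = "UTC")
lt <- as.POSIXlt(ct)

# POSIXlt is a list of date-time components underneath
lt_is_list <- is.list(unclass(lt))

# POSIXct is a numeric vector: seconds since 1970-01-01, one per element
ct_is_numeric <- is.numeric(unclass(ct))

# Base R alternative to dplyr::lag(): differences between consecutive rows,
# in seconds, with NA for the first row
diff_secs <- c(NA, diff(as.numeric(ct)))
```

Storing timestamps as POSIXct in data frames avoids this class of problem, and as.POSIXlt() can still be applied temporarily when individual components are needed.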
I have a CSV file containing data as follows-
date, group, integer_value
The date starts from 01-January-2013 to 31-October-2015 for the 20 groups contained in the data.
I want to create a time series for each of the 20 groups. But the dates are not continuous and have sporadic gaps, hence
group4series <- ts(group4, frequency = 365.25, start = c(2013,1,1))
works from a programming point of view but is not correct because of the gaps in the data.
How can I use the 'date' column of the data to create the time series instead of the usual 'frequency' parameter of 'ts()' function?
Thanks!
You could use zoo::zoo instead of ts.
Since you don't provide sample data, let's generate daily data, and remove some days to introduce "gaps".
set.seed(2018)
dates <- seq(as.Date("2015/12/01"), as.Date("2016/07/01"), by = "1 day")
dates <- dates[sample(length(dates), 100)]
We construct a sample data.frame
df <- data.frame(
dates = dates,
val = cumsum(runif(length(dates))))
To turn df into a zoo timeseries, you can do the following
library(zoo)
ts <- with(df, zoo(val, dates))
Let's plot the timeseries
plot.zoo(ts)
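To cover the 20 groups from the question, one option is to split the data by group and build one zoo series per group. This is a minimal sketch on a small made-up data frame that assumes the date, group, integer_value layout described in the question; the group labels and values are hypothetical.

```r
library(zoo)

# Hypothetical data in the date, group, integer_value layout of the question
df <- data.frame(
  date = as.Date(c("2013-01-01", "2013-01-03", "2013-01-01", "2013-01-07")),
  group = c("g1", "g1", "g2", "g2"),
  integer_value = c(10L, 12L, 7L, 9L)
)

# One irregular zoo series per group, indexed only by the dates observed
series_by_group <- lapply(split(df, df$group),
                          function(d) zoo(d$integer_value, d$date))
```

Each element, for example series_by_group$g1, can then be plotted or processed on its own, and the gaps stay as gaps instead of being squeezed into a regular frequency.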
I frequently use to.daily to convert 1 min OHLC data to a daily format but am trying to find a way to do the same with overnight data. I was hoping to see the option to specify what time a "day" starts and ends but didn't see that.
Overnight session being 18:00 to 09:30.
Does anyone have a simple way to do this?
You could use time-of-day subsetting with which.i = TRUE to find all of the observations you don't want. Then subset the original data with the negative of the result, so all the non-overnight observations will be dropped.
# assume data are in a xts object named 'x'
DayObs <- x["T09:30/T18:30", which.i = TRUE]
Overnight <- x[-DayObs,]
You might need to change the start and end times in the time-of-day subset call.
If you already have your data subset so that it only includes the overnight session, you can aggregate to "daily" using period.apply() and custom endpoints. Assuming your data are in an object named x:
ep <- c(0, which(diff(.indexhour(x) > 9 & .indexmin(x) > 30) == 1))
makeOHLC <- function(x) {
op <- as.numeric(first(x))
cl <- as.numeric(last(x))
c(Open = op, High = max(x), Low = min(x), Close = cl)
}
period.apply(x, ep, makeOHLC)
Assume the following dataset. I get closing prices for all working days, but rows are missing for dates on which there is no observation. How can I add a row for every calendar day, all the way to the present? The reason I need this is that I want to average by week, and having variable time windows makes that impossible.
Here is my code:
library(quantmod)
from="2012-09-01"
sym = c("BARC")
prices = Map(function(n)
{
print(n)
tryCatch(getSymbols(n, src="google", env=NULL, from=from)[, 4], error =
function(e) NA)
}, sym)
N = length(prices)
# identify symbols returning valid data
i = ! unlist(Map(function(i) is.na(prices[i]), seq(N)))
# combine returned prices list into a matrix, one column for each symbol
prices = Reduce(cbind, prices[i])
colnames(prices) = sym[i]
If you look at the "prices" object you will see the point I am making.
You can create a blank xts with all the dates first, and then merge with your prices object.
full_dates <- xts(,order.by = seq(from = start(prices), to = end(prices), by= "day"))
full_prices <- merge(full_dates,prices, all = TRUE)
You can also fill the missing prices forward with the following:
na.locf(full_prices)
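Since the stated goal is a weekly average, here is a minimal sketch of the whole chain on made-up prices (not real BARC data). It assumes that filling forward before averaging is acceptable for your purpose and uses apply.weekly() from xts; the exact week boundaries follow whatever convention xts applies.

```r
library(xts)

# Hypothetical closing prices with a weekend gap
idx <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-03",
                 "2018-01-04", "2018-01-05", "2018-01-08"))
prices <- xts(c(100, 101, 102, 103, 104, 105), order.by = idx)

# Empty xts covering every calendar day, merged with the observed prices
full_dates <- xts(, order.by = seq(start(prices), end(prices), by = "day"))
full_prices <- na.locf(merge(full_dates, prices, all = TRUE))

# Average the filled daily series within each week
weekly_mean <- apply.weekly(full_prices, mean)
```

If the filled-in days should not influence the average, skipping na.locf() and calling apply.weekly(full_prices, mean, na.rm = TRUE) would average only the observed prices.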