I am trying to create a ts object using R for a daily time series that starts on 24.02.2015 and ends on 13.04.2015. I have put the frequency=7 for daily data but I cannot find a way to put the exact date as start argument.
I think this is what you want, using the decimal_date() function from 'lubridate' to get the proper start time for a daily series and assuming that the vector of values you want to index as a ts is called x and is of the proper length:
library(lubridate)
df <- ts(x, start = decimal_date(as.Date("2015-02-24")), frequency = 365)
Here's what that looks like if I use rnorm() to generate an x of the proper length:
> df
Time Series:
Start = c(2015, 55)
End = c(2015, 103)
Frequency = 365
[1] 0.4284579 1.9384426 0.1242242 -2.4002789 -0.4064669 0.6945274 -0.5172909 0.4772347 0.8758635 -1.7233406 0.5929249 1.5662611 1.0692173 -0.1354226
[15] 1.1404375 0.7714662 -0.2871663 -5.2720038 -1.7353146 -0.7053329 1.0206803 1.7170262 -0.3469172 0.2594851 2.0371700 -2.1549066 -0.6639050 -0.4912258
[29] -0.3849884 -3.0448583 -1.3317834 1.6173705 0.7176759 -0.8646802 -1.7697016 1.1114061 0.6941131 -0.1942612 -0.1836107 -0.5850649 -1.7449090 -3.3646555
[43] -0.4341833 1.9721407 1.4995265 1.7168002 1.8617295 -3.4578959 1.1639413
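Note that frequency is the number of observations per unit of time. With frequency = 365 the unit is a year, which is what lets a decimal year work as the start value; frequency = 7 would make the unit a week, which suits weekly seasonality but not a calendar-date start.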
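For reference, the same start value can be computed in base R without lubridate. This is only a sketch: the helper name decimal_year is made up for illustration, it assumes a non-leap year of 365 days, and x is stand-in random data.

```r
# Hypothetical helper: year plus the fraction of a 365-day year elapsed
decimal_year <- function(d) {
  d <- as.Date(d)
  yday <- as.integer(format(d, "%j"))      # day of year, 1-based
  as.integer(format(d, "%Y")) + (yday - 1) / 365
}

dates <- seq(as.Date("2015-02-24"), as.Date("2015-04-13"), by = "day")
set.seed(1)
x <- rnorm(length(dates))                  # stand-in data, one value per day

y <- ts(x, start = decimal_year("2015-02-24"), frequency = 365)
start(y)  # year 2015, day 55
end(y)    # year 2015, day 103
```

Building the date sequence first is a cheap way to confirm that x has the right length before it goes into ts().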
If you want a vector of dates that you can use in 'zoo' instead, this does it:
seq(from = as.Date("2015-02-24"), to = as.Date("2015-04-13"), by = 1)
So you would create a zoo object like this:
zoo(x, seq(from = as.Date("2015-02-24"), to = as.Date("2015-04-13"), by = 1))
And if you want a table with date column, you can use:
df <- data.frame(date = seq(from = as.Date("2015-02-24"), to = as.Date("2015-04-13"), by = 1))
Using the xts library:
library(xts)
data_xts <- xts(x=dataframe$x, order.by=as.Date(dataframe$date, "%m/%d/%Y"))
With this method you don't need to specify the end date: the index is taken directly from the date column.
The output looks like this:
[,1]
2020-01-01 7168.3
2020-01-02 7174.4
2020-01-03 6942.3
2020-01-04 7334.8
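Since the format string has to match the date column exactly, it can help to check the parsing step on its own before building the xts object. A small base-R sketch with made-up input strings:

```r
raw <- c("01/03/2020", "01/01/2020", "01/02/2020")   # made-up month/day/year strings
parsed <- as.Date(raw, format = "%m/%d/%Y")

# A format that does not match yields NA rather than an error, so check explicitly
stopifnot(!anyNA(parsed))

# The same dates in chronological order
sort(parsed)
```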
I have a starting time specified as a year-month character, e.g. "2020-12". From the start, for each of T consecutive months, I need to generate n different dates (year-month-day), where the day is random.
Any help will be useful!
The data I'm working on:
data <- data.frame(
data = sample(seq(as.Date('2000/01/01'), as.Date('2020/01/01'), by="day"), 500),
price = round(runif(500, min = 10, max = 20),2),
quantity = round(rnorm(500,30),0)
)
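func <- function(start, months, n) {
# first day of the starting month
startdate <- as.Date(paste0(start, "-01"))
# despite the name, this holds the FIRST day of each month
enddate <- seq(startdate, by = "month", length.out = months)
months <- seq_len(months)
# roll to "day 0" of the next month, i.e. the last day of each month
enddate_lt <- as.POSIXlt(enddate)
enddate_lt$mon <- enddate_lt$mon + 1
enddate_lt$mday <- enddate_lt$mday - 1
days_per_month <- as.integer(format(enddate_lt, format = "%d"))
# draw n distinct values from 1..days_per_month for each month
days <- lapply(days_per_month, sample, size = n)
# caution: first-of-month + k is day k + 1, so the 1st is never drawn
# and k = days_per_month lands on the 1st of the next month
dates <- Map(`+`, enddate, days)
do.call(c, dates)
}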
set.seed(2021)
func("2020-12", 4, 3)
# [1] "2020-12-08" "2020-12-07" "2020-12-15" "2021-01-27" "2021-01-08" "2021-01-13" "2021-02-21" "2021-02-07" "2021-02-28"
# [10] "2021-03-28" "2021-03-07" "2021-03-15"
func("2020-12", 5, 2)
# [1] "2020-12-06" "2020-12-16" "2021-01-08" "2021-01-10" "2021-02-24" "2021-02-13" "2021-03-20" "2021-03-29" "2021-04-19"
# [10] "2021-04-28"
func("2020-12", 2, 10)
# [1] "2020-12-29" "2020-12-30" "2020-12-04" "2020-12-15" "2020-12-09" "2020-12-27" "2020-12-05" "2020-12-06" "2020-12-23"
# [10] "2020-12-17" "2021-01-03" "2021-01-20" "2021-01-05" "2021-01-22" "2021-01-23" "2021-01-06" "2021-01-10" "2021-01-07"
# [19] "2021-01-19" "2021-01-12"
Most of the dancing with POSIXlt objects is because it gives us clean (base R) access to the number of days in a month, which makes sampling the days in a month rather simple. It can also be done (code-golf shorter) using the lubridate package, but I don't know that that is any more correct than this code is.
This just dumps out a sequence of random dates, with n days per month. It does not sort within each month, though it does output the months in order. (That's not a difficult extension, there just wasn't a requirement for it.) It doesn't return a data frame, but you can easily extend it to fit in one, or call data.frame(date = do.call(c, dates)) on the last line, depending on what you need to do with the output.
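One detail to watch in the function above: adding a sampled day k to the first of the month lands on day k + 1, so the largest draw can spill into the following month. A possible variant that uses zero-based offsets and stays inside each month is sketched below. The name random_month_dates is hypothetical, and like the original it needs n to be no larger than the shortest month.

```r
random_month_dates <- function(start, months, n) {
  first <- seq(as.Date(paste0(start, "-01")), by = "month", length.out = months)
  # first day of each following month, used to get the month lengths
  next_first <- seq(first[1], by = "month", length.out = months + 1)[-1]
  n_days <- as.integer(next_first - first)
  # offsets 0 .. n_days - 1, n distinct draws per month
  offsets <- unlist(lapply(n_days, function(k) sample.int(k, n) - 1L))
  rep(first, each = n) + offsets
}

set.seed(2021)
d <- random_month_dates("2020-12", 4, 3)
table(format(d, "%Y-%m"))   # 3 dates in each of the 4 months
```

Sorting within each month, if needed, is a matter of wrapping the sample.int() call in sort().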
You could convert the start time to a class for monthly data, zoo::yearmon. Then use as.Date.yearmon and its frac argument ("a number between 0 and 1 inclusive that indicates the fraction of the way through the period that the result represents") with random values from runif (uniform between 0 and 1) to convert to a random date within each year-month.
start = "2020-12"
T = 3
n = 2
library(zoo)
set.seed(1)
as.Date(as.yearmon(start) + rep((1:T)/12, each = n), frac = runif(T * n))
# [1] "2021-01-08" "2021-01-12" "2021-02-16" "2021-02-25" "2021-03-07" "2021-03-27"
First time question and new to R.
I am pulling data from a SQL server and putting into a R data table (SAP). I am trying to calculate the total hours from 4 columns (StartDate, FinDate, StartTime, FinTime).
I have tried this to calculate it from the data table (SAP), but I am not getting what I want.
SAP$hours <- with(SAP,
difftime(c(ActStartDate, ActStartTime),
c(ActFinDate, ActFinTime),
units = "hours") )
I would like to have the total hours added to the data Table or a vector assigned the total hours.
This is how I would do in excel:
Hours = ((End_Date+End_Time)-(Start_Date+Start_Time))*24
You could do something like this:
#sample data:
df <- data.frame(startdate = c("2018-08-23 00.00.00"),
enddate = c("2018-08-24 00.00.00"),
starttime = c("23:00:00"),
endtime = c("23:30:00"))
#This will first combine date(after extracting the date part) and time and
#then convert it to a date time object readable by R.
df$sdt <- as.POSIXct(paste(substr(df$startdate, 1, 10),
df$starttime,
sep = " "),
format = "%Y-%m-%d %H:%M:%S")
#Same for end date time
df$edt <- as.POSIXct(paste(substr(df$enddate, 1, 10),
df$endtime,
sep = " "),
format = "%Y-%m-%d %H:%M:%S")
df$diff <- difftime(df$edt, df$sdt, units = "hours")
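For the sample row above the result can be checked by hand: one day plus half an hour. A compact sketch of the same idea follows, with an explicit time zone. The time zone is an assumption added here so that daylight-saving changes cannot affect the arithmetic.

```r
start <- as.POSIXct("2018-08-23 23:00:00", tz = "UTC")
end   <- as.POSIXct("2018-08-24 23:30:00", tz = "UTC")

# difftime() returns a "difftime" object; as.numeric() gives plain hours
hours <- as.numeric(difftime(end, start, units = "hours"))
hours  # 24.5
```

This mirrors the Excel formula ((End_Date+End_Time)-(Start_Date+Start_Time))*24, except that difftime() does the unit conversion.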
Thank you all. I ended up doing this with your inputs and it worked.
# make it a data table
library(data.table)
SAP <- data.table(SAP)
# only select some columns of interest
SAP <- SAP[, .(Equipment, Order, ActStartDate, ActStartTime, ActFinDate, ActFinTime)]
# generate start / end as POSIX,
# this code assumes that start date from SAP is always like 2018-05-05
# so 10 chars as YYYY-MM-DD
# if needed add time zone information, e.g as.POSIXct(..., tz = 'UTC')
SAP[, start := as.POSIXct(paste0(substring(ActStartDate, 1, 10), ActStartTime))]
SAP[, end := as.POSIXct(paste0(substring(ActFinDate, 1, 10), ActFinTime))]
# calculate duration
SAP[, duration := difftime(end, start, units = "hours")]
I am trying to aggregate quarter-hourly data but I am getting the error message invalid type (list). The column is a POSIXlt list; I have aggregated minutely and hourly data before but have never seen this error. Do I need to convert the list to a different type and, if so, would I still be able to extract the 15-minute data? Here is my code, I would really appreciate any help:
seq_start <- as.POSIXct("2015-09-10 01:00:00 BST")
Arrivals <- floor(runif(60, min = 1, max = 14))
Minute_Seq <- seq(trunc(seq_start, units='mins'), by='1 mins',length = 60)
Arrival_board = data.frame(Minute_Seq,Arrivals)
Arrival_board$QTR= as.POSIXlt(round(as.double(Arrival_board$Minute_Seq)/(5*60))*(5*60),origin=(as.POSIXlt('1970-01-01')))
arrive_stats <- aggregate(Arrival_board$Arrivals ~ Arrival_board$QTR, Arrival_board, FUN=mean)
POSIXlt is a list type, use POSIXct instead:
aggregate(Arrivals ~ QTR, transform(Arrival_board, QTR=as.POSIXct(QTR)), FUN=mean)
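The difference between the two classes is easy to see in base R. The sketch below uses made-up timestamps and arrival counts, and an assumed UTC time zone:

```r
x_ct <- as.POSIXct("2015-09-10 01:00:00", tz = "UTC") + 60 * (0:3)
x_lt <- as.POSIXlt(x_ct)

typeof(x_ct)  # "double": seconds since 1970-01-01
typeof(x_lt)  # "list": separate components such as sec, min, hour

# Converting back gives a plain vector that aggregate() can group on
board <- data.frame(QTR = as.POSIXct(x_lt), Arrivals = c(2, 4, 6, 8))
res <- aggregate(Arrivals ~ QTR, board, FUN = mean)
res
```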
Here is an alternative way to bin your data, in place of your QTR expression. It uses seq() and cut() on the POSIXct times, which is more straightforward than divide, round and multiply:
seq_start <- as.POSIXct("2015-09-10 01:00:00 BST")
Arrivals <- floor(runif(60, min = 1, max = 14))
Minute_Seq <- seq(trunc(seq_start, units='mins'), by='1 mins',length = 60)
Arrival_board = data.frame(Minute_Seq,Arrivals)
QTR= seq(trunc(seq_start, units='mins'), by='5 mins',length = 13)
Arrival_board$QTR = cut(Arrival_board$Minute_Seq,QTR)
arrive_stats <- aggregate(Arrival_board$Arrivals ~ QTR, Arrival_board, FUN=mean)
Due to the differences in how the bins are defined, there will be a slight shift in the results. To correct for this in the case of a 5-minute window, shift the seq_start time back by 2 minutes:
QTR= seq(trunc(seq_start-(2*60), units='mins'), by='5 mins',length = 14)
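cut() on date-times also accepts an interval specification such as "15 min" as its breaks argument, which avoids building the break sequence by hand. A sketch for quarter-hour bins, with simulated arrivals and an assumed UTC time zone:

```r
set.seed(1)
minute_seq <- seq(as.POSIXct("2015-09-10 01:00:00", tz = "UTC"),
                  by = "1 min", length.out = 60)
board <- data.frame(Minute_Seq = minute_seq,
                    Arrivals = floor(runif(60, min = 1, max = 14)))

# Each level is labelled with the start time of its 15-minute bin
board$QTR <- cut(board$Minute_Seq, breaks = "15 min")
arrive_stats <- aggregate(Arrivals ~ QTR, board, FUN = mean)
arrive_stats  # one row per quarter hour
```

The result of cut() is a factor, so it can be used as a grouping variable directly and never hits the POSIXlt list problem.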
I'm having some trouble with converting daily stock returns into a time series.
My data is just a data.frame object.
data$Returns
> .0317
> -.0126
> -.0279
Then, when I try to convert the data.frame to a time series object, these returns seem to arbitrarily change.
data.ts <- ts(data, freq=365)
> 55 17 30
Why is the ts function changing the data?
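ts() coerces a data frame to a numeric matrix with data.matrix(), which replaces factor and character columns by integer codes; that is most likely where values such as 55, 17 and 30 come from. Passing the numeric Returns column on its own avoids the conversion. Here is a reproducible example that will accomplish what you're after: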
data = data.frame(
Date = c("2004-01-01", "2004-01-02", "2004-01-03"),
Returns = c(.0317, -.0126, -.0279))
data.ts = ts(data$Returns, freq = 365)
data.ts
> data.ts
Time Series:
Start = c(1, 1)
End = c(1, 3)
Frequency = 365
[1] 0.0317 -0.0126 -0.0279
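If the Returns column was read in as a factor (an assumption; the question does not show how the data was loaded), the changed numbers can be reproduced with data.matrix(), which ts() applies to data frames. A sketch:

```r
data <- data.frame(
  Date = c("2004-01-01", "2004-01-02", "2004-01-03"),
  Returns = factor(c("0.0317", "-0.0126", "-0.0279")))

# Factor columns become their integer level codes, not the printed numbers
codes <- data.matrix(data)[, "Returns"]
codes

# Convert via character to recover the actual values before calling ts()
returns <- as.numeric(as.character(data$Returns))
data.ts <- ts(returns, frequency = 365)
data.ts
```

Checking str(data) before calling ts() shows whether a column is numeric or a factor.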
I'm working with some time data and I'm having problems converting a time difference to years and months.
My data looks more or less like this,
dfn <- data.frame(
Today = Sys.time(),
DOB = seq(as.POSIXct('2007-03-27 00:00:01'), len= 26, by="3 day"),
Patient = factor(1:26, labels = LETTERS))
First I subtract the date of birth (DOB) from today's date (Today).
dfn$ageToday <- dfn$Today - dfn$DOB
This gives me the Time difference in days.
dfn$ageToday
Time differences in days
[1] 1875.866 1872.866 1869.866 1866.866 1863.866
[6] 1860.866 1857.866 1854.866 1851.866 1848.866
[11] 1845.866 1842.866 1839.866 1836.866 1833.866
[16] 1830.866 1827.866 1824.866 1821.866 1818.866
[21] 1815.866 1812.866 1809.866 1806.866 1803.866
[26] 1800.866
attr(,"tzone")
[1] ""
This is where first part of my question comes in; how do I convert this difference to years and months (rounded to months)? (i.e. 4.7, 4.11, etc.)
I read the ?difftime man page and the ?format, but I did not figure it out.
Any help would be appreciated.
Furthermore, I would like to melt my final object and if I try using melt on the data frame above using this command,
require(plyr)
require(reshape)
mdfn <- melt(dfn, id=c('Patient'))
I get this strange error I haven't seen before
Error in as.POSIXct.default(value) :
do not know how to convert 'value' to class "POSIXct"
So, my second question is: how do I create a time difference I can melt alongside my POSIXct variables? If I melt without dfn$ageToday everything works like a charm.
Thanks, Eric
The lubridate package makes working with dates and times, including finding time differences, really easy.
library("lubridate")
library("reshape2")
dfn <- data.frame(
Today = Sys.time(),
DOB = seq(as.POSIXct('2007-03-27 00:00:01'), len= 26, by="3 day"),
Patient = factor(1:26, labels = LETTERS))
dfn$diff <- interval(dfn$DOB, dfn$Today) / duration(num = 1, units = "years")
mdfn <- melt(dfn, id=c('Patient'))
class(mdfn$value) # all values are coerced into numeric
The interval() function (called new_interval() in older lubridate releases) represents the time span between two dates. Note that there is a function today() that could substitute for your use of Sys.time(). Finally, note the duration() function, which creates a standard, ehm, duration that you can use to divide the interval by a length of standard units, in this case a unit of one year.
In case you want to preserve the contents of Today and DOB, then you may want to convert everything to character first and reconvert later...
library("lubridate")
library("reshape2")
dfn <- data.frame(
Today = Sys.time(),
DOB = seq(as.POSIXct('2007-03-27 00:00:01'), len= 26, by="3 day"),
Patient = factor(1:26, labels = LETTERS))
# Create standard durations for a year and a month
one.year <- duration(num = 1, units = "years")
one.month <- duration(num = 1, units = "months")
# Calculate the difference in years as float and integer
# (interval() is the current name for the older new_interval())
dfn$diff.years <- interval(dfn$DOB, dfn$Today) / one.year
dfn$years <- floor( interval(dfn$DOB, dfn$Today) / one.year )
# Calculate the modulo for number of months
dfn$diff.months <- round( interval(dfn$DOB, dfn$Today) / one.month )
dfn$months <- dfn$diff.months %% 12
# Paste the years and months together
# I am not using the decimal point so as not to imply this is
# a numeric representation of the difference
dfn$y.m <- paste(dfn$years, dfn$months, sep = '|')
# convert Today and DOB to character so as to preserve them in melting
dfn$Today <- as.character(dfn$Today)
dfn$DOB <- as.character(dfn$DOB)
# melt using string representation of difference between the two dates
dfn2 <- dfn[,c("Today", "DOB", "Patient", "y.m")]
mdfn2 <- melt(dfn2, id=c('Patient'))
# alternative melt using numeric representation of difference in years
dfn3 <- dfn[,c("Today", "DOB", "Patient", "diff.years")]
mdfn3 <- melt(dfn3, id=c('Patient'))
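For comparison, here is a base-R sketch that counts completed calendar months instead of dividing by fixed-length durations. The helper name age_years_months is hypothetical, and it floors to whole months rather than rounding, which may or may not be what you want:

```r
age_years_months <- function(dob, today) {
  d   <- as.POSIXlt(as.Date(dob))
  now <- as.POSIXlt(as.Date(today))
  # whole calendar months, minus one if the day of month has not been reached
  m <- (now$year - d$year) * 12 + (now$mon - d$mon) - (now$mday < d$mday)
  data.frame(years = m %/% 12, months = m %% 12)
}

age <- age_years_months("2007-03-27", "2012-05-15")
age  # 5 years, 1 month
```

Because the result is two plain numeric columns, it can be melted alongside character versions of Today and DOB without the POSIXct conversion error.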