Create date index and add to data frame in R

I am currently transitioning from Python to R. In Python, you can create a date range with pandas and add it to a data frame like so:
data = pd.read_csv('Data')
dates = pd.date_range('2006-01-01 00:00', periods=2920, freq='3H')
df = pd.DataFrame({'data' : data}, index = dates)
How can I do this in R?
Further, if I want to compare two datasets with different lengths but the same time span, I can resample the lower-frequency dataset so that it has the same length as the higher-frequency one, with NaNs placed in the gaps, like so:
df2 = pd.read_csv('data2') #3 hour resolution = 2920 points of data
data2 = df2.resample('30Min').asfreq() #30 Min resolution = 17520 points
I guess I'm basically looking for a Pandas package equivalent for R. How can I code these in R?

The following is a way of converting your time-series data from one time interval (3 hours) to another (30 minutes):
Get the data:
# 2920 timestamps, 3 hours apart, with random values standing in for the real data
starter_df <- data.frame(dates = seq(from = as.POSIXct("2006-01-01 00:00"),
length.out = 2920,
by = "3 hours"),
data = rnorm(2920))
Get the full sequence in 30 minute intervals and replace the NA's with the values from the starter_df data.frame:
# build the 30-minute grid once and start with an all-NA data column
half_hour_grid <- seq(from = min(starter_df$dates),
to = max(starter_df$dates), by = "30 min")
full_data <- data.frame(dates = half_hour_grid,
data = rep(NA, NROW(half_hour_grid)))
full_data[full_data$dates %in% starter_df$dates,] <- starter_df[starter_df$dates %in% full_data$dates,]
I hope it helps.
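The same upsampling can also be written as a left join onto the 30-minute grid. This is only a minimal base R sketch, not the approach from the answer above: the `tz = "UTC"` argument and the name `grid` are assumptions added for illustration, and `rnorm()` again stands in for the real data.

```r
# Original series: one value every 3 hours (random stand-in data)
starter_df <- data.frame(
  dates = seq(as.POSIXct("2006-01-01 00:00", tz = "UTC"),
              by = "3 hours", length.out = 2920),
  data  = rnorm(2920)
)

# Target grid: every 30 minutes between the first and last timestamp
grid <- data.frame(
  dates = seq(min(starter_df$dates), max(starter_df$dates), by = "30 min")
)

# Left join: grid rows without a matching 3-hourly timestamp get NA
full_data <- merge(grid, starter_df, by = "dates", all.x = TRUE)
```

Because `all.x = TRUE` keeps every grid row, the result has one row per half hour and the original values sit on every sixth row.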

Related

R - Use Lubridate to create 1 second intervals in datetime column where only minutes are specified

I am working with a time series that looks something like this:
# making a df with POSIXct datetime sequence with just minutes
#Make reproducible data frame:
set.seed(1234)
datetime <- rep(lubridate::ymd_hm("2016-08-01 15:10"), 60)
# Generate measured value
value <- runif(n = 60, min = 280, max = 1000)
df <- data.frame(datetime, value)
The data is actually recorded at 1 second intervals, but it appears as 60 rows with the same hour and minute, with the seconds part always at 00. I want to change it so that each minute has its seconds value increasing at one-second intervals. The actual dataset includes many hours of data. Thank you
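We can use the following (it works as written for the single-minute example, where the row number doubles as the second within the minute):
df$datetime <- with(df, datetime + lubridate::seconds(seq_along(datetime) - 1))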
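If the real data covers many minutes, numbering the rows within each minute may be closer to what is needed. The following is a minimal base R sketch, assuming each minute's rows are stored in recording order; the two-minute example data and the name `offset` are made up for illustration.

```r
# Two minutes of data, 60 rows each, all stamped at second 00
datetime <- rep(as.POSIXct(c("2016-08-01 15:10:00", "2016-08-01 15:11:00"),
                           tz = "UTC"), each = 60)
df <- data.frame(datetime, value = seq_along(datetime))

# Within each distinct timestamp, count rows 0, 1, 2, ... in order
offset <- ave(seq_along(df$datetime), as.numeric(df$datetime),
              FUN = seq_along) - 1

# Adding a number to a POSIXct value adds that many seconds
df$datetime <- df$datetime + offset
```

Here `ave()` restarts the counter for every minute, so the seconds run from 00 to 59 and then begin again at the next minute.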

Simulate a series of code n(lets say 1000) times while saving the result in a vector in R

I'm still relatively new to R so I'm struggling with repeating lines of code several times and saving the result for each repetition.
The aim is to randomly (with equal probability) assign a number of events, in my case 100, over a 20 year period. Since days are irrelevant, I use the number of months to define the period. Subsequently, I count the events for every 24-month period within the 20 years. Lastly, I extract the maximum number of events occurring within a 24-month period.
Albeit messy and probably inefficient, the code works for the intended purpose. However, I want to repeat this process 1000 times to get a distribution of all the maximum number of events taking place over 24 months to compare to my real data.
Here is my code so far:
library(runner)
library(dplyr)
#First I set the period from the year 2000 to 2019 with one-month increments.
period <- seq(as.Date("2000/1/1"), by = "month", length.out = 240)
#I sample random observations assigned to different months over the entire period.
u <- sample(period, size=100, replace=T)
#Make a table in order to register the number of occurrences within each month.
u <- table(u)
#Create a data frame to ease information processing.
simulation <- data.frame(u)
#Change the date column to date format.
simulation$u <- as.Date(simulation$u)
#Compute number of events taking place within every 24-month period (730 = days in 24 months).
# mutate() needs the data frame, not the table object u
simulation <- simulation %>%
mutate(
Last_24_month_total = sum_run(
x = simulation$Freq,
k = 730,
idx = as.Date(simulation$u, format = "%d/%m/%Y"))
)
#extract the maximum number of occurrences within a 24-month period
max_events <- max(simulation$Last_24_month_total)
Could someone help me understand how to rewrite this process in order to facilitate a thousand repetitions while saving the max value for each repetition?
thanks
As @jogo suggested in the comments, you can use replicate.
I simplified your code.
library(runner)
library(dplyr)
seq_dates <- seq(as.Date("2000/1/1"), by = "month", length.out = 240)
replicate(1000,
seq_dates %>%
sample(100, replace = TRUE) %>%
table() %>%
sum_run(730, idx = as.Date(names(.))) %>%
max)
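For comparison, the same simulation can be sketched in base R without the runner package. Note that this is a swapped-in variant, not the code above: it counts events in windows of 24 calendar months instead of 730 days, and the names `max_in_window`, `n_months`, `n_events` and `window` are hypothetical.

```r
set.seed(42)
n_months <- 240   # 20 years of monthly slots
n_events <- 100   # events assigned uniformly at random
window   <- 24    # window length in months

max_in_window <- function() {
  # Number of events that fall in each of the 240 months
  counts <- tabulate(sample.int(n_months, n_events, replace = TRUE),
                     nbins = n_months)
  # Rolling 24-month totals from differences of the cumulative sum
  cs <- c(0, cumsum(counts))
  rolling <- cs[(window + 1):(n_months + 1)] - cs[1:(n_months - window + 1)]
  max(rolling)
}

# One maximum per repetition, collected in a numeric vector
sim_max <- replicate(1000, max_in_window())
```

The vector `sim_max` can then be passed to `hist()` or `quantile()` to compare the simulated distribution with the maximum observed in the real data.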

Converting df into ts object and decompose in 15 minute intervals in R

I know there has been a lot on this topic already but I can't seem to get what I want working.
I've read:
how to convert data frame into time series in R
Convert data frame with date column to timeseries
As well as several others but can't get it to work.
I have the following df
df <- data.frame(CloseTime = c("2017-09-13 19:15:00","2017-09-13 19:30:00","2017-09-13 19:45:00","2017-09-13 20:00:00","2017-09-13 20:15:00"),
OpenPice = c(271.23,269.50,269.82,269.10,269.50),
HightPrice = c(271.23,269.50,269.82,269.10,269.50),
LowPrice = c(271.23,269.50,269.82,269.10,269.50),
ClosePrice = c(271.23,269.50,269.82,269.10,269.50))
I'd like to convert it into a ts object with 15-minute intervals and decompose the time series.
I also read that the zoo package allows you to decompose specific multiple intervals i.e. 15 mins, 1h, 1 day?
Can someone please help. How can I convert this into a ts object and decompose my ts object?
For reproducibility, here is another toy example covering a longer period of time.
df <-
data.frame(
CloseTime = seq(as.POSIXct("2017-09-13 19:15:00"),as.POSIXct("2018-10-20 21:45:00"),by="15 mins"),
ClosePrice1 = cumsum(rnorm(38603)),
ClosePrice2 = cumsum(rnorm(38603)),
ClosePrice3 = cumsum(rnorm(38603))
)
I found it much better to aggregate time series into different intervals using dplyr and lubridate::floor_date. Instead of mean, one can summarise using min, max, first or last. I would recommend staying within the tidyverse to keep the code readable. The example below converts the data to 30-minute intervals.
library(lubridate); library(dplyr); library(magrittr)
df30m <-
df %>%
group_by( CloseTime = floor_date( CloseTime, "30 mins")) %>%
summarize_all(mean)
The data frame can be converted to a time series object such as zoo, and then to ts for decomposition.
library(zoo)
df30m_zoo <- zoo( df30m[-1], order.by = df30m$CloseTime )
# frequency = observations per period: 48 half-hour points per day, assuming a daily cycle
df30m_ts <- ts(df30m_zoo, start=1, frequency = 48)
df30m_decomposed <- decompose(df30m_ts)
The points are already 15 minutes apart, so assuming that you want a period of 1 day, this will convert it. There are 24 * 60 * 60 seconds in a day (which is the period), but you can change the denominator to the number of seconds in a different period to get that period instead. You will need at least two periods of data to decompose it.
library(zoo)
z <- read.zoo(df)
time(z) <- (as.numeric(time(z)) - as.numeric(start(z))) / (24 * 60 * 60)
as.ts(z)
giving:
Time Series:
Start = c(0, 1)
End = c(0, 5)
Frequency = 96
OpenPice HightPrice LowPrice ClosePrice
0.00000000 271.23 271.23 271.23 271.23
0.01041667 269.50 269.50 269.50 269.50
0.02083333 269.82 269.82 269.82 269.82
0.03125000 269.10 269.10 269.10 269.10
0.04166667 269.50 269.50 269.50 269.50
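The five rows above cover far less than two daily periods, so they cannot be decomposed as they stand. As a rough illustration of the last step, here is a base R sketch on made-up data: four days of simulated 15-minute prices with a daily cycle. The series and the names `obs_per_day`, `price_ts` and `parts` are assumptions, not part of the question's data.

```r
set.seed(1)
n_days      <- 4    # at least two full periods are required
obs_per_day <- 96   # 15-minute observations in one day

# Simulated price: constant level + daily sine cycle + noise
t <- seq_len(n_days * obs_per_day)
price <- 270 + sin(2 * pi * t / obs_per_day) + rnorm(length(t), sd = 0.1)

# frequency = number of observations per period (one day)
price_ts <- ts(price, start = c(0, 1), frequency = obs_per_day)

# Classical additive decomposition into trend, seasonal and random parts
parts <- decompose(price_ts)
```

Calling `plot(parts)` shows the components; `parts$figure` holds the 96 estimated seasonal effects, one per 15-minute slot of the day.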
Although not asked for in the question, in another answer the data was converted to 30 minutes. That could readily be done like this:
library(xts) # also loads zoo
z <- read.zoo(df)
to.minutes30(z)

calculating seasonal range in r for a number of years

I have a data frame of daily temperature measurements spanning 20 years. I would like to calculate the annual range in the data series for each year (i.e. end up with 20 values, representing the range for each year). Example data:
begin_date = as.POSIXlt("1990-01-01", tz = "GMT")
dat = data.frame(dt = begin_date + (0:(20*365)) * (86400))
dat = within(dat, {speed = runif(length(dt), 1, 10)})
I was thinking of writing a loop which goes through each year and then calculate the range, but was hoping there was another solution.
I think the best way forward would be to have the maximum and minimum values for each year and then calculate the range from that. Can anyone suggest a method to do this without writing a loop to go through each year individually?
Try
library(dplyr)
library(lubridate) # provides year()
dat %>%
group_by(year=year(dt)) %>%
summarise(Range=diff(range(speed)))
Or
library(data.table)
setDT(dat)[, list(Range=diff(range(speed))), year(dt)]
Or
aggregate(speed~cbind(year=year(dt)), dat, function(x) diff(range(x)))
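If you would rather avoid extra packages, a base R sketch along the same lines is possible with `tapply()`. The year is taken from the date with `format()`; the names `yr` and `annual_range` are just illustrative, and the example data is rebuilt with `as.POSIXct` so the block runs on its own.

```r
set.seed(1)
# Daily timestamps starting 1990-01-01 (same construction as in the question)
begin_date <- as.POSIXct("1990-01-01", tz = "GMT")
dat <- data.frame(dt = begin_date + (0:(20 * 365)) * 86400)
dat$speed <- runif(nrow(dat), 1, 10)

# Calendar year of each observation, as a character label
yr <- format(dat$dt, "%Y")

# Range (max minus min) of speed within each year
annual_range <- tapply(dat$speed, yr, function(x) diff(range(x)))
```

The result is a named vector with one entry per year, so `annual_range["1995"]` returns the range for 1995.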

R repeat rows by vector and date

I have a data frame with 275 stations and 43 years of seasonal data (October to the following April; May to September are not needed) and 6 variables. Here is a small example of the data frame with a single variable called value:
data <- data.frame(station=rep(1,6), year=rep(1969,6), month=c(10,10,10,10,11,11),day=c(1,8,16,24,1,9),value=c(1:6))
What I need is to fill the gaps between observation days with daily dates (e.g. days 1 to 8), with each daily row holding the observed value divided by the number of days in that gap. The result would look like:
data1 <- data.frame(station=rep(1,40), year=rep(1969,40), month=c(rep(10,31),rep(11,9)),day=c(1:31,1:9),value=rep(c(1/7,2/8,3/8,4/8,5/8,6/8),c(7,8,8,8,8,1)))
I wrote some rough code and searched around the site, but unfortunately could not get it to work. Any help or better ideas would be appreciated.
station.date <- as.Date(with(data, paste(year, month, day, sep="-")))
for (i in 1:length(station.date)){
days <- as.numeric(station.date[i+1]-station.date[i]) #not working
data <- within(data, days <- c(days,1))
}
rows <- rep(1:nrow(data), times=data[ ,data$days])
rows <- ifelse(rows > 10, 0, rows) #get rid of month May to Sept
data <- data[rows, ]
data <- within(data, value1 <- value/days)
data <- within(data, dd <- ?) #don't know to change the repeated days to real days
I wrote some code that reproduces your example, but you will probably have to modify it to handle the whole data set. I wasn't sure what to do with the last observation, so in the end I made a special case for it. If it should be divided by a different number, just replace the 8 in values <- c(values, tail(data$value, 1) / 8) with that number. Moreover, if you have all 275 stations in one data.frame, I think the best idea would be to split it by station, transform each piece separately and then rbind the results.
data <- data.frame(station=rep(1,6), year=rep(1969,6), month=c(10,10,10,10,11,11),day=c(1,8,16,24,1,9),value=c(1:6))
station.date <- as.Date(with(data, paste(year, month, day, sep="-")))
d <- as.numeric(diff(station.date))
range <- sum(d) + 1
# create dates
dates <- seq(station.date[1], by = "day", length = range)
# create values
values <- unlist(sapply(1:length(d), function(i){
rep(data$value[i] / d[i] , d[i])
}))
# adding last observation
values <- c(values, tail(data$value, 1) / 8)
# create new data frame
data2 <- data.frame(station = rep(1, range),
year = as.numeric(format(dates, "%Y")),
month = as.numeric(format(dates, "%m")),
day = as.numeric(format(dates, "%d")),
value = values)
It could probably be optimised in some way, however I hope it helps too. Note how I extract year, month and day from dates.
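The split, transform and combine idea for several stations could look roughly like this. It is only a sketch under assumptions: `expand_station` and `last_divisor` are hypothetical names, the second station is a copy of the first made up for the example, and each piece is assumed to hold a single season with no May to September gap.

```r
# Expand one station's observations to daily rows (same logic as above)
expand_station <- function(st, last_divisor = 8) {
  obs_dates <- as.Date(with(st, paste(year, month, day, sep = "-")))
  gaps <- as.numeric(diff(obs_dates))   # days between observations
  daily_dates <- seq(obs_dates[1], by = "day", length.out = sum(gaps) + 1)
  # Spread each value over its gap; the last value is a special case
  daily_values <- c(rep(head(st$value, -1) / gaps, times = gaps),
                    tail(st$value, 1) / last_divisor)
  data.frame(station = st$station[1],
             year  = as.numeric(format(daily_dates, "%Y")),
             month = as.numeric(format(daily_dates, "%m")),
             day   = as.numeric(format(daily_dates, "%d")),
             value = daily_values)
}

one <- data.frame(station = 1, year = 1969,
                  month = c(10, 10, 10, 10, 11, 11),
                  day = c(1, 8, 16, 24, 1, 9), value = 1:6)
all_stations <- rbind(one, transform(one, station = 2))

# Split by station, expand each piece, stack the pieces row-wise
daily <- do.call(rbind, lapply(split(all_stations, all_stations$station),
                               expand_station))
```

For the real data with 43 seasons per station, the split would probably need to be by station and season together, so that the summer gap is not filled with daily rows.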
