Transforming data frame to time series in r - r

I am currently trying to convert a data.frame to a time series. The data frame looks like this:
All I want to do is be able to plot the doc data as a function of time and run a statistical test on it.
Any help would be greatly appreciated!
This is what my code currently looks like:
library(xts)  # also loads zoo, which provides as.yearmon
x <- aggregate(doc ~ mo + yr, B, mean)
x$Date <- as.yearmon(paste(x$yr, x$mo), "%Y %m")
df_ts <- xts(x, order.by = x$Date)
keeps <- "doc"
df_ts <- df_ts[ , keeps, drop = FALSE]
df_ts_1 <- as.ts(df_ts, start = head(index(df_ts), 1), end = tail(index(df_ts), 1))
The issue I'm running into is that the months and years are not in sequential order, so when I try to apply the as.ts function, the data does not fill in correctly.

Using DF, defined reproducibly in the Note at the end, read the data frame into a zoo object, converting the Date column to yearmon class, and plot it using ggplot2. If there are duplicate dates (there are none in the example data), add the aggregate = mean argument to read.zoo.
library(ggplot2)
library(zoo)
z <- read.zoo(DF[c("Date", "doc")], FUN = as.yearmon, format = "%b %Y")
autoplot(z) + scale_x_yearmon()
This would also work:
tt <- as.ts(z)
plot(na.approx(tt), ylab = "tt")
Note
In the future please do not use images. I have retyped the first three rows this time.
DF <- data.frame(month = c("02", "10", "12"), year = c(1998, 2000, 2000),
doc = c(1.55, 2.2, 0.96), Date = c("Feb 1998", "Oct 2000", "Dec 2000"),
stringsAsFactors = FALSE)
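As a rough sketch of the duplicate-date case mentioned in this answer, the following uses a hypothetical data frame DF2 (not part of the original data) in which "Feb 1998" appears twice. It assumes an English locale so that "%b" matches the month abbreviations.

```r
library(zoo)

# Hypothetical data with a repeated month ("Feb 1998" appears twice)
DF2 <- data.frame(Date = c("Feb 1998", "Feb 1998", "Oct 2000"),
                  doc = c(1.0, 2.0, 2.2),
                  stringsAsFactors = FALSE)

# aggregate = mean collapses rows that share the same yearmon index
z2 <- read.zoo(DF2, FUN = as.yearmon, format = "%b %Y", aggregate = mean)
z2
```

The two February rows should be averaged into a single observation, leaving one value per month.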

Related

How to construct a random data set of different year in R?

The code below generates uniformly distributed data at a daily time step for the year 2009. Suppose I want to construct a similar data set that includes the years 2009, 2012, 2015, and 2019. How would I do that? I am basically trying to avoid repeating the code or using filter to grab data for the years of interest.
library(tidyverse)
library(lubridate)
set.seed(500)
DF1 <- data.frame(Date = seq(as.Date("2009-01-01"), to = as.Date("2009-12-31"), by = "day"),
Flow = runif(365,20,60))
Here is an option: create a vector of years, loop over it, build the sequence of dates for each year after converting to Date class, and draw 'Flow' from a uniform distribution.
year <- c(2009, 2012, 2015, 2019)
lst1 <- lapply(year, function(yr) {
dates <- seq(as.Date(paste0(yr, '-01-01')),
as.Date(paste0(yr, '-12-31')), by = 'day')
data.frame(Date = dates,
Flow= runif(length(dates), 20, 60))
})
and create a single data.frame with do.call
dat1 <- do.call(rbind, lst1)
Here is a possible solution:
set.seed(123)
sample_size <- 1000
y <- sample(c(2009,2012,2015,2019),sample_size,replace=TRUE)
simulate_date <- function(year){
n_days <- ifelse(lubridate::leap_year(year),
366,365)
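# Note: offsets 1..n_days are added to January 1, so the largest draw lands on
# January 1 of the following year; sample(0:(n_days - 1), 1) would stay within the year
as.Date(sample(1:n_days, 1), origin=paste0(year,"-01-01"))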
}
dates <- Reduce(`c`, purrr::map(y, simulate_date))
> head(dates)
[1] "2012-06-28" "2012-01-15" "2009-07-15" "2012-11-02" "2019-04-29"
[6] "2015-10-27"

How can you convert quarterly data into monthly data in R?

I am currently trying to convert quarterly GDP data into monthly data and make it a time series. I figured out a way to do so before, but I lost my code and cannot work it out again. My data needs to span from 1997-01-01 to 2018-12-01, and I took the original data from the FRED database.
library(pdfetch)
GDP.1 <- pdfetch_FRED("GDP")
My code looked something like this...
GDP.working <- seq(as.Date(GDP.1, "1997/1/1"),
by = "month",
length.out = 252)
xts.1 <- xts(x = rep(NA,252),
order.by = seq(as.Date("1997/1/1"),
by = "month",
length.out = 252),
frequency = 12)
xts.GDP.1 <- merge(GDP.working, xts.1)
GDP.final <- na.locf(xts.GDP.1, fromLast = TRUE)
Any suggestions/methods would be very much appreciated.
Thank you for your time.
ERV
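One possible approach, sketched under assumptions rather than taken from the asker's lost code: merge the quarterly series onto an empty monthly index and carry each quarterly value forward. The series gdp_q below is a hypothetical stand-in for the FRED download, so the example runs without network access.

```r
library(xts)

# Hypothetical quarterly values standing in for the FRED GDP series
q_dates <- seq(as.Date("1997-01-01"), by = "quarter", length.out = 88)
gdp_q <- xts(seq_along(q_dates), order.by = q_dates)

# Monthly index covering 1997-01-01 to 2018-12-01
m_dates <- seq(as.Date("1997-01-01"), as.Date("2018-12-01"), by = "month")

# Merging with a zero-width xts adds the missing months as NA rows
gdp_m <- merge(gdp_q, xts(, m_dates))

# Carry each quarterly value forward into the following two months
gdp_m <- na.locf(gdp_m)
head(gdp_m)
```

Carrying values forward is only one choice; na.approx from zoo would interpolate linearly between quarters instead.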

"Week-Year" string to a meaningful date or numeric format that can be plotted

I have a set of data that includes dates in the "week-year" format, so "30-2010" represents the 30th week of 2010. I'm trying to plot the data, but I need to convert the date values to a date format or a numeric value so that ggplot2 will use them as the labels on my x axis. Any ideas on how this can be done?
dte = "30-2010"
Check out ?week in the lubridate package. This seems to do what you want:
library(lubridate)
str <- "30-2010"
wk <- substr(str, 1, 2)
yr <- substr(str, 4, 7)
dt <- as.Date(paste0(yr, "-01-01"))
week(dt) <- as.numeric(wk)
dt
[1] "2010-07-23"
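For a whole column of such strings, a base R sketch along the same lines might look like this. The helper name week_year_to_date is hypothetical, and it assumes weeks are counted as seven-day blocks starting on January 1, which is the convention that produces the date shown above.

```r
# Hypothetical helper: convert "week-year" strings to the first day of that week,
# counting weeks as 7-day blocks that start on January 1
week_year_to_date <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  wk <- as.integer(vapply(parts, `[`, character(1), 1))
  yr <- vapply(parts, `[`, character(1), 2)
  as.Date(paste0(yr, "-01-01")) + 7 * (wk - 1)
}

dates <- week_year_to_date(c("30-2010", "1-2011"))
dates
```

The result is a Date vector, so ggplot2 can place it on a date axis directly.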

compare time intervals in R

Let's say I have a data frame consisting of three columns of dates:
index <- c("31.10.2012", "16.06.2012")
begin <- c("22.10.2012", "29.05.2012")
end <- c("24.10.2012", "17.06.2012")
index.new <- as.Date(index, format = "%d.%m.%Y")
begin.new <- as.Date(begin, format = "%d.%m.%Y")
end.new <- as.Date(end, format = "%d.%m.%Y")
data.frame(index.new, begin.new, end.new)
My problem: I want to select (subset) the rows, where the interval of begin and end-date is within 4 days before the index-day. This is obviously only in row no 2.
Can you help me out here?
The way you express the problem is messy: in the first case dates.new[1] > dates.new[2], and in the second case dates.new[3] < dates.new[4]. Putting the endpoints in order:
interval1 = c(dates.new[2], dates.new[1])
interval2 = c(dates.new[3],dates.new[4])
If you want to check that interval2 contains interval1:
all.equal(findInterval(interval1, interval2),c(1,1))
Please let me know if this works and if it is what you want.
library("timeDate")
index <- c("31.10.2012", "16.06.2012")
begin <- c("22.10.2012", "29.05.2012")
end <- c("24.10.2012", "17.06.2012")
index.new <- as.Date(index, format = "%d.%m.%Y")
begin.new <- as.Date(begin, format = "%d.%m.%Y")
end.new <- as.Date(end, format = "%d.%m.%Y")
data <- data.frame(index.new, begin.new, end.new)
apply(data, 1, function(x){paste(x[1]) %in% paste(timeSequence(x[2], x[3], by = "day"))})
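Under one reading of the question (keep the rows whose index date falls between begin and end, which is what the timeSequence check above tests), a vectorized base R sketch needs no extra package. The names dat and sel are illustrative only.

```r
# Same example dates as in the question
dat <- data.frame(
  index.new = as.Date(c("31.10.2012", "16.06.2012"), format = "%d.%m.%Y"),
  begin.new = as.Date(c("22.10.2012", "29.05.2012"), format = "%d.%m.%Y"),
  end.new   = as.Date(c("24.10.2012", "17.06.2012"), format = "%d.%m.%Y")
)

# Date columns compare element-wise, so the row filter is a logical vector
inside <- dat$index.new >= dat$begin.new & dat$index.new <= dat$end.new
sel <- dat[inside, ]
sel
```

If the intended rule is different (for example, the interval must end within four days before the index date), only the logical condition would need to change.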

Creating a sequence of columns in a data frame based on an index for loop or using plyr in r

I wish to create 24 hourly data frames in which each data.frame contains hourly demand for a product as 1 column, and the next 8 columns contain hourly temperatures. For example, for the data.frame for 8am, the data.frame will contain a column for demand at 8am, then eight columns for temperature ranging from the most current hour to the 7 past hours. The additional complication is that for hours before 8AM i.e. "4AM", I have to get yesterday's temperatures. I am hitting my head against the wall trying to figure out how to do this with apply or plyr, or a vectorized function.
demand8AM Temp8AM Temp7AM Temp6AM...Temp1AM
Demand4AM Temp4AM Temp3AM Temp2AM Temp1AM Temp12AM Temp11pm(Lag) Temp10pm(Lag)
In my code Hours are numbers; 1 is 12AM etc.
Here is some simple code I created to create the dataset I am dealing with.
#Creating some Fake Data
require(plyr)
# setting up some fake data
set.seed(31)
foo <- function(myHour, myDate){
rlnorm(1, meanlog=0,sdlog=1)*(myHour) + (150*myDate)
}
Hour <- 1:24
Day <-1:90
dates <-seq(as.Date("2012-01-01"), as.Date("2012-3-30"), by = "day")
myData <- expand.grid( Day, Hour)
names(myData) <- c("Date","Hour")
myData$Temperature <- apply(myData, 1, function(x) foo(x[2], x[1]))
myData$Date <-dates
myData$Demand <-(rnorm(1,mean = 0, sd=1)+.75*myData$Temperature )
## ok, done with the fake data generation.
It looks as though you could benefit from using a time series. Here's my interpretation of what you want, which is not exactly what you asked for (I used the "mean" function in rollapply). I recommend reading up on the xts and zoo packages.
library(xts)
# create a dummy hourly time index
time_index <- seq(from = as.POSIXct("2012-05-15 07:00"),
to = as.POSIXct("2012-05-17 18:00"), by = "hour")
# create dummy demand and temp.C (temp.C is recycled to the length of demand)
info <- data.frame(demand = sample(1:length(time_index), replace = TRUE),
temp.C = sample(1:10))
# turn demand + temp.C into a time series
eventdata <- xts(info, order.by = time_index)
# bind the current temperature with its lags of 1 to 8 hours
x2 <- eventdata$temp.C
for (i in 1:8) {x2 <- cbind(x2, lag(eventdata$temp.C, i))}
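As an alternative sketch in base R, embed() builds the same kind of lagged matrix in one call. It assumes the temperatures are a plain vector already ordered by time, in which case lags that cross midnight pick up the previous day's values automatically. The vector temp and the column names are hypothetical.

```r
# Hypothetical hourly temperatures, ordered by time
temp <- c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
n_lags <- 3  # current hour plus the two previous hours

# embed() returns one row per hour from position n_lags onward:
# column 1 is the current value, column k is the value k - 1 hours earlier
lagged <- embed(temp, n_lags)
colnames(lagged) <- paste0("temp_lag", 0:(n_lags - 1))
head(lagged)
```

For the question's layout, n_lags would be 8, and the rows could then be split by hour of day to form the 24 data frames.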
