I am trying to convert quarterly GDP data into monthly data and store it as a time series. I had a working approach before, but I lost the code and cannot reconstruct it. The data needs to span 1997-01-01 to 2018-12-01, and I originally took it from the FRED database.
library(pdfetch)
GDP.1 <- pdfetch_FRED("GDP")
My code looked something like this...
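library(xts)  # xts(), merge(); also attaches zoo, which supplies na.locf()

# Monthly dates from 1997-01-01 to 2018-12-01 (264 months)
months <- seq(as.Date("1997-01-01"), as.Date("2018-12-01"), by = "month")
xts.1 <- xts(x = rep(NA_real_, length(months)), order.by = months)

# Assumes pdfetch_FRED() dates each quarter at its last day, so filling
# backwards gives each month the value of the quarter it belongs to.
# If the dates are quarter starts instead, drop fromLast = TRUE.
xts.GDP.1 <- merge(GDP.1, xts.1)
GDP.monthly <- na.locf(xts.GDP.1[, 1], fromLast = TRUE)[months]

# Regular monthly ts object
GDP.final <- ts(as.numeric(GDP.monthly), start = c(1997, 1), frequency = 12)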
Any suggestions/methods would be very much appreciated.
Thank you for your time.
ERV
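A dependency-free alternative, sketched under the assumption that the quarterly series is a regular ts with no missing quarters: repeat each quarterly value in the three months of its quarter (a step function, not a smooth interpolation). quarterly_to_monthly() is a hypothetical helper, and the four values below are made up.

```r
# Expand a quarterly ts to monthly by repeating each quarter's value
# in all three of its months (hypothetical helper, base R only)
quarterly_to_monthly <- function(q) {
  stopifnot(frequency(q) == 4)
  s <- start(q)                              # c(year, quarter)
  ts(rep(as.numeric(q), each = 3),
     start = c(s[1], (s[2] - 1) * 3 + 1),    # first month of the start quarter
     frequency = 12)
}

# Made-up quarterly values for 1997 Q1 to Q4 (not real GDP figures)
gdp_q <- ts(c(100, 102, 105, 107), start = c(1997, 1), frequency = 4)
gdp_m <- quarterly_to_monthly(gdp_q)
```

For the FRED data, one could wrap the values as ts(as.numeric(GDP.1), start = c(1947, 1), frequency = 4) (assuming the series starts in 1947 Q1), pass that to the helper, and trim the result with window(..., start = c(1997, 1), end = c(2018, 12)).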
I am currently trying to convert a data.frame to a time series. The data frame looks like this:
All I want to do is plot the doc data as a function of time and run a statistical test on it.
Any help would be greatly appreciated!
This is what my code currently looks like:
library(zoo)  # as.yearmon()
library(xts)  # xts()

x <- aggregate(doc ~ mo + yr, B, mean)
x$Date <- as.yearmon(paste(x$yr, x$mo), "%Y %m")
df_ts <- xts(x, order.by = x$Date)
keeps <- "doc"
df_ts <- df_ts[, keeps, drop = FALSE]
df_ts_1 <- as.ts(df_ts, start = head(index(df_ts), 1), end = tail(index(df_ts), 1))
The issue I'm running into is that the months and years are not in sequential order, so when I apply as.ts the data does not fill in correctly.
Using DF, defined reproducibly in the Note at the end, read the data frame into a zoo object, converting the Date column to yearmon class, and plot it using ggplot2. If there are duplicate dates (there are none in the example data), add the aggregate = mean argument to read.zoo.
library(ggplot2)
library(zoo)
z <- read.zoo(DF[c("Date", "doc")], FUN = as.yearmon, format = "%b %Y")
autoplot(z) + scale_x_yearmon()
This would also work, using base graphics and filling the missing months by linear interpolation with na.approx:
tt <- as.ts(z)
plot(na.approx(tt), ylab = "tt")
Note
In the future please do not use images. I have retyped the first three rows this time.
DF <- data.frame(month = c("02", "10", "12"), year = c(1998, 2000, 2000),
doc = c(1.55, 2.2, 0.96), Date = c("Feb 1998", "Oct 2000", "Dec 2000"),
stringsAsFactors = FALSE)
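Without zoo, the gap problem can be handled in base R by placing the observations on a complete monthly grid before calling ts(). This is a minimal sketch using the three rows of DF from the Note; approx() stands in for zoo's na.approx(), and the months-since-year-zero index is just an illustrative device.

```r
DF <- data.frame(month = c("02", "10", "12"), year = c(1998, 2000, 2000),
                 doc = c(1.55, 2.2, 0.96), Date = c("Feb 1998", "Oct 2000", "Dec 2000"),
                 stringsAsFactors = FALSE)

# Months elapsed since year 0 give each observation an integer position
idx <- DF$year * 12 + as.integer(DF$month) - 1
full <- seq(min(idx), max(idx))   # every month in the span, gaps included

# Observed values on the complete grid; months absent from DF become NA
vals <- DF$doc[match(full, idx)]
tt <- ts(vals, start = c(min(idx) %/% 12, min(idx) %% 12 + 1), frequency = 12)

# Linear interpolation across the gaps (base-R stand-in for zoo::na.approx)
filled <- approx(seq_along(vals), vals, xout = seq_along(vals))$y
tt_filled <- ts(filled, start = start(tt), frequency = 12)
```

plot(tt_filled, ylab = "doc") then draws the interpolated series; the unfilled tt keeps the NAs if the statistical test should only see observed months.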
I have a data frame with daily data. I need to bind it to hourly data, but first I need to convert it to a suitable POSIXct format. In the simple case it looks like this:
set.seed(42)
df <- data.frame(
  Date = seq.Date(from = as.Date("2015-01-01", "%Y-%m-%d"), to = as.Date("2015-01-29", "%Y-%m-%d"), by = "day"),
  var1 = runif(29, min = 5, max = 10)
)
# format must be named: the second positional argument of as.POSIXct() is tz
result <- data.frame(
  Date = seq.POSIXt(from = as.POSIXct("2015-01-01 00:00:00", format = "%Y-%m-%d %H:%M:%S", tz = ""),
                    to = as.POSIXct("2015-01-29 23:00:00", format = "%Y-%m-%d %H:%M:%S", tz = ""),
                    by = "hour"),
  var1 = rep(df$var1, each = 24))
However, my data is not as easy to work with as the above: there are many missing dates, so I need to take the actual df$Date vector and expand it into an hourly POSIXct data frame with the matching daily values.
I've looked high and low but been unable to find anything on this.
My approach was to take the earliest and latest dates in the dataset and treat them as hour 0 and hour 23, respectively.
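hourly <- data.frame(Hourly = seq(as.POSIXct(paste(min(df$Date), "00:00:00"), tz = ""),
                                  as.POSIXct(paste(max(df$Date), "23:00:00"), tz = ""), by = "hour"))
# Match on the local calendar date; as.Date() on a POSIXct may convert in UTC instead
hourly[, "Var1"] <- df[match(as.Date(format(hourly$Hourly, "%Y-%m-%d")), df$Date), "var1"]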
This turns the daily values into hourly ones: each hour is assigned the var1 value of the day it falls on. Missing daily values are not a problem; if an hour's date has no match in df, it gets NA.
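A small self-contained check of this approach, with one day deliberately missing. The values are made up, and tz = "UTC" is an assumption chosen to avoid daylight-saving gaps in the hourly grid.

```r
# Daily data with a gap: 2015-01-03 is deliberately missing (made-up values)
df <- data.frame(Date = as.Date(c("2015-01-01", "2015-01-02", "2015-01-04")),
                 var1 = c(5.5, 7.25, 9))

# Hourly grid from hour 0 of the first day to hour 23 of the last day
hours <- seq(as.POSIXct(paste(min(df$Date), "00:00:00"), tz = "UTC"),
             as.POSIXct(paste(max(df$Date), "23:00:00"), tz = "UTC"),
             by = "hour")

# Look up each hour's calendar day; days absent from df yield NA
hourly <- data.frame(Hourly = hours,
                     var1 = df$var1[match(as.Date(hours, tz = "UTC"), df$Date)])
```

The 24 hours of 2015-01-03 come out as NA, while every other hour carries its day's value.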
I've run into trouble working with time series and time zones in R, and I can't quite figure out how to proceed.
I have time series data like this:
df <- data.frame(
Date = seq(as.POSIXct("2014-01-01 00:00:00"), length.out = 1000, by = "hours"),
price = runif(1000, min = -10, max = 125),
wind = runif(1000, min = 0, max = 2500),
temp = runif(1000, min = - 10, max = 25)
)
The Date column is in UTC. I would like to subset/filter the data, for example to get the values from today (today is 2014-05-13):
df[ as.Date(df$Date) == Sys.Date(), ]
However, when I do this, I get data that starts with:
2014-05-13 02:00:00
And not:
2014-05-13 00:00:00
This is because I'm currently on CEST, which is two hours ahead of UTC. So I tried to change the data:
df$Date <- as.POSIXct(df$Date, format = "%Y-%m-%d %H", tz = "Europe/Berlin")
Yet this doesn't work. I've tried various alternatives, such as converting to character and back, but I keep hitting a wall, and I'm guessing there is something simple I'm missing.
To avoid time-zone issues like this, use format() to get the character representation of the date in the local time zone:
df[format(df$Date,"%Y-%m-%d") == Sys.Date(), ]
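A base-R sketch of the underlying distinction, assuming the stored timestamps really are UTC: setting the tzone attribute changes only how the same instants are displayed, and format() then computes the calendar date in that time zone. The fixed date 2014-05-13 replaces Sys.Date() so the result is reproducible, and the price column is dummy data.

```r
# Hourly timestamps stored in UTC (assumption: the raw data really is UTC)
utc <- seq(as.POSIXct("2014-05-12 00:00:00", tz = "UTC"), length.out = 72, by = "hour")
df <- data.frame(Date = utc, price = seq_along(utc))

# Re-label the same instants in Berlin time; the numeric values are unchanged
berlin <- df$Date
attr(berlin, "tzone") <- "Europe/Berlin"

# Rows whose Berlin calendar date is 2014-05-13
today <- df[format(berlin, "%Y-%m-%d") == "2014-05-13", ]
```

format(df$Date, "%Y-%m-%d", tz = "Europe/Berlin") gives the same dates without touching the attribute. In May, Berlin is on CEST, so the selected rows start at 2014-05-12 22:00 UTC, which is midnight local time.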
I want to create 24 hourly data frames, one per hour of the day. Each contains the hourly demand for a product as one column, followed by 8 columns of hourly temperatures. For example, the data frame for 8AM will contain a column for demand at 8AM, then eight temperature columns running from the current hour back through the previous 7 hours. The complication is that for hours before 8AM (e.g. 4AM), some of those temperatures come from the previous day. I am stuck trying to do this with apply, plyr, or a vectorized function.
Demand8AM Temp8AM Temp7AM Temp6AM ... Temp1AM
Demand4AM Temp4AM Temp3AM Temp2AM Temp1AM Temp12AM Temp11PM(Lag) Temp10PM(Lag)
In my code, hours are numbers: 1 is 12AM, and so on.
Here is some simple code that generates a dataset like the one I am dealing with.
# Creating some fake data
require(plyr)
set.seed(31)
foo <- function(myHour, myDate) {
  rlnorm(1, meanlog = 0, sdlog = 1) * (myHour) + (150 * myDate)
}
Hour <- 1:24
Day <- 1:90
dates <- seq(as.Date("2012-01-01"), as.Date("2012-03-30"), by = "day")
myData <- expand.grid(Day, Hour)
names(myData) <- c("Date", "Hour")
myData$Temperature <- apply(myData, 1, function(x) foo(x[2], x[1]))
myData$Date <- dates  # recycled: the day index varies fastest in expand.grid
myData$Demand <- (rnorm(1, mean = 0, sd = 1) + .75 * myData$Temperature)
## ok, done with the fake data generation.
It looks as though you could benefit from working with a time series. Here's my interpretation of what you want, which is not exactly what you asked for. I recommend reading up on the xts and zoo packages.
library(xts)

# Create a dummy hourly time vector
time_index <- seq(from = as.POSIXct("2012-05-15 07:00"),
                  to = as.POSIXct("2012-05-17 18:00"), by = "hour")
# Create dummy demand and temp.C (temp.C's 10 values are recycled)
info <- data.frame(demand = sample(1:length(time_index), replace = TRUE),
                   temp.C = sample(1:10))
# Turn demand + temp.C into a time series
eventdata <- xts(info, order.by = time_index)
# Current temperature plus lags 1..7 (8 columns); lag(x, k) on xts gives the value k hours earlier
x2 <- eventdata$temp.C
for (i in 1:7) {x2 <- cbind(x2, lag(eventdata$temp.C, i))}
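The same idea without xts, extended to the 24 per-hour data frames the question asks for. This is a sketch on made-up data; lagged() is a hypothetical helper, and the Temp0 to Temp7 column names (current hour, then 1 to 7 hours earlier) replace the question's Temp8AM-style names. Because the series is one continuous hourly sequence, the lags for early-morning rows reach back into the previous day automatically.

```r
# Made-up hourly series: 3 days of temperature and demand (not real data)
set.seed(1)
time_index <- seq(as.POSIXct("2012-01-01 00:00", tz = "UTC"), by = "hour", length.out = 72)
temp <- round(runif(72, -5, 10), 1)
demand <- round(100 + 2 * temp + rnorm(72), 1)

# lagged(x, k): value k hours earlier, NA for the first k rows (hypothetical helper)
lagged <- function(x, k) c(rep(NA, k), head(x, length(x) - k))

# Temp0 = current hour, Temp1..Temp7 = 1..7 hours earlier
lags <- as.data.frame(sapply(0:7, function(k) lagged(temp, k)))
names(lags) <- paste0("Temp", 0:7)
full <- data.frame(time = time_index, Demand = demand, lags)

# One data frame per clock hour, named "0" to "23"
by_hour <- split(full, as.integer(format(full$time, "%H", tz = "UTC")))
```

by_hour[["4"]] then holds the 4AM rows; from the second day on, its Temp5 column is the previous day's 11PM reading. The first day's early hours contain NA where no earlier data exists.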
I have a data.frame holding a time series, and I would like to thin it by keeping only the entries measured on even-numbered days of the year. For example:
set.seed(1)
RandData <- rnorm(100, sd = 20)
Locations <- rep(c('England', 'Wales'), each = 50)
today <- Sys.Date()
dseq <- seq(today, by = "1 days", length = 100)
# Day of year taken straight from the Date; converting to POSIXct first
# can shift the day when the local time zone is behind UTC
Final <- data.frame(Loc = Locations,
                    Doy = as.numeric(format(dseq, format = "%j")),
                    Temp = RandData)
How can I reduce this data frame to only the rows measured on even-numbered days, e.g. Loc, Doy and Temp on day 172, day 174, and so on?
What about:
Final[Final$Doy %% 2 == 0, ]
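A reproducible version of the question's data shows what this keeps. The fixed start date 2014-06-21 (day 172 in a non-leap year) is an assumption standing in for Sys.Date(), and subset() is an equivalent spelling of the same filter.

```r
# Question's data with a fixed start date instead of Sys.Date()
set.seed(1)
dseq <- seq(as.Date("2014-06-21"), by = "1 day", length.out = 100)
Final <- data.frame(Loc = rep(c("England", "Wales"), each = 50),
                    Doy = as.integer(format(dseq, "%j")),
                    Temp = rnorm(100, sd = 20))

# Keep rows whose day of year is even; same as Final[Final$Doy %% 2 == 0, ]
even <- subset(Final, Doy %% 2 == 0)
```

If the series crosses a year boundary, %j restarts at 1 after day 365 (or 366), so in a non-leap year the kept days jump from 364 to 2, three days apart.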