Convert date to month/year format for time series - r

I have some water quality sample data.
> dput(GrowingArealog90s[1:10,])
structure(list(SampleDate = structure(c(6948, 6949, 6950, 7516,
7517, 7782, 7783, 7784, 8092, 8106), class = "Date"), Flog90 = c(1.51851393987789,
1.48970743802793, 1.81243963000062, 0.273575501327576, 0.874218895695207,
1.89762709129044, 1.44012088794774, 0.301029995663981, 1.23603370361931,
0.301029995663981)), .Names = c("SampleDate", "Flog90"), class = c("tbl_df",
"data.frame"), row.names = c(NA, -10L))
This data is collected monthly, although some months are missed over the 25 year period.
I know there is so much help out there for converting dates to different formats but I have not been able to figure this out. I want to create a time series with just a month/year format, so that I can do things like decompose the data by month and run seasonal kendalls and such. I have tried so many different ways of converting my date to the desired format that I have completely confused myself. I don't care about the exact format as long as it is recognized month/year.
I also need to fill in the missing months with NAs.
I tried uploading the "SampleDate" column in a numeric format, "yyyymm". I could then merge that data frame with another that contained all the dates I need.
GA90 <- merge(Dates, GrowingArealog90s, by.x = "Date", by.y = "Date", all.x = TRUE)
However, when I converted the resulting data frame to a time series it would not recognize the 12 month frequency.
GA90ts <- as.ts(GA90, frequency(12))
> GA90ts
Time Series:
Start = 1
End = 324
Frequency = 1
Any help with this is appreciated.

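Here's how to do it with zoo. You'll get a warning because several samples fall in the same month and therefore share an index value, but that's OK for now. The result is a series indexed by month and year (yearmon).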
series <-structure(list(SampleDate = structure(c(6948, 6949, 6950, 7516,
7517, 7782, 7783, 7784, 8092, 8106), class = "Date"), Flog90 = c(1.51851393987789,
1.48970743802793, 1.81243963000062, 0.273575501327576, 0.874218895695207,
1.89762709129044, 1.44012088794774, 0.301029995663981, 1.23603370361931,
0.301029995663981)), .Names = c("SampleDate", "Flog90"), class = c("tbl_df",
"data.frame"), row.names = c(NA, -10L))
library(zoo)
series <-as.data.frame(series) #to drop dplyr class
series.zoo <-zoo(series[,-1,drop=FALSE],as.yearmon(series[,1]))
Best practice would be to keep your series indexed by the actual dates and use as.yearmon only when you actually need to make calculations or aggregate by month and year (aggregate.zoo).
The following is a matter of taste, but I've dealt with a lot of time series and I think zoo is superior to ts and xts. Much more flexible.
Now, to fill in missing values, you have to create a vector of dates and merge it with the series. Here, I'm using a zoo object with actual dates. I then use na.locf, which is "last observation carried forward". You could also look at na.approx.
series.zoo <-zoo(series[,-1,drop=FALSE],(series[,1]))
my.seq <-seq(min(series[,1]), max(series[,1]), by="month") #first() and last() need xts, which is not loaded here
merged <-merge.zoo(series.zoo,zoo(,my.seq))
na.locf(merged)
UPDATE
With aggregate.
GrowingArealog90s <-structure(list(SampleDate = structure(c(6948, 6949, 6950, 7516,
7517, 7782, 7783, 7784, 8092, 8106), class = "Date"), Flog90 = c(1.51851393987789,
1.48970743802793, 1.81243963000062, 0.273575501327576, 0.874218895695207,
1.89762709129044, 1.44012088794774, 0.301029995663981, 1.23603370361931,
0.301029995663981)), .Names = c("SampleDate", "Flog90"), class = c("tbl_df",
"data.frame"), row.names = c(NA, -10L))
library(zoo);library(xts)
GrowingArealog90s <-as.data.frame(GrowingArealog90s) #to remove dplyr format
GrowingArealog90s.zoo <-zoo(GrowingArealog90s[,-1,drop=FALSE],as.Date(GrowingArealog90s[,1]))
#First aggregate by month. I chose to get the mean per month
GrowingArealog90s.agg <-aggregate(GrowingArealog90s.zoo, as.yearmon, mean) #replace mean with last to get last reading of the month
#Then create a sequence of months and merge it
my.seq <-seq.Date(first(GrowingArealog90s[,1]), last(GrowingArealog90s[,1]),by="month")
merged <-merge.zoo(GrowingArealog90s.agg ,zoo(,as.yearmon(my.seq)))
na.locf(merged)
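On the original `Frequency = 1` result: `frequency(12)` is itself a function call that evaluates to 1, so `as.ts(GA90, frequency(12))` never asks for a 12-month cycle. Below is a minimal sketch of one way to get from irregular sample dates to a monthly `ts` with NAs for the missing months. The data frame `dat` and its values are made up for illustration, and the month sequence is built from numeric yearmon values rather than from dates.

```r
library(zoo)

# Hypothetical sample data: irregular dates, with Feb, Apr and May missing
dat <- data.frame(
  SampleDate = as.Date(c("1989-01-09", "1989-01-10", "1989-03-11", "1989-06-15")),
  Flog90     = c(1.5, 1.4, 1.8, 0.3)
)

# One value per month (mean of the samples taken in that month)
z <- aggregate(zoo(dat$Flog90, dat$SampleDate), as.yearmon, mean)

# Every month from the first to the last observation
all.months <- as.yearmon(seq(as.numeric(start(z)), as.numeric(end(z)), by = 1/12))

# Merging with an empty zoo series inserts NA for the missing months
full <- merge(z, zoo(, all.months))

# A regular yearmon-indexed series converts to a monthly ts
GA90ts <- as.ts(full)
frequency(GA90ts)
```

If the values are already one per month with no gaps, `ts(values, start = c(year, month), frequency = 12)` should give the same kind of object without zoo.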

Related

R plotting annual data and "January" repeated at end of graph

I'm fairly new to R and am trying to plot some expenditure data. I read the data in from a CSV file exported from Excel and then do some manipulation on the dates
data <- read.csv("Spending2019.csv", header = T)
#converts time so R can use the dates
strdate <- strptime(data$DATE,"%m/%d/%Y")
newdate <- cbind(data,strdate)
finaldata <- newdate[order(strdate),]
This probably isn't the most efficient, but it gets me there :)
Here are the relevant columns of the first four rows of my finaldata data frame
dput(droplevels(finaldata[1:4,c(5,7)]))
structure(list(AMOUNT = c(25.13, 14.96, 43.22, 18.43), strdate = structure(c(1546578000,
1546750800, 1547010000, 1547010000), class = c("POSIXct", "POSIXt"
), tzone = "")), row.names = c(NA, 4L), class = "data.frame")
The full data set has 146 rows and the dates range from 1/4/2019 to 12/30/2019
I then plot the data
plot(finaldata$strdate,finaldata$AMOUNT, xlab = "Month", ylab = "Amount Spent")
and I get this plot
This is fine for me getting started, EXCEPT why is JAN repeated at the far right end? I have tried various forms of xlim and can't seem to get it to go away.
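A likely cause (an assumption, since the plot itself is not shown) is that the default date-time axis puts a tick at 1 January 2020, just inside the padded right edge, and labels ticks with the month abbreviation only. A minimal sketch of one workaround is to suppress the default axis and draw the month ticks explicitly. The `finaldata` built here is a made-up stand-in with the same column names.

```r
# Hypothetical stand-in for finaldata: 146 purchases spread over 2019
set.seed(1)
finaldata <- data.frame(
  strdate = as.POSIXct("2019-01-04", tz = "UTC") + sort(runif(146, 0, 360)) * 86400,
  AMOUNT  = round(runif(146, 5, 60), 2)
)

# Draw the plot without the automatic x axis
plot(finaldata$strdate, finaldata$AMOUNT, xaxt = "n",
     xlab = "Month", ylab = "Amount Spent")

# One tick at the start of each month of 2019 only
ticks <- seq(as.POSIXct("2019-01-01", tz = "UTC"),
             as.POSIXct("2019-12-01", tz = "UTC"), by = "month")
axis.POSIXct(1, at = ticks, format = "%b")
```

Using `format = "%b %Y"` instead would keep a January 2020 tick unambiguous if you prefer to leave it in.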

Converting a count down timer to a usable format

I am trying to convert a count down timer from a character into a usable format, ideally a time format but numeric may work.
I have tried converting it using as.POSIXct and also using the chron package.
Here is a dput of the DF
structure(list(Time = c("(-01:30)", "(-01:15)", "(-01:00)", "(-00:45)",
"(-00:30)", "(-00:15)", "0", "+00:13", "+00:15", "+00:30", "+00:45"
)), row.names = c(NA, -11L), class = c("tbl_df", "tbl", "data.frame"
))
I have already removed the brackets from the time column using
sd$Time = (gsub("[(),//]", "", sd$Time))
Then I tried to convert it using the following
sd$Time <- as.POSIXct(sd$Time, format="%M:%S")
An option would be strptime
strptime(sub("^0$", "00:00", gsub("[-+()]", "", sd$Time)), format = "%M:%S")
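Note that the strptime call above strips the sign, so times before and after zero become indistinguishable. If a numeric result is acceptable, here is a minimal sketch that keeps the sign by converting to seconds. The small `sd` data frame and the `secs` column name are made up for illustration.

```r
# Hypothetical subset of the countdown values
sd <- data.frame(Time = c("(-01:30)", "(-00:15)", "0", "+00:13", "+00:45"),
                 stringsAsFactors = FALSE)

# Drop the brackets and give the bare "0" the same shape as the other values
clean <- gsub("[()]", "", sd$Time)
clean[clean == "0"] <- "+00:00"

# Remember the sign, then split the remainder into minutes and seconds
sgn   <- ifelse(startsWith(clean, "-"), -1, 1)
parts <- strsplit(sub("^[-+]", "", clean), ":", fixed = TRUE)

# Signed offset from zero, in seconds
sd$secs <- sgn * vapply(parts,
                        function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
                        numeric(1))
sd
```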

arrange with period object (ms function) doesn't work - R

I have times recorded in mm:ss format, where the minutes value can be greater than 59, and I have parsed them as chr. I need to sort my observations in descending order, so I first created a time2 variable with the ms function and then used arrange on it. However, the arranging doesn't work and the values in the second column are totally mixed up.
library(tidyverse)
library(lubridate)
test <- structure(list(time = c("00:58", "07:30", "08:07", "15:45", "19:30",
"24:30", "30:05", "35:46", "42:23", "45:30", "48:08", "52:01",
"63:45", "67:42", "80:12", "86:36", "87:51", "04:27", "09:34",
"12:33", "18:03", "20:28", "21:39", "23:31", "24:02", "26:28",
"31:13", "43:03", "44:00", "45:38")), .Names = "time", row.names = c(NA,
-30L), class = c("tbl_df", "tbl", "data.frame"))
test %>% mutate(time2 = ms(time)) %>% arrange(time2) %>% View()
How can I arrange these times?
I think it would be easier to just put the times in the same unit and then arrange(). Try this:
test %>% mutate(time_in_seconds = minute(ms(time) )*60 + second(ms(time))) %>%
arrange(desc(time_in_seconds)) %>%
View()
seconds_to_period(minute(ms(test$time))*60 + second(ms(test$time))) # to get right format (with hours)
This is a known limitation with arrange. dplyr does not support S4 objects: https://github.com/tidyverse/lubridate/issues/515
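As a variation on the same idea, lubridate's `period_to_seconds()` does the minutes-to-seconds arithmetic in one call, and the resulting plain numeric column sorts without trouble. This is a minimal sketch on a made-up four-row subset of the data.

```r
library(dplyr)
library(lubridate)

# Hypothetical subset of the times, deliberately out of order
test <- data.frame(time = c("00:58", "63:45", "07:30", "80:12"),
                   stringsAsFactors = FALSE)

# Convert each period to a plain number of seconds, then sort on that
sorted <- test %>%
  mutate(time_in_seconds = period_to_seconds(ms(time))) %>%
  arrange(desc(time_in_seconds))
sorted
```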

R convert YYMMDD to date

I have data in YYMMDDHH format but am trying to get the weekday so I need to go to a date format but can't figure it out.
Here's a dput of the relevant data:
structure(list(id = c(7927751403363142656, 18236986451472797696,
5654946373641778176, 14195690822403907584, 1693303484298446848,
1.1362181921561e+19, 11694645532962195456, 1221431312630614784,
1987127670789791488, 379819848497418688), hour = c(14102118L,
14102217L, 14102812L, 14102912L, 14102820L, 14102401L, 14102117L,
14102312L, 14102301L, 14102414L)), .Names = c("id", "hour"), row.names = c(3620479L,
8510796L, 29632625L, 34450879L, 31874113L, 13420799L, 3332671L,
11543560L, 9602012L, 15574701L), class = "data.frame")
When I use:
dat2$dow <- as.Date(substr(as.character(dat2$hour), 1,6), format = '%Y%m%d')
I just get NA's. Any suggestions?
"%Y" is for 4-digit years; "%y" is for 2-digit years. And you don't need to use substr. as.Date will ignore anything after the end of the specified format.
dat2$dow <- as.Date(as.character(dat2$hour), format='%y%m%d')
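Since the goal was the weekday, here is a minimal sketch of the remaining step once the date parses. The small `dat2` is a made-up subset of the data. `weekdays()` returns locale-dependent names, so the sketch also stores the ISO weekday number (1 = Monday) in a hypothetical `dow_num` column.

```r
# Hypothetical subset of the hour column (YYMMDDHH stored as integers)
dat2 <- data.frame(hour = c(14102118L, 14102217L, 14102812L))

# The trailing HH digits are ignored when parsing with '%y%m%d'
dat2$date <- as.Date(as.character(dat2$hour), format = "%y%m%d")

# Weekday name (depends on locale) and ISO weekday number (1 = Monday)
dat2$dow     <- weekdays(dat2$date)
dat2$dow_num <- as.integer(format(dat2$date, "%u"))
dat2
```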

R read.zoo error for incorrect date format

I have data with one date column and 10 other columns.
The date column is in a format like 199010,
so it's yyyymm.
It seems that zoo/xts requires the date to include the day.
Is there any way to address this?
Here is my data:
structure(list(Date = 198901:198905, NoDur = c(5.66, -1.44, 5.51,
5.68, 5.32)), .Names = c("Date", "NoDur"), class = "data.frame", row.names = c(NA,
5L))
data<-read.zoo("C:/***/data_port.csv",sep=",",format="%Y%m",header=TRUE,index.column=1,colClasses=c("character",rep("numeric",1)))
The code has these problems:
the data is space separated but the code specifies that it is comma separated
the data does not describe dates since there is no day but the code is using the default of dates
the data is not provided in reproducible form. Note how one can simply copy the data and code below and paste it into R without any additional work.
Try this:
Lines <- "Date NoDur
198901 5.66
198902 -1.44
198903 5.51
198904 5.68
198905 5.32
"
library(zoo)
read.zoo(text = Lines, format = "%Y%m", FUN = as.yearmon, header = TRUE,
colClasses = c("character", NA))
The above converts the index to "yearmon" class, which probably makes the most sense here, but it would alternatively be possible to convert it to "Date" class by using FUN = function(x, format) as.Date(as.yearmon(x, format)) in place of the FUN argument above.
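If the data is already in a data frame, as in the dput in the question, the same conversion can be sketched without read.zoo by building the index with as.yearmon directly. The object names `dat` and `z` are chosen here for illustration.

```r
library(zoo)

# The data frame from the question
dat <- data.frame(Date  = 198901:198905,
                  NoDur = c(5.66, -1.44, 5.51, 5.68, 5.32))

# Parse yyyymm as a year-month index; no day component is needed
z <- zoo(dat$NoDur, as.yearmon(as.character(dat$Date), "%Y%m"))
z
```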
