Choose specific date with strptime in R

I have a text file dataset with headers
YEAR MONTH DAY value
which runs hourly from 1/6/2010 to 14/7/2012. I open and plot the data with the following commands:
data=read.table('example.txt',header=T)
time = strptime(paste(data$DAY,data$MONTH,data$YEAR,sep="-"), format="%d-%m-%Y")
plot(time,data$value)
However, when the data are plotted, the x axis only shows 2011 and 2012. How can I keep the 2011 and 2012 labels but also add some specific months, e.g. March, June and September?
I have made the data available on this link
https://dl.dropbox.com/u/107215263/example.txt

You need to use the function axis.POSIXct to format and place your date labels as you wish:
plot(time,data$value,xaxt="n") #Skip the x-axis here
axis.POSIXct(1, at=pretty(time), format="%B %Y")
To see all possible formats, see ?strptime.
You can of course play with the parameter at to place your ticks wherever you want, for instance every three months:
axis.POSIXct(1, at=seq(time[1],time[length(time)],"3 months"),
format="%B %Y")
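For the specific case in the question (ticks only in March, June and September), one possible sketch is to build the month starts yourself and keep only the wanted months. The data below are synthetic stand-ins for example.txt, and the names month_starts and wanted are made up for illustration; the month abbreviations in the labels assume an English locale.

```r
# Synthetic daily series covering the same date range as the question's data
# (an assumption: the real values would come from example.txt)
set.seed(1)
time  <- seq(as.POSIXct("2010-06-01", tz = "UTC"),
             as.POSIXct("2012-07-14", tz = "UTC"), by = "day")
value <- cumsum(rnorm(length(time)))

# First day of every month in the range, then keep only March, June, September
month_starts <- seq(min(time), max(time), by = "month")
wanted <- month_starts[as.integer(format(month_starts, "%m")) %in% c(3, 6, 9)]

plot(time, value, type = "l", xaxt = "n")        # suppress the default x axis
axis.POSIXct(1, at = wanted, format = "%b %Y")   # month and year on each tick
```

Because the label format includes %Y, the year stays visible on every tick.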

While this doesn't answer the question directly, I would suggest using the xts package for any time series analysis. It makes time series analysis very convenient:
require(xts)
DF <- read.table("https://dl.dropbox.com/u/107215263/example.txt", header = TRUE)
head(DF)
## YEAR MONTH DAY value
## 1 2010 6 1 95.3244
## 2 2010 6 2 95.3817
## 3 2010 6 3 100.1968
## 4 2010 6 4 103.8667
## 5 2010 6 5 104.5969
## 6 2010 6 6 107.2666
#Get Index for xts object which we will create in next step
DFINDEX <- ISOdate(DF$YEAR, DF$MONTH, DF$DAY)
#Create xts timeseries
DF.XTS <- .xts(x = DF$value, index = DFINDEX, tzone = "GMT")
head(DF.XTS)
## [,1]
## 2010-06-01 12:00:00 95.3244
## 2010-06-02 12:00:00 95.3817
## 2010-06-03 12:00:00 100.1968
## 2010-06-04 12:00:00 103.8667
## 2010-06-05 12:00:00 104.5969
## 2010-06-06 12:00:00 107.2666
#plot xts
plot(DF.XTS)

Related

Specifying start date of timeseries data in R as Q2

I have time series data that is seasonal by the quarter. However, the data starts in the 2nd quarter of the first year but all other years have all four quarters.
> EquifaxData
DATE EQFXSUBPRIME013045
1 2014-04-01 42.58513
2 2014-07-01 43.15483
3 2014-10-01 43.55090
4 2015-01-01 42.59218
5 2015-04-01 41.47105
6 2015-07-01 41.53640
7 2015-10-01 41.82020
8 2016-01-01 40.98760
9 2016-04-01 40.51305
10 2016-07-01 39.91170
11 2016-10-01 40.15402
I then converted the Date column to a date as follows:
> EquifaxData$DATE <- as.Date(EquifaxData$DATE)
Now comes the issue. I want to convert this data to a time series. But I need to specify my start date as the beginning of Q2 in 2014. Not the beginning of 2014. As you can see below from what I have tried, the resulting time series shown by head has all the values shifted one quarter back because it is starting from the beginning of 2014.
> EquifaxTs <- ts(EquifaxData$EQFXSUBPRIME013045, start=2014, frequency = 4)
> head(EquifaxTs)
Qtr1 Qtr2 Qtr3 Qtr4
2014 42.58513 43.15483 43.55090 42.59218
2015 41.47105 41.53640
>
How can I define EquifaxTs to correctly start in Q2 2014 and still remain seasonal with a frequency of 4 per year?
I think this solves it:
EquifaxTs <- ts(EquifaxData$EQFXSUBPRIME013045, start = c(2014, 2), frequency = 4)
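As a small sketch of the same idea, the start quarter can also be derived from the first date instead of being typed by hand. The data frame is rebuilt from the values shown in the question; start_year and start_qtr are illustrative names.

```r
# Rebuild the data from the question (dates are quarter starts from 2014-04-01)
EquifaxData <- data.frame(
  DATE = seq(as.Date("2014-04-01"), by = "quarter", length.out = 11),
  EQFXSUBPRIME013045 = c(42.58513, 43.15483, 43.55090, 42.59218, 41.47105,
                         41.53640, 41.82020, 40.98760, 40.51305, 39.91170,
                         40.15402)
)

# Year and quarter of the first observation
first_date <- EquifaxData$DATE[1]
start_year <- as.integer(format(first_date, "%Y"))
start_qtr  <- (as.integer(format(first_date, "%m")) - 1) %/% 3 + 1

# start = c(year, quarter) places the first value in Q2 of 2014
EquifaxTs <- ts(EquifaxData$EQFXSUBPRIME013045,
                start = c(start_year, start_qtr), frequency = 4)
print(EquifaxTs)
```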

Formatting date column with different formats (including missing day information) - lubridate

I'm relatively new to R. I downloaded a clinical trial dataset and noticed that the dates in the relevant column come in mixed formats: most look like "September 1, 2012", but some are missing the day (e.g. "October 2015").
I want to express them all in the same way (e.g. yyyy-mm-dd) so I can work with them. That part went fine; the only remaining problem is the name of the output column. In the last function (date_correction) I planned to include an argument "output_col" to which I can pass the intended name of the created (formatted) column, but the column is always named output_col instead.
Do you know how I could handle this, i.e. pass the intended name of the output column right into the function?
Is there a better way to solve my problem?
-> I even tried a more complex orders argument for lubridate::parse_date_time like
parse_date_time(input_col, orders="mdy", "my")
but this didn't work.
Here's the code:
library("tidyverse")
library("lubridate")
Observation <- c(seq(1:5))
Date_original <- c("October 2014","August 2014","June 2013",
"June 24, 2010","January 2005")
df_dates <- data.frame(Observation, Date_original)
# looking for a comma in the cell
comma_detect <- function(a_string){
str_detect(a_string, ",")
}
# if comma: assume "mdy", if not apply "my" -> return formatted value
date_correction_row <- function(input_col){
if_else(comma_detect(input_col),
parse_date_time(input_col, orders="mdy"),
parse_date_time(input_col, orders="my"))
}
# prepare function for dataframe:
date_correction <- function(df, input_col, output_col){
mutate(df, output_col = date_correction_row(input_col))
}
df_dates %>% date_correction(df_dates$Date_original, date_formatted) %>% view()
OUTPUT
Observation Date_original output_col
1 1 October 2014 2014-10-01
2 2 August 2014 2014-08-01
3 3 June 2013 2013-06-01
4 4 June 24, 2010 2010-06-24
5 5 January 2005 2005-01-01
In the code below we assume that output_col equals "Date". All three approaches set the column name, give no warnings and return Date class.
1) Try each format and take the one that does not give NA. This uses only base R.
output_col <- "Date"
within(df_dates, assign(output_col, pmin(na.rm = TRUE,
as.Date(Date_original, "%B %d, %Y"),
as.Date(paste(Date_original, 1), "%B %Y %d"))))
## Observation Date_original Date
## 1 1 October 2014 2014-10-01
## 2 2 August 2014 2014-08-01
## 3 3 June 2013 2013-06-01
## 4 4 June 24, 2010 2010-06-24
## 5 5 January 2005 2005-01-01
2) This can also be done in lubridate. It is important that my is the first rather than the second argument to coalesce: my outputs NA for values that do not match its format, whereas mdy gives a wrong date, so if mdy were first coalesce would never get to my. This approach is shorter than (3), but you might prefer the robustness of (3) since it does not depend on what is returned for non-matching dates.
library(dplyr)
library(lubridate)
output_col <- "Date"
df_dates %>%
mutate(!!output_col := coalesce(my(Date_original, quiet = TRUE),
mdy(Date_original)))
## Observation Date_original Date
## 1 1 October 2014 2014-10-01
## 2 2 August 2014 2014-08-01
## 3 3 June 2013 2013-06-01
## 4 4 June 24, 2010 2010-06-24
## 5 5 January 2005 2005-01-01
3) If you prefer your own method of first checking for comma here is a variation of that which is more compact. It uses my and mdy instead of parse_date_time since my and mdy give Date class results which are more appropriate here than the POSIXct of parse_date_time given that there are no times.
library(dplyr)
library(lubridate)
output_col <- "Date"
df_dates %>%
mutate(!!output_col := if_else(grepl(",", Date_original),
mdy(Date_original), my(Date_original, quiet = TRUE)))
## Observation Date_original Date
## 1 1 October 2014 2014-10-01
## 2 2 August 2014 2014-08-01
## 3 3 June 2013 2013-06-01
## 4 4 June 24, 2010 2010-06-24
## 5 5 January 2005 2005-01-01
When the date structure is known, I like to explicitly correct the date structure first, then parse. Here I use regex to sub in 1 when the day is missing, then we just parse like normal.
library(tidyverse)
df_dates %>%
mutate(
output_col = gsub("(?<!,)\\s(?=\\d{4})", " 1, ", Date_original, perl = TRUE) %>%
as.Date(., format = '%B %d, %Y')
)
Observation Date_original output_col
1 1 October 2014 2014-10-01
2 2 August 2014 2014-08-01
3 3 June 2013 2013-06-01
4 4 June 24, 2010 2010-06-24
5 5 January 2005 2005-01-01
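To come back to the original question of passing the output column name into a function, here is a minimal base R sketch. It assumes column names are passed as strings and that month names are in English (the %B format is locale dependent); date_correction here is a hypothetical rewrite, not the asker's original function.

```r
df_dates <- data.frame(
  Observation = 1:5,
  Date_original = c("October 2014", "August 2014", "June 2013",
                    "June 24, 2010", "January 2005"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: input_col and output_col are column names as strings
date_correction <- function(df, input_col, output_col) {
  x <- df[[input_col]]
  # Try the full "Month day, year" format first; non-matching rows become NA
  parsed <- as.Date(x, format = "%B %d, %Y")
  # For rows without a day, append a dummy day 1 and parse "Month year day"
  missing_day <- is.na(parsed)
  parsed[missing_day] <- as.Date(paste(x[missing_day], 1), format = "%B %Y %d")
  # [[<- with a string creates a column with exactly that name
  df[[output_col]] <- parsed
  df
}

result <- date_correction(df_dates, "Date_original", "date_formatted")
print(result)
```

Using df[[output_col]] avoids the problem in the question, where mutate(df, output_col = ...) treats output_col as a literal column name.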

FRED data: aggregate quarterly data into annual

I need to convert quarterly data into yearly data by summing over the 4 quarters in each year. Searching stackoverflow.com, I found that using a function to sum over periods seems to work. However, the resulting date format did not match, so I couldn't combine the converted annual series with my other annual series.
For example, annual data in FRED looks as follows:
2009-01-01 12126.078
2010-01-01 12739.542
2011-01-01 13352.255
2012-01-01 14061.878
2013-01-01 14444.823
However, when I changed the data using the following function:
library("quantmod")
library(zoo)
library(mFilter)
library(nleqslv)
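fredsym <- c("PROPINC")
getSymbols(fredsym, src = "FRED") # download the quarterly series as an xts object named PROPINC
quarter.proprietors_income <- PROPINC
## convert to annual
as.year <- function(x) as.integer(as.yearqtr(x)) # map each date to its calendar year
annual.proprietors_income <- aggregate(quarter.proprietors_income, as.year, sum) # sum over the quarters within each year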
it changes from this:
2016-01-01 1327.613
2016-04-01 1339.493
2016-07-01 1346.067
2016-10-01 1354.560
2017-01-01 1380.221
2017-04-01 1378.637
2017-07-01 1381.911
2017-10-01 1403.114
to this:
2011 4574.669
2012 4965.486
2013 5138.968
2014 5263.208
2015 5275.225
2016 5367.733
2017 5543.883
What I need is annual data that keeps the original YYYY-MM-DD format, with each year dated 01-01. Otherwise it doesn't work with the other annual data.
Is there any way to solve this issue?
Using DF from the Note below, use cut as shown:
aggregate(DF["value"], list(year = as.Date(cut(as.Date(DF$Date), "year"))), sum)
giving:
year value
1 2016-01-01 5367.733
2 2017-01-01 5543.883
Note
Lines <- "Date value
2016-01-01 1327.613
2016-04-01 1339.493
2016-07-01 1346.067
2016-10-01 1354.560
2017-01-01 1380.221
2017-04-01 1378.637
2017-07-01 1381.911
2017-10-01 1403.114"
DF <- read.table(text = Lines, header = TRUE)
I found that the aggregate command changes the class to zoo, so the result is no longer an xts time series.
Alternatively, apply.yearly seems to work.
annual.proprietors_income <- apply.yearly(xts(quarter.proprietors_income),sum)
This is now an xts object. But the index shows the last quarter of each year, i.e. YYYY-10-01. How can I change it to YYYY-01-01?
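One way to get January 1 labels is to truncate each date to the start of its year before summing. This is only a base R sketch on the sample quarters from the Note above, not on the xts object itself; year_start is an illustrative name.

```r
# Sample quarterly data copied from the Note above
DF <- data.frame(
  Date = as.Date(c("2016-01-01", "2016-04-01", "2016-07-01", "2016-10-01",
                   "2017-01-01", "2017-04-01", "2017-07-01", "2017-10-01")),
  value = c(1327.613, 1339.493, 1346.067, 1354.560,
            1380.221, 1378.637, 1381.911, 1403.114)
)

# Replace month and day with 01-01, so every quarter maps to its year start
year_start <- as.Date(format(DF$Date, "%Y-01-01"))

# Sum the quarterly values within each year; the group label stays a Date
annual <- aggregate(DF["value"], list(year = year_start), sum)
print(annual)
```

The same truncated dates could in principle be assigned as the index of an aggregated zoo or xts result, but that step is not shown or tested here.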

Convert character month name to date time object

I must be missing something simple.
I have a data.frame of various date formats and I'm using lubridate which works great with everything except month names by themselves. I can't get the month names to convert to date time objects.
> head(dates)
From To
1 June August
2 January December
3 05/01/2013 10/30/2013
4 July November
5 06/17/2013 10/14/2013
6 05/04/2013 11/23/2013
Trying to change June into date time object:
> as_date(dates[1,1])
Error in charToDate(x) :
character string is not in a standard unambiguous format
> as_date("June")
Error in charToDate(x) :
character string is not in a standard unambiguous format
The actual year and day do not matter. I only need the month. zx8754 suggested using dummy day and year.
lubridate can handle converting the name or abbreviation of a month to its number when it's paired with the rest of the information needed to make a proper date, i.e. a day and year. For example:
lubridate::mdy("August/01/2013", "08/01/2013", "Aug/01/2013")
#> [1] "2013-08-01" "2013-08-01" "2013-08-01"
You can utilize that to write a function that appends "/01/2013" to any month names (I threw in abbreviations as well to be safe). Then apply that to all your date columns (dplyr::mutate_all is just one way to do that).
name_to_date <- function(x) {
lubridate::mdy(ifelse(x %in% c(month.name, month.abb), paste0(x, "/01/2013"), x))
}
dplyr::mutate_all(dates, name_to_date)
#> From To
#> 1 2013-06-01 2013-08-01
#> 2 2013-01-01 2013-12-01
#> 3 2013-05-01 2013-10-30
#> 4 2013-07-01 2013-11-01
#> 5 2013-06-17 2013-10-14
#> 6 2013-05-04 2013-11-23
The following is a crude example of how you could achieve that.
Given that dummy values are fine:
match(dates[1, 1], month.abb)
Given that we had Dec in dates[1, 1], the above would return:
12
To generate the returned value above along with dummy number in a date format, I tried:
tmp = paste(match(dates[1, 1], month.abb), "2013", sep="/")
which gives us:
12/2013
and then lastly:
result = paste("01", tmp, sep="/")
which returns:
01/12/2013
I am sure there are more flexible approaches than this; but this is just an idea, which I just tried.
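The steps above can be folded into one small function. This is a sketch under the same assumption that the day and year are dummies; it matches on the first three letters so that full names such as "June" also work, and month_to_date is a made-up name.

```r
# Hypothetical helper: month name or abbreviation -> Date with dummy day/year
month_to_date <- function(x, dummy_year = 2013, dummy_day = 1) {
  # Position of the first three letters in month.abb (NA if not a month name)
  idx <- match(substr(x, 1, 3), month.abb)
  # Build "YYYY-MM-DD"; an explicit format makes non-matching entries NA
  as.Date(sprintf("%d-%02d-%02d", dummy_year, idx, dummy_day),
          format = "%Y-%m-%d")
}

res <- month_to_date(c("June", "Dec", "05/01/2013"))
print(res)
```

Entries that are not month names, such as "05/01/2013", come back as NA and would still need to be parsed separately.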
Using a custom function:
# dummy data
df1 <- read.table(text = "
From To
1 June August
2 January December
3 05/01/2013 10/30/2013
4 July November
5 06/17/2013 10/14/2013
6 05/04/2013 11/23/2013", header = TRUE, as.is = TRUE)
# custom function
myFun <- function(x, dummyDay = "01", dummyYear = "2013"){
require(lubridate)
x <- ifelse(substr(x, 1, 3) %in% month.abb,
paste(match(substr(x, 1, 3), month.abb),
dummyDay,
dummyYear, sep = "/"), x)
#return date
mdy(x)
}
res <- data.frame(lapply(df1, myFun))
res
# From To
# 1 2013-06-01 2013-08-01
# 2 2013-01-01 2013-12-01
# 3 2013-05-01 2013-10-30
# 4 2013-07-01 2013-11-01
# 5 2013-06-17 2013-10-14
# 6 2013-05-04 2013-11-23

Combining date and time into a Date column for plotting

I want to create a line plot. I have 3 columns in my data frame:
date time numbers
01-02-2010 14:57 5
01-02-2010 23:23 7
02-02-2010 05:05 3
02-02-2010 10:23 11
How can I combine the first two columns and make a plot based on date and time ?
Date is Date class, time is just a char variable.
The lubridate package is another option. It handles most of the fussy formatting details, so it can be easier to use than base R date functions. For example, in your case, mdy_hm (month-day-year_hour-minute) will convert your date and time variables into a single POSIXct date-time column. (If you meant it to be day-month-year, rather than month-day-year, then just use dmy_hm.) See code below.
library(lubridate)
dat$date_time = mdy_hm(paste(dat$date, dat$time))
dat
date time numbers date_time
1 01-02-2010 14:57 5 2010-01-02 14:57:00
2 01-02-2010 23:23 7 2010-01-02 23:23:00
3 02-02-2010 05:05 3 2010-02-02 05:05:00
4 02-02-2010 10:23 11 2010-02-02 10:23:00
library(ggplot2)
library(scales) # provides date_breaks()
ggplot(dat, aes(date_time, numbers)) +
geom_point() + geom_line() +
scale_x_datetime(breaks=date_breaks("1 week"),
minor_breaks=date_breaks("1 day"))
Reconstruct your data:
dat <- read.table(text="
date time numbers
01-02-2010 14:57 5
01-02-2010 23:23 7
02-02-2010 05:05 3
02-02-2010 10:23 11", header=TRUE)
Now use as.POSIXct() and paste() to combine your date and time into a POSIX date. You need to specify the format, using the symbols defined in ?strptime. Also see ?DateTimeClasses for more information
dat$newdate <- with(dat, as.POSIXct(paste(date, time), format="%m-%d-%Y %H:%M"))
plot(numbers ~ newdate, data=dat, type="b", col="blue")
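Since a value like 01-02-2010 is ambiguous, it may be worth checking which format string matches your data. The sketch below parses the same strings both ways; the tz = "UTC" argument and the names as_mdy and as_dmy are assumptions added for illustration.

```r
dat <- data.frame(
  date    = c("01-02-2010", "01-02-2010", "02-02-2010", "02-02-2010"),
  time    = c("14:57", "23:23", "05:05", "10:23"),
  numbers = c(5, 7, 3, 11),
  stringsAsFactors = FALSE
)

stamp <- paste(dat$date, dat$time)

# Month-day-year reading: the first row becomes 2 January 2010
as_mdy <- as.POSIXct(stamp, format = "%m-%d-%Y %H:%M", tz = "UTC")
# Day-month-year reading: the first row becomes 1 February 2010
as_dmy <- as.POSIXct(stamp, format = "%d-%m-%Y %H:%M", tz = "UTC")

print(data.frame(stamp, as_mdy, as_dmy))
```

Only the order of %m and %d differs between the two calls, so picking the wrong one silently shifts the first two rows by about a month.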
