Order dates in ggplot by month - r

I have DF$Date in the as.Date format "yyyy-mm-dd" as shown below. Is there an easy way to get these grouped by month in ggplot?
Date
2015-07-30
2015-08-01
2015-08-02
2015-08-06
2015-08-11
2015-08-12
I've added a column DF$Month as "year Monthname" (e.g. April 2015.)
I'm doing this by DF$Month<-strftime(DF$Date,format="%B %Y")
Is there a quick way to factor the month/years so that they are ordinal?
I used a workaround by formatting using:
DF$Month<-strftime(DF$Date,format="%Y-%m") so that the larger numbers are first and subsequently the month number.
This gives the output, which is sortable:
DF$Month
"2015-07"
"2015-08"
This output allows me to get this grouping:
http://imgur.com/df1FI3s
When using this plot:
MonthlyActivity<-ggplot(DF,aes(x=Month, y=TotalSteps))+
geom_boxplot()
MonthlyActivity
Any alternatives so I can use the full month name and still be in the correct time order?

There are probably other solutions, but here is one with full month names as a factor. As you already found out, you need an x variable to group by. We can then treat it as a factor-ordering problem instead of a date-scale problem.
# first, generate some data: 1000 dates sampled with replacement from 1 Jan to 1 Aug 2015
dat <- data.frame(date=sample(seq(as.Date("01012015",format="%d%m%Y"),
as.Date("01082015", format="%d%m%Y"),by=1),1000,TRUE),
value=rnorm(1000))
We find the first day of the earliest and latest months, using some date arithmetic so that any start day works (otherwise February would get skipped when the minimum date falls on the 29th, 30th or 31st). I used lubridate for this.
library(lubridate)
min_month = min(dat$date)-day(min(dat$date))+1
max_month = max(dat$date)-day(max(dat$date))+1
We generate a grouping variable. It is a factor with labels like 'January 2015, March 2015'. However, we force the order by creating a sequence (by month) from min date to max date and formatting it in the same way.
dat$group <- factor(format(dat$date, "%B %Y"),
levels=format(seq(min_month, max_month,by="month"),
"%B %Y"))
This forces the ordering on the axis:
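Applied to the question's column names, the same idea can be sketched in base R as follows. The DF below is a made-up stand-in for the asker's data (the step counts are invented), and the ggplot2 call is guarded in case the package is not installed; treat it as an illustration of the approach rather than a definitive implementation.

```r
# Hypothetical stand-in for the asker's DF; the TotalSteps values are made up.
DF <- data.frame(
  Date = as.Date(c("2015-07-30", "2015-08-01", "2015-08-02",
                   "2015-08-06", "2015-09-11", "2015-10-12")),
  TotalSteps = c(8200, 10400, 9100, 7600, 11800, 9900)
)

# First day of the earliest and of the latest month in the data.
first_of_month <- as.Date(format(range(DF$Date), "%Y-%m-01"))

# Month labels in chronological order, used as the factor levels.
month_levels <- format(seq(first_of_month[1], first_of_month[2], by = "month"),
                       "%B %Y")

# Full month names as labels, ordered by time rather than alphabetically.
DF$Month <- factor(format(DF$Date, "%B %Y"), levels = month_levels)
levels(DF$Month)

# The plot call from the question then picks up the level order on the x axis.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  MonthlyActivity <- ggplot(DF, aes(x = Month, y = TotalSteps)) +
    geom_boxplot()
}
```

Note that the month names produced by "%B" depend on the session locale.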

Try adding
scale_x_discrete(limits = month.abb)
so your code would be
MonthlyActivity<-ggplot(DF,aes(x=Month, y=TotalSteps))+ geom_boxplot()+scale_x_discrete(limits = month.abb)
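month.abb is a built-in base R constant, so no package beyond ggplot2 is needed. Note that this only works if Month holds abbreviated month names ("Jan", "Feb", ...), not values such as "July 2015".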

Related

Converting yyyy numeric data to date format in R

I have a dataset, df with a column containing dates in yyyy format (ex: 2018). I’m trying to make a time series graph, and therefore need to convert them to a date format.
I initially tried, df$year <- as.Date(df$year) but was told I needed to specify an origin.
I then tried to convert to a character, then a date format:
df$year <- as.character(df$year)
df$year <- as.Date(df$year, format = "%Y")
This seems to have worked; however, it changed all the years to yyyy-mm-dd format and set the month and day to today's date, April 5th. For example, 2018 becomes 2018-04-05.
Does anyone have an idea for how to fix this? I would like it to start on January 1, not the day I am performing the conversion. I also tried strptime(as.character(beer_states$year), "%Y") with the same result.
Any help would be very much appreciated. Thanks!
Add an arbitrary date and month before converting to date.
df$Date <- as.Date(paste(df$year, 1, 1), '%Y %m %d')
We can use as.yearmon
library(zoo)
df$Date <- as.Date(as.yearmon(paste0(df$year, '-01'), '%Y-%m'))
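To make the cause visible: when the format is only "%Y", the month and day are unspecified and R fills them in from the current date. A small base-R sketch, assuming a made-up df with numeric years, shows both the behaviour and one way around it:

```r
# Hypothetical data: years stored as plain numbers, as in the question.
df <- data.frame(year = c(2016, 2017, 2018))

# Parsing with only "%Y" leaves month and day unspecified,
# so they are taken from the day the code is run.
today_filled <- as.Date(as.character(df$year), format = "%Y")

# Spelling out the month and day pins every value to January 1.
df$Date <- as.Date(paste0(df$year, "-01-01"))
format(df$Date, "%m-%d")
```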

How can I fix my x-axis, and plot my dates using GGPlot2 in R? [closed]

So I have a CSV file with two columns. Date (DD-MM-YYYY) and Gas Price (0.00). I also converted the date using as.Date. But when I attempted to plot it using ggplot, I kept getting this error that it can't work with numeric type or something.
Here is my source code:
gasoline <- read.csv(file.choose())
Date <- gasoline$Date
Price <- as.numeric(gasoline$Price)
str(Price)
ggplot(gasoline, aes(Date, Price)) + geom_line(colour="red")
Unfortunately, this code results in my graph having every date crammed down at the bottom so that it is barely legible:
(imgur.com/a/iitXw).
It's just a flat line. It also says, "geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?"
How can I plot dates such as these in ggplot?
It is hard to diagnose the problem or propose a solution without your code or data. However, the code below is a very simple way to create a basic plot from example data that matches your description of your data:
# attach the ggplot package
library(ggplot2)
# make example data that matches your description
# (where the dates are January, April, July, and October 1 of 2015)
dates <- c('01-01-2015', '01-04-2015', '01-07-2015', '01-10-2015')
prices <- rnorm(4, 3)
# now convert dates to date format
dates <- as.Date(dates, format='%d-%m-%Y')
# you should then be able to create a simple plot using qplot
qplot(dates, prices, geom='line', xlab='Date', ylab='Price',
main='Gas Prices Over Time')
Note the use of the format option in the as.Date() function. Since the dates are in DD-MM-YYYY format, while as.Date() assumes YYYY-MM-DD format, your dates would not convert correctly without that option:
dates <- c('01-01-2015', '01-04-2015', '01-07-2015', '01-10-2015')
dates <- as.Date(dates)
format(dates, format="%B %d %Y")
[1] "January 20 1" "April 20 1" "July 20 1" "October 20 1"
Since you had some problem with your dates being numeric, I suspect you did some data cleaning after converting them to dates; for example,
dates <- c('01-01-2015', '01-04-2015', '01-07-2015', '01-10-2015', NA)
dates <- as.Date(dates, format='%d-%m-%Y')
class(dates)
[1] "Date"
dates <- ifelse(is.na(dates), NA, dates)
class(dates)
[1] "numeric"
This is because Date objects are stored internally as numbers (days since 1970-01-01), and ifelse() returns that underlying numeric vector without the Date class (see this page at IDRE). However, that still should not throw an error when plotting; the axis labels should simply be numbers rather than dates.
prices <- c(rnorm(4, 3), NA)
qplot(dates, prices, geom='line', xlab='Date', ylab='Price',
main='Gas Prices Over Time')
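If a cleaning step like the ifelse() call above is unavoidable, there are a couple of ways to end up with a Date column again. The sketch below uses only base R; the fill-in value for the missing date is a made-up placeholder.

```r
dates <- as.Date(c("01-01-2015", "01-04-2015", NA), format = "%d-%m-%Y")

# ifelse() drops the Date class and returns the underlying day counts.
stripped <- ifelse(is.na(dates), NA, dates)
class(stripped)

# Option 1: convert the day counts back, counting from the Unix epoch.
restored <- as.Date(stripped, origin = "1970-01-01")
class(restored)

# Option 2: avoid ifelse() and assign into a subset, which keeps the class.
filled <- dates
filled[is.na(filled)] <- as.Date("2015-12-31")  # hypothetical fill value
class(filled)
```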
Hopefully this will get you started on a solution to your problem.
EDIT:
Having seen the first twelve rows of your csv file, I can give you the following solution:
library(ggplot2)
df <- read.csv('dataframe.csv') # change file name/path as appropriate
df$Date <- as.Date(df$Date, format='%m/%d/%Y')
ggplot(df, aes(x=Date, y=Price)) + geom_line(color="red")
For me this produced the following plot:
I believe the issue was that in the code you provided me, the date variable was not converted to a date object; it was a factor (as seen in the lower right of the screenshot you provided). There was only one observation per level of the factor, resulting in the error you saw. You can get rid of that error by adding group=1 to the aesthetic (i.e. aes(x=Date, y=Price, group=1)), but this does not accomplish exactly what you want.
The key is making sure you convert the variable to a Date object (being careful about the format, as mentioned above); then everything works out nicely.

Re-coding dates

I am trying to recode my time variable in my dataset. Currently, my dataset reflects data for all of December and I would like to re-code the dates so that there is a variable that includes week1, week2, week3, and week4.
My date is formatted as
december$DATE <- as.Date(december$DATE, "%m/%d/%Y")
This is my current attempt at re-coding, but to no avail:
december$week <- cut (december$DATE,
breaks = c(-Inf, 12/08/2016, 12/15/2016, 12/22/2016, Inf),
labels=c("W1", "W2", "W3", "W4"))
The traditional way of recoding continuous into categorical is not applicable in this case. Any suggestions?
The ISOweek package allows you to convert Date objects to ISO 8601 weeks. For example,
ISOweek::ISOweek('2016-12-08')
yields
[1] "2016-W49"
This could do it too:
# the intervals are closed on the left, so the last break must lie after 31 December
december$week <- cut(december$DATE,
breaks = as.Date(c("2016-12-01", "2016-12-08", "2016-12-15", "2016-12-22", "2017-01-01")),
labels=c("W1", "W2", "W3", "W4"))
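You could consider adding a column of week numbers based on the date, e.g.
december$week <- format(december$DATE, format="%U")
See ?strptime: %U gives the week of the year (00–53) with Sunday as the first day of the week, %W does the same with Monday as the first day, and %V gives the ISO 8601 week number (01–53). No extra package is needed, since format() for dates is part of base R.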
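If the goal is simply W1 to W4 within the month, split at the 8th, 15th and 22nd as in the question, the labels can also be derived from the day of the month. This is a base-R sketch on made-up dates; lumping days 29 to 31 into W4 is an assumption taken from the question's open-ended last interval.

```r
# Hypothetical sample of December dates.
dec_dates <- as.Date(c("2016-12-01", "2016-12-07", "2016-12-08",
                       "2016-12-21", "2016-12-22", "2016-12-31"))

# Day of the month as an integer (1 to 31).
day_of_month <- as.integer(format(dec_dates, "%d"))

# Days 1-7 -> 1, 8-14 -> 2, 15-21 -> 3, 22-28 -> 4; cap days 29-31 at 4.
week_index <- pmin((day_of_month - 1) %/% 7 + 1, 4)

week <- factor(paste0("W", week_index), levels = c("W1", "W2", "W3", "W4"))
week
```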

R aggregate by %Y-%b as date

I have a data frame item_sold_time with a item_sold_time$SaleDate "POSIXct" "POSIXt" column, looking like this 2015-04-28 07:59:43.
I want to aggregate by month in year:
df2<-aggregate(list(Qty=item_sold_time$Qty), by=list(ISBN=item_sold_time$ISBN,DateYM=strftime(item_sold_time$SaleDate, format="%Y-%b")), FUN=sum)
The problem with strftime is that it converts the date into characters and I get an error if I try to convert it back into date.
I tried all combinations of date formatting, I could find in 2 days of searching. The final destination of that date is to be used in this plot:
ggplot(df2, aes(x = DateYM, y=Qty))
Please help. Thanks
Set each date to (e.g.) the 15th of its month; then you can aggregate by that date and plot as normal, because the column stays a Date.
as.Date(format(item_sold_time$SaleDate, "%Y-%m-15"))
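Put together with the aggregate() call from the question, this might look like the sketch below. The rows of item_sold_time are invented for illustration, and the timestamps are assumed to be in UTC.

```r
# Hypothetical sales data with a POSIXct timestamp column.
item_sold_time <- data.frame(
  ISBN = c("A", "A", "A", "B"),
  SaleDate = as.POSIXct(c("2015-04-28 07:59:43", "2015-04-30 10:00:00",
                          "2015-05-02 09:00:00", "2015-04-15 12:00:00"),
                        tz = "UTC"),
  Qty = c(1, 2, 3, 4)
)

# Collapse every timestamp to the 15th of its month, stored as a Date.
item_sold_time$DateYM <- as.Date(format(item_sold_time$SaleDate, "%Y-%m-15"))

# Sum quantities per ISBN and month; DateYM remains a Date in the result.
df2 <- aggregate(Qty ~ ISBN + DateYM, data = item_sold_time, FUN = sum)
df2
```

Because DateYM is a Date, ggplot(df2, aes(x = DateYM, y = Qty)) gets a continuous date axis, and the labels can be formatted with scale_x_date(date_labels = "%Y-%b").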

R: sorting a data frame by two columns "Year" and "Month"

I have a data frame DF which has a column Month as a character string using the full English name of the month, and a column Year as numeric:
Year Month {several xi}
2016 April {numeric}
I need to plot several of the xi as a time series. What is the most efficient way to sort this data frame from the earliest month (January 2015) to the present? My attempts to convert "month" into a date-classed object using as.Date are not working as I'd like; they keep coming back sorted alphabetically.
Apologies if this is a noob question, but by sheer bad luck I have not had to work with date-class objects very often in my R career, so I'm not sure which of the various similar questions I am seeing can help me.
I concur with Gregor's suggestion of using the zoo package. I think it is good practice to combine dates into one variable. If you ever need to extract information about only the year or month you can use the lubridate package. Here is a simple example of how to use zoo.
library(zoo)
#Toy Data Set
d <- data.frame( Month = c("March", "April", "May", "March"), Year = c("2008", "1998", "1997", "1999"), stringsAsFactors = FALSE)
#Generating Yearmon
d$my <- as.yearmon(paste(d$Month, d$Year))
#Ordering the data
d <- d[order(d$my), ]
Make sure that the month and year variables in your data frame are not factors. They must respectively be of a character and numeric/integer class.
One note, if you plan to use ggplot instead of plot then you'll need to use scale_x_yearmon().
Finally, you mention that you had trouble with as.Date. As Gregor notes, this is because as.Date expects a format which contains a day, month and year. Therefore in your case you can insert an arbitrary day to use as.Date. For example, as.Date(paste(d$Month, 1, d$Year), "%B %d %Y"). For a complete list of the different date formats read this.
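If only the sort order is needed and adding a package is not desirable, a base-R sketch is to look up each month name's position in the built-in month.name constant and order by year, then by that position. The toy data mirror the example above, with Year kept numeric as in the question.

```r
# Toy data: full English month names and numeric years.
d <- data.frame(Month = c("March", "April", "May", "March"),
                Year = c(2008, 1998, 1997, 1999),
                stringsAsFactors = FALSE)

# Position of each month in the calendar (January = 1, ..., December = 12).
month_number <- match(d$Month, month.name)

# Order by year first, then by month within the year.
d <- d[order(d$Year, month_number), ]
d
```

month.name always holds the English names, so this does not depend on the session locale, unlike parsing with "%B".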

Resources