Change order of dates in R ggplot? - r

I have been working on a plot in R using ggplot and plotting dates on the x axis. I have noticed that R does not recognize them as dates, and so the order on the x axis is wrong. I have tried many different things such as using as.Date(), manually editing levels and ordering the x axis, but nothing has worked. Here's my code:
library(dplyr)
library(ggplot2)
library(hrbrthemes)
calories_data = read.csv('dailyCalories_clean.csv',header = TRUE, sep=",")
ggplot(calories_data, aes(x= ActivityDay, y=Calories, group=Id, color = Id))+
geom_line()
Here's the plot
I appreciate any help, I'm new at this and have been researching for hours with no success. Thank you!

One option to fix your issue would be to convert your dates to proper dates to fix the order and use the date_labels argument of scale_x_date to format your dates. To convert to dates you have to add a fake year to your ActivityDay, e.g. "2022":
Using some fake random data to mimic your real data:
library(ggplot2)
set.seed(123)
calories_data <- data.frame(
ActivityDay = rep(c("4/1", "4/10", "5/11", "5/1"), 3),
Id = rep(1:3, each = 4),
Calories = runif(12, 1000, 3000)
)
calories_data$ActivityDay <- as.Date(paste("2022", calories_data$ActivityDay, sep = "/"), format = "%Y/%m/%d")
ggplot(calories_data, aes(x= ActivityDay, y=Calories, group=Id, color = Id))+
geom_line() +
scale_x_date(date_breaks = "5 day", date_labels = "%m/%d")

Related

ggplot2, x-axis not recognizing dates

Trying to plot the following data frame (call it bob):
Since the original date is in d/m/y, I use Finaldate and Value to graph.
Here is the code used to graph:
ggplot(Bob, aes(Finaldate, Value)) +geom_line() + geom_point(size = 3) +
labs(title = "TITLE",subtitle = "SUBTITLE", y = "Y", x = "X") +
theme_fivethirtyeight()+scale_y_continuous(name="name", labels = scales::comma)+theme(legend.title = element_blank())+scale_x_discrete(guide = guide_axis(check.overlap = TRUE))
While I do get an output, it is not as a time series but rather the dates are not in order and the plot makes no sense. Attached a copy of the plot as well.
Not sure how to fix this problem, and have tried a couple of different things
Have you tried using
+ scale_x_date(date_labels = "%d %m %Y") (ggplot2)
https://r-graph-gallery.com/279-plotting-time-series-with-ggplot2.html
You need to convert Finaldate to a date -- it is being treated as a character so all the dates are in "alphabetical" order. Try:
Bob$Finaldate <- as.Date(Bob$Finaldate, format = "%m/%d/%Y")
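To see why the conversion matters, here is a minimal sketch with made-up values (they are not taken from the question's data, and the month/day/year layout is an assumption). As character strings the values sort text-wise; once parsed with as.Date() they sort chronologically:

```r
# Hypothetical sample values in month/day/year layout
raw <- c("4/1/2022", "4/10/2022", "4/2/2022", "5/1/2022")

# As character strings the values are compared text-wise,
# so "4/10/2022" lands before "4/2/2022"
sorted_raw <- sort(raw)
print(sorted_raw)

# Once parsed as Date the values are compared chronologically
parsed <- as.Date(raw, format = "%m/%d/%Y")
sorted_dates <- sort(parsed)
print(sorted_dates)
```

ggplot2 treats a character column as a discrete axis, which is why the plot in the question ends up in the text-wise order.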

How to unclutter the x-axis in a plot

Using the R programming language, I create some time series data (daily measurements, over a period of 20 years). I aggregated this data at monthly time periods and then produced a graph:
library(ggplot2)
library(xts)
library(scales)
set.seed(123)
day = seq(as.Date("2000/1/1"), as.Date("2020/1/1"),by="day")
day <- format(as.Date(day), "%Y/%m/%d")
amount <- rnorm(7306 ,100,10)
data <- data.frame(day, amount)
y.mon<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data, FUN=sum)
y.mon$d = y.mon$`format(as.Date(day), format = "%Y/%m")`
ggplot(y.mon, aes(x = d, y=amount))+
geom_line(aes(group=1))
Right now, the x-axis is completely unreadable. Is there a way to "unclutter" the x-axis? Perhaps "slant" the dates or show the dates at intervals of 4 month periods? I can completely delete the x-axis but ideally I would like to keep it there for reference.
At the end of the graph, there is a huge downwards "spike". I think this is because the data is aggregated every month - and since the last day the data is available at is "Jan-01-2020", this causes the "downwards spike". Is it possible to "query" the "y.mon" object so that the graph is made only until the last "complete" time period? This "spike" is deceiving, someone might look at the graph and think a big anomaly happened in Jan-2020, but it's actually because there is only 1 measurement at this time.
Thanks
You can also try:
library(ggplot2)
library(xts)
library(scales)
set.seed(123)
#Data
day = seq(as.Date("2000/1/1"), as.Date("2020/1/1"),by="day")
amount <- rnorm(7306 ,100,10)
data <- data.frame(day, amount)
#Aggregate
y.mon<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data, FUN=sum)
#Count days
y.mon2<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data,
FUN=function(x) length(x))
names(y.mon2)[2]<-'N'
#Format and merge to add N
y.mon$d = y.mon$`format(as.Date(day), format = "%Y/%m")`
mmon <- merge(y.mon,y.mon2)
#Add a dummy date
mmon$d <- as.Date(paste0(mmon$d,'/01'),'%Y/%m/%d')
#Plot
ggplot(subset(mmon,N!=1), aes(x = d, y=amount))+
geom_line(aes(group=1))+
scale_x_date(date_breaks = '4 month',date_labels = '%Y-%m',
expand = c(0,0))+
theme(axis.text.x = element_text(angle = 90))
Update: the same code as before, changing only the axis breaks and labels:
#Plot Update
ggplot(subset(mmon,N!=1), aes(x = d, y=amount))+
geom_line(aes(group=1))+
scale_x_date(date_breaks = '12 month',date_labels = '%Y',
expand = c(0,0))+
theme(axis.text.x = element_text(angle = 90))

Frequency count of dates

I have a data frame with a column of dates. I am trying to get a frequency count of each date. I was thinking that a histogram would visualize the data nicely, but maybe there is a better way? I was able to create a histogram of the data, but it is not exactly what I was looking for. I was hoping to get each individual date on the x-axis and the frequency count on the y-axis.
I have done some programming in R but I have not done much visualizations in R. Any help would be greatly appreciated.
RawDates<- c("11/8/2017","12/6/2017","10/6/2017","12/6/2017","1/24/2018","9/5/2017","1/24/2018","2/21/2018","10/12/2017","1/22/2018","5/2/2018","1/24/2018","10/12/2017","1/22/2018","2/21/2018","5/2/2018","3/12/2018","5/3/2018","11/7/2017","12/5/2017","9/8/2017","10/6/2017","10/5/2017","11/3/2017","12/6/2017","2/21/2018","11/2/2017","12/5/2017","5/2/2018","1/24/2018","9/6/2017","11/2/2017","2/21/2018","5/2/2018","1/24/2018","11/8/2017","3/12/2018","5/3/2018","1/24/2018")
FormattedDates <- as.Date(RawDates, format = "%m/%d/%Y")
df <- data.frame(FormattedDates)
## This is what I have already tried
hist(df$FormattedDates, "days", format = "%m/%d/%Y")
Here is a simple ggplot2 solution:
library(ggplot2)
library(scales)
ggplot(df) +
geom_histogram(aes(x = FormattedDates)) +
scale_x_date(labels = date_format("%m %d %Y"), date_breaks = "30 days") +
theme(legend.position = "bottom",
axis.text.x = element_text(angle = 45, hjust = 1))
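If you want one bar per individual date rather than histogram bins, one possible approach is to count the dates first and plot the counts. This is only a sketch: it uses a shortened, hypothetical subset of the dates from the question, and the names counts, Date and n are made up for the example.

```r
library(ggplot2)

# Hypothetical short subset of the dates from the question
RawDates <- c("11/8/2017", "12/6/2017", "10/6/2017", "12/6/2017",
              "1/24/2018", "1/24/2018", "1/24/2018")
dates <- as.Date(RawDates, format = "%m/%d/%Y")

# table() tallies each distinct date; the resulting factor levels
# are ISO date strings in chronological order
counts <- as.data.frame(table(Date = dates), responseName = "n")
print(counts)

# One bar per distinct date, bar height = frequency
p <- ggplot(counts, aes(x = Date, y = n)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p
```

Because Date is a factor here, every date that occurs gets its own tick and the gaps between dates are not drawn to scale. Convert it back with as.Date(as.character(counts$Date)) if you want a true time axis.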

Change x-axis names in ggplot

I am not very good in R, and need some help.
My ggplot has a lot of dates on the x-axis, so you can't actually read them, and I want to change the labels to months to give a better overview of the plot.
For example to something like this in the link:
Display the x-axis on ggplot as month only in R
This is the script I'm using:
r <- read.csv("xxdive.csv", header = T, sep = ";")
names(r) <- c("Date", "Number")
r <- data.frame(r)
r$Date <- factor(r$Date, ordered = T)
r[1:2, ]
Date Number
16.02.2015 97
17.02.2015 47
library(tidyverse)
ggplot(r, aes(Date, Number)) +
theme_light() +
ggtitle("16.02.15-10.02.16") +
ylab("Dives") +
geom_line(aes(group = 1), color = "blue")
This shows what kind of data I have.
I have tried using scale etc, but I can't make it work..
I hope this was understandable, and that someone can help me!! :)
I would convert column Date to data type Date
r$Date <- as.Date(r$Date, "%d.%m.%Y");
instead of converting it to data type factor.
r$Date <- factor(r$Date, ordered = T);
It's a little tricky without a working example, but try this.
install.packages("tidyverse")
library(tidyverse)
r <- read_delim("xxdive.csv", ";", col_types = list(col_date(format = "%d.%m.%Y"), col_integer()))
names(r) <- c("Date", "Number")
ggplot(r, aes(Date, Number)) +
geom_line(aes(group = 1), color = "blue") +
scale_x_date(date_breaks = "1 month") +
ylab("Dives") +
ggtitle("16.02.15-10.02.16") +
theme_light()
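Since the question asks for month labels specifically, here is a small self-contained sketch of that part. The data values are made up (only the dd.mm.yyyy layout is taken from the question), and date_labels = "%b" is one possible label format, not the only one:

```r
library(ggplot2)

# Hypothetical data in the same dd.mm.yyyy layout as the question
r <- data.frame(
  Date = c("16.02.2015", "17.03.2015", "20.04.2015", "05.05.2015"),
  Number = c(97, 47, 60, 82)
)

# Parse the strings into real dates instead of a factor
r$Date <- as.Date(r$Date, format = "%d.%m.%Y")

# One tick per month, labelled with the abbreviated month name
p <- ggplot(r, aes(Date, Number)) +
  geom_line(color = "blue") +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  ylab("Dives") +
  theme_light()
p
```

With data that spans more than one year, a format such as "%b %Y" probably reads better, because "%b" alone repeats the same month names.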

ggplot2: plotting time series data by month & week

I'm trying to plot time series data by week and month; ideally, I think, I'd like to use boxplots to visualise daily data binned by week. While I can change the labels and gridlines on the x-axis using scale_x_date, that won't affect the points in the plot.
Here's a demonstration of the problem and my current (clumsy) solution.
library(zoo)
library(ggplot2)
d = as.Date(c(as.Date("2007-06-01"):as.Date("2008-05-31"))) # using zoo to reformat numeric
x = runif(366, min = 0, max = 100)
df = data.frame(d,x)
# PROBLEM #
p = ggplot(df, aes(d, x))
p + geom_point()
p + geom_boxplot() # more or less useless
# CURRENT FIX #
df$Year.Month <- format(df$d, "%Y-%m")
p = ggplot(df, aes(Year.Month, x))
p + geom_point(alpha = 0.75)
p + geom_boxplot() # where I'm trying to get to...
I feel certain that there's a more elegant way to do this from within ggplot. Am I right?
@shadow's answer below is much neater. But is there a way to do this using binning? Using stats in some form, perhaps?
You can treat Dates as dates in R, and use scale_x_date() in ggplot to get the x-labels you want.
Also, I find it easier to just create a new variable-factor called "Month" to group the boxplots by month. In this case I used lubridate to accomplish the task.
If you do not want to go through the trouble of creating a new variable "Month", your boxplot will be plotted on the 15th of the month, making the plot a bit more difficult to read.
library(magrittr)
library(lubridate)
library(dplyr)
df %>%
mutate(Date2 = as.Date(paste0("2000-", month(d), "-", "01"))) %>%
mutate(Month = lubridate::month(d)) %>%
ggplot(aes(Date2, x, group=Month)) +
geom_boxplot() +
scale_x_date(date_breaks="1 month", date_labels = "%b")
If you do not create the variable "Month", the boxplots won't align nicely with the x tick marks.
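For the weekly binning the question also asks about, one possible sketch uses base R's cut() on the Date column. The data is regenerated here so the example runs on its own, and the column name Week is made up for the illustration:

```r
library(ggplot2)

set.seed(1)
# Same date range as the question, rebuilt so the example is self-contained
df <- data.frame(d = seq(as.Date("2007-06-01"), as.Date("2008-05-31"), by = "day"))
df$x <- runif(nrow(df), min = 0, max = 100)

# cut() with breaks = "week" labels each day with the Monday that starts
# its week; as.Date() turns those labels back into dates
df$Week <- as.Date(cut(df$d, breaks = "week"))

# One boxplot per week, still on a continuous date axis
p <- ggplot(df, aes(x = Week, y = x, group = Week)) +
  geom_boxplot() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y")
p
```

Keeping Week as a Date (rather than a character label) means scale_x_date() still works, so the tick labels can be set independently of the bin width.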
