I have a data frame with a column of dates. I am trying to get a frequency count of each date. I was thinking that a histogram would visualize the data nicely, but maybe there is a better way? I was able to create a histogram of the data, but it is not exactly what I was looking for. I was hoping to get each individual date on the x-axis and the frequency count on the y-axis.
I have done some programming in R but I have not done much visualization in R. Any help would be greatly appreciated.
RawDates<- c("11/8/2017","12/6/2017","10/6/2017","12/6/2017","1/24/2018","9/5/2017","1/24/2018","2/21/2018","10/12/2017","1/22/2018","5/2/2018","1/24/2018","10/12/2017","1/22/2018","2/21/2018","5/2/2018","3/12/2018","5/3/2018","11/7/2017","12/5/2017","9/8/2017","10/6/2017","10/5/2017","11/3/2017","12/6/2017","2/21/2018","11/2/2017","12/5/2017","5/2/2018","1/24/2018","9/6/2017","11/2/2017","2/21/2018","5/2/2018","1/24/2018","11/8/2017","3/12/2018","5/3/2018","1/24/2018")
FormattedDates <- as.Date(RawDates, format = "%m/%d/%Y")
df <- data.frame(FormattedDates)
## This is what I have already tried
hist(df$FormattedDates, "days", format = "%m/%d/%Y")
Here is a simple ggplot2 solution:
library(ggplot2)
library(scales)
ggplot(df) +
geom_histogram(aes(x = FormattedDates)) +
scale_x_date(labels = date_format("%m %d %Y"), date_breaks = "30 days") +
theme(legend.position = "bottom",
axis.text.x = element_text(angle = 45, hjust = 1))
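If the goal is one bar per distinct date rather than binned ranges, counting first and plotting the counts is another option. The following is a minimal sketch using a shortened copy of the question's dates; the names `counts`, `date`, `n`, `lab` and `label` are made up for illustration, and the plot is only drawn if ggplot2 is installed.

```r
# A shortened version of the dates from the question
RawDates <- c("11/8/2017", "12/6/2017", "12/6/2017",
              "1/24/2018", "1/24/2018", "1/24/2018")
FormattedDates <- as.Date(RawDates, format = "%m/%d/%Y")

# table() counts each distinct date; the result is in chronological order
counts <- as.data.frame(table(FormattedDates), stringsAsFactors = FALSE)
names(counts) <- c("date", "n")
counts$date <- as.Date(counts$date)

# A factor with chronologically ordered levels puts every date on the x-axis
lab <- format(counts$date, "%m/%d/%Y")
counts$label <- factor(lab, levels = lab)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(x = label, y = n)) +
    geom_col() +
    labs(x = "Date", y = "Frequency") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
  print(p)
}
```

Because the x-axis is discrete here, dates with no observations do not appear at all, which may or may not be what you want.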
I have been working on a plot in R using ggplot and plotting dates on the x axis. I have noticed that R does not recognize them as dates, and so the order on the x axis is wrong. I have tried many different things such as using as.Date(), manually editing levels and ordering the x axis, but nothing has worked. Here's my code:
library(dplyr)
library(ggplot2)
library(hrbrthemes)
calories_data = read.csv('dailyCalories_clean.csv',header = TRUE, sep=",")
ggplot(calories_data, aes(x= ActivityDay, y=Calories, group=Id, color = Id))+
geom_line()
Here's the plot
I appreciate any help, I'm new at this and have been researching for hours with no success. Thank you!
One option to fix your issue would be to convert your dates to proper dates to fix the order and use the date_labels argument of scale_x_date to format your dates. To convert to dates you have to add a fake year to your ActivityDay, e.g. "2022":
Using some fake random data to mimic your real data:
library(ggplot2)
set.seed(123)
calories_data <- data.frame(
ActivityDay = rep(c("4/1", "4/10", "5/11", "5/1"), 3),
Id = rep(1:3, each = 4),
Calories = runif(12, 1000, 3000)
)
calories_data$ActivityDay <- as.Date(paste("2022", calories_data$ActivityDay, sep = "/"), format = "%Y/%m/%d")
ggplot(calories_data, aes(x= ActivityDay, y=Calories, group=Id, color = Id))+
geom_line() +
scale_x_date(date_breaks = "5 day", date_labels = "%m/%d")
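To see why the conversion matters: dates stored as text are ordered character by character, not chronologically. A small sketch, assuming (hypothetically) that the real column holds full dates in month/day/year form; the sample values are invented:

```r
# Hypothetical date strings in month/day/year form
x <- c("4/9/2016", "4/10/2016", "5/1/2016")

# As text, "4/10/2016" sorts before "4/9/2016" because "1" < "9"
sorted_text <- sort(x)

# As Date objects, the order is chronological
d <- as.Date(x, format = "%m/%d/%Y")
sorted_dates <- sort(d)

print(sorted_text)
print(format(sorted_dates, "%m/%d"))
```

If the column already contains a year, there is no need to paste a fake one on; only the `format` string passed to `as.Date()` has to match the data.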
I have been trying to convert POSIXct format so that my date and times would reflect Julian dates.
ind$DateAndTime <- as.POSIXct(ind$DateAndTime, tz = "UTC",
origin = '1970-01-01')
ind$DateAndTime<- format(as.POSIXct(ind_steps$t2),"%y%j")
I had used these two lines of code to do so, but I am now having trouble plotting them using ggplot.
plot_list[[i]] <- ggplot(ind, aes(x = DateAndTime, y = NSD)) +
geom_line() + theme_bw() +
ggtitle(random_tables[i]) +
theme(axis.text.x = element_text(angle = 90))
When I plot it I get this, where the julian dates are vertical, but they still overlap. I would like to get the graph to show the julian dates more visibly and to show every other julian date so that it isn't so cramped in the x-axis. Is there a way to do this?
Here is the completed code. Without any sample data, it is difficult to provide an exact example.
From your previous question, your issue may be related to attempting to pass a datetime object to a function expecting a date object. In this case I used as.Date() and scale_x_date(); in your case you may want to use as.POSIXct() and scale_x_datetime().
library(ggplot2)
#create dummy data
DateAndTime = 18000:18300
NSD = DateAndTime/10000
ind <-data.frame(DateAndTime, NSD)
#convert the DateAndTime column into a date object
ind$DateAndTime <- as.Date(ind$DateAndTime, tz = "UTC",
origin = '1970-01-01')
#Plot table and format x-axis
ggplot(ind, aes(x = DateAndTime, y = NSD)) +
geom_line() + theme_bw() +
ggtitle("Demo Title") +
scale_x_date(date_breaks = "1 month", date_labels = "%y-%j") +
theme(axis.text.x = element_text(angle = 90))
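The reason to keep the column as a Date and format only the labels: `format(..., "%y%j")` returns character strings, which ggplot2 treats as a discrete axis with one tick per value. A small base R sketch of what a `%y%j` label looks like and how breaks for every other day could be built; `iso`, `jul` and `every_other` are made-up names:

```r
d <- as.Date(18000, origin = "1970-01-01")
iso <- format(d, "%Y-%m-%d")   # calendar form of the same day
jul <- format(d, "%y%j")       # two-digit year followed by day of year

# The formatted value is text, no longer a date
is_text <- is.character(jul)

# Breaks at every second day over a ten-day window
every_other <- seq(d, d + 10, by = "2 days")
print(format(every_other, "%y%j"))
```

A vector like `every_other` can be passed as `breaks` to `scale_x_date()`, or the same spacing can be requested with `date_breaks = "2 days"`.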
Using the R programming language, I created some time series data (daily measurements, over a period of 20 years). I aggregated this data at monthly time periods and then produced a graph:
library(ggplot2)
library(xts)
library(scales)
set.seed(123)
day = seq(as.Date("2000/1/1"), as.Date("2020/1/1"),by="day")
day <- format(as.Date(day), "%Y/%m/%d")
amount <- rnorm(7306 ,100,10)
data <- data.frame(day, amount)
y.mon<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data, FUN=sum)
y.mon$d = y.mon$`format(as.Date(day), format = "%Y/%m")`
ggplot(y.mon, aes(x = d, y=amount))+
geom_line(aes(group=1))
Right now, the x-axis is completely unreadable. Is there a way to "unclutter" the x-axis? Perhaps "slant" the dates or show the dates at intervals of 4 month periods? I can completely delete the x-axis but ideally I would like to keep it there for reference.
At the end of the graph, there is a huge downwards "spike". I think this is because the data is aggregated every month - and since the last day the data is available at is "Jan-01-2020", this causes the "downwards spike". Is it possible to "query" the "y.mon" object so that the graph is made only until the last "complete" time period? This "spike" is deceiving, someone might look at the graph and think a big anomaly happened in Jan-2020, but it's actually because there is only 1 measurement at this time.
Thanks
You can also try:
library(ggplot2)
library(xts)
library(scales)
set.seed(123)
#Data
day = seq(as.Date("2000/1/1"), as.Date("2020/1/1"),by="day")
amount <- rnorm(7306 ,100,10)
data <- data.frame(day, amount)
#Aggregate
y.mon<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data, FUN=sum)
#Count days
y.mon2<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data,
FUN=function(x) length(x))
names(y.mon2)[2]<-'N'
#Format and merge to add N
y.mon$d = y.mon$`format(as.Date(day), format = "%Y/%m")`
mmon <- merge(y.mon,y.mon2)
#Add a dummy date
mmon$d <- as.Date(paste0(mmon$d,'/01'),'%Y/%m/%d')
#Plot
ggplot(subset(mmon,N!=1), aes(x = d, y=amount))+
geom_line(aes(group=1))+
scale_x_date(date_breaks = '4 month',date_labels = '%Y-%m',
expand = c(0,0))+
theme(axis.text.x = element_text(angle = 90))
Update: Using previous code and only changing for labels:
#Plot Update
ggplot(subset(mmon,N!=1), aes(x = d, y=amount))+
geom_line(aes(group=1))+
scale_x_date(date_breaks = '12 month',date_labels = '%Y',
expand = c(0,0))+
theme(axis.text.x = element_text(angle = 90))
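Note that filtering on `N!=1` only works here because the trailing month happens to contain exactly one day. A slightly more general sketch, under the assumption that no month is missing entirely from the data, compares the number of observed days with the calendar length of each month; the names `monthly`, `n_days`, `expected` and `complete` are made up for illustration:

```r
set.seed(123)
day <- seq(as.Date("2000-01-01"), as.Date("2020-01-01"), by = "day")
data <- data.frame(day = day, amount = rnorm(length(day), 100, 10))

# Label every row with its year-month (sorts chronologically as text)
data$ym <- format(data$day, "%Y-%m")

# Monthly totals and number of observed days per month
monthly <- aggregate(amount ~ ym, data = data, FUN = sum)
monthly$n_days <- aggregate(amount ~ ym, data = data, FUN = length)$amount
monthly$month <- as.Date(paste0(monthly$ym, "-01"))

# Expected length of each month: gap between consecutive month starts
# (assumes there are no entirely missing months in the data)
starts <- seq(min(monthly$month), by = "month", length.out = nrow(monthly) + 1)
monthly$expected <- as.integer(diff(starts))

# Keep only months in which every day was observed
complete <- monthly[monthly$n_days == monthly$expected, ]
```

`complete` can then be plotted in place of `subset(mmon,N!=1)`, with `month` on the x-axis.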
I am trying to use ggplot to plot a grouped barplot. I noticed that the date variable is considered as factor and while plotting R changes its order randomly.
I tried converting the date using as.Date(X, format="%m/%d/%Y"), but that did not help.
Here is my code,
A<-data.frame(A)
A$Date<-as.Date(A$Date, format = "%m/%d/%Y")
Name=XYZ
B<-A[c(1,2,5,6)]
C<-melt(B,id.vars='Date')
C$Date<-as.Date(C$Date, format = "%m/%d/%Y")
ggplot(C,aes(x=Date,y=value))+
geom_bar(aes(fill = variable), position = "dodge", stat="identity")+
scale_fill_brewer(palette="Set1")+
labs(x='Date',y='%Return')+ggtitle('Model')+
theme(plot.title = element_text(hjust = 0.5))+
theme(axis.text.x = element_text(angle = 90, hjust = 1, size=7,color="Black"))+
theme(legend.position = 'bottom')+
theme(legend.text=element_text(size=8))+coord_cartesian( ylim=c(-6, 6))
Before converting Date to the proper date format I got the below output but without the order of Date.
After Date conversion, the output changes drastically:
Not sure what I am missing out on.
Thanks in advance!
I'm attempting to use ggplot and R for analysing some epidemiologic data, and I'm continuing to struggle with getting an epidemic curve to appear properly.
Data is here
attach(epicurve)
head(epicurve)
onset age
1 21/12/2012 18
2 14/06/2013 8
3 10/06/2013 64
4 28/05/2013 79
5 14/04/2013 56
6 9/04/2013 66
epicurve$onset <- as.Date(epicurve$onset, format="%d/%m/%Y")
ggplot(epicurve, aes(onset)) + geom_histogram() + scale_x_date(breaks=date_breaks("1 year"), minor_breaks=date_breaks("1 month"), labels = date_format("%b-%Y"))
gives this graph. This is fine, but the binwidths are not related to any time period of note, and adjusting them is a bit trial and error.
For this particular dataset, I'd like to display the cases by month of onset.
One way I worked out how to do this is:
epicurve$monyr <- format(epicurve$onset, "%b-%Y")
epicurve$monyr <- as.factor(epicurve$monyr)
ggplot(epicurve, aes(monyr)) + geom_histogram()
Outputs a graph I can't post because of the reputation system. The bars represent something meaningful, but the axis labels are a bomb-site. I can't format the axes using scale_x_date because they aren't dates and I can't work out what arguments to pass to scale_x_discrete to give useful labels.
I have a feeling there should be an easier way to do this by doing an operation on the onset column. Can anyone give me any pointers, please?
One option is to aggregate the data outside ggplot and then use geom_bar. This will produce counts by month.
edited Sept. 21 2013. Altered plot to show months with no counts.
epicurve <- read.csv("epicurve.csv", sep=",", header=T)
# initial formatting
epicurve$onset <- as.Date(epicurve$onset, format="%d/%m/%Y") # convert to Date class
epicurve$onset <- strftime(epicurve$onset, format="%Y/%m") # convert to Year-month
epicurve$onset <- paste(epicurve$onset, "/01", sep = "") # add arbitrary day on to end to make compatible w/ ggplot2
# aggregate by month
onset_counts <- aggregate(epicurve$onset, by = list(date = epicurve$onset), length) # aggregate by month
onset_counts$date = as.Date(onset_counts$date, format = "%Y/%m/%d") # convert to Date class
# plot
library(ggplot2)
library(scales)
ggplot(onset_counts, aes(x=date, y=x)) + geom_bar(stat="identity") + theme_bw() + theme(axis.text.x = element_text(angle=90, hjust = 1, vjust = 1)) +
ylab("Frequency") + xlab(NULL) + scale_x_date(breaks="month", labels=date_format("%Y-%m"))
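The same monthly counts can also be obtained without pasting strings together, by binning the dates with `cut()`. This is only a sketch on the six rows shown in the question, not the full dataset; `month_bin`, `counts` and the column name `n` are made up for illustration:

```r
# The six onset dates printed by head(epicurve) in the question
onset <- as.Date(c("21/12/2012", "14/06/2013", "10/06/2013",
                   "28/05/2013", "14/04/2013", "9/04/2013"),
                 format = "%d/%m/%Y")

# cut() on Date objects bins by calendar month; months with no cases
# between the first and last date are kept as empty levels
month_bin <- cut(onset, breaks = "month")
counts <- table(month_bin)

# Back to a data frame with a proper Date column for plotting
onset_counts <- data.frame(date = as.Date(names(counts)),
                           n = as.vector(counts))
print(onset_counts)
```

With this data frame the plot call above would use `aes(x=date, y=n)` instead of `aes(x=date, y=x)`.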
I've also just happened across another way of making it look pretty, although it feels like a bit of a kludge.
#read data
epicurve <- read.csv("epicurve.csv", sep=",", header=T)
epicurve$onset <- as.Date(epicurve$onset, format="%d/%m/%Y")
#load libraries
library(ggplot2)
library(scales)
#plot
ggplot(epicurve, aes(onset)) + geom_histogram(colour="white", binwidth=30.4375) +
scale_x_date(breaks=date_breaks("1 year"), minor_breaks=("1 month"), labels=date_format("%b-%Y")) +
scale_y_continuous(breaks=0:10, minor_breaks=NULL) +
theme(axis.text.x = element_text(angle=45, vjust=0.5))
# binwidth = 365.25/12 = 30.4375 days (the average month length), which makes the bins line up closely with the monthly scale
Which gives this (notice the beautiful alignment of the bins!):
Many thanks to Nate for the help, and hopefully this will be useful!