How do I reshape my data from y-m-d into years? - r

I want to plot the frequency of topics over the years.
However, my date variable has the following structure, for example: 2016-01-01. This means that the data is recorded in days.
I want the data to be visualized on a monthly basis instead.
The data is stored in a data.frame.
I tried to visualize my topic frequency over the dates like this:
ggplot(data = dat,
       aes(x = date,
           fill = Topics[1])) +
  geom_freqpoly(binwidth = 30)
However, when I run the command, the x-axis only labels every third month: January, April, July, etc.
How do I get the x-axis to show every month (January, February, March, April, etc.)?

You can set the x-axis breaks and labels with scale_x_date():
library(tidyverse)
library(lubridate)
# 200 consecutive days ending today
last_month <- Sys.Date() - 0:199
df <- data.frame(
  date = last_month,
  price = runif(200)
)
base <- ggplot(df, aes(date, price)) +
  geom_line()
# one break per month, labelled with the full month name
base + scale_x_date(date_breaks = "months", date_labels = "%B")
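Applied to the question's setup, a minimal sketch could first count rows per topic and calendar month, then plot those counts with one break per month. It assumes a data frame `dat` with a daily `date` column and a `topic` column; both column names and the simulated data are hypothetical stand-ins for the asker's data.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Hypothetical stand-in for the asker's `dat`: one row per document,
# with a daily `date` and a `topic` label
set.seed(1)
dat <- data.frame(
  date  = as.Date("2016-01-01") + sample(0:365, 500, replace = TRUE),
  topic = sample(c("A", "B", "C"), 500, replace = TRUE)
)

# Snap every date to the first day of its month, then count per month and topic
monthly <- dat %>%
  mutate(month = floor_date(date, unit = "month")) %>%
  count(month, topic)

# One x-axis break per month, labelled with the abbreviated month name and year
p <- ggplot(monthly, aes(x = month, y = n, colour = topic)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p
```

Aggregating before plotting gives exactly one point per topic and month, whereas `binwidth = 30` creates 30-day bins that do not line up with calendar months.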

Related

Is there a way to adjust the order of time of day on an axis in ggplot2?

I am running an analysis of the diel and seasonal activity patterns of bats. I'd like to plot time on the y-axis centered around midnight but am not sure how. Right now the y-axis runs from 0 to 24 hours. I am using ggplot2; my code is below.
library(ggplot2)
library(ggExtra) #for ggMarginal()
p <- ggplot(LBB, aes(x = RecordingDate, y = Time)) +
geom_point(size = .5) +
theme(legend.position="none")
p2 <- ggMarginal(p, type = "density")
I do not know of a way to center time-only data on midnight in ggplot, but as a fairly simple workaround you can convert your times to datetimes: code all times before midnight as one day (any arbitrary day works; the code below uses 2021-10-19) and all times after midnight as the next day. You can then make a ggplot with a datetime axis centered on the midnight between those two days, and format the axis labels so that it looks like a time-only axis.
library(tidyverse)
library(hms) #package for handling time data
#Generate date and time data
LBB <- data.frame(RecordingDate =
rep(seq.Date(as.Date("2021-04-01"), as.Date("2021-04-20"),
"days"), 25),
Time = hms(sample(0:59, replace = TRUE, 500),
sample(0:59, replace = TRUE, 500),
sample(c(0:5, 20:23), replace = TRUE, 500)))
# Make a graphing.date column that assigns all times after noon (before midnight)
# to the same date, and all times before noon (after midnight) to the next day (you can choose any day)
LBB <- LBB %>%
mutate(graphing.date = if_else(
Time > parse_hms("12:00:00"), as.Date("2021-10-19"), #times 12 to 24 are this date
as.Date("2021-10-20")), #other times (0-12) are this date
#combine date and time to get a graphing datetime spread over two days
#(now you have data that is centered on midnight)
graphing.datetime = as.POSIXct(paste(graphing.date, Time)))
#Graph, but use graphing.datetime, not Time, as the y variable
p <- ggplot(LBB, aes(x = RecordingDate, y = graphing.datetime)) +
geom_point() +
#format the y axis so it looks like Time even though it is datetime
scale_y_datetime("Time", #change axis title back to time
#Set limits to show a full day
limits = c(as.POSIXct("2021-10-19 12:00:00"),
as.POSIXct("2021-10-20 12:00:00")),
expand = c(0,0), #turn off axis limit expansion
date_breaks = "4 hours", #set custom axis breaks
date_labels = "%H:%M") #Hours:minutes only for labels
p
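A different technique from the datetime workaround above is to keep time on a plain numeric axis: shift every time so that noon becomes 0 and midnight becomes 12, then relabel the breaks with clock times. This sketch assumes times are stored as seconds since midnight and simulates its own data; the column names `secs` and `hours_from_noon` and the helper `clock_label()` are hypothetical.

```r
library(ggplot2)

# Hypothetical example data: times stored as seconds since midnight,
# only in the evening (20-23 h) and early morning (0-5 h)
set.seed(42)
LBB <- data.frame(
  RecordingDate = rep(seq.Date(as.Date("2021-04-01"), as.Date("2021-04-20"), "days"), 25),
  secs = sample(c(0:5, 20:23), 500, replace = TRUE) * 3600 + sample(0:3599, 500, replace = TRUE)
)

# Shift the clock so noon maps to 0 and midnight maps to 12 (hours)
LBB$hours_from_noon <- ((LBB$secs / 3600) - 12) %% 24

# Turn a shifted break position back into the clock time it represents
clock_label <- function(h) sprintf("%02d:00", as.integer((h + 12) %% 24))

p <- ggplot(LBB, aes(x = RecordingDate, y = hours_from_noon)) +
  geom_point(size = 0.5) +
  scale_y_continuous("Time", limits = c(0, 24), breaks = seq(0, 24, by = 4),
                     labels = clock_label, expand = c(0, 0))
p
```

Because the axis stays numeric, no dummy dates or time zones are involved; the trade-off is that the labels are produced by a custom function instead of `date_labels`.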

R: ggplot of average daily counts by month

I am trying to plot average daily trip counts by month. However, I am struggling to find how to plot only the mean number of trips per day for each month instead of the total number of trips per month.
The days of the week and months have already been converted from numeric type to abbreviations and have also been ordered (type: ).
Here's what I've tried for the plot.
by_day <- df_temp %>%
group_by(Start.Day)
ggplot(by_day, aes(x=Start.Month,
fill=Start.Month)) +
geom_bar() +
scale_fill_brewer(palette = "Paired") +
labs(title="Number of Daily Trips by Month",
x=" ",
y="Number of Daily Trips")
Here's the plot I am trying to replicate:
You are almost there. Since you did not share a reproducible example, I simulated your data. You may need to adapt the variable names and/or correct my assumptions.
{lubridate} is a powerful package for date-time crunching. It comes in handy when working with dates, binning dates for summaries, etc.
library(dplyr)    # for %>%, mutate(), group_by(), summarise()
library(ggplot2)
# simulating your data
## a series of dates from June through October
days <- seq(from = lubridate::ymd("2020-06-01")
,to = lubridate::ymd("2020-10-30")
,by = "1 day")
## random trips on each day
set.seed(666)
trips <- sample(2000:5000, length(days), replace = TRUE)
# putting things together in a data frame
df_temp <- data.frame(date = days, counts = trips) %>%
# I assume the variable Start.Month is the monthly bin
# let's use lubridate to "bin" the month from the date
mutate(Start.Month = lubridate::floor_date(date, unit = "month"))
# aggregate trips for each month, calculate average daily trips
by_month <- df_temp %>%
group_by(Start.Month) %>% # group by the binning variable
summarise(Avg.Trips = mean(counts)) # calculate the mean for each group
ggplot( data = by_month
, aes(x = Start.Month, y = Avg.Trips
, fill=as.factor(Start.Month)) # to work with a discrete palette, factorise
) +
# ------------ bar layer -----------------------------------------
## instead of geom_bar(... stat = "identity"), you can use geom_col()
## and define the fill colour
geom_col() +
scale_fill_brewer(palette = "Paired") +
# ------------ if you like, provide context with annotation -------
geom_text(aes(label = Avg.Trips %>% round(2)), vjust = 1) +
# ------------ finalise plot with labels, theme, etc.
labs(title="Number of Daily Trips by Month",
x=NULL, # setting an unused lab to NULL is better than printing empty " "!
y="Number of Daily Trips"
) +
theme_minimal() +
theme(legend.position = "none") # to suppress colour legend
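If the raw data has one row per trip (as `group_by(Start.Day)` in the question suggests) rather than one row per day, the average needs two aggregation steps: count the trips on each calendar day, then average those daily counts within each month. This sketch assumes a POSIXct `start` column; that column name, `trip_log`, and the simulated data are assumptions, not the asker's variables.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Hypothetical trip-level data: one row per trip, 1 June to 30 October 2020
set.seed(666)
trip_log <- data.frame(
  start = as.POSIXct("2020-06-01", tz = "UTC") + runif(20000, 0, 152 * 86400)
)

daily_by_month <- trip_log %>%
  mutate(day = as.Date(start)) %>%
  count(day, name = "trips_per_day") %>%             # step 1: trips on each day
  mutate(month = floor_date(day, unit = "month")) %>%
  group_by(month) %>%
  summarise(avg_daily_trips = mean(trips_per_day))   # step 2: mean of the daily counts

ggplot(daily_by_month, aes(x = month, y = avg_daily_trips)) +
  geom_col() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  labs(x = NULL, y = "Average number of daily trips")
```

One caveat: `count()` only creates rows for days with at least one trip, so days with zero trips are left out of the mean. If such days can occur, fill in the missing days first (for example with `tidyr::complete()`) before averaging.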

How to unclutter the x-axis in a plot

Using the R programming language, I created some time series data (daily measurements over a period of 20 years). I aggregated this data to monthly totals and then produced a graph:
library(ggplot2)
library(xts)
library(scales)
set.seed(123)
day = seq(as.Date("2000/1/1"), as.Date("2020/1/1"),by="day")
day <- format(as.Date(day), "%Y/%m/%d")
amount <- rnorm(7306 ,100,10)
data <- data.frame(day, amount)
y.mon<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data, FUN=sum)
y.mon$d = y.mon$`format(as.Date(day), format = "%Y/%m")`
ggplot(y.mon, aes(x = d, y=amount))+
geom_line(aes(group=1))
Right now, the x-axis is completely unreadable. Is there a way to "unclutter" it, perhaps by slanting the dates or showing them at 4-month intervals? I could delete the x-axis entirely, but ideally I would like to keep it for reference.
At the end of the graph there is a huge downward "spike". I think this is because the data is aggregated by month, and the last day with data is Jan-01-2020, so January 2020 contains only a single measurement. Is it possible to "query" the y.mon object so that the graph only runs until the last complete time period? The spike is misleading: someone might look at the graph and think a big anomaly happened in Jan-2020, when there is actually only one measurement in that month.
You can also try:
library(ggplot2)
library(xts)
library(scales)
set.seed(123)
#Data
day = seq(as.Date("2000/1/1"), as.Date("2020/1/1"),by="day")
amount <- rnorm(7306 ,100,10)
data <- data.frame(day, amount)
#Aggregate
y.mon<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data, FUN=sum)
#Count days
y.mon2<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data,
FUN=function(x) length(x))
names(y.mon2)[2]<-'N'
#Format and merge to add N
y.mon$d = y.mon$`format(as.Date(day), format = "%Y/%m")`
mmon <- merge(y.mon,y.mon2)
#Add a dummy date
mmon$d <- as.Date(paste0(mmon$d,'/01'),'%Y/%m/%d')
#Plot
ggplot(subset(mmon,N!=1), aes(x = d, y=amount))+
geom_line(aes(group=1))+
scale_x_date(date_breaks = '4 month',date_labels = '%Y-%m',
expand = c(0,0))+
theme(axis.text.x = element_text(angle = 90))
Update: using the previous code and changing only the labels:
#Plot Update
ggplot(subset(mmon,N!=1), aes(x = d, y=amount))+
geom_line(aes(group=1))+
scale_x_date(date_breaks = '12 month',date_labels = '%Y',
expand = c(0,0))+
theme(axis.text.x = element_text(angle = 90))
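A sketch of a more general filter than `N != 1`: keep only months in which every calendar day has a measurement, by comparing the number of observed days with `lubridate::days_in_month()`. This would also drop a partial first month or a month with a missing day. It swaps `aggregate()` for dplyr and lubridate, and `monthly`/`n_days` are illustrative names.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

set.seed(123)
data <- data.frame(day = seq(as.Date("2000-01-01"), as.Date("2020-01-01"), by = "day"))
data$amount <- rnorm(nrow(data), 100, 10)

monthly <- data %>%
  mutate(month = floor_date(day, unit = "month")) %>%   # first day of each month
  group_by(month) %>%
  summarise(amount = sum(amount), n_days = n()) %>%
  # keep only months where every calendar day has a measurement
  filter(n_days == days_in_month(month))

ggplot(monthly, aes(x = month, y = amount)) +
  geom_line() +
  scale_x_date(date_breaks = "4 months", date_labels = "%Y-%m", expand = c(0, 0)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
```

Keeping `month` as a real Date (rather than a "%Y/%m" string) is what lets `scale_x_date()` thin out the labels without the dummy-date step.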

Plot time series as one year

I have a time series of monthly data for 10 years:
myts <- ts(rnorm(12*10), frequency = 12, start = 2001)
Now I'd like to plot the data with the x-axis restricted to a generic year, with ticks from Jan to Dec. The whole time series should be broken into ten lines, each starting in Jan and ending in Dec, overplotted on each other so that I can visually compare the years. Is there a straightforward command to do that in R?
So far I have come up with the following solution using matplot, which might not be the most sophisticated one:
mydf <- as.data.frame(matrix(myts, 12))
matplot(mydf,type="l")
Even better would be a way to calculate the mean and the corresponding CI/standard deviation for each month, then plot the Jan to Dec mean as a line with the CI/standard deviation as a band around it.
Consider using ggplot2.
library(ggplot2)
library(ggfortify)
d <- fortify(myts)
d$year <- format(d$Index, "%Y")
d$month <- format(d$Index, "%m")
It's useful to start by reshaping the ts object into a long data frame, as above. Given the data frame, it's straightforward to create the plots you have in mind:
ggplot(d, aes(x = month, y = Data, group = year, colour = year)) +
geom_line()
ggplot(d, aes(x = month, y = Data, group = month)) +
stat_summary(fun.data = mean_se, fun.args = list(mult = 1.96))
You can also summarise the data yourself, then plot it:
d_sum <- do.call(rbind, (lapply(split(d$Data, d$month), mean_se, mult = 1.96)))
d_sum$month <- rownames(d_sum)
ggplot(d_sum, aes(x = month, y = y, ymin = ymin, ymax = ymax)) +
geom_errorbar() +
geom_point() +
geom_line(aes(x = as.numeric(month)))
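The question asked for a band around the monthly mean rather than error bars. A sketch using `geom_ribbon()` could build the long data frame with base R helpers (`cycle()`, `split()`) instead of ggfortify. It simulates `myts` with a fixed seed, and the mean plus or minus 1.96 standard errors is a normal-approximation interval chosen for illustration; replacing `se` with `sd(x)` would give a standard-deviation band instead.

```r
library(ggplot2)

set.seed(2001)
myts <- ts(rnorm(12 * 10), frequency = 12, start = 2001)

# Long data frame: cycle() gives the position 1..12 within each year,
# and the year is derived from the observation index
d <- data.frame(
  value = as.numeric(myts),
  month = as.integer(cycle(myts)),
  year  = start(myts)[1] + (seq_along(myts) - 1) %/% 12
)

# One line per year, overplotted on a shared Jan..Dec axis
p_years <- ggplot(d, aes(x = month, y = value, colour = factor(year))) +
  geom_line() +
  scale_x_continuous(breaks = 1:12, labels = month.abb)

# Per-month mean with a normal-approximation 95% interval (mean +/- 1.96 * SE)
month_stats <- do.call(rbind, lapply(split(d$value, d$month), function(x) {
  se <- sd(x) / sqrt(length(x))
  data.frame(mean = mean(x), lower = mean(x) - 1.96 * se, upper = mean(x) + 1.96 * se)
}))
month_stats$month <- 1:12

p_band <- ggplot(month_stats, aes(x = month, y = mean)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3) +
  geom_line() +
  scale_x_continuous(breaks = 1:12, labels = month.abb)
p_band
```

Mapping `month` as a number (1 to 12) with `month.abb` labels keeps the x-axis continuous, so `geom_line()` and `geom_ribbon()` connect across months without extra grouping tricks.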

Plotting a non-standard year (water year) with ggplot2

Building on this question about using the "water year" in R, I have a question about plotting in ggplot2 with a common date axis over many years. A water year is defined as starting on October 1 and ending on September 30, which is more sensible for the hydrological cycle.
So say I have this data set:
library(dplyr)
library(ggplot2)
library(lubridate)
df <- data.frame(Date=seq.Date(as.Date("1910/1/1"), as.Date("1915/1/1"), "days"),
y=rnorm(1827,100,1))
Then here is the wtr_yr function:
wtr_yr <- function(dates, start_month=10) {
# Convert dates into POSIXlt
dates.posix = as.POSIXlt(dates)
# Year offset
offset = ifelse(dates.posix$mon >= start_month - 1, 1, 0)
# Water year
adj.year = dates.posix$year + 1900 + offset
# Return the water year
adj.year
}
What I would like to do is use colour as a grouping variable, then make an x-axis that only consists of month and day information. Usually I've done it like this (using the lubridate package):
ymd(paste0("1900","-",month(df$Date),"-",day(df$Date)))
This works fine if the years are ordinary calendar years. However, in the water-year scenario each water year spans two calendar years. So ideally I'd like a plot that goes from October 1 to September 30 and plots a separate line for each water year, keeping all the correct water years. Here is where I am so far:
df1 <- df %>%
mutate(wtr_yrVAR=factor(wtr_yr(Date))) %>%
mutate(CDate=as.Date(paste0("1900","-",month(Date),"-",day(Date))))
df1 %>%
ggplot(aes(x=CDate, y=y, colour=wtr_yrVAR)) +
geom_point()
Plotting that, the date axis obviously spans from Jan to Dec. Any ideas how I can force ggplot2 to plot these along the water year (October to September)?
Here is a method that works:
df3 <- df %>%
mutate(wtr_yrVAR=factor(wtr_yr(Date))) %>%
#seq along dates starting with the beginning of your water year
mutate(CDate=as.Date(paste0(ifelse(month(Date) < 10, "1901", "1900"),
"-", month(Date), "-", day(Date))))
Then:
df3 %>%
ggplot(., aes(x = CDate, y = y, colour = wtr_yrVAR)) +
geom_point() +
scale_x_date(date_labels = "%b %d")
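One detail in this approach is worth checking: 1901 is not a leap year, so the pasted string for 29 February 1912 ("1901-2-29") should not convert to a valid date, and that observation is likely to become `NA` and be dropped from the plot. A sketch that sidesteps this by using a leap year as the dummy year (October to December mapped to 1999, January to September to 2000; an arbitrary pair whose second year is a leap year) and lubridate's `make_date()` instead of pasting strings could look like this. `df4` and `plot_date` are hypothetical names, and the data is re-simulated with a fixed seed.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

set.seed(1)
df <- data.frame(Date = seq.Date(as.Date("1910/1/1"), as.Date("1915/1/1"), "days"),
                 y = rnorm(1827, 100, 1))

# wtr_yr() as defined in the question
wtr_yr <- function(dates, start_month = 10) {
  dates.posix <- as.POSIXlt(dates)
  offset <- ifelse(dates.posix$mon >= start_month - 1, 1, 0)
  dates.posix$year + 1900 + offset
}

# Oct-Dec map to 1999 and Jan-Sep to 2000; 2000 is a leap year,
# so 29 February always gets a valid plotting date
df4 <- df %>%
  mutate(wtr_yrVAR = factor(wtr_yr(Date)),
         plot_date = make_date(ifelse(month(Date) >= 10, 1999, 2000),
                               month(Date), day(Date)))

ggplot(df4, aes(x = plot_date, y = y, colour = wtr_yrVAR)) +
  geom_point(size = 0.5) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b")
```

The axis then runs from 1 October (1999) to 30 September (2000), and only the month labels are shown, so the dummy years never appear on the plot.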
Not very elegant, but this should work:
df1 <- df %>%
mutate(wtr_yrVAR=factor(wtr_yr(Date))) %>%
mutate(CDdate= as.Date(as.numeric(Date - as.Date(paste0(wtr_yrVAR,"-10-01"))), origin = "1900-10-01"))
df1 %>% ggplot(aes(x = CDdate, y = y, colour = wtr_yrVAR)) +
  geom_line() +
  theme_bw() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b",
               limits = c(as.Date("1899-09-30"), as.Date("1900-10-01")))
