ggplot2 prevent months from repeating on y axis - r

I am plotting a time series of the average day on which an event occurs in each year. I want the months on the y-axis and the years on the x-axis, but the months repeat for each year. For example, I have two events that occur in January, in 2006 and 2007. I don't need two separate Januarys on the y-axis, just one. In the plot I'm getting, the months repeat along the y-axis, and I'm not sure how to fix this. Below is the table of my data (test1), followed by my three lines of plotting code:
Zone Year Mean PosSD NegSD
1 zone4 2006 2006-07-19 2007-01-13 2006-01-23
2 zone4 2007 2007-05-29 2007-11-04 2006-12-22
3 zone4 2008 2008-01-12 2008-01-15 2008-01-09
FI_plot<- ggplot(test1, aes(x=Year, y = Mean))+
geom_point()+
scale_y_date(date_labels = "%b, %d", date_breaks = "1 month")

This happens because the “Mean” column holds full dates and the year differs from row to row, so the date axis spans several years and each month appears once per year.
If you only want to plot the day within the year, the easiest solution is to assign all
rows to the same year (it does not matter which year). Something like this would work:
library(ggplot2)
dd <- data.frame(stringsAsFactors=FALSE,
Zone = c("zone4", "zone4", "zone4"),
Year = c(2006, 2007, 2008),
Mean = as.Date(c("2006-07-19", "2007-05-29", "2008-01-12")),
PosSD = as.Date(c("2007-01-13", "2007-11-04", "2008-01-15")),
NegSD = as.Date(c("2006-01-23", "2006-12-22", "2008-01-09")))
# replace current year with 1900
dd$Mean <- as.Date(gsub("[0-9]{4}-", "1900-", dd$Mean))
dd$PosSD <- as.Date(gsub("[0-9]{4}-", "1900-", dd$PosSD))
dd$NegSD <- as.Date(gsub("[0-9]{4}-", "1900-", dd$NegSD))
dd
#> Zone Year Mean PosSD NegSD
#> 1 zone4 2006 1900-07-19 1900-01-13 1900-01-23
#> 2 zone4 2007 1900-05-29 1900-11-04 1900-12-22
#> 3 zone4 2008 1900-01-12 1900-01-15 1900-01-09
# plot
FI_plot<- ggplot(dd, aes(x=Year, y = Mean))+
geom_point() +
scale_y_date(date_labels = "%b, %d", date_breaks = "1 month")
FI_plot
Created on 2019-01-13 by the reprex package (v0.2.1)
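One caveat with 1900 as the placeholder year: it is not a leap year, so a 29 February date would have no valid counterpart there. Below is a minimal base R sketch of the same idea, assuming a leap year such as 2000 as the common year (the column name MeanDay is only illustrative and not part of the original data):

```r
dd <- data.frame(
  Year = c(2006, 2007, 2008),
  Mean = as.Date(c("2006-07-19", "2007-05-29", "2008-01-12"))
)

# Keep only month and day, then rebuild every date inside the leap year 2000.
# MeanDay is a hypothetical column name used for this illustration.
dd$MeanDay <- as.Date(paste0("2000-", format(dd$Mean, "%m-%d")))

# All rows now share one year, so a date axis shows each month only once.
format(dd$MeanDay)
```

MeanDay could then be mapped to y in place of Mean, with the same scale_y_date() call as in the answer.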

Related

How to plot Time series without breaks caused by missing dates?

This question has been asked multiple times but I cannot find any that fit my needs.
My goal is to plot a time series for one month over multiple years. The following JAN data frame was created by subsetting a data frame that contains daily rainfall for the entire year.
> head(JAN)
DATE RCM GPM TRI
1: 2000-01-01 0.012182957 NA NA
2: 2000-01-02 0.001769934 NA NA
3: 2000-01-03 0.007916438 NA NA
4: 2000-01-04 0.008227825 NA NA
5: 2000-01-05 0.005192382 NA NA
6: 2000-01-06 0.065458169 NA NA
The data frame covers only the month of January, with daily records over 20 years.
I got my current plot with this code:
dfmelt<-melt(JAN,id.vars="DATE")
ggplot(dfmelt,aes(x=DATE,y=value,
col=variable,group = lubridate::year(DATE)))+
labs(title='JANUARY')+
geom_line()
I assume the gaps appear because my data contain only January, while the date axis still leaves room for February to December.
I want to avoid these gaps so that I can see the trend in January precipitation over the years.
Introducing breaks and faceting by year gives the following:
breaks <- unique(as.Date(cut(dfmelt$DATE, "month")))
ba2 <- transform(dfmelt, year = as.integer(format(DATE, "%Y")))
p <- ggplot(ba2, aes(x=DATE,y=value,
col=variable)) +
geom_line() +
facet_grid(cols = vars(year), scales = "free_x", space = "free_x")
p + scale_x_date(breaks = breaks, date_labels = "%b")
Is there any way to get a continuous plot, basically joining the lines together, using any other package or language?
Suppose we have the data frame df1 shown in the Note at the end which has a values column with 22 * 31 = 682 rows, one for each of the 31 dates in January for each of the 22 years from 2000 to 2021.
Then convert to ts with frequency 31 and plot.
tt <- ts(df1$values, start = 2000, freq = 31)
plot(tt)
or to use ggplot2
library(ggplot2)
library(zoo)
z <- as.zoo(tt)
autoplot(z)
Note
set.seed(123)
date <- seq(as.Date("2000-01-01"), as.Date("2021-12-31"), 1)
values <- seq_along(date)
df1 <- subset(data.frame(date, values), months(date) == "January")
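To see why this removes the gaps: with frequency 31, each year takes up exactly one unit on the time axis, so consecutive Januaries sit directly next to each other. Below is a small base R check of that, assuming the same df1 layout as in the Note (the month filter uses format(date, "%m") instead of months() so that it does not depend on the locale):

```r
date <- seq(as.Date("2000-01-01"), as.Date("2021-12-31"), by = "day")

# Keep January rows only; "%m" gives the month number regardless of locale
df1 <- subset(data.frame(date, values = seq_along(date)),
              format(date, "%m") == "01")

# 31 observations per unit of time, one unit per year
tt <- ts(df1$values, start = 2000, frequency = 31)

start(tt)  # first (year, position within year)
end(tt)    # last (year, position within year)
```

The series starts at position 1 of 2000 and ends at position 31 of 2021, with no empty stretch in between for the other eleven months.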

Convert YYYY-MM-DD to YYYY-YY Qx in R

I'm trying to plot data by quarter then display in ggplot. Dates in dataset are of the format YYYY-MM-DD, and I want the ggplot x-axis to display the financial year like YYYY-YY Qx. The financial year starts July 1.
Data is in long format. This is where I've got to:
Data set named: TOX
TREE_ID PM_Date variable value
1: 2013000584 2013-04-02 elm 0
2: 2013000498 2013-06-11 elm 1
3: 2013000123 2013-09-03 maple 0
4: 2013000642 2014-02-15 maple 0
5: 2013000778 2016-07-08 maple 1
library(zoo)
# convert the dates to calendar year-quarters
PM_Dateq <- as.yearqtr(as.Date(TOX$PM_Date))
Tox_longer_yr <- TOX[, list(value = sum(value)), by = list(PM_Dateq, variable)]
ggplot(Tox_longer_yr, aes(x = PM_Dateq, y = value, colour = variable)) +
    geom_line()
The X-axis currently displaying as:
2015, 2016, 2017...etc
(Though it is grouped into quarters in ggplot correctly.)
I want the x-axis to look like:
2015-16 Q3, 2015-16 Q4, 2016-17 Q1, 2016-17 Q2...etc
So an event happening on 2016-02-13 would be grouped into "2015-16 Q3".
How about something like this.
library(dplyr)
library(lubridate)
library(ggplot2)
df %>%
    mutate(
        PM_Date = as.Date(PM_Date),
        # calendar year in which the financial year began:
        # January to June dates belong to the year that started the previous July
        FY_start = year(PM_Date) - (month(PM_Date) < 7),
        Qtr = sprintf("%d-%02d %s",
            FY_start,
            (FY_start + 1) %% 100,
            as.character(cut(
                month(PM_Date),
                breaks = c(0, 3, 6, 9, 12),
                labels = c("Q3", "Q4", "Q1", "Q2"))))) %>%
    group_by(Qtr, variable) %>%
    summarise(value = sum(value)) %>%
    ggplot(aes(x = Qtr, y = value, colour = variable, group = variable)) +
    geom_line()
Explanation: We construct a new Qtr variable of the form YYYY-YY Qx. FY_start is the calendar year in which the financial year began (the previous year for dates from January to June), and cut bins the months into 3-month bins labelled relative to the 1 July start. We use lubridate for easy extraction of the year and month from PM_Date.
Sample data
df <- read.table(text =
"TREE_ID PM_Date variable value
2013000584 2013-04-02 elm 0
2013000498 2013-06-11 elm 1
2013000123 2013-09-03 maple 0
2013000642 2014-02-15 maple 0
2013000778 2016-07-08 maple 1", header = T)
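The labelling rule from the question (financial year starting 1 July, so 2016-02-13 falls in "2015-16 Q3") can also be sketched as a small base R helper. This is only an illustration: the function name fy_quarter_label and its fy_start_month argument are hypothetical and not part of any package.

```r
# Hypothetical helper: label dates as "YYYY-YY Qx" for a financial year
# that starts in fy_start_month (7 = July, as in the question).
fy_quarter_label <- function(d, fy_start_month = 7) {
  d <- as.Date(d)
  m <- as.integer(format(d, "%m"))
  y <- as.integer(format(d, "%Y"))
  # Months before the start month belong to the financial year that
  # began in the previous calendar year
  fy <- ifelse(m >= fy_start_month, y, y - 1L)
  # Months elapsed since the start of the financial year, in blocks of three
  q <- ((m - fy_start_month) %% 12) %/% 3 + 1
  sprintf("%d-%02d Q%d", fy, (fy + 1L) %% 100L, q)
}

fy_quarter_label(c("2016-02-13", "2013-04-02", "2016-07-08"))
```

Because the financial year comes first in the label, sorting these strings alphabetically also sorts them chronologically, which is convenient for a discrete x-axis.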

R Coding for ggridges

I am new to coding in R, so please excuse the simple question. I am trying to use the ggridges geom to create monthly density plots. The code below creates a plot with the months in the wrong order.
The code reads a CSV data file with 3 columns: MST, Month, and Aeco_5a. Any suggestions on how to fix this would be greatly appreciated. Here is my code:
> library(ggridges)
> read_csv("C:/Users/Calvin Johnson/Desktop/Aeco_Price_2017.csv")
Parsed with column specification:
cols(
MST = col_character(),
Month = col_character(),
Aeco_5a = col_double()
)
# A tibble: 365 x 3
MST Month Aeco_5a
<chr> <chr> <dbl>
1 1/1/2017 January 3.2678
2 1/2/2017 January 3.2678
3 1/3/2017 January 3.0570
4 1/4/2017 January 2.7811
5 1/5/2017 January 2.6354
6 1/6/2017 January 2.7483
7 1/7/2017 January 2.7483
8 1/8/2017 January 2.7483
9 1/9/2017 January 2.5905
10 1/10/2017 January 2.6902
# ... with 355 more rows
>
> mins<-min(Aeco_Price_2017$Aeco_5a)
> maxs<-max(Aeco_Price_2017$Aeco_5a)
>
> ggplot(Aeco_Price_2017,aes(x = Aeco_5a,y=Month,height=..density..))+
+ geom_density_ridges(scale=3) +
+ scale_x_continuous(limits = c(mins,maxs))
This has two parts: (1) you want your months to be factor instead of chr, and (2) you need to order the factors the way we typically order months.
With some reproducible data:
library(tidyverse)
library(ggridges)
# 10 random values per month, each month drawn around its own random mean
df <- sapply(month.abb, function(x) { rnorm(10, rnorm(1), sd = 1) })
df <- as_tibble(df) %>% gather(key = "month")
Then mutate month into a factor whose levels follow the order in which the months appear in the data frame (unique returns the distinct values in order of first appearance: "Jan", "Feb", ...). Finally, reverse the levels: the first level is drawn at the bottom of the y-axis, so without reversing, "Jan" would end up at the bottom.
df %>%
# switch to factor, and define the levels the way you want them to show up
# in the ggplot: "Dec", "Nov", "Oct", ...
mutate(month = factor(month, levels = rev(unique(df$month)))) %>%
ggplot(aes(x = value, y = month)) +
geom_density_ridges()
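For the data in the question, where Month holds full month names, a similar sketch can take the levels from R's built-in month.name constant instead of relying on the row order. This assumes the CSV uses English month names spelled exactly as in month.name; the sample vector below is made up for illustration.

```r
# Made-up sample of month names in arbitrary order
Month <- c("January", "March", "February", "December", "January")

# Reverse calendar order, so that January ends up at the top of the y-axis
Month_f <- factor(Month, levels = rev(month.name))

levels(Month_f)
```

Replacing the Month column with such a factor before calling ggplot() should give the ridges in calendar order from top to bottom.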

Plotting numerous layers (bar graph) using ggplot and R [closed]

I am trying to recreate a bar graph that I created in Excel using data that lists inventory and sales throughout the year. Here is my graph in Excel:
Note: Average sales rate is total sales / total inventory for the 13 months in the bar graph.
I am doing this through R and the ggplot package. I am quite new at this but this was what I managed so far:
library(lubridate)
library(ggplot2)
library(scales)
library(reshape2)
COdata <- read.csv("C:/.../CenterOne.csv")
# Grab related data
# VIN refers to a unique inventory identifier for the item
# First Launch Date is what I use to count my inventory for the month
# Sale Date is what I use to count my sales for the month
DFtest <- COdata[, c("VIN", "First.Launch.Date", "Sale.Date")]
Here is a snapshot of what the data looks like:
> head(DFtest)
VIN First.Launch.Date Sale.Date
1 4T1BF1FK4CU048373 22/04/2015 0:00
2 2T3KF4DVXCW108677 16/03/2015 0:00
3 4T1BF1FKXCU035935 19/03/2015 0:00 20/03/2015 0:00
4 JTDKN3DU3B1465796 16/04/2015 0:00
5 2T3YK4DV8CW015050
6 4T1BF1FK5CU599556 30/04/2015 0:00
I convert the dates to a proper format removing the hours/seconds and breaking them up into monthly intervals:
DFtest$First.Launch.Date <- as.Date(DFtest$First.Launch.Date, format = "%d/%m/%Y")
DFtest$Sale.Date <- as.Date(DFtest$Sale.Date, format = "%d/%m/%Y")
DFtest$month.listings <- as.Date(cut(DFtest$First.Launch.Date, breaks = "month"))
DFtest$month.sales <- as.Date(cut(DFtest$Sale.Date, breaks = "month"))
> head(DFtest)
VIN First.Launch.Date Sale.Date month.listings month.sales
1 4T1BF1FK4CU048373 2015-04-22 <NA> 2015-04-01 <NA>
2 2T3KF4DVXCW108677 2015-03-16 <NA> 2015-03-01 <NA>
3 4T1BF1FKXCU035935 2015-03-19 2015-03-20 2015-03-01 2015-03-01
4 JTDKN3DU3B1465796 2015-04-16 <NA> 2015-04-01 <NA>
5 2T3YK4DV8CW015050 <NA> <NA> <NA> <NA>
6 4T1BF1FK5CU599556 2015-04-30 <NA> 2015-04-01 <NA>
Avg line graph - my attempt at creating one
DF_Listings = data.frame(table(format(DFtest$month.listings)))
DF_Sales = data.frame(table(format(DFtest$month.sales)))
DF_Merge <- merge(DF_Listings, DF_Sales, by = "Var1", all = TRUE)
> head(DF_Listings)
Var1 Freq
1 2014-12-01 77
2 2015-01-01 886
3 2015-02-01 930
4 2015-03-01 1167
5 2015-04-01 1105
6 2015-05-01 1279
DF_Merge$Avg <- DF_Merge$Freq.y / DF_Merge$Freq.x
> head(DF_Merge)
Var1 Freq.x Freq.y Avg
1 2014-12-01 77 NA NA
2 2015-01-01 886 277 0.3126411
3 2015-02-01 930 383 0.4118280
4 2015-03-01 1167 510 0.4370180
5 2015-04-01 1105 309 0.2796380
6 2015-05-01 1279 319 0.2494136
ggplot(DF_Merge, aes(x=Var1, y=Avg, group = 1)) +
stat_smooth(aes(x = seq(length(unique(Var1)))),
se = F, method = "lm", formula = y ~ poly(x, 11))
Bar Graph
dfm <- melt(DFtest[ , c("VIN", "First.Launch.Date", "Sale.Date")], id.vars = 1)
dfm$value <- as.Date(cut(dfm$value, breaks = "month"))
ggplot(dfm, aes(x= value, width = 0.4)) +
geom_bar(aes(fill = variable), position = "dodge") +
scale_x_date(date_breaks = "months", labels = date_format("%m-%Y")) +
theme(axis.text.x=element_text(hjust = 0.5)) +
xlab("Date") + ylab("")
So I managed to make some of the plots, which brings me to several questions:
1. How would I combine them all into a single graph using ggplot?
2. Notice how my bar graph has blanks for the first and last month. How do I remove them (precisely, how do I remove 11-2014 and 01-2016 from the x-axis)?
3. In my bar graph, January 2014 had no sales and, as a result, the inventory bar takes up a larger space. How do I reduce its size to fit with the rest of the graph?
4. How can I change the x-axis from showing dates as numbers (e.g. 12-2014) to showing month-year in words (e.g. December-2014)? I've tried using as.yearmon, but that doesn't work with the scale_x_date portion of my ggplot call.
5. There's also the issue of the average sales rate line. I assume I would use geom_hline(), but I am not sure how to approach this.
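As a rough starting point for the last two items, the overall average sales rate and the month-year labels can be computed in base R before any plotting. The sketch below uses only the first three rows of DF_Merge shown above; avg_rate and month_labels are illustrative names, and treating the missing December sales as zero via na.rm = TRUE is an assumption.

```r
# First three rows of DF_Merge as printed above
DF_Merge <- data.frame(
  Var1   = c("2014-12-01", "2015-01-01", "2015-02-01"),
  Freq.x = c(77, 886, 930),   # listings
  Freq.y = c(NA, 277, 383)    # sales (NA = no sales that month)
)

# Total sales / total inventory; na.rm = TRUE skips the month without sales
avg_rate <- sum(DF_Merge$Freq.y, na.rm = TRUE) /
  sum(DF_Merge$Freq.x, na.rm = TRUE)

# Month-year in words, built from month.name so it does not depend on locale
d <- as.Date(DF_Merge$Var1)
month_labels <- paste(month.name[as.integer(format(d, "%m"))],
                      format(d, "%Y"), sep = "-")
```

avg_rate is a single number, so it could be passed as yintercept to geom_hline(). On a date axis, scale_x_date(date_labels = "%B-%Y") would be the usual way to get the same kind of labels.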
Using mtoto's suggestion of utilizing googleVis, I took a crack at recreating the graph:
# Testing Google Vis
mytest <- DF_Merge
library(zoo)
library(plyr) # to rename columns
library(googleVis)
mytest$Var1 <- as.yearmon(mytest$Var1)
mytest$Var1 <- as.factor(mytest$Var1) # googleVis cannot understand yearmon "class" so change it to factor
# Rename columns to ensure comprehension
mytest <- rename(mytest, c("Var1"="Date", "Freq.x"="Listings", "Freq.y"="Sales", "Avg"="Sales Rate"))
# Prepare for values to be displayed right on the plot
mytest$Listings.annotation <- mytest$Listings
mytest$Sales.annotation <- mytest$Sales
mytest$`Sales Rate.annotation` <- percent(mytest$`Sales Rate`) #Googlevis automatically understands that .annotation is used to display values in the graph
# Create average rate line
mytest$`Sales Rate` <- as.numeric(mytest$`Sales Rate`)
mytest$AvgRate <- (sum(mytest$Sales, na.rm = TRUE) / sum(mytest$Listings, na.rm = TRUE)) # na.rm guards against months with no sales (NA)
mytest <- rename(mytest, c("AvgRate"="Average Sales Rate"))
# Create the annotation for the average line
mytest$`Average Sales Rate.annotation` <- mytest$`Average Sales Rate`
x = nrow(mytest) - 1
mytest$`Average Sales Rate.annotation`[1:x] = "" # Ensures only the last row in this column has a value
mytest$`Average Sales Rate.annotation` <- as.numeric(mytest$`Average Sales Rate.annotation`, na.rm = TRUE)
mytest$`Average Sales Rate.annotation`[nrow(mytest)] <- percent(mytest$`Average Sales Rate.annotation`[nrow(mytest)]) # Transforms only the last row to a proper percentage!
# Plot the graph
column <- gvisComboChart(mytest, xvar= "Date",
yvar=c("Listings", "Listings.annotation", "Sales", "Sales.annotation", "Sales Rate", "Sales Rate.annotation", "Average Sales Rate",
"Average Sales Rate.annotation"),
options=list(seriesType="bars",
series="[{type: 'bars', targetAxisIndex:0, color:'orange'},
{type: 'bars', targetAxisIndex:0, color:'green'},
{type: 'line', targetAxisIndex:1, color:'red'},
{type: 'line', targetAxisIndex:1, color:'purple', lineDashStyle:[2,2,20,2,20,2]}]",
vAxes="[{format:'decimal', textPosition: 'out', viewWindow:{min:0, max:200}},
{format:'percent', textPosition: 'out', viewWindow:{min:0, max:1}}]",
hAxes="[{textPosition: 'out'}]",
legend = "bottom",
curveType="function",
width=1500,
height=800))
plot(column)
The variables could have been named better but I was able to get what I was looking for with my final result:

R ggplot by month and values group by Week

With ggplot2, I would like to create a multiplot (facet_grid) where each plot is the weekly count values for the month.
My data are like this :
day_group count
1 2012-04-29 140
2 2012-05-06 12595
3 2012-05-13 12506
4 2012-05-20 14857
For this dataset I have created two other columns, Month and Week, based on day_group:
day_group count Month Week
1 2012-04-29 140 Apr 17
2 2012-05-06 12595 May 18
3 2012-05-13 12506 May 19
4 2012-05-20 14857 May 2
Now I would like for each Month to create a barplot where I have the sum of the count values aggregated by week. So for example for a year I would have 12 plots with 4 bars (one per week).
Below is what I use to generate the plot :
ggplot(data = count_by_day, aes(x=day_group, y=count)) +
stat_summary(fun.y="sum", geom = "bar") +
scale_x_date(date_breaks = "1 month", date_labels = "%B") +
facet_grid(facets = Month ~ ., scales="free", margins = FALSE)
So far, my plot looks like this
https://dl.dropboxusercontent.com/u/96280295/Rplot.png
As you can see, the x-axis is not what I'm looking for: instead of showing only weeks 1, 2, 3 and 4, it displays the whole month.
Do you know what I must change to get what I'm looking for?
Thanks for your help
Okay, now that I see what you want, I wrote a small program to illustrate it. The key to your order of month problem is making month a factor with the levels in the right order:
library(dplyr)
library(ggplot2)
#initialization
set.seed(1234)
sday <- as.Date("2012-01-01")
eday <- as.Date("2012-07-31")
# List of the first day of the months
mfdays <- seq(sday,length.out=12,by="1 month")
# list of months - this is key to keeping the order straight
mlabs <- months(mfdays)
# list of first weeks of the months
mfweek <- trunc((mfdays-sday)/7)
names(mfweek) <- mlabs
# Generate a bunch of event-days, and then months, then week numbs in our range
n <- 1000
edf <-data.frame(date=sample(seq(sday,eday,by=1),n,T))
edf$month <- factor(months(edf$date),levels=mlabs) # use the factor in the right order
edf$week <- 1 + as.integer(((edf$date-sday)/7) - mfweek[edf$month])
# Now summarize with dplyr
ndf <- group_by(edf,month,week) %>% summarize( count = n() )
ggplot(ndf) + geom_bar(aes(x=week,y=count),stat="identity") + facet_wrap(~month,nrow=1)
Yielding:
(As an aside, I am kind of proud I did this without lubridate ...)
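A shorter way to get a week-within-month number, sketched here in base R on the four rows from the question, is to derive it from the day of the month. Note that this is a different definition from the one used in the program above (days 1 to 7 are week 1, days 8 to 14 are week 2, and so on, so days 29 to 31 fall into a fifth week), and the column name week_in_month is only illustrative.

```r
d <- data.frame(
  day_group = as.Date(c("2012-04-29", "2012-05-06", "2012-05-13", "2012-05-20")),
  count     = c(140, 12595, 12506, 14857)
)

# Month as a factor in calendar order (locale-independent via month.name)
d$month <- factor(month.name[as.integer(format(d$day_group, "%m"))],
                  levels = month.name)

# Week within the month from the day of the month: 1-7 -> 1, 8-14 -> 2, ...
d$week_in_month <- (as.integer(format(d$day_group, "%d")) - 1L) %/% 7L + 1L

# Sum the counts per month and week
weekly <- aggregate(count ~ month + week_in_month, data = d, FUN = sum)
```

The weekly data frame can then be plotted with week_in_month on the x-axis and one facet per month, as in the ggplot call above.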
I think you have to do this but I am not sure I understand your question:
ggplot(data = count_by_day, aes(x=Week, y=count, group= Month, color=Month))

Resources