R Coding for ggridges - r

I am new to coding in R, so please excuse the simple question. I am trying to use the ggridges geom in R to create monthly density plots. The code below creates a plot, but with the months in the wrong order.
The code reads a CSV data file with 3 columns: MST, Month, and Aeco_5a. Any suggestions on how to fix this would be greatly appreciated. Here is my code:
> library(ggridges)
> read_csv("C:/Users/Calvin Johnson/Desktop/Aeco_Price_2017.csv")
Parsed with column specification:
cols(
MST = col_character(),
Month = col_character(),
Aeco_5a = col_double()
)
# A tibble: 365 x 3
MST Month Aeco_5a
<chr> <chr> <dbl>
1 1/1/2017 January 3.2678
2 1/2/2017 January 3.2678
3 1/3/2017 January 3.0570
4 1/4/2017 January 2.7811
5 1/5/2017 January 2.6354
6 1/6/2017 January 2.7483
7 1/7/2017 January 2.7483
8 1/8/2017 January 2.7483
9 1/9/2017 January 2.5905
10 1/10/2017 January 2.6902
# ... with 355 more rows
>
> mins<-min(Aeco_Price_2017$Aeco_5a)
> maxs<-max(Aeco_Price_2017$Aeco_5a)
>
> ggplot(Aeco_Price_2017,aes(x = Aeco_5a,y=Month,height=..density..))+
+ geom_density_ridges(scale=3) +
+ scale_x_continuous(limits = c(mins,maxs))

This has two parts: (1) you want your months to be a factor instead of chr, and (2) you need to order the factor levels the way we typically order months.
With some reproducible data:
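library(ggridges)
library(tidyverse)  # as_tibble(), gather(), mutate(), %>% and ggplot()
# one column of 10 random values per month abbreviation
df <- sapply(month.abb, function(x) { rnorm(10, rnorm(1), sd = 1)})
df <- as_tibble(df) %>% gather(key = "month")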
Then you need to mutate month to be a factor, with the levels in the order in which they appear in the data frame (unique returns the distinct values in order of first appearance: "Jan", "Feb", ...). Finally, reverse the levels: the first level is drawn at the bottom of the y-axis, so without rev "Jan" would end up at the bottom instead of the top.
df %>%
# switch to factor, and define the levels the way you want them to show up
# in the ggplot; "Dec", "Nov", "Oct", ...
mutate(month = factor(month, levels = rev(unique(df$month)))) %>%
ggplot(aes(x = value, y = month)) +
geom_density_ridges()
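The question's data stores full month names ("January", ...) rather than abbreviations, so a variant of the same idea is to take the levels from R's built-in month.name constant instead of from the data. A minimal sketch, assuming a character Month column like the asker's (the prices data frame below is made up for illustration):

```r
# Made-up stand-in for the asker's CSV; only the Month column matters here
prices <- data.frame(
  Month = c("March", "January", "February", "January"),
  Aeco_5a = c(2.1, 3.2, 2.8, 3.0)
)
# Reversed calendar order, so that January is drawn at the top of the y-axis
prices$Month <- factor(prices$Month, levels = rev(month.name))
levels(prices$Month)[1:3]
```

Unlike unique(), this does not depend on the rows already being sorted by date, which may matter if the file is ever reordered.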

Related

How to plot Time series without breaks caused by missing dates?

This question has been asked multiple times but I cannot find any that fit my needs.
My goal is to plot a time series for one month over multiple years. The following JAN data frame was created by subsetting a data frame containing daily rainfall for the entire year.
> head(JAN)
DATE RCM GPM TRI
1: 2000-01-01 0.012182957 NA NA
2: 2000-01-02 0.001769934 NA NA
3: 2000-01-03 0.007916438 NA NA
4: 2000-01-04 0.008227825 NA NA
5: 2000-01-05 0.005192382 NA NA
6: 2000-01-06 0.065458169 NA NA
The data frame covers the month of January, with daily records over 20 years.
I got the following plot with this code:
dfmelt<-melt(JAN,id.vars="DATE")
ggplot(dfmelt,aes(x=DATE,y=value,
col=variable,group = lubridate::year(DATE)))+
labs(title='JANUARY')+
geom_line()
I'm assuming the gaps appear because my data consists only of January values, while the axis still spans February to December of each year.
I want to avoid this so that I can see the trend of precipitation over the years for the month of January.
Introducing breaks gives the following:
breaks <- unique(as.Date(cut(dfmelt$DATE, "month")))
ba2 <- transform(dfmelt, year = as.integer(format(DATE, "%Y")))
p <- ggplot(ba2, aes(x=DATE,y=value,
col=variable)) +
geom_line() +
facet_grid(cols = vars(year), scales = "free_x", space = "free_x")
p + scale_x_date(breaks = breaks, date_labels = "%b")
Is there any way to get a continuous plot, basically joining the lines together, using any other package or language?
Suppose we have the data frame df1 shown in the Note at the end which has a values column with 22 * 31 = 682 rows, one for each of the 31 dates in January for each of the 22 years from 2000 to 2021.
Then convert to ts with frequency 31 and plot.
tt <- ts(df1$values, start = 2000, freq = 31)
plot(tt)
or to use ggplot2
library(ggplot2)
library(zoo)
z <- as.zoo(tt)
autoplot(z)
Note
set.seed(123)
date <- seq(as.Date("2000-01-01"), as.Date("2021-12-31"), 1)
values <- seq_along(date)
df1 <- subset(data.frame(date, values), months(date) == "January")
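One caveat with the Note: months(date) returns the month name in the current locale, so the comparison with "January" can fail on non-English systems. A sketch of a locale-independent variant, assuming the same df1 layout, which also checks that the series lines up with a frequency of 31:

```r
date <- seq(as.Date("2000-01-01"), as.Date("2021-12-31"), by = 1)
values <- seq_along(date)
# Compare the numeric month instead of the locale-dependent month name
df1 <- subset(data.frame(date, values), format(date, "%m") == "01")
tt <- ts(df1$values, start = 2000, frequency = 31)
nrow(df1)      # 22 years * 31 days = 682 rows
frequency(tt)  # 31 observations per "year" on the ts time axis
```

With this frequency each January occupies exactly one unit on the time axis, which is why the plot has no gaps for February to December.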

Convert YYYY-MM-DD to YYYY-YY Qx in R

I'm trying to plot data by quarter then display in ggplot. Dates in dataset are of the format YYYY-MM-DD, and I want the ggplot x-axis to display the financial year like YYYY-YY Qx. The financial year starts July 1.
Data is in long format. This is where I've got to:
Data set named: TOX
TREE_ID PM_Date variable value
1: 2013000584 2013-04-02 elm 0
2: 2013000498 2013-06-11 elm 1
3: 2013000123 2013-09-03 maple 0
4: 2013000642 2014-02-15 maple 0
5: 2013000778 2016-07-08 maple 1
PM_Dateq <- as.yearqtr(TOX$PM_Date, format)
Tox_longer_yr <- TOX [,list(value=sum(value)), by=list(PM_Dateq, variable)]
ggplot(Tox_longer_yr, aes(x = PM_Dateq, y = value, colour = variable)) +
geom_line()
The x-axis currently displays as:
2015, 2016, 2017...etc
(Though it is grouped into quarters in ggplot correctly.)
I want the x-axis to look like:
2015-16 Q3, 2015-16 Q4, 2016-17 Q1, 2016-17 Q2...etc
So an event happening on 2016-02-13 would be grouped into "2015-16 Q3".
How about something like this.
library(dplyr)
library(ggplot2)
library(lubridate)
df %>%
  mutate(
    PM_Date = as.Date(PM_Date),
    # the financial year starts 1 July, so January to June belong to the
    # financial year that began in the previous calendar year
    FY = year(PM_Date) - (month(PM_Date) <= 6),
    Qtr = sprintf("%d-%d %s",
      FY,
      FY + 1,
      as.character(cut(
        month(PM_Date),
        breaks = c(0, 3, 6, 9, 12),
        labels = c("Q3", "Q4", "Q1", "Q2"))))) %>%
  group_by(Qtr, variable) %>%
  summarise(value = sum(value)) %>%
  ggplot(aes(x = Qtr, y = value, colour = variable, group = variable)) +
  geom_line()
Explanation: We construct a new Qtr variable in the form YYYY-YYYY QX. FY is the calendar year in which the financial year starts: the year of PM_Date for July to December, and the previous year for January to June. The months are binned into 3-month bins with cut, labelled so that the quarter starting 1 July is Q1. We use lubridate for easy extraction of the year and month.
Sample data
df <- read.table(text =
"TREE_ID PM_Date variable value
2013000584 2013-04-02 elm 0
2013000498 2013-06-11 elm 1
2013000123 2013-09-03 maple 0
2013000642 2014-02-15 maple 0
2013000778 2016-07-08 maple 1", header = T)
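The question asks for labels with a two-digit second year, such as "2015-16 Q3". A minimal base R sketch of that exact format follows; fy_quarter is a hypothetical helper name introduced here for illustration, not part of any package:

```r
# Hypothetical helper: label a Date with its financial year and quarter,
# assuming the financial year starts on 1 July
fy_quarter <- function(d) {
  m <- as.integer(format(d, "%m"))
  y <- as.integer(format(d, "%Y"))
  fy <- ifelse(m >= 7L, y, y - 1L)        # calendar year in which the FY starts
  q <- ((m - 7L) %% 12L) %/% 3L + 1L      # July-Sept is Q1, ..., April-June is Q4
  sprintf("%d-%02d Q%d", fy, (fy + 1L) %% 100L, q)
}
fy_quarter(as.Date(c("2016-02-13", "2016-07-08", "2013-06-11")))
# "2015-16 Q3" "2016-17 Q1" "2012-13 Q4"
```

Because the label starts with the four-digit starting year, sorting the labels alphabetically also sorts them chronologically, which is convenient for a discrete ggplot x-axis.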

R ggplot by month and values group by Week

With ggplot2, I would like to create a multiplot (facet_grid) where each plot is the weekly count values for the month.
My data are like this :
day_group count
1 2012-04-29 140
2 2012-05-06 12595
3 2012-05-13 12506
4 2012-05-20 14857
For this dataset I have created two other columns, Month and Week, based on day_group:
day_group count Month Week
1 2012-04-29 140 Apr 17
2 2012-05-06 12595 May 18
3 2012-05-13 12506 May 19
4 2012-05-20 14857 May 2
Now I would like for each Month to create a barplot where I have the sum of the count values aggregated by week. So for example for a year I would have 12 plots with 4 bars (one per week).
Below is what I use to generate the plot :
ggplot(data = count_by_day, aes(x=day_group, y=count)) +
stat_summary(fun.y="sum", geom = "bar") +
scale_x_date(date_breaks = "1 month", date_labels = "%B") +
facet_grid(facets = Month ~ ., scales="free", margins = FALSE)
So far, my plot looks like this
https://dl.dropboxusercontent.com/u/96280295/Rplot.png
As you can see, the x-axis is not what I'm looking for. Instead of showing only weeks 1, 2, 3 and 4, it displays the whole date range in every panel.
Do you know what I must change to get what I'm looking for?
Thanks for your help
Okay, now that I see what you want, I wrote a small program to illustrate it. The key to your month-ordering problem is making month a factor with the levels in the right order:
library(dplyr)
library(ggplot2)
#initialization
set.seed(1234)
sday <- as.Date("2012-01-01")
eday <- as.Date("2012-07-31")
# List of the first day of the months
mfdays <- seq(sday,length.out=12,by="1 month")
# list of months - this is key to keeping the order straight
mlabs <- months(mfdays)
# list of first weeks of the months
mfweek <- trunc((mfdays-sday)/7)
names(mfweek) <- mlabs
# Generate a bunch of event-days, and then months, then week numbs in our range
n <- 1000
edf <-data.frame(date=sample(seq(sday,eday,by=1),n,T))
edf$month <- factor(months(edf$date),levels=mlabs) # use the factor in the right order
edf$week <- 1 + as.integer(((edf$date-sday)/7) - mfweek[edf$month])
# Now summarize with dplyr
ndf <- group_by(edf,month,week) %>% summarize( count = n() )
ggplot(ndf) + geom_bar(aes(x=week,y=count),stat="identity") + facet_wrap(~month,nrow=1)
Yielding:
(As an aside, I am kind of proud I did this without lubridate ...)
I think you have to do this but I am not sure I understand your question:
ggplot(data = count_by_day, aes(x=Week, y=count, group= Month, color=Month))

How to plot values for a subgroup based on time in R?

So currently I have a dataset that looks like this:
yearMon V1
1 012011 2.534161
2 012012 1.818421
3 012013 1.635179
4 012014 1.609195
5 012015 1.794979
6 022011 3.408389
7 022012 1.756303
8 022013 1.577855
9 022014 1.511905
10 022015 1.748879
11 032011 2.664336
12 032012 1.912023
13 032013 1.408602
14 032014 1.646091
15 032015 1.705069
16 042011 2.532895
17 042012 3.342926
18 042013 3.056657
I want to plot the averages for a certain month every year, IE the averages for March 2011, March 2012, March 2013, March 2014 all in one graph, and repeat this for each of the 12 months. How would I go about doing this?
1) monthplot Convert the data to zoo (using "yearmon" class -- we also show in the comments an alternative converter) and then to "ts" class and then use monthplot (in the base of R) with the "ts" object (or further below we use autoplot.zoo (which uses the ggplot2 package) with the zoo object).
library(zoo)
# to_yearmon <- function(x) as.yearmon((x %% 10000) + (x %/% 10000 - 1) / 12)
to_yearmon <- function(x) as.yearmon(sub("(.*)(....)$", "\\2-\\1", x))
ser_zoo <- read.zoo(ser_df, FUN = to_yearmon) # convert DF to zoo
ser_ts <- as.ts(ser_zoo) # convert zoo to ts
monthplot(ser_ts)
2) autoplot.zoo We show how to plot (i) one line per year (2011, 2012, ...) all in one chart and (ii) in separate panels and (iii) one line per month (1, 2, 3, ...) all in one chart and (iv) separate panels.
We create a data frame ser_df2 with 3 columns representing month, year and the value of the series. Then we convert this long form series to a wide form, ser_zoo2, with times 1, 2, 3, ... representing the months and one column per year. We also convert this long form series to a wide form, ser_zoo3, with times 2011, 2012, ... representing years and one column per month. By plotting each of these in a single panel and in multiple panels we get 2x2 = 4 charts.
library(ggplot2)
library(gridExtra)
ser_df2 <- data.frame(month = cycle(ser_zoo),
year = floor(as.numeric(time(ser_zoo))),
ser = coredata(ser_zoo))
ser_zoo2 <- read.zoo(ser_df2, index = 1, split = 2) # split into one column per year
p1 <- autoplot(ser_zoo2, facet = NULL)
p2 <- autoplot(ser_zoo2)
ser_zoo3 <- read.zoo(ser_df2, index = 2, split = 1) # split into one column per month
p3 <- autoplot(ser_zoo3, facet = NULL)
p4 <- autoplot(ser_zoo3)
grid.arrange(p1, p3, p2, p4, ncol = 2)
Note: We used this as the input data frame ser_df:
Lines <- "
yearMon V1
1 012011 2.534161
2 012012 1.818421
3 012013 1.635179
4 012014 1.609195
5 012015 1.794979
6 022011 3.408389
7 022012 1.756303
8 022013 1.577855
9 022014 1.511905
10 022015 1.748879
11 032011 2.664336
12 032012 1.912023
13 032013 1.408602
14 032014 1.646091
15 032015 1.705069
16 042011 2.532895
17 042012 3.342926
18 042013 3.056657
"
ser_df <- read.table(text = Lines)
Here is a way to do it more explicitly with ggplot:
library(dplyr)
library(ggplot2)
library(lubridate)
data %>%
mutate(date =
yearMon %>%
parse_date_time("%m%y"),
month =
date %>%
format("%B") %>%
ordered(month.name),
year =
date %>%
format("%Y") %>%
as.numeric) %>%
ggplot +
aes(x = year, y = V1, color = month) +
geom_line()
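One thing to watch with codes like 012011: if the column is read as a number, the leading zero is dropped (12011), which can confuse fixed-width parsing. A base R sketch that restores the six-digit form before splitting it; the yearMon vector below is made up to mimic what read.table would store:

```r
# Made-up values as read.table would store them (leading zero lost)
yearMon <- c(12011, 22012, 122015)
ym <- sprintf("%06d", as.integer(yearMon))   # back to "012011", "022012", "122015"
month <- as.integer(substr(ym, 1, 2))
year <- as.integer(substr(ym, 3, 6))
# Month as a factor in calendar order, ready for colour = or facetting
month_f <- factor(month.name[month], levels = month.name)
data.frame(ym, month, year, month_f)
```

Alternatively, reading the column as character in the first place (for example with colClasses = "character" in read.table) avoids the problem.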

Extract Date in R

I struggle mightily with dates in R and could do this pretty easily in SPSS, but I would love to stay within R for my project.
I have a date column in my data frame and want to remove the year completely in order to leave the month and day. Here is a peek at my original data.
> head(ds$date)
[1] "2003-10-09" "2003-10-11" "2003-10-13" "2003-10-15" "2003-10-18" "2003-10-20"
> class((ds$date))
[1] "Date"
This is what I "want" it to be:
> head(ds$date)
[1] "10-09" "10-11" "10-13" "10-15" "10-18" "10-20"
> class((ds$date))
[1] "Date"
If possible, I would love to set the first date to be October 1st instead of January 1st.
Any help you can provide will be greatly appreciated.
EDIT: I felt like I should add some context. I want to plot an NHL player's performance over the course of a season which starts in October and ends in April. To add to this, I would like to facet the plots by each season which is a separate column in my data frame. Because I want to compare cumulative performance over the course of the season, I believe that I need to remove the year portion, but maybe I don't; as I indicated, I struggle with dates in R. What I am looking to accomplish is a plot that compares cumulative performance over relative dates by season and have the x-axis start in October and end in April.
> d = as.Date("2003-10-09", format="%Y-%m-%d")
> format(d, "%m-%d")
[1] "10-09"
Is this what you are looking for?
library(ggplot2)
library(reshape2)  # provides melt()
## make up data for two seasons a and b
a = as.Date("2010/10/1")
b = as.Date("2011/10/1")
a.date <- seq(a, by='1 week', length=28)
b.date <- seq(b, by='1 week', length=28)
## make up some score data
a.score <- abs(trunc(rnorm(28, mean = 10, sd = 5)))
b.score <- abs(trunc(rnorm(28, mean = 10, sd = 5)))
## create a data frame
df <- data.frame(a.date, b.date, a.score, b.score)
df
## Since I am using ggplot, I had better create a "long format" data frame
df.molt <- melt(df, measure.vars = c("a.score", "b.score"))
levels(df.molt$variable) <- c("First season", "Second season")
df.molt
Then, I am using ggplot2 for plotting the data:
## plot it
ggplot(aes(y = value, x = a.date), data = df.molt) + geom_point() +
geom_line() + facet_wrap(~variable, ncol = 1) +
scale_x_date("Date", date_labels = "%m-%d")
If you want to modify the x-axis (e.g., the display format), then you'll probably be interested in scale_x_date.
You have to remember that Date is a numeric format, representing the number of days passed since the "origin" of the internal date counting:
> str(Date)
Class 'Date' num [1:10] 14245 14360 14475 14590 14705 ...
This is the same idea as in Excel, if you want a reference. Hence the solution with format is perfectly valid.
Now if you want to set the first date of a year as October 1st, you can construct a year index like this:
redefine.year <- function(x,start="10-1"){
year <- as.numeric(strftime(x,"%Y"))
yearstart <- as.Date(paste(year,start,sep="-"))
year + (x >= yearstart) - min(year) + 1
}
Testing code :
Start <- as.Date("2009-1-1")
Stop <- as.Date("2011-11-1")
Date <- seq(Start,Stop,length.out=10)
data.frame( Date=as.character(Date),
year=redefine.year(Date))
gives
Date year
1 2009-01-01 1
2 2009-04-25 1
3 2009-08-18 1
4 2009-12-11 2
5 2010-04-05 2
6 2010-07-29 2
7 2010-11-21 3
8 2011-03-16 3
9 2011-07-09 3
10 2011-11-01 4
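For the plot described in the question (cumulative performance over relative dates, faceted by season), a complementary index is the number of days since the start of the season. A minimal sketch, where season_day is a hypothetical helper name and the 1 October start is taken from the question:

```r
# Hypothetical helper: days elapsed since the 1st of start_month
# in the season that a date belongs to
season_day <- function(x, start_month = 10L) {
  yr <- as.integer(format(x, "%Y"))
  mo <- as.integer(format(x, "%m"))
  # Dates before the start month belong to the season that began the previous year
  season_start_year <- yr - (mo < start_month)
  season_start <- as.Date(sprintf("%d-%02d-01", season_start_year, start_month))
  as.integer(x - season_start)
}
season_day(as.Date(c("2003-10-09", "2004-01-01")))
# 8 92
```

Plotting against this index puts every season on the same October-to-April axis, so the year no longer needs to be stripped from the Date column itself.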
