R ggplot by month and values group by Week - r

With ggplot2, I would like to create a multi-panel plot (facet_grid) where each panel shows the weekly count values for one month.
My data look like this:
day_group count
1 2012-04-29 140
2 2012-05-06 12595
3 2012-05-13 12506
4 2012-05-20 14857
For this dataset I have created two other columns, Month and Week, based on day_group:
day_group count Month Week
1 2012-04-29 140 Apr 17
2 2012-05-06 12595 May 18
3 2012-05-13 12506 May 19
4 2012-05-20 14857 May 2
Now, for each Month, I would like to create a bar plot showing the sum of the count values aggregated by week. For example, for a year I would have 12 plots with 4 bars each (one per week).
Below is what I use to generate the plot:
ggplot(data = count_by_day, aes(x=day_group, y=count)) +
stat_summary(fun.y="sum", geom = "bar") +
scale_x_date(date_breaks = "1 month", date_labels = "%B") +
facet_grid(facets = Month ~ ., scales="free", margins = FALSE)
So far, my plot looks like this:
https://dl.dropboxusercontent.com/u/96280295/Rplot.png
As you can see, the x axis is not what I'm looking for. Instead of showing only weeks 1, 2, 3 and 4 in each panel, it spans all the months.
Do you know what I must change to get what I'm looking for?
Thanks for your help.

Okay, now that I see what you want, I wrote a small program to illustrate it. The key to your month-ordering problem is making month a factor with the levels in the right order:
library(dplyr)
library(ggplot2)
#initialization
set.seed(1234)
sday <- as.Date("2012-01-01")
eday <- as.Date("2012-07-31")
# List of the first day of the months
mfdays <- seq(sday,length.out=12,by="1 month")
# list of months - this is key to keeping the order straight
mlabs <- months(mfdays)
# list of first weeks of the months
mfweek <- trunc((mfdays-sday)/7)
names(mfweek) <- mlabs
# Generate a bunch of event-days, and then months, then week numbs in our range
n <- 1000
edf <-data.frame(date=sample(seq(sday,eday,by=1),n,T))
edf$month <- factor(months(edf$date),levels=mlabs) # use the factor in the right order
edf$week <- 1 + as.integer(((edf$date-sday)/7) - mfweek[edf$month])
# Now summarize with dplyr
ndf <- group_by(edf,month,week) %>% summarize( count = n() )
ggplot(ndf) + geom_bar(aes(x=week,y=count),stat="identity") + facet_wrap(~month,nrow=1)
Yielding:
(As an aside, I am kind of proud I did this without lubridate ...)

I am not sure I understand your question, but I think you need something like this:
ggplot(data = count_by_day, aes(x=Week, y=count, group= Month, color=Month))
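A minimal sketch of one way to get a week-within-month x axis, assuming the data has the day_group and count columns shown in the question. The WeekOfMonth column and the weekly data frame are hypothetical names introduced here for illustration; they bucket the day of the month into 7-day blocks, which is only one possible definition of "week of the month".

```r
library(ggplot2)

# Stand-in for the question's data (same four rows as shown there)
count_by_day <- data.frame(
  day_group = as.Date(c("2012-04-29", "2012-05-06", "2012-05-13", "2012-05-20")),
  count     = c(140, 12595, 12506, 14857)
)

# Month as a factor in calendar order (locale-independent)
count_by_day$Month <- factor(
  month.abb[as.integer(format(count_by_day$day_group, "%m"))],
  levels = month.abb
)

# Assumption: week of month = day of month bucketed into 7-day blocks (1..5)
count_by_day$WeekOfMonth <-
  (as.integer(format(count_by_day$day_group, "%d")) - 1) %/% 7 + 1

# Sum the counts per month and week of month
weekly <- aggregate(count ~ Month + WeekOfMonth, data = count_by_day, FUN = sum)

# One panel per month, one bar per week of that month
p <- ggplot(weekly, aes(x = factor(WeekOfMonth), y = count)) +
  geom_col() +
  facet_grid(Month ~ .) +
  labs(x = "Week of month")
p
```

Because the x axis is now a small discrete week number rather than a date, scale_x_date is no longer needed.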

Related

How to plot Time series without breaks caused by missing dates?

This question has been asked multiple times but I cannot find any that fit my needs.
My goal is to plot a time series for one month over multiple years. The following JAN data frame was created by subsetting a data frame containing daily rainfall for the entire year.
> head(JAN)
DATE RCM GPM TRI
1: 2000-01-01 0.012182957 NA NA
2: 2000-01-02 0.001769934 NA NA
3: 2000-01-03 0.007916438 NA NA
4: 2000-01-04 0.008227825 NA NA
5: 2000-01-05 0.005192382 NA NA
6: 2000-01-06 0.065458169 NA NA
The data frame covers the month of January and contains daily records over 20 years.
With the following code I got a plot with gaps between the years:
dfmelt<-melt(JAN,id.vars="DATE")
ggplot(dfmelt,aes(x=DATE,y=value,
col=variable,group = lubridate::year(DATE)))+
labs(title='JANUARY')+
geom_line()
I'm assuming this is because my data consists only of January values, so the plot leaves gaps for February to December.
I want to avoid this so that I can see the trend of precipitation over the years for the month of January.
Introducing breaks gives the following:
breaks <- unique(as.Date(cut(dfmelt$DATE, "month")))
ba2 <- transform(dfmelt, year = as.integer(format(DATE, "%Y")))
p <- ggplot(ba2, aes(x=DATE,y=value,
col=variable)) +
geom_line() +
facet_grid(cols = vars(year), scales = "free_x", space = "free_x")
p + scale_x_date(breaks = breaks, date_labels = "%b")
Is there any way to get a continuous plot, basically joining the lines together, using any other package or language?
Suppose we have the data frame df1 shown in the Note at the end which has a values column with 22 * 31 = 682 rows, one for each of the 31 dates in January for each of the 22 years from 2000 to 2021.
Then convert to ts with frequency 31 and plot.
tt <- ts(df1$values, start = 2000, freq = 31)
plot(tt)
or to use ggplot2
library(ggplot2)
library(zoo)
z <- as.zoo(tt)
autoplot(z)
Note
set.seed(123)
date <- seq(as.Date("2000-01-01"), as.Date("2021-12-31"), 1)
values <- seq_along(date)
df1 <- subset(data.frame(date, values), months(date) == "January")
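As an alternative sketch in plain ggplot2 (not part of the answer above), the gaps can also be avoided by plotting against the observation index instead of the date and labelling the axis with years. The idx column and year_starts vector are hypothetical names used only for this illustration, and the data is the same synthetic df1 idea as in the Note.

```r
library(ggplot2)

# Synthetic January-only data, one row per January day from 2000 to 2021
date   <- seq(as.Date("2000-01-01"), as.Date("2021-12-31"), 1)
values <- seq_along(date)
df1    <- subset(data.frame(date, values), format(date, "%m") == "01")

# Running index: consecutive Januaries sit next to each other on the x axis
df1$idx <- seq_len(nrow(df1))

# Positions of each 1 January, used as axis breaks labelled with the year
year_starts <- df1$idx[format(df1$date, "%d") == "01"]

p <- ggplot(df1, aes(x = idx, y = values)) +
  geom_line() +
  scale_x_continuous(
    breaks = year_starts,
    labels = format(df1$date[year_starts], "%Y")
  ) +
  labs(x = "Year (January only)")
p
```

The trade-off is that the x axis is no longer a true time axis: distances between points are equal even where eleven months are missing.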

R Coding for ggridges

I am new to coding in R so please excuse the simple question. I am trying to run ggridges geom in R to create monthly density plots. The code is below, but it creates a plot with the months in the wrong order:
The code reads a CSV data file with 3 columns (see image): MST, Month, and Aeco_5a. Any suggestions on how to fix this would be greatly appreciated. Here is my code:
> library(ggridges)
> read_csv("C:/Users/Calvin Johnson/Desktop/Aeco_Price_2017.csv")
Parsed with column specification:
cols(
MST = col_character(),
Month = col_character(),
Aeco_5a = col_double()
)
# A tibble: 365 x 3
MST Month Aeco_5a
<chr> <chr> <dbl>
1 1/1/2017 January 3.2678
2 1/2/2017 January 3.2678
3 1/3/2017 January 3.0570
4 1/4/2017 January 2.7811
5 1/5/2017 January 2.6354
6 1/6/2017 January 2.7483
7 1/7/2017 January 2.7483
8 1/8/2017 January 2.7483
9 1/9/2017 January 2.5905
10 1/10/2017 January 2.6902
# ... with 355 more rows
>
> mins<-min(Aeco_Price_2017$Aeco_5a)
> maxs<-max(Aeco_Price_2017$Aeco_5a)
>
> ggplot(Aeco_Price_2017,aes(x = Aeco_5a,y=Month,height=..density..))+
+ geom_density_ridges(scale=3) +
+ scale_x_continuous(limits = c(mins,maxs))
This has two parts: (1) you want your months to be a factor instead of chr, and (2) you need to order the factor levels the way we typically order months.
With some reproducible data:
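library(ggridges)
library(tidyverse) # as_tibble(), gather(), mutate(), ggplot()
# one column of 10 random values per month, named "Jan", "Feb", ...
df <- sapply(month.abb, function(x) { rnorm(10, rnorm(1), sd = 1)})
# long format: columns "month" and "value"
df <- as_tibble(df) %>% gather(key = "month")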
Then you need to mutate month to be a factor, using levels in the order they appear in the data frame (unique gives the distinct values in the order they first occur in your data: "Jan", "Feb", ...). Finally, reverse the levels: the first level is drawn at the bottom of the y axis, so without rev() "Jan" would end up at the bottom.
df %>%
# switch to factor, and define the levels the way you want them to show up
# in the ggplot; "Dec", "Nov", "Oct", ...
mutate(month = factor(month, levels = rev(unique(df$month)))) %>%
ggplot(aes(x = value, y = month)) +
geom_density_ridges()
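Applied to data shaped like the question's file, where Month holds full month names, the same idea might look like the sketch below. The prices data frame is a made-up stand-in for the CSV contents (random values, not real prices), and using the built-in month.name vector for the levels is an assumption that the names in the file are the English full names.

```r
library(ggplot2)
library(ggridges)

# Hypothetical stand-in for the CSV: full month names plus a numeric column
set.seed(1)
prices <- data.frame(
  Month   = rep(month.name, each = 30),
  Aeco_5a = rnorm(360, mean = 2.5, sd = 0.4),
  stringsAsFactors = FALSE
)

# Calendar order, reversed so that January is drawn at the top of the plot
prices$Month <- factor(prices$Month, levels = rev(month.name))

p <- ggplot(prices, aes(x = Aeco_5a, y = Month)) +
  geom_density_ridges(scale = 3)
p
```

Using month.name rather than unique() keeps the order correct even if the rows of the file are not sorted by date.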

How to plot values for a subgroup based on time in R?

So currently I have a dataset that looks like this:
yearMon V1
1 012011 2.534161
2 012012 1.818421
3 012013 1.635179
4 012014 1.609195
5 012015 1.794979
6 022011 3.408389
7 022012 1.756303
8 022013 1.577855
9 022014 1.511905
10 022015 1.748879
11 032011 2.664336
12 032012 1.912023
13 032013 1.408602
14 032014 1.646091
15 032015 1.705069
16 042011 2.532895
17 042012 3.342926
18 042013 3.056657
I want to plot the averages for a certain month every year, IE the averages for March 2011, March 2012, March 2013, March 2014 all in one graph, and repeat this for each of the 12 months. How would I go about doing this?
1) monthplot Convert the data to zoo (using "yearmon" class -- we also show in the comments an alternative converter) and then to "ts" class and then use monthplot (in the base of R) with the "ts" object (or further below we use autoplot.zoo (which uses the ggplot2 package) with the zoo object).
library(zoo)
# to_yearmon <- function(x) as.yearmon((x %% 10000) + (x %/% 10000 - 1) / 12)
to_yearmon <- function(x) as.yearmon(sub("(.*)(....)$", "\\2-\\1", x))
ser_zoo <- read.zoo(ser_df, FUN = to_yearmon) # convert DF to zoo
ser_ts <- as.ts(ser_zoo) # convert zoo to ts
monthplot(ser_ts)
(continued after plot)
2) autoplot.zoo We show how to plot (i) one line per year (2011, 2012, ...) all in one chart and (ii) in separate panels and (iii) one line per month (1, 2, 3, ...) all in one chart and (iv) separate panels.
We create a data frame ser_df2 with 3 columns representing month, year and the value of the series. Then we convert this long form series to a wide form, ser_zoo2, with times 1, 2, 3, ... representing the months and one column per year. We also convert this long form series to a wide form, ser_zoo3, with times 2011, 2012, ... representing years and one column per month. By plotting each of these in a single panel and in multiple panels we get 2x2 = 4 charts which we show below.
library(ggplot2)
library(gridExtra)
ser_df2 <- data.frame(month = cycle(ser_zoo),
year = floor(as.numeric(time(ser_zoo))),
ser = coredata(ser_zoo))
ser_zoo2 <- read.zoo(ser_df2, index = 1, split = 2) # split into one column per year
p1 <- autoplot(ser_zoo2, facet = NULL)
p2 <- autoplot(ser_zoo2)
ser_zoo3 <- read.zoo(ser_df2, index = 2, split = 1) # split into one column per month
p3 <- autoplot(ser_zoo3, facet = NULL)
p4 <- autoplot(ser_zoo3)
grid.arrange(p1, p3, p2, p4, ncol = 2)
(click on chart to enlarge)
Note: We used this as the input data frame ser_df:
Lines <- "
yearMon V1
1 012011 2.534161
2 012012 1.818421
3 012013 1.635179
4 012014 1.609195
5 012015 1.794979
6 022011 3.408389
7 022012 1.756303
8 022013 1.577855
9 022014 1.511905
10 022015 1.748879
11 032011 2.664336
12 032012 1.912023
13 032013 1.408602
14 032014 1.646091
15 032015 1.705069
16 042011 2.532895
17 042012 3.342926
18 042013 3.056657
"
ser_df <- read.table(text = Lines)
Here is a way to do it more explicitly with ggplot:
library(dplyr)
library(ggplot2)
library(lubridate)
data %>%
mutate(date =
yearMon %>%
parse_date_time("%m%y"),
month =
date %>%
format("%B") %>%
ordered(month.name),
year =
date %>%
format("%Y") %>%
as.numeric) %>%
ggplot +
aes(x = year, y = V1, color = month) +
geom_line()
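For comparison, here is a sketch of the same split into month and year using only base R string handling before plotting. The ser data frame is a small excerpt of the question's data, and the sprintf() step is an assumption added to cope with yearMon having been read as a number, which drops the leading zero.

```r
library(ggplot2)

# A few rows from the question's data
ser <- data.frame(
  yearMon = c("012011", "012012", "022011", "022012"),
  V1      = c(2.534161, 1.818421, 3.408389, 1.756303),
  stringsAsFactors = FALSE
)

# Restore a 6-character "mmYYYY" string even if the column was read as a number
ym <- sprintf("%06d", as.integer(ser$yearMon))

# First two characters are the month, last four are the year
ser$month <- factor(month.name[as.integer(substr(ym, 1, 2))], levels = month.name)
ser$year  <- as.integer(substr(ym, 3, 6))

# One line per month, years on the x axis
p <- ggplot(ser, aes(x = year, y = V1, colour = month)) +
  geom_line() +
  geom_point()
p
```

Adding facet_wrap(~ month) to the plot would give one panel per month instead of one coloured line per month.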

Extracting a point from ggplot and plot it

I am initially having the dataset as shown below:
ID A B Type Time Date
1 12 13 R 23:20 1-1-01
1 13 12 F 23:40 1-1-01
1 13 11 F 00:00 2-1-01
1 15 10 R 00:20 2-1-01
1 12 06 W 00:40 2-1-01
1 11 09 F 01:00 2-1-01
1 12 10 R 01:20 2-1-01
so on...
I tried to make the ggplot of the above dataset for A and B.
ggplot(data=dataframe, aes(x=A, y=B, colour = Type)) +geom_point()+geom_path()
Problem:
How do I add a subsetting variable that selects the first 24 hours after every 'F' point?
For the time being I have posted a data set that is continuous with respect to time, but my original data set is not. How can I make my data set continuous at intervals of 10 minutes? I have used the interpolation function xspline() on A and B, but I don't know how to make my data set continuous with respect to time.
The highlighted part shown below is what I am looking for, I want to extract this dataset and then plot a new ggplot:
From MarkusN plots this is what I am looking for:
Taking first point as 'F' point and traveling 24hrs from that point (Since there is no 24 hrs data set available here so it should produce like this) :
I've tried the following, maybe you can get an idea from here. I recommend you to first have a variable with the time ordered (either in minutes or hours, in this example I've used hours). Let's see if it helps
#a data set is built as an example
N = 100
set.seed(1)
dataframe = data.frame(A = cumsum(rnorm(N)),
B = cumsum(rnorm(N)),
Type = sample(c('R','F','W'), size = N,
prob = c(5/7,1/7,1/7), replace=T),
time.h = seq(0,240,length.out = N))
# here, a list with dataframes is built with the sequences
l_dfs = lapply(which(dataframe$Type == 'F'), function(i, .data){
transform(subset(.data[i:nrow(.data),], (time.h - time.h[1]) <= 24),
t0 = sprintf('t0=%4.2f', time.h[1]))
}, dataframe)
ggplot(data=do.call('rbind', l_dfs), aes(x=A, y=B, colour=Type)) +
geom_point() + geom_path(colour='black') + facet_wrap(~t0)
First I created sample data. Hope it's similar to your problem:
df = data.frame(id=rep(1:9), A=c(12,13,13,14,12,11,12,11,10),
B=c(13,12,10,12,6,9,10,11,12),
Type=c("F","R","F","R","W","F","R","F","R"),
datetime=as.POSIXct(c("2015-01-01 01:00:00","2015-01-01 22:50:00",
"2015-01-02 08:30:00","2015-01-02 23:00:00",
"2015-01-03 14:10:00","2015-01-05 16:30:00",
"2015-01-05 23:00:00","2015-01-06 17:00:00",
"2015-01-07 23:00:00")),
stringsAsFactors = F)
Your first question is to plot the data, highlighting the first 24h after an F-point. I used dplyr and ggplot for this task.
library(dplyr)
library(ggplot2)
df %>%
mutate(nf = cumsum(Type=="F")) %>% # build F-to-F groups
group_by(nf) %>%
mutate(first24h = as.numeric((datetime-min(datetime)) < (24*3600))) %>% # find the first 24h of each F-group
mutate(lbl=paste0(row_number(),"-",Type)) %>%
ggplot(aes(x=A, y=B, label=lbl)) +
geom_path(aes(colour=first24h)) + scale_size(range = c(1, 2)) +
geom_text()
The problem here is that the colour only changes at some points. One thing I'm not happy with is the use of different line colours for path sections. If first24h is a discrete variable, geom_path draws two separate paths. That's why I defined the variable as numeric. Maybe someone can improve this?
Your second question about an interpolation can easily be solved with the zoo package:
library(zoo)
full.time = seq(df$datetime[1], tail(df$datetime, 1), by=600) # new timeline with point at every 10 min
d.zoo = zoo(df[,2:3], df$datetime) # convert to zoo object
d.full = as.data.frame(na.approx(d.zoo, xout=full.time)) # interpolate; result is also a zoo object
d.full$datetime = as.POSIXct(rownames(d.full))
With these two data frames combined, you get the solution. Every F-to-F section is drawn in a separate panel, and only the points within 24h after the F-point are shown.
df %>%
select(Type, datetime) %>%
right_join(d.full, by="datetime") %>%
mutate(Type = ifelse(is.na(Type),"",Type)) %>%
mutate(nf = cumsum(Type=="F")) %>%
group_by(nf) %>%
mutate(first24h = (datetime-min(datetime)) < (24*3600)) %>%
filter(first24h == TRUE) %>%
mutate(lbl=paste0(row_number(),"-",Type)) %>%
ggplot(aes(x=A, y=B, label=Type)) +
geom_path() + geom_text() + facet_wrap(~ nf)
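The 10-minute interpolation step can also be sketched in base R with approx(), without zoo. This is an illustration on the first three rows of the sample data above; the names grid and interp are hypothetical, and tz = "UTC" is an assumption made so that the example behaves the same in every time zone.

```r
# First three sample observations (irregularly spaced in time)
datetime <- as.POSIXct(
  c("2015-01-01 01:00:00", "2015-01-01 22:50:00", "2015-01-02 08:30:00"),
  tz = "UTC"
)
A <- c(12, 13, 13)
B <- c(13, 12, 10)

# Regular timeline with one point every 10 minutes (600 seconds)
grid <- seq(datetime[1], datetime[length(datetime)], by = 600)

# Linear interpolation of A and B onto the regular timeline
interp <- data.frame(
  datetime = grid,
  A = approx(as.numeric(datetime), A, xout = as.numeric(grid))$y,
  B = approx(as.numeric(datetime), B, xout = as.numeric(grid))$y
)

head(interp)
```

Only numeric columns can be interpolated this way; a label such as Type still has to be joined back from the original rows, as in the pipeline above.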

Extract Date in R

I struggle mightily with dates in R and could do this pretty easily in SPSS, but I would love to stay within R for my project.
I have a date column in my data frame and want to remove the year completely in order to leave the month and day. Here is a peek at my original data.
> head(ds$date)
[1] "2003-10-09" "2003-10-11" "2003-10-13" "2003-10-15" "2003-10-18" "2003-10-20"
> class((ds$date))
[1] "Date"
I "want" it to be.
> head(ds$date)
[1] "10-09" "10-11" "10-13" "10-15" "10-18" "10-20"
> class((ds$date))
[1] "Date"
If possible, I would love to set the first date to be October 1st instead of January 1st.
Any help you can provide will be greatly appreciated.
EDIT: I felt like I should add some context. I want to plot an NHL player's performance over the course of a season which starts in October and ends in April. To add to this, I would like to facet the plots by each season which is a separate column in my data frame. Because I want to compare cumulative performance over the course of the season, I believe that I need to remove the year portion, but maybe I don't; as I indicated, I struggle with dates in R. What I am looking to accomplish is a plot that compares cumulative performance over relative dates by season and have the x-axis start in October and end in April.
> d = as.Date("2003-10-09", format="%Y-%m-%d")
> format(d, "%m-%d")
[1] "10-09"
Is this what you are looking for?
library(ggplot2)
library(reshape2) # provides melt()
## make up data for two seasons a and b
a = as.Date("2010/10/1")
b = as.Date("2011/10/1")
a.date <- seq(a, by='1 week', length=28)
b.date <- seq(b, by='1 week', length=28)
## make up some score data
a.score <- abs(trunc(rnorm(28, mean = 10, sd = 5)))
b.score <- abs(trunc(rnorm(28, mean = 10, sd = 5)))
## create a data frame
df <- data.frame(a.date, b.date, a.score, b.score)
df
## Since I am using ggplot I had better create a long-format data frame
df.molt <- melt(df, measure.vars = c("a.score", "b.score"))
levels(df.molt$variable) <- c("First season", "Second season")
df.molt
Then, I am using ggplot2 for plotting the data:
## plot it
ggplot(aes(y = value, x = a.date), data = df.molt) + geom_point() +
geom_line() + facet_wrap(~variable, ncol = 1) +
scale_x_date("Date", date_labels = "%m-%d")
If you want to modify the x-axis (e.g., display format), then you'll probably be interested in scale_x_date.
You have to remember that Date is a numeric format, representing the number of days passed since the "origin" of the internal date counting:
> str(Date)
Class 'Date' num [1:10] 14245 14360 14475 14590 14705 ...
This is the same as in Excel, if you want a reference. Hence the solution with format is perfectly valid.
Now if you want to set the first date of a year as October 1st, you can construct some year index like this :
redefine.year <- function(x,start="10-1"){
year <- as.numeric(strftime(x,"%Y"))
yearstart <- as.Date(paste(year,start,sep="-"))
year + (x >= yearstart) - min(year) + 1
}
Testing code :
Start <- as.Date("2009-1-1")
Stop <- as.Date("2011-11-1")
Date <- seq(Start,Stop,length.out=10)
data.frame( Date=as.character(Date),
year=redefine.year(Date))
gives
Date year
1 2009-01-01 1
2 2009-04-25 1
3 2009-08-18 1
4 2009-12-11 2
5 2010-04-05 2
6 2010-07-29 2
7 2010-11-21 3
8 2011-03-16 3
9 2011-07-09 3
10 2011-11-01 4
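To put several seasons on a common October-to-April axis, one option is to convert each date into the number of days since the start of its season. The helper below is a sketch under that assumption; season_day and its start_month argument are hypothetical names introduced here, not functions from any package.

```r
# Days elapsed since the 1st of start_month of the season the date falls in.
# Dates before start_month belong to the season that began the previous year.
season_day <- function(x, start_month = 10) {
  yr <- as.integer(format(x, "%Y"))
  mo <- as.integer(format(x, "%m"))
  season_start_year <- ifelse(mo >= start_month, yr, yr - 1L)
  season_start <- as.Date(sprintf("%d-%02d-01", season_start_year, start_month))
  as.integer(x - season_start)
}

d <- as.Date(c("2003-10-09", "2004-04-01", "2004-10-01"))
season_day(d)
# [1]   8 183   0
```

Plotting cumulative performance against season_day(date), faceted or coloured by season, lines the seasons up on the same relative x axis without needing a Date with the year removed.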
