I have crime data for the years 2018-2020. Each row represents one crime. For the sake of this example, let's assume that there are two variables: crimetype (e.g. theft, robbery) and date (when the crime was committed).
Some sample data:
data <- data.frame(date = sample(seq(as.Date('2018/01/01'), as.Date('2020/12/31'), by = "day"), 100000, replace = TRUE),
crimetype = sample(c("A", "B", "C"), 100000, replace = TRUE))
My goal is to create a line plot for, let's say, type "A" crimes. The x-axis should show the date (from January 1st to December 31st) and the y-axis the number of crimes per day. However, I want the three lines (one for each year) drawn on top of each other so that I can compare them, so there should be no year on the x-axis, or at least it should not be displayed.
[Sketch of the desired plot: three lines labelled 2018, 2019 and 2020, drawn over a shared x-axis running from Jan-1 to Dec-31, with n (crimes per day) on the y-axis.]
I was trying to create a new date-variable with all the dates in the same year (here 2020).
data <- data %>% mutate(daymonth = substr(date, 5, length(date)),
date_new = as.Date(paste("2020", daymonth, sep="")),
daymonth = NULL)
Is there a better way to do this and how can I plot the graph?
data_plot <- data %>% filter(crimetype == 'A')
ggplot(data = data_plot, aes(x = date_new, y = ?, color=format(date, "%Y")) + geom_line()
For working with dates, have a look at the lubridate package, which I use here to extract the year. You can also drop the year with format(date, "%d-%m"). The following approach is a bit of a hack: to keep a date axis but still get rid of the year, I set the year of every date to 2018. As for which variable to plot, simply count the observations to get the number of crimes per date. Finally, I set the breaks of the date axis to 1 month; adjust this as you like. Note that 2018 is not a leap year, so 29 February 2020 cannot be mapped and becomes NA, which is what triggers the warning below. Try this:
library(ggplot2)
library(dplyr)
library(lubridate)
data <- data.frame(date = sample(seq(as.Date('2018/01/01'), as.Date('2020/12/31'), by = "day"), 100000, replace = TRUE),
crimetype = sample(c("A", "B", "C"), 100000, replace = TRUE))
data_plot <- data %>%
mutate(
year = lubridate::year(date),
year = factor(year),
# A hack. Set year to 2018. Allows me to use a date axis
date_foo = as.Date(paste(2018, format(date, "%m-%d"), sep = "-"))) %>%
filter(crimetype == 'A') %>%
count(date, date_foo, year, crimetype)
ggplot(data = data_plot, aes(x = date_foo, y = n, color = year, group = year)) +
geom_line() +
scale_x_date(date_breaks = "1 month", date_labels = "%d-%m")
#> Warning: Removed 1 row(s) containing missing values (geom_path).
Created on 2020-03-28 by the reprex package (v0.3.0)
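One caveat with mapping every date onto a single dummy year is the leap day. The following is a minimal base-R sketch, assuming it is acceptable to use a leap year such as 2020 as the dummy year; the helper to_dummy_year() and the sample dates are illustrative and not part of the answer above.

```r
# Illustrative dates, including the 2020 leap day
dates <- as.Date(c("2018-03-15", "2019-12-31", "2020-02-29"))

# Hypothetical helper: keep month and day, replace the year
to_dummy_year <- function(x, year) {
  as.Date(paste(year, format(x, "%m-%d"), sep = "-"), format = "%Y-%m-%d")
}

in_2018 <- to_dummy_year(dates, 2018)  # 29 February has no counterpart in 2018
in_2020 <- to_dummy_year(dates, 2020)  # 2020 is a leap year, so every day maps

is.na(in_2018)
is.na(in_2020)
```

With a leap year as the dummy year no row should be dropped from the plot, at the cost of a one-day gap at 29 February in the lines for non-leap years.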
I want to perform an analysis of property prices across segments on a quarterly basis (x-axis) from 2016 Jan to 2019 Jan.
However, the column of data I would like to use is in a month format (c("19-Jan", "19-Feb", "19-Mar", "19-Apr", ..., "19-Dec")), that is, a "yy-mmm" character format.
I want the whole column converted from values like "19-Jan" to a quarter format such as c("Qtr 1 - 2016", "Qtr 2 - 2017", ... "Qtr 3 - 2018").
How can I convert the column containing character values of "19-Jan" to a quarter format?
I have attached my raw date format data in a google link since it has over 56,000 rows:
https://docs.google.com/spreadsheets/d/1cynVkZv0aJRjwFgvVzlSRG7G-6t96cAgXOZMTJdPkC8/edit#gid=80649901
Here is my previous graph with yearly analysis (which I want to convert to quarterly):
This is my code:
library(dplyr)
library(ggplot2)
URA_data <- read.csv('URAdata.csv')
options(scipen=999)
Plotly<-URA_data %>%
mutate(Year = 2000 + as.integer(substring(Date.of.Sale, 1, 2))) %>%
filter(Type.of.Sale %in% "Resale" & Type %in% "Condominium")%>%
group_by(Year,Market.Segment ) %>%
summarise(Price = mean(Price....))%>%
ggplot(aes(Year, Price, color = Market.Segment)) + geom_line()+ geom_point()+
labs(color="Segments")+
ggtitle("Median Property Prices by Market Segments ")+
xlab("Year")+ylab("Price (Median)")+
theme(
plot.title=element_text(color="red",size=14,face="bold.italic",hjust=0.5),
axis.title.x=element_text(color="blue",size=14,face="bold"),
axis.title.y=element_text(color="green",size=14,face="bold")
)
library(plotly)
Graph<-ggplotly(Plotly)
Graph
The zoo package has an as.yearqtr function that you can use to convert your months to quarters. Its format = argument lets you describe the format of your month data. You can then use zoo::scale_x_yearqtr to improve the x-axis formatting.
library(dplyr)
library(ggplot2)
library(zoo)
URA_data %>%
mutate(Quarter = as.yearqtr(Date.of.Sale, format = "%y-%b")) %>%
filter(Type.of.Sale %in% "Resale" & Type %in% "Condominium")%>%
group_by(Quarter,Market.Segment ) %>%
summarise(Price = mean(Price....))%>%
ggplot(aes(Quarter, Price, color = Market.Segment)) + geom_line()+ geom_point()+
scale_x_yearqtr(breaks = seq(from = as.yearqtr("2016-1"), to = as.yearqtr("2018-3"), by = 0.25),
lim = as.yearqtr(c("2016-1","2018-3"))) +
labs(color="Segments") + ggtitle("Median Property Prices by Market Segments ")+
xlab("Quarter")+ylab("Price (Median)")+
theme(plot.title=element_text(color="red",size=14,face="bold.italic",hjust=0.5),
axis.title.x=element_text(color="blue",size=14,face="bold"),
axis.text.x=element_text(angle = 45, vjust = 1, hjust = 1),
axis.title.y=element_text(color="green",size=14,face="bold"))
Download and fix data:
library(gsheet)
URA_data <- gsheet::gsheet2tbl("https://docs.google.com/spreadsheets/d/1cynVkZv0aJRjwFgvVzlSRG7G-6t96cAgXOZMTJdPkC8/edit#gid=80649901")
URA_data <- URA_data %>%
mutate(Type.of.Sale = `Type of Sale`, Date.of.Sale = `Date of Sale`,
Market.Segment = `Market Segment`, Price.... = `Price ($)`)
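If the exact label style "Qtr 1 - 2016" from the question is wanted, or if month abbreviations do not parse because "%b" depends on the session locale, the quarter can also be derived by hand. This is a minimal base-R sketch that does not use zoo; the sample values and variable names are made up for illustration, and it assumes two-digit years all belong to the 2000s, as in the question's own code.

```r
# Illustrative "yy-mmm" strings in the same style as Date.of.Sale
sale_month <- c("16-Jan", "17-Apr", "18-Sep", "19-Dec")

# Two-digit year -> four-digit year (assumes the 2000s)
yr <- 2000L + as.integer(substr(sale_month, 1, 2))

# month.abb holds English abbreviations regardless of locale
mon <- match(substr(sale_month, 4, 6), month.abb)

# Months 1-3 -> quarter 1, 4-6 -> quarter 2, and so on
qtr <- (mon - 1L) %/% 3L + 1L

quarter_label <- sprintf("Qtr %d - %d", qtr, yr)

# Numeric value that sorts chronologically, usable as an x position
quarter_num <- yr + (qtr - 1) / 4
```

Plotting against quarter_num and using quarter_label for the tick labels keeps the quarters in chronological order, which a plain character label would not.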
I am looking at data from Nov to April and would like a plot that runs from Nov to April. Below is my sample code to pick out the months of interest.
library(tidyverse)
library(lubridate) # year(), month(), day()
mydata = data.frame(seq(as.Date("2010-01-01"), to=as.Date("2011-12-31"),by="days"), A = runif(730,10,50))
colnames(mydata) = c("Date", "A")
DF = mydata %>%
mutate(Year = year(Date), Month = month(Date), Day = day(Date)) %>%
filter(Month == 11 | Month == 12 | Month == 01 | Month == 02 | Month == 03 | Month == 04)
I tried to re-order the data starting at month 11 followed by month 12 and then month 01,02,03,and,04. I used the code factor(Month, levels = c(11,12,01,02,03,04)) along with the code above but it didn't work.
I wanted a plot that starts at month Nov and ends on April. The following code gave me attached plot
ggplot(data = DF, aes(Month,A))+
geom_bar(stat = "identity")+ facet_wrap(~Year, ncol = 2)
Right now the plot starts at January and runs all the way to December, which I don't want. I want the plot to start at November and run through April. I tried to label the plot using scale_x_date(labels = date_format("%b", date_breaks = "month", name = "Month"), which didn't work. Any help would
I converted Month to character before applying factor() and it worked.
DF = mydata %>%
mutate(Year = year(Date), Month = month(Date), Day = day(Date)) %>%
filter(Month %in% c(11, 12, 1, 2, 3, 4)) %>%
mutate(Month = sprintf("%02d", Month)) %>%
mutate(Month = factor(Month, levels = c("11","12","01","02","03","04")))
ggplot(data = DF, aes(Month,A))+
geom_bar(stat = "identity")+ facet_wrap(~Year, ncol = 2)
Output:
user2332849's answer is close but introduces an error: the bars are not in the correct order. For example, for 2010 the plot shows November's and December's data before the data from the beginning of the year. To plot in the proper order, the year needs adjusting so that each "year" starts in month 11 and runs to month 4.
#Convert month to Factor and set desired order
DF$Month<- factor(DF$Month, levels=c(11, 12, 1, 2, 3, 4))
#Adjust the year to match the year of the beginning of series
#For example assign Jan, Feb, Mar and April to prior year
DF$Year<-ifelse(as.integer(as.character(DF$Month)) <6, DF$Year-1, DF$Year)
#plot
ggplot(data = DF, aes(Month,A))+
geom_bar(stat = "identity") +
facet_wrap(~Year, ncol = 3)
In the plot below, the first 4 months of 2010 are shifted to become the last 4 periods of the prior year, and the last 2 months of 2011 are ready for the first 4 months of 2012.
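The two steps used here, reordering the month factor and assigning January to April to the preceding year, can be sketched in base R as follows. The sample dates and the names season_start and month_f are illustrative only.

```r
# Illustrative dates from one Nov-Apr season
dates <- as.Date(c("2010-11-15", "2010-12-15", "2011-01-15", "2011-04-15"))

m <- as.integer(format(dates, "%m"))
y <- as.integer(format(dates, "%Y"))

# Jan-Apr belong to the season that started in the previous calendar year
season_start <- ifelse(m < 6, y - 1L, y)

# Factor levels fix the plotting order: Nov, Dec, Jan, Feb, Mar, Apr
month_f <- factor(month.abb[m], levels = month.abb[c(11, 12, 1:4)])

levels(month_f)
season_start
```

Faceting on season_start instead of the calendar year then puts all six months of one winter in the same panel.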
Is there any way to shift the dates of a seasonal graph so that they match an arbitrary fiscal year (for example MARCH/FEB instead of DEC/JAN)?
I have this so far:
library(lubridate) # today(), year(), month(), yday(), days()
library(ggplot2)
startMonth = 3 # march
startDate = as.Date(paste0(year(today()), '-', startMonth, '-1'))
dts = seq.Date(from = today() - 500, to = today(), by = 'day')
dat = data.frame(date = dts, value = runif(n = length(dts), min = 1, max = 10))
dat$month = month(dat$date)
dat$year = year(dat$date)
dat$yearPlot = ifelse(test = dat$month < startMonth, yes = (dat$year - 1), no = dat$year)
dat$year = as.character(dat$year)
dat$ydaydiff = yday(dat$date) - yday(startDate)
dat$datePlot1 = ifelse(dat$ydaydiff < 0, dat$ydaydiff + 365, dat$ydaydiff)
dat$datePlot1 = as.Date('0001-01-01') + days(dat$datePlot1)
dat$yearPlot = as.character(dat$yearPlot)
ggplot(dat) +
geom_path(aes(x = datePlot1, y = value, color = yearPlot)) +
scale_x_date(date_labels = '%b')
Which makes this plot:
However I'd like the x-axis to start at March instead of Jan. Is there any way to adjust this? I thought of using the month column in dat but not sure how to implement.
Here is a not-too-pretty solution. The logic somewhat follows your own: find the starting date/time for each fiscal year (March 1 = time 1) and the last date/time (Feb 28 = time 365). Use this separate 'time' variable as your x-axis, then re-label the tick marks. You can change the scale_x_continuous() breaks and labels to get your desired dates along the x-axis.
t <- data.frame(date=seq.Date(as.Date('2018-03-01'),as.Date('2020-02-28'),by='days'),
fy=1)
t$fy[t$date>='2019-03-01'] <- 2
t <- t %>% group_by(fy) %>% mutate(time = row_number())
dat <- left_join(dat, t, by = "date")
dat %>% ggplot(.) +
geom_path(aes(x = time, y = value, color = factor(fy),group=fy)) +
scale_x_continuous(breaks = c(1,100,200,300),labels=c('March 1','June 8','Sept 16','Dec 25'))
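The lookup table above is tied to two hard-coded fiscal years. As an alternative, the position of a date within its fiscal year can be computed directly. This is a minimal base-R sketch; fiscal_day() is a hypothetical helper introduced here, and it assumes the fiscal year starts on the first day of start_month.

```r
# Hypothetical helper: day number within the fiscal year (1 = first day)
fiscal_day <- function(x, start_month = 3) {
  y <- as.integer(format(x, "%Y"))
  m <- as.integer(format(x, "%m"))
  # Dates before the start month belong to the fiscal year begun the year before
  fy <- ifelse(m >= start_month, y, y - 1L)
  fy_start <- as.Date(sprintf("%d-%02d-01", fy, as.integer(start_month)))
  as.integer(x - fy_start) + 1L
}

example_dates <- as.Date(c("2019-03-01", "2020-02-29", "2020-03-01"))
example_days <- fiscal_day(example_dates)
example_days
```

Because the day number is counted from the actual start date, leap days are handled without a fixed 365-day correction.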
The breaks argument of scale_x_date() accepts scales::breaks_width(), whose offset argument lets you shift the yearly breaks by a few months.
The labels argument accepts a function to format the labels as a fiscal year, e.g. to convert a date 2019-03-01 to "19/20".
# Function to create fiscal year labels like "14/15" for the 2014/15 fiscal year
fiscal_year <- function(x) {
year_number <- lubridate::year(x)
paste(substr(year_number, 3, 4),
substr(year_number + 1, 3, 4),
sep = "/")
}
ggplot(dat) +
geom_path(aes(x = date, y = value, color = yearPlot)) +
scale_x_date(labels = fiscal_year, # Use the function to create the labels
breaks = scales::breaks_width("1 year", offset = 90)) # Offset by 90 days to March
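The fiscal_year() function above labels a date by its calendar year, which is right when the labelled breaks fall on or after the start of the fiscal year. For labelling arbitrary dates, a variant that first checks the month might look like the sketch below. fiscal_label() and its start_month parameter are hypothetical names introduced for illustration.

```r
# Hypothetical variant: label any date with its fiscal year, e.g. "19/20"
fiscal_label <- function(x, start_month = 3) {
  y <- as.integer(format(x, "%Y"))
  m <- as.integer(format(x, "%m"))
  # Months before the start month fall in the fiscal year begun the year before
  fy <- ifelse(m >= start_month, y, y - 1L)
  sprintf("%02d/%02d", fy %% 100L, (fy + 1L) %% 100L)
}

labels_out <- fiscal_label(as.Date(c("2019-03-01", "2020-02-28", "2020-03-01")))
labels_out
```

Since it takes a vector of dates and returns a character vector of the same length, it can be passed to the labels argument in the same way as fiscal_year().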
I made a data frame with the columns Year, Month, Temp, upper and lower.
upper is the maximum temperature by year and lower is the minimum.
I have two questions:
First, why are upper and lower computed incorrectly for some values at the end of the data frame, while they are fine in the rest of it?
Second, why am I getting weird axes when I use ggplot?
The data frame is this.
As you can see, upper and lower are wrong for 2017:
Year Month Temp upper lower
1 1880 Jan -.29 -.29 -.09
2 1880 Feb -.18 -.29 -.09
3 1880 Mar -.11 -.29 -.09
......
1655 2017 Nov .84 .96 1.12
1656 2017 Dec .88 .96 1.12
the code is:
newDF <- df %>%
group_by(Year) %>%
mutate(upper = max(Temp), # identify max value for month day
lower = min(Temp) # identify min value for month day
) %>%
ungroup()
p <- ggplot(newDF, aes(Month, Temp)) +
geom_linerange(newDF, mapping=aes(x=Year, ymin=lower, ymax=upper), colour = "wheat2", alpha=.1)
print(p)
The graph seems fine, but the axes are messed up.
I think you're very close -- it's just the second part that needs a tweak. ggplot can work with a date field as the x axis, but the Month field is text (and it doesn't include the Year). Here I make a new column called date that combines them. lubridate is a handy package for that, since it does some smart parsing of date formats.
# Fake data
library(dplyr)
df <- tibble( # data_frame() is deprecated in favour of tibble()
Year = rep(1880:2017, each = 12),
Month = rep(month.abb, times = (2017-1880+1)),
Temp = rnorm(n = 1656, mean = 0, sd = 1)
)
newDF = df %>%
# This line adds a date field based on Year and Month
mutate(date = lubridate::ymd(paste(Year, Month, 1))) %>%
group_by(Year) %>%
mutate(upper = max(Temp), # identify max value for month day
lower = min(Temp), # identify min value for month day
) %>%
ungroup()
library(ggplot2)
p <- ggplot(newDF, aes(date, Temp)) +
# x = date is inherited from ggplot(), so the ranges are drawn on the date axis
geom_linerange(aes(ymin = lower, ymax = upper), colour = "wheat2", alpha=.1)
print(p)
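On the first question, one possible explanation, offered here only as a guess from the printed values, is that Temp was read in as character rather than numeric: R prints numbers with a leading zero (0.84, not .84), and max() and min() on character vectors compare strings rather than magnitudes. A minimal sketch with made-up values, assuming that is the cause:

```r
# Hypothetical values stored as text, in the style of the 2017 rows above
temp_chr <- c(".84", ".96", "1.12")

# Convert to numeric before taking per-year extremes
temp_num <- as.numeric(temp_chr)

upper <- max(temp_num)  # largest value by magnitude
lower <- min(temp_num)  # smallest value by magnitude
```

If str(df) shows Temp as chr, converting the column once with as.numeric() before the group_by() step would be the place to apply this.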
I have a data frame called "fish" which contains variables such as mass, length and day of the year. I need to make a boxplot of fish length by month but there is no month variable, only day of the year (i.e 1:365). How can I group days by 30 to represent month and then name them so I can make a boxplot? I have attached a screenshot of the data.
You can use this solution:
#load package
require(tidyverse)
#make dataframe
n <- 100
tmp <- tibble(year = rep(c(1994,1994),n/2),day = c(1:n),lenght_mm = rnorm(n),mass_g = rnorm(n,5))
#add month column
tmp <- tmp %>%
mutate(month = as.factor(ifelse(day%%30/30 != 0,day%/%30 +1,day%/%30)))
#make plot
tmp %>%
ggplot(aes(month,lenght_mm,col = month)) +
geom_boxplot() +
theme_bw()
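The ifelse() expression above assigns days 1-30 to bin 1, days 31-60 to bin 2, and so on, which is the same as rounding day / 30 up. A short sketch of that equivalence follows; the variable names are illustrative.

```r
day <- 1:365

# Binning as written in the answer above
bin_ifelse <- ifelse(day %% 30 / 30 != 0, day %/% 30 + 1, day %/% 30)

# Equivalent, shorter form
bin_ceiling <- ceiling(day / 30)

same <- all(bin_ifelse == bin_ceiling)
n_bins <- length(unique(bin_ceiling))
```

Note that over a 365-day year this produces 13 bins, with days 361-365 in the last one, so the bins only approximate calendar months.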
I would add a new column with the full date. Day 1 has to map to 1 January, so subtract 1 before adding the day number to the origin:
as.Date(104 - 1, origin = "2014-01-01")
From that you can group by month:
months(as.Date(104 - 1, origin = "2014-01-01"))
Put together:
df %>% mutate(date = as.Date(day_of_the_year - 1, origin = "2014-01-01"),
month = months(date))
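A small self-contained check of the day-of-year conversion, using made-up day numbers and assuming a non-leap reference year (2014, as above). The origin is 1 January, so the day number is reduced by 1 to make day 1 land on 1 January. month.abb is used instead of months() so that the labels do not depend on the session locale.

```r
# Illustrative day-of-year values
day_of_year <- c(1, 31, 32, 104, 365)

# Day 1 -> 2014-01-01, day 32 -> 2014-02-01, ...
date <- as.Date(day_of_year - 1, origin = "2014-01-01")

# Ordered month factor, so boxplots appear Jan..Dec rather than alphabetically
month <- factor(month.abb[as.integer(format(date, "%m"))], levels = month.abb)

data.frame(day_of_year, date, month)
```

Keeping month as a factor with all twelve levels means the x-axis order of the boxplot follows the calendar.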