Removing the non-business days from ggplot axis [duplicate] - r

This question already has answers here:
R + ggplot2: how to hide missing dates from x-axis?
I would like to plot the development of several indexes with ggplot2. My problem is that 2018-02-03 and 2018-02-04 are non-working days, so there is no data for them, but ggplot2 still reserves space for them on the date axis and draws the line straight across the gap. How can I omit the non-business days, so that 2018-02-02 is followed directly by 2018-02-05?
library(tidyverse)
library(quantmod)
#retrieve data
getSymbols("BTCUSD=X;^DJI;^VIX;^GDAXI", from="2017-01-01")
#merge all dataframes together
df <- merge(`BTCUSD=X`,`DJI`, all = TRUE)
df <- merge(df, `VIX`, all = TRUE)
df <- merge(df, `GDAXI`, all = TRUE)
#creating a dataframe with first column as date that comes from xts object extracted by index()
df <- data.frame(date=index(df), coredata(df))
#selecting columns and filtering the time series start date
df_1 <- df%>% select(date, contains("Close"))%>% na.omit() %>% filter(date>"2018-01-25")
#df_1 <- df_1 %>%mutate(BTCUSD.X.Close=BTCUSD.X.Close/BTCUSD.X.Close[1], DJI.Close=DJI.Close/DJI.Close[1], GDAXI.Close=GDAXI.Close/GDAXI.Close[1], VIX.Close=VIX.Close/VIX.Close[1])
df_1 <- df_1 %>% gather(var, closing, 2:5)
png("indexes.png", width = 9, height = 6, units = 'in', res = 600)
plot_1 <- ggplot(data=df_1)+
geom_line(aes(x=date, y=closing))+
facet_wrap(~var, scales = "free")+
scale_x_date(breaks = df_1$date, date_minor_breaks = "1 day", date_labels = "%y-%m-%d")+
theme(text = element_text(size=7), axis.text.x = element_text(angle = 90, hjust = 1))
plot_1
dev.off()
plot_1

The bdscale package was designed for this purpose. Once it is loaded (together with scales, which provides date_format), you can replace your scale_x_date line with:
scale_x_bd(
business.dates = df_1$date,
max.major.breaks = 10,
labels = date_format("%y-%m-%d")
)
Fully reproducible code
library(tidyverse)
library(quantmod)
library(bdscale)
library(scales)
getSymbols("BTCUSD=X;^DJI;^VIX;^GDAXI", from = "2017-01-01")
df <- merge(`BTCUSD=X`,`DJI`, all = TRUE) %>%
merge(`VIX`, all = TRUE) %>%
merge(`GDAXI`, all = TRUE)
df <- data.frame(date = index(df), coredata(df))
df_1 <- df %>%
select(date, contains("Close")) %>%
na.omit %>%
filter(date > "2018-01-25") %>%
gather(var, closing,2:5)
ggplot(data = df_1, aes(x = date, y = closing)) +
geom_line() +
facet_wrap(~var, scales = "free") +
scale_x_bd(business.dates = df_1$date,
max.major.breaks = 10,
labels = date_format("%y-%m-%d")) +
theme(text = element_text(size = 7),
axis.text.x = element_text(angle = 90, hjust = 1))
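If adding a dependency on bdscale is not an option, a similar effect can be sketched by plotting against the position of each date among the observed dates and labelling the ticks with the dates themselves. This is only a minimal sketch on made-up data (toy, obs_dates, idx and p_idx are hypothetical names, not part of the original code), not a drop-in replacement for scale_x_bd:

```r
library(ggplot2)

# Hypothetical toy data: business days only (the weekend of 3-4 Feb is absent)
toy <- data.frame(
  date = as.Date(c("2018-02-01", "2018-02-02", "2018-02-05", "2018-02-06")),
  closing = c(10, 12, 11, 13)
)

# Position of each date among the observed (sorted, unique) dates
obs_dates <- sort(unique(toy$date))
toy$idx <- match(toy$date, obs_dates)

# Plot against the position, but label each tick with the date it stands for
p_idx <- ggplot(toy, aes(x = idx, y = closing)) +
  geom_line() +
  scale_x_continuous(
    breaks = seq_along(obs_dates),
    labels = format(obs_dates, "%y-%m-%d")
  ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
```

Because the x positions are consecutive integers, 2018-02-02 and 2018-02-05 end up next to each other. The trade-off is that the axis is no longer a true date axis, so date-specific helpers such as date_breaks cannot be used with it.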

Related

How can I create multiple plots from same dataset in R?

Let me first share some dummy data from which I want to prepare ggplot graphs.
library(tidyverse)
set.seed(1)
sample_size <- 1200
dates <- sample(seq(1,31),sample_size,replace = TRUE)
Monthss <- sample(seq(1,12),sample_size,replace = TRUE)
hrs <- sample(seq(1,23),sample_size,replace = TRUE)
minutes <- sample(seq(1,59),sample_size,replace = TRUE)
date_time_vector <- paste0(dates,"-",Monthss,"-",2022," ",hrs,":",minutes) |> lubridate::parse_date_time("dmy HM")
Conversion <- sample(c(TRUE,FALSE),sample_size, prob = c(0.25,0.75), replace = TRUE)
df <- data.frame(Date = date_time_vector, Conversion_Status = Conversion)
df <- df |> mutate(Leads = round(runif(sample_size, min = 0,max = 10),digits = 0))
df <- df[complete.cases(df), ]
The code above gives me a data.frame with the columns Date, Conversion_Status and Leads. I want a monthly column chart of total leads per day (for example, daily leads in January, daily leads in February, and so on). So, basically, I need to split the data by month and prepare one chart per month. How can I prepare such charts?
I have tried the following:
bar_function <- function(df, col1, col2, title) {
df %>%
ggplot2::ggplot(aes(x = {{col1}}, y = {{col2}})) +
ggplot2::geom_col(fill = "steelblue") +
theme(plot.background = element_rect(fill = "white")) +theme(plot.title = element_text(hjust = 0.5))+coord_flip() +
ggplot2::labs(title = title)
}
mycharts <- df |> dplyr::nest_by(Month) |> dplyr::mutate(plot = bar_function(df,Date,Leads,"Daily Leads by Month"))
But it is giving me errors.
You can split the data by month and year and plot each piece.
library(ggplot2)
library(lubridate)
set.seed(1)
sample_size <- 1200
dates <- sample(seq(1,31),sample_size,replace = TRUE)
Monthss <- sample(seq(1,12),sample_size,replace = TRUE)
hrs <- sample(seq(1,23),sample_size,replace = TRUE)
minutes <- sample(seq(1,59),sample_size,replace = TRUE)
date_time_vector <- paste0(dates,"-",Monthss,"-",2022," ",hrs,":",minutes) |> lubridate::parse_date_time("dmy HM")
Conversion <- sample(c(TRUE,FALSE),sample_size, prob = c(0.25,0.75), replace = TRUE)
df <- data.frame(Date = date_time_vector, Conversion_Status = Conversion)
df$Leads <- round(runif(sample_size, min = 0,max = 10),digits = 0)
df <- df[complete.cases(df), ]
df$month_year <- strftime(df$Date, format = "%m-%Y")
df.split <- split(df, f = df$month_year)
out <- vector("list", length(df.split))
names(out) <- names(df.split)
for (i in seq_along(df.split)) {
out[[i]] <- ggplot(data = df.split[[i]], mapping = aes(x = Date, y = Leads)) +
geom_col(fill = "steelblue") +
theme(plot.background = element_rect(fill = "white")) +
theme(plot.title = element_text(hjust = 0.5))+
coord_flip() +
labs(title = "Daily leads by month")
}
To draw a plot, just print it, e.g. out[[1]].
If you want to choose the plotted columns dynamically, you can use aes_string for the mapping (note that aes_string is deprecated in current ggplot2 in favour of the .data pronoun). The loop can naturally be wrapped into sapply, and there are probably other ways of approaching the problem. The for loop is pretty agnostic, and I find that it is readable even by people who do not dabble in R (compared to, say, sapply).
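For choosing the columns dynamically, one possible sketch passes the column names as strings and looks them up with the .data pronoun inside aes(). The helper plot_by_group and the toy data below are hypothetical names introduced for illustration, and the example assumes a ggplot2 version that supports .data:

```r
library(ggplot2)

# Hypothetical helper: x_col and y_col are column names given as strings
plot_by_group <- function(data, x_col, y_col, title) {
  ggplot(data, aes(x = .data[[x_col]], y = .data[[y_col]])) +
    geom_col(fill = "steelblue") +
    coord_flip() +
    labs(title = title, x = x_col, y = y_col)
}

# Small made-up dataset spanning two months
toy <- data.frame(
  Date = as.Date(c("2022-01-30", "2022-01-31", "2022-02-01", "2022-02-02")),
  Leads = c(3, 5, 2, 8)
)
toy$month_year <- format(toy$Date, "%m-%Y")

# One plot per month, stored in a named list
toy_split <- split(toy, toy$month_year)
plots <- lapply(names(toy_split), function(m) {
  plot_by_group(toy_split[[m]], "Date", "Leads", paste("Daily leads,", m))
})
names(plots) <- names(toy_split)
```

Printing plots[["01-2022"]] then draws the January chart. Swapping "Leads" for another column name changes the y variable without touching the helper.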
There are some issues with your code. First, your dataset has no Month column, so you have to add one, for which I use lubridate::month. Second, you are passing the whole dataset df to your bar function instead of the per-month data column of your nested data frame. Third, in the mutate step you have to wrap the result in list():
library(ggplot2)
library(dplyr, warn=FALSE)
mycharts <- df |>
nest_by(Month = lubridate::month(Date)) |>
mutate(plot = list(bar_function(data, Date, Leads, "Daily Leads by Month")))
mycharts$plot[[1]]
mycharts$plot[[5]]
I finally found an answer. I used the following code:
lapply(split(df, df$Month),
function(x)
ggplot(x, aes(x=Date, y=Leads)) +
geom_col(fill = "steelblue") + coord_flip()+
ggtitle(x$Month[1]))
Thank you all for your support.

How to adjust the y axis like this in ggplot2?

Here is the code and the current output plot:
df <- data.frame(state = c('0','1'),
male = c(26287942,9134784),
female = c(16234000,4406645))
#output
#> df
#>   state     male   female
#> 1     0 26287942 16234000
#> 2     1  9134784  4406645
library(ggplot2)
library(tidyr)
df_long <- pivot_longer(df, cols = c("female","male"))
names(df_long) <- c('state','sex','observations')
ggplot(data = df_long) +
geom_col(aes(x = sex, y =observations, fill = state)) +
theme(legend.position = c(0.1,0.9),
legend.background = element_rect(fill='lightgrey') )
I want to adjust the plot like this (I marked what I want to change):
Simplify the scientific notation on the y-axis.
Compute the ratio (number of state 1) / (number of state 0 + number of state 1) and show it on the plot.
It may be a little complicated, and I don't know which functions to use. Can anyone point me to some related functions or examples?
You can set options(scipen = 99) to disable scientific notation on the y-axis. For the ratio labels, we can create a separate dataset.
library(tidyverse)
options(scipen = 99)
long_data <- df %>%
pivot_longer(cols = c(male, female),
names_to = "sex",
values_to = "observations")
label_data <- long_data %>%
group_by(sex) %>%
summarise(perc = observations[match(1, state)]/sum(observations),
total = sum(observations), .groups = "drop")
ggplot(long_data) +
geom_col(aes(x = sex, y = observations, fill = state)) +
geom_text(data = label_data,
aes(label = round(perc, 2), x = sex, y = total),
vjust = -0.5) +
theme(legend.position = c(0.1,0.9),
legend.background = element_rect(fill='lightgrey'))
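options(scipen = 99) changes number printing for the whole session. If you would rather keep the change local to the plot, one alternative sketch is to give the y scale a labelling function from the scales package. The names long_toy, millions and p_millions are hypothetical, and showing the counts in millions is an assumption about what "simplified" should look like:

```r
library(ggplot2)
library(scales)

# Same counts as in the question, already in long format
long_toy <- data.frame(
  sex = c("male", "male", "female", "female"),
  state = c("0", "1", "0", "1"),
  observations = c(26287942, 9134784, 16234000, 4406645)
)

# Labelling function: counts in millions with one decimal place
millions <- label_number(scale = 1e-6, suffix = "M", accuracy = 0.1)

# Only this plot's y axis is affected; global options stay untouched
p_millions <- ggplot(long_toy) +
  geom_col(aes(x = sex, y = observations, fill = state)) +
  scale_y_continuous(labels = millions)
```

The underlying data stay in their original units. Only the tick labels are rescaled, so geoms added later (for example the ratio labels) can keep using the raw counts for their y positions.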
By searching the Internet for about two days, I have finished the work!
sex <- c('M','F')
y0 <- c(26287942,16234000)
y1 <- c(9134784, 4406645)
y0 <- y0*10^{-7}
y1 <- y1*10^{-7}
ratio <- y1/(y0+y1)
ratio <- round(ratio,2)
m <- t(matrix(c(y0,y1),ncol=2))
colnames(m) <- c(as.character(sex))
df <- as.data.frame(m)
df <- cbind(c('0','1'),df)
colnames(df)[1] <- 'observations'
df
df_long <- pivot_longer(df, cols = as.character(sex))
names(df_long) <- c('state','sex','observations')
df_r <- as.data.frame(df_long)
df_r <- data.frame(df_r,ratio=rep(ratio,2))
ggplot(data = df_r) +
geom_col(aes(x =sex, y = observations, fill = state))+
theme(legend.position = c(0.1,0.9),
legend.background = element_rect(fill=NULL) )+
geom_line(aes(x=sex,y=ratio*10),group=1)+
geom_point(aes(x=sex,y=ratio*10))+
geom_text(aes(x=sex,y=ratio*10+0.35),label=rep(ratio,2))+
scale_y_continuous(name =expression(paste('observations(','\u00D7', 10^7,')')),
sec.axis = sec_axis(~./10,name='ratio'))

How to fix "Breaks and labels are different lengths" when using ggplot2 for faceted plots?

Consider the following example:
library(ggplot2)
library(RColorBrewer)
library(magrittr)
library(dplyr)
df <- data.frame(x = seq(0, 70, 0.5),
y = seq(0, 70, 0.5),
val = rnorm(141),
group =rep(1:3,47))
max_val_plot <- df$val %>% max() %>% round(0)
min_val_plot <- df$val %>% min() %>% round(0)
breaks_plot <-seq(min_val_plot,max_val_plot,0.1)
n <- breaks_plot %>% length()
getPalette <- colorRampPalette(brewer.pal(9, "RdBu"))
colors_plot <-getPalette(n)
labels_plot <- breaks_plot %>%
as.character()
labels_plot[!1:0]=' '
df %>%
ungroup() %>%
ggplot(aes(x=x,y=y,fill=val))+
geom_raster()+
facet_grid(~group)+
theme_bw(base_size = 20)+
scale_fill_stepsn(
name = "",
colours = colors_plot,
breaks = breaks_plot,
labels = labels_plot
)
Although labels and breaks are of equal length, the error "Breaks and labels are different lengths" is returned due to the presence of multiple groups and the faceted function in plotting code.
How can I fix this?
Thanks!
One option to fix your issue would be to pass a function to the labels argument of scale_fill_xxx to create the labels on the fly instead of providing the labels as a vector.
library(ggplot2)
library(RColorBrewer)
library(magrittr)
library(dplyr)
set.seed(123)
df %>%
ungroup() %>%
ggplot(aes(x = x, y = y, fill = val)) +
geom_raster() +
facet_grid(~group) +
theme_bw(base_size = 20) +
scale_fill_stepsn(
name = "",
colours = colors_plot,
breaks = breaks_plot,
labels = function(x) { x <- as.character(x); x[!1:0] <- " "; x}
)
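The labelling function above is compact but a little cryptic: !1:0 evaluates to c(FALSE, TRUE), and R recycles that logical index over the whole vector, so every second label is blanked. A standalone sketch of the same idea, with the hypothetical name blank_every_other, makes this easier to check outside the plot:

```r
# Hypothetical helper: turn breaks into labels and blank every second one.
# It works on whatever breaks the scale passes in, so the number of labels
# always matches the number of breaks.
blank_every_other <- function(breaks) {
  labs <- as.character(breaks)
  labs[c(FALSE, TRUE)] <- " "  # recycled: positions 2, 4, 6, ...
  labs
}

example_labels <- blank_every_other(c(-1, -0.5, 0, 0.5))
```

Passing labels = blank_every_other to scale_fill_stepsn is then equivalent to the inline function used in the answer.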

Line plot with multiple lines with data from Excel

I have a table in Excel that shows which weapons were used to commit murder per year.
I want to read the data and plot it as a line plot with multiple lines.
However, my code only gives me a gibberish graph.
This is my code:
library("readxl")
library(data.table)
library(ggplot2)
# Read excel data
res <- as.data.frame(readxl::read_excel("murders_per_modus_veapon.xlsx", sheet = 1))
res$r = c('Unknown','Knife/stabbing weapon','Axe','Firearms','Suffocation','Blunt violence','Other')
res = reshape::melt(res)
# Plot data
ggplot(res, aes(x=variable,y=value))+ geom_line()
You need to convert your variable column from a factor to a date. See below:
library(reshape2)
library(dplyr)
library(lubridate)
library(ggplot2)
reshape2::melt(res, value.name = "MR", variable.name = "Year") %>%
mutate(Year = make_date(as.character(Year), 1, 1)) %>%
ggplot(., aes(x=Year, y=MR, color = Modus)) +
geom_line() +
scale_x_date(date_breaks = "1 year",date_labels = "%Y")
#> Using Modus as id variables
Created on 2021-05-25 by the reprex package (v2.0.0)
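The as.character() step in the conversion matters because the melted year column is a factor. A small base R sketch, on a made-up factor with the hypothetical names year_f, codes and year_date, shows the pitfall and one way to build the dates without lubridate:

```r
# Year labels as they come out of melt(): a factor, not numbers
year_f <- factor(c("2018", "2019", "2020", "2018"))

# Pitfall: as.integer() on a factor returns the level codes, not the years
codes <- as.integer(year_f)

# Go through as.character() first, then build 1 January of each year
year_date <- as.Date(paste0(as.character(year_f), "-01-01"))
```

Here codes holds 1, 2, 3, 1 rather than the years, while year_date holds proper Date values that scale_x_date can work with.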
Data:
read.table(text = "Modus 2018 2019 2020
Unknown 2 0 0
Knife_stabbing_weapon 8 14 16
Axe 1 1 0
Firearms 3 3 2
Suffocation 2 5 6
Blunt_violence 8 3 4
Other 1 1 0",
header = T, stringsAsFactors = F, check.names = F) -> res
You can still run this part of your own code to import the data, and then use mine for plotting:
res <- as.data.frame(readxl::read_excel("murders_per_modus_veapon.xlsx", sheet = 1))
res$r = c('Unknown','Knife/stabbing weapon','Axe','Firearms',
'Suffocation','Blunt violence','Other')
Since you have not provided reproducible data, I created a tiny dataset that may resemble a small subset of your original xlsx data:
df = data.frame(weapon = c("Unknown", "knife", "axe"),
x2018 = c(2, 8, 1),
x2019 = c(0, 14, 1),
x2020 = c(0, 16, 0))
Then I make the data tidy using dplyr and tidyr. Finally, I produce a line plot that may be what you are looking for:
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
pivot_longer(cols = 2:4, names_to = "year", values_to = "amount") %>%
mutate(year = gsub("x", "", year)) %>%
ggplot(aes(as.numeric(year), amount, col = weapon)) +
geom_line() +
scale_x_continuous(breaks = c(2018, 2019, 2020))
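This could be an option. It uses the res data frame from M--'s answer; thanks to M-- for the data.
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggrepel)
# start from res (columns Modus, 2018, 2019, 2020) and reshape to long format
df <- res %>%
pivot_longer(
cols = c("2018", "2019", "2020")
) %>%
# label only the last year of each line
mutate(label = if_else(name == max(name), as.character(Modus), NA_character_)) %>%
mutate(name = as.factor(name))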
p <- ggplot(df, aes(x=name, y=value, colour=Modus, group=Modus)) +
geom_point () +
geom_line(size = 0.8) +
theme_bw()
p + geom_label_repel(aes(label = label),
nudge_x = 1,
na.rm = TRUE) +
theme(legend.position = "none")

exclude weekends from x axis in heatmap

I have coded a heatmap using ggplot tiles, with sequential days on the x axis. The problem I am trying to solve is how to remove weekends from the heatmap and show only weekdays. I have found that one solution would be to turn the dates into factors, but if I do that, how can I format the labels in scale_x_discrete in %d%m date format? Is there a way to keep the dates as dates instead of turning them into factors?
Below is an example:
randomString <- function(n=5,length=3) {
randomStringX <- c(1:n)
for(i in 1:n) {
randomStringX[i] <- paste(sample(c(LETTERS),length,replace = TRUE),collapse = "")
}
return(randomStringX)
}
randomString()
data.frame(client=randomString(),day=rep(seq.Date(Sys.Date()-10,length.out=10,by="1 day"),2)) %>%
mutate(sales=round(rnorm(20,492,300),1)) %>%
mutate(scale=cut(sales,breaks=c(0,100,200,300,max(sales)),labels = c("0-100","100-200","200-300","+300"))) %>%
ggplot(.,aes(x=day,y=client,fill=scale)) +
geom_tile() +
scale_x_date(date_breaks = "1 day")
Thanks in advance
You can exclude the weekend rows from the data using the is.weekend function from chron.
The weekend dates themselves can then be dropped from the x-axis using the bdscale package.
library(chron)
library(bdscale)
library(scales)
library(ggplot2)
library(dplyr)
df <- data.frame(client = randomString(), day = rep(seq.Date(
Sys.Date() - 10, length.out = 10, by = "1 day"), 2)) %>%
mutate(sales = round(rnorm(20, 492, 300), 1)) %>%
mutate(scale =
cut(
sales,
breaks = c(0, 100, 200, 300, max(sales)),
labels = c("0-100", "100-200", "200-300", "+300")
)) %>%
filter(is.weekend(day) == FALSE)
ggplot(df, aes(x = day, y = client, color = scale, fill = scale)) +
geom_tile() +
# scale_x_date(date_breaks = "1 day") +
theme(axis.text.x = element_text(angle = 45)) +
scale_x_bd(business.dates = sort(df$day), max.major.breaks = 30, labels=scales::date_format('%d %b'))
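For the factor route mentioned in the question, the labels of a discrete axis can still be formatted as dates by passing a function to scale_x_discrete. The following is a minimal sketch on made-up data (toy, fmt_day and p_discrete are hypothetical names), assuming the weekend rows have already been filtered out:

```r
library(ggplot2)

# Hypothetical toy data with weekdays only (3-4 Feb 2018 are missing)
toy <- data.frame(
  client = rep(c("ABC", "XYZ"), each = 3),
  day = rep(as.Date(c("2018-02-01", "2018-02-02", "2018-02-05")), 2),
  sales = c(120, 340, 80, 210, 95, 400)
)

# The discrete scale hands the factor levels over as strings like
# "2018-02-02", so parse them back to Date before formatting
fmt_day <- function(x) format(as.Date(x), "%d%m")

p_discrete <- ggplot(toy, aes(x = factor(day), y = client, fill = sales)) +
  geom_tile() +
  scale_x_discrete(labels = fmt_day) +
  labs(x = "day")
```

Since factor levels of a Date column are sorted chronologically, the tiles stay in date order and the missing weekend simply takes up no space.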
Weekend rows can also be removed with lubridate's wday function:
filter(!wday(day) %in% c(1,7))
With the default week start, Sunday and Saturday are stored as 1 and 7 respectively. Credit to @AHart.
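The same filter can be sketched in base R without chron or lubridate, using the "%u" format code (ISO weekday number, where 1 is Monday and 7 is Sunday). The vector names below are hypothetical:

```r
# Friday, Saturday, Sunday and Monday in February 2018
days <- as.Date(c("2018-02-02", "2018-02-03", "2018-02-04", "2018-02-05"))

# ISO weekday number: 1 = Monday, ..., 6 = Saturday, 7 = Sunday
iso_wday <- as.integer(format(days, "%u"))

# Keep Monday to Friday only
weekdays_only <- days[iso_wday <= 5]
```

Unlike weekdays(), which returns locale-dependent day names, the numeric code does not depend on the session language.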

Resources