Draw the space between two lines for anomalies - r

Hi, I want to fill the space between two lines with red and blue (representing the anomalies), but I can't get it to work. Only the blue anomaly is drawn.
Here is my code:
library(RCurl)
t <- getURL("https://raw.githubusercontent.com/vladamihaesei/weather_covid/master/Tab/UVanomaly.csv")## if can not download it, try manually
t <- read.csv(text =t)
head(t)
ggplot(t, aes(x=data, y=uv1920)) +
geom_line(aes(y = uv1920)) +
geom_line(aes(y = uv01_19)) +
geom_ribbon(data=subset(t, uv1920 <= uv01_19),
aes(ymin=uv1920,ymax=uv01_19), fill="blue") +
#scale_y_continuous(expand = c(0, 0), limits=c(0,20)) +
#scale_x_continuous(expand = c(0, 0), limits=c(0,5)) +
scale_x_date(date_breaks = "2 weeks", date_labels = "%d%b")+
scale_fill_manual(values=c("red","blue"))

One approach to get the space between the lines filled may look like so.
The basic idea is to split the data into periods that can be mapped to the group aesthetic.
Unfortunately this is not a perfect solution. As you can see, we get gaps at the intersection points. I recently did something similar, filling the gaps with a lot of manual work, but with far fewer intersection points. Maybe someone else has a more general and feasible solution to this issue.
t <- read.csv("https://raw.githubusercontent.com/vladamihaesei/weather_covid/master/Tab/UVanomaly.csv")
library(ggplot2)
library(dplyr)
t1 <- t %>%
mutate(date = as.Date(data1),
diff = uv1920 <= uv01_19,
period = cumsum(diff != lag(diff, default = TRUE)))
t1 %>%
ggplot(aes(x=date)) +
geom_line(aes(y = uv1920)) +
geom_line(aes(y = uv01_19)) +
geom_ribbon(aes(ymin =uv1920, ymax=uv01_19, group = period, fill = diff)) +
scale_x_date(date_breaks = "2 weeks", date_labels = "%d%b")+
scale_fill_manual(values=c("red","blue"))
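One possible way to close the gaps at the intersection points is to insert the crossing points into the data by linear interpolation and then draw one ribbon per colour with pmin()/pmax(). The following is only a minimal sketch on made-up numbers, not on the UV data; add_crossings() and the column names x, y1 and y2 are hypothetical names chosen for illustration.

```r
library(ggplot2)

# Hypothetical helper: insert the points where y1 and y2 cross (linear interpolation)
add_crossings <- function(x, y1, y2) {
  d <- y1 - y2
  n <- length(x)
  i <- which(d[-n] * d[-1] < 0)        # segments where the sign of the difference flips
  f <- d[i] / (d[i] - d[i + 1])        # fraction of the segment at which the difference is 0
  xc <- x[i] + f * (x[i + 1] - x[i])
  yc <- y1[i] + f * (y1[i + 1] - y1[i])
  out <- rbind(data.frame(x = x, y1 = y1, y2 = y2),
               data.frame(x = xc, y1 = yc, y2 = yc))
  out[order(out$x), ]
}

# Made-up example data: y1 zigzags around a constant y2
toy <- add_crossings(x = 1:5, y1 = c(0, 2, 0, 2, 0), y2 = rep(1, 5))

p <- ggplot(toy, aes(x = x)) +
  # blue where y1 is below y2, red where it is above;
  # elsewhere the ribbon has zero height and is invisible
  geom_ribbon(aes(ymin = pmin(y1, y2), ymax = y2), fill = "blue") +
  geom_ribbon(aes(ymin = y2, ymax = pmax(y1, y2)), fill = "red") +
  geom_line(aes(y = y1)) +
  geom_line(aes(y = y2))
```

Because every crossing is now a row of its own, each ribbon shrinks to zero height exactly where the lines meet, so no grouping by period is needed in this variant.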

Related

ggplot geom_col: making certain axis count integers rather than summing

I am currently making a hate crime case study. For my plot I am using one zip code as my y-axis and plotting how many crimes there are and which group is being targeted on the x-axis using geom_col. The problem is that my y-axis is adding the zip codes together rather than counting how many times the zip code shows up. Here is what my dataset looks like:
structure(list(ID = 1:5, CRIME_TYPE = c("VANDALISM", "ASSAULT", "VANDALISM", "ASSAULT",
"OTHER"), BIAS_MOTIVATION_GROUP = c("ANTI-BLACK ",
"ANTI-BLACK ", "ANTI-FEMALE HOMOSEXUAL (LESBIAN) ",
"ANTI-MENTAL DISABILITY ", "ANTI-JEWISH "),
ZIP_CODE = c(40291L, 40219L, 40243L, 40212L, 40222L
)), row.names = c(NA, 5L), class = "data.frame")
Here is my code:
library(ggplot2)
df <- read.csv(file = "LMPD_OP_BIAS.csv", header = T)
library(tidyverse)
hate_crime <- df %>%
filter(ZIP_CODE == "40245")
hate_crime_plot <- hate_crime %>%
ggplot(., aes(x = BIAS_MOTIVATION_GROUP, y = ZIP_CODE, fill =
BIAS_MOTIVATION_GROUP)) +
geom_col() + labs(x = "BIAS_MOTIVATION_GROUP", fill = "BIAS_MOTIVATION_GROUP") +
theme_minimal() +
theme(axis.text.x=element_text (angle =45, hjust =1))
print(hate_crime_plot)
hate_crime_ploter <- hate_crime %>%
ggplot(., aes(x = UOR_DESC, y = ZIP_CODE, fill =
UOR_DESC)) +
geom_col() + labs(x = "UOR_DESC", fill = "UOR_DESC") +
theme_minimal() +
theme(axis.text.x=element_text (angle =45, hjust =1))
print(hate_crime_ploter)
For the full data, visit the site to download the data set.
Alright, I think you've got a couple of issues here. What's happening in your code is that you're asking ggplot to make a bar plot with a categorical variable (BIAS_MOTIVATION_GROUP or UOR_DESC) on the x-axis and a continuous variable (ZIP_CODE) on the y-axis. Since there is more than one row per x value, geom_col stacks the rows, so each bar's height is the sum of the ZIP_CODE values for that x value, which is what you'd expect out of a bar plot. Long story short, I wonder if what you actually want here is a histogram, that is, a bar chart of counts. Your dataset (hate_crime) only has one value of ZIP_CODE, so I'm not sure what plotting ZIP on the y-axis is supposed to visualize. A histogram would look like this:
hate_crime %>%
ggplot(., aes(x = UOR_DESC, fill = UOR_DESC)) +
geom_histogram(stat = "count") +
labs(x = "UOR_DESC", fill = "UOR_DESC") +
theme_minimal() +
theme(axis.text.x=element_text (angle =45, hjust =1))
If, instead, you're trying to visualize how often each ZIP code shows up in each category, you'd have to approach things differently. Perhaps you're looking for something like this?
df %>%
ggplot(aes(x = UOR_DESC, fill = factor(ZIP_CODE))) +
geom_histogram(stat = "count") +
theme(axis.text.x=element_text (angle =45, hjust =1))
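The counting that stat = "count" performs can also be done explicitly before plotting, which makes it easier to check the numbers. Below is a minimal sketch using the five sample rows from the question, trimmed to two columns; the object names sample_rows and counts are chosen for illustration.

```r
library(ggplot2)

# The five sample rows from the question, trimmed to the two columns used here
sample_rows <- data.frame(
  CRIME_TYPE = c("VANDALISM", "ASSAULT", "VANDALISM", "ASSAULT", "OTHER"),
  ZIP_CODE   = c(40291L, 40219L, 40243L, 40212L, 40222L)
)

# Count rows per category; as.data.frame() turns the table into
# the columns CRIME_TYPE and Freq
counts <- as.data.frame(table(CRIME_TYPE = sample_rows$CRIME_TYPE))

# geom_col() draws the precomputed counts as they are;
# geom_bar() on the raw rows would do the counting itself
p <- ggplot(counts, aes(x = CRIME_TYPE, y = Freq, fill = CRIME_TYPE)) +
  geom_col()
```

Either way, ZIP_CODE never needs to be mapped to the y-axis: the bar height is a row count, not a zip code.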

How do I correctly connect data points ggplot

I am making a stratigraphic plot but somehow, my data points don't connect correctly.
The purpose of this plot is that the values on the x-axis are connected so you get an overview of the change in d18O throughout time (age, ma).
I've used the following script:
library(readxl)
R_pliocene_tot <- read_excel("Desktop/R_d18o.xlsx")
View(R_pliocene_tot)
install.packages("analogue")
install.packages("gridExtra")
library(tidyverse)
R_pliocene_Rtot <- R_pliocene_tot %>%
gather(key=param, value=value, -age_ma)
R_pliocene_Rtot
R_pliocene_Rtot %>%
ggplot(aes(x=value, y=age_ma)) +
geom_path() +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
which leads to the following figure:
Something is wrong with the geom_path function, I guess, but I can't figure out what it is.
Though the comment seems to solve the problem, I don't think the question that was asked has been answered. So here is a short introduction to how geom_path works in ggplot2.
library(dplyr)
library(ggplot2)
# This dataset contains two groups with random values for y, and x running from 1 to 20.
# The param column is only there to replicate the param variable from the question.
df <- tibble(x = rep(seq(1, 20, by = 1), 2),
y = runif(40, min = 1, max = 100),
group = c(rep("group 1", 20), rep("group 2", 20)),
param = rep("a param", 40))
df %>%
ggplot(aes(x = x, y = y)) +
# geom_path has a group aesthetic that tells the function
# which data points belong to which path.
# Points in the same group are connected to each other.
# Here I use color to help distinguish the paths a bit more.
geom_path(aes(group = group, color = group)) +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
Since your data works well with group = 1, I guess all data points belong to one group and you just want to draw a line connecting all of them. So take my example data above and draw it with the aesthetic group = 1: the result has two lines similar to the example above, but now the end point of group 1 is connected to the starting point of group 2.
So all data points are now on one path, but the order in which they are drawn depends on the order in which they appear in the data. (I keep the color just to make this a bit clearer.)
df %>%
ggplot(aes(x = x, y = y)) +
geom_path(aes(group = 1, color = group)) +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
Hope this gives you a better understanding of ggplot2::geom_path.
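Because geom_path() follows the row order of the data, sorting the rows by the variable that should run along the line (age, in a stratigraphic plot) is one way to control how the points are joined. This is a minimal sketch with made-up values; strat, age_ma and value are illustrative names, not the asker's data.

```r
library(ggplot2)

# Made-up profile whose rows are deliberately out of age order
strat <- data.frame(age_ma = c(3.0, 1.0, 2.0, 4.0),
                    value  = c(0.5, 0.2, 0.9, 0.1))

# Sort by age so that geom_path() joins neighbouring ages
strat_sorted <- strat[order(strat$age_ma), ]

p <- ggplot(strat_sorted, aes(x = value, y = age_ma)) +
  geom_path() +
  geom_point() +
  scale_y_reverse()
```

With the unsorted strat data frame, the same plot code would join the points in the order 3, 1, 2, 4 and produce a zigzag.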

ggplot2 comparison of time periods

I need to visualize and compare the difference between two equally long sales periods, 2018/2019 and 2019/2020. Both periods begin at week 44 and end at week 36 of the following year. If I create a graph, both periods are continuous and line up one after the other. If I use only the week number, the values are sorted as a continuum and the graph does not make sense. Can you think of a solution?
Thank You
Data:
set.seed(1)
df1 <- data.frame(sells = runif(44),
week = c(44:52,1:35),
YW = yearweek(seq(as.Date("2018-11-01"), as.Date("2019-08-31"), by = "1 week")),
period = "18/19")
df2 <- data.frame(sells = runif(44),
week = c(44:52,1:35),
YW = yearweek(seq(as.Date("2019-11-01"), as.Date("2020-08-31"), by = "1 week")),
period = "19/20")
# yearweek on x axis: both periods are separated
ggplot(df1, aes(YW, sells)) +
geom_line(aes(color="Period 18/19")) +
geom_line(data=df2, aes(color="Period 19/20")) +
labs(color="Legend text")
# week on x axis: weeks are treated as a continuum and not split by year
ggplot(df1, aes(week, sells)) +
geom_line(aes(color="Period 18/19")) +
geom_line(data=df2, aes(color="Period 19/20")) +
labs(color="Legend text")
Another alternative is to facet it. This'll require combining the two sets into one, preserving the data source. (This is commonly a better way of dealing with it in general, anyway.)
(I don't have tsibble, so my YW is just seq(...) without yearweek. It should translate.)
ggplot(dplyr::bind_rows(tibble::lst(df1, df2), .id = "id"), aes(YW, sells)) +
geom_line(aes(color = id)) +
facet_wrap(id ~ ., scales = "free_x", ncol = 1)
In place of dplyr::bind_rows, one might also use data.table::rbindlist(..., idcol="id"), or do.call(rbind, ...), though with the latter you will need to assign id externally.
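For the do.call(rbind, ...) variant, assigning id externally could look roughly like the following base R sketch. The two small data frames are made-up stand-ins for the real df1 and df2, and dfs and combined are illustrative names.

```r
# Made-up stand-ins for the two periods
df1 <- data.frame(week = 1:3, sells = c(0.2, 0.5, 0.1))
df2 <- data.frame(week = 1:3, sells = c(0.4, 0.3, 0.9))

dfs <- list(df1 = df1, df2 = df2)

# Add the id column to each piece first, using the list names ...
dfs <- Map(function(d, nm) {
  d$id <- nm
  d
}, dfs, names(dfs))

# ... then stack the pieces into one data frame
combined <- do.call(rbind, dfs)
rownames(combined) <- NULL
```

The resulting combined data frame has the same shape as the bind_rows(..., .id = "id") result, apart from the position of the id column.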
One more note: the default formatting of the x-axis is obscuring the "year" of the data. If this is relevant/important (and not apparent elsewhere), then use ggplot2's normal mechanism for forcing labels, e.g.,
... +
scale_x_date(labels = function(z) format(z, "%Y-%m"))
In the unlikely case that you don't have tibble::lst available, you can replace it with list(df1=df1, df2=df2) or similar.
If you want to keep the x axis as a numeric scale, you can do:
ggplot(df1, aes((week + 9) %% 52, sells)) +
geom_line(aes(color="Period 18/19")) +
geom_line(data=df2, aes(color="Period 19/20")) +
scale_x_continuous(breaks = 1:52,
labels = function(x) ifelse(x == 9, 52, (x - 9) %% 52),
name = "week") +
labs(color="Legend text")
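To see what the modular arithmetic in this answer does, it may help to run it on the week vector from the question. This is a small worked check in base R; pos and lab are illustrative names.

```r
week <- c(44:52, 1:35)

# Shift the weeks so that week 44 lands on position 1 and week 35 on position 44
pos <- (week + 9) %% 52

# Inverse mapping used for the axis labels: position 9 is week 52,
# every other position is shifted back by 9 (modulo 52)
lab <- ifelse(pos == 9, 52, (pos - 9) %% 52)
```

Here pos runs from 1 to 44 in the order of the data, and lab reproduces the original week numbers, which is why the axis shows 44, 45, ..., 52, 1, ..., 35 from left to right.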
Try this. You can convert your week variable to a factor to keep the desired order. Here is the code:
library(ggplot2)
library(tsibble)
#Data
df1$week <- factor(df1$week,levels = unique(df1$week),ordered = T)
df2$week <- factor(df2$week,levels = unique(df2$week),ordered = T)
#Plot
ggplot(df1, aes(week, sells)) +
geom_line(aes(color="Period 18/19",group=1)) +
geom_line(data=df2, aes(color="Period 19/20",group=1)) +
labs(color="Legend text")
Output:
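A short base R check of why this works: with levels = unique(...), the factor levels follow the order in which the weeks first appear, so the internal integer codes increase along the sales period. The name week_f is illustrative.

```r
week <- c(44:52, 1:35)

# Levels are taken in order of first appearance, not in numeric order
week_f <- factor(week, levels = unique(week))

first_levels <- levels(week_f)[1:3]   # the first three levels, as character strings
codes <- as.integer(week_f)           # position of each week on the discrete axis
```

Since the x-axis is now discrete, ggplot2 would by default treat every factor level as its own group, each containing a single point, which is why group = 1 is set in the geom_line() calls above.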

How to show the ticks' labels on an axis while having single data point in R using ggplot?

I have a dataframe with one row, and I'd like to plot it with a horizontal axis of type datetime. For some reason, when I have a single dot, there are no ticks on the horizontal axis.
table_hr_tags_per_bin <- data.frame(matrix(c("2018-11-21 12:40:35", "25"),nrow = 1,ncol = 2))
colnames(table_hr_tags_per_bin) <-c('StartTimeStamp', 'cars')
plot_conf = ggplot() +
geom_point(data = table_hr_tags_per_bin, aes_string(x='StartTimeStamp', y= "cars"),colour = "red", size=3) +
labs(subtitle="plot_name",
y="y_axis_name",
x="Time",
title="my mitle",
caption = "") +
theme(axis.text.x = element_text(angle = 80, hjust = 1)) +
scale_x_datetime(date_breaks = paste0(4," sec"), label=function(x) substr(x,12,19))+
scale_y_continuous(breaks=waiver())
plot(plot_conf)
The problematic output is shown below:
Any suggestion would be helpful!
Maybe I am wrong in anticipating what you mean; if not, I think your use of datetime and scale_x_datetime is not right.
If you use the lubridate package and the right format for dates, it is probably much easier to get what you want. I have added a second date with a second value, and then filter it out again, to come nearer to what you wanted, which is showing just one single point.
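library(lubridate)
library(dplyr)
library(ggplot2)
# parse the timestamps into date-times instead of leaving them as text
df <- tibble(dt = ymd_hms(c("2018-11-21T12:40:35",
"2018-11-22T12:41:35")),
value = c(25, 26))
ggplot(df %>% filter(dt < ymd_hms("2018-11-22T12:41:35")), aes(dt, value)) + geom_point()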
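Another option that should give the axis a range to place ticks on is to parse the timestamp and set explicit limits around the single observation. This is a sketch under assumptions: the UTC time zone and the padding of 10 seconds are arbitrary choices, and one_row, pad and x_limits are illustrative names.

```r
library(ggplot2)

# Parse the timestamp instead of leaving it as text (UTC is an assumption)
one_row <- data.frame(
  StartTimeStamp = as.POSIXct("2018-11-21 12:40:35", tz = "UTC"),
  cars = 25
)

# Hypothetical padding of 10 seconds on each side of the single observation
pad <- 10
x_limits <- c(one_row$StartTimeStamp - pad, one_row$StartTimeStamp + pad)

p <- ggplot(one_row, aes(x = StartTimeStamp, y = cars)) +
  geom_point(colour = "red", size = 3) +
  scale_x_datetime(limits = x_limits,
                   date_breaks = "4 sec",
                   date_labels = "%H:%M:%S") +
  theme(axis.text.x = element_text(angle = 80, hjust = 1))
```

With a 20 second window and breaks every 4 seconds, the axis has room for several labelled ticks even though only one point is drawn.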

ggplot, facet, piechart: placing text in the middle of pie chart slices

I'm trying to produce a facetted pie-chart with ggplot and facing problems with placing text in the middle of each slice:
dat = read.table(text = "Channel Volume Cnt
AGENT high 8344
AGENT medium 5448
AGENT low 23823
KIOSK high 19275
KIOSK medium 13554
KIOSK low 38293", header=TRUE)
vis = ggplot(data=dat, aes(x=factor(1), y=Cnt, fill=Volume)) +
geom_bar(stat="identity", position="fill") +
coord_polar(theta="y") +
facet_grid(Channel~.) +
geom_text(aes(x=factor(1), y=Cnt, label=Cnt, ymax=Cnt),
position=position_fill(width=1))
The output:
What parameters of geom_text should be adjusted in order to place numerical labels in the middle of piechart slices?
A related question is "Pie plot getting its text on top of each other", but it doesn't handle the case with facets.
UPDATE: Following Paul Hiemstra's advice and the approach in the question above, I changed the code as follows:
---> pie_text = dat$Cnt/2 + c(0,cumsum(dat$Cnt)[-length(dat$Cnt)])
vis = ggplot(data=dat, aes(x=factor(1), y=Cnt, fill=Volume)) +
geom_bar(stat="identity", position="fill") +
coord_polar(theta="y") +
facet_grid(Channel~.) +
geom_text(aes(x=factor(1),
---> y=pie_text,
label=Cnt, ymax=Cnt), position=position_fill(width=1))
As I expected, the tweaked text coordinates are absolute, but they need to be computed within each facet's data:
NEW ANSWER: With the introduction of ggplot2 v2.2.0, position_stack() can be used to position the labels without the need to calculate a position variable first. The following code will give you the same result as the old answer:
ggplot(data = dat, aes(x = "", y = Cnt, fill = Volume)) +
geom_bar(stat = "identity") +
geom_text(aes(label = Cnt), position = position_stack(vjust = 0.5)) +
coord_polar(theta = "y") +
facet_grid(Channel ~ ., scales = "free")
To remove "hollow" center, adapt the code to:
ggplot(data = dat, aes(x = 0, y = Cnt, fill = Volume)) +
geom_bar(stat = "identity") +
geom_text(aes(label = Cnt), position = position_stack(vjust = 0.5)) +
scale_x_continuous(expand = c(0,0)) +
coord_polar(theta = "y") +
facet_grid(Channel ~ ., scales = "free")
OLD ANSWER: The solution to this problem is creating a position variable, which can be done quite easily with base R or with the data.table, plyr or dplyr packages:
Step 1: Creating the position variable for each Channel
# with base R
dat$pos <- with(dat, ave(Cnt, Channel, FUN = function(x) cumsum(x) - 0.5*x))
# with the data.table package
library(data.table)
setDT(dat)
dat <- dat[, pos:=cumsum(Cnt)-0.5*Cnt, by="Channel"]
# with the plyr package
library(plyr)
dat <- ddply(dat, .(Channel), transform, pos=cumsum(Cnt)-0.5*Cnt)
# with the dplyr package
library(dplyr)
dat <- dat %>% group_by(Channel) %>% mutate(pos=cumsum(Cnt)-0.5*Cnt)
Step 2: Creating the facetted plot
library(ggplot2)
ggplot(data = dat) +
geom_bar(aes(x = "", y = Cnt, fill = Volume), stat = "identity") +
geom_text(aes(x = "", y = pos, label = Cnt)) +
coord_polar(theta = "y") +
facet_grid(Channel ~ ., scales = "free")
The result:
I would like to speak out against the conventional way of making pies in ggplot2, which is to draw a stacked barplot in polar coordinates. While I appreciate the mathematical elegance of that approach, it does cause all sorts of headaches when the plot doesn't look quite the way it's supposed to. In particular, precisely adjusting the size of the pie can be difficult. (If you don't know what I mean, try to make a pie chart that extends all the way to the edge of the plot panel.)
I prefer drawing pies in a normal cartesian coordinate system, using geom_arc_bar() from ggforce. It requires a little bit of extra work on the front end, because we have to calculate angles ourselves, but that's easy and the level of control we get as a result is more than worth it.
I've used this approach in previous answers here and here.
The data (from the question):
dat = read.table(text = "Channel Volume Cnt
AGENT high 8344
AGENT medium 5448
AGENT low 23823
KIOSK high 19275
KIOSK medium 13554
KIOSK low 38293", header=TRUE)
The pie-drawing code:
library(ggplot2)
library(ggforce)
library(dplyr)
# calculate the start and end angles for each pie
dat_pies <- left_join(dat,
dat %>%
group_by(Channel) %>%
summarize(Cnt_total = sum(Cnt))) %>%
group_by(Channel) %>%
mutate(end_angle = 2*pi*cumsum(Cnt)/Cnt_total, # ending angle for each pie slice
start_angle = lag(end_angle, default = 0), # starting angle for each pie slice
mid_angle = 0.5*(start_angle + end_angle)) # middle of each pie slice, for the text label
rpie = 1 # pie radius
rlabel = 0.6 * rpie # radius of the labels; a number slightly larger than 0.5 seems to work better,
# but 0.5 would place it exactly in the middle as the question asks for.
# draw the pies
ggplot(dat_pies) +
geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = rpie,
start = start_angle, end = end_angle, fill = Volume)) +
geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = Cnt),
hjust = 0.5, vjust = 0.5) +
coord_fixed() +
scale_x_continuous(limits = c(-1, 1), name = "", breaks = NULL, labels = NULL) +
scale_y_continuous(limits = c(-1, 1), name = "", breaks = NULL, labels = NULL) +
facet_grid(Channel~.)
To show why I think this approach is so much more powerful than the conventional (coord_polar()) approach, let's say we want the labels on the outside of the pie rather than inside. This creates a couple of problems: we will have to adjust hjust and vjust depending on the side of the pie a label falls on, and we will also have to make the plot panel wider than high to make space for the labels on the side without generating excessive space above and below. Solving these problems in the polar coordinate approach is not fun, but it's trivial in cartesian coordinates:
# generate hjust and vjust settings depending on the quadrant into which each
# label falls
dat_pies <- mutate(dat_pies,
hjust = ifelse(mid_angle>pi, 1, 0),
vjust = ifelse(mid_angle<pi/2 | mid_angle>3*pi/2, 0, 1))
rlabel = 1.05 * rpie # now we place labels outside of the pies
ggplot(dat_pies) +
geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = rpie,
start = start_angle, end = end_angle, fill = Volume)) +
geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = Cnt,
hjust = hjust, vjust = vjust)) +
coord_fixed() +
scale_x_continuous(limits = c(-1.5, 1.4), name = "", breaks = NULL, labels = NULL) +
scale_y_continuous(limits = c(-1, 1), name = "", breaks = NULL, labels = NULL) +
facet_grid(Channel~.)
To tweak the position of the label text relative to the coordinate, you can use the vjust and hjust arguments of geom_text. This will determine the position of all labels simultaneously, so this might not be what you need.
Alternatively, you could tweak the coordinate of the label. Define a new data.frame in which the label coordinate is the midpoint of each slice, i.e. the cumulative sum of Cnt up to and including that slice minus half of its own Cnt (label_y[i] = cumsum(Cnt)[i] - Cnt[i]/2), computed separately for each Channel. Just pass this new data.frame to geom_text in place of the original data.frame.
In addition, piecharts have some visual interpretation flaws. In general I would not use them, especially where good alternatives exist, e.g. a dotplot:
ggplot(dat, aes(x = Cnt, y = Volume)) +
geom_point() +
facet_wrap(~ Channel, ncol = 1)
For example, from this plot it is obvious that Cnt is higher for Kiosk than for Agent; this information is lost in the piechart.
The following answer is partial and clunky, and I won't accept it.
The hope is that it will solicit a better solution.
text_KIOSK = dat$Cnt
text_AGENT = dat$Cnt
text_KIOSK[dat$Channel=='AGENT'] = 0
text_AGENT[dat$Channel=='KIOSK'] = 0
text_KIOSK = text_KIOSK/1.7 + c(0,cumsum(text_KIOSK)[-length(dat$Cnt)])
text_AGENT = text_AGENT/1.7 + c(0,cumsum(text_AGENT)[-length(dat$Cnt)])
text_KIOSK[dat$Channel=='AGENT'] = 0
text_AGENT[dat$Channel=='KIOSK'] = 0
pie_text = text_KIOSK + text_AGENT
vis = ggplot(data=dat, aes(x=factor(1), y=Cnt, fill=Volume)) +
geom_bar(stat="identity", position=position_fill(width=1)) +
coord_polar(theta="y") +
facet_grid(Channel~.) +
geom_text(aes(y=pie_text, label=format(Cnt,format="d",big.mark=','), ymax=Inf), position=position_fill(width=1))
It produces the following chart:
As you can see, I can't move the labels for green (low).

Resources