I am using the built-in economics dataset (from the ggplot2 package) in R, and have plotted a time series for each variable in the same graph using the following code:
library(reshape2)
library(ggplot2)
me <- melt(economics, id = c("date"))
ggplot(data = me) +
geom_line(aes(x = date, y = value)) +
facet_wrap(~variable, ncol = 1, scales = 'free_y')
Now I want to refine the graph further: for each series, I want to display a red point at the smallest and the largest value.
So I thought that if I could find the coordinates of the min and max of each time series, I could find a way to plot a red dot at each of them. For this I used the following code:
which(economics$pce == min(economics$pce), arr.ind = TRUE)
which(economics$pce == max(economics$pce), arr.ind = TRUE)
This doesn't really lead me anywhere.
Thank you:)
Method 1: Using Joins
This can be nice when you want to keep the filtered subsets (here me.2 and me.3) for later use.
library(reshape2)
library(ggplot2)
library(dplyr)
me <- melt(economics, id=c("date"))
me %>%
group_by(variable) %>%
summarise(min = min(value),
max = max(value)) -> me.2
left_join(me, me.2, by = "variable") %>%
mutate(color = value == min | value == max) %>%
filter(color == TRUE) -> me.3
ggplot(data=me, aes(x = date, y = value)) +
geom_line() +
geom_point(data=me.3, aes(x = date, y = value), color = "red") +
facet_wrap(~variable, ncol=1, scales='free_y')
Method 2: Simplified without Joins
Thanks @Gregor
me.2 <- me %>%
group_by(variable) %>%
mutate(color = (min(value) == value | max(value) == value))
ggplot(data=me.2, aes(x = date, y = value)) +
geom_line() +
geom_point(aes(color = color)) +
facet_wrap(~variable, ncol=1, scales="free_y") +
scale_color_manual(values = c(NA, "red"))
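The coordinate-lookup idea from the question can also be made to work directly. Below is a minimal base R sketch, not part of either method above, that uses which.min() and which.max() on each column; the names vars, extremes and type are made up for illustration. Note that both functions return only the first matching index, so a tied minimum or maximum gets a single dot here, whereas the == comparisons above mark every tie.

```r
library(ggplot2)

# Series names: every column of `economics` except the date
vars <- setdiff(names(economics), "date")

# One row per series and extreme, located with which.min()/which.max()
# (each returns the first matching index, so ties yield a single point)
extremes <- do.call(rbind, lapply(vars, function(v) {
  idx <- c(which.min(economics[[v]]), which.max(economics[[v]]))
  data.frame(
    variable = v,
    date     = economics$date[idx],
    value    = economics[[v]][idx],
    type     = c("min", "max")
  )
}))

# Long-format copy of the data for the line layer
me <- do.call(rbind, lapply(vars, function(v) {
  data.frame(variable = v, date = economics$date, value = economics[[v]])
}))

p <- ggplot(me, aes(x = date, y = value)) +
  geom_line() +
  geom_point(data = extremes, colour = "red") +
  facet_wrap(~variable, ncol = 1, scales = "free_y")
```

Calling print(p) draws the facets; extremes can be inspected on its own to read off the dates of each minimum and maximum.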
Related
I would like to plot the time series of my data. However, there are some gaps in the date values, as in the example below. The following code produces the plot disregarding the missing dates. How can I show the missing dates, i.e. show a gap between 2021-01-02 and 2021-01-04 and similarly between 2021-01-06 and 2021-01-08?
library(tidyverse)
fake.data <- data.frame(
varA = c(0.6,0.5,0.2,0.3,0.7),
varB = c(0.1,0.2,0.4,0.6,0.2),
varC = c(0.3,0.3,0.4,0.1,0.1),
start_date = as.Date(c('2021-01-01','2021-01-02','2021-01-04','2021-01-06','2021-01-08')),
stringsAsFactors = FALSE
)
fake.data %>%
gather(variable, value,varA:varC) %>%
ggplot(aes(x = start_date, y = value, fill = variable)) +
geom_area()
I guess the easiest would be to fake the gaps, e.g., with geom_rect.
Consider that "gaps in data" are actually inherent to most uses of line / area graphs. Some purists might be totally against showing lines / areas for non-continuous measurements, because doing so suggests continuous measurement. Since the plot is interpolated anyway, you could argue that you do not need to show those gaps at all.
library(tidyverse)
fake.data <- data.frame(
varA = c(0.6,0.5,0.2,0.3,0.7),
varB = c(0.1,0.2,0.4,0.6,0.2),
varC = c(0.3,0.3,0.4,0.1,0.1),
start_date = as.Date(c('2021-01-01','2021-01-02','2021-01-04','2021-01-06','2021-01-08'))
) %>% pivot_longer(cols = matches("^var"), names_to = "variable", values_to = "value" )
ls_data <- setNames(fake.data %>%
complete(start_date = full_seq(start_date, 1)) %>%
split(., is.na(.$variable)), c("vals", "missing"))
ggplot(ls_data$vals, aes(x = start_date, y = value, fill = variable)) +
geom_area() +
geom_rect(data = ls_data$missing, aes(xmin = start_date-.5, xmax = start_date+.5,
ymin = 0, ymax = Inf), fill = "white") +
theme_classic()
Created on 2021-04-21 by the reprex package (v2.0.0)
Considering the above - I'd possibly favour not explicitly showing the gaps, but to show the measurements more explicitly. E.g., with geom_point.
fake.data %>%
ggplot(aes(x = start_date, y = value, fill = variable)) +
geom_area() +
geom_point(position = "stack") +
geom_line(position = "stack")
Is this close to what you want? It fills in the missing dates and sets their values to 0:
todateseq<-fake.data %>%
select(start_date) %>%
pull
first <- min(todateseq)
last <- max(todateseq)
date_seq <- seq.Date(first,last,by='day')
fake.data2 <- data.frame(start_date=date_seq) %>%
left_join(fake.data,by='start_date')
fake.data2 %>%
gather(variable, value,varA:varC) %>%
mutate(value=ifelse(is.na(value),0,value)) %>%
ggplot(aes(x = start_date, y = value, fill = variable)) +
geom_area(na.rm = FALSE, position = position_stack())
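The zero-filling above can probably be written more compactly with tidyr::complete(), which the geom_rect answer already uses. The following is only a sketch under that assumption; filled is a made-up name. Every missing date gets one row per variable with value 0, so the stacked areas drop to the baseline on those days (with sloped edges, since geom_area still connects neighbouring points).

```r
library(dplyr)
library(tidyr)
library(ggplot2)

fake.data <- data.frame(
  varA = c(0.6, 0.5, 0.2, 0.3, 0.7),
  varB = c(0.1, 0.2, 0.4, 0.6, 0.2),
  varC = c(0.3, 0.3, 0.4, 0.1, 0.1),
  start_date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-04",
                         "2021-01-06", "2021-01-08"))
)

filled <- fake.data %>%
  pivot_longer(varA:varC, names_to = "variable", values_to = "value") %>%
  # add every missing day for every variable and set its value to 0
  complete(start_date = full_seq(start_date, 1), variable,
           fill = list(value = 0))

p <- ggplot(filled, aes(x = start_date, y = value, fill = variable)) +
  geom_area()
```

Here full_seq(start_date, 1) expands the dates to a daily sequence, and listing variable inside complete() makes sure each new date appears once per series.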
I want to calculate the interpolated x-intercepts of lines like these with a geom_hline:
library(dplyr)
library(ggplot2)
g1=data.frame(grp="1", x=seq(1,50,1), y=rnorm(50,5,1))
g2=data.frame(grp="2", x=seq(1,30,1), y=rnorm(30,8,2))
g3=data.frame(grp="3", x=seq(1,45,1), y=rnorm(45,10,1))
comb.dat=rbind(g1,g2,g3)
plot.dat=comb.dat %>% group_by(grp) %>% mutate(cum=cumsum(y)/sum(y))
p1=ggplot(plot.dat, aes(x = x, y = cum, color=grp)) +geom_line()+ geom_hline(yintercept=.5, linetype="dashed", color = "black")
print(p1)
hline=data.frame(plot.dat %>% group_by(grp) %>% mutate(test=cum>0.49 & cum<0.51))#
print(hline[hline$test==T,])# only works when an exact value in the range exists
F1=approxfun(plot.dat$cum,plot.dat$x,ties=mean)
#data.frame(plot.dat %>% group_by(grp) %>% mutate(F1(0.50)))#works only on first group
g1b=plot.dat[plot.dat$grp=="1",]
F2=approxfun(g1b$cum,g1b$x)
F2(0.5)#works but inefficient
I have a lot of these plots and I am looking for the most efficient way to find, for each factor level, the x-intercept with the horizontal line, so the intercept values can be compared to each other and annotated on the plot. I thought there was a purely logical approach, but then I realized I need interpolation, probably using approxfun. I have not found a way to do this without breaking the groups out of the data frame and doing it one by one. Thanks for setting me straight.
Here's a dplyr approach using base::approx.
x_seq = seq(1, 50, by = 0.01)
intersections <- plot.dat %>%
group_by(grp) %>%
summarise(interpolated = approx(x = x, y = cum, xout = x_seq)$y) %>%
mutate(x_seq = x_seq) %>%
slice_min(abs(interpolated - 0.5))
ggplot(plot.dat, aes(x = x, y = cum, color=grp)) +
geom_line() +
geom_hline(yintercept=.5, linetype="dashed", color = "black") +
geom_point(data = intersections, aes(x_seq, interpolated), size = 3) +
geom_text(data = intersections, aes(x_seq, interpolated, label = x_seq), vjust = -1)
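As a possible alternative to scanning a fine grid, the roles of x and y in approx() can be swapped so that it interpolates x at cum = 0.5 directly, once per group. This is a sketch that assumes cum is increasing within each group (true whenever all y values are positive, as they almost certainly are for these simulated data); the seed and the names intercepts and x_at_half are made up for illustration.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
comb.dat <- rbind(
  data.frame(grp = "1", x = 1:50, y = rnorm(50, 5, 1)),
  data.frame(grp = "2", x = 1:30, y = rnorm(30, 8, 2)),
  data.frame(grp = "3", x = 1:45, y = rnorm(45, 10, 1))
)

plot.dat <- comb.dat %>%
  group_by(grp) %>%
  mutate(cum = cumsum(y) / sum(y)) %>%
  ungroup()

# Inverse interpolation: treat cum as the input and x as the output
intercepts <- plot.dat %>%
  group_by(grp) %>%
  summarise(x_at_half = approx(x = cum, y = x, xout = 0.5)$y)

p <- ggplot(plot.dat, aes(x = x, y = cum, color = grp)) +
  geom_line() +
  geom_hline(yintercept = 0.5, linetype = "dashed", color = "black") +
  geom_point(data = intercepts, aes(x = x_at_half, y = 0.5), size = 3)
```

intercepts has one row per grp, so the values can be compared across groups or passed to geom_text for annotation.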
I have a ggplot with facets and colors. The colors are mapped to "ID" and the facet columns to "Type". Each ID always belongs to the same Type, but there is a different number of IDs in each Type. I would like to restart the colors in each facet column so that the colors within a column differ more.
ggplot(data = plt_cont_em, aes(x = Jahr, y = Konz)) +
geom_point(aes(color=factor(ID))) +
facet_grid(Schadstoff_ID ~ Type, scales = "free_y")
Now it looks like:
I understand that I have to introduce a dummy variable for the color. But is there an easy way of numbering the IDs within each Type, starting at 1 in each Type?
Since the data is confidential, I created dummy data that shows the same problem.
ID<-c()
Type<-c()
Jahr<-c()
Schadstoff<-c()
Monat<-c()
Konz<-c()
for (i in 1:25){
#i = ID
#t = Type
t<-sample(c("A","B","C"),1)
for (j in 1:5){
#j = Schadstoff
if(runif(1)<0.75){
for(k in 2015:2020){
#k = Jahr
for(l in 1:12){
#l = Monat
if(runif(1)<0.9){
ID<-c( ID,i)
Type<-c( Type,t)
Jahr<-c( Jahr,k)
Schadstoff<-c( Schadstoff,j)
Monat<-c( Monat,l)
Konz<-c( Konz,runif(1))
}
}
}
}
}
}
tmp<- data.frame(ID,Type, Jahr, Schadstoff, Monat, Konz)
tmp<-tmp %>% group_by( Type) %>% mutate( Color=row_number())
p<-ggplot(data = tmp, aes(x = Jahr, y = Konz)) +
geom_point(aes(color=factor(Color)), size=0.8) +
facet_grid(Schadstoff ~ Type, scales = "free") +
theme_light() + theme(axis.text.x = element_text(angle = 45, hjust = 1))
p
The problem still exists: the grouping doesn't work, because row_number() makes Color unique for each row.
Using dplyr you can group_by Type and create a new column with the dense_rank of the ID inside each group:
plt_cont_em %>%
group_by(Type) %>%
mutate(Type_ID = dense_rank(ID)) %>%
ggplot() +
...
This will 'rank' each ID from smallest to biggest inside the group, giving records with the same ID the same value.
You will probably then want to hide the legend, as it will make little sense.
library(dplyr)
library(ggplot2)
# Using provided random data
tmp <- tmp %>%
group_by(Type) %>%
mutate(Color = dense_rank(ID))
ggplot(data = tmp, aes(x = Jahr, y = Konz)) +
geom_point(aes(color = factor(Color)), size = 0.8) +
facet_grid(Schadstoff ~ Type, scales = "free") +
theme_light() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Created on 2020-04-02 by the reprex package (v0.3.0)
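To see what dense_rank() does within groups, here is a small sketch on made-up toy data (the toy data frame is hypothetical, not the asker's data). It also shows one way to hide the legend, using theme(legend.position = "none").

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: IDs 1 and 3 belong to Type A; 2, 4 and 5 to Type B
toy <- data.frame(
  ID   = c(1, 1, 2, 3, 3, 4, 5),
  Type = c("A", "A", "B", "A", "A", "B", "B"),
  Jahr = c(2015, 2016, 2015, 2015, 2016, 2015, 2015),
  Konz = c(0.2, 0.4, 0.1, 0.8, 0.6, 0.5, 0.9)
)

toy <- toy %>%
  group_by(Type) %>%
  mutate(Color = dense_rank(ID)) %>%  # restarts at 1 inside each Type
  ungroup()

p <- ggplot(toy, aes(x = Jahr, y = Konz)) +
  geom_point(aes(color = factor(Color))) +
  facet_grid(. ~ Type) +
  theme(legend.position = "none")  # legend carries little meaning here
```

In toy, IDs 1 and 3 become colors 1 and 2 in Type A, while IDs 2, 4 and 5 become colors 1, 2 and 3 in Type B, and repeated rows of the same ID share a color.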
I'm a new stackoverflow user and can't comment currently on the original post to ask a question. I came across a previous stackoverflow answer (https://stackoverflow.com/a/34045068/11799491) and I was wondering how you would add two vertical lines (mean of the group and median of the group) to this graph here.
My attempt: I don't know how to add in the group variable "type"
geom_vline(aes(xintercept = mean(diff), ), color="black") +
geom_vline(aes(xintercept = median(diff), ), color="red")
There are a few different ways to do this, but I like creating a separate summarized data frame and then passing that into the geom_vline call. This lets you analyze the results and makes it easy to add multiple lines that are automatically sorted and colored by type:
library(tidyverse)
df <-
tibble(
x = rnorm(40),
category = rep(c(0, 1), each = 20)
)
df_stats <-
df %>%
group_by(category) %>%
summarize(
mean = mean(x),
median = median(x)
) %>%
gather(key = key, value = value, mean:median)
df %>%
ggplot(aes(x = x)) +
geom_histogram(bins = 20) +
facet_wrap(~ category) +
geom_vline(data = df_stats, aes(xintercept = value, color = key))
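In current tidyr, gather() is superseded by pivot_longer(), so the summary step could be sketched as follows. The seed is arbitrary and only there for reproducibility; everything else mirrors the code above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(42)  # arbitrary seed for reproducibility
df <- data.frame(
  x = rnorm(40),
  category = rep(c(0, 1), each = 20)
)

# One row per (category, statistic) pair
df_stats <- df %>%
  group_by(category) %>%
  summarise(mean = mean(x), median = median(x)) %>%
  pivot_longer(c(mean, median), names_to = "key", values_to = "value")

p <- ggplot(df, aes(x = x)) +
  geom_histogram(bins = 20) +
  facet_wrap(~ category) +
  geom_vline(data = df_stats, aes(xintercept = value, color = key))
```

Because df_stats contains the category column, each facet only receives the two lines that belong to it.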
The easiest way is to pre-compute the means and the medians by groups of type. I will do it with aggregate.
agg <- aggregate(diff ~ type, data, function(x) {
c(mean = mean(x), median = median(x))
})
agg <- cbind(agg[1], agg[[2]])
agg <- reshape2::melt(agg, id.vars = "type")
library(ggplot2)
ggplot(data, aes(x = diff)) +
geom_histogram() +
geom_vline(data = agg, mapping = aes(xintercept = value,
color = variable)) +
facet_grid(~type) +
theme_bw()
I needed to add some partial boxplots to the following plot:
library(tidyverse)
foo <- tibble(
time = 1:100,
group = sample(c("a", "b"), 100, replace = TRUE) %>% as.factor()
) %>%
group_by(group) %>%
mutate(value = rnorm(n()) + 10 * as.integer(group)) %>%
ungroup()
foo %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE)
I would like to add a grid of (2 x 4 = 8) boxplots (4 per group) to the plot above. Each boxplot should cover a consecutive selection of 25 (or n) points (in each group). That is, the first two boxplots represent the points between the 1st and the 25th (one boxplot below for group a, and one boxplot above for group b). Next to them come two more boxplots for the points between the 26th and the 50th, and so on. If they are not in a perfect grid (which I suppose would be both harder to obtain and uglier), even better: I would prefer them to "follow" their corresponding smooth line!
All of this without using facets (because I have to insert them in a plot which is already faceted :-))
I tried to
bar <- foo %>%
group_by(group) %>%
mutate(cut = 12.5 * (time %/% 25)) %>%
ungroup()
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(x = cut))
but it doesn't work.
I tried to call geom_boxplot() using group instead of x
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(group = cut))
But it draws the boxplots without considering the groups, and even loses the colors (adding a redundant color = group to the call doesn't help).
Finally, I decided to try it roughly:
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(data = filter(bar, group == "a"), aes(group = cut)) +
geom_boxplot(data = filter(bar, group == "b"), aes(group = cut))
And it works (maintaining even the correct colors from the main aes)!
Does someone know if it is possible to obtain it using a single call to geom_boxplot()?
Thanks!
This was interesting! I haven't tried to use geom_boxplot with a continuous x before and didn't know how it behaved. I think what is happening is that setting group overrides colour in geom_boxplot, so it doesn't respect either the inherited or the repeated colour aesthetic. I think this workaround does the trick: we combine the group and cut variables into group_cut, which takes 8 different values (one for each desired boxplot). Now we can map aes(group = group_cut) and get the desired output. I don't think this is particularly intuitive, and it might be worth raising on GitHub, since we usually expect aesthetics to combine nicely (e.g. combining colour and linetype works fine).
library(tidyverse)
bar <- tibble(
time = 1:100,
group = sample(c("a", "b"), 100, replace = TRUE) %>% as.factor()
) %>%
group_by(group) %>%
mutate(
value = rnorm(n()) + 10 * as.integer(group),
cut = 12.5 * ((time - 1) %/% 25), # modified this to prevent an extra boxplot
group_cut = str_c(group, cut)
) %>%
ungroup()
bar %>%
ggplot(aes(x = time, y = value, colour = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(group = group_cut), position = "identity")
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Created on 2019-08-13 by the reprex package (v0.3.0)
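The combined grouping variable could probably also be built inline with base R's interaction(), which avoids the extra group_cut column. This is a sketch of that variation under the same setup, with an arbitrary seed added for reproducibility.

```r
library(dplyr)
library(ggplot2)

set.seed(123)  # arbitrary seed for reproducibility

bar <- data.frame(
  time  = 1:100,
  group = factor(sample(c("a", "b"), 100, replace = TRUE))
) %>%
  group_by(group) %>%
  mutate(
    value = rnorm(n()) + 10 * as.integer(group),
    cut   = 12.5 * ((time - 1) %/% 25)  # four bins of 25 time points
  ) %>%
  ungroup()

p <- ggplot(bar, aes(x = time, y = value, colour = group)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  # one boxplot per (group, cut) combination, built on the fly
  geom_boxplot(aes(group = interaction(group, cut)), position = "identity")
```

interaction(group, cut) yields one level per combination of the two variables, which plays the same role as group_cut above.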