I need help plotting 741 lines in a ggplot.
The colour of each line should not change along the line, i.e. it should be assigned only by that line's final value of eci.
I would like to display the name of each line (in the code example, "unit") at the beginning and at the end of the line.
Of course, more than 700 lines are hard to tell apart with the naked eye, but are there any suggestions for making the lines more distinguishable?
library(ggplot2)
df <- data.frame(unit = rep(1:741, 4),
                 year = rep(c(2012, 2013, 2014, 2015), each = 741),
                 eci = round(runif(2964, 1, 741), digits = 0))
g <- ggplot(data = df, aes(x = year, y = eci, group = unit)) +
  geom_line(aes(colour = eci), size = 0.01) +
  scale_colour_gradientn(colours = terrain.colors(10)) +
  geom_point(aes(colour = eci), size = 0.04)
g
# The colour of each line should be determined by its eci value for year == 2015
One way to achieve your desired result is to create new columns with extra information to use when plotting with ggplot2.
With dplyr, we group the data by unit and sort it, so we can create a column that stores the last eci value of each unit, plus two columns holding the unit label for the first and the last year, which we can then add to the plot as text.
library(dplyr)
df_new <- df %>%
  group_by(unit) %>%
  arrange(unit, year, eci) %>%
  mutate(last_eci = last(eci),
         first_year = ifelse(year == 2012, unit, ""),
         last_year = ifelse(year == 2015, unit, ""))
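If dplyr is not available, the same three helper columns can be built in base R. This is a sketch that assumes the long data format from the question; add_line_labels() is a hypothetical helper name, and ave() stands in for group_by() + mutate(). The result has the same columns as df_new, so it can be plotted the same way.

```r
# Base-R sketch of the same preprocessing (no dplyr). add_line_labels() is a
# hypothetical helper; it expects a long data frame with unit, year and eci.
add_line_labels <- function(df) {
  # Sort by unit, then year, so the last row of each unit is its final year
  df <- df[order(df$unit, df$year), ]
  # ave() evaluates FUN within each unit and recycles the single result
  df$last_eci <- ave(df$eci, df$unit, FUN = function(v) v[length(v)])
  # Label only the first and the last year of each line
  df$first_year <- ifelse(df$year == min(df$year), as.character(df$unit), "")
  df$last_year <- ifelse(df$year == max(df$year), as.character(df$unit), "")
  df
}

set.seed(1)
df <- data.frame(unit = rep(1:741, 4),
                 year = rep(c(2012, 2013, 2014, 2015), each = 741),
                 eci = round(runif(2964, 1, 741), digits = 0))
df_new <- add_line_labels(df)
head(df_new)
```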
Then, we plot it.
ggplot(data = df_new,
aes(x = year, y = eci, group = unit, colour = last_eci)) +
geom_line(size = 0.01) +
geom_text(aes(label = first_year), nudge_x = -0.05, color = "black") +
geom_text(aes(label = last_year), nudge_x = 0.05, color = "black") +
scale_colour_gradientn(colours = terrain.colors(10)) +
geom_point(aes(colour = eci), size = 0.04)
Of course, the resulting plot makes it clear that drawing more than 700 lines in different colours, plus more than 1400 labels, in a single plot is not advisable.
I'd use relevant subsets of df instead, to produce plots that help us better understand the data.
df_new %>%
filter(unit %in% c(1:10)) %>%
ggplot(data = .,
aes(x = year, y = eci, group = unit, colour = last_eci)) +
geom_line(size = 0.01) +
geom_text(aes(label = first_year), nudge_x = -0.05, color = "black") +
geom_text(aes(label = last_year), nudge_x = 0.05, color = "black") +
scale_colour_gradientn(colours = terrain.colors(10)) +
geom_point(aes(colour = eci), size = 0.04)
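Units 1:10 are an arbitrary choice. One way to get a more relevant subset, sketched here as an assumption rather than as part of the answer, is to keep the units with the highest final eci; top_units() is a hypothetical helper written in base R.

```r
# Sketch: choose the units whose final (2015) eci is highest, instead of the
# arbitrary units 1:10. top_units() is a hypothetical helper (base R only).
top_units <- function(df, n = 10, final_year = max(df$year)) {
  final <- df[df$year == final_year, c("unit", "eci")]
  # Highest final values first; head() keeps the first n units
  final <- final[order(final$eci, decreasing = TRUE), ]
  head(final$unit, n)
}

set.seed(1)
df <- data.frame(unit = rep(1:741, 4),
                 year = rep(c(2012, 2013, 2014, 2015), each = 741),
                 eci = round(runif(2964, 1, 741), digits = 0))
keep <- top_units(df, n = 10)
df_subset <- df[df$unit %in% keep, ]
```

keep can then replace c(1:10) in the filter(unit %in% ...) step above.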
For better readability, I have opted for a 10-line example, using the directlabels package.
library(ggplot2)
library(dplyr)
library(directlabels)
set.seed(95)
l <- 10
df1 <- data.frame(unit=rep(1:l, 4),
year=rep(c(2012, 2013, 2014, 2015), each=l),
eci=round(runif(4*l, 1, l), digits = 0))
# Final (2015) eci of each unit, stored as "end" and joined back onto every row
df2 <- df1 %>% filter(year == 2015) %>% select(-year, end = eci)
df <- left_join(df1, df2, by = "unit")
g <-
ggplot(data = df, aes(x=year,
y=eci,
group=unit)) +
geom_line(aes(colour=end), size=0.01) +
scale_colour_gradientn(colours = terrain.colors(10)) +
geom_point(aes(colour=eci), size=0.04) +
geom_dl(aes(label = unit,color = end), method = list(dl.combine("first.points", "last.points"), cex = 0.8))
g
Half a year later, I think there is a much easier solution: apply parcoord() from the MASS package to a wide data frame.
library(dplyr)
set.seed(95)
l <- 1000 # really 1000 observations per year this time
df1 <- data.frame(unit = rep(1:l, 4),
                  year = rep(c(2012, 2013, 2014, 2015), each = l),
                  eci = round(runif(4*l, 1, l), digits = 0))
df1 <- tidyr::spread(df1, year, eci) # change from long to wide
df1 <- df1 %>%
  dplyr::arrange(desc(`2015`)) # order the rows by the final year (2015)
# create 10 different colours, each repeated 100 times (l / 10)
my_colors <- rep(terrain.colors(11)[-1], each = 100)
MASS::parcoord(df1[, 2:5], col = my_colors)
This is more efficient and scales easily.
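The rep(..., each = 100) line only works because l is 1000 and the rows are sorted by 2015. A hedged generalisation, assuming ten equal-sized colour bins are what you want, ranks the final values and bins the ranks with cut(); colour_by_final() is a hypothetical helper. For rows sorted by 2015 and l = 1000 it reproduces my_colors exactly.

```r
# Sketch: one colour per row, derived from that row's final-year value, so the
# colours no longer rely on l being exactly 1000 or on the rows being sorted.
# colour_by_final() is a hypothetical helper; terrain.colors() is base R.
colour_by_final <- function(final, n_colours = 10) {
  # Same palette as the answer: terrain.colors(11) without its first colour
  palette <- terrain.colors(n_colours + 1)[-1]
  # Rank from highest to lowest, then cut the ranks into n_colours equal-width bins
  ranks <- rank(-final, ties.method = "first")
  bins <- cut(ranks, breaks = n_colours, labels = FALSE)
  palette[bins]
}

set.seed(95)
final <- round(runif(1000, 1, 1000))
my_colors <- colour_by_final(final)
table(my_colors)

# With the wide df1 from the answer this would be:
# MASS::parcoord(df1[, 2:5], col = colour_by_final(df1$`2015`))
```

Sorting the rows may still matter: as far as I know, parcoord() draws the rows in order, so later rows end up on top.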
I am plotting maximum temperature (mean_tmax) against rainfall (mean_rain) in a mirrored bar plot: max temperature is displayed upwards and rain values downwards on the negative scale. Both variables are stored in the "name" variable.
To highlight the highest values of the 32 years plotted, I created two vectors, colVecTmax and colVecRain. Each is a colour vector of length 32 in which the position of the maximum value gets a different colour.
But when I pass these two vectors to fill in geom_bar(), ggplot stops colouring the top bars after 16 bars and moves down to the negative scale to continue. So it does not assign the colours by the name variable (mean_tmax or mean_rain).
This messes up the plot, and I am not sure how to get ggplot to go through the top bars for max temperature first, colouring them with colVecTmax, and then move down to do the same for rain on the negative scale with colVecRain.
Can anyone give a hint on how to solve this?
colVecTmax <- rep("orange",32)
colVecTmax[which.max(as.numeric(unlist(df.long[df.long$place=="sheffield" & df.long$name == "mean_tmax",4])))] <- "blue"
colVecRain <- rep("grey",32)
colVecRain[which.max(as.numeric(unlist(df.long[df.long$place=="sheffield" & df.long$name == "mean_rain",4])))] <- "blue"
ggplot(df.long[df.long$name %in% c('mean_rain', 'mean_tmax'), ] %>% filter(place== "sheffield")%>%
group_by(name) %>% mutate(value = case_when(
name == 'mean_rain' ~ value/10 * -1,
TRUE ~ value)) %>% mutate(place==str_to_sentence(placenames)) %>%
mutate(name = recode(name,'mean_rain' = "rainfall" , "mean_tmax" = "max temp"))
, aes(x = yyyy, y = value, fill=name))+
geom_bar(stat="identity", position="identity", fill=c(colVecTmax,colVecRain))+
labs(x="Year", y=expression("Rain in cm, temperature in ("*~degree *C*")"))+
geom_smooth(colour="black", lwd=0.5,se=F)+
scale_y_continuous(breaks = seq(-30, 30 , 5))+
scale_x_continuous(breaks = seq(1990, 2025, 5))+
guides(fill= guide_legend(title=NULL))+
scale_fill_discrete(labels=c("Max temperature", "Rainfall"))+
guides(fill=guide_legend(reverse=T), res=96)
With ggplot2 there are much easier and less error-prone ways to assign colours. A vector passed to fill outside aes() is simply applied in the row order of the data, which is most likely why your colours ended up on the wrong bars. Instead of building colour vectors, you can map the variable to an aesthetic (which you have basically done already) and assign your desired colours with a manual scale, e.g. scale_fill_manual(). The same approach works when you want to highlight some values: create additional categories. In the code below I append "_max" to the name of the observations with the maximum temperature or rainfall and assign your desired "blue" to these categories. Because this adds extra categories, I use the breaks argument of scale_fill_manual() so that the max categories do not show up in the legend.
Using some fake random example data:
# Create example data
set.seed(123)
df.long <- data.frame(
name = rep(c("mean_rain", "mean_tmax"), each = 30),
place = "sheffield",
yyyy = rep(1991:2020, 2),
value = c(runif(30, 40, 100), runif(30, 12, 16))
)
library(ggplot2)
library(dplyr)
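df_plot <- df.long %>%
  filter(name %in% c("mean_rain", "mean_tmax")) %>%
  filter(place == "sheffield") %>%
  # Rainfall is divided by 10 (shown in cm) and plotted downwards
  mutate(value = case_when(
    name == "mean_rain" ~ -value / 10,
    TRUE ~ value
  )) %>%
  # Maximum values get their own category, e.g. "mean_tmax_max"
  group_by(name) %>%
  mutate(name = ifelse(abs(value) >= max(abs(value)), paste(name, "max", sep = "_"), name)) %>%
  ungroup()
ggplot(df_plot, aes(x = yyyy, y = value, fill = name)) +
  geom_col(position = "identity") +
  # One smoother per variable (inherited fill dropped) so the max bars are not split into their own groups
  geom_smooth(aes(group = sub("_max$", "", name), fill = NULL),
              colour = "black", linewidth = 0.5, se = FALSE) +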
scale_y_continuous(breaks = seq(-30, 30, 5), labels = abs) +
scale_x_continuous(breaks = seq(1990, 2025, 5)) +
scale_fill_manual(
values = c(
mean_rain = "orange", mean_tmax = "grey",
mean_rain_max = "blue", mean_tmax_max = "blue"
),
labels = c(mean_tmax = "Max temperature", mean_rain = "Rainfall"),
breaks = c("mean_rain", "mean_tmax")
) +
labs(x = "Year", y = expression("Rain in cm, temperature in (" * ~ degree * C * ")"), fill = NULL) +
guides(fill = guide_legend(reverse = TRUE))
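The manual-scale approach above is the more robust one. If you would rather keep a colour vector as in the question, the vector has to follow the row order of the data; with interleaved rows (one rain row and one temperature row per year), concatenating the two 32-element vectors would plausibly give exactly the 16/16 split described. A base-R sketch with fake data that derives the colour from each row instead:

```r
# Sketch: compute the colour from each row of the data itself, so its order
# cannot drift from the row order. Fake data shaped like the question's df.long.
set.seed(123)
df.long <- data.frame(
  name = rep(c("mean_rain", "mean_tmax"), each = 30),
  place = "sheffield",
  yyyy = rep(1991:2020, 2),
  value = c(runif(30, 40, 100), runif(30, 12, 16))
)
# Interleave the rows by year, as real long data often is (assumption)
df.long <- df.long[order(df.long$yyyy, df.long$name), ]

# TRUE on the row holding the maximum value within each name
is_max <- as.logical(ave(df.long$value, df.long$name,
                         FUN = function(v) v == max(v)))
# One colour per row: blue for the maxima, otherwise by variable
fill_col <- ifelse(is_max, "blue",
                   ifelse(df.long$name == "mean_tmax", "orange", "grey"))
head(data.frame(df.long[, c("name", "yyyy")], fill_col))
```

fill_col only lines up with data in exactly this row order; any filtering or reordering between building it and plotting would misalign it again.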
I have the following data:
# 1. dates at 15-day intervals:
dates = seq(as.Date("2016-09-01"), as.Date("2020-07-30"), by=15) # 96 observations
# 2. water content in the crops at each of these dates:
water <- c(0.5702722, 0.5631781, 0.5560839, 0.5555985, 0.5519783, 0.5463459,
0.5511598, 0.546652, 0.5361545, 0.530012, 0.5360571, 0.5396569,
0.5683526, 0.6031535, 0.6417821, 0.671358, 0.7015542, 0.7177007,
0.7103561, 0.7036985, 0.6958607, 0.6775161, 0.6545367, 0.6380155,
0.6113306, 0.5846186, 0.5561815, 0.5251135, 0.5085149, 0.495352,
0.485819, 0.4730029, 0.4686458, 0.4616468, 0.4613918, 0.4615532,
0.4827496, 0.5149105, 0.5447824, 0.5776764, 0.6090217, 0.6297454,
0.6399422, 0.6428941, 0.6586344, 0.6507473, 0.6290631, 0.6011123,
0.5744375, 0.5313527, 0.5008027, 0.4770338, 0.4564025, 0.4464508,
0.4309046, 0.4351668, 0.4490393, 0.4701232, 0.4911582, 0.5162941,
0.5490387, 0.5737573, 0.6031149, 0.6400073, 0.6770058, 0.7048311,
0.7255012, 0.739107, 0.7338938, 0.7265202, 0.6940718, 0.6757214,
0.6460862, 0.6163091, 0.5743775, 0.5450822, 0.5057753, 0.4715266,
0.4469859, 0.4303232, 0.4187793, 0.4119401, 0.4201316, 0.426369,
0.4419331, 0.4757525, 0.5070846, 0.5248457, 0.5607567, 0.5859825,
0.6107531, 0.6201754, 0.6356589, 0.6336177, 0.6275579, 0.6214981)
I want to compute the trend of the water content (moisture) data for different sub-periods, say one trend from 2016-09-01 to 2019-11-30
and another trend from 2019-12-15 to the last date (here 2020-07-27).
I would also like to make a plot like the one attached.
I'd appreciate your help. The solution can be in R or in Python.
To draw a trend line, you can look at this tutorial:
https://www.statology.org/ggplot-trendline/
or at this Stack Overflow question:
Draw a trend line using ggplot
To split your dataset into two groups, you simply need something like this (in R):
data <- data.frame(dates, water)
# Arithmetic turns the logical comparison into 0/1, so the groups become 1 and 2
data$group <- 1 + (data$dates > as.Date("2019-11-30"))
old <- subset(data, group == 1)
new <- subset(data, group == 2)
For the plots:
library(ggplot2)
ggplot(old,aes(x = dates, y = water)) +
geom_smooth(method = "lm", col = "blue") +
geom_point()
ggplot(new,aes(x = dates, y = water)) +
geom_smooth(method = "lm", col = "red") +
geom_point()
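Besides drawing the lines, you may want the trend values themselves. A base-R sketch, assuming a single breakpoint at 2019-11-30 as in the question: fit lm() separately to each period and report the slope per day (Date values count days) and, as an assumption, per year using 365.25 days. segment_trends() is a hypothetical helper, checked here on synthetic data with known slopes.

```r
# Sketch: one linear trend per sub-period, reported as slopes.
# segment_trends() is a hypothetical helper using only base R (stats::lm).
segment_trends <- function(dates, y, breakpoint) {
  period <- ifelse(dates <= breakpoint, "before", "after")
  fits <- lapply(split(data.frame(dates, y), period), function(d) {
    fit <- lm(y ~ as.numeric(dates), data = d)
    slope <- unname(coef(fit)[2])  # change in y per day (Dates count days)
    data.frame(start = min(d$dates), end = max(d$dates), n = nrow(d),
               slope_per_day = slope,
               slope_per_year = slope * 365.25)  # assumes an average year length
  })
  do.call(rbind, fits[c("before", "after")])
}

# Synthetic check: rises 0.001 per day up to the break, then falls 0.002 per day
dates <- seq(as.Date("2016-09-01"), as.Date("2020-07-30"), by = 15)
breakpoint <- as.Date("2019-11-30")
days <- as.numeric(dates - dates[1])
y <- ifelse(dates <= breakpoint, 0.5 + 0.001 * days, 2 - 0.002 * days)
trends <- segment_trends(dates, y, breakpoint)
trends
```

With the question's data it would be called as segment_trends(dates, water, as.Date("2019-11-30")); the two sub-periods then hold 80 and 16 observations.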
Here is a full-fledged example with added labels (because the x values are Dates, the reported slope is the change in water content per day):
library(dplyr)
library(ggplot2)
dates <- seq(as.Date("2016-09-01"), as.Date("2020-07-30"), by=15)
wc <- as.numeric(strsplit("0.5702722 0.5631781 0.5560839 0.5555985 0.5519783 0.5463459 0.5511598 0.5466520 0.5361545 0.5300120 0.5360571 0.5396569 0.5683526 0.6031535 0.6417821 0.6713580 0.7015542 0.7177007 0.7103561 0.7036985 0.6958607 0.6775161 0.6545367 0.6380155 0.6113306 0.5846186 0.5561815 0.5251135 0.5085149 0.4953520 0.4858190 0.4730029 0.4686458 0.4616468 0.4613918 0.4615532 0.4827496 0.5149105 0.5447824 0.5776764 0.6090217 0.6297454 0.6399422 0.6428941 0.6586344 0.6507473 0.6290631 0.6011123 0.5744375 0.5313527 0.5008027 0.4770338 0.4564025 0.4464508 0.4309046 0.4351668 0.4490393 0.4701232 0.4911582 0.5162941 0.5490387 0.5737573 0.6031149 0.6400073 0.6770058 0.7048311 0.7255012 0.7391070 0.7338938 0.7265202 0.6940718 0.6757214 0.6460862 0.6163091 0.5743775 0.5450822 0.5057753 0.4715266 0.4469859 0.4303232 0.4187793 0.4119401 0.4201316 0.4263690 0.4419331 0.4757525 0.5070846 0.5248457 0.5607567 0.5859825 0.6107531 0.6201754 0.6356589 0.6336177 0.6275579 0.6214981", " |\\n")[[1]])
data <- data.frame(date=dates, water_content=wc) %>%
mutate(group = ifelse(date <= as.Date("2019-11-30"), "g1", "g2"))
# calculate one linear regression per group and create labels
lmo <- lapply(split(data, data$group),
              function(d) stats::lm(water_content ~ date, data = d))
lab <- sapply(lmo, \(x)
  paste("Slope=", signif(coef(x)[[2]], 5),
        "\nAdj R2=", signif(summary(x)$adj.r.squared, 5),
        "\nP=", signif(summary(x)$coefficients[2, 4], 5)))
ggplot(data=data, aes(x=date, y=water_content, col=group)) +
geom_point() +
stat_smooth(geom="smooth", method="lm") +
geom_text(aes(date, y, label=lab),
data=data.frame(data %>% group_by(group) %>%
summarise(date=first(date)), y=Inf, lab=lab),
vjust=1, hjust=.2)
Created on 2022-11-23 with reprex v2.0.2
Here is a way: create a grouping variable from the dates and coerce it to a factor; geom_smooth() will then draw one regression line per group.
suppressPackageStartupMessages({
library(ggplot2)
library(ggpubr)
})
df1 <- data.frame(dates, water)
breakpoint <- as.Date("2019-11-30")
df1$group <- factor(df1$dates > breakpoint, labels = c("before", "after"))
ggplot(df1, aes(dates, water, colour = group)) +
geom_line() +
geom_point(shape = 21, fill = 'white') +
geom_smooth(formula = y ~ x, method = lm) +
geom_vline(xintercept = breakpoint, linetype = "dotdash", linewidth = 1) +
stat_cor(label.y = c(0.43, 0.38), show.legend = FALSE) +
stat_regline_equation(label.y = c(0.45, 0.4), show.legend = FALSE) +
scale_color_manual(values = c(before = 'red', after = 'blue')) +
theme_bw(base_size = 15)
Created on 2022-11-23 with reprex v2.0.2
I'm trying to produce a graph of growth rates over time based on the following data, which has blanks (missing values) in two groups.
When I make a growth plot using geom_line() to join the points, there is no line for group c.
Is there any way to fix this?
One option would be to get rid of the missing values, which prevent the points from being connected by the line.
This reuses the code from my answer to your previous question, but adds tidyr::drop_na():
Growthplot <- data.frame(
Site = letters[1:4],
July = 0,
August = c(1, -1, NA, 2),
September = c(3, 2, 3, NA)
)
library(ggplot2)
library(tidyr)
library(dplyr, warn.conflicts = FALSE)
growth_df <- Growthplot %>%
pivot_longer(-Site, names_to = "Month", values_to = "Length") %>%
mutate(Month = factor(Month, levels = c("July", "August", "September"))) %>%
drop_na()
ggplot(growth_df, aes(x = Month, y = Length, colour = Site, group = Site)) +
geom_point() +
geom_line()+
labs(color = "Site", x = "Month", y = "Growth in cm") +
theme(axis.line = element_line(colour = "black", size = 0.24))
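Why dropping the NAs helps: geom_line() breaks a line at a missing value inside a group, so site c (July, NA, September) is left with two unconnected points. A base-R sketch of the same reshape, without tidyr, shows which points remain; the column names follow the answer.

```r
# Base-R sketch of the reshape + drop_na() step, to show which points remain.
Growthplot <- data.frame(
  Site = letters[1:4],
  July = 0,
  August = c(1, -1, NA, 2),
  September = c(3, 2, 3, NA)
)
months <- c("July", "August", "September")
long <- data.frame(
  Site = rep(Growthplot$Site, times = length(months)),
  Month = factor(rep(months, each = nrow(Growthplot)), levels = months),
  Length = unlist(Growthplot[months], use.names = FALSE)
)
# Without the NA rows, July and September become consecutive points for site c
long <- long[!is.na(long$Length), ]
long[long$Site == "c", ]
```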
I have a data frame with two columns, "ID" and "Score".
ID contains the name of a simulation, and each simulation has 58 different scores listed in the Score column.
There are 10 simulations.
I am making a geom_density plot:
my_dataframe %>%
ggplot(aes(x=`Score`), xlim = c(0, 1)) +
geom_density(aes(color = ID)) +
theme_bw() +
labs(title = "Scores")
https://imgur.com/a/9DUTmWw
How can I tell ggplot that I want the curves of Simulation1 and Simulation2 to look different from the others? I want them to be red and drawn with a larger line width than all the other curves.
Thank you for your help,
Best,
Maxime
Something like this?
my_dataframe %>% mutate(group = ifelse(ID %in% c(1,2), 'special', 'NonSpecial')) %>%
ggplot(aes(x=`Score`, lty = group), xlim = c(0, 1)) +
geom_density(aes(color = ID)) +
theme_bw() +
labs(title = "Scores")
I used this data:
my_dataframe <- data.frame(ID = factor(sample(1:4, 100, T)), Score = sin(1:100))
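The answer above distinguishes the two simulations by line type. To get what was asked (red and thicker), one option, sketched here as an assumption, is to build named colour and line-width values for the IDs and pass them to manual scales. highlight_values() is a hypothetical helper and the Simulation names are placeholders.

```r
# Sketch: named colour and line-width values that highlight chosen IDs.
# highlight_values() is a hypothetical helper; the IDs are placeholders.
highlight_values <- function(ids, special, special_colour = "red",
                             thick = 1.5, thin = 0.5) {
  ids <- as.character(unique(ids))
  is_special <- ids %in% special
  colours <- rep(special_colour, length(ids))
  # Give the remaining IDs distinct greys so they recede behind the red curves
  colours[!is_special] <- grey.colors(sum(!is_special))
  list(colour = setNames(colours, ids),
       linewidth = setNames(ifelse(is_special, thick, thin), ids))
}

sims <- paste0("Simulation", 1:10)
hl <- highlight_values(sims, special = c("Simulation1", "Simulation2"))
hl$colour[1:3]
hl$linewidth[1:3]
```

These could then be used with geom_density(aes(colour = ID, linewidth = ID)) plus scale_colour_manual(values = hl$colour) and scale_linewidth_manual(values = hl$linewidth); the linewidth aesthetic and its scales require ggplot2 3.4.0 or later.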
I am quite new to R and especially to ggplot. For my next result I think I have to switch from plot() to ggplot(), and this is where I need your help:
I have a data frame with numeric values. One column holds an absolute number, the other the corresponding percentage value. I have three such pairs of indicators: a, b and c.
The row names are the 7 observations and are stored in the first column, "X".
I want to plot them as a kind of grouped bar plot, where the absolute and percentage columns are next to each other for each of the 3 indicators.
Sample dataframe:
df = data.frame(X = c("e 1","e 1,5","e 2","e 2,5","e 3","e 3,5","e 4"),
a_abs=c(-0.3693,-0.0735,-0.019,0.0015,0,-0.0224,-0.0135),
a_per=c(-0.4736,-0.0943,-0.0244,0.0019,0,-0.0287,-0.0173),
b_abs=c(-0.384,-0.0733,-0.0173,0.0034,0,-0.0204,-0.0179),
b_per=c(-0.546,-0.1042,-0.0246,0.0048,0,-0.029,-0.0255),
c_abs=c(-0.3876,-0.0738,-0.019,0.0015,0,-0.0225,-0.0137),
c_per=c(-0.4971,-0.0946,-0.0244,0.0019,0,-0.0289,-0.0176))
Thanks to @jonspring I got the following plot using this code:
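library(tidyverse)
# end = 2 / start = 4 assume two-letter group prefixes (e.g. "CK_abs", as the gsub() calls
# below suggest); for the sample df with one-letter prefixes use end = 1 / start = 3
df3 <- df %>%
  gather(column, value, -X) %>%
  mutate(group = str_sub(column, end = 2),
         stat = str_sub(column, start = 4)) %>%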
select(-column) %>%
spread(stat, value) %>%
mutate(combo_label = paste(sep="\n",
scales::comma(abs, accuracy = 0.001),
scales::percent(per, accuracy = 0.01)))
df3$group = gsub(df3$group,pattern = "CK",replacement = "Cohen's\nKappa")
df3$group = gsub(df3$group,pattern = "JA",replacement = "Jaccard")
df3$group = gsub(df3$group,pattern = "KA",replacement = "Krippen-\ndorff's Alpha")
crg = ifelse(df3$abs< 0,"red","darkgreen")
ggplot(df3, aes(group, abs, label = combo_label)) +
geom_segment(aes(xend = group,
yend = 0),
color = crg) +
geom_point() +
geom_text(vjust = 1.5,
size = 3,
lineheight = 1.2) +
scale_y_continuous(expand = c(0.2,0)) +
facet_grid(~X) +
labs(x= "Exponent", y = "Wert")
When I zoom in so that the positive values are visible, the labels are drawn over the segments. How can I place them above or below the points, depending on whether the value is positive or negative?
I zoomed with coord_cartesian(ylim = c(-0.015, 0.005)).
Thank you for your help.
EDIT: I found the solution. Just as with switching the segment colour between red and green, I used ifelse() for the vjust parameter.
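The ifelse() idea from the edit can be written as a small function of the value. A sketch under the assumption that negative values should get their label below the point and the rest above; label_vjust() is a hypothetical helper and the offsets -0.5 and 1.5 are arbitrary choices.

```r
# Sketch: pick vjust per label from the sign of the value, the same way the
# segment colours are picked. vjust = 0 puts text above its anchor point and
# vjust = 1 below it; the -0.5 / 1.5 offsets are guesses to clear the point.
# label_vjust() is a hypothetical helper.
label_vjust <- function(value, above = -0.5, below = 1.5) {
  # Negative values hang below the zero line, so their labels go underneath
  ifelse(value < 0, below, above)
}

vj <- label_vjust(c(-0.3693, -0.0735, 0.0015, 0))
vj
```

Because vjust is an aesthetic of geom_text(), it can be mapped directly: geom_text(aes(vjust = label_vjust(abs)), size = 3, lineheight = 1.2).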
There are many ways to display this sort of data with ggplot. I highly recommend checking out https://r4ds.had.co.nz/data-visualisation.html if you haven't already.
One suggestion you'll find there is that ggplot almost always works better if you first convert your data into long (aka "tidy") form. This puts each of the dimensions of the data into its own column, so that you can map the dimension to a visual aesthetic. Here's one way to do that:
library(tidyverse)
df2 <- df %>%
gather(column, value, -X) %>%
mutate(group = str_sub(column, end = 1),
stat = str_sub(column, start = 3),
value_label = if_else(stat == "per",
scales::percent(value, accuracy = 0.1),
scales::comma(value, accuracy = 0.01)))
Now, the group a/b/c is in its own column, as is the type of data abs/per, the values are all together in one column, and we also have text labels that suit the type of data.
> head(df2)
X column value group stat value_label
1 e 1 a_abs -0.3693 a abs -0.37
2 e 1,5 a_abs -0.0735 a abs -0.07
3 e 2 a_abs -0.0190 a abs -0.02
4 e 2,5 a_abs 0.0015 a abs 0.00
5 e 3 a_abs 0.0000 a abs 0.00
6 e 3,5 a_abs -0.0224 a abs -0.02
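For comparison, the same reshape can be done in base R. This sketch splits the column names at the underscore instead of at fixed character positions, so it would also cope with longer prefixes such as "CK_abs"; it is an alternative to, not a copy of, the tidyr code above.

```r
# Sketch: the same wide-to-long reshape in base R (no tidyr/stringr).
df <- data.frame(X = c("e 1","e 1,5","e 2","e 2,5","e 3","e 3,5","e 4"),
                 a_abs=c(-0.3693,-0.0735,-0.019,0.0015,0,-0.0224,-0.0135),
                 a_per=c(-0.4736,-0.0943,-0.0244,0.0019,0,-0.0287,-0.0173),
                 b_abs=c(-0.384,-0.0733,-0.0173,0.0034,0,-0.0204,-0.0179),
                 b_per=c(-0.546,-0.1042,-0.0246,0.0048,0,-0.029,-0.0255),
                 c_abs=c(-0.3876,-0.0738,-0.019,0.0015,0,-0.0225,-0.0137),
                 c_per=c(-0.4971,-0.0946,-0.0244,0.0019,0,-0.0289,-0.0176))

value_cols <- setdiff(names(df), "X")
# Stack the value columns one after another, repeating X for each of them
df2 <- data.frame(
  X = rep(df$X, times = length(value_cols)),
  column = rep(value_cols, each = nrow(df)),
  value = unlist(df[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
# "a_abs" -> group "a", stat "abs"
df2$group <- sub("_.*$", "", df2$column)
df2$stat <- sub("^.*_", "", df2$column)
head(df2)
```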
With that out of the way, it's simpler to try out different combinations of ggplot options, which can help highlight different comparisons within the data.
For instance, if you want to compare the different observations within each group, you could put each group into a facet, and each observation along the x axis:
ggplot(df2, aes(X, value, label = value_label)) +
geom_segment(aes(xend = X, yend = 0), color = "blue") +
geom_point() +
geom_text(vjust = 2, size = 2) +
facet_grid(stat~group)
Or if you want to highlight how the different groups compared within each observation, you could swap them, like this:
ggplot(df2, aes(group, value, label = value_label)) +
geom_segment(aes(xend = group, yend = 0), color = "blue") +
geom_point() +
geom_text(vjust = 2, size = 2) +
facet_grid(stat~X)
You might also try combining the abs and per data, since they only vary slightly based on the different denominators applicable to each group and/or observation. To do that, it might be simpler to transform the data to keep each abs and per together:
df3 <- df %>%
gather(column, value, -X) %>%
mutate(group = str_sub(column, end = 1),
stat = str_sub(column, start = 3)) %>%
select(-column) %>%
spread(stat, value) %>%
mutate(combo_label = paste(sep="\n",
scales::comma(abs, accuracy = 0.01),
scales::percent(per, accuracy = 0.1)))
ggplot(df3, aes(group, abs, label = combo_label)) +
geom_segment(aes(xend = group, yend = 0), color = "blue") +
geom_point() +
geom_text(vjust = 1.5, size = 2, lineheight = 0.8) +
scale_y_continuous(expand = c(0.2,0)) +
facet_grid(~X)