library(tidyverse)
mtcars %>%
mutate(ID = row_number()) %>%
select(ID, vs, am, gear, carb) %>%
gather(key, value, 2:5) %>%
mutate(violation = c(rep(FALSE, 96), rep(TRUE, 32))) %>%
ggplot(aes(ID, value, group = key, color = violation)) +
scale_color_manual(values = c("grey", "red")) +
geom_line() +
theme_classic()
In the image below the red 'violation' line is broken up into segments. I assume this is because ggplot is plotting lines sequentially, and one of the grey lines, is plotted after the red line, with the same coordinates. How do I stop the grey lines from overlapping the red?
As referenced in other stackoverflow questions, I would add a seperate line like this:
geom_line(df %>% filter(violation == TRUE), aes(color = "red")) +
but this causes problems when there are no violations in my data frames. I do monthly analyses and some months contain violations, some months do not. If I add this single line above I get an error "must be length greater than 0" for the months absent violations, so this one-liner approach probably won't work.
You might use the following code (with only 2 minor changes as compared to your code)
library(tidyverse)
mtcars %>%
mutate(ID = row_number()) %>%
select(ID, vs, am, gear, carb) %>%
gather(key, value, 2:5) %>%
mutate(violation = c(rep(FALSE, 96), rep(TRUE, 32))) %>%
ggplot(aes(ID, value, group = key, color = violation)) +
scale_color_manual(values = c("grey", "red")) +
geom_line(alpha = .5, size= 1.2) + ### changes in transparancy and thickness ###
theme_classic()
Yielding this plot:
"H 1"'s suggestion is an alternative approach which changes the sequence of line drawings:
mtcars %>%
mutate(ID = row_number()) %>%
select(ID, vs, am, gear, carb) %>%
gather(key, value, 2:5) %>%
mutate(violation = c(rep(FALSE, 96), rep(TRUE, 32))) %>%
ggplot(aes(ID, value, group = key, color = violation)) +
scale_color_manual(values = c("grey", "red")) +
geom_line(aes(group = rev(key))) + ### changes sequence of plotting of the lines ###
theme_classic()
This produces the following plot:
Related
So, I'm trying to plot missing values here over time (longitudinal data).
I would prefer placing them in a geom_col() to fill up with colours of certain treatments afterwards. But for some weird reason, geom_col() gives me weird values, while geom_point() gives me the correct values using the same function. I'm trying to wrap my head around why this is happening. Take a look at the y-axis.
Disclaimer:
I know the missing values dissappear on day 19-20. This is why I'm making the plot.
Sorry about the lay-out of the plot. Not polished yet.
For the geom_point:
gaussian_transformed %>% group_by(factor(time)) %>% mutate(missing = sum(is.na(Rose_width))) %>% ggplot(aes(x = factor(time), y = missing)) + geom_point()
Picture: geom_point
For the geom_col:
gaussian_transformed %>% group_by(factor(time)) %>% mutate(missing = sum(is.na(Rose_width))) %>% ggplot(aes(x = factor(time), y = missing)) + geom_col()
Picture: geom_col
The problem is that you're using mutate and creating several rows for your groups. You cannot see that, but you will have plenty of points overlapping in your geom_point plot.
One way is to either use summarise, or you use distinct
Compare
library(tidyverse)
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
ggplot(aes(x = order, y = missing)) +
geom_point()
The points look ugly because there is a lot of over plotting.
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
distinct(order, .keep_all = TRUE) %>%
ggplot(aes(x = order, y = missing)) +
geom_col()
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
ggplot(aes(x = order, y = missing)) +
geom_col()
Created on 2021-06-02 by the reprex package (v2.0.0)
So after some digging:
What happens was that the geom_col() function sums up all the missing values while geom_point() does not. Hence the large values for y. Why this is happening, I do not know. However doing the following worked fine for me:
gaussian_transformed$time <- as.factor(gaussian_transformed$time)
gaussian_transformed %>% group_by(time) %>% summarise(missing = sum(is.na(Rose_width))) -> gaussian_transformed
gaussian_transformed %>% ggplot(aes(x = time, y = missing)) + geom_col(fill = "blue", alpha = 0.5) + theme_minimal() + labs(title = "Missing values in Gaussian Outcome over the days", x = "Time (in days)", y = "Amount of missing values") + scale_y_continuous(breaks = seq(0, 10, 1))
With the plot: GaussianMissing
I am trying to fix the limits of a bar chart so the horizontal bar doesn't go over the plot area. I could set the limit manually using limits=c(0,3000000)but I guess there is a way to make it automatically scalable. The code
corona.conf <- read.csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv",header = TRUE,check.names=FALSE)
corona.conf %>% .[,c(-1,-3,-4)] %>% melt(.,variable.name="day") %>%
group_by(`Country/Region`,day) %>% summarize(value=sum(value)) %>%
mutate(day=as.Date(day,format='%m/%d/%y')) %>% mutate(count=value-lag(value)) %>%
replace(is.na(.),0) %>% group_by(`Country/Region`) %>% summarize(count=sum(count)) %>%
top_n(20) %>% arrange(desc(count)) %>% ggplot(.,aes(x=reorder(`Country/Region`,count),y=count,fill=count)) +
geom_bar(stat = "identity") + coord_flip() + geom_text(aes(label=format(count,big.mark = ",")),hjust=-0.1,size=4) +
scale_y_continuous(expand = c(0,0))
I thought something like:
scale_y_continuous(expand = c(0,0),limits=c(0,max(count))
Appreciate any suggestions on the fix.
I think it would be easier to read an run the code by splitting it into several parts.
We can use layer_data to get the information from a ggplot object, and the calculate the maximum from that. Based on your example, I would also suggest you multiply the maximum by 1.7 to include the text.
library(tidyverse)
library(data.table)
corona.conf <- read.csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv",header = TRUE,check.names=FALSE)
dat <- corona.conf %>% .[,c(-1,-3,-4)] %>% melt(.,variable.name="day") %>%
group_by(`Country/Region`,day) %>% summarize(value=sum(value)) %>%
mutate(day=as.Date(day,format='%m/%d/%y')) %>% mutate(count=value-lag(value)) %>%
replace(is.na(.),0) %>% group_by(`Country/Region`) %>% summarize(count=sum(count)) %>%
top_n(20) %>% arrange(desc(count))
p <- ggplot(dat, aes(x=reorder(`Country/Region`,count),y=count,fill=count)) +
geom_bar(stat = "identity") +
coord_flip() +
geom_text(aes(label=format(count,big.mark = ",")),hjust=-0.1,size=4)
p +
scale_y_continuous(expand = c(0,1), limits = c(0, max(layer_data(p)$y) * 1.7))
I would like to see the y-axis (in the plot is flipped) starting at some arbitrary value, like 7.5
After a little bit of researching, I came across ylim, but in this case is giving me some
errors:
Scale for 'y' is already present. Adding another scale for 'y', which will
replace the existing scale.
Warning message:
Removed 10 rows containing missing values (geom_col).
This is my code, and a way to download the data I'm using:
install.packages("remotes")
remotes::install_github("tweed1e/werfriends")
library(werfriends)
friends_raw <- werfriends::friends_episodes
library(tidytext)
library(tidyverse)
#"best" writers with at least 10 episodes
friends_raw %>%
unnest(writers) %>%
group_by(writers) %>%
summarize(mean_rating = mean(rating),
n = n()) %>%
arrange(desc(mean_rating)) %>%
filter(n > 10) %>%
head(10) %>%
mutate(writers = fct_reorder(writers, mean_rating)) %>%
ggplot(aes(x = writers, y = mean_rating, fill = writers)) + geom_col() +
coord_flip() + theme(legend.position = "None") + scale_y_continuous(breaks = seq(7.5,10,0.5)) +
ylim(7.5,10)
You should use coord_cartesian for zoom in a particular location (here the official documentation: https://ggplot2.tidyverse.org/reference/coord_cartesian.html).
With your example, your code should be something like that:
friends_raw %>%
unnest(writers) %>%
group_by(writers) %>%
summarize(mean_rating = mean(rating),
n = n()) %>%
arrange(desc(mean_rating)) %>%
filter(n > 10) %>%
head(10) %>%
mutate(writers = fct_reorder(writers, mean_rating)) %>%
ggplot(aes(x = writers, y = mean_rating, fill = writers)) + geom_col() +
coord_flip() + theme(legend.position = "None") + scale_y_continuous(breaks = seq(7.5,10,0.5)) +
coord_cartesian(ylim = c(7.5,10))
If this is not working please provide a reproducible example of your dataset (see: How to make a great R reproducible example)
I found out the solution. With my actual plot, the answer submitted by #dc37 didn't work because coord_flip() and coord_cartesian() exclude each other. So the way to do this is:
friends_raw %>%
unnest(writers) %>%
group_by(writers) %>%
summarize(mean_rating = mean(rating),
n = n()) %>%
arrange(mean_rating) %>%
filter(n > 10) %>%
head(10) %>%
mutate(writers = fct_reorder(writers, mean_rating)) %>%
ggplot(aes(x = writers, y = mean_rating, fill = writers)) + geom_col() +
theme(legend.position = "None") +
coord_flip(ylim = c(8,8.8))
I would like to place the numbers of observations above a facet boxplot. Here is an example:
exmp = mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n())
ggplot(exmp, aes(x = am, fill = gear, y = wt)) +
facet_grid(.~cyl) +
geom_boxplot() +
geom_text(aes(y = 6, label = N))
So, I already created column N to get the label over each box in the boxplot (combination of cyl, am and gear). How do I plot these labels so that they are over the respective box? Please note that the number of levels of gear for each level of am differs on purpose.
I really looked at a lot of ggplot tutorials and there are tons of questions dealing with annotating in facet plots. But none addressed this fairly common problem...
You need to give position_dodge() inside geom_textto match the position of the boxes, also define data argument to get the distinct value of observations:
ggplot(exmp, aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
geom_boxplot() +
facet_grid(.~cyl) +
geom_text(data = dplyr::distinct(exmp, N),
aes(y = 6, label = N), position = position_dodge(0.9))
One minor issue here is that you are printing the N value once for every data point, not once for every cyl/am/gear combination. So you might want to add a filtering step to avoid overplotting that text, which can look messy on screen, reduce your control over alpha, and slow down plotting in cases with larger data.
library(tidyverse)
exmp = mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n()) %>%
ungroup() %>%
mutate(am = as.factor(am),
gear = as.factor(gear))
(The data prep above was necessary for me to get the plot to look like your example. I'm using tidyverse 1.2.1 and ggplot2 3.2.1)
ggplot(exmp, aes(x = am, fill = gear, y = wt,
group = interaction(gear, am))) +
facet_grid(.~cyl) +
geom_boxplot() +
geom_text(data = exmp %>% distinct(cyl, gear, am, N),
aes(y = 6, label = N),
position = position_dodge(width = 0.8))
Here's the same chart with overplotting:
Perhaps using position_dodge() in your geom_text() will get you what you want?
mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n()) %>%
ggplot(aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
geom_boxplot() +
geom_text(aes(y = 6, label = N), position = position_dodge(width = 0.7)) +
facet_grid(.~cyl)
I have some time series data plotted using ggplot. I'd like the legend, which appears to the right of the plot, to be in the same order as the line on the most recent date/value on the plot's x-axis. I tried using the case_when function, but I'm obviously using it wrong. Here is an example.
df <- tibble(
x = runif(100),
y = runif(100),
z = runif(100),
year = sample(seq(1900, 2010, 10), 100, T)
) %>%
gather(variable, value,-year) %>%
group_by(year, variable) %>%
summarise(mean = mean(value))
df %>%
ggplot(aes(year, mean, color = variable)) +
geom_line()
## does not work
df %>%
mutate(variable = fct_reorder(variable, case_when(mean ~ year == 2010)))
ggplot(aes(year, mean, color = variable)) +
geom_line()
We may add one extra line
ungroup() %>% mutate(variable = fct_reorder(variable, mean, tail, n = 1, .desc = TRUE))
before plotting, or use
df %>%
mutate(variable = fct_reorder(variable, mean, tail, n = 1, .desc = TRUE)) %>%
ggplot(aes(year, mean, color = variable)) +
geom_line()
In this way we look at the last values of mean and reorder variable accordingly.
There's another way without adding a new column using fct_reorder2():
library(tidyverse)
df %>%
ggplot(aes(year, mean, color = fct_reorder2(variable, year, mean))) +
geom_line() +
labs(color = "variable")
Although it's not recommendable in your case, to order the legend based on the first (earliest) values in your plot you can set
df %>%
ggplot(aes(year, mean, color = fct_reorder2(variable, year, mean, .fun = first2))) +
geom_line() +
labs(color = "variable")
The default is .fun = last2 (see also https://forcats.tidyverse.org/reference/fct_reorder.html)