I am learning R with the Australian athletes data set.
By using ggplot, I can plot a histogram like this.
library(DAAG)
library(ggplot2)
ggplot(ais, aes(wt, fill = sex)) +
geom_histogram(binwidth = 5)
According to summary(ais$wt), the 3rd quartile is 84.12. Now I want to split the data at wt = 84.12 and plot two similar histograms side by side.
The split is:
ais1 = ais$wt[which(ais$wt>=0 & ais$wt<=84.12)]
ais2 = ais$wt[which(ais$wt>84.12)]
But I don't know how to use them for plotting. I tried this, but it doesn't work:
ggplot(ais1, aes(wt, fill = sex)) +...
How can I plot the histograms (2 similar histograms accordingly, side by side)?
Add the split as a column to your data:
ais$wt_3q = ifelse(ais$wt <= 84.12, "Quartiles 1-3", "Quartile 4")
Then use facets:
ggplot(ais, aes(wt, fill = sex)) +
geom_histogram(binwidth = 5) +
facet_wrap(~ wt_3q)
The created variable is a character vector; if you convert it to a factor and specify the order of the levels, you can order the facets differently (lots of questions on here show that if you search for them; it is the same as reordering bars for a ggplot barplot). You can also let the scales vary: look at ?facet_wrap for more details.
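As a minimal sketch of the level-ordering idea, assuming the DAAG and ggplot2 packages are installed and reusing the 84.12 cutoff from the question: the labels are converted to a factor with explicit levels, which sets the left-to-right order of the facets, and scales = "free_y" gives each panel its own count axis.

```r
library(DAAG)     # provides the ais data set
library(ggplot2)

# Label each athlete by weight group (84.12 is the 3rd quartile reported by summary(ais$wt))
ais$wt_3q <- ifelse(ais$wt <= 84.12, "Quartiles 1-3", "Quartile 4")

# Convert to a factor; the order of `levels` is the order in which the facets are drawn
ais$wt_3q <- factor(ais$wt_3q, levels = c("Quartiles 1-3", "Quartile 4"))

p <- ggplot(ais, aes(wt, fill = sex)) +
  geom_histogram(binwidth = 5) +
  facet_wrap(~ wt_3q, scales = "free_y")  # each panel gets its own y axis
p
```

Swapping the two entries in `levels` would put the "Quartile 4" panel on the left instead.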
Generally, you shouldn't create more data frames. Creating ais1 and ais2 is usually avoidable, and your life will be simpler if you use a single data frame for a single data set. Adding a new column for grouping makes it easy to keep things organized.
We can do this with ggarrange to arrange the plot objects for each subset:
library(DAAG)
library(dplyr)   # for %>% and filter()
library(ggplot2)
library(ggpubr)
p2 <- ais %>%
filter(wt>=0, wt<=84.12) %>%
ggplot(., aes(wt, fill = sex)) +
geom_histogram(binwidth = 5) +
coord_cartesian(ylim = c(0, 30))
p1 <- ais %>%
filter(wt>84.12) %>%
ggplot(., aes(wt, fill = sex)) +
geom_histogram(binwidth = 5) +
coord_cartesian(ylim = c(0, 30))
ggarrange(p1, p2, ncol =2, nrow = 1, labels = c("p1", "p2"))
Say I make a ggplot2 plot like the following with several facets:
ggplot(iris) +
geom_tile(aes(x = Petal.Width, fill = Sepal.Width, y = Petal.Length)) +
facet_wrap(~Species)
Note that there is one colourbar for all three plots, but each facet could potentially have very different values. Is it possible to have a separate colourbar for each facet?
I agree with Alex's answer, but against my better scientific and design judgment, I took a stab at it.
require(ggplot2)
require(gridExtra)
require(dplyr)
iris %>% group_by(Species) %>%
do(gg = {ggplot(., aes(Petal.Width, Petal.Length, fill = Sepal.Width)) +
geom_tile() + facet_grid(~Species) +
guides(fill = guide_colourbar(title.position = "top")) +
theme(legend.position = "top")}) %>%
.$gg %>% arrangeGrob(grobs = ., nrow = 1) %>% grid.arrange()
Of course, then you're duplicating lots of labels, which is annoying. Additionally, you lose the x and y scale information by plotting each species as a separate plot, instead of facets of a single plot. You could fix the axes by adding ... + coord_cartesian(xlim = range(iris$Petal.Width), ylim = range(iris$Petal.Length)) + ... within that ggplot call.
To be honest, the only way this makes sense at all is if it's comparing two different variables for the fill, which is why you don't care about comparing their true value between plots. A good alternative would be rescaling them to percentiles within a facet using dplyr::group_by() and dplyr::percent_rank.
Edited to update:
In the two-different-variables case, you have to first "melt" the data, which I assume you've already done. Here I'm repeating it with the iris data. Then you can look at the relative values by examining the percentiles, rather than the absolute values of the two variables.
iris %>%
tidyr::gather(key = Sepal.measurement,
value = value,
Sepal.Length, Sepal.Width) %>%
group_by(Sepal.measurement) %>%
mutate(percentilevalue = percent_rank(value)) %>%
ggplot(aes(Petal.Length, Petal.Width)) +
geom_tile(aes(fill = percentilevalue)) +
facet_grid(Sepal.measurement ~ Species) +
scale_fill_continuous(limits = c(0,1), labels = scales::percent)
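The same reshaping can also be written with tidyr::pivot_longer(), which the tidyr documentation recommends over gather() for new code. This is only a sketch of an equivalent pipeline, assuming dplyr, tidyr, ggplot2 and scales are installed; the name iris_long is introduced here purely for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Stack the two sepal columns into one "value" column, keyed by measurement name
iris_long <- iris %>%
  pivot_longer(cols = c(Sepal.Length, Sepal.Width),
               names_to = "Sepal.measurement",
               values_to = "value") %>%
  group_by(Sepal.measurement) %>%
  mutate(percentilevalue = percent_rank(value)) %>%  # rank within each measurement, 0 to 1
  ungroup()

ggplot(iris_long, aes(Petal.Length, Petal.Width)) +
  geom_tile(aes(fill = percentilevalue)) +
  facet_grid(Sepal.measurement ~ Species) +
  scale_fill_continuous(limits = c(0, 1), labels = scales::percent)
```

Because the percentile is computed per measurement, both rows of facets share one colourbar on a common 0% to 100% scale.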
It has been asked before: "Separate palettes for facets in ggplot facet_grid". This is the best solution I have seen so far; however, I think having a common palette is better from a visualization standpoint.
If this is what you want then there is a simple hack to it.
tf1 <- iris
tf1$COL <- rep(1:50, each=3)
ggplot(tf1) +
geom_tile(aes(x = Petal.Width, fill = interaction(Petal.Length,COL), y = Petal.Length)) +
facet_wrap(~Species, scales = "free") + theme(legend.position="none")
I am struggling with changing the position of a few points' labels in the geom_point in ggplot. So far, my code:
p <- ggplot(all_short_filled, aes(x=Modernization, y=Passionate_love)) +
geom_point(size=2)+geom_abline(intercept = 0.965830, slope = -0.001127)+ theme_bw()
p1 <- p + geom_text(label=all_short_filled$Country, position = position_dodge(width = 1),
vjust = -0.5)
p1
It gives me something like this:
I want to change the position of a few overlapping labels (such as Russia and Serbia, or the Netherlands and Belgium), so that, e.g., the label for Serbia goes below the dot rather than above it. Please send help :-)
You could create two label columns in your dataset: one for countries that should be plotted above their point and the other for those plotted below. This requires you to know in advance which countries to move, and the list is hardcoded.
Since I do not have a sample of your data, I used the mtcars dataset to create a reproducible example:
library(datasets) # used to create fake data
library(tidyverse)
# create fake dataset for example
df <- tail(datasets::mtcars) %>%
tibble::rownames_to_column("car")
below <- c("Ferrari Dino", "Maserati Bora")
# create two columns for geom_text labels
data <- df %>%
dplyr::mutate(label_above = ifelse(car %in% below, "", car),
label_below = ifelse(car %in% below, car, ""))
# ignore scale_x.. and scale_y.. those were to fit points/labels neatly
ggplot2::ggplot(data, aes(x = hp, y = mpg)) +
geom_point() +
geom_text(aes(label = label_above), vjust = -0.5) + # labels above their points
geom_text(aes(label = label_below), vjust = 1) + # labels below their points
scale_x_continuous(expand = ggplot2::expansion(0.3)) +
scale_y_continuous(expand = ggplot2::expansion(0.15))
That being said, as mentioned in the comments ggrepel is usually very good at handling this sort of thing.
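For comparison, here is a minimal sketch of the ggrepel route, assuming the ggrepel package is installed. It uses the same kind of stand-in data (the last six rows of mtcars), and car_df is a name made up for this example. geom_text_repel() nudges overlapping labels apart automatically, so no hardcoded list is needed.

```r
library(ggplot2)
library(ggrepel)  # assumed to be installed; not used in the example above

# Stand-in data: last six rows of mtcars, with the row names as a label column
car_df <- tail(mtcars)
car_df$car <- rownames(car_df)

p <- ggplot(car_df, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_text_repel(aes(label = car), seed = 42)  # seed makes the label placement reproducible
p
```

With your own data, the label aesthetic would be the Country column instead of car.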
I'm totally new to ggplot, relatively fresh with R, and want to make a smashing "before-and-after" scatterplot with connecting lines to illustrate the movement in percentages of different subgroups before and after a special training initiative. I've tried some options, but have yet to:
show each individual observation separately (now same values are overlapping)
connect the related before and after measures (x=0 and X=1) with lines to more clearly illustrate the direction of variation
subset the data along class and id using shape and colors
How can I best create a scatter plot using ggplot (or other) fulfilling the above demands?
Main alternative: geom_point()
Here is some sample data and example code using geom_point():
x <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1) # 0=before, 1=after
y <- c(45,30,10,40,10,NA,30,80,80,NA,95,NA,90,NA,90,70,10,80,98,95) # percentage of "feelings of peace"
class <- c(0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1) # 0=multiple days 1=one day
id <- c(1,1,2,3,4,4,4,4,5,6,1,1,2,3,4,4,4,4,5,6) # id = per individual
df <- data.frame(x,y,class,id)
ggplot(df, aes(x=x, y=y), fill=id, shape=class) + geom_point()
Alternative: scale_size()
I have explored stat_sum() to summarize the frequencies of overlapping observations, but then I cannot subset using colors and shapes because of the overlap.
ggplot(df, aes(x=x, y=y)) +
stat_sum()
Alternative: geom_dotplot()
I have also explored geom_dotplot() to clarify the overlapping observations that arise from using geom_point(), as in the example below. However, I have yet to understand how to combine the before and after measures into the same plot.
df1 <- df[1:10,] # data before
df2 <- df[11:20,] # data after
p1 <- ggplot(df1, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
p2 <- ggplot(df2, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
grid.arrange(p1,p2, nrow=1) # GridExtra package
Or maybe it is better to summarize the data by x, id, and class as the mean/median of y, filter out the ids that produce NAs (here ids 3 and 6), and connect the points with lines? If you don't really need to show the variability within each id (which could be true if the plot only illustrates tendencies), you can do it this way:
library(ggplot2)
library(dplyr)
#library(ggthemes)
df <- df %>%
group_by(x, id, class) %>%
summarize(y = median(y, na.rm = T)) %>%
ungroup() %>%
mutate(
id = factor(id),
x = factor(x, labels = c("before", "after")),
class = factor(class, labels = c("multiple days", "one day")), # 0 = multiple days, 1 = one day
) %>%
group_by(id) %>%
mutate(nas = any(is.na(y))) %>%
ungroup() %>%
filter(!nas) %>%
select(-nas)
ggplot(df, aes(x = x, y = y, col = id, group = id)) +
geom_point(aes(shape = class)) +
geom_line(show.legend = F) +
#theme_few() +
#theme(legend.position = "none") +
ylab("Feelings of peace, %") +
xlab("")
Here's one possible solution for you.
First - to get the color and shapes determined by variables, you need to put these into the aes function. I turned several into factors, so the labs function fixes the labels so they don't appear as "factor(x)" but just "x".
To address multiple points, one solution is to use geom_smooth with method = "lm". This plots the regression line, instead of connecting all the dots.
The option se = FALSE prevents confidence intervals from being plotted - I don't think they add a lot to your plot, but play with it.
Connecting the dots is done by geom_line - feel free to try that as well.
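Within geom_point, the option position = position_jitter(width = .1) adds random horizontal noise so points do not overlap. Note that position_jitter also jitters vertically by default; add height = 0 if you want the y values to stay exact.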
ggplot(df, aes(x=factor(x), y=y, color=factor(id), shape=factor(class), group = id)) +
geom_point(position = position_jitter(width = .1)) +
geom_smooth(method = 'lm', se = FALSE) +
labs(
x = "x",
color = "ID",
shape = 'Class'
)
I have two data frames, one larger (10 people) and one smaller (two people). I have generated a Gantt chart for each data frame. How do I make the distance between lines the same in both plots (i.e. not scaled by the number of entries)?
# Generate vectors:
name <- paste("person", seq(10), sep = '_')
start <- sample(seq(5), size = 10, replace = T)
end <- sample(seq(6,10), size = 10, replace = T)
# Generate data frames:
big_chart <- data.frame(name = c(name,name), value = c(start,end))
small_chart <- big_chart[c(1:2,11:12),]
# big plot
library(ggplot2)
ggplot(big_chart, aes(value, name)) +
geom_line()
# small plot
ggplot(small_chart, aes(value, name)) +
geom_line()
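Below is my solution; hopefully it is what you were looking for. I used the coord_fixed function to fix the aspect ratio: with ratio = 0.5, one step on the y-axis (one name) is drawn half as long as one unit on the x-axis, so the spacing between lines no longer depends on the number of entries as long as both plots cover the same x range. For that reason I also fixed the x-axis range using the xlim function.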
library(ggplot2)
ggplot(big_chart, aes(value, name)) +
geom_line() +
xlim(0, 10) + #optional
coord_fixed(ratio = 0.5)
ggplot(small_chart, aes(value, name)) +
geom_line() +
xlim(0, 10) + #optional
coord_fixed(ratio = 0.5)