I am attempting to make boxplots of some complex data. I have sorted the classes by one particular field (not the class field) and would now like to be able to label each box with the value of that sort-by field. I know from the way the data is structured that the value of this sort-by attribute will be the same for every observation within the class, and I would like to essentially annotate the chart with this additional piece of information.
I thought of trying to accomplish this by adding a point layer to the plot and then labeling those points. I attempted this with code like the example below, which I mocked up using the mtcars data set for reproducibility. For the sake of this example, pretend that the variable gear is the same for every observation within each distinct value of cyl. The "gear/1000000" part is just to get the labels all near the axis.
mtcars %>% group_by(cyl) %>%
ggplot(aes(x = reorder(cyl, gear), y = mpg)) +
geom_point(show.legend = FALSE, aes(x = reorder(cyl, gear), y = gear/1000000)) +
geom_text(aes(label = gear)) +
geom_boxplot(aes(colour=carb),varwidth = TRUE)
I feel like this is close, but this code is putting the labels on the boxplots instead of on the points, which is the opposite of what I'm looking for. How can I ask ggplot to label only the points from geom_point()? Or is there an easier way to accomplish my objective?
EDIT:
Here is what my plot now looks like, thanks to the answer provided below.
[Figure: Boxplots of IRI distribution for various pavement segments]
Set a separate x and y aes for geom_text. In your code, you are plotting a label for each x,y in aes(x = reorder(cyl, gear), y = mpg) as that is the aes set in the parent ggplot. Instead, set a fixed y (offset by a given amount from your geom_point y value), and x (corresponding to the x value from your geom_point) inside geom_text:
For example (note: contrary to the assumption in your question, mtcars has more than one gear value per cyl, so several labels are drawn on top of each other at each position):
mtcars %>% group_by(cyl) %>%
ggplot(aes(x = reorder(cyl, gear), y = mpg)) +
geom_point(show.legend = FALSE, aes(x = reorder(cyl, gear), y = gear/1000000)) +
geom_boxplot(aes(colour=carb),varwidth = TRUE) +
geom_text(aes(label = gear, x = reorder(cyl, gear), y = gear/1000000 - 2))
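If the sort-by value really is constant within each class, another option is to give geom_text() its own one-row-per-class data frame, so each box gets exactly one label. The following is only a sketch under assumptions: sort_value (mean wt per cyl) and y_pos are made-up names standing in for your real sort-by field and a label position that suits your data.

```r
library(dplyr)
library(ggplot2)

# Assumption: `sort_value` is a made-up per-class field (mean weight per cyl),
# standing in for a sort-by attribute that is constant within each class.
cars_labeled <- mtcars %>%
  group_by(cyl) %>%
  mutate(sort_value = round(mean(wt), 2)) %>%
  ungroup() %>%
  mutate(cyl = reorder(factor(cyl), sort_value))

# One row per class, so each box receives exactly one label.
box_labels <- cars_labeled %>%
  distinct(cyl, sort_value) %>%
  mutate(y_pos = 8)  # hypothetical position just below the lowest mpg

p <- ggplot(cars_labeled, aes(x = cyl, y = mpg)) +
  geom_boxplot(varwidth = TRUE) +
  geom_text(
    data = box_labels,
    aes(x = cyl, y = y_pos, label = sort_value),
    inherit.aes = FALSE  # do not inherit y = mpg from the parent plot
  )
p
```

Because the labels come from their own data frame, no helper geom_point() layer is needed, and no text is overplotted.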
I have created a plot with the following code:
df %>%
mutate(vars = factor(vars, levels = reord)) %>%
ggplot(aes(x = EI1, y = vars, group = groups)) +
geom_line(aes(color=groups)) +
geom_point() +
xlab("EI1 (Expected Influence with Neighbor)") +
ylab("Variables")
The result is:
While the ei1_other group is in descending order on x, the ei1_gun points are ordered by variables. I would like both groups to follow the same order, such that ei1_gun and ei1_other both start at Drugs and then descend in order of the variables, rather than descending by order of x values.
The issue is that the order by which geom_line connects the points is determined by the value on the x-axis. To solve this issue simply swap x and y and make use of coord_flip.
As no sample dataset was provided I use an example dataset based on mtcars to illustrate the issue and the solution. In my example data make is your vars, value your EI1 and name your groups:
library(ggplot2)
library(dplyr)
library(tidyr)
library(forcats)
example_data <- mtcars %>%
mutate(make = row.names(.)) %>%
select(make, hp, mpg) %>%
mutate(make = fct_reorder(make, hp)) %>%
pivot_longer(-make)
Mapping value on x and make on y results in an unordered line plot as in your example. The reason is that the order in which the points get connected is determined by value:
example_data %>%
ggplot(aes(x = value, y = make, color = name, group = name)) +
geom_line() +
geom_point() +
xlab("EI1 (Expected Influence with Neighbor)") +
ylab("Variables")
In contrast, swapping x and y, i.e. mapping make on x and value on y, and making use of coord_flip gives a nicely ordered line plot, as the order in which the points get connected is now determined by make (of course we also have to swap xlab and ylab):
example_data %>%
ggplot(aes(x = make, y = value, color = name, group = name)) +
geom_line() +
geom_point() +
coord_flip() +
ylab("EI1 (Expected Influence with Neighbor)") +
xlab("Variables")
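As a possible alternative to coord_flip, newer ggplot2 versions let geom_line() take an orientation argument. The sketch below assumes ggplot2 3.3.0 or later and rebuilds the same mtcars-based example data; with orientation = "y" the points are connected in order of the y variable (make) while the original x/y mapping is kept.

```r
library(dplyr)
library(tidyr)
library(forcats)
library(ggplot2)

# Same example data as above: make plays the role of vars,
# value of EI1, and name of groups.
example_data <- mtcars %>%
  mutate(make = row.names(.)) %>%
  select(make, hp, mpg) %>%
  mutate(make = fct_reorder(make, hp)) %>%
  pivot_longer(-make)

# orientation = "y" tells geom_line() to connect points along y, not x.
p <- ggplot(example_data, aes(x = value, y = make, color = name, group = name)) +
  geom_line(orientation = "y") +
  geom_point() +
  xlab("EI1 (Expected Influence with Neighbor)") +
  ylab("Variables")
p
```

This avoids having to swap the axis labels, at the cost of requiring a reasonably recent ggplot2.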
I am making several heatmaps in ggplot2 using geom_tile. They work great, but what if I want dots instead of tiles (little rectangles)? My input is a binary matrix (converted to a long table using the melt function).
My x and y are discrete factors. How do I produce circles or dots instead of tiles? Any ideas?
Thanks!
example:
dat <- data.frame(
  sample = c("a","a","a","b","b","b","c","c","c"),
  cond = c("x","y","z","x","y","z","x","y","z"),
  value = c("1","4","6","2","3","7","4","6","7"),
  score = c(0,1,1,0,0,0,1,1,1)
)
if I use the following plot:
ggplot(dat, aes(x = sample, y = cond, color = value)) +
geom_point()
I get the wrong plot. Instead, I would like a dot to appear only where score is 1 (and no dot where score is 0), with the dots colored by the value factor.
I assume you mean to map score to your color aesthetic and not value, as written in your shared code.
Simply convert score to a factor in your initial aesthetics call:
ggplot(dat, aes(x = sample, y = cond, color = as.factor(score))) +
geom_point()
EDIT:
The asker indicated that they would like to drop observations where score is not equal to 1, and then color the points by value. You can do so by adding a filter step to the pipe:
dat %>%
filter(score == 1) %>%
ggplot(aes(x = sample, y = cond, color = as.factor(value))) +
geom_point()
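Note that only 3 levels of the factor value remain after filtering, and level b of sample is missing from the x-axis. Keep all levels by specifying drop = FALSE in scale_x_discrete(). This only works if sample is a factor before filtering; since R 4.0.0, data.frame() leaves strings as character vectors, so convert it explicitly:
dat %>%
mutate(sample = factor(sample)) %>%
filter(score == 1) %>%
ggplot(aes(x = sample, y = cond, color = as.factor(value))) +
geom_point() +
scale_x_discrete(drop = FALSE)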
I have the following example data frame:
Parameter<-c("As","Hg","Pb")
Loc1<-c("1","10","12")
Loc2<-c("3","14","9")
Loc3<-c("5","12","8")
Loc4<-c("9","20","6")
x<-data.frame(Parameter,Loc1,Loc2,Loc3,Loc4)
x$Loc1<-as.numeric(x$Loc1)
x$Loc2<-as.numeric(x$Loc2)
x$Loc3<-as.numeric(x$Loc3)
x$Loc4<-as.numeric(x$Loc4)
The Parameter column holds the names of the heavy metal and Loc1 to Loc4 columns hold the measured value of the heavy metal at the individual location.
I need a plot with one boxplot for each heavy metal at each location. The location is the grouping value. I tried the following:
melt<-melt(x, id=c("Parameter"))
ggplot(melt)+
geom_boxplot (aes(x=Parameter, y=value, colour=variable))
However, the resulting plot did not group the boxplots by location.
A boxplot with one observation per Parameter per Location makes little sense (see my example at the end of my post). I assume you are in fact after a barplot.
You can do something like this
library(tidyverse)
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, fill = Location)) +
geom_bar(stat = "identity")
Or with dodged bars
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, fill = Location)) +
geom_bar(stat = "identity", position = "dodge")
To demonstrate why a boxplot makes little sense, let's show the plot
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, colour = Location)) +
geom_boxplot()
Note that a single observation per group results in each box being reduced to a single horizontal line. This is probably not what you want to show.
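For reference, the dodged bar chart can also be sketched with pivot_longer() and geom_col(), which the tidyr and ggplot2 documentation describe as the current equivalents of gather() and geom_bar(stat = "identity"). The data frame is rebuilt from the question with numeric columns directly; the name long is just an illustrative choice.

```r
library(tidyr)
library(ggplot2)

# Data from the question, entered as numbers instead of strings.
x <- data.frame(
  Parameter = c("As", "Hg", "Pb"),
  Loc1 = c(1, 10, 12),
  Loc2 = c(3, 14, 9),
  Loc3 = c(5, 12, 8),
  Loc4 = c(9, 20, 6)
)

# Reshape to one row per Parameter/Location pair.
long <- pivot_longer(
  x,
  cols = -Parameter,
  names_to = "Location",
  values_to = "Value"
)

# geom_col() plots the values as given; "dodge" places locations side by side.
p <- ggplot(long, aes(x = Parameter, y = Value, fill = Location)) +
  geom_col(position = "dodge")
p
```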
I’m totally new to ggplot, relatively fresh with R and want to make a smashing "before-and-after" scatterplot with connecting lines to illustrate the movement in percentages of different subgroups before and after a special training initiative. I’ve tried some options, but have yet to:
show each individual observation separately (now same values are overlapping)
connect the related before and after measures (x=0 and X=1) with lines to more clearly illustrate the direction of variation
subset the data along class and id using shape and colors
How can I best create a scatter plot using ggplot (or other) fulfilling the above demands?
Main alternative: geom_point()
Here is some sample data and example code using geom_point:
x <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1) # 0=before, 1=after
y <- c(45,30,10,40,10,NA,30,80,80,NA,95,NA,90,NA,90,70,10,80,98,95) # percentage of "feelings of peace"
class <- c(0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1) # 0=multiple days 1=one day
id <- c(1,1,2,3,4,4,4,4,5,6,1,1,2,3,4,4,4,4,5,6) # id = per individual
df <- data.frame(x,y,class,id)
ggplot(df, aes(x=x, y=y), fill=id, shape=class) + geom_point()
Alternative: scale_size()
I have explored stat_sum() to summarize the frequencies of overlapping observations, but then I cannot distinguish the subgroups by color and shape because of the overlap.
ggplot(df, aes(x=x, y=y)) +
stat_sum()
Alternative: geom_dotplot()
I have also explored geom_dotplot() to separate the overlapping observations that arise from using geom_point(), as in the example below. However, I have yet to understand how to combine the before and after measures into the same plot.
df1 <- df[1:10,] # data before
df2 <- df[11:20,] # data after
p1 <- ggplot(df1, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
p2 <- ggplot(df2, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
grid.arrange(p1, p2, nrow = 1) # requires the gridExtra package
Or maybe it is better to summarize the data by x, id and class as the mean/median of y, filter out ids producing NAs (e.g. ids 3 and 6), and connect the points by lines? If you don't really need to show the variability within each id (which could be the case if the plot only illustrates tendencies), you can do it this way:
library(ggplot2)
library(dplyr)
#library(ggthemes)
df <- df %>%
group_by(x, id, class) %>%
summarize(y = median(y, na.rm = TRUE)) %>%
ungroup() %>%
mutate(
id = factor(id),
x = factor(x, labels = c("before", "after")),
class = factor(class, labels = c("multiple days", "one day"))  # question: 0 = multiple days, 1 = one day
) %>%
group_by(id) %>%
mutate(nas = any(is.na(y))) %>%
ungroup() %>%
filter(!nas) %>%
select(-nas)
ggplot(df, aes(x = x, y = y, col = id, group = id)) +
geom_point(aes(shape = class)) +
geom_line(show.legend = FALSE) +
#theme_few() +
#theme(legend.position = "none") +
ylab("Feelings of peace, %") +
xlab("")
Here's one possible solution for you.
First - to get the color and shapes determined by variables, you need to put these into the aes function. I turned several into factors, so the labs function fixes the labels so they don't appear as "factor(x)" but just "x".
To address multiple points, one solution is to use geom_smooth with method = "lm". This plots the regression line, instead of connecting all the dots.
The option se = FALSE prevents confidence intervals from being plotted - I don't think they add a lot to your plot, but play with it.
Connecting the dots is done by geom_line - feel free to try that as well.
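Within geom_point, the option position = position_jitter(width = .1, height = 0) adds random noise along the x-axis so points do not overlap. Setting height = 0 matters: if height is omitted, ggplot2 also jitters the y values by a default amount, which would distort the percentages.
ggplot(df, aes(x=factor(x), y=y, color=factor(id), shape=factor(class), group = id)) +
geom_point(position = position_jitter(width = .1, height = 0)) +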
geom_smooth(method = 'lm', se = FALSE) +
labs(
x = "x",
color = "ID",
shape = 'Class'
)
I have a data frame containing the test_outcome (PASS/FAIL) for each test_type performed on each test_subject. For example:
test_subject, test_type, test_outcome
person_a, height, PASS
person_b, height, PASS
person_c, height, FAIL
person_d, height, PASS
person_a, weight, FAIL
person_b, weight, FAIL
person_c, weight, PASS
person_d, weight, PASS
I would like to prepare a yield plot by test_type and test_subject.
Y-axis = yield i.e. num pass/(num pass + num fail)
X-axis = test_subject
fill: = A line for each test_type.
I would prefer to use ggplot2, can you please recommend the best approach here? e.g. how to reshape the data before plotting?
A quick dplyr answer; you will want to tidy up the graph (colours etc.) to your taste.
library(dplyr)
library(ggplot2)
dat <- dat %>% group_by(test_subject, test_type) %>%
summarise(passrate = sum(test_outcome=="PASS") / n())
ggplot(dat, aes(x = test_subject, y = passrate, fill = test_type)) +
geom_bar(stat = "identity", position = "dodge")
Edit: a line graph was requested. Normally, categorical groups shouldn't be connected by lines, as there is no inherent reason to order them in a particular way.
ggplot(dat, aes(x = test_subject, y = passrate, col = test_type)) +
geom_line(aes(group = test_type)) +
geom_point()
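For a self-contained check, here is a minimal sketch that rebuilds the sample table from the question and computes the yield per subject and test type. It assumes the columns are named exactly as in the question; yield is just an illustrative name, and mean(test_outcome == "PASS") is an equivalent way to write the pass rate.

```r
library(dplyr)
library(ggplot2)

# Sample data copied from the question.
dat <- data.frame(
  test_subject = rep(c("person_a", "person_b", "person_c", "person_d"), times = 2),
  test_type = rep(c("height", "weight"), each = 4),
  test_outcome = c("PASS", "PASS", "FAIL", "PASS", "FAIL", "FAIL", "PASS", "PASS")
)

# Yield = number of passes / number of tests, per subject and test type.
yield <- dat %>%
  group_by(test_subject, test_type) %>%
  summarise(passrate = mean(test_outcome == "PASS"), .groups = "drop")

p <- ggplot(yield, aes(x = test_subject, y = passrate, fill = test_type)) +
  geom_col(position = "dodge")
p
```

With only one test per subject and type, every pass rate is 0 or 1; with repeated tests the same code yields fractional rates.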