Method of ordering groups in ggplot line plot - r

I have created a plot with the following code:
df %>%
mutate(vars = factor(vars, levels = reord)) %>%
ggplot(aes(x = EI1, y = vars, group = groups)) +
geom_line(aes(color=groups)) +
geom_point() +
xlab("EI1 (Expected Influence with Neighbor)") +
ylab("Variables")
The result is:
While the ei1_other group is in descending order on x, the ei1_gun points are ordered by variables. I would like both groups to follow the same order, such that ei1_gun and ei1_other both start at Drugs and then descend in order of the variables, rather than descending by order of x values.

The issue is that the order by which geom_line connects the points is determined by the value on the x-axis. To solve this issue simply swap x and y and make use of coord_flip.
As no sample dataset was provided I use an example dataset based on mtcars to illustrate the issue and the solution. In my example data make is your vars, value your EI1 and name your groups:
library(ggplot2)
library(dplyr)
library(tidyr)
library(forcats)
example_data <- mtcars %>%
mutate(make = row.names(.)) %>%
select(make, hp, mpg) %>%
mutate(make = fct_reorder(make, hp)) %>%
pivot_longer(-make)
Mapping make on x and value on y results in an unordered line plot as in you example. The reason is that the order by which the points get connected is determined by value:
example_data %>%
ggplot(aes(x = value, y = make, color = name, group = name)) +
geom_line() +
geom_point() +
xlab("EI1 (Expected Influence with Neighbor)") +
ylab("Variables")
In contrast, swapping x and y, i.e. mapping make on x and value on y, and making use of coord_flip gives a nice ordererd line plot as the order by which the points get connected is now determined by make (of course we also have to swap xlab and ylab):
example_data %>%
ggplot(aes(x = make, y = value, color = name, group = name)) +
geom_line() +
geom_point() +
coord_flip() +
ylab("EI1 (Expected Influence with Neighbor)") +
xlab("Variables")

Related

Multi-row labels in ggplot2

I have a plot which contains multiple entries of the same items along the x-axis. I have a total of 45 items grouped according to the groups below.
pvalall$Group<-c(rep("Physical",5*162),rep("Perinatal",11*162),rep("Developmental",3*162),
rep("Lifestyle-Life Events",5*162),rep("Parental-Family",13*162),rep("School",3*162),
rep("Neighborhood",5*162))
pvalall$Group <- factor(pvalall$Group,
levels = c("Physical", "Perinatal", "Developmental",
"Lifestyle-Life Events", "Parental-Family",
"School","Neighborhood"))
So essentially there are 162*45=7290 points along the x-axis and each 162 set of them corresponds to one of the variables of interest. How do I get geom_point to only plot one lable for each of these 162 given a list of the variable names c("var1","var2",....,"var45")?
A reprex would be nice, but generally the solution is to create a separate dataframe with one row per group indicating where the labels should go, and to add a geom_text() layer to your plot that uses this dataframe.
My guess is that the code should look like this:
# create a dataframe for the labels
pvalall %>%
group_by(Group) %>%
summarize(Domains = mean(Domains),
`-log10(P-Values)` = mean(`-log10(P-Values)`)) -> label_df
# now make the plot
pvalall %>%
ggplot(aes(x = Domains, y = `-log10(P-Values)`)) +
geom_point(aes(col = Group)) + # putting col aesthetic in here so that the labels are not colored
geom_text(data =label_df, aes(label = Group))
Here is an example with mtcars:
library(tidyverse)
mtcars %>%
group_by(cyl) %>%
summarize(mpg = mean(mpg),
disp = mean(disp)) %>%
mutate(cyl_label = str_c(cyl, "\ncylinders")) -> label_df
mtcars %>%
ggplot(aes(x = mpg, y = disp)) +
geom_point(aes(col = factor(cyl)), show.legend = F) +
geom_text(data = label_df, aes(label = cyl_label))
produces

Display `dplyr` groups ordered by their corresponding value (not name) in `geom_point`

Using the mpg dataset I want to produce a scatterplot that shows for every manufacturer one point with the grouped (by manufacturer) mean of displ.
The following works so far:
ggplot(mpg %>%
group_by(manufacturer) %>%
summarise(mean_displ = mean(displ))) +
geom_point(aes(x = manufacturer, y = mean_displ)) +
guides(x = guide_axis(angle = 90))
Now I want to show the points in ascending order according to their displ value. Or: I want to sort the manufacturer variable on the x-axis according to the corresponding mean_displ value.
I tried to insert a arrange(mean_displ) statement in my dplyr chain. No success.
So I introduced a dummy variable x that produces the plot I want, but now the labeling is gone..
ggplot(mpg %>%
group_by(manufacturer) %>%
summarise(mean_displ = mean(displ)) %>%
arrange(mean_displ) %>%
mutate(x = 1:15)) +
geom_point(aes(x = x, y = mean_displ))
How can I get the later plot but with the labeling from above?
fct_reorder from the forcats package can order the levels of a factor.
library(tidyverse)
ggplot(mpg %>%
group_by(manufacturer) %>%
summarise(mean_displ = mean(displ))) +
geom_point(aes(x = fct_reorder(manufacturer, mean_displ), y = mean_displ)) +
guides(x = guide_axis(angle = 90))

Convert a geom_tile in dotplot in ggplot2

I am doing several heatmaps in ggplot2 using geom_tile. They work great but what if instead of tiles (little rectangles) I want to have dots. My input is a binary matrix (converted in a table using melt function).
My x and y are discrete factors. How do I produce circles or dots instead of tiles.....any idea?
Thanks!
example:
dat=data.frame(sample = c("a","a","a","b","b","b","c","c","c"), cond=c("x","y","z","x","y","z","x","y","z"),value=c("1","4","6","2","3","7","4","6","7"),score=c(0,1,1,0,0,0,1,1,1))
if I use the following plot:
ggplot(dat, aes(x = sample, y = cond, color = value)) +
geom_point()
I get the wrong plot. Instead, I would like to have or not have a dot where the score is 0 or 1 and color them by value factor.
I assume you mean to map score to your color aesthetic and not value, as written in your shared code.
Simply convert color to a factor in your initial aesthetics call:
ggplot(dat, aes(x = sample, y = cond, color = as.factor(score))) +
geom_point()
EDIT:
The user indicated that he would like to filter observations where score is not equal to 1, and then color the points by value. You can do so by adding the following pipe operation:
I assume you mean to map score to your color aesthetic and not value, as written in your shared code.
Simply convert color to a factor in your initial aesthetics call:
dat %>%
filter(score == 1) %>%
ggplot(aes(x = sample, y = cond, color = as.factor(value))) +
geom_point()
Note that there are only 3 levels of the factor score and we are missing level b from sample on the x-axis. Keep all levels by specifying drop = FALSE in scale_x_discrete():
dat %>%
filter(score == 1) %>%
ggplot(aes(x = sample, y = cond, color = as.factor(value))) +
geom_point() +
scale_x_discrete(drop = FALSE)

using facets on every column with color grouping

I've seen a lot of people use facets to visualize data. I want to be able to run this on every column in my dataset and then have it grouped by some categorical value within each individual plot.
I've seen others use gather() to plot histogram or densities. I can do that ok, but I guess I fundamentally misunderstand how to use this technique.
I want to be able to do just what I have below - but when I have it grouped by a category. For example, histogram of every column but stacked by the value color. Or dual density plots of every column with these two lines of different colors.
I'd like this - but instead of clarity it is every single column like this...
library(tidyverse)
# what I want but clarity should be replaced with every column except FILL
ggplot(diamonds, aes(x = price, fill = color)) +
geom_histogram(position = 'stack') +
facet_wrap(clarity~.)
# it would look exactly like this, except it would have the fill value by a group.
gathered_data = gather(diamonds %>% select_if(is.numeric))
ggplot(gathered_data , aes(value)) +
geom_histogram() +
theme_classic() +
facet_wrap(~key, scales='free')
tidyr::gather needs four pieces:
1) data (in this case diamonds, passed through the pipe into the first parameter of gather below)
2) key
3) value
4) names of the columns that will be converted to key / value pairs.
gathered_data <- diamonds %>%
gather(key, value,
select_if(diamonds, is.numeric) %>% names())
It's not entirely clear what you are looking for. A picture of your expected output would have been much more illuminating than a description (not all of us are native English speakers...), but perhaps something like this?
diamonds %>%
rename(group = color) %>% # change this line to use another categorical
# column as the grouping variable
group_by(group) %>% # select grouping variable + all numeric variables
select_if(is.numeric) %>%
ungroup() %>%
tidyr::gather(key, value, -group) %>% # gather all numeric variables
ggplot(aes(x = value, fill = group)) +
geom_histogram(position = "stack") +
theme_classic() +
facet_wrap(~ key, scales = 'free')
# alternate example using geom density
diamonds %>%
rename(group = cut) %>%
group_by(group) %>%
select_if(is.numeric) %>%
ungroup() %>%
tidyr::gather(key, value, -group) %>%
ggplot(aes(x = value, color = group)) +
geom_density() +
theme_classic() +
facet_wrap(~ key, scales = 'free')

Boxplots grouped by group value

I have the following example data frame:
Parameter<-c("As","Hg","Pb")
Loc1<-c("1","10","12")
Loc2<-c("3","14","9")
Loc3<-c("5","12","8")
Loc4<-c("9","20","6")
x<-data.frame(Parameter,Loc1,Loc2,Loc3,Loc4)
x$Loc1<-as.numeric(x$Loc1)
x$Loc2<-as.numeric(x$Loc2)
x$Loc3<-as.numeric(x$Loc3)
x$Loc4<-as.numeric(x$Loc4)
The Parameter column holds the names of the heavy metal and Loc1 to Loc4 columns hold the measured value of the heavy metal at the individual location.
I need a plot with one boxplot for each heavy metal at each location. The location is the grouping value. I tried the following:
melt<-melt(x, id=c("Parameter"))
ggplot(melt)+
geom_boxplot (aes(x=Parameter, y=value, colour=variable))
However, the resulting plot did not somehow grouped the boxplots by location.
A boxplot with one observation per Parameter per Location makes little sense (see my example at the end of my post). I assume you are in fact after a barplot.
You can do something like this
library(tidyverse)
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, fill = Location)) +
geom_bar(stat = "identity")
Or with dodged bars
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, fill = Location)) +
geom_bar(stat = "identity", position = "dodge")
To demonstrate why a boxplot makes little sense, let's show the plot
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, colour = Location)) +
geom_boxplot()
Note that a single observation per group results in the bar being reduced to a single horizontal line. This is probably not what you want to show.

Resources