ggplot2 how to plot rows to multiple x-axis datapoints - r

I am trying to create this type of chart from the data on the left (arbitrary values for simplicity):
The goal is to plot variable X on the x-axis with the mean on the Y-axis and error bars equal to the standard error se.
The problem is that values 1-10 should each be represented individually (blue curve), and the values for A and B should be plotted at each of the 1-10 values (green and red lines).
I can draw the curve if I manually save the data and manually copy the values for A and B to each value for X but this is not very time efficient. Is there a more elegant way to do this?
Thanks in advance!
EDIT: As suggested the code:
df <- structure(list(X = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 2L, 11L, 12L), .Label = c("1", "10", "2", "3", "4", "5",
"6", "7", "8", "9", "A", "B"), class = "factor"), mean = c(1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 5.5, 6.5), sd = c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), se = c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("X", "mean", "sd", "se"), class = "data.frame", row.names = c(NA,-12L))
df<-as.data.frame(df)
df$X<-factor(df$X)
plot <- ggplot(df, aes(x=df$X, y=df$mean)) + geom_point() + geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.1)
plot

I'm afraid I don't know ggplot, but hopefully this is what you want (it might also help others understand your question).
You want a ggplot with three lines plus error bars:
1. df$X, df$mean
2. df$X, df$row_A_mean
3. df$X, df$row_B_mean
4. error bars from the se column
df <- structure(list(X = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 2L, 11L, 12L), .Label = c("1", "10", "2", "3", "4", "5",
"6", "7", "8", "9", "A", "B"), class = "factor"), mean = c(1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 5.5, 6.5), sd = c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), se = c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("X", "mean", "sd", "se"), class = "data.frame", row.names = c(NA,-12L))
df<-as.data.frame(df)
df$X<-factor(df$X)
plot <- ggplot(df, aes(x=df$X, y=df$mean)) + geom_point() + geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.1)
plot
#row A mean, repeated for every row (a horizontal line would also work, unless the mean changes)
df$row_A_mean<-rep(df[11,]$mean,nrow(df))
#row A sd
df$row_A_sd<-rep(df[11,]$sd,nrow(df))
#as.numeric() on a factor returns level codes ("10" sorts before "2"), so convert the labels of rows 1-10 instead
num<-df[1:10,]
x<-as.numeric(as.character(num$X))
plot(x,num$mean,type="p",col="red")
lines(x,num$row_A_mean,col="green")

If we use a subset to define the data elements of the ggplot, we can come up with one solution using geom_hline:
theme_set(theme_bw())
ggplot(data = df[1:10,])+
geom_errorbar(aes(x = X, ymin = mean - se, ymax = mean + se))+
geom_point(aes(x = X, y = mean))+
geom_line(aes(x = X, y = mean), group = 1)+
geom_hline(data = df[11,], aes(yintercept = mean, colour = 'A'))+
geom_hline(data = df[12,], aes(yintercept = mean, colour = 'B'))
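One caveat with the subset approach: X is a factor whose levels sort as text ("1", "10", "2", ...), so the discrete axis may place 10 right after 1. Below is a minimal sketch that converts the labels to numbers first. It assumes the same layout as the question's df (rows 1-10 numeric, rows 11-12 holding A and B); the names num, ref and is_ref are introduced here for illustration only.

```r
library(ggplot2)

# Same values as the question's df, rebuilt compactly
df <- data.frame(
  X    = factor(c(1:10, "A", "B")),
  mean = c(1:10, 5.5, 6.5),
  sd   = 1,
  se   = 1
)

# Split the numeric series from the reference rows (A and B)
is_ref <- df$X %in% c("A", "B")
num <- df[!is_ref, ]
ref <- df[is_ref, ]

# Convert the factor labels (not the level codes) to numbers
num$x <- as.numeric(as.character(num$X))

p <- ggplot(num, aes(x = x, y = mean)) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.1) +
  geom_point() +
  geom_line() +
  # one horizontal line per reference row, coloured by its label
  geom_hline(data = ref, aes(yintercept = mean, colour = X)) +
  scale_x_continuous(breaks = num$x)
p
```

Passing both reference rows to a single geom_hline() call also means a third reference row would need no extra layer.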

It's helpful to reorient your data into long form so that you can really utilize the aesthetic part of ggplot. Generally I would use reshape2::melt for this, but your data the way it's currently formatted doesn't really lend itself to it. I'll show you what I mean by long form and you can get the idea what we're shooting for:
#setting variables for your classes so it's a bit more scalable - reset as applicable
x.seriesLength <- 10
x.class.name <- "X" #name of the main series class; X in your example
a.vec <- c(5.5, 1, 1, "A")
b.vec <- c(6.5, 1, 1, "B")
#trimming df so we can reshape
df <- df[1:x.seriesLength, 2:4]
df$class <- x.class.name #adding class column
#converting your static A and B values to long form, sending to a data.frame and adding to df
add <- matrix(c(rep(a.vec, times = x.seriesLength),
rep(b.vec, times = x.seriesLength)),
byrow = TRUE,
ncol = 4)
colnames(add) <- c("mean", "sd", "se", "class")
df <- rbind(df, add)
print(df)
Then we need to do a bit more cleaning:
df$rownum <- rep(1:x.seriesLength, times = 3)
df[,1:3] <- sapply(df[,1:3], as.numeric) #casting as numeric
df$barmin <- df$mean - df$se #the question asks for error bars equal to the standard error
df$barmax <- df$mean + df$se
Now we have a long form data frame with the required data. We can then use the new class column to plot and color multiple series.
#use class column to tell ggplot which points belong to which series
g <- ggplot(data = df) +
geom_point(aes(x = rownum, y = mean, color = class)) +
geom_errorbar(aes(x = rownum, ymin=barmin, ymax=barmax, color = class), width=.1)
g
Edit: If you want lines instead of points, just replace geom_point with geom_line.
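The same long form can also be built without passing the numbers through a character matrix, which avoids the later cast back to numeric. This is a sketch in base R under the assumption that the data follow the question's layout; long, long_num, long_ref, num and ref are illustrative names, not part of the original answer.

```r
library(ggplot2)

# Same values as the question's df (sd omitted, since only se is plotted)
df <- data.frame(
  X    = c(1:10, "A", "B"),
  mean = c(1:10, 5.5, 6.5),
  se   = 1
)

is_ref <- df$X %in% c("A", "B")
num <- df[!is_ref, ]
ref <- df[is_ref, ]
x_vals <- as.numeric(as.character(num$X))

# Numeric series: one row per x value
long_num <- data.frame(x = x_vals, class = "X", mean = num$mean, se = num$se)

# Reference rows: every (x value, reference row) combination
idx <- rep(seq_len(nrow(ref)), each = length(x_vals))
long_ref <- data.frame(
  x     = rep(x_vals, times = nrow(ref)),
  class = as.character(ref$X)[idx],
  mean  = ref$mean[idx],
  se    = ref$se[idx]
)

long <- rbind(long_num, long_ref)

ggplot(long, aes(x = x, y = mean, colour = class)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.1)
```

Because every column keeps its type throughout, the plot needs no extra cleaning step.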

Related

How to order a plot from lowest to highest?

I looked at other threads where this question is answered, but I couldn't adapt the code. I plot the graph below and then try to order it from lowest to highest according to the blue bars (education == 3) when time is 0. I use the following code to create the order.
country_order <- df %>%
filter(education == 3 & time==0) %>%
arrange(unemployment) %>%
ungroup() %>%
mutate(order = row_number())
However, I am not sure how to introduce the new variable order into ggplot to get the ordering I want. Could someone help?
Here is the plot
ggplot(df, aes(y=unemployment, x=time, fill= education)) +
geom_col(color = "black") +
facet_wrap(~ country)
Here is the data:
df= structure(list(time = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"),
unemployment = structure(c(25, 35, 40, 10, 20, 70, 20, 25,
55, 23, 17, 60), format.stata = "%9.0g"), education = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1",
"2", "3"), class = "factor"), country = structure(c(1, 1,
1, 1, 1, 1, 2, 2, 2, 2, 2, 2), format.stata = "%9.0g")), row.names = c(NA,
-12L), class = "data.frame")
I think you can use fct_reorder() to reorder the factor levels of the desired variable by sorting along another variable.
library(dplyr)   # for %>%
library(forcats) # for fct_reorder()
library(ggplot2)
df %>%
ggplot(aes(y=unemployment, x=time, fill= fct_reorder(education, unemployment, .desc = TRUE))) +
geom_col(color = "black") +
facet_wrap(~ country)
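If the goal is instead to order the country panels themselves, one option is to reuse the asker's country_order: facet_wrap() lays out panels in factor level order, so converting country to a factor with those levels should be enough. This is a sketch under that reading of the question; the data frame below is a compact stand-in with the same values as the question's df.

```r
library(dplyr)
library(ggplot2)

# Stand-in with the same columns and values as the question's df
df <- data.frame(
  time         = factor(rep(c(1, 1, 1, 0, 0, 0), times = 2)),
  unemployment = c(25, 35, 40, 10, 20, 70, 20, 25, 55, 23, 17, 60),
  education    = factor(rep(1:3, times = 4)),
  country      = rep(c(1, 2), each = 6)
)

# Countries sorted by unemployment for education == 3 at time 0
country_order <- df %>%
  filter(education == 3, time == 0) %>%
  arrange(unemployment)

# Facets follow factor level order, so take the levels from country_order
df$country <- factor(df$country, levels = country_order$country)

ggplot(df, aes(x = time, y = unemployment, fill = education)) +
  geom_col(colour = "black") +
  facet_wrap(~ country)
```

With these values, country 2 (60) comes before country 1 (70).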

How to organize all shapes for each color in rows of ggplot legend?

For a plot like this:
df <- structure(list(x = c(-0.951618567265016, -0.0450277248089203,
-0.784904469457076, -1.66794193658814, -0.380226520287762, 0.918996609060766,
-0.575346962608392, 0.607964322225033, -1.61788270828916, -0.0555619655245394
), y = c(0.519407203943462, 0.301153362166714, 0.105676194148943,
-0.640706008305376, -0.849704346033582, -1.02412879060491, 0.117646597100126,
-0.947474614184802, -0.490557443700668, -0.256092192198247),
color = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L
), .Label = c("1", "2", "3", "4"), class = "factor"), shape = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L), .Label = c("1", "2",
"3"), class = "factor")), class = "data.frame", row.names = c(NA,
-10L))
g <- ggplot() +
geom_point(data = df, aes(x = x, y = y, colour = color, shape = shape)) +
theme(legend.position = "right")
Is it possible to somehow obtain the legend in the following format?
Maybe this is what you are looking for.
1. The starting point is to have only one legend. To this end I add a new variable shape_color as the interaction of your factors color and shape.
2. Map shape_color on both color and shape.
3. To get the colors and shapes right we make use of scale_xxx_manual. To this end I set up two vectors with colors and shapes.
4. Organize the legend in rows using guide_legend with arguments nrow = 4 and byrow = TRUE.
5. The tricky part is the labelling.
a. To this end I use a helper function which replaces the unwanted labels with empty strings, i.e. only every third label is shown, and makes sure that only the color category shows up in the label.
b. Finally, to have the label for the fourth row also on the right we have to make sure that the empty categories are "included" in the legend. To this end I use the argument drop = FALSE in both scales so that unused levels are included in the legend. However, I set the color and the shape for these categories to NA so that they are invisible.
library(ggplot2)
df <- structure(list(x = c(-0.951618567265016, -0.0450277248089203,
-0.784904469457076, -1.66794193658814, -0.380226520287762, 0.918996609060766,
-0.575346962608392, 0.607964322225033, -1.61788270828916, -0.0555619655245394
), y = c(0.519407203943462, 0.301153362166714, 0.105676194148943,
-0.640706008305376, -0.849704346033582, -1.02412879060491, 0.117646597100126,
-0.947474614184802, -0.490557443700668, -0.256092192198247),
color = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L
), .Label = c("1", "2", "3", "4"), class = "factor"), shape = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L), .Label = c("1", "2",
"3"), class = "factor")), class = "data.frame", row.names = c(NA,
-10L))
df$shape_color = interaction(df$shape, df$color)
colors <- rep(scales::hue_pal()(4), each = 3)
shapes <- rep(scales::shape_pal()(3), 4)
colors <- setNames(colors, levels(df$shape_color))
shapes <- setNames(shapes, levels(df$shape_color))
colors[!levels(df$shape_color) %in% df$shape_color] <- NA
shapes[!levels(df$shape_color) %in% df$shape_color] <- NA
mylabels <- function(breaks) {
breaks[!grepl("^3", breaks)] <- ""
gsub("^\\d+\\.", "", breaks)
}
ggplot() +
geom_point(data = df, aes(x = x, y = y, colour = shape_color, shape = shape_color)) +
scale_color_manual(values = colors, labels = mylabels, drop = FALSE) +
scale_shape_manual(values = shapes, labels = mylabels, drop = FALSE) +
guides(color = guide_legend(nrow = 4, byrow = TRUE, label.position = "right")) +
theme(legend.position = "right", legend.key = element_rect(fill = NA))
Thanks for this answer, it works really well, although I'm having trouble generalizing it to my real data, which has about 12 shapes and 34 colors, lol. I probably need to play around with this idea a bit more.
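One way the idea might generalize is to derive every hard-coded number (4 colors, 3 shapes, the "^3" pattern) from the factor levels. The sketch below is an untested-on-real-data adaptation, not the answerer's code. It assumes level names contain no "." and swaps scales::shape_pal() for plain shape codes 1..n, because the default shape palette only covers a handful of shapes; n_col, n_shp, last_shape and lv are illustrative names.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  x     = rnorm(10),
  y     = rnorm(10),
  color = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4)),
  shape = factor(c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1))
)

n_col <- nlevels(df$color)
n_shp <- nlevels(df$shape)
last_shape <- levels(df$shape)[n_shp]

# Levels look like "shape.color", with shape varying fastest
df$shape_color <- interaction(df$shape, df$color)
lv <- levels(df$shape_color)

colors <- setNames(rep(scales::hue_pal()(n_col), each = n_shp), lv)
shapes <- setNames(rep(seq_len(n_shp), times = n_col), lv)  # shape codes 1..n_shp

# Hide combinations that do not occur in the data
unused <- !lv %in% df$shape_color
colors[unused] <- NA
shapes[unused] <- NA

# Label only the last shape of each color row, using the color name
mylabels <- function(breaks) {
  parts <- strsplit(breaks, ".", fixed = TRUE)
  shp <- vapply(parts, `[`, character(1), 1)
  col <- vapply(parts, `[`, character(1), 2)
  ifelse(shp == last_shape, col, "")
}

ggplot(df, aes(x, y, colour = shape_color, shape = shape_color)) +
  geom_point() +
  scale_color_manual(values = colors, labels = mylabels, drop = FALSE) +
  scale_shape_manual(values = shapes, labels = mylabels, drop = FALSE) +
  guides(colour = guide_legend(nrow = n_col, byrow = TRUE))
```

With 12 shapes the codes 1 to 12 are all valid point shapes, though 34 hues will be hard to tell apart regardless of the legend layout.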

How do I visualize a three way table as a heat map in R

I am a newbie to R and have been struggling like crazy to visualize a 3 way table as a heat map using geom_tile in R. I can easily do this in Excel, but cannot find any examples of how to do this in R. I have looked at using Mosaics but this is not what I want and I have found hundreds of examples of two way tables, but seems there are no examples of three way tables.
I want the output to look like this:
My data set looks like this (it's a small snapshot of 30,000 records):
xxx <- structure(list(rfm_score = c(111, 112, 113, 114, 115, 121), n = c(2624L,
160L, 270L, 23L, 5L, 650L), rec = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = c("1", "2", "3", "4", "5"), class = "factor"),
freq = structure(c(1L, 1L, 1L, 1L, 1L, 2L), .Label = c("1",
"2", "3", "4", "5"), class = "factor"), mon = structure(c(1L,
2L, 3L, 4L, 5L, 1L), .Label = c("1", "2", "3", "4", "5"), class = "factor")), row.names = c(NA,
6L), class = "data.frame")
It is essentially an RFM analysis of customer shopping behavior (Recency, Frequency and Monetary). The output heat map (that I want) should show the count of customers in each RFM segment. In the heat map I supplied, there are two variables on the left, R = Recency (quintile ranges 1 to 5) and F = Frequency (quintile ranges 1 to 5), and at the top is the M = Monetary variable (quintile ranges 1 to 5). So, for instance, the segment RFM = 555 has a count of 2511 customers.
I have tried the following code and variations of it, but just get errors
library(ggplot2)
library(RColorBrewer)
library(dplyr)
cols <- rev(brewer.pal(11, 'RdYlBu'))
ols <- brewer.pal(9, 'RdYlGn')
ggplot(xxx)+ geom_tile(aes(x= mon, y = reorder(freq, desc(freq)), fill = n)) +
theme_change +
facet_grid(rec~.) +
# geom_text(aes(label=n)) +
# scale_fill_gradient2(midpoint = (max(xxx$n)/2), low = "red", mid = "yellow", high = "darkgreen") +
# scale_fill_gradient(low = "red", high = "blue") + scale_fill_gradientn(colours = cols) +
# scale_fill_brewer() +
labs(x = "monetary", y= "frequency") +
scale_x_discrete(expand = c(0,0)) + scale_y_discrete(expand = c(0,0)) +
coord_fixed(ratio= 0.5)
I have no idea how to create this heat map in R. Can anyone please help me?
Kind regards
Heinrich
You can use the DT and formattable packages to make a table with conditional colour formatting:
library(DT)
library(formattable)
xxx <- data.frame(rfm_score = c(111, 112, 113, 114, 115, 121),
n = c(2624L, 160L, 270L, 23L, 5L, 650L),
rec = c(1L, 1L, 1L, 1L, 1L, 1L),
freq = c(1L, 1L, 1L, 1L, 1L, 2L),
mon = c(1L, 2L, 3L, 4L, 5L, 1L))
xxx_dt <- formattable(
xxx,
list(
rfm_score = color_tile("pink", "light blue"),
n = color_tile("pink", "light blue"),
rec = color_tile("pink", "light blue"),
freq = color_tile("pink", "light blue"),
mon = color_tile("pink", "light blue")))
as.datatable(xxx_dt)
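If a geom_tile heat map is still preferred, a minimal ggplot2 sketch could look like the following. It assumes one row per RFM segment, as in the snapshot, and puts recency in facet rows, frequency on the y-axis and monetary across the top. Note that theme_change is not defined anywhere in the code shown in the question, which on its own would raise an "object not found" error, so it is left out here; the fill colours are arbitrary choices.

```r
library(ggplot2)

# Snapshot from the question, with rec/freq/mon as factors over all five quintiles
xxx <- data.frame(
  n    = c(2624L, 160L, 270L, 23L, 5L, 650L),
  rec  = factor(c(1, 1, 1, 1, 1, 1), levels = 1:5),
  freq = factor(c(1, 1, 1, 1, 1, 2), levels = 1:5),
  mon  = factor(c(1, 2, 3, 4, 5, 1), levels = 1:5)
)

p <- ggplot(xxx, aes(x = mon, y = freq, fill = n)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = n)) +                       # print the count in each tile
  facet_grid(rec ~ ., labeller = label_both) +      # one panel row per recency quintile
  scale_fill_gradient(low = "yellow", high = "darkgreen") +
  scale_x_discrete(position = "top", expand = c(0, 0)) +
  scale_y_discrete(limits = rev, expand = c(0, 0)) + # frequency 1 at the top
  labs(x = "monetary", y = "frequency", fill = "customers")
p
```

With the full data each recency panel would hold a 5 x 5 grid of tiles; the six-row snapshot only fills a few of them.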

Adding geom_line mean to reordered geom_point plot in ggplot

Trying to produce a point plot that reorders my values and also has a mean line above the values.
I can produce the plot with the mean line, or the reordered values but not both at the same time because I get the error
"geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?".
I believe I am getting the error as some of my data only has one observation but I don't understand why this only becomes an issue with the reorder data.
In the end all I want is to be able to show the means of the two different values groups for each x value.
Here is my sample code
library(ggplot2)
typ <- c("T", "N", "T", "T", "N")
samplenum <- c(7,7,6,8,8)
values <- c(1,2,1,3,2)
df = data.frame(typ, samplenum, values)
d <- ggplot(df, aes(x= reorder(samplenum, values), y= values))
d <- d + geom_point(position=position_jitter(width=0.15, height=0.05))
d <- d + aes(colour = factor(df$typ))
d <- d + stat_summary(fun.y = mean, geom="line")
d
Thank you for the help in advance.
This is what I am going for
Here are some intermediate plots that I have produced from my larger data set.
With Line but Not Reordered
Reordered but No Mean Line
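As the error message suggests, you need to adjust the group aesthetic. reorder() turns samplenum into a factor, so x becomes a discrete scale, and ggplot2 then treats each x level (within each colour) as its own group. Every group therefore contributes a single mean, and a line cannot be drawn through one point. Setting group explicitly lets the line connect across x values.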
You can try this
ggplot(df, aes(x = reorder(samplenum, values), y = values, colour = factor(typ))) +
geom_jitter(width = 0.15, height = 0.05) +
# fun.y was renamed to fun in ggplot2 3.3.0; keep fun.y on older versions
stat_summary(fun = mean, geom = "line", aes(group = factor(typ)))
(I altered your data slightly so it contains more observations.)
data
df <- structure(list(typ = structure(c(2L, 1L, 2L, 2L, 1L, 2L, 1L,
2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L), .Label = c("N", "T"), class = "factor"),
samplenum = c(7, 7, 6, 8, 8, 7, 7, 6, 8, 8, 7, 7, 6, 8, 8
), values = c(1L, 3L, 2L, 1L, 3L, 3L, 1L, 3L, 2L, 2L, 2L,
1L, 3L, 1L, 2L)), .Names = c("typ", "samplenum", "values"
), row.names = c(NA, -15L), class = "data.frame")
The resulting plot with your input data

Reordering factor for plotting using forcats and ggplot2 packages from tidyverse

First of all, thanks^13 to tidyverse. I want the bars in the chart below to follow the factor levels as reordered by forcats::fct_reorder(). Surprisingly, the order of levels I see when I View() the data set differs from the order displayed in the chart (see below). The chart should illustrate the number of failed students before and after the bonus marks (I want to sort the bars based on the number of failed students before the bonus).
MWE
ggplot (df) +
geom_bar (aes (forcats::fct_reorder (subject, FailNo, .desc= TRUE), FailNo, fill = forcats::fct_rev (Bonus)), position = 'dodge', stat = 'identity') +
theme (axis.text.x=element_text(angle=45, vjust=1.5, hjust=1.5, size = rel (1.2)))
Data output of dput (df)
structure(list(subject = structure(c(1L, 2L, 5L, 6L, 3L, 7L,
4L, 9L, 10L, 8L, 12L, 11L, 1L, 2L, 5L, 6L, 3L, 7L, 4L, 9L, 10L,
8L, 12L, 11L), .Label = c("CAB_1", "DEM_1", "SSR_2", "RRG_1",
"TTP_1", "TTP_2", "IMM_1", "RRG_2", "DEM_2", "VRR_2", "PRS_2",
"COM_2", "MEB_2", "PHH_1", "PHH_2"), class = "factor"), Bonus = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("After", "Before"), class = "factor"),
FailNo = c(29, 28, 20, 18, 15, 13, 12, 8, 5, 4, 4, 2, 21,
16, 16, 14, 7, 10, 10, 5, 3, 4, 4, 1)), .Names = c("subject",
"Bonus", "FailNo"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-24L))
Bar chart
The issue
According to the table above, SSR_2 should come in fifth rank and IMM_1 in sixth; however, in the chart these two subjects swap positions. How can I sort this correctly with tidyverse?
Use factor with unique levels for your x-axis.
ggplot (df) +
geom_bar (aes(factor(forcats::fct_reorder
(subject, FailNo, .desc= TRUE),
levels=unique(subject)),
FailNo,
fill = forcats::fct_rev (Bonus)),
position = 'dodge', stat = 'identity') +
theme(axis.text.x=element_text(angle=45, vjust=1.5, hjust=1.5, size = rel (1.2)))
Edited: #dotorate comment
Sort FailNo before the bonus
library(dplyr)
df_before_bonus <- df %>% filter(Bonus == "Before") %>% arrange(desc(FailNo))
Use FailNo before the bonus to create the factor
df$subject <- factor(df$subject, levels = df_before_bonus$subject, ordered = TRUE)
Updated plot
ggplot(df) +
geom_bar(aes (x = subject, y = FailNo, fill = as.factor(Bonus)),
position = 'dodge', stat = 'identity') +
theme (axis.text.x=element_text(angle=45, vjust=1.5, hjust=1.5, size = rel (1.2)))
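A possible explanation for the swap, offered as a sketch rather than a definitive diagnosis: fct_reorder() summarises the ordering variable per level with .fun = median by default, so the Before and After values are pooled. For SSR_2 the median of 15 and 7 is 11, while for IMM_1 the median of 13 and 10 is 11.5, which puts IMM_1 first in descending order. The snippet uses four subjects from the question's data; the data frame d and the multiply-by-indicator trick are illustrative choices, and the trick assumes FailNo is never negative.

```r
library(forcats)

# Four subjects taken from the question's data
d <- data.frame(
  subject = factor(rep(c("CAB_1", "DEM_1", "SSR_2", "IMM_1"), times = 2)),
  Bonus   = rep(c("Before", "After"), each = 4),
  FailNo  = c(29, 28, 15, 13, 21, 16, 7, 10)
)

# Default behaviour: ordered by the median over both Before and After
by_median <- fct_reorder(d$subject, d$FailNo, .desc = TRUE)
levels(by_median)

# Zero out the After rows and take the max, so only Before values drive the order
by_before <- fct_reorder(d$subject, d$FailNo * (d$Bonus == "Before"),
                         .fun = max, .desc = TRUE)
levels(by_before)
```

The second call can be placed directly inside aes() as the x mapping, which keeps the ordering logic in the plotting code instead of modifying df.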
