Shapes on line graph using stat_summary

I'm sure the answer is very simple but at the moment it eludes me. I want to make a line graph using stat_summary(), with different shapes for each group (representing an experimental condition) at each x-axis tick (representing a separate time point).
Here's the data
set.seed(124)
ID <- rep(1:12, times = 3)
Group <- rep(c("A", "B", "C"), times = 12)
score <- rnorm(36, 25, 3)
session <- rep(c("s1","s2", "s3"), each = 12)
df <- data.frame(ID, Group, session, score)
I can get there by first making a table of means for each time point, like so:
gMeans <- aggregate(score ~ session + Group, data = df, mean)
And then graphing it like so:
pMeans <- ggplot(data = gMeans, aes(x = session, y = score, group = Group, shape = Group)) +
geom_line(aes(linetype = Group), size = 1) +
geom_point(size = 5, fill = "white") +
scale_color_hue(name = "Group", l = 30) +
scale_shape_manual(name = "Group", values = c(23,22, 21)) +
scale_linetype_discrete(name = "Group") +
theme_bw()
pMeans
However I would like to be able to skip the step of having to make the table of means by using stat_summary(). I can get a similar graph with different line types, but I can't work out how to get the different shapes on each axis tick for each group. I tried the code below and many different permutations of geom_point() and geom_line(), but to no avail. How do I alter the code below to get output that looks like the output derived from the code above?
pline <- ggplot(df, aes(x=session, y=score, group = Group, shape = Group)) +
stat_summary(fun.y="mean", geom="line", size=1.1, aes(linetype=Group, shape = Group)) +
scale_shape_manual(values=c(1:3))
pline

This should help and also clean up the legend:
library(ggplot2)
set.seed(124)
ID <- rep(1:12, times = 3)
Group <- rep(c("A", "B", "C"), times = 12)
score <- rnorm(36, 25, 3)
session <- rep(c("s1","s2", "s3"), each = 12)
df <- data.frame(ID, Group, session, score)
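# Base plot: one line and one point shape per Group
gg <- ggplot(df, aes(x = session, y = score, group = Group, shape = Group))
# Mean line per group; `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0).
# On ggplot2 >= 3.4.0, `linewidth` is preferred over `size` for lines.
gg <- gg + stat_summary(fun = mean, geom = "line", size = 1.1,
                        aes(linetype = Group), show.legend = FALSE)
# Mean point per group and session, drawn on top of the lines
gg <- gg + stat_summary(fun = mean, geom = "point", size = 5,
                        aes(shape = Group), fill = "white")
# Fillable shapes (21-23), so the white fill hides the line behind each point
gg <- gg + scale_shape_manual(name = "Group", values = c(23, 22, 21))
gg <- gg + theme_bw()
gg <- gg + theme(legend.key = element_blank())
gg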
In the legend, the line types were obscured by the point symbols, so it makes little sense to keep them there. Since you used stat_summary() for the line (vs. geom_line() with an embedded stat = "summary"), it's best to keep the same idiom for the point geom as well, IMO.
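For comparison, here is a minimal sketch of the other idiom mentioned above, where the summary is embedded in the geoms via stat = "summary". It assumes ggplot2 >= 3.3.0 (for the fun argument); p_alt and pts are names made up for this illustration.

```r
library(ggplot2)

# Same simulated data as in the question
set.seed(124)
df <- data.frame(
  ID      = rep(1:12, times = 3),
  Group   = rep(c("A", "B", "C"), times = 12),
  session = rep(c("s1", "s2", "s3"), each = 12),
  score   = rnorm(36, 25, 3)
)

# geom_line()/geom_point() with an embedded summary stat;
# `fun = mean` is passed through to the summary stat
p_alt <- ggplot(df, aes(x = session, y = score, group = Group)) +
  geom_line(aes(linetype = Group), stat = "summary", fun = mean,
            show.legend = FALSE) +
  geom_point(aes(shape = Group), stat = "summary", fun = mean,
             size = 5, fill = "white") +
  scale_shape_manual(name = "Group", values = c(23, 22, 21)) +
  theme_bw()

# Data computed for the point layer: one mean per Group and session
pts <- layer_data(p_alt, 2)
p_alt
```

Either way the means are computed inside the plot, so the intermediate gMeans table is not needed; which spelling to use is mostly a matter of taste.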

Related

Make geom_histogram display x-axis labels as integers instead of numerics

I have a data.frame that has counts for several groups:
set.seed(1)
df <- data.frame(group = sample(c("a","b"),200,replace = T),
n = round(runif(200,1,2)))
df$n <- as.integer(df$n)
And I'm trying to display a histogram of df$n, facetted by the group using ggplot2's geom_histogram:
library(ggplot2)
ggplot(data = df, aes(x = n)) + geom_histogram() + facet_grid(~group) + theme_minimal()
Any idea how to get ggplot2 to label the x-axis ticks with the integers the histogram is summarizing rather than the numeric values it is currently showing?

You could tweak this with the binwidth argument of geom_histogram:
library(ggplot2)
ggplot(data = df, aes(x = n)) +
  geom_histogram(binwidth = 0.5) +
  facet_grid(~group) +
  theme_minimal()
Another example:
set.seed(1)
df <- data.frame(group = sample(c("a","b"),200,replace = T),
n = round(runif(200,1,5)))
library(ggplot2)
ggplot(data = df, aes(x = n)) +
  geom_histogram(binwidth = 0.5) +
  facet_grid(~group) +
  theme_minimal()

You can manually specify the breaks with scale_x_continuous(breaks = seq(1, 2)). Alternatively, you can set the breaks and labels separately as well.
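A minimal sketch of the manual-breaks approach, using the 1-to-5 data from the second example. The binwidth of 1 and the name int_breaks are assumptions made for this illustration, not part of the answers above.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  group = sample(c("a", "b"), 200, replace = TRUE),
  n     = as.integer(round(runif(200, 1, 5)))
)

# One axis break per integer in the observed range of n
int_breaks <- seq(min(df$n), max(df$n), by = 1)

p <- ggplot(df, aes(x = n)) +
  geom_histogram(binwidth = 1) +             # assumed: one bar per integer value
  scale_x_continuous(breaks = int_breaks) +  # label only the integers
  facet_grid(~group) +
  theme_minimal()
p
```

Deriving the breaks from the data keeps the axis correct if the range of n changes; a labels argument can be added to scale_x_continuous() if the tick text should differ from the break values.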

Split barplot using position = "fill" into separate facets

I created a bar graph in ggplot using stat = "count" and position = "fill" to show the proportional occurrence of each feature per year (below). I find the readability of this graph rather poor and therefore I'd like to split the graph into facets. However, if I add facet_wrap(~Features), it just fills the bars in every separate facet. How can I prevent this from happening?
The code for my original graph is:
data %>% ggplot(aes(x = Year, fill = Features)) + geom_bar(stat = "count", position = "fill") + theme_classic() + theme(axis.text.x = element_text(angle = 90)) + scale_y_continuous(labels = scales::percent)
I've tried:
data %>% ggplot(aes(x = Year)) + stat_count(geom = "bar", aes(y = ..prop..)) + facet_wrap(~Features) + theme_classic() + theme(axis.text.x = element_text(angle = 90))
but this calculates the proportion within the facet rather than within each year.
Any ideas how I can solve this (using ggplot, rather than by restructuring my data)?
A little about my data:
I have a data frame of features (factor) with for each feature the year (factor) in which this feature was observed. The same feature can occur several times per year, so there are several rows with the same entry for year and feature.

This should work. First, I'll make some data that has similar properties:
library(dplyr)    # tibble(), mutate(), filter(), %>%
library(ggplot2)
labs <- c("Digital labels", "Produce ID (barcode)",
"Smart labels", "Product Recommendation",
"Shopping list", "Product Browsing",
"Product ID (computer vision)",
"Navigation (in-store)", "Product ID (RFID)",
"Other")
years <- vector(mode="list", length=13)
years[[1]] <- c(1,2)
years[[2]] <- c(1,2,8)
years[[3]] <- c(1,2,4,10)
years[[4]] <- c(1,2,3,4,5,6,8,9,10)
years[[5]] <- c(2,3,4,5,8,10)
years[[6]] <- c(1:6, 10)
years[[7]] <- c(1:6, 10)
years[[8]] <- 1:10
years[[9]] <- c(1,3,6,9,10)
years[[10]] <- c(1:5, 7,9,10)
years[[11]] <- 1:10
years[[12]] <- c(1:6, 8:10)
years[[13]] <- c(1,2,3,6,8,9,10)
y <- 2008:2020
dat <- NULL
for (i in 1:13) {
  tmp <- tibble(
    Features = sample(years[[i]], runif(1, 600, 1000), replace = TRUE),
    Year = y[i]
  ) %>%
    mutate(Features = factor(Features, levels = 1:10, labels = labs))
  dat <- rbind(dat, tmp)
}
Next, here's the original plot like the one you made initially.
dat %>%
  ggplot(aes(x = Year, fill = Features)) +
  geom_bar(stat = "count", position = "fill") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_y_continuous(labels = scales::percent)
And here's how that would translate into different facets. The key is to make the percentages by hand first and then plot them directly.
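The plotting code that follows uses agdat, which is not defined anywhere in the text. A minimal sketch of how such a table of per-year proportions might be built with dplyr is shown here on a tiny stand-in data set; toy and agdat_toy are hypothetical names, and the real answer may have computed agdat differently.

```r
library(dplyr)

# Toy stand-in for `dat`: one row per observed feature
toy <- tibble(
  Year     = c(2008, 2008, 2008, 2008, 2009, 2009),
  Features = c("A", "A", "B", "Other", "A", "B")
)

# Count rows per Year x Features, then divide by the yearly total
# so that the proportions sum to 1 within each year
agdat_toy <- toy %>%
  count(Year, Features) %>%
  group_by(Year) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

agdat_toy
```

Applying the same pipeline to dat instead of toy would give a table with Year, Features and pct columns, which is the shape the facetted plot expects. Because pct is computed before faceting, each facet shows the share within the year rather than within the facet.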
agdat %>%
  filter(Features != "Other") %>%
  ggplot(aes(x = Year, y = pct)) +
  geom_bar(stat = "identity") +
  facet_wrap(~Features, ncol = 3) +
  labs(x = "Year", y = "Percent") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_y_continuous(labels = scales::percent)

Raster-like timeseries graph in ggplot2

I'm trying to recreate a graph like the one here using ggplot2.
I can get pretty close if I mess around with the size and shape of points using coord_equal, but...
Example data and code
library(ggplot2)
library(dplyr)  # for bind_rows()
df <- data.frame()
Years <- 1990:2020
for(i in 1:length(Years)) {
Year <- Years[i]
week <-1:52
value <- sort(round(rnorm(52, 50, 30), 0))
df.small <- data.frame(Year = Year, week = week, value = value)
df <- bind_rows(df, df.small)
}
ggplot(df, aes(week, Year, color = value)) +
geom_point(shape = 15, size = 2.7) +
scale_color_gradientn(colours = rainbow(10)) +
coord_equal()
The problem is that with my real data I want to "stretch" the graph so I can see it more clearly (my time series is shorter), and when I don't use coord_equal, the squares don't fill the plotting area:
ggplot(df, aes(week, Year, color = value)) +
geom_point(shape = 15, size = 2.7) +
scale_color_gradientn(colours = rainbow(10))

Is this as simple as using the geom_raster geom?
ggplot(df, aes(week, Year)) +
  geom_raster(aes(fill = value)) +
  scale_fill_gradientn(colours = rainbow(10)) +
  coord_equal()
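Since the question asks for a plot that can be stretched, here is a minimal sketch of the same geom_raster idea without coord_equal(), so the cells resize with the plotting area instead of staying square. The data are simulated with expand.grid rather than the loop above, and removing the axis padding with expand = c(0, 0) is an optional assumption of this sketch.

```r
library(ggplot2)

set.seed(42)
# One row per week x Year cell
df <- expand.grid(week = 1:52, Year = 1990:2020)
df$value <- round(rnorm(nrow(df), 50, 30))

p <- ggplot(df, aes(week, Year, fill = value)) +
  geom_raster() +                             # cells always tile the panel
  scale_fill_gradientn(colours = rainbow(10)) +
  scale_x_continuous(expand = c(0, 0)) +      # no empty border around the tiles
  scale_y_continuous(expand = c(0, 0))
p
```

Unlike fixed-size square points, raster cells are sized in data units, so they keep filling the panel whatever the aspect ratio of the output device.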

In ggplot2, generate error bars across Facets in a data with multiple independent variables

I am trying to put error bar on mean values from a data frame which has three independent variables and plotted as facet_grid. However, the plot below is putting error bars in wrong facets. Could anyone please help me?
Please see below the example data and associated code:
life <- rep(c("1d", "2d", "4d"), 2, each = 2)
trt <- rep(c("c1", "c2"), 6)
species <- rep(c("SP1", "SP2"), each = 6)
mean_v <- runif(12, 12, 45)
sem_v <- runif(12, 1, 4)
data1 <- data.frame(life, trt, species, mean_v, sem_v)
plot1 <- ggplot(data1, aes(x = trt, y = mean_v, group = species, fill = species))
plot1 + geom_bar(stat = "identity", position = "dodge") +
facet_grid(~life) +
geom_errorbar(aes(ymin = data1$mean_v - data1$sem_v,
ymax = data1$mean_v + data1$sem_v,
width = 0.2),
position = position_dodge(width = 0.90),
group = data1$trt)
Thanks very much in advance.

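The fix is to use the same position = position_dodge(width = 0.9) in both geom_bar and geom_errorbar, and to refer to columns by bare name inside aes() (mean_v, not data1$mean_v). Vectors pulled in with data1$ are not split by facet, which is likely why the error bars ended up in the wrong panels; the extra group = data1$trt is dropped for the same reason.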
library(ggplot2)
plot1 <- ggplot(data1, aes(x = trt, y = mean_v, group = species, fill = species)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  facet_grid(. ~ life) +
  geom_errorbar(aes(ymin = mean_v - sem_v, ymax = mean_v + sem_v),
                width = 0.2, position = position_dodge(width = 0.9))
ggsave("dodged_barplot.png", plot=plot1, height=4, width=6, dpi=150)

Directlabels package-- labels do not fit in plot area

I want to explore the directlabels package with ggplot. I am trying to plot labels at the endpoint of a simple line chart; however, the labels are clipped by the plot panel. (I intend to plot about 10 financial time series in one plot and I thought directlabels would be the best solution.)
I would imagine there may be another solution using annotate or some other geoms. But I would like to solve the problem using directlabels. Please see code and image below. Thanks.
library(ggplot2)
library(directlabels)
library(tidyr)
#generate data frame with random data, for illustration and plot:
x <- seq(1:100)
y <- cumsum(rnorm(n = 100, mean = 6, sd = 15))
y2 <- cumsum(rnorm(n = 100, mean = 2, sd = 4))
data <- as.data.frame(cbind(x, y, y2))
names(data) <- c("month", "stocks", "bonds")
tidy_data <- gather(data, month)
names(tidy_data) <- c("month", "asset", "value")
p <- ggplot(tidy_data, aes(x = month, y = value, colour = asset)) +
geom_line() +
geom_dl(aes(colour = asset, label = asset), method = "last.points") +
theme_bw()
On data visualization principles, I would like to avoid extending the x-axis to make the labels fit--this would mean having data space with no data. Rather, I would like the labels to extend toward the white space beyond the chart box/panel (if that makes sense).

In my opinion, direct labels is the way to go. Indeed, I would position labels at the beginning and at the end of the lines, creating space for the labels using the expand argument of scale_x_continuous(). Also note that with the labels, there is no need for the legend.
This is similar to answers here and here.
library(ggplot2)
library(directlabels)
library(grid)
library(tidyr)
x <- seq(1:100)
y <- cumsum(rnorm(n = 100, mean = 6, sd = 15))
y2 <- cumsum(rnorm(n = 100, mean = 2, sd = 4))
data <- as.data.frame(cbind(x, y, y2))
names(data) <- c("month", "stocks", "bonds")
# Long format with explicit key/value names: columns month, asset, value
tidy_data <- gather(data, asset, value, -month)
ggplot(tidy_data, aes(x = month, y = value, colour = asset, group = asset)) +
  geom_line() +
  scale_colour_discrete(guide = 'none') +
  scale_x_continuous(expand = c(0.15, 0)) +
  geom_dl(aes(label = asset), method = list(dl.trans(x = x + .3), "last.bumpup")) +
  geom_dl(aes(label = asset), method = list(dl.trans(x = x - .3), "first.bumpup")) +
  theme_bw()
If you prefer to push the labels into the plot margin, direct labels will do that. But because the labels are positioned outside the plot panel, clipping needs to be turned off.
p1 <- ggplot(tidy_data, aes(x = month, y = value, colour = asset, group = asset)) +
  geom_line() +
  scale_colour_discrete(guide = 'none') +
  scale_x_continuous(expand = c(0, 0)) +
  geom_dl(aes(label = asset), method = list(dl.trans(x = x + .3), "last.bumpup")) +
  theme_bw() +
  theme(plot.margin = unit(c(1, 4, 1, 1), "lines"))
# Code to turn off clipping
gt1 <- ggplotGrob(p1)
gt1$layout$clip[gt1$layout$name == "panel"] <- "off"
grid.draw(gt1)
This effect can also be achieved using geom_text (and probably also annotate), that is, without the need for direct labels.
p2 <- ggplot(tidy_data, aes(x = month, y = value, group = asset, colour = asset)) +
  geom_line() +
  geom_text(data = subset(tidy_data, month == 100),
            aes(label = asset, colour = asset, x = Inf, y = value), hjust = -.2) +
  scale_x_continuous(expand = c(0, 0)) +
  scale_colour_discrete(guide = 'none') +
  theme_bw() +
  theme(plot.margin = unit(c(1, 3, 1, 1), "lines"))
# Code to turn off clipping
gt2 <- ggplotGrob(p2)
gt2$layout$clip[gt2$layout$name == "panel"] <- "off"
grid.draw(gt2)
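On ggplot2 3.0.0 or later, clipping can also be switched off inside the plot itself with coord_cartesian(clip = "off"), which avoids editing the gtable. A minimal sketch using geom_text follows; the data are re-simulated here, and last_pts and p3 are names made up for this illustration.

```r
library(ggplot2)

set.seed(1)
tidy_data <- data.frame(
  month = rep(1:100, times = 2),
  asset = rep(c("stocks", "bonds"), each = 100),
  value = c(cumsum(rnorm(100, mean = 6, sd = 15)),
            cumsum(rnorm(100, mean = 2, sd = 4)))
)

# Last observation of each series, used to place the labels
last_pts <- subset(tidy_data, month == max(month))

p3 <- ggplot(tidy_data, aes(x = month, y = value, colour = asset, group = asset)) +
  geom_line() +
  geom_text(data = last_pts, aes(label = asset), hjust = -0.1) +
  scale_x_continuous(expand = c(0, 0)) +
  scale_colour_discrete(guide = "none") +
  coord_cartesian(clip = "off") +                      # draw outside the panel
  theme_bw() +
  theme(plot.margin = unit(c(1, 4, 1, 1), "lines"))   # room for the labels
p3
```

The right-hand plot margin still has to be wide enough for the longest label, just as in the gtable versions.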

Since you didn't provide a reproducible example, it's hard to say what the best solution is. However, I would suggest manually adjusting the x-scale, adding a "buffer" to widen the plot area.
# minimum_value, maximum_value and buffer are placeholders for your own x-axis limits
p <- ggplot(tidy_data, aes(x = month, y = value, colour = asset)) +
  geom_line() +
  geom_dl(aes(colour = asset, label = asset), method = "last.points") +
  theme_bw() +
  xlim(minimum_value, maximum_value + buffer)
Using scale_x_discrete() or scale_x_continuous() would likely also work well here if you want to use the direct labels package. Alternatively, annotate or a simple geom_text would also work well.
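A minimal sketch of the buffer idea with concrete numbers, using geom_text for the labels so that it runs with ggplot2 alone. The buffer of 15 months and the names x_min, x_max, buffer and p_buf are assumptions made for this illustration.

```r
library(ggplot2)

set.seed(2)
tidy_data <- data.frame(
  month = rep(1:100, times = 2),
  asset = rep(c("stocks", "bonds"), each = 100),
  value = c(cumsum(rnorm(100, mean = 6, sd = 15)),
            cumsum(rnorm(100, mean = 2, sd = 4)))
)

x_min  <- min(tidy_data$month)
x_max  <- max(tidy_data$month)
buffer <- 15  # hypothetical: extra months of empty space for the labels

p_buf <- ggplot(tidy_data, aes(x = month, y = value, colour = asset)) +
  geom_line() +
  geom_text(data = subset(tidy_data, month == x_max),
            aes(label = asset), hjust = 0, nudge_x = 1) +
  scale_x_continuous(limits = c(x_min, x_max + buffer)) +
  scale_colour_discrete(guide = "none") +
  theme_bw()
p_buf
```

This keeps the labels inside the panel at the cost of some empty data space on the right, which is the trade-off the question wanted to avoid; the clipping-off approach is the alternative.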
