using barplot in ggplot without collapsing by groups - r

I would like to use ggplot to create a barchart, but not aggregate the observations by (categorical) x. For example, here is what I want using the R base plot system:
library(ggplot2)
data <- data.frame(lab = c("a", "b", "b", "c", "a"),
val = c(2, 5, 6, 3, 1))
barplot(data$val, names.arg = data$lab)
This draws one bar per observation, which is what I want.
However, the ggplot version below stacks the observations that share a label into a single bar:
ggplot(data, aes(lab, val)) + geom_bar(stat = "identity")
What is the right way of using ggplot to get the plot I want? Thanks!

Use the row index as the x variable, so that every observation gets its own bar, and then relabel the ticks with the lab values.
ggplot(data, aes(as.character(seq_along(lab)), val)) + geom_bar(stat = "identity") +
scale_x_discrete("lab", labels = c("1" = "a", "2" = "b", "3" = "b", "4" = "c", "5" = "a"))
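The labels above are typed out by hand. A minimal sketch of the same idea that builds the label lookup from the data instead, so it also works for longer data frames. The column name `id` is only an illustration, and `geom_col()` is used as the shorthand for `geom_bar(stat = "identity")`:

```r
library(ggplot2)

data <- data.frame(lab = c("a", "b", "b", "c", "a"),
                   val = c(2, 5, 6, 3, 1))

# One unique id per row, kept as a factor so the row order is preserved
data$id <- factor(seq_len(nrow(data)))

# Named lookup vector: id -> label shown on the axis
lab_lookup <- setNames(data$lab, as.character(data$id))

p <- ggplot(data, aes(id, val)) +
  geom_col() +
  scale_x_discrete("lab", labels = lab_lookup)
```

Printing `p` should draw five bars in row order, with the repeated labels "a" and "b" kept as separate bars.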

Related

Assign colours in geom_bar in different plots

I have a dataset with different species and year of their observation (in 3 categories). I would now like to make one plot per species with their observations per year and assign each year-category (each bar) a different color, but it should be the same colour in every plot.
I tried to use lapply and then build a ggplot with geom_bar (see code below). I know I can assign the colours with geom_bar(fill = c("#e31a1c", "#ff7f00", "#33a02c")), but the problem is that some species were only observed in some of the years, so I get an error that the aesthetics are not the same length as the data. Is there another way to assign the colours to the bars?
Species = c(rep("X", 7), rep("Y", 3), rep("Z", 4), "V", rep("W", 3))
Year = c("A", "A", "A", "B", "B", "C", "C","A", "A", "C","B", "B", "C", "C","A", "A", "B", "C")
df <- data.frame(Species, Year)
mylist = lapply(split(df, as.factor(df$Species)), function(memefin){
ggplot(memefin, aes(x = Year, fill = Year))+
geom_bar(fill = c("#e31a1c", "#ff7f00", "#33a02c"))+
ggtitle(memefin$Species)+
scale_x_discrete(breaks=c("A","B","C"),labels=c("2000-2004", "2005-2009", "2010-2014"), drop = F)
})
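You are on the right track. Drop the fixed fill vector from geom_bar() so that the fill = Year mapping takes effect, and assign the colours by name with scale_fill_manual(). Named values are matched to the Year values, so a species that lacks a year still gets the same colours for the years it has:
library(ggplot2)

mylist = lapply(split(df, as.factor(df$Species)), function(memefin){
  ggplot(memefin, aes(x = Year, fill = Year)) +
    geom_bar() +
    ggtitle(memefin$Species[1]) +  # a single title string per plot
    scale_x_discrete(breaks = c("A", "B", "C"),
                     labels = c("2000-2004", "2005-2009", "2010-2014"),
                     drop = FALSE) +
    # Named values tie each colour to a specific Year
    scale_fill_manual(values = c("A" = "#e31a1c", "B" = "#ff7f00", "C" = "#33a02c"))
})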
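As a quick check of the named-colour approach, the computed fill colours can be read back with `ggplot_build()`. This is only a sketch: the names `year_cols`, `plots`, `fills_Y` and `fills_V` are made up for illustration, and the axis relabelling is left out to keep it short.

```r
library(ggplot2)

Species <- c(rep("X", 7), rep("Y", 3), rep("Z", 4), "V", rep("W", 3))
Year <- c("A", "A", "A", "B", "B", "C", "C", "A", "A", "C",
          "B", "B", "C", "C", "A", "A", "B", "C")
df <- data.frame(Species, Year)

# One shared, named palette: the names are the Year categories
year_cols <- c(A = "#e31a1c", B = "#ff7f00", C = "#33a02c")

plots <- lapply(split(df, df$Species), function(d) {
  ggplot(d, aes(x = Year, fill = Year)) +
    geom_bar() +
    scale_fill_manual(values = year_cols)
})

# Fill colours ggplot2 computed for the bars of two species
fills_Y <- ggplot_build(plots$Y)$data[[1]]$fill  # Y has years A and C only
fills_V <- ggplot_build(plots$V)$data[[1]]$fill  # V has year A only
```

Species with missing years should still report the colour assigned to each year they do have, because the lookup is by name rather than by position.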

Plotting multiple X variables for one continuous Y variable in a single plot

I am trying to draw a line plot with two x variables on the x-axis and one continuous y variable on the y-axis. x1 and x2 have different numbers of levels. The df looks like this:
df <- structure(list(val = c(3817,2428,6160,6729,7151,7451,6272,7146,7063,6344,5465,6169,7315,6888,7167,6759,4903,6461,7010,7018,6920,3644,6541,31862,31186,28090,28488,29349,28284,25815,23529,20097,19945,22118), type = c("1wt", "1wt", "3wt", "3wt", "3wt", "5wt", "5wt", "7wt", "7wt", "7wt","10wt","10wt","10wt","15wt","15wt","20wt","20wt","25wt","25wt","25wt","30wt","30wt","30wt","20m","20m","15m","15m","15m","10m","10m","5m", "5m", "5m", "5m"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B")), row.names = c(NA, 34L), class = "data.frame")
where the x variables are-
x1 <- factor(df$type, levels = c("1wt", "3wt", "5wt", "7wt", "10wt", "15wt", "20wt", "25wt", "30wt")) and
x2 <- factor(df$type, levels = c("20m", "15m","10m","5m"))
I want separate lines for x1 and x2, with different colors and a legend according to df$group, and with df$val on the y-axis. Could you please help me do this? Thanks in advance.
EDIT: added below
Here's an approach that assumes the intent is to map the span of possible type values from group A against the span of possible values from group B.
Labeling could be added manually, but I don't think there's any simple way to use two categorical x axes together in one plot.
library(dplyr)
library(ggplot2)

# Place both sets of type values on a shared 0-8 numeric axis
df2 <- df %>%
  mutate(x = case_when(type == "1wt" ~ 0,
                       type == "3wt" ~ 1,
                       type == "5wt" ~ 2,
                       type == "7wt" ~ 3,
                       type == "10wt" ~ 4,
                       type == "15wt" ~ 5,
                       type == "20wt" ~ 6,
                       type == "25wt" ~ 7,
                       type == "30wt" ~ 8,
                       type == "20m" ~ 0/3 * 8,
                       type == "15m" ~ 1/3 * 8,
                       type == "10m" ~ 2/3 * 8,
                       type == "5m" ~ 3/3 * 8))

ggplot(df2, aes(x, val, color = group, group = group)) +
  geom_point() +
  geom_smooth(method = lm)
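The hand-written case_when above spreads each group's levels evenly over the same 0 to 8 span. A minimal base R sketch of that arithmetic, assuming the level orders given in the question; the helper `to_span` and the names `lev_a`, `lev_b` and `type_to_x` are hypothetical:

```r
lev_a <- c("1wt", "3wt", "5wt", "7wt", "10wt", "15wt", "20wt", "25wt", "30wt")
lev_b <- c("20m", "15m", "10m", "5m")

# Position of each value within its level set, rescaled to [0, span]
to_span <- function(type, levels, span) {
  (match(type, levels) - 1) / (length(levels) - 1) * span
}

type_to_x <- function(type, span = 8) {
  ifelse(type %in% lev_a,
         to_span(type, lev_a, span),
         to_span(type, lev_b, span))
}

x <- type_to_x(c("1wt", "30wt", "20m", "15m", "5m"))
```

With this helper the mapping could be written as `df$x <- type_to_x(df$type)`, and adding a level only requires editing the level vectors.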
2nd approach
It sounds like the OP would like to use the type values numerically in some fashion. If they aren't intrinsically linked to each other in the way that's described, I suspect it will be misleading to plot them as if they are. (See here for a discussion of why this is trouble.)
That said, here's how you could do it. First, here's an approach that just uses the numeric portion of type as is. Note that "m", associated with group B, is on the bottom and "wt" is on the top, associated with group A, as in the example added in the OP comment below. I've added colors to the axes to clarify this. It's a little counterintuitive visually, since the points related to the top axis are on the bottom, and vice versa.
library(dplyr)
library(readr)    # parse_number()
library(ggplot2)

df2 <- df %>%
  # First, take the number used in "type" without adjustment
  mutate(x_unadj = parse_number(type))

ggplot(df2, aes(x_unadj, val, color = group, group = group)) +
  geom_point() +
  # Other smoothing methods would work too, but it is not obvious
  # to me that they would be an improvement.
  geom_smooth(method = lm) +
  scale_x_continuous("m", sec.axis = sec_axis(~ ., name = "wt")) +
  theme(axis.text.x.bottom = element_text(color = "#00BFC4"),
        axis.title.x.bottom = element_text(color = "#00BFC4"),
        axis.text.x.top = element_text(color = "#F8766D"),
        axis.title.x.top = element_text(color = "#F8766D"))
If this is not satisfactory, we might reverse the order of both axes using
scale_x_reverse("m", sec.axis = sec_axis(~., name = "wt")) +
Using ggplot2 3.1.0 (from Oct 2018), I could not get the secondary x axis to run in the opposite direction from the primary axis. This example from 2017 doesn't seem to work with this version any more. As of Dec 2018, there is a proposed fix being reviewed that is meant to address this.

How to point each plot to correct y axis (many plots, two y axes, in R with ggplot2)

So I have compared two groups with a third using a range of inputs. For each of the three groups I have a value and a confidence interval for a range of inputs. For the two comparisons I also have a p-value for that range of inputs. Now I would like to plot all five data series, but use a second axis for the p values.
I am able to do that except for one thing: how do I make sure that R knows which of the plots to assign to the second axis?
This is what it looks like now. The bottom two data series should be scaled up to the Y axis to the right.
ggplot(df) +
geom_pointrange(aes(x=x, ymin=minc, ymax=maxc, y=meanc, color="c")) +
geom_pointrange(aes(x=x, ymin=minb, ymax=maxb, y=meanb, color="b")) +
geom_pointrange(aes(x=x, ymin=mina, ymax=maxa, y=meana, color="a")) +
geom_point(aes(x=x, y=c, color="c")) +
geom_point(aes(x=x, y=b, color="b")) +
scale_y_continuous(sec.axis = sec_axis(~.*0.2))
df is a dataframe whose column names are all the variables you see listed above, all row values are the corresponding datapoints.
You can get what you want, staying true to Hadley's canon and the Grammar of Graphics gospel, if you transform your data frame from wide to long and use a different aesthetic (e.g. shape, color, or fill) for the means and the CI.
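The wide-to-long step itself is not shown in this answer. A minimal base R sketch of one way to do it, assuming the column naming from the question (mina/meana/maxa and so on, with the p-values in columns b and c). The data values and the helper `to_long` are made up for illustration, and the p-value column is called CI only to match the example data used below:

```r
# Hypothetical wide data, shaped like the question describes
wide <- data.frame(
  x = c(1, 2),
  mina = c(0.6, 0.7), meana = c(0.8, 0.9), maxa = c(1.0, 1.1),
  minb = c(0.5, 0.6), meanb = c(0.7, 0.8), maxb = c(0.9, 1.0),
  minc = c(0.4, 0.5), meanc = c(0.6, 0.7), maxc = c(0.8, 0.9),
  b = c(0.16, 0.18),  # p-values for the comparison with b
  c = c(0.15, 0.17)   # p-values for the comparison with c
)

# Stack one block of rows per category
to_long <- function(wide, cats) {
  pieces <- lapply(cats, function(k) {
    data.frame(
      x = wide$x,
      Cat = k,
      min = wide[[paste0("min", k)]],
      max = wide[[paste0("max", k)]],
      mean = wide[[paste0("mean", k)]],
      # Categories without a p-value column (here "a") get NA
      CI = if (k %in% names(wide)) wide[[k]] else NA_real_
    )
  })
  do.call(rbind, pieces)
}

long <- to_long(wide, c("a", "b", "c"))
```

The result has one row per x value and category, which is the shape the plotting code in this answer expects.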
You did not provide a reproducible example, so I employ my own. (Dput at the end of the post)
library(dplyr)
library(ggplot2)

df2 <- df %>%
  # Create a categorical name to map the CI to its own legend
  mutate(CatCI = if_else(is.na(CI), "", Cat))

ggplot(df2, aes(x = x)) +
  geom_pointrange(aes(ymin = min, ymax = max, y = mean, color = Cat), shape = 16) +
  geom_point(data = dplyr::filter(df2, !is.na(CI)),  # drop the rows where CI is NA
             aes(y = CI / 0.2,   # rescale the CI so it lines up with the right axis
                 fill = CatCI),  # a second aesthetic, which gets its own legend
             shape = 25, size = 5, alpha = 0.25) +   # shape, size, and fill changed to help with visualization
  scale_y_continuous(sec.axis = sec_axis(~ . * 0.2, name = "P Value")) +
  labs(color = "Linerange\nSinister Axis", fill = "P value\nDexter Axis", y = "Mean")
Result:
Dataframe:
df <- structure(list(Cat = c("a", "b", "c", "a", "b", "c", "a", "b",
"c", "a", "b", "c", "a", "b", "c"), x = c(2, 2, 2, 2.20689655172414,
2.20689655172414, 2.20689655172414, 2.41379310344828, 2.41379310344828,
2.41379310344828, 2.62068965517241, 2.62068965517241, 2.62068965517241,
2.82758620689655, 2.82758620689655, 2.82758620689655), mean = c(0.753611797661977,
0.772340941644911, 0.793970086962944, 0.822424652072316, 0.837015408776649,
0.861417383841253, 0.87023105762465, 0.892894201949377, 0.930096326498796,
0.960862178366363, 0.966600321596147, 0.991206984637544, 1.00714201832596,
1.02025006679944, 1.03650896186786), max = c(0.869753641121797,
0.928067675294351, 0.802815304215019, 0.884750162053761, 1.03609814491961,
0.955909854315582, 1.07113399603486, 1.02170928767791, 1.05504846273091,
1.09491706586801, 1.20235615364205, 1.12035782960649, 1.17387406039167,
1.13909154635088, 1.0581878034897), min = c(0.632638511783381,
0.713943701135991, 0.745868763626567, 0.797491261486603, 0.743382797144923,
0.827693203320894, 0.793417962991821, 0.796917421637021, 0.92942504556723,
0.89124101157585, 0.813058838839382, 0.91701749675892, 0.943744642652422,
0.912869230576973, 0.951734254896252), CI = c(NA, 0.164201137643034,
0.154868406784159, NA, 0.177948094206453, 0.178360305763648,
NA, 0.181862670931493, 0.198447350829814, NA, 0.201541499248143,
0.203737532636542, NA, 0.205196077692786, 0.200992205838595),
CatCI = c("", "b", "c", "", "b", "c", "", "b", "c", "", "b",
"c", "", "b", "c")), .Names = c("Cat", "x", "mean", "max",
"min", "CI", "CatCI"), row.names = c(NA, 15L), class = "data.frame")

ggplot2 draw graph with respect to a specific order

If duplicated, please point me to the original question.
I would like to draw a figure in R using ggplot2, and the following code shows what I would like to achieve.
require(ggplot2)
require(data.table)
set.seed(1)
dat <- data.table(time = rep(c(1:40), times = 5),
value = runif(200),
team = rep(c("A","B","C","D","E"), each = 40))
dat[, value := value / sum(value), by = .(time)]
ggplot(dat, aes(x = time, y = value, group=team, fill=team)) +
geom_area(position = "fill") +
scale_fill_manual(values = c("red","blue","green","pink","yellow"),
breaks = c("D", "B", "E", "A", "C"),
labels = c("D", "B", "E", "A", "C"))
ggplot2 output:
As you can see, the order of the figure does not match the order of the legend. It is the order of A, B, C, D, E, but not D, B, E, A, C. I would like to draw the figure with pink at the top, then blue, then yellow, then red, then green (DBEAC). How can I achieve this?
Thanks in advance!
This is pretty much a duplicate of "ggplot2: Changing the order of stacks on a bar graph".
geom_area appears to stack the areas in the order in which they first appear in the data, so sorting dat into the desired order appears to solve your problem:
# create order you want
my_order <- c("D", "B", "E", "A", "C")
# reversed to get correct ordering in data table
dat[, order := match(team, rev(my_order))]
# sort the data.table
setorder(dat, time, order)
# make the plot
ggplot(dat, aes(x = time, y = value, fill=team))+
geom_area(position = "fill") +
scale_fill_manual(values = c("red","blue","green","pink","yellow"),
breaks = my_order ,
labels = my_order )
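In ggplot2 2.2.0 and later, the stacking order follows the order of the grouping factor's levels rather than the row order, so sorting the rows may have no effect there. A minimal sketch of the factor-level approach, assuming a recent ggplot2 and using a plain data.frame instead of data.table to stay self-contained; `team_cols` is a name chosen here for illustration:

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(time = rep(1:40, times = 5),
                  value = runif(200),
                  team = rep(c("A", "B", "C", "D", "E"), each = 40))

# Desired order; the level order controls both the stack and the legend
my_order <- c("D", "B", "E", "A", "C")
dat$team <- factor(dat$team, levels = my_order)

# Named colours keep each team's colour independent of the level order
team_cols <- c(A = "red", B = "blue", C = "green", D = "pink", E = "yellow")

p <- ggplot(dat, aes(x = time, y = value, fill = team)) +
  geom_area(position = "fill") +
  scale_fill_manual(values = team_cols)
```

Because the colours are looked up by name, reordering the levels changes the stacking and legend order without reshuffling which team gets which colour.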

Change color of hline by group: ggplot2

I have a facet_grid plot with 2 geom_hlines per plot. I'd like to color each of those lines separately. I thought if I added this color to the geom_hline dataframe I could supply the color inside of aes. This colors by group but uses the default ggplot colors.
Here's the code:
p <- qplot(mpg, factor(sample(c("a", "b", "c", "d"), nrow(mtcars), T)),
data=mtcars, facets = vs ~ am)
hline.data <- data.frame(z = factor(c("a", "b", "c", "d")),
vs = c(0,0,1,1), am = c(0,1,0,1))
hline.data <- transform(hline.data, z0 = as.numeric(z))
hline.data <- rbind.data.frame(hline.data, hline.data)
hline.data[5:8, 1] <- c("b", "c", "d", "a")
hline.data[5:8, 4] <- c(2, 3, 4, 1)
hline.data[, "col"] <- rep(c("red", "black"), each=4)
p + geom_hline(aes(yintercept = z0, colour=col), hline.data)
How can I get the "red" and black geom_hlines I am expecting?
Since you are specifying the exact values in the data (hline.data), you want to use the identity scale:
+ scale_colour_identity()
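Alternatively, keep the default discrete colour scale and set its values manually (unnamed values are matched to the levels in alphabetical order, so "black" comes first):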
+ scale_colour_manual(values = c("black","red"))
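For reference, a self-contained sketch of the identity-scale option. It rebuilds the example with `ggplot()` rather than `qplot()`, and the names `pts` and `grp` are introduced here for illustration only:

```r
library(ggplot2)

set.seed(1)
# Points: mtcars with a random four-level group on the y axis
pts <- transform(mtcars,
                 grp = factor(sample(c("a", "b", "c", "d"), nrow(mtcars), TRUE)))

# Two horizontal lines per facet, each carrying its own literal colour name
hline.data <- data.frame(vs  = rep(c(0, 0, 1, 1), 2),
                         am  = rep(c(0, 1, 0, 1), 2),
                         z0  = c(1, 2, 3, 4, 2, 3, 4, 1),
                         col = rep(c("red", "black"), each = 4))

p <- ggplot(pts, aes(mpg, grp)) +
  geom_point() +
  facet_grid(vs ~ am) +
  geom_hline(aes(yintercept = z0, colour = col), data = hline.data) +
  # Use the strings in `col` as colours directly instead of as group labels
  scale_colour_identity()
```

Note that the identity scale draws no legend by default, which is usually fine when the colours are literal values rather than categories.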
