I created a new data set using tidyr:
library(tidyverse)
library(ggthemes)  # for geom_rangeframe() and theme_tufte()
## Create some fake data
set.seed(3)
data <- tibble(
  year = 1991:2020,
  One = 11:40,
  Two = 31:60,
  Three = 61:90
)
## Gather the variables to create a long dataset
new_data <- data %>%
  gather(model, value, -year)
## Plot the data
ggplot(new_data, aes(x = year, y = value, fill = model)) +
  geom_bar(stat = "identity", position = "stack") +
  geom_rangeframe() +
  theme_tufte()
The problem is that the y-axis is not the correct length (it stops short of the top of the stacked bars):
I also tried adding facet_grid() to the code:
facet_grid(~model)
Adding
scale_y_continuous(limits = c(0, 150))
did not work either.
I also tried adding a fake dataset that spans the range from the minimum to the maximum of my real data:
data2 <- tibble(
  year = 1991:2020,
  dummy = c(rep(11, 29), 90)  # 29 values of 11, then 90
)
new_data2 <- data2 %>%
  gather(model, value, -year)
library(ggpubr)  # for theme_pubclean()
ggplot(new_data, aes(x = year, y = value, fill = model)) +
  geom_bar(stat = "identity", position = "stack") +
  geom_rangeframe(data = new_data2) +
  facet_grid(~model) +
  theme_pubclean()
There's nothing wrong with the axis, and this has nothing to do with the stacked plot. You're using ggthemes::geom_rangeframe(), which, according to its description, draws:
Axis lines which extend to the maximum and minimum of the plotted data.
If you don't want those, don't use them. Your call to theme_tufte() removes the background grid and the regular axis lines, so without the range frame it looks as if there is no axis at all.
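To see the mismatch in numbers, here is a small base-R sketch that rebuilds the question's long data (same values as the tibble/gather() version, but without the tidyverse) and compares the range that geom_rangeframe() spans with the height of the stacked bars. It assumes, as the description quoted above says, that the range frame follows the raw value column of the plotted data, not the stacked totals.

```r
# Rebuild the question's long data in base R (same values as data %>% gather(...))
long <- data.frame(
  year  = rep(1991:2020, times = 3),
  model = rep(c("One", "Two", "Three"), each = 30),
  value = c(11:40, 31:60, 61:90)
)

# The range frame spans the minimum and maximum of the plotted values
frame_range <- range(long$value)

# The stacked bars instead reach the per-year totals
stack_tops <- tapply(long$value, long$year, sum)

frame_range        # 11 90
range(stack_tops)  # 103 190
```

So the axis line stops at 90 while the tallest stack reaches 190, which is the gap visible in the first plot.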
You can put the lines back in after your theme_tufte() call by adding another call to theme() with an axis.line argument:
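library(ggthemes)  # for theme_tufte()
ggplot(new_data, aes(x = year, y = value, fill = model)) +
  geom_bar(stat = "identity", position = "stack") +
  theme_tufte() +
  # linewidth requires ggplot2 >= 3.4.0; use size = 1 on older versions
  theme(axis.line = element_line(color = "black", linewidth = 1))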
I have three sets of data I want to plot together: two geom_line() layers on top of one stacked geom_bar(). All of this is over a time series, with one bar and two line values for each year.
The data looks something like this:
df <- data.frame(year = rep(1:5, each = 3),
cat = c("small", "med", "large"),
count = rep(sample(1:10, 5)),
line1 = rep(sample(30000:40000, 5), each = 3),
line2 = rep(sample(200:300, 5), each = 3))
It's easy enough to plot all three together, but I don't want to show the y-axis label for the bars. Instead, I want the left axis to show one line and the right to show the other. I want the plot to look something like this:
but to have the left axis show the line1 value (i.e. the 30000:40000 value). How would I go about including the two line axes, but still showing the bars across the whole height of the plot?
library(ggplot2)
ggplot(data = df, aes(x=year)) +
geom_bar(aes(y = count, x = year, fill = cat), position = "fill", stat="identity") +
geom_line(aes(y = line1/max(line1))) +
geom_line(aes(y = line2/max(line2)), color = "red") +
scale_y_continuous(sec.axis = sec_axis(~.*max(df$line2), name = "line2 (red)"))
Simply adding the geom_bar() after a two-axis line plot leaves the bars effectively invisible, because the line data is on a far larger scale than the proportional (0-1) bar data:
ggplot(data = df, aes(x=year)) +
geom_line(aes(y = line1)) +
geom_line(aes(y = line2*100), color = "red") +
scale_y_continuous(sec.axis = sec_axis(~./100, name = "line2 (red)")) +
geom_bar(aes(y = count, x = year, fill = cat), position = "fill", stat="identity")
I'd like to keep these axes but still show the bars:
You could do it by rescaling the bar data inline with dplyr, so that each year's bars stack up to max(line1):
library(dplyr); library(ggplot2)
ggplot(data = df, aes(x=year)) +
geom_col(data = df %>% group_by(year) %>%
mutate(share = count / sum(count) * max(df$line1)),
aes(y = share, x = year, fill = cat)) +
geom_line(aes(y = line1)) +
geom_line(aes(y = line2*100), color = "red") +
scale_y_continuous(sec.axis = sec_axis(~./100, name = "line2 (red)"))
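The trick is to put everything in line1 units: each bar becomes its share of the year's total multiplied by max(line1), and line2 is multiplied by a constant that the secondary axis divides back out. The sketch below checks that arithmetic in base R on data shaped like the question's. The seed and the data-driven factor k = max(line1) / max(line2) are assumptions for illustration (the answer simply uses 100).

```r
set.seed(1)  # assumed seed, for reproducibility only
df <- data.frame(year = rep(1:5, each = 3),
                 cat = c("small", "med", "large"),
                 count = sample(1:10, 5),
                 line1 = rep(sample(30000:40000, 5), each = 3),
                 line2 = rep(sample(200:300, 5), each = 3))

# Same rescaling as the dplyr pipeline: share of the year's total, times max(line1)
df$share <- df$count / ave(df$count, df$year, FUN = sum) * max(df$line1)

# Every year's stack therefore tops out at max(line1)
stack_tops <- tapply(df$share, df$year, sum)

# A data-driven alternative to the hard-coded 100: plot line2 * k and use
# sec_axis(~ . / k) so the right axis shows the original line2 values
k <- max(df$line1) / max(df$line2)
scaled_line2 <- df$line2 * k
```

Choosing k this way makes the highest point of the red line coincide with the highest point of the black one; any other constant works as long as the secondary axis divides by the same value.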
I'm trying to replicate a plot drawn by GraphPad in R, but I have a problem changing the y-axis.
Here is some example data and my plotting code:
library(ggplot2)
data <- data.frame(names = rep(factor(LETTERS[1:3])),
values = c(0.001, 0.02 ,0.95),
group = rep("A",3))
ggplot(data,
aes(x = group,
y = values,
fill = names)) + geom_bar(stat = "identity", position='stack') +
scale_y_continuous(breaks = c(0.001,0.01,0.1,1), labels=c(0.001,0.01,0.1,1))
The result of my code is shown on top, but I want the plot to look like the image on the bottom.
You can convert the values to their logs, then add a constant that makes all the log values greater than 0. Then re-label the axis appropriately:
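The arithmetic behind the shift, as a base-R sketch: with the axis floor at 1e-4 (the answer's choice, an assumption that must sit at or below the smallest value, otherwise that bar would get a negative height), every log value moves up by 4, and the labels undo the shift with 10^(position - 4).

```r
values <- c(0.001, 0.02, 0.95)
floor_value <- 1e-4                        # smallest value shown on the axis

# Shift the log10 values so the floor sits at 0 (here: log10(values) + 4)
logvalues <- log10(values) - log10(floor_value)

# Axis positions and the labels they receive, as in scale_y_continuous()
breaks <- 0:4
labels <- 10^(breaks - 4)                  # i.e. 0.0001, 0.001, 0.01, 0.1, 1

# Undoing the shift recovers the original values
recovered <- 10^(logvalues - 4)
```

Because all shifted heights are positive, ordinary bars starting at 0 can be used, and only the labels need to know about the log transform.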
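library(dplyr)
# Shift the log10 values so the axis floor (1e-4) sits at 0; all bars are then positive
data2 <- data %>%
  mutate(logvalues = log10(values) - log10(0.0001))
# Order rows from tallest to shortest so the shorter bars are drawn on top
ggplot(data2[order(-data2$logvalues), ],
       aes(x = group, y = logvalues, fill = names)) +
  geom_col(position = 'identity') +
  # Undo the shift in the labels: position 0 -> 1e-4, ..., 4 -> 1
  scale_y_continuous(breaks = 0:4, labels = ~ 10^(.x - 4))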
I am plotting the distributions of two variables on a single histogram. I would like to highlight each distribution's mean on the graph with a dotted line or something similar (ideally in a colour that matches the one already mapped in the aes() section of the code).
How would I do that?
This is my code so far.
hist_plot <- ggplot(data, aes(x = value, fill = type, color = type)) +
  geom_histogram(position = "identity", alpha = 0.2) +
  labs(x = "Value", y = "Count", fill = "Type", title = "Title") +
  guides(color = "none")  # hide the redundant colour legend
Also, is there any way to show the number of observations (n) for each type on this graph?
I've made some reproducible code that might help with your problem.
library(tidyverse)
# Generate some random data (seeded so the result is reproducible)
set.seed(1)
df <- data.frame(value = c(runif(50, 0.5, 1), runif(50, 1, 1.5)),
                 type = c(rep("type1", 50), rep("type2", 50)))
# Calculate the mean and count of each type
stats <- df %>% group_by(type) %>% summarise(mean = mean(value),
                                             n = n())
# Make the ggplot
ggplot(df, aes(x = value, fill = type, color = type)) +
  geom_histogram(position = "identity", alpha = 0.2) +
  labs(x = "Value", y = "Count", fill = "Type", title = "Title") +
  guides(color = "none") +
  # linewidth requires ggplot2 >= 3.4.0 (older versions use size)
  geom_vline(data = stats, aes(xintercept = mean, color = type), linewidth = 2) +
  # The n labels sit at a fixed height on the count axis; adjust y as needed
  geom_text(data = stats, aes(x = mean, y = max(df$value), label = n),
            size = 10,
            color = "black")
If things go as intended, you'll end up with something akin to the following plot.
[Image: histogram with means]
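Since the question asked for a dotted or dashed line in the matching colour, here is a variant sketch using only ggplot2 and base R: the mean lines use linetype = "dashed" ("dotted" also works) and take their colour from the same scale as the histogram, and the n labels are pinned to the top of the panel with y = Inf, so their position no longer depends on the count range. The seed and the "n = ..." label format are assumptions.

```r
library(ggplot2)

set.seed(1)  # assumed seed so the example is reproducible
df <- data.frame(value = c(runif(50, 0.5, 1), runif(50, 1, 1.5)),
                 type  = rep(c("type1", "type2"), each = 50))

# Per-type mean and sample size, computed in base R
stats <- aggregate(value ~ type, data = df, FUN = mean)
names(stats)[names(stats) == "value"] <- "mean"
stats$n <- as.vector(table(df$type)[stats$type])
stats$label <- paste0("n = ", stats$n)

p <- ggplot(df, aes(x = value, fill = type, colour = type)) +
  geom_histogram(position = "identity", alpha = 0.2, bins = 30) +
  # Dashed mean lines coloured by the same colour scale as the bars
  geom_vline(data = stats, aes(xintercept = mean, colour = type),
             linetype = "dashed") +
  # n labels at the top of the panel (y = Inf), just right of each line
  geom_text(data = stats, aes(x = mean, y = Inf, label = label),
            vjust = 1.5, hjust = -0.1, colour = "black") +
  guides(colour = "none") +
  labs(x = "Value", y = "Count", fill = "Type")
```

Call print(p) to draw it.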
I am trying to create a grid of bar graphs that show the average for different species. I am using the iris dataset for this question.
I summarised the data, melted it into long form, and tried to use facet_grid().
iris %>%
  group_by(Species) %>%
  summarise(M.Sepal.Length = mean(Sepal.Length),
            M.Sepal.Width = mean(Sepal.Width),
            M.Petal.Length = mean(Petal.Length),
            M.Petal.Width = mean(Petal.Width)) %>%
  gather(key = Part, value = Value, M.Sepal.Length:M.Petal.Width) %>%
  ggplot(., aes(Part, Value, group = Species, fill = Species)) +
  geom_col(position = "dodge") +
  facet_grid(cols = vars(Part))
However, the graph I am getting has x-axis labels that are strung across every facet. Additionally, the clustered bars are not centered within each facet panel; instead, they appear at the position of their respective x-axis label. I'd like to get rid of the x-axis labels, center the bars, and give each facet its own y-axis scale.
Here is an image of the resulting graph marked up with my expected output:
Perhaps this is what you're looking for?
The key changes are:
- Remove Part as the variable mapped to x, so the data is plotted at the same location in every facet.
- Switch to facet_wrap() so you can use scales = "free_y" (facet_grid() only lets the y scale vary between rows, and all of these panels are in a single row).
- Use labs() to add the x-axis title manually.
- Add theme() to remove the x-axis ticks and tick labels.
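To see why mapping x to a constant centres the bars, here is the data that ends up in the facets, rebuilt with base R (aggregate() standing in for summarise(), and a manual reshape standing in for gather()). Every facet receives exactly three rows, one per species, all at x = 1, so position = "dodge" spreads them symmetrically around the middle of the panel.

```r
# Base-R version of the summary + gather() step (no tidyverse needed)
means <- aggregate(. ~ Species, data = iris, FUN = mean)   # one row per species
names(means)[-1] <- paste0("M.", names(means)[-1])

long <- data.frame(
  Species = rep(means$Species, times = 4),
  Part    = rep(names(means)[-1], each = nrow(means)),
  Value   = unlist(means[-1], use.names = FALSE)
)

# Three rows (one bar per species) in each facet
table(long$Part)
```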
library(ggplot2)
library(dplyr) # Version >= 1.0.0
iris %>%
group_by(Species) %>%
summarise(across(1:4, mean, .names = "M.{col}")) %>%
gather(key = Part, value = Value, M.Sepal.Length:M.Petal.Width) %>%
ggplot(., aes(x = 1, y = Value, group = Species, fill=Species)) +
geom_col(position = "dodge") +
facet_wrap(.~Part, nrow = 1, scales = "free_y") +
labs(x = "Part") +
theme(axis.ticks.x = element_blank(),
axis.text.x = element_blank())
I also took the liberty of replacing your manual summarise() call with the new across() functionality.
Here's how you might also calculate error bars:
library(tidyr)
iris %>%
group_by(Species) %>%
summarise(across(1:4, list(M = mean, SE = ~ sd(.)/sqrt(length(.))),
.names = "{fn}_{col}")) %>%
pivot_longer(-Species, names_to = c(".value","Part"),
names_pattern = "([SEM]+)_(.+)") %>%
ggplot(., aes(x = 1, y = M, group = Species, fill=Species)) +
geom_col(position = "dodge") +
geom_errorbar(aes(ymin = M - SE, ymax = M + SE), width = 0.5,
position = position_dodge(0.9)) +
facet_wrap(.~Part, nrow = 1, scales = "free_y") +
labs(x = "Part", y = "Value") +
theme(axis.ticks.x = element_blank(),
axis.text.x = element_blank())
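The position_dodge(0.9) in geom_errorbar() is there because geom_col() bars are 0.9 wide by default, and dodging the error bars over that same width puts each one over the centre of its bar. A minimal sketch that checks this for Sepal.Length only, computing the mean and standard error in base R (an assumption made to keep the example small):

```r
library(ggplot2)

# Mean and standard error of Sepal.Length per species (base R)
m  <- tapply(iris$Sepal.Length, iris$Species, mean)
se <- tapply(iris$Sepal.Length, iris$Species, function(v) sd(v) / sqrt(length(v)))
summ <- data.frame(Species = names(m), M = as.vector(m), SE = as.vector(se))

p <- ggplot(summ, aes(x = 1, y = M, fill = Species)) +
  geom_col(position = "dodge") +                   # bars are 0.9 wide by default
  geom_errorbar(aes(ymin = M - SE, ymax = M + SE), width = 0.5,
                position = position_dodge(0.9))    # dodge over the same 0.9 width

# Computed positions of the bars and of the error bars
bars   <- layer_data(p, 1)
errors <- layer_data(p, 2)
```

With a different dodge width (for example position_dodge(0.5)) the error bars would drift towards the middle of the group instead of lining up with their bars.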
I'm currently trying to plot mean values of a variable pt for each combination of species and treatment in my experiment. This is the code I'm using:
ggplot(data = data, aes(x=treat, y=pt, fill=species)) +
geom_bar(position = "dodge", stat="identity") +
labs(x = "Treatment",
y = "Proportion of Beetles on Treated Side",
colour = "Species") +
theme(legend.position = "right")
As you can see, the plot seems to assume the means of my 5N and 95E treatments are 1.00, which isn't correct. I have no idea where the problem could be.
I took a stab at what you are asking using the tidyverse (which includes ggplot2).
library(tidyverse)
dat %>%
  group_by(treat, species) %>%
  summarise(mean_pt = mean(pt)) %>%
  ungroup() %>%
  ggplot(aes(x = treat, y = mean_pt, fill = species, group = species)) +
  geom_bar(position = "dodge", stat = "identity") +
  labs(x = "Treatment",
       y = "Proportion of Beetles on Treated Side",
       fill = "Species") +  # the legend comes from fill, not colour
  theme(legend.position = "right") +
  # Dodge width 0.9 matches the default bar width, so each label sits over its bar
  geom_text(aes(label = round(mean_pt, 3)), size = 3, hjust = 0.5, vjust = 3,
            position = position_dodge(width = 0.9))
Here dat is your actual dataset. I calculated mean_pt because the means are what you are trying to plot. I also added a geom_text() layer so you can see the computed values and compare them with what you expected.
From my understanding, this won't plot the means of your y variable by default. Have you calculated the means for each treatment/species combination? If not, I'd recommend adding a column to your data frame that contains the mean. I'm sure there's an easier way to do this, but try:
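data$means <- NA_real_
for (x in seq_len(nrow(data))) {
  # Rows with the same treatment and species as row x
  same_group <- data$treat == data$treat[x] & data$species == data$species[x]
  # Average pt over those rows (not over their row indices)
  data$means[x] <- mean(data$pt[same_group])
}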
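The "easier way" could be base R's ave(), which returns the group mean for every row in one call. A sketch on a small hypothetical data frame (the column names treat, species and pt follow the question; the values are made up) that checks it against the loop:

```r
# Hypothetical example data with the question's column names
data <- data.frame(treat   = c("5N", "5N", "5N", "95E", "95E", "95E"),
                   species = c("A",  "A",  "B",  "A",   "B",   "B"),
                   pt      = c(0.2,  0.6,  1.0,  0.5,   0.3,   0.9))

# One call: the mean of pt within each treat x species combination, per row
data$means <- ave(data$pt, data$treat, data$species, FUN = mean)

# The explicit loop, for comparison
loop_means <- numeric(nrow(data))
for (x in seq_len(nrow(data))) {
  same_group <- data$treat == data$treat[x] & data$species == data$species[x]
  loop_means[x] <- mean(data$pt[same_group])
}
```

Here the 5N/A rows both get 0.4, while the single 5N/B row keeps its own value of 1.0.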
Then map y = means in aes() and try replacing
geom_bar(position = "dodge", stat="identity")
with
geom_col(position = "dodge")
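Note that geom_col() is shorthand for geom_bar(stat = "identity"), so the switch on its own does not change the plot; what matters is that y is mapped to the means. Both geoms draw one bar per row at the value given: with position = "dodge", rows from the same treatment and species are drawn on top of each other, so you see the largest value (such as 1.00), while with position = "stack" they would be summed.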
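One way to check this is to compare the computed layer data on a tiny hypothetical example with two raw pt values for a single treatment/species pair. The sketch uses only ggplot2's layer_data(); the data values are made up.

```r
library(ggplot2)

# Hypothetical toy data: two raw pt values for the same treatment and species
toy <- data.frame(treat = c("5N", "5N"), species = c("A", "A"), pt = c(0.2, 1.0))

base <- ggplot(toy, aes(x = treat, y = pt, fill = species))

bar_identity <- layer_data(base + geom_bar(stat = "identity", position = "dodge"))
col_dodge    <- layer_data(base + geom_col(position = "dodge"))
col_stack    <- layer_data(base + geom_col(position = "stack"))

# geom_col() and geom_bar(stat = "identity") compute the same bar tops
bar_identity$ymax
col_dodge$ymax
# Dodged bars from one group overlap, so the visible height is the largest value;
# stacked bars are summed instead
max(col_stack$ymax)
```

Neither layer averages anything, which is why the means have to be computed before plotting.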