Connecting the points of a dodged geom_point by group - r

Consider this nested data set, in which i aim at plotting the two nested factor variables on the x-axis:
df <- data.frame(X=c(rep("A",9), rep("B",9), rep("C",9)),
nested=c(rep(c(rep("X",3), rep("Y",3), rep("Z",3)),3)),
response=runif(27))
ggplot(df) +
geom_point(aes(x=X, y=response, col=nested, group=nested, shape=nested), position=position_dodge(width=1))
I want to connect the dots in each level of nested in each level of X to have vertical, parallel lines in the plot, spanning from the maximum to the minimum of response in each nested level. (much alike i would use fill=nested if would go for boxplots), but my first approach was not satisfactory:
ggplot(df) +
geom_point(aes(x=X, y=response, col=nested, group=nested, shape=nested), position=position_dodge(width=0.3))+
geom_line(aes(x=X, y=response, col=nested, group=nested))
I can imagine using geom_errorbar, but this means i would need to create a separate dataframe with min and max values, correct?

You can dodge the lines too. Just make sure that the group aesthetic is mapped to the interaction between nested and X:
ggplot(df) +
geom_point(aes(x = X, y = response, col = nested, shape = nested),
position=position_dodge(width = 0.3)) +
geom_line(aes(x = X, y = response, col = nested,
group = interaction(nested, X)),
position = position_dodge(width = 0.3))

Maybe try this approach with facet_wrap():
library(ggplot2)
#Code
ggplot(df) +
geom_point(aes(x=X, y=response,
col=nested, group=nested, shape=nested),
position=position_dodge(width=1))+
geom_line(aes(x=X, y=response,
col=nested,group=nested),position=position_dodge(width=1))+
facet_wrap(.~X,scales='free_x')+
theme(strip.background = element_blank(),
strip.text = element_blank())
Output:

Related

ggplot two histograms in one plot

I created the plot below using:
ggplot(data_all, aes(x = data_all$Speed, fill = data_all$Season)) +
theme_bw() +
geom_histogram(position = "identity", alpha = 0.2, binwidth=0.1)
As you can see, the difference in the amount of data available is very large. Is there a way to look only at the distribution and not at the total data amount?
You can reference some of the other calculated values from stat functions using a notation that you may have seen before: ..value... I'm not sure the proper name for these or where you can find a list documented, but sometimes these are called "special variables" or "calculated aesthetics".
In this case, the default calculated aesthetic on the y axis for geom_histogram() is ..count... When comparing distributions of different total N size, it's useful to use ..density... You can access ..density.. by passing it to the y aesthetic directly in the geom_histogram() function.
First, here's an example of two histograms with vastly different sizes (similar to OP's question):
library(ggplot2)
set.seed(8675309)
df <- data.frame(
x = c(rnorm(1000, -1, 0.5), rnorm(100000, 3, 1)),
group = c(rep("A", 1000), rep("B", 100000))
)
ggplot(df, aes(x, fill=group)) + theme_classic() +
geom_histogram(
alpha=0.2, color='gray80',
position="identity", bins=80)
And here's the same plot using ..density..:
ggplot(df, aes(x, fill=group)) + theme_classic() +
geom_histogram(
aes(y=..density..), alpha=0.2, color='gray80',
position="identity", bins=80)

Calculating means with stat_summary for two different groupings and plotting in one plot

I am having issues with plotting two calculated means using stat_summary in the same figure.
I am using ggplot and stat_summary to plot means of a dataset that I grouped based on variable A. Variable A can have value 1,2,3,4. The same data also have variable B that can have value 1,2.
So, I can make a plot with means of the data grouped after variable A, and I get 4 lines.
I can also make a plot with means of the data grouped after variable B, where I get 2 lines.
But how can I plot them in the same figure, so that I get 6 lines? I have made a somewhat similar example using the mtcars dataset:
library(ggplot2)
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars
plot1 <- ggplot(mtcars, aes(x=gear, y=hp, color=cyl, fill=cyl)) +
stat_summary(geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(geom='line', fun.y = mean, size=1)
plot1
plot2 <- ggplot(mtcars, aes(x=gear, y=hp, color=vs, fill=vs)) +
stat_summary(geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(geom='line', fun.y = mean, size=1)
plot2
So far I have the impression, that since I start with ggplot(xxx), where xxx defines the data and grouping, I can't combine it with another ggplot with another grouping. If I could initiate ggplot() without defining anything in the argument, but only defining data and grouping in the argument for stat_summary, I feel like that would be the solution. But I can't figure out how to use stat_summary like that, if even possible.
You can just add more layers, defining the aes for each seperately:
ggplot(mtcars) +
stat_summary(aes(x=gear, y=hp, color=paste('cyl:', cyl), fill = paste('cyl:', cyl)), geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(aes(x=gear, y=hp, color=paste('cyl:', cyl)), geom='line', fun.y = mean, size=1) +
stat_summary(aes(x=gear, y=hp, color=paste('vs:', vs), fill=paste('vs:', vs)), geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(aes(x=gear, y=hp, color=paste('vs:', vs)), geom='line', fun.y = mean, size=1)

Problem when trying to plot two histograms using fill aesthetic

I've been trying to plot two histograms by using the fill aesthetic and a specific column with two levels. However, instead of displaying both desired histograms, my code displays one histogram with the whole data and another only for the second classification. I don't know if there is a problem in my syntax neither if this is some kind of tricky issue.
library(tidyverse)
db1 <- data.frame(type=rep("A",100),val=rnorm(n=100,mean=50,sd=10))
db2 <- data.frame(type=rep("B",150),val=rnorm(n=150,mean=50,sd=10))
dbf <- bind_rows(db1,db2)
P1 <- ggplot(db1, aes(x=val)) + geom_histogram()
P2 <- ggplot(db2, aes(x=val)) + geom_histogram()
PF <- ggplot(dbf, aes(x=val)) + geom_histogram()
I want to get this, P1 and P2
ggplot(db1, aes(x=val)) + geom_histogram(fill="red", alpha=0.5) + geom_histogram(data=db2, aes(x=val),fill="green", alpha=0.5)
What I want
But the code I think should work, P1 and P2 with the fill aesthetic for column val
ggplot(dbf, aes(x=val)) + geom_histogram(aes(fill=type), alpha=0.5)
My code
Produces the combination of PF and P2
ggplot(dbf, aes(x=val)) + geom_histogram(fill="red", alpha=0.5) + geom_histogram(data=db2, aes(x=val),fill="green", alpha=0.5)
What I get
Any help or idea will be highly appreciated!
All you need is to pass position = "identity" to your geom_histogram function.
library(tidyverse)
library(ggplot2)
db1 <- data.frame(type=rep("A",100),val=rnorm(n=100,mean=50,sd=10))
db2 <- data.frame(type=rep("B",150),val=rnorm(n=150,mean=50,sd=10))
dbf <- bind_rows(db1,db2)
ggplot(dbf, aes(x=val, fill = type)) + geom_histogram(alpha=0.5, position = "identity")
Is your goal to show the overlap via the color combination? I'm not sure how to force geom_histogram to show the overlap, but geom_density does do what you want. You can play with the bandwidth (bw) to show more or less detail.
dbf %>% ggplot() +
aes(x = val, fill = type) +
geom_density(alpha = .5, bw = .5) +
scale_fill_manual(values = c("red","green"))

How to trim extra space from ggplot

I am trying to make an extremely single heatmap of percentages using ggplot2 which ideally will just be two single thin columns. I tried the following code, believing that the width option in aes would solve the problem.
p_prev_tg <- ggplot(tg_melt, aes(x = variable , y = OTU, fill = value,
width=.3)) + geom_tile() +
scale_fill_gradientn(colours = hm.palette2(10)) +
xlab(NULL) + ylab(NULL) +
theme(axis.text=element_text(size=7))
p_prev_tg
Unfortunately, this returns a plot with lots of empty space as shown. The plot I would like is those two bars side by side, how can I do this in ggplot?
thanks
What about this solution ?
set.seed(1234)
tg_melt <- data.frame(variable=rep(c("Prevalence_T","Prevalence_NT"), each=10),
OTU=rep(paste0("OTU_",1:10),2),
value=rnorm(20))
library(RColorBrewer)
library(ggplot2)
hm.palette2 <- colorRampPalette(rev(brewer.pal(11, 'Spectral')))
p_prev_tg <- ggplot(tg_melt, aes(x = as.numeric(variable), y = OTU, fill = value)) +
geom_tile() +
scale_fill_gradientn(colours = hm.palette2(10)) +
xlab(NULL) + ylab(NULL) +
theme(axis.text=element_text(size=7)) +
scale_x_continuous(breaks=c(1,2),
limits=c(0,3),
labels=levels(tg_melt$variable))+
theme_bw()
p_prev_tg

How to plot two histograms on the same axis scale?

I have two dataframes: dataf1, dataf2. They have the same structure and columns.
3 columns names are A,B,C. And they both have 50 rows.
I would like to plot the histogram of column B on dataf1 and dataf2. I can plot two histograms separately but they are not of the same scale. I would like to know how to either put them on the same histogram using different colors or plot two histograms of the same scale?
ggplot() + aes(dataf1$B)+ geom_histogram(binwidth=1, colour="black",fill="white")
ggplot() + aes(dataf2$B)+ geom_histogram(binwidth=1, colour="black", fill="white")
Combine your data into a single data frame with a new column marking which data frame the data originally came from. Then use that new column for the fill aesthetic for your plot.
data1$source="Data 1"
data2$source="Data 2"
dat_combined = rbind(data1, data2)
You haven't provided sample data, so here are a few examples of possible plots, using the built-in iris data frame. In the plots below, dat is analogous to dat_combined, Petal.Width is analogous to B, and Species is analogous to source.
dat = subset(iris, Species != "setosa") # We want just two species
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="identity", colour="grey40", alpha=0.5, binwidth=0.1)
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="dodge", binwidth=0.1)
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="identity", colour="grey40", binwidth=0.1) +
facet_grid(Species ~ .)
As Zheyuan says, you just need to set the y limits for each plot to get them on the same scale. With ggplot2, one way to do this is with the lims command (though scale_y_continuous and coord_cartesian also work, albeit slightly differently). You also should never use data$column indside aes(). Instead, use the data argument for the data frame and unquoted column names inside aes(). Here's an example with some built-in data.
p1 = ggplot(mtcars, aes(x = mpg)) + geom_histogram() + lims(y = c(0, 13))
p2 = ggplot(iris, aes(x = Sepal.Length)) + geom_histogram() + lims(y = c(0, 13))
gridExtra::grid.arrange(p1, p2, nrow = 1)
Two get two histograms on the same plot, the best way is to combine your data frames. A guess, without seeing what your data looks like:
dataf = rbind(dataf1["B"], dataf2["B"])
dafaf$source = c(rep("f1", nrow(dataf1)), rep("f2", nrow(dataf2))
ggplot(dataf, aes(x = B, fill = source)) +
geom_histogram(position = "identity", alpha = 0.7)

Resources