Is it possible to create a ggMarginal plot without disaggregating the data? - r

I have a data frame with some points and their frequency of occurrence, and I want to plot the points (balls) with their size representing that frequency. I also want to use ggMarginal to create the marginal plots. The code below creates the marginals without taking the frequencies into account:
library(ggplot2)
df <- data.frame("x" = 1:5, "y" = c(5,8,8,12,10), "f" = c(4,5,8,8,5))
p <- ggplot(df, aes(x=x, y=y, size=f)) + geom_point() + theme_bw()
ggExtra::ggMarginal(p, data=df, type = "histogram")
I don't want to create another data frame with disaggregated data, even though doing so produces the right marginals, as shown below:
# disaggregated data
df2 <- df[ rep(1:nrow(df), df$f), c("x", "y") ]
p <- ggplot(df2, aes(x=x, y=y)) + geom_point() + theme_bw()
ggExtra::ggMarginal(p, data=df2, type = "histogram")
But even if I try to use both data frames, the resulting marginals are still wrong:
p <- ggplot(df, aes(x=x, y=y, size=f)) + geom_point() + theme_bw()
ggExtra::ggMarginal(p, data=df2, type = "histogram")
1. Is it possible to create the marginals without disaggregating the data? How?
2. If 1 is not possible, how can I do it anyway, since none of the examples above produced the desired plot?

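It can be done with the cowplot package. The main plot uses the aggregated data, while the marginal histograms are drawn from the disaggregated copy df2: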
library(tidyverse)
library(cowplot)
df <- data.frame("x" = 1:5,
"y" = c(5,8,8,12,10),
"f" = c(4,5,8,8,5))
df2 <- df[rep(1:nrow(df), df$f), c("x", "y") ]  # one row per occurrence (disaggregated)
p <-
ggplot(df, aes(x=x, y=y, size=f)) +
geom_count() +
theme_bw()
xhist <-
axis_canvas(p, axis = "x") +
geom_histogram(data = df2, aes(x = x), color = 'lightgray')
yhist <-
axis_canvas(p, axis = "y", coord_flip = TRUE) +
geom_histogram(data = df2, aes(x = y), color = 'lightgray') +
coord_flip()
p %>%
insert_xaxis_grob(xhist, grid::unit(1, "in"), position = "top") %>%
insert_yaxis_grob(yhist, grid::unit(1, "in"), position = "right") %>%
ggdraw()
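The cowplot answer still builds df2 for the marginals. For the first question (avoiding disaggregation), one possible variation, offered as a sketch rather than as part of the answer above, relies on the fact that stat_bin(), which geom_histogram() uses, understands a weight aesthetic, so each row of df can count f times. Everything else mirrors the answer's cowplot calls.

```r
library(ggplot2)
library(cowplot)

df <- data.frame(x = 1:5, y = c(5, 8, 8, 12, 10), f = c(4, 5, 8, 8, 5))

# Main plot: one point per row of the aggregated data, sized by f
p <- ggplot(df, aes(x = x, y = y, size = f)) +
  geom_point() +
  theme_bw()

# Marginal histograms from the aggregated data: the weight aesthetic
# makes stat_bin() count each row f times instead of once
xhist <- axis_canvas(p, axis = "x") +
  geom_histogram(data = df, aes(x = x, weight = f), color = "lightgray")
yhist <- axis_canvas(p, axis = "y", coord_flip = TRUE) +
  geom_histogram(data = df, aes(x = y, weight = f), color = "lightgray") +
  coord_flip()

# Attach the marginals to the main plot and draw the result
g <- insert_xaxis_grob(p, xhist, grid::unit(1, "in"), position = "top")
g <- insert_yaxis_grob(g, yhist, grid::unit(1, "in"), position = "right")
ggdraw(g)
```

Because the weights do the counting, df2 is no longer needed; the same weight trick should apply to other ggplot2 stats that count rows, such as stat_count().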

Related

Make geom_histogram display x-axis labels as integers instead of numerics

I have a data.frame that has counts for several groups:
set.seed(1)
df <- data.frame(group = sample(c("a","b"),200,replace = TRUE),
n = round(runif(200,1,2)))
df$n <- as.integer(df$n)
And I'm trying to display a histogram of df$n, faceted by group, using ggplot2's geom_histogram:
library(ggplot2)
ggplot(data = df, aes(x = n)) + geom_histogram() + facet_grid(~group) + theme_minimal()
Any idea how to get ggplot2 to label the x-axis ticks with the integers the histogram is summarizing rather than the numeric values it is currently showing?
You could tweak this with the binwidth argument of geom_histogram:
library(ggplot2)
ggplot(data = df, aes(x = n)) +
geom_histogram(binwidth = 0.5) +
facet_grid(~group) +
theme_minimal()
Another example:
set.seed(1)
df <- data.frame(group = sample(c("a","b"),200,replace = TRUE),
n = round(runif(200,1,5)))
library(ggplot2)
ggplot(data = df, aes(x = n)) +
geom_histogram(binwidth = 0.5) +
facet_grid(~group) +
theme_minimal()
You can manually specify the breaks with scale_x_continuous(breaks = seq(1, 2)). Alternatively, you can set the breaks and labels separately as well.
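A minimal sketch that generalizes the breaks approach: scale_x_continuous() also accepts a function of the axis limits as its breaks argument, so a small helper (the name integer_breaks is hypothetical, not part of ggplot2) can return only whole numbers whatever the data range. It reuses the second example's data and assumes binwidth = 1, which centres one bar on each integer.

```r
library(ggplot2)

# Hypothetical helper: return the whole numbers inside the axis limits
# (an empty vector if the range contains no integer)
integer_breaks <- function(limits) {
  lo <- ceiling(min(limits, na.rm = TRUE))
  hi <- floor(max(limits, na.rm = TRUE))
  if (lo > hi) return(numeric(0))
  seq(lo, hi, by = 1)
}

set.seed(1)
df <- data.frame(group = sample(c("a", "b"), 200, replace = TRUE),
                 n = round(runif(200, 1, 5)))

# binwidth = 1 gives one bar per integer; the breaks function puts
# tick labels only at whole numbers
p <- ggplot(df, aes(x = n)) +
  geom_histogram(binwidth = 1) +
  scale_x_continuous(breaks = integer_breaks) +
  facet_grid(~group) +
  theme_minimal()
p
```

Unlike a fixed seq(1, 2), the helper adapts when the range of n changes.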

How to add legend of boxplot and points in ggplot2?

I have the following code to plot a boxplot of some data ("Samples") and add points for the "Baseline" and "Theoretical" data.
library(reshape2)
library(ggplot2)
meltshear <- melt(Shear)
samples <- rep(c("Samples"), each = 10)
baseline <- c("Baseline",samples)
method <- rep(baseline, 4)
xlab <- rep(c("EXT.Single","EXT.Multi","INT.Single","INT.Multi"), each = 11)
plotshear <- data.frame(Source = c(method,"theoretical","theoretical","theoretical"),
Shear = c(xlab,"EXT.Multi","INT.Single","INT.Multi"),
LLDF = c(meltshear[,2],0.825,0.720,0.884))
data <- subset(plotshear, Source %in% c("Samples"))
baseline <- subset(plotshear, Source %in% c("Baseline"))
theoretical <- subset(plotshear, Source %in% c("theoretical"))
ggplot(data = data, aes(x = Shear, y = LLDF)) + geom_boxplot(outlier.shape = NA) +
stat_summary(fun = mean, geom="point", shape=23, size=3) +
stat_boxplot(geom='errorbar', linetype=1, width=0.5) +
geom_jitter(data = baseline, colour = "green4") +
geom_jitter(data = theoretical, colour = "red")
This gives me the plot, but I cannot add a legend to it. I want the legend to show labels = c("Samples","Baseline","Theoretical") for the boxplot shape, the green dot, and the red dot, respectively.
You could try adding fill to aes:
ggplot(data = data, aes(x = Shear, y = LLDF, fill = Shear))
Alternatively, this resource may be useful: http://www.cookbook-r.com/Graphs/
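Mapping fill to Shear colours the boxes by category but does not give the three-entry legend asked for. A common pattern, sketched here under assumptions, is to map colour to a constant label inside aes() in each layer and then assign the real colours with scale_colour_manual(). The Shear data are not shown in the question, so the data frames below are hypothetical stand-ins; the override.aes settings are one way to make each legend key show only its own glyph (box or dot), since by default every layer draws in every key.

```r
library(ggplot2)

# Hypothetical stand-in data: the question's Shear data frame is not shown
set.seed(42)
samples <- data.frame(
  Shear = rep(c("EXT.Single", "EXT.Multi", "INT.Single", "INT.Multi"), each = 10),
  LLDF  = runif(40, 0.6, 0.95)
)
baseline <- data.frame(
  Shear = c("EXT.Single", "EXT.Multi", "INT.Single", "INT.Multi"),
  LLDF  = c(0.80, 0.75, 0.70, 0.85)
)
theoretical <- data.frame(
  Shear = c("EXT.Multi", "INT.Single", "INT.Multi"),
  LLDF  = c(0.825, 0.720, 0.884)
)

# A constant string mapped to colour inside aes() becomes a legend entry;
# scale_colour_manual() turns each label into an actual colour
p <- ggplot(samples, aes(x = Shear, y = LLDF)) +
  geom_boxplot(aes(colour = "Samples"), outlier.shape = NA) +
  geom_point(data = baseline, aes(colour = "Baseline"), size = 3) +
  geom_point(data = theoretical, aes(colour = "Theoretical"), size = 3) +
  scale_colour_manual(
    name   = NULL,
    values = c(Samples = "black", Baseline = "green4", Theoretical = "red"),
    breaks = c("Samples", "Baseline", "Theoretical")
  ) +
  # Per-key overrides: key 1 hides the point glyph, keys 2-3 hide the box
  guides(colour = guide_legend(override.aes = list(
    shape    = c(NA, 16, 16),
    linetype = c("solid", "blank", "blank"),
    fill     = c("white", NA, NA)
  )))
p
```

The stat_summary() and stat_boxplot() layers from the question can be added back unchanged; since they do not map colour, they should not change the legend.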

add y=0 line in some plots facet_grid ggplot2

I have a large plot that uses facet_grid().
I want to add a horizontal line to indicate y = 0, but only in some of the panels.
Reproducible example:
df <- data.frame(x = 1:100, y = rnorm(100,sd=0.5), type = rep(c('A','B'), 50))
ggplot(df) + facet_grid(type~.) +
geom_point(data = df[df$type == 'A',], mapping = aes(x=x, y=y)) +
geom_rect(data = df[df$type == 'B',], mapping=aes(xmin=x,ymin=0,xmax=(x+2),ymax=y)) +
theme(panel.background=element_rect(fill="white"))
For example, I want the line only in the top plot.
Just create another data object for an hline geom and make sure to include the relevant faceted variable.
df <- data.frame(x = 1:100, y = rnorm(100,sd=0.5), type = rep(c('A','B'), 50))
ggplot(df) + facet_grid(type~.) +
geom_point(data = df[df$type == 'A',], mapping = aes(x=x, y=y)) +
geom_rect(data = df[df$type == 'B',], mapping=aes(xmin=x,ymin=0,xmax=(x+2),ymax=y)) +
geom_hline(data = data.frame(type="A", y=0), mapping=aes(yintercept=y)) +
theme(panel.background=element_rect(fill="white"))
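The key point in this answer is that the extra data frame carries a column named after the faceting variable, so ggplot2 places each row only in the matching facet. A sketch of the same idea with that data frame pulled out (the name hlines, the column y0 and the dashed linetype are arbitrary choices):

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = 1:100, y = rnorm(100, sd = 0.5), type = rep(c("A", "B"), 50))

# One row per facet that should get a reference line; the column 'type'
# must match the faceting variable for ggplot2 to assign it to a panel
hlines <- data.frame(type = "A", y0 = 0)

p <- ggplot(df) +
  facet_grid(type ~ .) +
  geom_point(data = df[df$type == "A", ], aes(x = x, y = y)) +
  geom_rect(data = df[df$type == "B", ],
            aes(xmin = x, ymin = 0, xmax = x + 2, ymax = y)) +
  geom_hline(data = hlines, aes(yintercept = y0), linetype = "dashed") +
  theme(panel.background = element_rect(fill = "white"))
p
```

Adding rows, e.g. data.frame(type = c("A", "B"), y0 = c(0, 0.5)), would draw a separate line in each listed facet.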

How to rearrange the data to produce a barplot

I created a small example of my large data set (more than 400,000 records). I was able to draw the point plot with this code:
Data1 <- data.frame(State=rep('SC',24),ID=rep(11,24),Month=c(rep(1,times=9),
rep(2,times=6),rep(3,times=9)),Day=c(rep(1:3,each=3),rep(1:2,each=3),
rep(1:3,each=3)),Group=rep(1:3,8),Value=rep(10.1:20.9,length.out=24))
Data2 <- Data1[rep(1:nrow(Data1),4),]
Data <- data.frame(State=c(rep('SC',48),rep('NC',48)),ID=c(rep(11:14,each=24)),
Month=Data2$Month,Day=Data2$Day,Group=rep(1:3,32),Value=rep(10:20,length.out=96))
states = unique(Data$State)
for(j in 1:length(states)) {
jpeg(file=paste("Pic", j, ".jpeg", sep=""))
data <- subset(Data,State==states[j])
plot(data$Month, data$Value, type="p", xlab="Months", ylab="Value")
colors <- rainbow(3)
for (i in 1:3) { # add lines
group <- subset(data, Group==i)
lines(group$Month, group$Value, type="p", col=colors[i])
}
title(paste(unique(data$State),"Value",sep=' ')) # add a title and subtitle
leg.txt <- c("G1","G2","G3") # add a legend
legend("topleft", legend=leg.txt, fill=colors, bty="o")
dev.off()
}
But now I need a bar plot with the 3 groups side by side for each month. I tried the following two approaches, but could not get them right:
1)ggplot(data, aes(factor(data$Month), data$Value, fill = factor(data$Group))) +
geom_bar(position = "dodge", width = 0.5)
scale_x_discrete(labels = data$Month)
2) barplot(data$Value,beside=T,names.arg=factor(data$Month))
Any help would be greatly appreciated!
You need the argument stat = "identity" for geom_bar:
ggplot(data, aes(as.factor(Month), Value, fill = as.factor(Group))) +
geom_bar(position = "dodge", width = 0.5, stat = "identity") +
scale_fill_discrete("Group", labels = c("N", "E", "L"))
Alternatively, plotting the full Data set:
ggplot(Data, aes(x=Month, y=Value, fill=as.factor(Group))) +
geom_bar(stat="identity", position="dodge", color="black")
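One caveat with both answers: each Month/Group combination has several rows (one per Day and ID), so with stat = "identity" several bars land in the same dodged slot and are drawn over one another. A sketch that summarizes first, assuming the monthly mean is the quantity of interest, uses aggregate() and geom_col() (shorthand for geom_bar(stat = "identity")), with one facet per state in place of the jpeg loop.

```r
library(ggplot2)

# Example data, built exactly as in the question
Data1 <- data.frame(State = rep("SC", 24), ID = rep(11, 24),
                    Month = c(rep(1, times = 9), rep(2, times = 6), rep(3, times = 9)),
                    Day = c(rep(1:3, each = 3), rep(1:2, each = 3), rep(1:3, each = 3)),
                    Group = rep(1:3, 8), Value = rep(10.1:20.9, length.out = 24))
Data2 <- Data1[rep(1:nrow(Data1), 4), ]
Data <- data.frame(State = c(rep("SC", 48), rep("NC", 48)), ID = c(rep(11:14, each = 24)),
                   Month = Data2$Month, Day = Data2$Day, Group = rep(1:3, 32),
                   Value = rep(10:20, length.out = 96))

# Assumption: one bar per State/Month/Group showing the mean Value
agg <- aggregate(Value ~ State + Month + Group, data = Data, FUN = mean)

# Dodged bars: the 3 groups side by side within each month, one panel per state
p <- ggplot(agg, aes(x = factor(Month), y = Value, fill = factor(Group))) +
  geom_col(position = position_dodge(width = 0.6), width = 0.5) +
  facet_wrap(~ State) +
  labs(x = "Month", fill = "Group")
p
```

Swapping FUN = mean for sum (or another summary) changes what each bar represents without touching the plotting code.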

ggplot: relative frequencies of two groups

I want a plot like this, except that each facet sums to 100%. Right now the bars for group M add up to about 0.07 + 0.27 = 0.33 (proportions of all 15 rows) instead of 0.20 + 0.80 = 1.00.
df <- rbind(
data.frame(gender=c(rep('M',5)), outcome=c(rep('1',4),'0')),
data.frame(gender=c(rep('F',10)), outcome=c(rep('1',7),rep('0',3)))
)
df
ggplot(df, aes(outcome)) +
geom_bar(aes(y = (..count..)/sum(..count..))) +
facet_wrap(~gender, nrow=2, ncol=1)
(Using y = ..density.. gives worse results.)
Here's another way:
ggplot(df, aes(outcome)) +
geom_bar(aes(y = ..count.. / sapply(PANEL, FUN=function(x) sum(count[PANEL == x])))) +
facet_wrap(~gender, nrow=2, ncol=1)
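Since ggplot2 3.3.0 the ..count.. notation is superseded by after_stat(), and ggplot2 3.4.0 deprecates the dot-dot form. A sketch of the same per-panel normalization in that style, using base R's ave() to look up each bar's panel total, might look like this:

```r
library(ggplot2)

df <- rbind(
  data.frame(gender = rep("M", 5), outcome = c(rep("1", 4), "0")),
  data.frame(gender = rep("F", 10), outcome = c(rep("1", 7), rep("0", 3)))
)

# after_stat() is evaluated on the computed data: one row per bar, with
# its 'count' and its facet in 'PANEL'. ave(..., FUN = sum) returns each
# bar's panel total, so the bars of every facet sum to 1.
p <- ggplot(df, aes(outcome)) +
  geom_bar(aes(y = after_stat(count / ave(count, PANEL, FUN = sum)))) +
  facet_wrap(~gender, nrow = 2, ncol = 1) +
  labs(y = "Proportion within gender")
p
```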
I usually do this by simply precalculating the values outside of ggplot2 and using stat = "identity":
library(plyr)
library(reshape2)
df1 <- melt(ddply(df,.(gender),function(x){prop.table(table(x$outcome))}),id.vars = 1)
ggplot(df1, aes(x = variable,y = value)) +
facet_wrap(~gender, nrow=2, ncol=1) +
geom_bar(stat = "identity")
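The precalculation approach also works without plyr and reshape2. A base-R sketch: prop.table() with margin = 1 turns the gender-by-outcome table into row proportions, as.data.frame() reshapes it into long form (columns gender, outcome, Freq), and geom_col() plots the values as-is.

```r
library(ggplot2)

df <- rbind(
  data.frame(gender = rep("M", 5), outcome = c(rep("1", 4), "0")),
  data.frame(gender = rep("F", 10), outcome = c(rep("1", 7), rep("0", 3)))
)

# Row proportions: each gender's counts are divided by that gender's total
props <- as.data.frame(
  prop.table(table(gender = df$gender, outcome = df$outcome), margin = 1)
)

p <- ggplot(props, aes(x = outcome, y = Freq)) +
  geom_col() +
  facet_wrap(~gender, nrow = 2, ncol = 1)
p
```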
