Number of Observations in ggplot R

I output a boxplot of my response variable against each of my categorical features, but I cannot display the number of observations in each category. I tried the stat_summary() and geom_text() options suggested here, but they do not work.
How can I show the counts in my boxplots?
Below is my code:
library(ggplot2)
plot_list <- list()
for (i in 3:ncol(Train_factor)) {
  b <- paste("Boxplot for", colnames(Train_factor)[i])
  p10 <- ggplot(data = Train_factor,
                aes_string(x = names(Train_factor)[i], y = "Response", fill = variable)) +
    geom_boxplot()
  plot_list[[i]] <- p10
}
for (i in 3:ncol(Train_factor)) {
  file_name <- paste0("boxplot", i, ".tiff")
  tiff(file_name)
  print(plot_list[[i]])
  dev.off()
}

You haven't provided a reproducible example, so here's a generic example using the built-in mtcars data frame. We use geom_text(), but instead of stat = "identity" (the default) we use stat = "count" and label = after_stat(count), which is the internally calculated number of values in each group, so the displayed value is the count of observations. (Older code writes this as label = ..count..; after_stat() needs ggplot2 3.3.0 or later.)
library(ggplot2)
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  # y = 0 is a fixed setting, so every count label sits at the bottom of the plot
  geom_text(aes(label = after_stat(count)), y = 0, stat = "count", colour = "red", size = 4) +
  coord_cartesian(ylim = c(0, max(mtcars$mpg))) +
  theme_classic()
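If you would rather not rely on stat = "count", one alternative is to tally each category yourself and pass the counts to geom_text() as a separate data frame. The sketch below does this. The helper name boxplot_with_n() and the factor-converted copy of mtcars are illustrative assumptions. It uses .data[[...]] in place of the older aes_string(), so the column names can come from a loop like the one in the question.

```r
library(ggplot2)

# Hypothetical helper: boxplot of yvar by xvar with "n = ..." under each box
boxplot_with_n <- function(df, xvar, yvar) {
  # Count the rows in each category of xvar
  counts <- as.data.frame(table(df[[xvar]]), stringsAsFactors = FALSE)
  names(counts) <- c("level", "n")

  ggplot(df, aes(x = .data[[xvar]], y = .data[[yvar]])) +
    geom_boxplot() +
    # y = -Inf pins the labels to the bottom edge; vjust = -0.5 lifts them inside the panel
    geom_text(data = counts,
              aes(x = level, y = -Inf, label = paste0("n = ", n)),
              vjust = -0.5, colour = "red", inherit.aes = FALSE) +
    ggtitle(paste("Boxplot for", xvar))
}

# Categorical columns must be factors so each level gets its own box
mt <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))

plot_list <- list()
for (col in c("cyl", "gear")) {
  plot_list[[col]] <- boxplot_with_n(mt, col, "mpg")
}
```

To write these out as TIFF files, the question's second loop can be reused. It would need to iterate over names(plot_list) instead of 3:ncol(Train_factor).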

Related

How to plot two histograms on the same axis scale?

I have two data frames, dataf1 and dataf2, with the same structure and columns.
The three columns are named A, B, and C, and both data frames have 50 rows.
I would like to plot the histogram of column B for dataf1 and for dataf2. I can plot the two histograms separately, but they are not on the same scale. How can I either overlay them in one histogram using different colors, or plot two separate histograms on the same scale?
ggplot() + aes(dataf1$B)+ geom_histogram(binwidth=1, colour="black",fill="white")
ggplot() + aes(dataf2$B)+ geom_histogram(binwidth=1, colour="black", fill="white")
Combine your data into a single data frame with a new column marking which data frame the data originally came from. Then use that new column for the fill aesthetic for your plot.
dataf1$source <- "Data 1"
dataf2$source <- "Data 2"
dat_combined <- rbind(dataf1, dataf2)
You haven't provided sample data, so here are a few examples of possible plots, using the built-in iris data frame. In the plots below, dat is analogous to dat_combined, Petal.Width is analogous to B, and Species is analogous to source.
library(ggplot2)
dat <- subset(iris, Species != "setosa")  # We want just two species
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="identity", colour="grey40", alpha=0.5, binwidth=0.1)
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="dodge", binwidth=0.1)
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="identity", colour="grey40", binwidth=0.1) +
facet_grid(Species ~ .)
As Zheyuan says, you just need to set the y limits for each plot to get them on the same scale. With ggplot2, one way to do this is with the lims() function (though scale_y_continuous() and coord_cartesian() also work, albeit slightly differently: scale limits drop data outside the range, while coord_cartesian() only zooms). You should also never use data$column inside aes(). Instead, pass the data frame through the data argument and use unquoted column names inside aes(). Here's an example with some built-in data.
p1 = ggplot(mtcars, aes(x = mpg)) + geom_histogram() + lims(y = c(0, 13))
p2 = ggplot(iris, aes(x = Sepal.Length)) + geom_histogram() + lims(y = c(0, 13))
gridExtra::grid.arrange(p1, p2, nrow = 1)
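The value 13 above was chosen by hand. Below is a small sketch of deriving it instead, assuming 30 bins for both plots. layer_data() returns each histogram's computed bins, so the tallest count across both plots can serve as the shared upper limit. coord_cartesian() is used here so that no bars are dropped.

```r
library(ggplot2)

p1 <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 30)
p2 <- ggplot(iris, aes(x = Sepal.Length)) + geom_histogram(bins = 30)

# layer_data() builds a plot and returns its first layer's computed data,
# which for a histogram includes a "count" column per bin
ymax <- max(sapply(list(p1, p2), function(p) max(layer_data(p)$count)))

# Zoom both plots to the same y range without removing any data
p1 <- p1 + coord_cartesian(ylim = c(0, ymax))
p2 <- p2 + coord_cartesian(ylim = c(0, ymax))
```

The two plots can then be placed side by side with gridExtra::grid.arrange(p1, p2, nrow = 1).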
To get two histograms on the same plot, the best way is to combine your data frames. A guess, without seeing what your data look like:
dataf <- rbind(dataf1["B"], dataf2["B"])
dataf$source <- c(rep("f1", nrow(dataf1)), rep("f2", nrow(dataf2)))
ggplot(dataf, aes(x = B, fill = source)) +
  geom_histogram(position = "identity", alpha = 0.7)
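The sketch below combines the ideas from the answers above. It stacks the two data frames with a source column, then either overlays them or facets them. Panels in facet_wrap() share the same x and y scales by default, so the two histograms are directly comparable. Here dataf1 and dataf2 are hypothetical stand-ins built from two species in the built-in iris data (50 rows each), with Petal.Width playing the role of B. Substitute your own data frames.

```r
library(ggplot2)

# Hypothetical stand-ins for the question's data frames: 50 rows, columns A, B, C
make_df <- function(species) {
  s <- iris[iris$Species == species, ]
  data.frame(A = s$Sepal.Length, B = s$Petal.Width, C = s$Petal.Length)
}
dataf1 <- make_df("versicolor")
dataf2 <- make_df("virginica")

# Tag each row with its origin, then stack the two data frames
dataf1$source <- "dataf1"
dataf2$source <- "dataf2"
dataf <- rbind(dataf1, dataf2)

# Option 1: one panel, semi-transparent overlapping bars
overlay <- ggplot(dataf, aes(x = B, fill = source)) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 0.1)

# Option 2: one panel per data frame; fixed scales are the facet_wrap() default
faceted <- ggplot(dataf, aes(x = B)) +
  geom_histogram(binwidth = 0.1, colour = "black", fill = "white") +
  facet_wrap(~source, ncol = 1)
```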

plot multiple figures of smaller facets instead of one large figure of large facets

I want to use ggplot to produce histograms for all of my columns (132 columns).
I use the following code, but it puts all of the histograms in one figure (132 tiny histograms). Is there a way to produce, for example, 11 figures that each contain 12 histograms?
library(reshape2)  # provides melt()
library(ggplot2)
d <- melt(data)
ggplot(d,aes(x = value)) +
facet_wrap(~variable,scales = "free_x") +
geom_histogram(aes(y=..density..),colour="black", fill="white")+
geom_density(alpha=.2, fill="#FF6666")
Thanks!
The code below will create a separate plot for each level of variable in your melted data and save it in a list.
p.list <- list()
for (var in unique(d$variable)) {
  # One plot per variable; facet_wrap() is kept only for its strip title
  p.list[[var]] <- ggplot(d[d$variable == var, ], aes(x = value)) +
    facet_wrap(~variable, scales = "free_x") +
    geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white") +
    geom_density(alpha = 0.2, fill = "#FF6666")
}
Now you can plot any number of them at a time. For example:
library(gridExtra)
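for (i in seq(1, length(p.list), 12)) {
  # Up to 12 plots per page; min() avoids indexing past the end of the list
  page <- p.list[i:min(i + 11, length(p.list))]
  do.call(grid.arrange, c(page, ncol = 3))
}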
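The following variant keeps the question's facet_wrap() approach but splits the variables into pages of a fixed size, writing each page to one multi-page PDF. To stay self-contained, this sketch uses base R's stack() on mtcars (11 columns, 4 per page) instead of melt() on the question's 132 columns. The output file name is an arbitrary choice. With your data, per_page = 12 would give the 11 figures asked for.

```r
library(ggplot2)

# Long format with base R: stack() returns columns "values" and "ind"
d <- stack(mtcars)
names(d) <- c("value", "variable")

# Split the variable names into consecutive chunks of per_page
vars <- unique(as.character(d$variable))
per_page <- 4
chunks <- split(vars, ceiling(seq_along(vars) / per_page))

# One faceted figure per chunk
pages <- lapply(chunks, function(v) {
  ggplot(d[d$variable %in% v, ], aes(x = value)) +
    facet_wrap(~variable, scales = "free_x") +
    geom_histogram(aes(y = after_stat(density)), bins = 30,
                   colour = "black", fill = "white") +
    geom_density(alpha = 0.2, fill = "#FF6666")
})

# Each print() call starts a new page of the PDF
out <- file.path(tempdir(), "histograms.pdf")
pdf(out)
for (p in pages) print(p)
invisible(dev.off())
```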

plot number of data points in r

I am using the following in R to generate a Boxplot out of a given set of data:
ggplot(data = daten, aes(x=Bodentyp, y=Fracht)) + geom_boxplot(aes(fill=Bewirtschaftungsform))
Now I want to display the number of data points going into each category of the column "Bodentyp". How do I achieve this?
You can use the fun.data argument of stat_summary() to apply a function (f) to each group's data. The function returns the count (length(y)) as the label and the median (median(y)) as the label's position.
f <- function(y)
c(label=length(y), y=median(y))
library(ggplot2)
data(mtcars)
ggplot(mtcars, aes(x=as.factor(cyl), y=mpg)) +
geom_boxplot() + theme_bw() +
stat_summary(fun.data=f, geom="text", vjust=-0.5, col="blue")
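In the question, the boxes are also split by fill = Bewirtschaftungsform, so each x category holds several dodged boxes. The sketch below labels each dodged box, using mtcars with am standing in for the fill variable. The same summary function is applied per x/fill combination, and the text layer is dodged too. The dodge width of 0.9 is an assumption meant to line up with geom_boxplot()'s default spacing; adjust it if the labels drift off their boxes.

```r
library(ggplot2)

# Count (label) and median (position) for each group, as in the answer above
f <- function(y) c(label = length(y), y = median(y))

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(am))) +
  geom_boxplot() +
  # fill makes each cyl/am pair its own group; dodging moves the labels onto their boxes
  stat_summary(fun.data = f, geom = "text", vjust = -0.5,
               position = position_dodge(width = 0.9)) +
  labs(x = "cyl", fill = "am")
```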

ggplot with extracting rows from data frame in for loop, showing different colors

I have a data frame that contains monthly time series data (from Jan 2010 through Dec 2012).
df<- data.frame(code=NA,Year=NA,Month=rep(seq(as.Date('2010/1/1'),by='month',length.out=36),3),x1=rnorm(3*36))
df$code[1:36]<-1; df$code[37:72]<-2; df$code[73:108]<-3
yr <- c(rep(2010,12),rep(2011,12),rep(2012,12))
df$Year<-rep(yr,3)
I would like to extract the rows that have the same code (there will be 36 rows for each code), and plot the values for each code on top of each other. I tried achieving this by the following code:
library(ggplot2)
library(scales)  # date_format(), date_breaks()
m <- ggplot(df[1:36,], aes(x=Month,y=x1)) + geom_point() + geom_line(aes(color ='ID:1')) +
scale_x_date(labels = date_format(format = "%m"),breaks = date_breaks("month"))+
xlab("") + ylab("")+ facet_wrap(~Year,scales=("free_x"))
Now I wrote a for loop to extract the next 36 observations and add them to the plot:
for(i in 1:2){
data2 <- df[((i*36)+1):((i+1)*36),]
m<-m+geom_point(data=data2,aes(x=Month,y=x1))+geom_line(data=data2,aes(x=Month,y=x1
,color=paste0('ID:',i+1)))
}
This code produces a plot, but I have three questions:
(1) I don't get a legend entry for ID:2 (a legend is only produced for the last one). How can I get it?
(2) I would like to see different color for each code (associated with the legend), how can I achieve that?
(3) I am sure there should be a better way to produce the desired output, rather than using for loop, which is not recommended, any suggestion?
Map code to color in your aes() call:
library(ggplot2)
library(scales)  # date_format(), date_breaks()
m <- ggplot(df, aes(x=Month,y=x1,color=factor(code))) +
geom_point() +
geom_line() +
scale_x_date(labels = date_format(format = "%m"),breaks = date_breaks("month"))+
xlab("") + ylab("")+ facet_wrap(~Year,scales=("free_x"))
m
Instead of using a for loop or subsetting, add color = factor(code) to your aesthetics, which will add separately colored lines (and points) for each group of 36:
m <- ggplot(df, aes(x=Month, y=x1, color = factor(code))) +
geom_point() + geom_line() +
scale_x_date(labels = date_format(format = "%m"),breaks = date_breaks("month"))+
xlab("") + ylab("")+ facet_wrap(~Year,scales=("free_x"))
print(m)
(You could naturally customize the legend title with something like labs(color = "ID"), or customize the choice of colors with scale_color_manual().)
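As a sketch of those two customizations, the example below rebuilds data shaped like the question's (36 months for each of three codes, with rnorm values and a fixed seed). It then names the legend "ID" and pins each code to a color. The specific color names are arbitrary choices.

```r
library(ggplot2)
library(scales)  # date_format(), date_breaks()

# Same structure as the question's data: 36 months for each of three codes
set.seed(1)
months <- seq(as.Date("2010-01-01"), by = "month", length.out = 36)
df <- data.frame(code = rep(1:3, each = 36),
                 Month = rep(months, 3),
                 x1 = rnorm(3 * 36))
df$Year <- as.integer(format(df$Month, "%Y"))

m <- ggplot(df, aes(x = Month, y = x1, color = factor(code))) +
  geom_point() +
  geom_line() +
  scale_x_date(labels = date_format("%m"), breaks = date_breaks("1 month")) +
  # Named values pin each code to a colour; labels rename the legend entries
  scale_color_manual(values = c("1" = "firebrick", "2" = "steelblue", "3" = "darkgreen"),
                     labels = c("ID:1", "ID:2", "ID:3")) +
  labs(x = NULL, y = NULL, color = "ID") +
  facet_wrap(~Year, scales = "free_x")
```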

How to fix the geom_text label position so it is always on the middle of the plot?

I would like to create a function that produces a ggplot graph.
library(ggplot2)
library(data.table)
data1 <- data.table(x=1:5, y=1:5, z=c(1,2,1,2,1))
data2 <- data.table(x=1:5, y=11:15, z=c(1,2,1,2,1))
myfun <- function(data){
ggplot(data, aes(x=x, y=y)) +
geom_point() +
geom_text(aes(label=y), y=3) +
facet_grid(z~.)
}
myfun(data2)
It is supposed to add text labels to the graph. However, without knowing the data in advance, I cannot manually adjust the vertical position of the text. In particular, I don't want the labels to move with the data: I want them to always stay about a quarter of the way down from the top of each plot (top-middle).
How can I do that?
Is there a function that returns the upper and lower y limits, so that I can assign something like y = (y.limit.up + y.limit.bottom) / 2?
Setting either the x or y position in geom_text(...) relative to the plot scale in a facet is actually a pretty big problem. @agstudy's solution works if the y scale is the same for all facets. This is because, when evaluating range() (or max(), min(), etc.) inside aes(), ggplot uses the unsubsetted data, not the data subsetted for the appropriate facet (see this question).
You can achieve what you want using auxiliary tables, though.
library(ggplot2)
library(data.table)
data1 <- data.table(x=1:5, y=1:5, z=c(1,2,1,2,1))
data2 <- data.table(x=1:5, y=11:15, z=c(1,2,1,2,1))
myfun <- function(data){
  # copy() stops := from modifying the caller's table by reference
  label.pos <- copy(data)[, ypos := min(y) + 0.75*diff(range(y)), by = z]  # 75% to the top of each facet
  ggplot(data, aes(x=x, y=y)) +
    geom_point() +
    # geom_text(aes(label=y), y=3) +
    geom_text(data=label.pos, aes(y=ypos, label=y)) +
    facet_grid(z~., scales="free")  # note scales = "free"
}
myfun(data2)
This places each facet's labels 75% of the way up that facet's own y range.
If you want scales="fixed", then @agstudy's solution is the way to go.
You can do this for example:
ggplot(data2, aes(x=x)) +
geom_point(aes(y=y)) +
geom_text(aes(label=y, y=mean(range(y)))) +
facet_grid(z~.)
Or fix y limits manually:
scale_y_continuous(limits = c(10, 15))
@user890739:
With a density plot, you can estimate a ypos variable per facet like this (the density is computed on x, the variable that stat_density() uses):
library(dplyr)
data <- data2 %>%
  group_by(z) %>%
  mutate(ypos = 0.75 * max(density(x)$y))  # 75% of each facet's peak density
Then plot the result:
ggplot(data, aes(x=x)) +
  stat_density(aes(y=after_stat(density))) +
  geom_text(aes(label=y, y=ypos)) +
  facet_grid(z~., scales="free")
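When the labels only need to sit at a fixed position relative to each panel, another option is to set y = Inf (or -Inf) outside aes(). Infinite positions are drawn at the panel edge after each facet's scale has been trained, so nothing has to be computed per facet. Below is a minimal sketch using the question's data2 as a plain data frame. The vjust = 1.5 nudge is an illustrative choice, so this gives "top-middle" rather than exactly a quarter of the way down.

```r
library(ggplot2)

# The question's second data set, as a plain data frame
data2 <- data.frame(x = 1:5, y = 11:15, z = c(1, 2, 1, 2, 1))

myfun <- function(data) {
  ggplot(data, aes(x = x, y = y)) +
    geom_point() +
    # y = Inf is a fixed setting, not a mapping: it is drawn at the top edge of
    # every panel, whatever that panel's y range; vjust = 1.5 moves the text inside
    geom_text(aes(label = y), y = Inf, vjust = 1.5) +
    facet_grid(z ~ ., scales = "free")
}

p <- myfun(data2)
```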
