Plot Frequency of Data in ggplot2 [duplicate] - r

This question already has answers here:
ggplot side by side geom_bar()
(2 answers)
Closed 4 years ago.
I have a data frame that looks like the following:
threshold <- c("thresh1","thresh3","thresh10","thresh3","thresh3", "thresh10")
expression <- c("expressed", "expressed", "expressed", "depleted", "expressed", "depleted")
data.frame("Threshold" = threshold, "Expression" = expression)
I would like to generate a histogram of counts of the different thresholds, bucketed by the expression.
I have attempted to do so using geom_bar(), but I do not want the data stacked. Rather, I want the different categories (depleted, enriched, etc.) to be represented in their own bars.
ggplot(final_nonexpressed, aes(x = threshold, fill = expression))+geom_bar(width = 0.5)
Any help would be appreciated!

Check out the help page ?geom_bar, specifically the position argument and its "dodge" option. For example:
library(ggplot2)
g <- ggplot(mpg, aes(class, fill = factor(drv)))
g + geom_bar()
g + geom_bar(position = "dodge")
Created on 2019-01-15 by the reprex package (v0.2.1)
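Applied to the data from the question, the same idea might look like the sketch below. The data frame name `df` and the explicit factor ordering are assumptions added for illustration. `position_dodge(preserve = "single")` is used so that a bar keeps its width when a category is missing for a threshold (here, thresh1 has no depleted rows); it needs a reasonably recent ggplot2.

```r
library(ggplot2)

# The asker's data, stored in a data frame (the name `df` is an assumption)
df <- data.frame(
  Threshold = c("thresh1", "thresh3", "thresh10", "thresh3", "thresh3", "thresh10"),
  Expression = c("expressed", "expressed", "expressed", "depleted", "expressed", "depleted")
)

# Order the x-axis by threshold value rather than alphabetically
df$Threshold <- factor(df$Threshold, levels = c("thresh1", "thresh3", "thresh10"))

# One bar per Expression level, placed side by side within each Threshold
p <- ggplot(df, aes(x = Threshold, fill = Expression)) +
  geom_bar(position = position_dodge(preserve = "single"), width = 0.5)
p
```

Note that aes() refers to the column names of the data frame (Threshold, Expression), not to the standalone vectors.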

Related

How to respect quantitative nature of discrete/group variables in R ggplot2? [duplicate]

This question already has an answer here:
How to plot a boxplot with correctly spaced continuous x-axis values in ggplot2
(1 answer)
Closed 3 years ago.
I'd like to make a plot with R ggplot2 functions to highlight the relationship between a categorical X and a continuous Y variable. But my categorical variable is quantitative (e.g., integers), and I would like my plots to respect the position suggested by the quantitative value of X.
Imagine the following dataset:
library(tidyverse)
df <- data.frame(Category=sample(c(1, 2, 5), 1000, replace = T)) %>%
mutate(Value=Category+rnorm(1000))
The easiest boxplot would be :
ggplot(df, aes(x=as.factor(Category), y=Value)) +
geom_boxplot() +
labs(x="Category")
But what I would like is :
add_row(df, Category=3:4, Value=NA) %>%
ggplot(aes(x=as.factor(Category), y=Value)) +
geom_boxplot() +
labs(x="Category")
Do you know any proper way to achieve that beyond the ugly trick above that is not really scalable? Because we can imagine many boxplots. Or even the case in which my categories are decimal values (with of course a limited number of categories). All in all, my wish is to be able to distribute my boxplots along the x-axis according to the quantitative value of the categories. The same question could apply to barplot instead of boxplots of course...
Thanks a lot!
As mentioned by @camille, keep Category numeric on the x-axis and set the group aesthetic so that each category still gets its own box:
ggplot(df, aes(x=Category, y=Value, group = Category)) +
geom_boxplot() +
labs(x="Category")
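A self-contained sketch of this approach follows. The seed and the `scale_x_continuous()` call are additions for illustration: the scale only labels the axis at the category values that actually occur, and the same pattern should carry over to decimal-valued categories.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, only for reproducibility
df <- data.frame(Category = sample(c(1, 2, 5), 1000, replace = TRUE))
df$Value <- df$Category + rnorm(1000)

# x stays numeric, so boxes sit at 1, 2 and 5 with a visible gap at 3 and 4;
# group = Category is what splits the data into one box per category
p <- ggplot(df, aes(x = Category, y = Value, group = Category)) +
  geom_boxplot() +
  scale_x_continuous(breaks = sort(unique(df$Category))) +
  labs(x = "Category")
p
```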

Summary plot of ggplot2 facets as a facet [duplicate]

This question already has answers here:
Easily add an '(all)' facet to facet_wrap in ggplot2?
(3 answers)
Closed 5 years ago.
It is often the case that we produce facets to decompose the data according to a variable, but that we still would like to see a summary as a stack of the facets. Here is an example:
library(ggplot2)
ggplot(data=iris, aes(x=Sepal.Length,y=Petal.Length)) +
geom_point(aes(color=Species)) +
facet_wrap(~Species, ncol=2)
However, I would also like one of the facets to be the overlay of the 3 facets:
ggplot(data=iris, aes(x=Sepal.Length,y=Petal.Length)) +
geom_point(aes(color=Species))
Is there any way of doing this easily?
Many thanks,
I wrote the following function to duplicate the dataset and create an extra copy of the data under the facet value "all".
library(ggplot2)
# Create an additional set of data
CreateAllFacet <- function(df, col){
df$facet <- df[[col]]
temp <- df
temp$facet <- "all"
return(rbind(temp, df))
}
Instead of overwriting the original facet data column, the function creates a new column called facet. The benefit of this is that we can use the original column to specify the aesthetics of the plot point.
df <- CreateAllFacet(iris, "Species")
ggplot(data=df, aes(x=Sepal.Length,y=Petal.Length)) +
geom_point(aes(color=Species)) +
facet_wrap(~facet, ncol=2)
I feel the legend is optional in this case, as it largely duplicates information already available within the plot. It can easily be hidden with the extra line + theme(legend.position = "none")
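If the position of the summary panel matters, a variant of the same idea can turn the facet column into a factor with "all" as the last level. This is only a sketch: the function name `add_all_facet` is hypothetical and not part of ggplot2.

```r
library(ggplot2)

# Hypothetical helper: same duplication trick, but with an explicit panel order
add_all_facet <- function(df, col) {
  df$facet <- as.character(df[[col]])  # per-group copy keeps its own label
  all_rows <- df
  all_rows$facet <- "all"              # second copy goes to the summary panel
  out <- rbind(df, all_rows)
  # Original groups first (in order of appearance), "all" last
  out$facet <- factor(out$facet, levels = c(unique(df$facet), "all"))
  out
}

df <- add_all_facet(iris, "Species")
p <- ggplot(df, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(aes(color = Species)) +
  facet_wrap(~facet, ncol = 2) +
  theme(legend.position = "none")
p
```

With iris this gives four panels, the three species followed by the overlay.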

How to remove dots and extend boxplots in ggplot2 [duplicate]

This question already has answers here:
ggplot2 - Boxplot Whiskers at Min/Max
(2 answers)
Closed 7 years ago.
I have some data that I'm trying to build some boxplots with, but I'm getting this warning:
Warning message: Removed 1631 rows containing non-finite values (stat_boxplot).
There are no NA values and all the data seems fine. How can I fix this as these are certainly valuable points in my data and should be extended by the whiskers?
Data
The data is fairly large, and I couldn't get a smaller subsample to produce the errors, so I'll just post the original data.
dat.rds
ggplot2
dat <- readRDS("./dat.rds")
ggplot(dat, aes(x = factor(year), y = dev)) + geom_boxplot() + ylim(-40, 260)
Edit
I was able to get it to work in base boxplot() with range = 6. Is there a way to do this in ggplot?
boxplot(dev ~ year, data = dat, range = 6)
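Remove the ylim restriction and use the coef argument of geom_boxplot, then it works fine. ylim() discards values outside the limits before the boxplot statistics are computed, which is what triggers the warning; coef sets the whisker length as a multiple of the IQR (default 1.5):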
library(ggplot2)
tf <- tempfile(fileext = ".rds")
download.file(url = "https://www.dropbox.com/s/5mgogyclhim6hom/dat.rds?dl=1", destfile = tf, mode = "wb")
dat <- readRDS(tf)
ggplot(dat, aes(x = factor(year), y = dev)) +
geom_boxplot(coef = 6)
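Since the original data may no longer be available, here is a sketch on simulated data (the `toy` data frame and its skewed distribution are made up for illustration). It also shows `coord_cartesian()`, which zooms the view without discarding any rows, in case a restricted y-range is still wanted.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed
# Made-up, right-skewed data standing in for the asker's `dat`
toy <- data.frame(
  year = rep(2001:2003, each = 200),
  dev  = rexp(600, rate = 1 / 20)
)

# Longer whiskers via coef; coord_cartesian() only zooms, so the
# statistics are still computed from all 600 rows
p <- ggplot(toy, aes(x = factor(year), y = dev)) +
  geom_boxplot(coef = 6) +
  coord_cartesian(ylim = c(0, 150))
p

# One row of computed boxplot statistics per year
stats <- layer_data(p)
```

Replacing coord_cartesian() with ylim() here would drop every row outside the limits before the statistics are computed.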

How can I resize the boxes in a boxplot created with R and ggplot2 to account for different frequencies amongst different boxplots? [duplicate]

This question already has answers here:
Is there an equivalent in ggplot to the varwidth option in plot?
(2 answers)
Closed 8 years ago.
I have a boxplot that I made in R with ggplot2 analogous to the sample boxplot below.
The problem is, for the values on the y axis (in this sample, the number of cylinders in the car) I have very different frequencies -- I may have included 2 8 cylinder cars, but 200 4 cylinder cars. Because of this, I'd like to be able to resize the boxplots (in this case, change the height along the y axis) so that the 4 cylinder boxplot is a larger portion of the chart than the 8 cylinder boxplot. Does someone know how to do this?
As @aosmith mentioned, varwidth is the argument you want. It looks like it may have been accidentally removed from ggplot2 at some point (https://github.com/hadley/ggplot2/blob/master/R/geom-boxplot.r). If you look at the commit title, it is adding back the varwidth parameter. I'm not sure whether that ever made it into the CRAN package, so you might want to check your version. It works with my version, ggplot2 v1.0.0, but I'm not sure how recently the feature was added.
Here is an example:
library(ggplot2)
set.seed(1234)
df <- data.frame(cond = factor( c(rep("A",200), rep("B",150), rep("C",200), rep("D",10)) ),
rating = c(rnorm(200),rnorm(150, mean=0.2), rnorm(200, mean=.8), rnorm(10, mean=0.6)))
head(df, 5)
tail(df, 5)
p <- ggplot(df, aes(x=cond, y=rating, fill=cond)) +
guides(fill = "none") + coord_flip()
p + geom_boxplot()
Gives:
p + geom_boxplot(varwidth = TRUE)
Gives:
For a couple more options, you can also use a violin plot with scaled widths (the scale="count" argument):
p+ geom_violin(scale="count")
Or combine violin and boxplots to maximize your information.
p+ geom_violin(scale="count") + geom_boxplot(fill="white", width=0.2, alpha=0.3)
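For the cylinder example in the question, a sketch with the mpg data that ships with ggplot2 might look as follows (the asker's own data is not shown, so mpg is a stand-in). With varwidth = TRUE, box widths are proportional to the square roots of the group sizes, so the rarely occurring 5-cylinder group gets a visibly thinner box.

```r
library(ggplot2)

# Group sizes that drive the box widths
counts <- table(mpg$cyl)
print(counts)

# coord_flip() puts the cylinder groups on the y-axis, as in the question,
# so the varying "width" shows up as varying box height
p <- ggplot(mpg, aes(x = factor(cyl), y = hwy)) +
  geom_boxplot(varwidth = TRUE) +
  coord_flip() +
  labs(x = "cyl")
p
```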

Plotting multiple columns with ggplot2 [duplicate]

This question already has answers here:
Plot multiple columns on the same graph in R [duplicate]
(4 answers)
Closed 4 years ago.
I need to plot the following dataset in the same graph.
Bin1,Bin2,Bin3,Cat
4,3,5,S
6,4,5,M
3,5,4,M
1,4,5,M
,5, ,M
In each bin, first data point belongs to a different category than the rest. (So I added the Cat column)
I need to plot these as points (different colors for the different categories)
Following lines of code achieve what I need for a single bin
p <- ggplot(data,aes(Bin1,1))
p + geom_point(aes(color=Cat, size=Cat))
How do I do this for the entire dataset ?
Here is a related question.
What if I need to use a bunch of columns to color the points. Color Bin1 points according to Cat1 and so on..
Bin1,Cat1,Bin2,Cat2
4,S,5,S
6,L,5,M
3,M,4,L
1,M,5,L
3,M
How do I do this??
library(reshape2)
library(ggplot2)
ggplot(melt(df, id.vars = "Cat"), aes(value, variable, colour = Cat)) +
geom_point(size = 4)
Just melt the data.frame and plot it.
library(reshape2)
dataM <- melt(data, id.vars = "Cat")
p <- ggplot(dataM, aes(value, variable, colour = Cat, size = Cat)) + geom_point()
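For the follow-up question, where each bin has its own category column (Bin1/Cat1, Bin2/Cat2), a single melt() is not enough because values and categories must stay paired. One possible base R sketch stacks each Bin/Cat pair in turn; the names `wide`, `long`, and `bin_ids` are assumptions, and the last row of the example data is read as having no Bin2/Cat2 entry.

```r
library(ggplot2)

# The second example dataset from the question
wide <- data.frame(
  Bin1 = c(4, 6, 3, 1, 3),
  Cat1 = c("S", "L", "M", "M", "M"),
  Bin2 = c(5, 5, 4, 5, NA),
  Cat2 = c("S", "M", "L", "L", NA)
)

# Stack each (BinN, CatN) pair into long format, keeping value and category together
bin_ids <- 1:2
long <- do.call(rbind, lapply(bin_ids, function(i) {
  data.frame(
    bin   = paste0("Bin", i),
    value = wide[[paste0("Bin", i)]],
    Cat   = wide[[paste0("Cat", i)]]
  )
}))
long <- long[!is.na(long$value), ]  # drop the empty Bin2 entry

# Each point is coloured by the category belonging to its own bin
p <- ggplot(long, aes(value, bin, colour = Cat)) +
  geom_point(size = 4)
p
```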
