Maybe someone has run into the same issue. I am trying to plot different proteins (e.g., CS, AMPK) at two time points (pre and post) and in two different intervention groups (control, treatment). I can plot a grouped barplot nicely based on group and time, but I cannot create a separate grouped barplot per protein.
My data has as columns the following variables:
PID, time, group, protein_name, protein_content (in the plot named area)
I used the following code to plot a grouped barplot
ggplot(data, aes(group, area, fill=time)) + theme_classic() +
geom_bar(position = position_dodge(), stat = "identity")
With the following output:
How do I create a plot like the one above for each protein rather than using all the protein values as a unique value? I have 5 different proteins, so I would like to plot 5 different plots like the one above.
Thank you in advance! All help is very much appreciated :)
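One possible approach, sketched on made-up toy data and assuming the column names listed in the question (with area holding the protein content): add facet_wrap(~ protein_name) so each protein gets its own panel. The sketch also draws the mean per bar with stat_summary(); that is an assumption about what each bar should represent when several participants share a group and time point.

```r
library(ggplot2)

# Toy data standing in for the real data frame (all values are made up)
set.seed(1)
toy <- expand.grid(
  PID = 1:6,
  time = c("pre", "post"),
  protein_name = c("CS", "AMPK"),
  stringsAsFactors = FALSE
)
toy$group <- ifelse(toy$PID <= 3, "control", "treatment")
toy$time <- factor(toy$time, levels = c("pre", "post"))
toy$area <- runif(nrow(toy), min = 1, max = 10)

# One panel per protein; each bar is the mean area for that group and time
p <- ggplot(toy, aes(group, area, fill = time)) +
  stat_summary(fun = mean, geom = "bar", position = position_dodge()) +
  facet_wrap(~ protein_name, scales = "free_y") +
  theme_classic()
p
```

With five proteins this gives five panels in one figure; scales = "free_y" lets each protein use its own y range, which may or may not be what you want.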
I have a data frame containing 5 probes which are my variables in a dataframe, cg02823866, cg13474877, cg14305799, cg15837913 and cg19724470. I want to create a boxplot that will group cg02823866 and cg14305799 into a group called 'GeneBody' and then cg13474877, cg14305799 and cg19724470 into a group called 'Promoter'. I then want to colour code the boxplots to represent the probe names. I can't figure out how to group those variables into groups to plot the graph.
I created an ungrouped boxplot of the five probes and it looked like this.
I want the titles 'Promoter' and 'GeneBody' on the x axis. Above the 'GeneBody' title there should be the 2 boxplots for the cg02823866 and cg14305799 probes, and above the 'Promoter' label the boxplots for cg13474877, cg14305799 and cg19724470. I then want each boxplot colour coded to represent its probe.
My data frame that I imported into RStudio looks like this: https://i.stack.imgur.com/r4gEC.png
Assuming you have some data with variable names Beta (your y axis), Probe (your current x axis), and group (either "GeneBody" or "Promoter"), you can do something like the following:
library(ggplot2)
ggplot(data, aes(x = group, y = Beta, fill = Probe)) +
geom_boxplot()
If you provide a reproducible set of data, I can probably do better.
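If the data are still in wide form (one column per probe), the group column has to be built first. A minimal sketch, assuming made-up beta values and an illustrative probe-to-region lookup (the region vector below is a placeholder, so substitute the real assignment):

```r
library(ggplot2)

# Toy wide data: one column per probe (beta values are made up)
set.seed(42)
probes <- c("cg02823866", "cg13474877", "cg14305799", "cg15837913", "cg19724470")
wide <- as.data.frame(matrix(runif(20 * 5), ncol = 5, dimnames = list(NULL, probes)))

# Reshape to long format: one row per measurement
long <- stack(wide)
names(long) <- c("Beta", "Probe")

# Illustrative probe-to-region lookup; replace with the real assignment
region <- c(
  cg02823866 = "GeneBody", cg14305799 = "GeneBody",
  cg13474877 = "Promoter", cg15837913 = "Promoter", cg19724470 = "Promoter"
)
long$group <- unname(region[as.character(long$Probe)])

# Region on the x axis, one coloured box per probe within each region
p <- ggplot(long, aes(x = group, y = Beta, fill = Probe)) +
  geom_boxplot()
p
```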
Adding to Ben's answer, here is the traditional iris data frame example, which you can easily load with data(iris):
ggplot(iris) +
aes(x = "", y = Sepal.Length, group = Species) +
geom_boxplot(shape = "circle", fill = "#112446") +
theme_minimal()
So you just need a column which indicates the group dependency.
Of course, it gets more difficult with uncleaned data, where you might need to reshape or transpose the data first. But those are follow-up questions, I guess.
Also, if you want to make your life easier, use the esquisse RStudio add-in.
How, in R, should I draw a histogram with a categorical variable on the x-axis and
the frequency of a continuous variable on the y-axis?
Is this correct?
There are a couple of ways one could interpret "one graph" in the title of the question. That said, using the ggplot2 package, there are at least a couple of ways to render histograms by group on a single page of results.
First, we'll create a data frame that contains a normally distributed random variable with a mean of 100 and a standard deviation of 20. We also include a group variable that has one of four values: A, B, C, or D.
set.seed(950141237) # for reproducibility of results
df <- data.frame(group = rep(c("A","B","C","D"),200),
y_value = rnorm(800,mean=100,sd = 20))
The resulting data frame has 800 rows of randomly generated values from a normal distribution, assigned into 4 groups of 200 observations.
Next, we will render this in ggplot2::ggplot() as a histogram, where the color of the bars is based on the value of group.
ggplot(data = df,aes(x = y_value, fill = group)) + geom_histogram()
...and the resulting chart looks like this:
In this style of histogram the values from each group are stacked atop each other (i.e. the frequency of group A is added to B, etc. before rendering the chart), which might not be what the original poster intended.
We can verify the "stacking" behavior by removing the fill = group argument from aes().
# verify the stacking behavior
ggplot(data = df,aes(x = y_value)) + geom_histogram()
...and the output, which looks just like the first chart, but drawn in a single color.
Another way to render the data is to use group with facet_wrap(), where each distribution appears in a different facet on one chart.
ggplot(data = df,aes(x = y_value)) + geom_histogram() + facet_wrap(~group)
The resulting chart looks like this:
The facet approach makes it easier to see differences in frequency of y values between the groups.
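A third option, shown here as a sketch rather than something the original poster asked for, is to overlay the groups in one panel. Setting position = "identity" draws each group's bars from zero instead of stacking them, and a low alpha keeps the overlapping bars visible. The alpha and bins values are arbitrary choices.

```r
library(ggplot2)

# Same simulated data as above
set.seed(950141237)
df <- data.frame(group = rep(c("A", "B", "C", "D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))

# position = "identity" overlays the groups instead of stacking them
p <- ggplot(data = df, aes(x = y_value, fill = group)) +
  geom_histogram(position = "identity", alpha = 0.4, bins = 30)
p
```

With four heavily overlapping distributions this can get hard to read, which is one reason the facet version may still be preferable.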
I have a data set of several features of several organisms. I'm displaying each feature by several different categories, individually and in combination (e.g. species, location, population), both as raw counts and as a percentage of the total sample size and a percentage within a given group.
My problem comes when I'm trying to display a stacked bar chart using ggplot for the percent of individuals within a group. Since the groups do not have the same number of individuals in them, I'd like to display the raw number or count of individuals with that feature on their respective bars for context. I've managed to properly display the stacked percentage bar chart and get the number of individuals from the most populous groups to display. I'm having trouble displaying the rest of the groups.
ggplot(data=All.k6,aes(x=Second.Dorsal))+
geom_bar(aes(fill=Species),position="fill")+
scale_y_continuous(labels=scales::percent)+
labs(x="Number of Second Dorsal Spines",y="Percentage of Individuals within Species",title="Second Dorsal Spines")+
geom_text(aes(label=..count..),stat='count',position=position_fill(vjust=0.5))
You need to include a group= aesthetic so that position_fill knows how to position things. In geom_bar, you set the fill= aesthetic, so ggplot assumed you also want to group by that aesthetic. In geom_text it assumes the group is your x= aesthetic. In your case, just add group=Species after your label= aesthetic. Here's an example:
# sample dataset
set.seed(1234)
types <- data.frame(
x=c('A','A','A','B','B','B','C','C','C'),
x1=rep(c('aa','bb','cc'),3)
)
df <- rbind(types[sample(1:9,50,replace=TRUE),])
Plot without grouping:
ggplot(df, aes(x=x)) +
geom_bar(aes(fill=x1),position='fill') +
scale_y_continuous(label=scales::percent) +
geom_text(aes(label=..count..),stat='count',
position=position_fill(vjust=0.5))
Plot with group= aesthetic:
ggplot(df, aes(x=x)) +
geom_bar(aes(fill=x1),position='fill') +
scale_y_continuous(label=scales::percent) +
geom_text(aes(label=..count..,group=x1),stat='count',
position=position_fill(vjust=0.5))
My data frame (df) consists of 5 columns with 2,000 numerical values for each one.
Using reshape I reformatted my data frame to two columns: the 1st contains the values (df$Values) (a total of 10,000) and the 2nd contains the name of the original column (df$Labels) that each value in the 1st column came from.
I will use the 2nd column as a group factor.
I generated mycolor and myshapes vectors for setting the colour and line type of the lines.
With ggplot I tried to generate a single plot containing the density curves for the five factors.
The problem is that the y-axis shows counts with a maximum of about 10,000. This value does not make any sense to me, because the maximum possible count for each curve should be 2,000. Does anyone know what is going on? What code do I need to correct the axis?
ggplot2, geom_density() plot:
Here is the code:
ggplot(df, aes(x=Values, colour=Labels, linetype=Labels))+
geom_density(aes(y=..count..))+
theme_classic()+
scale_colour_manual(values = mycolor)+
scale_linetype_manual(values = myshapes)+
ggtitle("Title")+
scale_x_continuous(limits = c(0.5,1.5))
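A likely explanation, offered as a sketch rather than a confirmed diagnosis: the count that geom_density() computes is the estimated density multiplied by the number of observations in the group. It is not a binned tally, so it is not capped at the group size. When the values sit in a narrow range, the density itself is well above 1, and multiplying by 2,000 gives values in the thousands. The toy values below are made up; base R's density() is used to show the arithmetic.

```r
set.seed(1)
# Toy stand-in for one group: 2,000 values concentrated around 1
values <- rnorm(2000, mean = 1, sd = 0.08)

d <- density(values)
n <- length(values)

# Peak of the density curve: above 1 because the x range is narrow
peak_density <- max(d$y)

# Peak of density * n, which is what y = ..count.. puts on the axis
peak_count <- peak_density * n

peak_density
peak_count
```

If the goal is a per-group tally that cannot exceed 2,000, a binned geom such as geom_histogram() or geom_freqpoly() may be closer to what is wanted; dropping aes(y = ..count..) gives the plain density curves.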
I'm struggling with making a graph of proportion of a variable across a factor in ggplot.
Taking mtcars data as an example and stealing part of a solution from this question I can come up with
ggplot(mtcars, aes(x = as.factor(cyl))) +
geom_bar(aes(y = (..count..)/sum(..count..))) +
scale_y_continuous(labels = scales::percent_format())
This graph gives me proportion of each cyl category in the whole dataset.
What I'd like to get though is the proportion of cars in each cyl category, that have automatic transmission (binary variable am).
On top of each bar I would like to add an error bar for the proportion.
Is it possible to do it with ggplot only? Or do I have to first prepare a data frame with summaries and use it with identity option of bar graphs?
I found some examples on Cookbook for R web page, but they deal with continuous y variable.
I think that it would be easier to make new data frame and then use it for plotting. Here I calculated proportions and lower/upper confidence interval values (took them from prop.test() result).
library(plyr)
mt.new<-ddply(mtcars,.(cyl),summarise,
prop=sum(am)/length(am),
low=prop.test(sum(am),length(am))$conf.int[1],
upper=prop.test(sum(am),length(am))$conf.int[2])
ggplot(mt.new,aes(as.factor(cyl),y=prop,ymin=low,ymax=upper))+
geom_bar(stat="identity")+
geom_errorbar()
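For completeness, a ggplot-only variant is also possible with stat_summary(), which accepts a function returning y, ymin and ymax. This is a sketch, not a replacement for the summary data frame above; prop_ci is a hypothetical helper name introduced here. Note that in mtcars am = 1 codes manual transmission, so, like the ddply() version, this plots the share of manual cars per cylinder group.

```r
library(ggplot2)

# Hypothetical helper: proportion and prop.test() confidence limits for a
# 0/1 vector, in the (y, ymin, ymax) columns that stat_summary() expects
prop_ci <- function(v) {
  # prop.test() warns on small samples; the warning is silenced here
  ci <- suppressWarnings(prop.test(sum(v), length(v)))$conf.int
  data.frame(y = mean(v), ymin = ci[1], ymax = ci[2])
}

p <- ggplot(mtcars, aes(x = as.factor(cyl), y = am)) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = prop_ci, geom = "errorbar", width = 0.3)
p
```

The groups are small (7 to 14 cars each), so the intervals are wide and the approximation behind prop.test() is rough.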