Determine order of several boxplots in one plot in R ggplot2

I tried to create a relatively simple boxplot in R's ggplot2: one value on the x axis and several variables on the y axis. I'm using code similar to this:
ggplot() +
  # Boxplot 1
  geom_boxplot(data = df[which(df$Xvalue == "Boxplot1"), ],
               mapping = aes(X, "Y")) +
  # Boxplot 2
  geom_boxplot(data = df[which(df$Xvalue == "Boxplot2"), ],
               mapping = aes(X, "Y")) +
  # Boxplot 3
  geom_boxplot(data = df[which(df$Xvalue == "Boxplot3"), ],
               mapping = aes(X, "Y"))
The boxplots in my real code are ordered alphabetically. However, I need them to be in a customized, categorical order.
I'm aware I could restructure my data frame so that I don't use a subset and a new geom_boxplot command for each boxplot, but I've structured the data that way for other reasons and that's not the solution I'm looking for right now.
Maybe there is an easy way using something like scale_Y_manual? Any help is appreciated!
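One possible approach that keeps the separate layers, sketched here and not taken from the original thread: a discrete position scale accepts a limits argument that fixes the order of its categories. The sketch assumes each layer puts its own constant label on the discrete axis (as aes(X, "Y") in the snippet suggests) and that ggplot2 3.3.0 or later is installed, so horizontal boxplots are detected automatically. The data frame and the labels are made up for illustration.

```r
library(ggplot2)

# Hypothetical data: one numeric column X and a column naming the subset.
set.seed(1)
df <- data.frame(
  X      = rnorm(90),
  Xvalue = rep(c("Boxplot1", "Boxplot2", "Boxplot3"), each = 30)
)

# Desired (non-alphabetical) order of the categories, bottom to top.
box_order <- c("Boxplot3", "Boxplot1", "Boxplot2")

p <- ggplot() +
  geom_boxplot(data = df[df$Xvalue == "Boxplot1", ], mapping = aes(X, "Boxplot1")) +
  geom_boxplot(data = df[df$Xvalue == "Boxplot2", ], mapping = aes(X, "Boxplot2")) +
  geom_boxplot(data = df[df$Xvalue == "Boxplot3", ], mapping = aes(X, "Boxplot3")) +
  # limits sets the order of the discrete axis explicitly
  scale_y_discrete(limits = box_order)
```

Printing p draws the plot. If the categories sit on the x axis instead, scale_x_discrete(limits = box_order) plays the same role.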

Related

How do I create a grouped boxplot in R?

I have a data frame whose variables are five probes: cg02823866, cg13474877, cg14305799, cg15837913 and cg19724470. I want to create a boxplot that puts cg02823866 and cg14305799 into a group called 'GeneBody', and cg13474877, cg14305799 and cg19724470 into a group called 'Promoter'. I then want to colour code the boxplots by probe name. I can't figure out how to assign those variables to groups for the plot.
I created an ungrouped boxplot of the five probes and it looked like this.
I want the labels 'Promoter' and 'GeneBody' on the x axis. Above the 'GeneBody' label there should be the two boxplots for the cg02823866 and cg14305799 probes, and above the 'Promoter' label the boxplots for cg13474877, cg14305799 and cg19724470. Each boxplot should be colour coded by probe.
My data frame that I imported into RStudio looks like this: https://i.stack.imgur.com/r4gEC.png
Assuming you have some data with variable names Beta (your y axis), Probe (your current x axis), and group (either "GeneBody" or "Promoter"), you can do something like the following:
library(ggplot2)
ggplot(data, aes(x = group, y = Beta, fill = Probe)) +
geom_boxplot()
If you provide a reproducible set of data, I can probably do better.
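As a minimal sketch of how such a data set could be built from one column per probe: the Beta values below are simulated, the names wide, long and probe_group are made up, and the probe-to-group lookup is an assumption for illustration (in particular, placing cg15837913 under 'Promoter'), so it should be adjusted to the real annotation.

```r
library(ggplot2)

# Hypothetical wide data: one column of simulated Beta values per probe.
set.seed(42)
probes <- c("cg02823866", "cg13474877", "cg14305799", "cg15837913", "cg19724470")
wide <- as.data.frame(
  matrix(runif(20 * 5), ncol = 5, dimnames = list(NULL, probes))
)

# Stack to long format: one row per (value, probe) pair.
long <- stack(wide)
names(long) <- c("Beta", "Probe")

# Assumed probe-to-group lookup (not confirmed by the question).
probe_group <- c(
  cg02823866 = "GeneBody", cg14305799 = "GeneBody",
  cg13474877 = "Promoter", cg15837913 = "Promoter", cg19724470 = "Promoter"
)
long$group <- unname(probe_group[as.character(long$Probe)])

# Groups on the x axis, one coloured box per probe within each group.
p <- ggplot(long, aes(x = group, y = Beta, fill = Probe)) +
  geom_boxplot()
```

The key step is the long format: one column for the values, one for the probe, and one for the group each probe belongs to.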
Adding to Ben's answer, here is the traditional iris data frame example, which you can load with data(iris):
ggplot(iris) +
aes(x = "", y = Sepal.Length, group = Species) +
geom_boxplot(shape = "circle", fill = "#112446") +
theme_minimal()
So you just need a column that indicates group membership.
It gets more difficult, of course, with messy data, where you might need to reshape the data first. But those are follow-up questions, I guess.
Also, if you want to make your life easier, use the esquisse RStudio add-in.

Missing scale on ggplot 2

I am creating a graph using ggplot2. Here is the first output of the graph before any tidying is done.
And here is the code:
graph <- ggplot(data = village.times,
                aes(x=village.times$a6ncopo, y=(village.times$a5species=="funestus"))) +
  geom_bar(stat="identity", position = "stack", fill="#FF4444")
What I don't know is why there isn't a scale on the y axis and how to remove the True-False labels. Is there a way I can force ggplot to include a scale on the y axis or do I have to change the way I use my data?
Maybe subset your data frame before calling ggplot and just create a histogram? Otherwise I don't know what your expected result should be.
ggplot(subset(village.times, a5species=="funestus"),
aes(x=a6ncopo)) +
geom_bar()

Vertical data groups from table in R

I have data in a table arranged in vertical groups of 7 columns, which I need to plot against each other to find relationships. How do I group the columns in sets of 7, and what plots would give nice graphics? I looked at the "car" package, which seems appropriate.
A_0,A_1,A_2,A_3,A_4,A_5,A_6,B_0,B_1,B_2,B_3,B_4,B_5,B_6,C_0,C_1,C_2,C_3,C_4,C_5,C_6,D_0,D_1,D_2,D_3,D_4,D_5,D_6,D_0,D_1,D_2,D_3,D_4,D_5,D_6,E_0,E_1,E_2,E_3,E_4,E_5,E_6
0,2,0,0,1,1,1,0,1,0,0,1,1,1,0,0.000003,0,0,0.000004,0.000004,0.000004,0,1,0,0,1,1,1,0,2,0,0,1,1,1,0,0.000003,0,0,0.000002,0.000002,0.000002
2,1,0,0,0,0,4,2,1,0,0,0,0,4,0.000007,0.000003,0,0,0,0,0.000012,1,1,0,0,0,0,1,2,1,0,0,0,0,4,0.000003,0.000002,0,0,0,0,0.000006
1,0,0,0,0,4,1,1,0,0,0,0,4,1,0.000003,0,0,0,0,0.000012,0.000003,1,0,0,0,0,1,1,1,0,0,0,0,4,1,0.000002,0,0,0,0,0.000006,0.000002
1,0,0,1,0,0,1,1,0,0,1,0,0,1,0.000003,0,0,0.000003,0,0,0.000001,1,0,0,1,0,0,1,1,0,0,1,0,0,1,0.000002,0,0,0.000001,0,0,0
0,1,0,0,1,2,3,0,1,0,0,1,2,3,0,0.000003,0,0,0.000001,0.000002,0.000003,0,1,0,0,1,1,1,0,1,0,0,1,2,3,0,0.000001,0,0,0,0.000001,0.000002
1,0,0,1,2,3,0,1,0,0,1,2,3,0,0.000003,0,0,0.000001,0.000002,0.000003,0,1,0,0,1,1,1,0,1,0,0,1,2,3,0,0.000001,0,0,0,0.000001,0.000002,0
1,0,0,0,2,0,2,1,0,0,0,2,0,2,0.000001,0,0,0,0.000002,0,0.000002,1,0,0,0,1,0,1,1,0,0,0,2,0,2,0,0,0,0,0.000001,0,0.000001
0,2,0,0,0,1,1,0,2,0,0,0,1,1,0,0.000002,0,0,0,0.000001,0.000001,0,1,0,0,0,1,1,0,2,0,0,0,1,1,0,0.000001,0,0,0,0.000001,0
2,0,0,0,1,1,2,2,0,0,0,1,1,2,0.000002,0,0,0,0.000001,0.000001,0.000002,1,0,0,0,1,1,1,2,0,0,0,1,1,2,0.000001,0,0,0,0.000001,0,0.000001
0,1,1,2,2,0,3,0,1,1,2,2,0,2,0,0.000001,0.000001,0.000002,0.000002,0,0.000003,0,1,1,1,1,0,1,0,1,1,2,2,0,3,0,0.000001,0,0.000001,0.000001,0,0.000001
1,1,2,2,0,3,1,1,1,2,2,0,2,1,0.000001,0.000001,0.000002,0.000002,0,0.000003,0.000001,1,1,1,1,0,1,1,1,1,2,2,0,3,1,0.000001,0,0.000001,0.000001,0,0.000001,0.000001
0,1,1,1,2,2,0,0,1,1,1,1,2,0,0,0.000001,0.000001,0.000001,0.000002,0.000002,0,0,1,1,1,1,1,0,0,1,1,1,2,2,0,0,0.000001,0.000001,0,0.000001,0.000001,0
0,1,1,0,0,0,12,0,1,1,0,0,0,6,0,0.000001,0.000001,0,0,0,0.000007,0,1,1,0,0,0,1,0,1,1,0,0,0,12,0,0,0,0,0,0,0.000006
1,1,0,0,0,12,34,1,1,0,0,0,6,15,0.000001,0.000001,0,0,0,0.000007,0.000017,1,1,0,0,0,1,1,1,1,0,0,0,12,34,0,0,0,0,0,0.000006,0.000015
1,0,0,0,12,34,1,1,0,0,0,6,15,0,0.000001,0,0,0,0.000007,0.000017,0.000001,1,0,0,0,1,1,1,1,0,0,0,12,34,1,0,0,0,0,0.000006,0.000015,0
1,0,13,4,11,9,10,0,0,1,2,1,5,7,0.000001,0,0.000002,0.000002,0.000002,0.000005,0.000008,1,0,1,1,1,1,1,1,0,13,4,11,9,10,0,0,0.000006,0.000002,0.000005,0.000004,0.000005
0,13,4,11,9,10,18,0,1,2,1,5,7,4,0,0.000002,0.000002,0.000002,0.000005,0.000008,0.000006,0,1,1,1,1,1,1,0,13,4,11,9,10,18,0,0.000006,0.000002,0.000005,0.000004,0.000005,0.000008
13,4,11,9,10,18,39,1,2,1,5,7,4,9,0.000002,0.000002,0.000002,0.000005,0.000008,0.000006,0.000011,1,1,1,1,1,1,1,13,4,11,9,10,18,39,0.000006,0.000002,0.000005,0.000004,0.000005,0.000008,0.000017
4,11,9,10,18,39,8,2,1,5,7,4,9,4,0.000002,0.000002,0.000005,0.000008,0.000006,0.000011,0.000005,1,1,1,1,1,1,1,4,11,9,10,18,39,8,0.000002,0.000005,0.000004,0.000005,0.000008,0.000017,0.000004
11,9,10,18,39,8,16,1,5,7,4,9,4,5,0.000002,0.000005,0.000008,0.000006,0.000011,0.000005,0.000006,1,1,1,1,1,1,1,11,9,10,18,39,8,16,0.000005,0.000004,0.000005,0.000008,0.000017,0.000004,0.000007
9,10,18,39,8,16,0,5,7,4,9,4,5,0,0.000005,0.000008,0.000006,0.000011,0.000005,0.000006,0,1,1,1,1,1,1,0,9,10,18,39,8,16,0,0.000004,0.000005,0.000008,0.000017,0.000004,0.000007,0
1,0,1,6,0,0,2,1,0,1,3,0,0,2,0.000001,0,0.000001,0.000003,0,0,0.000002,1,0,1,1,0,0,1,1,0,1,6,0,0,2,0,0,0,0.000003,0,0,0.000001
0,1,6,0,0,2,2,0,1,3,0,0,2,2,0,0.000001,0.000003,0,0,0.000002,0.000002,0,1,1,0,0,1,1,0,1,6,0,0,2,2,0,0,0.000003,0,0,0.000001,0.000001
0,2,2,0,0,8,4,0,2,2,0,0,8,4,0,0.000002,0.000002,0,0,0.000011,0.000006,0,1,1,0,0,1,1,0,2,2,0,0,17,19,0,0.000001,0.000001,0,0,0.000008,0.000009
2,2,0,0,8,4,1,2,2,0,0,8,4,1,0.000002,0.000002,0,0,0.000011,0.000006,0.000002,1,1,0,0,1,1,1,2,2,0,0,17,19,3,0.000001,0.000001,0,0,0.000008,0.000009,0.000001
2,0,0,8,4,1,1,2,0,0,8,4,1,2,0.000002,0,0,0.000011,0.000006,0.000002,0.000003,1,0,0,1,1,1,0.5,2,0,0,17,19,3,5,0.000001,0,0,0.000008,0.000009,0.000001,0.000002
Since you haven't put data or specified what sort of chart, I shall put untested code assuming a bubble plot:
library(ggplot2)
library(reshape2)
datamelted <- melt(data, id.vars = 'ABcol')
ggplot(datamelted, aes(x = ABcol, y = variable, size = value)) + geom_point()
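A complementary sketch for the grouping step itself, using only base R for the reshaping: stack all columns and split each column name into its group letter and its position within the group. The miniature table below (2 groups of 3 columns) stands in for the real 6 groups of 7, and the names wide, long, row, group and pos are made up. Note that the header shown in the question repeats the D_ block, which would need to be renamed before this works on the full table.

```r
library(ggplot2)

# Hypothetical miniature of the table: 2 groups x 3 columns.
wide <- data.frame(
  A_0 = c(0, 2, 1), A_1 = c(2, 1, 0), A_2 = c(0, 0, 0),
  B_0 = c(0, 2, 1), B_1 = c(1, 1, 0), B_2 = c(0, 0, 0)
)

# Stack all columns: 'values' holds the numbers, 'ind' the column names.
long <- stack(wide)
long$row   <- rep(seq_len(nrow(wide)), times = ncol(wide))
# Split names such as "A_1" into group "A" and position 1.
long$group <- sub("_.*$", "", as.character(long$ind))
long$pos   <- as.integer(sub("^.*_", "", as.character(long$ind)))

# One panel per group, values plotted against position within the group.
p <- ggplot(long, aes(x = pos, y = values, colour = group)) +
  geom_point() +
  facet_wrap(~ group)
```

With the data in this shape, other plot types (lines per row, boxplots per position) only need a different geom.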

Plot multiple histograms in one using ggplot2 in R

I am fairly new to R and ggplot2 and am having some trouble plotting multiple variables in the same histogram plot.
My data is already grouped and just needs to be plotted. The data is by week and I need to plot the number for each category (A, B, C and D).
Date A B C D
01-01-2011 11 0 11 1
08-01-2011 12 0 3 3
15-01-2011 9 0 2 6
I want the Dates as the x axis and the counts plotted as different colors according to a generic y axis.
I am able to plot just one of the categories at a time, but am not able to find an example like mine.
This is what I use to plot one category. I am pretty sure I need to use position="dodge" to plot multiple as I don't want it to be stacked.
ggplot(df, aes(x=Date, y=A)) + geom_histogram(stat="identity") +
labs(title = "Number in Category A") +
ylab("Number") +
xlab("Date") +
theme(axis.text.x = element_text(angle = 90))
Also, this gives me a histogram with spaces in between the bars. Is there any way to remove this? I tried spaces=0 as you would do when plotting bar graphs, but it didn't seem to work.
I read some previous questions similar to mine, but the data was in a different format and I couldn't adapt it to fit my data.
This is some of the help I looked at:
Creating a histogram with multiple data series using multhist in R
http://www.cookbook-r.com/Graphs/Plotting_distributions_%28ggplot2%29/
I'm also not quite sure what the bin width is. I think it is how the data should be spaced or grouped, which doesn't apply to my question since it is already grouped. Please advise me if I am wrong about this.
Any help would be appreciated.
Thanks in advance!
You're not really plotting histograms, you're just plotting a bar chart that looks kind of like a histogram. I personally think this is a good case for faceting:
library(ggplot2)
library(reshape2) # for melt()
melt_df <- melt(df, id.vars = "Date") # one row per Date/category pair
head(melt_df) # so you can see it
ggplot(melt_df, aes(Date,value,fill=Date)) +
  geom_bar(stat = "identity") + # values are already counts
  facet_wrap(~ variable)
However, I think in general, that changes over time are much better represented by a line chart:
ggplot(melt_df,aes(Date,value,group=variable,color=variable)) + geom_line()
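For the side-by-side bars the question asks about, a minimal sketch using the three rows shown above: geom_col() is the shorthand for geom_bar(stat = "identity"), and position = "dodge" places the categories next to each other instead of stacking them. The names long, category and count are made up, and the dates are kept as plain strings.

```r
library(ggplot2)

# The three rows shown in the question.
df <- data.frame(
  Date = c("01-01-2011", "08-01-2011", "15-01-2011"),
  A = c(11, 12, 9), B = c(0, 0, 0), C = c(11, 3, 2), D = c(1, 3, 6)
)

# Long format with base R: one row per (Date, category) pair.
long <- data.frame(
  Date     = rep(df$Date, times = 4),
  category = rep(c("A", "B", "C", "D"), each = nrow(df)),
  count    = c(df$A, df$B, df$C, df$D)
)

# Dodged bars: one bar per category within each date.
p <- ggplot(long, aes(x = Date, y = count, fill = category)) +
  geom_col(position = "dodge") +
  labs(title = "Number per category", x = "Date", y = "Number") +
  theme(axis.text.x = element_text(angle = 90))
```

Passing width = 1 to geom_col() should close the gaps between neighbouring dates, if that is the spacing the question refers to.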

ggplot without the use of subset

I'm using ggplot2 with the faceting option to plot several results of a data.frame.
It's a data.frame with three factors :
participant (N) with 6 levels;
condition (C) with 6 levels;
stimuli (S) with 10 levels.
I plot the results of one participant in one condition using the subset function and then facet with ggplot. However, I was wondering whether there is an easier solution in ggplot2?
Thanks for any help, I'm currently learning R and ggplot2.
It sounds like you're trying to ask how to set up a two-way facet. I'm going to guess that 'stimuli' is your predictor variable.
One way is like this:
ggplot(mydata, aes(x = stimuli, y = my.response)) +
  facet_wrap(condition ~ participant) +
  geom_line()
or use geom_point() in place of geom_line().
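A self-contained sketch of the same idea with facet_grid(), which lays the panels out as a rows-by-columns matrix (conditions as rows, participants as columns). The data are simulated, the column name my.response follows the answer above, and stimuli is treated as numeric here; if it is a factor, geom_line() would additionally need group = 1 in aes().

```r
library(ggplot2)

# Hypothetical data: 6 participants x 6 conditions x 10 stimuli.
set.seed(7)
mydata <- expand.grid(
  participant = factor(1:6),
  condition   = factor(1:6),
  stimuli     = 1:10
)
mydata$my.response <- rnorm(nrow(mydata))

# One panel per condition/participant combination, no subsetting needed.
p <- ggplot(mydata, aes(x = stimuli, y = my.response)) +
  geom_line() +
  facet_grid(condition ~ participant)
```

This produces all 36 panels in one call, so the manual subset step per participant and condition is not required.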
