Faceted bar charts from multiple columns in ggplot2 - r

I have some data containing resistance data against three different antibiotics, each stored as a column.
Isolate MIC1 MIC2 MIC3
1 0.008 0.064 0.064
2 0.016 0.250 0.500
3 0.064 0.125 32
...
I've plotted barcharts for each individual antibiotic thus:
ggplot(data, aes(factor(MIC1))) + geom_bar()
(the values are discrete powers of 2: 0.008, 0.016, 0.032, etc. Plotting as a factor spaces the bars evenly; if there is a more elegant way of doing this, please let me know!)
What I'd really like to do is to have a faceted stack of the three graphs with a shared x-axis.
Is there an easier way to do this than to recode the variables like this:
isolate antibiotic MIC
1 1 0.008
1 2 0.064
1 3 0.064
2 1 0.016
2 2 0.250
2 3 0.500
3 1 0.064
3 2 0.125
3 3 32
...
and then doing it this way?
ggplot(data, aes(factor(MIC))) + geom_bar() + facet_grid(antibiotic ~ .)
Thanks in advance.

Reshaping your data frame to look like your second example is pretty easy using reshape2, to the point where I'm not sure it could really get much easier. Are there any specific problems you would have with this solution?
df = read.table(text="Isolate MIC1 MIC2 MIC3
1 0.008 0.064 0.064
2 0.016 0.250 0.500
3 0.064 0.125 32", header=TRUE)
library(reshape2)
library(ggplot2)
df_melted = melt(df, id.vars="Isolate", variable.name="antibiotic", value.name="MIC")
ggplot(df_melted, aes(factor(MIC))) + geom_bar() + facet_grid(antibiotic ~ .)
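For comparison, the same wide-to-long reshape can be sketched with tidyr instead of reshape2. This is an alternative not mentioned in the original answer; it assumes tidyr (1.0.0 or later, for pivot_longer) and ggplot2 are installed, and the name df_long is just illustrative.

```r
library(tidyr)
library(ggplot2)

# Same sample data as in the answer above
df <- read.table(text = "Isolate MIC1 MIC2 MIC3
1 0.008 0.064 0.064
2 0.016 0.250 0.500
3 0.064 0.125 32", header = TRUE)

# Gather every column except Isolate into antibiotic/MIC pairs
df_long <- pivot_longer(df, cols = -Isolate,
                        names_to = "antibiotic", values_to = "MIC")

# One facet row per antibiotic; factor(MIC) keeps the bars evenly spaced
p <- ggplot(df_long, aes(factor(MIC))) +
  geom_bar() +
  facet_grid(antibiotic ~ .)
print(p)
```

Either package produces the same long layout, so the plotting call is unchanged.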

Related

Error Adding p-value to parallel coordinates plot (ggplot)

I have a data set that is paired data from multiple samples that I want to do a parallel coordinates graph with and include a p-value above (i.e. plot each data point in each group and link the pairs with a line and have the comparison statistic above the plotted data).
I can get the graph to (largely) look the way I want it to, but when I try and add a p-value using stat_compare_means(paired=TRUE), I get 3 errors:
2 x:
"Don't know how to automatically pick scale for object of type quosure/formula. Defaulting to continuous."
1 x:
"Error in validDetails.text(x) : 'pairlist' object cannot be coerced to type 'double'".
My data is a data.frame with three variables: a sample variable so I know which pair is which, a group variable so I know which category the value is, and the value variable. I've pasted the code below and am more than happy to take other suggestions on any other ways to make the code look better as well.
ggplot(test_OCI, aes(x=test_OCI$variable, y=test_OCI$value, group =test_OCI$Pt)) +
geom_point(aes(x=test_OCI$variable),size=3)+
geom_line(aes(x=test_OCI$variable),group=test_OCI$Pt)+
theme_bw()+
theme(panel.border=element_blank(),
panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
axis.line=element_line(color="black"))+
scale_x_discrete(labels=c("OCI_pre_ART"="Pre-ART OCI", "OCI_on_ART"="On-ART OCI"))+
stat_compare_means(paired=TRUE)
edit 1: adding sample data
There isn't too much data, but I've added it below per request.
Pt variable value
1 Pt1 OCI_pre_ART 0.024
2 Pt2 OCI_pre_ART 0.027
3 Pt3 OCI_pre_ART 0.027
4 Pt4 OCI_pre_ART 0.010
5 Pt5 OCI_pre_ART 0.075
6 Pt6 OCI_pre_ART 0.040
7 Pt7 OCI_pre_ART 0.070
8 Pt8 OCI_pre_ART 0.011
9 Pt9 OCI_pre_ART 0.022
10 Pt10 OCI_pre_ART 0.006
11 Pt11 OCI_pre_ART 0.019
12 Pt1 OCI_on_ART 0.223
13 Pt2 OCI_on_ART 0.166
14 Pt3 OCI_on_ART 0.163
15 Pt4 OCI_on_ART 0.126
16 Pt5 OCI_on_ART 0.090
17 Pt6 OCI_on_ART 0.139
18 Pt7 OCI_on_ART 0.403
19 Pt8 OCI_on_ART 0.342
20 Pt9 OCI_on_ART 0.092
edit 2: packages
all lines in the figure code are from ggplot2 except stat_compare_means(paired=TRUE) which is from ggpubr.
I'm not sure if this is the reason, but it appears that the stat_compare_means() line was not interpreting the x ~ y aesthetic. Changing the line to
stat_compare_means(comparisons = list(c("OCI_pre_ART","OCI_on_ART")), paired=TRUE)
resulted in a functional graph.

Graphing using r and points (pch)

What happens to my R-code? I don't want to see the bar graph and I want to see only the plotting characters (pch):
plot(Graphdata$Sites,ylim=c(-1,2.5), xlab="sites",
ylab="density frequency",lwd=2)
points(Graphdata$A,pch=1,col="blue",lwd=2)
points(Graphdata$B,pch=2,col="green",lwd=2)
points(Graphdata$C,pch=3,col="red",lwd=2)
points(Graphdata$D,pch=4,col="orange",lwd=2)
legend("topright",
legend=c("A","B","C","D"),
col=c("red","blue","green","orange"),lwd=2)
Part of my data looks like:
Sites A B C D
1 A 2.052 2.268 1.828 1.474
2 B 0.549 0.664 0.621 1.921
3 C 0.391 0.482 0.400 0.382
4 D 0.510 0.636 0.497 0.476
5 A 0.214 0.239 0.215 0.211
6 B 1.016 1.362 0.978 0.876
......................................
.....................................
I also want the legend to show the plotting characters (pch) rather than lines.
The barplot is happening because you're plotting the factor of your Sites column. To only plot the points:
library(reshape2)
library(ggplot2)
#plots 6 categorical variables
Graphadata.m = melt(t(Graphdata[-1]))
ggplot(Graphadata.m,aes(Var2,value,group=Var2)) + geom_point() + ylim(-1,2.5)
If you want it to plot 4 categorical variables instead:
Graphdata.1 = t(Graphdata[-1])
colnames(Graphdata.1) = c("A","B","C","D","A","B")
Graphdata.m = melt(Graphdata.1)
ggplot(Graphdata.m,aes(Var2,value,group=Var2)) + geom_point() + ylim(-1,2.5)
EDIT:
In base R:
plot(1,xlim=c(1,4),ylim=c(-1,2.5), xlab="sites",ylab="density frequency",lwd=2,type="n",xaxt = 'n')
axis(1,at=c(1,2,3,4),tick=T,labels=c("A","B","C","D"))
points(Graphdata$A,pch=1,col="blue",lwd=2)
points(Graphdata$B,pch=1,col="green",lwd=2)
points(Graphdata$C,pch=1,col="red",lwd=2)
points(Graphdata$D,pch=1,col="orange",lwd=2)
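The question also asked for a legend keyed by plotting character rather than by line. A minimal base R sketch of that, assuming the six sample rows shown in the question and one symbol per series (the vectors series, cols and pchs are illustrative names, and the x-axis here is simply the row index labelled with each row's site):

```r
# Sample rows copied from the question
Graphdata <- data.frame(
  Sites = factor(c("A", "B", "C", "D", "A", "B")),
  A = c(2.052, 0.549, 0.391, 0.510, 0.214, 1.016),
  B = c(2.268, 0.664, 0.482, 0.636, 0.239, 1.362),
  C = c(1.828, 0.621, 0.400, 0.497, 0.215, 0.978),
  D = c(1.474, 1.921, 0.382, 0.476, 0.211, 0.876)
)

series <- c("A", "B", "C", "D")
cols   <- c("blue", "green", "red", "orange")
pchs   <- 1:4

# Empty frame; x is the row index, so no bar chart is drawn
plot(NA, xlim = c(1, nrow(Graphdata)), ylim = c(-1, 2.5),
     xlab = "sites", ylab = "density frequency", xaxt = "n")
axis(1, at = seq_len(nrow(Graphdata)), labels = as.character(Graphdata$Sites))

# One symbol and colour per column
for (i in seq_along(series)) {
  points(Graphdata[[series[i]]], pch = pchs[i], col = cols[i], lwd = 2)
}

# Passing pch and omitting lty/lwd makes legend() draw symbols, not lines
legend("topright", legend = series, pch = pchs, col = cols, pt.lwd = 2)
```

Keeping the colours and symbols in vectors that are reused by both points() and legend() avoids the two getting out of step.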

Marginal densities (or bar plots) on facets in ggplot2

my problem is the following: I have this table below
0 1-5 6-10 11-15 16-20 21-26 27-29
a 0.019 0.300 0.296 0.211 0.117 0.042 0.014
b 0.058 0.448 0.308 0.120 0.042 0.019 0.005
c 0.026 0.277 0.316 0.187 0.105 0.068 0.020
d 0.054 0.297 0.378 0.108 0.108 0.041 0.014
e 0.004 0.252 0.358 0.216 0.102 0.053 0.015
f 0.032 0.097 0.312 0.280 0.161 0.065 0.054
g 0.113 0.500 0.233 0.094 0.043 0.014 0.003
h 0.328 0.460 0.129 0.050 0.020 0.010 0.003
representing some marginal frequencies (by row) for each subgroups of my data (a to h).
My dataset is actually in the long format (very long, counting more than 100 thousand entries), with the first 6 rows as you see below:
RX_SUMM_SURG_PRIM_SITE Nodes.Examined.Class
1 Wedge Resection 1-5
2 Segmental Resection 1-5
3 Lobectomy w/mediastinal LNdissection 6-10
4 Lobectomy w/mediastinal LNdissection 6-10
5 Lobectomy w/mediastinal LNdissection 1-5
6 Lobectomy w/mediastinal LNdissection 11-15
When I plot a barplot by group (the table above is simply the cross-tabulation of these two covariates with the row marginal probabilities taken), here's what happens:
The code I have for this is
ggplot(data.ln.red, aes(x=Nodes.Examined.Class))+geom_bar(aes(x=Nodes.Examined.Class, group=RX_SUMM_SURG_PRIM_SITE))+
facet_grid(RX_SUMM_SURG_PRIM_SITE~.)
Actually, I would be happy with just the marginal frequencies (i.e. the ones in the table) on the y-axis of each facet of the plot, instead of the counts.
Can anybody help me with this?
Thanks for all your help!
EM
geom_bar calculates both counts and proportions of observations. You can access these calculated proportions with either ..prop.. (the old way) or after_stat(prop) (ggplot2 3.3.0 and later). Use this as your y aesthetic.
You can also get rid of the aes you have in geom_bar, as it just repeats what ggplot and facet_grid already cover.
It looks like your counts/proportions are going to vary widely between groups, so I'm adding free y-scaling to the faceting.
Here's an example of a similar plot with the iris data, which you can model your code on:
library(tidyverse)
ggplot(iris, aes(x = Sepal.Length, y = after_stat(prop))) +
geom_bar() +
facet_grid(Species ~ ., scales = "free_y")
Created on 2018-04-06 by the reprex package (v0.2.0).
Edit: the calculated prop variable is proportions within each group, not proportions across all groups, so it works differently when x is a factor. For categorical x, prop treats x as the group; to override this, include group = 0 or some other dummy value in your aes. Sorry I missed that the first time!
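The categorical case described in the edit can be sketched as follows. The data frame toy is a made-up stand-in for data.ln.red (its column names and values are assumptions for illustration only), and after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's long-format data
set.seed(1)
toy <- data.frame(
  site  = sample(c("Wedge Resection", "Segmental Resection"), 200, replace = TRUE),
  nodes = factor(sample(c("0", "1-5", "6-10"), 200, replace = TRUE),
                 levels = c("0", "1-5", "6-10"))
)

# group = 1 puts all bars of a facet in one group, so prop is the
# share of each class within that facet rather than 1 for every bar
p <- ggplot(toy, aes(x = nodes, y = after_stat(prop), group = 1)) +
  geom_bar() +
  facet_grid(site ~ ., scales = "free_y")
print(p)

# Cross-check against the row proportions of the cross-tabulation
round(prop.table(table(toy$site, toy$nodes), margin = 1), 3)
```

Because the statistic is computed separately for each panel, the bars in every facet should sum to 1, which matches the row-wise marginal frequencies in the question's table.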

plot iteratively segments over intervals using ggplot in R

I have this data frame "df" (showing 15 of the 1000 tuples)
inf sup frec prob
1 1.000318 1.005308 12 0.060
2 1.005308 1.010297 5 0.025
3 1.010297 1.015286 5 0.025
4 1.015286 1.020276 2 0.010
5 1.020276 1.025265 3 0.015
6 1.025265 1.030254 3 0.015
7 1.030254 1.035244 8 0.040
8 1.035244 1.040233 2 0.010
9 1.040233 1.045223 3 0.015
10 1.045223 1.050212 0 0.000
11 1.050212 1.055201 4 0.020
12 1.055201 1.060191 1 0.005
13 1.060191 1.065180 1 0.005
14 1.065180 1.070169 0 0.000
15 1.070169 1.075159 1 0.005
I want to plot, for each row i, a horizontal segment spanning the x interval [inf[i], sup[i]] at height y = prob[i].
I tried this solution, using a "for loop" to plot each segment:
plot <- ggplot(data = df)
for(i in 1:15){
plot <- plot + geom_segment(aes(x = df$inf[i], xend = df$sup[i], y = df$prob[i], yend = df$prob[i]))
}
plot
But all I get is a single line at y = 0; I assume this is because my "prob" values are close to zero. The other problem is that if the for loop runs over many more rows, an error pops up saying:
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Is there any way to plot those segments by its x intervals?
Or maybe abandon the idea of intervals and plot some points per interval would be better?
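No answer to this question is recorded here. As a minimal sketch of one common approach: geom_segment is vectorised over rows, so a single layer that maps the columns directly draws every interval without a loop. The loop version likely misbehaves because aes() stores df$inf[i] unevaluated, so every layer ends up drawn with the final value of i. The data below are just the first five rows from the question.

```r
library(ggplot2)

# First five rows of the data shown in the question
df <- data.frame(
  inf  = c(1.000318, 1.005308, 1.010297, 1.015286, 1.020276),
  sup  = c(1.005308, 1.010297, 1.015286, 1.020276, 1.025265),
  frec = c(12, 5, 5, 2, 3),
  prob = c(0.060, 0.025, 0.025, 0.010, 0.015)
)

# One layer, one segment per row: from (inf, prob) to (sup, prob)
p <- ggplot(df) +
  geom_segment(aes(x = inf, xend = sup, y = prob, yend = prob)) +
  labs(x = "interval", y = "prob")
print(p)
```

Since only one layer is added regardless of the number of rows, this also sidesteps the deep-nesting error that appears when hundreds of layers are stacked in a loop.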

R multiple line plot using a dataframe [duplicate]

I can't seem to figure out how to create a graph for multiple line plots.
This is my dataframe:
topics before_event after_event current
1 1 0.057 0.044 0.064
2 2 0.059 0.055 0.052
3 3 0.058 0.037 0.044
4 4 0.036 0.055 0.044
5 5 0.075 0.064 0.066
6 6 0.047 0.045 0.045
7 7 0.043 0.043 0.041
8 8 0.042 0.041 0.046
9 9 0.049 0.046 0.039
10 10 0.043 0.060 0.045
11 11 0.054 0.054 0.062
12 12 0.065 0.056 0.068
13 13 0.042 0.045 0.048
14 14 0.067 0.054 0.055
15 15 0.049 0.052 0.053
The variables in the dataframe are all numeric vectors, example:
topics <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
I know I should be using 'ggplot2' and 'reshape' but I can't seem to work out the correct code to represent topics on the x-axis, the scale 0-1 on the y-axis, and each var (before_event, after_event, current) as three individual lines.
Any help would be really appreciated!
We can use matplot from base R:
matplot(df1[,1], df1[-1], type = 'l', lty = 1, xlab = "topics", ylab = "event", col = 2:4)
legend("topright", legend = names(df1)[-1], lty = 1, col = 2:4)
Alternatively, we can use ggplot2 with geom_line:
library(ggplot2)
library(reshape2) # provides melt()
topics <- seq(1,15,1)
before_event <- runif(15, min=0.042, max=0.070)
after_event <- runif(15, min=0.040, max=0.065)
current <- runif(15, min=0.041, max=0.066)
df <- data.frame(topics,before_event,after_event,current) #create data frame from the above vectors
df.m <- melt(df, id.vars="topics") # melt the dataframe using topics as id
# plot the lines using ggplot and geom_line
ggplot(data = df.m,
aes(x = topics, y = value, group = variable, color = variable)) +
geom_line(size = 2)
