Marginal densities (or bar plots) on facets in ggplot2 - r

My problem is the following. I have the table below
  0     1-5   6-10  11-15 16-20 21-26 27-29
a 0.019 0.300 0.296 0.211 0.117 0.042 0.014
b 0.058 0.448 0.308 0.120 0.042 0.019 0.005
c 0.026 0.277 0.316 0.187 0.105 0.068 0.020
d 0.054 0.297 0.378 0.108 0.108 0.041 0.014
e 0.004 0.252 0.358 0.216 0.102 0.053 0.015
f 0.032 0.097 0.312 0.280 0.161 0.065 0.054
g 0.113 0.500 0.233 0.094 0.043 0.014 0.003
h 0.328 0.460 0.129 0.050 0.020 0.010 0.003
representing the marginal frequencies (by row) for each subgroup of my data (a to h).
My dataset is actually in the long format (very long, counting more than 100 thousand entries), with the first 6 rows as you see below:
RX_SUMM_SURG_PRIM_SITE Nodes.Examined.Class
1 Wedge Resection 1-5
2 Segmental Resection 1-5
3 Lobectomy w/mediastinal LNdissection 6-10
4 Lobectomy w/mediastinal LNdissection 6-10
5 Lobectomy w/mediastinal LNdissection 1-5
6 Lobectomy w/mediastinal LNdissection 11-15
When I plot a barplot by group (the table above is simply the cross-tabulation of these two covariates, with the row marginal probabilities taken), here's what happens:
The code I have for this is
ggplot(data.ln.red, aes(x=Nodes.Examined.Class))+geom_bar(aes(x=Nodes.Examined.Class, group=RX_SUMM_SURG_PRIM_SITE))+
facet_grid(RX_SUMM_SURG_PRIM_SITE~.)
Actually I would be very happy only with the marginal frequencies (i.e. the ones in the table) on each y-axis of the facets of the plot (instead of the counts).
Can anybody help me with this?
Thanks for all your help!
EM

geom_bar calculates both counts and proportions of observations. You can access these calculated proportions with either ..prop.. (the old way) or after_stat(prop) (ggplot2 3.3.0 and later; versions 3.0.0 to 3.2.x spell it stat(prop)). Use this as your y aesthetic.
You can also get rid of the aes you have in geom_bar, as this is just a repeat of what you've already covered by ggplot and facet_grid.
It looks like your counts/proportions are going to vary widely between groups, so I'm adding free y-scaling to the faceting.
Here's an example of a similar plot with the iris data, which you can model your code on:
library(tidyverse)
# y is the proportion computed by geom_bar's stat, not a column of iris
ggplot(iris, aes(x = Sepal.Length, y = after_stat(prop))) +
geom_bar() +
facet_grid(Species ~ ., scales = "free_y")
Created on 2018-04-06 by the reprex package (v0.2.0).
Edit: the calculated prop variable is proportions within each group, not proportions across all groups, so it works differently when x is a factor. For categorical x, prop treats x as the group; to override this, include group = 0 or some other dummy value in your aes. Sorry I missed that the first time!
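For a categorical x like Nodes.Examined.Class, a minimal sketch of the dummy-group approach might look like the following. The toy data frame is a hypothetical stand-in for data.ln.red (made-up values, only two surgery types), and after_stat() assumes ggplot2 3.3.0 or later. With group = 1, the proportions are computed within each facet, so they should match the row proportions from prop.table().

```r
library(ggplot2)

# Hypothetical stand-in for data.ln.red; the values are made up
set.seed(1)
toy <- data.frame(
  RX_SUMM_SURG_PRIM_SITE = rep(c("Wedge Resection", "Lobectomy"),
                               times = c(40, 160)),
  Nodes.Examined.Class = factor(
    sample(c("0", "1-5", "6-10"), 200, replace = TRUE),
    levels = c("0", "1-5", "6-10")
  )
)

# Row proportions, the same quantity as the table in the question
row_props <- prop.table(
  table(toy$RX_SUMM_SURG_PRIM_SITE, toy$Nodes.Examined.Class),
  margin = 1
)
print(round(row_props, 3))

# group = 1 puts all bars of a panel in one group, so prop is the
# within-facet proportion rather than 1 for every bar
p <- ggplot(toy, aes(x = Nodes.Examined.Class,
                     y = after_stat(prop), group = 1)) +
  geom_bar() +
  facet_grid(RX_SUMM_SURG_PRIM_SITE ~ .)
print(p)
```

Because every facet now shares the 0 to 1 proportion scale, free y-scaling becomes optional here.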

Related

Graphing using r and points (pch)

What is wrong with my R code? I don't want to see the bar graph; I want to see only the plotting characters (pch):
plot(Graphdata$Sites,ylim=c(-1,2.5), xlab="sites",
ylab="density frequency",lwd=2)
points(Graphdata$A,pch=1,col="blue",lwd=2)
points(Graphdata$B,pch=2,col="green",lwd=2)
points(Graphdata$C,pch=3,col="red",lwd=2)
points(Graphdata$D,pch=4,col="orange",lwd=2)
legend("topright",
legend=c("A","B","C","D"),
col=c("red","blue","green","orange"),lwd=2)
Part of my data looks like:
Sites A B C D
1 A 2.052 2.268 1.828 1.474
2 B 0.549 0.664 0.621 1.921
3 C 0.391 0.482 0.400 0.382
4 D 0.510 0.636 0.497 0.476
5 A 0.214 0.239 0.215 0.211
6 B 1.016 1.362 0.978 0.876
......................................
.....................................
I also want the legend to show the plotting characters (pch) rather than lines.
The barplot is happening because you're plotting the factor of your Sites column. To only plot the points:
library(reshape2)
library(ggplot2)
# plots 6 categorical variables (one per row of Graphdata)
Graphdata.m = melt(t(Graphdata[-1]))
ggplot(Graphdata.m,aes(Var2,value,group=Var2)) + geom_point() + ylim(-1,2.5)
If you want it to plot 4 categorical variables instead:
Graphdata.1 = t(Graphdata[-1])
colnames(Graphdata.1) = c("A","B","C","D","A","B")
Graphdata.m = melt(Graphdata.1)
ggplot(Graphdata.m,aes(Var2,value,group=Var2)) + geom_point() + ylim(-1,2.5)
EDIT:
In base R:
plot(1,xlim=c(1,4),ylim=c(-1,2.5), xlab="sites",ylab="density frequency",lwd=2,type="n",xaxt = 'n')
axis(1,at=c(1,2,3,4),tick=T,labels=c("A","B","C","D"))
points(Graphdata$A,pch=1,col="blue",lwd=2)
points(Graphdata$B,pch=1,col="green",lwd=2)
points(Graphdata$C,pch=1,col="red",lwd=2)
points(Graphdata$D,pch=1,col="orange",lwd=2)
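As a variation on the base R version, here is a minimal sketch that places each point at the position of its site and draws the legend with plotting characters instead of lines. It assumes Sites is a factor with levels A to D and uses the six rows shown in the question; the pch and colour assignments are illustrative choices.

```r
# The six rows shown in the question
Graphdata <- data.frame(
  Sites = factor(c("A", "B", "C", "D", "A", "B")),
  A = c(2.052, 0.549, 0.391, 0.510, 0.214, 1.016),
  B = c(2.268, 0.664, 0.482, 0.636, 0.239, 1.362),
  C = c(1.828, 0.621, 0.400, 0.497, 0.215, 0.978),
  D = c(1.474, 1.921, 0.382, 0.476, 0.211, 0.876)
)

series <- c("A", "B", "C", "D")
cols   <- c("blue", "green", "red", "orange")
pchs   <- 1:4

# Integer codes of the factor give the x position of each site
x <- as.integer(Graphdata$Sites)
n_sites <- nlevels(Graphdata$Sites)

# Empty plot with a custom axis labelled by site
plot(NA, xlim = c(1, n_sites), ylim = c(-1, 2.5),
     xlab = "sites", ylab = "density frequency", xaxt = "n")
axis(1, at = seq_len(n_sites), labels = levels(Graphdata$Sites))

# One plotting character and colour per value column
for (i in seq_along(series)) {
  points(x, Graphdata[[series[i]]], pch = pchs[i], col = cols[i], lwd = 2)
}

# Passing pch (and no lty/lwd) makes the legend show symbols, not lines
legend("topright", legend = series, pch = pchs, col = cols, pt.lwd = 2)
```

Keeping the series names, colours, and symbols in parallel vectors means the legend cannot drift out of sync with the points.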

Removing outliers from facet_wrap boxplots in ggplot

How can I change the y axis to exclude outliers (not just hide them but scale the y axis so as not to include them) for geom_boxplot with multiple individual boxplots using facet_wrap? An example of my dataset is:
Pop. grp1 grp2 grp3 grp4 grp5 grp6 grp7 grp8
a 0.00652 1.27 0.169 0.859 0.388 0.521 3.58 0.0912
a 0.0133 0.136 0.154 0.167 0.845 0.159 0.561 0.108
a 0.0270 1.60 0.119 0.515 0.0386 0.0145 0.884 0.0155
b 0.00846 0.331 0.100 0.897 0.330 2.52 0.663 0.0338
b 0.0154 0.0997 0.122 0.0873 0.905 0.136 0.413 0.139
b 0.0353 0.536 0.171 0.471 0.0280 0.00608 0.414 0.00973
where I'd like to make a boxplot for each column showing populations a and b.
I've melted the data by population and then used geom_boxplot + facet_wrap but some outliers are so far above the whiskers that the boxes themselves barely show. The code I've used is:
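library(reshape2)
library(ggplot2)
wc.m <- melt(w_c_diff_ab, id.vars = "Pop.")
# the population column keeps the name "Pop." after melting, so fill by that
p.wc <- ggplot(data = wc.m, aes(x = variable, y = value)) + geom_boxplot(aes(fill = Pop.))
p.wc + facet_wrap( ~ variable, scales = "free") + scale_fill_manual(values = c("skyblue", "violetred1"))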
but I'm struggling to remove outliers as I'm not sure how to calculate limits for the y axes on a per-boxplot basis.
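One possible approach, sketched here under assumptions rather than offered as the definitive fix, is to compute the box statistics yourself and draw them with stat = "identity". The outliers are then never part of the plotted data, so the free y scales of facet_wrap only span the whiskers. The data frame below is a hypothetical two-column excerpt of the example, and boxplot.stats() uses Tukey hinges, which can differ slightly from the quartiles geom_boxplot computes by default.

```r
library(ggplot2)
library(reshape2)

# Hypothetical excerpt shaped like the example data (two columns only)
w_c_diff_ab <- data.frame(
  Pop. = rep(c("a", "b"), each = 3),
  grp1 = c(0.00652, 0.0133, 0.0270, 0.00846, 0.0154, 0.0353),
  grp2 = c(1.27, 0.136, 1.60, 0.331, 0.0997, 0.536)
)
wc.m <- melt(w_c_diff_ab, id.vars = "Pop.")

# One row of box statistics per population and variable.
# boxplot.stats()$stats is: lower whisker, lower hinge, median,
# upper hinge, upper whisker (outliers are excluded from the whiskers)
groups <- split(wc.m, list(wc.m$Pop., wc.m$variable), drop = TRUE)
box_stats <- do.call(rbind, lapply(groups, function(g) {
  s <- boxplot.stats(g$value)$stats
  data.frame(Pop. = g$Pop.[1], variable = g$variable[1],
             ymin = s[1], lower = s[2], middle = s[3],
             upper = s[4], ymax = s[5])
}))

# Draw the precomputed boxes; each facet's y axis covers whiskers only
p <- ggplot(box_stats, aes(x = Pop., fill = Pop.)) +
  geom_boxplot(aes(ymin = ymin, lower = lower, middle = middle,
                   upper = upper, ymax = ymax),
               stat = "identity") +
  facet_wrap(~ variable, scales = "free_y") +
  scale_fill_manual(values = c("skyblue", "violetred1"))
print(p)
```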

multiple columns plot with correlation value in ggplot2

Hi, I have a dataframe df as below.
I would like to make a facet plot that shows the relation between columns A & B, A & C, A & D, B & C and C & D, and overlay a regression line and Pearson's correlation coefficient on each panel.
I am trying to make a facet plot to show the relation between each pair of columns but could not figure out exactly how.
Any help would be appreciated. I could not find an existing answer on SO for plotting relations among columns.
df<- read.table(text =c("A B C D
0.451 0.333 0.034 0.173
0.491 0.27 0.033 0.207
0.389 0.249 0.084 0.271
0.425 0.819 0.077 0.281
0.457 0.429 0.053 0.386
0.436 0.524 0.049 0.249
0.423 0.27 0.093 0.279
0.463 0.315 0.019 0.204
"), header = T)
df
pl<-ggplot(data=df) + geom_point(aes(x=A,y=B,size=10)) +
geom_point(aes(x=A,y=C,size=10)) +
geom_point(aes(x=A,y=C,size=10)) +
geom_point(aes(x=A,y=D,size=10)) +
geom_smooth(method = "lm", se=FALSE, color="black")
pl
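A minimal sketch of one way to get such a plot, assuming the five pairs listed above: stack each pair into a long data frame with a pair label, compute Pearson's r per pair with cor(), and facet on the label. The helper names (pairs, long, r_labels) are illustrative, not from the question.

```r
library(ggplot2)

df <- read.table(text = "A B C D
0.451 0.333 0.034 0.173
0.491 0.27 0.033 0.207
0.389 0.249 0.084 0.271
0.425 0.819 0.077 0.281
0.457 0.429 0.053 0.386
0.436 0.524 0.049 0.249
0.423 0.27 0.093 0.279
0.463 0.315 0.019 0.204
", header = TRUE)

# Column pairs to compare, as listed in the question
pairs <- list(c("A", "B"), c("A", "C"), c("A", "D"),
              c("B", "C"), c("C", "D"))

# Long format: one block of rows per pair, with generic x and y columns
long <- do.call(rbind, lapply(pairs, function(p) {
  data.frame(pair = paste(p[1], "vs", p[2]),
             x = df[[p[1]]], y = df[[p[2]]])
}))

# Pearson correlation per pair, formatted as a panel label
r_labels <- do.call(rbind, lapply(split(long, long$pair), function(g) {
  data.frame(pair = g$pair[1],
             label = sprintf("r = %.2f", cor(g$x, g$y)))
}))

# Points, linear fit, and the r label in the top-left corner of each panel
p <- ggplot(long, aes(x, y)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  geom_text(data = r_labels, aes(x = -Inf, y = Inf, label = label),
            hjust = -0.1, vjust = 1.5) +
  facet_wrap(~ pair, scales = "free")
print(p)
```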

R multiple line plot using a dataframe [duplicate]

This question already has answers here:
Plot multiple lines (data series) each with unique color in R
(10 answers)
Closed 6 years ago.
I can't seem to figure out how to create a graph for multiple line plots.
This is my dataframe:
topics before_event after_event current
1 1 0.057 0.044 0.064
2 2 0.059 0.055 0.052
3 3 0.058 0.037 0.044
4 4 0.036 0.055 0.044
5 5 0.075 0.064 0.066
6 6 0.047 0.045 0.045
7 7 0.043 0.043 0.041
8 8 0.042 0.041 0.046
9 9 0.049 0.046 0.039
10 10 0.043 0.060 0.045
11 11 0.054 0.054 0.062
12 12 0.065 0.056 0.068
13 13 0.042 0.045 0.048
14 14 0.067 0.054 0.055
15 15 0.049 0.052 0.053
The variables in the dataframe are all numeric vectors, example:
topics <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
I know I should be using 'ggplot2' and 'reshape' but I can't seem to work out the correct code to represent topics on the x-axis, the scale 0-1 on the y-axis, and each var (before_event, after_event, current) as three individual lines.
Any help would be really appreciated!
We can use matplot from base R:
# lty = 1 draws all series as solid lines, and the legend uses the same line type and colours
matplot(df1[,1], df1[-1], type = 'l', lty = 1, xlab = "topics", ylab = "event", col = 2:4)
legend("topright", legend = names(df1)[-1], lty = 1, col = 2:4)
We can also use ggplot and geom_line:
library(ggplot2)
library(reshape2) # provides melt()
topics <- seq(1,15,1)
before_event <- runif(15, min=0.042, max=0.070)
after_event <- runif(15, min=0.040, max=0.065)
current <- runif(15, min=0.041, max=0.066)
df <- data.frame(topics,before_event,after_event,current) #create data frame from the above vectors
df.m <- melt(df, id.vars="topics") # melt the dataframe using topics as id
# plot the lines using ggplot and geom_line
ggplot(data = df.m,
aes(x = topics, y = value, group = variable, color = variable)) +
geom_line(size = 2)

Faceted bar charts from multiple columns in ggplot2

I have some data containing resistance data against three different antibiotics, each stored as a column.
Isolate MIC1 MIC2 MIC3
1 0.008 0.064 0.064
2 0.016 0.250 0.500
3 0.064 0.125 32
...
I've plotted barcharts for each individual antibiotic thus:
ggplot(data, aes(factor(MIC1))) + geom_bar()
(the values are discrete powers of 2: 0.008, 0.016, 0.032, etc. Plotting them as a factor spaces the bars evenly; if there is a more elegant way of doing this, please let me know!)
What I'd really like to do is to have a faceted stack of the three graphs with a shared x-axis.
Is there an easier way to do this than to recode the variables like this:
isolate antibiotic MIC
1 1 0.008
1 2 0.064
1 3 0.064
2 1 0.016
2 2 0.250
2 3 0.500
3 1 0.064
3 2 0.125
3 3 32
...
and then doing it this way?
ggplot(data, aes(factor(MIC))) + geom_bar() + facet_grid(antibiotic ~ .)
Thanks in advance.
Reshaping your data frame to look like your second example is pretty easy using reshape2, to the point where I'm not sure it could really get much easier. Are there any specific problems you would have with this solution?
df = read.table(text="Isolate MIC1 MIC2 MIC3
1 0.008 0.064 0.064
2 0.016 0.250 0.500
3 0.064 0.125 32", header=TRUE)
library(reshape2)
library(ggplot2)
df_melted = melt(df, id.vars="Isolate", variable.name="antibiotic", value.name="MIC")
ggplot(df_melted, aes(factor(MIC))) + geom_bar() + facet_grid(antibiotic ~ .)
