How to plot a butterfly plot or symmetric bar chart in R

I need to create a bar graph with the x-axis in the middle and two positive y-axes, one above and one below it.
It should look like a butterfly plot in SAS, but with the x and y axes transposed.
My data are lengths of male and female fish.
Sample data:
length <- c(12,13,15,14,13,16,18)
sex<-c("m","m","m","f","f","f","f")
dat=data.frame(length,sex)

Another term is 'opposed horizontal barchart'. (The plotrix package has multiple authors, but Jim Lemon stands out as the most productive; he is both the maintainer of the package and the author of pyramid.plot.) This is a modified version of an example in ?pyramid.plot:
install.packages("plotrix")
xy.pop<-c(3.2,3.5,3.6,3.6,3.5,3.5,3.9,3.7,3.9,3.5,3.2,2.8,2.2,1.8,
1.5,1.3,0.7,0.4)
xx.pop<-c(3.2,3.4,3.5,3.5,3.5,3.7,4,3.8,3.9,3.6,3.2,2.5,2,1.7,1.5,
1.3,1,0.8)
agelabels<-c("0-4","5-9","10-14","15-19","20-24","25-29","30-34",
"35-39","40-44","45-49","50-54","55-59","60-64","65-69","70-74",
"75-79","80-84","85+")
mcol<-plotrix::color.gradient(c(0,0,0.5,1),c(0,0,0.5,1),c(1,1,0.5,1),18)
fcol<-plotrix::color.gradient(c(1,1,0.5,1),c(0.5,0.5,0.5,1),c(0.5,0.5,0.5,1),18)
# removed labels in center but you could run the example and see another approach
par(mar=plotrix::pyramid.plot(xy.pop,xx.pop, labels=rep("",18),
main="Australian population pyramid 2002",lxcol=mcol,rxcol=fcol,
gap=0,show.values=TRUE))
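For the fish data in the question, a base-graphics sketch of the transposed layout (bars going up and down from a central x-axis) could look like the following. It is only one possible approach and does not use plotrix; the colours, the one-unit length classes, and the helper names len_classes, counts and top are assumptions made for illustration.

```r
# Sample data from the question
length <- c(12, 13, 15, 14, 13, 16, 18)
sex <- c("m", "m", "m", "f", "f", "f", "f")
dat <- data.frame(length, sex)

# Count fish per length class and sex; explicit factor levels keep empty classes
len_classes <- seq(min(dat$length), max(dat$length))
counts <- table(factor(dat$length, levels = len_classes), dat$sex)
top <- max(counts)

# Females above the central axis
barplot(counts[, "f"], ylim = c(-top, top), col = "salmon",
        axes = FALSE, xlab = "Length", ylab = "Count")
# Males below the axis, drawn as negative heights on the same plot
barplot(-counts[, "m"], add = TRUE, col = "steelblue",
        axes = FALSE, axisnames = FALSE)

# Label both directions with positive numbers
ticks <- pretty(c(-top, top))
axis(2, at = ticks, labels = abs(ticks))
abline(h = 0)
legend("topright", legend = c("f", "m"), fill = c("salmon", "steelblue"))
```

Drawing the second group with negative heights and relabelling the axis with abs() is what gives two "positive" y-axes around a shared x-axis.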

Related

ggplot2 violin plot for columns with less than 3 samples

I am wondering if anyone has found a way to display violin plots through ggplot2 with variables of 1 or 2 samples.
example code:
library(ggplot2)
testData <- data.frame(x=c("a","a","a","b","b"), y=c(1,2,2,1,2))
ggplot(data=testData ) + geom_violin(aes(x=x,y=y))
As you can see, the violin for a has been drawn because it has 3 samples, but the one for b has not, since it has only 2 samples.
I saw the question "geom_violin produces error when all values in a series are the same", but no answer has been given, and it's been 7 years.
I know it is possible to display a violin plot with the vioplot package, but I'd really prefer to stay with ggplot2 if possible.
Thanks,
HY
Thanks to @MarcoSandri and others.
I was on ggplot2 3.3.3, it now works on 3.3.6.
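Since the outcome reported above depends on the installed ggplot2 version, a quick check like the one below may help when reproducing it. This is a small sketch, not a fix: it only reports the version and lists the groups with fewer than 3 samples. The names n_per_group and small_groups are made up for this example.

```r
# Report the installed ggplot2 version and compare it with 3.3.6
if (requireNamespace("ggplot2", quietly = TRUE)) {
  installed <- packageVersion("ggplot2")
  message("ggplot2 ", as.character(installed),
          if (installed >= "3.3.6") " (at or above 3.3.6)" else " (older than 3.3.6)")
}

# Example data from the question
testData <- data.frame(x = c("a", "a", "a", "b", "b"), y = c(1, 2, 2, 1, 2))

# Groups with fewer than 3 samples are the ones affected
n_per_group <- table(testData$x)
small_groups <- names(n_per_group)[n_per_group < 3]
print(small_groups)
```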

Grouped bar chart not working with lattice in R

I'm having trouble creating grouped barplots. I have explored base graphics and lattice.
My data looks like
compound detection LUtype
a 50 ag
a 75 urban
a 34 mixed
b 89 ag
......
I'd like to create a plot with compounds on the y axis (horizontal bar plot) with the bars colored to represent the land use type and detection on the x axis.
These data are stored in a data frame, which I tried converting to a matrix with as.matrix, but this doesn't work and from what I can tell, the matrix is only the row of compounds. This does not produce a plot.
bars<-data.frame(data6$compound,data6$detection,data6$LUtype)
barsM<-as.matrix(data6$compound,data6$detection,data6$LUtype)
barplot(barsM,horiz=TRUE,beside=TRUE)
I also tried to bypass the matrix by using lattice, but got no plot there either.
library(lattice)
require(lattice)
barchart(data6$detection~data6$compound,groups=data6$LUtype,bars)
I'm reading the article "plotting grouped bar charts in R", and I have basically the same setup, but these solutions aren't working for me.
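One way this might be approached, sketched under assumptions: the data frame below contains only the four rows shown in the question (the real data6 has more), and the column names are taken from the table header. With lattice, passing the bare column names together with data= and putting the factor on the left of ~ should give horizontal grouped bars. For base graphics, barplot needs a numeric matrix of detections rather than as.matrix applied to a column.

```r
library(lattice)

# Only the four rows shown in the question; the real data6 has more
data6 <- data.frame(
  compound  = factor(c("a", "a", "a", "b")),
  detection = c(50, 75, 34, 89),
  LUtype    = factor(c("ag", "urban", "mixed", "ag"))
)

# Factor on the left of ~ gives horizontal bars; groups colours by land use
p <- barchart(compound ~ detection, data = data6, groups = LUtype,
              origin = 0, auto.key = list(space = "right"),
              xlab = "Detection", ylab = "Compound")
print(p)  # lattice objects must be printed explicitly inside scripts/functions

# Base graphics alternative: rows = land use, columns = compound
m <- xtabs(detection ~ LUtype + compound, data = data6)
barplot(m, beside = TRUE, horiz = TRUE, legend.text = TRUE,
        xlab = "Detection", ylab = "Compound")
```

Note that xtabs fills missing compound/land-use combinations with 0, which shows up as an absent bar.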

R + ggplot2, multiple histograms in the same plot with each histogram normalised to unit area?

Sorry for the newbie R question...
I have a data.frame that contains measurements of a single variable. The measurements are distributed differently depending on whether the thing being measured is of type A or type B; that is, you can imagine that my column names are: measurement, type label (A or B).
I want to plot the histograms of the measurements for A and B separately and put the two histograms in the same plot, with each histogram normalised to unit area (this is because I expect the proportions of A and B to differ significantly). By unit area, I mean that A and B each have unit area, not that A+B have unit area.
Basically, I want something like geom_density, but I don't want a smoothed distribution for each; I want the histogram bars. Not interleaved, but plotted one on top of the other. Not stacked, although it would be interesting to know how to do this also. (The purpose of this plot is to explore differences in the shapes of the distributions that would indicate that there are quantitative differences between A and B that could be used to distinguish between them.)
That's all. Two or more histograms -- not smoothed density plots -- in the same plot with each normalised to unit area. Thanks!
Something like this?
# generate example
set.seed(1)
df <- data.frame(Type=c(rep("A",1000),rep("B",4000)),
Value=c(rnorm(1000,mean=25,sd=10),rchisq(4000,15)))
# you start here...
library(ggplot2)
# after_stat(density) is the current spelling of ..density.. (needs ggplot2 >= 3.3.0)
ggplot(df, aes(x=Value))+
geom_histogram(aes(y=after_stat(density),fill=Type),color="grey80")+
facet_grid(Type~.)
Note that there are 4 times as many samples of type B.
You can also set the y-axis scales to float using: scales="free_y" in the call to facet_grid(...).
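If the two histograms should share one panel and overlap, as the question describes, a variant along these lines may work. It is a sketch rather than the answerer's code: position="identity" keeps the bars from stacking, alpha makes the overlap visible, and the choice of 30 bins and alpha = 0.5 is arbitrary. after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Same example data as above
set.seed(1)
df <- data.frame(Type = c(rep("A", 1000), rep("B", 4000)),
                 Value = c(rnorm(1000, mean = 25, sd = 10), rchisq(4000, 15)))

# Density is computed per group, so each type is normalised to unit area
p <- ggplot(df, aes(x = Value, fill = Type)) +
  geom_histogram(aes(y = after_stat(density)), position = "identity",
                 alpha = 0.5, bins = 30)
print(p)
```

Replacing position = "identity" with position = "stack" would give the stacked version mentioned in the question.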

How to plot a filled contour, with z axis being a factor

I have a dataset
test<-data.frame(expand.grid(x=seq(0.01,1,0.01), y=seq(0.01,1,0.01)))
test$z<-c(rep(1,2500),rep(2,2500),rep(3,2500),rep(4,2500))
(x,y) define cartesian coordinates. I would like to plot a filled contour plot, with xlim=ylim=c(0,1), and the color being z (a factor with 4 levels).
I could do:
plot(test$x, test$y, col=test$z, pch=16) but it does not look good.
The example looks terrible, but the layout makes sense in my data. I am familiar with akima::interp and filled.contour(), but I do not want any interpolation, and z is not continuous but a factor.
Could you please recommend a proper and pleasant visualization for my data? I would prefer base graphics.
You can use image, for example:
image(outer(seq(0.01,1,0.01),seq(0.01,1,0.01),
FUN=function(x,y)test$z))
I think the raster package may handle such plots better.
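A slightly more explicit variant of the image() idea, as a sketch: it passes the real x and y coordinates, reshapes z into a matrix directly, and assigns one colour per factor level using breaks so nothing is interpolated. The palette and the names xs, ys, zmat and cols are arbitrary choices for this example.

```r
# Data from the question
test <- data.frame(expand.grid(x = seq(0.01, 1, 0.01), y = seq(0.01, 1, 0.01)))
test$z <- c(rep(1, 2500), rep(2, 2500), rep(3, 2500), rep(4, 2500))

xs <- sort(unique(test$x))
ys <- sort(unique(test$y))
# expand.grid varies x fastest, so column-wise filling gives rows = x, columns = y
zmat <- matrix(test$z, nrow = length(xs), ncol = length(ys))

# One colour per level; breaks at half-integers map 1..4 to exactly one colour each
cols <- c("steelblue", "seagreen", "gold", "firebrick")
image(xs, ys, zmat, col = cols, breaks = seq(0.5, 4.5, by = 1),
      xlim = c(0, 1), ylim = c(0, 1), xlab = "x", ylab = "y")
legend("topright", legend = 1:4, fill = cols, title = "z", bg = "white")
```

The reshape relies on the rows being in expand.grid order; data in a different order would need sorting by y and then x first.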

Density plots comparing two columns in R

I have a matrix of ChIP-seq results like this for 26000 genes:
LncRNA_ID LncRNA_Name Control_Raw_TagCount ICLIP_EZH2_Raw_TagCount
1 AK092525 47908 194887
2 ENST00000423879 RP11-12M5.1 10794 90146
3 AF318349 5514 61617
4 ENST00000506392 CTC-313D10.1 288 40880
5 ENST00000438080 RP11-177A2.4 25005 37380
6 AK123756 800 35469
I want to plot the count densities of both samples, control and EZH2 (columns 3 and 4), in order to compare them. I am using R and I am very confused, mainly because I can't plot them as histograms: I get a figure with only one bar instead of all the bars I am expecting, and the same happens when I try a boxplot. It is probably a very silly question, but I am a bit desperate.
ezh2<-data$ICLIP_EZH2_Raw_TagCount
control<-data$Control_Raw_TagCount
hist(ezh2) # not working, I can't see the distribution at all
Do you have any idea how to do it?
Thanks in advance
Box plot, where the two columns are stuck together and then split along the groups:
N <- length(d$Control_Raw_TagCount)
x <- c(d$Control_Raw_TagCount, d$ICLIP_EZH2_Raw_TagCount)
group <- rep(c("Control_Raw_TagCount", "ICLIP_EZH2_Raw_TagCount"), c(N, N))
boxplot(x ~ group)
Here I've assumed the data name is d, so adjust that to your data frame's name. If you want something like hollow histograms (see pg26 of OpenIntro Statistics), the histPlot function in the openintro package will do the trick using the arguments probability=TRUE, hollow=TRUE:
# install.packages("openintro")
library(openintro)
histPlot(d$Control_Raw_TagCount, probability=TRUE, hollow=TRUE)
histPlot(d$ICLIP_EZH2_Raw_TagCount, probability=TRUE, hollow=TRUE,
add=TRUE, lty=3, border='red') # add=TRUE overlays on the first histogram
If the vertical scale isn't right, add a ylim argument to the first call to histPlot (e.g. ylim=c(0,0.05)).
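If the single-bar histogram comes from a few very large counts dominating the scale (an assumption, since the full data are not shown), comparing the two columns on a log scale with base graphics may be worth trying. The sketch below uses only the six rows shown in the question, so the curves are illustrative; the +1 offset guards against zero counts, and the names log_control, log_ezh2, dc and de are made up here.

```r
# Only the six rows shown in the question; the real table has about 26000
d <- data.frame(
  Control_Raw_TagCount    = c(47908, 10794, 5514, 288, 25005, 800),
  ICLIP_EZH2_Raw_TagCount = c(194887, 90146, 61617, 40880, 37380, 35469)
)

# Log-transform so that large counts no longer squash the rest of the scale
log_control <- log10(d$Control_Raw_TagCount + 1)
log_ezh2 <- log10(d$ICLIP_EZH2_Raw_TagCount + 1)

dc <- density(log_control)
de <- density(log_ezh2)

# Overlay both kernel density estimates on shared axes
plot(dc, xlim = range(dc$x, de$x), ylim = c(0, max(dc$y, de$y)),
     main = "Tag count densities", xlab = "log10(count + 1)")
lines(de, lty = 2, col = "red")
legend("topleft", legend = c("Control", "EZH2"),
       lty = c(1, 2), col = c("black", "red"))
```

The same transformed vectors can be passed to hist() or boxplot() if bars or boxes are preferred over smoothed curves.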
