Plotting number of times a name occurs in a column (histogram) - r

I have a list of names sorted, like below:
ACVR2B
ADAM19
ADAM29
ADAM29
ADAMTS1
ADAMTS1
ADAMTS1
ADAMTS12
ADAMTS16
ADAMTS16
ADAMTS16
ADAMTS17
ADAMTS17
ADAMTSL1
ADCY10
I would like to plot them as a histogram. This is very easy when the data are numeric values, but how can I do it with character data in R or in OpenOffice?
Thank you

Try plotting the result of table(). The function table() counts how often each distinct value occurs, which is exactly what you want.
set.seed(42)
x <- sample(letters, 100, replace = TRUE)
plot(table(x))
To plot the sorted values, try this:
z <- sort(table(x))
plot(z, xaxt="n", type="h")
axis(1, at=seq_along(z), names(z))
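Applied to the gene names from the question, a minimal sketch might look like the following. The vector name genes and the barplot() styling options are illustrative choices, not part of the original posts.

```r
# Gene names copied from the question; in practice they would be read from a file
genes <- c("ACVR2B", "ADAM19", "ADAM29", "ADAM29", "ADAMTS1", "ADAMTS1",
           "ADAMTS1", "ADAMTS12", "ADAMTS16", "ADAMTS16", "ADAMTS16",
           "ADAMTS17", "ADAMTS17", "ADAMTSL1", "ADCY10")

# Count how many times each name occurs
counts <- table(genes)

# One bar per name; las = 2 rotates the labels so long names stay readable
barplot(counts, las = 2, cex.names = 0.7, ylab = "Occurrences")
```

barplot() is used here instead of plot() only because it draws filled bars; plot(counts) would work as well.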

Following what Andrie suggested, I did this:
Letters <- read.table("letters", header=TRUE)
x <- sample(Letters, replace = FALSE)
plot(sort(table(x)))
But when I try to plot in descending order with only the top 10, I lose the labels.
Can anyone suggest how to fix this and show only the top 10?
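One possible way to keep the labels while showing only the ten most frequent values is to sort the table in decreasing order and take the first ten entries before plotting. This is a sketch on the simulated data from the answer above, not on the poster's file; the name top10 is made up for illustration.

```r
set.seed(42)
x <- sample(letters, 100, replace = TRUE)

# Sort counts from most to least frequent, then keep the first 10
top10 <- head(sort(table(x), decreasing = TRUE), 10)

# The names of the table entries are used as bar labels automatically
barplot(top10, las = 2, ylab = "Count")
```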

Related

Axis break with gap.plot while data contains NAs

I want to make an x-axis break with the gap.plot() function from the plotrix package while my data contains NAs.
My code works fine if there aren't any NAs, but with NAs it tells me:
Error in if (lostones) warning("some values of x will not be displayed") : argument is not interpretable as logical
and it doesn't plot anything at all.
dt is just an example dataset
dt <- data.frame(c(1.2,NA,5,6,4.3,1),c(22,33,22,25,NA,27))
names(dt) <- c("a","b")
library(plotrix)
gap.plot(dt$a, dt$b, gap=c(1.5,3.5), gap.axis="x",col="blue", ylim=range(c(dt$b)),xtics=c(0:1.5,3.5:6), xticlab=c(0:1.5,3.5:6))
abline(v=1.5, col="white")
abline(v=1.56, col="white", lwd=4)
axis.break(1,breakpos=1.55,style="slash", brw=0.03)
axis.break(3,breakpos=1.55,style="slash", brw=0.03)
What do I have to change? By the way, I don't want to use ggplot.
Since you are trying to produce a scatterplot, you have to omit the rows that contain NA.
For example:
dt <- data.frame(c(1.2,NA,5,6,4.3,1),c(22,33,22,25,NA,27))
names(dt) <- c("a","b")
Now remove the rows with NAs:
library(dplyr)
dt <- dt %>%
na.omit()
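If you would rather not load dplyr just for the pipe, the same rows can be dropped in base R. This is a minimal sketch under the assumption that every row with an NA in any column should go:

```r
dt <- data.frame(a = c(1.2, NA, 5, 6, 4.3, 1),
                 b = c(22, 33, 22, 25, NA, 27))

# complete.cases() is TRUE for rows that have no NA in any column
dt <- dt[complete.cases(dt), ]
```

The result has the same rows as the na.omit() version, so the plotting code can be used unchanged.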
Plot:
library(plotrix)
gap.plot(dt$a, dt$b, gap=c(1.5,3.5), gap.axis="x",col="blue", ylim=range(dt$b) ,xtics=c(0:1.5,3.5:6), xticlab=c(0:1.5,3.5:6))
abline(v=1.5, col="white")
abline(v=1.56, col="white", lwd=4)
axis.break(1,breakpos=1.55,style="slash", brw=0.03)
axis.break(3,breakpos=1.55,style="slash", brw=0.03)

How to adjust x labels in R boxplot

This is my code to create a boxplot in R that has 4 boxplots in one.
psnr_x265_256 <- c(39.998,39.998, 40.766, 38.507,38.224,40.666,38.329,40.218,44.746,38.222)
psnr_x264_256 <- c(39.653, 38.106,37.794,36.13,36.808,41.991,36.718,39.26,46.071,36.677)
psnr_xvid_256 <- c(33.04564,33.207269,32.715427,32.104696,30.445141,33.135261,32.669766, 31.657039,31.53103,31.585865)
psnr_mpeg2_256 <- c(32.4198,32.055051,31.424819,30.560274,30.740421,32.484694, 32.512268,32.04659,32.345848, 31)
all_errors = cbind(psnr_x265_256, psnr_x264_256, psnr_xvid_256,psnr_mpeg2_256)
modes = cbind(rep("PSNR",10))
journal_linear_data <-data.frame(psnr_x265_256, psnr_x264_256, psnr_xvid_256,psnr_mpeg2_256)
yvars <- c("psnr_x265_256","psnr_x264_256","psnr_xvid_256","psnr_mpeg2_256")
xvars <- c("x265","x264","xvid","mpeg2")
bmp(filename="boxplot_PSNR_256.bmp")
boxplot(journal_linear_data[,yvars], xlab=xvars, ylab="PSNR")
dev.off()
This is the image I get.
I want each boxplot to be labeled on the x-axis with its corresponding name: "x265", "x264", "xvid", "mpeg2".
Do you have any idea how to fix this?
There are multiple ways of changing the labels for your boxplot variables. Probably the simplest way is changing the column names of your data frame:
colnames(journal_linear_data) <- c("x265","x264","xvid","mpeg2")
Even simpler: you could do this right at the creation of your data frame too:
journal_linear_data <- data.frame(x265=psnr_x265_256, x264=psnr_x264_256, xvid=psnr_xvid_256, mpeg2=psnr_mpeg2_256)
If your labels are not shown or overlap because there is too little space, try rotating the x labels using the las parameter, e.g. las=2 or las=3.
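Another option that leaves the data frame untouched is the names argument of boxplot(), which sets the group labels directly. The sketch below reuses the vectors from the question and draws to the default device rather than to a BMP file:

```r
psnr_x265_256 <- c(39.998, 39.998, 40.766, 38.507, 38.224, 40.666, 38.329, 40.218, 44.746, 38.222)
psnr_x264_256 <- c(39.653, 38.106, 37.794, 36.13, 36.808, 41.991, 36.718, 39.26, 46.071, 36.677)
psnr_xvid_256 <- c(33.04564, 33.207269, 32.715427, 32.104696, 30.445141, 33.135261, 32.669766, 31.657039, 31.53103, 31.585865)
psnr_mpeg2_256 <- c(32.4198, 32.055051, 31.424819, 30.560274, 30.740421, 32.484694, 32.512268, 32.04659, 32.345848, 31)

journal_linear_data <- data.frame(psnr_x265_256, psnr_x264_256, psnr_xvid_256, psnr_mpeg2_256)
xvars <- c("x265", "x264", "xvid", "mpeg2")

# names= labels each box; xlab= would instead set a single title for the whole axis
bp <- boxplot(journal_linear_data, names = xvars, xlab = "Codec", ylab = "PSNR")
```

This also shows why the original call did not work: xlab is the axis title, not a vector of per-box labels.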

Densityplots using colwise - different colors for each line?

I need a plot of several density lines, each in a different color. This is an example (much smaller than my real code), using the built-in data frame USArrests. I hope it is OK to use it?
library(plyr)  # provides colwise()
colors <- heat.colors(3)
plot(density(USArrests[,2], bw=1, kernel="epanechnikov", na.rm=TRUE),col=colors[1])
lines1E <- function(x)lines(density(x,bw=1,kernel="epanechnikov",na.rm=TRUE))
lines1EUSA <- colwise(lines1E)(USArrests[,3:4])
Currently the code with colwise() draws every line in the same color. How can I give each line a different color? Or is there a better way to plot several density lines with different colors?
I don't quite follow your example, so I've created my own example data set. First, create a matrix with three columns:
m = matrix(rnorm(60), ncol=3)
Then plot the density of the first column:
plot(density(m[,1]), col=2)
Using your lines1E function as a template:
lines1E = function(x) {lines(density(x))}
We can add multiple curves to the plot:
colwise(lines1E)(as.data.frame(m[ ,2:3]))
Personally, I would just use:
##Added in NA for illustration
m = matrix(rnorm(60), ncol=3)
m[1,] = NA
plot(density(m[,1], na.rm=T))
sapply(2:ncol(m), function(i) lines(density(m[,i], na.rm=T), col=i))
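The same idea can be applied to the USArrests columns from the question. The sketch below is one possible arrangement, not the answerer's code: it computes all densities first so the axis limits can cover every curve, then draws each one in its own color. The names vars, cols, and dens, and the specific colors, are illustrative.

```r
vars <- names(USArrests)[2:4]          # Assault, UrbanPop, Rape
cols <- c("red", "darkgreen", "blue")  # one color per column

# Compute each density once, with the settings used in the question
dens <- lapply(USArrests[vars], density, bw = 1,
               kernel = "epanechnikov", na.rm = TRUE)

# Axis limits wide enough for all curves
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- range(sapply(dens, function(d) range(d$y)))

plot(dens[[1]], col = cols[1], xlim = xlim, ylim = ylim, main = "Density by column")
for (i in seq_along(dens)[-1]) {
  lines(dens[[i]], col = cols[i])
}
legend("topright", legend = vars, col = cols, lty = 1)
```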

Labeling outliers on boxplot in R

I would like to plot each column of a matrix as a boxplot and then label the outliers in each boxplot as the row name they belong to in the matrix. To use an example:
vv=matrix(c(1,2,3,4,8,15,30),nrow=7,ncol=4,byrow=F)
rownames(vv)=c("one","two","three","four","five","six","seven")
boxplot(vv)
I would like to label the outlier in each plot (in this case 30) as the row name it belongs to, so in this case 30 belongs to row 7. Is there an easy way to do this? I have seen similar questions to this asked but none seemed to have worked the way I want it to.
There is a simple way. Note that the B in Boxplot in the following lines is a capital letter.
library(car)
Boxplot(y ~ x, id.method="y")
Alternatively, you could use the Boxplot function from the car package, which labels outliers for you.
See the following link: https://CRAN.R-project.org/package=car
The example given is a bit boring because the outliers are all in the same row, but here is the code:
bxpdat <- boxplot(vv)
text(bxpdat$group,  # the x locations
     bxpdat$out,    # the y values
     rownames(vv)[which(vv == bxpdat$out, arr.ind=TRUE)[, 1]],  # the labels
     pos = 4)
This picks the row names whose values equal the "out" list (i.e., the outliers) in the result of boxplot(). boxplot() calls boxplot.stats() and returns the values it computes. Take a look at:
str(bxpdat)
@DWin's solution works very well for a single boxplot, but will fail for anything with duplicate values, like the dataset I have created:
#Create data
set.seed(1)
basenums <- c(1,2,3,4,8,15,30)
vv=matrix(c(basenums, sample(basenums), 1-basenums,
c(0, 29, 30, 31, 32, 33, 60)),nrow=7,ncol=4,byrow=F)
dimnames(vv)=list(c("one","two","three","four","five","six","seven"), 1:4)
On this dataset, @DWin's solution gives wrong labels: in the 4th boxplot it puts the minimum and the maximum in the same row, which is not possible.
This solution is monstrous (and I hope can be simplified), but effective.
#Reshape data
vv_dat <- as.data.frame(vv)
vv_dat$row <- row.names(vv_dat)
library(reshape2)
new_vv <- melt(vv_dat, id.vars="row")
#Get boxplot data
bxpdat <- as.data.frame(boxplot(value~variable, data=new_vv)[c("out", "group")])
#Get matches with boxplot data
text_guide <- do.call(rbind, apply(bxpdat, 1,
function(x) new_vv[new_vv$value==x[1]&new_vv$variable==x[2], ]))
#Add labels
with(text_guide, text(x=as.numeric(variable)+0.2, y=value, labels=row))
Or you can simply run the code from this blog post:
source("https://raw.githubusercontent.com/talgalili/R-code-snippets/master/boxplot.with.outlier.label.r") # Load the function
set.seed(6484)
y <- rnorm(20)
x1 <- sample(letters[1:2], 20,T)
lab_y <- sample(letters, 20)
# plot a boxplot with interactions:
boxplot.with.outlier.label(y~x1, lab_y)
(which handles multiple outliers which are close to one another)
@sebastian-c
This is a slight modification of DWin's solution that seems to work more generally:
bx1<-boxplot(vv,las=2,cex.axis=.8)
if(length(bx1$out)!=0){
## get the row of each outlier
out.rows<-sapply(1:length(bx1$out),function(i) which(vv[,bx1$group[i]]==bx1$out[i]))
text(bx1$group,bx1$out,
rownames(vv)[out.rows],
pos=4
)
}
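For comparison, here is a base R sketch that matches outliers column by column, so a value that also occurs in another column cannot pick up the wrong row name. It uses the dataset with duplicate values from above; the name out_labels is made up for illustration, and labels of outliers that lie close together may still overlap.

```r
set.seed(1)
basenums <- c(1, 2, 3, 4, 8, 15, 30)
vv <- matrix(c(basenums, sample(basenums), 1 - basenums,
               c(0, 29, 30, 31, 32, 33, 60)), nrow = 7, ncol = 4)
dimnames(vv) <- list(c("one", "two", "three", "four", "five", "six", "seven"), 1:4)

bp <- boxplot(vv)

# For each column, find the rows whose value is among that column's outliers
out_labels <- do.call(rbind, lapply(seq_len(ncol(vv)), function(j) {
  rows <- which(vv[, j] %in% bp$out[bp$group == j])
  data.frame(group = rep(j, length(rows)),
             value = unname(vv[rows, j]),
             label = rownames(vv)[rows],
             stringsAsFactors = FALSE)
}))

# Write each row name to the right of its outlier
text(out_labels$group, out_labels$value, out_labels$label, pos = 4)
```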

R plotting frequency distribution

I know that we normally do it this way:
x=c(rep(0.3,100),rep(0.5,700))
plot(table(x))
However, we can only get a few dots or vertical lines in the graph.
What should I do if I want 100 dots above 0.3 and 700 dots above 0.5?
Something like this?
x <- c(rep(.3,100), rep(.5, 700))
y <- c(seq(0,1, length.out=100), seq(0,1,length.out=700))
plot(x,y)
edit: (following OP's comment)
In that case, something like this should work.
x <- rep(seq(1, 10)/10, seq(100, 1000, by=100))
x.t <- as.matrix(table(x))
y <- unlist(apply(x.t, 1, function(x) seq(1,x)))
plot(x,y)
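A shorter way to build the same stacked y positions, sketched here as an alternative rather than as the answerer's method, is ave() with seq_along, which numbers the occurrences of each distinct value from 1 upward and does not require x to be sorted:

```r
x <- c(rep(0.3, 100), rep(0.5, 700))

# Within each group of equal x values, number the points 1, 2, 3, ...
y <- ave(x, x, FUN = seq_along)

# 100 dots stacked above 0.3 and 700 dots stacked above 0.5
plot(x, y, pch = ".", xlab = "x", ylab = "Count")
```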
You can play with the line type and line width settings:
plot(table(x),lty=3,lwd=0.5)
For smaller numbers (counts) you can use stripchart with method="stack" like this:
stripchart(c(rep(0.3,10),rep(0.5,70)), pch=19, method="stack", ylim=c(0,100))
But stripchart does not work for 700 dots.
Edit:
The dots() function from the package TeachingDemos is probably what you want:
require(TeachingDemos)
dots(x)
