Dynamically coloring boxplot in R

I have data with the following columns: lot, sublot, size, data. I have multiple lot(s) and each lot can have multiple sublot(s). Each sublot has size(s) of 1 to 4.
I have created a boxplot for this data using the following code:
df <-
readXL("Z:/R_Files/example.xlsx",
rownames=FALSE, header=TRUE, na="", sheet="Sheet1",
stringsAsFactors=TRUE)
x11()
par(mar=c(10.1, 5.1, 4.1, 2.1))
boxplot(data ~ size*sublot*lot,
xlab="", ylab="Data", main="Data by Size, Sublot, Lot",
las=2,
data=df)
title(xlab='Size.Sublot.Lot', line=9)
I wanted to use the boxfill option to color each boxplot based on the lot number. I have seen two solutions:
1. Create a vector and explicitly specify the colors to be used, e.g. colr = c("red", "red", "red", .... "green", "green", "green", ... "blue"). The problem with this solution is that it requires me to know a priori the number of lots in df and the number of times each color needs to be repeated.
2. Use an "ifelse" statement. The problem with this solution is that (a) I need to know the number of lots and (b) I need to create multiple nested ifelse statements.
I would prefer a "dynamic" solution that creates the color vector based on the number of lot entries I have in my file.
I have tried to create:
uniqlot <- unique(df$lot)
colr <- palette(rainbow(length(uniqlot)))
but am stuck since the entries in the colr vector do not repeat for the number of unique combinations of size.sublot.lot. Note: I want all boxplots for lot ABC to be colored with one color, all boxplots for lot DEF to be colored with another color etc.
I am attaching a picture of the uncolored boxplot.
Raw data (example.xlsx) can be accessed at the following link:
example.xlsx

This is what I would do:
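n1 <- length(unique(df$sublot))
n2 <- length(unique(df$size))
n3 <- length(unique(df$lot))
# one color per lot; rainbow() is called directly because palette() returns the previous palette
colr <- rainbow(n3)
# repeat each lot's color once for every size/sublot combination
colr <- rep(colr, each = n1*n2)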
boxplot(data ~ size*sublot*lot,
col = colr,
xlab="", ylab="Data", main="Data by Size, Sublot, Lot",
las=2,
data=df)
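The color vector can also be built without counting levels by hand. The sketch below uses a fully crossed toy data frame in place of example.xlsx (the values are made up for illustration) and enumerates the level combinations in the same order that boxplot() uses for size*sublot*lot, where the first factor varies fastest.

```r
# Toy data standing in for example.xlsx (hypothetical values)
set.seed(42)
df <- expand.grid(size = factor(1:2), sublot = factor(c("s1", "s2")),
                  lot = factor(c("ABC", "DEF", "GHI")), rep = 1:5)
df$data <- rnorm(nrow(df))

# Enumerate the level combinations; expand.grid() varies the first factor
# fastest, which matches the box order of data ~ size*sublot*lot
combos <- expand.grid(size = levels(df$size), sublot = levels(df$sublot),
                      lot = levels(df$lot))

# One color per lot, looked up for every box
lot_cols <- rainbow(nlevels(df$lot))
box_cols <- lot_cols[as.integer(combos$lot)]

par(mar = c(10.1, 5.1, 4.1, 2.1))
bp <- boxplot(data ~ size*sublot*lot, data = df, col = box_cols,
              las = 2, xlab = "", ylab = "Data")
```

Because the colors are looked up per box rather than repeated a fixed number of times, the same code should keep working when lots, sublots, or sizes are added to the file.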
Using ggplot:
library(ggplot2)
df$size <- as.factor(df$size)
ggplot(df, aes(sublot, data, group = interaction(size, sublot), col = size)) +
geom_boxplot() +
facet_wrap(~lot, nrow = 1)
Also, you can get rid of df$size <- as.factor(df$size) if you want continuous colour.

Thanks to the pointers provided in the responses, and after digging around a little more, I was able to find a solution to my own question. I am posting this piece of code in case someone needs to replicate it.
The code below creates the colored boxplot I wanted.
df <-
readXL("Z:/R_Files/example.xlsx",
rownames=FALSE, header=TRUE, na="", sheet="Sheet1",
stringsAsFactors=TRUE)
unqlot <- unique(df$lot)
unqsublot <- unique(df$sublot)
unqsize <- unique(df$size)
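# one color per lot; rainbow() is called directly because palette() returns the previous palette
cul <- rainbow(length(unqlot))
culur <- character()
# loop over the lots, repeating each lot's color for every size/sublot combination
for (i in seq_along(unqlot)) {
culur_temp = rep(cul[i], each=(length(unqsize)*length(unqsublot)))
culur = c(culur, culur_temp)
}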
par(mar=c(10.1, 5.1, 4.1, 2.1))
boxplot(data ~ size*sublot*lot,
xlab="", ylab="Data", main="Data by Size, Sublot, Lot",
col = culur,
las=2,
data=df)
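When sublots are nested in lots, many size.sublot.lot combinations are empty, and one may prefer to draw only the combinations that occur. The following is a minimal sketch of that idea on made-up data (the lot and sublot names are hypothetical): the grouping factor is built explicitly with interaction(..., drop = TRUE), and the lot of each remaining group is looked up in a named palette.

```r
# Toy data where sublots are nested in lots (hypothetical values)
set.seed(1)
df <- data.frame(
  lot    = rep(c("ABC", "DEF"), times = c(40, 20)),
  sublot = rep(c("a1", "a2", "d1"), each = 20),
  size   = rep(1:2, times = 30),
  stringsAsFactors = FALSE
)
df$data <- rnorm(nrow(df))

# Keep only the size/sublot/lot combinations that actually occur
grp <- interaction(df$size, df$sublot, df$lot, drop = TRUE)

# Lot of each remaining group, in the order of levels(grp)
lot_per_box <- as.vector(tapply(df$lot, grp, function(v) v[1]))

# Named palette: one color per lot
lots <- unique(df$lot)
pal  <- setNames(rainbow(length(lots)), lots)
box_cols <- unname(pal[lot_per_box])

par(mar = c(10.1, 5.1, 4.1, 2.1))
boxplot(df$data ~ grp, col = box_cols, las = 2, xlab = "", ylab = "Data")
```

Looking colors up by lot name avoids any assumption about how many boxes each lot contributes.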

Related

How to color different groups in qqplot?

I'm plotting some Q-Q plots using the qqplot function. It's very convenient to use, except that I want to color the data points based on their IDs. For example:
library(qualityTools)
n=(rnorm(n=500, m=1, sd=1) )
id=c(rep(1,250),rep(2,250))
myData=data.frame(x=n,y=id)
qqPlot(myData$x, "normal",confbounds = FALSE)
So the plot looks like:
I need to color the dots based on their "id" values, for example blue for the ones with id=1, and red for the ones with id=2. I would greatly appreciate your help.
You can try setting col = myData$y. I'm not sure how the qqPlot function works from that package, but if you're not stuck with using that function, you can do this in base R.
Using base R functions, it would look something like this:
# The example data, as generated in the question
n <- rnorm(n=500, m=1, sd=1)
id <- c(rep(1,250), rep(2,250))
myData <- data.frame(x=n,y=id)
# The plot
qqnorm(myData$x, col = myData$y)
qqline(myData$x, lty = 2)
Not sure how helpful the colors will be due to the overplotting in this particular example.
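To get the specific colors asked for (blue for id 1, red for id 2), the ids can be used as an index into a small color vector. This is a sketch on freshly simulated data; it relies on qqnorm() returning and plotting the sample values in their original order, so a per-observation color vector lines up with the points.

```r
set.seed(123)
myData <- data.frame(x = rnorm(500, mean = 1, sd = 1),
                     y = rep(1:2, each = 250))

# id 1 -> blue, id 2 -> red
id_cols <- c("blue", "red")
pt_cols <- id_cols[myData$y]

# qqnorm() invisibly returns the plotted coordinates
qq <- qqnorm(myData$x, col = pt_cols)
qqline(myData$x, lty = 2)
legend("topleft", legend = paste("id =", 1:2), col = id_cols, pch = 1)
```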
I have not used qqPlot before, but if you want to use it, there is a way to achieve what you want. It looks like the function invisibly passes back the data used in the plot. That means we can do something like this:
# Use qqPlot - it generates a graph, but ignore that for now
plotData <- qqPlot(myData$x, "normal",confbounds = FALSE, col = sample(colors(), nrow(myData)))
# Given that you have the data generated, you can create your own plot instead ...
with(plotData, {
plot(x, y, col = ifelse(id == 1, "red", "blue"))
abline(int, slope)
})
Hope that helps.

R) Create double-labeled MDS plot

I am totally new to R.
I have expression profile data which is preprocessed and combined. Looks like this ("exp.txt")
STUDY_1_CANCER_1 STUDY_1_CON_1 STUDY_2_CANCER_1 STUDY_2_CANCER_2
P53 1.111 1.22 1.3 1.4
.....
Also, I created phenotype data. It looks like this ("pheno.txt"):
Sample Disease Study
STUDY_1_CANCER_1 Cancer GSE1
STUDY_1_CON_1 Normal GSE1
STUDY_2_CANCER_1 Cancer GSE2
STUDY_2_CON_1 Normal GSE2
Here, I tried to make MDS plot using classical cmdscale command like this.
data=read.table("exp.txt", row.names=1, header=T)
DATA=as.matrix(data)
pc=cor(DATA, method="p")
mds=cmdscale(as.dist(1-pc),2)
plot(mds)
I'd like to create a plot like this figure, with color double-labeling (Study and Disease). How should I do this?
First create an empty plot, then add the points with specified colors/shapes.
Here's an example:
require(vegan)
data(dune)
data(dune.env)
mds <- cmdscale(vegdist(dune, method='bray'))
# set colors and shapes
cols = c('red', 'blue', 'black', 'steelblue')
shps = c(15, 16, 17)
# empty plot
plot(mds, type = 'n')
# add points
points(mds, col = cols[dune.env$Management], pch = shps[dune.env$Use])
# add legend
legend('topright', col=cols, legend=levels(dune.env$Management), pch = 16, cex = 0.7)
legend('bottomright', legend=levels(dune.env$Use), pch = shps, cex = 0.7)
Note that factors are internally coded as integers, which is helpful here.
> levels(dune.env$Management)
[1] "BF" "HF" "NM" "SF"
so
cols[dune.env$Management]
picks the first entry of cols for observations in the first factor level, the second entry for the second level, and so on. The same applies to the shapes.
Finally, add the legend. Of course this plot still needs some polishing, but that's the way to go...
BTW: Gavin Simpson has a nice blogpost about customizing ordination plots.
Actually, you can do this directly in the default plot command, which accepts vectors for the pch and col arguments. Assuming data holds Study and Disease as factors, use:
with(data, plot(mds, col = as.numeric(Study), pch = as.numeric(Disease), asp = 1))
You must use asp = 1 when you plot cmdscale results: both axes must be scaled similarly. You can also add xlab and ylab arguments for nicer axis labels. For adding legend and selecting plotting characters and colours, see other responses.
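Applied to the layout described in the question, the double labeling might look like the sketch below. The expression values are simulated and the object names expr and pheno are hypothetical stand-ins for exp.txt and pheno.txt; Study is mapped to color and Disease to plotting symbol, with the phenotype rows matched to the matrix columns first.

```r
set.seed(1)
samples <- c("STUDY_1_CANCER_1", "STUDY_1_CON_1",
             "STUDY_2_CANCER_1", "STUDY_2_CON_1")

# Simulated expression matrix: genes in rows, samples in columns
expr <- matrix(rnorm(50 * 4), nrow = 50, dimnames = list(NULL, samples))

# Phenotype table as in pheno.txt
pheno <- data.frame(Sample  = samples,
                    Disease = factor(c("Cancer", "Normal", "Cancer", "Normal")),
                    Study   = factor(c("GSE1", "GSE1", "GSE2", "GSE2")))

# Make sure phenotype rows follow the column order of the matrix
pheno <- pheno[match(colnames(expr), pheno$Sample), ]

mds <- cmdscale(as.dist(1 - cor(expr, method = "pearson")), k = 2)

study_cols  <- c("red", "blue")   # one color per Study level
disease_pch <- c(16, 17)          # one symbol per Disease level
pt_col <- study_cols[pheno$Study]
pt_pch <- disease_pch[pheno$Disease]

plot(mds, col = pt_col, pch = pt_pch, asp = 1, xlab = "MDS 1", ylab = "MDS 2")
legend("topright", legend = levels(pheno$Study), col = study_cols,
       pch = 15, title = "Study", cex = 0.7)
legend("bottomright", legend = levels(pheno$Disease), pch = disease_pch,
       title = "Disease", cex = 0.7)
```

If the phenotype file is read with read.table in a recent R version, the columns arrive as character strings, so wrapping them in factor() before indexing is needed.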

Heatmap like plot with Lattice

I cannot figure out how the lattice levelplot works. I have played with it for some time now, but could not find a reasonable solution.
Sample data:
Data <- data.frame(x=seq(0,20,1),y=runif(21,0,1))
Data.mat <- data.matrix(Data)
Plot with levelplot:
library(lattice)
rgb.palette <- colorRampPalette(c("darkgreen","yellow", "red"), space = "rgb")
levelplot(Data.mat, main="", xlab="Time", ylab="", col.regions=rgb.palette(100),
cuts=100, at=seq(0,1,0.1), ylim=c(0,2), scales=list(y=list(at=NULL)))
This is the outcome:
Since I do not understand how levelplot really works, I cannot make it work. What I would like is for the colour strips to fill the whole window at the corresponding x (Time).
An alternative solution using another method would also be welcome.
Basically, I'm trying here to plot the increasing risk over time, where the red is the highest risk = 1. I would like to visualize the sequence of possible increase or clustering risk over time.
From ?levelplot we're told that if the first argument is a matrix then "'x' provides the 'z' vector described above, while its rows and columns are interpreted as the 'x' and 'y' vectors respectively.", so
> m = Data.mat[, 2, drop=FALSE]
> dim(m)
[1] 21 1
> levelplot(m)
plots a levelplot with 21 columns and 1 row, where the levels are determined by the values in m. The formula interface might look like
> df <- data.frame(x=1, y=1:21, z=runif(21))
> levelplot(z ~ y + x, df)
(these approaches do not quite result in the same image).
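Putting the formula interface together with the palette from the question, a single strip of cells that stretches over the whole panel could be sketched as follows. The column names time, band, and risk are made up for this example; aspect = "fill" asks lattice to stretch the panel over the available space, and the y axis is suppressed because it carries no information.

```r
library(lattice)

set.seed(1)
Data <- data.frame(x = seq(0, 20, 1), y = runif(21, 0, 1))

# Long format: one row of cells, one cell per time point
strip <- data.frame(time = Data$x, band = 1, risk = Data$y)

rgb.palette <- colorRampPalette(c("darkgreen", "yellow", "red"), space = "rgb")

p <- levelplot(risk ~ time * band, data = strip,
               col.regions = rgb.palette(100),
               at = seq(0, 1, length.out = 101),  # 100 color intervals on [0, 1]
               aspect = "fill",
               xlab = "Time", ylab = "",
               scales = list(y = list(draw = FALSE)))
print(p)
```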
Unfortunately I don't know much about lattice, but I noted your "Alternative solution with other method", so may I suggest another possibility:
library(plotrix)
color2D.matplot(t(Data[ , 2]), show.legend = TRUE, extremes = c("yellow", "red"))
Heaps of things to do to make it prettier. Still, a start. Of course it is important to consider the breaks in your time variable. In this very simple attempt, regular intervals are implicitly assumed, which happens to be the case in your example.
Update
Following the advice in the 'Details' section in ?color2D.matplot: "The user will have to adjust the plot device dimensions to get regular squares or hexagons, especially when the matrix is not square". Well, well, quite ugly solution.
# open the device first (windows() is available on Windows only) so that par() applies to it
windows(width = 10, height = 2.5)
par(mar = c(5.1, 4.1, 0, 2.1))
color2D.matplot(t(Data[ , 2]),
show.legend = TRUE,
axes = TRUE,
xlab = "",
ylab = "",
extremes = c("yellow", "red"))

Legend colours to call on dictionary R

I have a plot which plots points with a particular symbol and color. I want my legend to show the exact same colors and symbols as those in the plot. I can do this manually, but I have over 50 plots to generate and the data is going to be continually updated, so I would like to automate the process. I tried to create a dictionary and wanted to search it: if a value is found in levels(Color_test), the symbol in the legend should get the color given in the dictionary.
My legend code is as follows:
legend(legend_X, legend_Y,
xjust=x_adj, yjust=y_adj,
levels(Color_test),
col=Labels.col,
pch=Labels.sym,
horiz=FALSE)
Maybe what you are looking for is some way of merging your data with the dictionary. Here is how it can be done, using only colors since this is just an example:
data <- data.frame(type = sample(letters[1:3],20,replace=T),
x = runif(20),
y = runif(20))
dict <- data.frame(type = letters[1:4],
color = c("red","green","blue","black"))
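# look colors up with match() so they stay aligned with the rows of data (merge() reorders rows)
plot(data$x, data$y, col = as.character(dict$color)[match(data$type, dict$type)])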
legend("topleft",legend=dict$type, col=dict$color, pch=1)
You can easily modify the legend so that it displays just the colors actually used.
data_dict <- merge(data,dict)
plot(y~x, col=color, data=data_dict, pch=as.vector(type))
legend("topleft",legend=unique(data_dict$type), col=unique(data_dict$color), pch=1)
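The same idea can be written with named vectors acting as the dictionary, which covers symbols as well as colors. This is a sketch under the assumption that every type in the data has an entry in both lookup vectors; the names col_dict and pch_dict and their values are made up for illustration.

```r
set.seed(1)
data <- data.frame(type = sample(letters[1:3], 20, replace = TRUE),
                   x = runif(20), y = runif(20),
                   stringsAsFactors = FALSE)

# Dictionaries: names are the types, values are the colors / symbols
col_dict <- c(a = "red", b = "green", c = "blue", d = "black")
pch_dict <- c(a = 15, b = 16, c = 17, d = 18)

# Look up color and symbol for every point by its type
pt_col <- unname(col_dict[data$type])
pt_pch <- unname(pch_dict[data$type])
plot(data$x, data$y, col = pt_col, pch = pt_pch)

# The legend uses only the types that occur, with the same lookups
used <- sort(unique(data$type))
legend("topleft", legend = used, col = col_dict[used], pch = pch_dict[used])
```

Because the plot and the legend read from the same two vectors, they cannot drift apart when the data changes.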

R: color certain cells in Matrix

I am wondering whether I can color only certain cells in an R matrix using the image command. Currently, I am doing this:
library(Matrix)
args <- commandArgs(trailingOnly=TRUE)
csv_name <- args[1]
pdf_name <- args[2]
pdf(pdf_name)
data <- scan(csv_name, sep=",")
len <- length(data)
num <- sqrt(len)
matrix <- Matrix(data, nrow=num, ncol=num)
image(matrix)
dev.off()
The CSV file contains values between 0 and 1.
Executing the above code gives me the following image:
Now, I want to color in each row the six smallest values red.
Does anyone have an idea how to achieve this?
Thanks in advance,
Sven
Matrix seems to use lattice (levelplot). You can add a layer on top,
m = Matrix(1:9, 3)
library(latticeExtra)
image(m) + layer(panel.levelplot(1:2,1:2,1:2,1:2, col.regions="red"))
Edit: actually, it makes more sense to give the colors in the first place,
levelplot(as.matrix(m), col.regions=c(rep("red", 6), "blue", "green", "yellow"), at=1:9)
but I haven't succeeded with image:
image(m, col.regions = c(rep("red", 6), "blue", "green", "yellow"), at=1:9)
I may have missed a fine point in the docs...
You can also simply make another matrix where all values are NaN and then add a value of 1 to those that you want to highlight:
set.seed(1)
z <- matrix(rnorm(100), 10,10)
image(z)
z2 <- z*NaN
z2[order(z)[1:5]] <- 1
image(z2, add=TRUE, col=4)
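For the per-row requirement in the question (the six smallest values in each row), the overlay matrix can be built row by row with rank(). The sketch below uses base image() on random data rather than the Matrix method, so it illustrates the overlay idea and is not a drop-in change to the script in the question.

```r
set.seed(1)
z <- matrix(runif(100), 10, 10)
k <- 6  # number of cells to highlight per row

# For each row, mark the k smallest values with 1 and everything else with NA.
# apply() over rows returns the results as columns, hence the t().
mask <- t(apply(z, 1, function(r) {
  ifelse(rank(r, ties.method = "first") <= k, 1, NA)
}))

image(z, col = gray.colors(20))
image(mask, add = TRUE, col = "red")  # NA cells are left untouched
```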
