Heatmap like plot with Lattice - r

I cannot figure out how the lattice levelplot works. I have played with it for some time now, but could not find a reasonable solution.
Sample data:
Data <- data.frame(x=seq(0,20,1),y=runif(21,0,1))
Data.mat <- data.matrix(Data)
Plot with levelplot:
rgb.palette <- colorRampPalette(c("darkgreen","yellow", "red"), space = "rgb")
levelplot(Data.mat, main="", xlab="Time", ylab="", col.regions=rgb.palette(100),
cuts=100, at=seq(0,1,0.1), ylim=c(0,2), scales=list(y=list(at=NULL)))
This is the outcome:
Since I do not understand how levelplot really works, I cannot make it work. What I would like is for the colour strips to fill the whole window at the corresponding x (Time).
Alternative solution with other method.
Basically, I'm trying here to plot the increasing risk over time, where the red is the highest risk = 1. I would like to visualize the sequence of possible increase or clustering risk over time.

From ?levelplot we're told that if the first argument is a matrix then "'x' provides the
'z' vector described above, while its rows and columns are
interpreted as the 'x' and 'y' vectors respectively.", so
> m = Data.mat[, 2, drop=FALSE]
> dim(m)
[1] 21 1
> levelplot(m)
plots a levelplot with 21 columns and 1 row, where the levels are determined by the values in m. The formula interface might look like
> df <- data.frame(x=1, y=1:21, z=runif(21))
> levelplot(z ~ y + x, df)
(these approaches do not quite result in the same image).
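To tie this back to the question, here is a minimal sketch of the matrix interface using the asker's palette. It assumes that aspect = "fill" is what lets the strips fill the panel (the matrix method of levelplot defaults to aspect = "iso"); the fixed seed and the hidden y axis are illustrative choices, not part of the original post.

```r
library(lattice)

set.seed(42)  # assumption: fixed seed so the example is reproducible
Data <- data.frame(x = seq(0, 20, 1), y = runif(21, 0, 1))

# One column of risk values: rows map to the x axis (Time), the single column to y
m <- matrix(Data$y, ncol = 1)

rgb.palette <- colorRampPalette(c("darkgreen", "yellow", "red"), space = "rgb")

p <- levelplot(m,
               row.values = Data$x,               # real time values on the x axis
               aspect = "fill",                   # stretch the strips over the panel
               at = seq(0, 1, length.out = 101),  # 101 breakpoints -> 100 colour bins
               col.regions = rgb.palette(100),
               xlab = "Time", ylab = "",
               scales = list(y = list(draw = FALSE)))
print(p)  # lattice objects are drawn when printed
```

With at spanning 0 to 1, red always means a risk close to 1, regardless of the range of the sampled values.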

Unfortunately I don't know much about lattice, but I noted your "Alternative solution with other method", so may I suggest another possibility:
library(plotrix)
color2D.matplot(t(Data[ , 2]), show.legend = TRUE, extremes = c("yellow", "red"))
Heaps of things to do to make it prettier. Still, a start. Of course it is important to consider the breaks in your time variable. In this very simple attempt, regular intervals are implicitly assumed, which happens to be the case in your example.
Update
Following the advice in the 'Details' section of ?color2D.matplot: "The user will have to adjust the plot device dimensions to get regular squares or hexagons, especially when the matrix is not square". Admittedly, this is quite an ugly solution.
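# Open the device first: par() only affects the currently active device
dev.new(width = 10, height = 2.5)  # portable alternative to windows()
par(mar = c(5.1, 4.1, 0, 2.1))
color2D.matplot(t(Data[ , 2]),
                show.legend = TRUE,
                axes = TRUE,
                xlab = "",
                ylab = "",
                extremes = c("yellow", "red"))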
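If an extra package is not wanted, a similar strip can be sketched with base graphics. This is only an illustration under stated assumptions: it uses image() with explicit cell edges, reuses the question's palette, and assumes regularly spaced time points (as in the example data).

```r
set.seed(42)  # assumption: fixed seed for reproducibility
Data <- data.frame(x = seq(0, 20, 1), y = runif(21, 0, 1))

z <- matrix(Data$y, ncol = 1)  # 21 rows (time) x 1 column (one strip)

# Cell edges: one more edge than cells in each direction
x_breaks <- seq(min(Data$x) - 0.5, max(Data$x) + 0.5, by = 1)
y_breaks <- c(0, 1)

pal <- colorRampPalette(c("darkgreen", "yellow", "red"))(100)

image(x_breaks, y_breaks, z, col = pal, zlim = c(0, 1),
      xlab = "Time", ylab = "", yaxt = "n")
box()
```

Fixing zlim to c(0, 1) keeps the colour scale tied to the risk scale rather than to the observed range.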

Related

R: plot() Function with type="h" Misrepresents Small Numbers ( For Larger Values of "lwd" )

I am trying to generate a plot showing the probabilities of a Binomial(10, 0.3) distribution.
I'd like to do this in base R.
The following code is the best I have come up with,
plot(dbinom(1:10, 10, 0.3), type="h", lend=2, lwd=20, yaxs="i")
My issue with the above code is the small numbers get disproportionately large bars. (See below) For example P(X = 8) = 0.00145 but the height in the plot looks like about 0.025.
It seems to be an artifact of wanting wider bars: if the lwd = 20 argument is removed, you get tiny bars, but their heights seem to be representative.

I think the problem is your choice of the lend (line-end) parameter. The 'round' (0) and 'square' (2) choices are intended for when you want a little bit of extra extension beyond the end of a segment, for example so that adjacent segments of a connected line join nicely (see the second example below).
f <- function(le) plot(dbinom(1:10, 10, 0.3),
type="h", lend = le, lwd=20, yaxs="i", main = le)
par(mfrow=c(1,3))
invisible(lapply(c("round", "butt", "square"), f))
"round", "butt", and "square" could also be specified (less mnemonically) as 0, 1, and 2 ...
x <- 1:5; y <- c(1,4,2,3,5)
f2 <- function(le) {
plot(x,y, type ="n", main = le)
segments(x[-length(x)], y[-length(x)], x[-1], y[-1],
lwd = 20, lend = le)
}
par(mfrow=c(1,3))
invisible(lapply(c("round", "butt", "square"), f2))
Here you can see that the round end caps work well, while both 'butt' and 'square' have issues. (I can't think offhand of a use case for "square", but I'm sure one exists ...) There is a good description of line-drawing parameters here (although it also doesn't suggest use cases ...)
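Putting the advice together for the original question, a minimal sketch might look like the following. It assumes the full support 0:10 is wanted (the question plotted 1:10) and uses lend = "butt" so that bar heights are not extended by the line caps; the axis labels are illustrative.

```r
k <- 0:10
p <- dbinom(k, size = 10, prob = 0.3)

# "butt" caps end exactly at the data value, so small probabilities stay small
plot(k, p, type = "h", lend = "butt", lwd = 20, yaxs = "i",
     ylim = c(0, max(p) * 1.05),
     xlab = "k", ylab = "P(X = k)")

round(p[k == 8], 5)  # 0.00145, the value quoted in the question
```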

R multi boxplot in one graph with value (quantile)

How can I create multiple boxplots in one graph with their values shown in R?
Right now I'm using this code:
boxplot(Data_frame[, 2] ~ Data_frame[, 3])
I tried to use this:
boxplot(Data_frame[, 2] ~ Data_frame[, 3])
text(y = fivenum(Data_frame$x), labels = fivenum(Data_frame$x), x = 1.25)
But only the first boxplot gets values. How can I show the values for all boxplots in one graph?
Thank you so much!

As far as I understand your question (it is not clear how the fivenum summary should be displayed), here is one solution. It presents the summary using the top axis.
x <- data.frame(
Time = c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3),
Value = c(5,10,15,20,30,50,70,80,100,5,7,9,11,15,17,19,17,19,100,200,300,400,500,700,1000,200))
boxplot(x$Value ~ x$Time)
fivenums <- aggregate(x$Value, by=list(Time=x$Time), FUN=fivenum)
labels <- apply(fivenums[,-1], 1, function(x) paste(x, collapse = ", "))  # keep all five numbers
axis(3, at=fivenums[,1],labels=labels, las=1, col.axis="red")
Of course you can additionally play with the font size or rotation for this summary. Moreover you can break the line in one place, so the label will have smaller width.
Edit
In order to get what you posted in the comment below, you can add
text(x = 3 + 0.5, y = fivenums[3,-1], labels=fivenums[3,-1])
which labels the third box; however, the same approach won't be readable for the other boxplots.
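As a sketch of how every box could be labelled at once, the statistics returned (invisibly) by boxplot() can be reused. Note the assumptions: b$stats holds the whisker ends, hinges and median, which coincide with fivenum() only when a group has no outliers, and the horizontal offset of 0.45 and the widened xlim are illustrative choices.

```r
x <- data.frame(
  Time  = rep(1:3, times = c(9, 9, 8)),
  Value = c(5, 10, 15, 20, 30, 50, 70, 80, 100,
            5, 7, 9, 11, 15, 17, 19, 17, 19,
            100, 200, 300, 400, 500, 700, 1000, 200))

# Leave some room on the right for the labels of the last box
b <- boxplot(Value ~ Time, data = x, xlim = c(0.5, 3.9))

n_groups <- ncol(b$stats)  # b$stats is a 5 x n_groups matrix
text(x = rep(seq_len(n_groups), each = 5) + 0.45,  # just right of each box
     y = as.vector(b$stats),
     labels = as.vector(b$stats),
     cex = 0.7, adj = 0)
```

As the answer points out, labels for groups with a small spread will overlap on a shared y axis; a log scale (log = "y") or separate panels may help.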

Dynamically coloring boxplot in R

I have data with the following columns: lot, sublot, size, data. I have multiple lot(s) and each lot can have multiple sublot(s). Each sublot has size(s) of 1 to 4.
I have created a boxplot for this data using the following code:
df <-
readXL("Z:/R_Files/example.xlsx",
rownames=FALSE, header=TRUE, na="", sheet="Sheet1",
stringsAsFactors=TRUE)
x11()
par(mar=c(10.1, 5.1, 4.1, 2.1))
boxplot(data ~ size*sublot*lot,
xlab="", ylab="Data", main="Data by Size, Sublot, Lot",
las=2,
data=df)
title(xlab='Size.Sublot.Lot', line=9)
I wanted to use the boxfill command to color each boxplot based on the lot#. I have seen two solutions:
create a vector and explicitly specify the colors to be used, e.g. colr = c("red", "red", "red", .... "green", "green", "green", ... "blue"). The problem with this solution is that it requires me to know a priori the number of lots in df and the number of times each color needs to be repeated.
use "ifelse" statement. The problem with this solution is that (a) I need to know the number of lots and (b) I need to create multiple nested ifelse statements.
I would prefer to create a "dynamic" solution which creates the color vector based on the number of lot entries I have in my file.
I have tried to create:
uniqlot <- unique(df$lot)
colr <- palette(rainbow(length(uniqlot)))
but am stuck since the entries in the colr vector do not repeat for the number of unique combinations of size.sublot.lot. Note: I want all boxplots for lot ABC to be colored with one color, all boxplots for lot DEF to be colored with another color etc.
I am attaching a picture of the uncolored boxplot.
Raw data (example.xlsx) can be accessed at the following link:
example.xlsx
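
This is what I would do:
n  <- length(unique(df$lot))     # number of lots = number of colours
n1 <- length(unique(df$sublot))
n2 <- length(unique(df$size))
# rainbow() returns the colours; palette() would return the previous palette
colr <- rainbow(n)
# lot varies slowest in size*sublot*lot, so repeat each colour n1*n2 times
colr <- rep(colr, each = n1*n2)
boxplot(data ~ size*sublot*lot,
        col = colr,
        xlab="", ylab="Data", main="Data by Size, Sublot, Lot",
        las=2,
        data=df)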
Using ggplot:
library(ggplot2)
df$size <- as.factor(df$size)
ggplot(df, aes(sublot, data, group = interaction(size, sublot), col = size)) +
  geom_boxplot() +
  facet_wrap(~lot, nrow = 1)
Also, you can get rid of df$size <- as.factor(df$size) if you want continuous colour.

Thanks to the pointers provided in the responses, and after digging around a little more, I was able to find a solution to my own question. I wanted to submit this piece of code in case someone needs to replicate it.
Here is a picture of the boxplot this code creates (and that I wanted to create).
df <-
readXL("Z:/R_Files/example.xlsx",
rownames=FALSE, header=TRUE, na="", sheet="Sheet1",
stringsAsFactors=TRUE)
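unqlot <- unique(df$lot)
unqsublot <- unique(df$sublot)
unqsize <- unique(df$size)
# rainbow() returns the colours; palette() would return the previous palette
cul <- rainbow(length(unqlot))
culur <- character()
# one colour per lot, repeated for every size/sublot combination in that lot
for (i in seq_along(unqlot)) {
  culur_temp <- rep(cul[i], length(unqsize) * length(unqsublot))
  culur <- c(culur, culur_temp)
}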
par(mar=c(10.1, 5.1, 4.1, 2.1))
boxplot(data ~ size*sublot*lot,
xlab="", ylab="Data", main="Data by Size, Sublot, Lot",
col = culur,
las=2,
data=df)
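Both solutions above rely on every size/sublot/lot combination appearing as a box. A slightly more defensive sketch derives the colour of each box from the box names that boxplot() itself reports. The data below are synthetic stand-ins for example.xlsx, and the approach assumes that lot names contain no "." (the separator boxplot() uses when pasting the factor levels together).

```r
set.seed(1)  # assumption: synthetic data standing in for example.xlsx
df <- expand.grid(size = 1:2,
                  sublot = c("s1", "s2"),
                  lot = c("ABC", "DEF", "GHI"),
                  rep = 1:5)
df$data <- rnorm(nrow(df))

# Ask boxplot() for the box names without drawing anything
b <- boxplot(data ~ size*sublot*lot, data = df, plot = FALSE)

# Names look like "1.s1.ABC": keep what follows the last "."
lot_of_box <- sub(".*\\.", "", b$names)

lots <- levels(factor(df$lot))
cols <- rainbow(length(lots))[match(lot_of_box, lots)]

boxplot(data ~ size*sublot*lot, data = df, col = cols, las = 2,
        xlab = "", ylab = "Data", main = "Data by Size, Sublot, Lot")
```

Because the colours are matched by name, the number of lots, sublots and sizes never has to be known in advance.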

Visualize data using histogram in R

I am trying to visualize some data and in order to do it I am using R's hist.
Below are my data:
jancoefabs <- as.numeric(as.vector(abs(Janmodelnorm$coef)))
jancoefabs
[1] 1.165610e+00 1.277929e-01 4.349831e-01 3.602961e-01 7.189458e+00
[6] 1.856908e-04 1.352052e-05 4.811291e-05 1.055744e-02 2.756525e-04
[11] 2.202706e-01 4.199914e-02 4.684091e-02 8.634340e-01 2.479175e-02
[16] 2.409628e-01 5.459076e-03 9.892580e-03 5.378456e-02
Now, as the more cunning of you might have guessed, these are the absolute values of some model's coefficients.
What I need is a histogram with the following axes:
x will show the coefficients (19 in total), along with their names.
y will show the value of each column (as breaks?), with ylim set according to the min and max of those values (or something similar).
Note that Janmodelnorm$coef simply produces the following
(Intercept) LON LAT ME RAT
1.165610e+00 -1.277929e-01 -4.349831e-01 -3.602961e-01 -7.189458e+00
DS DSA DSI DRNS DREW
-1.856908e-04 1.352052e-05 4.811291e-05 -1.055744e-02 -2.756525e-04
ASPNS ASPEW SI CUR W_180_270
-2.202706e-01 -4.199914e-02 4.684091e-02 -8.634340e-01 -2.479175e-02
W_0_360 W_90_180 W_0_180 NDVI
2.409628e-01 5.459076e-03 -9.892580e-03 -5.378456e-02
So far, after consulting ?hist, I have been playing with the code below without success, so I am starting from scratch.
# hist(jancoefabs, col="lightblue", border="pink",
# breaks=8,
# xlim=c(0,10), ylim=c(20,-20), plot=TRUE)
When plot=FALSE is set, I get a bunch of somewhat useful info about the data set. I also find it hard to use the breaks argument effectively.
Any suggestion will be appreciated. Thanks.

Rather than using hist, why not use a barplot or a standard plot? For example,
## Generate some data
set.seed(1)
y = rnorm(19, sd=5)
names(y) = c("Inter", LETTERS[1:18])
Then plot the coefficients
barplot(y)
Alternatively, you could use a scatter plot
plot(1:19, y, axes=FALSE, ylim=range(y))
axis(2)
axis(1, 1:19, names(y))
and add error bars to indicate the standard errors (see for example Add error bars to show standard deviation on a plot in R)
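Applied to the question's own numbers, the barplot idea could be sketched as follows. Only the first five coefficients from the post are used to keep the example short, and the value labels above the bars are an illustrative addition.

```r
# First five coefficients from the question (signed values)
coefs <- c("(Intercept)" = 1.165610, LON = -0.1277929, LAT = -0.4349831,
           ME = -0.3602961, RAT = -7.189458)
abs_coefs <- abs(coefs)

# barplot() returns the bar midpoints, which are handy for placing labels
bp <- barplot(abs_coefs, las = 2, ylab = "|coefficient|",
              ylim = c(0, max(abs_coefs) * 1.1))
text(bp, abs_coefs, labels = signif(abs_coefs, 3), pos = 3, cex = 0.8)
```

Since the full set of coefficients spans several orders of magnitude, a dot chart on a log scale may be easier to read than bars.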

Are you sure you want a histogram for this? A lattice barchart might be pretty nice. An example with the mtcars built-in data set.
> coef <- lm(mpg ~ ., data = mtcars)$coef
> library(lattice)
> barchart(coef, col = 'lightblue', horizontal = FALSE,
ylim = range(coef), xlab = '',
scales = list(y = list(labels = coef),
x = list(labels = names(coef))))
A base R dotchart might be good too,
> dotchart(coef, pch = 19, xlab = 'value')
> text(coef, seq(coef), labels = round(coef, 3), pos = 2)

R) Create double-labeled MDS plot

I am totally new to R.
I have expression profile data which has been preprocessed and combined. It looks like this ("exp.txt"):
STUDY_1_CANCER_1 STUDY_1_CON_1 STUDY_2_CANCER_1 STUDY_2_CANCER_2
P53 1.111 1.22 1.3 1.4
.....
I also created phenotype data. It looks like this ("pheno.txt"):
Sample Disease Study
STUDY_1_CANCER_1 Cancer GSE1
STUDY_1_CON_1 Normal GSE1
STUDY_2_CANCER_1 Cancer GSE2
STUDY_2_CON_1 Normal GSE2
Here, I tried to make MDS plot using classical cmdscale command like this.
data=read.table("exp.txt", row.names=1, header=T)
DATA=as.matrix(data)
pc=cor(DATA, method="p")
mds=cmdscale(as.dist(1-pc),2)
plot(mds)
I'd like to create a plot like this figure, with double labeling by color (Study and Disease). How should I do this?

First create an empty plot, then add the points with specified colors/shapes.
Here's an example:
require(vegan)
data(dune)
data(dune.env)
mds <- cmdscale(vegdist(dune, method='bray'))
# set colors and shapes
cols = c('red', 'blue', 'black', 'steelblue')
shps = c(15, 16, 17)
# empty plot
plot(mds, type = 'n')
# add points
points(mds, col = cols[dune.env$Management], pch = shps[dune.env$Use])
# add legend
legend('topright', col=cols, legend=levels(dune.env$Management), pch = 16, cex = 0.7)
legend('bottomright', legend=levels(dune.env$Use), pch = shps, cex = 0.7)
Note that factors are internally coded as integers, which is helpful here.
> levels(dune.env$Management)
[1] "BF" "HF" "NM" "SF"
so
cols[dune.env$Management]
will take the first entry of cols for the first factor level, and so on. Similarly for the different shapes.
Finally, add the legend. Of course this plot still needs some polishing, but that's the way to go...
BTW: Gavin Simpson has a nice blogpost about customizing ordination plots.
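
Actually, you can do this directly in the default plot command, which can take pch and col arguments as vectors. Use:
# 'pheno' is assumed to be the phenotype table read from "pheno.txt", in the same sample order as mds
with(pheno, plot(mds, col = as.numeric(factor(Study)), pch = as.numeric(factor(Disease)), asp = 1))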
You must use asp = 1 when you plot cmdscale results: both axes must be scaled similarly. You can also add xlab and ylab arguments for nicer axis labels. For adding legend and selecting plotting characters and colours, see other responses.
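For the asker's setting, the two answers can be combined into a short base R sketch. Everything below is illustrative: the expression matrix is random noise standing in for "exp.txt", the sample names extend the pattern shown in the question, and the colours and symbols are arbitrary choices. The key assumption is that the rows of pheno are in the same order as the columns of the expression matrix.

```r
set.seed(1)  # assumption: synthetic data standing in for "exp.txt"
samples <- c("STUDY_1_CANCER_1", "STUDY_1_CANCER_2", "STUDY_1_CON_1", "STUDY_1_CON_2",
             "STUDY_2_CANCER_1", "STUDY_2_CANCER_2", "STUDY_2_CON_1", "STUDY_2_CON_2")
expr <- matrix(rnorm(100 * length(samples)), ncol = length(samples),
               dimnames = list(NULL, samples))

# Phenotype table, one row per sample, same order as the columns of expr
pheno <- data.frame(
  Sample  = samples,
  Disease = factor(rep(c("Cancer", "Cancer", "Normal", "Normal"), 2)),
  Study   = factor(rep(c("GSE1", "GSE2"), each = 4)))

mds <- cmdscale(as.dist(1 - cor(expr, method = "pearson")), k = 2)

# Colour encodes Study, plotting symbol encodes Disease
study_cols  <- c("red", "blue")[as.integer(pheno$Study)]
disease_pch <- c(16, 17)[as.integer(pheno$Disease)]

plot(mds, col = study_cols, pch = disease_pch, asp = 1,
     xlab = "MDS 1", ylab = "MDS 2")
legend("topright", legend = levels(pheno$Study),
       col = c("red", "blue"), pch = 15, cex = 0.7)
legend("bottomright", legend = levels(pheno$Disease),
       pch = c(16, 17), cex = 0.7)
```

Indexing the colour and symbol vectors by the integer codes of the factors is the same trick as cols[dune.env$Management] in the vegan example.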

Resources