I am wondering if you could help me out with the following question:
I have a correlation matrix and a third variable (continuous) for every possible pair in the correlation matrix.
Here is a toy example:
set.seed(1234)
x <- rnorm(1000,2,1)
y <- 0.1*x+rnorm(1000,1,1)
z <- y+rnorm(1000)
third.dimension <- c("(x,y)" = 0.3, "(x,z)" = 0.5, "(y,z)"= 1)
my.df <- data.frame(x,y,z)
First, I want to create a heatmap of that correlation matrix which I do with
heatmap(cor(my.df))
Next, I would like a coloured dot within each "cell" of the heatmap, with the colour depending on the value of the third dimension for the respective pair. For example, a black dot if the value is between 0 and 0.49, a grey dot if it is between 0.5 and 1, and so on.
Hence, where I have the correlation between z and y, say, I would have a grey dot painted in the corresponding "cell" of the correlation matrix.
Thanks in advance for the help!
This should work for you:
set.seed(1234)
x <- rnorm(1000,2,1)
y <- 0.1*x+rnorm(1000,1,1)
z <- y+rnorm(1000)
third.dimension <- c("(x,y)" = 0.3, "(x,z)" = 0.5, "(y,z)"= 1)
my.df <- data.frame(x,y,z)
# required function: map numeric values to colors via equal-width bins
val2col <- function(z, zlim, col = heat.colors(12), breaks){
  if(!missing(breaks)){
    if(length(breaks) != (length(col)+1)){stop("must have one more break than color")}
  }
  if(missing(breaks) && !missing(zlim)){
    breaks <- seq(zlim[1], zlim[2], length.out=(length(col)+1))
  }
  if(missing(breaks) && missing(zlim)){
    zlim <- range(z, na.rm=TRUE)
    breaks <- seq(zlim[1], zlim[2], length.out=(length(col)+1))
  }
  CUT <- cut(z, breaks=breaks, include.lowest = TRUE)
  colorlevels <- col[match(CUT, levels(CUT))] # assign a color to each value according to its bin
  return(colorlevels)
}
# plot
COR <- list(
x = seq(ncol(my.df)),
y = seq(ncol(my.df)),
z = cor(my.df)
)
image(COR, xaxt="n", yaxt="n")
axis(1, at=COR$x, labels = names(my.df))
axis(2, at=COR$y, labels = names(my.df))
box()
COR$col <- val2col(c(COR$z), col = grey.colors(21), zlim=c(0,1))
points(expand.grid(x=COR$x, y=COR$y), col=COR$col, pch=16, cex=3)
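The code above colours each dot by the correlation itself. To colour the dots by the separate third dimension instead, one option is to spread the named vector into a symmetric matrix first and then bin it. The following is a minimal sketch, assuming the pair names always follow the "(a,b)" pattern from the question; `third`, `bin` and `dot.col` are names introduced here for illustration.

```r
# Data from the question
set.seed(1234)
x <- rnorm(1000, 2, 1)
y <- 0.1 * x + rnorm(1000, 1, 1)
z <- y + rnorm(1000)
third.dimension <- c("(x,y)" = 0.3, "(x,z)" = 0.5, "(y,z)" = 1)
my.df <- data.frame(x, y, z)
vars <- names(my.df)

# Spread the named vector into a symmetric matrix (the diagonal stays NA)
third <- matrix(NA_real_, length(vars), length(vars), dimnames = list(vars, vars))
for (nm in names(third.dimension)) {
  pair <- strsplit(gsub("[()]", "", nm), ",")[[1]]  # "(x,y)" -> c("x", "y")
  third[pair[1], pair[2]] <- third[pair[2], pair[1]] <- third.dimension[[nm]]
}

# Bin the values: [0, 0.5) -> black, [0.5, 1] -> grey; NA cells get no dot
bin <- cut(c(third), breaks = c(0, 0.5, 1), right = FALSE,
           include.lowest = TRUE, labels = FALSE)
dot.col <- c("black", "grey")[bin]

# Heatmap of the correlations with one dot per off-diagonal cell
idx <- seq_along(vars)
image(idx, idx, cor(my.df), xaxt = "n", yaxt = "n", xlab = "", ylab = "")
axis(1, at = idx, labels = vars)
axis(2, at = idx, labels = vars)
box()
points(expand.grid(x = idx, y = idx), col = dot.col, pch = 16, cex = 3)
```

Because `expand.grid()` varies its first argument fastest, the order of the points matches the column-major order of `c(third)`, so each colour lands in the cell of its pair.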
The type of plot I am trying to achieve in R is known variously as a moving distribution plot, a joy plot, or a ridgeline plot.
There is already a question in Stackoverflow whose recorded answer explains how to do it using ggplot: How to reproduce this moving distribution plot with R?
However, for learning purposes, I am trying to achieve the same using only base R plots (no lattice, no ggplot, no other plotting package).
In order to get started, I generated the following fake data to play with:
set.seed(2020)
shapes <- c(0.1, 0.5, 1, 2, 4, 5, 6)
dat <- lapply(shapes, function(x) rbeta(1000, x, x))
names(dat) <- letters[1:length(shapes)]
Then using mfrow I can achieve this:
par(mfrow=c(length(shapes), 1))
par(mar=c(1, 5, 1, 1))
for(i in 1:length(shapes))
{
values <- density(dat[[names(dat)[i]]])
plot(NA,
xlim=c(min(values$x), max(values$x)),
ylim=c(min(values$y), max(values$y)),
axes=FALSE,
main="",
xlab="",
ylab=letters[i])
polygon(values, col="light blue")
}
The result I get is:
Clearly, using mfrow (or even layout) here is not flexible enough, and it also does not allow for overlaps between the distributions.
Then, the question: how can I reproduce that type of plot using only base R plotting functions?
Here's a base R solution. First, we calculate all the density values and then manually offset each one along the y axis:
vals <- Map(function(x, g, i) {
with(density(x), data.frame(x,y=y+(i-1), g))
}, dat, names(dat), seq_along(dat))
Then, to plot, we calculate the overall range, draw an empty plot, and then draw the densities (in reverse order, so that they stack):
xrange <- range(unlist(lapply(vals, function(d) range(d$x))))
yrange <- range(unlist(lapply(vals, function(d) range(d$y))))
plot(0,0, type="n", xlim=xrange, ylim=yrange, yaxt="n", ylab="", xlab="Value")
for(d in rev(vals)) {
with(d, polygon(x, y, col="light blue"))
}
axis(2, at=seq_along(dat)-1, names(dat))
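With a fixed offset of 1 per group, how much the ridges overlap depends on how tall the densities happen to be. A possible variant is to rescale all densities by a common factor so that the tallest one spans a chosen multiple of the row spacing. This is only a sketch: `overlap`, `dens`, `peak` and `ridges` are names introduced here, and the value 1.5 is an arbitrary assumption.

```r
set.seed(2020)
shapes <- c(0.1, 0.5, 1, 2, 4, 5, 6)
dat <- lapply(shapes, function(s) rbeta(1000, s, s))
names(dat) <- letters[seq_along(shapes)]

overlap <- 1.5  # assumed knob: height of the tallest ridge, in units of row spacing
dens <- lapply(dat, density)
peak <- max(vapply(dens, function(d) max(d$y), numeric(1)))

# Rescale every density by the same factor, then lift ridge i to baseline i - 1
ridges <- Map(function(d, i) {
  data.frame(x = d$x, y = d$y / peak * overlap + (i - 1))
}, dens, seq_along(dens))

xrange <- range(unlist(lapply(ridges, function(r) r$x)))
plot(0, 0, type = "n", xlim = xrange, ylim = c(0, length(ridges) - 1 + overlap),
     yaxt = "n", xlab = "Value", ylab = "")
for (r in rev(ridges)) polygon(r$x, r$y, col = "light blue")  # topmost ridge first
axis(2, at = seq_along(ridges) - 1, labels = names(ridges))
```

Values of `overlap` above 1 let ridges reach into the row above; values at or below 1 keep every ridge inside its own row.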
d = lapply(dat, function(x){
tmp = density(x)
data.frame(x = tmp$x, y = tmp$y)
})
d = lapply(seq_along(d), function(i){
tmp = d[[i]]
tmp$grp = names(d)[i]
tmp
})
d = do.call(rbind, d)
grp = unique(d$grp)
n = length(grp)
spcx = 5
spcy = 3
rx = range(d$x)
ry = range(d$y)
rx[2] = rx[2] + n/spcx
ry[2] = ry[2] + n/spcy
graphics.off()
plot(1, type = "n", xlim = rx, ylim = ry, axes = FALSE, ann = FALSE)
lapply(seq_along(grp), function(i){
x = grp[i]
abline(h = (n - i)/spcy, col = "grey")
axis(2, at = (n - i)/spcy, labels = grp[i])
polygon(d$x[d$grp == x] + (n - i)/spcx,
d$y[d$grp == x] + (n - i)/spcy,
col = rgb(0.5, 0.5, 0.5, 0.5))
})
I have data cdecn:
set.seed(0)
cdecn <- sample(1:10,570,replace=TRUE)
a <- rnorm(cdecn,mean(cdecn),sd(cdecn))
I have created a plot which displays the cumulative probabilities.
aprob <- ecdf(a)
plot(aprob)
I am wondering how I can switch the x-axis and y-axis to get a new plot, i.e., the inverse of ECDF.
Also, for the new plot, is there a way to add a vertical line through the point where my curve intersects 0?
We can do the following. The comments in the code explain each step.
## reproducible example
set.seed(0)
cdecn <- sample(1:10,570,replace=TRUE)
a <- rnorm(cdecn,mean(cdecn),sd(cdecn)) ## random samples
a <- sort(a) ## sort samples in ascending order
e_cdf <- ecdf(a) ## ecdf function
e_cdf_val <- 1:length(a) / length(a) ## the same as: e_cdf_val <- e_cdf(a)
par(mfrow = c(1,2))
## ordinary ecdf plot
plot(a, e_cdf_val, type = "s", xlab = "ordered samples", ylab = "ECDF",
main = "ECDF")
## switch axes to get 'inverse' ECDF
plot(e_cdf_val, a, type = "s", xlab = "ECDF", ylab = "ordered sample",
main = "'inverse' ECDF")
## where the curve intersects 0
p <- e_cdf(0)
## [1] 0.01578947
## highlight the intersection point
points(p, 0, pch = 20, col = "red")
## add a dotted red vertical line through intersection
abline(v = p, lty = 3, col = "red")
## display value p to the right of the intersection point
## rounded to 4 digits
text(p, 0, pos = 4, labels = round(p, 4), col = "red")
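An alternative that avoids sorting by hand is `quantile()` with `type = 1`, which R documents as the inverse of the empirical distribution function. The sketch below uses a stand-in sample (the mean and sd are assumptions, not the question's exact values); `p`, `q` and `p0` are names introduced here.

```r
set.seed(0)
a <- rnorm(570, mean = 5.5, sd = 2.9)  # assumed stand-in for the question's data

p <- seq(0.01, 0.99, by = 0.01)
q <- quantile(a, probs = p, type = 1, names = FALSE)  # type 1: inverse of the ECDF

plot(p, q, type = "s", xlab = "ECDF", ylab = "sample quantile",
     main = "'inverse' ECDF via quantile()")

# Mark where the curve crosses 0: this is the ECDF evaluated at 0
p0 <- ecdf(a)(0)
abline(v = p0, lty = 3, col = "red")
```

Every value in `q` is one of the observed samples, so the step curve has the same jumps as the swapped-axes plot above, just evaluated on a regular grid of probabilities.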
cdecn <- sample(1:10,570,replace=TRUE)
a <- rnorm(cdecn,mean(cdecn),sd(cdecn))
aprob <- ecdf(a)
plot(aprob)
# Switch the x and y axes
k <- knots(aprob) # sorted unique sample values
x <- aprob(k) # ECDF value at each knot
plot(x = x, y = k, xlab = "Fn(y)", ylab = "y")
# Add the points and a straight line from (min(x), 0) to (max(x), max(y))
my_line <- function(x,y,...){
  points(x,y,...)
  segments(min(x), 0, max(x), max(y),...)
}
my_line(x = x, y = k)
The "straight line at x==0" bit makes me suspect that you want a QQplot:
qqnorm(a)
qqline(a)
I would like to make a plot like the one in this image, but I don't know how. I wrote the code below, but it does not produce that plot. The goal is to add density lines to my original plot (the mass-SFR relation), with one density curve for every 0.3 units in x: one from 7 to 7.3, the next from 7.3 to 7.6, and so on. With the code below (which continues in the same way up to x = 12), I obtain this plot:
plot(SFsl$MEDMASS, SFR_SalpToMPA, xlim = c(7, 12),
     ylim = c(-3, 2.5), ylab = "log(SFR (M(sun)/yr))",
     xlab = "log(M(star)/M(sun))")
title("Relacion Masa-SFR")
par(new=TRUE)
# select the points in the first bin, 7 <= mass <= 7.3
FCUTsfrsl1=(SFsl$MEDMASS >= 7 & SFsl$MEDMASS <=7.3 &
            SFR_SalpToMPA < 2 & SFR_SalpToMPA > -3)
x <- SFR_SalpToMPA[FCUTsfrsl1]
y <- density(x)
plot(y$y, y$x, type='l', ylim = c(-3, 2.5), col="red",
     ylab="", xlab="", axes=FALSE)
I did what you said, but I obtained this plot. I don't know if I did something wrong.
Since I don't have your data, I had to make some up. If this does what you want, I think you can adapt it to your actual data.
set.seed(7)
x <- runif(1000, 7, 12)
y <- runif(1000, -3, 3)
DF <- data.frame(x = x, y = y)
plot(DF$x, DF$y)
# Cut the x axis into 0.3 unit segments, compute the density and plot
br <- seq(7, 12.1, 0.3) # 0.3-wide bins; the last break lies past 12 so all of x is covered
intx <- cut(x, br) # intervals
intx2 <- as.factor(cut(x, br, labels = FALSE)) # intervals by code
intx3 <- split(x, intx) # x values
inty <- split(y, intx2) # corresponding y values for density calc
for (i in 1:length(intx3)) {
xx <- seq(min(intx3[[i]]), max(intx3[[i]]), length.out = 512)
lines(xx, density(inty[[i]])$y, col = "red")
}
This produces the following image. You need to look closely, but there is a separate density plot for each 0.3 unit interval.
EDIT: Change the dimension that is used to compute the density
set.seed(7)
x <- runif(1000, 7, 12)
y <- runif(1000, -3, 3)
DF <- data.frame(x = x, y = y)
plot(DF$x, DF$y, xlim = c(7, 15))
# Cut the x axis into 0.3 unit segments, compute the density and plot
br <- seq(7, 12.1, 0.3) # 0.3-wide bins; the last break lies past 12 so all of x is covered
intx <- cut(x, br) # intervals
intx2 <- as.factor(cut(x, br, labels = FALSE)) # intervals by code
intx3 <- split(x, intx) # x values
inty <- split(y, intx2) # corresponding y values
# This gives the density values in the horizontal direction (desired)
# This is the change, the above is unchanged.
for (i in 1:length(intx3)) {
yy <- seq(min(inty[[i]]), max(inty[[i]]), length.out = 512)
offset <- min(intx3[[i]])
lines(density(intx3[[i]])$y + offset, yy, col = "red")
}
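Note that the loop above takes the density of the x values inside each bin. If the intent is instead the distribution of y within each x bin, drawn sideways starting at the left edge of the bin, a variant could look like the sketch below. It is an assumption about what the question wants, not a correction confirmed by the asker; `width`, `left` and `peaks` are names introduced here, and rescaling each curve to the bin width is a display choice.

```r
set.seed(7)
x <- runif(1000, 7, 12)
y <- runif(1000, -3, 3)

width <- 0.3
br <- seq(7, 12.1, by = width)            # breaks chosen to cover all of x
ys <- split(y, cut(x, br, include.lowest = TRUE))
left <- br[-length(br)]                   # left edge of each bin

plot(x, y, col = "grey", xlim = range(br))
peaks <- rep(NA_real_, length(ys))        # drawn width of each curve
for (i in seq_along(ys)) {
  if (length(ys[[i]]) < 2) next           # density() needs at least 2 points
  d <- density(ys[[i]])                   # density of y within this x bin
  w <- d$y / max(d$y) * width             # rescale so the peak spans one bin
  peaks[i] <- max(w)
  lines(left[i] + w, d$x, col = "red")
}
```

Because each curve is rescaled to its own maximum, the curves show the shape of the distribution in each bin but are not comparable in height across bins.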
Which gives:
I am trying to visualize a curve for pollination distribution. I am very new to R so please don't be upset by my stupidity.
llim <- 0
ulim <- 6.29
f <- function(x,y) {(.156812/((2*pi)*(.000005^2)*(gamma(2/.156812)))*exp(-((sqrt(x^2+y^2))/.000005)^.156812))}
integrate(function(y) {
sapply(y, function(y) {
integrate(function(x) f(x,y), llim, ulim)$value
})
}, llim, ulim)
fv <- Vectorize(f)
curve(fv, from=0, to=1000)
And I get:
Error in y^2 : 'y' is missing
I'm not quite sure what you're asking to plot. But I know you want to visualise your scalar function of two arguments.
Here are some approaches. First we define your function.
llim <- 0
ulim <- 6.29
f <- function(x,y) {
(.156812/((2*pi)*(.000005^2)*(gamma(2/.156812)))*exp(-((sqrt(x^2+y^2))/.000005)^.156812))
}
From your title I thought of the following. The function intf defined below integrates your function over the square [0,ul] x [0,ul] and returns the value. We then vectorise it and plot the integral over the square as a function of the side length of the square.
intf <- function(ul) {
integrate(function(y) {
sapply(y, function(y) {
integrate(function(x) f(x,y), 0, ul)$value
})
}, 0, ul)$value
}
fv <- Vectorize(intf)
curve(fv, from=0, to=1000)
If f is a distribution, I guess you can give this curve a (somewhat) nice probability interpretation (i.e. ~20 % probability of pollination(?) in the 200 by 200 meter square).
However, you can also do a contour plot (of the log-transformed values), which illustrates the function we are integrating above:
logf <- function(x, y) log(f(x, y))
x <- y <- seq(llim, ulim, length.out = 100)
contour(x, y, outer(x, y, logf), lwd = 2, drawlabels = FALSE)
You can also plot some profiles of the surface:
plot(1, xlim = c(llim, ulim), ylim = c(0, 0.005), xlab = "x", ylab = "f")
y <- seq(llim, ulim, length.out = 6)
for (i in seq_along(y)) {
tmp <- function(x) f(x, y = y[i])
curve(tmp, llim, ulim, add = TRUE, col = i)
}
legend("topright", lty = 1, col = seq_along(y),
legend = as.expression(paste("y = ",y)))
They need to be modified a bit to make them publication worthy, but you get the idea. Lastly, you can do some 3d plots as others have suggested.
EDIT
As per your comments, you can also do something like this:
# Define the function times radius (this time with general a and b)
# The default of a and b is as before
g <- function(z, a = 5e-6, b = .156812) {
z * (b/(2*pi*a^2*gamma(2/b)))*exp(-(z/a)^b)
}
# A function that integrates g from 0 to Z and rotates
# As g is not dependent on the angle we just multiply by 2pi
intg <- function(Z, ...) {
2*pi*integrate(g, 0, Z, ...)$value
}
# Vectorize the Z argument of intg
gv <- Vectorize(intg, "Z")
# Plot
Z <- seq(0, 1000, length.out = 100)
plot(Z, gv(Z), type = "l", lwd = 2)
lines(Z, gv(Z, a = 5e-5), col = "blue", lwd = 2)
lines(Z, gv(Z, b = .150), col = "red", lwd = 2)
lines(Z, gv(Z, a = 1e-4, b = .2), col = "orange", lwd = 2)
You can then plot the curves for the a and b you want. If either is not specified, the default is used.
Disclaimer: my calculus is rusty and I just did this off the top of my head. You should verify that I've done the rotation of the function around the axis properly.
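One way to check the rotation is to compare intg against a case with a known answer. In polar coordinates, the integral of a radially symmetric f over a disc of radius Z is 2*pi times the integral of r*f(r) from 0 to Z, which is what intg computes (note that this is a disc, not the square used earlier). The sketch below assumes a = b = 1, for which the kernel reduces to exp(-r)/(2*pi) and the disc integral has the closed form 1 - (1 + Z)*exp(-Z); these parameter values are chosen only for the check and are not from the question.

```r
# Same definitions as in the answer above
g <- function(z, a = 5e-6, b = .156812) {
  z * (b / (2 * pi * a^2 * gamma(2 / b))) * exp(-(z / a)^b)
}
intg <- function(Z, ...) {
  2 * pi * integrate(g, 0, Z, ...)$value
}

# Check case (assumed): a = b = 1, where the answer is known in closed form
Z <- c(0.5, 1, 2, 5)
numeric_val <- vapply(Z, intg, numeric(1), a = 1, b = 1)
closed_form <- 1 - (1 + Z) * exp(-Z)

print(cbind(Z, numeric_val, closed_form))
```

Agreement here only validates the rotation step. With the very small default a, the integrand is extremely peaked near 0, so the numerical integration itself may deserve a separate check.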
Base graphics has persp() for perspective plots, and the lattice package has wireframe(); both can draw 3-dimensional surfaces. If you prefer not to use a 3d-plot, you can create a contour plot using contour().
Note: I don't know if this is intentional, but your function produces a very large spike in one corner of the plot. The result is a surface that is flat for all intents and purposes, with a barely noticeable spike in one corner. This is particularly problematic with the contour plot below.
library(lattice)
x <- seq(0, 1000, length.out = 50)
y <- seq(0, 1000, length.out = 50)
First the wire frame plot:
df <- expand.grid(x=x, y=y)
df$z <- with(df, f(x, y))
wireframe(z ~ x * y, data = df)
Next the perspective plot:
dm <- outer(x, y, FUN=f)
persp(x, y, dm)
The contour plot:
contour(x, y, dm)
How would you make an image from a matrix in R?
Matrix values would correspond to pixel intensity on image (although I am just interested in 0,1 values white or black at the moment.), while column and row numbers correspond to vertical and horizontal location on the image.
By make an image I mean display it on the screen and save it as a jpg.
The easiest way to display it on the screen is with 'image':
m = matrix(runif(100),10,10)
par(mar=c(0, 0, 0, 0))
image(m, useRaster=TRUE, axes=FALSE)
You can also have a look at the raster package...
Set up a plot with no margin:
par(mar = rep(0, 4))
Image the matrix with greyscale, like spacedman's answer but completely filling the device:
m = matrix(runif(100),10,10)
image(m, axes = FALSE, col = grey(seq(0, 1, length = 256)))
Wrap that in a call to png() to create the file:
png("simpleIm.png")
par(mar = rep(0, 4))
image(m, axes = FALSE, col = grey(seq(0, 1, length = 256)))
dev.off()
If you need to do this with spatial axes (defaults to [0,1] for X and Y) then use the image.default(x, y, z, ...) form where x and y give the central positions of the pixels in z. x and y can be of length dim(z) + 1 to give corner coordinates for that convention.
Centres of pixels (this is the default for image):
x <- seq(0, 1, length = nrow(m))
y <- seq(0, 1, length = ncol(m))
image(x, y, m, col = grey(seq(0, 1, length = 256)))
Corners of pixels (need 1 extra x and y, and 0 is now the very bottom left corner):
x <- seq(0, 1, length = nrow(m) + 1)
y <- seq(0, 1, length = ncol(m) + 1)
image(x, y, m, col = grey(seq(0, 1, length = 256)))
Note that as of R 2.13, image.default has a useRaster argument. It draws with the very efficient, newish graphics function rasterImage instead of the old approach, which is effectively multiple calls to rect under the hood to draw every pixel as a polygon.
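For the 0/1 case in the question, saved as a JPEG rather than a PNG, something like the following sketch may help. It assumes the convention 0 = black and 1 = white, uses a small hypothetical matrix, and writes to a temporary file; `oriented` and `out` are names introduced here. It also flips the matrix first, since image() draws rows along the x axis starting from the bottom-left corner.

```r
# Hypothetical 0/1 matrix; assumed convention: 0 = black, 1 = white
m <- matrix(c(1, 0, 0,
              0, 1, 0), nrow = 2, byrow = TRUE)

# Flip the rows and transpose so the picture matches the printed matrix,
# with m[1, 1] in the top-left corner
oriented <- t(m[nrow(m):1, , drop = FALSE])

out <- file.path(tempdir(), "matrix.jpg")  # assumed output location
if (capabilities("jpeg")) {                # skip if this R build lacks JPEG support
  jpeg(out, width = 300, height = 200)
  par(mar = rep(0, 4))
  image(oriented, zlim = c(0, 1), col = c("black", "white"), axes = FALSE)
  dev.off()
}
```

Fixing `zlim = c(0, 1)` keeps the colour mapping stable even if the matrix happens to contain only zeros or only ones.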
I plot a matrix (where the vertical axis increases going down) in one of two ways. Below is the first way, using heatmap.2(). It gives more control over how the numeric values are formatted in the plot (see the formatC statement below), but is a little harder to deal with when changing the layout.
library(gplots)
#Build the matrix data to look like a correlation matrix
x <- matrix(rnorm(64), nrow=8)
x <- (x - min(x))/(max(x) - min(x)) #Scale the data to be between 0 and 1
for (i in 1:8) x[i, i] <- 1.0 #Make the diagonal all 1's
#Format the data for the plot
xval <- formatC(x, format="f", digits=2)
pal <- colorRampPalette(c(rgb(0.96,0.96,1), rgb(0.1,0.1,0.9)), space = "rgb")
#Plot the matrix
x_hm <- heatmap.2(x, Rowv=FALSE, Colv=FALSE, dendrogram="none",
                  main="8 X 8 Matrix Using Heatmap.2", xlab="Columns", ylab="Rows",
                  col=pal, tracecol="#303030", trace="none", cellnote=xval,
                  notecol="black", notecex=0.8, keysize = 1.5, margins=c(5, 5))
You can create a heatmap of the matrix.
library(pheatmap)
# Create a 10x10 matrix of random numbers
m = matrix(runif(100), 10, 10)
# Save output to jpeg
jpeg("heatmap.jpg")
pheatmap(m, cluster_rows = FALSE, cluster_cols = FALSE, color = gray.colors(2, start = 1, end = 0))
dev.off()
See ?pheatmap for more options.
Try levelplot:
library(lattice)
m <- matrix(runif(100), 10, 10) # example matrix; use your own here
levelplot(m)
Here's the second way (again, where the vertical axis increases going down). This method is easier to lay out, but gives less control over the format of the numeric values displayed in the plot.
library(plotrix)
#Build the matrix data to look like a correlation matrix
n <- 8
x <- matrix(runif(n*n), nrow=n)
xmin <- 0
xmax <- 1
for (i in 1:n) x[i, i] <- 1.0 #Make the diagonal all 1's
#Generate the palette for the matrix and the legend. Generate labels for the legend
palmat <- color.scale(x, c(1, 0.4), c(1, 0.4), c(0.96, 1))
palleg <- color.gradient(c(1, 0.4), c(1, 0.4), c(0.96, 1), nslices=100)
lableg <- c(formatC(xmin, format="f", digits=2), formatC(1*(xmax-xmin)/4, format="f", digits=2), formatC(2*(xmax-xmin)/4, format="f", digits=2), formatC(3*(xmax-xmin)/4, format="f", digits=2), formatC(xmax, format="f", digits=2))
#Set up the plot area and plot the matrix
par(mar=c(5, 5, 5, 8))
color2D.matplot(x, cellcolors=palmat, main=paste(n, " X ", n, " Matrix Using Color2D.matplot", sep=""), show.values=2, vcol=rgb(0,0,0), axes=FALSE, vcex=0.7)
axis(1, at=seq(1, n, 1)-0.5, labels=seq(1, n, 1), tck=-0.01, padj=-1)
#In the axis() statement below, note that the labels are decreasing. This is because
#the above color2D.matplot() statement has "axes=FALSE" and a normal axis()
#statement was used.
axis(2, at=seq(1, n, 1)-0.5, labels=seq(n, 1, -1), tck=-0.01, padj=0.7)
#Plot the legend
pardat <- par()
color.legend(pardat$usr[2]+0.5, 0, pardat$usr[2]+1, pardat$usr[2], paste(" ", lableg, sep=""), palleg, align="rb", gradient="y", cex=0.7)
With ggplot2:
library(tidyverse)
n <- 12
m <- matrix(rnorm(n*n),n,n)
rownames(m) <- colnames(m) <- 1:n
df <- as.data.frame(m) %>% gather(key='y', value='val')
df$y <- as.integer(df$y)
df$x <- rep(1:n, n)
ggplot(df, aes(x, y, fill= val)) +
geom_tile() +
geom_text(aes(x, y, label=round(val,2))) +
scale_fill_gradient(low = "white", high = "red") +
theme_bw()
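The reshaping step before the ggplot() call (one row per cell, with its row index, column index and value) can also be written in base R, which may be easier to follow than the gather() pipeline. This is a small sketch on a hypothetical 3 x 3 matrix; the column names mirror those used above.

```r
n <- 3
m <- matrix(seq_len(n * n), n, n)  # hypothetical example matrix

# One row per cell: x = row index, y = column index, val = cell value
df <- data.frame(x = c(row(m)), y = c(col(m)), val = c(m))
print(df)
```

The resulting data frame has the same x, y and val columns that the ggplot() call above expects.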