Scatter plot in R - r

I'm fairly new to R and need to draw a scatter plot from data like this:
residues1 residues2 covariance
1 1 0.99613318
2 1 0.98771518
3 1 0.98681384
4 1 0.99225447
residues1 and residues2 should be the x and y axes, and the covariance should be shown as a color scale rather than as height. I have previously used a 3D scatter plot, but I don't know how to plot the third variable as a color scale. Please help.
Thanks
Vibhor

I'm not sure an x-y plot colored by column 3 is the best way to visualize this. If residues2 is constant, it is probably better to leave it out altogether and plot the other two values against each other.
Perhaps you could adapt the following to your needs:
df1 <- data.frame(r1=seq(4), r2=rep(1,4),
c1=c(0.99613318, 0.98771518, 0.98681384, 0.99225447) )
### rank the covariances (used to index the palette)
df1 <- within(df1, c2 <- rank(c1))
### create the plot axes (the default points are overdrawn below)
with(df1, plot(r1,r2, xlab="residues_1", ylab="residues_2", cex.lab=1.5))
### strongest red for the largest covariance
with(df1, points(r1, r2, cex=15, pch=19, col = rev(heat.colors(4))[c2] ))
### make legend
l1 <- as.matrix(df1[ ,"c1"])
graphics::legend("topright", legend=l1, lty=1, title="covariance", lwd=3,
col = rev(heat.colors(4))[df1$c2], cex=2)
giving:
(I've made the image elements a bit oversize, and manually adjusted dimensions before saving as .png in order to display better on here).
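If a continuous color scale is preferred over ranks, one possible sketch (not part of the original answer) maps each covariance onto a palette with colorRampPalette() and cut(). The number of color steps (n_col) and the blue-to-red ramp are arbitrary assumptions.

```r
# Data from the question
df1 <- data.frame(r1 = 1:4, r2 = rep(1, 4),
                  c1 = c(0.99613318, 0.98771518, 0.98681384, 0.99225447))

# Assumption: 50 colour steps running from blue (low) to red (high)
n_col <- 50
pal <- colorRampPalette(c("blue", "red"))(n_col)

# Bin each covariance into one of n_col equal-width intervals
idx <- as.integer(cut(df1$c1, breaks = n_col))
df1$col <- pal[idx]

# Colour each point by its covariance and list the values in a legend
plot(df1$r1, df1$r2, pch = 19, cex = 3, col = df1$col,
     xlab = "residues_1", ylab = "residues_2")
legend("topright", legend = signif(df1$c1, 4), pch = 19,
       col = df1$col, title = "covariance")
```

With only four points the legend can simply list the values; for larger data a separate color bar would probably read better.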

Related

How to put legend outside the plot area?

My problem is related to car package.
I created a kernel density plot. However, since the legend is too big, I would like to move it outside the plot area, above or below the plot.
I also tried cowplot::get_legend(), but it did not work properly.
library(car)
mtcars$g <- as.factor(mtcars$vs)
densityPlot(mtcars$mpg, mtcars$g, show.bw=T, kernel=depan, legend=list(location="topleft", title=NULL))
Probably the easiest approach is not to draw the legend with densityPlot() at all, but to add it separately with legend(). The following code is an example of how this can be done. The resulting figure looks like this:
library(car)
mtcars$g <- as.factor(mtcars$vs)
par(mar=c(4,4,4,2))
# obtaining results from kernel density and saving results
# need saved values for bandwidth in legend
# also plots the kernel densities
d <- densityPlot(mtcars$mpg,mtcars$g
,show.bw=T
,kernel=depan
,legend=F # no default legend
,col = c('black','blue')
,lty=c(1,2))
# allows legend outside of plot area to be displayed
par(xpd=T)
# defining location based on the plot coordinates from par('usr')
legend(x=mean(par('usr')[c(1,2)]) # average of range of x-axis
,y=par('usr')[4]+0.015 # top of the y axis with additional shift
,legend = c(paste('0 (bw = ',round(d$`0`['bw'][[1]],4),')',sep='') # extract bw values from saved output and
,paste('1 (bw = ',round(d$`1`['bw'][[1]],4),')',sep='')) # formatting similar to default, except with rounding bw value
,ncol=1 # change to 2 if you want entries beside each other
,lty=c(1,2) # line types, same as above
,col=c('black','blue') # colors, same as above
,lwd=1
,xjust = 0.5 # centers legend at x coordinate
,yjust = 0.5 # centers legend at y coordinate
)
par(xpd=F)
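The same idea (draw the plot without a legend, then place the legend in the margin with xpd) can be sketched with base R only, without car. This is an illustrative variant rather than the answer's method: it uses density() with its default kernel and bandwidth, and the margin width and inset value are assumptions chosen by eye.

```r
# Kernel densities of mpg for each level of vs (base R only)
groups <- split(mtcars$mpg, mtcars$vs)
dens <- lapply(groups, density)

# Widen the right margin to make room for the legend
old_par <- par(mar = c(4, 4, 2, 8))

# Common axis limits so both curves fit
x_rng <- range(sapply(dens, function(d) range(d$x)))
y_max <- max(sapply(dens, function(d) max(d$y)))

plot(dens[["0"]], xlim = x_rng, ylim = c(0, y_max),
     main = "", xlab = "mpg", col = "black", lty = 1)
lines(dens[["1"]], col = "blue", lty = 2)

# xpd = TRUE allows drawing in the margin; the negative inset pushes
# the legend to the right of the plot region
legend("topright", inset = c(-0.35, 0), xpd = TRUE,
       legend = names(dens), col = c("black", "blue"), lty = c(1, 2),
       title = "vs")

par(old_par)
```

The inset is expressed as a fraction of the plot region, so it usually needs adjusting when the device size or margins change.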

Heatmaply sidebar colors, how to show specified colors correctly?

I have a problem that might be a bug in heatmaply or plotly: the sidebar of a heatmap does not show the colors I specified. See the code example below. At the end of the code, in part 6), the first plot, drawn with the plot function (a simple plot showing the colors), shows the colors correctly (yellow and blue).
The second plot, which uses the same colors in a heatmaply sidebar, fails to show them correctly and instead shows what appear to be random colors. In a similar plot with real data there are even red and orange colors in the sidebar, although all color codes are generated from a blue-yellow color range. Any ideas what might cause this bug, and how to show sidebar colors consistent with their color codes?
# Compare cophenetic similarity between leaves in two trees, built on the full data and on a subsample of the data
# 1 ) Generate random data to build trees
set.seed(2015-04-26)
dat <- matrix(rnorm(10 * 50), 10, 50) # Matrix with 50 columns (rnorm(100) would be recycled to fill the 500 cells)
datSubSample <- dat[, sample(ncol(dat), 30)] # Matrix with 30 columns sampled from the 50
dat_dist1 <- dist(datSubSample)
dat_dist2 <- dist(dat)
hc1 <- hclust(dat_dist1)
hc2 <- hclust(dat_dist2)
# 2) Build two dendrograms: dendrogram1 from the subsample (30 of the 50 columns), dendrogram2 from all data
dendrogram1 <- as.dendrogram(hc1)
dendrogram2 <- as.dendrogram(hc2)
# 3) For each leaf in a tree, get the cophenetic distance matrix;
# each column holds the distances from that leaf to all others in the same tree
cophDistanceMatrix1 <- as.data.frame(as.matrix(cophenetic(dendrogram1)))
cophDistanceMatrix2 <- as.data.frame(as.matrix(cophenetic(dendrogram2)))
# 4) For each leaf, calculate the correlation between its cophenetic distances to all other leaves in the two trees
corPerLeave <- NULL # Vector to store the correlation for each leaf
for (leave in colnames(cophDistanceMatrix1)){
  # named leafCor so that the cor() function is not shadowed
  leafCor <- cor(cophDistanceMatrix2[leave], cophDistanceMatrix1[leave])
  corPerLeave <- c(corPerLeave, unname(leafCor))
}
# 5) Convert cophenetic correlation to color to show in side bar of a heatmap
corPerLeave <- corPerLeave / max(corPerLeave) #Scale 0 to 1 correlation
byPal <- colorRampPalette(c('yellow', 'blue')) #blue yellow color palette, low correlation = yellow
colCopheneticCor <- byPal(20)[as.numeric(cut(corPerLeave, breaks =20))]
# 6) Plot heatmap with dendrogram with side bar that shows cophenetic correlation for each leave
row_dend <- dendrogram2
x <- as.matrix(dat_dist2)
#### The plots below use the same color codes; the base plot works, but heatmaply shows wrong colors
plot(x = 1:length(colCopheneticCor), y = 1:length(colCopheneticCor), col = colCopheneticCor)
library(heatmaply)
heatmaply(x, colD = row_dend, row_side_colors = colCopheneticCor)
Found a solution: heatmaply's built-in row_side_palette parameter accepts a color function. The minimal example below can be combined with the code in the question to show a heatmap whose sidebar colors each leaf/species by its cophenetic correlation. Note that m and fileName1 are not defined in the question's code and must be set beforehand.
byPal <- colorRampPalette(c('red','blue')) # Two-color palette function to be used in the sidebar
heatmaply(m, colD = row_dend, file = fileName1, plot_method = "plotly", colorscale = 'Viridis', row_side_palette = byPal,
          row_side_colors = data.frame("Correlation cophenetic distances" = corPerLeave, check.names = FALSE))
One problem I have not solved yet is how to show a continuous colorbar in the legend. Any suggestions?
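To check a color vector independently of heatmaply, it can be drawn as a strip in base graphics. The sketch below uses made-up correlation values (corDemo is hypothetical, standing in for corPerLeave) and the same yellow-to-blue binning as step 5 of the question.

```r
# Hypothetical correlation values standing in for corPerLeave
corDemo <- c(0.20, 0.55, 0.90, 1.00, 0.35)

# Same mapping as in the question: 20 bins, yellow = low, blue = high
byPal <- colorRampPalette(c("yellow", "blue"))
binIdx <- as.numeric(cut(corDemo, breaks = 20))
colDemo <- byPal(20)[binIdx]

# Draw one cell per leaf, each filled with its assigned colour
image(x = 1, y = seq_along(colDemo),
      z = matrix(seq_along(colDemo), nrow = 1),
      col = colDemo, axes = FALSE, xlab = "", ylab = "correlation")
axis(2, at = seq_along(colDemo), labels = round(corDemo, 2), las = 1)
```

If the strip looks right here but not in the heatmap sidebar, the color vector itself is fine and the difference lies in how the plotting function interprets it.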

Plot continuous data with discrete colors

I found some similar questions, but the answers didn't solve my problem.
I am trying to plot a time series of two variables as a scatter plot, using the date to color the points. In this example I created a simple dataset (see below), and I want to plot all data with time steps in the 1960s, 1970s, 1980s and 1990s in one colour per decade.
Using the standard plot command (plot(x,y,...)) it works as it should, but when I try the ggplot2 library something strange happens; I guess I am missing something. Does anyone have an idea how to solve this and generate a correct plot?
Here is my code using the standard plot command with a colorbar:
# generate data frame with test data
x <- seq(1,40)
y <- seq(1,40)
year <- c(rep(seq(1960,1969),2),seq(1970,1989,2),seq(1990,1999))
df <- data.frame(x,y,year)
# define intervals and assign a color to each interval
myinterval <- seq(1959,1999,10)
mycolors <- rainbow(4)
colbreaks <- findInterval(df$year, vec = myinterval, left.open = T)
# basic plot
layout(array(1:2,c(1,2)),widths =c(5,1)) # divide the device area in two panels
par(oma=c(0,0,0,0), mar=c(3,3,3,3))
plot(x,y,pch=20,col = mycolors[colbreaks])
# add colorbar
ncols <- length(myinterval)-1
colbarlabs <- seq(1960,2000,10)
par(mar=c(5,0,5,5))
image(t(array(1:ncols, c(ncols,1))), col=mycolors, axes=F)
box()
axis(4, at=seq(0.5/(ncols-1)-1/(ncols-1),1+1/(ncols-1),1/(ncols-1)), labels=colbarlabs, cex.axis=1, las=1)
abline(h=seq(0.5/(ncols-1),1,1/(ncols-1)))
mtext("year",side=3,line=0.5,cex=1)
Since I would like to use the ggplot2 package, as I do for my other plots, I tried this version:
# plot with ggplot
require(ggplot2)
ggplot(df, aes(x=x,y=y,color=year)) + geom_point() +
scale_colour_gradientn(colours= mycolors[colbreaks])
but it didn't work the way I thought it would. Obviously, something is wrong with the color coding, and the colorbar looks strange as well. I also tried scale_color_manual and scale_color_gradient2, but I got more errors (Error in continuous_scale).
Any idea how to solve this and generate a plot that matches the standard plot, including a colorbar?
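One possible approach, sketched here and not taken from the thread: because year is numeric, ggplot2 gives it a continuous scale, and scale_colour_gradientn() interpolates between all 40 entries of mycolors[colbreaks]. Converting the years to a factor of decades with cut() yields a discrete scale, which scale_colour_manual() accepts. The decade column and its labels are names introduced for this example.

```r
# Test data from the question
x <- seq(1, 40)
y <- seq(1, 40)
year <- c(rep(seq(1960, 1969), 2), seq(1970, 1989, 2), seq(1990, 1999))
df <- data.frame(x, y, year)

# Bin the years into decades; intervals are (1959,1969], (1969,1979], ...
myinterval <- seq(1959, 1999, 10)
mycolors <- rainbow(4)
df$decade <- cut(df$year, breaks = myinterval,
                 labels = c("1960s", "1970s", "1980s", "1990s"))

# A factor gets a discrete scale, so one colour per decade can be set manually
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = x, y = y, colour = decade)) +
    geom_point() +
    scale_colour_manual(values = setNames(mycolors, levels(df$decade)),
                        name = "year")
  print(p)
}
```

The legend then shows four discrete keys instead of a gradient bar, which matches the four-colour bar of the base plot.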

How to change values on y-axis for lattice xyplot

I have an xyplot in lattice that shows four different series. The plot looks like this right now. The values for the pink line range from 1 to 15000, whereas the values for the other lines range from 20 to 300. This is why all lines other than the pink one seem static. They do fluctuate, but I feel the graph isn't showing that properly because of the y-axis. Is there a way to shorten the y-axis so that the graph represents the other lines better as well?
This is how it looks when I don't plot the pink line at all. It shows that there are fluctuations, which I'd like to display.
If you can use base graphics instead of lattice, it is quite simple. The code below is vastly simplified from one of my own plots. You will have to fiddle a little with it to add two more lines.
# plot the first series from a data frame; ylab goes on side 2 (left),
# and the scale is determined automatically from the data
plot(df[c(4,5)], type = "s", col = "blue", main = "Battery Life",
     xlab="minutes", ylab="percent")
# start a second plot on top of the first
par(new=TRUE)
# plot the second series from the data frame, without axes or labels
plot(df[c(4,6)], type = "s", col = "red", axes = FALSE, xlab = NA, ylab = NA)
# create the axis for side 4 (right); its scale is determined automatically from the data
axis(side = 4)
# make the ylab for side 4
mtext(side = 4, line = 3, "Slope ( minutes)")
You can use the latticeExtra package to create a graph with two separate y-axes.
As the comments suggest, I would rather create two separate plots. It's a cleaner solution.
As an alternative, you could add a conditioning variable to your data ("magnitude" or similar) that groups the data into suitable chunks. Then you could present your data as shown below.
library("lattice")
library("latticeExtra")
dat1 <- data.frame(x=1:100, y1=rep(1:10,10), y2=rep(100:91,10))
dat2 <- data.frame(x=1:200, y=c(rep(1:10,10), rep(100:91,10)),
z=c(rep("small",100), rep("huge",100)))
p1 <- xyplot(y1~x, data=dat1, type="l")
p2 <- xyplot(y2~x, data=dat1, type="l")
doubleYScale(p1, p2) # 2 y-axis: bad
xyplot(y ~ x | z, data=dat2, type="l", scales="free") # 2 plots: good
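The conditioning variable does not have to be typed by hand. As a rough sketch with hypothetical data (the wide data frame, its column names, and the 1000 cutoff are all assumptions for illustration), it can be derived from each series' maximum after stacking the data into long format:

```r
# Hypothetical wide data: one large-valued series and two small ones
wide <- data.frame(x = 1:50,
                   a = seq(1, 15000, length.out = 50),
                   b = 20 + 10 * sin((1:50) / 5),
                   c = 300 - (1:50))

# Stack into long format: one row per (x, series) pair
long <- data.frame(x = rep(wide$x, 3),
                   y = c(wide$a, wide$b, wide$c),
                   series = rep(c("a", "b", "c"), each = nrow(wide)))

# Assumption: a series whose maximum exceeds 1000 counts as "huge"
series_max <- tapply(long$y, long$series, max)
is_huge <- as.numeric(series_max[long$series]) > 1000
long$magnitude <- ifelse(is_huge, "huge", "small")

# One panel per magnitude, each with its own y-axis
if (requireNamespace("lattice", quietly = TRUE)) {
  library(lattice)
  print(xyplot(y ~ x | magnitude, data = long, groups = series,
               type = "l", scales = list(y = list(relation = "free")),
               auto.key = TRUE))
}
```

Freeing only the y scale keeps the x-axis shared between panels, so the series stay aligned in time.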

R Scatter Plot: symbol color represents number of overlapping points

Scatter plots can be hard to interpret when many points overlap, because the overlap obscures the density of data in a particular region. One solution is to use semi-transparent colors for the plotted points, so that a more opaque region indicates that many observations are present at those coordinates.
Below is an example of my black and white solution in R:
MyGray <- rgb(t(col2rgb("black")), alpha=50, maxColorValue=255)
x1 <- rnorm(n=1E3, sd=2)
x2 <- x1*1.2 + rnorm(n=1E3, sd=2)
dev.new(width=3.5, height=5)
par(mfrow=c(2,1), mar=c(2.5,2.5,0.5,0.5), ps=10, cex=1.15)
plot(x1, x2, ylab="", xlab="", pch=20, col=MyGray)
plot(x1, x2, ylab="", xlab="", pch=20, col="black")
However, I recently came across this article in PNAS, which took a similar approach but used heat-map coloration, as opposed to opacity, as an indicator of how many points were overlapping. The article is Open Access, so anyone can download the .pdf and look at Figure 1, which contains a relevant example of the graph I want to create. The methods section of the paper indicates that the analyses were done in Matlab.
For the sake of convenience, here is a small portion of Figure 1 from the above article:
How would I create a scatter plot in R that used color, not opacity, as an indicator of point density?
For starters, R users can access this Matlab color scheme through the function tim.colors() in the fields package (install.packages("fields")).
Is there an easy way to make a figure similar to Figure 1 of the above article, but in R? Thanks!
One option is to use densCols() to extract kernel densities at each point. Mapping those densities to the desired color ramp, and plotting points in order of increasing local density gets you a plot much like those in the linked article.
## Data in a data.frame
x1 <- rnorm(n=1E3, sd=2)
x2 <- x1*1.2 + rnorm(n=1E3, sd=2)
df <- data.frame(x1,x2)
## Use densCols() output to get density at each point
x <- densCols(x1,x2, colramp=colorRampPalette(c("black", "white")))
df$dens <- col2rgb(x)[1,] + 1L
## Map densities to colors
cols <- colorRampPalette(c("#000099", "#00FEFF", "#45FE4F",
"#FCFF00", "#FF9400", "#FF3100"))(256)
df$col <- cols[df$dens]
## Plot it, reordering rows so that densest points are plotted on top
plot(x2~x1, data=df[order(df$dens),], pch=20, col=col, cex=2)
You can get a similar effect with hexagonal binning: divide the region into hexagons and color each hexagon according to the number of points it contains. The hexbin package has functions to do this, and there are also functions for it in the ggplot2 package.
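The binning idea can be sketched in base R, with rectangular bins standing in for hexagons (so this is not what hexbin itself draws). The 30 x 30 grid and the fixed seed are assumptions; the color ramp is the one used in the densCols() answer.

```r
set.seed(1)
x1 <- rnorm(n = 1e3, sd = 2)
x2 <- x1 * 1.2 + rnorm(n = 1e3, sd = 2)

# Assumption: a 30 x 30 grid of equal-width rectangular bins
n_bins <- 30
xb <- seq(min(x1), max(x1), length.out = n_bins + 1)
yb <- seq(min(x2), max(x2), length.out = n_bins + 1)

# Assign every point to a bin and count the points per bin
ix <- cut(x1, breaks = xb, include.lowest = TRUE)
iy <- cut(x2, breaks = yb, include.lowest = TRUE)
counts <- matrix(as.numeric(table(ix, iy)), nrow = n_bins)
total <- sum(counts)

# Leave empty bins uncoloured, then colour the rest by count
counts[counts == 0] <- NA
cols <- colorRampPalette(c("#000099", "#00FEFF", "#45FE4F",
                           "#FCFF00", "#FF9400", "#FF3100"))(64)
image(xb, yb, counts, col = cols, xlab = "x1", ylab = "x2")
```

Hexagons tile the plane more evenly than rectangles, which is the main reason to prefer hexbin when the package is available.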
You can use smoothScatter for this.
colramp = colorRampPalette(c('white', 'blue', 'green', 'yellow', 'red'))
smoothScatter(x1, x2, colramp=colramp)
