I am trying to put together a Plots.jl user recipe that creates several scatter plots and histograms arranged in a grid. The scatter plots should occupy the lower triangle of the grid, and the histograms go on the diagonal. Each scatter plot should have several series, shown in different colours, and the colours should be consistent across subplots.
My problem is that when I allow the user to specify custom colours, the series colours in the scatter subplots get mixed up.
Here is the (abridged) recipe code. data is a vector of length r. Each element of data is a matrix of size n_r x d; n_r may vary, d stays the same. There should be d x (d-1) / 2 scatter plots, with r series on each plot, each series having n_r points.
@recipe function my_func(data::my_type; custom_colors=nothing)
    # get d, r, ...
    for i in 1:d
        for j in 1:d
            @series begin
                subplot := (i - 1) * d + j
                if i == j
                    seriestype := :histogram
                    plot_data = # ... prepare data for histograms
                elseif j < i
                    # scatter subplot recipe
                    seriestype := :scatter
                    if custom_colors !== nothing
                        color := reshape(custom_colors, (1, r))
                    end
                    x = Vector()
                    y = Vector()
                    for r in runs
                        ser = data[r]
                        append!(x, [ser[:, j]])
                        append!(y, [ser[:, i]])
                    end # for r
                    plot_data = (x, y)
                else
                    # leave empty
                    plot_data = [0]
                end # if/else
                plot_data # return from the macro function
            end # @series
        end # for j
    end # for i
end # @recipe
Whenever I supply custom colours, the colours are inconsistent across subplots:
plot(my_data, custom_colors=["blue", "green", "black"])
Note how subplot (2, 1) has black dots in the middle, whereas all other scatter subplots have black on the outside.
If I do the plot without custom colours:
plot(my_data)
I get consistent colours in all scatter subplots:
Any clues why the colours are mixed up on the first plot?
OK, it turns out the trick here is that each series on each scatter plot should be plotted with its own @series macro. In other words, the @series ... end block should go inside for r in runs ... end, with a separate @series block for each of the other branches of the code.
I want to create multi-layer plots using for-loops. The main dataframe I am working with has the following characteristics:
product:  55_ab_LL_bubbles_D1  55_ab_LL_troubles_D1  34_ac_LL_bubbles_D1  34_ac_LL_troubles_D1
Color
Blue      453.3                766.1                 562.1                883.3
Green     775.5                897.1                 434.5                983.4
Purple    883.4                445.7                 787.2                555.5
Yellow    764.1                445.6                 887.3                673.5
With the code below, I loop over the row names (Color) to create one scatter plot per row.
What I would like to do is not only loop over the row names, but also create individual scatter plots for each product, based on the leading number in the column name ("55_", "34_", etc.). I want to group all data points that share the same leading product number and create an independent scatter plot for each of these numbers and each of the colors. So instead of the four scatter plots I get right now (one per color), I would like to have 8 (one for each combination of color and product number).
Any suggestion is appreciated :) !
CODE:
pdf("scatterplot.pdf")
for (i in seq_len(nrow(data))) {
  df <- data.frame(x = data[i, grep("bubbles_", colnames(data))],
                   y = data[i, grep("troubles_", colnames(data))])
  plot(df$x, df$y,
       xlim = xy, ylim = xy)
}
dev.off()
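One possible way to get a plot per color and per product number is to add an inner loop over the leading numbers of the column names. The following is only a sketch under stated assumptions: the data frame is rebuilt by hand from the table above, prefixes and n_plots are hypothetical helper names, the output path in tempdir() is arbitrary, and xy is assumed to be the overall value range because the question does not show how it is defined.

```r
# Hypothetical data frame mirroring the table in the question
data <- data.frame(
  `55_ab_LL_bubbles_D1`  = c(453.3, 775.5, 883.4, 764.1),
  `55_ab_LL_troubles_D1` = c(766.1, 897.1, 445.7, 445.6),
  `34_ac_LL_bubbles_D1`  = c(562.1, 434.5, 787.2, 887.3),
  `34_ac_LL_troubles_D1` = c(883.3, 983.4, 555.5, 673.5),
  row.names = c("Blue", "Green", "Purple", "Yellow"),
  check.names = FALSE
)

# Product number = everything before the first underscore
prefixes <- unique(sub("_.*$", "", colnames(data)))

# Assumption: common axis limits spanning all values
xy <- range(as.matrix(data))

n_plots <- 0  # counts the pages written, for checking only
pdf(file.path(tempdir(), "scatterplot.pdf"))
for (i in seq_len(nrow(data))) {
  for (p in prefixes) {
    # Columns that belong to this product number
    cols <- colnames(data)[startsWith(colnames(data), paste0(p, "_"))]
    x <- unlist(data[i, grep("bubbles_", cols, value = TRUE)])
    y <- unlist(data[i, grep("troubles_", cols, value = TRUE)])
    plot(x, y, xlim = xy, ylim = xy,
         main = paste(rownames(data)[i], p))
    n_plots <- n_plots + 1
  }
}
dev.off()
```

With the four colors and two product numbers from the table, this writes eight pages, each titled with the color and the product number.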
My problem is related to the car package.
I created a kernel density plot. However, since the legend is too big, I would like to move it outside the plot area, either above or below it.
I also tried cowplot::get_legend(), but it did not work properly.
library(car)
mtcars$g <- as.factor(mtcars$vs)
densityPlot(mtcars$mpg, mtcars$g, show.bw=T, kernel=depan, legend=list(location="topleft", title=NULL))
Probably the easiest thing is not to draw the legend with the densityPlot() function but rather to add it separately using legend(). The following code is an example of how this can be done:
library(car)
mtcars$g <- as.factor(mtcars$vs)
par(mar=c(4,4,4,2))
# obtaining results from kernel density and saving results
# need saved values for bandwidth in legend
# also plots the kernel densities
d <- densityPlot(mtcars$mpg,mtcars$g
,show.bw=T
,kernel=depan
,legend=F # no default legend
,col = c('black','blue')
,lty=c(1,2))
# allows legend outside of plot area to be displayed
par(xpd=T)
# defining location based on the plot coordinates from par('usr')
legend(x=mean(par('usr')[c(1,2)]) # average of range of x-axis
,y=par('usr')[4]+0.015 # top of the y axis with additional shift
,legend = c(paste('0 (bw = ',round(d$`0`['bw'][[1]],4),')',sep='') # extract bw values from saved output and
,paste('1 (bw = ',round(d$`1`['bw'][[1]],4),')',sep='')) # formatting similar to default, except with rounding bw value
,ncol=1 # change to 2 if you want entries beside each other
,lty=c(1,2) # line types, same as above
,col=c('black','blue') # colors, same as above
,lwd=1
,xjust = 0.5 # centers legend at x coordinate
,yjust = 0.5 # centers legend at y coordinate
)
par(xpd=F)
I have a problem that might be a bug in heatmaply or plotly: the colors in the side bar of a heatmap do not match the colors I specified. See the code example below. At the end of the code, in part 6), the first plot, drawn with the plot function, shows the colors correctly (yellow and blue).
The second plot, which uses the same colors in a heatmaply side bar, fails to show them correctly and instead shows what appear to be random colors. In a similar plot with real data there are even red and orange colors in the side bar, although all color codes are generated from a blue-yellow color range. Any ideas what might cause this, and how to show colors in the side bar that are consistent with their color codes?
Compare cophenetic similarity between leaves in two trees build on full data and subsample of the data
library(heatmaply)
# 1) Generate random data to build trees
set.seed(2015-04-26)
dat <- matrix(rnorm(100), 10, 50) # Dataframe with 50 columns
datSubSample <- dat[, sample(ncol(dat), 30)] #Dataframe with 30 columns sampled from the dataframe with 50
dat_dist1 <- dist(datSubSample)
dat_dist2 <- dist(dat)
hc1 <- hclust(dat_dist1)
hc2 <- hclust(dat_dist2)
# 2) Build two dendrograms, one based on all data, second based a sample of the data (30 out of 50 columns)
dendrogram1 <- as.dendrogram(hc1)
dendrogram2 <- as.dendrogram(hc2)
# 3) For each leave in a tree get cophenetic distance matrix,
# each column represent distance of that leave to all others in the same tree
cophDistanceMatrix1 <- as.data.frame(as.matrix(cophenetic(dendrogram1)))
cophDistanceMatrix2 <- as.data.frame(as.matrix(cophenetic(dendrogram2)))
# 4) Calculate correlation between cophenetic distance of a leave to all other leaves, between two trees
corPerLeave <- NULL # Vector to store correlations for each leave in two trees
for (leave in colnames(cophDistanceMatrix1)){
cor <- cor(cophDistanceMatrix2[leave], cophDistanceMatrix1[leave])
corPerLeave <- c(corPerLeave, unname(cor))
}
# 5) Convert cophenetic correlation to color to show in side bar of a heatmap
corPerLeave <- corPerLeave / max(corPerLeave) #Scale 0 to 1 correlation
byPal <- colorRampPalette(c('yellow', 'blue')) #blue yellow color palette, low correlation = yellow
colCopheneticCor <- byPal(20)[as.numeric(cut(corPerLeave, breaks =20))]
# 6) Plot heatmap with dendrogram with side bar that shows cophenetic correlation for each leave
row_dend <- dendrogram2
x <- as.matrix(dat_dist2)
#### Plot belows use the same color code, normal plot works, however heatmaply shows wrong colors
plot(x = 1:length(colCopheneticCor), y = 1:length(colCopheneticCor), col = colCopheneticCor)
heatmaply(x, colD = row_dend, row_side_colors = colCopheneticCor)
Found the solution: you can pass a palette function to heatmaply's built-in row_side_palette parameter. Here is minimal example code that can be combined with the code in the question to show a heatmap with the cophenetic distance per leaf/species represented by a different color in the side bar:
byPal <- colorRampPalette(c('red','blue')) # Two-color palette function to be used in the side bar
heatmaply(m,colD = row_dend, file=fileName1, plot_method= "plotly",colorscale='Viridis',row_side_palette= byPal ,
row_side_colors=data.frame("Correlation cophenetic distances" = corPerLeave, check.names=FALSE))
One problem I have not solved yet is how to show a continuous colorbar in the legend. Any suggestions?
I'm using the pvclust package in R to perform bootstrapped hierarchical clustering. The output is then plotted as a hclust object with a few extra features (different default title, p-values at nodes). I've attached a link to one of the plots here.
This plot is exactly what I want, except that I need the leaf labels to be displayed horizontally instead of vertically. As far as I can tell there isn't an option for rotating the leaf labels in plot.hclust. I can plot the hclust object as a dendrogram
(i.e. plot(as.dendrogram(example$hclust), leaflab="textlike") instead of plot(example))
but the leaf labels are then printed in boxes that I can't seem to remove, and the heights of the nodes in the hclust object are lost. I've attached a link to the dendrogram plot here.
What would be the best way to make a plot that is as similar as possible to the standard plot.pvclust() output, but with horizontal leaf labels?
One way to get the text the way you want is to have plot.dendrogram print nothing and just add the labels yourself. Since you don't provide your data, I illustrate with some built-in data. By default, the plot was not leaving enough room for the labels, so I set the ylim to allow the extra needed room.
set.seed(1234)
HC = hclust(dist(iris[sample(150,6),1:4]))
plot(as.dendrogram(HC), leaflab="none", ylim=c(-0.2, max(HC$height)))
text(x=seq_along(HC$labels), y=-0.2, labels=HC$labels[HC$order])
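When adding labels by hand, keep in mind that the leaves are drawn from left to right in the order given by HC$order, not in the order of HC$labels, so labels passed to text() need to be indexed as HC$labels[HC$order]. A small check on the same built-in data (leaf_labels is just an illustrative name):

```r
set.seed(1234)
HC <- hclust(dist(iris[sample(150, 6), 1:4]))

# Labels of the dendrogram leaves, read from left to right
leaf_labels <- labels(as.dendrogram(HC))

# They match the hclust labels reordered by HC$order
stopifnot(identical(leaf_labels, HC$labels[HC$order]))
```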
I've written a function that plots the standard pvclust plot with empty strings as leaf labels, then plots the leaf labels separately.
plot.pvclust2 <- function(clust, x_adj_val, y_adj_val, ...){
# Assign the labels in the hclust object to clust_labels,
# then replace clust$hclust$labels with empty strings.
# The pvclust object will be plotted as usual, but without
# any leaf labels.
clust_labels <- clust$hclust$labels
clust$hclust$labels <- rep("", length(clust_labels))
clust_merge <- clust$hclust$merge #For shorter commands
# Create empty vector for the y_heights and populate with height vals
y_heights <- numeric(length = length(clust_labels))
for(i in 1:nrow(clust_merge)){
# For i-th merge
singletons <- clust_merge[i,] < 0 #negative entries in merge indicate
#agglomerations of singletons, and
#positive entries indicate agglomerations
#of non-singletons.
y_index <- - clust_merge[i, singletons]
y_heights[y_index] <- clust$hclust$height[i] - y_adj_val
}
# Horizontal text can be cutoff by the margins, so the x_adjust moves values
# on the left of a cluster to the right, and values on the right of a cluster
# are moved to the left
x_adjust <- numeric(length = length(clust_labels))
# Values in column 1 of clust_merge are on the left of a cluster, column 2
# holds the right-hand values
x_adjust[-clust_merge[clust_merge[ ,1] < 0, 1]] <- 1 * x_adj_val
x_adjust[-clust_merge[clust_merge[ ,2] < 0, 2]] <- -1 * x_adj_val
# Plot the pvclust object with empty labels, then plot horizontal labels
plot(clust, ...)
text(x = seq(1, length(clust_labels)) +
x_adjust[clust$hclust$order],
y = y_heights[clust$hclust$order],
labels = clust_labels[clust$hclust$order])
}
I need to make a histogram of my variable, 'travel time'. Inside each histogram, I need to plot the regression (correlation) data, i.e. my observed data vs. predicted. I need to repeat this for different times of day and week (in simple words, make a matrix of such figures using the par function). For now, I can draw the histograms and arrange them in matrix form, but I am having trouble with the inset plot: plotting the x and y data together with the y = x line, and placing each inset within its corresponding histogram in the matrix. How can I do that, as in the figure below? Any help would be appreciated. Thanks!
One way to do this is to loop over your data and create the desired plot on every iteration. Here is a not very polished example, but it shows the logic of drawing a small plot over a larger one. You will have to tweak the code to get it to work the way you need, but that shouldn't be difficult.
# create some sample dataset (your x values)
a <- c(rnorm(100,0,1))
b <- c(rnorm(100,2,1))
# create their "y" values counterparts
x <- a + 3
y <- b + 4
# bind the data into two dataframes (explanatory variables in one, explained in the other)
data1 <- cbind(a,b)
data2 <- cbind(x,y)
# set dimensions of the plot matrix
par(mfrow = c(2,1))
# for each of the explanatory - explained pair
for (i in 1:ncol(data2))
{
# set positioning of the histogram
par("plt" = c(0.1,0.95,0.15,0.9))
# plot the histogram
hist(data1[, i])
# set positioning of the small plot
par("plt" = c(0.7, 0.95, 0.7, 0.95))
# plot the small plot over the histogram
par(new = TRUE)
plot(data1[, i], data2[, i])
# add some line into the small plot
lines(data1[, i], data1[, i])
}