Interpolation (akima) omits part of the data when x/y contain duplicate elements in R

I am writing a function that receives three vectors, interpolates them with akima::interp() and plots the result with plot_ly(). Although the code generally works, I am running into issues with the scaling of the z-matrix that interp() returns.
Let me give you an example:
x is a numeric vector with no NAs, containing some duplicate values.
y is a numeric vector with no NAs, containing some duplicate values.
z is a continuous numeric vector with no NAs.
Some summary statistics:
> unique(x)
[1] 60 48 36 32 18 24 30 15 12 28 21 19 54 20 16 27 10 39 14 17 9 6 50 8 13
> range(x)
[1] 6 60
> unique(y)
[1] 10.00 10.50 13.50 12.50 14.00 12.00 11.00 9.00 11.50 9.25 13.00 10.25 6.50 6.75 8.25 9.50
[17] 8.00 8.85 9.75 7.90 7.00 8.60 8.75 7.50 8.90 8.50 7.49 7.40 5.50 7.60 7.25 8.35
[33] 6.00 5.00 7.75 7.35 6.30 4.50 5.75 8.40 5.60 5.90 7.74 9.90 6.20 5.80
> range(y)
[1] 4.5 14.0
> head(z)
[1] 2.877272 3.267328 3.175478 3.843326 4.809792 2.827825
> range(z)
[1] 2.316529 28.147808
I implement the baseline function below:
labs = list(x = 'x', y = 'y', z = 'z')
mat = interp(x, y, z, duplicate = 'mean', extrap = TRUE, xo = sort(unique(x)))
plot_ly(x = mat$x, y = mat$y, z = mat$z, type = 'surface') %>%
  layout(title = title,
         scene = list(xaxis = list(title = labs$x),
                      yaxis = list(title = labs$y),
                      zaxis = list(title = labs$z)))
When I run this, the output is the following:
The issue is that a portion of the data is not covered in this picture. For instance, there is a sizeable data portion around x > 50, y < 11 that is omitted by the interpolation (and hence not plotted).
> length(x[x > 50])
[1] 304
> length(y[x > 50 & y < 11])
[1] 290
> length(z[x > 50 & y < 11])
[1] 290
I suspected that this has to do with the duplicate x values. Hence, I configured the xo argument in interp() such that:
mat = interp(x, y, z, duplicate = 'mean', xo = sort(unique(x), decreasing = TRUE))
In which case the previously omitted region is partially plotted. It looks like the following:
Nonetheless, the x and y axes still do not correspond to their respective data ranges (despite data being available there). Bottom line: how do I tweak the function so that the surface always covers the full range of x and y?
Best

It turns out that the problem arose from plot_ly(). The z-matrix cannot be passed straight from interp() to plot_ly(), because plot_ly() expects the rows of z to run along y and the columns along x, which is the transpose of what interp() returns, so the axes end up mismatched in the graph. Hence the interpolated z-matrix needs to be reshaped.
If you use these two functions in combination, make sure to transform z as shown below:
mat = interp(x, y, z, duplicate = 'mean')
x = mat$x
y = mat$y
# reshape z so that rows correspond to y and columns to x (equivalent to t(mat$z))
z = matrix(mat$z, nrow = length(mat$y), byrow = TRUE)
plot_ly(x = x, y = y, z = z, type = 'surface')
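For completeness, here is a minimal self-contained sketch of the whole pipeline. The toy x/y/z vectors below are only stand-ins (the question's data is not reproducible), and the regular xo/yo grids are an assumption; the key point is the transpose of mat$z before it is handed to plot_ly():
library(akima)
library(plotly)

# toy data with duplicated x/y values, standing in for the question's vectors
set.seed(1)
x <- sample(seq(6, 60, by = 2), 200, replace = TRUE)
y <- sample(seq(4.5, 14, by = 0.25), 200, replace = TRUE)
z <- 2 + 0.3 * x + 0.5 * y + rnorm(200)

mat <- interp(x, y, z, duplicate = "mean",
              xo = seq(min(x), max(x), length.out = 40),
              yo = seq(min(y), max(y), length.out = 40))

# plot_ly() wants the rows of z to run along y, so transpose interp()'s output
plot_ly(x = mat$x, y = mat$y, z = t(mat$z), type = "surface") %>%
  layout(scene = list(xaxis = list(title = "x"),
                      yaxis = list(title = "y"),
                      zaxis = list(title = "z")))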

Related

how to remove ticks and values within name panels in hexplom() hexbin plot in R?

I have followed this documentation https://rdrr.io/cran/hexbin/man/hexplom.html to build some scatter plot matrices for my continuous variables.
My data looks like this:
dep_delay temp dewp humid wind_speed precip visib date
1 39.02 28.04 64.43 11.50 13.24 0.00 10.00 2013-01-01
...
301 43 39.20 30.20 69.88 14.96 0.00 3.0 2013-03-25
...
1253 392 46.04 42.98 88.99 4.60 0.00 10.0 2013-12-21
My code:
library(hexbin)
hexplom(~df_w_delays_num[1:7], data = df_w_delays_num,
        xbins = 15, colramp = BTC,
        varnames = c("dep\ndelay", "temp", "dew", "humidity", "wind\nspeed",
                     "wind\ngust", "precipi\ntation", "visibility"))
Does anyone know how to remove the ticks and values obstructing the names of my variables?
Try adding pscales = 0. pscales is documented in ?lattice::panel.pairs.
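A minimal sketch of the amended call, reusing the (hypothetical) df_w_delays_num data frame from the question; note that varnames should supply exactly one name per plotted column:
library(hexbin)
# pscales = 0 is passed through to lattice and suppresses the tick marks and
# tick labels drawn inside the diagonal (variable-name) panels
hexplom(~df_w_delays_num[1:7], data = df_w_delays_num,
        xbins = 15, colramp = BTC, pscales = 0,
        varnames = c("dep\ndelay", "temp", "dew", "humidity", "wind\nspeed",
                     "precipi\ntation", "visibility"))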

Plotting dendrogram from vegan::meandist() in ggplot

vegan::meandist() has a really nice plot method that creates a dendrogram of the mean dissimilarities. How can I incorporate the output into ggplot to have full control over the aesthetics? Here is some sample code using the dune data. As an example, I'd like to recreate the dendrogram in ggplot and color each Management level by 'Use' (see the factors in dune.env).
# Species and environmental data
require(vegan)
dune <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.spe.txt', row.names = 1)
dune.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.env.txt', row.names = 1)
data(dune)
data(dune.env)
dune_dist <- vegdist(dune, method = "bray", na.rm=T)
dissim <- meandist(dune_dist, grouping = dune.env$Management)
plot(dissim)
From ?vegan:::plot.meandist it is clear that the hclust() function is used for kind = "dendrogram". To recreate it:
zz <- hclust(as.dist(dissim), method = "average") #use desired method, "average" is the default in vegan:::plot.meandist
Now to visualize the tree using ggplot:
library(ggdendro)
Create a data.frame from the tree (dendro_data() accepts the hclust object directly, so no separate as.dendrogram() step is needed):
dd <- dendro_data(zz)
Get the diagonal elements from the dissimilarity matrix, since they represent within-cluster variability (see @Jari Oksanen's comments below):
library(dplyr)
library(tibble)
data.frame(diag = diag(dissim)) %>%
  rownames_to_column("label") -> dissim_diag
dissim_diag
label diag
1 BF 0.4159972
2 HF 0.4418115
3 NM 0.6882438
4 SF 0.5813015
Now the segment data needs to be changed so that the leaves do not end at 0 but at the appropriate distance.
segment(dd)
x y xend yend
1 1.875 0.7412760 1.00 0.7412760
2 1.000 0.7412760 1.00 0.0000000
3 1.875 0.7412760 2.75 0.7412760
4 2.750 0.7412760 2.75 0.5960416
5 2.750 0.5960416 2.00 0.5960416
6 2.000 0.5960416 2.00 0.0000000
7 2.750 0.5960416 3.50 0.5960416
8 3.500 0.5960416 3.50 0.4736637
9 3.500 0.4736637 3.00 0.4736637
10 3.000 0.4736637 3.00 0.0000000
11 3.500 0.4736637 4.00 0.4736637
12 4.000 0.4736637 4.00 0.0000000
In other words, where x is a whole number and yend is 0, we need to change yend to the appropriate distance. The following code accomplishes this in two joins: the first join adds the label(dd) data and the second adds the dissim_diag data to the segment data:
segment_data <- segment(dd) %>%
  left_join(label(dd),
            by = c("xend" = "x",
                   "yend" = "y")) %>%
  left_join(dissim_diag) %>%
  mutate(yend = pmax(yend, diag, na.rm = TRUE)) # use whichever is higher, yend or diag, ignoring NA
segment_data
x y xend yend label diag
1 1.875 0.7412760 1.00 0.7412760 <NA> NA
2 1.000 0.7412760 1.00 0.6882438 NM 0.6882438
3 1.875 0.7412760 2.75 0.7412760 <NA> NA
4 2.750 0.7412760 2.75 0.5960416 <NA> NA
5 2.750 0.5960416 2.00 0.5960416 <NA> NA
6 2.000 0.5960416 2.00 0.5813015 SF 0.5813015
7 2.750 0.5960416 3.50 0.5960416 <NA> NA
8 3.500 0.5960416 3.50 0.4736637 <NA> NA
9 3.500 0.4736637 3.00 0.4736637 <NA> NA
10 3.000 0.4736637 3.00 0.4159972 BF 0.4159972
11 3.500 0.4736637 4.00 0.4736637 <NA> NA
12 4.000 0.4736637 4.00 0.4418115 HF 0.4418115
A similar manipulation is needed to create appropriate label coordinates:
text_data <- label(dd) %>%
  left_join(dissim_diag) %>%
  mutate(y = diag,
         group = factor(rep(c("one", "two"), 2))) # just some random groups to color by
Now the actual plot:
ggplot(segment_data) +
  geom_segment(aes(x = x,
                   y = y,
                   xend = xend,
                   yend = yend)) +
  theme_dendro() +
  theme(axis.line.y = element_line(),
        axis.ticks.y = element_line(),
        axis.text.y = element_text()) +
  geom_text(aes(x = x,
                y = y,
                label = label,
                color = group),
            angle = -90, hjust = 0,
            data = text_data)
Kudos to @Jari Oksanen for his comments!
The dendextend package was designed to assist with such tasks (it is based on code from ggdendro, but at this point it is a more flexible and simpler-to-use codebase than ggdendro).
You can see a section on ggplot2 integration in the vignette.
Because of issues with my own computer (old linux mint), I can't install vegan to reproduce your example.
But if I use similar data, here is an example of usage:
dune <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.spe.txt', row.names = 1)
dune.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.env.txt', row.names = 1)
dissim <- dist(dune) # a hack, just because I can't get vegan to work
zz <- hclust(as.dist(dissim), method = "average") #use desired method, "average" is the default in vegan:::plot.meandist
The code to get a dendrogram:
library(dendextend)
library(ggplot2)
ggplot(as.dendrogram(zz))
Output (again, not the same analysis as above, but you can use your own code to fix it)
This also supports coloring of branches, labels, rotations, and more.
You can go to the vignette here, to learn more.
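For example, a minimal sketch of branch and label colouring before handing the dendrogram to ggplot (the choice of k = 2 and the default colours are arbitrary, purely for illustration):
library(dendextend)
library(ggplot2)

dend <- as.dendrogram(zz) %>%
  set("branches_k_color", k = 2) %>%  # colour branches by cutting the tree into 2 groups
  color_labels(k = 2)                 # colour the leaf labels to match

ggplot(dend, horiz = TRUE)            # ggplot method provided by dendextend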

Single heatmap on two symmetric matrices with different colours and scales in R

I want to achieve the same end goal as this question: Create a single heatmap based on two symmetric matrices in R, but to take it further than the answer currently provided.
The answer given does not explain how one would go about having different colours and different scales for the upper and lower sections of the matrix.
Here is the example dataset:
library(Matrix)
set.seed(123)
s1<-forceSymmetric(matrix(round(rnorm(25),2),5))
colnames(s1)<-LETTERS[1:5]
rownames(s1)<-LETTERS[6:10]
diag(s1)<-1
s2<-forceSymmetric(matrix(round(rbinom(25,25,0.3),2),5))
colnames(s2)<-LETTERS[1:5]
rownames(s2)<-LETTERS[6:10]
diag(s2)<-1
s1
# 5 x 5 Matrix of class "dsyMatrix"
# A B C D E
# F 1.00 1.72 1.22 1.79 -1.07
# G 1.72 1.00 0.36 0.50 -0.22
# H 1.22 0.36 1.00 -1.97 -1.03
# I 1.79 0.50 -1.97 1.00 -0.73
# J -1.07 -0.22 -1.03 -0.73 1.00
s2
# 5 x 5 Matrix of class "dsyMatrix"
# A B C D E
# F 1 6 8 7 9
# G 6 1 5 9 8
# H 8 5 1 10 9
# I 7 9 10 1 1
# J 9 8 9 1 1
The linked answer suggests adding the two matrices together like so:
#Get upper diagonal
reverse = s1[,ncol(s1):1]
diag(reverse) = 0
reverse[lower.tri(reverse, diag = FALSE)] <- 0
upper = reverse[,ncol(reverse):1]
# Get lower diagonal
reverse1 = s2[,ncol(s2):1]
diag(reverse1) = 0
reverse1[upper.tri(reverse1, diag = FALSE)] <- 0
upper1 = reverse1[,ncol(reverse1):1]
# Add them together
merged = as.matrix(upper+upper1)
merged
A B C D E
F 1.00 1.72 1.22 1.79 0
G 1.72 1.00 0.36 0.00 8
H 1.22 0.36 0.00 10.00 9
I 1.79 0.00 10.00 1.00 1
J 0.00 8.00 9.00 1.00 1
It then suggests using heatmap(merged) - however, how would you go about having different colours and scales for the upper and lower part of the matrix?
I am happy with using any package including ggplot2 to make this work.
Thanks in advance!
You can subset the relevant parts of the matrix in the data argument of a layer, and use {ggnewscale} to assign different fill scales to different layers. The trick is to declare a fill scale before adding new_scale_fill(), otherwise the order of operations goes wrong (which usually doesn't matter much, but here it does).
You can then tweak every individual scale. In the example below I just tweaked the palettes, but you can also adjust limits, breaks, labels etc.
# Assuming code from question has been executed and we have a 'merged' in memory
library(ggplot2)
library(ggnewscale)
# Wide matrix to long dataframe
# Later, we'll be relying on the notion that the dimnames have been
# converted to factor variables to separate out the upper from the lower
# matrix.
df <- reshape2::melt(merged)

ggplot(df, aes(Var1, Var2)) +
  # The first layer, with its own fill scale
  geom_raster(
    data = ~ subset(.x, as.numeric(Var1) > as.numeric(Var2)),
    aes(fill = value)
  ) +
  scale_fill_distiller(palette = "Blues") +
  # Declare new fill scale for the second layer
  new_scale_fill() +
  geom_raster(
    data = ~ subset(.x, as.numeric(Var1) < as.numeric(Var2)),
    aes(fill = value)
  ) +
  scale_fill_distiller(palette = "Reds") +
  # I'm not sure what to do with the diagonal. Make it grey?
  new_scale_fill() +
  geom_raster(
    data = ~ subset(.x, as.numeric(Var1) == as.numeric(Var2)),
    aes(fill = value)
  ) +
  scale_fill_distiller(palette = "Greys", guide = "none")
In my opinion @teunbrand's answer is what you're looking for, but another potential option is to use the ComplexHeatmap package, e.g. based on one of the examples in the docs:
library(Matrix)
set.seed(123)
s1<-forceSymmetric(matrix(round(rnorm(25),2),5))
colnames(s1)<-LETTERS[1:5]
rownames(s1)<-LETTERS[6:10]
diag(s1)<-1
s2<-forceSymmetric(matrix(round(rbinom(25,25,0.3),2),5))
colnames(s2)<-LETTERS[1:5]
rownames(s2)<-LETTERS[6:10]
diag(s2)<-1
#Get upper diagonal
reverse = s1[,ncol(s1):1]
diag(reverse) = 0
reverse[lower.tri(reverse, diag = FALSE)] <- 0
upper = reverse[,ncol(reverse):1]
# Get lower diagonal
reverse1 = s2[,ncol(s2):1]
diag(reverse1) = 0
reverse1[upper.tri(reverse1, diag = FALSE)] <- 0
upper1 = reverse1[,ncol(reverse1):1]
# Add them together
m = as.matrix(upper+upper1)
m
#> A B C D E
#> F 1.00 1.72 1.22 1.79 0
#> G 1.72 1.00 0.36 0.00 8
#> H 1.22 0.36 0.00 10.00 9
#> I 1.79 0.00 10.00 1.00 1
#> J 0.00 8.00 9.00 1.00 1
library(ComplexHeatmap)
library(circlize)
col1 = colorRamp2(c(-1, 10), c("white", "red"))
col2 = colorRamp2(c(-1, 10), c("white", "blue3"))

# here reordering the symmetric matrix is necessary
od = hclust(dist(m))$order
m = m[od, od]

ht = Heatmap(m, rect_gp = gpar(type = "none"), show_heatmap_legend = FALSE,
             cluster_rows = FALSE, cluster_columns = FALSE,
             layer_fun = function(j, i, x, y, w, h, fill) {
               l = i > j
               grid.rect(x[l], y[l], w[l], h[l],
                         gp = gpar(fill = col1(pindex(m, i[l], j[l])), col = NA))
               l = i < j
               grid.rect(x[l], y[l], w[l], h[l],
                         gp = gpar(fill = col2(pindex(m, i[l], j[l])), col = NA))
             })

draw(ht, heatmap_legend_list = list(
  Legend(title = "Group_A", col_fun = col1),
  Legend(title = "Group_B", col_fun = col2)
))
Created on 2022-03-07 by the reprex package (v2.0.1)

How to add "labels" and "value" arguments with highcharter

I want to specify which column I want as the label and which as the value in a pie chart.
The problem is that when I use hc_add_series_labels_values(), which accepts these two arguments, I get no output, because the function seems to be deprecated.
hc_add_series() seems to pick the two columns automatically depending on their order, type, and so on.
The package is not well documented and I couldn't find what I need.
Thanks
In my example I want to specify the name2 column as the label and high as the value. How do I do that?
library(dplyr)
library(stringr)    # for `fruit` and str_length()
library(highcharter)

n <- 5
set.seed(123)
colors <- c("#d35400", "#2980b9", "#2ecc71", "#f1c40f", "#2c3e50", "#7f8c8d")
colors2 <- c("#000004", "#3B0F70", "#8C2981", "#DE4968", "#FE9F6D", "#FCFDBF")

df <- data.frame(x = seq_len(n) - 1) %>%
  mutate(
    y = 10 + x + 10 * sin(x),
    y = round(y, 1),
    z = (x * y) - median(x * y),
    e = 10 * abs(rnorm(length(x))) + 2,
    e = round(e, 1),
    low = y - e,
    high = y + e,
    value = y,
    name = sample(fruit[str_length(fruit) <= 5], size = n),
    color = rep(colors, length.out = n),
    segmentColor = rep(colors2, length.out = n)
  )
df$name2 <- c("mos", "ok", "kk", "jji", "hufg")
## x y z e low high value name color segmentColor
## 1 0 10.0 -25.6 7.6 2.4 17.6 10.0 plum #d35400 #000004
## 2 1 19.4 -6.2 4.3 15.1 23.7 19.4 lemon #2980b9 #3B0F70
## 3 2 21.1 16.6 17.6 3.5 38.7 21.1 mango #2ecc71 #8C2981
## 4 3 14.4 17.6 2.7 11.7 17.1 14.4 pear #f1c40f #DE4968
## 5 4 6.4 0.0 3.3 3.1 9.7 6.4 apple #2c3e50 #FE9F6D
highchart() %>%
  hc_chart(type = "pie") %>%
  hc_add_series(df, name = "Fruit Consumption", showInLegend = FALSE)
For people who have the same problem, you can check this:
The package works much like ggplot2; the hchart() function does the job together with the hcaes() argument:
hchart(df, type = "pie", hcaes(name2, high))
Output :
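If you prefer the mapping to be explicit rather than positional (a minor variation on the call above, not required for it to work), hcaes() also accepts named aesthetics:
# name -> slice label, y -> slice value
hchart(df, type = "pie", hcaes(name = name2, y = high))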

Sorting pairs by first coordinate

I have the following vectors:
X<-c(140,140,130,109,124,114,65,162,150,0)
Y<-c(30.65,6.45,17.74,11.29,3.23,3.23,3.23,8.06,14.52,1.61)
What I would like to do is pair each entry in X with the corresponding entry in Y, and then order the pairs by X. For example, if I had
J<-c(10,40,20)
K<-c(9,9,2)
I would like it to give me
Jo = (10,20,40)
Ko = (9,2,9)
How do I do this in R? Thanks for the help.
Use the order() function:
X <- c(140,140,130,109,124,114,65,162,150,0)
Y <- c(30.65,6.45,17.74,11.29,3.23,3.23,3.23,8.06,14.52,1.61)
ord <- order(X)
(X2 <- X[ord])
## [1] 0 65 109 114 124 130 140 140 150 162
(Y2 <- Y[ord])
## [1] 1.61 3.23 11.29 3.23 3.23 17.74 30.65 6.45 14.52 8.06
(Don't really need to save ord if you re-order Y first; could use Y2 <- Y[order(X)]; X2 <- sort(X) instead.)
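As a side note (not part of the original answer): order() also accepts multiple sort keys, so if you ever need ties in X broken by Y you can do:
ord <- order(X, Y) # sort by X, breaking ties in X by the corresponding Y
X[ord]
Y[ord]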
