Iterate a numeric vector in a list - r

I have a list NList of numeric vectors that prints like
[[1]]
[1] 1959 9 4 62
[[2]]
[1] 2280 2 13
[[3]]
[1] 15 4 13
[[4]]
[1] 2902 178 13
and the structure is like
list(c(1959, 13), c(2280, 178, 13), c(2612, 178, 13), c(2902,
178, 13), c(2389, 178, 13), c(216, 736, 13), c(2337, 178, 13),
c(2639, 2126, 13), c(2924, 676, 178, 13), c(2416, 674, 178,
13), c(2223, 13), c(842, 178, 13), c(2618, 1570, 178, 13),
c(854, 178, 13), c(1847, 178, 13), c(2529, 178, 13), c(511,
178, 13), c(2221, 736, 13), c(415, 674, 178, 13), c(2438,
178, 13), c(2127, 178, 13), c(1910, 2126, 13), c(1904, 674,
178, 13), c(2310, 674, 178, 13), c(1732, 178, 13), c(1843,
178, 13), c(2539, 178, 13), c(1572, 676, 178, 13), c(1616,
876, 13).....)
I want to iterate over the numeric vectors in this list. I would like to do something like this:
sum<- 0
index<-1
list1 <- apply(NList,1,function (i){
#I want to get each of the numeric vector here
row <- NList[i]
#then I want to iterate the numeric vector for some calculation.
#I am expecting, for [[1]], I get f(1959,9)+f(9,4)+f(4,62), in which f is my customized function, below I use a simple multiple as example
for (j in (1:(length(row)-1)))
{
origin <- row[j]
dest <- row[j+1]
#a simple calculation example...I am expecting an array of sum which is the calculation result
sum[index] <- sum[index] + origin*dest
}
index <- index+1
})
but it does not work and returns:
dim(X) must have a positive length
lapply is not working for me either and returns sum as 0...
listR1 <- lapply(NList,function (i){
row <- i
for (j in 1:length(row))
{origin <- row[j]
dest <- row[j+1]
sum[index] <- sum[index] + origin*dest
}
})
Did I miss something? How can I do this?
Thanks!

I took the function out of your apply statement to look at it a bit closer.
f <- function(Row)
{
  Sum <- 0
  for (j in 1:(length(Row) - 1))
  {
    Sum <- Sum + Row[j] * Row[j + 1]
  }
  Sum # return the accumulated sum
}
Then I can apply the function to each element of the list with:
list1 <- lapply(NList,f)
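As a quick check with the first vector from the question's printout, the function returns 1959*9 + 9*4 + 4*62:
f(c(1959, 9, 4, 62))
# [1] 17915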

Okay, so this code would work:
f <- function(a, b) sum(a, b)
test.func <- function(i) {
  ret.val <- numeric(length(i) - 1)  # pre-allocate the result vector
  for (j in 1:(length(i) - 1))
    ret.val[j] <- f(i[j], i[j + 1])
  ret.val
}
# Use lapply for a list.
lapply(NList,test.func)
Or you could do it in one line:
lapply(NList, function(i) sapply(seq_along(i)[-length(i)], function(x) f(i[x], i[x + 1])))
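If f is vectorised (plain multiplication is), the explicit loop can be avoided altogether. A minimal sketch, assuming you want the sum of the pairwise products as in the question's example:
# sum of products of consecutive elements, one value per list element
sapply(NList, function(v) sum(head(v, -1) * tail(v, -1)))
# or keep the individual pairwise products instead of their sum
lapply(NList, function(v) head(v, -1) * tail(v, -1))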

Related

R:mgcv add colorbar to 2D heatmap of GAM

I'm fitting a GAM with mgcv and plotting the result with the default plot.gam() function. My model includes a 2D smoother and I want to plot the result as a heatmap. Is there any way to add a colorbar for the heatmap?
I've previously looked into other GAM plotting packages, but none of them provided the necessary visualisation. Please note, this is just a simplification for illustration purposes; the actual model (and reporting needs) is much more complicated.
Edited: I had initially swapped y and z in my tensor product; updated to reflect the correct version in both the code and the plot.
df.gam<-gam(y~te(x,z), data=df, method='REML')
plot(df.gam, scheme=2, hcolors=heat.colors(999, rev =T), rug=F)
sample data:
structure(list(x = c(3, 17, 37, 9, 4, 11, 20.5, 11.5, 16, 17,
18, 15, 13, 29.5, 13.5, 25, 15, 13, 20, 20.5, 17, 11, 11, 5,
16, 13, 3.5, 16, 16, 5, 20.5, 2, 20, 9, 23.5, 18, 3.5, 16, 23,
3, 37, 24, 5, 2, 9, 3, 8, 10.5, 37, 3, 9, 11, 10.5, 9, 5.5, 8,
22, 15.5, 18, 15, 3.5, 4.5, 20, 22, 4, 8, 18, 19, 26, 9, 5, 18,
10.5, 30, 15, 13, 27, 19, 5.5, 18, 11.5, 23.5, 2, 25, 30, 17,
18, 5, 16.5, 9, 2, 2, 23, 21, 15.5, 13, 3, 24, 17, 4.5), z = c(144,
59, 66, 99, 136, 46, 76, 87, 54, 59, 46, 96, 38, 101, 84, 64,
92, 56, 69, 76, 93, 109, 46, 124, 54, 98, 131, 89, 69, 124, 105,
120, 69, 99, 84, 75, 129, 69, 74, 112, 66, 78, 118, 120, 103,
116, 98, 57, 66, 116, 108, 95, 57, 41, 20, 89, 61, 61, 82, 52,
129, 119, 69, 61, 136, 98, 94, 70, 77, 108, 118, 94, 105, 52,
52, 38, 73, 59, 110, 97, 87, 84, 119, 64, 68, 93, 94, 9, 96,
103, 119, 119, 74, 52, 95, 56, 112, 78, 93, 119), y = c(96.535,
113.54, 108.17, 104.755, 94.36, 110.74, 112.83, 110.525, 103.645,
117.875, 105.035, 109.62, 105.24, 119.485, 107.52, 107.925, 107.875,
108.015, 115.455, 114.69, 116.715, 103.725, 110.395, 100.42,
108.79, 110.94, 99.13, 110.935, 112.94, 100.785, 110.035, 102.95,
108.42, 109.385, 119.09, 110.93, 99.885, 109.96, 116.575, 100.91,
114.615, 113.87, 103.08, 101.15, 98.68, 101.825, 105.36, 110.045,
118.575, 108.45, 99.21, 109.19, 107.175, 103.14, 94.855, 108.15,
109.345, 110.935, 112.395, 111.13, 95.185, 100.335, 112.105,
111.595, 100.365, 108.75, 116.695, 110.745, 112.455, 104.92,
102.13, 110.905, 107.365, 113.785, 105.595, 107.65, 114.325,
108.195, 96.72, 112.65, 103.81, 115.93, 101.41, 115.455, 108.58,
118.705, 116.465, 96.89, 108.655, 107.225, 101.79, 102.235, 112.08,
109.455, 111.945, 104.11, 94.775, 110.745, 112.44, 102.525)), row.names = c(NA,
-100L), class = "data.frame")
It would be easier (IMHO) to do this reliably within the ggplot2 ecosystem.
I'll show a canned approach using my {gratia} package, but also check out {mgcViz}. I'll also suggest a more generic solution using tools from {gratia} to extract information about your model's smooths, which you can then plot yourself using ggplot().
library('mgcv')
library('gratia')
library('ggplot2')
library('dplyr')
# load your snippet of data via df <- structure( .... )
# then fit your model (note you have y as the response & in the tensor product)
# I assume z is the response below and x and y are coordinates
m <- gam(z ~ te(x, y), data=df, method='REML')
# now visualise the model using {gratia}
draw(m)
This produces:
{gratia}'s draw() methods can't plot everything yet, but where it doesn't work you should still be able to evaluate the data you need using tools in {gratia}, which you can then plot with ggplot() itself by hand.
To get values for your smooths, i.e. the data behind the plots that plot.gam() or draw() display, use gratia::smooth_estimates()
# dist controls what we do with covariate combinations too far
# from support of the data. 0.1 matches mgcv:::plot.gam behaviour
sm <- smooth_estimates(m, dist = 0.1)
yielding
r$> sm
# A tibble: 10,000 × 7
smooth type by est se x y
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 te(x,y) Tensor NA 35.3 11.5 2 94.4
2 te(x,y) Tensor NA 35.5 11.0 2 94.6
3 te(x,y) Tensor NA 35.7 10.6 2 94.9
4 te(x,y) Tensor NA 35.9 10.3 2 95.1
5 te(x,y) Tensor NA 36.2 9.87 2 95.4
6 te(x,y) Tensor NA 36.4 9.49 2 95.6
7 te(x,y) Tensor NA 36.6 9.13 2 95.9
8 te(x,y) Tensor NA 36.8 8.78 2 96.1
9 te(x,y) Tensor NA 37.0 8.45 2 96.4
10 te(x,y) Tensor NA 37.2 8.13 2 96.6
# … with 9,990 more rows
In the output, x and y are a grid of values over the range of both covariates (the number of points in the grid in each covariate is controlled by n such that the grid for a 2d tensor product smooth is of size n by n). est is the estimated value of the smooth at the values of the covariates and se its standard error. For models with multiple smooths, the smooth variable uses the internal label that {mgcv} gives each smooth - these are the labels used in the output you get from calling summary() on your GAM.
We can add a confidence interval if needed using add_confint().
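For example, a minimal sketch (assuming the default 95% coverage; the exact interval column names may differ between {gratia} versions):
sm_ci <- smooth_estimates(m, dist = 0.1) %>%
  add_confint(coverage = 0.95) # appends lower/upper interval columns to the smooth estimates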
Now you can plot your smooth(s) by hand using ggplot(). At this point you have two options:
1. if draw() can handle the type of smooth you want to plot, you can use the draw() method for that object and then build upon it, or
2. plot everything by hand.
Option 1
# evaluate just the smooth you want to plot
smooth_estimates(m, smooth = "te(x,y)", dist = 0.1) %>%
draw() +
geom_point(data = df, alpha = 0.2) # add a point layer for original data
This pretty much gets you what draw() produced when given the model object itself, and you can add to it as if it were a ggplot object (which is not the case for the objects returned by gratia:::draw.gam(), which are wrapped by {patchwork} and need other ways to interact with the plots).
Option 2
Here you are in full control
sm <- smooth_estimates(m, smooth = "te(x,y)", dist = 0.1)
ggplot(sm, aes(x = x, y = y)) +
geom_raster(aes(fill = est)) +
geom_point(data = df, alpha = 0.2) + # add a point layer for original data
scale_fill_viridis_c(option = "plasma")
which produces
A diverging palette is likely better for this, along the lines of the one gratia:::draw.smooth_estimates uses
sm <- smooth_estimates(m, smooth = "te(x,y)", dist = 0.1)
ggplot(sm, aes(x = x, y = y)) +
geom_raster(aes(fill = est)) +
geom_contour(aes(z = est), colour = "black") +
geom_point(data = df, alpha = 0.2) + # add a point layer for original data
scale_fill_distiller(palette = "RdBu", type = "div") +
expand_limits(fill = c(-1,1) * abs(max(sm[["est"]])))
which produces
Finally, if {gratia} can't handle your model, I'd appreciate you filing a bug report here so that I can work on supporting as many model types as possible. But do try {mgcViz} as well for an alternative approach to visualising GAMs fitted using {mgcv}.
A base plot solution would be to use fields::image.plot directly. Unfortunately, it requires data in the classic wide format, not the long format needed by ggplot.
We can facilitate plotting by grabbing the object returned by plot.gam(), and then doing a little manipulation of that object to get what we need for image.plot().
Following on from @Anke's answer, then: instead of plotting with plot.gam() and then using image.plot() to add the legend, we use plot.gam() only to get what we need to plot, and do everything with image.plot().
library(fields) # provides image.plot()
plt <- plot(df.gam)
plt <- plt[[1]] # plot.gam returns a list of n elements, one per plot
# extract the `$fit` variable - this is est from smooth_estimates
fit <- plt$fit
# reshape fit (which is a 1 column matrix) to have dimension 40x40
dim(fit) <- c(40,40)
# plot with image.plot
image.plot(x = plt$x, y = plt$y, z = fit, col = heat.colors(999, rev = TRUE))
contour(x = plt$x, y = plt$y, z = fit, add = TRUE)
box()
This produces:
You could also use the fields::plot.surface() function
l <- list(x = plt$x, y = plt$y, z = fit)
plot.surface(l, type = "C", col = heat.colors(999, rev = TRUE))
box()
This produces:
See ?fields::plot.surface for other arguments to modify the contour plot etc.
As shown, these all have the correct range on the colour bar. It would appear that in @Anke's version the colour bar mapping is off in all of the plots, but mostly only a little, so it isn't as noticeable.
Following Gavin Simpson's answer and this thread (How to add colorbar with perspective plot in R), I think I've come up with a solution that uses plot.gam() (though I really love that {gratia} takes it into a ggplot universe and will definitely look more into that)
require(fields)
df.gam<-gam(y~te(x,z), data=df, method='REML')
sm <- as.data.frame(smooth_estimates(df.gam, dist = 0.1))
plot(df.gam, scheme=2, hcolors=heat.colors(999, rev =T), contour.col='black', rug=F, main='', cex.lab=1.75, cex.axis=1.75)
image.plot(legend.only=T, zlim=range(sm$est), col=heat.colors(999, rev =T), legend.shrink = 0.5, axis.args = list(at =c(-10,-5,0,5, 10, 15, 20)))
I hope I understood correctly that gratia::smooth_estimates() actually pulls out the partial effects.
For my model with multiple terms (and multiple tensor products), this seems to work nicely by indexing the sections of the respective terms in sm. Except for one, where the colorbar and the heatmap aren't quite matching up. I can't provide the actual underlying data, but I've added that plot for illustration in case anyone has any idea. I'm using the same approach as outlined above. In the colorbar, dark red is at 15-20, but in the heatmap the isolines just above 0 already correspond with dark red (while 0 is dark yellowish in the colorbar).

SVD function in R: getting the singular values $d from a list of datasets and putting them in table form

I want to use the svd function to get the singular values of large datasets stored in a list.
When I use the svd function on a single matrix, I am able to use $d to get the values, but for the list I cannot get the output.
Here is the code for a matrix and the output.
tb = matrix(c(64, 112, 59, 174, 111, 37,
39, 135, 115, 92, 161, 70,
93, 119, 50, 142, 20, 114,
149, 191, 62, 17, 145, 21,
60, 37, 29, 74, 42, 242), nrow = 5, ncol = 6, byrow = TRUE)
## Compute SVD of tb
#
my_svd = svd(tb)
## Retrieve (save) singular values
#
sv = my_svd$d
## Compute ratio between "1st/2nd" & "2nd/3rd" singular values
#
ratios = matrix(c(sv[1]/sv[2], sv[2]/sv[3]), nrow = 1)
colnames(ratios) = c("sv1/sv2", "sv2/sv3")
## Print ratios
ratios
How do I apply this to the list of datasets?
My current code:
svdresult <- lapply(d1,svd)
svdresult
d1 is my list of datasets.
How do I get svdresult$d for each dataset in the list?
Thanks in advance
Maybe something like the following?
get_svd_ratios <- function(data) {
sv = svd(data)$d
n = length(sv)
ratios = matrix(sv[1:(n - 1)] / sv[2:n] , nrow = 1)
names = paste(
paste0("sv", 1:(n - 1)),
paste0("sv", 2:n),
sep = "/"
)
colnames(ratios) = names
return(ratios)
}
lapply(list(tb), get_svd_ratios)
# [[1]]
# sv1/sv2 sv2/sv3 sv3/sv4 sv4/sv5
# [1,] 2.261771 1.680403 1.29854 2.682195
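If you then want the ratios from all datasets in a single table, and assuming every element of d1 is a numeric matrix yielding the same number of singular values, you could row-bind the results (a sketch, not tested against your data):
ratio_list <- lapply(d1, get_svd_ratios)
ratio_table <- do.call(rbind, ratio_list) # one row per dataset
rownames(ratio_table) <- names(d1)
ratio_table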

Nested vectors in a function that is applied to a dataframe in R

I have the basic data below (dput provided at the end of the question):
P <- 72
E_c <- 80000
head (deflections)
L I_comp
1 60 17299.21
2 70 25760.94
3 80 32734.69
4 90 51343.59
5 100 60167.30
6 110 64887.87
I want to find the maximum deflection, using the equation:
delta = (P * b * x) / (6 * E_c * I_comp * L) * (L^2 - b^2 - x^2)
where b is defined as b = L - a and a is a vector that can be defined as
a <- seq(10,L,10)
and x is a nested sequence within a such that
x <- seq(1,a,1)
I have written the function below to calculate the maximum deflection delta at all x values for all a values.
delta_less <- function(x){
a_seq <- seq(10,L,10)
L <- x[1]
I_comp <- x[2]
X_seq <- seq(a_seq)
delta <- ((P*(L-a_seq)*X_seq)/(6*E_c*I_comp*L))*(L^2-(L-a_seq)^2-X_seq^2)
c(val = max(delta), pos = which.max(delta))
}
test <- cbind(deflections, vars = t(apply(deflections,1,delta_less)))
The problem is that my code is not recognizing the nested sequence. It only works if I give a_seq a single numerical value.
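For reference, one way to express the nested iteration described above is to loop over a with lapply() and build the x sequence inside each iteration; a minimal sketch using the same formula as in the function above (delta_less_nested is just an illustrative name):
delta_less_nested <- function(row) {
  L <- row[1]
  I_comp <- row[2]
  a_seq <- seq(10, L, 10)
  # evaluate delta at x = 1, ..., a for every a, then flatten into one vector
  delta <- unlist(lapply(a_seq, function(a) {
    x <- seq(1, a, 1)
    (P * (L - a) * x) / (6 * E_c * I_comp * L) * (L^2 - (L - a)^2 - x^2)
  }))
  c(val = max(delta), pos = which.max(delta))
}
test <- cbind(deflections, vars = t(apply(deflections, 1, delta_less_nested)))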
Data provided:
dput(deflections)
structure(list(L = c(60, 70, 80, 90, 100, 110, 120, 130, 140,
60, 70, 80, 90, 100, 110, 120, 130, 140, 60, 70, 80, 90, 100,
110, 120, 130, 140, 60, 70, 80, 90, 100, 110, 120, 130, 140),
I_comp = c(17299.2063000114, 25760.9420686703, 32734.6949444858,
51343.5889683982, 60167.3001695375, 64887.8729936444, 83451.9473750139,
103000.852590734, 112631.744898846, 23539.3443398425, 23544.8558575609,
32808.739059987, 40880.4964809276, 60171.1172692296, 64894.6680546358,
83469.4777437875, 103114.132264558, 120223.534252571, 23539.3443398425,
25772.7904955165, 32810.8231970971, 51345.5936366056, 74673.7079705815,
80752.3694583712, 103114.132264558, 103114.132264558, 147386.853127916,
23539.3443398425, 25767.8092758621, 40881.3376639665, 55608.9342154021,
80694.6568665257, 80752.3694583712, 119068.355471205, 119402.817753462,
147386.853127916)), class = "data.frame", row.names = c(NA,
-36L))

Replicating a repeated measures compound symmetry structure from SAS to R using lme

I'm trying to replicate an analysis in a paper by Milliken (https://sci-hub.tw/10.1016/s0169-7161(03)22007-1, section 8), going from SAS code to R. I'm quite stumped, to be honest. It's a split-plot repeated measures design where the correlation structure is a compound symmetry structure. Below are the data and the SAS code with its results.
Data
library(magrittr)
library(tidyr)
library(dplyr)
dta <- data.frame(
tmp = c(rep(900, 3), rep(1000, 3), rep(1100, 3)),
posit = rep(c("top", "mid", "bot"), 3),
lot_1 = c(189, 211, 178, 213, 220, 197, 194, 212, 189),
lot_2 = c(195, 206, 162, 199, 230, 198, 215, 208, 193),
lot_3 = c(183, 210, 173, 189, 228, 202, 194, 201, 180),
lot_4 = c(187, 223, 181, 183, 221, 168, 232, 215, 192),
lot_5 = c(173, 191, 149, 202, 213, 151, 190, 198, 182)
)
dta <- dta %>%
tidyr::pivot_longer(., cols = c(lot_1, lot_2, lot_3, lot_4, lot_5),
names_to = "Lot") %>%
dplyr::mutate(Lot = as.factor(Lot),
tmp = as.factor(tmp),
lot_tmp = as.factor(paste0(Lot, "-", tmp)))
SAS Code
proc mixed data = dta cl covtest ic;
class Posit temp lot;
model thick = temp Posit Posit*temp/ddfm = kr; random lot;
repeated posit/type = cs subject = lot*temp r rcorr;
run;
Output from SAS
R code attempt
## this works but isn't doing the same thing as above
library(nlme)
m1 <- lme(
  value ~ tmp + posit + tmp:posit,
  random = ~ 1 | Lot,
  correlation = corCompSymm(form = ~ 1 | Lot),
  data = dta, method = "REML"
)
I'm stuck at this point on how to add a repeated structure to the posit factor.
Thank you for the help!

Spectral separability using Jeffries-Matusita distance method in R

I am writing to analyse separability in my data using the J-M (Jeffries-Matusita) distance method in R. The main goal is to calculate the J-M distance between my variables, of which there are more than two.
Assuming I have the following data on reflectance, the main task is to show separability between the four fruit trees at the chosen wavelengths.
orange <- c(37, 27, 45, 30, 57, 48, 34, 50, 20, 53, 33, 25, 51)
lemon <- c(12, 17, 20, 32, 16, 30, 30, 37, 25, 42, 13, 56, 13)
pear <- c(41, 19, 15, 12, 15, 55, 33, 37, 40, 40, 43, 46, 54)
apple <- c(38, 39, 12, 60, 34, 47, 13, 24, 30, 19, 57, 54, 55)
Wavelength <- c(354, 576, 842, 853, 918, 948, 1142, 1221, 1253, 1322, 1545, 1684, 2407)
So you need a dist(...) implementation that accepts an arbitrary distance function, and you need a definition of the JM distance. The latter is available in this post. For the former we use the dist(...) function in the proxy package, which allows specifying an arbitrary function to calculate pairwise distances.
jm.dist <- function(Vector.1, Vector.2) {
  # this function is adapted from:
  # https://stats.stackexchange.com/questions/78849/measure-for-separability
  Matrix.1 <- as.matrix(Vector.1)
  Matrix.2 <- as.matrix(Vector.2)
  mean.Matrix.1 <- mean(Matrix.1)
  mean.Matrix.2 <- mean(Matrix.2)
  mean.difference <- mean.Matrix.1 - mean.Matrix.2
  cv.Matrix.1 <- cov(Matrix.1)
  cv.Matrix.2 <- cov(Matrix.2)
  p <- (cv.Matrix.1 + cv.Matrix.2) / 2
  # calculate the Bhattacharyya index
  bh.distance <- 0.125 * t(mean.difference) * p^(-1) * mean.difference +
    0.5 * log(det(p) / sqrt(det(cv.Matrix.1) * det(cv.Matrix.2)))
  # calculate the Jeffries-Matusita distance
  # the following formula is bounded between 0 and 2.0
  jm.distance <- 2 * (1 - exp(-bh.distance))
  # also found in the bibliography:
  # jm.distance <- 1000 * sqrt(2 * (1 - exp(-bh.distance)))
  # the latter formula is bounded between 0 and 1414.0
  return(jm.distance)
}
df <- data.frame(orange,lemon,pear,apple)
library(proxy)
dist(df,method=jm.dist,by_rows=FALSE)
# orange lemon pear
# lemon 0.24530946
# pear 0.04906073 0.09034789
# apple 0.05878462 0.14807198 0.01435419
Note that once you load the proxy library you've masked the default dist(...) function.
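If you need the result as a full symmetric matrix rather than the lower triangle (for example, to write it out as a table), wrapping the result in as.matrix() should work:
jm <- dist(df, method = jm.dist, by_rows = FALSE)
as.matrix(jm) # 4 x 4 symmetric matrix of pairwise J-M distances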
