Use of mapply() to prevent double nested loop - r

I am trying to compute the density of a bivariate normal distribution for sets of x and y values. Using mapply(), I want to iterate over a set of means (means, means_2) and over each pair of x and y bounds specified in the lower = and upper = arguments. I want mapply() to stand in for a nested for-loop (one loop for the elements in lower and upper, one for the elements in means, and one for the elements in means_2).
library(mvtnorm)
# Params needed for pmvnorm()
sigma1 <- matrix(c(1, 0.5, 0.5, 2), 2)
means <- seq(from = 0, to = 15, by = 0.5)
means_2 <- seq(from = 10, to = 15, by = 0.5)
mapply(
pmvnorm,
lower = c(
c(-Inf, 7, 10),
c(-Inf, seq(from = -3, to = 4, by = 1))
),
upper = c(
c(7, 10, Inf),
c(seq(from = -3, to = 4, by = 1), Inf)
),
mean = c(
means,
means_2
),
MoreArgs = list(sigma = sigma1, keepAttr = FALSE)
)
However, this produces the following error message:
Error in checkmvArgs(lower = lower, upper = upper, mean = mean, corr = corr, :
‘diag(sigma)’ and ‘lower’ are of different length
For simply calculating the density for one set of x and y values and means, the following code works:
pmvnorm(lower = c(0, 1), upper = c(7, 10),
mean = c(1, 1), sigma = matrix(c(1, 0.5, 0.5, 2), 2), keepAttr = FALSE)
Could someone give me pointers on how to fix this error?
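One way to read the error: mapply() splits atomic vectors element by element, so each pmvnorm() call receives a scalar lower while sigma is 2 x 2. The sketch below first shows that behaviour with a stand-in function, then builds every combination with expand.grid() and passes complete length-2 vectors to pmvnorm(). It assumes the x intervals are bounded by -Inf, 7, 10, Inf and the y intervals by -Inf, -3, ..., 4, Inf. The names arg_len, x_breaks, y_breaks and grid are illustrative, and the mean grid is shortened to keep the run quick.

```r
# Stand-in for pmvnorm(): reports how many elements each call receives.
arg_len <- function(lower, upper) c(lower = length(lower), upper = length(upper))

# Atomic vectors are split element by element: every call sees scalars.
flat <- mapply(arg_len, lower = c(0, 1), upper = c(7, 10))

# Lists are split item by item: the single call sees length-2 vectors.
paired <- mapply(arg_len, lower = list(c(0, 1)), upper = list(c(7, 10)))

# Every combination of x interval, y interval, and the two means.
x_breaks <- c(-Inf, 7, 10, Inf)
y_breaks <- c(-Inf, seq(from = -3, to = 4, by = 1), Inf)
grid <- expand.grid(
  ix = seq_len(length(x_breaks) - 1),     # index of the x interval
  iy = seq_len(length(y_breaks) - 1),     # index of the y interval
  mx = seq(from = 0, to = 15, by = 5),    # shortened version of means
  my = seq(from = 10, to = 15, by = 2.5)  # shortened version of means_2
)

# Only evaluated when mvtnorm is installed.
if (requireNamespace("mvtnorm", quietly = TRUE)) {
  sigma1 <- matrix(c(1, 0.5, 0.5, 2), 2)
  grid$prob <- mapply(
    function(ix, iy, mx, my) {
      mvtnorm::pmvnorm(
        lower = c(x_breaks[ix], y_breaks[iy]),
        upper = c(x_breaks[ix + 1], y_breaks[iy + 1]),
        mean = c(mx, my), sigma = sigma1, keepAttr = FALSE
      )
    },
    grid$ix, grid$iy, grid$mx, grid$my
  )
}
```

Each row of grid then holds one rectangle, one pair of means, and (if mvtnorm is available) the corresponding probability. Restoring the original by = 0.5 steps gives the full set of combinations at the cost of more pmvnorm() calls.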

Related

Prp plot - Coloring positive and negative values differently

I am fitting regression trees via the function rpart(). Given my data, I am going to have both positive and negative estimates in nodes. Is there a way to color them differently?
In particular, what I would like to have is a tree whose nodes are shaded in blue for negative values and in red for positive values, where darker colors signal stronger absolute values.
I attach a minimal reproducible example.
library(rpart)
library(rpart.plot)
# Simulating data.
set.seed(1986)
X = matrix(rnorm(2000, 0, 1), nrow = 1000, ncol = 2)
epsilon = matrix(rnorm(1000, 0, 0.01), nrow = 1000)
y = X[, 1] + X[, 2] + epsilon
dta = data.frame(X, y)
# Fitting regression tree.
my.tree = rpart(y ~ X1 + X2, data = dta, method = "anova", maxdepth = 3)
# Plotting.
prp(my.tree,
type = 2,
clip.right.labs = FALSE,
extra = 101,
under = FALSE,
under.cex = 1,
fallen.leaves = TRUE,
box.palette = "BuRd",
branch = 1,
round = 0,
leaf.round = 0,
prefix = "" ,
main = "",
cex.main = 1.5,
branch.col = "gray",
branch.lwd = 3)
# Repeating, with median(y) != 0.
X = matrix(rnorm(2000, 5, 1), nrow = 1000, ncol = 2)
epsilon = matrix(rnorm(1000, 0, 0.01), nrow = 1000)
y = X[, 1] + X[, 2] + epsilon
dta = data.frame(X, y)
my.tree = rpart(y ~ X1 + X2, data = dta, method = "anova", maxdepth = 3)
# HERE I NEED HELP!
prp(my.tree,
type = 2,
clip.right.labs = FALSE,
extra = 101,
under = FALSE,
under.cex = 1,
fallen.leaves = TRUE,
box.palette = "BuRd",
branch = 1,
round = 0,
leaf.round = 0,
prefix = "" ,
main = "",
cex.main = 1.5,
branch.col = "gray",
branch.lwd = 3)
As far as I understood, thanks to the box.palette option, I obtained the result I need in the first setting because median(y) is close to zero.
In the second setting, however, I am not getting what I want: I get blue shades for values below median(y) and red shades for values above it. How can I impose zero as the threshold between the two colors?
To be more specific, I would like a command that automatically enforces this two-color scheme in any tree.
OK, I answered my own question. The solution is actually quite simple: if the box.palette option is a two-color diverging palette (as in my example), we can use pal.thresh to set the threshold we want. In my case:
prp(my.tree,
type = 2,
clip.right.labs = FALSE,
extra = 101,
under = FALSE,
under.cex = 1,
fallen.leaves = TRUE,
box.palette = "BuRd",
branch = 1,
round = 0,
leaf.round = 0,
prefix = "" ,
main = "",
cex.main = 1.5,
branch.col = "gray",
branch.lwd = 3,
pal.thresh = 0) # HERE THE SOLUTION!
Even if this is probably bad for me, I will leave the answer here for future users and close the question rather than deleting it.

Creating a function to loop over data frame to create distributions of significant correlations in R

I have trouble creating a function that is too complex for my R knowledge and I'd appreciate any help.
I have a data set (DRC_epi) consisting of ~800,000 columns of epigenetic data. I'd like to randomly draw 1000 samples consisting of 500 column names each:
set.seed(42)
y <- replicate(1000, {
names(DRC_epi[, sample(ncol(DRC_epi), 500, replace = TRUE)])
})
I want to use these samples to select columns from a different data frame (DRC_epi_pheno) and then compute correlations with my outcome variable of interest (phenotype_aas). For the first subsample it would look like this:
library(tidyverse)
library(correlation)
DRC_cor_sign_1 <- DRC_epi_pheno %>%
select(phenotype_aas, any_of(y[,1])) %>%
correlation(method = "spearman", p_adjust = "fdr") %>%
filter(Parameter1 %in% "phenotype_aas") %>%
filter(p <= 0.05) %>%
select(Parameter1, Parameter2, p)
From this result, I want to store the percentage of significant results in an object:
# nrow() counts the significant correlations; length() would count the columns
percentage <- nrow(DRC_cor_sign_1) / 500 * 100
The question I have now is, how can I put it all together and automate it, so that I don't have to run the analyses 1000 times manually?
So that you have an idea of my data, I create here a toy data set that is similar to my real data set:
set.seed(42)
DRC_epi <- data.frame("cg1" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg2" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg3" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg4" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg5" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg6" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg7" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg8" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg9" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg10" = rnorm(n = 10, mean = 1, sd = 1.5))
DRC_epi_pheno <- cbind(DRC_epi, phenotype_aas = sample(x = 0:40, size = 10, replace = TRUE))
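A minimal sketch of how the steps above could be wrapped in a function and repeated over all samples. It swaps the correlation package for base R (cor.test() and p.adjust()), so the FDR adjustment here covers only the correlations with the outcome and may differ from what correlation() does. The helper pct_significant and the small n_draws and n_cols values are hypothetical, chosen to fit the toy data.

```r
set.seed(42)

# Toy data with the same shape as in the question: 10 rows, cg1..cg10.
DRC_epi <- as.data.frame(
  matrix(rnorm(100, mean = 1, sd = 1.5), nrow = 10,
         dimnames = list(NULL, paste0("cg", 1:10)))
)
DRC_epi_pheno <- cbind(DRC_epi,
                       phenotype_aas = sample(x = 0:40, size = 10, replace = TRUE))

# Hypothetical helper: percentage of the given columns whose Spearman
# correlation with the outcome is significant after FDR adjustment.
pct_significant <- function(cols, data, outcome = "phenotype_aas", alpha = 0.05) {
  cols <- intersect(cols, names(data))  # drops duplicates and unknown names
  p <- vapply(cols, function(cl) {
    # Ties trigger a warning about exact p-values; silenced here.
    suppressWarnings(
      cor.test(data[[cl]], data[[outcome]], method = "spearman")$p.value
    )
  }, numeric(1))
  100 * mean(p.adjust(p, method = "fdr") <= alpha)
}

# Draw the samples of column names (20 draws of 5 instead of 1000 of 500).
n_draws <- 20
n_cols <- 5
samples <- replicate(n_draws, sample(names(DRC_epi), n_cols), simplify = FALSE)

# One percentage per sample.
percentage <- vapply(samples, pct_significant, numeric(1), data = DRC_epi_pheno)
```

A call such as hist(percentage) would then show the distribution of the share of significant correlations across draws.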

Error while plotting ODE phase portrait in R package phaseR

I am trying to plot a two-dimensional phase portrait in R using the phaseR package. This is an example of what I want to do:
Example that works
library(phaseR)
lotkaVolterra <- function(t, y, parameters) {
x <- y[1]
y <- y[2]
lambda <- parameters[1]
epsilon <- parameters[2]
eta <- parameters[3]
delta <- parameters[4]
dy <- numeric(2)
dy[1] <- lambda*x - epsilon*x*y
dy[2] <- eta*x*y - delta*y
list(dy)
}
then when I plot it I get
lotkaVolterra.flowField <- flowField(lotkaVolterra, x.lim = c(0, 5), y.lim = c(0, 10), parameters = c(2, 1, 3, 2), points = 19, add = FALSE)
grid()
lotkaVolterra.nullclines <- nullclines(lotkaVolterra, x.lim = c(-1, 5), y.lim = c(-1, 10), parameters = c(2, 1, 3, 2), points = 500)
y0 <- matrix(c(1, 2, 2, 2, 3, 4), ncol = 2, nrow = 3, byrow = TRUE)
lotkaVolterra.trajectory <- trajectory(lotkaVolterra, y0 = y0, t.end = 10, parameters = c(2, 1, 3, 2), colour = rep("black", 3))
this is the plot I get:
The problem
When I try to do the same with my own equations, however, the vector field does not appear:
WalpeFun <- function(t, y, parameters) {
x <- y[1]
y <- y[2]
k <- parameters[1]
z <- parameters[2]
w <- parameters[3]
b <- parameters[4]
d <- parameters[5]
v <- parameters[6]
a <- parameters[7]
g <- parameters[8]
l <- parameters[9]
e <- parameters[10]
dy <- numeric(2)
dy[1] <- 2.5*(1-(x/k)^z)+g*l+w*e - b*(x*y/d^2+y^2)
dy[2] <- 2.5 * (1 - (y/x + v)^a)
list(dy)
}
Walpe.flowField <-flowField(WalpeFun, x.lim = c(0, 150), y.lim = c(-1, 50), parameters = c(120.73851, 0.51786, -0.75178, 0.00100, 1.00000, 500, 0.001, 0.01102, 320.995455, 5.582273) , points = 20, add = FALSE)
grid()
Walpe.nullclines <-nullclines(WalpeFun, x.lim = c(0, 150), y.lim = c(-1, 50), parameters = c(120.73851, 0.51786, -0.75178, 0.00100, 1.00000, 500, 0.001, 0.01102, 320.995455, 5.582273))
y0 <- matrix(c(8.2, 2), ncol = 2, nrow = 1, byrow = TRUE)
Walpe.trajectory <-trajectory(WalpeFun, y0 = y0, t.end = 100, parameters = c(120.73851, 0.51786, -0.75178, 0.00100, 1.00000, 500, 0.001, 0.01102, 320.995455, 5.582273),system = "two.dim", colour = "black")
I get this very different plot:
and get the following error:
Error in if ((dx[i, j] != 0) & (dy[i, j] != 0)) { : missing value where TRUE/FALSE needed
I don't understand why the vectors don't show, or why the blue nullcline is missing.
Mathematically your x.lim range exceeds the domain where the function can have a value. Because your dy[2] expression has x in the denominator of one of its terms, the function blows up at x == 0 and then there will be an NA in the dy[]-matrix that is internal to the function code. (There's a bit of an ambiguity in that your dy-object is a 2 element vector whereas looking at the code, the calculations are being stored in 2d-matrices named dx and dy.)
flowField #look at the code
png()
Walpe.flowField <-flowField(WalpeFun, x.lim = c(0.01, 150), y.lim = c(-1, 50), parameters = c(120.73851, 0.51786, -0.75178, 0.00100, 1.00000, 500, 0.001, 0.01102, 320.995455, 5.582273) , points = 20, add = FALSE, system="two.dim")
Walpe.nullclines <-nullclines(WalpeFun, x.lim = c(0.01, 150), y.lim = c(-1, 50), parameters = c(120.73851, 0.51786, -0.75178, 0.00100, 1.00000, 500, 0.001, 0.01102, 320.995455, 5.582273))
y0 <- matrix(c(8.2, 2), ncol = 2, nrow = 1, byrow = TRUE)
Walpe.trajectory <-trajectory(WalpeFun, y0 = y0, t.end = 100, parameters = c(120.73851, 0.51786, -0.75178, 0.00100, 1.00000, 500, 0.001, 0.01102, 320.995455, 5.582273),system = "two.dim", colour = "black")
dev.off()
I don't know why the nullclines don't appear, but I'm guessing there are features of the function that neither of us understands.
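The division by x can be checked without phaseR. The sketch below re-implements only the second component of WalpeFun with v = 500 and a = 0.001 from the question, and evaluates it on a 20 x 20 grid over the same limits. The helper dy2 is hypothetical, and the grid is only assumed to resemble the one flowField() builds internally.

```r
# Second component of WalpeFun, with v and a taken from the question.
dy2 <- function(x, y, v = 500, a = 0.001) 2.5 * (1 - (y / x + v)^a)

at_origin <- dy2(0, 0)     # 0 / 0 is NaN, which propagates to the result
shifted   <- dy2(0.01, 0)  # finite once x is moved away from zero

# Evaluate on a grid starting at x = 0 and on one starting at x = 0.01.
ys <- seq(-1, 50, length.out = 20)
bad_grid  <- outer(seq(0, 150, length.out = 20), ys, dy2)
good_grid <- outer(seq(0.01, 150, length.out = 20), ys, dy2)

n_bad  <- sum(!is.finite(bad_grid))   # non-finite entries with x.lim = c(0, 150)
n_good <- sum(!is.finite(good_grid))  # non-finite entries with x.lim = c(0.01, 150)
```

Any non-finite entry in the first grid sits on the x = 0 edge, which is consistent with moving the lower x limit to 0.01.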

R Package Deepnet: Why sae_dnn_train does not work with large data sets

I am trying sae.dnn.train() with 5000 cases, 55 inputs and 3 hidden layers.
Why does nn.predict() return a vector of NaN?
I am using the following commands:
library(deepnet)
Nrow <- 5000
Ncol <- 55
v <- c(rnorm(Nrow*Ncol,1, 0.5))
x <- matrix(v, nrow=Nrow, ncol=Ncol)
y <- c(rep(1, Nrow/2), rep(0, Nrow/2))
dnn <- sae.dnn.train(x, y, hidden = c(100,90,80),activationfun = "tanh", learningrate = 0.6, momentum = 0.5, learningrate_scale = 1.0,output = "sigm", sae_output = "linear", numepochs = 10, batchsize = 100, hidden_dropout = 0, visible_dropout = 0)
yy <- nn.predict(dnn, x)

Changing the scales in a wireframe()

In the following 3D plot I want my 'percentile' axis and 'phi' axis to be scaled from 0 to 1 instead of 0 to 10. I would appreciate it if anyone could help me with this:
x1<-c(13,27,41,55,69,83,97,111,125,139)
x2<-c(27,55,83,111,139,166,194,222,250,278)
x3<-c(41,83,125,166,208,250,292,333,375,417)
x4<-c(55,111,166,222,278,333,389,445,500,556)
x5<-c(69,139,208,278,347,417,487,556,626,695)
x6<-c(83,166,250,333,417,500,584,667,751,834)
x7<-c(97,194,292,389,487,584,681,779,876,974)
x8<-c(111,222,333,445,556,667,779,890,1001,1113)
x9<-c(125,250,375,500,626,751,876,1001,1127,1252)
x10<-c(139,278,417,556,695,834,974,1113,1252,1391)
df<-data.frame(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10)
df.matrix<-as.matrix(df)
library(lattice)
wireframe(df.matrix,
aspect = c(61/87, 0.4),
scales = list(arrows = FALSE, cex = .5, tick.number = 10, z = list(arrows = TRUE)),
ylim = 1:10,
xlab = expression(phi1),
ylab = "Percentile", zlab = " Loss", main = "Random Classifier",
light.source = c(10, 10, 10), drape = TRUE,
col.regions = rainbow(100, s = 1, v = 1, start = 0, end = max(1, 100 - 1)/100, alpha = 1),
screen = list(z = -60, x = -60))
I tried this (following this post):
x <- data.frame(z = as.vector(df.matrix))
x$x <- rep(seq(0, 1, length.out = 10), 10)
x$y <- rep(seq(0, 1, length.out = 10), each = 10)  # each = 10 so x and y form a full 10 x 10 grid
wireframe(z ~ x * y, x,
aspect = c(61/87, 0.4),
scales = list(arrows=FALSE,cex=.5,tick.number = 10, z = list(arrows=T)),
# ylim = 1:10,
xlab=expression(phi1),
ylab="Percentile",zlab=" Loss",main="Random Classifier",
light.source = c(10,10,10), drape=T,
col.regions = rainbow(100, s = 1, v = 1, start = 0, end = max(1,100 - 1)/100, alpha = 1),
screen=list(z=-60,x=-60))
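A minimal sketch of the rescaling idea, assuming the goal is only to relabel both horizontal axes from 0 to 1. expand.grid() varies its first argument fastest, which matches the column-major order of as.vector() on a matrix, so each z value stays attached to its original row and column. The matrix z_mat is a stand-in for df.matrix, and the name grid is illustrative.

```r
library(lattice)

# Stand-in for df.matrix: a 10 x 10 surface.
z_mat <- outer(1:10, 1:10) * 13.9

# One (x, y) pair per matrix cell, both rescaled to [0, 1].
grid <- expand.grid(
  x = seq(0, 1, length.out = nrow(z_mat)),  # rows of the matrix (phi axis)
  y = seq(0, 1, length.out = ncol(z_mat))   # columns (percentile axis)
)
grid$z <- as.vector(z_mat)

# Build the plot object; print(p) draws it.
p <- wireframe(z ~ x * y, data = grid,
               scales = list(arrows = FALSE, cex = 0.5, tick.number = 10),
               xlab = expression(phi1), ylab = "Percentile", zlab = "Loss",
               drape = TRUE, screen = list(z = -60, x = -60))
```

The remaining arguments from the original call (aspect, light.source, col.regions, main) can be added back unchanged.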
