Plotting quantile regression estimates as a function of their quantiles in R

The following code is a reproducible example based on the Swiss dataset (datasets::swiss).
I would like to plot the betas (the quantile regression estimates returned by the smrq() function) on the y-axis against the tau values (the quantiles, ranging from 0 to 1) on the x-axis, but so far I have not succeeded. Many thanks for any help; I can edit my post if I have forgotten anything.
Code:
library(quantreg)
library(limma)
#Generalized Functions
minimize.logcosh <- function(par, X, y, tau) {
diff <- y-(X %*% par)
check <- (tau-0.5)*diff+(0.5/0.7)*logcosh(0.7*diff)+0.4
return(sum(check))
}
smrq <- function(X, y, tau){
p <- ncol(X)
op.result <- optim(
rep(0, p),
fn = minimize.logcosh,
method = 'BFGS',
X = X,
y = y,
tau = tau
)
beta <- op.result$par
return(beta)
}
run_smrq <- function(data, fml, response) {
x <- model.matrix(fml, data)[,-1]
y <- data[[response]]
X <- cbind(x, rep(1,nrow(x)))
n <- 99
betas <- sapply(1:n, function(i) smrq(X, y, tau=i/(n+1)))
return(betas)
}
#Callers
swiss <- datasets::swiss
smrq_models <- run_smrq(data=swiss, fml=Fertility~., response="Fertility")
#langtang's solution gives this graphical output:

Without making any comment on the "correctness" of the output of run_smrq(), you can try this:
library(dplyr)
library(tidyr)
library(ggplot2)
as.data.frame(t(smrq_models)) %>%
mutate(q=row_number()) %>%
pivot_longer(!q,names_to="beta",values_to = "coef") %>%
ggplot(aes(q,coef,color=beta)) +
geom_point()
Also, if the betas are on very different scales, this visualization approach may not be the most appropriate. As a starting point, you might add + facet_wrap(~beta, scales="free_y")
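The snippet above puts the row number q on the x-axis rather than the quantile level itself. A minimal base-R sketch of the same idea with tau on the x-axis and one panel per coefficient follows. The matrix `betas` is a hypothetical stand-in for smrq_models (one row per coefficient, one column per tau), and the row names are made up for illustration.

```r
# Hypothetical stand-in for smrq_models: one row per coefficient, one column per tau
n <- 99
tau <- (1:n) / (n + 1)
betas <- rbind(slope_a = qnorm(tau), slope_b = 0.5 * tau, intercept = 60 + 5 * tau^2)

# Long format with the actual quantile level attached to every estimate.
# as.vector() reads the matrix column by column: all coefficients for tau[1],
# then all coefficients for tau[2], and so on.
long <- data.frame(
  tau  = rep(tau, each = nrow(betas)),
  beta = rep(rownames(betas), times = n),
  coef = as.vector(betas)
)

# One panel per coefficient, each with its own y-axis range
op <- par(mfrow = c(1, nrow(betas)))
for (b in rownames(betas)) {
  d <- long[long$beta == b, ]
  plot(d$tau, d$coef, type = "l", xlab = "tau", ylab = "estimate", main = b)
}
par(op)
```

With the real output, the rows of smrq_models follow the column order of X in run_smrq(), so the intercept is the last row.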

Related

Constructing confidence intervals for trimmed means in R

I'd like to test the coverage probabilities for trimmed means. I am using the formula from Wilcox's book for confidence intervals:
Confidence interval
Here s_w is the Winsorised variance and γ is the trimming proportion, which is denoted alpha in my code. The problem is that my code always outputs confidence intervals that contain 0, so the coverage probability is 1. So I think there is some error in the construction.
Code:
sample_var <- function(data, alpha){
n <- length(data)
data <- sort(data)
data_t <- data[(floor(n*alpha)+1):(n-floor(alpha*n))]
m <- length(data_t)
t_mean <- mean(data_t)
sigma <- (1/(1-2*alpha)^2)* ((1/n) *sum((data_t-t_mean)^2)+ alpha*(data_t[1]-t_mean)^2 +
alpha*(data_t[m]-t_mean)^2)
sigma
}
sample_var <- Vectorize(sample_var, vectorize.args = "alpha")
conf_int <- function(data,alpha){
a <- floor(alpha * n)
n <- length(data)
df <- n-2*a-1
data_t <- data[a:(n-a)]
t_mean <- mean(data_t)
t_quantile <- qt(p = alpha, df = df)
sw <- sample_var(data = data, alpha = alpha)
ul <- t_mean + t_quantile * sw / ((1-2*alpha)*sqrt(n))
ll <- t_mean - t_quantile * sw / ((1-2*alpha)*sqrt(n))
c(ll, ul)
}
Maybe someone sees the error?
EDIT:
Here I tried to construct the intervals using the wilcox.test function, but I don't know whether it accurately constructs the interval for the trimmed mean. Furthermore, no matter which alpha I use for the given data set, I get the same interval. So I suppose that the subset argument is wrong.
set_seed(1)
data <- rnorm(100)
wilcox_test <- function(data, alpha){
n <- length(alpha)
a <- floor(alpha*n)+1
b <- n-floor(alpha)
wilcox.test(data, subset = data[a:b], conf.int = TRUE)
}
OK...with rnorm(100) and set.seed(1)
Close-ish...
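set.seed(1) # note: set.seed(), not set_seed(), is what you want here, I think.
data <- rnorm(100)
alpha <- 0.1 # example trimming proportion (an assumed value)
n <- length(data)
a <- floor(alpha*n)+1
b <- n-floor(alpha*n)
# the default wilcox.test() method has no subset argument, so trim the sorted data directly;
# conf.int is a logical flag, the confidence level is set with conf.level
wilcox_test_out <- wilcox.test(sort(data)[a:b], conf.int = TRUE, conf.level = .95)
wilcox_test_out # print the test result
# Note the CI's are in wilcox_test_out$conf.int for further use should you need them
wilcox_test_out$conf.int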
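Note that wilcox.test() reports an interval for the pseudomedian, not for the trimmed mean. The following is a minimal sketch of the trimmed-mean interval described in the question (trimmed mean ± t quantile × s_w / ((1 − 2γ)√n), with n − 2g − 1 degrees of freedom), under the assumption that s_w is the Winsorized standard deviation. The function name `trimmed_ci` and its arguments `gamma` and `conf.level` are hypothetical. Compared with conf_int() in the question, it defines n before using it, sorts the data before trimming, takes the t quantile from the confidence level rather than from the trimming proportion, and uses a standard deviation rather than a variance.

```r
# Sketch of a t-based confidence interval for a trimmed mean.
# gamma is the trimming proportion per tail; conf.level is the coverage level.
trimmed_ci <- function(x, gamma = 0.1, conf.level = 0.95) {
  n  <- length(x)
  g  <- floor(gamma * n)                 # observations trimmed from each tail
  xs <- sort(x)
  kept   <- xs[(g + 1):(n - g)]          # trimmed sample
  t_mean <- mean(kept)
  # Winsorized sample: trimmed values are replaced by the nearest retained value
  xw  <- c(rep(kept[1], g), kept, rep(kept[length(kept)], g))
  s_w <- sd(xw)                          # Winsorized standard deviation
  se   <- s_w / ((1 - 2 * gamma) * sqrt(n))
  crit <- qt(1 - (1 - conf.level) / 2, df = n - 2 * g - 1)
  c(lower = t_mean - crit * se, upper = t_mean + crit * se)
}

set.seed(1)
x <- rnorm(100)
trimmed_ci(x, gamma = 0.1)
```

With gamma = 0 nothing is trimmed and the interval reduces to the ordinary t interval for the mean, which gives a quick sanity check against t.test().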

How to write a function to get the x-value which yield the maximum Y in a loess smooth?

Can anyone help me write a function that returns the x-value of a loess smooth? I tried the following, but it seems wrong. What I want is the x-value that yields the maximum Y of the loess fit. Thanks in advance.
myFmsy<-function(x,y){
model <- loess(y ~ x,span = 0.4)
return(x[which(y==max(y))])
}
The problem is that you are fitting a model and then not using it at all.
The return value of loess is a list (of class "loess") with a member fitted. This is the vector where you want to find the maximum.
myFmsy <- function(x, y){
model <- loess(y ~ x,span = 0.4)
yfit <- model$fitted
x[which(yfit == max(yfit))]
}
set.seed(6589) # Make the results reproducible
x <- rnorm(100)
y <- rnorm(100)
myFmsy(x, y)
#[1] -0.938093
There might be cases where, due to floating-point issues, several values are close to each other, within a given tolerance. The following function checks this and also returns the fitted y and the index ix of where it can be found.
myFmsy2 <- function(x, y, tol = .Machine$double.eps^0.5){
model <- loess(y ~ x,span = 0.4)
yfit <- model$fitted
inx <- which(abs(yfit - max(yfit)) < tol)
list(x = x[inx], y.fitted = yfit[inx], ix = inx)
}
myFmsy2(x, y)
#$`x`
#[1] -0.938093
#
#$y.fitted
#[1] 0.5046313
#
#$ix
#[1] 48
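The functions above search for the maximum only among the observed x values. If the maximum of the smooth between observations is of interest, one option is to evaluate the fit on a fine grid with predict(). This is a minimal sketch of that variant; the grid size of 1000 and the names `grid`, `fit` and `x_at_max` are arbitrary choices for illustration.

```r
set.seed(6589)
x <- rnorm(100)
y <- rnorm(100)

model <- loess(y ~ x, span = 0.4)

# Evaluate the smooth on a fine grid inside the observed range of x
# (loess does not extrapolate by default, so the grid stays within range(x))
grid <- seq(min(x), max(x), length.out = 1000)
fit  <- predict(model, newdata = data.frame(x = grid))

# Grid point at which the fitted curve is highest
x_at_max <- grid[which.max(fit)]
x_at_max
```

The result is only accurate up to the grid spacing, so a finer grid gives a finer answer.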

using events in deSolve to prevent negative state variables, R

I am modeling the population change in a food web of species, using ODEs and deSolve in R. Obviously the populations should not be less than zero. Therefore I have added an event function and run it as below. Although the results differ from those I got when I did not use the event function, it still produces negative values. What is wrong?
#using events in a function to distinguish and address the negative abundances
eventfun <- function(t, y, parms){
y[which(y<0)] <- 0
return(y)
}
# =============================== main code
max.time = 100
start.time = 50
initials <- c(N, R)
#parms <- list(webs=webs, a=a, b=b, h=h, m=m, basals=basals, mu=mu, Y=Y, K=K, no.species=no.species, flow=flow,S=S, neighs=neighs$neighs.per, dispers.maps=dispers.maps)
temp.abund <- ode(y=initials, func=solve.model, times=0:max.time, parms=parms, events = list(func = eventfun, time = 0:max.time))
And here is the ODE function (if it helps in finding the problem):
solve.model <- function(t, y, parms){
y <- ifelse(y<1e-6, 0, y)
with(parms,{
# return from vector form into matrix form for calculations
(R <- as.matrix(y[(max(no.species)*length(no.species)+1):length(y)]))
(N <- matrix(y[1:(max(no.species)*length(no.species))], ncol=length(no.species)))
dy1 <- matrix(nrow=max(no.species), ncol=length(no.species))
dy2 <- matrix(nrow=length(no.species), ncol=1)
no.webs <- length(no.species)
for (i in 1:no.webs){
species <- no.species[i]
(abundance <- N[1:species,i])
adj <- as.matrix(webs[[i]])
a.temp <- a[1:species, 1:species]*adj
b.temp <- b[1:species, 1:species]*adj
h.temp <- h[1:species, 1:species]*adj
(sum.over.preys <- abundance%*%(a.temp*h.temp))
(sum.over.predators <- (a.temp*h.temp)%*%abundance)
#Calculating growth of basal
(basal.growth <- basals[,i]*N[,i]*(mu*R[i]/(K+R[i])-m))
# Calculating growth for non-basal species D
no.basal <- rep(1,len=species)-basals[1:species]
predator.growth<- rep(0, max(no.species))
(predator.growth[1:species] <- ((abundance%*%(a.temp*b.temp))/(1+sum.over.preys)-m*no.basal)*abundance)
predation <- rep(0, max(no.species))
(predation[1:species] <- (((a.temp*b.temp)%*%abundance)/t(1+sum.over.preys))*abundance)
(pop <- basal.growth + predator.growth - predation)
dy1[,i] <- pop
dy2[i] <- 0.0005 #to consider a nearly constant value for the resource
}
#Calculating dispersals .they can be easily replaced
# by adjacency maps of connections between food webs arbitrarily!
disp.left <- dy1*d*dispers.maps$left.immig
disp.left <- disp.left[,neighs[,2]]
disp.right <- dy1*d*dispers.maps$right.immig
disp.right <- disp.right[,neighs[,3]]
emig <- dy1*d*dispers.maps$emigration
mortality <- m*dy1
dy1 <- dy1+disp.left+disp.right-emig
return(list(c(dy1, dy2)))
})
}
thank you so much for your help
I have had success using a similar event function defined like this:
eventfun <- function(t, y, parms){
with(as.list(y), {
y[y < 1e-6] <- 0
return(y)
})
}
I am using a similar event function to the one posted by jjborrelli. I wanted to note that for me it is still showing the ode function returning negative values. However, when ode goes to calculate the next step, it is using 0, and not the negative value shown for the current step, so you can basically ignore the negative values and replace with zeros at the end of the simulation.
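As a minimal sketch of the two ideas above (clamping inside an event function and cleaning the stored output afterwards), here is a toy example. The model `toy_model`, its `loss` parameter and the initial value are hypothetical and are not the food-web model from the question. The event times are the same as the output times, since deSolve requires event times to be contained in times.

```r
library(deSolve)

# Hypothetical toy model: constant loss, so the state would cross zero if left alone
toy_model <- function(t, y, parms) list(-parms$loss * rep(1, length(y)))

# Event function in the style of the answer above: set tiny or negative states to zero
eventfun <- function(t, y, parms) {
  y[y < 1e-6] <- 0
  y
}

times <- seq(0, 10, by = 0.5)
out <- ode(y = c(N = 1), times = times, func = toy_model,
           parms = list(loss = 0.3),
           events = list(func = eventfun, time = times))

# Replace any negative values that remain in the stored output with zeros
clean <- as.data.frame(out)
clean$N <- pmax(clean$N, 0)
```

If the state must never become negative between event times, the derivative function itself has to stop the decline at zero; the event only resets the state at the listed times.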

Adding two random variables via convolution in R

I would like to compute the convolution of two probability distributions in R and I need some help. For the sake of simplicity, let's say I have a variable x that is normally distributed with mean = 1.0 and stdev = 0.5, and y that is log-normally distributed with mean = 1.5 and stdev = 0.75. I want to determine z = x + y. I understand that the distribution of z is not known a priori.
As an aside, the real-world example I am working with requires the addition of two random variables that are distributed according to a number of different distributions.
Does anyone know how to add two random variables by convolving the probability density functions of x and y?
I have tried generating n normally distributed random values (with above parameters) and adding them to n log-normally distributed random values. However, I wish to know if I can use the convolution method instead. Any help would be greatly appreciated.
EDIT
Thank you for these answers. I define a pdf, and try to do the convolution integral, but R complains on the integration step. My pdfs are Log Pearson 3 and are as follows
dlp3 <- function(x, a, b, g) {
p1 <- 1/(x*abs(b) * gamma(a))
p2 <- ((log(x)-g)/b)^(a-1)
p3 <- exp(-1* (log(x)-g) / b)
d <- p1 * p2 * p3
return(d)
}
f.m <- function(x) dlp3(x,3.2594,-0.18218,0.53441)
f.s <- function(x) dlp3(x,9.5645,-0.07676,1.184)
f.t <- function(z) integrate(function(x,z) f.s(z-x)*f.m(x),-Inf,Inf,z)$value
f.t <- Vectorize(f.t)
integrate(f.t, lower = 0, upper = 3.6)
R complains at the last step since the f.t function is bounded and my integration limits are probably not correct. Any ideas on how to solve this?
Here is one way.
f.X <- function(x) dnorm(x,1,0.5) # normal (mu=1, sigma=0.5)
f.Y <- function(y) dlnorm(y,1.5, 0.75) # log-normal (mu=1.5, sigma=0.75)
# convolution integral
f.Z <- function(z) integrate(function(x,z) f.Y(z-x)*f.X(x),-Inf,Inf,z)$value
f.Z <- Vectorize(f.Z) # need to vectorize the resulting fn.
set.seed(1) # for reproducible example
X <- rnorm(1000,1,0.5)
Y <- rlnorm(1000,1.5,0.75)
Z <- X + Y
# compare the methods
hist(Z,freq=F,breaks=50, xlim=c(0,30))
z <- seq(0,50,0.01)
lines(z,f.Z(z),lty=2,col="red")
Same thing using package distr.
library(distr)
N <- Norm(mean=1, sd=0.5) # N is signature for normal dist
L <- Lnorm(meanlog=1.5,sdlog=0.75) # same for log-normal
conv <- convpow(L+N,1) # object of class AbscontDistribution
f.Z <- d(conv) # density function
hist(Z,freq=F,breaks=50, xlim=c(0,30))
z <- seq(0,50,0.01)
lines(z,f.Z(z),lty=2,col="red")
I was having trouble getting integrate() to work for different density parameters, so I came up with an alternative to @jlhoward's using a Riemann approximation:
set.seed(1)
#densities to be convolved. could also put these in the function below
d1 <- function(x) dnorm(x,1,0.5) #
d2 <- function(y) dlnorm(y,1.5, 0.75)
#Riemann approximation of convolution
conv <- function(t, a, b, d) { #a to b needs to cover the range of densities above. d needs to be small for accurate approx.
z <- NA
x <- seq(a, b, d)
for (i in 1:length(t)){
print(i)
z[i] <- sum(d1(x)*d2(t[i]-x)*d)
}
return(z)
}
#check against sampled convolution
X <- rnorm(1000, 1, 0.5)
Y <- rlnorm(1000, 1.5, 0.75)
Z <- X + Y
t <- seq(0, 50, 0.05) #range to evaluate t, smaller increment -> smoother curve
hist(Z, breaks = 50, freq = F, xlim = c(0,30))
lines(t, conv(t, -100, 100, 0.1), type = "s", col = "red")
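A quick way to check a numerical convolution like the ones above is to confirm that the resulting density integrates to roughly 1. This is a minimal sketch of a vectorized Riemann sum with that check; the name `conv_density`, the grid limits and the step sizes are assumptions chosen for this particular normal and log-normal pair.

```r
d1 <- function(x) dnorm(x, 1, 0.5)      # normal, mean 1, sd 0.5
d2 <- function(y) dlnorm(y, 1.5, 0.75)  # log-normal, meanlog 1.5, sdlog 0.75

# Integration grid for x: covers essentially all of the normal's mass
dx <- 0.01
x  <- seq(-4, 6, by = dx)

# Riemann approximation of the convolution integral at each z
conv_density <- function(z) {
  sapply(z, function(zi) sum(d1(x) * d2(zi - x)) * dx)
}

# Evaluate on a wide grid and check that the density integrates to about 1
dz <- 0.05
z  <- seq(-5, 100, by = dz)
fz <- conv_density(z)
total <- sum(fz) * dz
total
```

If total is clearly below 1, the z grid is cutting off part of the distribution or the x grid does not cover the first density.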

Plot 3D plane (true regression surface)

I'm trying to simulate some data (x1 and x2 - my explanatory variables), calculate y using a specified function + random noise and plot the resulting observations AND the true regression surface. Here's what I have so far:
set.seed(1)
library(rgl)
# Simulate some data
x1 <- runif(50)
x2 <- runif(50)
y <- sin(x1)*x2+x1*x2 + rnorm(50, sd=0.3)
# 3D scatterplot of observations
plot3d(x1,x2,y, type="p", col="red", xlab="X1", ylab="X2", zlab="Y", size=5, lwd=15)
Now I'm not sure how I can add the "true" regression plane. I'm basically looking for something like curve() where I can plug in my (true) model formula.
Thanks!
If you wanted a plane, you could use planes3d.
Since your model is not linear, it is not a plane: you can use surface3d instead.
my_surface <- function(f, n=10, ...) {
ranges <- rgl:::.getRanges()
x <- seq(ranges$xlim[1], ranges$xlim[2], length=n)
y <- seq(ranges$ylim[1], ranges$ylim[2], length=n)
z <- outer(x,y,f)
surface3d(x, y, z, ...)
}
library(rgl)
f <- function(x1, x2)
sin(x1) * x2 + x1 * x2
n <- 200
x1 <- 4*runif(n)
x2 <- 4*runif(n)
y <- f(x1, x2) + rnorm(n, sd=0.3)
plot3d(x1,x2,y, type="p", col="red", xlab="X1", ylab="X2", zlab="Y", size=5, lwd=15)
my_surface(f, alpha=.2 )
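The helper above reads the axis ranges through an internal rgl function. A minimal sketch that avoids it is to build the grid from the ranges of the simulated data and evaluate the true function with outer(). The grid size of 30 and the names `gx`, `gy` and `gz` are arbitrary choices for illustration.

```r
library(rgl)

# True regression function from the question
f <- function(x1, x2) sin(x1) * x2 + x1 * x2

set.seed(1)
x1 <- runif(50)
x2 <- runif(50)
y  <- f(x1, x2) + rnorm(50, sd = 0.3)

# Grid over the observed ranges; gz[i, j] is f(gx[i], gy[j])
gx <- seq(min(x1), max(x1), length.out = 30)
gy <- seq(min(x2), max(x2), length.out = 30)
gz <- outer(gx, gy, f)

# Observations plus the semi-transparent true surface
plot3d(x1, x2, y, type = "p", col = "red", xlab = "X1", ylab = "X2", zlab = "Y", size = 5)
surface3d(gx, gy, gz, col = "grey", alpha = 0.3)
```

outer() works here because f is vectorized in both arguments; a function that is not vectorized would need to be wrapped with Vectorize() first.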
Apologies: (I didn't read the question very carefully and now see that I rushed into estimation when you wanted to plot the Truth.)
Here's an approach to estimation followed by surface plotting using loess:
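mod2 <- loess(y~x1+x2)
# expand.grid() builds the full 20 x 20 grid (x1 varies fastest), one prediction per grid point
grd <- expand.grid(x1=seq(range(x1)[1],range(x1)[2],len=20),
x2=seq(range(x2)[1],range(x2)[2],len=20))
# predict() returns a 20 x 20 array for expand.grid input, so flatten it for the data frame
grd$pred <- as.vector(predict(mod2, newdata=grd))
x1 <- unique(grd$x1)
x2 <- unique(grd$x2) # shouldn't have used y
surface3d(x1, x2, z=matrix(grd$pred,length(x1),length(x2)) )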
IRTFM's somewhat imperfect answers above led me to a thread on the R-help mailing list. https://stat.ethz.ch/pipermail/r-help/2013-December/364037.html
I extracted the relevant bits of code and turned them into a function like so:
require(rgl)
pred.surf.3d <- function(df, x.nm,y.nm,z.nm, ...){
x <- df[,x.nm]; y <- df[,y.nm]; z<-df[,z.nm]
fit <- lm(z ~ x + y + x*y + I(x^2) + I(y^2)) # I() is needed to get true quadratic terms in a formula
xnew <- seq(range(x)[1],range(x)[2],len=20)
ynew <- seq(range(y)[1],range(y)[2],len=20)
df <- expand.grid(x=xnew, y=ynew)
df$z <- predict(fit, newdata=df)
surface3d(xnew, ynew, z=matrix(df$z, length(xnew), length(ynew)), ...) # z as a matrix: rows follow xnew, columns follow ynew
}
I may end up bundling this into my CRAN utility package at some point.
In the meantime, I hope you find it useful! (Run it on IRTFM's first code chunk like so:)
pred.surf.3d(data.frame(x1,x2,y),'x1','x2','y')
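In an R model formula, ^ is a formula operator (it expands crossings of terms) rather than arithmetic exponentiation, so a bare x^2 reduces to x and adds no quadratic term. Wrapping the expression in I() makes R treat it arithmetically. A small sketch with simulated data (the variables below are made up for illustration) shows the difference in the number of fitted coefficients:

```r
set.seed(1)
x <- runif(30)
y <- runif(30)
z <- 1 + x^2 + rnorm(30, sd = 0.1)

# Without I(): x^2 and y^2 collapse to x and y, leaving intercept, x, y and x:y
fit_plain <- lm(z ~ x + y + x*y + x^2 + y^2)
names(coef(fit_plain))

# With I(): the squared terms enter the model as separate regressors
fit_quad <- lm(z ~ x + y + x*y + I(x^2) + I(y^2))
names(coef(fit_quad))
```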
