Linear Discriminant Analysis Functions in R

I have the following R code:
ldf <- function(x, prior, mu, covar)
{
  # x: the observation (coerced to a numeric column vector), prior: class prior,
  # mu: class mean vector, covar: (pooled) covariance matrix
  x <- matrix(as.numeric(x), ncol = 1)
  log(prior) - (0.5 * t(mu) %*% solve(covar) %*% mu) + (t(x) %*% solve(covar) %*% mu)
}
I understand that this function calculates the linear discriminant function.
However, I am unsure what the following code is doing. It comes directly after the code above.
I know it is calculating the linear discriminant function for a patient, but I'm not sure what each line means.
id <- 1                                   # row index of the patient to classify
dfs <- rep(0, G)                          # one discriminant score per class (G classes)
for(g in 1:G)
{
  dfs[g] <- ldf(diabetes[id, 2:4], lda.res$prior[g], lda.res$mean[g, ], cov_all)
}
dfs                                       # the G discriminant scores for this patient
levels(diabetes$class)[dfs == max(dfs)]   # the class whose discriminant score is largest
Thank you in advance for any help!
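As an aside: if lda.res was fitted with MASS::lda (an assumption, since the fitting code isn't shown above), the manual calculation can be cross-checked against the package's own predict method, which should pick the same class:
library(MASS)
# Assumes lda.res came from MASS::lda and that diabetes[, 2:4] are the predictors used in that fit
predict(lda.res, newdata = diabetes[id, 2:4])$class  # should match levels(diabetes$class)[dfs == max(dfs)]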

Related

Implement a Monte Carlo Simulation Method to Estimate an Integral in R

I am trying to implement a Monte Carlo simulation method to estimate an integral in R. However, I still get the wrong answer. My code is as follows:
f <- function(x){
  ((cos(x))/x)*exp(log(x)-3)^3
}
t <- integrate(f,0,1)
n <- 10000 # Assume we conduct 10000 simulations
int_gral <- Monte_Car(n)
int_gral
You are not doing Monte Carlo here. Monte Carlo is a simulation method that approximates integrals using sums/means of random variables.
You should do something in this flavor (you may want to verify that it is correct to say that the mean of the f output approximates your integral):
f <- function(n){
  x <- runif(n)
  return(
    ((cos(x))/x)*exp(log(x)-3)^3
  )
}
int_gral <- mean(f(10000))
What your code does is take a number n and return ((cos(n))/n)*exp(log(n)-3)^3; there is no randomness in that.
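As a quick sanity check (just a sketch, reusing integrate() from the question as the reference value):
set.seed(42)                              # arbitrary seed, only for reproducibility
mc_est <- mean(f(10000))                  # Monte-Carlo estimate, with f as redefined above
ref <- integrate(function(x) ((cos(x))/x)*exp(log(x)-3)^3, 0, 1)$value
c(monte_carlo = mc_est, integrate = ref)  # both should be around 2.95e-05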
Update
Now, to get a more precise estimate, you need to replicate this step K times. Rather than using a loop, you can use the replicate function:
K <- 100
dist <- data.frame(
int = replicate(K, mean(f(10000)))
)
You get a distribution of estimators for your integral :
library(ggplot2)
ggplot(dist) + geom_histogram(aes(x = int, y = ..density..))
and you can use mean to get a numerical value:
mean(dist$int)
# [1] 2.95036e-05
You can evaluate the precision of your estimates with
sd(dist$int)
# [1] 2.296033e-07
Here it is small because n is already large, so the first step already gives good precision.
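If you also want a rough interval for the integral rather than just the spread, one option (a sketch, assuming the replicate means are approximately normal, which the CLT makes reasonable here) is:
# mean(dist$int) averages K replicate estimates, so its standard error is sd/sqrt(K)
mean(dist$int) + c(-1.96, 1.96) * sd(dist$int) / sqrt(K)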
I have managed to change the code as follows. Kindly confirm that I am doing the right thing.
Regards.
f <- function(x){
  ((cos(x))/x)*exp(log(x)-3)^3
}
set.seed(234)
n <- 10000
for (i in 1:10000) {
  x <- runif(n)
  I <- sum(f(x))/n
}
I

How to update code to create a function for calculating Welch's test for polynomial trends?

I am trying to reproduce the SPSS output for the significance of a linear trend among means when equal variances are not assumed.
I have gratefully used code from http://www-personal.umich.edu/~gonzo/coursenotes/file3.pdf to create a function for calculating separate variances, which, based on my searching, I understand to be the “equal variances not assumed” output in SPSS.
My problem/goal:
I am only assessing polynomial orthogonal trends (mostly linear). I want to adapt the code creating the function so that the contrast argument can take pre-made contrast matrices rather than manually specifying the coefficients each time (room for typos!).
Code for creating the function from the notes:
sepvarcontrast <- function(dv, group, contrast) {
  means <- c(by(dv, group, mean))
  vars  <- c(by(dv, group, var))
  ns    <- c(by(dv, group, length))
  ihat <- contrast %*% means
  t.denominator <- sqrt(contrast^2 %*% (vars/ns))
  t.welch <- ihat / t.denominator
  num.contrast <- ifelse(is.null(dim(contrast)), 1, dim(contrast)[1])
  df.welch <- rep(0, num.contrast)
  if (is.null(dim(contrast))) contrast <- t(as.matrix(contrast))
  for (i in 1:num.contrast) {
    num <- (contrast[i,]^2 %*% (vars))^2
    den <- sum((contrast[i,]^2 * vars)^2 / (ns-1))
    df.welch[i] <- num/den
  }
  p.welch <- 2*(1 - pt(abs(t.welch), df.welch))
  result <- list(ihat = ihat, se.ihat = t.denominator, t.welch = t.welch,
                 df.welch = df.welch, p.welch = p.welch)
  return(result)
}
I would like to be able to use the function like this:
# Create a polynomial contrast matrix for 5 groups, then save
contr.mat5 <- contr.poly(5)
# Calculate separate variance
sepvarcontrast(dv, group, contrast = contr.mat5)
I have tried those exact commands to see if they would work but receive Error in contrast %*% means : non-conformable arguments. I have played around with the code but I can't get it to work.
All suggestions are appreciated! I am still learning how to create a reprex...
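For what it's worth, the non-conformable error is consistent with the orientation of what contr.poly() returns: it stores one contrast per column, while sepvarcontrast() computes contrast %*% means and so expects one contrast per row. A sketch of one possible fix (not verified against your data) is simply to transpose the matrix before passing it in:
contr.mat5 <- contr.poly(5)
dim(contr.mat5)     # 5 4 -> 5 groups (rows), 4 polynomial contrasts (columns)
dim(t(contr.mat5))  # 4 5 -> conformable with the length-5 means vector inside the function
# sepvarcontrast(dv, group, contrast = t(contr.mat5))  # dv and group as in the question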

Remove linear trend from raster stack R

Trying to remove the linear trend (detrend) from a monthly precipitation raster stack for the US from 1979-2015 (https://www.northwestknowledge.net/metdata/data/monthly/pr_gridMET.nc). Those data are large enough that using them as an example would be unruly here, so I am going to use the data from the raster package for the sake of efficiency. The working model I currently have is to use `raster::calc` on a linear model and pull the residuals. My understanding is that those residuals are the detrended series, but I am not 100% sure that is correct. The code I am using is as follows:
library(raster)
fn <- raster(system.file("external/test.grd", package="raster"))
fn2 <- fn + 1000
fn3 <- fn + 500
fn4 <- fn + 750
fn5 <- fn + 100
fns <- stack(fn, fn2, fn3, fn4, fn5)
time <- 1:nlayers(fns)
# Get residuals to detrend the raw data
get_residuals <- function(x) {
  if (is.na(x[1])) {
    rep(NA, length(x))
  } else {
    m <- lm(x ~ time)
    q <- residuals(m)
    return(q)
  }
}
detrended_fns <- calc(fns, get_residuals) # Create our residual (detrended) time series stack
I feel like I'm missing something here. Can anyone confirm that I'm on the right track? If I'm not, any suggestions on how to properly detrend these data would be helpful. Thanks!
The residuals remove the slope and the intercept and you get anomalies. Perhaps you only want to remove the slope? In that case you could add the intercept to the residuals in get_residuals
q <- residuals(m) + coefficients(m)[1]
Or better:
q <- residuals(m) + predict(m)[1]
That way you use year 1 (and not year 0) as the base, and it would also work if time is, say, 2000:2004.
You could also take the last year, the mid year, or the average as the base.
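For example, a minimal sketch of the "average as base" variant (the name get_residuals_meanbase is just illustrative; it follows the same pattern as get_residuals above):
# Remove the trend but keep each cell on its original level by adding back
# the mean of the fitted values (which equals the cell's mean over time)
get_residuals_meanbase <- function(x) {
  if (is.na(x[1])) {
    rep(NA, length(x))
  } else {
    m <- lm(x ~ time)
    residuals(m) + mean(predict(m))
  }
}
detrended_meanbase <- calc(fns, get_residuals_meanbase)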

nls peak fitting with a mixed normal and lognormal dataset (R)

I'm trying to use nls() to curve-fit a dataset consisting of a mixture of normally and lognormally distributed values. However, the normally distributed subset contains negative values that the lognormal function cannot tolerate. Using nls(), is there a way to constrain the values over which a PORTION of the fitted curve is evaluated? (e.g. let the normal function evaluate across 0 and force the lognormal function to evaluate only for x>0)
Here's the test case I've been playing with:
library(ggplot2)

test <- rnorm(5000, 2, 2)
test2 <- rlnorm(10000, 2, 2)
test3 <- append(test, test2)

bins <- seq(min(test3), 100, .1)
tops <- data.frame(bin=bins, count=NA)
for (i in 1:nrow(tops)) {
  tops[i,2] <- length(test3[which(test3 >= tops[i,1] & test3 < tops[i+1,1])])
}

fit <- nls(count ~ exp(-(bin-n.mu)^2/(2*n.sd^2))/(sqrt(2*pi)*n.sd)*C1 +
             exp(-(log(bin)-l.mu)^2/(2*l.sd^2))/(sqrt(2*pi)*l.sd*bin)*C2,
           data=tops, start=list(n.mu=2, n.sd=2, C1=500, l.mu=2, l.sd=2, C2=1000),
           algorithm="port", trace=T)
coef(fit)

topsfit <- data.frame(bin=seq(-3, 100, 0.1))
topsfit$fit <- predict(fit, newdata=topsfit)

ggplot() + geom_point(data=tops, aes(x=(bins), y=count), shape=1, size=4) +
  geom_path(data=topsfit, aes(x=(bin), y=fit), colour="red", size=1.5)
Very simply, I'm fitting a normal PDF + lognormal PDF. The problem is that log(bin) in the lognormal PDF does not play nicely with negative numbers... but I don't want to crop negative values because that affects the calculations for the underlying, normally distributed values. I just want the lognormal half of my curve to ignore them.
Alternatively, is there a different approach to accomplishing this task that doesn't rely on nls()?
Seems like NO ONE wants to touch this topic, so I'll post the solution that I figured out with the help of a non-internet comrade. The linchpin of my problem was in generating the functions that comprise my curve. Writing the lognormal function separately allows conditional evaluation of the x values, which is what I needed. Once I figured out that nls() operates on vectors and wrote my functions to match, things shaped up quite nicely.
normal <- function(x, mu, sd, C) {
  ans <- vector(length = length(x), mode = "numeric")
  for (i in 1:length(x)) {
    ans[i] <- exp(-(x[i]-mu)^2/(2*sd^2))/(sqrt(2*pi)*sd)*C
  }
  return(ans)
}

lognormal <- function(x, mu, sd, C) {
  ans <- vector(length = length(x), mode = "numeric")
  for (i in 1:length(x)) {
    if (x[i] > 0) {
      ans[i] <- exp(-(log10(x[i])-mu)^2/(2*sd^2))/(sqrt(2*pi)*sd*x[i])*C
    } else {
      ans[i] <- 0
    }
  }
  return(ans)
}

fit <- nls(count ~ normal(bin, n.mu, n.sd, C1) + lognormal(bin, l.mu, l.sd, C2),
           data=tops, start=list(n.mu=30, n.sd=30, C1=5000,
                                 l.mu=4, l.sd=2, C2=5000),
           algorithm="port", trace=T)
...and just like that, you can solve for mixed normal and lognormal distributions.
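A short usage sketch (reusing tops and the plotting approach from the question; it assumes ggplot2 is already loaded) to inspect the refit curve:
coef(fit)                                       # fitted parameters of both components
topsfit <- data.frame(bin = seq(-3, 100, 0.1))
topsfit$fit <- predict(fit, newdata = topsfit)  # lognormal() returns 0 for bin <= 0, so the full grid is fine
ggplot() +
  geom_point(data = tops, aes(x = bin, y = count), shape = 1, size = 4) +
  geom_path(data = topsfit, aes(x = bin, y = fit), colour = "red", size = 1.5)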

Use variable in GLM quasi specification

I'm fitting a GLM to some data, using a quasi-likelihood approach (family=quasi(...)).
I'd like to use a variable, p, in the variance specification, like so:
family = quasi(link=log, variance=mu^p)
This, however, doesn't work (it no longer recognises mu).
Is there any way to get R to just insert the value of p into the expression before it is evaluated, so I can use p instead of a number?
Here's an example that doesn't work:
set.seed(1)
x <- runif(100)
y <- x^2 + 2*x + sin(2*pi*x) + rnorm(100)
fitModel <- function(x, y, p) {
  model <- glm(y ~ x, family=quasi(link=log, variance=mu^p))
  return(model)
}
fitModel(x, y, 2)
Thanks!
The family function does some fancy parsing, which means the paste0 solution suggested in the comments won't work without jumping through considerable hoops. Also, the fit fails if any of the y values are <= 0, so I changed the example a little bit (if you do have negative response values you'll have to think about what you want to do about this ...).
set.seed(1)
x <- seq(2, 10, length=100)
y <- x^2 + 2*x + sin(2*pi*x) + rnorm(100)
What I did was to create a quasi family object, then modify its variance function on the fly.
pfamily <- quasi(link="log", variance="mu")
fitModel <- function(x, y, p) {
  pfamily[["variance"]] <- function(mu) mu^p
  model <- glm(y ~ x, family=pfamily)
  model
}
fitModel(x,y,2)
fitModel(x,y,1)
For what it's worth, this variant should be able to handle arbitrary values of p, so e.g. you can draw a curve over the variance power:
dfun <- function(p) {
  deviance(fitModel(x, y, p))
}
pvec <- seq(0.1, 3, by=0.1)
dvec <- sapply(pvec, dfun)
par(las=1, bty="l")
plot(pvec, dvec, type="b", xlab="variance power", ylab="deviance")
