Integrating in R - r

I am trying to compute in R. I have the following values.
nb <- 100
tb <- 25
ns <- 90
ts <- 15
A0 <- 1
S_norm <- 0.4
R <-tb/ts
y_meas <- (ns-nb/R)/A0
sigma_meas = sqrt(ns+(nb+1)/R^2)/A0
I am very confused on how I can integrate L(psi), say from -10 to 10. Because I am integrating with respect to log A.

You can substitute for logA and for a fixed value of psi you can integrate as follows:
psi <- 5
integrate(function(x) exp(-0.5*(((x/A0)/S_norm)^2 + ((psi-y_meas*A0/exp(x))/sigma_meas)^2)),
-10, 10)
# 0.1775989 with absolute error < 6.6e-05

On top of an excellent answer by #SandipanDey, if you could extend limits to -Infinity...+Infinity, there is a better way to integrate functions with e-x2 kernel: Gauss-Hermite quadrature, and there is an R package for that.
Simple example:
library(gaussquad)
n.quad <- 128 # integration order
# get the particular (weights,abscissas) as data frame
# with 2 observables and n.quad observations
rule <- ghermite.h.quadrature.rules(n.quad, mu = 0.0)[[n.quad]]
# test function - integrate 1 over exp(-x^2) from -Inf to Inf
# should get sqrt(pi) as an answer
f <- function(x) {
1.0
}
q <- ghermite.h.quadrature(f, rule)
print(q - sqrt(pi))

Related

Error in my math formula for implementing CUSUM in R

I'm trying to implement a check for decreasing values of avg temperatures to see when the temperature starts falling. See the chart of temperatures here:
Here is the formula I'm trying to implement:
Here is my code to implement that formula:
temps <- read.delim("temps.txt")
date_avgs <- rowMeans(temps[2:length(temps)], dims=1, na.rm=T)
mu <- 87
threshold <- 86
constant <- 3
date_avgs
S <- 0 * date_avgs
for (i in 2:length(date_avgs)) {
value <- S[i-1] + (mu - date_avgs[i] - constant)
cat("\nvalue", value, "si", date_avgs[i], i)
S[i] <- max(0, value)
if(S[i] >= threshold){
#Once I hit this for the first time, that indicates at this index the temp is decreasing
cat("\nDecreased past my threshold!!!", S[i] ,i)
}
}
But I'm not able to detect the change as I expect. My formula doesn't get over the threshold until index 108, when it should get there around index 60.
Here is the plot of my S (or CUSUM) values:
Any ideas what I'm doing wrong in my formula?
I think the problem is mu <- mean(date_avgs) basically means of all the observations. But mu should be "mean of X if no change". Thus mu should be about 87 but according your code and plotted data seems to be 80 or less.
# simulated data
set.seed(4422)
date_avgs <- c(runif(60, 84, 92), 88-(1:50)-rnorm(50,0,4))
plot(date_avgs)
# setting constants
mu <- 87
threshold <- 86
constant <- 3
# after running for cycle
Index <- match(S[S >= threshold][1], S)
Index
[1] 75
# for data
> date_avgs[74]
[1] 73.41981
# Considering a lower threshold
# (as maximum allowable difference to detect trend 2 * C)
mu <- 87
threshold <- 6 # arbitrary
constant <- 3
# after running for cycle
Index <- match(S[S >= threshold][1], S)
Index
[1] 66
So I think code is fine, maybe the interpretation is not

Optimizing K-means clustering using Genetic Algorithm

I have the following dataset (obtained here):
----------item survivalpoints weight
1 pocketknife 10 1
2 beans 20 5
3 potatoes 15 10
4 unions 2 1
5 sleeping bag 30 7
6 rope 10 5
7 compass 30 1
I can cluster this dataset into three clusters with kmeans() using a binary string as my initial choice of centers. For eg:
## 1 represents the initial centers
chromosome = c(1,1,1,0,0,0,0)
## exclude first column (kmeans only support continous data)
cl <- kmeans(dataset[, -1], dataset[chromosome == 1, -1])
## check the memberships
cl$clusters
# [1] 1 3 3 1 2 1 2
Using this fundamental concept, I tried it out with GA package to conduct the search where I am trying to optimize(minimize) Davies-Bouldin (DB) Index.
library(GA) ## for ga() function
library(clusterSim) ## for index.DB() function
## defining my fitness function (Davies-Bouldin)
DBI <- function(x) {
## converting matrix to vector to access each row
binary_rep <- split(x, row(x))
## evaluate the fitness of each chromsome
for(each in 1:nrow(x){
cl <- kmeans(dataset, dataset[binary_rep[[each]] == 1, -1])
dbi <- index.DB(dataset, cl$cluster, centrotypes = "centroids")
## minimizing db
return(-dbi)
}
}
g<- ga(type = "binary", fitness = DBI, popSize = 100, nBits = nrow(dataset))
Of course (I have no idea what's happening), I received error message of
Warning messages:
Error in row(x) : a matrix-like object is required as argument to 'row'
Here are my questions:
How can correctly use the GA package to solve my problem?
How can I make sure the randomly generated chromosomes contains the same number of 1s which corresponds to k number of clusters (eg. if k=3 then the chromosome must contain exactly three 1s)?
I can't comment on the sense of combining k-means with ga, but I can point out that you had issue in your fitness function. Also, errors are produced when all genes are on or off, so fitness is only calculated when that is not the case:
DBI <- function(x) {
if(sum(x)==nrow(dataset) | sum(x)==0){
score <- 0
} else {
cl <- kmeans(dataset[, -1], dataset[x==1, -1])
dbi <- index.DB(dataset[,-1], cl=cl$cluster, centrotypes = "centroids")
score <- dbi$DB
}
return(score)
}
g <- ga(type = "binary", fitness = DBI, popSize = 100, nBits = nrow(dataset))
plot(g)
g#solution
g#fitnessValue
Looks like several gene combinations produced the same "best" fitness value

Resample a time series in the frequency domain (FFT)

I'm trying to implement the "synthesis equation" from the DSP Guide, equation 8-2, so I can resample a time series in the frequency domain. The way I read the equation, N is the number of output points, and given the loop of k from 0 to N/2, I can only resample to twice the original sampling rate, at most.
I tried writing a quick implementation in R, but the results are not anything close to what I expect. My code:
input <- c(1:9)
nin <- 9
nout <- 17
b <-fft(input)
reals <- Re(b) / (nout / 2)
imags <- Im(b) / (nout / 2)
reals[1] <- reals[1] / 2
reals[(nout/2)] <- reals[(nout/2)] / 2
output <- c(1:nout)
for (i in 1:nout)
{
realSum <- 0
imagSum <- 0
for (k in 1:(nout/2))
{
angle <- 2 * pi * (k-1) * (i-1) / nout
realSum <- realSum + (reals[k] * cos(angle))
imagSum <- imagSum - (imags[k] * sin(angle))
}
output[i] <- (realSum + imagSum)
}
For my input (sampled at say 1 second, resampling to 0.5 second)
[1] 1 2 3 4 5 6 7 8 9
I get the output
[1] -0.7941176 1.5150954 0.7462716 1.5022387 1.6478971 1.8357487
2.4029773 2.1965426 3.1585254 2.6178195 3.7284660 3.3721128 3.8433588 4.6390705
[15] 3.4699088 6.3005605 2.8175240
while my expected output is
[1] 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 8.5 9
What am I doing wrong?
The above code does look like the inverse DFT that you referenced in the article. From your expected desired input/output, it seems like you need a cheap linear interpolation rather than a DFT/Inverse DFT solution.
From your desired input/output relationship statement above, it seems like you have a desired relationship between a time series input and a desired time series output.
A inverse DFT, which you coded, will go from frequency domain to time domain.
From what I could gather from your statements, start with the series [1,1->9], do a forward DFT (to get to frequency domain), then take that result into an inverse DFT (to get back to time domain).
Hopefully this is a hint in the right direction for you.

Solve systems of nonlinear equations in R / BlackScholesMerton Model

I am writing my Masters final project in which I am deriving probability of default using Black Scholes Merton Model.I have got stuck in R code. Mathematically, I want to solve this system of nonlinear equations with the package nleqslv:
library(nleqslv)
T <- 1
D1 <- 20010.75
R <- 0.8516
sigmaS <- .11
SO1 <- 1311.74
fnewton <- function(x){
y <- numeric(2)
d1 <- (log(x[1]/D1)+(R+x[2]^2/2)*T)/x[2]*sqrt(T)
d2 <- d1 - x[2]*sqrt(T)
y[1] <- SO1 - (x[1]*pnorm(d1) - exp(-R*T)*D1*pnorm(d2))
y[2] <- sigmaS*SO1 - pnorm(d1)*x[2]*x[1]
y
}
xstart <- c(1311.74,0.11)
nleqslv(xstart, fnewton, method="Broyden")
# $x
# [1] 1311.74 0.11
# $fvec
# [1] 1311.7400 144.2914
# $termcd
# [1] 6
# $message
# [1] "Jacobian is singular (see allowSingular option)"
# $scalex
# [1] 1 1
# $nfcnt
# [1] 0
# $njcnt
# [1] 1
# $iter
# [1] 1
I have tried this with many values of the 5 inputs( stated above that I have computed for 2 companies for different years), but I am not getting the final values of S0 and sigma V.
I am getting message as "Jacobian is singular (see allowSingular option)" If I allow singular Jacobean using "control=list(trace=1,allowSingular=TRUE)", then also no answer is displayed. I do not know how to obtain the solution of these 2 variables now.
I really don’t know, what I am doing wrong as I oriented my model on Teterevas slides ( on slide no.5 is her model code), who’s presentation is the first result by googeling
https://www.google.de/search?q=moodys+KMV+in+R&rlz=1C1SVED_enDE401DE401&aq=f&oq=moodys+KMV+in+R&aqs=chrome.0.57.13309j0&sourceid=chrome&ie=UTF-8#q=distance+to+default+in+R
q=distance+to+default+in+R
Like me, however more successful, she calculates the Distance to Default risk measure via the Black Scholes Merton approach. In this model, the value of equity (usually represented by the market capitalization, > SO1) can be written as a European call option.
The other variables are:
x[1]: the variable I want to derive, value of total assets
x[2]: the variable I want to derive, volatility of total assets
D1: the book value of debt (19982009)
R: a riskfree interest rate
T: is set to 1 year (time)
sigmaS: estimated (historical) equity volatility
You should be able to use the initial values of SO1 and sigmaS as starting values for nleqslv.
First of all the R code given by Tetereva doesn't seem quite correct (the variable Z should be D1 as you have named it; similar changes for her S0 and D).
I have modified Tetereva's into this:
library(nleqslv)
T <- 1
D1 <- 33404048
R <- 2.32
sigmaS <- .02396919
SO1 <- 4740291 # Ve?
fnewton <- function(x){
y <- numeric(2)
d1 <- (log(x[1]/D1)+(R+x[2]^2/2)*T)/x[2]*sqrt(T)
d2 <- d1 - x[2]*sqrt(T)
y[1] <- SO1 - (x[1]*pnorm(d1) - exp(-R*T)*D1*pnorm(d2))
y[2] <- sigmaS*SO1 - pnorm(d1)*x[2]*x[1]
y
}
xstart <- c(SO1,sigmaS)
nleqslv(xstart, fnewton, method="Broyden",control=list(trace=1))
nleqslv(xstart, fnewton, method="Newton",control=list(trace=1))
which will give the solution given by Tetereva. (I use trace=1 here just to check the iteration steps.)
I believe the value you give for R should be 8.516 and not something else. Using your values for the parameters
T <- 1
D1 <- 20010.75
R <- 8.516 # modified
sigmaS <- .11
SO1 <- 1311.74
like this
xstart <- c(1311.74,0.11)
nleqslv(xstart, fnewton, method="Broyden")
nleqslv(xstart, fnewton, method="Newton")
Then running nleqslv with these values converges very quickly.
If one uses R <- 2.32 (like Tetereva) nleqslv will also converge albeit with more iterations.
I cannot help you with what R should actually be but from Tetereva's presentation I assume R is in percentages. Since I don't have enough knowledge on the Black-Scholes model I can't be of any help for finding out what the correct values are for the various parameters. It's up to you.

Running 'prop.test' multiple times in R

I have some data showing a long list of regions, the population of each region and the number of people in each region with a certain disease. I'm trying to show the confidence intervals for each proportion (but I'm not testing whether the proportions are statistically different).
One approach is to manually calculate the standard errors and confidence intervals but I'd like to use a built-in tool like prop.test, because it has some useful options. However, when I use prop.test with vectors, it runs a chi-square test across all the proportions.
I've solved this with a while loop (see dummy data below), but I sense there must be a better and simpler way to approach this problem. Would apply work here, and how? Thanks!
dat <- data.frame(1:5, c(10, 50, 20, 30, 35))
names(dat) <- c("X", "N")
dat$Prop <- dat$X / dat$N
ConfLower = 0
x = 1
while (x < 6) {
a <- prop.test(dat$X[x], dat$N[x])$conf.int[1]
ConfLower <- c(ConfLower, a)
x <- x + 1
}
ConfUpper = 0
x = 1
while (x < 6) {
a <- prop.test(dat$X[x], dat$N[x])$conf.int[2]
ConfUpper <- c(ConfUpper, a)
x <- x + 1
}
dat$ConfLower <- ConfLower[2:6]
dat$ConfUpper <- ConfUpper[2:6]
Here's an attempt using Map, essentially stolen from a previous answer here:
https://stackoverflow.com/a/15059327/496803
res <- Map(prop.test,dat$X,dat$N)
dat[c("lower","upper")] <- t(sapply(res,"[[","conf.int"))
# X N Prop lower upper
#1 1 10 0.1000000 0.005242302 0.4588460
#2 2 50 0.0400000 0.006958623 0.1485882
#3 3 20 0.1500000 0.039566272 0.3886251
#4 4 30 0.1333333 0.043597084 0.3164238
#5 5 35 0.1428571 0.053814457 0.3104216

Resources