I have the following question regarding the mc2d package for Monte Carlo simulations.
Given an mc node, i.e. an mc object, how can we get the uncertainty for the values of the distribution?
For instance, as an input distribution I am using a uniform distribution where the min is, e.g., equal to 2 and the max equal to 8. Given this, we produce an mcnode and apply mc() to it to obtain the mc object.
The summary function produces values such as the median, the mean, the 97.5% quantile, etc.
But as I said, how can we get an estimate of the uncertainty for a given value?
Thanks in advance!
Well, you'd have to collect the second moment.
Then
v = <x^2> - <x>^2
u = sqrt(v)/sqrt(N-1)
a = <x> +-u
To make things clearer: you sample events
x = 2 + (8-2)*U(0,1)
and somewhere in the summary function you compute the sum of the events
m = m + x
so after running N events you report mean=m/N
You have to add code to collect the second moment, something like
m2 = m2 + x*x
So after the run you could compute
v = m2/N - mean*mean
u = sqrt(v)/sqrt(N-1)
and report the mean with its uncertainty as mean +- u.
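To see this end to end, here is a minimal sketch in plain R (not mc2d-specific; the names and sample size are just for illustration), estimating the mean of the Uniform(2, 8) input together with its uncertainty:
# Minimal sketch: Monte Carlo mean of Uniform(2, 8) plus an uncertainty estimate
set.seed(1)
N <- 10000
x <- runif(N, min = 2, max = 8)   # x = 2 + (8 - 2) * U(0, 1)
m  <- sum(x) / N                  # first moment: the Monte Carlo mean
m2 <- sum(x * x) / N              # second moment
v  <- m2 - m * m                  # variance estimate
u  <- sqrt(v) / sqrt(N - 1)       # uncertainty (standard error) of the mean
c(mean = m, uncertainty = u)      # report mean +- u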
Let
dXt = -Yt dt +cos(Xt + Yt)/(sqrt(t+1))*dW1_t
dYt = Xt dt +sin(Xt + Yt)/(sqrt(t+1))*dW2_t
X0=1, Y0=1, T=1,
(W1_t, W2_t) is a Brownian Motion in dimension 2.
Could someone tell me how I could implement this system of SDEs in R?
I tried it with
library(Sim.DiffProc)
set.seed(1234)
fx <- expression(-y , x )
gx <- expression(cos(x+y)/sqrt(t+1),sin(x+y)/sqrt(t+1))
mod2d1 <-snssde2d(drift=fx,diffusion=gx,x0=c(x0=1,y0=1),M=1000)
The default solver in this package is Euler-Maruyama. What I need to calculate is the expected value E[(X_T)^2 + (Y_T)^2] with different discretizations, and then to compare it with the analytic solution for this expected value. My question is: how can I calculate this expected value of the sum of squares of the processes? summary(mod2d1) gives me the mean, but how can I get the result that I want? Do I have to square the expressions before passing them to the solver?
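One way (a sketch, assuming the snssde2d object stores the simulated paths in mod2d1$X and mod2d1$Y, one column per trajectory) is to take the terminal values of the M paths and average the sum of squares; you do not need to square the expressions before passing them to the solver:
XT <- mod2d1$X[nrow(mod2d1$X), ]   # values of X at the final time T = 1, one per path
YT <- mod2d1$Y[nrow(mod2d1$Y), ]   # values of Y at the final time T = 1, one per path
mean(XT^2 + YT^2)                  # Monte Carlo estimate of E[(X_T)^2 + (Y_T)^2]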
I have no sample and I'd like to compute the variance, mean, median, and mode of a distribution for which I only have a vector with its density and a vector with its support. Is there an easy way to compute these statistics in R with this information?
Suppose that I only have the following information:
Support
Density
sum(Density) == 1 #TRUE
length(Support)==length(Density)# TRUE
You have to do weighted summations
For example, starting with @Johann's example:
set.seed(312345)
x = rnorm(1000, mean=10, sd=1)
x_support = density(x)$x
x_density = density(x)$y
plot(x_support, x_density)
mean(x)
prints
[1] 10.00558
and what, I believe, you're looking for
m = weighted.mean(x_support, x_density)
computes the mean as a weighted mean of the values, producing the output
10.0055796130192
There are weighted.sd and weighted.sum functions in contributed packages which should help you with the other quantities you're looking for.
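If you prefer to stay in base R, here is a minimal sketch of the other summaries computed directly from x_support and x_density (the median and mode are simple grid approximations):
w <- x_density / sum(x_density)                      # normalise the density to weights
m <- sum(w * x_support)                              # weighted mean
v <- sum(w * (x_support - m)^2)                      # weighted variance
med    <- x_support[which.min(abs(cumsum(w) - 0.5))] # approximate weighted median
mode_x <- x_support[which.max(x_density)]            # mode: support point with the highest density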
If you don't need a mathematical solution, and an empirical one is all right, you can achieve a pretty good approximation by sampling.
Let's generate some data:
set.seed(6854684)
x = rnorm(50,mean=10,sd=1)
x_support = density(x)$x
x_density = density(x)$y
# see our example:
plot(x_support, x_density )
# the real mean of x
mean(x)
Now to 'reverse' the process we generate a large sample from that density distribution:
x_sampled = sample(x = x_support, 1000000, replace = T, prob = x_density)
# get the statistics
mean(x_sampled)
median(x_sampled)
var(x_sampled)
etc...
I've been running estimations in R by fitting a curve to a price series. I want to evaluate the fitness of the curve by making very small changes to the key parameters m and omega at their optimum values. To do that I want to see how the sum of squared residuals changes at the optimum. I defined the function for residuals as below:
# Define function for sum of squared residuals, to evaluate the fitness of parameters m and omega
residuals <- function(m, omega, tc) {
lm.result <- LPPL(rTicker, m, omega, tc)
return(sum((FittedLPPL(rTicker, lm.result, m, omega, tc) - rTicker$Close) ** 2))
}
I can then yield an absolute value for the SSR at the optimum as follows:
#To return value of SSR
residvalue <- residuals(m, omega,tc)
What I want to do is repeat this code over a sequence of values for m (and then omega).
For instance, if the optimum is m = 0.5, I want to run this code to calculate the object 'residvalue' for a sequence of m values between 0 and 1 with an interval size of 0.01 (i.e. run it 100 times for 100 different SSR values). I would then like to store the resulting SSR values in a vector (which I can then turn into a data frame of observations). This seems like a trivial task, but I'm not sure how to go about doing it. Any help would be appreciated.
You could use sapply:
sapply(seq(0,1,0.01),function(m) residuals(m,omega,tc))
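For instance, here is a sketch that keeps the grid of m values next to the SSRs and turns them into a data frame (residuals(), omega, and tc are the objects from the question):
m_seq  <- seq(0, 1, 0.01)                                    # candidate values for m
ssr    <- sapply(m_seq, function(m) residuals(m, omega, tc)) # SSR at each candidate m
ssr_df <- data.frame(m = m_seq, SSR = ssr)                   # one observation per m value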
I can't seem to find the correct way to simulate an AR(1) time series with a mean that is not zero.
I need 53 data points, rho = .8, mean = 300.
However, arima.sim(list(order=c(1,0,0), ar=.8), n=53, mean=300, sd=21)
gives me values in the 1500s. For example:
1480.099 1480.518 1501.794 1509.464 1499.965 1489.545 1482.367 1505.103 (and so on)
I have also tried arima.sim(n=52, model=list(ar=c(.8)), start.innov=300, n.start=1)
but then it just counts down like this:
[1] 238.81775870 190.19203239 151.91292491 122.09682547  96.27074057
[6]  77.17105923  63.15148491  50.04211711  39.68465916  32.46837830
[11] 24.78357345  21.27437183  15.93486092  13.40199333  10.99762449
[16]  8.70208879   5.62264196   3.15086491   2.13809323   1.30009732
and I have tried arima.sim(list(order=c(1,0,0), ar=.8), n=53,sd=21) + 300 which seems to give a correct answer. For example:
280.6420 247.3219 292.4309 289.8923 261.5347 279.6198 290.6622 295.0501
264.4233 273.8532 261.9590 278.0217 300.6825 291.4469 291.5964 293.5710
285.0330 274.5732 285.2396 298.0211 319.9195 324.0424 342.2192 353.8149
and so on..
However, I doubt whether this is doing the correct thing. Is it still autocorrelated around the correct level?
Your last option is okay to get the desired mean, "mu". It generates data from the model:
(y[t] - mu) = phi * (y[t-1] - mu) + epsilon[t],    epsilon[t] ~ N(0, sigma = 21),    t = 1, 2, ..., n.
Your first approach sets an intercept, "alpha", rather than a mean:
y[t] = alpha + phi * y[t-1] + epsilon[t].
With alpha = 300 and phi = 0.8, the stationary mean is alpha / (1 - phi) = 300 / 0.2 = 1500, which is why your simulated values are around 1500.
Your second option sets the starting value y[0] equal to 300. As long as |phi| < 1 the influence of this initial value will vanish after a few periods and will have no effect on the level of the series.
Edit
The value of the standard deviation that you observe in the simulated data is correct. Be aware that the variance of the AR(1) process, y[t], is not equal to the variance of the innovations, epsilon[t]. The variance of the AR(1) process, sigma^2_y, can be obtained as follows:
Var(y[t]) = Var(alpha) + phi^2 * Var(y[t-1]) + Var(epsilon[t])
As the process is stationary, Var(y[t]) = Var(y[t-1]), which we call sigma^2_y. Thus, we get:
sigma^2_y = 0 + phi^2 * sigma^2_y + sigma^2_epsilon
sigma^2_y = sigma^2_epsilon / (1 - phi^2)
For the values of the parameters that you are using you have:
sigma_y = sqrt(21^2 / (1 - 0.8^2)) = 35.
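A quick simulation check of that formula (a sketch using the same parameter values as in the question):
set.seed(123)
y <- arima.sim(list(order = c(1, 0, 0), ar = 0.8), n = 10000, sd = 21) + 300
mean(y)   # close to 300, the desired mean
sd(y)     # close to sqrt(21^2 / (1 - 0.8^2)) = 35, not 21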
Use the rGARMA function in the ts.extend package
You can generate random vectors from any stationary Gaussian ARMA model using the ts.extend package. This package generates random vectors directly from the multivariate normal distribution using the computed autocorrelation matrix for the random vector, so it gives random vectors from the exact distribution and does not require "burn-in" iterations. Here is an example of generating multiple independent time-series vectors, all from an AR(1) model.
#Load the package
library(ts.extend)
#Set parameters
MEAN <- 300
ERRORVAR <- 21^2
AR <- 0.8
m <- 53
#Generate n = 16 random vectors from this model
set.seed(1)
SERIES <- rGARMA(n = 16, m = m, mean = MEAN, ar = AR, errorvar = ERRORVAR)
#Plot the series using ggplot2 graphics
library(ggplot2)
plot(SERIES)
As you can see, the generated time-series vectors in this plot use the appropriate mean and error variance that were specified in the inputs.
I need your help to understand how I can obtain the same result as this function does:
gini(x, weights=rep(1,length=length(x)))
http://cran.r-project.org/web/packages/reldist/reldist.pdf --> page 2. Gini
Let's say we need to measure the income of a population N. To do that, we can divide the population N into K subgroups, and in each subgroup k we take n_k individuals and ask for their income. As a result, we get the "individual incomes", and each individual has a particular "sample weight" representing their contribution to the population N. Here is an example that I simply took from the previous link; the dataset is from the NLS.
rm(list=ls())
cat("\014")
library(reldist)
data(nls)
help(nls)
# Convert the wage growth from (log. dollar) to (dollar)
y <- exp(recent$chpermwage);y
# Compute the unweighted estimate
gini_y <- gini(y)
# Compute the weighted estimate
gini_yw <- gini(y,w=recent$wgt)
> --- Here is the result----
> gini_y = 0.3418394
> gini_yw = 0.3483615
I know how to compute the Gini without weights with my own code, so I have no doubts about the command gini(y). The only thing I am concerned about is how gini(y, w) operates to obtain the result 0.3483615. I tried another calculation, as follows, to see whether I could come up with the same result as gini_yw. Here is another piece of code that I based on the CDF, Section 9.5, from the book "Relative Distribution Methods in the Social Sciences" by Mark S. Handcock:
#-------------------------
# test how gini computes with the sample weights
z <- exp(recent$chpermwage) * recent$wgt
gini_z <- gini(z)
# Result gini_z = 0.3924161
As you see, my calculation gini_z is different from the result of gini(y, weights). If someone knows how to build the correct computation to obtain exactly gini_yw = 0.3483615, please give me your advice.
Thanks a lot, friends.
function (x, weights = rep(1, length = length(x)))
{
    ox <- order(x)                        # sort the incomes in increasing order
    x <- x[ox]
    weights <- weights[ox]/sum(weights)   # reorder and normalise the weights so they sum to 1
    p <- cumsum(weights)                  # cumulative population share
    nu <- cumsum(weights * x)             # cumulative weighted income
    n <- length(nu)
    nu <- nu/nu[n]                        # cumulative income share (Lorenz curve ordinates)
    sum(nu[-1] * p[-n]) - sum(nu[-n] * p[-1])   # Gini as the area between the two curves
}
This is the source code for the gini function, which can be seen by entering gini at the console, with no parentheses or anything else.
EDIT:
This can be done for any function or object really.
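Following that source, here is a sketch that reproduces the weighted estimate by hand (y and recent$wgt are the objects from the question's example); it should match gini_yw = 0.3483615, and it shows why pre-multiplying the incomes by the weights, as in gini_z, gives a different number:
w  <- recent$wgt
ox <- order(y)                        # sort incomes in increasing order
ys <- y[ox]
ws <- w[ox] / sum(w)                  # reorder and normalise the weights
p  <- cumsum(ws)                      # cumulative population share
nu <- cumsum(ws * ys)
nu <- nu / nu[length(nu)]             # cumulative income share (Lorenz curve)
sum(nu[-1] * p[-length(p)]) - sum(nu[-length(nu)] * p[-1])  # weighted Gini, same as gini(y, w)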
This is a bit late, but one may be interested in the concentration/diversity measures contained in the SciencesPo package.