Converting R code to MATLAB code: Stuck at sapply()

I have the following R code, which I am trying to convert to MATLAB. (No, I do not want to run the R code in MATLAB as shown here.)
The R code is here:
# model parameters
dt <- 0.001
t <- seq(dt,0.3,dt)
n <- 700*1000
D <- 1
d <- 0.5
# model
ft <- n*d/sqrt(2*D*t^3)*dnorm(d/sqrt(2*D*t),0,1)
fmids <- n*d/sqrt(2*D*(t+dt/2)^3)*dnorm(d/sqrt(2*D*(t+dt/2)),0,1)
plot(t,ft*dt,type="l",lwd=1.5,lty=2)
# simulation
#
# simulation by drawing from uniform distribution
# and converting to time by using quantile function of normal distribution
ps <- runif(n,0,1)
ts <- 2*pnorm(-d/sqrt(2*D*t))
sumn <- sapply(ts, FUN = function(tb) sum(ps < tb))
lines(t[-length(sumn)],sumn[-1]-sumn[-length(sumn)],col=4)
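For reference, expanding dnorm() shows what the model line computes: up to the factor n, ft appears to be the first-passage-time density of one-dimensional Brownian motion with diffusion constant D to a barrier at distance d, f(t) = d/sqrt(4*pi*D*t^3) * exp(-d^2/(4*D*t)), so ft = n*f(t); the ts = 2*pnorm(-d/sqrt(2*D*t)) line above is the matching cumulative distribution, which is why counting uniform draws below it reproduces the curve.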
And here is the MATLAB code I have so far:
% # model
ft = (n*d)./sqrt(2*D.*t.^3).*normpdf(d./sqrt(2*D.*t),0,1);
fmids = (n*d)./sqrt(2*D*(t+dt/2).^3).*normpdf(d./sqrt(2*D*(t+dt/2)),0,1);
figure;plot(t,ft.*dt);
% # simulation
% #
% # simulation by drawing from uniform distribution
% # and converting to time by using quantile function of normal distribution
ps = rand(1,n);
ts = 2*normcdf(-d./sqrt(2*D*t));
So, here is where I am stuck. I don't understand what the function sumn = sapply(ts, FUN = function(tb) sum(ps < tb)) does, or where the parameter 'tb' came from; it is not defined anywhere else in the given R code.
Could anyone tell me what the equivalent of that function R code is in MATLAB?
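For what it's worth, tb is simply the argument of the anonymous function: sapply() binds it to each element of ts in turn, so the call is roughly equivalent to this explicit R loop (a sketch of the semantics, not yet the MATLAB translation):
# for each threshold tb taken from ts, count how many draws in ps fall below it
sumn <- numeric(length(ts))
for (i in seq_along(ts)) {
  sumn[i] <- sum(ps < ts[i])
}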
[EDIT 1: UPDATE]
So, based on the comments from @Croote, I came up with the following code for the function defined in sapply():
sumidx = bsxfun(@lt,ps,ts');
summat = sumidx.*repmat(ps,300,1);
sumn = sum(summat,2);
sumnfin = sumn(2:end)-sumn(1:end-1);
plot(t(1:length(sumn)-1),sumnfin)
However, I am not getting the desired results. The two curves should overlap: the blue curve is correct, so the orange one needs to overlap with it.
What am I missing here? Is R's pnorm() equivalent to MATLAB's normcdf(), as I have assumed here?
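R's pnorm(q) defaults to mean = 0 and sd = 1, i.e. the standard normal CDF, so it does match MATLAB's normcdf(x). A quick check of a few values, which normcdf(x) reproduces exactly:
pnorm(c(-1.96, 0, 1.96))
# 0.0249979 0.5000000 0.9750021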
[EDIT 2: FOUND THE BUG!]
So, after fiddling around, I discovered that all I had to do was count the number of occurrences of ps < tb. The line summat = sumidx.*repmat(ps,300,1) is not supposed to be there. After removing that line and keeping sumn = sum(sumidx,2);, I get the desired result.

So, based on the comments from @Croote and after fiddling around, I came up with the following code for the function defined in sapply():
sumidx = bsxfun(@lt,ps,ts');
sumn = sum(sumidx,2);
And for the plot, I coded it as
sumnfin = sumn(2:end)-sumn(1:end-1);
plot(t(1:length(sumn)-1),sumnfin)
Finally, I get the desired result

Related

Taylor diagram using JJAS mean values in R

I need some assistance in solving the following issue:
I am drawing a Taylor diagram using R. I calculated the JJAS mean precipitation values (mm/day) for the observation and two models, then manually supplied those values to get the Taylor diagram. It gives me output, but the result does not seem right, as the standard deviation values are too low (a sample is attached).
This is the code I am using:
library(plotrix)
ref<-c(3.3592,4.1377,4.0888,3.3098)
model1<-c(2.5053,3.0912,2.9271,2.4238)
model2<-c(2.2181,2.7910,2.7024,2.2495)
taylor.diagram(ref,model1,add=FALSE,col="red")
taylor.diagram(ref,model2,add=TRUE,col="blue")
An alternate way would be to use the netCDF files of the observation and models, but I don't know how to extract the precipitation information and use it (I know how to view netCDF data in R, but extraction is challenging for me at this stage).
Kindly help me solve this problem.
I have no knowledge about this, but a cursory look at the code of taylor.diagram clarifies what is going on here.
If you enter taylor.diagram without parentheses in the console, it will print the function. Around line 15 you will find this function, used to calculate the SD:
SD <- function(x, subn) {
  meanx <- mean(x, na.rm = TRUE)
  devx <- x - meanx
  ssd <- sqrt(sum(devx * devx, na.rm = TRUE) /
                (length(x[!is.na(x)]) - subn))
  return(ssd)
}
We can run this function with parameter subn as TRUE or FALSE (in R, TRUE equates to 1 and FALSE equates to 0):
> SD(ref, TRUE)
[1] 0.4505061
> SD(ref, FALSE)
[1] 0.3901498
> SD(model1, FALSE)
[1] 0.2798994
> SD(model1, TRUE)
[1] 0.3232
And from this we can see that subn is set to FALSE. Further inspection of the code shows:
subn <- sd.method != "sample"
In other words: if sd.method equals sample (the default value) then subn will be FALSE.
It's up to you to decide what is the correct choice here.
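For comparison with base R: subn = TRUE gives the n - 1 denominator, which reproduces sd(), while subn = FALSE divides by n (the population form):
sd(ref)                                           # 0.4505061, same as SD(ref, TRUE)
sqrt(sum((ref - mean(ref))^2)/(length(ref) - 1))  # 0.4505061
sqrt(sum((ref - mean(ref))^2)/length(ref))        # 0.3901498, same as SD(ref, FALSE)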
One of the great things of R is that all R functions can be inspected at the console. Doing so can resolve most questions related to 'why is this function behaving like this' with limited effort.

Calculating the mean and variance of a periodic (circular) variable in R

I have several variables in my dataset that represent daily timing of events across a week.
For example, two rows might look like:
t1 = c(NA,12.6,10.7,11.5,12.5,9.5,14.1)
t2 = c(23.7,1.2,NA,22.9,23.2,0.5,0.1)
I want to calculate the variance of these rows. To do this I need the mean, and because these are periodic variables, I've adapted the code from this page:
# This can all be wrapped in a function like this
circ.mean <- function(m, int, na.rm = TRUE) {
  if (na.rm) m <- m[!is.na(m)]
  rad.m <- m * (360/int) * (pi/180)
  mean.cos <- mean(cos(rad.m))
  mean.sin <- mean(sin(rad.m))
  x.deg <- atan(mean.sin/mean.cos) * (180/pi)
  return(x.deg/(360/int))
}
This works as expected for t2:
> circ.mean(t2,24)
[1] -0.06803088
although ideally the answer would be 23.93197. But for t1, it gives an incorrect answer:
> circ.mean(t1,24)
[1] -0.1810074
whereas using the normal mean function gives the right answer:
> mean(t1,na.rm=T)
[1] 11.81667
My questions are:
1) Is this "circular mean" code correct, and if so, am I using it correctly?
2) I've had a stab at my own circ.var function (see below) to calculate the variance of a periodic variable - will it produce the correct variances for all possible input timing vectors?
circ.var <- function(m, int = NULL, na.rm = TRUE) {
  if (is.null(int)) stop("Period parameter missing")
  if (na.rm) m <- m[!is.na(m)]
  if (sum(!is.na(m)) == 0) return(NA)
  n <- length(m)
  mean.m <- circ.mean(m, int)
  var.m <- 1/(n - 1) * sum((((m - mean.m + (int/2)) %% int) - (int/2))^2)
  return(var.m)
}
Any help would be hugely appreciated! Thanks for taking the time to read this!
I deleted my old answer, as I believe there was a mistake in the solution I provided.
I've written a series of R scripts, available on my GitHub page, which should calculate the mean, variance and other stats.
Thanks to @Gregor for his help.
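For completeness, the failure with t1 comes from atan(): atan(mean.sin/mean.cos) cannot tell quadrants apart, so whenever mean.cos < 0 (as happens with values clustered around the middle of the period, like t1) the angle comes back wrong. A sketch of the usual fix, using atan2() and wrapping the result into [0, int):
circ.mean2 <- function(m, int, na.rm = TRUE) {
  if (na.rm) m <- m[!is.na(m)]
  rad.m <- m * (2*pi/int)
  # atan2() returns the angle in the correct quadrant even when mean(cos) < 0
  x.rad <- atan2(mean(sin(rad.m)), mean(cos(rad.m)))
  (x.rad * int/(2*pi)) %% int   # wrap into [0, int), e.g. 23.93 rather than -0.07
}
circ.mean2(t1, 24)  # ~11.82, matching mean(t1, na.rm = TRUE)
circ.mean2(t2, 24)  # ~23.93
For the variance, note that the conventional circular variance is 1 minus the mean resultant length, i.e. 1 - sqrt(mean(sin(rad.m))^2 + mean(cos(rad.m))^2), though the wrapped-deviation formula in circ.var may be adequate for small dispersions.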

Vectorize Hoeffding's distance

I am trying to vectorise the following function, which calculates the Hoeffding distance between two random variables on [0,1]^2 in a discretised way.
If you use the hoeffd function from the Hmisc package, it provides a Fortran implementation (which you can find here: https://github.com/harrelfe/Hmisc/blob/master/src/hoeffd.f), but it only gives back the maximum of the matrix I'm trying to analyse here. I'm interested in the location of the maximum, and hence I need to compute the whole matrix.
Here is my current implementation:
hoeffding_D <- function(x, y) {
  n <- length(x)
  indep <- outer(0:n, 0:n)/n^2
  bp <- list(
    c(0, sort(x)) + (c(sort(x), 1) - c(0, sort(x)))/2,
    c(0, sort(y)) + (c(sort(y), 1) - c(0, sort(y)))/2
  )
  pre_calc <- t(outer(rep(1, n + 1), x) <= bp[[1]])
  # This is the problematic part:
  dep <- t(sapply(bp[[2]], function(bpy) {
    colMeans(pre_calc * (y <= bpy))
  }))
  rez <- abs(dep - indep)
  return(rez)
}
To use it, consider the following example:
library(copula)
# for 10 values, it's fast enough, but for 1000 it takes a lot of time..
x = pobs(rnorm(10),ties.method = "max")
y = pobs(rnorm(10),ties.method = "max")
hoeffding_D(x,y)
I already eliminated the first sapply via the use of the outer function, but I can't get rid of the other one. The issue is that the comparison x <= bpx must be done for all x and all bpx, and likewise for y; altogether that is a lot of dimensions to the problem...
Do you have an idea of how to speed it up?
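One direction that may help (a sketch, untested at scale): the remaining sapply builds dep[k,i] = mean over j of (x[j] <= bp[[1]][i]) * (y[j] <= bp[[2]][k]), and that is a matrix cross-product, so it can be computed in a single call:
B <- outer(y, bp[[2]], "<=")              # B[j,k] = (y[j] <= bp[[2]][k])
dep <- crossprod(B, pre_calc)/length(x)   # dep[k,i] = mean_j B[j,k]*pre_calc[j,i]
This replaces the last R-level loop with one (n+1) x n by n x (n+1) matrix product instead of n+1 separate colMeans passes.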

How to do top down forecasted proportions for hts objects with 2 levels?

I had previously asked this question trying to recombine forecasts with top-down forecasted proportions using the hts package. The solution there works great for multilevel hierarchies; however, I have found that I get an error when I try to use the solution on a two-level hierarchy.
library(hts)
# Create the hierarchy
newhts <- hts(htseg1$bts, list(ncol(htseg1$bts)))
# forecast creation adapted from the `combinef()` example
h <- 12
ally <- aggts(newhts)
allf <- matrix(NA, nrow = h, ncol = ncol(ally))
for(i in 1:ncol(ally))
allf[,i] <- forecast(auto.arima(ally[,i]), h = h, PI = FALSE)$mean
allf <- ts(allf, start = 51)
# Earo Wang's solution to my previous question
hts:::TdFp(allf, nodes = htseg1$nodes)
Error in *.default(fcasts[, 1L], prop) : time-series/vector length mismatch
The problem seems to arise because a two-level hierarchy skips the last if conditional, the one with the condition if (l.levels > 2L). The last statement of this conditional includes a step where prop is multiplied by the time series flist[[k + 1L]], which converts prop into a time series matrix. When this statement is skipped, prop remains a regular matrix, causing the error when the time series vector fcasts[, 1L] is multiplied by the matrix prop.
I understand that TdFp is a non-exported function and therefore may not be as robust as the other functions in the package, but is there any way around this problem? Since it is a relatively simple case, I can code a solution myself, but since hts::forecast.hts() can handle two-level hierarchies for method = "tdfp", I thought there might be a nice clean solution.
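One pragmatic workaround (a sketch; note it refits the base models internally rather than reusing the custom forecasts in allf, so it sidesteps TdFp rather than fixing it):
library(hts)
newhts <- hts(htseg1$bts, list(ncol(htseg1$bts)))
# go through the exported interface, which handles two-level
# hierarchies for the forecast-proportions method
fc <- forecast(newhts, h = 12, method = "tdfp", fmethod = "arima")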

Output list of list in R

Explanation of the goals:
Could someone please help me with this?
I am trying to run a Monte Carlo study of the linear regression estimators beta0hat, beta1hat, R2, adjusted R2 and the p-value, varying the sample size (30, 60, 100) and the variance (0.5, 0.75, 1), using normally distributed random errors.
First, I created three samples, one of each length relevant for the study, which I don't want to change randomly across replications:
X1 = sample(0:20,30,T)
X2 = sample(0:20,60,T)
X3 = sample(0:20,100,T)
For the main purpose, I've created the Monte Carlo function below, in which I'm trying to keep the results of each estimator in appropriately named vectors, in order to generate histograms and a plot of the p-value (y axis) against R2 (x axis) and verify the behaviour of the estimators when I change those variables and draw the errors at random.
Arguments of the function:
n = sample size, sig = the changed variance, b0 = true beta0, b1 = true beta1, X = the sample for the x axis
Monte.Carlo = function(n, sig, b0, b1,X){
Y = b0 + b1 * X + rnorm(n,0,sig)
smr = summary(lm(Y~X))
return(smr)
}
To generate the objects that will be my data in this study for analysing the behaviour of the estimators, I used the function replicate like this:
object.1 = replicate(1000,Monte.Carlo(30,0.5,1.4,0.8,1,X1))
beta0_s0.5_n30 <-list(c(object.1[,1:1000][[4]] [1]))
beta1_s0.5_n30<- object.1[[4]] [2]
R2_s0.5_n30 <- object.1[[8]]
R2A_s0.5_n30 <- object.1[[9]]
valorP_s0.5_n30 <- object.1[[4]] [8]
But there is something wrong in the extractions above that I can't figure out.
object.1 seems to have stored 1000 summaries of the regression.
How can I access the 1000 outputs of each estimator from the regression summaries and store them in the appropriate vectors, like a list of lists, as I intended in the command lines above?
The purpose is to apply this to several objects, like in the example below, where I have changed the variance to 0.75 and the sample size to 60:
beta0_s0.75_n60 <- replicate(1000,Monte.Carlo(60,0.75,1.4,0.8,X2))
beta1_s0.75_n60<- replicate(1000,Monte.Carlo(60,0.75,1.4,0.8,X2))
R2_s0.75_n60 <- replicate(1000,Monte.Carlo(60,0.75,1.4,0.8,X2))
R2A_s0.75_n60 <- replicate(1000,Monte.Carlo(60,0.75,1.4,0.8,X2))
valorP_s0.75_n60 <- replicate(1000,Monte.Carlo(60,0.75,1.4,0.8,X2))
The final goal is to generate 120 graphs, like in this example, to compare the results:
hist(R2A_s0.5_n30,breaks=11)
hist(R2A_s0.75_n30,breaks=11)
hist(R2A_s1_n30,breaks=11)
hist(R2A_s0.5_n60,breaks=11)
hist(R2A_s0.75_n60,breaks=11)
hist(R2A_s1_n60,breaks=11)
hist(R2A_s0.5_n100,breaks=11)
hist(R2A_s0.75_n100,breaks=11)
hist(R2A_s1_n100,breaks=11)
I would really appreciate it if someone could help with this; I've tried a lot of solutions and looked in some forums, and nothing has made any difference so far.
Sorry about my English grammar errors.
Thanks a lot!
So I just assumed that your original object.1 call was supposed to have only five arguments like the Monte.Carlo function itself, and shortened this to:
object.1 = replicate(1000,Monte.Carlo(30,0.5,1.4,0.8,X1))
Then I made a dummy dataframe (which is a list of lists) with the column names being the statistics you specified:
o1 <- data.frame(b0 = 0, b1 = 0, R2 = 0, R2A = 0, vP = 0)
Then created a for-loop...
for (r in 1:ncol(object.1)) {
  o1[r,"b0"]  <- object.1[4,r][[1]][1,1]
  o1[r,"b1"]  <- object.1[4,r][[1]][2,1]
  o1[r,"R2"]  <- object.1[8,r]
  o1[r,"R2A"] <- object.1[9,r]
  o1[r,"vP"]  <- object.1[4,r][[1]][2,4]
}
...where a new row is added to o1 for every replication of your Monte.Carlo function, and the relevant statistics are extracted via the subsetting operators - [[]] and [,] - from each of the one thousand summary(lm(Y~X)) objects stored in object.1. o1 is a dataframe, with each column being a vector of 1000 values of one statistic. Apply the same principle for object.2, object.3, etc.
p.s. Two fixes over my first attempt: the loop must run over 1:ncol(object.1) rather than 1:length(object.1) - object.1 is an 11 x 1000 matrix of list components, so length(object.1) is 11000 and any r above 1000 gives Error in object.1[4, r] : subscript out of bounds - and the adjusted R-squared is component 9 of a summary.lm object, not component 8.
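For reference, the same extraction can also be written without the explicit loop (a sketch using the same component indices: 4 = coefficient matrix, 8 = r.squared, 9 = adj.r.squared):
o1 <- data.frame(
  b0  = sapply(1:ncol(object.1), function(r) object.1[4, r][[1]][1, 1]),
  b1  = sapply(1:ncol(object.1), function(r) object.1[4, r][[1]][2, 1]),
  R2  = sapply(1:ncol(object.1), function(r) object.1[8, r][[1]]),
  R2A = sapply(1:ncol(object.1), function(r) object.1[9, r][[1]]),
  vP  = sapply(1:ncol(object.1), function(r) object.1[4, r][[1]][2, 4])
)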
