Creating these dummy variables in R

I'm running a simulation of the St. Petersburg paradox, and I need to create these dummy variables for a certain number of outcomes. I have tried appending in a for loop but can't seem to get the correct answer. Attached are the variables I need to create.
P(i) is an array of the form {1/2^1, 1/2^2, ..., 1/2^n}.

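It's pretty easy to do this in R:
# P(k) = .5^k for k = 1, ..., 100; the parentheses are needed, since .5^1:100 is parsed as (.5^1):100, i.e. 0.5, 1.5, ..., 99.5
P = .5^(1:100)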
d1 = sum(P)
d2 = sum(P[-1]) # or just d1-.5
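If more of these variables are needed for a general number of outcomes n, here is a minimal sketch. It assumes (the attached definitions are not shown here) that the j-th variable is the tail sum of P(i) from i = j to n, so that d1 and d2 above are its first two entries; the names `n`, `d` and `d_loop` are illustrative.

```r
# Number of outcomes (an arbitrary choice for illustration)
n <- 100

# P(i) = 1/2^i for i = 1..n; the parentheses around 1:n are required
P <- 0.5^(1:n)

# Assumed pattern: d[j] = sum(P[j:n]), computed for every j at once
# as a cumulative sum over the reversed vector
d <- rev(cumsum(rev(P)))

# The same tail sums with an explicit, preallocated for loop
d_loop <- numeric(n)
for (j in 1:n) {
  d_loop[j] <- sum(P[j:n])
}
```

Preallocating `d_loop` with `numeric(n)` and assigning by index avoids growing a vector by appending inside the loop.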
Or just use the geometric sum formula:
d1 = (1-.5^100)
d2 = .5*(1-.5^99)
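The closed forms follow from the finite geometric series. For a general upper limit n and starting index j:

$$\sum_{i=j}^{n}\left(\tfrac12\right)^{i} = \left(\tfrac12\right)^{j}\,\frac{1-\left(\tfrac12\right)^{n-j+1}}{1-\tfrac12} = \left(\tfrac12\right)^{j-1}-\left(\tfrac12\right)^{n}$$

With n = 100, j = 1 gives d1 = 1 - .5^100 and j = 2 gives d2 = .5 - .5^100 = .5*(1 - .5^99).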

Related

How to use lapply with get.confusion_matrix() in R?

I am performing a PLS-DA analysis in R using the mixOmics package. I have one binary Y variable (presence or absence of wetland) and 21 continuous predictor variables (X) with values ranging from 1 to 100.
I have made the model with the data_training dataset and want to predict new outcomes with the data_validation dataset. These datasets have exactly the same structure.
My code looks like:
library(mixOmics)
model.plsda<-plsda(X,Y, ncomp = 10)
myPredictions <- predict(model.plsda, newdata = data_validation[,-1], dist = "max.dist")
I want to predict the outcome based on 10, 9, 8, ... to 2 principal components. By using the get.confusion_matrix function, I want to estimate the error rate for every number of principal components.
prediction <- myPredictions$class$max.dist[,10] #prediction based on 10 components
confusion.mat = get.confusion_matrix(truth = data_validation[,1], predicted = prediction)
get.BER(confusion.mat)
I can do this separately 10 times, but I want to do it a little faster. Therefore I was thinking of making a list with the prediction results for every number of components...
library(BBmisc)
prediction_test <- myPredictions$class$max.dist
predictions_components <- convertColsToList(prediction_test, name.list = T, name.vector = T, factors.as.char = T)
...and then using lapply with the get.confusion_matrix and get.BER functions. But I don't know how to do that. I have searched the internet, but I can't find a solution that works. How can I do this?
Many thanks for your help!

Without a reproducible example there is no way to test this, but you need to turn the code you want to run each time into a function. Something like this:
confmat <- function(x) {
prediction <- myPredictions$class$max.dist[,x] # prediction based on x components
confusion.mat = get.confusion_matrix(truth = data_validation[,1], predicted = prediction)
get.BER(confusion.mat)
}
Now lapply:
results <- lapply(10:2, confmat)
That will return a list with the get.BER results for each number of PCs, so results[[1]] will be the result for 10 PCs. You will not get the values of prediction or confusion.mat unless they are included in the result returned by get.BER. If you want all of them, replace the last line of the function with return(list(prediction, confusion.mat, get.BER(confusion.mat))). This produces a list of lists, so that results[[1]][[1]] will be the prediction for 10 PCs, and results[[1]][[2]] and results[[1]][[3]] will be confusion.mat and get.BER(confusion.mat) respectively.
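Since mixOmics and the asker's data are not available here, the following is a self-contained sketch on hypothetical toy data rather than the real workflow: a character matrix `pred_mat` stands in for `myPredictions$class$max.dist`, base R `table()` stands in for `get.confusion_matrix()`, and the balanced error rate (the mean of the per-class error rates) is computed by a hand-written `ber()` instead of `get.BER()`. It shows a named-list return value and how `sapply()` can then pull one element out of every result.

```r
set.seed(1)

# Hypothetical toy data: true classes for 50 validation samples
lv <- c("absent", "present")
truth <- sample(lv, 50, replace = TRUE)

# Stand-in for myPredictions$class$max.dist: one column of predicted
# classes per number of components; column k mislabels each sample
# with probability 0.5 / k
pred_mat <- sapply(1:10, function(k) {
  flip <- runif(50) < 0.5 / k
  ifelse(flip, ifelse(truth == "absent", "present", "absent"), truth)
})
colnames(pred_mat) <- paste0("comp", 1:10)

# Balanced error rate: mean of the per-class error rates
# (a hand-written stand-in for mixOmics::get.BER)
ber <- function(conf) {
  mean(1 - diag(conf) / rowSums(conf))
}

# One evaluation per number of components, returning a named list
confmat <- function(x) {
  prediction <- pred_mat[, x]
  conf <- table(truth = factor(truth, levels = lv),
                predicted = factor(prediction, levels = lv))
  list(prediction = prediction, confusion = conf, BER = ber(conf))
}

results <- lapply(10:2, confmat)
names(results) <- paste0("ncomp", 10:2)

# Pull out just the BER values as a named numeric vector
ber_by_ncomp <- sapply(results, function(r) r$BER)
```

Naming the list elements means `results$ncomp10$confusion` can be used instead of `results[[1]][[2]]`.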

DEA analysis: variables are excluded in analysis?

I’m working on a DEA (Data Envelopment Analysis) to compare the relative efficiency of different banks.
The packages I’m using are rDEA and kableExtra.
The analysis measures the relative effect of the input and output variables that I use to examine the efficiency of each individual bank.
The problem is that my code only includes two of the four output variables, and I can’t find anywhere in the code where I ask it to do so.
Can anyone identify the problem?
Thank you in advance!
I have tried formatting the data in several different ways and converting the created inp_var and out_var to matrices.
#install.packages('rDEA')
#install.packages('dplyr')
#install.packages('kableExtra')
library(kableExtra)
library(rDEA)
library(dplyr)
dea <- as_tibble(PANELDATA) # tbl_df() is deprecated in current dplyr
head(dea)
inp_var <- select(dea, 'IE', 'NIE')
out_var <- select(dea, 'L', 'D', 'II','NII')
inp_var <- as.matrix(inp_var)
out_var <- as.matrix(out_var)
model <- dea(XREF= inp_var, YREF = out_var, X = inp_var, Y = out_var, model= "output", RTS = "constant")
model
I want a number between 0 and 1 for every observation, where the most efficient one receives a 1. What I get now is the same result whether or not I include the two extra output variables L and II.
L stands for loans to the public and II for interest income, and it would be strange if these variables had NO effect on the efficiency of banks.

You could collect the efficiency scores and the lambda weights into one table like this:
result <- cbind(round(model$thetaOpt, 3), round(model$lambda, 3))
rownames(result)<-dea[[1]]
colnames(result)<-c("Efficiency", rownames(result))
kable(result[,])

Converting R code to MATLAB code: Stuck at sapply()

I have the following R code, which I am trying to convert to MATLAB. (No, I do not want to run the R code from within MATLAB.)
The R code is here:
# model parameters
dt <- 0.001
t <- seq(dt,0.3,dt)
n=700*1000
D = 1
d = 0.5
# model
ft <- n*d/sqrt(2*D*t^3)*dnorm(d/sqrt(2*D*t),0,1)
fmids <- n*d/sqrt(2*D*(t+dt/2)^3)*dnorm(d/sqrt(2*D*(t+dt/2)),0,1)
plot(t,ft*dt,type="l",lwd=1.5,lty=2)
# simulation
#
# simulation by drawing from uniform distribution
# and converting to time by using quantile function of normal distribution
ps <- runif(n,0,1)
ts <- 2*pnorm(-d/sqrt(2*D*t))
sumn <- sapply(ts, FUN = function(tb) sum(ps < tb))
lines(t[-length(sumn)],sumn[-1]-sumn[-length(sumn)],col=4)
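For reference, the simulation step works because ts is the cumulative distribution function of the density used in ft (per unit of n). Writing φ and Φ for dnorm and pnorm, differentiating confirms this:

$$F(t) = 2\,\Phi\!\left(-\frac{d}{\sqrt{2Dt}}\right), \qquad \frac{dF}{dt} = 2\,\varphi\!\left(\frac{d}{\sqrt{2Dt}}\right)\frac{d}{2\sqrt{2D}}\,t^{-3/2} = \frac{d}{\sqrt{2Dt^{3}}}\,\varphi\!\left(\frac{d}{\sqrt{2Dt}}\right) = \frac{f_t}{n}$$

Since P(U < F(t)) = F(t) for U uniform on (0, 1), sum(ps < tb) with tb = F(t) is a simulated count of how many of the n events occur by time t. Differencing successive counts, as the lines() call does, gives the events per interval dt, which should match ft*dt.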
And the MATLAB code I have done so far is
% # model parameters
dt = 0.001;
t = dt:dt:0.3;
n = 700*1000;
D = 1;
d = 0.5;
% # model
ft = (n*d)./sqrt(2*D.*t.^3).*normpdf(d./sqrt(2*D.*t),0,1);
fmids = (n*d)./sqrt(2*D*(t+dt/2).^3).*normpdf(d./sqrt(2*D.*(t+dt/2)),0,1);
figure;plot(t,ft.*dt);
% # simulation
% #
% # simulation by drawing from uniform distribution
% # and converting to time by using quantile function of normal distribution
ps = rand(1,n);
ts = 2*normcdf(-d./sqrt(2*D*t));
So, here is where I am stuck. I don't understand what sumn = sapply(ts, FUN = function(tb) sum(ps < tb)) does, or where the parameter tb comes from; it is not defined anywhere else in the R code.
Could anyone tell me what the MATLAB equivalent of this R code is?
[EDIT 1: UPDATE]
So, based on the comments from @Croote, I came up with the following code for the function used in sapply():
sumidx = bsxfun(@lt,ps,ts');
summat = sumidx.*repmat(ps,300,1);
sumn = sum(summat,2);
sumnfin = sumn(2:end)-sumn(1:end-1);
plot(t(1:length(sumn)-1),sumnfin)
However, I am not getting the desired results. The curves should overlap: the blue curve is correct, so the orange one needs to overlap with it.
What am I missing here? Is R's pnorm() equivalent to MATLAB's normcdf(), as I have assumed here?
[EDIT 2: FOUND THE BUG!]
So, after fiddling around, I discovered that all I had to do was count the number of occurrences of ps < tb. The line summat = sumidx.*repmat(ps,300,1) is not supposed to be there. After removing that line and using sumn = sum(sumidx,2); instead, I get the desired result.

So, based on the comments from @Croote and after fiddling around, I came up with the following code for the function used in sapply():
sumidx = bsxfun(@lt,ps,ts');
sumn = sum(sumidx,2);
And for the plot, I coded it as
sumnfin = sumn(2:end)-sumn(1:end-1);
plot(t(1:length(sumn)-1),sumnfin)
Finally, I get the desired result
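As for where tb comes from: it is just the argument name of the anonymous function passed to sapply(), which calls that function once per element of ts and binds that element to tb. A small R sketch with toy inputs (not the question's values) shows three equivalent forms; the last one mirrors the bsxfun(@lt, ps, ts') approach above.

```r
set.seed(42)
ps <- runif(1000)                  # toy uniform draws
ts <- seq(0.1, 0.9, by = 0.1)      # toy thresholds

# sapply(): tb takes each value of ts in turn
sumn_sapply <- sapply(ts, FUN = function(tb) sum(ps < tb))

# The same thing as an explicit loop over ts
sumn_loop <- integer(length(ts))
for (i in seq_along(ts)) {
  sumn_loop[i] <- sum(ps < ts[i])
}

# Vectorized: a length(ts) x length(ps) logical matrix whose [i, j]
# entry is ts[i] > ps[j], summed across each row (like sum(sumidx, 2))
sumn_outer <- as.integer(rowSums(outer(ts, ps, ">")))
```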

Output list of list in R

Explanation of the goals:
Could someone please help me with this?
I'm trying to run a Monte Carlo study of the linear regression estimators beta0hat, beta1hat, R2, adjusted R2 and the p-value, varying the sample size (30, 60, 100) and the variance (0.5, 0.75, 1), with normally distributed random errors.
First, I created three X samples, one of each length relevant to the study, which I want to keep fixed rather than redraw in every replication.
X1 = sample(0:20,30,T)
X2 = sample(0:20,60,T)
X3 = sample(0:20,100,T)
For the main purpose, I created this Monte Carlo function. I'm trying to keep the results of each estimator in appropriate vectors, so that I can generate histograms and a plot of the p-value (Y axis) against R2 (X axis), to check how the estimators behave when I change the parameters, with normally distributed errors.
Arguments of the function:
n = sample size, sig = the varied variance, b0 = true beta0, b1 = true beta1, X = the X sample
Monte.Carlo = function(n, sig, b0, b1,X){
Y = b0 + b1 * X + rnorm(n,0,sig)
smr = summary(lm(Y~X))
return(smr)
}
To generate the vectors that will be the data for analysing the behavior of the estimators, I used the function replicate like this:
object.1 = replicate(1000,Monte.Carlo(30,0.5,1.4,0.8,1,X1))
beta0_s0.5_n30 <-list(c(object.1[,1:1000][[4]] [1]))
beta1_s0.5_n30<- object.1[[4]] [2]
R2_s0.5_n30 <- object.1[[8]]
R2A_s0.5_n30 <- object.1[[9]]
valorP_s0.5_n30 <- object.1[[4]] [8]
But there is something wrong in the lines above that I can't figure out.
object.1 seems to have stored 1000 regression summaries.
How can I access the 1000 values of each estimator from the regression summaries and store them in appropriate vectors, e.g. a list of lists, as intended in the command lines above?
The purpose is to apply this to several objects, as in the example below, where I changed the variance to 0.75 and the sample size to 60:
beta0_s0.75_n60 <- replicate(1000,Monte.Carlo(60,0.75,1.4,0.8,X2))
beta1_s0.75_n60<- replicate(1000,Monte.Carlo(60,0.75,1.4,0.8,X2))
R2_s0.75_n60 <- replicate(1000,Monte.Carlo(60,0.75,1.4,0.8,X2))
R2A_s0.75_n60 <- replicate(1000,Monte.Carlo(60,0.75,1.4,0.8,X2))
valorP_s0.75_n60 <- replicate(1000,Monte.Carlo(60,0.75,1.4,0.8,X2))
The final goal is to generate 120 graphs like the ones in this example, to compare the results:
hist(R2A_s0.5_n30,breaks=11)
hist(R2A_s0.75_n30,breaks=11)
hist(R2A_s1_n30,breaks=11)
hist(R2A_s0.5_n60,breaks=11)
hist(R2A_s0.75_n60,breaks=11)
hist(R2A_s1_n60,breaks=11)
hist(R2A_s0.5_n100,breaks=11)
hist(R2A_s0.75_n100,breaks=11)
hist(R2A_s1_n100,breaks=11)
I would really appreciate it if someone could help with this. I've tried a lot of solutions and looked in some forums, but nothing has made any difference.
Sorry about my english grammar errors.
Thanks a lot!

So I just assumed that your original object.1 call was supposed to have only five arguments like the Monte.Carlo function itself, and shortened this to:
object.1 = replicate(1000,Monte.Carlo(30,0.5,1.4,0.8,X1))
Then I made a dummy data frame (a data frame is a list of equal-length column vectors) with the column names being the statistics you specified
o1 <- data.frame(b0 = 0, b1 = 0, R2 = 0, R2A = 0, vP = 0)
Then created a for-loop...
for (r in seq_len(ncol(object.1))) {
o1[r,"b0"] <- object.1[[4,r]][1,1]
o1[r,"b1"] <- object.1[[4,r]][2,1]
o1[r,"R2"] <- object.1[[8,r]]
o1[r,"R2A"] <- object.1[[9,r]]
o1[r,"vP"] <- object.1[[4,r]][2,4]
}
...where a new row is added to o1 for every replication of your Monte.Carlo function, and the relevant statistics are extracted with the subsetting operators [[ ]] and [ , ] from each of the one thousand summary(lm(Y~X)) objects stored in object.1. Row 4 of object.1 holds the coefficient matrix, row 8 the R-squared and row 9 the adjusted R-squared. o1 is a data frame in which each column is a vector of 1000 values of one statistic. Apply the same principle to object.2, object.3, etc.
p.s. object.1 is a matrix with one row per component of the summary (11 for a simple lm fit) and one column per replication, so the loop has to run over ncol(object.1). Looping over 1:length(object.1), i.e. up to 11000, fills all 1000 rows of o1 and then stops with Error in object.1[4, r] : subscript out of bounds as soon as r exceeds 1000.
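As an alternative to filling the data frame row by row, here is a sketch that keeps each replication's summary in a plain list (replicate(..., simplify = FALSE)), extracts each statistic with sapply(), and then loops over all nine sample-size/sigma combinations, storing one data frame per scenario in a named list. The helper extract_stats() and list names such as s0.75_n60 are illustrative choices, not from the question. Note that rnorm(n, 0, sig) treats sig as a standard deviation, not a variance.

```r
# Monte.Carlo() as in the question: one simulated regression summary
Monte.Carlo <- function(n, sig, b0, b1, X) {
  Y <- b0 + b1 * X + rnorm(n, 0, sig)
  summary(lm(Y ~ X))
}

# Pull the five statistics out of a list of summary.lm objects
extract_stats <- function(sims) {
  data.frame(
    b0  = sapply(sims, function(s) coef(s)[1, "Estimate"]),
    b1  = sapply(sims, function(s) coef(s)[2, "Estimate"]),
    R2  = sapply(sims, function(s) s$r.squared),
    R2A = sapply(sims, function(s) s$adj.r.squared),
    vP  = sapply(sims, function(s) coef(s)[2, "Pr(>|t|)"])
  )
}

set.seed(123)
# Fixed X samples, drawn once as in the question
Xs <- list(n30  = sample(0:20, 30, TRUE),
           n60  = sample(0:20, 60, TRUE),
           n100 = sample(0:20, 100, TRUE))
sigs <- c(0.5, 0.75, 1)

# One data frame of 1000 rows per scenario, e.g. results$s0.75_n60
results <- list()
for (nm in names(Xs)) {
  X <- Xs[[nm]]
  for (sig in sigs) {
    sims <- replicate(1000, Monte.Carlo(length(X), sig, 1.4, 0.8, X),
                      simplify = FALSE)
    results[[paste0("s", sig, "_", nm)]] <- extract_stats(sims)
  }
}
```

Each histogram from the question then becomes, for example, hist(results$s0.5_n30$R2A, breaks = 11), and the p-value against R2 plot becomes plot(results$s0.5_n30$R2, results$s0.5_n30$vP).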

Using a for loop for performing several regressions

I am currently performing a style analysis using the following method: http://www.r-bloggers.com/style-analysis/ . It is a constrained regression of one asset on a number of benchmarks, over a rolling 36 month window.
My problem is that I need to perform this regression for a fairly large number of assets, and doing it one by one would take a huge amount of time. To be more precise: is there a way to tell R to regress columns 1-100 one by one on columns 101-116? Of course, this also means producing 100 different plots, one for each asset. I am new to R and have been stuck for several days now.
I hope it doesn't matter that the following excerpt isn't reproducible, since the code works as originally intended.
# Style Regression over Window, constrained
#--------------------------------------------------------------------------
# setup
load.packages('quadprog')
style.weights[] = NA
style.r.squared[] = NA
# Setup constraints
# 0 <= x.i <= 1
constraints = new.constraints(n, lb = 0, ub = 1)
# SUM x.i = 1
constraints = add.constraints(rep(1, n), 1, type = '=', constraints)
# main loop
for( i in window.len:ndates ) {
window.index = (i - window.len + 1) : i
fit = lm.constraint( hist.returns[window.index, -1], hist.returns[window.index, 1], constraints )
style.weights[i,] = fit$coefficients
style.r.squared[i,] = fit$r.squared
}
# plot
aa.style.summary.plot('Style Constrained', style.weights, style.r.squared, window.len)
Thank you very much for any tips!

"Is there a way to tell R to regress columns 1-100 one by one on columns 101-116?"
Yes! You can use a for loop, but there's also a whole family of apply functions that are appropriate. Here's a generalized solution with a random toy dataset using lm(), but you can substitute whatever regression function you want:
# data frame of 116 cols of 20 rows
set.seed(123)
dat <- as.data.frame(matrix(rnorm(116*20), ncol=116))
# with a for loop
models <- list() # empty list to store models
for (i in 1:100) {
models[[i]] <-
lm(formula=x~., data=data.frame(x=dat[, i], dat[, 101:116]))
}
# with lapply
models2 <-
lapply(1:100,
function(i) lm(formula=x~.,
data=data.frame(x=dat[, i], dat[, 101:116])))
# compare. they give the same results!
all.equal(models, models2)
# to access a single model, use [[#]]
models2[[1]]
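The same lapply() pattern also covers the rolling 36-month window from the question. The sketch below uses hypothetical toy returns (5 assets, 3 benchmarks, 60 months) and plain lm() without an intercept as a stand-in for the constrained lm.constraint() in the excerpt, so its weights are not restricted to [0, 1] or to summing to 1.

```r
set.seed(1)
n_months <- 60
window.len <- 36

# Hypothetical toy returns: columns 1-5 are assets, columns 6-8 benchmarks
rets <- matrix(rnorm(n_months * 8, sd = 0.04), ncol = 8,
               dimnames = list(NULL, c(paste0("asset", 1:5),
                                       paste0("bench", 1:3))))
asset_cols <- 1:5
bench_cols <- 6:8

# Rolling-window style regression for one asset column j.
# Unconstrained lm() without intercept stands in for lm.constraint().
rolling_style <- function(j) {
  weights <- matrix(NA_real_, n_months, length(bench_cols),
                    dimnames = list(NULL, colnames(rets)[bench_cols]))
  r.squared <- rep(NA_real_, n_months)
  for (i in window.len:n_months) {
    window.index <- (i - window.len + 1):i
    y <- rets[window.index, j]
    X <- rets[window.index, bench_cols]
    fit <- lm(y ~ X - 1)
    weights[i, ] <- coef(fit)
    r.squared[i] <- summary(fit)$r.squared
  }
  list(weights = weights, r.squared = r.squared)
}

# One result per asset, named after the asset columns
style_results <- lapply(asset_cols, rolling_style)
names(style_results) <- colnames(rets)[asset_cols]
```

With the real data, asset_cols would be 1:100 and bench_cols 101:116, and a for loop over style_results could then produce one plot per asset.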

Develop Reference
