Determinant of the sample covariance - r

new here!
I want to find the determinant of the pooled sample covariance matrix of the given data. Can someone give me a hint? (I have searched everywhere.)
I have tried many things, none of which gave the right solution, such as:
det(cov(dfdata))
mvec <- colMeans(dfdata) #sample mean vector#
covM <- cov(dfdata) #sample covariance matrix#
corM <- cor(dfdata) #sample correlation matrix#
covMnum <- cov(dfdatanum)
This is the code I have developed so far:
##uploading the data
data2 <- read.table("file.tsv")
data3 <- read.table("file2.tsv")
data4 <- read.table("file3.tsv")
data5 <- read.table("file4.tsv")
## have a first look at data###
head(dfBull)
n <- nrow(dfBull) #n#
p <- ncol(dfBull) #p#
summary(dfBull)
##removing the first row as it isn't necessary
a <- data2[-(1), ]
b <- data3[-(1), ]
c <- data4[-(1), ]
d <- data5[-(1), ]
## finding the covariance
cv1 <- cov(as.numeric(a$V1), as.numeric(a$V2))
cv2 <- cov(as.numeric(b$V1), as.numeric(b$V2))
cv3 <- cov(as.numeric(c$V1), as.numeric(c$V2))
cv4 <- cov(as.numeric(d$V1), as.numeric(d$V2))
##This is the function I'm trying to use:
mat <- matrix(c(cv1,0,0,0,0,cv2,0,0,0,0,cv3,0,0,0,0,cv4), nrow=4, ncol=4, byrow=TRUE)
det(mat)
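One possible reading of "pooled sample covariance" is the degrees-of-freedom-weighted average of the per-group covariance matrices, S_p = sum_i (n_i - 1) S_i / sum_i (n_i - 1). The question does not confirm that definition, so the sketch below only illustrates it under that assumption. The `groups` list is simulated stand-in data, and `pooled_cov` is a hypothetical helper name, not a base R function.

```r
# Simulated stand-ins for the four data sets (assumption: each has the
# same two numeric columns; the real files are not available here).
set.seed(1)
groups <- list(
  matrix(rnorm(40), ncol = 2),
  matrix(rnorm(60), ncol = 2),
  matrix(rnorm(50), ncol = 2),
  matrix(rnorm(30), ncol = 2)
)

# Hypothetical helper: weight each group's covariance matrix by its
# degrees of freedom (n_i - 1), then divide by the total degrees of freedom.
pooled_cov <- function(groups) {
  dfs <- vapply(groups, function(g) nrow(g) - 1, numeric(1))
  weighted <- Map(function(g, df) df * cov(g), groups, dfs)
  Reduce(`+`, weighted) / sum(dfs)
}

S_p <- pooled_cov(groups)  # 2 x 2 pooled covariance matrix
det_S_p <- det(S_p)        # determinant of the pooled matrix
print(det_S_p)
```

Unlike a diagonal matrix built from four scalar covariances, this keeps the variances and the covariance of both columns in a single 2 x 2 matrix before taking the determinant.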

Related

Why are the standard errors of my coefficients the same?

I am trying to run a simulation.
In the matrix par.est1 I save the standard errors of the coefficients b1 and b2 in the 5th and 6th columns, but the two turn out to be exactly the same in all 1000 repetitions. Does anyone know why this is happening? Maybe it has something to do with the way I created the correlated variables?
This is the code:
set.seed(185736)
reps <- 1000 #repetitions
par.est1 <- matrix(NA, nrow=reps, ncol=6)
b1 <- 4
b2 <- 5.8
n <- 26
r <- 0.1
#Create correlated variables
library(MASS)
data <- mvrnorm(n=n, mu=c(0, 0), Sigma=matrix(c(1, r, r, 1), nrow=2), empirical=TRUE)
V1 = data[, 1] # standard normal (mu=0, sd=1)
V2 = data[, 2]
cor(V1, V2)
for(i in 1:reps){
Y <- V1*b1+V2*b2+rnorm(n,0,1) #The true DGP, with N(0,1) error
model1 <- lm(Y~V1+V2)
vcv1 <- vcov(model1)
par.est1[i,1] <- model1$coef[1]
par.est1[i,2] <- model1$coef[2]
par.est1[i,3] <- model1$coef[3]
par.est1[i,4] <- sqrt(diag(vcv1)[1]) #SE
par.est1[i,5] <- sqrt(diag(vcv1)[2])
par.est1[i,6] <- sqrt(diag(vcv1)[3])
}
Thank you.
Thanks user2554330.
Is there any way I can make correlated variables with different means and variances?
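A note on the equal standard errors, with a sketch for the follow-up question. With empirical=TRUE, mvrnorm() makes the sample means and sample covariance match mu and Sigma exactly, so V1 and V2 have the same sample variance and the two slope standard errors coincide in every repetition. Giving the variables different standard deviations breaks that symmetry. The values of `mu` and `sds` below are hypothetical, chosen only for illustration.

```r
library(MASS)

set.seed(185736)
n <- 26
r <- 0.1
mu  <- c(10, -3)   # hypothetical means
sds <- c(2, 0.5)   # hypothetical standard deviations

# Covariance matrix = D %*% R %*% D, with D = diag(sds) and R the correlation matrix.
R_mat <- matrix(c(1, r, r, 1), nrow = 2)
Sigma <- diag(sds) %*% R_mat %*% diag(sds)

data <- mvrnorm(n = n, mu = mu, Sigma = Sigma, empirical = TRUE)
V1 <- data[, 1]
V2 <- data[, 2]

# One draw of the response, then the coefficient standard errors.
Y <- 4 * V1 + 5.8 * V2 + rnorm(n, 0, 1)
se <- sqrt(diag(vcov(lm(Y ~ V1 + V2))))
print(se)  # the two slope standard errors now differ
```

Setting empirical=FALSE instead would make the sample moments vary around mu and Sigma, which is a different design choice from the one in the question.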

Plot cumulative value for different series

I have run a short simulation and want to plot the outcomes of each simulation in terms of the "running sum" over parameter k. For reference, I want to end up with a plot that looks similar to the ones in this article:
https://www.pinnacle.com/en/betting-articles/Betting-Strategy/betting-bankroll-management/VDM2GY6UX3B552BG
The following is the code for the simulation:
## Simulating returns over k bets.
odds <- 1.5
k <- 100
return <- odds - 1
edge <- 0.04
pw <- 1/(odds/(1-edge))
pl <- 1-pw
nsims <- 10000
set.seed(42)
sims <- replicate(nsims, {
x <- sample(c(-1,return), k, TRUE, prob=c(pl, pw))
})
rownames(sims) <- c(1:k)
colnames(sims) <- c(1:nsims)
If the description is unclear, let me know.
Here is how you can plot the cumulative value over bets (I set nsims <- 10 so that the plot is readable).
First I generate the data:
## Simulating returns over k bets.
odds <- 1.5
k <- 100
return <- odds - 1
edge <- 0.04
pw <- 1/(odds/(1-edge))
pl <- 1-pw
nsims <- 10
set.seed(42)
sims <- replicate(nsims, {
x <- sample(c(-1,return), k, TRUE, prob=c(pl, pw))
})
rownames(sims) <- c(1:k)
colnames(sims) <- c(1:nsims)
Then I create a data frame containing the results of the n simulations (10 here):
df <- as.data.frame(sims)
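What we want to plot is the cumulative sum, not the result of a single bet, so we iterate through the columns (i.e. the simulations) to compute it:
## packages assumed for the functions used below
library(dplyr)    # mutate()
library(reshape2) # melt()
library(ggplot2)  # ggplot()
for (i in colnames(df)){
df[[i]] <- cumsum(df[[i]])
}
## store bets as integers so the x axis runs 1, 2, ..., k instead of alphabetically
df <- mutate(df, bets = as.integer(rownames(df)))
output <- melt(df, id.vars = "bets", variable.name = 'simulation')
Now we can plot our data: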
ggplot(output, aes(bets,value,group=simulation)) + geom_line(aes(colour = simulation))
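For comparison, the same running-sum plot can be sketched in base R without reshaping the data. This is only an alternative under the same simulation settings; `win_return` and `cum` are names introduced here for illustration.

```r
# Same simulation as above, with a few paths so the plot stays readable.
odds <- 1.5
k <- 100
win_return <- odds - 1           # renamed so it is not confused with return()
edge <- 0.04
pw <- 1 / (odds / (1 - edge))    # probability of winning a bet
pl <- 1 - pw
nsims <- 10
set.seed(42)
sims <- replicate(nsims, sample(c(-1, win_return), k, TRUE, prob = c(pl, pw)))

# Running sum down each column: row i holds the total return after i bets.
cum <- apply(sims, 2, cumsum)

# One line per simulation, number of bets on the x axis.
matplot(seq_len(k), cum, type = "l", lty = 1,
        xlab = "bets", ylab = "cumulative return")
```

apply() with cumsum keeps the k x nsims matrix layout, so matplot() can draw every column as its own line.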

R code to normalize the test set

I would like to normalize the data this way:
(trainData - mean(trainData)) / sd(trainData)
(testData - mean(trainData)) / sd(trainData)
For the Train set I can use the function scale(). How can I do the same for the test set? I tried the lapply() function in different ways, but I did not succeed.
Many thanks! An example of the code:
Train <- data.frame(matrix(c(1:100),10,10))
Test <- data.frame(matrix(sample(1:100),10,10))
scaled.Train <- scale(Train)
ct <- ncol(Test)
rt <- nrow(Test)
ncol(Train)
sdmatrix <- data.frame(matrix(,rt,ct))
for (i in 1:ct){
sdmatrix[1,i] <- mean(Train[,i])
sdmatrix[2,i] <- sd(Train[,i])
}
Test <- rbind(Test, sdmatrix)
normTest <- function(x){
a <- x[rt-1]
b <- x[rt]
x <- (x-a)/b
}
Test <- lapply(Test[1:(rt-2),],normTest)
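One way to apply the training statistics to the test set is to reuse the attributes that scale() attaches to its result. The sketch below assumes the columns of Train and Test line up one to one; the seed is added only to make the example reproducible.

```r
Train <- data.frame(matrix(1:100, 10, 10))
set.seed(1)  # assumption: seed added only for reproducibility
Test <- data.frame(matrix(sample(1:100), 10, 10))

scaled.Train <- scale(Train)

# scale() stores the column means and standard deviations it used as attributes.
train.center <- attr(scaled.Train, "scaled:center")
train.scale  <- attr(scaled.Train, "scaled:scale")

# Apply the training means and standard deviations, column by column, to the test set.
scaled.Test <- scale(Test, center = train.center, scale = train.scale)
head(scaled.Test)
```

Both results are matrices; wrap them in as.data.frame() if a data frame is needed downstream.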

replace NA by truncated normal distribution values in r

I am trying to replace NAs with values drawn from a truncated normal distribution.
First I used sample as follows, and the function worked:
v.new <- replace(vector,v, sample(8,length(v),replace =FALSE))
However, when I try to use rtnorm it does not seem to work. I don't get any error messages, but it takes ages to replace the NAs with values from the desired interval. Any suggestions to make this work?
library(msm)
# Some data
data("airquality")
airquality$Ozone
# My function
add.trunc.to.NAvector <- function(vector){
v <- NULL
for(i in 1:length(vector)){
if(is.na(vector[i])==TRUE)
v <- append(v, i)
}
mean.val <- mean(vector)
sd.val <- sd(vector)
min.val <- mean.val - 4 * sd.val
max.val <- mean.val + 4 * sd.val
v.new <- replace(vector,v, rtnorm(length(v), lower = min.val, upper = max.val))
return(v.new)
}
Shouldn't this work?
v <- airquality$Ozone
v.new <- v
indices <- which(is.na(v))
m <- mean(v[-indices])
s <- sd(v[-indices])
v.new[indices] <- rtnorm(length(indices), lower = m-4*s, upper = m+4*s)
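Two details are worth spelling out. In the function from the question, mean(vector) and sd(vector) return NA when the vector contains NAs, so the truncation bounds are NA as well. Also, rtnorm() uses mean = 0 and sd = 1 unless they are passed explicitly. The sketch below is a base R stand-in that avoids both issues by drawing from the truncated normal via the inverse CDF; it is not msm's implementation, and `rtrunc_norm` and `fill_na_trunc` are hypothetical helper names.

```r
# Hypothetical helper: draw n values from N(mean, sd) truncated to
# [lower, upper] by inverse-CDF sampling (base R stand-in for msm::rtnorm).
rtrunc_norm <- function(n, mean, sd, lower, upper) {
  p_lo <- pnorm(lower, mean, sd)
  p_hi <- pnorm(upper, mean, sd)
  qnorm(runif(n, p_lo, p_hi), mean, sd)
}

# Hypothetical helper: fill NAs using the mean and sd of the observed values.
fill_na_trunc <- function(x) {
  idx <- which(is.na(x))
  m <- mean(x, na.rm = TRUE)  # without na.rm = TRUE this would be NA
  s <- sd(x, na.rm = TRUE)
  x[idx] <- rtrunc_norm(length(idx), m, s, lower = m - 4 * s, upper = m + 4 * s)
  x
}

set.seed(1)  # assumption: seed added only for reproducibility
ozone <- airquality$Ozone
ozone.new <- fill_na_trunc(ozone)
summary(ozone.new)
```

With bounds at four standard deviations the lower limit can be negative, so a tighter lower bound such as 0 may suit a variable like Ozone better.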

Non-Deterministic behaviour of svm{e1071}

I noticed that svm(), when used with decision.values=TRUE (plus a sigmoid to get probabilities), produces non-deterministic results when I permute the data frame under analysis. Does anyone have any idea why? Please try the code yourself:
install.packages("e1071")
library(e1071)
A <- cbind(rnorm(20,1,1),rnorm(20,1,1),rep(1,20))
B <- cbind(rnorm(20,9,1),rnorm(20,9,1),rep(0,20))
dataframe <- as.data.frame(rbind(A,B))
predc <- rep(0,length(dataframe[,1]))
K <- length(dataframe[1,])
permutator <- sample(nrow(dataframe))
dataframe$V3 <- factor(dataframe$V3)
dataframe <- dataframe[permutator, ]
for(i in 1:length(dataframe[,1])) {
frm <- as.formula(object=paste("V",as.character(K), " ~ .",sep=""))
r <- svm(formula=frm, data=(dataframe[-i,]))
predicted <- predict(r,newdata=dataframe[i,],decision.values=TRUE)
predc[i] <- sigmoid(attr(predicted,'decision.values')[1])
}
plot(sort(predc))
