R BiCopKDE: Error in cov.wt(z): 'x' must contain finite values only

My dataset consists of stock prices. As a practice exercise, my final goal is to fit a copula to two stocks.
I've transformed my data to the [0,1] scale and would like to plot the bivariate density with BiCopKDE.
However, although I checked for possible non-finite values, I still get the error message "cov.wt(z) : 'x' must contain finite values only". I reduced my dataset to 16 rows to find the cause, but it didn't help.
The code:
library(openxlsx)    # read.xlsx() (assumed to come from openxlsx)
library(VineCopula)  # BiCopKDE()
DFM.roh <- read.xlsx("C:\\Users\\Simon\\Documents\\ML Seminar\\Deutscher Finanzmarkt Daten.xlsx")
DFM <- data.frame(X_bei = DFM.roh$s_bei, X_bayn = DFM.roh$s_bayn)
# Transform both price series to [0,1] with their empirical CDFs
y_a <- ecdf(DFM$X_bei)(DFM$X_bei)
y_b <- ecdf(DFM$X_bayn)(DFM$X_bayn)
Datacop <- data.frame(y_a, y_b)
which(is.na(Datacop), arr.ind = TRUE)
#      row col   (empty result: no NAs found)
all(sapply(Datacop, is.finite))
# TRUE
BiCopKDE(Datacop$y_a, Datacop$y_b, "surface")
# Error in cov.wt(z) : 'x' must contain finite values only
The dataset: (posted as a screenshot, not reproduced here)
Does anybody have an idea how to solve this?
Best,
Simon

A good way to get what you want is to use BiCopSelect() from the VineCopula package, which selects and fits a parametric bivariate copula to your data. Once you have the fitted object, you can simply call the plot() method from the same package on it.
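If you still want the kernel density plot itself, one likely culprit is worth checking (an assumption, not confirmed in the thread): ecdf(x)(x) returns exactly 1 for the largest observation, and a kernel estimator that works on the normal scale may map that value to qnorm(1) = Inf, which is non-finite even though is.finite() on the [0,1] data reports TRUE. A common workaround is to rescale the ranks by n + 1 so that all pseudo-observations lie strictly inside (0, 1). The sketch below uses made-up price vectors as hypothetical stand-ins for the two stock columns, and the helper name to_pobs is also hypothetical.

```r
# Hypothetical stand-ins for DFM$X_bei and DFM$X_bayn (16 rows, as in the question)
set.seed(1)
s_bei  <- cumsum(rnorm(16)) + 100
s_bayn <- cumsum(rnorm(16)) + 80

# ecdf() evaluated at the data returns exactly 1 for the largest value ...
u_ecdf <- ecdf(s_bei)(s_bei)
max(u_ecdf)         # 1
qnorm(max(u_ecdf))  # Inf, i.e. non-finite on the normal scale

# ... whereas rank / (n + 1) keeps every value strictly inside (0, 1)
to_pobs <- function(x) rank(x) / (length(x) + 1)
Datacop <- data.frame(y_a = to_pobs(s_bei), y_b = to_pobs(s_bayn))
range(Datacop$y_a)                         # 1/17 and 16/17
all(is.finite(qnorm(as.matrix(Datacop))))  # TRUE

# With VineCopula installed, the call from the question would then be:
# library(VineCopula)
# BiCopKDE(Datacop$y_a, Datacop$y_b, "surface")
```

Dividing by n + 1 instead of n is what keeps the largest value below 1; the relative ordering of the observations, which is all a copula uses, is unchanged.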

Related

Error in cor.test.default 'x' and 'y' must have the same length (Spearman’s Rank-Order Correlation)

I'm trying to test for correlation between x and y of my data using Spearman Rank-Order Correlation in R but encountered the following error:
Error in cor.test.default(x = Female_ChLO, y = Female_TBL, method = "spearman") :
'x' and 'y' must have the same length
The vector "Female_ChLO" had an outlier removed. When I ran the test on the data before removing the outlier, I did not get this error.
The data contain many NAs, but they are vital to the test. I'm trying to include na.rm = T but have no idea how. I'd love to hear suggestions, but not too complicated please, as I'm new to R.
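One way to keep the two vectors the same length, sketched with made-up numbers (the vectors and the outlier rule below are hypothetical): instead of deleting the outlier, overwrite it with NA. cor.test() has no na.rm argument, but its default method drops every pair in which x or y is missing before computing the statistic, so the NAs already in the data are handled as well.

```r
# Hypothetical data standing in for Female_ChLO and Female_TBL
Female_ChLO <- c(2.1, 2.4, NA, 3.0, 3.3, 15.0, 3.9, 4.2, NA, 4.8)
Female_TBL  <- c(10, 12, 13, NA, 16, 17, 19, 20, 22, 23)

# Deleting the outlier shortens only one vector, which causes the error
short <- Female_ChLO[-6]
length(short) == length(Female_TBL)  # FALSE: 9 vs 10

# Instead, mark the outlier as missing so both vectors keep length 10
outlier <- which(Female_ChLO > 10)   # hypothetical outlier rule
Female_ChLO_clean <- Female_ChLO
Female_ChLO_clean[outlier] <- NA

# cor.test() drops every pair that has an NA in x or y
res <- cor.test(x = Female_ChLO_clean, y = Female_TBL, method = "spearman")
res$estimate
```

Setting the value to NA keeps each remaining observation paired with its partner in the other vector, which removing an element would silently break.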

'variable lengths differ' error when using rollapply with lm

I am trying to run a rolling-window regression on a number of time series but encountered a strange problem. The following code reproduces my data. I have a data frame of returns named "rt" and a data frame of factors named "factors". Then I define a function that returns the regression intercept (alpha).
library(lmtest)  # coeftest()
mat <- as.data.frame(matrix(runif(88 * 6), nrow = 88, ncol = 6))
colnames(mat) <- c("MKT", "SMB", "HML", "AA", "BB", "CC")
rt <- mat[, c(4, 6)]       # returns
factors <- mat[, c(1:3)]   # factors
# Return the regression intercept (alpha) for one return series
coeffstat_alpha <- function(x) {
  fit <- lm(x ~ MKT + SMB + HML, data = factors, na.action = na.omit)
  nn <- c(t(coeftest(fit)))[1]
  return(nn)
}
When I run this function on the whole sample, it works.
apply(rt,2,FUN=coeffstat_alpha)
But when I apply the function with rollapply, I get this error:
library(zoo)  # rollapply()
rollapply(rt[, 1], width = 24, FUN = coeffstat_alpha, by = 1, align = "left")
Error in model.frame.default(formula = x ~ MKT + SMB + HML, data = factors, :
variable lengths differ (found for 'MKT')
I have tried to fix the problem by searching online but couldn't find a similar question. Can anyone help? Thanks!
As the error message suggests, the variables have different lengths: rollapply passes x to the function as a window of length 24 (the width), whereas the factors data frame still has 88 rows. For lm to run, x and the factors must have the same number of rows. You can change the function to take the row indices of a window instead:
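library(lmtest)  # coeftest()
# x is a vector of row indices; rt and factors are subset to the same rows
coeffstat_alpha <- function(x) {
  fit <- lm(rt[x, 1] ~ MKT + SMB + HML, data = factors[x, ], na.action = na.omit)
  nn <- c(t(coeftest(fit)))[1]
  return(nn)
}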
and call it on every 24-row window with sapply:
# windows 1:24, 2:25, ..., 65:88
sapply(1:(nrow(rt) - 23), function(x) coeffstat_alpha(x:(x + 23)))
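If lmtest is not needed for anything else, the same rolling intercept can be sketched in base R: coef(fit)[1] is the intercept estimate, the same number as the first element of c(t(coeftest(fit))). The helper name rolling_alpha and its width argument are hypothetical; the data are regenerated as in the question, with a seed added for reproducibility.

```r
set.seed(123)
mat <- as.data.frame(matrix(runif(88 * 6), nrow = 88, ncol = 6))
colnames(mat) <- c("MKT", "SMB", "HML", "AA", "BB", "CC")
rt <- mat[, c(4, 6)]
factors <- mat[, 1:3]

# Hypothetical helper: intercept of y ~ MKT + SMB + HML over each window of
# `width` consecutive rows, so y and the factors always have equal length
rolling_alpha <- function(y, factors, width = 24) {
  starts <- seq_len(length(y) - width + 1)
  vapply(starts, function(s) {
    rows <- s:(s + width - 1)
    win <- data.frame(y = y[rows], factors[rows, , drop = FALSE])
    unname(coef(lm(y ~ MKT + SMB + HML, data = win))[1])
  }, numeric(1))
}

# One column of alphas per return series (65 windows each for 88 rows)
alphas <- sapply(rt, rolling_alpha, factors = factors, width = 24)
dim(alphas)  # 65 2
```

Building each window as one small data frame guarantees that the response and the regressors are always subset by the same row indices, which is exactly what the original rollapply call lacked.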

PLS-DA: dealing with missing values

I am performing an OPLS-DA, and all my columns have some missing values.
I am following these instructions: https://www.bioconductor.org/packages/devel/bioc/vignettes/ropls/inst/doc/ropls-vignette.html
This is my code:
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(version = "3.10")
BiocManager::install("ropls")
library(ropls)
dataMatrix=df.Baseline.All[,c(6:63,74:143)]
dataMatrix= dataMatrix[c(23:294),]
dataMatrix = as.matrix(as.data.frame(lapply(dataMatrix, as.numeric)))
str(dataMatrix)
class(dataMatrix)
sampleMetadata = df.Baseline.All[,c(2,165,168,192)]
sampleMetadata= as.data.frame(sampleMetadata)
attach(df.Baseline.All)
dev.off()
view(dataMatrix)
dev.off()
view(sampleMetadata)
adds.pca <- opls(dataMatrix)
Running opls(dataMatrix) gives me this error:
Error: 'x' contains columns with 'NA' only
How can I handle missing data?
This is how SIMCA software deals with missing values:
"Put simply the NIPALS algorithm interpolates the missing point using a least squares fit but give the
missing data no influence on the model. Successive iterations refine the missing value by simply
multiplying the score and the loading for that point. Many different methods exist for missing data,
such as estimation but they generally converge to the same solution. Missing data is acceptable if they
are randomly distributed. Systematic blocks of missing data are problematic. "
How would you do this in R?
Thanks!
lili
Apparently, some of your columns contain only NAs. You should remove those columns from your data before attempting to perform the PCA. Here is a small function that flags the columns that are entirely NA; negate it with ! when subsetting so that those columns are dropped:
# TRUE for every column in which all values are NA
NAcols <- function(X) {
  colSums(is.na(X)) == nrow(X)
}
# Keep only the columns that are NOT entirely NA
dataMatrixClean <- dataMatrix[, !NAcols(dataMatrix), drop = FALSE]
adds.pca <- opls(dataMatrixClean)
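Dropping the all-NA columns removes this particular error, but the remaining columns may still contain scattered NAs. To make the NIPALS idea from the SIMCA quote concrete, here is a minimal hand-written sketch of the first principal component only: each regression step skips the missing cells, and a missing value can afterwards be estimated as center plus score times loading. It illustrates the technique and is not the code used inside ropls or SIMCA; the function name nipals_pc1 and the toy data are hypothetical.

```r
# Hypothetical helper: first NIPALS component of X, skipping NA cells
nipals_pc1 <- function(X, tol = 1e-12, max_iter = 500) {
  X <- as.matrix(X)
  n <- nrow(X)
  center <- colMeans(X, na.rm = TRUE)  # column means of the observed cells
  Xc <- sweep(X, 2, center)
  obs <- !is.na(Xc)
  X0 <- Xc
  X0[!obs] <- 0                         # missing cells add nothing to the sums
  t_vec <- X0[, which.max(colSums(X0^2))]  # start from the largest column
  for (i in seq_len(max_iter)) {
    # Loadings: regress each column on the scores, observed rows only
    p <- colSums(X0 * t_vec) / colSums(obs * t_vec^2)
    p <- p / sqrt(sum(p^2))
    # Scores: regress each row on the loadings, observed columns only
    P <- matrix(p, n, length(p), byrow = TRUE)
    t_new <- rowSums(X0 * P) / rowSums(obs * P^2)
    converged <- sum((t_new - t_vec)^2) < tol * sum(t_new^2)
    t_vec <- t_new
    if (converged) break
  }
  list(scores = t_vec, loadings = p, center = center)
}

# Toy data: nearly rank-1, so one component describes it well
set.seed(42)
z <- rnorm(50)
X <- cbind(z, 2 * z, -z) + matrix(rnorm(150, sd = 0.1), 50, 3)
full <- nipals_pc1(X)

# Knock out one cell in each of three different rows
X_na <- X
X_na[cbind(c(3, 17, 40), c(1, 2, 3))] <- NA
fit <- nipals_pc1(X_na)

# Estimate a missing cell as center + score * loading
est <- fit$center[2] + fit$scores[17] * fit$loadings[2]
c(estimate = unname(est), true = X[17, 2])
```

With complete data the loadings agree with prcomp() up to sign. Further components would be obtained by subtracting the fitted scores times loadings from the observed cells and repeating; as the quote warns, this only works well when the missing cells are scattered rather than in systematic blocks.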

Non-numeric argument to binary operator, CSV

I've seen that others have struggled with this before, but I didn't manage to solve my problem with those posts. I get the error 'non-numeric argument to binary operator'. The following reproducible example works:
x <- rnorm(1000) + sin(c(1:1000) / 100)  # random data + superimposed sine wave
par(mfrow = c(2, 2))
plot(x)                                   # plot random data
plot(filter(x, rep(1/100, 100)))
plot(x - filter(x, rep(1/100, 100)))
# variances of the variable, the long-term variability and the short-term variability
var(x)
var(filter(x, rep(1/100, 100)), na.rm = TRUE)
var(x - filter(x, rep(1/100, 100)), na.rm = TRUE)
However, I want to use my own dataset, which is a CSV file, and that is when the error occurs. It must have something to do with the data format, because the error also appears when I export the random data to CSV:
x <- rnorm(1000) + sin(c(1:1000) / 100)  # random data + superimposed sine wave
write.csv(x, "dat.csv")
and then read dat.csv back in:
y <- read.csv("dat.csv", header=TRUE, stringsAsFactors=FALSE)
par(mfrow=c(2,2))
plot(y)
plot(filter(y,rep(1/100,100)))
plot(y-filter(y,rep(1/100,100)))
[...] I get the error
Error in x - filter(x, rep(1/100, 100)) :
non-numeric argument to binary operator
Calls: plot
In addition: Warning message:
In plot(x - filter(x, rep(1/100, 100))) :
Incompatible methods ("Ops.data.frame", "Ops.ts") for "-"
Execution halted
Why are the values not numeric? I don't get it. Thanks for your help!
I rewrote the post a little so that the x variable isn't reused for both the input and the output; the value returned by read.csv() is now y. Notice that it's a data.frame, while x is an ordinary numeric vector.
To make the second set of graphs behave like the first, extract the numeric data column from y (called y1 below), then pass that vector to stats::filter().
y <- read.csv("dat.csv", header=TRUE, stringsAsFactors=FALSE)
y1 <- y$x  # Extract the data column x (column X holds the row names written by write.csv)
par(mfrow=c(2,2))
plot(y1)
plot(filter(y1,rep(1/100,100)))
plot(y1-filter(y1,rep(1/100,100)))
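To see where the data frame comes from, the round trip can be checked directly; this sketch writes to a temporary file instead of dat.csv. write.csv() stores the row names as an extra first column, which read.csv() names X, so the data end up in column x. Writing with row.names = FALSE avoids the extra column; either way, the numeric vector has to be pulled out of the data frame before it is combined with the output of stats::filter().

```r
x <- rnorm(1000) + sin(c(1:1000) / 100)  # random data + superimposed sine wave
path <- tempfile(fileext = ".csv")       # temporary file instead of dat.csv

write.csv(x, path)
y <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)
class(y)                        # "data.frame"
names_with_rownames <- names(y)
names_with_rownames             # "X" (the written row names) and "x" (the data)

# Writing without row names leaves a single column named x
write.csv(x, path, row.names = FALSE)
y <- read.csv(path)
y1 <- y$x                       # plain numeric vector again

# stats::filter() is named explicitly in case dplyr is attached and masks it
trend <- stats::filter(y1, rep(1 / 100, 100))
detrended <- y1 - trend         # numeric minus ts works; data.frame minus ts does not
var(detrended, na.rm = TRUE)
```

The explicit stats:: prefix matters because dplyr also exports a function called filter(), which does something entirely different.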

Kaggle Digit Recognizer Using SVM (e1071): Error in predict.svm(ret, xhold, decision.values = TRUE) : Model is empty

I am trying to solve the Digit Recognizer competition on Kaggle and I ran into this error.
I loaded the training data and scaled it by dividing by the maximum pixel value, 255. After that, I tried to build my model.
Here is my code:
Given_Training_data <- get(load("Given_Training_data.RData"))
Given_Testing_data <- get(load("Given_Testing_data.RData"))
Maximum_Pixel_value = max(Given_Training_data)
Tot_Col_Train_data = ncol(Given_Training_data)
training_data_adjusted <- Given_Training_data[, 2:ncol(Given_Training_data)]/Maximum_Pixel_value
testing_data_adjusted <- Given_Testing_data[, 2:ncol(Given_Testing_data)]/Maximum_Pixel_value
label_training_data <- Given_Training_data$label
final_training_data <- cbind(label_training_data, training_data_adjusted)
smp_size <- floor(0.75 * nrow(final_training_data))
set.seed(100)
training_ind <- sample(seq_len(nrow(final_training_data)), size = smp_size)
training_data1 <- final_training_data[training_ind, ]
train_no_label1 <- as.data.frame(training_data1[,-1])
train_label1 <-as.data.frame(training_data1[,1])
svm_model1 <- svm(train_label1,train_no_label1) #This line is throwing an error
Error in predict.svm(ret, xhold, decision.values = TRUE) : Model is empty!
Please share your thoughts. I am not looking for a complete answer, but rather for an idea that points me in the right direction, as I am still learning.
Thanks.
Update to the question:
trainlabel1 <- train_label1[sapply(train_label1, function(x) !is.factor(x) | length(unique(x))>1 )]
trainnolabel1 <- train_no_label1[sapply(train_no_label1, function(x) !is.factor(x) | length(unique(x))>1 )]
svm_model2 <- svm(trainlabel1,trainnolabel1,scale = F)
It didn't help either.
Read the manual (https://cran.r-project.org/web/packages/e1071/e1071.pdf):
svm(x, y = NULL, scale = TRUE, type = NULL, ...)
...
Arguments:
...
x a data matrix, a vector, or a sparse matrix (object of class
Matrix provided by the Matrix package, or of class matrix.csr
provided by the SparseM package,
or of class simple_triplet_matrix provided by the slam package).
y a response vector with one label for each row/component of x.
Can be either a factor (for classification tasks) or a numeric vector
(for regression).
Therefore, the main problems are that your call to svm() swaps the data matrix and the response vector, and that you pass the response vector as an integer, which results in a regression model. Furthermore, you pass the response vector as a single-column data frame rather than as a vector. Hence, if you change the call to:
svm_model1 <- svm(train_no_label1, as.factor(train_label1[, 1]))
it will work as expected. Note that training will take several minutes.
You may also want to remove features that are constant (where the values in the respective column of the training data matrix are all identical) in the training data, since these will not influence the classification.
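A minimal sketch of that filtering step on a made-up pixel table (train_demo and keep are hypothetical names). Note that the filter in the question's update keeps every column for which !is.factor(x) is TRUE, so numeric pixel columns are never dropped by it; testing only the number of distinct values does remove the constant ones.

```r
# Hypothetical stand-in for train_no_label1: 5 images x 4 pixels
train_demo <- data.frame(
  pixel0 = c(0, 0, 0, 0, 0),          # constant: always blank
  pixel1 = c(0, 0.2, 0.5, 0.1, 0),
  pixel2 = c(1, 1, 1, 1, 1),          # constant: always full
  pixel3 = c(0.3, 0, 0.9, 0.4, 0.7)
)

# Filter from the update: every numeric column passes !is.factor(x)
update_keep <- vapply(train_demo,
  function(x) !is.factor(x) | length(unique(x)) > 1, logical(1))
sum(update_keep)   # 4: nothing removed

# Keep only the columns with more than one distinct value
keep <- vapply(train_demo, function(x) length(unique(x)) > 1, logical(1))
train_varying <- train_demo[, keep, drop = FALSE]
names(train_varying)   # "pixel1" "pixel3"
```

The same keep vector should then be applied to the test data, so that the model sees the same columns at prediction time.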
You probably don't need to scale the data manually, since svm() scales it by default (scale = TRUE), unlike most neural network packages.
You can also use the formula interface of svm instead of a matrix and a vector:
svm(result ~ ., data = your_training_set)
In your case, make sure that result is a factor, because you want labels like 1, 2, 3 rather than a value like 1.5467, which is what a regression would give.
I can debug it if you share the data file, Given_Training_data.RData.
