Non-numeric argument to binary operator, CSV - r

I've seen that others have struggled with this before, but the existing posts didn't solve my problem. I get the error 'non-numeric argument to binary operator'. The following reproducible example works:
x <- rnorm(1000) + sin(1:1000 / 100)  # random data with a sine wave superimposed
par(mfrow = c(2, 2))
plot(x)                                # plot the raw data
plot(filter(x, rep(1/100, 100)))       # 100-point moving average (long-term component)
plot(x - filter(x, rep(1/100, 100)))   # residual (short-term component)
# variances of the variable, the long-term variability and the short-term variability
var(x)
var(filter(x, rep(1/100, 100)), na.rm = TRUE)
var(x - filter(x, rep(1/100, 100)), na.rm = TRUE)
Of course, I want to use my own dataset, which is a CSV file, and that is when the error occurs. It must have something to do with the data format, because when I export the random data to CSV:
x <- rnorm(1000) + sin(1:1000 / 100)  # random data with a sine wave superimposed
write.csv(x, "dat.csv")
and then read dat.csv back in
y <- read.csv("dat.csv", header = TRUE, stringsAsFactors = FALSE)
par(mfrow = c(2, 2))
plot(y)
plot(filter(y, rep(1/100, 100)))
plot(y - filter(y, rep(1/100, 100)))
[...] I get the error
Error in x - filter(x, rep(1/100, 100)) :
non-numeric argument to binary operator
Calls: plot
In addition: Warning message:
In plot(x - filter(x, rep(1/100, 100))) :
Incompatible methods ("Ops.data.frame", "Ops.ts") for "-"
Execution halted
Why are the values not numeric? I don't get it. Thanks for your help!

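I rewrote the post slightly so that the x variable isn't reused for both input and output; the value returned by read.csv() is now y. Notice that y is a data.frame, while x is an ordinary numeric vector. Subtracting the ts object returned by filter() from a data.frame is what triggers the "Incompatible methods ("Ops.data.frame", "Ops.ts")" warning and the error.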
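To make the second set of plots behave like the first, extract the numeric column from y (called y1 below) and pass that vector to stats::filter(), the same linear filter used in the first example (filter() here is not the dplyr function). Note that write.csv() also writes the row names, so read.csv() returns two columns: X (the row numbers) and x (the data).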
y <- read.csv("dat.csv", header = TRUE, stringsAsFactors = FALSE)
y1 <- y$x  # extract the data column (column X holds the row names written by write.csv)
par(mfrow = c(2, 2))
plot(y1)
plot(stats::filter(y1, rep(1/100, 100)))
plot(y1 - stats::filter(y1, rep(1/100, 100)))
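The following base-R sketch checks this diagnosis with simulated data written to a temporary file (the seed and the tempfile() path are assumptions standing in for dat.csv). It shows that read.csv() returns a data frame whose data column is x, and that the extracted vector can be filtered and differenced without the error.

```r
set.seed(1)                                   # assumption: fixed seed for reproducibility
x <- rnorm(1000) + sin(1:1000 / 100)          # random data with a sine wave superimposed
path <- tempfile(fileext = ".csv")            # temporary file instead of "dat.csv"
write.csv(x, path)                            # writes a row-number column and a column "x"

y <- read.csv(path)
class(y)                                      # "data.frame", not a numeric vector
names(y)                                      # "X" (row numbers) and "x" (the data)

y1 <- y$x                                     # plain numeric vector
long_term <- stats::filter(y1, rep(1/100, 100))
short_term <- y1 - long_term                  # numeric minus ts works, unlike data.frame minus ts
```

Alternatively, writing the file with write.csv(x, "dat.csv", row.names = FALSE) avoids the extra X column, so read.csv("dat.csv")[[1]] returns the data directly.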

Related

R BiCopKDE cov.wt(z) : 'x' must contain finite values only

My dataset consists of stock prices. As a practice exercise, my final goal is to fit a copula to two stocks.
I've transformed my data to the [0,1] scale and would like to plot the bivariate density with BiCopKDE.
However, although I checked for non-finite values, I still get the error message "cov.wt(z) : 'x' must contain finite values only". I reduced my dataset to 16 rows to find the cause, but that didn't help.
The code:
DFM.roh <- read.xlsx("C:\\Users\\Simon\\Documents\\ML Seminar\\Deutscher Finanzmarkt Daten.xlsx")
DFM <- data.frame(X_bei = DFM.roh$s_bei, X_bayn = DFM.roh$s_bayn)
y_a <- ecdf(DFM$X_bei)(DFM$X_bei)
y_b <- ecdf(DFM$X_bayn)(DFM$X_bayn)
Datacop <- data.frame(y_a, y_b)
which(is.na(Datacop), arr.ind=TRUE)
#row col
all(sapply(Datacop, is.finite))
#TRUE
BiCopKDE(Datacop$y_a, Datacop$y_b, "surface")
# cov.wt(z) : 'x' must contain finite values only
The dataset: [screenshot not reproduced]
Anybody with an idea to solve this?
Best,
Simon
A good way to get what you want is to use BiCopSelect(), a function in the VineCopula package. Once you have the fitted copula, you can plot it with the plot method available in the same package.
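One plausible cause of the BiCopKDE error, offered as an assumption rather than something confirmed in this thread: ecdf(x)(x) maps the largest observation to exactly 1, and a density estimator that works on the normal scale would turn that into qnorm(1) = Inf. The base-R sketch below uses a simulated 16-point series (hypothetical data, not the question's prices) to show the boundary value and a common alternative, rescaled ranks rank(x) / (n + 1), which stay strictly inside (0, 1).

```r
set.seed(42)
s <- cumsum(rnorm(16))                 # simulated stand-in for one price series (hypothetical)

u_ecdf <- ecdf(s)(s)                   # what the question does: values in (0, 1]
u_rank <- rank(s) / (length(s) + 1)    # rescaled ranks: values strictly in (0, 1)

max(u_ecdf)                            # exactly 1
range(qnorm(u_ecdf))                   # upper end is Inf
range(qnorm(u_rank))                   # both ends finite
```

Applied to the question, this would mean building Datacop from rank(DFM$X_bei) / (nrow(DFM) + 1) and the analogous expression for X_bayn before calling BiCopKDE; whether that removes the error has not been verified here.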

variable lengths differ error when rollapply lm

I am trying to run a rolling-window regression on a number of time series but have run into a strange problem. The following code reproduces my data. I have a data frame of returns named "rt" and a data frame of factors named "factors". I then define a function that returns the regression intercept (alpha).
library(lmtest)  # provides coeftest()
mat <- as.data.frame(matrix(runif(88 * 6), nrow = 88, ncol = 6))
colnames(mat) <- c("MKT", "SMB", "HML", "AA", "BB", "CC")
rt <- mat[, c(4, 6)]
factors <- mat[, c(1:3)]
coeffstat_alpha <- function(x) {
  fit <- lm(x ~ MKT + SMB + HML, data = factors, na.action = na.omit)
  nn <- c(t(coeftest(fit)))[1]  # intercept estimate
  return(nn)
}
When I run this function on the whole sample, it works:
apply(rt, 2, FUN = coeffstat_alpha)
but when I apply it with rollapply() from the zoo package, I get an error:
library(zoo)
rollapply(rt[, 1], width = 24, FUN = coeffstat_alpha, by = 1, align = "left")
"Error in model.frame.default(formula = x ~ MKT + SMB + HML, data = factors, :
variable lengths differ (found for 'MKT')"
I tried to fix the problem by searching online but couldn't find a similar question. Can anyone help? Thanks!
As the error message suggests, the variables have different lengths: rollapply() passes x to the function as a window of length 24 (the width), whereas factors still has all 88 rows. For lm() to run, x and factors must have the same number of rows. You can change the function to take the row indices of each window instead:
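library(lmtest)
coeffstat_alpha <- function(x) {
  # x is a vector of row indices, so the response and the factors use the same rows
  fit <- lm(rt[x, 1] ~ MKT + SMB + HML, data = factors[x, ], na.action = na.omit)
  nn <- c(t(coeftest(fit)))[1]
  return(nn)
}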
and call it with sapply(), one 24-row window at a time:
sapply(1:(nrow(rt) - 23), function(x) coeffstat_alpha(x:(x + 23)))
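For readers without zoo or lmtest installed, the same idea can be sketched in base R: subset the rows once per window so the response and the factors always have the same length, and read the intercept with coef(), which gives the same value as c(t(coeftest(fit)))[1]. The simulated data mirrors the question; using the AA column as the response is an illustrative choice.

```r
set.seed(1)
mat <- as.data.frame(matrix(runif(88 * 6), nrow = 88, ncol = 6))
colnames(mat) <- c("MKT", "SMB", "HML", "AA", "BB", "CC")

width <- 24
starts <- seq_len(nrow(mat) - width + 1)   # first row of each window: 1..65

alpha <- sapply(starts, function(i) {
  win <- mat[i:(i + width - 1), ]          # same rows for the response and the factors
  coef(lm(AA ~ MKT + SMB + HML, data = win))[["(Intercept)"]]
})

length(alpha)                              # one intercept per window
```

To roll over the CC column instead, change the left-hand side of the formula.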

Error with RandomForest in R because of "too many categories"

I'm trying to train a random forest (RF) model in R, but when I try to define the model:
rf <- randomForest(labs ~ .,data=as.matrix(dd.train))
It gives me the error:
Error in randomForest.default(m, y, ...) :
Can not handle categorical predictors with more than 53 categories.
Any idea what it could be?
And before you say "you have a categorical variable with more than 53 categories": no, all variables except labs are numeric.
Tim Biegeleisen: read the last line of my question and you will see why it is not the same as the question you linked!
Edited to address followup from OP
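I believe using as.matrix() in this case implicitly creates factors: because labs is not numeric, as.matrix() turns the whole data frame into a character matrix, so the numeric columns become text and can then be treated as categorical predictors. as.matrix() is also not necessary for this package. You can keep the data as a data frame, but you will need to make sure that any unused factor levels are dropped, using droplevels() (or something similar). There are many reasons an unused factor level may be in your data set, but a common one is a dropped observation.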
Below is a quick example that reproduces your error:
library('randomForest')
#making a toy data frame
x <- data.frame('one' = c(1,1,1,1,1,seq(50) ),
'two' = c(seq(54),NA),
'three' = seq(55),
'four' = seq(55) )
x$one <- as.factor(x$one)
x <- na.omit(x) #getting rid of an NA. Note this removes the whole row.
randomForest(one ~., data = as.matrix(x)) #your first error
randomForest(one ~., data = x) #your second error
x <- droplevels(x)
randomForest(one ~., data = x) #OK
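The two mechanisms behind these errors can be checked without randomForest: as.matrix() on a data frame containing a factor yields a character matrix, and na.omit() can leave a factor level with no observations until droplevels() removes it. The sketch reuses the toy data frame from the answer above.

```r
x <- data.frame(one = c(1, 1, 1, 1, 1, seq(50)),
                two = c(seq(54), NA),
                three = seq(55),
                four = seq(55))
x$one <- as.factor(x$one)

m <- as.matrix(x)          # the factor column forces a character matrix
typeof(m)                  # "character": every column, including the numeric ones, is now text

x <- na.omit(x)            # removes row 55, the only row where one == 50
empty <- table(x$one)[["50"]]   # 0: level "50" still exists but has no observations
x <- droplevels(x)
nlevels(x$one)             # 49 after the empty level is dropped
```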

How to load a csv file into R as a factor for use with glmnet and logistic regression

I have a CSV file called "y" (a single column of numeric values) containing zeros and ones; it is the binary target variable for logistic regression. Another file called "x" has the same number of rows and contains columns of numeric predictor values. How do I load these so that I can then use cv.glmnet? Loading them like this:
x <- read.csv('x', header = FALSE, sep = ",")
y <- read.csv('y', header = FALSE)
throws the error
Error in y %*% rep(1, nc) :
requires numeric/complex matrix/vector arguments
when I call
cvfit = cv.glmnet(x, y, family = "binomial")
I know that "y" should be loaded as a "factor", but how do I do this? My online searches turned up all sorts of approaches that just confused me. What is a simple one-liner to load this data ready for glmnet?
cv.glmnet() requires x to be a matrix and y to be a vector (or a two-level factor), but read.csv() returns data frames. You can use the following code:
xmat = as.matrix(x)
yvec = y[[1]]  # first (only) column as a plain vector; factor(y[[1]]) also works
Then use
cvfit = cv.glmnet(xmat, yvec, family = "binomial")
If you can provide your data in dput() format, I can give it a try.
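A base-R sketch of the loading step, with small simulated files written to temporary paths standing in for the question's "x" and "y" (the file contents are hypothetical). It shows that read.csv() returns data frames and that as.matrix() and [[1]] produce the matrix and vector that cv.glmnet() expects; the cv.glmnet() call itself is left out so the sketch runs without glmnet installed.

```r
set.seed(1)
x_path <- tempfile(fileext = ".csv")   # hypothetical stand-in for the file "x"
y_path <- tempfile(fileext = ".csv")   # hypothetical stand-in for the file "y"
write.table(matrix(rnorm(20 * 3), ncol = 3), x_path, sep = ",",
            row.names = FALSE, col.names = FALSE)
write.table(rbinom(20, 1, 0.5), y_path, sep = ",",
            row.names = FALSE, col.names = FALSE)

# One line per file: a numeric predictor matrix and a 0/1 response vector
xmat <- as.matrix(read.csv(x_path, header = FALSE))
yvec <- read.csv(y_path, header = FALSE)[[1]]

class(read.csv(y_path, header = FALSE))  # "data.frame", which %*% does not accept
dim(xmat)                                 # 20 rows, 3 columns
```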

R: How can I convert columns of a data.frame to x[,1] and x[,2]?

I'm trying to use the R function ZYmediate from R.R. Wilcox's robust functions ("Introduction to Robust Estimation and Hypothesis Testing"), as implemented in the WRS package hosted at this GitHub repo: https://github.com/nicebread/WRS
The ZYmediate function is specified as follows:
ZYmediate(x,y,nboot=2000,alpha=.05,kappa=.05,SEED=TRUE,xout=FALSE,outfun=out)
The documentation states that x[,1] is the predictor, x[,2] is the mediator, and y is the outcome variable.
If I have a data.frame with three columns (mediator, predictor, and outcome), how can I convert the predictor and the mediator columns of the data.frame to an x object where the predictor is x[,1] and the mediator is x[,2]?
Currently, I am trying to get this to work with a data set from the "mediation" package. Here is how I am importing it:
install.packages("mediation")
library("mediation")
set.seed(2014)
data("framing", package = "mediation")
# Mediator = framing$emo, predictor = framing$treat, outcome = framing$cong_mesg
Thinking this might be a matrix object, I tried
x <- matrix(framing$cong_mesg, framing$emo)
But it tells me "data length [265] is not a sub-multiple or multiple of the number of rows [7]"
I also tried
x <- matrix(framing$cong_mesg, framing$emo)
But when I run ZYmediate(x,y) it tells me "Error in t(Z) %*% matrix(1/n, n,1) : requires numeric/complex matrix/vector arguments. In addition: Warning message:In cbind(x, y): number of rows of result is not a multiple of vector length (arg 1)"
What function should I be using? Thanks! I'm still learning R, so forgive me if this question is trivial.
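In matrix(framing$cong_mesg, framing$emo), the second positional argument of matrix() is nrow, so the mediator values were presumably read as a row count, which is where the "[7]" in the message comes from. A two-column matrix with the predictor in column 1 and the mediator in column 2 can instead be built with cbind(). The sketch below uses a small hypothetical data frame whose column names mimic framing (the values are simulated, not the mediation data) and has not been run against ZYmediate() itself.

```r
set.seed(2014)
n <- 30
dat <- data.frame(treat = rbinom(n, 1, 0.5),     # hypothetical predictor
                  emo = runif(n, 3, 12),         # hypothetical mediator
                  cong_mesg = rbinom(n, 1, 0.5)) # hypothetical outcome

x <- cbind(dat$treat, dat$emo)   # column 1 = predictor, column 2 = mediator
y <- dat$cong_mesg               # outcome as a plain vector
dim(x)                           # n rows, 2 columns

# The same x, selecting the columns by name
x2 <- as.matrix(dat[, c("treat", "emo")])
```

With the framing data this would presumably be x <- cbind(framing$treat, framing$emo) and y <- framing$cong_mesg, assuming those columns are numeric; a factor column would first need converting with as.numeric().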
