While programming in R, I keep running into the following error:
Error in data.validity(data, "data") : Bad usage: input 'data' is
not double type.
Can anyone please explain why this error is happening, i.e. the reasons in the dataset which cause the error to arise?
Here is the code I'm running. The packages I have loaded are cluster, psych and clv.
data1 <- read.table(file='dataset.csv', sep=',', header=T, row.names=1)
data1.p <- as.matrix(data1)
hello.data <- data1.p[,1:15]
agnes.mod <- agnes(hello.data)
v.pred <- as.integer(cutree(agnes.mod,3)) # "cut" the tree
scatt <- clv.Scatt(hello.data, v.pred)
Error in data.validity(data, "data") :
Bad usage: input 'data' is not double type.
The key part of data.validity() raising the error is:
data = as.matrix(data)
if( !is.double(data) )
stop(paste("Bad usage: input '", name, "' is not double type.", sep=""))
data is converted to a matrix and then checked via is.double(), which is TRUE only for a matrix stored as double. If it isn't, the clause is true and the error is raised. So why isn't your data (hello.data) double when converted to a matrix? Most likely you have character variables or factors in your data, either of which makes as.matrix() return a character matrix. (An all-integer matrix would also fail this check.) Do you have factors? Try
str(hello.data)
Are there any non-numeric variables in there? If you have character data then get rid of it. If you have factors, then data.validity() could coerce via data.matrix() but as it doesn't, try
hello.data <- data.matrix(data1)[,1:15]
in place of the line creating hello.data, then run the rest of your code. data.matrix() only converts data frames, so apply it to data1 rather than to the matrix that as.matrix() has already produced.
Whether this makes sense (treating a nominal or ordinal variable as a simple numeric) is unclear as you haven't provided a reproducible example or explained what your data are etc.
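A minimal sketch of the check described above, using a made-up data frame (toy, and its columns x and g, are hypothetical and not taken from the question). It shows how as.matrix() and data.matrix() differ, and that an all-integer matrix also fails is.double():

```r
# Hypothetical toy data: one double column and one factor column.
toy <- data.frame(x = c(1.5, 2.5, 3.5), g = factor(c("a", "b", "a")))

# as.matrix() falls back to a character matrix when any column is non-numeric.
m_chr <- as.matrix(toy)
is.double(m_chr)   # FALSE: this is what trips the check

# data.matrix() replaces factors with their integer codes instead.
m_num <- data.matrix(toy)
is.double(m_num)   # TRUE

# An all-integer matrix also fails is.double(); changing its storage mode fixes that.
m_int <- as.matrix(data.frame(a = 1:3, b = 4:6))
is.double(m_int)   # FALSE
storage.mode(m_int) <- "double"
is.double(m_int)   # TRUE
```

Note that the integer codes produced by data.matrix() carry no real numeric meaning, which is the caveat raised above about treating nominal variables as numbers.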
I'm getting this error:
Error: f must be a factor (or character vector).
Here is my code
ge19 <- read.csv("ge2019.csv")
aps19 <- read.csv("aps19.csv")
ge19aps19 <- merge(ge19, aps19,by="ons_id")
ge19aps19$london <- ge19aps19$region_name
table(ge19aps19$london)
library(dplyr)
library(forcats)
ge19aps19$london <- fct_drop(ge19aps19$london)
table(ge19aps19$london)
ge19aps19$london <- relevel(ge19aps19$london, ref= "London")
table(ge19aps19$london)
ge19aps19$lab.per <- ge19aps19$lab/ge19aps19$valid_votes
ge19aps19$lab.per <- fct_drop(ge19aps19$lab.per)
Can anyone tell me what's wrong? I'm a first-time user of this site, so please let me know if more information is needed or if I've formatted my question wrong.
The error message means that the argument you passed to fct_drop() is neither a factor nor a character vector.
From your code I can see that ge19aps19$lab.per is a numeric column calculated by this formula:
ge19aps19$lab.per <- ge19aps19$lab/ge19aps19$valid_votes
Why are you running fct_drop() on that column? It is numeric and has no factor levels to drop, which is why fct_drop() throws the error there.
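The distinction can be sketched in base R, swapping in droplevels() for fct_drop() so that no extra package is needed. The vectors region and lab_share are hypothetical stand-ins for the asker's columns:

```r
# Hypothetical stand-ins for ge19aps19$london and ge19aps19$lab.per.
region    <- factor(c("London", "Wales", "London"),
                    levels = c("London", "Wales", "Scotland"))
lab_share <- c(0.4, 0.3, 0.5)

is.factor(region)     # TRUE: has levels, so unused ones can be dropped
is.factor(lab_share)  # FALSE: a plain numeric vector has no levels

# Drop the unused "Scotland" level from the factor only.
region <- droplevels(region)
levels(region)        # "London" "Wales"
```

Checking is.factor() before calling a factor-only function is one simple way to catch this kind of mistake early.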
Hi everyone,
I am using a sample dataset in RStudio. I used the code below:
njnew <- nj %>%
group_by(NAME_2) %>%
summarise(Num.totalbirths=sum(births),
Num.totalvulnerable=sum(vulnerable)) %>%
mutate(percent.potentailcase=potentialcase/Num.totalpotentialcase,
percent.vulerablecase=vulnerable/Num.vulnerablecase)
After running it I get:
Error in sum(births) : invalid 'type' (character) of argument
My dataset is a CSV file, but I manually added and filled in 2 additional columns (births, vulnerable).
Could you kindly let me know how this error may have happened?
Judging from the error message, it looks like births is of type character. However, you can only compute the sum of numeric, complex or logical vectors. This likely happened when you manually added the column after reading in the csv.
You can double-check the type of the variable with class(nj$births), which probably returns character. Try converting your variable(s) with as.numeric(). You may need to repeat that process for other variables (such as vulnerable) which you manually added, e.g.:
nj <- nj %>%
mutate(births = as.numeric(births),
vulnerable = as.numeric(vulnerable))
Then your code should work fine.
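One caveat worth checking, sketched here on a made-up character vector (the values are hypothetical, not from the asker's file): as.numeric() turns anything it cannot parse into NA, so a stray text entry in a hand-filled column will not raise an error but will need na.rm = TRUE or cleaning.

```r
# Hypothetical hand-entered column that was read in as character.
births <- c("10", "25", "7", "n/a")
class(births)  # "character"

# sum() refuses character input; capture the message instead of stopping.
msg <- tryCatch(sum(births), error = function(e) conditionMessage(e))
msg

# Convert; unparseable entries become NA (with a warning, silenced here).
births_num <- suppressWarnings(as.numeric(births))
is.na(births_num)              # FALSE FALSE FALSE TRUE
sum(births_num, na.rm = TRUE)  # 42
```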
I came across a problem with dplyr, which caused an error message when I used it in a survival analysis. The root cause turned out to be that when a variable in a grouped data frame (or any object with class tbl_df) is referred to using [,] notation, it always reports a length of 1, even when the real length is greater than that. Using the $x notation reports the correct length.
With a data frame, the following return the expected length of 32:
length(mtcars$mpg)
length(mtcars[ , "mpg"])
With a grouped data frame the $ notation returns 32, and all the rest using [] notation return a length of 1:
foo <- mtcars %>% group_by(cyl)
length(foo$mpg)
length(foo[ , "mpg"])
length(foo[ , 1])
VarName <- "mpg"
length(foo[ , VarName])
It is just the reported length that is incorrect. The data itself is all there, i.e.:
head(foo[ , "mpg"])
The incorrect reported length leads to an error message in functions such as Surv(), which presumably include a length() check. This is obviously a very simplified example to illustrate. In the failed program I was using [ , VarName] notation inside a function to refer to a variable column. The workaround is simply to convert the data from the offending Data Frame Tbl format to an ordinary data frame within the function. Can anyone shed any light on why this happens? It might save others wasting as much time as I have!
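A likely explanation, sketched in base R under the assumption that [ on a tbl_df behaves like drop = FALSE on an ordinary data frame: the result is still a one-column data frame, and length() of a data frame counts columns, not rows. Extracting with [[ ]] returns the underlying vector.

```r
# drop = FALSE keeps the result as a one-column data frame,
# which mimics what [ , "mpg"] returns on a tbl_df.
col_df  <- mtcars[, "mpg", drop = FALSE]
col_vec <- mtcars[["mpg"]]   # [[ ]] always extracts the column itself

length(col_df)   # 1: number of columns
nrow(col_df)     # 32: the rows are all there
length(col_vec)  # 32
```

Inside a function that takes a column name, [[VarName]] is therefore the safer way to pull out a column as a vector for either kind of data frame.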
I am trying to run a boruta feature selection on my data set.
The code is below:
df<-read.csv('F:/DataAnalyticsClub/DACaseComp/DatasetDist/Datasets/BestFile.csv',stringsAsFactors=FALSE )
install.packages("Boruta")
library(Boruta)
df[is.na(df)] <- 0
df[df == ""] <- 0
X<-df[ , -which(names(df) %in% c("PREVSALEDATE","PREVSALEDATE2","ClassLabel", "PARID", "PROPERTYUNIT", "PriceDiff1", "PriceDiff2", "DateDiff1", "DateDiff2", "SALEDATE"))]
Y<-df['ClassLabel']
factorCols <- c("SCHOOLDESC","MUNIDESC","SALEDESC","INSTRTYPDESC","NEIGHDESC","TAXDESC","TAXSUBCODE_DESC","OWNERDESC","USEDESC","LOTAREA","CLEANGREEN","FARMSTEADFLAG","ABATEMENTFLAG","COUNTYEXEMPTBLDG","STYLEDESC","EXTFINISH_DESC","ROOFDESC","BASEMENTDESC","GRADEDESC","CONDITIONDESC","CDUDESC","HEATINGCOOLINGDESC","BSMTGARAGE")
nonFactorCols<-c("PRICE","COUNTYTOTAL","LOCALTOTAL","FAIRMARKETTOTAL","STORIES","YEARBLT","TOTALROOMS","BEDROOMS","FULLBATHS","HALFBATHS","FIREPLACES","FINISHEDLIVINGAREA","PREVSALEPRICE","PREVSALEPRICE2")
X[factorCols] <- lapply(X[factorCols], factor)
set.seed(123)
boruta.train<-Boruta(X,Y)
So you see that I have a data set of different features, some of them are string features, so I convert them to factors. The rest is numeric. I test my assumptions:
Once I run Boruta I get
Error in data.matrix(data.selected) :
(list) object cannot be coerced to type 'double'
I am not sure why. All of my columns are factors or various numeric types. What can be wrong?
After googling a bit I found that some people recommend converting with as.matrix(), but in that case I get:
> boruta.train<-Boruta(as.matrix(X),as.matrix(Y))
Error: Variable none not found. Ranger will EXIT now.
Error in ranger::ranger(data = x, dependent.variable.name = "shadow.Boruta.decision", :
User interrupt or internal error.
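OK, after playing around with that I managed to identify the problem. Boruta requires Y (the target) to be a plain vector, not a data frame. df['ClassLabel'] returns a one-column data frame (which is a list underneath), whereas df[,'ClassLabel'] returns the column itself.
So just creating Y like this:
Y<-df[,'ClassLabel']
solves the problem.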
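The difference between the two ways of indexing can be checked on a small made-up data frame (demo and its values are hypothetical). Wrapping the result in factor() is an extra step added here on the assumption that the target is a class label:

```r
# Hypothetical miniature of the asker's data.
demo <- data.frame(ClassLabel = c("yes", "no", "yes"),
                   PRICE      = c(100, 200, 150))

y_df  <- demo["ClassLabel"]     # single bracket, no comma: one-column data frame
y_vec <- demo[, "ClassLabel"]   # with the comma: the column as a vector

is.data.frame(y_df)    # TRUE
is.data.frame(y_vec)   # FALSE

# For a classification target, an explicit factor avoids ambiguity.
y_fac <- factor(demo[["ClassLabel"]])
levels(y_fac)          # "no" "yes"
```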
I'm trying to run a LASSO on our dataset, and to do so, I need to convert non-numeric variables to numeric, ideally via a sparse matrix. However, when I try to use the Matrix command, I keep getting this error:
Error in asMethod(object) : invalid class 'NA' to dup_mMatrix_as_geMatrix
I thought this was due to NA's in my data, so I did an na.omit and got the same error. I tried again with a mini subset of my code and got the same error again:
> sparsecombined <- Matrix(combined1[1:10,],sparse=TRUE)
Error in asMethod(object) : invalid class 'NA' to dup_mMatrix_as_geMatrix
This is the data set I tried to convert with that last line of code:
Is there anything that jumps out that might prevent sparse conversion?
The easiest way to incorporate categorical variables into a LASSO is to use my glmnetUtils package, which provides a formula/data frame interface to glmnet.
glmnet(ArrDelay ~ ArrTime + uniqueCarrier + TailNum + Origin + Dest,
data=combined1, sparse=TRUE)
This automatically handles categorical vars via one-hot encoding (also known as dummy variables). It can also use sparse matrices if so desired.
I think the error is due to the fact that you have non-numeric data types in your matrix.
Perhaps first convert your non-numeric columns like UniqueCarrier to binary vectors using one-hot encoding, and only then convert the matrix to sparse.
Here is my code that I used for that conversion:
# Convert Genre into binary variables
# Convert genreVector into a corpus in order to parse each text string into a binary vector with 1s representing the presence of a genre and 0s the absence
library(tm)
library(slam)
convertToBinary <- function(category) {
  genreVector = category
  genreVector = strsplit(genreVector, "(\\s)?,(\\s)?") # separate out commas
  genreVector = gsub(" ", "_", genreVector) # combine DirectorNames with whitespaces
  genreCorpus = Corpus(VectorSource(genreVector))
  #dtm = DocumentTermMatrix(genreCorpus, list(dictionary=genreNames))
  dtm = DocumentTermMatrix(genreCorpus)
  binaryGenreVector = inspect(dtm)
  return(binaryGenreVector)
  #return(data.frame(binaryGenreVector)) # convert binaryGenreVector to dataframe
}
directorBinary = convertToBinary(x$Director)
directorBinaryDF = as.data.frame(directorBinary)
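For a column that holds a single category per row, a lighter-weight alternative is base R's model.matrix(), which is a different technique from the tm-based code above. This is only a sketch; the data frame flights and its values are made up, with column names borrowed from the question:

```r
# Hypothetical miniature of the flight data.
flights <- data.frame(ArrDelay      = c(5, -2, 30),
                      UniqueCarrier = factor(c("AA", "UA", "AA")))

# "- 1" removes the intercept so every level gets its own 0/1 column.
onehot <- model.matrix(~ UniqueCarrier - 1, data = flights)
colnames(onehot)  # "UniqueCarrierAA" "UniqueCarrierUA"
rowSums(onehot)   # each row has exactly one 1
```

The result is an ordinary numeric matrix, so it can be column-bound to the other numeric predictors before any sparse conversion.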
See nograpes' answer in the question "recommenderlab, Error in asMethod(object) : invalid class 'NA' to dup_mMatrix_as_geMatrix".
I got this error due to passing a data frame where a matrix was expected, and it looks like that's the same reason you are getting it. The solution is simple: convert your data to a matrix before passing it to the Matrix function:
sparsecombined <- Matrix(as.matrix(combined1[1:10,]),sparse=TRUE)
In your case, this code will probably complain because you have some non-numeric data stored in there (e.g. the TailNum column). So you would need to downselect to just the numeric columns.
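Downselecting to the numeric columns could look like the following sketch. The data frame combined_demo and its values are hypothetical; only the column names echo the question:

```r
# Hypothetical miniature of combined1 with one non-numeric column.
combined_demo <- data.frame(ArrDelay = c(5, -2, 30),
                            ArrTime  = c(1210L, 845L, 1930L),
                            TailNum  = c("N123", "N456", "N789"),
                            stringsAsFactors = FALSE)

# Flag the numeric columns, then keep only those.
num_cols <- vapply(combined_demo, is.numeric, logical(1))
m <- as.matrix(combined_demo[, num_cols, drop = FALSE])

colnames(m)   # "ArrDelay" "ArrTime"
is.double(m)  # TRUE: safe to hand to Matrix(m, sparse = TRUE)
```

This discards TailNum entirely; if the categorical columns matter for the model, they would need to be encoded rather than dropped.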