I was attempting an assignment and hit a problem with the dataset. As per the questions, we have to take the Duration, Amount, and Installment columns for analysis. I tried to normalize these columns with the scale() command after copying them into a separate data frame, but I get an error saying:
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
I explored further and found that the dataset may not be purely numeric, although at first sight all three columns look numeric. I used the is.numeric() command and got the result:
is.numeric(new_dataset)
[1] FALSE
Having gone this far, I am now stuck on how to convert the non-numeric data to a numeric type without having to replace all the values manually. I found some material on "as.numeric(levels(f)[f])", but wasn't able to understand how to apply it. I am getting this error:
new_dataset_num<-as.numeric(levels(new_dataset[,1:3]))[new_dataset[,1:3]]
Error in as.numeric(levels(new_dataset[, 1:3]))[new_dataset[, 1:3]] :
invalid subscript type 'list'
Can you please help out with this?
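A minimal sketch of one way to approach this, assuming the three columns were read in as factors and that every value really is a number. The data below is made up; only the column names come from the question. The "levels(f)[f]" idiom works on a single factor, whereas new_dataset[, 1:3] is a data frame (a list), which is what the 'invalid subscript type' error points at, so the conversion is applied column by column. Note also that is.numeric() returns FALSE for any data frame, so it has to be checked per column.

```r
# Toy data standing in for the real dataset (values are made up).
new_dataset <- data.frame(
  Duration    = factor(c("6", "48", "12")),
  Amount      = factor(c("1169", "5951", "2096")),
  Installment = factor(c("4", "2", "2"))
)

# Convert one column: go through as.character() so a factor yields its
# labels rather than its internal integer codes.
to_numeric <- function(col) as.numeric(as.character(col))

# Apply the conversion to every column while keeping the data frame shape.
new_dataset_num <- new_dataset
new_dataset_num[] <- lapply(new_dataset, to_numeric)

# Check each column individually instead of the data frame as a whole.
sapply(new_dataset_num, is.numeric)

# With all columns numeric, scale() centers and scales each one.
scaled <- scale(new_dataset_num)
```

If any value cannot be parsed as a number, as.numeric() returns NA for it and warns, which is a useful hint about where the non-numeric entries are.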
Related
I am trying to fit a model with the SuperLearner package. However, I can't even get past the stage of playing with the package to get comfortable with it....
I use the following code:
superlearner<-SuperLearner::SuperLearner(Y=y, X=as.data.frame(data_train[1:30]), family =binomial(), SL.library = list("SL.glmnet"), obsWeights = weights)
y is a numeric vector of the same length as my dataframe "data_train", containing the correct labels with 9 different classes. The dataframe "data_train" contains 30 columns with numeric data.
When I run this, I get the error:
Error in get(library$screenAlgorithm[s], envir = env) :
object 'All' not found
I don't really know what the problem could be, and I can't really wrap my head around the source code. Please note that obsWeights contains a numeric vector of the same length as my data, with weights I calculated for the model. This shouldn't be the problem, as it doesn't work with or without the weights.
Unfortunately I can't share my data here, but maybe someone has had this error before...
Thanks!
This seems to happen if you do not attach SuperLearner. You can fix it by calling library(SuperLearner) first.
I am working with a data frame from NYC OpenData. The information page claims that a column, ACRES, is numeric, but when I download it, it is chr. I've tried the following:
parks$ACRES <- as.numeric(as.character(parks$ACRES))
which turned the column type into dbl, but I was unable to take the mean, so I tried:
parks$ACRES <- as.integer(as.numeric(parks$ACRES))
I've also tried sapply(), and I get a message saying NAs introduced by coercion. I tried convert() too, but R didn't recognize it, though it is supposed to be part of dplyr.
Either way I get NA as a result for the mean.
I've tried taking the mean a few different ways:
mean(parks[["ACRES"]])
mean(parks$ACRES)
Neither of these worked. Is it the data frame? Since it is from the government, I'm wondering whether there are limits on it.
I'd appreciate any help.
You have NAs in your data. Either they were there before you converted or some of the data can't be converted to numeric directly (do you have comma separators for the 1000s in your input? Those need to be removed before converting to numeric).
Identifying why you have NAs, and fixing them if necessary, is the first step. If the NAs are valid, pass na.rm = TRUE to mean(), which makes it ignore NAs while calculating the mean.
Check to see how ACRES is being loaded in (i.e., what data type is it?). If it's being loaded in as a factor, you will have trouble changing a factor to a numerical value. The way to solve this is to use the 'stringsAsFactors = FALSE' argument in your read.csv or whatever function you're using to read in the data.
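The advice above can be sketched as follows. The values are made up to stand in for parks$ACRES, and the assumption that the real column contains thousands separators and blank entries is only a guess at a likely cause.

```r
# Made-up values standing in for parks$ACRES as downloaded (character).
parks <- data.frame(ACRES = c("1,255.42", "0.75", "", "12"),
                    stringsAsFactors = FALSE)

# Which entries would fail a direct conversion? Inspect them first.
bad <- is.na(suppressWarnings(as.numeric(parks$ACRES)))
parks$ACRES[bad]

# Strip thousands separators, then convert; the blank entry becomes NA.
parks$ACRES <- as.numeric(gsub(",", "", parks$ACRES, fixed = TRUE))

# mean() returns NA as long as any value is missing...
mean(parks$ACRES)

# ...unless missing values are dropped explicitly.
mean(parks$ACRES, na.rm = TRUE)
```

Looking at parks$ACRES[bad] before cleaning shows whether the NAs come from formatting (which can be fixed) or from values that are genuinely missing.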
I am trying to scale a dataset with 9 variables to prepare it for clustering. My data has headers (column names). It keeps giving me this response.
I have already excluded the row names from the dataset.
Warning message:
In dist(DF, method = "euclidean") : NAs introduced by coercion
View(DF)
Error in View : cannot coerce class "dist" to a data.frame
First off, a comment on your question style: add a snippet of data and take more time to explain the problem and what you have already tried!
The warning "NAs introduced by coercion" normally occurs when a conversion between data types fails (as the name suggests). Check your columns for non-numeric elements (are there letters somewhere? Wrong decimal separators?).
This great blog explains nicely where and why these problems occur and how to fix them: http://r-bio.github.io/02-data-frames/
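A small sketch of that check, using a made-up data frame in which a text column has been left in by accident (the column names are hypothetical). It also shows why View() fails on the result: dist() returns a "dist" object, not a data frame, so it has to be converted first.

```r
# Made-up data frame with one text column left in before clustering.
DF <- data.frame(name = c("a", "b", "c"),
                 x = c(1.0, 2.5, 4.0),
                 y = c(10, 20, 15),
                 stringsAsFactors = FALSE)

# Find the columns that are not numeric.
non_numeric <- names(DF)[!sapply(DF, is.numeric)]
non_numeric

# Keep only the numeric columns, scale them, then compute distances.
DF_num <- DF[sapply(DF, is.numeric)]
d <- dist(scale(DF_num), method = "euclidean")

# Convert the "dist" object to a matrix to inspect it.
d_mat <- as.matrix(d)
```

as.matrix(d) gives a square matrix that can be printed or viewed like any other.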
The only clue I had for fixing this error was to turn the 3 features into factors. I tried that, but it still doesn't work.
Does the error saying my columns are not logical have anything to do with it? What does "not logical" mean in this case?
So an image of the error and what the data looks like for those columns is included here:
Solved
Found my problem! The code for discretizing my columns created new variables and didn't change the columns in my data set. That is why I kept getting the error.
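The original code is not shown, so the following is only an illustration of the kind of mistake described, with a hypothetical column name and made-up break points. Discretizing into a standalone variable leaves the data frame untouched; the result has to be assigned back to the column.

```r
# Made-up data; "age" is a hypothetical column name.
df <- data.frame(age = c(23, 41, 67))

# This creates a separate variable; the column df$age is unchanged.
age <- cut(df$age, breaks = c(0, 30, 60, Inf),
           labels = c("young", "middle", "old"))
unchanged <- is.numeric(df$age)

# Assigning the result back is what actually modifies the data set.
df$age <- cut(df$age, breaks = c(0, 30, 60, Inf),
              labels = c("young", "middle", "old"))
changed <- is.factor(df$age)
```

Running str(df) after each step is a quick way to confirm whether the column type really changed.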
Possibly it's a stupid question (but be patient, I'm a beginner in R's world)... I'm working with ImpulseDE2, a package designed for RNAseq data analysis across different time points (see article for more information).
The main function (runImpulseDE2) requires a count matrix and an annotation data frame. I've created both, but this error message appears:
Error in checkCounts(matCountData, "matCountData"): ERROR: matCountData contains non-integer elements. Requires count data.
I have tried some solutions and nothing seems to work (and I've not found any solution in the Internet)...
as.matrix(data)
(data + 1) > there are no NAs or zero values that could cause this error ($ which(is.na(data)) and $ which(data < 1) both return integer(0))
as.numeric(data) > this produces another error: ERROR: [Rownames of matCountData] was not given as input.
I think there's something I'm not seeing, but I'm totally stuck. Any tip is welcome!
And here is the (silly) solution! This function does not seem to accept floating-point numbers, so applying a simple round() is enough to solve this error.
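A minimal sketch of the fix on a made-up count matrix (the gene and sample names are hypothetical; matCountData is the argument name from the error message). It also shows why as.numeric() led to the row-names error: it flattens a matrix into a plain vector and drops the dimnames, while round() keeps them.

```r
# Made-up count matrix with gene row names and sample column names.
data <- matrix(c(10.0, 3.7, 0.2, 25.0), nrow = 2,
               dimnames = list(c("geneA", "geneB"), c("s1", "s2")))

# round() keeps the matrix shape and its dimnames.
matCountData <- round(data)

# Every element is now a whole number.
all(matCountData == floor(matCountData))

# as.numeric() would instead return a plain vector without row names.
flattened <- as.numeric(data)
is.null(dim(flattened))
```

Whether rounding is appropriate depends on where the fractional counts came from, so it is worth checking how the matrix was produced.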
Thanks for your help!