How to use scale() and dist() on a dataset with column names in R

I am trying to scale a dataset with 9 variables to prepare it for clustering. My data has headers (column names). It keeps giving me the response below.
I have already excluded the row names from the dataset.
Warning message:
In dist(DF, method = "euclidean") : NAs introduced by coercion
View(DF)
Error in View : cannot coerce class '"dist"' to a data.frame

First off, a comment on your question style: add a snippet of your data, and take more time to explain the problem and what you have already tried!
The warning "NAs introduced by coercion" normally occurs when a conversion between data types fails (as the name suggests). Check your columns for non-numeric elements (are there letters somewhere? Wrong decimal separators?).
This great blog post explains nicely where and why these problems occur and how to fix them: http://r-bio.github.io/02-data-frames/.
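As a minimal sketch of that check: the data frame below is made up for illustration (the column names id, x and y and the stray "2,5" are assumptions, not the asker's data). It locates the values that fail numeric conversion, repairs a decimal comma, and then scales only the numeric columns before calling dist(). Note that dist() returns an object of class "dist", which View() cannot display directly; converting it with as.matrix() first avoids the second error.

```r
# Hypothetical data: 'x' was read as text because of one decimal comma
DF <- data.frame(
  id = c("a", "b", "c"),
  x  = c("1.5", "2,5", "3.0"),
  y  = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Which columns are numeric at all?
is_num <- sapply(DF, is.numeric)

# Which entries of 'x' cannot be converted to a number?
converted  <- suppressWarnings(as.numeric(DF$x))
bad_values <- DF$x[is.na(converted) & !is.na(DF$x)]

# Repair the decimal separator, then convert
DF$x <- as.numeric(sub(",", ".", DF$x, fixed = TRUE))

# Keep only numeric columns; carry the labels along as row names
num <- DF[sapply(DF, is.numeric)]
rownames(num) <- DF$id

d     <- dist(scale(num), method = "euclidean")
d_mat <- as.matrix(d)  # a plain matrix, suitable for View(d_mat)
```

Keeping the identifier column out of scale() and dist() matters because both functions work on numeric data; a single text column is enough to trigger the coercion warning.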

Related

Convert characters in an existing dataframe into rownames in R (2018)

I tried the solution given here.
I'm having an identical problem, where my dataframe uses numbers instead of the first column as row names.
When I use the solution from 2012, I get the error:
Error in row.names<-.data.frame(*tmp*, value = value) : invalid 'row.names' length
In addition:
Warning message: Setting row names on a tibble is deprecated.
I saw there is another comment on the original post with the same error, so I think maybe it's an issue with a newer version of R? I would like to know why this error comes up and how I can set my rownames in a dataframe using a column of characters/strings. The end goal is to be able to use these rownames to label the points in a graph (specifically, a prcomp autoplot, if that's important).
I've tried other ways to set my rownames, including loading up the dataframe with just the last two columns while setting the rownames from another dataset, and trying to label plots after the fact with rownames from elsewhere. No matter what, I get the error that my row names are of the wrong length, even though the number of rows is identical.
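One possible cause, offered as an assumption because the post does not show the code that was run: the deprecation warning suggests the object is a tibble, and indexing a tibble with df[, 1] returns a one-column tibble rather than a vector, so its length does not match the number of rows. A minimal base R sketch with made-up column names (sample, pc_a, pc_b) converts to a plain data frame and extracts the label column as a vector with [[ ]]:

```r
# Hypothetical data; in the tibble case, as.data.frame() drops the tibble class
df <- data.frame(
  sample = c("s1", "s2", "s3"),
  pc_a   = c(0.1, 0.4, 0.9),
  pc_b   = c(2, 5, 3),
  stringsAsFactors = FALSE
)
df <- as.data.frame(df)

# [[ ]] always returns a vector, so the length equals nrow(df)
rownames(df) <- df[["sample"]]

# Drop the label column so that only numeric columns remain for prcomp()
df[["sample"]] <- NULL
```

If the tibble package is in use, tibble::column_to_rownames() is an alternative that performs the same conversion in one step.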

ImpulseDE2, matrix counts contains non-integer elements

This is possibly a stupid question (but be patient, I'm a beginner in the R world)... I'm working with ImpulseDE2, a package designed for RNA-seq data analysis across different time points (see the article for more information).
The main function (runImpulseDE2) requires a count matrix and an annotation data frame. I've created both, but this error message appears:
Error in checkCounts(matCountData, "matCountData"): ERROR: matCountData contains non-integer elements. Requires count data.
I have tried some solutions, but nothing seems to work (and I haven't found any solution on the Internet):
as.matrix(data)
data + 1: there are no NAs or zero values that could cause this error (which(is.na(data)) and which(data < 1) both return integer(0))
as.numeric(data): this produces another error: ERROR: [Rownames of matCountData] was not given as input.
I think there's something I'm not seeing, but I'm totally stuck. Any tip is welcome!
And here is the (silly) solution! This function does not seem to accept floating-point numbers, so applying a simple round() is enough to solve this error.
Thanks for your help!
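A minimal sketch of the rounding fix, using a made-up 2 x 2 matrix (the gene and sample names are assumptions). It also shows why the as.numeric() attempt produced the row-name error: as.numeric() flattens a matrix into a plain vector and drops its dimensions and row names, whereas round() keeps them. Setting the storage mode to integer is an extra precaution and may not be required by the package.

```r
# Hypothetical count matrix containing fractional values
counts <- matrix(
  c(10.0, 3.7, 0.2, 25.5),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("s1", "s2"))
)

# round() keeps the matrix shape and the dimnames
counts_int <- round(counts)
storage.mode(counts_int) <- "integer"  # optional: store as true integers

# as.numeric() returns a bare vector: no dim, no row names
flattened <- as.numeric(counts)
```

Whether rounding is statistically appropriate depends on where the fractional counts came from, which the post does not say.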

Converting Data Type from data.table package in R

This might be a dumb/obvious question, but unfortunately I haven't had much luck finding information about it online, so I thought I'd ask here. I'm working with the data.table package in R and have imported a data set in which one particular column can hold numeric values, character values, and even blank/empty values. I want to be able to take a value from that column and use it in calculations.
The thing about the data.table package, though, is that when you import a file using the fread() function, it automatically sets all values in that file to a character data type, which means that all numbers are character types as well. I have worked around this slightly by using the as.numeric() function, so that if a value obtained from that column is a number, it can be converted to numeric type and used in calculations. However, since the column also contains other characters (specifically, it can also have \N or N as values) and can also contain blank/empty values, the as.numeric() function produces a warning for those cells. For example, I initially wrote an if statement to detect whether a column cell had a character value or a numeric value, as follows:
if( as.numeric(..{Reference to column cell from file here}...) == NA ) {
x <- 0
}
(where x is just some variable), but it did not work and instead gave the output:
Error in if ((as.numeric(.... :
missing value where TRUE/FALSE needed
In addition: Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion
(I should note that is.numeric() also did not work, since all values in a data.table data set are automatically character values, so this function always gives FALSE regardless of the value's actual data type.)
So clearly I need a better function or method to work around this. Is there a function capable of reading a 'character' value from a column and being able to detect whether that value is truly a numeric type or character type (or even neither, in the case of an empty cell)? Thanks in advance
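A minimal sketch of one way to do this, in base R with a made-up character vector standing in for the column. The if statement above fails because any comparison with NA, such as x == NA, evaluates to NA rather than TRUE or FALSE; is.na() is the test to use instead. Converting the whole column once and checking which entries became NA then separates real numbers from everything else.

```r
# Hypothetical column as read from the file: numbers, "\N", "N", blank, NA
raw <- c("12.5", "\\N", "N", "", "7", NA)

# Convert once; non-numeric entries become NA (warning suppressed on purpose)
num <- suppressWarnings(as.numeric(raw))

# TRUE where the cell really held a number
is_number <- !is.na(num)

# Use the number where there is one, and 0 otherwise (as in the question)
x <- ifelse(is_number, num, 0)
```

As a side note, fread() normally guesses a type for each column, and a column that mixes numbers with text ends up as character. Its na.strings argument can declare markers such as "\\N" as missing at import time, which may let the column be read as numeric directly.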

How to use as.numeric(levels(f)[f])

I was attempting an assignment and hit a problem with the dataset. As per the questions, we have to take the Duration, Amount, and Installment columns for analysis. I tried to normalize the data in these columns using the scale() command, after copying them into a separate data frame. But I get an error saying:
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
I explored further and found that the dataset may not be purely numeric, although at first sight all three columns seem to be numeric. I used the is.numeric() command and got the result:
is.numeric(new_dataset)
[1] FALSE
Having gone this far, I am now stuck on how to convert the non-numeric data into numeric type without having to replace all the values manually. I found some material on "as.numeric(levels(f)[f])", but wasn't able to understand how to apply it. I am getting this error:
new_dataset_num<-as.numeric(levels(new_dataset[,1:3]))[new_dataset[,1:3]]
Error in as.numeric(levels(new_dataset[, 1:3]))[new_dataset[, 1:3]] :
invalid subscript type 'list'
Can you please help out with this?
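A minimal sketch of how the idiom could be applied here, with made-up values for the three columns. Two points are worth noting: is.numeric() on a whole data frame is always FALSE, so the check has to be done per column, and the idiom is as.numeric(levels(f))[f] for a single factor f, so it has to be applied one column at a time, for example with lapply(). The helper to_numeric() is a hypothetical name introduced for this example.

```r
# Hypothetical data: two factor columns and one character column
new_dataset <- data.frame(
  Duration    = factor(c("6", "48", "12")),
  Amount      = factor(c("1169", "5951", "2096")),
  Installment = c("4", "2", "2"),
  stringsAsFactors = FALSE
)

# Per-column check instead of is.numeric(new_dataset)
sapply(new_dataset, is.numeric)

# Convert one column: factor labels via the levels idiom, text via as.numeric()
to_numeric <- function(col) {
  if (is.factor(col)) as.numeric(levels(col))[col] else as.numeric(col)
}

new_dataset[1:3] <- lapply(new_dataset[1:3], to_numeric)
scaled <- scale(new_dataset[1:3])
```

Calling as.numeric() directly on a factor would return the internal level codes rather than the labels, which is why the detour through levels() is needed.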

irr: Krippendorff's Alpha with non-numeric classifications (Warning Message)

I am trying to calculate Krippendorff's Alpha using the irr::kripp.alpha function.
My input data consists of non-numeric classifications (e.g., "1.a", "1.b" etc.). When using kripp.alpha() I get the following warning message:
Warning message:
In kripp.alpha(as.matrix(p8)) : NAs introduced by coercion
It seems that the function -- nevertheless -- works fine.
Anyhow, I tried to get rid of the warning message by using the following procedure:
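library(irr)
# stringsAsFactors = TRUE is needed in R >= 4.0, where strings stay character by default
input <- data.frame(coder1 = c("3.a","3.a","3.b.ii","3.b.ii","3.a","3.a","4.d","4.d"),
                    coder2 = c("3.b","3.a","3.b.i","3.b.ii","3.a","3.a","4.d","4.d"),
                    stringsAsFactors = TRUE)
# Gives the warning message
kripp.alpha(as.matrix(t(input)))
# Flatten into a single factor so that both coders share one set of levels
input <- unlist(input)
# Replace levels (strings) with unique numeric values
levels(input) <- seq_along(levels(input))
# Transform back into the coder-by-unit matrix that kripp.alpha uses;
# byrow = TRUE keeps each coder's ratings together in one row
input <- matrix(as.integer(input), nrow = 2, byrow = TRUE)
kripp.alpha(input)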
This works fine. However, it is quite cumbersome for such an easy task. Is there a simpler method? Can somebody explain why the function gives the warning message when using non-numeric classification values?
I have just been in contact with the package maintainer. The kripp.alpha() function expects numeric classifications as input. They may fix this in one of the next package updates.
So either you live with the warning message or you use the solution posted above.
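A somewhat shorter variant, as a sketch only: since the function expects numeric input, the labels can be mapped to integer codes with match() against the set of distinct labels, without going through factors. This assumes the classifications are nominal, so the particular numbers assigned carry no meaning. The names ratings, labels and codes are introduced for this example.

```r
# Ratings as a coder-by-unit character matrix (same data as in the question)
ratings <- rbind(
  coder1 = c("3.a", "3.a", "3.b.ii", "3.b.ii", "3.a", "3.a", "4.d", "4.d"),
  coder2 = c("3.b", "3.a", "3.b.i",  "3.b.ii", "3.a", "3.a", "4.d", "4.d")
)

# One integer code per distinct label, shared by both coders
labels <- sort(unique(as.vector(ratings)))
codes  <- matrix(match(ratings, labels),
                 nrow = nrow(ratings),
                 dimnames = dimnames(ratings))

# Only runs if the irr package is installed
if (requireNamespace("irr", quietly = TRUE)) {
  print(irr::kripp.alpha(codes, method = "nominal"))
}
```

Because match() and matrix() both work in column-major order, each code lands in the same cell as the label it replaces.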

Resources