Subset of a matrix as.numeric in R for log2

I have a data matrix (data) of 54675 obs. of 170 variables, and I want to perform
data.matrix.2 <- log2(data[,9:ncol(data)])
i.e. a log2 transform of the values from the 9th column onward. The 8 columns before that are characters. I get the following error:
Error in Math.data.frame(data.matrix[, 9:ncol(data)]) :
non-numeric variable in data frame:
Is there a way to treat a subset of the matrix as.numeric for the log transform?
Thanks

My first thought was that you had gotten a character matrix and needed:
as.numeric(data.matrix.2[ , -(1:8) ])
... but data.matrix() should coerce to 'numeric' mode. Oh, no, there you go: you weren't using the data.matrix function at all. It would be better not to use the name "data.matrix" for your object, since that is also the name of an R function.
You are indexing properly with "[,]", so your assumptions about your data object are probably flawed. At least one of the remaining 162 columns must have been created as a factor or character column. Run str(data.matrix) to see which one(s) it is.
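A minimal sketch of that diagnosis, assuming a hypothetical toy data frame `df` with two character ID columns followed by measurement columns, one of which was wrongly read in as character. The names `df`, `num_cols`, `is_num`, and `bad` are illustrative only; with the real data the numeric range would be `9:ncol(data)`.

```r
# Hypothetical toy data: two character ID columns, then measurement columns.
# Column s2 holds numbers stored as text, which is what makes log2() fail.
df <- data.frame(
  id1 = c("a", "b", "c"),
  id2 = c("x", "y", "z"),
  s1  = c(1, 2, 4),
  s2  = c("8", "16", "32"),
  stringsAsFactors = FALSE
)

num_cols <- 3:ncol(df)  # with the real data this would be 9:ncol(data)

# List the supposedly numeric columns that are not actually numeric
is_num <- vapply(df[num_cols], is.numeric, logical(1))
bad <- names(is_num)[!is_num]
print(bad)

# Convert via character (safe for factor columns too), then take log2 of the subset
df[num_cols] <- lapply(df[num_cols], function(x) as.numeric(as.character(x)))
logged <- log2(as.matrix(df[num_cols]))
```

`log2()` also works directly on an all-numeric data frame (returning a data frame); `as.matrix()` is used here only to get a matrix result.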

Related

How to use 'as.factor' with 'apply'?

I tried to convert the categorical features in a dataset to factors. However, using apply with as.factor did not work:
convert <- c(2:5, 7:9,11,16:17)
read_file[,convert] <- data.frame(apply(read_file[convert], 2, as.factor))
However, switching to lapply did work:
read_file[,convert] <- data.frame(lapply(read_file[convert], as.factor))
Can someone explain what the difference is and why the second version works while the first fails?
apply returns a matrix and a matrix cannot contain a factor variable. Factor variables are coerced to character variables if you create a matrix from them. The documentation in help("apply") says:
In all cases the result is coerced by as.vector to one of the basic
vector types before the dimensions are set, so that (for example)
factor results will be coerced to a character array.
lapply returns a list and a list can contain (almost) anything. In fact, a data.frame is just a list with some additional attributes. You don't even need to call data.frame there. You can just subset-assign a list into a data.frame.
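The difference can be checked on a small hypothetical data frame (the names `df`, `cols`, and `via_apply` are illustrative): `apply()` hands back a character matrix, while `lapply()` returns a list of factors that can be assigned straight into the data frame without wrapping it in `data.frame()`.

```r
df <- data.frame(
  a = c("x", "y", "x"),
  b = c("u", "u", "v"),
  n = 1:3,
  stringsAsFactors = FALSE
)
cols <- c("a", "b")

# apply() builds its result as a matrix, so the factors come back as character
via_apply <- apply(df[cols], 2, as.factor)
print(class(via_apply))
print(typeof(via_apply))

# lapply() returns a list, which can be subset-assigned into the data frame directly
df[cols] <- lapply(df[cols], as.factor)
str(df)
```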

Error: Must subset columns with a valid subscript vector. x Can't convert from <double> to <integer> due to loss of precision

I am working on a data frame with all variables of numeric type
summary.default(pfnew)
ID 6016315 -none- numeric
iterator 6016315 -none- numeric
value 6016315 -none- numeric
CV 6016315 -none- numeric
I want to create a pivot table grouped by iterator and CV that summarizes the count of ID. In essence, I want the number of rows in the data frame for each combination of iterator and CV values. The code I have used is:
Code
install.packages("tidyr")
install.packages("dplyr")
install.packages("vctrs")
library(vctrs)
library(tidyr)
library(dplyr)
allow_lossy_cast(pivot<-pfnew%>%
select(pfnew$iterator,pfnew$CV,pfnew$ID)%>%
summarise(CT=count(pfnew$ID)))
But, as discussed in other forums, I am still getting the same error message even after using allow_lossy_cast:
Error: Must subset columns with a valid subscript vector. x Can't convert from <double> to <integer> due to loss of precision.
How can we resolve this? Or can we do the same job in any other manner?
I just came across the same error with a different dplyr function and realized that I had prefixed the column names with the data frame name, even though the data frame was already being piped in. Try removing pfnew$ from select and summarise, so it becomes select(c(iterator, CV, ID)).
The select function throws this error when you use the data frame name to refer to the columns. If the error persists, try renaming the columns to more suitable names: a name containing spaces will throw an error if you refer to it without the data frame prefix (unless you wrap it in backticks), so avoid spaces in column names. Then use the column names directly, which should resolve the issue.
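A sketch of the corrected pipeline, assuming dplyr 1.0 or later and a small hypothetical stand-in for `pfnew` with the four numeric columns shown by `summary()` above. Inside the pipe, columns are referred to by bare name. The likely cause of the original message is that `select(pfnew$iterator, ...)` passes the column's values, which `select()` then tries to interpret as column positions; non-whole numbers cannot be cast to integer positions. Counting rows per group is done here with `group_by()` plus `n()`, swapped in for the original `summarise(CT = count(...))`, because `count()` expects a data frame rather than a vector.

```r
library(dplyr)

# Hypothetical stand-in for pfnew (all columns numeric, as in the question)
pfnew <- data.frame(
  ID       = c(1, 2, 3, 4, 5),
  iterator = c(1, 1, 2, 2, 2),
  value    = c(0.5, 0.7, 0.1, 0.3, 0.9),
  CV       = c(10, 10, 10, 20, 20)
)

# Refer to columns by bare name inside the pipe, not as pfnew$column
pivot <- pfnew %>%
  select(iterator, CV, ID) %>%
  group_by(iterator, CV) %>%
  summarise(CT = n(), .groups = "drop")

print(pivot)
```

`count(pfnew, iterator, CV, name = "CT")` should produce the same table in a single call.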
Example: instead of pfnew$`This is an example`,
use pfnew$This_Is_an_example
and then use this name directly in select:
select(This_Is_an_example) %>%
....
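A base-R sketch of the renaming advice, using a hypothetical one-column data frame `df`. A name containing spaces can still be reached with backticks, but replacing the spaces (here with `gsub()`, one choice among several) lets the column be used as a plain name afterwards, including inside `select()`.

```r
# Hypothetical data frame whose column name contains spaces;
# check.names = FALSE stops data.frame() from rewriting the name
df <- data.frame(`This is an example` = c(1, 2, 3), check.names = FALSE)

# While the name has spaces, it must be wrapped in backticks
before <- df$`This is an example`

# Replace spaces with underscores so the column can be used as a bare name
names(df) <- gsub(" ", "_", names(df))
after <- df$This_is_an_example
print(names(df))
```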

Selecting unique values from single column of a data frame

I have a data frame consisting of five character variables, which represent specific bacteria. I have thousands of observations of each variable, all beginning with the letter K, e.g.:
x <- c("K0001", "K0001", "K0003", "K0006")
y <- c("K0001", "K0001", "K0002", "K0003")
z <- c("K0001", "K0002", "K0007", "K0008")
r <- c("K0001", "K0001", "K0001", "K0001")
o <- c("K0003", "K0009", "K0009", "K0009")
I need to identify unique observations in the first column that don't appear in any of the remaining four columns. I have tried the approach suggested here, which I think would work if I could create individual vectors using select...
How to tell what is in one vector and not another?
but when I try to create a vector for analysis using the code ...
x <- select(data$x)
I get the error
Error in UseMethod("select_") :
no applicable method for 'select_' applied to an object of class "character"
I have tried to mutate the vectors using as.factor and as.numeric, but neither approach works: the first gives an error equivalent to the one above, and as.numeric returns NAs.
Thanks in advance
The reference that you cited recommended using setdiff. The only thing you need to do to apply that solution is to combine the four columns into one, so that they can be treated as a single set. You can do that with unlist:
setdiff(data$x, unlist(data[,2:5]))
"K0006"
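A self-contained version of that answer, using the example columns from the question (with the values quoted, since they are strings). A single column such as `data$x` is already a character vector, so no `select()` call is needed; dplyr's `select()` expects a data frame as its first argument, which is why `select(data$x)` fails. The names `first`, `others`, and `only_in_x` are illustrative.

```r
data <- data.frame(
  x = c("K0001", "K0001", "K0003", "K0006"),
  y = c("K0001", "K0001", "K0002", "K0003"),
  z = c("K0001", "K0002", "K0007", "K0008"),
  r = c("K0001", "K0001", "K0001", "K0001"),
  o = c("K0003", "K0009", "K0009", "K0009"),
  stringsAsFactors = FALSE
)

first  <- data$x                                  # a single column is already a vector
others <- unlist(data[, 2:5], use.names = FALSE)  # columns 2-5 pooled into one vector

# Values in the first column that appear in none of the other four columns
only_in_x <- setdiff(first, others)
print(only_in_x)
```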

Change part of data frame to character, then to numeric in R

I have a simple problem. I have a data frame with 121 columns. Columns 9:121 need to be numeric, but when imported into R they are a mixture of numerics, integers, and factors. Columns 1:8 need to remain characters.
I’ve seen some people use loops, and others use apply(). What do you think is the most elegant way of doing this?
Thanks very much,
Paul M
Try the following. The apply function lets you loop over the rows, the columns, or both, of a data frame and apply any function, so to make sure all your columns from 9:121 are numeric, you can do the following:
table[,9:121] <- apply(table[,9:121],2, function(x) as.numeric(as.character(x)))
table[,1:8] <- apply(table[,1:8], 2, as.character)
Where table is the dataframe you read into R.
Briefly: in the apply function I first specify the table I want to loop over (in this case the subset of your table we want to change), then the number 2 to indicate columns, and finally the function to apply: as.character, or an anonymous function wrapping as.numeric(as.character(x)). The assignment operator then replaces the old values in your table with the new ones in the correct format.
EDIT: I changed the first line because I recalled that if you convert a factor directly to a number, you get the integer code of the factor level, not the number you think you are getting. Factors first need to be converted to characters and then to numbers, which you can do by wrapping as.character inside as.numeric.
When you read in the table, use stringsAsFactors = FALSE; then there will not be any factors.
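The factor pitfall described in the edit can be seen on a small hypothetical factor `f`: `as.numeric()` alone returns the internal level codes, while going through `as.character()` first recovers the values. The second part is a column-wise variant using `lapply()` on a hypothetical data frame `df` (columns 2:3 standing in for 9:121); it is an alternative to the `apply()` call above, not the answer's own code.

```r
# Hypothetical factor whose labels are numbers; levels sort as text: "10", "20", "5"
f <- factor(c("10", "5", "20"))

codes  <- as.numeric(f)                 # level codes: 1 3 2
values <- as.numeric(as.character(f))   # actual numbers: 10 5 20
print(codes)
print(values)

# Column-wise conversion that leaves the data frame structure intact
df <- data.frame(
  id = c("a", "b", "c"),
  v1 = factor(c("1.5", "2.5", "10")),
  v2 = 1:3,
  stringsAsFactors = FALSE
)
df[2:3] <- lapply(df[2:3], function(x) as.numeric(as.character(x)))
str(df)
```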
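A small check of the `stringsAsFactors = FALSE` advice, reading a hypothetical inline CSV through the `text` argument of `read.csv()`. Since R 4.0.0 this is already the default, so the argument mainly matters on older versions.

```r
# Hypothetical CSV content with one text column
csv <- "id,grp,val\n1,a,2.5\n2,b,3.5\n"

# Text columns stay character instead of becoming factors
tab <- read.csv(text = csv, stringsAsFactors = FALSE)
str(tab)
```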

How to convert dataframe of mostly factors into numeric matrix; unlist is not working

I am having trouble turning my data.frame into a matrix format. Because I wanted to change my data.frame with mostly factor variables into a numeric matrix, I used the following code
UN2010frame <- data.matrix(lapply(UN2010, as.numeric))
However, when I checked the mode of UN2010frame, it still showed up as a list. Because the code I want to run (Ordrating) does not accept data in a list format, I used UN2010matrix <- unlist(UN2010frame) to unlist my matrix. When I did this, my first row (which was formerly a row with column names) turned into NAs. This was a problem for me because when I tried to run an ordinal IRT model using this data set, I got the following error message:
Error in 1:nrow(Y) : argument of length 0
I think it is because all the values in my first row are now gone.
If you could help me on any front, it would be deeply appreciated.
Thank you very much!
Haillie
First, the correct use of data.matrix is:
data.matrix(UN2010)
as it converts to numeric automatically. The lapply in your code is the first source of the error you get: you passed a list to data.matrix, not a data frame. A non-data-frame argument is simply handed on to as.matrix, so you get back a one-column matrix whose elements are the list's vectors (a matrix of mode "list"), not a numeric matrix.
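What `data.matrix()` does with a list can be checked on a hypothetical two-column data frame `df`: the list comes back as a one-column matrix of mode `"list"`, and `unlist()` then flattens it to a plain vector with no rows or columns. Passing the data frame itself gives a numeric matrix.

```r
df <- data.frame(a = factor(c("lo", "hi", "lo")), b = c(1.5, 2, 3))

wrong <- data.matrix(lapply(df, as.numeric))  # a list goes in, not a data frame
print(mode(wrong))   # "list"
print(dim(wrong))    # 2 1

flat <- unlist(wrong)  # a plain vector of length 6
print(dim(flat))       # NULL

right <- data.matrix(df)  # factor column a becomes its integer codes
print(mode(right))   # "numeric"
print(dim(right))    # 3 2
```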
Second, unlist returns a vector, not a matrix, so you won't really find a "first row with NA": a vector has no rows. That might explain part of your confusion.
You probably have a character column somewhere; converting it to numeric gives NA. If you don't want this, exclude those columns from further analysis. One possibility is to use colwise() from the plyr package to convert only the factors:
library(plyr)
colwise(as.numeric, is.factor)(UN2010)
This returns a data frame with only the factors, which can easily be converted by data.matrix() or as.matrix(). Alternatively, you can use the base solution:
id <- sapply(UN2010,is.character)
sapply(UN2010[!id],as.numeric)
which will return a matrix with all non-character columns converted to numeric. If you really want to keep the data frame with all original columns, you can do:
UN2010frame <- UN2010
UN2010frame[!id] <- lapply(UN2010[!id],as.numeric)
Toy example code:
UN2010 <- data.frame(
F1 = factor(rep(letters[1:3],10)),
F2 = factor(rep(letters[5:10],5)),
Char = rep(letters[11:16],each=5),
Num = 1:30,
stringsAsFactors=FALSE
)
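Running the base-R solution on that toy data (a sketch; `num_mat` is an illustrative name) shows both outcomes: `sapply()` gives a numeric matrix of the three non-character columns, while the `lapply()` assignment keeps all four columns and converts only the non-character ones.

```r
UN2010 <- data.frame(
  F1 = factor(rep(letters[1:3], 10)),
  F2 = factor(rep(letters[5:10], 5)),
  Char = rep(letters[11:16], each = 5),
  Num = 1:30,
  stringsAsFactors = FALSE
)

id <- sapply(UN2010, is.character)   # TRUE only for Char

# Numeric matrix of the non-character columns (F1, F2, Num)
num_mat <- sapply(UN2010[!id], as.numeric)
print(dim(num_mat))

# Keep every column, converting only the non-character ones
UN2010frame <- UN2010
UN2010frame[!id] <- lapply(UN2010[!id], as.numeric)
str(UN2010frame)
```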
Try as.data.frame instead of data.matrix.
