Matrix turned to class(character) when removing NA values - r

Reproducible example below. I have a simulation loop, within which I occasionally have rows I need to remove from a matrix. I have done this by entering an 'NA' value into the row I need to remove in a specific position, and then I have a line of code to remove any row with an NA. This has worked great so far. My issue is, I am now running simulations in a certain way that occasionally whittles my matrix down to a single row. When this occurs, the matrix gets transformed into a 'character', and crashes the simulation.
Example:
mat<-matrix(1:10,5,2) #setting up a simplified example matrix
mat[3:5,1]<-NA #Giving 3 rows 'NA' values, for removal of these rows
mat<-mat[!is.na(mat[,1]),] #An example where my procedure works just fine
class(mat)
mat[2,1]<-NA #Setting 1 of the remaining 2 rows as NA
mat<-mat[!is.na(mat[,1]),] #Removing one of final two rows
class(mat) #No longer a matrix
Is there some way I can do this, where I don't lose my formatting as a matrix at the end? I am assuming this issue is coming from my use of the "is.na" command, but I haven't found a good way around using this.
To give a bit more insight into the issue, in case there is a MUCH better way to do this I am too naive to have found yet... In my real-life simulation, I have a column in the matrix that holds a '1' when the individual in the given row is alive, and a '0' when dead. When an individual (a single row) dies, (and the value goes from a '1' to a '0'), I need to remove the row. The only way I knew how to do this was to change the '0' to an 'NA' and then remove all rows with an NA. If there is a way to just remove the rows with a '0' in a specific column that avoids this issue, that would be great!

By default, the [ function drops the result to the lowest possible dimension. In your example, you have a two-dimensional array (a matrix): when only a single row is left, the result is dropped to a plain vector and is no longer a matrix (class(mat) then reports the vector's type, "integer" for the example above).
To avoid that, have a look at the drop argument of the [ function. You should be doing:
mat <- mat[!is.na(mat[,1]),, drop = FALSE]
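The same argument also covers the second part of the question, removing rows with a '0' in a given column without going through NA first. A minimal sketch, assuming a hypothetical matrix pop whose alive column holds the 1/0 flag (the names pop, id and alive are made up for illustration):

```r
# Hypothetical population matrix: one row per individual,
# "alive" is 1 while the individual lives and 0 once it is dead
pop <- cbind(id = 1:4, alive = c(1, 0, 0, 0))

# Keep only the living rows; drop = FALSE keeps the result a matrix
# even when a single row (or no row at all) survives
pop <- pop[pop[, "alive"] != 0, , drop = FALSE]

is.matrix(pop)  # still a matrix
dim(pop)        # one row, two columns
```

Because the comparison is done directly on the flag column, there is no need to overwrite the 0 with NA before filtering.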

Related

R- Looping through sets of data

I want to fill up certain values in a matrix by indexing a vector. It should be a simple loop
mat1[ i, as.numeric(index_vec[i]) ] = data[i,"price"]
I believe that is the only command I need for the loop because it fills the first row of the matrix properly if I put 1's where all of the i's are. Does anyone know very basic loops in R? I could be wrong, but I think it's just a matter of syntax.
It's not an RStudio question but rather an R question. We would need to know the dimensions of mat1 and data and the length of index_vec to know if this makes any sense. It appears you may be coming from another language where everything is done with for-loops using indices. That's not always the best way to work with R. If the length of index_vec is the same as the number of rows of data, and the values of as.numeric(index_vec) are at or above 1 and at or below the number of columns of mat1, then a modified version of the suggestion above, namely:
mat1[ 1 , as.numeric(index_vec) ] <- data[ ,"price"]
... should succeed as a column to row assignment. The lengths on the RHS need to equal the number of values assigned on the LHS. If my guess was wrong about the nature of index_vec and it's only a single number, then perhaps a column to column assignment:
mat1[ , as.numeric(index_vec) ] <- data[ ,"price"]
Then there is a third possibility as well. If your index_vec is a set of column locations and the implicit row locations go from 1 to length(index_vec), then you could do this:
mat1[ cbind( seq_along(index_vec) , as.numeric(index_vec) ) ] <- data[ ,"price"]
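To make the third possibility concrete, here is a small sketch of indexing with a two-column matrix. The sizes and values of mat1, index_vec and data are invented for illustration, and index_vec is assumed to hold column positions stored as strings:

```r
# Made-up inputs: a 3 x 4 matrix of zeros, one target column per row,
# and one price per row
mat1 <- matrix(0, nrow = 3, ncol = 4)
index_vec <- c("2", "4", "1")
data <- data.frame(price = c(10, 20, 30))

# Each row of idx is one (row, column) pair: (1,2), (2,4), (3,1)
idx <- cbind(seq_along(index_vec), as.numeric(index_vec))

# A two-column index matrix assigns one value per pair, with no loop
mat1[idx] <- data[, "price"]
mat1
```

Row i of mat1 receives price i in the column named by index_vec[i], which is what the original loop over i was meant to do.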

R matrix and data.frame mix-up

I have a data.frame in R with columns that also have column names.
I have another data.frame with 0s and -1s that controls which columns to use from the first data.frame in a subsequent analysis.
I now ran into an issue that I cannot wrap my head around.
First of all, the "offending" line of code is:
covar.data<-covar.data[,!onoff]
FYI I have confirmed both covar.data and onoff are data.frames.
When I run this with onoff selecting 2 or more columns, everything is fine, and the resulting covar.data is still a data.frame - and this is important, because I need to use the column names in the rest of my analysis.
However, if I have onoff selecting only 1 column, covar.data turns into a matrix!! This is a problem, because the column name also disappears!
I tried
covar.data<-as.data.frame(covar.data[,!onoff])
and
covar.data<-as.data.frame(covar.data[,!onoff], col.names=TRUE)
but that didn't make a difference in the disappearance of the column name.
I don't understand why R decides to turn the data.frame into a matrix (only for the times I am left with one column), and I cannot figure out how to preserve the data.frame PLUS the column names.
If you select a single column of a data.frame, R assumes you want to extract that data as a vector rather than returning another data.frame (and in most cases this is exactly the behavior you want). But if you do want to keep that single column as a data.frame, then you should do
covar.data <- covar.data[, !onoff, drop = FALSE]
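A short sketch of the difference, using an invented three-column data frame. Here onoff is assumed to be a plain numeric vector of 0s and -1s (the question stores it in a data.frame), so that !onoff is TRUE exactly where the value is 0:

```r
# Invented covariates; the column names are hypothetical
covar.data <- data.frame(age = c(30, 41), dose = c(1.5, 2.0), site = c(1, 2))
onoff <- c(-1, 0, -1)  # !onoff is FALSE TRUE FALSE, so only "dose" is kept

# Default behaviour: a single selected column collapses to a bare vector
as.vec <- covar.data[, !onoff]

# With drop = FALSE the result stays a data.frame and keeps its column name
as.df <- covar.data[, !onoff, drop = FALSE]

is.data.frame(as.vec)  # FALSE
is.data.frame(as.df)   # TRUE
names(as.df)           # "dose"
```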

skip NA's when computing dot product

I am adjusting the measurements in a data matrix by subtracting their projections onto the first 1-2 principal components. The problem is, if there is even a single NA in the data matrix (almost inevitable for thousands of measurements), the inner product operation x%*%y (I also tried sum(x*y), for vectors x,y) returns NA. Is there a simple way (i.e. avoiding conditional statements and loops) of computing the inner product on the non-NA values, so that the operation actually returns something?
Incidentally, I would like to avoid just replacing NA's with 0's, since then I would have to renormalize the vectors at each stage.
You can try this command:
sum(x*y, na.rm = TRUE)
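A tiny worked example with made-up vectors shows what this computes. Any position where either x or y is NA produces an NA product, and na.rm = TRUE leaves that term out of the sum:

```r
# Made-up vectors with missing values in different positions
x <- c(1, 2, NA, 4)
y <- c(2, NA, 3, 1)

# Only positions 1 and 4 are complete: 1*2 + 4*1
dot <- sum(x * y, na.rm = TRUE)

# Number of complete pairs that actually contributed to the sum
n_used <- sum(!is.na(x * y))
```

Note that the result is numerically the same as treating the missing products as 0, so whether the vectors need renormalizing over the complete pairs is still a separate decision.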

R: assigning value from one column to another

This probably has a very simple answer, but I'm having trouble figuring it out...
What is a vector-based way to take one value in the cell of one column in a dataframe, conditional on some criterion in a given row being satisfied, and assign it to a cell along the same row but in a different column? I've done it with loops over if-else statements, but I'm working with pretty big data sets, and my little laptop freezes for many minutes going through the looping conditionals.
E.g. if I have something like this:
Results$TResponseCorrect[Results$rownum %in% CorrectTs$rownum] <- 1
that works fine. But what doesn't work is something like
Results$TResponseCorrect[Results$rownum %in% CorrectTs$rownum] <- Results$TCorrect
In that case I get a warning saying, "number of items to replace is not a multiple of replacement length", which I basically take to mean that it can't figure out which cell of the Results$Subject column to take.
Since your problem statement implies that all these are in the same data frame you may want:
Results$TResponseCorrect[Results$rownum %in% CorrectTs$rownum] <-
Results$TCorrect[Results$rownum %in% CorrectTs$rownum]
It will then have the same number of items on the LHS and the RHS of the assignment.
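As a sketch with invented data (the values below are made up; only the column names come from the question), storing the logical index once makes it obvious that both sides select the same rows:

```r
# Invented data: five trials, two of which appear in CorrectTs
Results <- data.frame(rownum = 1:5,
                      TCorrect = c(1, 0, 1, 1, 0),
                      TResponseCorrect = NA_real_)
CorrectTs <- data.frame(rownum = c(2, 4))

# One logical index, reused on both sides of the assignment
hit <- Results$rownum %in% CorrectTs$rownum
Results$TResponseCorrect[hit] <- Results$TCorrect[hit]

Results
```

Rows 2 and 4 now carry their own TCorrect value and every other row is left as NA.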

Two data formatting questions for R

I have two questions about R; I believe both are pretty simple.
I would like to create an IF statement that will assign an NA value to certain rows in a column. I have tried the following command:
a[a[,21]==0,5:10] <-NA
the error says:
Error in [<-.data.frame(tmp, a[, 21] == 0, 5:20, value = NA) : missing values are not allowed in subscripted assignments of data frames
Essentially that code is supposed to take any 0 value in column 21, and replace the values for that row from columns 5 to 10 to NA. There are NA's in column 21 already, but I am not sure whether that does anything?
I am not sure how to craft this next function at all. I need to manipulate data that contains positive and negative controls. However, when I manipulate the data, I don't want the positive and negative control values to be a part of the manipulation, but I want the positive and negative controls to remain in the columns because I have to use them later. Is there any way to temporarily ignore these values so they aren't included in the manipulation?
Here is sample data:
L = c(2,1,4,3,1,4,2,4,5,1)
R = c(2,4,5,1,"Neg",2,"",1,2,1)
T = c(2,1,4,2,"CTRL",2,"PCTRL",2,1,4)
test <- data.frame(L=L,R=R,T=T)
I would like to be able to temporarily ignore these rows based on the characters "Neg" "CTRL"/"" "PCTRL" rather than the position of them in the data frame if possible. Notice how for negative control, Neg and CTRL are in separate columns, same row, just like positive control where there is a blank and PCTRL in separate columns yet same rows. Any way to do this given these odd conditions?
Hope this was written clearly enough, and I thank anyone in advance for taking the time to help me!
Try this for subsetting your dataframe to those rows where R is not "Neg":
subset(test, R!="Neg")
For the NA problem, you probably already have NAs in your data frame, right? See if this works:
a[a[,21] %in% 0, 5:10] <- NA
Try instead:
a[ which(a[,21]==0), 5:10] <-NA
Explanation: the == comparison returns NA wherever a[,21] is NA, and the [<- function for data frames doesn't accept NA subscripts. The which function returns a numeric vector of the TRUE positions and "throws away the NA's". As an aside, the [ function (without the '<-') returns an all-NA row for every NA in a logical index. This is considered a 'feature', but I find it to be an 'annoyance', so I will typically use which for selection as well as for selective assignment.
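A reduced sketch of the same idea on a two-column data frame (the names v1 and v2 are hypothetical stand-ins for columns 5:10 and 21 of a):

```r
# Small stand-in for the real data: v2 plays the role of column 21
a <- data.frame(v1 = 1:4, v2 = c(0, NA, 5, 0))

cmp <- a$v2 == 0       # TRUE NA FALSE TRUE: the NA is what [<- rejects
rows <- which(cmp)     # integer positions of the TRUEs only; the NA is gone

# Assign NA only to the rows where v2 is known to be 0
a[rows, "v1"] <- NA
a
```

The row whose v2 is NA is left untouched, which matches the which() and %in% suggestions above.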
For the first problem: if a[,21] is NA, do you want to assign NA? In this case,
a[replace(a[,21],is.na(a[,21]),0)==0,5:10] <- NA
Otherwise (note that I changed the replacement value from "0" to something nonzero; "1" is used here, but any nonzero value works),
a[replace(a[,21],is.na(a[,21]),1)==0,5:10] <- NA
As for the second problem,
subset(test, !(R %in% c("Neg","") | T %in% c("CTRL","PCTRL")))
This covers the case where the filtering conditions in R and T do not always coincide. If they always coincide, then you can apply the test to just one of R or T. Also, you may want to keep in mind that T stands for TRUE in S, S-PLUS, and R; you can assign another value to T and things will be okay, but I believe it's generally discouraged (same for c, which people also like to assign to).
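Run against the sample data from the question, where the control markers sit in columns R and T, the filter can be sketched as follows. The result is stored in a new object (kept is a name chosen here), so the control rows remain available in test for later use:

```r
# Sample data from the question, built without creating a global T
test <- data.frame(L = c(2, 1, 4, 3, 1, 4, 2, 4, 5, 1),
                   R = c(2, 4, 5, 1, "Neg", 2, "", 1, 2, 1),
                   T = c(2, 1, 4, 2, "CTRL", 2, "PCTRL", 2, 1, 4))

# Drop a row if either column carries a control marker;
# inside subset(), R and T refer to the data frame columns
kept <- subset(test, !(R %in% c("Neg", "") | T %in% c("CTRL", "PCTRL")))

nrow(test)  # all rows, controls included
nrow(kept)  # rows 5 and 7 are excluded
```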
