R - Looping through sets of data

I want to fill up certain values in a matrix by indexing a vector. It should be a simple loop
mat1[ i, as.numeric(index_vec[i]) ] = data[i,"price"]
I believe that is the only command I need for the loop, because it fills the first row of the matrix properly if I put 1's where all of the i's are. Does anyone know very basic loops in R? I could be wrong, but I think it's just a matter of syntax.

It's not an RStudio question but rather an R question. We would need to know the dimensions of mat1 and data and the length of index_vec to know whether this makes any sense. It appears you may be coming from another language where everything is done with for-loops over indices. That's not always the best way to work in R. If the length of index_vec is the same as the number of rows of data and the values of as.numeric(index_vec) are at least 1 and at most the number of columns of mat1, then a modified version of the suggestion above:
mat1[ 1 , as.numeric(index_vec) ] <- data[ ,"price"]
... should succeed as a column-to-row assignment. The length of the RHS needs to equal the number of values assigned on the LHS. If my guess about the nature of index_vec was wrong and it is only a single number, then perhaps a column-to-column assignment:
mat1[ , as.numeric(index_vec) ] <- data[ ,"price"]
Then there is a third possibility as well. If your index_vec is a set of column locations and the implicit row locations run from 1 to length(index_vec), then you could do this:
mat1[ cbind( seq_along(index_vec) , as.numeric(index_vec) ) ] <- data[ ,"price"]
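For concreteness, here is a small self-contained sketch of that third form; the dimensions, prices, and index values below are made up purely for illustration and would come from your own data and index_vec in practice:
data <- data.frame(price = c(10, 20, 30, 40))      # made-up prices
index_vec <- c("2", "1", "3", "2")                  # target column for each row
mat1 <- matrix(0, nrow = nrow(data), ncol = 3)      # destination matrix
# Two-column index matrix: row i receives data$price[i] in column index_vec[i]
mat1[ cbind( seq_along(index_vec), as.numeric(index_vec) ) ] <- data[ , "price"]
mat1
The explicit loop the question describes, for (i in seq_len(nrow(data))) mat1[i, as.numeric(index_vec[i])] <- data[i, "price"], produces the same result here, but the single indexed assignment avoids the loop entirely.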

Related

R: Check for finite values in DataFrame

I need to check whether a data frame is "empty" or not ("empty" in the sense that the data frame contains zero finite values; if there is a mix of finite and non-finite values, it should NOT be considered "empty").
Referring to How to check a data.frame for any non-finite, I came up with a one-line piece of code that almost achieves this objective:
nrow(tmp[rowSums(sapply(tmp, function(x) is.finite(x))) > 0,]) == 0
where tmp is some data frame.
This code works fine for most cases, but it fails if the data frame contains a single row.
For example, the above code would work fine for,
tmp <- data.frame(a=c(NA,NA), b=c(NA,NA)) OR tmp <- data.frame(a=c(3,NA), b=c(4,NA))
But not for,
tmp <- data.frame(a=NA, b=NA)
because I think rowSums expects at least two rows
I looked at some other posts such as https://stats.stackexchange.com/questions/6142/how-to-calculate-the-rowmeans-with-some-single-rows-in-data, but I still couldn't come up a solution for my problem.
My question is: are there any clean ways (i.e. avoiding loops, ideally a one-liner) to check for "emptiness" for any data frame?
Thanks
If you are checking all columns, then you can just do
all(sapply(tmp, is.finite))
Here we are using all rather than the rowSums trick so we don't have to worry about preserving matrices.
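If "empty" is meant exactly as defined in the question (no finite value anywhere), a one-liner along these lines should also cover the single-row case, since sapply returns a plain vector there and any works on vectors and matrices alike; is_empty is just an illustrative helper name:
tmp1 <- data.frame(a = NA, b = NA)               # single row, no finite values
tmp2 <- data.frame(a = c(3, NA), b = c(4, NA))   # contains finite values
is_empty <- function(df) !any(sapply(df, is.finite))
is_empty(tmp1)   # TRUE  -> "empty"
is_empty(tmp2)   # FALSE -> not "empty"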

Select columns with specific value in R

I have the following problem within R:
I'm working with a huge matrix. Some of the columns contain the value 'zero', which leads to problems during my further work.
Hence, I want to identify the columns which contain at least one value of 'zero'.
Any ideas how to do it?
If you have a big matrix, then this would probably be faster than an apply solution:
mat[,colSums(mat==0)<0.5]
Let's say your matrix is called x:
x = matrix(runif(300), nrow=10)
To get a logical index of the columns that have at least one zero:
ix = apply(x, MARGIN=2, function(col){any(col==0)})
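As a rough illustration of both answers, here is a small stand-in matrix with a couple of zeros planted by hand (the sizes and positions are made up):
set.seed(1)
x <- matrix(runif(30), nrow = 10)   # small stand-in for the huge matrix
x[c(2, 15)] <- 0                    # plant two zeros for illustration
has_zero <- colSums(x == 0) > 0     # one logical flag per column
which(has_zero)                     # columns containing at least one zero
x[ , !has_zero, drop = FALSE]       # the matrix without those columns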

R: assigning value from one column to another

This probably has a very simple answer, but I'm having trouble figuring it out...
What is a vector-based way to take one value in the cell of one column in a dataframe, conditional on some criterion in a given row being satisfied, and assign it to a cell along the same row but in a different column? I've done it with loops over if-else statements, but I'm working with pretty big data sets, and my little laptop freezes for many minutes going through the looping conditionals.
E.g. if I have something like this:
Results$TResponseCorrect[Results$rownum %in% CorrectTs$rownum] <- 1
that works fine. But what doesn't work is something like
Results$TResponseCorrect[Results$rownum %in% CorrectTs$rownum] <- Results$TCorrect
In that case I get a warning saying, "number of items to replace is not a multiple of replacement length", which I basically take to mean that it can't figure out which cell of the Results$Subject column to take.
Since your problem statement implies that all these are in the same data frame you may want:
Results$TResponseCorrect[Results$rownum %in% CorrectTs$rownum] <-
Results$TCorrect[Results$rownum %in% CorrectTs$rownum]
It will then have the same number of items on the LHS and the RHS of the assignment.
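A small sketch with made-up stand-ins for Results and CorrectTs shows the point: the same logical index appears on both sides of the assignment, so the lengths match.
Results   <- data.frame(rownum = 1:5,
                        TCorrect = c(10, 20, 30, 40, 50),
                        TResponseCorrect = NA)
CorrectTs <- data.frame(rownum = c(2, 4))
idx <- Results$rownum %in% CorrectTs$rownum       # same index on both sides
Results$TResponseCorrect[idx] <- Results$TCorrect[idx]
Results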

Matrix turned to class(character) when removing NA values

Reproducible example below. I have a simulation loop, within which I occasionally have rows I need to remove from a matrix. I have done this by entering an 'NA' value at a specific position in the row I need to remove, and then I have a line of code to remove any row with an NA. This has worked great so far. My issue is, I am now running simulations in a way that occasionally whittles my matrix down to a single row. When this occurs, the matrix gets transformed into a 'character', which crashes the simulation.
Example:
mat<-matrix(1:10,5,2) #setting up a simplified example matrix
mat[3:5,1]<-NA #Giving 3 rows 'NA' values, for removal of these rows
mat<-mat[!is.na(mat[,1]),] #An example where my procedure works just fine
class(mat)
mat[2,1]<-NA #Setting 1 of the remaining 2 rows as NA
mat<-mat[!is.na(mat[,1]),] #Removing one of final two rows
class(mat) #No longer a matrix
Is there some way I can do this, where I don't lose my formatting as a matrix at the end? I am assuming this issue is coming from my use of the "is.na" command, but I haven't found a good way around using this.
To give a bit more insight into the issue, in case there is a MUCH better way to do this I am too naive to have found yet... In my real-life simulation, I have a column in the matrix that holds a '1' when the individual in the given row is alive, and a '0' when dead. When an individual (a single row) dies, (and the value goes from a '1' to a '0'), I need to remove the row. The only way I knew how to do this was to change the '0' to an 'NA' and then remove all rows with an NA. If there is a way to just remove the rows with a '0' in a specific column that avoids this issue, that would be great!
By default, the [ function coerces the output into the lowest possible dimension. In your example, you have a two-dimensional array (a matrix): when a single row is extracted, the result is coerced into a plain vector and is no longer a matrix.
To avoid that, have a look at the drop option to the [ function. You should be doing:
mat <- mat[!is.na(mat[,1]),, drop = FALSE]
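On the follow-up about removing rows that hold a '0' in the alive/dead column: the same drop = FALSE idea lets you subset on that column directly, with no intermediate NA step. A hypothetical sketch (the column layout below is made up):
pop <- cbind(id = 1:5, alive = c(0, 1, 0, 0, 0))   # column "alive": 1 = alive, 0 = dead
pop <- pop[ pop[ , "alive"] == 1, , drop = FALSE]  # keep only living individuals
class(pop)                                         # still a matrix, even with a single row left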

Two data formatting questions for R

I have two questions, both are pretty simple I believe dealing with R.
I would like to create an IF statement that will assign an NA value to certain rows in a column. I have tried the following command:
a[a[,21]==0,5:10] <-NA
the error says:
Error in [<-.data.frame(tmp, a[, 21] == 0, 5:20, value = NA) : missing values are not allowed in subscripted assignments of data frames
Essentially that code is supposed to take any 0 value in column 21 and replace the values for that row in columns 5 to 10 with NA. There are NAs in column 21 already, but I am not sure whether that matters.
I am not sure how to craft this next function at all. I need to manipulate data that contains positive and negative controls. However, when I manipulate the data, I don't want the positive and negative control values to be part of the manipulation, but I want the positive and negative controls to remain in the columns because I have to use them later. Is there any way to temporarily ignore these values so they aren't included in the manipulation?
Here sample data:
L = c(2,1,4,3,1,4,2,4,5,1)
R = c(2,4,5,1,"Neg",2,"",1,2,1)
T = c(2,1,4,2,"CTRL",2,"PCTRL",2,1,4)
test <- data.frame(L=L,R=R,T=T)
I would like to be able to temporarily ignore these rows based on the characters "Neg", "CTRL"/"", and "PCTRL" rather than their position in the data frame, if possible. Notice how for the negative control, Neg and CTRL are in separate columns but the same row, just like the positive control, where a blank and PCTRL are in separate columns yet the same row. Any way to do this given these odd conditions?
Hope this was written clearly enough, and I thank anyone in advance for taking the time to help me!
Try this for subsetting your dataframe to those rows where R is not "Neg":
subset(test, R!="Neg")
For the NA problem, you probably already have NAs in your data frame, right? Try if this works:
a[a[,21] %in% 0, 5:10] <- NA
Try instead:
a[ which(a[,21]==0), 5:10] <-NA
Explanation: the == operation is returning NA values and the [<- function doesn't accept them. The which function will return a numeric vector and "throw away the NA's". As an aside, the [ function (without the '<-') will return all NA rows. This is considered a 'feature', but I find it to be an 'annoyance', so I will typically use which for selection as well as for selective-assignment.
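To make the difference concrete, here is a tiny made-up frame where a column called "flag" plays the role of column 21 and contains an NA:
a <- data.frame(x = 1:4, y = 5:8, flag = c(0, 1, NA, 0))
a$flag == 0                     # TRUE FALSE NA TRUE -> the NA breaks [<- assignment
which(a$flag == 0)              # 1 4 -> the NA is silently dropped
a$flag %in% 0                   # TRUE FALSE FALSE TRUE -> NA treated as no match
a[ which(a$flag == 0), c("x", "y")] <- NA   # assigns only where flag is exactly 0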
For the first problem: if a[,21] is NA, do you want to assign NA as well? In that case,
a[replace(a[,21],is.na(a[,21]),0)==0,5:10] <- NA
Otherwise (note that I replaced the replacement value of "0" with something nonzero; "1" is used here, but it doesn't really matter as long as it's not zero):
a[replace(a[,21],is.na(a[,21]),1)==0,5:10] <- NA
As for the second problem,
subset(test,! (R %in% c("Neg","") | T %in% c("CTRL","PCTRL")))
This covers the case where the filtering conditions in R and T do not always coincide. If they always coincide, then you can just apply the test to one of R or T. Also, you may want to keep in mind that T used to stand for TRUE in S, S-PLUS, and R (and still does); you can reassign another value to T and things will be okay, but I believe it's generally discouraged (same for c, which people also like to assign to).
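Using the sample frame from the question, one way to set the control rows aside while keeping them around for later is to store the flag and split on it (is_ctrl is just an illustrative name):
is_ctrl  <- test$R %in% c("Neg", "") | test$T %in% c("CTRL", "PCTRL")
analysis <- test[!is_ctrl, ]   # rows 5 and 7 (the controls) are excluded
controls <- test[ is_ctrl, ]   # kept untouched for later use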
