Select columns with a specific value in R

I have the following problem within R:
I'm working with a huge matrix. Some of the columns contain the value 0, which causes problems in my later work.
Hence, I want to identify the columns that contain at least one 0.
Any ideas how to do it?

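If you have a big matrix, this will probably be faster than an apply solution. Note that it keeps only the columns that contain no zeros:
mat[,colSums(mat==0)<0.5]
To identify the columns that do contain a zero, use which(colSums(mat==0)>0) instead.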

Let's say your matrix is called x:
x = matrix(runif(300), nrow=10)
To get a logical vector marking the columns that have at least one zero (wrap it in which() to get the column indices):
ix = apply(x, MARGIN=2, function(col){any(col==0)})
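Since runif() will practically never produce an exact zero, the following is a minimal sketch on a small made-up matrix, so the result is easy to check by eye. The names m, has_zero, zero_cols and m_clean are illustrative only, and the sketch assumes the matrix contains no NA values (with NAs, m == 0 yields NA and colSums() would need na.rm = TRUE).

```r
# Small illustrative matrix (made-up values); columns 2 and 4 contain a zero.
m <- matrix(c(1, 2, 3,
              4, 0, 6,
              7, 8, 9,
              0, 0, 5), nrow = 3)

# Logical vector: TRUE for each column with at least one zero.
has_zero <- colSums(m == 0) > 0

# Integer indices of the columns that contain a zero.
zero_cols <- which(has_zero)

# The matrix without those columns; drop = FALSE keeps the result a matrix
# even if only a single column remains.
m_clean <- m[, !has_zero, drop = FALSE]
```

Here zero_cols holds the column positions, while has_zero can be used directly for subsetting.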

Related

Switching the order of columns of a matrix in R

Suppose I generate the following fictional matrix
mat <-matrix(1:12,3)
Now I would like to rearrange the order of the columns from 1:4 to 4:1
Manually, I could do this with:
Z <- cbind(mat[,4],mat[,3],mat[,2],mat[,1])
Now when the matrix becomes large with for example 30 columns, doing this manually will be a tedious process.
Does anyone have a suggestion to rewrite the order of the columns with for example a loop?
We can use indexing: create a sequence (:) from the last column index, ncol(mat), down to 1 and use it as the column index
mat[, ncol(mat):1]
Or with rev
mat[, rev(seq_len(ncol(mat)))]
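As a quick check, this sketch compares the indexed version with the manual cbind() from the question. The variable names reversed and manual are illustrative; drop = FALSE is an optional addition that keeps the result a matrix even when mat has a single column.

```r
mat <- matrix(1:12, 3)

# Reverse the column order by indexing with a descending sequence.
reversed <- mat[, rev(seq_len(ncol(mat))), drop = FALSE]

# The manual construction from the question, for comparison.
manual <- cbind(mat[, 4], mat[, 3], mat[, 2], mat[, 1])

# TRUE when every element matches.
all(reversed == manual)
```

The same indexing works unchanged for 30 columns or any other width, since the sequence is derived from ncol(mat).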

R: Check for finite values in DataFrame

I need to check whether a data frame is "empty" or not ("empty" in the sense that the data frame contains no finite values; if there is a mix of finite and non-finite values, it should NOT be considered "empty").
Referring to "How to check a data.frame for any non-finite", I came up with a one-line solution that almost achieves this objective:
nrow(tmp[rowSums(sapply(tmp, function(x) is.finite(x))) > 0,]) == 0
where tmp is some data frame.
This code works fine for most cases, but it fails if the data frame contains a single row.
For example, the above code works fine for:
tmp <- data.frame(a=c(NA,NA), b=c(NA,NA)) OR tmp <- data.frame(a=c(3,NA), b=c(4,NA))
But not for,
tmp <- data.frame(a=NA, b=NA)
because, I think, rowSums expects at least two rows.
I looked at some other posts such as https://stats.stackexchange.com/questions/6142/how-to-calculate-the-rowmeans-with-some-single-rows-in-data, but I still couldn't come up with a solution for my problem.
My question is: are there any clean ways (i.e. avoiding loops, and ideally a one-liner) to check whether any data frame is "empty"?
Thanks
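If you are checking all columns, then you can just do
all(!sapply(tmp, is.finite))
which is TRUE only when no value in tmp is finite. Here we are using all rather than the rowSums trick, so we don't have to worry about sapply returning a vector instead of a matrix for a single-row data frame.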
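The single-row failure in the question arises because sapply() simplifies its result to a plain vector when every column has length one, and rowSums() then refuses anything with fewer than two dimensions. A minimal sketch that sidesteps simplification altogether is shown below; is_empty_df is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: TRUE when the data frame holds no finite value at all.
is_empty_df <- function(df) {
  # vapply() checks one column at a time and always returns a plain logical
  # vector with one entry per column, regardless of the number of rows.
  !any(vapply(df, function(col) any(is.finite(col)), logical(1)))
}

# The three cases from the question.
r1 <- is_empty_df(data.frame(a = NA, b = NA))                 # TRUE
r2 <- is_empty_df(data.frame(a = c(NA, NA), b = c(NA, NA)))   # TRUE
r3 <- is_empty_df(data.frame(a = c(3, NA), b = c(4, NA)))     # FALSE
```

A data frame with a mix of finite and non-finite values is reported as not empty, as the question requires.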

R- Looping through sets of data

I want to fill in certain values of a matrix by indexing with a vector. It should be a simple loop:
mat1[ i, as.numeric(index_vec[i]) ] = data[i,"price"]
I believe that is the only command I need inside the loop, because it fills the first row of the matrix properly if I put 1 wherever there is an i. Does anyone know how to write very basic loops in R? I could be wrong, but I think it's just a matter of syntax.
It's not an RStudio question but rather an R question. We would need to know the dimensions of mat1 and data and the length of index_vec to know whether this makes any sense. It appears you may be coming from another language where everything is done with for-loops using indices. That's not always the best way to work with R. If the length of index_vec is the same as the number of rows of data, and the values of as.numeric(index_vec) are at least 1 and at most the number of columns of mat1, then a modified version of the suggestion above:
mat1[ 1 , as.numeric(index_vec) ] <- data[ ,"price"]
... should succeed as a column to row assignment. The lengths on the RHS need to equal the number of values assigned on the LHS. If my guess was wrong about the nature of index_vec and it's only a single number, then perhaps a column to column assignment:
mat1[ , as.numeric(index_vec) ] <- data[ ,"price"]
Then there is a third possibility as well. If your index_vec is a set of column locations and the implicit row locations go from 1 to length(index_vec), then you could do this:
mat1[ cbind( seq_along(index_vec) , as.numeric(index_vec) ) ] <- data[ ,"price"]
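To illustrate the third possibility, the sketch below fills a matrix both with the loop from the question and with a two-column index matrix, then compares the results. The dimensions and the values of mat1, index_vec and data are made up for the example, since the question does not state them.

```r
# Made-up inputs: 3 rows of prices, each destined for a different column.
index_vec <- c("2", "4", "1")
data <- data.frame(price = c(10.5, 20, 7.25))

# Loop version, as the question intended.
mat1 <- matrix(NA_real_, nrow = 3, ncol = 4)
for (i in seq_len(nrow(data))) {
  mat1[i, as.numeric(index_vec[i])] <- data[i, "price"]
}

# Vectorised version: each row of the index matrix is a (row, column) pair.
mat2 <- matrix(NA_real_, nrow = 3, ncol = 4)
mat2[cbind(seq_along(index_vec), as.numeric(index_vec))] <- data[, "price"]

same <- identical(mat1, mat2)  # TRUE
```

Indexing with a two-column matrix addresses individual cells, whereas mat1[rows, cols] would address the whole block spanned by the rows and columns.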

issue summing columns

I have a very large dataset and I'm trying to get the sums of values. The variables are binary with 0s and 1s.
Somehow, when I run a for loop
for (i in 7:39){
agegroup1[53640, i]<-sum(agegroup1[, i])
}
The loop runs, but every column except the first ends up containing only NA. I inspected the values and saw 0s and 1s, and I checked the class (it returns "integer"). But when I add everything up, the sum does not come out as a number.
Any advice?
cs <- colSums(agegroup1[, 7:39])
will give you the vector of column sums without looping (at the R level).
If you have any missing values (NAs) in agegroup1[, 7:39] then you may want to add na.rm = TRUE to the colSums() call (or even your sum() call).
You don't say what agegroup1 is or how many rows it has etc, but to finalise what your loop is doing, you then need
agegroup1[53640, 7:39] <- cs
What was in agegroup1[53640, ] before you started adding the column sums? NA? If so that would explain some behaviour.
We do really need more detail though...
@Gavin Simpson provided a workable solution, but alternatively you could use apply. This function allows you to apply a function over the row or column margin.
x <- cbind(x1=1, x2=c(1:8), y=runif(8))
# If you wanted to sum the rows of columns 2 and 3
apply(x[,2:3], 1, sum, na.rm=TRUE)
# If you want to sum the columns of columns 2 and 3
apply(x[,2:3], 2, sum, na.rm=TRUE)
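A likely reason for the NA results is a missing value somewhere in the affected columns, since a single NA makes the whole sum NA. The sketch below shows the effect on a tiny made-up data frame d that stands in for a few binary columns of agegroup1.

```r
# Made-up stand-in for two binary columns, one of which has a missing value.
d <- data.frame(a = c(1L, 0L, 1L), b = c(0L, NA, 1L))

sum(d$b)                  # NA: one missing value makes the whole sum NA
sum(d$b, na.rm = TRUE)    # 1

colSums(d)                # a = 2, b = NA
cs <- colSums(d, na.rm = TRUE)  # a = 2, b = 1
```

Checking colSums(is.na(...)) on the real columns would show how many missing values each one contains.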

Row/column counter in 'apply' functions

What if one wants to apply a function, e.g. to each row of a matrix, but also wants to pass the number of that row as an argument to the function? As an example, suppose you wanted to get the n-th root of the numbers in each row of a matrix, where n is the row number. Is there another way (using apply only) than column-binding the row numbers to the initial matrix, like this?
test <- data.frame(x=c(26,21,20),y=c(34,29,28))
t(apply(cbind(as.numeric(rownames(test)),test),1,function(x) x[2:3]^(1/x[1])))
P.S. Actually if test was really a matrix : test <- matrix(c(26,21,20,34,29,28),nrow=3) , rownames(test) doesn't help :(
Thank you.
What I usually do is to run sapply on the row numbers 1:nrow(test) instead of test, and use test[i,] inside the function:
t(sapply(1:nrow(test), function(i) test[i,]^(1/i)))
I am not sure this is really efficient, though.
If you give the function a name rather than making it anonymous, you can pass arguments more easily. We can use nrow to get the number of rows and pass a vector of the row numbers in as a parameter, along with the frame to be indexed this way.
For clarity I used a different example function; this example multiplies column x by column y for a 2 column matrix:
test <- data.frame(x=c(26,21,20),y=c(34,29,28))
myfun <- function(position, df) {
print(df[position,1] * df[position,2])
}
positions <- 1:nrow(test)
lapply(positions, myfun, test)
cbind()ing the row numbers seems a pretty straightforward approach. For a matrix (or a data frame) the following should work:
apply( cbind(1:(dim(test)[1]), test), 1, function(x) plot(x[-1], main=x[1]) )
or whatever you want to plot.
Actually, in the case of a matrix, you don't even need apply. Just:
test^(1/row(test))
does what you want, I think. I think the row() function is the thing you are looking for.
I'm a little confused, so excuse me if I get this wrong, but you want to work out the n-th root of the numbers in each row of a matrix, where n is the row number. If this is the case, then it's really simple: create a new array with the same dimensions as the original, in which every entry holds its own row number:
test_row_order = array(seq_len(nrow(test)), dim = dim(test))
Then simply apply a function (the n-th root in this case):
n_root = test^(1/test_row_order)
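For the matrix case, the row() approach and the sapply() approach from the earlier answers can be compared directly. The sketch below uses the matrix from the question; roots and via_sapply are illustrative names.

```r
test <- matrix(c(26, 21, 20, 34, 29, 28), nrow = 3)

# row(test) is a matrix of the same shape whose entries are the row numbers.
roots <- test^(1 / row(test))

# The same computation, one row at a time.
via_sapply <- t(sapply(seq_len(nrow(test)), function(i) test[i, ]^(1 / i)))

agree <- isTRUE(all.equal(roots, via_sapply))  # TRUE
```

The first row is unchanged (first root), the second row holds square roots, and the third row holds cube roots.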
