I would like to know how to remove rows from a data frame that have fewer than (let's say 5) non-zero entries.
The closest I've come is:
length(which(df[1,] > 0)) >= 5
but how do I apply this to the whole data frame and drop the rows that are FALSE? Is there a function similar to Excel's COUNTIF() that I can apply here?
Thank you for your help.
You can use boolean values in rowSums and in [:
df[ rowSums(df > 0) >= 5, ]
There are 3 steps hidden in this expression:
The expression df > 0 produces a logical matrix that is TRUE wherever an element is > 0 (use df != 0 if negative values should also count as non-zero)
The function rowSums then returns the number of TRUE values in every row (when summing, it treats TRUE as 1 and FALSE as 0)
Finally, [ selects only the rows where that count is >= 5
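As a minimal sketch of those three steps, assuming a hypothetical toy data frame (df and its columns are made up here for illustration): note that df > 0 counts only positive entries, while df != 0 counts every non-zero entry, so the two can disagree when negative values are present.

```r
# Hypothetical toy data; row 3 contains a negative value on purpose
df <- data.frame(a = c(1, 0, -2), b = c(1, 0, 3), c = c(1, 1, 0),
                 d = c(1, 1, 4),  e = c(1, 1, 5), f = c(0, 1, 6))

# Steps 1 and 2: build a logical matrix, then count TRUE values per row
n_positive <- rowSums(df > 0)   # entries strictly greater than 0
n_nonzero  <- rowSums(df != 0)  # entries different from 0

# Step 3: keep rows with at least 5 qualifying entries
kept_positive <- df[n_positive >= 5, ]
kept_nonzero  <- df[n_nonzero  >= 5, ]
```

Here row 3 has five non-zero entries but only four positive ones, so it survives the df != 0 filter and is dropped by the df > 0 filter.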
You can also use a for-loop.
We first create a matrix of zeros and ones to test our code. Row 2 has to be excluded because it has fewer than 5 non-zero values.
In the loop we count the number of non-zero values per row and record TRUE if the count is less than 5 (FALSE otherwise). The vector named 'drop' holds one TRUE/FALSE flag per row. In the final step, we exclude the rows for which drop is TRUE.
mat <- matrix(c(1,1,1,1,0,1,1,1,1,1,1,1,1,1,1), nrow=3, ncol=5)
mat
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 1 1
[2,] 1 0 1 1 1
[3,] 1 1 1 1 1
drop <- NULL
for(i in 1:NROW(mat)){
count.non.zero <- sum(mat[i,]!=0, na.rm=TRUE)
drop <- c(drop, count.non.zero<5)
}
mat[!drop, ]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 1 1
[2,] 1 1 1 1 1
NOTE: na.rm=TRUE allows this script to work when your data contains missing values.
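The same per-row flags can be computed without growing a vector inside a loop. The following is only a sketch using the test matrix from this answer; the names drop_rows, kept and same_flags are made up for illustration.

```r
mat <- matrix(c(1,1,1,1,0,1,1,1,1,1,1,1,1,1,1), nrow = 3, ncol = 5)

# One TRUE/FALSE flag per row; vapply allocates the result up front
drop_rows <- vapply(seq_len(nrow(mat)),
                    function(i) sum(mat[i, ] != 0, na.rm = TRUE) < 5,
                    logical(1))

# drop = FALSE keeps the result a matrix even if only one row survives
kept <- mat[!drop_rows, , drop = FALSE]

# Fully vectorised equivalent of the flags
same_flags <- rowSums(mat != 0, na.rm = TRUE) < 5
```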
Suppose I have a list of matrices. Suppose further I have found the smallest values by the column.
Here is my last question
I really need to know which matrix each smallest value is selected from. My original function is very complicated, so I have provided a simple example. I have one idea but do not know how to implement it correctly in R.
My idea is:
Suppose that [i,j] is the elements of the matrix. Then,
if(d[[1]][i,j] < d[[2]][i,j]){
d[[1]][i,j] <- "x"
}else { d[[2]][i,j] <- "z"}
So, I would like to assign the name of the matrix that corresponds to each smallest value, and then store the names in a separate matrix. That way I can see the values in one matrix and their corresponding names (where they come from) in another matrix.
For example,
x <- matrix(0, 5, 5) # assumed: the original snippet does not show how x was created
y <- c(3,2,4,5,6, 4,5,5,6,7)
x[lower.tri(x,diag=F)] <- y
> x
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 3 0 0 0 0
[3,] 2 6 0 0 0
[4,] 4 4 5 0 0
[5,] 5 5 6 7 0
z <- matrix(0, 5, 5) # assumed: the original snippet does not show how z was created
k <- c(1,4,5,2,5,-4,4,4,4,5)
z[lower.tri(z,diag=F)] <- k
> z
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 1 0 0 0 0
[3,] 4 5 0 0 0
[4,] 5 -4 4 0 0
[5,] 2 4 4 5 0
d <- list(z, x)
Then:
do.call(pmin, d) (answered by @akrun)
Then, I will only get the matrix with the smallest values. I would like to know where each value comes from.
Any idea or help, please?
You can use Map and do.call to create your own functions that will be applied element-wise to a list of inputs,
in your case a list of matrices.
pwhich.min <- function(...) {
which.min(c(...)) # which.min takes a single vector as input
}
di <- unlist(do.call(Map, c(list(f = pwhich.min), d)))
dim(di) <- dim(x) # take dimension from one of the inputs
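di
# Note: the output below corresponds to d <- list(x, z); with d <- list(z, x) as in the question, the 1s and 2s below the diagonal are swapped.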
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 1 1
[2,] 2 1 1 1 1
[3,] 1 2 1 1 1
[4,] 1 2 2 1 1
[5,] 2 2 2 2 1
EDIT:
To elaborate, you could do something like Map(f = min, z, x) to apply min to each pair of values in z and x, although in that case min already supports an arbitrary number of inputs through an ellipsis (...). By contrast, which.min only takes a single vector as input, so you need a wrapper with an ellipsis that combines all values into a vector (pwhich.min above). Since you may want to have more than two matrices, you can put them all in a list and use do.call to pass each element of the list as an argument to Map, which applies the function you specify in f.
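To get from the index matrix to the matrix of names the question asks for, one option is to index a vector of names with it. This is only a sketch on smaller 3 x 3 stand-ins for x and z; the names d, idx and src are chosen here for illustration. Ties, such as the zeros on and above the diagonal, resolve to the first matrix in the list.

```r
# Small stand-ins for the question's matrices
x <- matrix(0, 3, 3); x[lower.tri(x)] <- c(3, 2, 6)
z <- matrix(0, 3, 3); z[lower.tri(z)] <- c(1, 4, 5)
d <- list(x = x, z = z)

pwhich.min <- function(...) which.min(c(...))

# unname() so the list elements are passed to Map positionally
idx <- unlist(do.call(Map, c(list(f = pwhich.min), unname(d))))
dim(idx) <- dim(x)

# Translate each index into the name of the matrix it points to
src <- matrix(names(d)[idx], nrow = nrow(idx))
```

With these inputs, src[2, 1] is "z" (1 < 3) and src[3, 1] is "x" (2 < 4).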
Or another option would be to convert it to a 3D array and use apply with which.min
apply(array(unlist(d), c(5, 5, 2)), c(1, 2), which.min)
Or with pmap from purrr
library(purrr)
pmap_int(d, ~ which.min(c(...))) %>%
array(., dim(x))
I would like to replace the loops in the following code.
Test<-function(j){
card<-5
#matrix s is to hold the results
s <- matrix(rep(0,j*card),nrow=j,ncol=card,byrow=TRUE)
# Loop1
for (k in 1:j)
{
#A vector should be drawn from another matrix,
#for simplicity, I define a vector "sol" to be modified in Loop2
sol<-rep(1,card)
#Given the vector "sol", select a vector position randomly
#for a given no. of times (i.e. steps), say 10.
step<-10
# Loop2 - Modify value in sol
for (i in seq_len(step))
{
#Draw a position
r<-sample(seq_len(card),1)
#Each position has specific probabilities for
#assignment of possible values, meaning p is related to
#the position.
#For simplicity, just define the probabilities by random here.
p<-runif(3,0,1) # just create p for each step
p<-p/sum(p) #
#Finally, draw a value for the selected position and
#value of sol within this loop is kept changing.
sol[r]<-sample(1:3,1,prob=p)
}
# keep the result in matrix s.
s[k,]<-sol
}
return(s)
}
Given an input vector
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 1 1
It is expected to output a matrix like this:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 3 2 3
[2,] 1 1 1 1 3
[3,] 2 2 2 2 3
[4,] 2 1 2 2 1
[5,] 1 1 3 1 1
Each step in Loop2 depends on a probability vector, which is then used to change a value in sol. I then tried to replace Loop2 with sapply as follows:
sapply(seq_len(steps), function(x){
r<-runif(seq_len(card),1)
sol[r]<-sample(1:3,1,prob=p) #Try to modify value in sol
})
s[k,]<-sol #Actually, no change in sol.
However, the values in sol have not changed; they are still all 1s, i.e. 1,1,1,1,1.
How can Loop2 be replaced by other apply family or other functions?
Thank you.
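A likely reason the sapply attempt leaves sol untouched: an assignment such as sol[r] <- ... inside a function modifies a copy that is local to that function call, so the outer sol never changes. The sketch below uses made-up toy values rather than the question's sampling logic, and shows Reduce as one way to carry a changing vector through sequential steps.

```r
sol <- rep(1, 5)

# The assignment inside the anonymous function only changes a local copy
invisible(sapply(1:3, function(i) { sol[i] <- 9 }))
# sol is still 1 1 1 1 1 at this point

# Reduce passes the updated vector from one step to the next
sol2 <- Reduce(function(acc, i) { acc[i] <- 9; acc }, 1:3, rep(1, 5))
```

Because each step depends on the result of the previous one, this is still sequential work; it only replaces the loop syntax, not the dependency between steps.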
If I understand correctly what you're trying to achieve, you don't need apply() functions for this:
Test <- function(j) {
card <- 5
p<-runif(3,0,1)
p<-p/sum(p)
out <- matrix(sample(1:3, j*card, replace=T, prob=p), ncol=card, nrow=j)
return(out)
}
Test(5)
[,1] [,2] [,3] [,4] [,5]
[1,] 2 2 2 1 1
[2,] 1 2 3 2 2
[3,] 2 3 1 1 2
[4,] 1 2 1 2 1
[5,] 2 1 1 2 2
In order to refactor this function, notice that all the r <- sample(card,1) are independent draws from the multinomial distribution. This can be pulled out of the loop.
The second thing to note is that the conditional distribution of s[i,j] given r is 1 if the multinomial draw is zero, otherwise it is sample(3,1,prob=runif(3)). (The distribution does not change if a cell is selected repeatedly).
Put those two facts together, and we have this:
Test2 <- function(j,card=5,step=10) {
r <- t(rmultinom(j,step,rep(1,card)))
s <- apply(r, 1:2, function(x) if(x > 0) sample(3,1,prob=runif(3)) else 1)
return(s)
}
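The multinomial observation can be illustrated informally: counting how often each position is hit over the steps gives the same kind of count vector as a single rmultinom call. This is a sketch only; the seed and the names hits_loop and hits_multinom are arbitrary, and the two vectors are separate random draws, so they are not expected to be equal.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
card <- 5
step <- 10

# How often each position is hit when one position is drawn per step
hits_loop <- tabulate(sample(card, step, replace = TRUE), nbins = card)

# The same kind of count vector drawn in a single call
hits_multinom <- as.vector(rmultinom(1, size = step, prob = rep(1, card)))
```

Both vectors have one entry per position and sum to step; a position with a count of zero is one that keeps its initial value of 1.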
What about that:
test2 <- function(j) {
card <- 5
# Create a matrix where each of the j*card rows is a p as defined in your original function.
p <- matrix(runif(3*j*card), ncol=3)
p <- t(apply(p, 1, function(x) x/sum(x)))
# For each row of p, draw a single value at random
draws <- apply(p, 1, function(x) sample(1:3, 1, prob=x))
# Format the output as a j*card matrix
out <- matrix(draws, ncol=card, byrow=TRUE)
return(out)
}
If test2() does what you want, it's roughly 300 times faster than Test() on my machine.
How can I create a square band matrix, where I specify the main diagonal and the first diagonals below and above it? I am looking for a function like
tridiag(upper, lower, main)
where length(upper)==length(lower)==length(main)-1 and returns, for example,
tridiag(1:3, 2:4, 3:6)
[,1] [,2] [,3] [,4]
[1,] 3 1 0 0
[2,] 2 4 2 0
[3,] 0 3 5 3
[4,] 0 0 4 6
Is there an efficient way to do it?
This function will do what you want:
tridiag <- function(upper, lower, main){
out <- matrix(0,length(main),length(main))
diag(out) <- main
indx <- seq.int(length(upper))
out[cbind(indx+1,indx)] <- lower
out[cbind(indx,indx+1)] <- upper
return(out)
}
Note that when the index to a matrix is a 2 column matrix, each row in that index is interpreted as the row and column index for a single value in the vector being assigned.
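A small sketch of that indexing rule on its own, with made-up values: each row of the two-column index matrix addresses exactly one cell.

```r
m <- matrix(0, nrow = 3, ncol = 3)

# Each row of idx is one (row, column) pair
idx <- cbind(c(1, 2, 3), c(3, 2, 1))
m[idx] <- c(7, 8, 9)

# By contrast, m[c(1, 2, 3), c(3, 2, 1)] would address the whole 3 x 3 block
```

Only the three addressed cells change: m[1, 3] is 7, m[2, 2] is 8 and m[3, 1] is 9.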
As fast as possible, I would like to replace the first zeros in some rows of a matrix with values stored in another vector.
There is a numeric matrix where each row is a vector with some zeros.
I also have two vectors, replace.in.these.rows and new.values: one contains the rows in which to replace, and the other the new values. I can also generate the vector of first-zero positions with sapply:
mat <- matrix(1,5,5)
mat[c(1,8,10,14,16,22,14)] <- 0
replace.in.these.rows <- c(1,2,3)
new.values <- c(91,92,93)
corresponding.poz.of.1st.zero <- sapply(replace.in.these.rows,
function(x) which(mat [x,] == 0)[1] )
Now I would like something that iterates over the index vectors, but without a for loop possibly:
mat[replace.in.these.rows, corresponding.poz.of.1st.zero ] <- new.values
Is there a trick for indexing with something more than simple vectors? I could not use a list or an array (e.g. by column) as an index.
By default R matrices are a set of column vectors. Do I gain anything if I store the data in a transposed form? It would mean to work on columns instead of rows.
Context:
This matrix stores the contact IDs of a network. This is not an n x n adjacency matrix, but rather an n x max.number.of.partners (or n*=30) matrix.
The network uses edgelist by default, but I wanted to store the "all links from X" together.
I assumed, but am not sure, that this is more efficient than always extracting the information from the edgelist (multiple times each round in a simulation).
I also assumed that this linearly growing matrix form is faster than storing the same information in a similarly formatted list.
Some comments on these contextual assumptions are also welcome.
Edit: If only the first zeros are to be replaced then this approach works:
first0s <-apply(mat[replace.in.these.rows, ] , 1, function(x) which(x==0)[1])
mat[cbind(replace.in.these.rows, first0s)] <- new.values
> mat
[,1] [,2] [,3] [,4] [,5]
[1,] 91 1 1 0 1
[2,] 1 1 1 1 92
[3,] 1 93 1 1 1
[4,] 1 1 0 1 1
[5,] 1 0 1 1 1
Edit: I thought that the goal was to replace all zeros in the chosen rows and this was the approach. A completely vectorized approach:
idxs <- which(mat==0, arr.ind=TRUE)
# This returns the rows and columns that identify the zero elements
# idxs[,"row"] %in% replace.in.these.rows
# [1] TRUE TRUE FALSE FALSE TRUE TRUE
# That isolates the ones you want.
# idxs[ idxs[,"row"] %in% replace.in.these.rows , ]
# that shows what you will supply as the two column argument to "["
# row col
#[1,] 1 1
#[2,] 3 2
#[3,] 1 4
#[4,] 2 5
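chosen.ones <- idxs[ idxs[,"row"] %in% replace.in.these.rows , , drop=FALSE]
# match() looks up each row's position in replace.in.these.rows, so this also works when the rows are not 1, 2, 3
mat[chosen.ones] <- new.values[match(chosen.ones[,"row"], replace.in.these.rows)]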
# Replace the zeros with the values chosen (and duplicated if necessary) by "row".
mat
#---------
[,1] [,2] [,3] [,4] [,5]
[1,] 91 1 1 91 1
[2,] 1 1 1 1 92
[3,] 1 93 1 1 1
[4,] 1 1 0 1 1
[5,] 1 0 1 1 1
I am trying to figure out a way to delete rows of a matrix if a cell in that row satisfies a certain condition. For example:
> mm <- matrix(c(1,2,3,2,3,4,1,2,3,4),5,2)
> mm
[,1] [,2]
[1,] 1 4
[2,] 2 1
[3,] 3 2
[4,] 2 3
[5,] 3 4
I want to delete rows if the 1st column element in that row is 2. At the end I want this:
[,1] [,2]
[1,] 1 4
[2,] 3 2
[3,] 3 4
How could I do this?
And what about a more general method: instead of deleting all rows whose first column element is 2, what if I needed to delete rows whose first column element is in a given set of numbers? For example
delete_list <- c(2,3)
What is the best way to do this?
Thank You in advance.
Just use
mm2 <- mm[mm[,1]!=2,]
This works because
mm[,1] != 2
returns
[1] TRUE FALSE TRUE FALSE TRUE
and essentially you are using this boolean array to choose which rows to pick.
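For the more general case in the question, where the first-column value is checked against a set of numbers, the same logical indexing works with %in%. This is a sketch using the question's mm and delete_list; the names keep and mm_kept are made up here.

```r
mm <- matrix(c(1,2,3,2,3,4,1,2,3,4), 5, 2)
delete_list <- c(2, 3)

# TRUE for rows whose first column is NOT in delete_list
keep <- !(mm[, 1] %in% delete_list)

# drop = FALSE keeps the result a matrix even when a single row remains
mm_kept <- mm[keep, , drop = FALSE]
```

With delete_list <- c(2, 3) only the first row (1, 4) remains, which is why drop = FALSE matters here: without it the result would collapse to a plain vector.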
Not tested...
newmat <- mm[mm[,1]!=2,]
is basically what I think you're after.
Edit: damn, ninja'd by one minute!