I'm trying to write a streamlined function in R to compare multiple columns in a matrix. What is the optimal way to do this in R? Most likely using apply.
I have seen this question crop up a number of times but with some conflicting views on the optimal way to write this.
for ( j in 2:ncol(net) )
{
for ( i in 1:nrow(net) )
{
net[i,j] <- min(net[i,j],net[i,1])
}
}
The end result for the following matrix
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 2 3
[3,] 3 2 3
would be
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 2 2
[3,] 3 2 3
We can unlist the columns of "net" except the first one (net[-1]), replicate the first column to the same length as the unlisted columns, and use pmin to get the minimum of the corresponding elements of the two vectors.
pmin(unlist(net[-1], use.names=FALSE), net[,1][row(net[-1])])
#[1] 2 2 7 5 2 2 2 6 5 3 2 1 0 5 1
If we need an lapply solution,
unlist(lapply(net[-1], function(x) pmin(x, net[,1])), use.names=FALSE)
Using the OP's for loop
for ( i in 2:ncol(net) ){
for ( j in 1:nrow(net) ){
print(min(net[j,i],net[j,1]))
}
}
#[1] 2
#[1] 2
#[1] 7
#[1] 5
#[1] 2
#[1] 2
#[1] 2
#[1] 6
#[1] 5
#[1] 3
#[1] 2
#[1] 1
#[1] 0
#[1] 5
#[1] 1
Update
As the OP mentioned that this is not giving the expected output, here is the same approach tried on the new data shown in the OP's post
net <- cbind(1:3, 2, 3)
cbind(net[,1],pmin(unlist(net[,-1], use.names=FALSE),
net[,1][row(net[,-1])]))
# [,1] [,2] [,3]
#[1,] 1 1 1
#[2,] 2 2 2
#[3,] 3 2 3
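For a plain numeric matrix, the same pmin idea can be written without unlist, because pmin recycles the first column down each remaining column. A minimal sketch, assuming "net" is a numeric matrix with at least two columns; cap_by_first is a hypothetical helper name, not something from the original post:

```r
# Hypothetical helper: cap every column of a numeric matrix at the value
# found in column 1 of the same row.
cap_by_first <- function(net) {
  stopifnot(is.matrix(net), ncol(net) >= 2)
  # pmin() recycles net[, 1] down each remaining column (column-major order),
  # so element [i, j] is compared with net[i, 1].
  net[, -1] <- pmin(net[, -1], net[, 1])
  net
}

net <- cbind(1:3, 2, 3)
cap_by_first(net)
#      [,1] [,2] [,3]
# [1,]    1    1    1
# [2,]    2    2    2
# [3,]    3    2    3
```

This relies on the number of rows matching the length of the first column, which is always true when both come from the same matrix.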
data
set.seed(24)
net <- as.data.frame(matrix(sample(0:9, 4*5, replace=TRUE), ncol=4))
If there are no NAs you can do
net <- head(airquality, 4) # example data
for (j in 1:nrow(net)) net[j, net[j,]>net[j,1]] <- net[j,1]
net
Here's a version with sapply and ifelse (which is vectorised, woo), which is likely faster, and deals with NA values in a predictable way:
sapply(X = seq(to = ncol(x = net)), FUN = function(j){
  # return the element-wise minimum of column 1 and column j
  ifelse(test = net[,1] < net[,j], yes = net[,1], no = net[,j])
})
Some sample data
net <- head(airquality)
net
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
results in:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 41 41 7.4 41 5 1
[2,] 36 36 8.0 36 5 2
[3,] 12 12 12.0 12 5 3
[4,] 18 18 11.5 18 5 4
[5,] NA NA NA NA NA NA
[6,] 28 NA 14.9 28 5 6
Note: I spelled out pretty much all argument names out of habit; naming arguments is unlikely to change the running time noticeably. A simpler [possibly more readable] version:
sapply(seq(ncol(net)), function(j){
  ifelse(net[,1] < net[,j], net[,1], net[,j])
})
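As a rough cross-check of the NA behaviour, the ifelse approach can be compared with pmin on the same sample data. This is only a sketch; the names via_ifelse and via_pmin are made up for the comparison:

```r
net <- head(airquality)  # same sample data as above (ships with base R)

# ifelse(): an NA in the test gives NA in the result
via_ifelse <- sapply(seq_len(ncol(net)), function(j) {
  ifelse(net[, 1] < net[, j], net[, 1], net[, j])
})

# pmin(): an NA in either input gives NA, because na.rm defaults to FALSE
via_pmin <- unname(sapply(net, function(col) pmin(col, net[[1]])))

# Both approaches should agree, including where the NAs fall
all.equal(via_ifelse, via_pmin)
```

If NA should instead be ignored when one of the two values is present, pmin(..., na.rm = TRUE) is the switch to look at.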
Related
I want to create a matrix where each entry is chosen randomly, with the property that no value is repeated within a column. If two rows in the same column (for example row i and row i+1) have the same value, I want to replace the later entry with NA. For example, if column 1 contains (1,2,2,4,1), I want it to become (1,2,NA,4,NA). I have tried this
solution = matrix(NA,nrow=5,ncol=5)
for (i in 1:5) {
for (j in 1:5) {
one_entry = sample(1:10, 1)
solution[j,i] = one_entry
if (solution[j+1,i]==solution[j,i]){
#is.na(solution[j+1,i]) <- solution[j+1,I]
solution[j+1,i]<- NA
#solution[solution[j+1,i]] <- NA
} else {
solution[j+1, i] = one_entry
}
}
}
print(solution)
I got the error "Error in if (solution[j + 1, i] == solution[j, i]) { :
missing value where TRUE/FALSE needed". Please help how to resolve this.
Instead of comparing element by element with an if statement, you can replace duplicated entries with NA. duplicated() returns a logical vector that is TRUE at the position of every repeat of an earlier value.
set.seed(1)
nr <- 5
nc <- 7
m <- matrix(sample(1:10, nr * nc, replace = TRUE), nrow = nr)
m
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 9 7 5 9 5 1 10
# [2,] 4 2 10 5 5 4 6
# [3,] 7 3 6 5 2 3 4
# [4,] 1 1 10 9 10 6 4
# [5,] 2 5 7 9 9 10 10
for (i in seq_len(nc)) {
m[, i][duplicated(m[, i])] <- NA
}
m
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 9 7 5 9 5 1 10
# [2,] 4 2 10 5 NA 4 6
# [3,] 7 3 6 NA 2 3 4
# [4,] 1 1 NA NA 10 6 NA
# [5,] 2 5 7 NA 9 10 NA
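The same duplicated() idea can also be written without an explicit loop. A minimal sketch, assuming a matrix with more than one row; na_duplicates is a hypothetical helper name:

```r
# Hypothetical helper: within each column, replace every repeat of an
# earlier value with NA.
na_duplicates <- function(m) {
  apply(m, 2, function(col) replace(col, duplicated(col), NA))
}

# First column is the example from the question: (1, 2, 2, 4, 1)
m <- cbind(c(1, 2, 2, 4, 1), c(3, 3, 3, 5, 6))
na_duplicates(m)
#      [,1] [,2]
# [1,]    1    3
# [2,]    2   NA
# [3,]   NA   NA
# [4,]    4    5
# [5,]   NA    6
```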
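Using the purrr library. Note that this replaces a value only when it equals the one immediately before it (in column-major order), so non-adjacent duplicates are kept; rerun() has also been deprecated since purrr 1.0.0: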
library(purrr)
set.seed(123)
#populate the matrix
(mat <- rerun(5, sample(1:10,size = 5, replace = TRUE)) %>%
reduce(cbind))
#> out elt elt elt elt
#> [1,] 3 5 5 3 9
#> [2,] 3 4 3 8 3
#> [3,] 10 6 9 10 4
#> [4,] 2 9 9 7 1
#> [5,] 6 10 9 10 7
map(2:length(mat), ~{ if (mat[[. - 1]] == mat[[.]]) .x } ) %>%
compact() %>%
walk(~{ mat[[.x]] <<- NA })
mat
#> out elt elt elt elt
#> [1,] 3 5 5 3 9
#> [2,] NA 4 3 8 3
#> [3,] 10 6 9 10 4
#> [4,] 2 9 NA 7 1
#> [5,] 6 10 NA 10 7
Created on 2021-06-28 by the reprex package (v2.0.0)
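If only a value that repeats the one directly above it should become NA, a base R sketch could look like this. It is not the purrr code above, just one way to get the same effect per column; na_consecutive is a hypothetical name:

```r
# Hypothetical helper: set a value to NA when it equals the value directly
# above it in the same column (comparisons use the original values).
na_consecutive <- function(m) {
  apply(m, 2, function(col) {
    same_as_previous <- c(FALSE, col[-1] == col[-length(col)])
    # %in% TRUE treats an NA comparison as "not a repeat"
    replace(col, same_as_previous %in% TRUE, NA)
  })
}

# Same values as the matrix printed above
mat <- cbind(c(3, 3, 10, 2, 6), c(5, 4, 6, 9, 10), c(5, 3, 9, 9, 9),
             c(3, 8, 10, 7, 10), c(9, 3, 4, 1, 7))
na_consecutive(mat)
```

Unlike duplicated(), this leaves the second 10 in column 4 alone because it does not directly follow the first one.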
I have several vectors of unequal length and I would like to cbind them. I've put the vectors into a list and I have tried to combine them using do.call(cbind, ...):
nm <- list(1:8, 3:8, 1:5)
do.call(cbind, nm)
# [,1] [,2] [,3]
# [1,] 1 3 1
# [2,] 2 4 2
# [3,] 3 5 3
# [4,] 4 6 4
# [5,] 5 7 5
# [6,] 6 8 1
# [7,] 7 3 2
# [8,] 8 4 3
# Warning message:
# In (function (..., deparse.level = 1) :
# number of rows of result is not a multiple of vector length (arg 2)
As expected, the number of rows in the resulting matrix is the length of the longest vector, and the values of the shorter vectors are recycled to make up for the length.
Instead I'd like to pad the shorter vectors with NA values to obtain the same length as the longest vector. I'd like the matrix to look like this:
# [,1] [,2] [,3]
# [1,] 1 3 1
# [2,] 2 4 2
# [3,] 3 5 3
# [4,] 4 6 4
# [5,] 5 7 5
# [6,] 6 8 NA
# [7,] 7 NA NA
# [8,] 8 NA NA
How can I go about doing this?
You can use indexing: if you index beyond the length of a vector, R returns NA for those positions. This works for any number of rows, defined here with foo:
nm <- list(1:8,3:8,1:5)
foo <- 8
sapply(nm, '[', 1:foo)
EDIT:
Or in one line using the largest vector as number of rows:
sapply(nm, '[', seq(max(sapply(nm,length))))
From R 3.2.0 you may use lengths ("get the length of each element of a list") instead of sapply(nm, length):
sapply(nm, '[', seq(max(lengths(nm))))
You should fill vectors with NA before calling do.call.
nm <- list(1:8,3:8,1:5)
max_length <- max(unlist(lapply(nm,length)))
nm_filled <- lapply(nm, function(x) {
  ans <- rep(NA, length.out = max_length)  # start from an all-NA vector
  ans[seq_along(x)] <- x                   # copy the original values in
  ans
})
do.call(cbind,nm_filled)
This is a shorter version of Wojciech's solution.
nm <- list(1:8,3:8,1:5)
max_length <- max(sapply(nm,length))
sapply(nm, function(x){
c(x, rep(NA, max_length - length(x)))
})
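Another base R variant of the same padding idea uses the length replacement function, which fills new positions with NA when a vector is extended. A short sketch on the list from the question; the name padded is only for illustration:

```r
nm <- list(1:8, 3:8, 1:5)

# "length<-" extends each vector to the target length, padding with NA
padded <- sapply(nm, "length<-", max(lengths(nm)))
padded
#      [,1] [,2] [,3]
# [1,]    1    3    1
# [2,]    2    4    2
# [3,]    3    5    3
# [4,]    4    6    4
# [5,]    5    7    5
# [6,]    6    8   NA
# [7,]    7   NA   NA
# [8,]    8   NA   NA
```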
Here is an option using stri_list2matrix from stringi
library(stringi)
out <- stri_list2matrix(nm)
class(out) <- 'numeric'
out
# [,1] [,2] [,3]
#[1,] 1 3 1
#[2,] 2 4 2
#[3,] 3 5 3
#[4,] 4 6 4
#[5,] 5 7 5
#[6,] 6 8 NA
#[7,] 7 NA NA
#[8,] 8 NA NA
Late to the party, but you could use cbind.fill from the rowr package with fill = NA
library(rowr)
do.call(cbind.fill, c(nm, fill = NA))
# object object object
#1 1 3 1
#2 2 4 2
#3 3 5 3
#4 4 6 4
#5 5 7 5
#6 6 8 NA
#7 7 NA NA
#8 8 NA NA
If you have a named list instead and want to maintain the headers you could use setNames
nm <- list(a = 1:8, b = 3:8, c = 1:5)
setNames(do.call(cbind.fill, c(nm, fill = NA)), names(nm))
# a b c
#1 1 3 1
#2 2 4 2
#3 3 5 3
#4 4 6 4
#5 5 7 5
#6 6 8 NA
#7 7 NA NA
#8 8 NA NA
I have a sparse matrix represented as
> (f <- data.frame(row=c(1,2,3,1,2,1,2,3,4,1,1,2),value=1:12))
row value
1 1 1
2 2 2
3 3 3
4 1 4
5 2 5
6 1 6
7 2 7
8 3 8
9 4 9
10 1 10
11 1 11
12 2 12
Here the first column is always present (in fact, the first few are present, the rest are not).
I want to get the data into the matrix format:
> t(matrix(c(1,2,3,NA,4,5,NA,NA,6,7,8,9,10,NA,NA,NA,11,12,NA,NA),nrow=4,ncol=5))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 NA
[2,] 4 5 NA NA
[3,] 6 7 8 9
[4,] 10 NA NA NA
[5,] 11 12 NA NA
Here is what seems to be working:
> library(Matrix)
> as.matrix(sparseMatrix(i = cumsum(f[[1]] == 1), j=f[[1]], x=f[[2]]))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 0
[2,] 4 5 0 0
[3,] 6 7 8 9
[4,] 10 0 0 0
[5,] 11 12 0 0
Except that I have to replace 0 with NA myself.
Is there a better solution?
You can do everything with base functions. The trick is to use indexing by a 2-col (row and col indices) matrix:
j <- f$row
i <- cumsum(j == 1)
x <- f$value
m <- matrix(NA, max(i), max(j))
m[cbind(i, j)] <- x
m
Whether this is better than using the Matrix package is subjective. The Matrix package seems overkill, in my opinion, if you are not doing anything else with it. Also, with the sparseMatrix approach, any genuine 0 in the f$value column would end up converted to NA along with the padding zeros unless you are careful.
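To illustrate the point about zeros, here is the matrix-indexing approach wrapped in a function and run on data that contains real 0 values. This is a sketch that assumes, as in the question, that a new output row starts whenever f$row equals 1; to_padded_matrix is a hypothetical name:

```r
# Hypothetical helper: rebuild the padded matrix from (row, value) pairs.
to_padded_matrix <- function(f) {
  j <- f$row            # position within the output row
  i <- cumsum(j == 1)   # output row index: increases each time j restarts at 1
  m <- matrix(NA, max(i), max(j))
  m[cbind(i, j)] <- f$value   # index by a 2-column (row, col) matrix
  m
}

# Data with genuine zeros: they stay 0, only the padding is NA
f0 <- data.frame(row = c(1, 2, 3, 1, 2), value = c(7, 0, 9, 0, 4))
to_padded_matrix(f0)
#      [,1] [,2] [,3]
# [1,]    7    0    9
# [2,]    0    4   NA
```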
How can I sum the number of complete cases of two columns?
With c equal to:
a b
[1,] NA NA
[2,] 1 1
[3,] 1 1
[4,] NA 1
Applying something like
rollapply(c, 2, function(x) sum(complete.cases(x)),fill=NA)
I'd like to get back a single number, 2 in this case. This will be for a large data set with many columns, so I'd like to use rollapply across the whole set instead of simply doing sum(complete.cases(a,b)).
Am I over thinking it?
Thanks!
Did you try sum(complete.cases(x))?!
set.seed(123)
x <- matrix( sample( c(NA,1:5) , 15 , TRUE ) , 5 )
# [,1] [,2] [,3]
#[1,] 1 NA 5
#[2,] 4 3 2
#[3,] 2 5 4
#[4,] 5 3 3
#[5,] 5 2 NA
sum(complete.cases(x))
#[1] 3
To find the complete.cases() of the first two columns:
sum(complete.cases(x[,1:2]))
#[1] 4
And to apply to two columns of a matrix across the whole matrix you could do this:
# Bigger data for example
set.seed(123)
x <- matrix( sample( c(NA,1:5) , 50 , TRUE ) , 5 )
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,] 1 NA 5 5 5 4 5 2 NA NA
#[2,] 4 3 2 1 4 3 5 4 2 1
#[3,] 2 5 4 NA 3 3 4 1 2 2
#[4,] 5 3 3 1 5 1 4 1 2 1
#[5,] 5 2 NA 5 3 NA NA 1 NA 5
# Column indices
id <- seq( 1 , ncol(x) , by = 2 )
#[1] 1 3 5 7 9
apply( cbind(id,id+1) , 1 , function(i) sum(complete.cases(x[,c(i)])) )
#[1] 4 3 4 4 3
complete.cases() works row-wise across the whole data.frame or matrix returning TRUE for those rows which are not missing any data. A minor aside, "c" is a bad variable name because c() is one of the most commonly used functions.
You can calculate the number of complete cases in neighboring matrix columns using rollapply like this:
m <- matrix(c(NA,1,1,NA,1,1,1,1),ncol=4)
# [,1] [,2] [,3] [,4]
#[1,] NA 1 1 1
#[2,] 1 NA 1 1
library(zoo)
rowSums(rollapply(is.na(t(m)), 2, function(x) !any(x)))
#[1] 0 1 2
This should work for both a matrix and a data.frame
> sum(apply(c, 1, function(x)all(!is.na(x))))
[1] 2
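and you could simply iterate over the neighbouring column pairs of a large matrix M, storing one count per pair
agreement <- integer(ncol(M) - 1)
for (i in seq_len(ncol(M) - 1)) {
  pair <- M[, c(i, i + 1)]   # columns i and i+1
  agreement[i] <- sum(apply(pair, 1, function(x) all(!is.na(x))))
}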
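The per-pair counts can also be computed without a loop by combining the "not missing" masks of neighbouring columns. A minimal sketch, assuming M is a matrix or data.frame with at least two columns; adjacent_complete is a hypothetical helper name:

```r
# Hypothetical helper: number of complete cases for each pair of
# neighbouring columns (1,2), (2,3), ..., (n-1,n).
adjacent_complete <- function(M) {
  ok <- !is.na(M)   # TRUE where a value is present
  n <- ncol(M)
  # A row counts for pair (i, i+1) when both entries are present
  colSums(ok[, -n, drop = FALSE] & ok[, -1, drop = FALSE])
}

m <- matrix(c(NA, 1, 1, NA, 1, 1, 1, 1), ncol = 4)
adjacent_complete(m)
# [1] 0 1 2
```

For the two-column matrix from the question this returns a single count, 2.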
I'm a beginner R user and I need to write a function that sums the rows of a data frame over a fixed interval (every 4 rows).
I've tried the following code
camp<-function(X){
i<-1
n<-nrow(X)
xc<-matrix(nrow=36,ncol=m)
for (i in 1:n){
xc<-apply(X[i:(i+4),],2,sum)
rownames(xc[i])<-rownames(X[i])
i<-i+5
}
return(xc)
}
the result is "Error in X[i:(i + 4), ] : index out of range".
How can I solve? Any suggestion?
Thanks.
The zoo package has rollapply which is pretty handy for stuff like this...
# Make some data
set.seed(1)
m <- matrix( sample( 10 , 32 , repl = TRUE ) , 8 )
# [,1] [,2] [,3] [,4]
#[1,] 3 7 8 3
#[2,] 4 1 10 4
#[3,] 6 3 4 1
#[4,] 10 2 8 4
#[5,] 3 7 10 9
#[6,] 9 4 3 4
#[7,] 10 8 7 5
#[8,] 7 5 2 6
# Sum every 4 rows
require( zoo )
tmp <- rollapply( m , width = 4 , by = 4 , align = "left" , FUN = sum )
# [,1] [,2] [,3] [,4]
#[1,] 23 13 30 12
#[2,] 29 24 22 24
You can also use rowSums() on the result if you actually want to aggregate the columns into a single value for each block of 4 rows...
rowSums( tmp )
#[1] 78 99
Here is a way to do it :
## Sample data
m <- matrix(1:36, nrow=12)
## Create a "group" index
fac <- (seq_len(nrow(m))-1) %/% 4
## Apply sum
apply(m, 2, function(v) tapply(v, fac, sum))
Sample data :
[,1] [,2] [,3]
[1,] 1 13 25
[2,] 2 14 26
[3,] 3 15 27
[4,] 4 16 28
[5,] 5 17 29
[6,] 6 18 30
[7,] 7 19 31
[8,] 8 20 32
[9,] 9 21 33
[10,] 10 22 34
[11,] 11 23 35
[12,] 12 24 36
Result :
[,1] [,2] [,3]
0 10 58 106
1 26 74 122
2 42 90 138
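Base R also has rowsum(), which sums the rows of a matrix within groups, so the same group index can be used without apply and tapply. A short sketch on the same sample data:

```r
m <- matrix(1:36, nrow = 12)

# Group index: rows 1-4 -> 0, rows 5-8 -> 1, rows 9-12 -> 2
fac <- (seq_len(nrow(m)) - 1) %/% 4

# rowsum() adds up the rows of m that share the same group value
block_sums <- rowsum(m, fac)
block_sums
```

Because the groups come from integer division, a final block with fewer than 4 rows is simply summed over the rows it has, which avoids the "index out of range" error from indexing past the last row.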