Build summary table from matrix - r

I have this matrix
mdat <- matrix(c(0,1,1,1,0,0,1,1,0,1,1,1,1,0,1,1,1,1,0,1), nrow = 4, ncol = 5, byrow = TRUE)
[,1] [,2] [,3] [,4] [,5]
[1,] 0 1 1 1 0
[2,] 0 1 1 0 1
[3,] 1 1 1 0 1
[4,] 1 1 1 0 1
and I'm trying to build T:
T1 T2 T3
row1 1 2 4
row2 2 2 3
row3 2 5 5
row4 3 1 3
row5 3 5 5
row6 4 1 3
row7 4 5 5
where T has one row per run of consecutive 1s in mdat, and:
T1 shows the mdat row number
T2 shows the mdat column where the run starts (the first 1)
T3 shows the mdat column where the run ends (the last consecutive 1).
Therefore
row1 in T is [1 2 4] because for row 1 in mdat the first 1 is in column 2 and the last consecutive 1 is in column 4.
row2 in T is [2 2 3] because for row 2 in mdat the first 1 is in column 2 and the last consecutive 1 is in column 3.
This is my try:
for (i in 1:4) {
  for (j in 1:5) {
    if (mdat[i, j] == 1) {
      T[i, 1] <- i; T[i, 2] <- j
      cont <- 0
      while (mdat[i, j + cont] == 1) {
        cont <- cont + 1
        T[i, 3] <- cont
      }
    }
  }
}
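The loop above has a few problems as written. T is never initialised as a matrix (in base R, T is already the logical TRUE). T[i,3] stores the run length rather than the end column. mdat[i,j+cont] can index past column 5. Only one run per row can be kept. Here is a minimal sketch of a repaired loop, assuming one output row per run of 1s. The names runs and res are illustrative, and res is used instead of T to avoid masking TRUE:

```r
mdat <- matrix(c(0,1,1,1,0,0,1,1,0,1,1,1,1,0,1,1,1,1,0,1),
               nrow = 4, ncol = 5, byrow = TRUE)

runs <- list()  # collects one c(row, first column, last column) per run of 1s
for (i in seq_len(nrow(mdat))) {
  j <- 1
  while (j <= ncol(mdat)) {
    if (mdat[i, j] == 1) {
      start <- j
      # && short-circuits, so mdat[i, j + 1] is never read past the last column
      while (j < ncol(mdat) && mdat[i, j + 1] == 1) j <- j + 1
      runs[[length(runs) + 1]] <- c(i, start, j)
    }
    j <- j + 1  # step past the end of the run (or past a 0)
  }
}
res <- do.call(rbind, runs)
colnames(res) <- c("T1", "T2", "T3")
res
```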

Here's a strategy using apply/rle as Richard suggested.
xx<-apply(mdat, 1, function(x) {
r <- rle(x)
w <- which(r$values==1)
l <- r$lengths[w]
s <- cumsum(c(0,r$lengths))[w]+1
cbind(start=s,stop=s+l-1)
})
do.call(rbind, Map(cbind, row=seq_along(xx), xx))
We start by finding the runs of 1s in each row using the values component of the rle() result, and we calculate their start and stop positions from the lengths component. This gives a list of two-column matrices, with one list item per row of the original matrix.
Now we use Map to add the row number back onto each matrix, and then we rbind all the results. That seems to give you the data you're after:
row start stop
[1,] 1 2 4
[2,] 2 2 3
[3,] 2 5 5
[4,] 3 1 3
[5,] 3 5 5
[6,] 4 1 3
[7,] 4 5 5
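One caveat with apply(): it returns a list here only because the rows produce different numbers of runs. If every row had the same number of runs, apply() would simplify the results into a single matrix, and the Map() step would no longer see one matrix per row. Below is a sketch of the same rle() idea wrapped in a function that uses lapply() over row indices instead, which always returns a list. The function name runs_by_row and its value argument are illustrative, not part of the answer above:

```r
runs_by_row <- function(m, value = 1) {
  # lapply() always returns a list, whatever the number of runs per row
  per_row <- lapply(seq_len(nrow(m)), function(i) {
    r <- rle(m[i, ])
    stops <- cumsum(r$lengths)           # last column of every run
    starts <- stops - r$lengths + 1      # first column of every run
    keep <- r$values == value            # only runs of the wanted value
    cbind(row = rep(i, sum(keep)), start = starts[keep], stop = stops[keep])
  })
  do.call(rbind, per_row)
}

mdat <- matrix(c(0,1,1,1,0,0,1,1,0,1,1,1,1,0,1,1,1,1,0,1),
               nrow = 4, ncol = 5, byrow = TRUE)
runs_by_row(mdat)

# every row has exactly one run here, the case where apply() would simplify
runs_by_row(matrix(c(1, 1, 0,
                     0, 1, 1), nrow = 2, byrow = TRUE))
```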

Try the Bioconductor IRanges package:
library(IRanges)
r <- unlist(slice(split(Rle(mdat), row(mdat)), 1, rangesOnly=TRUE))
r
IRanges of length 7
start end width names
[1] 2 4 3 1
[2] 2 3 2 2
[3] 5 5 1 2
[4] 1 3 3 3
[5] 5 5 1 3
[6] 1 3 3 4
[7] 5 5 1 4
EDIT: optimized

Related

How to calculate number of specific values in a data frame in R? [duplicate]

I have a dataframe df:
a b c
1 5 5
2 3 5
3 3 5
3 3 3
3 3 2
4 2 2
1 2 2
I want to count how many 3s there are in each row. How can I do it?
For example, row 2 = 1, row 3 = 2, etc.
Please advise.
The answer by @ManuelBickel is good if you want to count all of the values. If you really just want to know how many 3s there are, this might be simpler:
rowSums(data==3)
[1] 0 1 2 3
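The output above covers only the first four rows of df. Here is a minimal sketch that rebuilds the full seven-row df from the question and applies the same one-liner to it:

```r
df <- data.frame(a = c(1, 2, 3, 3, 3, 4, 1),
                 b = c(5, 3, 3, 3, 3, 2, 2),
                 c = c(5, 5, 5, 3, 2, 2, 2))
# df == 3 is a logical matrix; rowSums() counts the TRUEs in each row
counts <- rowSums(df == 3)
counts
# counts per row: 0 1 2 3 2 0 0
```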
If you want the counts returned in a more ordered fashion:
set.seed(1)
m <- matrix(sample(c(1:3, 5), 15, replace=TRUE), 5, dimnames=list(LETTERS[1:5]))
m
# [,1] [,2] [,3]
# A 2 5 1
# B 2 5 1
# C 3 3 3
# D 5 3 2
# E 1 1 5
u <- sort(unique(as.vector(m)))
r <- sapply(setNames(u, u), function(x) rowSums(m == x))
r
# 1 2 3 5
# A 1 1 0 1
# B 1 1 0 1
# C 0 0 3 0
# D 0 1 1 1
# E 2 0 0 1
You can use apply and table for this. The output is a list giving you the counts of unique elements per row. (If this is of interest, setting the MARGIN of apply to 2 would give you the output per column.)
Update: Since others have provided solutions producing more "ordered" output in the meantime, I have amended my approach by using data.table::rbindlist for this purpose.
#I have skipped some of the last rows of your example
data <- read.table(text = "
a b c
1 5 5
2 3 5
3 3 5
3 3 3
", header = TRUE, stringsAsFactors = FALSE)
apply(data, 1, table)
# [[1]]
# 1 5
# 1 2
# [[2]]
# 2 3 5
# 1 1 1
# [[3]]
# 3 5
# 2 1
# [[4]]
# 3
# 3
#Update: output in more ordered fashion
library(data.table)
rbindlist(apply(data, 1, function(x) as.data.table(t(as.matrix(table(x)))))
,fill = TRUE
,use.names = TRUE)
# 1 5 2 3
# 1: 1 2 NA NA
# 2: NA 1 1 1
# 3: NA 1 NA 2
# 4: NA NA NA 3
#if necessary NA values might be replaced, see, e.g.,
##https://stackoverflow.com/questions/7235657/fastest-way-to-replace-nas-in-a-large-data-table
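The list shape returned by apply(data, 1, table) depends on the rows having different numbers of distinct values. If every row produced a table of the same length, apply() would simplify the results into a matrix instead of a list. Below is a sketch that always returns one table per row by using lapply() over row indices. The small data frame d is made up for illustration:

```r
# hypothetical data: both rows contain exactly two distinct values
d <- data.frame(a = c(1, 2), b = c(1, 3), c = c(4, 3))

simplified <- apply(d, 1, table)   # equal-length tables get simplified
is.list(simplified)                # FALSE: apply() returned a matrix

# lapply() never simplifies, so the result is always one table per row
per_row <- lapply(seq_len(nrow(d)),
                  function(i) table(unlist(d[i, ], use.names = FALSE)))
per_row
```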

Abnormal Sequencing in R

I would like to create a vector of sequenced numbers such as:
1,2,3,4,5, 2,3,4,5,1, 3,4,5,1,2
That is, like rep(seq(1,5),3), except that after each sequence completes, the first number of the previous sequence moves to the last position in the next sequence.
How about %% (modulo)?
(1:5) %% 5 + 1 # left shift by 1
[1] 2 3 4 5 1
(1:5 + 1) %% 5 + 1 # left shift by 2
[1] 3 4 5 1 2
also try
(1:5 - 2) %% 5 + 1 # right shift by 1
[1] 5 1 2 3 4
(1:5 - 3) %% 5 + 1 # right shift by 2
[1] 4 5 1 2 3
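Combining these shifts gives the whole vector from the question in one step. Below is a minimal sketch, assuming the k-th block should be 1:n shifted left by k - 1. outer() builds an n-by-reps matrix whose k-th column is that shift, and as.vector() reads it column by column. The function name rotated is illustrative:

```r
rotated <- function(n, reps) {
  # entry [i, k] is (i + k - 1) %% n + 1, i.e. 1:n shifted left by k - 1
  shifts <- outer(seq_len(n), seq_len(reps) - 1,
                  function(i, k) (i + k - 1) %% n + 1)
  as.vector(shifts)  # column-major order concatenates the shifted blocks
}

rotated(5, 3)
# [1] 1 2 3 4 5 2 3 4 5 1 3 4 5 1 2
```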
I would start off by making a matrix with one more row than the length of the series.
> lseries <- 5
> nreps <- 3
> (values <- matrix(1:lseries, nrow = lseries + 1, ncol = nreps))
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 4
[3,] 3 4 5
[4,] 4 5 1
[5,] 5 1 2
[6,] 1 2 3
This may throw a warning (In matrix(1:lseries, nrow = lseries + 1, ncol = nreps) : data length [5] is not a sub-multiple or multiple of the number of rows [6]), which you can ignore. Note that the first lseries rows contain the data you want. We can get the final result using:
> as.vector(values[1:lseries, ])
[1] 1 2 3 4 5 2 3 4 5 1 3 4 5 1 2
Here's a method to get a matrix with one of these sequences per row:
matrix(1:5, 5, 6, byrow=TRUE)[, -6]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 2 3 4 5 1
[3,] 3 4 5 1 2
[4,] 4 5 1 2 3
[5,] 5 1 2 3 4
or turn it into a list
split.default(matrix(1:5, 5, 6, byrow=TRUE)[, -6], 1:5)
$`1`
[1] 1 2 3 4 5
$`2`
[1] 2 3 4 5 1
$`3`
[1] 3 4 5 1 2
$`4`
[1] 4 5 1 2 3
$`5`
[1] 5 1 2 3 4
or into a vector with c():
c(matrix(1:5, 5, 6, byrow=TRUE)[, -6])
[1] 1 2 3 4 5 2 3 4 5 1 3 4 5 1 2 4 5 1 2 3 5 1 2 3 4
For the sake of variety, here is a second method to return the vector:
# construct the larger vector
temp <- rep(1:5, 6)
# for each x, which() finds every position of x in temp; the (x+1)-th of these is the one to drop
temp[-sapply(1:5, function(x) which(temp == x)[x+1])]
[1] 1 2 3 4 5 2 3 4 5 1 3 4 5 1 2 4 5 1 2 3 5 1 2 3 4

recode values into one column

I have a data frame with a single 1 per row, which can be in any one of several columns. How can I create a single column that contains the number of the column the 1 is in? I would like to do this using dplyr, but the only methods I can think of involve for loops, which don't seem very R-like.
df<-data.frame(
a=c(1,0,0,0),
b=c(0,1,1,0),
c=c(0,0,0,1)
)
a b c
1 1 0 0
2 0 1 0
3 0 1 0
4 0 0 1
GOAL:
1 1
2 2
3 2
4 3
There is no need for dplyr here. This is what max.col() is for. Since all the other values in each row are zero, max.col() gives the column number where the 1 appears.
max.col(df)
# [1] 1 2 2 3
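One detail worth knowing is that max.col() breaks ties at random by default (ties.method = "random"). That is harmless when every row has exactly one 1, but a row of all zeros would get an arbitrary column. Setting ties.method = "first" makes the result deterministic. Below is a sketch with a hypothetical extra all-zero row, flagged as NA:

```r
df <- data.frame(a = c(1, 0, 0, 0), b = c(0, 1, 1, 0), c = c(0, 0, 0, 1))
max.col(df, ties.method = "first")

# hypothetical all-zero row: "first" would always map it to column 1,
# so it may be worth flagging such rows explicitly
df0 <- rbind(df, data.frame(a = 0, b = 0, c = 0))
ifelse(rowSums(df0) == 0, NA, max.col(df0, ties.method = "first"))
```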
If you need a column, then
data.frame(x = max.col(df))
# x
# 1 1
# 2 2
# 3 2
# 4 3
Or cbind() or matrix() for a matrix.
We could also do
as.matrix(df) %*%seq_along(df)
# [,1]
#[1,] 1
#[2,] 2
#[3,] 2
#[4,] 3
which(df==1, arr.ind=TRUE)
# row col
# [1,] 1 1
# [2,] 2 2
# [3,] 3 2
# [4,] 4 3

R merging with a preference

Suppose you have a matrix that consists of two columns containing only 1s and 2s.
A B
1 2
2 2
1 1
2 1
2 1
2 2
2 1
How would you merge these two columns into one so that 2 always overwrites 1?
Desired Output:
C
2
2
1
2
2
2
2
Assuming that the data is stored in a data frame named df, you can use
df$C <- pmax(df$A, df$B)
to create a new column C with the desired result.
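Because pmax() accepts any number of vectors, the same idea extends beyond two columns. do.call() passes each selected column of the data frame as a separate argument. Below is a sketch that rebuilds the A/B data from the question. To combine more columns, you would list them in the selection:

```r
df <- data.frame(A = c(1, 2, 1, 2, 2, 2, 2),
                 B = c(2, 2, 1, 1, 1, 2, 1))
# do.call() turns the list of selected columns into pmax(A = ..., B = ...)
df$C <- do.call(pmax, df[c("A", "B")])
df$C
# [1] 2 2 1 2 2 2 2
```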
In the case of a matrix m you can use
m <- cbind(m, pmax(m[,1], m[,2]))
colnames(m) <- LETTERS[1:ncol(m)]
#> m
# A B C
#[1,] 1 2 2
#[2,] 2 2 2
#[3,] 1 1 1
#[4,] 2 1 2
#[5,] 2 1 2
#[6,] 2 2 2
#[7,] 2 1 2
#> class(m)
#[1] "matrix" "array"   (R >= 4.0.0; older versions print just "matrix")
Without ifelse:
df$C1 <- apply(df[,c("A","B")],1,max)
With ifelse:
df$C2 <- with(df, ifelse(A==1&B==1,1,2))
Result
> df
A B C1 C2
1 1 2 2 2
2 2 2 2 2
3 1 1 1 1
4 2 1 2 2
5 2 1 2 2
6 2 2 2 2
7 2 1 2 2

convert rows after column

I have a CSV file that reads like this:
1 5
2 3
3 2
4 6
5 3
6 7
7 2
8 1
9 1
What I want to do is turn it into this:
1 5 4 6 7 2
2 3 5 3 8 1
3 2 6 7 9 1
That is, after every third row, I want the next block of values placed side by side as new columns. Any advice?
Thanks a lot
Here's a way to do this with matrix indexing. It's a bit strange, but I find it interesting, so I will post it.
You want an index matrix, with indices as follows. This gives the order of your data as a matrix (column-major order):
1, 1
2, 1
3, 1
1, 2
2, 2
3, 2
4, 1
...
8, 2
9, 2
This gives the pattern that you need to select the elements. Here's one approach to building such a matrix. Say that your data is in the object dat, a data frame or matrix:
m <- matrix(
c(
outer(rep(1:3, 2), seq(0,nrow(dat)-1,by=3), FUN='+'),
rep(rep(1:2, each=3), nrow(dat)/3)
),
ncol=2
)
The outer expression is the first column of the desired index matrix, and the rep expression is the second column. Now just index dat with this index matrix, and build a result matrix with three rows:
matrix(dat[m], nrow=3)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 5 4 6 7 2
## [2,] 2 3 5 3 8 1
## [3,] 3 2 6 7 9 1
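The block size (3 rows) and the number of columns (2) are hard-coded above. Below is a sketch of the same index-matrix technique with both taken from arguments and the data, assuming the number of rows is a multiple of the block size k. The function name blocks_side_by_side is illustrative:

```r
blocks_side_by_side <- function(dat, k) {
  dat <- as.matrix(dat)
  n <- nrow(dat)
  p <- ncol(dat)
  stopifnot(n %% k == 0)  # assumes complete blocks of k rows
  # first index column: offsets 1..k for each of the p columns, shifted by block start
  rows <- outer(rep(seq_len(k), p), seq(0, n - 1, by = k), FUN = "+")
  # second index column: column 1 k times, column 2 k times, ..., once per block
  cols <- rep(rep(seq_len(p), each = k), n / k)
  matrix(dat[cbind(c(rows), cols)], nrow = k)
}

dat <- read.table(text = "1 5
2 3
3 2
4 6
5 3
6 7
7 2
8 1
9 1")
blocks_side_by_side(dat, 3)
```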
a <- read.table(text = "1 5
2 3
3 2
4 6
5 3
6 7
7 2
8 1
9 1")
(seq_len(nrow(a))-1) %/% 3
# [1] 0 0 0 1 1 1 2 2 2
split(a, (seq_len(nrow(a))-1) %/% 3)
# $`0`
# V1 V2
# 1 1 5
# 2 2 3
# 3 3 2
# $`1`
# V1 V2
# 4 4 6
# 5 5 3
# 6 6 7
# $`2`
# V1 V2
# 7 7 2
# 8 8 1
# 9 9 1
do.call(cbind,split(a, (seq_len(nrow(a))-1) %/% 3))
# 0.V1 0.V2 1.V1 1.V2 2.V1 2.V2
# 1 1 5 4 6 7 2
# 2 2 3 5 3 8 1
# 3 3 2 6 7 9 1
