Convert matrix to three defined columns in R - r

Given m:
m <- structure(c(5, 1, 3, 2, 1, 4, 5, 2, 5, 1, 1, 5, 1, 4, 0, 4, 5,
5, 3, 2, 0, 0, 3, 0, 3, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(7L,
5L))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 5 2 0 0 0
# [2,] 1 5 4 3 0
# [3,] 3 1 5 0 0
# [4,] 2 1 5 3 0
# [5,] 1 5 3 2 0
# [6,] 4 1 2 3 0
# [7,] 5 4 0 0 0
Consider the element 1: it appears in 5 rows (2, 3, 4, 5, 6) and the respective column indices are (1, 2, 2, 1, 2). I would like to have the following:
1 2 1
1 3 2
1 4 2
1 5 1
1 6 2
As another example, consider the element 2: it appears in 4 rows (1, 4, 5, 6) with respective column indices (2, 1, 4, 3), so the result grows to:
1 2 1
1 3 2
1 4 2
1 5 1
1 6 2
2 1 2
2 4 1
2 5 4
2 6 3
What I want is an n*3 matrix covering all values 1-5, preferably in base R.

A convenient way to do this is to convert m to a sparse matrix with the Matrix package, since your desired output is very close to the (row, column, value) summary of a sparse matrix:
library(Matrix)
summary(Matrix(m, sparse = TRUE))
# 7 x 5 sparse Matrix of class "dgCMatrix", with 23 entries
# i j x
# 1 1 1 5
# 2 2 1 1
# 3 3 1 3
# 4 4 1 2
# 5 5 1 1
# 6 6 1 4
# 7 7 1 5
# 8 1 2 2
# 9 2 2 5
# 10 3 2 1
# 11 4 2 1
# 12 5 2 5
# 13 6 2 1
# 14 7 2 4
# 15 2 3 4
# 16 3 3 5
# 17 4 3 5
# 18 5 3 3
# 19 6 3 2
# 20 2 4 3
# 21 4 4 3
# 22 5 4 2
# 23 6 4 3
To see it better:
library(dplyr) # provides %>% and arrange()
summary(Matrix(m, sparse = TRUE)) %>% arrange(x)
# i j x
# 1 2 1 1
# 2 5 1 1
# 3 3 2 1
# 4 4 2 1
# 5 6 2 1
# 6 4 1 2
# 7 1 2 2
# 8 6 3 2
# 9 5 4 2
# 10 3 1 3
# 11 5 3 3
# 12 2 4 3
# 13 4 4 3
# 14 6 4 3
# 15 6 1 4
# 16 7 2 4
# 17 2 3 4
# 18 1 1 5
# 19 7 1 5
# 20 2 2 5
# 21 5 2 5
# 22 3 3 5
# 23 4 3 5

We can use which with arr.ind=TRUE
cbind(val= 1, which(m==1, arr.ind=TRUE))
# val row col
#[1,] 1 2 1
#[2,] 1 5 1
#[3,] 1 3 2
#[4,] 1 4 2
#[5,] 1 6 2
For multiple cases, as @RHertel mentioned
for(i in 1:5) print(cbind(i,which(m==i, arr.ind=TRUE)))
Or with lapply
do.call(rbind, lapply(1:2, function(i) {
m1 <-cbind(val=i,which(m==i, arr.ind=TRUE))
m1[order(m1[,2]),]}))
# val row col
#[1,] 1 2 1
#[2,] 1 3 2
#[3,] 1 4 2
#[4,] 1 5 1
#[5,] 1 6 2
#[6,] 2 1 2
#[7,] 2 4 1
#[8,] 2 5 4
#[9,] 2 6 3
Since the OP asked for base R, the approaches above should suffice. For a more compact alternative, though, there is melt from reshape2:
library(reshape2)
melt(m)
and then subset the values of interest.
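For the full n*3 result over all values in one step, the which(..., arr.ind = TRUE) idea above can be applied to m != 0 instead of looping over each value. This is only a minimal base R sketch, assuming the zeros are padding to be skipped (as in the question's matrix); the names idx and res are illustrative.

```r
m <- structure(c(5, 1, 3, 2, 1, 4, 5, 2, 5, 1, 1, 5, 1, 4, 0, 4, 5,
                 5, 3, 2, 0, 0, 3, 0, 3, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0),
               .Dim = c(7L, 5L))

# Row/column positions of every non-zero cell (assumption: 0 means "empty")
idx <- which(m != 0, arr.ind = TRUE)

# Prepend the cell value, then sort by value and by row within each value
res <- cbind(val = m[idx], idx)
res <- res[order(res[, "val"], res[, "row"]), ]

head(res, 9)  # the first nine rows correspond to the 1 and 2 examples in the question
```

The result stays a plain numeric matrix with columns val, row and col, so a single value can still be pulled out with res[res[, "val"] == 3, ].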

Just use row and col.
> (out <- data.frame(m=as.vector(m), row=as.vector(row(m)), col=as.vector(col(m))))
m row col
1 5 1 1
2 1 2 1
3 3 3 1
4 2 4 1
5 1 5 1
...
Subset, sort, and print as desired.
> tmp <- out[order(out$m, out$row), ]
> print(subset(tmp, m==1), row.names=FALSE)
m row col
1 2 1
1 3 2
1 4 2
1 5 1
1 6 2

Related

Creating an indexed column in R, grouped by user_id, and not increase when NA

I want to create a column (in R) that indexes the presence of a number in another column grouped by a user_id column. And when the other column is NA, the new desired column should not increase.
The example should bring clarity.
I have this df:
data <- data.frame(user_id = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
one=c(1,NA,3,2,NA,0,NA,4,3,4,NA))
user_id tobeindexed
1 1 1
2 1 NA
3 1 3
4 2 2
5 2 NA
6 2 0
7 2 NA
8 3 4
9 3 3
10 3 4
11 3 NA
I want to make a new column looking like "desired" in the following df:
> cbind(data,data.frame(desired = c(1,1,2,1,1,2,2,1,2,3,3)))
user_id tobeindexed desired
1 1 1 1
2 1 NA 1
3 1 3 2
4 2 2 1
5 2 NA 1
6 2 0 2
7 2 NA 2
8 3 4 1
9 3 3 2
10 3 4 3
11 3 NA 3
How can I solve this?
Using cumsum and group_by gets me close, but the count does not start over from 1 when the user_id changes...
> data %>% group_by(user_id) %>% mutate(desired = cumsum(!is.na(tobeindexed)))
user_id tobeindexed desired
<dbl> <dbl> <int>
1 1 1 1
2 1 NA 1
3 1 3 2
4 2 2 3
5 2 NA 3
6 2 0 4
7 2 NA 4
8 3 4 5
9 3 3 6
10 3 4 7
11 3 NA 7
Given the sample data you provided (with the one column), this works unchanged. The code is retained below for demonstration.
base R
data$out <- ave(data$one, data$user_id, FUN = function(z) cumsum(!is.na(z)))
data
# user_id one out
# 1 1 1 1
# 2 1 NA 1
# 3 1 3 2
# 4 2 2 1
# 5 2 NA 1
# 6 2 0 2
# 7 2 NA 2
# 8 3 4 1
# 9 3 3 2
# 10 3 4 3
# 11 3 NA 3
dplyr
library(dplyr)
data %>%
group_by(user_id) %>%
mutate(out = cumsum(!is.na(one))) %>%
ungroup()
# # A tibble: 11 × 3
# user_id one out
# <dbl> <dbl> <int>
# 1 1 1 1
# 2 1 NA 1
# 3 1 3 2
# 4 2 2 1
# 5 2 NA 1
# 6 2 0 2
# 7 2 NA 2
# 8 3 4 1
# 9 3 3 2
# 10 3 4 3
# 11 3 NA 3

expand_grid with identical vectors

Problem:
Is there a simple way to get all combinations of two (or more) identical vectors, but only show the unique combinations?
Reproducible example:
library(tidyr)
x = 1:3
expand_grid(a = x,
b = x,
c = x)
# A tibble: 27 x 3
a b c
<int> <int> <int>
1 1 1 1
2 1 1 2
3 1 1 3
4 1 2 1
5 1 2 2
6 1 2 3
7 1 3 1
8 1 3 2
9 1 3 3
10 2 1 1
# ... with 17 more rows
But, if row 1 2 1 exists, then I do not want to see 1 1 2 or 2 1 1. I.e. show only unique combinations of the three vectors (any order).
library(gtools)
x = 1:3
df <- as.data.frame(combinations(n=3,r=3,v=x,repeats.allowed=TRUE))
df
output
V1 V2 V3
1 1 1 1
2 1 1 2
3 1 1 3
4 1 2 2
5 1 2 3
6 1 3 3
7 2 2 2
8 2 2 3
9 2 3 3
10 3 3 3
You can just sort rowwise and remove duplicates. Continuing from your expand_grid(), then
df <- tidyr::expand_grid(a = x,
b = x,
c = x)
data.frame(unique(t(apply(df, 1, sort))))
X1 X2 X3
1 1 1 1
2 1 1 2
3 1 1 3
4 1 2 2
5 1 2 3
6 1 3 3
7 2 2 2
8 2 2 3
9 2 3 3
10 3 3 3
Another option is comboGeneral from the RcppAlgos package, which is implemented in C++ and pretty fast.
x <- 1:3
RcppAlgos::comboGeneral(x, repetition=TRUE)
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 1 1 2
# [3,] 1 1 3
# [4,] 1 2 2
# [5,] 1 2 3
# [6,] 1 3 3
# [7,] 2 2 2
# [8,] 2 2 3
# [9,] 2 3 3
# [10,] 3 3 3
Note: If you're running Linux, you will need gmp installed, e.g. for Ubuntu do:
sudo apt install libgmp3-dev
base
x <- 1:3
df <- expand.grid(a = x,
b = x,
c = x)
df[!duplicated(apply(df, 1, function(x) paste(sort(x), collapse = ""))), ]
#> a b c
#> 1 1 1 1
#> 2 2 1 1
#> 3 3 1 1
#> 5 2 2 1
#> 6 3 2 1
#> 9 3 3 1
#> 14 2 2 2
#> 15 3 2 2
#> 18 3 3 2
#> 27 3 3 3
Created on 2021-09-09 by the reprex package (v2.0.1)
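Because all the vectors are identical, another base R possibility is to keep only the rows that are already in non-decreasing order, which leaves exactly one representative per unordered combination. It also avoids the pasted key, which can become ambiguous once values have different numbers of digits (for example 1, 2, 345 and 1, 23, 45 both paste to "12345"). A minimal sketch, assuming three columns named a, b and c as above; res is an illustrative name.

```r
x <- 1:3
df <- expand.grid(a = x, b = x, c = x)

# Keep one representative per unordered combination: the non-decreasing row
res <- df[df$a <= df$b & df$b <= df$c, ]
res
nrow(res)  # 10 combinations with repetition for three values taken three at a time
```

With more columns the same condition would need one comparison per adjacent pair, so the sort-and-deduplicate versions above may be easier to generalise.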

Creating a "run ID" for values in sequence

I have a vector which contains an ordered sequence of repeated integers:
x <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 5, 5, 5, 5, 6, 6, 9, 9, 9, 9)
I want to create a "run ID" (I assume using data.table::rleid()) for numbers that are in sequence, that is, numbers which are either equal to or one greater than the previous value.
So, the expected output would be:
x
#> [1] 1 1 1 2 2 2 2 3 3 5 5 5 5 6 6 9 9 9 9
data.table::rleid(???)
#> [1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3
My first thought was to simply check if each value is the same or +1 the previous, but that doesn't work since the first change is considered a run of its own, obviously (a FALSE surrounded by TRUEs):
x
#> [1] 1 1 1 2 2 2 2 3 3 5 5 5 5 6 6 9 9 9 9
data.table::rleid((x - lag(x, default = 1)) %in% 0:1)
#> [1] 1 1 1 1 1 1 1 1 1 2 3 3 3 3 3 4 5 5 5
I obviously need something which allows me to compare each value to the last different value, but I can't think of how to do that effectively. Any pointers?
How about using lag from dplyr with cumsum?
library(dplyr)
cumsum(x - lag(x,default = 0) > 1)+1
[1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3
Or the data.table way with shift:
library(data.table)
cumsum(x - shift(x,1,fill = 0) > 1) + 1
[1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3
Base R option using diff and cumsum :
cumsum(c(TRUE, diff(x) > 1))
#[1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3
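The diff and cumsum idea can be wrapped in a small helper if the allowed step should be configurable or if empty input needs handling. This is only a sketch: run_id() and max_step are illustrative names, not part of data.table or any other package, and x is assumed to be sorted in non-decreasing order as in the question.

```r
# run_id() and max_step are hypothetical names used for illustration
run_id <- function(x, max_step = 1) {
  if (length(x) == 0) return(integer(0))
  # A new run starts at the first element and wherever the jump exceeds max_step
  cumsum(c(TRUE, diff(x) > max_step))
}

x <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 5, 5, 5, 5, 6, 6, 9, 9, 9, 9)
run_id(x)                # three runs, as in the expected output
run_id(x, max_step = 2)  # the step from 3 to 5 no longer starts a new run
```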
library(dplyr) # provides tibble(), mutate(), lag() and %>%
x <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 5, 5, 5, 5, 6, 6, 9, 9, 9, 9)
tibble(X = x) %>%
mutate(PREV.X = lag(X, default = 0),
IS.SEQ = X != PREV.X & X != PREV.X + 1,
RZLT = 1 + cumsum(IS.SEQ))
# A tibble: 19 x 4
X PREV.X IS.SEQ RZLT
<dbl> <dbl> <lgl> <dbl>
1 1 0 FALSE 1
2 1 1 FALSE 1
3 1 1 FALSE 1
4 2 1 FALSE 1
5 2 2 FALSE 1
6 2 2 FALSE 1
7 2 2 FALSE 1
8 3 2 FALSE 1
9 3 3 FALSE 1
10 5 3 TRUE 2
11 5 5 FALSE 2
12 5 5 FALSE 2
13 5 5 FALSE 2
14 6 5 FALSE 2
15 6 6 FALSE 2
16 9 6 TRUE 3
17 9 9 FALSE 3
18 9 9 FALSE 3
19 9 9 FALSE 3

Abnormal Sequencing in R

I would like to create a vector of sequenced numbers such as:
1,2,3,4,5, 2,3,4,5,1, 3,4,5,1,2
Whereby after a sequence is complete (say, rep(seq(1,5),3)), the first number of the previous sequence now moves to the last spot in the sequence.
How about %% (modulo)?
(1:5) %% 5 + 1 # left shift by 1
[1] 2 3 4 5 1
(1:5 + 1) %% 5 + 1 # left shift by 2
[1] 3 4 5 1 2
also try
(1:5 - 2) %% 5 + 1 # right shift by 1
[1] 5 1 2 3 4
(1:5 - 3) %% 5 + 1 # right shift by 2
[1] 4 5 1 2 3
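The left shifts above can be combined to build the whole vector in one call. A minimal sketch, assuming the series is 1:n and each block is shifted left by one more than the previous block; rotated_seq(), n and reps are illustrative names.

```r
# rotated_seq() is a hypothetical helper: n is the series length, reps the number of blocks
rotated_seq <- function(n, reps) {
  # Cell (i, s) holds the value at 0-based position i after a left shift by s
  shifted <- outer(0:(n - 1), 0:(reps - 1), function(i, s) (i + s) %% n + 1)
  as.vector(shifted)  # read column by column: one rotated block after another
}

rotated_seq(5, 3)
```

Each column of the intermediate matrix is one rotation, so dropping the as.vector() call returns the blocks side by side instead.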
I would start off by making a matrix with one more row than the length of the series.
> lseries <- 5
> nreps <- 3
> (values <- matrix(1:lseries, nrow = lseries + 1, ncol = nreps))
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 4
[3,] 3 4 5
[4,] 4 5 1
[5,] 5 1 2
[6,] 1 2 3
This may throw a warning (In matrix(1:lseries, nrow = lseries + 1, ncol = nreps) : data length [5] is not a sub-multiple or multiple of the number of rows [6]) which you can ignore. Note, the first 1:lseries rows have the data you want. We can get the final result using:
> as.vector(values[1:lseries, ])
[1] 1 2 3 4 5 2 3 4 5 1 3 4 5 1 2
Here's a method to get a matrix with each of these rotations as a row
matrix(1:5, 5, 6, byrow=TRUE)[, -6]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 2 3 4 5 1
[3,] 3 4 5 1 2
[4,] 4 5 1 2 3
[5,] 5 1 2 3 4
or turn it into a list
split.default(matrix(1:5, 5, 6, byrow=TRUE)[, -6], 1:5)
$`1`
[1] 1 2 3 4 5
$`2`
[1] 2 3 4 5 1
$`3`
[1] 3 4 5 1 2
$`4`
[1] 4 5 1 2 3
$`5`
[1] 5 1 2 3 4
or into a vector with c
c(matrix(1:5, 5, 6, byrow=TRUE)[, -6])
[1] 1 2 3 4 5 2 3 4 5 1 3 4 5 1 2 4 5 1 2 3 5 1 2 3 4
For the sake of variety, here is a second method to return the vector:
# construct the larger vector
temp <- rep(1:5, 6)
# for each value x, find its positions with which() and pick the (x+1)th occurrence as the position to drop
temp[-sapply(1:5, function(x) which(temp == x)[x+1])]
[1] 1 2 3 4 5 2 3 4 5 1 3 4 5 1 2 4 5 1 2 3 5 1 2 3 4

convert rows after column

I have a CSV file which reads like this
1 5
2 3
3 2
4 6
5 3
6 7
7 2
8 1
9 1
What I want is this:
1 5 4 6 7 2
2 3 5 3 8 1
3 2 6 7 9 1
That is, after every third row, I want the next block of values placed side by side as new columns. Any advice?
Thanks a lot
Here's a way to do this with matrix indexing. It's a bit strange, but I find it interesting so I will post it.
You want an index matrix, with indices as follows. This gives the order of your data as a matrix (column-major order):
1, 1
2, 1
3, 1
1, 2
2, 2
3, 2
4, 1
...
8, 2
9, 2
This gives the pattern that you need to select the elements. Here's one approach to building such a matrix. Say that your data is in the object dat, a data frame or matrix:
m <- matrix(
c(
outer(rep(1:3, 2), seq(0,nrow(dat)-1,by=3), FUN='+'),
rep(rep(1:2, each=3), nrow(dat)/3)
),
ncol=2
)
The outer expression is the first column of the desired index matrix, and the rep expression is the second column. Now just index dat with this index matrix, and build a result matrix with three rows:
matrix(dat[m], nrow=3)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 5 4 6 7 2
## [2,] 2 3 5 3 8 1
## [3,] 3 2 6 7 9 1
a <- read.table(text = "1 5
2 3
3 2
4 6
5 3
6 7
7 2
8 1
9 1")
(seq_len(nrow(a))-1) %/% 3
# [1] 0 0 0 1 1 1 2 2 2
split(a, (seq_len(nrow(a))-1) %/% 3)
# $`0`
# V1 V2
# 1 1 5
# 2 2 3
# 3 3 2
# $`1`
# V1 V2
# 4 4 6
# 5 5 3
# 6 6 7
# $`2`
# V1 V2
# 7 7 2
# 8 8 1
# 9 9 1
do.call(cbind,split(a, (seq_len(nrow(a))-1) %/% 3))
# 0.V1 0.V2 1.V1 1.V2 2.V1 2.V2
# 1 1 5 4 6 7 2
# 2 2 3 5 3 8 1
# 3 3 2 6 7 9 1
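The same reshaping can also be expressed with array and aperm, treating the data as a stack of blocks. This is a sketch only, assuming the number of rows is an exact multiple of the block size; wrap_rows() and block are illustrative names, and the data frame a is rebuilt here so the example runs on its own.

```r
a <- data.frame(V1 = 1:9, V2 = c(5, 3, 2, 6, 3, 7, 2, 1, 1))

# wrap_rows() and block are hypothetical names used for illustration
wrap_rows <- function(dat, block = 3) {
  stopifnot(nrow(dat) %% block == 0)  # assumes complete blocks only
  # After t(), values are stored row by row: dims are (column, row in block, block)
  arr <- array(t(as.matrix(dat)), dim = c(ncol(dat), block, nrow(dat) / block))
  # Put "row in block" first, then flatten the (column, block) pairs into columns
  matrix(aperm(arr, c(2, 1, 3)), nrow = block)
}

wrap_rows(a)
```

Unlike the split and cbind version, this returns a bare matrix without column names.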
