Create dataframe of all array indices in R


Using R, I'm trying to construct a dataframe of the row and col numbers of a given matrix. E.g., if
a <- matrix(c(1:15), nrow=5, ncol=3)
then I'm looking to construct a dataframe that gives:
row col
1 1
1 2
1 3
. .
5 1
5 2
5 3
What I've tried:
row <- matrix(row(a), ncol=1, nrow=dim(a)[1]*dim(a)[2], byrow=T)
col <- matrix(col(a), ncol=1, nrow=dim(a)[1]*dim(a)[2], byrow=T)
out <- cbind(row, col)
colnames(out) <- c("row", "col")
results in:
row col
[1,] 1 1
[2,] 2 1
[3,] 3 1
[4,] 4 1
[5,] 5 1
[6,] 1 2
[7,] 2 2
[8,] 3 2
[9,] 4 2
[10,] 5 2
[11,] 1 3
[12,] 2 3
[13,] 3 3
[14,] 4 3
[15,] 5 3
Which isn't what I'm looking for, as the sequence of rows and cols is suddenly reversed, even though I specified "byrow=T". I don't see if and where I'm making a mistake, but would hugely appreciate suggestions to overcome this problem. Thanks in advance!

I'd use expand.grid on the vectors 1:ncol and 1:nrow, then flip the columns with [,2:1] to get them in the order you want:
> expand.grid(seq(ncol(a)),seq(nrow(a)))[,2:1]
Var2 Var1
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
10 4 1
11 4 2
12 4 3
13 5 1
14 5 2
15 5 3
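expand.grid varies its first argument fastest, which is why the column indices go first and the result is then flipped. Here is a small sketch of the same idea with named arguments, so the output columns come out as row and col instead of Var2 and Var1. index_grid is a hypothetical helper name, not part of the answer:

```r
# Hypothetical helper: all (row, col) index pairs of a matrix in row-major order.
index_grid <- function(m) {
  # expand.grid varies its first argument fastest, so put col first ...
  g <- expand.grid(col = seq_len(ncol(m)), row = seq_len(nrow(m)))
  # ... then reorder the columns by name so row comes first
  g[, c("row", "col")]
}

a <- matrix(1:15, nrow = 5, ncol = 3)
idx <- index_grid(a)
head(idx, 4)
```

Selecting the columns by name rather than with [,2:1] keeps the intent readable if more arguments are added later.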

Use row and col, but manipulate their output ordering directly: each returns a matrix of the same shape as the input, holding the row (or column) index of every cell. Because as.vector reads a matrix column by column, use t to get the non-default row-major order you want:
data.frame(row = as.vector(t(row(a))), col = as.vector(t(col(a))))
row col
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
10 4 1
11 4 2
12 4 3
13 5 1
14 5 2
15 5 3
Or, as a matrix not a data.frame:
cbind(as.vector(t(row(a))), as.vector(t(col(a))))
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 1 3
[4,] 2 1
[5,] 2 2
[6,] 2 3
[7,] 3 1
[8,] 3 2
[9,] 3 3
[10,] 4 1
[11,] 4 2
[12,] 4 3
[13,] 5 1
[14,] 5 2
[15,] 5 3
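Why the transpose is needed, and why byrow = T in the question had no visible effect: a matrix is stored column by column, so as.vector (and matrix() when it reads another matrix) walks down each column first. With a single output column, byrow = TRUE places the same values in the same order. A small check, reusing the question's a:

```r
a <- matrix(1:15, nrow = 5, ncol = 3)

# Column-major: row indices cycle 1..5 once per column
col_major <- as.vector(row(a))
# Transposing first makes the row index change slowest (row-major order)
row_major <- as.vector(t(row(a)))

# With ncol = 1, byrow = TRUE fills the same positions in the same order
same <- identical(matrix(row(a), ncol = 1, byrow = TRUE),
                  matrix(row(a), ncol = 1))

col_major  # 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
row_major  # 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
same       # TRUE
```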

You may want to have a look at ?expand.grid, which does just about exactly what you want to achieve.

Since there are many ways to skin a cat, I'll chip in with yet another variant based on rep:
data.frame(row=rep(seq(nrow(a)), each=ncol(a)), col=rep(seq(ncol(a)), nrow(a)))
...but to announce a "winner", I think you need to time the solutions:
# Make up a huge matrix...
a <- matrix(runif(1e7), 1e4)
system.time( a1<-data.frame(row = as.vector(t(row(a))),
col = as.vector(t(col(a)))) ) # 0.68 secs
system.time( a2<-expand.grid(col = seq(ncol(a)),
row = seq(nrow(a)))[,2:1] ) # 0.49 secs
system.time( a3<-data.frame(row=rep(seq(nrow(a)), each=ncol(a)),
col=rep(seq(ncol(a)), nrow(a))) ) # 0.59 secs
identical(a1, a2) && identical(a1, a3) # TRUE
...so it seems @Spacedman has the speediest solution!

Related

If n occurs less than 5 times, change to n-1

In R, if I have a df of numbers c(1,1,1,2,3,3,3,3,3,3,4,4,4,5,5), how do I change n to n-1 if n occurs fewer than 5 times? Example input x and output out:
x out
1 1 1
2 1 1
3 1 1
4 2 1
5 3 3
6 3 3
7 3 3
8 3 3
9 3 3
10 3 3
11 4 3
12 4 3
13 4 3
As the first value in the column (it will also be the minimum value), 1 would stay the same. However, if it would make the coding easier, the 1s can change to 0, but the 2 would still change to 1.
EDIT:
How can I repeat this if the changed values now occur <5 times? For example
# x out
# [1,] 1 0
# [2,] 1 0
# [3,] 1 0
# [4,] 2 1
# [5,] 3 3
# [6,] 3 3
# [7,] 3 3
# [8,] 3 3
# [9,] 3 3
#[10,] 3 3
#[11,] 4 3
#[12,] 4 3
#[13,] 4 3
#[14,] 5 3
#[15,] 5 3
#[16,] 5 3
#[17,] 6 3
#[18,] 6 3
#[19,] 6 3
#[20,] 7 3
#[21,] 7 3

Using ave:
x <- c(1,1,1,2,3,3,3,3,3,3,4,4,4)
pmax(x - +(ave(x, x, FUN =length) < 5), 1)
#[1] 1 1 1 1 3 3 3 3 3 3 3 3 3
If the same value can appear in separate, non-adjacent runs of x and each run should be counted on its own, use rle for grouping:
pmax(x - +(ave(x,with(rle(x),rep(seq_along(values), lengths)),FUN =length) < 5),1)
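The difference between grouping by value and grouping by run only shows up when a value reappears after a different one. A small check with a made-up vector x2 (not from the question), using the same formula as above with and without the rle grouping:

```r
x2 <- c(2, 2, 3, 3, 3, 2, 2, 2)

# Group by value: both runs of 2 are pooled (5 occurrences in total)
by_value <- pmax(x2 - +(ave(x2, x2, FUN = length) < 5), 1)

# Group by run: each stretch of equal adjacent values is its own group
run_id <- with(rle(x2), rep(seq_along(values), lengths))
by_run <- pmax(x2 - +(ave(x2, run_id, FUN = length) < 5), 1)

by_value  # 2 2 2 2 2 2 2 2
by_run    # 1 1 2 2 2 1 1 1
```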

You can use rle on the sorted x to find how many times each number occurs, and subtract 1 where it occurs fewer than 5 times; order(i) then restores the original order.
x <- c(1,1,1,2,3,3,3,3,3,3,4,4,4,5,5)
i <- order(x)
y <- rle(x[i])
y$values <- y$values - (y$lengths < 5)
cbind(x,out=inverse.rle(y)[order(i)])
# x out
# [1,] 1 0
# [2,] 1 0
# [3,] 1 0
# [4,] 2 1
# [5,] 3 3
# [6,] 3 3
# [7,] 3 3
# [8,] 3 3
# [9,] 3 3
#[10,] 3 3
#[11,] 4 3
#[12,] 4 3
#[13,] 4 3
#[14,] 5 4
#[15,] 5 4

Another solution
library(tidyverse)
x <- c(1,1,1,2,3,3,3,3,3,3,4,4,4,5,5)
df <- tibble(x = x)
df %>%
group_by(x) %>%
mutate(n = n()) %>%
ungroup %>%
transmute(x,
out = if_else((x != min(x, na.rm = T) & n < 5), x - 1, x))
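For the EDIT (repeating the step until no value occurs fewer than 5 times), one option is to loop the ave step until the vector stops changing. This is a sketch, not taken from the answers: shift_rare is a hypothetical helper, and it floors values at the starting minimum so the smallest value is never decremented (which gives 1 rather than 0 for the first four elements of the EDIT example). The loop terminates because each pass either leaves x unchanged or lowers some values, and none can drop below the floor.

```r
# Hypothetical helper: repeat "n -> n-1 if n occurs fewer than k times"
# until a pass changes nothing. Values never go below the starting minimum.
shift_rare <- function(x, k = 5) {
  floor_val <- min(x)
  repeat {
    counts <- ave(x, x, FUN = length)          # occurrences of each element's value
    new_x  <- pmax(x - (counts < k), floor_val)
    if (identical(new_x, x)) break              # stable: nothing changed this pass
    x <- new_x
  }
  x
}

x <- c(1,1,1,2,3,3,3,3,3,3,4,4,4,5,5,5,6,6,6,7,7)
shift_rare(x)  # 1 1 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
```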

R - Counting through a vector and refreshing at a certain value?

Let's say I have a vector
vec <- c(3,0,1,1,0,3,0,1,3,0,0,0,3)
And I want to be able to count through this vector using the value 3 as the refresh point. So, the output I want is
vec out
[1,] 3 1
[2,] 0 2
[3,] 1 3
[4,] 1 4
[5,] 0 5
[6,] 3 1
[7,] 0 2
[8,] 1 3
[9,] 3 1
[10,] 0 2
[11,] 0 3
[12,] 0 4
[13,] 3 1
How would I do this in R, preferably without using loops?

With base R, you can do:
ave(vec, cumsum(vec == 3), FUN = seq_along)
[1] 1 2 3 4 5 1 2 3 1 2 3 4 1

An option using data.table::rowid:
data.table::rowid(cumsum(vec==3L))

As another idea, we can locate, for each element of vec, the index of the most recent 3 at or before it:
last3 = cummax((vec == 3) * seq_along(vec))
last3
# [1] 1 1 1 1 1 6 6 6 9 9 9 9 13
Then subtract these from the element positions and add 1:
seq_along(vec) - last3 + 1 ## `.. - pmax(last3, 1) ..` if `vec[1] != 3`
# [1] 1 2 3 4 5 1 2 3 1 2 3 4 1
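The cumsum grouping from the base R answer can be wrapped into a small function with the reset value as a parameter. count_since is a hypothetical name. Passing seq_along(v) as the first argument of ave keeps the result integer, and a vector that does not start with the reset value simply gets its own first group (cumsum is 0 there), so no pmax correction is needed:

```r
# Hypothetical helper: position within each stretch that starts at `reset`
count_since <- function(v, reset = 3) {
  # cumsum(v == reset) increments at every reset value, giving a group id;
  # elements before the first reset value form group 0
  ave(seq_along(v), cumsum(v == reset), FUN = seq_along)
}

vec <- c(3,0,1,1,0,3,0,1,3,0,0,0,3)
count_since(vec)            # 1 2 3 4 5 1 2 3 1 2 3 4 1
count_since(c(0, 0, 3, 1))  # 1 2 1 2
```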

What is the best way to tidy a matrix in R

Is there a best practice means of "tidying" a matrix/array? By "tidy" in this context I mean
one row per element of the matrix
one column per dimension. The elements of these columns give the "coordinates" of the matrix element stored on that row
I have an example here for a 2d matrix, but ideally this would also work with an array (this example works for mm <- array(1:18, c(3,3,3)), but I thought that would be too much to paste in here)
mm <- matrix(1:9, nrow = 3)
mm
#> [,1] [,2] [,3]
#> [1,] 1 4 7
#> [2,] 2 5 8
#> [3,] 3 6 9
inds <- which(mm > -Inf, arr.ind = TRUE)
cbind(inds, value = mm[inds])
#> row col value
#> [1,] 1 1 1
#> [2,] 2 1 2
#> [3,] 3 1 3
#> [4,] 1 2 4
#> [5,] 2 2 5
#> [6,] 3 2 6
#> [7,] 1 3 7
#> [8,] 2 3 8
#> [9,] 3 3 9

as.data.frame.table
One way to convert from wide to long is the following. See ?as.data.frame.table for more information. No packages are used.
mm <- matrix(1:9, 3)
long <- as.data.frame.table(mm)
The code gives this data.frame:
> long
Var1 Var2 Freq
1 A A 1
2 B A 2
3 C A 3
4 A B 4
5 B B 5
6 C B 6
7 A C 7
8 B C 8
9 C C 9
numbers
If you prefer row and column numbers:
long[1:2] <- lapply(long[1:2], as.numeric)
giving:
> long
Var1 Var2 Freq
1 1 1 1
2 2 1 2
3 3 1 3
4 1 2 4
5 2 2 5
6 3 2 6
7 1 3 7
8 2 3 8
9 3 3 9
names
Note that A, B, C, ... were used above because there were no row or column names; they would have been used if present. That is, had there been row and column names and dimension names, the output would look like this:
mm2 <- array(1:9, c(3, 3), dimnames = list(A = c("a", "b", "c"), B = c("x", "y", "z")))
as.data.frame.table(mm2, responseName = "Val")
giving:
A B Val
1 a x 1
2 b x 2
3 c x 3
4 a y 4
5 b y 5
6 c y 6
7 a z 7
8 b z 8
9 c z 9
3d
Here is a 3d example:
as.data.frame.table(array(1:8, c(2,2,2)))
giving:
Var1 Var2 Var3 Freq
1 A A A 1
2 B A A 2
3 A B A 3
4 B B A 4
5 A A B 5
6 B A B 6
7 A B B 7
8 B B B 8
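The as.numeric conversion shown earlier also works for higher-dimensional arrays. A sketch that converts all index columns at once; it uses as.integer instead of as.numeric so the indices stay integers, and relies on the factor level codes following the dimension order (the default A, B, C, ... here, or the dimnames as given):

```r
arr <- array(1:18, c(3, 3, 2))

# One row per element; index columns come back as factors (levels A, B, C, ...)
long <- as.data.frame.table(arr, responseName = "value")

# Replace every index column (one per dimension) with its integer level code
idx_cols <- seq_along(dim(arr))
long[idx_cols] <- lapply(long[idx_cols], as.integer)

head(long, 4)
```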
2d only
For 2d one can alternatively use row and col:
sapply(list(row(mm), col(mm), mm), c)
or
cbind(c(row(mm)), c(col(mm)), c(mm))
Either of these give this matrix:
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 1 2
[3,] 3 1 3
[4,] 1 2 4
[5,] 2 2 5
[6,] 3 2 6
[7,] 1 3 7
[8,] 2 3 8
[9,] 3 3 9

Another method is to use arrayInd together with cbind like this.
# a 3 X 3 X 2 array
mm <- array(1:18, dim=c(3,3,2))
Similar to your code, but with the more natural arrayInd function, we have
# get array in desired format
myMat <- cbind(c(mm), arrayInd(seq_along(mm), .dim=dim(mm)))
# add column names
colnames(myMat) <- c("values", letters[24:26])
which returns
myMat
values x y z
[1,] 1 1 1 1
[2,] 2 2 1 1
[3,] 3 3 1 1
[4,] 4 1 2 1
[5,] 5 2 2 1
[6,] 6 3 2 1
[7,] 7 1 3 1
[8,] 8 2 3 1
[9,] 9 3 3 1
[10,] 10 1 1 2
[11,] 11 2 1 2
[12,] 12 3 1 2
[13,] 13 1 2 2
[14,] 14 2 2 2
[15,] 15 3 2 2
[16,] 16 1 3 2
[17,] 17 2 3 2
[18,] 18 3 3 2
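The arrayInd approach generalizes to any number of dimensions. A sketch wrapping it into a data frame; tidy_array and the dim1, dim2, ... column names are hypothetical choices, not from the answer:

```r
# Hypothetical helper: one row per array element, one column per dimension
tidy_array <- function(arr) {
  # arrayInd turns linear positions 1..length(arr) into array subscripts,
  # in the same column-major order that c(arr) uses
  idx <- arrayInd(seq_along(arr), .dim = dim(arr))
  out <- as.data.frame(idx)
  names(out) <- paste0("dim", seq_len(ncol(idx)))
  out$value <- c(arr)
  out
}

mm <- array(1:18, dim = c(3, 3, 2))
tm <- tidy_array(mm)
head(tm, 4)
```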

All possible combinations over groups

I have 5 groups: G1, G2,…,G5 with n1,n2,…,n5 elements in each group respectively. I select 2 elements from each of the 4 groups and 1 element from the 5th group. How do I generate all possible combinations in R?

The question does not specify whether the groups are mutually exclusive, so assume:
1. the groups are mutually exclusive;
2. the subsets of groups (n1, n2, ...) are filled from the same elements;
3. just for the sake of argument, |G1|=|G2|=|G3|=5 (adjust the code below for groups with different numbers of elements).
The following is a three-group mock-up answer that can be generalized to an arbitrary number of groups. Assume the group names are G1, G2, G3.
library(causfinder)
gctemplate(5,2,2) # Elements are coded as: 1,2,3,4,5; |sub-G1|=2; |sub-G2|=2; |sub-G3|=5-(2+2)=1
# In the following table, each number represents a unique element. (SOLUTION ENDED!)
My package (causfinder) is not on CRAN, so the code of the function gctemplate is given below.
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5 sub-G1={1,2} sub-G2={3,4} sub-G3={5}
[2,] 1 2 3 5 4
[3,] 1 2 4 5 3 sub-G1={1,2} sub-G2={4,5} sub-G3={3}
[4,] 1 3 2 4 5
[5,] 1 3 2 5 4
[6,] 1 3 4 5 2
[7,] 1 4 2 3 5
[8,] 1 4 2 5 3
[9,] 1 4 3 5 2
[10,] 1 5 2 3 4
[11,] 1 5 2 4 3
[12,] 1 5 3 4 2
[13,] 2 3 1 4 5
[14,] 2 3 1 5 4
[15,] 2 3 4 5 1
[16,] 2 4 1 3 5
[17,] 2 4 1 5 3
[18,] 2 4 3 5 1
[19,] 2 5 1 3 4
[20,] 2 5 1 4 3
[21,] 2 5 3 4 1
[22,] 3 4 1 2 5
[23,] 3 4 1 5 2
[24,] 3 4 2 5 1
[25,] 3 5 1 2 4
[26,] 3 5 1 4 2
[27,] 3 5 2 4 1
[28,] 4 5 1 2 3
[29,] 4 5 1 3 2
[30,] 4 5 2 3 1
The code of gctemplate:
gctemplate <- function(nvars, ncausers, ndependents){
independents <- combn(nvars, ncausers)
patinajnumber <- dim(combn(nvars - ncausers, ndependents))[[2]]
independentspatinajednumber <- dim(combn(nvars, ncausers))[[2]]*patinajnumber
dependents <- matrix(, nrow = dim(combn(nvars, ncausers))[[2]]*patinajnumber, ncol = ndependents)
for (i in as.integer(1:dim(combn(nvars, ncausers))[[2]])){
dependents[(patinajnumber*(i-1)+1):(patinajnumber*i),] <- t(combn(setdiff(seq(1:nvars), independents[,i]), ndependents))
}
independentspatinajed <- matrix(, nrow = dim(combn(nvars, ncausers))[[2]]*patinajnumber, ncol = ncausers)
for (i in as.integer(1:dim(combn(nvars, ncausers))[[2]])){
for (j in as.integer(1:patinajnumber)){
independentspatinajed[(i-1)*patinajnumber+j,] <- independents[,i]
}}
independentsdependents <- cbind(independentspatinajed, dependents)
others <- matrix(, nrow = dim(combn(nvars, ncausers))[[2]]*patinajnumber, ncol = nvars - ncausers - ndependents)
for (i in as.integer(1:((dim(combn(nvars, ncausers))[[2]])*patinajnumber))){
others[i, ] <- setdiff(seq(1:nvars), independentsdependents[i,])
}
causalitiestemplate <- cbind(independentsdependents, others)
causalitiestemplate
}
That is the solution for G1, G2, G3; the same logic generalizes to the five-group case.
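A more literal reading of the question (pick 2 elements from each of four groups and 1 from the fifth) can also be handled in base R with combn and expand.grid. This is a sketch under assumptions: each group is a separate vector of distinct elements, the order within a pick does not matter, and groups and k below are made-up example data. The number of rows is the product of choose(n_i, k_i) over the groups, so it grows quickly.

```r
# Made-up example groups; replace with your own G1..G5
groups <- list(G1 = c("a", "b", "c"), G2 = c("d", "e", "f"),
               G3 = c("g", "h", "i"), G4 = c("j", "k", "l"),
               G5 = c("m", "n"))
k <- c(2, 2, 2, 2, 1)  # how many elements to pick from each group

# All k-subsets within each group, as a list of vectors per group
# (character vectors avoid combn's "single number means 1:n" shortcut)
subsets <- Map(function(g, n) combn(g, n, simplify = FALSE), groups, k)

# Every way of choosing one subset from each group
grid <- expand.grid(lapply(subsets, seq_along))

# One row per combination: the chosen elements of G1, ..., G5 side by side
combos <- t(apply(grid, 1, function(r) unlist(Map(function(s, i) s[[i]], subsets, r))))

dim(combos)  # 162 9
```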

Create numbered sequence for occurrences of a given nesting variable

I'm hoping to add a variable to a data set that numbers the occurrences of each value of a certain grouping variable. For example:
ids <- c(rep(1,4),rep(2,6),rep(3,2))
I want another variable that counts the occurrences of each id, creating a vector like this:
1,2,3,4,1,2,3,4,5,6,1,2
With them combined looking something like this:
ids count
1 1 1
2 1 2
3 1 3
4 1 4
5 2 1
6 2 2
7 2 3
8 2 4
9 2 5
10 2 6
11 3 1
12 3 2
Any ideas? Many thanks!

I suggest ave with seq_along
ids <- c(rep(1,4),rep(2,6),rep(3,2))
count <- ave(ids,ids, FUN=seq_along)
cbind(ids, count)
# ids count
# [1,] 1 1
# [2,] 1 2
# [3,] 1 3
# [4,] 1 4
# [5,] 2 1
# [6,] 2 2
# [7,] 2 3
# [8,] 2 4
# [9,] 2 5
# [10,] 2 6
# [11,] 3 1
# [12,] 3 2

Or, if ids is sorted:
cbind(ids, count=sequence(unname(table(ids))))
# ids count
# [1,] 1 1
# [2,] 1 2
# [3,] 1 3
# [4,] 1 4
# [5,] 2 1
# [6,] 2 2
# [7,] 2 3
# [8,] 2 4
# [9,] 2 5
# [10,] 2 6
# [11,] 3 1
# [12,] 3 2
Or
cbind(ids,within.list(rle(ids), lengths <- sequence(lengths))$lengths)
Or
library(data.table)
dt <- as.data.table(ids)
dt[,count:=seq_len(.N), by=ids]
Or
library(dplyr)
dat <- data.frame(ids)
dat %>%
group_by(ids) %>%
mutate(count=row_number())
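To see why the table-based variant needs sorted ids while ave does not, here is a small check with a made-up unsorted vector ids2 (not from the question):

```r
# Hypothetical unsorted ids
ids2 <- c(2, 1, 2, 1, 1)

# ave numbers each id's occurrences in place, whatever the order
by_ave <- ave(ids2, ids2, FUN = seq_along)

# table() counts per id in sorted id order, so the sequence only lines up
# with ids2 when equal ids are contiguous and sorted
by_table <- sequence(unname(table(ids2)))

cbind(ids2, by_ave, by_table)
```

Here by_ave is 1 1 2 2 3, while by_table is 1 2 3 1 2, which no longer matches the ids row by row.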