Create 'usable' bins from a vector in R

I have a numeric vector of integers which I want to transform into "bins".
I want these bins to be used as sample frames from which I can then sample again, uniformly.
So far I can do both using findInterval but I am looking for a way to do it with cut.
Let's consider a random vector of integers which will be split into equally sized intervals of length 10:
df = sample(1:100,10)
df
[1] 81 11 38 95 45 14 10 61 96 88
Using findInterval I get the bins and an approximate way of sampling:
breaks = seq(1,max(df+1),by=10)
b <- findInterval(df, breaks)
b
[1] 9 2 4 10 5 2 1 7 10 9
# If b is equal to 1 or 10, use ifelse() to prevent the sampling range from leaking outside [1,100]
sam <- round(runif(10,ifelse(b==1,10*b-9,10*b-10),ifelse(b==10,10*b,10*b+10)))
sam
[1] 85 14 39 94 50 16 7 63 93 85
Using cut I get the intervals:
breaks = seq(1,max(df+1),by=10)
cut(df,breaks,right=TRUE)
[1] (71,81] (1,11] (31,41] <NA> (41,51] (11,21] (1,11] (51,61] <NA> (81,91]
Levels: (1,11] (11,21] (21,31] (31,41] (41,51] (51,61] (61,71] (71,81] (81,91]
But I don't know how to use those values as intervals from which to sample.
If there is another approach, I would be interested to know!

Good Question! I will give you a completely different approach.
So basically you want to perform Latin hypercube sampling, i.e. stratified uniform sampling on the interval [0,100] with bins of width 10.
For this, it is easier to install the lhs package and use its randomLHS function to perform the stratified sampling.
First step: Generate one uniform draw from each of the 10 strata (deciles of the unit interval), as many times as you want. In this example, let's do it 5 times:
library(lhs)
X <- randomLHS(10, 5)
X
[,1] [,2] [,3] [,4] [,5]
[1,] 0.92154144 0.22185959 0.49953326 0.66248165 0.79035832
[2,] 0.47571700 0.05894016 0.55883326 0.34875162 0.98831829
[3,] 0.57738486 0.64525528 0.04955733 0.50939147 0.46297294
[4,] 0.17578838 0.83843074 0.27138703 0.87421301 0.16401042
[5,] 0.03850768 0.40746004 0.69518073 0.23487653 0.55537945
[6,] 0.83942905 0.52957416 0.84952231 0.14031915 0.84956654
[7,] 0.22802502 0.79911728 0.76789194 0.09788194 0.08667802
[8,] 0.61821268 0.93088726 0.30789950 0.95831993 0.36903120
[9,] 0.70391230 0.11445154 0.97976851 0.42027836 0.61097786
[10,] 0.31385709 0.33557430 0.18389684 0.70124986 0.27601550
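To make the stratification concrete, here is a minimal base-R sketch of what a single Latin hypercube column looks like: one uniform draw inside each of n equal-width strata, in shuffled order. It only illustrates the idea and is not the implementation used inside the lhs package; the names n, lhs_col and stratum are made up for this example.

```r
set.seed(42)
n <- 10

# sample(n) shuffles the stratum numbers 1..n; subtracting a uniform draw
# places each value strictly inside its own stratum ((k - 1)/n, k/n).
lhs_col <- (sample(n) - runif(n)) / n

# Recover the stratum each draw fell into; every stratum appears exactly once.
stratum <- ceiling(lhs_col * n)
sort(stratum)
```

Repeating this for several columns with independent shuffles gives a matrix with the same structure as X above.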
Second step: Each column of X is stratified, but its rows are in random order. Therefore, to show the final stratified draws, we sort each column.
Y <- apply(X,2, function(x) sort(round(x*100)))
Y
[,1] [,2] [,3] [,4] [,5]
[1,] 4 6 5 10 9
[2,] 18 11 18 14 16
[3,] 23 22 27 23 28
[4,] 31 34 31 35 37
[5,] 48 41 50 42 46
[6,] 58 53 56 51 56
[7,] 62 65 70 66 61
[8,] 70 80 77 70 79
[9,] 84 84 85 87 85
[10,] 92 93 98 96 99
NB: I have rounded only to make the strata obvious; there is no need to call round if you are happy to have non-integer draws as output.
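For the original question about cut, one possible base-R sketch is shown below. It assumes breaks that span the full range, so the bins are (0,10], (10,20], ..., (90,100] and no value falls into NA; asking cut for integer codes (labels = FALSE) lets the bin index look up its own bounds in breaks. The variable names are illustrative.

```r
set.seed(1)
x <- c(81, 11, 38, 95, 45, 14, 10, 61, 96, 88)

# Assumption: breaks cover the whole range, giving bins (0,10], ..., (90,100]
breaks <- seq(0, 100, by = 10)

# Integer bin index instead of a factor of interval labels
bin <- cut(x, breaks, labels = FALSE)

# Bounds of the bin each value fell into
lower <- breaks[bin]
upper <- breaks[bin + 1]

# One uniform integer draw from (lower, upper] per element
sam <- ceiling(runif(length(x), lower, upper))
cbind(x, bin, lower, upper, sam)
```

Using ceiling rather than round keeps every draw inside its own bin and gives each of the ten integers in a bin the same probability.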

Related

Moving rows between subarrays

I have a number of subarrays, say 2 (for simplicity), each with the same number of rows and columns. Each spot in the subarrays is occupied by a number in [1, 10].
What I would like to do is move rows randomly between subarrays according to some rate of movement m in [0, 1]. m = 0 corresponds to no movement, while m = 1 means that any rows across all subarrays can be moved.
I take inspiration from:
How to swap a number of the values between 2 rows in R
but my problem is a bit different than this. I do know that sample() would be needed here.
Is there an easy way to go about accomplishing this?
This doesn't do it, but I believe I'm on the right track anyway.
m <- 0.2
a <- array(dim = c(5, 5, 2)) # 5 rows, 5 columns, 2 subarrays
res <- rep(sample(nrow(a), size = ceiling(nrow(a)*m), replace = FALSE)) # sample 20% of rows from array a.
Any assistance is appreciated.
It is significantly easier if you can use a matrix (2-dim array).
set.seed(2)
m <- 0.2
d <- c(10, 4)
a <- array(sample(prod(d)), dim = d)
a
# [,1] [,2] [,3] [,4]
# [1,] 8 17 14 1
# [2,] 28 37 40 26
# [3,] 22 38 16 29
# [4,] 7 35 3 32
# [5,] 34 11 23 4
# [6,] 36 33 19 31
# [7,] 5 24 30 13
# [8,] 39 6 27 25
# [9,] 15 10 12 9
# [10,] 18 2 21 20
(I'm going to set the seed again to something that conveniently gives me something "interesting" to show.)
set.seed(2)
ind <- which(runif(d[1]) < m)
ind
# [1] 1 4 7
The first randomness, runif, is compared against m and generates the indices that may change. The second randomness, sample below, takes those indices and possibly reorders them. (In this case, it reorders "1,4,7" to "4,1,7", meaning the third of the rows-that-may-change will be left unchanged.)
a[ind,] <- a[sample(ind),]
a
# [,1] [,2] [,3] [,4]
# [1,] 7 35 3 32 # <-- row 4
# [2,] 28 37 40 26
# [3,] 22 38 16 29
# [4,] 8 17 14 1 # <-- row 1
# [5,] 34 11 23 4
# [6,] 36 33 19 31
# [7,] 5 24 30 13 # <-- row 7, unchanged
# [8,] 39 6 27 25
# [9,] 15 10 12 9
# [10,] 18 2 21 20
Note that this is probabilistic, which means a probability of 0.2 does not guarantee you 20% (or even any) of the rows will be swapped.
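A quick simulation can illustrate how variable the number of candidate rows is. This is a sketch under the same setup as above (10 rows, m = 0.2); the names n_rows and counts are introduced only for this example.

```r
set.seed(1)
m <- 0.2
n_rows <- 10

# Number of rows flagged as "may change" in each of 10,000 simulated runs
counts <- replicate(10000, sum(runif(n_rows) < m))

mean(counts)        # should be close to n_rows * m = 2
mean(counts == 0)   # should be close to (1 - m)^n_rows, about 0.107
table(counts)       # full distribution of the number of flagged rows
```

So with these settings roughly one run in ten flags no rows at all.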
(Since I'm guessing you'd really like to preserve your 3-dim (or even n-dim) array, you might be able to use aperm to transfer between array <--> matrix.)
EDIT 1
As an alternative to the probabilistic use of runif, you can use:
ind <- head(sample(d[1]), n = d[1]*m)
to get closer to your goal of "20%". Since d[1]*m will often not be an integer, head silently truncates/floors the number, so you'll get the price-is-right winner: closest to but not over your desired percentage.
EDIT 2
A reversible method for transforming an n-dimensional array into a matrix and back again. Caveat: though the logic appears solid, my testing has only included a couple arrays.
array2matrix <- function(a) {
  d <- dim(a)
  ind <- seq_along(d)
  a2 <- aperm(a, c(ind[2], ind[-2]))
  dim(a2) <- c(d[2], prod(d[-2]))
  a2 <- t(a2)
  attr(a2, "origdim") <- d
  a2
}
The reversal uses the "origdim" attribute if still present; this will work as long as your modifications to the matrix do not clear its attributes. (Simple row-swapping does not.)
matrix2array <- function(m, d = attr(m, "origdim")) {
  ind <- seq_along(d)
  m2 <- t(m)
  dim(m2) <- c(d[2], d[-2])
  aperm(m2, c(ind[2], ind[-2]))
}
(These two functions should probably do some more error-checks, such as is.null(d).)
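A minimal sketch of what such checks might look like is below. The transformation logic is the same as in the two functions above; the specific checks and error messages are assumptions about what is worth guarding against, not an exhaustive list.

```r
array2matrix <- function(a) {
  d <- dim(a)
  # Guard: the input must carry at least two dimensions
  if (is.null(d) || length(d) < 2L) {
    stop("'a' must be an array with at least 2 dimensions")
  }
  ind <- seq_along(d)
  a2 <- aperm(a, c(ind[2], ind[-2]))
  dim(a2) <- c(d[2], prod(d[-2]))
  a2 <- t(a2)
  attr(a2, "origdim") <- d
  a2
}

matrix2array <- function(m, d = attr(m, "origdim")) {
  # Guard: the original dimensions must be known and must fit the data
  if (is.null(d)) {
    stop("original dimensions are missing; supply 'd'")
  }
  if (prod(d) != length(m)) {
    stop("'d' does not match the number of elements in 'm'")
  }
  ind <- seq_along(d)
  m2 <- t(m)
  dim(m2) <- c(d[2], d[-2])
  aperm(m2, c(ind[2], ind[-2]))
}

# Round trip on a small array
a <- array(seq_len(40), dim = c(5, 4, 2))
b <- matrix2array(array2matrix(a))
all(b == a)
```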
A sample run:
set.seed(2)
dims <- 5:2
a <- array(sample(prod(dims)), dim=dims)
Quick show:
a[,,1,1:2,drop=FALSE]
# , , 1, 1
# [,1] [,2] [,3] [,4]
# [1,] 23 109 61 90
# [2,] 84 15 27 102
# [3,] 68 95 83 24
# [4,] 20 53 117 46
# [5,] 110 62 43 8
# , , 1, 2
# [,1] [,2] [,3] [,4]
# [1,] 118 25 14 93
# [2,] 65 21 16 77
# [3,] 87 82 3 38
# [4,] 92 12 78 17
# [5,] 49 4 75 80
The transformation:
m <- array2matrix(a)
dim(m)
# [1] 30 4
head(m)
# [,1] [,2] [,3] [,4]
# [1,] 23 109 61 90
# [2,] 84 15 27 102
# [3,] 68 95 83 24
# [4,] 20 53 117 46
# [5,] 110 62 43 8
# [6,] 67 47 1 54
Proof of reversibility:
identical(matrix2array(m), a)
# [1] TRUE
EDIT 3, "WRAP UP of all code"
Creating fake data:
dims <- c(5,4,2)
(a <- array(seq(prod(dims)), dim=dims))
# , , 1
# [,1] [,2] [,3] [,4]
# [1,] 1 6 11 16
# [2,] 2 7 12 17
# [3,] 3 8 13 18
# [4,] 4 9 14 19
# [5,] 5 10 15 20
# , , 2
# [,1] [,2] [,3] [,4]
# [1,] 21 26 31 36
# [2,] 22 27 32 37
# [3,] 23 28 33 38
# [4,] 24 29 34 39
# [5,] 25 30 35 40
(m <- array2matrix(a))
# [,1] [,2] [,3] [,4]
# [1,] 1 6 11 16
# [2,] 2 7 12 17
# [3,] 3 8 13 18
# [4,] 4 9 14 19
# [5,] 5 10 15 20
# [6,] 21 26 31 36
# [7,] 22 27 32 37
# [8,] 23 28 33 38
# [9,] 24 29 34 39
# [10,] 25 30 35 40
# attr(,"origdim")
# [1] 5 4 2
The random-swapping of rows. I'm using 50% here.
pct <- 0.5
nr <- nrow(m)
set.seed(3)
(ind1 <- sample(nr, size = ceiling(nr * pct)))
# [1] 2 8 4 3 9
(ind2 <- sample(ind1))
# [1] 3 2 9 8 4
m[ind1,] <- m[ind2,]
m
# [,1] [,2] [,3] [,4]
# [1,] 1 6 11 16
# [2,] 3 8 13 18
# [3,] 23 28 33 38
# [4,] 24 29 34 39
# [5,] 5 10 15 20
# [6,] 21 26 31 36
# [7,] 22 27 32 37
# [8,] 2 7 12 17
# [9,] 4 9 14 19
# [10,] 25 30 35 40
# attr(,"origdim")
# [1] 5 4 2
(Note that I pre-made ind1 and ind2 here, mostly to see what was going on internally. You can replace m[ind2,] with m[sample(ind1),] for the same effect.)
BTW: if we had instead used a seed of 2, we would notice that 2 rows are not swapped:
set.seed(2)
(ind1 <- sample(nr, size = ceiling(nr * pct)))
# [1] 2 7 5 10 6
(ind2 <- sample(ind1))
# [1] 6 2 5 10 7
Because of this, I chose a seed of 3 for demonstration. However, this may give the appearance of things not working. Lacking more controlling code, sample does not ensure that positions change: it is certainly reasonable to expect that "randomly swap rows" could randomly choose to move row 2 to row 2. Take for example:
set.seed(267)
(ind1 <- sample(nr, size = ceiling(nr * pct)))
# [1] 3 6 5 7 2
(ind2 <- sample(ind1))
# [1] 3 6 5 7 2
The first randomly chooses five rows, and then reorders them randomly into an unchanged order. (I suggest that if you want to force that they are all movements, you should ask a new question asking about just forcing a sample vector to change.)
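If every selected row really must move, one simple option is rejection sampling: reshuffle until no index stays in its original position. This is only a sketch of that idea; sample_moved and max_tries are hypothetical names, and other approaches (for example a cyclic shift of the selected indices) would also work.

```r
# Reshuffle 'ind' until no element stays in its original position.
# Hypothetical helper, not part of the answer's code above.
sample_moved <- function(ind, max_tries = 1000L) {
  if (length(ind) < 2L) {
    stop("need at least two indices to force every one to move")
  }
  for (i in seq_len(max_tries)) {
    # Index by a shuffled position vector to avoid sample()'s special
    # treatment of a single number
    perm <- ind[sample.int(length(ind))]
    if (all(perm != ind)) {
      return(perm)
    }
  }
  stop("no fully moved ordering found within max_tries")
}

set.seed(267)
ind1 <- c(3, 6, 5, 7, 2)
ind2 <- sample_moved(ind1)
rbind(ind1, ind2)
```

Roughly one random shuffle in three moves every element, so the loop usually ends after a few tries.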
Anyway, we can regain the original dimensionality with the second function:
(a2 <- matrix2array(m))
# , , 1
# [,1] [,2] [,3] [,4]
# [1,] 1 6 11 16
# [2,] 3 8 13 18
# [3,] 23 28 33 38
# [4,] 24 29 34 39
# [5,] 5 10 15 20
# , , 2
# [,1] [,2] [,3] [,4]
# [1,] 21 26 31 36
# [2,] 22 27 32 37
# [3,] 2 7 12 17
# [4,] 4 9 14 19
# [5,] 25 30 35 40
In the first plane of the array, rows 1 and 5 are unchanged; in the second plane, rows 1, 2, and 5 are unchanged. Five rows the same, five rows moved around (but otherwise unchanged within each row).

Extract Consecutive Pairs of Elements from a Vector and Place in a Matrix

This may be a simple question, but I cannot find how to produce sequential pairs of values from a vector, where each pair consists of the previous value and the next value, as rows of a two-column matrix. Example below:
C<-c(1 , 20 , 44 , 62 , 64 , 89 , 91, 100)
matrix example
newpairs
[,1] [,2]
[1,] 1 20
[2,] 20 44
[3,] 44 62
[4,] 62 64
[5,] 64 89
[6,] 89 91
[7,] 91 100
So when I try matrix it does not work, as the last element is not repeated with the new element:
newpairs <- matrix(C, ncol=2, byrow=TRUE)
newpairs
[,1] [,2]
[1,] 1 20
[2,] 44 62
[3,] 64 89
[4,] 91 100
I guess you can subset, but if the values of C change then you have to change which elements the subset drops or keeps. I have also tried functions that extract certain increments or every nth element. However, I would like to find a systematic way to create the first example matrix.
Any help is welcomed
This fits your desired output:
cbind(C[-length(C)], C[-1])
[,1] [,2]
[1,] 1 20
[2,] 20 44
[3,] 44 62
[4,] 62 64
[5,] 64 89
[6,] 89 91
[7,] 91 100
How about:
## define input
C <- c(1 , 20 , 44 , 62 , 64 , 89 , 91, 100)
## replicate all but first and last elements
Crep <- rep(C,c(1,rep(2,length(C)-2),1))
## create matrix
matrix(Crep,ncol=2,byrow=TRUE)
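Both answers build the same matrix. As a small sketch, the first approach can be wrapped in a helper so it works for a vector of any length; consecutive_pairs is a name chosen for this example, and head/tail with a negative count are used as an equivalent way of dropping the last and the first element.

```r
# Pair each element with its successor: row i is (x[i], x[i + 1])
consecutive_pairs <- function(x) {
  cbind(head(x, -1), tail(x, -1))
}

C <- c(1, 20, 44, 62, 64, 89, 91, 100)
newpairs <- consecutive_pairs(C)
newpairs
```

The result has length(C) - 1 rows, and the second column of each row equals the first column of the next row.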

How to combine 2 columns of the same matrix and arrange the values in alternating rows?

Probably, it will be an easy one, just can't get my head around it today.
How can I combine 2 columns of the same matrix in such a way that element 1 from column 1 of the original matrix will be followed by element 1 from column 2 and so on? E.g. the original matrix may look like the one below:
set.seed(200)
m <- matrix(sample(1:100, 10, replace=FALSE), ncol=2, byrow=TRUE, dimnames=NULL)
m
[,1] [,2]
[1,] 54 58
[2,] 99 68
[3,] 65 80
[4,] 67 9
[5,] 49 22
What I would like to achieve should look like this:
[,1]
[1,] 54
[2,] 58
[3,] 99
[4,] 68
[5,] 65
[6,] 80
[7,] 67
[8,] 9
[9,] 49
[10,] 22
How do I then transform the original matrix to achieve the arrangement shown in the second matrix? Of course it's only an example, not real data. Thanks for your help.
You can use c or as.vector on the transpose (t) of your matrix, like this:
c(t(m))
# [1] 54 58 99 68 65 80 67 9 49 22
Wrap it again in matrix if you want a single column matrix like you show (or, as noted in the comments, you can skip the c or as.vector at this stage since you're not supplying any dimensions to the matrix you are creating).
matrix(c(t(m)))
# [,1]
# [1,] 54
# [2,] 58
# [3,] 99
# [4,] 68
# [5,] 65
# [6,] 80
# [7,] 67
# [8,] 9
# [9,] 49
# [10,] 22
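The reason the transpose is needed is that R stores matrices column by column, so flattening m directly returns all of column 1 followed by all of column 2. A short sketch of the difference, with the example matrix typed in by hand so it does not depend on the random seed (the name interleaved is illustrative):

```r
m <- matrix(c(54, 58, 99, 68, 65, 80, 67, 9, 49, 22), ncol = 2, byrow = TRUE)

c(m)      # column-major order: column 1 first, then column 2
c(t(m))   # after transposing, the rows of m are read off one by one

# Single-column matrix with the two columns interleaved
interleaved <- matrix(c(t(m)), ncol = 1)
dim(interleaved)
```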

Exclude specific columns from a matrix

I have a list of numbers (example below):
[[178]]
NULL
[[179]]
[1] 179 66
[[180]]
[1] 180 67
[[181]]
[1] 181 123
[[182]]
[1] 182
This list contains columns (179, 66, 180, 67, 181, 123) I want to exclude from a matrix.
I tried the commands below, but they didn't work:
MyMatrix[, !(unlist(MyList))]
MyMatrix[, -(unlist(MyList))]
MyMatrix[, !unlist(MyList)]
MyMatrix[, -unlist(MyList)]
My question: what is a right way to exclude specific columns from a matrix?
Here's my small replication of your problem.
listOfColumns<-list(NULL, c(2,3), 5, NULL)
listOfColumns #print for viewing
#output
#[[1]]
#NULL
#[[2]]
#[1] 2 3
#[[3]]
#[1] 5
#[[4]]
#NULL
MyMatrix<-matrix(1:50, nrow=10, ncol=5)
MyMatrix #print for viewing
#output
# [,1] [,2] [,3] [,4] [,5]
#[1,] 1 11 21 31 41
#[2,] 2 12 22 32 42
#[3,] 3 13 23 33 43
#[4,] 4 14 24 34 44
#[5,] 5 15 25 35 45
#[6,] 6 16 26 36 46
#[7,] 7 17 27 37 47
#[8,] 8 18 28 38 48
#[9,] 9 19 29 39 49
#[10,] 10 20 30 40 50
First, the way to subset your matrix so that it omits the given column numbers is
MyMatrix[, -columnNumbers]
In R, negative numbers used to subset correspond to entries that should be omitted.
The following call outputs what you want:
MyMatrix[, -unlist(listOfColumns)]
#output
# [,1] [,2]
# [1,] 1 31
# [2,] 2 32
# [3,] 3 33
# [4,] 4 34
# [5,] 5 35
# [6,] 6 36
# [7,] 7 37
# [8,] 8 38
# [9,] 9 39
# [10,] 10 40
If you want to keep this result for later use, you'll need to store it (as David Robinson pointed out):
MySmallerMatrix <- MyMatrix[, -unlist(listOfColumns)]
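One caveat with negative indexing: if the list contains no column numbers at all, the index is empty and the subset returns zero columns instead of the whole matrix. A sketch that avoids this by computing the columns to keep is below; drop_columns is a hypothetical helper name, not an existing function.

```r
# Remove the columns named in a list of index vectors (NULL entries allowed).
# Hypothetical helper for illustration.
drop_columns <- function(mat, cols_list) {
  to_drop <- unlist(cols_list)
  # Columns to keep: everything not listed; an empty list keeps all columns
  keep <- setdiff(seq_len(ncol(mat)), to_drop)
  # drop = FALSE keeps the result a matrix even if one column remains
  mat[, keep, drop = FALSE]
}

MyMatrix <- matrix(1:50, nrow = 10, ncol = 5)
listOfColumns <- list(NULL, c(2, 3), 5, NULL)

smaller <- drop_columns(MyMatrix, listOfColumns)
dim(smaller)

# Nothing to remove: all five columns are kept
unchanged <- drop_columns(MyMatrix, list(NULL, NULL))
dim(unchanged)
```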

How to turn a vector into a matrix in R?

I have a vector with 49 numeric values. I want to have a 7x7 numeric matrix instead.
Is there some sort of convenient automatic conversion statement I can use, or do I have to do 7 separate column assignments of the correct vector subsets to a new matrix? I hope that there is something like the opposite of c(myMatrix), with the option of giving the number of rows and/or columns I want to have, of course.
Just use matrix:
matrix(vec,nrow = 7,ncol = 7)
One advantage of using matrix rather than simply altering the dimension attribute, as Gavin points out, is that you can specify whether the matrix is filled by row or by column using the byrow argument of matrix.
A matrix is really just a vector with a dim attribute (for the dimensions). So you can add dimensions to vec using the dim() function and vec will then be a matrix:
vec <- 1:49
dim(vec) <- c(7, 7) ## (rows, cols)
vec
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 8 15 22 29 36 43
[2,] 2 9 16 23 30 37 44
[3,] 3 10 17 24 31 38 45
[4,] 4 11 18 25 32 39 46
[5,] 5 12 19 26 33 40 47
[6,] 6 13 20 27 34 41 48
[7,] 7 14 21 28 35 42 49
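To compare the two answers side by side, here is a short sketch showing that setting dim fills column by column, exactly like matrix with its default byrow = FALSE, while byrow = TRUE produces the transposed layout. The variable names are illustrative.

```r
vec <- 1:49

# Column-wise fill (the default), equivalent to setting dim directly
by_col <- matrix(vec, nrow = 7, ncol = 7)
dim_version <- vec
dim(dim_version) <- c(7, 7)
all(by_col == dim_version)

# Row-wise fill: the first row is 1..7
by_row <- matrix(vec, nrow = 7, ncol = 7, byrow = TRUE)
by_row[1, ]

# The row-wise matrix is the transpose of the column-wise one
all(by_row == t(by_col))
```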
