Make a probability with specific format in R - r

I have data like this:
x = c(1,2,3)
prob = c(0.13,0.13,0.74)
# Total sample size
n = 70
result = rep(x, round(n * prob))
Final<-replicate(1, sample(result))
I want to make a matrix (matrix[7,10], 70 values in total) in which the values (1,2,3) occur with probabilities of about (0.14,0.14,0.72). In this matrix, every group of seven values should contain 1 and 2 exactly once each and 3 five times, like this:
3 3 3 1 2 3 3
3 3 3 3 2 1 3
3 2 1 3 3 3 3
3 3 3 1 3 3 2
2 1 3 3 3 3 3
2 1 3 3 3 3 3
3 3 3 2 1 3 3
3 3 3 3 2 3 1
3 2 1 3 3 3 3
So, I will get just one 1 and one 2 in each row. Could you please help me write the code?

One way is to populate a matrix with 3's, then assign 1 and 2 randomly to a column position for each row.
set.seed(1)
m <- matrix(rep(3, 7*10), ncol = 7)
pos <- replicate(10, sample(1:7, 2))
for (i in 1:nrow(m)) m[i, pos[,i]] <- 1:2
m
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 1 3 3 2 3 3 3
#> [2,] 2 3 3 3 3 3 1
#> [3,] 3 1 3 3 2 3 3
#> [4,] 3 3 2 3 3 3 1
#> [5,] 3 2 3 3 3 1 3
#> [6,] 3 3 1 3 3 3 2
#> [7,] 1 3 3 3 2 3 3
#> [8,] 3 2 3 3 1 3 3
#> [9,] 3 3 3 3 3 1 2
#> [10,] 2 1 3 3 3 3 3
Created on 2022-04-15 by the reprex package (v2.0.1)
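As an alternative sketch (not part of the answer above), each row can be drawn as a random permutation of the seven required values. The name row_values is introduced here only for illustration. Because every row holds one 1, one 2 and five 3s, the overall proportions are 1/7, 1/7 and 5/7, which is roughly the (0.14, 0.14, 0.72) asked for.

```r
set.seed(1)

# The seven values every row must contain: one 1, one 2, five 3s
row_values <- c(1, 2, rep(3, 5))

# replicate() returns a 7 x 10 matrix (one permutation per column),
# so transpose it to get 10 rows of 7 values
m <- t(replicate(10, sample(row_values)))

# Check the constraints: exactly one 1 and one 2 in every row
stopifnot(all(rowSums(m == 1) == 1), all(rowSums(m == 2) == 1))

# Overall proportions of 1, 2 and 3 in the matrix
table(m) / length(m)
```

This avoids the explicit loop, at the cost of shuffling all seven values per row instead of picking just two positions.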

Related

What is the best way to tidy a matrix in R

Is there a best-practice way of "tidying" a matrix/array? By "tidy" in this context I mean:
one row per element of the matrix
one column per dimension; the elements of these columns give the "coordinates" of the matrix element stored on that row
I have an example here for a 2d matrix, but ideally this would work with an array also (This example works for mm <- array(1:18, c(3,3,3)), but I thought that would be too much to paste in here)
mm <- matrix(1:9, nrow = 3)
mm
#> [,1] [,2] [,3]
#> [1,] 1 4 7
#> [2,] 2 5 8
#> [3,] 3 6 9
inds <- which(mm > -Inf, arr.ind = TRUE)
cbind(inds, value = mm[inds])
#> row col value
#> [1,] 1 1 1
#> [2,] 2 1 2
#> [3,] 3 1 3
#> [4,] 1 2 4
#> [5,] 2 2 5
#> [6,] 3 2 6
#> [7,] 1 3 7
#> [8,] 2 3 8
#> [9,] 3 3 9
as.data.frame.table
One way to convert from wide to long is the following. See ?as.data.frame.table for more information. No packages are used.
mm <- matrix(1:9, 3)
long <- as.data.frame.table(mm)
The code gives this data.frame:
> long
Var1 Var2 Freq
1 A A 1
2 B A 2
3 C A 3
4 A B 4
5 B B 5
6 C B 6
7 A C 7
8 B C 8
9 C C 9
numbers
If you prefer row and column numbers:
long[1:2] <- lapply(long[1:2], as.numeric)
giving:
> long
Var1 Var2 Freq
1 1 1 1
2 2 1 2
3 3 1 3
4 1 2 4
5 2 2 5
6 3 2 6
7 1 3 7
8 2 3 8
9 3 3 9
names
Note that the output above used A, B, C, ... because there were no row or column names; they would have been used if present. That is, had there been row and column names and dimension names, the output would look like this:
mm2 <- array(1:9, c(3, 3), dimnames = list(A = c("a", "b", "c"), B = c("x", "y", "z")))
as.data.frame.table(mm2, responseName = "Val")
giving:
A B Val
1 a x 1
2 b x 2
3 c x 3
4 a y 4
5 b y 5
6 c y 6
7 a z 7
8 b z 8
9 c z 9
3d
Here is a 3d example:
as.data.frame.table(array(1:8, c(2,2,2)))
giving:
Var1 Var2 Var3 Freq
1 A A A 1
2 B A A 2
3 A B A 3
4 B B A 4
5 A A B 5
6 B A B 6
7 A B B 7
8 B B B 8
2d only
For 2d one can alternately use row and col:
sapply(list(row(mm), col(mm), mm), c)
or
cbind(c(row(mm)), c(col(mm)), c(mm))
Either of these gives this matrix:
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 1 2
[3,] 3 1 3
[4,] 1 2 4
[5,] 2 2 5
[6,] 3 2 6
[7,] 1 3 7
[8,] 2 3 8
[9,] 3 3 9
Another method is to use arrayInd together with cbind like this.
# a 3 X 3 X 2 array
mm <- array(1:18, dim=c(3,3,2))
Similar to your code, but with the more natural arrayInd function, we have
# get array in desired format
myMat <- cbind(c(mm), arrayInd(seq_along(mm), .dim=dim(mm)))
# add column names
colnames(myMat) <- c("values", letters[24:26])
which returns
myMat
values x y z
[1,] 1 1 1 1
[2,] 2 2 1 1
[3,] 3 3 1 1
[4,] 4 1 2 1
[5,] 5 2 2 1
[6,] 6 3 2 1
[7,] 7 1 3 1
[8,] 8 2 3 1
[9,] 9 3 3 1
[10,] 10 1 1 2
[11,] 11 2 1 2
[12,] 12 3 1 2
[13,] 13 1 2 2
[14,] 14 2 2 2
[15,] 15 3 2 2
[16,] 16 1 3 2
[17,] 17 2 3 2
[18,] 18 3 3 2
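The arrayInd approach can be wrapped in a small helper that works for any number of dimensions and returns a data frame. This is only a sketch: the function name tidy_array and the column names dim1, dim2, ... are made up for illustration and do not come from any package.

```r
# Hypothetical helper: one row per element, one column per dimension
tidy_array <- function(a) {
  # arrayInd() converts linear positions into one index per dimension
  idx <- arrayInd(seq_along(a), .dim = dim(a))
  colnames(idx) <- paste0("dim", seq_len(ncol(idx)))
  data.frame(idx, value = c(a))
}

mm <- array(1:18, dim = c(3, 3, 2))
tidy <- tidy_array(mm)
head(tidy, 3)
```

The same call works on a plain matrix, in which case only dim1 and dim2 are produced.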

How do I generate all the vectors by swapping two positions?

Suppose I have a column vector of [1 1 1 2 2 2 3 3 3] and I want to generate all the different column vectors only by switching two positions. For an example, one such vector would be
[1 1 3 2 2 2 1 3 3].
Try this (it gives you a matrix, each row of which is a unique vector with 2 elements swapped from the original vector; there are 28 such unique vectors, including the original one):
v <- c(1,1,1,2,2,2,3,3,3)
unique(t(apply(t(combn(1:length(v), 2)), 1, function(x) {v[x] <- v[rev(x)]; v})))
with output:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 1 1 2 2 2 3 3 3 # original one
[2,] 2 1 1 1 2 2 3 3 3 # swap 1st & 4th elements
[3,] 2 1 1 2 1 2 3 3 3 # swap 1st & 5th
[4,] 2 1 1 2 2 1 3 3 3 # ...
[5,] 3 1 1 2 2 2 1 3 3
[6,] 3 1 1 2 2 2 3 1 3
[7,] 3 1 1 2 2 2 3 3 1
[8,] 1 2 1 1 2 2 3 3 3
[9,] 1 2 1 2 1 2 3 3 3
[10,] 1 2 1 2 2 1 3 3 3
[11,] 1 3 1 2 2 2 1 3 3
[12,] 1 3 1 2 2 2 3 1 3
[13,] 1 3 1 2 2 2 3 3 1
[14,] 1 1 2 1 2 2 3 3 3
[15,] 1 1 2 2 1 2 3 3 3
[16,] 1 1 2 2 2 1 3 3 3
[17,] 1 1 3 2 2 2 1 3 3
[18,] 1 1 3 2 2 2 3 1 3
[19,] 1 1 3 2 2 2 3 3 1
[20,] 1 1 1 3 2 2 2 3 3
[21,] 1 1 1 3 2 2 3 2 3
[22,] 1 1 1 3 2 2 3 3 2
[23,] 1 1 1 2 3 2 2 3 3
[24,] 1 1 1 2 3 2 3 2 3
[25,] 1 1 1 2 3 2 3 3 2
[26,] 1 1 1 2 2 3 2 3 3
[27,] 1 1 1 2 2 3 3 2 3
[28,] 1 1 1 2 2 3 3 3 2 # swap 6th & 9th
First, we can use the utils function combn to generate all of the possible combinations of pairs of positions to swap. Here, I am assuming you don't want to swap the same number (e.g. 1 and 1), so am checking them to make sure they are different values:
# startVec is the original vector from the question
startVec <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
allCombo <- combn(1:length(startVec), 2)
toKeep <- apply(allCombo, 2, function(x) {
  startVec[x[1]] != startVec[x[2]]
})
Then, apply along those that you are keeping, and swap the positions.
outVecs <- apply(allCombo[ , toKeep], 2, function(x){
  temp <- startVec
  temp[x] <- startVec[rev(x)]
  return(temp)
})
This returns a matrix with one swapped vector per column, but you can convert it to a list, which may be easier to manage, like so:
outVecsInList <-
as.list(as.data.frame(outVecs))
head(outVecsInList) shows:
$V1
[1] 2 1 1 1 2 2 3 3 3
$V2
[1] 2 1 1 2 1 2 3 3 3
$V3
[1] 2 1 1 2 2 1 3 3 3
$V4
[1] 3 1 1 2 2 2 1 3 3
$V5
[1] 3 1 1 2 2 2 3 1 3
$V6
[1] 3 1 1 2 2 2 3 3 1
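The two answers agree on the count, which can be checked with a short self-contained sketch. The variable names pairs, differs and swapped are illustrative. Of the 36 possible position pairs, 27 join two different values (3 groups taken two at a time, 3 x 3 positions each), and each of those swaps produces a distinct vector; adding the original vector gives the 28 rows of the first answer.

```r
v <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)

# All pairs of positions, one pair per column
pairs <- combn(seq_along(v), 2)

# Vectorised filter: keep pairs whose two values differ
differs <- v[pairs[1, ]] != v[pairs[2, ]]

# Swap the two positions of each kept pair
swapped <- apply(pairs[, differs, drop = FALSE], 2, function(p) {
  out <- v
  out[p] <- v[rev(p)]
  out
})

ncol(pairs)    # number of candidate pairs
sum(differs)   # number of swaps that change the vector
```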

Creating a matrix from multiple column vectors

How can I create a matrix from multiple column vectors?
I know that I can easily create a data frame with column vectors:
> colA <- 1:5
> colB <- 21:25
> colC <- 31:35
> data.frame(colA, colB, colC)
colA colB colC
1 1 21 31
2 2 22 32
3 3 23 33
4 4 24 34
5 5 25 35
However, when I try matrix(), it gives me unexpected results, as shown below. How can I create my desired matrix? I know I can do as.matrix(df), which nicely preserves the column names, but I'm looking for a more direct approach.
> matrix(colA, colB, colC)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,] 1 2 3 4 5 1 2 3 4 5 1 2 3
[2,] 2 3 4 5 1 2 3 4 5 1 2 3 4
[3,] 3 4 5 1 2 3 4 5 1 2 3 4 5
[4,] 4 5 1 2 3 4 5 1 2 3 4 5 1
[5,] 5 1 2 3 4 5 1 2 3 4 5 1 2
[6,] 1 2 3 4 5 1 2 3 4 5 1 2 3
[7,] 2 3 4 5 1 2 3 4 5 1 2 3 4
[8,] 3 4 5 1 2 3 4 5 1 2 3 4 5
[9,] 4 5 1 2 3 4 5 1 2 3 4 5 1
[10,] 5 1 2 3 4 5 1 2 3 4 5 1 2
[11,] 1 2 3 4 5 1 2 3 4 5 1 2 3
[12,] 2 3 4 5 1 2 3 4 5 1 2 3 4
[13,] 3 4 5 1 2 3 4 5 1 2 3 4 5
[14,] 4 5 1 2 3 4 5 1 2 3 4 5 1
[15,] 5 1 2 3 4 5 1 2 3 4 5 1 2
[16,] 1 2 3 4 5 1 2 3 4 5 1 2 3
[17,] 2 3 4 5 1 2 3 4 5 1 2 3 4
[18,] 3 4 5 1 2 3 4 5 1 2 3 4 5
[19,] 4 5 1 2 3 4 5 1 2 3 4 5 1
[20,] 5 1 2 3 4 5 1 2 3 4 5 1 2
[21,] 1 2 3 4 5 1 2 3 4 5 1 2 3
[,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
[1,] 4 5 1 2 3 4 5 1 2 3 4 5
[2,] 5 1 2 3 4 5 1 2 3 4 5 1
[3,] 1 2 3 4 5 1 2 3 4 5 1 2
[4,] 2 3 4 5 1 2 3 4 5 1 2 3
[5,] 3 4 5 1 2 3 4 5 1 2 3 4
[6,] 4 5 1 2 3 4 5 1 2 3 4 5
[7,] 5 1 2 3 4 5 1 2 3 4 5 1
[8,] 1 2 3 4 5 1 2 3 4 5 1 2
[9,] 2 3 4 5 1 2 3 4 5 1 2 3
[10,] 3 4 5 1 2 3 4 5 1 2 3 4
[11,] 4 5 1 2 3 4 5 1 2 3 4 5
[12,] 5 1 2 3 4 5 1 2 3 4 5 1
[13,] 1 2 3 4 5 1 2 3 4 5 1 2
[14,] 2 3 4 5 1 2 3 4 5 1 2 3
[15,] 3 4 5 1 2 3 4 5 1 2 3 4
[16,] 4 5 1 2 3 4 5 1 2 3 4 5
[17,] 5 1 2 3 4 5 1 2 3 4 5 1
[18,] 1 2 3 4 5 1 2 3 4 5 1 2
[19,] 2 3 4 5 1 2 3 4 5 1 2 3
[20,] 3 4 5 1 2 3 4 5 1 2 3 4
[21,] 4 5 1 2 3 4 5 1 2 3 4 5
[,26] [,27] [,28] [,29] [,30] [,31]
[1,] 1 2 3 4 5 1
[2,] 2 3 4 5 1 2
[3,] 3 4 5 1 2 3
[4,] 4 5 1 2 3 4
[5,] 5 1 2 3 4 5
[6,] 1 2 3 4 5 1
[7,] 2 3 4 5 1 2
[8,] 3 4 5 1 2 3
[9,] 4 5 1 2 3 4
[10,] 5 1 2 3 4 5
[11,] 1 2 3 4 5 1
[12,] 2 3 4 5 1 2
[13,] 3 4 5 1 2 3
[14,] 4 5 1 2 3 4
[15,] 5 1 2 3 4 5
[16,] 1 2 3 4 5 1
[17,] 2 3 4 5 1 2
[18,] 3 4 5 1 2 3
[19,] 4 5 1 2 3 4
[20,] 5 1 2 3 4 5
[21,] 1 2 3 4 5 1
Warning message:
In matrix(colA, colB, colC) :
data length [5] is not a sub-multiple or multiple of the number of rows [21]
You can use cbind to produce the desired matrix:
mat <- cbind(colA, colB, colC)
mat
# colA colB colC
# [1,] 1 21 31
# [2,] 2 22 32
# [3,] 3 23 33
# [4,] 4 24 34
# [5,] 5 25 35
class(mat)
# [1] "matrix"   (R >= 4.0.0 prints "matrix" "array")
You don't get the matrix you're expecting with the call of matrix(colA, colB, colC), because your arguments are getting interpreted as the first, second, and third arguments to the matrix function (aka data, nrow, and ncol). If you wanted to use the matrix function, you would need to provide your data as a single argument, with something like mat <- matrix(c(colA, colB, colC), ncol=3). If you used this syntax, you would not get the column names from the variables like we did with cbind.
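If the matrix() route is preferred, the column names can still be supplied through its dimnames argument. A minimal sketch, assuming the three vectors from the question (the names from_matrix and from_cbind are only for illustration):

```r
colA <- 1:5
colB <- 21:25
colC <- 31:35

# matrix() fills column by column, so concatenating the vectors and
# setting ncol = 3 puts each vector in its own column
from_matrix <- matrix(c(colA, colB, colC), ncol = 3,
                      dimnames = list(NULL, c("colA", "colB", "colC")))

# cbind() takes the column names from the variable names
from_cbind <- cbind(colA, colB, colC)

# Same values and same column names
all(from_matrix == from_cbind)
colnames(from_matrix)
```

cbind is shorter when the vectors already have meaningful names; matrix with dimnames is useful when the names should differ from the variable names.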

find unique elements in matrix based on a subset of columns

I have a table which I want to transform:
t LabelA LabelB start stop
[1,] 1 4 9 1 2
[2,] 1 3 5 1 2
[3,] 1 1 6 1 2
[4,] 2 4 9 1 2
[5,] 2 3 5 1 2
[6,] 2 1 6 1 2
[7,] 2 7 2 2 2
[8,] 3 3 5 3 4
[9,] 3 1 6 3 4
[10,] 3 7 2 3 5
[11,] 3 4 9 3 5
I want to filter the data so that rows which differ only by their number in the first column are removed (not completely, but only the duplicates). So for rows 1 and 4 only row 1 should remain in the table. Or for row 3 and 9 only row 9 should remain. It is important that the information in the first column is kept and that the earliest occurrence of the row remains in the table, not the later ones.
You can use duplicated:
mat[!duplicated(as.data.frame(mat[, -1])), ]
t LabelA LabelB start stop
[1,] 1 4 9 1 2
[2,] 1 3 5 1 2
[3,] 1 1 6 1 2
[7,] 2 7 2 2 2
[8,] 3 3 5 3 4
[9,] 3 1 6 3 4
[10,] 3 7 2 3 5
[11,] 3 4 9 3 5
where mat is the name of your matrix.
Try using the duplicated function:
mymx <- matrix(c(1,4,9,1,2 ,1,3,5,1,2 ,1,1,6,1,2 ,2,4,9,1,2 ,2,3,5,1,2 ,2,1,6,1,2 ,2,7,2,2,2 ,3,3,5,3,4 ,3,1,6,3,4 ,3,7,2,3,5 ,3,4,9,3,5), ncol=5, byrow=TRUE)
mymx[!duplicated(mymx[,-1]),]
> mymx[!duplicated(mymx[,-1]),]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 4 9 1 2
[2,] 1 3 5 1 2
[3,] 1 1 6 1 2
[4,] 2 7 2 2 2
[5,] 3 3 5 3 4
[6,] 3 1 6 3 4
[7,] 3 7 2 3 5
[8,] 3 4 9 3 5
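Both answers rely on duplicated() marking every row after the first match, so "earliest occurrence" means earliest in row order. If the table might not be sorted by the first column, it can be ordered first. The sketch below simulates unsorted input by reversing the rows; the names unsorted, sorted and dedup are illustrative.

```r
mymx <- matrix(c(1,4,9,1,2, 1,3,5,1,2, 1,1,6,1,2, 2,4,9,1,2,
                 2,3,5,1,2, 2,1,6,1,2, 2,7,2,2,2, 3,3,5,3,4,
                 3,1,6,3,4, 3,7,2,3,5, 3,4,9,3,5),
               ncol = 5, byrow = TRUE,
               dimnames = list(NULL, c("t", "LabelA", "LabelB", "start", "stop")))

# Simulate input that is not sorted by t
unsorted <- mymx[nrow(mymx):1, ]

# Sort by t so that the smallest t comes first for each combination
sorted <- unsorted[order(unsorted[, "t"]), ]

# duplicated() on a matrix compares whole rows; drop column 1 for the comparison
dedup <- sorted[!duplicated(sorted[, -1]), ]
dedup
```

The surviving rows carry the smallest t for each combination of the other four columns, although their order within a given t may differ from the output above.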

remove a value and its corresponding value in a table

In a file (1000 columns, 2000 rows), for each column there is another column next to it. something like:
[,1] [,2] [,3] [,4] [,5] [,6]
3 3 4 4 4 6
6 5 2 2 5 1
9 1 3 5 4 1
2 5 6 4 8 5
6 1 5 2 3 1
I want to remove those values whose corresponding value is 1.
the result:
[,1] [,3] [,5]
   3    4    4
   6    2    8
   2    3
        6
        5
To echo what @shellter said, it's both helpful and polite to include what you've tried in the question.
Here's a compact way to accomplish this using split and mapply.
d <- read.table(text='3 3 4 4 4 6
6 5 2 2 5 1
9 1 3 5 4 1
2 5 6 4 8 5
6 1 5 2 3 1', header=FALSE)
cols <- split(as.list(d), rep(1:2, length.out=length(d)))
mapply(function(col1, col2) col1[col2 != 1],
cols[[1]], cols[[2]], SIMPLIFY=FALSE)
# $V1
# [1] 3 6 2
#
# $V3
# [1] 4 2 3 6 5
#
# $V5
# [1] 4 8
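The list above has columns of different lengths. If a rectangular table like the one sketched in the question is needed, one option is to pad the shorter columns with NA. This is a sketch under that assumption; the names odd, even, kept and padded are illustrative.

```r
d <- read.table(text = "3 3 4 4 4 6
6 5 2 2 5 1
9 1 3 5 4 1
2 5 6 4 8 5
6 1 5 2 3 1", header = FALSE)

# Value columns are 1, 3, 5, ...; their companion columns are 2, 4, 6, ...
odd  <- d[, seq(1, ncol(d), by = 2)]
even <- d[, seq(2, ncol(d), by = 2)]

# Keep a value only when its companion is not 1
kept <- Map(function(v, flag) v[flag != 1], odd, even)

# Pad every column with NA up to the longest one, giving a matrix
n <- max(lengths(kept))
padded <- sapply(kept, function(v) c(v, rep(NA, n - length(v))))
padded
```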
