I have a matrix made of a very long series of integers bounded between 1 and 6. I would like to create an output matrix with the same number of rows as the original matrix and 6 columns (the maximum value in the original matrix), where a 1 is repeated n times in column n, n being the first integer encountered. For example, if the first value is 6, a 1 would be repeated 6 times in the 6th column of the output matrix, and the value in row 7 of the original matrix would then be used for the next repeat sequence. I have shown an example below. Is there an efficient way to do this in R?
Original Matrix output Matrix
c1 c1 c2 c3 c4 c5 c6
R1 1 R1 1 0 0 0 0 0
R2 1 R2 1 0 0 0 0 0
R3 3 R3 0 0 1 0 0 0
R4 2 R4 0 0 1 0 0 0
R5 6 R5 0 0 1 0 0 0
R6 1 R6 1 0 0 0 0 0
R7 1 R7 1 0 0 0 0 0
R8 1 R8 1 0 0 0 0 0
R9 1 R9 1 0 0 0 0 0
R10 4 R10 0 0 0 1 0 0
R11 4 R11 0 0 0 1 0 0
R12 2 R12 0 0 0 1 0 0
R13 1 R13 0 0 0 1 0 0
R14 3 R14 0 0 1 0 0 0
R15 1 R15 0 0 1 0 0 0
A further example of the input and output matrix to make my above example clearer.
Input matrix Output matrix
c1 1 2 3 4 5 6
1 2 1 0 1 0 0 0 0
2 2 2 0 1 0 0 0 0
3 1 3 1 0 0 0 0 0
4 6 4 0 0 0 0 0 1
5 3 5 0 0 0 0 0 1
6 4 6 0 0 0 0 0 1
7 5 7 0 0 0 0 0 1
8 4 8 0 0 0 0 0 1
9 5 9 0 0 0 0 0 1
10 4 10 0 0 0 1 0 0
11 3 11 0 0 0 1 0 0
12 3 12 0 0 0 1 0 0
13 2 13 0 0 0 1 0 0
14 3 14 0 0 1 0 0 0
15 4 15 0 0 1 0 0 0
16 5 16 0 0 1 0 0 0
17 5 17 0 0 0 0 1 0
18 5 18 0 0 0 0 1 0
This is a simplistic solution but it works:
input_data <- c(1, 1, 3, 2, 6, 1, 1, 1, 1, 4, 4, 2, 1, 3, 1)
result <- matrix(0, nrow = length(input_data), ncol = 6)
counter <- 0  # rows left in the current run
for (i in seq_along(input_data)) {
  if (counter == 0) {
    # start a new run: its length and its column both come from the current value
    counter <- set_value <- input_data[i]
  }
  result[i, set_value] <- 1
  counter <- counter - 1
}
> cbind(input_data, result)
[1,] 1 1 0 0 0 0 0
[2,] 1 1 0 0 0 0 0
[3,] 3 0 0 1 0 0 0
[4,] 2 0 0 1 0 0 0
[5,] 6 0 0 1 0 0 0
[6,] 1 1 0 0 0 0 0
[7,] 1 1 0 0 0 0 0
[8,] 1 1 0 0 0 0 0
[9,] 1 1 0 0 0 0 0
[10,] 4 0 0 0 1 0 0
[11,] 4 0 0 0 1 0 0
[12,] 2 0 0 0 1 0 0
[13,] 1 0 0 0 1 0 0
[14,] 3 0 0 1 0 0 0
[15,] 1 0 0 1 0 0 0
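The same logic can be wrapped in a function that jumps from run start to run start instead of testing a counter on every row, which may help on a very long input. This is only a sketch: the name expand_runs and its n_col argument are made up for illustration, and it assumes every input value is a whole number between 1 and n_col.

```r
# Hypothetical helper (not from the answer above): expand each run start
# into a block of 1s in the column given by the value found at that start.
expand_runs <- function(v, n_col = 6) {
  n <- length(v)
  col_for_row <- numeric(n)
  i <- 1
  while (i <= n) {
    # the run starting at row i covers v[i] rows, truncated at the end of the input
    run_end <- min(i + v[i] - 1, n)
    col_for_row[i:run_end] <- v[i]
    i <- run_end + 1
  }
  out <- matrix(0, nrow = n, ncol = n_col)
  # matrix indexing: one (row, column) pair per output row
  out[cbind(seq_len(n), col_for_row)] <- 1
  out
}

input_data <- c(1, 1, 3, 2, 6, 1, 1, 1, 1, 4, 4, 2, 1, 3, 1)
res <- expand_runs(input_data)
cbind(input_data, res)
```

The loop now iterates once per run rather than once per row, and the 1s are written in a single indexed assignment.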
I'd like to build a matrix that records the change from one integer value to another for a vector.
Example Vector
a <- c(NA,1,3,4,2,6,5,3,7,7,NA,3,NA,5,5,NA,2,3,1,4)
Conceptual Matrix Design
Where I would tally every time a value in the vector a changes (or doesn't change) from one integer to another.
To
1 2 3 4 5 6 7
1
2
3
From 4
5
6
7
Desired Output
Note that NAs matter: e.g., 7, NA, 3 in a does not count as a change from 7 to 3.
To
1 2 3 4 5 6 7
1 0 0 1 1 0 0 0
2 0 0 1 0 0 1 0
3 1 0 0 1 0 0 1
From 4 0 1 0 0 0 0 0
5 0 0 1 0 1 0 0
6 0 0 0 0 1 0 0
7 0 0 0 0 0 0 1
Using table
table(dplyr::lag(a),a)
a
1 2 3 4 5 6 7
1 0 0 1 1 0 0 0
2 0 0 1 0 0 1 0
3 1 0 0 1 0 0 1
4 0 1 0 0 0 0 0
5 0 0 1 0 1 0 0
6 0 0 0 0 1 0 0
7 0 0 0 0 0 0 1
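If some value never occurs as a "from" or a "to", table() would drop that row or column. A possible base-R variant, sketched here under the assumption that the full set of values is 1 to 7, fixes the levels up front. The names lv, from, to and trans are only illustrative.

```r
a <- c(NA, 1, 3, 4, 2, 6, 5, 3, 7, 7, NA, 3, NA, 5, 5, NA, 2, 3, 1, 4)

lv <- 1:7  # assumed full set of values; adjust to the real range

# pair each element with its successor; pairs containing NA are dropped by table()
from <- factor(head(a, -1), levels = lv)
to   <- factor(tail(a, -1), levels = lv)

trans <- table(From = from, To = to)
trans
```

Because both factors carry all seven levels, the result is always 7 x 7, even for a vector in which some values are missing.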
dict = sapply(2:length(a), function(i) toString(a[(i-1):i]))
unq = sort(unique(a))
+t(sapply(unq, function(x) sapply(unq, function(y) toString(c(x, y)) %in% dict)))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#[1,] 0 0 1 1 0 0 0
#[2,] 0 0 1 0 0 1 0
#[3,] 1 0 0 1 0 0 1
#[4,] 0 1 0 0 0 0 0
#[5,] 0 0 1 0 1 0 0
#[6,] 0 0 0 0 1 0 0
#[7,] 0 0 0 0 0 0 1
An option with tidyverse
library(tidyverse)
# a1 is the previous value ("from"), a is the current value ("to")
tibble(a, a1 = lag(a)) %>%
  filter(!is.na(a), !is.na(a1)) %>%
  dplyr::count(a1, a) %>%
  spread(a, n, fill = 0) %>%
  column_to_rownames('a1')
#  1 2 3 4 5 6 7
#1 0 0 1 1 0 0 0
#2 0 0 1 0 0 1 0
#3 1 0 0 1 0 0 1
#4 0 1 0 0 0 0 0
#5 0 0 1 0 1 0 0
#6 0 0 0 0 1 0 0
#7 0 0 0 0 0 0 1
I'd like to convert a matrix of values into a matrix of 'bits'.
I have been looking for solutions and found this, which seems to be part of a solution.
I'll try to explain what I am looking for.
I have a matrix like
> x<-matrix(1:20,5,4)
> x
[,1] [,2] [,3] [,4]
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20
which I would like to convert into
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0
2 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0
3 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0
4 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0
5 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
so for each value in the row a "1" in the corresponding column.
If I use
> table(sequence(length(x)),t(x))
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
5 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
9 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
13 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
17 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
This is close to what I am looking for, but it returns a line for each value.
I only need to consolidate all the values from one row of x into a single row.
A plain
> table(x)
x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
gives the counts of all values over the whole table, so what do I need to do to get the values per row?
Here is another option using table() function:
table(row(x), x)
# x
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
# 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0
# 2 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0
# 3 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0
# 4 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0
# 5 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
bit_x = matrix(0, nrow = nrow(x), ncol = max(x))
for (i in 1:nrow(x)) {bit_x[i,x[i,]] = 1}
Let
(x <- matrix(c(1, 3), 2, 2))
[,1] [,2]
[1,] 1 1
[2,] 3 3
One approach would be
M <- matrix(0, nrow(x), max(x))
M[cbind(c(row(x)), c(x))] <- 1
M
# [,1] [,2] [,3]
# [1,] 1 0 0
# [2,] 0 0 1
In one line:
replace(matrix(0, nrow(x), max(x)), cbind(c(row(x)), c(x)), 1)
Following your approach, and similarly to @Psidom's suggestion:
table(rep(1:nrow(x), ncol(x)), x)
# x
# 1 3
# 1 2 0
# 2 0 2
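Note that the two results differ when a value repeats within a row: the indexing approach yields a 1, while table() yields a count (2 here) and only has columns for values that actually occur. If bits are wanted from the table() route, one possible sketch is below. The names counts and bits are illustrative, and it assumes the values are positive integers.

```r
x <- matrix(c(1, 3), 2, 2)

# indexing approach from above, for comparison
M <- matrix(0, nrow(x), max(x))
M[cbind(c(row(x)), c(x))] <- 1

# table() approach: fix the levels so every column 1..max(x) is present,
# then collapse counts to 0/1
counts <- table(c(row(x)), factor(c(x), levels = seq_len(max(x))))
bits <- matrix(as.integer(counts > 0), nrow = nrow(counts))
bits
```

With the levels fixed and the counts thresholded, bits has the same shape and content as M.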
We can use the reshape2 package.
library(reshape2)
# At first we make the matrix you provided
x <- matrix(1:20, 5, 4)
# then melt it based on first column
da <- melt(x, id.var = 1)
# then cast it
dat <- dcast(da, Var1 ~ value, fill = 0, fun.aggregate = length)
which gives us this
Var1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0
2 2 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0
3 3 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0
4 4 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0
5 5 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
In the analysis I am running there are many predictor variables from which I would like to build a model matrix. However, the model matrix requires a formula in a format such as
t<-model.matrix(f[,1]~f[,2]+f[,3]+....)
If my data frame is called f, is there a quick way, with paste or something similar, to write out this formula programmatically? Otherwise I would need to type everything.
Why not use:
f <- data.frame(z = 1:10, b= 1:10, d=factor(1:10))
model.matrix(~. , data=f[-1])
#-------------
(Intercept) b d2 d3 d4 d5 d6 d7 d8 d9 d10
1 1 1 0 0 0 0 0 0 0 0 0
2 1 2 1 0 0 0 0 0 0 0 0
3 1 3 0 1 0 0 0 0 0 0 0
4 1 4 0 0 1 0 0 0 0 0 0
5 1 5 0 0 0 1 0 0 0 0 0
6 1 6 0 0 0 0 1 0 0 0 0
7 1 7 0 0 0 0 0 1 0 0 0
8 1 8 0 0 0 0 0 0 1 0 0
9 1 9 0 0 0 0 0 0 0 1 0
10 1 10 0 0 0 0 0 0 0 0 1
attr(,"assign")
[1] 0 1 2 2 2 2 2 2 2 2 2
attr(,"contrasts")
attr(,"contrasts")$d
[1] "contr.treatment"
Compare to what you get with:
> model.matrix(z~., f)
(Intercept) b d2 d3 d4 d5 d6 d7 d8 d9 d10
1 1 1 0 0 0 0 0 0 0 0 0
2 1 2 1 0 0 0 0 0 0 0 0
3 1 3 0 1 0 0 0 0 0 0 0
4 1 4 0 0 1 0 0 0 0 0 0
5 1 5 0 0 0 1 0 0 0 0 0
6 1 6 0 0 0 0 1 0 0 0 0
7 1 7 0 0 0 0 0 1 0 0 0
8 1 8 0 0 0 0 0 0 1 0 0
9 1 9 0 0 0 0 0 0 0 1 0
10 1 10 0 0 0 0 0 0 0 0 1
attr(,"assign")
[1] 0 1 2 2 2 2 2 2 2 2 2
attr(,"contrasts")
attr(,"contrasts")$d
[1] "contr.treatment"
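If an explicit formula object is still wanted (for instance to pass to another modelling function), base R's reformulate() can build it from the column names. The sketch below assumes, as in the question, that the first column of f is the response and all remaining columns are predictors. The names fml and mm are illustrative.

```r
f <- data.frame(z = 1:10, b = 1:10, d = factor(1:10))

# build "z ~ b + d" from the column names instead of typing it out
fml <- reformulate(names(f)[-1], response = names(f)[1])
fml

mm <- model.matrix(fml, data = f)
colnames(mm)
```

This yields the same columns as model.matrix(z ~ ., f), but the formula itself can be inspected or reused.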
I have a large data set like this:
SUB SMOKE AMT MDV ADDL II EVID
1 0 0 0 0 0 0
1 0 20 0 16 24 1
1 0 0 0 0 0 0
1 0 0 0 0 0 0
2 1 0 0 0 0 0
2 1 50 0 24 12 1
2 1 0 0 0 0 0
2 1 0 0 0 0 0
...
I want to copy each row where EVID=1 and insert the copy below it, but in the copied row AMT, ADDL, II and EVID should all be 0, while SMOKE and MDV remain the same. The expected output should look like this:
SUB SMOKE AMT MDV ADDL II EVID
1 0 0 0 0 0 0
1 0 20 0 16 24 1
1 0 0 0 0 0 0
1 0 0 0 0 0 0
1 0 0 0 0 0 0
2 1 0 0 0 0 0
2 1 50 0 24 12 1
2 1 0 0 0 0 0
2 1 0 0 0 0 0
2 1 0 0 0 0 0
...
Does anyone have an idea how to do this?
# repeat EVID=0 rows 1 time and EVID=1 rows 2 times
r <- rep(1:nrow(DF), DF$EVID + 1)
DF2 <- DF[r, ]
# insert zeros
DF2[duplicated(r), c("AMT", "ADDL", "II", "EVID")] <- 0
giving:
> DF2
SUB SMOKE AMT MDV ADDL II EVID
1 1 0 0 0 0 0 0
2 1 0 20 0 16 24 1
2.1 1 0 0 0 0 0 0
3 1 0 0 0 0 0 0
4 1 0 0 0 0 0 0
5 2 1 0 0 0 0 0
6 2 1 50 0 24 12 1
6.1 2 1 0 0 0 0 0
7 2 1 0 0 0 0 0
8 2 1 0 0 0 0 0
Maybe this:
> t2 <- t[t$EVID==1,] # t is your data.frame
> t2[c("AMT","ADDL","II","EVID")] <- 0
> t2
SUB SMOKE AMT MDV ADDL II EVID
2 1 0 0 0 0 0 0
6 2 1 0 0 0 0 0
> rbind(t,t2)
SUB SMOKE AMT MDV ADDL II EVID
1 1 0 0 0 0 0 0
2 1 0 20 0 16 24 1
3 1 0 0 0 0 0 0
4 1 0 0 0 0 0 0
5 2 1 0 0 0 0 0
6 2 1 50 0 24 12 1
7 2 1 0 0 0 0 0
8 2 1 0 0 0 0 0
21 1 0 0 0 0 0 0 # this row
61 2 1 0 0 0 0 0 # and this one are new
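The rbind() result above appends the new rows at the end rather than below their source rows. One possible way to put each copy right after its original is to sort on the source row position, sketched below. The data frame is rebuilt from the question under the name DF (rather than t, which masks base::t); dose_rows, copies, stacked, pos and out are illustrative names.

```r
DF <- data.frame(
  SUB   = c(1, 1, 1, 1, 2, 2, 2, 2),
  SMOKE = c(0, 0, 0, 0, 1, 1, 1, 1),
  AMT   = c(0, 20, 0, 0, 0, 50, 0, 0),
  MDV   = 0,
  ADDL  = c(0, 16, 0, 0, 0, 24, 0, 0),
  II    = c(0, 24, 0, 0, 0, 12, 0, 0),
  EVID  = c(0, 1, 0, 0, 0, 1, 0, 0)
)

dose_rows <- which(DF$EVID == 1)
copies <- DF[dose_rows, ]
copies[c("AMT", "ADDL", "II", "EVID")] <- 0

stacked <- rbind(DF, copies)
# position of the source row for every stacked row; order() keeps ties in
# their original order, so each copy lands directly after its original
pos <- c(seq_len(nrow(DF)), dose_rows)
out <- stacked[order(pos), ]
rownames(out) <- NULL
out
```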
Say you have a matrix M1 as such:
A B C D E F G H I J
353 1 0 1 0 0 1 0 0 1 1
288 1 0 1 0 0 1 1 0 1 1
275 1 0 1 0 1 1 0 0 1 1
236 0 0 1 0 0 1 0 0 1 1
235 0 0 1 0 0 1 1 0 1 1
227 1 0 1 0 1 1 1 0 1 1
The row names are the values of interest (they are not random; they have meaning, and they are what I want to retrieve, as I will explain).
Say you have another matrix M2 as such:
A B C D E F G H I J AA
[1,] 0 0 0 0 0 0 0 0 0 0 0
[2,] 1 0 0 0 0 0 0 0 0 0 0
[3,] 0 1 0 0 0 0 0 0 0 0 1
[4,] 1 1 0 0 0 0 0 0 0 0 0
[5,] 0 0 1 0 0 0 0 0 0 0 1
[6,] 1 0 1 0 0 0 0 0 0 0 0
Note that columns A to J are the same as in M1; M2 only adds the new column AA.
Now, I want something like:
for (i in 1:nrow(M2)){
  if(M2[i,"AA"]==1){
    # -1 since M1 doesn't have the AA column
    vec = M2[i,1:(ncol(M2)-1)]
    # BELOW is what I am not sure how to implement:
    # get the rowname from M1 that matches vec, and replace M2[i,"AA"] with that value
  }
}
The result should be 0, since in this example there are no rows of M1 matching any rows of M2[,A:J]
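One possible way to do the lookup without an explicit loop is to turn each row into a string key and use match(). This is only a sketch: M1 is rebuilt from the question, but the three-row M2 below is a made-up example (one row with no match, one row matching M1 row 236, one matching row that is not flagged), and the names shared, key1, key2, hit and flagged are illustrative. It also assumes the row names of M1 are numeric.

```r
M1 <- matrix(c(1,0,1,0,0,1,0,0,1,1,
               1,0,1,0,0,1,1,0,1,1,
               1,0,1,0,1,1,0,0,1,1,
               0,0,1,0,0,1,0,0,1,1,
               0,0,1,0,0,1,1,0,1,1,
               1,0,1,0,1,1,1,0,1,1),
             nrow = 6, byrow = TRUE,
             dimnames = list(c("353", "288", "275", "236", "235", "227"),
                             LETTERS[1:10]))

# hypothetical M2, not the one from the question
M2 <- rbind(c(0,0,0,0,0,0,0,0,0,0, 1),   # flagged, no match in M1
            c(0,0,1,0,0,1,0,0,1,1, 1),   # flagged, equals M1 row "236"
            c(1,0,1,0,0,1,0,0,1,1, 0))   # equals M1 row "353" but not flagged
colnames(M2) <- c(LETTERS[1:10], "AA")

shared <- colnames(M1)
# one string key per row, built from the shared columns only
key1 <- apply(M1, 1, paste, collapse = ",")
key2 <- apply(M2[, shared, drop = FALSE], 1, paste, collapse = ",")
hit <- match(key2, key1)            # index of the matching M1 row, or NA

flagged <- M2[, "AA"] == 1
# flagged rows get the matching row name of M1, or 0 when nothing matches
M2[flagged, "AA"] <- ifelse(is.na(hit[flagged]), 0,
                            as.numeric(rownames(M1)[hit[flagged]]))
M2[, "AA"]
```

If several rows of M1 were identical, match() would return the first one, so a different rule would be needed in that case.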