Appending columns by column matching ID in R

So what I am trying to do is difficult for me to articulate but very straightforward, and I can easily show you. The title is my best guess at the verbiage; edits appreciated.
set.seed(1)
theta=matrix(rnorm(6,0,1),2,3)
M = c( 0 , 0 , 0 , 0, 1 ,
1, 0 , 0 , 0 , 1,
2 , 0 , 0 , 0, 2,
0 , 1 , 0 , 0 ,2,
1 , 1 , 0 , 0, 3,
0 , 2 , 0 , 0, 3)
M = matrix(M, nrow = 6, ncol = 5, byrow = TRUE)
theta
[,1] [,2] [,3]
[1,] 0.4418121 1.962053 2.236691
[2,] 1.0931398 1.273616 1.050373
M
prod11 prod12 prod21 prod22 d
1 0 0 0 0 1
2 1 0 0 0 1
3 2 0 0 0 2
4 0 1 0 0 2
5 1 1 0 0 3
6 0 2 0 0 3
OUTPUT DESIRED
prod11 prod12 prod21 prod22 d theta1 theta2
1 0 0 0 0 1 0.4418121 1.0931398
2 1 0 0 0 1 0.4418121 1.0931398
3 2 0 0 0 2 1.962053 1.273616
4 0 1 0 0 2 1.962053 1.273616
5 1 1 0 0 3 2.236691 1.050373
6 0 2 0 0 3 2.236691 1.050373

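I would use data.table:
library(data.table)
setDT(M) # converts by reference; M must be a data.frame here (for a matrix, use M <- as.data.table(M))
M[, paste0("theta", 1:2) := as.data.table(t(theta[, d]))]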
> M
V1 V2 V3 V4 V5 theta1 theta2
1: 0 0 0 0 1 -1.2341141 0.4675928
2: 1 0 0 0 1 -1.2341141 0.4675928
3: 2 0 0 0 2 -0.6186437 1.5602801
4: 0 1 0 0 2 -0.6186437 1.5602801
5: 1 1 0 0 3 0.1233480 -0.3746259
6: 0 2 0 0 3 0.1233480 -0.3746259
We need as.data.table or as.data.frame because as.list destroys the dimensions of the matrix result, and := would just unlist what comes out of t(theta[, d]).
If M is really stored as a matrix (unclear, since you haven't shown how you named its dimensions), I recommend storing it as a data.table (or data.frame) with M <- data.table(M).
For completeness' sake, here's a solution purely in matrix notation:
M <- cbind(M, t(theta[, M[, "d"]]))
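A minimal, self-contained sketch of the matrix approach. It assumes theta holds the values printed in the question (typed in by hand here rather than regenerated from the seed) and that M carries the column names shown there:

```r
theta <- matrix(c(0.4418121, 1.962053, 2.236691,
                  1.0931398, 1.273616, 1.050373),
                nrow = 2, byrow = TRUE)
M <- matrix(c(0, 0, 0, 0, 1,
              1, 0, 0, 0, 1,
              2, 0, 0, 0, 2,
              0, 1, 0, 0, 2,
              1, 1, 0, 0, 3,
              0, 2, 0, 0, 3),
            nrow = 6, ncol = 5, byrow = TRUE,
            dimnames = list(NULL, c("prod11", "prod12", "prod21", "prod22", "d")))

# theta[, d] picks column d of theta once per row of M (a 2 x 6 matrix);
# transposing yields one (theta1, theta2) row per row of M.
picked <- t(theta[, M[, "d"]])
colnames(picked) <- paste0("theta", seq_len(nrow(theta)))
res <- cbind(M, picked)
res
```

Naming the new columns before cbind keeps the result aligned with the desired output in the question.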

With base R:
mat1 <- cbind(M, apply(theta, 1, function(x) x[M[, "d"]]))
colnames(mat1) <- c(colnames(M), paste0("theta", 1:nrow(theta)))
# prod11 prod12 prod21 prod22 d theta1 theta2
# [1,] 0 0 0 0 1 -0.893800723 -0.3073283
# [2,] 1 0 0 0 1 -0.893800723 -0.3073283
# [3,] 2 0 0 0 2 -0.004822422 0.9881641
# [4,] 0 1 0 0 2 -0.004822422 0.9881641
# [5,] 1 1 0 0 3 0.839750360 0.7053418
# [6,] 0 2 0 0 3 0.839750360 0.7053418
The core of the function is x[M[, "d"]]. As in Micheal's answer, we can subset one matrix by a vector in another. The vector is column "d" of M, M[, "d"]. If that column had a more randomized code we would set up a more robust lookup. But since it matches the column numbers of theta, we can use it directly.
I wrapped it with apply as it works well with matrices. The second argument, 1, indicates that the function should be carried out row-wise (x is equivalent to theta[1, ], then theta[2, ], and so on). If I chose 2, x would be equivalent to theta[, 1] and so on.
To match the column names to the desired output we use colnames (a possible pitfall is to attempt names() which works with data frames).
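If the "d" column held arbitrary codes rather than column numbers, one possible lookup is match(). The sketch below is only an illustration: theta_codes and the coded d vector are hypothetical and do not appear in the question.

```r
theta <- matrix(c(0.4418121, 1.962053, 2.236691,
                  1.0931398, 1.273616, 1.050373),
                nrow = 2, byrow = TRUE)

# Hypothetical: each column of theta is identified by an arbitrary code
theta_codes <- c(10, 25, 40)
# Hypothetical "d" column that uses those codes instead of 1, 2, 3
d <- c(10, 10, 25, 25, 40, 40)

# match() translates each code into the position of the matching theta column
idx <- match(d, theta_codes)
looked_up <- t(theta[, idx])
colnames(looked_up) <- paste0("theta", seq_len(nrow(theta)))
looked_up
```

A code that is missing from theta_codes gives NA in idx, so it is probably worth checking anyNA(idx) before subsetting.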

We can use merge():
theta <- t(theta) #transpose matrix
theta <- cbind(theta, seq_len(nrow(theta))) # add column "d" with row numbers
colnames(theta) <- c("theta1","theta2","d")
merge(M,theta)
# d prod11 prod12 prod21 prod22 theta1 theta2
#1 1 0 0 0 0 0.4418121 1.093140
#2 1 1 0 0 0 0.4418121 1.093140
#3 2 2 0 0 0 1.9620530 1.273616
#4 2 0 1 0 0 1.9620530 1.273616
#5 3 1 1 0 0 2.2366910 1.050370
#6 3 0 2 0 0 2.2366910 1.050370
data
M <- c(0 , 0 , 0 , 0 , 1,
1 , 0 , 0 , 0 , 1,
2 , 0 , 0 , 0 , 2,
0 , 1 , 0 , 0 , 2,
1 , 1 , 0 , 0 , 3,
0 , 2 , 0 , 0 , 3)
M <- as.data.frame(matrix(M, nrow = 6,ncol= 5,byrow=TRUE))
colnames(M) <- c( "prod11","prod12","prod21","prod22", "d")
theta <-matrix(c(0.4418121, 1.962053, 2.236691,1.0931398, 1.273616, 1.05037), byrow=TRUE, nrow=2)

Related

How can I create dummy variables from a numeric variable in R?

I want to create N dummy variables such that the numeric variable gives the number of leading zeros in each row, counting from the first column; the remaining columns are 1. For example, with N=6:
x
a 5
b 2
c 4
d 1
e 9
It must become:
1 2 3 4 5 6
a 0 0 0 0 0 1
b 0 0 1 1 1 1
c 0 0 0 0 1 1
d 0 1 1 1 1 1
e 0 0 0 0 0 0
Thank you!
Here's a hacky solution for you
x = c(5,2,4,1,9)
N = 6
out = matrix(1, length(x), N)
# zero out the first min(x[i], N) columns of row i (assumes every x[i] >= 1)
for (i in seq_along(x))
  out[i, 1:min(x[i], N)] = 0
> out
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 1
[2,] 0 0 1 1 1 1
[3,] 0 0 0 0 1 1
[4,] 0 1 1 1 1 1
[5,] 0 0 0 0 0 0
We could do this in a vectorized manner by building a row/column index and setting the indexed cells of a previously created matrix of 1s to 0:
m1 <- matrix(1, ncol = N, nrow = length(x),
dimnames = list(letters[seq_along(x)], seq_len(N)))
x1 <- pmin(x, ncol(m1))
m1[cbind(rep(seq_len(nrow(m1)), x1), sequence(x1))] <- 0
m1
# 1 2 3 4 5 6
#a 0 0 0 0 0 1
#b 0 0 1 1 1 1
#c 0 0 0 0 1 1
#d 0 1 1 1 1 1
#e 0 0 0 0 0 0
data
x <- c(5,2,4,1,9)
N <- 6
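Another way to express the same rule is as a comparison between each value of x and each column number. This is a sketch using base R's outer(), not something proposed in the answers above:

```r
x <- c(5, 2, 4, 1, 9)
N <- 6

# Cell (i, j) is 1 when column j lies beyond the first x[i] columns, else 0;
# the unary plus converts the logical matrix to integers.
out <- +outer(x, seq_len(N), "<")
dimnames(out) <- list(letters[seq_along(x)], seq_len(N))
out
```

Because nothing is indexed with 1:x[i], a value of 0 in x simply produces a row of 1s, whereas in the loop version 1:0 would count down and zero the first column.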

Replicate rows by value in column, change values to 1 or 0, in R

I have data structured as:
A B C D
3 2 1 1
I want it restructured as
A B C D
1 0 0 0
1 0 0 0
1 0 0 0
0 1 0 0
0 1 0 0
0 0 1 0
0 0 0 1
Any thoughts on how to do this in R? Many thanks.
If the input is a data.frame, you could do the following:
coln <- seq_along(df)
m = do.call(rbind, lapply(coln, function(i) {t(replicate(df[1,i], coln == i))})) +0
This will result in a matrix like this:
# [,1] [,2] [,3] [,4]
#[1,] 1 0 0 0
#[2,] 1 0 0 0
#[3,] 1 0 0 0
#[4,] 0 1 0 0
#[5,] 0 1 0 0
#[6,] 0 0 1 0
#[7,] 0 0 0 1
You can then convert it to a data.frame or set column names if you like.
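A shorter variant of the same idea, offered as a sketch: row k of an identity matrix is already the indicator row for column k, so it is enough to repeat each row index by its count. The data frame is the one from the question.

```r
df <- data.frame(A = 3, B = 2, C = 1, D = 1)

# Repeat row index k as many times as the count stored in column k,
# then pull those rows out of the identity matrix.
m <- diag(ncol(df))[rep(seq_along(df), unlist(df)), , drop = FALSE]
colnames(m) <- names(df)
m
```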
Here is an option using dcast
library(data.table)
nm1 <- rep(names(df1), unlist(df1))
dcast(data.table(nm1, v1 = seq_along(nm1)), v1 ~ nm1, length)[, v1 := NULL][]
# A B C D
#1: 1 0 0 0
#2: 1 0 0 0
#3: 1 0 0 0
#4: 0 1 0 0
#5: 0 1 0 0
#6: 0 0 1 0
#7: 0 0 0 1
Or after creating the 'nm1', use model.matrix from base R
model.matrix(~-1 + nm1)
or in a single line
model.matrix(~ -1 + rep(names(df1), unlist(df1)))
and change the column names
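One way the renaming step might look, as a sketch: model.matrix() prefixes each level with the variable name, so stripping that prefix recovers the original names.

```r
df1 <- data.frame(A = 3, B = 2, C = 1, D = 1)
nm1 <- rep(names(df1), unlist(df1))

mm <- model.matrix(~ -1 + nm1)
# Columns come out as "nm1A", "nm1B", ...; drop the "nm1" prefix
colnames(mm) <- sub("^nm1", "", colnames(mm))
mm
```

The columns follow the sorted factor levels, so if the original column names are not in alphabetical order the result may need reordering, e.g. with mm[, names(df1)].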
data
df1 <- data.frame(A = 3, B = 2, C = 1, D = 1)

How can I create this special sequence?

I would like to create the following vector sequence.
0 1 0 0 2 0 0 0 3 0 0 0 0 4
My thought was to create the 0s first with rep(), but I'm not sure how to add the 1:4.
Create a diagonal matrix, take the upper triangle, and remove the first element:
d <- diag(0:4)
d[upper.tri(d, TRUE)][-1L]
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
If you prefer a one-liner that makes no global assignments, wrap it up in a function:
(function() { d <- diag(0:4); d[upper.tri(d, TRUE)][-1L] })()
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
And for code golf purposes, here's another variation using d from above:
d[!lower.tri(d)][-1L]
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
rep and rbind up to their old tricks:
rep(rbind(0,1:4),rbind(1:4,1))
#[1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
This essentially creates 2 matrices, one for the value, and one for how many times the value is repeated. rep does not care if an input is a matrix, as it will just flatten it back to a vector going down each column in order.
rbind(0,1:4)
# [,1] [,2] [,3] [,4]
#[1,] 0 0 0 0
#[2,] 1 2 3 4
rbind(1:4,1)
# [,1] [,2] [,3] [,4]
#[1,] 1 2 3 4
#[2,] 1 1 1 1
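To see the flattening step in isolation, the two matrices can be unrolled with c() before being passed to rep(). The names values and times are only labels chosen for this sketch.

```r
values <- rbind(0, 1:4)
times  <- rbind(1:4, 1)

# c() flattens a matrix column by column, which is what rep() sees
c(values)
# [1] 0 1 0 2 0 3 0 4
c(times)
# [1] 1 1 2 1 3 1 4 1

# Each value is repeated by the count in the same position
res <- rep(c(values), c(times))
res
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
```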
You can use rep() to create a sequence that has n + 1 of each value:
n <- 4
myseq <- rep(seq_len(n), seq_len(n) + 1)
# [1] 1 1 2 2 2 3 3 3 3 4 4 4 4 4
Then you can use diff() to find the elements you want. You need to append a 1 to the end of the diff() output, since you always want the last value.
c(diff(myseq), 1)
# [1] 0 1 0 0 1 0 0 0 1 0 0 0 0 1
Then you just need to multiply the original sequence with the diff() output.
myseq <- myseq * c(diff(myseq), 1)
myseq
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
Or with lapply:
unlist(lapply(1:4, function(i) c(rep(0, i), i)))
# the sequence
s = 1:4
# create zeros vector
vec = rep(0, sum(s+1))
# assign the sequence to the corresponding position in the zeros vector
vec[cumsum(s+1)] <- s
vec
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
Or to be more succinct, use replace:
replace(rep(0, sum(s+1)), cumsum(s+1), s)
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
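The replace() version generalizes to any n if it is wrapped in a function. The helper name zero_run_seq below is hypothetical, introduced only for this sketch.

```r
# Hypothetical helper: for i = 1..n, emit i zeros followed by the value i
zero_run_seq <- function(n) {
  s <- seq_len(n)
  # block i has length i + 1, so its last position is cumsum(s + 1)[i]
  replace(rep(0, sum(s + 1)), cumsum(s + 1), s)
}

zero_run_seq(4)
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
zero_run_seq(2)
# [1] 0 1 0 0 2
```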

How to transform a directed Dataset into a Matrix with R

I have a Dataset in R which looks like this:
ID LinkedTo
1 Null
2 1
3 1
4 3
5 4
I want transform it into a Matrix which looks similar to this:
0 0 0 0 0
1 0 0 0 0
1 0 0 0 0
0 0 1 0 0
0 0 0 1 0
Another option is to model your directed dataset as a directed graph and extract its adjacency matrix.
library(igraph)
dat <- read.table(text='ID LinkedTo
2 1
3 1
4 3
5 4',header=TRUE)
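gg <- graph_from_data_frame(dat) # called graph.data.frame() in older igraph releases
as_adjacency_matrix(gg, sparse = FALSE) # called get.adjacency() in older igraph releases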
2 3 4 5 1
2 0 0 0 0 1
3 0 0 0 0 1
4 0 1 0 0 0
5 0 0 1 0 0
1 0 0 0 0 0
It's more convenient if you replace "Null" by NA in your dataset. Something like
i <- structure(list(ID = c(1, 2, 3, 4, 5),
LinkedTo = c(NA, 1, 1, 3, 4)),
.Names = c("ID", "LinkedTo"),
row.names = c(NA, -5L), class = "data.frame")
i
# ID LinkedTo
# 1 1 NA
# 2 2 1
# 3 3 1
# 4 4 3
# 5 5 4
Then you can do
m <- matrix(0, nrow(i), nrow(i))
m[i$ID + (i$LinkedTo - 1) * nrow(i)] <- 1
(It would work the same way if i were a matrix, but you would have to change i$ID and i$LinkedTo to i[, 1] and i[, 2] respectively.)
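The index arithmetic relies on R storing matrices column by column: cell (row, col) of a matrix with n rows sits at linear position row + (col - 1) * n. The sketch below compares that with plain (row, col) indexing; dropping the NA row explicitly via ok is a choice made here for clarity, not part of the answer.

```r
i <- data.frame(ID = 1:5, LinkedTo = c(NA, 1, 1, 3, 4))
n <- nrow(i)
ok <- !is.na(i$LinkedTo)  # rows that actually link to something

# Linear (column-major) index of each (ID, LinkedTo) cell
m_lin <- matrix(0, n, n)
m_lin[i$ID[ok] + (i$LinkedTo[ok] - 1) * n] <- 1

# The same cells addressed as (row, col) pairs
m_rc <- matrix(0, n, n)
m_rc[cbind(i$ID[ok], i$LinkedTo[ok])] <- 1

identical(m_lin, m_rc)
# [1] TRUE
```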
You can start by replacing the nulls with zeros, I think.
Then you can do a little for loop (shown here on random example data):
df <- data.frame(id = 1:5, pos = sample(1:5))
m <- matrix(0, nrow = nrow(df), ncol = max(df$id))
for (i in seq_len(nrow(df))) {
  m[i, df$pos[i]] <- 1
}
Using @konvas's i dataset
i[,2][is.na(i[,2])] <- 0
m <- matrix(0, nrow(i), nrow(i))
m[as.matrix(i)] <- 1
m
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0 0 0 0 0
#[2,] 1 0 0 0 0
#[3,] 1 0 0 0 0
#[4,] 0 0 1 0 0
#[5,] 0 0 0 1 0
table should also work if you combine it with factor. (I say "should" because your conditions aren't clearly specified and your sample data are not reproducible.)
Using @konvas's "i" sample data, try:
> table(i$ID, factor(i$LinkedTo, 1:5))
1 2 3 4 5
1 0 0 0 0 0
2 1 0 0 0 0
3 1 0 0 0 0
4 0 0 1 0 0
5 0 0 0 1 0
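The factor() call is what keeps the result square. A small sketch of the difference, using the same sample data (the names t_short and t_full are just labels for this example):

```r
i <- data.frame(ID = 1:5, LinkedTo = c(NA, 1, 1, 3, 4))

# Without explicit levels, only the observed LinkedTo values get a column
t_short <- table(i$ID, i$LinkedTo)
ncol(t_short)
# [1] 3

# With levels 1:5, every possible target gets a column, even if unused
t_full <- table(i$ID, factor(i$LinkedTo, 1:5))
dim(t_full)
# [1] 5 5
```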

Using loop to make column selections using different vectors

Let's say I have 3 vectors (each of length 10):
X <- c(1,1,0,1,0, 1,1, 0, NA,NA)
H <- c(0,0,1,0,NA,1,NA,1, 1, 1 )
I <- c(0,0,0,0,0, 1,NA,NA,NA,1 )
Data.frame Y contains 10 columns and 6 rows:
1 2 3 4 5 6 7 8 9 10
0 1 0 0 1 1 1 0 1 0
1 1 1 0 1 0 1 0 0 0
0 0 0 0 1 0 0 1 0 1
1 0 1 1 0 1 1 1 0 0
0 0 0 0 0 0 1 0 0 0
1 1 0 1 0 0 0 0 1 1
I'd like to use vectors X, H and I to make column selections in data.frame Y, using the 1s and 0s in each vector as the selection criterion.
So the result for vector X, using 1 as the selection criterion, should be:
X <- c(1,1,0,1,0, 1,1, 0, NA,NA)
1 2 4 6 7
0 1 0 1 1
1 1 0 0 1
0 0 0 0 0
1 0 1 1 1
0 0 0 0 1
1 1 1 0 0
For vector H, using 1 as the selection criterion:
H <- c(0,0,1,0,NA,1,NA,1, 1, 1 )
3 6 8 9 10
0 1 0 1 0
1 0 0 0 0
0 0 1 0 1
1 1 1 0 0
0 0 0 0 0
0 0 0 1 1
For vector I, using 1 as the selection criterion:
I <- c(0,0,0,0,0, 1,NA,NA,NA,1 )
6 10
1 0
0 0
0 1
1 0
0 0
0 1
For convenience and speed I'd like to use a loop. It might be something like this:
all.ones <- lapply[,function(x) x %in% 1]
In the outcome (all.ones), the result for each vector should stay separate. For example:
X 1,2,4,6,7
H 3,6,8,9,10
I 6,10
The standard way of doing this is using the %in% operator:
Y[, X %in% 1]
To do this for multiple vectors (assuming you want an AND operation):
mylist = list(X, H, I, D, E, K)
Y[, Reduce(`&`, lapply(mylist, function(x) x %in% 1))]
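If the results should instead stay separate per vector, as the question describes, one option is to keep the vectors in a named list and apply the same %in% test to each. This is a sketch built from the question's data; the name selections is introduced only for illustration.

```r
X <- c(1, 1, 0, 1, 0, 1, 1, 0, NA, NA)
H <- c(0, 0, 1, 0, NA, 1, NA, 1, 1, 1)
I <- c(0, 0, 0, 0, 0, 1, NA, NA, NA, 1)

Y <- as.data.frame(matrix(c(0, 1, 0, 0, 1, 1, 1, 0, 1, 0,
                            1, 1, 1, 0, 1, 0, 1, 0, 0, 0,
                            0, 0, 0, 0, 1, 0, 0, 1, 0, 1,
                            1, 0, 1, 1, 0, 1, 1, 1, 0, 0,
                            0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
                            1, 1, 0, 1, 0, 0, 0, 0, 1, 1),
                          nrow = 6, byrow = TRUE))
names(Y) <- as.character(1:10)

# Column positions holding a 1, kept separate per vector (NA counts as no match)
all.ones <- lapply(list(X = X, H = H, I = I), function(v) which(v %in% 1))
all.ones

# One column selection of Y per vector; drop = FALSE keeps a data.frame
selections <- lapply(all.ones, function(idx) Y[, idx, drop = FALSE])
selections$I
```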
The problem is the NA values: comparing NA with == returns NA, and an NA index returns NA. Use which() to get around it. Consider the following:
x <- c(1,0,1,NA)
x[x==1]
[1] 1 1 NA
x[which(x==1)]
[1] 1 1
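The difference comes from what each test returns for the NA element, which this small sketch makes visible (the names eq, inn and pos are just labels here):

```r
x <- c(1, 0, 1, NA)

eq  <- x == 1      # NA propagates through the comparison
eq
# [1]  TRUE FALSE  TRUE    NA

inn <- x %in% 1    # %in% never returns NA
inn
# [1]  TRUE FALSE  TRUE FALSE

pos <- which(x == 1)  # which() keeps only the TRUE positions
pos
# [1] 1 3
```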
How about this?
idx <- which(X==1)
Y[,idx]
EDIT: For six vectors, do
idx <- which(X==1 & H==1 & I==1 & D==1 & E==1 & K==1)
Y[,idx]
Replace & with | if you want all columns of Y where at least one of the vectors has a 1.
