How to transform a directed data set into a matrix with R

I have a data set in R that looks like this:
ID LinkedTo
1 Null
2 1
3 1
4 3
5 4
I want to transform it into a matrix that looks like this:
0 0 0 0 0
1 0 0 0 0
1 0 0 0 0
0 0 1 0 0
0 0 0 1 0

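Another option is to model your directed data set as a directed graph and extract its adjacency matrix.
library(igraph)
# Each row is an edge from ID to LinkedTo; the root row (no link) is left out
dat <- read.table(text = 'ID LinkedTo
2 1
3 1
4 3
5 4', header = TRUE)
# graph_from_data_frame() is the newer name for graph.data.frame()
gg <- graph_from_data_frame(dat)
# as_adjacency_matrix() replaces get.adjacency(); sparse = FALSE returns a base R matrix
as_adjacency_matrix(gg, sparse = FALSE)
#   2 3 4 5 1
# 2 0 0 0 0 1
# 3 0 0 0 0 1
# 4 0 1 0 0 0
# 5 0 0 1 0 0
# 1 0 0 0 0 0
(Rows and columns follow the order in which vertices first appear in the edge list, which is why vertex 1 comes last.)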

It's more convenient to replace "Null" with NA in your data set, for example:
i <- structure(list(ID = c(1, 2, 3, 4, 5),
LinkedTo = c(NA, 1, 1, 3, 4)),
.Names = c("ID", "LinkedTo"),
row.names = c(NA, -5L), class = "data.frame")
i
# ID LinkedTo
# 1 1 NA
# 2 2 1
# 3 3 1
# 4 4 3
# 5 5 4
Then you can do:
m <- matrix(0, nrow(i), nrow(i))
# Linear index of cell (ID, LinkedTo); NA indices select nothing when assigning a single value
m[i$ID + (i$LinkedTo - 1) * nrow(i)] <- 1
(This works the same way if i is a matrix, but you would have to replace i$ID and i$LinkedTo with i[, 1] and i[, 2], respectively.)
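The linear index works because R stores a matrix column by column: in an n-row matrix, cell (r, c) sits at position r + (c - 1) * n. A minimal sketch wrapping that arithmetic in a helper, then cross-checking it against assignment with a two-column (row, column) index matrix. links_to_matrix() is a hypothetical name, and this version drops NA indices explicitly instead of relying on R skipping them:

```r
# links_to_matrix() is a hypothetical helper, not part of any package
links_to_matrix <- function(id, linked_to, n = length(id)) {
  m <- matrix(0, n, n)
  # Column-major storage: cell (row, col) has linear index row + (col - 1) * n
  idx <- id + (linked_to - 1) * n
  # Rows with no link produce NA indices; drop them before assigning
  m[idx[!is.na(idx)]] <- 1
  m
}

m <- links_to_matrix(id = 1:5, linked_to = c(NA, 1, 1, 3, 4))

# The same cells addressed with a two-column (row, col) index matrix
m2 <- matrix(0, 5, 5)
m2[cbind(2:5, c(1, 1, 3, 4))] <- 1
identical(m, m2)
# [1] TRUE
```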

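I think you can start by replacing the Null values with zeros.
Then you can do a little for loop:
# The question's data, with "Null" replaced by 0
df <- data.frame(id = 1:5, pos = c(0, 1, 1, 3, 4))
m <- matrix(0, nrow = nrow(df), ncol = max(df$id))
for (i in seq_len(nrow(df))) {
  # A column index of 0 selects nothing, so the first row stays all zeros
  m[i, df$pos[i]] <- 1
}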

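Using @konvas's i data set:
# Replace NA (no link) with 0
i[, 2][is.na(i[, 2])] <- 0
m <- matrix(0, nrow(i), nrow(i))
# Each row of as.matrix(i) is a (row, column) pair; pairs containing a 0 are ignored
m[as.matrix(i)] <- 1
m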
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0 0 0 0 0
#[2,] 1 0 0 0 0
#[3,] 1 0 0 0 0
#[4,] 0 0 1 0 0
#[5,] 0 0 0 1 0
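The assignment m[as.matrix(i)] <- 1 works because, when a matrix is indexed by a two-column numeric matrix, index rows that contain a 0 select nothing, so the root row (LinkedTo = 0) is simply skipped. A small check of that rule on made-up data:

```r
m <- matrix(0, 3, 3)
# Each row of idx is a (row, col) pair; the first pair contains a 0
idx <- rbind(c(1, 0),
             c(2, 1),
             c(3, 2))
m[idx] <- 1
m
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    1    0    0
# [3,]    0    1    0
```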

table() should also work if you combine it with factor(). (I say "should" because your conditions aren't clearly specified and your sample data are not reproducible.)
Using @konvas's i sample data, try:
table(i$ID, factor(i$LinkedTo, levels = 1:5))
#     1 2 3 4 5
#   1 0 0 0 0 0
#   2 1 0 0 0 0
#   3 1 0 0 0 0
#   4 0 0 1 0 0
#   5 0 0 0 1 0
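Wrapping LinkedTo in factor() with levels = 1:5 is what makes the result square: table() creates one column per factor level, and a plain numeric vector only gets levels for the values that actually occur. A quick comparison, rebuilding the example vectors directly (id and linked_to are illustrative names):

```r
id <- 1:5
linked_to <- c(NA, 1, 1, 3, 4)

# Without fixed levels, only the values that occur (1, 3, 4) become columns
dim(table(id, linked_to))
# [1] 5 3

# factor(..., levels = 1:5) keeps a column for every ID, even unused ones
tab <- table(id, factor(linked_to, levels = 1:5))
dim(tab)
# [1] 5 5
```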

Related

How can I create dummy variables from a numeric variable in R?

I want to create N dummy variables, where the numeric variable gives the number of leading zeros, counting from the first column (the remaining columns are 1). For example, with N = 6 and this input:
x
a 5
b 2
c 4
d 1
e 9
It must become:
1 2 3 4 5 6
a 0 0 0 0 0 1
b 0 0 1 1 1 1
c 0 0 0 0 1 1
d 0 1 1 1 1 1
e 0 0 0 0 0 0
Thank you!
Here's a hacky solution for you
x = c(5,2,4,1,9)
N = 6
out = matrix(1, length(x), N)
# seq_len() also handles x[i] == 0 correctly (1:0 would zero out column 1)
for (i in seq_along(x))
  out[i, seq_len(min(x[i], N))] = 0
> out
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 1
[2,] 0 0 1 1 1 1
[3,] 0 0 0 0 1 1
[4,] 0 1 1 1 1 1
[5,] 0 0 0 0 0 0
We could do this in a vectorized manner: build a row/column index matrix and use it to set the matching cells of a pre-built matrix of 1s to 0.
m1 <- matrix(1, ncol = N, nrow = length(x),
dimnames = list(letters[seq_along(x)], seq_len(N)))
x1 <- pmin(x, ncol(m1))
m1[cbind(rep(seq_len(nrow(m1)), x1), sequence(x1))] <- 0
m1
# 1 2 3 4 5 6
#a 0 0 0 0 0 1
#b 0 0 1 1 1 1
#c 0 0 0 0 1 1
#d 0 1 1 1 1 1
#e 0 0 0 0 0 0
data
x <- c(5,2,4,1,9)
N <- 6
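To see what the vectorized answer is doing, it helps to print its index matrix for a smaller made-up input that includes a 0. The last line is a cross-check using outer(), a technique not used in the answers above: a cell stays 1 exactly when its column number exceeds x[i].

```r
x <- c(2, 0, 3)
N <- 3
x1 <- pmin(x, N)

# Row i is repeated x1[i] times; sequence() yields 1..x1[i] for each i
idx <- cbind(row = rep(seq_along(x1), x1), col = sequence(x1))
idx
#      row col
# [1,]   1   1
# [2,]   1   2
# [3,]   3   1
# [4,]   3   2
# [5,]   3   3

m <- matrix(1, length(x), N)
m[idx] <- 0

# Cross-check: cell (i, j) is 1 exactly when x[i] < j
identical(m, 1 * outer(x, seq_len(N), "<"))
# [1] TRUE
```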

Shifting rows in R

My data is as follows:
1 2 3 4 5
0 1 2 3 4
0 0 1 2 3
0 0 0 0 1
0 0 0 0 1
How can I make the data so that it will look like this:
1 2 3 4 5
1 2 3 4 0
1 2 3 0 0
0 1 0 0 0
1 0 0 0 0
That is, the first row doesn't shift, the second row is shifted left by 1, the third by 2, the fourth by 3, and the last by 4.
As a first step, I tried shifting all the rows below the first one to the left by 1, but it doesn't work:
nc <- ncol(df)
df[-(1), 2:nc] <- df[-(1), 2:(nc+1)]
df[-(1), 10] <- 0
df
You can use the shift function from data.table with fill = 0. If you want the output as a data.frame, put data.frame() around the last line.
mat <- as.matrix(df)
library(data.table)
t(sapply(seq_len(nrow(mat)), function(i) shift(mat[i, ], n = i - 1, fill = 0, type = "lead")))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 2 3 4 5
# [2,] 1 2 3 4 0
# [3,] 1 2 3 0 0
# [4,] 0 1 0 0 0
# [5,] 1 0 0 0 0
A base R option:
m <- as.matrix(read.table(text = "1 2 3 4 5
0 1 2 3 4
0 0 1 2 3
0 0 0 0 1
0 0 0 0 1"))
do.call(rbind, lapply(seq_len(nrow(m)),
  function(i) c(m[i, i:ncol(m)], rep(0, i - 1))))
# V1 V2 V3 V4 V5
#[1,] 1 2 3 4 5
#[2,] 1 2 3 4 0
#[3,] 1 2 3 0 0
#[4,] 0 1 0 0 0
#[5,] 1 0 0 0 0
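Both answers above work row by row. A vectorized alternative, not from the original answers and sketched here for comparison, computes for every output cell which input column it comes from: row i is shifted left by i - 1, so output cell (i, j) takes m[i, j + i - 1], or 0 when that column falls off the right edge. It assumes m is a numeric matrix:

```r
m <- matrix(c(1, 2, 3, 4, 5,
              0, 1, 2, 3, 4,
              0, 0, 1, 2, 3,
              0, 0, 0, 0, 1,
              0, 0, 0, 0, 1), nrow = 5, byrow = TRUE)

rows <- row(m)
# Output cell (i, j) is read from column j + i - 1 of the same row
src <- col(m) + rows - 1
# Only cells whose source column lies inside the matrix get a value
keep <- src <= ncol(m)

out <- matrix(0, nrow(m), ncol(m))
out[keep] <- m[cbind(rows[keep], src[keep])]
out
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    2    3    4    5
# [2,]    1    2    3    4    0
# [3,]    1    2    3    0    0
# [4,]    0    1    0    0    0
# [5,]    1    0    0    0    0
```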

R: Generating sparse matrix with all elements as rows and columns

I have a user-to-user data set. Not every user appears as both a row and a column. For example:
U1 U2 T
1 3 1
1 6 1
2 4 1
3 5 1
U1 and U2 represent users of the data set. When I create a sparse matrix with the following code (df is a data frame holding the data above):
trustmatrix <- xtabs(T~U1+U2,df,sparse = TRUE)
3 4 5 6
1 1 0 0 1
2 0 1 0 0
3 0 0 1 0
But this matrix doesn't include all the users as rows and columns, like this:
1 2 3 4 5 6
1 0 0 1 0 0 1
2 0 0 0 1 0 0
3 0 0 0 0 1 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
6 0 0 0 0 0 0
How can I get the matrix above from the sparse matrix in R?
We can convert the columns to factors with levels 1 through 6 and then use xtabs (the data frame is called df1 here):
df1[1:2] <- lapply(df1[1:2], factor, levels = 1:6)
as.matrix(xtabs(T~U1+U2,df1,sparse = TRUE))
# U2
#U1 1 2 3 4 5 6
# 1 0 0 1 0 0 1
# 2 0 0 0 1 0 0
# 3 0 0 0 0 1 0
# 4 0 0 0 0 0 0
# 5 0 0 0 0 0 0
# 6 0 0 0 0 0 0
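Another option is to expand to every (U1, U2) pair, fill the missing T values with 0, and then use sparseMatrix. This assumes df1 still has its original numeric U1 and U2 columns (not the factors from the previous step):
library(tidyverse)
library(Matrix)
df2 <- crossing(U1 = 1:6, U2 = 1:6) %>%
  left_join(df1, by = c("U1", "U2")) %>%
  mutate(T = replace(T, is.na(T), 0))
sparseMatrix(i = df2$U1, j = df2$U2, x = df2$T)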
Or use spread() (superseded by pivot_wider() in current tidyr):
spread(df2, U2, T)
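If only the sparse matrix is needed, sparseMatrix() from the Matrix package also takes a dims argument, so users that never appear in the data get empty rows and columns without first expanding to every pair. A sketch, assuming user IDs are the integers 1 to max(ID) as in the example:

```r
library(Matrix)

df1 <- data.frame(U1 = c(1, 1, 2, 3),
                  U2 = c(3, 6, 4, 5),
                  T  = c(1, 1, 1, 1))

# Assumption: user IDs are the integers 1..n_users
n_users <- max(df1$U1, df1$U2)
ids <- as.character(seq_len(n_users))

trust <- sparseMatrix(i = df1$U1, j = df1$U2, x = df1$T,
                      dims = c(n_users, n_users),
                      dimnames = list(ids, ids))

# Dense view for inspection; keep `trust` sparse for large data
as.matrix(trust)
```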

Replicate rows by value in column, change values to 1 or 0, in R

I have data structured as:
A B C D
3 2 1 1
I want it restructured as
A B C D
1 0 0 0
1 0 0 0
1 0 0 0
0 1 0 0
0 1 0 0
0 0 1 0
0 0 0 1
Any thoughts on how to do this in R? Many thanks.
If the input is a data.frame, you could do the following:
coln <- seq_along(df)
m = do.call(rbind, lapply(coln, function(i) {t(replicate(df[1,i], coln == i))})) +0
This will result in a matrix like this:
# [,1] [,2] [,3] [,4]
#[1,] 1 0 0 0
#[2,] 1 0 0 0
#[3,] 1 0 0 0
#[4,] 0 1 0 0
#[5,] 0 1 0 0
#[6,] 0 0 1 0
#[7,] 0 0 0 1
You can then convert it to a data.frame or set column names if you like.
Here is an option using dcast
library(data.table)
nm1 <- rep(names(df1), unlist(df1))
dcast(data.table(nm1, v1 = seq_along(nm1)), v1 ~ nm1, length)[, v1 := NULL][]
# A B C D
#1: 1 0 0 0
#2: 1 0 0 0
#3: 1 0 0 0
#4: 0 1 0 0
#5: 0 1 0 0
#6: 0 0 1 0
#7: 0 0 0 1
Or, after creating nm1, use model.matrix() from base R:
model.matrix(~ -1 + nm1)
or, in a single line:
model.matrix(~ -1 + rep(names(df1), unlist(df1)))
and then change the column names, which model.matrix() prefixes with the variable or expression (e.g. nm1A).
data
df1 <- data.frame(A = 3, B = 2, C = 1, D = 1)
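A base R alternative, not among the answers above: each output row is a row of an identity matrix, so you can repeat the rows of diag() by the counts. A sketch using the same df1:

```r
df1 <- data.frame(A = 3, B = 2, C = 1, D = 1)

# Named counts from the single row: A = 3, B = 2, C = 1, D = 1
counts <- unlist(df1[1, ])
# Row k of the identity matrix is the indicator vector for column k
ident <- diag(length(counts))
out <- ident[rep(seq_along(counts), counts), , drop = FALSE]
colnames(out) <- names(counts)
out
#      A B C D
# [1,] 1 0 0 0
# [2,] 1 0 0 0
# [3,] 1 0 0 0
# [4,] 0 1 0 0
# [5,] 0 1 0 0
# [6,] 0 0 1 0
# [7,] 0 0 0 1
```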

How can I create this special sequence?

I would like to create the following vector sequence.
0 1 0 0 2 0 0 0 3 0 0 0 0 4
My idea was to create the zeros first with rep(), but I'm not sure how to add the 1:4.
Create a diagonal matrix, take the upper triangle, and remove the first element:
d <- diag(0:4)
d[upper.tri(d, TRUE)][-1L]
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
If you prefer a one-liner that makes no global assignments, wrap it up in a function:
(function() { d <- diag(0:4); d[upper.tri(d, TRUE)][-1L] })()
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
And for code golf purposes, here's another variation using d from above:
d[!lower.tri(d)][-1L]
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
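Why the diag() trick works: R reads the upper triangle (diagonal included) column by column, so column k contributes k - 1 zeros followed by the diagonal value k - 1, and dropping the lone 0 from column 1 leaves the requested pattern. A sketch generalizing it to any n; padded_seq_diag() is a hypothetical name:

```r
padded_seq_diag <- function(n) {
  d <- diag(0:n)
  # Column k of the upper triangle holds k - 1 zeros, then the value k - 1
  d[upper.tri(d, diag = TRUE)][-1L]
}

padded_seq_diag(4)
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
padded_seq_diag(2)
# [1] 0 1 0 0 2
```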
rep and rbind up to their old tricks:
rep(rbind(0,1:4),rbind(1:4,1))
#[1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
This essentially creates two matrices: one for the values and one for how many times each value is repeated. rep() doesn't care that its inputs are matrices; it flattens them back to vectors, going down each column in order.
rbind(0,1:4)
# [,1] [,2] [,3] [,4]
#[1,] 0 0 0 0
#[2,] 1 2 3 4
rbind(1:4,1)
# [,1] [,2] [,3] [,4]
#[1,] 1 2 3 4
#[2,] 1 1 1 1
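To see the pairing, print both matrices as the vectors rep() actually receives: value 0 is repeated k times right before value k. The one-liner then generalizes to any n >= 1; padded_seq_rep() is a hypothetical helper name:

```r
values <- rbind(0, 1:4)
times  <- rbind(1:4, 1)

# rep() sees both matrices as vectors read down each column
as.vector(values)
# [1] 0 1 0 2 0 3 0 4
as.vector(times)
# [1] 1 1 2 1 3 1 4 1

# Hypothetical helper; assumes n >= 1 (rbind() drops the empty seq_len(0))
padded_seq_rep <- function(n) rep(rbind(0, seq_len(n)), rbind(seq_len(n), 1))
padded_seq_rep(4)
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
```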
You can use rep() to create a sequence that has n + 1 of each value:
n <- 4
myseq <- rep(seq_len(n), seq_len(n) + 1)
# [1] 1 1 2 2 2 3 3 3 3 4 4 4 4 4
Then you can use diff() to find the elements you want. You need to append a 1 to the end of the diff() output, since you always want the last value.
c(diff(myseq), 1)
# [1] 0 1 0 0 1 0 0 0 1 0 0 0 0 1
Then you just need to multiply the original sequence with the diff() output.
myseq <- myseq * c(diff(myseq), 1)
myseq
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
unlist(lapply(1:4, function(i) c(rep(0,i),i)))
# the sequence
s = 1:4
# create zeros vector
vec = rep(0, sum(s+1))
# assign the sequence to the corresponding position in the zeros vector
vec[cumsum(s+1)] <- s
vec
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
Or to be more succinct, use replace:
replace(rep(0, sum(s+1)), cumsum(s+1), s)
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
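The positions come from cumsum(s + 1): value k is preceded by k zeros, so it lands at 2 + 3 + ... + (k + 1) = k(k + 3)/2, i.e. 2, 5, 9 and 14 for s = 1:4. A sketch wrapping the replace() version in a hypothetical function and checking that closed form:

```r
padded_seq_replace <- function(n) {
  s <- seq_len(n)
  # Positions 2, 5, 9, 14, ... for n = 4
  pos <- cumsum(s + 1)
  replace(numeric(sum(s + 1)), pos, s)
}

padded_seq_replace(4)
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4

# The positions match the closed form k * (k + 3) / 2
k <- 1:10
all(cumsum(k + 1) == k * (k + 3) / 2)
# [1] TRUE
```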
