Reshape a data frame into a wide shape - r

The data contains two variables: id and grade. Each id can have multiple records
for each grade.
dat <- data.frame(id = c(1,1,1,2,2,2,2,3,3,4,5,5,5),
grade = c("a", "b", "c", "a", "a", "b", "b", "d", "f", "c", "a", "e", "f"))
I want to reshape the data into a wide shape such that each id has only one record
and each unique grade becomes a single column. The value of each column is either 0 or 1,
depending on the grades for each id.
The final data set looks like:
id a b c d e f
1 1 1 1 0 0 0
2 1 1 0 0 0 0
3 0 0 0 1 0 1
4 0 0 1 0 0 0
5 1 0 0 0 1 1
I tried this, but no luck.
n.dat <- reshape(dat, timevar = "grade",idvar = c("id"),direction = "wide")

You could simply tabulate the values with table, convert the counts to logical with a > 0 condition, and then convert back to numeric using the unary + operator (or, if you want something less golfed, by adding 0):
+(table(dat) > 0)
# grade
# id a b c d e f
# 1 1 1 1 0 0 0
# 2 1 1 0 0 0 0
# 3 0 0 0 1 0 1
# 4 0 0 1 0 0 0
# 5 1 0 0 0 1 1
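The question asked for a data set with an id column, whereas the expression above returns a matrix-like table. Here is a minimal sketch of one way to bridge that gap, assuming the dat from the question. The names counts, flags and wide are only illustrative.

```r
dat <- data.frame(id = c(1,1,1,2,2,2,2,3,3,4,5,5,5),
                  grade = c("a", "b", "c", "a", "a", "b", "b", "d", "f", "c", "a", "e", "f"))

counts <- table(dat)      # id x grade contingency table of counts
flags  <- +(counts > 0)   # logical -> 0/1 indicator

# Turn the 0/1 table into a data frame and put the ids back as a column
wide <- cbind(id = as.numeric(rownames(flags)), as.data.frame.matrix(flags))
rownames(wide) <- NULL
wide
```

as.data.frame.matrix is called directly here because as.data.frame on a table would return the long (one row per cell) layout instead.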

Related

Transform relationship pairs into a matrix

I have one data frame like this. The id of each line is unique and the type defines the group of the id.
id type
a a1
b a1
c a2
d a3
e a4
f a4
I want to make a matrix like the one below. The value would be 1 if the two ids belong to the same type, otherwise 0.
a b c d e f
a 1 1 0 0 0 0
b 1 1 0 0 0 0
c 0 0 1 0 0 0
d 0 0 0 1 0 0
e 0 0 0 0 1 1
f 0 0 0 0 1 1
The data frame is large (over 70 thousand lines), and I do not know how to do this efficiently in R. Any suggestions would be appreciated.
Here is a base R solution. You can use
M <- crossprod(t(table(df)))
or
M <- crossprod(table(rev(df)))
such that
> M
id
id a b c d e f
a 1 1 0 0 0 0
b 1 1 0 0 0 0
c 0 0 1 0 0 0
d 0 0 0 1 0 0
e 0 0 0 0 1 1
f 0 0 0 0 1 1
DATA
df <- structure(list(id = c("a", "b", "c", "d", "e", "f"), type = c("a1",
"a1", "a2", "a3", "a4", "a4")), class = "data.frame", row.names = c(NA,
-6L))
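Why the cross-product works: table(df) is an id x type incidence matrix, and multiplying it by its own transpose counts, for every pair of ids, the types they share. A minimal sketch that spells this out and checks it against a direct comparison, assuming each id has exactly one type as in the example (inc and same are illustrative names):

```r
df <- data.frame(id = c("a", "b", "c", "d", "e", "f"),
                 type = c("a1", "a1", "a2", "a3", "a4", "a4"))

inc <- table(df)        # rows = id, columns = type, entries 0/1
M <- tcrossprod(inc)    # inc %*% t(inc): id x id, 1 where the types match

# Direct (but O(n^2) in comparisons) check of the same relation
same <- +(outer(df$type, df$type, "=="))
dimnames(same) <- list(df$id, df$id)
all(unname(M) == unname(same))
```

Note that the result is a dense n x n matrix, so with roughly 70 thousand ids it would have about 4.9 billion cells. Whether that fits in memory depends on the machine.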

Adjacency Matrix from source target dataset

I have a dataset as follows
Var1 Var2 Count
A B 3
A C 4
A D 10
A L 6
I need to create an adjacency matrix for usage downstream in creating a chord diagram. I am looking for an efficient way to get it.
A B C D L
A 0 3 4 10 6
B 3 0 0 0 0
C 4 0 0 0 0
D 10 0 0 0 0
L 6 0 0 0 0
I am looking for a visualization as follows
Assuming you're talking about just the symmetric matrix generation:
dat <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
Var1 Var2 Count
A B 3
A C 4
A D 10
A L 6')
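# All node labels that appear in either column, in sorted order
vars <- sort(unique(unlist(dat[c("Var1","Var2")])))
m <- matrix(0, nrow=length(vars), ncol=length(vars), dimnames=list(vars,vars))
# Fill the (Var1, Var2) and (Var2, Var1) cells so the matrix is symmetric
m[as.matrix(dat[c("Var1","Var2")])] <- m[as.matrix(dat[c("Var2","Var1")])] <- dat$Count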
m
# A B C D L
# A 0 3 4 10 6
# B 3 0 0 0 0
# C 4 0 0 0 0
# D 10 0 0 0 0
# L 6 0 0 0 0
Here is an option using xtabs. Convert the first two columns to factors, with levels specified in the order we want in the output. Then use xtabs to get a matrix, transpose it, and add the transpose to the original to get the expected symmetric output.
dat[1:2] <- lapply(dat[1:2], factor, levels = c("A", "B", "C", "D", "L"))
out <- xtabs(Count ~ Var1 + Var2, dat)
out + t(out)
# Var2
#Var1 A B C D L
# A 0 3 4 10 6
# B 3 0 0 0 0
# C 4 0 0 0 0
# D 10 0 0 0 0
# L 6 0 0 0 0
data
dat <- structure(list(Var1 = c("A", "A", "A", "A"), Var2 = c("B", "C",
"D", "L"), Count = c(3L, 4L, 10L, 6L)), class = "data.frame",
row.names = c(NA, -4L))
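The xtabs approach hard-codes the factor levels. A minimal sketch of the same idea with the levels derived from the data, assuming every node appears in Var1 or Var2 (lv and adj are illustrative names):

```r
dat <- data.frame(Var1 = c("A", "A", "A", "A"),
                  Var2 = c("B", "C", "D", "L"),
                  Count = c(3L, 4L, 10L, 6L),
                  stringsAsFactors = FALSE)

# Shared level set so that rows and columns of the table line up
lv <- sort(unique(unlist(dat[c("Var1", "Var2")])))
dat[c("Var1", "Var2")] <- lapply(dat[c("Var1", "Var2")], factor, levels = lv)

out <- xtabs(Count ~ Var1 + Var2, data = dat)  # directed counts, Var1 -> Var2
adj <- out + t(out)                            # mirror them to make it symmetric
adj
```

If the data contained both an A-B row and a B-A row, this sum would add the two counts together, which may or may not be what a chord diagram needs.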

R: Social network - max values based on conditions in different columns and rows

I would like to transform the following sample data, based on a fictional trade survey. Suppose country A says it exports to country B (row 2, Export = 1), while country B says it does not import from A (row 4, Import = 0), and vice versa. For each such pair of mirrored rows I want the maximum of the two reports (= 1), so in this case Import in row 4 would become 1.
df <- data.frame("Sender" = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
"Receiver" = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
"Export"= c(0,1,0,0,0,0,0,0,0),
"Import" = c(0,1,1,0,0,1,0,0,0))
df
Sender Receiver Export Import
1 A A 0 0
2 A B 1 1
3 A C 0 1
4 B A 0 0
5 B B 0 0
6 B C 0 1
7 C A 0 0
8 C B 0 0
9 C C 0 0
The solution should be
Sender Receiver Export Import Export_MAX Import_MAX
1 A A 0 0 0 0
2 A B 1 1 1 1
3 A C 0 1 0 1
4 B A 0 0 1 1
5 B B 0 0 0 0
6 B C 0 1 0 1
7 C A 0 0 1 0
8 C B 0 0 1 0
9 C C 0 0 0 0
I searched many ways to do that in this forum and elsewhere, but couldn't find a solution so far. I was thinking of something along the lines of applying a max function on the "Import" & "Export" columns, conditional on the values given in "Sender" & "Receiver", but I didn't get as far as to be able to report code here.
Any ideas out there? Your advice is much appreciated.
Here is my own solution, in case someone runs into the same issue.
df$Pairs <- paste(df$Sender,df$Receiver,sep = "-")
values <- df$Pairs[df$Export==1]
values2 <- df$Pairs[df$Import==1]
df$Import[df$Pairs %in% gsub("(\\w+)-(\\w+)","\\2-\\1", values)] <- 1
df$Export[df$Pairs %in% gsub("(\\w+)-(\\w+)","\\2-\\1", values2)] <- 1
The first line combines each sender-receiver pair into a single character field, separated by "-". The second and third lines collect the pairs whose Export or Import value is 1. The final two lines use gsub to swap sender and receiver in those pairs, then set Import (or Export) to 1 in the rows that match a reversed pair.
Solution (directly in the Export/Import columns):
Sender Receiver Export Import Export_MAX Import_MAX
1 A A 0 0 0 0
2 A B 1 1 1 1
3 A C 0 1 0 1
4 B A 1 1 1 1
5 B B 0 0 0 0
6 B C 0 1 0 1
7 C A 1 0 1 0
8 C B 1 0 1 0
9 C C 0 0 0 0
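An alternative that leaves the original columns untouched is to look up each row's mirrored row and take the element-wise maximum. This is a minimal sketch, not the answer's method. It assumes each Sender/Receiver pair occurs at most once and that the names contain no spaces. The name mirror is illustrative.

```r
df <- data.frame(Sender   = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                 Receiver = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
                 Export   = c(0, 1, 0, 0, 0, 0, 0, 0, 0),
                 Import   = c(0, 1, 1, 0, 0, 1, 0, 0, 0))

# For each row (S, R), the index of the row (R, S); NA if there is none
mirror <- match(paste(df$Receiver, df$Sender), paste(df$Sender, df$Receiver))

# An export reported by S to R corresponds to an import reported by R from S
df$Export_MAX <- pmax(df$Export, df$Import[mirror], na.rm = TRUE)
df$Import_MAX <- pmax(df$Import, df$Export[mirror], na.rm = TRUE)
df
```

With na.rm = TRUE, a row without a mirrored counterpart simply keeps its own value.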

How to create dichotomous variables based on some factors in r?

The initial dataframe is:
Factor1 Factor2 Factor3
A B C
B C NA
A NA NA
B C D
E NA NA
I want to create 5 dichotomous variables based on the above factor variables. The rule should be the new variable A will get 1 if either Factor1 or Factor2 or Factor3 contains an A otherwise A should be 0, and so on. The newly created variables should look like:
A B C D E
1 1 1 0 0
0 1 1 0 0
1 0 0 0 0
0 1 1 1 0
0 0 0 0 1
We can use table to do this. Repeat the sequence of row indices once per column, unlist the dataset (which stacks it column by column), and tabulate row index against value. NA entries are dropped by default.
table(rep(1:nrow(df1), ncol(df1)), unlist(df1))
# A B C D E
# 1 1 1 1 0 0
# 2 0 1 1 0 0
# 3 1 0 0 0 0
# 4 0 1 1 1 0
# 5 0 0 0 0 1
If the same value can occur more than once in a row, the counts can exceed 1. In that case, convert to logical and then back to binary.
+(!!table(rep(1:nrow(df1), ncol(df1)), unlist(df1)))
data
df1 <- structure(list(Factor1 = c("A", "B", "A", "B", "E"),
Factor2 = c("B", "C", NA, "C", NA),
Factor3 = c("C", NA, NA, "D", NA)),
.Names = c("Factor1", "Factor2", "Factor3"),
class = "data.frame", row.names = c(NA, -5L))
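The rule in the question ("1 if any of the factor columns contains the value") can also be written out directly, which some may find easier to read than the table idiom. A minimal sketch, assuming character columns as in the data above (vals, lv and res are illustrative names):

```r
df1 <- data.frame(Factor1 = c("A", "B", "A", "B", "E"),
                  Factor2 = c("B", "C", NA, "C", NA),
                  Factor3 = c("C", NA, NA, "D", NA),
                  stringsAsFactors = FALSE)

vals <- unlist(df1, use.names = FALSE)
lv <- sort(unique(vals))   # sort() drops NA by default

# For each value, flag the rows where at least one column equals it
res <- sapply(lv, function(v) +(rowSums(df1 == v, na.rm = TRUE) > 0))
res
```

Because the comparison is reduced with > 0, repeated values within a row still give 1 rather than a count.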

Assigning values to an empty adjacency matrix based on matching column values

I have an n x n dataset, say a 5 x 5 data set.
ALPHA BETA GAMMA DELTA EPSILON
A B A X 1
B C 3 X 3
C D E Z 4
D A D X 5
E A 2 Z 2
I use column “ALPHA” to create an empty adjacency matrix (Aij),
A B C D E
A 0 0 0 0 0
B 0 0 0 0 0
C 0 0 0 0 0
D 0 0 0 0 0
E 0 0 0 0 0
I want to reassign Adjacency matrix values to 1 or 0 based on the matched values of column “DELTA” such that, if “DELTA” matches we set Aij=1 and 0 otherwise. That is, we will have a new adjacency matrix that looks like the following,
A B C D E
A 0 1 0 1 0
B 1 0 0 1 0
C 0 0 0 0 1
D 1 1 0 0 0
E 0 0 1 0 0
What loop or matching technique can I use to assign the new values?
Thanks.
Phil
A loop could work. Your example has 0 on the diagonal (i = j), so I subtracted an identity matrix at the end.
DELTA<-c("X","X","Z","X","Z")
Adj<-mat.or.vec(nr=length(DELTA), nc=length(DELTA))
for (i in seq_along(DELTA)){
  Adj[i, DELTA == DELTA[i]] <- 1
}
Adj<-Adj-diag(length(DELTA))
You could use outer
res <- +(outer(df1$DELTA, df1$DELTA, FUN='=='))*!diag(dim(df1)[1])
dimnames(res) <- rep(list(df1$ALPHA),2)
res
# A B C D E
#A 0 1 0 1 0
#B 1 0 0 1 0
#C 0 0 0 0 1
#D 1 1 0 0 0
#E 0 0 1 0 0
Or
sapply(df1$DELTA, `==`, df1$DELTA) - diag(dim(df1)[1])
data
df1 <- structure(list(ALPHA = c("A", "B", "C", "D", "E"), BETA = c("B",
"C", "D", "A", "A"), GAMMA = c("A", "3", "E", "D", "2"), DELTA = c("X",
"X", "Z", "X", "Z"), EPSILON = c(1L, 3L, 4L, 5L, 2L)), .Names = c("ALPHA",
"BETA", "GAMMA", "DELTA", "EPSILON"), class = "data.frame",
row.names = c(NA, -5L))
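For readers who find the one-liners above dense, the same steps can be written out one at a time. A minimal sketch, using plain vectors with the ALPHA and DELTA values from the data (adj is an illustrative name):

```r
ALPHA <- c("A", "B", "C", "D", "E")
DELTA <- c("X", "X", "Z", "X", "Z")

adj <- +outer(DELTA, DELTA, "==")    # 1 where the DELTA values match
diag(adj) <- 0                       # a node is not adjacent to itself
dimnames(adj) <- list(ALPHA, ALPHA)  # label rows and columns by ALPHA
adj
```

Setting the diagonal explicitly avoids relying on the subtraction or multiplication tricks, at the cost of two extra lines.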
