I am trying to create a matrix from a dataframe based on the frequency of interaction of pairs of individuals. In the dataframe (example below), I have a list of names under the GIVER and RECIPIENT columns. Each row with a combination of GIVER and RECIPIENT corresponds to one (directed) interaction between the two individuals (interaction dyad).
The matrix I would like to obtain should have, as both row and column names, all the individuals listed in the GIVER and RECIPIENT columns (not all individuals appear in both columns). Each row should hold the number of interactions that an individual (the row name) gives to every other individual (the column names). Each column should instead hold the number of interactions that an individual (the column name) receives from every other individual (the row names).
This is an example of my dataframe:
GIVER RECIPIENT
A     B
A     C
D     A
E     B
C     E
B     D
I used this function to obtain the matrix:
my_matrix = function(df){
tablei = as.data.frame(table(union(df$GIVER, df$RECIPIENT), union(df$GIVER, df$RECIPIENT)))
nameVals <- sort(unique(unlist(tablei[1:2])))
matrixi <- matrix(0, length(nameVals), length(nameVals), dimnames = list(nameVals,nameVals))
matrixi[as.matrix(tablei[c("Var1", "Var2")])] <- tablei[["Freq"]]
as.data.frame(matrixi)}
However, there is a problem in the second line of the function, which returns frequency values that are all 0 (for any interaction of an individual with others) or 1 (for an individual with itself):
tablei = as.data.frame(table(union(df$GIVER, df$RECIPIENT), union(df$GIVER, df$RECIPIENT)))
Do you have any idea on how to fix the problem?
Thank you for your help!
A tidyverse approach to your problem.
Data
df <-
tibble::tribble(
~GIVER, ~RECIPIENT,
"A", "B",
"A", "C",
"D", "A",
"E", "B",
"C", "E",
"B", "D"
)
Code
library(dplyr)
library(tidyr)
df %>%
count(GIVER,RECIPIENT,name = "freq") %>%
complete(GIVER,RECIPIENT,fill = list(freq = 0))
Output
# A tibble: 25 x 3
GIVER RECIPIENT freq
<chr> <chr> <dbl>
1 A A 0
2 A B 1
3 A C 1
4 A D 0
5 A E 0
6 B A 0
7 B B 0
8 B C 0
9 B D 1
10 B E 0
# ... with 15 more rows
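If an actual square matrix is needed, one possible base R sketch (not part of the answer above) is to convert both columns to factors that share the union of all names as their levels, so that table() lays out every giver against every recipient. The names ids and m are chosen here purely for illustration.

```r
# Example data from the question
df <- data.frame(
  GIVER     = c("A", "A", "D", "E", "C", "B"),
  RECIPIENT = c("B", "C", "A", "B", "E", "D")
)

# All individuals that appear in either column
ids <- sort(union(df$GIVER, df$RECIPIENT))

# Giving both factors the same levels forces a square table,
# even when a name appears in only one of the two columns
m <- table(
  GIVER     = factor(df$GIVER, levels = ids),
  RECIPIENT = factor(df$RECIPIENT, levels = ids)
)

# Drop the "table" class to get a plain integer matrix
m <- unclass(m)
m
```

Rows are givers and columns are recipients, so m["A", "B"] is the number of interactions A gave to B.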
I apologize as I'm not sure how to word this title exactly.
I have two data frames. df1 is a series of paths with columns "source" and "destination". df2 stores values associated with the destinations. Below is some sample data:
df1
row source destination
1   A      B
2   C      B
3   H      F
4   G      B
df2
row destination n
1   B           26
2   F           44
3   L           12
I would like to compare the two data frames and add the n column to df1 so that df1 has the correct n value for each destination. df1 should look like:
row source destination n
1   A      B           26
2   C      B           26
3   H      F           44
4   G      B           26
The data that I'm actually working with is much larger, and is never the same number of rows when I run the program. The furthest I've gotten with this is using the which command to get the right values, but only each value once.
df2[ which(df2$destination %in% df1$destination), ]$n
[1] 26 44
What I need instead is the vector (26, 26, 44, 26), so that I can save it to df1$n.
We can use merge or left_join. With data.table, an update join adds the column by reference:
library(data.table)
setDT(df1)[df2, n := i.n, on = .(destination)]
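For comparison, a minimal base R sketch of the merge route could look like the following. It assumes the sample df1 and df2 from the question. The name merged is illustrative, and the reordering step is only there because merge() sorts its result by the key column.

```r
df1 <- data.frame(row = 1:4,
                  source = c("A", "C", "H", "G"),
                  destination = c("B", "B", "F", "B"))
df2 <- data.frame(row = 1:3,
                  destination = c("B", "F", "L"),
                  n = c(26, 44, 12))

# Left join: keep every row of df1, pull in n from df2 by destination.
# Only the key and the n column of df2 are used, to avoid a clash on "row".
merged <- merge(df1, df2[c("destination", "n")],
                by = "destination", all.x = TRUE)

# Restore the original row order and column order of df1
merged <- merged[order(merged$row), c("row", "source", "destination", "n")]
merged$n
```

Destinations in df1 that have no match in df2 would get NA in n rather than being dropped.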
A base R option using match
transform(
df1,
n = df2$n[match(destination, df2$destination)]
)
which gives
row source destination n
1 1 A B 26
2 2 C B 26
3 3 H F 44
4 4 G B 26
Data
df1 <- data.frame(row = 1:4, source = c("A", "C", "H", "G"), destination = c("B", "B", "F", "B"))
df2 <- data.frame(row = 1:3, destination = c("B", "F", "L"), n = c(26, 44, 12))
I am trying to merge several data frames (four, to be precise), but I would like my command to work with any number of them. The data frames are named in a vector:
dataframes<-c('df1', 'data2', 'd3', 'samples4')
All data frames hold the same kind of data, but they belong to different samples. As an example, the first data frame is as follows:
ID count1
A 0
B 67
C 200
D 12
E 0
My desired output would contain a column with the counts of each ID for each sample:
ID count1 count2 count3 count4
A 0 2 0 30
B 67 100 300 500
C 200 2 1025 0
D 12 4 0 10
E 0 0 20 2
I have tried the following commands:
Reduce(function(x, y) merge(x, y, by="ID"), list(unname(get(dataframes))))
as.data.frame(do.call(cbind, unname(get(dataframes))))
But in both cases I get just the first data frame. No merging is occurring.
How can I solve this?
Assuming:
df1 <- data.frame(ID = c("A", "B", "C", "D"), count = c(2,45,24,21))
df2 <- data.frame(ID = c("A", "B", "C", "D"), count = c(11,35,4,2))
I'd suggest just adding a column with the sample name to each data frame, e.g.:
df1["sample"] <- "sample1"
df2["sample"] <- "sample2"
Then merge them "vertically" by using something like rbind().
all_data <- rbind(df1, df2) # this can take more dataframes
This "long format" should also make it easier to filter rows by sample.
If you still need to have the wider structure you describe above (with a column for each sample), you can use reshape2::dcast() to construct it:
library(reshape2)
all_data <- dcast(all_data, ID ~ sample, value.var="count")
Result:
ID sample1 sample2
1 A 2 11
2 B 45 35
3 C 24 4
4 D 21 2
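As a side note on the attempts in the question: get() looks up a single name, so get(dataframes) only ever returns the first data frame. A hedged sketch of the wide merge, assuming the data frames exist under the names stored in the vector and each has a distinctly named count column, could use mget() instead. The values below are taken from the first two columns of the desired output in the question, and only two of the four data frames are shown.

```r
df1   <- data.frame(ID = c("A", "B", "C", "D", "E"),
                    count1 = c(0, 67, 200, 12, 0))
data2 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                    count2 = c(2, 100, 2, 4, 0))

# Vector of object names, as in the question (shortened to two entries)
dataframes <- c("df1", "data2")

# mget() returns a named list with one element per name;
# get() would only return the object for the first name
frames <- mget(dataframes)

# Merge the list pairwise on ID, whatever its length
merged <- Reduce(function(x, y) merge(x, y, by = "ID"), frames)
merged
```

Because Reduce() walks the whole list, the same call works unchanged when more names are added to dataframes.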
I'm following up on a prior question I asked here: Calculating ratio of reciprocated ties for each node in igraph
The answers were very helpful, but I realized one of the calculations isn't coming out correctly. I'm trying to figure out the ratio of reciprocated edges to outdegree--in other words, what percentage of people I nominate as friends nominate me as a friend?
When students don't nominate friends (outdegree is 0), they're not included in my calculation of reciprocated ties. Since they can't have any reciprocated ties, I want their reciprocity to be calculated as 0. Their ratio of reciprocated ties/outdegree should also be 0.
Here's an example:
library(igraph)
###Creating sample edgelist###
from<- c("A", "A", "A", "B", "B", "B", "C", "D", "D", "E")
to<- c("B", "C", "D", "A", "E", "D", "A", "B", "C", "E")
weight<- c(1,2,3,2,1,3,2,2,1,1)
g2<- as.matrix(cbind(from,to, weight))
###Converting edgelist to network###
g3=graph.edgelist(g2[,1:2])
E(g3)$weight=as.numeric(g2[,3])
###Removing self-loop###
g3<-simplify(g3, remove.loops = T)
Here, E's indegree is 1 and outdegree is 0. I create a self-loop for E so the indegree and outdegree vectors remain the same length, and then remove it.
Next, I see which nominations are reciprocated:
recip<-is.mutual(g3)
recip<-as.data.frame(recip)
Then I create an edgelist from g3, and add recip to the data frame:
###Creating edgelist and adding recip###
edgelist<- get.data.frame(g3, what = "edges")
colnames(edgelist)<- c("from", "to", "weight")
edgelist<- cbind(edgelist, recip)
edgelist
> edgelist
from to weight recip
1 A B 1 TRUE
2 A C 2 TRUE
3 A D 3 FALSE
4 B A 2 TRUE
5 B D 3 TRUE
6 B E 1 FALSE
7 C A 2 TRUE
8 D B 2 TRUE
9 D C 1 FALSE
This is where the trouble begins. Since E isn't in from, it's also not in the objects I create below.
Next, I create a table with outdegree and add vertex names:
##Creating outdegree and adding vertex IDs##
outdegree<- as.data.frame(degree(g3, mode="out"))
ID<-V(g3)$name
outdegree<-cbind(ID, outdegree)
colnames(outdegree) <- c("ID","outdegree")
rownames(outdegree)<-NULL
outdegree
Outdegree comes out just as I want it:
ID outdegree
1 A 3
2 B 3
3 C 1
4 D 2
5 E 0
When I calculate the number of reciprocated ties for each node, E isn't included, since I use the from column from edgelist I discussed above.
##Calculating number of reciprocated ties##
recip<-aggregate(recip~from,edgelist,sum)
colnames(recip)<- c("ID", "recip")
recip
> recip
ID recip
1 A 2
2 B 2
3 C 1
4 D 1
So that's where the problem is. If I try to create a table with the ratio of reciprocated ties to outdegree, E isn't included:
##Creating ratio table##
ratio<-merge(recip, outdegree, by= "ID")
ratio<-as.data.frame (recip$recip/ratio$outdegree)
ratio<- cbind(recip$ID, ratio)
colnames(ratio)<- c("ID", "ratio")
ratio
ID ratio
1 A 0.6666667
2 B 0.6666667
3 C 1.0000000
4 D 0.5000000
Ultimately, I want a row in ratio for E that equals 0. Since the ratio here would be 0/0 (0 reciprocated ties/0 outdegree), I'd probably get an NaN but I can convert that to 0 easily, so that would be fine.
I could work around this and export the data to Excel, run the calculations by hand, and keep it easy. But that won't help improve my coding skills, and I have a bunch of networks to run, so it's also pretty inefficient.
Any thoughts on how to automate this?
Thanks again for your help.
E is not showing up because E is not in the from column of the edgelist data frame that recip is aggregated from! It is only in to.
You can aggregate on both columns and then merge.
r1 <- aggregate(recip~from,edgelist,sum)
colnames(r1) <- c("ID", "recip")
r2 <- aggregate(recip~to,edgelist,sum)
colnames(r2) <- c("ID", "recip")
recip <- merge(r1,r2, all = T) # all = T gives the union of the df's
Which gives:
ID recip
1 A 2
2 B 2
3 C 1
4 D 1
5 E 0
Also, with piping:
library(dplyr)
edgelist %>%
aggregate(recip~from,.,sum) %>%
rename(ID = from) %>%
merge(., edgelist %>%
aggregate(recip~to,.,sum) %>%
rename(ID = to),
all = T)
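To carry this through to the ratio table the question asks for, one possible sketch is to start from the outdegree table, which already lists every vertex, and left-join the reciprocated counts onto it. The data frames below are typed in by hand from the output printed in the question so that the example runs without igraph. The 0 for nodes with no outgoing ties follows the convention stated in the question.

```r
# Edge list with the reciprocity flag, copied from the question's output
edgelist <- data.frame(
  from  = c("A", "A", "A", "B", "B", "B", "C", "D", "D"),
  to    = c("B", "C", "D", "A", "D", "E", "A", "B", "C"),
  recip = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Outdegree per vertex, copied from the question's output (includes E)
outdegree <- data.frame(ID = c("A", "B", "C", "D", "E"),
                        outdegree = c(3, 3, 1, 2, 0))

# Reciprocated ties per sender; E is missing here because it sends nothing
recip <- aggregate(recip ~ from, edgelist, sum)
colnames(recip) <- c("ID", "recip")

# Keep every vertex from outdegree; unmatched vertices get NA, then 0
ratio <- merge(outdegree, recip, by = "ID", all.x = TRUE)
ratio$recip[is.na(ratio$recip)] <- 0

# Avoid 0/0 by assigning 0 when there are no outgoing ties
ratio$ratio <- ifelse(ratio$outdegree > 0,
                      ratio$recip / ratio$outdegree,
                      0)
ratio
```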
I have a factor variable with a single column of data. It contains 33 levels, and as I understand it, each level has an integer value from 1 to 33. I was wondering how I could refer to that value rather than the level's label when subsetting rows?
Here's my attempt at writing the code for it:
Bexley <- subset(LE2016, borough[3])
I am trying to create a new object 'Bexley' containing only the level which was assigned the integer value of 3 from the dataframe 'LE2016'.
Thanks
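You can use as.integer (or as.numeric) to coerce the factor to its integer codes and then do the subsetting. Note that c() also returned the integer codes in R versions before 4.1.0, but since 4.1.0 c() applied to a factor returns a factor, so as.integer is the safer choice. Here is a simple example:
library(dplyr)
DF <- data.frame(f=factor(c("a", "b", "c", "a", "b", "c"), levels=c("a", "b", "c")))
DF
f
1 a
2 b
3 c
4 a
5 b
6 c
# keep only the rows at the 3rd level
DF %>% filter(as.integer(f) == 3)
f
1 c
2 c
# as.integer coerces the factor to its integer codes
as.integer(DF$f)
[1] 1 2 3 1 2 3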
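Applied to the question's own objects, a base R sketch might look like this. The contents of LE2016 are made up for illustration (the real data has 33 levels), and the levels are set so that "Bexley" happens to be the third one.

```r
# Hypothetical stand-in for the question's data frame
LE2016 <- data.frame(
  borough = factor(c("Barnet", "Bexley", "Barking", "Bexley"),
                   levels = c("Barking", "Barnet", "Bexley")),
  value = c(10, 20, 30, 40)
)

# Keep the rows whose factor code is 3, i.e. the third level
Bexley <- LE2016[as.integer(LE2016$borough) == 3, ]

# Equivalent selection by looking up the label of the third level
same <- LE2016[LE2016$borough == levels(LE2016$borough)[3], ]

Bexley
```

If the unused levels should not be carried along in the result, droplevels(Bexley) removes them.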
I have a 16 x 3 data frame. Its elements are characters, e.g. A, B, C... How can I assign them values, e.g. A = 2, B = 5, C = 4, in R?
You can map the values from the vector you created:
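# note: this user-defined relevel masks stats::relevel
relevel <- function(df, levelmap) {
  # look up each column's values by name in the map vector
  df[] <- lapply(df, function(x) levelmap[as.character(x)])
  df
}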
The function subsets the values based on the map vector.
Example
df <- data.frame(x=c("A", "C", "C", "A"), y=c("B", "C", "B", "A"), z=c("A", "B", "C", "A"))
df
x y z
1 A B A
2 C C B
3 C B C
4 A A A
newlevels <- c(A=2,B=5,C=4)
relevel(df, newlevels)
x y z
1 2 5 2
2 4 4 5
3 4 5 4
4 2 2 2
The newlevels vector is a special vector called a named vector. It's very helpful as it can be referenced by both its names and its indices. newlevels["A"] and newlevels[1] both return the same output. This simplifies what in other languages would require hash tables or other lookup arrays.
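A small sketch of that lookup behavior, using the same newlevels vector. The keys vector is just an illustrative input.

```r
newlevels <- c(A = 2, B = 5, C = 4)

# Lookup by name and by position give the same element
by_name  <- newlevels["A"]
by_index <- newlevels[1]

# Indexing with a character vector looks up every key at once,
# repeating values as often as the keys repeat
keys <- c("A", "C", "C", "A")
mapped <- unname(newlevels[keys])
mapped
```

This vectorized lookup by name is what the function above relies on for each column.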