Suppose I have the following clusters:
library(linkcomm)
g <- swiss[,3:4]
lc <-getLinkCommunities(g)
plot(lc, type = "members")
getNodesIn(lc, clusterids = c(3, 7, 8))
From the plot you can see that node 6 is present in three overlapping clusters: 3, 7 and 8. I would like to know how to retrieve the direct binary interactions in these clusters as a data frame. Specifically, I would like a data frame with the cluster id as the first column, followed by two columns, "interactor 1" and "interactor 2", listing all pairs of interactors per cluster. The interactions should be direct, i.e. the two nodes share an edge.
Basically I would like something like this:
Cluster ID Interactor 1 Interactor 2
3 6 14
3 3 7
3 6 7
3 14 3
3 6 3
and so on for the other ids. If possible, I would like to avoid duplicates such as (6, 14) and (14, 6).
Many thanks,
Abigail
You might be looking for the edges. Note: use str(lc) to examine everything that is included in your object of interest.
lc$edges
# node1 node2 cluster
# 1 17 15 1
# 2 17 8 1
# 3 15 8 1
# 4 16 13 2
# 5 16 10 2
# 6 16 29 2
# 7 14 6 3
# 8 ...
res <- setNames(lc$edges, c(paste0("interactor.", 1:2), "cluster"))[c(3, 1, 2)]
res
# cluster interactor.1 interactor.2
# 1 1 17 15
# 2 1 17 8
# 3 1 15 8
# 4 2 16 13
# 5 2 16 10
# 6 2 16 29
# 7 3 14 6
# 8 ...
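If only clusters 3, 7 and 8 are wanted and reversed pairs should count as duplicates, the edge table can be filtered and each pair put into a fixed order before calling unique(). A minimal sketch in base R, assuming a data frame shaped like lc$edges; the toy values in edges below are made up for illustration and are not the real result for the swiss data:

```r
# Toy stand-in for lc$edges (hypothetical values, same column layout)
edges <- data.frame(node1   = c(14, 6, 7, 6, 1),
                    node2   = c(6, 14, 3, 7, 2),
                    cluster = c(3, 3, 3, 7, 1))

# Keep only the clusters of interest
keep <- edges[edges$cluster %in% c(3, 7, 8), ]

# Order each pair (smaller node first) so that 6-14 and 14-6 look identical,
# then drop repeated rows
res <- unique(data.frame(cluster      = keep$cluster,
                         interactor.1 = pmin(keep$node1, keep$node2),
                         interactor.2 = pmax(keep$node1, keep$node2)))
res
```

If the node columns of the real object are factors, converting them with as.character() first is probably needed, since pmin() and pmax() do not work on unordered factors.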
Related
I have the following dataframe containing a variable "group" and a variable "number of elements per group"
group elements
1 3
2 1
3 14
4 10
.. ..
.. ..
30 5
Then I have a set of numbers going from 1 to (let's say) 30.
Summing "elements" gives 900. What I want is to randomly select a number from 1 to 30 and assign it to each group until I fill the number of elements for that group. Each number should appear 30 times in total.
Thus, for group 1, I want to randomly select 3 numbers from 1 to 30,
for group 2, 1 number from 1 to 30, etc., until all of the groups are filled.
the final table should look like this:
group number(randomly selected)
1 7
1 20
1 7
2 4
3 21
3 20
...
any suggestions on how I can achieve this?
In base R, if you have df like this...
df
group elements
1 3
2 1
3 14
Then you can do this...
data.frame(group  = rep(df$group,               # repeat group no...
                        df$elements),           # ...elements times
           number = unlist(sapply(df$elements,  # for each elements...
                                  sample.int,   # ...sample <elements> numbers
                                  n = 30,       # from 1 to 30
                                  replace = FALSE)))  # without duplicates within a group
group number
1 1 19
2 1 15
3 1 28
4 2 15
5 3 20
6 3 18
7 3 27
8 3 10
9 3 23
10 3 12
11 3 25
12 3 11
13 3 14
14 3 13
15 3 16
16 3 26
17 3 22
18 3 7
Give this a try:
df <- read.table(text = "group elements
1 3
2 1
3 14
4 10
30 5", header = TRUE)
# reproducibility
set.seed(1)
df_split2 <- do.call("rbind",
                     lapply(split(df, df$group),
                            function(m) cbind(m,
                                              `number(randomly selected)` =
                                                sample(1:30, replace = TRUE,
                                                       size = m$elements),
                                              row.names = NULL)))
# remove the elements column
df_split2$elements <- NULL
head(df_split2)
#> group number(randomly selected)
#> 1.1 1 25
#> 1.2 1 4
#> 1.3 1 7
#> 2 2 1
#> 3.1 3 2
#> 3.2 3 29
The split function splits df into chunks based on the group column. We then take each of those smaller data frames and add a column to it by sampling from 1:30 a total of elements times. Finally, do.call applies rbind to the resulting list to bind the pieces back together.
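You have to generate a new data frame repeating $group $elements times; then, using sample, you can generate the exact number of random numbers needed (note that sample draws without replacement here, so this works only while the total number of elements does not exceed 30):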
data<-data.frame(group=c(1,2,3,4,5),
elements=c(2,5,2,1,3))
data.elements<-data.frame(group=rep(data$group,data$elements),
number=sample(1:30,sum(data$elements)))
The result:
group number
1 1 9
2 1 4
3 2 29
4 2 28
5 2 18
6 2 7
7 2 25
8 3 17
9 3 22
10 4 5
11 5 3
12 5 8
13 5 26
I solved it as follows:
random_sample <- rep(1:30, each=30)
random_sample <- sample(random_sample)
Then I create a df with this variable and a variable containing one group per row, repeated by the number of elements in the group itself.
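That last step might look like the sketch below. It is an assumption about how the pieces fit together, since the post does not show the code. Here df is a three-group toy table, so only the first sum(df$elements) values of the shuffled pool are used; with the full table (900 elements) the whole pool would be consumed and every number would appear exactly 30 times.

```r
set.seed(1)  # reproducibility

# Toy group table (the real one has 30 groups summing to 900 elements)
df <- data.frame(group = 1:3, elements = c(3, 1, 14))

# Pool containing each number from 1 to 30 exactly 30 times, shuffled
random_sample <- sample(rep(1:30, each = 30))

# One row per element: the group label plus the next value from the pool
result <- data.frame(group  = rep(df$group, df$elements),
                     number = random_sample[seq_len(sum(df$elements))])
head(result)
```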
My data looks like this:
x y
1 1
2 2
3 2
4 4
5 5
6 6
7 6
8 8
9 9
10 9
11 11
12 12
13 13
14 13
15 14
16 15
17 14
18 16
19 17
20 18
y is a grouping variable. I would like to see how well this grouping went.
To do so, I want to extract a sample of n pairs of cases that are grouped together by variable y and n pairs of cases that are not grouped together by variable y, in order to calculate the number of false positives and false negatives (pairs falsely grouped or falsely not grouped). How do I extract a sample of grouped pairs and a sample of not-grouped pairs?
I would like the samples to look like this (for n=6) :
Grouped sample:
x y
2 2
3 2
9 9
10 9
15 14
17 14
Not-grouped sample:
x y
1 1
2 2
6 8
6 8
11 11
19 17
How would I go about this in R?
I'm not entirely clear on what you'd like to do, partly because I feel there is some context missing as to what you're trying to achieve. I also don't quite understand your expected output (for example, the not-grouped sample contains an entry 6 8 that does not exist in your original data).
That aside, here is a possible approach.
# Maximum number of samples per group
n <- 3;
# Set fixed RNG seed for reproducibility
set.seed(2017);
# Grouped samples
df.grouped <- do.call(rbind.data.frame, lapply(split(df, df$y),
function(x) if (nrow(x) > 1) x[sample(nrow(x), min(n, nrow(x))), ]));
df.grouped;
# x y
#2.3 3 2
#2.2 2 2
#6.6 6 6
#6.7 7 6
#9.10 10 9
#9.9 9 9
#13.13 13 13
#13.14 14 13
#14.15 15 14
#14.17 17 14
# Ungrouped samples
df.ungrouped <- df[sample(nrow(df.grouped)), ];
df.ungrouped;
# x y
#7 7 6
#1 1 1
#9 9 9
#4 4 4
#3 3 2
#2 2 2
#5 5 5
#6 6 6
#10 10 9
#8 8 8
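Explanation: Split df based on y, then draw up to min(n, nrow(x)) rows from each subset x containing more than one row; rbinding the results gives df.grouped. For df.ungrouped, note that sample(nrow(df.grouped)) returns a permutation of 1 to nrow(df.grouped), so the code shown selects the first nrow(df.grouped) rows of df in random order; to draw that many rows from all of df, use df[sample(nrow(df), nrow(df.grouped)), ] instead.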
Sample data
df <- read.table(text =
"x y
1 1
2 2
3 2
4 4
5 5
6 6
7 6
8 8
9 9
10 9
11 11
12 12
13 13
14 13
15 14
16 15
17 14
18 16
19 17
20 18", header = T)
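If the goal is a sample of pairs of cases rather than of single rows, one option is to enumerate all pairs with combn() and split them by whether both members share the same y. This is a sketch under that reading of the question, not part of the answer above. The names pairs, same and draw are introduced here for illustration, and the code assumes a df with columns x and y as in the sample data and at least n pairs of each kind.

```r
set.seed(2017)
df <- data.frame(x = 1:10, y = c(1, 2, 2, 4, 5, 6, 6, 8, 9, 9))
n <- 2  # number of pairs to draw from each kind

# All unordered pairs of row indices, one pair per row
pairs <- t(combn(nrow(df), 2))
# TRUE where both members of the pair have the same y
same <- df$y[pairs[, 1]] == df$y[pairs[, 2]]

# Draw n pairs from the rows of 'pairs' selected by 'idx'
draw <- function(idx) {
  p <- pairs[idx[sample.int(length(idx), n)], , drop = FALSE]
  data.frame(x1 = df$x[p[, 1]], y1 = df$y[p[, 1]],
             x2 = df$x[p[, 2]], y2 = df$y[p[, 2]])
}

grouped     <- draw(which(same))
not_grouped <- draw(which(!same))
```

Enumerating every pair costs nrow(df) * (nrow(df) - 1) / 2 rows, which is fine for small data but would need a different approach for very large tables.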
My first language isn't English, so I apologize in advance for any mistakes. I'm a newbie in R, but you will notice that anyway.
I'm trying to build a co-occurrence matrix. I have several data frames and I am interested in 3 variables: idT, numname and numstim.
This is the single data frame that contains the merged data:
z=rbind(df1,df2,df3,df4,df5,df6,df7,df8,df9,df10,df11,df12,df13,df14,
df15,df16,df17,df18,df19,df20,df21,df22,df23,df24,df25,df26,df27,df28,df29,df30,df31,df32)
write.csv(z, file = ".../listz.csv")
Then I extracted the 3 variables with :
#Extract columns 3 & 6 from all the files within the list
z1 = z[,c(3,6)]
#Create a new variable 'numname' to convert name groups into numeric groups,
#then obtain levels with facNum
z1$numname <- as.numeric(z1$namegroup)
colnames(z1) <- c("namegroup", "idT", "numname")
facNum <- factor(z1$numname)
write.csv(z1, file = "...D:/z1.csv")
And the data look like this:
namegroup idT numname
1 GLISSEVIBREVITE 1 6
2 CINETIQUE 1 3
3 VIBRATIONS_LEGERES 1 20
4 DIFFUS 1 5
5 LIQUIDE 1 8
6 PICOTEMENTS 1 10
How to read the table: each idT is classified into a group (namegroup), and this group is then converted into a numeric variable (numname).
# Specify z1 as a data frame to make next operations
z1 = as.data.frame(z1, idT = z1$numstim, numgroup = z1$numname)
tab1 <- table(z1)
write.csv(tab1, file = ".../tab1test.csv")
out1 <- data.matrix(tab1 %*% t(tab1))
write.csv(out1, file = ".../bmtest.csv")
But the bmtest matrix doesn't look like a count of pairs of idT: only 22 users participated and there are 32 idT, yet some of the numbers are much higher:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 24 10 7 7 11 7 7 8 10 8 11 8 6 11 11 12
2 10 32 27 7 5 4 7 4 4 4 5 3 2 6 6 14
3 7 27 40 0 3 1 0 2 0 0 2 2 1 2 0 15
4 7 7 0 30 7 14 15 9 15 13 13 7 5 12 13 5
5 11 5 3 7 24 7 9 20 12 13 10 19 14 20 12 7
I want a matrix that shows how often each pair of idT was grouped together. The matrix should look like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 15 3 2 2 3 3 2 1 2 1 3 3 1 3 3 5
2 3 15 9 2 0 1 2 0 0 0 0 0 0 0 1 3
3 2 9 15 0 2 1 0 2 0 0 1 1 1 2 0 2
4 2 2 0 15 1 6 5 1 7 5 6 2 0 1 3 2
5 3 0 2 1 15 1 2 12 4 5 3 13 9 11 3 2
In other words, I want to see which idT have been paired together. I've looked at this topic but didn't find a way to solve my problem.
Also, I tried :
library(igraph)
library(tnet)
idT_numname <- cbind(z1$idT, z1$numname)
igraph <- graph.data.frame(idT_numname)
item_item <- projecting_tm(net = idT_numname, method="sum")
item_item <- tnet_igraph(item_item,type="weighted one-mode tnet")
itemmat <- get.adjacency(item_item,attr="weight")
itemmat #8x8 matrix of items to items
But I get an error message, and I don't know how to get around the "duplicated entries in the edgelist" error, because it seems to me that duplicated entries are necessary to build a co-occurrence matrix:
> idT_numname <- cbind(z1$idT, z1$numname)
> item_item <- projecting_tm(idT_numname, method="sum")
Error in as.tnet(net, type = "binary two-mode tnet") :
There are duplicated entries in the edgelist
> item_item <- as.tnet(net = idT_numname, type ="binary two-mode tnet", method="sum")
Error in as.tnet(net = idT_numname, type = "binary two-mode tnet", method = "sum") :
unused argument (method = "sum")
> item_item <- as.tnet(net = idT_numname, type ="binary two-mode tnet")
Error in as.tnet(net = idT_numname, type = "binary two-mode tnet") :
There are duplicated entries in the edgelist
Your help is greatly appreciated.
I like to do data analysis and I want to learn more and more everyday !
Thank you
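One possible reading of the desired output is "for each pair of idT, the number of participants who put both into the same group". A minimal sketch of that count follows, assuming each row also records who did the sorting. The column user and the toy values are hypothetical, since the excerpt of z1 does not show a participant identifier.

```r
# Toy data: 2 participants, 3 stimuli (idT), each placed in one group
toy <- data.frame(user    = c(1, 1, 1, 2, 2, 2),
                  idT     = c(1, 2, 3, 1, 2, 3),
                  numname = c(6, 6, 3, 5, 8, 8))

# Incidence matrix: one row per (user, group) combination, one column per idT
inc <- unclass(table(interaction(toy$user, toy$numname, drop = TRUE), toy$idT))
inc <- (inc > 0) * 1  # binarize so duplicated rows are not counted twice

# t(inc) %*% inc: entry [i, j] counts the (user, group) rows holding both i and j
co <- crossprod(inc)
co
```

Under these assumptions the diagonal holds the number of participants who sorted each idT, and the binarizing step is what keeps duplicated rows from inflating the counts.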
I have copied my code below. I start with a list of 50 small integers, representing the number of televisions owned by 50 families. My objective is shown in the object 'tv.final' below. My effort seems very wordy and inefficient.
Question: is there a better way to start with a list of 50 integers and end with a grouped data table with proportions? (Just taking my first baby steps with R, sorry for such a stupid question, but inquiring minds want to know.)
tv.data <- read.table("Tb02-08.txt",header=TRUE)
str(tv.data)
# 'data.frame': 50 obs. of 1 variable:
# $ TVs: int 1 1 1 2 6 3 3 4 2 4 ...
tv.table <- table(tv.data)
tv.table
# tv.data
# 0 1 2 3 4 5 6
# 1 16 14 12 3 2 2
tv.prop <- prop.table(tv.table)*100
tv.prop
# tv.data
# 0 1 2 3 4 5 6
# 2 32 28 24 6 4 4
tvs <- rbind(tv.table,tv.prop)
tvs
# 0 1 2 3 4 5 6
# tv.table 1 16 14 12 3 2 2
# tv.prop 2 32 28 24 6 4 4
tv.final <- t(tvs)
tv.final
# tv.table tv.prop
# 0 1 2
# 1 16 32
# 2 14 28
# 3 12 24
# 4 3 6
# 5 2 4
# 6 2 4
You can treat the object returned by table() as any other vector/matrix:
tv.table <- table(tv.data)
round(100 * tv.table/sum(tv.table))
That will give you the proportions in rounded percentage points.
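To get counts and percentages side by side in one step, the two tables can be passed straight to cbind(). A small sketch using only the first ten values visible in the str() output above (the full Tb02-08.txt file is not available here); the column names count and percent are illustrative.

```r
# First ten TV counts shown in the str() output
tv <- c(1, 1, 1, 2, 6, 3, 3, 4, 2, 4)

tv.table <- table(tv)
tv.final <- cbind(count   = tv.table,
                  percent = round(100 * prop.table(tv.table)))
tv.final
```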
This is my first post on StackOverflow. I am a relative newbie in programming and am trying to work with data.table in R because of its reputation for speed.
I have a very large data.table, named "Actions", with 5 columns and potentially several million rows. The column names are k1, k2, i, l1 and l2. I have another data.table, with the unique values of Actions in columns k1 and k2, named "States".
For every row in Actions, I would like to find the unique index for columns 4 and 5, matching with States. A reproducible code is as follows:
library(data.table)
S.disc <- c(2000,2000)
S.max <- c(6200,2300)
S.min <- c(700,100)
Traces.num <- 3
Class.str <- lapply(1:2,function(x) seq(S.min[x],S.max[x],S.disc[x]))
Class.inf <- seq_len(Traces.num)
Actions <- data.table(expand.grid(Class.inf, Class.str[[2]], Class.str[[1]], Class.str[[2]], Class.str[[1]])[,c(5,4,1,3,2)])
setnames(Actions,c("k1","k2","i","l1","l2"))
States <- unique(Actions[,list(k1,k2,i)])
So if I were using a data.frame, the line would look like this:
index <- apply(Actions,1,function(x) {which((States[,1]==x[4]) & (States[,2]==x[5]))})
How can I do the same with data.table efficiently ?
This is relatively simple once you get the hang of keys and the special symbols which may be used in the j expression of a data.table. Try this...
# First make an ID for each row for use in the `dcast`
# because you are going to have multiple rows with the
# same key values and you need to know where they came from
Actions[ , ID := 1:.N ]
# Set the keys to join on
setkeyv( Actions , c("l1" , "l2" ) )
setkeyv( States , c("k1" , "k2" ) )
# Join States to Actions. '.I' gives the row locations
# in States at which the key of Actions is found, and
# 1:.N numbers the matches within each group
# (a repeating 1, 2, 3)
New <- States[ J(Actions) , list( ID , Ind = .I , Row = 1:.N ) ]
# k1 k2 ID Ind Row
#1: 700 100 1 1 1
#2: 700 100 1 2 2
#3: 700 100 1 3 3
#4: 700 100 2 1 1
#5: 700 100 2 2 2
#6: 700 100 2 3 3
# reshape using 'dcast.data.table'
dcast.data.table( Row ~ ID , data = New , value.var = "Ind" )
# Row 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27...
#1: 1 1 1 1 4 4 4 7 7 7 10 10 10 13 13 13 16 16 16 1 1 1 4 4 4 7 7 7...
#2: 2 2 2 2 5 5 5 8 8 8 11 11 11 14 14 14 17 17 17 2 2 2 5 5 5 8 8 8...
#3: 3 3 3 3 6 6 6 9 9 9 12 12 12 15 15 15 18 18 18 3 3 3 6 6 6 9 9 9...
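With more recent data.table versions the lookup can also be written as an ad hoc join with on = and which = TRUE, which returns the matching row numbers of States directly. This is a sketch rather than a drop-in replacement: it assumes States holds exactly one row per (k1, k2) pair, as described in the question text (the posted code also keeps i, which yields three matches per action), and it uses small toy tables.

```r
library(data.table)

# Toy tables: States has one row per unique (k1, k2) pair
States  <- data.table(k1 = c(700, 700, 2700), k2 = c(100, 2100, 100))
Actions <- data.table(l1 = c(2700, 700, 700), l2 = c(100, 100, 2100))

# For each row of Actions, the row number in States where
# k1 == l1 and k2 == l2 (NA if there is no match)
idx <- States[Actions, on = c(k1 = "l1", k2 = "l2"), which = TRUE]
Actions[, index := idx]
Actions
```

Because the join is driven by on =, neither table needs a key set beforehand, and the original row order of Actions is preserved.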