suppose I have a network like this with multiple subgraphs.
How can I only keep the subgraph with the most number of vertices while removing the rest? In this case I want to keep the subgraph on the left and remove the 3-vertices one the lower right. Thanks!
Given
set.seed(1)
g <- sample_gnp(20, 1 / 20)
plot(g)
we wish to keep the subgraph with 6 vertices. Using
(clu <- components(g))
# $membership
# [1] 1 2 3 4 5 4 5 5 6 7 8 9 10 3 5 11 5 3 12 5
# $csize
# [1] 1 1 3 2 6 1 1 1 1 1 1 1
# $no
# [1] 12
gMax <- induced_subgraph(g, V(g)[clu$membership == which.max(clu$csize)])
we then get
plot(gMax)
This assumes that there is a single largest connected subgraph. Otherwise the "first" one will be chosen.
I have a data frame of n = 20 variables (number of columns) spread over b = 5 blocks (4 variables per block).
I would like to create p = 4 random and equal sized blocks of variables from the 5 blocks of variables.
I tried :
sample (x = 1: p, size = n, replace = TRUE)
[1] 1 1 1 1 1 1 1 1 1 2 2 2 3 3 3 4 4 4 4 4
Example of expected result (5 variables per block):
[1] 4 1 2 1 4 2 3 1 2 3 2 1 4 3 1 2 3 3 4 4
Thanks for your help !
You can try:
sample(x = rep(1:p,n/p), size = n, replace = FALSE)
Having discussed this in comments below, here is a solution:
Create a vector that looks like what you want, and then use sample to randomly sort it by sampling the whole vector without replacement:
p <- 4
b <- 5
sample(rep(1:p, b), size = p * b)
[1] 3 1 4 3 3 4 1 1 4 2 2 4 3 2 1 2 2 4 3 1
This question already has an answer here:
Insert missing time rows into a dataframe
(1 answer)
Closed 5 years ago.
I have a dataset that look like the following
id = c(1,1,1,2,2,2,3,3,4)
cycle = c(1,2,3,1,2,3,1,3,2)
value = 1:9
data.frame(id,cycle,value)
> data.frame(id,cycle,value)
id cycle value
1 1 1 1
2 1 2 2
3 1 3 3
4 2 1 4
5 2 2 5
6 2 3 6
7 3 1 7
8 3 3 8
9 4 2 9
so basically there is a variable called id that identifies the sample, a variable called cycle which identifies the timepoint, and a variable called value that identifies the value at that timepoint.
As you see, sample 3 does not have cycle 2 data and sample 4 is missing cycle 1 and 3 data. What I want to know is there a way to run a command outside of a loop to get the data to place NA's where there is no data. So I would like for my dataset to look like the following:
> data.frame(id,cycle,value)
id cycle value
1 1 1 1
2 1 2 2
3 1 3 3
4 2 1 4
5 2 2 5
6 2 3 6
7 3 1 7
8 3 2 NA
9 3 3 8
10 4 1 NA
11 4 2 9
12 4 3 NA
I am able to solve this problem with a lot of loops and if statements but the code is extremely long and cumbersome (I have many more columns in my real dataset).
Also, the number of samples I have is very large so I need something that is generalizable.
Using merge and expand.grid, we can come up with a solution. expand.grid creates a data.frame with all combinations of the supplied vectors (so you'd supply it with the id and cycle variables). By merging to your original data (and using all.x = T, which is like a left join in SQL), we can fill in those rows with missing data in dat with NA.
id = c(1,1,1,2,2,2,3,3,4)
cycle = c(1,2,3,1,2,3,1,3,2)
value = 1:9
dat <- data.frame(id,cycle,value)
grid_dat <- expand.grid(id = 1:4,
cycle = 1:3)
# or you could do (HT #jogo):
# grid_dat <- expand.grid(id = unique(dat$id),
# cycle = unique(dat$cycle))
merge(x = grid_dat, y = dat, by = c('id','cycle'), all.x = T)
id cycle value
1 1 1 1
2 1 2 2
3 1 3 3
4 2 1 4
5 2 2 5
6 2 3 6
7 3 1 7
8 3 2 NA
9 3 3 8
10 4 1 NA
11 4 2 9
12 4 3 NA
A solution based on the package tidyverse.
library(tidyverse)
# Create example data frame
id <- c(1, 1, 1, 2, 2, 2, 3, 3, 4)
cycle <- c(1, 2, 3, 1, 2, 3, 1, 3, 2)
value <- 1:9
dt <- data.frame(id, cycle, value)
# Complete the combination between id and cycle
dt2 <- dt %>% complete(id, cycle)
Here is a solution with data.table doing a cross join:
library("data.table")
d <- data.table(id = c(1,1,1,2,2,2,3,3,4), cycle = c(1,2,3,1,2,3,1,3,2), value = 1:9)
d[CJ(id=id, cycle=cycle, unique=TRUE), on=.(id,cycle)]
Seems like this very simple maneuver used to work for me, and now it simply doesn't. A dummy version of the problem:
df <- data.frame(x = 1:5) # create simple dataframe
df
x
1 1
2 2
3 3
4 4
5 5
df$y <- c(1:5) # adding a new column with a vector of the exact same length. Works out like it should
df
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
df$z <- c(1:4) # trying to add a new colum, this time with a vector with less elements than there are rows in the dataframe.
Error in `$<-.data.frame`(`*tmp*`, "z", value = 1:4) :
replacement has 4 rows, data has 5
I was expecting this to work with the following result:
x y z
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 1
I.e. the shorter vector should just start repeating itself automatically. I'm pretty certain this used to work for me (it's in a script that I've been running a hundred times before without problems). Now I can't even get the above dummy example to work like I want to. What am I missing?
If the vector can be evenly recycled, into the data.frame, you do not get and error or a warning:
df <- data.frame(x = 1:10)
df$z <- 1:5
This may be what you were experiencing before.
You can get your vector to fit as you mention with rep_len:
df$y <- rep_len(1:3, length.out=10)
This results in
df
x z y
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 1
5 5 5 2
6 6 1 3
7 7 2 1
8 8 3 2
9 9 4 3
10 10 5 1
Note that in place of rep_len, you could use the more common rep function:
df$y <- rep(1:3,len=10)
From the help file for rep:
rep.int and rep_len are faster simplified versions for two common cases. They are not generic.
If the total number of rows is a multiple of the length of your new vector, it works fine. When it is not, it does not work everywhere. In particular, probably you have used this type of recycling with matrices:
data.frame(1:6, 1:3, 1:4) # not a multiply
# Error in data.frame(1:6, 1:3, 1:4) :
# arguments imply differing number of rows: 6, 3, 4
data.frame(1:6, 1:3) # a multiple
# X1.6 X1.3
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 1
# 5 5 2
# 6 6 3
cbind(1:6, 1:3, 1:4) # works even with not a multiple
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 2 2 2
# [3,] 3 3 3
# [4,] 4 1 4
# [5,] 5 2 1
# [6,] 6 3 2
# Warning message:
# In cbind(1:6, 1:3, 1:4) :
# number of rows of result is not a multiple of vector length (arg 3)
I have a weighted directed graph with three strongly connected components(SCC).
The SCCs are obtained from the igraph::clusters function
library(igraph)
SCC<- clusters(graph, mode="strong")
SCC$membership
[1] 9 2 7 7 8 2 6 2 2 5 2 2 2 2 2 1 2 4 2 2 2 3 2 2 2 2 2 2 2 2
SCC$csize
[1] 1 21 1 1 1 1 2 1 1
SCC$no
[1] 9
I want to visualize the SCCs with circles and a colored background as the graph below, is there any ways to do this in R? Thanks!
Take a look at the mark.groups argument of plot.igraph. Something like the following will do the trick:
# Create some toy data
set.seed(1)
library(igraph)
graph <- erdos.renyi.game(20, 1/20)
# Do the clustering
SCC <- clusters(graph, mode="strong")
# Add colours and use the mark.group argument
V(graph)$color <- rainbow(SCC$no)[SCC$membership]
plot(graph, mark.groups = split(1:vcount(graph), SCC$membership))