igraph constraint components (c-size, c-density, c-hierarchy) - r

I would like to compute the components of Burt's constraint discussed here by Burt.
Igraph's constraint command computes Burt's constraint score:
rm(list=ls())
library(igraph)
g <- graph.formula( "A"--------"B":"E":"F":"EGO",
"B"--------"A":"D":"EGO",
"C"--------"EGO",
"D"--------"B":"EGO",
"E"--------"A":"EGO",
"F"--------"A":"EGO",
"EGO"-"A":"B":"C":"D":"E":"F")
coords <- layout_nicely(g)
V(g)$label <- V(g)$name
g$layout <- coords
plot(g)
constraint(g)
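constraint() only returns the overall constraint score:
$$C_i = \sum_{j \neq i} \left( p_{ij} + \sum_{q \neq i,j} p_{iq} p_{qj} \right)^2$$
with
$$p_{ij} = \frac{z_{ij} + z_{ji}}{\sum_{k \neq i} (z_{ik} + z_{ki})}$$
where
$z_{ij}$ is the strength (= weight) of connection between two vertices i and j.
$p_{ij}$ is the direct connection between vertices i and j (share of connections from i to j of all of i's connections).
$\sum_{q \neq i,j} p_{iq} p_{qj}$ is the sum of indirect connections between vertices i and j (connections to other vertices q that are both connected to i and j).
I want to work with the individual components c-size, c-density, and c-hierarchy.
Burt reshapes the constraint equation by expanding the square:
$$C_i = \sum_{j} p_{ij}^2 + 2 \sum_{j} p_{ij} \sum_{q} p_{iq} p_{qj} + \sum_{j} \left( \sum_{q} p_{iq} p_{qj} \right)^2$$
The first term is c-size.
The second term is c-density.
The third term is c-hierarchy.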
I want to compute the three components of constraint for each vertex of the network.
I could think of two solutions, both of which are beyond my capabilities.
Maybe there is a way to harness these values directly from igraph's constraint command.
Alternatively, one would have to compute these values manually.
For the example above, I have computed these values by hand using Excel:
node  degrees  constraint  c-size  c-density  c-hierarchy
A     4        0.60        0.25    0.23       0.12
B     3        0.64        0.33    0.24       0.07
C     1        1.00        1.00    0.00       0.00
D     2        0.78        0.50    0.25       0.03
E     2        0.73        0.50    0.21       0.02
F     2        0.73        0.50    0.21       0.02
EGO   6        0.40        0.17    0.16       0.07
FROM  TO   strength  direct influence of j  indirect influence of j  combined influence of j
A     B    1         0.25  0.25             0.17  0.04               0.09
A     E    1         0.25  0.25             0.17  0.04               0.09
A     F    1         0.25  0.25             0.17  0.04               0.09
A     EGO  1         0.25  0.25             1.33  0.33               0.34
B     A    1         0.33  0.33             0.17  0.06               0.15
B     D    1         0.33  0.33             0.17  0.06               0.15
B     EGO  1         0.33  0.33             0.75  0.25               0.34
C     EGO  1         1.00  1.00             0.00  0.00               1.00
D     B    1         0.50  0.50             0.17  0.08               0.34
D     EGO  1         0.50  0.50             0.33  0.17               0.44
E     A    1         0.50  0.50             0.17  0.08               0.34
E     EGO  1         0.50  0.50             0.25  0.13               0.39
F     A    1         0.50  0.50             0.17  0.08               0.34
F     EGO  1         0.50  0.50             0.25  0.13               0.39
EGO   A    1         0.17  0.17             1.33  0.22               0.15
EGO   B    1         0.17  0.17             0.75  0.13               0.09
EGO   C    1         0.17  0.17             0.00  0.00               0.03
EGO   D    1         0.17  0.17             0.33  0.06               0.05
EGO   E    1         0.17  0.17             0.25  0.04               0.04
EGO   F    1         0.17  0.17             0.25  0.04               0.04
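One way to get the three components is to skip constraint() and expand the squared term by hand. The following is a minimal sketch, not igraph's own implementation: it assumes an undirected, unweighted network, builds the adjacency matrix from a hand-typed edge list in base R, and sums only over the direct contacts j of each vertex i, as in the hand-computed tables. The helper name constraint_components is hypothetical. With igraph, the same matrix should be obtainable from as_adjacency_matrix(g, sparse = FALSE).

```r
# Edge list of the example network (undirected, unweighted).
edges <- matrix(c(
  "A", "B",   "A", "E",   "A", "F",   "A", "EGO",
  "B", "D",   "B", "EGO",
  "C", "EGO", "D", "EGO", "E", "EGO", "F", "EGO"
), ncol = 2, byrow = TRUE)

nodes <- c("A", "B", "C", "D", "E", "F", "EGO")
Z <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
Z[edges] <- 1          # z_ij, indexed by vertex name
Z[edges[, 2:1]] <- 1   # mirror each edge so the matrix is symmetric

# Hypothetical helper: splits Burt's constraint into its three terms.
constraint_components <- function(Z) {
  diag(Z) <- 0
  S <- Z + t(Z)                    # z_ij + z_ji
  P <- S / rowSums(S)              # p_ij: share of i's ties that go to j
  P[is.nan(P)] <- 0                # isolated vertices have no ties
  indirect <- P %*% P              # sum_q p_iq * p_qj (q = i and q = j vanish, diag(P) is 0)
  indirect <- indirect * (P > 0)   # assumption: sum over direct contacts j only
  data.frame(
    node        = rownames(Z),
    degrees     = rowSums(Z > 0),
    constraint  = rowSums((P + indirect)^2),
    c_size      = rowSums(P^2),
    c_density   = rowSums(2 * P * indirect),
    c_hierarchy = rowSums(indirect^2),
    row.names   = NULL
  )
}

res <- constraint_components(Z)
print(round(res[, -1], 2))
```

For this example the result agrees with the hand-computed table to two decimals (for A: 0.60, 0.25, 0.23, 0.12). Whether the constraint column matches constraint(g) exactly depends on how igraph treats vertices that are not direct contacts, so it is worth comparing the two on your own data.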

Related

Sort columns of a data.frame based on a list of prefixes

I have a data frame that looks like this:
Names S1_ATTCG S1_GTTA S9_TGCC S5_TGGA S21_GGCA
A 0.34 0.12 0.32 0.98 0.65
B 0.14 0.02 0.45 0.09 0.006
C 0.04 0.34 0.98 0.12 0.06
Is there a way to sort the columns so that the columns beginning with ^S1 and ^S5 will appear before all the others?
The data frame consists of 53,000 columns and 12,000 rows.
A quick and dirty solution:
cbind(
  d[,  grepl("^(S1|S5)_", names(d))],  # columns whose name starts with S1_ or S5_
  d[, !grepl("^(S1|S5)_", names(d))]   # everything else, in the original order
)
S1_ATTCG S1_GTTA S5_TGGA Names S9_TGCC S21_GGCA
1 0.34 0.12 0.98 A 0.32 0.650
2 0.14 0.02 0.09 B 0.45 0.006
3 0.04 0.34 0.12 C 0.98 0.060
With data:
d <- read.table(text = 'Names S1_ATTCG S1_GTTA S9_TGCC S5_TGGA S21_GGCA
A 0.34 0.12 0.32 0.98 0.65
B 0.14 0.02 0.45 0.09 0.006
C 0.04 0.34 0.98 0.12 0.06 ', header = T)
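With 53,000 columns, the exact pattern matters: an unanchored pattern such as "S1|S5" matches those characters anywhere in a name, so it would also pick up a column like S10_... or S15_.... A small sketch of the difference, using made-up column names (S10_AAAA is hypothetical and not in the question's data):

```r
nm <- c("Names", "S1_ATTCG", "S10_AAAA", "S5_TGGA", "S21_GGCA")

# Unanchored: matches "S1" or "S5" anywhere in the name.
loose  <- grepl("S1|S5", nm)

# Anchored at the start and closed by the underscore: only the S1_ and S5_ prefixes.
strict <- grepl("^(S1|S5)_", nm)

data.frame(name = nm, loose = loose, strict = strict)
```

Here loose is TRUE for S10_AAAA as well, while strict is TRUE only for S1_ATTCG and S5_TGGA.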
Good ol' dplyr can help too.
library(dplyr)
d %>%
  relocate(starts_with(c("S1_", "S5_")), .after = Names)
Names S1_ATTCG S1_GTTA S5_TGGA S9_TGCC S21_GGCA
1 A 0.34 0.12 0.98 0.32 0.650
2 B 0.14 0.02 0.09 0.45 0.006
3 C 0.04 0.34 0.12 0.98 0.060
d <- read.table(text = 'Names S1_ATTCG S1_GTTA S9_TGCC S5_TGGA S21_GGCA
A 0.34 0.12 0.32 0.98 0.65
B 0.14 0.02 0.45 0.09 0.006
C 0.04 0.34 0.98 0.12 0.06 ', header = T)
d[c("Names", gtools::mixedsort(names(d)[-1]))]
#> Names S1_ATTCG S1_GTTA S5_TGGA S9_TGCC S21_GGCA
#> 1 A 0.34 0.12 0.98 0.32 0.650
#> 2 B 0.14 0.02 0.09 0.45 0.006
#> 3 C 0.04 0.34 0.12 0.98 0.060
Created on 2021-09-16 by the reprex package (v2.0.1)
or
d %>%
  relocate(all_of(gtools::mixedsort(names(d)[-1])), .after = Names)

Filtering all rows if any value in a row is less than a threshold value

I would like to remove all rows if any value of the row is less than 0.05. Any suggestions? I need dplyr and base R simple subset solutions.
library(magrittr)
text = '
INNO RISK PRO AMB MKT IP
1 0.00 0.01 0.00 0.00 0.19 0.24
2 1.00 0.83 0.04 0.48 0.60 0.03
3 0.01 0.07 0.79 0.05 0.19 0.00
4 0.99 0.99 0.92 0.86 0.01 0.10
5 0.72 0.93 0.28 0.48 1.00 0.90
6 0.96 1.00 1.00 0.86 1.00 0.75
7 0.02 0.07 0.01 0.86 0.60 0.00
8 0.02 0.01 0.01 0.12 0.60 0.24
9 0.02 0.93 0.92 0.02 0.19 0.90
10 0.99 0.97 0.92 0.86 0.99 0.90'
d10 = textConnection(text) %>% read.table(header = T)
Created on 2020-11-28 by the reprex package (v0.3.0)
We can use rowSums
d10[!rowSums(d10 < 0.05),]
# INNO RISK PRO AMB MKT IP
#5 0.72 0.93 0.28 0.48 1.00 0.90
#6 0.96 1.00 1.00 0.86 1.00 0.75
#10 0.99 0.97 0.92 0.86 0.99 0.90
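To see why the rowSums one-liner works, it may help to split it into steps. This is an illustrative sketch on a tiny made-up data frame (the names small, below, n_below and keep are not from the question):

```r
small <- data.frame(a = c(0.10, 0.01, 0.50),
                    b = c(0.20, 0.30, 0.04))

below   <- small < 0.05      # logical matrix: TRUE where a value is under the threshold
n_below <- rowSums(below)    # number of offending values per row
keep    <- n_below == 0      # same rows that !rowSums(...) selects

small[keep, ]                # only the first row survives
```

Writing !rowSums(...) relies on 0 being coerced to FALSE and any positive count to TRUE; the explicit == 0 comparison does the same thing more readably.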
Or with dplyr
library(dplyr)
d10 %>%
  filter(if_all(everything(), ~ .x >= 0.05))  # if_all() replaces across() inside filter() (dplyr >= 1.0.4)
# INNO RISK PRO AMB MKT IP
#5 0.72 0.93 0.28 0.48 1.00 0.90
#6 0.96 1.00 1.00 0.86 1.00 0.75
#10 0.99 0.97 0.92 0.86 0.99 0.90

How do I create a 3D surface plot in R if I have a data frame of 3 columns?

Here are the first 20 rows of my dataframe:
x y z
1 0.50 0.50 48530.98
2 0.50 0.51 49029.34
3 0.50 0.52 49576.12
4 0.50 0.53 50161.22
5 0.50 0.54 50752.05
6 0.50 0.55 51354.43
7 0.50 0.56 51965.09
8 0.50 0.57 38756.51
9 0.50 0.58 39262.34
10 0.50 0.59 39783.68
11 0.51 0.60 41052.09
12 0.51 0.61 41447.51
13 0.51 0.62 26972.85
14 0.51 0.63 27134.74
15 0.51 0.64 27297.85
16 0.51 0.65 27462.82
17 0.51 0.66 27632.45
18 0.51 0.67 27806.77
19 0.51 0.68 27988.12
20 0.51 0.69 25514.42
I need to create a 3d surface plot to view it.
The best would be one where I can rotate it around angles to view it from all perspectives.
Thanks.
You can use plotly to create a 3d surface plot. Use xtabs to turn your data into a suitable matrix
library(plotly)
plot_ly(z = ~xtabs(z ~ x + y, data = df)) %>% add_surface()
Sample data
df <- read.table(text =
" x y z
1 0.50 0.50 48530.98
2 0.50 0.51 49029.34
3 0.50 0.52 49576.12
4 0.50 0.53 50161.22
5 0.50 0.54 50752.05
6 0.50 0.55 51354.43
7 0.50 0.56 51965.09
8 0.50 0.57 38756.51
9 0.50 0.58 39262.34
10 0.50 0.59 39783.68
11 0.51 0.60 41052.09
12 0.51 0.61 41447.51
13 0.51 0.62 26972.85
14 0.51 0.63 27134.74
15 0.51 0.64 27297.85
16 0.51 0.65 27462.82
17 0.51 0.66 27632.45
18 0.51 0.67 27806.77
19 0.51 0.68 27988.12
20 0.51 0.69 25514.42", header = T)
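The reshaping step can be checked on its own before plotting. A minimal sketch with a made-up long table (long and zmat are illustrative names): xtabs() puts x in the rows and y in the columns, and any (x, y) combination that is missing from the data becomes 0 in the matrix, which will show up as a drop in the surface.

```r
long <- data.frame(
  x = c(0.50, 0.50, 0.50, 0.51, 0.51),
  y = c(0.50, 0.51, 0.52, 0.50, 0.51),
  z = c(10, 20, 30, 40, 50)
)

# One row per x value, one column per y value; cells hold the summed z.
zmat <- xtabs(z ~ x + y, data = long)
zmat
dim(zmat)   # 2 rows (x) by 3 columns (y)
```

The pair x = 0.51, y = 0.52 is absent from long, so that cell is 0. With data on an incomplete grid, such as the 20 sample rows, it is worth checking for these filled-in zeros.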

How to create a new column in a data frame depending on multiple criteria from multiple columns from the same data frame

I have a data frame df1 with four variables. One refers to sunlight, the second one refers to the moon-phase light (light due to the moon's phase), the third one to the moon-position light (light from the moon depending on if it is in the sky or not) and the fourth refers to the clarity of the sky (opposite to cloudiness).
I call them SL, MPhL, MPL and SC respectively. I want to create a new column referred to "global light" that during the day depends only on SL and during the night depends on the other three columns ("MPhL", "MPL" and "SC"). What I want is that at night (when SL == 0), the light in a specific area is equal to the product of the columns "MPhL", "MPL" and "SC". If any of them is 0, then, the light at night would be 0 also.
Since I work with a matrix of hundreds of thousands of rows, what would be the best way to do it? As an example of what I have:
SL<- c(0.82,0.00,0.24,0.00,0.98,0.24,0.00,0.00)
MPhL<- c(0.95,0.85,0.65,0.35,0.15,0.00,0.87,0.74)
MPL<- c(0.00,0.50,0.10,0.89,0.33,0.58,0.00,0.46)
SC<- c(0.00,0.50,0.10,0.89,0.33,0.58,0.00,0.46)
df<-data.frame(SL,MPhL,MPL,SC)
df
SL MPhL MPL SC
1 0.82 0.95 0.00 0.00
2 0.00 0.85 0.50 0.50
3 0.24 0.65 0.10 0.10
4 0.00 0.35 0.89 0.89
5 0.98 0.15 0.33 0.33
6 0.24 0.00 0.58 0.58
7 0.00 0.87 0.00 0.00
8 0.00 0.74 0.46 0.46
What I would like to get is this:
df
SL MPhL MPL SC GL
1 0.82 0.95 0.00 0.00 0.82 # When "SL">0, GL= SL
2 0.00 0.85 0.50 0.50 0.21 # When "SL" is 0, GL = MPhL*MPL*SC
3 0.24 0.65 0.10 0.10 0.24
4 0.00 0.35 0.89 0.89 0.28
5 0.98 0.15 0.33 0.33 0.98
6 0.24 0.00 0.58 0.58 0.24
7 0.00 0.87 0.00 0.00 0.00
8 0.00 0.74 0.46 0.46 0.16
The simplest way is to use the ifelse function, assigning the result as a new column of df:
df$GL <- with(df, ifelse(SL == 0, MPhL * MPL * SC, SL))
If you want to work in a more structured environment, I can recommend the dplyr package:
library(dplyr)
tibble(SL = SL, MPhL = MPhL, MPL = MPL, SC = SC) %>%
mutate(GL = if_else(SL == 0, MPhL * MPL * SC, SL))
# A tibble: 8 x 5
SL MPhL MPL SC GL
<dbl> <dbl> <dbl> <dbl> <dbl>
1 0.82 0.95 0.00 0.00 0.820000
2 0.00 0.85 0.50 0.50 0.212500
3 0.24 0.65 0.10 0.10 0.240000
4 0.00 0.35 0.89 0.89 0.277235
5 0.98 0.15 0.33 0.33 0.980000
6 0.24 0.00 0.58 0.58 0.240000
7 0.00 0.87 0.00 0.00 0.000000
8 0.00 0.74 0.46 0.46 0.156584

Repeat data.frame N times with adding column

I have the following data frame and I want to repeat it N times
dc <- read.table(text = "from 1 2 3 4 5
1 0.01 0.02 0.03 0.04 0.05
2 0.06 0.07 0.08 0.09 0.10
3 0.11 0.12 0.13 0.14 0.15
4 0.16 0.17 0.18 0.19 0.20
5 0.21 0.22 0.23 0.24 0.25", header = TRUE)
n<-20
ddr <- NA
for(i in 1:n) {
ddr <- rbind(ddr, cbind(dc,i))
}
As a result, I would like to receive:
from X1 X2 X3 X4 X5 i
1 0.01 0.02 0.03 0.04 0.05 1
2 0.06 0.07 0.08 0.09 0.10 1
3 0.11 0.12 0.13 0.14 0.15 1
4 0.16 0.17 0.18 0.19 0.20 1
5 0.21 0.22 0.23 0.24 0.25 1
1 0.01 0.02 0.03 0.04 0.05 2
2 0.06 0.07 0.08 0.09 0.10 2
3 0.11 0.12 0.13 0.14 0.15 2
4 0.16 0.17 0.18 0.19 0.20 2
5 0.21 0.22 0.23 0.24 0.25 2
.............................
1 0.01 0.02 0.03 0.04 0.05 20
2 0.06 0.07 0.08 0.09 0.10 20
3 0.11 0.12 0.13 0.14 0.15 20
4 0.16 0.17 0.18 0.19 0.20 20
5 0.21 0.22 0.23 0.24 0.25 20
The matrix must be repeated N times, and repeat number is added.
Is there a correct solution (easy function to do this in R) to this issue? In my case if the ddr is not declared (ddr<-NA), the script does not work. Thanks!
You can use rep() to replicate the row indexes, and also to create the repeat number column.
cbind(dc[rep(1:nrow(dc), n), ], i = rep(1:n, each = nrow(dc)))
Let's break it down:
dc[rep(1:nrow(dc), n), ] passes the replicated row indexes as the i (row) argument of [ for data frames, so the whole block of rows is repeated n times
rep(1:n, each = nrow(dc)) repeats each value of 1:n nrow(dc) times, giving the repeat number for every row
cbind(...) combines the two into a single data frame
As @HubertL points out in the comments, this can be further simplified to
cbind(dc, i = rep(1:n, each = nrow(dc)))
thanks to the magic of recycling. Please go give him a vote.
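The two rep() calls differ only in whether the whole sequence or each element is repeated. A small sketch on a two-row stand-in for dc (the names small, row_idx, rep_id and out are illustrative):

```r
small <- data.frame(from = 1:2, X1 = c(0.01, 0.06))
n <- 3

row_idx <- rep(seq_len(nrow(small)), n)          # 1 2 1 2 1 2: the whole sequence, n times
rep_id  <- rep(seq_len(n), each = nrow(small))   # 1 1 2 2 3 3: each value nrow(small) times

out <- cbind(small[row_idx, ], i = rep_id)
out
```

seq_len() is used instead of 1:nrow(small) so that an empty data frame yields an empty index rather than c(1, 0).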
Here is also a more intuitive way, about identical in speed to the other top answer:
n <- 3
data.frame(dc, i = rep(1:n, each = NROW(dc)))
Output (repeated 3x):
from X1 X2 X3 X4 X5 i
1 1 0.01 0.02 0.03 0.04 0.05 1
2 2 0.06 0.07 0.08 0.09 0.10 1
3 3 0.11 0.12 0.13 0.14 0.15 1
4 4 0.16 0.17 0.18 0.19 0.20 1
5 5 0.21 0.22 0.23 0.24 0.25 1
6 1 0.01 0.02 0.03 0.04 0.05 2
7 2 0.06 0.07 0.08 0.09 0.10 2
8 3 0.11 0.12 0.13 0.14 0.15 2
9 4 0.16 0.17 0.18 0.19 0.20 2
10 5 0.21 0.22 0.23 0.24 0.25 2
11 1 0.01 0.02 0.03 0.04 0.05 3
12 2 0.06 0.07 0.08 0.09 0.10 3
13 3 0.11 0.12 0.13 0.14 0.15 3
14 4 0.16 0.17 0.18 0.19 0.20 3
15 5 0.21 0.22 0.23 0.24 0.25 3
EDIT: Top Answer Speed Test
This test was scaled up to n=1e+05, iterations=100:
func1 <- function(){
data.frame(dc, i = rep(1:n, each = NROW(dc)))
}
func2 <- function(){
cbind(dc, i = rep(1:n, each = nrow(dc)))
}
func3 <- function(){
cbind(dc[rep(1:nrow(dc), n), ], i = rep(1:n, each = nrow(dc)))
}
microbenchmark::microbenchmark(
func1(),func2(),func3())
Unit: milliseconds
expr min lq mean median uq max neval cld
func1() 15.58709 21.69143 28.62695 22.01692 23.85648 117.9012 100 a
func2() 15.99023 21.59375 28.37328 22.18298 23.99953 136.1209 100 a
func3() 414.18741 436.51732 473.14571 453.26099 498.21576 666.8515 100 b

Resources