Override hidden metadata within a tbl_graph object

Through a series of manipulations I ended up with a tbl_graph that has this structure:
# A tbl_graph: 98 nodes and 78 edges
#
# An undirected simple graph with 46 components
#
# Node Data: 98 x 4 (active)
      w nucleus  name alpha
  <dbl>   <int> <int> <dbl>
1   0.4       1    95  0.05
2   0.4       1    34  0.05
3   0.4       1    82  0.05
4   0.4       2    10  0.55
5   0.4       2     2  0.55
6   0.4       3    68  0.55
# ... with 92 more rows
#
# Edge Data: 78 x 3
   from    to color
  <int> <int> <chr>
1    34    95 red
2    82    95 red
3    34    82 red
# ... with 75 more rows
However, when I pass it to as.igraph (a necessary step for ggraph), the edges in the metadata are these:
IGRAPH 70d96f3 UN-- 98 78 --
+ attr: w (v/n), nucleus (v/n), name (v/n), alpha (v/n), color (e/c)
+ edges from 70d96f3 (vertex names):
[1] 3--18 40--18 3--40 34--53 79--93 20-- 5 20--76 5--76 8--83 66--75 66--78 75--78 94--54 41--89
[15] 4--89 4--41 16--31 67--77 9--28 58--28 58-- 9 57--80 82--63 10--59 27--39 35--19 36--42 91--42
[29] 36--91 52--64 25--52 25--64 50--33 50--65 33--65 7--84 97--14 96-- 1 96--43 96-- 6 43-- 1 6-- 1
[43] 6--43 21--62 86--62 21--86 30--49 37--17 37--90 17--90 92--44 68--69 68--26 68--38 68--72 68--24
[57] 69--26 69--38 69--72 69--24 38--26 72--26 26--24 72--38 38--24 72--24 48--71 71--51 48--51 23--74
[71] 88--60 61--88 61--60 47--85 98--85 98--47 95--81 29--46
As you can see, these are different edges. Is there an easy way to force igraph to override the metadata with the content of the columns name, from, and to?
EDIT:
It seems to have to do with the order of the nodes; if I rearrange by name, from and to are renumbered to point at the new row ids of the nodes they originally matched by name, which is something I don't want to happen.
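One possible workaround (a sketch, not a confirmed fix): igraph treats a vertex attribute literally called name as the symbolic vertex identifier when printing edges, while the from and to columns of a tbl_graph are plain row indices, so the two listings disagree even though the graph itself is unchanged. Renaming the column before converting keeps edges printed as row indices (here g stands for the tbl_graph above and node_id is a hypothetical replacement name):
library(tidygraph)
library(dplyr)
## `g` is assumed to be the tbl_graph shown above; renaming `name` stops
## igraph from using it as the vertex identifier, so edges print as the
## same row indices that appear in the from/to columns.
g2 <- g %>%
  activate(nodes) %>%
  rename(node_id = name)
as.igraph(g2)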

Related

how to extract observation values for each cluster of Kmeans

I have data that come from two distribution functions (mixture data). I fit k-means to the data with 2 centers and then obtain the cluster assignments. My point here is that, instead of the cluster number of each observation, I would like to divide my data into two groups. That is, the first group contains the data that come from the first cluster, and likewise for the second group (my data is two-dimensional, stored as a matrix).
Here is my try:
kme <- kmeans(Sim, 2)
kme$cluster
which gives this:
kme$cluster
[1] 1 2 2 1 1 1 2 2 2 1 2 2 1 2 1 2 1 2 2 1 2 1 2 2 2 1 2 1 2 1 1 2 1 1 1 2 2 1 1 1 1 1 2 2 1 1 1 2 2 1 2 1 2 2 2
[56] 1 2 1 2 2 1 2 1 1 2 2 1 2 2 1 1 2 2 1 2 2 1 1 1 2 2 2 2 2 2 1 2 2 2 1 2 2 2 2 1 1 2 2 1 2
I know that means the observations in the first row of my matrix come from the first cluster and the second and third rows come from the second cluster. Instead of this, I want two groups: one with the observations (the values, not the cluster numbers) of the first cluster, and the other with those from the second cluster.
For example,
          [,1]      [,2] [,3]
[1,] 0.8026952 0.8049413    1
[2,] 0.4333745 0.5063472    2
[3,] 0.3587946 0.4091627    2
[4,] 0.9067146 0.9211618    1
[5,] 0.6663730 0.6644439    1
[6,] 0.9752217 0.8299001    1
Hence, I want it like this:
Group_1
          [,1]      [,2]
[1,] 0.8026952 0.8049413
[2,] 0.9067146 0.9211618
[3,] 0.6663730 0.6644439
[4,] 0.9752217 0.8299001
Group_2
          [,1]      [,2]
[2,] 0.4333745 0.5063472
[3,] 0.3587946 0.4091627
## my data
structure(c(0.8026952064848, 0.433374540465373, 0.35879457564118,
0.906714606331661, 0.666372966486961, 0.975221659988165, 0.146514602801487,
0.185211665343342, 0.266845172200967, 0.9316249943804, 0.458760005421937,
0.260092565789819, 0.546946153900359, 0.320214906940237, 0.998543527442962,
0.264783770404576, 0.940526409307495, 0.218771387590095, 0.00109510733232848,
0.909367726704406, 0.195467973826453, 0.853418850837688, 0.257240866776556,
0.18492349224921, 0.0350681275368262, 0.743108308431699, 0.120800079312176,
0.536067422405767, 0.387076289858669, 0.859893148997799, 0.962759922724217,
0.0288314732712864, 0.878663770621642, 0.98208610656754, 0.98423704248853,
0.0850008164197942, 0.415692074922845, 0.725441533140838, 0.514739896170795,
0.564903213409707, 0.65493689605431, 0.551635805051774, 0.20452569425106,
0.0509099354967475, 0.646801606381046, 0.656341063790023, 0.706781879998744,
0.244539211907925, 0.43318469475677, 0.848426640266553, 0.26359805940462,
0.730860544172275, 0.405211122473702, 0.401496034115553, 0.432796132021846,
0.654138915939257, 0.00803712895140052, 0.991968845921972, 0.0311756118742527,
0.0648601313587278, 0.733741108178729, 0.0431173096876591, 0.619796682847664,
0.804308546474203, 0.0934691624715924, 0.520366458455101, 0.833598382357762,
0.373484763782471, 0.261487311183624, 0.822368689114228, 0.88254910800606,
0.261728620579622, 0.109025254459585, 0.661885950024542, 0.231851563323289,
0.46855820226483, 0.909970719134435, 0.799321972066537, 0.646252158097923,
0.233985049184412, 0.309839888018159, 0.129971102112904, 0.0901338488329202,
0.460395671925082, 0.274646409088746, 0.675003502921675, 0.00289221783168614,
0.336108531044562, 0.371105678845197, 0.607435576152056, 0.156731446506456,
0.246894558891654, 0.418194083335386, 0.000669385509081014, 0.929943428778418,
0.972200238145888, 0.503282874496368, 0.126382717164233, 0.683936105109751,
0.21720214970307, 0.804941252722838, 0.506347232734472, 0.409162739287115,
0.921161751145135, 0.664443932378791, 0.829900114789874, 0.0660539097664178,
0.296326436845226, 0.120007439729838, 0.768823563807157, 0.449026418114183,
0.268668511775742, 0.733763495587273, 0.365402223476625, 0.97980160509396,
0.335119241818387, 0.929315469866307, 0.253016166717649, 0.00521095494948787,
0.870041067705, 0.215020805969677, 0.858896143709886, 0.167998804405928,
0.204213777320881, 0.050652931423494, 0.731499125526297, 0.166061290725948,
0.520575411719918, 0.370579454420263, 0.655607928337889, 0.978414469097905,
0.00268175014874324, 0.937587480238656, 0.992468047261219, 0.856301580636229,
0.106064732119751, 0.530228247677302, 0.502227925225818, 0.66462369930413,
0.526988978414104, 0.394591213637187, 0.623968017885322, 0.222666427921132,
0.0707407196787662, 0.715361864683925, 0.561951996212598, 0.874765155771585,
0.217631973951671, 0.576708062239157, 0.910641489550344, 0.215463715360162,
0.761807500922947, 0.417110771840405, 0.497162608159201, 0.530665309105489,
0.689703677933362, 0.00811876221245061, 0.991245541114815, 0.0518070069187705,
0.0733367055960226, 0.803126294581356, 0.0291602667026993, 0.724848517465592,
0.682316094846719, 0.0914714514707226, 0.426956537783392, 0.826985575416605,
0.3128962286514, 0.295208624024388, 0.58934716401092, 0.856718183582533,
0.183019143019377, 0.302561606994597, 0.666755501118539, 0.176298329811281,
0.389183841328174, 0.86253900906311, 0.753736534075238, 0.627220192419063,
0.319958512526359, 0.321602248149364, 0.161772830672492, 0.103166641060684,
0.339980194505715, 0.218533019046996, 0.689884789678819, 0.00251942038852481,
0.174792447835404, 0.509071373135409, 0.647835095901117, 0.22572898134156,
0.287369659385574, 0.538675651472693, 0.000995476493411555, 0.939528694637273,
0.961510166904661, 0.452822116916426, 0.2061782381611, 0.722694525115558,
0.328404467661884), .Dim = c(100L, 2L))
I hope this is what you are looking for.
I had to transform the matrix into a data frame so that the structure is preserved when we use the split function; otherwise it would split the matrix element by element, because a matrix is really just a vector with a dim attribute, so it behaves like a vector.
The split function divides a data frame (or a vector) into groups defined by a factor f, which in your case is the vector of cluster assignments.
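A minimal illustration of that point, using a toy 2x2 matrix rather than your data:
m <- matrix(1:4, nrow = 2)
split(m, c(1, 2))                  ## element by element: $`1` = c(1, 3), $`2` = c(2, 4)
split(as.data.frame(m), c(1, 2))   ## row-wise, structure preserved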
kme <- kmeans(Sim, 2)
kme$cluster
Sim2 <- as.data.frame(cbind(Sim, kme$cluster))
split(Sim2, Sim2$V3) |>
  setNames(paste("Group", sort(unique(kme$cluster))))
$`Group 1`
V1 V2 V3
2 0.4333745405 0.5063472327 1
3 0.3587945756 0.4091627393 1
7 0.1465146028 0.0660539098 1
8 0.1852116653 0.2963264368 1
9 0.2668451722 0.1200074397 1
11 0.4587600054 0.4490264181 1
12 0.2600925658 0.2686685118 1
14 0.3202149069 0.3654022235 1
16 0.2647837704 0.3351192418 1
18 0.2187713876 0.2530161667 1
19 0.0010951073 0.0052109549 1
21 0.1954679738 0.2150208060 1
23 0.2572408668 0.1679988044 1
24 0.1849234922 0.2042137773 1
25 0.0350681275 0.0506529314 1
27 0.1208000793 0.1660612907 1
29 0.3870762899 0.3705794544 1
32 0.0288314733 0.0026817501 1
36 0.0850008164 0.1060647321 1
37 0.4156920749 0.5302282477 1
43 0.2045256943 0.2226664279 1
44 0.0509099355 0.0707407197 1
48 0.2445392119 0.2176319740 1
49 0.4331846948 0.5767080622 1
51 0.2635980594 0.2154637154 1
53 0.4052111225 0.4171107718 1
54 0.4014960341 0.4971626082 1
55 0.4327961320 0.5306653091 1
57 0.0080371290 0.0081187622 1
59 0.0311756119 0.0518070069 1
60 0.0648601314 0.0733367056 1
62 0.0431173097 0.0291602667 1
65 0.0934691625 0.0914714515 1
66 0.5203664585 0.4269565378 1
68 0.3734847638 0.3128962287 1
69 0.2614873112 0.2952086240 1
72 0.2617286206 0.1830191430 1
73 0.1090252545 0.3025616070 1
75 0.2318515633 0.1762983298 1
76 0.4685582023 0.3891838413 1
80 0.2339850492 0.3199585125 1
81 0.3098398880 0.3216022481 1
82 0.1299711021 0.1617728307 1
83 0.0901338488 0.1031666411 1
84 0.4603956719 0.3399801945 1
85 0.2746464091 0.2185330190 1
87 0.0028922178 0.0025194204 1
88 0.3361085310 0.1747924478 1
89 0.3711056788 0.5090713731 1
91 0.1567314465 0.2257289813 1
92 0.2468945589 0.2873696594 1
93 0.4181940833 0.5386756515 1
94 0.0006693855 0.0009954765 1
97 0.5032828745 0.4528221169 1
98 0.1263827172 0.2061782382 1
100 0.2172021497 0.3284044677 1
$`Group 2`
V1 V2 V3
1 0.8026952 0.8049413 2
4 0.9067146 0.9211618 2
5 0.6663730 0.6644439 2
6 0.9752217 0.8299001 2
10 0.9316250 0.7688236 2
13 0.5469462 0.7337635 2
15 0.9985435 0.9798016 2
17 0.9405264 0.9293155 2
20 0.9093677 0.8700411 2
22 0.8534189 0.8588961 2
26 0.7431083 0.7314991 2
28 0.5360674 0.5205754 2
30 0.8598931 0.6556079 2
31 0.9627599 0.9784145 2
33 0.8786638 0.9375875 2
34 0.9820861 0.9924680 2
35 0.9842370 0.8563016 2
38 0.7254415 0.5022279 2
39 0.5147399 0.6646237 2
40 0.5649032 0.5269890 2
41 0.6549369 0.3945912 2
42 0.5516358 0.6239680 2
45 0.6468016 0.7153619 2
46 0.6563411 0.5619520 2
47 0.7067819 0.8747652 2
50 0.8484266 0.9106415 2
52 0.7308605 0.7618075 2
56 0.6541389 0.6897037 2
58 0.9919688 0.9912455 2
61 0.7337411 0.8031263 2
63 0.6197967 0.7248485 2
64 0.8043085 0.6823161 2
67 0.8335984 0.8269856 2
70 0.8223687 0.5893472 2
71 0.8825491 0.8567182 2
74 0.6618860 0.6667555 2
77 0.9099707 0.8625390 2
78 0.7993220 0.7537365 2
79 0.6462522 0.6272202 2
86 0.6750035 0.6898848 2
90 0.6074356 0.6478351 2
95 0.9299434 0.9395287 2
96 0.9722002 0.9615102 2
99 0.6839361 0.7226945 2
Add the kme$cluster values to the original data frame and then build the groups based on the value in kme$cluster.
From what I understand without a data sample:
library(tidyverse)
Sim <- as.data.frame(Sim) %>%
  mutate(cluster_group = kme$cluster)
df_final <- list(Group1 = Sim %>%
                   filter(cluster_group == 1) %>%
                   pull(value),
                 Group2 = Sim %>%
                   filter(cluster_group == 2) %>%
                   pull(value))
Here value stands for the column of Sim used in the k-means fit. Sim is a matrix, so it is converted with as.data.frame() before mutate(); a list is used rather than data.frame() because the two clusters generally contain different numbers of observations.

Selecting subsets of a grouped variable

The data I used can be found here (the "sq.txt" file).
Below is a summary of the data:
> summary(sq)
behaviour date squirrel time
resting :983 2017-06-28: 197 22995 : 127 09:30:00: 17
travelling :649 2017-06-26: 160 22758 : 116 08:00:00: 16
feeding :344 2017-06-30: 139 23080 : 108 16:25:00: 15
OOS :330 2017-07-18: 110 23089 : 100 08:11:00: 13
vocalization:246 2017-06-27: 99 23079 : 97 08:31:00: 13
social : 53 2017-06-29: 96 22865 : 95 15:24:00: 13
(Other) : 67 (Other) :1871 (Other):2029 (Other) :2585
Each squirrel has a number of observations that correspond to a number of different behaviours (behaviour).
For example, squirrel 22995 was observed 127 times. These 127 observations correspond to different behaviour categories: 7 feeding, 1 territorial, 55 resting, etc. I then need to divide the number of each behaviour by the total number of observations (i.e. feeding = 7/127, territorial = 1/127, resting = 55/127, etc.) to get proportions of time spent doing each behaviour.
I already have grouped my observations by squirrel using the dplyr package.
Is there a way, using dplyr, to calculate proportions for one column (behaviour) based on the total observations for another column (squirrel) whose values have been grouped?
Something like this?
sq %>%
  count(squirrel, behaviour) %>%
  group_by(squirrel) %>%
  mutate(p = n/sum(n)) %>%
  # add this line to see result for squirrel 22995
  filter(squirrel == 22995)
# A tibble: 8 x 4
# Groups: squirrel [1]
squirrel behaviour n p
<int> <chr> <int> <dbl>
1 22995 feeding 7 0.0551
2 22995 nest_building 4 0.0315
3 22995 OOS 9 0.0709
4 22995 resting 55 0.433
5 22995 social 6 0.0472
6 22995 territorial 1 0.00787
7 22995 travelling 32 0.252
8 22995 vocalization 13 0.102
EDIT:
If you want to include zero counts for squirrels where a behaviour was not observed, one way is to use tidyr::complete(). That generates NA by default, which you may want to replace with zero.
library(dplyr)
library(tidyr)
sq %>%
  count(squirrel, behaviour) %>%
  complete(squirrel, behaviour) %>%
  group_by(squirrel) %>%
  mutate(p = n/sum(n, na.rm = TRUE)) %>%
  replace_na(list(n = 0, p = 0)) %>%
  filter(squirrel == 22995)
# A tibble: 11 x 4
# Groups: squirrel [1]
squirrel behaviour n p
<int> <chr> <dbl> <dbl>
1 22995 dead 0 0
2 22995 feeding 7.00 0.0551
3 22995 grooming 0 0
4 22995 nest_building 4.00 0.0315
5 22995 OOS 9.00 0.0709
6 22995 resting 55.0 0.433
7 22995 social 6.00 0.0472
8 22995 territorial 1.00 0.00787
9 22995 travelling 32.0 0.252
10 22995 vigilant 0 0
11 22995 vocalization 13.0 0.102
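As an aside (my addition, assuming the same sq data), the n/sum(n) step can equivalently be written with prop.table, which computes exactly that ratio within each group:
sq %>%
  count(squirrel, behaviour) %>%
  group_by(squirrel) %>%
  mutate(p = prop.table(n))   ## identical to n/sum(n) inside each squirrel group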

R, analyzing a data set with a large parameter space and replicates

I've run experiments whereby I use a parameter combination and collect the average forces and torques (in the x, y, and z directions). I do four replicates for each parameter combination, and I have 432 parameter combinations in total.
The actual dataset is a bit too big to include here, so I've made a subset for testing purposes and uploaded it to dropbox, along with the relevant R script.
Here is a heavily parsed version:
> data2[1:20,1:8]
# A tibble: 20 x 8
`Foil Color` `Flow Speed (rpm)` `Frequency (Hz)` StepTime Maxpress Minpress `Minpress Percentage` FxMean
<fctr> <fctr> <fctr> <fctr> <fctr> <int> <fctr> <dbl>
1 Black 0 0.25 250 50 0 0 0.014537062
2 Black 0 0.25 250 50 0 0 0.014870256
3 Black 0 0.25 250 50 0 0 0.013180870
4 Black 0 0.25 250 50 0 0 0.013448804
5 Black 0 0.25 250 50 3 0.05 0.012996979
6 Black 0 0.25 250 50 3 0.05 0.012115166
7 Black 0 0.25 250 50 3 0.05 0.012427347
8 Black 0 0.25 250 50 3 0.05 0.012561253
9 Black 0 0.25 250 50 5 0.1 0.012480644
10 Black 0 0.25 250 50 5 0.1 0.011603403
11 Black 0 0.25 250 50 5 0.1 0.011427116
12 Black 0 0.25 250 50 5 0.1 0.011545803
13 Black 0 0.25 250 50 13 0.25 0.009891865
14 Black 0 0.25 250 50 13 0.25 0.008465604
15 Black 0 0.25 250 50 13 0.25 0.009089619
16 Black 0 0.25 250 50 13 0.25 0.008560160
17 Black 0 0.25 250 75 0 0 0.025101186
18 Black 0 0.25 250 75 0 0 0.023611920
19 Black 0 0.25 250 75 0 0 0.026276007
20 Black 0 0.25 250 75 0 0 0.026593895
I am trying to group the data by the parameter combinations and calculate the average FxMean, sd, and se for each group of 4 replicates.
I have tried to follow tutorials and other examples where people try to summarize the data (example), but it doesn't work for me. It normally spits out an array that looks nothing like what I need.
For example:
fx_data2 <- ddply(data_csv, c(data_csv$`Frequency (Hz)`,data_csv$`Flow Speed (rpm)`, data_csv$StepTime, data_csv$Maxpress, data_csv$`Minpress Percentage`), summarise,
N = length(data_csv$FxMean),
mean = mean(data_csv$FxMean),
sd = sd(data_csv$FxMean),
se = sd / sqrt(N)
)
fx_data3 <- summaryBy(FxMean ~freq + foilColor+maxP+minPP, data=data_csv, FUN=c(length, mean, sd))
fx_data2 looks just...abysmal.
head(fx_data2)
....
Foil Color.2530 Foil Color.2531 Foil Length.2512 Foil Length.2513 Foil
Length.2514 Foil Length.2515 Flow Speed (rpm).2544 Flow Speed (rpm).2545
Flow Speed (rpm).2546 Flow Speed (rpm).2547 Frequency (Hz).800 Frequency
(Hz).801 Frequency (Hz).802 Frequency (Hz).803 Foil Color.2532 Foil Color.2533
Foil Color.2534 Foil Color.2535 Foil Length.2516 Foil Length.2517 Foil
Length.2518 Foil Length.2519 Flow Speed (rpm).2548 Flow Speed (rpm).2549
Flow Speed (rpm).2550 Flow Speed (rpm).2551 Frequency (Hz).804 Frequency
(Hz).805 Frequency (Hz).806 Frequency (Hz).807 Foil Color.2536 Foil Color.2537
I mean. I have no idea what's going on with that. The dimensions are 24x8724. Just...what.
and fx_data3 looks like this:
> fx_data3
FxMean.length FxMean.mean FxMean.sd
1 1744 0.01379712 0.01423244
>
Ideally, the result would look like the original data set, but with each parameter combination compressed to a single line, and the values on the far right would be the mean, sd, and se of FxMean, FxStDev, etc. across the four replicates.
I've been struggling with this for a few days. I'd greatly appreciate some help.
Thank you,
Zane
url <- "https://www.dropbox.com/sh/vhf39uz4pol7sgl/AAAJ9Fr6OTEIgb_ZeSno-X5ea?dl=1"
download.file(url, destfile = "from-SO-via-dropbox")
unzip("from-SO-via-dropbox")
df <- readr::read_csv("Data_subset.csv")
library(dplyr)
df %>%
  group_by(`Frequency (Hz)`, `Foil Color`, StepTime, Maxpress, `Minpress Percentage`) %>%
  summarise_at(vars(FxMean), funs(N = length, mean, sd, se = sd(.) / sqrt(N)))
# # A tibble: 13 x 9
# # Groups: Frequency (Hz), Foil Color, StepTime, Maxpress [?]
# `Frequency (Hz)` `Foil Color` StepTime Maxpress `Minpress Percentage` N mean sd se
# <dbl> <chr> <int> <int> <dbl> <int> <dbl> <dbl> <dbl>
# 1 0.25 Black 250 50 0.00 4 0.014009248 0.0008206156 0.0004103078
# 2 0.25 Black 250 50 0.05 4 0.012525186 0.0003658681 0.0001829340
# 3 0.25 Black 250 50 0.10 4 0.011764241 0.0004832082 0.0002416041
# 4 0.25 Black 250 50 0.25 4 0.009001812 0.0006538297 0.0003269149
# 5 0.25 Black 250 75 0.00 4 0.025395752 0.0013514463 0.0006757231
# 6 0.25 Black 250 75 0.05 4 0.020794212 0.0028703242 0.0014351621
# 7 0.25 Black 250 75 0.10 4 0.018409500 0.0037305138 0.0018652569
# 8 0.25 Black 250 75 0.25 4 0.016193536 0.0016200530 0.0008100265
# 9 0.25 Black 250 100 0.00 4 0.035485324 0.0052513208 0.0026256604
# 10 0.25 Black 250 100 0.05 4 0.050097709 0.0024123653 0.0012061827
# 11 0.25 Black 250 100 0.10 4 0.051378181 0.0049857712 0.0024928856
# 12 0.25 Black 250 100 0.25 4 0.039374874 0.0031421884 0.0015710942
# 13 0.50 Black 250 50 0.00 2 0.014778494 0.0004683882 0.0003312005
Which parameters do you want to group_by? Just insert them in the code snippet below in place of param1, param2, etc.
You could use dplyr:
library(dplyr)
data %>%
  group_by(param1, param2, param3) %>%
  summarise(mean = mean(FxMean),
            sd = sd(FxMean),
            se = sd/sqrt(n()))

R - Disaggregate coverage area data based on a ranking preference

I have 4G mobile coverage data at the Local Authority level in the UK, as a percentage of geographical area covered (for approximately 200 areas). I want to disaggregate this data so I can work with the roughly 9000 lower-level postcode sectors.
The most appropriate way for me to do this is to allocate 4G geographic coverage to the most densely populated areas first, as this best represents how mobile operators would invest in the market. The least populated areas would end up with no coverage. I'm struggling with how to do this in R, however.
I have a data frame that looks like this for the postcode sector data (I've used hypothetical data here):
Name pcd.sect pop area pop.dens rank
Cambridge 1 5546 0.6 8341 1
Cambridge 2 7153 1.1 5970 2
Cambridge 3 5621 2.3 5289 3
Cambridge 4 10403 4.3 4361 4
Cambridge 5 14796 4.2 3495 5
...
I then took the aggregate local authority data and put it on each row (adding the three right columns):
Name pcd.sect pop area pop.dens rank LA.4G LA.area LA.4G(km2)
Cambridge 1 5546 0.6 8341 1 58 140 82
Cambridge 2 7153 1.1 5970 2 58 140 82
Cambridge 3 5621 2.3 5289 3 58 140 82
Cambridge 4 10403 4.3 4361 4 58 140 82
Cambridge 5 14796 4.2 3495 5 58 140 82
...
I had to shorten the headings, so let me just explain them in more detail:
Name - Local Authority name
pcd.sector - postcode sector (so the lower level unit)
pop - the population in the postcode sector
area - surface area of the postcode sector in km2
pop.dens - is the population density of the postcode sector in persons per km2
rank - rank of the postcode sector based on population density within each local authority
LA.4G - % coverage of the local authority with 4G
LA.area - the sum of the area column for each local authority
LA.4G(km2) - the number of km2 with 4G coverage within each local authority
Taking Cambridge as a hypothetical example, there is 58% 4G coverage across the whole Local Authority. I then want to disaggregate this number to achieve 4G coverage for the individual postcode sectors.
Ideally the data would end up looking like this, with an extra column for the postcode sector coverage:
Name pcd.sect ... pcd.sector.coverage (%)
Cambridge 1 ... 100
Cambridge 2 ... 100
Cambridge 3 ... 100
Cambridge 4 ... 34
Cambridge 5 ... 0
... ... ... ...
How do I get R to allocate this 82 km2 (58% geographical coverage) to the postcode sectors in a new column, based on the area column, stopping once it hits the maximum coverage level of 82 km2?
This is how I interpret the question; correct me if this is not what you meant.
Suppose you have the following data.
dat <- data.frame(
  Name = "A", pcd.sector = 1:5,
  area = c(2, 3, 1, 5, 3),
  areaSum = 14, LA.4G = 8
)
dat
# Name pcd.sector area areaSum LA.4G
#1 A 1 2 14 8
#2 A 2 3 14 8
#3 A 3 1 14 8
#4 A 4 5 14 8
#5 A 5 3 14 8
You have five sectors, with various areas. Although the areas sum up to 14, there are only 8 covered by 4G. You want to allocate the areas from the sectors 1 through 5.
The following code does the job. I used the cumsum function to compute the cumulative sum of areas from the top-ranked sector down, capped at the 4G coverage limit with pmin. The area allocated to each sector is then recovered with the diff function, which takes the one-step difference of a vector. Sectors 1 through 3 get 100% coverage, which uses up 6 units of area, so only 2 remain. Although sector 4 has an area of 5, it can only receive 2, or 40%. That exhausts the coverage, and nothing is left for sector 5.
dat$area_allocated <- diff(c(0, pmin(cumsum(dat$area), dat$LA.4G)))
dat$area_coverage <- dat$area_allocated / dat$area * 100
dat
# Name pcd.sector area areaSum LA.4G area_allocated area_coverage
# 1 A 1 2 14 8 2 100
# 2 A 2 3 14 8 3 100
# 3 A 3 1 14 8 1 100
# 4 A 4 5 14 8 2 40
# 5 A 5 3 14 8 0 0
If you have a lot of areas, then you may want to use the dplyr::group_by function.
dat <- rbind(
  data.frame(
    Name = "A", pcd.sector = 1:5,
    area = c(2, 3, 1, 5, 3),
    areaSum = 14, LA.4G = 8
  ),
  data.frame(
    Name = "B", pcd.sector = 1:3,
    area = c(4, 3, 2),
    areaSum = 9, LA.4G = 5
  )
)
library(dplyr)
dat <- dat %>% group_by(Name) %>%
  mutate(area_allocated = diff(c(0, pmin(cumsum(area), LA.4G)))) %>%
  mutate(area_coverage = area_allocated / area * 100)
dat
# Name pcd.sector area areaSum LA.4G area_allocated area_coverage
# <fctr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A 1 2 14 8 2 100.00000
# 2 A 2 3 14 8 3 100.00000
# 3 A 3 1 14 8 1 100.00000
# 4 A 4 5 14 8 2 40.00000
# 5 A 5 3 14 8 0 0.00000
# 6 B 1 4 9 5 4 100.00000
# 7 B 2 3 9 5 1 33.33333
# 8 B 3 2 9 5 0 0.00000
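A quick sanity check on this output (my addition, using the dat built above): the allocated areas within each Name should sum to exactly that authority's 4G limit, and never exceed it:
dat %>%
  group_by(Name) %>%
  summarise(total_allocated = sum(area_allocated))   ## 8 for A, 5 for B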

changing the color of a subgraph in igraph plot

I have the following code to plot the minimum spanning tree of a graph
## g is an igraph graph
mst = minimum.spanning.tree(g)
E(g)$color <- "SkyBlue2"
## how to I make mst a different color
E(g)[E(mst)]$color = "red" ### <---- I WANT TO DO ESSENTIALLY THIS
plot(g, edge.label=E(g)$weight)
That is, for a simple graph, I find the mst. I want to change the mst to red and plot the mst as part of the main graph. To do this, I want to select the edges of g that are also in mst. How do I do this?
UPDATE:
More generally, I have a graph g0 which is the mst of a graph g with n vertices. It was constructed as follows:
## implementing the Dijkstra-Prim algorithm
v0 = sample(1:n, 1)
g0 = graph.empty(n=n, directed=FALSE)
weight.g0 = 0
while(length(setdiff(1:n, v0)) > 0) {
  ## choose the shortest edge in the cut set of g;
  ## to find the cut, figure out the set of edges where one vertex is
  ## in v0 and the other is not
  cutset = E(g)[ v0 %->% setdiff(1:n, v0)]
  ## find the lightest weight edge
  cutweights = E(g)$weight[cutset]
  lightest_edge_idx = which(cutweights == min(cutweights))[1]
  weight.g0 = weight.g0 + min(cutweights)
  ## get the vertices of the lightest weight edge, add to path
  lightest_edge = cutset[as.numeric(cutset)[lightest_edge_idx]]
  vertices = get.edges(g, as.numeric(lightest_edge))
  g0 <- add.edges(g0, vertices, weight=min(cutweights))
  ## now that we have the vertices, add the one that is not in the
  ## graph already
  for(vtx in vertices) {
    if(!(vtx %in% v0)) {
      v0 = c(vtx, v0)
    }
  }
}
I know I am probably not using a lot of igraph's useful features, but I do get g0 to be an mst at the end of this loop. Given this, I have:
E(g0)
Edge sequence:
[1] 8 -- 1
[2] 2 -- 1
[3] 9 -- 8
[4] 9 -- 5
[5] 3 -- 2
[6] 4 -- 3
[7] 7 -- 3
[8] 11 -- 4
[9] 7 -- 6
[10] 11 -- 10
> E(g)
Edge sequence:
[1] 2 -- 1
[2] 5 -- 1
[3] 8 -- 1
[4] 3 -- 2
[5] 5 -- 2
[6] 6 -- 2
[7] 4 -- 3
[8] 6 -- 3
[9] 7 -- 3
[10] 7 -- 4
[11] 11 -- 4
[12] 6 -- 5
[13] 8 -- 5
[14] 9 -- 5
[15] 7 -- 6
[16] 9 -- 6
[17] 10 -- 6
[18] 10 -- 7
[19] 11 -- 7
[20] 9 -- 8
[21] 10 -- 9
[22] 11 -- 10
My question was, how do I assign an attribute to the edges in E(g) that are also in E(g0)?
This is actually quite easy because minimum.spanning.tree() keeps edge attributes. So you just need to assign an edge id attribute, and you'll see which edges to color red. It goes like this:
# Some test data, no edge weights, quite boring
g <- erdos.renyi.game(20,2/20)
g
# IGRAPH U--- 20 24 -- Erdos renyi (gnp) graph
# + attr: name (g/c), type (g/c), loops (g/l), p (g/n)
E(g)$id <- seq_len(ecount(g))
mst <- minimum.spanning.tree(g)
mst
# IGRAPH U--- 20 18 -- Erdos renyi (gnp) graph
# + attr: name (g/c), type (g/c), loops (g/l), p (g/n), id (e/n)
E(mst)$id
# [1] 1 2 3 6 7 8 9 10 11 12 13 16 18 19 20 22 23 24
E(g)$color <- "black"
E(g)$color[E(mst)$id] <- "red"
plot(g)
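An alternative sketch (my addition, not from the original answer) that skips the id attribute: look up the mst's edges in g directly with get.edge.ids(), which takes a flat vector of vertex pairs:
## flatten the mst edge list into (from1, to1, from2, to2, ...)
pairs <- as.vector(t(get.edgelist(mst)))
E(g)$color <- "black"
E(g)$color[get.edge.ids(g, pairs)] <- "red"
plot(g)
This relies on g and mst sharing the same vertex ids, which holds here because mst was derived from g.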
