Dataframe Aggregation By Group - Separating a Column's Values by Ranges

I have a dataframe as follows:
parent<- c('a', 'b', 'c', 'd',
'e', 'f', 'g', 'h',
'i', 'j', 'k', 'l',
'm', 'n', 'o', 'p',
'q', 'r', 's', 't',
'u', 'v', 'w', 'x',
'y', 'z')
child<- c('A', 'B', 'C', 'D',
'E', 'F', 'G', 'H',
'I', 'J', 'K', 'L',
'M', 'N', 'O', 'P',
'Q', 'R', 'S', 'T',
'U', 'V', 'W', 'X',
'Y', 'Z')
Type<- c('desktop', 'desktop', 'desktop', 'desktop',
'desktop', 'desktop', 'desktop', 'desktop',
'desktop', 'desktop', 'desktop', 'desktop',
'desktop', 'desktop', 'desktop', 'desktop',
'desktop', 'desktop', 'desktop', 'desktop',
'desktop', 'desktop', 'desktop', 'desktop',
'desktop', 'desktop')
Size<- c('MEDIUM', 'MEDIUM', 'LARGE', 'LARGE',
'SMALL', 'MEDIUM', 'LARGE', 'SMALL',
'MEDIUM', 'SMALL', 'LARGE', 'LARGE',
'SMALL', 'SMALL', 'LARGE', 'LARGE',
'MEDIUM', 'SMALL', 'SMALL', 'MEDIUM',
'LARGE', 'MEDIUM', 'SMALL', 'MEDIUM',
'LARGE', 'MEDIUM')
Revenue<- c(22138.16, 18617.94, 12394.36, 10535.76,
8901.41, 7320.17, 3821.40, 2811.50,
2483.10, 2145.76, 2138.41, 2037.67,
1950.52, 1837.93, 1737.68, 1554.61,
1374.40, 1334.02, 1214.60, 1191.41,
1189.56, 1174.55, 1162.80, 1131.29,
1127.05, 1108.53)
NumberofSales<- c(1954720, 5129937, 1086104, 970326,
1608012, 746613, 333424, 236643,
352294, 587541, 209218, 342455,
192670, 340580, 275260, 248049,
251790, 128845, 303515, 112218,
149878, 226633, 194973, 103425,
101819, 114570)
Price<- c(11.325489, 3.629273, 11.411762, 10.857959,
5.535661, 9.804504, 11.461083, 11.880766,
7.048374, 3.652103, 10.220966, 5.950183,
10.123631, 5.396471, 6.312868, 6.267350,
5.458517, 10.353681, 4.001779, 10.616924,
7.936855, 5.182608, 5.963908, 10.938264,
11.069152, 9.675570)
Opps<- c(5144351, 6038044, 2354341, 4578272,
7197544, 474510, 1045528, 181471,
1071631, 801038, 928563, 477870,
590497, 849537, 410179, 432703,
1983993, 330478, 939806, 191824,
283107, 575004, 256846, 249530,
142318, 2036363)
df<-data.frame(parent, child, Type, Size,
Revenue, NumberofSales, Price, Opps)
This is what it looks like:
df
parent child Type Size Revenue NumberofSales Price Opps
1 a A desktop MEDIUM 22138.16 1954720 11.325489 5144351
2 b B desktop MEDIUM 18617.94 5129937 3.629273 6038044
3 c C desktop LARGE 12394.36 1086104 11.411762 2354341
4 d D desktop LARGE 10535.76 970326 10.857959 4578272
5 e E desktop SMALL 8901.41 1608012 5.535661 7197544
6 f F desktop MEDIUM 7320.17 746613 9.804504 474510
7 g G desktop LARGE 3821.40 333424 11.461083 1045528
8 h H desktop SMALL 2811.50 236643 11.880766 181471
9 i I desktop MEDIUM 2483.10 352294 7.048374 1071631
10 j J desktop SMALL 2145.76 587541 3.652103 801038
11 k K desktop LARGE 2138.41 209218 10.220966 928563
12 l L desktop LARGE 2037.67 342455 5.950183 477870
13 m M desktop SMALL 1950.52 192670 10.123631 590497
14 n N desktop SMALL 1837.93 340580 5.396471 849537
15 o O desktop LARGE 1737.68 275260 6.312868 410179
16 p P desktop LARGE 1554.61 248049 6.267350 432703
17 q Q desktop MEDIUM 1374.40 251790 5.458517 1983993
18 r R desktop SMALL 1334.02 128845 10.353681 330478
19 s S desktop SMALL 1214.60 303515 4.001779 939806
20 t T desktop MEDIUM 1191.41 112218 10.616924 191824
21 u U desktop LARGE 1189.56 149878 7.936855 283107
22 v V desktop MEDIUM 1174.55 226633 5.182608 575004
23 w W desktop SMALL 1162.80 194973 5.963908 256846
24 x X desktop MEDIUM 1131.29 103425 10.938264 249530
25 y Y desktop LARGE 1127.05 101819 11.069152 142318
26 z Z desktop MEDIUM 1108.53 114570 9.675570 2036363
I want to create a dataframe that shows the distribution of Price by Size and Type, with all of the appropriate metrics for these Price ranges. I want the final dataframe to look like this. (I haven't done the aggregation for the metric values because it takes far too long the way I'm currently doing it; that's why they are all the same right now, but the final answer should have different values.)
Type Size Price Range SUM_Opps SUM_NumberofSales SUM_Revenue
1 desktop LARGE $3-$3.99 9,143,587 2,531,983 $8,453.93
1 desktop LARGE $4-$4.99 9,143,587 2,531,983 $8,453.93
1 desktop LARGE $5-$5.99 9,143,587 2,531,983 $8,453.93
1 desktop LARGE $6-$6.99 9,143,587 2,531,983 $8,453.93
1 desktop LARGE $7-$7.99 9,143,587 2,531,983 $8,453.93
1 desktop LARGE $8-$8.99 9,143,587 2,531,983 $8,453.93
1 desktop LARGE $9-$9.99 9,143,587 2,531,983 $8,453.93
1 desktop LARGE $10-$10.99 9,143,587 2,531,983 $8,453.93
1 desktop LARGE $11-$11.99 9,143,587 2,531,983 $8,453.93
1 desktop LARGE $12-$12.99 9,143,587 2,531,983 $8,453.93
1 desktop LARGE $13-Greater 9,143,587 2,531,983 $8,453.93
1 desktop MEDIUM $3-$3.99 9,143,587 2,531,983 $8,453.93
1 desktop MEDIUM $4-$4.99 9,143,587 2,531,983 $8,453.93
1 desktop MEDIUM $5-$5.99 9,143,587 2,531,983 $8,453.93
1 desktop MEDIUM $6-$6.99 9,143,587 2,531,983 $8,453.93
1 desktop MEDIUM $7-$7.99 9,143,587 2,531,983 $8,453.93
1 desktop MEDIUM $8-$8.99 9,143,587 2,531,983 $8,453.93
1 desktop MEDIUM $9-$9.99 9,143,587 2,531,983 $8,453.93
1 desktop MEDIUM $10-$10.99 9,143,587 2,531,983 $8,453.93
1 desktop MEDIUM $11-$11.99 9,143,587 2,531,983 $8,453.93
1 desktop MEDIUM $12-$12.99 9,143,587 2,531,983 $8,453.93
1 desktop MEDIUM $13-Greater 9,143,587 2,531,983 $8,453.93
1 desktop SMALL $3-$3.99 9,143,587 2,531,983 $8,453.93
1 desktop SMALL $4-$4.99 9,143,587 2,531,983 $8,453.93
1 desktop SMALL $5-$5.99 9,143,587 2,531,983 $8,453.93
1 desktop SMALL $6-$6.99 9,143,587 2,531,983 $8,453.93
1 desktop SMALL $7-$7.99 9,143,587 2,531,983 $8,453.93
1 desktop SMALL $8-$8.99 9,143,587 2,531,983 $8,453.93
1 desktop SMALL $9-$9.99 9,143,587 2,531,983 $8,453.93
1 desktop SMALL $10-$10.99 9,143,587 2,531,983 $8,453.93
1 desktop SMALL $11-$11.99 9,143,587 2,531,983 $8,453.93
1 desktop SMALL $12-$12.99 9,143,587 2,531,983 $8,453.93
1 desktop SMALL $13-Greater 9,143,587 2,531,983 $8,453.93
How do I create the table above? It shows the sums of Opps, NumberofSales, and Revenue by Type, Size, and Price Range.
I understand how to use dplyr for the simple aggregation, but the tough part is computing the distribution of prices.
Any help would be great, thanks!

You could use Hmisc::cut2() to generate your price bins as levels of a factor:
library(Hmisc)
library(dplyr)
df$cut_Price <- cut2(df$Price, cuts = 4:13)
df %>%
  group_by(cut_Price, Size, Type) %>%
  summarise_at(c("Opps", "NumberofSales", "Revenue"), "sum") %>%
  arrange(Size, cut_Price) %>%
  ungroup() %>%
  mutate(cut_Price = gsub("(.*, \\d\\.)00", "\\199", cut_Price))
# A tibble: 16 × 6
cut_Price Size Type Opps NumberofSales Revenue
<chr> <fctr> <fctr> <dbl> <dbl> <dbl>
1 [ 5.00, 6.99) LARGE desktop 477870 342455 2037.67
2 [ 6.00, 7.99) LARGE desktop 842882 523309 3292.29
3 [ 7.00, 8.99) LARGE desktop 283107 149878 1189.56
4 [10.00,11.00) LARGE desktop 5506835 1179544 12674.17
5 [11.00,12.00) LARGE desktop 3542187 1521347 17342.81
6 [ 3.63, 4.99) MEDIUM desktop 6038044 5129937 18617.94
7 [ 5.00, 6.99) MEDIUM desktop 2558997 478423 2548.95
8 [ 7.00, 8.99) MEDIUM desktop 1071631 352294 2483.10
9 [ 9.00,10.00) MEDIUM desktop 2510873 861183 8428.70
10 [10.00,11.00) MEDIUM desktop 441354 215643 2322.70
11 [11.00,12.00) MEDIUM desktop 5144351 1954720 22138.16
12 [ 3.63, 4.99) SMALL desktop 801038 587541 2145.76
13 [ 4.00, 5.99) SMALL desktop 939806 303515 1214.60
14 [ 5.00, 6.99) SMALL desktop 8303927 2143565 11902.14
15 [10.00,11.00) SMALL desktop 920975 321515 3284.54
16 [11.00,12.00) SMALL desktop 181471 236643 2811.50
If you want cuts every 0.5 instead of every 1, you can adjust the vector passed to cuts = ..., since it defines the cut points:
df$cut_Price <- cut2(df$Price, cuts = seq(4,13,.5))

This will add the price bins:
library(dplyr)
df %>%
  mutate(price_bin = ifelse(Price > 13, 13, floor(Price))) %>%
  group_by(Type, Size, price_bin) %>%
  summarise(sum_opps = sum(Opps),
            sum_sales = sum(NumberofSales),
            sum_revenue = sum(Revenue))
Update
Not sure why there are downvotes when this returns the same results as the accepted answer without requiring an additional library.
Type Size price_bin sum_opps sum_sales sum_revenue
<fctr> <fctr> <dbl> <dbl> <dbl> <dbl>
1 desktop LARGE 5 477870 342455 2037.67
2 desktop LARGE 6 842882 523309 3292.29
3 desktop LARGE 7 283107 149878 1189.56
4 desktop LARGE 10 5506835 1179544 12674.17
5 desktop LARGE 11 3542187 1521347 17342.81
6 desktop MEDIUM 3 6038044 5129937 18617.94
7 desktop MEDIUM 5 2558997 478423 2548.95
8 desktop MEDIUM 7 1071631 352294 2483.10
9 desktop MEDIUM 9 2510873 861183 8428.70
10 desktop MEDIUM 10 441354 215643 2322.70
11 desktop MEDIUM 11 5144351 1954720 22138.16
12 desktop SMALL 3 801038 587541 2145.76
13 desktop SMALL 4 939806 303515 1214.60
14 desktop SMALL 5 8303927 2143565 11902.14
15 desktop SMALL 10 920975 321515 3284.54
16 desktop SMALL 11 181471 236643 2811.50
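If you also want labels like those in the desired output ($3-$3.99, ..., $13-Greater), a small sketch of a formatting helper could be applied to the price_bin column from the answer above. label_bin is a hypothetical helper of my own, not part of either answer:

```r
# Hypothetical helper: turn a numeric bin floor (e.g. from floor(Price))
# into a "$3-$3.99" style label, collapsing $13 and above into "$13-Greater".
label_bin <- function(bin) {
  ifelse(bin >= 13, "$13-Greater",
         sprintf("$%d-$%d.99", as.integer(bin), as.integer(bin)))
}

label_bin(c(3, 10, 13))
# "$3-$3.99"  "$10-$10.99"  "$13-Greater"
```

You could then add the column with mutate(`Price Range` = label_bin(price_bin)).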

Related

Best plot to show percentage of quantity sold of different products for each artist in R

I have this dataframe:
artist product quantity_sold freq
<chr> <chr> <dbl> <dbl>
1 Bad Waitress bad waitress x-large 1 11.1
2 Bad Waitress cheaper reaper do 1 11.1
3 Bad Waitress cheaper reaper m 1 11.1
4 Bad Waitress cheaper reaper tank top 1 11.1
5 Bad Waitress short sleeve black 5 55.6
6 Black Pumas capitol cuts 2 50
7 Black Pumas poster 2 50
8 CMAT cmat socks 51 12.2
9 CMAT if my wife new i'd be dead poster 95 22.8
10 CMAT imwn bolo tie 37 8.87
I would like a plot where I select an artist and it shows each product they sold with its percentage (I would prefer an interactive plot or a table).
I have tried a grouped bar chart with plotly, but it looks very bad. I tried to add a button so that I would not have one plot with all artists, but I cannot make it work.
Do you have any suggestions?
Thank you
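No answer is shown here, but as a sketch of the table route: recompute the percentage within each artist and filter to one artist at a time (base R only; pct_for and the small example frame are my own, and an interactive version could wrap this in shiny or a plotly dropdown):

```r
# Small example frame mimicking the question's columns
df <- data.frame(
  artist = c("Bad Waitress", "Bad Waitress", "Black Pumas", "Black Pumas"),
  product = c("short sleeve black", "cheaper reaper m", "capitol cuts", "poster"),
  quantity_sold = c(5, 1, 2, 2)
)

# Hypothetical helper: per-artist percentage table, sorted by share
pct_for <- function(d, who) {
  sub <- d[d$artist == who, ]
  sub$pct <- round(100 * sub$quantity_sold / sum(sub$quantity_sold), 1)
  sub[order(-sub$pct), ]
}

pct_for(df, "Bad Waitress")
```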

How to create a spouse variable?

I have data on couples, with the variables 'household_number', 'head_of_household', 'education', and 'income'. 'household_number' is the id number uniquely assigned to each household. 'head_of_household' indicates whether the person is the head of the household (1 = head of household, 2 = spouse of the head of the household). 'education' and 'income' are the education level and income of the individual, respectively. For example, the data looks like below.
'household_number' 'head_of_household' 'education' 'income'
1 1 high 1000
1 2 low 100
3 1 medium 500
3 2 high 800
4 2 high 800
4 1 high 800
9 1 low 150
9 2 low 200
I want to create spouse variables for each individual, so that the data looks like below, where 'spouse_edu' is the spouse's education level and 'spouse_inc' is the spouse's income.
'household_number' 'head_of_household' 'education' 'income' 'spouse_edu' 'spouse_inc'
1 1 high 1000 low 100
1 2 low 100 high 1000
3 1 medium 500 high 800
3 2 high 800 medium 500
4 2 high 800 high 800
4 1 high 800 high 800
9 1 low 150 low 200
9 2 low 200 low 150
I have a very large dataset, so I am looking for a simple way to do this. Is there an elegant way?
Below is reproducible example syntax.
household_number <- c(1,1,3,3,4,4,9,9)
head_of_household <- c(1,2,1,2,2,1,1,2)
education <- c("high", "low", "medium", "high", "high", "high", "low", "low")
income <- c(1000, 100, 500, 800, 800, 800, 150, 200)
data <- data.frame(household_number, head_of_household, education, income)
You can use base::rev and dplyr here.
library(dplyr)
data %>%
  group_by(household_number) %>%
  mutate(spouse_income = rev(income),
         spouse_education = rev(education)) %>%
  ungroup()
# A tibble: 8 x 6
# household_number head_of_household education income spouse_income spouse_education
# <dbl> <dbl> <fctr> <dbl> <dbl> <fctr>
#1 1 1 high 1000 100 low
#2 1 2 low 100 1000 high
#3 3 1 medium 500 800 high
#4 3 2 high 800 500 medium
#5 4 2 high 800 800 high
#6 4 1 high 800 800 high
#7 9 1 low 150 200 low
#8 9 2 low 200 150 low
A solution using data.table.
library(data.table)
setDT(data)[, c("spouse_income", "spouse_education") := .(rev(income), rev(education)),
by = household_number][]
# same as
# setDT(data)[, `:=`(spouse_income = rev(income),
# spouse_education = rev(education)),
# by = household_number][]
In base R one could do
transform(data,
spouse_income = ave(income, household_number, FUN = rev),
spouse_education = ave(education, household_number, FUN = rev))
Another way to solve this is with shift in data.table. It is a two-step process, though.
First, group by household_number and fill the spouse details of the first row in each household using shift with a lag:
data[,':='(
spouse_edu = shift(education),
spouse_inc = shift(income)),
by = household_number]
> data
household_number head_of_household education income spouse_edu spouse_inc
1: 1 1 high 1000 NA NA
2: 1 2 low 100 high 1000
3: 3 1 medium 500 NA NA
4: 3 2 high 800 medium 500
5: 4 2 high 800 NA NA
6: 4 1 high 800 high 800
7: 9 1 low 150 NA NA
8: 9 2 low 200 low 150
Now fill the spouse details for the remaining rows using a lead-type shift, making sure we don't overwrite spouse details that were already filled in:
data[,':='(
spouse_edu = ifelse( is.na(spouse_edu), shift(education, type="lead"), spouse_edu) ,
spouse_inc = ifelse( is.na(spouse_inc), shift(income, type="lead"), spouse_inc)),
by = household_number]
> data
household_number head_of_household education income spouse_edu spouse_inc
1: 1 1 high 1000 low 100
2: 1 2 low 100 high 1000
3: 3 1 medium 500 high 800
4: 3 2 high 800 medium 500
5: 4 2 high 800 high 800
6: 4 1 high 800 high 800
7: 9 1 low 150 low 200
8: 9 2 low 200 low 150
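Note that all of the rev-based answers assume exactly two rows per household. If some households might have only one row, a sketch that matches explicitly on head_of_household (looking up the row whose code is 3 - head_of_household, so 1 and 2 swap) is safer; unmatched people simply get NA:

```r
household_number <- c(1, 1, 3, 3, 4, 4, 9, 9)
head_of_household <- c(1, 2, 1, 2, 2, 1, 1, 2)
education <- c("high", "low", "medium", "high", "high", "high", "low", "low")
income <- c(1000, 100, 500, 800, 800, 800, 150, 200)
data <- data.frame(household_number, head_of_household, education, income)

# Match each person to the row with the same household number but the
# opposite head_of_household code (1 <-> 2); no-match yields NA.
idx <- match(paste(data$household_number, 3 - data$head_of_household),
             paste(data$household_number, data$head_of_household))
data$spouse_education <- data$education[idx]
data$spouse_income <- data$income[idx]
```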

Modify Breaks in cut2 function in Hmisc package

This is a follow-up to this question:
Dataframe Aggregation By Group - Separating a Column's Values by Ranges
The answer provided uses Hmisc::cut2 which works great! I want to modify the breaks so that instead of breaking by $1 it breaks by $0.50.
Below is the code provided for the answer:
library(Hmisc)
library(dplyr)
df$cut_Price <- cut2(df$Price, cuts = 4:13)
df %>%
  group_by(cut_Price, Size, Type) %>%
  summarise_at(c("Opps", "NumberofSales", "Revenue"), "sum") %>%
  arrange(Size, cut_Price) %>%
  ungroup() %>%
  mutate(cut_Price = gsub("(.*, \\d\\.)00", "\\199", cut_Price))
# A tibble: 16 × 6
cut_Price Size Type Opps NumberofSales Revenue
<chr> <fctr> <fctr> <dbl> <dbl> <dbl>
1 [ 5.00, 6.99) LARGE desktop 477870 342455 2037.67
2 [ 6.00, 7.99) LARGE desktop 842882 523309 3292.29
3 [ 7.00, 8.99) LARGE desktop 283107 149878 1189.56
4 [10.00,11.00) LARGE desktop 5506835 1179544 12674.17
5 [11.00,12.00) LARGE desktop 3542187 1521347 17342.81
6 [ 3.63, 4.99) MEDIUM desktop 6038044 5129937 18617.94
7 [ 5.00, 6.99) MEDIUM desktop 2558997 478423 2548.95
8 [ 7.00, 8.99) MEDIUM desktop 1071631 352294 2483.10
9 [ 9.00,10.00) MEDIUM desktop 2510873 861183 8428.70
10 [10.00,11.00) MEDIUM desktop 441354 215643 2322.70
11 [11.00,12.00) MEDIUM desktop 5144351 1954720 22138.16
12 [ 3.63, 4.99) SMALL desktop 801038 587541 2145.76
13 [ 4.00, 5.99) SMALL desktop 939806 303515 1214.60
14 [ 5.00, 6.99) SMALL desktop 8303927 2143565 11902.14
15 [10.00,11.00) SMALL desktop 920975 321515 3284.54
16 [11.00,12.00) SMALL desktop 181471 236643 2811.50
Any help would be great, thanks!
You need to pass cut2 the vector of breaks you want, which you can create with seq:
library(tidyverse)
df %>%
  group_by(Size,
           cut_Price = Hmisc::cut2(Price, cuts = seq(4, 13, .5)),
           Type) %>%
  summarise_at(c("Opps", "NumberofSales", "Revenue"), sum)
## Source: local data frame [18 x 6]
## Groups: Size, cut_Price [?]
##
## Size cut_Price Type Opps NumberofSales Revenue
## <fctr> <fctr> <fctr> <dbl> <dbl> <dbl>
## 1 LARGE [ 5.50, 6.00) desktop 477870 342455 2037.67
## 2 LARGE [ 6.00, 6.50) desktop 842882 523309 3292.29
## 3 LARGE [ 7.50, 8.00) desktop 283107 149878 1189.56
## 4 LARGE [10.00,10.50) desktop 928563 209218 2138.41
## 5 LARGE [10.50,11.00) desktop 4578272 970326 10535.76
## 6 LARGE [11.00,11.50) desktop 3542187 1521347 17342.81
## 7 MEDIUM [ 3.63, 4.00) desktop 6038044 5129937 18617.94
## 8 MEDIUM [ 5.00, 5.50) desktop 2558997 478423 2548.95
## 9 MEDIUM [ 7.00, 7.50) desktop 1071631 352294 2483.10
## 10 MEDIUM [ 9.50,10.00) desktop 2510873 861183 8428.70
## 11 MEDIUM [10.50,11.00) desktop 441354 215643 2322.70
## 12 MEDIUM [11.00,11.50) desktop 5144351 1954720 22138.16
## 13 SMALL [ 3.63, 4.00) desktop 801038 587541 2145.76
## 14 SMALL [ 4.00, 4.50) desktop 939806 303515 1214.60
## 15 SMALL [ 5.00, 5.50) desktop 849537 340580 1837.93
## 16 SMALL [ 5.50, 6.00) desktop 7454390 1802985 10064.21
## 17 SMALL [10.00,10.50) desktop 920975 321515 3284.54
## 18 SMALL [11.50,12.00) desktop 181471 236643 2811.50
If you want rows for every value, you can use tidyr::complete. Empty values will be NA unless you specify otherwise in complete's fill parameter.
df %>%
  group_by(Size,
           cut_Price = Hmisc::cut2(Price, cuts = seq(4, 13, .5), oneval = FALSE),
           Type) %>%
  summarise_at(c("Opps", "NumberofSales", "Revenue"), sum) %>%
  ungroup() %>%
  complete(Size, cut_Price, Type)
## # A tibble: 57 × 6
## Size cut_Price Type Opps NumberofSales Revenue
## <fctr> <fctr> <fctr> <dbl> <dbl> <dbl>
## 1 LARGE [ 3.63, 4.00) desktop NA NA NA
## 2 LARGE [ 4.00, 4.50) desktop NA NA NA
## 3 LARGE [ 4.50, 5.00) desktop NA NA NA
## 4 LARGE [ 5.00, 5.50) desktop NA NA NA
## 5 LARGE [ 5.50, 6.00) desktop 477870 342455 2037.67
## 6 LARGE [ 6.00, 6.50) desktop 842882 523309 3292.29
## 7 LARGE [ 6.50, 7.00) desktop NA NA NA
## 8 LARGE [ 7.00, 7.50) desktop NA NA NA
## 9 LARGE [ 7.50, 8.00) desktop 283107 149878 1189.56
## 10 LARGE [ 8.00, 8.50) desktop NA NA NA
## # ... with 47 more rows

R - Disaggregate coverage area data based on a ranking preference

I have 4G mobile coverage data at the Local Authority level in the UK, as a percentage of geographical area covered (for approximately 200 areas). I want to disaggregate this data so I can work with roughly 9000 lower-level postcode sectors.
The most appropriate way for me to do this is to allocate 4G geographic coverage to the most densely populated areas first, as this would best represent how mobile operators would invest in the market. The least populated areas would end up with no coverage. I'm struggling with how to do this in R, however.
I have a data frame that looks like this for the postcode sector data (I've used hypothetical data here):
Name pcd.sect pop area pop.dens rank
Cambridge 1 5546 0.6 8341 1
Cambridge 2 7153 1.1 5970 2
Cambridge 3 5621 2.3 5289 3
Cambridge 4 10403 4.3 4361 4
Cambridge 5 14796 4.2 3495 5
...
I then took the aggregate local authority data and put it on each row (adding the three right columns):
Name pcd.sect pop area pop.dens rank LA.4G LA.area LA.4G(km2)
Cambridge 1 5546 0.6 8341 1 58 140 82
Cambridge 2 7153 1.1 5970 2 58 140 82
Cambridge 3 5621 2.3 5289 3 58 140 82
Cambridge 4 10403 4.3 4361 4 58 140 82
Cambridge 5 14796 4.2 3495 5 58 140 82
...
I had to shorten the headings, so let me just explain them in more detail:
Name - Local Authority name
pcd.sector - postcode sector (so the lower level unit)
pop - the population in the postcode sector
area - surface area of the postcode sector in km2
pop.dens - is the population density of the postcode sector in persons per km2
rank - rank of the postcode sector based on population density within each local authority
LA.4G - % coverage of the local authority with 4G
LA.area - the sum of the area column for each local authority
LA.4G(km2) - the number of km2 with 4G coverage within each local authority
Taking Cambridge as a hypothetical example, there is 58% 4G coverage across the whole Local Authority. I then want to disaggregate this number to achieve 4G coverage for the individual postcode sectors.
Ideally the data would end up looking like this, with an extra column for the postcode sector coverage:
Name pcd.sect ... pcd.sector.coverage (%)
Cambridge 1 ... 100
Cambridge 2 ... 100
Cambridge 3 ... 100
Cambridge 4 ... 34
Cambridge 5 ... 0
... ... ... ...
How do I get R to allocate this 82km2 (58% geographical coverage) out to the postcode sectors in a new column, based on the area column, but then stopping once it hits the maximum coverage level of 82km2 (58% geographical coverage)?
This is how I interpret the question; correct me if this is not what you meant.
Suppose you have the following data.
dat <- data.frame(
Name = "A", pcd.sector = 1:5,
area = c(2, 3, 1, 5, 3),
areaSum = 14, LA.4G = 8
)
dat
# Name pcd.sector area areaSum LA.4G
#1 A 1 2 14 8
#2 A 2 3 14 8
#3 A 3 1 14 8
#4 A 4 5 14 8
#5 A 5 3 14 8
You have five sectors with various areas. Although the areas sum to 14, only 8 are covered by 4G. You want to allocate the coverage to sectors 1 through 5, in order.
The following code does the job. I used the cumsum function to compute the cumulative sum of areas from the top sector, capped at the 4G coverage limit. The allocated area can then be computed with the diff function, which takes the one-step difference of a vector. Sectors 1 through 3 get 100% coverage, which sums to 6 units of area, so only 2 remain. Although sector 4 has an area of 5, it can only use 2 of it, or 40%. This uses up the coverage and nothing is left for sector 5.
dat$area_allocated <- diff(c(0, pmin(cumsum(dat$area), dat$LA.4G)))
dat$area_coverage <- dat$area_allocated / dat$area * 100
dat
# Name pcd.sector area areaSum LA.4G area_allocated area_coverage
# 1 A 1 2 14 8 2 100
# 2 A 2 3 14 8 3 100
# 3 A 3 1 14 8 1 100
# 4 A 4 5 14 8 2 40
# 5 A 5 3 14 8 0 0
If you have a lot of areas, you may want to use the dplyr::group_by function.
dat <- rbind(
data.frame(
Name = "A", pcd.sector = 1:5,
area = c(2, 3, 1, 5, 3),
areaSum = 14, LA.4G = 8
),
data.frame(
Name = "B", pcd.sector = 1:3,
area = c(4, 3, 2),
areaSum = 9, LA.4G = 5
)
)
library(dplyr)
dat <- dat %>%
  group_by(Name) %>%
  mutate(area_allocated = diff(c(0, pmin(cumsum(area), LA.4G)))) %>%
  mutate(area_coverage = area_allocated / area * 100)
dat
# Name pcd.sector area areaSum LA.4G area_allocated area_coverage
# <fctr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A 1 2 14 8 2 100.00000
# 2 A 2 3 14 8 3 100.00000
# 3 A 3 1 14 8 1 100.00000
# 4 A 4 5 14 8 2 40.00000
# 5 A 5 3 14 8 0 0.00000
# 6 B 1 4 9 5 4 100.00000
# 7 B 2 3 9 5 1 33.33333
# 8 B 3 2 9 5 0 0.00000

Colouring points in an ordination plot in r using a data frame

Using the following data.frame:
df<-data.frame("sites"=as.character(1:20),"type"=c(rep("small",10),rep("large",10)))
sites type
1 1 small
2 2 small
3 3 small
4 4 small
5 5 small
6 6 small
7 7 small
8 8 small
9 9 small
10 10 small
11 11 large
12 12 large
13 13 large
14 14 large
15 15 large
16 16 large
17 17 large
18 18 large
19 19 large
20 20 large
I would like to colour the text labels (i.e. 1-20) by type (i.e. "small", "large") in the following ordination plot:
library(vegan)
library(stats)
data(dune)
dist <- vegdist(wisconsin(dune))
#Ordinate data
pc<-cmdscale(dist, k=10, eig=TRUE, add=TRUE, x.ret =TRUE)
#Create ordination plot
quartz(title="PCoA on coral data")
fig<-ordiplot(scores(pc)[,c(1,2)], type="t", main="PCoA")
It looks like the text label colour is hard-coded in ordiplot, so you have to set up the plot yourself and then use text() to draw the labels by group:
score <- scores(pc)[, 1:2]
fig <- ordiplot(score, type = "n", main = "PCoA")
color <- c("red", "blue")
# factor() ensures this works whether df$type is character or factor
sz <- as.numeric(factor(df$type[as.numeric(rownames(score))]))
text(score, rownames(score), col = color[sz])