Create table with nested header from pre-summarized data in R

How do I create a nested table from a data.frame which has already been summarized? By nested I mean that the table has headers and subheaders.
My input data looks like this:
library(dplyr)
library(ggplot2)
library(reshape2)
df <- ggplot2::diamonds
df2 <- count(df, cut, color) %>%
  mutate(pct = round(n / sum(n), 2)) %>%
  reshape2::melt()
head(df2)
cut color variable value
1 Fair D n 163
2 Fair E n 224
3 Fair F n 312
4 Fair G n 314
5 Fair H n 303
6 Fair I n 175
I would like to have something like this:
Color
D E F G H I J
cut n pct n pct n pct n pct n pct n pct n pct
Fair 163 0.10 224 0.14 312 0.19 314 0.20 303 0.19 175 0.11 119 0.07
Good 662 0.13 933 0.19 909 0.19 871 0.18 702 0.14 522 0.11 307 0.06
Very Good 1513 0.13 2400 0.20 2164 0.18 2299 0.19 1824 0.15 1204 0.10 678 0.06
Premium 1603 0.12 2337 0.17 2331 0.17 2924 0.21 2360 0.17 1428 0.10 808 0.06
Ideal 2834 0.13 3903 0.18 3826 0.18 4884 0.23 3115 0.14 2093 0.10 896 0.04
Below is an example of the closest I can get. The problem with this table is that there is only one header row. I would like 3 header rows: one with the name of the grouping variable (Color), one listing the individual categories inside color, and one listing the type of summary (coming from df2$variable):
reshape2::dcast(df2, cut ~ color + variable, value.var = "value")
cut D_n D_pct E_n E_pct F_n F_pct G_n G_pct H_n H_pct I_n I_pct J_n J_pct
1 Fair 163 0.10 224 0.14 312 0.19 314 0.20 303 0.19 175 0.11 119 0.07
2 Good 662 0.13 933 0.19 909 0.19 871 0.18 702 0.14 522 0.11 307 0.06
3 Very Good 1513 0.13 2400 0.20 2164 0.18 2299 0.19 1824 0.15 1204 0.10 678 0.06
4 Premium 1603 0.12 2337 0.17 2331 0.17 2924 0.21 2360 0.17 1428 0.10 808 0.06
5 Ideal 2834 0.13 3903 0.18 3826 0.18 4884 0.23 3115 0.14 2093 0.10 896 0.04
I hope there is some function/package which can do this. I think it should be possible because the packages etable and tables, and the function ftable, can create the output I want, but not for pre-summarized data.
This link does what I need (I think), but I only have access to CRAN packages on the server I use:
https://www.r-statistics.com/2012/01/printing-nested-tables-in-r-bridging-between-the-reshape-and-tables-packages/

Solution based on comments. Thanks!
# data
library(tidyr)
library(dplyr)
library(ggplot2)
library(reshape2)
df <- ggplot2::diamonds
df2 <- count(df, cut, color) %>%
  mutate(pct = round(n / sum(n), 2)) %>%
  reshape2::melt()
head(df2)
# Solution
library(tables)
df2_spread <- spread(data = df2, key = variable, value = value)
tabular(Heading() * cut ~ color * (n + pct) * Heading() * identity, data = df2_spread)
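For reference, the spread step can also be written with the newer tidyr API (a hedged equivalent, assuming tidyr >= 1.0.0, where pivot_wider() supersedes spread()):
df2_spread <- tidyr::pivot_wider(df2, names_from = variable, values_from = value)
In the tabular() call, Heading() with no argument suppresses the header level that the following term would otherwise print, and identity is used as the "summary" function because the values are already summarized.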

How to convert a list into a data.frame in R?

I've created a frequency table in R with the fdth package using this code
fdt(x, breaks = "Sturges")
The specific result was:
Class limits f rf rf(%) cf cf(%)
[-15.907,-11.817) 12 0.00 0.10 12 0.10
[-11.817,-7.7265) 8 0.00 0.07 20 0.16
[-7.7265,-3.636) 6 0.00 0.05 26 0.21
[-3.636,0.4545) 70 0.01 0.58 96 0.79
[0.4545,4.545) 58 0.00 0.48 154 1.27
[4.545,8.6355) 91 0.01 0.75 245 2.01
[8.6355,12.726) 311 0.03 2.55 556 4.57
[12.726,16.817) 648 0.05 5.32 1204 9.89
[16.817,20.907) 857 0.07 7.04 2061 16.93
[20.907,24.998) 1136 0.09 9.33 3197 26.26
[24.998,29.088) 1295 0.11 10.64 4492 36.90
[29.088,33.179) 1661 0.14 13.64 6153 50.55
[33.179,37.269) 2146 0.18 17.63 8299 68.18
[37.269,41.36) 2525 0.21 20.74 10824 88.92
[41.36,45.45) 1349 0.11 11.08 12173 100.00
It was given as a list:
> class(x)
[1] "fdt.multiple" "fdt" "list"
I need to convert it into a data frame object, so I can have a table. How can I do it?
I'm a beginner at using R :(
Since you did not provide a reproducible example of your data, I have used the example from the help page of ?fdt, which is close to what you have.
library(fdth)
mdf <- data.frame(c1 = sample(LETTERS[1:3], 1e2, TRUE),
                  c2 = as.factor(sample(1:10, 1e2, TRUE)),
                  n1 = c(NA, NA, rnorm(96, 10, 1), NA, NA),
                  n2 = rnorm(100, 60, 4),
                  n3 = rnorm(100, 50, 4),
                  stringsAsFactors = TRUE)
fdt <- fdt(mdf, breaks = 'FD', by = 'c1')
class(fdt)
#[1] "fdt.multiple" "fdt" "list"
You can extract the table part from each list element and bind them together.
result <- purrr::map_df(fdt, `[[`, 'table')
#In base R
#result <- do.call(rbind, lapply(fdt, `[[`, 'table'))
result
# Class limits f rf rf(%) cf cf(%)
#1 [8.1781,9.1041) 5 0.20833333 20.833333 5 20.833333
#2 [9.1041,10.03) 6 0.25000000 25.000000 11 45.833333
#3 [10.03,10.956) 10 0.41666667 41.666667 21 87.500000
#4 [10.956,11.882) 3 0.12500000 12.500000 24 100.000000
#5 [53.135,56.121) 4 0.16000000 16.000000 4 16.000000
#6 [56.121,59.107) 8 0.32000000 32.000000 12 48.000000
#7 [59.107,62.092) 8 0.32000000 32.000000 20 80.000000
#....
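If you also want to keep track of which list element (group) each row came from, map_df() can add it as a column via its .id argument (a small extension of the above; the column name "group" is arbitrary):
result <- purrr::map_df(fdt, `[[`, 'table', .id = 'group')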

How to output twice in R pipe?

library(psych)
library(mokken)
library(magrittr)   # provides %>% and the tee operator %T>%
bfi[1:3] %>%
  na.omit() %>%
  mokken::check.monotonicity() %T>%
  summary %>%
  {.$Hi[.$Hi < 0]}
A1
-0.3873723
The script above works well and I get the final output, but I still want to review the output of summary(). How can I make the pipe print the summary too?
If we want the summary as well, place it in a list:
library(psych)
library(mokken)
library(magrittr)
out <- bfi[1:3] %>%
  na.omit() %>%
  mokken::check.monotonicity() %>%
  {list(summary(.), .$Hi[.$Hi < 0])}
out
#[[1]]
# ItemH #ac #vi #vi/#ac maxvi sum sum/#ac zmax #zsig crit
#A1 -0.39 75 54 0.72 0.52 9.79 0.1305 16.75 51 550
#A2 0.06 50 8 0.16 0.14 0.63 0.0126 4.76 7 128
#A3 0.09 30 6 0.20 0.12 0.45 0.0149 4.63 6 134
#[[2]]
# A1
#-0.3873723
You can use %T>% print() to show the result of summary() but not return it.
bfi[1:3] %>%
  na.omit() %>%
  mokken::check.monotonicity() %T>%
  {print(summary(.))} %>%
  {.$Hi[.$Hi < 0]}
# ItemH #ac #vi #vi/#ac maxvi sum sum/#ac zmax #zsig crit
# A1 -0.39 75 54 0.72 0.52 9.79 0.1305 16.75 51 550
# A2 0.06 50 8 0.16 0.14 0.63 0.0126 4.76 7 128
# A3 0.09 30 6 0.20 0.12 0.45 0.0149 4.63 6 134
#
# A1
# -0.3873723
If you assign it to a variable, it doesn't store the result of summary().
out <- ...
out
# A1
# -0.3873723
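For completeness, the same trick with the base-R pipe (a hedged sketch assuming R >= 4.1; the native |> pipe has no tee operator, so an explicit function is needed):
out <- bfi[1:3] |>
  na.omit() |>
  mokken::check.monotonicity() |>
  (\(x) { print(summary(x)); x$Hi[x$Hi < 0] })()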

Efficient way to connect information between two dataframes based on factors in R (or how to avoid loops in R)

I have two large dataframes: one is called Dates_only and the other Values.
**Dates_only:**
ID Quart_y Quart
1 1118 2017Q3 0.25
2 1118 2017Q4 0.50
3 1118 2018Q1 0.75
4 1118 2018Q2 1.00
5 1118 2018Q3 1.25
6 1118 2018Q4 1.50
7 1118 2019Q1 1.75
8 1118 2019Q2 2.00
9 1119 2017Q3 0.25
10 1119 2017Q4 0.50
11 1119 2018Q1 0.75
12 1119 2018Q2 1.00
13 1119 2018Q3 1.25
14 1119 2018Q4 1.50
15 1119 2019Q1 1.75
16 1119 2019Q2 2.00
17 13PP 2017Q3 0.25
18 13PP 2017Q4 0.50
19 13PP 2018Q1 0.75
20 13PP 2018Q2 1.00
21 13PP 2018Q3 1.25
22 13PP 2018Q4 1.50
23 13PP 2019Q1 1.75
24 13PP 2019Q2 2.00
And the second dataset:
**Values**
ID Day Value
1 1118 0 7.6
2 1119 0 6.2
3 13PP 0 6.8
4 1118 0.14 7.1
5 1119 0.13 6.2
6 13PP 0.13 5.9
7 1118 0.20 6.8
8 1119 0.23 5.8
9 13PP 0.24 4.6
10 1118 0.27 6.5
11 1119 0.28 5.4
12 13PP 0.32 4.2
13 1118 0.32 6.3
14 1119 0.32 4.8
15 13PP 0.44 4.0
16 1118 0.47 6.0
17 1119 0.49 4.3
18 13PP 0.49 3.8
19 1118 0.59 5.9
20 1119 0.64 4.0
21 13PP 0.61 3.6
22 1118 0.72 5.6
23 1119 0.71 3.8
24 13PP 0.73 3.4
25 1118 0.95 5.4
26 1119 0.86 3.2
27 13PP 0.78 3.0
28 1118 1.10 5.0
29 1119 0.93 2.9
30 13PP 1.15 2.9
What I want to do is to create another (fourth) column in Dates_only called Value_average, which will contain averages of the scores in the Values$Value column.
Specifically, as you can see in Dates_only, Quart_y represents quarter/year, and Quart quantifies this with a number from 0.25 to 2.
The pattern goes like this: Q3 - x.25, Q4 - x.50, Q1 - x.75, Q2 - x.00.
In the second dataframe, Values, the Day scores represent days of the year. The idea is that days with 0 <= Day <= 0.25 belong to 2017Q3, days with 0.25 < Day <= 0.50 belong to 2017Q4, days with 1.00 < Day <= 1.25 belong to 2018Q3, and so on.
For each ID in Dates_only I want to average the Values$Value numbers that fall into the corresponding time frame:
For ID = 1118 and 2017Q3, the Values$Day elements with 0 <= Day <= 0.25 are (0, 0.14, 0.20) and the corresponding Values$Value are (7.6, 7.1, 6.8), so Dates_only$Value_average will be (7.6 + 7.1 + 6.8) / 3 ≈ 7.17. The next row will average values for days with 0.25 < Day <= 0.50, etc.
**Dates_only:**
ID Quart_y Quart Value_average
1 1118 2017Q3 0.25 7.17
2 1118 2017Q4 0.50 6.27
The code that I have used is:
Dates_only$Value_average <- 0
for (i in 1:length(Dates_only$ID)) {
  id <- as.character(Dates_only$ID[i])
  quart <- as.numeric(Dates_only$Quart[i])
  quart_prev <- quart - 0.25
  count_d <- 0
  sum_val <- 0
  for (k in 1:length(Values$ID)) {
    if (id == as.character(Values$ID[k])
        && quart >= as.numeric(Values$Day[k])
        && as.numeric(Values$Day[k]) > quart_prev) {
      sum_val <- as.numeric(Values$Value[k]) + sum_val
      count_d <- count_d + 1
    }
  }
  av_value <- sum_val / count_d
  Dates_only$Value_average[i] <- av_value
}
Is there a more efficient way to do this on very large datasets (over 300K observations)? I am pretty sure there is, but my novice R skills are not helping.
To replicate the two dataframes:
Dates_only <- data.frame(ID=c('1118','1118','1118','1118','1118',
'1118','1118','1118','1119','1119',
'1119','1119','1119','1119','1119',
'1119','13PP','13PP','13PP','13PP',
'13PP','13PP','13PP','13PP'),
Quart_y=c('2017Q3','2017Q4','2018Q1','2018Q2',
'2018Q3','2018Q4','2019Q1','2019Q2',
'2017Q3','2017Q4','2018Q1','2018Q2',
'2018Q3','2018Q4','2019Q1','2019Q2',
'2017Q3','2017Q4','2018Q1','2018Q2',
'2018Q3','2018Q4','2019Q1','2019Q2'),
Quart=c(0.25,0.50,0.75,1.00,1.25,1.50,1.75,2.00,
0.25,0.50,0.75,1.00,1.25,1.50,1.75,2.00,
0.25,0.50,0.75,1.00,1.25,1.50,1.75,2.00))
Values <- data.frame(ID=c('1118','1119','13PP','1118','1119','13PP',
'1118','1119','13PP','1118','1119','13PP',
'1118','1119','13PP','1118','1119','13PP',
'1118','1119','13PP','1118','1119','13PP',
'1118','1119','13PP','1118','1119','13PP'),
Day=c(0,0,0,0.14,0.13,0.13,0.2,0.23,0.24,0.27,0.28,
0.32,0.32,0.32,0.44,0.47,0.49,0.49,0.59,0.64,
0.61,0.72,0.71,0.73,0.95,0.86,0.78,1.1,0.93,1.15),
Value=c(7.6,6.2,6.8,7.1,6.2,5.9,6.8,5.8,4.6,6.5,5.4,
4.2,6.3,4.8,4,6,4.3,3.8,5.9,4,3.6,5.6,3.8,
3.4,5.4,3.2,3,5,2.9,2.9))
We can accomplish almost all of this using the dplyr package:
library(dplyr)
Values %>%
  mutate(Day = ifelse(Day == 0, 0.01, Day)) %>%
  mutate(Quart = ceiling(Day / 0.25) * 0.25) %>%
  full_join(., Dates_only, by = c("ID", "Quart")) %>%
  group_by(ID, Quart, Quart_y) %>%
  summarise(Value_average = mean(Value, na.rm = TRUE))
Which gives you:
ID Quart Quart_y Value_average
<fctr> <dbl> <fctr> <dbl>
1 1118 0.25 2017Q3 7.166667
2 1118 0.50 2017Q4 6.266667
3 1118 0.75 2018Q1 5.750000
4 1118 1.00 2018Q2 5.400000
5 1118 1.25 2018Q3 5.000000
6 1118 1.50 2018Q4 NaN
7 1118 1.75 2019Q1 NaN
8 1118 2.00 2019Q2 NaN
9 1119 0.25 2017Q3 6.066667
10 1119 0.50 2017Q4 4.833333
# ... with 14 more rows
See below for a breakdown of each line of code:
# Start with your `Values` data frame
Values %>%
  # Recode `Day` values of 0, which would otherwise be excluded by
  # the rule 2017Q3: 0 < Day <= 0.25
  # (0.01 is picked arbitrarily to fit this rule)
  mutate(Day = ifelse(Day == 0, 0.01, Day)) %>%
  # Now round all `Day` values up to the nearest 0.25
  mutate(Quart = ceiling(Day / 0.25) * 0.25) %>%
  # Now join the two data frames using a `full_join`
  # (a left_join may also be used if you are not interested in NAs)
  full_join(., Dates_only, by = c("ID", "Quart")) %>%
  # Finally, designate groupings to calculate the mean value
  # for each ID in each quarter
  group_by(ID, Quart, Quart_y) %>%
  summarise(Value_average = mean(Value, na.rm = TRUE))
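For the 300K-row case, the same idea can be sketched with data.table, which is typically faster on large data (a hedged alternative, not benchmarked; it uses the same binning rule, with pmax() keeping Day = 0 in the first quarter):
library(data.table)
ValuesDT <- as.data.table(Values)
# assign each Day to the quarter bin it falls into
ValuesDT[, Quart := ceiling(pmax(Day, 1e-9) / 0.25) * 0.25]
# average per ID and quarter, then attach to Dates_only
avg <- ValuesDT[, .(Value_average = mean(Value)), by = .(ID, Quart)]
result <- merge(as.data.table(Dates_only), avg, by = c("ID", "Quart"), all.x = TRUE)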

Post code areas plotting in R - how to add legend

I am new to R and need some help.
Could you please help me with the code below? I would like to add a gradient legend next to the plot, running from 0 to 1 and showing how the colour changes with the value, but this is the best I was able to get. Also, could you give some tips on how to add the postcode text inside the map? Thanks.
rm(list=ls())
library(tidyverse)
library(maptools)
library(raster)
library(plotrix)
# collection data set load + post codes lo la - 2016
df2016 <- read.csv('C:/Users/thomas/desktop/coll2016WORKINGFILE.csv')
colnames(df2016) <- c('name','value','amount')
df2016$amount <- NULL
df2016$name <- as.character(df2016$name)
# OPTIONAL: Depending on your data, you may need to rescale it for the color ramp to work
df2016$value <- rescale(df2016$value, newrange = c(0, 1))
# Download a shapefile of postal codes into your working directory
download.file(
"http://www.opendoorlogistics.com/wp-content/uploads/Data/UK-postcode-boundaries-Jan-2015.zip",
"postal_shapefile"
)
# Unzip the shapefile
unzip("postal_shapefile")
# Read the shapefile
postal <- readShapeSpatial("./Distribution/Areas")
postal.df <- fortify(postal, region = "name")
# Join your data to the shapefile
postal <- raster::merge(postal, df2016, by = "name")
postal$value[is.na(postal$value)] <- 0.50
# Get centroids of the SpatialPolygonsDataFrame and convert to a dataframe
# for use in plotting area names.
postal.centroids.df <- data.frame(long = coordinates(postal)[, 1],
                                  lat = coordinates(postal)[, 2],
                                  id = postal$name,
                                  ratio = postal$value)
plot(postal, col = gray(postal$value))
title("UK Success Rate")
legend("right", NULL, legend = postal$value, col = gray(postal$value))
The original csv dataset has the 3 columns below (Row Labels = postcode area, Success/Failed ratio, and N of coll = number of collections):
Row Labels Success/Failed ratio N of coll
LD 1 3
ZE 1 2
WS 0.79 19
ML 0.75 12
HS 0.75 4
TQ 0.74 38
WN 0.73 15
CA 0.71 28
HU 0.7 33
FY 0.69 16
HG 0.69 16
IV 0.68 19
DL 0.68 25
CB 0.68 115
TS 0.67 46
IP 0.67 87
AB 0.67 66
NP 0.67 45
FK 0.67 18
IM 0.67 9
SM 0.66 50
HD 0.66 32
EN 0.66 61
CO 0.65 52
ME 0.65 54
PE 0.64 266
EX 0.64 81
WV 0.63 49
JE 0.63 24
NE 0.62 148
YO 0.62 47
DE 0.62 78
LN 0.61 36
SN 0.61 109
IG 0.6 63
NR 0.6 90
SP 0.59 37
BA 0.59 93
UB 0.59 127
TN 0.59 95
BT 0.59 180
BD 0.59 51
HP 0.59 126
TA 0.59 46
PO 0.58 113
DH 0.58 55
WD 0.58 102
BH 0.57 96
DG 0.57 14
CV 0.57 225
RG 0.57 255
BN 0.56 158
DY 0.56 48
HA 0.56 148
W 0.56 359
WA 0.56 77
DA 0.55 38
CT 0.55 62
GU 0.55 231
RH 0.55 132
BL 0.55 33
HX 0.55 11
BS 0.54 184
SS 0.54 46
EH 0.54 185
DT 0.54 37
G 0.54 137
B 0.54 283
LU 0.54 41
NG 0.54 97
OX 0.53 208
S 0.53 179
CM 0.53 100
DD 0.53 17
GL 0.53 87
AL 0.53 89
HR 0.53 38
LS 0.52 122
TF 0.52 21
RM 0.52 44
SL 0.52 155
MK 0.52 136
SY 0.52 46
DN 0.52 81
N 0.52 191
M 0.52 226
SR 0.52 29
SK 0.52 64
BB 0.51 140
KY 0.51 41
WF 0.51 51
PR 0.51 63
L 0.51 81
KT 0.5 185
CF 0.5 118
ST 0.5 84
TR 0.5 46
CW 0.5 44
TD 0.5 12
P 0.5 2
SW 0.5 317
LL 0.49 49
CH 0.49 43
E 0.49 275
EC 0.48 364
PA 0.48 27
SO 0.48 157
CR 0.48 84
PL 0.48 61
SG 0.47 59
KA 0.47 15
LA 0.47 43
SA 0.46 78
LE 0.46 194
TW 0.45 125
OL 0.44 41
SE 0.44 297
NN 0.43 143
NW 0.42 236
WC 0.41 138
WR 0.38 73
BR 0.37 62
GY 0.26 35
PH 0.23 13
Here you go. Use sf with the new ggplot2, or functions from my misc package (berryFunctions, used below) for base graphics.
# collection data set load + post codes lo la - 2016
df2016 <- read.table(stringsAsFactors=FALSE, header=TRUE, text="
name value amount
LD 1 3
ZE 1 2
WS 0.79 19
# YOUR OTHER VALUES FROM ABOVE
PH 0.23 13")
if(FALSE){ # don't run when sourcing file
# Download a shapefile of postal codes into your working directory
download.file(
"http://www.opendoorlogistics.com/wp-content/uploads/Data/UK-postcode-boundaries-Jan-2015.zip",
"postal_shapefile.zip"
)
# Unzip and read the shapefile
unzip("postal_shapefile.zip")
}
# install.packages("sf")
postal <- sf::st_read("Distribution/Areas.shp")
# Join your data to the shapefile
postal2 <- merge(postal, df2016, by="name")
#devtools::install_github("tidyverse/ggplot2") # need newer ggplot2 version for geom_sf
library(ggplot2)
ggplot(postal2) + geom_sf(aes(fill = value))
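# To make the legend span exactly 0 to 1 (as the question asks), a hedged
# tweak: limits is a standard argument of ggplot2's continuous scales.
ggplot(postal2) + geom_sf(aes(fill = value)) +
  scale_fill_gradient(limits = c(0, 1), name = "Success rate")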
# Want to remain in base graphs?
#install.packages("berryFunctions")
library(berryFunctions)
cols <- seqPal(n=100)
cls <- classify(postal2$value, breaks=100)$index
plot(postal2[,c("value","geometry")], col=cols[cls], graticule=TRUE, axes=TRUE) # ?sf::plot_sf
colPointsLegend(postal2$value, colors=cols, horizontal=FALSE, title="UK value")

How to fill out white/missing parts of the map in R?

The code below creates a map of UK postcodes using ggplot, but it leaves some parts of the map white/missing. Could you please advise how to make sure that the whole map is filled and that the postcode areas have a border? Thanks.
[Image: map of the UK produced by the code below]
rm(list=ls())
library(tidyverse)
library(maptools)
library(raster)
library(plotrix)
library(ggrepel)
df2016 <- read.table(stringsAsFactors=FALSE, header=TRUE, text="
name value amount
LD 1 3
ZE 1 2
WS 0.79 19
ML 0.75 12
HS 0.75 4
TQ 0.74 38
WN 0.73 15
CA 0.71 28
HU 0.7 33
FY 0.69 16
HG 0.69 16
IV 0.68 19
DL 0.68 25
CB 0.68 115
TS 0.67 46
IP 0.67 87
AB 0.67 66
NP 0.67 45
FK 0.67 18
IM 0.67 9
SM 0.66 50
HD 0.66 32
EN 0.66 61
CO 0.65 52
ME 0.65 54
PE 0.64 266
EX 0.64 81
WV 0.63 49
JE 0.63 24
NE 0.62 148
YO 0.62 47
DE 0.62 78
LN 0.61 36
SN 0.61 109
IG 0.6 63
NR 0.6 90
SP 0.59 37
BA 0.59 93
UB 0.59 127
TN 0.59 95
BT 0.59 180
BD 0.59 51
HP 0.59 126
TA 0.59 46
PO 0.58 113
DH 0.58 55
WD 0.58 102
BH 0.57 96
DG 0.57 14
CV 0.57 225
RG 0.57 255
BN 0.56 158
DY 0.56 48
HA 0.56 148
W 0.56 359
WA 0.56 77
DA 0.55 38
CT 0.55 62
GU 0.55 231
RH 0.55 132
BL 0.55 33
HX 0.55 11
BS 0.54 184
SS 0.54 46
EH 0.54 185
DT 0.54 37
G 0.54 137
B 0.54 283
LU 0.54 41
NG 0.54 97
OX 0.53 208
S 0.53 179
CM 0.53 100
DD 0.53 17
GL 0.53 87
AL 0.53 89
HR 0.53 38
LS 0.52 122
TF 0.52 21
RM 0.52 44
SL 0.52 155
MK 0.52 136
SY 0.52 46
DN 0.52 81
N 0.52 191
M 0.52 226
SR 0.52 29
SK 0.52 64
BB 0.51 140
KY 0.51 41
WF 0.51 51
PR 0.51 63
L 0.51 81
KT 0.5 185
CF 0.5 118
ST 0.5 84
TR 0.5 46
CW 0.5 44
TD 0.5 12
P 0.5 2
SW 0.5 317
LL 0.49 49
CH 0.49 43
E 0.49 275
EC 0.48 364
PA 0.48 27
SO 0.48 157
CR 0.48 84
PL 0.48 61
SG 0.47 59
KA 0.47 15
LA 0.47 43
SA 0.46 78
LE 0.46 194
TW 0.45 125
OL 0.44 41
SE 0.44 297
NN 0.43 143
NW 0.42 236
WC 0.41 138
WR 0.38 73
BR 0.37 62
GY 0.26 35
PH 0.23 13
")
#df2016$amount <- NULL
df2016$name <- as.character(df2016$name)
# Download a shapefile of postal codes into your working directory
download.file(
"http://www.opendoorlogistics.com/wp-content/uploads/Data/UK-postcode-boundaries-Jan-2015.zip",
"postal_shapefile"
)
# Unzip the shapefile
unzip("postal_shapefile")
# Read the shapefile
postal <- readShapeSpatial("./Distribution/Areas")
postal.df <- fortify(postal, region = "name")
# Join your data to the shapefile
colnames(postal.df)[colnames(postal.df) == "id"] <- "name"
postal.df <- raster::merge(postal.df, df2016, by = "name")
postal.df$value[is.na(postal.df$value)] <- 0.50
# Get centroids of the SpatialPolygonsDataFrame and convert to a dataframe
# for use in plotting area names.
postal.centroids.df <- data.frame(long = coordinates(postal)[, 1],
                                  lat = coordinates(postal)[, 2],
                                  id = postal$name)
p <- ggplot(postal.df, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = cut(value, 5))) +
  geom_text_repel(data = postal.centroids.df,
                  aes(label = id, x = long, y = lat, group = id), size = 3) +
  labs(x = " ", y = " ") +
  theme_bw() +
  scale_fill_brewer('Success Rate 2016', palette = 15) +
  coord_map() +
  theme(panel.grid.minor = element_blank(), panel.grid.major = element_blank()) +
  theme(axis.ticks = element_blank(), axis.text.x = element_blank(),
        axis.text.y = element_blank()) +
  theme(panel.border = element_blank())
p
Try arranging the postal codes by name or number just before plotting, and remember to assign the result back:
postal.centroids.df <- postal.centroids.df %>%
  arrange(id)
My county maps of the US did the same thing when the rows weren't in order. If that doesn't work, try arranging by lat or long as well.
The solution was to use left_join from dplyr instead of merge:
rm(list=ls())
library(tidyverse)
library(maptools)
library(raster)
library(plotrix)
library(ggrepel)
df2016 <- read.table(stringsAsFactors=FALSE, header=TRUE, text="
name value amount
LD 1 3
ZE 1 2
WS 0.79 19
# SAME POSTCODE DATA AS IN THE QUESTION ABOVE
PH 0.23 13
")
# Download a shapefile of postal codes into your working directory
download.file(
"http://www.opendoorlogistics.com/wp-content/uploads/Data/UK-postcode-boundaries-Jan-2015.zip",
"postal_shapefile"
)
# Unzip the shapefile
unzip("postal_shapefile")
# Read the shapefile
postal <- readShapeSpatial("./Distribution/Areas")
postal.df <- fortify(postal, region = "name")
# Join your data to the shapefile
colnames(postal.df)[colnames(postal.df) == "id"] <- "name"
library(dplyr)
test <- left_join(postal.df, df2016, by = "name", copy = FALSE)
#postal.df <- raster::merge(postal.df, df2016, by = "name")
test$value[is.na(test$value)] <- 0.50
# Get centroids of the SpatialPolygonsDataFrame and convert to a dataframe
# for use in plotting area names.
postal.centroids.df <- data.frame(long = coordinates(postal)[, 1],
                                  lat = coordinates(postal)[, 2],
                                  id = postal$name)
p <- ggplot(test, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = cut(value, 5))) +
  geom_text_repel(data = postal.centroids.df,
                  aes(label = id, x = long, y = lat, group = id), size = 3) +
  labs(x = " ", y = " ") +
  theme_bw() +
  scale_fill_brewer('Success Rate 2016', palette = 15) +
  coord_map() +
  theme(panel.grid.minor = element_blank(), panel.grid.major = element_blank()) +
  theme(axis.ticks = element_blank(), axis.text.x = element_blank(),
        axis.text.y = element_blank()) +
  theme(panel.border = element_blank())
p
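Why this works: fortify() returns the polygon vertices in drawing order (the order and group columns), and base merge() does not preserve row order, so the polygon paths get scrambled and some areas render as white gaps; left_join() keeps the original row order of postal.df. If you prefer to keep merge(), a hedged alternative is to restore the drawing order afterwards:
postal.df <- raster::merge(postal.df, df2016, by = "name")
postal.df <- postal.df[order(postal.df$group, postal.df$order), ]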
