Plotting on a geographical map the provenience of our patients - r

I am trying to plot, on an Italian geographical map, a dot marking the place of origin ('provincia') of our patients. Ideally, the dot size should be proportional to the number of patients coming from that 'provincia'. An example of the list I would like to plot is the following.
MI 8319
CO 537
MB 436
VA 338
BG 310
PV 254
CR 244
NO 210
RM 189
CS 179
The first column holds the 'provincia' code: MI (Milano), CO (Como), MB (Monza-Brianza), etc. The second column holds the number of patients from that 'provincia'. So the output should be an Italian political map where the biggest dot is around the city of Milano (MI), the second biggest dot is near the city of Como (CO), the third is around Monza-Brianza (MB), etc.
Is there any package that could do the plot I am looking for? I found a tool that could do the job here, but apparently it expects me to supply the geographical coordinates in order to do the plot.
https://www.littlemissdata.com/blog/maps
Thanks in advance.

Here is one way to handle your task. You have the abbreviations of the Italian provinces, and you can use them to merge your data with polygon data. If you download Italy's polygons from GADM, the data contain these abbreviations in the column HASC_2 (with an "IT." prefix). First, merge your data with the polygon data. Then create another data set that contains the centroid of each province. You can draw the map with the two data sets: the polygons for the outlines and the centroids for the dots.
library(tidyverse)
library(sf)
library(ggthemes)
# Get the sf file from https://gadm.org/download_country_v3.html
# and import it in R.
mysf <- readRDS("gadm36_ITA_2_sf.rds")
# This is your data, which is called mydata.
mydata <- structure(list(abbs = c("MI", "CO", "MB", "VA", "BG", "PV", "CR",
"NO", "RM", "CS"), value = c(8319L, 537L, 436L, 338L, 310L, 254L,
244L, 210L, 189L, 179L)), class = "data.frame", row.names = c(NA,
-10L))
#    abbs value
# 1    MI  8319
# 2    CO   537
# 3    MB   436
# 4    VA   338
# 5    BG   310
# 6    PV   254
# 7    CR   244
# 8    NO   210
# 9    RM   189
# 10   CS   179
# Abbreviations are in HASC_2 in mysf. Strip the "IT." prefix so that
# mydata can be joined to mysf by abbreviation. Longitude and latitude
# come from st_centroid(). This data set is for geom_point().
mysf2 <- mutate(mysf, HASC_2 = sub(x = HASC_2, pattern = "^IT\\.", replacement = "")) %>%
  left_join(mydata, by = c("HASC_2" = "abbs")) %>%
  mutate(lon = map_dbl(geometry, ~st_centroid(.x)[[1]]),
         lat = map_dbl(geometry, ~st_centroid(.x)[[2]]))
# Draw a map
ggplot() +
  geom_sf(data = mysf) +
  geom_point(data = mysf2, aes(x = lon, y = lat, size = value)) +
  theme_map()
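The prefix stripping and join used above can be tried without downloading the GADM file. The following is a minimal base R sketch; the `provinces` data frame is a hypothetical stand-in for the attribute table of the polygons, and only the `HASC_2` column name is taken from the answer.

```r
# Toy stand-in for the attribute table of the GADM polygons (hypothetical rows).
provinces <- data.frame(
  HASC_2 = c("IT.MI", "IT.CO", "IT.TO"),
  NAME_2 = c("Milano", "Como", "Torino"),
  stringsAsFactors = FALSE
)

# Patient counts keyed by the bare province abbreviation.
mydata <- data.frame(
  abbs  = c("MI", "CO"),
  value = c(8319L, 537L),
  stringsAsFactors = FALSE
)

# Strip the literal "IT." prefix; the dot is escaped so it matches only a dot.
provinces$abbs <- sub("^IT\\.", "", provinces$HASC_2)

# Left join: keep every province, with NA where there are no patients.
joined <- merge(provinces, mydata, by = "abbs", all.x = TRUE)
print(joined)
```

Provinces without patients keep `NA` in `value`, which is why the map code above still draws every polygon but only places dots where a count exists.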

UPDATE ON INSET MAP
This update follows several suggestions to use inset maps, which I think is the best solution given your question and comments:
library(sf)
library(cartography)
EU = st_read("~/R/mapslib/EUROSTAT/NUTS_RG_03M_2016_3035_LEVL_3.geojson")
IT = subset(EU, CNTR_CODE == "IT")
mydata <-
structure(list(
abbs = c("MI", "CO", "MB", "VA", "BG", "PV", "CR",
"NO", "RM", "CS"),
value = c(8319L, 537L, 436L, 338L, 310L, 254L,
244L, 210L, 189L, 179L),
nuts = c("ITC4C","ITC42","ITC4D","ITC41",
"ITC46", "ITC48","ITC4A","ITC15",
"ITI43","ITF61")
),
class = "data.frame",
row.names = c(NA, -10L))
patients = merge(IT, mydata, by.x = "id", by.y = "nuts")
#Get breaks for map
br=getBreaks(patients$value)
#Delimit zone
#Based on NUTS1, Northwest Italy
par(mar=c(0,0,0,0))
ghostLayer(IT[grep("ITC",IT$NUTS_ID),], bg="lightblue")
plot(st_geometry(EU), col="grey90", add=TRUE)
plot(st_geometry(IT), col = "#FEFEE9", border = "#646464", add=TRUE)
choroLayer(
patients,
var = "value",
breaks = br,
col = carto.pal(pal1 = "red.pal", n1 = length(br)-1),
legend.pos = "topleft",
legend.title.txt = "Total patients",
add = TRUE,
legend.frame = TRUE
)
labelLayer(patients,txt="abbs", halo=TRUE, overlap = FALSE)
#Inset
par(
fig = c(0, 0.4, 0.01, 0.4),
new = TRUE
)
inset=patients[patients$abbs %in% c("RM","CS"),]
ghostLayer(inset, bg="lightblue")
plot(st_geometry(EU), col="grey90", add=TRUE)
plot(st_geometry(IT), col = "#FEFEE9", border = "#646464", add=TRUE)
choroLayer(
patients,
var = "value",
breaks = br,
col = carto.pal(pal1 = "red.pal", n1 = length(br)-1),
legend.pos = "n",
add = TRUE
)
labelLayer(patients,txt="abbs", halo=TRUE, overlap = FALSE)
box(which = "figure", lwd = 1)
#RESTORE PLOT
par(fig=c(0,1,0,1))
OLD ANSWER
Following my comment on plotting labels: circles may not be the best option for your map, given how concentrated the values are. I suggest using another kind of map instead, such as a choropleth (choroLayer). I borrowed the data frame from https://stackoverflow.com/users/3304471/jazzurro.
library(sf)
library(cartography)
EU = st_read("~/R/mapslib/EUROSTAT/NUTS_RG_03M_2016_3035_LEVL_3.geojson")
IT = subset(EU, CNTR_CODE == "IT")
mydata <-
structure(list(
abbs = c("MI", "CO", "MB", "VA", "BG", "PV", "CR",
"NO", "RM", "CS"),
value = c(8319L, 537L, 436L, 338L, 310L, 254L,
244L, 210L, 189L, 179L),
nuts = c("ITC4C","ITC42","ITC4D","ITC41",
"ITC46", "ITC48","ITC4A","ITC15",
"ITI43","ITF61")
),
class = "data.frame",
row.names = c(NA, -10L))
patients = merge(IT, mydata, by.x = "id", by.y = "nuts")
#Option 1 - With circles
par(mar = c(0, 0, 0, 0))
plot(st_geometry(IT), col = "#FEFEE9", border = "#646464")
propSymbolsLayer(
x = patients,
var = "value",
col = carto.pal(pal1 = "red.pal", n1 = 6),
legend.title.txt = "Total patients",
add = TRUE
)
#Option 2 - Chorolayer with labels
par(mar = c(0, 0, 0, 0))
plot(st_geometry(IT), col = "#FEFEE9", border = "#646464")
choroLayer(
patients,
var = "value",
col = carto.pal(pal1 = "red.pal", n1 = 6),
legend.title.txt = "Total patients",
add = TRUE
)
#Create labels
patients$label = paste(patients$abbs, patients$value, sep = " - ")
labelLayer(
  patients,
  txt = "label",
  overlap = FALSE,
  halo = TRUE,
  show.lines = TRUE
)
Data from
https://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/nuts-2016-files.html

Related

Scatterpie pie plot: circles not properly positioned over map

I am trying to create a map where I show the amount and category of Exports in every European country, using a scatterpie plot. This is the data I am trying to represent:
Country A B C D E F G Total
1 FR 48208727011 129696846358 34574295963 99154544367 87056475894 104059261659 391086898 50.3141238
2 BE 30008344795 130642251666 27315419464 48966420544 51351672841 57686707705 875915760 34.6846733
3 NL 53815652300 126965690773 52604259051 164935573324 43089183110 79607329056 516212340 52.1533900
4 DE 79643366705 285793555191 66579801287 230961697801 160598853461 167790359814 13590821673 100.4958456
5 IT 35306881277 124880125091 31042897909 65051137874 44481779280 65707113992 307508636 36.6777444
6 UK 4190569134 14226329164 4343541388 8299777138 7863823675 8191378024 177728913 4.7293147
7 IE 8049979989 25547263228 3324685081 15609577840 18293778082 13299495081 284077060 8.4408856
8 DK 10844794488 22366273732 3669934507 20904821209 8871184551 17364886109 1104100358 8.5125995
9 EL 5548998459 14199041489 9684405892 6969942717 2877634605 8740624663 9513713 4.8030162
10 PT 9302893141 19921174761 5742487970 12183620710 9794141959 10889202370 59025653 6.7892547
11 ES 29087706350 79136960848 26777114009 45807156391 43316950993 54577475375 225619825 27.8928984
12 LU 2103037221 5485541709 1274451840 3165573258 3448812873 2685200517 23828895 1.8186446
13 SE 14297019504 32367817406 10023929115 31082425639 18504243058 21520786963 251825497 12.8048047
14 FI 4368941438 17924135085 6424290821 13268574752 7679357024 7759601514 87932852 5.7512833
15 AT 11108739001 47969735941 8282060600 36180768764 20761302493 26060191499 319396555 15.0682195
16 MT 529547453 748570490 789405002 772157398 939286493 808546088 1179489 0.4588692
17 EE 1387220092 4797469841 1253135597 3127037067 1483571375 2251847940 315884341 1.4616166
18 LV 2714038229 4237027490 958962478 3158721396 1479290893 2931423023 89667330 1.5569131
19 LT 3408636288 8761053696 3263941940 5534705815 2630113004 4477086678 348351748 2.8423889
20 PL 17264039729 70678231411 11723435712 53284056901 28214023352 41438947683 319384835 22.2922120
21 CZ 7664643659 38573705210 5359209173 54059163460 20745595183 22423687496 216009863 14.9042014
22 SK 4193310193 17229538594 3771900263 19251595573 18415022178 10092362707 163300267 7.3117030
23 HU 5067726212 26282833327 5807291521 31406620462 16576651093 12918544146 456905984 9.8516573
24 RO 7210065674 24768518425 3986448288 20279628790 10274528929 13490373296 213856837 8.0223420
25 BG 3364866564 11098005470 2490021719 5767532283 2282959524 4540599434 289425842 2.9833411
26 SI 2226481542 11769625979 2186097710 5986840366 6169533307 8453642146 32927930 3.6825149
27 HR 2664219116 7204053277 2281750708 4155735739 2094082503 4970586651 14826478 2.3385254
28 CY 847756088 1467939342 983937418 824244195 1900124484 1375465594 47109886 0.7446577
Using the following code:
library(giscoR)
borders <- gisco_get_countries(
epsg = "3035",
year = "2020",
resolution = "3",
country = idf$Country
)
merged <- merge(borders,
idf,
by.x = "CNTR_ID",
by.y = "Country",
all.x = TRUE
)
library(tidyverse)
symbol_pos <- st_centroid(merged, of_largest_polygon = TRUE)
separate_coords = symbol_pos %>% mutate(lat = unlist(map(symbol_pos$geometry, 1)), long = unlist(map(symbol_pos$geometry, 2)))
sympos = data.frame(Country = separate_coords$CNTR_ID, lat = separate_coords$lat, long = separate_coords$long)
merged <- merge(merged,
sympos,
by.x = "CNTR_ID",
by.y = "Country",
all.x = TRUE
)
ggplot() +
geom_sf(data = merged, size = 0.1) +
geom_scatterpie(data = merged, aes(x = long, y = lat, r = Total), cols = LETTERS[1:7])+
coord_sf(xlim = c(2377294, 6500000), ylim = c(1413597, 5228510))
And it gives me this error:
Error in rowSums(data[, cols]) : 'x' must be numeric
I am trying to create a map similar to this one:
And I would be grateful if someone can provide some hint as to how to fix the error. Thanks.
Edit: below is the dput(idf) output:
structure(list(Country = c("FR", "BE", "NL", "DE", "IT", "UK",
"IE", "DK", "EL", "PT", "ES", "LU", "SE", "FI", "AT", "MT", "EE",
"LV", "LT", "PL", "CZ", "SK", "HU", "RO", "BG", "SI", "HR", "CY"
), A = c(48208727011, 30008344795, 53815652300, 79643366705,
35306881277, 4190569134, 8049979989, 10844794488, 5548998459,
9302893141, 29087706350, 2103037221, 14297019504, 4368941438,
11108739001, 529547453, 1387220092, 2714038229, 3408636288,
17264039729,
7664643659, 4193310193, 5067726212, 7210065674, 3364866564,
2226481542,
2664219116, 847756088), B = c(129696846358, 130642251666,
126965690773,
285793555191, 124880125091, 14226329164, 25547263228,
22366273732,
14199041489, 19921174761, 79136960848, 5485541709, 32367817406,
17924135085, 47969735941, 748570490, 4797469841, 4237027490,
8761053696, 70678231411, 38573705210, 17229538594, 26282833327,
24768518425, 11098005470, 11769625979, 7204053277, 1467939342
), C = c(34574295963, 27315419464, 52604259051, 66579801287,
31042897909, 4343541388, 3324685081, 3669934507, 9684405892,
5742487970, 26777114009, 1274451840, 10023929115, 6424290821,
8282060600, 789405002, 1253135597, 958962478, 3263941940,
11723435712,
5359209173, 3771900263, 5807291521, 3986448288, 2490021719,
2186097710,
2281750708, 983937418), D = c(99154544367, 48966420544,
164935573324,
230961697801, 65051137874, 8299777138, 15609577840, 20904821209,
6969942717, 12183620710, 45807156391, 3165573258, 31082425639,
13268574752, 36180768764, 772157398, 3127037067, 3158721396,
5534705815, 53284056901, 54059163460, 19251595573, 31406620462,
20279628790, 5767532283, 5986840366, 4155735739, 824244195),
E = c(87056475894, 51351672841, 43089183110, 160598853461,
44481779280, 7863823675, 18293778082, 8871184551, 2877634605,
9794141959, 43316950993, 3448812873, 18504243058, 7679357024,
20761302493, 939286493, 1483571375, 1479290893, 2630113004,
28214023352, 20745595183, 18415022178, 16576651093, 10274528929,
2282959524, 6169533307, 2094082503, 1900124484), F =
c(104059261659,
57686707705, 79607329056, 167790359814, 65707113992, 8191378024,
13299495081, 17364886109, 8740624663, 10889202370, 54577475375,
2685200517, 21520786963, 7759601514, 26060191499, 808546088,
2251847940, 2931423023, 4477086678, 41438947683, 22423687496,
10092362707, 12918544146, 13490373296, 4540599434, 8453642146,
4970586651, 1375465594), G = c(391086898, 875915760, 516212340,
13590821673, 307508636, 177728913, 284077060, 1104100358,
9513713, 59025653, 225619825, 23828895, 251825497, 87932852,
319396555, 1179489, 315884341, 89667330, 348351748, 319384835,
216009863, 163300267, 456905984, 213856837, 289425842, 32927930,
14826478, 47109886), Total = c(50.314123815, 34.6846732775,
52.1533899954, 100.4958455932, 36.6777444059, 4.7293147436,
8.4408856361, 8.5125994954, 4.8030161538, 6.7892546564,
27.8928983791,
1.8186446313, 12.8048047182, 5.7512833486, 15.0682194853,
0.4588692413, 1.4616166253, 1.5569130839, 2.8423889169,
22.2922119623,
14.9042014044, 7.3117029775, 9.8516572745, 8.0223420239,
2.9833410836, 3.682514898, 2.3385254472, 0.7446577007)),
row.names = c(NA,
-28L), class = "data.frame")
Please find below one possible solution to your request. The main problem was that geom_scatterpie() expects a dataframe and not an sf object. So you need to use as.data.frame() inside geom_scatterpie(). I also took the opportunity to simplify your code a bit.
Reprex
Code
library(giscoR)
library(sf)
library(dplyr)
library(ggplot2)
library(scatterpie)
borders <- gisco_get_countries(
epsg = "3035",
year = "2020",
resolution = "3",
country = idf$Country
)
merged <- merge(borders,
idf,
by.x = "CNTR_ID",
by.y = "Country",
all.x = TRUE
)
symbol_pos <- st_centroid(merged, of_largest_polygon = TRUE)
sympos <- symbol_pos %>%
st_drop_geometry() %>%
as.data.frame() %>%
cbind(., symbol_pos %>% st_coordinates()) %>%
select(CNTR_ID, X, Y) %>%
rename(Country = CNTR_ID, long = X, lat = Y)
merged <- merge(merged,
sympos,
by.x = "CNTR_ID",
by.y = "Country",
all.x = TRUE
)
Visualization
ggplot() +
geom_sf(data = merged, size = 0.1) +
geom_scatterpie(data = as.data.frame(merged), aes(x = long, y = lat, r = Total*2200), cols = LETTERS[1:7]) +
coord_sf(xlim = c(2377294, 6500000), ylim = c(1413597, 5228510))
Created on 2022-01-23 by the reprex package (v2.0.1)
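The error from the question can be reproduced in miniature. The sketch below assumes the cause is that column subsetting on an sf object keeps the geometry column attached, so a call such as rowSums(data[, cols]) receives a non-numeric column. The two points and their attribute columns are made up for illustration.

```r
library(sf)

# Two made-up points with two numeric attribute columns.
pts <- st_as_sf(
  data.frame(id = c("a", "b"), A = c(1, 2), B = c(3, 4),
             x = c(0, 1), y = c(0, 1)),
  coords = c("x", "y")
)

cols <- c("A", "B")

# On an sf object the geometry column comes along with the selected columns.
names(pts[, cols])

# After as.data.frame() the same subset holds only the requested columns,
# so rowSums() receives purely numeric input.
plain <- as.data.frame(pts)
names(plain[, cols])
totals <- rowSums(plain[, cols])
```

This is consistent with the fix above, where as.data.frame(merged) is passed to geom_scatterpie() while geom_sf() still receives the sf object.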

speech-gaze activity plot in ggplot2

I have data with Utterances by speakers in conversation as well as their gazes to one another. The speakers' gazes are in columns A_aoi, B_aoi, and C_aoi, the gaze durations are in A_aoi_dur, B_aoi_dur, and C_aoi_dur. Here's a reproducible snippet of the data:
df0 <- structure(list(Line = c(105L, 106L, 107L, 109L, 110L, 111L, 112L,
113L, 114L, 115L, 116L), Speaker = c("ID01.A", NA, "ID01.A",
NA, "ID01.B", NA, "ID01.A", NA, "ID01.A", NA, "ID01.C"), Utterance = c("so you've ↑obviously↑ thought about it obviously: (.) have made a decision (.) I'm !head!ing in this door (.) one of the cleaning ladies at the UB !grabb!ed my elbow",
"(0.662)", "and said (.) ~no no no !this! is the !womens'! bathroom~=",
"(0.015)", "=((v: gasps))=", "(0.166)", "=NOW", "(0.622)", "!how! this always plays out ",
"(0.726)", "[when was] that¿="), UttStart = c(163898L, 172500L,
173162L, 176100L, 176115L, 176800L, 176966L, 177372L, 177994L,
179328L, 180054L), UttEnd = c(172500, 173162, 176100, 176115,
176800, 176966, 177372, 177994, 179328, 180054, 180668), UttDur = c(8602,
662, 2938, 15, 685, 166, 406, 622, 1334, 726, 614), A_aoi = c("*B*C*B*C*B*C*B*C*B*C",
"C*", "*B*C*C", "C", "C*", "*", "*C", "C", "C*B", "B*", "*"),
A_aoi_dur = c("21,516,79,333,200,634,233,651,17,2332,33,400,33,518,17,532,33,1900,119,1",
"414,248", "1124,412,116,533,600,153", "15", "616,69", "166",
"153,253", "622", "204,151,979", "219,507", "614"), B_aoi = c("A*A*A*A*A",
"A", "A", "A", "A", "A", "A", "A*", "*A*A", "A*A", "A*A"),
B_aoi_dur = c("475,130,567,137,1983,313,787,1400,2810", "662",
"2938", "15", "685", "166", "406", "398,224", "76,136,284,838",
"108,571,47", "116,270,228"), C_aoi = c("A", "A", "A*A*A",
"A", "A", "A", "A", "A*A", "A", "A*A", "A"), C_aoi_dur = c("8602",
"662", "1058,123,1300,144,313", "15", "685", "166", "406",
"264,351,7", "1334", "125,323,278", "614")), row.names = c(NA,
-11L), class = c("tbl_df", "tbl", "data.frame"))
EDIT: new test data with temporally overlapping Utterances:
df0 <- structure(list(Line = 137:145,
Speaker = c("ID01.A", "ID01.A-Q", NA, "ID01.A", "ID01.A-Q", "ID01.A-Q", "ID01.A-Q", "ID01.A-Q",NA),
Utterance = c("↑she gra:bs my elbow (.) I turn to !look! at her↑ and she's like ~this is a (.) womens' bathroom you can't go in there~",
"~this is a (.) womens' bathroom you can't go in there~", "(0.534)",
"and I'm like ~((silent f: blank stare))~ (.) and she didn't, she was just like ~you can't go in~ (.) I'm like ~I'm a !woman!~ she said ~no you're not you can't go in~",
"~((silent f: blank stare))~", "~you can't go in~", "~I'm a !woman!~",
"~no you're not you can't go in~", "(0.487)"),
UttStart = c(208845L, 211450L, 214136L, 214670L, 215409L, 218307L, 219235L, 220076L, 221368L),
UttEnd = c(214136, 214136, 214670, 221368, 217117, 219050, 219885, 221368, 221855),
UttDur = c(5291, 2686, 534, 6698, 1708, 743, 650, 1292, 487),
A_aoi = c("C*B*C*C*B*C*", "C*B*C*", "*B", "B*C*B*C*C*B*B", "C*B", "C*B", "*", "*B","B"),
A_aoi_dur = c("57,445,1100,135,199,333,866,302,832,33,468,521","530,302,832,33,468,521",
"144,390", "377,235,466,399,1268,132,268,132,433,6,716,1412,854","339,399,970", "73,6,664", "650", "438,854", "487"),
B_aoi = c("A*A","A", "A", "A*A*A*A*A*A", "A", "*A*A", "*A", "A*A", "A"),
B_aoi_dur = c("1691,121,3479", "2686", "534", "53,180,3333,134,253,280,203,534,1296,138,294",
"1708", "63,253,280,147", "405,245", "860,138,294", "487"),
C_aoi = c("A", "A", "A", "A*A", "A", "A*", "A", "A", "A"),
C_aoi_dur = c("5291", "2686", "534", "3766,734,2198",
"1708", "129,614", "650", "1292", "487")),
row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"))
What I'd like to be able to visualize is who is looking at whom and for how long for each Utterance, roughly like in this schematic representation:
What I can do at present is plot the gazes on a minute-by-minute scale, but just the gazes - not the Utterances: Plotting gaze movements by multiple speakers in a single plot. Starting from the data as above, this can be achieved by multiple transformations (shown below) but the resulting plot does not feature the Utterances and it plots the gazes per minute, whereas I need the gazes per Utterance:
I'm fully aware that this is demanding a lot. Help with it is all the more appreciated.
# pivot_longer so that all gazes have their own row:
df0 <- df0 %>%
rename_with(~ str_c(., "_AOI"), ends_with("_aoi")) %>%
pivot_longer(cols = contains("_"),
names_to = c("Gaze_by", ".value"), #
names_pattern = "^(.*)_([^_]+$)"
) %>%
mutate(Gaze_by = sub("^(.).*", "\\1", Gaze_by)) %>%
mutate(AOI = str_replace_all(AOI, "(?<=.)(?=.)", ",")) %>%
separate_rows(c(AOI, dur), sep = ",", convert = TRUE)
# compute starttimes and endtimes for gazes:
df1 <- df0 %>%
group_by(Gaze_by) %>%
mutate(
end = cumsum(dur),
start = end - dur
)
View(df1)
# compute minutes:
df2 <- df1 %>%
mutate(
# which minute does the event start in?
minute_start = as.integer(start/60000),
# which minute does the event end in?
minute_end = as.integer(end/60000),
# does the event straddle a minute mark?
straddler = minute_end > minute_start)
View(df2)
# 1st subset of `df2`:
df2_A1 <- df2 %>%
# filter those rows that contain events straddling minute marks:
filter(straddler=="TRUE") %>%
# reduce the endtime to the exact minute mark:
mutate(end = minute_end*60000)
View(df2_A1)
# 2nd subset of `df2`:
df2_A2 <- df2 %>%
# filter those rows that contain events straddling minute marks:
filter(straddler=="TRUE") %>%
# reduce the starttime to the exact minute mark:
mutate(start = minute_end*60000)
View(df2_A2)
# 3rd subset of `df2`:
df2_A3 <- df2 %>%
# filter those rows that do not contain events straddling minute marks:
filter(!straddler == "TRUE")
View(df2_A3)
# row-bind all three subsets:
df4 <- rbind(df2_A1, df2_A2, df2_A3) %>%
arrange(start) %>%
mutate(
minute = as.integer(start/60000),
# reduce total starttimes to starttimes per minute:
start_pm = start - 60000*minute,
# reduce total endtimes to endtimes per minute:
end_pm = end - 60000*minute)
# plot gaze activity for **ALL** speakers:
df4 %>%
ggplot(aes(x = start_pm,
xend = end_pm,
y = minute + scale(as.numeric(as.factor(Gaze_by))) / 6,
yend = minute + scale(as.numeric(as.factor(Gaze_by))) / 6,
color = AOI)) +
# draw segments for AOI:
geom_segment(size = 2) +
# reverse y-axis scale:
scale_y_reverse(breaks = 0:max(df4$minute),
labels = paste(0:max(df4$minute), "min", " Gaze_by_A\n Gaze_by_B\n Gaze_by_C", sep = " "),
name = NULL) +
# define custom colors:
scale_colour_manual(values = c("*" = "lemonchiffon",
"A" = "darkorange",
"B" = "lawngreen",
"C" = "slateblue1")) +
# plot title:
labs(title = "Gaze activity") +
theme(axis.title.x.bottom = element_blank())
Here is a solution that gets close to what you are looking for, making use of facets. It also uses forcats::fct_reorder and stringr::str_wrap (which are both part of the tidyverse).
This also wraps any long utterances and keeps the x-scale the same for all facets, rather than allowing them to stretch to fill the width.
df4 %>%
mutate(#add text for y axis labels
Gaze_by = paste0("Gaze_by_", Gaze_by),
#reorder facet panels, add speaker at start, and wrap to 120 characters
Utterance = fct_reorder(str_wrap(paste0(substr(Speaker, 6, 6), ": ",
Utterance),
120),
start_pm),
#set a dummy end point for each utterance based on the longest one
max_x = UttStart - min(UttStart) + max(UttDur)) %>%
ggplot(aes(x = start_pm, xend = end_pm,
y = Gaze_by, yend = Gaze_by, #as discrete variable
color = AOI)) +
geom_segment(size = 3) +
geom_point(aes(x = max_x, y = Gaze_by), alpha = 0) + #plot invisible dummy end points
scale_y_discrete(name = NULL, limits = rev) + #rev to get A at the top
facet_wrap(~Utterance, scales = "free_x", ncol = 1) +
scale_colour_manual(values = c("*" = "lemonchiffon",
"A" = "darkorange",
"B" = "lawngreen",
"C" = "slateblue1")) +
labs(title = "Gaze activity") +
theme_minimal() + #removes a lot of lines etc
theme(strip.text = element_text(color = "blue", hjust = 0), #facet strip text
strip.background = element_rect(fill = "white", color = "white"),
axis.title.x.bottom = element_blank())
To cut the utterances into 4-second chunks, you can do something like this...
df4 %>% group_by(Utterance) %>%
#work out relative durations from start of utterance and create subutterances
mutate(relStart = start_pm - min(start_pm),
relEnd = end_pm - min(start_pm),
subNo = map2(relStart, relEnd, ~seq(.x %/% 4000, .y %/% 4000, 1))) %>%
unnest(subNo) %>% #expand one row per subutterance
mutate(Utterance = paste0(Utterance, " (#", subNo + 1, ")"), #add sub no
subStart = pmax(4000 * subNo, relStart), #limits on subUtt
subEnd = pmin(4000 * (subNo + 1), relEnd), #limits on subUtt
start_pm = min(start_pm) + subStart, #redefine start
end_pm = min(start_pm) + subEnd) %>% #redefine end
group_by(Utterance) %>% #regroup as Utterance has changed!
mutate(max_x = min(start_pm) + 4000) %>% #define dummy end points
ungroup() %>%
mutate(Gaze_by = paste0("Gaze_by_", Gaze_by),
Utterance = fct_reorder(str_wrap(paste0(substr(Speaker, 6, 6), ": ", Utterance),
120), start_pm)) %>%
ggplot(...) #...as per code above from this point
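The chunking step can be looked at in isolation. The following base R sketch applies the same integer-division and pmax()/pmin() idea to a single interval; the function name split_interval and its arguments are hypothetical and do not appear in the code above.

```r
# Split one interval from `start` to `end` into pieces no longer than `width`,
# with piece boundaries at multiples of `width` counted from `origin`.
split_interval <- function(start, end, origin, width = 4000) {
  rel_start <- start - origin
  rel_end   <- end - origin
  # Index of the first and last chunk the interval touches.
  chunk <- seq(rel_start %/% width, rel_end %/% width)
  data.frame(
    chunk = chunk + 1,
    start = origin + pmax(width * chunk, rel_start),
    end   = origin + pmin(width * (chunk + 1), rel_end)
  )
}

# A gaze lasting from 3000 ms to 9500 ms after an utterance starting at 0.
pieces <- split_interval(start = 3000, end = 9500, origin = 0)
print(pieces)
```

An interval that ends exactly on a chunk boundary produces a final zero-length piece with this approach; filtering on start < end removes it if that matters for the plot.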

Changing the order of lines plotted so that one is on top for time series in ggplot

I am plotting some time series data with ggplot for a project I'm working on. Here is what the data looks like:
Sample data:
structure(list(Date = c("2015-01-01", "2015-02-01", "2015-03-01",
"2015-04-01"), Actual = c(500L, 600L, 700L, 750L), Fcst1 = c(600L,
610L, 634L, 650L), Fcst2 = c(500L, 630L, 875L, 900L), Fcst3 = c(500L,
600L, 754L, 800L), Fcst4 = c(500L, 600L, 700L, 760L)), class = "data.frame", row.names = c(NA,
-4L))
Date Actual Fcst1 Fcst2 Fcst3 Fcst4
2015-01-01 500 600 500 500 500
2015-02-01 600 610 630 600 600
2015-03-01 700 634 875 754 700
2015-04-01 750 650 900 800 760
.......... ... ... ... ... ...
The data itself goes on for another 60 months and there are 40 forecasts total, adjusted monthly. When I try to plot it, I want the Actual line to be on top, but it ends up being plotted first. Here's the code I am using:
df <- df %>%
mutate(Date = ymd(Date))
colnames(df)[3:length(df)] <-
paste("df", colnames(df[, c(3:length(df))]), sep = "")
ggplot(
tidyr::pivot_longer(df, c(Fcst1:Fcst6, Actual), names_to = "Forecast", names_prefix = "df"),
aes(Date, value, color = Forecast)
) +
geom_line(size = 1) +
scale_color_manual(values = c(
"Fcst1" = "red", "Fcst2" = "blue",
"Fcst3" = "green", "Fcst4" = "yellow", "Fcst5" = "purple",
"Fcst6" = "orange", "Actual" = "black"
)) +
ggtitle(label = "Actuals vs Forecasts", subtitle = "Dataset") +
ylab("Rate") +
scale_y_continuous(labels = scales::comma)
I want to preserve the ordering in the legend, though: Actual at the top, followed by the forecasts in the order listed in scale_color_manual. Right now, Actual is plotted first (which means it's under all the other forecasts), and I want it to be on top (and preferably with a thicker line, maybe size=1.2). Thanks!
Here is a possible solution using the data you included. You have to set the order of the levels of Forecast and modify scale_color_manual() for the legend. I have added a trick for that:
library(tidyverse)
library(lubridate) # provides ymd()
#Data
df <- structure(list(Date = c("2015-01-01", "2015-02-01", "2015-03-01",
"2015-04-01"), Actual = c(500L, 600L, 700L, 750L), Fcst1 = c(600L,
610L, 634L, 650L), Fcst2 = c(500L, 630L, 875L, 900L), Fcst3 = c(500L,
600L, 754L, 800L), Fcst4 = c(500L, 600L, 700L, 760L)), class = "data.frame", row.names = c(NA,
-4L))
#Format date
df <- df %>% mutate(Date = ymd(Date))
#Create data for plot
df2 <- tidyr::pivot_longer(df, c(Fcst1:Fcst4, Actual), names_to = "Forecast", names_prefix = "df")
#Format levels
labs <- unique(df2$Forecast)
i1 <- labs[which(labs=='Actual')]
i2 <- rev(labs[which(labs!='Actual')])
i3 <- c(i2,i1)
df2$Forecast <- factor(df2$Forecast, levels = i3, ordered = TRUE)
#Plot
ggplot(df2,aes(Date, value, color = Forecast)) +
geom_line(size = 1) +
scale_color_manual(values = c(
"Fcst1" = "red", "Fcst2" = "blue",
"Fcst3" = "green", "Fcst4" = "yellow", "Fcst5" = "purple",
"Fcst6" = "orange", "Actual" = "black"
),guide = guide_legend(reverse=TRUE)) +
ggtitle(label = "Actuals vs Forecasts", subtitle = "Dataset") +
ylab("Rate") +
scale_y_continuous(labels = scales::comma)
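The level ordering the answer relies on can be checked without plotting. This is a minimal base R sketch, assuming (as the answer does) that lines for later factor levels are drawn later and therefore end up on top; the variable names are illustrative.

```r
# Series names in the order they appear after pivoting (Actual may be anywhere).
series <- c("Fcst1", "Fcst2", "Fcst3", "Fcst4", "Actual")

# Forecasts reversed, "Actual" last, so that "Actual" is the final level.
draw_order <- c(rev(setdiff(series, "Actual")), "Actual")

forecast <- factor(c("Actual", "Fcst2", "Fcst1"), levels = draw_order)
levels(forecast)

# Reversing the levels gives the order the legend shows with reverse = TRUE.
rev(levels(forecast))
```

Reversing the forecasts inside the levels is what lets guide_legend(reverse = TRUE) list Actual first and then the forecasts in ascending order.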

Comparing "Unlimited" value to numerical values in ggplot

I am trying to make a visual comparison between an input vector and my database. However, the input vector or the database may contain the value "UL", which stands for an unlimited (infinite) quantity. Think of it as an unlimited voice plan, with which you can make an unlimited number of calls.
Here is the code I have used to try to make a visual comparison between "UL" and other numerical values.
# d is the database data.frame, with which we want to compare the input vector
d = structure(list(Type = c("H1", "H2", "H3"),
P1 = c(2000L, 1500L, 1000L),
P2 = c(60L, 40L, 20L),
P3 = c("UL", 3000L, 2000L)),
class = "data.frame",
row.names = c(NA, -3L))
# d2 is the input vector
d2 = structure(list(Type = "New_offre", P1 = 1200L, P2 = "UL", P3 = 2000),
class = "data.frame",
row.names = c(NA, -1L))
#Check if there are some unlimited values in both d and d2
y1 <-rbind(d,d2)
y <- y1
if("UL" %in% y$P3){
max_P3_scale <- max(as.numeric(y[y$P3!="UL","P3"]))
y[y$P3=="UL","P3"]= 2*max_P3_scale
}
if("UL" %in% y$P2){
max_P2_scale <- max(as.numeric(y[y$P2!="UL","P2"]))
y[y$P2=="UL","P2"]= 2*max_P2_scale
}
y <- transform(y,P1=as.numeric(P1),
P2=as.numeric(P2),
P3=as.numeric(P3))
d <- y[1:nrow(d),]
d2<- y[nrow(d)+1,]
d %>% gather(var1, current, -Type) %>%
mutate(new = as.numeric(d2[cbind(rep(1, max(row_number())),
match(var1, names(d2)))]),
slope = factor(sign(current - new), -1:1)) %>%
gather(var2, val, -Type, -var1, -slope) %>%
ggplot(aes(x = factor(var2,levels = c("new","current")), y = val, group = 1)) +
geom_point(aes(fill = var2), shape = 2,size=4) +
geom_line(aes(colour = slope)) +
scale_colour_manual(values = c("green","green", "red")) +
facet_wrap(Type ~ var1,scales = "free")
My first attempt was to check whether there are "UL" values in P2 and P3. If so, I find the maximum numeric value other than "UL" and replace every "UL" occurrence with twice that maximum, so that the plot always shows "UL" as the largest value.
The issue with this is that I can no longer tell actual values and "UL" values apart.
Here is what my plot looks like with this approach (image not included).
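One way to keep the two kinds of values apart, offered here as an untested sketch rather than an answer from the thread, is to record which entries were "UL" in a separate logical column before replacing them. The vector p3 and the column name unlimited are made up for illustration.

```r
# One parameter column in which "UL" stands for "unlimited".
p3 <- c("UL", "3000", "2000", "2000")

# Remember which entries were unlimited before overwriting them.
is_unlimited <- p3 == "UL"

# Largest finite value; "UL" entries are excluded before conversion.
max_finite <- max(as.numeric(p3[!is_unlimited]))

# Stand-in plotting value for "UL": twice the largest finite value.
p3_numeric <- numeric(length(p3))
p3_numeric[!is_unlimited] <- as.numeric(p3[!is_unlimited])
p3_numeric[is_unlimited]  <- 2 * max_finite

result <- data.frame(P3 = p3_numeric, unlimited = is_unlimited)
print(result)
```

The unlimited column could then be mapped to an aesthetic such as shape in geom_point(), so that substituted values look different from real ones.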

R ggplot2/ggmap concentric circles as points

I am trying to plot some information that shows full population and then a subset of that population by location on a map. I've seen data visualizations that use concentric circles or 3-d inverted cones to convey this. I just can't figure out how to do it in ggplot / ggmap
Here's a free hand version in Paint that shows a rough idea of what I'm looking to do:
Here's a rough piece of data for an example:
> dput(df1)
structure(list(zip = c("00210", "00653", "00952", "02571", "04211",
"05286", "06478", "07839", "10090", "11559"), city = c("Portsmouth",
"Guanica", "Sabana Seca", "Wareham", "Auburn", "Craftsbury",
"Oxford", "Greendell", "New York", "Lawrence"), state = c("NH",
"PR", "PR", "MA", "ME", "VT", "CT", "NJ", "NY", "NY"), latitude = c(43.005895,
17.992112, 18.429218, 41.751554, 44.197009, 44.627698, 41.428163,
41.12831, 40.780751, 40.61579), longitude = c(-71.013202, -66.90097,
-66.18014, -70.71059, -70.239485, -72.434398, -73.12729, -74.678956,
-73.977182, -73.73126), timezone = c(-5L, -4L, -4L, -5L, -5L,
-5L, -5L, -5L, -5L, -5L), dst = c(TRUE, FALSE, FALSE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE), totalPop = c(43177, 37224, 37168,
15492, 1614, 88802, 2587, 80043, 78580, 87461), subPop = c(42705,
36926, 27556, 10827, 774, 39060, 1542, 21304, 53438, 2896)), .Names = c("zip",
"city", "state", "latitude", "longitude", "timezone", "dst",
"totalPop", "subPop"), row.names = c(1L, 50L, 200L, 900L, 1500L,
2000L, 2500L, 3000L, 3500L, 4000L), class = "data.frame")
Any suggestions?
The basic idea is to use separate geoms for the two populations, making sure the smaller one is plotted after the larger one, so its layer is on top:
library(ggplot2) # using version 0.9.2.1
library(maps)
# load us map data
all_states <- map_data("state")
# start a ggplot. it won't plot til we type p
p <- ggplot()
# add U.S. states outlines to ggplot
p <- p + geom_polygon(data=all_states, aes(x=long, y=lat, group = group),
colour="grey", fill="white" )
# add total Population
p <- p + geom_point(data=df1, aes(x=longitude, y=latitude, size = totalPop),
colour="#b5e521")
# add sub Population as separate layer with smaller points at same long,lat
p <- p + geom_point(data=df1, aes(x=longitude, y=latitude, size = subPop),
colour="#00a3e8")
# change name of legend to generic word "Population"
p <- p + guides(size=guide_legend(title="Population"))
# display plot
p
From the map, it is clear your data include non-contiguous-US locations, in which case you may want different underlying map data. get_map() from ggmap package provides a couple options:
require(ggmap)
require(mapproj)
map <- get_map(location = 'united states', zoom = 3, maptype = "terrain",
source = "google")
p <- ggmap(map)
After which you add the total and sub Population geom_point() layers and display it as before.

Resources