I am trying to place dots on a geographical map of Italy showing the province ('provincia') our patients come from. Ideally, the dot size should be proportional to the number of patients coming from that 'provincia'. An example of the list I would like to plot is the following.
MI 8319
CO 537
MB 436
VA 338
BG 310
PV 254
CR 244
NO 210
RM 189
CS 179
The first column holds the 'provincia' code: MI (Milano), CO (Como), MB (Monza-Brianza), etc. The second column holds the number of patients from that 'provincia'. So the output should be an Italian political map where the biggest dot is around the city of Milano (MI), the second biggest dot is near the city of Como (CO), the third one is around Monza-Brianza (MB), etc.
Is there any package that could do the plot I am looking for? I found a tool that could do the job here, but apparently they expect that I load the geographical coordinates in order to do the plot.
https://www.littlemissdata.com/blog/maps
Thanks in advance.
Here is one way to handle your task. You have the abbreviations for the Italian provinces, and you can use them to merge your data with polygon data. If you download Italy's polygons from GADM, the data contain those abbreviations in the column HASC_2 (with an "IT." prefix). Merge your data with the polygon data on that column, then create a second data set that contains the centroid of each province. You can draw the map from the two data sets: the polygons for the outlines and the centroids for the dots.
library(tidyverse)
library(sf)
library(ggthemes)
# Get the sf file from https://gadm.org/download_country_v3.html
# and import it in R.
mysf <- readRDS("gadm36_ITA_2_sf.rds")
# This is your data, which is called mydata.
mydata <- structure(list(abbs = c("MI", "CO", "MB", "VA", "BG", "PV", "CR",
"NO", "RM", "CS"), value = c(8319L, 537L, 436L, 338L, 310L, 254L,
244L, 210L, 189L, 179L)), class = "data.frame", row.names = c(NA,
-10L))
abbs value
1 MI 8319
2 CO 537
3 MB 436
4 VA 338
5 BG 310
6 PV 254
7 CR 244
8 NO 210
9 RM 189
10 CS 179
# Abbreviations are in HASC_2 in mysf. Manipulate strings so that
# I can join mydata with mysf with the abbreviations. I also get
# longitude and latitude with st_centroid(). This data set is for
# geom_point().
mysf2 <- mutate(mysf, HASC_2 = sub(x = HASC_2, pattern = "^IT.", replacement = "")) %>%
left_join(mydata, by = c("HASC_2" = "abbs")) %>%
mutate(lon = map_dbl(geometry, ~st_centroid(.x)[[1]]),
lat = map_dbl(geometry, ~st_centroid(.x)[[2]]))
# Draw a map
ggplot() +
geom_sf(data = mysf) +
geom_point(data = mysf2, aes(x = lon, y = lat, size = value)) +
theme_map()
UPDATE ON INSET MAP
This is an update following several suggestions to use inset maps, which I think would be the best solution given your question and comments:
library(sf)
library(cartography)
EU = st_read("~/R/mapslib/EUROSTAT/NUTS_RG_03M_2016_3035_LEVL_3.geojson")
IT = subset(EU, CNTR_CODE == "IT")
mydata <-
structure(list(
abbs = c("MI", "CO", "MB", "VA", "BG", "PV", "CR",
"NO", "RM", "CS"),
value = c(8319L, 537L, 436L, 338L, 310L, 254L,
244L, 210L, 189L, 179L),
nuts = c("ITC4C","ITC42","ITC4D","ITC41",
"ITC46", "ITC48","ITC4A","ITC15",
"ITI43","ITF61")
),
class = "data.frame",
row.names = c(NA, -10L))
patients = merge(IT, mydata, by.x = "id", by.y = "nuts")
#Get breaks for map
br=getBreaks(patients$value)
#Delimit zone
#Based on NUTS1, Northwest Italy
par(mar=c(0,0,0,0))
ghostLayer(IT[grep("ITC",IT$NUTS_ID),], bg="lightblue")
plot(st_geometry(EU), col="grey90", add=TRUE)
plot(st_geometry(IT), col = "#FEFEE9", border = "#646464", add=TRUE)
choroLayer(
patients,
var = "value",
breaks = br,
col = carto.pal(pal1 = "red.pal", n1 = length(br)-1),
legend.pos = "topleft",
legend.title.txt = "Total patients",
add = TRUE,
legend.frame = TRUE
)
labelLayer(patients,txt="abbs", halo=TRUE, overlap = FALSE)
#Inset
par(
fig = c(0, 0.4, 0.01, 0.4),
new = TRUE
)
inset=patients[patients$abbs %in% c("RM","CS"),]
ghostLayer(inset, bg="lightblue")
plot(st_geometry(EU), col="grey90", add=TRUE)
plot(st_geometry(IT), col = "#FEFEE9", border = "#646464", add=TRUE)
choroLayer(
patients,
var = "value",
breaks = br,
col = carto.pal(pal1 = "red.pal", n1 = length(br)-1),
legend.pos = "n",
add = TRUE
)
labelLayer(patients,txt="abbs", halo=TRUE, overlap = FALSE)
box(which = "figure", lwd = 1)
#RESTORE PLOT
par(fig=c(0,1,0,1))
OLD ANSWER
Following my comment on plotting labels: circles may not be the best option for your map, given how concentrated the data are. I suggest using another kind of map instead, such as a choropleth (choroLayer). I borrowed the data frame from the answer by https://stackoverflow.com/users/3304471/jazzurro.
library(sf)
library(cartography)
EU = st_read("~/R/mapslib/EUROSTAT/NUTS_RG_03M_2016_3035_LEVL_3.geojson")
IT = subset(EU, CNTR_CODE == "IT")
mydata <-
structure(list(
abbs = c("MI", "CO", "MB", "VA", "BG", "PV", "CR",
"NO", "RM", "CS"),
value = c(8319L, 537L, 436L, 338L, 310L, 254L,
244L, 210L, 189L, 179L),
nuts = c("ITC4C","ITC42","ITC4D","ITC41",
"ITC46", "ITC48","ITC4A","ITC15",
"ITI43","ITF61")
),
class = "data.frame",
row.names = c(NA, -10L))
patients = merge(IT, mydata, by.x = "id", by.y = "nuts")
#Option 1 - With circles
par(mar = c(0, 0, 0, 0))
plot(st_geometry(IT), col = "#FEFEE9", border = "#646464")
propSymbolsLayer(
x = patients,
var = "value",
col = carto.pal(pal1 = "red.pal", n1 = 6),
legend.title.txt = "Total patients",
add = TRUE
)
#Option 2 - Chorolayer with labels
par(mar = c(0, 0, 0, 0))
plot(st_geometry(IT), col = "#FEFEE9", border = "#646464")
choroLayer(
patients,
var = "value",
col = carto.pal(pal1 = "red.pal", n1 = 6),
legend.title.txt = "Total patients",
add = TRUE
)
#Create labels
patients$label = paste(patients$abbs, patients$value, sep = " - ")
labelLayer(
patients,
txt = "label",
overlap = FALSE,
halo = TRUE,
show.lines = TRUE
)
Data from
https://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/nuts-2016-files.html
I'm new to coding with R and I'm doing some analysis with categorical variables.
My data frame consists of answers from single respondents. I want to analyse the data by area. The data frame looks like this:
structure(list(area = c("chicago", "portland", "chicago", "detroit" ),
a1 = c("good", "bad", "good", "bad"),
a2 = c("good", "bad", "good", "bad"),
a3 = c("bad", "bad", "bad", "bad"),
weight = c(140.626215, 111.285163, 132.497397, 129.510583),
strata = c("male_ch_20", "female_po_40", "male_ch_70", "male_po_30")),
row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))
This is what I'm doing so far to get the percentages for all the areas together, one answer variable at a time:
my_data_new <- my_data %>%
group_by(area, answer2) %>%
summarize(share = survey_mean()) %>%
ungroup() %>%
my_own_function() %>%
arrange(answer2)
my_data_new
and then plot this (the y value is produced by my_own_function)
ggplot() +
geom_bar(data = my_data_new, aes(x = area, y = percentage, fill = answer2), stat = "identity")
So far I've been typing manually the answer variable (answer1, then answer2, then answer3 and so on) into the group_by and also into the ggplot function but I would like to use a loop to do so.
I've done this to access the individual areas but don't know how to continue from here.
list_of_tempdata <- list()
unique_areas <- unique(my_data_new$area)
for(i in seq_along(unique_areas)){
list_of_tempdata[[i]] <- my_data_new[my_data_new$area==unique_areas[i],]
}
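One possible way to avoid typing each answer variable by hand is to keep the column names in a character vector and apply the same summary to each of them. The sketch below is only an illustration under stated assumptions: it uses base R and plain weighted shares (sum of weights per answer divided by sum of weights per area), so it does not reproduce the standard errors of survey_mean(), and the helper name weighted_share is hypothetical. my_own_function() from the question is not reproduced.

# Data from the question (strata column omitted for brevity)
my_data <- data.frame(
  area   = c("chicago", "portland", "chicago", "detroit"),
  a1     = c("good", "bad", "good", "bad"),
  a2     = c("good", "bad", "good", "bad"),
  a3     = c("bad", "bad", "bad", "bad"),
  weight = c(140.626215, 111.285163, 132.497397, 129.510583)
)

# Answer columns to loop over
answer_cols <- c("a1", "a2", "a3")

# Hypothetical helper: weighted share of each answer within each area
weighted_share <- function(data, answer_col) {
  # Build the formula weight ~ area + <answer_col> and sum the weights
  totals <- xtabs(reformulate(c("area", answer_col), response = "weight"),
                  data = data)
  # Turn the sums into row-wise proportions (each area sums to 1)
  shares <- prop.table(totals, margin = 1)
  out <- as.data.frame(shares, responseName = "share")
  names(out)[2] <- "answer"
  out$question <- answer_col
  out
}

# One summary table per answer column, named after the column
results <- lapply(answer_cols, weighted_share, data = my_data)
names(results) <- answer_cols

Each element of results can then be passed to the same ggplot() call inside a second loop or lapply(), with x = area, y = share and fill = answer.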
I want to plot parts of a raster (the highest values) on top of a previously plotted map. Here is my code:
ggplot() + geom_map(data = GERblue_iran, aes(map_id = state, fill = veggies), map = iran_states) +
expand_limits(x = iran_states$long, y = iran_states$lat) +
geom_polygon(data=iran_states, aes(x= long, y= lat, group=group), color='grey40', lwd=0.3, fill=NA) +
scale_fill_distiller(palette="PuBu")
resulting in this plot:
And the code for the raster overlay:
ggplot() + geom_tile(data=nut_iran.df, aes(x = x, y = y, fill = layer)) + scale_fill_distiller(palette = "Reds")
My expected result should look (more or less) like this (result made in photoshop!!):
showing only values (of raster) above +/-1000.
Does anyone know how to plot like this in ggplot2?
For reproducibility, here is a bit of my data:
library(ggplot2)
library(raster) # provides getData()
# polygons
iran <- getData("GADM", country = "IRN", level = 1)
iran_states <- fortify(iran, region = "NAME_1")
# data for choropleth map:
GERblue_iran <- structure(list(veggies = c(NA, 1135142.7169744, 1064475.14405642,
579007.139090945, 2291173.06203667, 1609487.86612194, 5514745.42173307,
210033.193615536, NA, 1082275.82518455, 395053.664034339, 833546.886449334,
1350410.79594876, 2030498.45168616, 5018327.9046678, 413119.296060151,
853322.135586823, 2136776.14200603, 581494.047168068, 535593.624579909,
414310.523642145, NA, NA, 2156369.86690811, 274390.590608389,
546804.909031463, 144406.95766963, 285002.432443622, 1605244.30546598,
307546.827903725, 589330.238261654), fruits = c(NA, 19645300.9396573,
39318516.6693754, 15154130.3351692, 38374281.8287458, 29164989.9985857,
125240289.719822, 6419392.00424945, NA, 23342736.5294504, 9223806.43987587,
19008972.7788205, 62709223.291618, 41703691.9392781, 164306013.73773,
13518682.0729514, 17420595.3934969, 44462391.2814304, 11715807.4374495,
13475005.0070146, 12228946.6624824, NA, NA, 63708363.0757236,
9221772.9477743, 13545791.4738047, 4268610.12809181, 6496251.74039526,
31651316.7352119, 8570276.47057257, 10288282.5059752), nuts = c(NA,
108771666.285736, 188713516.14938, 84626256.2539182, 227028948.551643,
165167762.523232, 669060935.751113, 17905599.826691, NA, 124536243.958677,
62369588.7036755, 123253859.776379, 3137384087.58166, 279956412.016931,
506078060.03775, 70275261.7698334, 115596090.695869, 284469721.056207,
73232219.7923014, 47691287.4633623, 73453936.1223698, NA, NA,
382631908.316345, 78226462.4088062, 60449633.6571361, 25656409.607032,
36523271.0224757, 233944364.555385, 158201233.377931, 74033085.0714528
), state = c("Alborz", "Ardebil", "Bushehr", "Chahar Mahall and Bakhtiari",
"East Azarbaijan", "Esfahan", "Fars", "Gilan", "Golestan", "Hamadan",
"Hormozgan", "Ilam", "Kerman", "Kermanshah", "Khuzestan", "Kohgiluyeh and Buyer Ahmad",
"Kordestan", "Lorestan", "Markazi", "Mazandaran", "North Khorasan",
"Qazvin", "Qom", "Razavi Khorasan", "Semnan", "Sistan and Baluchestan",
"South Khorasan", "Tehran", "West Azarbaijan", "Yazd", "Zanjan"
)), .Names = c("veggies", "fruits", "nuts", "state"), row.names = c(NA,
31L), class = "data.frame")
Unfortunately, the raster data set is too large to share, and I don't know how to create a random raster with this country's extent. In case someone does know how, here is the transformation I applied in order to plot it with geom_tile():
nut_iran.spdf <- as(nut_iran, "SpatialPixelsDataFrame")
nut_iran.df <- as.data.frame(nut_iran.spdf)
Thanks for your ideas.
EDIT: As @spacedman suggested, I managed to plot the raster the way I want by setting all values of nut_iran.df$layer below 1000 to NA and adding na.value="transparent" to scale_fill_distiller. The result, which is fine:
However, plotting both together delivers:
Because of the two different fills (as pointed out by @hrbrmstr), the colors are taken from the last scale (red), and the existing NAs in the choropleth map are also set to transparent.
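The thresholding step described in the EDIT can be sketched as follows. Since the real raster is not available, the data frame here is a randomly generated stand-in with the same column names (x, y, layer); its extent and value range are assumptions made purely for illustration.

# Stand-in for nut_iran.df: a small grid with random values (assumed data)
set.seed(42)
nut_iran.df <- expand.grid(x = seq(44, 63, by = 1), y = seq(25, 40, by = 1))
nut_iran.df$layer <- runif(nrow(nut_iran.df), min = 0, max = 2000)

# Keep only the high values: everything below 1000 becomes NA,
# so a fill scale with na.value = "transparent" will not draw those cells
threshold <- 1000
nut_iran.df$layer[nut_iran.df$layer < threshold] <- NA

# Number of cells that will remain visible
visible_cells <- sum(!is.na(nut_iran.df$layer))

This covers only the masking of low values. It does not address the conflict between the two fill scales described above.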
My data looks something like this:
There are 10,000 rows, each representing a city, with one column for every month from 1998-01 to 2013-09:
RegionName| State| Metro| CountyName| 1998-01| 1998-02| 1998-03
New York| NY| New York| Queens| 1.3414| 1.344| 1.3514
Los Angeles| CA| Los Angeles| Los Angeles| 12.8841| 12.5466| 12.2737
Philadelphia| PA| Philadelphia| Philadelphia| 1.626| 0.5639| 0.2414
Phoenix| AZ| Phoenix| Maricopa| 2.7046| 2.5525| 2.3472
I want to be able to plot all months since 1998 for any one city, or for several cities at once.
I tried this but I get an error. I am not sure if I am even attempting this right. Any help will be appreciated. Thank you.
forecl <- ts(forecl, start=c(1998, 1), end=c(2013, 9), frequency=12)
plot(forecl)
Error in plotts(x = x, y = y, plot.type = plot.type, xy.labels = xy.labels, :
cannot plot more than 10 series as "multiple"
You might try
require(reshape)
require(ggplot2)
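# id.vars must name the identifier columns actually present in your data
forecl <- melt(forecl, id.vars = c("region","state","city"), variable_name = "month")
# Month labels such as "1998-01" have no day, so append one before calling as.Date()
forecl$month <- as.Date(paste0(forecl$month, "-01"))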
ggplot(forecl, aes(x = month, y = value, color = city)) + geom_line()
To add to @JLLagrange's answer, you might want to pass city through facet_grid() if there are so many cities that the colors become hard to distinguish.
ggplot(forecl, aes(x = month, y = value, color = city, group = city)) +
geom_line() +
facet_grid( ~ city)
Could you provide an example of your data, e.g. dput(head(forecl)), before converting to a time-series object? The problem might also be with the ts object.
In any case, I think there are two problems.
First, the data are in wide format. I'm not sure about your column names, since they should start with a letter, but in any case, the general idea would be to do something like this:
test <- structure(list(
city = structure(1:2, .Label = c("New York", "Philly"),
class = "factor"), state = structure(1:2, .Label = c("NY",
"PA"), class = "factor"), a2005.1 = c(1, 1), a2005.2 = c(2, 5
)), .Names = c("city", "state", "a2005.1", "a2005.2"), row.names = c(NA,
-2L), class = "data.frame")
test.long <- reshape(test, varying=c(3:4), direction="long")
Second, I think you are trying to plot too many cities at the same time. Try:
plot(forecl[, 1])
or
plot(forecl[, 1:5])
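As a sketch of the wide-to-long idea applied to month columns named like those in the question, the example below uses base R only. The two-row data frame is a cut-down copy of the question's sample (identifier columns other than RegionName are dropped), and it assumes the month columns are literally named "1998-01", "1998-02", and so on, which requires check.names = FALSE when the data are created or read.

# Cut-down copy of the question's sample data (assumed column names)
forecl <- data.frame(
  RegionName = c("New York", "Los Angeles"),
  `1998-01`  = c(1.3414, 12.8841),
  `1998-02`  = c(1.344, 12.5466),
  `1998-03`  = c(1.3514, 12.2737),
  check.names = FALSE
)

# Pick out the month columns by their YYYY-MM pattern
month_cols <- grep("^[0-9]{4}-[0-9]{2}$", names(forecl), value = TRUE)

# Reshape to long format: one row per city and month
forecl_long <- reshape(
  forecl,
  direction = "long",
  varying   = month_cols,
  v.names   = "value",
  timevar   = "month",
  times     = month_cols,
  idvar     = "RegionName"
)

# "1998-01" has no day, so append one to get a proper Date
forecl_long$month <- as.Date(paste0(forecl_long$month, "-01"))

The long data frame has one value column and a real Date column, so a single city can be selected with an ordinary subset before plotting.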
I am trying to plot the full population and a subset of that population by location on a map. I've seen data visualizations that use concentric circles or 3-D inverted cones to convey this. I just can't figure out how to do it in ggplot / ggmap.
Here's a free hand version in Paint that shows a rough idea of what I'm looking to do:
Here's a rough piece of data for an example:
> dput(df1)
structure(list(zip = c("00210", "00653", "00952", "02571", "04211",
"05286", "06478", "07839", "10090", "11559"), city = c("Portsmouth",
"Guanica", "Sabana Seca", "Wareham", "Auburn", "Craftsbury",
"Oxford", "Greendell", "New York", "Lawrence"), state = c("NH",
"PR", "PR", "MA", "ME", "VT", "CT", "NJ", "NY", "NY"), latitude = c(43.005895,
17.992112, 18.429218, 41.751554, 44.197009, 44.627698, 41.428163,
41.12831, 40.780751, 40.61579), longitude = c(-71.013202, -66.90097,
-66.18014, -70.71059, -70.239485, -72.434398, -73.12729, -74.678956,
-73.977182, -73.73126), timezone = c(-5L, -4L, -4L, -5L, -5L,
-5L, -5L, -5L, -5L, -5L), dst = c(TRUE, FALSE, FALSE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE), totalPop = c(43177, 37224, 37168,
15492, 1614, 88802, 2587, 80043, 78580, 87461), subPop = c(42705,
36926, 27556, 10827, 774, 39060, 1542, 21304, 53438, 2896)), .Names = c("zip",
"city", "state", "latitude", "longitude", "timezone", "dst",
"totalPop", "subPop"), row.names = c(1L, 50L, 200L, 900L, 1500L,
2000L, 2500L, 3000L, 3500L, 4000L), class = "data.frame")
Any suggestions?
The basic idea is to use separate geoms for the two populations, making sure the smaller one is plotted after the larger one, so its layer is on top:
library(ggplot2) # using version 0.9.2.1
library(maps)
# load us map data
all_states <- map_data("state")
# start a ggplot. it won't plot til we type p
p <- ggplot()
# add U.S. states outlines to ggplot
p <- p + geom_polygon(data=all_states, aes(x=long, y=lat, group = group),
colour="grey", fill="white" )
# add total Population
p <- p + geom_point(data=df1, aes(x=longitude, y=latitude, size = totalPop),
colour="#b5e521")
# add sub Population as separate layer with smaller points at same long,lat
p <- p + geom_point(data=df1, aes(x=longitude, y=latitude, size = subPop),
colour="#00a3e8")
# change name of legend to generic word "Population"
p <- p + guides(size=guide_legend(title="Population"))
# display plot
p
From the map, it is clear your data include non-contiguous-US locations, in which case you may want different underlying map data. get_map() from ggmap package provides a couple options:
require(ggmap)
require(mapproj)
map <- get_map(location = 'united states', zoom = 3, maptype = "terrain",
source = "google")
p <- ggmap(map)
After which you add the total and sub Population geom_point() layers and display it as before.