I would like to use meaningful country centers for labels etc. Here is my code so far:
import cartopy.crs as ccrs
from cartopy.io import shapereader as shpreader
import matplotlib.pyplot as plt
geo_axes = plt.axes((0, 0, 1, 1), projection=ccrs.PlateCarree(central_longitude=0.0, globe=None))
geo_axes.set_global()
file = shpreader.natural_earth(resolution='10m', category='cultural', name='admin_0_countries')
all_countries = list(shpreader.Reader(file).records())
country_centroids = {
    country.attributes['SU_A3']: (country.geometry.centroid.y, country.geometry.centroid.x)
    for country in all_countries if country.attributes['SU_A3']
}
for label in {'DEU', 'FRA', 'USA'}:
    geo_axes.plot(country_centroids[label][1], country_centroids[label][0], marker='x', color='red')
geo_axes.add_geometries(
    [country.geometry for country in all_countries if country.attributes['SU_A3'] in {'DEU', 'FRA', 'USA'}],
    ccrs.PlateCarree(), facecolor='yellow', edgecolor='black', zorder=0)
geo_axes.add_geometries(
    [country.geometry for country in all_countries if country.attributes['SU_A3'] not in {'DEU', 'FRA', 'USA'}],
    ccrs.PlateCarree(), facecolor='0.9', edgecolor='black', zorder=0)
For Germany, e.g., this is straightforward (no overseas regions, etc.) and it works. However, for the USA it is already bad, and for France it is very misleading (i.e. France's centroid lands in Spain).
How can I fix that?
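One direction I tried (a toy sketch with made-up geometry, not yet wired into the cartopy code above): shapely's representative_point() is guaranteed to fall inside the geometry, and taking the centroid of only the largest polygon also keeps the label on the mainland:

```python
from shapely.geometry import MultiPolygon, Polygon

# Hypothetical stand-in for a country like France: a large mainland polygon
# plus a small, far-away overseas polygon. The combined centroid is pulled
# outside the mainland by the overseas part.
mainland = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
overseas = Polygon([(50, 0), (51, 0), (51, 1), (50, 1)])
country = MultiPolygon([mainland, overseas])

# Option 1: a cheaply computed point guaranteed to lie within the geometry.
label_pt = country.representative_point()

# Option 2: centroid of the largest constituent polygon only.
largest = max(country.geoms, key=lambda p: p.area)
label_pt2 = largest.centroid
```

If this is the right approach, the centroid dict comprehension above could use representative_point() (or the largest-polygon centroid) instead of geometry.centroid.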
Let's consider this HexagonLayer example using pydeck in Streamlit:
import numpy as np
import pandas as pd
import pydeck as pdk
import streamlit as st
lat0 = 40.7
lon0 = -74.1201062
n_points = 1000
lat = np.random.normal(loc=lat0, scale=0.02, size=n_points)
lon = np.random.normal(loc=lon0, scale=0.02, size=n_points)
data = pd.DataFrame({'lat': lat, 'lon': lon})
st.pydeck_chart(pdk.Deck(
    map_provider="mapbox",
    initial_view_state=pdk.ViewState(
        latitude=lat0,
        longitude=lon0,
        zoom=10,
    ),
    layers=[
        pdk.Layer(
            'HexagonLayer',
            data=data,
            get_position='[lon, lat]',
            radius=1000,
            coverage=0.6,
        ),
    ],
))
Here's the output:
Is there a way to only display the hexagonal bins with a count above a given threshold, say counts > 5?
Similarly, is it possible to set a logarithmic scale for the color/height of the hexagons?
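As a rough workaround I tried pre-binning the points in pandas (a square-bin approximation of my own, not the actual hex grid) and dropping points that fall in sparse bins before handing the frame to the layer:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_points = 1000
lat = rng.normal(loc=40.7, scale=0.02, size=n_points)
lon = rng.normal(loc=-74.1201062, scale=0.02, size=n_points)
data = pd.DataFrame({"lat": lat, "lon": lon})

# Square-bin approximation of the hex grid; 0.01 degrees is an assumed bin
# size roughly matching the 1000 m radius used above.
bin_deg = 0.01
binned = data.assign(
    lat_bin=(data["lat"] // bin_deg).astype(int),
    lon_bin=(data["lon"] // bin_deg).astype(int),
)
# Per-point count of how many points share the same bin.
counts = binned.groupby(["lat_bin", "lon_bin"])["lat"].transform("size")
# Keep only points whose bin has more than 5 hits.
dense = data[counts > 5]
```

The filtered dense frame can then be passed as data=dense to the HexagonLayer; since square bins only approximate the hexagons, counts near the threshold can differ slightly.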
I'm trying to make a choropleth map in Plotly using some data I have in a CSV file. This is what I get as a result (my map):
Below is the code I have written:
import json
import pandas as pd
import plotly.express as px
asean_country = json.load(open("aseancovidmap.geojson", "r"))
id_map = {}
for feature in asean_country['features']:
    feature['id'] = feature['properties']['sform']
    id_map[feature['properties']['name']] = feature['id']
df = pd.read_csv("covidcases.csv")
df["iso-2"] = df['Country'].apply(lambda x: id_map[x])
figure = px.choropleth(
    df,
    locations='iso-2',
    locationmode='country names',
    geojson=asean_country,
    color='Ttlcases',
    scope='asia',
    title='Total COVID 19 cases in ASEAN Countries as on 10/1/2022',
)
figure.show()
Clearly I don't have access to your files, so I have sourced geometry and COVID data; for reference, this is at the end of the answer.
The key change I have made: don't loop over the geojson. Instead, define locations as a column in the dataframe and use featureidkey.
Clearly this is now coloring the countries.
solution
import json
import pandas as pd
import plotly.express as px
# asean_country = json.load(open("aseancovidmap.geojson","r"))
asean_country = gdf_asean.rename(columns={"adm0_a3": "iso_a2"}).__geo_interface__
# df= pd.read_csv("covidcases.csv")
df = gdf_asean_cases.loc[:, ["iso_code", "adm0_a3", "total_cases", "date"]].rename(
columns={"iso_code": "iso_a2", "total_cases": "Ttlcases"}
)
figure = px.choropleth(
df,
locations="iso_a2",
featureidkey="properties.iso_a2",
geojson=asean_country,
color="Ttlcases",
title="Total COVID 19 cases in ASEAN Countries as on 10/1/2022",
).update_geos(fitbounds="locations", visible=True).update_layout(
    margin={"t": 40, "b": 0, "l": 0, "r": 0}
)
figure.show()
data sourcing
import requests, io
import geopandas as gpd
import pandas as pd
# get asia geometry
gdf = gpd.read_file(
"https://gist.githubusercontent.com/hrbrmstr/94bdd47705d05a50f9cf/raw/0ccc6b926e1aa64448e239ac024f04e518d63954/asia.geojson"
)
# get countries that make up ASEAN
df = pd.read_html("https://en.wikipedia.org/wiki/List_of_ASEAN_countries_by_GDP")[1].loc[1:]
# no geometry for singapore.... just ASEAN geometry
gdf_asean = (
gdf.loc[:, ["admin", "adm0_a3", "geometry"]]
.merge(
df.loc[:, ["Country", "Rank"]], left_on="admin", right_on="Country", how="right"
)
)
# get COVID data
dfall = pd.read_csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv")
# filter to last date in data
dfall["date"] = pd.to_datetime(dfall["date"])
dflatest = dfall.groupby(["iso_code"], as_index=False).last()
# merge geometry and COVID data
gdf_asean_cases = gdf_asean.merge(
dflatest.loc[:, ["iso_code", "total_cases", "date"]], left_on="adm0_a3", right_on="iso_code"
)
Pardon me if this is a basic question; it is my first time writing here, so my thanks in advance.
I have exported a report from Google Analytics with columns Longitude, Latitude and Sessions, and I want to add these data points to a polygon map I have created in R for the administrative regions of Slovakia.
This is what I have for now.
##Load the Raster Library
library(raster)
##Get the Province Shapefile for Slovakia
slovakia_level_1 <- getData('GADM', country='SVK', level=1)
slovakia_level_2 <- getData('GADM', country='SVK', level=2)
##Plot this shapefile
plot(slovakia_level_1)
library(ggmap) ## load the ggmap package
## read our dataset with sessions from Google Analytics (more on how to read Excel files: http://www.sthda.com/english/wiki/reading-data-from-excel-files-xls-xlsx-into-r)
library(readxl) ## this is the dataframe from Google Analytics that I would like to plot on the Slovakia administrative region map
lugera <- read_excel("Analytics 01. [Lugera.sk] - [Reporting View] - [Filtered Data] New Custom Report 20190101-20190627.xlsx")
But I really do not know how to move on. I followed this article http://data-analytics.net/wp-content/uploads/2014/09/geo2.html but got stuck when I needed to plot the points.
This is a sample how the report from google analytics looks like:
Longitude Latitude Sessions
17.1077 48.1486 25963
0.0000 0.0000 13366
21.2611 48.7164 4732
18.7408 49.2194 3154
21.2393 49.0018 2597
18.0335 48.8849 2462
19.1462 48.7363 2121
17.5833 48.3709 1918
18.0764 48.3061 1278
14.4378 50.0755 1099
20.2954 49.0511 715
18.1571 47.9882 663
18.6245 48.7745 653
17.8272 48.5918 620
18.9190 49.0617 542
19.1371 48.5762 464
-6.2603 53.3498 369
18.1700 48.5589 369
20.5637 48.9453 325
-0.1278 51.5074 284
21.9184 48.7557 258
Can someone help me progress from here? I am struggling to figure out how to plot those points on the polygon map.
Is it also possible to create a heat map over particular regions as well, please?
I hope it was clear; if not, please tell me and I will improve my question, as this is my first time asking.
Thank you very much!
UPDATE
I was trying to reproduce jay.sf's answer, and the first map with the red dots works awesome! Thanks!
But in the case of the heat map I am getting several errors and cannot reproduce the same map.
Below is my code; I am not sure where the issue is, as I tried to name my dataframe ses, the same way as in jay.sf's answer.
##Load the Raster Library
library(raster) # imports library(sp)
slovakia_level_1 <- getData('GADM', country='SVK', level=1)
##Plot
plot(slovakia_level_1)
points(coordinates(slovakia_level_2), pch=20, col="red")
#ses is my google analytics dataframe where all 3 columns Longitude, Latitude and Sessions are numeric
## it is imported excel file to r and stored as a dataframe
ses
spdf <- SpatialPointsDataFrame(coords=ses[1:2], data=ses[3],
proj4string=CRS(proj4string(slovakia_level_2)))
ppl.sum <- aggregate(x=spdf["Sessions"], by=slovakia_level_2, FUN=sum)
spplot(ppl.sum, "Sessions", main="Sessions in Slovakia")
These are the errors I am getting
spdf <- SpatialPointsDataFrame(coords=ses[1:2], data=ses[3],
+ proj4string=CRS(proj4string(slovakia_level_2)))
Error in proj4string(slovakia_level_2) :
object 'slovakia_level_2' not found
> ppl.sum <- aggregate(x=spdf["Sessions"], by=slovakia_level_2, FUN=sum)
Error in aggregate(x = spdf["Sessions"], by = slovakia_level_2, FUN = sum) :
object 'spdf' not found
> spplot(ppl.sum, "Sessions", main="Sessions in Slovakia")
Error in spplot(ppl.sum, "Sessions", main = "Sessions in Slovakia") :
object 'ppl.sum' not found
Please accept my huge thanks for being so helpful on my first question; I cannot express my respect for the people at Stack Overflow.
Thank you
Actually, there's a coordinates() function included in the sp package (imported by raster), with which we can easily add the points to the plot.
library(raster) # imports library(sp)
slovakia_level_1 <- getData('GADM', country='SVK', level=1)
slovakia_level_2 <- getData('GADM', country='SVK', level=2)
##Plot
plot(slovakia_level_1)
points(coordinates(slovakia_level_2), pch=20, col="red")
To get a heatmap using your Google Analytics data (here ses) we can use spplot(), also included in sp. First we need to create a SpatialPointsDataFrame which, according to this post on gis.stackexchange, we aggregate to match the ses$Sessions points to the polygons from slovakia_level_2.
spdf <- SpatialPointsDataFrame(coords=ses[1:2], data=ses[3],
proj4string=CRS(proj4string(slovakia_level_2)))
ppl.sum <- aggregate(x=spdf["Sessions"], by=slovakia_level_2, FUN=sum)
spplot(ppl.sum, "Sessions", main="Sessions in Slovakia")
Result
Data
# your data from google analytics above
ses <- structure(list(Longitude = c(17.1077, 0, 21.2611, 18.7408, 21.2393,
18.0335, 19.1462, 17.5833, 18.0764, 14.4378, 20.2954, 18.1571,
18.6245, 17.8272, 18.919, 19.1371, -6.2603, 18.17, 20.5637, -0.1278,
21.9184), Latitude = c(48.1486, 0, 48.7164, 49.2194, 49.0018,
48.8849, 48.7363, 48.3709, 48.3061, 50.0755, 49.0511, 47.9882,
48.7745, 48.5918, 49.0617, 48.5762, 53.3498, 48.5589, 48.9453,
51.5074, 48.7557), Sessions = c(25963L, 13366L, 4732L, 3154L,
2597L, 2462L, 2121L, 1918L, 1278L, 1099L, 715L, 663L, 653L, 620L,
542L, 464L, 369L, 369L, 325L, 284L, 258L)), row.names = c(NA,
-21L), class = "data.frame")
The simplest way to do it would be this (slov_df is your dataset):
library(sp)
library(ggplot2)
slov_reg <- fortify(slovakia_level_2)
ggplot() +
geom_polygon(data = slov_reg, aes(x = long, y = lat, group = group), col = "black", fill = NA) +
geom_point(data = slov_df, aes(x = Longitude, y = Latitude))
EDIT:
Nice solution by jay.sf. If you like this let me provide another option:
library(plyr) # for join()
sp_google <- SpatialPointsDataFrame(coords=slov_df[1:2], data=slov_df[3],
                                    proj4string=CRS(proj4string(slovakia_level_2)))
slovakia_level_2@data$Sessions <- over(slovakia_level_2, sp_google, fn = sum)$Sessions
slovakia_level_2@data$id <- row.names(slovakia_level_2@data)
slov_reg <- fortify(slovakia_level_2, region = "id")
slov_reg <- join(slov_reg, slovakia_level_2@data, by = "id")
ggplot() +
geom_polygon(data = slov_reg, aes(x = long, y = lat, group = group, fill = Sessions), col = "black") +
scale_fill_gradient(low = "yellow", high = "red", na.value = "lightgrey") +
theme_bw()
It's a little bit more work, but in the end ggplot offers you a much wider range of customization options. It's a question of your preference.
I'm trying to use tm_facets to display data (in this case on maize yields) in 2005, 2050, and 2080. The test.RDS file is available here.
library(tmap)
map.temp <- readRDS("test.RDS")
title <- "Maize rainfed yield <br> (mt/ha)"
legend_title <- "(mt/ha)"
breaks <- c(1.0, 2139.2, 4277.5, 6415.8, 8554)
tm_shape(map.temp) +
tm_polygons(col = "value", title = legend_title) +
tm_facets(by = "year") +
tm_layout(main.title = title) +
tm_view(view.legend.position = c("left", "bottom"))
The code above does this, but displays the data in the wrong polygons and wrong years. To see this, run the script and click the dark red area in northeast Canada. The popup in all three maps says AMR_RUS with a value of 5,634, but the colors are different. View the map.temp file (I'm using RStudio for all of this) and filter on FPU for AMR_RUS. The 2005 value is 6,047, 2050 is 5,634 and 2080 is 4,406 (climate change will reduce yields in this area). Next look at the first couple of entries in the geometry column. The lat/long coordinates are for a region along the Chinese-Russian border. The Amur River makes up that border, and the AMR_RUS FPU (food production unit) is to the north of the Amur River in Russia.
Is the problem with my code or data or the tm_facet function in tmap?
Unfortunately, I can't figure out a solution with tmap, and I am not sure why it misplaces the polygon names and values in the popup. UPDATE: it seems this was a tmap bug, which was immediately fixed - see tmap issue 268.
I know you asked for a tmap solution, but, alternatively, it could be worth exploring a solution with mapview - check this out and see if it works for you:
library(mapview)
breaks <- c(1.0, 2139.2, 4277.5, 6415.8, 8554)
m_2005 <- mapview(map.temp[map.temp$year == 2005, ],
zcol = "value",
at = breaks,
layer.name = "2005 - mt/ha")
m_2050 <- mapview(map.temp[map.temp$year == 2050, ],
zcol = "value",
at = breaks,
layer.name = "2050 - mt/ha")
m_2080 <- mapview(map.temp[map.temp$year == 2080, ],
zcol = "value",
at = breaks,
layer.name = "2080 - mt/ha")
sync(m_2005, m_2050, m_2080) # add ncol = 1, if you wish 1 column representation
I've been mapping a choropleth map, and when I compare the colours drawn to the numbers they're assigned to, they don't fit.
so here is my data
zip latitude longitude count2.x count2.freq reg colorBuckets colma
99501 61.21680 -149.87828 AK 67 Alaska 1 #EDF8FB
35010 32.90343 -85.92669 AL 1582 Alabama 3 #99D8C9
90001 33.97291 -118.24878 CA 20970 California 6 #006D2C
20001 38.90771 -77.01732 DC 952 NA 2 #CCECE6
So this is the code I have been using from the beginning:
library("zipcode")
library("maps")
library("plyr")
library("RColorBrewer")
colors=brewer.pal(6, "BuGn")
data(zipcode)
merge1<-merge(zipcode, tel2, by.x='zip', by.y='zip_code', all.y=TRUE)
result<- ddply(merge1, .(state), transform, count2 = count(state))
#remove NA's#
final<- result[complete.cases(result),]
#remove duplicates#
nodupl <- final[!duplicated(final$state),]
#add state to abbreviations#
nodupl$reg<-state.name[match(nodupl$count2.x, state.abb)]
#determine intervals#
nodupl$colorBuckets<- as.numeric(cut(nodupl$count2.freq, c(0,500,1000,5000,10000,15000,22000)))
#interval legend#
text<- c("< 500", "500 - 999", "1000 - 4999","5000 - 9999", "10000 - 14999", "15000 - 22000")
#see what color is assign to where#
nodupl$colma<- colors[nodupl$colorBuckets]
map("state",regions=nodupl$reg, exact=FALSE, col = colors[nodupl$colorBuckets], fill = TRUE,resolution = 0,lty = 0)
map("state",col = "black",fill=FALSE,add=TRUE,lty=1,lwd=1)
#legend plotten#
par(xpd=TRUE)
legend("bottomleft", text, horiz=FALSE, fill=colors)
So the problem again: in the legend the colours are assigned correctly, but if I double-check the numbers in my column (count2.freq) against the colours on the map, they don't fit. E.g., California is really light coloured but it should be dark. Does anyone see what has been done wrong? Also, I have some trouble positioning the legend: the way I have it, it sits right on the map, so I don't see the map anymore. What could I do about that?
Thanks for your help, even though it's Saturday.
The region Alaska is not available in the map function. Hence, your map shows the second and third entry of nodupl$reg (i.e., Alabama and California). But the first and second of your colours are used.
To print these states in the desired colours, use the command
map("state", regions=nodupl$reg[2:3], exact=FALSE,
col = colors[nodupl$colorBuckets][2:3], fill = TRUE,resolution = 0,lty = 0)
But I recommend looking for a map that includes Alaska, too.