I have a specific question: How can I choose either the fill or the colour of a ggplot according to the data in a SpatialPolygonsDataFrame object? For example, consider the following SpatialPolygonsDataFrame sf:
sf <- readShapePoly("somePolygonShapeFile")
It allows me to access the example data field FK like this:
sf$FK # or
sf@data$FK
Now, I want to prepare a simple ggplot:
p <- ggplot(sf, aes(x=long, y=lat, group=group, FK=???))
However, I don't know what to pass to FK in aes(). Experience with gridded data frames (grid.extent(...)) made me think I could simply use FK=FK, but this does not seem to work for SpatialPolygonsDataFrame objects. Trying FK=sf$FK or FK=sf@data$FK is not allowed because:
Error: Aesthetics must either be length one, or the same length as the data
I guess, the solution is trivial, but I simply don't get it at the moment.
Thanks to @juba, @rsc and @SlowLearner I found out that gpclib still had to be installed before gpclibPermit could be granted. With this done, fortifying sf using a specified region is no longer a problem. Using the explanation from the ggplot2 wiki, I am able to transfer all data fields of the original shapefile into a plotting-friendly data frame. The latter finally works as intended for plotting the shapefile in R. Here is the final code, with the actual content of the workingDir variable left out:
require("rgdal") # requires sp, will use proj.4 if installed
require("maptools")
require("ggplot2")
require("plyr")
workingDir <- ""
sf <- readOGR(dsn=workingDir, layer="BK50_Ausschnitt005")
sf@data$id <- rownames(sf@data)
sf.points <- fortify(sf, region="id")
sf.df <- join(sf.points, sf@data, by="id")
ggplot(sf.df,aes(x=long, y=lat, fill=NFK)) + coord_equal() + geom_polygon(colour="black", size=0.1, aes(group=group))
First, you should use the readOGR function from the rgdal library to read your shapefile (then you won't have problems with gpclib). Here is an example of how to do that.
Second, are you trying to pass the sf object to ggplot as-is? If so, you need to use fortify() to convert your spatial object into a data frame. There should be some kind of identifying column in sf@data such as ID or NAME. So try something like:
sf.df <- fortify(sf, region = "NAME")
...and use sf.df for plotting using ggplot.
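The length error in the question comes from a row-count mismatch: the attribute table has one row per polygon, while the fortified data frame has one row per vertex. The following is a minimal base R sketch of that mismatch and of the join that resolves it. The two toy data frames are stand-ins for fortify(sf, region = "id") and sf@data, and the FK values are invented for illustration.

```r
# Toy stand-in for fortify(sf, region = "id"): one row per polygon vertex.
sf.points <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  order = 1:8,
  group = rep(c("0.1", "1.1"), each = 4),
  id    = rep(c("0", "1"), each = 4),
  stringsAsFactors = FALSE
)

# Toy stand-in for sf@data: one row per polygon (FK values are invented).
sf.attr <- data.frame(id = c("0", "1"), FK = c(12.5, 30.0),
                      stringsAsFactors = FALSE)

# The attribute table is shorter than the vertex table, which is why
# passing sf$FK straight into aes() fails with the length error.
n_polygons <- nrow(sf.attr)
n_vertices <- nrow(sf.points)

# Joining on id repeats each polygon's FK for every one of its vertices.
sf.df <- merge(sf.points, sf.attr, by = "id")

# merge() may reorder rows; sort by 'order' so polygons draw correctly.
sf.df <- sf.df[order(sf.df$order), ]
```

After the join, FK has the same length as the plotting data, so it can be mapped with aes(fill = FK).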
Related
The task at hand is mapping the prevyearpct value to a county map. The sample data is below.
library(tidyverse)
library(tigris)
countyname <- c("Carson City","Churchill County","Clark County","Douglas County","Elko County","Esmeralda County","Eureka County","Humboldt County","Lander County","Lincoln County","Lyon County","Mineral County","Nye County","Pershing County","Storey County","Washoe County","White Pine County")
prevyearpct <- c(.545,.541,.539,.401,.301,.201,.101,.001,.664,.604,.704,.123,.129,.130,.085,.015,.099)
data2 <- data.frame(countyname, prevyearpct)
Here is the code that I use from Tigris to get the shape file.
NV_counties <- counties(32)
I do not need the work done for me. If I were to map the prevyearpct values onto the counties, where would I start? Do I need to combine the data so that NV_counties and data2 are one consolidated object? I have read quite a few articles and tutorials, but none that use tigris.
You can use geo_join() to join the two datasets together. After that, you can use geom_sf() to map it out (this guide may help).
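To illustrate what the join has to do, here is a base R sketch on a toy table standing in for the attribute columns of NV_counties. It assumes the county object has a NAMELSAD column holding full names such as "Clark County"; check names(NV_counties) before relying on that. Only three counties are included to keep the example short.

```r
# Toy stand-in for the attribute columns of NV_counties.
# Assumption: the real object has a NAMELSAD column with full county names.
counties_attr <- data.frame(
  GEOID    = c("32003", "32031", "32510"),
  NAMELSAD = c("Clark County", "Washoe County", "Carson City"),
  stringsAsFactors = FALSE
)

# Three rows taken from the question's sample data.
data2 <- data.frame(
  countyname  = c("Carson City", "Clark County", "Washoe County"),
  prevyearpct = c(0.545, 0.539, 0.015),
  stringsAsFactors = FALSE
)

# Check for names that would silently fail to match before joining.
unmatched <- setdiff(data2$countyname, counties_attr$NAMELSAD)

# Keep every county (all.x = TRUE) so unmatched ones show up as NA.
joined <- merge(counties_attr, data2,
                by.x = "NAMELSAD", by.y = "countyname", all.x = TRUE)
```

Once the real spatial object carries a prevyearpct column, it can be drawn with geom_sf(aes(fill = prevyearpct)).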
I would like to plot my figure using R (ggplot2). I'd like to have a line graph like image 2.
here my.data:
B50K,B50K+1000C50K,B50K+2000C50K,B50K+4000C50K,B50K+8000C50K,gen,xaxile
0.3795,0.4192,0.4675,0.5357,0.6217,T18-Yield,B50K
0.3178,0.3758,0.4249,0.5010,0.5870,T20-Yield,B50K+1000C50K
0.2795,0.3266,0.3763,0.4636,0.5583,T21-Yield,B50K+2000C50K
0.2417,0.2599,0.2898,0.3291,0.3736,T18-Fertility,B50K+4000C50K
0.2002,0.2287,0.2531,0.2962,0.3485,T19-Fertility,B50K+8000C50K
0.1642,0.1911,0.2151,0.2544,0.2951,T20-Fertility
Note: the delimiter is ",". I do not have any working R script to start from.
The illustrated image shows my figure in Microsoft Word.
I have tried several scripts from the internet, but none of them worked.
Would you please help me with an R script that reads my data file (as in image 1) and plots it like the illustrated figure?
The trick is to reshape your data (using melt from the reshape2 package) so that you can easily map colours and linetypes to gen.
# Your data - note I also added an extra comma after the fifth column in row 6.
# It would be easier if you gave the data using dput, as described in the comments above - thanks
dat <- read.table(text="B50K,B50K+1000C50K,B50K+2000C50K,B50K+4000C50K,B50K+8000C50K,xaxile,gen
0.3795,0.4192,0.4675,0.5357,0.6217,B50K,T18-Yield
0.3178,0.3758,0.4249,0.5010,0.5870,B50K+1000C50K,T20-Yield
0.2795,0.3266,0.3763,0.4636,0.5583,B50K+2000C50K,T21-Yield
0.2417,0.2599,0.2898,0.3291,0.3736,B50K+4000C50K,T18-Fertility
0.2002,0.2287,0.2531,0.2962,0.3485,B50K+8000C50K,T19-Fertility
0.1642,0.1911,0.2151,0.2544,0.2951,,T20-Fertility",
header=T, sep=",", na.strings="")
# load the packages you need
library(ggplot2)
library(reshape2)
# assume xaxile column is unneeded? - did you add this column yourself?
dat$xaxile <- NULL
# reshape data for plotting
dat.m <- melt(dat)
# plot
ggplot(dat.m, aes(x=variable, y=value, colour=gen,
shape=gen, linetype=gen, group=gen)) +
geom_point() +
geom_line()
You can then use scale_linetype_manual and scale_shape_manual to manually specify how you want the plot to look. This post will help, but there are many others as well
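As a rough sketch of the manual scales, the example below uses a three-row, three-column subset of the question's data and reshapes it to long format in base R (equivalent in effect to melt for this case). The particular linetypes and shape codes are arbitrary choices for illustration, not values taken from the original figure.

```r
library(ggplot2)

# Subset of the question's data, with the xaxile column already dropped.
dat <- data.frame(
  gen = c("T18-Yield", "T20-Yield", "T18-Fertility"),
  `B50K`          = c(0.3795, 0.3178, 0.2417),
  `B50K+1000C50K` = c(0.4192, 0.3758, 0.2599),
  `B50K+2000C50K` = c(0.4675, 0.4249, 0.2898),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Reshape to long format: one row per (gen, scenario) pair.
scenarios <- setdiff(names(dat), "gen")
dat.m <- data.frame(
  gen      = rep(dat$gen, times = length(scenarios)),
  variable = factor(rep(scenarios, each = nrow(dat)), levels = scenarios),
  value    = unlist(dat[scenarios], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Named vectors tie each gen level to a linetype and a point shape.
# These specific values are arbitrary illustrative choices.
linetypes <- c("T18-Yield" = "solid", "T20-Yield" = "dashed",
               "T18-Fertility" = "dotted")
shapes    <- c("T18-Yield" = 16, "T20-Yield" = 17, "T18-Fertility" = 15)

p <- ggplot(dat.m, aes(x = variable, y = value, colour = gen,
                       shape = gen, linetype = gen, group = gen)) +
  geom_point() +
  geom_line() +
  scale_linetype_manual(values = linetypes) +
  scale_shape_manual(values = shapes)
```

Printing p draws the figure. Because the vectors are named, the mapping does not depend on the order of the levels in gen.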
I have read so many threads and articles and I keep getting errors. I am trying to make a choropleth? map of the world using data I have from the global terrorism database. I want to color countries on a factor of nkills or just the number of attacks in that country.. I don't care at this point. Because there are so many countries with data, it is unreasonable to make any plots to show this data.
Help is strongly appreciated and if I did not ask this correctly I sincerely apologize, I am learning the rules of this website as I go.
my code (so far..)
library(maps)
library(ggplot2)
map("world")
world<- map_data("world")
gtd<- data.frame(gtd)
names(gtd)<- tolower(names(gtd))
gtd$country_txt<- tolower(rownames(gtd))
demo<- merge(world, gtd, sort=FALSE, by="country_txt")
In the gtd data frame, the name of the countries column is "country_txt", so I thought I would use that, but I get this error: Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column
If that were to work, I would plot as I have seen on a few websites..
I have honestly been working on this for so long and I have read so many codes/other similar questions/websites/r handbooks etc.. I will accept that I am incompetent when it comes to R gladly for some help.
Something like this? This is a solution using rgdal and ggplot. I long ago gave up on using base R for this type of thing.
library(rgdal) # for readOGR(...)
library(RColorBrewer) # for brewer.pal(...)
library(ggplot2)
setwd(" < directory with all files >")
gtd <- read.csv("globalterrorismdb_1213dist.csv")
gtd.recent <- gtd[gtd$iyear>2009,]
gtd.recent <- aggregate(nkill~country_txt,gtd.recent,sum)
world <- readOGR(dsn=".",
                 layer="world_country_admin_boundary_shapefile_with_fips_codes")
countries <- world@data
countries <- cbind(id=rownames(countries),countries)
countries <- merge(countries,gtd.recent,
                   by.x="CNTRY_NAME", by.y="country_txt", all.x=T)
map.df <- fortify(world)
map.df <- merge(map.df,countries, by="id")
ggplot(map.df, aes(x=long,y=lat,group=group)) +
  geom_polygon(aes(fill=nkill))+
  geom_path(colour="grey50")+
  scale_fill_gradientn(name="Deaths",
                       colours=rev(brewer.pal(9,"Spectral")),
                       na.value="white")+
  coord_fixed()+labs(x="",y="")
There are several versions of the Global Terrorism Database. I used the full dataset available here, and then subsetted for year > 2009. So this map shows total deaths due to terrorism, by country, from 2010-01-01 to 2013-01-01 (the last data available from this source). The files are available as MS Excel download, which I converted to csv for import into R.
The world map is available as a shapefile from the GeoCommons website.
The tricky part of making choropleth maps is associating your data with the correct polygons (countries). This is generally a four-step process:
1. Find a field in the shapefile attributes table that maps (no pun intended) to a corresponding field in your data. In this case, it appears that the field "CNTRY_NAME" in the shapefile maps to the field "country_txt" in the gtd database.
2. Create an association between polygon IDs (stored in the row names of the attribute table) and the CNTRY_NAME field.
3. Merge the result with your data using CNTRY_NAME and country_txt.
4. Merge the result of that with the data frame created using fortify(world); this associates polygons with deaths (nkill).
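The chain of merges can be sketched on toy data without any shapefile. In the example below the country names and death counts are invented, and the small map.df stands in for the output of fortify(...). The final re-sort by the order column is an addition to the steps listed: merge() does not guarantee row order, and polygons drawn from shuffled vertices can come out garbled.

```r
# Toy attribute table; row names play the role of polygon IDs,
# as in the attribute table of a Spatial object. Names are invented.
attrs <- data.frame(CNTRY_NAME = c("Aland", "Borduria", "Cascadia"),
                    row.names = c("0", "1", "2"),
                    stringsAsFactors = FALSE)

# Toy per-country totals (invented); "Cascadia" has no record.
deaths <- data.frame(country_txt = c("Aland", "Borduria"),
                     nkill = c(10, 3),
                     stringsAsFactors = FALSE)

# Associate polygon IDs with country names.
countries <- cbind(id = rownames(attrs), attrs)

# Attach the data; all.x = TRUE keeps countries without data (as NA).
countries <- merge(countries, deaths,
                   by.x = "CNTRY_NAME", by.y = "country_txt", all.x = TRUE)

# Toy stand-in for the fortified map: three vertices per polygon.
map.df <- data.frame(
  id    = rep(c("0", "1", "2"), each = 3),
  long  = c(0, 1, 0,  2, 3, 2,  4, 5, 4),
  lat   = c(0, 0, 1,  0, 0, 1,  0, 0, 1),
  order = 1:9,
  stringsAsFactors = FALSE
)

# Attach country data to every vertex, then restore the drawing order.
map.df <- merge(map.df, countries, by = "id")
map.df <- map.df[order(map.df$order), ]
```

Countries with no matching record keep NA in nkill, which is what na.value in the fill scale is there to colour.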
Building on the nice work by @jlhoward. You could instead use rworldmap, which already has a world map in R and has functions to aid joining data to the map. The default map is deliberately low resolution to create a 'cleaner' look. The map can be customised (see the rworldmap documentation), but here is a start:
library(rworldmap)
# 3 lines from @jlhoward
gtd <- read.csv("globalterrorismdb_1213dist.csv")
gtd.recent <- gtd[gtd$iyear>2009,]
gtd.recent <- aggregate(nkill~country_txt,gtd.recent,sum)
#join data to a map
gtdMap <- joinCountryData2Map( gtd.recent,
nameJoinColumn="country_txt",
joinCode="NAME" )
mapDevice('x11') #create a world shaped window
#plot the map
mapCountryData( gtdMap,
nameColumnToPlot='nkill',
catMethod='fixedWidth',
numCats=100 )
Following a comment from @hk47, you can also add the points to the map sized by the number of casualties.
deaths <- subset(x=gtd, nkill >0)
mapBubbles(deaths,
nameX='longitude',
nameY='latitude',
nameZSize='nkill',
nameZColour='black',
fill=FALSE,
addLegend=FALSE,
add=TRUE)
Have a Question on Mapping with R, specifically around the choropleth maps in R.
I have a dataset of ZIP codes assigned to an area and some associated data (dataset is here).
My final data format is: Area ID, ZIP, Probability Value, Customer Count, Area Probability and Area Customer Total. I am attempting to present this data by plotting Area Probability and Area Customer Total on a map. I have tried to do this using the census TIGER shapefiles, but I guess R cannot handle the complete country.
I am comfortable with the Statistical capabilities and now I am moving all my Mapping from third party GIS focused applications to doing all my Mapping in R. Does anyone have any pointers to how to achieve this from within R?
To be a little more detailed, here's the point where R stops working -
shapes <- readShapeSpatial("tl_2013_us_zcta510.shp")
(where the shp file is the census/TIGER) shape file.
Edit - Providing further details. I am trying to first read the TIGER shapefiles, hoping to combine this spatial dataset with my data and eventually plot. I am having an issue at the very beginning when attempting to read the shape file. Below is the code with the output
require(maptools)
shapes<-readShapeSpatial("tl_2013_us_zcta510.shp")
Error: cannot allocate vector of size 317 Kb
There are several examples and tutorials on making maps using R, but most are very general and, unfortunately, most map projects have nuances that create inscrutable problems. Yours is a case in point.
The biggest issue I came across was that the US Census Bureau zip code tabulation area shapefile for the whole US is huge: ~800MB. When loaded using readOGR(...), the resulting SpatialPolygonsDataFrame object is about 913MB. Trying to process a file this size (e.g., converting it to a data frame using fortify(...)) resulted, at least on my system, in errors like the one you identified above. So the solution is to subset the file based on the zip codes that are actually in your data.
The map was made from your data using the following code.
library(rgdal)
library(ggplot2)
library(stringr)
library(RColorBrewer)
setwd("<directory containing shapefiles and sample data>")
data <- read.csv("Sample.csv",header=T) # your sample data, downloaded as csv
data$ZIP <- str_pad(data$ZIP,5,"left","0") # convert ZIP to char(5) w/leading zeros
zips <- readOGR(dsn=".","tl_2013_us_zcta510") # import zip code polygon shapefile
map <- zips[zips$ZCTA5CE10 %in% data$ZIP,] # extract only zips in your Sample.csv
map.df <- fortify(map) # convert to data frame suitable for plotting
# merge data from Sample.csv into map data frame
map.data <- data.frame(id=rownames(map@data),ZIP=map@data$ZCTA5CE10)
map.data <- merge(map.data,data,by="ZIP")
map.df <- merge(map.df,map.data,by="id")
# load state boundaries
states <- readOGR(dsn=".","gz_2010_us_040_00_5m")
states <- states[states$NAME %in% c("New York","New Jersey"),] # extract NY and NJ
states.df <- fortify(states) # convert to data frame suitable for plotting
ggMap <- ggplot(data = map.df, aes(long, lat, group = group))
ggMap <- ggMap + geom_polygon(aes(fill = Probability_1))
ggMap <- ggMap + geom_path(data=states.df, aes(x=long,y=lat,group=group))
ggMap <- ggMap + scale_fill_gradientn(name="Probability",colours=brewer.pal(9,"Reds"))
ggMap <- ggMap + coord_equal()
ggMap
Explanation:
The rgdal package facilitates the creation of R Spatial objects from ESRI shapefiles. In your case we are importing a polygon shapefile into a SpatialPolygonsDataFrame object in R. The latter has two main parts: a polygon section, which contains the latitude and longitude points that will be joined to create the polygons on the map, and a data section, which contains information about the polygons (so, one row for each polygon). If, e.g., we call the Spatial object map, then the two sections can be referenced as map@polygons and map@data. The basic challenge in making choropleth maps is to associate data from your Sample.csv file with the relevant polygons (zip codes).
So the basic workflow is as follows:
1. Load polygon shapefiles into Spatial object ( => zips)
2. Subset if appropriate ( => map).
3. Convert to data frame suitable for plotting ( => map.df).
4. Merge data from Sample.csv into map.df.
5. Draw the map.
Step 4 is the one that causes all the problems. First we have to associate zip codes with each polygon. Then we have to associate Probability_1 with each zip code. This is a three step process.
Each polygon in the Spatial data file has a unique ID, but these IDs are not the zip codes. The polygon IDs are stored as row names in map@data. The zip codes are stored in map@data, in column ZCTA5CE10. So first we must create a data frame that associates the map@data row names (id) with map@data$ZCTA5CE10 (ZIP). Then we merge your Sample.csv with the result using the ZIP field in both data frames. Then we merge the result of that into map.df. This can be done in 3 lines of code.
Drawing the map involves telling ggplot what dataset to use (map.df), which columns to use for x and y (long and lat) and how to group the data by polygon (group=group). The columns long, lat, and group in map.df are all created by the call to fortify(...). The call to geom_polygon(...) tells ggplot to draw polygons and fill using the information in map.df$Probability_1. The call to geom_path(...) tells ggplot to create a layer with state boundaries. The call to scale_fill_gradientn(...) tells ggplot to use a color scheme based on the color brewer "Reds" palette. Finally, the call to coord_equal(...) tells ggplot to use the same scale for x and y so the map is not distorted.
NB: The state boundary layer uses the US States TIGER file.
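The subsetting step depends on the ZIP codes being five-character strings on both sides, since read.csv parses them as numbers and drops leading zeros. Below is a small base R sketch of the padding and the filter. It uses sprintf in place of str_pad, and the toy zcta table with its ZIP values is made up for illustration.

```r
# ZIP codes as read.csv would parse them: numeric, leading zeros lost.
zip_raw <- c(7001, 10001, 8540)

# Pad back to five characters (a base R alternative to str_pad).
zip_chr <- sprintf("%05d", as.integer(zip_raw))

# Toy stand-in for the shapefile's attribute table.
zcta <- data.frame(
  ZCTA5CE10 = c("07001", "10001", "90210", "08540", "60601"),
  stringsAsFactors = FALSE
)

# Logical index of the polygons that actually appear in the data.
keep <- zcta$ZCTA5CE10 %in% zip_chr
map_attr <- zcta[keep, , drop = FALSE]
```

The same kind of logical index is what zips[zips$ZCTA5CE10 %in% data$ZIP,] applies to the Spatial object, so only the needed polygons reach fortify(...).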
I would advise the following.
Use readOGR from the rgdal package rather than readShapeSpatial.
Consider using ggplot2 for good-looking maps - many of the examples use this.
Refer to one of the existing examples of creating a choropleth such as this one to get an overview.
Start with a simple choropleth and gradually add your own data; don't try and get it all right at once.
If you need more help, create a reproducible example with a SMALL fake dataset and with links to the shapefiles in question. The idea is that you make it easy to help us help you rather than discourage us by not supplying code and data in your question.
I want to convert two .shp files into one database that would allow me to draw the maps together.
Also, is there a way to convert .shp files into .csv files? I want to be able to personalize and add some data, which is easier for me in .csv format. What I have in mind is to overlay yield data and precipitation data on the maps.
Here are the shapefiles for Morocco, and Western Sahara.
Code to plot the two files:
# This is code for mapping of CGE_Morocco results
# Loading administrative coordinates for Morocco maps
library(sp)
library(maptools)
library(mapdata)
# Loading shape files
Mor <- readShapeSpatial("F:/Purdue University/RA_Position/PhD_ResearchandDissert/PhD_Draft/Country-CGE/MAR_adm1.shp")
Sah <- readShapeSpatial("F:/Purdue University/RA_Position/PhD_ResearchandDissert/PhD_Draft/Country-CGE/ESH_adm1.shp")
# Plotting the maps (raw)
png("Morocco.png")
Morocco <- readShapePoly("F:/Purdue University/RA_Position/PhD_ResearchandDissert/PhD_Draft/Country-CGE/MAR_adm1.shp")
plot(Morocco)
dev.off()
png("WesternSahara.png")
WesternSahara <- readShapePoly("F:/Purdue University/RA_Position/PhD_ResearchandDissert/PhD_Draft/Country-CGE/ESH_adm1.shp")
plot(WesternSahara)
dev.off()
After looking into suggestions from @AriBFriedman and @PaulHiemstra and subsequently figuring out how to merge .shp files, I have managed to produce the following map using the following code and data (for the .shp data, cf. the links above).
code:
# Merging Mor and Sah .shp files into one .shp file
MoroccoData <- rbind(Mor@data,Sah@data) # First, 'stack' the attribute list rows using rbind()
MoroccoPolys <- c(Mor@polygons,Sah@polygons) # Next, combine the two polygon lists into a single list using c()
summary(MoroccoData)
summary(MoroccoPolys)
offset <- length(MoroccoPolys) # Next, generate a new polygon ID for the new SpatialPolygonDataFrame object
browser()
for (i in 1:offset)
{
  sNew = as.character(i)
  MoroccoPolys[[i]]@ID = sNew
}
ID <- c(as.character(1:length(MoroccoPolys))) # Create an identical ID field and append it to the merged Data component
MoroccoDataWithID <- cbind(ID,MoroccoData)
MoroccoPolysSP <- SpatialPolygons(MoroccoPolys,proj4string=CRS(proj4string(Sah))) # Promote the merged list to a SpatialPolygons data object
Morocco <- SpatialPolygonsDataFrame(MoroccoPolysSP,data = MoroccoDataWithID,match.ID = FALSE) # Combine the merged Data and Polygon components into a new SpatialPolygonsDataFrame.
Morocco@data$id <- rownames(Morocco@data)
Morocco.fort <- fortify(Morocco, region='id')
Morocco.fort <- Morocco.fort[order(Morocco.fort$order), ]
MoroccoMap <- ggplot(data=Morocco.fort, aes(long, lat, group=group)) +
geom_polygon(colour='black',fill='white') +
theme_bw()
Results:
New Question:
1- How can I eliminate the boundary data that cuts through the middle of the map?
2- How can I combine different regions within a .shp file?
Thank you all.
P.S.: the community on stackoverflow.com is wonderful and very helpful, especially toward beginners like me :) Just thought I'd emphasize it.
Once you have loaded your shapefiles into Spatial{Lines/Polygons}DataFrames (classes from the sp-package), you can use the fortify generic function to transform them to flat data.frame format. The specific functions for the fortify generic are included in the ggplot2 package, so you'll need to load that first. A code example:
library(ggplot2)
polygon_dataframe = fortify(polygon_spdf)
where polygon_spdf is a SpatialPolygonsDataFrame. A similar approach works for SpatialLinesDataFrame objects.
The difference between my solution and that of @AriBFriedman is that mine includes the x and y coordinates of the polygons/lines, in addition to the data associated with those polygons/lines. I really like visualising my spatial data with the ggplot2 package.
Once you have your data in a normal data.frame you can simply use write.csv to generate a csv file on disk.
I think you mean you want the associated data.frame from each?
If so, it can be accessed with the @ slot access operator. The slot is called data:
write.csv( WesternSahara@data, file="/home/wherever/myWesternSahara.csv")
Then when you read it back in with read.csv, you can try assigning:
myEdits <- read.csv("/home/wherever/myWesternSahara_modified.csv")
WesternSahara@data <- myEdits
You may need to do some massaging of row names and so forth to get it to accept the new data.frame as valid. I'd probably try to merge the existing data.frame with a csv you read in in R, rather than making edits destructively....
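One way to add columns non-destructively is sketched below in base R. It uses match() rather than merge(), because merge() can reorder rows and the attribute table has to stay in the same row order as the polygons. The column names ID_1, NAME_1 and yield, and all values, are hypothetical, and a temporary file stands in for the edited csv.

```r
# Toy stand-in for the attribute table of a Spatial object.
# Column names and values are hypothetical.
attrs <- data.frame(ID_1 = c(3, 1, 2),
                    NAME_1 = c("C", "A", "B"),
                    stringsAsFactors = FALSE)

# Simulate a csv edited outside R, keyed on the same ID column.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(ID_1 = c(1, 2, 3), yield = c(1.2, 0.8, 2.5)),
          csv_path, row.names = FALSE)
extra <- read.csv(csv_path)

# Look up each row's value by key; row order of attrs is untouched.
attrs$yield <- extra$yield[match(attrs$ID_1, extra$ID_1)]
```

Because the original rows keep their order and row names, the extended table can be assigned back to the data slot without further massaging. IDs missing from the csv simply get NA.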