I'm new to R and I'm trying to plot a data frame of county values using the usmap library.
I have a dataset containing all the FIPS (county codes) for a particular region and the data (deaths) that I want to show on the map.
This is my first R script so what I'm trying to accomplish is likely pretty easy and I just have no idea what I'm doing.
I'm pretty sure the error I'm receiving is because I haven't specified any kind of coloring to apply to the data, but I'm not sure.
Here's my code. Note that, to start, I'm just trying to plot one frame of data (a particular date):
library(usmap)
library(ggplot2)
library(RColorBrewer)
#set working directory
setwd("C:/Users/Name/Documents/RScripts/")
#input data from file separated by commas
usa.dat <- read.csv("covid.csv", header = T)
#Start at 1/22/2020
#End at 10/8/2021
plot_usmap(regions = "counties",
           data = usa.dat$countyFIPS,
           values = usa.dat$X1.22.2020,
) +
  theme(panel.background = element_rect(color = "black", fill = "black"))
Here's the data:
The error I'm getting is:
Error in if (is.null(geom_args[["fill"]]) & nrow(data) == 0) { : argument is of length zero
When I remove the data/values lines from the function, I get a map that looks like this:
Any help is greatly appreciated!
Ideally, I'd like to animate each frame of the data with color scales; if you guys can help me with that, I'd appreciate it!
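(Spoiler from the edit below: the error has nothing to do with coloring. plot_usmap() expects data to be a data frame that contains a fips column and values to be the name of a column in that frame, not two separate vectors. A minimal sketch of a corrected call, following the pattern used in the working code further down:)
library(usmap)
# minimal sketch: a small data frame with a `fips` column, plus the value column passed by name
da <- data.frame(fips = usa.dat$countyFIPS, val = usa.dat$X1.22.2020)
plot_usmap(regions = "counties", data = da, values = "val")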
Edit:
Okay, so I've been working on this for a while and I managed to get it working. However, I have no idea how to update the color scales/gradients that are used.
I got it to loop through the data and save a bunch of plots, so that's pretty awesome! Just need to figure out how to change the colors/scales if anyone can help!
I got it figured out (with some help from Ben!)
library(usmap)
library(ggplot2)
library(viridis)
library(RColorBrewer)
library(stringr)
library(stringi)
#set working directory
setwd("C:/Users/Tyrael/Documents/RScripts/")
#input data from file separated by commas
usa.dat <- read.csv("ccovid.csv", header = T)
for (i in 2:ncol(usa.dat)) {
  da <- data.frame(fips = usa.dat$countyFIPS, val = usa.dat[, i])
  da$val <- cut(da$val,
                breaks = c(0, 100, 500, 1000, 5000, 20000, 30000),
                labels = c("1-100", "100-500", "500-1K", "1K-5K", "5K-20K", "20K-30K"))
  theDate <- substr(names(usa.dat)[i], 2, 100)
  plot_usmap(regions = "counties",
             data = da,
             values = "val") +
    labs(title = paste("Covid-19 Deaths ", str_replace_all(theDate, "\\.", "/"), sep = '')) +
    scale_fill_viridis(name = "Deaths", discrete = TRUE, na.translate = F) +
    theme(panel.background = element_rect(color = "#101010", fill = "#101010"))
  ggsave(paste(sprintf("%03d", i), ".png", sep = ''))
}
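If you want a palette other than viridis, one option (an untested sketch) is to swap the scale_fill_viridis() line for ggplot2's Brewer scale, which draws on the RColorBrewer palettes loaded above; na.translate = FALSE drops the NA bin from the legend in the same way:
# e.g. a yellow-orange-red Brewer palette for the same discrete bins
scale_fill_brewer(name = "Deaths", palette = "YlOrRd", na.translate = FALSE)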
This splits everything up into a legend that looks like this:
The files are output in sequential order and, as a bonus, I'll show how to combine them with ffmpeg:
ffmpeg -framerate 15 -i "C:\Users\Tyrael\Documents\RScripts\vpublish\%03d.png" -codec copy out.mkv
And to get it into .mp4 format:
ffmpeg -i out.mkv -codec copy out.mp4
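If a player has trouble with the stream-copied PNG video, re-encoding the frames to H.264 (with the yuv420p pixel format for broad player compatibility) is a common alternative; adjust the input path as above:
ffmpeg -framerate 15 -i "%03d.png" -c:v libx264 -pix_fmt yuv420p out.mp4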
Results: https://www.youtube.com/watch?v=-z3LL5j__es
I am trying to work with municipality data in Norway, and I'm totally new to QGIS, shapefiles, and plotting this in R. I downloaded the municipalities from here:
Administrative enheter kommuner / Administrative units municipalities
Reproducible files are here:
Joanna's github
I have downloaded QGIS, so I can open the GeoJSON file there and convert it to a shapefile. I am able to do this and read the data into R:
library(sf)
test=st_read("C:/municipality_shape.shp")
head(test)
I have assigned the different municipalities values/ranks of my own that I call faktor, and I have stored this classification in a data frame that I call df_new. I want to merge this classification onto my test object above and plot the map with the classification attribute:
test33 = merge(test, df_new[, c("Kommunekode_str", "faktor")],
               by = c("Kommunekode_str"), all.x = TRUE)
This works, but when I try to plot it with tmap,
library(tmap)
tmap_mode("view")
tm_shape(test33) +
tm_fill(col="faktor", alpha=0.6, n=20, palette=c("wheat3","red3")) +
tm_borders(col="#000000", lwd=0.2)
it throws this error:
Error in object[-omit, , drop = FALSE] : incorrect number of dimensions
If I just use base plot,
plot(test33)
I get the picture:
As you can see, I get three plots. Does this have something to do with my error above?
I think the main issue here is that the shapes you are trying to plot are too complex, so tmap is struggling to load all of this data. ggplot also fails to load the polygons.
You probably don't need so much accuracy in your polygons if you are making a choropleth map, so I would suggest first simplifying your polygons. In my experience, the best way to do this is with the rmapshaper package:
# keep = 0.02 will keep just 2% of the points in your polygons.
test_33_simple <- rmapshaper::ms_simplify(test33, keep = 0.02)
I can now use your code to produce the following:
tmap_mode("view")
tm_shape(test_33_simple) +
tm_fill(col="faktor", alpha=0.6, n=20, palette=c("wheat3","red3")) +
tm_borders(col="#000000", lwd=0.2)
This produces an interactive map, but the colour scheme is not ideal for telling differences between municipalities.
Static version
Since you say in the comments that you are not sure if you want an interactive map or a static one, I will give an example with a static map and some example colour schemes.
The below uses the classInt package to set up breaks for your map. A popular break scheme is 'fisher', which uses the Fisher-Jenks algorithm. Make sure you research the various options to pick one that suits your scenario:
library(ggplot2)
library(dplyr)
library(sf)
library(classInt)
breaks <- classIntervals(test_33_simple$faktor, n = 6, style = 'fisher')
#label breaks
lab_vec <- vector(length = length(breaks$brks)-1)
rounded_breaks <- round(breaks$brks,2)
lab_vec[1] <- paste0('[', rounded_breaks[1],' - ', rounded_breaks[2],']')
for(i in 2:(length(breaks$brks) - 1)){
  lab_vec[i] <- paste0('(', rounded_breaks[i], ' - ', rounded_breaks[i+1], ']')
}
test_33_simple <- test_33_simple %>%
  mutate(faktor_class = factor(cut(faktor, breaks$brks, include.lowest = T), labels = lab_vec))
# map
ggplot(test_33_simple) +
geom_sf(aes(fill = faktor_class), size= 0.2) +
scale_fill_viridis_d() +
theme_minimal()
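If you would rather stay with tmap for the static version, roughly the same break scheme can be requested directly; a sketch (tm_fill's style argument accepts the classInt styles, such as "fisher"):
library(tmap)
tmap_mode("plot")  # static output instead of the interactive "view" mode
tm_shape(test_33_simple) +
  tm_fill(col = "faktor", style = "fisher", n = 6, palette = "viridis") +
  tm_borders(col = "#000000", lwd = 0.2)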
I am trying to make an online plot of spatial data using plotly in R, but I got the error message "Request Entity Too Large: This file is too big! Your current subscription is limited to 524 KB uploads." Any clue about how to solve this? In order to reproduce my code, you need to (i) register on plotly and (ii) download the shapefiles of French departments available on my github repo. The 3 files should be in a folder named shapefile. It seems to me that it is the ggplot2 function geom_sf that produces files that are too large. My code is below:
require(tidyverse)
require(ggplot2)
#Info required for online plotting
Sys.setenv("plotly_username"="replace_by_your_username")
Sys.setenv("plotly_api_key"="replace_by_your_apikey")
#Read shapefile
dep <- sf::st_read("replace_with_the_correctPATH/shapefile/DEPARTEMENT.shp")
#Variable to plot
zz<-runif(length(dep$CODE_DEPT),-10,3)
#ggplot2 object
gg <- dep %>%
mutate(discrete = cut(zz, c(-10, seq(-3, 3, by = 1)))) %>%
ggplot() +
geom_sf(aes(fill = discrete, text = paste("Department:", dep$CODE_DEPT, "<br>", "bli", zz))) +
scale_fill_brewer(palette = "PuOr", name = "bla")
#Plotting the figure on your local computer works
#plotly::ggplotly(gg, tooltip = c("text"))
#Generate an error message
plotly::api_create(gg, tooltip = c("text"),filename = "sthing")
This doesn't appear to be your issue but I experienced the same thing. For future readers, my issue seemed to be that the dataframe I was using (say, 100 rows) was subset from a large dataset (15,000 rows) which was larger than the file limit.
Although my subset was quite small and well within the upload limit, I had to save the subset as a csv, load it back in, and use that newly loaded dataframe as my plotly upload. Even though the imported dataframe had the same row count as the subset dataframe, I had to break the subset's connection to the original larger dataset; I don't know why.
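For reference, the workaround looked roughly like this (hypothetical object and column names; the point is just that the object sent to plotly no longer references the full 15,000-row data frame):
library(ggplot2)
# my_subset is the ~100-row subset taken from the large data frame
write.csv(my_subset, "subset.csv", row.names = FALSE)
my_subset_fresh <- read.csv("subset.csv")
gg <- ggplot(my_subset_fresh, aes(x = x_col, y = y_col)) + geom_point()
plotly::api_create(gg, filename = "small_enough_now")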
I'm attempting to step through a dataset and create a histogram and summary table for each factor, and save the output as a .svg. The histogram is created using ggplot2 and the summary table using summary().
I have successfully used the code below to save the output to a single .pdf with each page containing the relevant histogram/table. However, when I attempt to save each histogram/table combo into a set of .svg images using ggsave only the ggplot histogram is showing up in the .svg. The table is just white space.
I've tried using dev.copy, Cairo, and svg, but all end up with the same result: the histogram renders, but the table does not. If I save the image as a .png, the table shows up.
I'm using the iris data as a reproducible dataset. I'm not using RStudio, which I saw was causing some "empty plot" grief for others.
#packages used
library(ggplot2)
library(gridExtra)
library(gtable)
library(Cairo)
#Create iris histogram plot
iris.hp <- ggplot(data = iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = .25, origin = -0.125,
                 right = TRUE, col = "white", fill = "steelblue4", alpha = 1) +
  labs(title = "Iris Sepal Length") +
  labs(x = "Sepal Length", y = "Count")
iris.list <- by(data = iris, INDICES = iris$Species, simplify = TRUE,
                FUN = function(x) {iris.hp %+% x + ggtitle(unique(x$Species))})
#Generate list of data to create summary statistics table
sum.str<-aggregate(Sepal.Length~Species,iris,summary)
spec<-sum.str[,1]
spec.stats<-sum.str[,2]
sum.data<-data.frame(spec,spec.stats)
sum.table<-tableGrob(sum.data)
colnames(sum.data) <- c("species", "sep.len.min", "sep.len.1stQ", "sep.len.med",
                        "sep.len.mean", "sep. len.3rdQ", "sep.len.max")
table.list <- by(data = sum.data, INDICES = sum.data$"species", simplify = TRUE,
                 FUN = function(x) {tableGrob(x)})
#Combined histogram and summary table across multiple plots
multi.plots <- marrangeGrob(grobs = (c(rbind(iris.list, table.list))),
                            nrow = 2, ncol = 1,
                            top = quote(paste(iris$labels$Species, '\nPage', g, 'of', pages)))
#bypass the class check per #baptiste
ggsave <- ggplot2::ggsave; body(ggsave) <- body(ggplot2::ggsave)[-2]
#
for(i in 1:3){
  multi.plots <- marrangeGrob(grobs = (c(rbind(iris.list[i], table.list[i]))),
                              nrow = 2, ncol = 1, heights = c(1.65, .35),
                              top = quote(paste(iris$labels$Species, '\nPage', g, 'of', pages)))
  prefix <- unique(iris$Species)
  prefix <- prefix[i]
  filename <- paste(prefix, ".svg", sep = "")
  ggsave(filename, multi.plots)
  #dev.off()
}
Edit: removed theme tt3 that @rawr referenced. It was accidentally left in the example code; it was not causing the problem, just in case anyone was curious.
Edit: I'm removing my previous answer about it working under a 32-bit install but not an x64 install, because that was not the problem. I'm still unsure what was causing the issue, but it is working now. I'm leaving the info about grid.export as it may be a useful alternative for someone else.
Below is the loop for saving the .svg's using grid.export(), although I was having some text formatting issues with this (different dataset).
for(i in 1:3){
  multi.plots <- marrangeGrob(grobs = (c(rbind(iris.list[i], table.list[i]))),
                              nrow = 2, ncol = 1, heights = c(1.65, .35),
                              top = quote(paste(iris$labels$Species, '\nPage', g, 'of', pages)))
  prefix <- unique(iris$Species)
  prefix <- prefix[i]
  filename <- paste(prefix, ".svg", sep = "")
  grid.draw(multi.plots)
  grid.export(filename)
  grid.newpage()
}
EDIT: As for using arrangeGrob per @baptiste's comment, below is the updated code. I was incorrectly using single brackets [] for the list returned by by, so I switched to the correct double brackets [[]] and used grid.draw in the ggsave call.
for(i in 1:3){
  prefix <- unique(iris$Species)
  prefix <- prefix[i]
  multi.plots <- grid.arrange(arrangeGrob(iris.list[[i]], table.list[[i]],
                                          nrow = 2, ncol = 1,
                                          top = quote(paste(iris$labels$Species))))
  filename <- paste(prefix, ".svg", sep = "")
  ggsave(filename, grid.draw(multi.plots))
}
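For future readers, another possible route for SVG output of grid-based graphics is to draw the arranged grob straight onto an svglite device (a sketch, assuming the svglite package is installed; filename and multi.plots are as built in the loop above):
library(svglite)
library(grid)
svglite(filename)       # open an SVG device for this file
grid.draw(multi.plots)  # draw the combined histogram + table grob
dev.off()               # close the device to finish writing the .svg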
I have a table exported in csv from PostgreSQL and I'd like to create a stacked bar graph in R. It's my first project in R.
Here's my data and what I want to do:
It's the quality of the feeder bus service for a certain provider in the area. For each user of the train, we assign a service quality based on the synchronization between the bus and the train at the train stations, and we calculate the percentage of users that have an ideal or very good service, a correct service, a deficient service, or no service at all (linked to that question on gis.stackexchange).
So, I'd like to use my first column as my x-axis labels and my headers as my categories. The data is already normalized to 100% for each row.
In Excel, it's a couple of clicks, and I wouldn't mind typing a couple of lines of code since it's the final result of an already quite long plpgsql script. I'd prefer to continue to code instead of moving to Excel (I also have dozens of these to do).
So, I tried to create a stacked bar chart using the examples in Nathan Yau's "Visualize This" and the book "R in Action", and wasn't quite successful. Normally, their examples use data that they aggregate with R; mine is already aggregated.
So, I've finally come up with something that works in R:
but I had to transform my data quite a bit:
I had to transpose my table and remove my now-row (ex-column) identifier.
Here's my code:
# load libraries
library(ggplot2)
library(reshape2)
# load data
stl <- read.csv("D:/TEMP/rabat/_stl_rabattement_stats_mtl.csv", sep=";", header=TRUE)
# reshape for plotting
stl_matrix <- as.matrix(stl)
# make a quick plot
barplot(stl_matrix, border = NA, space = 0.1, ylim = c(0, 100), xlab = "Trains", ylab = "%",
        main = "Qualité du rabattement, STL", las = 3)
Is there any way that I could use my original csv and have the same result?
I'm a little lost here...
Thanks!!!!
Try the ggplot2 and reshape2 libraries. You should be able to get the chart you want with:
stl$train_order <- as.numeric(rownames(stl))
stl.r <- melt(stl, id.vars = c("train_no", "train_order"))
stl.r$train_no <- factor(
  stl.r$train_no,
  levels = stl$train_no[order(stl$train_order)])
ggplot(stl.r, aes(x = factor(train_no), y = value, fill = variable)) + geom_bar(stat = 'identity')
It appears that you transposed the matrix manually. This can be done in R with the t() function.
Add the following line after the as.matrix(stl) line:
stl_matrix <- t(stl_matrix)
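Putting the two answers together, a sketch that starts from the original csv (assuming, as in the question, that the first column holds the row identifiers and the remaining columns are the percentage categories):
stl <- read.csv("D:/TEMP/rabat/_stl_rabattement_stats_mtl.csv", sep = ";", header = TRUE)
stl_matrix <- t(as.matrix(stl[, -1]))  # drop the identifier column, then transpose
colnames(stl_matrix) <- stl[[1]]       # use the identifiers as bar labels
barplot(stl_matrix, border = NA, space = 0.1, ylim = c(0, 100),
        xlab = "Trains", ylab = "%",
        main = "Qualité du rabattement, STL", las = 3)
barplot() stacks the rows of the matrix within each column, so after the transpose each original row of the csv becomes one stacked bar.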
I have a set of data which contains around 150,000 observations of 800 subjects. Each observation has: subject ID, latitude, longitude, and the time that the subject was at those coordinates. The data covers a 24-hour period.
If I plot all the data at once I just get a blob. Is anyone able to give me some tips as to how I can animate this data so that I can observe the paths of the subjects as a function of time?
I've read the spacetime vignette but I'm not entirely sure it will do what I want. At this point I'm spending a whole lot of time googling but not really coming up with anything that meets my needs.
Any tips and pointers greatly appreciated!
Here is my first use of the animation package. It was easier than I anticipated, and saveHTML in particular is really amazing. Here is my scenario (though I think my R code will make it clearer):
I generate some data
I plot a basic plot for all persons as a background plot.
I reshape the data to a wide format so that I can plot an arrow between the present and the next position for each person.
I loop over hours to generate many plots. I put the loop inside the powerful saveHTML function.
You get an HTML file with a nice animation. I show one intermediate plot here.
Here is my code:
library(animation)
library(ggplot2)
library(grid)
## creating some data of hours
N.hour <- 24
dat <- data.frame(person = rep(paste0('p', 1:3), N.hour),
                  lat = sample(1:10, 3*N.hour, rep = TRUE),
                  long = sample(1:10, 3*N.hour, rep = TRUE),
                  time = rep(1:N.hour, each = 3))
# the base plot with all persons as background
base <- ggplot() +
  geom_point(data = dat, aes(x = lat, y = long, colour = person), size = 5) +
  theme(legend.position = "none")
## reshape data to lat and long formats
library(plyr)
dat.segs <- ddply(dat, .(person), function(x){
  dd <- do.call(rbind,
                lapply(seq(N.hour-1),
                       function(y) c(y, x[x$time %in% c(y, y+1), ]$lat,
                                     x[x$time %in% c(y, y+1), ]$long)))
  dd
})
colnames(dat.segs) <- c('person','path','x1','x2','y1','y2')
# a function to create the animation
oopt <- ani.options(interval = 0.5)
saveHTML({
  print(base)
  interval = ani.options("interval")
  for(hour in seq(N.hour-1)){
    # a segment for each time step
    tn <- geom_segment(aes(x = x1, y = y1, xend = x2,
                           yend = y2, colour = person),
                       arrow = arrow(), inherit.aes = FALSE,
                       data = subset(dat.segs, path == hour))
    print(base <- base + tn)
    ani.pause()
  }
}, img.name = "plots", imgdir = "plots_dir",
   htmlfile = "random.html", autobrowse = FALSE,
   title = "Demo of animated lat/long for different persons",
   outdir = getwd())
Your question is a bit vague, but I will share how I have done this kind of animation in the past.
Create a function that plots all the subject locations for one time slice:
plot_time = function(dataset, time_id) {
  # make a plot with your favorite plotting package (e.g. `ggplot2`)
  # Save it as a file on disk (e.g. using `ggsave`), under a regular name,
  # frame001.png, frame002.png, ..., see sprintf('frame%03d.png', time_id)
}
Call this function on each of your timeslices, e.g. using lapply:
lapply(start_time_id:stop_time_id, plot_time)
leading to a set of graphics files on the hard drive called frame001 to framexxx.
Use a tool to render those frames into a movie, for example ffmpeg.
This is a general workflow, which has been already implemented in the animation package (thanks for reminding me #mdsummer). You can probably leverage that package to get your animation.
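For completeness, here is a minimal sketch of steps 1 and 2 under assumed column names (id, lat, lon, and an integer hour index time are hypothetical; my_tracks is a placeholder for your data frame):
library(ggplot2)
plot_time <- function(dataset, time_id) {
  p <- ggplot(subset(dataset, time == time_id), aes(x = lon, y = lat, colour = id)) +
    geom_point() +
    ggtitle(paste("Hour", time_id))
  # regular file names so ffmpeg or the animation package can pick the frames up in order
  ggsave(sprintf("frame%03d.png", time_id), p, width = 6, height = 6)
}
# one frame per time slice
lapply(sort(unique(my_tracks$time)), function(t) plot_time(my_tracks, t))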