File too big when using plotly::api_create and ggplot2::geom_sf functions

I am trying to plot spatial data online using plotly in R, but I get the error message "Request Entity Too Large. This file is too big! Your current subscription is limited to 524 KB uploads." Any clue about how to solve that? In order to reproduce my code, you need to (i) register on plotly and (ii) download the shapefiles of French departments available on my github repo. The 3 files should be in a folder named shapefile. It seems to me that it is the ggplot2 function geom_sf that produces files that are too large. My code is below:
require(tidyverse)
require(ggplot2)
#Info required for online plotting
Sys.setenv("plotly_username"="replace_by_your_username")
Sys.setenv("plotly_api_key"="replace_by_your_apikey")
#Read shapefile
dep <- sf::st_read("replace_with_the_correctPATH/shapefile/DEPARTEMENT.shp")
#Variable to plot
zz<-runif(length(dep$CODE_DEPT),-10,3)
#ggplot2 object
gg <- dep %>%
  mutate(discrete = cut(zz, c(-10, seq(-3, 3, by = 1)))) %>%
  ggplot() +
  geom_sf(aes(fill = discrete, text = paste("Department:", dep$CODE_DEPT, "<br>", "bli", zz))) +
  scale_fill_brewer(palette = "PuOr", name = "bla")
#Plotting the figure on your local computer works
#plotly::ggplotly(gg, tooltip = c("text"))
#Generate an error message
plotly::api_create(gg, tooltip = c("text"), filename = "sthing")

This doesn't appear to be your issue, but I experienced the same thing. For future readers: my issue seemed to be that the dataframe I was using (say, 100 rows) was subset from a larger dataset (15,000 rows) that exceeded the file limit.
Although my subset was quite small and well within the upload limit, I had to save the subset as a csv, load it back in, and use that newly loaded dataframe as my plotly upload. Even though the imported dataframe had the same row count as the subset dataframe, I had to break the subset's connection to the original larger dataset; I don't know why.
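A minimal sketch of that workaround, assuming a large source dataframe and a small subset of it (all object and file names here are hypothetical, not from the original post):
# write the small subset out and read it back in, so it is no longer tied to the large source data
my_subset <- big_data[1:100, ]
write.csv(my_subset, "my_subset.csv", row.names = FALSE)
my_subset_clean <- read.csv("my_subset.csv")
# build the figure (or grid) from my_subset_clean and upload that instead, e.g.
# plotly::api_create(my_subset_clean, filename = "sthing")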

Related

Trying to plot a shapefile with an attribute in tmap

I am trying to work with municipality data in Norway, and I'm totally new to QGIS, shapefiles and plotting this in R. I downloaded the municipalities from here:
Administrative enheter kommuner / Administrative units municipalities
Reproducible files are here:
Joanna's github
I have downloaded QGIS, so I can open the GeoJSON file there and convert it to a shapefile. I am able to do this and read the data into R:
library(sf)
test=st_read("C:/municipality_shape.shp")
head(test)
I have myself given the different municipalities different values/ranks that I call faktor, and I have stored this classification in a dataframe that I call df_new. I wish to merge this "classification" onto my "test" object above, and then plot the map with the classification attribute:
test33 = merge(test, df_new[, c("Kommunekode_str", "faktor")],
               by = c("Kommunekode_str"), all.x = TRUE)
This works, but when I try to plot it with tmap,
library(tmap)
tmap_mode("view")
tm_shape(test33) +
  tm_fill(col = "faktor", alpha = 0.6, n = 20, palette = c("wheat3", "red3")) +
  tm_borders(col = "#000000", lwd = 0.2)
it throws this error:
Error in object[-omit, , drop = FALSE] : incorrect number of dimensions
If I just use base plot,
plot(test33)
I get the picture:
You see I get three plots. Does this have something to do with my error above?
I think the main issue here is that the shapes you are trying to plot are too complex, so tmap is struggling to load all of this data. ggplot also fails to load the polygons.
You probably don't need so much accuracy in your polygons if you are making a choropleth map, so I would suggest first simplifying them. In my experience the best way to do this is with the rmapshaper package:
# keep = 0.02 will keep just 2% of the points in your polygons.
test_33_simple <- rmapshaper::ms_simplify(test33, keep = 0.02)
I can now use your code to produce the following:
tmap_mode("view")
tm_shape(test_33_simple) +
  tm_fill(col = "faktor", alpha = 0.6, n = 20, palette = c("wheat3", "red3")) +
  tm_borders(col = "#000000", lwd = 0.2)
This produces an interactive map, although the colour scheme is not ideal for telling municipalities apart.
static version
Since you say in the comments that you are not sure if you want an interactive map or a static one, I will give an example with a static map and some example colour schemes.
The code below uses the classInt package to set up breaks for your map. A popular break scheme is 'fisher', which uses the Fisher-Jenks algorithm. Make sure you research the various options to pick one that suits your scenario:
library(ggplot2)
library(dplyr)
library(sf)
library(classInt)
breaks <- classIntervals(test_33_simple$faktor, n = 6, style = 'fisher')
# label breaks
lab_vec <- vector(length = length(breaks$brks) - 1)
rounded_breaks <- round(breaks$brks, 2)
lab_vec[1] <- paste0('[', rounded_breaks[1], ' - ', rounded_breaks[2], ']')
for(i in 2:(length(breaks$brks) - 1)){
  lab_vec[i] <- paste0('(', rounded_breaks[i], ' - ', rounded_breaks[i+1], ']')
}
test_33_simple <- test_33_simple %>%
  mutate(faktor_class = factor(cut(faktor, breaks$brks, include.lowest = T), labels = lab_vec))
# map
ggplot(test_33_simple) +
  geom_sf(aes(fill = faktor_class), size = 0.2) +
  scale_fill_viridis_d() +
  theme_minimal()

Advise a Chemist: Automate/Streamline his Voltammetry Data Graphing Code

I am a chemist dealing with a significant amount of voltammetry data recently. Let me be very clear and give some research information. I run scans from a starting voltage to an ending voltage on solid-state conductive films. These scans are saved as .txt files (naming scheme: run#.txt) in a single folder. I am looking at how conductance changes as temperature changes. The LINEST fit of current v. voltage at a given temperature gives me a line with slope = conductance. Once I have the conductances (slopes) for each scan, I plot conductance v. temperature to see the temperature-dependent conductance characteristics. I had been doing this in Excel, but have found quicker ways to get the job done using R. I am brand new to R (RStudio) and recognize that my coding is not the best. Without doubt, this process can be streamlined and sped up, which would help immensely. This is how I am performing the process currently:
# Set working directory with folder containing all .txt files for inspection
# Add all .txt files to the global environment
allruns<-list.files(pattern=".txt")
for(i in 1:length(allruns)) assign(allruns[i], read.table(allruns[i]))
Since the voltage column (a 1x1000 matrix) is the same for all runs and is in column V1 of each .txt file, I assign x to be the voltage column from the first file:
x<-run1.txt$V1
All currents (these change as voltage changes) are found in the V2 column of all the .txt files, so I assign a y# to each. These are entered one at a time:
y1<-run1.txt$V2
y2<-run2.txt$V2
y3<-run3.txt$V2
# ...
yn<-runn.txt$V2
Then, to get the equation for each LINEST (one for each scan, plotted with abline later), these are again entered one at a time:
run1<-lm(y1~x)
run2<-lm(y2~x)
run3<-lm(y3~x)
# ...
runn<-lm(yn~x)
To obtain a single graph with all LINEST lines (one for each scan) on the same plot, without the data points showing up, I have been using this pattern of code to first get all the data points on a single plot as separate series:
plot(x,y1,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y3,yn)))
par(new=TRUE)
plot(x,y2,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y3,yn)))
par(new=TRUE)
plot(x,y3,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y1,yn)))
# ...
par(new=TRUE)
plot(x,yn,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y1,yn)))
#To obtain all LINEST lines (one for each scan, on the single graph):
abline(run1, col = "", lwd = 1)
abline(run2, col = "", lwd = 1)
abline(run3, col = "", lwd = 1)
# ...
abline(runn, col = "", lwd = 1)
# Then to get each LINEST equation:
summary(run1)
summary(run2)
summary(run3)
# ...
summary(runn)
Each time I use summary(), I copy the slope and paste it into an Excel sheet, along with the corresponding scan temperature, which I have recorded separately. I then graph the conductance v. temperature points for the film as an X-Y scatter with smooth lines to give the temperature-dependent conductance curve. This gives me a single plot of the LINEST lines in R and the conductance v. temperature curve in Excel.
This technique is actually MUCH quicker than doing it all in Excel, but it could surely be done more quickly and efficiently! Also, if I need to change something, this entire process needs to be re-executed with whatever change is necessary. The process takes me maybe 5 hours in Excel and 1.5 hours in R (maybe I am too slow). Nonetheless, any tips to help automate/streamline this further are greatly appreciated.
There are plenty of questions about operating on data in lists; storing a list of matrices or a list of data.frames is fast, and code that operates cleanly on one can be applied to the remaining n-1 very easily.
(Note: the way I'm showing it here is one technique: maintaining everything in well-compartmentalized lists. Others will suggest -- very justifiably -- that combining things into a single data.frame and adding a group variable (to identify from which file/experiment the data originated) will help with more advanced multi-experiment regression or combined plotting, such as with ggplot2. I'm not going to go into this latter technique here, not yet.)
The for(...) assign(..., read.table(...)) pattern has long been discouraged; you have the important part done, so this is relatively easy:
allruns <- sapply(list.files(pattern = "*.txt"), read.table, simplify = FALSE)
(The use of sapply(..., simplify=FALSE) is similar to lapply(...), but it has a nice side-effect of naming the individual list-ified elements with, in this case, each filename. It may not be critical here but is quite handy elsewhere.)
Fitting a linear model to each run is then simple enough:
allLMs <- lapply(allruns, function(mdl) lm(V2 ~ V1, data = mdl))
I'm using each table's V1 here instead of a once-extracted x. Though you might wonder why, I argue for keeping it like this for two reasons: (1) just in case the V1 variable is ever even one row different, this will save you; (2) it is very easy to construct the model like this.
At this point, each object within allLMs is an lm object, meaning we might do:
summary(allLMs[[1]])
Plotting: I think I understand why you are using par(new=TRUE), and I have to laugh ... I had been deep in R for a while before I started using that technique. What I think you need is actually much simpler:
xlim <- rev(range(allruns[[1]]$V1))
ylim <- range(sapply(allruns, `[`, "V2"))
# this next plot just sets the box and axes, no points
plot(NA, type = "n", xlim = xlim, ylim = ylim)
# no need to plot points with "transparent" ...
ign <- sapply(allLMs, abline, col = "red") # pick a colour; and other abline options ...
Copying all models into Excel, again, using lists:
out <- do.call(rbind, lapply(allLMs, function(m) summary(m)$coefficients[,1]))
This will now be a single matrix with one row per model and the intercept and slope in two columns. (Feel free to use similar techniques to extract the other model summary attributes, including std err, t.value, or Pr(>|t|) (in the $coefficients); or $r.squared, $adj.r.squared, etc.)
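For example, a small sketch (not from the original answer) of pulling a couple of those other attributes out of every model at once, sticking with the same list-based approach:
# r-squared for every fit, named by the originating file
rsq <- sapply(allLMs, function(m) summary(m)$r.squared)
# standard error of the slope, one per model
slope_se <- sapply(allLMs, function(m) summary(m)$coefficients["V1", "Std. Error"])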
write.table(out, file = "clipboard", sep = "\t", col.names = NA)
and paste into Excel. (Or, better yet, save it to a CSV file and import that, since you might want to keep it around.)
One of the tricks to using lists for this is to persevere: keep things in lists as long as you can, so that you don't have to deal with models individually. One mantra is that if you do it once, you shouldn't have to type it again; just loop/apply/map/whatever. Don't extract too much from the lists before you have to.
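And if you want the final conductance-versus-temperature curve in R rather than Excel, here is a minimal sketch building on the out matrix above; the temperature values are hypothetical and must be supplied in the same order as the files:
# temperatures recorded separately, one per run, in file order (example values)
temps <- c(298, 303, 308)
# the slope column of the coefficient matrix is the conductance for each run
conductance <- out[, "V1"]
plot(temps, conductance, type = "b",
     xlab = "temperature", ylab = "conductance (A/V)",
     main = "Temperature-dependent conductance")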
Note: r2evans' answer provides good general advice and doesn't require heavy package dependencies. But it probably doesn't hurt to see alternative strategies.
The tidyverse can be quite handy for this sort of thing; here's a dummy example for illustration:
library(tidyverse)
# creating dummy data files
dummy <- function(T) {
  V <- seq(-5, 5, length = 20)
  I <- jitter(T*V + T, factor = 1)
  write.table(data.frame(V = V, I = I),
              file = paste0(T, ".txt"),
              row.names = FALSE)
}
purrr::walk(300:320, dummy)
# reading
lf <- list.files(pattern = "\\.txt")
read_one <- function(f, ...) {cbind(T = as.numeric(gsub("\\.txt", "", f)), read.table(f, ...))}
m <- purrr::map_df(lf, read_one, header = TRUE, .id="id")
head(m)
ggplot(m, aes(V, I, group = T)) +
  facet_wrap(~ T) +
  geom_point() +
  geom_smooth(se = FALSE)
models <- m %>%
  split(.$T) %>%
  map(~ lm(I ~ V, data = .))
coefs <- models %>% map_df(broom::tidy, .id = "T")
ggplot(coefs, aes(as.numeric(T), estimate)) +
  geom_line() +
  facet_wrap(~ term, scales = "free")

geom_map fails with GeoJSON map simplified with gSimplify

I'm constructing world maps with countries color-filled according to a (continuous) value from a column in a data frame called temp.sp. I want to put several of these maps in one graph. I construct each map using ggplot with geom_map and then arrange and display the graphs using multiplot(), which uses grid code.
I'm using a GeoJSON map (world <- readOGR(dsn = "ne_50m_admin_0_countries.geojson", layer = "OGRGeoJSON")). The resulting SpatialPolygonsDataFrame is 4.1 Mb and the dataframe that results from worldMap <- broom::tidy(world, region = "iso_a3") has 93391 rows. So when I run multiplot with 4 plot files, it takes a long time.
I thought that I could speed up the printing by simplifying the world map with gSimplify, using code like world.simp <- gSimplify(world, tol = .1, topologyPreserve = TRUE). The resulting data frame, worldMap.simp, only has 27033 rows, but when I use this map I get the error message Error in unit(x, default.units) : 'x' and 'units' must have length > 0.
The error message is generated when I run this code with worldMap.simp. When I use worldMap I have no problems.
gg <- ggplot(temp.sp, aes(map_id = id))
gg <- gg + geom_map(aes(fill = temp.sp$value), map = worldMap.simp, color = "white")
I tried converting temp.sp$value to factor but it made no difference.
To summarize, using a gSimplified map causes the displaying of a graph produced with ggplot and geom_map to fail.
Rather than try to figure out what was going wrong with gSimplify, I found and downloaded a lower resolution map from http://geojson.xyz. The one I'm currently using is
https://d2ad6b4ur7yvpq.cloudfront.net/naturalearth-3.3.0/ne_110m_admin_0_countries.geojson
Note that it has a similar filename, but with 110m instead of 50m.
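A rough sketch of swapping in that lower-resolution file, reusing the same readOGR / broom::tidy steps from the question (this assumes the geojson has been downloaded locally and that temp.sp has the id and value columns used in the original code):
# same workflow as before, just pointing at the 110m file instead of the 50m one
world <- rgdal::readOGR(dsn = "ne_110m_admin_0_countries.geojson", layer = "OGRGeoJSON")
worldMap <- broom::tidy(world, region = "iso_a3")
gg <- ggplot(temp.sp, aes(map_id = id))
gg <- gg + geom_map(aes(fill = value), map = worldMap, color = "white")
gg <- gg + expand_limits(x = worldMap$long, y = worldMap$lat)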

Stacked bar in R

I have a table exported in csv from PostgreSQL and I'd like to create a stacked bar graph in R. It's my first project in R.
Here's my data and what I want to do:
It's the quality of the feeder bus service for a certain provider in the area. For each train user, we assign a service quality based on the synchronization between the bus and the train at the train stations, and we calculate the percentage of users that have an ideal or very good service, a correct service, a deficient service, or no service at all (linked to that question on gis.stackexchange).
So, I'd like to use my first column as my x-axis labels and my headers as my categories. The data is already normalized to 100% for each row.
In Excel, it's a couple of clicks, and I wouldn't mind typing a couple of lines of code since it's the final result of an already quite long plpgsql script... I'd prefer to continue to code instead of moving to Excel (I also have dozens of these to do).
So, I tried to create a stacked bar using the examples in Nathan Yau's "Visualize This" and the book "R in Action" and wasn't quite successful. Normally, their examples use data that they aggregate with R and use that. Mine is already aggregated.
So, I've finally come up with something that works in R:
but I had to transform my data quite a bit:
I had to transpose my table and remove my now-row (ex-column) identifier.
Here's my code:
# load libraries
library(ggplot2)
library(reshape2)
# load data
stl <- read.csv("D:/TEMP/rabat/_stl_rabattement_stats_mtl.csv", sep=";", header=TRUE)
# reshape for plotting
stl_matrix <- as.matrix(stl)
# make a quick plot
barplot(stl_matrix, border=NA, space=0.1, ylim=c(0, 100), xlab="Trains", ylab="%",
main="Qualité du rabattement, STL", las = 3)
Is there any way that I could use my original csv and have the same result?
I'm a little lost here...
Thanks!!!!
Try the ggplot2 and reshape2 libraries. You should be able to get the chart you want with:
stl$train_order <- as.numeric(rownames(stl))
stl.r <- melt(stl, id.vars = c("train_no", "train_order"))
stl.r$train_no <- factor(
  stl.r$train_no,
  levels = stl$train_no[order(stl$train_order)])
ggplot(stl.r, aes(x = factor(train_no), y = value, fill = variable)) +
  geom_bar(stat = 'identity')
It appears that you transposed the matrix manually. This can be done in R with the t() function.
Add the following line after the as.matrix(stl) line:
stl_matrix <- t(stl_matrix)

How to create an animation of geospatial / temporal data

I have a set of data which contains around 150,000 observations of 800 subjects. Each observation has: subject ID, latitude, longitude, and the time that the subject was at those coordinates. The data covers a 24-hour period.
If I plot all the data at once I just get a blob. Is anyone able to give me some tips as to how I can animate this data so that I can observe the paths of the subjects as a function of time?
I've read the spacetime vignette but I'm not entirely sure it will do what I want. At this point I'm spending a whole lot of time googling but not really coming up with anything that meets my needs.
Any tips and pointers greatly appreciated!
Here is my first use of the animation package. It was easier than I anticipated, and the saveHTML function in particular is really amazing. Here is my scenario (though I think my R code will be clearer):
I generate some data.
I plot a basic plot for all persons as a background plot.
I reshape the data to a wide format so that I can plot an arrow between the present and the next position for each person.
I loop over the hours to generate many plots, and I put the loop inside the powerful saveHTML function.
You get an html file with a nice animation. I show one intermediate plot here.
Here is my code:
library(animation)
library(ggplot2)
library(grid)
## creating some data of hours
N.hour <- 24
dat <- data.frame(person = rep(paste0('p', 1:3), N.hour),
                  lat    = sample(1:10, 3*N.hour, rep = TRUE),
                  long   = sample(1:10, 3*N.hour, rep = TRUE),
                  time   = rep(1:N.hour, each = 3))
# the base plot with all persons
base <- ggplot() +
  geom_point(data = dat, aes(x = lat, y = long, colour = person),
             size = 5) +
  theme(legend.position = "none")
## reshape data to lat and long formats
library(plyr)
dat.segs <- ddply(dat, .(person), function(x){
  dd <- do.call(rbind,
                lapply(seq(N.hour - 1),
                       function(y) c(y, x[x$time %in% c(y, y+1), ]$lat,
                                     x[x$time %in% c(y, y+1), ]$long)))
  dd
})
colnames(dat.segs) <- c('person', 'path', 'x1', 'x2', 'y1', 'y2')
# a function to create the animation
oopt <- ani.options(interval = 0.5)
saveHTML({
  print(base)
  interval = ani.options("interval")
  for (hour in seq(N.hour - 1)) {
    # a segment for each time
    tn <- geom_segment(aes(x = x1, y = y1, xend = x2,
                           yend = y2, colour = person),
                       arrow = arrow(), inherit.aes = FALSE,
                       data = subset(dat.segs, path == hour))
    print(base <- base + tn)
    ani.pause()
  }
}, img.name = "plots", imgdir = "plots_dir",
   htmlfile = "random.html", autobrowse = FALSE,
   title = "Demo of animated lat/long for different persons",
   outdir = getwd())
Your question is a bit vague, but I will share how I have done this kind of animation in the past.
Create a function that plots all the subject locations for one time slice:
plot_time = function(dataset, time_id) {
  # make a plot with your favorite plotting package (e.g. `ggplot2`)
  # save it as a file on disk (e.g. using `ggsave`), under a regular name,
  # frame001.png, frame002.png, ..., see sprintf('frame%03d.png', time_id)
}
Call this function on each of your timeslices, e.g. using lapply:
lapply(start_time_id:stop_time_id, function(i) plot_time(dataset, i))
leading to a set of graphics files on the hard drive called frame001 to framexxx.
Use a tool to render those frames into a movie, for example ffmpeg.
This is a general workflow, which has already been implemented in the animation package (thanks for reminding me, @mdsummer). You can probably leverage that package to get your animation.
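A minimal sketch of that frame-by-frame workflow using ggplot2 and ggsave; the data frame dat and its columns subject, lat, long and time (hour of day) are assumptions for illustration, not from the original question:
library(ggplot2)
plot_time <- function(dataset, time_id) {
  p <- ggplot(subset(dataset, time == time_id),
              aes(x = long, y = lat, colour = subject)) +
    geom_point() +
    ggtitle(paste("Hour", time_id)) +
    theme(legend.position = "none")
  # write one numbered frame per time slice
  ggsave(sprintf("frame%03d.png", time_id), p, width = 6, height = 6)
}
# one frame per hour of the 24-hour period
lapply(1:24, function(i) plot_time(dat, i))
# then stitch the frames into a movie, e.g.:
# ffmpeg -framerate 4 -i frame%03d.png animation.mp4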
