Displaying counts instead of "levels" using stat_density2d - r

My objective is to portray the locations with varying numbers of traffic conflicts in a road intersection. My data consists of all the conflicts that we observed in a given time period at an intersection, coded into a .CSV file with the following fields: "time of conflict", "TTC" (Time to Collision), "Lat", "Lon" and "Conflict Type". I figured the best way to do so would be to use ggmap together with stat_density2d() in R. I am using the following code:
library(ggmap)
library(RColorBrewer) # brewer.pal()
df = read.csv(filename, header = TRUE)
int.map = get_map(location = c(mean.long, mean.lat), zoom = 20, maptype = "satellite")
int.map = ggmap(int.map, extent = "device", legend = "right")
# stat_density2d() exposes its computed contour variable as ..level..
int.map + stat_density2d(data = new_xdf, aes(x, y, fill = ..level.., alpha = ..level..),
geom = "polygon") +
scale_fill_gradientn(guide = "colourbar", colours = rev(brewer.pal(7, "Spectral")),
name = "Conflict Density")
The output is a very nice safety heat map that correctly portrays the conflict hotspots. My problem is that the legend shows the values of "level" automatically calculated by the stat_density2d() function. I tried searching for a way to display, say, the counts of all conflict points inside each level on the legend bar, but to no avail.
I did find the link below, which handles a similar question, but the problem is that it creates a new data frame (new_xdf) with many more points than the original data. The counts determined in that program therefore seem to be of no use to me, as I want the exact number of conflict points in my original data to be displayed in the legend bar.
How to find points within contours in R?
Thanks in advance.
Edit: Link to a sample data file
https://docs.google.com/spreadsheets/d/11vc3lOhzQ-tgEiAXe-MNw2v3fsAqnadweVrvBdNyNuo/edit?usp=sharing
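One possible way to get counts from the original points, offered only as a rough sketch: estimate the density on a grid with MASS::kde2d() (the estimator ggplot2 uses behind stat_density2d()), look up the density at each observed point, and count how many points sit at or above each level. A point lies inside the contour for a level when the estimated density at that point is at least that level. The data frame pts, the grid size and the choice of levels are assumptions made for illustration. The bandwidth and levels would have to match the ones used in the plot for the counts to line up with the legend.

```r
library(MASS)  # kde2d(); MASS ships with standard R installations

set.seed(1)
# Hypothetical stand-in for the conflict data (columns Lon and Lat)
pts <- data.frame(Lon = rnorm(200), Lat = rnorm(200))

# 2D kernel density estimate on a 100 x 100 grid spanning the data range
dens <- kde2d(pts$Lon, pts$Lat, n = 100)

# Find the grid cell containing each original point and read its density
ix <- findInterval(pts$Lon, dens$x, all.inside = TRUE)
iy <- findInterval(pts$Lat, dens$y, all.inside = TRUE)
pt_dens <- dens$z[cbind(ix, iy)]

# Pick contour levels and count the original points at or above each one
lv <- pretty(range(dens$z), 7)
counts <- sapply(lv, function(l) sum(pt_dens >= l))
data.frame(level = lv, n_points = counts)
```

The counts are cumulative (points inside the outermost contour include those inside the inner ones). Differencing consecutive counts would give the number of points in each band between two levels.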

Related

Formatting changes affect only legend and not bar graph using swimplot and ggplot2 packages

Update- this issue was solved, updated code is at the end of the post.
I am trying to create a swimmer plot to visualize individual patient duration of treatment with a drug administered at multiple dose levels (DLs). Each patient will be assigned to treatment with only one DL, but multiple patients can be assigned to a given DL (e.g. 3 patients at DL1, 3 patients at DL2, etc.). I would like to color code the bars in the swimmer plot according to DL.
I am using the swimplot package for R and have been following the guide located here (https://cran.r-project.org/web/packages/swimplot/vignettes/Introduction.to.swimplot.html).
This guide has been sufficient for most things I have tried, up until I tried to change the colors of the bars in the plot and corresponding legend. Following the section in that guide titled "Modifying Colours and shapes" under "Making the plots more aesthetically pleasing with ggplot manipulations", I was able to change the bar colors in the legend, but not the bars themselves.
Example here
I have been using the following code.
library(ggplot2)
library (swimplot)
library (gdata)
library (readxl)
ClinicalTrial.Arm <- read_excel("Swimmer_Test_Data1.xls")
ClinicalTrial.Arm <- as.data.frame(ClinicalTrial.Arm)
arm_plot <- swimmer_plot(df=ClinicalTrial.Arm,id='id',end='End_trt',width=.85) +
scale_fill_manual(name="Arm",values=c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600")) +
scale_color_manual(name="Arm",values=c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600"))
arm_plot
I have tried a number of things to fix this, but am quite new to R and don't think I really know enough to troubleshoot effectively. I have tried various syntax changes (e.g. removing quotation marks) and have tried using the geom bar command but wasn't sure how/what to map to X and Y (it also seems like I shouldn't need to do this).
I have also tried using the following code, but get an error.
Colors <- c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600")
arm_plot <- swimmer_plot(df=ClinicalTrial.Arm,id='id',end='End_trt',width=.85, fill = Colors)+ scale_fill_manual(name="Arm",values=c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600"))+ scale_color_manual(name="Arm",values=c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600"))
Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (20): fill
Run `rlang::last_error()` to see where the error occurred.
Any help here would be greatly appreciated.
Solved! Updated, working code
library(ggplot2)
library (swimplot)
library (gdata)
library (readxl)
ClinicalTrial.Arm <- read_excel("Swimmer_Test_Data1.xls")
ClinicalTrial.Arm <- as.data.frame(ClinicalTrial.Arm)
Colors <- c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600")
arm_plot <- swimmer_plot(df=ClinicalTrial.Arm,id='id',end='End_trt', name_fill = "Arm", width=.85) + scale_fill_manual(name="Arm",values = Colors) +
scale_color_manual(name="Arm",values=Colors)
To make your code work, you first have to map a variable to the fill aesthetic. With swimplot this can be done via the name_fill argument:
Note: As I use the ClinicalTrial.Arm dataset from the swimplot package I adjusted your color palette to make it work with the three categories of the Arm column in this dataset.
library(ggplot2)
library(swimplot)
#pal <- c("DL1" = "#003f5c", "DL2" = "#374c80", "DL3" = "#7a5195", "DL4" = "#bc5090", "DL5" = "#ef5675", "DL6" = "#ff764a", "DL7" = "#ffa600")
pal <- c("Arm A" = "#003f5c", "Arm B" = "#bc5090", "Off Treatment" = "#ffa600")
swimmer_plot(df = ClinicalTrial.Arm, id = "id", end = "End_trt", name_fill = "Arm", width = .85) +
scale_fill_manual(name = "Arm", values = pal)
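The same principle can be seen in plain ggplot2, without swimplot: scale_fill_manual() only recolours bars when a variable is mapped to fill. The following is a minimal sketch using a made-up data frame (toy, with hypothetical patients and dose levels), not the data from the question.

```r
library(ggplot2)

# Hypothetical toy data: one bar per patient, each assigned a dose level
toy <- data.frame(id = c("p1", "p2", "p3", "p4"),
                  weeks = c(10, 14, 6, 9),
                  DL = c("DL1", "DL1", "DL2", "DL3"))
pal <- c("DL1" = "#003f5c", "DL2" = "#7a5195", "DL3" = "#ffa600")

# fill is mapped to DL inside aes(), so scale_fill_manual() has a
# variable to translate into colours
p <- ggplot(toy, aes(x = id, y = weeks, fill = DL)) +
  geom_col(width = 0.85) +
  coord_flip() +
  scale_fill_manual(name = "Dose level", values = pal)

# Inspect the fill colours ggplot2 actually assigned to the bars
built_fill <- ggplot_build(p)$data[[1]]$fill
unique(built_fill)
```

Dropping fill = DL from aes() leaves the bars in the default colour, whatever the manual scale says. That is consistent with the behaviour described in the question.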

How to fix incorrect automatic labels on a map (tmap)

I have a point shapefile with cities (CitiesPoints), and a dataframe that assigns a number of libraries to some of those cities (df; the data is fictitious). I also have a polygon shapefile for the background.
I joined those files to create a map in which a point is generated for each city that has libraries, and the size of the point is determined by the number of libraries it has.
library(dplyr) # %>%, left_join()
library(tmap)
df$CityCode <- as.factor(df$CityCode)
Joint <- CitiesPoints %>%
left_join(df, by=c("link"="CityCode"))
tmap_mode("view")
tm_shape(Background) +
tm_borders() +
tm_shape(Joint) + tm_symbols(id = "localidad",
size = "BIBLIO",
col = "brown1")
However, when I hover over those points with the mouse, the city name shown is incorrect.
Apparently, the top rows in the shape file (including those with no libraries, NA) are the ones being used to assign the labels.
Example
The correct label for this point should be “Rafaela”.
You can download the files I used here: Files
I would really appreciate the help!
I found a way to fix it. I created a new shapefile containing only the rows corresponding to the cities that have libraries.
Joint$BIBLIO[is.na(Joint$BIBLIO)] <- 0
JOINT2 = filter(Joint,BIBLIO>0)
Using this new shapefile, the automatic labels shown are now correct.
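The effect of that filtering step can be checked on plain data frames. This is only an illustrative sketch: cities and libs are hypothetical stand-ins for the attribute table of CitiesPoints and for df, and every city name other than Rafaela is made up.

```r
# Hypothetical stand-ins for the city attribute table and the library counts
cities <- data.frame(link = c("1", "2", "3", "4"),
                     localidad = c("Rafaela", "Santa Fe", "Rosario", "Esperanza"))
libs <- data.frame(CityCode = c("1", "3"), BIBLIO = c(5, 12))

# Left join: cities without a match get NA in BIBLIO
joint <- merge(cities, libs, by.x = "link", by.y = "CityCode", all.x = TRUE)

# Keep only the cities that actually have libraries, so no NA rows
# are passed on to the plotting step
joint2 <- joint[!is.na(joint$BIBLIO) & joint$BIBLIO > 0, ]
as.character(joint2$localidad)
```

After the filter, every remaining row has both a name and a library count, which matches what the fix above relies on.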

Map plot incomplete when using rworldmap

I have some data that I'd like to plot with rworldmap. Normally this works well. But I can't figure out why it's not plotting all the data when it says it's going to. Particularly it's not plotting data for the US.
I've got some data here: https://drive.google.com/file/d/1Fp7O2TRH5Blar56SqdRdcPh8Mb1Vb0pc/view?usp=sharing
And I'm running this code:
library(rworldmap) # joinCountryData2Map(), mapCountryData()
mergedData = readRDS("sampleData.rds")
changeHeatMapPalette = c('#D7191D', '#FDAE61', '#FFFFBF', '#ABD9E9', '#2C7BB6')
mapData = joinCountryData2Map(mergedData, joinCode="ISO2", nameJoinColumn="country", mapResolution = "high")
mapCountryData(mapData, nameColumnToPlot="change", mapTitle="", catMethod = "diverging", colourPalette = changeHeatMapPalette, numCats = 90, borderCol = "grey70")
But then I'm getting this map:
Notice how the US has no data. But it's definitely in the sample data. And it's only excluding one country, which is not the US.
108 codes from your data successfully matched countries in the map
1 codes from your data failed to match with a country code in the map
failedCodes
[1,] "GF"
143 codes from the map weren't represented in your data
Any idea what I'm doing wrong?
The problem is that you set the colourPalette and numCats parameters in a fairly arbitrary way (five colours, but numCats = 90).
From your data we know exactly how many categories there are: length(table(mapData$change)). You need exactly that many colours (if you provide fewer, mapCountryData will interpolate them with a warning).
Having said that, one solution of your problem is this
library(RColorBrewer)
# Note: brewer.pal() supports between 3 and 11 colours for 'RdYlBu'
mapCountryData(mapData,
nameColumnToPlot="change",
mapTitle="",
catMethod = "diverging",
colourPalette = brewer.pal(length(table(mapData$change)), 'RdYlBu'),
numCats = length(table(mapData$change)),
borderCol = "grey70")
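If the number of categories does not fit a ready-made palette, one option is to interpolate a palette of exactly the right length with colorRampPalette() from base R. This is a sketch, not part of the original answer. The vector change is a hypothetical stand-in for mapData$change, and the five base colours are the ones from the question.

```r
# Hypothetical values standing in for mapData$change
change <- c(-2, -1, -1, 0, 1, 1, 2, 3, NA)

# Number of distinct non-missing values, as length(table(...)) reports it
n_cats <- length(table(change))

# Interpolate the five colours from the question to exactly n_cats colours
base_cols <- c('#D7191D', '#FDAE61', '#FFFFBF', '#ABD9E9', '#2C7BB6')
pal <- colorRampPalette(base_cols)(n_cats)

length(pal) == n_cats
```

The resulting pal could then be passed as colourPalette, with numCats set to n_cats, so that the two arguments agree.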

spplot/lattice: objects not drawn/overdrawn

I have a grid and I want to produce a map out of this grid with some map elements (scale, north arrow, etc). I have no problem drawing the grid and the coloring I need, but the additional map elements won't show on the map. I tried putting first=TRUE to the sp.layout argument according to the sp manual, but still no success.
I reproduced the issue with the integrated meuse dataset, so you may just copy&paste that code. I use those package versions: lattice_0.20-33 and sp_1.2-0
library(sp)
library(lattice) # required for trellis.par.set():
trellis.par.set(sp.theme()) # sets color ramp to bpy.colors()
alphaChannelSupported = function() {
!is.na(match(names(dev.cur()), c("pdf")))
}
data(meuse)
coordinates(meuse)=~x+y
data(meuse.riv)
library(gstat, pos = match(paste("package", "sp", sep=":"), search()) + 1)
data(meuse.grid)
coordinates(meuse.grid) = ~x+y
gridded(meuse.grid) = TRUE
v.uk = variogram(log(zinc)~sqrt(dist), meuse)
uk.model = fit.variogram(v.uk, vgm(1, "Exp", 300, 1))
meuse[["ff"]] = factor(meuse[["ffreq"]])
meuse.grid[["ff"]] = factor(meuse.grid[["ffreq"]])
zn.uk = krige(log(zinc)~sqrt(dist), meuse, meuse.grid, model = uk.model)
zn.uk[["se"]] = sqrt(zn.uk[["var1.var"]])
meuse.sr = SpatialPolygons(list(Polygons(list(Polygon(meuse.riv)),"meuse.riv")))
rv = list("sp.polygons", meuse.sr, fill = "lightblue")
sampling = list("sp.points", meuse.riv, color = "black")
scale = list("SpatialPolygonsRescale", layout.scale.bar(),
offset = c(180500,329800), scale = 500, fill=c("transparent","black"), which = 4)
text1 = list("sp.text", c(180500,329900), "0", cex = .5, which = 4)
text2 = list("sp.text", c(181000,329900), "500 m", cex = .5, which = 4)
arrow = list("SpatialPolygonsRescale", layout.north.arrow(),
offset = c(181300,329800),
scale = 400, which = 4)
library(RColorBrewer)
library(lattice)
trellis.par.set(sp.theme())
precip.pal <- colorRampPalette(brewer.pal(7, name="Blues"))
spplot(zn.uk, "var1.pred",
sp.layout = list(rv, sampling, scale, text1, text2),
main = "log(zinc); universal kriging standard errors",
col.regions=precip.pal,
contour=TRUE,
col='black',
pretty=TRUE,
scales=list(draw = TRUE),
labels=TRUE)
And that's how it looks...all naked:
So my questions:
Where is the scale bar, north arrow, etc hiding? Did I miss something? Every example I could find on the internet looks similar to that. On my own dataset I can see the scale bar and north arrow being drawn at first, but as soon as the grid is rendered, it superimposes the additional map elements (except for the scale text, that is shown on the map - not the bar and north arrow for some reason I don't seem to comprehend).
The error message appearing on the map just shows when I try to add the sampling locations sampling = list("sp.points", meuse.riv, color = "black"). Without this entry, the map shows without error, but also without additional map elements. How can I show the sampling points on the map (e.g. in circles whose size depends on the absolute value of this sampling point)?
This bothered me for many, many hours by now and I can't find any solution to this. In Bivand et al's textbook (2013) "Applied Spatial Data Analysis with R" I could read the following entry:
The order of items in the sp.layout argument matters; in principle objects
are drawn in the order they appear. By default, when the object of spplot has
points or lines, sp.layout items are drawn before the points to allow grids
and polygons drawn as a background. For grids and polygons, sp.layout
items are drawn afterwards (so the item will not be overdrawn by the grid
and/or polygon). For grids, adding a list element first = TRUE ensures that
the item is drawn before the grid is drawn (e.g. when filled polygons are added). Transparency may help when combining layers; it is available for the
PDF device and several other devices.
Function sp.theme returns a lattice theme that can be useful for plots
made by spplot; use trellis.par.set(sp.theme()) after a device is opened
or changed to make this effective.
However, also with this additional information I wasn't able to solve this problem. Glad for any hint!
The elements you miss are being drawn in panel four, which does not exist, so are not being drawn. Try removing the which = 4.
meuse.riv in your example is a matrix, which causes the error message, but should be a SpatialPoints object, so create sampling by:
sampling = list("sp.points", SpatialPoints(meuse.riv), color = "black")
When working from examples, my advice is to choose examples as close as possible to what you need, and only change one thing at a time.
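Putting both points together, the layout items from the question could be rewritten roughly as follows. This is a sketch under the assumption that the plot has a single panel. It drops which = 4, wraps meuse.riv in SpatialPoints(), and also adds the north arrow to the list, which the sp.layout call in the question left out.

```r
library(sp)

data(meuse.riv)  # river outline shipped with sp, stored as a plain matrix

# Same items as in the question, without which = 4, so they are drawn
# in the single panel that exists
scale <- list("SpatialPolygonsRescale", layout.scale.bar(),
              offset = c(180500, 329800), scale = 500,
              fill = c("transparent", "black"))
arrow <- list("SpatialPolygonsRescale", layout.north.arrow(),
              offset = c(181300, 329800), scale = 400)
text1 <- list("sp.text", c(180500, 329900), "0", cex = 0.5)
text2 <- list("sp.text", c(181000, 329900), "500 m", cex = 0.5)

# Wrap the coordinate matrix so that sp.points receives a Spatial object
sampling <- list("sp.points", SpatialPoints(meuse.riv), col = "black")

layout_items <- list(sampling, scale, text1, text2, arrow)
```

The list would then be passed as sp.layout = layout_items in the spplot() call, together with the rv item from the question if the river polygon is wanted as well.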

ggplot2 equivalent of 'factorization or categorization' in googleVis in R

Because the graphs produced by ggplot are static, we are moving our graphs to googleVis, which offers interactive charts. But when it comes to categorization we are running into many problems. Let me give an example to help you understand:
#dataframe
df = data.frame( x = sample(1:100), y = sample(1:100), cat = sample(c('a','b','c'), 100, replace=TRUE) )
ggplot2 provides parameters like alpha, colour, linetype and size, which we can use with categories as shown below:
ggplot(df) + geom_line(aes(x = x, y = y, colour = cat))
Not just line chart, but majority of ggplot2 graphs provide categorization based on column values. Now I would like to do the same in googleVis, based on value df$cat I would like parameters to get changed or grouping of line or charts.
Note:
I have already tried dcast to make multiple columns based on the category column and use those columns as Y input, but that is not what I would like to do.
Can anyone help me regarding this?
Let me know if you need more information.
vrajs5 you are not alone! We struggled with this issue. In our case we wanted to fill bar charts like in ggplot. This is the solution. You need to add specifically named columns, linked to your variables, to your data table for googleVis to pick up.
In my fill example, these are called roles, but once you see my syntax you can abstract it to annotations and other cool features. Google has them all documented here (check out superheroes example!) but it was not obvious how it applied to r.
@mages has this documented on this webpage, which shows features not in demo(googleVis):
http://cran.r-project.org/web/packages/googleVis/vignettes/Using_Roles_via_googleVis.html
EXAMPLE ADDING NEW DIMENSIONS TO GOOGLEVIS CHARTS
# in this case
# How do we fill a bar chart showing bars depend on another variable?
# We wanted to show C in a different fill to other assets
suppressPackageStartupMessages(library(googleVis))
library(data.table) # You can use data frames if you don't like DT
test.dt = data.table(px = c("A","B","C"), py = c(1,4,9),
"py.style" = c('silver', 'silver', 'gold'))
# Add your modifier to your chart as a new variable e.g. py.style
test <-gvisBarChart(test.dt,
xvar = "px",
yvar = c("py", "py.style"),
options = list(legend = 'none'))
plot(test)
We have shown py.style deterministically here, but you could code it to be dependent on your categories.
The secret is myvar.googleVis_thing_youneed linking the variable myvar to the googleVis feature.
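To make the style column depend on a category rather than typing it by hand, one simple approach is a named look-up vector. This is a sketch with a hypothetical cat column and arbitrarily chosen colours. It only builds the data frame and uses base R.

```r
# Hypothetical data: one row per bar, with a category column
df <- data.frame(px = c("A", "B", "C"), py = c(1, 4, 9),
                 cat = c("a", "a", "b"))

# Look-up table from category to colour (colours chosen for illustration)
style_map <- c(a = "silver", b = "gold")

# Role column named <yvar>.style, following the naming used in the example above
df$py.style <- unname(style_map[as.character(df$cat)])
df[, c("px", "py", "py.style")]
```

The resulting data frame can be passed to gvisBarChart() with xvar = "px" and yvar = c("py", "py.style"), as in the example above.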
RESULT BEFORE FILL (yvar = "py")
RESULT AFTER FILL (yvar = c("py", "py.style"))
Take a look at mages examples (code also on Github) and you will have cracked the "categorization based on column values" issue.
