I only recently learned of tibble::lst, which creates a list object but automatically names list items. I'm using this as a shortcut within a %>% workflow that makes use of the names as the .id argument in map_dfr, so the automatic naming is really helpful.
However, the names are coming in with quotation marks around them. I noticed this because they printed awkwardly in axis tick labels in a ggplot, i.e. I had a label reading "Hartford" (quotes included) instead of Hartford.
I looked through the issues on the tidyverse/tibble GitHub but didn't find anything. Is this a bug, or am I doing something wrong?
library(dplyr)
library(purrr)
cities <- lst("New Haven", "Bridgeport", "Hartford")
cities
#> $`"New Haven"`
#> [1] "New Haven"
#>
#> $`"Bridgeport"`
#> [1] "Bridgeport"
#>
#> $`"Hartford"`
#> [1] "Hartford"
cities %>%
map_dfr(~tibble(dummy = rnorm(1)), .id = "city")
#> # A tibble: 3 x 2
#> city dummy
#> <chr> <dbl>
#> 1 "\"New Haven\"" -0.956
#> 2 "\"Bridgeport\"" 0.533
#> 3 "\"Hartford\"" -0.0553
At first I thought it might be escaping the space in "New Haven", but it happens with single characters as well:
lst("a", "b", "c")
#> $`"a"`
#> [1] "a"
#>
#> $`"b"`
#> [1] "b"
#>
#> $`"c"`
#> [1] "c"
It works as I expect when I provide names, but that defeats the advantage lst() has over base list().
lst(a = "a", b = "b", c = "c")
#> $a
#> [1] "a"
#>
#> $b
#> [1] "b"
#>
#> $c
#> [1] "c"
Pretty sure I'm up to date on tidyverse-related packages, but here's my session info just in case:
sessionInfo()
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS High Sierra 10.13.6
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] purrr_0.2.5 dplyr_0.7.6
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_0.12.18 knitr_1.20 bindr_0.1.1 magrittr_1.5
#> [5] tidyselect_0.2.4 R6_2.2.2 rlang_0.2.2 fansi_0.3.0
#> [9] stringr_1.3.1 tools_3.5.1 utf8_1.1.4 cli_1.0.0
#> [13] htmltools_0.3.6 yaml_2.2.0 assertthat_0.2.0 rprojroot_1.3-2
#> [17] digest_0.6.16 tibble_1.4.2 crayon_1.3.4 bindrcpp_0.2.2
#> [21] glue_1.3.0 evaluate_0.11 rmarkdown_1.10 stringi_1.2.4
#> [25] compiler_3.5.1 pillar_1.3.0 backports_1.1.2 pkgconfig_2.0.2
lst() is really meant to be used with variables, e.g.:
xa <- "a"
xb <- "b"
xc <- "c"
lst(xa, xb, xc)
# $xa
# [1] "a"
# $xb
# [1] "b"
# $xc
# [1] "c"
It doesn't play well with literal, unnamed values: it takes each element's name from the unevaluated expression you pass in, so if you pass a character literal, the deparsed expression still includes the quotes. I think you just want list() here, possibly with names:
cities <- list("New Haven", "Bridgeport", "Hartford")
names(cities) <- unname(cities)
cities
# $`New Haven`
# [1] "New Haven"
# $Bridgeport
# [1] "Bridgeport"
# $Hartford
# [1] "Hartford"
Or just write your own helper function:
nlist <- function(...) {
  setNames(list(...), c(...))
}
cities <- nlist("New Haven", "Bridgeport", "Hartford")
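Plugging nlist() back into the original workflow, a quick sketch (same dummy map_dfr() call and libraries as in the question; the names are now quote-free):
cities <- nlist("New Haven", "Bridgeport", "Hartford")
cities %>%
  map_dfr(~ tibble(dummy = rnorm(1)), .id = "city")
# the city column now reads New Haven, Bridgeport, Hartford (no embedded quotes)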
Please have a look at the reprex at the end of the post.
I need to read a column as a string, perform several manipulations, and then convert it to a numeric column.
The blanks ("") in the string column give me a headache because arrow does not convert them to numeric missing values (NA).
Does anybody know how to achieve that?
Many thanks
library(tidyverse)
library(arrow)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
df <- tibble(x = rep(c("4000 -", "6000 -", "", "8000 - "), 10),
             y = seq(1, 10, length = 40))
write_csv(df, "test_string.csv")
data <- open_dataset("test_string.csv",
                     format = "csv",
                     skip = 1,
                     schema = schema(x = string(), y = double()))
data2 <- data |>
  mutate(x = sub(" -.*", "", x)) |>
  mutate(x2 = as.numeric(x)) |>
  collect() ## how to convert the blank to a numeric NA?
#> Error in `collect()`:
#> ! Invalid: Failed to parse string: '' as a scalar of type double
#> Backtrace:
#> ▆
#> 1. ├─dplyr::collect(mutate(mutate(data, x = sub(" -.*", "", x)), x2 = as.numeric(x)))
#> 2. └─arrow:::collect.arrow_dplyr_query(mutate(mutate(data, x = sub(" -.*", "", x)), x2 = as.numeric(x)))
#> 3. └─base::tryCatch(...)
#> 4. └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#> 5. └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#> 6. └─value[[3L]](cond)
#> 7. └─arrow:::augment_io_error_msg(e, call, schema = x$.data$schema)
#> 8. └─rlang::abort(msg, call = call)
sessionInfo()
#> R version 4.2.2 (2022-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Debian GNU/Linux 11 (bullseye)
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.13.so
#>
#> locale:
#> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
#> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
#> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] arrow_10.0.0 forcats_0.5.2 stringr_1.4.1 dplyr_1.0.10
#> [5] purrr_0.3.5 readr_2.1.3 tidyr_1.2.1 tibble_3.1.8
#> [9] ggplot2_3.4.0 tidyverse_1.3.2
#>
#> loaded via a namespace (and not attached):
#> [1] lubridate_1.9.0 assertthat_0.2.1 digest_0.6.30
#> [4] utf8_1.2.2 R6_2.5.1 cellranger_1.1.0
#> [7] backports_1.4.1 reprex_2.0.2 evaluate_0.17
#> [10] httr_1.4.4 highr_0.9 pillar_1.8.1
#> [13] rlang_1.0.6 googlesheets4_1.0.1 readxl_1.4.1
#> [16] R.utils_2.12.1 R.oo_1.25.0 rmarkdown_2.17
#> [19] styler_1.8.0 googledrive_2.0.0 bit_4.0.4
#> [22] munsell_0.5.0 broom_1.0.1 compiler_4.2.2
#> [25] modelr_0.1.9 xfun_0.34 pkgconfig_2.0.3
#> [28] htmltools_0.5.3 tidyselect_1.2.0 fansi_1.0.3
#> [31] crayon_1.5.2 tzdb_0.3.0 dbplyr_2.2.1
#> [34] withr_2.5.0 R.methodsS3_1.8.2 grid_4.2.2
#> [37] jsonlite_1.8.3 gtable_0.3.1 lifecycle_1.0.3
#> [40] DBI_1.1.3 magrittr_2.0.3 scales_1.2.1
#> [43] vroom_1.6.0 cli_3.4.1 stringi_1.7.8
#> [46] fs_1.5.2 xml2_1.3.3 ellipsis_0.3.2
#> [49] generics_0.1.3 vctrs_0.5.0 tools_4.2.2
#> [52] bit64_4.0.5 R.cache_0.16.0 glue_1.6.2
#> [55] hms_1.1.2 parallel_4.2.2 fastmap_1.1.0
#> [58] yaml_2.3.6 timechange_0.1.1 colorspace_2.0-3
#> [61] gargle_1.2.1 rvest_1.0.3 knitr_1.40
#> [64] haven_2.5.1
Created on 2022-11-07 with reprex v2.0.2
ifelse() works here as long as the classes line up (use NA_character_, not a bare logical NA); if_else() enforces that check already, so either works:
data |>
mutate(x = sub(" -.*", "", x)) |>
mutate(
x = ifelse(x == "", NA_character_, x), # also if_else works
x2 = as.numeric(x)
) |>
collect()
# # A tibble: 40 x 3
# x y x2
# <chr> <dbl> <dbl>
# 1 4000 1 4000
# 2 6000 1.23 6000
# 3 NA 1.46 NA
# 4 8000 1.69 8000
# 5 4000 1.92 4000
# 6 6000 2.15 6000
# 7 NA 2.38 NA
# 8 8000 2.62 8000
# 9 4000 2.85 4000
# 10 6000 3.08 6000
# # ... with 30 more rows
Alternatively, try read_csv() instead of open_dataset():
library(readr)
data <- read_csv("test_string.csv")
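Since read_csv() treats "" as missing by default, x comes in with NA where the blanks were, and the rest can stay in plain dplyr. A minimal sketch, reusing the file written above:
data |>
  mutate(x = sub(" -.*", "", x),
         x2 = as.numeric(x))
# x2 is numeric, with NA for the rows that were blank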
This has been asked several times in different forms, but here is a very simple example. I have used this function several times in the past, but apparently it fails miserably in the simplest case.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(purrr)
bind_df_list <- function(ll) {
  res <- map_df(ll, bind_rows)
  return(res)
}
ll <- list(seq(4), seq(4), seq(4))
dd <- bind_df_list(ll)
#> Error: Argument 1 must have names.
print(sessionInfo())
#> R version 4.0.4 (2021-02-15)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Debian GNU/Linux 10 (buster)
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.5.so
#>
#> locale:
#> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
#> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
#> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] purrr_0.3.4 dplyr_1.0.4
#>
#> loaded via a namespace (and not attached):
#> [1] knitr_1.31 magrittr_2.0.1 tidyselect_1.1.0 R6_2.5.0
#> [5] rlang_0.4.10 stringr_1.4.0 styler_1.3.2 highr_0.8
#> [9] tools_4.0.4 xfun_0.21 DBI_1.1.1 htmltools_0.5.1.1
#> [13] ellipsis_0.3.1 assertthat_0.2.1 yaml_2.2.1 digest_0.6.27
#> [17] tibble_3.0.6 lifecycle_1.0.0 crayon_1.4.1 vctrs_0.3.6
#> [21] fs_1.5.0 glue_1.4.2 evaluate_0.14 rmarkdown_2.6
#> [25] reprex_1.0.0 stringi_1.5.3 compiler_4.0.4 pillar_1.4.7
#> [29] generics_0.1.0 backports_1.2.1 pkgconfig_2.0.3
Created on 2021-03-08 by the reprex package (v1.0.0)
Any idea about how to fix this?
Many thanks!
Each list element is just a vector, and per ?bind_rows the input should be a set of data frames:
... - Data frames to combine.
One option is to loop over the list with map, convert each element to a data.frame, and bind the rows with the _dfr variant:
library(purrr)
map_dfr(ll, as.data.frame.list)
Or transpose and convert to data.frame
map_dfr(ll, ~ as.data.frame(t(.x)))
Or pass a named vector, which bind_rows() can handle directly:
map_dfr(ll, ~ setNames(.x, seq_along(.x)))
Or use do.call() with rbind() from base R, since rbind() has methods for matrices and data frames:
do.call(rbind, ll)
do.call(rbind.data.frame, ll)
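For reference, a quick look at what the base R route returns (a small sketch, reusing ll from the question):
do.call(rbind, ll)
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    1    2    3    4
# [3,]    1    2    3    4
# rbind() on vectors gives a matrix; wrap it in as.data.frame() if a data frame is required
as.data.frame(do.call(rbind, ll))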
I have a shapefile of polygons that I want to use to extract raster values into a data frame. So I do that in the following code.
shp <- sf::st_read('example.shp')
r <- raster::raster('example.tif')
extract <- raster::extract(r, shp, df=TRUE)
This gives me a data frame of two columns: the numeric ID for each polygon and the associated extracted raster value. Now I would like to add the x, y coordinates for each extracted raster value. I have seen this done for point shapefiles but I am not sure how to apply it for polygon shapefile geometry.
Try this; I combined raster::extract(..., cellnumbers = TRUE) with raster::xyFromCell():
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(raster)
#> Loading required package: sp
library(giscoR)
# Dummy data
shp <-
gisco_get_nuts(country = "Netherlands",
nuts_level = 3,
epsg = 4326)[, 1]
r <- raster::getData("alt", country = "Netherlands")
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj
#> = prefer_proj): Discarded datum Unknown based on WGS84 ellipsoid in Proj4
#> definition
extract <- raster::extract(r, shp, df = TRUE, cellnumbers = TRUE)
# Order (for checking purposes)
extract <- extract[order(extract$cell),]
# Extract coordinates
xy <- xyFromCell(r, cell = extract$cell, spatial = FALSE)
# Convert to df and add cellnumber
xy <- as.data.frame(xy)
xy$cell <- extract$cell
# Merge two data frames
extract_end <- merge(extract, xy)
extract_end <- extract_end[order(extract_end$cell),]
# This is what you are looking for: extract_end is a data frame
# has values and x and y are coordinates
head(extract_end)
#> cell ID NLD_msk_alt x y
#> 1 2319 3 NA 6.620833 53.56250
#> 2 2796 3 NA 6.595833 53.55417
#> 3 2797 3 NA 6.604167 53.55417
#> 4 2798 3 NA 6.612500 53.55417
#> 5 2799 3 NA 6.620833 53.55417
#> 6 2800 3 NA 6.629167 53.55417
# Checks - can be omitted
nrow(extract) == nrow(extract_end)
#> [1] TRUE
# Check NAs in coordinates
nrow(extract_end[is.na(extract_end$x), ])
#> [1] 0
nrow(extract_end[is.na(extract_end$y), ])
#> [1] 0
# Convert to sf for checks
sfobj <- st_as_sf(extract_end, coords = c("x", "y"))
sfobj <- st_set_crs(sfobj, st_crs(4326))
# Plot as sf points
par(mar = c(3, 3, 3, 3))
plot(
sfobj[, 3],
axes = TRUE,
main = "sf points",
key.pos = 4,
breaks = "equal",
nbreaks = 100,
pal = rev(terrain.colors(100))
)
# Compare with raster plot
par(mar = c(3, 3, 3, 3))
plot(r, main = "Raster")
sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19041)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
#> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
#> [5] LC_TIME=Spanish_Spain.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] giscoR_0.2.4-9000 raster_3.4-5 sp_1.4-5 sf_0.9-7
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.6 pillar_1.4.7 compiler_4.0.3 highr_0.8
#> [5] class_7.3-18 tools_4.0.3 digest_0.6.27 evaluate_0.14
#> [9] lifecycle_1.0.0 tibble_3.0.6 lattice_0.20-41 pkgconfig_2.0.3
#> [13] rlang_0.4.10 reprex_1.0.0 DBI_1.1.1 rgdal_1.5-23
#> [17] yaml_2.2.1 xfun_0.21 e1071_1.7-4 styler_1.3.2
#> [21] stringr_1.4.0 dplyr_1.0.4 knitr_1.31 generics_0.1.0
#> [25] fs_1.5.0 vctrs_0.3.6 classInt_0.4-3 grid_4.0.3
#> [29] tidyselect_1.1.0 glue_1.4.2 R6_2.5.0 rmarkdown_2.6
#> [33] purrr_0.3.4 magrittr_2.0.1 codetools_0.2-18 backports_1.2.1
#> [37] ellipsis_0.3.1 htmltools_0.5.1.1 units_0.6-7 assertthat_0.2.1
#> [41] countrycode_1.2.0 KernSmooth_2.23-18 stringi_1.5.3 crayon_1.4.1
Created on 2021-02-19 by the reprex package (v1.0.0)
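A more compact sketch of the same idea, skipping the merge (assuming the same shp and r as above; the value column name depends on the raster layer):
ext <- raster::extract(r, shp, df = TRUE, cellnumbers = TRUE)
ext <- cbind(ext, raster::xyFromCell(r, ext$cell))
head(ext)  # ID, cell, raster value, plus x and y coordinates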
For some reason stat_density_2d() does not seem to be working right for me when specifying geom = "polygon" and I am absolutely stumped. Here is my code...
library(sf)
library(tidyverse)
library(RANN2)
library(hexbin)
library(mapproj)
options(stringsAsFactors = FALSE)
raleigh_police <- rgdal::readOGR("https://opendata.arcgis.com/datasets/24c0b37fa9bb4e16ba8bcaa7e806c615_0.geojson", "OGRGeoJSON")
raleigh_police_sf <- raleigh_police %>%
st_as_sf()
raleigh_police_sf %>%
filter(crime_description == "Burglary/Residential") %>%
st_coordinates() %>%
as_tibble() %>%
ggplot() +
stat_density_2d(aes(X, Y, fill = stat(level)), geom = "polygon")
Here is my sessionInfo()...
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 mapproj_1.2.6 maps_3.3.0 hexbin_1.27.2 RANN2_0.1 forcats_0.3.0 stringr_1.3.1 dplyr_0.7.8
[9] purrr_0.2.5 readr_1.1.1 tidyr_0.8.2 tibble_1.4.2 ggplot2_3.1.0 tidyverse_1.2.1 sf_0.7-1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 lubridate_1.7.4 lattice_0.20-38 class_7.3-14 utf8_1.1.4 assertthat_0.2.0 rprojroot_1.3-2
[8] digest_0.6.18 R6_2.3.0 cellranger_1.1.0 plyr_1.8.4 backports_1.1.2 evaluate_0.12 e1071_1.7-0
[15] httr_1.3.1 blogdown_0.9 pillar_1.3.0 rlang_0.3.0.1 lazyeval_0.2.1 readxl_1.1.0 rstudioapi_0.8
[22] rmarkdown_1.10 labeling_0.3 rgdal_1.3-6 munsell_0.5.0 broom_0.5.0 compiler_3.5.1 modelr_0.1.2
[29] xfun_0.4 pkgconfig_2.0.2 htmltools_0.3.6 tidyselect_0.2.5 bookdown_0.7 codetools_0.2-15 fansi_0.4.0
[36] crayon_1.3.4 withr_2.1.2 MASS_7.3-51.1 grid_3.5.1 nlme_3.1-137 spData_0.2.9.4 jsonlite_1.5
[43] gtable_0.2.0 DBI_1.0.0 magrittr_1.5 units_0.6-1 scales_1.0.0 cli_1.0.1 stringi_1.2.4
[50] sp_1.3-1 xml2_1.2.0 tools_3.5.1 glue_1.3.0 hms_0.4.2 yaml_2.2.0 colorspace_1.3-2
[57] classInt_0.2-3 rvest_0.3.2 knitr_1.20 bindr_0.1.1 haven_1.1.2
I just get a blank plot with nothing on it. Completely stumped. What am I doing wrong here?
Update (2018-11-18)
It turns out that the primary issue here was options(stringsAsFactors = FALSE). If you comment that out and run the original code, everything actually works fine. I found a GitHub issue that prompted me to try that. Much more efficient code solutions are provided in the answers to this question, and they also avoid options(stringsAsFactors = FALSE).
Aside from downloading the file before reading and switching readOGR to read_sf, it works as-is for me, apart from warnings about a few NA points caused by empty geometries:
library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, PROJ 4.9.3
path <- "~/Downloads/raleigh.geojson"
download.file(
"https://opendata.arcgis.com/datasets/24c0b37fa9bb4e16ba8bcaa7e806c615_0.geojson",
path,
method = "curl"
)
raleigh_police <- sf::read_sf(path, "OGRGeoJSON")
raleigh_police %>%
filter(crime_description == "Burglary/Residential") %>%
st_coordinates() %>%
as_tibble() %>%
ggplot() +
stat_density_2d(aes(X, Y, fill = stat(level)), geom = "polygon")
#> Warning: Removed 5 rows containing non-finite values (stat_density2d).
The empty rows:
raleigh_police %>%
filter(crime_description == "Burglary/Residential",
st_is_empty(.))
#> Simple feature collection with 5 features and 21 fields (with 5 geometries empty)
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> # A tibble: 5 x 22
#> OBJECTID GlobalID case_number crime_category crime_code crime_descripti…
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 205318 8057315… P14076062 BURGLARY/RESI… 30B Burglary/Reside…
#> 2 417488 70afb27… P15027702 BURGLARY/RESI… 30B Burglary/Reside…
#> 3 424718 bdf69fa… P18029113 BURGLARY/RESI… 30B Burglary/Reside…
#> 4 436550 711c05b… P18044139 BURGLARY/RESI… 30B Burglary/Reside…
#> 5 442091 9d7a008… P18051764 BURGLARY/RESI… 30B Burglary/Reside…
#> # … with 16 more variables: crime_type <chr>, reported_block_address <chr>,
#> # city_of_incident <chr>, city <chr>, district <chr>, reported_date <dttm>,
#> # reported_year <int>, reported_month <int>, reported_day <int>,
#> # reported_hour <int>, reported_dayofwk <chr>, latitude <dbl>,
#> # longitude <dbl>, agency <chr>, updated_date <dttm>, geometry <POINT [°]>
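If you want to drop those empty geometries before computing coordinates, a small sketch using the same object:
raleigh_police %>%
  filter(crime_description == "Burglary/Residential",
         !st_is_empty(geometry)) %>%
  st_coordinates() %>%
  as_tibble() %>%
  ggplot() +
  stat_density_2d(aes(X, Y, fill = stat(level)), geom = "polygon")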
sessionInfo()
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS 10.14.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] sf_0.7-1 forcats_0.3.0 stringr_1.3.1
#> [4] dplyr_0.7.99.9000 purrr_0.2.5 readr_1.2.0
#> [7] tidyr_0.8.2 tibble_1.4.99.9005 ggplot2_3.1.0
#> [10] tidyverse_1.2.1
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_0.2.5 haven_1.1.2 lattice_0.20-35
#> [4] colorspace_1.3-2 htmltools_0.3.6 yaml_2.2.0
#> [7] rlang_0.3.0.1 e1071_1.7-0 pillar_1.3.0.9001
#> [10] glue_1.3.0 withr_2.1.2 DBI_1.0.0
#> [13] modelr_0.1.2 readxl_1.1.0 plyr_1.8.4
#> [16] munsell_0.5.0 gtable_0.2.0 cellranger_1.1.0
#> [19] rvest_0.3.2 evaluate_0.12 labeling_0.3
#> [22] knitr_1.20 class_7.3-14 broom_0.5.0
#> [25] Rcpp_0.12.19.3 scales_1.0.0 backports_1.1.2
#> [28] classInt_0.2-3 jsonlite_1.5 hms_0.4.2.9001
#> [31] digest_0.6.18 stringi_1.2.4 grid_3.5.1
#> [34] rprojroot_1.3-2 cli_1.0.1 tools_3.5.1
#> [37] magrittr_1.5 lazyeval_0.2.1 crayon_1.3.4
#> [40] pkgconfig_2.0.2 MASS_7.3-51 xml2_1.2.0
#> [43] spData_0.2.9.4 lubridate_1.7.4 assertthat_0.2.0
#> [46] rmarkdown_1.10 httr_1.3.1 R6_2.3.0
#> [49] units_0.6-1 nlme_3.1-137 compiler_3.5.1
As noted, this is just point-level data and the CSV they provide can be a fine substitute:
library(tidyverse)
rp_csv_url <- "https://opendata.arcgis.com/datasets/24c0b37fa9bb4e16ba8bcaa7e806c615_0.csv"
httr::GET(
url = rp_csv_url,
httr::write_disk(basename(rp_csv_url)), # won't overwrite if it exists unless explicitly told to so you get caching for free
httr::progress() # I suspect this is a big file so it's nice to see a progress bar
)
raleigh_police <- read_csv(basename(rp_csv_url))
mutate(
raleigh_police,
longitude = as.numeric(longitude), # they come in wonky, still
latitude = as.numeric(latitude) # they come in wonky, still
) -> raleigh_police
raleigh_police %>%
filter(crime_description == "Burglary/Residential") %>%
ggplot() +
stat_density_2d(
aes(longitude, latitude, fill = stat(level)),
color = "#2b2b2b", size=0.125, geom = "polygon"
) +
viridis::scale_fill_viridis(direction=-1, option="magma") +
hrbrthemes::theme_ipsum_rc()
If you'd like to turn level into something more meaningful:
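Note: rp_br below is assumed to be the filtered burglary subset, e.g. something like this (as a plain data frame so the later sp steps work):
rp_br <- as.data.frame(filter(raleigh_police, crime_description == "Burglary/Residential"))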
h <- c(MASS::bandwidth.nrd(rp_br$longitude),
MASS::bandwidth.nrd(rp_br$latitude))
dens <- MASS::kde2d(
rp_br$longitude, rp_br$latitude, h = h, n = 100
)
breaks <- pretty(range(dens$z), 10)
zdf <- data.frame(expand.grid(x = dens$x, y = dens$y), z = as.vector(dens$z))
z <- tapply(zdf$z, zdf[c("x", "y")], identity)
cl <- grDevices::contourLines(
x = sort(unique(dens$x)), y = sort(unique(dens$y)), z = dens$z,
levels = breaks
)
sp::SpatialPolygons(
lapply(1:length(cl), function(idx) {
sp::Polygons(
srl = list(sp::Polygon(
matrix(c(cl[[idx]]$x, cl[[idx]]$y), nrow=length(cl[[idx]]$x), byrow=FALSE)
)),
ID = idx
)
})
) -> cont
sp::coordinates(rp_br) <- ~longitude+latitude
then:
data_frame(
ct = sapply(sp::over(cont, sp::geometry(rp_br), returnList = TRUE), length),
id = 1:length(ct),
lvl = sapply(cl, function(x) x$level)
) %>%
count(lvl, wt=ct) %>%
mutate(
pct = n/nrow(rp_br),
pct_lab = sprintf("%s of the points fall within this level", scales::percent(pct))
)
## # A tibble: 10 x 4
## lvl n pct pct_lab
## <dbl> <int> <dbl> <chr>
## 1 10. 7302 0.927 92.7% of the points fall within this level
## 2 20. 6243 0.792 79.2% of the points fall within this level
## 3 30. 4786 0.607 60.7% of the points fall within this level
## 4 40. 3204 0.407 40.7% of the points fall within this level
## 5 50. 1945 0.247 24.7% of the points fall within this level
## 6 60. 1277 0.162 16.2% of the points fall within this level
## 7 70. 793 0.101 10.1% of the points fall within this level
## 8 80. 474 0.0601 6.0% of the points fall within this level
## 9 90. 279 0.0354 3.5% of the points fall within this level
## 10 100. 44 0.00558 0.6% of the points fall within this level
I'd like to use dplyr::mutate to add p-values to a data frame, but it's not working and I can't get my head around why.
This works:
my_add <- function(x, y) x + y
str(my_add(5, 15))
#> num 20
df <- data.frame(success=c(5,8,4), fail=c(15,13,18))
mutate(df, total=my_add(success, fail))
#> success fail total
#> 1 5 15 20
#> 2 8 13 21
#> 3 4 18 22
But this doesn't:
my_binom <- function(x, y) binom.test(x, y)$"p.value"
str(my_binom(5, 20))
#> num 0.0414
df <- data.frame(success=c(5,8), total=c(20,21))
mutate(df, p_value=my_binom(success, total))
#> success total p_value
#> 1 5 20 0.5810547
#> 2 8 21 0.5810547
df <- data.frame(success=c(5,8,4), total=c(20,21,22))
mutate(df, p_value=my_binom(success, total))
#> Error in mutate_impl(.data, dots) :
#> Evaluation error: incorrect length of 'x'.
Both functions take the same input and return a single numeric, so I can't wrap my head around this discrepancy. Can someone enlighten me as to what's going on? Thanks!
Session info:
sessionInfo()
#> R version 3.4.1 (2017-06-30)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: OS X El Capitan 10.11.6
#>
#> Matrix products: default
#> BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] bindrcpp_0.2 dplyr_0.7.4
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_3.4.1 magrittr_1.5 assertthat_0.2.0 R6_2.2.2 tools_3.4.1
#> [6] glue_1.1.1 tibble_1.3.4 yaml_2.1.14 Rcpp_0.12.14 pkgconfig_2.0.1
#> [11] rlang_0.1.2 bindr_0.1
mutate(df, p_value = purrr::map2(success, total, my_binom))
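The issue is that binom.test() isn't vectorised: mutate() hands the whole success and total columns to my_binom() in a single call. With three rows that errors ("incorrect length of 'x'"); with two rows, binom.test() silently treats x as the counts of successes and failures and ignores n, which is why the two-row case returned one recycled p-value. Mapping calls the function once per row. If you want a plain numeric column rather than a list-column, a small variation on the answer above:
mutate(df, p_value = purrr::map2_dbl(success, total, my_binom))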