In my process I need to perform many dplyr::inner_joins. Thought I might define a custom pipe operator for it as explained here:
library(tidyverse)
library(rlang)
df1 <- tibble(a = 1:10, b = 11:20)
df2 <- tibble(a = 1:10, c = 21:30)
`%J>%` <- function(lhs, rhs){
  inner_join(lhs, rhs)
}
df1 %J>% df2
This works as expected and I get:
Joining, by = "a"
# A tibble: 10 x 3
a b c
<int> <int> <int>
1 1 11 21
2 2 12 22
3 3 13 23
4 4 14 24
5 5 15 25
6 6 16 26
7 7 17 27
8 8 18 28
9 9 19 29
10 10 20 30
But then also a warning:
Warning message:
`chr_along()` is soft-deprecated as of rlang 0.2.0.
This warning is displayed once per session.
The plot thickens if I don't include library(rlang) at all (in a new session); in that case I get no warnings:
library(tidyverse)
df1 <- tibble(a = 1:10, b = 11:20)
df2 <- tibble(a = 1:10, c = 21:30)
`%J>%` <- function(lhs, rhs){
  inner_join(lhs, rhs)
}
df1 %J>% df2
Obviously I don't have to include library(rlang) at all in this example, but if I did, this is one weird warning. Where is it coming from, and how can I avoid it if I did want to include library(rlang)?
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
Matrix products: default
locale:
[1] LC_COLLATE=English_Israel.1252 LC_CTYPE=English_Israel.1252 LC_MONETARY=English_Israel.1252 LC_NUMERIC=C LC_TIME=English_Israel.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rlang_0.3.0.1 forcats_0.3.0 stringr_1.3.1 dplyr_0.7.6 purrr_0.2.5 readr_1.1.1 tidyr_0.8.1 tibble_1.4.2 ggplot2_3.1.0 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.19 cellranger_1.1.0 pillar_1.3.0 compiler_3.5.1 plyr_1.8.4 bindr_0.1.1 tools_3.5.1 packrat_0.4.9-3 jsonlite_1.5 lubridate_1.7.4 nlme_3.1-137
[12] gtable_0.2.0 lattice_0.20-35 pkgconfig_2.0.2 cli_1.0.1 rstudioapi_0.8 haven_1.1.2 bindrcpp_0.2.2 withr_2.1.2 xml2_1.2.0 httr_1.3.1 hms_0.4.2
[23] grid_3.5.1 tidyselect_0.2.4 glue_1.3.0 R6_2.2.2 fansi_0.3.0 readxl_1.1.0 modelr_0.1.2 magrittr_1.5 backports_1.1.2 scales_1.0.0 rvest_0.3.2
[34] assertthat_0.2.0 colorspace_1.3-2 utf8_1.1.4 stringi_1.2.4 lazyeval_0.2.1 munsell_0.5.0 broom_0.5.0 crayon_1.3.4
From your description, I would say that if you load rlang as part of the tidyverse (i.e. you just load tidyverse), then R will use the tidyverse's rlang, which is updated automatically within the tidyverse. If you load tidyverse first and then rlang, then R will use the last one seen, which is the one you loaded manually. Thus, if you did not update rlang manually, it will give the warning.
The problem should go away if you manually update rlang.
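Before updating, it can help to check which versions are actually installed. The following is a minimal sketch in base R; installed_version is a hypothetical helper name, and the commented install.packages() call assumes a CRAN mirror is configured.

```r
# Report the installed version of a package, or NA if it is not installed.
# installed_version() is a hypothetical helper, not part of any package.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

installed_version("rlang")
installed_version("dplyr")

# To update rlang manually (assumes a CRAN mirror is configured):
# install.packages("rlang")
```

Restart the R session after updating so the new version is the one that gets loaded.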
I am working with data that has significant digits (i.e. digits after the "."). These digits appear when viewing my data both as a variable in base R, and also when the data is stored in a dataframe. However, they do not appear when I view the data in a tibble.
I need to view these significant digits for my work. Is there a way to make them appear when using tibbles?
Here is a reproducible example:
x has 5 significant digits, and 3 are displayed when using base R:
x = 1234.56789
x
[1] 1234.568
Within a data.frame, 3 significant digits are also displayed:
df = data.frame(x=x)
df
x
1 1234.568
Within a tibble, though, 0 significant digits are displayed:
library(tibble)
df = tibble(x=x)
df
# A tibble: 1 x 1
x
<dbl>
1 1235.
Again, I am looking for a way to make more than 0 significant digits appear when viewing my data in a tibble.
Here is the result of my sessionInfo():
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] tibble_1.4.2 readr_1.1.1 choroplethr_3.6.2
[4] acs_2.1.3 XML_3.98-1.12 stringr_1.3.1
loaded via a namespace (and not attached):
[1] httr_1.3.1 maps_3.3.0 splines_3.5.1
[4] Formula_1.2-3 assertthat_0.2.0 sp_1.3-1
[7] latticeExtra_0.6-28 yaml_2.2.0 pillar_1.3.0
[10] backports_1.1.2 lattice_0.20-35 glue_1.3.0
[13] uuid_0.1-2 digest_0.6.15 RColorBrewer_1.1-2
[16] checkmate_1.8.5 colorspace_1.3-2 htmltools_0.3.6
[19] Matrix_1.2-14 plyr_1.8.4 pkgconfig_2.0.1
[22] WDI_2.5 purrr_0.2.5 scales_0.5.0
[25] jpeg_0.1-8 tigris_0.7 ggmap_2.6.1
[28] htmlTable_1.12 ggplot2_3.0.0 nnet_7.3-12
[31] lazyeval_0.2.1 cli_1.0.0 proto_1.0.0
[34] survival_2.42-6 RJSONIO_1.3-0 magrittr_1.5
[37] crayon_1.3.4 maptools_0.9-2 fansi_0.2.3
[40] foreign_0.8-71 class_7.3-14 tools_3.5.1
[43] data.table_1.11.4 hms_0.4.2 geosphere_1.5-7
[46] RgoogleMaps_1.4.2 munsell_0.5.0 cluster_2.0.7-1
[49] bindrcpp_0.2.2 compiler_3.5.1 e1071_1.7-0
[52] rlang_0.2.1 classInt_0.2-3 units_0.6-0
[55] grid_3.5.1 rstudioapi_0.7 rjson_0.2.20
[58] rappdirs_0.3.1 htmlwidgets_1.2 base64enc_0.1-3
[61] gtable_0.2.0 curl_3.2 DBI_1.0.0
[64] reshape2_1.4.3 R6_2.2.2 gridExtra_2.3
[67] knitr_1.20 dplyr_0.7.6 rgdal_1.3-3
[70] utf8_1.1.4 bindr_0.1.1 Hmisc_4.1-1
[73] stringi_1.2.4 Rcpp_0.12.18 mapproj_1.2.6
[76] sf_0.6-3 rpart_4.1-13 acepack_1.4.1
[79] png_0.1-7 spData_0.2.9.0 tidyselect_0.2.4
You can set the option pillar.sigfig:
options(pillar.sigfig = 1)
as_tibble(iris)
# # A tibble: 150 x 5
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <dbl> <dbl> <dbl> <dbl> <fct>
# 1 5. 4. 1. 0.2 setosa
# 2 5. 3 1. 0.2 setosa
# 3 5. 3. 1. 0.2 setosa
# 4 5. 3. 2. 0.2 setosa
# 5 5 4. 1. 0.2 setosa
# 6 5. 4. 2. 0.4 setosa
# 7 5. 3. 1. 0.3 setosa
# 8 5 3. 2. 0.2 setosa
# 9 4. 3. 1. 0.2 setosa
# 10 5. 3. 2. 0.1 setosa
options(pillar.sigfig = 7)
tb = tibble(x=x)
tb
# # A tibble: 1 x 1
# x
# <dbl>
# 1 1234.568
See also:
?`tibble-options`
or online:
https://www.rdocumentation.org/packages/tibble/versions/1.4.2/topics/tibble-options
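For comparison, base R can format a single value to a chosen number of significant digits or decimal places without changing any global option. This is only a small sketch using the x from the question; it affects the returned string, not how a tibble prints.

```r
x <- 1234.56789

# 7 significant digits in total
format(x, digits = 7)
#> [1] "1234.568"

# 3 digits after the decimal point, regardless of magnitude
formatC(x, format = "f", digits = 3)
#> [1] "1234.568"

# 5 digits after the decimal point
formatC(x, format = "f", digits = 5)
#> [1] "1234.56789"
```

Note that format() counts significant digits while formatC(format = "f") counts decimal places, which is why both calls happen to agree for this value.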
I have a very large dataset with a column of identifiers (CP). I want to first edit how the identifiers look so that they match another file, and then check whether there are CP matches between the files.
I do the editing of the CP first with:
fullGWAS <- fread('file.csv',sep=",")
colnames(fullGWAS)[1] <- "CP"
fullGWAS2<-gsub("_.*","",fullGWAS$CP)
fullGWAS2 <-data.frame(fullGWAS2)
colnames(fullGWAS2)[1] <- "CP"
fullGWAS3 <- select(fullGWAS, c(2:15))
gwasdf <- cbind(fullGWAS2, fullGWAS3)
As an example gwasdf looks like:
> head(gwasdf)
CP chr bpos a1 a2 freq BETAsbp Psbp BETAdbp Pdbp BETApp Ppp minP
1 1:2556125 1 2556125 t c 0.3255 -0.0262 0.41300 -0.0113 0.5388 -0.0157 0.4690 0.41300
2 1:2556548 1 2556548 t c 0.3261 -0.0274 0.39270 -0.0121 0.5096 -0.0160 0.4615 0.39270
3 1:2556709 1 2556709 a g 0.3257 -0.0263 0.41210 -0.0116 0.5266 -0.0155 0.4749 0.41210
4 12:11366987 12 11366987 t c 0.9443 0.0355 0.61460 0.0019 0.9631 0.0185 0.7007 0.61460
5 17:21949792 17 21949792 a c 0.4570 -0.0384 0.20690 -0.0043 0.8065 -0.0212 0.3050 0.20690
6 17:21955349 17 21955349 t g 0.5253 0.0505 0.09562 0.0103 0.5574 0.0248 0.2303 0.09562
minTRAIT BETAmean
1 SBP -0.01875
2 SBP -0.01975
3 SBP -0.01895
4 SBP 0.01870
5 SBP -0.02135
6 SBP 0.03040
I can see CP is here yet when I try to check this I get:
exists("gwasdf$CP")
[1] FALSE
class(gwasdf)
[1] "data.frame"
nrow(gwasdf)
[1] 7083535
Why is this false and how can I make it be true?
I am trying to ultimately check whether the CP identifiers are present in another file with follow-up code using:
CPmatches <- df2[CP %in% gwasdf$CP] #df2 is another file I just read in
mismatchextract <- subset(gwasdf, !(CP %in% df2$CP))
For extra info I use RStudio with:
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] matrixStats_0.57.0 sqldf_0.4-11 RSQLite_2.2.1 gsubfn_0.7
[5] proto_1.0.0 data.table_1.13.2 forcats_0.5.0 stringr_1.4.0
[9] dplyr_1.0.2 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2
[13] tibble_3.0.4 ggplot2_3.3.2 tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] tidyselect_1.1.0 haven_2.3.1 tcltk_4.0.2 colorspace_1.4-1 vctrs_0.3.4
[6] generics_0.1.0 chron_2.3-56 blob_1.2.1 rlang_0.4.8 pillar_1.4.7
[11] glue_1.4.1 withr_2.3.0 DBI_1.1.0 bit64_4.0.5 dbplyr_2.0.0
[16] modelr_0.1.8 readxl_1.3.1 lifecycle_0.2.0 munsell_0.5.0 gtable_0.3.0
[21] cellranger_1.1.0 rvest_0.3.6 memoise_1.1.0 fansi_0.4.1 broom_0.7.2
[26] Rcpp_1.0.5 scales_1.1.1 backports_1.1.10 jsonlite_1.7.1 fs_1.5.0
[31] bit_4.0.4 hms_0.5.3 digest_0.6.27 stringi_1.5.3 grid_4.0.2
[36] cli_2.2.0 tools_4.0.2 magrittr_2.0.1 crayon_1.3.4 pkgconfig_2.0.3
[41] ellipsis_0.3.1 xml2_1.3.2 reprex_0.3.0 lubridate_1.7.9 assertthat_0.2.1
[46] httr_1.4.2 rstudioapi_0.13 R6_2.5.0 compiler_4.0.2
Something like this, using dplyr and the %in% operator? I am assuming there are two separate datasets and that the goal is to subset one based on whether its elements also appear in the other.
qwasdf_1 <- data.frame(
CP1 = c("1:2556125", "1:2556548", "99:12345678")
)
qwasdf_2 <- data.frame(
CP2 = c("1:2556125", "1:2556548", "1:2556709")
)
library(dplyr)
qwasdf_1 %>%
filter(CP1 %in% qwasdf_2$CP2)
#> CP1
#> 1 1:2556125
#> 2 1:2556548
Created on 2020-11-23 by the reprex package (v0.3.0)
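On the literal question of why exists("gwasdf$CP") is FALSE: exists() looks for an object whose name is the whole string, and there is no object called "gwasdf$CP". A minimal sketch of ways to check for the column instead, using a small made-up data frame in place of the real gwasdf:

```r
# Toy stand-in for the real gwasdf (values are illustrative only)
gwasdf <- data.frame(
  CP  = c("1:2556125", "1:2556548", "12:11366987"),
  chr = c(1, 1, 12)
)

# exists() searches for an object named exactly "gwasdf$CP"; there is none
exists("gwasdf$CP")
#> [1] FALSE

# The data frame itself is an object, so this is TRUE
exists("gwasdf")
#> [1] TRUE

# Two ways to check that the column is present
"CP" %in% names(gwasdf)
#> [1] TRUE
!is.null(gwasdf$CP)
#> [1] TRUE
```

So the FALSE result says nothing about the column; it does not need to be "made true" for the %in% matching to work.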
I have a mlr3 task
df <- data.frame(v1 = c("a", "b", "a"),
v2 = c(1, 2, 2),
data = c(3.15, 4.11, 3.56))
library(mlr3)
task <- TaskRegr$new("bmsp", df, target = "data")
How can I change the values "a" of feature "v1" to "c" within a pipeline?
The code:
library(mlr3)
library(mlr3pipelines)
df <- data.frame(v1 = c("a", "b", "a"),
v2 = c(1, 2, 2),
data = c(3.15, 4.11, 3.56))
task <- TaskRegr$new("bmsp", df, target = "data")
pop <- po("colapply",
applicator = function(x) ifelse(x == "a", "c", x))
pop$param_set$values$affect_columns = selector_name("v1")
pop$train(list(task))[[1]]$data()
Gives the output (see column v1, row 2):
data v1 v2
1 3.15 c 1
2 4.11 2 2
3 3.56 c 2
But I need this output:
data v1 v2
1 3.15 c 1
2 4.11 b 2
3 3.56 c 2
This is quite straightforward to do using PipeOpColApply.
We need to define a function that will take the provided input and perform the requested operation (applicator).
library(mlr3)
library(mlr3pipelines)
pop <- po("colapply",
applicator = function(x) ifelse(x == "a", "c", x))
We also need to define on which columns the function will operate:
pop$param_set$values$affect_columns = selector_name("v1")
pop$train(list(task))[[1]]$data()
#output
data v1 v2
1: 3.15 c 1
2: 4.11 b 2
3: 3.56 c 2
This is very similar to the example in the function help.
data:
df <- data.frame(v1 = c("a", "b", "a"),
v2 = c(1, 2, 2),
data = c(3.15, 4.11, 3.56))
task <- TaskRegr$new("bmsp", df, target = "data")
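One possible explanation for the "2" in the question's output, offered as an assumption because the question does not show the column type: if v1 reaches the applicator as a factor, ifelse() returns the factor's integer codes for the elements it leaves untouched. A base R sketch, independent of mlr3 (recode_a_to_c is a hypothetical name):

```r
v1 <- factor(c("a", "b", "a"))

# With a factor, the "no" branch yields the underlying integer code
ifelse(v1 == "a", "c", v1)
#> [1] "c" "2" "c"

# Converting to character first keeps the original labels
recode_a_to_c <- function(x) {
  x <- as.character(x)
  ifelse(x == "a", "c", x)
}
recode_a_to_c(v1)
#> [1] "c" "b" "c"
```

If the column is a factor in your session, the same as.character() conversion could be tried inside the applicator.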
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] mlr3pipelines_0.3.0-9000 mlr3_0.7.0 Biostrings_2.56.0 XVector_0.28.0 IRanges_2.22.2 S4Vectors_0.26.1 BiocGenerics_0.34.0
loaded via a namespace (and not attached):
[1] Biobase_2.48.0 httr_1.4.2 bit64_4.0.5 splines_4.0.2 foreach_1.5.0 prodlim_2019.11.13 assertthat_0.2.1 lgr_0.3.4 askpass_1.1
[10] BiocFileCache_1.12.1 blob_1.2.1 mlr3misc_0.5.0 progress_1.2.2 ipred_0.9-9 backports_1.1.10 pillar_1.4.6 RSQLite_2.2.1 lattice_0.20-41
[19] glue_1.4.2 uuid_0.1-4 pROC_1.16.2 digest_0.6.25 checkmate_2.0.0 colorspace_1.4-1 recipes_0.1.13 Matrix_1.2-18 plyr_1.8.6
[28] timeDate_3043.102 XML_3.99-0.5 pkgconfig_2.0.3 biomaRt_2.44.1 caret_6.0-86 zlibbioc_1.34.0 purrr_0.3.4 scales_1.1.1 gower_0.2.2
[37] lava_1.6.8 tibble_3.0.3 openssl_1.4.3 generics_0.0.2 ggplot2_3.3.2 ellipsis_0.3.1 withr_2.3.0 nnet_7.3-14 paradox_0.4.0-9000
[46] survival_3.1-12 magrittr_1.5 crayon_1.3.4 memoise_1.1.0 nlme_3.1-148 MASS_7.3-51.6 class_7.3-17 tools_4.0.2 data.table_1.13.0
[55] prettyunits_1.1.1 hms_0.5.3 lifecycle_0.2.0 stringr_1.4.0 munsell_0.5.0 glmnet_4.0-2 AnnotationDbi_1.50.3 compiler_4.0.2 tinytex_0.26
[64] rlang_0.4.7 grid_4.0.2 iterators_1.0.12 rstudioapi_0.11 rappdirs_0.3.1 gtable_0.3.0 ModelMetrics_1.2.2.2 codetools_0.2-16 DBI_1.1.0
[73] curl_4.3 reshape2_1.4.4 R6_2.4.1 lubridate_1.7.9 dplyr_1.0.2 bit_4.0.4 biomartr_0.9.2 shape_1.4.5 stringi_1.5.3
[82] Rcpp_1.0.5 vctrs_0.3.4 rpart_4.1-15 dbplyr_1.4.4 tidyselect_1.1.0 xfun_0.18
For some reason stat_density_2d() does not seem to be working right for me when specifying geom = "polygon" and I am absolutely stumped. Here is my code...
library(sf)
library(tidyverse)
library(RANN2)
library(hexbin)
library(mapproj)
options(stringsAsFactors = FALSE)
raleigh_police <- rgdal::readOGR("https://opendata.arcgis.com/datasets/24c0b37fa9bb4e16ba8bcaa7e806c615_0.geojson", "OGRGeoJSON")
raleigh_police_sf <- raleigh_police %>%
st_as_sf()
raleigh_police_sf %>%
filter(crime_description == "Burglary/Residential") %>%
st_coordinates() %>%
as_tibble() %>%
ggplot() +
stat_density_2d(aes(X, Y, fill = stat(level)), geom = "polygon")
Here is my sessionInfo()...
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 mapproj_1.2.6 maps_3.3.0 hexbin_1.27.2 RANN2_0.1 forcats_0.3.0 stringr_1.3.1 dplyr_0.7.8
[9] purrr_0.2.5 readr_1.1.1 tidyr_0.8.2 tibble_1.4.2 ggplot2_3.1.0 tidyverse_1.2.1 sf_0.7-1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 lubridate_1.7.4 lattice_0.20-38 class_7.3-14 utf8_1.1.4 assertthat_0.2.0 rprojroot_1.3-2
[8] digest_0.6.18 R6_2.3.0 cellranger_1.1.0 plyr_1.8.4 backports_1.1.2 evaluate_0.12 e1071_1.7-0
[15] httr_1.3.1 blogdown_0.9 pillar_1.3.0 rlang_0.3.0.1 lazyeval_0.2.1 readxl_1.1.0 rstudioapi_0.8
[22] rmarkdown_1.10 labeling_0.3 rgdal_1.3-6 munsell_0.5.0 broom_0.5.0 compiler_3.5.1 modelr_0.1.2
[29] xfun_0.4 pkgconfig_2.0.2 htmltools_0.3.6 tidyselect_0.2.5 bookdown_0.7 codetools_0.2-15 fansi_0.4.0
[36] crayon_1.3.4 withr_2.1.2 MASS_7.3-51.1 grid_3.5.1 nlme_3.1-137 spData_0.2.9.4 jsonlite_1.5
[43] gtable_0.2.0 DBI_1.0.0 magrittr_1.5 units_0.6-1 scales_1.0.0 cli_1.0.1 stringi_1.2.4
[50] sp_1.3-1 xml2_1.2.0 tools_3.5.1 glue_1.3.0 hms_0.4.2 yaml_2.2.0 colorspace_1.3-2
[57] classInt_0.2-3 rvest_0.3.2 knitr_1.20 bindr_0.1.1 haven_1.1.2
I just get a blank plot with nothing on it. Completely stumped. What am I doing wrong here?
Update (2018-11-18)
It turns out that the primary issue here was options(stringsAsFactors = FALSE). If you comment that out and run the original code, everything actually works fine. I found this GitHub Issue which was the reason I tried that. Much more efficient code solutions are provided in the answers to this question and they also made sure not to use options(stringsAsFactors = FALSE).
Aside from downloading the file before reading and changing readOGR to read_sf, it works as-is for me save warnings for a couple NA points caused by empty geometries:
library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, PROJ 4.9.3
path <- "~/Downloads/raleigh.geojson"
download.file(
"https://opendata.arcgis.com/datasets/24c0b37fa9bb4e16ba8bcaa7e806c615_0.geojson",
path,
method = "curl"
)
raleigh_police <- sf::read_sf(path, "OGRGeoJSON")
raleigh_police %>%
filter(crime_description == "Burglary/Residential") %>%
st_coordinates() %>%
as_tibble() %>%
ggplot() +
stat_density_2d(aes(X, Y, fill = stat(level)), geom = "polygon")
#> Warning: Removed 5 rows containing non-finite values (stat_density2d).
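The warning above comes from rows whose coordinates are not finite. If you would rather drop them explicitly before plotting, a minimal base R sketch looks like this; the coordinates are made up and merely stand in for the output of st_coordinates():

```r
# Made-up coordinates standing in for the output of st_coordinates()
coords <- data.frame(
  X = c(-78.64, NA, -78.70, -78.61),
  Y = c(35.78, 35.80, NaN, 35.82)
)

# Keep only rows where both coordinates are finite (drops NA and NaN)
keep <- is.finite(coords$X) & is.finite(coords$Y)
clean <- coords[keep, , drop = FALSE]

nrow(clean)
#> [1] 2
```

The plot itself is unchanged by this step; it only makes the removal visible in your own code instead of in a warning.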
The empty rows:
raleigh_police %>%
filter(crime_description == "Burglary/Residential",
st_is_empty(.))
#> Simple feature collection with 5 features and 21 fields (with 5 geometries empty)
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> # A tibble: 5 x 22
#> OBJECTID GlobalID case_number crime_category crime_code crime_descripti…
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 205318 8057315… P14076062 BURGLARY/RESI… 30B Burglary/Reside…
#> 2 417488 70afb27… P15027702 BURGLARY/RESI… 30B Burglary/Reside…
#> 3 424718 bdf69fa… P18029113 BURGLARY/RESI… 30B Burglary/Reside…
#> 4 436550 711c05b… P18044139 BURGLARY/RESI… 30B Burglary/Reside…
#> 5 442091 9d7a008… P18051764 BURGLARY/RESI… 30B Burglary/Reside…
#> # … with 16 more variables: crime_type <chr>, reported_block_address <chr>,
#> # city_of_incident <chr>, city <chr>, district <chr>, reported_date <dttm>,
#> # reported_year <int>, reported_month <int>, reported_day <int>,
#> # reported_hour <int>, reported_dayofwk <chr>, latitude <dbl>,
#> # longitude <dbl>, agency <chr>, updated_date <dttm>, geometry <POINT [°]>
sessionInfo()
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS 10.14.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] sf_0.7-1 forcats_0.3.0 stringr_1.3.1
#> [4] dplyr_0.7.99.9000 purrr_0.2.5 readr_1.2.0
#> [7] tidyr_0.8.2 tibble_1.4.99.9005 ggplot2_3.1.0
#> [10] tidyverse_1.2.1
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_0.2.5 haven_1.1.2 lattice_0.20-35
#> [4] colorspace_1.3-2 htmltools_0.3.6 yaml_2.2.0
#> [7] rlang_0.3.0.1 e1071_1.7-0 pillar_1.3.0.9001
#> [10] glue_1.3.0 withr_2.1.2 DBI_1.0.0
#> [13] modelr_0.1.2 readxl_1.1.0 plyr_1.8.4
#> [16] munsell_0.5.0 gtable_0.2.0 cellranger_1.1.0
#> [19] rvest_0.3.2 evaluate_0.12 labeling_0.3
#> [22] knitr_1.20 class_7.3-14 broom_0.5.0
#> [25] Rcpp_0.12.19.3 scales_1.0.0 backports_1.1.2
#> [28] classInt_0.2-3 jsonlite_1.5 hms_0.4.2.9001
#> [31] digest_0.6.18 stringi_1.2.4 grid_3.5.1
#> [34] rprojroot_1.3-2 cli_1.0.1 tools_3.5.1
#> [37] magrittr_1.5 lazyeval_0.2.1 crayon_1.3.4
#> [40] pkgconfig_2.0.2 MASS_7.3-51 xml2_1.2.0
#> [43] spData_0.2.9.4 lubridate_1.7.4 assertthat_0.2.0
#> [46] rmarkdown_1.10 httr_1.3.1 R6_2.3.0
#> [49] units_0.6-1 nlme_3.1-137 compiler_3.5.1
As noted, this is just point-level data and the CSV they provide can be a fine substitute:
library(tidyverse)
rp_csv_url <- "https://opendata.arcgis.com/datasets/24c0b37fa9bb4e16ba8bcaa7e806c615_0.csv"
httr::GET(
url = rp_csv_url,
httr::write_disk(basename(rp_csv_url)), # won't overwrite if it exists unless explicitly told to so you get caching for free
httr::progress() # I suspect this is a big file so it's nice to see a progress bar
)
raleigh_police <- read_csv(basename(rp_csv_url))
mutate(
raleigh_police,
longitude = as.numeric(longitude), # they come in wonky, still
latitude = as.numeric(latitude) # they come in wonky, still
) -> raleigh_police
raleigh_police %>%
filter(crime_description == "Burglary/Residential") %>%
ggplot() +
stat_density_2d(
aes(longitude, latitude, fill = stat(level)),
color = "#2b2b2b", size=0.125, geom = "polygon"
) +
viridis::scale_fill_viridis(direction=-1, option="magma") +
hrbrthemes::theme_ipsum_rc()
If you'd like to turn level into something more meaningful:
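# rp_br is not defined in this answer; it is presumably the Burglary/Residential
# subset of raleigh_police with missing coordinates removed
h <- c(MASS::bandwidth.nrd(rp_br$longitude),
MASS::bandwidth.nrd(rp_br$latitude))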
dens <- MASS::kde2d(
rp_br$longitude, rp_br$latitude, h = h, n = 100
)
breaks <- pretty(range(dens$z), 10)
zdf <- data.frame(expand.grid(x = dens$x, y = dens$y), z = as.vector(dens$z))
z <- tapply(zdf$z, zdf[c("x", "y")], identity)
cl <- grDevices::contourLines(
x = sort(unique(dens$x)), y = sort(unique(dens$y)), z = dens$z,
levels = breaks
)
sp::SpatialPolygons(
lapply(1:length(cl), function(idx) {
sp::Polygons(
srl = list(sp::Polygon(
matrix(c(cl[[idx]]$x, cl[[idx]]$y), nrow=length(cl[[idx]]$x), byrow=FALSE)
)),
ID = idx
)
})
) -> cont
sp::coordinates(rp_br) <- ~longitude+latitude
then:
data_frame(
ct = sapply(sp::over(cont, sp::geometry(rp_br), returnList = TRUE), length),
id = 1:length(ct),
lvl = sapply(cl, function(x) x$level)
) %>%
count(lvl, wt=ct) %>%
mutate(
pct = n/nrow(rp_br),
pct_lab = sprintf("%s of the points fall within this level", scales::percent(pct))
)
## # A tibble: 10 x 4
## lvl n pct pct_lab
## <dbl> <int> <dbl> <chr>
## 1 10. 7302 0.927 92.7% of the points fall within this level
## 2 20. 6243 0.792 79.2% of the points fall within this level
## 3 30. 4786 0.607 60.7% of the points fall within this level
## 4 40. 3204 0.407 40.7% of the points fall within this level
## 5 50. 1945 0.247 24.7% of the points fall within this level
## 6 60. 1277 0.162 16.2% of the points fall within this level
## 7 70. 793 0.101 10.1% of the points fall within this level
## 8 80. 474 0.0601 6.0% of the points fall within this level
## 9 90. 279 0.0354 3.5% of the points fall within this level
## 10 100. 44 0.00558 0.6% of the points fall within this level
Look at the following example generated with reprex:
library(data.table)
DT <- data.table(id = letters[1:3], `counts(a>=0)` = 1:3)
DT[`counts(a>=0)` >= 2] # 1
#> id counts(a>=0)
#> 1: b 2
#> 2: c 3
DT[`counts(a>=0)` == 2] # 2
#> Error in `[.data.table`(DT, `counts(a>=0)` == 2): Column(s) [counts(a] not found in x
DT[id == "a"] # 3
#> id counts(a>=0)
#> 1: a 1
As both the lines marked with #1 and #3 work, I wonder why subsetting with `counts(a>=0)` == 2 (#2) doesn't work.
SessionInfo:
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS
Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reprex_0.1.2 data.table_1.11.2
loaded via a namespace (and not attached):
[1] Rcpp_0.12.16 rprojroot_1.3-2 digest_0.6.15 crayon_1.3.4 withr_2.1.2 assertthat_0.2.0 R6_2.2.2
[8] backports_1.1.2 magrittr_1.5 formatR_1.5 evaluate_0.10.1 stringi_1.1.6 debugme_1.1.0 rstudioapi_0.7
[15] callr_2.0.2 whisker_0.3-2 rmarkdown_1.9 devtools_1.13.5 tools_3.4.4 stringr_1.3.0 yaml_2.1.17
[22] compiler_3.4.4 htmltools_0.3.6 memoise_1.1.0 knitr_1.20
It works for me with:
DT[as.numeric(`counts(a>=0)`) == 2]
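Another option, assuming you are free to rename the column, is to give it a syntactic name so that no backticks are needed in the subset expression. This is only a sketch; counts_a_ge_0 is a hypothetical replacement name.

```r
library(data.table)

DT <- data.table(id = letters[1:3], `counts(a>=0)` = 1:3)

# Rename the column by reference; counts_a_ge_0 is a hypothetical name
setnames(DT, "counts(a>=0)", "counts_a_ge_0")

# The equality subset now refers to an ordinary column name
DT[counts_a_ge_0 == 2]
```

setnames() modifies DT in place, so any later code that still uses the old name would need updating as well.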