I have read a single-variable .nc file into R as a SpatRaster object using the excellent terra package, with the intention of fitting geostatistical models based on the cell centroids. For this I need to construct a dataframe with columns corresponding to "lon, lat, value" using data from the SpatRaster. This feels like a task which might have a standard solution, but I'm unfamiliar with R's spatial statistics ecosystem.
Any advice/suggestions would be much appreciated.
It's even more straightforward to use the function terra::as.data.frame(). See https://rspatial.github.io/terra/reference/as.data.frame.html
library(terra)
#> terra version 1.3.4
# make test raster with terra::rast()
a <- terra::rast(ncols = 10, nrows = 10,
xmin = -84, xmax = -83,
ymin = 42, ymax = 43)
# give it some values
values(a) <- 1:ncell(a)
plot(a)
a_df <- terra::as.data.frame(a, xy = TRUE, na.rm = FALSE)
# take special note of default values
head(a_df)
#> x y lyr.1
#> 1 -83.95 42.95 1
#> 2 -83.85 42.95 2
#> 3 -83.75 42.95 3
#> 4 -83.65 42.95 4
#> 5 -83.55 42.95 5
#> 6 -83.45 42.95 6
packageVersion("terra")
#> [1] '1.3.4'
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] terra_1.3-4
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.7 codetools_0.2-18 lattice_0.20-44 digest_0.6.27
#> [5] withr_2.4.2 grid_4.1.0 magrittr_2.0.1 reprex_2.0.0
#> [9] evaluate_0.14 highr_0.9 rlang_0.4.11 stringi_1.7.3
#> [13] cli_3.0.1 fs_1.5.0 sp_1.4-5 raster_3.4-13
#> [17] rmarkdown_2.9 tools_4.1.0 stringr_1.4.0 glue_1.4.2
#> [21] xfun_0.24 yaml_2.2.1 compiler_4.1.0 htmltools_0.5.1.1
#> [25] knitr_1.33
Created on 2021-10-21 by the reprex package (v2.0.0)
library(terra)
#> Warning: package 'terra' was built under R version 4.0.5
#> terra version 1.3.22
r <- rast(ncol=10, nrow=10)
values(r) <- runif(ncell(r))
plot(r)
p <- terra::as.points(r)
df <- data.frame(values(p), geom(p))
head(df)
#> lyr.1 geom part x y hole
#> 1 0.9557333 1 1 -162 81 0
#> 2 0.2974196 2 1 -126 81 0
#> 3 0.9703617 3 1 -90 81 0
#> 4 0.3046196 4 1 -54 81 0
#> 5 0.7334711 5 1 -18 81 0
#> 6 0.8880635 6 1 18 81 0
Related
Please have a look at the reprex at the end of the post.
I need to read a column as a string, perform several manipulations and then save convert it to a numerical column.
The blanks ("") in the string column give me a headache because arrow does not convert them to numerical missing values NA.
Does anybody know how to achieve that?
Many thanks
library(tidyverse)
library(arrow)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
df <- tibble(x=rep(c("4000 -", "6000 -", "", "8000 - "), 10),
y=seq(1,10, length=40))
write_csv(df, "test_string.csv")
data <- open_dataset("test_string.csv",
format="csv",
skip=1,
schema=schema(x=string(), y=double()))
data2 <- data |>
mutate(x= sub(" -.*", "", x) ) |>
mutate(x2=as.numeric(x)) |>
collect() ## how to convert the blank to a numeric NA ?
#> Error in `collect()`:
#> ! Invalid: Failed to parse string: '' as a scalar of type double
#> Backtrace:
#> ▆
#> 1. ├─dplyr::collect(mutate(mutate(data, x = sub(" -.*", "", x)), x2 = as.numeric(x)))
#> 2. └─arrow:::collect.arrow_dplyr_query(mutate(mutate(data, x = sub(" -.*", "", x)), x2 = as.numeric(x)))
#> 3. └─base::tryCatch(...)
#> 4. └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#> 5. └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#> 6. └─value[[3L]](cond)
#> 7. └─arrow:::augment_io_error_msg(e, call, schema = x$.data$schema)
#> 8. └─rlang::abort(msg, call = call)
sessionInfo()
#> R version 4.2.2 (2022-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Debian GNU/Linux 11 (bullseye)
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.13.so
#>
#> locale:
#> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
#> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
#> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] arrow_10.0.0 forcats_0.5.2 stringr_1.4.1 dplyr_1.0.10
#> [5] purrr_0.3.5 readr_2.1.3 tidyr_1.2.1 tibble_3.1.8
#> [9] ggplot2_3.4.0 tidyverse_1.3.2
#>
#> loaded via a namespace (and not attached):
#> [1] lubridate_1.9.0 assertthat_0.2.1 digest_0.6.30
#> [4] utf8_1.2.2 R6_2.5.1 cellranger_1.1.0
#> [7] backports_1.4.1 reprex_2.0.2 evaluate_0.17
#> [10] httr_1.4.4 highr_0.9 pillar_1.8.1
#> [13] rlang_1.0.6 googlesheets4_1.0.1 readxl_1.4.1
#> [16] R.utils_2.12.1 R.oo_1.25.0 rmarkdown_2.17
#> [19] styler_1.8.0 googledrive_2.0.0 bit_4.0.4
#> [22] munsell_0.5.0 broom_1.0.1 compiler_4.2.2
#> [25] modelr_0.1.9 xfun_0.34 pkgconfig_2.0.3
#> [28] htmltools_0.5.3 tidyselect_1.2.0 fansi_1.0.3
#> [31] crayon_1.5.2 tzdb_0.3.0 dbplyr_2.2.1
#> [34] withr_2.5.0 R.methodsS3_1.8.2 grid_4.2.2
#> [37] jsonlite_1.8.3 gtable_0.3.1 lifecycle_1.0.3
#> [40] DBI_1.1.3 magrittr_2.0.3 scales_1.2.1
#> [43] vroom_1.6.0 cli_3.4.1 stringi_1.7.8
#> [46] fs_1.5.2 xml2_1.3.3 ellipsis_0.3.2
#> [49] generics_0.1.3 vctrs_0.5.0 tools_4.2.2
#> [52] bit64_4.0.5 R.cache_0.16.0 glue_1.6.2
#> [55] hms_1.1.2 parallel_4.2.2 fastmap_1.1.0
#> [58] yaml_2.3.6 timechange_0.1.1 colorspace_2.0-3
#> [61] gargle_1.2.1 rvest_1.0.3 knitr_1.40
#> [64] haven_2.5.1
Created on 2022-11-07 with reprex v2.0.2
ifelse works here when all classes are correct (and not double()); if_else enforces this already, so we can use either.
data |>
mutate(x = sub(" -.*", "", x)) |>
mutate(
x = ifelse(x == "", NA_character_, x), # also if_else works
x2 = as.numeric(x)
) |>
collect()
# # A tibble: 40 x 3
# x y x2
# <chr> <dbl> <dbl>
# 1 4000 1 4000
# 2 6000 1.23 6000
# 3 NA 1.46 NA
# 4 8000 1.69 8000
# 5 4000 1.92 4000
# 6 6000 2.15 6000
# 7 NA 2.38 NA
# 8 8000 2.62 8000
# 9 4000 2.85 4000
# 10 6000 3.08 6000
# # ... with 30 more rows
Try using the read_csv instead of open_dataset
library(readr)
data <- read_csv("test_string.csv")
In the windrose example below, I would like to modify a few things:
remove the square frame around the windrose, so there is no border between the windrose itself and the legend.
remove the "title" which says "Frequency of counts..".
modify the labels of the directions N, E, W and S to "North", "East", "West" and "South".
add a small white background to each of the percentage numbers itself (from 2% to 10%).
Every answer to any question would already help! Sorry, I wasn't allowed to attach images, but the code below should help.
Code:
if (!require("openair")) install.packages("openair")
#> Lade nötiges Paket: openair
data_wind <- mydata
head(data_wind)
#> # A tibble: 6 × 10
#> date ws wd nox no2 o3 pm10 so2 co pm25
#> <dttm> <dbl> <int> <int> <int> <int> <int> <dbl> <dbl> <int>
#> 1 1998-01-01 00:00:00 0.6 280 285 39 1 29 4.72 3.37 NA
#> 2 1998-01-01 01:00:00 2.16 230 NA NA NA 37 NA NA NA
#> 3 1998-01-01 02:00:00 2.76 190 NA NA 3 34 6.83 9.60 NA
#> 4 1998-01-01 03:00:00 2.16 170 493 52 3 35 7.66 10.2 NA
#> 5 1998-01-01 04:00:00 2.4 180 468 78 2 34 8.07 8.91 NA
#> 6 1998-01-01 05:00:00 3 190 264 42 0 16 5.50 3.05 NA
windRose(
mydata = data_wind,
auto.text = T,
paddle = F,
angle = 10,
seg = 1,
key.position = "right",
key.header = "wind speed [m/s]",
key.footer = "",
key = list(plot.style = "border"),
grid.line = list(value = 2, lty = 1, pch = 3, col = "black"),
offset = 2.5,
max.freq = 10,
angle.scale = 45,
border = "black"
)
Created on 2022-07-29 by the reprex package (v2.0.1)
The openair package version is 2.10.0. If you need any information about my OS etc.:
sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8
#> [3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C
#> [5] LC_TIME=German_Germany.utf8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] rstudioapi_0.13 knitr_1.39 magrittr_2.0.3 R.cache_0.16.0
#> [5] rlang_1.0.3 fastmap_1.1.0 fansi_1.0.3 stringr_1.4.0
#> [9] styler_1.7.0 highr_0.9 tools_4.2.1 xfun_0.31
#> [13] R.oo_1.25.0 utf8_1.2.2 cli_3.3.0 withr_2.5.0
#> [17] htmltools_0.5.2 ellipsis_0.3.2 yaml_2.3.5 digest_0.6.29
#> [21] tibble_3.1.7 lifecycle_1.0.1 crayon_1.5.1 purrr_0.3.4
#> [25] R.utils_2.12.0 vctrs_0.4.1 fs_1.5.2 glue_1.6.2
#> [29] evaluate_0.15 rmarkdown_2.14 reprex_2.0.1 stringi_1.7.6
#> [33] compiler_4.2.1 pillar_1.7.0 R.methodsS3_1.8.2 pkgconfig_2.0.3
Created on 2022-07-29 by the reprex package (v2.0.1)
Thanks in advance!
I have a shapefile of polygons that I want to use to extract raster values into a data frame. So I do that in the following code.
shp <- sf:st_read('example.shp')
r <- raster::raster('example.tif')
extract <- raster::extract(r, shp, df=TRUE)
This gives me a data frame of two columns: the numeric ID for each polygon and the associated extracted raster value. Now I would like to add the x, y coordinates for each extracted raster value. I have seen this done for point shapefiles but I am not sure how to apply it for polygon shapefile geometry.
Try this, I combined raster::extract(..., cellnumbers = TRUE) and raster::xyFromCell:
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(raster)
#> Loading required package: sp
library(giscoR)
# Dummy data
shp <-
gisco_get_nuts(country = "Netherlands",
nuts_level = 3,
epsg = 4326)[, 1]
r <- raster::getData("alt", country = "Netherlands")
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj
#> = prefer_proj): Discarded datum Unknown based on WGS84 ellipsoid in Proj4
#> definition
extract <- raster::extract(r, shp, df = TRUE, cellnumbers = TRUE)
# Order (for checking purposes)
extract <- extract[order(extract$cell),]
# Extract coordinates
xy <- xyFromCell(r, cell = extract$cell, spatial = FALSE)
# Convert to df and add cellnumber
xy <- as.data.frame(xy)
xy$cell <- extract$cell
# Merge two data frames
extract_end <- merge(extract, xy)
extract_end <- extract_end[order(extract_end$cell),]
# This is what you are looking for: extract_end is a data frame
# has values and x and y are coordinates
head(extract_end)
#> cell ID NLD_msk_alt x y
#> 1 2319 3 NA 6.620833 53.56250
#> 2 2796 3 NA 6.595833 53.55417
#> 3 2797 3 NA 6.604167 53.55417
#> 4 2798 3 NA 6.612500 53.55417
#> 5 2799 3 NA 6.620833 53.55417
#> 6 2800 3 NA 6.629167 53.55417
# Checks - can be ommited
nrow(extract) == nrow(extract_end)
#> [1] TRUE
# Check NAs in coordinates
nrow(extract_end[is.na(extract_end$x), ])
#> [1] 0
nrow(extract_end[is.na(extract_end$y), ])
#> [1] 0
# Convert to sf for checks
sfobj <- st_as_sf(extract_end, coords = c("x", "y"))
sfobj <- st_set_crs(sfobj, st_crs(4326))
# Plot as sf points
par(mar = c(3, 3, 3, 3))
plot(
sfobj[, 3],
axes = TRUE,
main = "sf points",
key.pos = 4,
breaks = "equal",
nbreaks = 100,
pal = rev(terrain.colors(100))
)
# Compare with raster plot
par(mar = c(3, 3, 3, 3))
plot(r, main = "Raster")
sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19041)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
#> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
#> [5] LC_TIME=Spanish_Spain.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] giscoR_0.2.4-9000 raster_3.4-5 sp_1.4-5 sf_0.9-7
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.6 pillar_1.4.7 compiler_4.0.3 highr_0.8
#> [5] class_7.3-18 tools_4.0.3 digest_0.6.27 evaluate_0.14
#> [9] lifecycle_1.0.0 tibble_3.0.6 lattice_0.20-41 pkgconfig_2.0.3
#> [13] rlang_0.4.10 reprex_1.0.0 DBI_1.1.1 rgdal_1.5-23
#> [17] yaml_2.2.1 xfun_0.21 e1071_1.7-4 styler_1.3.2
#> [21] stringr_1.4.0 dplyr_1.0.4 knitr_1.31 generics_0.1.0
#> [25] fs_1.5.0 vctrs_0.3.6 classInt_0.4-3 grid_4.0.3
#> [29] tidyselect_1.1.0 glue_1.4.2 R6_2.5.0 rmarkdown_2.6
#> [33] purrr_0.3.4 magrittr_2.0.1 codetools_0.2-18 backports_1.2.1
#> [37] ellipsis_0.3.1 htmltools_0.5.1.1 units_0.6-7 assertthat_0.2.1
#> [41] countrycode_1.2.0 KernSmooth_2.23-18 stringi_1.5.3 crayon_1.4.1
Created on 2021-02-19 by the reprex package (v1.0.0)
dplyr::lag works fine with integers with the first entry being <NA>, but with bit64::integer64 the first entry is a huge number.
This is my setting:
library(tidyverse)
library(magrittr)
#> ...
library(bit64)
#> Loading required package: bit
#> Attaching package bit
#> ...
#> Attaching package bit64
#> ...
#> The following object is masked from 'package:bit':
#>
#> still.identical
#> The following objects are masked from 'package:base':
#>
#> :, %in%, is.double, match, order, rank
library(reprex)
sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-apple-darwin18.6.0 (64-bit)
#> Running under: macOS Mojave 10.14.6
#>
#> ...
#>
#> other attached packages:
#> [1] reprex_0.3.0 bit64_0.9-7 bit_1.1-14 magrittr_1.5
#> [5] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.1 purrr_0.3.2
#> [9] readr_1.3.1 tidyr_0.8.3 tibble_2.1.3 ggplot2_3.2.0
#> [13] tidyverse_1.2.1
#>
#> ...
Here is a minimal reprex:
tib_int64 <- tibble(A_int = as.integer(c(1,2,3)),
A_int64 = as.integer64(c(1,2,3)))
tib_int64 %>% mutate(B = lag(A_int), C = lag(A_int64))
#> # A tibble: 3 x 4
#> A_int A_int64 B C
#> <int> <int64> <int> <int64>
#> 1 1 1 NA 9218868437227407266
#> 2 2 2 1 1
#> 3 3 3 2 2
The first entry in the C column should be <NA> like in the B column.
Is this a dplyr problem or a bit64 problem?
This is not too difficult to work around, but shouldn't this be filed as a bug?
I'd like to use dplyr::mutate to add p-values to a dataframe but it's not working and I can't get my head around why.
This works:
my_add<-function(x, y) x + y
str(my_add(5, 15))
#> num 20
df <- data.frame(success=c(5,8,4), fail=c(15,13,18))
mutate(df, total=my_add(success, fail))
#> success fail total
#> 1 5 15 20
#> 2 8 13 21
#>13 4 18 22
But this doesn't:
my_binom <- function(x, y) binom.test(x, y)$"p.value"
str(my_binom(5, 20))
#> num 0.0414
df <- data.frame(success=c(5,8), total=c(20,21))
mutate(df, p_value=my_binom(success, total))
#> success total p_value
#> 1 5 20 0.5810547
#> 2 8 21 0.5810547
df <- data.frame(success=c(5,8,4), total=c(20,21,22))
mutate(df, p_value=my_binom(success, total))
#> Error in mutate_impl(.data, dots) :
#> Evaluation error: incorrect length of 'x'.
Both functions take the same input and return a single numeric, so I can't wrap my head around this discrepancy. Can someone enlighten me as to what's going on? Thanks!
Session info:
sessionInfo()
#> R version 3.4.1 (2017-06-30)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: OS X El Capitan 10.11.6
#>
#> Matrix products: default
#> BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] bindrcpp_0.2 dplyr_0.7.4
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_3.4.1 magrittr_1.5 assertthat_0.2.0 R6_2.2.2 tools_3.4.1
#> [6] glue_1.1.1 tibble_1.3.4 yaml_2.1.14 Rcpp_0.12.14 pkgconfig_2.0.1
#> [11] rlang_0.1.2 bindr_0.1
mutate(df, p_value = purrr::map2(success, total, my_binom))