Wrong Dimensions in Geospatial NetCDF in R

I would like to load the following geospatial file in R: ftp://ftp.nodc.noaa.gov/pub/data.nodc/icoads/1930s/1930s/ICOADS_R3.0.0_1930-10.nc. The problem is that with the following code I obtain only one dimension, even though I should obtain three:
require("raster")
require("ncdf4")
nc_data <- nc_open("ICOADS_R3.0.0_1930-10.nc")
id.array <- ncvar_get(nc_data, "ID")
dim(id.array)
How do I fix this?
Thank you for any comments and suggestions.

Does this give you what you expect?
library(tidync)
library(magrittr)
tfile <- tempfile(fileext = ".nc")
download.file("ftp://ftp.nodc.noaa.gov/pub/data.nodc/icoads/1930s/1930s/ICOADS_R3.0.0_1930-10.nc", tfile)
id <- tidync(tfile) %>% activate("ID") %>% hyper_tibble()
dim(id)
[1] 69779 3
tidync is only on GitHub: https://github.com/hypertidy/tidync
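As a cross-check, you can ask ncdf4 itself what the file declares; here is a minimal sketch, assuming the file has already been downloaded as above. Note that for character variables such as ID, ncvar_get() folds the string-length dimension into the returned strings, so dim() can legitimately report fewer dimensions than the file declares.
library(ncdf4)
nc_data <- nc_open("ICOADS_R3.0.0_1930-10.nc")
names(nc_data$dim)                                   # every dimension defined in the file
sapply(nc_data$var[["ID"]]$dim, function(d) d$name)  # dimensions declared for "ID"
sapply(nc_data$var[["ID"]]$dim, function(d) d$len)   # and their lengths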

Related

How to create a subset of a .shp file, with all its properties

I am new to programming in R and to working with .shp files.
I am trying to take a subsample/subset of a .shp file that is very big; you can download the file from here: https://www.ine.es/ss/Satellite?L=es_ES&c=Page&cid=1259952026632&p=1259952026632&pagename=ProductosYServicios%2FPYSLayout (select the year 2021 and then continue).
I have tried several things but none of them work. Simply converting it to sf does not help either, because that would just add one more column called geometry with the coordinates listed, and that is not enough for me to pass it later to the leaflet package.
I have tried this here but it doesn't work for me:
myspdf = readOGR(getwd(), layer = "SECC_CE_20210101")  # works
PV2 = myspdf[myspdf@data$NCA == 'País Vasco', ]  # doesn't work
PV2 = myspdf[, myspdf@data$NCA == 'País Vasco']  # doesn't work
What I intend is to create a sample of myspdf (with data, polygons, plotOrder, bbox and proj4string), but I don't want all the NCA values (myspdf@data$NCA); I only want the rows where data$NCA is 'País Vasco'.
In short, I would like to have a sample for each value of the NCA column.
Is that possible? Can someone help me with this? Thank you very much.
I have tried this too, but the same thing as before happens: all 18 variables appear and all are empty:
Pais_V = subset(myspdf, NCA == 'País Vasco')
dim(Pais_V)
Here's one approach:
library(rgdal)
# Download a zipped shapefile and read the named layer with readOGR()
dlshape <- function(shploc, shpfile) {
  temp <- tempfile(fileext = ".zip")
  download.file(shploc, temp, mode = "wb")
  exdir <- tempfile()
  unzip(temp, exdir = exdir)
  readOGR(dsn = exdir, layer = shpfile)
}
x <- dlshape(shploc = "https://www2.census.gov/geo/tiger/GENZ2020/shp/cb_2020_us_aitsn_500k.zip",
             shpfile = "cb_2020_us_aitsn_500k")
library(leaflet)
mycats <- c("00", "T2", "T3", "28")
x2 <- subset(x, x$LSAD %in% mycats)  # subset using the vector `mycats`
mypal <- colorFactor("Dark2", domain = x2$LSAD)
leaflet(x2) %>% addPolygons(weight = .2, color = mypal(x2$LSAD))
dlshape function courtesy of @yokota
Here's another option. This uses the package sf.
library(sf)
myspdf <- st_read("./_data/España_Seccionado2021/SECC_CE_20210101.shp",
                  as_tibble = TRUE)
Now you can filter this data any way that you filter a data frame. It will still work as spatial data, as well.
Using tidyverse (well, technically dplyr):
library(dplyr)
myspdf %>% filter(NCA == "País Vasco")
This takes it from 36,334 observations down to 1,714.
The base R method you tried to use with readOGR will work, as well.
myspdf[myspdf$NCA == "País Vasco",]
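Since the goal was to show the subset in leaflet, here is a minimal sketch of that last step; the st_transform(4326) call is an assumption for the case that the layer is in a projected CRS, since leaflet expects longitude/latitude.
library(sf)
library(dplyr)
library(leaflet)
pais_vasco <- myspdf %>%
  filter(NCA == "País Vasco") %>%
  st_transform(4326)  # assumption: reproject to lon/lat for leaflet
leaflet(pais_vasco) %>% addPolygons(weight = 0.2)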

how to dissolve a polyline in R

I have a large polyline shapefile that needs to be dissolved. However, the examples online only relate to polygons, not polylines (for example, gUnaryUnion). I am reading in my shapefile using st_read from the sf package. Any help would be appreciated.
If I understand your question, one option is to use dplyr.
library(sf)
library(dplyr)
# get polyline road data
temp_shapefile <- tempfile()
download.file("https://www2.census.gov/geo/tiger/TIGER2017//ROADS/tl_2017_06075_roads.zip", temp_shapefile, mode = "wb")
temp_dir <- tempdir()
unzip(temp_shapefile, exdir = temp_dir)
sf_roads <- read_sf(file.path(temp_dir, 'tl_2017_06075_roads.shp'))
Use the RTTYP field to reduce the polyline from ~4000 unique segments to 6 segments.
sf_roads_summarized <- sf_roads %>%
  group_by(RTTYP) %>%
  summarize()
I managed to achieve this by using st_combine.
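For completeness, a minimal sketch of that approach on the same roads data; st_union() is included alongside as an alternative, since it also merges the geometries but additionally resolves overlapping pieces.
library(sf)
roads_combined <- st_combine(sf_roads)   # one feature, geometries kept as-is, attributes dropped
roads_dissolved <- st_union(sf_roads)    # also merges overlapping/duplicate segments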

Error: `t.haven_labelled()` not supported while attempting MICE package in R

Here is my sample code:
library(haven)
community_surveys <- read_sav("community_surveys.sav")
diss_data <- as.data.frame(community_surveys)
diss_data$FOC_1 <- as.factor(diss_data$FOC_1)
diss_data$DR_1 <- as.factor(diss_data$DR_1)
diss_data$IR_1 <- as.factor(diss_data$IR_1)
diss_data$HAITI <- as.factor(diss_data$HAITI)
diss_data$TREATMENT <- as.factor(diss_data$TREATMENT)
library(mice)
mice(diss_data, maxit = 10, m = 10)
I get this error below:
Error: `t.haven_labelled()` not supported
As far as comprehension goes, I am a newbie R user with a couple of intro classes and some reading under my belt. Any assistance is much appreciated.
Labelled data from haven leads to all sorts of weird problems. You could try one of the following:
If your data should be numeric: diss_data[] <- lapply(diss_data, haven::zap_labels)
For factors: diss_data[] <- lapply(diss_data, haven::as_factor) (using lapply with diss_data[] keeps the data.frame structure, whereas sapply would simplify it away)
You could also just try to replace the command in your code like this:
diss_data$FOC_1 <- haven::as_factor(diss_data$FOC_1)
diss_data$DR_1 <- haven::as_factor(diss_data$DR_1)
diss_data$IR_1 <- haven::as_factor(diss_data$IR_1)
diss_data$HAITI <- haven::as_factor(diss_data$HAITI)
diss_data$TREATMENT <- haven::as_factor(diss_data$TREATMENT)
You could also remove value labels by using remove_val_labels() from the labelled package.
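Putting that together, a minimal sketch of the whole flow, under the assumption that every labelled column should become a factor before imputation:
library(haven)
library(mice)
community_surveys <- read_sav("community_surveys.sav")
diss_data <- as.data.frame(community_surveys)
# Convert only the labelled columns, keeping the data.frame structure
diss_data[] <- lapply(diss_data, function(col) {
  if (inherits(col, "haven_labelled")) haven::as_factor(col) else col
})
imp <- mice(diss_data, maxit = 10, m = 10)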

combining multiple shapefiles in R

I have a folder with about 100 point shapefiles that are locations obtained while scat sampling an ungulate species. I would like to merge all these point shapefiles into one shapefile in R. All the point data were in .gpx format initially, which I then converted to shapefiles.
I am fairly new to R, so I am very confused about how to do this and could not find code that merges or combines more than a few shapefiles. Any suggestions would be much appreciated. Thanks!
Building on @M_Merciless's answer:
For long lists you can use
all_schools <- do.call(rbind, shapefile_list)
Or, alternatively, the very fast:
all_schools <- sf::st_as_sf(data.table::rbindlist(shapefile_list))
library(sf)
List all the shapefiles within the folder location:
file_list <- list.files("shapefile/folder/location", pattern = "\\.shp$", full.names = TRUE)
Read all the shapefiles and store them as a list:
shapefile_list <- lapply(file_list, read_sf)
Append the separate shapefiles. In my example there are 4 shapefiles; for a longer list this can be improved with a for loop or an apply function (or the do.call() line above).
all_schools <- rbind(shapefile_list[[1]], shapefile_list[[2]], shapefile_list[[3]], shapefile_list[[4]])
Adding a solution that I think is "tidier"
library(fs)
library(tidyverse)
library(sf)
# Getting all file paths
shapefiles <- 'my/data/folder' |>
  dir_ls(recurse = TRUE) |>
  str_subset('\\.shp$')
# Loading all files
sfdf <- shapefiles |>
  map(st_read) |>
  bind_rows()
Admittedly, this is more lines of code, but personally I think it is much easier to read and comprehend this way.

Parse the XML data using library(xml2) [in R]

I am working through the Data Cleaning course on Coursera and am running into trouble with the coding:
How do I parse the XML data (using the xml2 library) and use it to find the number of restaurants?
How do I parse the XML into a data frame?
Read the XML data on Baltimore restaurants from here:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml
How many restaurants have zipcode 21231?
library(xml2)
x <- read_xml("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml")
y <- as.numeric(xml_path(xml_find_all(x, "//row[@zipcode='21231']]")))
y
or
library(rvest)
library(purrr)
pg <- read_html("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml")
html_nodes(pg, "//row[@zipcode='21231']]") %>%
map(xml_attrs) %>%
map_df(~as.list(.))
I tried coding it two ways but neither worked. Any help will be greatly appreciated. Thanks.
Looking for something like this?
length( xml_find_all( x, './/zipcode[text()="21231"]' ) )
[1] 127
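And for the data-frame part of the question, a minimal sketch, assuming the structure of this particular file (each restaurant is a <row> nested inside an outer <row> wrapper, with fields such as <name> and <zipcode> as child elements):
library(xml2)
x <- read_xml("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml")
restaurants <- xml_find_all(x, ".//row/row")  # one node per restaurant
df <- data.frame(
  name    = xml_text(xml_find_first(restaurants, "./name")),
  zipcode = xml_text(xml_find_first(restaurants, "./zipcode"))
)
sum(df$zipcode == "21231")  # should agree with the count above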
