convert multiple .csv files to .shp files in R

I should preface that I am terrible at loops in R, and I recognize this question is similar to this post: Batch convert .csv files to .shp files in R. However, I do not have enough reputation points to comment on that thread to ask whether the user found a solution, and the suggested answers did not work for me.
I have multiple .csv files containing GPS points of animals, and I would like to create a shapefile from each one for spatial analysis. I have tried writing a loop that reads in a .csv file, builds spatial data from its latitude and longitude columns, transforms the spatial data frame to a UTM projection (so that I can calculate distances), and then writes the result as a shapefile. Here is the loop I have tried, but I think my indexing of out and utm_out is incorrect.
Here is some test data; remember to set your working directory before writing the .csv:
# write sample .csv for animal 1
ID1 <- rep(1, 3)
Latitude1 <- c(25.48146, 25.49211, 25.47954)
Longitude1 <- c(-84.66530, -84.64892, -84.69765)
df1 <- data.frame(ID1, Latitude1, Longitude1)
colnames(df1) <- c("ID", "Latitude", "Longitude")
write.csv(df1, "df1.csv", row.names = FALSE)
# write sample .csv for animal 2
ID2 <- rep(2, 2)
Latitude2 <- c(28.48146, 28.49211)
Longitude2 <- c(-88.66530, -88.64892)
df2 <- data.frame(ID2, Latitude2, Longitude2)
colnames(df2) <- c("ID", "Latitude", "Longitude")
write.csv(df2, "df2.csv", row.names = FALSE)
# create a list of file names in my working directory where the .csv files are located
all.files <- list.files(pattern = "\\.csv")
# set the points' geographic coordinate system
points_crs <- crs("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
# loop: read in each file, promote it to a spatial points data frame, project, then write a .shp file
for(i in 1:length(all.files)) {
  file <- read.csv(all.files[i], header = TRUE, sep = ",", stringsAsFactors = FALSE) # read file
  coords <- file[c("Longitude", "Latitude")] # select coordinates from the file
  out <- SpatialPointsDataFrame(
    coords = coords,
    file,
    proj4string = points_crs) # create SpatialPointsDataFrame in the geographic coordinate system points_crs
  utm_out <- spTransform(out, crs("+init=epsg:32616")) # transform to UTM
  writeOGR(utm_out[[i]], dsn = "C:/Users/Desktop/Shapefile_test",
           "*", driver = "ESRI Shapefile")
}
This gives me the following: Error: inherits(obj, "Spatial") is not TRUE
I've also tried:
for(i in 1:length(all.files)) {
  file <- read.csv(all.files[i], header = TRUE, sep = ",", stringsAsFactors = FALSE)
  coords <- file[c("Longitude", "Latitude")]
  out <- SpatialPointsDataFrame(
    coords = coords,
    file,
    proj4string = points_crs)
  utm_out <- spTransform(out[[i]], crs("+init=epsg:32616"))
  writeOGR(utm_out[[i]], dsn = "C:/Users/Desktop/Shapefile_test", "*", driver = "ESRI Shapefile")
}
This produces: Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘spTransform’ for signature ‘"integer", "CRS"’
Ideally, the output files will be named something like "animal1.shp", "animal2.shp", etc.
Alternatively, I do have animals 1 and 2 in one file. I could bring in this file, set the projection, and then create a subset for each unique animal ID and write each subset to a .shp file (sketched below), but I am having issues with subsetting the spatial data, and I think that is a topic for another thread.
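Something like the following is what I have in mind for that alternative (untested; it assumes a hypothetical combined file animals.csv with ID, Latitude, and Longitude columns):
library(raster) # also loads sp
library(rgdal)
all_pts <- read.csv("animals.csv", stringsAsFactors = FALSE)
sp_pts <- SpatialPointsDataFrame(
  coords = all_pts[c("Longitude", "Latitude")],
  data = all_pts,
  proj4string = CRS("+proj=longlat +datum=WGS84"))
utm_pts <- spTransform(sp_pts, CRS("+init=epsg:32616"))
for (id in unique(utm_pts$ID)) {
  # a SpatialPointsDataFrame can be subset like a data frame
  shapefile(utm_pts[utm_pts$ID == id, ], paste0("animal", id, ".shp"))
}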
Thanks in advance for your assistance.

Here is a minor variation on the solution by Mako212:
library(raster)
all.files <- list.files(pattern = "\\.csv$")
out.files <- gsub("\\.csv$", ".shp", all.files)
crs <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
for (i in 1:length(all.files)) {
  d <- read.csv(all.files[i], stringsAsFactors = FALSE)
  sp <- SpatialPointsDataFrame(coords = d[c("Longitude", "Latitude")], data = d, proj4string = crs)
  utm <- spTransform(sp, CRS("+proj=utm +zone=16 +datum=WGS84"))
  shapefile(utm, out.files[i])
}
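Because raster::shapefile infers the layer name from the file name, this version also sidesteps the dsn/layer bookkeeping that writeOGR requires.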

Expanding on my comment, it's important to test batch-processing operations like this on a single file first, and then adapt your solution as necessary to process the batch. My first step in troubleshooting your issue was to strip away the for loop and run the code on the first file, all.files[1]; it still failed, indicating there was at least one issue unrelated to the loop.
Try this out. I've changed crs to CRS because the function in sp is capitalized. Your loop range can be simplified to for(i in all.files), and I removed the attempts to index out and utm_out as if they were lists: out[[i]] extracts the i-th attribute column (an integer vector), which is why spTransform complained about an "integer" signature.
require(sp)
require(rgdal)
points_crs <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
# loop: read in each file, promote it to a spatial points data frame, project, then write a .shp file
for (i in all.files) {
  file <- read.csv(i, header = TRUE, sep = ",", stringsAsFactors = FALSE) # read file
  coords <- file[c("Longitude", "Latitude")] # select coordinates from the file
  out <- SpatialPointsDataFrame(
    coords = coords,
    data = file,
    proj4string = points_crs) # create SpatialPointsDataFrame in the geographic coordinate system points_crs
  layer_name <- substr(i, 1, nchar(i) - 4) # file name without the ".csv" extension
  utm_out <- spTransform(out, CRS("+init=epsg:32616")) # transform to UTM
  writeOGR(utm_out, dsn = "/path/Shapefile_test", layer = layer_name, driver = "ESRI Shapefile")
}
Edit:
I had to modify the writeOGR line by specifying a layer name:
writeOGR(utm_out,dsn="/path/Shapefile_test", layer="test", driver="ESRI Shapefile")

Related

Subsetting and saving netcdf in R

I am trying to subset 8 netcdf files (one of them is here) in a for loop to a shorter time period and then save each one as a new netcdf file.
I saw that other people have already asked how to subset netcdf files by time period (here or here), but when I do it on my netcdf files the for loop keeps running without finishing (not even the first file), and I can't figure out why.
Here is the code I use:
library(raster)
library(ncdf4)
library(lubridate)
# setting wd = indir containing the netcdf files
setwd(indir)
files <- list.files(pattern = "\\.nc$")
for (j in seq_along(files)) {
  # setting wd containing netcdf files in the loop
  setwd(indir)
  b <- brick(files[j])
  nc <- nc_open(files[j])
  # variable
  varname <- names(nc[['var']][3])
  varunits <- ncatt_get(nc, varname, "units")[[2]]
  lon <- ncvar_get(nc, "lon")
  lat <- ncvar_get(nc, "lat", verbose = FALSE)
  time <- ncvar_get(nc, "time")
  tunits <- ncatt_get(nc, "time", "units")[[2]]
  dlname <- ncatt_get(nc, varname, "long_name")[[2]]
  nc_close(nc)
  # assigning a crs
  proj4string(b) <- "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"
  # setting time as.Date
  tm <- ymd(getZ(b))
  # setting time on the rasterBrick
  b <- setZ(b, tm)
  # subsetting
  b2 <- subset(b, which(tm < as.Date('2006-01-01')))
  # setting wd where I want to save the "new" netcdf files
  setwd(outdir)
  writeRaster(b2, filename = paste0(varname, "_1971_2006_Noce.nc"),
              format = "CDF", varname = varname, varunit = varunits, longname = dlname,
              xname = "lon", yname = "lat", zname = "time", zunit = tunits, overwrite = TRUE)
}
Any help on how to get the loop working would be very much appreciated!
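One way to rule out the subsetting step itself is to run the same logic on a small in-memory brick first; a quick sketch:
library(raster)
# tiny 5-layer dummy brick spanning the 2005/2006 boundary
b <- brick(array(runif(3 * 3 * 5), dim = c(3, 3, 5)))
tm <- seq(as.Date("2005-12-30"), by = "day", length.out = 5)
b <- setZ(b, tm)
b2 <- subset(b, which(tm < as.Date("2006-01-01")))
nlayers(b2) # 2: only the 2005 layers survive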

How to filter a shape file before plotting it in R

I am trying to filter a shapefile to make plotting easier.
I have a shape file downloaded from UK gov:
http://geoportal.statistics.gov.uk/datasets/7ff28788e1e640de8150fb8f35703f6e_1/data?geometry=-76.678%2C45.365%2C69.572%2C63.013&orderBy=lad16cd&orderByAsc=false&where=%20(UPPER(lad16cd)%20like%20%27%25E0800000%25%27%20OR%20UPPER(lad16cd)%20like%20%27%25E08000010%25%27)%20
Based on this: https://www.r-bloggers.com/r-and-gis-working-with-shapefiles/
I wrote the following but do not know how to filter:
setwd("~/Documents/filename")
getwd() # --double confirm real data directory
#install.packages("maptools")
library(maptools)
crswgs84=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
ukmap=readShapePoly("filename.shp",proj4string=crswgs84,verbose=TRUE)
class(ukmap)
str(ukmap@data)
str(ukmap@polygons[[1]])
ukmap@bbox
# <-- need to do some filtering here
plot(ukmap) # plotting the whole of the UK would take too long
For example, I just want "E06000001" to "E06000020".
(The file name is "Local_Authority_Districts_December_2016_Full_Extent_Boundaries_in_the_UK"; I was not sure how to include it in the code block above.)
You can consider using the sf package to read the shapefile and plot the data. Filtering an sf object is the same as filtering a data frame.
library(sf)
# Read the shapefile
ukmap <- st_read("Local_Authority_Districts_December_2016_Full_Extent_Boundaries_in_the_UK.shp")
# Subset the sf object
ukmap_sub <- ukmap[ukmap$lad16cd %in% c("E06000001", "E06000020"), ]
# Plot the boundary of ukmap_sub
plot(st_geometry(ukmap_sub))
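Note that calling plot() on the full sf object would draw one small map per attribute column, so wrapping it in st_geometry() keeps the output to a single boundary plot.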
If you prefer to work with a SpatialPolygonsDataFrame, you can use the readOGR function from the rgdal package. After that, the SpatialPolygonsDataFrame can be subset like a regular data frame.
library(maptools)
library(rgdal)
ukmap <- readOGR(dsn = getwd(), layer = "Local_Authority_Districts_December_2016_Full_Extent_Boundaries_in_the_UK")
ukmap_sub <- ukmap[ukmap$lad16cd %in% c("E06000001", "E06000020"), ]
plot(ukmap_sub)
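If you actually want the whole range of codes from "E06000001" to "E06000020" rather than just those two, the vector can be generated instead of typed out; this works with both the sf and sp versions above:
wanted <- sprintf("E06%06d", 1:20) # "E06000001" ... "E06000020"
ukmap_sub <- ukmap[ukmap$lad16cd %in% wanted, ]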

Iterating over a function to combine many raster stacks into one

Been stuck on this for a while now. I've looked everywhere for an answer but can't seem to find anything on Stack. Any help you all can give would be very much appreciated.
My main issue is that I need to import many netcdf4 files, create a raster brick from each, then combine the bricks into one "master brick" per variable. To give a clearer example, I have 40 years (one netcdf file per year) of many climate variables (n = 15) at daily resolution. The goal is to aggregate to monthly, but first I need a function that reads all years of netcdf files for one variable into one large stack.
What I have now reads as follows:
# Libraries --------------------------------------------------------------
library(raster)
library(ncdf4)
# Directories -------------------------------------------------------------
tmp_dl <- list.files("/Users/NateM", pattern = "nc",
full.names = TRUE)
# Functions ---------------------------------------------------------------
rstlist <- stack()
netcdf_import <- function(file) {
  nc <- nc_open(file)
  nc_att <- attributes(nc$var)$names
  ncvar <- ncvar_get(nc, nc_att)
  rm(nc)
  proj <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
  rbrck <- brick(ncvar, crs = proj)
  rm(ncvar)
  extent(rbrck) <- c(-124.772163391113, -67.06383005778, 25.0626894632975,
                     49.3960227966309)
}
t <- for (i in 1:length(tmp_dl)) {
  x <- netcdf_import(tmp_dl[i])
  rstlist <- stack(rstlist, x)
}
allyears <- stack(t)
Two years of the data can be found here:
https://www.northwestknowledge.net/metdata/data/pdsi_2016.nc
https://www.northwestknowledge.net/metdata/data/pdsi_2015.nc
Any help would be most welcome. Thank you all in advance, and if this is a duplicate post I apologize; I looked far and wide to no avail.
Your code is fine; you just need to return the loaded brick rbrck from your function. Otherwise the function returns the value of its last expression, which here is the extent assignment.
As for loading and stacking, I'd suggest using lapply to apply the function to each data file. This gives you a neat list with one year per item. There you could do some more processing, and finally just call stack on the list to produce your "master brick".
Mind that I only did this with two files, so I'm not sure about the size of the whole thing when you do it with 40.
Here's your modified code:
# Libraries --------------------------------------------------------------
library(raster)
library(ncdf4)
# Directories -------------------------------------------------------------
tmp_dl <- list.files("/Users/NateM", pattern = "nc",
full.names = TRUE)
# Functions ---------------------------------------------------------------
netcdf_import <- function(file) {
  nc <- nc_open(file)
  nc_att <- attributes(nc$var)$names
  ncvar <- ncvar_get(nc, nc_att)
  nc_close(nc) # close the file handle rather than just rm(nc)
  proj <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
  rbrck <- brick(ncvar, crs = proj)
  rm(ncvar)
  extent(rbrck) <- c(-124.772163391113, -67.06383005778, 25.0626894632975,
                     49.3960227966309)
  return(rbrck)
}
# apply the function to each file
allyrs <- lapply(tmp_dl, netcdf_import)
# stack the list into the master brick
allyrs <- do.call(stack, allyrs)
HTH
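Since the stated goal is monthly aggregation, a possible next step is stackApply with a per-layer month index. A sketch, assuming you can build a Date vector with one entry per layer from the files' time metadata (the seq below is just a placeholder):
library(raster)
dates <- seq(as.Date("1977-01-01"), by = "day", length.out = nlayers(allyrs)) # placeholder dates
idx <- as.integer(factor(format(dates, "%Y-%m"))) # month index per layer
monthly <- stackApply(allyrs, indices = idx, fun = mean)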

R: create a loop for importing, manipulating (spatial join) of multiple files

I am a basic R user who needs your help. I have multiple data files that I want to process in a loop: import one or two files, process them, remove them, and repeat the process several times. However, I am stuck on what is probably simple code for many of you. Please kindly help me solve this.
I can import and process a single file as follows:
library(sp)
library(maptools)
test <- read.table("test.txt", header = FALSE, sep = "\t", stringsAsFactors = FALSE)
test <- as.data.frame(test)
## prepare for spatial joining with the polygon
coordinates(test) <- ~ lon + lat
proj4string(test) <- CRS("+proj=longlat +datum=NAD83")
## import GIS polygon shapefile
ZIPshp <- readShapeSpatial("D:/data/gis/Zipcode.shp",
                           proj4string = CRS("+proj=longlat +datum=NAD83"))
## spatial join b/w point and polygon
test_zip <- over(test, ZIPshp[, "zipc"])
test_zip <- subset(test_zip, zipc != "")
write.table(test_zip, "test_zip.csv", sep = ",", na = "NA", row.names = FALSE)
However, I have failed to figure out how to create a loop to repeat this process multiple times, and especially how to remove each processed data frame once it is done. Here is my attempt, but it still misses a key portion, which is where I really need your help. (I also thought about do.call and lapply but failed to come up with anything.)
files <- list.files(pattern = "\\.txt$")
ldf <- list()
for (i in 1:length(files)) {
  ldf[[i]] <- read.table(files[[i]], header = FALSE, sep = "\t",
                         stringsAsFactors = FALSE)
  coordinates(ldf[[i]]) <- ~ lon + lat
  proj4string(ldf[[i]]) <- CRS("+proj=longlat +datum=NAD83")
}
## (missing parts are spatial join, removal of processed
## data frame, and repeating this process with new data)
Please help me! Thanks
You can use the code below as a skeleton to complete your solution:
library(sp)
library(maptools)
options(stringsAsFactors = FALSE)
## import the GIS polygon shapefile once, outside the loop
ZIPshp <- readShapeSpatial("D:/data/gis/Zipcode.shp",
                           proj4string = CRS("+proj=longlat +datum=NAD83"))
## read in each file and process it
lapply(list.files(pattern = "\\.txt$"), function(txtfile) {
  test <- read.table(txtfile, header = FALSE, sep = "\t")
  ## prepare for spatial joining with the polygon
  ## (assumes the columns are named lon and lat; with header = FALSE you
  ## may need to supply col.names in read.table)
  coordinates(test) <- ~ lon + lat
  proj4string(test) <- CRS("+proj=longlat +datum=NAD83")
  ## spatial join b/w point and polygon
  test_zip <- over(test, ZIPshp[, "zipc"])
  test_zip <- subset(test_zip, zipc != "")
  ## output each processed file as a csv
  write.csv(test_zip,
            paste0(tools::file_path_sans_ext(txtfile), ".csv"),
            row.names = FALSE)
})
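Note that because each file is read and joined inside the anonymous function, test and test_zip are local to each iteration and are freed when the function returns, so the explicit removal step you were looking for is not needed.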

Error when loading a .shp file after joining attributes from a .csv to a .dbf file

I am having problems opening the .shp file in R after joining attributes from a .csv file to the .dbf file. I have a lot of experience coding in R but limited experience with GIS in R. I have experience in ArcGIS but no longer have access to the program. I know how to create bubble plots and other maps in R using the .csv file and plotting points, but I would like to add the attributes to the .dbf and then use the shapefile to fill in the county areas with a brewer palette. I can open the shapefile fine before joining the attributes to the .dbf (the files were obtained from the US Census Bureau webpage).
Here is my code below:
library(gpclib)
library(maptools)
library(RColorBrewer)
library(classInt)
library(TeachingDemos)
library(foreign) # for read.dbf/write.dbf
gci <- read.csv("C:/Users/Smackbug/marketingmapexample.csv", header = TRUE) # has GEO_ID
# copy the data before appending to the dbf
gci2 <- gci
gci2 <- na.omit(gci2) # remove any empty data points
# read in dbf file to add attributes
akdbf <- read.dbf(file.choose()) # downloaded from the US Census Bureau
# merge to join attributes
joined <- merge(akdbf, gci2, by = "GEO_ID")
# save original and new dbf
write.dbf(akdbf, "C:/Users/Smackbug/Desktop/shapefiles/gz_2010_02_060_00_500koriginal.dbf")
write.dbf(joined, "C:/Users/Smackbug/Desktop/shapefiles/gz_2010_02_060_00_500k.dbf")
and I get the error from this part of the code:
alaska <- readShapePoly(file.choose(), proj4string = CRS("+proj=longlat"))
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
  invalid 'row.names' length
and the rest of the code should look something like this:
colors <- brewer.pal(5, "Reds")
brks <- classIntervals(alaska$medianIncome, n = 5, style = "fixed", fixedBreaks = c(0, 25, 50, 100, 250))
plot(brks, pal = colors)
brks <- brks$brks
plot(alaska, col = colors[findInterval(alaska$medianIncome, brks, all.inside = TRUE)], axes = FALSE)
You are breaking the sp (shapefile) object in multiple ways. You cannot add data to the dbf independently of operating on the shapefile: everything is indexed in one of the binary files (the shx) that comprise the shapefile. You are also breaking the internal relationships of the sp object by using merge.
The most efficient way is to use rgdal to read the shapefile, join the dbf, and finally write out (or overwrite) a new shapefile.
require(rgdal)
require(sp)
require(foreign)
# Read data
shp <- readOGR(getwd(), "ShpName")
tbl <- read.dbf("infile.dbf")
# Merge data using match
shp@data <- data.frame(shp@data, tbl[match(shp@data[, "GEO_ID"], tbl[, "GEO_ID"]), ])
# Write new shapefile with added attributes. The additional flags will overwrite if
# the name is the same as the original
writeOGR(shp, getwd(), "NewShp", driver = "ESRI Shapefile", check_exists = TRUE, overwrite_layer = TRUE)
If you need a more formal merge function you can use this.
##########################################################################
# PROGRAM: merge.sp.df
#
# USE: JOINS A dataframe OBJECT TO A sp CLASS SpatialDataFrame OBJECT
# KEEPING INTEGRITY OF DATA
#
# REQUIRES: sp CLASS SpatialDataFrame OBJECT
# PACKAGES: sp
#
# ARGUMENTS:
# x sp SpatialDataFrame OBJECT
# y dataframe OBJECT TO MERGE
# xcol MERGE COLUMN NAME IN sp OBJECT
# ycol MERGE COLUMN NAME IN dataframe OBJECT
#
# EXAMPLE:
# # Not Run (dat.sp is sp object and dat is a data.frame to merge
# dat.sp <- merge.sp.df(dat.sp, dat, "dat.sp-ID", "dat-ID")
#
# VALUE:
# A NEW SpatialDataFrame OBJECT WITH MERGED DATA
##########################################################################
merge.sp.df <- function(x, y, xcol, ycol) {
  x@data$sort <- 1:nrow(as(x@data, "data.frame"))
  xdf <- as(x@data, "data.frame")
  xdf <- merge(xdf, y, by.x = xcol, by.y = ycol)
  xdf <- xdf[order(xdf$sort), ]
  row.names(xdf) <- xdf$sort
  xdf <- xdf[, -which(names(xdf) == "sort")]
  x@data <- xdf
  return(x)
}
Here's a working result (thanks Jeffery for your help) using some of Jeffery's code above:
library(sp)
library(rgdal)
library(foreign)
setwd("C:/Users/rhonda/Documents/R scripts/shapefiles")
gc <- read.csv("gcmarketingmapexample.csv", header = TRUE)
# read in dbf file to append data
akdbf <- read.dbf("gz_2010_02_060_00_500k.dbf")
# merge to join attributes
joined <- merge(akdbf, gc, by = "GEO_ID")
# save new dbf
write.dbf(joined, "akdbf.dbf")
# shapefile and dbf file
frame <- readOGR(getwd(), "gz_2010_02_060_00_500k")
akdbf2 <- read.dbf("akdbf.dbf")
frame@data <- akdbf2[match(frame@data[, "GEO_ID"], akdbf2[, "GEO_ID"]), ]
writeOGR(frame, getwd(), "akdbf", driver = "ESRI Shapefile", check_exists = TRUE, overwrite_layer = TRUE)
#use new shapefiles to create maps
