Currently I'm trying to work with CALIPSO LiDAR data. The files are distributed in HDF4 (checked with HDFView). The subdataset I'm interested in is the Cloud Top Height, which is stored in a simple table (56160 rows, 5 columns).
So far, I tried two things:
A) Converting HDF subdataset to GeoTiff using gdal_translate (gdalUtils package)
# Get a list of sds names
sds <- get_subdatasets('C:\\Users\\Thomas\\Desktop\\NASA Sat Data\\Cloud Layer Data\\CAL_LID_L2_333mCLay-ValStage1-V3-01.2011-08-16T03-02-01ZN.hdf')
# Isolate name of subdataset and convert
name <- sds[25]
filename <- 'Cloud_Top_Height_Test.tif'
gdal_translate(name, dst_dataset = filename)
# Load the Geotiff created into R
r <- raster(filename)
# fix extent, projection and negative values
e <- extent(-80.016243, 64.831413, -179.183868, -1.711731)
r <- setExtent(r, e)
values(r)[values(r) < 0] <- 0 # set negative fill values (e.g. -9999) to 0
crs(r) <- '+proj=utm +zone=28 +ellps=WGS84 +datum=WGS84 +units=m +no_defs'
The result was a raster file which was obviously missing any spatial information (dimensions, extent, CRS, resolution), which is why I fixed some things afterwards.
This is the result (image omitted).
As you can see, the result is a set of strange lines where there should be square pixels. I suppose I'm missing some arguments in the gdal_translate call? Also, all the adjustments made afterwards seem a bit clumsy, or let's say 'unprofessional'.
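A hedged guess at what's missing: gdal_translate can assign georeferencing at conversion time through its a_srs and a_ullr flags, which the gdalUtils wrapper exposes as arguments, so the raster wouldn't need patching afterwards. A sketch only, with the bounds taken from the extent values above and their interpretation as lon/lat assumed:
# assign CRS and bounds while converting; a_ullr is ulx, uly, lrx, lry (values assumed)
gdal_translate(name, dst_dataset = filename,
               a_srs = '+proj=longlat +datum=WGS84 +no_defs',
               a_ullr = c(-179.183868, 64.831413, -1.711731, -80.016243))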
B) Converting the table (subdataset) into a data frame (to convert it into a raster in a second step)
A few days ago I came up with the idea to convert the subdataset (which is essentially 'only' a table) to a data frame, and then convert it into a raster in a second step, specifying all spatial information 'from scratch' rather than adjusting it afterwards as in my first attempt. However, I found no commands/packages to do so, and I'm still not confident that this is the right way (a sketch of the idea follows below).
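A minimal sketch of that idea, assuming the subdataset has already been landed in a matrix (here via the GeoTIFF from A) and assuming the columns span longitude and the rows latitude:
library(raster)
# hypothetical: m holds the 56160 x 5 table from the subdataset
m <- as.matrix(raster('Cloud_Top_Height_Test.tif'))
# build the raster from scratch, declaring extent and CRS explicitly
# (bounds assumed from the values in A, reordered as lon = x, lat = y)
r2 <- raster(m, xmn = -179.183868, xmx = -1.711731,
             ymn = -80.016243, ymx = 64.831413,
             crs = '+proj=longlat +datum=WGS84 +no_defs')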
What I need in the end: a proper (georeferenced etc.) raster (56160 rows, 5 columns) holding the values of the subdataset from the HDF4 file.
My question: How can I extract an HDF4 subdataset and convert it to a raster with the specifications mentioned above?
Any help is greatly appreciated (and since this is my first post on SO: sorry if this question doesn't follow all the conventions of this forum; criticism is also appreciated).
To convert to a data frame:
as.data.frame(r, xy = TRUE)
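For example, assuming r is the raster read from the converted GeoTIFF above, a minimal sketch:
library(raster)
# read the GeoTIFF produced by gdal_translate earlier
r <- raster('Cloud_Top_Height_Test.tif')
# one row per cell; xy = TRUE adds the cell-centre coordinates as columns
df <- as.data.frame(r, xy = TRUE)
head(df)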
Related
I am trying to download high-resolution climate data for a bunch of lat/long coordinates, and combine them into a single dataframe. I've come up with a solution (below), but it will take forever with the large list of coordinates I have. I asked a related question on the GIS StackExchange to see if anyone knew of a better approach for downloading and merging the data, but I'm wondering if I could somehow just speed up the operation of the loop? Does anyone have any suggestions on how I might do that? Here is a reproducible example:
# Download and merge 0.5 minute MAT/MAP data from WorldClim for a list of lon/lat coordinates
# This is based on https://emilypiche.github.io/BIO381/raster.html
# Make a dataframe with coordinates
coords <- data.frame(Lon = c(-83.63, 149.12), Lat=c(10.39,-35.31))
# Load package
library(raster)
# Make an empty dataframe for dumping data into
coords3 <- data.frame(Lon = numeric(), Lat = numeric(), MAT_10 = numeric(), MAP_mm = numeric())
# Get WorldClim data for all the coordinates, and dump them into coords3
for(i in seq_along(coords$Lon)) {
r <- getData("worldclim", var="bio", res=0.5, lon=coords[i,1], lat=coords[i,2]) # Download the tile containing the lat/lon
r <- r[[c(1,12)]] # Reduce the layers in the RasterStack to just the variables we want to look at (MAT*10 and MAP_mm)
names(r) <- c("MAT_10", "MAP_mm") # Rename the columns to something intelligible
points <- SpatialPoints(na.omit(coords[i, 1:2]), proj4string = r@crs) # give lon/lat to SpatialPoints
values <- extract(r,points)
coords2 <- cbind.data.frame(coords[i,1:2],values)
coords3 <- rbind(coords3, coords2)
}
# Convert MAT*10 from WorldClim into MAT in Celsius
coords3$MAT_C <- coords3$MAT_10/10
Edit: Thanks to advice from Dave2e, I now first make a list, put the intermediate results in the list, and bind the rows together at the end. I haven't timed this yet to see how much faster it is than my original solution. If anyone has further suggestions on how to improve the speed, I'm all ears! Here is the new version:
coordsList <- list()
for(i in seq_along(coordinates$lon_stm)) {
r <- getData("worldclim", var="bio", res=0.5, lon=coordinates[i,7], lat=coordinates[i,6]) # Download the tile containing the lat/lon
r <- r[[c(1,12)]] # Reduce the layers in the RasterStack to just the variables we want to look at (MAT*10 and MAP_mm)
names(r) <- c("MAT_10", "MAP_mm") # Rename the columns to something intelligible
points <- SpatialPoints(na.omit(coordinates[i, 7:6]), proj4string = r@crs) # give lon/lat to SpatialPoints
values <- extract(r,points)
coordsList[[i]] <- cbind.data.frame(coordinates[i,7:6],values)
}
coords_new <- dplyr::bind_rows(coordsList)
Edit2: I used system.time() to time the execution of both of the above approaches. When I did the timing, I had already downloaded all of the data, so the download time isn't included in my time estimates. My first approach took 45.01 minutes, and the revised approach took 44.15 minutes, so I'm not really seeing a substantial time savings by doing it the latter way. Still open to advice on how to revise the code so I can improve the speed of the operations!
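One hedged idea, not from the original thread: raster::extract() is vectorised over points, so the per-point loop re-extracts from the same tile repeatedly whenever coordinates share a tile. A minimal sketch of batching all points against one downloaded tile (assuming coords as defined above; points outside the tile simply come back as NA):
library(raster)
library(sp)
coords <- data.frame(Lon = c(-83.63, 149.12), Lat = c(10.39, -35.31))
# download one tile (here: the tile containing the first coordinate)
r <- getData("worldclim", var = "bio", res = 0.5, lon = coords[1, 1], lat = coords[1, 2])
r <- r[[c(1, 12)]]
names(r) <- c("MAT_10", "MAP_mm")
# extract all points in a single call instead of one call per point
pts <- SpatialPoints(coords, proj4string = crs(r))
vals <- extract(r, pts) # rows of NA mark points that fall outside this tile
coords_batch <- cbind(coords, vals)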
I was assigned the task of clipping a raster from a .nc file to the footprint of a .tif file.
edit (from comment):
I want to extract temperature info from the .nc because I need to check the yearly mean temperature of a specific region. To be comparable, the comparison has to occur over exactly the same area. The .nc file is larger than the previously checked area, so I need to "clip" it to the extent of a .tif I have. The .tif data is in 0|1 form; where it is 0 (or where the .tif is smaller than the .nc), the .nc data should be "clipped". In the end I want to keep the .nc data, but at the extent of the .tif, while still retaining its resolution and projection. (The .tif and .nc have different projections and pixel sizes.)
Now, ordinarily that wouldn't be a problem, as I could use raster::crop. This doesn't deal with different projections and different pixel sizes/resolutions, though. (I still used it to generate an approximation, but it is not precise enough for the final information, as can be seen in the code snippet below.) The obvious method to generate a more reliable dataset would be to first use a method like raster::projectRaster or sp::spTransform (added in an edit to the original question) to homogenize the datasets, but this approach takes too much time, as I have to do this for quite a few .nc files.
I was told the best method would be to generate a normalized matrix from the smaller raster "clip_frame" and then just multiply it with the "nc_to_clip" raster. Doing so should prevent any errors from map projections or other factors. This makes a lot of sense to me in theory, but I have no idea how to do it in practice. I would be very grateful for any kind of hint/code snippet or other help.
I have looked at similar problems on StackOverflow (and other sites) like:
convert matrix to raster in R
Convert raster into matrix with R
https://www.researchgate.net/post/Hi_Is_there_a_way_to_multiply_Raster_value_by_Raster_Latitude
As I am not even sure how to frame the question correctly, I might have overlooked an answer to this problem; if so, please point me there!
My (working) code so far, just to give you an idea of how I want to approach the topic (here using the crop function):
#library(ncdf4)
library(raster)
library(rgdal)
library(tidyverse)
nc_list<-list.files(pattern = ".*0.nc$") # list of .nc files containing raster and temperature information
#nc_to_clip <- lapply(nc_list, raster, varname="GST") # read in as raster
nc_to_clip <- raster("ABC.nc", varname = "GST")
clip_frame <- raster("XYZ.tif") # read in .tif for further use as frame
mean_temp_from_raster <- function(input_clip_raster, input_clip_frame) { # input_clip_raster = raster to clip, input_clip_frame = raster defining the target extent
  r2_coord <- rasterToPoints(input_clip_raster, spatial = TRUE) # step 1: extract coordinates
  map_clip <- crop(input_clip_raster, extent(input_clip_frame)) # use crop to cut the input_clip_raster (this being the function I have to extend on)
  temp <- raster::extract(map_clip, r2_coord@coords) # step 2: extract values at those coordinates
  temp_C <- temp * 0.01 - 273.15 # convert kelvin*100 to celsius
  temp_C <- na.omit(temp_C)
  return_list <- list(map_clip, mean(temp_C))
  return(return_list)
}
mean_tempC <- lapply(nc_to_clip, mean_temp_from_raster, clip_frame)
Thanks!
PS:
I don't have much experience working with .nc files and/or RasterLayers in R as I used to work with ArcGIS/Python (arcpy) for problems like this, which is not an option right now.
Perhaps something like this?
library(raster)
nc <- raster("ABC.nc", varname = "GST")
clip <- raster("XYZ.tif")
x <- as(extent(clip), "SpatialPolygons")
crs(x) <- crs(clip)
y <- sp::spTransform(x, crs(nc))
clipped <- crop(nc, y)
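Building on this, a hedged sketch of the multiply idea from the question: project the small 0/1 mask onto the cropped .nc grid once (cheap, because the mask is small) and multiply; the names follow the snippet above.
# bring the 0/1 mask onto the grid of the cropped .nc layer;
# nearest-neighbour resampling keeps the values strictly 0 or 1
mask_on_nc <- projectRaster(clip, clipped, method = "ngb")
# multiplying zeroes out every cell where the mask is 0
masked <- clipped * mask_on_nc
# alternatively, raster::mask() sets those cells to NA instead of 0,
# which avoids confusing masked-out cells with genuine zero values
masked_na <- mask(clipped, mask_on_nc, maskvalue = 0)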
I'm processing yearly multilayer netCDF files with daily precipitation data from CHIRPS. I have the files for the whole world, each about 1.2 GB in size. I need to calculate indices from the precipitation data for each cell in the raster for a specific region. In order to do that, I'm trying to crop the files to a rectangular shape over my area of interest using the raster R package.
This is the code I'm using, shown for the first file.
library(ncdf4)
library(raster)
library(rgdal)
# Crop extent
crop_extent <- as(raster::extent(79, 89, 25, 31), "SpatialPolygons")
proj4string(crop_extent) <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
# Set directory with original files
setwd("~/data")
# Read file
chirps81 <- stack("chirps-v2.0.1981.days_p05.nc")
chirps81crop <- crop(chirps81, crop_extent)
# Write cropped file back to different folder
setwd("~/croppeddata")
writeRaster(chirps81crop, "chirps81crop.nc", overwrite=TRUE)
For some reason, however, the layers lose their names while the file is written. In the original files and after cropping, the layers have names of the format "X1981.01.01", but after writing and re-reading the netCDF file with new_file <- stack("chirps81crop.nc"), the layer names change to 'X1' up to 'X365'. I think it should be fine to work with, assuming the order of the layers didn't get mixed up, but I don't understand what is happening to the layer names, and whether it happens because there is something wrong with the code.
It's the writeRaster() function that is losing the layer names, not the crop operation. It is possible to use lower-level ncdf4 functions to assign a numeric value (not a string, unfortunately) to each layer, which will then show up in the layer names after reading. Taking inspiration from the example here, I created some code that shows this.
library(ncdf4)
library(raster)
library(rgdal)
# Crop extent
crop_extent <- as(raster::extent(5.74, 5.75, 50.96, 50.97), "SpatialPolygons")
proj4string(crop_extent) <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
# make a sample file
r <- raster(system.file("external/test.grd", package="raster"))
r.latlon <- projectRaster(r, crs = proj4string(crop_extent))
writeRaster(x=r.latlon, filename = 'test.nc', format = 'CDF', overwrite=TRUE)
# read the sample as a 2 layer stack and crop it
test <- stack('test.nc', 'test.nc')
writeRaster(test, 'teststack.nc', overwrite=TRUE, format='CDF')
testcrop <- crop(test, crop_extent)
names(testcrop)
# [1] "test.1" "test.2"
# write the cropped file and make the zname equal to Layer
writeRaster(testcrop, 'testcrop.nc', overwrite=TRUE, format='CDF', zname='Layer')
# open the cdf file directly
nc <- nc_open('testcrop.nc', write = T)
# give the layers numbers starting at 11 so
# we can spot them easily
layers = 1:nlayers(testcrop) + 10
layers
# [1] 11 12
ncvar_put(nc, 'Layer', layers)
nc_close(nc)
newtestcrop <- stack('testcrop.nc')
names(newtestcrop)
# [1] "X11" "X12"
nc <- nc_open('testcrop.nc', write = F)
layers = ncvar_get(nc, 'Layer')
layers
# [1] 11 12
nc_close(nc)
So it is possible to get names with numbers under your control when writing the raster, but I don't know enough about your environment to determine if this will help since it might be tricky to map the names you need to a single unambiguous number.
I hope you don't mind me offering a non-R solution, but this task is much easier from the command line using CDO:
cdo sellonlatbox,79,89,25,31 chirps-v2.0.1981.days_p05.nc cropped_file.nc
Which indices did you want to calculate? I suspect it is possible to calculate those quickly and easily with CDO functions too...
I'm still new to R and don't know how to create a loop to make my work process more efficient.
I have a Digital Elevation Model (the raster Barrow_5m.tif) and a shapefile each for lakes and buffers, with 10 IDs in their attribute tables.
In the script below I created a new raster file for all values of the lake and the buffer shapefile, using the data from the DEM raster. This works fine.
setwd("...")
Barrow_5m <- raster("Barrow_5m.tif")
Barrow_DTLB <- st_read("Barrow_DTLB.shp")
Barrow_DTLB_Buffer <- st_read("Barrow_DTLB_BufferOUT.shp")
Barrow_lake <- crop(Barrow_5m, extent(Barrow_DTLB))
raster_lake <- rasterize(Barrow_DTLB, Barrow_lake, mask = TRUE)
Barrow_buffer <- crop(Barrow_5m, extent(Barrow_DTLB_Buffer))
raster_buffer <- rasterize(Barrow_DTLB_Buffer, Barrow_buffer, mask = TRUE)
writeRaster(raster_lake, "raster_lake.tif")
writeRaster(raster_buffer, "raster_buffer.tif")
But now I want a raster file for every ID of the lake and the buffer shapefile separately, so 2x10 files.
I thought it's best to write a loop for this, but my skills aren't sufficient for that yet.
Other questions didn't bring me to a solution so far, and my own attempts haven't either.
Alternatively, I could take the end-product tif from the script above and split it into one file per ID.
I want to write the loop rather than do it by hand for all the IDs of the shapefiles, because afterwards I am going to do the same with an even bigger shapefile with more values.
I found a solution now, by extracting the data by ID.
It creates a large list with 11 elements holding all values for each ID, which is sufficient for my further work. You can also directly compute the mean, max, min, etc. of each element (so of each ID).
k <- Barrow_DTLB$ID # k = vector of feature IDs
LakesA <- extract(raster_lakeA, Barrow_DTLB[k, ])
LakesA_mean <- extract(raster_lakeA, Barrow_DTLB[k, ], fun=mean)
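A small hedged addition: naming the list elements by ID makes later lookups easier (this assumes extract() returns one list element per feature, in the order of the rows):
names(LakesA) <- Barrow_DTLB$ID # label each element with its lake ID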
Maybe this solution is also helpful for some who already viewed the question.
I think this should work:
for (i in unique(raster_lake)){
r <- raster_lake
r[!(values(r) == i)] <- NA
r <- trim(r)
writeRaster(r, paste0("raster_lake_", i, ".tif"))
}
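Since the question asks for both the lake and the buffer rasters (2x10 files), a hedged variant wrapping the same idea over both layers (assuming both rasters hold the shapefile IDs as cell values, as above):
rasters <- list(lake = raster_lake, buffer = raster_buffer)
for (nm in names(rasters)) {
  r_all <- rasters[[nm]]
  for (i in unique(r_all)) { # unique() on a RasterLayer returns its distinct cell values
    r <- r_all
    r[r != i] <- NA # keep only the cells belonging to this ID
    r <- trim(r)    # drop the all-NA margins
    writeRaster(r, paste0("raster_", nm, "_", i, ".tif"), overwrite = TRUE)
  }
}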
I want to convert two .shp files into one database that would allow me to draw the maps together.
Also, is there a way to convert .shp files into .csv files? I want to be able to personalize and add some data, which is easier for me in a .csv format. What I have in mind is to overlay yield data and precipitation data on the maps.
Here are the shapefiles for Morocco and Western Sahara.
Code to plot the two files:
# This is code for mapping of CGE_Morocco results
# Loading administrative coordinates for Morocco maps
library(sp)
library(maptools)
library(mapdata)
# Loading shape files
Mor <- readShapeSpatial("F:/Purdue University/RA_Position/PhD_ResearchandDissert/PhD_Draft/Country-CGE/MAR_adm1.shp")
Sah <- readShapeSpatial("F:/Purdue University/RA_Position/PhD_ResearchandDissert/PhD_Draft/Country-CGE/ESH_adm1.shp")
# Plotting the maps (raw)
png("Morocco.png")
Morocco <- readShapePoly("F:/Purdue University/RA_Position/PhD_ResearchandDissert/PhD_Draft/Country-CGE/MAR_adm1.shp")
plot(Morocco)
dev.off()
png("WesternSahara.png")
WesternSahara <- readShapePoly("F:/Purdue University/RA_Position/PhD_ResearchandDissert/PhD_Draft/Country-CGE/ESH_adm1.shp")
plot(WesternSahara)
dev.off()
After looking into suggestions from @AriBFriedman and @PaulHiemstra, and subsequently figuring out how to merge .shp files, I have managed to produce the following map using the following code and data (for the .shp data, cf. the links above).
code:
# Merging Mor and Sah .shp files into one .shp file
library(ggplot2) # needed for fortify() further down
MoroccoData <- rbind(Mor@data, Sah@data) # First, 'stack' the attribute list rows using rbind()
MoroccoPolys <- c(Mor@polygons, Sah@polygons) # Next, combine the two polygon lists into a single list using c()
summary(MoroccoData)
summary(MoroccoPolys)
offset <- length(MoroccoPolys) # Next, generate a new polygon ID for the new SpatialPolygonDataFrame object
for (i in 1:offset)
{
  sNew <- as.character(i)
  MoroccoPolys[[i]]@ID <- sNew
}
ID <- c(as.character(1:length(MoroccoPolys))) # Create an identical ID field and append it to the merged Data component
MoroccoDataWithID <- cbind(ID,MoroccoData)
MoroccoPolysSP <- SpatialPolygons(MoroccoPolys,proj4string=CRS(proj4string(Sah))) # Promote the merged list to a SpatialPolygons data object
Morocco <- SpatialPolygonsDataFrame(MoroccoPolysSP,data = MoroccoDataWithID,match.ID = FALSE) # Combine the merged Data and Polygon components into a new SpatialPolygonsDataFrame.
Morocco@data$id <- rownames(Morocco@data)
Morocco.fort <- fortify(Morocco, region='id')
Morocco.fort <- Morocco.fort[order(Morocco.fort$order), ]
MoroccoMap <- ggplot(data=Morocco.fort, aes(long, lat, group=group)) +
geom_polygon(colour='black',fill='white') +
theme_bw()
Results:
New Question:
1- How to eliminate the boundary data that cuts through the map in half?
2- How to combine different regions within a .shp file?
Thank you all.
P.S.: The community at stackoverflow.com is wonderful and very helpful, especially toward beginners like me :) Just thought of emphasizing that.
Once you have loaded your shapefiles into Spatial{Lines/Polygons}DataFrames (classes from the sp package), you can use the fortify generic function to transform them to a flat data.frame format. The specific methods for the fortify generic are included in the ggplot2 package, so you'll need to load that first. A code example:
library(ggplot2)
polygon_dataframe = fortify(polygon_spdf)
where polygon_spdf is a SpatialPolygonsDataFrame. A similar approach works for SpatialLinesDataFrames.
The difference between my solution and that of @AriBFriedman is that mine includes the x and y coordinates of the polygons/lines, in addition to the data associated with those polygons/lines. I really like visualising my spatial data with the ggplot2 package.
Once you have your data in a normal data.frame, you can simply use write.csv to generate a csv file on disk.
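A minimal usage sketch (the filename is hypothetical):
# write the fortified polygons, one row per vertex, to disk
write.csv(polygon_dataframe, file = 'morocco_polygons.csv', row.names = FALSE)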
I think you mean you want the associated data.frame from each?
If so, it can be accessed with the @ slot-access operator. The slot is called data:
write.csv(WesternSahara@data, file = "/home/wherever/myWesternSahara.csv")
Then when you read it back in with read.csv, you can try assigning:
myEdits <- read.csv("/home/wherever/myWesternSahara_modified.csv")
WesternSahara@data <- myEdits
You may need to do some massaging of row names and so forth to get it to accept the new data.frame as valid. I'd probably try to merge the existing data.frame with a CSV you read into R, rather than making edits destructively (a hedged sketch of that merge follows below).
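A hedged sketch of that non-destructive merge, assuming a shared key column (NAME_1 here is hypothetical; use whatever ID column the shapefile actually has):
# merge the edits onto the existing attribute table instead of replacing it wholesale
myEdits <- read.csv("/home/wherever/myWesternSahara_modified.csv")
merged <- merge(WesternSahara@data, myEdits, by = "NAME_1", all.x = TRUE)
# restore the original row order and row names before assigning back
merged <- merged[match(WesternSahara@data$NAME_1, merged$NAME_1), ]
rownames(merged) <- rownames(WesternSahara@data)
WesternSahara@data <- merged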