Finding the nearest distance between two SpatialPointsDataFrames using gDistance (rgeos)? - r

I have two separate but related questions.
First, I would like to determine the distance to the nearest construction site (construction_layer.csv) for every data point within the subset_original_data.csv file. I am trying to use the gDistance() function to calculate the nearest neighbor, but I am open to other ideas as well.
I want to append my subset_original_data.csv dataframe with this new vector of nearest neighbor distances from the construction_layer.csv. That is, for every row of my subset_original_data.csv dataframe, I want the minimum distance to the nearest construction site.
The second goal is to determine the nearest distance from each subset_original_data.csv row to a freeway shapefile (fwy.shp). I would also like to append this new vector back onto the subset_original.csv dataframe.
I have successfully converted the construction_layer.csv and subset_original_data.csv into SpatialPointsDataFrames. I have also converted the fwy.shp file into a SpatialLinesDataFrame by reading in the shapefile with the readOGR() function. I am not sure where to go next. Your input is greatly appreciated!
~ $ spacedSparking
Here's my data:
construction_layer.csv, fwy.shp, subset_original_data.csv
Here's my code:
#requiring necessary packages:
library(rgeos)
library(sp)
library(rgdal)
#reading in the files:
mydata <- read.csv("subset_original_data.csv", header = T)
con <- read.csv("construction_layer.csv", header = T)
fwy <- readOGR(dsn = "fwy.shp")
#for those who prefer not to download any files:
data.lat <- c(45.53244, 45.53244, 45.53244, 45.53244, 45.53245, 45.53246)
data.lon <- c(-122.7034, -122.7034, -122.7034, -122.7033, -122.7033, -122.7032)
data.black.carbon <- c(187, 980, 466, 826, 637, 758)
mydata <- data.frame(data.lat, data.lon, data.black.carbon)
con.lat <- c(45.53287, 45.53293, 45.53299, 45.53259, 45.53263, 45.53263)
con.lon <- c(-122.6972, -122.6963, -122.6952, -122.6929, -122.6918, -122.6918)
con <- data.frame(con.lat, con.lon)
#I am not sure how to include the `fwy.shp` in a similar way,
#so don't worry about trying to solve that problem if you would prefer not to download the file.
#convert each file to a SpatialPointsDataFrame or SpatialLinesDataFrame:
mydata.coords <- data.frame(lon = mydata[,2], lat = mydata[,1])
mydata.sp <- sp::SpatialPointsDataFrame(mydata.coords, data = data.frame(BlackCarbon = mydata[,3])) #appending a vector containing air pollution data
con.coords <- data.frame(lon = con[,2], lat = con[,1])
con.sp <- sp::SpatialPointsDataFrame(con.coords, data = con)
str(fwy) #already a SpatialLinesDataFrame
#Calculate the minimum distance (in meters) from each observation in mydata.sp to con.sp, and from each observation in mydata.sp to the fwy object.
#Create a new dataframe appending these two nearest distance vectors back to the original mydata file.
#Desired output:
head(mydata.appended)
  LATITUDE LONGITUDE BC6. NEAREST_CON (m) NEAREST_FWY (m)
1 45.53244 -122.7034  187             ???             ???
2 45.53244 -122.7034  980             ???             ???
3 45.53244 -122.7034  466             ???             ???
4 45.53244 -122.7033  826             ???             ???
5 45.53245 -122.7033  637             ???             ???
6 45.53246 -122.7032  758             ???             ???
EDIT:
SOLUTION:
When in doubt, ask a friend who is an R wizard! He even made a map.
library(rgeos)
library(rgdal)
library(leaflet)
library(magrittr)
#Define Projections
wgs84<-CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0")
utm10n<-CRS("+proj=utm +zone=10 +ellps=GRS80 +datum=NAD83 +units=m +no_defs +towgs84=0,0,0")
#creating example black carbon data by hand:
lat <- c(45.5324, 45.5325, 45.53159, 45.5321, 45.53103, 45.53123)
lon <- c(-122.6972, -122.6963, -122.6951, -122.6919, -122.6878, -122.6908)
BlackCarbon <- c(187, 980, 466, 826, 637, 758)
bc.coords <- data.frame(lat, lon, BlackCarbon)
bc<-SpatialPointsDataFrame(data.frame(x=lon,y =lat),data=data.frame(BlackCarbon),proj4string = wgs84)
# Project into something - Decimal degrees are no fun to work with when measuring distance!
bcProj<-spTransform(bc,utm10n)
#creating example construction data layer:
con.lat <- c(45.53287, 45.53293, 45.53299, 45.53259, 45.53263, 45.53263)
con.lon <- c(-122.6972, -122.6963, -122.6952, -122.6929, -122.6918, -122.6910)
con.coords <- data.frame(con.lat, con.lon)
con<-SpatialPointsDataFrame(data.frame(x=con.lon,y =con.lat),data=data.frame(ID=1:6),proj4string = wgs84)
conProj<-spTransform(con,utm10n)
#All pairwise distances at once (black carbon points as columns, construction sites as rows)
dist<-gDistance(bcProj,conProj,byid=T)
min_constructionDistance<-apply(dist, 2, min)
# make a new column in the WGS84 data, set it to the distance
# The distance vector will stay in order, so just stick it on!
bc@data$Nearest_Con<-min_constructionDistance
bc@data$Near_ID<-as.vector(apply(dist, 2, function(x) which(x==min(x))))
#Map the original WGS84 data
pop1<-paste0("<b>Distance</b>: ",round(bc$Nearest_Con,2),"<br><b>Near ID</b>: ",bc$Near_ID)
pop2<-paste0("<b>ID</b>: ",con$ID)
m<-leaflet()%>%
addTiles()%>%
addCircleMarkers(data=bc,radius=8,fillColor = 'red',fillOpacity=0.8,weight=1,color='black',popup=pop1)%>%
addCircleMarkers(data=con,radius=8,fillColor = 'blue',fillOpacity=0.8,weight=1,color='black',popup=pop2)
m
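The solution above only covers the construction layer. For the freeway part of the question, the same approach should carry over: assuming fwy was read with readOGR() and is in WGS84, project it to the same UTM zone and reuse gDistance() against the lines (a sketch, not tested against the actual shapefile; the Nearest_Fwy column name is just illustrative):
fwyProj <- spTransform(fwy, utm10n)
dist_fwy <- gDistance(bcProj, fwyProj, byid = TRUE) # rows = freeway segments, columns = points
bc@data$Nearest_Fwy <- apply(dist_fwy, 2, min)      # minimum distance (m) to the freeway per point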

You can use a haversine distance function and functional programming to achieve the desired result.
library(geosphere)
find_min_dist <- function(site, sites) {
min(distHaversine(site, sites))
}
#X is the data id, split into a list so you can iterate through each site point
data <- split(mydata[ , 3:2], mydata$X)
sapply(data, find_min_dist, sites = con.coords)
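For the freeway distances, geosphere also offers dist2Line(), which measures from points to a SpatialLines object. A sketch, assuming fwy is the SpatialLinesDataFrame read with readOGR() and still in unprojected long/lat; it uses the hand-made example columns data.lon/data.lat, so adjust the column names for the CSV version (the NEAREST_FWY column name is just illustrative):
# dist2Line() returns one row per point, with the distance in metres to the nearest line
fwy_dist <- geosphere::dist2Line(as.matrix(mydata[, c("data.lon", "data.lat")]), fwy)
mydata$NEAREST_FWY <- fwy_dist[, "distance"]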

Related

Error in checkranin(tlim, tt, "tlim") : 'tlim[1]' must be < 'tlim[2]'

So I'm currently trying to perform a spatio-temporal kernel density estimation, where I'm able to see the kernel density distribution change over time. This was attempted using the sparr package. I'm running the following code:
smell_Cases <- subset(newdata_proj, smell == '1',
select=c(x,y, smell))
smell_controls <- subset(newdata_proj, smell == '0',
select=c(x,y, smell))
smell_ppp <- list()
smell_ppp$cases<-ppp((smell_Cases$x), smell_Cases$y, marks=as_vector(as.integer(smell_Cases$smell)),
window=as.owin(as_Spatial(boundary)))
smell_ppp$controls<-ppp((smell_controls$x), smell_controls$y,
window=as.owin(as_Spatial(boundary)))
smell_ppp_Cases <- smell_ppp$cases
hlam <- LIK.spattemp(smell_ppp_Cases)
Then I get the following error:
Error in checkranin(tlim, tt, "tlim") : 'tlim[1]' must be < 'tlim[2]'
The error is saying that the temporal window of the data you've supplied is invalid. As per the documentation at help(LIK.spattemp) (see the entries for the 'pp' and 'tt' arguments), if you do not supply the times of each observation (which you haven't done in the above function call), the function will attempt to use the 'marks' of your data object. Are the marks of your data object the observation times? At any rate, we need a MWE (minimal working example) to help you further.
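For illustration only, a minimal sketch of a call that supplies the observation times explicitly via the tt argument (times is a hypothetical numeric vector with one entry per point in smell_ppp_Cases; with tt given, tlim is derived from its range, so tlim[1] < tlim[2] holds as long as the times are not all identical):
library(sparr)
hlam <- LIK.spattemp(smell_ppp_Cases, tt = times)  # times: hypothetical vector of observation times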
@Tilman Davies
It is not very long:
library(readr)     # read_csv
library(sf)        # st_as_sf, st_transform, st_read, as_Spatial
library(spatstat)  # ppp, as.owin
library(sparr)     # LIK.spattemp
library(purrr)     # as_vector
library(magrittr)  # %>%
smell <- read_csv('smell_report_2020_2021.csv') %>%
subset(., select= c("skewed_longitude", "skewed_latitude", "smell_category")) %>%
st_as_sf(.,coords=c("skewed_longitude", "skewed_latitude"))
st_crs(smell) = 4326
#Transform the coordinates to a projected CRS (EPSG:2272, NAD83 / Pennsylvania South):
newdata_proj<-st_transform(smell, crs=2272)%>%
st_coordinates(.)%>%
cbind(.,smell$smell_category)%>%
as.data.frame(.)
colnames(newdata_proj)<-c("x", "y", "smell")
#Read the Allegheny County zip code boundary shapefile:
boundary<-st_read("Data/Allegheny_County_Zip_Code_Boundaries.shp")
plot(boundary)
# boundary.sf <- st_transform(boundary, "+proj=utm +zone=19 +ellps=GRS80 +datum=NAD83 +units=m +no_defs")
# boundary
# Dissolve boundaries to just the outline of the county
plot(boundary$geom)
plot(st_union(boundary$geom))
boundary <- st_union(boundary$geom)
plot(boundary)
#The spatial relative risk/density ratio function, which we are using here, takes a point pattern (ppp) object
#with dichotomous factor-valued marks, which distinguish cases and controls.
#Therefore, we need to wrangle our data into this format first:
pitts_ppp<-ppp(newdata_proj$x, newdata_proj$y, marks=as.factor(newdata_proj$smell),
window=as.owin(as_Spatial(boundary)))
smell_Cases <- subset(newdata_proj, smell == '1',
select=c(x,y, smell))
smell_controls <- subset(newdata_proj, smell == '0',
select=c(x,y, smell))
smell_ppp <- list()
smell_ppp$cases<-ppp(smell_Cases$x, smell_Cases$y, marks=as_vector(as.integer(smell_Cases$smell)),
window=as.owin(as_Spatial(boundary)))
smell_ppp$controls<-ppp(smell_controls$x, smell_controls$y,
window=as.owin(as_Spatial(boundary)))
smell_ppp_Cases <- smell_ppp$cases
hlam <- LIK.spattemp(smell_ppp_Cases)
hlam <- LIK.spattemp(fmd_case)

How to convert Sentinel-3 .nc-file into .tiff-file?

Regarding the conversion of .nc files into .tiff files, I encounter the problem of losing the geoinformation of my pixels. I know that other users have experienced the same problem and tried to solve it via Kotlin but failed; I would prefer a solution using R. See here for the Kotlin approach: https://gis.stackexchange.com/questions/259700/converting-sentinel-3-data-netcdf-to-geotiff
I downloaded freely available Sentinel-3 data from ESA (https://scihub.copernicus.eu/dhus/#/home). This data unfortunately comes in the .nc format, so I want to convert it into the .tiff format. I have already tried various approaches, but failed. What I have tried so far:
data_source <- 'D:/user_1/01_test_data/S3A_SL_1_RBT____20180708T093240_20180708T093540_20180709T141944_0179_033_150_2880_LN2_O_NT_003.SEN3/F1_BT_in.nc'
# define path to .nc-file
data_output <- 'D:/user_1/01_test_data/S3A_SL_1_RBT____20180708T093240_20180708T093540_20180709T141944_0179_033_150_2880_LN2_O_NT_003.SEN3/test.tif'
# define path of output .tiff-file
###################################################
# 1.) use gdal_translate via Windows cmd-line in R
# see here: https://stackoverflow.com/questions/52046282/convert-netcdf-nc-to-geotiff
system(command = paste('gdal_translate -of GTiff -sds -a_srs epsg:4326', data_source, data_output))
# hand over character string to Windows cmd-line to use gdal_translate
###################################################
# 2.) use the raster-package
# see here: https://www.researchgate.net/post/How_to_convert_a_NetCDF4_file_to_GeoTIFF_using_R2
epsg4326 <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
# proj4-code
# https://spatialreference.org/ref/epsg/wgs-84/proj4/
specific_band <- raster(data_source)
crs(specific_band) <- epsg4326
writeRaster(specific_band, filename = data_output)
# Both approaches work, I can convert the files from the .nc format into the .tiff format, but I **lose the geoinformation for the pixels** and just get pixel coordinates instead of long/lat values.
I really appreciate any solutions that keep the geoinformation for the pixels!
Thanks a lot in advance, ExploreR
As @j08lue points out,
The product format for Sentinel 3 products is horrible. Yes, the data
values are stored in netCDF, but the coordinate axes are in separate
files and it is all just a bunch of files and metadata.
I did not find any documentation (I assume it must exist), but it seems you can get the data like this:
library(ncdf4)
# coordinates
nc <- nc_open("geodetic_in.nc")
lon <- ncvar_get(nc, "longitude_in")
lat <- ncvar_get(nc, "latitude_in")
# including elevation for sanity check only
elv <- ncvar_get(nc, "elevation_in")
nc_close(nc)
# the values of interest
nc <- nc_open("F1_BT_in.nc")
F1_BT <- ncvar_get(nc, "F1_BT_in")
nc_close(nc)
# combine
d <- cbind(as.vector(lon), as.vector(lat), as.vector(elv), as.vector(F1_BT))
Plot a sample of the locations. Note that the raster is rotated
plot(d[sample(nrow(d), 25000),1:2], cex=.1)
I would need to investigate a bit more to see how to write a rotated raster.
For now, a not recommended shortcut could be to rasterize to a non-rotated raster
e <- extent(as.vector(apply(d[,1:2],2, range))) + 1/120
r <- raster(ext=e, res=1/30)
#elev <- rasterize(d[,1:2], r, d[,3], mean)
F1_BT <- rasterize(d[,1:2], r, d[,4], mean, filename="")
plot(F1_BT)
So that's what I have done so far. Unfortunately the raster is not simply rotated by 180 degrees, but distorted in some other way...
# (1.) first part of the code adapted to Robert Hijmans' approach (see the code in the answer above)
nc_geodetic <- nc_open(paste0(wd, "/01_test_data/sentinel_3/geodetic_in.nc"))
nc_geodetic_lon <- ncvar_get(nc_geodetic, "longitude_in")
nc_geodetic_lat <- ncvar_get(nc_geodetic, "latitude_in")
nc_geodetic_elv <- ncvar_get(nc_geodetic, "elevation_in")
nc_close(nc_geodetic)
# to get the longitude, latitude and elevation information
F1_BT_in_vars <- nc_open(paste0(wd, "/01_test_data/sentinel_3/F1_BT_in.nc"))
F1_BT_in <- ncvar_get(F1_BT_in_vars, "F1_BT_in")
nc_close(F1_BT_in_vars)
# extract the band information
###############################################################################
# (2.) The following part of the code is adapted from @Matthew Lundberg's rotation code, see https://stackoverflow.com/questions/16496210/rotate-a-matrix-in-r
rotate_fkt <- function(x) t(apply(x, 2, rev))
# create rotation-function
F1_BT_in_rot180 <- rotate_fkt(rotate_fkt(F1_BT_in))
# rotate raster by 180 degrees
test_F1_BT_in <- raster(F1_BT_in_rot180)
# convert matrix to raster
###############################################################################
# (3.) extract corner coordinates and transform with gdal
writeRaster(test_F1_BT_in, filename = paste0(wd, "/01_test_data/sentinel_3/test_flip.tif"), overwrite = TRUE)
# write the raster layer
data_source_flip <- '"D:/unknown_user/X_processing/01_test_data/sentinel_3/test_flip.tif"'
data_tmp_flip <- '"D:/unknown_user/X_processing/01_test_data/temp/test_flip.tif"'
data_out_flip <- '"D:/unknown_user/X_processing/01_test_data/sentinel_3/test_flip_ref.tif"'
# define input, temporary output and output for gdal-transformation
nrow_nc_mtx <- nrow(nc_geodetic_lon)
ncol_nc_mtx <- ncol(nc_geodetic_lon)
# investigate on matrix size of the image
xy_coord_char1 <- as.character(paste("1", "1", nc_geodetic_lon[1, 1], nc_geodetic_lat[1, 1]))
xy_coord_char2 <- as.character(paste(nrow_nc_mtx, "1", nc_geodetic_lon[nrow_nc_mtx, 1], nc_geodetic_lat[nrow_nc_mtx, 1]))
xy_coord_char3 <- as.character(paste(nrow_nc_mtx, ncol_nc_mtx, nc_geodetic_lon[nrow_nc_mtx, ncol_nc_mtx], nc_geodetic_lat[nrow_nc_mtx, ncol_nc_mtx]))
xy_coord_char4 <- as.character(paste("1", ncol_nc_mtx, nc_geodetic_lon[1, ncol_nc_mtx], nc_geodetic_lat[1, ncol_nc_mtx]))
# extract the corner coordinates from the image
system(command = paste('gdal_translate -of GTiff -gcp ', xy_coord_char1, ' -gcp ', xy_coord_char2, ' -gcp ', xy_coord_char3, ' -gcp ', xy_coord_char4, data_source_flip, data_tmp_flip))
system(command = paste('gdalwarp -r near -order 1 -co COMPRESS=NONE ', data_tmp_flip, data_out_flip))
# run gdal-transformation

Why am I getting NA when I try to extract values from a tif file?

I have a tif file from WORLDCLIM and I need to extract values related to temperature.
Sample code:
t_min_jan2 <-raster::brick("wc2.0_30s_tmin_01.tif")
t_min_fev <-raster::brick("wc2.0_30s_tmin_02.tif")
t_min_mar <-raster::brick("wc2.0_30s_tmin_03.tif")
t_min_abr <- raster::brick("wc2.0_30s_tmin_04.tif")
t_min_maio <- raster::brick("wc2.0_30s_tmin_05.tif")
t_min_jun <- raster::brick("wc2.0_30s_tmin_06.tif")
t_min_jul <-raster::brick("wc2.0_30s_tmin_07.tif")
t_min_ago <-raster::brick("wc2.0_30s_tmin_08.tif")
t_min_set <-raster::brick("wc2.0_30s_tmin_09.tif")
t_min_out <- raster::brick("wc2.0_30s_tmin_10.tif")
t_min_nov <-raster::brick("wc2.0_30s_tmin_11.tif")
t_min_dez <-raster::brick("wc2.0_30s_tmin_12.tif")
t <- stack(t_min_jan2, t_min_fev, t_min_mar, t_min_abr, t_min_maio, t_min_jun, t_min_jul, t_min_ago, t_min_set, t_min_out, t_min_nov, t_min_dez)
plot(t)
newt <- c(-10, 5, 35, 45)
tmin1 <- crop(t, newt)
plot(tmin1)
With this code I get the map I want. I have a file with coordinates (local) and I need to extract temperature values at these coordinates:
xy<-local[,c("Longitude" ,"Latitude")]
spdf <- SpatialPointsDataFrame(coords = xy, data = local,
                               proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
value<-extract(tmin1,spdf)
value
But when I run the code I get NA instead of getting the average temperatures. Maybe I'm not writing the code correctly. Can you spot any mistakes?
A simpler way to put the data together:
library(raster)
# get all filenames
ff <- paste0(sprintf("wc2.0_30s_tmin_%02d", 1:12), ".tif")
wtmin <- stack(ff)
tmin <- crop(wtmin, c(-10, 5, 35, 45))
Start with checking if the points are on the raster (they probably are not)
xy <- local[,c("Longitude" ,"Latitude")]
plot(tmin[[1]])
points(xy)
If they are on top, this should work
value <- extract(tmin, xy)
If they are not, and you can't figure out why, show us what is returned by
tmin
extent(xy)
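If those do not overlap, the usual suspects are swapped Longitude/Latitude columns or points lying outside the crop window c(-10, 5, 35, 45). A quick comparison (a sketch):
extent(tmin)         # raster extent after cropping
apply(xy, 2, range)  # min/max of the point coordinates; Longitude should come first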

How to average rasters within the for-loop that creates the rasters? R

I have several directories with 700+ binary-encoded rasters, and I want to average the output rasters per directory. Currently I create the rasters one by one in a for loop, write them to disk, then load the newly created rasters back into R and take the sum to obtain the monthly rainfall total.
However, since I don't need the individual rasters, only the average raster, I have a hunch that I could do this all within one loop and not save the intermediate rasters, just the output average raster, but I am coming up short on how to program this in R.
setwd("~/Desktop/CMORPH/Levant-Clip/200001")
dir.output <- '~/Desktop/CMORPH/Levant-Clip/200001' ### change as needed to give output location
path <- list.files("~/Desktop/CMORPH/MonthlyCMORPH/200001",pattern="*.bz2", full.names=T, recursive=T)
for (i in 1:length(path)) {
files = bzfile(path[i], "rb")
data <- readBin(files,what="double",endian = "little", n = 4948*1649, size=4) #Mode of the vector to be read
data[data == -999] <- NA #convert missing data from -999 (CMORPH notation) to NAs
y<-matrix((data=data), ncol=1649, nrow=4948)
r <- raster(y)
e <- extent(-180, 180, -90, 83.6236) ### choose the extent based on the netcdf file info
tr <- t(r) #transpose
re <- setExtent(tr,extent(e)) ### set the extent to the raster
ry <- flip(re, direction = 'y')
projection(ry) <- "+proj=longlat +datum=WGS84 +ellps=WGS84"
C_Lev <- crop(ry, Levant) ### Clip to Levant
M_C_Lev<-mask(C_Lev, Levant)
writeRaster(M_C_Lev, paste(dir.output, basename(path[i]), sep = ''), format = 'GTiff', overwrite = T) ###the basename allows the file to be named the same as the original
}
#
raspath <- list.files ('~/Desktop/CMORPH/Levant-Clip/200001',pattern="*.tif", full.names=T, recursive=T)
rasstk <- stack(raspath)
sum200001<-sum(rasstk)
writeRaster(sum200001, paste(dir.output, basename(path[i]), sep = ''), format = 'GTiff', overwrite = T) ###the basename allows the file to be named the same as the original
Currently, this code takes about 75 minutes to execute, and I have about 120 more directories to go, so I am looking for faster solutions.
Thank you for any and all comments and input. Best, Evan
Elaborating on my previous comment, you could try:
setwd("~/Desktop/CMORPH/Levant-Clip/200001")
dir.output <- '~/Desktop/CMORPH/Levant-Clip/200001' ### change as needed to give output location
path <- list.files("~/Desktop/CMORPH/MonthlyCMORPH/200001",pattern="*.bz2", full.names=T, recursive=T)
raster_list = list()
for (i in 1:length(path)) {
files = bzfile(path[i], "rb")
data <- readBin(files,what="double",endian = "little", n = 4948*1649, size=4) #Mode of the vector to be read
data[data == -999] <- NA #convert missing data from -999 (CMORPH notation) to NAs
y<-matrix((data=data), ncol=1649, nrow=4948)
r <- raster(y)
if (i == 1) {
e <- extent(-180, 180, -90, 83.6236) ### choose the extent based on the netcdf file info
}
tr <- t(r) #transpose
re <- setExtent(tr,extent(e)) ### set the extent to the raster
ry <- flip(re, direction = 'y')
projection(ry) <- "+proj=longlat +datum=WGS84 +ellps=WGS84"
C_Lev <- crop(ry, Levant) ### Clip to Levant
M_C_Lev<-mask(C_Lev, Levant)
raster_list[[i]] = M_C_Lev
}
#
rasstk <- stack(raster_list, quick = TRUE) # OR rasstk <- brick(raster_list, quick = TRUE)
avg200001<-mean(rasstk)
writeRaster(avg200001, paste(dir.output, basename(path[i]), sep = ''), format = 'GTiff', overwrite = T) ###the basename allows the file to be named the same as the original
Using the "quick" options in stack should definitely speed-up things, in particular if you have many rasters.
Another possibility is to first compute the average, and then perform the "spatial proceesing". For example:
for (i in 1:length(path)) {
files = bzfile(path[i], "rb")
data <- readBin(files,what="double",endian = "little", n = 4948*1649, size=4) #Mode of the vector to be read
data[data == -999] <- NA #convert missing data from -999 (CMORPH notation) to NAs
if (i == 1) {
totdata <- data
num_nonNA <- as.numeric(!is.na(data))
} else {
totdata = rowSums(cbind(totdata,data), na.rm = TRUE)
# We have to count the number of "valid" entries so that the average is correct !
num_nonNA = rowSums(cbind(num_nonNA,as.numeric(!is.na(data))),na.rm = TRUE)
}
}
avg_data = totdata/num_nonNA # Compute the average
# Now do the "spatial" processing
y<-matrix(avg_data, ncol=1649, nrow=4948)
r <- raster(y)
e <- extent(-180, 180, -90, 83.6236) ### choose the extent based on the netcdf file info
tr <- t(r) #transpose
re <- setExtent(tr,extent(e)) ### set the extent to the raster
ry <- flip(re, direction = 'y')
projection(ry) <- "+proj=longlat +datum=WGS84 +ellps=WGS84"
C_Lev <- crop(ry, Levant) ### Clip to Levant
M_C_Lev<-mask(C_Lev, Levant)
writeRaster(M_C_Lev, paste(dir.output, basename(path[i]), sep = ''), format = 'GTiff', overwrite = T) ###the basename allows the file to be named the same as the original
This could be faster or slower, depending on how much you are cropping the original data.
HTH,
Lorenzo
I'm adding another answer to clarify and simplify things a bit, also in relation to the comments in chat. The code below should do what you ask: that is, cycle over files, read the "data", compute the sum over all files, and convert it to a raster with specified dimensions.
Note that for testing purposes I substituted your cycle over file names with a simple 1-to-720 cycle, and file reading with the creation of arrays of the same length as yours, filled with values from 1 to 4 and some NAs!
totdata <- array(dim = 4948*1649) # Define Dummy array
for (i in 1:720) {
message("Working on file: ", i)
data <- array(rep(c(1,2,3,4),4948*1649/4), dim = 4948*1649) # Create a "fake" 4948*1649 array each time to simulate data reading
data[1:1000] <- -999 # Set some values to the missing-data flag (-999)
data[data == -999] <- NA #convert missing data from -999
totdata <- rowSums(cbind(totdata, data), na.rm = T) # Let's sum the current array with the cumulative sum so far
}
# Now reshape to matrix and convertt to raster, etc.
y <- matrix(totdata, ncol=1649, nrow=4948)
r <- raster(y)
e <- extent(-180, 180, -90, 83.6236) ### choose the extent based on the netcdf file info
tr <- t(r) #transpose
re <- setExtent(tr,e) ### set the extent to the raster
ry <- flip(re, direction = 'y')
projection(ry) <- "+proj=longlat +datum=WGS84 +ellps=WGS84"
This generates a "proper" raster:
> ry
class : RasterLayer
dimensions : 1649, 4948, 8159252 (nrow, ncol, ncell)
resolution : 0.07275667, 0.1052902 (x, y)
extent : -180, 180, -90, 83.6236 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
data source : in memory
names : layer
values : 0, 2880 (min, max)
containing the sum of the different arrays. You can see that the max value is 720 * 4 = 2880. (Only caveat: if you have cells which are always NA, you will get 0 instead of NA.)
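If that matters, one way to restore the NAs is to also accumulate a non-NA counter alongside totdata, as in the second snippet above, and blank out cells that were never observed (a sketch; num_nonNA is assumed to be built the same way as before):
totdata[num_nonNA == 0] <- NA  # cells that were NA in every file stay NA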
On my laptop, this runs in about 5 minutes !
In practice:
To avoid memory problems, I am not reading all the data into memory. Each of your arrays is more or less 64 MB, so I cannot load them all and then do the sum (unless I had 50 GB of RAM to throw away, and even in that case it would be slow). Instead I make use of the associative property of summation by computing a "cumulative" sum at each cycle. In this way you are only working with two 8-million-element arrays at a time: the one you read from file "i", and the one that contains the current sum.
To avoid unnecessary computations, here I am summing the 1-dimensional arrays I get from reading the binary directly. You don't need to reshape the arrays to a matrix inside the cycle, because you can do that on the final "summed" array, which you can then convert to matrix form.
I hope this will work for you and that I am not missing something obvious!
As far as I can tell, if this approach is still slow then the bottleneck is elsewhere (for example in data reading: over 720 files, 3 seconds spent reading each file means roughly 35 minutes of processing).
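If you want to check where the time goes, you could time a single read with the same readBin call as in the question (a sketch):
system.time({
  con  <- bzfile(path[1], "rb")
  data <- readBin(con, what = "double", endian = "little", n = 4948*1649, size = 4)
  close(con)
})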
HTH,
Lorenzo

How to identify lat and long for a global matrix?

I am going to use binary files (a climate variable for the globe) that can be downloaded from here:
ftp://sidads.colorado.edu/pub/DATASETS/nsidc0301_amsre_ease_grid_tbs/global/
This file is a binary (matrix) file with 586 rows and 1383 columns (a global map).
I would like to extract a value that is at 100 longitude and 50 latitude.
I can extract any point using x and y using:
X<-450 ; Y<-145
file <- readBin(conne, integer(), size = 2, n = 586*1383, signed = TRUE)
file2 <- matrix(data = file, ncol = 1383, nrow = 586)
extract <- file2[X, Y]
More info:
These data are provided in the global cylindrical EASE-Grid projection at 25 km resolution, as two-byte integers.
Spatial Coordinates:
N: 90°  S: -90°  E: 180°  W: -180°
But my question is: how do I know its lat and long? Any ideas, please?
I would use the raster package and convert your data to raster objects. Like:
> file<- readBin("ID2r1-AMSRE-ML2010001A.v03.06H", integer(), size=2, n=586*1383, signed=T)
> m = matrix(data=file,ncol=1383,nrow=586,byrow=TRUE)
> r = raster(m, xmn=-180, xmx=180, ymn=-90, ymx=90)
> plot(r)
Now you have a properly spatially referenced object, but without a full specification of the cylindrical projection used you can't get back to lat-long coordinates.
Longitude is easy, but latitude not so much: my use of -90 and +90 probably makes it right at the poles and the equator but not elsewhere. If it's a right cylindrical projection then sines and cosines will work it out, but if you have a projection specification in something like PROJ.4 format then there are better ways of doing it.
There's some more info here: http://nsidc.org/data/ease/tools.html, including a link to some grids that have the lat/long of the grid cells for that grid system:
ftp://sidads.colorado.edu/pub/tools/easegrid/lowres_latlon/
so for example you can create a raster of latitude for the cells in your data grid:
> lat <- readBin("MLLATLSB",integer(), size=4, n=586*1383, endian="little")/100000
> latm = matrix(data=lat,ncol=1383,nrow=586,byrow=TRUE)
> latr = raster(latm, xmn=-180, xmx=180, ymn=-90, ymx=90)
> plot(latr)
and then latr[450,123] is the latitude of cell [450,123] in your data. Repeat with MLLONLSB for longitude.
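For the specific value asked about (lon = 100, lat = 50), a brute-force lookup against those lat/long grids could look like this (a sketch, not tested; lonm is built from MLLONLSB exactly as latm was built from MLLATLSB, and m is the data matrix from above):
lon  <- readBin("MLLONLSB", integer(), size=4, n=586*1383, endian="little")/100000
lonm <- matrix(data=lon, ncol=1383, nrow=586, byrow=TRUE)
# index of the cell whose centre is closest to lon = 100, lat = 50
# (plain Euclidean distance in degrees is fine for a lookup away from the dateline)
idx <- which.min((latm - 50)^2 + (lonm - 100)^2)
m[idx]  # value of the data matrix at that cell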
Here's how I would go about it.
f <- "ftp://sidads.colorado.edu/pub/DATASETS/nsidc0301_amsre_ease_grid_tbs/global/2002/ID2r1-AMSRE-ML2002170A.v03.06H.gz"
download.file(f, basename(f), mode = "wb")
con <- gzfile(basename(f), open = "rb")
mdat <- readBin(con, integer(), size=2, n=586*1383, signed=TRUE)
close(con)
mdat <- matrix(data = mdat,ncol=1383,nrow=586,byrow=TRUE)
library(raster)
library(rgdal)
## build a dummy longlat raster and project its extent
## this proj string may not be enough, should be documented though
prj <- "+proj=cea +ellps=WGS84"
ex <- extent(projectExtent(raster(xmn = -180, xmx = 180, ymn = -90, ymx = 90, crs = "+proj=longlat"), prj))
r <- setValues(raster(ex, nrows = 586, ncols = 1383, crs = prj), mdat)
Now we can plot this raster in its native form, and transform other data to it:
library(maptools)
data(wrld_simpl)
plot(r)
plot(spTransform(wrld_simpl, CRS(projection(r))), add = TRUE)
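To get the value at lon = 100, lat = 50 from this raster, one option is to project that point into the raster's CRS and extract (a sketch; it assumes the +proj=cea guess above is close enough):
pt <- SpatialPoints(cbind(100, 50), proj4string = CRS("+proj=longlat"))
extract(r, spTransform(pt, CRS(prj)))  # value at lon = 100, lat = 50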
