How to identify lat and long for a global matrix? (R)

I am going to use binary files (a climate variable for the globe) that can be downloaded from here:
ftp://sidads.colorado.edu/pub/DATASETS/nsidc0301_amsre_ease_grid_tbs/global/
This file is a binary (matrix) file with 586 rows and 1383 columns (a global map).
I would like to extract the value at longitude 100 and latitude 50.
I can extract any point from its x and y indices using:
X <- 450 ; Y <- 145 # row and column indices of the cell
file <- readBin(conne, integer(), size=2, n=586*1383, signed=TRUE) # conne: an open binary connection to the data file
file2 <- matrix(data=file, ncol=1383, nrow=586)
extract <- file2[X, Y]
More info:
These data are provided in the EASE-Grid global cylindrical projection at 25 km resolution, stored as two-byte integers.
Spatial Coordinates:
N: 90° S: -90° E: 180° W: -180°
My question is: how do I find the latitude and longitude of a given cell? Any ideas, please?

I would use the raster package and convert your data to raster objects, like this:
> library(raster)
> file<- readBin("ID2r1-AMSRE-ML2010001A.v03.06H", integer(), size=2, n=586*1383, signed=T)
> m = matrix(data=file,ncol=1383,nrow=586,byrow=TRUE)
> r = raster(m, xmn=-180, xmx=180, ymn=-90, ymx=90)
> plot(r)
Now you have a properly spatially referenced object, but without a full specification of the cylindrical projection used you can't get back to lat-long coordinates.
Longitude is easy, but latitude not so: my use of -90 and +90 probably makes it right at the poles and the equator but not elsewhere. If it's a right cylindrical projection then sines and cosines will work it out, but if you have a projection specification in something like PROJ.4 format then there are better ways of doing it.
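As a rough sketch of the "sines and cosines" route, assuming the grid is a cylindrical equal-area projection (the "EA" in EASE) and that the 586 rows span exactly -90 to +90 as in the raster() call above (the real EASE-Grid only approximately does, so treat the result as approximate): in that projection y is proportional to sin(latitude), so equal-height rows are equal steps in sin(latitude), and the Earth radius and standard parallel cancel out. cea_cell_centres() is a hypothetical helper.

```r
# Hypothetical helper: cell-centre latitudes (one per row) and longitudes (one per
# column) for a global cylindrical equal-area grid, ASSUMING rows span exactly -90..90.
cea_cell_centres <- function(nrow, ncol) {
  # Longitude is linear in x: equal-width columns from -180 to 180.
  lon <- -180 + (seq_len(ncol) - 0.5) * 360 / ncol
  # y = R * sin(lat) / cos(lat_ts), so y / ymax = sin(lat): rows are equal steps
  # in sin(lat), from near +1 (top row) to near -1 (bottom row).
  s <- 1 - (2 * seq_len(nrow) - 1) / nrow
  lat <- asin(s) * 180 / pi
  list(lat = lat, lon = lon)
}

g <- cea_cell_centres(586, 1383)
# Approximate coordinates of row 450, column 145 of a matrix read with byrow = TRUE,
# valid only under the assumptions stated above:
c(lon = g$lon[145], lat = g$lat[450])
```

The latitude spacing is not uniform: rows near the poles cover more degrees than rows near the equator, which is why a linear -90..90 axis is only right at the poles and the equator.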
There is some more info at http://nsidc.org/data/ease/tools.html, including a link to grids that give the lat-long of the grid cells for that grid system:
ftp://sidads.colorado.edu/pub/tools/easegrid/lowres_latlon/
So, for example, you can create a raster of latitudes for the cells in your data grid:
> lat <- readBin("MLLATLSB",integer(), size=4, n=586*1383, endian="little")/100000
> latm = matrix(data=lat,ncol=1383,nrow=586,byrow=TRUE)
> latr = raster(latm, xmn=-180, xmx=180, ymn=-90, ymx=90)
> plot(latr)
and then latr[450,123] is the latitude of cell [450,123] in your data. Repeat with MLLONLSB for longitude.
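The question actually runs the other way (which cell is at longitude 100, latitude 50?). Given the latitude matrix latm above and a longitude matrix built the same way from MLLONLSB (called lonm here, a hypothetical name), a minimal sketch is to pick the cell whose centre is closest to the target. The helper nearest_cell() is hypothetical, and the degree-based distance it minimises is crude: good enough for choosing a cell, not for measuring distances.

```r
# Hypothetical helper: (row, col) of the cell whose centre is closest to lon/lat.
# latm, lonm: matrices of cell-centre latitudes/longitudes, same layout as the data.
nearest_cell <- function(latm, lonm, lon, lat) {
  dlon <- ((lonm - lon + 180) %% 360) - 180   # wrap longitude differences into [-180, 180)
  dlat <- latm - lat
  # Scale longitude differences by cos(lat) so the two axes are roughly comparable.
  d2 <- dlat^2 + (dlon * cos(lat * pi / 180))^2
  arrayInd(which.min(d2), dim(latm))           # 1 x 2 matrix: row, col
}

# Toy 3 x 36 grid (rows at 60, 50, 40 N; columns every 10 degrees) to show usage:
latm_toy <- matrix(c(60, 50, 40), nrow = 3, ncol = 36)
lonm_toy <- matrix(seq(-175, 175, by = 10), nrow = 3, ncol = 36, byrow = TRUE)
cell <- nearest_cell(latm_toy, lonm_toy, lon = 105, lat = 50)
cell   # row 2, column 29
```

With the real matrices, cell <- nearest_cell(latm, lonm, lon = 100, lat = 50) followed by m[cell] would return the data value there, since a two-column matrix can index a matrix directly.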

Here's how I would go about it.
f <- "ftp://sidads.colorado.edu/pub/DATASETS/nsidc0301_amsre_ease_grid_tbs/global/2002/ID2r1-AMSRE-ML2002170A.v03.06H.gz"
download.file(f, basename(f), mode = "wb")
con <- gzfile(basename(f), open = "rb")
mdat <- readBin(con, integer(), size=2, n=586*1383, signed=TRUE)
close(con)
mdat <- matrix(data = mdat,ncol=1383,nrow=586,byrow=TRUE)
library(raster)
library(rgdal)
## build a dummy longlat raster and project its extent
## this proj string may not be enough, should be documented though
prj <- "+proj=cea +ellps=WGS84"
ex <- extent(projectExtent(raster(xmn = -180, xmx = 180, ymn = -90, ymx = 90, crs = "+proj=longlat"), prj))
r <- setValues(raster(ex, nrows = 586, ncols = 1383, crs = prj), mdat)
Now we can plot this raster in its native form, and transform other data to it:
library(maptools)
data(wrld_simpl)
plot(r)
plot(spTransform(wrld_simpl, CRS(projection(r))), add = TRUE)

Related

Unable to project simple features or change projection

I am trying to convert a CSV to an sf spatial data object, but I'm getting errors that I can't figure out.
Example:
library(tidyverse)
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
point_df <- tibble::tribble(
~city_name, ~longitude, ~latitude,
"Akron", -81.5190053, 41.0814447,
"Albany", -73.7562317, 42.6525793,
"Schenectady", -73.9395687, 42.8142432,
"Albuquerque", -106.650422, 35.0843859,
"Allentown", -75.4714098, 40.6022939,
"Bethlehem", -75.3704579, 40.6259316,
"Atlanta", -84.3879824, 33.7489954,
"Augusta", -82.0105148, 33.4734978,
"Austin", -97.7430608, 30.267153,
"Bakersfield", -119.0187125, 35.3732921
)
point_sf <- st_as_sf(point_df, coords = c("longitude", "latitude"))
point_sf <- st_set_crs(point_sf, 4326)
st_transform(point_sf, 102003)
#> Warning in CPL_crs_from_input(x): GDAL Error 1: PROJ: proj_create_from_database:
#> crs not found
#> Error in CPL_transform(x, crs, aoi, pipeline, reverse): crs not found: is it missing?
Any help would be greatly appreciated.
EDIT
I found a kludgy solution, adapted from this GitHub issue, but I am still looking for a more systematic solution if possible: https://github.com/r-spatial/sf/issues/1419
The solution here is to convert the sf object into sp then change back to sf.
reProject <- function (sf, proj_in = "+init=epsg:4326",
proj_out = "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs") {
require(sp)
data_sp <- as(sf, "Spatial")
proj4string(data_sp) <- CRS(proj_in)
sf_out <- st_as_sf(spTransform(data_sp, CRS(proj_out)))
}
dat_out <- reProject(point_sf)
It appears that something was expected to happen with the following line of code, but that something is not happening.
point_sf <- st_as_sf(point_df, coords = c("longitude", "latitude"))
While this line of code creates the simple feature geometric point objects, this code does not create the simple feature geometry column (sfc) object. And since there is no sfc object, the next line of code does not work.
point_sf <- st_set_crs(point_sf, 4326)
In this line of code, the function st_set_crs() sets the coordinate reference system of sf or sfc objects. But neither the sf nor the sfc object currently exists.
Therefore, the sfc object must be created first, before using st_set_crs().
It helps to follow these steps whenever building simple feature objects:
x.sfg <- st_multipoint(cbind(lon, lat), dim = "XY") # create sf geometry from a lon/lat matrix
x.sfc <- st_sfc(x.sfg, crs = 4326) # create sfc from geometry
x.sf <- st_sf(df, x.sfc) # create sf object from sfc
First put lon and lat into vectors, then create the matrix, and then create the simple feature objects in the correct order.
lon <- c(-81.5190053, -73.7562317, -73.9395687, -106.650422, -75.4714098, -75.3704579, -84.3879824, -82.0105148, -97.7430608, -119.0187125)
lat <- c(41.0814447, 42.6525793, 42.8142432, 35.0843859, 40.6022939, 40.6259316, 33.7489954, 33.4734978, 30.267153, 35.3732921)
m <- matrix(data = c(lon, lat), nrow = 10, ncol = 2, byrow = FALSE)
m.sfg <- st_multipoint(m, dim = "XY")
m.sfc <- st_sfc(m.sfg, crs = 4326)
m.sf <- st_sf(geometry = m.sfc) # a single MULTIPOINT feature holding all ten cities
head(m.sf, 3)
Then create a base plot of the continental US, and then plot the simple feature object onto the base map.
plot(US_48, axes = TRUE) # US_48: an sf object of the lower 48 states (not defined in this answer)
plot(m.sf, add= TRUE, pch = 19, col = "red")
The link given in the question does not seem to be related to this problem. The answer shown here does not convert the sf object into sp and back to sf.

How to plot global rasters with tmap in Robinson projection without duplicated areas?

I've been plotting some global rasters lately, mainly with raster and tmap. I'd like to plot the maps in the Robinson projection instead of lat-lon. However, simply projecting to Robinson duplicates some areas at the edges of the map, as you can see from the figures below (Alaska, Siberia, NZ).
Previously, I found a workaround with the PROJ.4 parameter "+over", as outlined here and here.
With the latest changes to rgdal using GDAL > 3 and PROJ >= 6, this workaround seems to be obsolete. Has anyone found a new way to plot global rasters in Robinson/Eckert IV/Mollweide without duplicated areas?
I'm running R 4.0.1, tmap 3.1, stars 0.4-3, raster 3.3-7, rgdal 1.5-12, sp 1.4-2, GDAL 3.1.1 and PROJ 6.3.1 on a macOS Catalina 10.15.4
require(stars)
require(raster)
require(tmap)
require(dplyr)
# data
worldclim_prec = getData(name = "worldclim", var = "prec", res = 10)
jan_prec <- worldclim_prec$prec1
# to Robinson and plot - projection outputs a warning
jp_rob <- jan_prec %>%
projectRaster(crs = "+proj=robin +over")
tm_shape(jp_rob) + tm_raster(style = "fisher")
Warning messages:
1: In showSRID(uprojargs, format = "PROJ", multiline = "NO") :
Discarded ellps WGS 84 in CRS definition: +proj=robin +over
2: In showSRID(uprojargs, format = "PROJ", multiline = "NO") :
Discarded datum WGS_1984 in CRS definition
I also tried stars instead of raster (since version 3.0, tmap reportedly uses stars), but that did not resolve it either.
# new grid for warping stars objects
newgrid <- st_as_stars(jan_prec) %>%
st_transform("+proj=robin +over") %>%
st_bbox() %>%
st_as_stars()
# to stars object - projection outputs no warning
jp_rob_stars <- st_as_stars(jan_prec) %>%
st_warp(newgrid)
tm_shape(jp_rob_stars) + tm_raster(style = "fisher")
Thanks for any insights - hoping someone else is thinking about this issue!
With raster you can do
library(raster)
prec <- getData(name = "worldclim", var = "prec", res = 10)[[1]]
crs <- "+proj=robin +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m"
rrob <- projectRaster(prec, crs=crs)
Create a mask
library(geosphere)
e <- as(extent(prec), "SpatialPolygons")
crs(e) <- crs(prec)
e <- makePoly(e) # add additional vertices
re <- spTransform(e, crs)
And use it
mrob <- mask(rrob, re)
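The makePoly() step matters because a lon/lat extent has only four corners; projected to Robinson, a four-corner polygon would join those corners with straight lines instead of following the curved map outline, so the mask would not remove the duplicated areas. A minimal base R sketch of the densification idea, using a hypothetical helper densify_bbox() (not the geosphere implementation) that interpolates linearly in lon/lat, i.e. along parallels and meridians:

```r
# Hypothetical helper: a closed ring tracing a lon/lat bounding box with n points
# per edge, so that after projection the edges follow the curved outline.
densify_bbox <- function(xmin, xmax, ymin, ymax, n = 100) {
  xs <- seq(xmin, xmax, length.out = n)
  ys <- seq(ymin, ymax, length.out = n)
  ring <- rbind(
    cbind(xs, ymin),            # bottom edge, west to east
    cbind(xmax, ys[-1]),        # right edge, south to north
    cbind(rev(xs)[-1], ymax),   # top edge, east to west
    cbind(xmin, rev(ys)[-1])    # left edge, north to south, ending at the start point
  )
  colnames(ring) <- c("x", "y")
  ring
}

ring <- densify_bbox(-180, 180, -90, 90)
nrow(ring)   # 4 * 100 - 3 = 397 vertices instead of 5
```

Such a closed ring could then be turned into a polygon (for example with sp::Polygon()) and projected with spTransform(), as in the answer above.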
The new terra package has a mask argument in project() for this (you need version >= 0.8.3, available from GitHub):
library(terra)
prec <- getData(name = "worldclim", var = "prec", res = 10)[[1]]
jp <- rast(prec)
jp <- jp * 1 # to deal with NAs in this dataset
rob <- project(jp, crs, mask=TRUE)

Why am I getting NA when I try to extract values from a tif file?

I have a tif file from WORLDCLIM and I need to extract values related to temperature.
Sample code:
t_min_jan2 <-raster::brick("wc2.0_30s_tmin_01.tif")
t_min_fev <-raster::brick("wc2.0_30s_tmin_02.tif")
t_min_mar <-raster::brick("wc2.0_30s_tmin_03.tif")
t_min_abr <- raster::brick("wc2.0_30s_tmin_04.tif")
t_min_maio <- raster::brick("wc2.0_30s_tmin_05.tif")
t_min_jun <- raster::brick("wc2.0_30s_tmin_06.tif")
t_min_jul <-raster::brick("wc2.0_30s_tmin_07.tif")
t_min_ago <-raster::brick("wc2.0_30s_tmin_08.tif")
t_min_set <-raster::brick("wc2.0_30s_tmin_09.tif")
t_min_out <- raster::brick("wc2.0_30s_tmin_10.tif")
t_min_nov <-raster::brick("wc2.0_30s_tmin_11.tif")
t_min_dez <-raster::brick("wc2.0_30s_tmin_12.tif")
t <-stack(t_min_jan2,t_min_fev,t_min_mar,t_min_abr,t_min_maio,t_min_jun,t_min_jul,t_min_ago,t_min_set,t_min_out,t_min_nov,t_min_dez)
plot(t)
newt <- c(-10, 5, 35, 45)
tmin1 <- crop(t, newt)
plot(tmin1)
With this code I get the map I want. I have a file with coordinates (local), and I need to extract the temperature values at these coordinates:
xy<-local[,c("Longitude" ,"Latitude")]
spdf <- SpatialPointsDataFrame(coords = xy, data = local,
proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
value<-extract(tmin1,spdf)
value
But when I run the code I get NA instead of getting the average temperatures. Maybe I'm not writing the code correctly. Can you spot any mistakes?
A simpler way to put the data together:
library(raster)
# get all filenames
ff <- paste0(sprintf("wc2.0_30s_tmin_%02d", 1:12), ".tif")
wtmin <- stack(ff)
tmin <- crop(wtmin, c(-10, 5, 35, 45))
Start by checking whether the points are on the raster (they probably are not):
xy <- local[,c("Longitude" ,"Latitude")]
plot(tmin[[1]])
points(xy)
If they are on the raster, this should work:
value <- extract(tmin, xy)
If they are not, and you can't figure out why, show us what is returned by
tmin
extent(xy)
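raster's extract() returns NA for points that fall outside the raster (or on NA cells), so a quick check is whether each point lies within the crop extent c(-10, 5, 35, 45), given as xmin, xmax, ymin, ymax. A frequent cause of NA is swapped longitude/latitude columns. The helper in_extent() and the example coordinates (roughly Madrid) are illustrative, not from the question:

```r
# Hypothetical helper: TRUE for points inside an extent c(xmin, xmax, ymin, ymax),
# the same order used in crop(t, c(-10, 5, 35, 45)) above.
in_extent <- function(lon, lat, ext) {
  lon >= ext[1] & lon <= ext[2] & lat >= ext[3] & lat <= ext[4]
}

ext <- c(-10, 5, 35, 45)
# The same location twice: correct order, then with longitude and latitude swapped.
in_extent(lon = c(-3.7, 40.4), lat = c(40.4, -3.7), ext)   # TRUE FALSE
```

With the question's data, table(in_extent(local$Longitude, local$Latitude, ext)) would show how many points fall outside the cropped raster.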

Finding the nearest distance between two SpatialPointsDataframes using gDistance rgeos?

I have two separate but related questions.
First, I would like to determine the distance to the nearest construction site (construction_layer.csv) for every data point within the subset_original_data.csv file. I am trying to use the gDistance() function to calculate the nearest neighbor, but I am open to other ideas as well.
I want to append my subset_original_data.csv dataframe with this new vector of nearest neighbor distances from the construction_layer.csv. That is, for every row of my subset_original_data.csv dataframe, I want the minimum distance to the nearest construction site.
The second goal is to determine the nearest distance from each subset_original_data.csv row to a freeway shapefile (fwy.shp). I would also like to append this new vector back onto the subset_original_data.csv dataframe.
I have successfully converted the construction_layer.csv and subset_original_data.csv into SpatialPointsDataFrame. I have also converted the fwy.shp file into a SpatialLinesDataFrame by reading in the shape file with the readOGR() function. I am not sure where to go next. Your input is greatly appreciated!
Here's my data:
construction_layer.csv, fwy.shp, subset_original_data.csv
Here's my code:
#requiring necessary packages:
library(rgeos)
library(sp)
library(rgdal)
#reading in the files:
mydata <- read.csv("subset_original_data.csv", header = T)
con <- read.csv("construction_layer.csv", header = T)
fwy <- readOGR(dsn = "fwy.shp")
#for those who prefer not to download any files:
data.lat <- c(45.53244, 45.53244, 45.53244, 45.53244, 45.53245, 45.53246)
data.lon <- c(-122.7034, -122.7034, -122.7034, -122.7033, -122.7033, -122.7032)
data.black.carbon <- c(187, 980, 466, 826, 637, 758)
mydata <- data.frame(data.lat, data.lon, data.black.carbon)
con.lat <- c(45.53287, 45.53293, 45.53299, 45.53259, 45.53263, 45.53263)
con.lon <- c(-122.6972, -122.6963, -122.6952, -122.6929, -122.6918, -122.6918)
con <- data.frame(con.lat, con.lon)
#I am not sure how to include the `fwy.shp` in a similar way,
#so don't worry about trying to solve that problem if you would prefer not to download the file.
#convert each file to SpatialPoints or SpatialLines Dataframes:
mydata.coords <- data.frame(lon = mydata[,2], lat = mydata[,1])
mydata.sp <- sp::SpatialPointsDataFrame(mydata.coords, data = data.frame(BlackCarbon = mydata[,3])) #appending a vector containing air pollution data
con.coords <- data.frame(lon = con[,2], lat = con[,1])
con.sp <- sp::SpatialPointsDataFrame(con.coords, data = con)
str(fwy) #already a SpatialLinesDataFrame
#Calculate the minimum distance (in meters) between each observation between mydata.sp and con.sp and between mydata.sp and fwy objects.
#Create a new dataframe appending these two nearest distance vectors back to the original mydata file.
#Desired output:
head(mydata.appended)
LATITUDE LONGITUDE BC6. NEAREST_CON (m) NEAREST_FWY (m)
1 45.53244 -122.7034 187 ??? ???
2 45.53244 -122.7034 980 ??? ???
3 45.53244 -122.7034 466 ??? ???
4 45.53244 -122.7033 826 ??? ???
5 45.53245 -122.7033 637 ??? ???
6 45.53246 -122.7032 758 ??? ???
EDIT:
SOLUTION:
When in doubt, ask a friend who is an R wizard! He even made a map.
library(rgeos)
library(rgdal)
library(leaflet)
library(magrittr)
#Define Projections
wgs84<-CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0")
utm10n<-CRS("+proj=utm +zone=10 +ellps=GRS80 +datum=NAD83 +units=m +no_defs +towgs84=0,0,0")
#creating example black carbon data by hand:
lat <- c(45.5324, 45.5325, 45.53159, 45.5321, 45.53103, 45.53123)
lon <- c(-122.6972, -122.6963, -122.6951, -122.6919, -122.6878, -122.6908)
BlackCarbon <- c(187, 980, 466, 826, 637, 758)
bc.coords <- data.frame(lat, lon, BlackCarbon)
bc<-SpatialPointsDataFrame(data.frame(x=lon,y =lat),data=data.frame(BlackCarbon),proj4string = wgs84)
# Project into something - Decimal degrees are no fun to work with when measuring distance!
bcProj<-spTransform(bc,utm10n)
#creating example construction data layer:
con.lat <- c(45.53287, 45.53293, 45.53299, 45.53259, 45.53263, 45.53263)
con.lon <- c(-122.6972, -122.6963, -122.6952, -122.6929, -122.6918, -122.6910)
con.coords <- data.frame(con.lat, con.lon)
con<-SpatialPointsDataFrame(data.frame(x=con.lon,y =con.lat),data=data.frame(ID=1:6),proj4string = wgs84)
conProj<-spTransform(con,utm10n)
# All at once: columns of dist are black carbon points, rows are construction sites
dist<-gDistance(bcProj,conProj,byid=T)
min_constructionDistance<-apply(dist, 2, min)
# make a new column in the WGS84 data, set it to the distance
# The distance vector will stay in order, so just stick it on!
bc@data$Nearest_Con<-min_constructionDistance
bc@data$Near_ID<-as.vector(apply(dist, 2, which.min))
#Map the original WGS84 data
pop1<-paste0("<b>Distance</b>: ",round(bc$Nearest_Con,2),"<br><b>Near ID</b>: ",bc$Near_ID)
pop2<-paste0("<b>ID</b>: ",con$ID)
m<-leaflet()%>%
addTiles()%>%
addCircleMarkers(data=bc,radius=8,fillColor = 'red',fillOpacity=0.8,weight=1,color='black',popup=pop1)%>%
addCircleMarkers(data=con,radius=8,fillColor = 'blue',fillOpacity=0.8,weight=1,color='black',popup=pop2)
m
You can use a haversine distance function and functional programming to achieve the desired result.
library(geosphere)
find_min_dist <- function(site, sites) {
min(distHaversine(site, sites))
}
#X is the data id, split into a list so you can iterate through each site point
data <- split(mydata[ , 3:2], mydata$X)
sapply(data, find_min_dist, sites = con.coords)
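If geosphere is not available, the same idea can be sketched in base R. haversine() and nearest_site() are hypothetical helpers, and the radius of 6378137 m (the WGS84 equatorial radius) is an assumption. This covers only the construction-site distances, not the freeway lines.

```r
# Hypothetical helpers: great-circle (haversine) distance in metres, and the
# nearest site for every observation. r = 6378137 m is an assumed Earth radius.
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))   # pmin guards against rounding slightly above 1
}

nearest_site <- function(lon, lat, site_lon, site_lat) {
  res <- vapply(seq_along(lon), function(i) {
    d <- haversine(lon[i], lat[i], site_lon, site_lat)
    j <- which.min(d)
    c(dist_m = d[j], site = j)
  }, c(dist_m = 0, site = 0))
  as.data.frame(t(res))            # one row per observation
}

# Sample data from the question
mydata <- data.frame(data.lat = c(45.53244, 45.53244, 45.53244, 45.53244, 45.53245, 45.53246),
                     data.lon = c(-122.7034, -122.7034, -122.7034, -122.7033, -122.7033, -122.7032),
                     data.black.carbon = c(187, 980, 466, 826, 637, 758))
con <- data.frame(con.lat = c(45.53287, 45.53293, 45.53299, 45.53259, 45.53263, 45.53263),
                  con.lon = c(-122.6972, -122.6963, -122.6952, -122.6929, -122.6918, -122.6918))

near <- nearest_site(mydata$data.lon, mydata$data.lat, con$con.lon, con$con.lat)
mydata.appended <- cbind(mydata, NEAREST_CON_m = near$dist_m)
```

near$site also gives the row of the nearest construction site, which plays the same role as Near_ID in the solution above.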

How to average rasters within the for loop that creates them? (R)

I have several directories, each with 700+ binary-encoded rasters, and I want the average of the output rasters per directory. However, I currently create the rasters one by one in a for loop, then load the newly created rasters back into R and take the sum to obtain the monthly rainfall total.
Since I don't need the individual rasters, only the average raster, I have a hunch that I could do all of this within one loop and save only the output average raster, but I am coming up short on how to program this in R.
setwd("~/Desktop/CMORPH/Levant-Clip/200001")
dir.output <- '~/Desktop/CMORPH/Levant-Clip/200001' ### change as needed to give output location
path <- list.files("~/Desktop/CMORPH/MonthlyCMORPH/200001",pattern="*.bz2", full.names=T, recursive=T)
for (i in 1:length(path)) {
files = bzfile(path[i], "rb")
data <- readBin(files,what="double",endian = "little", n = 4948*1649, size=4) #Mode of the vector to be read
close(files) # release the connection before opening the next file
data[data == -999] <- NA #covert missing data from -999(CMORPH notation) to NAs
y<-matrix((data=data), ncol=1649, nrow=4948)
r <- raster(y)
e <- extent(-180, 180, -90, 83.6236) ### choose the extent based on the netcdf file info
tr <- t(r) #transpose
re <- setExtent(tr,extent(e)) ### set the extent to the raster
ry <- flip(re, direction = 'y')
projection(ry) <- "+proj=longlat +datum=WGS84 +ellps=WGS84"
C_Lev <- crop(ry, Levant) ### Clip to Levant
M_C_Lev<-mask(C_Lev, Levant)
writeRaster(M_C_Lev, file.path(dir.output, basename(path[i])), format = 'GTiff', overwrite = T) ###the basename allows the file to be named the same as the original
}
#
raspath <- list.files ('~/Desktop/CMORPH/Levant-Clip/200001',pattern="*.tif", full.names=T, recursive=T)
rasstk <- stack(raspath)
sum200001<-sum(rasstk)
writeRaster(sum200001, file.path(dir.output, basename(path[i])), format = 'GTiff', overwrite = T) ###the basename allows the file to be named the same as the original
Currently, this code takes about 75 minutes to execute, I have about 120 more directories to go, and I am looking for faster solutions.
Thank you for any comments and input. Best, Evan
Elaborating on my previous comment, you could try:
setwd("~/Desktop/CMORPH/Levant-Clip/200001")
dir.output <- '~/Desktop/CMORPH/Levant-Clip/200001' ### change as needed to give output location
path <- list.files("~/Desktop/CMORPH/MonthlyCMORPH/200001",pattern="*.bz2", full.names=T, recursive=T)
raster_list = list()
for (i in 1:length(path)) {
files = bzfile(path[i], "rb")
data <- readBin(files,what="double",endian = "little", n = 4948*1649, size=4) #Mode of the vector to be read
close(files) # release the connection before opening the next file
data[data == -999] <- NA #covert missing data from -999(CMORPH notation) to NAs
y<-matrix((data=data), ncol=1649, nrow=4948)
r <- raster(y)
if (i == 1) {
e <- extent(-180, 180, -90, 83.6236) ### choose the extent based on the netcdf file info
}
tr <- t(r) #transpose
re <- setExtent(tr,extent(e)) ### set the extent to the raster
ry <- flip(re, direction = 'y')
projection(ry) <- "+proj=longlat +datum=WGS84 +ellps=WGS84"
C_Lev <- crop(ry, Levant) ### Clip to Levant
M_C_Lev<-mask(C_Lev, Levant)
raster_list[[i]] = M_C_Lev
}
#
rasstk <- stack(raster_list, quick = TRUE) # OR rasstk <- brick(raster_list, quick = TRUE)
avg200001<-mean(rasstk)
writeRaster(avg200001, file.path(dir.output, basename(path[i])), format = 'GTiff', overwrite = T) ###the basename allows the file to be named the same as the original
Using the quick option in stack() should definitely speed things up, in particular if you have many rasters.
Another possibility is to first compute the average, and then perform the "spatial processing". For example:
for (i in 1:length(path)) {
files = bzfile(path[i], "rb")
data <- readBin(files,what="double",endian = "little", n = 4948*1649, size=4) #Mode of the vector to be read
close(files) # release the connection before opening the next file
data[data == -999] <- NA #covert missing data from -999(CMORPH notation) to NAs
if (i == 1) {
totdata <- data
num_nonNA <- as.numeric(!is.na(data))
} else {
totdata = rowSums(cbind(totdata,data), na.rm = TRUE)
# We have to count the number of "valid" entries so that the average is correct !
num_nonNA = rowSums(cbind(num_nonNA,as.numeric(!is.na(data))),na.rm = TRUE)
}
}
avg_data = totdata/num_nonNA # Compute the average
# Now do the "spatial" processing
y<-matrix(avg_data, ncol=1649, nrow=4948)
r <- raster(y)
e <- extent(-180, 180, -90, 83.6236) ### choose the extent based on the netcdf file info
tr <- t(r) #transpose
re <- setExtent(tr,extent(e)) ### set the extent to the raster
ry <- flip(re, direction = 'y')
projection(ry) <- "+proj=longlat +datum=WGS84 +ellps=WGS84"
C_Lev <- crop(ry, Levant) ### Clip the georeferenced raster to Levant
M_C_Lev<-mask(C_Lev, Levant)
writeRaster(M_C_Lev, file.path(dir.output, basename(path[i])), format = 'GTiff', overwrite = T) ###the basename allows the file to be named the same as the original
This could be faster or slower, depending on how much you are cropping the original data.
HTH,
Lorenzo
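The same sum-and-count idea can be written without cbind()/rowSums(), which build an extra two-column matrix on every iteration. A minimal base R sketch, where read_fn is a hypothetical stand-in for the bzfile()/readBin() step (it should return one file's values with -999 already converted to NA):

```r
# Hypothetical helper: cell-wise mean over n_files vectors, keeping only a running
# sum and a count of non-NA values in memory. read_fn(i) returns file i's values.
running_mean <- function(n_files, read_fn, n_cells) {
  total <- numeric(n_cells)
  count <- integer(n_cells)
  for (i in seq_len(n_files)) {
    x <- read_fn(i)
    ok <- !is.na(x)
    total[ok] <- total[ok] + x[ok]   # add only the valid values
    count <- count + ok              # logical TRUE counts as 1
  }
  out <- total / count
  out[count == 0] <- NA              # cells that were NA in every file stay NA
  out
}

# Toy example: two "files" of four cells each
files <- list(c(1, NA, 3, NA), c(3, 4, NA, NA))
running_mean(2, function(i) files[[i]], 4)   # 2 4 3 NA
```

The result can then be reshaped with matrix(..., ncol=1649, nrow=4948) and turned into a raster exactly as in the code above; cells that are NA in every file come out as NA.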
I'm adding another answer to clarify and simplify things a bit, also in relation to the comments in chat. The code below should do what you ask: cycle over files, read the "data", compute the sum over all files and convert it to a raster with the specified dimensions.
Note that, for testing purposes, I replaced your loop over file names with a simple 1-to-720 loop, and the file reading with the creation of arrays of the same length as yours, filled with values from 1 to 4 and some NAs.
library(raster)
totdata <- array(dim = 4948*1649) # Define Dummy array
for (i in 1:720) {
message("Working on file: ", i)
data <- array(rep(c(1,2,3,4),4948*1649/4), dim = 4948*1649) # Create a "fake" 4948*1649 array each time to simulate data reading
data[1:1000] <- -999 # Set some values to NA
data[data == -999] <- NA #convert missing data from -999
totdata <- rowSums(cbind(totdata, data), na.rm = T) # Let's sum the current array with the cumulative sum so far
}
# Now reshape to matrix and convert to raster, etc.
y <- matrix(totdata, ncol=1649, nrow=4948)
r <- raster(y)
e <- extent(-180, 180, -90, 83.6236) ### choose the extent based on the netcdf file info
tr <- t(r) #transpose
re <- setExtent(tr,e) ### set the extent to the raster
ry <- flip(re, direction = 'y')
projection(ry) <- "+proj=longlat +datum=WGS84 +ellps=WGS84"
This generates a "proper" raster:
> ry
class : RasterLayer
dimensions : 1649, 4948, 8159252 (nrow, ncol, ncell)
resolution : 0.07275667, 0.1052902 (x, y)
extent : -180, 180, -90, 83.6236 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
data source : in memory
names : layer
values : 0, 2880 (min, max)
containing the sum of the different arrays. Note that the max value is 720 * 4 = 2880. (The only caveat: cells that are always NA will get 0 instead of NA.)
On my laptop, this runs in about 5 minutes !
In practice:
- To avoid memory problems, I am not reading all the data into memory. Each of your arrays is roughly 64 MB, so I cannot load them all and then do the sum (unless I have 50 GB of RAM to throw away, and even then it would be slow). Instead, I make use of the associative property of summation by computing a "cumulative" sum at each iteration. In this way you only work with two 8-million-element arrays at a time: the one you read from file "i", and the one that contains the current sum.
- To avoid unnecessary computations, I sum the 1-dimensional arrays I get from reading the binary directly. You don't need to reshape the arrays into matrices inside the loop, because you can do that on the final "summed" array.
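The memory figures in the first point can be checked directly, assuming every array is held as an R double (readBin(..., what = "double", size = 4) still returns 8-byte doubles, since R has no single-precision vector type):

```r
n_cells <- 4948 * 1649                  # cells per CMORPH file
bytes_per_array <- n_cells * 8          # 8 bytes per double
bytes_per_array / 1e6                   # about 65 MB per array
720 * bytes_per_array / 1e9             # about 47 GB to hold all 720 arrays at once
```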
I hope this will work for you and that I am not missing something obvious !
As far as I can understand, if using this approach is still slow you are having problems elsewhere (for example in data reading: on 720 files, 3 seconds spent on reading for each file means roughly 35 minutes of processing).
HTH,
Lorenzo
