Extraction of data from HDF at different pressure levels in R

I am trying to extract a variable named Data Fields/OzoneTropColumn at a point location (lon=40, lat=34) at different pressure levels (825.40198, 681.29102, 464.16000, 316.22699 hPa) from multiple HDF files.
library(raster)
library(ncdf4)
library(RNetCDF)
# open a single file to inspect its contents
nc <- nc_open("E:/Ozone/test1.nc")
# list all files to process
list_col1 <- as.list(list.files("E:/Ozone/", pattern = "*.hdf",
                                full.names = TRUE))
> attributes(nc$var) #using a single hdf file to check its variables
$names
[1] "Data Fields/Latitude" "Data Fields/Longitude"
[3] "Data Fields/O3" "Data Fields/O3DataCount"
[5] "Data Fields/O3Maximum" "Data Fields/O3Minimum"
[7] "Data Fields/O3StdDeviation" "Data Fields/OzoneTropColumn"
[9] "Data Fields/Pressure" "Data Fields/TotColDensDataCount"
[11] "Data Fields/TotColDensMaximum" "Data Fields/TotColDensMinimum"
[13] "Data Fields/TotColDensStdDeviation" "Data Fields/TotalColumnDensity"
[15] "HDFEOS INFORMATION/StructMetadata.0" "HDFEOS INFORMATION/coremetadata"
> pres <- ncvar_get(nc, "Data Fields/Pressure") #looking at pressure level from single file of hdf
> pres
[1] 825.40198 681.29102 464.16000 316.22699 215.44400 146.77901 100.00000 68.12950 46.41580 31.62290
[11] 21.54430 14.67800 10.00000 6.81291 4.64160
ncin <- raster::stack(list_col1,
                      varname = "Data Fields/OzoneTropColumn",
                      ncdf = TRUE)
#cannot extract using the following code
o3 <- ncvar_get(list_col1,attributes(list_col1$var)$names[9])
"Error in ncvar_get(list_col1, attributes(list_col1$var)$names[9]) :
first argument (nc) is not of class ncdf4!"
#tried to extract pressure levels
> prsr <- raster::stack(list_col1,varname = "Data Fields/Pressure",ncdf=TRUE)
"Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'stack': varname: Data Fields/Pressure does not exist in the file. Select one from:
Data Fields/O3, Data Fields/O3DataCount, Data Fields/O3Maximum, Data Fields/O3Minimum, Data Fields/O3StdDeviation, Data Fields/OzoneTropColumn, Data Fields/TotColDensDataCount, Data Fields/TotColDensMaximum, Data Fields/TotColDensMinimum, Data Fields/TotColDensStdDeviation, Data Fields/TotalColumnDensity"
#tried using index
#the point location can also be written as a window, given the 1 deg by 1 deg resolution
lonIdx <- which(lon >32 & lon <36)
latIdx <- which(lat >38 & lat <42)
presIdx <- which(pres >= 400 & pres <= 900)
#also tried
# Option 2 -- subset using array indexing
o3 <- ncvar_get(list_col1,'Data Fields/OzoneTropColumn')
"Error in ncvar_get(list_col1, "Data Fields/OzoneTropColumn") :
first argument (nc) is not of class ncdf4!"
extract2 <- o3[lonIdx, latIdx, presIdx, ]
How do I extract these values vertically at each pressure level? (SM = some value)
I would like the output in the following form at location (lon=40, lat=34):
Pressure     1     2     3     4     5    ...   10
825.40198   SM1   SM2   SM3   SM4   SM5   ...  SM10
681.29102   SM11  SM12  ...
464.16000   ...
316.22699   SM..  SM..  SM..  SM..  SM..  ...  SM..
Appreciate any help.
Thank you

This might be an issue with how ncdf4 and raster name each of the layers in the file, and perhaps some confusion from trying to create a multilayer object from multiple netCDF files at once.
I would do the following, using only raster: load a single netCDF file with stack() or brick(), which reads it as a multilayer object in R, then use names() to identify the name of the ozone layer according to the raster package.
firstraster <- stack("E:/Ozone/test1.nc")
names(firstraster)
Once you find out the name, you can read every file with stack(), pull out the ozone layer, and extract the values at the points of interest, without even assembling all layers into a single stack.
Ozonelayername <- "put name here"
files <- list.files("E:/Ozone/", pattern = "*.hdf", full.names = TRUE)
stacklist <- lapply(files, stack)
Ozonelayerlist <- lapply(stacklist, "[[", Ozonelayername)
The line above will output a list of RasterLayer objects (not stacks or bricks, just plain single-layer rasters), containing only the layer you want.
Now we just need to run extract() on each of these layers; sapply() will format the result neatly into a matrix for us.
pointsofinterest <- expand.grid(32:36,38:42)
values <- sapply(Ozonelayerlist, extract, pointsofinterest)
I can't test it, since I do not have the data, but I assume this would work.
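If the per-level values live in the 3-D "Data Fields/O3" variable rather than in the single-layer OzoneTropColumn, a sketch along the same lines would build the requested pressure-by-file table; this is untested, and the variable name, the level order, and the use of the first four layers are assumptions to adapt.
# hedged sketch: one O3 stack per file, one layer per pressure level
point <- cbind(lon = 40, lat = 34)
o3stacks <- lapply(files, stack, varname = "Data Fields/O3")
# extract() returns one row per point and one column per layer;
# keep the first four layers (825.4, 681.3, 464.2, 316.2 hPa)
vals <- sapply(o3stacks, function(s) extract(s, point)[1, 1:4])
out <- data.frame(Pressure = c(825.40198, 681.29102, 464.16000, 316.22699), vals)
names(out)[-1] <- seq_along(files)
out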

Related

Error in NormalizeData.default running DoubletFinder on a SCE to Seurat object in R

When I use the DoubletFinder function on my Seurat object, it keeps throwing an error in NormalizeData.default. I transformed my SingleCellExperiment object into a Seurat object and then split it by sample. It seems to run well until I call DoubletFinder.
This is the script I've used:
pk_identification <- function(seurat_object){
  sweep.res.list_seu <- paramSweep_v3(seurat_object, PCs = 1:50, sct = FALSE)
  sweep.stats_seu <- summarizeSweep(sweep.res.list_seu, GT = FALSE)
  bcmvn_seu <- find.pK(sweep.stats_seu)
  plot <- ggplot(bcmvn_seu, aes(pK, BCmetric, group = 1)) +
    geom_point() +
    geom_line()
  # select the pK that corresponds to max BCmetric to optimize doublet detection
  pk_maxBC <- bcmvn_seu %>%
    filter(BCmetric == max(BCmetric)) %>%
    select(pK)
  pk_maxBC <- as.numeric(as.character(pk_maxBC[[1]]))
  results <- list("plot" = plot, "pk" = pk_maxBC)
  return(results)
}
results <- pk_identification(splited_seu$intact_27R)
And this is the error I've been receiving
`[1] "Creating artificial doublets for pN = 5%"
[1] "Creating Seurat object..."
[1] "Normalizing Seurat object..."
Error in NormalizeData.default(object = GetAssayData(object = object, :
trying to get slot "params" from an object of a basic class ("NULL") with no slots
I'd really appreciate some help; I'm doing exactly as I've read in tutorials and courses and can't find what I'm doing wrong.
Thanks!
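Not an authoritative diagnosis, but one common cause fits this error: paramSweep_v3() replays the preprocessing recorded in the object's command log (seurat_object@commands), and a Seurat object converted from a SingleCellExperiment starts with an empty log, so the NormalizeData record it looks up is NULL and the @params access fails. A minimal sketch of that workaround, assuming the standard pipeline suits your data:
# hedged workaround: re-run the standard Seurat steps on the converted
# object so the command log that paramSweep_v3() reads actually exists
library(Seurat)
seu <- splited_seu$intact_27R
seu <- NormalizeData(seu)
seu <- FindVariableFeatures(seu)
seu <- ScaleData(seu)
seu <- RunPCA(seu, npcs = 50)  # PCs 1:50 are used inside pk_identification()
results <- pk_identification(seu)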

Parallel package for Windows 10 in R

I have a dataset that I'm trying to parse in R. The data comes from HMDB and the dataset is called Serum Metabolites (an XML file). The XML file contains about 25K metabolite nodes, each of which I want to parse into sub-nodes.
I have a code that parses the XML file to a list object in R.
Since the XML file is quite big, and since for each metabolite there are about 12 sub-nodes I want, parsing takes a long time: about 3 hours per 1,000 metabolites.
I'm trying to use the parallel package but receive an error.
The packages:
library("XML")
library("xml2")
library( "magrittr" ) #for pipe operator %>%
library("pbapply") # to track on progress
library("parallel")
The function:
# The function receives an XML file (its location) and returns a list of nodes
Short_Parser_HMDB <- function(xml.file_location){
  start.time <- Sys.time()
  # read as xml file
  doc <- read_xml(xml.file_location)
  # get metabolite nodes (only the first 1000 used in this sample)
  met.nodes <- xml_find_all(doc, ".//d1:metabolite")[1:1000] # [(i*1000+1):(1000*i+1000)] # [1:3]
  # xpaths of the desired child nodes
  xpath_child.v <- c("./d1:accession",
                     "./d1:name",
                     "./d1:description",
                     "./d1:synonyms/d1:synonym",
                     "./d1:chemical_formula",
                     "./d1:smiles",
                     "./d1:inchikey",
                     "./d1:biological_properties/d1:pathways/d1:pathway/d1:name",
                     "./d1:diseases/d1:disease/d1:name",
                     "./d1:diseases/d1:disease/d1:references",
                     "./d1:kegg_id",
                     "./d1:meta_cyc_id")
  child.names.v <- c("accession",
                     "name",
                     "description",
                     "synonyms",
                     "chemical_formula",
                     "smiles",
                     "inchikey",
                     "pathways_names",
                     "diseases_name",
                     "references",
                     "kegg_id",
                     "meta_cyc_id")
  # first, loop over the met.nodes (pblapply tracks progress, plain lapply
  # slows the function down dramatically, parLapply runs in parallel)
  L.sec_acc <- parLapply(cl, met.nodes, function(x) {
    # second, loop over the desired xpath child nodes
    temp <- parLapply(cl, xpath_child.v, function(y) {
      xml_find_all(x, y) %>% xml_text(trim = T) %>% data.frame(value = .)
    })
    # set their names
    names(temp) <- child.names.v
    return(temp)
  })
  end.time <- Sys.time()
  total.time <- end.time - start.time
  print(total.time)
  return(L.sec_acc)
}
Now create the environment:
# select the location where the XML file is
location= "D:/path/to/file//HMDB/DataSets/serum_metabolites/serum_metabolites.xml"
cl <-makeCluster(detectCores(), type="PSOCK")
clusterExport(cl, c("Short_Parser_HMDB", "cl"))
clusterEvalQ(cl,{library("parallel")
library("magrittr")
library("XML")
library("xml2")
})
And execute :
Short_outp<-Short_Parser_HMDB(location)
stopCluster(cl)
The error received:
> Short_outp<-Short_Parser_HMDB(location)
Error in checkForRemoteErrors(val) :
one node produced an error: invalid connection
Based on those links, I tried to implement the parallel setup:
Parallel Processing in R
How to call global function from the parLapply function?
Error in R parallel:Error in checkForRemoteErrors(val) : 2 nodes produced errors; first error: cannot open the connection
but couldn't find a fix for "invalid connection" as the error.
I'm using Windows 10 and the latest R version, 4.0.2 (not sure if that's enough information).
Any hint or idea will be appreciated
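Not a definitive answer, but two things in this setup stand out. First, clusterExport(cl, c("Short_Parser_HMDB", "cl")) ships the cluster object itself to the workers; a cluster holds socket connections that are only valid on the master, and the inner parLapply(cl, ...) then runs on a worker against those dead connections, which matches the "invalid connection" error. Second, xml2 nodes wrap external pointers, so met.nodes cannot be shipped to PSOCK workers either. A hedged sketch of one way around both (untested on this data; it assumes xpath_child.v and child.names.v are defined in the global environment as above): serialize each node to text on the master, re-parse it on the worker, and keep the inner loop a plain lapply.
library("xml2")
library("magrittr")
library("parallel")
doc <- read_xml(location)
met.nodes <- xml_find_all(doc, ".//d1:metabolite")[1:1000]
# plain character strings survive serialization to the workers
node_strings <- vapply(met.nodes, as.character, character(1))
cl <- makeCluster(detectCores(), type = "PSOCK")
clusterEvalQ(cl, {library("xml2")
  library("magrittr")
})
clusterExport(cl, c("xpath_child.v", "child.names.v"))
L.sec_acc <- parLapply(cl, node_strings, function(s) {
  x <- xml_root(read_xml(s))  # re-parse the node inside the worker
  temp <- lapply(xpath_child.v, function(y) {  # plain lapply: no cluster object here
    xml_find_all(x, y) %>% xml_text(trim = TRUE) %>% data.frame(value = .)
  })
  names(temp) <- child.names.v
  temp
})
stopCluster(cl)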

How to convert Sentinel-3 .nc-file into .tiff-file?

Regarding the conversion of .nc files into .tiff files, I encounter the problem of losing the geoinformation of my pixels. I know that other users experienced the same problem and tried to solve it via Kotlin but failed; I would prefer a solution using R. See here for the Kotlin approach: https://gis.stackexchange.com/questions/259700/converting-sentinel-3-data-netcdf-to-geotiff
I downloaded freely available Sentinel-3 data from ESA (https://scihub.copernicus.eu/dhus/#/home). This data unfortunately comes in the .nc format, so I want to convert it into the .tiff format. I have already tried various approaches, but failed. What I have tried so far:
data_source <- 'D:/user_1/01_test_data/S3A_SL_1_RBT____20180708T093240_20180708T093540_20180709T141944_0179_033_150_2880_LN2_O_NT_003.SEN3/F1_BT_in.nc'
# define path to .nc-file
data_output <- 'D:/user_1/01_test_data/S3A_SL_1_RBT____20180708T093240_20180708T093540_20180709T141944_0179_033_150_2880_LN2_O_NT_003.SEN3/test.tif'
# define path of output .tiff-file
###################################################
# 1.) use gdal_translate via Windows cmd-line in R
# see here: https://stackoverflow.com/questions/52046282/convert-netcdf-nc-to-geotiff
system(command = paste('gdal_translate -of GTiff -sds -a_srs epsg:4326', data_source, data_output))
# hand over character string to Windows cmd-line to use gdal_translate
###################################################
# 2.) use the raster-package
# see here: https://www.researchgate.net/post/How_to_convert_a_NetCDF4_file_to_GeoTIFF_using_R2
epsg4326 <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
# proj4-code
# https://spatialreference.org/ref/epsg/wgs-84/proj4/
specific_band <- raster(data_source)
crs(specific_band) <- epsg4326
writeRaster(specific_band, filename = data_output)
# both approaches work, i.e. I can convert the files from .nc format into .tiff format, but I lose the geoinformation for the pixels and just get pixel coordinates instead of lon/lat values
I really appreciate any solutions that keep the geoinformation for the pixels!
Thanks a lot in advance, ExploreR
As @j08lue points out,
The product format for Sentinel 3 products is horrible. Yes, the data
values are stored in netCDF, but the coordinate axes are in separate
files and it is all just a bunch of files and metadata.
I did not find any documentation (I assume it must exist), but it seems you can get the data like this:
library(ncdf4)
# coordinates
nc <- nc_open("geodetic_in.nc")
lon <- ncvar_get(nc, "longitude_in")
lat <- ncvar_get(nc, "latitude_in")
# including elevation for sanity check only
elv <- ncvar_get(nc, "elevation_in")
nc_close(nc)
# the values of interest
nc <- nc_open("F1_BT_in.nc")
F1_BT <- ncvar_get(nc, "F1_BT_in")
nc_close(nc)
# combine
d <- cbind(as.vector(lon), as.vector(lat), as.vector(elv), as.vector(F1_BT))
Plot a sample of the locations. Note that the raster is rotated
plot(d[sample(nrow(d), 25000),1:2], cex=.1)
I would need to investigate a bit more to see how to write a rotated raster.
For now, a not recommended shortcut could be to rasterize to a non-rotated raster
e <- extent(as.vector(apply(d[,1:2],2, range))) + 1/120
r <- raster(ext=e, res=1/30)
#elev <- rasterize(d[,1:2], r, d[,3], mean)
F1_BT <- rasterize(d[,1:2], r, d[,4], mean, filename="")
plot(F1_BT)
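To finish the conversion from there, a minimal sketch (the CRS string and the output filename are assumptions): stamp the lon/lat WGS84 CRS on the rasterized result and write it out as a GeoTIFF.
# hedged finishing step: assign the lon/lat CRS and write a GeoTIFF
crs(F1_BT) <- "+proj=longlat +datum=WGS84 +no_defs"
writeRaster(F1_BT, filename = "F1_BT.tif", overwrite = TRUE)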
So that's what I have done so far. Unfortunately the raster is not simply rotated by 180 degrees, but distorted in some other way...
# (1.) first part of the code adapted from Robert Hijmans' approach (see code of the answer provided above)
nc_geodetic <- nc_open(paste0(wd, "/01_test_data/sentinel_3/geodetic_in.nc"))
nc_geodetic_lon <- ncvar_get(nc_geodetic, "longitude_in")
nc_geodetic_lat <- ncvar_get(nc_geodetic, "latitude_in")
nc_geodetic_elv <- ncvar_get(nc_geodetic, "elevation_in")
nc_close(nc_geodetic)
# to get the longitude, latitude and elevation information
F1_BT_in_vars <- nc_open(paste0(wd, "/01_test_data/sentinel_3/F1_BT_in.nc"))
F1_BT_in <- ncvar_get(F1_BT_in_vars, "F1_BT_in")
nc_close(F1_BT_in_vars)
# extract the band information
###############################################################################
# (2.) the following part of the code is adapted from @Matthew Lundberg's rotation code, see https://stackoverflow.com/questions/16496210/rotate-a-matrix-in-r
rotate_fkt <- function(x) t(apply(x, 2, rev))
# create rotation-function
F1_BT_in_rot180 <- rotate_fkt(rotate_fkt(F1_BT_in))
# rotate raster by 180degree
test_F1_BT_in <- raster(F1_BT_in_rot180)
# convert matrix to raster
###############################################################################
# (3.) extract corner coordinates and transform with gdal
writeRaster(test_F1_BT_in, filename = paste0(wd, "/01_test_data/sentinel_3/test_flip.tif"), overwrite = TRUE)
# write the raster layer
data_source_flip <- '"D:/unknown_user/X_processing/01_test_data/sentinel_3/test_flip.tif"'
data_tmp_flip <- '"D:/unknown_user/X_processing/01_test_data/temp/test_flip.tif"'
data_out_flip <- '"D:/unknown_user/X_processing/01_test_data/sentinel_3/test_flip_ref.tif"'
# define input, temporary output and output for gdal-transformation
nrow_nc_mtx <- nrow(nc_geodetic_lon)
ncol_nc_mtx <- ncol(nc_geodetic_lon)
# investigate on matrix size of the image
xy_coord_char1 <- as.character(paste("1", "1", nc_geodetic_lon[1, 1], nc_geodetic_lat[1, 1]))
xy_coord_char2 <- as.character(paste(nrow_nc_mtx, "1", nc_geodetic_lon[nrow_nc_mtx, 1], nc_geodetic_lat[nrow_nc_mtx, 1]))
xy_coord_char3 <- as.character(paste(nrow_nc_mtx, ncol_nc_mtx, nc_geodetic_lon[nrow_nc_mtx, ncol_nc_mtx], nc_geodetic_lat[nrow_nc_mtx, ncol_nc_mtx]))
xy_coord_char4 <- as.character(paste("1", ncol_nc_mtx, nc_geodetic_lon[1, ncol_nc_mtx], nc_geodetic_lat[1, ncol_nc_mtx]))
# extract the corner coordinates from the image
system(command = paste('gdal_translate -of GTiff -gcp ', xy_coord_char1, ' -gcp ', xy_coord_char2, ' -gcp ', xy_coord_char3, ' -gcp ', xy_coord_char4, data_source_flip, data_tmp_flip))
system(command = paste('gdalwarp -r near -order 1 -co COMPRESS=NONE ', data_tmp_flip, data_out_flip))
# run gdal-transformation
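One hedged refinement worth trying (untested on this scene): four corner GCPs with a first-order polynomial cannot capture the curved Sentinel-3 swath geometry, which could explain the remaining distortion. Sampling a denser grid of tie points from the geodetic arrays and warping with thin plate splines (gdalwarp -tps) follows the same pattern; the index order below copies the corner-GCP convention used above and may need adjusting to match the rotated raster.
# hedged sketch: denser GCP grid from the geodetic arrays + -tps warp
rows <- round(seq(1, nrow_nc_mtx, length.out = 10))
cols <- round(seq(1, ncol_nc_mtx, length.out = 10))
gcps <- character(0)
for (i in rows) for (j in cols) {
  gcps <- c(gcps, paste('-gcp', i, j, nc_geodetic_lon[i, j], nc_geodetic_lat[i, j]))
}
system(command = paste('gdal_translate -of GTiff', paste(gcps, collapse = " "),
                       data_source_flip, data_tmp_flip))
system(command = paste('gdalwarp -r near -tps -t_srs EPSG:4326 -co COMPRESS=NONE',
                       data_tmp_flip, data_out_flip))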

User Based Recommendation in R

I am trying to do user-based recommendation in R using the recommenderlab package, but I always get zero (no) predictions out of the model.
My code is:
library("recommenderlab")
# Loading to pre-computed affinity data
movie_data<-read.csv("D:/course/Colaborative filtering/data/UUCF Assignment Spreadsheet_user_row.csv")
movie_data[is.na(movie_data)] <- 0
rownames(movie_data) <- movie_data$X
movie_data$X <- NULL
# Convert it as a matrix
R<-as.matrix(movie_data)
# Convert R into realRatingMatrix data structure
# realRatingMatrix is a recommenderlab sparse-matrix like data-structure
r <- as(R, "realRatingMatrix")
r
rec=Recommender(r[1:nrow(r)],method="UBCF", param=list(normalize = "Z-score",method="Cosine",nn=5, minRating=1))
recom <- predict(rec, r["1648"], n=5)
recom
as(recom, "list")
Every time, I get output like:
as(recom, "list")
$`1648`
character(0)
I am using user-row data from this link:
https://drive.google.com/file/d/0BxANCLmMqAyIQ0ZWSy1KNUI4RWc/view
In that data, column A contains the user id and the remaining columns contain the movie ratings, one column per movie.
Thanks.
The line of code movie_data[is.na(movie_data)] <- 0 is the source of the error. For a realRatingMatrix (unlike a binaryRatingMatrix), the movies not rated by a user are expected to be NA values, not zeros. For example, the following code gives correct predictions:
library("recommenderlab")
movie_data<-read.csv("UUCF Assignment Spreadsheet_user_row.csv")
rownames(movie_data) <- movie_data$X
movie_data$X <- NULL
R<-as.matrix(movie_data)
r <- as(R, "realRatingMatrix")
rec=Recommender(r,method="UBCF", param=list(normalize = "Z-score",method="Cosine",nn=5, minRating=1))
recom <- predict(rec, r["1648"], n=5)
as(recom, "list")
# [[1]]
# [1] "X13..Forrest.Gump..1994." "X550..Fight.Club..1999."
# [3] "X77..Memento..2000." "X122..The.Lord.of.the.Rings..The.Return.of.the.King..2003."
# [5] "X1572..Die.Hard..With.a.Vengeance..1995."

Remove "time" dimension from RasterBrick R

Given the following:
library(raster)
r <- raster(ncol=10, nrow=10)
s <- stack(lapply(1:3, function(x) setValues(r, runif(ncell(r)))))
s <- setZ(s, as.Date('2000-1-1') + 0:2,name="time")
s
getZ(s)
how can I remove "time" from s?
The reason I want to remove "time" is that I get errors while cropping a RasterStack P similar to s:
cr <- crop(P, extent(Germany),snap="out")
NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files
Error in R_nc4_def_var_float: NetCDF: String match to name in use
Name of variable that the error occurred on: "time"
I.e., you are trying to add a variable with that name to the file, but it ALREADY has a variable with that name!
[1] "----------------------"
[1] "Var: time"
[1] "Ndims: 3"
[1] "Dimids: "
[1] 2 1 0
Error in ncvar_add(nc, vars[[ivar]], verbose = verbose, indefine = TRUE) :
Error in ncvar_add, defining var time
If "time" dimension is not the problem, what can be the solution to this error?
Thanks for your thoughts on this.
You can try to set a different name in the setZ() call. When you crop the raster, a new variable named "time" is created, and the new RasterBrick cannot be written with two variables of the same name.
I had the same problem when I tried to export a cropped raster; I had to change the 'varname' from "time" to something else ("Date", for example).
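A minimal sketch of that fix, reusing the example from the question (the rename is the only change; an in-memory crop will not exercise the netCDF write path, but the same rename applies before writing a file-backed stack):
# hedged sketch: name the Z slot something other than "time" so the
# cropped brick is not written with two variables named "time"
library(raster)
r <- raster(ncol=10, nrow=10)
s <- stack(lapply(1:3, function(x) setValues(r, runif(ncell(r)))))
s <- setZ(s, as.Date('2000-1-1') + 0:2, name="Date")
cr <- crop(s, extent(-10, 10, -10, 10), snap="out")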
