I'm running into an unusual issue with the cover function in R. We are trying to fill cloudy pixels with values from another layer. I can make it work just fine with small in-memory stacks, as follows:
library(raster)
r1 <- raster(ncols=36, nrows=18)
r1[] <- 1:ncell(r1)
r1b <- r1a <- r1
r1_stack <- stack(r1, r1a, r1b)
r2 <- setValues(r1, runif(ncell(r1)))
r2b <- r2a <- r2
r_stack <- stack(r2, r2a, r2b)
r_stack[r_stack < 0.5] <- NA
r3 <- cover(r_stack, r1_stack)
But when I try to do the same thing with stacks of Landsat GeoTIFFs read from disk, I get the error:
Error in as.character(x) :
cannot coerce type 'closure' to vector of type 'character'
The code:
# get all tifs
LS5_032_032_2008_09_21 <- list.files("LT050340302008090301T1-SC20170526100900/",
pattern = glob2rx("*band*.tif$"), full.names = T)
# stack bands
cloudy_scene <- stack(LS5_032_032_2008_09_21)
# import cloud mask
cloud_mask <- raster('LT050340302008090301T1-SC20170526100900/LT05_L1TP_034030_20080903_20160905_01_T1_sr_cloud_qa.tif')
# mask data
masked_data <- mask(cloudy_scene, mask = cloud_mask, maskvalue=0, inverse=TRUE)
####### get cloud free data
# get files
LS5_2008_09_19 <- list.files("LT050340302008091901T1-SC20170526101124/",
pattern = glob2rx("*band*.tif$"), full.names = T)
# subset and stack cloud free bands
cloud_free_data <- stack(LS5_2008_09_19)
# use cover function to assign NA pixels to corresponding pixels in other scene
cover <- cover(masked_data, cloud_free_data)
traceback() output:
9: toupper(format)
8: .defaultExtension(format)
7: .getExtension(filename, filetype)
6: .local(x, filename, ...)
5: writeStart(outRaster, filename = filename, format = format, datatype = datatype,
overwrite = overwrite)
4: writeStart(outRaster, filename = filename, format = format, datatype = datatype,
overwrite = overwrite)
3: .local(x, y, ...)
2: cover(masked_data, cloud_free_data)
1: cover(masked_data, cloud_free_data)
UPDATE: I tried to resample the data, but it still doesn't work:
cloud_free_resam <- resample(cloud_free_data, masked_data)
cover <- cover(masked_data, cloud_free_resam)
ERROR:
Error in as.character(x) :
cannot coerce type 'closure' to vector of type 'character'
I also tried to crop both layers; same error:
# find intersection boundary
crop_extent <- intersect(extent(cloud_free_data), extent(masked_data))
cloud_free_data <- crop(cloud_free_data, crop_extent)
masked_data <- crop(masked_data, crop_extent)
# use cover function to assign NA pixels to corresponding pixels in other scene
cover <- cover(masked_data, cloud_free_data)
GET THE DATA: (WARNING: 317 MB download - unpacks to ~1 GB)
https://ndownloader.figshare.com/files/8561230
Any ideas what might be causing this error with this particular dataset?
I'm sure we're missing something quite basic but...what?
Thank you in advance.
Leah
This is a bug: cover with two multi-layered Raster* objects cannot write to disk. The traceback suggests that the filename/format arguments are not passed along internally, so format ends up resolving to base R's format function (a closure), which produces the error message. You can see it in the simple example by setting
rasterOptions(todisk=TRUE)
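and then re-running the cover call from the example (a sketch):
r3 <- cover(r_stack, r1_stack)
# Error in as.character(x) :
#   cannot coerce type 'closure' to vector of type 'character'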
I have fixed this in version 2.6-1 (forthcoming).
I think it has to do with the point at which the object is converted to a RasterBrick and written to a temporary file, i.e., masked_data <- mask(cloudy_scene, mask = cloud_mask).
Using Spacedman's crop creates 'in memory' RasterBrick objects that do not rely on accessing the files; in that case the example works without error.
But when using the full extent (or cropping at the end of the process), raster writes and reads (temporary) files, and the error occurs.
A temporary fix might be to explicitly split the images into memory-sized chunks, mask/cover each chunk, and then restitch the results with mosaic.
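A rough sketch of that chunked workaround, assuming the two scenes share a grid (the split point is arbitrary):
# split the scene into two halves that fit in memory
e <- extent(masked_data)
xmid <- (xmin(e) + xmax(e)) / 2
left_e <- extent(xmin(e), xmid, ymin(e), ymax(e))
right_e <- extent(xmid, xmax(e), ymin(e), ymax(e))
# cover each chunk separately, then restitch the results
cov_left <- cover(crop(masked_data, left_e), crop(cloud_free_data, left_e))
cov_right <- cover(crop(masked_data, right_e), crop(cloud_free_data, right_e))
restitched <- mosaic(cov_left, cov_right, fun = mean)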
The extents of the two objects are also slightly different, so that should be fixed. It is also generally a good idea to avoid problems by explicitly setting band names and min/max values, and setting values < 0 to NA.
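For example, a minimal sketch (assuming the two stacks share a CRS and resolution):
# crop both stacks to their common extent so the grids line up
common <- intersect(extent(masked_data), extent(cloud_free_data))
masked_data <- crop(masked_data, common)
cloud_free_data <- crop(cloud_free_data, common)
# matching band names, and negative (nonsense) values flagged as NA
names(cloud_free_data) <- names(masked_data)
masked_data[masked_data < 0] <- NA
cloud_free_data[cloud_free_data < 0] <- NA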
I am trying to downsample a list of Wave objects using seewave::resamp.
To get my list I have imported a .wav file and split it into 10-second clips following @Jota's answer here.
So to get my list of Wave objects I have done the following (this is using the example from the above answer):
library(seewave)
# your audio file (using example file from seewave package)
data(tico)
audio <- tico # this is an S4 class object
# the frequency of your audio file
freq <- 22050
# the length and duration of your audio file
totlen <- length(audio)
totsec <- totlen/freq
# the duration that you want to chop the file into
seglen <- 0.5
# defining the break points
breaks <- unique(c(seq(0, totsec, seglen), totsec))
index <- 1:(length(breaks)-1)
# a list of all the segments
subsamps <- lapply(index, function(i) audio[(breaks[i]*freq):(breaks[i+1]*freq)])
I now have my list of Wave objects. If I do the following for individual objects, it works:
resamp(subsamps[[1]], f = 48000, g = 22050, output = "Wave")
But when I try to do it to the whole list, it comes up with an error:
test_wave_downsample <- lapply(subsamps, function(i) resamp(subsamps[[i]], f = 22050, g = 8000, output = "Wave"))
Error in subsamps[[i]] : invalid subscript type 'S4'
I am pretty sure this has something to do with the way I am using lapply, as the S4 object is not an issue when the call is made individually, but as someone who is new to the apply family I am not sure what the problem is.
I have had a look around and can't find much on using existing functions within lapply, or whether that can be an issue.
Any advice greatly appreciated.
After asking around, I have been given a solution that works:
f <- function(x) {
  resamp(x, f = 22050, g = 8000, output = "Wave")
}
test_wave_downsample <- lapply(subsamps, f)
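The reason the original call failed is that lapply passes each list element itself to the function, not an index: function(i) received a Wave S4 object, so subsamps[[i]] tried to subset the list with an S4 object, hence the error. If you prefer to iterate by index, this sketch is equivalent:
test_wave_downsample <- lapply(seq_along(subsamps), function(i) {
  resamp(subsamps[[i]], f = 22050, g = 8000, output = "Wave")
})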
When doing raster math, for example raster1 - raster2, the datatype of the output raster is 'FLT4S', even if the datatype of both raster1 and raster2 is 'INT2S'. How can I force the output to be 'INT2S' without writing to disk? Is there a global option that makes all raster processing produce 'INT2S' data?
The reason for wanting 'INT2S' instead of 'FLT4S' is to save memory space and speed up processing when using for loops on larger raster datasets.
In rasterOptions() one can specify dataType, but as far as I understand that only applies when writing to disk, right?
#load package raster
require (raster)
#create sample rasters
r1 <- raster::raster(ext = extent(c(0, 10, 0, 10)), res = 1, vals = 1:100)
r2 <- raster::raster(ext = extent(c(0, 10, 0, 10)), res = 1, vals = 100:1)
# set dataType of sample rasters to 'INT2S'
dataType(r1) <- 'INT2S'
dataType(r2) <- 'INT2S'
#check dataType of sample rasters
dataType(r1)
dataType(r2)
# do some simple arithmetic
r3 <- r2 - r1
#check the dataType of the output raster
dataType(r3)
I would like dataType(r3) to be 'INT2S' as well
In your example, it actually does work. Notice the L after the numbers in the vals argument; that is equivalent to calling as.integer.
library(raster)
r1 <- raster::raster(ext = extent(c(0, 10, 0, 10)), res = 1, vals = 1:100L)
r2 <- raster::raster(ext = extent(c(0, 10, 0, 10)), res = 1, vals = 100:1L)
r3 <- r2 - r1
class(values(r3))
#[1] "integer"
But in other cases it does not work
class(values(r3 - 2L))
#[1] "numeric"
And you cannot control that behavior.
Note that dataType provides information about the file that a RasterLayer refers to. If there is no file, as in the above example, the value is meaningless. You should also not set it yourself, except for debugging when dealing with an existing file.
So the best you can do is set the datatype when writing to disk, as in:
writeRaster(r3, filename="test.tif", datatype="INT2S")
I have the same problem with 'INT2U' data and I don't believe it's possible. AFAIK, raster algebra in R works with 'numeric' (double precision) values. Floating point can be coerced to 'integer' with as.integer(), but note that it truncates rather than rounds.
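A quick illustration of the truncation:
as.integer(2.9)         # 2  -- the fraction is dropped, not rounded
as.integer(-2.9)        # -2 -- truncation is toward zero
as.integer(round(2.9))  # 3  -- round first if that is what you want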
I've had this same question too. Hopefully someone will chime in if I'm wrong, but I believe you really only have one option and that is to convert the output when you write to disk:
writeRaster(r3,filename="r3.tif", format="GTiff", datatype='INT2S')
then load back into R.
You can convert the output after the calculation, but it does not change the size of the object:
dataType(r3)<-'INT2S'
You can check the object size using:
object.size(r3)
If you write the data to disk and reload the object size will be smaller.
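A minimal sketch of the write-and-reload round trip (temporary file, just for illustration):
f <- tempfile(fileext = ".tif")
writeRaster(r3, filename = f, datatype = "INT2S")
r3_int <- raster(f)
dataType(r3_int)     # "INT2S" -- now describes the file on disk
object.size(r3_int)  # smaller, because the values stay on disk until needed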
Another option is to use the calc or overlay functions in raster to do the math, save to disk, and return the resulting raster in one go. I wrote the example rasters to disk to start, since Robert said you should not set dataType directly, and in real cases you will probably be reading from disk anyway.
# load package raster
library(raster)
# create sample rasters on disk
r1 <- raster::raster(ext = extent(c(0, 10, 0, 10)), res = 1, vals = 1:100)
r2 <- raster::raster(ext = extent(c(0, 10, 0, 10)), res = 1, vals = 100:1)
r1_fl <- tempfile(fileext = ".tif")
r2_fl <- tempfile(fileext = ".tif")
r3_fl <- tempfile(fileext = ".tif")
writeRaster(r1, r1_fl, datatype = "INT2S")
writeRaster(r2, r2_fl, datatype = "INT2S")
# read rasters from disk
r1 <- raster(r1_fl)
r2 <- raster(r2_fl)
# check dataType of sample rasters
dataType(r1)
dataType(r2)
# do some simple (or complex) arithmetic
r3 <- overlay(r1, r2, fun = function(x, y){x - y}, filename = r3_fl,
datatype = "INT2S")
# check the dataType of the output raster
dataType(r3)
I am a beginner in R and text mining. I have already performed the LDA and now I want to visualise my results with the LDAvis package. I have followed every step of the GitHub example (https://ldavis.cpsievert.me/reviews/reviews.html) starting from the 'Visualizing' chapter. However, I either get error notifications or empty pages.
I have tried the following:
RedditResults <- list(phi = phi,
theta = theta,
doc.length = doc.length,
vocab = vocab,
term.frequency = term.frequency)
json <- createJSON(phi = RedditResults$phi,
theta = RedditResults$theta,
doc.length = RedditResults$doc.length,
vocab = RedditResults$vocab,
term.frequency = RedditResults$term.frequency)
serVis(json, out.dir = "vis", open.browser = FALSE)
However, this gives me an error display saying:
Error in cat(list(...), file, sep, fill, labels, append) :
argument 1 (type 'closure') cannot be handled by 'cat'
I reasoned this might have happened because the 'json' object is of class 'function' rather than the character string that serVis requires. Therefore I tried to convert it before calling serVis by means of:
RedditResults <- sapply(RedditResults, toJSON)
Resulting in the following error:
Error in run(timeoutMs) :
Evaluation error: argument must be a character vector of length 1.
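A quick way to check the class-mismatch diagnosis is to inspect the object before handing it to serVis:
class(json)   # serVis expects a length-one character string of JSON
length(json)  # should be 1; 'function' above would mean some other object named json is being found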
I feel like I'm making a very obvious mistake somewhere, but after days of trial and error I haven't been able to spot what I should do differently.
The weirdest thing to me is that sometimes it does work, but when I then try to open the HTML file I only see a blank page. I have tried opening it in multiple browsers, as well as allowing those browsers to display local files. I have also tried serving it with the servr package, but this gives the same result: either an error notification (character vector length is not equal to 1) or an empty page.
Hope anyone can spot what I'm doing wrong. Thanks!
EDIT: objects/code underlying the code above:
Convenient to know:
I cleaned the data in corpus form (reddit_data_textcleaned) before converting it to my document-term matrix (tdm3).
After converting it to tdm3, I eliminated any 'empty' documents by excluding those with fewer than 2 words. Thus, 'reddit_data_textcleaned' contains more documents than are relevant, and 'tdm3' contains the data I want to work with.
'fit3' is the fitted model resulting from running LDA on tdm3.
'DTM' is the term-document matrix with exactly the same data as tdm3, but with rows/columns transposed.
I am aware that it makes very little sense to call your term-document matrix 'DTM' whilst naming your document-term matrix 'tdm3', given the abbreviations. Sorry about that.
phi <- as.matrix(posterior(fit3)$terms)
theta <- as.matrix(posterior(fit3)$topics)
dp <- dim(phi) # should be K x W
dt <- dim(theta) # should be D x K
D <- length(as.matrix(tdm3[, 1])) # number of documents (2812)
doc.length <- colSums(as.matrix(DTM)) # number of tokens in each document
N <- sum(doc.length) # total number of tokens in the data (54,136)
vocab <- colnames(phi) # all terms in the vocab
W <- length(vocab) # number of terms in the vocab (6470)
temp_frequency <- inspect(tdm3)
freq_matrix <- data.frame(ST = colnames(temp_frequency),
Freq = colSums(temp_frequency))
rm(temp_frequency)
term.frequency <- freq_matrix$Freq
doc.list <- as.list(reddit_data_textcleaned, "[[:space:]]+")
get.terms <- function(x) {
  index <- match(x, vocab)
  index <- index[!is.na(index)]
  rbind(as.integer(index - 1), as.integer(rep(1, length(index))))
}
documents <- lapply(doc.list, get.terms)
I presume something goes wrong in the creation of the 'get.terms' and 'documents' objects, as I don't know exactly what happens there; I used these methods based on answers to similar questions on this platform. Also, the 'doc.list' object still contains the empty documents I removed from the data when converting 'reddit_data_textcleaned' to 'tdm3'. However, the code above doesn't work with a document-term matrix object, which is why I used 'reddit_data_textcleaned' instead of 'tdm3'. I figured I would fix that issue later.
I am currently attempting to remove NA values from a huge raster file (1.9*10^7 cells), in which 99.9% of the values are NA. My aim is to remove the NAs and create a .csv file containing all non-NA values together with their coordinates.
My attempt is as follows:
# Load packages
packs = c('raster', 'rgdal')
sapply(packs, FUN = 'require', character.only = TRUE)
xy <- xyFromCell(raster, 1:ncell(raster))
v <- as.data.frame(raster)
xyv <- data.frame(xy, v)
rm(xy,v)
xyv <- na.omit(xyv)
write.csv(xyv, file ="raster.csv", row.names = F)
When I execute na.omit(), R/RStudio reports that it has encountered a fatal error and terminates. Is there a simpler and faster way to do this?
You can use the rasterToPoints function for that; it drops NA cells by default, so the full data frame never needs to be built.
library(raster)
r <- raster()
r[50:52] <- 1:3
xyv <- rasterToPoints(r)
write.csv(xyv, file ="raster.csv", row.names = FALSE)
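If you run into memory trouble even with rasterToPoints on a very large file-backed raster, here is a blockwise sketch (file name hypothetical) that streams the non-NA cells to the CSV chunk by chunk:
library(raster)
r <- raster("huge_file.tif")  # hypothetical file-backed raster
bs <- blockSize(r)            # raster's suggested row chunks
con <- file("raster.csv", "w")
writeLines("x,y,value", con)
for (i in seq_len(bs$n)) {
  v <- getValues(r, row = bs$row[i], nrows = bs$nrows[i])
  first <- cellFromRowCol(r, bs$row[i], 1)
  cells <- first:(first + length(v) - 1)
  keep <- !is.na(v)
  if (any(keep)) {
    write.table(data.frame(xyFromCell(r, cells[keep]), value = v[keep]),
                con, sep = ",", col.names = FALSE, row.names = FALSE)
  }
}
close(con)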
Whenever I see a large array with mostly missing values, I think "sparse matrix" as an efficient way to hold the data. If the non-missing data in your raster are all non-zero, then using a sparse matrix is straightforward. If there are zeros in the data, then one extra step (included below) is needed.
First let's create a large raster with mostly NAs, and also a matrix from it:
my.raster <- raster(nrows=1e3, ncols=1e4, xmn=0, xmx=10, vals=NA)
my.raster[sample(1:(1e3*1e4), 100)] <- as.integer(runif(100,0,100))
my.matrix <- as.matrix(my.raster)
Sparse matrices only store the non-zero elements, so to make this sparse we need to change NAs to zeroes. In case the data already contain zeroes that we don't want to lose track of, we store the locations of those zeroes before making the matrix sparse.
library(Matrix)
# find zeroes via raster cell numbers: matrix linear indices are column-major,
# while raster cells are numbered row by row, so which() on the matrix would misplace them
zeros <- data.frame(xyFromCell(my.raster, which(values(my.raster) == 0)), val = 0)
my.matrix[is.na(my.matrix)] <- 0
sp <- as(Matrix(my.matrix, sparse = TRUE), "dgTMatrix") # use triplet form of sparse matrix
Now the values are in sp@x, and the corresponding row and column indices are in sp@i and sp@j (both 0-based). So, to save to .csv:
# +1 converts the 0-based slot indices to 1-based row/column numbers
my.df <- data.frame(x = xFromCol(my.raster, sp@j + 1), y = yFromRow(my.raster, sp@i + 1), val = sp@x)
my.df <- rbind(zeros, my.df)
write.csv(my.df, file ="raster.csv", row.names = F)
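As a quick sanity check (a sketch), the result should contain exactly the same cells that rasterToPoints reports:
chk <- data.frame(rasterToPoints(my.raster))
nrow(chk) == nrow(my.df)  # TRUE: same number of non-NA cells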
I've been trying to find a time-efficient way to merge multiple raster images in R. These are adjacent ASTER scenes from the southern Kilimanjaro region, and my target is to put them together to obtain one large image.
This is what I got so far (object 'ast14dmo.sd' representing a list of RasterLayer objects):
# Loop through single ASTER scenes
for (i in seq(ast14dmo.sd)) {
  if (i == 1) {
    # Merge current with subsequent scene
    ast14dmo.sd.mrg <- merge(ast14dmo.sd[[i]], ast14dmo.sd[[i+1]], tolerance = 1)
  } else if (i > 1 && i < length(ast14dmo.sd)) {
    tmp.mrg <- merge(ast14dmo.sd[[i]], ast14dmo.sd[[i+1]], tolerance = 1)
    ast14dmo.sd.mrg <- merge(ast14dmo.sd.mrg, tmp.mrg, tolerance = 1)
  } else {
    # Save merged image
    writeRaster(ast14dmo.sd.mrg, paste(path.mrg, "/AST14DMO_sd_", z, "m_mrg", sep = ""), format = "GTiff", overwrite = TRUE)
  }
}
As you surely guess, the code works. However, merging takes quite long considering that each single raster object is some 70 MB in size. I also tried Reduce and do.call, but both failed since I couldn't pass the argument 'tolerance', which works around the differing origins of the raster files.
Anybody got an idea of how to speed things up?
You can use do.call
ast14dmo.sd$tolerance <- 1
ast14dmo.sd$filename <- paste(path.mrg, "/AST14DMO_sd_", z, "m_mrg.tif", sep = "")
ast14dmo.sd$overwrite <- TRUE
mm <- do.call(merge, ast14dmo.sd)
Here it is with some data, from the example in ?raster::merge:
r1 <- raster(xmx=-150, ymn=60, ncols=30, nrows=30)
r1[] <- 1:ncell(r1)
r2 <- raster(xmn=-100, xmx=-50, ymx=50, ymn=30)
res(r2) <- c(xres(r1), yres(r1))
r2[] <- 1:ncell(r2)
x <- list(r1, r2)
names(x) <- c("x", "y")
x$filename <- 'test.tif'
x$overwrite <- TRUE
m <- do.call(merge, x)
The merge function from the raster package is a little slow. For large projects, a faster option is to work with GDAL commands in R.
library(gdalUtils)
library(rgdal)
Build a list of all the raster files you want to join (here assumed to be in your current working directory).
all_my_rasts <- c('r1.tif', 'r2.tif', 'r3.tif')
Make a template raster file to build onto. Think of it as a big blank canvas to add tiles to.
e <- extent(-131, -124, 49, 53)
template <- raster(e)
projection(template) <- '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs'
writeRaster(template, filename = "MyBigNastyRasty.tif", format = "GTiff")
Merge all raster tiles into one big raster.
mosaic_rasters(gdalfile = all_my_rasts, dst_dataset = "MyBigNastyRasty.tif", of = "GTiff")
gdalinfo("MyBigNastyRasty.tif")
This should work pretty well for speed (faster than merge in the raster package), but if you have thousands of tiles you might even want to look into building a VRT (virtual raster) first, as sketched below.
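A sketch of the VRT route with gdalUtils (file names hypothetical):
# build a lightweight virtual mosaic, then materialize it as a GeoTIFF
gdalbuildvrt(gdalfile = all_my_rasts, output.vrt = "mosaic.vrt")
gdal_translate(src_dataset = "mosaic.vrt", dst_dataset = "big_mosaic.tif", of = "GTiff")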
You can use Reduce, like this for example:
Reduce(function(...) merge(..., tolerance = 1), ast14dmo.sd)
The SAGA GIS mosaicking tool (http://www.saga-gis.org/saga_tool_doc/7.3.0/grid_tools_3.html) gives you maximum flexibility for merging numeric layers, and it runs in parallel by default! You only have to translate all rasters/images to SAGA's .sgrd format first, then run saga_cmd from the command line.
I have tested the gdalUtils solution proposed by Matthew Bayly. It works quite well and fast (I have about 1000 images to merge). However, after checking the documentation of the mosaic_rasters function, I found that it works without making a template raster before mosaicking the images. I pasted the example code from the documentation below:
outdir <- tempdir()
gdal_setInstallation()
valid_install <- !is.null(getOption("gdalUtils_gdalPath"))
if (require(raster) && require(rgdal) && valid_install) {
  layer1 <- system.file("external/tahoe_lidar_bareearth.tif", package="gdalUtils")
  layer2 <- system.file("external/tahoe_lidar_highesthit.tif", package="gdalUtils")
  mosaic_rasters(gdalfile = c(layer1, layer2), dst_dataset = file.path(outdir, "test_mosaic.envi"),
                 separate = TRUE, of = "ENVI", verbose = TRUE)
  gdalinfo("test_mosaic.envi")
}
I was faced with this same problem, and this is what I used:
# Read desired files into R
data_name1 <- 'file_name1.tif'
r1 <- raster(data_name1)
data_name2 <- 'file_name2.tif'
r2 <- raster(data_name2)
# Merge files
new_data <- raster::merge(r1, r2)
Although it did not produce a new merged raster file on disk, the result was stored in the R environment and produced a merged map when plotted.
I ran into the following problem when trying to mosaic several rasters on top of each other:
In vv[is.na(vv)] <- getValues(x[[i]])[is.na(vv)] :
number of items to replace is not a multiple of replacement length
As @Robert Hijmans pointed out, it was likely because of misaligned rasters. To work around this, I had to resample the rasters first:
library(raster)
x <- raster("Base_raster.tif")
r1 <- raster("Top1_raster.tif")
r2 <- raster("Top2_raster.tif")
# Resample
x1 <- resample(r1, crop(x, r1))
x2 <- resample(r2, crop(x, r2))
# Merge rasters. Make sure to use the right order
m <- merge(merge(x1, x2), x)
# Write output
writeRaster(m,
filename = file.path("Mosaic_raster.tif"),
format = "GTiff",
overwrite = TRUE)