I would like to remove the pixels that form a large cluster and keep only the small cluster for analysis (i.e., get the number of pixels and their locations). First I apply a filter that colors white all pixels with a value lower than 0.66. Then I use the clump() function in R. The code runs, but I cannot remove only the large cluster. I do not understand how the clump() function works.
Initial image:
Result images: plot_r is the image where the pixels with value < 0.66 are changed to 0; plot_rc is the result after the clump() function. As can be seen, I cannot remove only the large cluster of pixels (at the top of the plot_r image). I changed the value (700 in the code), but it did not improve. How can I do this?
Here is the code:
library(magick)
library(pixmap)
library(raster)
library(igraph)
f <- "https://i.stack.imgur.com/2CjCh.jpg"
x <- image_read(f)
x <- image_convert(x, format = "pgm", depth = 8)
# Save the PGM file
f <- tempfile(fileext = ".pgm")
image_write(x, path = f, format = "pgm")
# Read in the PGM file
picture <- read.pnm(file = f, cellres = 1)
str(picture)
picture@size
mat <- picture@grey
mat[mat < 0.66] <- 0
##############################################################
##Remove clumps of pixels in R using package Raster and igraph
#Detect clumps (patches) of connected cells
r <-raster(mat)
rc <- clump(r)
#extract IDs of clumps according to some criteria
clump9 = data.frame(freq(rc))
#remove clump observations with frequency smaller/larger than N
clump9 = clump9[ ! clump9$count > 700, ]
# record IDs from clumps which met the criteria in previous step
clump9 = as.vector(clump9$value)
#replace cells with IDs which do not belong to the group of interest
rc[rc != clump9[1] & rc != clump9[2]] = NA
# converting rasterlayer to matrix
n <- as.matrix(r)
m <- as.matrix(rc)
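For intuition about what clump() does, here is a tiny toy example (my addition, not part of the original post): clump() gives every group of connected non-NA cells its own integer ID, and freq() then counts the cells in each group, which is what the count > 700 filter above works on.
library(raster)
library(igraph)  # clump() needs igraph
toy <- raster(matrix(c(1, 1, NA, NA, NA,
                       1, 1, NA, NA, NA,
                       NA, NA, NA, 1, NA,
                       NA, NA, NA, 1, 1), nrow = 4, byrow = TRUE))
tc <- clump(toy)
as.matrix(tc)  # the 2x2 block gets one ID, the small L-shaped patch another
freq(tc)       # cell counts per clump ID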
Perhaps something like this
library(raster)
library(igraph)
Short-cutting your approach a bit
f <- "https://i.stack.imgur.com/2CjCh.jpg"
b <- brick(f)
x <- sum(b)
r <- x > 450
rc <- clump(r)
f <- freq(rc, useNA="no")
Replace each clump with the number of cells it consists of, then set the larger one (here, more than 100 cells) to NA, and use the result to mask the original raster:
rs <- subs(rc, data.frame(f))
rsc <- reclassify(rs, cbind(100,Inf,NA))
m <- mask(b, rsc)
plotRGB(m)
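Since the goal was also to get the number and locations of the remaining pixels, a possible follow-up (my addition, building on the rsc object above) is:
keep <- Which(!is.na(rsc), cells = TRUE)  # cell numbers of the remaining small clump(s)
length(keep)                              # number of pixels kept
head(xyFromCell(rsc, keep))               # their x/y coordinates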
I am trying to read in chunks of a large data set:
find the mean of each chunk (representing a larger column)
add the mean into a matrix column
then find the mean of the means to give me the overall mean of the column.
I have the setup, but my while loop is not repeating its cycle. I think the problem may be with how I am referring to "chunks" and "chunk".
This is a practice exercise using "iris.csv" in R.
fl <- file("iris.csv", "r")
clname <- readLines(fl, n=1) # read the header
r <- unlist(strsplit(clname,split = ","))
length(r) # get the number of columns in the matrix
cm <- matrix(NA, nrow=1000, ncol=length(r)) # need a matrix that can be filled on each iteration
numchunk = 0 #set my chunks of code to build up
while(numchunk <= 0){ #stop when no more chunks left to run
numchunk <- numchunk + 1 # keep on moving through chunks of code
x <- readLines(fl, n=100) #read 100 lines at a time
chunk <- as.numeric(unlist(strsplit(x,split = ","))) # readable chunk of code
m <- matrix(chunk, ncol=length(r), byrow = TRUE) # put chunk in a matrix
cm[numchunk,] <- colMeans(m) #get the column means of the matrix and fill in larger matrix
print(numchunk) # print the number of chunks used
}
cm
close(fl)
final_mean <- colSums(cm)/nrow(cm)
return(final_mean)
--
This works when I set n = 1000, but I want it to work for larger data sets, where the while loop will need to keep running.
Can anyone help me correct this please?
Perhaps this helps:
clname <- readLines(fl, n=1) # read the header
r <- unlist(strsplit(clname,split = ","))
length(r) # get the number of columns in the matrix
cm <- matrix(NA, nrow=1000, ncol=length(r))
numchunk = 0
flag <- TRUE
while(flag){
numchunk <- numchunk + 1 # keep on moving through chunks of code
x <- readLines(fl, n=5)
print(length(x))
if(length(x) == 0) {
flag <- FALSE
} else {
chunk <- as.numeric(unlist(strsplit(x,split = ","))) # readable chunk of code
m <- matrix(chunk, ncol=length(r), byrow = TRUE) # put chunk in a matrix
cm[numchunk,] <- colMeans(m) #get the column means of the matrix and fill in larger matrix
print(numchunk) # print the number of chunks used
}
}
cm
close(fl)
final_mean <- colSums(cm)/nrow(cm)
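One caveat (my note, not part of the answer above): cm was pre-allocated with 1000 NA rows, so colSums(cm)/nrow(cm) returns NA. Averaging only the rows the loop actually filled avoids that, for example:
filled <- cm[seq_len(numchunk - 1), , drop = FALSE]  # rows written before the empty read
final_mean <- colMeans(filled)
When every chunk has the same number of rows, the mean of the chunk means equals the overall column mean.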
First, it might be helpful to define a helper function r2v() to split raw lines into useful vectors.
r2v <- Vectorize(\(x) {
## splits raw lines to vectors
strsplit(gsub('\\"', '', x), split=",")[[1]][-1]
})
After opening the file, check its size without reading it in, using system() and shell commands (wc is not available on Windows, so a different approach is needed there).
## open file
f <- 'iris.csv'
fl <- file(f, "r")
## rows
(nr <- as.integer(gsub(paste0('\\s', f), '', system(paste('wc -l', f), intern=TRUE))) - 1)
# nr <- 150 ## alternatively define nrows manually
# [1] 150
## columns
nm <- readLines(fl, n=1) |> r2v()
(nc <- length(nm))
# [1] 4
Next, define a chunk size by which the rows can be divided.
## define chunk size
ch_sz <- 50
stopifnot(nr %% ch_sz == 0) ## all chunks should be filled
Then, using replicate(), we calculate chunk-wise rowMeans() (because we get the chunks transposed), and finally rowMeans() again over everything to get the column means of the entire matrix (valid here because all chunks have equal size, so the mean of the chunk means equals the overall mean).
## calculate means chunk-wise
final_mean <-
replicate(nr / ch_sz,
rowMeans(type.convert(r2v(readLines(fl, n=ch_sz)), as.is=TRUE))) |>
rowMeans()
close(fl)
Let's validate the result.
## test
all.equal(final_mean, as.numeric(colMeans(iris[-5])))
# [1] TRUE
Data:
iris[-5] |>
write.csv('iris.csv')
I have generated a pixel-based image by encoding each input character to a certain color in the image. For example, for the input txt <- "ABACDAAFFEDDADFAFAED" I plotted 'A' as a red pixel, 'B' as purple, 'C' as blue and 'D' as some other color. I used R for it. Here is the answer that I took help from for this:
Generate pixel based image in R from character array
Now I want to update this to also handle the case where a character appears two or three times consecutively, and to give it a different color. For example, for txt <- "ABBACDAABBBEDDADCACABBDB" I want to give:
A: red, AA: maroon, AAA: dark red;
B: green, BB: pink, BBB: yellow;
C: light brown, CC: brown, CCC: dark brown; etc.
I still want to give one pixel to each character, but for two or three consecutive appearances I want to color those two or three pixels with a different color. I am unable to code a reasonable solution for this in R. Your help will be appreciated. Thanks.
I changed the function to support multi-character mappings:
library(png)
library(tiff)
library(abind)
# function which plots the image
createImage <- function(txt,charToColorMap,destinationFile,format=c('png','tiff'),debugPlot=FALSE,unused.char='#'){
if(nchar(unused.char) != 1){
stop('unused.char must be a single character, and you should be sure that it will never be present in your text')
}
# helper function which finds all the divisors of a number
divisors <- function(x){
y <- seq_len(x)
y[ x%%y == 0 ]
}
# split the string into characters
chars <- strsplit(txt,'')[[1]]
# find the most "squared" rectangle that contains all the characters without padding
d <- divisors(length(chars))
y <- d[length(d) %/% 2]
x <- length(chars) / y
# create an array with 4 matrices (or planes) one for each RGBA channel
RGBAmx <- col2rgb(charToColorMap,alpha=TRUE) / 255
colorIndexes <- match(chars,names(charToColorMap))
######################################
# MULTIPLE CHAR
######################################
# check if color map contains multiple character names
multiple <- names(charToColorMap)[nchar(names(charToColorMap)) > 1]
multiple <- multiple[order(nchar(multiple),decreasing=TRUE)]
txtForMultiple <- txt
for(m in multiple){
idxs <- gregexpr(pattern=m,text=txtForMultiple,fixed=TRUE)[[1]]
charRanges <- unlist(lapply(idxs,seq,length.out=nchar(m)))
colorIndexes[charRanges] <- which(names(charToColorMap)==m)[1]
tmp <- strsplit(txtForMultiple,'')[[1]]
tmp[charRanges] <- unused.char
txtForMultiple <- paste(tmp,collapse='')
}
#########################################################
colorIndexesR <- matrix(RGBAmx['red',colorIndexes],nrow=y,ncol=x,byrow = TRUE)
colorIndexesG <- matrix(RGBAmx['green',colorIndexes],nrow=y,ncol=x,byrow = TRUE)
colorIndexesB <- matrix(RGBAmx['blue',colorIndexes],nrow=y,ncol=x,byrow = TRUE)
colorIndexesA <- matrix(RGBAmx['alpha',colorIndexes],nrow=y,ncol=x,byrow = TRUE)
planes <- abind(colorIndexesR,colorIndexesG,colorIndexesB,colorIndexesA,along=3)
# write the PNG image
if(format[1] == 'png'){
writePNG(planes,destinationFile)
}else if(format[1] == 'tiff'){
writeTIFF(planes,destinationFile)
}else{
stop('unsupported format')
}
# for debug purpose only we plot the image...
if(debugPlot){
mx <- matrix(colorIndexes,nrow=y,ncol=x,byrow = TRUE)
image(z=t(mx[nrow(mx):1,]),col=charToColorMap)
}
invisible()
}
Usage example ('AAA' set to white):
charToColorMap <- c(A='red',B='blue',C='green',D='black',E='yellow',F='orange',AAA='white')
txt <- "ABACAAAFFEDDADFAFAED"
# please note that unused.char will be used to mark the characters of txt already analyzed
# during the multi-char handling, so it must not be present in txt
createImage(txt,charToColorMap,destinationFile = "test.png",debugPlot=TRUE,unused.char='#')
Result (zoom 800 %):
In my example I create a raster:
require(raster); require(sp)
## Raster creation
r <- raster(nc=10, nr=10)
r <- setValues(r, round(runif(ncell(r))* 255))
Then I extract pixel values by selecting coordinates:
x <- c(-150)
y <- c(-80)
p <- data.frame(x,y)
pontos <- SpatialPoints(p)
p$cel <- cellFromXY(r, pontos)
p$col <- colFromCell(r, p$cel)
p$row <- rowFromCell(r, p$cel)
p
plot(r)
text(r)
points(pontos, pch = 4, col = 2)
But I'd like a way to extract pixel values where I select one coordinate and the function extracts entire horizontal lines of the raster, every two pixels (i.e., every third row) from the given coordinate. For example, I choose the coordinates xy(-150, -80), but my function below returns values only for the first line, and I also need lines 4, 7 and 10.
require(plyr)
vals2cols <- ldply(1:nrow(p),
function(ir){
getValuesBlock(r,
col = p$col[ir],
ncols = 10,
row = p$row[ir],
nrows = 1)
}# end fun
)
df <- data.frame(p, vals2cols)
df
Is this possible?
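One possible interpretation, as a hedged sketch (my addition, not an answer from the thread): take every third row of the raster, aligned with the row of the selected coordinate, and extract each of those rows in full with getValuesBlock(). With the point above this yields rows 1, 4, 7 and 10.
require(raster); require(sp)
r <- raster(nc = 10, nr = 10)
r <- setValues(r, round(runif(ncell(r)) * 255))
pontos <- SpatialPoints(data.frame(x = -150, y = -80))
startRow   <- rowFromCell(r, cellFromXY(r, pontos))
rowsWanted <- seq(((startRow - 1) %% 3) + 1, nrow(r), by = 3)  # every third row, aligned with startRow
vals2rows <- t(sapply(rowsWanted, function(rw) {
  getValuesBlock(r, row = rw, nrows = 1, col = 1, ncols = ncol(r))  # one full horizontal line
}))
df <- data.frame(row = rowsWanted, vals2rows)
df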
I'd like to merge two datasets based on a common column. Dataset A is a geoTIFF image, representing RGB values of an area. Dataset B is a point cloud with xyz values of the same area.
I want to merge the RGB info from the image into the 3D data. I thought of using the x/y coordinates of the two datasets (which are in the same coordinate system).
I wrote a script inspired by code snippets found on Stack Overflow, but I still need to complete my code (sources are 1, 2, and 3).
The issue is that the x/y coordinates in the two files have different precision (numbers of decimal digits). Dataset A has 0 to 2 digits; dataset B has many more. I rounded dataset B to 2 digits. Now I'd like to pad with zeros when dataset A has fewer than 2 digits, so that the final merge will hopefully work.
Would a simple if statement be fine considering that my dataset has >280,000 rows? Or should I go for indexing? Anyway, I'm fairly new to R, so I hope posters can help me with a code example. Below is my code:
require(raster)
require(rgl)
setwd("C:/my/folder")
# Read tiff file
img <- stack("image.tif")
vals <- extract(img, 1:ncell(img))
coord <- xyFromCell(img, 1:ncell(img))
combine <- cbind(coord, vals)
remove(vals)
remove(coord)
# read POINTCLOUD and assign names
lidar <- read.table("lidardata.txt")
names(lidar) <- c("x","y","z")
decimalplaces <- function(x) { # count the decimal digits of x
if ((x %% 1) != 0) {
nchar(strsplit(sub('0+$', '', as.character(x)), ".", fixed=TRUE)[[1]][[2]])
} else {
return(0)
}
}
# HERE I SHOULD PAD THE LIDAR VARIABLE WITH ZEROS IN DECIMAL POSITIONS WHEN THE DIGITS ARE LESS THAN 2!!!
lidar$xy <- do.call(paste0,lidar[,1:2])
combine$x <- round(combine$x, digits = 2)
combine$y <- round(combine$y, digits = 2)
combine$xy <- do.call(paste0,combine[1:2])
finaldata <- merge(combine,lidar,by = 'xy', all = FALSE)
EDIT 1
As suggested by @Heroka, here is also an example of what the lidar data (dataset A) look like, and how they should look after padding with zeros.
LIDAR (original)
x y z
12 9 87
11 23.4 100
LIDAR (altered, and with 'xy' column added for joining)
x y z xy
12.00 9.00 87 12.009.00
11.00 23.40 100 11.0023.40
EDIT 2
I somehow managed to retrieve the number of digits in all x and y of my 'lidar' variable (dataset B) with counting <- sapply(lidar$x, decimalplaces)
In the example above (LIDAR original), this would give [0 0] for the first (x) column and [0 1] for the second (y) column. I should be able to find each row in my x/y dataset with 0 or 1 digits (not 2) and pad with 0, as in the LIDAR (altered) example above.
I do not understand why you need to pad with zeros. If the coordinates are of class numeric and both were rounded using round() (which should avoid issues of floating-point precision), you should be able to just merge by them. Something like this:
lidar$x <- round(lidar$x, 2)
lidar$y <- round(lidar$y, 2)
combine$x <- round(combine$x, digits = 2)
combine$y <- round(combine$y, digits = 2)
finaldata <- merge(combine, lidar, by = c("x", "y") , all = FALSE)
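If fixed-width string keys are still wanted (to reproduce the xy column from the question), formatting with sprintf() is one option (my suggestion, not part of the answer above); a separator between the two coordinates avoids ambiguous concatenations, and combine may need converting first because the cbind() above produced a matrix:
combine <- as.data.frame(combine)  # cbind() of coord and vals gave a matrix
lidar$xy   <- paste0(sprintf("%.2f", lidar$x), "_", sprintf("%.2f", lidar$y))
combine$xy <- paste0(sprintf("%.2f", combine$x), "_", sprintf("%.2f", combine$y))
finaldata  <- merge(combine, lidar, by = "xy", all = FALSE)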
I would like to crop the no-data part of some rasters (see the example image in 1, where no-data is in black) without defining the extent manually.
Any idea?
You can use trim to remove exterior rows and columns that only have NA values:
library(raster)
r <- raster(ncols=18,nrows=18)
r[39:49] <- 1
r[205] <- 6
s <- trim(r)
To change other values to or from NA you can use reclassify. For example, to change NA to 0:
x <- reclassify(r, cbind(NA, 0))
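For the question's case, where the no-data area is stored as an actual value (black, presumably 0) rather than NA, the mapping would go the other way before trimming. A minimal sketch, assuming 0 is the no-data value (hypothetical example, not from the answer above):
r0 <- raster(ncols=18, nrows=18)
r0 <- setValues(r0, rep(0, ncell(r0)))  # assumed: background (no-data) coded as 0
r0[39:49] <- 1
x <- reclassify(r0, cbind(0, NA))  # turn the no-data value into NA
s <- trim(x)                       # then trim away the all-NA margins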
[ subsetting and [<- replacement methods are defined for raster objects, so you can simply do r[ r[] == 1 ] <- NA to get rid of the values, where 1 is your no-data value (use NAvalue(r) to find out what R considers your no-data value to be, if you aren't sure).
Note you have to use r[] inside the [ subsetting command to access the values. Here is a worked example...
Example
# Make a raster from system file
logo1 <- raster(system.file("external/rlogo.grd", package="raster"))
# Copy to see difference
logo2 <- logo1
# Set all values in logo2 that are > 230 to be NA
logo2[ logo2[] > 230 ] <- NA
# Observe difference
par( mfrow = c( 1,2 ) )
plot(logo1)
plot(logo2)
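Since the original question is about cropping, the all-NA margins introduced this way could then presumably be removed with trim(), as in the first answer (my addition):
logo3 <- trim(logo2)  # drop exterior rows/columns that are entirely NA
plot(logo3)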
I have two slightly different solutions. The first requires manually identifying the extent but uses predefined functions. The second is more automatic, but a bit more handmade.
Create a reproducible raster for which the first 2 rows are NA
library(raster)
# Create a reproducible example
r1 <- raster(ncol=10, nrow=10)
# The first 2 rows are filled with NAs (no value)
r1[] <- c(rep(NA,20),21:100)
Solution #1
Manually get the extent from the plotted figure using drawExtent()
plot(r1)
r1CropExtent <- drawExtent()
Crop the raster using the extent selected from the figure
r2 <- crop(r1, r1CropExtent)
Plot for comparison
layout(matrix(1:2, nrow=1))
plot(r1)
plot(r2)
Solution #2
It identifies the rows and columns of the raster that only have NA values and removes the ones that are on the margin of the raster. It then calculates the extent using extent().
Transform the raster into a matrix that identifies whether the values are NA or not.
r1NaM <- is.na(as.matrix(r1))
Find the columns and rows that are not completely filled by NAs
colNotNA <- which(colSums(r1NaM) != nrow(r1))
rowNotNA <- which(rowSums(r1NaM) != ncol(r1))
Find the extent of the new raster by using the first and last columns and rows that are not completely filled by NAs. Use crop() to crop the new raster.
r3Extent <- extent(r1, rowNotNA[1], rowNotNA[length(rowNotNA)],
colNotNA[1], colNotNA[length(colNotNA)])
r3 <- crop(r1, r3Extent)
Plot the rasters for comparison.
layout(matrix(1:2, nrow=1))
plot(r1)
plot(r3)
I have written a small function based on Marie's answer to quickly plot cropped rasters. However, there may be a memory issue if the raster is extremely large, because the computer may not have enough RAM to load the raster as a matrix.
I therefore wrote a memory safe function which will use Marie's method if the computer has enough RAM (because it is the fastest way), or a method based on raster functions if the computer does not have enough RAM (it is slower but memory-safe).
Here is the function:
plotCroppedRaster <- function(x, na.value = NA)
{
if(!is.na(na.value))
{
x[x == na.value] <- NA
}
if(canProcessInMemory(x, n = 2))
{
x.matrix <- is.na(as.matrix(x))
colNotNA <- which(colSums(x.matrix) != nrow(x))
rowNotNA <- which(rowSums(x.matrix) != ncol(x))
croppedExtent <- extent(x,
r1 = rowNotNA[1],
r2 = rowNotNA[length(rowNotNA)],
c1 = colNotNA[1],
c2 = colNotNA[length(colNotNA)])
plot(crop(x, croppedExtent))
} else
{
xNA <- is.na(x)
colNotNA <- which(colSums(xNA) != nrow(x))
rowNotNA <- which(rowSums(xNA) != ncol(x))
croppedExtent <- extent(x,
r1 = rowNotNA[1],
r2 = rowNotNA[length(rowNotNA)],
c1 = colNotNA[1],
c2 = colNotNA[length(colNotNA)])
plot(crop(x, croppedExtent))
}
}
Examples :
library(raster)
r1 <- raster(ncol=10, nrow=10)
r1[] <- c(rep(NA,20),21:100)
# Uncropped
plot(r1)
# Cropped
plotCroppedRaster(r1)
# If the no-data value is different, for example 0
r2 <- raster(ncol=10, nrow=10)
r2[] <- c(rep(0,20),21:100)
# Uncropped
plot(r2)
# Cropped
plotCroppedRaster(r2, na.value = 0)
If you use the rasterVis package (any version after Jun 25, 2021), it will automatically crop the NA values out for terra's SpatRaster objects.
Install rasterVis development version from GitHub
if (!require("librarian")) install.packages("librarian")
librarian::shelf(raster, terra, oscarperpinan/rastervis)
# Create a reproducible example
r1 <- raster(ncol = 10, nrow = 10)
# The first 2 rows are filled with NAs (no value)
r1[] <- c(rep(NA, 20), 21:100)
levelplot() for r1
rasterVis::levelplot(r1,
margin = list(axis = TRUE))
Convert to terra's SpatRaster then plot again using levelplot()
r2 <- rast(r1)
rasterVis::levelplot(r2,
margin = list(axis = TRUE))
Created on 2021-06-26 by the reprex package (v2.0.0)