I want to calculate the share (%) of pixels classified as 1 for each file in a list of files. For a single image the code works well, but when I put it in a for loop, R returns named numeric(0) for every file.
How do I get what I want?
Single Image:
ras <- raster("path") # binary product
ras_df <- as.data.frame(ras) # creates data frame
ras_table <- table(ras_df$file) # creates table
share_suit_hab <- ras_table[names(ras_table)==1]/sum(ras_table[names(ras_table)]) # number of pixels with value 1 divided by sum of pixels with value 0 and 1 = share of suitable habitat (%)
print(share_suit_hab)
> ras
class : RasterLayer
dimensions : 1000, 1000, 1e+06 (nrow, ncol, ncell)
resolution : 2165.773, 2463.182 (x, y)
extent : -195054.2, 1970719, 2723279, 5186461 (xmin, xmax, ymin, ymax)
crs : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
source : C:/Users/name/MASTERARBEIT/BASELINE/Eastern Arctic/Summer_EA_Output/ct/2006/cis_SGRDREA_20060703_pl_a.tif
names : cis_SGRDREA_20060703_pl_a
values : 0, 1 (min, max)
For Loop:
list_ct <- list.dirs("path")
i=0
for(year in list_ct){
ct_files_list <- list.files(year, recursive = FALSE, pattern = "\\.tif$", full.names = FALSE)
ct_file_df <- as.data.frame(paste0("path", i, "/", ct_files_list))
ct_file_df <- as.data.frame(matrix(unlist(ct_file_df), nrow= length(unlist(ct_file_df[1]))))
ct_table <- table(ct_file_df[, 1])
stored <- ct_table[names(ct_table)==1]/sum(ct_table[names(ct_table)])
print(stored)
}
This is the final code which is running perfectly!
list_ct <- list.dirs("path", recursive = FALSE)
stored <- list()
for (year in seq_along(list_ct)){
ct_file_list <- list.files(list_ct[year], recursive=FALSE, pattern = ".tif$", full.names = FALSE)
tmp <- list()
for (i in seq_along(ct_file_list)){
ct_file_df <- raster(paste0(list_ct[year], "/", ct_file_list[i])) %>% as.data.frame()
# do calculations
tmp[[i]] <- mean(ct_file_df[, 1], na.rm = TRUE) # share of 1s among non-NA pixels (sum of 1s / count of non-NA)
names(tmp)[i] <- paste0(list_ct[year], "/", ct_file_list[i])
print(tmp[i])
}
stored[[year]] <- tmp
names(stored)[year] <- paste0(list_ct[year])
}
Could you add a reproducible example (data incl.)?
The result numeric(0) does not mean 0; it is a numeric vector of length zero (i.e., empty), so you probably need to replace numeric(0) with 0 explicitly. My guess is that you are computing something like numeric(0) + 1, which is still a numeric vector of length zero.
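To see where named numeric(0) can come from, here is a minimal sketch with made-up vectors (not the asker's rasters). Indexing a table by a name that is not present returns an empty vector, and dividing it by a sum keeps it empty. In the loop from the question the table is built from file paths rather than pixel values, so no entry is ever named "1".

```r
# Made-up stand-ins: pixel values versus file paths
pixel_values <- c(0, 1, 1, 0, 1)
file_paths   <- c("2006/a.tif", "2006/b.tif")

tab_ok  <- table(pixel_values)
tab_bad <- table(file_paths)   # counts of path strings, not of 0/1 pixels

share_ok  <- tab_ok[names(tab_ok) == "1"] / sum(tab_ok)
share_bad <- tab_bad[names(tab_bad) == "1"] / sum(tab_bad)

print(share_ok)            # share of pixels equal to 1
print(length(share_bad))   # length zero: nothing in the table is named "1"
```

So the fix is less about replacing numeric(0) and more about reading each file before tabulating its values.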
Edit:
You have a folder containing several sub-folders, each of which holds one or more tif files. You want to loop through these folders, import the tif file(s), do a calculation and save the result.
In the following, my path contains 5 folders named '2006', '2007', '2008', '2009' and '2010'. Each of these "year" folders contains an .xlsx file. Each .xlsx file contains one column (in your case, just select the right column of your data frame). This column has the same name in all Excel files, "col1", and contains values between 0 and 1. Then this will work:
library(dplyr)
library(readxl)
#
list_ct <- list.dirs("mypath", recursive = FALSE)
stored <- list()
for (year in seq_along(list_ct)){
ct_file_list <- list.files(list_ct[year], recursive=FALSE, pattern = ".xlsx$", full.names = FALSE)
tmp <- list()
for (i in seq_along(ct_file_list)){
ct_file_df <- read_excel(paste0(list_ct[year], "/", ct_file_list[i])) %>% as.data.frame()
# do calculations ..
tmp[[i]] <- sum(ct_file_df$col1) / length(ct_file_df$col1)
names(tmp)[i] <- paste0(list_ct[year], "/", ct_file_list[i])
print(tmp[i])
}
stored[[year]] <- tmp
names(stored)[year] <- paste0(list_ct[year])
}
Instead of using "read_excel", you just use raster() like you did with the single file. Hope you can use the answer.
Example data
library(raster)
s <- stack(system.file("external/rlogo.grd", package="raster"))
s <- s > 200
#plot(s)
If your actual data all cover the same area (that is, the rasters have the same extent and resolution), you can create a RasterStack from the file names and use freq as below
f <- freq(s)
f
#$red
# value count
#[1,] 0 3975
#[2,] 1 3802
#$green
# value count
#[1,] 0 3915
#[2,] 1 3862
#$blue
# value count
#[1,] 0 3406
#[2,] 1 4371
Followed by
sapply(f, function(x) x[2,2]/sum(x[,2]))
# red.count green.count blue.count
# 0.4888775 0.4965925 0.5620419
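One caveat, offered as a sketch rather than as part of the answer: x[2,2] assumes that both 0 and 1 occur in every layer, so that the count for value 1 sits in row 2. A layer that contains only one of the two values yields a one-row matrix, and x[2,2] then fails with "subscript out of bounds". The helper share_of below is hypothetical; it selects the row by value instead of by position and works on hand-built matrices that mimic the freq output (NA rows are assumed to have been dropped already).

```r
# Hypothetical helper: share of cells equal to `value` in one freq()-style
# matrix with columns "value" and "count"
share_of <- function(freq_mat, value = 1) {
  rows <- which(freq_mat[, "value"] == value)
  if (length(rows) == 0) return(0)   # the value never occurs in this layer
  sum(freq_mat[rows, "count"]) / sum(freq_mat[, "count"])
}

# Hand-built matrices mimicking the output shown above
red      <- cbind(value = c(0, 1), count = c(3975, 3802))
all_zero <- cbind(value = 0, count = 7777)   # a layer without any 1s

print(share_of(red))
print(share_of(all_zero))   # x[2, 2] would raise "subscript out of bounds" here
```

With a list f returned by freq, this could be used as sapply(f, share_of).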
If you cannot make a RasterStack you can make a list and lapply and continue as above, or use sapply and do this
ss <- as.list(s)
x <- sapply(ss, freq)
x[4,] / colSums(x[3:4, ])
#[1] 0.4888775 0.4965925 0.5620419
If you insist on a loop
res <- rep(NA, length(ss))
for (i in 1:length(ss)) {
# r <- raster(ss[i]) # if these were filenames
r <- ss[[i]] # here we extract from the list
x <- freq(r)[,2]
res[i] <- x[2] / sum(x)
}
res
# 0.4888775 0.4965925 0.5620419
Thank you!
This is working perfectly for all files of one year!
library(raster)
s_list <- list.files("C:/Users/OneDrive - wwfgermany/MASTERARBEIT/BASELINE/Eastern Arctic/Summer_EA_Output/area_calc/ct/2006/", full.names = T)
s <- raster::stack(s_list)
f <- freq(s, useNA = 'no')
f
ct_avg <- sapply(f, function(x) x[2,2]/sum(x[,2]))
ct_avg__mean <- mean(ct_avg)
ct_avg__mean
However, when I want to write it in another loop, to get one value per year as a final result in the end, I end up with an error saying subscript out of bounds. This is the code I am using:
setwd("C:/Users/MASTERARBEIT/BASELINE/Eastern Arctic/Summer_EA_Output/area_calc/ct/")
list_ct <- list.dirs("C:/Users/MASTERARBEIT/BASELINE/Eastern Arctic/Summer_EA_Output/area_calc/ct/")
i=0
for (year in list_ct) {
s_list <- list.files(year, recursive = FALSE, pattern = "\\.tif$", full.names = FALSE)
s <- raster::stack(s_list)
f <- freq(s, useNA = 'no')
f
ct_avg <- sapply(f, function(x) x[2,2]/sum(x[,2]))
ct_avg__mean <- mean(ct_avg)
ct_avg__mean
}
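A few things in this loop may be worth checking; this is a guess, since the folder contents are not shown. list.dirs() is recursive by default and also returns the top folder itself, which may contain no .tif files. list.files(..., full.names = FALSE) returns bare file names, which stack() cannot find unless the working directory happens to be that year folder. And the per-year result is never stored. The sketch below uses only base R; yearly_mean and share_fun are hypothetical names, and share_fun stands in for whatever turns a vector of file paths into one share per file (with raster it could wrap stack() and freq()).

```r
# Hypothetical helper: one mean value per year folder.
# `share_fun` is a placeholder taking a vector of file paths and returning
# one share per file.
yearly_mean <- function(root, share_fun) {
  year_dirs <- list.dirs(root, recursive = FALSE)   # sub-folders only, not root
  result <- setNames(rep(NA_real_, length(year_dirs)), basename(year_dirs))
  for (k in seq_along(year_dirs)) {
    tifs <- list.files(year_dirs[k], pattern = "\\.tif$", full.names = TRUE)
    if (length(tifs) == 0) next                     # skip folders without .tif files
    result[k] <- mean(share_fun(tifs))
  }
  result
}

# Demonstration with empty dummy files and a dummy share function
root <- file.path(tempdir(), "ct_demo")
for (y in c("2006", "2007")) {
  dir.create(file.path(root, y), recursive = TRUE, showWarnings = FALSE)
}
invisible(file.create(file.path(root, "2006", c("a.tif", "b.tif"))))
invisible(file.create(file.path(root, "2007", "c.tif")))

res <- yearly_mean(root, function(paths) rep(0.5, length(paths)))
print(res)
```

The result is a named vector with one entry per year folder, which is the "one value per year" the question asks for.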
Related
I am trying to create a CSV file that lists the number of unique rows in each file of my dataset. My data is in a folder that contains 200+ CSV files, all with 9 columns and a varying number of rows. Some files have no duplicates but many have duplicate values. I have found code that lists how many rows are in each file, but I am wondering what I could add to it so that it removes the duplicate values and only counts the unique values in the final output CSV. I would like the final CSV file to list the row count of each of the 200+ files in one sheet.
The code I found is below
library(tidyverse)
csv.file <- list.files("TestA") # Directory with your .csv files
data.frame.output <- data.frame(number_of_cols = NA,
number_of_rows = NA,
name_of_csv = NA) #The df to be written
MyF <- function(x){
csv.read.file <- data.table::fread(
paste("TestA", x, sep = "/")
)
number.of.cols <- ncol(csv.read.file)
number.of.rows <- nrow(csv.read.file)
data.frame.output <<- add_row(data.frame.output,
number_of_cols = number.of.cols,
number_of_rows = number.of.rows,
name_of_csv = str_remove_all(x,".csv")) %>%
filter(!is.na(name_of_csv))
}
map(csv.file, MyF)
data.table::fwrite(data.frame.output, file = "Output1.csv")
I appreciate any guidance as I am a total R/coding beginner.
The following function accepts a vector of file names, reads them one by one, removes duplicated rows and outputs a data.frame with numbers of columns and rows and CSV filename.
There is no need to previously create a results data.frame data.frame.output.
MyF <- function(x, path = "TestA"){
f <- function(x, path) {
# commented out to test the function
# uncomment these 3 lines and comment out the next one
#csv.read.file <- data.table::fread(
# file.path(path, x)
#)
csv.read.file <- data.table::fread(x)
i_dups <- (duplicated(csv.read.file) | duplicated(csv.read.file, fromLast = TRUE))
csv.read.file <- csv.read.file[!i_dups, ]
#
number.of.cols <- ncol(csv.read.file)
number.of.rows <- nrow(csv.read.file)
#
name_of_csv <- if(is.na(x)) NA_character_ else basename(x)
name_of_csv <- tools::file_path_sans_ext(name_of_csv)
#
data.frame(number_of_cols = number.of.cols,
number_of_rows = number.of.rows,
name_of_csv = name_of_csv) |>
dplyr::filter(!is.na(name_of_csv))
}
#
y <- purrr::map(x, f, path = path)
data.table::rbindlist(y)
}
data.frame.output <- MyF(csv.file)
data.table::fwrite(data.frame.output, file = "Output1.csv")
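Note that "removes duplicated rows" can mean two different things, and the counts differ. The function above drops every row that has a duplicate anywhere, including the first copy. If one copy of each duplicated row should be kept and counted, unique() (or !duplicated()) is the alternative. A small sketch on a made-up table:

```r
# Made-up table: the row (1, "a") appears twice
df <- data.frame(id = c(1, 1, 2, 3), label = c("a", "a", "b", "c"))

# As in the function above: drop every row that has a duplicate anywhere
i_dups <- duplicated(df) | duplicated(df, fromLast = TRUE)
n_without_any_copy <- nrow(df[!i_dups, ])

# Alternative: keep the first copy of each duplicated row
n_distinct_rows <- nrow(unique(df))

print(c(dropped_all_copies = n_without_any_copy, kept_one_copy = n_distinct_rows))
```

Which of the two counts is wanted depends on what "unique values" means for the data at hand.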
I find this for loop version better. Although for loops are not considered very idiomatic in R, there is nothing wrong with them. Like the function above, it avoids assignment in the parent environment with the operator <<-, and the code is simpler. The results data.frame data.frame.output is created beforehand with as many rows as there are input file names, and each CSV file's values then replace the NA values in its row.
MyF <- function(x, path = "TestA"){
data.frame.output <- data.frame(number_of_cols = rep(NA, length(x)),
number_of_rows = rep(NA, length(x)),
name_of_csv = rep(NA, length(x)))
for(i in seq_along(x)) {
# commented out to test the function
# uncomment this line and comment out the next one
#fl_name <- file.path(path, x[i])
fl_name <- x[i]
#
csv.read.file <- data.table::fread(fl_name)
i_dups <- (duplicated(csv.read.file) | duplicated(csv.read.file, fromLast = TRUE))
csv.read.file <- csv.read.file[!i_dups, ]
#
data.frame.output$number_of_cols[i] <- ncol(csv.read.file)
data.frame.output$number_of_rows[i] <- nrow(csv.read.file)
#
name_of_csv <- if(is.na(fl_name)) NA_character_ else basename(fl_name)
name_of_csv <- tools::file_path_sans_ext(name_of_csv)
data.frame.output$name_of_csv[i] <- name_of_csv
}
#
data.frame.output |> dplyr::filter(!is.na(name_of_csv))
}
MyF(csv.file)
I have been trying to write a loop that goes through two folders of Sentinel 2 satellite images (Band 4 and Band 5) and computes an NDVI for each date.
A stack is created for each band, followed by some cropping and resampling, before finally proceeding to the NDVI calculation. I am struggling to integrate the NDVI calculation into the loop and to create the file names.
I simply want my loop to generate x files for x dates and give each NDVI image its date as a name, "YYYY/MM/DD.tif", extracted from the file name. After a lot of unsuccessful trial and error, I can't think of a way to do so.
#list files
files4 <- list.files(path4, pattern = "jp2$", full.names = TRUE)
files5 <- list.files(path5, pattern = "jp2$", full.names = TRUE)
ms5 <- stack()
ms4 <- stack()
for (f in files4){
# loading a raster
r4 <- raster(f)
proj4string(r4)
proj4string(emprise)
emprise <- spTransform(emprise, proj4string(r4))
r4b <- crop(r4, emprise)
ms4<- stack(ms4,r4b)
#copy the date from the file name to name the final NDVI image (I have to get rid of everything but the date)
x <- gsub("[A-z //.//(//)]", "", r4)
y <- substr(x, 4, 11)
}
for (f in files5){
# load the raster
r5 <- raster(f)
proj4string(r5)
proj4string(emprise)
emprise <- spTransform(emprise, proj4string(r5))
r5b <- crop(r5, emprise)
ms5<- stack(ms5,r5b)
}
#Resampling : setting the Band 5 to the same resolution as Band 4
b5_resamp <- resample(ms5, ms4)
Have you considered looping over dates rather than files? I can't give more specific advice without example data, but here is the general idea:
# List files
files4 <- list.files("./band4", pattern = ".tif", full.names = TRUE)
#> "band4/T31UDR_20170126T105321_B04.tif" "band4/T31UDR_20180126T105321_B04.tif"
files5 <- list.files("./band5", pattern = ".tif", full.names = TRUE)
#> "./band5/T31UDR_20170126T105321_B05.tif" "./band5/T31UDR_20180126T105321_B05.tif"
# Get dates
dates <- unique(gsub(pattern = ".*_(\\d{8}).*", replacement = "\\1", x = c(files4, files5)))
#> "20170126" "20180126"
# Define empty stacks
ms5 <- stack()
ms4 <- stack()
for(date in dates){
## Band 4
f4 <- list.files("./band4", pattern = date, full.names = TRUE)
# loading a raster
r4 <- raster(f4)
proj4string(r4)
proj4string(emprise)
emprise <- spTransform(emprise, proj4string(r4))
r4b <- crop(r4, emprise)
ms4 <- stack(ms4,r4b)
## Band 5
f5 <- list.files("./band5", pattern = date, full.names = TRUE)
# load the raster
r5 <- raster(f5)
proj4string(r5)
proj4string(emprise)
emprise <- spTransform(emprise, proj4string(r5))
r5b <- crop(r5, emprise)
ms5<- stack(ms5,r5b)
## Resampling : setting the Band 5 to the same resolution as Band 4
b5_resamp <- resample(ms5, ms4)
## Write to file
writeRaster(b5_resamp, filename = paste0(date, ".tif"))
}
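The date handling can be tried out without any raster data. The sketch below reuses the regular expression from the answer on the file names shown in its example output, pairs the Band 4 and Band 5 files by date, and builds one output name per date. The names get_date, pairs and the "ndvi_" prefix are illustrative choices. A hyphenated date is used because slashes in "YYYY/MM/DD.tif" would be read as folder separators.

```r
# File names copied from the example output above
files4 <- c("band4/T31UDR_20170126T105321_B04.tif",
            "band4/T31UDR_20180126T105321_B04.tif")
files5 <- c("./band5/T31UDR_20170126T105321_B05.tif",
            "./band5/T31UDR_20180126T105321_B05.tif")

# Same regular expression as in the answer: the 8 digits following an underscore
get_date <- function(f) gsub(".*_(\\d{8}).*", "\\1", basename(f))

# One row per date with the matching file from each band
pairs <- merge(
  data.frame(date = get_date(files4), b4 = files4, stringsAsFactors = FALSE),
  data.frame(date = get_date(files5), b5 = files5, stringsAsFactors = FALSE),
  by = "date"
)

# Output name per date; the "ndvi_" prefix is an arbitrary choice
pairs$out <- paste0("ndvi_", format(as.Date(pairs$date, "%Y%m%d")), ".tif")
print(pairs[, c("date", "out")])
```

Inside a loop over the rows of pairs, pairs$b4[k] and pairs$b5[k] would be the two files to read and pairs$out[k] the file name to pass to writeRaster().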
I have a dataframe:
In total more than 3 million rows and 1800 species (scientific name)
The code below creates an empty raster at 0.5 degree scale..
library(raster)
ext <- extent(-180.0, 180, -90.0, 90.0)
gridsize <- 0.5
tempraster<- raster(ext, res=gridsize)
crs(tempraster) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
The code below then fills that raster with x y data one species at a time, creating an ASCII file named after the scientific name, with a 1 where the species is and a 0 where it is not.
selection<-animals
spp <- unique(animals$scientific_name)
result <- list()
for (i in 1:length(spp)) {
spi <- selection[selection$scientific_name == spp[i], c("lon", "lat")]
fname <- paste0(spp[i], ".asc")
result[[i]] <- rasterize(spi, tempraster, fun="count", filename=fname, background = 0, overwrite = TRUE)}
I would like to adjust this code so that, instead of a 1 where the species is, the resulting ASCII file uses the value from the total column. Unfortunately I am a beginner at for loops and other functions, so I am asking for any help.
The rasterize() function has a field argument, so you can call it like this:
result[[i]] <- rasterize(spi, tempraster,
field=selection[selection$scientific_name == spp[i], "total"],
fun=sum,
filename=fname, background = 0,
overwrite = TRUE)
I wish to convert a raster to a csv file. I have tried to convert a raster to a dataframe on one file just to see if it works. I have tried using:
as.data.frame( rasterToPoints(species) )
but I get an error when I try to write "species" to a csv :
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class "structure("RasterLayer", package = "raster")" to a data.frame
This is my code (I need to convert multiple rasters to csv (see the loop))
#start loop
file.names <- dir(path, pattern=".csv")
for(i in 1:length(file.names)){
file<- read.csv(file.names[i], header = TRUE, stringsAsFactors=FALSE)
#subsetting each file and renaming column header names
sub.file<-subset(file, select = c('Matched.Scientific.Name', 'Vernacular.Name...matched', 'Latitude...processed', 'Longitude...processed'))
names(sub.file) <- c('species', 'name', 'Lat','Lon')
#turn into a SpatialPointsDataFrame
coordinates(sub.file) <- ~ Lon + Lat
proj4string(sub.file) <- '+init=EPSG:4326'
plot(sub.file, axes=TRUE)
#converting to BNG
sub.file.BNG <- spTransform(sub.file, '+init=EPSG:27700')
plot(sub.file.BNG, axes=TRUE)
#creating template raster
template <- raster(xmn=400000, xmx=600000, ymn=200000, ymx=800000, res=25000, crs='+init=EPSG:27700')
#point data > presence grid
species <- rasterize(sub.file.BNG, template, field=1)
plot(species)
# UK wide
template <- raster(xmn=-200000, xmx=700000, ymn=0, ymx=1250000, res=25000, crs='+init=EPSG:27700')
# use that to turn species point data into a presence grid
species <- rasterize(sub.file, template, field=1)
plot(species)
#converting a raster>dataframe>csv?????
as.data.frame( rasterToPoints(species) )
}
Always provide some example data when asking a question.
library(raster)
f <- system.file("external/test.grd", package="raster")
r <- raster(f)
To get the cell values
x <- as.data.frame(r)
head(x, 2)
# test
#1 NA
#2 NA
To get the cell coordinates and values, only for cells that are not NA
x <- rasterToPoints(r)
head(x, 2)
# x y test
#[1,] 181180 333740 633.686
#[2,] 181140 333700 712.545
To get the cell coordinates and values for all cells (including NA)
x <- cbind(coordinates(r), v=values(r))
head(x, 2)
# x y v
#[1,] 178420 333980 NA
#[2,] 178460 333980 NA
Whichever one you choose, you can then do
write.csv(x, "test.csv")
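By default write.csv also writes the row names as an extra first column. If only x, y and the values should end up in the file, row.names = FALSE avoids that. A small sketch, using a hand-typed matrix as a stand-in for the rasterToPoints() result shown above and a temporary file instead of "test.csv":

```r
# Stand-in for the matrix returned by rasterToPoints() (values copied from above)
x <- cbind(x = c(181180, 181140), y = c(333740, 333700),
           test = c(633.686, 712.545))

out_file <- file.path(tempdir(), "test.csv")
write.csv(x, out_file, row.names = FALSE)   # omit the row-name column

# Read the file back to check its columns
back <- read.csv(out_file)
str(back)
```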
The mistake you made is that you did not assign the result of as.data.frame to a variable, and then tried to write the RasterLayer with write.csv. That is an error, and you get
write.csv(r)
#Error in as.data.frame.default(x[[i]], optional = TRUE) :
# cannot coerce class ‘structure("RasterLayer", package = "raster")’ to a
# data.frame
By the way, if you have multiple rasters, you may want to combine them first
s <- stack(r, r, r)
x <- rasterToPoints(s)
head(x, 2)
# x y test.1 test.2 test.3
#[1,] 181180 333740 633.686 633.686 633.686
#[2,] 181140 333700 712.545 712.545 712.545
write.csv(x, "test.csv")
Assuming your raster is "species"
species<- raster("C:/.../species.tif")
For this conversion you need three things for each pixel: the X coordinate (1), the Y coordinate (2) and the cell's own value (3).
# don't run these lines
#(1) = coordinates (species) [, 1]
#(2) = coordinates (species) [, 2]
#(3) = values (species)
Having these expressions we can add them to a dataframe as follows
dat <- data.frame("X" = coordinates(species)[, 1],
                  "Y" = coordinates(species)[, 2],
                  "Values" = values(species))
I'm quite new to R and I have a problem on which I couldn't find a solution so far.
I have a folder of 1000 raster files and need the per-cell median across all rasters.
The files contain NoData cells (which, I think, is why they have different extents).
Is there a way to loop through the folder, combine all the files and get the median?
Error in rep(value, times = ncell(x)) : invalid 'times' argument
In addition: Warning message:
In setValues(x, rep(value, times = ncell(x))) : NAs introduced by coercion
Error in .local(x, i, j, ..., value) :
cannot replace values on this raster (it is too large
I tried with raster stack, but it doesn't work because of the different extents.
Thanks for your help.
I'll approach this by mosaic()'ing images with different extents and origins but the same resolution.
Create a few RasterLayer objects and export them (to read later)
library('raster')
library('rgdal')
e1 <- extent(0,10,0,10)
r1 <- raster(e1)
res(r1) <- 0.5
r1[] <- runif(400, min = 0, max = 1)
#plot(r1)
e2 <- extent(5,15,5,15)
r2 <- raster(e2)
res(r2) <- 0.5
r2[] <- rnorm(400, 5, 1)
#plot(r2)
e3 <- extent(18,40,18,40)
r3 <- raster(e3)
res(r3) <- 0.5
r3[] <- rnorm(1936, 12, 1)
#plot(r3)
# Write them out
wdata <- '../Stackoverflow/21876858' # your local folder
writeRaster(r1, file.path(wdata, 'r1.tif'),
overwrite = TRUE)
writeRaster(r2,file.path(wdata, 'r2.tif'),
overwrite = TRUE)
writeRaster(r3,file.path(wdata, 'r3.tif'),
overwrite = TRUE)
Read and Mosaic'ing with function
Since raster::mosaic does not accept a RasterStack/RasterBrick or a list of RasterLayers, the best approach is to use do.call, like this excellent example.
To do so, add a mosaic method for lists that forwards its arguments through do.call:
setMethod('mosaic', signature(x='list', y='missing'),
function(x, y, fun, tolerance=0.05, filename=""){
stopifnot(missing(y))
args <- x
if (!missing(fun)) args$fun <- fun
if (!missing(tolerance)) args$tolerance<- tolerance
if (!missing(filename)) args$filename<- filename
do.call(mosaic, args)
})
Let's keep tolerance low here to evaluate any misbehavior of our function.
Finally, the function:
Mosaic function
f.Mosaic <- function(x=x, func = median){
files <- list.files(file.path(wdata), all.files = F)
# List TIF files at wdata folder
ltif <- grep(".tif$", files, ignore.case = TRUE, value = TRUE)
#lext <- list()
#1rt <- raster(file.path(wdata, i),
# package = "raster", varname = fname, dataType = 'FLT4S')
# Give an extent area here (you can read it from your first tif or define manually)
uext <- extent(c(0, 100, 0, 100))
# Get Total Extent Area
stkl <- list()
for(i in 1:length(ltif)){
x <- raster(file.path(wdata, ltif[i])) # read one GeoTIFF (the undefined fname argument is dropped)
xext <- extent(x)
uext <- union(uext, xext)
stkl[[i]] <- x
}
# Global Area empty rasterLayer
rt <- raster(uext)
res(rt) <- 0.5
rt[] <- NA
# Merge each rasterLayer to Global Extent area
stck <- list()
for(i in 1:length(stkl)){
merged.r <- merge(stkl[[i]], rt, tolerance = 1e+6)
#merged.r <- reclassify(merged.r, matrix(c(NA, 0), nrow = 1))
stck[[i]] <- merged.r
}
# Mosaic with Median
mosaic.r <- raster::mosaic(stck, fun = func) # using median
mosaic.r
}
# Run the function with func = median
mosaiced <- f.Mosaic(x, func = median)
# Plot it
plot(mosaiced)
Possibly far from the best approach but hope it helps.
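To make the goal concrete, here is what "median per cell, ignoring NoData" means once all layers sit on the same grid. The sketch uses a made-up matrix with one row per cell and one column per layer, so it runs without any raster files. With raster, the same function could presumably be applied per cell through calc() on a stack of layers that have first been brought to a common extent, for example with extend().

```r
# Made-up values for three layers on the same 4 cells; NA = NoData
layers <- cbind(r1 = c(1, 2, NA, 4),
                r2 = c(3, NA, NA, 8),
                r3 = c(5, 6, NA, NA))

# Median across layers for each cell, ignoring NoData
cell_median <- apply(layers, 1, median, na.rm = TRUE)
print(cell_median)   # a cell that is NoData in every layer stays NA
```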