How to batch process geoTIFFs in R with lapply

I have some large GeoTIFFs that I want to convert to ASCII grid files. After some searching, I wrote this code:
library(raster)
f <- list.files("inputFolder", pattern = "\\.tif$", full.names = TRUE)  # regex: names ending in .tif (skips .tif.aux.xml sidecars)
r <- lapply(f, raster)
a <- lapply(r, writeRaster, filename = "output", format = "ascii")
What confuses me is how to name each output file after its original input file.
I tried:
a <- lapply(r, writeRaster, filename = "outputFolder" + f, format = "ascii")
But I got this error:
non-numeric argument to binary operator
Then I tried:
a <- lapply(r, writeRaster, filename = paste0(f, ".asc"), format = "ascii")
But I received:
Error in file(filename, "w") : invalid 'description' argument
In addition: Warning messages:
1: In if (filename == "") { :
  the condition has length > 1 and only the first element will be used
2: In if (!file.exists(dirname(filename))) { :
  the condition has length > 1 and only the first element will be used
3: In if (toupper(x@file@name) == toupper(filename)) { :
  the condition has length > 1 and only the first element will be used
4: In if (trim(filename) == "") { :
  the condition has length > 1 and only the first element will be used
5: In if (!file.exists(dirname(filename))) { :
  the condition has length > 1 and only the first element will be used
6: In if (filename == "") { :
  the condition has length > 1 and only the first element will be used
7: In if (!overwrite & file.exists(filename)) { :
  the condition has length > 1 and only the first element will be used

I think you were nearly there, with two corrections:
First, you're calling writeRaster() for its side effect (writing a file to disk), so you don't need to assign the output of your lapply() call to an object. Removing a <- gives:
lapply(r, writeRaster, filename = paste0(f, ".asc"), format = "ascii")
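Next, the filename argument won't loop through f this way: extra arguments to lapply() are passed unchanged to every call, so each writeRaster() call receives the whole vector f, which is what triggers the "condition has length > 1" warnings. You have two options. The simplest is probably to use an anonymous function that builds each output name from the @file@name slot of the raster being written:
lapply(r, function(x) {
  # replace the .tif extension with .asc so the output gets its own name instead of reusing the source path
  writeRaster(x, filename = extension(x@file@name, ".asc"), format = "ascii", overwrite = TRUE)
})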
Your other option is to loop through r and f in parallel, as you would in Python with for r, f in ..., which you can do with purrr:
library("purrr")
walk2(r, f, function(x, y) {
  # y is the input .tif path; write the output next to it with an .asc extension
  writeRaster(x = x, filename = extension(y, ".asc"), format = "ascii")
})
Here we're using walk2() rather than map2() because we need to call the function for side effects. This loops through r and f together so you can pass one to be the object to write, and one to be the filename.
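To see why the original filename = paste0(f, ".asc") call fails and why the paired approach works, here is a base-R sketch that needs neither raster nor purrr. fake_write() is a hypothetical stand-in for writeRaster() that only reports the filename it receives, and Map() is base R's parallel-iteration counterpart to walk2().

```r
# Hypothetical stand-in for writeRaster(): reports the filename it was given
fake_write <- function(x, filename) {
  sprintf("object %s -> %s", x, paste(filename, collapse = ", "))
}

inputs <- c("a.tif", "b.tif", "c.tif")

# Extra arguments to lapply() are passed unchanged to every call,
# so each call receives all three file names at once
whole <- lapply(seq_along(inputs), fake_write, filename = inputs)
whole[[1]]   # "object 1 -> a.tif, b.tif, c.tif"

# Map() walks its arguments in parallel, pairing the i-th object
# with the i-th output name (the same idea as purrr::walk2())
outputs <- sub("\\.tif$", ".asc", inputs)
paired <- Map(fake_write, seq_along(inputs), outputs)
unlist(paired)   # "object 1 -> a.asc" "object 2 -> b.asc" "object 3 -> c.asc"
```

Unlike walk2(), Map() returns the results, so wrap it in invisible() if you only want the side effect.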
Edit: here's the code I use to reproduce the problem
library("raster")
tmp_dir = tempdir()
tmp = tempfile(tmpdir = tmp_dir, fileext = ".zip")
download.file(
"http://biogeo.ucdavis.edu/data/climate/cmip5/10m/cc26bi50.zip",
destfile = tmp
)
unzip(tmp, exdir = tmp_dir)
f = list.files(tmp_dir, pattern = "\\.tif$", full.names = TRUE)
r = lapply(f, raster)
# Solution one
lapply(r, function(x) {
  writeRaster(x, filename = extension(x@file@name, ".asc"), format = "ascii", overwrite = TRUE)
})
# Solution two (overwrite = TRUE because solution one already wrote these files)
library("purrr")
walk2(r, f, function(x, y) {
  writeRaster(x = x, filename = extension(y, ".asc"), format = "ascii", overwrite = TRUE)
})

To test how to do this with small files:
library(raster)
s <- stack(system.file("external/rlogo.grd", package = "raster"))
# write one GeoTIFF per layer of the stack
writeRaster(s, filename = 'testtif', format = 'GTiff', bylayer = TRUE, overwrite = TRUE)
f <- list.files(pattern = "^testtif_\\d+\\.tif$")
Now you can use f with Phil's nice examples. You can also do everything in a single lapply() call:
f <- list.files("inputFolder", pattern = "\\.tif$", full.names = TRUE)
r <- lapply(f, function(i) {
  # read each GeoTIFF and write it next to the original with an .asc extension
  writeRaster(raster(i), filename = extension(i, '.asc'), overwrite = TRUE)
})
But if you have trouble with lapply, write a loop (it is fine!):
for (i in seq_along(f)) {
  r <- raster(f[i])
  ff <- extension(f[i], '.asc')  # output format is inferred from the .asc extension
  writeRaster(r, ff)
}
Or like this
for (file in f) {
  r <- raster(file)
  ff <- extension(file, '.asc')
  writeRaster(r, ff)
}

Related

Why can't I print a data frame imported from Excel in R?

I'm trying to print a data frame created by importing an Excel file into R with the following code:
library("readxl")
data <- read_excel("grad programs.xlsx", sheet="Sheet2")
print(data)
But instead of the spreadsheet contents, I get this very long message:
print(data)
function (..., list = character(), package = NULL, lib.loc = NULL,
    verbose = getOption("verbose"), envir = .GlobalEnv, overwrite = TRUE)
{
    fileExt <- function(x) {
        db <- grepl("\\.[^.]+\\.(gz|bz2|xz)$", x)
        ans <- sub(".*\\.", "", x)
        ans[db] <- sub(".*\\.([^.]+\\.)(gz|bz2|xz)$", "\\1\\2",
            x[db])
        ans
    }
    my_read_table <- function(...) {
        lcc <- Sys.getlocale("LC_COLLATE")
        on.exit(Sys.setlocale("LC_COLLATE", lcc))
        Sys.setlocale("LC_COLLATE", "C")
        read.table(...)
    }
    stopifnot(is.character(list))
    names <- c(as.character(substitute(list(...))[-1L]), list)
    if (!is.null(package)) {
        if (!is.character(package))
            stop("'package' must be a character vector or NULL")
    }
    paths <- find.package(package, lib.loc, verbose = verbose)
    if (is.null(lib.loc))
        paths <- c(path.package(package, TRUE), if (!length(package)) getwd(),
            paths)
    paths <- unique(normalizePath(paths[file.exists(paths)]))
    paths <- paths[dir.exists(file.path(paths, "data"))]
    dataExts <- tools:::.make_file_exts("data")
    if (length(names) == 0L) {
        db <- matrix(character(), nrow = 0L, ncol = 4L)
        for (path in paths) {
            entries <- NULL
            packageName <- if (file_test("-f", file.path(path,
                "DESCRIPTION")))
                basename(path)
            else "."
The message is longer than that, but that's the start. Any idea why I get this message rather than the actual data from the Excel sheet?
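One likely explanation, assuming the read_excel() line failed or was never run: no object called data then exists in the workspace, so the name falls through to the built-in function utils::data, and print(data) shows that function's source, which matches the output above. A minimal check, with a hypothetical data frame standing in for the spreadsheet under a name that does not clash with a base function:

```r
# FALSE when no object called `data` has been created in the workspace
exists("data", envir = globalenv(), inherits = FALSE)

# TRUE in that case: `data` then refers to the function utils::data
identical(data, utils::data)

# Hypothetical stand-in for the read_excel() result, stored under a distinct name
grad_programs <- data.frame(program = c("MSc", "PhD"), applicants = c(120, 45))
print(grad_programs)
```

If the first check returns FALSE, rerun the read_excel() line on its own and look for an error (for example, a wrong path or sheet name).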

Read one column from many CSV files into a list in R

I want to read the column named by my function's input from several CSV files. I use the following function:
Getmean <- function(directory, pollutant, id = 1:332)
{
  i <- 1
  filenames <- sprintf("%0.3d", id)
  Data <- list()
  for (filename in filenames)
  {
    Data[i] <- read.csv(file = filename, sep = ",", colClasses = c(pollutant))
    i <- i++
  }
  result <- mean(Data, na.rm = TRUE)
}
Error: unexpected '}' in:
"
}"
result <- mean(Data ,na.rm = TRUE)
Error in mean(Data, na.rm = TRUE) : object 'Data' not found
}
Error: unexpected '}' in "}"
Any idea how to fix this? These are my first steps in R.
Is using data.table::fread an option?
First, get a list of the files you want to read using list.files():
fileslist <- list.files(
  path = "./",
  pattern = "(^[0-3]\\d\\d|^4[0-3]\\d|^44[0-4])\\.csv$", # regex to select 000.csv to 444.csv
  full.names = TRUE)
Then use fread() to read them into a list, keeping only the "pollutant" column via its select argument:
library(data.table)
contents <- lapply(fileslist, fread, select = c("pollutant"))
Then take the mean of that column in each element of the list (mean() on a whole data.table would return NA with a warning):
sapply(contents, function(dt) mean(dt[["pollutant"]], na.rm = TRUE))
R doesn't have a ++ operator. i++ is parsed as i + followed by a unary + with nothing after it, so R keeps waiting for the missing operand, which is why the closing } is reported as unexpected. Replace it with i <- i + 1.
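Putting these fixes together, a minimal corrected sketch of the function might look like this. It assumes the files are named 001.csv, 002.csv, ... inside directory and that pollutant is the name of a column in them; it uses [[i]] rather than [i] to store each column and pools all values into one mean. The sulfate column and the two files in the usage example are hypothetical test data.

```r
getmean <- function(directory, pollutant, id = 1:332) {
  # Build full paths such as directory/001.csv from the numeric ids
  filenames <- file.path(directory, sprintf("%03d.csv", id))
  values <- vector("list", length(filenames))
  for (i in seq_along(filenames)) {
    dat <- read.csv(filenames[i])
    values[[i]] <- dat[[pollutant]]   # keep only the requested column
  }
  # Pool all files and take a single mean, ignoring missing values
  mean(unlist(values), na.rm = TRUE)
}

# Usage with two small hypothetical files in a temporary directory
pollution_dir <- tempfile("pollution")
dir.create(pollution_dir)
write.csv(data.frame(sulfate = c(1, NA, 3)), file.path(pollution_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, 7)), file.path(pollution_dir, "002.csv"), row.names = FALSE)
getmean(pollution_dir, "sulfate", id = 1:2)   # mean of 1, 3, 5, 7 = 4
```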

Adding two lists together using cbind

This program works because I made the variables inside lapply global using the <<- operator. However, it does not work with the real files in the real program, which are .tsv files with named columns. When I run the real program I get: Error: (converted from warning) Error in : (converted from warning) Error in : arguments imply differing number of rows: 3455, 4319. What might be causing this?
lc <- list("test.txt", "test.txt", "test.txt", "test.txt")
lc1 <- list("test.txt", "test.txt", "test.txt")
lc2 <- list("test.txt", "test.txt")
#list of lists. The lists contain file names
lc <- list(lc, lc1, lc2)
#new names for the three lists in the list of lists
new_dataFns <- list("name1", "name2", "name3")
file_paths <- NULL
new_path <- NULL
#add the file names to the path and read and merge the contents of each list in the list of lists
lapply(
  lc,
  function(lc) {
    filenames <- file.path(getwd(), lc)
    dataList <<- lapply(filenames, function (lc) read.table(file = lc, header = TRUE))
    dataList <<- lapply(dataList, function(dataList) {merge(as.data.frame(dataList), as.data.frame(dataList))})
  }
)
# add the new file name to the path; in total this gives 3 paths of the form path/file_newname.tsv
lapply(new_dataFns, function(new_dataFns) {new_path <<- file.path(getwd(), new_dataFns)})
print(new_path)
print(dataList)
finalFiles <- merge(as.data.frame(dataList), as.data.frame(new_path))
print(finalFiles)
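The error text is what data.frame(), and therefore as.data.frame() on a list, reports when asked to place tables with different row counts side by side as columns, which is likely what as.data.frame(dataList) runs into once the real files differ in length. A minimal sketch that avoids <<- and instead stacks each group's tables with rbind(), writing one file per group; it assumes all files in a group share the same column names, and the files, groups, and names below are hypothetical stand-ins for lc and new_dataFns:

```r
# Hypothetical tab-separated files standing in for the real .tsv inputs
tmp <- tempfile("tsv")
dir.create(tmp)
write_tsv <- function(df, name) {
  write.table(df, file.path(tmp, name), sep = "\t", row.names = FALSE, quote = FALSE)
}
write_tsv(data.frame(id = 1:2, value = c(10, 20)), "a.tsv")
write_tsv(data.frame(id = 3:5, value = c(30, 40, 50)), "b.tsv")

# Tables with different row counts cannot sit side by side as columns
res <- tryCatch(as.data.frame(list(data.frame(a = 1:2), data.frame(b = 1:3))),
                error = conditionMessage)
res   # "arguments imply differing number of rows: 2, 3"

groups <- list(c("a.tsv", "b.tsv"), "b.tsv")   # stand-ins for lc, lc1, ...
new_names <- c("name1", "name2")               # stand-ins for new_dataFns

combined <- Map(function(files, new_name) {
  tables <- lapply(file.path(tmp, files), read.table, header = TRUE, sep = "\t")
  out <- do.call(rbind, tables)   # stack rows; requires matching column names
  write.table(out, file.path(tmp, paste0(new_name, ".tsv")),
              sep = "\t", row.names = FALSE, quote = FALSE)
  out
}, groups, new_names)

sapply(combined, nrow)   # 5 3
```

Returning each combined table from the function, instead of assigning it with <<-, keeps every group's result rather than only the last one.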
I solved the problem by writing different code, shown below. The function's inputs come from the app's input widgets.
glyCount1 <- function(answer = NULL, fileChoice = NULL, combination = NULL, enteredValue = NULL, nameList) {
  lc = nameList
  new_dataFns <- gsub(" ", "", nameList)
  first_path <- NULL
  new_path <- NULL
  old_path <- NULL
  file_content <- NULL
  for (i in seq_along(lc)) {
    for (j in seq_along(lc[[i]])) {
      if (!is.null(lc[[i]])) {
        first_path[[j]] <- paste(getwd(), "/", lc[[i]][j], sep = "")
        # read the j-th file of group i (first_path is indexed by j, not i)
        tryCatch(file_content[[j]] <- read.csv(file = first_path[[j]], header = TRUE, sep = ","), error = function(e) NULL)
        # append its rows to one output file per group, named <i>.csv
        old_path[[j]] <- paste(getwd(), "/", i, ".csv", sep = "")
        write.table(file_content[[j]], file = old_path[[j]], append = TRUE, col.names = FALSE)
      }
    }
  }
}

Skip empty files when importing text files

I have a folder with about 700 text files that I want to import and add a column to. I've figured out how to do this using the following code:
files = list.files(pattern = "c\\.txt$")
DF <- NULL
for (f in files) {
  data <- read.table(f, header = FALSE, sep = ",")
  data$species <- sub("c\\.txt$", "", f)  # species column is the file name without "c.txt"
  DF <- rbind(DF, data)
}
write.xlsx(DF, "B:/trends.xlsx")  # write.xlsx() comes from an add-on package such as openxlsx or xlsx
The problem is that about 100 of the files are empty, so the code stops at the first empty file with this error:
Error in read.table(f, header = F, sep = ",") :
no lines available in input
Is there a way to skip over these empty files?
You can skip empty files by checking that file.size(some_file) > 0:
files <- list.files("~/tmp/tmpdir", pattern = "\\.csv$", full.names = TRUE)
##
df_list <- lapply(files, function(x) {
  # empty files return NULL, which do.call("rbind", ...) then drops
  if (file.size(x) > 0) {
    read.csv(x)
  }
})
##
R> dim(do.call("rbind", df_list))
#[1] 50 2
This skips over the 10 files that are empty, and reads in the other 10 that are not.
Data:
dir.create("~/tmp/tmpdir", recursive = TRUE, showWarnings = FALSE)
for (i in 1:10) {
  df <- data.frame(x = 1:5, y = 6:10)
  write.csv(df, sprintf("~/tmp/tmpdir/file%i.csv", i), row.names = FALSE)
  ## empty file
  file.create(sprintf("~/tmp/tmpdir/emptyfile%i.csv", i))
}
For a different approach with explicit error handling, consider wrapping read.table() in tryCatch() to handle anything else that might go wrong:
for (f in files) {
  data <- tryCatch({
    if (file.size(f) > 0) {
      read.table(f, header = FALSE, sep = ",")
    }
  }, error = function(err) {
    # error handler picks up where the error was generated; report it and return NULL
    message("read.table() didn't work: ", conditionMessage(err))
    NULL
  })
  if (is.null(data)) next  # skip empty or unreadable files
  data$species <- sub("c\\.txt$", "", f)
  DF <- rbind(DF, data)
}
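Both ideas can be combined with the question's goal of adding a species column: drop empty files up front with file.size(), read the rest, and bind them once at the end instead of growing DF inside the loop. A minimal sketch, using two hypothetical files (ratc.txt with data and an empty mousec.txt) in a temporary directory:

```r
# Hypothetical folder: one data file and one empty file
in_dir <- tempfile("species")
dir.create(in_dir)
writeLines(c("1,2", "3,4"), file.path(in_dir, "ratc.txt"))
file.create(file.path(in_dir, "mousec.txt"))

files <- list.files(in_dir, pattern = "c\\.txt$", full.names = TRUE)
files <- files[file.size(files) > 0]   # drop empty files before reading

df_list <- lapply(files, function(f) {
  d <- read.table(f, header = FALSE, sep = ",")
  d$species <- sub("c\\.txt$", "", basename(f))  # "ratc.txt" -> "rat"
  d
})
DF <- do.call(rbind, df_list)
DF
```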

Looping a function over multiple files

I wrote a simple function:
myfunction <- function(fileName, stringsAsFactors=TRUE,
check.names=FALSE,
skip =1,...) {
Data <- read.delim(fileName, skip = skip,
stringsAsFactors=stringsAsFactors,
check.names = check.names, ...)
cb <- list()
Index <- as.numeric(as.factor(Data[,1]))
cb <- cbind(Data, Index)
return(cb)
}
This function reads the file into Data, creates an Index from the first column of Data, and then cbinds Data with that Index.
I want to apply this function to files named myfile_00.txt, myfile_01.txt, and so on. For a single file it looks like:
myfunction (fileName = "myfile_00.txt")
myfunction (fileName = "myfile_01.txt")
.......
I have around 1000 files, so I adapted this loop from another post:
mytxt <- dir(pattern=".txt")
n <- length(mytxt)
mylist <- vector("list", n)
for(i in 1:n) {
mylist[[i]] <- read.delim(mytxt[i], header = F, skip = 1)
}
then:
d <- lapply(mylist, myfunction)
Unfortunately it doesn't work. When I call lapply, I get this error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
'file' must be a character string or connection
Since I'm new to R, I'm probably making mistakes I can't spot.
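Like @Arun pointed out, you are reading your files twice: once in your loop, and again inside myfunction(), which you then call on the data frames rather than on file names. Since myfunction() calls read.delim() itself, it needs a file name, which is why read.table() complains that 'file' must be a character string or connection. Apply it to the file names directly instead:
files <- list.files(pattern = "\\.txt$")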
mylist <- lapply(files, myfunction)
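A self-contained check of this fix, assuming tab-delimited files whose first line should be skipped (as skip = 1 implies); the two files are hypothetical test data, and the function is the question's myfunction() with the unused cb <- list() line dropped:

```r
myfunction <- function(fileName, stringsAsFactors = TRUE,
                       check.names = FALSE, skip = 1, ...) {
  Data <- read.delim(fileName, skip = skip,
                     stringsAsFactors = stringsAsFactors,
                     check.names = check.names, ...)
  # Number the levels of the first column and attach them as a new column
  Index <- as.numeric(as.factor(Data[, 1]))
  cbind(Data, Index)
}

# Two hypothetical tab-separated files whose first line is skipped
txt_dir <- tempfile("myfiles")
dir.create(txt_dir)
writeLines(c("skip me", "site\tvalue", "b\t1", "a\t2", "b\t3"),
           file.path(txt_dir, "myfile_00.txt"))
writeLines(c("skip me", "site\tvalue", "c\t4", "a\t5"),
           file.path(txt_dir, "myfile_01.txt"))

# Pass file names, not data frames, to myfunction()
files <- list.files(txt_dir, pattern = "\\.txt$", full.names = TRUE)
mylist <- lapply(files, myfunction)
mylist[[1]]$Index   # 2 1 2 (levels a, b)
```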
