Why can I not print out dataframe from Excel in R - r

Trying to print out dataframe that is created after importing Excel file into R using following code:
library("readxl")
data <- read_excel("grad programs.xlsx", sheet="Sheet2")
print(data)
But instead of getting the Excel file, I get this really long random message:
print(data)
function (..., list = character(), package = NULL, lib.loc = NULL,
verbose = getOption("verbose"), envir = .GlobalEnv, overwrite = TRUE)
{
fileExt <- function(x) {
db <- grepl("\\.[^.]+\\.(gz|bz2|xz)$", x)
ans <- sub(".*\\.", "", x)
ans[db] <- sub(".*\\.([^.]+\\.)(gz|bz2|xz)$", "\\1\\2",
x[db])
ans
}
my_read_table <- function(...) {
lcc <- Sys.getlocale("LC_COLLATE")
on.exit(Sys.setlocale("LC_COLLATE", lcc))
Sys.setlocale("LC_COLLATE", "C")
read.table(...)
}
stopifnot(is.character(list))
names <- c(as.character(substitute(list(...))[-1L]), list)
if (!is.null(package)) {
if (!is.character(package))
stop("'package' must be a character vector or NULL")
}
paths <- find.package(package, lib.loc, verbose = verbose)
if (is.null(lib.loc))
paths <- c(path.package(package, TRUE), if (!length(package)) getwd(),
paths)
paths <- unique(normalizePath(paths[file.exists(paths)]))
paths <- paths[dir.exists(file.path(paths, "data"))]
dataExts <- tools:::.make_file_exts("data")
if (length(names) == 0L) {
db <- matrix(character(), nrow = 0L, ncol = 4L)
for (path in paths) {
entries <- NULL
packageName <- if (file_test("-f", file.path(path,
"DESCRIPTION")))
basename(path)
else "."
Message is longer than that, but that's the start - any idea why get this message rather than the actual data in the Excel sheet

Related

Read csv file from S3 into spark in R

I've below code to read csv from s3 into spark
test_data <- spark_read_csv(
sc,
name = "Invites",
memory = FALSE,
path = "s3://xxxx/customer/Sample.csv")
csvcharobj <- rawToChar(test_data)
con <- textConnection(csvcharobj)
data <- read.csv(file = con)
But code is failing with below error
> csvcharobj <- rawToChar(test_data)
Error in rawToChar(test_data) : argument 'x' must be a raw vector
I have changed the code as below and it did work
test_data <- spark_read_csv(
sc,
name = "Invites",
memory = FALSE,
path = "s3://xxxx/customer/Sample.csv")
test <- as.data.table(test_data)
cols_to_mask <- c("EmailAddress")
anonymize <- function(x, algo="crc32") {
sapply(x, function(y) if(y == "" | is.na(y)) "" else digest(y, algo = algo))
}
setDT(test)
test[, (cols_to_mask) := lapply(.SD, anonymize), .SDcols = cols_to_mask]
print(test)

adding to lists together using cbind

This program works because I made the varibles inisde lapply global by using the <<- operator. However, it does not work with the real files in the real program. These are .tsv files whith named columns. The answer I get when I run the real program is: Error: (converted from warning) Error in : (converted from warning) Error in : arguments imply differing number of rows: 3455, 4319. What might be causing this?
lc <- list("test.txt", "test.txt", "test.txt", "test.txt")
lc1 <- list("test.txt", "test.txt", "test.txt")
lc2 <- list("test.txt", "test.txt")
#list of lists. The lists contain file names
lc <- list(lc, lc1, lc2)
#new names for the three lists in the list of lists
new_dataFns <- list("name1", "name2", "name3")
file_paths <- NULL
new_path <- NULL
#add the file names to the path and read and merge the contents of each list in the list of lists
lapply(
lc,
function(lc) {
filenames <- file.path(getwd(), lc)
dataList <<- lapply(filenames, function (lc) read.table(file=lc, header=TRUE))
dataList <<- lapply(dataList, function(dataList) {merge(as.data.frame(dataList),as.data.frame(dataList))})
}
)
#add the new name of the file to the path total will be 3 paths/fille_newname.tsv.
lapply(new_dataFns, function(new_dataFns) {new_path <<- file.path(getwd(), new_dataFns)})
print(new_path)
print(dataList)
finalFiles <- merge(as.data.frame(dataList), as.data.frame(new_path))
print(finalFiles)
I found a solution to the problem by writing a different type of code. Please see below. The input to the function is provided by the app input widgets
glyCount1 <- function(answer = NULL, fileChoice = NULL, combination = NULL, enteredValue = NULL, nameList) {
lc = nameList
new_dataFns <- gsub(" ", "", nameList)
first_path <- NULL
new_path <- NULL
old_path <- NULL
file_content <- NULL
for(i in 1:length(lc)){
for(j in 1:length(lc[[i]])){
if(!is.null(lc[[i]])){
first_path[[j]]<- paste(getwd(), "/", lc[[i]][j], sep = "")
tryCatch(file_content[[j]] <- read.csv(file = first_path[[i]], header = TRUE, sep = ","), error = function(e) NULL)
old_path[[j]] <- paste(getwd(), "/", i, ".csv", sep = "")
write.table(file_content[[j]], file = old_path[[j]], append = TRUE, col.names = FALSE)
}
}
}
}

How to batch process geoTIFFs in R with lapply

I have some large geoTIFFs, now I want to convert them to ASCII files, after doing some searches, I write these codes:
library(raster)
f <- list.files("inputFolder", pattern = "*.tif", full.names = TRUE)
r <- lapply(f, raster)
a <- lapply(r, writeRaster, filename = "output", format = "ascii")
What confused me is that how can I name the output files respectively, according to its original names?
I tried:
a <- lapply(r, writeRaster, filename = "outputFolder" + f, format = "ascii")
But I received error:
non-numeric argument to binary operator
Then I tried:
a <- lapply(r, writeRaster, filename = paste0(f, ".asc"), format = "ascii")
But I received:
Error in file(filename, "w") : invalid 'description' argument In
addition: Warning messages: 1: In if (filename == "") { : the
condition has length > 1 and only the first element will be used 2: In
if (!file.exists(dirname(filename))) { : the condition has length >
1 and only the first element will be used 3: In if
(toupper(x#file#name) == toupper(filename)) { : the condition has
length > 1 and only the first element will be used 4: In if
(trim(filename) == "") { : the condition has length > 1 and only the
first element will be used 5: In if (!file.exists(dirname(filename)))
{ : the condition has length > 1 and only the first element will be
used 6: In if (filename == "") { : the condition has length > 1 and
only the first element will be used 7: In if (!overwrite &
file.exists(filename)) { : the condition has length > 1 and only the
first element will be used
I think you were basically nearly there, with two corrections:
First, you're calling writeRaster for its side effects (i.e. its ability to write a file to your filesystem) so you don't need to assign the output of your lapply() loop to an object. So, removing a <- we have:
lapply(r, writeRaster, filename = paste0(f, ".asc"), format = "ascii")
Next, the filename argument won't loop through f in this way. You have two options, of which the simplest is probably to pass the #file#name slot of r to the filename argument using an anonymous function:
lapply(r, function(x) {
writeRaster(x, filename = x#file#name, format = "ascii", overwrite = TRUE)
})
Your other option would be to loop through r and f in parallel like you can in python with for r, f in..., which can be done with purrr:
library("purrr")
walk2(r, f, function(x, y) {
writeRaster(x = x, filename = y, format = "ascii")
})
Here we're using walk2() rather than map2() because we need to call the function for side effects. This loops through r and f together so you can pass one to be the object to write, and one to be the filename.
Edit: here's the code I use to reproduce the problem
library("raster")
tmp_dir = tempdir()
tmp = tempfile(tmpdir = tmp_dir, fileext = ".zip")
download.file(
"http://biogeo.ucdavis.edu/data/climate/cmip5/10m/cc26bi50.zip",
destfile = tmp
)
unzip(tmp, exdir = tmp_dir)
f = list.files(tmp_dir, pattern = ".tif$", full.names = TRUE)
r = lapply(f, raster)
# Solution one
lapply(r, function(x) {
writeRaster(x, filename = x#file#name, format = "ascii", overwrite = TRUE)
})
# solution two
library("purrr")
walk2(r, f, function(x, y) {
writeRaster(x = x, filename = y, format = "ascii")
})
To test how to do this with small files:
library(raster)
s <- stack(system.file("external/rlogo.grd", package="raster"))
writeRaster(s, file='testtif', format='GTiff', bylayer=T, overwrite=T)
f <- list.files(pattern="testtif_..tif")
Now you can use f with Phil's nice examples. You can also combine all in one step lapply:
f <- list.files("inputFolder", pattern = "*.tif", full.names = TRUE)
r <- lapply(f, function(i) { writeRaster(raster(i), filename=extension(i, '.asc'), overwrite=TRUE)} )
But if you have trouble with lapply, write a loop (it is fine!):
for (i in 1:length(f)) {
r <- raster(f[i])
ff <- extension(f[i], '.asc')
writeRaster(r, ff)
}
Or like this
for (file in f) {
r <- raster(file)
ff <- extension(file, '.asc')
writeRaster(r, ff)
}

R dataframes to multi sheet Excel Work

I'm trying to save quite a few data frames to a multi sheet Excel sheet and am getting some weird errors.
library(xlsx)
save.xlsx("WorkbookTitle.xlsx", mtcars, Titanic, iris)
save.xlsx <- function (file, ...)
{
require(xlsx, quietly = TRUE)
objects <- list(...)
fargs <- as.list(match.call(expand.dots = TRUE))
objnames <- as.character(fargs)[-c(1, 2)]
nobjects <- length(objects)
for (i in 1:nobjects) {
if (i == 1)
write.xlsx(objects[[i]], file, sheetName = objnames[i])
else write.xlsx(objects[[i]], file, sheetName = objnames[i],
append = TRUE)
}
print(paste("Workbook", file, "has", nobjects, "worksheets."))
}
gave me an actual workbook but when I try to do this with my dataframes, i get this error:
the condition has length > 1 and only the first element will be usedError in .jcall(cell, "V", "setCellValue", value) :
method setCellValue with signature ([Ljava/lang/String;)V not found
I removed row names but that didn't seem to work.

Skip empty files when importing text files

I have a folder with about 700 text files that I want to import and add a column to. I've figured out how to do this using the following code:
files = list.files(pattern = "*c.txt")
DF <- NULL
for (f in files) {
data <- read.table(f, header = F, sep=",")
data$species <- strsplit(f, split = "c.txt") <-- (column name is filename)
DF <- rbind(DF, data)
}
write.xlsx(DF,"B:/trends.xlsx")
Problem is, there are about 100 files that are empty. so the code stops at the first empty file and I get this error message:
Error in read.table(f, header = F, sep = ",") :
no lines available in input
Is there a way to skip over these empty files?
You can skip empty files by checking that file.size(some_file) > 0:
files <- list.files("~/tmp/tmpdir", pattern = "*.csv")
##
df_list <- lapply(files, function(x) {
if (!file.size(x) == 0) {
read.csv(x)
}
})
##
R> dim(do.call("rbind", df_list))
#[1] 50 2
This skips over the 10 files that are empty, and reads in the other 10 that are not.
Data:
for (i in 1:10) {
df <- data.frame(x = 1:5, y = 6:10)
write.csv(df, sprintf("~/tmp/tmpdir/file%i.csv", i), row.names = FALSE)
## empty file
system(sprintf("touch ~/tmp/tmpdir/emptyfile%i.csv", i))
}
For a different approach that introduces explicit error handling, think about a tryCatch to handle anything else bad that might happen in your read.table.
for (f in files) {
data <- tryCatch({
if (file.size(f) > 0){
read.table(f, header = F, sep=",")
}
}, error = function(err) {
# error handler picks up where error was generated
print(paste("Read.table didn't work!: ",err))
})
data$species <- strsplit(f, split = "c.txt")
DF <- rbind(DF, data)
}

Resources