XLConnect - readWorksheet with looping object in R

I am using R version 3.1.2 in RStudio with the XLConnect package to load, read, and write multiple xlsx files. I can do this by duplicating code and creating multiple objects, but I am trying to do it with a single object (all files are in the same folder). Please see the examples below.
I can do this by listing each file, but I want to do it using a loop:
tstA <- loadWorkbook("\\\\FS01\\DEPARTMENTFOLDERS$\\tst\\2015\\Apr\\DeptA.xlsx")
tstB <- loadWorkbook("\\\\FS01\\DEPARTMENTFOLDERS$\\tst\\2015\\Apr\\DeptB.xlsx")
This is the way I'm trying to do it, but I get an error:
dept <- c("DeptA","DeptB","DeptC")
for(dp in 1:length(dept)){
dept[dp] <- loadWorkbook("\\\\FS01\\DEPARTMENTFOLDERS$\\tst\\2015\\Apr\\",dept[dp],".xlsx")}
After this I want to use the readWorksheet function from XLConnect.
Apologies for the lame question, but I am struggling to work out how best to do this.
Thanks

You can read all the files into a list in one operation as follows (adjust pattern and sheet as needed to get the files/sheets you want):
path = "\\\\FS01\\DEPARTMENTFOLDERS$\\tst\\2015\\Apr\\"
df.list = lapply(list.files(path, pattern="xlsx$"), function(i) {
readWorksheetFromFile(paste0(path, i), sheet="YourSheetName")
})
If you want to combine all of the data frames into a single data frame, you can do this:
df <- do.call(rbind, df.list)
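If you would rather keep whole workbook objects around, closer to the loop in the question, here is a sketch of that approach (it reuses the path and department names from the question; the list name workbooks is just illustrative):
library(XLConnect)
path <- "\\\\FS01\\DEPARTMENTFOLDERS$\\tst\\2015\\Apr\\"
dept <- c("DeptA", "DeptB", "DeptC")
# loadWorkbook() takes a single file name, so build it with paste0(),
# and store each workbook in a named list instead of overwriting the
# character vector that drives the loop.
workbooks <- setNames(
  lapply(dept, function(d) loadWorkbook(paste0(path, d, ".xlsx"))),
  dept
)
# Each workbook can then be passed to readWorksheet(), e.g.
# deptA.data <- readWorksheet(workbooks[["DeptA"]], sheet = 1)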

Related

R - query all large text files in a folder without reading each file into RStudio using vroom

I have a list of 15 txt files in a folder that are each 1.5-2 GB. Is there a way to query each file for specific rows based on a condition and then load the results into a list of data frames in R? Currently, I am loading a few files at a time using
temp <- list.files(pattern = "\\.txt$")
data_list <- lapply(temp, read.delim)
names(data_list) <- temp
and then applying a custom filter function to each of the data frames in the list using lapply.
Due to RAM limitations, I cannot load entire files into my R environment and then query them. I'm looking for some code to automatically read in one file, perform the query, add the resulting data frame to a list, free up the memory, and then repeat. Thank you!
Edit:
It seems like I should use vroom instead of read.delim:
library(vroom)
temp <- list.files(pattern = "\\.txt$")
data_list <- lapply(temp, vroom)
names(data_list) <- temp
I get a few warning messages, and when I run problems(), I get:
Error in vroom_materialize(x, replace = FALSE) : argument "x" is missing, with no default
Is this an issue?
Each of the files has a different number of columns, so do I need to use map as described here?
Lastly, on each of the data frames in the list, I would like to run
filtered_list = lapply(data_list, filter, COL1 == "ABC")
Does doing so essentially read each file fully into memory, negating the benefit of vroom's lazy reading? When I run this lapply, R takes a very long time.
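The question already describes the desired pattern in words (read one file, filter it, keep only the result, free the memory, repeat), so here is a minimal sketch of that loop, assuming each file has a COL1 column as above; vroom() and dplyr::filter() are used exactly as in the question:
library(vroom)
library(dplyr)
temp <- list.files(pattern = "\\.txt$")
filtered_list <- lapply(temp, function(f) {
  d <- vroom(f)                    # columns are read lazily (altrep)
  res <- filter(d, COL1 == "ABC")  # materializes only what the condition touches
  rm(d)                            # drop the full table before the next file
  gc()                             # release memory between files
  res
})
names(filtered_list) <- temp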

How do I convert a delimited file in base R into separate columns without creating a list?

I currently do this using the following code:
write(unlist(fieldList), "c:/temp/test.txt")
temp1 <- cbind(fieldList, read.delim("c:/temp/test.txt", sep = " ", header = FALSE))
but this is slow because it forces the app to write data to a temp file on disk.
I have tried the strsplit function and the "qdap" library, but they both create lists. I need to be able to manipulate the columns separately:
NewColumn <- substr(temp1[, 5], 1, 1)
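One base-R route that avoids the temp file is read.table() with its text= argument, which parses an in-memory character vector directly. A minimal sketch, assuming fieldList is a character vector of space-delimited records (the example values here are hypothetical):
# Hypothetical stand-in for the real fieldList
fieldList <- c("a b c d e", "f g h i j")
# Parse the strings in memory instead of round-tripping through disk
temp1 <- cbind(fieldList,
               read.table(text = fieldList, sep = " ", header = FALSE,
                          stringsAsFactors = FALSE))
# Columns can then be manipulated individually, as before
NewColumn <- substr(temp1[, 5], 1, 1)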

Using a for loop to write in multiple .grd files

I am working with very large data layers for an SDM class, and because of this I ended up breaking some of my layers into a bunch of blocks to avoid memory constraints. These blocks were written out as .grd files, and now I need to read them back into R and merge them together. I am extremely new to R and programming in general, so any help would be appreciated. What I have been trying so far looks like this:
merge.coarse <- raster("coarseBlock1.grd")
for (i in 2:nBlocks) {
  merge.coarse <- merge(merge.coarse, raster(paste("coarseBlock", i, ".grd", sep = "")))
}
where my files are named coarseBlock1.grd, coarseBlock2.grd, and so on, sequentially numbered from 1 to nBlocks (259).
Any feedback would be greatly appreciated.
Using for loops is generally slow in R. Also, growing an object with functions like merge or rbind inside a for loop eats up a lot of memory, because R copies the accumulated result on every iteration.
A more efficient way to do this task is to call lapply (see this tutorial on apply functions for details) to load the files into R. This results in a list of rasters, which can then be collapsed with a single do.call:
library(raster)
rasters <- lapply(list.files(GRDFolder, pattern = "\\.grd$", full.names = TRUE), raster)
merge.coarse <- do.call(merge, rasters)
I'm not too familiar with .grd files, but this overall process should at least get you going in the right direction. Assuming all your .grd files (1 through 259) are stored in the same folder (which I will refer to as GRDFolder), you can try this:
files <- list.files(GRDFolder, pattern = "\\.grd$", full.names = TRUE)
merge.coarse <- raster(files[1])
for (filename in files[-1])
{
  temp <- raster(filename)
  merge.coarse <- merge(merge.coarse, temp)
}

Assigning unknown variable to new variable name

I have to load in many files and transform their data. Each file contains only one data.table; however, the tables have various names.
I would like to run a single script over all of the files -- to do so, I must assign the unknown data.table to a common name ... say blob.
What is the R way of doing this? At present, my best guess (which seems like a hack, but works) is to load the data.table into a new environment, and then: assign('blob', get(objects(envir=newEnv)[1], envir=newEnv)).
In a reproducible context this is:
newEnv <- new.env()
assign('a', 1:10, envir = newEnv)
assign('blob', get(objects(envir = newEnv)[1], envir = newEnv))
Is there a better way?
The R way is to create a single object, i.e. a single list of data tables.
Here is some pseudocode that contains three steps:
1. Use list.files() to create a vector of the file names in a folder.
2. Use lapply() and read.csv() to read the files and create a list of data frames. Replace read.csv() with read.table() or whatever is appropriate for your data.
3. Use lapply() again, this time with as.data.table(), to convert the data frames to data tables.
The pseudocode:
library(data.table)
filenames <- list.files("path/to/files", full.names = TRUE)
dat <- lapply(filenames, read.csv)
dat <- lapply(dat, as.data.table)
Your result should be a single list, called dat, containing a data table for each of your original files.
I assume that you saved the data.tables using save(), something like this:
d1 <- data.table(value=1:10)
save(d1, file="data1.rdata")
and your problem is that when you load the file you don't know the name (here: d1) that you used when saving the file. Correct?
I suggest you instead use saveRDS() and readRDS() for saving/loading single objects:
d1 <- data.table(value=1:10)
saveRDS(d1, file="data1.rds")
blob <- readRDS("data1.rds")
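To cover the many-files case from the question, here is a short sketch that collects every .rds file in a folder into one named list (assuming the files were all written with saveRDS() as above; the folder path is a placeholder):
files <- list.files("path/to/files", pattern = "\\.rds$", full.names = TRUE)
blobs <- setNames(lapply(files, readRDS), basename(files))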

How can I add blank top rows to an xlsx file using the xlsx package?

I would like to add two blank rows to the top of an xlsx file that I create.
So far I have tried:
library("xlsx")
fn1 <- 'test1.xlsx'
fn2 <- 'test2.xlsx'
write.xlsx(matrix(rnorm(25), 5), fn1)
wb <- loadWorkbook(fn1)
rows <- getRows(getSheets(wb)[[1]])
for (i in 1:length(rows))
  rows[[i]]$setRowNum(as.integer(i + 1))
saveWorkbook(wb,fn2)
But test2.xlsx is empty!
I'm a bit confused by what you're trying to do with the for loop, but: you could read the file back in as a data frame, create a dummy object with the same number of columns, then use rbind() to join the dummy and the data before writing fn2.
fn1 <- 'test1.xlsx'
df <- read.xlsx(fn1, 1)  # a data frame (loadWorkbook() returns a java object that rbind() can't handle)
dummy <- df[c(1, 2), ]
dummy[,] <- NA           # set all values of dummy to whatever you want, e.g. NA or ""
fn2 <- rbind(dummy, df)
write.xlsx(fn2, 'test2.xlsx', row.names = FALSE)
Hope that helps
So the xlsx package actually interfaces with a Java library (Apache POI) in order to pull in and modify workbooks. The read.xlsx and write.xlsx functions are convenience wrappers that read and write data frames within R, without your having to manually parse the individual cells and rows using the Java objects. The loadWorkbook and getRows functions give you access to the actual Java objects, which you can then use to modify things such as cell styles.
However, if all you want to do is add a blank row to your spreadsheet before you output it, the easiest way is to add a blank row to your data frame before you export it (as Chris mentioned). You can accomplish this with the following variant on your code:
library("xlsx")
fn1 <- 'test1.xlsx'
fn2 <- 'test2.xlsx'
# I have added row.names=FALSE because the read.xlsx function
# does not have a parameter to include row names
write.xlsx(matrix(rnorm(25),5),fn1,row.names=FALSE)
# If you read your data back in using the read.xlsx function
# you will have a data frame rather than a series of java object references
wb <- read.xlsx(fn1,1)
# Now that we have a data frame we can add a blank row at the top
wb <- rbind.data.frame(rep("", 5), wb)
# Then we write to the file using the same function as before
write.xlsx(wb,fn2,row.names=FALSE)
If you desire to use the advanced functionality in the Java libraries, some of it is not currently implemented in the R package, so you will have to call it directly using .jcall. If you decide to pursue this line of action, I definitely recommend using .jmethods on the objects produced (e.g. .jmethods(rows[[1]])), which lists the functions you can call on the object (at the cell level these are quite extensive).
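As a small illustration of that inspection step, here is a sketch that reuses fn1 and the getRows() call from the question (rJava provides .jmethods; the exact methods printed depend on the underlying POI version):
library(xlsx)
wb <- loadWorkbook('test1.xlsx')
rows <- getRows(getSheets(wb)[[1]])
# List the java methods available on a row object, e.g. setRowNum()
# as used in the question
rJava::.jmethods(rows[[1]])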
