Suppose I have an Excel file that I would like to read into R with the read.xlsx function. The file consists of several spreadsheets, and I do not know how many (there are about 200 such files, so checking the number of sheets manually would be a huge pain). Each spreadsheet is organized like a proper data frame.
I would like to stack those spreadsheets one on top of another.
I write something like:
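columnsILike <- c(1, 40)
PreviousEmptyDataFrame <- data.frame() # start empty so rbind has something to append to
for (i in 1:numberOfSheets) { # numberOfSheets is the value I do not know
  dfInd <- read.xlsx("myfile.xlsx", i, # number of sheet
                     colIndex = columnsILike, endRow = 201, startRow = 2,
                     header = FALSE)
  PreviousEmptyDataFrame <- rbind(PreviousEmptyDataFrame, dfInd)
}
write.csv(PreviousEmptyDataFrame, "data.csv")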
The question is: how do I find out the number of sheets in advance?
getSheets(loadWorkbook("file_path")) in the xlsx package returns a list of the sheets in the workbook, so the length of that list gives you the number of sheets.
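As a rough sketch of how that count could replace numberOfSheets in the question's loop (assuming the xlsx package and a working Java installation; demo_path and the sheet names are made up so the example can run on its own):

```r
library(xlsx)  # needs Java

# Build a small two-sheet workbook so the sketch is self-contained.
# demo_path and the sheet names are hypothetical, for illustration only.
demo_path <- tempfile(fileext = ".xlsx")
write.xlsx(data.frame(id = 1:3, value = c(10, 20, 30)), demo_path,
           sheetName = "first", row.names = FALSE)
write.xlsx(data.frame(id = 4:5, value = c(40, 50)), demo_path,
           sheetName = "second", row.names = FALSE, append = TRUE)

# Count the sheets without opening the file by hand.
numberOfSheets <- length(getSheets(loadWorkbook(demo_path)))

# Read every sheet by position and stack the results.
stacked <- do.call(rbind, lapply(seq_len(numberOfSheets), function(i) {
  read.xlsx(demo_path, sheetIndex = i, header = TRUE)
}))
```

Collecting the sheets in a list and calling rbind once avoids growing a data frame inside a loop, which tends to get slow as the number of sheets increases.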
This answer is rather late, but wouldn't this be simpler?
gdata::sheetCount("myworkbook.xlsx")
You can also use package XLConnect if the workbook isn't too large.
library(XLConnect)
wb <- loadWorkbook("myworkbook.xlsx")
result <- do.call(rbind, lapply(getSheets(wb),
                                function(sheet) readWorksheet(wb, sheet)))
I am a very basic R user and not familiar with loops or advanced R. The challenge I am facing is an Excel workbook with 50 worksheets, each comprising 1 million rows. Loading this huge workbook of approximately 5 GB into R is not possible. I am looking for a fast method in R to split this workbook into multiple CSVs or a single consolidated one.
I have tried a lot of solutions I found by searching, and the system stops responding for hours.
Please help me out with this.
What about a function like this?
library(readxl)
csv_saver <- function(sheet_number){
  csv <- read_xlsx(path = "yr_file_name.xlsx", sheet = sheet_number)
  write.csv(csv, file = paste0("sheet_", sheet_number, ".csv"))
}
lapply(1:50, csv_saver)
This reads in the sheet specified by the variable sheet_number as a data frame and then writes the data frame out as a CSV file. You then apply that function to the vector of all the numbers between 1 and 50.
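If the number of sheets is not known up front, the same idea could be driven by the sheet names instead of the hard-coded 1:50. The following is only a sketch: demo_path, out_dir and the sheet names are invented for the example, and writexl is used solely to create a small workbook to run against. It says nothing about whether a 5 GB workbook will fit in memory.

```r
library(readxl)
library(writexl)  # used only to build the demo workbook

# Hypothetical two-sheet workbook so the sketch can run on its own.
demo_path <- tempfile(fileext = ".xlsx")
write_xlsx(list(north = data.frame(x = 1:2),
                south = data.frame(x = 3:5)), demo_path)

# Hypothetical output directory for the CSV files.
out_dir <- tempfile("csv_out_")
dir.create(out_dir)

# Read one sheet by name, write it to its own CSV, return the CSV path.
csv_saver <- function(sheet_name) {
  sheet_data <- read_xlsx(path = demo_path, sheet = sheet_name)
  out_file <- file.path(out_dir, paste0("sheet_", sheet_name, ".csv"))
  write.csv(sheet_data, file = out_file, row.names = FALSE)
  out_file
}

# excel_sheets() lists the sheet names, so no count has to be hard-coded.
written <- vapply(excel_sheets(demo_path), csv_saver, character(1))
```

Because each sheet is read and written inside the function, only one sheet's data frame is referenced at a time.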
This seems like a silly question, but I really could not find a solution! I need to read only specific columns from an Excel file. The file has multiple sheets with different numbers of columns, but the ones I need to read will always be there. I can do this for CSV files, but not for Excel! This is my present code, which reads the first 14 columns (but the columns I need might not always be in the first 14). I can't just read them all, as rbind will throw an error citing a column mismatch (different numbers of columns in the sheets).
EDIT: I solved this by omitting the col_types parameter. It worked because the sheets with different column counts only had column headers. Still, this is in no way a robust solution, so I hope someone can do a better job than me.
INV <- lapply(sheets, function(X) read_excel("./Inventory.xlsx", sheet = X, col_types = c(rep("text", 14))))
names(INV) <- sheets
INV <- do.call("rbind", INV)
I am trying to do something like this:
INV <- lapply(FILES[grepl("Inventory", FILES)],
              function(n) read_csv(file = paste0(n),
                                   col_types = cols_only(DIVISION = "c",
                                                         DEPARTMENT = "i",
                                                         ITEM_ID = "c",
                                                         DESCRIPTION = "c",
                                                         UNIT_QTY = "i",
                                                         COMP_UNIT_QTY = "i",
                                                         REGION = "c",
                                                         LOCATION_TYPE = "c",
                                                         ZONE = "c",
                                                         LOCATION_ID = "c",
                                                         ATS_IND = "c",
                                                         CONTAINER_ID = "c",
                                                         STATUS = "c",
                                                         TROUBLE_CODES = "c")))
But for an Excel file. I tried using read.xlsx from openxlsx and read_excel from readxl, but neither supported doing this. There must be some other way. Don't worry about column types; I am fine with everything as characters.
I would very much appreciate if this can be done using readxl or openxlsx.
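One possible approach with readxl, sketched here under assumptions rather than offered as a definitive answer: read every column of each sheet as text, then keep only the wanted columns by name before binding. The workbook, the sheet names and the helper read_wanted are all made up for illustration, and writexl is used only to create the demo file. The sketch relies on readxl recycling a single col_types value across all columns.

```r
library(readxl)
library(writexl)  # used only to build the demo workbook

# Hypothetical workbook: two sheets with different column counts and orders.
demo_path <- tempfile(fileext = ".xlsx")
write_xlsx(list(
  a = data.frame(ITEM_ID = c("A1", "A2"), EXTRA = c(1, 2), UNIT_QTY = c(5, 6)),
  b = data.frame(UNIT_QTY = 7, ITEM_ID = "B1")
), demo_path)

# Columns to keep, wherever they appear in each sheet.
wanted <- c("ITEM_ID", "UNIT_QTY")

# Hypothetical helper: read a whole sheet as text, then select by name.
read_wanted <- function(sheet) {
  all_cols <- read_excel(demo_path, sheet = sheet, col_types = "text")
  all_cols[, wanted, drop = FALSE]
}

sheets <- excel_sheets(demo_path)
INV <- do.call(rbind, lapply(sheets, read_wanted))
```

Selecting by name puts the columns in the same order for every sheet, so rbind no longer depends on where they sit in the file. If a sheet lacks one of the wanted columns, the selection step fails with an error, which may or may not be the behaviour you want.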
I'm relatively new to R (and programming).
I have an Excel workbook with 36 sheets, but suppose that I don't know how many sheets there are and I want my code to find that out for me. I have tried something like:
options(java.parameters = "-Xmx6g")
library(XLConnect)
myWorkbook <- loadWorkbook(filename)
numberofsheets <- length(getSheets(myWorkbook))
But even though I set my memory to 6GB I still run into memory errors with XLConnect, so I would like to use other packages (e.g. xlsx, openxlsx). Is there a way to find out the number of sheets in an Excel workbook without using XLConnect?
Thanks for your help.
Maybe try:
library( readxl )
length( excel_sheets( filename ) )
This should do exactly what you want.
gdata::sheetCount("your_path_here.xlsx")
Also, to list the sheet names as a character vector:
library(purrr)
library(readxl)
file <- 'your_path_here.xlsx'
sheets <- excel_sheets(file)
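The snippet loads purrr but does not go on to use it. One plausible continuation, given here only as a sketch, is to map over the sheet names and read each sheet into a named list. The demo workbook and its sheet names are invented, and writexl is used only to create it.

```r
library(purrr)
library(readxl)
library(writexl)  # used only to build the demo workbook

# Hypothetical three-sheet workbook so the sketch can run on its own.
demo_path <- tempfile(fileext = ".xlsx")
write_xlsx(list(jan = data.frame(n = 1),
                feb = data.frame(n = 2),
                mar = data.frame(n = 3)), demo_path)

sheets <- excel_sheets(demo_path)   # character vector of sheet names
sheet_count <- length(sheets)       # number of sheets in the workbook

# Read every sheet into a list whose elements are named after the sheets.
all_sheets <- map(set_names(sheets),
                  function(s) read_excel(demo_path, sheet = s))
```

Naming the list elements after the sheets keeps track of which data frame came from which sheet.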
I was trying to read an Excel spreadsheet into an R data frame. However, some of the columns contain formulas or are linked to other external spreadsheets. Whenever I read the spreadsheet into R, many cells become NA. Is there a good way to fix this problem so that I can get the original values of those cells?
The R script I used to do the import is like the following:
options(java.parameters = "-Xmx8g")
library(XLConnect)
# Step 1 import the "raw" tab
path_cost = "..."
wb = loadWorkbook(...)
raw = readWorksheet(wb, sheet = '...', header = TRUE, useCachedValues = FALSE)
UPDATE: read_excel from the readxl package looks like a better solution. It's very fast (0.14 sec on the 1400 x 6 file I mentioned in the comments) and it imports formula results rather than the formulas themselves. It doesn't use Java, so there is no need to set any Java options.
# sheet can be a string (name of sheet) or integer (position of sheet)
raw = read_excel(file, sheet=sheet)
For more information and examples, see the short vignette.
ORIGINAL ANSWER: Try read.xlsx from the xlsx package. The help file implies that by default it evaluates formulas before importing (see the keepFormulas parameter). I checked this on a small test file and it worked for me. Formula results were imported correctly, including formulas that depend on other sheets in the same workbook and formulas that depend on other workbooks in the same directory.
One caveat: If an externally linked sheet has changed since the last time you updated the links on the file you're reading into R, then any values read into R that depend on external links will be the old values, not the latest ones.
The code in your case would be:
options(java.parameters = "-Xmx8g") # xlsx also uses Java; set this before loading the package
library(xlsx)
# Replace file and sheetName with appropriate values for your file
# keepFormulas=FALSE and header=TRUE are the defaults. I added them only for illustration.
raw = read.xlsx(file, sheetName=sheetName, header=TRUE, keepFormulas=FALSE)
I would like to convert an Excel file (say its name is "Jimmy") that is saved as a macro-enabled workbook (Jimmy.xlsm) to Jimmy.xlsx.
I need this to be done in a coding environment. I cannot simply change this by opening the file in Excel and assigning a different file-type. I am currently programming in R. If I use the function
file.rename("Jimmy.xlsm", "Jimmy.xlsx")
the file becomes corrupted.
In your framework you have to read in the sheet and write it back out. Suppose you have an XLSM file (with macros, I presume) called "testXLSMtoX.xlsm" containing one sheet with tabular columns of data. This will do the trick:
library(xlsx)
r <- read.xlsx("testXLSMtoX.xlsm", 1) # read the first sheet
# provides a data frame
# use the first column in the spreadsheet to create row names then delete that column from the data frame
# otherwise you will get an extra column of row index numbers in the first column
r2w <- data.frame(r[-1], row.names = r[, 1])
write.xlsx(r2w, "testXLSMtoX.xlsx") # write the sheet
The macros will be stripped out, of course.
That's an answer, but I would question what you are trying to accomplish. In general it is easier to control R from Excel than Excel from R. I use RExcel from http://rcom.univie.ac.at/, which is not open source but is pretty robust.
Here is a function that converts XLSM files to XLSX files with the R package RDCOMClient (Windows only; it drives an installed copy of Excel):
library(RDCOMClient)
convert_XLSM_File_To_XLSX <- function(path_XLSM_File, path_XLSX_File)
{
  xlApp <- COMCreate("Excel.Application")
  xlApp[["Visible"]] <- FALSE
  xlApp[["DisplayAlerts"]] <- FALSE # suppress Excel's confirmation dialogs
  xlWbk <- xlApp$Workbooks()$Open(path_XLSM_File)
  xlWbk$SaveAs(path_XLSX_File, 51) # 51 = xlOpenXMLWorkbook (.xlsx, no macros)
  xlWbk$Close()
  xlApp$Quit()
}
convert_XLSM_File_To_XLSX(path_XLSM_File, path_XLSX_File)