I would like to pass a variable name as a string to a function, but I couldn't get it to work.
For example, one Excel file has 4 worksheets, so I need to use the following line 4 times:
sales.df<- read_xlsx("abc.xlsx", sheet ="sales")
profit.df<- read_xlsx("abc.xlsx", sheet ="profit")
revenue.df<-read_xlsx("abc.xlsx", sheet ="revenue")
budget.df<- read_xlsx("abc.xlsx", sheet ="budget")
Instead, I want to write a function:
read_func = function(sheet_name){
  sheet_name.df<- read_xlsx("abc.xlsx", sheet ="sheet_name")
  return(sheet_name.df)
}
Then call the function:
read_func(sales)
Unfortunately, it doesn't work. The sheet_name is not dynamically updated.
Thank you in advance for your kind help.
The readxl package has a function, excel_sheets(), that lists the names of all sheets in a file. You can combine it with lapply to read every sheet in one call.
library(readxl)
lapply(excel_sheets("abc.xlsx"), read_excel, path = "abc.xlsx")
It is a part of the tidyverse so you can read more on it there.
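As a minimal sketch of both approaches: the function from the question works once the sheet name is passed as a string and used unquoted inside the function, and the lapply call can return a named list if the sheet names are attached first. The demo workbook, its sheet contents, and the extra path argument are assumptions made only so the example runs on its own; openxlsx is used here just to create that file.

```r
library(readxl)

# Hypothetical demo workbook so the sketch is self-contained.
path <- tempfile(fileext = ".xlsx")
openxlsx::write.xlsx(
  list(sales = data.frame(x = 1:3), profit = data.frame(y = 4:6)),
  file = path
)

# Pass the sheet name as a string and use the argument itself, not "sheet_name".
read_func <- function(sheet_name, path) {
  read_excel(path, sheet = sheet_name)
}
sales.df <- read_func("sales", path)

# Read every sheet into a list whose elements are named after the sheets.
sheet_names <- excel_sheets(path)
all_sheets <- lapply(setNames(sheet_names, sheet_names), read_excel, path = path)
```

Individual sheets are then available as all_sheets$sales, all_sheets$profit, and so on, instead of as separate variables.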
Related
I am new to R and trying to clean my dataset. I wrote my code yesterday and it worked well, but today when I run it I get an error.
When I run this line of code I get this error:
df = read.xlsx("C:/Users/......xlsx")
Please provide a sheet name OR a sheet index.
I did not change anything at all.
How can I solve it?
You need to specify the sheet argument in that function. The sheet argument takes either the name of the sheet (e.g. "data") or its position in the file (e.g. 3), like so: read.xlsx("path/to/file", sheet = 1). If read.xlsx is coming from the xlsx package rather than openxlsx, the corresponding arguments are sheetIndex and sheetName.
Alternatively, you could use another package such as readxl. Install it, then try readxl::read_excel("path/to/your/data.xlsx", sheet = x), where x is either the index or the name of the sheet.
I have an R function from a package that takes a file path as an argument, but it expects a csv and my file is an xlsx.
I've looked at the code for the function and it uses read.csv to load the file, but unfortunately I can't make any changes to the package.
Is there a good way to read in the xlsx and pass it to the function without writing it to a csv and having the function read it back in?
I came across the text argument for read.csv here:
Is there a way to use read.csv to read from a string value rather than a file in R?
This seems like it might be part of the way there, but as I said I am unable to alter the function.
Maybe you could write your own wrapper function that checks whether the file is an xlsx and, if so, creates a temporary csv file, feeds it to your function, and deletes it. Something like:
# Stand-in for the package function that only accepts a csv path
yourfunction = function(path){
  df <- read.csv(path)
  head(df)
}
library(readxl)
modified_function = function(path){
  if(grepl("\\.xlsx$", path)){
    # Convert the xlsx to a temporary csv next to the original file
    tmp <- read_xlsx(path)
    tmp_path <- paste0(gsub("\\.xlsx$", "", path), "_tmp.csv")
    write.csv(tmp, file = tmp_path, row.names = FALSE)
    output <- yourfunction(tmp_path)
    file.remove(tmp_path)
  }else{
    output <- yourfunction(path)
  }
  return(output)
}
If it is of help, here you can see how to modify only one function of a package: How to modify a function of a library in a module
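A variation on the same idea, sketched under the assumption that the package function only needs a readable csv path: writing the temporary file with tempfile() avoids touching the directory of the original file, and on.exit() removes it even if the wrapped function fails. The names csv_only_function and call_with_xlsx are hypothetical, and the demo workbook is created only to make the example runnable.

```r
library(readxl)

# Hypothetical stand-in for the package function that only accepts a csv path.
csv_only_function <- function(path) {
  head(read.csv(path))
}

# Wrapper: convert .xlsx input to a temporary csv, then delegate.
call_with_xlsx <- function(path, f = csv_only_function) {
  if (!grepl("\\.xlsx$", path, ignore.case = TRUE)) {
    return(f(path))
  }
  tmp_path <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp_path), add = TRUE)  # clean up even if f() errors
  write.csv(read_xlsx(path), file = tmp_path, row.names = FALSE)
  f(tmp_path)
}

# Demo with a throwaway workbook (openxlsx is used only to create it).
demo_xlsx <- tempfile(fileext = ".xlsx")
openxlsx::write.xlsx(data.frame(a = 1:2, b = c("x", "y")), demo_xlsx)
result <- call_with_xlsx(demo_xlsx)
```

This still writes a csv to disk, but only in the session's temporary directory, which R cleans up when the session ends.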
How can I use magrittr to pipe the output of download.file() directly to read_excel() without first saving to a temporary location?
For example, I have the following code:
download.file(www, method="curl") %>%
read_excel(x, sheet ="List 1", range="A3:L1902") -> cw
This gives me an error because I am missing the destfile= argument... any ideas?
I tried the idea of connections, but from my searches readxl doesn't support reading from URLs (you can look here and here). However, I found here something that might help you.
The rio package has a wrapper around read_excel which allows the use of URLs.
You can even add the sheet argument to choose which sheet to load. In addition, from my experience, if you know the file extension you'll use, add the format argument.
install.packages("rio") # if needed
df <- rio::import("https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls",
format = "xls", sheet = "SDTM Terminology 2018-03-30")
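If adding rio is not an option, a small helper can hide the temporary file instead. Note that download.file() returns a status code rather than the path of the downloaded file, which is why its output cannot be piped into read_excel(). The sketch below assumes the URL ends in the file extension; read_excel_url is a hypothetical name, not a readxl function.

```r
library(readxl)

# Hypothetical helper: download to a temporary file, read it, then delete it.
read_excel_url <- function(url, ...) {
  tmp <- tempfile(fileext = paste0(".", tools::file_ext(url)))
  on.exit(unlink(tmp), add = TRUE)
  download.file(url, destfile = tmp, mode = "wb", quiet = TRUE)  # "wb" keeps binary files intact
  read_excel(tmp, ...)
}
```

The extra arguments are forwarded to read_excel, so a call such as read_excel_url(www, sheet = "List 1", range = "A3:L1902") would correspond to the code in the question.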
I am using "openxlsx" package to read and write excel files. I have a fixed file with a sheet called "Data" which is used by formulas in other sheets. I want to update this Data sheet without touching the other.
I am trying the following code:
write.xlsx(x = Rev_4, file = "Revenue.xlsx", sheetName="Data")
But this erases the excel file and creates a new one with just the new data in the "Data" sheet while all else gets deleted. Any Advice?
Try this:
library(openxlsx)
wb <- loadWorkbook("Revenue.xlsx")
writeData(wb, sheet = "Data", Rev_4, colNames = FALSE)
saveWorkbook(wb, "Revenue.xlsx", overwrite = TRUE)
You need to load the complete workbook, then modify its data and then save it to disk. With writeData you can also specify the starting row and column. And you could also modify other sections before saving to disk.
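As a rough illustration of the starting row and column arguments, the sketch below builds a throwaway workbook with a "Data" sheet and a second sheet, then overwrites only the data cells beneath the existing header. The sheet name "Summary", the column names, and the values are made up for the example.

```r
library(openxlsx)

# Hypothetical workbook standing in for Revenue.xlsx.
path <- tempfile(fileext = ".xlsx")
wb <- createWorkbook()
addWorksheet(wb, "Data")
addWorksheet(wb, "Summary")
writeData(wb, "Data", data.frame(revenue = c(10, 20)))
writeData(wb, "Summary", data.frame(note = "keep me"))
saveWorkbook(wb, path, overwrite = TRUE)

# Reload, replace the values below the header row, and save.
wb <- loadWorkbook(path)
new_values <- data.frame(revenue = c(30, 40))
writeData(wb, sheet = "Data", x = new_values,
          startRow = 2, startCol = 1, colNames = FALSE)
saveWorkbook(wb, path, overwrite = TRUE)

data_after <- read.xlsx(path, sheet = "Data")
summary_after <- read.xlsx(path, sheet = "Summary")
```

With colNames = FALSE and startRow = 2, the header in row 1 is left alone, and the "Summary" sheet is written back unchanged.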
I've found the xlsx2dfs package. It depends on openxlsx and helps write many sheets to an xlsx file. Maybe it makes this easier:
Package documentation
library(xlsx2dfs)
# However, be careful, the function xlsx2dfs assumes
# that all sheets contain simple tables. If that is not the case,
# use the accepted answer!
dfs <- xlsx2dfs("Revenue.xlsx") # all sheets of file as list of dfs
dfs[["Data"]] <- Rev_4 # replace df of sheet "Data" by updated df Rev_4
dfs2xlsx(dfs, "Revenue.xlsx") # careful: this overwrites the existing file!
I am trying to import a large xlsx file into R that has many sheets of data. I was attempting to do this through XLConnect, but Java memory problems (such as those described in this thread) have prevented this technique from being successful.
Instead, I am trying to use the openxlsx package, which I have read works much faster and avoids Java altogether. But is there a way to use its read.xlsx function within a loop to read all of the sheets into separate dataframes? The technique I was using with the other package is no longer valid because commands like loadWorkbook() and getSheets() can no longer be used.
Thank you for your help.
I think the getSheetNames() function is the right function to use. It will give you a vector of the worksheet names in a file. Then you can loop over this list to read in a list of data.frames.
read_all_sheets = function(xlsxFile, ...) {
  sheet_names = openxlsx::getSheetNames(xlsxFile)
  sheet_list = as.list(rep(NA, length(sheet_names)))
  names(sheet_list) = sheet_names
  for (sn in sheet_names) {
    sheet_list[[sn]] = openxlsx::read.xlsx(xlsxFile, sheet=sn, ...)
  }
  return(sheet_list)
}
read_all_sheets(myxlsxFile)
Doing nothing more than perusing the documentation for openxlsx quickly leads one to the function sheets(), which it states is deprecated in favor of names(). Called on a workbook object, names() returns the names of all its worksheets. You can then iterate over them in a simple for loop.
I'm not sure why you say that loadWorkbook cannot be used. Again, the documentation clearly shows a function in openxlsx by that name that does roughly the same thing as in XLConnect, although its arguments are slightly different.
You can also look into the readxl package, which also does not have a Java dependency.
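A minimal sketch of that loop, assuming the workbook is loaded with loadWorkbook() and each sheet holds a simple table. The demo file and its sheet names are invented so the example runs by itself.

```r
library(openxlsx)

# Hypothetical demo workbook with two sheets.
path <- tempfile(fileext = ".xlsx")
write.xlsx(list(first = data.frame(a = 1:2), second = data.frame(b = 3:5)), path)

wb <- loadWorkbook(path)
sheet_names <- names(wb)  # worksheet names, in workbook order

# Read each sheet from the loaded workbook into a named list.
sheet_list <- list()
for (sn in sheet_names) {
  sheet_list[[sn]] <- read.xlsx(wb, sheet = sn)
}
```

Keeping the data frames in one named list, rather than in separate variables, makes it easy to apply the same cleaning step to every sheet afterwards.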
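sapply can also be used. Pass simplify = FALSE so that the result stays a named list of data frames, even when all sheets have the same number of columns:
read_all_sheets = function(xlsxFile, ...) {
  sheet_names = openxlsx::getSheetNames(xlsxFile)
  sheet_list = sapply(sheet_names, function(sn){openxlsx::read.xlsx(xlsxFile, sheet=sn, ...)}, simplify = FALSE, USE.NAMES = TRUE)
  return(sheet_list)
}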