Importing Excel files into R, xlsx or xls - r

Please can someone help me on the best way to import an excel 2007 (.xlsx) file into R. I have tried several methods and none seems to work. I have upgraded to 2.13.1, windows XP, xlsx 0.3.0, I don't know why the error keeps coming up. I tried:
AB<-read.xlsx("C:/AB_DNA_Tag_Numbers.xlsx","DNA_Tag_Numbers")
OR
AB<-read.xlsx("C:/AB_DNA_Tag_Numbers.xlsx",1)
but I get the error:
Error in .jnew("java/io/FileInputStream", file) :
java.io.FileNotFoundException: C:\AB_DNA_Tag_Numbers.xlsx (The system cannot find the file specified)
Thank you.

For a solution that is free of fiddly external dependencies*, there is now readxl:
The readxl package makes it easy to get data out of Excel and into R.
Compared to many of the existing packages (e.g. gdata, xlsx,
xlsReadWrite) readxl has no external dependencies so it's easy to
install and use on all operating systems. It is designed to work with
tabular data stored in a single sheet.
Readxl supports both the legacy .xls format and the modern xml-based
.xlsx format. .xls support is made possible the with libxls C library,
which abstracts away many of the complexities of the underlying binary
format. To parse .xlsx, we use the RapidXML C++ library.
It can be installed like so:
install.packages("readxl") # CRAN version
or
devtools::install_github("hadley/readxl") # development version
Usage
library(readxl)
# read_excel reads both xls and xlsx files
read_excel("my-old-spreadsheet.xls")
read_excel("my-new-spreadsheet.xlsx")
# Specify sheet with a number or name
read_excel("my-spreadsheet.xls", sheet = "data")
read_excel("my-spreadsheet.xls", sheet = 2)
# If NAs are represented by something other than blank cells,
# set the na argument
read_excel("my-spreadsheet.xls", na = "NA")
* not strictly true, it requires the Rcpp package, which in turn requires Rtools (for Windows) or Xcode (for OSX), which are dependencies external to R. But they don't require any fiddling with paths, etc., so that's an advantage over Java and Perl dependencies.
Update There is now the rexcel package. This promises to get Excel formatting, functions and many other kinds of information from the Excel file and into R.

You may also want to try the XLConnect package. I've had better luck with it than xlsx (plus it can read .xls files too).
library(XLConnect)
theData <- readWorksheet(loadWorkbook("C:/AB_DNA_Tag_Numbers.xlsx"),sheet=1)
also, if you are having trouble with your file not being found, try selecting it with file.choose().

I would definitely try the read.xls function in the gdata package, which is considerably more mature than the xlsx package. It may require Perl ...

Update
As the Answer below is now somewhat outdated, I'd just draw attention to the readxl package. If the Excel sheet is well formatted/lain out then I would now use readxl to read from the workbook. If sheets are poorly formatted/lain out then I would still export to CSV and then handle the problems in R either via read.csv() or plain old readLines().
Original
My preferred way is to save individual Excel sheets in comma separated value (CSV) files. On Windows, these files are associated with Excel so you don't loose the double-click-open-in-Excel "feature".
CSV files can be read into R using read.csv(), or, if you are in a location or using a computer set up with some European settings (where , is used as the decimal place), using read.csv2().
These functions have sensible defaults that makes reading appropriately formatted files simple. Just keep any labels for samples or variables in the first row or column.
Added benefits of storing files in CSV are that as the files are plain text they can be passed around very easily and you can be confident they will open anywhere; one doesn't need Excel to look at or edit the data.

Example 2012:
library("xlsx")
FirstTable <- read.xlsx("MyExcelFile.xlsx", 1 , stringsAsFactors=F)
SecondTable <- read.xlsx("MyExcelFile.xlsx", 2 , stringsAsFactors=F)
I would try 'xlsx' package for it is easy to handle and seems mature enough
worked fine for me and did not need any additionals like Perl or whatever
Example 2015:
library("readxl")
FirstTable <- read_excel("MyExcelFile.xlsx", 1)
SecondTable <- read_excel("MyExcelFile.xlsx", 2)
nowadays I use readxl and have made good experience with it.
no extra stuff needed
good performance

This new package looks nice http://cran.r-project.org/web/packages/openxlsx/openxlsx.pdf
It doesn't require rJava and is using 'Rcpp' for speed.

If you are running into the same problem and R is giving you an error -- could not find function ".jnew" -- Just install the library rJava. Or if you have it already just run the line library(rJava). That should be the problem.
Also, it should be clear to everybody that csv and txt files are easier to work with, but life is not easy and sometimes you just have to open an xlsx.

For me the openxlx package worked in the easiest way.
install.packages("openxlsx")
library(openxlsx)
rawData<-read.xlsx("your.xlsx");

I recently discovered Schaun Wheeler's function for importing excel files into R after realising that the xlxs package hadn't been updated for R 3.1.0.
https://gist.github.com/schaunwheeler/5825002
The file name needs to have the ".xlsx" extension and the file can't be open when you run the function.
This function is really useful for accessing other peoples work. The main advantages over using the read.csv function are when
Importing multiple excel files
Importing large files
Files that are updated regularly
Using the read.csv function requires manual opening and saving of each Excel document which is time consuming and very boring. Using Schaun's function to automate the workflow is therefore a massive help.
Big props to Schaun for this solution.

What's your operating system? What version of R are you running: 32-bit or 64-bit? What version of Java do you have installed?
I had a similar error when I first started using the read.xlsx() function and discovered that my issue (which may or may not be related to yours; at a minimum, this response should be viewed as "try this, too") was related to the incompatability of .xlsx pacakge with 64-bit Java. I'm fairly certain that the .xlsx package requires 32-bit Java.
Use 32-bit R and make sure that 32-bit Java is installed. This may address your issue.

You have checked that R is actually able to find the file, e.g. file.exists("C:/AB_DNA_Tag_Numbers.xlsx") ? – Ben Bolker Aug 14 '11 at 23:05
Above comment should've solved your problem:
require("xlsx")
read.xlsx("filepath/filename.xlsx",1)
should work fine after that.

I have tried very hard on all the answers above. However, they did not actually help because I used a mac. The rio library has this import function which can basically import any type of data file into Rstudio, even those file using languages other than English!
Try codes below:
library(rio)
AB <- import("C:/AB_DNA_Tag_Numbers.xlsx")
AB <- AB[,1]
Hope this help.
For more detailed reference: https://cran.r-project.org/web/packages/rio/vignettes/rio.html

You may be able to keep multiple tabs and more formatting information if you export to an OpenDocument Spreadsheet file (ods) or an older Excel format and import it with the ODS reader or the Excel reader you mentioned above.

As stated by many here, I am writing the same thing but with an additional point!
At first we need to make sure that our R Studio has these two packages installed:
"readxl"
"XLConnect"
In order to load a package in R you can use the below function:
install.packages("readxl/XLConnect")
library(XLConnect)
search()
search will display the list of current packages being available in your R Studio.
Now another catch, even though you might have these two packages but still you may encounter problem while reading "xlsx" file and the error could be like "error: more columns than column name"
To solve this issue you can simply resave your excel sheet "xlsx" in to
"CSV (Comma delimited)"
and your life will be super easy....
Have fun!!

The installation of xlsx package require rJava and xlsxjars. Indirectly they require the specific (32 or 64 bit) java runtime environment on the system.
Pro of read.xlsx: In the same package there are read.xlsx and write.xlsx
Con: Very low speed
As suggested, the easy way is to save in .csv format from excel.
Simple benchmark on a 5800x15 dataset (median)
read.xlsx: >10000ms
read_xlsx: 70ms
read.csv: 15ms

Related

Is there a way to accelerate formatted table writing from R to excel?

I have a 174603 rows and 178 column dataframe, which I'm importing to Excel using openxlsx::saveWorkbook, (Using this package to obtain the aforementioned format of cells, with colors, header styles and so on). But the process is extremely slow, (depending on the amount of memory used by the machine it can take from 7 to 17 minutes!!) and I need a way to reduce this significantly (Doesn't need to be seconds, but anything bellow 5 min would be OK)
I've already searched other questions but they all seem to focus either in exporting to R (I have no problem with this) or writing non-formatted files to R (using write.csv and other options of the like)
Apparently I can't use xlsx package because of the settings on my computer (industrial computer, Check comments on This question)
Any suggestions regarding packages or other functionalities inside this package to make this run faster would be highly appreciated.
This question has some time ,but I had the same problem as you and came up with a solution worth mentioning.
There is package called writexl that has implemented a way to export a data frame to Excel using the C library libxlsxwriter. You can export to excel using the next code:
library(writexl)
writexl::write_xlsx(df, "Excel.xlsx",format_headers = TRUE)
The parameter format_headers only apply centered and bold titles, but I had edited the C code of the its source in github writexl library made by ropensci.
You can download it or clone it. Inside src folder you can edit write_xlsx.c file.
For example in the part that he is inserting the header format
//how to format headers (bold + center)
lxw_format * title = workbook_add_format(workbook);
format_set_bold(title);
format_set_align(title, LXW_ALIGN_CENTER);
you can add this lines to add background color to the header
format_set_pattern (title, LXW_PATTERN_SOLID);
format_set_bg_color(title, 0x8DC4E4);
There are lots of formating you can do searching in the libxlsxwriter library
When you have finished editing that file and given you have the source code in a folder called writexl, you can build and install the edited package by
shell("R CMD build writexl")
install.packages("writexl_1.2.tar.gz", repos = NULL)
Exporting again using the first chunk of code will generate the Excel with formats and faster than any other library I know about.
Hope this helps.
Have you tried ;
write.table(GroupsAlldata, file = 'Groupsalldata.txt')
in order to obtain it in txt format.
Then on Excel, you can simply transfer you can 'text to column' to put your data into a table
good luck

read in xls file in r from business objects

I have downloaded a xls file from business objects and want to read it in R.
I have tried several options, the easiest one being:
library("readxl")
txt=read_excel("file.xls", sheet = 2)
The problem is that it gives me an empty tibble. However, if I open the xls file, do absolutely nothing, save it and try again, it does work!
Since I need to make a data pipeline I want it to work right away without this weird workaround.
Any idea what the problem is? My own thoughts went to some kind of security, read-only, adminstrator permission kind of property but couldn't figure it out.
Kind regards!
Piet
I always try to avoid importing .xls files due to such problems. Where possible I always import it as a .csv file. Depending on the structure of the .xls file, however, this is not always possible or may be extra work if you have many tabs within your .xls file.
If possible, export your .xls as an .csv file and then import it using read.table() or use a function through the many available packages such as data.table or tidyverse.
I don't know much, but it's a bug with the package. You can go down to readxl 1.0.0 and it works.
GitHub issue mentioning dropping versions: https://github.com/tidyverse/readxl/issues/474
How to go down to the version you want: https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages
You can use the package data.table that provides a very easy and faster method to read and write .csv or .xls/.xlsx with the fwrite and fread functions. It package already has an automatic separate detector.
You cand find more information about this package here.

Cannot open the file after writing xlsx file using R openxlsx package

At first, I tried to read and write xlsx files in R (while comparing the output between the xlsx and openxlsx packages).
I work on mac os.
It worked well to read xlsx files using the read.xlsx() from both packages.
However, when it comes to writing a new file, only the xlsx::write.xlsx() worked.
To be more exact, the openxlsx::write.xlsx() command gave no error, and an xlsx file was successfully saved, but when I tried to open the file using Numbers (by double clicking on the file in the folder), an error message popped up telling me the file cannot be opened.
I tried different data frames, but the results remained the same. To show an example, please refer to the following line which I took directly from R help page. It should work but does not work for me.
write.xlsx(iris, file = "writeXLSX1.xlsx", colNames = TRUE, borders = "columns")
Anyone tell me what the problem is? I tried to google for old threads but it seems no one is discussing this problem. I know in many similar threads people suggested changing packages, okay...before that, can you tell me what the limitations of openxlsx are?

read.sas7bdat unable to read compressed file

I am trying to read a .sas7bdat file in R. When I use the command
library(sas7bdat)
read.sas7bdat("filename")
I get the following error:
Error in read.sas7bdat("county2.sas7bdat") : file contains compressed data
I do not have experience with SAS, so any help will be highly appreciated.
Thanks!
According to the sas7bdat vignette [vignette('sas7bdat')], COMPRESS=BINARY (or COMPRESS=YES) is not currently supported as of 2013 (and this was the vignette active on 6/16/2014 when I wrote this). COMPRESS=CHAR is supported.
These are basically internal compression routines, intended to make filesizes smaller. They're not as good as gz or similar (not nearly as good), but they're supported by SAS transparently while writing SAS programs. Obviously they change the file format significantly, hence the lack of implementation yet.
If you have SAS, you need to write these to an uncompressed dataset.
options compress=no;
libname lib '//drive/path/to/files';
data lib.want;
set lib.have;
run;
That's the simplest way (of many), assuming you have a libname defined as lib as above and change have and want to names that are correct (have should be the filename without extension of the file, in most cases; want can be changed to anything logical with A-Z or underscore only, and 32 or fewer characters).
If you don't have SAS, you'll have to ask your data provided to make the data available uncompressed, or as a different format. If you're getting this from a PUDS somewhere on the web, you might post where you're getting it from and there might be a way to help you identify an uncompressed source.
This admittedly is not a pure R solution, but in many situations (e.g. if you aren't on a pc and don't have the ability to write the SAS file yourself) the other solutions posted are not workable.
Fortunately, Python has a module (https://pypi.python.org/pypi/sas7bdat) which supports reading compressed SAS data sets - it's certainly better using this than needing to acquire SAS if you don't already have it. Once you extract the file and save it to text via Python, you can then access it in R.
from sas7bdat import SAS7BDAT
import pandas as pd
InFileName = "myfile.sas7bdat"
OutFileName = "myfile.txt"
with SAS7BDAT(InFileName) as f:
df = f.to_data_frame()
df.to_csv(path_or_buf = OutFileName, sep = "\t", encoding = 'utf-8', index = False)
The haven package can read compressed SAS-files:
library(haven)
df <- read_sas("sasfile.sas7bdat")
But only SAS-files which are compressed using compress=char, but not compress=binary.
So haven will be able to read this SAS-file:
data output.compressed_data_char (compress=char);
set inputdata;
run;
But not this SAS-file:
data output.compressed_data_binary (compress=binary);
set inputdata;
run;
https://cran.r-project.org/package=haven
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001002773.htm
"RevoScaleR" is a good package to read SAS data sets (compressed or uncompressed).You can use rxImport function of this package. Below is the example
Importing library
library(RevoScaleR)
Reading data
R_df_name <- rxImport("fake_path/file_name.sas7bdat")
The speed of this function is far better than haven/sas7bdat/sas7bdat.parso. I hope this helps anyone who struggles to read SAS data sets in R.
Cheers!
I found R to be the easiest for this kind of challenge, especially with compressed sas7dbat files, three simple lines:
library(haven)
data <- read_sas("yourfile.sas7dbat")
and then transform it to csv
write.csv(data,"data.csv")

What's a robust method in R for importing from and exporting data to Excel?

I've used RODBC for some time to import Excel spreadsheets with mostly good results. However I have had no luck writing to an Excel spreadsheet. Also are there favorable differences using the xlsx format with Excel2007?
I've used the technique described here: Export Data Frames To Multi-worksheet Excel File
The R Data Import/Export manual should be considered the best source of advice for these questions.
For reading you can indeed use the RODBC package. An easier solutoion may be read.xls() from the gdata
For writing you can use one of the wrapper packages such as WriteXLS which wraps around Perl libraries that know how to write in the proprietary and not formally documented xls format.
In general, xlsx will not be a solution as this format is newer, once again proprietary and not documented. For that reason there are even fewer tools coping with this.
XLConnect works well. It is cross-platform. It can read and write xls and xlsx files. See this previous answer
This question is 3 years old, but I'll throw this in: if you want to write to the spreadsheet, remember to add readOnly=FALSE as an argument to odbcConnectExcel and odbcConnectExcel2007.
There is a package that may help with dealing with excel 2007, but I haven't tried it.
http://www.omegahat.org/RExcelXML/

Resources