I am currently using the openxlsx package to read a large Excel file (~70 MB, about 400,000 rows). I have tried other packages (XLConnect, xlsx, readxl), but they all either throw an error or bring my computer to a standstill. However, a big problem with openxlsx::read.xlsx is that it does not import all the columns in the worksheet, as detailed below:
The picture above is a preview of the Excel file I need to import. It has 15 columns. However, when I import this file into an R data frame using openxlsx::read.xlsx, only 5 columns are imported, as shown below:
It seems that openxlsx in this case imports only the columns with date and numerical values (columns 8, 9, 10, 11 and 15) and ignores the rest. Can anyone explain the reason for this behavior, and is there any way to remedy it (i.e. get openxlsx to import all columns)? Thank you very much!
I had a similar issue today; I believe the cause was the way the file was created (by SAS). Have you tried opening the file in Excel so that it interprets all the formatting correctly?
My issue was solved by simply opening, saving, and closing the file.
Alternatively, if you've since solved this issue another way, I would like to hear about it.
I can't explain why openxlsx behaves like that, but the readxl package seems to work in this case.
I have a data frame with 174,603 rows and 178 columns, which I'm exporting to Excel using openxlsx::saveWorkbook (I use this package to get the formatting I need: cell colors, header styles and so on). The process is extremely slow (depending on the memory used by the machine, it can take from 7 to 17 minutes!) and I need a way to reduce this significantly (it doesn't need to be seconds, but anything below 5 minutes would be OK).
I've already searched other questions, but they all seem to focus either on importing into R (I have no problem with this) or on writing non-formatted files from R (using write.csv and similar options).
Apparently I can't use the xlsx package because of the settings on my computer (an industrial computer; see the comments on this question).
Any suggestions regarding packages or other functionalities inside this package to make this run faster would be highly appreciated.
This question is a bit old, but I had the same problem and came up with a solution worth mentioning.
There is a package called writexl that exports a data frame to Excel using the C library libxlsxwriter. You can export to Excel with the following code:
library(writexl)
writexl::write_xlsx(df, "Excel.xlsx", format_headers = TRUE)
The format_headers parameter only makes the titles centered and bold, but I edited the C code in the source of the writexl library, which is maintained by rOpenSci on GitHub.
You can download or clone it. Inside the src folder you can edit the write_xlsx.c file.
For example, in the part where the header format is set
//how to format headers (bold + center)
lxw_format * title = workbook_add_format(workbook);
format_set_bold(title);
format_set_align(title, LXW_ALIGN_CENTER);
you can add these lines to give the header a background color
format_set_pattern (title, LXW_PATTERN_SOLID);
format_set_bg_color(title, 0x8DC4E4);
There are many more formatting options available; search the libxlsxwriter documentation.
When you have finished editing that file, and assuming the source code is in a folder called writexl, you can build and install the edited package as follows (shell() is Windows-only; use system() on other platforms):
shell("R CMD build writexl")
install.packages("writexl_1.2.tar.gz", repos = NULL)
Exporting again with the first chunk of code will generate the formatted Excel file, and faster than any other library I know of.
Hope this helps.
Have you tried:
write.table(GroupsAlldata, file = 'Groupsalldata.txt')
to get the data in .txt format?
Then, in Excel, you can simply use 'Text to Columns' to split your data into a table.
Good luck.
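A minimal sketch of that export, assuming a small stand-in data frame (the real GroupsAlldata is not shown in the question). With the defaults, write.table() separates fields with spaces, quotes strings and writes row names, so the arguments below are my suggestion for a file that Excel splits cleanly on tabs, not something the answer above specifies.

```r
# Hypothetical stand-in for the asker's data frame.
GroupsAlldata <- data.frame(
  id    = 1:3,
  group = c("a", "b", "c"),
  value = c(1.5, 2.25, 3)
)

# Tab-separated, no row names, no quotes around strings.
out_file <- file.path(tempdir(), "Groupsalldata.txt")
write.table(GroupsAlldata, file = out_file,
            sep = "\t", row.names = FALSE, quote = FALSE)

# Inspect the first lines of the written file.
first_lines <- readLines(out_file, n = 2)
print(first_lines)
```

This stays a plain-text export: colors and header styles still have to be applied in Excel afterwards.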
I have downloaded an .xls file from Business Objects and want to read it in R.
I have tried several options, the easiest one being:
library("readxl")
txt <- read_excel("file.xls", sheet = 2)
The problem is that it gives me an empty tibble. However, if I open the xls file, do absolutely nothing, save it and try again, it does work!
Since I need to build a data pipeline, I want it to work right away without this weird workaround.
Any idea what the problem is? My own thoughts went to some kind of security, read-only, or administrator-permission property, but I couldn't figure it out.
Kind regards!
Piet
I always try to avoid importing .xls files because of problems like this. Where possible, I import the data as a .csv file instead. Depending on the structure of the .xls file, however, this is not always possible, or it may mean extra work if the file has many tabs.
If possible, export your .xls as a .csv file and then import it using read.table(), or use a function from one of the many available packages such as data.table or the tidyverse.
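As a rough sketch of the import step, assuming the sheet has already been exported to .csv: the file name, column names and column types below are invented for illustration. read.table() needs header and sep spelled out for a .csv, whereas read.csv() sets them by default.

```r
# Hypothetical file standing in for a sheet exported from Excel as .csv.
csv_file <- file.path(tempdir(), "exported_sheet.csv")
writeLines(c("name,amount,date",
             "alpha,10.5,2020-01-31",
             "beta,20,2020-02-29"),
           csv_file)

# Declaring column types up front avoids surprises from type guessing.
dat <- read.table(csv_file, header = TRUE, sep = ",",
                  stringsAsFactors = FALSE,
                  colClasses = c("character", "numeric", "Date"))
str(dat)
```

Each tab of a workbook would need its own .csv export, which is the extra work mentioned above.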
I don't know the details, but this is a bug in the package. If you downgrade to readxl 1.0.0, it works.
GitHub issue that mentions downgrading: https://github.com/tidyverse/readxl/issues/474
How to install the version you want: https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages
You can use the data.table package, whose fread and fwrite functions give you a fast and easy way to read and write delimited text files such as .csv. Note that they do not read .xls/.xlsx files directly, so the sheet has to be exported to .csv first. fread also detects the separator automatically.
You can find more information about this package here.
I'd like to be able to import xlsx files of varying lengths into R. I'm currently using the function read.xlsx from R's xlsx package to import the xlsx files into R, and unfortunately it drops empty rows. Is there a way that I can import every row of an xlsx file up until the last row with content without dropping empty rows?
That package has not been updated on CRAN since 2014 (though it looks like there was some work in 2017 at https://github.com/dragua/xlsx). I suggest either readxl or openxlsx:
readxl::read_excel("file_with_blank_row.xlsx")
openxlsx::read.xlsx("file_with_blank_row.xlsx", skipEmptyRows=FALSE)
As noted by r2evans, both readxl and openxlsx have options to turn off skipping of empty rows. However, regardless of those switches, they will silently drop leading empty rows.
openxlsx doesn't seem to offer the possibility of altering that behaviour.
readxl has a range parameter that will indeed keep all empty rows. This is necessary if you're hoping to edit the same Excel file in very specific locations.
You need something like readxl::read_excel("path_to_your.xlsx", range = readxl::cell_limits(c(1, NA), c(NA, NA))). Using NA for all values apparently causes the function to revert to the default and drop leading empty rows.
Try this:
library("readxl")
my_data <- read_xlsx("file_with_blank_row.xlsx")
This might be a noob question but I have a problem with exporting and importing a CSV.
I export a CSV with values such as 300425.25. When I open it in Excel, it is all comma-delimited, as expected. When I use Data > Text to Columns, Excel changes my values to 300425,25 every time (I have tried every combination of decimal separator under Advanced). This is fine to work with in Excel, but when I import the file back into R I'm stuck with a comma as the decimal mark, which makes the values unusable in the rest of my R code.
I never had this problem with .csv exports until I recently cleaned my computer and did a fresh install. I suspect it might be a setting in RStudio or Excel.
Can somebody help me out?
Thanks in advance.
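One possible way to handle this on the R side, sketched under the assumption that Excel saves the file with semicolons between fields and a comma as the decimal mark (the usual behavior under a comma-decimal regional setting; the actual file is not shown in the question). The file contents and the as_number helper are hypothetical.

```r
# Hypothetical file as Excel might save it under a comma-decimal locale.
csv_file <- file.path(tempdir(), "excel_export.csv")
writeLines(c("id;value", "1;300425,25", "2;1200,5"), csv_file)

# read.csv2() defaults to sep = ";" and dec = ",".
dat <- read.csv2(csv_file)
print(dat$value)

# Fallback for a column that was already read as text, e.g. "300425,25".
as_number <- function(x) as.numeric(sub(",", ".", x, fixed = TRUE))
print(as_number("300425,25"))
```

If the file is still comma-separated but uses decimal commas inside quoted fields, read.csv(csv_file, dec = ",") is the variant to try instead.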
I've found the package XLConnect to be useful for exporting matrices to a CLOSED workbook, but does anyone know how to write to an OPEN workbook?
Alternatively, does anyone know of code one can write in VBA to import a matrix from an R script file?
Thanks
Mike
I've been wanting to do just this and stumbled upon excel.link, which writes easily into the active Excel sheet. The method is simple and straightforward:
library(excel.link)
xlrc[a1] <- seq(1, 10)
Note that inside the brackets you write the cell where the data will be written (if it is a dataframe, this cell will be the upper left of said dataframe).
Result in the active sheet of the active excel file:
Use the excel.link package. It works with .xlsm files and also with workbooks that are open in Excel.