I have seen several questions about writing .bin (binary) files from R, but I am wondering what function or package R has that could read in .bin files?
Is R capable of reading in .bin files?
You need the hexView package, and you need to know a lot about the content of the files.
Dump the bytes using hexView::viewRaw(filename), then work out the structure from what you know about the file. Once you know what is in there, use a more specific dump function such as viewFormat. See the package documentation.
It's a lot of work, and requires a lot of knowledge specific to the creation of the file. You're unlikely to succeed, but that's how you should do it.
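If the layout of the file is already known, base R's readBin() can read the values directly. The following is a minimal sketch, assuming a made-up layout (one 4-byte little-endian integer count followed by that many 8-byte doubles); the file is written first only so that the example runs on its own.

```r
# Write a small binary file so the example is self-contained.
# Assumed (hypothetical) layout: a 4-byte integer count, then that many 8-byte doubles.
path <- tempfile(fileext = ".bin")
con <- file(path, "wb")
writeBin(3L, con, size = 4, endian = "little")
writeBin(c(1.5, 2.5, 3.5), con, size = 8, endian = "little")
close(con)

# Read it back using the same layout.
con <- file(path, "rb")
n <- readBin(con, what = "integer", n = 1, size = 4, endian = "little")
values <- readBin(con, what = "numeric", n = n, size = 8, endian = "little")
close(con)

print(n)
print(values)
```

For a real file, the sizes, types and byte order have to come from whatever documentation exists for the program that wrote it.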
I'm a master's student taking a statistics course that uses STATISTICA. I am fairly familiar with R and would like to stick with it, so I am planning to do the provided exercises in R. However, the data comes in *.sta format. Is there a way to import such a file into R? Any workaround is fine too, as long as it doesn't compromise the data.
I actually found the same question 2 years ago here but there was no answer to it.
I'd be very happy for any suggestions!
Thanks
Lukas
If R does not support importing Statistica spreadsheet files (.sta), note that Statistica can export data to .xlsx, .csv, SPSS, and SAS formats. At least one of those, .csv, can be read by R natively with read.csv().
If you enable Statistica-R integration through Statistica, a package called COMadaptR will be installed to R.
Once that is done, you have a couple of options:
You can run your R script inside of Statistica using the extensions (provided by Statistica) ActiveDataSet or Spreadsheet to access the Statistica spreadsheet as an R data frame.
You can create an R node in a workspace; connect the Statistica spreadsheet to the R node in the workspace; and then run your R script through the node. ActiveDataSet and Spreadsheet will be available in the R node context as well.
You could write a small R script (and use ActiveDataSet or Spreadsheet) and run it in Statistica; the R integration features will translate the Statistica spreadsheet to an R data frame. You could then in your script store it to disk and work with it later.
The COMadaptR package will allow you to interact with Statistica through COM from within R; you could use that approach to read data from the Statistica spreadsheet.
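The "store it to disk and work with it later" option could look roughly like the sketch below. It assumes ActiveDataSet is the data frame that Statistica's R integration provides; outside Statistica that object does not exist, so a small stand-in is used to keep the sketch runnable. The output path is hypothetical.

```r
# Inside Statistica's R integration, ActiveDataSet is supplied by Statistica.
# Elsewhere, fall back to a small stand-in data frame so this sketch still runs.
df <- if (exists("ActiveDataSet")) {
  ActiveDataSet
} else {
  data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))
}

# out_file is a hypothetical location; choose any path you can reach from plain R later.
out_file <- file.path(tempdir(), "statistica_export.rds")
saveRDS(df, out_file)

# Later, in a normal R session:
restored <- readRDS(out_file)
str(restored)
```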
Hope this helps.
I have plenty of .rda and .RData files from the R statistical package and would like to read them into SAS. Is there an (easy) way to do this?
If you can get R installed, you could execute R code within PROC IML (assuming your SAS installation is configured properly, which can be a problem), read the results into e.g. an R data.frame, and get the results back as described in the SAS documentation examples. If you do not have a SAS/IML license but do have R, write the data out from R in a format SAS reads easily (e.g. CSV). If you cannot get R installed on your system, can someone else with R installed do that for you?
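The "write it out from R as CSV" route might look like the following sketch. The object and file names are made up for illustration, and the sample .RData file is created first only so the example is self-contained.

```r
# Create a sample .RData file so the sketch is self-contained.
rdata_file <- file.path(tempdir(), "example.RData")
scores <- data.frame(id = 1:3, value = c(0.5, 1.5, 2.5))
note <- "not a data frame"
save(scores, note, file = rdata_file)

# Load into a separate environment to avoid overwriting objects in the workspace.
env <- new.env()
loaded <- load(rdata_file, envir = env)

# Write every data frame found in the file to its own CSV.
out_dir <- tempdir()
written <- character(0)
for (name in loaded) {
  obj <- get(name, envir = env)
  if (is.data.frame(obj)) {
    csv_path <- file.path(out_dir, paste0(name, ".csv"))
    write.csv(obj, csv_path, row.names = FALSE)
    written <- c(written, csv_path)
  }
}
print(written)
```

The resulting CSV files can then be read on the SAS side with its usual import tools.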
Otherwise, you may have to write a SAS program that can parse the RData format. However, that would be a measure of desperation. I believe you would find the documentation of the format here and some discussion of the not so clear documentation here (see also)
Please can someone help me on the best way to import an excel 2007 (.xlsx) file into R. I have tried several methods and none seems to work. I have upgraded to 2.13.1, windows XP, xlsx 0.3.0, I don't know why the error keeps coming up. I tried:
AB<-read.xlsx("C:/AB_DNA_Tag_Numbers.xlsx","DNA_Tag_Numbers")
OR
AB<-read.xlsx("C:/AB_DNA_Tag_Numbers.xlsx",1)
but I get the error:
Error in .jnew("java/io/FileInputStream", file) :
java.io.FileNotFoundException: C:\AB_DNA_Tag_Numbers.xlsx (The system cannot find the file specified)
Thank you.
For a solution that is free of fiddly external dependencies*, there is now readxl:
The readxl package makes it easy to get data out of Excel and into R.
Compared to many of the existing packages (e.g. gdata, xlsx,
xlsReadWrite) readxl has no external dependencies so it's easy to
install and use on all operating systems. It is designed to work with
tabular data stored in a single sheet.
Readxl supports both the legacy .xls format and the modern xml-based
.xlsx format. .xls support is made possible with the libxls C library,
which abstracts away many of the complexities of the underlying binary
format. To parse .xlsx, we use the RapidXML C++ library.
It can be installed like so:
install.packages("readxl") # CRAN version
or
devtools::install_github("hadley/readxl") # development version
Usage
library(readxl)
# read_excel reads both xls and xlsx files
read_excel("my-old-spreadsheet.xls")
read_excel("my-new-spreadsheet.xlsx")
# Specify sheet with a number or name
read_excel("my-spreadsheet.xls", sheet = "data")
read_excel("my-spreadsheet.xls", sheet = 2)
# If NAs are represented by something other than blank cells,
# set the na argument
read_excel("my-spreadsheet.xls", na = "NA")
* not strictly true, it requires the Rcpp package, which in turn requires Rtools (for Windows) or Xcode (for OSX), which are dependencies external to R. But they don't require any fiddling with paths, etc., so that's an advantage over Java and Perl dependencies.
Update There is now the rexcel package. This promises to get Excel formatting, functions and many other kinds of information from the Excel file and into R.
You may also want to try the XLConnect package. I've had better luck with it than xlsx (plus it can read .xls files too).
library(XLConnect)
theData <- readWorksheet(loadWorkbook("C:/AB_DNA_Tag_Numbers.xlsx"),sheet=1)
Also, if you are having trouble with your file not being found, try selecting it with file.choose().
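A quick way to rule out a path problem before blaming the Excel package is sketched below. The path is hypothetical; substitute your own.

```r
# Hypothetical path for illustration; substitute your own file.
path <- file.path(tempdir(), "AB_DNA_Tag_Numbers.xlsx")

if (file.exists(path)) {
  message("Found: ", normalizePath(path))
} else {
  message("Not found: ", path)
  message("Working directory is: ", getwd())
  # In an interactive session, pick the file from a dialog instead.
  if (interactive()) path <- file.choose()
}
found <- file.exists(path)
```

If `found` is FALSE, no Excel reader will be able to open the file either, whichever package is used.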
I would definitely try the read.xls function in the gdata package, which is considerably more mature than the xlsx package. It may require Perl ...
Update
As the Answer below is now somewhat outdated, I'd just draw attention to the readxl package. If the Excel sheet is well formatted/laid out then I would now use readxl to read from the workbook. If sheets are poorly formatted/laid out then I would still export to CSV and then handle the problems in R either via read.csv() or plain old readLines().
Original
My preferred way is to save individual Excel sheets in comma separated value (CSV) files. On Windows, these files are associated with Excel so you don't lose the double-click-open-in-Excel "feature".
CSV files can be read into R using read.csv(), or, if you are in a location or using a computer set up with some European settings (where , is used as the decimal separator and ; as the field separator), using read.csv2().
These functions have sensible defaults that make reading appropriately formatted files simple. Just keep any labels for samples or variables in the first row or column.
Added benefits of storing files in CSV are that as the files are plain text they can be passed around very easily and you can be confident they will open anywhere; one doesn't need Excel to look at or edit the data.
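To illustrate the difference between read.csv() and read.csv2(), here is a small sketch with two made-up sample files, one using "," and "." and one using ";" and ",".

```r
# Two small sample files: one with "," / "." conventions, one with ";" / ",".
csv_en <- tempfile(fileext = ".csv")
csv_eu <- tempfile(fileext = ".csv")
writeLines(c("sample,height", "a,1.5", "b,2.25"), csv_en)
writeLines(c("sample;height", "a;1,5", "b;2,25"), csv_eu)

en <- read.csv(csv_en)    # defaults: sep = ",", dec = "."
eu <- read.csv2(csv_eu)   # defaults: sep = ";", dec = ","

print(en)
print(eu)
```

Both calls should give the same numeric height column; reading the second file with read.csv() instead would leave everything in a single text column.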
Example 2012:
library("xlsx")
FirstTable <- read.xlsx("MyExcelFile.xlsx", 1, stringsAsFactors = FALSE)
SecondTable <- read.xlsx("MyExcelFile.xlsx", 2, stringsAsFactors = FALSE)
I would try the 'xlsx' package: it is easy to handle and seems mature enough.
It worked fine for me and did not need any additional tools such as Perl.
Example 2015:
library("readxl")
FirstTable <- read_excel("MyExcelFile.xlsx", 1)
SecondTable <- read_excel("MyExcelFile.xlsx", 2)
Nowadays I use readxl and have had good experiences with it:
no extra stuff needed
good performance
This new package looks nice http://cran.r-project.org/web/packages/openxlsx/openxlsx.pdf
It doesn't require rJava and is using 'Rcpp' for speed.
If you are running into the same problem and R is giving you an error -- could not find function ".jnew" -- just install the rJava package. Or, if you already have it, just run the line library(rJava). That should fix the problem.
Also, it should be clear to everybody that csv and txt files are easier to work with, but life is not easy and sometimes you just have to open an xlsx.
For me, the openxlsx package was the easiest to get working.
install.packages("openxlsx")
library(openxlsx)
rawData <- read.xlsx("your.xlsx")
I recently discovered Schaun Wheeler's function for importing Excel files into R after realising that the xlsx package hadn't been updated for R 3.1.0.
https://gist.github.com/schaunwheeler/5825002
The file name needs to have the ".xlsx" extension and the file can't be open when you run the function.
This function is really useful for accessing other people's work. Its main advantages over the read.csv function show up when:
Importing multiple excel files
Importing large files
Files that are updated regularly
Using the read.csv function requires manual opening and saving of each Excel document which is time consuming and very boring. Using Schaun's function to automate the workflow is therefore a massive help.
Big props to Schaun for this solution.
What's your operating system? What version of R are you running: 32-bit or 64-bit? What version of Java do you have installed?
I had a similar error when I first started using the read.xlsx() function and discovered that my issue (which may or may not be related to yours; at a minimum, this response should be viewed as "try this, too") was related to the incompatibility of the xlsx package with 64-bit Java. I'm fairly certain that the xlsx package requires 32-bit Java.
Use 32-bit R and make sure that 32-bit Java is installed. This may address your issue.
You have checked that R is actually able to find the file, e.g. file.exists("C:/AB_DNA_Tag_Numbers.xlsx") ? – Ben Bolker Aug 14 '11 at 23:05
Above comment should've solved your problem:
require("xlsx")
read.xlsx("filepath/filename.xlsx",1)
should work fine after that.
I tried all the answers above, but they did not help because I am on a Mac. The rio library has an import function that can import basically any type of data file into RStudio, even files that use languages other than English!
Try the code below:
library(rio)
AB <- import("C:/AB_DNA_Tag_Numbers.xlsx")
AB <- AB[,1]
Hope this helps.
For more detailed reference: https://cran.r-project.org/web/packages/rio/vignettes/rio.html
You may be able to keep multiple tabs and more formatting information if you export to an OpenDocument Spreadsheet file (ods) or an older Excel format and import it with the ODS reader or the Excel reader you mentioned above.
As stated by many here, I am writing the same thing but with an additional point!
At first we need to make sure that our R Studio has these two packages installed:
"readxl"
"XLConnect"
To install the packages and then load one of them, you can use:
install.packages(c("readxl", "XLConnect"))
library(XLConnect)
search()
search() lists the packages currently attached to your R session.
Now another catch: even with these two packages installed, you may still run into problems reading an "xlsx" file, with an error like "more columns than column names".
To solve this, you can simply re-save your Excel sheet ("xlsx") as
"CSV (Comma delimited)"
and your life will be super easy....
Have fun!!
Installing the xlsx package requires rJava and xlsxjars. Indirectly, they require the specific (32- or 64-bit) Java runtime environment on the system.
Pro of read.xlsx: In the same package there are read.xlsx and write.xlsx
Con: Very low speed
As suggested, the easy way is to save in .csv format from excel.
Simple benchmark on a 5800x15 dataset (median)
read.xlsx: >10000ms
read_xlsx: 70ms
read.csv: 15ms
I am using R to work with meteorological data. I proceed in two steps:
convert grib to netcdf using the command line function ncl_convert2nc from the NCAR Command Language
use package ncdf in R to import the netcdf data.
I still have one problem:
2- For some particular grib files, the conversion with the NCAR tool does not work. Are there other ways or tricks (other than conversion to netcdf) to read grib files in R?
Problem answered by Dirk: 1- I would like to process many files automatically within R. Can I call ncl_convert2nc from within R? (answered by Dirk Eddelbuettel below)
Regarding question 1, the answer is 'Yes' -- see help(system) and the intern = TRUE option if you want to capture results.
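As a rough sketch of what that could look like: the argument of system() that captures output is `intern`. The first call assumes a Unix-like shell where `echo` is available; the GRIB file names are hypothetical, and the conversion loop only runs if ncl_convert2nc is actually found on the PATH.

```r
# Capture a command's output as a character vector (one element per line).
# Assumes a Unix-like shell where `echo` is available.
out <- system("echo hello", intern = TRUE)
print(out)

# Sketch of a batch conversion; the file names are hypothetical.
grib_files <- c("a.grb", "b.grb")
if (nzchar(Sys.which("ncl_convert2nc"))) {
  for (f in grib_files) {
    if (file.exists(f)) {
      # system2() takes the arguments separately from the command name.
      conv_log <- system2("ncl_convert2nc", args = shQuote(f),
                          stdout = TRUE, stderr = TRUE)
      print(conv_log)
    }
  }
} else {
  message("ncl_convert2nc not found on PATH; skipping conversion")
}
```

In practice grib_files would more likely come from list.files() with a suitable pattern.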
rgdal can also do it, but it is less flexible and requires more care and detail than ncdf or RNetCDF, and it depends on your GDAL/rgdal build including the GRIB driver.
ncl_convert2nc seems to be the best solution. However, if the structure of the data is a little more complicated, I use GrADS to convert the GRIB file to ASCII (e.g. .csv), and then it is possible to create a NetCDF file using the ncdf4 package for R. GrADS also supports rewriting GRIB to NetCDF, but it is limited to a single variable.
Besides calling ncl_convert2nc from R, there are two alternatives I can suggest:
1. CDO conversion
Another quick and easy command line solution is to use cdo to convert to netcdf to read in:
cdo -f nc copy file.grb file.nc
If you want to output a netcdf4 file you specify "-f nc4".
One potential glitch with this approach is if your grib file has more than one time axis (e.g. for multiple seasonal forecasts) which can cause issues with the conversion.
2. ECCODES conversion
Instead eccodes offers a grib converter that is very robust and can handle all cases of multiple time axes which usually cause CDO and NCL based conversions to fail.
The command is called grib_to_netcdf
grib_to_netcdf -o output.nc input_grib.grb
So far, grib_to_netcdf has been able to handle every grib file I have thrown at it without problems.
Another solution is to use the wgrib/wgrib2 software (http://www.cpc.ncep.noaa.gov/products/wesley/wgrib2/) and dump your GRIB-1/GRIB-2 file directly to CSV format, e.g.:
/path/to/your/wgrib2 input_file.grb -csv output_file.csv
Then it may be read directly in R...
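Reading such a dump might look like the sketch below. The column layout (reference time, valid time, variable, level, longitude, latitude, value, with no header row) is my assumption about what `wgrib2 -csv` writes, and the column names and sample rows are made up; check them against your own output file.

```r
# Sample lines in the layout assumed for `wgrib2 -csv` output (an assumption):
# reference time, valid time, variable, level, longitude, latitude, value; no header.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  '"2020-01-01 00:00:00","2020-01-01 06:00:00","TMP","2 m above ground",10.5,45,280.1',
  '"2020-01-01 00:00:00","2020-01-01 06:00:00","TMP","2 m above ground",11,45,279.6'
), csv_path)

# Column names are chosen here for illustration; the file itself carries none.
grib <- read.csv(
  csv_path,
  header = FALSE,
  col.names = c("ref_time", "valid_time", "variable", "level", "lon", "lat", "value"),
  stringsAsFactors = FALSE
)
grib$valid_time <- as.POSIXct(grib$valid_time, tz = "UTC")
str(grib)
```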