I'm trying to save an array as an HDF5 file using R, but having no luck.
To try to diagnose the problem I ran example(hdf5save). This successfully created an HDF5 file that I could read easily with h5dump.
When I then ran the R code manually, I found that it didn't work. The code I ran was exactly the same as the code run in the example script (except for a change of filename to avoid overwriting). Here is the code:
(m <- cbind(A = 1, diag(4)))
ll <- list(a=1:10, b=letters[1:8]);
l2 <- list(C="c", l=ll); PP <- pi
hdf5save("ex2.hdf", "m","PP","ll","l2")
rm(m,PP,ll,l2) # and reload them:
hdf5load("ex2.hdf",verbosity=3)
m # read from "ex1.hdf"; buglet: dimnames dropped
str(ll)
str(l2)
and here is the error message from h5dump:
h5dump error: unable to open file "ex2.hdf"
Does anyone have any ideas? I'm completely at a loss.
Thanks
I have had this problem. I am not sure of the cause and neither are the hdf5 maintainers. The authors of the R package have not replied.
Alternatives that work
In the time since I originally answered, the hdf5 package has been archived, and suitable alternatives (h5r, rhdf5, and ncdf4) have been created; I am currently using ncdf4:
Since netCDF-4 uses hdf5 as a storage layer, the ncdf4 package provides an interface to both netCDF-4 and hdf5.
The h5r package (with R >= 2.10).
The rhdf5 package, which is available on Bioconductor.
Workarounds
Two functional but unsatisfactory workarounds that I used prior to finding the alternatives above:
Install R 2.7, hdf5 version 1.6.6, R hdf5 v1.6.7, and zlib1g version 1:1.2.3.3 and use this when writing the files (this was my solution until migrating to the ncdf4 library).
Use h5totxt at the command line from the hdf5utils program (requires using bash and rewriting your R code).
A minimal, reproducible demonstration of the issue:
Here is a reproducible example that produces the error:
First R session
library(hdf5)
dat <- 1:10
hdf5save("test.h5","dat")
q()
n # do not save workspace
Second R session:
library(hdf5)
hdf5load("test.h5")
output:
HDF5-DIAG: Error detected in HDF5 library version: 1.6.10 thread
47794540500448. Back trace follows.
#000: H5F.c line 2072 in H5Fopen(): unable to open file
major(04): File interface
minor(17): Unable to open file
#001: H5F.c line 1852 in H5F_open(): unable to read superblock
major(04): File interface
minor(24): Read failed
#002: H5Fsuper.c line 114 in H5F_read_superblock(): unable to find file
signature
major(04): File interface
minor(19): Not an HDF5 file
#003: H5F.c line 1304 in H5F_locate_signature(): unable to find a valid
file signature
major(05): Low-level I/O layer
minor(29): Unable to initialize object
Error in hdf5load("test.h5") : unable to open HDF file: test.h5
I've also run into the same issue and found a reasonable fix.
The issue seems to stem from when the hdf5 library finalizes the file. If it doesn't get a chance to finalize the file, the file is corrupted. I think finalization happens after the buffer is flushed, but the buffer doesn't always flush.
One solution I've found is to do the hdf5save in a separate function. Assign the variables into the globalenv(), then call hdf5save and exit the function. When the function completes, the memory seems to be cleaned up, which makes the hdf5 library flush the buffer and finalize the file.
Hope this helps!
I am working on submitting an R package to CRAN. Right now I am trying to reduce the memory footprint of the package. Because this package deals with spatial data that has a very particular format, I want to include a properly formatted shapefile as an example. If I include the full-size original shapefile, there are no warnings (other than file size) in the R CMD checks. However, if I crop the file and include the cropped version in the package (in "inst/extdata") I get this warning:
W checking for executable files (389ms)
Found the following executable file:
inst/extdata/temp/temp.dbf
Source packages should not contain undeclared executable files.
See section ‘Package structure’ in the ‘Writing R Extensions’ manual.
This file is the database file associated with the shapefile. I have tried cropping the file and saving it using rgdal functions, sf functions, and using QGIS. I have also verified that the modes of the cropped files match the original file using chmod. I even tried changing .dbf to .DBF. Does anyone have any additional suggestions, other than listing it in BinaryFiles, which CRAN will not accept in a submission?
I'm running R version 4.0.2 via RStudio 2021.09.1 on Mac OSX 10.15.7. rgdal and sf are fully updated, as are all of their dependencies.
This is a known issue where the file utility misidentifies DBF files whose last-update date is in the year 2022. The easiest fix is to avoid a 2022 update date when saving the file. Alternatively, you can simply change the second byte of the file after the fact, e.g.:
fn = "myfile.dbf"
sz = file.info(fn)$size
r = readBin(fn, raw(), sz)
r[2] = as.raw(121) ## make it 2021 instead of 2022
writeBin(r, fn)
(See also corresponding discussion on R-package-devel)
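The byte patched above is the second byte of the DBF header, which stores the last-update year as an offset from 1900 (so 122 means 2022 and 121 means 2021). A minimal sketch of the same fix wrapped in functions, run against a fabricated four-byte header in a temporary file rather than a real shapefile; dbf_update_year and set_dbf_update_year are hypothetical helper names, not part of any package:

```r
# Hypothetical helpers: read and rewrite the last-update year stored in
# byte 2 of a DBF header (stored as year minus 1900).
dbf_update_year <- function(path) {
  header <- readBin(path, what = "raw", n = 4)
  1900L + as.integer(header[2])
}

set_dbf_update_year <- function(path, year) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  bytes[2] <- as.raw(year - 1900L)
  writeBin(bytes, path)
  invisible(path)
}

# Demonstration on a fabricated header (version byte, then YY MM DD),
# not a real .dbf file.
fn <- tempfile(fileext = ".dbf")
writeBin(as.raw(c(3, 122, 1, 15)), fn)
year_before <- dbf_update_year(fn)   # 2022
set_dbf_update_year(fn, 2021)
year_after <- dbf_update_year(fn)    # 2021
```

Checking the year first makes it easy to confirm whether a given file is affected before rewriting it.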
I am currently attempting to use R to read a (large, 8.3 MB) .xlsx file into a matrix. I am attempting to do so with the read.xlsx function in the xlsx package. https://cran.r-project.org/web/packages/xlsx/index.html
I am now trying to read the contents of one of the sheets in the file with the following command:
sheetname<-read.xlsx("/Users/jinkinsonsmith/Downloads/Re _Introduction/filename.xlsx",sheetName='sheetname')
It looks like this command should work in terms of reading the contents of sheet "sheetname" in xlsx file "filename" into the vector "sheetname". However, instead, I am getting this error message:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod",
cl, : java.lang.OutOfMemoryError: Java heap space
It seems like I'm not the first person to get this error message (example: How to deal with "java.lang.OutOfMemoryError: Java heap space" error?), but even after reading the other post I just linked it is still not clear to me what I should do to fix this error. My MacBook Pro has long had issues with running out of disk space and requiring me to delete a bunch of files, so that could be the culprit, but it is also apparently possible that I have too many stored references to objects in R that I no longer use and that are taking up too much space. In the latter case I don't know how I would remove any unneeded references.
I was able to solve similar problems by running the following line of code before loading any other package. I already described it here.
options(java.parameters = c("-XX:+UseConcMarkSweepGC", "-Xmx8192m"))
library(xlsx)
Please add this line and restart R, since other packages can load Java by themselves and the options have to be set before any Java is loaded.
In general, these options change the type of garbage collection, which sometimes causes problems with the default settings, and also increase the maximum Java heap size to 8 GB.
I tried to load my R workspace and received this error:
Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘WORKSPACE_Wedding_Weekend_September’ has magic number '#gets'
Use of save versions prior to 2 is deprecated
I'm not particularly interested in the technical details, but mostly in how I caused it and how I can prevent it in the future. Here's some notes on the situation:
I'm running R 2.15.1 on a MacBook Pro running Windows XP on a bootcamp partition.
There is something obviously wrong with this workspace file, since it weighs in at only ~80kb while all my others are usually >10,000
Over the weekend I was running an external modeling program in R and storing its output to different objects. I ran several iterations of the model over the course of several days, e.g. output_Saturday <- call_model()
There is nothing special about the model output; it's just a list with slots for betas, VC-matrices, model specification, etc.
I got that error when I accidentally used load() instead of source() or readRDS().
Also worth noting the following from a document by the R Core Team summarizing changes in versions of R after v3.5.0 (here):
R has new serialization format (version 3) which supports custom serialization of
ALTREP framework objects... Serialized data in format 3 cannot be read by versions of R prior to version 3.5.0.
I encountered this issue when I saved a workspace in v3.6.0, and then shared the file with a colleague that was using v3.4.2. I was able to resolve the issue by adding "version=2" to my save function.
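A minimal sketch of that fix, using a made-up list and a temporary file as stand-ins for a real workspace. The version argument is part of base R's save() (saveRDS() accepts it as well):

```r
# Save with serialization format 2 so that R versions before 3.5.0 can read it.
model_output <- list(betas = c(1.2, 3.4), spec = "model A")  # illustrative data
f <- tempfile(fileext = ".RData")
save(model_output, file = f, version = 2)

# Reload into a fresh environment to confirm the round trip.
e <- new.env()
loaded_names <- load(f, envir = e)
round_trip_ok <- identical(e$model_output, model_output)
```

Loading into a separate environment keeps the check from overwriting the object in the current workspace.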
Assuming your file is named "myfile.ext"
If the file you're trying to load is not an R-script, for which you would use
source("myfile.ext")
you might try the readRDS function and assign the result to a variable name:
my.data <- readRDS("myfile.ext")
The magic number comes from UNIX-type systems where the first few bytes of a file held a marker indicating the file type.
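To see the marker that load() looks for, the first five bytes of a file written by save() can be read back directly. This is a sketch under two assumptions: the file is written uncompressed (otherwise the marker sits inside the compressed stream), and version = 2 is requested explicitly so the result does not depend on the R version in use:

```r
x <- 1:10
f <- tempfile(fileext = ".RData")

# Write uncompressed so the marker is visible in the raw bytes.
save(x, file = f, version = 2, compress = FALSE)

# The first five bytes are the magic number that load() checks.
magic <- rawToChar(readBin(f, what = "raw", n = 5))
magic_matches <- identical(magic, "RDX2\n")
```

A file whose first bytes are ordinary text will not match this pattern, which is what the warning in the question reports.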
This error indicates you are trying to load a non-valid file type into R. For some reason, R no longer recognizes this file as an R workspace file.
Install the readr package, then use library(readr).
It also occurs when you try to load() an rds object instead of using
object <- readRDS("object.rds")
I got the error when I had saved with saveRDS() rather than save(). E.g., use save(iris, file="data/iris.RData")
This fixed the issue for me. I found this info here
Also note that with save() / load() the object is loaded in with the same name it is initially saved with (i.e you can't rename it until it's already loaded into the R environment under the name it had when you initially saved it).
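The difference between the two pairs of functions can be sketched with temporary files; the object name scores is illustrative only:

```r
scores <- c(a = 1, b = 2)

# saveRDS()/readRDS(): a single object with no name stored;
# assign the result to whatever name you like.
rds_file <- tempfile(fileext = ".rds")
saveRDS(scores, rds_file)
renamed <- readRDS(rds_file)

# save()/load(): the object comes back under its original name.
rdata_file <- tempfile(fileext = ".RData")
save(scores, file = rdata_file)
e <- new.env()
restored <- load(rdata_file, envir = e)   # character vector of restored names

# Mixing the pairs fails: load() on an .rds file signals an error,
# captured here as a message string.
mixed <- tryCatch(
  suppressWarnings(load(rds_file, envir = new.env())),
  error = function(err) conditionMessage(err)
)
```

Here restored holds the name of the restored object, while mixed holds the text of the error raised by the mismatched call.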
I had this problem when I saved the Rdata file in an older version of R and then tried to open it in a newer one. I solved it by updating my R version to the newest.
If you are working with devtools try to save the files with:
devtools::use_data(x, internal = TRUE)
Then, delete all files saved previously.
From doc:
internal If FALSE, saves each object in individual .rda files in the data directory. These are available whenever the package is loaded. If
TRUE, stores all objects in a single R/sysdata.rda file. These objects
are only available within the package.
This error occurred when I updated my R and RStudio versions and loaded files I had created under my prior version. So I reinstalled my prior R version and everything worked as it should.
I am trying to read a matlab file into R using R.matlab but am encountering this error:
require(R.matlab)
r <- readMat("file.mat", verbose=T)
Trying to read MAT v5 file stream...
Error in readTag(this) : Unknown data type. Not in range [1,19]: 18569
In addition: Warning message:
In readMat5Header(this, firstFourBytes = firstFourBytes) :
Unknown MAT version tag: 512. Will assume version 5.
How can this issue be solved or is there an alternative way to load matlab files? I can use hdf5load but have heard this can mess with the data. Thanks!
This is a bit late on the response, but I've recently been running into the same issues. For me, the issue was that I was saving matlab files by default using the '-v7.3' option. After extensive searching, the R.matlab source documentation (http://cran.r-project.org/web/packages/R.matlab/R.matlab.pdf) indicates the following:
Reading compressed MAT files
From MATLAB v7, compressed MAT version 5 files are used by default
[3,4]. This function supports reading such
files, if running R v2.10.0 or newer. For older versions of R, the
Rcompression package is used. To install that package, please see
instructions at http://www.omegahat.org/cranRepository.html. As a
last resort, use save -V6 in MATLAB to write MAT files that are
compatible with MATLAB v6, that is, to write non-compressed MAT
version 5 files.
About MAT files saved in MATLAB using ’-v7.3’
This function does not
support MAT files saved in MATLAB as save('foo.mat',
'-v7.3'). Such MAT files are of a completely different file format
[5,6] compared to those saved with, say, '-v7'.
Adding the '-v7' option at the end of my save command fixed this issue.
i.e.: save('filename', 'variable', '-v7')
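One way to tell which kind of file you have before calling readMat is to look at the descriptive text at the start of the file: files saved with '-v7.3' begin with "MATLAB 7.3 MAT-file", while version 5 files (including '-v7') begin with "MATLAB 5.0 MAT-file". A minimal sketch in base R, using fabricated headers in temporary files instead of real MATLAB output; mat_header_text and is_mat_v73 are hypothetical helper names, not part of R.matlab:

```r
# Hypothetical helper: return the descriptive text at the start of a MAT-file.
mat_header_text <- function(path, n = 19L) {
  bytes <- readBin(path, what = "raw", n = n)
  rawToChar(bytes[bytes != as.raw(0)])   # drop NULs so rawToChar() cannot fail
}

# Hypothetical helper: TRUE if the header announces the '-v7.3' format.
is_mat_v73 <- function(path) {
  startsWith(mat_header_text(path), "MATLAB 7.3 MAT-file")
}

# Fabricated headers standing in for real files (assumption: no MATLAB here).
v73_file <- tempfile(fileext = ".mat")
v5_file  <- tempfile(fileext = ".mat")
writeBin(charToRaw("MATLAB 7.3 MAT-file, fabricated header"), v73_file)
writeBin(charToRaw("MATLAB 5.0 MAT-file, fabricated header"), v5_file)

v73_flag <- is_mat_v73(v73_file)
v5_flag  <- is_mat_v73(v5_file)
```

If the check returns TRUE, re-saving from MATLAB with '-v7' as described above is the route the R.matlab documentation points to.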
i had a very similar problem until i pointed the function to an actual .mat file that existed. before that i'd been specifying two files of the same name, but one was .mat and the other was .txt, so it may have been trying to open the other.
i realize this may not directly solve your issue (the only difference i saw in my error message was the absence of that first line "Trying ..." and the specific numbers thereafter as well as the presence of another couple similar warnings with odd numbers), but it might point to some simple filename problem as the issue.
i use the latest matlab on 64 bit vista and the latest R on 32 bit xp.
I want to find the location of the script .R files which are used for computation in R.
I know that by typing the object function, I will get the code which is running and then I can copy and edit and save it as a new script file and use that.
The reason for asking to find the foo.R file is
Curiosity
Know what is the algorithm used in the numerical computations
More immediately, the function from the stats package I am using is producing results for two of the arguments and not the others, and I have to figure out how to make it work.
The error shown by R implies that some modification might be required in the script file.
I am looking for a more general answer, if it's possible.
Edit: As per the comments so far, here is the code to compute spectrum of a time series using autoregressive methods. The data input is a univariate series.
x = ts(data)
spec.ar(x, method = "yule-walker")  # command 1
spec.ar(x, method = "burg")         # command 2
command 1 is running ok.
command 2 gives the following error.
Error in ar.burg.default(x, aic = aic, order.max = order.max, na.action = na.action, :
Burg's algorithm only implemented for univariate series
I did try specifying all the arguments correctly, like na.action=na.fail, order.max = NULL etc., but the message is the same.
Kindly suggest possible solutions.
P.S. (This question is posted after searching the library folder where R is installed and zip files which come with packages, manuals, and opening .rdb, .rdx files)
See FAQ 7.40 How do I access the source code for a function?
In most cases, typing the name of the function will print its source
code. However, code is sometimes hidden in a namespace, or compiled.
For a complete overview on how to access source code, see Uwe Ligges
(2006), “Help Desk: Accessing the sources”, R News, 6/4, 43–45
(http://cran.r-project.org/doc/Rnews/Rnews_2006-4.pdf).
When R installs a package, it evaluates all the ".R" source files and re-saves them into a binary format for faster loading. Therefore you typically cannot easily find the source file.
As has been suggested elsewhere, you can simply type the function name and see the source code, or download the source package and find the source there.
library(plyr)
ddply # prints the source for ddply
# See the content of the R directory for plyr,
# but it's only binary files:
dir(file.path(find.package("plyr"), "R"))
# [1] "plyr" "plyr.rdb" "plyr.rdx"
# Get the source for the package:
download.packages("plyr", "~", type="source")
# ...then unpack and inspect the R directory...
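For functions that do not print when you type their name, such as the ar.burg.default method named in the question's error message, base R can still retrieve the code from the installed package. A minimal sketch using only functions from stats and utils; the variable names are illustrative:

```r
# List the S3 methods registered for the ar.burg generic.
burg_methods <- methods("ar.burg")

# Fetch a method that is registered but not exported from the namespace.
burg_default <- getS3method("ar.burg", "default")

# Equivalent lookups: the ::: operator, or a search across loaded namespaces.
burg_via_triple_colon <- stats:::ar.burg.default
found <- getAnywhere("ar.burg.default")

# Deparse to text so the source can be searched or written to a file.
src <- deparse(burg_default)
```

Printing burg_default at the console shows the same code that the package source file would contain, minus comments that were dropped at install time.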
.libPaths() should tell you all of your current library locations. It's possible to have more than one installation of a package if there are two libraries but only the one that is in the first library will be used. Unless you offer the code and the exact error message, it's not likely that anyone will be able to offer better advice.
I think you are asking to see what I call the source code for a function in a package. If so, the way I do it is as follows, which has worked successfully for me on the three times I have tried. I keep these instructions handy in a few places and just copied and pasted them here:
To see the source code for a function in Program R download the package containing the function. Specifically, download the file that ends in "tar.gz". This is a compressed file. Expand the compressed file using, for example, "WinZip". Now you need to open the uncompressed file that ends in ".tar". Download the free software "7-Zip". Click on the file "7zFM.exe" and navigate to the directory containing the ".tar" file. You can extract the contents of that ".tar" file into a new folder. The contents consist of R files showing the source code for the functions in the R package.
EDIT:
Today (July 8, 2012) I was able to open the 'tar.gz' file using the latest version of 'WinZIP' and could copy the contents (the source code) from there without having to use '7-Zip'.
EDIT:
Today (January 19, 2013) I viewed the source code for functions in base R by downloading the file
'R-2.15.2.tar.gz'
To download that file go to the http://cran.at.r-project.org/ webpage and click on that file in this line:
"The latest release (2012-10-26, Trick or Treat): R-2.15.2.tar.gz, read what's new in the latest version."
Unzip the file. WinZip will work, or it did for me. Then search your computer for readtable.r or another base R function.
agstudy noted here https://stackoverflow.com/questions/14417214/source-file-for-r-function that source code for read.csv is located in the file readtable.r, so do not expect every base R function to have its own file.