Trouble with read.xlsx in R - r

I am currently attempting to use R to read a (large, 8.3 MB) .xlsx file into a matrix. I am attempting to do so with the read.xlsx file in the xlsx package. https://cran.r-project.org/web/packages/xlsx/index.html
I am now trying to read the contents of one of the sheets in the file with the following command:
sheetname<-read.xlsx("/Users/jinkinsonsmith/Downloads/Re _Introduction/filename.xlsx",sheetName='sheetname')
It looks like this command should work in terms of reading the contents of sheet "sheetname" in xlsx file "filename" into the vector "sheetname". However, instead, I am getting this error message:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod",
cl, : java.lang.OutOfMemoryError: Java heap space
It seems like I'm not the first person to get this error message (example: How to deal with "java.lang.OutOfMemoryError: Java heap space" error?), but even after reading the other post I just linked it is still not clear to me what I should do to fix this error. My MacBook Pro has long had issues with running out of disk space and requiring me to delete a bunch of files, so that could be the culprit, but it is also apparently possible that I have too many stored references to objects in R that I no longer use and that are taking up too much space. In the latter case I don't know how I would remove any unneeded references.

By using the following line of code before you load any other package, I could solve similar problems like this. I already described it here.
options(java.parameters = c("-XX:+UseConcMarkSweepGC", "-Xmx8192m"))
library(xlsx)
Please add this line and restart, since other packages can load some java things by themselves and the options have to be set before any Java is loaded.
In general, these option change the type of garbage collection which sometimes makes problems in the default settings and also increases the memory to 8GB.

Related

BiocParallel error: cannot open the connection, how do I fix it?

I'm trying to use the package bambu to quantify gene counts from bam files. I am using my university's HPC, so I have written an R script and a batch submission file to launch it.
When the script gets to the point of running the bambu function, it gives the following error:
Start generating read class files
| | 0%[W::hts_idx_load2] The index file is older than the data file: ./results/minimap2/KD_R1.sorted.bam.bai
[W::hts_idx_load2] The index file is older than the data file: ./results/minimap2/KD_R3.sorted.bam.bai
[W::hts_idx_load2] The index file is older than the data file: ./results/minimap2/WT_R1.sorted.bam.bai
[W::hts_idx_load2] The index file is older than the data file: ./results/minimap2/WT_R2.sorted.bam.bai
|================== | 25%
Error: BiocParallel errors
element index: 1, 2, 3
first error: cannot open the connection
In addition: Warning message:
stop worker failed:
attempt to select less than one element in OneIndex
Execution halted
So it looks like BiocParallel isn't happy and cannot open a certain connection, but I'm not sure how to fix this?
This is my R script:
#Bambu R script
#load libraries
library(Rsamtools)
library(bambu)
#Creating files
bamFiles<- Rsamtools::BamFileList(c("./results/minimap2/KD_R1.sorted.bam","./results/minimap2/KD_R2.sorted.bam","./results/minimap2/KD_R3.sorted.bam","./results/minimap2/WT_R1.sorted.bam","./results/minimap2/WT_R2.sorted.bam","./results/minimap2/WT_R3.sorted.bam"))
annotation<-prepareAnnotations("./ref_data/Homo_sapiens.GRCh38.104.chr.gtf")
fa.file<-"./ref_data/Homo_sapiens.GRCh38.dna.primary_assembly.fa"
#Running bambu
se<- bambu(reads=bamFiles, annotations=annotation, genome=fa.file,ncore=4)
se
seGene<- transcriptToGeneExpression(se)
#Saving files
save.file<-tempfile(fileext=".gtf")
writeToGTF(rowRanges(se),file=save.file)
save.dir <- tempdir()
writeBambuOutput(se,path=save.fir,prefix="Nanopore_")
writeBambuOutput(seGene,path=save.fir,prefix="Nanopore_")
If you have any ideas on why this happens it would be so helpful! Thank you
I think that #Chris has a good point. Under the hood it seems likely that bambu is running htslib based on those warnings. While they may indeed only be warnings, I would like to know what the results would look like if you ran this interactively.
This question is hard to answer right now as it's missing some information (what do the files look like, a minimal reproducible example, etc.). But in the meantime here are some possibly useful questions for figuring it out:
what does bamFiles look like? Does it have the right number of read records? Do all of those files have nonzero read records? Are any suspiciously small?
What are the timestamps on the bai vs bam files (e.g. ls -lh /results/minimap2/)? Are they about what you'd expect or is it wonky? Are any of them (say, ./results/minimap2/WT_R2.sorted.bam.bai) weirdly small?
What happens when you run it interactively? Where does it fail? You say it's at the bambu() call, but how do you know that?
What happens when you run bambu() with ncores=1?
It seems very likely that this is due to a problem with the files, and it is only at the biocParallel step that the error is bubbling up to the top. Many utilities have an annoying habit of being happy to accept an empty file, only to fail confusingly without informative error messages when asked to do something with the empty file.
You might also consider raising an issue with the developers.
(why the warning is only possibly a problem: The index file sometimes has a timestamp like that for very small alignment files which are generated and indexed programmatically, where the indexing step is near-instantaneous.)

Unable to import previously working SAS-formats files using R-package 'haven'

Around a year ago, I used the 'haven'-package to import two .sas7bdat files along with their respective .sas7bcat formats and it worked wonderfully.
For some reason, however, it does not any longer even though all the SAS-files incl. format files have remained unchanged since then.
When I try running the code now, R gives me the following error:
Error in df_parse_sas_file(spec_data, spec_cat, encoding = encoding,
catalog_encoding = catalog_encoding, : Failed to parse P:/SAS
files/formats.sas7bcat: Invalid file, or file has unsupported features.
R and the 'haven'-package have been reinstalled to their newest versions since the first time when it worked, so I imagine that this might be the reason since all the SAS-files and the code remains unchanged.
For this reason, I tried to reinstall the old version of 'haven' but cannot since this apparently requires a manual installation of 'Rtools' which is not allowed on my computer, so I am a bit stuck here.
Any suggestions will be greatly appreciated, thanks.
A potential workaround is that the package sas7bdat can also read SAS files. I don't know how much extra work this might involve for you though
You can read in a dataset with the code
read.sas7bdat("filename.sas7bdat")

Text mining with tm in R antiword error

So I'm rather new to R, and I'm learning how to mine text from this handy website: https://eight2late.wordpress.com/2015/05/27/a-gentle-introduction-to-text-mining-using-r/
I do have my own text set of .doc, .docx, and .xlsx files and I'm trying to mine them. They're located in a folder in my working directory called 'files', but I have already encountered an error after simply writing a few lines of code.
The code I have so far is:
library(tm)
library(readtext)
data = readtext('files')
At this point, after waiting for 25 seconds or so, I get the error:
Error: System call to 'antiword' failed (1): The Big Block Depot is damaged
and the code stops running there.
I have tried searching online for solutions but it seems like a fairly rare error and so I only found 1 possible solution at https://github.com/ropensci/antiword/issues/1 but that did not work for me.
This solution suggested that one of my files were corrupt, and suggested using the code
fixInNamespace(antiword, pos="package:antiword")
to change the error to a warning to not interrupt the reading of the files. I tried that, and at first it raised the error of
Error in as.environment(pos):
no item called "package:antiword" on the search list
After which, I loaded the antiword library with a library(antiword) and changed the stop( to a warning(. However, when I ran the data = readtext('files') line again, it immediately raised the error
Error in is_windows() : could not find function "is_windows"
I'm at a loss here! Any help would be appreciated. Should I be using another package in this case?
I had the same problem with my code, where I tried to get a doc. file in R. I also used the readtext library. What helped me was converting the Word documents I was trying to get into R from doc. to docx. When I ran the same code after it worked.

The cause of "bad magic number" error when loading a workspace and how to avoid it?

I tried to load my R workspace and received this error:
Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘WORKSPACE_Wedding_Weekend_September’ has magic number '#gets'
Use of save versions prior to 2 is deprecated
I'm not particularly interested in the technical details, but mostly in how I caused it and how I can prevent it in the future. Here's some notes on the situation:
I'm running R 2.15.1 on a MacBook Pro running Windows XP on a bootcamp partition.
There is something obviously wrong this workspace file, since it weighs in at only ~80kb while all my others are usually >10,000
Over the weekend I was running an external modeling program in R and storing its output to different objects. I ran several iterations of the model over the course of several days, eg output_Saturday <- call_model()
There is nothing special to the model output, its just a list with slots for betas, VC-matrices, model specification, etc.
I got that error when I accidentally used load() instead of source() or readRDS().
Also worth noting the following from a document by the R Core Team summarizing changes in versions of R after v3.5.0 (here):
R has new serialization format (version 3) which supports custom serialization of
ALTREP framework objects... Serialized data in format 3 cannot be read by versions of R prior to version 3.5.0.
I encountered this issue when I saved a workspace in v3.6.0, and then shared the file with a colleague that was using v3.4.2. I was able to resolve the issue by adding "version=2" to my save function.
Assuming your file is named "myfile.ext"
If the file you're trying to load is not an R-script, for which you would use
source("myfile.ext")
you might try the readRDSfunction and assign it to a variable-name:
my.data <- readRDS("myfile.ext")
The magic number comes from UNIX-type systems where the first few bytes of a file held a marker indicating the file type.
This error indicates you are trying to load a non-valid file type into R. For some reason, R no longer recognizes this file as an R workspace file.
Install the readr package, then use library(readr).
It also occurs when you try to load() an rds object instead of using
object <- readRDS("object.rds")
I got the error when saved with saveRDS() rather than save(). E.g. save(iris, file="data/iris.RData")
This fixed the issue for me. I found this info here
Also note that with save() / load() the object is loaded in with the same name it is initially saved with (i.e you can't rename it until it's already loaded into the R environment under the name it had when you initially saved it).
I had this problem when I saved the Rdata file in an older version of R and then I tried to open in a new one. I solved by updating my R version to the newest.
If you are working with devtools try to save the files with:
devtools::use_data(x, internal = TRUE)
Then, delete all files saved previously.
From doc:
internal If FALSE, saves each object in individual .rda files in the data directory. These are available whenever the package is loaded. If
TRUE, stores all objects in a single R/sysdata.rda file. These objects
are only available within the package.
This error occured when I updated my R and R Studio versions and loaded files I created under my prior version. So I reinstalled my prior R version and everything worked as it should.

R.matlab/readMat : Error in readTag(this)

I am trying to read a matlab file into R using R.matlab but am encountering this error:
require(R.matlab)
r <- readMat("file.mat", verbose=T)
Trying to read MAT v5 file stream...
Error in readTag(this) : Unknown data type. Not in range [1,19]: 18569
In addition: Warning message:
In readMat5Header(this, firstFourBytes = firstFourBytes) :
Unknown MAT version tag: 512. Will assume version 5.
How can this issue be solved or is there an alternative way to load matlab files? I can use hdf5load but have heard this can mess with the data. Thanks!
This is a bit late on the response, but I've recently been running into the same issues. For me, the issue was that I was saving matlab files by default using the '-v7.3' option. After extensive searching, the R.matlab source documentation (http://cran.r-project.org/web/packages/R.matlab/R.matlab.pdf) indicates the following:
Reading compressed MAT files
From MATLAB v7, compressed MAT version 5 files are used by default
[3,4]. This function supports reading such
files, if running R v2.10.0 or newer. For older versions of R, the
Rcompression package is used. To install that package, please see
instructions at http://www.omegahat.org/ cranRepository.html. As a
last resort, use save -V6 in MATLAB to write MAT files that are
compatible with MATLAB v6, that is, to write non-compressed MAT
version 5 files.
About MAT files saved in MATLAB using ’-v7.3’
This function does not
support MAT files saved in MATLAB as save('foo.mat',
'-v7.3'). Such MAT files are of a completely different file format
[5,6] compared to those saved with, say, '-v7'."
adding the '-v7' option at the end of my save command fixed this issue.
i.e.: save('filename', 'variable', '-v7')
i had a very similar problem until i pointed the function to an actual .mat file that existed. before that i'd been specifying two files of the same name, but one was .mat and the other was .txt, so it may have been trying to open the other.
i realize this may not directly solve your issue (the only difference i saw in my error message was the absence of that first line "Trying ..." and the specific numbers thereafter as well as the presence of another couple similar warnings with odd numbers), but it might point to some simple filename problem as the issue.
i use the latest matlab on 64 bit vista and the latest R on 32 bit xp.

Resources