XLConnect 'envir' error in R

I manage a number of Excel reports, and I use R to do the preprocessing and write the output reports. It's great because all I have to do is run the R function and distribute the reports; the rest of the report writing takes no active work. The reports need to be in Excel format because it is the easiest to disseminate to a large, non-technical audience. Once the data is pre-processed, I write the report very simply using XLConnect:
file.copy(from = template,
          to = newFileName)
writeWorksheetToFile(file = newFileName,
                     data = newData,
                     sheet = "Data",
                     clearSheets = TRUE)
However, one of my reports began throwing this error when I attempted to write the new data:
Error in ls(envir = envir, all.names = private) :
invalid 'envir' argument
Furthermore, before throwing the error, the function ties up R for 15 minutes; the normal writing time is under 10 seconds. I must confess, I don't understand what this error even means, and it did not succumb to my usual debugging methods or to any of the SO solutions I found.
I've noticed that others have referred to rJava (reinstalling this package didn't work) and to a Java cache of log files (not sure where this would be located on Mac). I'm especially confused as the report ran with no problems just one day earlier using precisely the same process, AND my other reports using the exact same process still work just fine.
I didn't update Java or R or my OS, or debug/rewrite any of the R code. So, starting from the beginning - how can I investigate this 'envir' error? What would you do if you were in my shoes? I've been working on this for a couple days and I'm stumped.
I'm happy to provide extra information if it will provide better context for more discerning programmers than myself :)

Update:
My previous answer (below) did not, in fact, fix this intermittent error (which as the OP points out is extremely difficult to unpick due to the Java dependency). Instead, I followed the advice given here and migrated from the XLConnect package to openxlsx, which sidesteps the problem entirely.
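For reference, the write step translates to openxlsx roughly like this (a sketch; it reuses the object names from the question, and note that writeData overwrites cells rather than clearing the sheet first):
library(openxlsx)
wb <- loadWorkbook(newFileName)             # the copied template, as before
writeData(wb, sheet = "Data", x = newData)  # overwrites the old contents cell by cell
saveWorkbook(wb, newFileName, overwrite = TRUE)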
Previous answer:
I've been frustrated by precisely this error for a while, including the apparent intermittency and the tying up of R for several minutes when writing a workbook.
I just realised what the problem was: Excel limits the name of a worksheet to 31 characters, and my R code was generating worksheet names in excess of this limit.
Just to be clear, I'm referring to the names of the individual tabbed sheets within an Excel workbook, not the filename of the workbook itself.
Trimming each worksheet name to no more than 31 characters fixed this error for me.
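In code, the fix was simply a truncation before the write (a sketch; sheetName stands in for my generated name):
# Excel caps worksheet names at 31 characters; truncate before writing
sheetName <- strtrim(sheetName, 31)
writeWorksheetToFile(file = newFileName,
                     data = newData,
                     sheet = sheetName,
                     clearSheets = TRUE)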

Rendering a Quarto blog post trips an error when reading in a brms file object

First, I'll apologize for not having a fuller reproducible example, but I'm not entirely sure how to go about that given the various layers to the question/problem.
I'm moving a blog over from Blogdown to a new Quarto-based website and blog. I have three saved brms object files that I'm trying to read into a code chunk in one of the posts. The code chunks work fine when I run them manually, but when I try to render the blog post I get the following error:
Quitting from lines 75-86 (tables-modelsummary-brms.qmd)
Error in stri_replace_all_charclass(str, "[\\u0020\\r\\n\\t]", " ", merge = TRUE) :
invalid UTF-8 byte sequence detected; try calling stri_enc_toutf8()
Calls: .main ... stri_trim -> stri_trim_both -> stri_replace_all_charclass
Execution halted
I've checked the primary data frame contained in the brms model object, and all of the character vectors there are valid UTF-8. These model objects can be quite large, so it's possible I'm missing something buried deep within the object, but so far nothing is apparent.
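(The check was along these lines; a sketch, with fit as the loaded model object:)
library(stringi)
# Check every character column of the model's data for invalid UTF-8
chr_cols <- Filter(is.character, fit$data)
sapply(chr_cols, function(x) all(stri_enc_isutf8(x)))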
I re-ran the models to ensure that the model objects' files weren't corrupted, and to make sure that the encoding issue wasn't somehow introduced the last time they were run, which was on a Windows machine with a different version of brms.
I've also moved the brms files around to different directories to see if it's a file path issue. The same error comes up regardless of whether the files are in the same folder as the blog post's .qmd file or in a parent directory I use for storing site data.
I've also migrated several other posts to the new Quarto site successfully, and some of them also contain R code, but it's all rendering without a problem.
Finally, I don't quite understand how to apply the alternative function, stri_enc_toutf8(), that the error message suggests either.

BiocParallel error: cannot open the connection, how do I fix it?

I'm trying to use the package bambu to quantify gene counts from bam files. I am using my university's HPC, so I have written an R script and a batch submission file to launch it.
When the script gets to the point of running the bambu function, it gives the following error:
Start generating read class files
|                                                  | 0%
[W::hts_idx_load2] The index file is older than the data file: ./results/minimap2/KD_R1.sorted.bam.bai
[W::hts_idx_load2] The index file is older than the data file: ./results/minimap2/KD_R3.sorted.bam.bai
[W::hts_idx_load2] The index file is older than the data file: ./results/minimap2/WT_R1.sorted.bam.bai
[W::hts_idx_load2] The index file is older than the data file: ./results/minimap2/WT_R2.sorted.bam.bai
|================== | 25%
Error: BiocParallel errors
element index: 1, 2, 3
first error: cannot open the connection
In addition: Warning message:
stop worker failed:
attempt to select less than one element in OneIndex
Execution halted
So it looks like BiocParallel isn't happy and cannot open a certain connection, but I'm not sure how to fix it.
This is my R script:
# Bambu R script

# Load libraries
library(Rsamtools)
library(bambu)

# Create input file handles
bamFiles <- Rsamtools::BamFileList(c("./results/minimap2/KD_R1.sorted.bam",
                                     "./results/minimap2/KD_R2.sorted.bam",
                                     "./results/minimap2/KD_R3.sorted.bam",
                                     "./results/minimap2/WT_R1.sorted.bam",
                                     "./results/minimap2/WT_R2.sorted.bam",
                                     "./results/minimap2/WT_R3.sorted.bam"))
annotation <- prepareAnnotations("./ref_data/Homo_sapiens.GRCh38.104.chr.gtf")
fa.file <- "./ref_data/Homo_sapiens.GRCh38.dna.primary_assembly.fa"

# Run bambu
se <- bambu(reads = bamFiles, annotations = annotation, genome = fa.file, ncore = 4)
se
seGene <- transcriptToGeneExpression(se)

# Save output
save.file <- tempfile(fileext = ".gtf")
writeToGTF(rowRanges(se), file = save.file)
save.dir <- tempdir()
writeBambuOutput(se, path = save.dir, prefix = "Nanopore_")
writeBambuOutput(seGene, path = save.dir, prefix = "Nanopore_")
If you have any ideas on why this happens it would be so helpful! Thank you
I think that @Chris has a good point. Under the hood, it seems likely that bambu is calling htslib, based on those warnings. While they may indeed only be warnings, I would like to know what the results look like if you run this interactively.
This question is hard to answer right now as it's missing some information (what do the files look like, a minimal reproducible example, etc.). But in the meantime here are some possibly useful questions for figuring it out:
What does bamFiles look like? Does it have the right number of read records? Do all of those files have nonzero read records? Are any suspiciously small? (See the R sketch after this list for one way to check.)
What are the timestamps on the .bai vs .bam files (e.g. ls -lh ./results/minimap2/)? Are they about what you'd expect, or are they wonky? Are any of them (say, ./results/minimap2/WT_R2.sorted.bam.bai) weirdly small?
What happens when you run it interactively? Where does it fail? You say it's at the bambu() call, but how do you know that?
What happens when you run bambu() with ncore = 1?
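One way to check the first two points from within R (a sketch; the paths are taken from your script):
library(Rsamtools)
bams <- list.files("./results/minimap2", pattern = "\\.sorted\\.bam$", full.names = TRUE)
# Count alignment records per file; an empty or truncated BAM stands out here
sapply(bams, function(b) countBam(b)$records)
# Rebuild the .bai indices so they are newer than the BAMs
lapply(bams, indexBam)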
It seems very likely that this is due to a problem with the files, and the error is only bubbling up to the top at the BiocParallel step. Many utilities are happy to accept an empty file, only to fail confusingly, without an informative error message, when asked to do something with it.
You might also consider raising an issue with the developers.
(Why the index-age warning is only possibly a problem: the index file sometimes has a timestamp like that for very small alignment files that are generated and indexed programmatically, where the indexing step is near-instantaneous.)

Unable to import previously working SAS-formats files using R-package 'haven'

Around a year ago, I used the 'haven' package to import two .sas7bdat files along with their respective .sas7bcat formats, and it worked wonderfully.
For some reason, however, it no longer works, even though all the SAS files, including the format files, have remained unchanged since then.
When I try running the code now, R gives me the following error:
Error in df_parse_sas_file(spec_data, spec_cat, encoding = encoding,
  catalog_encoding = catalog_encoding, ) :
  Failed to parse P:/SAS files/formats.sas7bcat:
  Invalid file, or file has unsupported features.
R and the 'haven' package have been reinstalled at their newest versions since the time it worked, so I imagine this might be the reason, since all the SAS files and the code remain unchanged.
For this reason, I tried to reinstall the old version of 'haven', but I cannot, since that apparently requires a manual installation of 'Rtools', which is not allowed on my computer, so I am a bit stuck here.
Any suggestions will be greatly appreciated, thanks.
A potential workaround: the sas7bdat package can also read SAS data files. I don't know how much extra work this might involve for you, though.
You can read in a dataset with:
library(sas7bdat)
read.sas7bdat("filename.sas7bdat")
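Another way to narrow the problem down: haven's read_sas() takes the data file and the format catalog as separate arguments, so you can test whether the .sas7bcat catalog alone is what the parser chokes on (a sketch; the data filename is hypothetical, the catalog path is from your error message):
library(haven)
# If this succeeds, the data file itself parses fine...
df <- read_sas("P:/SAS files/data.sas7bdat")
# ...and the failure is isolated to the format catalog:
df <- read_sas("P:/SAS files/data.sas7bdat",
               catalog_file = "P:/SAS files/formats.sas7bcat")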

Visualize tree structure of HTTP GET requests in server log files using R 3.2.3

I'm relatively new to R, and I've volunteered to try to use it to determine which files on an old media server are still being used, and which aren't. I have access to the server logs, and specifically the cs-uri-stem column. Here's what I get when I type head(uridata):
1: /favicon.ico
2: /courses/filipino/Kuwentong_Pambata/Isinaayos_ni_Leslie_Joy_Cruz.html
3: /courses/filipino/Kuwentong_Pambata/Isinaayos_ni_Leslie_Joy_Cruz_files/Isinaayos_ni_Leslie_Joy_Cruz.css
4: /courses/filipino/Kuwentong_Pambata/Isinaayos_ni_Leslie_Joy_Cruz_files/Isinaayos_ni_Leslie_Joy_CruzMoz.css
5: /courses/filipino/Kuwentong_Pambata/Isinaayos_ni_Leslie_Joy_Cruz_files/shapeimage_1.jpg
6: /courses/filipino/Kuwentong_Pambata/Isinaayos_ni_Leslie_Joy_Cruz_files/WidgetCommon.js
Obviously, the samples in this case are all coming from one set of folders, but in fact, there are thousands of different folders and languages, all of which have their own websites. I'm interested in being able to visualize this as a tree, to see which folders/languages are still getting usage.
I've looked at the data.tree package for R, which I thought would be ideal. I've tried to follow the guide at https://cran.r-project.org/web/packages/data.tree/vignettes/data.tree.html#trees-in-data.tree, but when I type "as.Node(uridata)", R gives me the error message "Error in myrow[[pathName]] : subscript out of bounds". I've searched online for that error, and understand that it occurs when you try to call a subscript that isn't in the original data set, but I don't understand why it is happening here.
Can anyone give me some guidance as to why I'm running into this problem, or how I can solve it? I'm running R 3.2.3 on OS 10.11.3, using RStudio.
Never mind, I figured it out. I didn't read the data.tree guide closely enough the first time. In order to use as.Node() to transform my URI paths into a tree, I needed to add a column, $pathString, containing that same data, using the following command:
uridata$pathString <- paste("..", uridata$cs.uri.stem, sep = "/")
That creates a "pathString" column for the data. Then, as.Node(uridata) works properly.
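Putting it together (a sketch, assuming uridata is a data.frame with a cs.uri.stem column):
library(data.tree)
uridata$pathString <- paste("..", uridata$cs.uri.stem, sep = "/")
uritree <- as.Node(uridata)
print(uritree, limit = 20)  # inspect the top of the tree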

Warning / Error when Importing a .sav

I have two versions of SPSS at work: SPSS 11 running on Windows XP and SPSS 20 running on Linux. Both copies of SPSS work fine. Files created with either version of SPSS open without incident on the other version, i.e., I can create a .sav file with SPSS 20 on Linux and open it in SPSS 11 on Windows without incident.
But, if I create a .sav file with SPSS 20 and import the data into either R or PSPP (on Linux), I get a bunch of warnings. The data appears to import correctly, but I am concerned by the warnings. I do not see any warning when importing a .sav from SPSS 11 or other .sav files I have been sent. Many of the analysts at my company use SPSS so I've gotten SPSS files from different versions of SPSS and I have never before seen this warning. The warning messages are nearly identical between PSPP and R which makes sense. AFAIK, they use the same underlying libs to import the data. This is the R error:
Warning messages:
1: In read.spss("test.sav") :
test.sav: File-indicated value is different from internal value for at least one of the three system values. SYSMIS: indicated -1.79769e+308, expected -1.79769e+308; HIGHEST: 1.79769e+308, 1.79769e+308; LOWEST: -1.79769e+308, -1.79769e+308
2: In read.spss("test.sav") :
test.sav: Unrecognized record type 7, subtype 18 encountered in system file
The .sav file is really simple. It has two columns, dumb and dumber. Both are integers. The first row contains two values of 1.0; the second row contains two values of 2.0. I can provide the file on request (I don't see any way to upload it to SO). If anyone would like to see the actual file, PM me and I'll send it to you.
dumb  dumber
 1.0     1.0
 2.0     2.0
Thoughts? Anyone know the best way to file a bug against R without getting roasted alive on the mailing list? :-)
EDIT: I used the term "Error" in the title line. I'll leave it, but I should not have used this word. The comments below are correct in pointing out that the messages I am seeing are warnings, not errors. I do however feel that this is made clear in the body of the question above. Clearly, the SPSS data format has changed over time and SPSS/IBM have failed to document these changes which is the root of the problem.
It's not an error message; it is only a warning. SPSS refuses to document their file formats, so people have not been motivated to reverse-engineer the structure of new "subtypes". There is no way to file a bug report without getting roasted, because there is no bug ... other than a closed format, and that complaint should be filed with the owners of SPSS!
EDIT: The R-Core is a volunteer group and takes its responsibilities very seriously. It exerts major efforts to track down anything that affects the stability of systems or produces erroneous calculations. If you were willing to be a bit more respectful of the authors of R and suggest, on the R-devel mailing list, the possibility of collaborating on solutions to this problem without using the term "bug", you would arouse much less hostility. There might be someone willing to examine a simple .sav file such as the one you constructed under a hexadecimal microscope to identify whatever infinite negative value is being mistaken for another infinite negative value. Most of the R-Core is not in possession of working copies of SPSS.
You could offer this link as an example of the product of others who have attempted the reverse engineering of SPSS .sav formats:
http://svn.opendatafoundation.org/ddidext/org.opendatafoundation.data/references/pspp_source/sfm-read.c
Edit: 4/2015; I have seen a recent addition to the ?read.spss help file that refers one to pkg:memisc: "A different interface also based on the PSPP codebase is available in package memisc: see its help for spss.system.file." I have used that package's function successfully (once) on files created by more recent versions of SPSS.
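In use, it looks roughly like this (a sketch; test.sav as in the question):
library(memisc)
# spss.system.file parses the .sav with memisc's PSPP-derived reader
importer <- spss.system.file("test.sav")
df <- as.data.frame(as.data.set(importer))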
The SPSS file format is not publicly documented and can change, but IBM SPSS does provide free libraries that can read and write the SAV file format. These mask any changes to the format. You can get them from the SPSS Community website (along with many other free goodies including the SPSS integration with R). Go to www.ibm.com/developerworks/spssdevcentral and look around. BTW, there have been substantial additions/changes to the sav file since year 2000, although the core data can still be read by old versions.
HTH,
Jon Peck
