Jedox Integrator RScript Transform: Failed to retrieve data - r

I'm currently working with Jedox and trying to use the RScript Transform component.
Installing R itself on the server was a little tricky, but after several attempts it finally worked.
The information on this blog was helpful for the installation: jedoxtools.wordpress.com
The key challenge, though, was entering the correct directory paths in the 'Path' (C:\Program Files\R\R-3.4.1\bin\x64) and 'R_Home' (C:\Program Files\R\R-3.4.1) variables.
But now that the 'hard part' should be done, I simply can't get the transform component running.
Whenever I try even a simple script based on the example RScript in this presentation, I get the following error message:
Failed to retrieve data from source [my RScript components name] : null
The script I run is as simple as this:
data <- my_datasource
Result <- data
There is data in the source and if I do the test locally in RStudio it works perfectly fine.
Does anyone here have experience with R in Jedox?

A few attempts later I found the solution myself, and it is of course very easy; you just have to know about it.
The example script in the Jedox documentation suggests that the returned result set has to be called 'result'.
In fact you can return any object. All you have to do is enter the name of the result set in the extra field above the script box.
The working script (input=output) is shown here:
[Screenshot: rscript solution]
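Since the screenshot is not reproduced here, the following is only a minimal sketch of what such an input=output script could look like. It assumes the result-set name field above the script box is set to `my_result` (a hypothetical name), and it defines a stand-in data frame so the snippet also runs outside Jedox, where `my_datasource` would normally be supplied by the Integrator source.

```r
# Stand-in for the Integrator source; inside Jedox this object is
# assumed to be provided under the name of the source component.
my_datasource <- data.frame(id = 1:3, value = c(10, 20, 30))

# Pass the input through unchanged.
data <- my_datasource

# 'my_result' is a hypothetical name; it has to match whatever is
# entered in the result-set name field above the script box.
my_result <- data
```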

Related

BiocParallel error: cannot open the connection, how do I fix it?

I'm trying to use the package bambu to quantify gene counts from bam files. I am using my university's HPC, so I have written an R script and a batch submission file to launch it.
When the script gets to the point of running the bambu function, it gives the following error:
Start generating read class files
| | 0%[W::hts_idx_load2] The index file is older than the data file: ./results/minimap2/KD_R1.sorted.bam.bai
[W::hts_idx_load2] The index file is older than the data file: ./results/minimap2/KD_R3.sorted.bam.bai
[W::hts_idx_load2] The index file is older than the data file: ./results/minimap2/WT_R1.sorted.bam.bai
[W::hts_idx_load2] The index file is older than the data file: ./results/minimap2/WT_R2.sorted.bam.bai
|================== | 25%
Error: BiocParallel errors
element index: 1, 2, 3
first error: cannot open the connection
In addition: Warning message:
stop worker failed:
attempt to select less than one element in OneIndex
Execution halted
So it looks like BiocParallel isn't happy and cannot open a certain connection, but I'm not sure how to fix this?
This is my R script:
#Bambu R script
#load libraries
library(Rsamtools)
library(bambu)
#Creating files
bamFiles<- Rsamtools::BamFileList(c("./results/minimap2/KD_R1.sorted.bam","./results/minimap2/KD_R2.sorted.bam","./results/minimap2/KD_R3.sorted.bam","./results/minimap2/WT_R1.sorted.bam","./results/minimap2/WT_R2.sorted.bam","./results/minimap2/WT_R3.sorted.bam"))
annotation<-prepareAnnotations("./ref_data/Homo_sapiens.GRCh38.104.chr.gtf")
fa.file<-"./ref_data/Homo_sapiens.GRCh38.dna.primary_assembly.fa"
#Running bambu
se<- bambu(reads=bamFiles, annotations=annotation, genome=fa.file,ncore=4)
se
seGene<- transcriptToGeneExpression(se)
#Saving files
save.file<-tempfile(fileext=".gtf")
writeToGTF(rowRanges(se),file=save.file)
save.dir <- tempdir()
writeBambuOutput(se,path=save.dir,prefix="Nanopore_")
writeBambuOutput(seGene,path=save.dir,prefix="Nanopore_")
If you have any ideas on why this happens it would be so helpful! Thank you
I think that @Chris has a good point. Judging by those warnings, it seems likely that bambu is running htslib under the hood. While they may indeed only be warnings, I would like to know what the results look like if you run this interactively.
This question is hard to answer right now as it's missing some information (what do the files look like, a minimal reproducible example, etc.). But in the meantime here are some possibly useful questions for figuring it out:
What does bamFiles look like? Does it have the right number of read records? Do all of those files have nonzero read records? Are any suspiciously small?
What are the timestamps on the bai vs bam files (e.g. ls -lh ./results/minimap2/)? Are they about what you'd expect or is it wonky? Are any of them (say, ./results/minimap2/WT_R2.sorted.bam.bai) weirdly small?
What happens when you run it interactively? Where does it fail? You say it's at the bambu() call, but how do you know that?
What happens when you run bambu() with ncore=1?
It seems very likely that this is due to a problem with the files, and that the error only bubbles up to the top at the BiocParallel step. Many utilities have an annoying habit of happily accepting an empty file, only to fail confusingly and without an informative error message when asked to do something with it.
You might also consider raising an issue with the developers.
(why the warning is only possibly a problem: The index file sometimes has a timestamp like that for very small alignment files which are generated and indexed programmatically, where the indexing step is near-instantaneous.)
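The file checks suggested above can be sketched in base R. This is only an illustration, not part of bambu or Rsamtools: `check_bam_index` is a hypothetical helper name, and it assumes each index sits next to its BAM file with a `.bai` suffix, as in the warnings shown in the question.

```r
# Hypothetical helper: summarise existence, size and timestamps of
# BAM files and their .bai indexes (assumed to be "<bam>.bai").
check_bam_index <- function(bam_paths) {
  bai_paths <- paste0(bam_paths, ".bai")
  data.frame(
    bam         = bam_paths,
    bam_exists  = file.exists(bam_paths),
    bai_exists  = file.exists(bai_paths),
    bam_bytes   = file.size(bam_paths),   # NA if the file is missing
    bai_bytes   = file.size(bai_paths),
    # TRUE when the index was last modified before the BAM file
    index_older = file.mtime(bai_paths) < file.mtime(bam_paths),
    stringsAsFactors = FALSE
  )
}
```

Calling it on the six paths from the question, for example `check_bam_index(c("./results/minimap2/KD_R1.sorted.bam", "./results/minimap2/KD_R2.sorted.bam"))`, gives one row per file, so missing, empty or stale files stand out before bambu is started.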

How to address an Rscript parse error: premature EOF?

Running my working R script in the windows command line (cmd) using Rscript results in a parsing error (premature EOF).
When I run the script in RStudio, it compiles and runs as expected.
I have read the Rscript page in R documentation, and I see that the problem must be due to spaces in my script itself, which probably make it into the cmd console somehow during parsing, but that's as far as I get.
Or should I have done something with the #! functionality mentioned therein?
I am trying to run it on cmd:
Rscript .\start_app.r
I am in the right working directory, and have set the folder containing Rscript in my environment.
The script is too long to share, and I am too inexperienced to give you the parts that make it break (otherwise I wouldn't be here), but it is full of functions, if statements and the like, that use curly brackets and are indented. I also often include empty rows (sometimes indented) for readability. It makes use of the shiny package. An example could be:
islocal = nchar(Sys.getenv("LOCAL"))>1 | interactive()
if (islocal){
source('../../path/app/variables/styling.R')
} else {
source('./variables/styling.R')
}
As in the example above, it also includes other R code called via source().
Can that somehow make it to the command line and be parsed incorrectly?
I get the following messages:
Error: parse error: premature EOF
(right here) ------^
Execution halted
Not enough memory resources are available to process this command.
(I guess the second message is an unrelated issue, but include it here just to be sure.)
As suggested in a comment, the solution was to change the encoding.
As the asker mentioned, saving the script with "Save with Encoding -> ISO-8859-1 (System default)" solves the issue.
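To check whether encoding is a plausible culprit before re-saving, one can look for non-ASCII bytes in the script and test whether R can parse it outside RStudio. The following is a rough sketch using base R only; `inspect_script` is a hypothetical helper name, and it only reports findings, it does not convert the file.

```r
# Hypothetical helper: list lines containing non-ASCII bytes and
# report whether the file parses in the current R session.
inspect_script <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # iconv() returns NA for elements that cannot be represented in ASCII
  ascii <- iconv(lines, from = "latin1", to = "ASCII")
  parses <- tryCatch({
    parse(file = path)
    TRUE
  }, error = function(e) FALSE)
  list(non_ascii_lines = which(is.na(ascii)), parses = parses)
}
```

Running `inspect_script("start_app.r")` and the same call on each file pulled in via source() should narrow down which file, and which lines, contain characters that depend on the encoding.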

RSAP package to connect to SAP through R (windows)

I need to be able to grab data straight from SAP into R without going through its GUI. I've found that the RSAP package seems to be exactly what I'm looking for.
I followed the steps recommended by Piers and Alvaro Tejada Galindo (made it work on windows environment) and here is where I'm stuck:
managed to compile the RSAP package
managed to install it
everything is looking in good shape when I run library(RSAP)
whatever I try in the RSAPConnect command, my R session crashes without any log or tools to debug it.
Of course I've tried a few combinations of arguments in this command, but in every single case it still crashed without me knowing why. It does not matter whether I enter a valid ashost or just aaa for instance, it still crashes...
Here is the code I was thinking would work (of course I added stars in there):
conn <- RSAPConnect(ashost = "*****.****.com", sysnr = "00", client = "410",
user = "*****", passwd = "*********", TRACE = "3")
Has anyone experienced something similar? I don't even know in which direction to look to make this work. In fact, I'd have expected an error message such as "server could not be reached" if the ashost were wrong, but nothing like that happens.
I'd appreciate any assistance on this.
Thanks ahead for your support.
Kind regards
After some talking with Piers Harding, it appears that the segfault happens because of some code changes between previous version and version 3.x, which I use.
M. Alvaro Tejada Galindo also tried to use RSAP on a windows machine like me, but if you read his post, you'll see that he was using R 2.15.0 at the time.
Unfortunately I do not have the skills to locate these changes and make the required adjustments within the RSAP code.
Piers did confirm though that RSAP is still working great using R latest build for linux.
Lastly, for those like me who struggled to find the NW RFC library, you can find it on GitHub.
If this can help anyone...
Well I thought I'd add this as another answer.
It is possible to write some VBA embedded in an Excel file that fetches data from SAP. The interesting part is that I just ran into some code that runs a specific VBA macro from a specific Excel file, all from R:
# COMCreate() comes from the RDCOMClient package (Windows only):
library(RDCOMClient)
# Open a specific workbook in Excel:
xlApp <- COMCreate("Excel.Application")
xlWbk <- xlApp$Workbooks()$Open("C:\\Excel_file.xlsm")
# Run the macro called "MyMacro"
xlApp$Run("MyMacro")
# Close the workbook (and save it) and quit the app:
xlWbk$Close(TRUE)
xlApp$Quit()
# Release resources:
rm(xlWbk, xlApp)
So in the end, if your macro is set up to grab and store the SAP data, all you have to do next is just read this file using XLConnect or any other package as you'd normally do, and you're all set !

How to solve this error message in rmarkdown?

I am just starting to explore the rmarkdown package. I don't use Rstudio. I use the default R environment. What I did was as follows.
I created a new R document.
Started typing few lines in rmarkdown format.
Saved the file with Rmd extension.
I saved the file in the working directory.
I installed the pandoc using the pkg file.
I installed 'rmarkdown' package. Loaded the package.
Used the following command to render the Rmd file.
rmarkdown::render("Untitled.Rmd")
I get the following error.
Error in tools::file_path_as_absolute(input) : file 'Untitled.Rmd'
does not exist
I tried every way I could think of, such as giving the exact path instead of the filename, but nothing worked. I googled the error message and found nobody with a similar error. Can someone help me with this? What am I missing? What does the error message mean?
Most of the time a "file not found" error is caused either by a typo or by a file that really is missing (as in your case, where the real file has a different name).
To rule out those possibilities:
Copy the fullpath from your filebrowser.
Make sure the file exists, inside R you could type:
file.exists("/fullpath/to/file")
If that returns TRUE and the error persists, then something else is going on.
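When the file turns out to be saved under a slightly different name, listing what is actually in the working directory usually reveals it. Here is a small sketch in base R; `locate_input` is a hypothetical helper name, and matching on the file name stem (ignoring case) is just one reasonable heuristic.

```r
# Hypothetical helper: show where R is looking and which files in
# that directory start with the same stem as the requested name.
locate_input <- function(name, dir = getwd()) {
  files <- list.files(dir)
  stem <- tools::file_path_sans_ext(name)
  hits <- files[startsWith(tolower(files), tolower(stem))]
  list(
    working_dir = normalizePath(dir),
    exact_match = file.exists(file.path(dir, name)),
    candidates  = hits
  )
}
```

For example, `locate_input("Untitled.Rmd")` would list a file such as `Untitled.Rmd.R` under `candidates` if an editor had silently appended its own extension when saving.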

How to get the filename from a streaming mapreduce job in R?

I am streaming an R mapreduce job and I need to get the filename. I know that Hadoop sets environment variables for the current job before it starts and I can access env vars in R with Sys.getenv().
I found :
Get input file name in streaming hadoop program
and Sys.getenv("mapred_job_id") works fine, but it is not what I need. I just need the filename and not the job id or name. I also found: How to get filename when running mapreduce job on EC2?
But this isn't helpful either. What is the easiest way to get the current filename while streaming from R? Thank you
I have not tried this, but from the second link you provided, it seems that this is available in an environment variable called map.input.file. Then, this should work:
Sys.getenv("map.input.file")
EDIT:
Upon further investigation, I learned that you need to replace the dots with underscores, so this is the way to do it:
Sys.getenv("map_input_file")
However, the map.input.file property has been deprecated in YARN (Hadoop 2.x), so the new name should be used instead:
Sys.getenv("mapreduce_map_input_file")
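If the same script has to run on clusters using either property name, the two lookups can be combined. This is a minimal sketch; `get_input_file` is a hypothetical helper name, and it simply prefers the newer variable and falls back to the older one.

```r
# Hypothetical helper: return the input file of the current map task,
# trying the newer variable name first and the deprecated one second.
get_input_file <- function() {
  candidates <- c("mapreduce_map_input_file", "map_input_file")
  values <- Sys.getenv(candidates, unset = NA)
  found <- values[!is.na(values) & nzchar(values)]
  if (length(found) == 0) NA_character_ else unname(found[1])
}
```

Since the value is a full path, `basename(get_input_file())` gives just the filename; the function returns NA when neither variable is set, for instance when the script is run outside Hadoop.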

Resources