Cloudera Workbench string encoding problem - r

I am pulling changes from a git repo to which my coworker pushed R code from his local Windows machine:
word <- gsub("=gesellschaftmitbeschränkterhaftung=","",fixed = T,x = word)
The code contains special characters such as German umlauts, e.g., the "ä" in the example above. On Windows, this works fine. But when I open the same code on the Cloudera Data Science Workbench, the special character gets mangled:
word <- gsub("=gesellschaftmitbeschr�nkterhaftung=","",fixed = T,x = word)
I could replace it manually, but that is obviously a very painful solution and defeats the purpose of git. Is there any way to circumvent this issue? You can find here the original R code, as pushed to Git from the Windows machine, containing all the lines that cause issues.
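This looks like an encoding mismatch: the file was most likely saved as Windows-1252/Latin-1 on the Windows machine, while the Workbench reads it as UTF-8. A minimal sketch of a one-time fix, assuming Latin-1 really is the source encoding (the file name is a placeholder):

# Hypothetical one-time re-encoding; commit the UTF-8 version afterwards
txt <- readLines("script.R", encoding = "latin1")
writeLines(iconv(txt, from = "latin1", to = "UTF-8"), "script.R")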

Related

Global path conversions from Windows to macOS (R language example)

I tried to find existing solutions, but with no success so far.
The problem
There are several projects, all written in R, maintained by a group of people who each use Windows as their main operating system.
Nearly every script file uses the following command at the very beginning:
setwd("Z://00-00-00/path/to/project")
What is used here is some common disc space under the path Z://00-00-00/. Since I work on macOS, my paths look like /common-drive/path/to/project, so the question is:
Is there a way to include a command/script in some sort of file like ~/.bashrc or maybe some R-related settings that will convert Windows-like absolute file paths to paths that are MAC OS-like when they detect it?
What I think should run is:
library(stringr)
path.to.be.used <- "Z://00-00-00/path/to/project"
str_replace(path.to.be.used, "Z://00-00-00/", "/common-drive/")
However, all scripts have the path hard-coded directly in setwd, so I cannot change each file by hand. That is why I am trying to find a workaround that converts these paths silently.
Does anyone have an idea how to do this? Is there any way to check, at the system or RStudio level, whether the path should be converted?
Thank you for your time and help!
As others said in the comments, you should convince your co-workers not to do that. However, that's often difficult, so here's a hack solution (mentioned by @MrFlick): mask base::setwd() with your own wrapper that rewrites the path before changing directory.
setwd <- function(dir) {
  # Translate the Windows network path to its macOS equivalent
  newdir <- sub("Z://00-00-00/", "/common-drive/", dir, fixed = TRUE)
  cat("Requested", dir, ", using", newdir, "\n")
  base::setwd(newdir)
}
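One way to get the "silent" conversion the question asks for is to define this wrapper in your ~/.Rprofile, so it is already in place, masking base::setwd(), before any of the shared scripts run. The scripts themselves then stay untouched.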

RSAP package to connect to SAP through R (windows)

I need to be able to grab data straight from SAP into R without going through its GUI. I've found that the RSAP package seems to be exactly what I'm looking for.
I followed the steps recommended by Piers and Alvaro Tejada Galindo (who made it work in a Windows environment), and here is where I'm stuck:
- I managed to compile the RSAP package
- I managed to install it
- everything looks in good shape when I run library(RSAP)
- but whatever I try in the RSAPConnect command, my R session crashes without any log or tools to debug it
Of course I've tried a few combinations of arguments to this command, but in every single case it still crashed without telling me why. It does not matter whether I enter a valid ashost or just "aaa", for instance; it still crashes...
Here is the code I was thinking would work (of course I added stars in there):
conn <- RSAPConnect(ashost = "*****.****.com", sysnr = "00", client = "410",
                    user = "*****", passwd = "*********", TRACE = "3")
Has anyone experienced something similar? I don't even know in which direction to look to try and make this work. In fact, I'd have expected some error message like "server could not be reached" if the ashost were wrong, but none of that happens.
I'd appreciate any assistance on this.
Thanks in advance for your support.
Kind regards
After some discussion with Piers Harding, it appears that the segfault happens because of code changes between previous R versions and version 3.x, which I use.
M. Alvaro Tejada Galindo also used RSAP on a Windows machine like me, but if you read his post, you'll see that he was using R 2.15.0 at the time.
Unfortunately, I do not have the skills to locate these changes and make the required adjustments within the RSAP code.
Piers did confirm, though, that RSAP still works great with the latest R build on Linux.
Lastly, for those like me who struggled to find the NW RFC library, you can find it on GitHub.
If this can help anyone...
Well, I thought I'd add this as another answer.
It is possible to write some VBA embedded in an Excel file to fetch data from SAP. The interesting part is that I just ran into some code to run a specific VBA macro from a specific Excel file, all from R:
# Requires the RDCOMClient package (Windows only)
library(RDCOMClient)

# Open a specific workbook in Excel:
xlApp <- COMCreate("Excel.Application")
xlWbk <- xlApp$Workbooks()$Open("C:\\Excel_file.xlsm")

# Run the macro called "MyMacro":
xlApp$Run("MyMacro")

# Close the workbook (saving changes) and quit the app:
xlWbk$Close(TRUE)
xlApp$Quit()

# Release resources:
rm(xlWbk, xlApp)
So in the end, if your macro is set up to grab and store the SAP data, all you have to do next is read that file using XLConnect or any other package, as you'd normally do, and you're all set!
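For instance, a minimal sketch of the read-back step with XLConnect (the sheet name "SAPData" is a placeholder for whatever your macro writes):

library(XLConnect)
wb <- loadWorkbook("C:\\Excel_file.xlsm")
sapData <- readWorksheet(wb, sheet = "SAPData")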

Extracting git information in rstudio

I'm trying my hand at some reproducible research in RStudio and with R Markdown, mostly because I'm too lazy to paste figures into PowerPoint or Word over and over. *grin*
One thing that I think is very important with reproducible research is recording exactly which version of the RMarkdown document produced the report. Often such documents go through many revisions, and in addition, they might pull in multiple other source files or data from the repository. So, insert the git commit SHA, and record if the repository is clean or dirty.
But despite RStudio knowing about git, it doesn't seem to make this information available through any API calls. Or am I missing something?
Other than shelling out to git by hand, what are my options?
I don't think RStudio provides this information either, but you can retrieve it easily with a system call, for example:
docVersion <- system("git log -n 1 --pretty=oneline", intern = TRUE)
repoStatus <- system("git status -s", intern = TRUE)
You just have to specify the format you want in git log, and maybe fiddle a bit with git status to get the exact information you want.
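Building on that, a small sketch of capturing exactly what the question asks for, the commit SHA and a clean/dirty flag, for embedding in an R Markdown report:

# Short commit SHA and whether the working tree has uncommitted changes
sha   <- system("git rev-parse --short HEAD", intern = TRUE)
dirty <- length(system("git status --porcelain", intern = TRUE)) > 0
cat("Built from commit", sha, if (dirty) "(dirty)" else "(clean)", "\n")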

Check if there is a newer version of my local file in Github, with R

In short: I need to get the date of the last change to a file hosted on GitHub.
In long: given that I have a file on GitHub (an R workspace) that is updated once in a while, I would like to create a function in R that checks if my local file is older than the one in the repo (if you're curious, my motivation is explained at the end of this post). This is the file I'm talking about.
In principle it should be somewhat easy, since every file has a history page associated with it, but my knowledge is far too poor to know what to do with it. Also, this question seems to hint at a way of doing what I want using PHP, but that's terra incognita for me really, so I don't know if it could help in any way.
So, as I said in the short version of this post, I need to find a way to retrieve the date of the last commit for this file. I can find some way to compare it to the commit date of my local file afterwards.
Thanks in advance,
Juan
motivation: I'm working on an online course in R basics which uses a system for self-checking whether solutions to exercises are correct (i.e., students can check their results instantly). This system uses a file with functions and data that is regularly updated, because I often find bugs and new problems. So my goal is to have a function that tells the students if there is a newer file available. It would also be neat to find a way to download it and replace the older one, but that is secondary for now.
The tricky part is keeping the Git time on the downloaded file. The solution below sets the file's modification time to the Git date after each download, so that the next check compares against the right timestamp.
library(RCurl)
library(rjson)

destination <- "datos"  # assume current directory
repo <- "https://api.github.com/repos/jumanbar/Curso-R/"
path <- "ejercicios-de-programacion/rep-3/datos"
myopts <- curlOptions(useragent = "whatever", ssl.verifypeer = FALSE)

# Most recent commit that touched the file
d <- fromJSON(getURL(paste0(repo, "commits?path=", path),
                     useragent = "whatever", ssl.verifypeer = FALSE))[[1]]
gitDate <- as.POSIXct(d$commit$author$date,
                      format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Download if the local file is missing or older than the last commit
MustDownload <- !file.exists(destination) ||
  file.info(destination)$mtime < gitDate

if (MustDownload) {
  url <- d$url
  commit <- fromJSON(getURL(url, .opts = myopts))
  files <- unlist(lapply(commit$files, "[[", "filename"))
  rawfile <- commit$files[[which(files == path)]]$raw_url
  download.file(rawfile, destination, quiet = TRUE)
  # Keep the Git time on the file so the next check compares correctly
  Sys.setFileTime(destination, gitDate)
  print("File was downloaded")
}
It looks like the useragent and ssl.verifypeer options are required when calling from R; the same request works without them from the command line. If you are security-conscious, there is documentation on that subject floating around, but I took the easy path here.
It seems you need a local clone of the GitHub repo. Forgetting the language specifics of R for the moment (I don't know R), in git you can get the most recent date in a number of ways through git log. From the git log help file (git help log), under the Placeholders section:
%cd: committer date
%cD: committer date, RFC2822 style
%cr: committer date, relative
%ct: committer date, UNIX timestamp
%ci: committer date, ISO 8601 format
You can retrieve the UNIX timestamp (seconds since the start of January 1st, 1970 - very easily comparable) of the most recent commit for your file, starting from the project root, with the following git log command:
git log --format=%ct -1 -- ejercicios-de-programacion/rep-3/datos
That returns a number, e.g. 1368691710, but you can use the other formats listed as well.
Now you just need to find a way to make this system call from R, with your project root as the working directory. This SO post may help (but again, I don't know R).
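In R, that could look roughly like this (a sketch assuming the working directory is the root of a local clone):

# Unix timestamp of the last commit touching the file
gitTs <- as.integer(system(
  "git log --format=%ct -1 -- ejercicios-de-programacion/rep-3/datos",
  intern = TRUE))
# Compare with the local file's modification time
localTs <- as.integer(file.info("datos")$mtime)
newerAvailable <- localTs < gitTs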
Perhaps you can make use of the git status command (which tells you if there are new commits) in combination with cron jobs. But you need a local clone for this, and I have never tried to use the output of the command inside a cron job.

Starting R and calling a script from a batch file

I have an R-based GUI that allows some non-technical users access to a stats model. As it stands, the users have to first load R and then type loadGui() at the command line.
While this isn't overly challenging, I don't like making non-technical people type anything at a command line. I had the idea of writing a .bat file (users are all running Windows, though multi-platform solutions are also appreciated) that starts RGui and then autoruns that command.
My first problem is opening RGui from the command line. While I can provide an explicit path, such as
"%ProgramW6432%\R\R-2.15.1\bin\i386\Rgui.exe"
it will need updating each time R is upgraded. It would be better to retrieve the location of RGui from the %path% environment variable, but I don't know an easy way to parse that.
The second, larger problem is how to call commands for R on startup from the command line. My first thought is that I could take a copy of ~/.Rprofile, append the extra command, and then replace the original copy of the file once R is loaded. This is awfully messy though, so I'd like an alternative.
Running R in batch mode isn't an option, firstly since I can't persuade GUIs to display themselves, and secondly because I would like the R console available, even if the users shouldn't need to use it.
If you want a toy GUI to test your ideas, try this:
loadGui <- function() {
  # Toy GUI: a window with three radio buttons
  library(gWidgetstcltk)
  win <- gwindow("test")
  rad <- gradio(letters[1:3], cont = win)
}
Problem 1: I simply never install into the suggested default directory on Windows, but rather group R and a few related things in, say, c:/opt/, where I install R itself in, say, c:/opt/R-current, so that the path c:/opt/R-current/bin remains constant. On upgrade, I first rename to R-previous and then install into a new R-current.
Problem 2: I think I solved that many moons ago with scripts. You can now use Rscript.exe to launch these, and there are tcltk examples for waiting for a prompt.
I have done something similar a couple of times. In my case the clients were using Windows, so I just installed R on their computer and created a shortcut on their desktop to run R. I then right-clicked on the shortcut and chose Properties to get the properties dialog, where I changed the "Start in" folder to the one I wanted R to run from (which had the .Rdata file with the correct data, and either a .First function in the .Rdata file or a .Rprofile in the folder). There is also a "Run:" option with a "Minimized" setting to run the main R window minimized.
I had created the functions that I wanted to run (usually a specialized GUI using tcltk) and any needed data, saved them in the .Rdata file, and also created either .First or .Rprofile to run the command that shows the GUI. The user double-clicks on the desktop icon, and up pops my GUI, which they can work with while ignoring the other parts.
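For concreteness, a minimal sketch of such a .First function (saved in the .Rdata file or defined in .Rprofile; loadGui is the toy function from the question):

.First <- function() {
  # Default packages are not yet attached when .First runs,
  # so load what the GUI needs explicitly
  require(gWidgetstcltk)
  loadGui()
}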
Take a look at the ProjectTemplate package. It does what you want to do: it loads the required libraries from a batch file, runs R files automatically after loading, and provides a lot of other useful features as well.
Using the answer from https://stackoverflow.com/a/27350487/41338 and a comment from Richie Cotton above, I have arrived at the following solution for keeping a script alive until a window is closed, by checking whether the pointer to the window is still valid.
For an RGtk2 window created and shown using:
library(RGtk2)
mainWindow <- gtkWindow("toplevel", show = TRUE)
Create a function which checks if the pointer to it exists:
isnull <- function(pointer) {
  # Strip the attributes so the bare external pointer can be
  # compared against a fresh (null) one, then restore them
  a <- attributes(pointer)
  attributes(pointer) <- NULL
  out <- identical(pointer, new("externalptr"))
  attributes(pointer) <- a
  return(out)
}
and at the end of your script:
while(!isnull(mainWindow)) Sys.sleep(1)
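Combined with the Rscript.exe approach from the earlier answer, this lets a batch file launch the GUI script and keep the R process alive until the user closes the window.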
