I'm using TextMate as my code editor, and I would like to be able to run Julia from it. I have no problems saving the .jl file and sending it to the Terminal (via the Julia bundle in TextMate), but I was wondering if it is possible to make the session interactive, so that the variables are stored while the session is running (for instance, so I could send the code to Julia line by line, or have something like Rdaemon).
I use TextMate a lot with Julia. With Julia 1.0, everything got a lot more convenient. These are basically the steps you need to do:
Make sure you put your code in a package.
Start Julia in your terminal, then run `using Revise; using YourPackage`. Load Revise first so that it can track the package.
Revise.jl makes life a lot easier. You can work in TextMate and change the code of your functions and that will automatically get reflected in your REPL session. No need to reload. So you keep all your variables.
Occasionally you have to restart because you changed the visibility of a function or a type.
I have a more detailed explanation of my workflow in Julia 1.0 here.
I have just started to learn to code in R, so I apologize for the very simple question. I understand it is best to type your code in as a script so you can edit and save it. However, when I try to make an object in the script section, it does not work. If I make an object in the console, R saves the object and it appears in my environment. I am typing in some very simple code to try a quick exercise on rolling dice:
die <- 1:6
But it only works in the console and not when typed as a script. Any help/explanation appreciated!
Essentially, you interact with the R environment differently when running an .R script via Rscript.exe than when working in a console (R.exe, Rterm, etc.) or in a GUI IDE like RGui or RStudio. (This applies to any programming language with an interactive interpreter, not just R.)
The script does save the die object in the R environment, but only during the run or lifetime of that script (i.e., from the first to the last line of code). Your code line is simply an assignment of an object. You do nothing with it. Apply some function, output results, or take other actions in that script to see it.
On the console, the R environment persists interactively until you quit it with q(). So assigned objects remain for the lifetime of your console session. After assigning, you can then apply functions, output results, or take other actions in line-by-line calls.
Ultimately, a script gathers all the line-by-line code in advance of a run, for automated execution that does not rely on the user to supply lines. Imagine typing 1,000 lines of code with nested if/then or for/while loops and apply functions on the console! Therefore, handle all your R coding needs in scripts.
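To make that concrete, here is a minimal sketch of a script that actually does something with die, so there is visible output when it is run non-interactively. The file name dice.R is just an assumption for illustration.

```r
# dice.R (hypothetical file name); run from a terminal with: Rscript dice.R
die <- 1:6                      # the assignment alone prints nothing

print(die)                      # show the object explicitly
cat("Mean face value:", mean(die), "\n")

# Roll two dice; the result differs between runs unless a seed is set
roll <- sample(die, size = 2, replace = TRUE)
cat("Rolled:", roll, "\n")
```

Once the script finishes, die and roll are gone, whereas the same lines typed in the console would leave both objects in your environment.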
It is always better to have the script. As you say, you can save, edit and correct it without having to retype the code just to change a variable or a number.
I recommend using RStudio. It is very practical, will help you program more efficiently, and lets you see, among other things, the different objects that you have created.
I need to "industrialize" the R code for a data science project, because the project will be rerun several times in the future with fresh data. The new code should be really easy to follow even for people who have not worked on the project before, and they should be able to redo the whole workflow quite quickly. Therefore I am looking for tips, suggestions, resources and best practices on how to achieve this objective.
Thank you for your help in advance!
You can make an R package out of your project, because it has everything you need for a standalone project that you want to share with others :
Easy to share, download and install
R has a very efficient documentation system for your functions and objects when you work within RStudio. Combined with roxygen2, it enables you to document every function precisely, and makes the code clearer since you can avoid inline comments (but please add them anyway if needed)
You can specify quite easily which dependencies your package needs, so that everyone knows what to install for your project to work. You can also use packrat if you want to mimic Python's virtualenv
R also provides a long-format documentation system called vignettes, which are similar to a printed notebook: you can display code, text, code results, etc. This is where you will write guidelines on how to use the functions, provide detailed instructions for a certain method, etc. Once the package is installed, they are automatically included and available to all users.
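As a rough illustration of the roxygen2 point, a documented function in a package's R/ folder might look like the sketch below. The function roll_die() is made up for the example; only the `#'` comment syntax and the @param/@return/@export tags come from roxygen2.

```r
#' Roll a die a number of times
#'
#' @param n Number of rolls.
#' @param sides Number of sides on the die (hypothetical default of 6).
#' @return An integer vector of length `n` with values in `1:sides`.
#' @export
roll_die <- function(n, sides = 6L) {
  sample.int(sides, size = n, replace = TRUE)
}
```

Running roxygen2 over the package turns these comments into the help page that users see with ?roll_die.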
The only downside is the following : since R is a functional programming language, a package consists of mainly functions, and some other relevant objects (data, for instance), but not really scripts.
More details about the last point: if your project consists of a script that calls a set of functions to do something, the script cannot directly appear within the package. Two options here: a) you write a dispatcher function that runs a set of functions to do the job, so that users just have to call one function to run the whole method (not really good for maintenance); b) you put the whole script in a vignette (see above). With this method, people just have to write a single R file (which can be copy-pasted from the vignette), which may look like this:
library(mydatascienceproject)
library(...)
...
dothis()
dothat()
finishwork()
That enables you to execute the whole job from a terminal or on a remote machine with Rscript, as follows (using argparse to add arguments):
Rscript myautomatedtask.R --arg1 anargument --arg2 anotherargument
And finally if you write a bash file calling Rscript, you can automate everything !
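For the argument handling, here is a minimal sketch of what the top of myautomatedtask.R could do. Note that it uses base R's commandArgs() rather than argparse, to keep the example dependency-free; parse_args() is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: turn c("--arg1", "a", "--arg2", "b") into a named list.
# By default it reads the arguments given after the script name on the
# Rscript command line.
parse_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) %% 2 != 0) {
    stop("Expected arguments as '--name value' pairs")
  }
  idx <- seq_len(length(args) / 2) * 2 - 1   # positions of the --name flags
  keys <- args[idx]
  values <- args[idx + 1]
  if (!all(startsWith(keys, "--"))) {
    stop("Every argument name must start with '--'")
  }
  setNames(as.list(values), sub("^--", "", keys))
}

# Inside the real script you would call parse_args() with no arguments;
# the vector is spelled out here so the example runs anywhere.
opts <- parse_args(c("--arg1", "anargument", "--arg2", "anotherargument"))
str(opts)
```

If you need typed arguments, defaults or a --help message, that is where argparse (or a similar package) earns its place.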
Feel free to read Hadley Wickham's book about R packages, it is super clear, full of best practices and of great help in writing your packages.
One can get lost in the multiple files in the project's folder, so it should be structured properly: link
Naming conventions that I use: first, second.
Set up the random seed, so the outputs should be reproducible.
Documentation is important: you can use the Roxygen skeleton in RStudio (default Ctrl+Alt+Shift+R).
I usually separate the code into smaller, logically cohesive scripts, and use a main.R script, that uses the others.
If you use a special set of libraries, you can consider using packrat. Once you set it up, you can manage the installed project-specific libraries.
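A minimal sketch of the main.R idea, combined with the seed tip, could look like this. The folder name R/, the seed value and the helper source_dir() are all assumptions for the example, not a fixed convention.

```r
# main.R (hypothetical entry point for the project)

set.seed(2024)  # fixed seed so that random steps give the same result on each run

# Hypothetical helper: source every .R file in a folder, in alphabetical order,
# so that numbered files such as 01-load.R, 02-clean.R run in sequence.
source_dir <- function(path = "R") {
  files <- sort(list.files(path, pattern = "\\.[Rr]$", full.names = TRUE))
  for (f in files) {
    source(f)
  }
  invisible(files)
}

# Only source the helpers if the folder is actually there
if (dir.exists("R")) {
  source_dir("R")
}
```

Prefixing the smaller scripts with numbers keeps the execution order obvious to someone opening the project for the first time.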
Despite numerous searches, I can't seem to find a clear explanation as to what "Source on Save" means in RStudio.
I have tried ?source and the explanation there isn't clear, either.
As far as I can tell, it seems to run the script when I hit Save, but I don't understand the relevance/significance of it.
In simple terms, what exactly does Source on Save do and why would/should I use it?
This is kind of a shortcut to save and execute your code. You type something, save the script and it will be automatically sourced.
Very useful for short scripts but very annoying for time consuming longer scripts.
So sourcing is basically running each line of your file.
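If it helps, here is a small sketch of what source() does, which is what RStudio triggers for you on every save when the option is ticked. The temporary file and the roll() function are made up for the example.

```r
# Write a tiny script to a temporary file (stands in for the file you edit)
script <- tempfile(fileext = ".R")
writeLines(c(
  "roll <- function(n = 1) sample(1:6, size = n, replace = TRUE)",
  "message('script sourced')"
), script)

# source() runs every line of the file in the current session,
# so roll() now exists in the global environment
source(script)

roll(2)
```

With Source on Save enabled, the source() step happens automatically each time you hit Save.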
EDIT:
So, thinking of a scenario where this might be useful...
You are developing a function which you will later put into a package. So you already write this function in a separate file, but call it for testing from the command line.
Normally, you have to re-run the whole function definition whenever you change something. With "Source on Save", the definition is re-sourced each time you save, and you can use Ctrl + 2 to jump to the command line and test the function directly.
Since I started working with R, my datasets have been much bigger. But I remember that when I started coding in Python and vi, I changed my settings to execute the code on save, since those little scripts were done in less than 10 seconds...
So maybe it is just not standard to work with small datasets... But I can still recommend, for development, using only 10% of a normal dataset. It will speed up the creation of graphics and a lot of other things as well. Test with the complete dataset every now and then.
I wrote an R function that updates the version number of a package in another question. I work a lot with GitHub and RStudio, and it would save me quite some time (plus be much more precise) if this function was run automatically every time I opened a certain project (or better yet, every time I make a git commit/push, but I assume that is harder to do). But I don't know how to do this or if this is even possible.
I could use .Rprofile to run R code every time I start R, so I could just update versions whenever I start R (or build in a check so that it only updates the version if the date is not today or something), but that seems like overdoing it.
You can make a separate .Rprofile for each project. You have to put it in the main directory of the project (http://www.rstudio.com/ide/docs/using/projects).
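As a rough sketch, a project-level .Rprofile along these lines would only prompt for an update when the Date field in DESCRIPTION is not today's date. The helper needs_version_bump() is hypothetical, and it assumes your DESCRIPTION carries a Date field in YYYY-MM-DD form; plug your own version-update function in where the message is.

```r
# .Rprofile in the project's main directory

# Hypothetical helper: TRUE when DESCRIPTION has no Date field
# or when that date is not today.
needs_version_bump <- function(desc_path = "DESCRIPTION", today = Sys.Date()) {
  if (!file.exists(desc_path)) {
    return(FALSE)                       # not a package directory, nothing to do
  }
  desc <- read.dcf(desc_path)           # character matrix, one column per field
  if (!"Date" %in% colnames(desc)) {
    return(TRUE)
  }
  !isTRUE(unname(desc[1, "Date"]) == format(today))
}

# Only act in interactive sessions, so Rscript runs are left alone
if (interactive() && needs_version_bump()) {
  message("DESCRIPTION date is not today: call your version-update function here.")
}
```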
Well, I would use .Rprofile for that. There is something to be said for being independent of the tool chain around you: knitr works from RStudio as well as without it, ditto for Rcpp/RInside etc.
You can hook into commit hooks for svn, either explicitly via hooks in the back end, or simply at your end by adding wrapper scripts. I presume you can do likewise with git, but I simply know much less about it. So to abstract this away, I would write myself a 'commitThis' or 'pushThis' or ... function that does the number increment, test run, code push and what have you.
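A wrapper of that kind might look roughly like the sketch below, written for git. Everything here is an assumption for illustration: the name commit_this(), the bump_version argument (a stand-in for the version-update function from the question) and the dry_run switch, which defaults to only printing the git commands.

```r
# Hypothetical wrapper: bump the version, then add, commit and push.
commit_this <- function(msg,
                        bump_version = function() invisible(NULL),
                        dry_run = TRUE) {
  commands <- list(
    c("add", "--all"),
    c("commit", "-m", shQuote(msg)),
    c("push")
  )

  bump_version()                        # the number increment happens first

  for (args in commands) {
    if (dry_run) {
      message("git ", paste(args, collapse = " "))
    } else {
      status <- system2("git", args)    # exit status of the git call
      if (status != 0) {
        stop("git ", args[1], " failed with status ", status)
      }
    }
  }
  invisible(commands)
}

# Prints the three git commands without running them
commit_this("Update analysis")
```

A test run would slot in between bump_version() and the loop; call it with dry_run = FALSE once you trust the sequence.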
If your code needs RStudio to be already running (e.g. because it's relying on some rstudioapi:: function), putting it directly in .Rprofile won't work (.Rprofile is executed before RStudio is available).
Instead, you could set a hook for "rstudio.sessionInit":
setHook(
  hookName = "rstudio.sessionInit",
  # setHook() takes the function to run as `value`; `action` says how to add it
  value = function(newSession) {
    if (newSession) {
      # your code goes here
    }
  },
  action = "append"
)