I'm very new to Julia and was trying to save my session (all the values, including functions, for example) and didn't see any easy way. There seems to be a fairly complete low-level write function for ints, floats, arrays, etc., but it doesn't, for example, write a DataFrame. Is there an easy way to do this, or do I need to code all this from scratch? I'm using v0.2.1.
Have you tried using the IJulia notebook? It might be useful for what you're describing. https://github.com/JuliaLang/IJulia.jl
You can do this with HDF5.jl. I don't know how well it works for functions, but it should work fine for data frames and any other native Julia type.
For functions you want to keep, I would probably just define them in a regular .jl file and call include("def.jl") at the start of the session.
Check out the Julia Data Format: https://github.com/JuliaIO/JLD.jl
It can save both built-in Julia types and types you have created yourself, and it has macros to save your entire workspace at once.
I think this can be done with the Julia Data format (JLD).
https://github.com/JuliaIO/JLD.jl
If you have your own data format, e.g. a type Model:
type Model
    version::String
    id::String
    equations::Vector{Equation}
    coefs::Vector{Matrix}
end
You can save it with:
using JLD
save("MODEL.jld", "modelS", model1)
and read it back with:
pathReport = joinpath(homedir(),".julia/v0.5/foo/test")
m = JLD.load(joinpath(pathReport, "MODEL.jld"))
model2 = m["modelS"]
model2.equations[1].terms[2] == "EX_01"
I am developing a package in R (3.4.3) that has some internal data and I would like to be able to let users refresh it without having to do too much work. But let me be more specific.
The package contains several functions that rely on multiple parameters, but I am really only exporting wrapper functions that do all the work. Because there is a large number of parameters to pass to those wrapper functions, I am putting them in a .csv file and passing them as a list. The data is therefore a simple .csv file called param.csv, with the names of the parameters in one column and their values in the next:
parameter_name,parameter_value
ticker, SPX Index
field, PX_LAST
rolling_window,252
upper_threshold,1
lower_threshold,-1
signal_positive,1
signal_neutral,0
signal_negative,-1
I process it by running:
param.df <- read.csv(file = "param.csv", header = TRUE, sep = ",")
param.list <- split(param.df[, 2], param.df[, 1])
and I put the list inside the package like this
usethis::use_data(param.list, overwrite = TRUE)
Then, when I reset the R interactive window and execute:
devtools::document()
devtools::load_all()
devtools::install()
require(pkg)
everything works out great, the data is available and can be passed to functions.
First problem: when I change param.csv, save it, and repeat the above four lines of code, the internal param.list is not updated. Is there something I am missing here? Or is it by nature that the developer of the package should run usethis::use_data(param.list, overwrite = TRUE) each time they change the .csv file the data comes from?
Because it is intended to be a sort of map, users will want to tweak parameters for (manual) model calibration. To try to solve this issue, I let users provide the functions with their own parameter list (as a .csv file, like before). I have a function get_param_list("path/to/file.csv") that does exactly the same processing as above and returns the list, so users can pass their own parameter list. De facto, the internal data param.list is treated as a default set of parameters.
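For reference, get_param_list() is essentially the same two lines of processing wrapped in a function:

get_param_list <- function(path) {
  # read the two-column parameter file and split the values by name
  param.df <- read.csv(file = path, header = TRUE, sep = ",")
  split(param.df[, 2], param.df[, 1])
}

my.params <- get_param_list("path/to/file.csv")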
Second Problem: I'd like to let users modify this default list of parameters param.list from outside the package in a safe way.
So, since the internal object is a list, users can simply modify the element of their choice in the list from outside the package. But that is pretty dangerous to my mind, because users could
- forget parameters (I have default values inside the functions for that case)
- replace a parameter with a value of another type, which can provoke an error or, worse, side effects.
Is there a way to let users modify the internal list of parameters without opening the package? For example, I had thought about a function that could replace the .csv file with a new one provided by the user, but this brings me back to the first problem: when I restart the R prompt and reinstall the package, nothing happens unless I run usethis::use_data(param.list, overwrite = TRUE) from inside the package. Would another type of data be more useful (e.g. making the data internal)?
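For illustration, this is roughly the kind of guarded setter I have in mind (purely hypothetical; nothing like it exists in the package yet):

set_params <- function(overrides, defaults = param.list) {
  # reject parameter names that are not part of the defaults
  unknown <- setdiff(names(overrides), names(defaults))
  if (length(unknown) > 0)
    stop("Unknown parameters: ", paste(unknown, collapse = ", "))
  # reject replacements that change a parameter's type
  for (nm in names(overrides)) {
    if (!identical(class(overrides[[nm]]), class(defaults[[nm]])))
      stop("Parameter '", nm, "' must be of class ", class(defaults[[nm]]))
  }
  # merge the overrides into the defaults
  modifyList(defaults, overrides)
}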
This is my first question on this website but I realize it's quite long and might be taken as a coding style question. But if anyone can see an obvious mistake or misunderstanding of package development in R from my side that would really help me.
Cheers
I am in the painful process of transitioning from MATLAB to R, and still coming to terms with not having a neatly arranged MathWorks website to consult.
When I write MATLAB functions, they are stored on a local drive and can be accessed from my source code (as long as the function is in the active directory).
When I write a function in R, I need to "run" it so that it is stored in the global environment before I can use it. Surely there is a 'nicer' way of doing this, as I will need to refer to many, many functions. Can I seemingly "hide" them so I don't have to see them, but always know they exist?
Thanks in advance
Use source('F:\\RWorkingDirectory\\my_functions.r'), or create your own R package, which is very easy to do with RStudio.
Thanks for the suggestions. I have decided to set up an environment instead.
E.g.,
Set up an R script containing my desired function(s), called MainFunctions.R
Add to .Rprofile:
e <- new.env()
source("MainFunctions.R",local=e)
attach(e)
Now I simply need to edit this file, without worrying about loading the functions or creating a package.
My aim is to better organize the work done by my R code.
In particular, it could be useful to split the R code I have written into different R files, perhaps with each R file accomplishing a different task. I have in mind what we can do in MATLAB with different M-files, where we can easily call functions written in different M-files directly from the main code.
Is it useful to write these R files in the form of functions?
How can we call these R files /functions in the main code?
Thanks
You can use source("filename.R") to include the file in your main script.
I am not sure whether there is a ready-made function to include an entire directory, but it is straightforward to write one using list.files() and then calling source() dynamically for each filename. You can also filter the files, for example to include only *.R files; a sketch follows.
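A minimal sketch of such a helper (the directory name "R/" is just a placeholder):

# source every .R file found in a directory
source_dir <- function(path) {
  files <- list.files(path, pattern = "\\.[Rr]$", full.names = TRUE)
  for (f in files) source(f)
}

source_dir("R/")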
Unless you intend to write an R package, you should rethink your organization. R is not MATLAB, thank goodness! You can place as many functions as you wish into a single file and make them all available in your environment with source('foo.r'). If you are writing a collection of generic functions and don't want to build a package, this really is the cleaner way to go.
As a side thought, consider making some of your tools more flexible by adding more input arguments. You may find that you don't really need so many separate functions/files. As a trivial example, if you have some function do_it_double, another do_it_integer, and yet another do_it_character, all of which do basically the same thing, just merge them into a single do_it_all(x, y, datatype = 'double') and override the default datatype as desired. (I know this can be done with internal input validation; I'm just giving an example.) A sketch of such a merged function follows.
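To make that concrete, here is one possible shape of such a merged function (do_it_all and datatype come from the example above; the body is just an illustration):

do_it_all <- function(x, y, datatype = "double") {
  # convert the input to the requested type, then run the shared logic
  converted <- switch(datatype,
                      double    = as.double(x),
                      integer   = as.integer(x),
                      character = as.character(x))
  paste(converted, y)  # stand-in for the real shared computation
}

do_it_all(1:3, "units")                        # default: double
do_it_all(1:3, "units", datatype = "integer")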
Your approach should work well. I would recommend wrapping the code in functions, with one R file per function.
It might be worth looking at the packages devtools and ProjectTemplate, which aim to help with organizing R code.
A very simple question:
I am writing and running my R scripts using a text editor to make them reproducible, as has been suggested by several members of SO.
This approach is working very well for me, but I sometimes have to perform expensive operations (e.g. read.csv or reshape on 2M-row databases) that I'd better cache in the R environment rather than re-run every time I run the script (which is usually many times as I progress and test the new lines of code).
Is there a way to cache what a script does up to a certain point so every time I am only running the incremental lines of code (just as I would do by running R interactively)?
Thanks.
## load the file from disk only if it
## hasn't already been read into a variable
if (!exists("mytable")) {
  mytable <- read.csv(...)
}
Edit: fixed typo - thanks Dirk.
Some simple ways are doable with some combinations of
exists("foo") to test if a variable exists, else re-load or re-compute
file.info("foo.Rd")$ctime which you can compare to Sys.time() and see if it is newer than a given amount of time you can load, else recompute.
There are also caching packages on CRAN that may be useful.
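A minimal sketch combining both checks (the cache file name cache.rds and the function expensive_computation() are placeholders):

max_age <- 60 * 60 * 24  # accept caches up to one day old, in seconds
if (!exists("result")) {
  if (file.exists("cache.rds") &&
      difftime(Sys.time(), file.info("cache.rds")$ctime, units = "secs") < max_age) {
    result <- readRDS("cache.rds")     # cache is fresh enough: load it
  } else {
    result <- expensive_computation()  # stale or missing: recompute
    saveRDS(result, "cache.rds")       # and refresh the cache
  }
}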
After you do something you discover to be costly, save the results of that costly step in an R data file.
For example, if you loaded a csv into a data frame called myVeryLargeDataFrame and then created summary stats from that data frame into a df called VLDFSummary then you could do this:
save(myVeryLargeDataFrame, VLDFSummary,
     file = "~/myProject/cachedData/VLDF.RData",
     compress = "bzip2")
The compress argument is optional; use it if you want to compress the file being written to disk. See ?save for more details.
After you save the RData file you can comment out the slow data loading and summary steps as well as the save step and simply load the data like this:
load("~/myProject/cachedData/VLDF.RData")
This answer is not editor dependent. It works the same for Emacs, TextMate, etc. You can save to any location on your computer. I recommend keeping the slow code in your R script file, however, so you can always know where your RData file came from and be able to recreate it from the source data if needed.
(Belated answer, but I began using SO a year after this question was posted.)
This is the basic idea behind memoization (or memoisation). I've got a long list of suggestions, especially the memoise and R.cache packages, in this query.
You could also take advantage of checkpointing, which is also addressed as part of that same list.
I think your use case mirrors my second: "memoization of monstrous calculations". :)
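To give a flavor of the idea, here is a minimal sketch using the memoise package mentioned above (slow_summary is a made-up stand-in for an expensive step):

library(memoise)

slow_summary <- function(path) {
  Sys.sleep(5)  # pretend this is the slow load-and-reshape work
  summary(read.csv(path))
}

fast_summary <- memoise(slow_summary)

fast_summary("data.csv")  # slow the first time
fast_summary("data.csv")  # returned instantly from the cache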
Another trick I use is memory-mapped files, which I rely on a lot to store data. The nice thing about them is that multiple R instances can access the same shared data, so I can have many instances cracking at the same problem.
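The answer does not name a package, but one common way to get shareable, file-backed data in R is the bigmemory package (my assumption, offered as a sketch):

library(bigmemory)

# create a file-backed matrix that other R processes can attach to
x <- filebacked.big.matrix(nrow = 1000, ncol = 10,
                           backingfile = "shared.bin",
                           descriptorfile = "shared.desc")
x[1, 1] <- 42

# in a second R instance, attach the same data via the descriptor file
y <- attach.big.matrix("shared.desc")
y[1, 1]  # 42, read through the shared memory-mapped file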
I want to do this too when I'm using Sweave. I'd suggest putting all of your expensive functions (loading and reshaping data) at the beginning of your code. Run that code, then save the workspace. Then, comment out the expensive functions, and load the workspace file with load(). This is, of course, riskier if you make unwanted changes to the workspace file, but in that event, you still have the code in comments if you want to start over from scratch.
Without going into too much detail, I usually follow one of three approaches:
Use assign to give a unique name to each important object throughout my execution, then include an if (exists(...)) get(...) at the top of each function to fetch the value or else recompute it (same as Dirk's suggestion; a sketch follows this list).
Use cacheSweave with my Sweave documents. This does all the work for you of caching computations and retrieves them automatically. It's really trivial to use: just use the cacheSweave driver and add this flag to each block: <<..., cache=true>>=
Use save and load to save the environment at crucial moments, again making sure that all names are unique.
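A minimal sketch of the first approach (get_or_compute and cached_result are hypothetical names):

get_or_compute <- function(name, compute) {
  # reuse the object if it already exists under its unique name,
  # otherwise compute it once and assign it for later runs
  if (exists(name, envir = globalenv())) {
    get(name, envir = globalenv())
  } else {
    value <- compute()
    assign(name, value, envir = globalenv())
    value
  }
}

res <- get_or_compute("cached_result", function() {
  Sys.sleep(5)  # stand-in for an expensive step
  1:10
})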
The 'mustashe' package is great for this kind of problem. In addition to caching results, it can also record dependencies, so that the code is re-run if the dependencies change.
Disclosure: I wrote this tool ('mustashe'), though I do not make any financial gains from others using it. I made it for this exact purpose for my own work and want to share it with others.
Below is a simple example. The foo variable is created and "stashed" for later. If the same code is re-run, the foo variable is loaded from disk and added to the global environment.
library(mustashe)
stash("foo", {
foo <- some_long_running_opperation(1e3)
}
#> Stashing object.
The documentation has additional examples of more complex use-cases and a detailed explanation of how it works under the hood.
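For example, the dependency tracking mentioned above goes through the depends_on argument of stash(); if the dependency's value changes, the code is re-run (input_data and expensive_transform are hypothetical names):

library(mustashe)

input_data <- 1:100

stash("foo", depends_on = "input_data", {
  foo <- expensive_transform(input_data)
})
# if input_data changes, re-running the stash() call recomputes foo;
# otherwise foo is loaded from the on-disk stash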
Let's say I have an R source file comprised of some functions, doesn't matter what they are, e.g.,
fnx = function(x){(x - mean(x))/sd(x)}
I would like to be able to access them in my current R session (without typing them in, obviously). It would be nice if library("/path/to/file/my_fn_lib1.r") worked, as import works in Python, but it doesn't. One obvious solution is to create an R package, but I want to avoid that overhead just to import a few functions.
Use the source() command. In your case:
source("/path/to/file/my_fn_lib1.r")
Incidentally, creating a package is fairly easy with the package.skeleton() function (if you're planning to reuse this frequently).
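If you do go the package route later, a minimal sketch of that path (the package name mylib is made up; fnx is the function from the question):

# load the functions into the session, then generate a package skeleton
source("/path/to/file/my_fn_lib1.r")
package.skeleton(name = "mylib", list = c("fnx"))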