I'm not sure whether this is a question for Cross Validated or Stack Overflow, so I apologize if it falls within the CV domain.
Problem
I know it's possible to microbenchmark specific chunks of R code, but is there any benchmarking tool for an entire Jupyter notebook? I could just run the entire notebook and time it manually, but I'd like more statistics and precision on the timing, of the kind the microbenchmark package provides (I'm trying to make a case for automating data analyses and visualizations).
The other dilemma (an overall issue with notebooks) is that my notebook is divided into many individual cells, so benchmarking within the Jupyter environment might be inefficient, forcing me to export all the code and then run a microbenchmark on it in, say, RStudio.
Desired Solution
An efficient way to benchmark an entire, multi-celled JupyterLab Notebook.
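One possible approach (a sketch, assuming the notebook can be executed headlessly; analysis.ipynb is a placeholder name) is to treat the notebook as a black box and let microbenchmark time repeated nbconvert runs from R. Note this measures whole-notebook wall time, including kernel start-up, rather than per-cell cost:
require('microbenchmark')

run_notebook <- function(path) {
  # Execute the notebook headlessly, writing the executed copy to a separate file.
  system2("jupyter",
          c("nbconvert", "--to", "notebook", "--execute",
            "--output", "executed.ipynb", path))
}

# Repeated timed runs, with microbenchmark's usual summary statistics.
microbenchmark(run_notebook("analysis.ipynb"), times = 10)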
Related
What is the difference between Pluto.jl and Jupyter Notebooks?
How do I decide which I should prefer for teaching students?
Is there a performance difference?
I only found https://www.reddit.com/r/Julia/comments/kxdjzh/pluto_vs_jupyter_notebook/, which does not include many details.
From the Pluto.jl GitHub page:
A Pluto notebook is made up of small blocks of Julia code (cells) and together they form a reactive notebook. When you change a variable, Pluto automatically re-runs the cells that refer to it. ...
The main difference between Pluto.jl and Jupyter notebooks with Julia is that Pluto.jl is reactive: when you change a cell, Pluto automatically re-runs every cell that depends on it, whereas the outputs of a Jupyter notebook only change when the code blocks that created them are re-executed.
I find the largest advantage of Pluto to be that, by design, notebooks always produce the same results. In Jupyter it is quite easy to create results that depend on cell-execution order, which is hard to do in Pluto.
In Jupyter you can choose to execute cells one by one, which can be beneficial if you have large calculations running. This cannot be done in Pluto, but one learns to take it into consideration when writing notebooks.
Support for slides is very good in Jupyter. For Pluto, only proofs of concept exist as far as I am aware.
Both are performant.
For teaching students I personally prefer Pluto. Some of the restrictions it imposes make these notebooks much simpler to debug (results do not depend on cell-execution order). Also, a Pluto notebook is basically a standard Julia source file, which can be manipulated easily in any text editor. Pluto's reactivity is also great in an educational setting, as it encourages students to play around with the notebooks. Lastly, there is the MIT course Introduction to Computational Thinking, which uses these notebooks for lectures and exercises; they are a great inspiration for how to use Pluto notebooks for teaching. I hope these insights are what you were looking for.
I have installed IRkernel following the standard instructions. When trying to work on a large dataset (9M rows * 70 cols), it takes forever to run a simple command such as print(db[1, 1]) or head(db) in a Jupyter notebook with IRkernel.
I could run these commands in seconds in RStudio.
Is there anything I could improve in order to work on this dataset efficiently through a Jupyter notebook? What could be the potential problem?
Thanks a lot.
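One possible culprit (an assumption, not a confirmed diagnosis): IRkernel routes results through the repr package for rich display, and formatting a very large object for output can dominate the runtime even when the computation itself is cheap. Capping how much repr renders is a quick experiment:
# Assumes the slowdown comes from rich display, not from the computation itself.
options(repr.matrix.max.rows = 10, repr.matrix.max.cols = 20)

head(db)  # only a truncated view is now formatted for output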
I'm using R (3.x) within Jupyter to perform some statistical analysis of a clinical study.
The flow of subjects is pretty complex, and I would like to draw the patient flow chart directly in my notebook, like this one:
Why?
To make sure the subject counts are consistent with the code, and to avoid having to keep the R code and this figure in sync manually.
I've tried DiagrammeR (won't install for some reason) and Rgraphviz (too complex, and it doesn't meet the requirements).
A perfect solution might come from mermaid, but I couldn't find a way to integrate it with my Jupyter/R notebooks.
I'm not using a Python Jupyter notebook but an R notebook.
You have mentioned that DiagrammeR won't install, but I will still provide a solution using DiagrammeR anyway, in case you have solved the installation issue. Suppose you have made the graph using DiagrammeR and obtained a grViz object named myGraph. To display it in a Jupyter notebook, I use the following:
require('DiagrammeR')
require('DiagrammeRsvg')  # exports grViz objects to SVG
require('IRdisplay')

mySVG <- export_svg(myGraph)   # serialize the graph to an SVG string
display_svg(data = mySVG)      # render the SVG inline in the notebook
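For completeness, here is a minimal self-contained sketch of building such a grViz object with the packages loaded above; the stages and patient counts are placeholders, not figures from the question:
# Placeholder patient-flow graph in Graphviz DOT syntax.
myGraph <- grViz("
  digraph flow {
    node [shape = box]
    screened   [label = \"Screened (n = 120)\"]
    randomized [label = \"Randomized (n = 100)\"]
    treatment  [label = \"Treatment (n = 50)\"]
    control    [label = \"Control (n = 50)\"]
    screened -> randomized
    randomized -> { treatment control }
  }
")

mySVG <- export_svg(myGraph)
display_svg(data = mySVG)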
I am running a CART decision tree on a training set which I've tokenized using quanteda for a routine text-analysis task. The DFM resulting from tokenization was turned into a data frame, with the class attribute I am predicting appended.
Like many DFMs, the table is very wide (33k columns), but only contains about 5,500 rows of documents. Calling rpart on my training set returns a stack overflow error.
If it matters: to help speed up the calculations, I am using the doSNOW package so I can run the model on 3 of my 4 cores in parallel.
I've looked at this answer but can't figure out how to do the equivalent on my Mac workstation to see if the same solution would work for me. There is a chance that even if I increase the ppsize of RStudio, I may still run into this error.
So my question is: how do I increase the max-ppsize of RStudio on a Mac, or, more generally, how can I fix this stack overflow so I can run my model?
Thanks!
In the end, I found that Macs don't have this same command-line option, since the Mac version of RStudio uses all available memory by default.
So the way I fixed this was by decreasing the complexity of the task, i.e. by reducing the sparsity. I cleaned the document-term matrix by removing all tokens that did not occur in at least 5% of the corpus. This was enough to take a matrix with 33k columns down to a much more manageable 3k columns, while still yielding a highly representative DFM.
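That trimming step can be expressed with quanteda's dfm_trim(); in this sketch, my_dfm stands in for the original document-feature matrix, and the 5% threshold is applied as a minimum proportional document frequency:
require('quanteda')

# Keep only features that appear in at least 5% of documents.
dfm_small <- dfm_trim(my_dfm, min_docfreq = 0.05, docfreq_type = "prop")

dim(dfm_small)  # expect far fewer columns than the original 33k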
I would like to run some Matlab scripts. However, we don't have a Matlab licence, so a conversion from Matlab to the R language is necessary. Unfortunately, I'm totally new to Matlab but not to R. Is it possible to read Matlab scripts using R, or is there an easy way to translate Matlab scripts into R?
Rewriting from one language to another can be a painstaking process, especially because you have to take great care that the outcomes of both sets of code are the same. I see roughly four approaches:
Digest the goal of the scripts, put aside the Matlab code, and rewrite in R
Try to mimic the Matlab code in R
Run the Matlab code in Octave, and interface with R (see the sketch below)
Run the code in Octave entirely
These are roughly in order of the amount of work involved. If you just want to get the Matlab code working, definitely use Octave, which should run the code with minimal changes. If you want to convert the code to R and continue developing in R, I would go for the first option. That way you can leverage the real strengths of R, as R is quite different (link with info, comparison of R and Matlab). But it does take the largest amount of time. Even if you reimplement in R, I would recommend getting the code running in Octave first, so you can check whether your R results match the Matlab code.
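As promised above, a minimal sketch of the third option, calling Octave headlessly from R; myscript.m and result.csv are placeholder names, and the script is assumed to write its output as a CSV file:
# Run the Matlab script under Octave (assumes Octave is on the PATH).
system2("octave", c("--no-gui", "--quiet", "myscript.m"))

# Read the script's output back into R for further analysis.
result <- read.csv("result.csv")
head(result)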