I'm trying to plot graphs using plotly on EMR Jupyterhub Notebook however the graphs are not being rendered in Pyspark kernel. (Note: Python kernel renders the graph just fine)
Sample code I am trying:
data_canada = px.data.gapminder().query("country == 'Canada'")
fig = px.bar(data_canada, x='year', y='pop')
fig.show()
I am able to plot a graph with %%display sparkmagic however I am not able to figure out if we can get plotly working with %%display sparkmagic -
import random
data = [('Person:%s' % i, i, random.randint(1, 5)) for i in range(1, 50)]
columns = ['Name', 'Age', 'Random']
spark_df = spark.createDataFrame(data, columns)
%%display
spark_df
Has anyone tried this successfully? Please advise.
This is the limitation of sparkmagic. You would have to resort to %%local magic. From sparkmagic docs.
Since all code is run on a remote driver through Livy, all structured data must
be serialized to JSON and parsed by the Sparkmagic library so that it can be
manipulated and visualized on the client side. In practice this means that you
must use Python for client-side data manipulation in %%local mode.
I have a Juypter Notebook cell with some code (not mine) that generates a Bokeh plot. Last command in the cell is a Bokeh show() command. What I want is for the plot to appear in the output cell below, and then to be able to continue running subsequent cells in the notebook.
What happens instead is that when I run the cell the plot appears below, but the cell with the code continues to run (asterisk in the left-hand brackets) and never stops. So I can't continue to run any other cells unless I click the stop button (which leaves the plot intact in the output cell, but follows it with many lines of error trace-back messages).
My understanding is that executing the output_notebook() function should result in show() giving the behavior I want, as described here:
https://docs.bokeh.org/en/latest/docs/user_guide/notebook.html
In their screenshot, the cell with the show() command has clearly finished running (no asterisk).
I'm not sure if it makes any difference, but in my case I am running the notebook on a remote server and using port-forwarding to view it on my laptop computer browser.
Does anyone know how I can get the show() command to finish running?
For reference, this is the code:
# Use the PCA projection
dset = proj_pca
# Select the dimensions in the projection
dim1 = 0
dim2 = 1
p = figure(plot_width=945, plot_height=560)
p.title.text = 'PCA projection: PC' + str(dim1+1) + ' vs PC' + str(dim2+1)
for cont in continents:
for pop in pop_by_continent[cont]:
projections_within_population = dset[indices_of_population_members[pop]]
p.circle(projections_within_population[:,dim1], projections_within_population[:,dim2],
legend=name_by_code[pop], color = color_dict[pop])
#p.legend.visible = False
p.legend.location = "top_center"
p.legend.click_policy="hide"
output_file("interactive_pca.html", title="PCA projection")
output_notebook()
#save(p)
show(p)
I'm trying to execute the below code that display a plot:
using Plots, Measures
pyplot()
data = [rand(100), rand(100)];
histogram(data, layout = 2,
title = ["Dataset A" "Dataset B"], legend = false,
ylabel = "ylabel", margin = 5mm)
I do not get any output once I execute it from the command line!!
Knowing that I tried it in 3 different ways and it works (Jupyter, Juni, Julia session), though I'm confused about the way it works in the Juno, kinldy see my observations below:
With Jupyter it is functioning perfectly:
With Juno, I've to run the file TWICE!! the first time looks to be for compilation, second time for execution, maybe!!
And if I closed the plot, I've to close the Julia session, restart it, then re-execute the file TWICE!! and sometimes nothing is appearing!!
With Julia session, it take time for first time execution, then if I closed the plot and run it again, it is appearing smoothly.
The commands like histogram or plot typically don't display the plots for users, they only generate and return the plots.
What displays the plots is actually the display system in Julia.
When Julia is in interactive use, like in REPL, Jupyter and Juno, the display system will be invoked automatically with commands not ending with ";". That's why you see plots displayed in REPL, Jupyter and Juno.
But when executing the file from the command line, the display system is not automatically activated. So you first have to invoke display yourself like this:
using Plots, Measures
pyplot()
data = [rand(100), rand(100)];
h = histogram(data, layout = 2,
title = ["Dataset A" "Dataset B"], legend = false,
ylabel = "ylabel", margin = 5mm)
display(h)
But even this will not give you the picture, but only text representation of the plot. This is because in command line julia, only a very simple text display system is in place, and it doesn't have "full" support for plots from Plots. To display the plots, you have to write your own display mechanism and push it to Julia display system, which is not hard but a little tedious. I will give an example when I have more time.
BTW, if you just want plots generated from command line, another way is to save it to files, which is more direct than making a display mechanism yourself.
Update
Here is a simple display, which mimics the display system used in Julia REPL. The code here is for Julia 0.7/1.0.
const output = IOBuffer()
using REPL
const out_terminal = REPL.Terminals.TerminalBuffer(output)
const basic_repl = REPL.BasicREPL(out_terminal)
const basic_display = REPL.REPLDisplay(basic_repl)
Base.pushdisplay(basic_display)
Using it before the previous code will show the plot. Please note that you use pyplot() backend for Plots, whose default is to open a new window and display the plot in the window, but when the julia finish execution in the command line, it will close the plot window. To deal with this, we could either change ways for the default display, or use another backend to show the plot, for example, plotly() backend will display the plot through html. The complete code may look like following:
const output = IOBuffer()
using REPL
const out_terminal = REPL.Terminals.TerminalBuffer(output)
const basic_repl = REPL.BasicREPL(out_terminal)
const basic_display = REPL.REPLDisplay(basic_repl)
Base.pushdisplay(basic_display)
using Plots, Measures
plotly()
data = [rand(100), rand(100)];
h = histogram(data, layout = 2,
title = ["Dataset A" "Dataset B"], legend = false,
ylabel = "ylabel", margin = 5mm)
display(h)
Use readline() to get such destination that close the plot window by typing enter.
using GR
y = rand(20,1)
p = plot(y,linewidth=2,title="My Plot")
readline()
I am trying to create a plot and eventually save it as a file. But because I am making a lot of changes and want to test it out, I want to be able to view and save the plot at the same time. I have looked at this page to do what I want to do but in my system, it does not seem to be working as it is supposed to.
Here are my codes:
png('Save.png')
sample.df <- data.frame(group = c('A','B','A','C','B','A','A','C','B','C','C','C','B'),
X = c(2,11,3,4,1,6,3,7,5,9,10,2,8),
Y = c(3,8,5,2,7,9,3,6,6,1,3,4,10))
plot(Y ~ X, data = sample.df)
dev.copy(png, 'Save.png')
dev.off()
There are several issues (I am new to R so I might be missing something entirely):
(1) When I use png(), I cannot view the plot in RStudio so I used dev.copy() but it does not allow me to view my plot in R studio
(2) Even after I use dev.off(), I cannot view the saved file until I close the RStudio (says "Windows Photo Viewer can't open this picture because the picture is being edited in another program"). I need to restart every time so it is very inconvenient.
What am I doing wrong and how could I view and view saved file without restarting RStudio every time? Thank you in advance!
Addition
Based on Love Tätting's comments, when I run dev.list(), this is what I get.
> png('Save.png')
>
> sample.df <- data.frame(group = c('A','B','A','C','B','A','A','C','B','C','C','C','B'),
+ X = c(2,11,3,4,1,6,3,7,5,9,10,2,8),
+ Y = c(3,8,5,2,7,9,3,6,6,1,3,4,10))
>
> plot(Y ~ X, data = sample.df)
>
> dev.copy(png, 'Save.png')
png
3
> dev.off()
png
2
> dev.list()
png
2
> dev.off()
null device
1
> dev.list()
NULL
Why do I not get RStudioGD?
RStudio has its own device, "RStudioGD". You can see it with dev.list(), where it by default is the first and only one.
R's design for decoupling rendering and backend is by the abstraction of devices. Which ones you can use is platform and environment dependent. dev.list() shows the stack of current devices.
If I understand your problem correctly you want to display the graph first in RStudio, and then decide whether you want to save it or not. Depending on how often you save th image you could use the 'export' button in the plot pane in RStudio and save it manually.
Otherwise, your choice of trying to copy it would be the obvious one for me as well.
To my knowledge the device abstraction in R does not allow one to encapsulate the device as an object, so one for example could make it an argument to a function that does the actual plot. Since dev.set() takes an index as argument, passing the index as argument will be dependent on state of the stack of devices.
I have not come up with a clean solution to this myself and have sometimes retorted to bracketing the plot rendering code with a call to a certain device and saving it right after, and switching device depending on a global.
So, if you can, use RStudios export functionality, otherwise an abstraction would need to maintain the state of the global stack of devices and do extensive testing of its state as it is global and you cannot direct a plot call to a certain device, it simply plots to the current device (to my knowledge).
Edit after OP comment
It seems that it is somewhat different behaviour you are experiencing if you cannot watch the file after dev.off, but also need to quit RStudio. For some type of plot frameworks there is a need to call print on the graphical object to have it actually print to the file. Perhaps this is done by RStudio at shutdown as part of normal teardown procedures of open devices? In that ase the file should be empty if you forcibly look in its contents before quiting RStudio.
The other thing that sometimes work is to call dev.off twice. I don't know exactly why, but sometimes more devices get created than I have anticipated. After you have done dev.off, what does dev.list show?
Edit after OP's edit
I can see that you do, png(); dev.copy(); dev.off(). This will leave you with one more device opened than closed. You will still have the first graphics device that you started open as can be seen when you do the listing. You can simply remove dev.copy(). The image will be saved on dev.off() and should be able to open from the filesystem.
As to why you cannot see the RStudio graphics device, I am not entirely sure. It might be that other code is messing with your device stack. I would check in a clean session if it is there to make sure other code isn't tampering with the device stack. From RStudio forums and other SO questions there seem to have been plot pane related problems in RStudio that have resolved after updating RStudio to the latest. If that is a viable solution for you I would try that.
I've just added support for RStudio's RStudioGD device to the developer's version of R.devices package (I'm the author). This will allow you to do the following in RStudio:
library("R.devices")
sample.df <- data.frame(
group = c('A','B','A','C','B','A','A','C','B','C','C','C','B'),
X = c(2,11,3,4,1,6,3,7,5,9,10,2,8),
Y = c(3,8,5,2,7,9,3,6,6,1,3,4,10)
)
figs <- devEval(c("RStudioGD", "png"), name = "foo", {
plot(Y ~ X, data = sample.df)
})
You can specify any set of output target types, e.g. c("RStudioGD", "png", "pdf", "x11"). The devices that output to file will by default write the files in folder figures/ with filenames as <name>.<ext>, e.g. figures/foo.png in the above example.
The value of the call, figs, holds references to all figures produced, e.g. figs$png. You can open them directly from R using the operator !. For example:
> figs$png
[1] "figures/foo.png"
> !figs$png
[1] "figures/foo.png"
The latter call should show the PNG file using your system's PNG viewer.
Until I submit these updates to CRAN, you can install the developer's version (2.15.1.9000) as:
remotes::install_github("HenrikBengtsson/R.devices#develop")
I do a lot of data exploration in R and I would like to keep every plot I generate (from the interactive R console). I am thinking of a directory where everything I plot is automatically saved as a time-stamped PDF. I also do not want this to interfere with the normal display of plots.
Is there something that I can add to my ~/.Rprofile that will do this?
The general idea is to write a script generating the plot in order to regenerate it. The ESS documentation (in a README) says it well under 'Philosophies for using ESS':
The source code is real. The objects are realizations of the
source code. Source for EVERY user modified object is placed in a
particular directory or directories, for later editing and
retrieval.
With any editor allows stepwise (or regionwise) execution of commands you can keep track of your work this way.
The best approach is to use a script file (or sweave or knitr file) so that you can just recreate all the graphs when you need them (into a pdf file or other).
But here is the start of an approach that does the basics of what you asked:
savegraphs <- local({i <- 1;
function(){
if(dev.cur()>1){
filename <- sprintf('graphs/SavedPlot%03d.pdf', i)
dev.copy2pdf( file=filename )
i <<- i + 1
}
}
})
setHook('before.plot.new', savegraphs )
setHook('before.grid.newpage', savegraphs )
Now just before you create a new graph the current one will be saved into the graphs folder of the current working folder (make sure that it exists). This means that if you add to a plot (lines, points, abline, etc.) then the annotations will be included. However you will need to run plot.new in order for the last plot to be saved (and if you close the current graphics device without running another plot.new then that last plot will not be saved).
This version will overwrite plots saved from a previous R session in the same working directory. It will also fail if you use something other than base or grid graphics (and maybe even with some complicated plots then). I would not be surprised if there are some extra plots on occasion that show up (when internally a plot is created to get some parameters, then immediatly replaced with the one of interest). There are probably other things that I have overlooked as well, but this might get you started.
you could write your own wrapper functions for your commonly used plot functions. This wrapper function would call both the on-screen display and a timestamped pdf version. You could source() this function in your ~/.Rprofile so that it's available every time you run R.
For latice's xyplot, using the windows device for the on-screen display:
library(lattice)
my.xyplot <- function(...){
dir.create(file.path("~","RPlots"))
my.chart <- xyplot(...)
trellis.device(device="windows",height = 8, width = 8)
print(my.chart)
trellis.device(device = "pdf",
file = file.path("~", "RPlots",
paste("xyplot",format(Sys.time(),"_%Y%m%d_%H-%M-%S"),
".pdf", sep = "")),
paper = "letter", width = 8, height = 8)
print(my.chart)
dev.off()
}
my.data <- data.frame(x=-100:100)
my.data$y <- my.data$x^2
my.xyplot(y~x,data=my.data)
As others have said, you should probably get in the habit of working from an R script, rather than working exclusively from the interactive terminal. If you save your scripts, everything is reproducible and modifiable in the future. Nonetheless, a "log of plots" is an interesting idea.