I'm trying to plot graphs using plotly in an EMR JupyterHub notebook, but the graphs are not rendered in the PySpark kernel. (Note: the Python kernel renders the graph just fine.)
Sample code I am trying:
import plotly.express as px
data_canada = px.data.gapminder().query("country == 'Canada'")
fig = px.bar(data_canada, x='year', y='pop')
fig.show()
I am able to plot a graph with the %%display sparkmagic, but I can't figure out whether plotly can be made to work with it:
import random
data = [('Person:%s' % i, i, random.randint(1, 5)) for i in range(1, 50)]
columns = ['Name', 'Age', 'Random']
spark_df = spark.createDataFrame(data, columns)
%%display
spark_df
Has anyone tried this successfully? Please advise.
This is a limitation of sparkmagic. You would have to resort to the %%local magic. From the sparkmagic docs:
Since all code is run on a remote driver through Livy, all structured data must
be serialized to JSON and parsed by the Sparkmagic library so that it can be
manipulated and visualized on the client side. In practice this means that you
must use Python for client-side data manipulation in %%local mode.
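To make the quoted limitation concrete, here is a minimal sketch of the round trip it describes: rows are serialized to JSON on the "remote" side, then parsed and manipulated in plain Python on the "client" side. It does not use sparkmagic or Livy at all; the function names `remote_collect_as_json` and `client_parse` are hypothetical, and the data mirrors the toy rows from the question.

```python
import json
import random

def remote_collect_as_json(rows, columns):
    """Stand-in for the remote driver: turn rows into a JSON string."""
    records = [dict(zip(columns, row)) for row in rows]
    return json.dumps(records)

def client_parse(payload):
    """Stand-in for the client: parse JSON back into plain Python records."""
    return json.loads(payload)

columns = ["Name", "Age", "Random"]
rows = [("Person:%s" % i, i, random.randint(1, 5)) for i in range(1, 50)]

payload = remote_collect_as_json(rows, columns)  # what would cross the wire
records = client_parse(payload)                  # what local code can see

# Client-side manipulation: count how many people drew each random value.
counts = {}
for rec in records:
    counts[rec["Random"]] = counts.get(rec["Random"], 0) + 1
print(sorted(counts.items()))
```

In a real notebook, the locally parsed data is what you would hand to plotly inside a %%local cell, since that code runs on the client rather than on the remote driver.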
I'm using DeployR server 8.0.5 as an R API. I have an R script deployed on that server which uses the ggplot2 library. From what I know, ggplot will store the plot on the server only if I call
save(p, file = "plot.rdata")
or
ggsave("plot.png", width = 5, height = 5)
It looks like the print() function stores the SVG in the database. Here is an example of how I'm generating the SVG:
File_Name <- "MyPlot.svg"
myfile <- paste(File_Name)
svg(myfile,width=5, height=5, pointsize = 12)
print(My_Plot) #contains ggplot() result
dev.off()
result$plot <- myfile
return(result)
The problem is that the DeployR database is growing huge. It looks like every response from DeployR is stored in the db as byte[] in the table file_content. I have a lot of requests, so the database grows accordingly.
One possible solution is to clear the db manually from time to time, but in general I want to change the behaviour. I don't understand why the result is stored in the database. I just want to return the result; there is no need to store the data.
I found the gridSVG library, but I'm not sure how to use it in my case and I can't find a proper example. I also found one more library, svglite, but again I'm not able to use it in my case.
I am trying to create some R visuals in Power BI using the googleVis and/or plotly libraries, but no matter what I do, I can’t get Power BI to display anything. It always just says “No image was created. The R code didn’t result in creation of any visuals. Make sure your R script results in a plot to the R default device.” The issue occurs with plotly and googleVis libraries, so I think it may have something to do with the fact that they’re both browser-based outputs. Per Microsoft, plotly is supported in Power BI. I was hoping someone could tell me why I can’t get any of these example scripts to work in Power BI.
Example code which works in R, but not in Power BI:
plotly
library(plotly)
p <- plot_ly(midwest, x = ~percollege, color = ~state, type = "box")
p
googleVis
library(googleVis)
df=data.frame(country=c("US", "GB", "BR"),
              val1=c(10,13,14),
              val2=c(23,12,32))
Line <- gvisLineChart(df)
plot(Line)
Now we can create an RHTML custom visual in Power BI:
1) To use Plotly, you have to load your "midwest" data into a Power BI table.
2) Then drag it to the R script so it will be available to the R script.
3) Then run the R script; it will work.
You can render charts created by plotly as PNG:
p <- plot_ly(x = dataset$period, y = dataset$mean, name = "spline", line = list(shape = "spline"))
plotly_IMAGE(p, format = "png", out_file = "out.png")
But the problem with this is that, though rendered by plotly, the visualizations will not be interactive since it's just a PNG image.
If you want to create interactive visualizations using plotly, the only way to do so at present is to create a custom Power BI visualization and import it into your report. See this post for a good introduction.
I'm trying to use Bokeh to have several live plots in a Jupyter notebook. I understand that I cannot use push_notebook() since it will only update the last figure. Is there another way to do it as of today?
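push_notebook updates the last cell by default if you don't pass it an argument. But it also accepts a "notebook handle", which show returns when it renders a cell (in recent Bokeh versions, only when called with notebook_handle=True).
# in one cell
p = figure(**opts)
r = p.circle([1,2,3], [4,5,6])  # glyph must be added to the figure that is shown
h = show(p, notebook_handle=True)  # keep the handle for later updates
# in a different cell
r.glyph.fill_color = "white"
push_notebook(handle=h)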
See the Basic Usage example notebook in the GitHub repo.
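For several live plots, one option is to keep one handle per figure and push to each in turn. The sketch below is an illustration under assumptions, not code from the Bokeh docs: `wave` and `run_live_plots` are hypothetical names, it assumes a Bokeh version in which `show` accepts `notebook_handle=True`, and the Bokeh imports sit inside the function so the data helper can be used outside a notebook.

```python
import math
import time

def wave(phase, n=50):
    """Return x and y lists for one frame of a sine wave shifted by `phase`."""
    xs = [i * 2 * math.pi / (n - 1) for i in range(n)]
    ys = [math.sin(x + phase) for x in xs]
    return xs, ys

def run_live_plots(steps=20, delay=0.1):
    """Run inside a Jupyter cell: two figures, each updated via its own handle."""
    from bokeh.io import output_notebook, push_notebook, show
    from bokeh.plotting import figure

    output_notebook()
    plots = []
    for title in ("plot A", "plot B"):
        xs, ys = wave(0.0)
        p = figure(title=title)
        r = p.line(xs, ys)
        h = show(p, notebook_handle=True)  # one handle per rendered figure
        plots.append((r, h))

    for step in range(steps):
        for k, (r, h) in enumerate(plots):
            # Different phase speed per plot so the two visibly diverge.
            _, ys = wave(step * 0.2 * (k + 1))
            r.data_source.data["y"] = ys
            push_notebook(handle=h)  # update only the figure behind this handle
        time.sleep(delay)
```

Calling `run_live_plots()` in a notebook cell should render both figures and then animate them independently; the key point is that each `push_notebook` call names the handle it targets.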
I do a lot of data exploration in R and I would like to keep every plot I generate (from the interactive R console). I am thinking of a directory where everything I plot is automatically saved as a time-stamped PDF. I also do not want this to interfere with the normal display of plots.
Is there something that I can add to my ~/.Rprofile that will do this?
The general idea is to write a script generating the plot in order to regenerate it. The ESS documentation (in a README) says it well under 'Philosophies for using ESS':
The source code is real. The objects are realizations of the
source code. Source for EVERY user modified object is placed in a
particular directory or directories, for later editing and
retrieval.
With any editor that allows stepwise (or regionwise) execution of commands, you can keep track of your work this way.
The best approach is to use a script file (or sweave or knitr file) so that you can just recreate all the graphs when you need them (into a pdf file or other).
But here is the start of an approach that does the basics of what you asked:
savegraphs <- local({
  i <- 1
  function() {
    if (dev.cur() > 1) {
      filename <- sprintf('graphs/SavedPlot%03d.pdf', i)
      dev.copy2pdf(file = filename)
      i <<- i + 1
    }
  }
})
setHook('before.plot.new', savegraphs)
setHook('before.grid.newpage', savegraphs)
Now just before you create a new graph the current one will be saved into the graphs folder of the current working folder (make sure that it exists). This means that if you add to a plot (lines, points, abline, etc.) then the annotations will be included. However you will need to run plot.new in order for the last plot to be saved (and if you close the current graphics device without running another plot.new then that last plot will not be saved).
This version will overwrite plots saved from a previous R session in the same working directory. It will also fail if you use something other than base or grid graphics (and possibly even with some complicated plots in those systems). I would not be surprised if some extra plots show up on occasion (when a plot is created internally to get some parameters, then immediately replaced with the one of interest). There are probably other things that I have overlooked as well, but this might get you started.
You could write your own wrapper functions for your commonly used plot functions. Such a wrapper would produce both the on-screen display and a timestamped PDF version. You could source() this function in your ~/.Rprofile so that it's available every time you run R.
For lattice's xyplot, using the windows device for the on-screen display:
library(lattice)
my.xyplot <- function(...){
  # create the output folder once; stay quiet if it already exists
  dir.create(file.path("~","RPlots"), showWarnings = FALSE)
  my.chart <- xyplot(...)
  # on-screen copy
  trellis.device(device="windows",height = 8, width = 8)
  print(my.chart)
  # timestamped PDF copy
  trellis.device(device = "pdf",
                 file = file.path("~", "RPlots",
                                  paste("xyplot",format(Sys.time(),"_%Y%m%d_%H-%M-%S"),
                                        ".pdf", sep = "")),
                 paper = "letter", width = 8, height = 8)
  print(my.chart)
  dev.off()
}
my.data <- data.frame(x=-100:100)
my.data$y <- my.data$x^2
my.xyplot(y~x,data=my.data)
As others have said, you should probably get in the habit of working from an R script, rather than working exclusively from the interactive terminal. If you save your scripts, everything is reproducible and modifiable in the future. Nonetheless, a "log of plots" is an interesting idea.
Is there a way of creating animated graphs, for example showing the same graph with different parameters?
For example, in a SAGE notebook, one can write:
a = animate([circle((i,i), 1-1/(i+1), hue=i/10) for i in srange(0,2,0.2)],
xmin=0,ymin=0,xmax=2,ymax=2,figsize=[2,2])
a.show()
This has horrible flickering, but at least this creates a plot that animates for me. It is based on Aron's, but Aron's does not work as-is.
import time
import matplotlib.pyplot as plt
from numpy import array, sin
from IPython.display import clear_output, display
f, ax = plt.subplots()
n = 30
x = array([i/10.0 for i in range(n)])
y = array([sin(i) for i in x])
for i in range(5,n):
    ax.plot(x[:i],y[:i])
    time.sleep(0.1)
    clear_output()
    display(f)
    ax.cla() # turn this off if you'd like to "build up" plots
plt.close()
Update: January 2014
Jake Vanderplas has created a Javascript-based package for matplotlib animations available here. Using it is as simple as:
# https://github.com/jakevdp/JSAnimation
from JSAnimation import examples
examples.basic_animation()
See his blog post for a more complete description and examples.
Historical answer (see goger for a correction)
Yes, the Javascript update does not correctly hold the image frame yet, so there is flicker, but you can do something quite simple using this technique:
import time, sys
from IPython.display import clear_output
f, ax = plt.subplots()
for i in range(10):
    y = i/10*sin(x)
    ax.plot(x,y)
    time.sleep(0.5)
    clear_output()
    display(f)
    ax.cla() # turn this off if you'd like to "build up" plots
plt.close()
IPython widgets let you manipulate Python objects in the kernel with GUI objects in the Notebook. You might also like Sage hosted IPython Notebooks. One problem you might have with sharing widgets or interactivity in Notebooks is that if someone else doesn't have IPython, they can't run your work. To solve that, you can use Domino to share Notebooks with widgets that others can run.
Below are three examples of widgets you can build in a Notebook using pandas to filter data, fractals, and a slider for a 3D plot. Learn more and see the code and Notebooks here.
If you want to live-stream data or set up a simulation to run as a loop, you can also stream data into plots in a Notebook. Disclaimer: I work for Plotly.
If you use IPython notebook, v2.0 and above support interactive widgets. You can find a good example notebook here (n.b. you need to download and run from your own machine to see the sliders).
It essentially boils down to importing interact, and then passing it a function, along with ranges for the parameters. E.g., from the second link:
def pltsin(f, a):
    plot(x,a*sin(2*pi*x*f))
    ylim(-10,10)
interact(pltsin, f=(1,10,0.1), a=(1,10,1));
This will produce a plot with two sliders, for f and a.
If you want 3D scatter plot animations, the Ipyvolume Jupyter widget is very impressive.
http://ipyvolume.readthedocs.io/en/latest/animation.html#
bqplot is a really good option for this now. It's built specifically for animation through Python in the notebook.
https://github.com/bloomberg/bqplot
On @goger's comment about 'horrible flickering', I found that calling clear_output(wait=True) solved my problem. The flag tells clear_output to wait until there is something new to render before clearing the old output.
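A minimal sketch of that fix, assuming matplotlib and IPython are available in the notebook. `partial_sine` and `play` are hypothetical names, and the plotting imports are kept inside `play` so the data helper stands alone:

```python
import math
import time

def partial_sine(i, n=30):
    """First i points of a sine curve sampled at 0.0, 0.1, ..., (n-1)/10."""
    xs = [k / 10.0 for k in range(n)][:i]
    ys = [math.sin(x) for x in xs]
    return xs, ys

def play(n=30, delay=0.1):
    """Run inside a notebook cell to draw the curve growing point by point."""
    import matplotlib.pyplot as plt
    from IPython.display import clear_output, display

    fig, ax = plt.subplots()
    for i in range(5, n):
        xs, ys = partial_sine(i, n)
        ax.cla()                 # start each frame from an empty axes
        ax.plot(xs, ys)
        clear_output(wait=True)  # defer clearing until new output is ready
        display(fig)
        time.sleep(delay)
    plt.close(fig)               # avoid a duplicate static figure at the end
```

The only change relative to the flickering loops is the `wait=True` argument; the old frame stays on screen until the next `display` call replaces it.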
matplotlib has an animation module to do just that. However, examples provided on the site will not run as is in a notebook; you need to make a few tweaks to make it work.
Here is the example from that page, modified to work in a notebook (modifications marked with comments).
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from matplotlib import rc  # added for the notebook
from IPython.display import HTML  # added for the notebook
fig, ax = plt.subplots()
xdata, ydata = [], []
ln, = plt.plot([], [], 'ro', animated=True)
def init():
    ax.set_xlim(0, 2*np.pi)
    ax.set_ylim(-1, 1)
    return ln,
def update(frame):
    xdata.append(frame)
    ydata.append(np.sin(frame))
    ln.set_data(xdata, ydata)
    return ln,
ani = FuncAnimation(fig, update, frames=np.linspace(0, 2*np.pi, 128),
                    init_func=init, blit=True)
rc('animation', html='html5')  # added for the notebook
ani  # added for the notebook: displays the animation as the cell output
# plt.show() # not needed anymore
Note that the animation in the notebook is made via a movie and that you need to have ffmpeg installed and matplotlib configured to use it.
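If ffmpeg is not installed, one possible fallback is matplotlib's JavaScript-based HTML output, which does not need an external encoder. The sketch below assumes a matplotlib version that accepts 'jshtml' for the animation.html setting; `pick_animation_html` is a hypothetical helper name.

```python
import shutil

def pick_animation_html(ffmpeg_path):
    """Choose the notebook animation mode: 'html5' needs ffmpeg, 'jshtml' does not."""
    return "html5" if ffmpeg_path else "jshtml"

# shutil.which returns the executable's path, or None if it is not on PATH.
mode = pick_animation_html(shutil.which("ffmpeg"))
print("animation.html mode:", mode)

try:
    from matplotlib import rc
    rc("animation", html=mode)  # the same setting the example sets to 'html5'
except ImportError:
    pass  # matplotlib is not installed here; the chosen mode is still printed
```

With this in place of the fixed rc('animation', html='html5') line, the final `ani` expression should render either as an embedded video or as a JavaScript player, depending on what is available.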