Plotting - pandas - distribution in boxplots and norm distribution in histograms - plot

I'd like to show the distribution on a boxplot when plotting a pandas DataFrame like this:
In [52]: df = DataFrame(rand(10,5))
In [53]: plt.figure();
In [54]: bp = df.boxplot()
but this generates the following:
and I would like something like this:
Is it possible using pandas? Thanks
Same with histograms, for example:
plt.figure()
fr_q.hist(color="k", alpha=0.5, bins=20, figsize=fgsize)  # pd.tools.plotting.hist_frame in old pandas versions
and now I would like to overlay a KDE curve. It's easy for a single plot, for example:
plt.figure()
a.hist(density=True)  # `normed=True` in old matplotlib versions
a.plot(kind="kde")
but how can it be added to every subplot?
Thanks

Here is a good resource for plotting:
http://nbviewer.ipython.org/urls/gist.github.com/fonnesbeck/5850463/raw/a29d9ffb863bfab09ff6c1fc853e1d5bf69fe3e4/3.+Plotting+and+Visualization.ipynb
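One way to get a KDE on every subplot is to create the grid of axes yourself and draw each column's histogram and density curve on its own axis. This is only a sketch: it assumes a recent pandas and matplotlib with SciPy installed (pandas relies on it for kind="kde"), the frame df is made-up data standing in for fr_q, and the 2x2 layout is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's fr_q frame (random data, four columns).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=list("abcd"))

# One subplot per column; squeeze=False always returns a 2-D array of axes.
fig, axes = plt.subplots(2, 2, figsize=(8, 6), squeeze=False)

for ax, col in zip(axes.ravel(), df.columns):
    # density=True puts the histogram on the same vertical scale as the KDE.
    ax.hist(df[col], bins=20, density=True, color="k", alpha=0.5)
    # Draw the kernel density estimate on the same axis (requires SciPy).
    df[col].plot(kind="kde", ax=ax)
    ax.set_title(col)

fig.tight_layout()
fig.savefig("hist_kde.png")
```

The key point is passing ax=ax, which tells pandas to draw on an existing subplot instead of opening a new figure.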

Related

Clustering with only two variables?

I want to cluster my two-dimensional dataset, but I couldn't figure out how. My dataset looks like this:
dt<-data.frame(x=c(rnorm(10, 2,1), rnorm(10, 6,1)), categorize=c(rep(1,10), rep(2,10)))
I just want to plot this dataset like the graph below. Would it work if I added a third variable such as c(1:nrow(dt)), or what would you recommend?

In R, how do I make one variable (column) the x axis of a histogram?

There are 7 different tree types and I'd like to find out which tree is most climbed. I would like to plot it in a histogram but I don't know how to make that variable into the x axis.
This is the data I'm working with
This is what I want the end result to look like
Using the ggplot2 package, this is what you are looking for:
ggplot(df,aes(x=Tree,y=count)) + geom_bar(stat="identity")
where df is the data frame shown in your screenshot.

Creating a Bland-Altman plot for data in two columns in data frame

I have a data frame data_2 and wish to create a Bland-Altman plot to compare the differences between the data in the columns alog1 vs. dig1.
Please help with the function for this and how to execute this. Would the function be barplot()?
Thanks for your time.
Another name for a Bland-Altman plot is a Tukey mean-difference plot. (I have nothing against Bland and Altman, but I think 'mean-difference' is more descriptive.) Note that this is different from a boxplot (compare the pictures on the two Wikipedia pages). The mean-difference plot is simply a regular scatterplot, except that instead of plotting x versus y, you plot the difference x-y against the mean of x and y (in your case, alog1 and dig1). Probably the easiest way to make this is to form these two new variables first, and then plot them as you would any other scatterplot. Here is some sample code:
mn <- (data_2$alog1 + data_2$dig1)/2
dif <- data_2$alog1 - data_2$dig1
plot(mn, dif)
If you wanted to add arguments to customize your plot, you could do that just as you normally would, for example:
plot(mn, dif, main="Bland-Altman plot", xlab="mean of alog1 & dig1",
ylab="difference between alog1 & dig1")
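For readers working in Python, like the pandas question at the top of this page, the same two derived variables can be sketched with the standard library alone. The values below are invented purely for illustration and only the column names alog1 and dig1 come from the question; the resulting pairs would then go into any scatterplot call.

```python
# Made-up measurements standing in for the data_2 columns (assumption).
alog1 = [10.0, 12.0, 9.0, 14.0]
dig1 = [11.0, 11.0, 10.0, 12.0]

# Mean of each pair: this goes on the x axis.
mn = [(a + d) / 2 for a, d in zip(alog1, dig1)]

# Difference of each pair: this goes on the y axis.
dif = [a - d for a, d in zip(alog1, dig1)]

for m, d in zip(mn, dif):
    print(m, d)
```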

Is it possible to use more than two characters as points in a plot

I am trying to plot points in a plot where each dot is represented by a number. However, it seems that the points can only be one character long, as you can see in the plot produced by the code below:
set.seed(1); plot(rnorm(15), pch=paste(1:15))
I wonder if there is any workaround for this. Thanks.
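set.seed(1); y <- rnorm(15)             # draw the values once so points and labels agree
plot(y, type='n')                       # empty plot: sets up the axes only
text(x=1:15, y=y, labels=round(y, 2))   # multi-character labels at each point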
Another option, using the grid-based lattice package:
library(lattice)
dat <- data.frame(x=1:15,y=rnorm(15))
xyplot(y~x,data=dat,
panel=function(x,y,...){
panel.xyplot(x,y,...)
panel.text(x,y,labels=round(y,2),adj=2,col='red')})

Producing statistics over levels

I've generated a set of levels from my dataset, and now I want to find a way to sum the rest of the data columns in order to plot it while plotting my first column. Something like:
levelSet <- cut(frame$x1, "cutting")
boxplot(frame$x1~levelSet)
for (l in levelSet)
{
x2Sum<-sum(frame$x2[levelSet==l])
}
or maybe the inside of the loop should look like:
lines(sum(frame$x2[levelSet==l]))
Any thoughts? I am new to R, but I can't seem to get a hang of the indexing and ~ notation thus far.
I know R doesn't work this way, but I'd like functionality that 'looks' like
hist(frame$x2~levelSet)
## Or
hist(frame$x2, breaks = levelSet)
To plot a histogram, boxplot, etc. over a level set:
Try the lattice package:
library(lattice)
histogram(~x2|equal.count(x1),data=frame)
Substitute shingle for equal.count to set your own break points.
ggplot2 would also work nicely for this.
To put a histogram over a boxplot:
par(mfrow=c(2,1))
hist(frame$x2)
boxplot(frame$x2)
You can also use the layout() command to fine-tune the arrangement.
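The per-level sum that the loop in the question is reaching for can also be written without a loop. As a rough sketch in pandas (the library from the question at the top of this page, not R): the frame below is invented data, and the bin edges passed to cut are an arbitrary assumption.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `frame`; the values are made up.
frame = pd.DataFrame({"x1": [1, 2, 3, 4, 5, 6],
                      "x2": [10, 20, 30, 40, 50, 60]})

# Bin x1 into levels, much as cut() does in R; the edges are an assumption.
level_set = pd.cut(frame["x1"], bins=[0, 2, 4, 6])

# Sum x2 within each level of x1, replacing the explicit for loop.
x2_sum = frame.groupby(level_set, observed=False)["x2"].sum()
print(x2_sum)
```

The result is indexed by level, so it can be plotted directly, for example as a bar chart.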

Resources