Plotting Fisher's Iris Dataset with colorcode - julia

I want to plot something like Fisher's Iris dataset, with sepal length (y) against sepal width (x), color-coded by type (species). There is a screenshot on the Julia homepage with minimal source code:
I suspect it originates from Gadfly's demo, but since I'm new to the language I have no idea how this works. How can I reproduce the plot from scratch?

Not sure I understand the question, but to reproduce the plot you posted, you can do more or less exactly what is shown in the IJulia code cells (although the screen grab could be a little out of date, given that the example in the Gadfly docs is called slightly differently). In any case, you can just start a REPL and do:
using RDatasets, Gadfly
plot(dataset("datasets","iris"), x="SepalWidth", y="SepalLength", color="Species")
This should open a new tab in your browser, containing precisely the plot in the first output cell.
(NB: This assumes that you have RDatasets and Gadfly installed, otherwise you have to Pkg.add() them first obviously.)

Related

Issues with combining different (continuous and ordinal) plot types into one plot

I am preparing a figure for a paper presenting data for 2 different experiments in one plot. For that reason I don't need a legend for every plot, so I try to combine them with ggdraw from cowplot.
My code
should generate a reproducible example
and gives this output:
It seems like the two figures get the same slot (A) and the legend gets slot (B). Typically, I would use facet wrap to plot them together (which should also guarantee that the scaling and legend are consistent across the two plots), but that will probably not work in this case, as I am trying to add an additional figure type to C and D.
The problem is that this figure type is ordinal so I have used a somewhat “hacky” approach to plot it, giving me this figure looking essentially as I want it to:
So far I have not been able to extract it into another element that ggdraw can use.
Ideally the final plot should roughly look like this (of course with different labels):
How would you go about plotting these different types together?
Thank you for taking the time to read my question, and I hope that you can help me. I know it is quite a mouthful, but I was not sure how I could meaningfully reduce it to smaller chunks.

creating multiple file types while plotting

I would like to produce a series of plots in both high-resolution and low-resolution versions, or stated differently, using two different file types (.png and .eps). I'd like to know the best/least repetitive way to do this. I am using the gplot function in sna, and the plot has a custom legend outside the plot area. I wrote a function something like this:
library(sna)
plotfun <- function(net){
  # low-resolution version
  png("test.png", width=800)
  p <- gplot(net)
  par(xpd=TRUE)  # allow drawing outside the plot region
  legend(max(p[,1])+1, max(p[,2]), legend=letters[1:10], title="custom legend")
  dev.off()
  # high-resolution version
  setEPS()
  postscript("test.eps")
  #repeat all the plotting commands, which are much longer in real life
  dev.off()
}
#try it with some random data
plotfun(rgraph(10))
This is perfectly functional but seems inefficient and clumsy. The more general version of this question is: if for any reason I want to create a plot (including extra layers like my custom legend), store it as an object, and then plot it later, is there a way to do this? Incidentally, this question didn't seem sna specific to me at first, but in trying to reproduce the problem using a similar function with plot, I couldn't get the legend to appear correctly, so this solution to the outside-the-plot-area legend doesn't seem general.
I would recommend generating graphs only in Postscript/PDF from R and then generating bitmaps (e.g. PNG) from the Postscript/PDF using e.g. ImageMagick with the -density parameter (http://www.imagemagick.org/script/command-line-options.php#density) set appropriately to get the desired resolution. For example
convert -density 100 -quality 100 picture.pdf picture.png
assuming picture.pdf is 7in-by-7in (R defaults) will give you a 700x700 png picture.
With this approach you will not have to worry that the picture comes out formatted differently depending which R device (pdf() vs png()) is used.
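For the repetition the question complains about, one option that stays inside R is to put all plotting commands into a single function and call it once per device. The following is only a minimal sketch: save_png_and_eps and draw_example are hypothetical names, and a base plot with a legend stands in for the gplot call from the question.

```r
# Hypothetical helper: draw_fun holds all plotting commands, written once.
save_png_and_eps <- function(draw_fun, basename) {
  png_file <- paste0(basename, ".png")
  eps_file <- paste0(basename, ".eps")

  # Bitmap version
  png(png_file, width = 800)
  draw_fun()
  dev.off()

  # EPS version: setEPS() sets postscript() defaults suitable for EPS
  setEPS()
  postscript(eps_file)
  draw_fun()
  dev.off()

  invisible(c(png_file, eps_file))
}

# Stand-in for the real (much longer) plotting code
draw_example <- function() {
  plot(1:10, (1:10)^2, main = "example")
  legend("topleft", legend = letters[1:3], title = "custom legend")
}

out_files <- save_png_and_eps(draw_example, file.path(tempdir(), "test"))
```

Because both devices run the same function, any change to the plot or its legend only has to be made in one place.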

ggplot2 geom_violin with 0 variance

I started to really like violin plots, since they give me a much better feel than box plots when you have funny distributions. I like to automate a lot of stuff, and thus ran into a problem:
When one variable has 0 variance, the boxplot just gives you a line at that point. Geom_violin however, terminates with an error. What behavior would I like? Well, either put in a line or nothing, but please give me the distributions for the other variables.
Ok, quick example:
dff=data.frame(x=factor(rep(1:2,each=100)),y=c(rnorm(100),rep(0,100)))
ggplot(dff,aes(x=x,y=y)) + geom_violin()
yields
Error in `$<-.data.frame`(`*tmp*`, "n", value = 100L) :
replacement has 1 row, data has 0
However, what works is:
ggplot(dff,aes(x=x,y=y)) + geom_boxplot()
Update:
The issue is resolved as of yesterday: https://github.com/hadley/ggplot2/issues/972
Update 2:
(from question author)
Wow, Hadley himself responded! geom_violin now behaves consistently with geom_density and base R density.
However, I don't think the behavior is optimal yet.
(1) The 'zero' problem
Just run it with my original example:
dff=data.frame(x=factor(rep(1:2, each=100)), y=c(rnorm(100), rep(0,100)))
ggplot(dff,aes(x=x,y=y)) + geom_violin(trim=FALSE)
Yielding this:
Is the plot on the right an appropriate representation of 'all zeroes'? I don't think so. It is better to have trimming that produces a single line to show that there is no variation in the data.
Workaround solution: Add a + geom_boxplot()
(2) I may actually want trim=TRUE.
Example:
dff=data.frame(x=factor(rep(1:2, each=100)), y=c(rgamma(100,1,1), rep(0,100) ))
ggplot(dff,aes(x=x,y=y)) + geom_violin(trim=FALSE)
Now I have non-negative data, and standard kernel density estimates don't handle this correctly. With trim=TRUE I can quickly see that the data is strictly positive.
I am not arguing that the current behavior is 'wrong', since it's in line with other functions. However, geom_violin may be used in different contexts, for exploring different data.frames with heterogeneous data types (positive+skewed or not, for instance).
Three options for dealing with this until the ggplot2 issue is resolved:
As a quick hack, you can set one of the y-values to 0.0001 (instead of zero) and geom_violin will work.
Check out the vioplot package if you're not set on using ggplot2. vioplot doesn't throw an error when you feed it a bunch of identical values.
The Hmisc package includes a panel.bpplot (box-percentile plot) function that can create violin plots with the bwplot function from the lattice package. See the Examples section of ?panel.bpplot. It produces a single line when you feed it a vector of identical values.
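When plots are generated automatically, it can also help to detect the zero-variance groups before plotting, so they can be treated separately (for example, by applying one of the options above only to them). A minimal base R sketch using the data frame from the question; group_var and zero_var_groups are illustrative names:

```r
set.seed(1)
dff <- data.frame(x = factor(rep(1:2, each = 100)),
                  y = c(rnorm(100), rep(0, 100)))

# Variance of y within each level of x
group_var <- tapply(dff$y, dff$x, var)

# Levels whose values are all identical (variance exactly zero)
zero_var_groups <- names(group_var)[group_var == 0]
zero_var_groups  # here: "2"
```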

Change plot size of pairs plot in R

I have this pairs plot
I want to make this plot bigger, but I don't know how.
I've tried
window.options(width = 800, height = 800)
But nothing changes.
Why?
That thing's huge. I would send it to a pdf.
> pdf(file = "yourPlots.pdf")
> plot(...) # your plot
> dev.off() # important!
Also, there is an answer to the window sizing issue in this post.
If your goal is to explore the pairwise relationships between your variables, you could consider using the shiny interface from the pairsD3 R package, which provides a way to interact with (potentially large) scatter plot matrices by selecting a few variables at a time.
An example with the iris data set:
install.packages("pairsD3")
require("pairsD3")
shinypairs(iris)
More reference here
I had the same problem with the pairs() function. Unfortunately, I couldn't find a direct answer to your question.
However, something that could help you is to plot a selected number of variables only. For this, you can either subset the default plot. Refer to this answer I received on a different question.
Alternatively, you can use the pairs2 function which I came across through this post.
To make the plot bigger, write it to a file. I found that a PDF file works well for this. If you look at ?pdf, you will see that it comes with height and width options. Note that pdf() takes these in inches, not pixels, so the value 6000 below produces an extremely large page; a much smaller value may well be enough. For example:
pdf("pairs.pdf", height=6000, width=6000)
pairs(my_data, cex=0.05)
dev.off()
The "cex=0.05" is to handle a second issue here: The points in the array of scatter plots are way too big. This will make them small enough to show the arrangements in the embedded scatter plots.
The labels not fitting into the diagonal boxes is resolved by the increased plot size. It could also be handled by changing the font size.
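The two suggestions above (plotting only some variables, and writing to a larger PDF) can be combined. A minimal sketch with the built-in iris data; the 10 by 10 inch page size, the chosen columns, and the cex value are arbitrary assumptions:

```r
out_file <- file.path(tempdir(), "pairs_subset.pdf")

# pdf() takes width and height in inches
pdf(out_file, width = 10, height = 10)

# Plot only a selected subset of the variables
pairs(iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")], cex = 0.5)

dev.off()
```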

accessing shape attribute for points when making NVD3 scatterChart with nplot/rCharts

How do you set the shape attribute for points when building a scatterChart with nplot from rCharts? Point size can be set by providing a column in the input dataframe named "size" but if there's a corresponding "shape" column consisting of strings such as "square" or "cross" the resulting graph still has the default circle points. New to R and NVD3 so I apologize for my lack of vocabulary.
It appears the newest version of nvd3 no longer works the same way as the old version. See for example. The screenshot shows shapes, and the data has shape:, but only circles are rendered in the actual chart. Also, the tests do not produce anything other than circles. I glanced at the source, and I could not find where or how to set shape. If you know how to do with nvd3, I could easily translate into a rCharts example.
I don't have a reputation of 50, but I'd like to comment.
Line 18 in this NVD3 example(Novus.github) shows how it's currently done. Likewise, all you need to do with the live code(nvd3.org) is uncomment the 'size' line in the data tab.
I attempted making a column in my df named 'shape', and using n1 <- nPlot(x~y, data=df, shape='shape', type='scatterChart'); n1$chart(onlyCircles=FALSE); and a number of other combinations. I've only spent the last two days working with rCharts but have made some exciting progress. I'm giving up on this but found it curious that these two examples weren't mentioned here, so I thought I'd mention them.
I know this question is a bit "ancient" but I faced the same problem and it took me a while to find out how to change the shapes.
I followed the approach in this example for changing the size:
nvd3 scatterPlot with rCharts in R: Vary size of points?
Here is my solution:
library(rCharts)
df=data.frame(x=rep(0:2,3),y=c(rep(1,3),rep(2,3),rep(3,3)),
group=c(rep("a",3),rep("b",3),rep("c",3)),shape=rep("square",9))
p <- nPlot(y~x , group = 'group',data = df, type = 'scatterChart')
#In order to make it more pleasant to look at
p$chart(xDomain=c(-0.5,2.5))
p$chart(yDomain=c(0,4))
p$chart(sizeRange = c(200,200))
#here the magic
p$chart(scatter.onlyCircles = FALSE)
p$chart(shape = '#! function(d){return d.shape} !#')
p
