Long story short, I am working on an assignment for a data visualization course and the assignment specifies that we have to use the lattice package and that we have to create a marginal histogram scatterplot. (I know that asking about homework questions is frowned upon, but I'm not asking you to write my assignment for me - only asking for guidance or at least a direction to start in).
Our lecture and book don't mention anything about marginal histogram scatterplots and while the lecture shows how to create them using the standard plot function in R as well as how to do it using ggplot2, we are told not to use either. I've never used lattice before, and when I ask for help, I get general responses that aren't helpful at all.
Note: I'm not posting the question or what type of data I have to use as I'm not looking for an answer to the homework here. Just some help on where to begin. You can literally use any data if you want to show an example.
This is a tricky thing to do in lattice. There are compelling reasons why ggplot2 has become one of the more popular packages, although lattice is still extremely powerful. As this is part of a visualization course, I'd assume you are meant to come up with something similar to ggMarginal. For this you'll have to spend some time adjusting the margins on your lattice plot.
As a guideline, this is how I went about finding an answer:
Search Google for "lattice marginal histogram". The second link is an answer on a help mailing list, which gives an example for a similar problem.
Open R and, following the link, make a small example, e.g.
data(mtcars)
library(lattice)
scatter <- xyplot(hp ~ mpg, mtcars)
hist <- histogram(~ mpg, mtcars)
plot(scatter, more = TRUE, split = c(1, 2, 1, 2))
plot(hist, more = FALSE, split = c(1, 1, 1, 2))
After getting this far, the task is to figure out what is actually happening. The link above suggests looking at ?plot.trellis, and the important question here is how we can move our plots around, which seems to be controlled by split. The documentation (?plot.trellis) explains how to use this argument:
a vector of 4 integers, c(x, y, nx, ny), that says to position the current plot at the x, y position in a regular array of nx by ny plots. (Note: this has origin at top left)
From here we have everything we need to create the marginal plot. If we make this a 2x2 layout, we'd place one histogram at c(1, 1, 2, 2), the scatter plot at c(2, 1, 2, 2) and the other histogram at c(2, 2, 2, 2). Of course this is not going to be the best-looking marginal plot; for that you'd have to work with the margins, or go under the hood and set up the plot manually using the grid package. I'd say that is definitely a bit on the "next level" side of things.
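As a rough sketch of the 2x2 layout just described (the names hist_x and hist_y are mine, and the histogram of hp is not rotated, so this only approximates a true marginal plot):

```r
library(lattice)

scatter <- xyplot(hp ~ mpg, data = mtcars)
hist_x  <- histogram(~ mpg, data = mtcars)  # marginal distribution of the x variable
hist_y  <- histogram(~ hp,  data = mtcars)  # marginal distribution of the y variable (not rotated)

# split = c(x, y, nx, ny): column x, row y of an nx-by-ny grid, origin at top left
print(hist_y,  split = c(1, 1, 2, 2), more = TRUE)   # top left
print(scatter, split = c(2, 1, 2, 2), more = TRUE)   # top right
print(hist_x,  split = c(2, 2, 2, 2), more = FALSE)  # bottom right, under the scatter plot
```

The bottom-left cell stays empty, and all cells have the same size, which is why the result still needs margin tuning.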
Note:
In the above example I didn't cover how to rotate a histogram or create a sideways histogram, which you would need if you want to replicate ggMarginal more closely.
In addition, you said you had trouble finding information on this. Another option would have been to read the ?histogram documentation page. Several examples on this page (and many others) show how to manipulate the position of lattice plots.
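Besides split, print.trellis also accepts a position argument, which gives finer control over the size of each panel. A minimal sketch, assuming you want a short histogram above a taller scatter plot (the fractions below are arbitrary choices of mine, and the x-axes will not line up exactly without further tuning):

```r
library(lattice)

scatter <- xyplot(hp ~ mpg, data = mtcars)
hist_x  <- histogram(~ mpg, data = mtcars)

# position = c(xmin, ymin, xmax, ymax) on a [0, 1] scale, origin at bottom left
print(hist_x,  position = c(0, 0.70, 1, 1.00), more = TRUE)   # top 30% of the page
print(scatter, position = c(0, 0.00, 1, 0.75), more = FALSE)  # bottom 75%, slight overlap
```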
plot.lm has a nice feature of displaying plots one after another. For example, calling
plot(lm(rnorm(100) ~ rnorm(100, 3, 1)))
displays the first plot and then prompts the user with
Hit Return to see next plot:
Now I want to generate 30 plots, so displaying them in a grid will make them hard to read, while specifying them one after another is quite cumbersome. I've been wondering if there is a function or a method to imitate plot.lm behaviour? I'm specifically interested in a function that is compatible with ggplot2.
Study stats:::plot.lm. It uses devAskNewPage.
Example:
devAskNewPage(TRUE)
for (i in 1:3) plot(i)
devAskNewPage(options("device.ask.default")[[1]])
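Since the question asks about ggplot2, here is a sketch of the same idea with ggplot objects. It relies on the fact that printing a ggplot starts a new grid page, which honours devAskNewPage; the list of plots is made up for illustration, and the prompt only appears in an interactive session.

```r
library(ggplot2)

# Illustrative list of plots: one scatter plot per variable against qsec
vars  <- c("mpg", "hp", "wt")
plots <- lapply(vars, function(v) {
  ggplot(mtcars, aes(x = .data[[v]], y = qsec)) +
    geom_point() +
    ggtitle(v)
})

old_ask <- devAskNewPage(TRUE)  # returns the previous setting
for (p in plots) print(p)       # each print() starts a new page
devAskNewPage(old_ask)          # restore the previous setting
```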
I'm aware that plotting multiple graphs in the same window has already been solved, and it's pretty straightforward. But I can't find out how to remove the space between the different graphs.
I use this script to arrange these three graphs in the same window:
mat <- matrix(1:3, ncol = 3)
layout(mat, widths = rep.int(1, ncol(mat)), heights = rep.int(1, nrow(mat)), respect = FALSE)
layout.show(n = 3)
With this script I can generate this graph:
These three violin plots were obtained from a large dataset. I'm not posting the script that produces them since I don't think it is relevant to this question.
As you can see, this figure can be improved. First, the labels on the x-axis are cut off, and I don't know why this is happening. Second, I want to remove the space between the three plots. Thanks!
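One possible direction, sketched with made-up data and boxplots standing in for the violin plots: in base graphics the gaps between panels come from the figure margins (par("mar")), and clipped axis labels usually mean the bottom margin is too small. The margin sizes below are guesses to be adjusted.

```r
mat <- matrix(1:3, ncol = 3)
layout(mat, widths = rep.int(1, ncol(mat)), heights = rep.int(1, nrow(mat)), respect = FALSE)

# Zero left/right figure margins remove the gaps; a larger bottom margin
# leaves room for the x labels; the outer margin holds the shared y axis.
old_par <- par(mar = c(6, 0, 2, 0), oma = c(0, 4, 0, 1))

set.seed(1)
groups <- list(first = rnorm(50), second = rnorm(50), third = rnorm(50))  # stand-in data

for (i in 1:3) {
  # Draw the y axis only on the leftmost panel; las = 2 rotates the labels
  boxplot(groups, ylim = c(-4, 4), yaxt = if (i == 1) "s" else "n", las = 2)
}

par(old_par)
layout(1)  # back to a single-panel layout
```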
I have a data frame with several million points in it - each having two values.
When I plot this like this:
plot(myData)
All the points are plotted, but the plot is quite busy, so I thought I'd plot it as a line:
plot(myData, type="l")
But while the x axis doesn't change (i.e. goes from 0 to 7e+07), the actual plotting stops at about 3e+07 and I don't actually get a proper line plot either.
Is there a limitation on line plotting?
Update
If I use
plot(myData, type="h")
I get correct and useable output, but I still wonder why the type="l" option fails so badly.
Further update
I am plotting a time series - here is one output using type="h":
That's perfectly usable, but having a line would allow me to compare several outputs.
Graphical representation of very large data sets is a growing issue in data analysis. The problem is not creating the graph. The problem is making the graph communicate information that we can turn into useful knowledge. Allow me to illustrate this point with an example of a million observations, which is not even that big.
x <- rnorm(10^6, 0, 1)
y <- rnorm(10^6, 0, 1)
Let's plot it. R can easily handle such a problem. But can we? Probably not.
After all, what kind of information can we extract from a solid blot of ink? Probably no more than a tasseographer trying to divine the future from patterns of tea leaves, coffee grounds, or wine sediments.
plot(x, y)
A different approach is offered by the smoothScatter function, which draws a smoothed density plot of bivariate data. Here are two examples.
First, with the defaults.
smoothScatter(x, y)
Second, the bandwidth is set explicitly to a value larger than the default (c(5, 1)/(1/3), i.e. c(15, 3)), and five points from the lowest-density regions are drawn with a different symbol, pch = 3.
smoothScatter(x, y, bandwidth=c(5,1)/(1/3), nrpoints=5, pch=3)
As you can see, the problem is not solved. Nevertheless, we get a better grasp of the distribution of our data. This kind of approach is still in development, and several aspects of it are still being discussed and refined. If it seems a more suitable way to represent your big dataset, I suggest you visit this blog, which discusses the issue thoroughly.
For what it's worth, all the evidence I have is that the computer, even though it was a lump of big iron, ran out of memory.
I have a plotting problem with curves when using mixtools
Using the following R code
require(mixtools)
x <- c(rnorm(10000,8,2),rnorm(10000,18,5))
xMix <- normalmixEM(x, lambda=NULL, mu=NULL, sigma=NULL)
plot(xMix, which = 2, nclass=25)
I get a nice histogram, with the 2 normal curves estimated from the model superimposed.
The problem is with the default colours (i.e. red and green), which I need to change for a publication to be black and grey.
One way I thought of doing this was to first produce the histogram
hist(xMix$x, freq=FALSE, nclass=25)
and then add the lines using the "curve" function, but I lost my way and couldn't solve it.
I would be grateful for any pointers or the actual solution.
Thanks.
PS. Note that there is an alternative work-around to this problem using ggplot:
Any suggestions for how I can plot mixEM type data using ggplot2
but for various reasons I need to keep using the base graphics
You can also set the colours directly using the col2 argument of the mixtools plotting function, for example:
plot(xMix, which = 2, nclass=25, col2=c("dimgrey","black"))
After giving the problem a bit more thought, I managed to rephrase it and ask the question in a much more direct way:
Using user-defined functions within "curve" function in R graphics
This delivered two nice solutions showing how to use the "curve" function to draw the normal distributions produced by the mixture modelling.
The overall answer, therefore, is to use the "hist" function to draw a histogram of the raw data, then the "curve" function (incorporating the sdnorm function) to draw each normal distribution. This gives total control of the colours (and potentially any other graphics parameter).
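A minimal sketch of that approach. The sdnorm definition below is my assumption of what such a helper looks like (a normal density scaled by the mixing proportion), and the component parameters are simply the values used to simulate the data; with a fitted model you would take them from xMix$lambda, xMix$mu and xMix$sigma instead.

```r
set.seed(42)
x <- c(rnorm(10000, 8, 2), rnorm(10000, 18, 5))

# Assumed helper: normal density scaled by the mixing proportion lambda
sdnorm <- function(x, mean = 0, sd = 1, lambda = 1) {
  lambda * dnorm(x, mean = mean, sd = sd)
}

# Illustrative component parameters (the simulation values, not fitted ones)
lambda <- c(0.5, 0.5)
mu     <- c(8, 18)
sigma  <- c(2, 5)
cols   <- c("black", "grey50")

hist(x, freq = FALSE, nclass = 25, col = "white", main = "Mixture components")
for (k in seq_along(mu)) {
  curve(sdnorm(x, mean = mu[k], sd = sigma[k], lambda = lambda[k]),
        add = TRUE, col = cols[k], lwd = 2)
}
```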
And not to forget, this is where I got the code for the sdnorm function, along with other useful insights:
Any suggestions for how I can plot mixEM type data using ggplot2
Thanks as always to StackOverflow and the contributors who provide such helpful advice.
When doing matrix operations, I would like to be able to see what the results of my calculations are, at least to get a rough idea of the nature of the matrices going in and coming out of the operation.
How can I plot a matrix of real numbers, so that the x axis represents columns, the y represents rows, and the color or size of a point represents the cell value?
Ultimately, I would like to display multiple plots, e.g. the right and left hand sides of an equation.
Here is some example code:
a <- matrix(rnorm(100), ncol = 10)
b <- diag(1,10)
c <- a*b
par(mfrow = c(1,3))
plot.matrix.fn <- function(m) {
#enter answer to this question here
}
lapply(list(a,b,c), plot.matrix.fn)
update: since posting this question, I found that there are some great examples here: What techniques exists in R to visualize a "distance matrix"?
You could try something like (adjusting the parameters to your particular needs)
image(t(m[nrow(m):1,] ), axes=FALSE, zlim=c(-4,4), col=rainbow(21))
producing something like
See ?image for a single plot (note that row 1 will be at the bottom) and ?rasterImage for adding 1 or more representations to an existing plot. You may want to do some scaling or other transformation on the matrix first.
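Putting this together with the example code from the question, the placeholder function could be filled in roughly as follows. This is only a sketch: the zlim default, the colour palette and the axis labelling are my own choices.

```r
plot.matrix.fn <- function(m, zlim = range(m), ...) {
  # Reverse the rows and transpose so that row 1 is drawn at the top
  z <- t(m[nrow(m):1, , drop = FALSE])
  image(x = seq_len(ncol(m)), y = seq_len(nrow(m)), z = z,
        zlim = zlim, col = rainbow(21), axes = FALSE,
        xlab = "column", ylab = "row", ...)
  axis(1, at = seq_len(ncol(m)))
  axis(2, at = seq_len(nrow(m)), labels = rev(seq_len(nrow(m))), las = 1)
  box()
  invisible(m)
}

set.seed(1)
a <- matrix(rnorm(100), ncol = 10)
b <- diag(1, 10)
c <- a * b  # element-wise product, as in the question

old_par <- par(mfrow = c(1, 3))
# A common zlim makes the colours comparable across the three panels
invisible(lapply(list(a, b, c), plot.matrix.fn, zlim = c(-4, 4)))
par(old_par)
```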
Not an answer but a longer comment.
I've been working on a package to plot matrices using grid.raster, but it's not quite ready for release yet. Your example would read,
library(gridplot)
row_layout(a, b, c)
I found that writing custom functions was probably easier than tweaking tens of parameters in lattice or base graphics, and ggplot2 lacks some control over the axes.
However, writing graphics functions from scratch also means reinventing non-trivial things like layout and positioning; hopefully Hadley's scales and guides packages can make this easier. I'll add the functions to gridExtra when the overall design seems sound and more stable.