Population Pyramid in Plotrix freezes R [duplicate]

I am trying to make a pyramid plot with R. I found example code on the internet that does what I want. The problem is that I am not working with small numbers as in the example: my plot has values of 3,000,000 to 12,000,000 but only 10 bars per side. Nevertheless, it takes forever to create the plot with the larger numbers, and the output PDF file is about 800 MB in size.
pyramid.plot(x,y,labels=groups,main="Performance",lxcol=mcol,rxcol=fcol,gap=0.5,show.values=TRUE)
Why is the performance so bad? Shouldn't it be scaled automatically?
Update:
pdf(file='figure1.pdf')
library(plotrix)
x <-c(3105000,3400001,4168780,2842764,3543116,4224601,4222222,6432105,9222222,12345596)
y <-c(3105000,3400001,4168780,2842764,3543116,4224601,4222222,6432105,9222222,12345596)
groups <-c("g1","g2","g3","g4","g5","g6","g7","g8","g9","g11")
pyramid.plot(x,y,labels=groups,main="Performance",gap=0.5,show.values=TRUE)
dev.off()
Both exporting to PDF and plotting to the screen take multiple minutes.

Internally, pyramid.plot does some work to finagle the axes to account for the gap in the middle: if you run debug(pyramid.plot) and step through it line by line, you find where the problem is:
if (is.null(laxlab)) {
laxlab <- seq(xlim[1] - gap, 0, by = -1)
axis(1, at = -xlim[1]:-gap, labels = laxlab)
}
In other words, pyramid.plot is trying to make an axis with a tick mark every 1 (!) unit. With values in the millions, that means millions of tick marks, which is why the plot takes minutes to draw and the PDF balloons to hundreds of megabytes.
Something like this works OK:
pyramid.plot(x,y,labels=groups,
main="Performance",gap=5e5,show.values=TRUE,
laxlab=seq(0,1e7,by=1e6),raxlab=seq(0,1e7,by=1e6))
There are a few other vestiges of the fact that pyramid.plot was designed for demographic plots ... you might write to the package maintainer and ask them to think about generalizing the design of the axes a little bit ...
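Another workaround, if relabeling is acceptable (a sketch of my own, not from the original answer): divide the values by one million before plotting, so the default one-tick-per-unit axis only produces a dozen or so ticks.
library(plotrix)
x <- c(3105000, 3400001, 4168780, 2842764, 3543116, 4224601,
       4222222, 6432105, 9222222, 12345596)
y <- x
groups <- c("g1","g2","g3","g4","g5","g6","g7","g8","g9","g11")
# Values in millions: the axis now runs roughly 0-13, so the
# default tick every 1 unit is cheap instead of ruinous
pyramid.plot(x / 1e6, y / 1e6, labels = groups,
             main = "Performance (millions)", gap = 0.5,
             show.values = TRUE)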

Related

Prevent ggplot from auto-adjusting facet sizes in R

I have this problem where R will auto-adjust the size of the facets in ggplot. In the 2 attached images, clearly, the one scaled from 0-100 on the y-axis is less stretched out compared to the one scaled at 6.6-7.2. These are plotted using the same ggplot commands from mapply, so I don't know where the difference would come from. Is there any way to prevent R from performing this auto-adjusting, to keep the formatting of each ggplot the same? My OCD and I thank you.
It looks like I made a copy-and-paste error where I used the wrong variable to set base_height in save_plot within mapply, so the scaling factor was varying across iterations.

R plotting strangeness with large dataset

I have a data frame with several million points in it - each having two values.
When I plot this like this:
plot(myData)
All the points are plotted, but the plot is quite busy, so I thought I'd plot it as a line:
plot(myData, type="l")
But while the x axis doesn't change (i.e. goes from 0 to 7e+07), the actual plotting stops at about 3e+07 and I don't actually get a proper line plot either.
Is there a limitation on line plotting?
Update
If I use
plot(myData, type="h")
I get correct and useable output, but I still wonder why the type="l" option fails so badly.
Further update
I am plotting a time series - here is one output using type="h":
That's perfectly usable, but having a line would allow me to compare several outputs.
Graphical representation of high-dimensional data is a growing issue in data analysis. The problem, actually, is not creating the graph. The problem is making the graph capable of communicating information that we can turn into useful knowledge. Allow me to present an example to illustrate this point, using a dataset with a million observations (that is, not that big).
x <- rnorm(10^6, 0, 1)
y <- rnorm(10^6, 0, 1)
Let's plot it. R can easily manage such a problem. But can we? Probably not.
After all, what kind of information can we deduce from a hard stain of ink? Probably no more than a tasseographer trying to divine the future in patterns of tea leaves, coffee grounds, or wine sediments.
plot(x, y)
A different approach is represented by the smoothScatter function, which creates a density plot of bivariate data. Here, we create two examples.
First, with defaults.
smoothScatter(x, y)
Second, with the bandwidth specified to be a little larger than the default, and five points shown using a different symbol (pch = 3).
smoothScatter(x, y, bandwidth=c(5,1)/(1/3), nrpoints=5, pch=3)
As you can see, the problem is not solved. Nevertheless, we get a better grasp of the distribution of our data. This kind of approach is still in development, and there are several matters being discussed and worked out. If this approach represents a more suitable way to represent your big dataset, I suggest you visit this blog, which discusses the issue thoroughly.
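If all you need is a quick first look, a cheap alternative (my suggestion, not from the blog above) is to plot a random subsample instead of all million points:
# Plot a 10,000-point random subsample as a rough first look
idx <- sample(length(x), 1e4)
plot(x[idx], y[idx], pch = ".")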
For what it's worth, all the evidence I have is that the computer - even though it was a lump of big iron - ran out of memory.
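If the line plot is indeed dying from sheer point count, one workaround (a sketch; it assumes the rows of myData are already ordered along the x axis) is to thin the data before plotting:
# Keep roughly every nth row so about 10,000 points remain
n <- nrow(myData)
idx <- seq(1, n, by = max(1, n %/% 10000))
plot(myData[idx, ], type = "l")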

How to do a ridiculously wide plot

I have a long time series of 10000 observations that I want to visualize. The problem is, if I just plot it normally the time-dimension will be squished and none of the fine detail of the time-series that I want to visualize will be apparent. For example:
plot((sin(1:10000/100)+rnorm(10000)/5),type='l')
What I would like is to somehow plot the following together, side by side, in one gigantically long plot, without using par(mfrow=c(1,100)). I then want to export this very wide plot and simply scroll across to visualize the whole series.
plot((sin(1:10000/100)+rnorm(10000)/5)[1:100],type='l')
plot((sin(1:10000/100)+rnorm(10000)/5)[101:200],type='l')
plot((sin(1:10000/100)+rnorm(10000)/5)[201:300],type='l')
.....
Eventually I would like to have 3 or 4 of these gigantically wide plots on top of each other with a par(mfrow=c(4,1)).
I know that the answer has something to do with the pin setting in par, but I keep getting Error in plot.new() : plot region too large. I'm guessing this has something to do with the interaction of pin with the other par parameters.
Bonus points are awarded if we can get the pixel height and width exactly right. It is preferable that the plot doesn't skip random pixels due to the export sizing being imperfect.
Further bonus points if the image can be embedded in a .html file and viewed that way.
An alternative that you might consider is svg, which will produce something of better quality than png/jpeg in any case.
Something like
svg(width = 2000, height = 7)
par(mfrow=c(4,1), mar = c(4, 4, 0, 2))
for (i in 1:4){
plot((sin(1:10000/100)+rnorm(10000)/5),type='l',
bty = "l", xaxs = "i")
}
dev.off()
will produce a very wide plot, just over 1MB in size, which renders quite nicely in Chrome.
Note the width and height are in inches here.
P.S. svg also offers the potential for interactive graphics. I've just seen a nice example allowing the user to select a region of a long time series to zoom in on; see Figure 22 in Dynamic and Interactive R Graphics for the Web: The gridSVG Package, a draft paper by Paul Murrell and Simon Potter.
It could be a Cairo-specific problem, or it could be a lack of RAM on your machine. The following code works fine for me on a Windows 7 machine with 8GB RAM.
png("wide.png", width = 1e5, height = 500)
plot((sin(1:10000/100)+rnorm(10000)/5),type='l')
dev.off()
If I change the width to 1e6 pixels, then R successfully creates the file (it took about a minute), but no image viewing software that I have available can display an image that large.
I would go down an alternative route. First of all, what exactly is the point of viewing the entire plot at hi-res? If you're searching for some sort of anomalies or irregularities, well, that's what data processing is for :-) . Think about something like finding all x > 3*sigma, or doing an FFT, etc.
Next, if you really want to examine the whole thing by eye, how about writing some R tcltk code or using dynamicGraph, iplots, or zoom to produce an interactive graph that you can scroll through "live."
ETA: IIRC RStudio has tools for interactive graph scrolling and zoom as well.
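As a low-tech sketch of that "scroll through live" idea, using nothing but base R in an interactive session (the window size of 500 points is an arbitrary choice):
y <- sin(1:10000/100) + rnorm(10000)/5
window <- 500                             # points visible at a time
for (start in seq(1, length(y) - window + 1, by = window)) {
  idx <- start:(start + window - 1)
  plot(idx, y[idx], type = "l", xlab = "Index", ylab = "y")
  readline("Press <Enter> for the next window...")  # pause between windows
}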

Intelligent Y Axis Scaling BarPlot R

I want to plot some data with barplot. Rather, I want to make a bar graph and barplot seemed the logical choice. I am plotting just fine but I was wondering if there is a way to intelligently scale the y axis to round up from the highest count.
For example, I set the y-axis in this case to 30 because I knew that Strand.22 had 27 counts in it: barplot(unlist(d), ylim=c(0,30), xlab="Forward Reverse", ylab="Counts")
In the future, I want this script to run on its own, so it would be optimal for the Y-axis to choose its own ylim. Short of pulling the information out of my 'd' variable, I can't think of a good way to do this. Is there an easy way to do this with barplot? Would some other plotting function work better? I have seen things about ggplot, but it seemed super complex and I wasn't sure it would do anything better.
EDIT: If I do not choose a ylim it picks automatically and this is what it decided was best.
I disagree with its choice.
If you don't specify ylim, R will come up with something based on the data. (Sounds like you don't like its choice, which is fair.)
If you specify something based on the data like:
barplot(unlist(d), ylim=c(0, 1.1*max(unlist(d))))
R will draw you a plot that reflects the maximum value of the data. That example just takes the maximum of your values and multiplies it by 1.1 (this could be any number) to give the bars a little extra headroom. R does something similar when you make a scatterplot, but it handles barplots slightly differently.
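If you'd rather have the limit land on a "nice" round number than a fixed 10% margin, base R's pretty() can pick one (a sketch; d is assumed to be the question's list of counts):
counts <- unlist(d)
top <- max(pretty(c(0, max(counts))))   # e.g. a max of 27 rounds up to 30
barplot(counts, ylim = c(0, top),
        xlab = "Forward Reverse", ylab = "Counts")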

R How to make smoother looking plots of oscillations

When plotting oscillations in R, e.g., using the package deSolve,
df1 <- function(t, y, mu) list(c(y[2], mu*y[1]^3 - y[1] + 0.005*cos(t)))
library(deSolve)
yini <- c(y1 = 0, y2 = 0)
df2 <- ode(y = yini, func = df1, times = 0:520, parms = 0.1667)
plot(df2, type = "l", which = "y1", ylab = "Displacement", xlab = "Time", main = "")
I get raggedy plots such as:
instead of a smooth plot (not done in R) such as:
Does anyone know of a way to obtain a smoother plot in R instead of a raggedy one when displaying oscillations? Note that it is not just a matter of the difference in scale and I am not looking for a smoothing filter.
Thanks,
I generated your plot in R and exported it as PDF. I zoomed in on it and it's quite lovely; I can't see the problem you're talking about there. Therefore, it must be a scaling issue or something with a raster format that is causing the problem. Perhaps you're pasting into Word and that's giving you a bad raster image. The plot that R is making is, at a logical level, great, in spite of how the one you posted looks. It's even better than the comparison plot you put up.
It's possible that you're generating the plot in a raster format and not setting a high enough resolution and size. Try tiff('filename.tiff', width = 1200, height = 1200, res = 300) for a good raster image of it. I did notice that when exporting to raster formats it was easy to turn your plot into a fine mess with default png or jpg settings that would just smear things.
Maybe you really wanted to sample your function at a higher resolution, something not done in the comparison plot. If that's the case, it's relatively easy: change 0:520 to seq(0, 520, 0.1). That gives an even nicer plot, whether exported as PDF, EPS, or SVG.
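Putting that suggestion together with the question's own code (nothing new here beyond the finer time grid):
library(deSolve)
df1 <- function(t, y, mu) list(c(y[2], mu*y[1]^3 - y[1] + 0.005*cos(t)))
yini <- c(y1 = 0, y2 = 0)
# Sample every 0.1 time units instead of every 1 for a smooth curve
df2 <- ode(y = yini, func = df1, times = seq(0, 520, by = 0.1), parms = 0.1667)
plot(df2, type = "l", which = "y1", ylab = "Displacement", xlab = "Time", main = "")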
