plot multiple line segments on one graph using R - r

How can I duplicate this style of graph, with multiple plots on one graph and, preferably, legends attached as below?
I have tried the concept of "facet", but ggplot2 and trellis:xyplot both think of facets as separate panels rather than overlaid plots.
I can do it using plain plot() and lines(), but I was using ggplot2 and would like to get multiple lines on one plot in that package.
Here is some example data in long form (captured from the plot using a nifty app called "Graphclick")
comp <- read.table(pipe("pbpaste"), header=T, sep=',')
company, year, sales
Apple,1975.003,17298.457
Apple,1977.302,16784.502
Apple,1978.314,17298.457
Apple,1980.246,20730.098
Apple,1981.533,27608.426
Apple,1984.293,40862.852
Apple,1986.408,50468.617
Apple,1987.328,48236.188
Apple,1988.892,35676.547
Apple,1989.904,34616.582
Apple,1991.192,44732.742
Apple,1992.387,44732.742
Apple,1993.399,39055.324
Apple,1995.791,37894.922
Apple,1996.895,39648.746
Apple,1998.274,52804.367
Apple,1999.378,61399.512
Apple,2001.770,2.350e5
Apple,2005.265,7.735e5
Toshiba,1999.378,86856.6
Toshiba,2001.862,1.192e5
Toshiba,2004.069,1.495e5
Toshiba,2004.069,1.495e5
IBM,1975.003,22019.092
IBM,1975.830,27195.193
IBM,1976.934,30682.320
IBM,1978.130,31148.527
IBM,1980.430,35676.547
IBM,1981.625,35676.547
IBM,1983.005,39648.746
IBM,1985.305,40862.852
IBM,1986.408,46102.508
IBM,1987.512,64241.156
IBM,1989.996,75832.898
IBM,1991.100,84276.039
IBM,1992.295,85556.641
IBM,1993.307,79342.539
IBM,1994.779,79342.539
IBM,1995.791,84276.039
IBM,1996.895,95082.484
IBM,1996.895,95082.484
Commodore,1975.003,33588.051
Commodore,1975.830,34616.582
Commodore,1977.118,25219.982
Commodore,1978.130,23388.229
Commodore,1979.326,25992.234
Commodore,1980.521,21689.514
Commodore,1981.717,25219.982
Commodore,1984.201,6999.029
Commodore,1985.213,1670.460
Commodore,1986.408,1458.447
(source: asymco.com)

If you're looking for the most control, you could just use the low-level plot and lines commands. Use "plot" to generate the first graph (with title, xlimits, and ylimits), then use "lines" to add lines to that graph.
plot(0,type="n", xlim=c(0,10), ylim=c(0,10), xlab="X Label", ylab="Y Label", main="Title")
Then add lines using the lines command:
lines(1:10, 1:10, type="l", lty=2)
lines(2:4, 10:8, col=2, type="l")
lines(6:9, c(5,6,5,6), col=3, type="l")
You can fine-tune the look by using all of the parameters listed in the "par" help file ("?par")
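Applied to data shaped like the question's, the plot()/lines() approach could look like the sketch below. It uses only a handful of rows copied from the question's data, so the curves are coarser than in the original figure; the colours, labels and legend position are arbitrary assumptions.

```r
# A few rows taken from the question's data (subset for brevity)
comp <- data.frame(
  company = c("Apple", "Apple", "Apple", "IBM", "IBM", "IBM"),
  year    = c(1975.003, 1984.293, 1999.378, 1975.003, 1985.305, 1996.895),
  sales   = c(17298.457, 40862.852, 61399.512, 22019.092, 40862.852, 95082.484)
)

companies <- unique(comp$company)
cols <- seq_along(companies)          # one colour index per company

# Empty plot spanning the full data range, with a log-scaled y axis
plot(range(comp$year), range(comp$sales), type = "n", log = "y",
     xlab = "Year", ylab = "Sales", main = "Sales by company")

# Add one line per company
for (i in seq_along(companies)) {
  d <- comp[comp$company == companies[i], ]
  lines(d$year, d$sales, col = cols[i])
}

legend("topleft", legend = companies, col = cols, lty = 1)
```

This is the "support code" cost of the base approach: the ranges and the loop over groups have to be written by hand.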

So, in ggplot2, this code works:
qplot(year, sales, data=comp, colour=as.factor(company), group= company, geom="path", log="y")
The only things left now are to format the values on the y axis as plain numbers (not scientific notation) and to put the labels on the lines themselves rather than in an off-graph legend. Final suggestions welcome.
This is a lot easier in the end than plot() + lines(), as that required support code to get the ranges, iterate over the group levels, etc.
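One possible way to handle the two remaining points (plain numbers on the y axis, labels on the lines) is sketched below. It assumes the scales package is available (it is installed alongside ggplot2) and uses a small subset of the question's data; last_pts is a helper name introduced here, and the nudge and axis-expansion values are guesses to be tuned.

```r
library(ggplot2)

# A few rows taken from the question's data (subset for brevity)
comp <- data.frame(
  company = c("Apple", "Apple", "Apple", "IBM", "IBM", "IBM"),
  year    = c(1975.003, 1984.293, 1999.378, 1975.003, 1985.305, 1996.895),
  sales   = c(17298.457, 40862.852, 61399.512, 22019.092, 40862.852, 95082.484)
)

# Last observation of each company: this is where the label will sit
last_pts <- do.call(rbind, lapply(split(comp, comp$company),
                                  function(d) d[which.max(d$year), ]))

p <- ggplot(comp, aes(year, sales, colour = company, group = company)) +
  geom_path() +
  # Label each line at its right-hand end instead of using a legend
  geom_text(data = last_pts, aes(label = company), hjust = 0, nudge_x = 0.5) +
  # Log axis with comma-formatted labels rather than scientific notation
  scale_y_log10(labels = scales::comma) +
  # Leave room on the right so the labels are not clipped
  expand_limits(x = max(comp$year) + 5) +
  theme(legend.position = "none")

print(p)
```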

Related

How to create histogram plot in ggplot2 without data frame?

I am plotting two histograms in R by using the following code.
x1<-rnorm(100)
x2<-rnorm(50)
h1<-hist(x1)
h2<-hist(x2)
plot(h1, col=rgb(0,0,1,.25), xlim=c(-4,4), ylim=c(0,0.6), main="", xlab="Index", ylab="Percent",freq = FALSE)
plot(h2, col=rgb(1,0,0,.25), xlim=c(-4,4), ylim=c(0,0.6), main="", xlab="Index", ylab="Percent",freq = FALSE,add=TRUE)
legend("topright", c("H1", "H2"), fill=c(rgb(0,0,1,.25),rgb(1,0,0,.25)))
The code produces the following output.
I need a visually good looking (or stylistic) version of the above plot. I want to use ggplot2. I am looking for something like this (see Change fill colors section). However, I think, ggplot2 only works with data frames. I do not have data frames in this case. Hence, how can I create good looking histogram plot in ggplot2? Please let me know. Thanks in advance.
You can (and should) put your data into a data.frame if you want to use ggplot. Ideally for ggplot, the data.frame should be in long format. Here's a simple example:
library(ggplot2)
# Stack the two vectors into one long-format data frame with a grouping column
df1 = rbind(data.frame(grp='x1', x=x1), data.frame(grp='x2', x=x2))
ggplot(df1, aes(x, fill=grp)) +
  geom_histogram(color='black', alpha=0.5)
There are lots of options to change the appearance however you like. If you want to have the histograms stacked or grouped, or shown as percent versus count, or as densities etc., you will find many resources in previous questions showing how to implement each of those options.
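As one illustration of those options: geom_histogram() stacks the groups by default, so to mimic the overlaid, density-scaled base-R plot from the question you could set position = "identity" and map y to the density. This is a sketch; the seed and the number of bins are arbitrary assumptions.

```r
library(ggplot2)

set.seed(1)                 # arbitrary seed, only for reproducibility
x1 <- rnorm(100)
x2 <- rnorm(50)

# Long format: one column of values, one column naming the group
df1 <- rbind(data.frame(grp = "x1", x = x1),
             data.frame(grp = "x2", x = x2))

p <- ggplot(df1, aes(x, fill = grp)) +
  # position = "identity" overlays the groups instead of stacking them;
  # after_stat(density) rescales each group so its bars integrate to 1
  geom_histogram(aes(y = after_stat(density)), position = "identity",
                 alpha = 0.5, bins = 20, colour = "black") +
  labs(x = "Index", y = "Density", fill = NULL)

print(p)
```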

Contour plot via Scatter plot

Scatter plots are useless when the number of points is large.
So, e.g., using a normal approximation, we can get a contour plot.
My question: is there any package that produces a contour plot from a scatter plot?
Thank you @G5W!! I can do it!!
You don't offer any data, so I will respond with some artificial data,
constructed at the bottom of the post. You also don't say how much data
you have although you say it is a large number of points. I am illustrating
with 20000 points.
You used the group number as the plotting character to indicate the group.
I find that hard to read. But just plotting the points doesn't show the
groups well. Coloring each group a different color is a start, but does
not look very good.
plot(x,y, pch=20, col=rainbow(3)[group])
Two tricks that can make a lot of points more understandable are:
1. Make the points transparent. The dense places will appear darker. AND
2. Reduce the point size.
plot(x,y, pch=20, col=rainbow(3, alpha=0.1)[group], cex=0.8)
That looks somewhat better, but did not address your actual request.
Your sample picture seems to show confidence ellipses. You can get
those using the function dataEllipse from the car package.
library(car)
plot(x,y, pch=20, col=rainbow(3, alpha=0.1)[group], cex=0.8)
dataEllipse(x,y,factor(group), levels=c(0.70,0.85,0.95),
plot.points=FALSE, col=rainbow(3), group.labels=NA, center.pch=FALSE)
But if there are really a lot of points, the points can still overlap
so much that they are just confusing. You can also use dataEllipse
to create what is basically a 2D density plot without showing the points
at all. Just plot several ellipses of different sizes over each other filling
them with transparent colors. The center of the distribution will appear darker.
This can give an idea of the distribution for a very large number of points.
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=c(seq(0.15,0.95,0.2), 0.995),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.15, lty=1, lwd=1)
You can get a more continuous look by plotting more ellipses and leaving out the border lines.
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=seq(0.11,0.99,0.02),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.05, lty=0)
Please try different combinations of these to get a nice picture of your data.
Additional response to comment: Adding labels
Perhaps the most natural place to add group labels is the centers of the
ellipses. You can get that by simply computing the centroids of the points in each group. So for example,
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=c(seq(0.15,0.95,0.2), 0.995),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.15, lty=1, lwd=1)
## Now add labels
for(i in unique(group)) {
text(mean(x[group==i]), mean(y[group==i]), labels=i)
}
Note that I just used the number as the group label, but if you have a more elaborate name, you can change labels=i to something like
labels=GroupNames[i].
Data
x = c(rnorm(2000,0,1), rnorm(7000,1,1), rnorm(11000,5,1))
twist = c(rep(0,2000),rep(-0.5,7000), rep(0.4,11000))
y = c(rnorm(2000,0,1), rnorm(7000,5,1), rnorm(11000,6,1)) + twist*x
group = c(rep(1,2000), rep(2,7000), rep(3,11000))
You can use hexbin::hexbin() to show very large datasets.
@G5W gave a nice dataset:
x = c(rnorm(2000,0,1), rnorm(7000,1,1), rnorm(11000,5,1))
twist = c(rep(0,2000),rep(-0.5,7000), rep(0.4,11000))
y = c(rnorm(2000,0,1), rnorm(7000,5,1), rnorm(11000,6,1)) + twist*x
group = c(rep(1,2000), rep(2,7000), rep(3,11000))
If you don't know the group information, then the ellipses are inappropriate; this is what I'd suggest:
library(hexbin)
plot(hexbin(x,y))
which produces
If you really want contours, you'll need a density estimate to plot. The MASS::kde2d() function can produce one; see the examples in its help page for plotting a contour based on the result. This is what it gives for this dataset:
library(MASS)
contour(kde2d(x,y))
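If you want the contours drawn on top of the points rather than on their own, a sketch along these lines may help. It uses a smaller simulated dataset than the one above, and the grid size n = 50, the seed and the colours are arbitrary assumptions.

```r
library(MASS)

set.seed(42)                                # arbitrary seed
x <- c(rnorm(2000, 0, 1), rnorm(3000, 5, 1))
y <- c(rnorm(2000, 0, 1), rnorm(3000, 6, 1))

# 2D kernel density estimate evaluated on a 50 x 50 grid
dens <- kde2d(x, y, n = 50)

# Faint, small points underneath; density contours on top
plot(x, y, pch = 20, cex = 0.5, col = rgb(0, 0, 0, 0.1))
contour(dens, add = TRUE, col = "red")
```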

R plot and barplot how to fix ylim not alike?

I am trying to use base R to plot a time series as a bar plot and as an ordinary line plot. I want to write a flexible function to draw such a plot, and would like to draw the plots without axes and then add a universal axis manually.
Now I am hampered by a strange problem: the same ylim values result in different axes. Consider the following example:
data(presidents)
# shorten this series a bit
pw <- window(presidents,start=c(1965))
barplot(t(pw),ylim = c(0,80))
par(new=T)
plot(pw,ylim = c(0,80),col="blue",lwd=3)
I intentionally draw the y-axes from both plots here to show that they are not the same. I know I can achieve the intended result by plotting the bar plot first and then adding lines using the x and y args of lines().
But I am looking for a flexible solution that lets you add lines to barplots like you add lines to points or other line plots. So is there a way to make sure the y-axes are the same?
EDIT: also adding the usr parameter to par doesn't help me here.
par(new=T,usr = par("usr"))
Add yaxs="i" to your lineplot. Like this:
plot(pw,ylim = c(0,80),col="blue",lwd=3, yaxs="i")
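barplot() starts its bars at exactly y = 0 because it draws the value axis with yaxs = "i" (the axis spans exactly ylim), while plot() defaults to yaxs = "r", which extends the range by 4% at each end. That padding makes sure you can still see a line when your data sit at y = 0; otherwise it would coincide with the x-axis line.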
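You can see the effect of yaxs directly by inspecting par("usr"), which holds the actual extent of the plotting region. A minimal sketch with made-up data:

```r
# Default axis style "r": the requested ylim is padded by 4% at each end
plot(1:10, ylim = c(0, 80))
usr_r <- par("usr")[3:4]
print(usr_r)   # -3.2 83.2

# Axis style "i": the plotting region spans exactly the requested ylim
plot(1:10, ylim = c(0, 80), yaxs = "i")
usr_i <- par("usr")[3:4]
print(usr_i)   # 0 80
```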

R plot overlay barplot with plot type "p" (confused with factors)

I'm new to R. Previously, I've been able to overlay 2 separate plots that were of the same kind, p1 and p2, using plot (p1); plot (p2, add=T).
I'm struggling with the definition of factors when overlaying a barplot with a point plot showing all individual points.
I can individually plot the barplot as I want it. The point plot looks like I want it, but I realize I'm using an incorrect definition of phase as numerical to force R plot to display each value, rather than default to a boxplot (like when I use plot(my.df$cond, my.df$val)).
Any tips on defining my variable types correctly or whether I'm using the correct barplot and plot functions, would be greatly appreciated. Thank you so much.
shpad <- c(1,2,5,6,1,2,5,6,1,2,5,6,1,2,5,6)
my.df <- data.frame(val=c(0.0738,0.0518,0.002,0.0397,0.1452,0.1152,0.1774,0.0658,0.0218,0.0497,-0.0296,0.0653,0.0848,0.1296,0.1416,0.0923),
phase=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
sub=c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4),
cond=c("NsNm", "NsNm", "NsNm", "NsNm", "NsLm", "NsLm", "NsLm", "NsLm", "LsNm", "LsNm", "LsNm", "LsNm", "LsLm", "LsLm", "LsLm", "LsLm"))
avg <-tapply(my.df$val, my.df$phase, mean)
barplot(avg, border=NA, names.arg=c("NsNm", "NsLm", "LsNm", "LsLm"),col=c("blue","darkblue","red", "darkred"),ylab = "score",ylim=c(-0.03,0.25))
plot(my.df$phase, my.df$val, type="p", ylim=c(-0.03,0.25), ylab = "score", pch=shpad)
tl;dr: problem is that if instead of the last line, I have plot(my.df$phase, my.df$val, type="p", ylim=c(-0.03,0.25), ylab = "score", pch=shpad, add=T), the formats are incongruent.
Alright, so, I've tried for a bit to accomplish what you wanted, but the best I could do with the base plotting system is this:
Which is accomplished purely by your lines of code above except for the last line, which I replaced with
points(my.df$phase,my.df$val,type="p",pch=shpad)
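One refinement worth considering: barplot() invisibly returns the x positions of the bar midpoints, and these are generally not 1, 2, 3, ..., so points placed at my.df$phase drift away from the bar centres. A sketch that reuses the question's data and looks up the midpoint for each phase (mids is a name introduced here):

```r
shpad <- rep(c(1, 2, 5, 6), times = 4)
my.df <- data.frame(
  val = c(0.0738, 0.0518, 0.002, 0.0397, 0.1452, 0.1152, 0.1774, 0.0658,
          0.0218, 0.0497, -0.0296, 0.0653, 0.0848, 0.1296, 0.1416, 0.0923),
  phase = rep(1:4, each = 4)
)
avg <- tapply(my.df$val, my.df$phase, mean)

# Capture the bar midpoints returned by barplot()
mids <- barplot(avg, border = NA,
                names.arg = c("NsNm", "NsLm", "LsNm", "LsLm"),
                col = c("blue", "darkblue", "red", "darkred"),
                ylab = "score", ylim = c(-0.03, 0.25))

# Place each observation at the midpoint of the bar for its phase
points(mids[my.df$phase], my.df$val, pch = shpad)
```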
However, I think you can do much better, if you want to keep the same kind of plot, using the ggplot2 library. Using this code:
library('ggplot2')
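# One row per phase: the mean score plus the phase as a factor
new.df <- data.frame(avg=as.vector(avg), phase=factor(names(avg)))
ggplot(new.df) +
geom_bar(stat="identity",aes(x=phase,y=avg, fill=c("NsNm","NsLm","LsNm","LsLm")))+
# Take the raw points from my.df; phase is made discrete so both layers share the x scale
geom_point(data=my.df,aes(x=factor(phase),y=val,shape=factor(shpad))) +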
scale_x_discrete(name="Type",labels=c("NsNm","NsLm","LsNm","LsLm")) +
ylab("Score")
you can make this chart:
I didn't adjust the coloring and the point types and the legend titles (not sure how important they are, but those can be fiddled with). However, you can see this probably produces the result you were aiming for.

r program side-by-side boxplots

I have three different boxplots,
k1<-boxplot(decreased$Group.1)
k2<-boxplot(unchanged$Group.1)
k3<-boxplot(created$Group.1)
Is there any way I can make side-by-side boxplot with it or do I have to combine the columns for table together and use ~ to find out side by side?
It can be done, but you will need to play with the xlim, ylim, at and add arguments.
See this example:
boxplot(1:10, xlim=c(1,6), ylim=c(0,20), at=1.5)
boxplot(2:10, add=TRUE, at=3.5)
boxplot(3:20, add=TRUE, at=5.5)
So, you need to set the x-limits and y-limits on the first plot, along with the location at which to draw the first boxplot (specified by at). Each subsequent boxplot then needs its own location (again via at) and the add=TRUE argument.
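Alternatively, boxplot() accepts a list of vectors and draws them side by side in a single call, which avoids managing xlim and at by hand. In the sketch below the three vectors are simulated stand-ins for decreased$Group.1, unchanged$Group.1 and created$Group.1, since the real data are not shown.

```r
set.seed(1)                        # arbitrary seed
# Hypothetical stand-ins for the three Group.1 columns
decreased <- rnorm(30, mean = 5)
unchanged <- rnorm(40, mean = 6)
created   <- rnorm(25, mean = 7)

# A named list gives one box per element, labelled by the element name;
# the vectors may have different lengths
bp <- boxplot(list(decreased = decreased,
                   unchanged = unchanged,
                   created   = created),
              ylab = "Group.1")
```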
