Reason for strange gap in the plot - r

I have a plot of depth vs. time.
The plot has a strange gap at the start of May.
I checked the data, and there are no NAs, NaNs, or other missing values.
The time series has a regular 15-minute interval.
I cannot post the dataset here because it contains 10,000 rows.
Can somebody suggest what the cause might be?
I am using the following plotting code:
library(zoo)
z=read.zoo("data.txt", header=TRUE)
temp=index(z[,1])
m=coredata(z[,1])
x=0.001
p=rep.int(x,length(temp))
png(filename=paste(Name[k],"_mean1.png", sep=''), width= 3500, height=1600, units="px")
par(mar=c(13,13,5,3),cex.axis= 2.5, cex.lab=3, cex.main=3.5, cex.sub=5)
plot(temp,m, xlab="Time", ylab="Depth",type='l', main=Name[k])
symbols(temp,m,add=TRUE,circles=p, inches=1/15, ann=F, bg="steelblue2", fg=NULL)
dev.off()

Okay, here's a guess from what you have posted.
I'm guessing there is no data for a period right at the start of May where the 'gap' in question pops up. There are no NAs because there just aren't any lines of data for this period at all. There is still a thin black line drawn to the plot by this line of code which links the 'gap' in data...
plot(temp,m, xlab="Time", ylab="Depth",type='l', main=Name[k])
...but there are no blue symbols (circles) plotted close enough together to make it look like a continuous blue line. The blue symbols are plotted over the top of the existing plot by this code:
symbols(temp,m,add=TRUE,circles=p, inches=1/15, ann=F, bg="steelblue2", fg=NULL)
I suggest that, instead of plotting a line and then plotting symbols over the top of it, you just plot a thick blue line to start with, like:
plot(temp,m, xlab="Time", ylab="Depth",type='l', main=Name[k],lwd=5,col="steelblue2")
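To check the guess above, you could look at the differences between consecutive timestamps. Here is a minimal sketch, using a made-up 15-minute series with six hours removed (the `times` vector is hypothetical; with your data it would be `index(z)`):

```r
# Hypothetical 15-minute series standing in for the real data
times <- seq(as.POSIXct("2013-04-30 00:00", tz = "UTC"),
             as.POSIXct("2013-05-02 23:45", tz = "UTC"),
             by = "15 min")

# Remove six hours of rows to imitate missing lines in the file
drop <- times >= as.POSIXct("2013-05-01 00:00", tz = "UTC") &
        times <  as.POSIXct("2013-05-01 06:00", tz = "UTC")
times <- times[!drop]

# Step between consecutive timestamps, in minutes
step <- as.numeric(diff(times), units = "mins")

# Any step longer than 15 minutes marks rows that are absent from the file
gap_start <- which(step > 15)
gaps <- data.frame(from    = times[gap_start],
                   to      = times[gap_start + 1],
                   minutes = step[gap_start])
print(gaps)
```

If `gaps` has no rows for your data, the timestamps are complete and the cause lies elsewhere.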

Related

Creating shades between a cluster of line plots in R

I am currently working with a line plot in R that would contain a large number of separate plots (which does make it difficult to read). What I would like to do instead is to somehow create a light-colored shade that captures the range of individual line plots. Would that be possible? This is what I have for my plot:
plot(get, Hope2, type="l", col="red", lwd=3,
     xlab="Cumulative CO2 emissions (TtC)",
     ylab="One-day maximum precipitation (mm/day)",
     main="One-day maximum precipitation for South Sudan for CanESM2 scenarios")
lines(get2.teratons, Hope3, type="l", lwd=3, col="red")
lines(get3.teratons, Hope4, type="l", lwd=3, col="red")
lines(get4.teratons, Hope5, type="l", lwd=3, col="red")
So, this gives 4 separate red lines on the same plot (I am likely going to place up to 10 lines, so you can imagine how cluttered that would look without shading them in the background!). Now, let's say that I wanted to create a light red shade that fills from the upper to lower red curves. How should I go about doing that? The idea would be to capture all of the line plots by doing something like this:
http://jvoigts.scripts.mit.edu/blog/assets/plot_shaded_pretty.png
Thanks, and I would greatly appreciate any help!
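One possible approach, sketched with made-up data because the original vectors are not shown: interpolate every curve onto a common x grid, take the pointwise minimum and maximum, and fill between them with `polygon()`. The `curves` list and all values in it are hypothetical.

```r
# Hypothetical curves, each with its own x values (stand-ins for the real series)
set.seed(1)
curves <- lapply(1:4, function(i) {
  x <- sort(runif(40, 0, 2))
  list(x = x, y = 20 + 5 * x + i + rnorm(40, sd = 0.5))
})

# Common grid restricted to the x range covered by every curve
x_lo <- max(sapply(curves, function(cv) min(cv$x)))
x_hi <- min(sapply(curves, function(cv) max(cv$x)))
grid <- seq(x_lo, x_hi, length.out = 200)

# Linear interpolation of each curve onto the grid (one column per curve);
# rule = 2 clamps to the end values if a grid point falls just outside a curve
interp <- sapply(curves, function(cv) approx(cv$x, cv$y, xout = grid, rule = 2)$y)

# Pointwise envelope across all curves
lower <- apply(interp, 1, min)
upper <- apply(interp, 1, max)

# Empty plot, then the shaded band, then the individual lines on top
plot(range(grid), range(interp), type = "n",
     xlab = "Cumulative CO2 emissions (TtC)",
     ylab = "One-day maximum precipitation (mm/day)")
polygon(c(grid, rev(grid)), c(lower, rev(upper)),
        col = adjustcolor("red", alpha.f = 0.2), border = NA)
for (cv in curves) lines(cv$x, cv$y, col = "red", lwd = 1)
```

With ten lines you might leave out the final loop, or draw only a mean curve, so that the band carries most of the information.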

Contour plot via Scatter plot

Scatter plots are useless when the number of points is large.
So, e.g., using a normal approximation, we can get a contour plot.
My question: is there any package that produces a contour plot from a scatter plot?
Thank you @G5W!! I can do it!!
You don't offer any data, so I will respond with some artificial data,
constructed at the bottom of the post. You also don't say how much data
you have although you say it is a large number of points. I am illustrating
with 20000 points.
You used the group number as the plotting character to indicate the group.
I find that hard to read. But just plotting the points doesn't show the
groups well. Coloring each group a different color is a start, but does
not look very good.
plot(x,y, pch=20, col=rainbow(3)[group])
Two tricks that can make a lot of points more understandable are:
1. Make the points transparent. The dense places will appear darker. AND
2. Reduce the point size.
plot(x,y, pch=20, col=rainbow(3, alpha=0.1)[group], cex=0.8)
That looks somewhat better, but did not address your actual request.
Your sample picture seems to show confidence ellipses. You can get
those using the function dataEllipse from the car package.
library(car)
plot(x,y, pch=20, col=rainbow(3, alpha=0.1)[group], cex=0.8)
dataEllipse(x,y,factor(group), levels=c(0.70,0.85,0.95),
plot.points=FALSE, col=rainbow(3), group.labels=NA, center.pch=FALSE)
But if there are really a lot of points, the points can still overlap
so much that they are just confusing. You can also use dataEllipse
to create what is basically a 2D density plot without showing the points
at all. Just plot several ellipses of different sizes over each other filling
them with transparent colors. The center of the distribution will appear darker.
This can give an idea of the distribution for a very large number of points.
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=c(seq(0.15,0.95,0.2), 0.995),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.15, lty=1, lwd=1)
You can get a more continuous look by plotting more ellipses and leaving out the border lines.
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=seq(0.11,0.99,0.02),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.05, lty=0)
Please try different combinations of these to get a nice picture of your data.
Additional response to comment: Adding labels
Perhaps the most natural place to add group labels is the centers of the
ellipses. You can get that by simply computing the centroids of the points in each group. So for example,
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=c(seq(0.15,0.95,0.2), 0.995),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.15, lty=1, lwd=1)
## Now add labels
for(i in unique(group)) {
text(mean(x[group==i]), mean(y[group==i]), labels=i)
}
Note that I just used the number as the group label, but if you have a more elaborate name, you can change labels=i to something like
labels=GroupNames[i].
Data
x = c(rnorm(2000,0,1), rnorm(7000,1,1), rnorm(11000,5,1))
twist = c(rep(0,2000),rep(-0.5,7000), rep(0.4,11000))
y = c(rnorm(2000,0,1), rnorm(7000,5,1), rnorm(11000,6,1)) + twist*x
group = c(rep(1,2000), rep(2,7000), rep(3,11000))
You can use hexbin::hexbin() to show very large datasets.
@G5W gave a nice dataset:
x = c(rnorm(2000,0,1), rnorm(7000,1,1), rnorm(11000,5,1))
twist = c(rep(0,2000),rep(-0.5,7000), rep(0.4,11000))
y = c(rnorm(2000,0,1), rnorm(7000,5,1), rnorm(11000,6,1)) + twist*x
group = c(rep(1,2000), rep(2,7000), rep(3,11000))
If you don't know the group information, then the ellipses are inappropriate; this is what I'd suggest:
library(hexbin)
plot(hexbin(x,y))
which produces a hexagonally binned density plot.
If you really want contours, you'll need a density estimate to plot. The MASS::kde2d() function can produce one; see the examples in its help page for plotting a contour based on the result. This is what it gives for this dataset:
library(MASS)
contour(kde2d(x,y))
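A slightly fuller sketch of the same idea, overlaying the contours on faint points. The grid size `n = 100`, the number of levels, and the colours are arbitrary choices for illustration, and the seed is only there to make the fake data reproducible:

```r
library(MASS)

# Same artificial data as above, with a seed for reproducibility
set.seed(42)
x <- c(rnorm(2000, 0, 1), rnorm(7000, 1, 1), rnorm(11000, 5, 1))
twist <- c(rep(0, 2000), rep(-0.5, 7000), rep(0.4, 11000))
y <- c(rnorm(2000, 0, 1), rnorm(7000, 5, 1), rnorm(11000, 6, 1)) + twist * x

# 2D kernel density estimate evaluated on a 100 x 100 grid
dens <- kde2d(x, y, n = 100)

# Faint points in the background, density contours drawn on top
plot(x, y, pch = 20, cex = 0.5, col = adjustcolor("grey40", alpha.f = 0.1))
contour(dens, nlevels = 8, add = TRUE, col = "red", lwd = 2)
```

Dropping the `plot()` call and `add = TRUE` gives the contours alone.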

How to decrease padding between lines and points in R "both" type plots

I have tried to plot a series of points in R, and I use type="b" as a plot option. However, there is a lot of padding (white space) between the points and the lines between them, so much that the line disappears entirely between some points. Here is a picture of how it looks:
I have tried to make the points smaller with the cex plot option, but this does not help, as it only changes the size of the points and not where the lines between the points start and end. I do not know if this makes a difference, but the symbols I am using are pch=1.
I am interested in knowing whether it is possible to reduce this padding, and how to do so. I am not interested in using type="o" as a plot option instead.
Any particular reason why you don't want to use type="o"? It seems like the easiest way to get the effect you want:
# Fake data
set.seed(10)
dfs = data.frame(x=1:10, y=rnorm(10))
plot(y~x,data=dfs, type="o", pch=21, bg='white')
pch=21 is a circle marker like pch=1, but with both border and fill. We set the fill to white with bg="white" to "cover up" the lines that go through the point markers.
You can also use cex to change the marker size to avoid overlap and make the lines between nearby points visible:
set.seed(10)
dfs = data.frame(x=1:100, y=cumsum(rnorm(100)))
plot(y~x,data=dfs, type="o", pch=21, bg="white", cex=0.6)
Using a dataframe named dfs, this seems to deliver a mechanism for adjusting the surrounding "white halo" to whatever size of point and halo you want, by adjusting the 'cex' values of the white and black points:
plot(y~x,data=dfs, type="l")
with(dfs, points(x,y, pch=16,col="white",cex=1.4))
with(dfs, points(x,y,cex=1) )

Change tick parameters and point position of overlapping data in base R plot

I want to plot two data series with overlapping ranges onto a single plot in base R.
This is the graph I have.
I want to place the data points for each time point side-by-side, either so that data points from series 1 are slightly to the left of the ticks and series 2 slightly to the right, or so that they are in between the ticks. Is there a way to do this?
Here is my code (I have excluded that for the error bars)
plot(d$month, d$y, xaxt='n', #xaxt='n' suppresses the x-axis
pch=16, lty=1,lwd = 1.2, ylim=c(0,80), #lty = line type, pch = symbols, lwd = line width,
col='black',cex=1.2,cex.lab=1.0,cex.axis=1.0)
len = .07
axis(side = 1, at = d$month, labels=d$month)
lines(d$month, d$y, col='black') # adds connecting lines
lines(d$month, d$y2, col=200)
points(d$month, d$y2, col=200, pch=16)
You would need to subtract/add a reasonably short period from/to the data series. However, since the data points are already plotted very close together, this would give the erroneous impression that the matching data points do in fact refer to different dates. Therefore this approach cannot be recommended.
If you merely intend to avoid confusion caused by overplotting, you could a) connect all points (possibly filtering out NA from each series to avoid the gaps), b) use bigger open (not filled) symbols for one series and small closed symbols or crosses for the other.
For the error bars and connecting lines, you could use gray solid and black dashed lines (which can then be differentiated even when plotting the black on top of the gray ones). In case the error bars are symmetric, one might even decide to plot only one side for each series.
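A minimal sketch of suggestion b) together with the gray solid / black dashed lines, using made-up numbers (the data frame `d` below is hypothetical and only mimics the column names from the question):

```r
# Hypothetical data standing in for the asker's data frame d
d <- data.frame(month = 1:12,
                y  = c(20, 25, 31, 40, 52, 60, 66, 61, 50, 38, 27, 21),
                y2 = c(22, 24, 33, 41, 50, 62, 64, 63, 48, 39, 25, 23))

# Series 1: big open circles joined by a gray solid line
plot(d$month, d$y, type = "o", pch = 1, cex = 1.8, col = "grey50",
     xaxt = "n", ylim = c(0, 80), xlab = "Month", ylab = "Value")
axis(side = 1, at = d$month, labels = d$month)

# Series 2: small crosses joined by a black dashed line, drawn on top
lines(d$month, d$y2, lty = 2, col = "black")
points(d$month, d$y2, pch = 4, cex = 0.8, col = "black")

legend("topleft", legend = c("Series 1", "Series 2"),
       pch = c(1, 4), lty = c(1, 2), col = c("grey50", "black"), bty = "n")
```

Both series stay at their true x positions, and the small crosses remain visible inside the large open circles even where the values nearly coincide.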

Y-axis values does not display correctly on my gap.boxplot in R

This is my script and the graph it produces. I have made a gap between 7 and 29.8. How can I display the y-axis values at 7 and 30? The axis only shows 1-6 instead of 0-7 and 30 as intended.
gap.boxplot(Km, gap=list(top=c(7,30), bottom=c(NA,NA), axis(side=2, at=c(0,29.8), labels= F)),
ylim=c(0,30), axis.labels=T, ylab="Km (mM)", plot=T, axe=T,
col=c("red","blue","black"))
abline(h=seq(6.99,7.157,.001), col="white")
axis.break(2, 7.1,style="slash")
You can call the axis function again to plot the marks at c(0:7,30)
axis(2, c(0:7,30))
But because you have a gap between 7 and 30, anything beyond 7 will have to be shifted in the plot. In general, a mark at position y will have to be moved downwards to y-gap.width, or y-(30-7) in your case.
So you can do this to plot your marks:
axis(2, labels=c(0:7,30), at=c(0:7,30-(30-7)))
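The shift rule can be written as a small helper. This is only a sketch of the arithmetic described above (positions at or beyond the top of the gap move down by the gap width); `gap_position` is a hypothetical name, not a plotrix function:

```r
# Map true y values to plotted positions for a gap from gap_from to gap_to
gap_position <- function(y, gap_from = 7, gap_to = 30) {
  # Values at or above the top of the gap shift down by the gap width
  pos <- ifelse(y >= gap_to, y - (gap_to - gap_from), y)
  # Values strictly inside the gap have no position on the plot
  pos[y > gap_from & y < gap_to] <- NA
  pos
}

tick_labels <- c(0:7, 30, 35)
tick_at <- gap_position(tick_labels)
print(tick_at)
```

Under this rule the labels 7 and 30 land on the same position, so they will be drawn on top of each other unless one of them is nudged slightly.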
It's hard to replicate the plot without example data, but I think this is worth a try and should work.
axis(2,labels=c(0:7), at=c(0:7)) # build first gap marker '7'
then separately add the second gap marker
axis(2, labels=c(30), at=7*(1+0.01)) # the interval (0.01) could be different, test to find the best one to fit your plot
