Force starting point of lines() - r

Perhaps because the question is so basic, the keywords that I can think up for this question all direct me to other things. I am trying to draw a graph with spiky curve lines that connect the medians. The real data is very big, but the starting values are duplicates of (0,0):
# Use '=' inside data.frame() so the columns are actually named time and conc
DATA<-data.frame(time=c(sort(rep(c(0,2,4,8,12),4))),
conc=c(rep(0,4),rnorm(n=4,mean=30),
rnorm(n=4,mean=10),
rnorm(n=4,mean=35),
rnorm(n=4,mean=15)))
# Create blank graph
plot(NULL,NULL,xlab="Time",ylab="Conc",
xlim=c(0,15),ylim=c(0,40),main="Example")
# Add line
require(quantreg)
require(plyr)
require(MatrixModels)
DATA<-plyr::arrange(DATA,time)
fit3<-rqss(DATA$conc~qss(DATA$time,constraint="N"),tau=0.5,data = DATA)
lines(unique(DATA$time)[-1],fit3$coef[1] + fit3$coef[-1],lwd=2)
As you can see, the line does not connect to the starting (0,0) values and instead starts at the next lowest level.
I was tempted to cheat, but it does not connect to the lines and I would really prefer to work it out with the rest of the code instead of trying to pass off two lines as one:
# Cheating getaway but does not work well, segments are not connected
segments(x0=0,y0=0,x1=2,y1=30,lwd=2)
Some relevant answers that I found were not appropriate for my situation.
"Line in R plot should start at a different timepoint", for example, suggests modifying the data, which would not help to extend my line; besides, my actual data is so big that I would be wary of this kind of manipulation. I would not want to use plot(x,y,type="l") even though it goes through the (0,0) point, because 1) it looks bad on the huge data, and 2) I would have to overlay another similar line using lines(). I wonder whether it has more to do with rqss and less with lines?
I apologize if this has already been asked before.
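One way to get a single connected line that starts at the origin is to compute the per-time medians directly and pass all of them, including the one at time 0, to lines(). This is only a minimal base R sketch and it replaces the rqss fit from the question rather than repairing it; the seed and the names dat and med are assumptions made for illustration.

```r
# Simulated data shaped like the example in the question (the seed is arbitrary)
set.seed(1)
dat <- data.frame(
  time = rep(c(0, 2, 4, 8, 12), each = 4),
  conc = c(rep(0, 4), rnorm(4, mean = 30), rnorm(4, mean = 10),
           rnorm(4, mean = 35), rnorm(4, mean = 15))
)

# Median concentration at each time point, including time 0
med <- aggregate(conc ~ time, data = dat, FUN = median)

# Blank graph, then one connected line through all medians
plot(NULL, xlab = "Time", ylab = "Conc",
     xlim = c(0, 15), ylim = c(0, 40), main = "Example")
lines(med$time, med$conc, lwd = 2)
```

Because the median of the four zeros at time 0 is 0, the line starts at (0,0) without any extra segment.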

Related

geom_bspline across multiple plots combined into a single figure

I would like to create a ggplot2 layer that includes multiple geom_bspline(), or something similar, to point to regions on different plots after combining them into a single figure. A feature in the data seen in one plot appears in another plot after a transformation. However, it may not be clear to a non-expert they are due to the same phenomenon. The plots are to be combined into a single figure using ggarrange(), cowplot(), patchwork() or something similar.
I can get by using ggforce::geom_ellipse() on each plot but it's not as clean. Any suggestions?
Of course, after asking the question and staring at the figure in question, it came to me that I simply need to add a geom_bspline() to the combined figure. Tried that earlier but didn't give enough thought to the coordinates on the new layer. The coordinates of the spline are given in the range of 0 to 1 for both the x and y values on this new layer. Simple and obvious.
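A rough sketch of that idea, assuming the figure is combined with cowplot and the spline comes from ggforce. The two panels, the control points in ctrl and the colour are placeholders, not the original figure.

```r
library(ggplot2)
library(cowplot)   # plot_grid(), ggdraw()
library(ggforce)   # geom_bspline()

# Two placeholder plots standing in for the real panels
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(wt, 1 / mpg)) + geom_point()

combined <- plot_grid(p1, p2, ncol = 2)

# Control points in figure coordinates: (0, 0) is bottom-left, (1, 1) is top-right
ctrl <- data.frame(x = c(0.30, 0.40, 0.60, 0.75),
                   y = c(0.80, 0.95, 0.95, 0.40))

# Draw the spline as a new layer on top of the combined figure
fig <- ggdraw(combined) +
  geom_bspline(data = ctrl, aes(x = x, y = y), colour = "red")
fig
```

The control points usually need a few rounds of trial and error, since they refer to the whole figure and not to the data scales of either panel.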

Issues with combining different (continuous and ordinal) plot types into one plot

I am preparing a figure for a paper presenting data for 2 different experiments in one plot. For that reason I don't need a legend for every plot, so I try to combine them with ggdraw from cowplot.
My code
should generate a reproducible example
and gives this output:
It seems like the two figures get the same slot (A) and the legend gets slot (B). Typically, I would probably use facet wrap to plot them together (which should also guarantee that the scaling/legend is consistent across the two plots.), but that will probably not work in this case, as I am trying to add an additional figure type to C and D.
The problem is that this figure type is ordinal so I have used a somewhat “hacky” approach to plot it, giving me this figure looking essentially as I want it to:
So far I have not been able to extract it into another element that ggdraw can use.
Ideally the final plot should roughly look like this (of course with different labels):
How would you go about plotting these different types together?
Thank you for taking the time to read my question; I hope that you can help me. I know it is quite a mouthful, but I was not sure how I could meaningfully reduce it to smaller chunks.

Creating a grid on a map in R using grid points

I've been struggling with this problem for a day now and can't seem to find a nice solution to it. I would really appreciate some help, as I'm really a novice in R (since last week).
Problem 1:
I have a CSV file representing a set of grid points, which I can parse into a data frame (pointname, latitude, longitude).
Eg:
name,latitude,longitude
x0y0,35.9767,-122.605
x1y0,35.9767,-122.594
x2y0,35.9767,-122.583
x0y1,35.9857,-122.605
x1y1,35.9857,-122.594
x2y1,35.9857,-122.583
x0y2,35.9947,-122.605
x1y2,35.9947,-122.594
x2y2,35.9947,-122.583
The points in this file represent the lower left corner and are arranged in row major format, meaning lowest horizontal grid points first. Each point is a certain great circle distance away from its neighbors (1km). I want to create a grid overlay on a map which I've plotted using ggmap.
What I've tried or considered:
map.grid() - this is really not useful to me as I'm not looking for any kind of projection.
geom_vline() and geom_hline(). These look good but I don't have constant x and y intercepts on a plane. Moreover, once I create a grid, I'd like to use the grid to color against a density.
geom_rect() and geom_tile(). These look really promising and may be what I want. But I'm not able to find a good way of working with these.
I'd like to fill these grid boxes later with another parameter. Any suggestions on how I can create such a grid? This may be a trivial question but I don't know a lot of R yet.
Problem 2:
How can I store or hold such a grid so that, given a point (lat,lon), I can quickly get to its grid cell? In fact my whole back end is in C++ and can output the grid name x<n>y<n> directly for a given search point. Somehow I am finding it difficult to count such points against grid points so that I can fill each grid cell with a representative color.
I'm not sure if everything of what I'm saying is clear. Please tell me if I've to clarify something.
Also note that I've Googled quite a lot and not found relevant answers although some looked close.
Eg: This, ThisToo
Thanks for the help!
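Both problems can be sketched in base R under one strong assumption: that the spacing in degrees is constant across the grid, as it is in the nine sample points (0.009 in latitude, 0.011 in longitude). The search points in pts and the helper cell_name() are hypothetical and only serve the illustration.

```r
# Grid corners from the example CSV (lower-left corner of each cell)
grid <- read.csv(text = "name,latitude,longitude
x0y0,35.9767,-122.605
x1y0,35.9767,-122.594
x2y0,35.9767,-122.583
x0y1,35.9857,-122.605
x1y1,35.9857,-122.594
x2y1,35.9857,-122.583
x0y2,35.9947,-122.605
x1y2,35.9947,-122.594
x2y2,35.9947,-122.583", stringsAsFactors = FALSE)

# ASSUMPTION: constant spacing in degrees, read off the sample above
dlat <- 0.009
dlon <- 0.011
lat0 <- min(grid$latitude)
lon0 <- min(grid$longitude)

# Problem 1: rectangle bounds per cell (xmin/xmax/ymin/ymax)
cells <- transform(grid,
                   xmin = longitude, xmax = longitude + dlon,
                   ymin = latitude,  ymax = latitude + dlat)

# Problem 2: map a point to the name of the cell that contains it
cell_name <- function(lat, lon) {
  ix <- floor((lon - lon0) / dlon)
  iy <- floor((lat - lat0) / dlat)
  paste0("x", ix, "y", iy)
}

# Hypothetical search points
pts <- data.frame(lat = c(35.980, 35.981, 35.990, 35.999),
                  lon = c(-122.600, -122.601, -122.590, -122.580))
pts$cell <- cell_name(pts$lat, pts$lon)

# Count points per cell; cells without points get 0
cells$count <- as.vector(table(factor(pts$cell, levels = cells$name)))
cells[, c("name", "count")]
```

The columns of cells match the aesthetics that geom_rect() uses (xmin, xmax, ymin, ymax), so mapping fill to count should give a coloured grid overlay; on top of a ggmap plot the layer would probably need inherit.aes = FALSE. If the back end already returns the cell name, only the counting step is needed.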

R plotting strangeness with large dataset

I have a data frame with several million points in it - each having two values.
When I plot this like this:
plot(myData)
All the points are plotted, but the plot is quite busy, so I thought I'd plot it as a line:
plot(myData, type="l")
But while the x axis doesn't change (i.e. goes from 0 to 7e+07), the actual plotting stops at about 3e+07 and I don't actually get a proper line plot either.
Is there a limitation on line plotting?
Update
If I use
plot(myData, type="h")
I get correct and useable output, but I still wonder why the type="l" option fails so badly.
Further update
I am plotting a time series - here is one output using type="h":
That's perfectly usable, but having a line would allow me to compare several outputs.
Graphical representation of high-dimensional data is a growing issue in data analysis. The problem, actually, is not creating the graph. The problem is making the graph capable of communicating information that we can transform into useful knowledge. Allow me to present an example to make this point, using a data set with a million observations, which is not that big.
x <- rnorm(10^6, 0, 1)
y <- rnorm(10^6, 0, 1)
Let's plot it. R can easily manage such a problem. But can we? Probably not.
After all, what kind of information can we deduce from a solid ink stain? Probably no more than a tasseographer trying to divine the future in patterns of tea leaves, coffee grounds, or wine sediments.
plot(x, y)
A different approach is the smoothScatter function. It creates a density plot of bivariate data. Here are two examples.
First, with defaults.
smoothScatter(x, y)
Second, the bandwidth was specified to be a little larger than the default, and five points are specified to be shown using a different symbol pch = 3.
smoothScatter(x, y, bandwidth=c(5,1)/(1/3), nrpoints=5, pch=3)
As you can see, the problem is not solved. Nevertheless, we can get a better grasp of the distribution of our data. This kind of approach is still in development, and several aspects are still being discussed and refined. If it seems a more suitable way to represent your big dataset, I suggest you visit this blog, which discusses the issue thoroughly.
For what it's worth, all the evidence I have is that the computer - even though it was a lump of big iron - ran out of memory.
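If memory is the limit, one possible workaround is to thin the series before drawing it as a line. This is a minimal sketch, assuming the rows are already in time order; myData is simulated here and the step of 1000 is an arbitrary choice. Thinning discards detail, so narrow spikes can disappear from the plot.

```r
# Simulated stand-in for the real time series (the seed is arbitrary)
set.seed(42)
n <- 1e6
myData <- data.frame(t = seq_len(n), v = cumsum(rnorm(n)))

# Keep every 1000th row only
keep <- seq(1, n, by = 1000)
thinned <- myData[keep, ]

# A line plot of the thinned series needs far fewer segments
plot(thinned, type = "l")
```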

How to avoid overplotting (for points) using base-graph?

I am on my way to finishing the graphs for a paper and decided (after a discussion on stats.stackoverflow), in order to convey as much information as possible, to create the following graph, which presents the means in the foreground and the raw data in the background:
However, one problem remains and that is overplotting. For example, the marked point looks like it reflects one data point, but in fact 5 data points exist with the same value at that place.
Therefore, I would like to know if there is a way to deal with overplotting in base graph using points as the function.
It would be ideal if e.g., the respective points get darker, or thicker or,...
Manually doing it is not an option (too many graphs and points like this). Furthermore, I do not want to learn ggplot2 just to deal with this single problem (one reason is that I tend to like dual axes, which are not supported in ggplot2).
Update: I wrote a function which automatically creates the above graphs and avoids overplotting by adding vertical or horizontal jitter (or both): check it out!
This function is now available as raw.means.plot and raw.means.plot2 in the plotrix package (on CRAN).
The standard approach is to add some noise to the data before plotting. R has a function jitter() which does exactly that. You could use it to add the necessary noise to the coordinates in your plot, e.g.:
X <- rep(1:10,10)
Z <- as.factor(sample(letters[1:10],100,replace=TRUE))
plot(jitter(as.numeric(Z),factor=0.2),X,xaxt="n")
axis(1,at=1:10,labels=levels(Z))
Besides jittering, another good approach is alpha blending, which you can obtain (on the graphics devices supporting it) as the fourth color parameter. I provided an example for 'overplotting' of two histograms in this SO question.
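A minimal sketch of alpha blending with base graphics, assuming a device that supports transparency (pdf and png do). The data and the 20% opacity are arbitrary choices for illustration.

```r
set.seed(1)
# Discrete data with many exact ties
x <- sample(1:10, 500, replace = TRUE)
y <- sample(1:10, 500, replace = TRUE)

# Black with 20% opacity: the fourth argument of rgb() is alpha
col_alpha <- rgb(0, 0, 0, 0.2)

# Positions holding several points come out darker
plot(x, y, pch = 16, cex = 2, col = col_alpha)
```

With 20% opacity a position looks close to solid black once a handful of points coincide, so a lower alpha may suit data with many ties.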
One additional idea for the general problem of showing the number of points is a rug plot (rug function). It places small tick marks along the margin that can show how many points contribute (still use jittering or alpha blending for ties). This allows the actual points to show their true rather than jittered values, but the rug can then indicate which parts of the plot have more values.
For the example plot direct jittering or alpha blending is probably best, but in some other cases the rug plot can be useful.
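A short sketch of the rug idea, with simulated data that is rounded to create ties. Only the rug ticks are jittered, so the points themselves stay at their true values.

```r
set.seed(1)
x <- round(rnorm(200), 1)   # rounding creates ties
y <- round(rnorm(200), 1)

plot(x, y, pch = 16)

# Jitter only the rug ticks so that ties show up as denser clusters
rug(jitter(x), side = 1)
rug(jitter(y), side = 2)
```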
You may also use sunflowerplot, though it would be hard to implement here. I would use alpha blending, as Dirk suggested.
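For reference, a tiny sunflowerplot sketch with made-up data in which five points share one position. Each extra point at a position adds a petal, and the function invisibly returns the multiplicities.

```r
# Five observations at (1, 2), one each at (2, 1) and (3, 3)
x <- c(rep(1, 5), 2, 3)
y <- c(rep(2, 5), 1, 3)

sf <- sunflowerplot(x, y)

# Number of observations at each distinct position
sf$number
```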
