Create data with function inside shinyServer to feed ggplot - r

I have a Shiny app that calculates some power estimates for a type of genetic association study. The ui.R is pretty simple, and the server.R has a function that returns a data frame (I think I can't make this function reactive because it takes parameters).
The link to the Gist is here. To run it:
library(shiny)
shiny::runGist('5895082')
The app calculates correctly the estimates, but I have two questions regarding it:
Is it possible to have output$powTable show all the values within the range selected in the first sliderInput (n.cases)? It only seems to show the two extreme values of the range. What am I doing wrong?
There's an error when running the app:
Error: Reading objects from shinyoutput object not allowed.
How can I pass the data (reactivity?) from the function f() to feed the ggplot? After much trial and error, I am very lost. Where is the error in my code? Many thanks in advance!
The original code of the function works well: (EDITED)
f <- function(ncases, p0, OR.cas.ctrl, Nh, sig.level) {
num.cases <- ncases
p0 <- p0
Nh <- Nh
OR.cas.ctrl <- OR.cas.ctrl
sig.level <- sig.level
# Parameters related to sig.level, from [Table 2] of Samuels et al.
# For 90% power and alpha = .05, Nscaled = 8.5
if (sig.level == 0.05){
A <- -28 # Parameter A for alpha=.05
x0 <- 2.6 # Parameter x0 for alpha=.05
d <- 2.4 # Parameter d for alpha=.05
}
if (sig.level == 0.01){
A <- -13 # Parameter A for alpha=.01
x0 <- 5 # Parameter x0 for alpha=.01
d <- 2.5 # Parameter d for alpha=.01
}
if (sig.level == 0.001){
A <- -7 # Parameter A for alpha=.001
x0 <- 7.4 # Parameter x0 for alpha=.001
d <- 2.8 # Parameter d for alpha=.001
}
out.pow <- NULL # initialize vector
for(ncases in ncases){
OR.ctrl.cas <- 1 / OR.cas.ctrl # 1. CALCULATE P1 FROM A PREDEFINED P0, AND A DESIRED OR
OR <- OR.ctrl.cas
bracket.pw <- p0 / (OR - OR*p0) # obtained after isolating p1 in OR equation [3].
p1 <- bracket.pw / (1 + bracket.pw)
Nh037 <- Nh^0.37 # 2. CALCULATE NSCALED
num.n <- num.cases*((p1-p0)^2)
den.n <- (p1*(1-p1) + p0*(1-p0))*Nh037
Nscaled <- num.n/den.n
num.power <- A - 100 # 3. CALCULATE POWER
den.power <- 1 + exp((Nscaled - x0)/d)
power <- 100 + (num.power/den.power) # The power I have to detect a given OR with my data, at a given alpha
}
OR <- OR.cas.ctrl
out.pow <- data.frame(num.cases, Nh, Nscaled, p0, OR, sig.level, power)
out.pow
}
mydata <- f(ncases=seq(50,1000, by=50), 0.4, 2.25, 11, 0.05)
mydata
library(ggplot2)
print(ggplot(data = mydata, aes(num.cases, power)) +
theme_bw() +
theme(text=element_text(family="Helvetica", size=12)) +
labs(title = "Ad-hoc power for haplogroup") +
scale_color_brewer(palette = "Dark2", guide = guide_legend(reverse=TRUE)) +
xlab("number of cases/controls") +
ylab("power") +
scale_x_log10() +
geom_line(alpha=0.8, size=0.2) +
geom_point(aes(shape = factor(OR)), colour="black"))

First of all, you have n.cases named inconsistently I think. It's n.cases sometimes, and ncases other times. Is that a mistake?
Anyway, output$mydata() is incorrect. It isn't an output. It should be just:
mydata <- reactive(f(input$n.cases,
input$p0,
input$OR.cas.ctrl,
input$Nh,
input$sig.level))
And then when executing it in output$powHap() it should be:
output$powHap <- renderPlot(
{
print(ggplot(data = mydata(), aes(ncases, power)) +
theme_bw() +
theme(text=element_text(family="Helvetica", size=12)) +
labs(title = "Ad-hoc power for haplogroup") +
scale_color_brewer(palette = "Dark2", guide = guide_legend(reverse=TRUE)) +
xlab("number of cases/controls") +
ylab("power") +
scale_x_log10() +
geom_line(alpha=0.8, size=0.2) +
geom_point(aes(shape = factor(OR)), colour="black"))
})
The important part there is that you need to do:
data = mydata()
rather than
data = output$mydata
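because mydata is a reactive expression, which you call like a function. Values stored in output cannot be read back, which is what the "Reading objects from shinyoutput object not allowed" error is telling you.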
I would suggest reading the documentation on how reactives work. The whole thing should make a lot more sense afterwards. +1 for a very reproducible example by the way. This is how all questions should be posted.
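On the first question (the table only showing the two extremes): a range sliderInput returns a length-two vector, c(min, max), so f() only ever receives those two values. Below is a minimal sketch of the whole pattern. It assumes the input id n.cases from the question, a step of 50, and fixed values for the other parameters; f_demo() and expand_range() are hypothetical stand-ins, not the code from the Gist, and the column name num.cases follows the data frame returned by f() in the question.

```r
library(shiny)
library(ggplot2)

# Hypothetical stand-in for the question's f(): same argument order,
# but a toy power curve. Replace with the real f().
f_demo <- function(ncases, p0, OR.cas.ctrl, Nh, sig.level) {
  data.frame(num.cases = ncases, OR = OR.cas.ctrl,
             power = 100 * (1 - exp(-ncases / 300)))
}

# A range slider yields c(min, max); expand it to the full sequence.
expand_range <- function(rng, by = 50) seq(rng[1], rng[2], by = by)

ui <- fluidPage(
  sliderInput("n.cases", "Cases", min = 50, max = 1000,
              value = c(50, 1000), step = 50),
  plotOutput("powHap"),
  tableOutput("powTable")
)

server <- function(input, output) {
  # Reactive expression: recomputed whenever an input it reads changes.
  mydata <- reactive({
    f_demo(expand_range(input$n.cases), 0.4, 2.25, 11, 0.05)
  })
  # Both outputs call mydata(); neither reads from output$...
  output$powTable <- renderTable(mydata())
  output$powHap <- renderPlot({
    ggplot(mydata(), aes(num.cases, power)) + geom_line() + geom_point()
  })
}

# shinyApp(ui, server)  # uncomment to launch the app
```

The table and the plot share one reactive, so f_demo() runs once per input change rather than once per output.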


R - create tomography plot with ggplot

I have the following graph:
set.seed(123456)
test1_1 <- round(rnorm(20,mean=40,sd=5),0)/100
test1_2 <- round(rnorm(20,mean=60,sd=5),0)/100
test.data <- as.data.frame(cbind(test1_1,test1_2))
test <- ggplot(test.data, aes(test1_1,test1_2))+
geom_point()+
scale_y_continuous(limits = c(0, 1)) +
scale_x_continuous(limits = c(0, 1)) + # OP missing `+`
abline(0.5,0.5)
test
Now I have points, which are created with the following formula:
line <- function(beta_2, test1_1,test1_2){
beta_1 = (test1_2/(1-test1_1))-(test1_1/(1-test1_1))*beta_2
return(beta_1)}
output1 <- as.data.frame(matrix(0,20,1))
beta_2 <- 1
for (i in 1:nrow(test.data)){
output1[i,] <- line(beta_2,test.data[i,1],test.data[i,2])
}
output2 <- as.data.frame(matrix(0,20,1))
beta_2 <- 0
for (i in 1:nrow(ei.data)){
output2[i,] <- line(beta_2,test.data[i,1],test.data[i,2])
}
output <- cbind(output1,output2)
I would like to add the data in the second data frame as lines to the plot created above (always connecting the points in one row). However, using
abline(output[1,1],output[1,2])
does not work. How could I achieve this?
abline() is base R graphics syntax; in ggplot2 you need geom_abline(), with the slope and intercept passed as named arguments:
ggplot(test.data, aes(test1_1,test1_2)) +
geom_point()+
scale_y_continuous(limits = c(0, 1))+
scale_x_continuous(limits = c(0, 1)) +
geom_abline(slope = output[1,1],intercept = output[1,2])
Note: your reproducible example had an error (ei.data was used instead of test.data), and part of your ggplot call was missing a + sign. I only point this out so that others (or you) do not get caught by a separate error without realizing it.
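To draw one line per row rather than only the first, the slopes and intercepts can be put in their own data frame and mapped inside aes(). This is a sketch only: lines_df is a name made up here, and the choice of which column is the slope and which the intercept simply follows the answer above, so swap or recompute them if your lines are parameterised differently.

```r
library(ggplot2)

set.seed(123456)
test.data <- data.frame(
  test1_1 = round(rnorm(20, mean = 40, sd = 5), 0) / 100,
  test1_2 = round(rnorm(20, mean = 60, sd = 5), 0) / 100
)

# Same formula as line() in the question, vectorised over rows.
line <- function(beta_2, x, y) (y / (1 - x)) - (x / (1 - x)) * beta_2

# One row per line to draw (hypothetical helper data frame).
lines_df <- data.frame(
  slope     = line(1, test.data$test1_1, test.data$test1_2),  # output[, 1]
  intercept = line(0, test.data$test1_1, test.data$test1_2)   # output[, 2]
)

p <- ggplot(test.data, aes(test1_1, test1_2)) +
  geom_point() +
  scale_x_continuous(limits = c(0, 1)) +
  scale_y_continuous(limits = c(0, 1)) +
  geom_abline(data = lines_df,
              aes(slope = slope, intercept = intercept),
              alpha = 0.3)
p
```

Because line() is vectorised, the two for loops from the question are not needed.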

Changing Color in ggplot2 Scatterplots

I'm attempting to modify some existing code that was originally from the question found here (https://stats.stackexchange.com/questions/76999/simulating-longitudinal-lognormal-data-in-r), and used to demonstrate Scatterplots in R at the following website: https://hopstat.wordpress.com/2014/10/30/my-commonly-done-ggplot2-graphs/
It's a simple and stupid question, but I've been struggling with it all morning. The following code gives a nice black and white scatterplot. I want to modify the code to make the lines a very light gray.
library(MASS)
library(nlme)
library(plyr)
library(ggplot2)
### set number of individuals
n <- 200
### average intercept and slope
beta0 <- 1.0
beta1 <- 6.0
### true autocorrelation
ar.val <- .4
### true error SD, intercept SD, slope SD, and intercept-slope cor
sigma <- 1.5
tau0 <- 2.5
tau1 <- 2.0
tau01 <- 0.3
### maximum number of possible observations
m <- 10
### simulate number of observations for each individual
p <- round(runif(n,4,m))
### simulate observation moments (assume everybody has 1st obs)
obs <- unlist(sapply(p, function(x) c(1, sort(sample(2:m, x-1,
replace=FALSE)))))
### set up data frame
dat <- data.frame(id=rep(1:n, times=p), obs=obs)
### simulate (correlated) random effects for intercepts and slopes
mu <- c(0,0)
S <- matrix(c(1, tau01, tau01, 1), nrow=2)
tau <- c(tau0, tau1)
S <- diag(tau) %*% S %*% diag(tau)
U <- mvrnorm(n, mu=mu, Sigma=S)
### simulate AR(1) errors and then the actual outcomes
dat$eij <- unlist(sapply(p, function(x) arima.sim(model=list(ar=ar.val),
n=x) * sqrt(1-ar.val^2) * sigma))
dat$yij <- (beta0 + rep(U[,1], times=p)) + (beta1 + rep(U[,2], times=p)) *
log(dat$obs) + dat$eij
dat = ddply(dat, .(id), function(x){
x$alpha = ifelse(runif(n = 1) > 0.9, 1, 0.1)
x$grouper = factor(rbinom(n=1, size =3 ,prob=0.5), levels=0:3)
x
})
tspag = ggplot(dat, aes(x=obs, y=yij)) +
geom_line() + guides(colour="none") + xlab("Observation Time Point") +
ylab("Y")
spag = tspag + aes(colour = factor(id))
spag
bwspag = tspag + aes(group=factor(id))
bwspag
I've tried scale_colour_manual, I've tried defining the color within the aes statement on the bwspag line...no luck. I'm relatively inexperienced with R. I appreciate any assistance!
Do you want the lines in grayscale? If so, setting colour in geom_line() (outside aes()) should be enough. For example:
ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_line(colour = "gray40")
You can choose other shades of gray, from "gray0" (black) to "gray100" (white). More info here.
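Applied to a spaghetti plot like the one in the question, the grouping stays inside aes() and the constant colour goes outside it. A small sketch on made-up data (toy and bwspag_gray are hypothetical names; the real dat from the question would work the same way):

```r
library(ggplot2)

# Toy longitudinal data: 5 individuals, 4 time points each (made-up values).
set.seed(1)
toy <- data.frame(
  id  = rep(1:5, each = 4),
  obs = rep(1:4, times = 5)
)
toy$yij <- toy$obs + rnorm(nrow(toy))

# group depends on the data, so it goes inside aes();
# colour is a constant, so it goes outside aes().
bwspag_gray <- ggplot(toy, aes(x = obs, y = yij, group = factor(id))) +
  geom_line(colour = "gray80") +
  xlab("Observation Time Point") +
  ylab("Y")
bwspag_gray
```

Putting colour = "gray80" inside aes() would instead create a one-level colour scale and a legend, which is probably why the earlier attempts did not behave as expected.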

ggplot2: How to plot an orthogonal regression line?

I have tested a large sample of participants on two different tests of visual perception – now, I'd like to see to what extent performance on both tests correlates.
To visualise the correlation, I plot a scatterplot in R using ggplot() and I fit a regression line (using stat_smooth()). However, since both my x and y variable are performance measures, I need to take both of them into account when fitting my regression line – thus, I cannot use a simple linear regression (using stat_smooth(method="lm")), but rather need to fit an orthogonal regression (or Total least squares). How would I go about doing this?
I know I can specify formula in stat_smooth(), but I wouldn't know what formula to use. From what I understand, none of the preset methods (lm, glm, gam, loess, rlm) are applicable.
It turns out that you can extract the slope and intercept from principal components analysis on (x,y), as shown here. This is just a little simpler, runs in base R, and gives the identical result to using Deming(...) in MethComp.
# same `x` and `y` as @user20650's answer
df <- data.frame(y, x)
pca <- prcomp(~x+y, df)
slp <- with(pca, rotation[2,1] / rotation[1,1])
int <- with(pca, center[2] - slp*center[1])
ggplot(df, aes(x,y)) +
geom_point() +
stat_smooth(method=lm, color="green", se=FALSE) +
geom_abline(slope=slp, intercept=int, color="blue")
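One way to check the PCA approach without any extra package is to compare it with the closed-form Deming slope for equal error variances (delta = 1), which is the orthogonal-regression case. The sketch below uses base R only; tls_pca() and tls_closed() are names made up for illustration, and the simulated data mirror the example used elsewhere in this thread.

```r
set.seed(1)
M <- runif(100, 0, 5)
x <- M + rnorm(100)
y <- 2 + 3 * M + rnorm(100, sd = 2)

# Slope and intercept from the first principal component.
tls_pca <- function(x, y) {
  pca <- prcomp(cbind(x = x, y = y))  # centred, not scaled (the defaults)
  slope <- unname(pca$rotation["y", 1] / pca$rotation["x", 1])
  c(intercept = mean(y) - slope * mean(x), slope = slope)
}

# Closed-form Deming slope with equal error variances (delta = 1).
tls_closed <- function(x, y) {
  s_xx <- var(x); s_yy <- var(y); s_xy <- cov(x, y)
  slope <- (s_yy - s_xx + sqrt((s_yy - s_xx)^2 + 4 * s_xy^2)) / (2 * s_xy)
  c(intercept = mean(y) - slope * mean(x), slope = slope)
}

fit_pca    <- tls_pca(x, y)
fit_closed <- tls_closed(x, y)
all.equal(fit_pca, fit_closed)
```

Both versions pass the line through the point of means, so only the slope needs to agree for the lines to coincide.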
Caveat: not familiar with this method
I think you should be able to just pass the slope and intercept to geom_abline to produce the fitted line. Alternatively, you could define your own method to pass to stat_smooth (as shown at the link smooth.Pspline wrapper for stat_smooth (in ggplot2)). I used the Deming function from the MethComp package as suggested at link How to calculate Total least squares in R? (Orthogonal regression).
library(MethComp)
library(ggplot2)
# Sample data and model (from ?Deming example)
set.seed(1)
M <- runif(100,0,5)
# Measurements:
x <- M + rnorm(100)
y <- 2 + 3 * M + rnorm(100,sd=2)
# Deming regression
mod <- Deming(x,y)
# Define functions to pass to stat_smooth - see mnel's answer at link for details
# Defined the Deming model output as class Deming to define the predict method
# I only used the intercept and slope for predictions - is this correct?
f <- function(formula,data,SDR=2,...){
M <- model.frame(formula, data)
d <- Deming(x =M[,2],y =M[,1], sdr=SDR)[1:2]
class(d) <- "Deming"
d
}
# an s3 method for predictdf (called within stat_smooth)
predictdf.Deming <- function(model, xseq, se, level) {
pred <- model %*% t(cbind(1, xseq) )
data.frame(x = xseq, y = c(pred))
}
ggplot(data.frame(x,y), aes(x, y)) + geom_point() +
stat_smooth(method = f, se= FALSE, colour='red', formula=y~x, SDR=1) +
geom_abline(intercept=mod[1], slope=mod[2], colour='blue') +
stat_smooth(method = "lm", se= FALSE, colour='green', formula = y~x)
So passing the intercept and slope to geom_abline produces the same fitted line (as expected). If this is the correct approach, then IMO it's easier to go with this.
The MethComp package seems to be no longer maintained (was removed from CRAN).
Russel88/COEF lets you use stat_smooth()/geom_smooth() with method = COEF::tls (as in the demonstration further down) to add an orthogonal regression line.
Based on this and the Wikipedia article on Deming regression, I created the following functions, which allow noise ratios other than 1:
deming.fit <- function(x, y, noise_ratio = sd(y)/sd(x)) {
if(missing(noise_ratio) || is.null(noise_ratio)) noise_ratio <- eval(formals(sys.function(0))$noise_ratio) # this is just a complicated way to write `sd(y)/sd(x)`
delta <- noise_ratio^2
x_name <- deparse(substitute(x))
s_yy <- var(y)
s_xx <- var(x)
s_xy <- cov(x, y)
beta1 <- (s_yy - delta*s_xx + sqrt((s_yy - delta*s_xx)^2 + 4*delta*s_xy^2)) / (2*s_xy)
beta0 <- mean(y) - beta1 * mean(x)
res <- c(beta0 = beta0, beta1 = beta1)
names(res) <- c("(Intercept)", x_name)
class(res) <- "Deming"
res
}
deming <- function(formula, data, R = 100, noise_ratio = NULL, ...){
ret <- boot::boot(
data = model.frame(formula, data),
statistic = function(data, ind) {
data <- data[ind, ]
args <- rlang::parse_exprs(colnames(data))
names(args) <- c("y", "x")
rlang::eval_tidy(rlang::expr(deming.fit(!!!args, noise_ratio = noise_ratio)), data, env = rlang::current_env())
},
R=R
)
class(ret) <- c("Deming", class(ret))
ret
}
predictdf.Deming <- function(model, xseq, se, level) {
pred <- as.vector(tcrossprod(model$t0, cbind(1, xseq)))
if(se) {
preds <- tcrossprod(model$t, cbind(1, xseq))
data.frame(
x = xseq,
y = pred,
ymin = apply(preds, 2, function(x) quantile(x, probs = (1-level)/2)),
ymax = apply(preds, 2, function(x) quantile(x, probs = 1-((1-level)/2)))
)
} else {
return(data.frame(x = xseq, y = pred))
}
}
# unrelated helper function to create a nicer plot:
fix_plot_limits <- function(p) p + coord_cartesian(xlim=ggplot_build(p)$layout$panel_params[[1]]$x.range, ylim=ggplot_build(p)$layout$panel_params[[1]]$y.range)
Demonstration:
library(ggplot2)
#devtools::install_github("Russel88/COEF")
library(COEF)
fix_plot_limits(
ggplot(data.frame(x = (1:5) + rnorm(100), y = (1:5) + rnorm(100)*2), mapping = aes(x=x, y=y)) +
geom_point()
) +
geom_smooth(method=deming, aes(color="deming"), method.args = list(noise_ratio=2)) +
geom_smooth(method=lm, aes(color="lm")) +
geom_smooth(method = COEF::tls, aes(color="tls"))
Created on 2019-12-04 by the reprex package (v0.3.0)
For anyone who is interested, I validated jhoward's solution against the deming::deming() function, as I was not familiar with jhoward's method of extracting the slope and intercept using PCA. They indeed produce identical results. Reprex is:
# Sample data and model (from ?Deming example)
set.seed(1)
M <- runif(100,0,5)
# Measurements:
x <- M + rnorm(100)
y <- 2 + 3 * M + rnorm(100,sd=2)
# Make data.frame()
df <- data.frame(x,y)
# Get intercept and slope using deming::deming()
library(deming)
mod_Dem <- deming::deming(y~x,df)
slp_Dem <- mod_Dem$coefficients[2]
int_Dem <- mod_Dem$coefficients[1]
# Get intercept and slope using jhoward's method
pca <- prcomp(~x+y, df)
slp_jhoward <- with(pca, rotation[2,1] / rotation[1,1])
int_jhoward <- with(pca, center[2] - slp_jhoward*center[1])
# Plot both orthogonal regression lines and simple linear regression line
library(ggplot2)
ggplot(df, aes(x,y)) +
geom_point() +
stat_smooth(method=lm, color="green", se=FALSE) +
geom_abline(slope=slp_jhoward, intercept=int_jhoward, color="blue", lwd = 3) +
geom_abline(slope=slp_Dem, intercept=int_Dem, color = "white", lwd = 2, linetype = 3)
Interestingly, if you switch the order of x and y in the models (i.e., to mod_Dem <- deming::deming(x~y,df) and pca <- prcomp(~y+x, df)) , you get completely different slopes:
My (very superficial) understanding of orthogonal regression was that it does not treat either variable as independent or dependent, and thus that the regression line should be unaffected by how the model is specified, e.g., as y~x vs x~y. Clearly I was very much mistaken, and I would be interested to hear anyone's thoughts about exactly why I was so wrong.
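For the PCA version at least, swapping the variable order does not change the fitted line, only the axis in which the slope is expressed: prcomp(~x+y) gives dy/dx, while prcomp(~y+x) gives dx/dy, so the two slopes are reciprocals. A base R sketch that checks this (pc_slope() is a made-up helper; whether deming::deming() behaves the same way under its default settings is not verified here):

```r
set.seed(1)
M <- runif(100, 0, 5)
df <- data.frame(x = M + rnorm(100), y = 2 + 3 * M + rnorm(100, sd = 2))

# Ratio of the first principal component's loadings:
# (second column's loading) / (first column's loading).
pc_slope <- function(data) {
  pca <- prcomp(data)
  unname(pca$rotation[2, 1] / pca$rotation[1, 1])
}

slp_yx <- pc_slope(df[, c("x", "y")])  # dy/dx: y as a function of x
slp_xy <- pc_slope(df[, c("y", "x")])  # dx/dy: x as a function of y

# Intercepts in each parameterisation (both lines pass through the means).
int_yx <- mean(df$y) - slp_yx * mean(df$x)
int_xy <- mean(df$x) - slp_xy * mean(df$y)

slp_yx * slp_xy              # product of reciprocal slopes
c(int_xy, -int_yx / slp_yx)  # same line: x = -a/b + y/b
```

So the numbers look completely different, but plotting x = int_xy + slp_xy * y on the same axes reproduces the line y = int_yx + slp_yx * x. Ordinary least squares does not have this property, because it minimises vertical distances only.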

How to plot a contour line showing where 95% of values fall within, in R and in ggplot2

Say we have:
x <- rnorm(1000)
y <- rnorm(1000)
How do I use ggplot2 to produce a plot containing the two following geoms:
The bivariate expectation of the two series of values
A contour line showing where 95% of the estimates fall within?
I know how to do the first part:
df <- data.frame(x=x, y=y)
p <- ggplot(df, aes(x=x, y=y))
p <- p + xlim(-10, 10) + ylim(-10, 10) # say
p <- p + geom_point(x=mean(x), y=mean(y))
And I also know about the stat_contour() and stat_density2d() functions within ggplot2.
And I also know that there are 'bins' options within stat_contour.
However, I guess what I need is something like the probs argument within quantile, but over two dimensions rather than one.
I have also seen a solution within the graphics package. However, I would like to do this within ggplot.
Help much appreciated,
Jon
Unfortunately, the accepted answer currently fails with Error: Unknown parameters: breaks on ggplot2 2.1.0. I cobbled together an alternative approach based on the code in this answer, which uses the ks package for computing the kernel density estimate:
library(ggplot2)
set.seed(1001)
d <- data.frame(x=rnorm(1000),y=rnorm(1000))
kd <- ks::kde(d, compute.cont=TRUE)
contour_95 <- with(kd, contourLines(x=eval.points[[1]], y=eval.points[[2]],
z=estimate, levels=cont["5%"])[[1]])
contour_95 <- data.frame(contour_95)
ggplot(data=d, aes(x, y)) +
geom_point() +
geom_path(aes(x, y), data=contour_95) +
theme_bw()
Here's the result:
TIP: The ks package depends on the rgl package, which can be a pain to compile manually. Even if you're on Linux, it's much easier to get a precompiled version, e.g. sudo apt install r-cran-rgl on Ubuntu if you have the appropriate CRAN repositories set up.
Riffing off of Ben Bolker's answer, a solution that can handle multiple levels and works with ggplot 2.2.1:
library(ggplot2)
library(MASS)
library(reshape2)
# create data:
set.seed(8675309)
Sigma <- matrix(c(0.1,0.3,0.3,4),2,2)
mv <- data.frame(mvrnorm(4000,c(1.5,16),Sigma))
# get the kde2d information:
mv.kde <- kde2d(mv[,1], mv[,2], n = 400)
dx <- diff(mv.kde$x[1:2]) # lifted from emdbook::HPDregionplot()
dy <- diff(mv.kde$y[1:2])
sz <- sort(mv.kde$z)
c1 <- cumsum(sz) * dx * dy
# specify desired contour levels:
prob <- c(0.95,0.90,0.5)
# plot:
dimnames(mv.kde$z) <- list(mv.kde$x,mv.kde$y)
dc <- melt(mv.kde$z)
dc$prob <- approx(sz,1-c1,dc$value)$y
p <- ggplot(dc,aes(x=Var1,y=Var2))+
geom_contour(aes(z=prob,color=..level..),breaks=prob)+
geom_point(aes(x=X1,y=X2),data=mv,alpha=0.1,size=1)
print(p)
The result:
This works, but is quite inefficient because you actually have to compute the kernel density estimate three times.
set.seed(1001)
d <- data.frame(x=rnorm(1000),y=rnorm(1000))
getLevel <- function(x,y,prob=0.95) {
kk <- MASS::kde2d(x,y)
dx <- diff(kk$x[1:2])
dy <- diff(kk$y[1:2])
sz <- sort(kk$z)
c1 <- cumsum(sz) * dx * dy
approx(c1, sz, xout = 1 - prob)$y
}
L95 <- getLevel(d$x,d$y)
library(ggplot2); theme_set(theme_bw())
ggplot(d,aes(x,y)) +
stat_density2d(geom="tile", aes(fill = ..density..),
contour = FALSE)+
stat_density2d(colour="red",breaks=L95)
(with help from http://comments.gmane.org/gmane.comp.lang.r.ggplot2/303)
update: with a recent version of ggplot2 (2.1.0) it doesn't seem possible to pass breaks to stat_density2d (or at least I don't know how), but the method below with geom_contour still seems to work ...
You can make things a little more efficient by computing the kernel density estimate once and plotting the tiles and contours from the same grid:
kk <- with(d,MASS::kde2d(x,y))
library(reshape2)
dimnames(kk$z) <- list(kk$x,kk$y)
dc <- melt(kk$z)
ggplot(dc,aes(x=Var1,y=Var2))+
geom_tile(aes(fill=value))+
geom_contour(aes(z=value),breaks=L95,colour="red")
Doing the 95% level computation from the kk grid (to reduce the number of kernel computations to one) is left as an exercise.
I'm not sure why stat_density2d(geom="tile") and geom_tile give slightly different results (the former is smoothed).
I haven't added the bivariate mean, but something like annotate("point",x=mean(d$x),y=mean(d$y),colour="red") should work.
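One possible take on the exercise above is to compute the level from an existing kde2d grid, so the kernel density estimate is only computed once. This is a sketch: level_from_grid() is a made-up name that reuses the logic of getLevel(), n = 100 is an arbitrary grid size, and expand.grid() stands in for reshape2::melt().

```r
library(MASS)
library(ggplot2)

set.seed(1001)
d <- data.frame(x = rnorm(1000), y = rnorm(1000))

# One kernel density estimate, reused for the level, the tiles and the contour.
kk <- MASS::kde2d(d$x, d$y, n = 100)

# Density level below which roughly (1 - prob) of the gridded mass lies;
# same logic as getLevel(), but applied to an existing grid.
level_from_grid <- function(kk, prob = 0.95) {
  cell <- diff(kk$x[1:2]) * diff(kk$y[1:2])  # area of one grid cell
  sz <- sort(kk$z)
  c1 <- cumsum(sz) * cell
  approx(c1, sz, xout = 1 - prob)$y
}
L95 <- level_from_grid(kk)

# Long format: x varies fastest, which matches as.vector(kk$z).
dc <- expand.grid(x = kk$x, y = kk$y)
dc$z <- as.vector(kk$z)

p <- ggplot(dc, aes(x, y)) +
  geom_tile(aes(fill = z)) +
  geom_contour(aes(z = z), breaks = L95, colour = "red") +
  annotate("point", x = mean(d$x), y = mean(d$y), colour = "red")
p
```

The grid only covers the range of the data, so the gridded mass sums to slightly less than 1 and the contour is an approximation of the 95% region.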
I had an example where the MASS::kde2d() bandwidth specifications were not flexible enough, so I ended up using the ks package and the ks::kde() function and, as an example, the ks::Hscv() function to estimate flexible bandwidths that captured the smoothness better. This computation can be a bit slow, but it has much better performance in some situations. Here is a version of the above code for that example:
set.seed(1001)
d <- data.frame(x=rnorm(1000),y=rnorm(1000))
getLevel <- function(x,y,prob=0.95) {
kk <- MASS::kde2d(x,y)
dx <- diff(kk$x[1:2])
dy <- diff(kk$y[1:2])
sz <- sort(kk$z)
c1 <- cumsum(sz) * dx * dy
approx(c1, sz, xout = 1 - prob)$y
}
L95 <- getLevel(d$x,d$y)
library(ggplot2); theme_set(theme_bw())
ggplot(d,aes(x,y)) +
stat_density2d(geom="tile", aes(fill = ..density..),
contour = FALSE)+
stat_density2d(colour="red",breaks=L95)
## using ks::kde
hscv1 <- ks::Hscv(d)
fhat <- ks::kde(d, H=hscv1, compute.cont=TRUE)
dimnames(fhat[['estimate']]) <- list(fhat[["eval.points"]][[1]],
fhat[["eval.points"]][[2]])
library(reshape2)
aa <- melt(fhat[['estimate']])
ggplot(aa, aes(x=Var1, y=Var2)) +
geom_tile(aes(fill=value)) +
geom_contour(aes(z=value), breaks=fhat[["cont"]]["50%"], color="red") +
geom_contour(aes(z=value), breaks=fhat[["cont"]]["5%"], color="purple")
For this particular example, the differences are minimal, but in an example where the bandwidth specification requires more flexibility, this modification may be important. Note that the 95% contour is specified using the breaks=fhat[["cont"]]["5%"], which I found a little bit counter-intuitive, because it is called here the "5% contour".
This just mixes the answers above, puts them in a more tidyverse-friendly form, and allows for multiple contour levels. Here I use geom_path(aes(group = prob)) and add the labels manually with geom_text. Another approach is geom_path(aes(colour = prob)), which labels the contours automatically in the legend.
library(ks)
library(tidyverse)
set.seed(1001)
## data
d <- MASS::mvrnorm(1000, c(0, 0.2), matrix(c(1, 0.4, 1, 0.4), ncol=2)) %>%
magrittr::set_colnames(c("x", "y")) %>%
as_tibble()
## density function
kd <- ks::kde(d, compute.cont=TRUE, h=0.2)
## extract results
get_contour <- function(kd_out=kd, prob="5%") {
contour_95 <- with(kd_out, contourLines(x=eval.points[[1]], y=eval.points[[2]],
z=estimate, levels=cont[prob])[[1]])
as_tibble(contour_95) %>%
mutate(prob = prob)
}
dat_out <- map_dfr(c("10%", "20%","80%", "90%"), ~get_contour(kd, .)) %>%
group_by(prob) %>%
mutate(n_val = 1:n()) %>%
ungroup()
## clean kde output
kd_df <- expand_grid(x=kd$eval.points[[1]], y=kd$eval.points[[2]]) %>%
mutate(z = c(kd$estimate %>% t))
ggplot(data=kd_df, aes(x, y)) +
geom_tile(aes(fill=z)) +
geom_point(data = d, alpha = I(0.4), size = I(0.4), colour = I("yellow")) +
geom_path(aes(x, y, group = prob),
data=filter(dat_out, !n_val %in% 1:3), colour = I("white")) +
geom_text(aes(label = prob), data =
filter(dat_out, (prob%in% c("10%", "20%","80%") & n_val==1) | (prob%in% c("90%") & n_val==20)),
colour = I("black"), size =I(3))+
scale_fill_viridis_c()+
theme_bw() +
theme(legend.position = "none")
Created on 2019-06-25 by the reprex package (v0.3.0)

How do I plot lines and points with limited points?

I am trying to replot the following figure in a more legible way. Observe that I am trying to plot both lines and points. However, the number of points being printed is way too many and the line is getting covered up. Is there a way I can plot:
Different lines for different datasets
Different points shapes for different datasets but limit the number of points to say 30-50
Add the line and point information to the legend
My plotting code is here (It was too big for SO)
Do you need something like this?
transData$Type2 <- factor(transData$Type, labels = c("Some Info for P", "Some Info for Q"))
ggplot(transData, aes(x=Value, y=ecd)) +
geom_line(aes(group=Type2,colour=Type2, linetype=Type2), size=1.5) +
geom_point(aes(shape = Type2), data = transData[round(seq(1, nrow(transData), length = 30)), ], size = 5) +
theme(legend.position = "top", legend.key.width = unit(3, "line"))
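Since transData is not shown in the question, here is a self-contained sketch of the same idea on synthetic data: draw the line from all rows, but the points from a thinned subset. make_ecdf() and thin_rows() are made-up helpers, and unlike the snippet above the thinning is done per group, so each dataset gets its own 30 points.

```r
library(ggplot2)

# Synthetic stand-in for transData: two empirical CDFs, 2000 points each.
set.seed(42)
make_ecdf <- function(v, type) {
  v <- sort(v)
  data.frame(Value = v, ecd = seq_along(v) / length(v), Type = type)
}
transData <- rbind(make_ecdf(rnorm(2000), "P"),
                   make_ecdf(rnorm(2000, 1), "Q"))

# Keep about n_keep evenly spaced rows for the point layer.
thin_rows <- function(d, n_keep = 30) {
  d[unique(round(seq(1, nrow(d), length.out = n_keep))), ]
}
few <- do.call(rbind, lapply(split(transData, transData$Type), thin_rows))

p <- ggplot(transData, aes(Value, ecd, colour = Type)) +
  geom_line(aes(linetype = Type)) +                     # all rows
  geom_point(aes(shape = Type), data = few, size = 3) + # thinned rows only
  theme(legend.position = "top")
p
```

Because colour, linetype and shape are all mapped to Type, ggplot2 merges them into a single legend that shows both the line and the point for each dataset.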
You can plot large, partially transparent points: the denser areas will appear darker.
p <- ggplot(transData, aes(x=Value, y=ecd, group=Type))
p +
geom_point(size=20, colour=rgb(0,0,0,.02)) +
geom_line(aes(colour=Type), size=3)
The following code adds points that are more or less evenly spaced, though they are not necessarily actual data points (they may be interpolated):
barbedize <- function(x, y, N=10, ...){
ind <- order(x)
x <- x[ind]
y <- y[ind]
lengths <- c(0, sqrt(diff(x)^2 + diff(y)^2))
l <- cumsum(lengths)
tl <- l[length(l)]
el <- seq(0, to=tl, length=N+1)[-1]
res <-
sapply(el[-length(el)], function(ii){
int <- findInterval(ii, l)
xx <- x[int:(int+1)]
yy <- y[int:(int+1)]
dx <- diff(xx)
dy <- diff(yy)
new.length <- ii - l[int]
segment.length <- lengths[int+1]
ratio <- new.length / segment.length
xend <- x[int] + ratio * dx
yend <- y[int] + ratio * dy
c(x=xend, y=yend)
})
as.data.frame(t(res))
}
library(plyr)
few_points <- ddply(transData, "Type", function(d, ...)
barbedize(d$Value, d$ecd, ...), N=10)
ggplot(transData, aes(x=Value, y=ecd)) +
geom_line(aes(group=Type,colour=Type, linetype=Type), size=1) +
geom_point(aes(x=x,y=y, colour=Type, shape=Type), data=few_points, size=3)
(this is a quick and dirty proof-of-principle, barbedize should be cleaned up and written more efficiently...)
