Combining LOESS and Quantreg to caculate percentiles/quantiles for data - r

I am trying to calculate percentiles or quantils for data which considerably scatter.
Using the Loess function the mean is nicely presented, however, I cannot get percentile/quantils from this function.
I tried to combine quantreg with loess. This plot shows linear curves instead of loess smoothed curves.
I would like to get a result similar to this:
data(cars)
plot(cars)
lmodel <- loess(cars$dist~cars$speed,span = 0.3, degree = 1)
lpred<-predict(lmodel, newdata= 5:25,se=TRUE)
lines(5:25, lpred$fit,col='#000066',lwd=4)
lines(5:25, lpred$fit - qt(0.975, lpred$df)*lpred$se, lty=2)
lines(5:25, lpred$fit + qt(0.975, lpred$df)*lpred$se, lty=2)
#### combination of quantreg with loess
plot(cars$speed,cars$dist)
xx <- seq(min(cars$speed),max(cars$speed),1)
f <- coef(rq(loess(cars$dist~cars$speed,span = 0.3, degree = 1), tau=c(0.1,0.25,0.5,0.75,0.9)) )
yy <- cbind(1,xx)%*%f
for(i in 1:length(taus)){
lines(xx,yy[,i],col = "gray")
}
I also tried the suggested code, however, I could not change the settings of the smoothing. The lines showed wavy path.
library(quantreg)
data(cars)
taus <- c(0.1, 0.25, 0.5, 0.75, 0.9)
lmodel <- loess(dist ~ speed, data = cars, span = 0.9, degree = 1)
rqmodel <- rq(lmodel, tau = taus, data = cars)
f <- coef(rqmodel)
xx <- seq(min(cars$speed), max(cars$speed), length.out = nrow(cars))
yy <- predict(rqmodel)
plot(cars)
matlines(xx, yy, col = "grey",lwd=3)
The Loess function does not provide data for quantiles as the rg would.
However, the Loess functions allows to get a curve without zigzag.
Please see the code snip. What would be the setting for tau=0.5 using the rg function to produce the same results compared with Loess function.
data(cars)
lmodel <- loess(dist ~ speed, data = cars, span = 0.9 )
plot(cars)
lines( x=4:25 , y=predict(lmodel, newdata= data.frame(speed=4:25)) ,col="Blue")

I believe the code in the question is mixing loess and quantile regressions when they are different methods and the latter does not need the former.
I will try to fit both and plot the respective results.
In the code below I will use matlines, not a for loop.
These code lines are common.
library(quantreg)
data(cars)
xx <- seq(min(cars$speed), max(cars$speed), length.out = nrow(cars))
First the loess model.
lmodel <- loess(dist ~ speed, data = cars, span = 0.5, degree = 1)
ls_yy <- predict(lmodel, se = TRUE)
ls_yy <- cbind(ls_yy$fit,
ls_yy$fit - 2*ls_yy$se.fit,
ls_yy$fit + 2*ls_yy$se.fit)
plot(cars)
matlines(xx, ls_yy, col = "darkgrey")
Now quantile regression.
taus <- c(0.1, 0.25, 0.5, 0.75, 0.9)
rqmodel <- rq(dist ~ speed, tau = taus, data = cars)
rq_yy <- predict(rqmodel)
plot(cars)
matlines(xx, rq_yy, col = "darkgrey")

The code below (taken from an "answer") is not correct and should not be included in a correct solution. This would provide a 95% confidence interval on a fit, and the probability that interval lands on the true trend line. It does not correspond to a quantile computed from the data within the span of this moving average. A normal based approximation as recommended would require multiplying ls_yy$se.fit by sqrt(ni) where ni is the number of observations in the particular span. Unfortunately loess does not return ni, so this is not a tenable solution unless the span covers the entire dataset and ni can be set to n and there is no heteroskedasticity.
data(cars)
plot(cars)
lmodel <- loess(dist ~ speed, data = cars, span = 0.5, degree = 1)
ls_yy <- predict(lmodel, se = TRUE)
#wrong - this does not denote quantiles for the input data:
ls_yy <- cbind(ls_yy$fit,
ls_yy$fit - 2*ls_yy$se.fit,
ls_yy$fit + 2*ls_yy$se.fit)
plot(cars)
matlines(xx, ls_yy, col = "darkgrey")
We can make this more obvious using a sample dataset with more observations. Samples 1 and 2 are identical, aside from their sample sizes (500 and 1500 observations), and therefore they should have very similar quantiles.
set.seed(1)
x1 = runif(500,0,10)
y1 = x1 + rnorm(length(x1))
x2 = runif(1500,0,10)
y2 = x1 + rnorm(length(x2))
dfpd = data.frame(x=1:9)
lmodel1 <- loess(y ~ x, data = data.frame(x=x1,y=y1), span = 0.5, degree = 1)
ls_yy1 <- predict(lmodel1, newdata=dfpd, se = TRUE)
lmodel2 <- loess(y ~ x, data = data.frame(x=x2,y=y2), span = 0.5, degree = 1)
ls_yy2 <- predict(lmodel2, newdata=dfpd, se = TRUE)
#the only difference between lmodel1 and lmodel2 is the number of observations
#the quantiles should be very similar, but their se values are a function of sample
#size and are thus quite different
ls_yy1$se
ls_yy2$se
ls_yy1$se / ls_yy2$se
We can see that the ratio of se values is around 60% which confirms that they cannot be used as-is for quantile calculations

Related

How do I get a smooth curve from a few data points, in R?

I am trying to plot the rate 1/t as it changes with mue. The code is given below and I have highlighted the relevant lines with input and output.
library("deSolve")
library("reshape")
library("tidyverse")
Fd <- data.frame()
MUES <- c(100, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 100010, 100020, 100050, 100060, 100080, 100090, 100100, 100500) # <------ THIS IS THE INPUT
for (i in 1:length(MUES)){
parameters <- c(tau = 0.005, tau_r = 0.0025, mui=0, Ve=0.06, Vi=-0.01, s=0.015, mue=MUES[i])
state <- c(X = 0.015, Y = 0)
Derivatives <-function(t, state, parameters) {
#cc <- signal(t)
with(as.list(c(state, parameters)),{
# rate of change
dX <- -(1/tau + mue - mui)*X + (Y-X)/tau_r + mue*Ve - mui*Vi
dY <- -Y/tau + (X-Y)/tau_r
# return the rate of change
list(c(dX, dY))
}) # end with(as.list ...
}
times <- seq(0, 0.1, by = 0.0001)
out <- ode(y = state, times = times, func = Derivatives, parms = parameters)
out.1 <- out %>%
as.data.frame() %>% summarise(d = min(times[Y >=0.015]))
Time <- out.1$d
localdf <- data.frame(t=Time, rate= 1/Time, input=MUES[i])
Fd <- rbind.data.frame(Fd, localdf)}. # <----- THIS IS THE DATAFRAME WITH OUTPUT AND INPUT
spline_int <- as.data.frame(spline(Fd$input, Fd$rate))
ggplot(Fd) +
geom_point(aes(x = input, y = rate), size = 3) +
geom_line(data = spline_int, aes(x = x, y = y))
The rate 1/t has a limiting value at 1276 and thats why I have taken quite a few values of mue in the end, to highlight this. I get a graph like this:
What I want is something like below, so I can highlight the fact that the rate 1/t doesn't grow to infinity and infact has a limiting value. The below figure is from the Python question.
How do I accomplish this in R? I have tried loess and splines and geom_smooth (but just with changing span), perhaps I am missing something obvious.
Splines are polynomials with multiple inflection points. It sounds like you instead want to fit a logarithmic curve:
# fit a logarithmic curve with your data
logEstimate <- lm(rate~log(input),data=Fd)
# create a series of x values for which to predict y
xvec <- seq(0,max(Fd$input),length=1000)
# predict y based on the log curve fitted to your data
logpred <- predict(logEstimate,newdata=data.frame(input=xvec))
# save the result in a data frame
# these values will be used to plot the log curve
pred <- data.frame(x = xvec, y = logpred)
ggplot() +
geom_point(data = Fd, size = 3, aes(x=input, y=rate)) +
geom_line(data = pred, aes(x=x, y=y))
Result:
I borrowed some of the code from this answer.

How to plot vector of bootstrapped slopes in ggplot2?

I've been using ggplot2 to plot the results of bootstrapping various statistical outputs such as correlation coefficients. Most recently, I bootstrapped the slope of a linear regression model. Here's how that looks using the plot() function from the graphics package:
plot(main="Relationship Between Eruption Length at Wait Time at \n
Old Faithful With Bootstrapped Regression Lines",
xlab = "Eruption Length (minutes)",
ylab = "Wait Time (minutes)",
waiting ~ eruptions,
data = faithful,
col = spot_color,
pch = 19)
index <- 1:nrow(faithful)
for (i in 1:10000) {
index_boot <- sample(index, replace = TRUE) #getting a boostrap sample (of indices)
faithful_boot <- faithful[index_boot, ]
# Fitting the linear model to the bootstrapped data:
fit.boot <- lm(waiting ~ eruptions, data = faithful_boot)
abline(fit.boot, lwd = 0.1, col = rgb(0, 0.1, 0.25, alpha = 0.05)) # Add line to plot
}
fit <- lm(waiting ~ eruptions, data=faithful)
abline(fit, lwd = 2.5, col = "blue")
That works, but it depends on a workflow where we first create a plot, then add the lines in a loop. I'd rather create a list of slopes with a function and then plot all of them in ggplot2.
For example, the function might look something like this:
set.seed(777) # included so the following output is reproducible
n_resample <- 10000 # set the number of times to resample the data
# First argument is the data; second is the number of resampled datasets
bootstrap <- function(df, n_resample) {
slope_resample <- matrix(NA, nrow = n_resample) # initialize vector
index <- 1:nrow(df) # create an index for supplied table
for (i in 1:n_resample) {
index_boot <- sample(index, replace = TRUE) # sample row numbers, with replacement
df_boot <- df[index_boot, ] # create a bootstrap sample from original data
a <- lm(waiting ~ eruptions, data=df_boot) # compute linear model
slope_resample[i] <- slope <- a$coefficients[2] # take the slope
}
return(slope_resample) # Return a vector of differences of proportion
}
bootstrapped_slopes <- bootstrap(faithful, 10000)
But how to get geom_line() or geom_smooth() to take the data from bootstrapped_slopes? Any assistance is much appreciated.
EDIT: More direct adaptation from the OP
For plotting, I presume you want both the slopes and the intercepts, so here's a modified bootstrap function:
bootstrap <- function(df, n_resample) {
# Note 2 dimensions here, for slope and intercept
slope_resample <- matrix(NA, 2, nrow = n_resample) # initialize vector
index <- 1:nrow(df) # create an index for supplied table
for (i in 1:n_resample) {
index_boot <- sample(index, replace = TRUE) # sample row numbers, with replacement
df_boot <- df[index_boot, ] # create a bootstrap sample from original data
a <- lm(waiting ~ eruptions, data=df_boot) # compute linear model
slope_resample[i, 1] <- slope <- a$coefficients[1] # take the slope
slope_resample[i, 2] <- intercept <- a$coefficients[2] # take the intercept
}
# Return a data frame with all the slopes and intercepts
return(as.data.frame(slope_resample))
}
Then run it and plot the lines from that data frame:
bootstrapped_slopes <- bootstrap(faithful, 10000)
library(dplyr); library(ggplot2)
ggplot(faithful, aes(eruptions, waiting)) +
geom_abline(data = bootstrapped_slopes %>%
sample_n(1000), # 10k lines look about the same as 1k, just darker and slower
aes(slope = V2, intercept = V1), #, group = id),
alpha = 0.01) +
geom_point(shape = 19, color = "red")
Alternative solution
This could also be done using modelr and broom to simplify some of the bootstrapping. Based on the main help example for modelr::bootstrap, we can do the following:
library(purrr); library(modelr); library(broom); library(dplyr)
set.seed(777)
# Creates bootstrap object with 10k extracts from faithful
boot <- modelr::bootstrap(faithful, 10000)
# Applies the linear regression to each
models <- map(boot$strap, ~ lm(waiting ~ eruptions, data = .))
# Extracts the model results into a tidy format
tidied <- map_df(models, broom::tidy, .id = "id")
# We just need the slope and intercept here
tidied_wide <- tidied %>% select(id, term, estimate) %>% spread(term, estimate)
ggplot(faithful, aes(eruptions, waiting)) +
geom_abline(data = tidied_wide %>%
sample_n(1000), # 10k lines look about the same as 1k, just darker and slower
aes(slope = eruptions, intercept = `(Intercept)`, group = id), alpha = 0.05) +
geom_point(shape = 19, color = "red") # spot_color wasn't provided in OP

Having several fits in one plot (in R)

I was wondering how I can modify the following code to have a plot something like
data(airquality)
library(quantreg)
library(ggplot2)
library(data.table)
library(devtools)
# source Quantile LOESS
source("https://www.r-statistics.com/wp-content/uploads/2010/04/Quantile.loess_.r.txt")
airquality2 <- na.omit(airquality[ , c(1, 4)])
#'' quantreg::rq
rq_fit <- rq(Ozone ~ Temp, 0.95, airquality2)
rq_fit_df <- data.table(t(coef(rq_fit)))
names(rq_fit_df) <- c("intercept", "slope")
#'' quantreg::lprq
lprq_fit <- lapply(1:3, function(bw){
fit <- lprq(airquality2$Temp, airquality2$Ozone, h = bw, tau = 0.95)
return(data.table(x = fit$xx, y = fit$fv, bw = paste0("bw=", bw), fit = "quantreg::lprq"))
})
#'' Quantile LOESS
ql_fit <- Quantile.loess(airquality2$Ozone, jitter(airquality2$Temp), window.size = 10,
the.quant = .95, window.alignment = c("center"))
ql_fit_df <- data.table(x = ql_fit$x, y = ql_fit$y.loess, bw = "bw=1", fit = "Quantile LOESS")
I want to have all these fits in a plot.
geom_quantile can calculate quantiles using the rq method internally, so we don't need to create the rq_fit_df separately. However, the lprq and Quantile LOESS methods aren't available within geom_quantile, so I've used the data frames you provided and plotted them using geom_line.
In addition, to include the rq line in the color and linetype mappings and in the legend we add aes(colour="rq", linetype="rq") as a sort of "artificial" mapping inside geom_quantile.
library(dplyr) # For bind_rows()
ggplot(airquality2, aes(Temp, Ozone)) +
geom_point() +
geom_quantile(quantiles=0.95, formula=y ~ x, aes(colour="rq", linetype="rq")) +
geom_line(data=bind_rows(lprq_fit, ql_fit_df),
aes(x, y, colour=paste0(gsub("q.*:","",fit),": ", bw),
linetype=paste0(gsub("q.*:","",fit),": ", bw))) +
theme_bw() +
scale_linetype_manual(values=c(2,4,5,1,1)) +
labs(colour="Method", linetype="Method",
title="Different methods of estimating the 95th percentile by quantile regression")

Plot a smooth and extrapolated curve using an nls model with several fitted parameters

I feel that I am close to finding the answer for my problem, but somehow I just cannot manage to do it. I have used nls function to fit 3 parameters using a rather complicated function describing fertilization success of eggs (y-axis) in a range of sperm concentrations (x-axis) (Styan's model [1], [2]). Fitting the parameters works fine, but I cannot manage to plot a smoothed extrapolated curve using predict function (see at the end of this post). I guess it is because I have used a value that was not fitted on x-axis. My question is how to plot a smoothed and extrapolated curve based on a model fitted with nls function
using non-fitted parameter on x-axis?
Here is an example:
library(ggplot2)
data.nls <- structure(list(S0 = c(0.23298, 2.32984, 23.2984, 232.98399, 2329.83993,
23298.39926), fert = c(0.111111111111111, 0.386792452830189,
0.158415841584158, 0.898648648648649, 0.616, 0.186440677966102
), speed = c(0.035161615379406, 0.035161615379406, 0.035161615379406,
0.035161615379406, 0.035161615379406, 0.035161615379406), E0 = c(6.86219803476946,
6.86219803476946, 6.86219803476946, 6.86219803476946, 6.86219803476946,
7.05624476582978), tau = c(1800, 1800, 1800, 1800, 1800, 1800
), B0 = c(0.000102758645352932, 0.000102758645352932, 0.000102758645352932,
0.000102758645352932, 0.000102758645352932, 0.000102758645352932
)), .Names = c("S0", "fert", "speed", "E0", "tau", "B0"), row.names = c(NA,
6L), class = "data.frame")
## Model S
modelS <- function(Fe, tb, Be) with (data.nls,{
x <- Fe*(S0/E0)*(1-exp(-B0*E0*tau))
b <- Fe*(S0/E0)*(1-exp(-B0*E0*tb))
x*exp(-x)+Be*(1-exp(-x)-(x*exp(-x)))*exp(-b)})
## Define starting values
start <- list(Fe = 0.2, tb = 0.1, Be = 0.1)
## Fit the model using nls
modelS.fitted <- nls(formula = fert ~ modelS(Fe, tb, Be), data = data.nls, start = start,
control=nls.control(warnOnly=TRUE,minFactor=1e-5),trace = T, lower = c(0,0,0),
upper = c(1, Inf, 1), algorithm = "port")
## Combine model parameters
model.data <- cbind(data.nls, data.frame(pred = predict(modelS.fitted)))
## Plot
ggplot(model.data) +
geom_point(aes(x = S0, y = fert), size = 2) +
geom_line(aes(x = S0, y = pred), lwd = 1.3) +
scale_x_log10()
I have tried following joran's example here, but it has no effect, maybe because I did not fit S0:
r <- range(model.data$S0)
S0.ext <- seq(r[1],r[2],length.out = 200)
predict(modelS.fitted, newdata = list(S0 = S0.ext))
# [1] 0.002871585 0.028289057 0.244399948 0.806316161 0.705116868 0.147974213
You function should have the parameters (S0,E0,B0,tau,Fe,tb,Be). nls will look for the parameters in the data.frame passed to its data argument and only try to fit those it doesn't find there (provided that starting values are given). No need for this funny with business in your function. (with shouldn't be used inside functions anyway. It's meant for interactive usage.) In predict newdata must contain all variables, that is S0,E0,B0, and tau.
Try this:
modelS <- function(S0,E0,B0,tau,Fe, tb, Be) {
x <- Fe*(S0/E0)*(1-exp(-B0*E0*tau))
b <- Fe*(S0/E0)*(1-exp(-B0*E0*tb))
x*exp(-x)+Be*(1-exp(-x)-(x*exp(-x)))*exp(-b)}
## Define starting values
start <- list(Fe = 0.2, tb = 0.1, Be = 0.1)
## Fit the model using nls
modelS.fitted <- nls(formula = fert ~ modelS(S0,E0,B0,tau,Fe, tb, Be), data = data.nls, start = start,
control=nls.control(warnOnly=TRUE,minFactor=1e-5),trace = T, lower = c(0,0,0),
upper = c(1, Inf, 1), algorithm = "port")
## Combine model parameters
model.data <- data.frame(
S0=seq(min(data.nls$S0),max(data.nls$S0),length.out=1e5),
E0=seq(min(data.nls$E0),max(data.nls$E0),length.out=1e5),
B0=seq(min(data.nls$B0),max(data.nls$B0),length.out=1e5),
tau=seq(min(data.nls$tau),max(data.nls$tau),length.out=1e5))
model.data$pred <- predict(modelS.fitted,newdata=model.data)
## Plot
ggplot(data.nls) +
geom_point(aes(x = S0, y = fert), size = 2) +
geom_line(data=model.data,aes(x = S0, y = pred), lwd = 1.3) +
scale_x_log10()
Obviously, this might not be what you want, since the function has multiple variables and more than one vary in new.data. Normally one would only vary one and keep the others constant for such a plot.
So this might be more appropriate:
S0 <- seq(min(data.nls$S0),max(data.nls$S0),length.out=1e4)
E0 <- seq(1,20,length.out=20)
B0 <- unique(data.nls$B0)
tau <- unique(data.nls$tau)
model.data <- expand.grid(S0,E0,B0,tau)
names(model.data) <- c("S0","E0","B0","tau")
model.data$pred <- predict(modelS.fitted,newdata=model.data)
## Plot
ggplot(model.data) +
geom_line(data=,aes(x = S0, y = pred, color=interaction(E0,B0,tau)), lwd = 1.3) +
geom_point(data=data.nls,aes(x = S0, y = fert), size = 2) +
scale_x_log10()

R Language - Sorting data into ranges; averaging; ignore outliers

I am analyzing data from a wind turbine, normally this is the sort of thing I would do in excel but the quantity of data requires something heavy-duty. I have never used R before and so I am just looking for some pointers.
The data consists of 2 columns WindSpeed and Power, so far I have arrived at importing the data from a CSV file and scatter-plotted the two against each other.
What I would like to do next is to sort the data into ranges; for example all data where WindSpeed is between x and y and then find the average of power generated for each range and graph the curve formed.
From this average I want recalculate the average based on data which falls within one of two standard deviations of the average (basically ignoring outliers).
Any pointers are appreciated.
For those who are interested I am trying to create a graph similar to this. Its a pretty standard type of graph but like I said the shear quantity of data requires something heavier than excel.
Since you're no longer in Excel, why not use a modern statistical methodology that doesn't require crude binning of the data and ad hoc methods to remove outliers: locally smooth regression, as implemented by loess.
Using a slight modification of csgillespie's sample data:
w_sp <- sample(seq(0, 100, 0.01), 1000)
power <- 1/(1+exp(-(w_sp -40)/5)) + rnorm(1000, sd = 0.1)
plot(w_sp, power)
x_grid <- seq(0, 100, length = 100)
lines(x_grid, predict(loess(power ~ w_sp), x_grid), col = "red", lwd = 3)
Throw this version, similar in motivation as #hadley's, into the mix using an additive model with an adaptive smoother using package mgcv:
Dummy data first, as used by #hadley
w_sp <- sample(seq(0, 100, 0.01), 1000)
power <- 1/(1+exp(-(w_sp -40)/5)) + rnorm(1000, sd = 0.1)
df <- data.frame(power = power, w_sp = w_sp)
Fit the additive model using gam(), using an adaptive smoother and smoothness selection via REML
require(mgcv)
mod <- gam(power ~ s(w_sp, bs = "ad", k = 20), data = df, method = "REML")
summary(mod)
Predict from our model and get standard errors of fit, use latter to generate an approximate 95% confidence interval
x_grid <- with(df, data.frame(w_sp = seq(min(w_sp), max(w_sp), length = 100)))
pred <- predict(mod, x_grid, se.fit = TRUE)
x_grid <- within(x_grid, fit <- pred$fit)
x_grid <- within(x_grid, upr <- fit + 2 * pred$se.fit)
x_grid <- within(x_grid, lwr <- fit - 2 * pred$se.fit)
Plot everything and the Loess fit for comparison
plot(power ~ w_sp, data = df, col = "grey")
lines(fit ~ w_sp, data = x_grid, col = "red", lwd = 3)
## upper and lower confidence intervals ~95%
lines(upr ~ w_sp, data = x_grid, col = "red", lwd = 2, lty = "dashed")
lines(lwr ~ w_sp, data = x_grid, col = "red", lwd = 2, lty = "dashed")
## add loess fit from #hadley's answer
lines(x_grid$w_sp, predict(loess(power ~ w_sp, data = df), x_grid), col = "blue",
lwd = 3)
First we will create some example data to make the problem concrete:
w_sp = sample(seq(0, 100, 0.01), 1000)
power = 1/(1+exp(-(rnorm(1000, mean=w_sp, sd=5) -40)/5))
Suppose we want to bin the power values between [0,5), [5,10), etc. Then
bin_incr = 5
bins = seq(0, 95, bin_incr)
y_mean = sapply(bins, function(x) mean(power[w_sp >= x & w_sp < (x+bin_incr)]))
We have now created the mean values between the ranges of interest. Note, if you wanted the median values, just change mean to median. All that's left to do, is to plot them:
plot(w_sp, power)
points(seq(2.5, 97.5, 5), y_mean, col=3, pch=16)
To get the average based on data that falls within two standard deviations of the average, we need to create a slightly more complicated function:
noOutliers = function(x, power, w_sp, bin_incr) {
d = power[w_sp >= x & w_sp < (x + bin_incr)]
m_d = mean(d)
d_trim = mean(d[d > (m_d - 2*sd(d)) & (d < m_d + 2*sd(d))])
return(mean(d_trim))
}
y_no_outliers = sapply(bins, noOutliers, power, w_sp, bin_incr)
Here are some examples of fitted curves (weibull analysis) for commercial turbines:
http://www.inl.gov/wind/software/
http://www.irec.cmerp.net/papers/WOE/Paper%20ID%20161.pdf
http://www.icaen.uiowa.edu/~ie_155/Lecture/Power_Curve.pdf
I'd recommend also playing around with Hadley's own ggplot2. His website is a great resource: http://had.co.nz/ggplot2/ .
# If you haven't already installed ggplot2:
install.pacakges("ggplot2", dependencies = T)
# Load the ggplot2 package
require(ggplot2)
# csgillespie's example data
w_sp <- sample(seq(0, 100, 0.01), 1000)
power <- 1/(1+exp(-(w_sp -40)/5)) + rnorm(1000, sd = 0.1)
# Bind the two variables into a data frame, which ggplot prefers
wind <- data.frame(w_sp = w_sp, power = power)
# Take a look at how the first few rows look, just for fun
head(wind)
# Create a simple plot
ggplot(data = wind, aes(x = w_sp, y = power)) + geom_point() + geom_smooth()
# Create a slightly more complicated plot as an example of how to fine tune
# plots in ggplot
p1 <- ggplot(data = wind, aes(x = w_sp, y = power))
p2 <- p1 + geom_point(colour = "darkblue", size = 1, shape = "dot")
p3 <- p2 + geom_smooth(method = "loess", se = TRUE, colour = "purple")
p3 + scale_x_continuous(name = "mph") +
scale_y_continuous(name = "power") +
opts(title = "Wind speed and power")

Resources