I have a simple dataset and I am trying to fit a power trendline to it. The sample data is very small and is as follows:
structure(list(Discharge = c(250, 300, 500, 700, 900), Downstream = c(0.3,
0.3, 0.3, 0.3, 0.3), Age = c(1.32026239202165, 1.08595138888889,
0.638899189814815, 0.455364583333333, 0.355935185185185)), .Names = c("Discharge",
"Downstream", "Age"), row.names = c(NA, 5L), class = "data.frame")
Data looks as follows:
> new
Discharge Downstream Age
1 250 0.3 1.3202624
2 300 0.3 1.0859514
3 500 0.3 0.6388992
4 700 0.3 0.4553646
5 900 0.3 0.3559352
I plotted the data with ggplot2:
ggplot(new) + geom_point(aes(x = Discharge, y = Age))
I can add a linear fit using geom_smooth(method = "lm"), but I am not sure what code I need to show a power curve instead. The output is as follows:
How can I add a power regression line, as Excel does? The Excel figure is shown below:
While mnel's answer is correct for a nonlinear least squares fit, note that Excel isn't actually doing anything nearly that sophisticated. It's really just log-transforming the response and predictor variables, and doing an ordinary (linear) least squares fit. To reproduce this in R, you would do:
lm(log(Age) ~ log(Discharge), data=df)
Call:
lm(formula = log(Age) ~ log(Discharge), data = df)
Coefficients:
(Intercept) log(Discharge)
5.927 -1.024
As a check, the coefficient for log(Discharge) is identical to that from Excel, while exp(5.927) ≈ 375.05 recovers Excel's leading coefficient.
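To recover the Excel-style equation explicitly (a sketch, using df for the question's data frame):
# back-transform the log-log fit into the power form Age = a * Discharge^b
m <- lm(log(Age) ~ log(Discharge), data = df)
a <- unname(exp(coef(m)[1]))  # ~ 375.05, the leading coefficient
b <- unname(coef(m)[2])       # ~ -1.024, the exponent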
While I'm not sure how to use this as a trendline in ggplot2, you can do it in base graphics thusly:
m <- lm(log(Age) ~ log(Discharge), data=df)
newdf <- data.frame(Discharge=seq(min(df$Discharge), max(df$Discharge), len=100))
plot(Age ~ Discharge, data=df)
lines(newdf$Discharge, exp(predict(m, newdf)))
text(600, .8, substitute(b0*x^b1, list(b0=exp(coef(m)[1]), b1=coef(m)[2])))
text(600, .75, substitute(plain("R-square: ") * r2, list(r2=summary(m)$r.squared)))
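If a ggplot2 version is wanted, one sketch (same model, same back-transform) is to predict on a fine grid and draw the curve with geom_line():
library(ggplot2)
m <- lm(log(Age) ~ log(Discharge), data = df)
newdf <- data.frame(Discharge = seq(min(df$Discharge), max(df$Discharge), len = 100))
newdf$Age <- exp(predict(m, newdf))  # back-transform to the original scale
ggplot(df, aes(Discharge, Age)) +
  geom_point() +
  geom_line(data = newdf)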
Use nls (nonlinear least squares) as your smoother, e.g.
ggplot(DD,aes(x = Discharge,y = Age)) +
geom_point() +
stat_smooth(method = 'nls', formula = 'y~a*x^b', start = list(a = 1,b=1),se=FALSE)
Noting Doug Bates's comments on R-squared values and non-linear models here, you could use the ideas in Adding Regression Line Equation and R2 on graph to append the regression line equation:
# note that you have to give nls sensible starting values
# (and I haven't worked out why the values passed to geom_smooth work!);
# the formula fits the same relationship as the smooth above
power_eqn <- function(df, start = list(a = 300, b = -1)) {
  m <- nls(Age ~ a * Discharge^b, start = start, data = df)
  eq <- substitute(italic(y) == a ~ italic(x)^b,
                   list(a = format(coef(m)[1], digits = 2),
                        b = format(coef(m)[2], digits = 2)))
  as.character(as.expression(eq))
}
ggplot(DD,aes(x = Discharge,y = Age)) +
geom_point() +
stat_smooth(method = 'nls', formula = 'y~a*x^b', start = list(a = 1,b=1),se=FALSE) +
geom_text(x = 600, y = 1, label = power_eqn(DD), parse = TRUE)
2018 Update:
The start argument now seems to be deprecated; it no longer appears in the stat_smooth documentation. If you want to choose starting values, you need to pass them via the method.args argument instead.
See the changes below:
ggplot(DD,aes(x = Discharge,y = Age)) +
geom_point() +
stat_smooth(method = 'nls', formula = 'y~a*x^b',
            method.args = list(start = c(a = 1, b = 1)), se = FALSE) +
geom_text(x = 600, y = 1, label = power_eqn(DD), parse = TRUE)
Related
I am running an analysis in R on the effect of canopy cover (OverheadCover) and the number of carcasses placed on the same location (CarcassNumber) on the proportion of carrion eaten by birds (ProportionBirdsScavenging). The interaction effect OverheadCover * CarcassNumber is significant and I would like visualise this using ggplot like explained here: https://sebastiansauer.github.io/vis_interaction_effects/. I won't be using method = "lm" like in the example, but method = glmmTMB::glmmTMB. I've added the extra arguments formula = and method.args = to make sure R computes the smooth correctly.
This is how it should look, but I prefer the graph to be made with ggplot because then all my graphs will be in the same style.
glmm_interaction <- glmmTMB(ProportionBirdsScavenging ~ OverheadCover * CarcassNumber + (1|Area), data = data_both, beta_family(link = "logit"), weights = pointWeight_scaled)
plot_model(glmm_interaction, type = "int", ci.lvl = 0.682) # conf. int. of 68.3% -> ± standard error
This is the code I'm trying to run, but I can't get it to work. It keeps giving me errors, like object 'pointWeight_scaled' not found. Does anyone have an idea what I'm doing wrong here?
qplot(x = OverheadCover, y = ProportionBirdsScavenging, color = CarcassNumber, data = data_both) +
geom_smooth(method = glmmTMB::glmmTMB,
formula = ProportionBirdsScavenging ~ OverheadCover * CarcassNumber,
method.args = list(data = data_both, beta_family(link = "logit"), weights = pointWeight_scaled))
I know that it might be easier to just run the models individually and plot them on the same graph. I've done that, and it works. However, my calculated standard errors are larger than the ones from plot_model(), so I wanted to see how the standard errors look if R does all the work, hence my intention to plot it this way.
This is how it should look, but I prefer the graph to be made with ggplot
The plot returned by plot_model() is a ggplot object, which you can modify as you like. You could also use the ggeffects package, which returns the underlying data that can be used to create the plot. There are many examples in the vignettes, both on how to create your own plots and on how to modify plots returned by plot(), e.g. here or here.
Here is a toy example:
library(ggplot2)
library(ggeffects)
library(lme4)
#> Loading required package: Matrix
set.seed(123)
dat <- data.frame(
outcome = rbinom(n = 500, size = 1, prob = 0.25),
var_binom = as.factor(rbinom(n = 500, size = 1, prob = 0.3)),
var_cont = rnorm(n = 500, mean = 10, sd = 3),
group = sample(letters[1:4], size =500, replace = TRUE)
)
model <- glmer(
outcome ~ var_binom * poly(var_cont, 2) + (1 | group),
data = dat,
family = binomial(link = "logit")
)
predictions <- ggpredict(model, c("var_cont [all]", "var_binom"))
# plot-function from ggeffects
plot(predictions)
# self made ggplot
ggplot(
predictions,
aes(x = x, y = predicted, ymin = conf.low, ymax = conf.high, colour = group, fill = group)
) +
geom_line() +
geom_ribbon(alpha = .1, colour = NA) +
theme_minimal()
Created on 2020-02-06 by the reprex package (v0.3.0)
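Applied to the model from the question, the same pattern would look roughly like this (an untested sketch, since data_both is not available; ggeffects supports glmmTMB models):
library(glmmTMB)
library(ggeffects)
# refit the model from the question, then let ggpredict build the plot data
glmm_interaction <- glmmTMB(
  ProportionBirdsScavenging ~ OverheadCover * CarcassNumber + (1 | Area),
  data = data_both, family = beta_family(link = "logit"),
  weights = pointWeight_scaled
)
predictions <- ggpredict(glmm_interaction, c("OverheadCover [all]", "CarcassNumber"))
plot(predictions)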
I am fitting exponential regressions in R. Specifically, I want to compare y = exp(ax + b) with y = 5^(ax + b).
# data
set.seed(1)
y <- c(3.5, 2.9, 2.97,4.58,6.18,7.11,9.50,9.81,10.17,10.53,
12.33,14.14,18, 22, 25, 39, 40, 55, 69, 72) + rnorm(20, 10, 1)
x <- 1:length(y)
df = data.frame(x = x, y = y)
predata = data.frame(x = 1:20)
# plot
plot(df, ylim = c(0,100), xlim = c(0,40))
# simple linear regression
fit_sr = lm(y~x, data = df)
pre_sr = predict(fit_sr, newdata = predata,
interval ='confidence',
level = 0.90)
lines(pre_sr[,1], col = "red")
# exponential regression 1
fit_er1 = lm(log(y, base = exp(1))~x, data = df)
pre_er1 = predict(fit_er1, newdata = predata,
interval ='confidence',
level = 0.90)
pre_er1 = exp(1)^pre_er1 # back-transform from the log scale
lines(pre_er1[,1], col = "dark green")
# exponential regression 2
fit_er2 = lm(log(y, base = 5) ~ x, data = df)
pre_er2 = predict(fit_er2, newdata = predata,
interval ='confidence',
level = 0.90)
pre_er2 = 5^pre_er2 # back-transform from the log-base-5 scale
lines(pre_er2[,1], col = "blue")
I expected something like this (plot1), but exponential regressions 1 and 2 turn out to be exactly the same (plot2).
plot1
plot2
The two regressions should be different because the y values are different.
Also, I am looking for how to fit y = exp(ax + b) + c in R.
Your code is correct; your theory is where the problem is. The models should be the same.
The easiest way to see this is to think on the log scale, as you've done in your code. Starting with y = exp(ax + b) we get log(y) = ax + b, a linear model with log(y) as the response. With y = 5^(cx + d), we get log(y) = (cx + d) * log(5) = (c*log(5)) * x + (d*log(5)), also a linear model with log(y) as the response. The model fit/predictions will not differ with a different base; you can transform the base-5 coefficients to base-e coefficients by multiplying them by log(5): a = c*log(5) and b = d*log(5).
It's a bit like wanting to compare the linear models y = ax + b where x is measured in meters vs y = ax + b where x is measured in centimeters. The coefficients will change to accommodate the scale, but the fit isn't any different.
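A quick numeric check with the fits from the question (rescaling the base-5 coefficients by log(5) should reproduce the natural-log ones):
coef(fit_er1)           # coefficients on the natural-log scale
coef(fit_er2) * log(5)  # base-5 coefficients rescaled; identical to the above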
The first part is already answered by #gregor, the second part "...I am looking for how to make y = exp(ax+b) + c fitting in R" can be done with nls:
fit_er3 <- nls(y ~ exp(a*x+b) + c, data = df, start=list(a=1,b=0,c=0))
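Assuming the fit converges with those starting values (if nls reports a singular gradient, a smaller start such as a = 0.1 may help), the curve can be added to the earlier plot:
pre_er3 <- predict(fit_er3, newdata = predata)
lines(predata$x, pre_er3, col = "purple")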
I feel that I am close to finding the answer to my problem, but somehow I just cannot manage it. I have used the nls function to fit 3 parameters of a rather complicated function describing the fertilization success of eggs (y-axis) over a range of sperm concentrations (x-axis) (Styan's model [1], [2]). Fitting the parameters works fine, but I cannot manage to plot a smoothed, extrapolated curve using the predict function (see the end of this post). I guess it is because I used a variable on the x-axis that was not fitted. My question is: how do I plot a smoothed, extrapolated curve based on a model fitted with nls, using a non-fitted variable on the x-axis?
Here is an example:
library(ggplot2)
data.nls <- structure(list(S0 = c(0.23298, 2.32984, 23.2984, 232.98399, 2329.83993,
23298.39926), fert = c(0.111111111111111, 0.386792452830189,
0.158415841584158, 0.898648648648649, 0.616, 0.186440677966102
), speed = c(0.035161615379406, 0.035161615379406, 0.035161615379406,
0.035161615379406, 0.035161615379406, 0.035161615379406), E0 = c(6.86219803476946,
6.86219803476946, 6.86219803476946, 6.86219803476946, 6.86219803476946,
7.05624476582978), tau = c(1800, 1800, 1800, 1800, 1800, 1800
), B0 = c(0.000102758645352932, 0.000102758645352932, 0.000102758645352932,
0.000102758645352932, 0.000102758645352932, 0.000102758645352932
)), .Names = c("S0", "fert", "speed", "E0", "tau", "B0"), row.names = c(NA,
6L), class = "data.frame")
## Model S
modelS <- function(Fe, tb, Be) with (data.nls,{
x <- Fe*(S0/E0)*(1-exp(-B0*E0*tau))
b <- Fe*(S0/E0)*(1-exp(-B0*E0*tb))
x*exp(-x)+Be*(1-exp(-x)-(x*exp(-x)))*exp(-b)})
## Define starting values
start <- list(Fe = 0.2, tb = 0.1, Be = 0.1)
## Fit the model using nls
modelS.fitted <- nls(formula = fert ~ modelS(Fe, tb, Be), data = data.nls, start = start,
control=nls.control(warnOnly=TRUE,minFactor=1e-5),trace = T, lower = c(0,0,0),
upper = c(1, Inf, 1), algorithm = "port")
## Combine model parameters
model.data <- cbind(data.nls, data.frame(pred = predict(modelS.fitted)))
## Plot
ggplot(model.data) +
geom_point(aes(x = S0, y = fert), size = 2) +
geom_line(aes(x = S0, y = pred), lwd = 1.3) +
scale_x_log10()
I have tried following joran's example here, but it has no effect, maybe because I did not fit S0:
r <- range(model.data$S0)
S0.ext <- seq(r[1],r[2],length.out = 200)
predict(modelS.fitted, newdata = list(S0 = S0.ext))
# [1] 0.002871585 0.028289057 0.244399948 0.806316161 0.705116868 0.147974213
Your function should take the parameters (S0, E0, B0, tau, Fe, tb, Be) as arguments. nls will look for the parameters in the data frame passed to its data argument and only try to fit those it doesn't find there (provided that starting values are given). There is no need for this funny with() business in your function. (with() shouldn't be used inside functions anyway; it's meant for interactive use.) In predict(), newdata must contain all variables, that is S0, E0, B0, and tau.
Try this:
modelS <- function(S0,E0,B0,tau,Fe, tb, Be) {
x <- Fe*(S0/E0)*(1-exp(-B0*E0*tau))
b <- Fe*(S0/E0)*(1-exp(-B0*E0*tb))
x*exp(-x)+Be*(1-exp(-x)-(x*exp(-x)))*exp(-b)}
## Define starting values
start <- list(Fe = 0.2, tb = 0.1, Be = 0.1)
## Fit the model using nls
modelS.fitted <- nls(formula = fert ~ modelS(S0,E0,B0,tau,Fe, tb, Be), data = data.nls, start = start,
control=nls.control(warnOnly=TRUE,minFactor=1e-5),trace = T, lower = c(0,0,0),
upper = c(1, Inf, 1), algorithm = "port")
## Combine model parameters
model.data <- data.frame(
S0=seq(min(data.nls$S0),max(data.nls$S0),length.out=1e5),
E0=seq(min(data.nls$E0),max(data.nls$E0),length.out=1e5),
B0=seq(min(data.nls$B0),max(data.nls$B0),length.out=1e5),
tau=seq(min(data.nls$tau),max(data.nls$tau),length.out=1e5))
model.data$pred <- predict(modelS.fitted,newdata=model.data)
## Plot
ggplot(data.nls) +
geom_point(aes(x = S0, y = fert), size = 2) +
geom_line(data=model.data,aes(x = S0, y = pred), lwd = 1.3) +
scale_x_log10()
Obviously, this might not be what you want, since the function has multiple variables and more than one varies in newdata. Normally one would vary only one and keep the others constant for such a plot.
So this might be more appropriate:
S0 <- seq(min(data.nls$S0),max(data.nls$S0),length.out=1e4)
E0 <- seq(1,20,length.out=20)
B0 <- unique(data.nls$B0)
tau <- unique(data.nls$tau)
model.data <- expand.grid(S0,E0,B0,tau)
names(model.data) <- c("S0","E0","B0","tau")
model.data$pred <- predict(modelS.fitted,newdata=model.data)
## Plot
ggplot(model.data) +
geom_line(aes(x = S0, y = pred, color = interaction(E0, B0, tau)), lwd = 1.3) +
geom_point(data=data.nls,aes(x = S0, y = fert), size = 2) +
scale_x_log10()
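To show a single curve instead, one could hold everything but S0 fixed at representative values taken from the data (a sketch):
model.data2 <- data.frame(
  S0  = seq(min(data.nls$S0), max(data.nls$S0), length.out = 1e4),
  E0  = median(data.nls$E0),   # representative value; E0 varies slightly in the data
  B0  = unique(data.nls$B0),   # constant in the data
  tau = unique(data.nls$tau))  # constant in the data
model.data2$pred <- predict(modelS.fitted, newdata = model.data2)
ggplot(data.nls) +
  geom_point(aes(x = S0, y = fert), size = 2) +
  geom_line(data = model.data2, aes(x = S0, y = pred), lwd = 1.3) +
  scale_x_log10()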
I have the following data in my thesis:
28 45
91 14
102 11
393 5
4492 1.77
I need to fit a curve to this data. If I plot it, this is what I get.
I think some kind of exponential curve should fit it. I am using gnuplot. Can someone tell me what kind of curve would fit and what initial parameters I could use?
Just in case R is an option, here's a sketch of two methods you might use.
First method: evaluate the goodness of fit of a set of candidate models
This is probably the best way as it takes advantage of what you might already know or expect about the relationship between the variables.
# read in the data
dat <- read.table(text= "x y
28 45
91 14
102 11
393 5
4492 1.77", header = TRUE)
# quick visual inspection
plot(dat); lines(dat)
# a smattering of possible models... just made up on the spot
# with more effort some better candidates should be added
models <- list(lm(y ~ x, data = dat),
lm(y ~ I(1 / x), data = dat),
lm(y ~ log(x), data = dat),
nls(y ~ I(1 / x * a) + b * x, data = dat, start = list(a = 1, b = 1)),
nls(y ~ (a + b * log(x)), data = dat, start = setNames(coef(lm(y ~ log(x), data = dat)), c("a", "b"))),
nls(y ~ I(exp(1) ^ (a + b * x)), data = dat, start = list(a = 0,b = 0)),
nls(y ~ I(1 / x * a) + b, data = dat, start = list(a = 1,b = 1))
)
# have a quick look at the visual fit of these models
library(ggplot2)
ggplot(dat, aes(x, y)) + geom_point(size = 5) +
stat_smooth(method = lm, formula = as.formula(models[[1]]), size = 1, se = FALSE, color = "black") +
stat_smooth(method = lm, formula = as.formula(models[[2]]), size = 1, se = FALSE, color = "blue") +
stat_smooth(method = lm, formula = as.formula(models[[3]]), size = 1, se = FALSE, color = "yellow") +
stat_smooth(method = nls, formula = as.formula(models[[4]]), data = dat, method.args = list(start = list(a = 0,b = 0)), size = 1, se = FALSE, color = "red", linetype = 2) +
stat_smooth(method = nls, formula = as.formula(models[[5]]), data = dat, method.args = list(start = setNames(coef(lm(y ~ log(x), data = dat)), c("a", "b"))), size = 1, se = FALSE, color = "green", linetype = 2) +
stat_smooth(method = nls, formula = as.formula(models[[6]]), data = dat, method.args = list(start = list(a = 0,b = 0)), size = 1, se = FALSE, color = "violet") +
stat_smooth(method = nls, formula = as.formula(models[[7]]), data = dat, method.args = list(start = list(a = 0,b = 0)), size = 1, se = FALSE, color = "orange", linetype = 2)
The orange curve looks pretty good. Let's see how it ranks when we measure the relative goodness of fit of these models...
# calculate the AIC and AICc (for small samples) for each
# model to see which one is best, ie has the lowest AIC
library(AICcmodavg); library(plyr); library(stringr)
ldply(models, function(mod){ data.frame(AICc = AICc(mod), AIC = AIC(mod), model = deparse(formula(mod))) })
AICc AIC model
1 70.23024 46.23024 y ~ x
2 44.37075 20.37075 y ~ I(1/x)
3 67.00075 43.00075 y ~ log(x)
4 43.82083 19.82083 y ~ I(1/x * a) + b * x
5 67.00075 43.00075 y ~ (a + b * log(x))
6 52.75748 28.75748 y ~ I(exp(1)^(a + b * x))
7 44.37075 20.37075 y ~ I(1/x * a) + b
# y ~ I(1/x * a) + b * x is the best model of those tried here for this curve
# it fits nicely on the plot and has the best goodness of fit statistic
# no doubt with a better understanding of nls and the data a better fitting
# function could be found. Perhaps the optimisation method here might be
# useful also: http://stats.stackexchange.com/a/21098/7744
Second method: use genetic programming to search a vast amount of models
This seems to be a kind of wild shot in the dark approach to curve-fitting. You don't have to specify much at the start, though perhaps I'm doing it wrong...
# symbolic regression using Genetic Programming
# http://rsymbolic.org/projects/rgp/wiki/Symbolic_Regression
library(rgp)
# this will probably take some time and throw
# a lot of warnings...
result1 <- symbolicRegression(y ~ x,
data=dat, functionSet=mathFunctionSet,
stopCondition=makeStepsStopCondition(2000))
# inspect results, they'll be different every time...
(symbreg <- result1$population[[which.min(sapply(result1$population, result1$fitnessFunction))]])
function (x)
tan((x - x + tan(x)) * x)
# quite bizarre...
# inspect visual fit
ggplot() + geom_point(data=dat, aes(x,y), size = 3) +
geom_line(data=data.frame(symbx=dat$x, symby=sapply(dat$x, symbreg)), aes(symbx, symby), colour = "red")
Actually a very poor visual fit. Perhaps there's a bit more effort required to get quality results from genetic programming...
Credits: Curve fitting answer 1, curve fitting answer 2 by G. Grothendieck.
Do you know some analytical function that the data should adhere to? If so, it could help you choose the form of the function to fit to the data.
Otherwise, since the data looks like exponential decay, try something like this in gnuplot, where a function with two free parameters is fitted to the data:
f(x) = exp(-x*c)*b
fit f(x) "data.dat" u 1:2 via b,c
plot "data.dat" w p, f(x)
Gnuplot will vary the parameters named after the 'via' clause to find the best fit. Statistics are printed to stdout, as well as to a file called 'fit.log' in the current working directory.
The c variable will determine the curvature (decay), while the b variable will scale all values linearly to get the correct magnitude of the data.
For more info, see the Curve fit section in the Gnuplot documentation.
I am analyzing data from a wind turbine. Normally this is the sort of thing I would do in Excel, but the quantity of data requires something heavy-duty. I have never used R before, so I am just looking for some pointers.
The data consists of 2 columns WindSpeed and Power, so far I have arrived at importing the data from a CSV file and scatter-plotted the two against each other.
What I would like to do next is sort the data into ranges: for example, all data where WindSpeed is between x and y, then find the average power generated for each range and graph the resulting curve.
From each bin's average I then want to recalculate the average using only the data that falls within two standard deviations of it (basically ignoring outliers).
Any pointers are appreciated.
For those who are interested, I am trying to create a graph similar to this. It's a pretty standard type of graph, but as I said the sheer quantity of data requires something heavier than Excel.
Since you're no longer in Excel, why not use a modern statistical methodology that doesn't require crude binning of the data and ad hoc methods to remove outliers: local regression, as implemented by loess.
Using a slight modification of csgillespie's sample data:
w_sp <- sample(seq(0, 100, 0.01), 1000)
power <- 1/(1+exp(-(w_sp -40)/5)) + rnorm(1000, sd = 0.1)
plot(w_sp, power)
x_grid <- seq(0, 100, length = 100)
lines(x_grid, predict(loess(power ~ w_sp), x_grid), col = "red", lwd = 3)
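The default span of 0.75 may over- or under-smooth for a given data set, so it is worth tuning (a sketch):
# smaller span = wigglier fit, larger span = smoother fit
lines(x_grid, predict(loess(power ~ w_sp, span = 0.3), x_grid), col = "darkgreen", lwd = 3)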
Throw this version, similar in motivation to #hadley's, into the mix: an additive model with an adaptive smoother, using package mgcv.
Dummy data first, as used by #hadley
w_sp <- sample(seq(0, 100, 0.01), 1000)
power <- 1/(1+exp(-(w_sp -40)/5)) + rnorm(1000, sd = 0.1)
df <- data.frame(power = power, w_sp = w_sp)
Fit the additive model using gam(), using an adaptive smoother and smoothness selection via REML
require(mgcv)
mod <- gam(power ~ s(w_sp, bs = "ad", k = 20), data = df, method = "REML")
summary(mod)
Predict from our model and get the standard errors of the fit; use the latter to generate an approximate 95% confidence interval:
x_grid <- with(df, data.frame(w_sp = seq(min(w_sp), max(w_sp), length = 100)))
pred <- predict(mod, x_grid, se.fit = TRUE)
x_grid <- within(x_grid, fit <- pred$fit)
x_grid <- within(x_grid, upr <- fit + 2 * pred$se.fit)
x_grid <- within(x_grid, lwr <- fit - 2 * pred$se.fit)
Plot everything, with the loess fit for comparison:
plot(power ~ w_sp, data = df, col = "grey")
lines(fit ~ w_sp, data = x_grid, col = "red", lwd = 3)
## upper and lower confidence intervals ~95%
lines(upr ~ w_sp, data = x_grid, col = "red", lwd = 2, lty = "dashed")
lines(lwr ~ w_sp, data = x_grid, col = "red", lwd = 2, lty = "dashed")
## add loess fit from #hadley's answer
lines(x_grid$w_sp, predict(loess(power ~ w_sp, data = df), x_grid), col = "blue",
lwd = 3)
First we will create some example data to make the problem concrete:
w_sp = sample(seq(0, 100, 0.01), 1000)
power = 1/(1+exp(-(rnorm(1000, mean=w_sp, sd=5) -40)/5))
Suppose we want to bin wind speed into [0,5), [5,10), etc., and average the power values within each bin. Then
bin_incr = 5
bins = seq(0, 95, bin_incr)
y_mean = sapply(bins, function(x) mean(power[w_sp >= x & w_sp < (x+bin_incr)]))
We have now created the mean values within the ranges of interest. Note that if you wanted the median values, just change mean to median. All that's left to do is plot them:
plot(w_sp, power)
points(seq(2.5, 97.5, 5), y_mean, col=3, pch=16)
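The same binned means can be computed a bit more idiomatically with cut() and tapply() (an equivalent sketch):
# cut() assigns each wind speed to a half-open bin, tapply() averages power per bin
bin_f <- cut(w_sp, breaks = seq(0, 100, 5), right = FALSE, include.lowest = TRUE)
bin_means <- tapply(power, bin_f, mean)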
To get the average based on data that falls within two standard deviations of the average, we need to create a slightly more complicated function:
noOutliers = function(x, power, w_sp, bin_incr) {
  d <- power[w_sp >= x & w_sp < (x + bin_incr)]
  m_d <- mean(d)
  # keep only the points within two standard deviations of the bin mean
  d_trim <- d[d > (m_d - 2*sd(d)) & d < (m_d + 2*sd(d))]
  return(mean(d_trim))
}
y_no_outliers = sapply(bins, noOutliers, power, w_sp, bin_incr)
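Overlaying these trimmed means on the earlier plot makes the effect of discarding outliers visible:
points(seq(2.5, 97.5, 5), y_no_outliers, col = 2, pch = 17)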
Here are some examples of fitted curves (Weibull analysis) for commercial turbines:
http://www.inl.gov/wind/software/
http://www.irec.cmerp.net/papers/WOE/Paper%20ID%20161.pdf
http://www.icaen.uiowa.edu/~ie_155/Lecture/Power_Curve.pdf
I'd also recommend playing around with Hadley's own ggplot2. His website is a great resource: http://had.co.nz/ggplot2/ .
# If you haven't already installed ggplot2:
install.packages("ggplot2", dependencies = TRUE)
# Load the ggplot2 package
require(ggplot2)
# csgillespie's example data
w_sp <- sample(seq(0, 100, 0.01), 1000)
power <- 1/(1+exp(-(w_sp -40)/5)) + rnorm(1000, sd = 0.1)
# Bind the two variables into a data frame, which ggplot prefers
wind <- data.frame(w_sp = w_sp, power = power)
# Take a look at how the first few rows look, just for fun
head(wind)
# Create a simple plot
ggplot(data = wind, aes(x = w_sp, y = power)) + geom_point() + geom_smooth()
# Create a slightly more complicated plot as an example of how to fine tune
# plots in ggplot
p1 <- ggplot(data = wind, aes(x = w_sp, y = power))
p2 <- p1 + geom_point(colour = "darkblue", size = 1, shape = 16) # 16 = solid circle
p3 <- p2 + geom_smooth(method = "loess", se = TRUE, colour = "purple")
p3 + scale_x_continuous(name = "mph") +
  scale_y_continuous(name = "power") +
  labs(title = "Wind speed and power")