R linear regression [closed] - r

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 9 years ago.
I have the formula y = x / (a+b*x) that I want to fit to the points (6,72) (211,183) (808,360) (200,440). I put them in R using
x <- c(6,211,808,200)
y <- c(72,183,360,440)
Now I want to fit the function defined above through these points and find a and b.
How do I get a and b using R? And how do I get the fitted formula in R?

Construct data:
x <- c(6,211,808,200)
y <- c(72,183,360,440)
d <- data.frame(x,y)
Plot the data: although sparse, they're not insane (they do show some evidence of an increasing/saturating pattern)
plot(y~x,data=d)
Fit the model:
## y = x/(a+b*x)
## 1/y = a/x + b
m1 <- glm(y~I(1/x),family=gaussian(link="inverse"),data=d)
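To read a and b off this fit: with the inverse link the model is 1/y = b + a*(1/x), so the intercept plays the role of b and the coefficient on I(1/x) plays the role of a. A minimal sketch, assuming the glm call above converges on these four points (the names a_hat and b_hat are just illustrative):

```r
x <- c(6, 211, 808, 200)
y <- c(72, 183, 360, 440)
d <- data.frame(x, y)

# Same model as above: 1/E[y] = b + a * (1/x)
m1 <- glm(y ~ I(1/x), family = gaussian(link = "inverse"), data = d)

cf <- coef(m1)
b_hat <- unname(cf["(Intercept)"])  # intercept corresponds to b
a_hat <- unname(cf["I(1/x)"])       # slope on 1/x corresponds to a

# The fitted values should match x / (a + b*x) with these estimates
cbind(fitted = fitted(m1), formula = x / (a_hat + b_hat * x))
```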
You can plot the results in ggplot
library("ggplot2")
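# In current ggplot2, model arguments such as family go in method.args
ggplot(d, aes(x, y)) + geom_point() + theme_bw() +
  geom_smooth(method = "glm", formula = y ~ I(1/x),
              method.args = list(family = gaussian(link = "inverse")), se = FALSE)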
The confidence intervals for this model are somewhat crazy (because the confidence intervals for 1/y include zero, at which point the confidence intervals on y blow up), so be careful ...

Get the data and plot it:
x <- c(6,211,808,200)
y <- c(72,183,360,440)
plot(x,y,pch=19)
Define the function, get your coefficients
f <- function(x,a,b) {x/(a+b*x)}
fit <- nls(y ~ f(x,a,b), start=c(a=1,b=1))
co <- coef(fit)
# co will contain your coefficients for a and b
# a b
#0.070221853 0.002796513
And plot away:
curve(f(x, a=co["a"], b=co["b"]), add = TRUE, col="green", lwd=2)
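If nls has trouble with start=c(a=1,b=1) on other data, one option is to get starting values from the linearized form 1/y = a/x + b. This is only a sketch: the linear fit on 1/y weights the points differently from nls, so it is used here just to seed the nonlinear fit, and the name start_vals is made up for the example.

```r
x <- c(6, 211, 808, 200)
y <- c(72, 183, 360, 440)
f <- function(x, a, b) x / (a + b * x)

# Linearized model: 1/y = b + a * (1/x); intercept -> b, slope -> a
lin <- lm(I(1/y) ~ I(1/x))
start_vals <- list(a = unname(coef(lin)[2]), b = unname(coef(lin)[1]))

# Refine with nonlinear least squares on the original scale
fit <- nls(y ~ f(x, a, b), start = start_vals)
co <- coef(fit)

# Predict y at a new x value from the fitted coefficients
y_new <- f(100, co["a"], co["b"])
```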

Related

Fit different regression models to polynomial data? [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 6 years ago.
I am an absolute beginner with R so please bear with me.
I have some generated polynomial (squared) data
x.training <- seq(0, 5, by=0.01) # x data
error.training <- rnorm(n=length(x.training), mean=0, sd=1) # Error (0, 1)
y.training <- x.training^2 + error.training # y data
I want to apply 3 different regression models to this data to demonstrate which one has a better fit. My 3 models are linear, polynomial, and trigonometric (cos).
I have tried the following but the lines either don't show up or are just straight lines. How could I go about applying these models properly?
Full code:
x.training <- seq(0, 5, by=0.01) # x data
error.training <- rnorm(n=length(x.training), mean=0, sd=1) # Error (0, 1)
y.training <- x.training^2 + error.training # y data
linear.model <- lm(y.training~x.training)
poly.model <- lm(y.training~poly(x.training, 2))
trig.model <- lm(y.training~cos(x.training))
linear.predict <- predict(linear.model)
poly.predict <- predict(poly.model)
trig.predict <- predict(trig.model)
plot(x.training, y.training)
lines(linear.predict, col="red")
lines(poly.predict, col="blue")
lines(trig.predict, col="green")
Absolutely simple mistake on my part. I feel silly.
lines(x.training, linear.predict, col="red")
lines(x.training, poly.predict, col="blue")
lines(x.training, trig.predict, col="green")
I wasn't feeding in any X coordinates, and predict only returns Y-hat.
Much better!
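A variation that avoids this class of mistake is to keep x and y in a data frame and predict on an explicit grid with newdata, so every prediction stays paired with its x value. This is a sketch rather than the original code; the names train and grid and the seed are assumptions made for the example.

```r
set.seed(1)  # arbitrary seed, only so the example is repeatable
train <- data.frame(x = seq(0, 5, by = 0.01))
train$y <- train$x^2 + rnorm(nrow(train), mean = 0, sd = 1)

linear.model <- lm(y ~ x, data = train)
poly.model   <- lm(y ~ poly(x, 2), data = train)
trig.model   <- lm(y ~ cos(x), data = train)

# Predict on a grid of x values; newdata must use the same column name, x
grid <- data.frame(x = seq(0, 5, length.out = 50))
grid$linear <- predict(linear.model, newdata = grid)
grid$poly   <- predict(poly.model, newdata = grid)
grid$trig   <- predict(trig.model, newdata = grid)

plot(train$x, train$y)
lines(grid$x, grid$linear, col = "red")
lines(grid$x, grid$poly, col = "blue")
lines(grid$x, grid$trig, col = "green")

# One way to compare the fits numerically: residual sum of squares per model
rss <- sapply(list(linear = linear.model, poly = poly.model, trig = trig.model),
              function(m) sum(residuals(m)^2))
```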

'Non-conformable arguments' in R code [closed]

Closed 4 years ago.
I previously wrote an R function, "LeastSquaresDegreeN.R", that computes a least-squares polynomial of arbitrary order to fit whatever data I put into it. The code works, because I can reproduce results I got previously. However, when I try to put new data into it I get a "Non-conformable arguments" error.
"Error in Conj(t(Q))%*%t(b) : non-conformable arguments"
An extremely simple example of data that should work:
t <- seq(1,100,1)
fifthDegree <- t^5
LeastSquaresDegreeN(t,fifthDegree,5)
This should output and plot a polynomial f(t) = t^5 (up to rounding errors).
However I get "Non-conformable arguments" error even if I explicitly make these vectors:
t <- as.vector(t)
fifthDegree <- as.vector(fifthDegree)
LeastSquaresDegreeN(t,fifthDegree,5)
I've tried putting in the transpose of these vectors too - but nothing works.
Surely the solution is really simple. Help!? Thank you!
Here's the function:
LeastSquaresDegreeN <- function(t, b, deg)
{
# Usage: t is independent variable vector, b is function data
# i.e., b = f(t)
# deg is desired polynomial order
# deg <- deg + 1 is a little adjustment to make the R loops index correctly.
deg <- deg + 1
t <- t(t)
dataSize <- length(b)
A <- mat.or.vec(dataSize, deg) # Built-in R function to create zero
# matrix or zero vector of arbitrary size
# Given basis phi(z) = 1 + z + z^2 + z^3 + ...
# Define matrix A
for (i in 0:deg-1) {
A[1:dataSize,i+1] = t^i
}
# Compute QR decomposition of A. Pull Q and R out of QRdecomp
QRdecomp <- qr(A)
Q <- qr.Q(QRdecomp, complete=TRUE)
R <- qr.R(QRdecomp, complete=TRUE)
# Perform Q^* b^T (Conjugate transpose of Q)
c <- Conj(t(Q))%*%t(b)
# Find x. R isn't square - so we have to use qr.solve
x <- qr.solve(R, c)
# Create xPlot (which is general enough to plot any degree
# polynomial output)
xPlot = x[1,1]
for (i in 1:deg-1){
xPlot = xPlot + x[i+1,1]*t^i
}
# Now plot it. Least squares "l" plot first, then the points in red.
plot(t, xPlot, type='l', xlab="independent variable t", ylab="function values f(t)", main="Data Plotted with Nth Degree Least Squares Polynomial", col="blue")
points(t, b, col="red")
} # End
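A likely cause, offered as an assumption because it depends on how b was passed when the function last worked: for a plain vector b, t(b) is a 1 x n row matrix, while Q (with complete=TRUE) is n x n, so Conj(t(Q)) %*% t(b) cannot be multiplied for n > 1. Separately, 0:deg-1 parses as (0:deg)-1, so the loops start at -1 rather than 0. The sketch below is not the original function; fit_poly_qr is a hypothetical helper that builds the design matrix in one step and lets qr.solve do the least-squares solve.

```r
# Hypothetical helper (not the original LeastSquaresDegreeN):
# least-squares polynomial fit of degree `deg` via a QR decomposition.
fit_poly_qr <- function(t, b, deg) {
  t <- as.numeric(t)
  b <- as.numeric(b)
  # Design matrix with columns t^0, t^1, ..., t^deg
  A <- outer(t, 0:deg, `^`)
  # For a non-square A, qr.solve() returns the least-squares solution
  qr.solve(A, b)
}

# Small, well-conditioned example: data generated from 2 + 3t - t^2
tt <- seq(0, 1, by = 0.1)
coefs <- fit_poly_qr(tt, 2 + 3 * tt - tt^2, 2)
```

Note that powers of t up to t^5 on 1:100 give a very badly conditioned matrix, so qr.solve may report a singular system for the t^5 example unless t is rescaled first.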

R survey confidence interval plots [closed]

Closed 10 years ago.
Similarly to barplot and dotchart (from the survey package), barNest (plotrix package) was meant to produce plots for svyby objects on the fly, but it also plotted confidence intervals. However, barNest.svymean no longer works on survey data. An alternative would be to plot confidence intervals on top of the survey plotting function dotchart
library(survey)
data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
#just one variable
a<-svyby(~api99, ~stype, dclus1, svymean)
#several variables
b<-svyby(~api99+api00, ~stype, dclus1, svymean)
dotchart(b)
although I'm not sure how you'd do that. If anyone works this out, it would be really good to automate it (by writing code that applies to svyby objects of different sizes) and maybe even incorporate it into dotchart.svystat {survey}. It would make graphical comparison among groups much easier! The standard errors can be extracted from b or using SE(b).
right so you're trying to use an object class (svyby) in a function (barNest) that doesn't know how to handle that class, because the survey package and the plotrix package don't play together too nicely. luckily the dotchart method for svyby objects isn't too much code, so you might just want to modify it..
# run your code above, then review the dotchart method for svyby objects:
getS3method( 'dotchart' , 'svyby' )
..and from that you can learn it's really not much beyond calling the original dotchart function (that is, not using the svyby object, just a regular collection of statistics), after converting the data contained in your b object to a matrix. now all you have left to do is add a confidence interval line.
the confidence interval widths are easily obtained (easier than using SE(b)) by running
confint( b )
can you extract those statistics to build your own barNest or plotCI call?
if it's important to put confidence intervals on a dotchart, the major hurdle is hitting the y coordinates correctly. dig around in the dotchart default method..
getS3method( 'dotchart' , 'default' )
..and you can see how the y coordinates are calculated. whittled down to just the essentials, i think you can use this:
# calculate the distinct groups within the `svyby` object
groups <- as.numeric( as.factor( attr( b , 'row.names' ) ) )
# calculate the distinct statistics within the `svyby` object
nstats <- attr( b , 'svyby' )$nstats
# calculate the total number of confidence intervals you need to add
n <- length( groups ) * nstats
# calculate the offset sizes
offset <- cumsum(c(0, diff(groups) != 0))
# find the exact y coordinates for each dot in the dotchart
# and leave two spaces between each group
y <- 1L:n + sort( rep( 2 * offset , nstats ) )
# find the confidence interval positions
ci.pos <-
rep( groups , each = nstats ) +
c( 0 , length( groups ) )
# extract the confidence intervals
x <- confint( b )[ ci.pos , ]
# add the y coordinates to a new line data object
ld <- data.frame( x )
# loop through each dot in the dotchart..
for ( i in seq_len( nrow( ld ) ) ){
# add the CI lines to the current plot
lines( ld[ i , 1:2 ] , rep( y[i] , 2 ) )
}
but that's obviously clunky since the confidence intervals are allowed to go way off the screen. ignoring the svyby class and even the whole survey package for a second, find us an implementation of dotchart that formats confidence intervals nicely, and we may be able to help you more. i don't think the survey package is the root of your problem :)
Adding a new dotchart plot (with min and max) to Anthony's last bit (from ld<-data.frame(x)) solves the problem he outlined.
ld <- data.frame( x )
dotchart(b,xlim=c(min(ld),max(ld)))#<-added
for ( i in seq_len( nrow( ld ) ) ){
lines( ld[ i , 1:2 ] , rep( y[i] , 2 ) )
}
However, I agree with Anthony: the plot doesn't look great. Many thanks to Anthony for sharing his knowledge and programming skills. The confidence intervals also look asymmetrical (which might be right), particularly for M api00. Has anyone compared this with other software? Should confint specify a df (degrees of freedom)?
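For the simpler ungrouped case the y coordinates are easy: the default dotchart method places the i-th value at y = i. The sketch below uses made-up numbers, not survey output, and the names est, lower and upper are illustrative only. With a svyby object the bounds would come from confint(b) instead.

```r
# Made-up illustrative estimates and interval bounds (NOT real survey results)
est   <- c(E = 607, H = 596, M = 609)
lower <- est - 30
upper <- est + 30

# Widen the x axis to the full range of the intervals so none run off the plot
dotchart(est, xlim = range(lower, upper), pch = 19)

# Without groups, dotchart() draws the i-th value at y = i
y_pos <- seq_along(est)
segments(lower, y_pos, upper, y_pos)
```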

Given a sample of random variables, and n, how do I find the ecdf of the sum of n Xs? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I can't fit X to a common distribution so currently I just have X ~ ecdf(sample_data).
How do I calculate the empirical distribution of sum(X1 + ... + Xn), given n? X1 to Xn are iid.
To estimate the distribution of that sum, you can repeatedly sample with replacement (and then take the sum of) n variates from sample_data. (sample() places equal probability mass on each element of sample_data, just as the ecdf does, so you don't need to calculate ecdf(sample_data) as an intermediate step.)
# Create some example data
sample_data <- runif(100)
n <- 10
X <- replicate(1000, sum(sample(sample_data, size=n, replace=TRUE)))
# Plot the estimated distribution of the sum of n variates.
hist(X, breaks=40, col="grey", main=expression(sum(x[i], i==1, n)))
box(bty="l")
# Plot the ecdf of the sum
plot(ecdf(X))
First, generalize and simplify: solve for step function CDFs X and Y, independent but not identically distributed. For every step jump x_i and every step jump y_j, there will be a corresponding step jump at x_i + y_j in the CDF of X + Y. So the CDF of X + Y will be characterized by the list:
sorted(x + y for x in X for y in Y)
That means if there are k points in X's CDF, there will be k^n in (X1 + ... + Xn). We can cut that down to a manageable number at the end by throwing away all but k again, but clearly the intermediate calculations will be costly in time and space.
Also, note that even though the original CDF is an ECDF for X, the result will not be an ECDF for (X1 + ... + Xn), even if you keep all k^n points.
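In R, the list of pairwise sums can be built with outer(). The sketch below is only practical for tiny k and n, since the result has k^n entries; sum_support is a hypothetical helper name and the three data values are made up for illustration.

```r
# Tiny made-up sample, k = 3 points
x <- c(1, 2, 4)

# All k^2 pairwise sums for X1 + X2, each carrying probability mass 1/k^2
s2 <- sort(as.vector(outer(x, x, "+")))

# Hypothetical helper: all k^n sums for X1 + ... + Xn (grows as k^n)
sum_support <- function(x, n) {
  Reduce(function(acc, xi) as.vector(outer(acc, xi, "+")), rep(list(x), n))
}

s3 <- sum_support(x, 3)

# Every combination is equally likely, so the step function over all
# k^n sums is the exact CDF of the sum under the empirical distribution
F3 <- ecdf(s3)
```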
In conclusion, use Josh's solution.

Calculate error, MSE and MAPE? [closed]

Closed 11 years ago.
I created this program to estimate the Mean Squared Error (MSE), and Mean absolute percent error (MAPE):
Is everything all right with this?
pune is a .csv file with 22 data points.
pune <- read.csv("C:/Users/ervis/Desktop/Te dhenat e konsum energji/pune.csv", header=T,dec=",", sep=";")
pune <- data.matrix(pune,rownames.force=NA)
m1 <- seq(from = 14274.19, to = 14458.17, length.out = 10000)
MSE1 <- numeric(length = 10000)
for(i in seq_along(MSE1)) {
MSE1[i] <- 1 / length(pune) * sum((pune-m1[i]) ^ 2)
}
MAPE1 <- numeric(length = 10000)
for(i in seq_along(MAPE1)) {
MAPE1[i] <- 1 / length(pune) * sum(abs((pune-m1[i]) / pune))
}
Am I right?
Mean squared error seems to have different meanings in different contexts.
For a random sample taken from a population, the MSE of the sample mean is just the variance divided by the number of samples, i.e.,
mse <- function(sample_mean) var(sample_mean) / length(sample_mean)
mse(pune)
For regressions, MSE means the sum of squares of residuals divided by the degrees of freedom of those residuals.
mse.lm <- function(lm_model) sum(residuals(lm_model) ^ 2) / lm_model$df.residual
#or
mse.lm <- function(lm_model) summary(lm_model)$sigma ^ 2
Seems like a lot of code for a simple calculation. Here is how I would do it for a data vector a:
a = c(1:10)
mse_a = sum((a - mean(a)) ^ 2) / length(a)
From what I can see your formula for MSE is correct, but there should only be one value for the whole dataset, not multiple values.
If your data only contains 22 points, I can't see why you need to create a 10,000 item vector, regardless of whether you are using loops or not.
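If the goal is to score a set of forecasts against observed values, both measures reduce to one line each and return a single number. A minimal sketch, assuming paired vectors of observed and predicted values; the function names and the three data points are made up for illustration, and MAPE is returned as a fraction (multiply by 100 for a percentage).

```r
# Mean squared error between observed and predicted values
mse <- function(actual, predicted) mean((actual - predicted)^2)

# Mean absolute percentage error, as a fraction; undefined if any actual == 0
mape <- function(actual, predicted) mean(abs((actual - predicted) / actual))

# Made-up example values
actual    <- c(100, 200, 400)
predicted <- c(110, 180, 400)

mse_value  <- mse(actual, predicted)
mape_value <- mape(actual, predicted)
```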
