Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 6 years ago.
I am an absolute beginner with R so please bear with me.
I have some generated polynomial (squared) data
x.training <- seq(0, 5, by=0.01) # x data
error.training <- rnorm(n=length(x.training), mean=0, sd=1) # Error (0, 1)
y.training <- x.training^2 + error.training # y data
I want to apply 3 different regression models to this data to demonstrate which one has a better fit. My 3 models are linear, polynomial, and trigonometric (cos).
I have tried the following but the lines either don't show up or are just straight lines. How could I go about applying these models properly?
Full code:
x.training <- seq(0, 5, by=0.01) # x data
error.training <- rnorm(n=length(x.training), mean=0, sd=1) # Error (0, 1)
y.training <- x.training^2 + error.training # y data
linear.model <- lm(y.training~x.training)
poly.model <- lm(y.training~poly(x.training, 2))
trig.model <- lm(y.training~cos(x.training))
linear.predict <- predict(linear.model)
poly.predict <- predict(poly.model)
trig.predict <- predict(trig.model)
plot(x.training, y.training)
lines(linear.predict, col="red")
lines(poly.predict, col="blue")
lines(trig.predict, col="green")
Absolutely simple mistake on my part. I feel silly.
lines(x.training, linear.predict, col="red")
lines(x.training, poly.predict, col="blue")
lines(x.training, trig.predict, col="green")
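I wasn't feeding in any x coordinates, and predict() only returns the fitted values (y-hat). Given a single vector, lines() plots it against its index (1 to 501), which falls almost entirely outside the 0 to 5 range of the plot.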
Much better!
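To actually show which model fits better, the three fits can also be compared numerically. The following is a minimal sketch, not part of the original answer: the data frame `d`, the seed, and the choice of AIC and residual standard error as yardsticks are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
d <- data.frame(x = seq(0, 5, by = 0.01))
d$y <- d$x^2 + rnorm(nrow(d), mean = 0, sd = 1)

models <- list(
  linear = lm(y ~ x, data = d),
  poly   = lm(y ~ poly(x, 2), data = d),
  trig   = lm(y ~ cos(x), data = d)
)

# Lower AIC and lower residual standard error both indicate a better fit
fit.summary <- data.frame(
  AIC   = sapply(models, AIC),
  sigma = sapply(models, sigma)
)
print(fit.summary)

# Passing newdata to predict() keeps x and y-hat paired explicitly
grid <- data.frame(x = seq(0, 5, length.out = 200))
cols <- c(linear = "red", poly = "blue", trig = "green")
plot(d$x, d$y, col = "grey")
for (nm in names(models)) {
  lines(grid$x, predict(models[[nm]], newdata = grid), col = cols[[nm]])
}
```

Fitting with a data frame and predicting on an explicit grid avoids the original mistake, because the x values always travel with the predictions.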
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
In R, I am generating uncorrelated values in the time domain with rnorm(). I then apply fft() to these values; however, the mean amplitude of the resulting spectrum is only about 0.88 instead of 1. Is there something I am not aware of?
Here is a MWE:
# dt <- 0.01 # time step
nSteps <- 100000 # Number of time steps
# df <- 1/(nSteps*dt) # frequency resolution
# t <- 0:(nSteps-1)*dt #
y <- rnorm(nSteps, mean=0, sd=1) # generate uncorrelated data. Should result in a white noise spectrum with sd=1
y_sq_sum <- sum(y^2)
# We ignore cutting to the Nyquist frequency.
# f <- 0:(nSteps-1)*df
fft_y <- abs(fft(y))/sqrt(length(y))
fft_y_sq_sum <- sum(fft_y^2)
print(paste("Check for Parseval's theorem: y_sq_sum = ", y_sq_sum, "; fft_y_sq_sum = ", fft_y_sq_sum, sep=""))
print(paste("Mean amplitude of my fft spectrum: ", mean(fft_y)))
print(paste("The above is typically around 0.88, why is it not 1?"))
This question doesn't belong on Stack Overflow; it's more of a Cross Validated kind of thing. But here's an answer anyway:
Parseval's theorem says that the mean of fft_y^2 equals the mean of y^2, which is about 1 here. The square root is a concave function, so Jensen's inequality says the mean of sqrt(fft_y^2) will be less than sqrt(1) = 1. Since fft_y is non-negative in your definition, fft_y = sqrt(fft_y^2), so its mean must come out below 1.
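The argument can be checked numerically. This is a small sketch rather than part of the original answer; the seed and variable names are arbitrary. It also relies on an extra fact not stated above: for Gaussian white noise with unit variance, the normalized amplitudes are approximately Rayleigh distributed, whose mean is sqrt(pi)/2, which would explain the observed 0.88.

```r
set.seed(42)  # arbitrary seed
n <- 100000
y <- rnorm(n, mean = 0, sd = 1)

amp <- abs(fft(y)) / sqrt(n)  # same normalization as in the question

mean.power <- mean(amp^2)  # Parseval: equals mean(y^2), close to 1
mean.amp   <- mean(amp)    # Jensen: strictly smaller than sqrt(mean.power)

print(c(mean.power = mean.power, mean.y2 = mean(y^2)))
print(c(mean.amp = mean.amp, rayleigh.mean = sqrt(pi) / 2))
```

So the spectrum is normalized correctly; it is the mean of the squared amplitude, not of the amplitude itself, that should be 1.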
Closed 3 years ago.
I have the following problem. I have a time series of 2659 observations. I need to perform a statistical test (a runs test) over a sliding window of length 256, extract the p-value from each test, and gather the p-values into a time series vector. As the threshold for the test I want a moving average, that is, the mean of the data in the current window. Here is my attempt (in R):
x<- ts(rnorm(2659, mean = 0.0001, sd = 0.0001))
library(randtests)
for(i in 1:2404){
runs <- runs.test(x[i:i+255], threshold = mean(x[i:i+255]))
ret[i] <- runs$p.value
}
The index starts at 1 and stops at 2404 because the window is 256 observations long and moves one step at a time: the first window covers 1 to 256, the second 2 to 257, and the last one ends at 2404 + 255 = 2659. I hope I have made my problem clear; I do not understand why this does not work. I also need to plot the result, so that all the p-values are shown over time. I hope you can help me.
PS: Please, set a seed if you propose an example so that I can reproduce your results.
Use rollapplyr with the indicated function.
library(zoo)
pv <- function(xx) runs.test(xx, threshold = mean(xx))$p.value
out <- rollapplyr(x, 256, pv, fill = NA)
Note
library(randtests)
set.seed(123)
x <- ts(rnorm(2659, mean = 0.0001, sd = 0.0001))
Two changes to your existing code should make it work:
set.seed(0)
x <- ts(rnorm(2659, mean = 0.0001, sd = 0.0001))
library(randtests)
ret <- rep(NA, length(x))
for(i in 1:2404){
runs <- runs.test(x[i:(i+255)], threshold = mean(x[i:(i+255)]))
ret[i] <- runs$p.value
}
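The first change is to initialize the ret variable before the loop: ret <- rep(NA, length(x))
The second change is to add the parentheses, i.e. x[i:(i+255)]. The colon operator binds more tightly than +, so i:i+255 is evaluated as (i:i)+255, and x[i:i+255] returns the single value x[i+255] rather than a window of 256 values.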
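The precedence issue is easy to see in isolation. A short illustration (the value of i is arbitrary):

```r
i <- 3

no.parens   <- i:i + 255    # parsed as (i:i) + 255, a single number
with.parens <- i:(i + 255)  # the intended window of indices

print(no.parens)            # 258
print(length(with.parens))  # 256
print(range(with.parens))   # 3 258
```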
Closed 4 years ago.
I previously wrote an R function, "LeastSquaresDegreeN.R", that computes a least-squares polynomial of arbitrary order to fit whatever data I put into it. The code works in the sense that I can reproduce results I got previously. However, when I try to put new data into it I get a "non-conformable arguments" error:
"Error in Conj(t(Q))%*%t(b) : non-conformable arguments"
An extremely simple example of data that should work:
t <- seq(1,100,1)
fifthDegree <- t^5
LeastSquaresDegreeN(t,fifthDegree,5)
This should output and plot a polynomial f(t) = t^5 (up to rounding errors).
However I get "Non-conformable arguments" error even if I explicitly make these vectors:
t <- as.vector(t)
fifthDegree <- as.vector(fifthDegree)
LeastSquaresDegreeN(t,fifthDegree,5)
I've tried putting in the transpose of these vectors too - but nothing works.
Surely the solution is really simple. Help!? Thank you!
Here's the function:
LeastSquaresDegreeN <- function(t, b, deg)
{
# Usage: t is independent variable vector, b is function data
# i.e., b = f(t)
# deg is desired polynomial order
# deg <- deg + 1 is a little adjustment to make the R loops index correctly.
deg <- deg + 1
t <- t(t)
dataSize <- length(b)
A <- mat.or.vec(dataSize, deg) # Built-in R function to create zero
# matrix or zero vector of arbitrary size
# Given basis phi(z) = 1 + z + z^2 + z^3 + ...
# Define matrix A
for (i in 0:deg-1) {
A[1:dataSize,i+1] = t^i
}
# Compute QR decomposition of A. Pull Q and R out of QRdecomp
QRdecomp <- qr(A)
Q <- qr.Q(QRdecomp, complete=TRUE)
R <- qr.R(QRdecomp, complete=TRUE)
# Perform Q^* b^T (Conjugate transpose of Q)
c <- Conj(t(Q))%*%t(b)
# Find x. R isn't square - so we have to use qr.solve
x <- qr.solve(R, c)
# Create xPlot (which is general enough to plot any degree
# polynomial output)
xPlot = x[1,1]
for (i in 1:deg-1){
xPlot = xPlot + x[i+1,1]*t^i
}
# Now plot it. Least squares "l" plot first, then the points in red.
plot(t, xPlot, type='l', xlab="independent variable t", ylab="function values f(t)", main="Data Plotted with Nth Degree Least Squares Polynomial", col="blue")
points(t, b, col="red")
} # End
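No answer is recorded for this question, so the following is only a hedged reading of the error. When b is a plain vector of length n, t(b) is a 1 x n matrix, while Q (with complete=TRUE) is n x n, so Conj(t(Q)) %*% t(b) cannot be multiplied; the function presumably worked before because b was passed in as a one-row matrix. Note also that 0:deg-1 is parsed as (0:deg)-1. A minimal sketch that sidesteps both issues is shown below; the function name `least_squares_poly` and the small demo data are hypothetical, and it uses qr.solve() on the design matrix directly rather than the original Q/R steps.

```r
# Hypothetical helper: least-squares polynomial coefficients of degree deg
least_squares_poly <- function(t, b, deg) {
  t <- as.numeric(t)  # drop any matrix shape, so row or column input both work
  b <- as.numeric(b)
  A <- outer(t, 0:deg, `^`)  # columns are 1, t, t^2, ..., t^deg
  qr.solve(A, b)             # least-squares solution for a tall matrix A
}

# Small, well-conditioned demo: data generated from 1 + 2 t + 3 t^2
t.demo <- seq(0, 1, by = 0.1)
b.demo <- 1 + 2 * t.demo + 3 * t.demo^2

print(dim(t(b.demo)))  # 1 x 11: why Q^T %*% t(b) is non-conformable
coefs <- least_squares_poly(t.demo, b.demo, 2)
print(coefs)           # coefficients of 1, t, t^2
```

For inputs such as t = 1:100 with t^5, the columns of A span about ten orders of magnitude, so rescaling t (or using poly()) may be needed for a numerically stable fit.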
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 9 years ago.
I have the formula y = x / (a+b*x) that I want to fit to the points (6,72) (211,183) (808,360) (200,440). I put them in R using
x <- c(6,211,808,200)
y <- c(72,183,360,440)
Now I want to fit the function defined above through these points and find a and b.
How do I get a and b using R? And how do I get the fitted formula in R?
Construct data:
x <- c(6,211,808,200)
y <- c(72,183,360,440)
d <- data.frame(x,y)
Plot the data: although sparse, they're not insane (they do show some evidence of an increasing/saturating pattern)
plot(y~x,data=d)
Fit the model:
## y = x/(a+b*x)
## 1/y = a/x + b
m1 <- glm(y~I(1/x),family=gaussian(link="inverse"),data=d)
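You can plot the results with ggplot2 (in current versions the family has to be passed through method.args, otherwise it is ignored):
library("ggplot2")
ggplot(d, aes(x, y)) + geom_point() + theme_bw() +
geom_smooth(method="glm", method.args=list(family=gaussian(link="inverse")),
formula=y~I(1/x), se=FALSE)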
The confidence intervals for this model are somewhat crazy (because the confidence intervals for 1/y include zero, at which point the confidence intervals on y blow up), so be careful ...
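Since the question asks for a and b explicitly, they can be read off the glm fit. This is a sketch under the parameterization 1/y = a/x + b used above: the intercept corresponds to b and the coefficient of I(1/x) to a. The helper name `fitted.curve` is made up for illustration.

```r
x <- c(6, 211, 808, 200)
y <- c(72, 183, 360, 440)
d <- data.frame(x, y)

m1 <- glm(y ~ I(1/x), family = gaussian(link = "inverse"), data = d)

# With the inverse link, 1/mu = (Intercept) + coef * (1/x) = b + a/x
b <- unname(coef(m1)["(Intercept)"])
a <- unname(coef(m1)["I(1/x)"])
print(c(a = a, b = b))

# The fitted formula y = x / (a + b*x) as an R function
fitted.curve <- function(x) x / (a + b * x)
print(fitted.curve(d$x))
```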
Get the data and plot it:
x <- c(6,211,808,200)
y <- c(72,183,360,440)
plot(x,y,pch=19)
Define the function, get your coefficients
f <- function(x,a,b) {x/(a+b*x)}
fit <- nls(y ~ f(x,a,b), start=c(a=1,b=1))
co <- coef(fit)
# co will contain your coefficients for a and b
# a b
#0.070221853 0.002796513
And plot away:
curve(f(x, a=co["a"], b=co["b"]), add = TRUE, col="green", lwd=2)
Result:
Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I am searching for a function/package-name in R which allows one to separate two superimposed normal distributions. The distribution looks something like this:
x<-c(3.95, 3.99, 4.0, 4.04, 4.1, 10.9, 11.5, 11.9, 11.7, 12.3)
I have had good results in the past using vector generalized linear models. The VGAM package is useful for that.
The mix2normal1 function estimates the parameters of a mixture of two univariate normal distributions.
A small example:
require(VGAM)
set.seed(12345)
# Create a binormal distribution with means 10 and 20
data <- c(rnorm(100, 10, 1.5), rnorm(200, 20, 3))
# Initial parameters for minimization algorithm
# You may want to create some logic to estimate this a priori... not always easy but possible
# m, m2: Means - s, s2: SDs - w: relative weight of the first distribution (the second is 1-w)
init.params <- list(m=5, m2=8, s=1, s2=1, w=0.5)
fit <- vglm(data ~ 1, mix2normal1(equalsd=FALSE),
iphi=init.params$w, imu=init.params$m, imu2=init.params$m2,
isd1=init.params$s, isd2=init.params$s2)
# Calculated parameters
pars = as.vector(coef(fit))
w = plogis(pars[1]) # inverse logit; base R equivalent of logit(pars[1], inverse=TRUE)
m1 = pars[2]
sd1 = exp(pars[3])
m2 = pars[4]
sd2 = exp(pars[5])
# Plot a histogram of the data
hist(data, 30, col="black", freq=FALSE)
# Superimpose the fitted distribution
x <- seq(0, 30, 0.1)
points(x, w*dnorm(x, m1, sd1)+(1-w)*dnorm(x,m2,sd2), "l", col="red", lwd=2)
This gives estimates close to the "true" parameters (means 10 and 20, SDs 1.5 and 3):
> m1
[1] 10.49236
> m2
[1] 20.06296
> sd1
[1] 1.792519
> sd2
[1] 2.877999
You might want to use nls, the nonlinear least-squares regression tool (or another nonlinear regressor). I'm guessing you have a vector of data representing the superimposed distributions. Then, roughly, nls(y~I(a*exp(-(x-meana)^2/siga) + b*exp(-(x-meanb)^2/sigb) ),{initial guess values required for all constants} ), where y is your distribution (e.g. binned density values) and x is the domain.
I haven't thought this through in detail, so I'm not sure which convergence methods are less likely to fail.
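For comparison, here is a base R sketch that avoids binning altogether by maximizing the mixture likelihood with optim(). This is a swapped-in technique, not the nls approach described above; the simulated data mirror the VGAM example, and the starting values and names (`obs`, `nll`, `est`) are assumptions chosen for illustration.

```r
set.seed(12345)
obs <- c(rnorm(100, 10, 1.5), rnorm(200, 20, 3))

# Negative log-likelihood of a two-component normal mixture.
# p = (logit of weight w, mean 1, log sd 1, mean 2, log sd 2),
# so that w stays in (0, 1) and both sds stay positive.
nll <- function(p, z) {
  w <- plogis(p[1])
  dens <- w * dnorm(z, p[2], exp(p[3])) + (1 - w) * dnorm(z, p[4], exp(p[5]))
  -sum(log(pmax(dens, .Machine$double.xmin)))  # guard against log(0)
}

# Rough starting values (an assumption; in practice look at a histogram first)
start <- c(0, 8, log(2), 22, log(2))
opt <- optim(start, nll, z = obs, method = "Nelder-Mead",
             control = list(maxit = 5000))

est <- c(w = plogis(opt$par[1]),
         m1 = opt$par[2], sd1 = exp(opt$par[3]),
         m2 = opt$par[4], sd2 = exp(opt$par[5]))
print(round(est, 3))
```

As with nls, the result depends on reasonable starting values, so it is worth trying a few and comparing the attained likelihoods.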