I am trying to fit the Michaelis-Menten equation to a dataset to determine the rate of disappearance, as well as the IC50 (Km) if the data permit. I get a good fit except at the first point (concentration 0); however, I get a negative value of Km, which cannot be correct. I am using the following code.
x <- c(0, 2.5, 5.0, 10.0, 25.0)
y <- c(4.91, 1.32, 1.18, 1.12, 1.09)
#########################Fit General Michaelis Menten Equation########################################################
model.mm <- nls(y ~ (Vmax*x/(Km+x)), data = data.frame(x,y),
start = list(Km=max(y)/2, Vmax = max(y)))
print(summary(model.mm))
#plot it
plot(y~x, type="p", lwd=2,
xlab="Lopinavir Concentrations (uM)", ylab="Efflux Ratio")
title("Lopinavir Transport in MDCK-MDR1 Cells")
lines(fitted(model.mm)~x, col="red")
Any suggestions for improving the fit and the parameter estimates would be much appreciated.
thanks,
Krina
The Michaelis-Menten hyperbola, with xlab = Substrate Concentration and ylab = Velocity, usually looks like a hyperbola rising to a maximum. The quantity you are measuring looks like an exponential decay as the substrate concentration increases, so I'm not sure the Michaelis-Menten equation works well here. Also, you shouldn't be using the lines function: it doesn't give you a curve. You should be using the curve function.
x <- c(0, 2.5, 5.0, 10.0, 25.0)
y <- c(4.91, 1.32, 1.18, 1.12, 1.09)
mm <- data.frame(x, y)
I think you should build the data frame before passing it to the function.
model.mm <- nls(y ~ Vmax*x / (Km + x), data = mm, start = list(Km = max(mm$y)/2, Vmax = max(mm$y)))
plot(y ~ x, type = "p", lwd = 2, xlab = "Lopinavir Concentrations (uM)", ylab = "Efflux Ratio", pch = 16, main = "Lopinavir Transport in MDCK-MDR1 Cells")
summary(model.mm)
Formula: y ~ Vmax * x/(Km + x)

Parameters:
     Estimate Std. Error t value Pr(>|t|)
Km    -0.4772     6.6246  -0.072    0.947
Vmax   1.0678     2.1382   0.499    0.652

Residual standard error: 2.835 on 3 degrees of freedom

Number of iterations to convergence: 4
Achieved convergence tolerance: 6.634e-06
Now, I think the curve function is pretty self-explanatory:
?curve
curve(x * 1.0678 / (x - 0.4772), col = "red", lwd = 2, add = TRUE)
fx <- function(x) {x * 1.0678 / (x - 0.4772)}
range(x)
[1]  0 25
We can integrate this Michaelis-Menten function and calculate the area under the curve:
require(pracma)
integrate(fx, lower = 0, upper = 25)
Error in integrate(fx, lower = 0, upper = 25) :
the integral is probably divergent
This divergence is because your fitted curve looks a lot like y = 1/x, which is divergent: with Km = -0.4772 the denominator Km + x vanishes at x = 0.4772, inside the integration range.
If you push the lower limit a little past that pole, where y tends to infinity, we can get a finite answer.
integrate(fx, lower = 0.5, upper = 25)
29.71809 with absolute error < 0.00069
but this integral is questionable for divergence reasons.
You can estimate the area under your scatterplot points using the trapezoidal approximation for integration:
trapz(mm$x, mm$y)
[1] 33.2375
Note: I tried fitting exponential functions to your data, but that doesn't work: the curve drops too fast and misses most of the points.
I think I figured out what's wrong with your function.
Type: y = (x * v) / (x + K)
into https://www.desmos.com/calculator
and see what happens when you make K negative versus K positive, when you make both K and v negative, etc.
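For the same exploration in R, here is a minimal sketch (reusing the estimates from summary(model.mm) above): with a negative Km the denominator Km + x vanishes at x = -Km = 0.4772, so the fitted curve has a vertical asymptote inside the data range.
Vmax <- 1.0678
Km <- -0.4772
curve(Vmax * x / (Km + x), from = 0, to = 25, n = 1001,
      ylim = c(-10, 10), col = "red",
      xlab = "Lopinavir Concentrations (uM)", ylab = "Fitted Efflux Ratio")
abline(v = -Km, lty = 2) # vertical asymptote where Km + x == 0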
Thank you for your help, really appreciated.
I was able to resolve this by using the Hill equation.
fo <- y ~ (Vmax*x^hill/((VC50^hill) + (x^hill)))
st <- c(Vmax=0.5, hill=1, VC50=0.3)
model.hill <- nls(fo, data = data.frame(x,y), start = st)
print(summary(model.hill))
co <- coef(model.hill)
plot(y~x, type="p", lwd=2,
xlab="Lopinavir Concentrations (uM)", ylab="Efflux Ratio")
title("Lopinavir Transport in MDCK-MDR1 Cells")
lines(fitted(model.hill)~x, col="red")
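As noted in the answer above, lines(fitted(model.hill) ~ x) only joins the five fitted points with segments. A small sketch for drawing a smooth curve instead, reusing the coefficient vector co:
curve(co["Vmax"] * x^co["hill"] / (co["VC50"]^co["hill"] + x^co["hill"]),
      from = 0, to = 25, add = TRUE, col = "blue", lty = 2)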
Related
I have data that follow a sigmoid curve, and I would like to fit a logistic function to extract the three (or two) parameters for each participant. I have found some methods online, but I'm not sure which is the correct option.
This tutorial explains that you should use the nls() function like this:
fitmodel <- nls(y~a/(1 + exp(-b * (x-c))), start=list(a=1,b=.5,c=25))
## get the coefficients using the coef function
params <- coef(fitmodel)
... where you clearly need the starting values to find the best-fitting values (?).
And then this post explains that, to get the starting values, you can use a "self-starting model [which] can estimate good starting values for you, so you don't have to specify them":
fit <- nls(y ~ SSlogis(x, Asym, xmid, scal), data = data.frame(x, y))
However, somewhere else I also read that you should use the SSlogis function for fitting a logistic function. Could someone please confirm whether these two steps are the best way to go about it? Or should I use values that I have extracted from previous similar data for the starting values?
Additionally, what should I do if I don't want the logistic function to be defined by the asymptote at all?
Thank you!
There isn't a single best way, but SSlogis does eliminate having to set starting values, whereas specifying the formula yourself gives you more control over the parameterization.
If the question is really how to fix a at a predetermined level, here the value 1, without rewriting the formula, then set a before running nls and omit it from the starting values.
a <- 1
fo <- y ~ a / (1 + exp(-b * (x-c)))
nls(fo, start = list(b = 0.5, c = 25))
Alternatively, this substitutes a = 1 into formula fo, giving fo2, without you having to rewrite the formula yourself.
fo2 <- do.call("substitute", list(fo, list(a = 1)))
nls(fo2, start = list(b = 0.5, c = 25))
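To check that both calls agree, here is a self-contained sketch with simulated data (the true values b = 0.3 and c = 25 are made up purely for illustration):
set.seed(1)
x <- 1:50
y <- 1 / (1 + exp(-0.3 * (x - 25))) + rnorm(50, 0, 0.05)
a <- 1
fo <- y ~ a / (1 + exp(-b * (x - c)))
fo2 <- do.call("substitute", list(fo, list(a = 1)))
coef(nls(fo, start = list(b = 0.5, c = 25)))
coef(nls(fo2, start = list(b = 0.5, c = 25))) # same estimates either way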
As @G. Grothendieck writes, there is no general "best way"; it always depends on your particular aims. Using SSlogis is a good idea, as you don't need to specify start values, but defining your own function is more flexible. See the following example, where we use heuristics to derive the start values ourselves instead of specifying them manually. We then fit a logistic model and, as a small bonus, the Baranyi growth model with an explicit lag phase.
# time (t)
x <- c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
# Algae cell counts (Mio cells per ml)
y <- c(0.88, 1.02, 1.43, 2.79, 4.61, 7.12,
6.47, 8.16, 7.28, 5.67, 6.91)
## we now plot the data linearly and logarithmically
## the layout function is another way to subdivide the plotting area
nf <- layout(matrix(c(1,2,3,3), 2, 2, byrow = TRUE), respect = TRUE)
layout.show(nf) # this shows how the plotting area is subdivided
plot(x, y)
plot(x, log(y))
## we see that the first points show the steepest increase,
## so we can estimate a start value of the growth rate
r <- (log(y[5]) - log(y[1])) / (x[5] - x[1])
abline(a=log(y[1]), b=r)
## this way, we have a heuristics for all start parameters:
## r: steepest increase of y in log scale
## K: maximum value
## N0: first value
## we can check this by plotting the function with the start values
f <- function(x, r, K, N0) {K /(1 + (K/N0 - 1) * exp(-r *x))}
plot(x, y, pch=16, xlab="time (days)", ylab="algae (Mio cells)")
lines(x, f(x, r=r, K=max(y), N0=y[1]), col="blue")
pstart <- c(r=r, K=max(y), N0=y[1])
aFit <- nls(y ~ f(x, r, K,N0), start = pstart, trace=TRUE)
x1 <- seq(0, 25, length = 100)
lines(x1, predict(aFit, data.frame(x = x1)), col = "red")
legend("topleft",
legend = c("data", "start parameters", "fitted parameters"),
col = c("black", "blue", "red"),
lty = c(0, 1, 1),
pch = c(16, NA, NA))
summary(aFit)
(Rsquared <- 1 - var(residuals(aFit))/var(y))
## =============================================================================
## Approach with Baranyi-Roberts model
## =============================================================================
## sometimes, a logistic is not good enough. In this case, use another growth
## model
baranyi <- function(x, r, K, N0, h0) {
A <- x + 1/r * log(exp(-r * x) + exp(-h0) - exp(-r * x - h0))
y <- exp(log(N0) + r * A - log(1 + (exp(r * A) - 1)/exp(log(K) - log(N0))))
y
}
pstart <- c(r=0.5, K=7, N0=1, h0=2)
fit2 <- nls(y ~ baranyi(x, r, K, N0, h0), start = pstart, trace=TRUE)
lines(x1, predict(fit2, data.frame(x = x1)), col = "forestgreen", lwd=2)
legend("topleft",
legend = c("data", "logistic model", "Baranyi-Roberts model"),
col = c("black", "red", "forestgreen"),
lty = c(0, 1, 1),
pch = c(16, NA, NA))
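As a quick numeric comparison of the two fits (a sketch; smaller values are better in both cases):
AIC(aFit, fit2) # information criterion for the logistic and Baranyi fits
c(logistic = deviance(aFit), baranyi = deviance(fit2)) # residual sums of squares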
If a model is fitted using mgcv and then the smooth terms are plotted,
m <- gam(y ~ s(x))
plot(m, shade = TRUE)
then you get a plot of the curve with a confidence interval. These are, I presume, pointwise confidence intervals. How are they computed?
I tried to write
object <- plot(m, shade = TRUE)
object[[1]]$fit + 2*object[[1]]$se # upper bound
object[[1]]$fit - 2*object[[1]]$se # lower bound
in order to extract the lower and upper bounds using the standard errors and a multiplier of 2, but when I plot them, they look a bit different from the confidence intervals plotted by plot.gam.
So, how are those calculated?
I do not use seWithMean = TRUE or anything like that.
It is 1 standard deviation.
oo <- plot.gam(m)
oo <- oo[[1]]
points(oo$x, oo$fit, pch = 20)
points(oo$x, oo$fit - oo$se, pch = 20)
Reproducible example:
library(mgcv)
set.seed(0) # for reproducibility
x <- seq(0, 2 * pi, length = 100)
y <- x * sin(x) + rnorm(100, 0, 0.5)
m <- gam(y ~ s(x))
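Putting it together, a minimal sketch that redraws the band manually; per the answer above, the $se component returned by plot.gam is the half-width of the shaded region:
oo <- plot(m, shade = TRUE)[[1]]
lines(oo$x, oo$fit + oo$se, lty = 2, col = "red") # upper edge of the shaded band
lines(oo$x, oo$fit - oo$se, lty = 2, col = "red") # lower edge of the shaded band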
I have a set of temperatures and a Discomfort Index value for each temperature. When I plot temperature (x axis) against the calculated Discomfort Index value (y axis), I get a reversed-U-shaped curve. I want to do non-linear regression on it and convert it into a PMML model. My aim is to get the predicted discomfort value for a given temperature.
Please find the dataset below:
Temp <- c(0,5,10,6 ,9,13,15,16,20,21,24,26,29,30,32,34,36,38,40,43,44,45, 50,60)
Disc<-c(0.00,0.10,0.25,0.15,0.24,0.26,0.30,0.31,0.40,0.41,0.49,0.50,0.56, 0.80,0.90,1.00,1.00,1.00,0.80,0.50,0.40,0.20,0.15,0.00)
How do I do non-linear regression (possibly with nls?) for this dataset?
I did take a look at this, and I think it is not as simple as using nls, as most of us first thought.
nls fits a parametric model, but from your data (the scatter plot) it is hard to propose a reasonable model assumption. I would suggest using non-parametric smoothing for this.
There are many scatter-plot smoothing methods, like kernel smoothing (ksmooth), smoothing splines (smooth.spline), and LOESS (loess). I prefer using smooth.spline, and here is what we can do with it:
fit <- smooth.spline(Temp, Disc)
Please read ?smooth.spline for what it takes and what it returns. We can check the fitted spline curve by
plot(Temp, Disc)
lines(fit, col = 2)
Should you want to make predictions elsewhere, use the predict function (predict.smooth.spline). For example, if we want to predict at Temp = 20 and Temp = 44, we can use
predict(fit, c(20,44))$y
# [1] 0.3940963 0.3752191
Prediction outside range(Temp) is not recommended, as it suffers from potentially bad extrapolation.
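For instance (a quick sketch), values beyond range(Temp) = c(0, 60) are pure extrapolation of the spline and should not be trusted:
predict(fit, c(70, 80, 100))$y # extrapolated values; treat with suspicion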
Before resorting to a non-parametric method, I also tried non-linear regression with regression splines and an orthogonal polynomial basis, but they don't give satisfying results. The major reason is that there is no penalty on smoothness. As an example, here is an attempt with poly:
try1 <- lm(Disc ~ poly(Temp, degree = 3))
try2 <- lm(Disc ~ poly(Temp, degree = 4))
try3 <- lm(Disc ~ poly(Temp, degree = 5))
plot(Temp, Disc, ylim = c(-0.3,1.0))
x<- seq(min(Temp), max(Temp), length = 50)
newdat <- list(Temp = x)
lines(x, predict(try1, newdat), col = 2)
lines(x, predict(try2, newdat), col = 3)
lines(x, predict(try3, newdat), col = 4)
We can see that the fitted curve is artificial.
We can also fit polynomials with nls as follows, but they will increasingly overfit the data as the degree goes up:
m <- nls(Disc ~ a + b*Temp + c*Temp^2 + d*Temp^3 + e*Temp^4, start=list(a=0, b=1, c=1, d=1, e=1))
plot(Temp,Disc,pch=19)
lines(Temp,predict(m),lty=2,col="red",lwd=3)
m <- nls(Disc ~ a + b*Temp + c*Temp^2 + d*Temp^3 + e*Temp^4 + f*Temp^5, start=list(a=0, b=1, c=1, d=1, e=1, f=1))
lines(Temp,predict(m),lty=2,col="blue",lwd=3)
m <- nls(Disc ~ a + b*Temp + c*Temp^2 + d*Temp^3 + e*Temp^4 + f*Temp^5 + g*Temp^6, start=list(a=0, b=1, c=1, d=1, e=1, f=1, g=1))
lines(Temp,predict(m),lty=2,col="green",lwd=3)
m.poly <- lm(Disc ~ poly(Temp, degree = 15))
lines(Temp, predict(m.poly), lty = 2, col = "yellow", lwd = 3)
legend(x = "topleft", legend = c("Deg 4", "Deg 5", "Deg 6", "Deg 15"),
col = c("red", "blue", "green", "yellow"),
lty = 2)
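Evaluating the degree-15 fit on a fine grid, rather than only at the observed Temp values, makes the overfitting wiggles between data points obvious (a sketch):
xx <- seq(min(Temp), max(Temp), length = 200)
lines(xx, predict(m.poly, list(Temp = xx)), lty = 2, col = "yellow", lwd = 3)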
I am trying to compare the power functions of the chi-squared test and the t-test for one particular value; my overall goal is to show that the t-test is more powerful (because it makes an assumption about the distribution). I used the pwr package for R to calculate the power of each test, then wrote two functions and plotted the results.
However, I do not find that the t-test is better than the chi-squared test, and I am puzzled by the result. I spent hours on it, so any help is much appreciated.
Is the code wrong, do I have a wrong understanding of the power functions, or is there something wrong with the package?
library(pwr)
#mu is the value for which the power is calculated
#no is the number of observations
#function of the power of the t-test with a h0 of .2
g <- function(mu, alpha, no) { #calculate the power of a particular value for the t-test with h0=.2
p <- mu-.20
sigma <- sqrt(.5*(1-.5))
pwr.t.test(n = no, d = p/sigma, sig.level = alpha, type = "one.sample", alternative="greater")$power # d is the effect size p/sigma
}
#chi squared test
h <- function(mu, alpha, no, degree) {#calculate the power of a particular value for the chi squared test
p01 <- .2 # these construct the effect size (which is a bit different for the chi-squared test)
p02 <- .8
p11 <-mu
p12 <- 1-p11
effect.size <- sqrt(((p01-p11)^2/p01)+((p02-p12)^2/p02)) # effect size
pwr.chisq.test(N=no, df=degree, sig.level = alpha, w=effect.size)$power
}
#create a diagram
plot(1, 1, type = "n",
xlab = expression(mu),
xlim = c(.00, .75),
ylim = c(0, 1.1),
ylab = expression(1-beta),
axes=T, main="Power function t-Test and Chi-squared-Test")
axis(side = 2, at = c(0.05), labels = c(expression(alpha)), las = 3)
axis(side = 1, at = .2, labels = expression(mu[0])) # mu0 = .2 under the null
abline(h = c(0.05, 1), lty = 2)
legend(.5,.5, # places a legend at the appropriate place
c("t-Test","Chi-square-Test"), # puts text in the legend
lwd=c(2.5,2.5),col=c("black","red"))
curve(h(x, alpha = 0.05, no = 100, degree=1), from = .00, to = .75, add = TRUE, col="red",lwd=c(2.5,2.5) )
curve(g(x, alpha = 0.05, no = 100), from = .00, to = .75, add = TRUE, lwd=c(2.5,2.5))
Thanks a lot in advance!
If I understand the problem right, you are testing a binomial proportion with the mean under the null equal to 0.2 and the alternative being greater than 0.2? If so, then on line 2 of your function g, shouldn't it be sigma <- sqrt(.2*(1-.2)) instead of sigma <- sqrt(.5*(1-.5))? That way your standard deviation will be smaller, resulting in a larger test statistic, hence a smaller p-value and higher power.
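A sketch of the corrected function under that reading (only the sigma line changes from the g above):
g2 <- function(mu, alpha, no) {
p <- mu - .20
sigma <- sqrt(.2 * (1 - .2)) # standard deviation under H0: p = .2
pwr.t.test(n = no, d = p/sigma, sig.level = alpha, type = "one.sample", alternative = "greater")$power
}
curve(g2(x, alpha = 0.05, no = 100), from = .00, to = .75, add = TRUE, lty = 2, lwd = 2.5)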
I am a beginner in R and I have tried to find information about the following without finding anything.
The green graph in the picture is composed of the red and yellow graphs. But let's say that I only have the data points of something like the green graph. How do I extract the low/high frequencies (i.e. approximately the red/yellow graphs) using a low-pass/high-pass filter?
Update: The graph was generated with
number_of_cycles = 2
max_y = 40
x = 1:500
a = number_of_cycles * 2*pi/length(x)
y = max_y * sin(x*a)
noise1 = max_y * 1/10 * sin(x*a*10)
plot(x, y, type="l", col="red", ylim=range(-1.5*max_y,1.5*max_y,5))
points(x, y + noise1, col="green", pch=20)
points(x, noise1, col="yellow", pch=20)
Update 2: Using the Butterworth filter in the signal package, as suggested, I get the following:
library(signal)
bf <- butter(2, 1/50, type="low")
b <- filter(bf, y+noise1)
points(x, b, col="black", pch=20)
bf <- butter(2, 1/25, type="high")
b <- filter(bf, y+noise1)
points(x, b, col="black", pch=20)
The calculations were a bit of work; signal.pdf gave next to no hints about what values W should have, but the original Octave documentation at least mentioned radians, which got me going. The values in my original graph were not chosen with any specific frequency in mind, so I ended up with the following not-so-simple frequencies: f_low = 1/500 * 2 = 1/250, f_high = 1/500 * 2*10 = 1/25, and the sampling frequency f_s = 500/500 = 1. Then I chose an f_c somewhere in between the low and high frequencies for the low/high-pass filters (1/100 and 1/50 respectively).
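Spelled out as a sketch, the mapping from those frequencies to the W argument of butter (W is the cutoff expressed as a fraction of the Nyquist frequency f_s/2):
f_s <- 1 # sampling frequency
f_c_low <- 1/100 # chosen cutoff for the low-pass filter
f_c_high <- 1/50 # chosen cutoff for the high-pass filter
W_low <- f_c_low / (f_s/2) # = 1/50, as in butter(2, 1/50, type = "low")
W_high <- f_c_high / (f_s/2) # = 1/25, as in butter(2, 1/25, type = "high")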
I bumped into a similar problem recently and did not find the answers here particularly helpful, so here is an alternative approach.
Let's start by defining the example data from the question:
number_of_cycles = 2
max_y = 40
x = 1:500
a = number_of_cycles * 2*pi/length(x)
y = max_y * sin(x*a)
noise1 = max_y * 1/10 * sin(x*a*10)
y <- y + noise1
plot(x, y, type="l", ylim=range(-1.5*max_y,1.5*max_y,5), lwd = 5, col = "green")
So the green line is the dataset we want to low-pass and high-pass filter.
Side note: The line in this case could be expressed as a function by using a cubic spline (spline(x, y, n = length(x))), but with real-world data this would rarely be the case, so let's assume that it is not possible to express the dataset as a function.
The easiest way to smooth such data that I have come across is loess or smooth.spline with an appropriate span/spar. According to statisticians, loess/smooth.spline is probably not the right approach here, as it does not really present a defined model of the data in that sense. An alternative is to use generalized additive models (the gam() function from package mgcv). My argument for using loess or a smoothed spline here is that it is easier and does not make a difference, as we are interested in the visible resulting pattern. Real-world datasets are more complicated than this example, and finding a defined function for filtering several similar datasets might be difficult. If the visible fit is good, why make it more complicated with R-squared and p-values? To me the application is visual, for which loess/smoothed splines are appropriate methods.
Both methods assume polynomial relationships, with the difference that loess is more flexible, also using higher-degree polynomials, while a cubic spline is always cubic (x^3). Which one to use depends on the trends in the dataset. That said, the next step is to apply a low-pass filter to the dataset using loess() or smooth.spline():
lowpass.spline <- smooth.spline(x,y, spar = 0.6) ## Control spar for amount of smoothing
lowpass.loess <- loess(y ~ x, data = data.frame(x = x, y = y), span = 0.3) ## control span to define the amount of smoothing
lines(predict(lowpass.spline, x), col = "red", lwd = 2)
lines(predict(lowpass.loess, x), col = "blue", lwd = 2)
The red line is the smoothed-spline filter and the blue one the loess filter. As you can see, the results differ slightly. I guess one argument for using GAM would be to find the best fit, if the trends really were this clear and consistent among datasets, but for this application both of these fits are good enough for me.
After finding a fitting low-pass filter, high-pass filtering is as simple as subtracting the low-pass-filtered values from y:
highpass <- y - predict(lowpass.loess, x)
lines(x, highpass, lwd = 2)
This answer comes late, but I hope it helps someone else struggling with a similar problem.
Use the filtfilt function instead of filter (package signal) to get rid of the signal shift.
library(signal)
bf <- butter(2, 1/50, type="low")
b1 <- filtfilt(bf, y+noise1)
points(x, b1, col="red", pch=20)
One method is to use the fast Fourier transform, implemented in R as fft. Here is an example of a high-pass filter. In terms of the plots above, the idea implemented in this example is to recover the yellow series starting from the green series (your real data).
# I've changed the data a bit so it's easier to see in the plots
par(mfrow = c(1, 1))
number_of_cycles = 2
max_y = 40
N <- 256
x = 0:(N-1)
a = number_of_cycles * 2 * pi/length(x)
y = max_y * sin(x*a)
noise1 = max_y * 1/10 * sin(x*a*10)
plot(x, y, type="l", col="red", ylim=range(-1.5*max_y,1.5*max_y,5))
points(x, y + noise1, col="green", pch=20)
points(x, noise1, col="yellow", pch=20)
### Apply the fft to the noisy data
y_noise = y + noise1
fft.y_noise = fft(y_noise)
# Plot the series and spectrum
par(mfrow = c(1, 2))
plot(x, y_noise, type='l', main='original series', col='green4')
plot(Mod(fft.y_noise), type='l', main='Raw series - fft spectrum')
### The following code removes the first spike in the spectrum
### This would be the high pass filter
inx_filter = 15
FDfilter = rep(1, N)
FDfilter[1:inx_filter] = 0
FDfilter[(N-inx_filter):N] = 0
fft.y_noise_filtered = FDfilter * fft.y_noise
par(mfrow = c(2, 1))
plot(x, noise1, type='l', main='original noise')
plot(x, y=Re( fft( fft.y_noise_filtered, inverse=TRUE) / N ) , type='l',
main = 'filtered noise')
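The complementary low-pass filter is a sketch along the same lines: keep only the lowest-frequency bins (and their mirrored counterparts at the top of the spectrum) instead of zeroing them.
FDlow = rep(0, N)
FDlow[1:inx_filter] = 1
FDlow[(N-inx_filter):N] = 1
fft.y_noise_lowpassed = FDlow * fft.y_noise
par(mfrow = c(2, 1))
plot(x, y, type='l', main='original signal')
plot(x, y=Re( fft( fft.y_noise_lowpassed, inverse=TRUE) / N ), type='l',
main = 'low-pass filtered signal')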
Per request of OP:
The signal package contains all kinds of filters for signal processing. Most of it is comparable to / compatible with the signal processing functions in Matlab/Octave.
Check out this link where there's R code for filtering (medical signals). It's by Matt Shotwell and the site is full of interesting R/stats info with a medical bent:
biostattmat.com
The fftfilt function (also in the signal package) implements FFT-based filtering and should help too.
I also struggled to figure out how the W parameter in the butter function maps onto the filter cut-off, in part because the documentation for filter and filtfilt is incorrect as of posting (it suggests that W = .1 would result in a 10 Hz lp filter when combined with filtfilt when the signal sampling rate is Fs = 100, but actually it's only a 5 Hz lp filter: the half-amplitude cut-off is 5 Hz when you use filtfilt, while the half-power cut-off is 5 Hz when you apply the filter only once, using the filter function). I'm posting some demo code I wrote below that helped me confirm how this all works, and that you could use to check that a filter is doing what you want.
#Example usage of butter, filter, and filtfilt functions
#adapted from https://rdrr.io/cran/signal/man/filtfilt.html
library(signal)
Fs <- 100; #sampling rate
bf <- butter(3, 0.1);
#when apply twice with filtfilt,
#results in a 0 phase shift
#5 Hz half-amplitude cut-off LP filter
#
#W * (Fs/2) == half-amplitude cut-off when combined with filtfilt
#
#when apply only one time, using the filter function (non-zero phase shift),
#W * (Fs/2) == half-power cut-off
t <- seq(0, .99, len = 100) # 1 second sample
#generate a 5 Hz sine wave
x <- sin(2*pi*t*5)
#filter it with filtfilt
y <- filtfilt(bf, x)
#filter it with filter
z <- filter(bf, x)
#plot original and filtered signals
plot(t, x, type='l')
lines(t, y, col="red")
lines(t,z,col="blue")
#estimate signal attenuation (proportional reduction in signal amplitude)
1 - mean(abs(range(y[t > .2 & t < .8]))) #~50% attenuation at 5 Hz using filtfilt
1 - mean(abs(range(z[t > .2 & t < .8]))) #~30% attenuation at 5 Hz using filter
#demonstration that the half-amplitude cut-off is 6 Hz when applying the filter only once
x6hz <- sin(2*pi*t*6)
z6hz <- filter(bf, x6hz)
1 - mean(abs(range(z6hz[t > .2 & t < .8]))) #~50% attenuation at 6 Hz using filter
#plot the filter attenuation profile (for a single application, as with the filter function):
hf <- freqz(bf, Fs = Fs);
plot(c(0, 20, 20, 0, 0), c(0, 0, 1, 1, 0), type = "l",
xlab = "Frequency (Hz)", ylab = "Attenuation (abs)")
lines(hf$f[hf$f<=20], abs(hf$h)[hf$f<=20])
plot(c(0, 20, 20, 0, 0), c(0, 0, -50, -50, 0),
type = "l", xlab = "Frequency (Hz)", ylab = "Attenuation (dB)")
lines(hf$f[hf$f<=20], 20*log10(abs(hf$h))[hf$f<=20])
hf$f[which(abs(hf$h) - .5 < .001)[1]] #half-amplitude cutoff, around 6 Hz
hf$f[which(20*log10(abs(hf$h))+6 < .2)[1]] #half-amplitude cutoff, around 6 Hz
hf$f[which(20*log10(abs(hf$h))+3 < .2)[1]] #half-power cutoff, around 5 Hz
There is a package on CRAN named fastICA; it computes an approximation of the independent source signals. However, in order to recover both signals you need a matrix of at least 2 x n mixed observations (for this example); the algorithm can't determine the two independent signals from just a 1 x n vector. See the example below; I hope this helps.
number_of_cycles = 2
max_y = 40
x = 1:500
a = number_of_cycles * 2*pi/length(x)
y = max_y * sin(x*a)
noise1 = max_y * 1/10 * sin(x*a*10)
plot(x, y, type="l", col="red", ylim=range(-1.5*max_y,1.5*max_y,5))
points(x, y + noise1, col="green", pch=20)
points(x, noise1, col="yellow", pch=20)
######################################################
library(fastICA)
S <- cbind(y, noise1) # assuming "y" is source 1 and "noise1" is source 2
A <- matrix(c(0.291, 0.6557, -0.5439, 0.5572), 2, 2) #This is a mixing matrix
X <- S %*% A
a <- fastICA(X, 2, alg.typ = "parallel", fun = "logcosh", alpha = 1,
method = "R", row.norm = FALSE, maxit = 200,
tol = 0.0001, verbose = TRUE)
par(mfcol = c(2, 3))
plot(S[,1 ], type = "l", main = "Original Signals",
xlab = "", ylab = "")
plot(S[,2 ], type = "l", xlab = "", ylab = "")
plot(X[,1 ], type = "l", main = "Mixed Signals",
xlab = "", ylab = "")
plot(X[,2 ], type = "l", xlab = "", ylab = "")
plot(a$S[,1 ], type = "l", main = "ICA source estimates",
xlab = "", ylab = "")
plot(a$S[, 2], type = "l", xlab = "", ylab = "")
I am not sure any filter is the best way for you. A more useful instrument for that aim is the fast Fourier transform.
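As a minimal sketch of that idea with the question's data: the FFT spectrum of the mixed signal shows the two components (2 and 20 cycles over the 500 samples) as separate peaks, which is exactly what a frequency-domain approach exploits.
spec <- Mod(fft(y + noise1)) # y and noise1 as defined in the question
plot(0:49, spec[1:50], type = "h",
xlab = "frequency (cycles per 500 samples)", ylab = "magnitude")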