How to make unequal sine wave? - r

I have „newbie” question related to basic math. I am trying to make sine wave with “unequal radians” (at least I believe this is what I am trying to do). In other words: I need function that for first couple periods (x) is “faster” and gradually slows down (“cycles” are wider/longer) as x approaches infinity. Here is code and sketch of what I am trying to do:
x <- seq(1, 30, by=0.1) # my x
z <- ifelse(x <= 10, 3, ifelse(x <= 20, 2, 1)) # discrete value to modify x
y <- sin(z*x) # my y(x)
plot(y, type="l") # plot y(x)
and sketch (result of plot):
Ignore the “double peak” and other distortions, they are result of fact that z is discrete variable. I would like to make z continuous and make each cycle to widen smoothly. What mathematical function should I use here? I tried damped sine wave but this is not quite what I am going for.

Direct transcription from the Wikipedia page linked by #keziah: here's a function
chirp <- function(t,phi0=0,f0=1,k=1) sin(phi0 + 2*pi*(f0*t+k*t^2/2))
phi0 is the initial phase
k is the rate of frequency change or chirpyness.
f0 is the initial frequency
par(las=1,bty="l",mfrow=c(1,2))
curve(chirp(x),from=0,to=5,n=501)
curve(chirp(x,k=-1,f0=5),from=0,to=5,n=501)

This isn't really a full answer, as I can't give you code off the top of my head, but what you're looking for here is a chirp. There are a few different types of chirp, depending on the rate of change of phase that you want, but I'm guessing you probably want a linear chirp Wikipedia
R may well have a function/module that can provide this already.

Related

predict x value with given y using loess

I have a dataset from a biological experiment:
x = c(0.488, 0.977, 1.953, 3.906, 7.812, 15.625, 31.250, 62.500, 125.000, 250.000, 500.000, 1000.000)
y = c(0.933, 1.036, 1.112, 1.627, 2.646, 5.366, 11.115, 2.355, 1.266, 0, 0, 0)
plot(log(x),y)
x represents a concentration and y represents the response in our assay.
The plot can be found here: 1
How can I predict the x-value (concentration) of a pre-defined y-value (in my case 1.5)?
After a loess smoothing I can predict the y-value at a defined x-value. See the example:
smooth_data <- loess(y~log(x))
predict(smooth_data, 1.07) # which gives 1.5
Using the predict function, both x = 1.07 and x = 5.185 result in y = 1.5
Is there a convenient way to get the estimates from the loess regression at y = 1.5 without manually typing some x values into the predict function?
Any suggestions?
I gues your x and y's are pairs? so for f(0.488) = 0.933 and so on?
More of a mathproblem in my opinion :).
If you could define a function that describes your graph it would be pretty easy.
You could also draw a straight line between all points and for every line that intersects with your y value you could get corrosponding x values. But straight lines wouldn't be really precies.
If you have enough pairs you could also train a neureal network. That might get you the best results but takes some time and alot of pairs to train well.
Could you clarify your question a bit and tell us what you are looking for? A way to do it or a code example?
I hope this is helping you atleast a little bit :)
Since your function is not monotonic, there is no true inverse, but if you split it into two functions - one for x < maximum and one for x > maximum - you can just create two inverse functions and solve for whatever values of y you want.
smooth_data <- loess(y~log(x))
X = seq(0,6.9,0.1)
P = predict(smooth_data, X)
M = which.max(P)
Inverse1 = approxfun(X[1:M] ~ P[1:M])
Inverse2 = approxfun(X[M:length(X)] ~ P[M:length(X)])
Inverse1(1.5)
[1] 1.068267
predict(smooth_data, 1.068267)
[1] 1.498854
Inverse2(1.5)
[1] 5.185876
predict(smooth_data, 5.185876)
[1] 1.499585

Plot decision boundary from weight vector

How do I plot decision boundary from weight vector?
My original data is 2-dimensional but non-linearly separable so I used a polynomial transformation of order 2 and therefore I ended up with a 6-dimensional weight vector.
Here's the code I used to generate my data:
polar2cart <- function(theta,R,x,y){
x = x+cos(theta) * R
y = y+sin(theta) * R
c=matrix(x,ncol=1000)
c=rbind(c,y)
}
cart2polar <- function(x, y)
{
r <- sqrt(x^2 + y^2)
t <- atan(y/x)
c(r,t)
}
R=5
eps=5
sep=-5
c1<-polar2cart(pi*runif(1000,0,1),runif(1000,0,eps)+R,0,0)
c2<-polar2cart(-pi*runif(1000,0,1),runif(1000,0,eps)+R,R+eps/2,-sep)
data <- data.frame("x" = append(c1[1,], c2[1,]), "y" = append(c1[2,], c2[2,]))
labels <- append(rep(1,1000), rep(-1, 1000))
and here's how it is displayed (using ggplot2):
Thank you in advance.
EDIT: I'm sorry if I didn't provide enough information about the weight vector. The algorithm I'm using is pocket which is a variation of perceptron, which means that the output weight vector is the perpendicular vector that determines the hyper-plane in the feature space plus the bias . Therefore, the hyper-plane equation is , where are the variables. Now, since I used a polynomial transformation of order 2 to go from a 2-dimensional space to a 5-dimensional space, my variables are : and thus the equation for my decision boundary is:
So basically, my question is how do I go about drawing my decision boundary given
PS: I've found a solution while waiting, it might not be the best approach but, it gives the expected results. I'll share it as soon as I finish my project if anyone is interested. Meanwhile, I'd love to hear a better alternative.

Local linear regression in R -- locfit() vs locpoly()

I am trying to understand the different behaviors of these two smoothing functions when given apparently equivalent inputs. My understanding was that locpoly just takes a fixed bandwidth argument, while locfit can also include a varying part in its smoothing parameter (a nearest-neighbors fraction, "nn"). I thought setting this varying part to zero in locfit should make the "h" component act like the fixed bandwidth used in locpoly, but this is evidently not the case.
A working example:
library(KernSmooth)
library(locfit)
set.seed(314)
n <- 100
x <- runif(n, 0, 1)
eps <- rnorm(n, 0, 1)
y <- sin(2 * pi * x) + eps
plot(x, y)
lines(locpoly(x, y, bandwidth=0.05, degree=1), col=3)
lines(locfit(y ~ lp(x, nn=0, h=0.05, deg=1)), col=4)
Produces this plot:
locpoly gives the smooth green line, and locfit gives the wiggly blue line. Clearly, locfit has a smaller "effective" bandwidth here, even though the supposed bandwidth parameter has the same value for each.
What are these functions doing differently?
The two parameters both represent smoothing, but they do so in two different ways.
locpoly's bandwidth parameter is relative to the scale of the x-axis here. For example, if you changed the line x <- runif(n, 0, 1) to x <- runif(n, 0, 10), you will see that the green locpoly line becomes much more squiggly despite the fact that you still have the same number of points (100).
locfit's smoothing parameter, h, is independent of the scale, and instead is based on a proportion of the data. The value 0.05 means 5% of the data that is closest to that position is used to fit the curve. So changing the scale would not alter the line.
This also explains the observation made in the comment that changing the value of h to 0.1 makes the two look nearly identical. This makes sense, because we can expect that a bandwidth of 0.05 will contain about 10% of the data if we have 100 points distributed uniformly from 0 to 1.
My sources include the documentation for the locfit package and the documentation for the locpoly function.
I changed your code a bit so we can see more clearly what the actual window widths are:
library(KernSmooth)
library(locfit)
x <- seq(.1, .9, length.out = 80)
y <- rep(0:1, each = 40)
plot(x, y)
lines(locpoly(x, y, bandwidth=0.1, degree=1), col=3)
lines(locfit(y ~ lp(x, nn=0, h=0.1, deg=1)), col=4)
The argument h from locfit appears to be a half-window width. locpoly's bandwidth is clearly doing something else.
KernSmooth's documentation is very ambiguous, but judging from the source code (here and here), it looks like the bandwidth is the standard deviation of a normal density function. Hopefully this is explained in the Kernel Smoothing book they cite.

Sample regression, x = months, huge bandwidths

I have two vectors, x and y.
x is a vector where each entry represents a month for a period of several years, so I have (let's say) 10 years of data, then length(x) = 120 and so on.
(I have used the "posix.ct" command so they really are "months" in that sense, but couldn't I just have x as a numerical vector like c(1:n) or something, since I already know which month and which year a certain element of c(1:n) corresponds to? i.e if x = c(1:n), I know that x[13] is february of the second year and so on..)
y is a vector where each elements is an observation of a particular variable at a certain month.
So the observed data is grouped like this (january,0.123), (february,2.125) and so on.
I have two vectors for the months;
x1 = seq(as.POSIXct("YYYY-MM-DD", tz="GMT"),
as.POSIXct("YYYY-MM-DD", tz="GMT"),
by="month")
x2 = c(1:length(x1))
What I want to do is to run ksmooth:
plot(x1,y)
smooth = ksmooth(x2,y,"normal")
lines(smooth)
The reason that I use x1 in the plot() command is that I don't know how to otherwise get the x-axis in time.
R should automatically find a decent smoothing parameter when I haven't specified anything. The result is that ksmooth$y is equal to the input vector y! Also, a vertical bar is produced in the plot. If I replace x2 by x1 in the code above, ksmooth$y is NA for all values except for the first and last, which equal those of the input y.
So i try some bandwidths:
h = 0.1: now smooth$y = y, as before. A vertical bar is produced (it is the same color as I specified in the lines() command, so it must have to do with the ksmooth command.)
h = 10: get some non-strange results for smooth$y, however, a vertical bar is produced as before.
Then, I tried the crazy idea of very large bandwidths;
h = 1e+06: This produced nothing when I used x1 and x2 as in the code above. When I changed x2 to x1 however, I get some good results. For h = 1e+09 (that's huge!!) I get a very nice result. (I get a curve that fits the data and looks nice)
But h = 1e+09, is that reasonable? in all the examples I have looked h is something betweeen 0.1 and 10, give or take. heard something about a rule of thumb: h should equal n^(-1/5) where n is the number of data points.
I think the one thing that you are missing is that R doesn't find a decent smoothing parameter when you haven't specified anything, it just uses a bandwidth of 0.5, which is totally useless in your case.
The other thing you might be missing is that in ksmooth the bandwidth parameter is in terms of x. When ksmooth takes an x value of Date, it converts it to a numeric, which is the number of seconds. Therefore, your bandwidth will be measured in seconds, an undesirable result. When ksmooth takes an x value of months, it will default to a bandwidth of 0.5 months, also undesirable.
What you want to do is specify a reasonable bandwidth for the x that you are using. Here is an example:
x1 = seq(as.POSIXct("2000-01-01", tz="GMT"),
as.POSIXct("2010-12-31", tz="GMT"),
by="month")
x2 = c(1:length(x1))
set.seed(1)
y = runif(length(x1))
plot(x1,y,type='l')
smooth = ksmooth(x2,y,"normal")
lines(x1,smooth$y,col='blue',lwd=2)
lines(x1,ksmooth(x2,y,'normal',bandwidth=2)$y,col='red',lwd=2)
lines(x1,ksmooth(x2,y,'normal',bandwidth=10)$y,col='green',lwd=2)
lines(x1,ksmooth(x2,y,'normal',bandwidth=20)$y,col='orange',lwd=2)

Detecting dips in a 2D plot

I need to automatically detect dips in a 2D plot, like the regions marked with red circles in the figure below. I'm only interested in the "main" dips, meaning the dips have to span a minimum length in the x axis. The number of dips is unknown, i.e., different plots will contain different numbers of dips. Any ideas?
Update:
As requested, here's the sample data, together with an attempt to smooth it using median filtering, as suggested by vines.
Looks like I need now a robust way to approximate the derivative at each point that would ignore the little blips that remain in the data. Is there any standard approach?
y <- c(0.9943,0.9917,0.9879,0.9831,0.9553,0.9316,0.9208,0.9119,0.8857,0.7951,0.7605,0.8074,0.7342,0.6374,0.6035,0.5331,0.4781,0.4825,0.4825,0.4879,0.5374,0.4600,0.3668,0.3456,0.4282,0.3578,0.3630,0.3399,0.3578,0.4116,0.3762,0.3668,0.4420,0.4749,0.4556,0.4458,0.5084,0.5043,0.5043,0.5331,0.4781,0.5623,0.6604,0.5900,0.5084,0.5802,0.5802,0.6174,0.6124,0.6374,0.6827,0.6906,0.7034,0.7418,0.7817,0.8311,0.8001,0.7912,0.7912,0.7540,0.7951,0.7817,0.7644,0.7912,0.8311,0.8311,0.7912,0.7688,0.7418,0.7232,0.7147,0.6906,0.6715,0.6681,0.6374,0.6516,0.6650,0.6604,0.6124,0.6334,0.6374,0.5514,0.5514,0.5412,0.5514,0.5374,0.5473,0.4825,0.5084,0.5126,0.5229,0.5126,0.5043,0.4379,0.4781,0.4600,0.4781,0.3806,0.4078,0.3096,0.3263,0.3399,0.3184,0.2820,0.2167,0.2122,0.2080,0.2558,0.2255,0.1921,0.1766,0.1732,0.1205,0.1732,0.0723,0.0701,0.0405,0.0643,0.0771,0.1018,0.0587,0.0884,0.0884,0.1240,0.1088,0.0554,0.0607,0.0441,0.0387,0.0490,0.0478,0.0231,0.0414,0.0297,0.0701,0.0502,0.0567,0.0405,0.0363,0.0464,0.0701,0.0832,0.0991,0.1322,0.1998,0.3146,0.3146,0.3184,0.3578,0.3311,0.3184,0.4203,0.3578,0.3578,0.3578,0.4282,0.5084,0.5802,0.5667,0.5473,0.5514,0.5331,0.4749,0.4037,0.4116,0.4203,0.3184,0.4037,0.4037,0.4282,0.4513,0.4749,0.4116,0.4825,0.4918,0.4879,0.4918,0.4825,0.4245,0.4333,0.4651,0.4879,0.5412,0.5802,0.5126,0.4458,0.5374,0.4600,0.4600,0.4600,0.4600,0.3992,0.4879,0.4282,0.4333,0.3668,0.3005,0.3096,0.3847,0.3939,0.3630,0.3359,0.2292,0.2292,0.2748,0.3399,0.2963,0.2963,0.2385,0.2531,0.1805,0.2531,0.2786,0.3456,0.3399,0.3491,0.4037,0.3885,0.3806,0.2748,0.2700,0.2657,0.2963,0.2865,0.2167,0.2080,0.1844,0.2041,0.1602,0.1416,0.2041,0.1958,0.1018,0.0744,0.0677,0.0909,0.0789,0.0723,0.0660,0.1322,0.1532,0.1060,0.1018,0.1060,0.1150,0.0789,0.1266,0.0965,0.1732,0.1766,0.1766,0.1805,0.2820,0.3096,0.2602,0.2080,0.2333,0.2385,0.2385,0.2432,0.1602,0.2122,0.2385,0.2333,0.2558,0.2432,0.2292,0.2209,0.2483,0.2531,0.2432,0.2432,0.2432,0.2432,0.3053,0.3630,0.3578,0.3630,0.3668,0.3263,0.3992,0.4037,0.4556,0.4703,0.5173,0.6219,0.6412,0.7275,0.6984,0.6756,0.7079,0.7192,0.7342,0.7458,0.7501,0.7540,0.7605,0.7605,0.7342,0.7912,0.7951,0.8036,0.8074,0.8074,0.8118,0.7951,0.8118,0.8242,0.8488,0.8650,0.8488,0.8311,0.8424,0.7912,0.7951,0.8001,0.8001,0.7458,0.7192,0.6984,0.6412,0.6516,0.5900,0.5802,0.5802,0.5762,0.5623,0.5374,0.4556,0.4556,0.4333,0.3762,0.3456,0.4037,0.3311,0.3263,0.3311,0.3717,0.3762,0.3717,0.3668,0.3491,0.4203,0.4037,0.4149,0.4037,0.3992,0.4078,0.4651,0.4967,0.5229,0.5802,0.5802,0.5846,0.6293,0.6412,0.6374,0.6604,0.7317,0.7034,0.7573,0.7573,0.7573,0.7772,0.7605,0.8036,0.7951,0.7817,0.7869,0.7724,0.7869,0.7869,0.7951,0.7644,0.7912,0.7275,0.7342,0.7275,0.6984,0.7342,0.7605,0.7418,0.7418,0.7275,0.7573,0.7724,0.8118,0.8521,0.8823,0.8984,0.9119,0.9316,0.9512)
yy <- runmed(y, 41)
plot(y, type="l", ylim=c(0,1), ylab="", xlab="", lwd=0.5)
points(yy, col="blue", type="l", lwd=2)
EDITED : function strips the regions to contain nothing but the lowest part, if wanted.
Actually, Using the mean is easier than using the median. This allows you to find regions where the real values are continuously below the mean. The median is not smooth enough for an easy application.
One example function to do this would be :
FindLowRegion <- function(x,n=length(x)/4,tol=length(x)/20,p=0.5){
nx <- length(x)
n <- 2*(n %/% 2) + 1
# smooth out based on means
sx <- rowMeans(embed(c(rep(NA,n/2),x,rep(NA,n/2)),n),na.rm=T)
# find which series are far from the mean
rlesx <- rle((sx-x)>0)
# construct start and end of regions
int <- embed(cumsum(c(1,rlesx$lengths)),2)
# which regions fulfill requirements
id <- rlesx$value & rlesx$length > tol
# Cut regions to be in general smaller than median
regions <-
apply(int[id,],1,function(i){
i <- min(i):max(i)
tmp <- x[i]
id <- which(tmp < quantile(tmp,p))
id <- min(id):max(id)
i[id]
})
# return
unlist(regions)
}
where
n determines how much values are used to calculate the running mean,
tol determines how many consecutive values should be lower than the running mean to talk about a low region, and
p determines the cutoff used (as a quantile) for stripping the regions to their lowest part. When p=1, the complete lower region is shown.
Function is tweaked to work on data as you presented, but the numbers might need to be adjusted a bit to work with other data.
This function returns a set of indices, which allows you to find the low regions. Illustrated with your y vector :
Lows <- FindLowRegion(y)
newx <- seq_along(y)
newy <- ifelse(newx %in% Lows,y,NA)
plot(y, col="blue", type="l", lwd=2)
lines(newx,newy,col="red",lwd="3")
Gives :
You have to smooth the graph in some way. Median filtration is quite useful for that purpose (see http://en.wikipedia.org/wiki/Median_filter). After smoothing, you will simply have to search for the minima, just as usual (i.e. search for the points where the 1st derivative switches from negative to positive).
A simpler answer (which also does not require smoothing) could be provided by adapting the maxdrawdown() function from the tseries. A drawdown is commonly defined as the retreat from the most-recent maximum; here we want the opposite. Such a function could then be used in a sliding window over the data, or over segmented data.
maxdrawdown <- function(x) {
if(NCOL(x) > 1)
stop("x is not a vector or univariate time series")
if(any(is.na(x)))
stop("NAs in x")
cmaxx <- cummax(x)-x
mdd <- max(cmaxx)
to <- which(mdd == cmaxx)
from <- double(NROW(to))
for (i in 1:NROW(to))
from[i] <- max(which(cmaxx[1:to[i]] == 0))
return(list(maxdrawdown = mdd, from = from, to = to))
}
So instead of using cummax(), one would have to switch to cummin() etc.
My first thought was something much cruder than filtering. Why not look for the big drops followed by long enough stable periods?
span.b <- 20
threshold.b <- 0.2
dy.b <- c(rep(NA, span.b), diff(y, lag = span.b))
span.f <- 10
threshold.f <- 0.05
dy.f <- c(diff(y, lag = span.f), rep(NA, span.f))
down <- which(dy.b < -1 * threshold.b & abs(dy.f) < threshold.f)
abline(v = down)
The plot shows that it's not perfect, but it doesn't discard the outliers (I guess it depends on your take on the data).

Resources