Local linear regression in R -- locfit() vs locpoly()

I am trying to understand the different behaviors of these two smoothing functions when given apparently equivalent inputs. My understanding was that locpoly just takes a fixed bandwidth argument, while locfit can also include a varying part in its smoothing parameter (a nearest-neighbors fraction, "nn"). I thought setting this varying part to zero in locfit should make the "h" component act like the fixed bandwidth used in locpoly, but this is evidently not the case.
A working example:
library(KernSmooth)
library(locfit)
set.seed(314)
n <- 100
x <- runif(n, 0, 1)
eps <- rnorm(n, 0, 1)
y <- sin(2 * pi * x) + eps
plot(x, y)
lines(locpoly(x, y, bandwidth=0.05, degree=1), col=3)
lines(locfit(y ~ lp(x, nn=0, h=0.05, deg=1)), col=4)
Produces this plot:
locpoly gives the smooth green line, and locfit gives the wiggly blue line. Clearly, locfit has a smaller "effective" bandwidth here, even though the supposed bandwidth parameter has the same value for each.
What are these functions doing differently?

The two parameters both represent smoothing, but they do so in two different ways.
locpoly's bandwidth parameter is relative to the scale of the x-axis here. For example, if you change the line x <- runif(n, 0, 1) to x <- runif(n, 0, 10), you will see that the green locpoly line becomes much more squiggly even though you still have the same number of points (100).
locfit's smoothing parameter, h, is independent of the scale and is instead based on a proportion of the data: the value 0.05 means the 5% of the data closest to that position is used to fit the curve. So changing the scale would not alter the line.
This also explains the observation made in the comment that changing the value of h to 0.1 makes the two look nearly identical. This makes sense, because we can expect that a bandwidth of 0.05 will contain about 10% of the data if we have 100 points distributed uniformly from 0 to 1.
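Continuing the example above, the comment's suggestion is easy to check directly (this just re-draws over the asker's plot; h = 0.1 is the comment's value, not an exact conversion):
lines(locfit(y ~ lp(x, nn=0, h=0.1, deg=1)), col=2)
The new red curve lands much closer to locpoly's green one.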
My sources include the documentation for the locfit package and the documentation for the locpoly function.

I changed your code a bit so we can see more clearly what the actual window widths are:
library(KernSmooth)
library(locfit)
x <- seq(.1, .9, length.out = 80)
y <- rep(0:1, each = 40)
plot(x, y)
lines(locpoly(x, y, bandwidth=0.1, degree=1), col=3)
lines(locfit(y ~ lp(x, nn=0, h=0.1, deg=1)), col=4)
The argument h from locfit appears to be a half-window width. locpoly's bandwidth is clearly doing something else.
KernSmooth's documentation is very ambiguous, but judging from the source code (here and here), it looks like the bandwidth is the standard deviation of a normal density function. Hopefully this is explained in the Kernel Smoothing book they cite.
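If that reading is right (bandwidth is the standard deviation of a Gaussian kernel, while h is a half-window width), then matching the two should require h roughly 2 to 2.5 times locpoly's bandwidth, since a Gaussian puts nearly all of its mass within about ±2.5 standard deviations. A rough sketch along those lines, reusing the original data (the factor 2.5 is a guess, not an exact equivalence):
library(KernSmooth)
library(locfit)
set.seed(314)
x <- runif(100, 0, 1)
y <- sin(2 * pi * x) + rnorm(100)
plot(x, y)
lines(locpoly(x, y, bandwidth = 0.05, degree = 1), col = 3)
lines(locfit(y ~ lp(x, nn = 0, h = 0.125, deg = 1)), col = 4)  # h ~ 2.5 * bandwidth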

Related

How to make unequal sine wave?

I have a "newbie" question related to basic math. I am trying to make a sine wave with "unequal radians" (at least I believe this is what I am trying to do). In other words: I need a function that for the first couple of periods (x) is "faster" and gradually slows down (the "cycles" get wider/longer) as x approaches infinity. Here is the code and a sketch of what I am trying to do:
x <- seq(1, 30, by=0.1) # my x
z <- ifelse(x <= 10, 3, ifelse(x <= 20, 2, 1)) # discrete value to modify x
y <- sin(z*x) # my y(x)
plot(y, type="l") # plot y(x)
and a sketch (the result of the plot):
Ignore the "double peak" and other distortions; they are a result of the fact that z is a discrete variable. I would like to make z continuous and make each cycle widen smoothly. What mathematical function should I use here? I tried a damped sine wave, but that is not quite what I am going for.
Direct transcription from the Wikipedia page linked by @keziah: here's a function
chirp <- function(t,phi0=0,f0=1,k=1) sin(phi0 + 2*pi*(f0*t+k*t^2/2))
phi0 is the initial phase
f0 is the initial frequency
k is the rate of frequency change, or "chirpyness"
par(las=1,bty="l",mfrow=c(1,2))
curve(chirp(x),from=0,to=5,n=501)
curve(chirp(x,k=-1,f0=5),from=0,to=5,n=501)
This isn't really a full answer, as I can't give you code off the top of my head, but what you're looking for here is a chirp. There are a few different types of chirp, depending on the rate of change of frequency that you want, but I'm guessing you probably want a linear chirp (see Wikipedia).
R may well have a function/module that can provide this already.
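Building on the chirp() function above, here is a minimal sketch for the asker's 1..30 range; the f0 and k values are just illustrative choices that keep the instantaneous frequency f0 + k*t positive while it falls:
chirp <- function(t, phi0 = 0, f0 = 1, k = 1) sin(phi0 + 2*pi*(f0*t + k*t^2/2))
x <- seq(1, 30, by = 0.1)
plot(x, chirp(x, f0 = 0.6, k = -0.01), type = "l")  # fast cycles first, widening smoothly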

How to fix a straight line plot of a logistic map in R?

The logistic map (a map is a function that takes its value at any time step to its value at the next time step) is a model that has its roots in the prediction of animal population sizes. It has become famous, in part, due to special cases of its parameterization that exhibit surprising chaotic behavior. The logistic map equation is
x_{i+1} = r x_i (1 - x_i)
where x_i ∈ [0,1] is the ratio of the current population size to the maximum possible size at time i, x_{i+1} is the ratio at the next generation, and r is the driving rate, representing animal reproduction and death. For r < 3.5 the population eventually reaches a stable size or oscillates among a set of fixed values. However, if r > 3.5 then the system destabilizes and exhibits chaotic behavior!
That is background or context for the following problem statement:
Generate a set of points S = {r, x} where, for each r ∈ [1.0, 4.1] in increments of 0.001025, there is a sequence of x_i values for i = 0,...,16. So, for each r value there will be 17 x_i values. Use x_0 = 0.01. Depending on your implementation, you may find the rbind function useful. It may take a few seconds for the code to run since it will generate a lot of points in S. No more than 10 lines of R code.
Admittedly, this is a lab assignment; however, I am not a student in the class. I am learning R, and I am trying to work through the online assignments and come up with a solution myself. I have tried to create the set of points to plot, and based on manual verification of a few points, the set looks accurate.
for (j in 0:3024) {
  x <- numeric(17)                  # holds x_0 ... x_16
  x[1] <- 0.01                      # x_0
  r <- 1 + j * 0.001025
  for (i in 1:16) {
    x[i + 1] <- r * x[i] * (1 - x[i])
  }
  if (j == 0) {
    binded <- cbind(r, x)
  } else {
    binded <- rbind(binded, cbind(r, x))
  }
}
When I invoke plot(binded, pch='.'), RStudio displays the result as a straight line, so I am unsure whether I am using plot correctly, or even whether I am generating all the points correctly. If I decrease the maximum value of j to something less than 2000, I get a sensible plot; it is only when j iterates all the way up to 3024 that the plot collapses to a straight line.
I believe your code is correct. What happens is that once r exceeds 4, the iterations become wildly unstable and diverge toward -infinity. This huge variation in the y values compresses the scale and makes the plot look like a flat line.
Cutting off the tail end of the matrix makes a very interesting plot:
plot(binded[-which(binded[,2]<0),], pch=".")
If you do want to plot the entire matrix, consider manually setting your y-axis limits to [0,1]. This way, the plot won't be stretched down to -1e24.
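For example, using the asker's binded matrix:
plot(binded, pch = ".", ylim = c(0, 1))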
As an added bonus, here's a version in a different plotting library that has points colored by i.
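The original snippet isn't reproduced here, but a minimal sketch of that idea in ggplot2 (the column layout of binded is as built above; the iteration index is reconstructed from the row order) would look something like:
library(ggplot2)
df <- data.frame(r = binded[, 1], x = binded[, 2],
                 i = rep(0:16, times = nrow(binded) / 17))  # 17 iterates per r
ggplot(subset(df, x >= 0), aes(r, x, colour = i)) + geom_point(size = 0.1)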

Find the ranges of minimal and maximal gradients on sigmoidal curve in R

First, this is my first Stack Overflow question, so I apologize for violating any decorum. Second, I realize this will be very trivial, but I'm stumped. I'm trying to figure out how to find the minimum and maximum gradients on a sigmoidal curve.
I have a function that generates a vector of y values that form a sigmoidal curve:
# Function to generate sigmoid curves - works better with enough x values to be smooth
genSigmoid <- function(a, b, c, theta) {
  y <- c + ((1 - c) / (1 + exp(-a * (theta - b))))
  return(y)
}
x <- 1:100
y <- genSigmoid(0.25, 0.50, 0, x)
plot(x, y, type = "n")
lines(x, y)
What I would like to do is find the points along this curve where the gradient is smallest or zero and the points where it is largest. My ultimate goal is to plot the different sections of this curve with different line styles according to the strength of the gradient along the curve. I can generate these different styles by 'eye-balling' it, but it would be nice to have something that can do this more precisely.
You could do this using the grad(...) function in the numDeriv package.
genSigmoid <- function(theta, pars) {
  y <- with(pars, c + ((1 - c) / (1 + exp(-a * (theta - b)))))
  return(y)
}
x <- 1:100
pars <- list(a = 0.25, b = 0.50, c = 0.0)
y <- genSigmoid(x, pars)
plot(x, y, type = "l", ylim = c(0, 1), col = "blue")
library(numDeriv)
z <- grad(genSigmoid, x, pars = pars)
lines(x, z, col = "red")
Here z is a vector of the derivative of genSigmoid(...) with respect to theta.
I redefined your function a bit to make the calling sequence simpler (combined the parameters into a named list, and reversed the order of the arguments).
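Since this particular sigmoid has a simple closed-form derivative, the numerical gradient can be sanity-checked against it (the formula below is just the standard calculus result for this function, not part of the original answer):
z.exact <- with(pars, a * (1 - c) * exp(-a * (x - b)) / (1 + exp(-a * (x - b)))^2)
max(abs(z - z.exact))  # should be near zero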
Plotting segments of the curve with different line styles is a bit trickier:
lt <- as.integer(3 * (z - min(z)) / diff(range(z)) + 1)
df <- data.frame(x, y, z, lt)
plot(x, y, type = "n")
invisible(lapply(split(df, df$lt), function(d) with(d, lines(x, y, lty = lt[1]))))
So this creates a vector of line types (1,2,3, or 4) based on the value of the derivative, then splits the data based on line type, and plots the segments.

kde2d density comparison

I have a question about kde2d (the kernel density estimator). I am computing two different kde2d estimates for two different sets of data in the same variable space. When I compare the two with filled.contour2 or contours, I see that the set with the lower density of points in a scatter plot (it also has fewer points in total, by a factor of 10) has higher density contour values. I was expecting the set with the higher point density to have higher density contour values, but as I said above, that is not the case. Does it have to do with the choice of bandwidth (h)? I am using equal h for both, and I tried changing it, but the result did not change much. What is my error?
An example:
library(MASS)
a <- runif(1000, 5.0, 7.5)
b <- runif(1000, 2.0, 3.0)
c <- runif(100000, 5.0, 7.5)
d <- runif(100000, 2.0, 3.0)
abdens <- kde2d(a, b, n = 100, h = 0.5)
cddens <- kde2d(c, d, n = 100, h = 0.5)
mylevels <- seq(-2.4, 30, 0.9)
filled.contour2(abdens, xlab = "a", ylab = "b", xlim = c(5, 7.5), ylim = c(2, 3),
                col = hsv(seq(0, 1, length = length(mylevels))))
plot(a, b)
contour(abdens, nlevels = 5, add = TRUE, col = "blue")
plot(c, d)
contour(cddens, nlevels = 5, add = TRUE, col = "orange")
I'm not sure I agree that the densities should be different in the uniform case. I would have expected a set with a higher number of points randomly drawn from a Normal distribution to have more support for extreme regions, and therefore lower (estimated) density in the center. That effect might also occasionally be apparent with bivariate Uniform draws of 1,000 points versus 100,000. I hope my modifications to your code are acceptable. It's easier to see the contours if they are drawn after the plots.
(The theoretical density would be the same in both these cases, since the density estimate is normalized to an integral of 1.0. We are only looking at estimates with some expected artifacts from "edge" effects. In the case of univariate densities, information about bounds can be added with the density functions in the logspline package.)
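A quick way to see the normalization point is to approximate the integral of each estimate over its grid; both should come out near 1 regardless of sample size, up to the edge effects mentioned above. A sketch using the objects from the question:
cell <- function(f) diff(f$x[1:2]) * diff(f$y[1:2])  # grid cell area
sum(abdens$z) * cell(abdens)  # ~1, from 1,000 points
sum(cddens$z) * cell(cddens)  # ~1, from 100,000 points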

Looks like a simple graphing problem

At present I have a control to which I need to add the facility to apply varying degrees of acuteness (or sensitivity). The problem is best illustrated as an image:
Graph http://img87.imageshack.us/img87/7886/control.png
As you can see, I have X and Y axes that both have arbitrary limits of 100 - that should suffice for this explanation. At present, my control follows the red line (linear behaviour), but I would like to add the ability to use the other 3 curves (or more), i.e. if a control is more sensitive, then a setting will ignore the linear behaviour and follow one of the other curves. The starting point will always be 0, and the end point will always be 100.
I know that an exponential is too steep, but can't seem to figure a way forward. Any suggestions please?
The curves you have illustrated look a lot like gamma correction curves. The idea there is that the minimum and maximum of the range stays the same as the input, but the middle is bent like you have in your graphs (which I might note is not the circular arc which you would get from the cosine implementation).
Graphically, it looks like this:
(source: wikimedia.org)
So, with that as the inspiration, here's the math...
If your x values ranged from 0 to 1, the function is rather simple:
y = f(x, gamma) = x ^ gamma
Add an xmax value for scaling (i.e. x = 0 to 100), and the function becomes:
y = f(x, gamma) = ((x / xmax) ^ gamma) * xmax
or alternatively:
y = f(x, gamma) = (x ^ gamma) / (xmax ^ (gamma - 1))
You can take this a step further if you want to add a non-zero xmin.
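The two scaled forms above are algebraically identical; a two-line check in R (the sample values are arbitrary):
f1 <- function(x, g, xmax = 100) ((x / xmax)^g) * xmax
f2 <- function(x, g, xmax = 100) (x^g) / (xmax^(g - 1))
all.equal(f1(37, 2.25), f2(37, 2.25))  # TRUE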
When gamma is 1, the line is always perfectly linear (y = x). If gamma is less than 1, the curve bends upward; if gamma is greater than 1, it bends downward. The reciprocal of gamma converts a value back to the original: x = f(y, 1/gamma) = f(f(x, gamma), 1/gamma).
Just adjust the value of gamma according to your own taste and application needs. Since you're wanting to give the user multiple options for "sensitivity enhancement", you may want to give your users choices on a linear scale, say ranging from -4 (least sensitive) to 0 (no change) to 4 (most sensitive), and scale your internal gamma values with a power function. In other words, give the user choices of (-4, -3, -2, -1, 0, 1, 2, 3, 4), but translate that to gamma values of (5.06, 3.38, 2.25, 1.50, 1.00, 0.67, 0.44, 0.30, 0.20).
Coding that in C# might look something like this:
public class SensitivityAdjuster {
    private double _Gamma = 1.0;

    public SensitivityAdjuster() { }

    public SensitivityAdjuster(int level) {
        SetSensitivityLevel(level);
    }

    // Map the user-facing level (-4 .. 4) to gamma = 1.5^(-level), so that
    // positive levels give gamma < 1 (more sensitive, curve bends upward),
    // matching the level-to-gamma table above.
    public void SetSensitivityLevel(int level) {
        _Gamma = Math.Pow(1.5, -level);
    }

    public double Adjust(double x) {
        return Math.Pow(x / 100, _Gamma) * 100;
    }
}
To use it, create a new SensitivityAdjuster, set the sensitivity level according to user preferences (either using the constructor or the method; -4 to 4 would probably be reasonable level values), and call Adjust(x) to get the adjusted output value. If you want a wider or narrower range of reasonable levels, reduce or increase the 1.5 value in the SetSensitivityLevel method. And of course the 100 represents your maximum x value.
I propose a simple formula, that (I believe) captures your requirement. In order to have a full "quarter circle", which is your extreme case, you would use (1-cos((x*pi)/(2*100)))*100.
What I suggest is that you take a weighted average between y=x and y=(1-cos((x*pi)/(2*100)))*100. For example, to have very close to linear (99% linear), take:
y = 0.99*x + 0.01*[(1-cos((x*pi)/(2*100)))*100]
Or more generally, say the level of linearity is L, and it's in the interval [0, 1], your formula will be:
y = L*x + (1-L)*[(1-cos((x*pi)/(2*100)))*100]
EDIT: I changed cos(x/100) to cos((x*pi)/(2*100)), because for the cos result to span the range [1, 0] its argument should span [0, pi/2], not [0, 1]; sorry for the initial mistake.
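For what it's worth, the weighted average is easy to visualize; a small R sketch (R only for consistency with the rest of this page, and the L values are arbitrary):
f <- function(x, L) L * x + (1 - L) * (1 - cos((x * pi) / (2 * 100))) * 100
curve(f(x, 1), from = 0, to = 100, ylab = "y")  # fully linear
for (L in c(0.75, 0.5, 0.25, 0)) curve(f(x, L), add = TRUE)  # increasingly curved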
You're probably looking for something like polynomial interpolation. A quadratic/cubic/quartic interpolation ought to give you the sorts of curves you show in the question. The differences between the three curves you show could probably be achieved just by adjusting the coefficients (which indirectly determine steepness).
The graph of y = x^p for x from 0 to 1 will do what you want as you vary p from 1 (which will give the red line) upwards. As p increases the curve will be 'pushed in' more and more. p doesn't have to be an integer.
(You'll have to scale to get 0 to 100 but I'm sure you can work that out)
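Scaled to a 0..100 axis, that family looks like this (a quick R sketch; the p values are arbitrary):
curve(100 * (x / 100)^1, from = 0, to = 100, ylab = "y")  # p = 1: the red line
for (p in c(1.5, 2, 3)) curve(100 * (x / 100)^p, add = TRUE)  # pushed in further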
I vote for Rax Olgud's general idea, with one modification:
y = alpha * x + (1-alpha)*(f(x/100)*100)
where f(0) = 0, f(1) = 1, f(x) is superlinear, but I don't know where this "quarter circle" idea came from or why 1-cos(x) would be a good choice.
I'd suggest f(x) = x^k where k = 2, 3, 4, 5, whatever gives you the desired degree of steepness for α = 0. Pick a value for k as a fixed number, then vary α to choose your particular curve.
For problems like this, I will often take a few points from a curve and run them through a curve-fitting program. There are a bunch of them out there. Here's one with a 7-day free trial.
I've learned a lot by trying different models. Often you can get a pretty simple expression to come close to your curve.
