Apply a vector to an optimize function in R

How can I apply a vector of observations to find the local maxima between the observations in R? I tried the code below, but according to the plot the result should be just two local maxima. How can I do this in R?
x <- c(0.0000005, 0.1578947, 0.3157895, 0.4736842, 0.6315789, 0.7894737,
       0.9473684, 1.1052632, 1.2631579, 1.4210526, 1.5789474, 1.7368421,
       1.8947368, 2.0526316, 2.2105263, 2.3684211, 2.5263158,
       2.6842105, 2.8421053, 3.000000)
f <- function(x) (x+1)*(x-2)*(x-1)*(x)*(x+1)*(x-2)*(x-3)
plot(x, f(x), type = "l")
maximums <- sapply(x, function(x) optimize(f, c(0, x), maximum = TRUE)$maximum)

I'm not sure how to apply optimize to that sequence for this purpose, but it certainly shouldn't be applied pointwise. You could conceivably fit a polynomial spline and then differentiate it. The numerical analog of differentiation is diff, and the conditions for a local maximum are that the first derivative be small and the second derivative be negative. Here is a plot of the points that satisfy those conditions (shifting the coloring by one to account for the shortening of the vector each time you diff it):
plot(x, f(x),
     col = c("red", "blue")[1 + seq_along(x) %in%   # adding one to the logical values 0,1
            c(0, which(diff(diff(f(x))) < 0 & diff(f(x))[-1] < 0.1))])
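If the goal is to then refine those discrete candidates with optimize(), one option (my sketch, not from the original answer; variable names are my own) is to bracket each sign change of the discrete derivative and run optimize() on that subinterval:

# A minimal sketch: find where the discrete slope flips from positive to
# negative, then refine each candidate maximum with optimize() on the
# bracketing subinterval.
fx <- f(x)
d  <- diff(fx)
# interior points where the slope turns from rising to falling
idx <- which(d[-length(d)] > 0 & d[-1] < 0) + 1
maxima <- sapply(idx, function(i)
  optimize(f, lower = x[i - 1], upper = x[i + 1], maximum = TRUE)$maximum)
maxima  # should recover the two local maxima visible in the plot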

Related

How do I find level sets for a function on R^d, in R?

I am looking for an efficient way to find level sets of an arbitrary function from [0,1]^d to R.
To be clear: by a level set I mean the set of points in [0,1]^d that are mapped to the same value.
In all of my applications the level sets are connected. They are lines, planes, or some higher-dimensional hypersurface, but apart from connectedness they do not satisfy any general criterion.
I am looking for a subset of the level set that has a high density everywhere.
When I limit my functions to 2d, I can use the function contourLines from the package grDevices, which does exactly what I am looking for:
test <- function(x, y) {
  y - (x^2 - 6*x + 9)
}
Mat <- matrix(0, 100, 100)
x <- seq(-10, 10, length.out = 100)
y <- seq(-10, 10, length.out = 100)
for (i in 1:100) {
  for (j in 1:100) {
    Mat[i, j] <- test(x[i], y[j])
  }
}
cont <- contourLines(x, y, Mat, levels = 0)
Unfortunately I have not been able to find a function that does the same trick in higher dimensions.
To give a bit more context to the problem:
I have a 'wild' function of which I know hardly anything, but I can easily evaluate it at any point in R^d. This function divides R^d (or [0,1]^d, to make it a bit simpler) into a positive part (level sets larger than 0) and a negative part (level sets smaller than 0). I am looking for the boundary separating the two, which is the level set for 0.
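No answer is recorded in this excerpt, but one generic approach (my sketch, with hypothetical names, under the assumption that f changes sign across the boundary) is to shoot axis-aligned rays through [0,1]^d and bisect any sign change with uniroot():

# Hypothetical sketch: collect points on the zero level set of f on [0,1]^d
# by bisecting along random rays parallel to the first coordinate axis.
find_level_points <- function(f, d, nrays = 200) {
  pts <- NULL
  for (k in seq_len(nrays)) {
    base <- runif(d)                                # random base point in [0,1]^d
    g <- function(t) { base[1] <- t; f(base) }      # f restricted to the ray
    if (sign(g(0)) != sign(g(1))) {                 # sign change => root on this ray
      base[1] <- uniroot(g, c(0, 1))$root
      pts <- rbind(pts, base)
    }
  }
  pts  # matrix whose rows approximately lie on the zero level set
}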

Graphing a pair of data vectors with a For Loop in R

I'm given a problem set with data for all the x coordinates in one vector and all the y coordinates in a second vector, and I'm able to plot them fine with a single function call. But now I'm asked to run the same code, except plotting with a for loop. I've never done for loops over two vectors before, so I'm a bit lost here.
Here's the exact breakdown of the question:
Write a for loop that iterates over the two vectors, taking corresponding values of the problem.8.x.data vector and the problem.8.y.data vector and plotting a single point at that location:

- Then graph the sequence of points by iterating over the two vectors. (Hint: you can do this by iterating with an index, and then using positive integer indexing to select the elements from the two vectors.)
- In the first iteration of the for loop, plot a point at the location where the $x$-coordinate is the first value of the problem.8.x.data vector and the $y$-coordinate is the first value of the problem.8.y.data vector.
- In the second iteration of the for loop, plot a point at the location where the $x$-coordinate is the second value of the problem.8.x.data vector and the $y$-coordinate is the second value of the problem.8.y.data vector.
- In the third iteration of the for loop, plot a point at the location where the $x$-coordinate is the third value of the problem.8.x.data vector and the $y$-coordinate is the third value of the problem.8.y.data vector.

Your for loop should iterate over all the points, so make sure you get the upper limit right. (Hint: the $x$ and $y$ data vectors must have the same number of values, so you can just calculate the length of one of them to find the number of points.)
The length of each vector is 36 values, so they match up. Originally I thought I had to use two separate loops, but it looks like they want the points plotted by iterating over a single loop. My current code runs fine and gets the correct answer, but I don't think it's actually using the for loop at all:
plot(
  x = NULL,
  xlim = c(-3, 3),
  ylim = c(0, 4),
  main = "Smile.8",
  xlab = "",
  ylab = "",
  las = 1
)
for (index.8b in 1:length(problem.8.x.data)) {
  points(problem.8.x.data, problem.8.y.data,
         pch = 19,
         cex = 2,
         col = "royalblue3")
}
Any advice on how to go about solving this problem?
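No answer is recorded in this excerpt, but a minimal sketch of the indexed loop the exercise describes (using the question's own variable names) would draw exactly one point per iteration:

# One point per iteration, selected with positive-integer indexing
for (i in seq_along(problem.8.x.data)) {
  points(problem.8.x.data[i], problem.8.y.data[i],
         pch = 19, cex = 2, col = "royalblue3")
}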

Assigning variables to a range in R

I am trying to run the following algorithm in R:
1. generate a uniform random variable u
2. find i such that F(x(i-1)) < u <= F(x(i))
3. return x = x(i)
In my case, I segmented my F(x) so that it is given by:
cdf:
[1] 0.0000000000 0.0001524158 0.0025910684 0.0196616369 0.0879439110
[6] 0.2586495961 0.5317786923 0.8049077884 0.9609815577 1.0000000000
So for example F(x(1)) = cdf[2]
Then I generate a vector of random uniforms:
u <- runif(10000, 0, 1)
But I am having trouble assigning each element of that vector to a specific range in the cdf. I've tried a for loop with many if statements, but this is tedious and error prone.
I've also tried the following with a while statement:
x <- u
for (i in 1:length(u)) {
  for (j in 1:length(cdf))
    while (x[i] < cdf[j]) { x[i] == which(cdf[j] >= x[i]) }
}
Any suggestions?
I think you want to use cut(), as in:
cutPoints <- c(0.0000000000,  # could set to -1. See comment below.
               0.0001524158,
               0.0025910684,
               0.0196616369,
               0.0879439110,
               0.2586495961,
               0.5317786923,
               0.8049077884,
               0.9609815577,
               1.0000000000)
u <- runif(1000)
cut(u,
    cutPoints,
    labels = seq.int(length(cutPoints) - 1))
Notice that the length of the (optional) labels argument is one less than the number of cut points, because the labels name the intervals between the cut points. See ?cut for details.
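As an aside (my addition, not from the thread), findInterval() does the same bucketing and returns integer indices directly, which matches the "return x = x(i)" step of the algorithm; x.values below is a hypothetical vector of the sampled support points:

# For each u, the index i such that cdf[i] < u <= cdf[i+1]; given
# F(x(1)) = cdf[2], this i is exactly the i in "return x = x(i)"
idx <- findInterval(u, cdf, left.open = TRUE)
# x.values[idx] would then give the sampled x's (x.values is hypothetical)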

Detecting dips in a 2D plot

I need to automatically detect dips in a 2D plot, like the regions marked with red circles in the figure below. I'm only interested in the "main" dips, meaning each dip has to span a minimum length on the x axis. The number of dips is unknown, i.e., different plots will contain different numbers of dips. Any ideas?
Update:
As requested, here's the sample data, together with an attempt to smooth it using median filtering, as suggested by vines.
It looks like I now need a robust way to approximate the derivative at each point that ignores the little blips remaining in the data. Is there any standard approach?
y <- c(0.9943,0.9917,0.9879,0.9831,0.9553,0.9316,0.9208,0.9119,0.8857,0.7951,0.7605,0.8074,0.7342,0.6374,0.6035,0.5331,0.4781,0.4825,0.4825,0.4879,0.5374,0.4600,0.3668,0.3456,0.4282,0.3578,0.3630,0.3399,0.3578,0.4116,0.3762,0.3668,0.4420,0.4749,0.4556,0.4458,0.5084,0.5043,0.5043,0.5331,0.4781,0.5623,0.6604,0.5900,0.5084,0.5802,0.5802,0.6174,0.6124,0.6374,0.6827,0.6906,0.7034,0.7418,0.7817,0.8311,0.8001,0.7912,0.7912,0.7540,0.7951,0.7817,0.7644,0.7912,0.8311,0.8311,0.7912,0.7688,0.7418,0.7232,0.7147,0.6906,0.6715,0.6681,0.6374,0.6516,0.6650,0.6604,0.6124,0.6334,0.6374,0.5514,0.5514,0.5412,0.5514,0.5374,0.5473,0.4825,0.5084,0.5126,0.5229,0.5126,0.5043,0.4379,0.4781,0.4600,0.4781,0.3806,0.4078,0.3096,0.3263,0.3399,0.3184,0.2820,0.2167,0.2122,0.2080,0.2558,0.2255,0.1921,0.1766,0.1732,0.1205,0.1732,0.0723,0.0701,0.0405,0.0643,0.0771,0.1018,0.0587,0.0884,0.0884,0.1240,0.1088,0.0554,0.0607,0.0441,0.0387,0.0490,0.0478,0.0231,0.0414,0.0297,0.0701,0.0502,0.0567,0.0405,0.0363,0.0464,0.0701,0.0832,0.0991,0.1322,0.1998,0.3146,0.3146,0.3184,0.3578,0.3311,0.3184,0.4203,0.3578,0.3578,0.3578,0.4282,0.5084,0.5802,0.5667,0.5473,0.5514,0.5331,0.4749,0.4037,0.4116,0.4203,0.3184,0.4037,0.4037,0.4282,0.4513,0.4749,0.4116,0.4825,0.4918,0.4879,0.4918,0.4825,0.4245,0.4333,0.4651,0.4879,0.5412,0.5802,0.5126,0.4458,0.5374,0.4600,0.4600,0.4600,0.4600,0.3992,0.4879,0.4282,0.4333,0.3668,0.3005,0.3096,0.3847,0.3939,0.3630,0.3359,0.2292,0.2292,0.2748,0.3399,0.2963,0.2963,0.2385,0.2531,0.1805,0.2531,0.2786,0.3456,0.3399,0.3491,0.4037,0.3885,0.3806,0.2748,0.2700,0.2657,0.2963,0.2865,0.2167,0.2080,0.1844,0.2041,0.1602,0.1416,0.2041,0.1958,0.1018,0.0744,0.0677,0.0909,0.0789,0.0723,0.0660,0.1322,0.1532,0.1060,0.1018,0.1060,0.1150,0.0789,0.1266,0.0965,0.1732,0.1766,0.1766,0.1805,0.2820,0.3096,0.2602,0.2080,0.2333,0.2385,0.2385,0.2432,0.1602,0.2122,0.2385,0.2333,0.2558,0.2432,0.2292,0.2209,0.2483,0.2531,0.2432,0.2432,0.2432,0.2432,0.3053,0.3630,0.3578,0.3630,0.3668,0.3263,0.3992,0.4037,0.4556,0.4703,0.5173,0.6219,0.6412,0.7275,0.6984,0.6756,0.7079,0.7192,0.7342,0.7458,0.7501,0.7540,0.7605,0.7605,0.7342,0.7912,0.7951,0.8036,0.8074,0.8074,0.8118,0.7951,0.8118,0.8242,0.8488,0.8650,0.8488,0.8311,0.8424,0.7912,0.7951,0.8001,0.8001,0.7458,0.7192,0.6984,0.6412,0.6516,0.5900,0.5802,0.5802,0.5762,0.5623,0.5374,0.4556,0.4556,0.4333,0.3762,0.3456,0.4037,0.3311,0.3263,0.3311,0.3717,0.3762,0.3717,0.3668,0.3491,0.4203,0.4037,0.4149,0.4037,0.3992,0.4078,0.4651,0.4967,0.5229,0.5802,0.5802,0.5846,0.6293,0.6412,0.6374,0.6604,0.7317,0.7034,0.7573,0.7573,0.7573,0.7772,0.7605,0.8036,0.7951,0.7817,0.7869,0.7724,0.7869,0.7869,0.7951,0.7644,0.7912,0.7275,0.7342,0.7275,0.6984,0.7342,0.7605,0.7418,0.7418,0.7275,0.7573,0.7724,0.8118,0.8521,0.8823,0.8984,0.9119,0.9316,0.9512)
yy <- runmed(y, 41)
plot(y, type="l", ylim=c(0,1), ylab="", xlab="", lwd=0.5)
points(yy, col="blue", type="l", lwd=2)
EDITED: the function now strips the regions down to nothing but the lowest part, if wanted.
Actually, using the mean is easier than using the median. It allows you to find regions where the real values are continuously below the mean. The median is not smooth enough for an easy application.
One example function to do this would be:
FindLowRegion <- function(x, n = length(x)/4, tol = length(x)/20, p = 0.5) {
  nx <- length(x)
  n <- 2 * (n %/% 2) + 1
  # smooth out based on means
  sx <- rowMeans(embed(c(rep(NA, n/2), x, rep(NA, n/2)), n), na.rm = TRUE)
  # find which stretches lie below the running mean
  rlesx <- rle((sx - x) > 0)
  # construct start and end of regions
  int <- embed(cumsum(c(1, rlesx$lengths)), 2)
  # which regions fulfill the requirements
  id <- rlesx$value & rlesx$length > tol
  # cut regions down so they are in general smaller than the median
  regions <- apply(int[id, ], 1, function(i) {
    i <- min(i):max(i)
    tmp <- x[i]
    id <- which(tmp < quantile(tmp, p))
    id <- min(id):max(id)
    i[id]
  })
  # return the indices of all low regions
  unlist(regions)
}
where
n determines how many values are used to calculate the running mean,
tol determines how many consecutive values should be lower than the running mean to count as a low region, and
p determines the cutoff (as a quantile) used for stripping the regions down to their lowest part. When p = 1, the complete low region is returned.
The function is tweaked to work on the data as you presented it, but the numbers might need to be adjusted a bit to work with other data.
This function returns a set of indices, which lets you locate the low regions. Illustrated with your y vector:
Lows <- FindLowRegion(y)
newx <- seq_along(y)
newy <- ifelse(newx %in% Lows, y, NA)
plot(y, col = "blue", type = "l", lwd = 2)
lines(newx, newy, col = "red", lwd = 3)
Gives a plot of y in blue with the detected low regions overlaid in red.
You have to smooth the graph in some way. Median filtering is quite useful for that purpose (see http://en.wikipedia.org/wiki/Median_filter). After smoothing, you simply have to search for the minima, just as usual (i.e., search for the points where the first derivative switches from negative to positive).
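A minimal sketch of that last step on the smoothed series yy from above (my code, not the answerer's; strict sign changes may miss the flat plateaus that runmed produces):

# Local minima of the smoothed series: first difference flips from - to +
d <- diff(yy)
mins <- which(d[-length(d)] < 0 & d[-1] > 0) + 1
points(mins, yy[mins], col = "red", pch = 19)  # mark the minima on the plot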
A simpler answer (which also does not require smoothing) could be obtained by adapting the maxdrawdown() function from the tseries package. A drawdown is commonly defined as the retreat from the most recent maximum; here we want the opposite. Such a function could then be used in a sliding window over the data, or over segmented data.
maxdrawdown <- function(x) {
  if (NCOL(x) > 1)
    stop("x is not a vector or univariate time series")
  if (any(is.na(x)))
    stop("NAs in x")
  cmaxx <- cummax(x) - x
  mdd <- max(cmaxx)
  to <- which(mdd == cmaxx)
  from <- double(NROW(to))
  for (i in 1:NROW(to))
    from[i] <- max(which(cmaxx[1:to[i]] == 0))
  return(list(maxdrawdown = mdd, from = from, to = to))
}
So instead of using cummax(), one would have to switch to cummin() etc.
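A hedged sketch of that adaptation (my code, mirroring the tseries original above): measure the rise from the most recent minimum instead of the fall from the most recent maximum.

# "Max run-up": largest rise above the running minimum, with its location
maxrunup <- function(x) {
  cminx <- x - cummin(x)   # distance above the running minimum
  mru <- max(cminx)
  to <- which(cminx == mru)
  from <- sapply(to, function(t) max(which(cminx[1:t] == 0)))
  list(maxrunup = mru, from = from, to = to)
}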
My first thought was something much cruder than filtering. Why not look for the big drops followed by long enough stable periods?
span.b <- 20          # how far back to look for the drop
threshold.b <- 0.2    # how big the drop must be
dy.b <- c(rep(NA, span.b), diff(y, lag = span.b))
span.f <- 10          # how far forward to check for stability
threshold.f <- 0.05   # how flat the forward window must be
dy.f <- c(diff(y, lag = span.f), rep(NA, span.f))
down <- which(dy.b < -1 * threshold.b & abs(dy.f) < threshold.f)
abline(v = down)
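A small usage note (my addition): abline() only annotates an open graphics device, so the series should be plotted before running the snippet above, e.g.:

plot(y, type = "l")   # open the plot that abline(v = down) will annotate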
The plot shows that it's not perfect, but it doesn't discard the outliers (I guess it depends on your take on the data).

R question about plotting probability/density histogram the right way

I have a matrix of dimension [500, 2], so 500 rows and 2 columns. The left column gives the x values of the observations, and the right column gives the probability (density) with which each x occurs, so a typical probability density relationship.
My question is how to plot the histogram the right way, so that the x axis shows the x values and the y axis shows the density (0.01 to 1.00). The bandwidth of the estimator is 0.33.
Thanks in advance!
The end of the whole data set looks like this, just for a little orientation:
[490,] 2.338260830 0.04858685
[491,] 2.347839477 0.04797310
[492,] 2.357418125 0.04736149
[493,] 2.366996772 0.04675206
[494,] 2.376575419 0.04614482
[495,] 2.386154067 0.04553980
[496,] 2.395732714 0.04493702
[497,] 2.405311361 0.04433653
[498,] 2.414890008 0.04373835
[499,] 2.424468656 0.04314252
[500,] 2.434047303 0.04254907
@everyone,
yes, I have made the estimation before, so the bandwidth is what I mentioned. The data is ordered from low to high values, so respectively the probability at the beginning is 0.22, at the peak about 0.48, and at the end 0.15.
The line with the density plots like a charm, but what I have to do in addition is plot a histogram. So how can I do this, ordering the blocks properly (how should the data be split into boxes, etc.)?
Any suggestions?
Here is a part of the data AFTER the estimation; all values are discrete, so I assume a histogram can be created, hopefully.
[491,] 4.956164 0.2618131
[492,] 4.963014 0.2608723
[493,] 4.969863 0.2599309
[494,] 4.976712 0.2589889
[495,] 4.983562 0.2580464
[496,] 4.990411 0.2571034
[497,] 4.997260 0.2561599
[498,] 5.004110 0.2552159
[499,] 5.010959 0.2542716
[500,] 5.017808 0.2533268
[501,] 5.024658 0.2523817
Best regards,
I appreciate the fast responses! (bow)
Would it do the job to create a histogram just for the indices, grouping them in blocks of, say, 25 or 50 each, and computing the average probability for each block of 25/50/100/150/200/250 etc.?
Assuming the rows are in order from lowest to highest value of x, as they appear to be, you can use the default plot command; the only change you need is the type:
plot(your.data, type = 'l')
EDIT:
Ok, I'm not sure this is better than the density plot, but it can be done:
x <- dnorm(seq(-1, 1, length = 500))
x.bins <- rep(1:50, each = 10)   # 50 bins of 10 consecutive points each
bars <- aggregate(x, by = list(x.bins), FUN = sum)[, 2]
barplot(bars)
In your case, replace x with the probabilities from the second column of your matrix.
EDIT2:
On second thought, this only makes sense if your 500 rows represent discrete events. If they are instead points along a continuous distribution function, adding them together as I have done is incorrect. Mathematically, I don't think you can produce the binned probability for a range using only a few points from within that range.
Assuming M is the matrix, wouldn't this just be:
plot(x = M[, 1], y = M[, 2])
You have already done the density estimation since this is not the original data.
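For completeness, a hypothetical sketch of the usual workflow when the raw sample is available (the thread only has the estimated density, so the sample vector below is a stand-in, not the asker's data):

# With the original observations, overlay the bw = 0.33 density estimate
# on a probability (freq = FALSE) histogram
sample <- rnorm(500)   # stand-in for the original data
hist(sample, freq = FALSE, main = "Density histogram")
lines(density(sample, bw = 0.33), col = "blue", lwd = 2)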
