Finding local maxima in R - r

I am trying to identify local maxima in a data set and am using the following code that I found from LyzandeR:
library(data.table)
maximums <- function(x) which(x - shift(x, 1) > 0 & x - shift(x, 1, type='lead') > 0)
maximums(data$Y)
Doing so gives me the correct maxima values for non-noisy data, but for noisy data it incorrectly (for my purposes, it's working as written) identifies small fluctuations as independent peaks. I've been unable to determine how to extend the range for the maximum, i.e. in order for a point to be considered a maximum, it needs to be greater than, say, 5 adjacent values. Is that possible given this code?
Thank you!

Related

Is it possible to create a countif like function in R using ranges?

I've already read this question with an approach to counting entries in R:
how to realize countifs function (excel) in R
I'm looking for a similar approach, except that I want to count data that is within a given range.
For example, let's say I have this dataset:
data <- data.frame( values = c(1,1.2,1.5,1.7,1.7,2))
Following the approach on the linked question, we would develop something like this:
count <- data$values == 1.5
sum(count)
Problem is, I want to be able to include in the count anything that varies 0.2 from 1.5 - that is, all possible number from 1.3 to 1.7.
Is there a way to do so?
sum(data$values>=1.3 & data$values<=1.7)
As the explanation in the question you linked to points out, when you just write out a boolean condition, it generates a vector of TRUEs and FALSEs the same length as your original dataframe. TRUE equals 1 and FALSE equals 0, so summing across it gives you a count. So it simply becomes a matter of putting your condition as a boolean phrase. In the case of more than one condition, you connect them with & or | (or) -- much the same way that you could do in excel (only in excel you have to do AND() or OR()).
(For a more general solution, you can use dplyr::between - it's also supposed to be faster since it's implemented in C++. In this case, it would be sum(between(data$values,1.3,1.7).)
Like #doviod writes, you can use a compound logical condition.
My approach is different, I wrote a function that takes the vector and as range the center point value and the distance delta.
After a suggestion by #doviod, I have set a default value delta = 0, so that if only value is passed, the function returns
a count of cases where the values equal the value the user provides.
(doviod, in the comment)
countif <- function(x, value, delta = 0)
sum(value - delta <= x & x <= value + delta)
data <- data.frame( values = c(1,1.2,1.5,1.7,1.7,2))
countif(data$values, 1.5, 0.2)
#[1] 3
which identifies the location of all values in your vector that satisfy your criterion, and length subsequently counts the 'hits'.
length( which(data$values>=1.3 & data$values<=1.7) )
[1] 3

Count duration of value in vector in R

I am trying to count the length of occurrances of a value in a vector such as
q <- c(1,1,1,1,1,1,4,4,4,4,4,4,4,4,4,4,4,4,6,6,6,6,6,6,6,6,6,6,1,1,4,4,4)
Actual vectors are longer than this, and are time based. What I would like would be an output for 4 that tells me it occurred for 12 time steps (before the vector changes to 6) and then 3 time steps. (Not that it occurred 15 times total).
Currently my ideas to do this are pretty inefficient (a loop that looks element by element that I can have stop when it doesn't equal the value I specified). Can anyone recommend a more efficient method?
x <- with(rle(q), data.frame(values, lengths)) will pull the information that you want (courtesy of d.b. in the comments).
From the R Documentation: rle is used to "Compute the lengths and values of runs of equal values in a vector – or the reverse operation."
y <- x[x$values == 4, ] will subset the data frame to include only the value of interest (4). You can then see clearly that 4 ran for 12 times and then later for 3.
Modifying the code will let you check whatever value you want.

Counting consecutive repeats, and returning the maximum value in each in each string of repeats if over a threshold

I am working with long strings of repeating 1's and 0's representing the presence of a phenomenon as a function of depth. If this phenomenon is flagged for over 1m, it is deemed significant enough to use for further analyses, if not it could be due to experimental error.
I ultimately need to get a total thickness displaying this phenomenon at each location (if over 1m).
In a dummy data set the input and expected output would look like this:
#Depth from 0m to 10m with 0.5m readings
depth <- seq(0, 10, 0.5)
#Phenomenon found = 1, not = 0
phenomflag <- c(1,0,1,1,1,1,0,0,1,0,1,0,1,0,1,1,1,1,1,0)
What I would like as an output is a vector with: 4, 5 (which gets converted back to 2m and 2.5m)
I have attempted to solve this problem using
y <- rle(phenomflag)
z <- y$length[y$values ==1]
but once I have my count, I have no idea how to:
a) Isolate 1 maximum number from each group of consecutive repeats.
b) Restrict to consecutive strings longer than (x) - this might be easier after a.
Thanks in advance.
count posted a good solution in the comments section.
y <- y <- rle(repeating series of 1's and 0's)
x <- cbind(y$lengths,y$values) ; x[which(x[,1]>=3 & x[,2]==1)]
This results in just the values that repeat more than a threshold of 2, and just the maximum.

Expected value of the difference between a sum of variables and a threshold

I had a custom deck consisting of eight cards of the sequence 2^n, n=0,..,6. I draw cards (without replacement) until the sum is equal or greater than the threshold. How can I implement in R a function that calculates the mean of the difference between the sum and the threshold??
I tried to do it using this How to store values in a vector with nested functions
but it takes ages... I think there is a way to do it with probabilities/simulations but I can figure out.
The threshold could be greater than the value of one single card, ie, threshold=500 or less than the value of a single card, ie, threshold=50
What I have done so far is to find all the subsets that meet the condition of the sum greater or equal to the threshold. Then I will only substract the threshold and calculate the mean.
I am using the following code in R. For a small set I get the answer quite fast. However, I have been running the function for several ours with the set containing the 56 numbers and is still working.
set<-c(rep(1,8),rep(2,8), rep(4,8),rep(8,8),rep(16,8),rep(32,8),rep(64,8))
recursive.subset <-function(x, index, current, threshold, result){
for (i in index:length(x)){
if (current + x[i] >= threshold){
store <<- append(store, sum(c(result,x[i])))
} else {
recursive.subset(x, i + 1, current+x[i], threshold, c(result,x[i]))
}
}
}
store <- vector()
inivector <- vector(mode="numeric", length=0) #initializing empty vector
recursive.subset (set, 1, 0, threshold, inivector)
I don't know if it is possible to get an exact solution, simply because there are so many possible combinations. It is probably better to do simulations, i.e. write a script for 1 full draw and then rerun that script many times. Since the solutions are very similar, the simulation should give a pretty good approximation.
Ok, here goes:
set <- rep(2^(0:6), each = 8)
thr <- 500
fun <- function(set,thr){
x <- cumsum(sample(set))
value <- x[min(which(x >= thr))]
value
}
system.time(a <- replicate(100000, fun(set,thr)))
# user system elapsed
# 1.10 0.00 1.09
mean(a - thr)
# [1] 21.22992
Explanation: Rather than drawing a card one at a time, I draw all cards simultaneously (sample) and then calculate the cumulative sum (cumsum). I then find the point where the cards at up to the threshold or larger, and find the sum of those cards back in x. We run this function many times with replicate, to obtain a vector of the values. We use mean(a-thr) to calculate the mean difference.
Edit: Made a really stupid typo in the code, fixed it now.
Edit2: Shortened the function a little.

Trouble with a loop statement in R

I am having trouble writing the proper R code to perform a looped, if else, conditional test. I am trying to solve for x (must be a whole number), such that F_c = 5 (see below). Both z and w are a series of known values, with z representing abundance values and w representing area sampled. Right now I am essentially entering random values for x to see how close I can get to F_c = 5. I would like to write a loop for this, and also have the loop stop when an iteration of x results in F_c = 5. Any help would be very appreciated, I have spent a lot of time on this and haven't found a similar question posted yet (but if there is one please direct me to the solution). Thanks,
cond = ifelse(z<=x, 1, 0)
F_c = 100*(sum(w*z*cond)/sum(w*z))
Not much clear, but I'd assume you want to know at which point the cumulative sum of w*z reaches the five per cent of sum(w*z), while following the order of z. If that's correct, you can try this:
#for every z get the order indices
indices<-order(z)
#sort both z and w by z
z<-z[indices]
w<-w[indices]
#now cumsum will give you the cumulative sum of a vector
#and you compare it to sum(z*w).
#findInterval will give you the index of when you reach .05
res<-findInterval(.05,cumsum(w*z)/sum(w*z))
#the value you are looking for:
z[res]

Resources