I recently tried to adapt some pseudocode for an in-place quicksort, quoted below:
function quicksort(array)
    if length(array) > 1
        pivot := select any element of array
        left := first index of array
        right := last index of array
        while left ≤ right
            while array[left] < pivot
                left := left + 1
            while array[right] > pivot
                right := right - 1
            if left ≤ right
                swap array[left] with array[right]
                left := left + 1
                right := right - 1
        quicksort(array from first index to right)
        quicksort(array from left to last index)
Following this, I wrote this code:
quicksort <- function(array)
{
  len <- length(array)
  if (len > 1)
  {
    left <- 1
    right <- len
    pivot <- array[(left + right) %/% 2]
    while (left <= right)
    {
      while (array[left] < pivot) {left <- left + 1}
      while (array[right] < pivot) {right <- right - 1}
      if (left <= right)
      {
        array[c(left, right)] <- array[c(right, left)]
        left <- left + 1
        right <- right - 1
      }
    }
    array <- quicksort(array[1:right]) # Bug here
    array <- quicksort(array[left:len])
  }
  array
}
If you run this code with more than one integer as input, you'll find that it eventually tries to sort vectors of NAs. I suspect the problem is that the pseudocode wants "array from first index to right" (my 1:right) to be read as an empty sequence when right is zero. As is well known, R actually reads 1:0 as the decreasing sequence 1 0.
Is there any function that gives the behavior I intend? I could do this with an if statement, but R is usually good enough with sequences that I can't help but think there is a better way. I tried seq, but it throws an error for anything like seq(from=1,to=0,by=1).
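Base R's seq_len() has exactly the "empty when zero" behavior asked about: seq_len(n) returns 1, ..., n, and integer(0) when n is 0. A minimal illustration, with made-up values for right, left and len:

```r
right <- 0
print(1:right)         # [1] 1 0    (counts down, so it is not empty)
print(seq_len(right))  # integer(0) (empty, as the pseudocode intends)

# "From left to the last index" can be built the same way. It is empty when
# left == len + 1; seq_len() errors on negative lengths, so left must not exceed len + 1.
len <- 5
left <- 6
upper <- left - 1 + seq_len(len - left + 1)
print(length(upper))   # [1] 0
```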
There are two issues with the code:
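One of the inequality signs is flipped: the second inner loop must test array[right] > pivot, not array[right] < pivot.
The recursive calls must assign their result back into a subset of array (array[1:right] <- ...) rather than replace, and thereby shrink, the whole array.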
Here is the corrected code:
quicksort <- function(array) {
  len <- length(array)
  if (len > 1) {
    left <- 1
    right <- len
    pivot <- array[(left + right) %/% 2]
    while (left <= right) {
      while (array[left] < pivot) {
        left <- left + 1
      }
      while (array[right] > pivot) { # Changed "<" to ">"
        right <- right - 1
      }
      if (left <= right) {
        array[c(left, right)] <- array[c(right, left)]
        left <- left + 1
        right <- right - 1
      }
    }
    # Modified the following two lines to only set a subset of array
    array[1:right] <- quicksort(array[1:right])
    array[left:len] <- quicksort(array[left:len])
  }
  array
}
quicksort(c(2, 6, 3, 1, 4, 5))
#> [1] 1 2 3 4 5 6
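If you would rather follow the pseudocode literally and skip an empty side entirely, the two recursion ranges can be built with seq_len(), which returns an empty index when a side has no elements. This is a sketch; quicksort2 is a hypothetical name, and the partition loop is the same Hoare-style loop as in the corrected code.

```r
quicksort2 <- function(array) {
  len <- length(array)
  if (len <= 1) return(array)
  left <- 1
  right <- len
  pivot <- array[(left + right) %/% 2]
  # Hoare-style partition around the pivot value
  while (left <= right) {
    while (array[left] < pivot) left <- left + 1
    while (array[right] > pivot) right <- right - 1
    if (left <= right) {
      array[c(left, right)] <- array[c(right, left)]
      left <- left + 1
      right <- right - 1
    }
  }
  lower <- seq_len(right)                      # empty when right == 0
  upper <- left - 1 + seq_len(len - left + 1)  # empty when left == len + 1
  array[lower] <- quicksort2(array[lower])
  array[upper] <- quicksort2(array[upper])
  array
}

quicksort2(c(2, 6, 3, 1, 4, 5))
```

Comparing quicksort2(x) with sort(x) on many random vectors containing repeated values is a cheap sanity check for partition code like this.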
I have a time series and would like to find minima that fulfill specific criteria. The number of points within the valley (below the red line) between two peaks should exceed a certain value, and the number of points above the red line should also exceed a certain value for each peak neighboring the valley. In addition, the lower of the two peaks should be resolved at a value lower than 50% of its height, meaning that the max(intensity) of the lower peak should be at least twice the lowest intensity within the valley between the two peaks (as calculated in the code below). I drew the red line at a specific height here, but in reality these unresolved peaks can have any height and can be separated by any distance. So what I am doing at the moment is "scanning" with the red line over each point of the time series, which is of course very slow and inefficient.
So here is the for-loop I am using at the moment:
library(data.table)
library(S4Vectors) # provides the start()/end() methods for Rle objects

detect_double_peaks <- function(pot.doubleP.v, Min.PpP = 10) {
  peak.dt <- data.table(breakP = NA, breakH = NA, resolved = NA)
  # "Scan" the red line over every candidate level
  for (point in pot.doubleP.v[pot.doubleP.v > 0 &
                              pot.doubleP.v < 0.8 * max(pot.doubleP.v)]) {
    # Runs of points above (TRUE) and at or below (FALSE) the red line
    doublePeak.rle <- S4Vectors::Rle(pot.doubleP.v > point)
    doublePeak.rle.dt <- data.table(
      idx = as.numeric(seq.int(length(doublePeak.rle@values))),
      values = doublePeak.rle@values,
      lengths = doublePeak.rle@lengths,
      start = start(doublePeak.rle),
      end = end(doublePeak.rle)
    )
    # Peaks: runs above the line with more than Min.PpP points
    doublePeak.rle.dt_p <- doublePeak.rle.dt[values == TRUE & lengths > Min.PpP]
    if (nrow(doublePeak.rle.dt_p) > 1) {
      # Each pair of neighbouring peaks: 1, ..., n - 1
      for (peak in seq_len(nrow(doublePeak.rle.dt_p) - 1)) {
        # Runs lying between the two peaks (the valley)
        doublePeak.rle.dt_v <- doublePeak.rle.dt[idx > doublePeak.rle.dt_p[peak]$idx &
                                                 idx < doublePeak.rle.dt_p[peak + 1]$idx]
        if (sum(doublePeak.rle.dt_v[values == FALSE]$lengths) >=
            max(max(doublePeak.rle.dt_p[peak]$lengths,
                    doublePeak.rle.dt_p[peak + 1]$lengths) * 0.5, Min.PpP)) {
          peak1 <- pot.doubleP.v[doublePeak.rle.dt_p[peak]$start:doublePeak.rle.dt_p[peak]$end]
          peak2 <- pot.doubleP.v[doublePeak.rle.dt_p[peak + 1]$start:doublePeak.rle.dt_p[peak + 1]$end]
          dp.p_height_h <- max(max(peak1), max(peak2)) # - baseL
          dp.p_height_l <- min(max(peak1), max(peak2)) # - baseL
          # All points between the end of the first peak and the start of the second
          valley <- pot.doubleP.v[
            min(doublePeak.rle.dt[idx > doublePeak.rle.dt_p[peak]$idx]$start):
            max(doublePeak.rle.dt[idx < doublePeak.rle.dt_p[peak + 1]$idx]$end)]
          breakH <- min(valley) # - baseL
          resolved <- breakH / dp.p_height_l * 100
          breakP <- which.min(valley) + doublePeak.rle.dt_p[peak]$end
          peak.dt <- rbind(peak.dt,
                           data.table(breakP = breakP, breakH = breakH, resolved = resolved))
        }
      }
    }
  }
  if (nrow(peak.dt) == 1) {
    return(NULL)
  } else {
    return(na.omit(unique(peak.dt, by = "breakP")))
  }
}
Here are some example data:
testvector <- c(13126.177734375, 12040.060546875, 10810.6171875, 10325.94140625,
13492.8359375, 33648.5703125, 14402.603515625, 29920.12890625,
24316.224609375, 36019.26171875, 34492.4609375, 53799.82421875,
45988.72265625, 47930.453125, 67438.9140625, 61231.83984375,
56710.9140625, 62301.6796875, 54844.7578125, 70913.578125, 81028.1640625,
75234.203125, 59611.05078125, 79240.4375, 52313.3828125, 78758.2734375,
87918.5859375, 80764.7421875, 108035.5390625, 76263.875, 72401.6796875,
83167.640625, 76173.96875, 66241.4296875, 68687.4375, 52107.83984375,
45672.5390625, 51907.33203125, 39967.453125, 58856.90625, 52402.53125,
36980.3125, 43365.76171875, 40480.75, 39057.96484375, 31622.58984375,
23830.455078125, 27393.30078125, 30675.208984375, 27327.48046875,
25150.08984375, 23746.212890625, 9637.625, 19065.58984375, 21367.40625,
6789.0625, 9892.7490234375, 26820.685546875, 19965.353515625,
28281.462890625, 25495.0703125, 28808.416015625, 40244.03125,
35159.421875, 35257.984375, 39971.8046875, 34710.4453125, 60987.73828125,
50620.06640625, 58757.69140625, 52998.97265625, 55601.96484375,
69057.9453125, 58486.52734375, 66115.4765625, 80801.7578125,
77444.6015625, 43545.48828125, 79545.0703125, 50352.484375, 77401.8671875,
85118.421875, 80521.9296875, 68945.8125, 93098.0234375, 83065.8046875,
95970.8203125, 74141.8828125, 90298.75, 81251.0234375, 99658.3359375,
88812.2578125, 81939.4921875, 82632.1015625, 100125.0078125,
71627.84375, 70560.1484375, 77831.765625, 68122.328125, 79049.140625,
88000.890625, 64897.4453125, 57333.3046875, 68185.3046875, 67742.3515625,
58941.85546875, 63184.8671875, 36998.67578125, 45416.58984375,
31547.3359375, 32141.58203125, 35292.9765625, 30511.861328125,
25419.716796875, 23901.431640625, 15616.8759765625, 14469.16015625,
15026.0009765625, 18321.42578125, 15820.861328125, 19532.056640625,
13230.6240234375, 14586.76953125, 14912.642578125, 8541.5224609375,
21740.98046875, 19588.986328125, 18603.662109375, 19656.5625,
10812.94921875, 18379.3359375, 31242.716796875, 25626.0390625,
42446.71875, 27782.294921875, 38450.703125, 39070.97265625, 52914.375,
56484.47265625, 47741.88671875, 52397.18359375, 79378.2109375,
77866.078125, 55902.09765625, 66988.2265625, 63571.01171875,
66192.53125, 79989.8046875, 57204.59765625, 51172.9921875, 49612.16015625,
60508.0390625, 69518.09375, 48079.5625, 48691.0390625, 33679.12890625,
30697.470703125, 31209.359375, 49656.16796875, 32041.912109375,
13851.48828125, 29316.44921875, 31586.216796875, 45422.19921875,
24208.515625, 31496.083984375, 26250.646484375, 14318.302734375
)
For this vector the minima at 56 and 125 should be returned.
These should be returned because, when scanning the red line through the points of the vector, there is at least one level at which there are more than Min.PpP = 10 consecutive points above the red line on each side of the valley and, at that same level, more than Min.PpP = 10 points in the valley. Point 4 should not be returned as a minimum because, no matter where the red line is placed, the valley never exceeds 3 points (fewer than Min.PpP = 10) and the peak to the left of that minimum is only 1 point (also fewer than Min.PpP = 10).
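For one position of the red line, the peaks and valleys can be found with base R's rle() on x > level, without building an S4Vectors Rle and a data.table at every level. This is an illustrative sketch under stated assumptions, not a drop-in replacement: valleys_at_level() is a hypothetical helper, a peak is taken to be a run of more than min_len points strictly above the line (as pot.doubleP.v > point does in the loop), and valley points are counted as those at or below the line between two neighbouring peaks. The 50% resolution criterion and the extra half-length criterion are left out.

```r
# Hypothetical helper: for ONE red line at `level`, find peaks (runs of more
# than `min_len` points strictly above `level`) and describe the valley
# between each pair of neighbouring peaks.
valleys_at_level <- function(x, level, min_len = 10) {
  r <- rle(x > level)                            # base R run-length encoding
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  peak <- which(r$values & r$lengths > min_len)  # indices of qualifying runs
  if (length(peak) < 2) {
    return(data.frame(breakP = integer(0), breakH = numeric(0), valley_n = integer(0)))
  }
  out <- lapply(seq_len(length(peak) - 1), function(i) {
    # Points strictly between the two peaks (never empty: rle() separates
    # two TRUE runs by at least one FALSE run)
    gap <- (ends[peak[i]] + 1):(starts[peak[i + 1]] - 1)
    data.frame(
      breakP = ends[peak[i]] + which.min(x[gap]),  # absolute index of the minimum
      breakH = min(x[gap]),
      valley_n = sum(x[gap] <= level)              # valley points at or below the line
    )
  })
  do.call(rbind, out)
}

# Toy data: two peaks of 3 and 4 points above level 2, separated by 2 points
x <- c(0, 5, 5, 5, 1, 1, 6, 6, 6, 6, 0)
valleys_at_level(x, level = 2, min_len = 2)
# one valley: breakP = 5, breakH = 1, valley_n = 2
```

Because the pattern x > level only changes when the level passes one of the data values, evaluating the distinct values unique(x) as candidate levels covers every distinct red-line position. Calling valleys_at_level() for each of them and keeping rows with valley_n > Min.PpP gives the same kind of scan with less per-level overhead; it is still one pass over x per level.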
I am aware of functions like pastecs::turnpoints. However, they do not seem to let me implement the criteria I want to use.
So is there a more efficient way to achieve this?
P.S.:
I have also put another criterion in the example code, which says that there should be at least half as many points in the valley as there are for the peak with the smaller number of points, even when Min.PpP is exceeded:
if(sum(doublePeak.rle.dt_v[values == FALSE]$lengths) >= max(max(doublePeak.rle.dt_p[peak]$lengths, doublePeak.rle.dt_p[peak+1]$lengths) * 0.5, Min.PpP))
However, I guess that's not really important for this problem.
I am pretty new to R, so my code definitely needs some improvement. What I want is to run quicksort on an array of n elements, count the number of comparisons made, and output the partially sorted array after k comparisons.
So far, I have reused the code for a quicksort algorithm found here:
quickSort <- function(arr) {
  # Pick a number at random.
  mid <- sample(arr, 1)
  print(arr)
  print(mid)
  # Place-holders for left and right values.
  left <- c()
  right <- c()
  # Move all the smaller values to the left, bigger values to the right.
  lapply(arr[arr != mid], function(d) {
    count <<- count + 1
    stopifnot(count <= k)
    if (d < mid) {
      left <<- c(left, d)
    } else {
      right <<- c(right, d)
    }
  })
  if (length(left) > 1) {
    left <- quickSort(left)
  }
  if (length(right) > 1) {
    right <- quickSort(right)
  }
  # Finally, return the sorted values.
  c(left, mid, right)
}
I am currently struggling with several things:
How can I get not only the partial vector that is currently being sorted but also the full vector?
Did I put the right stopping condition in the right place?
An example of what I want:
Given the array (2,4,1,3,5) and first pivot element 3, after four comparisons I would want the output to be (2,1,3,4,5).
Any help would be greatly appreciated!
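One way to keep the full vector is to sort index ranges of a single copy of it instead of returning sub-vectors, and to stop cleanly once k comparisons are spent instead of calling stopifnot(), which throws an error and discards the current state. A sketch under these assumptions: quicksort_k() and its pick_pivot argument are hypothetical names, one comparison is one d < mid test, and a partition that runs out of budget halfway is abandoned, so the returned vector reflects only the partitions that finished within k comparisons. Unlike arr[arr != mid] in the code above, it removes only one copy of the pivot, so repeated values are kept. A plain for loop is used because, unlike lapply(), it does not create a new function scope, so left and right can be updated with ordinary <-.

```r
# Hypothetical helper: quicksort on index ranges of ONE copy of the vector,
# stopping once k comparisons have been made.
quicksort_k <- function(arr, k, pick_pivot = function(v) sample(v, 1)) {
  count <- 0     # comparisons made so far
  done <- FALSE  # becomes TRUE when the budget k is used up

  sort_range <- function(lo, hi) {
    if (done || hi <= lo) return(invisible(NULL))
    seg <- arr[lo:hi]
    mid <- pick_pivot(seg)
    rest <- seg[-match(mid, seg)]   # drop one copy of the pivot only
    left <- rest[0]
    right <- rest[0]
    for (d in rest) {
      if (count >= k) {             # budget spent: abandon this partition
        done <<- TRUE
        return(invisible(NULL))
      }
      count <<- count + 1
      if (d < mid) left <- c(left, d) else right <- c(right, d)
    }
    arr[lo:hi] <<- c(left, mid, right)  # write the finished partition back
    p <- lo + length(left)              # final position of the pivot
    sort_range(lo, p - 1)
    sort_range(p + 1, hi)
  }

  sort_range(1, length(arr))
  list(array = arr, comparisons = count)
}

# Deterministic pivot (middle value) so the example from the question is reproducible
median_pivot <- function(v) sort(v)[(length(v) + 1) %/% 2]
quicksort_k(c(2, 4, 1, 3, 5), k = 4, pick_pivot = median_pivot)$array
# 2 1 3 4 5
```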
I would like to find the value of "p" below (which is between 0 and 1), knowing the following equations:
RI_26 = min(IR,na.rm=FALSE)
RI_min = 100-(sse*SUM/((1+p)*Dotation2017*100))^(1/p)
where RI_26 is the minimum resource index over my 26 areas; it is a constant in my case. In RI_min, sse and Dotation2017 are two constants and p is unknown. I know that RI_26 should be equal to RI_min.
This would be easy to solve, except that SUM (which appears in RI_min) is also unknown, because it is a function of p:
sum.function = function(p) {
  SUM <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
  for (i in 1:length(Canton))
    if (IR[i] < 100) {
      SUM[i] <- (100 - IR[i])^(1 + p) * Pop[i]
      SUM[27] <- SUM[27] + SUM[i]
    }
  SUM <- round(SUM, 0)
  return(SUM[27])
}
SUM = sum.function(p)
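Written out, with SUM expanded from sum.function() (ignoring its round() step) and the sum running over the areas with IR_i < 100, the condition RI_26 = RI_min is a single equation in p. For p > 0 and RI_26 < 100 it can be rearranged by raising both sides to the power p:

```latex
\begin{aligned}
\mathrm{SUM}(p) &= \sum_{i:\ \mathrm{IR}_i < 100} (100 - \mathrm{IR}_i)^{1+p}\,\mathrm{Pop}_i \\
\mathrm{RI}_{26} &= 100 - \left(\frac{\mathrm{sse}\cdot\mathrm{SUM}(p)}{(1+p)\cdot\mathrm{Dotation2017}\cdot 100}\right)^{1/p} \\
\iff\quad (100 - \mathrm{RI}_{26})^{p} &= \frac{\mathrm{sse}\cdot\mathrm{SUM}(p)}{(1+p)\cdot\mathrm{Dotation2017}\cdot 100}
\end{aligned}
```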
sum.function returns a single number (a 1×1 vector). With that, I would like to find the value of p that satisfies:
RI_26/RI_min = 1
To do so, I would like to write a loop that starts at p = 0 and increases p by 0.01 until it reaches 1. The loop should return p_star, the value of p at which the constraint holds (RI_26/RI_min = 1.00).
I don't have any idea how to do this but it could look like the following code:
p.function = function() {
  for (...)
    if (RI_26/RI_min == 1.000000) {
      p_star <- p
    }
  return(p_star)
}
So the function would return p_star when RI_26/RI_min = 1.000000. What am I supposed to write in p.function to increment p and get the result I want? Any ideas?
for (p in seq(0, 1, by = 0.01)) {
  # your code here
}
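Testing RI_26/RI_min == 1 exactly will almost never be TRUE on a 0.01 grid, because the ratio generally crosses 1 somewhere between grid points. A more robust sketch keeps the grid value of p whose ratio is closest to 1 and then refines it with base R's uniroot(). All inputs below (IR, Pop, Dotation2017, sse) are made-up placeholders, with sse chosen so that the solution is p = 0.5 purely to make the example checkable; sum_p() is a hypothetical vectorised stand-in for sum.function() without its round() step. The grid starts at 0.01 because the exponent 1/p is undefined at p = 0. Picking the minimum of |ratio - 1| rather than the first sign change matters here: with these numbers RI_min is negative for small p, so the ratio also changes sign where RI_min passes through 0, which is not a solution.

```r
# Hypothetical inputs: made-up values standing in for the real data
IR <- c(60, 85, 100, 130)
Pop <- c(1000, 2000, 1500, 3000)
Dotation2017 <- 1e6
RI_26 <- min(IR)

# Same sum as sum.function(), vectorised and without the rounding step
sum_p <- function(p) sum((100 - IR[IR < 100])^(1 + p) * Pop[IR < 100])

# Made-up sse chosen so that the true solution is p = 0.5 (for checking only)
sse <- (100 - RI_26)^0.5 * (1 + 0.5) * Dotation2017 * 100 / sum_p(0.5)

RI_min <- function(p) 100 - (sse * sum_p(p) / ((1 + p) * Dotation2017 * 100))^(1 / p)
f <- function(p) RI_26 / RI_min(p) - 1   # zero where RI_26 / RI_min == 1

# 1) The loop from the question, keeping the p closest to the target
#    instead of testing for exact equality
grid <- seq(0.01, 1, by = 0.01)
gap <- sapply(grid, function(p) abs(f(p)))
p_grid <- grid[which.min(gap)]

# 2) Refine inside the grid cell around p_grid
#    (assumes the root lies strictly inside (0, 1])
p_star <- uniroot(f, lower = p_grid - 0.01, upper = min(p_grid + 0.01, 1),
                  tol = 1e-10)$root
```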