R - Finding change points in a dataset [duplicate] - r

This question already has an answer here:
How to find changing points in a dataset
(1 answer)
Closed 5 years ago.
Please refer to the images for the complete picture.
I have a large dataset of numeric values, and I need to find the points at which an increasing or decreasing trend starts and ends.
E.g.:
[100312
100317
100380
100432
100438
100441
100509
100641
100779
100919
100983
100980
100978
100983
100986
100885
100767
100758
100755
100755]
I have shown 5000 of the 1 million rows I have in my data.
Expected output: 100317 (start of increase), 100432 (end of increase), 100441 (start of increase), 100919 (end of increase).
A change of roughly 10 is considered noise.

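You can try this code, which returns the starting and ending point of each increase:
# The data as a plain numeric vector (named df here, but it is not a data frame)
df <- c(100312,100317,100380,100432,100438,100441,100509,100641,100779,100919,100983,100980,100978,100989,100999,100885,100767,100758,100755,100755)
# Positions where the next value is higher by at least 10 (the noise threshold)
indexs <- which(diff(df) >= 10)
# Gaps between those positions mark where one run of increases stops and the next begins
flag <- which(diff(indexs) > 1)
end <- c(flag, length(indexs))
start <- c(1, end[-length(end)] + 1)
# One column per run: the values at which its first and its last qualifying step begin
mapply(function(x, y) c(df[indexs[x]], df[indexs[y]]), start, end)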
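A run-length view of the same idea can also cover decreasing trends. The sketch below is one possible reading of the question, not the answer's method: it labels each step as up, down, or noise using an assumed threshold of 10, then reports the first value of each run and the value reached after its last step. The names x, threshold, and trends are illustrative.

```r
# Same vector as in the answer above
x <- c(100312, 100317, 100380, 100432, 100438, 100441, 100509, 100641,
       100779, 100919, 100983, 100980, 100978, 100989, 100999, 100885,
       100767, 100758, 100755, 100755)
threshold <- 10  # assumed noise threshold; smaller moves are treated as flat

d <- diff(x)
# +1 for a step up, -1 for a step down, 0 for noise
step <- ifelse(d >= threshold, 1, ifelse(d <= -threshold, -1, 0))

runs <- rle(step)
last_step  <- cumsum(runs$lengths)             # index of the last step in each run
first_step <- last_step - runs$lengths + 1     # index of the first step in each run
is_trend   <- runs$values != 0

trends <- data.frame(
  direction   = ifelse(runs$values[is_trend] > 0, "increase", "decrease"),
  start_value = x[first_step[is_trend]],       # value before the first step
  end_value   = x[last_step[is_trend] + 1]     # value after the last step
)
print(trends)
```

Whether a trend should end at the value reached by its last qualifying step (as here) or one observation earlier is a choice the question leaves open, so adjust the end_value index if the other convention is wanted.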

Related

Count duration of value in vector in R

I am trying to count the length of occurrences of a value in a vector such as
q <- c(1,1,1,1,1,1,4,4,4,4,4,4,4,4,4,4,4,4,6,6,6,6,6,6,6,6,6,6,1,1,4,4,4)
Actual vectors are longer than this and are time-based. What I would like is an output for 4 that tells me it occurred for 12 time steps (before the vector changes to 6) and then for 3 time steps, not that it occurred 15 times in total.
My current ideas for doing this are pretty inefficient (a loop that goes element by element and stops when the element doesn't equal the value I specified). Can anyone recommend a more efficient method?
x <- with(rle(q), data.frame(values, lengths)) will pull out the information that you want (courtesy of d.b. in the comments).
From the R documentation: rle is used to "Compute the lengths and values of runs of equal values in a vector – or the reverse operation."
y <- x[x$values == 4, ] subsets the data frame to only the value of interest (4). You can then see clearly that 4 ran for 12 time steps and then later for 3.
Modifying the value in that comparison lets you check whatever value you want.
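The two steps above can be wrapped into a small function. This is a minimal sketch; run_lengths_of is a hypothetical helper name, not something from base R.

```r
q <- c(1,1,1,1,1,1,4,4,4,4,4,4,4,4,4,4,4,4,6,6,6,6,6,6,6,6,6,6,1,1,4,4,4)

# Hypothetical helper: lengths of each separate run of `value` in `vec`
run_lengths_of <- function(vec, value) {
  r <- rle(vec)                    # run values and run lengths
  r$lengths[r$values == value]     # keep only the runs of the requested value
}

run_lengths_of(q, 4)
```

Because rle works on the whole vector at once, this avoids the element-by-element loop described in the question.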

Bourdet Derivative in R with Smoothing Window

I am calculating pressure derivatives using algorithms from this PDF:
Derivative Algorithms
I have been able to implement the "two-points" and "three-consecutive-points" methods relatively easily using dplyr's lag/lead functions to offset the original columns forward and back one row.
The issue with those two methods is that there can be a ton of noise in the high-resolution data we use. This is why there is a third method, "three-smoothed-points", which is significantly more difficult to implement. There is a user-defined "window width", W, that is typically between 0 and 0.5. The algorithm chooses point_L and point_R as the first points to the left and right of the current point such that ln(t/t_L) > W and ln(t_R/t) > W, where t is the elapsed time (these are the quantities the code below stores as lnDeltaT_left and lnDeltaT_right). Here is what I have so far:
#If necessary install DPLYR
#install.packages("dplyr")
library(dplyr)
#Create initial Data Frame
elapsedTime <- c(0.09583, 0.10833, 0.12083, 0.13333, 0.14583, 0.1680,
0.18383, 0.25583)
deltaP <- c(71.95, 80.68, 88.39, 97.12, 104.24, 108.34, 110.67, 122.29)
df <- data.frame(elapsedTime,deltaP)
#Shift the elapsedTime and deltaP columns forward and back one row
df$lagTime <- lag(df$elapsedTime,1)
df$leadTime <- lead(df$elapsedTime,1)
df$lagP <- lag(df$deltaP,1)
df$leadP <- lead(df$deltaP,1)
#Calculate the 2 and 3 point derivatives using nearest neighbors
df$TwoPtDer <- (df$leadP - df$lagP) / log(df$leadTime / df$lagTime)
df$ThreeConsDer <-
  ((df$deltaP - df$lagP) / log(df$elapsedTime / df$lagTime)) *
  (log(df$leadTime / df$elapsedTime) / log(df$leadTime / df$lagTime)) +
  ((df$leadP - df$deltaP) / log(df$leadTime / df$elapsedTime)) *
  (log(df$elapsedTime / df$lagTime) / log(df$leadTime / df$lagTime))
#Calculate the window value for the current 1 row shift
df$lnDeltaT_left <- abs(log(df$elapsedTime / df$lagTime))
df$lnDeltaT_right <- abs(log(df$elapsedTime / df$leadTime))
Resulting Data Table
If you look at the picture linked above, you will see that, based on a W of 0.1, only row 2 meets this criterion for both the left and the right point. Just FYI, this data set is an extension of the data used in example 2.5 in the referenced PDF.
So, my ultimate question is this:
How can I choose the correct point_L and point_R such that they meet the above criteria? My initial thoughts are some kind of while loop, but being an inexperienced programmer, I am having trouble writing a loop that gets anywhere close to what I am shooting for.
Thank you for any suggestions you may have!
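One way to pick point_L and point_R without a while loop is sketched below. It assumes the window is measured in log elapsed time, as in the lnDeltaT columns above, and that the nearest point beyond the window is the one wanted. The function names find_window_points and bourdet_smoothed are hypothetical, and the weighting simply reuses the three-point formula from the question with the neighbours replaced by the selected points.

```r
elapsedTime <- c(0.09583, 0.10833, 0.12083, 0.13333, 0.14583, 0.1680,
                 0.18383, 0.25583)
deltaP <- c(71.95, 80.68, 88.39, 97.12, 104.24, 108.34, 110.67, 122.29)

# Hypothetical helper: for each row i, the nearest row L < i and the nearest
# row R > i whose log-time distance from row i exceeds W (NA if none exists)
find_window_points <- function(t, W) {
  n <- length(t)
  L <- rep(NA_integer_, n)
  R <- rep(NA_integer_, n)
  for (i in seq_len(n)) {
    left  <- which(log(t[i] / t[seq_len(i - 1)]) > W)  # candidate rows before i
    right <- which(log(t[i:n] / t[i]) > W)             # positions counted from i
    if (length(left)  > 0) L[i] <- max(left)
    if (length(right) > 0) R[i] <- i + min(right) - 1L
  }
  data.frame(L = L, R = R)
}

# Hypothetical helper: three-point derivative using the selected rows
bourdet_smoothed <- function(t, p, W) {
  w   <- find_window_points(t, W)
  dxL <- log(t / t[w$L])       # log-time distance to the left point
  dxR <- log(t[w$R] / t)       # log-time distance to the right point
  ((p - p[w$L]) / dxL) * (dxR / (dxL + dxR)) +
    ((p[w$R] - p) / dxR) * (dxL / (dxL + dxR))
}

window_rows <- find_window_points(elapsedTime, W = 0.1)
smoothed    <- bourdet_smoothed(elapsedTime, deltaP, W = 0.1)
print(cbind(window_rows, smoothed))
```

Rows with no point far enough away on one side (the first and last rows here) come back as NA, which leaves the end-point treatment to whatever rule the referenced PDF prescribes.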

Counting consecutive repeats, and returning the maximum value in each string of repeats if over a threshold

I am working with long strings of repeating 1's and 0's representing the presence of a phenomenon as a function of depth. If this phenomenon is flagged for over 1m, it is deemed significant enough to use for further analyses, if not it could be due to experimental error.
I ultimately need to get a total thickness displaying this phenomenon at each location (if over 1m).
In a dummy data set the input and expected output would look like this:
#Depth from 0m to 10m with 0.5m readings
depth <- seq(0, 10, 0.5)
#Phenomenon found = 1, not = 0
phenomflag <- c(1,0,1,1,1,1,0,0,1,0,1,0,1,0,1,1,1,1,1,0)
What I would like as an output is a vector with: 4, 5 (which gets converted back to 2m and 2.5m)
I have attempted to solve this problem using
y <- rle(phenomflag)
z <- y$lengths[y$values == 1]
but once I have my count, I have no idea how to:
a) Isolate 1 maximum number from each group of consecutive repeats.
b) Restrict to consecutive strings longer than (x) - this might be easier after a.
Thanks in advance.
count posted a good solution in the comments section.
y <- rle(phenomflag)
x <- cbind(y$lengths, y$values); x[x[, 1] >= 3 & x[, 2] == 1, 1]
This returns only the lengths of the runs of 1's that are at least 3 readings long (here 4 and 5), with one value per run.
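The same filter can be carried through to thickness in metres. This is a minimal sketch that assumes the 0.5 m reading interval from the dummy data and uses the 4 to 2 m conversion given in the question; step, min_run, and the result names are illustrative.

```r
phenomflag <- c(1,0,1,1,1,1,0,0,1,0,1,0,1,0,1,1,1,1,1,0)
step    <- 0.5   # assumed reading interval in metres
min_run <- 3     # assumed minimum number of consecutive readings to keep

runs <- rle(phenomflag)
keep <- runs$values == 1 & runs$lengths >= min_run

run_lengths       <- runs$lengths[keep]    # one count per qualifying run
thickness_m       <- run_lengths * step    # converted as in the question
total_thickness_m <- sum(thickness_m)      # total for this location

print(run_lengths)
print(thickness_m)
print(total_thickness_m)
```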

R : Display plot while calculating [duplicate]

This question already has answers here:
Plotting during a loop in RStudio
(7 answers)
Closed 8 years ago.
I wrote an R script which does a lot of heavy calculation. The script enters a while loop and runs through it around 16 times (each pass takes between 3 and 5 minutes on average).
I would like to be able to track the progress of the calculation: to see how much has been done and how much remains. So far I just print the iteration number and the calculation time for each finished loop.
What I would like to add is a kind of "progress bar", which I represent with a ggplot2 histogram. Every time a calculation finishes, I update the plot (the "finished" area grows and the "remaining" area shrinks accordingly) and print it. But no plot shows in the "Plots" pane of RStudio while the function is still executing. All the plots appear once the overall calculation (of the 16 loops) is finished.
So my question is: is it possible to display a plot at the end of each pass through a while loop, even if the overall calculation is not over?
If I am not being clear, please tell me.
Thank you in advance for your answers.
EDIT: Sys.sleep() works very well for what I intended.
If anyone is interested in the code, here follows:
# Big calculation
iter <- iter + 1
d <- data.frame(x1 = c(1,1), x2 = c(iter/maxIt, 1 - iter/maxIt),
x3 = c("finished", "remaining"))
print(qplot(d$x1, d$x2, fill = factor(d$x3), geom = "histogram", stat = "identity"))
Sys.sleep(1)
# Preparing the next loop
timearray <- NULL # array of time used by each iteration
for (wait in c(3,2,5,4,6)) {
  s <- Sys.time() # iteration start time
  Sys.sleep(wait) # do your calculation -- I just wait here
  s <- Sys.time() - s # now s is the duration of the last iteration
  timearray <- c(timearray, s) # add this new value to the array
  plot(timearray, type='h', main='Iteration time') # plot the array
}
Could it be something like that?
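For the finished/remaining bar itself, a base-graphics version of the same loop might look like the sketch below. It relies on the observation in the question's edit that a short Sys.sleep() after drawing was enough for the plot to appear in RStudio; max_it and the placeholder for the heavy calculation are assumptions made for illustration.

```r
max_it   <- 4            # hypothetical number of iterations
progress <- numeric(0)   # fraction finished after each iteration

for (iter in seq_len(max_it)) {
  # ... the heavy calculation for this iteration would go here ...
  frac <- iter / max_it

  # One stacked bar: lower segment is finished, upper segment is remaining
  barplot(matrix(c(frac, 1 - frac), ncol = 1),
          col = c("darkgreen", "grey80"), ylim = c(0, 1),
          main = sprintf("Iteration %d of %d", iter, max_it))

  Sys.sleep(0.1)         # short pause after drawing, as in the question's edit
  progress <- c(progress, frac)
}
```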

Find the daily stock return percentage in R [duplicate]

This question already has answers here:
How to calculate returns from a vector of prices?
(7 answers)
Closed 8 years ago.
I am taking a column of historical stock prices and trying to find the percent return on the stock. This would be calculated as today's stock price minus yesterday's stock price, divided by yesterday's stock price. Equivalently, you could divide the most recent day's price by the previous day's price and subtract one.
I can find the difference between each day, but that is not my problem. I believe my teacher told me it is
x <- diff(log(theReturns))
Can you guys find the percent change in daily stock price in R?
Let's say your vector is v <- c(10, 20, 23, 15, 22, 30) (this would be what you call theReturns, but I am using v for short here).
The difference between each day, which you already know how to get as you say, is
v[2:6] - v[1:5]
# 10 3 -8 7 8
In R there is another way to write this, using the function diff (see ?diff for more details):
diff(v) == v[2:6] - v[1:5]
# TRUE TRUE TRUE TRUE TRUE
Since you want to calculate the difference as a percentage of the previous day (i.e. the relative change), you simply need to divide this by v[1:5]
diff(v) / v[1:5]
# 1.0000000 0.1500000 -0.3478261 0.4666667 0.3636364
My guess is that you know how to do all that, but your confusion comes from your teacher introducing the log function in there. I don't think you necessarily have to use log, but it may simplify things because of one of its properties, which is that log(x/y) = log(x) - log(y), for positive x, y. Using this (after a little bit of algebra), you can see that another way to calculate the relative change is
exp(diff(log(v))) - 1
since that evaluates to exp(log(v[2:6]) - log(v[1:5])) - 1 which equals (v[2:6] / v[1:5]) - 1 which in turn equals (v[2:6] - v[1:5]) / v[1:5].
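The equivalence can be checked numerically. This is a small sketch using the same example vector; head(v, -1) is used in place of v[1:5] so that it works for a vector of any length, and the variable names are illustrative.

```r
v <- c(10, 20, 23, 15, 22, 30)

# Simple daily return: change relative to the previous day's price
simple_return <- diff(v) / head(v, -1)

# Log return, as in the teacher's formula, converted back to a simple return
log_return <- diff(log(v))
from_log   <- exp(log_return) - 1

print(all.equal(simple_return, from_log))  # the two routes agree up to rounding
print(round(100 * simple_return, 2))       # returns expressed in percent
```

Note that diff(log(v)) on its own gives the log return, which is close to the simple return only when the daily changes are small.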
