R peak and valley filtering with threshold

I am using the code below to list out peaks and valleys.
x_last <- as.numeric(series[1])
x <- as.numeric(series[2])
d_last <- (x - x_last)
series[1:2] <- NULL
output <- list()
for (x_next in series) {
  if (x_next == x) {
    next
  }
  d_next <- (x_next - x)
  if (d_last * d_next < 0) {
    output <- append(output, x)
  }
  x_last <- x
  x <- x_next
  d_last <- d_next
}
Here output (a list) contains the alternating peaks and valleys:
Output <- c(41.49916, 37.92029, 39.86477, 39.86432, 39.95672, 39.95465, 39.96144, 39.83994, 40.43357, 40.11285, 40.82250, 39.37034, 58.82975, 42.19894)
and so on. The graph is plotted from this output. My question is: how do I add a threshold to this code? In other words, how can I remove small peaks and valleys (where the change is less than 1) while still keeping the output alternating between peaks and valleys?
Thank you in advance.

If you just want to plot your data:
You could plot this with ggplot2 and add a geom_smooth() layer. It defaults to method "loess" which is kind of a "do-the-right-thing" smoother for small datasets.
dat <- data.frame(y=c(41.49916, 37.92029, 39.86477, 39.86432, 39.95672, 39.95465, 39.96144, 39.83994, 40.43357, 40.11285, 40.82250, 39.37034, 58.82975, 42.19894))
dat$x <- 1:length(dat$y)
library(ggplot2)
ggplot(dat, aes(x, y)) +
  geom_line() +
  geom_smooth(method = "loess", se = FALSE)
Or would you rather smooth the data yourself? (Your data series is quite short for that.) Do you need an equation for the fit? It's easy to spend quite some time on that.
I don't fully understand this "peak/valley" stuff. In any case, take a look at the diff() function. Maybe this helps:
dat <- data.frame(y=c(41.49916, 37.92029, 39.86477, 39.86432, 39.95672, 39.95465, 39.96144, 39.83994, 40.43357, 40.11285, 40.82250, 39.37034, 58.82975, 42.19894))
dat[which(diff(dat$y) < 0.01)+1,"y"] <- NA
dat$y
[1] 41.50 NA 39.86 NA 39.96 NA NA NA 40.43 NA 40.82 NA
[13] 58.83 NA
Here I've used a threshold of 0.01.
I'm not sure if it's the right thing. But you can adapt this code for your needs.
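If the goal is specifically to drop turning points whose swing is below a threshold, here is a minimal sketch along those lines (assuming the threshold of 1 mentioned in the question). Note that dropping a small peak can leave two valleys in a row; the hysteresis() function in the next answer handles that case.
y <- c(41.49916, 37.92029, 39.86477, 39.86432, 39.95672, 39.95465, 39.96144,
       39.83994, 40.43357, 40.11285, 40.82250, 39.37034, 58.82975, 42.19894)
d <- diff(y)
# turning points: indices in y where the slope changes sign
turning <- which(d[-length(d)] * d[-1] < 0) + 1
# keep only turning points whose incoming and outgoing swings both reach 1
big <- turning[abs(d[turning - 1]) >= 1 & abs(d[turning]) >= 1]
y[big]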

In the end I created a function that removes small cycles while keeping the peak/valley alternation. It is working perfectly for me.
hysteresis <- function(series, min_range) {
  # hysteresis removes cycles whose magnitude is within min_range
  # series: list of values alternating between peaks and valleys
  series <- unlist(series)
  f <- series[1]
  org <- f
  series <- series[2:length(series)]
  for (i in series) {
    val <- abs(i - f)
    if (val > min_range) {
      org <- c(org, i)
      f <- i
    } else {
      # also drop the previous turning point, so the result
      # keeps alternating between peaks and valleys
      org <- org[1:(length(org) - 1)]
      f <- org[length(org)]
    }
  }
  return(org)
}
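For example, applied to the output vector from the question (a quick usage sketch; min_range = 1 matches the threshold asked about):
out <- c(41.49916, 37.92029, 39.86477, 39.86432, 39.95672, 39.95465, 39.96144,
         39.83994, 40.43357, 40.11285, 40.82250, 39.37034, 58.82975, 42.19894)
hysteresis(out, min_range = 1)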

Related

Question on loop when integrating vectors into dataframe

Happy new year! While simulating a model in a loop, I ran into a problem. Y on its own seems to be generated correctly, but the data frame into which I plan to collect 5000 simulations of Y ends up full of NAs, which makes no sense to me. I have no idea what is wrong, and I'd really appreciate any help or suggestions. Thanks!
# Investment simulation
library(tidyverse)
library(ggplot2)
library(statmod)
library(invGauss)
library(dplyr)
library(estimatr)
Res <- as.data.frame(matrix(nrow = 27))
k <- 1
while (k <= 5000) { # this loop probably collapses
  Y <- c(1610, 1550)
  DeviaY <- c(0)
  Pos_value <- c(0)
  Years <- c(1986, 1987)
  DiffY <- c(0)
  prob_poi <- (6/13) * exp(-(6/13))
  expec <- c()
  q <- rinvgauss(27, mean = 0.5, disp = 0.5) # generate a vector of 27 random numbers
  p <- pinvgauss(q, mean = 0.5, disp = 0.5)  # p should be uniformly distributed
  for (i in 3:27) {
    regree <- lm_robust(Y ~ Years)
    expec <- predict(regree, newdata = data.frame(Years = 1985 + i))
    DiffY[i - 1] <- 0.7 * (Y[i - 1] - tail(expec, 1))
    DeviaY[i - 1] <- p[i] * sd(Y)
    Pos_value[i - 1] <- sample(x = c(-1, 1, 0), replace = TRUE, size = 1,
                               prob = c(prob_poi/2, prob_poi/2, 1 - prob_poi))
    Y[i] <- Y[i - 1] - DiffY[i - 1] + DeviaY[i - 1] +
      Pos_value[i - 1] * rinvgauss(1, mean = 60, dispersion = 25)
    Years[i] <- 1985 + i
    if (i == 27) return(Y)
  }
  Res[, k] <- Y # fill column k; pairs with the while loop above
  k <- k + 1    # advance the while loop counter
}
# My goal is to repeat the simulation above 5000 times to fill the data frame,
# but it returns NAs, which makes no sense.
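One likely culprit, offered as a guess rather than a confirmed diagnosis: return() is only valid inside a function, so if (i == 27) return(Y) at the top level of a while loop stops with an error before Res is ever filled. A minimal sketch of the loop skeleton without it (the inner update rule is a placeholder, not the real model):
set.seed(1)
Res <- as.data.frame(matrix(nrow = 27, ncol = 5000))
for (k in 1:5000) {
  Y <- c(1610, 1550)
  for (i in 3:27) {
    Y[i] <- Y[i - 1] + rnorm(1) # placeholder for the real update rule above
  }
  Res[, k] <- Y # just assign the column; no return() needed
}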

How to display counts using the periodic table with ggplot?

I have a list of elemental compositions and I'd like to display a count for the number of times an element is included in a composition mapped onto the periodic table (e.g. CH4 would increase the count on H and C by one).
How can I do this with ggplot? Is there a map I can use?
With a bit of searching I found information about the periodic table in this example code project. They had an Access Database with element information. I've exported it to this gist. You can import the data using the httr library with
library(httr)
dd <- read.table(text=content(GET("https://gist.githubusercontent.com/MrFlick/c1183c911bc5398105d4/raw/715868fba2d0d17a61a8081de17c468bbc525ab1/elements.txt")), sep=",", header=TRUE)
(You should probably create your own local version for easier loading in the future.)
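For instance, a quick sketch of caching it locally (the file name is just an illustration):
# save a local copy once, then read from disk on later runs
write.csv(dd, "elements.csv", row.names = FALSE)
dd <- read.csv("elements.csv")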
Then your other challenge is decomposing something like "CH4" into the raw element counts. I've created this helper function which I think does what you need.
decompose <- function(x) {
  m <- gregexpr("([A-Z][a-z]?)(\\d*)", x, perl = TRUE)
  dx <- Map(function(x, y) {
    ElementSymbol <- gsub("\\d", "", x)
    cnt <- as.numeric(gsub("\\D", "", x))
    cnt[is.na(cnt)] <- 1
    cbind(Sym = y, as.data.frame(xtabs(cnt ~ ElementSymbol)))
  }, regmatches(x, m), x)
  do.call(rbind, dx)
}
Here I test the function
test_input <- c("H2O","CH4")
decompose(test_input)
# Sym ElementSymbol Freq
# 1 H2O H 2
# 2 H2O O 1
# 3 CH4 C 1
# 4 CH4 H 4
Now we can combine the data and the reference information to make a plot
library(ggplot2)
ggplot(merge(decompose("CH4"), dd), aes(Column, -Row)) +
  geom_tile(data = dd, aes(fill = GroupName), color = "black") +
  geom_text(aes(label = Freq))
Clearly there are opportunities for improvement but this should give you a good start.
You might look for a more robust decomposition function. It looks like the CHNOSZ package has one:
library(CHNOSZ)
data(thermo)
decompose <- function(x) {
  do.call(rbind, lapply(x, function(x) {
    z <- makeup(x)
    cbind(data.frame(ElementSymbol = names(z), Freq = z), Sym = x)
  }))
}
ggplot(merge(decompose("CaAl2Si2O7(OH)2*H2O"), dd), aes(Column, -Row)) +
  geom_tile(data = dd, aes(fill = GroupName), color = "black") +
  geom_text(aes(label = Freq))

R subscript out of bounds with for loops

I am trying to count entries that fall within 1000-unit windows. The problem is that I'm using for loops, which makes the number of operations quite large (I'm fairly new to R), and I get an out-of-bounds error. I know there must be a better way to do this.
File (warning: the file is a little over 100 MB): bamDF.txt
Use:
dget(file="bamDF.txt")
Script:
attach(bamDF)
interval <- 1000
out <- matrix(0, 1, ceiling((max(pos, na.rm = TRUE) - min(pos, na.rm = TRUE))/interval))
for (q in 1:nrow(bamDF)) {
  for (z in 1:ceiling((max(pos, na.rm = TRUE) - min(pos, na.rm = TRUE))/interval)) {
    if (min(pos, na.rm = TRUE) + interval*(z - 1) < pos[q] &&
        pos[q] < (min(pos, na.rm = TRUE) + interval*z)) {
      out[z, ] <- out[z, ] + 1
    }
  }
}
detach(bamDF)
You can use the cut function:
# set the seed to get a reproducible example
set.seed(12345)
min.val <- 0
max.val <- 5000
num.val <- 10000
# generate some random values
values <- sample(min.val:max.val, num.val, replace = TRUE)
interval <- 1000
num.split <- ceiling((max.val - min.val)/interval) + 1
# use cut to split the data; set labels=FALSE if you want
# the group number rather than the interval
groups <- cut(values, seq(min.val, max.val, length.out = num.split))
# count the elements in each group
res <- table(groups)
res will contain:
groups
(0,1e+03] (1e+03,2e+03] (2e+03,3e+03] (3e+03,4e+03] (4e+03,5e+03]
1987 1974 2054 2000 1984
Similarly, you can just use the hist function:
h <- hist(values, 10) # 10 bins
or
h <- hist(values, seq(min.val, max.val, length.out=num.split))
h$counts contains the counts. Use plot=FALSE if you don't want to plot the results.
A third option is findInterval:
grps <- seq(min(pos), max(pos), by = 1000)
counts <- table(findInterval(pos, c(grps, Inf)))
names(counts) <- grps
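To see that at work without the bamDF file, here is a quick sketch on simulated data like the values above:
set.seed(12345)
values <- sample(0:5000, 10000, replace = TRUE)
grps <- seq(min(values), max(values), by = 1000)
# bin each value by the interval it falls into, then tabulate
counts <- table(findInterval(values, c(grps, Inf)))
names(counts) <- grps
counts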

Eloquently change many raster cell values in R

I have a Landfire Existing Vegetation dataset (http://www.landfire.gov/), which I have projected and cropped to my study site. The raster has ~12,000,000 cells. The cell values represent particular vegetation types, and the values range from 16:2200. Not all of those values are represented in my study area (i.e. the values jump from 20 to 1087).
Since many of the pixel values can be lumped together into one classification for my purposes (e.g. different shrub communities into one class), I wanted to reset the values of the raster to simpler values (1:11). This will make it easy to extract data from other rasters by vegetation type and to plot the classification map. I have working code, but it requires a ton of typing to change all 61 of the values I need to change. Here's what I did:
#===============================
############Example#############
#===============================
library(raster)
r <- raster(nrows=30, ncols=10, xmn=0, xmx=10)
r[] <- rep(10:19, 30)
r.omance <- function(x) {
  x[x == 10] <- 1; x[x == 11] <- 1; x[x == 12] <- 1
  x[x == 13] <- 1; x[x == 14] <- 1; x[x == 15] <- 1
  x[x == 16] <- 2; x[x == 17] <- 2; x[x == 18] <- 2
  x[x == 19] <- 2
  return(x)
}
reclass <- calc(r, fun = r.omance)
Does anyone know of an easier way to go about this? You can imagine the typing to change 61 values, especially since x[x==16:20] <- 1 was producing an error, so every value had to be typed out separately. As I said, my code works. But I just want to become a better R coder.
Thanks.
You could use %in%:
x %in% c(1,4,3:10)
This:
x[x==10] <- 1; x[x==11] <- 1; x[x==12] <- 1
x[x==13] <- 1; x[x==14] <- 1; x[x==15] <- 1
would reduce to:
x[x %in% 10:15] <- 1
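For example, the whole example function could shrink to this sketch (class boundaries taken from the example above):
r.omance <- function(x) {
  x[x %in% 10:15] <- 1 # lump the first group of classes
  x[x %in% 16:19] <- 2 # lump the second group
  return(x)
}
reclass <- calc(r, fun = r.omance)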
I would use the reclassify function
library(raster)
r <- raster(nrows=30, ncols=10, xmn=0, xmx=10)
r[] <- rep(10:19, 30)
rc <- matrix(c(10,15,1,16,19,2), ncol=3, byrow=TRUE)
x <- reclassify(r, rc, right=NA)
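With 61 values to remap, it may be easier to build the reclassification matrix from a lookup table rather than typing it inline. A sketch, where the from/to/becomes values are illustrative only:
# each row of rc: lower bound, upper bound, new class value
lookup <- data.frame(from    = c(10, 16),
                     to      = c(15, 19),
                     becomes = c(1, 2))
rc <- as.matrix(lookup)
x <- reclassify(r, rc, right = NA)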
You will save yourself a bit of typing using the & logical operator, e.g.
x[ x >= 10 & x <= 15 ] <- 1
x[ x >= 16 & x <= 19 ] <- 2

How to create adjacency matrix from grid coordinates in R?

I'm new to this site. I was wondering if anyone had experience with turning a list of grid coordinates (shown in the example code below as df) into an adjacency matrix. I've written a function that can handle the job for very small data sets, but the run time increases exponentially as the size of the data set increases (I think 800 pixels would take about 25 hours). It's because of the nested for loops, but I don't know how to get around them.
## Dummy Data
x <- c(1,1,2,2,2,3,3)
y <- c(3,4,2,3,4,1,2)
df <- as.data.frame(cbind(x,y))
df
## Here's what it looks like as an image
a <- c(NA,NA,1,1)
b <- c(NA,1,1,1)
c <- c(1,1,NA,NA)
image <- cbind(a,b,c)
f <- function(m) t(m)[,nrow(m):1]
image(f(image))
## Here's my adjacency matrix function that's slowwwwww
adjacency.coordinates <- function(x, y) {
  df <- as.data.frame(cbind(x, y))
  colnames(df) <- c("V1", "V2")
  df <- df[with(df, order(V1, V2)), ]
  adj.mat <- diag(1, dim(df)[1])
  for (i in 1:dim(df)[1]) {
    for (j in 1:dim(df)[1]) {
      if ((df[i, 1] - df[j, 1] == 0 & abs(df[i, 2] - df[j, 2]) == 1) |
          (df[i, 2] - df[j, 2] == 0 & abs(df[i, 1] - df[j, 1]) == 1)) {
        adj.mat[i, j] <- 1
      }
    }
  }
  return(adj.mat)
}
## Here's the adjacency matrix
adjacency.coordinates(x,y)
Does anyone know of a way to do this that will work well on a set of coordinates a couple of thousand pixels long? I've tried converting to a SpatialGridDataFrame and going from there, but it doesn't produce the correct adjacency matrix. Thank you so much for your time.
While I thought igraph might be the way to go here, I think you can do it more simply like:
result <- apply(df, 1, function(pt)
(pt["x"] == df$x & abs(pt["y"] - df$y) == 1) |
(abs(pt["x"] - df$x) == 1 & pt["y"] == df$y)
)
diag(result) <- 1
And avoid the loopiness and get the same result:
> identical(adjacency.coordinates(x,y),result)
[1] TRUE
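Another compact option (not part of the original answers, just a sketch worth trying): two grid cells are 4-neighbors exactly when their Manhattan distance is 1, so dist() can build the whole matrix in one shot:
# TRUE where the Manhattan distance between two coordinate rows is exactly 1
adj <- as.matrix(dist(df, method = "manhattan")) == 1
diag(adj) <- TRUE
adj.mat <- adj + 0 # convert logical to a 0/1 matrix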
