I need to find the points at which an increasing or decreasing trend starts and ends. In this data, a difference of ~10 between consecutive values is considered noise (i.e. not an increase or decrease). In the sample data below, the first increasing trend would start at 317 and end at 432, and another would start at 441 and end at 983. Each of these points is to be recorded in a separate vector.
sample<- c(312,317,380,432,438,441,509,641,779,919,
983,980,978,983,986,885,767,758,755)
Below is an image of the main change points. Can anyone suggest an R method for this?
Here's how to make the change point vector (the test vector below has the shape of your sample, offset by 100000 and with the last value repeated):
vec <- c(100312,100317,100380,100432,100438,100441,100509,100641,100779,100919,
100983,100980,100978,100983,100986,100885,100767,100758,100755,100755)
#this finds your trend starts/stops: diff() flags steps larger than the noise
#threshold (10), rle() groups consecutive flags into runs, and the cumulative
#run lengths mark the index where each run ends
idx <- cumsum(rle(abs(diff(vec)) > 10)$lengths) + 1
#create new vector of change points:
newVec <- vec[idx]
print(newVec)
[1] 100317 100432 100441 100983 100986 100767 100755
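In case the one-liner looks opaque, here it is unpacked step by step (the intermediate values shown are for the vec above; it is the same computation spelled out):
steps <- abs(diff(vec)) > 10   # TRUE wherever a step exceeds the noise threshold
runs <- rle(steps)             # group consecutive TRUE/FALSE flags into runs
runs$lengths                   # 1 2 2 5 4 2 3
cumsum(runs$lengths) + 1       # 2 4 6 11 15 17 20 = the trend start/stop indices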
#(opt.) to ignore the first and last observation as a change point:
idx <- idx[which(idx!=1 & idx!=length(vec))]
#update new vector if you want the "opt." restrictions applied:
newVec <- vec[idx]
print(newVec)
[1] 100317 100432 100441 100983 100986 100767
#you can split newVec by start/stop change points like this:
start_changepoints <- newVec[c(TRUE,FALSE)]
print(start_changepoints)
[1] 100317 100441 100986
end_changepoints <- newVec[c(FALSE,TRUE)]
print(end_changepoints)
[1] 100432 100983 100767
#to count the number of events, just measure the length of start_changepoints:
length(start_changepoints)
[1] 3
If you then want to plot that, you can use this:
library(ggplot2)
#prep data for the plot: flag the change-point rows and alternate start/end colours
df <- data.frame(vec, trends = NA, cols = NA)
df$trends[idx] <- idx
df$cols[idx] <- c("green", "red")  #recycled along idx: odd entries = start, even = end
#plot
ggplot(df, aes(x = 1:NROW(df), y = vec)) +
  geom_line() +
  geom_point() +
  geom_vline(aes(xintercept = trends, col = cols), lty = 2, lwd = 1) +
  scale_color_manual(values = na.omit(df$cols),
                     breaks = na.omit(unique(df$cols)),
                     labels = c("Start", "End")) +
  xlab("Index") +
  ylab("Value") +
  guides(col = guide_legend("Trend State"))
Output: (plot not shown)
I am analyzing qPCR data and I have a Y-value threshold for which I want to get the corresponding X value.
This is my code for the plot:
library(ggplot2)
ggplot() +
  geom_line(data = qPCR_amplification_plot_data_IFI6, aes(x = Cycle, y = `dRn...3`)) +
  geom_line(data = qPCR_amplification_plot_data_IFI6, aes(x = Cycle, y = `dRn...5`)) +
  labs(y = "ΔRn", title = "Amplification plot", x = "Cycle") +
  scale_y_continuous(trans = 'log10', limits = c(0.001, 10)) +
  geom_hline(yintercept = 0.04)
I know how to get the Y value for a given X value:
Intersect <- approxfun(qPCR_amplification_plot_data_IFI6$Cycle, qPCR_amplification_plot_data_IFI6$dRn...3)
Intersect(X)
But I would like to get the X value at the Y threshold (for example 0.04). How can I do that?
(image: qPCR amplification plot)
You can probably do this using a combination of which() and near() (from dplyr). which() effectively answers "which elements of this are TRUE":
x <- 1:100
which(x == 4)
[1] 4
near() tests "close enough to", which makes it good for approximate matches:
library(dplyr)
4 == 4.01
[1] FALSE
near(4, 4.01, tol = 0.01)
[1] TRUE
near(4, 4.01, tol = 0.001)
[1] FALSE
So you can use those two together to basically answer the question: "Which X value in my vector Intersect is close to Y?"
which(near(Intersect, Y, tol = your.tolerance))
By the way, I am assuming that Intersect(X) in your code is a typo and you have a vector Intersect[X]
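Applied to the data in the question, a minimal sketch would look like this (dRn...3 and Cycle are taken from your plotting code; tol = 0.005 is a made-up tolerance you would tune to your data):
library(dplyr)
hits <- which(near(qPCR_amplification_plot_data_IFI6$dRn...3, 0.04, tol = 0.005))
qPCR_amplification_plot_data_IFI6$Cycle[hits]  # Cycle value(s) where dRn is near the threshold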
I am trying to figure out the proportion of an area that has a slope of 0 +/- 5 degrees; put another way, anything outside that +/- 5 degree band is bad. I am trying to find the actual number, and a graphic.
To achieve this I turned to R and using the Raster package.
Let's use a generic country, in this case, the Philippines
# install any packages that are missing
list.of.packages <- c("sp", "raster", "rasterVis", "maptools", "rgeos")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[, "Package"])]
if (length(new.packages)) install.packages(new.packages)
library(sp) # classes for spatial data
library(raster) # grids, rasters
library(rasterVis) # raster visualisation
library(maptools)
library(rgeos)
Now let's get the altitude information and plot the slopes.
elevation <- getData("alt", country = "PHL")
x <- terrain(elevation, opt = c("slope", "aspect"), unit = "degrees")
plot(x$slope)
Not very helpful due to the scale, so let's simply look at the Island of Palawan
e <- drawExtent(show = TRUE)  # crop out Palawan (the long skinny island roughly midway on the left, oriented between 2 and 8 o'clock)
gewataSub <- crop(x, e)
plot(gewataSub, 1)  # visualize the new cropped object
A little bit better to visualize. I get a sense of the magnitude of the slopes and that with a 5 degree restriction, I am mostly confined to the coast. But I need a little bit more for analysis.
I would like the results to be in two parts:
1. "35% (made up) of the selected area has a slope exceeding +/- 5 degrees" or "65% of the selected area is within +/- 5 degrees" (with the code to get it).
2. A picture where everything within +/- 5 degrees is one color, call it good or green, and everything else is another color, call it bad or red.
Thanks
There are no negative slopes, so I assume you want the cells with a slope of less than 5 degrees:
library(raster)
elevation <- getData('alt', country='CHE')
x <- terrain(elevation, opt='slope', unit='degrees')
z <- x <= 5
Now you can count cells with freq
f <- freq(z)
If you have a planar coordinate reference system (that is, with units in meters or similar) you can do
f <- cbind(f, area=f[,2] * prod(res(z)))
to get areas. But for lon/lat data, you would need to correct for different sized cells and do
a <- area(z)
zonal(a, z, fun=sum)
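Either way, the percentage you asked for falls out of the class totals. A minimal sketch using the zonal() output (zone 1 is the TRUE class, i.e. slope <= 5 degrees):
za <- zonal(a, z, fun = sum)
100 * za[za[, 1] == 1, 2] / sum(za[, 2])  # percent of total area within 5 degrees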
And there are different ways to plot it, but the most basic one is:
plot(z)
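And a two-colour version of the picture you describe, as a sketch (for a logical raster the 0/1 values are mapped to col in order, so red is > 5 degrees and green is <= 5 degrees):
plot(z, col = c("red", "green"), legend = FALSE)
legend("topright", fill = c("green", "red"), legend = c("<= 5 deg", "> 5 deg"))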
You can use reclassify from the raster package to achieve that. The function assigns each cell value that lies within a defined interval a certain value. For example, you can assign cell values within interval (0,5] to value 0 and cell values within the interval (5, maxSlope] to value 1.
library(raster)
library(rasterVis)
elevation <- getData("alt", country = "PHL")
x <- terrain(elevation, opt = c("slope", "aspect"), unit = "degrees")
plot(x$slope)
e <- drawExtent(show = TRUE)
gewataSub <- crop(x, e)
plot(gewataSub$slope, 1)
# reclassification rules: (0, 5] -> 0, (5, max] -> 1
m <- c(0, 5, 0,
       5, maxValue(gewataSub$slope), 1)
rclmat <- matrix(m, ncol = 3, byrow = TRUE)
rc <- reclassify(gewataSub$slope, rclmat)
levelplot(rc,
          margin = FALSE,
          col.regions = c("wheat", "gray"),
          colorkey = list(at = c(0, 1, 2),
                          labels = list(at = c(0.5, 1.5), labels = c("<= 5", "> 5"))))
After the reclassification you can calculate the percentages:
length(rc[rc == 0]) / (length(rc[rc == 0]) + length(rc[rc == 1])) # <= 5 degrees
[1] 0.6628788
length(rc[rc == 1]) / (length(rc[rc == 0]) + length(rc[rc == 1])) # > 5 degrees
[1] 0.3371212
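As an aside, the same proportions can be read off a frequency table of the cell values (a sketch; useNA = "no" drops the NA cells that the crop leaves around the edges):
f <- freq(rc, useNA = "no")
f[, "count"] / sum(f[, "count"])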
Suppose I have data like the following:
val <- .65
set.seed(1)
distr <- replicate(1000, jitter(.5, amount = .2))
d <- density(distr)
Since stats::density uses a specific bw, it does not include all possible values in the interval (because they're infinite):
d$x[ d$x > .64 & d$x < .66 ]
[1] 0.6400439 0.6411318 0.6422197 0.6433076 0.6443955 0.6454834 0.6465713 0.6476592 0.6487471
[10] 0.6498350 0.6509229 0.6520108 0.6530987 0.6541866 0.6552745 0.6563624 0.6574503 0.6585382
[19] 0.6596261
I would like to find a way to provide val to the density function, so that it will return its d$y estimate (I will then use it to color areas of the density plot).
I can't guess how silly this question is, but I can't find a fast solution.
I thought of obtaining it by a linear interpolation of the d$y corresponding to the two values of d$x that are closer to val. Is there a faster way?
This illustrates the use of approxfun:
> Af <- approxfun(d$x, d$y)
> Af(val)
[1] 2.348879
> plot(d)
> points(val, Af(val))
> png(); plot(d); points(val, Af(val)); dev.off()
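Since the stated goal is to colour areas of the density plot, here is a minimal sketch of that next step (what to shade is an assumption; this fills the region to the right of val):
plot(d)
sel <- d$x >= val
polygon(c(val, val, d$x[sel], max(d$x)),
        c(0, Af(val), d$y[sel], 0),
        col = "lightblue", border = NA)  # area under the curve to the right of val
points(val, Af(val), pch = 19)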
df = data.frame(subj = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10),
                block = factor(rep(c(1,2), 10)),
                acc = c(0.75,0.83,0.58,0.75,0.58,0.83,0.92,0.83,0.83,0.67,
                        0.75,0.5,0.67,0.83,0.92,0.58,0.75,0.5,0.67,0.67))
ggplot(df, aes(block, acc, group = subj)) +
  geom_point(position = position_dodge(width = 0.3)) +
  ylim(0, 1) + labs(x = 'Block', y = 'Accuracy')
How do I get points to dodge each other uniformly in the horizontal direction? (I grouped by subj in order to get it to dodge at all, which might not be the correct thing to do...)
I think this might be what you were looking for, although no doubt you have solved it by now.
Hopefully it will help someone else with the same issue.
A simple way is to use geom_dotplot like this:
ggplot(df, aes(x = block, y = acc)) +
  geom_dotplot(binaxis = "y", stackdir = "center", binwidth = 0.03) +
  ylim(0, 1) + labs(x = 'Block', y = 'Accuracy')
The result looks like this: (plot not shown)
Note that x (block in this case) has to be a factor for this to work.
If they don't have to be perfectly aligned horizontally, here's one quick way of doing it, using geom_jitter. You don't need to group by subj.
Method 1 [Simpler]: Using geom_jitter()
ggplot(df, aes(x = block, y = acc)) +
  geom_jitter(position = position_jitter(0.05)) +
  ylim(0, 1) + labs(x = 'Block', y = 'Accuracy')
Play with the jitter width for a greater degree of jittering.
which produces: (plot not shown)
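One caveat: position_jitter() also jitters vertically by default, so if the acc values must stay exact, pin the height to zero (a sketch):
ggplot(df, aes(x = block, y = acc)) +
  geom_jitter(position = position_jitter(width = 0.05, height = 0)) +  # horizontal jitter only
  ylim(0, 1) + labs(x = 'Block', y = 'Accuracy')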
Method 2: Deterministically calculating the jitter value for each row
We first use aggregate to count the number of duplicated entries. Then in a new data frame, for each duplicated value, move it horizontally to the left by an epsilon distance.
df$subj <- NULL #drop this so that aggregate works.
#a new data frame that shows duplicated values
agg.df <- aggregate(list(numdup=seq_len(nrow(df))), df, length)
agg.df$block <- as.numeric(agg.df$block)  # convert block from factor back to numeric
# block acc numdup
#1 2 0.50 2
#2 1 0.58 2
#3 2 0.58 1
#4 1 0.67 2
#...
epsilon <- 0.02  # horizontal offset between duplicated points
new.df <- NULL   # expanded data frame, with block values jittered deterministically
r <- 0
for (i in 1:nrow(agg.df)) {
  for (j in 1:agg.df$numdup[i]) {
    r <- r + 1  # row counter in the expanded data frame
    new.df$block[r] <- agg.df$block[i]
    new.df$acc[r] <- agg.df$acc[i]
    new.df$jit.value[r] <- agg.df$block[i] - (j - 1) * epsilon
  }
}
new.df <- as.data.frame(new.df)
ggplot(new.df, aes(x = jit.value, y = acc)) +
  geom_point(size = 2) +
  ylim(0, 1) + labs(x = 'Block', y = 'Accuracy') + xlim(0, 3)
which produces: (plot not shown)
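As an aside, the same deterministic offsets can be computed without the explicit loops, e.g. with ave() (a sketch, starting from the Method 2 df with subj already dropped):
epsilon <- 0.02
df$jit.value <- as.numeric(df$block) -
  (ave(df$acc, df$block, df$acc, FUN = seq_along) - 1) * epsilon  # 1,2,... within each (block, acc) group
ggplot(df, aes(x = jit.value, y = acc)) +
  geom_point(size = 2) +
  ylim(0, 1) + labs(x = 'Block', y = 'Accuracy') + xlim(0, 3)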
I am new to R, and I have run into trouble plotting a contour figure. I have checked help(filled.contour), which says that x and y must both be in ascending order. However, I receive the data in random order, like:
latitude, longitude, value
37.651098 140.725082 9519
37.650765 140.725248 9519
37.692738 140.749118 23600
37.692737 140.749118 9911
37.692695 140.749107 16591
37.692462 140.74902 6350
37.692442 140.749052 5507
37.692413 140.749148 5476
37.692383 140.74929 7069
37.692357 140.749398 6152
37.692377 140.749445 6170
37.692355 140.749587 7163
37.692298 140.749672 6831
37.692292 140.749787 6194
37.692283 140.749903 6696
37.692342 140.750007 8204
37.692585 140.750037 2872
37.692648 140.749948 3907
37.692655 140.749827 4891
37.692667 140.749687 4899
How can I plot the contour figure?
Here is my code:
args <- commandArgs(trailingOnly = TRUE)
data1 <- args[1]
outputDir <- args[2]
outputFig = paste(outputDir, "Cs13x.jpeg",sep="");
jpeg(file = outputFig, width = 800,height=600, pointsize=20)
pinkcol <- rgb(1,0.7,0.7)
gpsdata <- read.table(file=data1,sep=" ");
lat <- as.vector(gpsdata[,1]);
lon <- as.vector(gpsdata[,2]);
datas <- as.vector(gpsdata[,3]);
datas <- abs(datas)
#---Convert gpsdata into x,y coordinate---#
# Convert degree into value
lat_pi <- lat*pi/180;
lon_pi <- lon*pi/180;
# calculate the value into corresponding x,y coordinate
x = cos(lat_pi) * cos(lon_pi);
y = cos(lat_pi) * sin(lon_pi);
#----------#
dataMatrix = matrix(datas, nrow = length(datas), ncol=length(datas));
plot.new()
filled.contour(sort(x), sort(y, decreasing = TRUE), dataMatrix,
               col = rainbow(100), main = "Contour Figure of Cs13x")  # **WRONG HERE!!!**
dev.off()
The 'akima' package will do it. It is designed to handle irregularly spaced z values. The first two points were widely separated from the rest and that made the results from the whole dataset look rather sketchy, so I omitted them.
require(akima)
gps.interp <- with( gpsdata[-(1:2), ], interp(x=latitude, y=longitude, z=value))
contour(gps.interp)
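interp() returns a list with components x, y and z on a regular grid, so the same object can also be fed to filled.contour(); a minimal sketch (note that this, like the interp() call above, assumes gpsdata was read with header = TRUE so the latitude/longitude/value column names exist):
with(gps.interp, filled.contour(x, y, z, color.palette = rainbow,
                                main = "Contour Figure of Cs13x"))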