I have a two-variable data set (na and ob) that I have to plot. I applied the kde2d kernel and plotted the 1- to 4-sigma density curves (confidencebound). I need to select the points that are inside the 2-sigma curve (leaving out all those between the 1- and 2-sigma curves), but not just in the plot: I need to select them from the data set and put them in a new list. Could you please help me with this?
kde_BPT <- kde2d(na,ob, n=1000, lims=c(-2,2,-1.5,1.5))
confidencebound <- quantile(kde_BPT$z, probs=c(0.685,0.955,0.9975,0.99995), na.rm = TRUE)
The data are too large to paste here. I have put the plot here in case that helps; I need to know which data points (any colour) are in the area between the 1- and 2-sigma contour curves.
The plot
Thanks for your help.
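One way to do this (a sketch, not tested on your data): the points inside a given contour are exactly those where the KDE surface is higher than that contour's density level, so you can interpolate kde_BPT$z at each data point and filter on confidencebound. This assumes the fields package for interp.surface(), and takes confidencebound[2] to be the level you drew as the 2-sigma curve.

library(MASS)    # kde2d()
library(fields)  # interp.surface()

# Evaluate the KDE surface at every (na, ob) point by bilinear interpolation;
# points outside the lims window of kde2d come back as NA
dens_at_points <- interp.surface(
  list(x = kde_BPT$x, y = kde_BPT$y, z = kde_BPT$z),
  cbind(na, ob)
)

# Keep the points whose local density is at least the 2-sigma contour level
inside_2sigma <- !is.na(dens_at_points) & dens_at_points >= confidencebound[2]
selected <- data.frame(na = na, ob = ob)[inside_2sigma, ]

For the band between the 1- and 2-sigma curves instead, additionally require dens_at_points to be below whichever level you drew as the 1-sigma curve.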
I would like to extract several predicted y-values for given x-values from this graph:
I know that it is possible to get the x and y coordinates of the curve using the following call:
coordinate <- ggplot_build(curve)$data[[2]][,c("x","y")]
head(coordinate,n = 6L)
# x y
1 0.1810660 32845.225
2 0.4810660 27635.136
3 0.7553301 23904.792
4 1.3295942 18316.923
5 1.8288582 15092.595
6 5.0312446 8018.707
Is there a function that directly returns the predicted y-value for a given x-value that does not appear in coordinate, for example 3.5?
As Gregor mentions, you should fit a model separately from the plot.
Otherwise, the best you can do to "simply" obtain a value is an interpolating spline:
# Build an interpolating spline through the extracted curve points
sfun <- splinefun(coordinate)
sfun(3.5)
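If you would rather avoid the overshoot a cubic spline can produce between points, plain linear interpolation with base R's approx() is an alternative:

# Linear interpolation between the extracted curve points
approx(coordinate$x, coordinate$y, xout = 3.5)$y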
I'm trying to create a good heat map using kriging for missing values.
I have the following data, which contains all the values that have been measured for RLevel.
I followed this tutorial on how to use kriging: https://rpubs.com/nabilabd/118172
This is the code I wrote. Before these steps, I had removed from my DieData all the values that needed to be tested. The values that need to be tested are referred to as die.data.NAValues in my code.
#**************************************************CODE*****************
library(sp)       # coordinates()
library(automap)  # autoKrige()
library(gstat)    # gstat(), predict()
library(ggplot2)
library(dplyr)    # %>%

#Step 3: Convert to SpatialPointsDataFrame objects
coordinates(die.data) = ~X+Y
#Step 4: Get the prediction grid
coordinates(die.data.NAValues) = ~X+Y
#Using the autoKrige method (its first argument must be a formula)
kr = autoKrige(RLevel~1, die.data, die.data.NAValues, nmax=20)
predicted_die_values <- kr$krige_output
predicted_die_model <- kr$var_model
#Get predictions. Plot the predicted values on a heat map.
g <- gstat(NULL, "RLevel", RLevel~1, die.data, model=predicted_die_model, nmax=1)
predictedSet <- predict(g, newdata=die.data, BLUE=TRUE)
#Plot the kriging graph (autoKrige stores its predictions in var1.pred)
predicted_die_values %>%
  as.data.frame %>%
  ggplot(aes(x=X, y=Y)) +
  geom_tile(aes(fill=var1.pred)) +
  coord_equal() +
  scale_fill_gradient(low="yellow", high="red") +
  scale_x_continuous() +
  scale_y_continuous() +
  theme_bw()
When I plot the graph, I get the following image from the values predicted by the kriging method.
My question is: how can I show a good heat map with both the points predicted by kriging and the points I already have? I want my graph to look like the one in the link I posted above.
Description of my dataset: my original dataset, including the NA values that have not been tested, contains around 55,057 points. When I take out the NA values and use those as my prediction grid, I get 390 points. The majority of the RLevel values are within the 30s range, except for around 100-200 points that are above 100.
Can anyone help me out or give me guidance on how to produce a good heatmap?
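One approach (a sketch, untested on your data) is to krige onto a dense regular grid covering the whole die area, instead of only the 390 locations with missing values, so that geom_tile() has a complete surface to fill. The 200 x 200 resolution here is an arbitrary choice:

library(sp)
library(automap)
library(ggplot2)

# Build a regular prediction grid over the bounding box of the measured points
bb <- bbox(die.data)  # die.data is already a SpatialPointsDataFrame here
grd <- expand.grid(
  X = seq(bb["X", "min"], bb["X", "max"], length.out = 200),
  Y = seq(bb["Y", "min"], bb["Y", "max"], length.out = 200)
)
coordinates(grd) <- ~X+Y
gridded(grd) <- TRUE

# Krige onto the full grid and plot the predicted surface
kr_full <- autoKrige(RLevel ~ 1, die.data, grd, nmax = 20)
pred_df <- as.data.frame(kr_full$krige_output)

ggplot(pred_df, aes(x = X, y = Y)) +
  geom_tile(aes(fill = var1.pred)) +
  coord_equal() +
  scale_fill_gradient(low = "yellow", high = "red") +
  theme_bw()

Since most RLevel values sit in the 30s with a minority above 100, a capped or log fill scale (e.g. scale_fill_gradient(trans = "log10", ...)) may show the structure better than a linear one.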
I am currently exploring three shapefiles, each with point data, and all confined to the same window. I also have them in ppp format which I've used to create kernel density maps.
plot(density.ppp(smktppp, 0.5, edge=T), main="Supermarket Density")
plot(density.ppp(tptppp, 0.5, edge=T), main="Transport Density")
plot(density.ppp(farmppp, 0.5, edge=T), main="Urban Farm Density")
I would like to overlay these plots, using map algebra, or fuzzy logic, etc, to create one output map showing the density of the three combined. How would I go about doing this in R?
If you simply want to estimate the overall density (usually called intensity since it doesn't integrate to one) of points disregarding whether it is "Supermarket", "Transport" or "Urban Farm" you just combine all the points and do as before:
library(spatstat)
combined <- superimpose(smktppp, tptppp, farmppp)
plot(density(combined), main="Density of all points.")
Of course you can choose the smoothing bandwidth to be 0.5 as before or any other value you like.
You can also do normal algebra with the raster images produced by density.ppp (object of class im). If you saved these as smktim, tptim and farmim you can do something like rslt <- smktim + tptim + farmim to get the sum of the three estimates.
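For example (a sketch reusing the 0.5 bandwidth from above, with the variable names suggested in the previous sentence):

library(spatstat)

# Kernel intensity estimates on the same window, hence on the same pixel grid
smktim <- density(smktppp, 0.5, edge = TRUE)
tptim  <- density(tptppp, 0.5, edge = TRUE)
farmim <- density(farmppp, 0.5, edge = TRUE)

# Pixel-wise sum of the three intensity surfaces (im objects support arithmetic)
rslt <- smktim + tptim + farmim
plot(rslt, main = "Combined intensity")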
Alternatively, if you combine the three point patterns into a single object
library(spatstat)
X <- superimpose(Supermarket=smktppp, Transport=tptppp, Farm=farmppp)
then you can display both the original data and the intensity estimates, with or without identifying the kind of point:
# original data:
plot(X) # single plot with 3 different plot characters for 3 types
plot(split(X)) # three plot panels, one for each type of point
plot(unmark(X)) # single plot without distinguishing types of points
# intensity images:
plot(density(X, 0.5)) # single plot: intensity regardless of type
plot(density(split(X), 0.5)) # three panels: intensity for each type
plot(relrisk(X, 0.5)) # three panels: relative probabilities of each type
See the spatstat book for details.
I am trying to plot a 3D graph in Maple 18. I plot two graphs on the same axes and I want to show all their intersections. I actually want only the integer intersections, if that is possible, but I don't know the command.
Here is the plot for which I want to show the intersections:
plot3d([x^2, 3^6*z-432], x = -50 .. 50, z = 0 .. 20)
If you are interested in just showing the intersection of the two expressions, you might try the plots:-intersectplot command. The following will show the intersection for the two surfaces:
p1 := plots:-intersectplot(y=x^2, y=3^6*z-432, x=-50..50, z=0..20, y=0..14000);
If you want to then superimpose this on your original plot:
p2 := plot3d([x^2, 3^6*z-432], x=-50..50, z=0..20):
plots:-display([p1,p2]);
I'm not sure whether this should go on Cross Validated or not, but we'll see. Basically, I obtained data from an instrument just recently (masses of compounds from 0 to 630), which I sorted into 0.025-wide bins before plotting a histogram, as seen below:
I want to identify the bins that have high frequency and stand out against the background noise (the background noise increases as you move from right to left on the x-axis). Imagine drawing a curved line on top of the points that have almost blurred together into a black lump, and then selecting the bins that exist above that curve to investigate further; that's what I'm trying to do. I plotted a kernel density estimate to see if I could overlay it on top of my histogram and use it to identify points that exist above the curve. However, the density plot makes no headway with this, as the densities are too low in value (see the second plot). Does anyone have any recommendations on how I can go about solving this problem? The blue line represents the overlaid density function, and the red line represents the ideal solution (I need a way of somehow automating this in R).
The data below is only part of my dataset, so it's not really a good representation of my plot (which contains about 300,000 points), and as my bin sizes are quite small (0.025) there's a huge spread of data (in total there are about 25,000 bins).
df <- read.table(header = TRUE, text = "
values
1 323.881306
2 1.003373
3 14.982121
4 27.995091
5 28.998639
6 95.983138
7 2.0117459
8 1.9095478
9 1.0072853
10 0.9038475
11 0.0055748
12 7.0964916
13 8.0725191
14 9.0765316
15 14.0102531
16 15.0137390
17 19.7887675
18 25.1072689
19 25.8338140
20 30.0151683
21 34.0635308
22 42.0393751
23 42.0504938
")
bin <- seq(0, 324, by = 0.025)
hist(df$values, breaks = bin, prob=TRUE, col = "grey")
lines(density(df$values), col = "blue")
Assuming you're dealing with a vector bin.densities that has the densities for each bin, a simple way to find outliers would be:
look at a window around each bin, say +- 50 bins
current.bin <- 1
window.size <- 50
window <- bin.densities[(current.bin - window.size):(current.bin + window.size)]  # parentheses matter: ':' binds tighter than '+'/'-'
find the upper (95%) and lower (5%) quantile values (or really any values you think work)
lower.quant <- quantile(window, 0.05)
upper.quant <- quantile(window, 0.95)
then say that the current bin is an outlier if it falls outside your quantile range.
this.is.too.high <- (bin.densities[current.bin] > upper.quant)
this.is.too.low <- (bin.densities[current.bin] < lower.quant)
#final result
this.is.outlier <- this.is.too.high | this.is.too.low
I haven't actually tested this code, but this is the general approach I would take. You can play around with window size and the quantile percentages until the results look reasonable. Again, not exactly super complex math but hopefully it helps.
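Putting the pieces together, here is a runnable version of the same idea (my own wrapper, with the window clamped at the ends of the vector so the first and last bins don't produce invalid indices; the 50-bin window and 5%/95% cut-offs are the same arbitrary choices as above):

find_outlier_bins <- function(bin.densities, window.size = 50,
                              lower.p = 0.05, upper.p = 0.95) {
  n <- length(bin.densities)
  sapply(seq_len(n), function(i) {
    # Clamp the window so it never runs past either end of the vector
    window <- bin.densities[max(1, i - window.size):min(n, i + window.size)]
    q <- quantile(window, probs = c(lower.p, upper.p), na.rm = TRUE)
    bin.densities[i] < q[1] || bin.densities[i] > q[2]
  })
}

# Applied to the densities of the histogram computed earlier:
h <- hist(df$values, breaks = bin, plot = FALSE)
outlier.bins <- which(find_outlier_bins(h$density))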