Natural Neighbor Interpolation in R - r

I need to conduct Natural Neighbor Interpolation (NNI) via R in order to smooth my numeric data. For example, say I have very spurious data, my goal is to use NNI to model the data neatly.
I have several hundred rows of data (one observation for each postcode), alongside latitudes and longitudes. I've made up some data below:
Postcode lat lon Value
200 -35.277272 149.117136 7
221 -35.201372 149.095065 38
800 -12.801028 130.955789 27
801 -12.801028 130.955789 3
804 -12.432181 130.84331 29
810 -12.378451 130.877014 20
811 -12.376597 130.850489 3
812 -12.400091 130.913672 42
814 -12.382572 130.853877 32
820 -12.410444 130.856124 39
821 -12.426641 130.882367 39
822 -12.799278 131.131697 49
828 -12.474896 130.907378 38
829 -14.460879 132.280002 34
830 -12.487233 130.972637 8
831 -12.480066 130.984006 49
832 -12.492269 130.990891 29
835 -12.48138 131.029173 33
836 -12.525546 131.103025 40
837 -12.460094 130.842663 39
838 -12.709507 130.995407 28
840 -12.717562 130.351316 22
841 -12.801028 130.955789 8
845 -13.038663 131.072091 19
846 -13.226806 131.098416 50
847 -13.824123 131.835799 11
850 -14.464497 132.262021 2
851 -14.464497 132.262021 23
852 -14.92267 133.064654 36
854 -16.81839 137.14707 17
860 -19.648306 134.186642 3
861 -18.94406 134.318373 8
862 -20.231104 137.762232 28
870 -12.436101 130.84059 24
871 -12.436101 130.84059 16
Is there any kind of package that will do this? I should mention, that the only predictors I am using in this model are latitude and longitude. If there isn't a package than can do this, how can I implement it manually. I've searched extensively and I can't figure out how to implement this in R. I have seen one or two other SO posts, but they haven't assisted me in figuring this out.
Please let me know if there's anything I must add to the question. Thanks.

I suggest the following:
Reproject the data to the corresponding UTM Zone.
Use R WhiteboxTools package to process the data using natural neighbour interpolation.

Related

Indexing through 'names' in a list and performing actions on contained values in R

I have a data set of counts from standard solutions passed through an instrument that analyses chemical concentrations (an ICPMS for those familiar). The data is over a range of different standards and for each standard I have four repeat measurements that I want to calculate the mean and variance of.
I'm importing the data from an excel spreadsheet and then, following some housekeeping such as getting dates and times in the right format, I split the the dataset up into a list identified by the name of the standard solution using Count11.sp<-split(Count11.raw, Count11.raw$Type). Count11.raw$Type then becomes the list element name and I have the four count results for each chemical element in that list element.
So far so good.
I find I can yield an average (mean, median etc) easily enough by identifying the list element specifically i.e. mean(Count11.sp$'Ca40') , or sapply(Count11$'Ca40', median), but what I'm not able to do is automate that in a loop so that I can calculate the means for each standard and drop that into a numerical matrix for further manipulation. I can extract the list element names with names() and I can even use a loop to make a vector of all the names and reference the specific list element using these in a for loop.
For instance Count11.sp[names(Count11.sp[i])]will extract the full list element no problem:
$`Post Ca45t`
Type Run Date 7Li 9Be 24Mg 43Ca 52Cr 55Mn 59Co 60Ni
77 Post Ca45t 1 2011-02-08 00:13:08 114 26101 4191 453525 2632 520 714 2270
78 Post Ca45t 2 2011-02-08 00:13:24 114 26045 4179 454299 2822 524 704 2444
79 Post Ca45t 3 2011-02-08 00:13:41 96 26372 3961 456293 2898 520 762 2244
80 Post Ca45t 4 2011-02-08 00:13:58 112 26244 3799 454702 2630 510 792 2356
65Cu 66Zn 85Rb 86Sr 111Cd 115In 118Sn 137Ba 140Ce 141Pr 157Gd 185Re 208Pb
77 244 1036 56 3081 44 520625 78 166 724 10 0 388998 613
78 250 982 70 3103 46 526154 76 174 744 16 4 396496 644
79 246 1014 36 3183 56 524195 60 198 744 2 0 396024 612
80 270 932 60 3137 44 523366 70 180 824 2 4 390436 632
238U
77 24
78 20
79 14
80 6
but sapply(Count11.sp[names(count11.sp[i])produces an error message: Error in median.default(X[[i]], ...) : need numeric data
while sapply(Input$Post Ca45t, median) <'Post Ca45t' being name Count11.sp[i] i=4> does exactly what I want and produces the median value (I can clean that vector up later for medians that don't make sense) e.g.
Type Run Date 7Li 9Be 24Mg
NA 2.5 1297109612.5 113.0 26172.5 4070.0
43Ca 52Cr 55Mn 59Co 60Ni 65Cu
454500.5 2727.0 520.0 738.0 2313.0 248.0
66Zn 85Rb 86Sr 111Cd 115In 118Sn
998.0 58.0 3120.0 45.0 523780.5 73.0
137Ba 140Ce 141Pr 157Gd 185Re 208Pb
177.0 744.0 6.0 2.0 393230.0 622.5
238U
17.0
Can anyone give me any insight into how I can automate (i.e. loop through) these names to produce one median vector per list element? I'm sure there's just some simple disconnect in my logic here that may be easily solved.
Update: I've solved the problem. The way to do so is to use tapply on the original dataset with out the need to split it. tapply allows functions to be applied to data based on a user defined grouping criteria. In my case I could group according to the Count11.raw$Type and then take the mean of the data subset. tapply(Count11.raw$Type, Count11.raw[,3:ncol(Count11.raw)], mean), job done.

issue with normalizing variable

I am trying to normalize a variable using Box-Cox. However, I am receiving an error message:
boxcox_obj <- boxcox(alive_data_4$mosslpadeq)
Error in estimate_boxcox_lambda(x, ...) : x must be positive
I read online that you can get this message when the variable has negative values. However, that is not the case with this variable (see frequency below).
table(alive_data_4$mosslpadeq)
0 10 20 30 40 50 60 70 80 90 100
766 635 2141 1756 3355 1913 2095 1400 4498 1361 2228
Can someone advise?

R is not taking the parameter hgap in layout_with_sugiyama

I'm working on R on a graph and I'd like to have a hierarchical plot, based on the values in the vector S (a value for each node).
lay2 <- layout_with_sugiyama(grafo, attributes="all", layers = S, hgap=10, vgap=10)
plot(lay2$extd_graph, vertex.label.cex=0.5)
However, the paramaters hgap e vgap are not taken and the graph is really confused (even because I've got 162 nodes).
I'm doing something wrong or there is another way in which I can do a hierarchical graph?
I believe that layout_with_sugiyama is working just fine,
but you may be misinterpreting the output. Since you do
not provide any data, I will illustrate with some randomly
generated data.
library(igraph)
set.seed(1234)
grafo = erdos.renyi.game(162, 0.03)
lay2 <- layout_with_sugiyama(grafo, attributes="all",
hgap=10, vgap=10)
plot(lay2$extd_graph, vertex.label.cex=0.5, vertex.size=9)
I think the source of your question is the fact that the nodes
are a bit crowded together in the horizontal direction. But
that should be expected. Let's analyze the layout, starting
with the easy part, the vertical direction.
table(lay2$layout[,2])
1 11 21 31 41
24 82 42 13 1
You can see that vgap worked. The spacing is 10 units apart.
The second line up (y=11) has 82 nodes. Unless the nodes are
tiny, 82 nodes on a single, horizontal line will overlap.
But aren't they supposed to have spacing of at least 10?
They do! Let's look at that second line.
sort(lay2$layout[lay2$layout[,2]==11,1])
[1] -25 -15 -5 5 15 25 35 45 55 65 75 85 95 105 115 125 135 230
[19] 240 260 270 280 290 300 310 320 330 340 350 360 370 380 390 400 410 420
[37] 430 440 450 460 470 480 490 500 510 520 530 540 550 560 570 580 590 600
[55] 610 620 630 640 655 665 675 685 695 720 730 740 750 760 770 780 790 800
[73] 810 820 830 840 850 860 870 880 890 910
Looking at the whole graph, there is a slightly broader range.
range(lay2$layout[,1])
[1] -65 910
None of the numbers are less that 10 apart - as requested. hgap worked too!
However, what happens when you try to plot that? If you read the part of the
?igraph.plotting help page that refers to the parameter rescale,
you will see:
rescale:
Logical constant, whether to rescale the coordinates to the [-1,1]x-1,1 interval. Defaults to TRUE, the layout will be rescaled.
So the layout will be rescaled to a range of -1,1 and then plotted.
Scaled or not, you need to fit 82 nodes in a single, horizontal row,
so it is very difficult to avoid overlapping nodes.

How ask R not to combine the X axis values for a bar chart?

I am a beginner with R . My data looks like this:
id count date
1 210 2009.01
2 400 2009.02
3 463 2009.03
4 465 2009.04
5 509 2009.05
6 861 2009.06
7 872 2009.07
8 886 2009.08
9 725 2009.09
10 687 2009.10
11 762 2009.11
12 748 2009.12
13 678 2010.01
14 699 2010.02
15 860 2010.03
16 708 2010.04
17 709 2010.05
18 770 2010.06
19 784 2010.07
20 694 2010.08
21 669 2010.09
22 689 2010.10
23 568 2010.11
24 584 2010.12
25 592 2011.01
26 548 2011.02
27 683 2011.03
28 675 2011.04
29 824 2011.05
30 637 2011.06
31 700 2011.07
32 724 2011.08
33 629 2011.09
34 446 2011.10
35 458 2011.11
36 421 2011.12
37 459 2012.01
38 256 2012.02
39 341 2012.03
40 284 2012.04
41 321 2012.05
42 404 2012.06
43 418 2012.07
44 520 2012.08
45 546 2012.09
46 548 2012.10
47 781 2012.11
48 704 2012.12
49 765 2013.01
50 571 2013.02
51 371 2013.03
I would like to make a bar graph like graph that shows how much what is the count for each date (dates in format of Month-Y, Jan-2009 for instance). I have two issues:
1- I cannot find a good format for a bar-char like graph like that
2- I want all of my data-points to be present in X axis(date), while R aggregates it to each year only (so I inly have four data-points there). Below is the current command that I am using:
plot(df$date,df$domain_count,col="red",type="h")
and my current plot is like this:
Ok, I see some issues in your original data. May I suggest the following:
Add the days in your date column
df$date=paste(df$date,'.01',sep='')
Convert the date column to be of date type:
df$date=as.Date(df$date,format='%Y.%m.%d')
Plot the data again:
plot(df$date,df$domain_count,col="red",type="h")
Also, may I add one more suggestion, have you used ggplot for ploting chart? I think you will find it much easier and resulting in better looking charts. Your example could be visualized like this:
library(ggplot2) #if you don't have the package, run install.packages('ggplot2')
ggplot(df,aes(date, count))+geom_bar(stat='identity')+labs(x="Date", y="Count")
First, you should transform your date column in a real date:
library(plyr) # for mutate
d <- mutate(d, month = as.numeric(gsub("[0-9]*\\.([0-9]*)", "\\1", as.character(date))),
year = as.numeric(gsub("([0-9]*)\\.[0-9]*", "\\1", as.character(date))),
Date = ISOdate(year, month, 1))
Then, you could use ggplot to create a decent barchart:
library(ggplot2)
ggplot(d, aes(x = Date, y = count)) + geom_bar(fill = "red", stat = "identity")
You can also use basic R to create a barchart, which is however less nice:
dd <- setNames(d$count, format(d$Date, "%m-%Y"))
barplot(dd)
The former plot shows you the "holes" in your data, i.e. month where there is no count, while for the latter it is even wuite difficult to see which bar corresponds to which month (this could however be tweaked I assume).
Hope that helps.

Binning a dataframe with equal frequency of samples

I have binned my data using the cut function
breaks<-seq(0, 250, by=5)
data<-split(df2, cut(df2$val, breaks))
My split dataframe looks like
... ...
$`(15,20]`
val ks_Result c
15 60 237
18 70 247
... ...
$`(20,25]`
val ks_Result c
21 20 317
24 10 140
... ...
My bins looks like
> table(data)
data
(0,5] (5,10] (10,15] (15,20] (20,25] (25,30] (30,35]
0 0 0 7 128 2748 2307
(35,40] (40,45] (45,50] (50,55] (55,60] (60,65] (65,70]
1404 11472 1064 536 7389 1008 1714
(70,75] (75,80] (80,85] (85,90] (90,95] (95,100] (100,105]
2047 700 329 1107 399 376 323
(105,110] (110,115] (115,120] (120,125] (125,130] (130,135] (135,140]
314 79 1008 77 474 158 381
(140,145] (145,150] (150,155] (155,160] (160,165] (165,170] (170,175]
89 660 15 1090 109 824 247
(175,180] (180,185] (185,190] (190,195] (195,200] (200,205] (205,210]
1226 139 531 174 1041 107 257
(210,215] (215,220] (220,225] (225,230] (230,235] (235,240] (240,245]
72 671 98 212 70 95 25
(245,250]
494
When I mean the bins, I get on an average of ~900 samples
> mean(table(data))
[1] 915.9
I want to tell R to make irregular bins in such a way that each bin will contain on an average 900 samples (e.g. (0, 27] = 900, (27,28.5] = 900, and so on). I found something similar here, which deals with only one variable, not the whole dataframe.
I also tried Hmisc package, unfortunately the bins don't contain equal frequency!!
library(Hmisc)
data<-split(df2, cut2(df2$val, g=30, oneval=TRUE))
data<-split(df2, cut2(df2$val, m=1000, oneval=TRUE))
Assuming you want 50 equal sized buckets (based on your seq) statement, you can use something like:
df <- data.frame(var=runif(500, 0, 100)) # make data
cut.vec <- cut(
df$var,
breaks=quantile(df$var, 0:50/50), # breaks along 1/50 quantiles
include.lowest=T
)
df.split <- split(df, cut.vec)
Hmisc::cut2 has this option built in as well.
Can be done by the function provided here by Joris Meys
EqualFreq2 <- function(x,n){
nx <- length(x)
nrepl <- floor(nx/n)
nplus <- sample(1:n,nx - nrepl*n)
nrep <- rep(nrepl,n)
nrep[nplus] <- nrepl+1
x[order(x)] <- rep(seq.int(n),nrep)
x
}
data<-split(df2, EqualFreq2(df2$val, 25))

Resources