Why does my detrending not result in a different semivariogram in R?

I have a series of samples from which I wish to construct a variogram model, and eventually a kriging model. First, I need to detrend the data, as shown below:
samples
x y z
1 180 1180 2.763441
2 180 240 -2.000000
3 380 1840 1.720087
4 720 80 4.056754
5 860 800 4.361503
6 620 1360 4.737717
7 980 1920 4.352956
8 1680 260 4.568255
9 1520 800 5.025272
10 1100 1220 4.693432
11 800 1460 2.470927
12 360 1900 1.455169
13 700 1760 2.894159
14 720 1540 2.115742
15 660 1480 1.749017
16 540 1680 3.291592
17 260 1280 2.962401
18 440 1640 2.422442
19 280 1260 2.966076
20 580 1580 3.178913
21 600 1220 3.752786
22 240 1700 1.748011
23 480 1440 3.106302
24 740 1880 4.827699
25 760 1320 3.603621
26 1560 1640 5.410076
27 1960 1980 6.145778
28 1520 1620 5.499064
29 1900 1820 5.316121
30 1780 1580 5.318344
31 100 740 2.019103
32 180 760 2.353693
33 140 200 1.714856
34 380 720 3.526107
35 240 580 3.075283
36 260 600 3.329397
37 340 360 3.188613
38 280 680 2.626241
39 420 700 3.211163
40 500 240 2.960805
41 460 280 3.171664
42 480 300 2.828883
43 400 640 3.227938
44 440 480 2.420358
45 300 560 4.021187
46 1380 220 5.364264
47 1500 740 5.344526
48 1240 380 4.632060
49 1420 360 4.012537
50 1280 800 4.122139
51 1400 600 5.033020
52 1300 640 4.215308
53 1460 200 5.116025
54 1220 440 4.550290
55 1200 520 3.788613
56 1540 340 5.772432
57 1520 660 5.656598
58 1480 260 5.423685
59 1360 780 4.728220
60 1260 240 3.683696
library(gstat)    # gstat() and variogram()
library(lattice)  # levelplot()

print(mean(samples$z))

## empirical variogram of the raw (not detrended) data
h <- gstat(formula = z ~ 1, locations = ~x + y, data = samples)
samples.vgm <- variogram(h)
plot(samples.vgm, main = 'Variogram of Samples NOT detrended')

## fit a first-order (planar) trend: z = a*x + b*y + c
z <- samples$z
x <- samples$x
y <- samples$y
trend <- lm(z ~ x + y)
c <- trend$coefficients[[1]]  # intercept (masks base::c as a variable, but c() calls still resolve to the function)
a <- trend$coefficients[[2]]
b <- trend$coefficients[[3]]

## subtract the fitted trend from every sample
Xs <- c()
Ys <- c()
Zs <- c()
print('started the loop')
for (i in 1:nrow(samples)) {
  row <- samples[i, ]
  x <- row$x
  y <- row$y
  z <- row$z
  z_prime <- z - (a*x + b*y + c)
  Xs <- c(Xs, x)
  Ys <- c(Ys, y)
  Zs <- c(Zs, z_prime)
}
sampled <- data.frame(Xs, Ys, Zs)
print(sampled)
print('the length of sampled is')
print(length(sampled[[1]]))
print(levelplot(Zs ~ Xs + Ys, sampled))

## prediction grid for the eventual kriging step
x <- seq(0, 2000, by = 20)
y <- seq(0, 2000, by = 20)
pred.grid <- data.frame(x = rep(x, times = length(y)), y = rep(y, each = length(x)))

## empirical variogram of the detrended data
g <- gstat(formula = Zs ~ 1, locations = ~Xs + Ys, data = sampled)
sampled.vgm <- variogram(g)
plot(sampled.vgm, main = 'Variogram of Samples hopefully detrended')
The problem is that the plot of the detrended variogram (i.e. the variogram built from g above) looks exactly the same as the variogram built from h, which is NOT detrended. Any reason why this happens?
The data are clearly different: the mean of the detrended values is 0, as expected, while the mean of the non-detrended values is around 3.556, also as expected.
Is there something I'm not catching here?

Not sure this question belongs here, since I think the issue is conceptual, not related to your code. I'm new, though, so I'll just go ahead and give you some quick feedback.
The variogram plots the variance (technically, the semi-variance) of your data within a given spatial lag. When you apply a linear transformation to your data, I don't believe you alter that variance, so you shouldn't see different parameters come out of the variogram model. Instead, your kriged surface just takes on a different mean value.
p.s. It would be helpful to make the code something that anyone can copy and paste -- e.g., include coded test data.
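For what it's worth, here is a minimal sketch (assuming the samples data frame from the question and the gstat package) of a more compact way to do the same detrending: residuals() replaces the loop, and gstat's variogram() will also compute a residual variogram directly if you put the trend terms in the formula, which gives an easy cross-check of the two approaches.
library(gstat)

## residuals of the planar trend -- equivalent to the z_prime loop above
samples$res <- residuals(lm(z ~ x + y, data = samples))

vgm_raw   <- variogram(z ~ 1,     locations = ~x + y, data = samples)  # raw data
vgm_res   <- variogram(res ~ 1,   locations = ~x + y, data = samples)  # manual residuals
vgm_trend <- variogram(z ~ x + y, locations = ~x + y, data = samples)  # gstat's own residual variogram

plot(vgm_res, main = 'Variogram of residuals (detrended)')
Plotting vgm_raw and vgm_res side by side makes it easy to see how much (or how little) the trend removal actually changes the empirical semivariogram.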

Related

Gompertz-Makeham parameter estimation

I would like to estimate the parameters of the Gompertz-Makeham distribution, but I haven't gotten a result yet.
I would like a method in R, like this Weibull parameter estimation code:
## negative log-likelihood for a Weibull sample stored in vec (the data below)
weibull_loglik <- function(parm) {
  gamma <- parm[1]
  lambda <- parm[2]
  loglik <- sum(dweibull(vec, shape = gamma, scale = lambda, log = TRUE))
  return(-loglik)
}
weibull <- nlm(weibull_loglik, p = c(1, 1), hessian = TRUE, iterlim = 100)
weibull$estimate
c <- weibull$estimate[1]; b <- weibull$estimate[2]
My data (this is the vector vec used above):
[1] 872 52 31 26 22 17 11 17 17 8 20 12 25 14 17
[16] 20 17 23 32 37 28 24 43 40 34 29 26 32 34 51
[31] 50 67 84 70 71 137 123 137 172 189 212 251 248 272 314
[46] 374 345 411 494 461 505 506 565 590 535 639 710 733 795 786
[61] 894 963 1019 1149 1185 1356 1354 1460 1622 1783 1843 2049 2262 2316 2591
[76] 2730 2972 3187 3432 3438 3959 3140 3612 3820 3478 4054 3587 3433 3150 2881
[91] 2639 2250 1850 1546 1236 966 729 532 375 256 168 107 65 39 22
[106] 12 6 3 2 1 1
summary(vec)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.0 32.0 314.0 900.9 1355.0 4054.0
It would be nice to have a reproducible example, but something like:
library(bbmle)
library(eha)
set.seed(101)
vec <- rmakeham(1000, shape = c(2, 3), scale = 2)
dmwrap <- function(x, shape1, shape2, scale, log) {
  res <- try(dmakeham(x, c(shape1, shape2), scale, log = log), silent = TRUE)
  if (inherits(res, "try-error")) return(NA)
  res
}
m1 <- mle2(y ~ dmwrap(shape1, shape2, scale),
           start = list(shape1 = 1, shape2 = 1, scale = 1),
           data = data.frame(y = vec),
           method = "Nelder-Mead"
)
Define a wrapper that (1) takes the shape parameters as separate values and (2) returns NA rather than throwing an error when, e.g., the parameters are negative.
Use Nelder-Mead rather than the default BFGS for robustness.
The fitdistrplus package might help too.
If you're going to do a lot of this, it may help to fit the parameters on the log scale (i.e. use parameters logshape1, etc., and exp(logshape1) etc. in the fitting formula); see the sketch below.
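A minimal sketch of that last point, reusing the dmwrap() wrapper and the simulated vec from above (the names logshape1, logshape2 and logscale are just illustrative):
## same model, but optimised on the log scale so the parameters
## can never go negative during the search
m1_log <- mle2(y ~ dmwrap(exp(logshape1), exp(logshape2), exp(logscale)),
               start = list(logshape1 = 0, logshape2 = 0, logscale = 0),
               data = data.frame(y = vec),
               method = "Nelder-Mead"
)
exp(coef(m1_log))  # back-transform to the original parameter scale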
I had to work a little harder to fit your data; I scaled the variable by 1000 (and found that I could only compute the log-likelihood; the likelihood gave an error that I didn't bother trying to track down). Unfortunately, it doesn't look like a great fit (too many small values).
x <- scan(text = "872 52 31 26 22 17 11 17 17 8 20 12 25 14 17
20 17 23 32 37 28 24 43 40 34 29 26 32 34 51
50 67 84 70 71 137 123 137 172 189 212 251 248 272 314
374 345 411 494 461 505 506 565 590 535 639 710 733 795 786
894 963 1019 1149 1185 1356 1354 1460 1622 1783 1843 2049 2262 2316 2591
2730 2972 3187 3432 3438 3959 3140 3612 3820 3478 4054 3587 3433 3150 2881
2639 2250 1850 1546 1236 966 729 532 375 256 168 107 65 39 22
12 6 3 2 1 1")
m1 <- mle2(y ~ dmwrap(shape1, shape2, scale),
           start = list(shape1 = 1, shape2 = 1, scale = 10000),
           data = data.frame(y = x/1000),
           method = "Nelder-Mead"
)
cc <- as.list(coef(m1))
png("gm.png")
hist(x, breaks = 25, freq = FALSE)
with(cc,
     curve(exp(dmwrap(x/1000, shape1, shape2, scale, log = TRUE))/1000, add = TRUE)
)
dev.off()

How to calculate Williams %R in RStudio?

I am trying to write a function to calculate Williams %R on data in R. Here is my code:
getSymbols('AMD', src = 'yahoo', from = '2018-01-01')
wr = function(high, low, close, n) {
  highh = runMax((high), n)
  lowl = runMin((low), n)
  -100 * ((highh - close) / (highh - lowl))
}
williampr = wr(AMD$AMD.High, AMD$AMD.Low, AMD$AMD.Close, n = 10)
After implementing a buy/sell/hold signal, it returns integer(0):
## 1 = BUY, 0 = HOLD, -1 = SELL
## implement Lag to shift the time back to the previous day
tradingSignal = Lag(
  ## if wpr is greater than 0.8, BUY
  ifelse(Lag(williampr) > 0.8 & williampr < 0.8, 1,
  ## if wpr signal is less than 0.2, SELL, else HOLD
  ifelse(Lag(williampr) > 0.2 & williampr < 0.2, -1, 0)))
## make all missing values equal to 0
tradingSignal[is.na(tradingSignal)] = 0
## see how many SELL signals we have
which(tradingSignal == "-1")
What am I doing wrong?
It would have been a good idea to identify that you were using the package quantmod in your question.
There are two things preventing this from working.
You didn't inspect what you expected! Your results in williampr are all negative. Additionally, you multiplied the values by 100, so 80% is 80, not 0.8. I removed the -100 *.
I have done the same thing so many times.
wr = function(high, low, close, n) {
  highh = runMax((high), n)
  lowl = runMin((low), n)
  (highh - close) / (highh - lowl)
}
That's it. It works now.
which(tradingSignal == "-1")
# [1] 13 15 19 22 39 71 73 84 87 104 112 130 134 136 144 146 151 156 161 171 175
# [22] 179 217 230 255 268 288 305 307 316 346 358 380 386 404 449 458 463 468 488 492 494
# [43] 505 510 515 531 561 563 570 572 574 594 601 614 635 642 644 646 649 666 668 672 691
# [64] 696 698 719 729 733 739 746 784 807 819 828 856 861 872 877 896 900 922 940 954 968
# [85] 972 978 984 986 1004 1035 1048 1060
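As a side note (a sketch, not part of the original answer): if you prefer to keep the conventional -100 * scaling, where Williams %R runs from 0 down to -100, the same crossing logic works once the thresholds are mapped accordingly (0.8 on the ratio scale corresponds to -80, and 0.2 to -20):
## williampr here is the output of the fixed wr() above (the 0..1 ratio);
## rescale to the conventional 0..-100 range and flip the comparisons
williampr_scaled <- -100 * williampr
tradingSignal2 <- Lag(
  ifelse(Lag(williampr_scaled) < -80 & williampr_scaled > -80,  1,
  ifelse(Lag(williampr_scaled) < -20 & williampr_scaled > -20, -1, 0)))
tradingSignal2[is.na(tradingSignal2)] <- 0
which(tradingSignal2 == -1)
TTR (loaded along with quantmod) also ships a WPR() function that computes the same ratio, which is worth comparing against a hand-rolled version.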

How do I get the same print format to output within julia code that I get in the julia console?

When using the Julia console, if you type something like this:
[10,20]*[1:100,1:100]'
You will get output like this:
2x200 Array{Int64,2}:
10 20 30 40 50 60 70 80 90 100 … 930 940 950 960 970 980 990 1000
20 40 60 80 100 120 140 160 180 200 1860 1880 1900 1920 1940 1960 1980 2000
How can I get this output format when executing code with julia my_code.jl?
Right now I am using println() and @show, but they output the full array and no information about the dimensions or type, which would be great to see. I also usually don't need to see a full 3x60,000-element matrix printed, but it would often be good to see the first and last few elements. Is there an easy way to do this (i.e. get the same output the julia console formats so nicely)?
You could use display:
(3.5.1) dsm@notebook:~/coding$ less d.jl
x = [10;20]*[1:100;1:100]';
display(x)
println()
(3.5.1) dsm@notebook:~/coding$ julia d.jl
2x200 Array{Int32,2}:
10 20 30 40 50 60 70 80 90 100 110 120 … 930 940 950 960 970 980 990 1000
20 40 60 80 100 120 140 160 180 200 220 240 1860 1880 1900 1920 1940 1960 1980 2000
If I'm reading the source right, this ultimately delegates to writemime via TextDisplay (as discussed here).

Reading html tables in R using readHTMLTable where most work but some do not

I am trying to read a few hundred html tables using readHTMLTable in R. This works mostly fine, except for a couple of tables. The tables look fine in Firefox.
Specifically, tables are by year and state. The following code reads the first table for Maryland in 2005 and works fine:
readHTMLTable("http://www.ssa.gov/policy/docs/statcomps/oasdi_sc/2005/md.html", header=FALSE)[[1]]
However, when trying to do this for Maryland and 2006, the table consists only of the first row of numbers.
readHTMLTable("http://www.ssa.gov/policy/docs/statcomps/oasdi_sc/2006/md.html", header=FALSE)[[1]]
I'm not sure where the problem is and would appreciate it if anyone could point me toward it.
Stephan
The problem I see is in the second URL, "http://www.ssa.gov/policy/docs/statcomps/oasdi_sc/2006/md.html": if you inspect the source code, you will see that "table 4" contains two tbody elements. readHTMLTable appears to read the first tbody it finds on the page, which is why you only get the "first" row (the one in the first tbody tag).
You need to specify the tbody you want; in your case it's the 2nd tbody of the table in the div with the id "table4". You can identify this node with "//div[@id='table4']/table/tbody[2]":
library(XML)
doc <- "http://www.ssa.gov/policy/docs/statcomps/oasdi_sc/2006/md.html"
body <- getNodeSet(htmlParse(doc), "//div[@id='table4']/table/tbody[2]")[[1]]
readHTMLTable(body)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
1 Allegany 16,515 10,060 1,060 120 1,955 560 2,225 75 460 4,835 7,050
2 Anne Arundel 69,550 47,150 3,245 475 6,380 2,750 7,760 100 1,690 21,900 29,260
3 Baltimore 136,035 91,755 6,040 1,200 13,470 5,545 14,695 175 3,155 41,480 61,215
4 Calvert 10,655 7,035 450 95 1,035 520 1,195 20 305 3,225 4,330
5 Caroline 5,955 3,835 180 70 575 245 835 10 205 1,760 2,355
6 Carroll 24,835 17,205 1,030 160 2,270 825 2,675 30 640 7,635 10,880
7 Cecil 15,030 8,870 630 140 1,415 725 2,435 55 760 4,210 5,330
8 Charles 14,925 9,305 625 135 1,320 950 2,040 20 530 4,275 5,625
9 Dorchester 6,980 4,860 165 65 550 265 895 10 170 2,105 2,875
10 Frederick 28,270 18,950 1,300 225 2,585 1,025 3,205 40 940 8,550 12,080
11 Garrett 6,300 3,760 435 45 780 225 855 40 160 1,910 2,485
12 Harford 34,695 23,020 1,540 235 3,330 1,365 4,140 60 1,005 10,540 14,330
13 Howard 26,855 18,825 1,150 260 2,085 1,275 2,555 25 680 8,595 11,330
14 Kent 5,385 3,865 280 40 485 125 500 5 85 1,815 2,385
15 Montgomery 105,195 76,640 6,085 1,105 8,810 3,105 7,615 80 1,755 35,725 50,710
16 Prince George's 84,190 53,900 2,770 1,025 6,370 5,815 11,420 75 2,815 24,310 32,780
17 Queen Anne's 7,050 5,030 310 50 545 225 695 15 180 2,395 2,825
18 St. Mary's 11,220 7,195 570 95 1,135 520 1,380 10 315 3,540 4,380
19 Somerset 4,625 3,055 155 55 385 180 665 10 120 1,365 1,830
20 Talbot 9,260 6,910 485 70 780 170 695 5 145 3,255 4,105
21 Washington 25,385 16,440 1,245 225 2,500 900 3,290 65 720 7,585 10,595
22 Wicomico 16,040 10,700 490 140 1,300 690 2,205 35 480 4,680 6,480
23 Worcester 13,365 10,235 440 70 965 275 1,130 20 230 4,605 5,765
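If the goal is really a few hundred of these tables, a small helper along the same lines may be useful. This is only a sketch: the URL pattern and the fallback behaviour are assumptions extrapolated from the two URLs in the question, and read_table4 is a hypothetical name.
library(XML)

## Hypothetical helper: fetch table 4 for a given state code and year,
## preferring the second <tbody> when it exists and falling back to
## readHTMLTable on the whole page otherwise.
read_table4 <- function(state, year) {
  url  <- sprintf("http://www.ssa.gov/policy/docs/statcomps/oasdi_sc/%d/%s.html",
                  year, state)
  doc  <- htmlParse(url)
  node <- getNodeSet(doc, "//div[@id='table4']/table/tbody[2]")
  if (length(node) > 0) {
    readHTMLTable(node[[1]])
  } else {
    readHTMLTable(doc, header = FALSE)[[1]]
  }
}

## e.g. md2005 <- read_table4("md", 2005)
##      md2006 <- read_table4("md", 2006)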

sample function in R does not produce a uniformly distributed sample

I am creating a survey. There are 31 possible questions, and I would like each respondent to answer a subset of 3. I would like the questions to be administered in a random order, and participants should not answer the same question twice.
I have created a table matrix with a participant index, and a column for the question indices for the 1st, 2nd and 3rd questions.
Using the code below, index 31 is under-represented in my sample.
I think I am using the sample function incorrectly. I was hoping someone could please help me.
SgPassCode <- data.frame(PassCode = rep(0, 10000), QIndex1 = rep(0, 10000),
                         QIndex2 = rep(0, 10000), QIndex3 = rep(0, 10000))
set.seed(123)
for (n in 1:10000) {
  temp <- sample(31, 3, FALSE)
  SgPassCode[n, 1] <- n
  SgPassCode[n, -1] <- temp
}
d <- c(SgPassCode[,2],SgPassCode[,3],SgPassCode[,4])
hist(d)
The issue is with hist and the way it picks its bins, not sample. Proof is the output of table:
table(d)
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
# 1003 967 938 958 989 969 988 956 983 990 921 1001 982 1016 1013 959
# 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
# 907 918 918 991 931 945 998 1017 1029 980 959 886 947 987 954
If you want hist to "work", hist(d, breaks = 0:31) (and certainly a lot of other things) will work.
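As a small illustration (not part of the original answer), a bar plot of the tabulated counts avoids hist's binning entirely:
## one bar per question index, so 30 and 31 can no longer share a bin
barplot(table(d), xlab = "question index", ylab = "count")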
