How to cancel a bias and analyse the data? - r

I have a data table like the one below. I would like to know which type of substrate ("Litières", "Branchages" or "Racines") contributes the most to each score.
In R:
Substrate<-c('Litières','Litières','Racines','Branchages','Branchages','Litières','Branchages','Litières','Litières' )
One<-c(0,22,216,36,288,351,28,12,0)
Two<-c(574,755,1248,504,882,810,431,537,56)
Three<-c(1352,1248,706,1476,846,855,1334,1152,1628)
Four<-c(261,162,17,171,171,171,394,486,503)
x<-data.frame(Substrate,One,Two,Three,Four)
or as a table:
Substrate    One   Two  Three  Four
Litières       0   574   1352   261
Litières      22   755   1248   162
Racines      216  1248    706    17
Branchages    36   504   1476   171
Branchages   288   882    846   171
Litières     351   810    855   171
Branchages    28   431   1334   394
Litières      12   537   1152   486
Litières       0    56   1628   503
However, you will notice that the number of samples is not the same for each type of substrate. How can I cancel this bias?
Thanks!
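One common way to keep the unequal group sizes from dominating the comparison is to compare per-substrate means rather than totals. The sketch below is only an illustration under that assumption; the names means and shares are introduced here and are not part of the question. Note that "Racines" has a single sample, so its mean carries little weight as an estimate.

```r
Substrate <- c('Litières','Litières','Racines','Branchages','Branchages',
               'Litières','Branchages','Litières','Litières')
One   <- c(0, 22, 216, 36, 288, 351, 28, 12, 0)
Two   <- c(574, 755, 1248, 504, 882, 810, 431, 537, 56)
Three <- c(1352, 1248, 706, 1476, 846, 855, 1334, 1152, 1628)
Four  <- c(261, 162, 17, 171, 171, 171, 394, 486, 503)
x <- data.frame(Substrate, One, Two, Three, Four)

# How many samples each substrate has (the source of the imbalance)
table(x$Substrate)

# Mean score per substrate: dividing by the group size removes the
# advantage a substrate would get in raw totals just from having more rows
means <- aggregate(cbind(One, Two, Three, Four) ~ Substrate, data = x, FUN = mean)

# Share of each substrate within each score, based on the means
shares <- means
shares[-1] <- lapply(means[-1], function(col) col / sum(col))

means
shares
```

Each score column of shares sums to 1, so the largest entry in a column points to the substrate with the highest average for that score.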

Related

ADF test in R and Gretl - Why are the results different?

I am working on a time-series study of the Czech Republic, using macroeconomic data from 1993 to 2021. I tested my time series for stationarity using both R (function adfTest from package fUnitRoots) and Gretl. The results differ substantially: for example, the first differences of GDP are strongly stationary according to Gretl but nonstationary according to R. Both the test statistics and the p-values differ. Do you have any idea why that is, and which result is correct?
The test statistic for the differences (I used the "constant" version and 3 lags, as recommended by R):
According to R: -1.8587
According to Gretl: -4.27469
The p-values:
According to R: 0.3727
According to Gretl: 0.0004865
I am also enclosing the data
Year;GDP_(CZKm)
1993;1 205 330
1994;1 375 851
1995;1 596 306
1996;1 829 255
1997;1 971 024
1998;2 156 624
1999;2 252 983
2000;2 386 289
2001;2 579 126
2002;2 690 982
2003;2 823 452
2004;3 079 207
2005;3 285 601
2006;3 530 881
2007;3 859 533
2008;4 042 860
2009;3 954 320
2010;3 992 870
2011;4 062 323
2012;4 088 912
2013;4 142 811
2014;4 345 766
2015;4 625 378
2016;4 796 873
2017;5 110 743
2018;5 410 761
2019;5 791 498
2020;5 709 131
2021;6 108 717
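One way to see which of the two statistics is reproducible is to run the ADF regression by hand in base R. The sketch below is not the code of adfTest or of Gretl; adf_stat is a hypothetical helper written for this illustration. It regresses the change of the series on a constant, the lagged level and the lagged changes, and returns the t-statistic on the lagged level. If the hand-computed value matches one program but not the other, the other is probably using a different lag order, a different deterministic term, or a different input series (for example levels instead of differences).

```r
# GDP in CZK millions, 1993-2021, copied from the question
gdp <- c(1205330, 1375851, 1596306, 1829255, 1971024, 2156624, 2252983,
         2386289, 2579126, 2690982, 2823452, 3079207, 3285601, 3530881,
         3859533, 4042860, 3954320, 3992870, 4062323, 4088912, 4142811,
         4345766, 4625378, 4796873, 5110743, 5410761, 5791498, 5709131,
         6108717)

# The series under test: first differences of GDP
dgdp <- diff(gdp)

# Hypothetical helper: ADF t-statistic with a constant and `lags` lagged changes
adf_stat <- function(x, lags) {
  k   <- lags + 1
  dx  <- diff(x)               # changes of the tested series
  n   <- length(dx)
  z   <- embed(dx, k)          # column 1: current change, columns 2..k: lagged changes
  dep <- z[, 1]
  lev <- x[k:n]                # level of the series one period earlier
  fit <- if (lags > 0) lm(dep ~ lev + z[, 2:k, drop = FALSE]) else lm(dep ~ lev)
  summary(fit)$coefficients["lev", "t value"]
}

stat3 <- adf_stat(dgdp, lags = 3)
# Statistics for several lag orders, to see how sensitive the result is
stats_by_lag <- sapply(0:3, function(l) adf_stat(dgdp, l))
stat3
stats_by_lag
```

With only 28 differenced observations the statistic can move a lot with the lag order, which is worth checking before comparing p-values across programs.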

How to calculate Williams %R in RStudio?

I am trying to write a function to calculate Williams %R on data in R. Here is my code:
getSymbols('AMD', src = 'yahoo', from = '2018-01-01')
wr = function(high, low, close, n) {
  highh = runMax(high, n)
  lowl = runMin(low, n)
  -100 * ((highh - close) / (highh - lowl))
}
williampr = wr(AMD$AMD.High, AMD$AMD.Low, AMD$AMD.Close, n = 10)
After implementing a buy/sell/hold signal, it returns integer(0):
## 1 = BUY, 0 = HOLD, -1 = SELL
## implement Lag to shift the time back to the previous day
tradingSignal = Lag(
  ## if wpr is greater than 0.8, BUY
  ifelse(Lag(williampr) > 0.8 & williampr < 0.8, 1,
    ## if wpr signal is less than 0.2, SELL, else, HOLD
    ifelse(Lag(williampr) > 0.2 & williampr < 0.2, -1, 0)))
## make all missing values equal to 0
tradingSignal[is.na(tradingSignal)] = 0
## see how many SELL signals we have
which(tradingSignal == "-1")
What am I doing wrong?
It would have been a good idea to identify that you were using the package quantmod in your question.
There are two things preventing this from working.
You didn't inspect what you expected! First, your results in williampr are all negative, so they can never be greater than 0.8 or 0.2. Second, you multiplied the values by 100, so 80% is 80, not 0.8. I removed the -100 * factor.
I have done the same thing so many times.
wr = function(high, low, close, n) {
  highh = runMax(high, n)
  lowl = runMin(low, n)
  ((highh - close) / (highh - lowl))
}
That's it. It works now.
which(tradingSignal == "-1")
# [1] 13 15 19 22 39 71 73 84 87 104 112 130 134 136 144 146 151 156 161 171 175
# [22] 179 217 230 255 268 288 305 307 316 346 358 380 386 404 449 458 463 468 488 492 494
# [43] 505 510 515 531 561 563 570 572 574 594 601 614 635 642 644 646 649 666 668 672 691
# [64] 696 698 719 729 733 739 746 784 807 819 828 856 861 872 877 896 900 922 940 954 968
# [85] 972 978 984 986 1004 1035 1048 1060
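An alternative to removing the -100 factor is to keep the conventional scale, where Williams %R runs from -100 to 0, and move the thresholds to -80 and -20 instead. The sketch below illustrates this on synthetic prices in base R only. The helper roll_apply is hypothetical and stands in for runMax/runMin, and the random prices are not AMD data.

```r
# Hypothetical rolling-window helper (stand-in for TTR's runMax / runMin)
roll_apply <- function(v, n, f) {
  out <- rep(NA_real_, length(v))
  for (i in n:length(v)) out[i] <- f(v[(i - n + 1):i])
  out
}

# Williams %R on the conventional scale: values lie between -100 and 0
wr <- function(high, low, close, n) {
  hh <- roll_apply(high, n, max)
  ll <- roll_apply(low, n, min)
  -100 * (hh - close) / (hh - ll)
}

# Synthetic prices, for illustration only
set.seed(1)
close <- 50 + cumsum(rnorm(200))
high  <- close + runif(200, 0, 1)
low   <- close - runif(200, 0, 1)

w    <- wr(high, low, close, n = 10)
prev <- c(NA, head(w, -1))      # previous day's value

# Same crossings as in the question, expressed on the -100..0 scale:
# 0.8 on the positive scale corresponds to -80, and 0.2 to -20
signal <- ifelse(prev < -80 & w > -80, 1,
           ifelse(prev < -20 & w > -20, -1, 0))
signal[is.na(signal)] <- 0

range(w, na.rm = TRUE)
table(signal)
```

Printing range(w, na.rm = TRUE) before choosing thresholds is the quick check that would have exposed the scale mismatch.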

Divide paired matching columns

I have a data.frame df with matching columns that are also paired. The matching columns are defined in the list patient. I would like to divide the matching columns by each other. Any suggestions on how to do this?
I tried this, but it does not take the pairing from patient into account.
m1 <- m1[sort(colnames(df))]
m1_g <- m1[,grep("^n",colnames(df))]
m1_r <- m1[,grep("^t",colnames(df))]
m1_new <- m1_g/m1_r
m1_new
head(df)
na-008 ta-008 nc012 tb012 na020 na-018 ta-018 na020 tc020 tc093 nc093
hsa-let-7b-5p_TGAGGTAGTAGGTTGTGT 56 311 137 242 23 96 113 106 41 114
hsa-let-7b-5p_TGAGGTAGTAGGTTGTGTGG 208 656 350 713 49 476 183 246 157 306
hsa-let-7b-5p_TGAGGTAGTAGGTTGTGTGGT 631 1978 1531 2470 216 1906 732 850 665 909
hsa-let-7b-5p_TGAGGTAGTAGGTTGTGTGGTT 2760 8159 6067 9367 622 4228 2931 3031 2895 2974
hsa-let-7b-5p_TGAGGTAGTAGGTTGTGTGGTTT 1698 4105 3737 3729 219 1510 1697 1643 1527 1536
> head(patient)
$`008`
[1] "na-008" "ta-008"
$`012`
[1] "nc012" "tb012"
$`018`
[1] "na-018" "ta-018"
$`020`
[1] "na020" "tc020"
$`045`
[1] "nb045" "tc045"
$`080`
[1] "nb-080" "ta-080"
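One possible approach is to loop over the patient list and divide the two columns named in each entry. The sketch below uses a small toy data frame, not the real data, and assumes that each entry lists the "n" column first and the "t" column second, as in the output above. The names present and ratios are introduced here for illustration.

```r
# Toy data: column names follow the pattern in the question;
# check.names = FALSE keeps names such as "na-008" unchanged
df <- data.frame("na-008" = c(56, 208), "ta-008" = c(311, 656),
                 "nc012"  = c(137, 350), "tb012"  = c(242, 713),
                 check.names = FALSE)

patient <- list("008" = c("na-008", "ta-008"),
                "012" = c("nc012", "tb012"),
                "045" = c("nb045", "tc045"))   # this pair is absent from df

# Keep only the pairs whose two columns both exist in df
present <- Filter(function(p) all(p %in% colnames(df)), patient)

# One column of ratios per patient: first element divided by second
ratios <- sapply(present, function(p) df[[p[1]]] / df[[p[2]]])
rownames(ratios) <- rownames(df)
ratios
```

The result is a matrix with one column per patient ID, so the pairing comes from patient rather than from the column order of df.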

interpreting dates from Auto arima model

The following is my code,
auto<-auto.arima(x)
auto_for<-forecast(auto,h=30)
> auto_for$x
Time Series:
Start = 1
End = 74
Frequency = 1
[1] 151 151 151 151 151 219 465 465 465 465 465 743 743 743 743 743 743 743 743 743 743 743
[23] 743 743 743 743 743 743 743 829 829 829 829 829 829 1004 1004 1004 1424 1424 1424 1822 1941 1941
[45] 1941 1941 1941 1941 1941 2076 2076 2252 2252 2252 2252 2252 2252 2252 2252 2252 2252 2252 2252 2940 2940 2940
[67] 2940 2940 3134 3134 3134 3207 3207 3465
> auto_for
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
75 3510.397 3359.577 3661.217 3279.738 3741.056
76 3555.795 3342.503 3769.086 3229.594 3881.996
77 3601.192 3339.964 3862.419 3201.679 4000.705
78 3646.589 3344.949 3948.229 3185.271 4107.907
79 3691.986 3354.743 4029.230 3176.217 4207.755
80 3737.384 3367.952 4106.815 3172.387 4302.380
81 3782.781 3383.749 4181.812 3172.515 4393.047
82 3828.178 3401.595 4254.761 3175.776 4480.580
83 3873.575 3421.116 4326.035 3181.599 4565.552
84 3918.973 3442.039 4395.907 3189.565 4648.380
85 3964.370 3464.157 4464.582 3199.361 4729.379
86 4009.767 3487.312 4532.222 3210.741 4808.793
87 4055.164 3511.376 4598.953 3223.512 4886.817
88 4100.562 3536.246 4664.878 3237.515 4963.608
89 4145.959 3561.836 4730.081 3252.621 5039.297
90 4191.356 3588.077 4794.635 3268.720 5113.992
91 4236.753 3614.908 4858.599 3285.722 5187.785
I have the forecast values, but I am not able to get the dates from the model. The dates are not shown in the graph either: the axis runs from 0 to 91 instead of showing my actual dates. My input was originally an xts object.
Update:
> a<-ts(ana)
> a
Time Series:
Start = 1
End = 68
Frequency = 1
final.day final.cumsum135
1 16535 318
2 16536 318
3 16537 318
4 16538 318
5 16539 318
6 16540 318
7 16541 318
8 16542 318
9 16543 318
10 16544 318
11 16545 318
12 16546 318
13 16547 318
14 16548 318
15 16549 318
16 16550 318
17 16551 318
18 16552 318
19 16553 318
20 16554 318
21 16555 318
22 16556 318
23 16557 318
24 16558 318
25 16559 318
26 16560 369
27 16561 369
28 16562 369
29 16563 369
30 16564 369
31 16565 369
32 16566 369
33 16567 369
34 16568 369
35 16569 369
> auto<-arima(a)
Error in arima(a) : only implemented for univariate time series
Is there any way I can get back the dates here?
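With daily series, the fitted values and forecasts sometimes "lose" their dates. You can restore the dates by hand, using index():
library(forecast)  # auto.arima(), forecast()
library(xts)       # index(), xts()
y <- x  # x is your xts series
n <- length(y)
model_a1 <- auto.arima(y)
# the plot: draw against the position, then label the axis with the real dates
ticks <- round(seq(1, n, length.out = 20))
plot(x = 1:n, y = as.numeric(y), xaxt = "n", xlab = "")
axis(1, at = ticks, labels = format(index(y)[ticks]), las = 2, cex.axis = .5)
lines(as.numeric(fitted(model_a1)), col = 2)
# the forecast, indexed by the 30 days that follow the last observed date
auto_for <- forecast(model_a1, h = 30)
fcs <- xts(as.numeric(auto_for$mean),
           seq(as.Date(index(y)[n]) + 1, by = "day", length.out = 30))
fcs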
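The "only implemented for univariate time series" error in the update most likely comes from ts(ana) having two columns (final.day and final.cumsum135). A minimal base-R sketch of one way around it: fit the model on the value column only and rebuild the dates from final.day. It assumes final.day counts days since 1970-01-01, which is how R stores Date values but is not confirmed in the question, and it uses a fixed ARIMA(0,1,0) purely as a placeholder model.

```r
# Data copied from the update: 25 days at 318, then 10 days at 369
ana <- data.frame(final.day = 16535:16569,
                  final.cumsum135 = c(rep(318, 25), rep(369, 10)))

# Assumption: final.day is a day count since 1970-01-01
dates <- as.Date(ana$final.day, origin = "1970-01-01")

# Fit on the single value column, not on the two-column object
fit <- arima(ana$final.cumsum135, order = c(0, 1, 0))   # placeholder model

# Forecast 5 steps ahead and attach the calendar dates that follow the data
h  <- 5
fc <- predict(fit, n.ahead = h)
fc_dates <- seq(max(dates) + 1, by = "day", length.out = h)
data.frame(date = fc_dates, forecast = as.numeric(fc$pred))
```

The same idea applies with auto.arima(): pass only the value column to the model and keep the dates in a separate vector for labelling.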

How to call a variable in loops of R? (create arrays as dictionary)

I'd like to define a series of variables in a for loop, creating named vectors that work like dictionaries (converting tops to d1 as shown below).
First, I assign values to them (d1 to d11);
then I try to set the names of these variables.
How should I refer to a specific variable inside names() so that it works like "names(d1) <- ..."?
for (i in 1:11)
{
  assign(paste("d",i,sep=""),tops[,2*i])
  names(eval(parse(text=paste("d",i,sep=""))))<-tops[,2*i-1]
}
> tops[,c(1,2)]
V1 V2 V3 V4 V5 V6
1 shift 2136 shift 2211 shift 2324
2 bed 1463 k 1551 plant 1664
3 run 1338 bed 1527 run 1466
4 plant 1309 run 1504 k 1456
5 k 1294 hr 1484 bed 1390
6 hr 1285 clean 1464 hr 1366
7 check 1255 plant 1386 clean 1359
8 clean 1203 check 1261 s 1254
9 s 1052 s 1205 check 1048
10 unload 1024 start 1115 end 1028
11 chang 1023 fine 1113 fine 1020
12 fine 960 chang 1104 start 1006
13 end 924 end 1050 chang 977
14 start 905 stop 974 stop 950
15 pellet 878 pellet 915 pellet 897
16 work 866 work 907 remov 874
17 due 856 screen 900 sinter 862
18 stop 853 bwr 888 side 841
19 complet 772 side 888 due 809
20 remov 750 due 861 conveyor 792
21 requir 726 complet 841 work 777
22 sinter 711 sinter 834 north 771
23 south 710 conveyor 775 south 760
24 side 688 north 768 west 738
25 issu 682 remov 764 belt 737
26 t 675 ok 759 carri 735
27 belt 672 t 753 screen 727
28 carri 668 requir 750 stock 725
29 strand 649 unload 749 unload 719
30 conveyor 646 chute 747 chute 688
> d1
shift bed run plant k hr check clean s
2136 1463 1338 1309 1294 1285 1255 1203 1052
unload chang fine end start pellet work due stop
1024 1023 960 924 905 878 866 856 853
complet remov requir sinter south side issu t belt
772 750 726 711 710 688 682 675 672
carri strand conveyor
668 649 646
> length(d1)
[1] 30
I hope this is clear. If not, please feel free to ask.
As David mentioned, don't assign 11 different variables; create a list with 11 elements instead. This will simplify your code considerably.
d <- lapply(1:11, function(i) setNames(tops[, 2 * i], tops[, 2 * i - 1]))
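To make the list approach concrete, here is a small self-contained sketch on a toy version of tops with only two word/count pairs. The values are copied from the first rows of the table in the question; the element names d1 and d2 are added here for convenience and are an assumption, not part of the answer.

```r
# Toy table: odd columns hold the words, even columns hold the counts
tops <- data.frame(V1 = c("shift", "bed", "run"), V2 = c(2136, 1463, 1338),
                   V3 = c("shift", "k", "bed"),   V4 = c(2211, 1551, 1527),
                   stringsAsFactors = FALSE)

n_pairs <- ncol(tops) / 2

# One named vector per pair: counts from column 2i, names from column 2i - 1
d <- lapply(seq_len(n_pairs),
            function(i) setNames(tops[, 2 * i], tops[, 2 * i - 1]))
names(d) <- paste0("d", seq_len(n_pairs))

d$d1              # named vector, usable like a dictionary
d$d1[["bed"]]     # look up one word in the first pair
d[["d2"]][["k"]]  # look up one word in the second pair
```

Individual vectors are then reached as d$d1 or d[[1]], which avoids assign() and eval(parse()) entirely.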
