Iterating a function over all pairs of variables and getting each result - R

I created the following function to determine the lag of two variables.
However, this function takes only two series as input, and I would like to run it over my whole dataset:
datSel <- structure(list(stat.resProp.Dwell.4 = c(0.000887705, 0.007954085,
-0.025859667, 0.024097552, 0.114052787, 0.023329207, 0.042143181,
-0.092587287, -0.004050228, -0.001624696, 0.020121403, -0.100502922,
0.057354185, 0.025463388, 0.037409854, 0.001561281, -0.028482938,
-0.004827041, 0.014411779, -0.029034298, 0.021053409, -0.067963182,
0.032070259, -0.038091783, 0.039751534, 0.027802281, -0.027802281,
-0.013355791, 0.009201236, -0.073403679, 0.021277398, -0.033901552,
0.012624153, -0.065733979, 0.032017801, -0.072042665, 0.041936911,
0.002861232, 0.017933468, -0.01698154, 0.006638242, -0.08375153,
-0.007220248, 0.0255507, 0.019980685, 0.013752673, 0.026000502,
-0.021134312, -0.019608471, 0.0166916, -0.021654389, 0.066402455,
0.024828862, -0.083302632, 0.042518482, -0.052439198, 0.037186281,
-0.056311172, -0.012270093), stat.lohn = c(0, -0.007558004, -0.015289567,
0, 0, -0.009609384, -0.019500305, 0, 0, -0.012458015, -0.025391532,
-0.000983501, 0, -0.00165265, -0.003313516, 0.000204576, 0, -0.004898564,
-0.009869709, 0, 0, -0.010574012, -0.021489482, 0, 0, -0.011534651,
-0.023476287, 0, 0, -0.00814845, -0.016498838, 0, 0, -0.0099856,
-0.020275409, -0.002818337, 0, -0.007212389, -0.014582736, 0,
0, -0.004121565, -0.008294445, 0, 0, -0.010766386, -0.021886884,
0, 0, -0.010179741, -0.02067574, 0, 0, -0.011797067, -0.024020039,
-0.002017983, -0.007343864, -0.007398196, -0.014962644), stat.resProp.Dwell.1 = c(0.012777325,
-0.002991775, -0.057819571, -0.00796817, -0.019386714, 0, 0.009740337,
0.005638356, -0.035148694, 0, 0.027084134, -0.160377856, 0.101169235,
-0.043007944, 0.043007944, -0.002580647, -0.015625318, 0.023347364,
0.007662873, -0.09607383, -0.024575906, 0.056733018, -0.000904568,
-0.058703392, 0.011450507, 0.007561473, 0.037879817, -0.032246,
0.042169401, -0.001796946, -0.024580209, -0.148788737, 0.082097362,
-0.000985707, -0.00098668, 0.003940892, -0.049380309, 0.005151995,
0.027371197, -0.025317808, 0.019299736, -0.047382704, -0.010604553,
0.082827084, -0.04516573, 0.003075348, 0.007139245, 0.022111454,
-0.004982571, -0.038701368, 0.018519048, -0.049096021, 0.061254226,
-0.020346582, 0.023363175, -0.00402415, -0.014213437, 0.023245109,
0.027587957), stat.carReg = c(0.022775414, 0.008073857, 0.002624717,
0.169431097, -0.144595366, 0.066716837, -0.086971929, 0.037928208,
0.071752161, -0.046824102, 0.106085873, 0.049965928, -0.057984255,
-0.091650262, 0.090732857, -0.082282389, 0.053376121, -0.044203971,
-0.022855425, 0.025856271, 0.000136493, 0.05579193, -0.293966656,
0.013645739, 0.059732986, 0.187020956, -0.145234848, 0.11041385,
-0.126539687, -0.000949877, 0.031473389, 0.020267816, -0.02180532,
-0.07175183, 0.147500145, -0.040559138, 0.008394819, 0.049045337,
-0.043050615, 0.094358754, -0.058408438, -0.005018402, -0.061717889,
0.100150837, -0.071100417, -0.084393865, 0.002854733, 0.002141389,
-0.026538398, 0.013480513, -0.046002189, -0.030495611, 0.052899746,
0.012842017, 0.064086498, 0.020757573, -0.043441298, -0.009563043,
0.048033848)), .Names = c("stat.resProp.Dwell.4", "stat.lohn",
"stat.resProp.Dwell.1", "stat.carReg"), row.names = c(NA, -59L
), class = "data.frame")
The function and my call to it are:
select.lags<-function(x,y,max.lag=8) {
y<-as.numeric(y)
y.lag<-embed(y,max.lag+1)[,-1,drop=FALSE]
x.lag<-embed(x,max.lag+1)[,-1,drop=FALSE]
t<-tail(seq_along(y),nrow(y.lag))
ms=lapply(1:max.lag,function(i) lm(y[t]~y.lag[,1:i]+x.lag[,1:i]))
pvals<-mapply(function(i) anova(ms[[i]],ms[[i-1]])[2,"Pr(>F)"],max.lag:2)
ind<-which(pvals<0.05)[1]
ftest<-ifelse(is.na(ind),1,max.lag-ind+1)
aic<-as.numeric(lapply(ms,AIC))
bic<-as.numeric(lapply(ms,BIC))
structure(list(ic=cbind(aic=aic,bic=bic),pvals=pvals,
selection=list(aic=which.min(aic),bic=which.min(bic),ftest=ftest)))
}
for (i in length(datSel) ) {
for (y in length(datSel) ) {
d1<-ts(datSel[i])
d2<-ts(datSel[y])
lag <- select.lags(d1,d2,5)
}
}
As output of lag I get:
> lag
$ic
aic bic
[1,] -115.3623 -109.56679
[2,] -114.3370 -106.60972
[3,] -116.2026 -106.54350
[4,] -114.7030 -103.11210
[5,] -112.7153 -99.19253
[6,] -110.8018 -95.34721
[7,] -110.0812 -92.69477
[8,] -110.1427 -90.82446
$pvals
[1] 0.1952302 0.3017934 0.7858944 0.9176337 0.5040079 0.0604511 0.3406657
$selection
$selection$aic
[1] 3
$selection$bic
[1] 1
$selection$ftest
[1] 1
As you can see, I get only 8 results back; however, my data.frame has 20 variables.
Any recommendation on what I am doing wrong?
I appreciate your replies!

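If you want to, e.g., store the result of the AIC criterion for every pair of variables:
lag.store.aic <- matrix(NA, ncol(datSel), ncol(datSel),
dimnames = list(names(datSel), names(datSel)))
for (i in 1:length(datSel) ) {
for (y in 1:length(datSel) ) {
d1<-ts(datSel[,i])
d2<-ts(datSel[,y])
lag <- select.lags(d1,d2,5)
lag.store.aic[i,y] = lag$selection$aic  # rows: first series (x), columns: second series (y)
}
}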
You get 8 values in $ic because max.lag is 8; it has nothing to do with the number of variables.
Please also note that I added commas when indexing by variable (datSel[, i] returns a plain vector instead of a one-column data frame), and that you have to loop over 1:length(datSel): for (i in length(datSel)) runs only once, with i equal to the last column index, so you only catch the last variable.
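To keep every result rather than overwriting lag on each pass, one option is to store each select.lags() output in a named list. The sketch below is self-contained: it repeats select.lags() from the question (lightly reformatted) and, as an assumption, replaces the real datSel with simulated data of the same shape (59 rows, 4 columns). The names lag.results and aic.choice are hypothetical.

```r
# select.lags() as defined in the question, reformatted only
select.lags <- function(x, y, max.lag = 8) {
  y <- as.numeric(y)
  y.lag <- embed(y, max.lag + 1)[, -1, drop = FALSE]
  x.lag <- embed(x, max.lag + 1)[, -1, drop = FALSE]
  t <- tail(seq_along(y), nrow(y.lag))
  ms <- lapply(1:max.lag, function(i) lm(y[t] ~ y.lag[, 1:i] + x.lag[, 1:i]))
  pvals <- mapply(function(i) anova(ms[[i]], ms[[i - 1]])[2, "Pr(>F)"], max.lag:2)
  ind <- which(pvals < 0.05)[1]
  ftest <- ifelse(is.na(ind), 1, max.lag - ind + 1)
  aic <- as.numeric(lapply(ms, AIC))
  bic <- as.numeric(lapply(ms, BIC))
  list(ic = cbind(aic = aic, bic = bic), pvals = pvals,
       selection = list(aic = which.min(aic), bic = which.min(bic), ftest = ftest))
}

# Simulated stand-in for the question's datSel (assumption: same shape and names)
set.seed(123)
datSel <- as.data.frame(matrix(rnorm(59 * 4), ncol = 4,
  dimnames = list(NULL, c("stat.resProp.Dwell.4", "stat.lohn",
                          "stat.resProp.Dwell.1", "stat.carReg"))))

vars <- names(datSel)
lag.results <- list()  # one full select.lags() result per ordered pair of variables
for (i in vars) {
  for (y in vars) {
    lag.results[[paste(i, y, sep = " -> ")]] <-
      select.lags(ts(datSel[, i]), ts(datSel[, y]), 5)
  }
}

# Pull one criterion out of every stored result, e.g. the lag chosen by AIC
aic.choice <- sapply(lag.results, function(res) res$selection$aic)
aic.choice
```

Because the whole result is kept, the BIC choice, the F-test choice or the full $ic table for any pair can be extracted later with the same sapply() pattern.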

Related

Missing value error in an if() condition in R

I want to perform log normalisation on my data, and since some entries in my data frame are 0.0000, I want to replace them with a very small value, on the order of 1e-7, so that after the log transformation I don't get -Inf as the stored value.
I'm running the following code in my console:
for(i in 1:nrow(genes_rpkm_rep_colN))
{
for(j in 1:ncol(genes_rpkm_rep_colN))
{
if(genes_rpkm_rep_colN[i,j] == 0.0000000){
genes_rpkm_rep_colN[i,j] <- 1e-7
}
}
}
I'm encountering the following error while running this piece of code:
Error in if (genes_rpkm_rep_colN[i, j] == 0) { :
missing value where TRUE/FALSE needed
I've put a TRUE/FALSE condition in the if() statement, yet I still get the error.
I'm sharing a small piece of my data below so that you can check that the data itself isn't causing the error.
> dput(genes_rpkm_rep_colN[1:10,1:30])
structure(list(X42MGBA_CENTRAL_NERVOUS_SYSTEM = c(0.0093774,
3.99494, 0.0208305, 0.0065619, 0.0084466, 0.0085095, 0.0174268,
0.0233318, 0.0530461, 0.0699613), X8MGBA_CENTRAL_NERVOUS_SYSTEM = c(0,
4.6815, 0.0188461, 0.0118735, 0.0152838, 0.0230965, 0.0157667,
0.0070364, 0.0319951, 0.101274), A1207_CENTRAL_NERVOUS_SYSTEM = c(0.0432576,
2.96619, 0.0137272, 0.0259454, 0, 0.0336463, 0.0114842, 0, 0.0553488,
7.44429), A172_CENTRAL_NERVOUS_SYSTEM = c(0.0194699, 2.92748,
0.0216248, 0.0272483, 0, 0.0176679, 0.0180913, 0.0080738, 0.0665414,
0.0387354), AM38_CENTRAL_NERVOUS_SYSTEM = c(0.0115334, 2.69758,
0.0085399, 0.0322822, 0.0069257, 0, 0.0357226, 0.0063769, 0.0471195,
0.271525), CAS1_CENTRAL_NERVOUS_SYSTEM = c(0.10065, 4.8228, 0.0958194,
0.0469533, 0.0518052, 0.069588, 0.0979765, 0.0556501, 0.117486,
0.147798), CCFSTTG1_CENTRAL_NERVOUS_SYSTEM = c(0.0440228, 6.04641,
0.019558, 0.0246441, 0.0158612, 0.0079897, 0.0163623, 0.0073022,
0.0601819, 0.118238), CH157MN_CENTRAL_NERVOUS_SYSTEM = c(0.0120244,
3.41429, 0.0053421, 0.0235595, 0.0173293, 0.0043646, 0.0044692,
0.0139616, 0.0408118, 0.181811), D283MED_CENTRAL_NERVOUS_SYSTEM = c(0.0638066,
5.12254, 0.0250124, 0.057781, 0.0135231, 0.0272476, 0.0279006,
0.0124515, 0.0583877, 0.343494), D341MED_CENTRAL_NERVOUS_SYSTEM = c(0.0418829,
4.97037, 0.0348888, 0.0219808, 0.0377255, 0.0380065, 0.058376,
0.0217101, 0.0937822, 1.3228), DAOY_CENTRAL_NERVOUS_SYSTEM = c(0.0277923,
4.16543, 0.051447, 0.0194477, 0.016689, 0.0336267, 0.0602569,
0.0460997, 0.0633229, 0.317934), DBTRG05MG_CENTRAL_NERVOUS_SYSTEM = c(0.062215,
4.22423, 0.0307115, 0.0580469, 0.0622661, 0.012546, 0.0128466,
0.0171996, 0.72017, 0.192542), DKMG_CENTRAL_NERVOUS_SYSTEM = c(0.0061458,
2.58862, 0.0546082, 0.0086011, 0.0332147, 0.0446161, 0.0571067,
0.0866511, 0.0985031, 0.128385), GAMG_CENTRAL_NERVOUS_SYSTEM = c(0.0638691,
4.18606, 0.023646, 0.0595902, 0.0095882, 0.0676175, 0.0296734,
0.0264853, 0.0953419, 1.13302), GB1_CENTRAL_NERVOUS_SYSTEM = c(0.0332071,
4.09682, 0.0122941, 0.0232368, 0.0199406, 0.0100446, 0.0205706,
0.036721, 0.15393, 8.77573), GI1_CENTRAL_NERVOUS_SYSTEM = c(0.0236971,
2.99664, 0.0315838, 0.0132657, 0.008538, 0.0344062, 0.0528461,
0.0196535, 0.0826642, 0.132007), GMS10_CENTRAL_NERVOUS_SYSTEM = c(0.112392,
3.29799, 0, 0.0058257, 0.007499, 0.0151096, 0.0232076, 0.0069047,
0.0392457, 0.0786757), GOS3_CENTRAL_NERVOUS_SYSTEM = c(0.0785394,
3.06583, 0.0793018, 0.0349735, 0.0128625, 0.0194374, 0.0464408,
0.0207256, 0.149777, 0.205972), H4_CENTRAL_NERVOUS_SYSTEM = c(0.0412065,
5.11983, 0.0416065, 0.0209705, 0.0337421, 0.0543895, 0.0417697,
0.018641, 0.0953581, 0.432261), HS683_CENTRAL_NERVOUS_SYSTEM = c(0.0395662,
4.82034, 0.0087891, 0.016612, 0.0285111, 0, 0.0294118, 0.0164074,
0.0708759, 0.240087), IOMMLEE_CENTRAL_NERVOUS_SYSTEM = c(0.0089568,
3.07764, 0, 0.0188027, 0.0080677, 0.0406391, 0.0083226, 0.0037142,
0.0295557, 0.178196), KALS1_CENTRAL_NERVOUS_SYSTEM = c(0.0212606,
3.22541, 0.0094454, 0.0059509, 0.0076601, 0.0154343, 0.0790207,
0.0105796, 0.0440979, 0.135353), KG1C_CENTRAL_NERVOUS_SYSTEM = c(0.0306739,
3.25635, 0.0292018, 0.0674589, 0.007894, 0.0397642, 0.0814343,
0.0036343, 0.107415, 0.248463), KNS42_CENTRAL_NERVOUS_SYSTEM = c(0.0377038,
2.77745, 0.0598239, 0.0075381, 0.0097032, 0, 0, 0.0044672, 0.0660162,
0.128592), KNS60_CENTRAL_NERVOUS_SYSTEM = c(0.0308664, 2.75686,
0.0571377, 0.0359982, 0, 0.0186731, 0.0095603, 0, 0.0606269,
0.214931), KNS81_CENTRAL_NERVOUS_SYSTEM = c(0.0376095, 4.39526,
0.041772, 0.0328967, 0.0169382, 0.0341286, 0.0349465, 0.003899,
0.0864295, 0.0841772), KS1_CENTRAL_NERVOUS_SYSTEM = c(0.0113846,
1.91478, 0.0252892, 0.0318656, 0.0102545, 0.0413236, 0.0317354,
0.004721, 0.0295168, 0.18686), LN18_CENTRAL_NERVOUS_SYSTEM = c(0.0159147,
4.40237, 0, 0.0371213, 0.0191134, 0.0192557, 0.0197172, 0.0219985,
0.0600177, 0.358841), LN215_CENTRAL_NERVOUS_SYSTEM = c(0.0188976,
6.19285, 0.0209891, 0, 0, 0.0257228, 0.0175595, 0.0274276, 0.05345,
0.422964), LN229_CENTRAL_NERVOUS_SYSTEM = c(0.0042589, 4.66724,
0.0189209, 0.0059603, 0.0153445, 0, 0.0316585, 0.0070643, 0.0602291,
0.169461)), row.names = c("DDX11L1", "WASH7P", "MIR1302-11",
"FAM138A", "OR4G4P", "OR4G11P", "OR4F5", "RP11-34P13.7", "CICP27",
"AL627309.1"), class = "data.frame")
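The error means that the comparison inside if() returned NA, which happens when genes_rpkm_rep_colN[i, j] is NA (NA == 0 is NA, and if() cannot branch on NA). The sample above contains no NA, so the missing values must be elsewhere in the full data. Maybe try this without a loop:
library(dplyr)
# Replace exact zeros with 1e-7, then add a log-transformed copy of each column (named log_<column>)
# (the native |> pipe requires R >= 4.1.0)
genes_rpkm_rep_colN |>
mutate(across(everything(), ~ifelse(.x == 0.0000000, 1e-7, .x)),
across(everything(), ~log(.x), .names = "log_{col}"))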
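If you prefer base R, the same replacement can be done without a loop and without tripping over NA cells. This is a minimal sketch on a hypothetical three-row data frame dat standing in for genes_rpkm_rep_colN; it also shows one way to locate the NA cells that make the original if() fail.

```r
# Hypothetical small data frame with one exact zero per column and one NA
dat <- data.frame(a = c(0, 1.5, NA), b = c(2, 0, 0.25))

# Locate the NA cells (row and column indices) that break if()
which(is.na(dat), arr.ind = TRUE)

# Replace exact zeros with 1e-7; FALSE & NA is FALSE, so the index has no NA
dat[!is.na(dat) & dat == 0] <- 1e-7

# log() applies column-wise to an all-numeric data frame; NA stays NA
log_dat <- log(dat)
log_dat
```

Unlike the if() loop, the vectorized comparison simply leaves NA cells untouched instead of stopping with an error.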

How to conditionally loop over a list in R

I am not experienced with R and need your help with this question. I have a list called result that contains the following items:
result:
$r
0610007P14Rik 0610009B22Rik 0610009O20Rik 0610010F05Rik 0610010K14Rik 0610011F06Rik 0610012G03Rik 0610030E20Rik 0610037L13Rik
0610007P14Rik 0.00000000 -0.66234644 0.047111033 0.09782589 0.145761085 0.084414075 0.05975822 0.10952475 -0.020151257
0610009B22Rik -0.16234644 0.00000000 -0.227292854 -0.05088201 -0.100237074 0.078595470 -0.12782382 -0.05553298 0.012588413
$p
0610007P14Rik 0610009B22Rik 0610009O20Rik 0610010F05Rik 0610010K14Rik 0610011F06Rik 0610012G03Rik 0610030E20Rik 0610037L13Rik
0610007P14Rik 1.0000000 0.04047111 0.6405067 0.3310033 0.1459042 0.4019556 5.534329e-01 0.2760502 8.417704e-01
0610009B22Rik 0.1047111 1.0000000 0.44868459 0.6139574 0.3191458 0.4353240 2.029875e-01 0.5818857 9.007631e-01
What I want to do is loop over this list and print the pairs for which result$r is greater than 0.5 or less than -0.5, and result$p is less than 0.05.
So far I have been able to loop through the list with a simple loop that prints the value at a specific position in each element, but I have not been able to extend it to do what I want:
for (i in 1:length(result)){
print(result[[i]][2])
}
So, based on the example above, the output should look like this, because this is the only pair whose result$r value is less than -0.5 and whose result$p is less than 0.05:
0610007P14Rik,0610009B22Rik
Any help is appreciated - thanks.
dput(result)
list(r = structure(c(0, -0.662346440956915, 0.0471110327396697,
0.0978258929200013, 0.14576108466075, 0.0844140746798007, 0.0597582241031368,
0.10952475161027, -0.0201512568819922, -0.162346440956915, 0,
-0.0272928544838261, -0.0508820105675817, -0.100237073610376,
0.0785954698120888, -0.127823820999628, -0.0555329766448806,
0.0125884127823821, 0.0471110327396697, -0.0272928544838261,
0, -0.0565079080178134, 0.13575892944611, 0.0754843375985575,
0.086120417719783, 0.119119974969681, 0.00175356100237076, 0.0978258929200013,
-0.0508820105675817, -0.0565079080178134, 0, 0.131775445763479,
0.053017452395846, 0.0712198787836846, -0.0643888838089917, 0.112498034323393,
0.14576108466075, -0.100237073610376, 0.13575892944611, 0.131775445763479,
0, 0.00774829446850978, -0.0987186269323458, 0.0147657064131866,
0.0300260585042811, 0.0844140746798007, 0.0785954698120888, 0.0754843375985575,
0.053017452395846, 0.00774829446850978, 0, 0.10393741385522,
0.0236032173803311, -0.0182926697871553, 0.0597582241031368,
-0.127823820999628, 0.086120417719783, 0.0712198787836846, -0.0987186269323458,
0.10393741385522, 0, -0.0287458971316651, -0.378837751523345,
0.10952475161027, -0.0555329766448806, 0.119119974969681, -0.0643888838089917,
0.0147657064131866, 0.0236032173803311, -0.0287458971316651,
0, 0.137186835947971, -0.0201512568819922, 0.0125884127823821,
0.00175356100237076, 0.112498034323393, 0.0300260585042811, -0.0182926697871553,
-0.378837751523345, 0.137186835947971, 0), .Dim = c(9L, 9L), .Dimnames = list(
c("0610007P14Rik", "0610009B22Rik", "0610009O20Rik", "0610010F05Rik",
"0610010K14Rik", "0610011F06Rik", "0610012G03Rik", "0610030E20Rik",
"0610037L13Rik"), c("0610007P14Rik", "0610009B22Rik", "0610009O20Rik",
"0610010F05Rik", "0610010K14Rik", "0610011F06Rik", "0610012G03Rik",
"0610030E20Rik", "0610037L13Rik"))), p = structure(c(1, 0.0404711061993476,
0.640506733726655, 0.331003274745838, 0.145904233294019, 0.401955560275036,
0.553432861540715, 0.276050189693768, 0.841770371470927, 0.104711061993476,
1, 0.786845892450507, 0.613957357859433, 0.319145768537945, 0.4353239557619,
0.202987516780359, 0.581885722125544, 0.900763057605725, 0.640506733726655,
0.786845892450507, 1, 0.575261418533015, 0.17603176034094, 0.453788945686311,
0.392462976283337, 0.23580507604055, 0.986141912945937, 0.331003274745838,
0.613957357859433, 0.575261418533015, 1, 0.189217146197476, 0.599137018214154,
0.479787539192647, 0.523038223686805, 0.263117246150375, 0.145904233294019,
0.319145768537945, 0.17603176034094, 0.189217146197476, 1, 0.93882240484224,
0.326580916483781, 0.88370977100432, 0.766081848734211, 0.401955560275036,
0.4353239557619, 0.453788945686311, 0.599137018214154, 0.93882240484224,
1, 0.3014852587724, 0.815110528089105, 0.85620041392968, 0.553432861540715,
0.202987516780359, 0.392462976283337, 0.479787539192647, 0.326580916483781,
0.3014852587724, 1, 0.775787682483708, 7.84039808365529e-05,
0.276050189693768, 0.581885722125544, 0.23580507604055, 0.523038223686805,
0.88370977100432, 0.815110528089105, 0.775787682483708, 1, 0.17147275199326,
0.841770371470927, 0.900763057605725, 0.986141912945937, 0.263117246150375,
0.766081848734211, 0.85620041392968, 7.84039808365529e-05, 0.17147275199326,
1), .Dim = c(9L, 9L), .Dimnames = list(c("0610007P14Rik", "0610009B22Rik",
"0610009O20Rik", "0610010F05Rik", "0610010K14Rik", "0610011F06Rik",
"0610012G03Rik", "0610030E20Rik", "0610037L13Rik"), c("0610007P14Rik",
"0610009B22Rik", "0610009O20Rik", "0610010F05Rik", "0610010K14Rik",
"0610011F06Rik", "0610012G03Rik", "0610030E20Rik", "0610037L13Rik"
))))
You should not loop through the list: its format is fixed, and you know it contains the elements $r and $p. Instead, you should loop over the rows and columns of the matrices:
for(row in rownames(result$p)){
for(col in colnames(result$p)){
if(abs(result$r[row,col])>.5 & result$p[row,col]<.05){
print(paste(row,col,sep=", "))
}
}
}
The output is:
"0610009B22Rik, 0610007P14Rik"
This is the only correlation that meets your criterion (r = -.66, p = .040).
You can also do the comparison directly, without a loop, since the r and p matrices have the same dimensions. Note the parentheses: & binds more tightly than |, so without them the p-value condition would apply only to the negative correlations.
inds <- which((result$r > 0.5 | result$r < -0.5) & result$p < 0.05, arr.ind = TRUE)
cbind(row = rownames(result$p)[inds[, 1]],
col = colnames(result$p)[inds[, 2]])
# row col
#[1,] "0610009B22Rik" "0610007P14Rik"
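If you also want the r and p values next to each selected pair, the arr.ind indices can be used to index both matrices directly. The sketch below uses hypothetical 3 x 3 matrices r and p (genes g1 to g3) rather than the real result list, and prints the pairs in the comma-separated format asked for in the question.

```r
genes <- c("g1", "g2", "g3")
# Hypothetical correlation and p-value matrices (filled column by column)
r <- matrix(c(0, -0.66, 0.10,
              -0.16, 0, 0.70,
              0.10, 0.70, 0), nrow = 3, dimnames = list(genes, genes))
p <- matrix(c(1, 0.04, 0.50,
              0.10, 1, 0.20,
              0.50, 0.20, 1), nrow = 3, dimnames = list(genes, genes))

# abs() folds "r > 0.5 or r < -0.5" into one condition, so no | is needed
hits <- which(abs(r) > 0.5 & p < 0.05, arr.ind = TRUE)

# A two-column index matrix picks the matching cells out of r and p
out <- data.frame(row = rownames(r)[hits[, "row"]],
                  col = colnames(r)[hits[, "col"]],
                  r = r[hits],
                  p = p[hits])
out

# Comma-separated pairs, as in the desired output
paste(out$row, out$col, sep = ",")
```

If your r and p matrices are symmetric, adding `& lower.tri(r)` to the condition reports each pair only once.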

Is there a way to normalize data with high kurtosis?

I have a vector with a kurtosis of 2.95 (which is pretty high, i.e. leptokurtic). Here is a sample of the data:
x = c(6.819, 8.948, 0, 67.556, -40.785, -18.951, -29.151, 1.008,
0, 18.034, -6.631, 6.294, 0.643, -28.921, 0, -2.133, -44.348,
-87.488, 7.063, 0, -74.428, -16.361, 50.963, -32.431, -82.233,
-26.953, -48.475, 64.043, 0, 1.576, -2.728, -5.9, -63.059, -1.061,
-15.018, -58.119, -32.092, 5.329, -19.968, 38.822, 66.897, 0,
-2.579, 82.696, 42.745, 79.677, 2.522, -11.475, 1.019, 2.719,
-3.634, -7.975, 0, 1.873, 21.732, -10.217, -24.002, -76.049,
35.045, 27.22, -71.366, 16.293, -48.762, 65.481, 66.615, -19.616,
6.016, 59.722, 88.235, 10.1, 0, -4.598, 5.446, 56.909, 0, -24.827,
0, 6.487, 0, 63.315, 28.397, 9.433, 19.085, 0, 6.591, -22.643,
32.235, -12.535, -1.787, 56.157, 68.819, 0, -21.936, 38.695,
-79.006, 24.888, -5.187, 10.368, -68.191, 0, -22.171, -78.783,
-14.119, 54.084, -13.597, 26.669, 0, -18.402, 80.309, -12.652,
1.801, -69.946, -87.67, -19.586, 38.085, -21.031, -36.957, 1.357,
0.17, 47.407, -59.598, 66.125, 10.97, 6.33, -38.837, 1.868, 38.169,
-46.662, -32.255, 25.816, 14.432, -18.57, -0.456, -0.638, 31.07,
72.794, 52.957, 13.858, -18.885, 0, -13.488, 11.689, 1.618, 19.373,
-57.526, 0, -0.655, 36.308, 50.231, 0.048, -80.157, 0, -64.805,
-70.864, 0.813, 52.143, -4.989, 42.166, 7.397, 87.437, -17.897,
-0.877, 68.363, 47.315, -2.181, 2.699, 36.278, 0, -2.924, 71.56,
74.406, -46.071, 56.158, 1.44, 0, 0, 0, -3.233, 37.084, -85.189,
0, -16.137, -84.499, -12.67, -14.117, 0, 23.757, -58.299, -34.956,
0.402, 0, -67.585, -14.314, -73.426, 23.158, 1.782, 0, 4.399,
18.871, -6.929)
Is there a way to normalize this data?
Since the data range from -90 to 90, the normalized data should be in a similar range and should not change vastly, i.e. the range should not be changed to -1 to 1 or -20 to 20, etc.
I have tried using atan(X), 1/x, log(x), and many other transformational techniques but they all tend to increase the skewness. Is there a way to normalize this data without skewing it?
I am sure there must be an easy solution to this.
It may not be what you want, but you can almost always perfectly normalize a distribution (if there are no ties) using a normal scores transformation:
xq <- qnorm(rank(x)/(length(x)+1), mean=mean(x), sd=sd(x))
plot(sort(x),sort(xq))
hist(xq)
qqnorm(xq)
The new range is (-99.2, 99.6) (the old range was +/- 88).
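If you need to change the range, you could do it as follows, where newmin and newmax are the target limits:
newmin <- -90; newmax <- 90  # example target range
xr <- newmin + (newmax - newmin) * as.vector(scale(xq, center = min(xq), scale = diff(range(xq))))
but, as suggested in the comments, this may not actually be the right approach to solve your broader problem.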
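To check what the normal scores transformation does to the tails, it helps to compute kurtosis before and after. This sketch uses a hand-written moment estimator kurt() (plain, non-excess kurtosis, which is about 3 for normal data; whether the 2.95 in the question is plain or excess kurtosis depends on the function that computed it) and a hypothetical heavy-tailed sample drawn from a t distribution in place of the question's vector.

```r
# Plain (non-excess) moment kurtosis: about 3 for normally distributed data
kurt <- function(v) {
  m <- mean(v)
  mean((v - m)^4) / mean((v - m)^2)^2
}

# Normal scores transformation from the answer above, wrapped in a function
normal_scores <- function(v) {
  qnorm(rank(v) / (length(v) + 1), mean = mean(v), sd = sd(v))
}

set.seed(42)
x <- rt(500, df = 3) * 20  # hypothetical heavy-tailed stand-in data
xq <- normal_scores(x)

c(before = kurt(x), after = kurt(xq))
```

Because qnorm() is increasing and the transformation only uses ranks, the ordering of the observations is unchanged; only the spacing between them is altered.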

How to skip an iteration in a loop?

I have a for loop that looks like this:
for (ID in rownames(countDF)) {
avector <- as.vector(as.numeric(countDF2[rownames(countDF2)==ID,]))
nbfit <- fitdistr(avector,'negative binomial')
}
I want to run fitdistr for each ID, but for some IDs the function doesn't work and throws this error:
Error in stats::optim(x = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, :
non-finite finite-difference value [2]
I want to skip these IDs somehow and continue with the others.
I've found the function try, but I don't understand how it works.
I've tried it like this:
nbfir <- try(fitdistr(avector,'negative binomial'))
But the loop still breaks down with the error.
What should I do to fix it?
You could use tryCatch and do nothing when an error is caught:
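library(MASS)  # provides fitdistr()
for (ID in rownames(countDF)) {
avector <- as.vector(as.numeric(countDF2[rownames(countDF2)==ID,]))
# if fitdistr() fails for this ID, the handler returns NULL and the loop continues with the next ID
# (nbfit is overwritten on each pass; store it in a list to keep every fit)
tryCatch(
nbfit <- fitdistr(avector,'negative binomial'),
error = function(e) NULL)
}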
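To keep the successful fits and remember which IDs failed, the value returned by tryCatch can be checked on each pass. In this sketch the count matrix counts and the names fits and failed are hypothetical; the row geneB contains an NA, which makes fitdistr() stop with an error and stands in for the problematic IDs in the question.

```r
library(MASS)  # fitdistr()

set.seed(1)
# Hypothetical count matrix, one row per ID; geneB contains an NA,
# so fitdistr() stops with an error for that row
counts <- rbind(geneA = rnbinom(50, size = 2, mu = 10),
                geneB = c(NA, rnbinom(49, size = 2, mu = 10)),
                geneC = rnbinom(50, size = 5, mu = 30))

fits <- list()          # successful fits, keyed by ID
failed <- character(0)  # IDs whose fit raised an error

for (ID in rownames(counts)) {
  avector <- as.numeric(counts[ID, ])
  # tryCatch returns NULL instead of stopping the loop when fitdistr() errors
  fit <- tryCatch(fitdistr(avector, "negative binomial"),
                  error = function(e) NULL)
  if (is.null(fit)) {
    failed <- c(failed, ID)
    next  # skip to the next ID
  }
  fits[[ID]] <- fit
}

failed
names(fits)
```

The same pattern works with try(): call it with silent = TRUE and test the result with inherits(res, "try-error") before deciding whether to call next.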

Get same output as R console in Java using JRI

When I enter the following commands directly into the R console:
library("xts")
mySeries <- xts(c(1.0, 2.0, 3.0, 5.0, 6.0), order.by=c(ISOdatetime(2001, 1, 1, 0, 0, 0), ISOdatetime(2001, 1, 2, 0, 0, 0), ISOdatetime(2001, 1, 3, 0, 0, 0), ISOdatetime(2001, 1, 4, 0, 0, 0), ISOdatetime(2001, 1, 5, 0, 0, 0)))
resultingSeries <- to.monthly(mySeries)
resultingSeries
I get the following output:
mySeries.Open mySeries.High mySeries.Low mySeries.Close
Jan 2001 1 6 1 6
When I look at the attributes, I see the following:
attributes(resultingSeries)
$dim
[1] 1 4
$dimnames
$dimnames[[1]]
NULL
$dimnames[[2]]
[1] "mySeries.Open" "mySeries.High" "mySeries.Low" "mySeries.Close"
$index
[1] 978307200
attr(,"tclass")
[1] "yearmon"
$tclass
[1] "POSIXct" "POSIXt"
$tzone
[1] ""
$class
[1] "xts" "zoo"
$.indexCLASS
[1] "yearmon"
This is the same as what I get in Java. I'm wondering where the magic happens that produces the nice output I see in R. I have no access to the event loop, since I'm using JRI like this (it's the recommended way and simplifies error handling):
REngine engine = REngine.engineForClass("org.rosuda.REngine.JRI.JRIEngine");
REXP result = engine.parseAndEval(...)
Edit:
In Java I execute each command from above as follows:
REXP result = engine.parseAndEval("resultingSeries"); // or any other command
What I get is
org.rosuda.REngine.REXPDouble@4ac66122+[12]
The payload consists of the doubles 1, 6, 1, 6.
The attributes are the same as specified above.
Now R does some magic to display the output above. Is there a way I can get the same output without having to build it manually myself? Where is the code stored that R uses to produce the output above?
Here is a piece of code that works. In it, I extracted the first element of the column mySeries.Open from the object resultingSeries (which I converted to a data frame); its value is 1. Note that you can't pass the whole resultingSeries object straight into Java; you will need to break it down.
package stackoverflow;
import org.rosuda.JRI.REXP;
import org.rosuda.JRI.Rengine;
/**
*
* @author yschellekens
*/
public class StackOverflow {
public static void main(String[] args) throws Exception {
String[] Rargs = {"--vanilla"};
Rengine rengine = new Rengine( Rargs, false, null);
rengine.eval("library('xts')");
rengine.eval("mySeries <- xts(c(1.0, 2.0, 3.0, 5.0, 6.0), order.by=c(ISOdatetime(2001, 1, 1, 0, 0, 0), ISOdatetime(2001, 1, 2, 0, 0, 0), ISOdatetime(2001, 1, 3, 0, 0, 0), ISOdatetime(2001, 1, 4, 0, 0, 0), ISOdatetime(2001, 1, 5, 0, 0, 0)))");
rengine.eval("resultingSeries <- to.monthly(mySeries)");
rengine.eval("resultingSeries<-as.data.frame(resultingSeries)");
REXP result= rengine.eval("resultingSeries$mySeries.Open");
System.out.println("Greeting from R: "+result.asDouble());
}
}
And the Java output:
run:
Greeting from R: 1.0
I figured out the following workaround. The solution is far from perfect.
R offers a function that captures console output as a character vector:
capture.output( {command} )
We can access that output from Java, using the engine object from the JRIEngine snippet in the question:
REXP s = engine.parseAndEval("capture.output(to.monthly(mySeries))");
String[] output = s.asStrings();
The variable output will then contain all the output lines:
[0] mySeries.Open mySeries.High mySeries.Low mySeries.Close
[1] Jan 2001 1 6 1 6
Alternatively, you could use JRIEngine and attach yourself to the event loop, which I did not want in my case (due to the more complicated error handling).
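The "magic" in the console is auto-printing: when a visible value reaches the top level, R calls print() on it (show() for S4 objects), and print() dispatches on the object's class, here "xts"/"zoo". A small R-side helper can therefore return that printed text as a single string. In this sketch the helper name print_to_string is hypothetical, and a plain data frame stands in for the xts object so that it runs without the xts package.

```r
# Hypothetical helper: render an object exactly as the console would print it
print_to_string <- function(obj) {
  # capture.output() collects the lines that print() writes to the console
  lines <- capture.output(print(obj))
  paste(lines, collapse = "\n")
}

# Stand-in for to.monthly(mySeries): same values, but a plain data frame
example <- data.frame(Open = 1, High = 6, Low = 1, Close = 6,
                      row.names = "Jan 2001")
cat(print_to_string(example), "\n")
```

With the helper defined in the R session, something like engine.parseAndEval("print_to_string(to.monthly(mySeries))").asString() should return the whole table as one Java string; this Java call is an untested assumption based on the REngine REXP API.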
