R seq function too many arguments?

I am getting an error that I don't really understand at all. I was just messing around with generating some sequences, and I came across this problem:
This should create a sequence of 50 numbers.
seq.int(from=1,to=1000,by=5,length.out=50)
But if I enter this in the console I get the error message:
Error in seq.int(from = 1, to = 1000, by = 5, length.out = 50) :
too many arguments
If I look at the help (?seq), the Usage section contains this line, which makes it seem as though I called the function correctly and that it accepts this many arguments:
seq.int(from, to, by, length.out, along.with, ...)
So what the heck is going on? Am I missing something fundamental, or are the docs out of date?
NOTE
The arguments I am providing to the function in the code sample are just for sake of example. I'm not trying to solve a particular problem, just curious as to why I get the error.

It's not clear what you expect as output from this line of code, and you're getting an error because R doesn't want to resolve the contradictions for you.
Here are some valid outputs, and the line of code you'd use to achieve each. This is a case where you need to decide for yourself which approach to use, given the task you have in mind:
Override length.out
[1] 1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86
...
[199] 991 996
#via:
seq.int(from=1,to=1000,by=5)
Override by
[1] 1.00000 21.38776 41.77551 62.16327 82.55102 102.93878 123.32653
[8] 143.71429 164.10204 184.48980 204.87755 225.26531 245.65306 266.04082
[15] 286.42857 306.81633 327.20408 347.59184 367.97959 388.36735 408.75510
[22] 429.14286 449.53061 469.91837 490.30612 510.69388 531.08163 551.46939
[29] 571.85714 592.24490 612.63265 633.02041 653.40816 673.79592 694.18367
[36] 714.57143 734.95918 755.34694 775.73469 796.12245 816.51020 836.89796
[43] 857.28571 877.67347 898.06122 918.44898 938.83673 959.22449 979.61224
[50] 1000.00000
#via:
seq.int(from=1,to=1000,length.out=50)
Override to
[1] 1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96 101
[22] 106 111 116 121 126 131 136 141 146 151 156 161 166 171 176 181 186 191 196 201 206
[43] 211 216 221 226 231 236 241 246
#via:
seq.int(from=1,by=5,length.out=50)
Override from
[1] 755 760 765 770 775 780 785 790 795 800 805 810 815 820 825 830 835 840
[19] 845 850 855 860 865 870 875 880 885 890 895 900 905 910 915 920 925 930
[37] 935 940 945 950 955 960 965 970 975 980 985 990 995 1000
#via:
seq.int(to=1000,by=5,length.out=50)
A priori, R has no way of telling which of the above you'd like, nor should it. You as programmer need to decide which inputs take precedence.
And you're right that this should be documented; for now, take a look at the source of .Primitive("seq.int"), as linked originally by @nongkrong.
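For extra intuition: the four arguments are tied together by to = from + by * (length.out - 1), so any three determine the fourth, and supplying all four over-determines the sequence:
# With from = 1, to = 1000, length.out = 50, the implied step is:
(1000 - 1) / (50 - 1)  # ~20.39, which contradicts by = 5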

No, there was nothing fundamental about the R language that I was missing. The problem is that the documentation, at least at the time of writing, is misleading and/or incorrect.

Related

How to cancel a bias and analyse the data?

I have a data table like the one below, and I would like to know which type of substrate (called "Litières" / "Branchages" / "Racines") contributes the most to each score.
In R:
Substrate<-c('Litières','Litières','Racines','Branchages','Branchages','Litières','Branchages','Litières','Litières' )
One<-c(0,22,216,36,288,351,28,12,0)
Two<-c(574,755,1248,504,882,810,431,537,56)
Three<-c(1352,1248,706,1476,846,855,1334,1152,1628)
Four<-c(261,162,17,171,171,171,394,486,503)
x<-data.frame(Substrate,One,Two,Three,Four)
Or, as a table:
Substrate    One   Two  Three  Four
Litières       0   574   1352   261
Litières      22   755   1248   162
Racines      216  1248    706    17
Branchages    36   504   1476   171
Branchages   288   882    846   171
Litières     351   810    855   171
Branchages    28   431   1334   394
Litières      12   537   1152   486
Litières       0    56   1628   503
However, you will notice that the number of observations is not the same for each type of substrate. How can I correct for this bias?
Thanks!
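One common way to neutralize the unequal group sizes is to compare per-substrate means rather than raw sums; a minimal sketch, assuming the data frame x built above:
# Average the scores within each substrate type, so group size drops out:
means <- aggregate(cbind(One, Two, Three, Four) ~ Substrate, data = x, FUN = mean)
means
# Optionally, express each score column as proportions across substrates:
props <- cbind(means["Substrate"], prop.table(as.matrix(means[-1]), margin = 2))
props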

How to calculate Williams %R in RStudio?

I am trying to write a function to calculate Williams %R on data in R. Here is my code:
getSymbols('AMD', src = 'yahoo', from = '2018-01-01')
wr = function(high, low, close, n) {
  highh = runMax(high, n)
  lowl = runMin(low, n)
  -100 * ((highh - close) / (highh - lowl))
}
williampr = wr(AMD$AMD.High, AMD$AMD.Low, AMD$AMD.Close, n = 10)
After implementing a buy/sell/hold signal, it returns integer(0):
## 1 = BUY, 0 = HOLD, -1 = SELL
## implement Lag to shift the time back to the previous day
tradingSignal = Lag(
  ## BUY (1) when %R crosses down through 0.8
  ifelse(Lag(williampr) > 0.8 & williampr < 0.8, 1,
  ## SELL (-1) when %R crosses down through 0.2; otherwise HOLD (0)
  ifelse(Lag(williampr) > 0.2 & williampr < 0.2, -1, 0)))
## make all missing values equal to 0
tradingSignal[is.na(tradingSignal)] = 0
## see how many SELL signals we have
which(tradingSignal == "-1")
What am I doing wrong?
It would have been a good idea to mention in your question that you were using the package quantmod.
There are two things preventing this from working.
You didn't inspect your intermediate results: the values in williampr are all negative. Additionally, you multiplied the values by 100, so 80% is 80, not 0.8. I removed the -100 *.
I have done the same thing so many times.
wr = function(high, low, close, n) {
  highh = runMax(high, n)
  lowl = runMin(low, n)
  (highh - close) / (highh - lowl)
}
That's it. It works now.
which(tradingSignal == "-1")
# [1] 13 15 19 22 39 71 73 84 87 104 112 130 134 136 144 146 151 156 161 171 175
# [22] 179 217 230 255 268 288 305 307 316 346 358 380 386 404 449 458 463 468 488 492 494
# [43] 505 510 515 531 561 563 570 572 574 594 601 614 635 642 644 646 649 666 668 672 691
# [64] 696 698 719 729 733 739 746 784 807 819 828 856 861 872 877 896 900 922 940 954 968
# [85] 972 978 984 986 1004 1035 1048 1060
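As a cross-check, quantmod also loads TTR, which ships a ready-made Williams %R implementation, WPR(); it uses the same 0-to-1 scaling as the corrected wr() above, so the two should agree:
library(quantmod)  # also loads TTR, which provides WPR()
hlc <- HLC(AMD)    # extract the High/Low/Close columns
williampr_ttr <- WPR(hlc, n = 10)
head(cbind(williampr, williampr_ttr))  # the two columns should match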

Divide paired matching columns

I have a data.frame df with matching columns that are also paired. The matching pairs are defined in patient. I would like to divide the matching columns by each other. Any suggestions for how to do this?
I tried this, but this does not take the pairing from patient into account.
m1 <- df[, sort(colnames(df))]
m1_g <- m1[,grep("^n",colnames(df))]
m1_r <- m1[,grep("^t",colnames(df))]
m1_new <- m1_g/m1_r
m1_new
head(df)
na-008 ta-008 nc012 tb012 na-018 ta-018 na020 tc020 tc093 nc093
hsa-let-7b-5p_TGAGGTAGTAGGTTGTGT 56 311 137 242 23 96 113 106 41 114
hsa-let-7b-5p_TGAGGTAGTAGGTTGTGTGG 208 656 350 713 49 476 183 246 157 306
hsa-let-7b-5p_TGAGGTAGTAGGTTGTGTGGT 631 1978 1531 2470 216 1906 732 850 665 909
hsa-let-7b-5p_TGAGGTAGTAGGTTGTGTGGTT 2760 8159 6067 9367 622 4228 2931 3031 2895 2974
hsa-let-7b-5p_TGAGGTAGTAGGTTGTGTGGTTT 1698 4105 3737 3729 219 1510 1697 1643 1527 1536
> head(patient)
$`008`
[1] "na-008" "ta-008"
$`012`
[1] "nc012" "tb012"
$`018`
[1] "na-018" "ta-018"
$`020`
[1] "na020" "tc020"
$`045`
[1] "nb045" "tc045"
$`080`
[1] "nb-080" "ta-080"

How to define range of values of a time series?

First of all, sorry for any mistakes regarding my post, I'm new to this site.
I'm getting started with R and I'm trying to do some analysis of time series data.
So, I have a time series at hand and have already loaded it into R.
I can also plot this time series and add labels to the axes and so on. So far so good.
My problem: when I plot the time series, R sets the range of values on the y-axis to approximately [0, 170].
This is somehow strange, since the time series contains the daily EUR/USD exchange rates for this year. That means the values are in a range of about 1.05 to 1.2.
The relative values are correct.
If the plot shows a maximum around day 40, the corresponding value in the data set appears to be a maximum.
But it is around 1.4 and not 170.
I hope one can understand my problem.
I would like to have the y-axis on a scale from 1 to 1.2 for example.
The ylim=c(1, 1.2) command will scale the axis to that range but not the values.
It just ignores them.
Does anyone know how to adjust that?
I'd really appreciate it.
Thank you very much in advance.
Thanks a lot for the input so far.
The "critical code" is the following:
> FRB <- read.csv("FRB_H10.csv", header=TRUE, sep=",")
> attach(FRB)
> str(FRB)
'data.frame': 212 obs. of 2 variables:
$ Date: Factor w/ 212 levels "2015-01-01","2015-01-02",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Rate: Factor w/ 180 levels "1.0524","1.0575",..: 180 179 177 178 174 173 175 176 171 172 ...
> plot.ts(Rate)
The result of this last plot is the one shown above.
Changing the variable to numeric yields this:
> as.numeric(Rate)
[1] 180 179 177 178 174 173 175 176 171 172 170 166 180 167 169 160 123 128 150 140 132 128 138 165
[25] 161 163 136 134 134 129 159 158 180 156 140 155 151 142 131 148 104 100 96 104 65 53 27 24
[49] 13 3 8 1 2 7 10 9 21 42 36 50 39 33 23 15 19 29 51 54 26 23 11 6
[73] 4 12 5 16 20 18 17 14 22 30 34 49 92 89 98 83 92 141 125 110 81 109 151 149
[97] 162 143 85 69 77 61 180 30 32 38 52 37 78 127 120 73 105 126 131 106 122 119 107 112
[121] 157 137 152 96 93 99 87 94 86 70 71 180 67 43 66 58 84 57 55 47 35 25 26 41
[145] 31 48 48 75 63 59 38 60 46 44 28 40 45 52 62 101 82 74 68 60 64 102 144 168
[169] 159 154 108 91 98 118 111 72 76 180 95 90 117 139 131 116 130 133 145 103 79 88 115 97
[193] 106 113 89 102 121 102 119 114 124 148 180 153 164 161 147 135 146 141 80 56
So, it remains unchanged. This is very strange. The data excerpt shows that "Rate" takes on values between 1.1 and 1.5 approximately, so really not the values that are shown above. :/
The data set can be found under this link:
https://www.dropbox.com/s/ndxstdl1aae5glt/FRB_H10.csv?dl=0
It should be alright. I got it from the database of the Federal Reserve System, so quite a decent source.
(I had to remove the link to the data excerpt because my reputation only allows 2 links per post. But the entire data set should be even better, I guess.)
@BlankUsername
Thanks very much for the link. I got it working now using this code:
FRB <- read.csv("FRB_H10.csv", header=TRUE, sep=",")
> attach(FRB)
> as.numeric(paste(Rate))
[1] NA 1.2015 1.1918 1.1936 1.1820 1.1811 1.1830 1.1832 1.1779 1.1806 1.1598 1.1517 NA
[14] 1.1559 1.1584 1.1414 1.1279 1.1290 1.1370 1.1342 1.1308 1.1290 1.1337 1.1462 1.1418 1.1432
[27] 1.1330 1.1316 1.1316 1.1300 1.1410 1.1408 NA 1.1395 1.1342 1.1392 1.1372 1.1346 1.1307
[40] 1.1363 1.1212 1.1197 1.1190 1.1212 1.1070 1.1006 1.0855 1.0846 1.0707 1.0576 1.0615 1.0524
[53] 1.0575 1.0605 1.0643 1.0621 1.0792 1.0928 1.0908 1.0986 1.0919 1.0891 1.0818 1.0741 1.0768
[66] 1.0874 1.0990 1.1008 1.0850 1.0818 1.0671 1.0598 1.0582 1.0672 1.0596 1.0742 1.0780 1.0763
[79] 1.0758 1.0729 1.0803 1.0876 1.0892 1.0979 1.1174 1.1162 1.1194 1.1145 1.1174 1.1345 1.1283
[92] 1.1241 1.1142 1.1240 1.1372 1.1368 1.1428 1.1354 1.1151 1.1079 1.1126 1.1033 NA 1.0876
[105] 1.0888 1.0914 1.0994 1.0913 1.1130 1.1285 1.1271 1.1108 1.1232 1.1284 1.1307 1.1236 1.1278
[118] 1.1266 1.1238 1.1244 1.1404 1.1335 1.1378 1.1190 1.1178 1.1196 1.1156 1.1180 1.1154 1.1084
[131] 1.1090 NA 1.1076 1.0952 1.1072 1.1025 1.1150 1.1020 1.1015 1.0965 1.0898 1.0848 1.0850
[144] 1.0927 1.0884 1.0976 1.0976 1.1112 1.1055 1.1026 1.0914 1.1028 1.0962 1.0953 1.0868 1.0922
[157] 1.0958 1.0994 1.1042 1.1198 1.1144 1.1110 1.1078 1.1028 1.1061 1.1200 1.1356 1.1580 1.1410
[170] 1.1390 1.1239 1.1172 1.1194 1.1263 1.1242 1.1104 1.1117 NA 1.1182 1.1165 1.1262 1.1338
[183] 1.1307 1.1260 1.1304 1.1312 1.1358 1.1204 1.1133 1.1160 1.1252 1.1192 1.1236 1.1246 1.1162
[196] 1.1200 1.1276 1.1200 1.1266 1.1249 1.1282 1.1363 NA 1.1382 1.1437 1.1418 1.1360 1.1320
[209] 1.1359 1.1345 1.1140 1.1016
Warning message:
NAs introduced by coercion
> Rate <- cbind(paste(Rate))
> plot(Rate)
Warning message:
In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
> plot.ts(Rate, ylab="EUR/USD")
Despite the warning messages, I get the plot output the way I intended.
Nevertheless, I do not really understand why it works the way it does, why I have to use the paste() command, and what it does exactly. I get the basic idea of what the classes do, but I am very new to this whole world of R.
One thing I came to realize already is that R is such a powerful program. And yet confusing if you are a beginner. :D
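For context: the underlying issue is that Rate was read in as a factor, and as.numeric() on a factor returns the internal level codes rather than the printed values; paste(), like as.character(), converts the levels to their text form first. A minimal sketch of the usual idiom:
# as.numeric() on a factor yields level codes (1, 2, ...), not the values;
# convert to character first to recover the printed numbers:
rate_num <- as.numeric(as.character(FRB$Rate))  # NAs for non-numeric entries
# Or sidestep the factor conversion entirely when reading the file:
FRB <- read.csv("FRB_H10.csv", header = TRUE, stringsAsFactors = FALSE)
plot.ts(as.numeric(FRB$Rate), ylab = "EUR/USD")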

R implementation of cluster analysis

I am in the process of implementing a few algorithms for cluster analysis, especially cluster validation. There are a few approaches, such as cross-validation, external indices, internal indices, and relative indices. I am trying to implement an algorithm that falls under internal indices.
Internal index - Based on the intrinsic content of the data. It is used to measure the goodness of a clustering structure without respect to external information.
My interest is the Silhouette Coefficient:
s(i) = (b(i) - a(i)) / max{a(i), b(i)}
where a(i) is the mean distance from point i to the other points in its own cluster, and b(i) is the mean distance from i to the points in the nearest other cluster.
To make it clearer, let's assume I have the following multimodal distribution:
library(mixtools)
wait = faithful$waiting
mixmdl = normalmixEM(wait)
plot(mixmdl,which=2)
lines(density(wait), lty=2, lwd=2)
We see that there are two clusters and the cut-off is around 68. There are no labels here, so there is no ground truth for cross-validation (this is unsupervised). So we need a mechanism to evaluate the clusters. In this case we know from the visualization that there are two clusters, but how do we clearly show that the two distributions actually belong to separate clusters? Based on what I read on Wikipedia, the Silhouette gives us that validation.
I want to implement a method (which implements the Silhouette) that takes an R list of values (in my example wait), the number of clusters (in this case 2), and the model, and returns the average s(i).
I have started, but can't really figure out how to go forward:
Silhouette = function(rList, num_clusters, model) {
}
A summary of my list looks like this:
Length Class Mode
clust_A 416014 -none- numeric
clust_B 72737 -none- numeric
clust_C 6078 -none- numeric
myList$clust_A returns the points that belong to that cluster:
[1] 13 880 497 1864 392 55 1130 248 437 37 62 153 60 117
[15] 22 106 71 1026 446 1558 23 56 287 402 46 1506 115 2700
[29] 67 134 48 536 41 506 1098 33 30 280 225 16 25 17
[43] 63 1762 477 174 98 76 157 698 47 312 40 3 198 621
[57] 15 34 226 657 48 110 23 250 14 32 137 272 26 257
[71] 270 133 1734 78 134 8 5 225 187 166 35 15 94 2825
[85] 2 8 94 89 54 91 77 17 106 1397 16 25 16 103
The problem is that I don't think the existing libraries accept this type of data structure.
Silhouette assumes that all clusters have the same variance.
IMHO, it does not make sense to use this measure with EM clustering.
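That said, if one still wants the number, here is a minimal sketch that hard-assigns each point to the component with the larger EM posterior and leans on cluster::silhouette() rather than hand-rolling the distances:
library(mixtools)
library(cluster)  # provides silhouette()
wait <- faithful$waiting
mixmdl <- normalmixEM(wait)
# Hard-assign each point to the mixture component with the larger posterior:
labels <- apply(mixmdl$posterior, 1, which.max)
# silhouette() takes integer labels plus a dissimilarity matrix:
sil <- silhouette(labels, dist(wait))
mean(sil[, "sil_width"])  # the average s(i)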
