R: Quadratic curve is printing as dozens of lines

Instead of printing a nice curve, R is printing a dense mess of line segments. The edge of that dense mess looks exactly like the curve I'm trying to print, though.
I have datasets X2 and Y2 plotted and am trying to print a quadratic curve on the existing plot. Here's my code:
X22 <- X2^2
model2s <- lm(Y2 ~ X2 + X22)
plot(X2,Y2)
lines(X2,fitted(model2s))
X2:
[1] -2.69725933 -1.54282303 -1.91720835 -0.08528522 -2.57551112 -2.65955930 1.66190727 0.01135419
[9] -1.67597429 -0.46931267 1.31551076 1.78942814 -0.54821881 -2.93750249 -0.63519111 -2.17234702
[17] 2.26156660 -2.13808807 -0.74155513 2.65037057 2.44828088 -2.52896408 -2.02068505 -1.36222982
[25] 1.97171562 -0.27897421 -2.12562749 -0.85870780 0.71198294 1.24482512 0.20295272 -1.58949497
[33] -0.59396637 0.45486252 2.51659763 2.62181364 2.20407646 1.06466931 -1.43400604 0.01579675
[41] -0.33513385 -0.05453015 1.96167436 1.28448541 -2.69429783 -1.08642394 -0.09400599 2.98775967
[49] 2.05795131 1.58896527 0.67934349 -0.13352141 -0.52543898 -2.40677026 -0.13610972 -1.31887725
[57] -1.56066767 -1.35457660 1.16511448 -2.55372404 -2.28185200 -0.19699659 1.84159785 1.24092476
[65] -2.90374380 2.29220701 1.22968228 2.60137009 0.87307737 2.71556663 0.94467230 0.96922155
[73] 1.89863312 1.64500729 1.37186380 -1.87455109 1.15276643 0.26130981 -1.84580809 -1.32085543
[81] -2.41207641 0.19248616 -1.65741770 2.13950098 -1.69597327 -0.06976200 1.14711285 2.97132615
[89] 0.71798324 -1.02838913 0.44070700 2.07600642 -0.21917452 -0.36556134 2.60091749 -1.41738042
[97] 1.04864677 0.83080236 2.56432957 -0.72499588 -0.81415858 -2.49700816 -2.72860601 0.49777866
Y2:
[1] -9.00479135 -1.56827264 -3.85069478 3.80620694 -7.78195591 -6.21173824 4.24967581 3.39550072
[9] -2.51108153 1.71820705 5.44931613 4.97755290 2.66081793 -11.34941655 0.27113981 -5.27374362
[17] 3.55191243 -3.64065638 1.27630806 4.20004221 5.53455823 -6.48854059 -4.17995733 -2.00651295
[25] 3.72495467 0.68337096 -4.28579895 -1.37001146 4.87616860 6.06427661 1.70089898 -3.07543568
[33] 2.90859968 4.12792739 2.76034855 3.87910950 4.14718875 4.73100437 -2.38820139 3.32093131
[41] 1.86320165 3.27669364 2.46242358 4.92157619 -7.90548937 -0.75929903 2.94267998 1.74858185
[49] 3.45587195 3.74016585 4.00274064 2.93845395 1.85504582 -4.30620277 3.40285048 -1.11881798
[57] -0.50718093 -0.43403754 2.54878083 -6.90253145 -5.37796863 3.25636120 3.41966211 3.40255742
[65] -8.59066220 1.82125444 3.20829746 2.46454987 4.09421369 1.79725157 5.61761174 4.55423983
[73] 3.12240983 2.86139737 4.00807877 -4.19551852 4.63684416 4.82350596 -2.73656766 -1.69755051
[81] -4.16628941 2.60384722 -2.77361082 3.98215540 -2.73349536 1.61857480 4.05148933 3.57791895
[89] 3.35775758 1.13832332 4.17317062 1.62551176 1.15076311 2.24591763 1.99284489 -2.35373088
[97] 3.86807106 5.50186659 2.51879877 0.82435797 1.56822937 -9.69863069 -7.75684415 3.61224550

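lines() draws segments between the points in the order they are given. Because X2 is not sorted, the path jumps back and forth across the plot, which produces the dense mess. Sort by X2 first, something like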
plot(X2,Y2)
ox2 <- order(X2)
lines(X2[ox2],fitted(model2s)[ox2])
should give you what you're looking for.
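An alternative to reordering the fitted values is to predict on a sorted grid of x values, which also gives a smooth curve when the data are sparse. The following is a minimal sketch on synthetic data; the names x, y, fit and grid, and the simulated coefficients, are assumptions for illustration and are not the asker's X2 and Y2.

```r
# Synthetic data standing in for X2 and Y2 (an assumption, not the asker's data)
set.seed(42)
x <- runif(100, -3, 3)
y <- 3 + 1.5 * x - 0.8 * x^2 + rnorm(100)

# I(x^2) puts the squared term in the formula, so no helper column is needed
fit <- lm(y ~ x + I(x^2))

# Evaluate the fitted curve on an evenly spaced, already sorted grid
grid <- data.frame(x = seq(min(x), max(x), length.out = 200))
grid$y_hat <- predict(fit, newdata = grid)

plot(x, y)
lines(grid$x, grid$y_hat, col = "red", lwd = 2)
```

Because the grid is increasing in x, lines() traces the curve from left to right in a single pass.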

Related

How to remove the prefix of each sample

I am stuck trying to remove the prefix from each sample name. I tried removing all the digits in the name, but that is not a good basis for grouping. I would like to keep only the last two dash-separated parts of each sample name (for example: AAP-L). The details are listed below. Thank you in advance!
geo$pd$title
[1] "AAB-HT002-AAP-L" "AAB-HT003-AAP-L" "AAB-HT006-AAP-L" "AAB-HT002-AAP-NL"
[5] "AAB-HT003-AAP-NL" "AAB-HT006-AAP-NL" "AAB-C007-AU-L" "AAB-HT001-AT-L"
[9] "AAB-N-C021-Normal-NC" "AAB-N-C022-Normal-NC" "AAB-C024-Normal-NC" "AAB-N-C025-Normal-NC"
[13] "AAB-HT010-AAP.T-L" "AAB-HT011-AAP-L" "AAB-HT012-AAP-L" "AAB-HT010-AAP.T-NL"
[17] "AAB-HT011-AAP-NL" "AAB-HT012-AAP-NL" "AAB-C013-AU-L" "AAB-C033-AU-L"
[21] "AAB-C037-AT-L" "AAB-C043-AU-L" "AAB-HT041-AU-L" "AAB-N-C026-Normal-NC"
[25] "AAB-N-C027-Normal-NC" "AAB-N-C028-Normal-NC" "AAB-N-C029-Normal-NC" "AAB-C014-AAP-L"
[29] "AAB-HT017-AAP.T-L" "AAB-HT018-AAP-L" "AAB-C014-AAP-NL" "AAB-HT017-AAP.T-NL"
[33] "AAB-HT018-AAP-NL" "AAB-C047-AT-L" "AAB-M044-AU-L" "AAB-N-C030-Normal-NC"
[37] "AAB-N-C032-Normal-NC" "AAB-N-C034-Normal-NC" "AAB-N-C035-Normal-NC" "AAB-C020-AAP.T-L"
[41] "AAB-C038-AAP-L" "AABM046-AAP-L" "AAB-C020-AAP.T-NL" "AABM046-AAP-NL"
[45] "AAB-C048-AT-L" "AAB-HT050-AT-L" "AAB-M-060-AU-L" "AAB-M-061-AU-L"
[49] "AAB-N-C036-Normal-NC" "AAB-N-C039-Normal-NC" "AAB-N-C042-Normal-NC" "AAB-N-C045-Normal-NC"
[53] "AAB-C052-AAP-L" "AAB-C076-AAP-L" "AAB-M056-AAP-L" "AAB-M058-AAP-L"
[57] "AAB-C052-AAP-NL" "AAB-C076-AAP-NL" "AAB-M056-AAP-NL" "AAB-M058-AAP-NL"
[61] "AAB-HT077-AU-L" "AAB-HT082-AU-L" "AAB-M080-AU-L" "AAB-N-C054-Normal-NC"
[65] "AAB-N-C055-Normal-NC" "AAB-N-C059-Normal-NC" "AAB-N-C062-Normal-NC" "AAB-C083-AAP-L"
[69] "AAB-HT009-AAP-L" "AAB-HT079-AAP-L" "AAB-SF086-AAP-L" "AAB-C083-AAP-NL"
[73] "AAB-HT079-AAP-NL" "AAB-SF086-AAP-NL" "AAB-C016-AU-L" "AAB-HT008-AU-L"
[77] "AAB-HT091-AT-L" "AAB-SF087-AU-L" "AAB-N-C063-Normal-NC" "AAB-N-C064-Normal-NC"
[81] "AAB-N-C065-Normal-NC" "AAB-HT103-AAP-L" "AAB-SF078-AAP.T-L" "AAB-SF099-AAP-L"
[85] "AAB-HT103-AAP-NL" "AAB-SF078-AAP.T-NL" "AAB-SF099-AAP-NL" "AAB-HT096-AT-L"
[89] "AAB-M094-AU-L" "AAB-SF089-AU-L" "AAB-SF090-AU-L" "AAB-SF100-AU-L"
[93] "AAB-N-C069-Normal-NC" "AAB-N-C070-Normal-NC" "AAB-N-C071-Normal-NC" "AAB-N-C072-Normal-NC"
[97] "AAB-N-C074-Normal-NC" "AAB-N-C075-Normal-NC" "AAB-N-C085-Normal-NC" "AAB-C092-Normal-NC"
[101] "AAB-M112-AAP-L" "AAB-SF104-AAP-L" "AAB-SF114-AAP-L" "AAB-SF115-AAP.T-L"
[105] "AAB-M112-AAP-NL" "AAB-SF104-AAP-NL" "AAB-SF114-AAP-NL" "AAB-SF115-AAP.T-NL"
[109] "AAB-C109-AU-L" "AAB-C111-AU-L" "AAB-HT101-AU-L" "AAB-M110-AT-L"
[113] "AAB-SF106-AU-L" "AAB-SF113-AU-L" "AAB-N-C098-Normal-NC" "AAB-N-C105-Normal-NC"
[117] "AAB-N-C107-Normal-NC" "AAB-N-C108-Normal-NC" "AAB-HT095-AAP.T-L" "AAB-HT095-AAP.T-NL"
[121] "AAB-HT097-AT-L" "AAB-C093-Normal-NC"
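Try this (it assumes every title has exactly two dash-separated fields before the part you want to keep, so titles such as "AAB-N-C021-Normal-NC" or "AABM046-AAP-L" would need a different approach):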
library(stringr)
# test data:
string <- c("AAB-HT002-AAP-L", "AAB-HT017-AAP.T-L", "AAB-HT003-AAP-L", "AAB-HT006-AAP-L", "AAB-HT002-AAP-NL")
str_split_fixed(string, '-', n=3)[, 3]
# output:
[1] "AAP-L" "AAP.T-L" "AAP-L" "AAP-L" "AAP-NL"
This returns the last two dash-separated components of each title, where each component consists of letters and periods:
titles <-c("AAB-HT002-AAP-L", "AAB-HT003-AA.P-L", "AAB-HT006-AAP-L", "AAB-HT002-AA.P-NL")
sub( "(.+)([-])([[:alpha:].]+[-][[:alpha:].]+$)", "\\3", titles)
[1] "AAP-L" "AA.P-L" "AAP-L" "AA.P-NL"
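Another option is str_remove(), which here deletes everything up to and including the last run of digits followed by a dash: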
library(stringr)
str_remove(string, ".*\\d+-")
[1] "AAP-L" "AAP.T-L" "AAP-L" "AAP-L" "AAP-NL"
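For comparison, here is a base R sketch that keeps the last two dash-separated fields regardless of how many fields come before them. The vector titles is a hand-picked subset of the names in the question, chosen to cover the irregular cases; the variable names suffix and groups are assumptions for illustration.

```r
# A few titles from the question, including the irregular ones
titles <- c("AAB-HT002-AAP-L", "AAB-N-C021-Normal-NC", "AABM046-AAP-NL",
            "AAB-HT010-AAP.T-L", "AAB-M-060-AU-L")

# Greedy ".*-" consumes the prefix; the capture group keeps the final
# two fields, each made of one or more non-dash characters
suffix <- sub("^.*-([^-]+-[^-]+)$", "\\1", titles)
print(suffix)

# The suffix can then be used directly as a grouping factor
groups <- factor(suffix)
print(table(groups))
```

Anchoring on the end of the string rather than counting fields from the start is what makes this tolerant of prefixes with one, two, or three parts.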

R - Operations over corresponding vector items in list

Let's say I have a list of vectors, like so:
[[1]]
[1] -0.36603596 -0.41461025 -0.68573296 -0.55516173 0.05071238 0.47723472 0.10851948
[8] 0.67005116 0.25519780 -0.79428716 0.16506077 0.81905548 0.22808934 -0.39257712
[15] 0.44778539 -0.36149934 -0.90142102 -0.99826169 0.24544167 -0.18989310 -0.67592344
[22] -0.65447808 0.26617179 -0.25020153 0.19562031 0.53520465 -0.47531100 -0.60152887
[29] 0.12012461 -0.68947499 -0.33258301 0.19914520 -0.70396942 0.21574644 -0.67197365
[36] -0.12744723 -0.07113916 0.44497439 0.07592963 -0.29082130 -0.27967624 0.28314801
[43] -0.09840383 -0.55582233 -0.29474315 -0.41717316 0.51017306 -0.31227399 0.39484400
[50] -0.88843530
[[2]]
[1] -0.14763873 -0.69009083 -0.55705599 -0.43779047 0.15626341 -0.00629513 -0.95227841
[8] 0.85645849 -0.40110676 -0.35732008 0.31375323 0.71478975 0.02262899 -0.12802829
[15] 0.58750725 -0.25629463 -0.65609956 -0.83185625 -0.35244759 -0.33287717 -0.99199682
[22] -0.45836093 -0.19431609 -0.41590652 1.06120542 0.20687783 0.13268137 -0.34219985
[29] -0.18096691 -0.24496102 -0.47769117 0.89134577 -0.56128402 0.70825268 0.10426368
[36] -0.13962506 -0.72478276 -0.40178315 0.65943132 -0.82083464 0.22569929 -1.02243310
[43] -0.70983610 -1.36733592 0.68807554 0.09156598 0.76850778 -0.64040433 0.79276407
[50] -0.40297792
[[3]]
[1] 0.34405450 -0.07928067 0.08353835 -0.37919066 -0.47233278 -0.38839824 -0.13269067
[8] 0.17348495 0.42777652 -0.19297300 -0.86438130 0.75787336 -0.34358747 0.47852682
[15] 1.29980892 -0.42527812 -0.25074922 -0.59565850 0.32800193 -0.56109570 -0.72905476
[22] -0.11498356 -0.29827083 -0.21653428 0.78533418 0.64735755 0.31889828 -0.37129803
[29] -0.51252162 0.24192268 -0.29281809 1.03299397 -0.11251429 0.13157698 -0.06404053
[36] 0.01904473 -0.13162565 0.30488937 0.31933970 0.14135025 -0.31501649 0.16738399
[43] -0.19627252 -1.29613018 -0.03572980 -0.72008672 0.13932428 -0.06117093 -0.62665670
[50] -0.12662761
[[4]]
[1] 0.183303468 0.160037845 -0.053473912 0.005199917 -0.126312554 0.116465956 -0.061730281
[8] 0.392903969 -0.008337453 -0.752631038 -0.235599857 0.999534398 0.375208363 0.201100799
[15] 0.444068886 -0.575795949 -0.873388633 -0.863612264 0.076050073 -0.188358603 -0.391865671
[22] -1.726690292 -1.206992567 -0.547175750 0.290255919 1.119834989 0.551360182 -0.510140345
[29] -0.460314706 -0.245835558 -0.315087602 0.947181076 -0.132550448 0.038419545 -0.017929636
[36] 0.041870497 -0.520961791 0.195326850 -0.117783785 -0.427426472 -0.119577158 0.702550914
[43] -0.045789957 -0.794299036 0.181420440 0.407347072 0.571894407 -0.217325835 0.280283391
[50] -0.492866084
[[5]]
[1] -0.40852268 -0.33488615 -0.30609700 -0.67467326 -0.11966383 1.01161858 -0.27108333
[8] 0.92772286 0.39047166 0.29019594 0.24404167 0.07824440 0.32786441 0.21657727
[15] 0.34362648 -0.44996166 -0.27823770 -1.24962127 -0.57241699 -0.30297804 -0.66728157
[22] 0.01783441 0.50773758 -0.31477033 -0.14581338 -0.13827194 -0.25574117 0.40049840
[29] 0.38634920 -0.29027963 -0.03381480 0.48510557 -0.61594522 1.09573928 -0.27992008
[36] -0.41523542 -0.24131548 0.43480320 0.32855110 0.48579320 0.47366867 0.62697303
[43] -0.57792202 -0.81951194 0.21583044 0.15593484 -0.10270703 -0.10206812 -0.25195873
[50] -0.89835763
I want to average corresponding vector items (e.g.: [[1]][1], [[1]][2], [[1]][3], etc.) to result in a single vector of averaged values. For instance, the mean of every first vector item across the list would be -0.07896788. What's the best way to go about this?
Assuming the list is called mylist, bind the vectors together as rows and take the column means:
mydf <- as.data.frame(do.call("rbind", mylist))
colMeans(mydf)
Would that be the desired output?
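To make the idea concrete, here is a small sketch with a toy list (the values in mylist are made up for illustration, not the data from the question). It shows the row-bind approach next to an equivalent Reduce() version that sums the vectors element by element.

```r
# Toy list of equal-length numeric vectors (illustrative values only)
mylist <- list(c(1, 2, 3), c(3, 4, 5), c(5, 6, 10))

# Approach 1: stack the vectors as rows of a matrix, then average each column
m <- do.call(rbind, mylist)
avg_colmeans <- colMeans(m)

# Approach 2: add the vectors element-wise and divide by how many there are
avg_reduce <- Reduce(`+`, mylist) / length(mylist)

print(avg_colmeans)  # 3 4 6
print(avg_reduce)    # 3 4 6
```

Both approaches require every vector in the list to have the same length. Working with a matrix rather than a data frame avoids a conversion step when all the values are numeric.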

How to plot colours for data frame values in R?

I have a data frame of 168 values - below is an example:
[1] 10.20825145 10.49029738 9.47768668 11.37237685 9.77536685 -9.96578428 9.84730064 -9.96578428 -9.96578428 9.67701164 8.89308834
[12] -9.96578428 -9.96578428 -9.96578428 4.88074954 10.83777007 9.67240471 11.55113265 12.29597119 11.17761580 1.27119342 5.89488206
[23] 11.04314439 11.51956302 8.88611025 0.41593543 10.09092012 1.11935342 2.29304065 6.44051757 7.27223875 4.17286046 12.29597119
[34] 8.93756226 -9.96578428 2.82374114 -9.96578428 -9.96578428 6.78451866 6.75725141 3.30799055 1.33285052 11.10138287 9.56310341
[45] 6.05138487 11.16478498 0.64163540 2.25818628 6.84610893 0.90170156 5.03961679 8.06503755 1.91447714 1.99289237 10.05683543
[56] 4.18387615 1.44569558 9.44208535 10.76103696 10.07250772 5.65824078 11.44590482 4.98525549 9.27145969 4.62079778 -9.96578428
[67] -0.15866721 7.84066444 10.64691705 8.10132712 10.42130331 7.63017724 9.81489036 10.44605958 7.61256542 10.59292091 10.68115428
[78] 8.63528904 7.08127497 9.37016682 9.72611928 8.79221371 11.37733558 10.13409536 8.54228484 9.19473411 9.22357213 -9.96578428
[89] 2.06859467 7.85102680 10.21632083 7.32085557 7.17868855 7.29012838 9.39064690 11.21826736 0.99311790 10.73680716 -1.27079596
[100] 1.56468983 -1.53765829 1.52571260 7.59811777 11.25804316 5.76919580 1.46352533 10.66897438 -0.19396590 -9.96578428 -0.10920277
[111] 10.27374790 1.86021231 0.05229581 9.10927587 6.75497052 -9.96578428 7.37624442 -1.18384163 10.09532648 10.66210443 0.97845531
[122] -0.58780829 1.70242105 7.11891287 -1.00259672 -9.96578428 3.44482985 3.66543196 2.30526333 9.25052252 0.47603010 0.67767918
[133] 0.53495561 -9.96578428 -0.25681726 -0.88592846 12.28143934 11.48635730 1.57340309 1.81157359 5.22452852 2.82243460 2.63202605
[144] 10.96672824 11.39766334 6.32855877 3.35147803 1.85503403 11.07168816 0.62804624 1.26195498 1.84045927 2.36940606 10.72429922
[155] 9.03370799 0.17404750 0.35583693 0.01601167 8.74355131 10.53061214 -1.02983443 -9.96578428 11.00097153 2.29188360 4.60733174
[166] 0.72027563 1.33766127 -1.02773393
I now want to plot a graph in which the colour of each data point moves from red to green as the value increases. I want to preserve the order of the data frame.
numbers <-(dataframe)
my_palette <- colorRampPalette(c("red", "yellow", "green"))(168)
plot(col= my_palette,numbers, pch=20)
However, this plots colours in ascending order of the data frame indices not the values.
Any advice on how to solve this would be appreciated. Thank you
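Instead of col = my_palette, use col = my_palette[rank(numbers)]. rank(numbers) gives each value's position in sorted order, so the lowest value gets the first (red) colour and the highest gets the last (green), while the points stay in their original order on the x-axis.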
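Rank-based indexing spreads the colours evenly by order, so many tied or clustered values still span a range of colours. If the colour should instead track the magnitude of the value, one option is to bin the values with cut(). The sketch below uses a short made-up vector; numbers, n_cols, pal, idx and cols are illustrative names, and 100 bins is an arbitrary choice.

```r
# Short illustrative vector (not the 168 values from the question)
numbers <- c(10.2, -9.97, 4.88, 12.3, 0.42, -9.97, 7.1)

# Palette with a fixed number of steps from red through yellow to green
n_cols <- 100
pal <- colorRampPalette(c("red", "yellow", "green"))(n_cols)

# cut() assigns each value to one of n_cols equal-width bins;
# labels = FALSE returns the bin number, used here as a palette index
idx <- cut(numbers, breaks = n_cols, labels = FALSE)
cols <- pal[idx]

# Points keep their original order on the x-axis; only the colour varies
plot(numbers, col = cols, pch = 20)
```

With this mapping, equal values always receive the same colour, and a large gap between values shows up as a large jump in colour.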

adaboost model gives a vector of output for one row

I have built a model using Adaboost. When I give one row as input, this is the output I get. I was expecting to get just one number as the prediction
> predict(Model,testset[1,],type="prob")[,2]
[1] 0.5159268 0.5143351 0.5135043 0.5127763 0.5116162 0.5097892 0.5098299 0.5098701
[9] 0.5083176 0.5088486 0.5073487 0.5082424 0.5078101 0.5073640 0.5053638 0.5066038
[17] 0.5063418 0.5055067 0.5060952 0.5051869 0.5050157 0.5038692 0.5040837 0.5052188
[25] 0.5040825 0.5046496 0.5050795 0.5042205 0.4976465 0.5046798 0.5047607 0.4957011
[33] 0.5048601 0.5039299 0.5032739 0.5042044 0.5044005 0.5044902 0.5037352 0.4981865
[41] 0.5021579 0.5038746 0.5043289 0.5032334 0.5051926 0.5021917 0.5015447 0.5029390
[49] 0.4951465 0.5033675
> predict(Model,testset[2,],type="prob")[,2]
[1] 0.5159268 0.5143351 0.5135043 0.5127763 0.5116162 0.5097892 0.5098299 0.5098701
[9] 0.5083176 0.5088486 0.5073487 0.5082424 0.4921899 0.5073640 0.5053638 0.5066038
[17] 0.5063418 0.5055067 0.5060952 0.5051869 0.5050157 0.5038692 0.5040837 0.5052188
[25] 0.5040825 0.5046496 0.5050795 0.5042205 0.5023535 0.4953202 0.5047607 0.5042989
[33] 0.4951399 0.5039299 0.4967261 0.5042044 0.5044005 0.4955098 0.5037352 0.5018135
[41] 0.5021579 0.5038746 0.5043289 0.5032334 0.4948074 0.5021917 0.4984553 0.5029390
[49] 0.4951465 0.5033675
If I give say 5 rows as input, as expected I get 5 predictions.
> predict(Model,testset[1:5,],type="prob")[,2]
[1] 0.7470780 0.7101257 0.4795726 0.7451049 0.5607364
Why is the first command giving me 50 predictions when I'm giving just one row as input?
