Plot of Density, type="l" - r

I have the following output from a density estimation.
$x
[1] 0.100001 0.600001 0.500001 0.800001 0.500001 0.100001 0.600001 0.300001
[9] 0.100001 0.400001 0.700001 0.500001 0.000001 0.200001 0.700001 0.500001
[17] 0.000001 0.400001 0.500001 0.400001 0.200001 0.100001 0.600001 0.700001
[25] 0.700001 0.200001 0.800001 0.500001 0.200001 0.200001
$y
[1] 1.2246774 1.1437131 1.3626914 0.6381394 1.3626914 1.2246774 1.1437131
[8] 1.5893983 1.2246774 1.5158009 0.8852983 1.3626914 0.6912818 1.5227328
[15] 0.8852983 1.3626914 0.6912818 1.5158009 1.3626914 1.5158009 1.5227328
[22] 1.2246774 1.1437131 0.8852983 0.8852983 1.5227328 0.6381394 1.3626914
[29] 1.5227328 1.5227328
where x are the grid points and y are the estimated values. When I plot these with type "l", the graph looks very weird. It is a density plot, so it should be a single line. Please guide me on how this can be fixed.

You need to plot them in the correct order (ordered according to the value of x):
plot(sort(x), y[order(x)], type = "l")
Reproducible data
x <- c(0.100001, 0.600001, 0.500001, 0.800001, 0.500001, 0.100001,
0.600001, 0.300001, 0.100001, 0.400001, 0.700001, 0.500001, 1e-06,
0.200001, 0.700001, 0.500001, 1e-06, 0.400001, 0.500001, 0.400001,
0.200001, 0.100001, 0.600001, 0.700001, 0.700001, 0.200001, 0.800001,
0.500001, 0.200001, 0.200001)
y <- c(1.2246774, 1.1437131, 1.3626914, 0.6381394, 1.3626914, 1.2246774,
1.1437131, 1.5893983, 1.2246774, 1.5158009, 0.8852983, 1.3626914,
0.6912818, 1.5227328, 0.8852983, 1.3626914, 0.6912818, 1.5158009,
1.3626914, 1.5158009, 1.5227328, 1.2246774, 1.1437131, 0.8852983,
0.8852983, 1.5227328, 0.6381394, 1.3626914, 1.5227328, 1.5227328)
Note that in your data, for some reason there are multiple points with the same values.
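If the repeated grid points get in the way, one option is to sort once with order() and drop the repeated x values before drawing. A minimal sketch on a small subset of the values above (the names ord, xs, ys and keep are only illustrative):

```r
# Small subset of the question's values, deliberately unsorted and with repeats
x <- c(0.100001, 0.600001, 0.500001, 0.800001, 0.500001, 0.100001)
y <- c(1.2246774, 1.1437131, 1.3626914, 0.6381394, 1.3626914, 1.2246774)

ord <- order(x)            # permutation that sorts x
xs <- x[ord]
ys <- y[ord]

keep <- !duplicated(xs)    # keep the first occurrence of each grid point
xs <- xs[keep]
ys <- ys[keep]

plot(xs, ys, type = "l")   # drawn left to right as a single line
```

Using the same permutation for both vectors keeps each y paired with its own x, which is what sort(x) together with y[order(x)] achieves as well.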


building circular graphic for angles in degrees

I have data on turning angles for a group of animals, separated by occupation area (breeding ground, migratory route, feeding area).
I need to plot a circular graphic in R of the angle values, in degrees, for each area.
The angle values look like this in the data frame:
[1] NA 41.027 -43.410 29.056 18.241 -7.125 -4.702 0.298
[9] 37.846 -7.545 -69.403 -7.376 17.289 7.927 60.752 -85.219
[17] 24.218 -17.482 3.703 -3.901 -8.582 -84.871 38.448 44.028
[25] -150.796 -59.679 -169.927 -6.862 51.130 -1.784 -16.468 -2.356
[33] 5.645 -6.988 4.750 -5.707 2.949 -6.150 -4.129 0.869
[41] -1.935 5.130 0.559 4.686 145.086 14.324 -169.206 1.741
[49] 53.595 15.315 36.892 49.279 21.171 10.739 122.553 -141.081
[57] 3.126 48.323 -7.139 163.742 141.473 47.320 128.430 175.918
[65] 7.447 -16.159 55.957 37.351 -2.703 -25.308 -31.338 NA
[73] NA -16.028 25.110 -31.085 -92.887 88.917 146.903 -148.539
[81] -11.576 41.030 -155.616 -129.368 -32.886 -164.284 -120.785 118.591
[89] 68.335 -98.038 40.347 166.333 19.495 -170.337 -178.322 99.111
Can someone help me with this simple question? Thank you!
It is not clear what you want, but here are two visualizations that might help. The first just plots points on the unit circle to show the angles. The second version has lines in the directions of turn. BTW, I simply left out your NAs.
Data at the bottom
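# Convert degrees to radians and place each angle on the unit circle
x = cos(pi*Turns/180)
y = sin(pi*Turns/180)
par(mfrow=c(1,2))
# Left panel: points on the unit circle
plot(x,y, pch=20, col="#22222266", asp=1)
# Right panel: same points, plus a segment from the origin in each turn direction
plot(x,y, pch=20, col="#22222266", asp=1)
segments(0, 0, x, y)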
Data
Turns = c(41.027, -43.410, 29.056, 18.241, -7.125, -4.702, 0.298,
37.846, -7.545, -69.403, -7.376, 17.289, 7.927, 60.752, -85.219,
24.218, -17.482, 3.703, -3.901, -8.582, -84.871, 38.448, 44.028,
-150.796, -59.679, -169.927, -6.862, 51.130, -1.784, -16.468, -2.356,
5.645, -6.988, 4.750, -5.707, 2.949, -6.150, -4.129, 0.869,
-1.935, 5.130, 0.559, 4.686, 145.086, 14.324, -169.206, 1.741,
53.595, 15.315, 36.892, 49.279, 21.171, 10.739, 122.553, -141.081,
3.126, 48.323, -7.139, 163.742, 141.473, 47.320, 128.430, 175.918,
7.447, -16.159, 55.957, 37.351, -2.703, -25.308, -31.338,
-16.028, 25.110, -31.085, -92.887, 88.917, 146.903, -148.539,
-11.576, 41.030, -155.616, -129.368, -32.886, -164.284, -120.785, 118.591,
68.335, -98.038, 40.347, 166.333, 19.495, -170.337, -178.322, 99.111)
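Since the question asks for one plot per area, here is a rough sketch of how the same idea might be split by group. The data frame turns, its column names area and angle, and the assignment of angles to areas are all made up for illustration; only the angle values come from the question.

```r
# Hypothetical layout: one row per turn, with the area it was recorded in
turns <- data.frame(
  area  = rep(c("breeding", "migration", "feeding"), each = 4),
  angle = c(41.027, -43.410, NA, 18.241,
            -7.125, 60.752, -85.219, 24.218,
            145.086, -169.206, 1.741, 53.595)
)

turns <- turns[!is.na(turns$angle), ]   # drop missing angles
rad <- turns$angle * pi / 180           # degrees to radians
turns$px <- cos(rad)                    # position on the unit circle
turns$py <- sin(rad)

by_area <- split(turns, turns$area)     # one data frame per area

op <- par(mfrow = c(1, length(by_area)))
for (nm in names(by_area)) {
  d <- by_area[[nm]]
  plot(d$px, d$py, pch = 20, asp = 1,
       xlim = c(-1, 1), ylim = c(-1, 1),
       xlab = "cos(angle)", ylab = "sin(angle)", main = nm)
  segments(0, 0, d$px, d$py)            # line in the direction of each turn
}
par(op)                                 # restore the previous layout
```

Fixing xlim and ylim to the same range in every panel keeps the circles comparable across areas.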

Plot gaussian distribution over density of prices return under R [duplicate]

Instead of printing a nice curve, R is printing a dense mess of line segments. The edge of that dense mess looks exactly like the curve I'm trying to print, though.
I have datasets X2 and Y2 plotted and am trying to print a quadratic curve on the existing plot. Here's my code:
X22 <- X2^2
model2s <- lm(Y2 ~ X2 + X22)
plot(X2,Y2)
lines(X2,fitted(model2s))
X2:
[1] -2.69725933 -1.54282303 -1.91720835 -0.08528522 -2.57551112 -2.65955930 1.66190727 0.01135419
[9] -1.67597429 -0.46931267 1.31551076 1.78942814 -0.54821881 -2.93750249 -0.63519111 -2.17234702
[17] 2.26156660 -2.13808807 -0.74155513 2.65037057 2.44828088 -2.52896408 -2.02068505 -1.36222982
[25] 1.97171562 -0.27897421 -2.12562749 -0.85870780 0.71198294 1.24482512 0.20295272 -1.58949497
[33] -0.59396637 0.45486252 2.51659763 2.62181364 2.20407646 1.06466931 -1.43400604 0.01579675
[41] -0.33513385 -0.05453015 1.96167436 1.28448541 -2.69429783 -1.08642394 -0.09400599 2.98775967
[49] 2.05795131 1.58896527 0.67934349 -0.13352141 -0.52543898 -2.40677026 -0.13610972 -1.31887725
[57] -1.56066767 -1.35457660 1.16511448 -2.55372404 -2.28185200 -0.19699659 1.84159785 1.24092476
[65] -2.90374380 2.29220701 1.22968228 2.60137009 0.87307737 2.71556663 0.94467230 0.96922155
[73] 1.89863312 1.64500729 1.37186380 -1.87455109 1.15276643 0.26130981 -1.84580809 -1.32085543
[81] -2.41207641 0.19248616 -1.65741770 2.13950098 -1.69597327 -0.06976200 1.14711285 2.97132615
[89] 0.71798324 -1.02838913 0.44070700 2.07600642 -0.21917452 -0.36556134 2.60091749 -1.41738042
[97] 1.04864677 0.83080236 2.56432957 -0.72499588 -0.81415858 -2.49700816 -2.72860601 0.49777866
Y2:
[1] -9.00479135 -1.56827264 -3.85069478 3.80620694 -7.78195591 -6.21173824 4.24967581 3.39550072
[9] -2.51108153 1.71820705 5.44931613 4.97755290 2.66081793 -11.34941655 0.27113981 -5.27374362
[17] 3.55191243 -3.64065638 1.27630806 4.20004221 5.53455823 -6.48854059 -4.17995733 -2.00651295
[25] 3.72495467 0.68337096 -4.28579895 -1.37001146 4.87616860 6.06427661 1.70089898 -3.07543568
[33] 2.90859968 4.12792739 2.76034855 3.87910950 4.14718875 4.73100437 -2.38820139 3.32093131
[41] 1.86320165 3.27669364 2.46242358 4.92157619 -7.90548937 -0.75929903 2.94267998 1.74858185
[49] 3.45587195 3.74016585 4.00274064 2.93845395 1.85504582 -4.30620277 3.40285048 -1.11881798
[57] -0.50718093 -0.43403754 2.54878083 -6.90253145 -5.37796863 3.25636120 3.41966211 3.40255742
[65] -8.59066220 1.82125444 3.20829746 2.46454987 4.09421369 1.79725157 5.61761174 4.55423983
[73] 3.12240983 2.86139737 4.00807877 -4.19551852 4.63684416 4.82350596 -2.73656766 -1.69755051
[81] -4.16628941 2.60384722 -2.77361082 3.98215540 -2.73349536 1.61857480 4.05148933 3.57791895
[89] 3.35775758 1.13832332 4.17317062 1.62551176 1.15076311 2.24591763 1.99284489 -2.35373088
[97] 3.86807106 5.50186659 2.51879877 0.82435797 1.56822937 -9.69863069 -7.75684415 3.61224550
lines() draws segments between the points in the order they are given. Your X2 is unsorted, so the curve jumps back and forth across the plot. Sort by X2 first. Something like
plot(X2,Y2)
ox2 <- order(X2)
lines(X2[ox2],fitted(model2s)[ox2])
should give you what you're looking for.
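Another option is to predict on a sorted grid instead of reordering the fitted values. This is only a sketch on simulated data: the simulated X2 and Y2, the grid size of 200 and the names model, grid and pred are assumptions for illustration. I() lets the squared term sit inside the formula, so predict() can rebuild it from X2 alone.

```r
set.seed(1)
X2 <- runif(100, -3, 3)                      # simulated, unsorted predictor
Y2 <- 3 + 2 * X2 - X2^2 + rnorm(100)         # simulated quadratic response

model <- lm(Y2 ~ X2 + I(X2^2))               # quadratic fit

# Evenly spaced, already sorted x values covering the data range
grid <- data.frame(X2 = seq(min(X2), max(X2), length.out = 200))
pred <- predict(model, newdata = grid)

plot(X2, Y2)
lines(grid$X2, pred)                         # smooth single curve
```

A grid also gives a smooth curve where the observed X2 values are sparse.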

Understanding xgb.dump

I'm trying to understand the intuition behind the xgb.dump output of a binary classification model with an interaction depth of 1. Specifically, how is the same split (f38 < 2.5) used twice in a row (output lines 2 and 6)?
The resulting output looks like this:
xgb.dump(model_2,with.stats=T)
[1] "booster[0]"
[2] "0:[f38<2.5] yes=1,no=2,missing=1,gain=173.793,cover=6317"
[3] "1:leaf=-0.0366182,cover=3279.75"
[4] "2:leaf=-0.0466305,cover=3037.25"
[5] "booster[1]"
[6] "0:[f38<2.5] yes=1,no=2,missing=1,gain=163.887,cover=6314.25"
[7] "1:leaf=-0.035532,cover=3278.65"
[8] "2:leaf=-0.0452568,cover=3035.6"
Is the difference between the first use of f38 and the second use of f38 simply the residual fitting going on? At first it seemed weird to me, and I'm trying to understand exactly what's going on here!
Thanks!
Is the difference between the first use of f38 and the second use of f38 simply the residual fitting going on?
Most likely yes. It updates the gradient after the first round and then finds the same feature and split point again, as in your example.
Here's a reproducible example.
Note how I lower the learning rate in the second example and it finds the same feature and the same split point again in all three rounds. In the first example it uses different features in all 3 rounds.
require(xgboost)
data(agaricus.train, package='xgboost')
train <- agaricus.train
dtrain <- xgb.DMatrix(data = train$data, label=train$label)
# high learning rate: each tree splits on different features (root: f28, f59, f101; node 1: f55, f28, f66)
bst <- xgboost(data = train$data, label = train$label, max_depth = 2, eta = 1, nrounds = 3,nthread = 2, objective = "binary:logistic")
xgb.dump(model = bst)
# [1] "booster[0]" "0:[f28<-9.53674e-07] yes=1,no=2,missing=1"
# [3] "1:[f55<-9.53674e-07] yes=3,no=4,missing=3" "3:leaf=1.71218"
# [5] "4:leaf=-1.70044" "2:[f108<-9.53674e-07] yes=5,no=6,missing=5"
# [7] "5:leaf=-1.94071" "6:leaf=1.85965"
# [9] "booster[1]" "0:[f59<-9.53674e-07] yes=1,no=2,missing=1"
# [11] "1:[f28<-9.53674e-07] yes=3,no=4,missing=3" "3:leaf=0.784718"
# [13] "4:leaf=-0.96853" "2:leaf=-6.23624"
# [15] "booster[2]" "0:[f101<-9.53674e-07] yes=1,no=2,missing=1"
# [17] "1:[f66<-9.53674e-07] yes=3,no=4,missing=3" "3:leaf=0.658725"
# [19] "4:leaf=5.77229" "2:[f110<-9.53674e-07] yes=5,no=6,missing=5"
# [21] "5:leaf=-0.791407" "6:leaf=-9.42142"
## lower learning rate (eta = .01): every tree repeats the same splits (root f28, node 1 f55, node 2 f108)
bst2 <- xgboost(data = train$data, label = train$label, max_depth = 2, eta = .01, nrounds = 3,nthread = 2, objective = "binary:logistic")
xgb.dump(model = bst2)
# [1] "booster[0]" "0:[f28<-9.53674e-07] yes=1,no=2,missing=1"
# [3] "1:[f55<-9.53674e-07] yes=3,no=4,missing=3" "3:leaf=0.0171218"
# [5] "4:leaf=-0.0170044" "2:[f108<-9.53674e-07] yes=5,no=6,missing=5"
# [7] "5:leaf=-0.0194071" "6:leaf=0.0185965"
# [9] "booster[1]" "0:[f28<-9.53674e-07] yes=1,no=2,missing=1"
# [11] "1:[f55<-9.53674e-07] yes=3,no=4,missing=3" "3:leaf=0.016952"
# [13] "4:leaf=-0.0168371" "2:[f108<-9.53674e-07] yes=5,no=6,missing=5"
# [15] "5:leaf=-0.0192151" "6:leaf=0.0184251"
# [17] "booster[2]" "0:[f28<-9.53674e-07] yes=1,no=2,missing=1"
# [19] "1:[f55<-9.53674e-07] yes=3,no=4,missing=3" "3:leaf=0.0167863"
# [21] "4:leaf=-0.0166737" "2:[f108<-9.53674e-07] yes=5,no=6,missing=5"
# [23] "5:leaf=-0.0190286" "6:leaf=0.0182581"
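To see why a small learning rate keeps picking the same split, here is a base R toy version of boosting with depth-1 trees. It is a sketch under loud assumptions: it uses squared-error loss (where the negative gradient is simply the residual) rather than xgboost's logistic objective, the data are simulated, and best_stump is a made-up helper, not anything from the xgboost package.

```r
# Made-up helper: best single split of r on x, judged by squared error
best_stump <- function(x, r) {
  vals <- sort(unique(x))
  cuts <- (head(vals, -1) + tail(vals, -1)) / 2    # midpoints between values
  sse <- sapply(cuts, function(cc) {
    left <- r[x < cc]
    right <- r[x >= cc]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  cc <- cuts[which.min(sse)]
  list(cut = cc, left = mean(r[x < cc]), right = mean(r[x >= cc]))
}

set.seed(42)
x <- sample(0:5, 200, replace = TRUE)                # simulated feature
y <- ifelse(x < 2.5, 1, -1) + rnorm(200, sd = 0.3)   # step at 2.5 plus noise

eta <- 0.01                    # small learning rate
pred <- rep(0, length(y))
cuts <- numeric(3)
leaves <- numeric(3)
for (i in 1:3) {
  res <- y - pred                                    # residuals to fit this round
  s <- best_stump(x, res)
  pred <- pred + eta * ifelse(x < s$cut, s$left, s$right)
  cuts[i] <- s$cut                                   # split chosen this round
  leaves[i] <- eta * s$left                          # shrunken left-leaf value
}
cuts     # the same split point every round
leaves   # slightly smaller in magnitude each round
```

Each round removes only a small fraction of the signal, so the residuals still have the same structure and the same split wins again, with slightly smaller leaf values. That is the pattern in the question's dump, where the leaf goes from -0.0366182 to -0.035532.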

Dividing components of a vector into several data points in R

I am trying to turn a vector of length n (say, 14) into a vector of length N (say, 90). For example, my vector is
x<-c(5,3,7,11,12,19,40,2,22,6,10,12,12,4)
and I want to turn it into a vector of length 90 by creating 90 equally "spaced" points on this vector; think of x as a function. Is there any way to do that in R?
Something like this?
> x<-c(5,3,7,11,12,19,40,2,22,6,10,12,12,4)
> seq(min(x),max(x),length=90)
[1] 2.000000 2.426966 2.853933 3.280899 3.707865 4.134831 4.561798
[8] 4.988764 5.415730 5.842697 6.269663 6.696629 7.123596 7.550562
[15] 7.977528 8.404494 8.831461 9.258427 9.685393 10.112360 10.539326
[22] 10.966292 11.393258 11.820225 12.247191 12.674157 13.101124 13.528090
[29] 13.955056 14.382022 14.808989 15.235955 15.662921 16.089888 16.516854
[36] 16.943820 17.370787 17.797753 18.224719 18.651685 19.078652 19.505618
[43] 19.932584 20.359551 20.786517 21.213483 21.640449 22.067416 22.494382
[50] 22.921348 23.348315 23.775281 24.202247 24.629213 25.056180 25.483146
[57] 25.910112 26.337079 26.764045 27.191011 27.617978 28.044944 28.471910
[64] 28.898876 29.325843 29.752809 30.179775 30.606742 31.033708 31.460674
[71] 31.887640 32.314607 32.741573 33.168539 33.595506 34.022472 34.449438
[78] 34.876404 35.303371 35.730337 36.157303 36.584270 37.011236 37.438202
[85] 37.865169 38.292135 38.719101 39.146067 39.573034 40.000000
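Try this:
# data
x <- c(5,3,7,11,12,19,40,2,22,6,10,12,12,4)
# target length
N <- 90
# number of points to generate for each pair of neighbouring values
my.length.out <- round((N-length(x))/(length(x)-1))+1
# interpolate linearly between each pair of neighbours
# (the length is only approximately N: here 13 segments * 7 points = 91,
# and each interior value of x appears twice)
x1 <- unlist(
  lapply(1:(length(x)-1), function(i)
    seq(x[i],x[i+1],length.out = my.length.out)))
# plot the original and the expanded vector
par(mfrow=c(2,1))
plot(x)
plot(x1)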
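If the result has to contain exactly N points, linear interpolation over the index with approx() is one alternative. A minimal sketch, treating the position in x as the horizontal axis (x_interp is just an illustrative name):

```r
x <- c(5, 3, 7, 11, 12, 19, 40, 2, 22, 6, 10, 12, 12, 4)
N <- 90

# Interpolate x over its index 1..14 at N equally spaced positions
x_interp <- approx(seq_along(x), x, n = N)$y

length(x_interp)   # exactly N values

par(mfrow = c(2, 1))
plot(x, type = "b")
plot(x_interp, type = "b")
```

The first and last values of x are kept, and the shape of x as a function of its position is preserved. The original interior values are generally not hit exactly, because 90 positions do not line up with the 14 indices.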

When to use approxfun vs. approx

The documentation for approxfun states that it is "often more useful than approx". I'm struggling to get my head around approxfun. When would approxfun be more useful than approx (and when would approx be more useful)?
approx returns the value of the approximated function at (either) specified points or at a given number of points. approxfun returns a function which can then be evaluated at some specific points. If you need the approximation at points that you know at the time of making the approximation, approx will do that for you. If you need a function (in the mathematical sense) which will return the value of the approximation for some argument given later, approxfun is what you need.
Here are some examples.
dat <- data.frame(x=1:10, y=(1:10)^2)
The output from approx and approxfun using this data
> approx(dat$x, dat$y)
$x
[1] 1.000000 1.183673 1.367347 1.551020 1.734694 1.918367 2.102041
[8] 2.285714 2.469388 2.653061 2.836735 3.020408 3.204082 3.387755
[15] 3.571429 3.755102 3.938776 4.122449 4.306122 4.489796 4.673469
[22] 4.857143 5.040816 5.224490 5.408163 5.591837 5.775510 5.959184
[29] 6.142857 6.326531 6.510204 6.693878 6.877551 7.061224 7.244898
[36] 7.428571 7.612245 7.795918 7.979592 8.163265 8.346939 8.530612
[43] 8.714286 8.897959 9.081633 9.265306 9.448980 9.632653 9.816327
[50] 10.000000
$y
[1] 1.000000 1.551020 2.102041 2.653061 3.204082 3.755102
[7] 4.510204 5.428571 6.346939 7.265306 8.183673 9.142857
[13] 10.428571 11.714286 13.000000 14.285714 15.571429 17.102041
[19] 18.755102 20.408163 22.061224 23.714286 25.448980 27.469388
[25] 29.489796 31.510204 33.530612 35.551020 37.857143 40.244898
[31] 42.632653 45.020408 47.408163 49.918367 52.673469 55.428571
[37] 58.183673 60.938776 63.693878 66.775510 69.897959 73.020408
[43] 76.142857 79.265306 82.551020 86.040816 89.530612 93.020408
[49] 96.510204 100.000000
> approxfun(dat$x, dat$y)
function (v)
.C(C_R_approxfun, as.double(x), as.double(y), as.integer(n),
xout = as.double(v), as.integer(length(v)), as.integer(method),
as.double(yleft), as.double(yright), as.double(f), NAOK = TRUE,
PACKAGE = "stats")$xout
<bytecode: 0x05244854>
<environment: 0x030632fc>
More examples of usage:
a <- approx(dat$x, dat$y)
af <- approxfun(dat$x, dat$y)
plot(dat)
points(a, pch=2)
plot(dat)
curve(af, add=TRUE)
or another example where a function is needed:
> uniroot(function(x) {af(x)-4}, interval=c(1,10))
$root
[1] 1.999994
$f.root
[1] -1.736297e-05
$iter
[1] 24
$estim.prec
[1] 6.103516e-05
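To make the relationship concrete, here is a small sketch showing that the two give the same numbers when the evaluation points are known up front. The vector v and the names from_approx and from_approxfun are only illustrative.

```r
dat <- data.frame(x = 1:10, y = (1:10)^2)
v <- c(1.5, 2.25, 7.9)              # points chosen for illustration

# approx: evaluation points must be supplied in the call
from_approx <- approx(dat$x, dat$y, xout = v)$y

# approxfun: build the interpolating function once, evaluate whenever needed
af <- approxfun(dat$x, dat$y)
from_approxfun <- af(v)

all.equal(from_approx, from_approxfun)   # TRUE

af(3.5)   # the function can be reused later for other points
```

So the choice is mostly about convenience: approx for a one-off evaluation, approxfun when the interpolant is passed to something like curve() or uniroot().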
