I am trying to calculate a 20-day rolling percent change in R based on a stock's closing price. Below is a sample of the most recent 100 days of closing prices; df$Close[1] is the most recent day, df$Close[2] is the previous day, and so on.
df$Close
[1] 342.94 346.22 346.18 335.24 330.45 334.20 325.45 333.79 334.90 341.66 333.74 334.49 329.75 329.82 330.56 322.81 317.87 306.84
[19] 310.39 310.60 324.46 338.03 333.12 341.06 337.25 341.01 345.30 338.69 340.77 342.96 347.56 340.89 327.74 327.64 335.37 338.62
[37] 341.13 335.85 331.62 328.08 329.98 323.57 316.92 312.22 315.81 328.69 324.61 341.88 340.78 339.99 335.34 324.76 328.53 324.54
[55] 323.77 325.45 330.05 329.22 333.64 332.96 326.23 343.01 339.39 339.61 340.65 353.58 352.96 345.96 343.21 357.48 355.70 364.72
[73] 373.06 373.92 376.53 376.51 378.69 378.00 377.57 382.18 376.26 375.28 382.05 379.38 380.66 372.63 364.38 368.39 365.51 363.35
[91] 359.37 355.12 355.45 358.45 366.56 363.18 362.65 359.96 361.13 361.61
Previously, I had used the following code to calculate the percent change:
PercChange(df, Var = 'Close', type = 'percent', NewVar = 'OneMonthChange', slideBy = 20)
which gave me the following output:
df$OneMonthChange
[1] 5.695617e-02 2.422862e-02 3.920509e-02 -1.706445e-02 -2.016308e-02 -1.997009e-02 -5.748624e-02 -1.446751e-02 -1.722569e-02
[10] -3.790530e-03 -3.976292e-02 -1.877438e-02 6.132910e-03 6.653644e-03 -1.434237e-02 -4.668950e-02 -6.818515e-02 -8.637785e-02
[19] -6.401906e-02 -5.327969e-02 -1.672829e-02 4.468894e-02 5.111700e-02 9.237076e-02 6.788892e-02 3.748213e-02 6.373802e-02
[28] -9.330759e-03 -2.934445e-05 8.735551e-03 3.644063e-02 4.966745e-02 -2.404651e-03 9.551981e-03 3.582790e-02 4.046705e-02
[37] 3.357067e-02 2.013851e-02 -6.054430e-03 -1.465642e-02 1.149496e-02 -5.667473e-02 -6.620702e-02 -8.065134e-02 -7.291942e-02
[46] -7.039425e-02 -8.032072e-02 -1.179327e-02 -7.080213e-03 -4.892581e-02 -5.723925e-02 -1.095635e-01 -1.193642e-01 -1.320603e-01
[55] -1.401216e-01 -1.356139e-01 -1.284428e-01 -1.290476e-01 -1.163493e-01 -1.287875e-01 -1.329666e-01 -8.598913e-02 -1.116608e-01
[64] -1.048289e-01 -1.051069e-01 -5.112310e-02 -3.134091e-02 -6.088656e-02 -6.101064e-02 -1.615522e-02 -1.021232e-02 2.703312e-02
[73] 4.954283e-02 4.315804e-02 2.719882e-02 3.670356e-02 4.422997e-02 5.011668e-02 4.552377e-02 5.688449e-02 3.507469e-02
[82] 3.391465e-02 6.444333e-02 8.011616e-02 8.157409e-02 4.583216e-02 1.691226e-02 -1.310009e-02 -6.253229e-03 -2.445900e-02
[91] -2.817816e-02 1.119052e-02 2.662970e-02 4.914242e-02 8.787654e-02 6.454450e-02 5.280729e-02 3.546875e-02 2.567525e-02
[100] 2.392683e-02
The PercChange function has now been deprecated and I need to find a replacement. Essentially, I need a function that calculates the percent change of df$Close[1:20] (that is, the Close of day 1 minus the Close of day 20, divided by the Close of day 20), then rolls to [2:21] for the next row, then [3:22], [4:23], and so on.
Thanks in advance!
A tidyverse approach
library(tidyr)
library(dplyr)
df %>%
  mutate(OneMonthChange = (Close - lead(Close, 20)) / lead(Close, 20),
         OneMonthChange = replace_na(OneMonthChange, 0))
Close OneMonthChange
1 342.94 5.695617e-02
2 346.22 2.422862e-02
3 346.18 3.920509e-02
4 335.24 -1.706445e-02
5 330.45 -2.016308e-02
6 334.20 -1.997009e-02
etc...
Here is a simple Base R solution:
PercChange <- function(x, slideBy){
  # -diff(x, slideBy) gives x[i] - x[i + slideBy]; tail(x, -slideBy) is x[i + slideBy]
  -diff(x, slideBy) / tail(x, -slideBy)
}
PercChange(df$Close, slideBy = 20)
[1] 5.695617e-02 2.422862e-02 3.920509e-02 -1.706445e-02
[5] -2.016308e-02 -1.997009e-02 -5.748624e-02 -1.446751e-02
[9] -1.722569e-02 -3.790530e-03 -3.976292e-02 -1.877438e-02
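As a quick sanity check of the arithmetic (my addition), the first value compares Close[1] with the value 20 positions further along the vector, Close[21]:

(342.94 - 324.46) / 324.46
# [1] 0.05695617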
If you desire a data frame back, then modify this into:
PercChange <- function(data, Var, NewVar, slideBy){
  x <- data[[Var]]
  # pad the tail with zeros so the new column matches the number of rows
  data[NewVar] <- c(-diff(x, slideBy) / tail(x, -slideBy), numeric(slideBy))
  data
}
PercChange(df, Var = 'Close', NewVar = 'OneMonthChange', slideBy = 20)
data:
df <- structure(list(Close = c(342.94, 346.22, 346.18, 335.24, 330.45,
334.2, 325.45, 333.79, 334.9, 341.66, 333.74, 334.49, 329.75,
329.82, 330.56, 322.81, 317.87, 306.84, 310.39, 310.6, 324.46,
338.03, 333.12, 341.06, 337.25, 341.01, 345.3, 338.69, 340.77,
342.96, 347.56, 340.89, 327.74, 327.64, 335.37, 338.62, 341.13,
335.85, 331.62, 328.08, 329.98, 323.57, 316.92, 312.22, 315.81,
328.69, 324.61, 341.88, 340.78, 339.99, 335.34, 324.76, 328.53,
324.54, 323.77, 325.45, 330.05, 329.22, 333.64, 332.96, 326.23,
343.01, 339.39, 339.61, 340.65, 353.58, 352.96, 345.96, 343.21,
357.48, 355.7, 364.72, 373.06, 373.92, 376.53, 376.51, 378.69,
378, 377.57, 382.18, 376.26, 375.28, 382.05, 379.38, 380.66,
372.63, 364.38, 368.39, 365.51, 363.35, 359.37, 355.12, 355.45,
358.45, 366.56, 363.18, 362.65, 359.96, 361.13, 361.61)), class = "data.frame", row.names = c(NA,
-100L))
I am using the R programming language.
Using the following code, I am able to put two plots on the same page:
#load library
library(dbscan)
#specify number of plots per page
par(mfrow = c(1,2))
#load libraries
library(dbscan)
library(dplyr)
#generate data
n <- 100
x <- cbind(
  x = runif(10, 0, 5) + rnorm(n, sd = 0.4),
  y = runif(10, 0, 5) + rnorm(n, sd = 0.4)
)
### calculate LOF score
lof <- lof(x, k=3)
### distribution of outlier factors (first plot)
summary(lof)
hist(lof, breaks=10)
### point size is proportional to LOF (second plot)
plot(x, pch = ".", main = "LOF (k=3)")
points(x, cex = (lof-1)*3, pch = 1, col="red")
This produces the following plot:
Now, I am trying to make several plots (e.g. 6 plots, 2 pairs of 3) on the same page. I tried to implement this with a "for loop" (for k = 3, 4, 5):
par(mfrow = c(3,2))
vals <- 3:5
combine <- vector('list', length(vals))
count <- 0
for (i in vals) {
  lof_i <- lof(x, k = i)
  ### distribution of outlier factors
  summary(lof_i)
  hist(lof_i, breaks = 10)
  ### point size is proportional to LOF
  plot(x, pch = ".", main = "LOF (k=i)")
  points(x, cex = (lof_i - 1) * 3, pch = 1, col = "red")
}
However, this seems to just repeat the same graph 6 times on the same page:
Can someone please show me how to correct this code?
Is it also possible to save the files "lof_3, lof_4, lof_5"? It seems that none of these files are created, only "lof_i" is created:
> lof_3
Error: object 'lof_3' not found
> head(lof_i)
[1] 1.223307 1.033424 1.077149 1.011407 1.040634 1.431029
Thanks
Looking at your plots, you do seem to have generated and plotted different results, but to get the labels right you need to pass a variable rather than a fixed character string to the title (e.g. using the paste command).
To get the calculated values out of your loop you could either create an empty list and assign the results to individual list elements inside the loop, or use something like lapply, which automatically returns the results as a list.
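For illustration, here is a minimal sketch of that loop-based variant (my addition, assuming the same x and vals as in your question); paste0 builds the title from the loop value and the list keeps every result:

par(mfrow = c(3, 2))
vals <- 3:5
results <- vector("list", length(vals))        # empty list to collect the LOF vectors
names(results) <- paste0("lof_", vals)
for (i in vals) {
  lof_i <- lof(x, k = i)
  results[[paste0("lof_", i)]] <- lof_i        # store instead of overwriting
  hist(lof_i, breaks = 10)
  plot(x, pch = ".", main = paste0("LOF (k=", i, ")"))   # title built from i
  points(x, cex = (lof_i - 1) * 3, pch = 1, col = "red")
}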
To simplify things a bit you could define a function that either plots or returns the calculated values, e.g. like this:
library(dbscan)
#generate data
set.seed(123)
n <- 100
x <- cbind(
  x = runif(10, 0, 5) + rnorm(n, sd = 0.4),
  y = runif(10, 0, 5) + rnorm(n, sd = 0.4)
)
plotLOF <- function(i, plot = TRUE){
  lof <- lof(x, k = i)
  if (plot){
    hist(lof, breaks = 10)
    plot(x, pch = ".", main = paste0("LOF (k=", i, ")"))
    points(x, cex = (lof - 1) * 3, pch = 1, col = "red")
  } else return(lof)
}
par(mfrow = c(3,2))
invisible(lapply(3:5, plotLOF))
lapply(3:5, plotLOF, plot=FALSE)
#> [[1]]
#> [1] 1.1419243 0.9551471 1.0777472 1.1224447 0.8799095 1.0377858 0.8416306
#> [8] 1.0487133 1.0250496 1.3183819 0.9896833 1.0353398 1.3088266 1.0123238
#> [15] 1.1233530 0.9685039 1.0589151 1.3147785 1.0488644 0.9212146 1.2568698
#> [22] 1.0086274 1.0454450 0.9661698 1.0644528 1.1107202 1.0942201 1.5147076
#> [29] 1.0321698 1.0553455 1.1149748 0.9341090 1.2352716 0.9478602 1.4096464
#> [36] 1.0519127 1.0507267 1.3199825 1.2525485 0.9361488 1.0958563 1.2131615
#> [43] 0.9943090 1.0123238 1.1060491 1.0377766 0.9803135 0.9627699 1.1165421
#> [50] 0.9796819 0.9946925 2.1576989 1.6015310 1.5670315 0.9343637 1.0033725
#> [57] 0.8769431 0.9783065 1.0800050 1.2768800 0.9735274 1.0377472 1.0743988
#> [64] 1.7583562 1.2662485 0.9685039 1.1662145 1.2491499 1.1131718 1.0085023
#> [71] 0.9636864 1.1538360 1.2126138 1.0609829 1.0679010 1.0490234 1.1403292
#> [78] 0.9638900 1.1863703 0.9651060 0.9503445 1.0098536 0.8440855 0.9052420
#> [85] 1.2662485 1.4447713 1.0845415 1.0661381 0.9282678 0.9380078 1.1414628
#> [92] 1.0407138 1.0942201 1.0589805 1.0370938 1.0147094 1.1067291 0.8834466
#> [99] 1.7027132 1.1766560
#>
#> [[2]]
#> [1] 1.1667311 1.0409009 1.0920953 1.0068953 0.9894195 1.1332413 0.9764505
#> [8] 1.0228796 1.0446905 1.0893386 1.1211637 1.1029415 1.3453498 0.9712910
#> [15] 1.1635936 1.0265746 0.9480282 1.2144437 1.0570346 0.9314618 1.3345561
#> [22] 0.9816097 0.9929112 1.0322014 1.2739621 1.2947553 1.0202948 1.6153264
#> [29] 1.0790922 0.9987830 1.0378609 0.9622779 1.2974938 0.9129639 1.2601398
#> [36] 1.0265746 1.0241622 1.2420568 1.2204376 0.9297345 1.1148404 1.2546361
#> [43] 1.0059582 0.9819820 1.0342491 0.9452673 1.0369500 0.9791091 1.2000825
#> [50] 0.9878844 1.0205586 2.0057587 1.2757014 1.5347815 0.9622614 1.0692613
#> [57] 1.0026404 0.9408510 1.0280687 1.3534531 0.9669894 0.9300601 0.9929112
#> [64] 1.7567871 1.3861828 1.0265746 1.1120151 1.3542396 1.1562077 0.9842179
#> [71] 1.0301098 1.2326327 1.1866352 1.0403814 1.0577086 0.8745912 1.0017905
#> [78] 0.9904356 1.0602487 0.9501681 1.0176457 1.0405430 0.9718224 1.0046821
#> [85] 1.1909982 1.6151918 0.9640852 1.0141963 1.0270237 0.9867738 1.1474414
#> [92] 1.1293307 1.0323945 1.0859417 0.9622614 1.0290635 1.0186381 0.9225209
#> [99] 1.6456612 1.1366753
#>
#> [[3]]
#> [1] 1.1299335 1.0122028 1.2077092 0.9485150 1.0115694 1.1190314 0.9989174
#> [8] 1.0145663 1.0357546 0.9783702 1.1050504 1.0661798 1.3571416 1.0024603
#> [15] 1.1484745 1.0162149 0.9601474 1.1310442 1.0957731 1.0065501 1.2687934
#> [22] 0.9297323 0.9725355 0.9876444 1.2314822 1.2209304 0.9906446 1.4249452
#> [29] 1.2156607 0.9959685 1.0304305 0.9976110 1.1711354 1.0048161 0.9813000
#> [36] 1.0128909 0.9730295 1.1741982 1.3317209 0.9708714 1.0994309 1.1900047
#> [43] 0.9960765 0.9659553 0.9744357 0.9556112 1.0508484 0.9669406 1.3919743
#> [50] 0.9467537 1.0596883 1.7396644 1.1323109 1.6516971 0.9922995 1.0223594
#> [57] 0.9917594 0.9542419 1.0672565 1.2274498 1.0589385 0.9649404 0.9953886
#> [64] 1.7666795 1.3111620 0.9860706 1.0576620 1.2547512 1.0038281 0.9825967
#> [71] 1.0104708 1.1739417 1.1884817 1.0199412 0.9956941 0.9720389 0.9601474
#> [78] 0.9898781 1.1025485 0.9797453 1.0086780 1.0556471 1.0150204 1.0339022
#> [85] 1.1174116 1.5252177 0.9721734 0.9486663 1.0161640 0.9903872 1.2339874
#> [92] 1.0753099 0.9819882 1.0439012 1.0016272 1.0122706 1.0536213 0.9948601
#> [99] 1.4693656 1.0274264
Created on 2021-02-22 by the reprex package (v1.0.0)
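If objects named lof_3, lof_4 and lof_5 are specifically wanted (the second part of your question), a small follow-up sketch of my own: name the list returned by lapply and, only if separate objects are really needed, push them into the global environment with list2env():

res <- setNames(lapply(3:5, plotLOF, plot = FALSE), paste0("lof_", 3:5))
list2env(res, envir = globalenv())   # creates lof_3, lof_4, lof_5 in the workspace
head(lof_3)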
A general pattern that can be very handy for plotting graphs with loops is to build the call as a string and evaluate it:

for (i in vector) {
  eval(parse(text = sprintf("plot(df$%s)", i)))
}

This is a very powerful line of code. Applied to your loop:

for (i in vals) {
  eval(parse(text = sprintf('lof_%s <- lof(x, k=%s)', i, i)))
  ### distribution of outlier factors
  eval(parse(text = sprintf('summary(lof_%s)', i)))
  eval(parse(text = sprintf('hist(lof_%s, breaks=10)', i)))
  ### point size is proportional to LOF
  eval(parse(text = sprintf("plot(x, pch = '.', main = 'LOF (k=%s)')", i)))
  eval(parse(text = sprintf("points(x, cex = (lof_%s-1)*3, pch = 1, col='red')", i)))
}
Explanation:
eval() - evaluates the expression.
parse() - parses the text into an expression that can be evaluated.
sprintf() - builds the string by substituting the loop value into the template.
Your code is not working because, inside the string "LOF (k=i)", i is treated as a literal character rather than the value from the iterator, and lof_i is a single object that is overwritten on every pass instead of creating lof_3, lof_4 and lof_5. If you want to see how the expression is constructed, just run sprintf('lof_%s <- lof(x, k=%s)', i, i) on its own and look at the output.
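For example, for i = 3 the constructed string is:

sprintf('lof_%s <- lof(x, k=%s)', 3, 3)
# [1] "lof_3 <- lof(x, k=3)"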
I'm running loess.smooth on data that I have already smoothed with a spline method.
The input given below is the data I get after running the spline step.
However, I'm going wrong with the loess.smooth call: the entire first column of the output is returned as floats, but I need it as integers increasing by 1.
Any help would be much appreciated.
Thanks
**input:** spline_file
1 0.157587435
2 0.146704412
3 0.129899285
4 0.138925582
5 0.104085676
out <- loess.smooth(spline_file$x, spline_file$y, span = 1, degree = 1,
                    family = c("gaussian"), length.out = seq(1, max_exp, by = 1),
                    surface = "interpolate", normalize = TRUE, method = "linear")
**OUTPUT:**
0 0.150404703
1.020408163 0.154413716
2.040816327 0.158458172
3.06122449 0.162515428
4.081632653 0.166562839
5.102040816 0.170577762
**OUTPUT REQUIRED:**
x y
1 0.225926707
2 0.226026551
3 0.226241194
4 0.2265471
5 0.226920733
Not sure if the following fully answers your question, but maybe it helps. Below are some code, a demonstrative plot and some explanations/recommendations.
You should not use a degree of 1; your data requires a higher degree.
You should check the allowed parameters via ?loess.smooth. I think you mixed up some parameters of scatter.smooth and loess.smooth, and you also used some parameters that do not exist for the function (e.g. normalize - please correct me if I have overlooked something).
In any case it makes sense that the output of a spline smoothing function has more data points than the original data: to be able to plot a smooth curve, additional points are generated between your data points by the smoothing function. Check the plot generated at the end of the code below. Whether the fit is good is another question...
spline_file <- read.table(text = "
1 0.157587435
2 0.146704412
3 0.129899285
4 0.138925582
5 0.104085676
", stringsAsFactors = FALSE)
colnames(spline_file) <- c("x", "y")
spline_loess <- loess.smooth(spline_file$x, spline_file$y, span = 1, degree = 2,
                             family = c("gaussian"),
                             surface = "interpolate",
                             statistics = "exact")
spline_loess
# $x
# [1] 1.000000 1.081633 1.163265 1.244898 1.326531 1.408163 1.489796
# [8] 1.571429 1.653061 1.734694 1.816327 1.897959 1.979592 2.061224
# [15] 2.142857 2.224490 2.306122 2.387755 2.469388 2.551020 2.632653
# [22] 2.714286 2.795918 2.877551 2.959184 3.040816 3.122449 3.204082
# [29] 3.285714 3.367347 3.448980 3.530612 3.612245 3.693878 3.775510
# [36] 3.857143 3.938776 4.020408 4.102041 4.183673 4.265306 4.346939
# [43] 4.428571 4.510204 4.591837 4.673469 4.755102 4.836735 4.918367
# [50] 5.000000
#
# $y
# [1] 0.1586807 0.1571512 0.1556485 0.1541759 0.1527367 0.1513344
# [7] 0.1499721 0.1486533 0.1473813 0.1461595 0.1449911 0.1438795
# [13] 0.1428280 0.1417881 0.1406496 0.1394364 0.1381783 0.1369053
# [19] 0.1356473 0.1344341 0.1332957 0.1322619 0.1313626 0.1306278
# [25] 0.1300873 0.1297791 0.1297453 0.1299324 0.1302747 0.1307066
# [31] 0.1311626 0.1315769 0.1318839 0.1320181 0.1319138 0.1315054
# [37] 0.1307273 0.1295270 0.1281453 0.1266888 0.1251504 0.1235232
# [43] 0.1218002 0.1199744 0.1180388 0.1159866 0.1138105 0.1115038
# [49] 0.1090594 0.1064704
plot(spline_file)
lines(spline_loess)
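If you specifically need the smoothed values at the original integer x positions (your required output), one option (my own suggestion, not part of the code above) is to interpolate the loess.smooth result back onto those positions with approx():

# evaluate the smoothed curve at x = 1, 2, ..., 5
at_int <- approx(spline_loess$x, spline_loess$y, xout = spline_file$x)
data.frame(x = at_int$x, y = at_int$y)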
I am trying to take a vector of length n (say, 14) and turn it into a vector of length N (say, 90). For example, my vector is
x<-c(5,3,7,11,12,19,40,2,22,6,10,12,12,4)
and I want to turn it into a vector of length 90 by creating 90 equally "spaced" points on this vector - think of x as a function. Is there any way to do that in R?
Something like this?
> x<-c(5,3,7,11,12,19,40,2,22,6,10,12,12,4)
> seq(min(x),max(x),length=90)
[1] 2.000000 2.426966 2.853933 3.280899 3.707865 4.134831 4.561798
[8] 4.988764 5.415730 5.842697 6.269663 6.696629 7.123596 7.550562
[15] 7.977528 8.404494 8.831461 9.258427 9.685393 10.112360 10.539326
[22] 10.966292 11.393258 11.820225 12.247191 12.674157 13.101124 13.528090
[29] 13.955056 14.382022 14.808989 15.235955 15.662921 16.089888 16.516854
[36] 16.943820 17.370787 17.797753 18.224719 18.651685 19.078652 19.505618
[43] 19.932584 20.359551 20.786517 21.213483 21.640449 22.067416 22.494382
[50] 22.921348 23.348315 23.775281 24.202247 24.629213 25.056180 25.483146
[57] 25.910112 26.337079 26.764045 27.191011 27.617978 28.044944 28.471910
[64] 28.898876 29.325843 29.752809 30.179775 30.606742 31.033708 31.460674
[71] 31.887640 32.314607 32.741573 33.168539 33.595506 34.022472 34.449438
[78] 34.876404 35.303371 35.730337 36.157303 36.584270 37.011236 37.438202
[85] 37.865169 38.292135 38.719101 39.146067 39.573034 40.000000
Try this:
#data
x <- c(5, 3, 7, 11, 12, 19, 40, 2, 22, 6, 10, 12, 12, 4)
#expected new length
N <- 90
#number of points to generate between each pair of consecutive values
my.length.out <- round((N - length(x)) / (length(x) - 1)) + 1
#new data
x1 <- unlist(
  lapply(1:(length(x) - 1), function(i)
    seq(x[i], x[i + 1], length.out = my.length.out)))
#plot
par(mfrow = c(2, 1))
plot(x)
plot(x1)
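A related base R sketch (my addition, assuming plain linear interpolation between the original points is acceptable): approx() can return exactly N equally spaced points in one call:

x90 <- approx(seq_along(x), x, n = N)$y   # N = 90 points along the "function" x
length(x90)
# [1] 90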
I have data called veteran stored in R. I created a survival model and now wish to produce survival probability predictions. For example, what is the probability that a patient with karno = 80, diagtime = 10, age = 65, prior = 10 and trt = 2 lives longer than 100 days?
In this case the design matrix is x = (1,0,1,0,80,10,65,10,2)
Here is my code:
library(survival)
attach(veteran)
weibull <- survreg(Surv(time,status)~celltype + karno+diagtime+age+prior+trt ,dist="w")
and here is the output:
Any idea how to predict the survival probabilities?
You can get predict.survreg to produce predicted survival times for an individual case (supplied via newdata) at varying quantiles:
casedat <- list(celltype = "smallcell", karno = 80, diagtime = 10, age = 65, prior = 10, trt = 2)
predict(weibull, newdata = casedat, type = "quantile", p = (1:98)/100)
[1] 1.996036 3.815924 5.585873 7.330350 9.060716 10.783617
[7] 12.503458 14.223414 15.945909 17.672884 19.405946 21.146470
[13] 22.895661 24.654597 26.424264 28.205575 29.999388 31.806521
[19] 33.627761 35.463874 37.315609 39.183706 41.068901 42.971927
[25] 44.893525 46.834438 48.795420 50.777240 52.780679 54.806537
[31] 56.855637 58.928822 61.026962 63.150956 65.301733 67.480255
[37] 69.687524 71.924578 74.192502 76.492423 78.825521 81.193029
[43] 83.596238 86.036503 88.515246 91.033959 93.594216 96.197674
[49] 98.846083 **101.541291** 104.285254 107.080043 109.927857 112.831032
[55] 115.792052 118.813566 121.898401 125.049578 128.270334 131.564138
[61] 134.934720 138.386096 141.922598 145.548909 149.270101 153.091684
[67] 157.019655 161.060555 165.221547 169.510488 173.936025 178.507710
[73] 183.236126 188.133044 193.211610 198.486566 203.974520 209.694281
[79] 215.667262 221.917991 228.474741 235.370342 242.643219 250.338740
[85] 258.511005 267.225246 276.561118 286.617303 297.518110 309.423232
[91] 322.542621 337.160149 353.673075 372.662027 395.025122 422.263020
[97] 457.180183 506.048094
#asterisks added
You can then figure out which quantile is the first one greater than the specified time; here it sits right around the 50th percentile, just as one might expect from a homework question.
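A minimal sketch of that lookup (my addition): since type = "quantile" returns the time by which a fraction p of such patients has died, the probability of surviving past 100 days is roughly 1 minus the smallest p whose predicted quantile exceeds 100:

p <- (1:98)/100
q_times <- predict(weibull, newdata = casedat, type = "quantile", p = p)
1 - p[which(q_times >= 100)[1]]   # about 0.5 for this patient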
png()
plot(x = predict(weibull, newdata = casedat, type = "quantile", p = (1:98)/100),
     y = (1:98)/100, type = "l")
dev.off()