R: simulate ts object with dates

Wondering how to generate time series and assign dates at the same time. I am trying this
series = as.ts(arima.sim(model = list(ar = c(0.12, -.36)),
                         n = 1990 - 1875, sd = sqrt(4)),
               start = 1875, deltat = 1)
But this does not return a ts object that counts the years from 1875. By my reckoning this should work. Any advice appreciated.

You are correct. I re-typed your code rather than cutting and pasting it, and the issue is both where the start parameter ends up and the use of as.ts instead of plain ts:
asim <- arima.sim(model = list(ar = c(0.12, -.36)),
                  n = 1990 - 1875, sd = sqrt(4))
series <- ts(asim, start = c(1875, 1), deltat = 1)
print(series)
Time Series:
Start = 1875
End = 1989
Frequency = 1
[1] -1.22873543 -2.87876290 -3.00367322 -0.93120214 1.76854684 0.93874091 -2.32494289
[8] 1.14892019 1.87773156 1.48735536 -0.84149973 -3.69650397 1.20710878 2.14151424
[15] -2.58376182 -2.97501726 2.77019523 4.50829433 0.35603642 -1.95517140 -1.12792253
[22] 1.64063413 2.25654663 -0.51293345 1.07829896 -1.77134896 2.38908172 4.29362478
[29] -1.55577635 1.17953083 3.39823289 1.11846543 -0.92758706 -1.24158935 -2.39831233
[36] 4.24302415 2.93797283 -0.75916084 -0.66967525 2.85022663 -0.18190842 -5.39057660
[43] 0.08454559 2.01667062 -3.17054706 -3.77788365 0.19987174 2.87106608 -0.33844973
[50] 1.20917997 -1.00509230 3.23130604 5.80269444 3.33781468 2.67050526 1.85130774
[57] -0.46065144 -2.79539368 0.29784271 -4.51945793 0.61091013 2.56372897 -4.66101520
[64] 2.43024521 0.04428268 -1.19454953 -3.10583191 4.55208114 6.00037902 -3.32996632
[71] 2.22167610 1.07499343 1.89873604 2.04067084 -3.43648828 -0.53093294 0.66225057
[78] -2.30214366 0.78945348 0.35241170 -0.68250626 1.39801271 -1.01914282 -0.33615058
[85] 0.92311887 1.66289752 -0.83158693 -0.74454853 6.53884660 1.53567335 -2.16745416
[92] -0.01540633 -1.25032821 -0.02958796 3.18116493 -2.07512219 -1.40620668 -0.78869155
[99] 2.30251140 -2.23997817 0.34824690 4.81898402 -0.38751197 -5.74540148 -0.37754295
[106] 2.59869857 -1.90175430 0.37994317 -1.27326292 -3.96302760 -2.01928982 2.57643462
[113] 2.62600151 -4.20987173 0.46388883
Since asim is already a ts-class object, as.ts simply returns it with its existing tsp attribute (silently ignoring the start and deltat arguments), whereas ts builds a fresh tsp attribute from the parameters you supply.
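A quick way to see this (a small sketch, not part of the original answer) is to compare the tsp attribute before and after:
tsp(asim)   # 1 115 1     -- arima.sim() starts its series at time 1
tsp(series) # 1875 1989 1 -- rebuilt by ts() from start and deltat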


Calculate a rolling percent change in R

I am trying to calculate a 20-day rolling percent change in R based off of a stock's closing price. Below is a sample of the most recent 100 days of closing price data. df$Close[1] is the most recent day, df$Close[2] is the previous day, and so on.
df$Close
[1] 342.94 346.22 346.18 335.24 330.45 334.20 325.45 333.79 334.90 341.66 333.74 334.49 329.75 329.82 330.56 322.81 317.87 306.84
[19] 310.39 310.60 324.46 338.03 333.12 341.06 337.25 341.01 345.30 338.69 340.77 342.96 347.56 340.89 327.74 327.64 335.37 338.62
[37] 341.13 335.85 331.62 328.08 329.98 323.57 316.92 312.22 315.81 328.69 324.61 341.88 340.78 339.99 335.34 324.76 328.53 324.54
[55] 323.77 325.45 330.05 329.22 333.64 332.96 326.23 343.01 339.39 339.61 340.65 353.58 352.96 345.96 343.21 357.48 355.70 364.72
[73] 373.06 373.92 376.53 376.51 378.69 378.00 377.57 382.18 376.26 375.28 382.05 379.38 380.66 372.63 364.38 368.39 365.51 363.35
[91] 359.37 355.12 355.45 358.45 366.56 363.18 362.65 359.96 361.13 361.61
Previously, I had used the following code to calculate the percent change:
PercChange(df, Var = 'Close', type = 'percent', NewVar = 'OneMonthChange', slideBy = 20)
which gave me the following output:
df$OneMonthChange
[1] 5.695617e-02 2.422862e-02 3.920509e-02 -1.706445e-02 -2.016308e-02 -1.997009e-02 -5.748624e-02 -1.446751e-02 -1.722569e-02
[10] -3.790530e-03 -3.976292e-02 -1.877438e-02 6.132910e-03 6.653644e-03 -1.434237e-02 -4.668950e-02 -6.818515e-02 -8.637785e-02
[19] -6.401906e-02 -5.327969e-02 -1.672829e-02 4.468894e-02 5.111700e-02 9.237076e-02 6.788892e-02 3.748213e-02 6.373802e-02
[28] -9.330759e-03 -2.934445e-05 8.735551e-03 3.644063e-02 4.966745e-02 -2.404651e-03 9.551981e-03 3.582790e-02 4.046705e-02
[37] 3.357067e-02 2.013851e-02 -6.054430e-03 -1.465642e-02 1.149496e-02 -5.667473e-02 -6.620702e-02 -8.065134e-02 -7.291942e-02
[46] -7.039425e-02 -8.032072e-02 -1.179327e-02 -7.080213e-03 -4.892581e-02 -5.723925e-02 -1.095635e-01 -1.193642e-01 -1.320603e-01
[55] -1.401216e-01 -1.356139e-01 -1.284428e-01 -1.290476e-01 -1.163493e-01 -1.287875e-01 -1.329666e-01 -8.598913e-02 -1.116608e-01
[64] -1.048289e-01 -1.051069e-01 -5.112310e-02 -3.134091e-02 -6.088656e-02 -6.101064e-02 -1.615522e-02 -1.021232e-02 2.703312e-02
[73] 4.954283e-02 4.315804e-02 2.719882e-02 3.670356e-02 4.422997e-02 5.011668e-02 4.552377e-02 5.688449e-02 3.507469e-02
[82] 3.391465e-02 6.444333e-02 8.011616e-02 8.157409e-02 4.583216e-02 1.691226e-02 -1.310009e-02 -6.253229e-03 -2.445900e-02
[91] -2.817816e-02 1.119052e-02 2.662970e-02 4.914242e-02 8.787654e-02 6.454450e-02 5.280729e-02 3.546875e-02 2.567525e-02
[100] 2.392683e-02
The PercChange function has now been deprecated and I need to find a new function to replace it. Essentially, I need a function that calculates the percent change between df$Close[1] and df$Close[21] (close of day 1 minus close of day 21, divided by close of day 21), then rolls to [2] and [22], then [3] and [23], and so on.
Thanks in advance!
A tidyverse approach
library(tidyr)
library(dplyr)
df %>%
  mutate(OneMonthChange = (Close - lead(Close, 20)) / lead(Close, 20),
         OneMonthChange = replace_na(OneMonthChange, 0))
Close OneMonthChange
1 342.94 5.695617e-02
2 346.22 2.422862e-02
3 346.18 3.920509e-02
4 335.24 -1.706445e-02
5 330.45 -2.016308e-02
6 334.20 -1.997009e-02
etc...
Here is a simple Base R solution:
PercChange <- function(x, slideBy) {
  # (x[i] - x[i + slideBy]) / x[i + slideBy] for every i with a full window
  -diff(x, slideBy) / tail(x, -slideBy)
}
PercChange(df$Close, slideBy = 20)
[1] 5.695617e-02 2.422862e-02 3.920509e-02 -1.706445e-02
[5] -2.016308e-02 -1.997009e-02 -5.748624e-02 -1.446751e-02
[9] -1.722569e-02 -3.790530e-03 -3.976292e-02 -1.877438e-02
If you desire a data frame back, then modify it to:
PercChange <- function(data, Var, NewVar, slideBy) {
  x <- data[[Var]]
  # pad with zeros so the new column matches nrow(data)
  data[NewVar] <- c(-diff(x, slideBy) / tail(x, -slideBy), numeric(slideBy))
  data
}
PercChange(df, Var = 'Close', NewVar = 'OneMonthChange', slideBy = 20)
data:
df <- structure(list(Close = c(342.94, 346.22, 346.18, 335.24, 330.45,
334.2, 325.45, 333.79, 334.9, 341.66, 333.74, 334.49, 329.75,
329.82, 330.56, 322.81, 317.87, 306.84, 310.39, 310.6, 324.46,
338.03, 333.12, 341.06, 337.25, 341.01, 345.3, 338.69, 340.77,
342.96, 347.56, 340.89, 327.74, 327.64, 335.37, 338.62, 341.13,
335.85, 331.62, 328.08, 329.98, 323.57, 316.92, 312.22, 315.81,
328.69, 324.61, 341.88, 340.78, 339.99, 335.34, 324.76, 328.53,
324.54, 323.77, 325.45, 330.05, 329.22, 333.64, 332.96, 326.23,
343.01, 339.39, 339.61, 340.65, 353.58, 352.96, 345.96, 343.21,
357.48, 355.7, 364.72, 373.06, 373.92, 376.53, 376.51, 378.69,
378, 377.57, 382.18, 376.26, 375.28, 382.05, 379.38, 380.66,
372.63, 364.38, 368.39, 365.51, 363.35, 359.37, 355.12, 355.45,
358.45, 366.56, 363.18, 362.65, 359.96, 361.13, 361.61)), class = "data.frame", row.names = c(NA,
-100L))
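As a quick sanity check (a sketch added for illustration, not part of either answer), the base R and tidyverse versions agree on every row where a 20-day lead exists:
base_res <- -diff(df$Close, 20) / tail(df$Close, -20)
tidy_res <- with(df, (Close - dplyr::lead(Close, 20)) / dplyr::lead(Close, 20))
all.equal(base_res, head(tidy_res, 80))  # TRUE; the last 20 rows of tidy_res are NA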

Tuning SMOTE's K with a trafo fails: 'warning("k should be less than sample size!")'

I'm having trouble with the trafo function for SMOTE {smotefamily}'s K parameter. In particular, when the number of nearest neighbours K is greater than or equal to the sample size, an error is thrown (preceded by warning("k should be less than sample size!")) and the tuning process is terminated.
The user cannot control K to be smaller than the sample size during the internal resampling process. This would have to be controlled internally so that if, for instance, trafo_K = 2 ^ K >= sample_size for some value of K, then, say, trafo_K = sample_size - 1.
I was wondering if there's a solution to this or if one is already on its way?
library("mlr3") # mlr3 base package
library("mlr3misc") # contains some helper functions
library("mlr3pipelines") # create ML pipelines
library("mlr3tuning") # tuning ML algorithms
library("mlr3learners") # additional ML algorithms
library("mlr3viz") # autoplot for benchmarks
library("paradox") # hyperparameter space
library("OpenML") # to obtain data sets
library("smotefamily") # SMOTE algorithm for imbalance correction
# get list of curated binary classification data sets (see https://arxiv.org/abs/1708.03731v2)
ds = listOMLDataSets(
number.of.classes = 2,
number.of.features = c(1, 100),
number.of.instances = c(5000, 10000)
)
# select imbalanced data sets (without categorical features as SMOTE cannot handle them)
ds = subset(ds, minority.class.size / number.of.instances < 0.2 &
number.of.symbolic.features == 1)
ds
d = getOMLDataSet(980)
d
# make sure target is a factor and create mlr3 tasks
data = as.data.frame(d)
data[[d$target.features]] = as.factor(data[[d$target.features]])
task = TaskClassif$new(
id = d$desc$name, backend = data,
target = d$target.features)
task
# Code above copied from https://mlr3gallery.mlr-org.com/posts/2020-03-30-imbalanced-data/
class_counts <- table(task$truth())
majority_to_minority_ratio <- class_counts[class_counts == max(class_counts)] /
class_counts[class_counts == min(class_counts)]
# Pipe operator for SMOTE
po_smote <- po("smote", dup_size = round(majority_to_minority_ratio))
# Random Forest learner
rf <- lrn("classif.ranger", predict_type = "prob")
# Pipeline of Random Forest learner with SMOTE
graph <- po_smote %>>%
po('learner', rf, id = 'rf')
graph$plot()
# Graph learner
rf_smote <- GraphLearner$new(graph, predict_type = 'prob')
rf_smote$predict_type <- 'prob'
# Parameter set in data table format
ps_table <- as.data.table(rf_smote$param_set)
View(ps_table[, 1:4])
# Define parameter search space for the SMOTE parameters
param_set <- ps_table$id %>%
lapply(
function(x) {
if (grepl('smote.', x)) {
if (grepl('.dup_size', x)) {
ParamInt$new(x, lower = 1, upper = round(majority_to_minority_ratio))
} else if (grepl('.K', x)) {
ParamInt$new(x, lower = 1, upper = round(majority_to_minority_ratio))
}
}
}
)
param_set <- Filter(Negate(is.null), param_set)
param_set <- ParamSet$new(param_set)
# Apply transformation function on SMOTE's K (= The number of nearest neighbors used for sampling new values. See SMOTE().)
param_set$trafo <- function(x, param_set) {
index <- which(grepl('.K', names(x)))
if (sum(index) != 0){
x[[index]] <- round(3 ^ x[[index]]) # Intentionally define a trafo that won't work
}
x
}
# Define and instantiate resampling strategy to be applied within pipeline
cv <- rsmp("cv", folds = 2)
cv$instantiate(task)
# Set up tuning instance
instance <- TuningInstance$new(
task = task,
learner = rf_smote,
resampling = cv,
measures = msr("classif.bbrier"),
param_set,
terminator = term("evals", n_evals = 3),
store_models = TRUE)
tuner <- TunerRandomSearch$new()
# Tune pipe learner to find optimal SMOTE parameter values
tuner$optimize(instance)
And here's what happens:
INFO [11:00:14.904] Benchmark with 2 resampling iterations
INFO [11:00:14.919] Applying learner 'smote.rf' on task 'optdigits' (iter 2/2)
Error in get.knnx(data, query, k, algorithm) : ANN: ERROR------->
In addition: Warning message:
In get.knnx(data, query, k, algorithm) : k should be less than sample size!
Session info
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] smotefamily_1.3.1 OpenML_1.10 mlr3viz_0.1.1.9002
[4] mlr3tuning_0.1.2-9000 mlr3pipelines_0.1.2.9000 mlr3misc_0.2.0
[7] mlr3learners_0.2.0 mlr3filters_0.2.0.9000 mlr3_0.2.0-9000
[10] paradox_0.2.0 yardstick_0.0.5 rsample_0.0.5
[13] recipes_0.1.9 parsnip_0.0.5 infer_0.5.1
[16] dials_0.0.4 scales_1.1.0 broom_0.5.4
[19] tidymodels_0.0.3 reshape2_1.4.3 janitor_1.2.1
[22] data.table_1.12.8 forcats_0.4.0 stringr_1.4.0
[25] dplyr_0.8.4 purrr_0.3.3 readr_1.3.1
[28] tidyr_1.0.2 tibble_3.0.1 ggplot2_3.3.0
[31] tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] utf8_1.1.4 tidyselect_1.0.0 lme4_1.1-21
[4] htmlwidgets_1.5.1 grid_3.6.2 ranger_0.12.1
[7] pROC_1.16.1 munsell_0.5.0 codetools_0.2-16
[10] bbotk_0.1 DT_0.12 future_1.17.0
[13] miniUI_0.1.1.1 withr_2.2.0 colorspace_1.4-1
[16] knitr_1.28 uuid_0.1-4 rstudioapi_0.10
[19] stats4_3.6.2 bayesplot_1.7.1 listenv_0.8.0
[22] rstan_2.19.2 lgr_0.3.4 DiceDesign_1.8-1
[25] vctrs_0.2.4 generics_0.0.2 ipred_0.9-9
[28] xfun_0.12 R6_2.4.1 markdown_1.1
[31] mlr3measures_0.1.3-9000 rstanarm_2.19.2 lhs_1.0.1
[34] assertthat_0.2.1 promises_1.1.0 nnet_7.3-12
[37] gtable_0.3.0 globals_0.12.5 processx_3.4.1
[40] timeDate_3043.102 rlang_0.4.5 workflows_0.1.1
[43] BBmisc_1.11 splines_3.6.2 checkmate_2.0.0
[46] inline_0.3.15 yaml_2.2.1 modelr_0.1.5
[49] tidytext_0.2.2 threejs_0.3.3 crosstalk_1.0.0
[52] backports_1.1.6 httpuv_1.5.2 rsconnect_0.8.16
[55] tokenizers_0.2.1 tools_3.6.2 lava_1.6.6
[58] ellipsis_0.3.0 ggridges_0.5.2 Rcpp_1.0.4.6
[61] plyr_1.8.5 base64enc_0.1-3 visNetwork_2.0.9
[64] ps_1.3.0 prettyunits_1.1.1 rpart_4.1-15
[67] zoo_1.8-7 haven_2.2.0 fs_1.3.1
[70] furrr_0.1.0 magrittr_1.5 colourpicker_1.0
[73] reprex_0.3.0 GPfit_1.0-8 SnowballC_0.6.0
[76] packrat_0.5.0 matrixStats_0.55.0 tidyposterior_0.0.2
[79] hms_0.5.3 shinyjs_1.1 mime_0.8
[82] xtable_1.8-4 XML_3.99-0.3 tidypredict_0.4.3
[85] shinystan_2.5.0 readxl_1.3.1 gridExtra_2.3
[88] rstantools_2.0.0 compiler_3.6.2 crayon_1.3.4
[91] minqa_1.2.4 StanHeaders_2.21.0-1 htmltools_0.4.0
[94] later_1.0.0 lubridate_1.7.4 DBI_1.1.0
[97] dbplyr_1.4.2 MASS_7.3-51.4 boot_1.3-23
[100] Matrix_1.2-18 cli_2.0.1 parallel_3.6.2
[103] gower_0.2.1 igraph_1.2.4.2 pkgconfig_2.0.3
[106] xml2_1.2.2 foreach_1.4.7 dygraphs_1.1.1.6
[109] prodlim_2019.11.13 farff_1.1 rvest_0.3.5
[112] snakecase_0.11.0 janeaustenr_0.1.5 callr_3.4.1
[115] digest_0.6.25 cellranger_1.1.0 curl_4.3
[118] shiny_1.4.0 gtools_3.8.1 nloptr_1.2.1
[121] lifecycle_0.2.0 nlme_3.1-142 jsonlite_1.6.1
[124] fansi_0.4.1 pillar_1.4.3 lattice_0.20-38
[127] loo_2.2.0 fastmap_1.0.1 httr_1.4.1
[130] pkgbuild_1.0.6 survival_3.1-8 glue_1.4.0
[133] xts_0.12-0 FNN_1.1.3 shinythemes_1.1.2
[136] iterators_1.0.12 class_7.3-15 stringi_1.4.4
[139] memoise_1.1.0 future.apply_1.5.0
Many thanks.
I've found a workaround.
As pointed out earlier, the problem is that SMOTE {smotefamily}'s K cannot be greater than or equal to the sample size.
I dug into the process and discovered that SMOTE {smotefamily} uses knearest {smotefamily}, which uses knnx.index {FNN}, which in turn uses get.knn {FNN}, and that is what returns the error warning("k should be less than sample size!") that terminates the tuning process in mlr3.
Now, within SMOTE {smotefamily}, the three arguments passed to knearest {smotefamily} are P_set, P_set and K. From an mlr3 resampling perspective, the data frame P_set is a subset of the cross-validation fold of the training data, filtered to contain only the records of the minority class. The 'sample size' that the error refers to is the number of rows of P_set.
Thus, it becomes more likely that K >= nrow(P_set) as K increases via a trafo such as some_integer ^ K (e.g. 2 ^ K).
We need to ensure that K never reaches or exceeds nrow(P_set).
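To see the underlying failure in isolation, here is a minimal sketch (toy data, nothing to do with the original task) that triggers the same message from FNN:
library(FNN)
# 'data' has only 3 rows, but 5 neighbours are requested, so k >= sample size
get.knnx(data.frame(x = 1:3), data.frame(x = 1:3), k = 5)
# Error in get.knnx(data, query, k, algorithm) : ANN: ERROR------->
# In addition: Warning message:
# In get.knnx(data, query, k, algorithm) : k should be less than sample size!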
Here's my proposed solution:
1. Define a variable cv_folds before defining the CV resampling strategy with rsmp().
2. Define the CV resampling strategy with folds = cv_folds in rsmp(), before defining the trafo.
3. Instantiate the CV. Now the dataset is split into training and test/validation data in each fold.
4. Find the minimum sample size of the minority class among all training-data folds and set that as the threshold for K:
smote_k_thresh <- 1:cv_folds %>%
  lapply(
    function(x) {
      index <- cv$train_set(x)
      aux <- as.data.frame(task$data())[index, task$target_names]
      aux <- min(table(aux))
    }
  ) %>%
  bind_cols %>%
  min %>%
  unique
Now define the trafo as follows:
param_set$trafo <- function(x, param_set) {
  index <- which(grepl('.K', names(x)))
  if (sum(index) != 0) {
    aux <- round(2 ^ x[[index]])
    if (aux < smote_k_thresh) {
      x[[index]] <- aux
    } else {
      x[[index]] <- sample(smote_k_thresh - 1, 1)
    }
  }
  x
}
In other words, when the trafoed K remains smaller than the sample size, keep it. Otherwise, set its value to be any number between 1 and smote_k_thresh - 1.
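Note that sample(smote_k_thresh - 1, 1) relies on R's convention that sample(n, 1) with a single scalar n draws uniformly from 1:n, so this branch assumes smote_k_thresh is at least 2:
sample(5, 1)  # one integer drawn uniformly from 1:5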
Implementation
Original code slightly modified to accommodate proposed tweaks:
library("mlr3learners") # additional ML algorithms
library("mlr3viz") # autoplot for benchmarks
library("paradox") # hyperparameter space
library("OpenML") # to obtain data sets
library("smotefamily") # SMOTE algorithm for imbalance correction
# get list of curated binary classification data sets (see https://arxiv.org/abs/1708.03731v2)
ds = listOMLDataSets(
number.of.classes = 2,
number.of.features = c(1, 100),
number.of.instances = c(5000, 10000)
)
# select imbalanced data sets (without categorical features as SMOTE cannot handle them)
ds = subset(ds, minority.class.size / number.of.instances < 0.2 &
number.of.symbolic.features == 1)
ds
d = getOMLDataSet(980)
d
# make sure target is a factor and create mlr3 tasks
data = as.data.frame(d)
data[[d$target.features]] = as.factor(data[[d$target.features]])
task = TaskClassif$new(
id = d$desc$name, backend = data,
target = d$target.features)
task
# Code above copied from https://mlr3gallery.mlr-org.com/posts/2020-03-30-imbalanced-data/
class_counts <- table(task$truth())
majority_to_minority_ratio <- class_counts[class_counts == max(class_counts)] /
class_counts[class_counts == min(class_counts)]
# Pipe operator for SMOTE
po_smote <- po("smote", dup_size = round(majority_to_minority_ratio))
# Define and instantiate resampling strategy to be applied within pipeline
# Do that BEFORE defining the trafo
cv_folds <- 2
cv <- rsmp("cv", folds = cv_folds)
cv$instantiate(task)
# Calculate max possible value for k-nearest neighbours
smote_k_thresh <- 1:cv_folds %>%
lapply(
function(x) {
index <- cv$train_set(x)
aux <- as.data.frame(task$data())[index, task$target_names]
aux <- min(table(aux))
}
) %>%
bind_cols %>%
min %>%
unique
# Random Forest learner
rf <- lrn("classif.ranger", predict_type = "prob")
# Pipeline of Random Forest learner with SMOTE
graph <- po_smote %>>%
po('learner', rf, id = 'rf')
graph$plot()
# Graph learner
rf_smote <- GraphLearner$new(graph, predict_type = 'prob')
rf_smote$predict_type <- 'prob'
# Parameter set in data table format
ps_table <- as.data.table(rf_smote$param_set)
View(ps_table[, 1:4])
# Define parameter search space for the SMOTE parameters
param_set <- ps_table$id %>%
lapply(
function(x) {
if (grepl('smote.', x)) {
if (grepl('.dup_size', x)) {
ParamInt$new(x, lower = 1, upper = round(majority_to_minority_ratio))
} else if (grepl('.K', x)) {
ParamInt$new(x, lower = 1, upper = round(majority_to_minority_ratio))
}
}
}
)
param_set <- Filter(Negate(is.null), param_set)
param_set <- ParamSet$new(param_set)
# Apply transformation function on SMOTE's K while ensuring it never equals or exceeds the sample size
param_set$trafo <- function(x, param_set) {
index <- which(grepl('.K', names(x)))
if (sum(index) != 0){
aux <- round(5 ^ x[[index]]) # Try a large value here for the sake of the example
if (aux < smote_k_thresh) {
x[[index]] <- aux
} else {
x[[index]] <- sample(smote_k_thresh - 1, 1)
}
}
x
}
# Set up tuning instance
instance <- TuningInstance$new(
task = task,
learner = rf_smote,
resampling = cv,
measures = msr("classif.bbrier"),
param_set,
terminator = term("evals", n_evals = 10),
store_models = TRUE)
tuner <- TunerRandomSearch$new()
# Tune pipe learner to find optimal SMOTE parameter values
tuner$optimize(instance)
# Here are the original K values
instance$archive$data
# And here are their transformations
instance$archive$data$opt_x

R + ggplot2: plot time series with linear regression with changepoint

I have time series data with 2 variables (x, y), and I am currently using base R plot to generate a plot like this.
The red lines are linear models fitted between consecutive change points.
The data looks like this.
X
[1] 559.2 559.8 560.6 561.1 561.2 561.8
[7] 562.4 563.0 563.4 563.5 563.5 563.5
[13] 563.5 563.5 563.5 563.5 563.8 564.5
[19] 565.3 565.9 566.4 566.5 566.7 567.4
[25] 567.6 568.5 569.3 570.3 571.6 572.2
[31] 572.5 573.6 574.1 575.5 576.9 578.1
[37] 579.0 580.1 580.9 581.4 581.8 583.1
[43] 583.8 584.4 585.2 586.0 586.1 586.2
[49] 586.8 587.4
y
[1] 115.4375 115.3008 115.2069 115.3306 115.3900 115.1189 114.8619
[8] 114.7992 114.7117 114.4722 114.7031 115.1358 115.4811 115.4500
[15] 115.6347 115.8286 115.8361 115.7986 115.9169 116.1225 116.1803
[22] 116.3794 116.2872 116.2517 116.3411 116.4167 116.5108 116.2900
[29] 116.3456 116.3658 116.1547 116.2042 116.1517 116.2083 116.3642
[36] 116.4347 116.5428 116.5119 116.5925 116.3969 116.2614 116.3494
[43] 116.1242 116.1469 116.0872 116.1000 116.2319 116.1225 116.1069
[50] 116.1364
I am calculating the change point manually from X.
Is this kind of plot possible in ggplot2? I.e., can I use ggplot2 to loop through the change points and fit a linear model to each segment?
Any help would be appreciated. Thanks.
# create some fake data
segment1 = 100:1 + runif(100) * 10
df1 = data.frame(value = segment1, time = 1:100, type = "segment1")
segment2 = 75:1 + runif(75) * 10
df2 = data.frame(value = segment2, time = 101:175, type = "segment2")
segment3 = 50:1 + runif(50) * 10
df3 = data.frame(value = segment3, time = 176:225, type = "segment3")
data.complete = rbind(df1, df2, df3)

# create the plot
require(ggplot2)
g = ggplot(data.complete, aes(x = time, y = value))
g = g + geom_line()
g = g + geom_smooth(method = "lm", aes(group = type))
g
To keep the underlying line graph connected, the group aesthetic must be set in the smoother rather than in the top-level ggplot() call.
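Applied to the question's own x/y data, the same idea looks like this (a sketch: the change points at indices 10 and 30 are placeholders for whatever your manual calculation yields):
# assign each observation to a segment; the break indices 10 and 30 are hypothetical
df <- data.frame(x = X, y = y,
                 type = cut(seq_along(X), breaks = c(0, 10, 30, length(X)),
                            labels = c("segment1", "segment2", "segment3")))
ggplot(df, aes(x = x, y = y)) +
  geom_line() +
  geom_smooth(method = "lm", se = FALSE, colour = "red", aes(group = type))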

Error while using sparse model matrix

Hi, I am programming in R and I want to use the xgboost function to predict a dummy variable.
Here's the code:
library(xgboost)
library(Matrix)
mydata <- read.csv(file.choose(), header = TRUE, sep = ",")
names(mydata)
[1] "Factor_Check" "Cor_Check" "Cor_Check4"
[4] "Cor_Check2" "n_tokens_title" "n_tokens_content"
[7] "n_unique_tokens" "n_non_stop_words" "n_non_stop_unique_tokens"
[10] "num_hrefs" "num_self_hrefs" "num_imgs"
[13] "num_videos" "average_token_length" "num_keywords"
[16] "data_channel_is_lifestyle" "data_channel_is_entertainment" "data_channel_is_bus"
[19] "data_channel_is_socmed" "data_channel_is_tech" "data_channel_is_world"
[22] "kw_min_min" "kw_max_min" "kw_avg_min"
[25] "kw_min_max" "kw_max_max" "kw_avg_max"
[28] "kw_min_avg" "kw_max_avg" "kw_avg_avg"
[31] "self_reference_min_shares" "self_reference_max_shares" "self_reference_avg_sharess"
[34] "weekday_is_monday" "weekday_is_tuesday" "weekday_is_wednesday"
[37] "weekday_is_thursday" "weekday_is_friday" "weekday_is_saturday"
[40] "weekday_is_sunday" "is_weekend" "LDA_00"
[43] "LDA_01" "LDA_02" "LDA_03"
[46] "LDA_04" "global_subjectivity" "global_sentiment_polarity"
[49] "global_rate_positive_words" "global_rate_negative_words" "rate_positive_words"
[52] "rate_negative_words" "avg_positive_polarity" "min_positive_polarity"
[55] "max_positive_polarity" "avg_negative_polarity" "min_negative_polarity"
[58] "max_negative_polarity" "title_subjectivity" "title_sentiment_polarity"
[61] "abs_title_subjectivity" "abs_title_sentiment_polarity" "TargetVarCont"
[64] "TargetVar1" "TargetVar2"
Factor_Check is a factor; the rest are numeric.
output.var <- "TargetVar2"
vars.to.exclude <- c("Factor_Check","Cor_Check","Cor_Check4","Cor_Check2","TargetVar1", "TargetVarCont")
Building the model based on 80% of the data:
train <- mydata[1:round(nrow(mydata) * 0.8), ]
train <- train[, !(names(train) %in% vars.to.exclude)]
Train <- Matrix::sparse.model.matrix(~ . - 1, data = train)
xgb <- xgboost(data = Train[, !(names(Train) %in% output.var)],
               label = Train[, output.var],
               max.depth = 2, eta = 1, nthread = 2, nround = 2,
               objective = "binary:logistic")
Train
Error: shinyjs: could not find the Shiny session object. This usually
happens when a shinyjs function is called from a context that wasn't
set up by a Shiny session.
Does anyone know why I am getting this error?
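One thing worth checking in the code itself (an observation, not a confirmed cause of the shinyjs message, which looks environmental): names() returns NULL on a sparse matrix, so !(names(Train) %in% output.var) evaluates to logical(0) and the column selection silently selects nothing; sparse matrices are indexed by colnames() instead. A minimal sketch of the difference:
library(Matrix)
m <- sparse.model.matrix(~ . - 1, data = data.frame(a = 1:3, b = 4:6))
names(m)     # NULL -- a dgCMatrix has Dimnames, not names
colnames(m)  # "a" "b" -- use this when subsetting by column name
m[, !(colnames(m) %in% "b"), drop = FALSE]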

Dividing components of a vector into several data points in R

I am trying to take a vector of length n (say, 14) and turn it into a vector of length N (say, 90). For example, my vector is
x <- c(5, 3, 7, 11, 12, 19, 40, 2, 22, 6, 10, 12, 12, 4)
and I want to turn it into a vector of length 90 by creating 90 equally "spaced" points on this vector (think of x as a function). Is there any way to do that in R?
Something like this?
> x<-c(5,3,7,11,12,19,40,2,22,6,10,12,12,4)
> seq(min(x),max(x),length=90)
[1] 2.000000 2.426966 2.853933 3.280899 3.707865 4.134831 4.561798
[8] 4.988764 5.415730 5.842697 6.269663 6.696629 7.123596 7.550562
[15] 7.977528 8.404494 8.831461 9.258427 9.685393 10.112360 10.539326
[22] 10.966292 11.393258 11.820225 12.247191 12.674157 13.101124 13.528090
[29] 13.955056 14.382022 14.808989 15.235955 15.662921 16.089888 16.516854
[36] 16.943820 17.370787 17.797753 18.224719 18.651685 19.078652 19.505618
[43] 19.932584 20.359551 20.786517 21.213483 21.640449 22.067416 22.494382
[50] 22.921348 23.348315 23.775281 24.202247 24.629213 25.056180 25.483146
[57] 25.910112 26.337079 26.764045 27.191011 27.617978 28.044944 28.471910
[64] 28.898876 29.325843 29.752809 30.179775 30.606742 31.033708 31.460674
[71] 31.887640 32.314607 32.741573 33.168539 33.595506 34.022472 34.449438
[78] 34.876404 35.303371 35.730337 36.157303 36.584270 37.011236 37.438202
[85] 37.865169 38.292135 38.719101 39.146067 39.573034 40.000000
Try this:
# data
x <- c(5, 3, 7, 11, 12, 19, 40, 2, 22, 6, 10, 12, 12, 4)
# expected new length
N = 90
# number of points from each value to the next
my.length.out = round((N - length(x)) / (length(x) - 1)) + 1
# new data: interpolate between each pair of consecutive values
x1 <- unlist(
  lapply(1:(length(x) - 1), function(i)
    seq(x[i], x[i + 1], length.out = my.length.out)))
# plot
par(mfrow = c(2, 1))
plot(x)
plot(x1)
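Base R's approx() does the same linear interpolation in one call and returns exactly N points (an alternative sketch, not part of the answers above):
x <- c(5, 3, 7, 11, 12, 19, 40, 2, 22, 6, 10, 12, 12, 4)
# treat x as a function sampled at 1:14 and resample it at 90 equally spaced points
x1 <- approx(seq_along(x), x, n = 90)$y
length(x1)  # 90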
