Function predict() for object of segmented.lme() - r

I have previously run mixed model analyses using glmer() in package lme4. I applied functions dredge() and get.models() in package MuMIn to quantify the top.models. I then used a model.avg() approach in package MuMIn to create a fitted object for function predict(). Finally, I created a newdata object called newdat, i.e. a new object for each predictor.
I then used newdatfinal <- predict(avModX, newdata = newdat, se.fit=TRUE, re.form=NA), where avModX is the averaged model derived from subset.top.models <- c(top.models[[1]],top.models[[1]]) and avModX <- model.avg(subset.top.models). This all works fine.
I now need to use predict() on a segmented.lme() object. The code for function segmented.lme() can be found here: https://www.researchgate.net/publication/292986444_segmented_mixed_models_in_R_code_and_data. A reference working paper is available here: https://www.researchgate.net/publication/292629179_Segmented_mixed_models_with_random_changepoints_in_R. This function allows for detection of differences in slope and provides changepoint estimates, i.e. a test for breakpoint(s) in the data.
I first used the function
global.model.lme <- lme(response ~ predictor1*predictor2*predictor3*
predictor4 + covariate1 + covariate2 + covariate3,
data = mydat,
random = list(block = pdDiag(~ 1 + predictor1),
transect = pdDiag(~ 1 + predictor1)),
na.action="na.fail")
and followed by function
global.model.seg <- segmented.lme(global.model.lme,
Z = predictor1,
random = list(block = pdDiag(~ 1 + predictor1 + U + G0),
transect = pdDiag(~ 1 + predictor1 + U + G0)),
psi.link = "identity")
Z = the 'segmented' covariate having a segmented relationship with the response, U = slope difference, G0 = the formula of random effects for changepoints (changepoint estimate)
I would now like to use the segmented.lme() object in function predict(), i.e. something like newdatfinal <- predict(global.model.seg, newdata = newdat, se.fit=TRUE, re.form=NA)
I currently get the error message:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "segmented.lme"
This is a reproducible subset of the original data:
structure(list(block = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8"), class = "factor"), transect = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("B1L", "B1M", "B1S", "B2L", "B2M", "B2S", "B3L", "B3M", "B3S", "B4L", "B4M", "B4S", "B5L", "B5M", "B5S", "B6L", "B6M", "B6S", "B7L", "B7M", "B7S", "B8L", "B8M", "B8S"), class = "factor"), predictor1 = c(28.63734661, 31.70995133, 27.40407982, 25.48842992, 21.81094637, 24.02032756), predictor2 = c(5.002945364, 6.85567854, 0, 22.470422, 0, 0), predictor3 = c(3.72, 3.55, 3.66, 3.65, 3.53, 3.66), predictor4 = c(504.8, 547.6, 499.7, 497.8, 473.8, 467.5), covariate1 = c(391L, 394L, 351L, 336L, 304L, 335L), covariate2 = c(0.96671086, 2.81939707, 0.899512367, 1.024730094, 1.641161861, 1.419433714), covariate3 = c(0.787505444, 0.641693911, 0.115804751, -0.041146951, 1.983567486, -0.451039179), response = c(0.81257636, 0.622662116, 0.490330786, 0.709929461, -0.156398286, -1.185175095)), .Names = c("block", "transect", "predictor1", "predictor2", "predictor3", "predictor4", "covariate1", "covariate2", "covariate3", "response"), row.names = c(NA, 6L), class = "data.frame")
and a reproducible subset of the newdat data:
structure(list(predictor1 = c(-0.441935, -0.433467318435754,0.424999636871508, -0.416531955307263, -0.408064273743017, -0.399596592178771), covariate1 = c(0L, 0L, 0L, 0L, 0L, 0L), covariate2 = c(0L, 0L, 0L, 0L, 0L, 0L), covariate3 = c(0L, 0L, 0L, 0L, 0L, 0L),
predictor2 = c(0L, 0L, 0L, 0L, 0L, 0L), predictor3 = c(0L,
0L, 0L, 0L, 0L, 0L), predictor4 = c(0L, 0L, 0L, 0L, 0L, 0L
)), .Names = c("predictor1", "covariate1", "covariate2", "covariate3", "predictor2", "predictor3", "predictor4"), row.names = c(NA, 6L), class = "data.frame")
Many thanks in advance for any advice.

segmented.lme is at a preliminary stage, so there is currently no predict method for it. However, since the algorithm relies on a working linear model, you could use the last one (at convergence) to make predictions:
predict(global.model.seg[[2]], ..)
Results should be carefully checked.
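To illustrate why predict() fails on the outer object but works on a model stored inside it, here is a minimal base R sketch. It does not use segmented.lme() at all: the class name "wrapped_fit" and the toy data are hypothetical, and an ordinary lm() fit stands in for the working model kept in the second list element.

```r
# Toy data and an ordinary linear model (stands in for the stored working model)
set.seed(1)
toy <- data.frame(x = 1:10)
toy$y <- 2 * toy$x + rnorm(10)
inner_fit <- lm(y ~ x, data = toy)

# Hypothetical wrapper: a list with its own class and no predict method
wrapped <- structure(list(info = "wrapper", fit = inner_fit),
                     class = "wrapped_fit")

# predict() on the wrapper fails: S3 dispatch finds no predict.wrapped_fit
err <- tryCatch(predict(wrapped, newdata = data.frame(x = 11)),
                error = function(e) conditionMessage(e))
print(err)

# predict() on the stored model dispatches to predict.lm and works
pred <- predict(wrapped[[2]], newdata = data.frame(x = 11))
print(pred)
```

Whether the predictions from the stored working model are meaningful for a segmented fit is a separate question, which is why the results should be checked carefully.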

Related

Error in rmse(., truth = variable, estimate = .pred) : unused arguments (truth = , estimate = .pred) in R Tidymodels (yardstick)

I am fitting a regression tree model, following this Tidymodels tutorial.
# Create a specification
tree_spec <- decision_tree() %>% set_engine("rpart")
# Set the mode to regression
reg_tree_spec <- tree_spec %>% set_mode("regression")
# Fit the model
reg_tree_fit <- fit(reg_tree_spec, loan_amount ~ ., kenya_data_df_train)
# Print
reg_tree_fit
parsnip model object
Fit time: 2.5s
n= 56868
node), split, n, deviance, yval
* denotes terminal node
root 56868 32009190000 455.2222
lender_count< 728.5 56859 13948640000 448.2417
lender_count< 81.5 56613 6692397000 428.2886
lender_count< 20.5 47772 2345794000 342.4569
lender_count< 12.5 35164 1238679000 282.1622 *
lender_count>=12.5 12608 622737900 510.6202 *
lender_count>=20.5 8841 2092969000 892.0767
lender_count< 38.5 7455 740153600 787.4748 *
lender_count>=38.5 1386 832502400 1454.7080 *
lender_count>=81.5 246 2046660000 5040.1420
lender_count< 229 224 938017600 4421.3170 *
lender_count>=229 22 149470700 11340.9100 *
lender_count>=728.5 9 554222200 44555.5600 *
But I receive a weird error when I use test data.
# Evaluate on test data
augment(reg_tree_fit, new_data = kenya_data_df_test) %>%
rmse(truth = loan_amount, estimate = .pred)
Error in rmse(., truth = loan_amount, estimate = .pred) :
unused arguments (truth = loan_amount, estimate = .pred)
My dput() example for train data:
structure(list(loan_amount = 200, term_in_months = 14, lender_count = 8,
sector_Agriculture = 1L, sector_Arts = 0L, sector_Clothing = 0L,
sector_Construction = 0L, sector_Education = 0L, sector_Entertainment = 0L,
sector_Food = 0L, sector_Health = 0L, sector_Housing = 0L,
sector_Manufacturing = 0L, sector_Personal_Use = 0L, sector_Retail = 0L,
sector_Services = 0L, sector_Transportation = 0L, sector_Wholesale = 0L,
repayment_interval_bullet = 0L, repayment_interval_irregular = 0L,
repayment_interval_monthly = 1L, repayment_interval_weekly = 0L,
gender_both = 0L, gender_female = 1L, gender_male = 0L, gender_NA = 0L), row.names = c(NA,
-1L), class = c("tbl_df", "tbl", "data.frame"))
dput() for test data.
structure(list(loan_amount = 250, term_in_months = 14, lender_count = 1,
sector_Agriculture = 0L, sector_Arts = 0L, sector_Clothing = 0L,
sector_Construction = 0L, sector_Education = 0L, sector_Entertainment = 0L,
sector_Food = 0L, sector_Health = 0L, sector_Housing = 0L,
sector_Manufacturing = 0L, sector_Personal_Use = 0L, sector_Retail = 0L,
sector_Services = 1L, sector_Transportation = 0L, sector_Wholesale = 0L,
repayment_interval_bullet = 0L, repayment_interval_irregular = 1L,
repayment_interval_monthly = 0L, repayment_interval_weekly = 0L,
gender_both = 0L, gender_female = 1L, gender_male = 0L, gender_NA = 0L), row.names = c(NA,
-1L), class = c("tbl_df", "tbl", "data.frame"))
Fixed with akrun's answer: calling yardstick::rmse() explicitly gave the necessary result.
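An "unused arguments" error of this kind usually means that the rmse() being called is not the one you expect, for example because another function with the same name masks it. The base R sketch below reproduces the symptom with a hypothetical stand-in rmse(actual, predicted). It is not the function from any particular package, and the small results data frame is made up for illustration.

```r
# Hypothetical stand-in for a different rmse() that masks the one you want
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Made-up predictions in the shape that augment() would return
res <- data.frame(loan_amount = c(200, 250, 300), .pred = c(210, 240, 330))

# Calling it with yardstick-style argument names fails during argument matching
msg <- tryCatch(rmse(res, truth = loan_amount, estimate = .pred),
                error = function(e) conditionMessage(e))
print(msg)

# find() shows where R locates the name, which helps to spot masking
print(find("rmse"))

# The stand-in only works with its own signature
value <- rmse(res$loan_amount, res$.pred)
print(value)
```

Qualifying the call with the package name, as in yardstick::rmse(), bypasses whatever else is attached under the same name.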

Build Decision Tree Classification

I have two datasets, partb_data1 and partb_data2. They contain a sample of a bank's customers, with the characteristics of each client and whether the bank continues to work with them or not (churn). Exited is the churn indicator: 1 if the client has left the bank and 0 if the client continues to work with it. I'm using partb_data1 as the training set and partb_data2 as the test set.
Here is my data:
> dput(head(partb_data1))
structure(list(RowNumber = 1:6, CustomerId = c(15634602L, 15647311L,
15619304L, 15701354L, 15737888L, 15574012L), Surname = c("Hargrave",
"Hill", "Onio", "Boni", "Mitchell", "Chu"), CreditScore = c(619L,
608L, 502L, 699L, 850L, 645L), Geography = c("France", "Spain",
"France", "France", "Spain", "Spain"), Gender = c("Female", "Female",
"Female", "Female", "Female", "Male"), Age = c(42L, 41L, 42L,
39L, 43L, 44L), Tenure = c(2L, 1L, 8L, 1L, 2L, 8L), Balance = c(0,
83807.86, 159660.8, 0, 125510.82, 113755.78), NumOfProducts = c(1L,
1L, 3L, 2L, 1L, 2L), HasCrCard = c(1L, 0L, 1L, 0L, 1L, 1L), IsActiveMember = c(1L,
1L, 0L, 0L, 1L, 0L), EstimatedSalary = c(101348.88, 112542.58,
113931.57, 93826.63, 79084.1, 149756.71), Exited = c(1L, 0L,
1L, 0L, 0L, 1L)), row.names = c(NA, 6L), class = "data.frame")
> dput(head(partb_data2))
structure(list(RowNumber = 8001:8006, CustomerId = c(15629002L,
15798053L, 15753895L, 15595426L, 15645815L, 15632848L), Surname = c("Hamilton",
"Nnachetam", "Blue", "Madukwe", "Mills", "Ferrari"), CreditScore = c(747L,
707L, 590L, 603L, 615L, 634L), Geography = c("Germany", "Spain",
"Spain", "Spain", "France", "France"), Gender = c("Male", "Male",
"Male", "Male", "Male", "Female"), Age = c(36L, 32L, 37L, 57L,
45L, 36L), Tenure = c(8L, 9L, 1L, 6L, 5L, 1L), Balance = c(102603.3,
0, 0, 105000.85, 0, 69518.95), NumOfProducts = c(2L, 2L, 2L,
2L, 2L, 1L), HasCrCard = c(1L, 1L, 0L, 1L, 1L, 1L), IsActiveMember = c(1L,
0L, 0L, 1L, 1L, 0L), EstimatedSalary = c(180693.61, 126475.79,
133535.99, 87412.24, 164886.64, 116238.39), Exited = c(0L, 0L,
0L, 1L, 0L, 0L)), row.names = c(NA, 6L), class = "data.frame")
I have created classification trees in order to predict churn. Here is the code:
library(tidyverse)
library(caret)
library(rpart)
library(rpart.plot)
# Split the data into training and test set
train.data <- head(partb_data1, 500)
test.data <- tail(partb_data2, 150)
# Build the model
modelb <- rpart(Exited ~., data = train.data, method = "class")
# Visualize the decision tree with rpart.plot
rpart.plot(modelb)
# Make predictions on the test data
predicted.classes <- modelb %>%
predict(test.data, type = "class")
head(predicted.classes)
# Compute model accuracy rate on test data
mean(predicted.classes == test.data$Exited)
### Pruning the tree :
# Fit the model on the training set
modelb2 <- train(
Exited ~., data = train.data, method = "rpart",
trControl = trainControl("cv", number = 10),
tuneLength = 10
)
# Plot model accuracy vs different values of
# cp (complexity parameter)
plot(modelb2)
# Print the best tuning parameter cp that
# maximizes the model accuracy
modelb2$bestTune
# Plot the final tree model
plot(modelb2$finalModel)
# Make predictions on the test data
predicted.classes <- modelb2 %>% predict(test.data)
# Compute model accuracy rate on test data
mean(predicted.classes == test.data$Exited)
Note: I have made the test set from partb_data2.
Is the procedure I follow right? Must I make any changes in order to accomplish my goal, which is building classification trees? Your help would be truly welcome!
EDITED !!!
head(partb_data1$Exited, 500) isn't a data.frame. The $ extracts a single column from partb_data1, so the result is only an integer vector, and that can't work.
class(head(partb_data1$Exited, 500))
[1] "integer"
There are always many possible procedures.
But you're right to separate your data into a training set and a test set. It's also possible to use cross-validation instead. You're using cross-validation on your training set; that's normally not necessary, but it is possible.
I think using your complete data for the CV should also work, but what you're doing isn't wrong.
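The sketch below shows a train/test workflow with rpart on a 0/1 outcome. It is an illustration under stated assumptions, not the bank data: the built-in iris data stands in for partb_data1 and partb_data2, and the Exited column is constructed artificially. One thing worth checking in the original code is the type of Exited: caret::train() chooses between regression and classification based on whether the outcome is a factor, so converting an integer 0/1 outcome to a factor first is the safer choice.

```r
library(rpart)

# Stand-in data: four numeric predictors plus an artificial 0/1 outcome
d <- iris[, 1:4]
d$Exited <- as.integer(iris$Species == "virginica")

# Make the outcome a factor so it is treated as a class label
d$Exited <- factor(d$Exited, levels = c(0, 1))

# Hold out part of the data as a test set
set.seed(42)
idx <- sample(nrow(d), 100)
train.data <- d[idx, ]
test.data <- d[-idx, ]

# Fit a classification tree and predict class labels on the test set
fit <- rpart(Exited ~ ., data = train.data, method = "class")
pred <- predict(fit, newdata = test.data, type = "class")

# Accuracy and confusion table on the held-out rows
accuracy <- mean(pred == test.data$Exited)
conf_mat <- table(predicted = pred, actual = test.data$Exited)
print(accuracy)
print(conf_mat)
```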

Error: NA in probability vector when using the estR0() function from the R0 package

Trying to find the reproduction number given some data using the R0 package, however I'm having trouble at the very end when using the estimation function. Here is what I do:
## Get the incidence data
test <- c(`2020-01-22` = 0L, `2020-01-23` = 0L, `2020-01-24` = 0L, `2020-01-25` = 0L,
`2020-01-26` = 0L, `2020-01-27` = 0L, `2020-01-28` = 0L, `2020-01-29` = 0L,
`2020-01-30` = 0L, `2020-01-31` = 0L, `2020-02-01` = 0L, `2020-02-02` = 0L,
`2020-02-03` = 0L, `2020-02-04` = 0L, `2020-02-05` = 0L, `2020-02-06` = 0L,
`2020-02-07` = 0L, `2020-02-08` = 0L, `2020-02-09` = 0L, `2020-02-10` = 0L,
`2020-02-11` = 0L, `2020-02-12` = 0L, `2020-02-13` = 0L, `2020-02-14` = 0L,
`2020-02-15` = 0L, `2020-02-16` = 0L, `2020-02-17` = 0L, `2020-02-18` = 0L,
`2020-02-19` = 0L, `2020-02-20` = 0L, `2020-02-21` = 0L, `2020-02-22` = 0L,
`2020-02-23` = 0L, `2020-02-24` = 0L, `2020-02-25` = 0L, `2020-02-26` = 0L,
`2020-02-27` = 0L, `2020-02-28` = 1L, `2020-02-29` = 3L, `2020-03-01` = 1L,
`2020-03-02` = 0L, `2020-03-03` = 0L, `2020-03-04` = 0L, `2020-03-05` = 0L,
`2020-03-06` = 1L, `2020-03-07` = 0L, `2020-03-08` = 1L, `2020-03-09` = 0L,
`2020-03-10` = 0L, `2020-03-11` = 1L, `2020-03-12` = 4L, `2020-03-13` = 0L,
`2020-03-14` = 14L, `2020-03-15` = 15L, `2020-03-16` = 12L, `2020-03-17` = 29L,
`2020-03-18` = 11L, `2020-03-19` = 25L, `2020-03-20` = 46L, `2020-03-21` = 39L,
`2020-03-22` = 48L, `2020-03-23` = 65L, `2020-03-24` = 51L, `2020-03-25` = 38L,
`2020-03-26` = 70L, `2020-03-27` = 110L, `2020-03-28` = 132L,
`2020-03-29` = 131L, `2020-03-30` = 145L, `2020-03-31` = 101L
)
## Make a time generation distribution (these parameters were found from the disease I'm studying)
d <- generation.time("gamma", c(4.243319, 2.488787))
## Calculate R0
estR0 <- estimate.R(
epid = test,
GT = d,
begin = 45,
end = 70,
methods = c("EG", "ML", "TD", "AR", "SB"),
pop.size = 126200000,
nsim = 1000
)
This produces an error and several warnings:
Waiting for profiling to be done...
Error in rmultinom(nsim, epid$incid[s] - import[s], p[1:s, s]) :
NA in probability vector
In addition: Warning messages:
1: In est.R0.TD(epid = c(`2020-01-22` = 0L, `2020-01-23` = 0L, `2020-01-24` = 0L, :
Simulations may take several minutes.
2: In est.R0.TD(epid = c(`2020-01-22` = 0L, `2020-01-23` = 0L, `2020-01-24` = 0L, :
Gap in epidemic curve is longer than the generation interval. Consider using a different GT distribution (maybe with "truncate= 37 " (length of longest gap)).
3: In est.R0.TD(epid = c(`2020-01-22` = 0L, `2020-01-23` = 0L, `2020-01-24` = 0L, :
Using initial incidence as initial number of cases.
Switching around the start/end intervals, population and simulation parameters didn't help. What is going wrong during the rmultinom() step?
According to the help page of est.R0.TD, the "begin" and "end" arguments are not actually used.
Try this:
test2 <- test[45:70]
estR0 <- estimate.R(
epid = test2,
GT = d,
methods = c("EG", "ML", "TD", "AR", "SB"),
pop.size = 126200000,
nsim = 1000
)
estR0
Reproduction number estimate using Exponential Growth method.
R : 2.179363[ 2.074176 , 2.291742 ]
Reproduction number estimate using Maximum Likelihood method.
R : 1.945082[ 1.778461 , 2.121745 ]
Reproduction number estimate using Attack Rate method.
R : 1.000004[ 1.000004 , 1.000005 ]
Reproduction number estimate using Time-Dependent method.
3.294674 0 4.463411 0 0 5.772949 5.059529 0 2.858751 2.361108 ...
Reproduction number estimate using Sequential Bayesian method.
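As a small illustration of the subsetting step, the base R sketch below builds a named daily vector over the same date range and cuts out a window by position and by date name. The counts are placeholders rather than the real incidence data, and the R0 package is not needed to run it.

```r
# Daily dates matching the range of the incidence vector in the question
dates <- seq(as.Date("2020-01-22"), by = "day", length.out = 70)

# Placeholder counts (not the real data), named by date like `test`
incid <- setNames(seq_len(70), as.character(dates))

# Subset by position: names are kept, so the dates travel with the counts
window_pos <- incid[45:70]

# Equivalent subset by looking up the start and end dates by name
start <- match("2020-03-06", names(incid))
end <- match("2020-03-31", names(incid))
window_name <- incid[start:end]

print(head(window_pos, 3))
print(identical(window_pos, window_name))
```

Subsetting before the call means every estimation method sees exactly the same window, regardless of how each one treats begin and end.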

Returning full list from nested loop

The code below works correctly, but it only returns the last value for each of the 16 methods in methodlist. As you can see below, I would like to return 6 values (z in seq(1:6)) for each method in methodlist, so I want a 6x16 matrix, not the 1x16 result I get at present. What does the code look like to append 6 rows for each method in methodlist?
rm(list = ls()) #clears the workspace
library(caret)
library(scales)
library(foreach)
library(iterators)
library(parallel)
library(doParallel)
registerDoParallel(cores = 16)
# read data
proj_path = "P:/R"
Macro <- read.csv("P:/Earnest/Old/R/Input.csv")
#select trainControl
ctrl = trainControl(method = "cv", number = 5, repeats = 5, verboseIter = TRUE, savePredictions = TRUE)
methodlist <- c("BstLm", "glmnet", "penalized", "bridge", "bayesglm", "spikeslab", "leapForward",
"glmboost", "blassoAveraged", "blasso", "gaussprPoly", "earth", "cubist", "pcr",
"leapSeq", "leapBackward")
fit <- list()
output <- list()
forecast <- list()
for(i in seq_along(methodlist)){
for(z in seq(1:6)) {
x <- Macro[1:(14-z),3:21]
x <- as.matrix(x)
y <- Macro[1:(14-z),2:2]
y <- as.matrix(y)
t <- Macro[(15-z):(15-z),3:21]
t <- as.matrix(t)
fit[[i]] <- caret::train(y = as.vector(y),
x = x,
method = methodlist[i],
trControl = ctrl,
preProc = c("center", "scale"))
output[i] <- predict(fit[[i]], t)
} }
dput of data:
dput(Macro)
structure(list(qtrs = structure(1:14, .Label = c("16_Q2", "16_Q3",
"16_Q4", "17_Q1", "17_Q2", "17_Q3", "17_Q4", "18_Q1", "18_Q2",
"18_Q3", "18_Q4", "19_Q1", "19_Q2", "QQ_New"), class = "factor"),
y = c(17427, 17613, 21626, 16177, 16154, 16423, 20661, 15995,
16410, 16647, 22734, 16556, 17552, 17550.6), c1372 = c(52.38107607,
51.71910264, 66.04439265, 48.7435049, 52.84235574, 52.45234009,
66.60212761, 48.00370834, 53.27819725, 53.4036627, 73.41349958,
51.24441724, 58.80001938, 58.26812139), c5244 = c(27.18948635,
26.44530248, 34.00832812, 25.34750922, 27.82252627, 26.87902356,
34.15057986, 25.60616679, 29.11586519, 27.66748031, 39.66005562,
28.78471195, 34.15138012, 34.05864161), c5640 = c(40.4431936,
39.28350352, 51.04846142, 37.25188584, 41.15752543, 41.08080649,
51.73736768, 36.88113619, 42.25997532, 42.21743585, 57.81514276,
39.73877542, 47.44410618, 46.40224715), c6164 = c(24.94812191,
25.3, 30.71137161, 23.17995059, 26.86871377, 28.7449476,
35.42080406, 25.30866569, 29.36705061, 30.39925678, 40.20550413,
28.31441758, 34.99314256, 34.99500917), b1372 = c(58.28673781,
57.43780252, 72.94086917, 60.56258739, 61.15138265, 61.63152251,
74.78137432, 61.45308406, 63.49305917, 63.48869267, 84.41035843,
65.42555003, 69.93057227, 69.86501992), b5244 = c(72.67600678,
73.17343986, 94.23074183, 84.03045989, 84.72343232, 85.24495216,
105.0116727, 87.49923648, 89.37263925, 88.98222187, 120.6690755,
96.10955339, 97.36718121, 101.7633261), b5640 = c(105.412433,
101.027769, 125.8418584, 108.8459417, 102.8725409, 105.3201174,
126.6188705, 106.9247911, 106.545478, 107.8489509, 140.2524354,
111.4552219, 114.9787081, 117.2442333), b6164 = c(27.9437266,
28.79918951, 36.23272036, 31.36799287, 32.61711727, 33.0039884,
39.57137571, 33.8573912, 35.51335532, 36.05804281, 46.99019762,
37.78925823, 40.49508975, 41.03555772), v1372 = c(0.894,
0.908194185, 1.05126864, 0.852367402, 0.897574838, 0.925303822,
1.094709584, 0.850266106, 0.925, 0.945399812, 1.22551744,
0.926201463, 1.036983254, 1.04), v5244 = c(0.490619506, 0.492131527,
0.579979842, 0.463819664, 0.496379414, 0.501965507, 0.60256246,
0.478, 0.532945695, 0.531012629, 0.710329807, 0.549488271,
0.630092163, 0.635811317), v5640 = c(0.622618968, 0.627586484,
0.743424141, 0.588809668, 0.628539746, 0.651820208, 0.781763961,
0.595489092, 0.66419102, 0.676916428, 0.886929371, 0.651849221,
0.752827781, 0.753805427), v6164 = c(0.418380085, 0.442455804,
0.496217777, 0.415503322, 0.461154591, 0.516586013, 0.603938678,
0.4807425, 0.5378356, 0.569872101, 0.718373269, 0.561622352,
0.664532192, 0.681), bv1372 = c(1.125049204, 1.163584621,
1.341847365, 1.161431345, 1.165026048, 1.20311151, 1.399386298,
1.188293134, 1.223648195, 1.25, 1.599388376, 1.280592762,
1.361357962, 1.383316822), bv5244 = c(1.555199027, 1.615436514,
1.923769618, 1.763617138, 1.786232543, 1.839815369, 2.171862846,
1.850221722, 1.906545506, 1.938117792, 2.516019557, 2.03585812,
2.068596964, 2.195524499), bv5640 = c(1.91698232, 1.955269185,
2.288843372, 2.00814652, 1.927080282, 2.001341412, 2.320656843,
1.986254618, 1.997858329, 2.054757524, 2.599910718, 2.081192693,
2.161265445, 2.23468007), bv6164 = c(0.50964551, 0.545488286,
0.636741752, 0.574599203, 0.59206156, 0.615815337, 0.711999807,
0.633746358, 0.661835227, 0.688012691, 0.866096559, 0.725703651,
0.773148669, 0.796223323), s1 = c(1L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L), s2 = c(0L, 1L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L), s3 = c(0L, 0L, 1L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L), date = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "8/4/2018", class = "factor")), class = "data.frame", row.names = c(NA,
-14L))
Your current loop structure will make it so that output[i] captures the final output of each inner loop. All 6 runs of the inner z loop are writing into the same element of output, so the first 5 runs are successively overwritten. Instead, output should become a list of lists (fit should be changed in the same way to match). I can't run your code myself, but as far as I can tell there are only a few small changes needed, which I've marked with comments:
for(i in seq_along(methodlist)){
output[[i]] <- list() # treat output as a list-of-lists
fit[[i]] <- list() # treat fit as a list-of-lists
for(z in seq(1:6)) {
x <- Macro[1:(14-z),3:21]
x <- as.matrix(x)
y <- Macro[1:(14-z),2:2]
y <- as.matrix(y)
t <- Macro[(15-z):(15-z),3:21]
t <- as.matrix(t)
# treat 'fit' as a list-of-lists
fit[[i]][[z]] <- caret::train(y = as.vector(y),
x = x,
method = methodlist[i],
trControl = ctrl,
preProc = c("center", "scale"))
output[[i]][[z]] <- predict(fit[[i]][[z]], t) # assign the loop outputs to the nested list
} }
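Once the results are stored as a list of lists, they can be collapsed into the matrix the question asks for. The sketch below assumes each prediction is a single number. The method names are hypothetical and a simple arithmetic expression stands in for the predict() call, so it runs without caret.

```r
# Hypothetical method names; the real code would use methodlist
methods_demo <- c("m1", "m2", "m3")
n_z <- 6

# Fill a list of lists, one inner list per method
output <- vector("list", length(methods_demo))
for (i in seq_along(methods_demo)) {
  output[[i]] <- vector("list", n_z)
  for (z in seq_len(n_z)) {
    # Placeholder for predict(fit[[i]][[z]], t)
    output[[i]][[z]] <- i * 10 + z
  }
}

# Collapse to a matrix: one row per z, one column per method
forecast <- sapply(output, unlist)
dimnames(forecast) <- list(paste0("z", seq_len(n_z)), methods_demo)
print(forecast)
```

With the 16 methods from methodlist, the same sapply() call would give a 6x16 matrix.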

dplyr Arrange verb won't work on character class, only works on factors

library(tidyverse)
df <- structure(list(PN = c("41681", "16588", "34881",
"36917", "33116", "68447"), `2017-10` = c(0L,
0L, 0L, 0L, 0L, 0L), `2017-11` = c(0L, 1L, 0L, 0L, 0L, 0L), `2017-12` = c(0L,
0L, 0L, 0L, 1L, 0L), `2018-01` = c(0L, 0L, 1L, 1L, 0L, 0L), `2018-02` = c(1L,
0L, 0L, 0L, 0L, 0L), `2018-03` = c(0L, 0L, 0L, 0L, 0L, 0L), `2018-04` = c(0L,
0L, 0L, 0L, 0L, 1L), Status = c("OK", "NOK", "OK", "NOK", "OK",
"OK")), .Names = c("PN", "2017-10", "2017-11", "2017-12",
"2018-01", "2018-02", "2018-03", "2018-04", "Status"), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
The Status column in the df data frame above was generated with the following apply function:
mutate(
Status =
ifelse(
(apply(.[, 2:7], 1, sum) > 0) &
(.[, 8] > 0),
"NOK",
"OK"
)
)
If I %>% pipe an arrange(Status) directly after the code chunk above I get the following error.
Error in arrange_impl(.data, dots) : Argument 1 is of unsupported
type matrix
If I run all my code without the arrange(Status) the code executes fine and turns into the reproducible chunk I setup at the beginning of this post - via dput(df)
The Status column is of type character, but if I factor it prior to running the arrange(Status) command the error shown above goes away.
I've never had an issue with the arrange() verb before on character classes. Why am I forced to factor my Status column to make the error go away? Is it something to do with my use of the apply command? That's the only new thing I've done in my 'programming' this time around.
Earlier in my analysis I had to replace NA with 0 and this is what I did:
mutate(n = parse_integer(str_replace_na(n, replacement = 0)))
Apparently I ended up creating a column of character matrices, maybe with this apply command, maybe with the stringr command above; I'm not sure which:
mutate(
Status =
ifelse(
(apply(.[, 2:7], 1, sum) > 0) &
(.[, 8] > 0),
"NOK",
"OK"
)
)
As pointed out by @joran and @akrun, the following fixed the issue:
mutate(Status = as.vector(Status))
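A likely source of the matrix is the comparison on a one-column selection: comparing a data frame (rather than a plain vector) to a number returns a logical matrix, and ifelse() keeps that shape. The base R sketch below reproduces this with a toy data frame, using drop = FALSE to mimic the way a tibble keeps its data-frame shape under [, j]. The column names and values are made up.

```r
# Toy data frame; the values are made up for illustration
df <- data.frame(a = c(0L, 1L, 0L), b = c(1L, 0L, 2L))

# A one-column data frame compared with a number gives a logical matrix
cond <- df[, "b", drop = FALSE] > 0
print(is.matrix(cond))

# ifelse() returns a result with the same shape as its test, so a matrix
status <- ifelse(cond, "NOK", "OK")
print(is.matrix(status))

# as.vector() drops the dimensions and leaves a plain character vector
status_vec <- as.vector(status)
print(status_vec)

# Extracting the column with [[ avoids the matrix in the first place
status_direct <- ifelse(df[["b"]] > 0, "NOK", "OK")
print(identical(status_vec, status_direct))
```

In the original mutate() call, replacing .[, 8] with .[[8]] would be one way to get a plain vector without the extra as.vector() step.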
