Participants in an experiment took a test that has a rule that says "once a participant has gotten 6 items wrong in a window of 8 items, you stop running the test". However, some experimenters kept testing past this point. I now need to find a way in which I can automatically see where the test should have been stopped, and change all values following the end to 0 (= item wrong). I am not even sure if this is something that can be done in R.
To be clear, I would like to go row by row (which are the participants) and once there are six 0s in a given window of 8 columns (items), I would need all values after the sixth 0 to be 0 too.
While the reproducible data is below, here is a visualization of what I need, where the blue cells are the ones that should change to 0:
[Image: Pre-changes]
[Image: Post-changes]
Reproducible data:
structure(list(Participant_ID = c("E01P01", "E01P02", "E01P03",
"E01P04", "E01P05", "E01P06", "E01P07", "E01P08", "E02P01", "E02P02"
), A2 = c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1), A3 = c(1, 1, 0, 0, 0,
1, 0, 0, 0, 0), B1 = c(1, 1, 1, 0, 0, 1, 0, 0, 1, 1), B2 = c(1,
1, 1, 1, 1, 1, 0, 0, 0, 1), C3 = c(1, 0, 0, 1, 0, 1, 0, 0, 0,
1), C4 = c(1, 0, 0, 0, 0, 1, 0, 0, 1, 1), D1 = c(1, 0, 0, 0,
0, 1, 0, 0, 0, 0), D3 = c(1, 1, 1, 1, 0, 0, 1, 0, 0, 1), E1 = c(1,
0, 0, 0, 0, 1, 0, 0, 0, 1), E3 = c(1, 1, 0, 1, 0, 1, 0, 0, 0,
0), F1 = c(1, 0, 0, 0, 1, 0, 0, 1, 0, 0), F4 = c(1, 1, 1, 1,
0, 1, 0, 1, 1, 0), G1 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 1), G2 = c(0,
0, 0, 0, 1, 1, 1, 0, 1, 1)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
Any help is highly appreciated!
Here is a solution that involves some pivoting, rollsum, cumsum, if_else logic, then pivoting back. Let me know if it works.
library(tidyverse)
library(zoo)
structure(list(Participant_ID = c("E01P01", "E01P02", "E01P03",
"E01P04", "E01P05", "E01P06", "E01P07", "E01P08", "E02P01", "E02P02"
), A2 = c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1), A3 = c(1, 1, 0, 0, 0,
1, 0, 0, 0, 0), B1 = c(1, 1, 1, 0, 0, 1, 0, 0, 1, 1), B2 = c(1,
1, 1, 1, 1, 1, 0, 0, 0, 1), C3 = c(1, 0, 0, 1, 0, 1, 0, 0, 0,
1), C4 = c(1, 0, 0, 0, 0, 1, 0, 0, 1, 1), D1 = c(1, 0, 0, 0,
0, 1, 0, 0, 0, 0), D3 = c(1, 1, 1, 1, 0, 0, 1, 0, 0, 1), E1 = c(1,
0, 0, 0, 0, 1, 0, 0, 0, 1), E3 = c(1, 1, 0, 1, 0, 1, 0, 0, 0,
0), F1 = c(1, 0, 0, 0, 1, 0, 0, 1, 0, 0), F4 = c(1, 1, 1, 1,
0, 1, 0, 1, 1, 0), G1 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 1), G2 = c(0,
0, 0, 0, 1, 1, 1, 0, 1, 1)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame")) %>%
as_tibble() %>%
pivot_longer(-1) %>%
group_by(Participant_ID) %>%
mutate(running_total = zoo::rollsumr(value==0, k = 8, fill = 0),
should_terminate = cumsum(running_total >= 6),
value = if_else(should_terminate > 0, 0, value)) %>%
ungroup() %>%
select(Participant_ID, name, value) %>%
pivot_wider(names_from = name, values_from = value)
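One caveat with the pipeline above: rollsumr(..., k = 8, fill = 0) reports 0 for the first seven items of each participant, so someone who gets six items wrong within the first six or seven items is only flagged at item 8. Here is a minimal base R sketch that also counts the partially filled window at the start. The helper name apply_stop_rule and its arguments are hypothetical, not part of the answer above, and it assumes scores are coded strictly as 0/1 with no missing values.

```r
# Hypothetical helper (not from the answer above): apply the stop rule to one
# participant's 0/1 scores, given in the order the items were administered.
apply_stop_rule <- function(x, n_wrong = 6, window = 8) {
  wrong_so_far <- cumsum(x == 0)
  # Errors already counted `window` items back (0 while the window is filling).
  before_window <- c(rep(0, window), wrong_so_far)[seq_along(x)]
  wrong_in_window <- wrong_so_far - before_window
  # First item at which the last `window` items contain `n_wrong` errors.
  stop_at <- which(wrong_in_window >= n_wrong)[1]
  if (!is.na(stop_at) && stop_at < length(x)) {
    x[(stop_at + 1):length(x)] <- 0
  }
  x
}

# Six errors in the first six items: everything afterwards becomes 0.
early <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1)
apply_stop_rule(early)

# Row-wise use on a small score matrix (one row per participant).
scores <- rbind(early, never = rep(1, 10))
fixed <- t(apply(scores, 1, apply_stop_rule))
fixed
```

For the tibble in the question, the same function could presumably be applied to the item columns only (everything except Participant_ID) and the result assigned back.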
I created a binary matrix and want to plot the 1s as black squares.
How can I write it without using any package?
For example, my matrix is:
m <- matrix(c(0,1,1,0,0,1,0,1,1),nrow=3, ncol=3)
Do you want this?
m <- matrix(c(0,1,1,0,0,1,0,1,1), nrow=3, ncol=3)
image(m, main = "My binary matrix plot", col = c("white", "black"))
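Note that image() treats rows of the matrix as x positions and columns as y positions, with column 1 at the bottom, so the picture is rotated 90 degrees counter-clockwise relative to how the matrix prints. If you want the plot to match the printed layout, a minimal sketch (the name oriented is just for illustration) is to reverse the rows and transpose first:

```r
m <- matrix(c(0, 1, 1, 0, 0, 1, 0, 1, 1), nrow = 3, ncol = 3)

# Reverse the row order, then transpose, so that m[1, 1] is drawn top-left.
oriented <- t(m[nrow(m):1, , drop = FALSE])

image(oriented, col = c("white", "black"), axes = FALSE,
      main = "Binary matrix, printed orientation")
```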
If image doesn't suffice, we could write a generalized function using mapply like this one.
chessplot <- function(m, col=1, border=NA) {
stopifnot(dim(m)[1] == dim(m)[2]) ## allows only square matrices
n <- nrow(m)
plot(n, n, type='n', xlim=c(0, n), ylim=c(0, n))
mapply(\(i, j, m) {
rect(-1 + i, n - j, 0 + i, n - j + 1, col=m, border=border)
}, seq(n), rep(seq(n), each=n), t(m)) |> invisible()
}
Gives:
chessplot(m3)
chessplot(m4)
chessplot(m8)
Data:
m3 <- structure(c(0, 1, 1, 0, 0, 1, 0, 1, 1), .Dim = c(3L, 3L))
m4 <- structure(c(0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0), .Dim = c(4L,
4L))
m8 <- structure(c(0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0,
1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1,
0, 1, 0, 1, 0), .Dim = c(8L, 8L))
I would like to subset my data frame based on the index column: I want to keep the cases whose index is stored in myvars (e.g. 110, 111). I don't understand why I get 0 observations when running this code:
newdata <- df[ which(df$index=="myvars"), ]
Sample data:
df<-structure(list(index = c(111, 110, 101, 111), et = c(1, 1, 1,
1), d1_t2 = c(0, 1, 1, 1), d1_t3 = c(0, 0, 1, 1), d1_t4 = c(0,
1, 0, 1), d2_t1 = c(0, 0, 1, 1), d2_t2 = c(0, 1, 1, 1), d2_t3 = c(0,
0, 0, 1), d2_t4 = c(1, 0, 1, 1), d3_t1 = c(1, 0, 1, 1), d3_t2 = c(1,
1, 0, 1), d3_t3 = c(1, 0, 1, 1), d3_t4 = c(1, 1, 0, 1), d4_t1 = c(0,
0, 1, 1), d4_t2 = c(1, 1, 0, 1), d4_t3 = c(0, 0, 1, 1), d4_t4 = c(1,
0, 1, 1), d5_t1 = c(1, 0, 0, 1), d5_t2 = c(0, 1, 1, 1), d5_t3 = c(1,
0, 1, 1), d5_t4 = c(0, 0, 1, 1), d6_t1 = c(1, 0, 0, 1), d6_t2 = c(0,
0, 1, 1), d6_t3 = c(1, 0, 1, 1), d6_t4 = c(1, 0, 1, 1), d7_t1 = c(1,
1, 1, 1), d7_t2 = c(1, 1, 1, 1), d7_t3 = c(1, 0, 1, 1), d7_t4 = c(1,
0, 1, 1)), row.names = c(NA, 4L), class = "data.frame")
Code:
myvars<-c("110", "111")
Try:
myvars<-c(110, 111) # <-- !! no quotes !!
df[ which(df$index %in% myvars ), ] #also, no quotes round myvars
There are several basic problems with what you are trying to do.
You are not using the variable 'myvars' -- you are using a string with the value "myvars". None of your rows has the index "myvars".
You are using == which is good for one value (e.g. values==4), but myvars has multiple values in it. Instead, you could use df$index %in% myvars
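Using %in% with the quoted values does work, because R coerces the numbers to strings for the comparison, but your index values are numeric and you are matching them against strings. This is unnecessary, and could lead to problems in other places.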
You may be confused because of your very large and complex example data. You only need one column to test -- not twenty.
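To make the points above concrete, here is a small sketch on a one-column data frame. The object names none and kept are only for illustration.

```r
df <- data.frame(index = c(111, 110, 101, 111))
myvars <- c(110, 111)

# Comparing against the literal string "myvars" matches nothing.
none <- df[df$index == "myvars", , drop = FALSE]
nrow(none)

# %in% tests each index against every value stored in myvars.
kept <- df[df$index %in% myvars, , drop = FALSE]
kept
```

The drop = FALSE is only there because this toy data frame has a single column; without it the result would collapse to a plain vector.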
I have data such as this
data <- data.table(
"School" = c(1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1,
1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0),
"Grade" = c(0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1,
0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0),
"CAT" = c(1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0,
0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1),
"FOX" = c(1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1,
1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0),
"DOG" = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0,
0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1)
)
and wish to achieve a new data table such as this:
dataWANT <- data.frame(
"VARIABLE" = c('CAT', 'CAT', 'CAT', 'FOX', 'FOX', 'FOX', 'DOG', 'DOG', 'DOG'),
"SCHOOL" = c(1, 1, 0, 1, 1, 0, 1, 1, 0),
"GRADE" = c(0, 1, 1, 0, 1, 1, 0, 1, 1),
"MEAN" = c(NA)
)
dataWANT takes the mean for CAT and FOX and DOG by SCHOOL, GRADE, and SCHOOL X GRADE when they are equal to 1.
I know how to do this one variable at a time, but that does not scale to a large data set.
data[, CAT1 := mean(CAT), by = list(SCHOOL)]
data[, FOX1 := mean(FOX), by = list(GRADE)]
data[, DOG1 := mean(DOG), by = list(SCHOOL, GRADE)]
data$CAT2 = unique(data[SCHOOL == 1, CAT1])
data$FOX2 = unique(data[GRADE == 1, FOX1])
data$DOG2 = unique(data[SCHOOL == 1 & GRADE == 1, DOG1])
Please only use this:
data <- data.table(
"SCHOOL" = c(1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1,
1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0),
"GRADE" = c(0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1,
0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0),
"CAT" = c(1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0,
0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1),
"FOX" = c(1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1,
1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0),
"DOG" = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0,
0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1)
)
data[, CAT1 := mean(CAT), by = list(SCHOOL)]
data[, CAT2 := mean(CAT), by = list(GRADE)]
data[, CAT3 := mean(CAT), by = list(SCHOOL, GRADE)]
data[, FOX1 := mean(FOX), by = list(SCHOOL)]
data[, FOX2 := mean(FOX), by = list(GRADE)]
data[, FOX3 := mean(FOX), by = list(SCHOOL, GRADE)]
data[, DOG1 := mean(DOG), by = list(SCHOOL)]
data[, DOG2 := mean(DOG), by = list(GRADE)]
data[, DOG3 := mean(DOG), by = list(SCHOOL, GRADE)]
dataWANT <- data.frame(
"VARIABLE" = c('CAT', 'CAT', 'CAT', 'FOX', 'FOX', 'FOX', 'DOG', 'DOG', 'DOG'),
"TYPE" = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
"MEAN" = c(0.48, 0.44, 0.428, 0.6, 0.611, 0.6428, 0.52, 0.61, 0.6428)
)
where:
TYPE equals to 1 when MEAN is estimated by SCHOOL,
TYPE equals to 2 when MEAN is estimated by GRADE,
TYPE equals to 3 when MEAN is estimated by SCHOOL and GRADE
We could use rbindlist after creating a list by taking the MEAN after melting the dataset (as in the other post)
library(data.table)
cols <- c('CAT', 'FOX', 'DOG')
data1 <- melt(data, measure.vars = cols)
list_cols <- list('SCHOOL', 'GRADE', c('SCHOOL', 'GRADE'))
lst1 <- lapply(list_cols, function(x)
data1[, .(MEAN = mean(value, na.rm = TRUE)), c(x, 'variable')])
rbindlist(lapply(lst1, function(x) {
nm1 <- setdiff(names(x), c('variable', 'MEAN'))
x[Reduce(`&`, lapply(mget(nm1), as.logical)),
.(VARIABLE = variable, MEAN)]}), idcol = 'TYPE')[order(VARIABLE)]
# TYPE VARIABLE MEAN
#1: 1 CAT 0.4800000
#2: 2 CAT 0.4444444
#3: 3 CAT 0.4285714
#4: 1 FOX 0.6000000
#5: 2 FOX 0.5555556
#6: 3 FOX 0.6428571
#7: 1 DOG 0.5200000
#8: 2 DOG 0.6111111
#9: 3 DOG 0.6428571
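If you want to sanity-check what each TYPE means without data.table, here is a rough base R sketch on a made-up four-row data frame. The objects toy, filters and res are assumptions for illustration and not the data from the question. Each TYPE is simply the mean of a variable over the rows where SCHOOL is 1, where GRADE is 1, or where both are 1.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  SCHOOL = c(1, 1, 0, 0),
  GRADE  = c(1, 0, 1, 0),
  CAT    = c(1, 0, 1, 0),
  DOG    = c(0, 1, 1, 1)
)

# One logical row filter per TYPE (1 = SCHOOL, 2 = GRADE, 3 = both).
filters <- list(
  toy$SCHOOL == 1,
  toy$GRADE == 1,
  toy$SCHOOL == 1 & toy$GRADE == 1
)

vars <- c("CAT", "DOG")
res <- do.call(rbind, lapply(vars, function(v) {
  data.frame(
    VARIABLE = v,
    TYPE = seq_along(filters),
    MEAN = vapply(filters, function(keep) mean(toy[[v]][keep]), numeric(1))
  )
}))
res
```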
Do you mean to get something like this?
library(data.table)
melt(data, measure.vars = c('CAT', 'FOX', 'DOG'))[,
.(MEAN = mean(value, na.rm = TRUE)), .(School, Grade, variable)]
To group it by different columns, we can do:
cols <- c('CAT', 'FOX', 'DOG')
data1 <- melt(data, measure.vars = cols)
list_cols <- list('School', 'Grade', c('School', 'Grade'))
lapply(list_cols, function(x)
data1[, .(MEAN = mean(value, na.rm = TRUE)), c(x, 'variable')])
You could subset and calculate your means first using lapply(.SD,...) then melt that into your output:
melt(data[School != 0 | Grade != 0, lapply(.SD, mean), by = .(School, Grade)], id.vars = c("School", "Grade"))
Adding this after also adds the TYPE variable
...][, TYPE := School + (2*Grade)]
Putting it all together and tidying it up too, it matches your desired output
dataWANT <- melt(data[School != 0 | Grade != 0, lapply(.SD, mean), by = .(School, Grade)], id.vars = c("School", "Grade"))[, TYPE := School + (2*Grade)][order(variable, TYPE), .("VARIABLE" = variable, TYPE, "MEAN" = value)]
From other threads I've seen people provide solutions that are specific to exact problems, but I don't understand the underlying reason of what's going wrong.
I do...
modTest = glm( trainLabels[,1] ~ A + B + C +
D + E + F + G +
H + I, family=binomial(link='logit') )
The above is 20 labels, and 9 vectors each with 20 values.
I then try to predict on 10 unseen examples. This is 10 rows, 9 features, same order.
preds = predict( modTest, testFeatures )
I get the error...
Warning message:
'newdata' had 10 rows but variables found have 20 rows
Edit : Simplified, removed long feature names, etc.
names(trainFeatures)
[1] "Neg" "Pos" "Num" "UN" "UNA" "UNUA" "UP" "UPA" "UPUA"
names(testFeatures)
[1] "Neg" "Pos" "Num" "UN" "UNA" "UNUA" "UP" "UPA" "UPUA"
Edit: Dputs...
To use the dputs, what I did was...
modTest = glm( trainLabels[,1] ~ as.matrix(trainFeatures) )
preds = predict( modTest, testFeatures )
Warning message:
'newdata' had 10 rows but variables found have 20 rows
Not sure why I'm getting that warning still.
dput(trainLabels)
structure(list(Neg = c(1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1,
0, 0, 0, 1, 1, 1, 0), Pos = c(1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1,
1, 1, 0, 0, 0, 1, 1, 1, 0), Num = c(1, 1, 0, 0, 0, 0, 1, 0, 0,
0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0), UN = c(1, 1, 0, 0, 0, 0, 1,
0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0), UNA = c(1, 1, 0, 0, 0,
0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0), UNUA = c(1, 1,
0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0), UP = c(1,
1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0), UPA = c(1,
1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0), UPUA = c(1,
1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0)), .Names = c("Neg",
"Pos", "Num", "UN", "UNA", "UNUA", "UP", "UPA", "UPUA"), row.names = c(NA,
-20L), class = "data.frame")
dput(trainFeatures)
structure(list(Neg = c(39106, 44664, 114130, 26526, 22122, 19175,
29438, 17741, 17589, 20666, 66024, 168336, 86283, 74826, 88998,
75756, 16041, 17087, 15235, 16659), Pos = c(16129, 21064, 57730,
10314, 18105, 16837, 19300, 16873, 13681, 18414, 27148, 120497,
60031, 49016, 59250, 36264, 15786, 16315, 14556, 16057), Num = c(82994,
121367, 306842, 55458, 69148, 63167, 85891, 58674, 55874, 67505,
152475, 427106, 221043, 190043, 223744, 177388, 51657, 54883,
48378, 54115), UN = c(32343, 35433, 74835, 22271, 17686, 15498,
22416, 14238, 14078, 16800, 54636, 121211, 68079, 59913, 70884,
61408, 13221, 14114, 12647, 13487), UNA = c(95.1499874, 95.0987263,
95.3942596, 95.5444865, 113.1263844, 112.3827424, 111.2684513,
113.2184128, 112.4336258, 114.1739588, 113.5086472, 111.6715378,
112.2842917, 111.9490612, 113.6465561, 111.5254103, 112.2179148,
111.2933853, 112.9056117, 113.1511475), UNUA = c(-94.4280737,
-94.5019854, -94.9246672, -95.0379578, -113.2247115, -112.3497485,
-111.1631387, -113.2051289, -112.1822898, -114.0431466, -113.7435412,
-111.6226818, -112.4077795, -111.9886653, -113.8072166, -111.6138577,
-113.0855995, -112.3075275, -114.2628431, -114.1088453), UP = c(10384,
13015, 24470, 6891, 13445, 12852, 13008, 13093, 9878, 14272,
14938, 77058, 40595, 32518, 39889, 21424, 8322, 8451, 7440, 8071
), UPA = c(58.6289931, 57.73430079, 61.3480343, 57.8297594, 62.1749994,
65.1140073, 62.619361, 63.6791219, 63.412582, 65.1856906, 45.18365794,
71.32918265, 56.04488913, 58.13008276, 53.16603128, 50.36242011,
64.6742956, 64.0982314, 63.4422878, 64.24099034), UPUA = c(88.9216885,
88.3012858, 88.1996008, 88.9910129, 91.0232669, 89.4524702, 91.9122816,
89.8549338, 90.6487273, 88.2063941, 99.9573821, 109.9128868,
103.7989926, 104.0274764, 103.4209936, 101.5065677, 85.8110039,
87.0786241, 86.1020646, 86.8835026)), .Names = c("Neg", "Pos",
"Num", "UN", "UNA", "UNUA", "UP", "UPA", "UPUA"), row.names = c(NA,
-20L), class = "data.frame")
dput(testLabels)
structure(list(Neg = c(0, 1, 1, 1, 0, 1, 1, 1, 1, 1), Pos = c(0,
1, 1, 1, 0, 1, 1, 1, 1, 1), Num = c(0, 1, 1, 1, 0, 1, 1, 1, 1,
1), UN = c(0, 1, 1, 1, 0, 1, 1, 1, 1, 1), UNA = c(0, 1, 1, 1,
0, 1, 1, 1, 1, 1), UNUA = c(0, 1, 1, 1, 0, 1, 1, 1, 1, 1), UP = c(0,
1, 1, 1, 0, 1, 1, 1, 1, 1), UPA = c(0, 1, 1, 1, 0, 1, 1, 1, 1,
1), UPUA = c(0, 1, 1, 1, 0, 1, 1, 1, 1, 1)), .Names = c("Neg",
"Pos", "Num", "UN", "UNA", "UNUA", "UP", "UPA", "UPUA"), row.names = c(NA,
-10L), class = "data.frame")
dput(testFeatures)
structure(list(Neg = c(51404, 32447, 24642, 95979, 15743, 90843,
13813, 11496, 12871, 13546), Pos = c(23350, 13525, 19941, 49984,
10867, 64404, 13324, 11302, 12918, 13118), Num = c(121342, 68160,
77219, 248890, 49259, 232645, 43707, 35674, 40734, 42979), UN = c(40766,
27363, 19590, 71772, 12615, 71496, 11529, 9739, 10810, 11346),
UNA = c(95.2486872, 93.4642772, 111.3853297, 112.6770471,
110.0845355, 113.6696598, 111.8409793, 116.0476022, 120.3481302,
111.9496978), UNUA = c(-94.6150698, -92.5605373, -111.1994432,
-112.4947319, -109.7130777, -113.8083912, -112.5678322, -116.5407619,
-121.4756386, -113.4991191), UP = c(14285, 9043, 14862, 31626,
7491, 43903, 7021, 5559, 6149, 6789), UPA = c(61.25585053,
62.6231081, 64.191128, 64.6397131, 63.4911744, 58.4792454,
63.5063289, 60.5667637, 60.3857056, 64.1569975), UPUA = c(88.4605419,
88.2790682, 90.0217465, 88.8441004, 91.0222662, 105.0494229,
85.8914139, 86.7685668, 84.8304901, 86.9786109)), .Names = c("Neg",
"Pos", "Num", "UN", "UNA", "UNUA", "UP", "UPA", "UPUA"), row.names = c(NA,
-10L), class = "data.frame")
So, I ran the code with all the data you provided and get the results just fine. Here is the model fit:
modTest = glm(trainLabels[,1] ~ Neg + Pos + Num +
UN + UNA + UNUA + UP +
UPA + UPUA, family=binomial(link='logit'),
data = trainFeatures)
Here are the predicted values on test data:
predict( modTest, testFeatures)
1 2 3 4 5 6 7 8
4.6711576 -1.3572345 -2.0639104 18.7625539 -7.6961149 0.4317324 -0.8983256 -8.2052158
9 10
-1.5968013 10.8357174
NOTE: an alternative specification can be like this:
modTest = glm(trainLabels[,1] ~ trainFeatures$Neg + trainFeatures$Pos +
trainFeatures$Num + trainFeatures$UN + trainFeatures$UNA +
trainFeatures$UNUA + trainFeatures$UP + trainFeatures$UPA +
trainFeatures$UPUA, family=binomial(link='logit'))
However, the fit model is as follows:
modTest$coefficients
(Intercept) trainFeatures$Neg trainFeatures$Pos trainFeatures$Num trainFeatures$UN
4.027803e+01 8.874801e-04 -3.000123e-03 1.277138e-04 -4.521793e-04
trainFeatures$UNA trainFeatures$UNUA trainFeatures$UP trainFeatures$UPA trainFeatures$UPUA
-1.519463e+01 -1.480503e+01 2.930261e-03 4.741432e-01 -3.690940e-01
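When you feed the test data to predict, this causes problems: the terms in the fitted model refer to the training data frame's columns directly (via $), so they do not match the column names in the new data. predict cannot find those terms in newdata, evaluates them against the 20 training rows in the workspace instead, and returns 20 predictions with a warning: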
predict( modTest, testFeatures)
1 2 3 4 5 6 7
0.21651890 3.23450117 -2.16298672 -0.06949967 -0.91026504 -0.91484739 -1.69209826
8 9 10 11 12 13 14
-2.45603982 -6.35855600 -1.84871546 -0.25027815 2.72625440 -0.50422297 -1.76701963
15 16 17 18 19 20
0.05033351 0.65101666 0.27680835 1.79176029 6.79618311 -0.16186455
Warning message:
'newdata' had 10 rows but variables found have 20 rows
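The same behaviour can be reproduced on a tiny made-up example. The data frames train and test and the model names good and bad are assumptions for illustration. When the formula uses bare column names together with data =, predict looks the variables up in newdata. When the formula hard-codes train$x, predict ignores newdata and reuses the training rows.

```r
# Made-up data: 20 training rows, 10 test rows.
train <- data.frame(x = 1:20, y = rep(c(0, 1), 10))
test <- data.frame(x = 21:30)

# Bare column names plus data = : newdata is matched by name.
good <- glm(y ~ x, family = binomial(link = "logit"), data = train)
n_good <- length(predict(good, newdata = test))

# Hard-coded train$x : newdata has no term called `train$x`, so the
# 20 training values are used again (a warning is raised, silenced here).
bad <- glm(train$y ~ train$x, family = binomial(link = "logit"))
n_bad <- length(suppressWarnings(predict(bad, newdata = test)))

c(good = n_good, bad = n_bad)
```

The as.matrix(trainFeatures) formula from the question presumably fails for the same reason: the term refers to an object in the workspace rather than to column names that newdata can supply.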