Residual errors from klaR stepwise LDA in R

I am trying to use the klaR package to do a stepwise analysis on spectral data. I have 400 variables (spectral readings) and about 40 factors (plants). Running the following code, I get the error below:
gw <- greedy.wilks(plant ~ ., data = logged, niveau = 0.1)
Error in summary.manova(e2, test = "Wilks") : residuals have rank 53 < 54
I thought I was getting this error because the variable data is highly correlated, hence I tried log-transforming. I ran the same code again, this time including qr = FALSE, and got the same error. I have searched for solutions and reasons for this error, and most point towards high correlations or the mismatch between the number of variables and factors. I would like to keep all the variables, because I am using this procedure for feature selection; deleting the highly correlated data isn't really an option, unless I have to.
Is there a valid way around this problem?
Data looks like this:
dput(str(logged))
'data.frame': 1020 obs. of 402 variables:
$ plant: Factor w/ 5 levels "ADPA","ALAL",..: 2 2 2 2 2 2 2 2 2 2 ...
$ R400 : num 0.147 0.144 0.145 0.141 0.129 ...
$ R405 : num 0.147 0.144 0.145 0.143 0.132 ...
$ R410 : num 0.142 0.138 0.141 0.139 0.129 ...
$ R415 : num 0.143 0.141 0.144 0.141 0.133 ...
$ R420 : num 0.142 0.141 0.143 0.142 0.133 ...
$ R425 : num 0.144 0.145 0.147 0.145 0.137 ...
$ R430 : num 0.147 0.147 0.149 0.147 0.14 ...
$ R435 : num 0.148 0.147 0.15 0.148 0.142 ...
$ R440 : num 0.15 0.149 0.152 0.15 0.143 ...
$ R445 : num 0.152 0.152 0.155 0.153 0.147 ...
$ R450 : num 0.155 0.154 0.156 0.154 0.149 ...
$ R455 : num 0.155 0.155 0.156 0.155 0.15 ...
$ R460 : num 0.156 0.155 0.157 0.155 0.151 ...
$ R465 : num 0.155 0.155 0.156 0.154 0.151 ...
$ R470 : num 0.156 0.155 0.157 0.155 0.152 ...
$ R475 : num 0.155 0.155 0.156 0.155 0.152 ...
$ R480 : num 0.155 0.155 0.157 0.155 0.152 ...
$ R485 : num 0.157 0.156 0.157 0.156 0.153 ...
$ R490 : num 0.159 0.158 0.159 0.158 0.156 ...
$ R495 : num 0.162 0.162 0.162 0.161 0.16 ...
$ R500 : num 0.17 0.169 0.169 0.168 0.168 ...
$ R505 : num 0.182 0.18 0.179 0.179 0.182 ...
$ R510 : num 0.203 0.201 0.197 0.199 0.206 ...
$ R515 : num 0.237 0.233 0.225 0.23 0.245 ...
$ R520 : num 0.281 0.274 0.261 0.27 0.296 ...
$ R525 : num 0.325 0.314 0.297 0.311 0.35 ...
$ R530 : num 0.36 0.345 0.324 0.343 0.394 ...
$ R535 : num 0.383 0.365 0.34 0.363 0.425 ...
$ R540 : num 0.396 0.376 0.349 0.375 0.445 ...
$ R545 : num 0.407 0.384 0.356 0.384 0.461 ...
$ R550 : num 0.412 0.389 0.359 0.389 0.47 ...
$ R555 : num 0.404 0.381 0.351 0.381 0.464 ...
$ R560 : num 0.383 0.361 0.333 0.362 0.443 ...
$ R565 : num 0.355 0.334 0.308 0.336 0.414 ...
$ R570 : num 0.323 0.304 0.281 0.306 0.378 ...
$ R575 : num 0.295 0.279 0.259 0.282 0.347 ...
$ R580 : num 0.275 0.261 0.244 0.265 0.324 ...
$ R585 : num 0.262 0.248 0.233 0.252 0.308 ...
$ R590 : num 0.253 0.24 0.226 0.244 0.299 ...
$ R595 : num 0.248 0.235 0.222 0.24 0.293 ...
$ R600 : num 0.242 0.23 0.217 0.234 0.285 ...
$ R605 : num 0.232 0.221 0.21 0.225 0.272 ...
$ R610 : num 0.219 0.209 0.2 0.214 0.255 ...
$ R615 : num 0.207 0.199 0.191 0.204 0.239 ...
$ R620 : num 0.199 0.192 0.186 0.197 0.229 ...
$ R625 : num 0.196 0.189 0.183 0.194 0.225 ...
$ R630 : num 0.194 0.187 0.182 0.192 0.223 ...
$ R635 : num 0.191 0.184 0.179 0.189 0.218 ...
$ R640 : num 0.184 0.177 0.173 0.181 0.208 ...
$ R645 : num 0.175 0.169 0.167 0.173 0.195 ...
$ R650 : num 0.167 0.163 0.162 0.166 0.183 ...
$ R655 : num 0.161 0.158 0.158 0.161 0.173 ...
$ R660 : num 0.153 0.152 0.154 0.155 0.161 ...
$ R665 : num 0.148 0.148 0.152 0.151 0.153 ...
$ R670 : num 0.147 0.148 0.152 0.15 0.151 ...
$ R675 : num 0.149 0.15 0.156 0.152 0.152 ...
$ R680 : num 0.154 0.156 0.162 0.157 0.158 ...
$ R685 : num 0.165 0.166 0.172 0.168 0.174 ...
$ R690 : num 0.196 0.193 0.195 0.199 0.221 ...
$ R695 : num 0.277 0.267 0.258 0.277 0.329 ...
$ R700 : num 0.42 0.401 0.378 0.415 0.501 ...
$ R705 : num 0.6 0.576 0.539 0.59 0.702 ...
$ R710 : num 0.791 0.764 0.719 0.778 0.901 ...
$ R715 : num 0.984 0.956 0.909 0.968 1.088 ...
$ R720 : num 1.17 1.14 1.1 1.15 1.26 ...
$ R725 : num 1.35 1.32 1.28 1.33 1.4 ...
$ R730 : num 1.49 1.47 1.44 1.47 1.52 ...
$ R735 : num 1.61 1.59 1.58 1.59 1.61 ...
$ R740 : num 1.7 1.68 1.68 1.68 1.68 ...
$ R745 : num 1.77 1.74 1.75 1.75 1.72 ...
$ R750 : num 1.81 1.79 1.8 1.79 1.75 ...
$ R755 : num 1.84 1.82 1.83 1.82 1.77 ...
$ R760 : num 1.85 1.83 1.85 1.84 1.78 ...
$ R765 : num 1.86 1.84 1.86 1.85 1.79 ...
$ R770 : num 1.87 1.85 1.86 1.85 1.79 ...
$ R775 : num 1.87 1.85 1.87 1.85 1.8 ...
$ R780 : num 1.87 1.85 1.87 1.86 1.8 ...
$ R785 : num 1.87 1.86 1.87 1.86 1.8 ...
$ R790 : num 1.88 1.86 1.87 1.86 1.8 ...
$ R795 : num 1.88 1.86 1.87 1.86 1.8 ...
$ R800 : num 1.88 1.86 1.87 1.86 1.8 ...
$ R805 : num 1.88 1.86 1.87 1.86 1.8 ...
$ R810 : num 1.88 1.86 1.87 1.86 1.8 ...
$ R815 : num 1.88 1.86 1.87 1.86 1.8 ...
$ R820 : num 1.88 1.86 1.87 1.86 1.81 ...
$ R825 : num 1.88 1.86 1.88 1.86 1.81 ...
$ R830 : num 1.88 1.86 1.88 1.86 1.81 ...
$ R835 : num 1.88 1.86 1.88 1.86 1.81 ...
$ R840 : num 1.88 1.86 1.88 1.87 1.81 ...
$ R845 : num 1.88 1.86 1.88 1.87 1.81 ...
$ R850 : num 1.88 1.87 1.88 1.87 1.81 ...
$ R855 : num 1.88 1.87 1.88 1.87 1.81 ...
$ R860 : num 1.89 1.87 1.88 1.87 1.81 ...
$ R865 : num 1.89 1.87 1.88 1.87 1.81 ...
$ R870 : num 1.89 1.87 1.88 1.87 1.81 ...
$ R875 : num 1.89 1.87 1.88 1.87 1.81 ...
$ R880 : num 1.89 1.87 1.89 1.87 1.82 ...
$ R885 : num 1.89 1.87 1.89 1.87 1.82 ...
[list output truncated]
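One hedged sketch of a workaround, assuming the rank deficiency comes from near-perfectly correlated neighbouring bands: drop one variable from each highly correlated pair before running greedy.wilks, e.g. with caret::findCorrelation (the 0.999 cutoff is an arbitrary choice for illustration, not a recommendation):
library(klaR)
library(caret)  # for findCorrelation()
# correlation-based pre-filter so the residual covariance matrix has full rank
predictors <- logged[, setdiff(names(logged), "plant")]
drop_idx <- findCorrelation(cor(predictors), cutoff = 0.999)  # assumed cutoff
reduced <- cbind(plant = logged$plant, predictors[, -drop_idx])
gw <- greedy.wilks(plant ~ ., data = reduced, niveau = 0.1)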

Related

Turn raster files (4-dimensional) into structure that allows to conduct a random forest classification

My goal is to conduct a random forest classification of agricultural land-use forms (crop classification). I have several ground-truth points for all classes. Furthermore, I have 37 raster files (.tif), each having the same 12 bands and the same extent, with one file representing one date in the time series. The time series is NOT constant.
The following shows the files, the dates and band names, plus the first file read with terra:
> files <- list.files("C:/temp/final2",full.names = T,pattern = ".tif$",recursive = T)
> files[1:3]
[1] "C:/temp/final2/20190322T100031_20190322T100524_T33UXP.tif" "C:/temp/final2/20190324T095029_20190324T095522_T33UXP.tif"
[3] "C:/temp/final2/20190329T095031_20190329T095315_T33UXP.tif"
> dates <- as.Date(substr(basename(files),1,8),"%Y%m%d")
> band_names <- c("B02","B03","B04","B05","B08","B11","B12","NDVI","NDWI","SAVI")
> rast(files[1])
class : SpatRaster
dimensions : 386, 695, 12 (nrow, ncol, nlyr)
resolution : 10, 10 (x, y)
extent : 634500, 641450, 5342460, 5346320 (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 33N (EPSG:32633)
source : 20190322T100031_20190322T100524_T33UXP.tif
names : B2, B3, B4, B5, B6, B7, ...
I want to extract the value for every date and band. This should result in a dataframe with observed variables and the respective class for each point (see below). With this dataframe I want to train a random forest model in order to predict the crop classes for each raster (resulting in a single raster layer with classes as values).
The following structure (copied from https://gdalcubes.github.io/source/tutorials/vignettes/gc03_ML_training_data.html) is what I need as observed values, which serve as the training data for the rf model.
## FID time B2 ... more bands ... and class of respective FID
## 1 16 2018-01-01 13.33
## 2 17 2018-01-01 13.63
## 3 18 2018-01-01 13.33
## 4 19 2018-01-01 12.15
## 5 20 2018-01-01 14.73
## 6 21 2018-01-01 15.91
## 7 16 2018-01-09 12.23
## 8 17 2018-01-09 12.15
## 9 18 2018-01-09 12.07
## 10 19 2018-01-09 10.19
## 11 20 2018-01-09 9.83
I (1) read all the rasters into a list called 'cube' and
(2) combined all the SpatRasters in the list into one SpatRaster.
> cube <- c()
> for (file in files){
+ ras <- rast(file)
+ cube<-c(cube,ras)
+ }
> names(cube) <- dates
> cubef <- rast(cube)
> cubef
class : SpatRaster
dimensions : 386, 695, 444 (nrow, ncol, nlyr)
resolution : 10, 10 (x, y)
extent : 634500, 641450, 5342460, 5346320 (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 33N (EPSG:32633)
sources : 20190322T100031_20190322T100524_T33UXP.tif (12 layers)
20190324T095029_20190324T095522_T33UXP.tif (12 layers)
20190329T095031_20190329T095315_T33UXP.tif (12 layers)
... and 34 more source(s)
names : 2019-03-22_1, 2019-03-22_2, 2019-03-22_3, 2019-03-22_4, 2019-03-22_5, 2019-03-22_6, ...
When I extract the values of all the layers for the sample points, I get the following result.
> s_points <- st_read(connex,query="SELECT * FROM s_points WHERE NOT ST_IsEmpty(geom);")
> str(s_points)
Classes ‘sf’ and 'data.frame': 286 obs. of 3 variables:
$ s_point_id: int 1 1 2 2 4 4 6 6 7 7 ...
$ kf_klasse : chr "ERBSEN - GETREIDE GEMENGE" "ERBSEN - GETREIDE GEMENGE" "ERBSEN - GETREIDE GEMENGE" "ERBSEN - GETREIDE GEMENGE" ...
$ geom :sfc_POINT of length 286; first list element: 'XY' num 637052 5345218
- attr(*, "sf_column")= chr "geom"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
..- attr(*, "names")= chr [1:2] "s_point_id" "kf_klasse"
> s_points_coords <- st_coordinates(s_points)
> e <- terra::extract(cubef, s_points)
> str(e)
'data.frame': 286 obs. of 445 variables:
$ ID : num 1 1 2 2 3 3 4 4 5 5 ...
$ 2019-03-22_1 : num 0.0789 0.0901 0.0587 0.063 0.0937 0.0901 0.0517 0.0528 0.0819 0.0882 ...
$ 2019-03-22_2 : num 0.096 0.1056 0.0728 0.0771 0.1072 ...
$ 2019-03-22_3 : num 0.108 0.1226 0.0734 0.0788 0.125 ...
$ 2019-03-22_4 : num 0.1301 0.1437 0.0998 0.1017 0.1395 ...
$ 2019-03-22_5 : num 0.166 0.174 0.157 0.151 0.156 ...
$ 2019-03-22_6 : num 0.183 0.188 0.174 0.163 0.169 ...
$ 2019-03-22_7 : num 0.196 0.196 0.183 0.169 0.186 ...
$ 2019-03-22_8 : num 0.27 0.293 0.171 0.172 0.282 ...
$ 2019-03-22_9 : num 0.236 0.269 0.138 0.142 0.252 ...
$ 2019-03-22_10: num 0.29 0.229 0.427 0.365 0.196 ...
$ 2019-03-22_11: num -0.343 -0.299 -0.43 -0.374 -0.268 ...
$ 2019-03-22_12: num 0.1353 0.1108 0.1739 0.1452 0.0928 ...
$ 2019-03-24_1 : num 0.099 0.1088 0.0919 NA 0.1058 ...
$ 2019-03-24_2 : num 0.111 0.115 0.11 NA 0.114 ...
$ 2019-03-24_3 : num 0.116 0.127 0.104 NA 0.131 ...
$ 2019-03-24_4 : num 0.145 0.154 0.147 NA 0.152 ...
$ 2019-03-24_5 : num 0.19 0.19 0.258 NA 0.171 ...
$ 2019-03-24_6 : num 0.208 0.21 0.294 NA 0.186 ...
$ 2019-03-24_7 : num 0.231 0.222 0.31 NA 0.197 ...
$ 2019-03-24_8 : num 0.318 0.341 0.281 NA 0.331 ...
$ 2019-03-24_9 : num 0.283 0.314 0.217 NA 0.305 ...
$ 2019-03-24_10: num 0.329 0.271 0.497 NA 0.202 ...
$ 2019-03-24_11: num -0.35 -0.317 -0.477 NA -0.268 ...
$ 2019-03-24_12: num 0.1698 0.1405 0.291 NA 0.0997 ...
$ 2019-03-29_1 : num NA NA 0.0476 NA 0.0891 0.0847 0.0664 0.0719 NA NA ...
$ 2019-03-29_2 : num NA NA 0.0642 NA 0.0965 ...
$ 2019-03-29_3 : num NA NA 0.0607 NA 0.1196 ...
$ 2019-03-29_4 : num NA NA 0.0904 NA 0.1351 ...
$ 2019-03-29_5 : num NA NA 0.162 NA 0.149 ...
$ 2019-03-29_6 : num NA NA 0.18 NA 0.167 ...
$ 2019-03-29_7 : num NA NA 0.182 NA 0.183 ...
$ 2019-03-29_8 : num NA NA 0.167 NA 0.337 ...
$ 2019-03-29_9 : num NA NA 0.125 NA 0.311 ...
$ 2019-03-29_10: num NA NA 0.5 NA 0.209 ...
$ 2019-03-29_11: num NA NA -0.479 NA -0.309 ...
$ 2019-03-29_12: num NA NA 0.1955 NA 0.0971 ...
$ 2019-04-01_1 : num 0.0616 0.0703 0.0543 0.0573 0.0733 0.0783 0.0675 0.0693 0.0557 0.0584 ...
$ 2019-04-01_2 : num 0.0742 0.0838 0.073 0.076 0.0849 0.0872 0.0783 0.0821 0.0733 0.073 ...
$ 2019-04-01_3 : num 0.0798 0.0945 0.066 0.0758 0.0987 ...
$ 2019-04-01_4 : num 0.101 0.114 0.104 0.106 0.116 ...
$ 2019-04-01_5 : num 0.144 0.143 0.205 0.188 0.129 ...
$ 2019-04-01_6 : num 0.157 0.157 0.231 0.209 0.143 ...
$ 2019-04-01_7 : num 0.17 0.165 0.249 0.214 0.153 ...
$ 2019-04-01_8 : num 0.24 0.259 0.208 0.212 0.275 ...
$ 2019-04-01_9 : num 0.207 0.232 0.152 0.168 0.256 ...
$ 2019-04-01_10: num 0.362 0.272 0.581 0.476 0.216 ...
$ 2019-04-01_11: num -0.393 -0.326 -0.547 -0.475 -0.287 ...
$ 2019-04-01_12: num 0.1449 0.1119 0.2783 0.2137 0.0871 ...
$ 2019-04-16_1 : num 0.0639 0.0695 0.0539 0.0541 0.0767 0.081 0.0754 0.0739 0.0606 0.0621 ...
$ 2019-04-16_2 : num 0.0733 0.0797 0.0717 0.07 0.0834 0.0862 0.0835 0.0854 0.0748 0.0785 ...
$ 2019-04-16_3 : num 0.0832 0.0923 0.0658 0.0626 0.1042 ...
$ 2019-04-16_4 : num 0.108 0.115 0.111 0.107 0.118 ...
$ 2019-04-16_5 : num 0.164 0.159 0.229 0.223 0.136 ...
$ 2019-04-16_6 : num 0.183 0.179 0.26 0.26 0.149 ...
$ 2019-04-16_7 : num 0.202 0.198 0.284 0.275 0.166 ...
$ 2019-04-16_8 : num 0.255 0.27 0.205 0.202 0.288 ...
$ 2019-04-16_9 : num 0.219 0.244 0.141 0.144 0.278 ...
$ 2019-04-16_10: num 0.416 0.364 0.623 0.63 0.23 ...
$ 2019-04-16_11: num -0.467 -0.426 -0.596 -0.595 -0.332 ...
$ 2019-04-16_12: num 0.1846 0.1638 0.3228 0.3181 0.0979 ...
$ 2019-04-18_1 : num 0.0702 0.0792 0.0636 0.063 0.0875 0.094 0.0858 0.0868 0.0662 0.0709 ...
$ 2019-04-18_2 : num 0.0838 0.0946 0.0898 0.0872 0.101 ...
$ 2019-04-18_3 : num 0.0908 0.1038 0.0785 0.0765 0.1206 ...
$ 2019-04-18_4 : num 0.121 0.13 0.13 0.125 0.138 ...
$ 2019-04-18_5 : num 0.186 0.183 0.266 0.253 0.154 ...
$ 2019-04-18_6 : num 0.213 0.205 0.299 0.289 0.167 ...
$ 2019-04-18_7 : num 0.221 0.214 0.312 0.297 0.186 ...
$ 2019-04-18_8 : num 0.275 0.294 0.228 0.228 0.314 ...
$ 2019-04-18_9 : num 0.227 0.255 0.154 0.157 0.296 ...
$ 2019-04-18_10: num 0.418 0.346 0.598 0.59 0.214 ...
$ 2019-04-18_11: num -0.45 -0.387 -0.553 -0.546 -0.297 ...
$ 2019-04-18_12: num 0.199 0.167 0.335 0.321 0.101 ...
$ 2019-04-21_1 : num 0.0404 0.0619 0.0373 0.0351 0.0814 0.0844 0.0764 0.0801 0.0563 0.0626 ...
$ 2019-04-21_2 : num 0.0592 0.0823 0.0614 0.0579 0.0927 0.0966 0.0933 0.0952 0.0776 0.0869 ...
$ 2019-04-21_3 : num 0.0542 0.0873 0.048 0.0433 0.1118 ...
$ 2019-04-21_4 : num 0.082 0.105 0.0933 0.0841 0.1279 ...
$ 2019-04-21_5 : num 0.15 0.163 0.225 0.207 0.144 ...
$ 2019-04-21_6 : num 0.173 0.184 0.259 0.247 0.155 ...
$ 2019-04-21_7 : num 0.174 0.199 0.274 0.251 0.172 ...
$ 2019-04-21_8 : num 0.192 0.237 0.168 0.156 0.291 ...
$ 2019-04-21_9 : num 0.1352 0.1804 0.0994 0.0903 0.2674 ...
$ 2019-04-21_10: num 0.525 0.391 0.702 0.706 0.213 ...
$ 2019-04-21_11: num -0.493 -0.415 -0.634 -0.625 -0.3 ...
$ 2019-04-21_12: num 0.1954 0.174 0.3422 0.3212 0.0941 ...
$ 2019-05-01_1 : num 0.0342 0.0435 0.0282 0.0292 0.07 0.0684 0.0722 0.0757 0.0458 0.061 ...
$ 2019-05-01_2 : num 0.0516 0.055 0.0517 0.048 0.0781 0.0793 0.0861 0.0919 0.0613 0.0839 ...
$ 2019-05-01_3 : num 0.0422 0.0538 0.0299 0.0325 0.0991 ...
$ 2019-05-01_4 : num 0.0753 0.0836 0.0761 0.0755 0.1112 ...
$ 2019-05-01_5 : num 0.182 0.177 0.247 0.235 0.124 ...
$ 2019-05-01_6 : num 0.21 0.203 0.3 0.287 0.138 ...
$ 2019-05-01_7 : num 0.214 0.19 0.314 0.293 0.157 ...
$ 2019-05-01_8 : num 0.164 0.182 0.148 0.146 0.264 ...
$ 2019-05-01_9 : num 0.0988 0.1156 0.0777 0.0763 0.235 ...
$ 2019-05-01_10: num 0.67 0.559 0.826 0.801 0.225 ...
$ 2019-05-01_11: num -0.611 -0.552 -0.717 -0.719 -0.334 ...
$ 2019-05-01_12: num 0.273 0.2196 0.4226 0.3935 0.0916 ...
$ 2019-05-26_1 : num 0.0537 0.0633 0.0431 0.0444 0.118 ...
$ 2019-05-26_2 : num 0.0675 0.0835 0.0611 0.0564 0.1284 ...
[list output truncated]
What I have now is a dataframe that has a column for every band of each image (12 columns per image), which results in 37 × 12 = 444 columns. From here on, I don't know how to add the extracted values to the s_points dataframe in order to have the ID and class name alongside the extracted values: simply adding them to s_points doesn't seem possible, because I have 444 values for every point.
My questions are:
How can I combine the extracted values and the sample_points?
How can I train a rf-model with this extracted data?
Does it make more sense to use a data cube here (gdalcubes in R)? I discarded this idea, mainly because of the irregular character of the time series, which would cause problems with the temporal aggregation; that isn't acceptable for the research question.
Thanks
You mention that you want a dataset with four dimensions. But how are you going to train your model and make predictions (you can only use two dimensions for that)? So it would seem to me that what you need is a three-dimensional SpatRaster, which you can make with
cube <- rast(files)
Unless you want to run a separate model for each file --- but then you should loop over the files.
Here is an example (taken from ?terra::predict) showing how you might then run a RandomForest, or any other regression or classification model.
library(terra)
library(randomForest)
# build a small 6-layer "cube" from the terra example raster
logo1 <- rast(system.file("ex/logo.tif", package="terra"))
logo2 <- sqrt(logo1)
cube <- c(logo1, logo2)
names(cube) <- c("red1", "green1", "blue1", "red2", "green2", "blue2")
# cell coordinates of presence (p) and absence (a) sample points
p <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85,
66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31,
22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2)
a <- matrix(c(22, 33, 64, 85, 92, 94, 59, 27, 30, 64, 60, 33, 31, 9,
99, 67, 15, 5, 4, 30, 8, 37, 42, 27, 19, 69, 60, 73, 3, 5, 21,
37, 52, 70, 74, 9, 13, 4, 17, 47), ncol=2)
# label (1/0) plus coordinates; extract predictor values at those coordinates
xy <- rbind(cbind(1, p), cbind(0, a))
e <- extract(cube, xy[,2:3])
v <- data.frame(cbind(pa=xy[,1], e))
# fit the model on the extracted table, then predict over the whole raster
rfm <- randomForest(formula=pa~., data=v)
p <- predict(cube, rfm)
Perhaps you can edit your question and explain why this would not work for you. And include a toy example of how you intend to fit your model. I suppose the rasters are your predictors, but what are you predicting (your y variable)? Is it constant or is it different for each time step (raster file)?
If the issue is that you want to distinguish between variables with the same names at different dates you can concatenate them. Something like this with SpatRaster x
names(x) <- paste0(names(x), "_", time(x))
If you want to write a single netCDF file you could do
sds <- rast(files)
writeCDF(sds, "test.nc")
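For the first sub-question in the post (attaching the point ID and class name to the extracted values), here is a minimal sketch; cubef, s_points and the kf_klasse column are taken from the question's own output, while the make.names() step and the na.omit() call are illustrative assumptions, not a tested recipe:
library(terra)
library(sf)
library(randomForest)
# make the layer names syntactic so they survive as data.frame column names
names(cubef) <- make.names(names(cubef))
# one row per point: 444 extracted values plus the class label from s_points
e <- terra::extract(cubef, vect(s_points))  # column 1 is the point ID
train <- data.frame(class = as.factor(s_points$kf_klasse), e[, -1])
# rows with NA (cloud-masked dates, points off coverage) are simply dropped here
rfm <- randomForest(class ~ ., data = na.omit(train))
# a single raster whose values are the predicted classes
classified <- predict(cubef, rfm, na.rm = TRUE)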

Error when applying comb_CSR function in R

My aim is to estimate a complete subset regression using the function comb_CSR() (from the ForecastComb package) in R.
My data set is the following:
str(df_na)
Classes ‘fredmd’ and 'data.frame': 360 obs. of 128 variables:
$ date : Date, format: "1992-03-01" "1992-04-01" "1992-05-01" "1992-06-01" ...
$ RPI : num 0.001653 0.00373 0.005329 0.004173 -0.000796 ...
$ W875RX1 : num 0.000812 0.002751 0.005493 0.004447 -0.001346 ...
$ DPCERA3M086SBEA: num 0.001824 0.000839 0.005146 0.002696 0.003342 ...
$ CMRMTSPLx : num 0.00402 0.00664 -0.00874 0.01049 0.0133 ...
$ RETAILx : num -0.003 0.00602 0.00547 0.0028 0.00708 ...
$ INDPRO : num 0.008279 0.007593 0.003221 0.000539 0.008911 ...
$ IPFPNSS : num 0.00851 0.00743 0.0055 -0.00244 0.00998 ...
$ IPFINAL : num 0.00899 0.0076 0.0058 -0.00309 0.01129 ...
$ IPCONGD : num 0.00911 0.00934 0.00648 -0.0049 0.01298 ...
$ IPDCONGD : num 0.0204 0.0185 0.0308 -0.0138 0.0257 ...
$ IPNCONGD : num 0.00518 0.00612 -0.00219 -0.00172 0.00843 ...
$ IPBUSEQ : num 0.01174 0.00958 0.00792 0.00247 0.01016 ...
$ IPMAT : num 0.007989 0.007794 0.000352 0.004296 0.007562 ...
$ IPDMAT : num 0.0113 0.00652 0.01044 0.00211 0.01118 ...
$ IPNMAT : num 0.014042 0.001707 -0.004866 0.010879 0.000204 ...
$ IPMANSICS : num 0.01014 0.00538 0.00579 0.00327 0.0089 ...
$ IPB51222S : num -0.00883 0.04244 -0.02427 -0.04027 0.00958 ...
$ IPFUELS : num 0.0048 0.00603 -0.00854 -0.00383 0.00329 ...
$ CUMFNS : num 0.6213 0.2372 0.2628 0.0569 0.5077 ...
$ HWI : num 140 -104 94 -36 -20 68 -91 55 98 3 ...
$ HWIURATIO : num 0.014611 -0.009559 -0.000538 -0.012463 0.003537 ...
$ CLF16OV : num 0.003116 0.001856 0.002172 0.00265 0.000809 ...
$ CE16OV : num 0.003315 0.002384 -0.000431 0.000372 0.00248 ...
$ UNRATE : num 0 0 0.2 0.2 -0.1 ...
$ UEMPMEAN : num 0.4 0.3 0.4 0.4 -0.1 ...
$ UEMPLT5 : num 0.0404 -0.0346 0.0361 0.0291 -0.023 ...
$ UEMP5TO14 : num -0.05285 0.00711 -0.01177 0.0075 0.00815 ...
$ UEMP15OV : num 0.00439 -0.02087 0.0956 0.08725 -0.03907 ...
$ UEMP15T26 : num -0.0365 -0.0321 0.0564 0.0966 -0.0762 ...
$ UEMP27OV : num 0.0386 -0.0119 0.1255 0.0804 -0.0122 ...
$ CLAIMSx : num -0.02914 -0.02654 -0.00203 0.00323 0.05573 ...
$ PAYEMS : num 0.000498 0.00142 0.001197 0.000607 0.000717 ...
$ USGOOD : num -0.000678 0.000226 0.000136 -0.001718 -0.001041 ...
$ CES1021000001 : num -0.00225 -0.00628 -0.00486 -0.01144 -0.00296 ...
$ USCONS : num 0.00195 -0.003903 0.000434 -0.004571 -0.003059 ...
$ MANEMP : num -0.001427 0.001546 0.000238 -0.000535 -0.000416 ...
$ DMANEMP : num -0.002104 0.000802 0.0002 -0.001304 -0.002009 ...
$ NDMANEMP : num -0.000439 0.00263 0.000292 0.000583 0.001893 ...
$ SRVPRD : num 0.0008 0.00173 0.00147 0.0012 0.00117 ...
$ USTPU : num 4.52e-05 4.97e-04 -6.78e-04 1.36e-04 -1.95e-03 ...
$ USWTRADE : num -0.000509 -0.002313 -0.002043 -0.001476 -0.003649 ...
$ USTRADE : num 1.56e-05 1.47e-03 -1.17e-04 5.22e-04 -1.68e-03 ...
$ USFIRE : num 0.000153 0.001379 0.001835 0.001222 -0.000153 ...
$ USGOVT : num 0.00139 0.001282 0.000747 0.00048 0.002927 ...
$ CES0600000007 : num 40.3 40.5 40.4 40.3 40.3 40.3 40.2 40.3 40.3 40.3 ...
$ AWOTMAN : num 0.1 0 0.2 -0.1 0 ...
$ AWHMAN : num 40.7 40.8 40.9 40.8 40.8 40.8 40.7 40.8 40.9 40.9 ...
$ HOUST : num 7.17 7 7.1 7.04 7.04 ...
$ HOUSTNE : num 4.92 4.81 4.82 4.74 4.81 ...
$ HOUSTMW : num 5.79 5.47 5.7 5.62 5.6 ...
$ HOUSTS : num 6.27 6.17 6.22 6.14 6.16 ...
$ HOUSTW : num 5.72 5.57 5.67 5.68 5.61 ...
$ PERMIT : num 6.99 6.96 6.96 6.96 6.99 ...
$ PERMITNE : num 4.81 4.77 4.83 4.8 4.86 ...
$ PERMITMW : num 5.58 5.49 5.55 5.49 5.53 ...
$ PERMITS : num 6.1 6.05 6.04 6.08 6.09 ...
$ PERMITW : num 5.52 5.59 5.54 5.55 5.59 ...
$ ACOGNO : num 0.04458 0.00165 0.02271 0.01092 -0.00382 ...
$ AMDMNOx : num 0.04682 0.03636 0.0108 -0.02403 -0.00199 ...
$ ANDENOx : num 0.0931 0.0104 0.0242 -0.0371 -0.0105 ...
$ AMDMUOx : num -0.00481 0.00194 0.00191 -0.00509 -0.00927 ...
$ BUSINVx : num 0.003853 0.003351 -0.000536 0.006642 0.005653 ...
$ ISRATIOx : num -0.03 0 -0.01 0 -0.01 ...
$ M1SL : num -0.003773 -0.004802 -0.000372 -0.003294 0.005502 ...
$ M2SL : num -0.004398 -0.002381 0.000911 -0.001208 0.001679 ...
$ M2REAL : num -0.00245 -0.0034 -0.00246 -0.00441 -0.00269 ...
$ BOGMBASE : num 0.01892 -0.02067 0.01167 0.00355 0.002 ...
$ TOTRESNS : num 0.0305 -0.1304 0.0784 0.0465 -0.0082 ...
$ NONBORRES : num 0.02896 -0.12317 0.06947 0.04605 -0.00826 ...
$ BUSLOANS : num 0.00237 -0.00104 0.00132 0.00173 -0.00106 ...
$ REALLN : num -0.00132 0.0058 -0.00663 -0.00338 0.00177 ...
$ NONREVSL : num -6.43e-05 -4.49e-03 3.65e-03 -2.72e-03 5.74e-03 ...
$ CONSPI : num -0.000498 -0.001173 -0.000825 -0.001014 -0.000115 ...
$ S&P 500 : num -0.012684 0.000123 0.018001 -0.015892 0.01647 ...
$ S&P: indust : num -0.01236 -0.000681 0.012694 -0.018013 0.010731 ...
$ S&P div yield : num 0.047815 -0.000371 -0.053946 0.047576 -0.04368 ...
$ S&P PE ratio : num 0.00689 0.02343 0.0377 -0.00193 0.02857 ...
$ FEDFUNDS : num -0.08 -0.25 0.09 -0.06 -0.51 ...
$ CP3Mx : num 0.19 -0.26 -0.16 0.04 -0.48 ...
$ TB3MS : num 0.2 -0.29 -0.12 0.03 -0.45 ...
$ TB6MS : num 0.25 -0.31 -0.12 0.02 -0.49 ...
$ GS1 : num 0.34 -0.33 -0.11 -0.02 -0.57 ...
$ GS5 : num 0.37 -0.17 -0.09 -0.21 -0.64 ...
$ GS10 : num 0.2 -0.06 -0.09 -0.13 -0.42 ...
$ AAA : num 0.06 -0.02 -0.05 -0.06 -0.15 ...
$ BAA : num 0.02 -0.04 -0.08 -0.08 -0.21 ...
$ COMPAPFFx : num 0.32 0.31 0.06 0.16 0.19 0.08 0.02 0.23 0.57 0.75 ...
$ TB3SMFFM : num 0.06 0.02 -0.19 -0.1 -0.04 -0.17 -0.31 -0.24 0.04 0.3 ...
$ TB6SMFFM : num 0.2 0.14 -0.07 0.01 0.03 -0.09 -0.26 -0.06 0.25 0.44 ...
$ T1YFFM : num 0.65 0.57 0.37 0.41 0.35 0.17 -0.04 0.2 0.59 0.79 ...
$ T5YFFM : num 2.97 3.05 2.87 2.72 2.59 2.3 2.16 2.5 2.95 3.16 ...
$ T10YFFM : num 3.56 3.75 3.57 3.5 3.59 3.29 3.2 3.49 3.78 3.85 ...
$ AAAFFM : num 4.37 4.6 4.46 4.46 4.82 4.65 4.7 4.89 5.01 5.06 ...
$ BAAFFM : num 5.27 5.48 5.31 5.29 5.59 5.35 5.4 5.74 5.87 5.89 ...
$ TWEXAFEGSMTHx : num 0.02529 -0.00399 -0.01238 -0.02224 -0.02363 ...
$ EXSZUSx : num 0.036 0.0066 -0.0191 -0.0451 -0.0655 ...
$ EXJPUSx : num 0.03964 0.00508 -0.02095 -0.03056 -0.00755 ...
$ EXUSUKx : num -0.0308 0.0188 0.0297 0.0249 0.0332 ...
[list output truncated]
- attr(*, "na.action")= 'omit' Named int [1:402] 1 2 3 4 5 6 7 8 9 10 ...
..- attr(*, "names")= chr [1:402] "2" "3" "4" "5" ...
The code is the following so far:
#define response variable
y <- df_na$PAYEMS
#define matrix of predictor variables
x <- data.matrix(df_na[, !names(df_na) %in% c("PAYEMS", "date")])
# break data into in-sample and out-of-sample
y.in = y[1:190]; y.out = y[-c(1:190)]
x.in = x[1:190, ]; x.out = x[-c(1:190), ]
trial <- foreccomb(y.in, x.in, y.out, x.out)
result <- comb_CSR(trial)
However, as soon as I run the last line, I get the following error:
> result <- comb_CSR(trial)
Error in matrix(0, ndiff_models, 4) :
invalid 'nrow' value (too large or NA)
In addition: Warning message:
In matrix(0, ndiff_models, 4) : NAs introduced by coercion to integer range
The data set does not have any NA values as I get rid of them beforehand. Unfortunately, I do not understand where the error comes from. Does anyone have an idea?
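A hedged guess at the cause, not a confirmed diagnosis: complete subset regression averages over all subsets of a given size, and with ~126 predictors the subset counts dwarf R's integer range, so ndiff_models plausibly becomes NA when coerced inside matrix(0, ndiff_models, 4). A quick back-of-the-envelope check:
choose(126, 63)        # ~ 6e36 subsets of size 63
.Machine$integer.max   # 2147483647, i.e. ~ 2.1e9
# if this is the cause, shrinking the predictor set (e.g. a pre-screened
# subset of columns) would be the obvious way to keep the enumeration tractable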

Merge 2 list of lists in R [duplicate]

This question already has answers here:
Combining elements of list of lists by index
(3 answers)
in r combine a list of lists into one list
(3 answers)
Closed 4 years ago.
I have 2 lists of lists in R with the same list names, as follows:
str(total_delta_final[[1]])
List of 4
$ sector1_T02 :'data.frame': 24 obs. of 3 variables:
..$ DeltaF_1: num [1:24] 0.737 0.737 0.693 0.738 0.738 ...
..$ DeltaF_2: num [1:24] 0.24 0.24 0.279 0.239 0.239 ...
..$ DeltaF_3: num [1:24] 0.0233 0.0233 0.0275 0.0232 0.0232 ...
$ sector2_T03 :'data.frame': 24 obs. of 3 variables:
..$ DeltaF_1: num [1:24] 0.582 0.582 0.568 0.69 0.69 ...
..$ DeltaF_2: num [1:24] 0.377 0.377 0.39 0.282 0.282 ...
..$ DeltaF_3: num [1:24] 0.0406 0.0406 0.0426 0.0278 0.0278 ...
$ sector3_T03 :'data.frame': 24 obs. of 3 variables:
..$ DeltaF_1: num [1:24] 0.607 0.607 0.495 0.409 0.375 ...
..$ DeltaF_2: num [1:24] 0.356 0.356 0.451 0.519 0.544 ...
..$ DeltaF_3: num [1:24] 0.0373 0.0373 0.0541 0.072 0.0809 ...
$ sector12_T02:'data.frame': 24 obs. of 3 variables:
..$ DeltaF_1: num [1:24] 0.743 0.743 0.758 0.689 0.705 ...
..$ DeltaF_2: num [1:24] 0.234 0.234 0.22 0.283 0.269 ...
..$ DeltaF_3: num [1:24] 0.0226 0.0226 0.0213 0.028 0.0263 ...
> str(total_TI_final[[1]])
List of 4
$ sector1_T02 :'data.frame': 24 obs. of 3 variables:
..$ I_1: num [1:24] NA 0.0756 0.083 0.0799 0.0799 ...
..$ I_2: num [1:24] 0.122 NA 0.163 0.172 0.172 ...
..$ I_3: num [1:24] 0.212 0.211 NA 0.266 0.273 ...
$ sector2_T03 :'data.frame': 24 obs. of 3 variables:
..$ I_1: num [1:24] NA 0.0986 0.1013 0.1011 0.101 ...
..$ I_2: num [1:24] 0.15 NA 0.184 0.211 0.211 ...
..$ I_3: num [1:24] 0.249 0.249 NA 0.331 0.337 ...
$ sector3_T03 :'data.frame': 24 obs. of 3 variables:
..$ I_1: num [1:24] NA 0.119 0.115 0.113 0.105 ...
..$ I_2: num [1:24] 0.193 NA 0.2 0.193 0.177 ...
..$ I_3: num [1:24] 0.323 0.323 NA 0.277 0.256 ...
$ sector12_T02:'data.frame': 24 obs. of 3 variables:
..$ I_1: num [1:24] NA 0.0825 0.0681 0.0723 0.0706 ...
..$ I_2: num [1:24] 0.138 NA 0.146 0.145 0.144 ...
..$ I_3: num [1:24] 0.24 0.24 NA 0.22 0.226 ...
How could I merge these 2 lists of lists so that my final output pairs total_TI_final[[1]][1] with total_delta_final[[1]][1], then total_TI_final[[1]][2] with total_delta_final[[1]][2], and so on?
We can use Map
Map(c, total_delta_final, total_TI_final)
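Map() calls c() on the corresponding elements of the two lists, so each top-level element ends up holding the data frames from both inputs, matched by position. A toy illustration with hypothetical two-element lists:
l1 <- list(g1 = list(a = 1), g2 = list(a = 2))
l2 <- list(g1 = list(b = 3), g2 = list(b = 4))
merged <- Map(c, l1, l2)
merged$g1  # list(a = 1, b = 3): the elements of both inputs, side by side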

transform all columns that are NAs from numeric to factors

I have a data.table called td.br.2, in which some columns are entirely NA. These columns are of type numeric. What I would like to do is transform only these columns to factors.
I have tried the following, but it does not work (I do not get an error, but it does not do the job either):
td.br.2[] <- td.br.2[, lapply(.SD, function(x) { ifelse(sum(is.na(x) == nrow(td.br.2)), as.factor(x), x) })]
library(data.table)
n <- 10  # number of rows
m <- 10  # number of columns
N <- n * m
m1 <- matrix(runif(N), nrow = n, ncol = m)
dt <- data.table(m1)
names(dt) <- letters[1:m]
dt <- cbind(dt, xxx = rep(NA, nrow(dt)))  # adding an all-NA column
At this point
str(dt)
Classes ‘data.table’ and 'data.frame': 10 obs. of 11 variables:
$ a : num 0.661 0.864 0.152 0.342 0.989 ...
$ b : num 0.06036 0.67587 0.00847 0.37674 0.30417 ...
$ c : num 0.3938 0.6274 0.0514 0.882 0.1568 ...
$ d : num 0.777 0.233 0.619 0.117 0.132 ...
$ e : num 0.655 0.926 0.277 0.598 0.237 ...
$ f : num 0.649 0.197 0.547 0.585 0.685 ...
$ g : num 0.6877 0.3676 0.009 0.6975 0.0327 ...
$ h : num 0.519 0.705 0.457 0.465 0.966 ...
$ i : num 0.43777 0.00961 0.30224 0.58172 0.37621 ...
$ j : num 0.44 0.481 0.485 0.125 0.263 ...
$ xxx: logi NA NA NA NA NA NA ...
So by executing:
dt <- dt[, lapply(.SD, function(x) { if (all(is.na(x))) as.factor(as.character(x)) else x })]
yields:
str(dt)
Classes ‘data.table’ and 'data.frame': 10 obs. of 11 variables:
$ a : num 0.0903 0.0448 0.5956 0.418 0.1316 ...
$ b : num 0.672 0.582 0.687 0.113 0.371 ...
$ c : num 0.404 0.16 0.848 0.863 0.737 ...
$ d : num 0.073 0.129 0.243 0.334 0.285 ...
$ e : num 0.485 0.186 0.539 0.486 0.784 ...
$ f : num 0.4685 0.4815 0.585 0.3596 0.0764 ...
$ g : num 0.958 0.194 0.549 0.71 0.737 ...
$ h : num 0.168 0.355 0.552 0.765 0.605 ...
$ i : num 0.665 0.88 0.23 0.575 0.413 ...
$ j : num 0.1113 0.8797 0.1244 0.0741 0.8724 ...
$ xxx: Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA
I am not sure why you would want to do that, but here you are:
naColumns <- sapply(td.br.2, function(x) all(is.na(x)))
for (col in which(naColumns))
  set(td.br.2, j = col, value = as.factor(td.br.2[[col]]))
The factors will have no levels, but you can deal with that as necessary.
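Applied to the toy dt built above (a sketch, with dt standing in for td.br.2):
naColumns <- sapply(dt, function(x) all(is.na(x)))
for (col in which(naColumns))
  set(dt, j = col, value = as.factor(dt[[col]]))
str(dt$xxx)  # Factor w/ 0 levels: NA NA NA ...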

remove entire rows when data is above 4 stdev

I have a dataset that I need to clean by removing rows that contain values above 4 standard deviations. I need to delete rows where one or more of the columns (2:18) has a value above 4 standard deviations. What is the best way to do this?
'data.frame': 154940 obs. of 19 variables:
$ msec: int 0 170 340 500 670 840 1010 1180 1340 1510 ...
$ a412: num 0.0607 0.0584 0.0644 0.0607 0.0577 ...
$ a440: num 0.0697 0.0649 0.0706 0.0706 0.0649 ...
$ a488: num 0.0663 0.0633 0.0653 0.0673 0.0653 ...
$ a510: num 0.466 0.459 0.44 0.462 0.445 ...
$ a532: num 0.453 0.444 0.45 0.454 0.444 ...
$ a555: num 0.428 0.424 0.436 0.426 0.428 ...
$ a650: num 0.0839 0.0839 0.0839 0.0891 0.0865 ...
$ a676: num 0.0963 0.0954 0.0963 0.1 0.0991 ...
$ a715: num 0.0899 0.0912 0.0893 0.0887 0.0887 ...
$ c412: num 0.343 0.337 0.342 0.346 0.344 ...
$ c440: num 0.341 0.343 0.344 0.353 0.348 ...
$ c488: num 0.33 0.335 0.337 0.345 0.34 ...
$ c510: num 0.081 0.0802 0.0794 0.0794 0.081 ...
$ c532: num 0.0594 0.0606 0.0582 0.057 0.0594 ...
$ c555: num 0.067 0.0633 0.0615 0.0633 0.0689 ...
$ c650: num 0.562 0.56 0.565 0.571 0.556 ...
$ c676: num 0.549 0.552 0.551 0.55 0.537 ...
$ c715: num 0.487 0.481 0.481 0.489 0.473 ...
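A minimal base-R sketch of one common reading of "above 4 stdev" (a value more than 4 standard deviations from its column mean); df is assumed to hold the data frame shown above, and the column range 2:18 is taken from the question:
# scale() returns column-wise z-scores: (x - mean(x)) / sd(x)
z <- scale(df[, 2:18])
# keep only the rows where every selected column is within 4 SDs
keep <- apply(abs(z) <= 4, 1, all)
df_clean <- df[keep, ]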
