Reading in columns without clear separators with read.table

Hello, I'm loading a data file formatted as a table separated by multiple spaces. Ordinarily it loads easily via read.table(data_file, sep = "", header = T, fill = T), but some values are not divided by spaces when they are negative:
523.2 -166.1 1.62 0.079 0.0 0.0 0.0 2260 0
528.4 -168.6 -0.71-0.034 0.0 0.0 0.0 2284 0
533.9 -169.7 -1.75-0.085 0.0 0.0 0.0 2308 0
538.4 -169.5 -1.60-0.078 0.0 0.0 0.0 2333 0
543.3 -170.8 -2.83-0.137 0.0 0.0 0.0 2357 0
548.2 -171.8 -3.77-0.183 0.0 0.0 0.0 2381 0
552.8 -172.1 -3.87-0.187 0.0 0.0 0.0 2406 0
554.9 -172.5 -4.23-0.205 0.0 0.0 0.0 2430 0
Then the whole chunk, e.g. -3.77-0.183, is read as a single value.
What is a convenient way to deal with this without converting the file beforehand with other scripts?
Thanks in advance!

One way would be:
lines <- readLines("datN.txt") #read your data using `readLines`
lines1 <- gsub("(?<=[0-9])((-|\\s)[0-9]+)", " \\1", lines, perl=TRUE)
dat <- read.table(text=lines1, sep="", header=FALSE)
dat
# V1 V2 V3 V4 V5 V6 V7 V8 V9
#1 523.2 -166.1 1.62 0.079 0 0 0 2260 0
#2 528.4 -168.6 -0.71 -0.034 0 0 0 2284 0
#3 533.9 -169.7 -1.75 -0.085 0 0 0 2308 0
#4 538.4 -169.5 -1.60 -0.078 0 0 0 2333 0
#5 543.3 -170.8 -2.83 -0.137 0 0 0 2357 0
#6 548.2 -171.8 -3.77 -0.183 0 0 0 2381 0
#7 552.8 -172.1 -3.87 -0.187 0 0 0 2406 0
#8 554.9 -172.5 -4.23 -0.205 0 0 0 2430 0
str(dat)
#'data.frame': 8 obs. of 9 variables:
#$ V1: num 523 528 534 538 543 ...
#$ V2: num -166 -169 -170 -170 -171 ...
#$ V3: num 1.62 -0.71 -1.75 -1.6 -2.83 -3.77 -3.87 -4.23
#$ V4: num 0.079 -0.034 -0.085 -0.078 -0.137 -0.183 -0.187 -0.205
#$ V5: num 0 0 0 0 0 0 0 0
#$ V6: num 0 0 0 0 0 0 0 0
#$ V7: num 0 0 0 0 0 0 0 0
#$ V8: int 2260 2284 2308 2333 2357 2381 2406 2430
#$ V9: int 0 0 0 0 0 0 0 0
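The lookbehind above splits any number that follows a digit. If the only defect is a missing space before a minus sign, a narrower substitution (an alternative sketch, not part of the original answer) does the same job:

```r
lines <- readLines("datN.txt")
# insert a space before every "-" that directly follows a digit;
# minus signs already preceded by whitespace are left alone
lines2 <- gsub("(?<=[0-9])-", " -", lines, perl = TRUE)
dat <- read.table(text = lines2, sep = "", header = FALSE)
```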

If the file is well formatted from a fixed-field perspective, then:
data <- read.fwf("fixed.dat", widths = c(6, 9, 10, 6, 12, 9, 9, 7, 9))
data
## V1 V2 V3 V4 V5 V6 V7 V8 V9
## 1 523.2 -166.1 1.62 0.079 0 0 0 2260 0
## 2 528.4 -168.6 -0.71 -0.034 0 0 0 2284 0
## 3 533.9 -169.7 -1.75 -0.085 0 0 0 2308 0
## 4 538.4 -169.5 -1.60 -0.078 0 0 0 2333 0
## 5 543.3 -170.8 -2.83 -0.137 0 0 0 2357 0
## 6 548.2 -171.8 -3.77 -0.183 0 0 0 2381 0
## 7 552.8 -172.1 -3.87 -0.187 0 0 0 2406 0
## 8 554.9 -172.5 -4.23 -0.205 0 0 0 2430 0
might work.
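An equivalent with readr, sketched under the assumption that the same fixed layout holds (the widths must still be known in advance):

```r
library(readr)
# fwf_widths() converts a vector of field widths into column positions
data <- read_fwf("fixed.dat", fwf_widths(c(6, 9, 10, 6, 12, 9, 9, 7, 9)))
```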

Related

Difference in values between rows after group_by

I want to calculate, within each group, the difference between each row's values and the previous row's. However, I am getting this error:
Error in mutate():
! Problem while computing ..1 = across(where(is.numeric), diff).
ℹ The error occurred in group 1: vs = 0
Caused by error in across():
! Problem while computing column mpg.
Caused by error in dplyr_internal_error():
Run rlang::last_error() to see where the error occurred.
Here is what I have tried:
mtcars %>% group_by(vs) %>% mutate(across(where(is.numeric), diff))
This seems to do the trick:
library(tidyverse) # for %>% and unnest()
mtcars %>% group_by(vs) %>% aggregate(. ~ vs, data=., diff) %>% as.data.frame() %>% unnest()
# A tibble: 30 × 11
vs mpg cyl disp hp drat wt qsec am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 0 0 0 0 0 0.255 0.560 0 0 0
2 0 -2.3 2 200 65 -0.75 0.565 0 -1 -1 -2
3 0 -4.4 0 0 70 0.0600 0.130 -1.18 0 0 2
4 0 2.10 0 -84.2 -65 -0.140 0.500 1.56 0 0 -1
5 0 0.900 0 0 0 0 -0.340 0.200 0 0 0
6 0 -2.10 0 0 0 0 0.0500 0.400 0 0 0
7 0 -4.8 0 196. 25 -0.140 1.47 -0.0200 0 0 1
8 0 0 0 -12 10 0.0700 0.174 -0.160 0 0 0
9 0 4.3 0 -20 15 0.23 -0.0790 -0.400 0 0 0
10 0 0.800 0 -122 -80 -0.47 -1.82 -0.550 0 0 -2
# … with 20 more rows
You could explicitly define the calculation using lag. Or you could do this in base R:
library(tidyverse)
#tidyverse
mtcars %>%
group_by(vs) %>%
mutate(across(where(is.numeric), ~.-lag(., default = first(.)))) |>
arrange(vs)
#> # A tibble: 32 x 11
#> # Groups: vs [2]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 0 0 0 0 0 0 0 0 0 0
#> 2 0 0 0 0 0 0.255 0.560 0 0 0 0
#> 3 -2.3 2 200 65 -0.75 0.565 0 0 -1 -1 -2
#> 4 -4.4 0 0 70 0.0600 0.130 -1.18 0 0 0 2
#> 5 2.10 0 -84.2 -65 -0.140 0.500 1.56 0 0 0 -1
#> 6 0.900 0 0 0 0 -0.340 0.200 0 0 0 0
#> 7 -2.10 0 0 0 0 0.0500 0.400 0 0 0 0
#> 8 -4.8 0 196. 25 -0.140 1.47 -0.0200 0 0 0 1
#> 9 0 0 -12 10 0.0700 0.174 -0.160 0 0 0 0
#> 10 4.3 0 -20 15 0.23 -0.0790 -0.400 0 0 0 0
#> # ... with 22 more rows
#base R
by(mtcars, mtcars$vs, \(x) apply(x, 2, diff)) |>
do.call(what = rbind.data.frame)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 0.Mazda RX4 Wag 0.0 0 0.0 0 0.00 0.255 0.56 0 0 0 0
#> 0.Hornet Sportabout -2.3 2 200.0 65 -0.75 0.565 0.00 0 -1 -1 -2
#> 0.Duster 360 -4.4 0 0.0 70 0.06 0.130 -1.18 0 0 0 2
#> 0.Merc 450SE 2.1 0 -84.2 -65 -0.14 0.500 1.56 0 0 0 -1
#> 0.Merc 450SL 0.9 0 0.0 0 0.00 -0.340 0.20 0 0 0 0
#> 0.Merc 450SLC -2.1 0 0.0 0 0.00 0.050 0.40 0 0 0 0
#> 0.Cadillac Fleetwood -4.8 0 196.2 25 -0.14 1.470 -0.02 0 0 0 1
#> 0.Lincoln Continental 0.0 0 -12.0 10 0.07 0.174 -0.16 0 0 0 0
#> 0.Chrysler Imperial 4.3 0 -20.0 15 0.23 -0.079 -0.40 0 0 0 0
#> 0.Dodge Challenger 0.8 0 -122.0 -80 -0.47 -1.825 -0.55 0 0 0 -2
#> 0.AMC Javelin -0.3 0 -14.0 0 0.39 -0.085 0.43 0 0 0 0
#> 0.Camaro Z28 -1.9 0 46.0 95 0.58 0.405 -1.89 0 0 0 2
#> 0.Pontiac Firebird 5.9 0 50.0 -70 -0.65 0.005 1.64 0 0 0 -2
#> 0.Porsche 914-2 6.8 -4 -279.7 -84 1.35 -1.705 -0.35 0 1 2 0
#> 0.Ford Pantera L -10.2 4 230.7 173 -0.21 1.030 -2.20 0 0 0 2
#> 0.Ferrari Dino 3.9 -2 -206.0 -89 -0.60 -0.400 1.00 0 0 0 2
#> 0.Maserati Bora -4.7 2 156.0 160 -0.08 0.800 -0.90 0 0 0 2
#> 1.Hornet 4 Drive -1.4 2 150.0 17 -0.77 0.895 0.83 0 -1 -1 0
#> 1.Valiant -3.3 0 -33.0 -5 -0.32 0.245 0.78 0 0 0 0
#> 1.Merc 240D 6.3 -2 -78.3 -43 0.93 -0.270 -0.22 0 0 1 1
#> 1.Merc 230 -1.6 0 -5.9 33 0.23 -0.040 2.90 0 0 0 0
#> 1.Merc 280 -3.6 2 26.8 28 0.00 0.290 -4.60 0 0 0 2
#> 1.Merc 280C -1.4 0 0.0 0 0.00 0.000 0.60 0 0 0 0
#> 1.Fiat 128 14.6 -2 -88.9 -57 0.16 -1.240 0.57 0 1 0 -3
#> 1.Honda Civic -2.0 0 -3.0 -14 0.85 -0.585 -0.95 0 0 0 1
#> 1.Toyota Corolla 3.5 0 -4.6 13 -0.71 0.220 1.38 0 0 0 -1
#> 1.Toyota Corona -12.4 0 49.0 32 -0.52 0.630 0.11 0 -1 -1 0
#> 1.Fiat X1-9 5.8 0 -41.1 -31 0.38 -0.530 -1.11 0 1 1 0
#> 1.Lotus Europa 3.1 0 16.1 47 -0.31 -0.422 -2.00 0 0 1 1
#> 1.Volvo 142E -9.0 0 25.9 -4 0.34 1.267 1.70 0 0 -1 0
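For completeness, the same grouped differencing can be sketched with data.table (my own translation of the base R approach, not from the answer above):

```r
library(data.table)
dt <- as.data.table(mtcars)
# diff() drops one row per group, mirroring the by()/apply() result
dt[, lapply(.SD, diff), by = vs]
```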

PCA in R: Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x'

My dataframe contains about 60 observations and 15 variables. They are a mixture of continuous and binary data, but I've made sure all of the variables are numeric and have no NA values (I used na.omit). I used is.finite and is.na to check for infinite/NA values. Using the function prcomp() on my dataframe still tells me "infinite or missing values in 'x'". What might I be overlooking? I am new to R, just started learning, so I appreciate the help.
Some of my columns are still characters and I am not sure how to change them; using gsub and then as.numeric still gives me an error.
library(readxl)
Pca_for_R <- read_excel("~/Pca for R.xlsx")
sapply(Pca_for_R, as.numeric)
new <- gsub(",", "", Pca_for_R)
Mypca <- prcomp(Pca_for_R, center=TRUE, scale=TRUE)
Sample of my data:
CRT GFA MRT VA Contast Myp Hyp eso exo VFV VFH
[1,] 247 2.71 1283 0.63 2.50 0 0 1 0 0 0
[2,] 226 2.06 442 1.00 1.50 0 0 0 0 0 0
[3,] 251 2.16 420 1.00 1.25 0 0 0 0 0 0
[4,] 202 3.02 282 0.80 1.25 0 1 0 0 0 0
[5,] 252 2.17 640 0.50 1.50 0 0 0 0 0 0
[6,] 260 2.25 857 0.40 1.50 0 1 0 0 1 0
[7,] 255 2.51 736 0.63 1.20 0 1 1 0 0 0
[8,] 242 1.90 353 1.00 1.20 0 0 1 0 0 0
[9,] 206 1.90 292 0.80 1.20 1 0 0 0 0 0
[10,] 515 3.04 376 0.25 1.20 0 0 0 0 1 0
[11,] 222 2.13 424 0.80 10.00 0 1 0 0 1 0
[12,] 292 1.70 326 0.50 1.25 0 1 0 0 0 0
[13,] 207 2.55 427 1.00 2.50 0 1 0 0 0 0
[14,] 242 1.89 387 0.63 1.20 0 0 0 0 0 0
[15,] 205 1.86 341 1.00 2.50 0 0 0 1 0 0
[16,] 250 3.01 728 0.40 1.20 1 0 0 0 0 0
[17,] 269 3.51 410 0.50 6.00 1 0 0 0 0 1
[18,] 271 2.17 592 0.63 1.20 1 0 0 1 0 0
[19,] 264 1.52 235 0.63 1.20 0 0 0 0 0 1
[20,] 381 4.63 628 0.80 1.25 0 0 1 0 0 0
[21,] 342 3.35 422 0.30 2.50 0 0 1 0 0 0
[22,] 219 3.75 372 0.40 1.50 1 0 1 0 0 0
[23,] 306 3.35 564 0.40 3.00 0 0 0 0 0 0
[24,] 253 3.94 592 0.63 1.50 0 1 0 1 0 0
[25,] 268 2.13 387 1.00 1.25 0 0 0 0 0 0
[26,] 346 2.16 345 0.50 2.50 0 1 0 1 0 0
[27,] 289 1.79 370 0.50 1.50 0 0 0 0 0 0
[28,] 362 1.91 616 1.00 2.50 1 0 0 0 0 1
[29,] 321 3.65 791 0.50 5.00 0 0 0 0 0 0
[30,] 497 2.64 516 0.80 5.00 0 0 0 0 0 0
[31,] 291 2.52 900 1.00 5.00 0 0 0 0 0 0
[32,] 176 2.94 376 1.00 1.20 0 1 0 1 0 0
[33,] 192 2.00 336 0.32 2.00 0 1 0 0 0 0
[34,] 207 2.05 340 1.00 1.20 0 1 0 0 0 0
[35,] 331 2.05 480 0.80 1.20 0 1 0 0 0 0
[36,] 238 2.33 550 1.00 1.50 0 1 0 0 0 0
[37,] 205 4.32 554 0.63 5.00 0 1 0 0 0 0
[38,] 300 1.55 499 1.00 2.50 1 0 0 0 0 0
[39,] 374 2.92 687 1.00 5.00 0 0 0 0 0 0
[40,] 243 3.43 735 0.40 2.50 0 0 0 1 0 0
[41,] 221 2.39 489 0.50 1.25 0 0 0 0 0 0
[42,] 177 1.88 249 1.25 1.25 0 0 0 0 0 0
[43,] 377 3.35 581 0.50 5.00 0 0 0 0 0 0
[44,] 285 2.28 459 0.30 25.00 0 0 0 0 0 0
[45,] 230 2.17 438 1.00 1.80 0 1 0 0 0 0
[46,] 183 2.34 344 1.00 1.80 0 1 1 0 1 0
[47,] 245 1.63 418 0.50 1.25 0 1 1 0 0 0
[48,] 235 1.89 514 0.60 4.00 0 0 0 0 0 0
[49,] 179 2.89 525 0.30 4.00 1 0 1 0 0 0
[50,] 187 1.47 313 0.16 5.00 0 1 0 0 0 0
[51,] 243 2.48 331 0.63 3.00 1 0 0 0 1 0
[52,] 289 1.79 370 0.80 1.50 0 0 0 0 0 0
[53,] 287 2.80 569 0.60 6.00 0 1 0 1 0 0
[54,] 271 1.61 337 0.80 1.65 0 0 1 0 0 0
[55,] 198 1.70 429 0.80 1.25 0 0 0 0 0 0
[56,] 246 2.65 516 0.50 5.00 1 0 1 0 0 0
[57,] 318 2.16 746 0.25 8.00 0 0 0 1 0 0
[58,] 238 1.61 355 0.80 1.25 0 0 0 1 0 0
[59,] 268 2.13 387 0.32 1.50 0 0 0 0 0 1
[60,] 272 2.41 406 0.80 1.25 0 1 1 0 0 0
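One likely culprit, sketched as a guess from the code shown: gsub() applied to a whole data frame flattens it to a character vector, and the sapply() result is never assigned back. Converting column by column instead (object and column names taken from the question):

```r
# clean and convert every column in place; columns that are already
# numeric pass through gsub() and as.numeric() unchanged
Pca_for_R[] <- lapply(Pca_for_R, function(col) as.numeric(gsub(",", "", col)))
stopifnot(all(sapply(Pca_for_R, is.numeric)))
Mypca <- prcomp(Pca_for_R, center = TRUE, scale. = TRUE)
```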

Error message using glmer function "Error in pwrssUpdate"

I'm trying to fit a generalized linear mixed model to explain the presence/absence of a species using 30 fixed environmental variables and 2 random effects ("Location" and "Season"). My data looks like this:
str(glmm_data)
'data.frame': 209 obs. of 40 variables:
$ CODE : Factor w/ 209 levels "VAL1_1","VAL1_2",..: 1 72 142 170 176 183 190 197 203 8 ...
$ Location : Factor w/ 32 levels "ALMENARA","ARES 1",..: 10 11 12 15 17 2 3 4 21 18 ...
$ Season : Factor w/ 7 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ PO4 : num -1.301 -1.301 -1.301 0.437 -1.301 ...
$ NO2 : num -1.129 -1.629 -0.781 -1.699 -1.654 ...
$ NO3 : num 1.044 0.115 1.918 1.457 1.467 ...
$ NH4 : num 0.0123 -0.014 -1.301 -0.2772 -1.301 ...
$ ChlA : num 0.341 0.117 0.87 -0.699 1.53 ...
$ Secchi : num 29 23 10 17 20 9 22 25 25 24 ...
$ Temp_w : num 5.4 3.2 10.3 10.5 4.7 7.2 8 9.2 4.6 6.9 ...
$ Conductivity : num 2.74 2.52 2.76 2.36 2.66 ...
$ Oxi_conc : num 11.6 9.2 7.04 9.99 7 ...
$ Hydroperiod : int 0 0 0 0 1 0 1 0 0 0 ...
$ Rain : int 1 1 1 1 1 1 1 1 1 1 ...
$ RainFre : int 0 0 0 0 0 0 0 0 0 0 ...
$ Veg_flo : num 0 0 0 0 0 0 0 0 0 0 ...
$ Veg_emg : num 0.735 0.524 0.226 0.685 0.226 ...
$ Depth_max : num 1.64 1.57 1.18 1.11 1.85 ...
$ Agricultural : num 0 0 0 0 0 ...
$ LowGrass : num 0 0.41 0.766 0 0.856 ...
$ Forest : num 1.097 1.161 0.44 1.05 0.502 ...
$ Buildings : num 0 0 0 0 0 ...
$ Heterogeneity : num 0.512 0.437 1.028 0.559 0.98 ...
$ Morphology : num 0.04519 -0.00115 0.01556 0.00771 0.12125 ...
$ Fish : int 0 0 0 0 0 0 0 0 0 0 ...
$ TempRange : num 1.4 1.4 1.4 1.4 1.4 ...
$ Tavg : num 1.03 1 1.03 1.03 1 ...
$ Precipitation : num 2.8 2.82 2.8 2.81 2.8 ...
$ MatOrg : num 0.264 0.257 0.236 0.251 0.313 ...
$ CO3 : num 0.14 0.163 0.222 0.335 0.306 ...
$ PC1 : num -0.132 -0.186 -0.074 0.127 -0.175 ...
$ PC2 : num -0.0729 0.0568 -0.0428 -0.0688 -0.0464 ...
$ PC3 : num -0.00638 0.01857 0.02817 -0.00918 0.02056 ...
$ Alytes_obstetricans : int 0 0 0 0 0 0 1 0 0 0 ...
$ Bufo_spinosus : int 0 0 0 0 0 0 0 0 0 0 ...
$ Epidalea_calamita : int 0 0 0 0 0 0 0 0 0 0 ...
$ Pelobates_cultripes : int 0 0 0 0 0 0 0 0 0 0 ...
$ Pelodytes_hespericus: int 1 0 0 0 0 0 0 0 0 0 ...
$ Pelophylax_perezi : int 0 0 0 0 1 0 1 0 0 0 ...
$ Pleurodeles_waltl : int 0 0 0 0 0 0 0 0 0 0 ...
PS: if anyone knows a better way to show my data please explain, I'm a noob at this.
The last 7 columns are the response variables, namely presence (1) or absence (0) of said species so my response variables are binomial. I'm using the glmer function from the lme4 package.
I'm trying to create a model for each species. So the first one looks like this:
Aly_Obs_GLMM <- glmer(Alytes_obstetricans ~ PO4 + NO2 + NO3 + NH4 + ChlA +
Secchi + Temp_w + Conductivity + Oxi_conc + Hydroperiod + Rain + RainFre +
Veg_flo + Veg_emg + Depth_max + Agricultural + LowGrass + Forest + Buildings +
Heterogeneity + Morphology + Fish + TempRange + Tavg + Precipitation +
MatOrg + CO3 + PC1 + PC2 + PC3 + (1|Location) + (1|Season), family = binomial,
data = glmm_data
)
However, when running the code, I get the following error message:
Error in pwrssUpdate(pp, resp, tol = tolPwrss, GQmat = GHrule(0L),
compDev = compDev, : Downdated VtV is not positive definite
and the model is not created.
Any ideas on what I may be doing wrong? Thanks!
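"Downdated VtV is not positive definite" often points to (nearly) collinear fixed effects, and 30 predictors for 209 observations leaves little slack. A hedged diagnostic sketch (variable names taken from the str() output above) is to check whether the fixed-effects design matrix is rank deficient:

```r
# build the fixed-effects design matrix, excluding the ID, the two
# random-effect grouping factors, and the seven response columns
X <- model.matrix(~ . - CODE - Location - Season - Alytes_obstetricans -
                    Bufo_spinosus - Epidalea_calamita - Pelobates_cultripes -
                    Pelodytes_hespericus - Pelophylax_perezi - Pleurodeles_waltl,
                  data = glmm_data)
# TRUE here means some predictors are linear combinations of others
qr(X)$rank < ncol(X)
```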

How to transform numbers of a variable that are read as a factor so they are read as numbers? [duplicate]

This question already has answers here:
How to convert a factor to integer\numeric without loss of information?
(12 answers)
Closed 4 years ago.
First of all, I'm new at R; I'm just learning.
I have a data frame and I want to make some plots and graphics with two variables. One of these variables is read as a factor, but it contains real numbers. The variable is a percentage that I want to plot against some municipalities. How can I transform these numbers to numeric values?
I've tried the following code because the guide I'm reading says to convert factors to numeric with the function as.numeric(), but the result is totally different numbers.
for example
#the data frame is valle.abu2
valle.abu2$Porcentaje.de.Excedencias
#then
as.numeric(valle.abu2$Porcentaje.de.Excedencias)
valle.abu2$Porcentaje.de.Excedencias
[1] 1.3 0.04 1.6 0 0 0 0.31 0.61 0 2.31 3.6 8.04 0 7.18 0 5.88 1.35 0
[19] 2.56 0 3.2 0 0 0 0 0 0.05 0.32 0 5.23 0 0 0 0 0 0
[37] 0 5.42 5.54 11.44 0 2.51 0 4.88 0 3.45 0 2.78 2.7 0 4.39 0 0 0
[55] 0 3.99 3.42 6.01 0 5.52 0 0.04 0 0.46 0.34 0 4.63 0 14.65 2.91 5.9 4.17
[73] 0 0 0 0 0 0 1.15 1.52 9.17 2.22 3.82 0 0 0 0 7.04 3.57 12.5
[91] 0 0 0 0.72 1.32 0 9.88 2.63 0 0 0 0 0 0 37.57
134 Levels: 0 0.03 0.04 0.05 0.06 0.07 0.09 0.1 0.11 0.14 0.15 0.23 0.27 0.29 0.31 0.32 0.33 0.34 0.42
as.numeric(valle.abu2$Porcentaje.de.Excedencias)
[1] 42 3 48 1 1 1 15 25 1 69 92 129 1 127 1 120 44 1 71 1 86 1 1 1 1 1 4
[28] 16 1 115 1 1 1 1 1 1 1 116 118 59 1 70 1 108 1 90 1 75 73 1 103 1 1 1
[55] 1 97 89 122 1 117 1 3 1 21 18 1 104 1 64 77 121 101 1 1 1 1 1 1 39 47 131
[82] 67 96 1 1 1 1 126 91 60 1 1 1 28 43 1 134 72 1 1 1 1 1 1 98
Try:
as_numeric_factor <- function(x){
as.numeric(levels(x))[x]
}
as_numeric_factor(valle.abu2$Porcentaje.de.Excedencias)
Explanation.
The help page ?factor section Warning includes two different ways of doing what the question asks for, and states that one of them is more efficient.
To transform a factor f to approximately its original numeric values,
as.numeric(levels(f))[f] is recommended and slightly more efficient than
as.numeric(as.character(f)).
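A tiny reproduction of the pitfall and of both fixes:

```r
f <- factor(c("1.3", "0.04", "2.56"))
as.numeric(f)                # 2 1 3  -- the internal level codes
as.numeric(levels(f))[f]     # 1.30 0.04 2.56 -- the original values
as.numeric(as.character(f))  # same values, slightly slower
```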
Here is a simple speed test. set.seed is not needed, since the results of interest are the timings, not the computations.
library(microbenchmark)
library(ggplot2)
as_numeric_factor2 <- function(x){
as.numeric(as.character(x))
}
f <- factor(rnorm(1e4))
mb <- microbenchmark(
levl = as_numeric_factor(f),
char = as_numeric_factor2(f)
)
autoplot(mb)

R Error Invalid type (list) for variable

I am importing a MATLAB file and constructing a data frame. The MATLAB file contains two columns, and each row holds a cell containing a matrix. I construct a data frame to run a random forest, but I am getting the following error.
Error in model.frame.default(formula = expert_data_frame$t_labels ~ ., :
invalid type (list) for variable 'expert_data_frame$t_labels'
Here is the code I use to import the MATLAB file and construct the data frame:
all_exp_traintest <- readMat(all_exp_filepath);
len = length(all_exp_traintest$exp.traintest)/2;
for (i in 1:len) {
expert_train_df <- data.frame(all_exp_traintest$exp.traintest[i]);
labels = data.frame(all_exp_traintest$exp.traintest[i+302]);
names(labels)[1] <- "t_labels";
expert_train_df$t_labels <- labels;
expert_data_frame <- data.frame(expert_train_df);
rf_model = randomForest(expert_data_frame$t_labels ~., data=expert_data_frame, importance=TRUE, do.trace=100);
}
Structure of the MATLAB input file:
[56x12 double] [56x1 double]
[62x12 double] [62x1 double]
[62x12 double] [62x1 double]
[62x12 double] [62x1 double]
[62x12 double] [62x1 double]
[74x12 double] [74x1 double]
> str(all_exp_traintest)
List of 1
$ exp.traintest:List of 604
..$ NA: num [1:56, 1:12] 0 0 0 0 8 1 1 0 0 0 ...
..$ NA: num [1:62, 1:12] 2 10 11 13 5 10 13 8 11 8 ...
..$ NA: num [1:62, 1:12] 0 0 1 0 0 0 0 0 1 1 ...
..$ NA: num [1:62, 1:12] 4 2 1 3 3 20 6 3 2 2 ...
..$ NA: num [1:62, 1:12] 2731 2362 2937 1229 1898 ...
..$ NA: num [1:74, 1:12] 27 33 34 38 33 35 36 35 47 46 ...
..$ NA: num [1:74, 1:12] 106 79 99 94 153 104 146 105 125 146 ...
..$ NA: num [1:74, 1:12] 3 9 3 0 1 26 0 4 0 0 ...
..$ NA: num [1:51, 1:12] 5 7 3 30 0 0 0 0 0 0 ...
..$ NA: num [1:66, 1:12] 0 0 13 0 0 3 2 2 0 2 ...
..$ NA: num [1:73, 1:12] 1 0 1 0 0 0 2 1 2 5 ...
..$ NA: num [1:73, 1:12] 23 14 20 14 24 22 32 61 84 278 ...
..$ NA: num [1:75, 1:12] 1 7 0 1 2 3 3 0 16 10 ...
..$ NA: num [1:90, 1:12] 10 7 8 15 25 12 37 31 18 48 ...
..$ NA: num [1:90, 1:12] 0 6 3 1 5 7 8 6 1 1 ...
..$ NA: num [1:90, 1:12] 0 1 1 2 0 4 9 6 3 4 ...
..$ NA: num [1:90, 1:12] 6 0 5 27 11 50 22 8 10 4 ...
..$ NA: num [1:90, 1:12] 3 9 13 12 4 0 5 0 5 0 ...
..$ NA: num [1:90, 1:12] 1 0 1 0 1 2 1 0 1 2 ...
..$ NA: num [1:90, 1:12] 3395 3400 3360 3770 3533 ...
..$ NA: num [1:84, 1:12] 0 0 0 0 5 0 0 5 4 2 ...
..$ NA: num [1:80, 1:12] 2 3 3 3 4 28 61 26 8 1 ...
..$ NA: num [1:81, 1:12] 4 28 22 9 16 43 80 21 19 18 ...
..$ NA: num [1:76, 1:12] 1 0 0 1 49 64 60 230 222 267 ...
..$ NA: num [1:76, 1:12] 4786 4491 2510 1144 2071 ...
..$ NA: num [1:76, 1:12] 80 128 254 109 114 267 152 139 368 363 ...
..$ NA: num [1:76, 1:12] 1 5 8 2 14 5 3 13 8 2 ...
..$ NA: num [1:76, 1:12] 10 3 8 79 4 4 11 30 2 0 ...
..$ NA: num [1:68, 1:12] 0 0 2 0 0 2 6 0 0 4 ...
..$ NA: num [1:68, 1:12] 1 4 5 2 2 3 3 1 3 0 ...
..$ NA: num [1:68, 1:12] 0 0 1 0 0 0 0 0 0 1 ...
..$ NA: num [1:69, 1:12] 39 45 2 0 1 4 3 0 13 0 ...
..$ NA: num [1:69, 1:12] 0 4 6 0 0 4 1 6 10 1 ...
..$ NA: num [1:69, 1:12] 0 2 5 2 2 2 0 0 3 6 ...
..$ NA: num [1:69, 1:12] 3 0 1 1 1 4 7 5 5 1 ...
..$ NA: num [1:66, 1:12] 5 0 0 0 0 0 0 1 3 5 ...
..$ NA: num [1:66, 1:12] 4 3 3 0 0 4 0 0 0 0 ...
..$ NA: num [1:65, 1:12] 0 0 1 0 0 0 5 8 4 1 ...
..$ NA: num [1:65, 1:12] 0 5 6 0 2 0 0 1 1 2 ...
..$ NA: num [1:69, 1:12] 0 16 5 1 14 0 1 0 0 16 ...
..$ NA: num [1:69, 1:12] 0 0 0 0 0 25 2 3 0 0 ...
..$ NA: num [1:64, 1:12] 2 0 0 0 0 0 0 0 0 0 ...
..$ NA: num [1:42, 1:12] 0 0 0 0 0 0 0 0 0 0 ...
..$ NA: num [1:67, 1:12] 0 2 4 10 15 4 1 43 1 7 ...
..$ NA: num [1:63, 1:12] 32 6 12 5 92 8 29 7 21 20 ...
..$ NA: num [1:63, 1:12] 2 5 12 8 10 13 6 11 10 14 ...
..$ NA: num [1:63, 1:12] 3 5 10 9 0 1 8 13 2 14 ...
..$ NA: num [1:54, 1:12] 0 0 14 0 0 0 0 0 0 1 ...
..$ NA: num [1:82, 1:12] 152 99 63 57 105 44 28 33 43 49 ...
..$ NA: num [1:81, 1:12] 0 1 0 0 0 0 0 0 0 0 ...
..$ NA: num [1:75, 1:12] 0 1 3 0 0 0 0 0 0 0 ...
..$ NA: num [1:75, 1:12] 1 0 0 2 0 1 0 0 0 0 ...
..$ NA: num [1:75, 1:12] 1 6 5 5 3 8 1 3 1 0 ...
..$ NA: num [1:72, 1:12] 0 0 0 0 1 0 1 2 0 0 ...
..$ NA: num [1:62, 1:12] 310 91 4 4 9 0 0 1 0 0 ...
..$ NA: num [1:62, 1:12] 239 374 1060 599 805 808 139 150 490 326 ...
..$ NA: num [1:49, 1:12] 9 18 10 12 19 5 13 10 2 3 ...
..$ NA: num [1:61, 1:12] 2 0 0 0 1 0 0 0 0 0 ...
..$ NA: num [1:61, 1:12] 4 10 16 15 8 14 10 23 11 5 ...
..$ NA: num [1:61, 1:12] 0 1 4 4 5 3 0 1 1 1 ...
..$ NA: num [1:65, 1:12] 165 100 177 65 148 58 188 55 59 62 ...
..$ NA: num [1:65, 1:12] 13 0 0 2 2 3 0 0 0 0 ...
..$ NA: num [1:66, 1:12] 157 58 101 92 15 21 73 80 78 75 ...
..$ NA: num [1:66, 1:12] 8 6 1 0 6 2 2 6 10 9 ...
..$ NA: num [1:87, 1:12] 1 2 5 6 8 3 3 3 2 3 ...
..$ NA: num [1:83, 1:12] 0 0 0 0 0 0 2 13 0 0 ...
..$ NA: num [1:81, 1:12] 0 0 1 0 3 5 3 0 2 7 ...
..$ NA: num [1:81, 1:12] 33 81 94 30 5 36 16 90 121 182 ...
..$ NA: num [1:81, 1:12] 10 11 16 6 0 0 0 1 0 0 ...
..$ NA: num [1:81, 1:12] 7 0 0 2 1 3 1 4 0 0 ...
..$ NA: num [1:81, 1:12] 1 0 5 0 2 3 1 0 1 1 ...
..$ NA: num [1:95, 1:12] 30 160 116 130 444 515 225 135 108 175 ...
..$ NA: num [1:95, 1:12] 12 1 0 10 3 3 0 4 0 0 ...
..$ NA: num [1:95, 1:12] 1 0 0 0 3 3 1 0 0 0 ...
..$ NA: num [1:95, 1:12] 11 42 61 23 41 56 81 6 83 82 ...
..$ NA: num [1:95, 1:12] 1 2 5 3 6 4 2 8 28 1 ...
..$ NA: num [1:95, 1:12] 283 192 377 216 207 261 394 262 262 554 ...
..$ NA: num [1:94, 1:12] 0 0 0 0 0 0 0 0 0 0 ...
..$ NA: num [1:72, 1:12] 0 0 0 0 0 0 0 0 0 0 ...
..$ NA: num [1:72, 1:12] 5 3 0 2 13 27 6 2 12 36 ...
..$ NA: num [1:72, 1:12] 0 2 2 0 1 0 1 4 2 2 ...
..$ NA: num [1:72, 1:12] 0 0 1 0 3 1 0 4 1 0 ...
..$ NA: num [1:67, 1:12] 27 7 18 1 2 0 0 0 0 0 ...
..$ NA: num [1:67, 1:12] 10 2 1 10 7 0 0 1 1 4 ...
..$ NA: num [1:67, 1:12] 14 17 9 20 13 20 18 13 10 7 ...
..$ NA: num [1:64, 1:12] 0 0 0 0 4 0 0 0 3 0 ...
..$ NA: num [1:64, 1:12] 3 0 1 0 2 7 13 14 4 2 ...
..$ NA: num [1:64, 1:12] 0 0 0 0 0 0 0 0 2 0 ...
..$ NA: num [1:72, 1:12] 59 61 55 120 49 202 325 244 377 551 ...
..$ NA: num [1:72, 1:12] 0 0 0 0 0 0 0 0 1 0 ...
..$ NA: num [1:72, 1:12] 0 3 1 0 1 0 0 0 4 0 ...
..$ NA: num [1:72, 1:12] 5 12 6 9 15 10 15 27 15 9 ...
..$ NA: num [1:72, 1:12] 7 0 3 0 0 1 1 1 1 0 ...
..$ NA: num [1:72, 1:12] 0 0 0 0 89 0 19 3 3 2 ...
..$ NA: num [1:61, 1:12] 5 3 5 3 3 29 46 140 49 24 ...
..$ NA: num [1:63, 1:12] 23 0 0 0 0 60 7 73 13 19 ...
..$ NA: num [1:95, 1:12] 7 96 28 2 9 5 8 190 166 1 ...
..$ NA: num [1:95, 1:12] 0 0 1 1 0 0 0 0 0 0 ...
..$ NA: num [1:95, 1:12] 4 0 2 6 6 11 6 5 6 9 ...
.. [list output truncated]
- attr(*, "header")=List of 3
..$ description: chr "MATLAB 5.0 MAT-file, Platform: MACI64, Created on: Sun Dec 9 17:35:24 2012 "
..$ version : chr "5"
..$ endian : chr "little"
After loading the matlab file into R
all_exp_traintest$exp.traintest[1]
$<NA>
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 0 0.0 0.00 0.000 0.5000 0.03125 0.015625 0.0000000 0.00000000 0.000000000 0.0000000000 0.0000000000
[2,] 0 0.0 0.00 1.000 0.0625 0.03125 0.000000 0.0000000 0.00000000 0.000000000 0.0000000000 0.0000000000
[3,] 0 0.0 2.00 0.125 0.0625 0.00000 0.000000 0.0000000 0.00000000 0.000000000 0.0000000000 0.0000000000
[4,] 0 4.0 0.25 0.125 0.0000 0.00000 0.000000 0.0000000 0.00000000 0.000000000 0.0000000000 0.0009765625
[5,] 8 0.5 0.25 0.000 0.0000 0.00000 0.000000 0.0000000 0.00000000 0.000000000 0.0019531250 0.0000000000
[6,] 1 0.5 0.00 0.000 0.0000 0.00000 0.000000 0.0000000 0.00000000 0.003906250 0.0000000000 0.0004882812
[7,] 1 0.0 0.00 0.000 0.0000 0.00000 0.000000 0.0000000 0.00781250 0.000000000 0.0009765625 0.0009765625
[8,] 0 0.0 0.00 0.000 0.0000 0.00000 0.000000 0.0156250 0.00000000 0.001953125 0.0019531250 0.0000000000
[9,] 0 0.0 0.00 0.000 0.0000 0.00000 0.031250 0.0000000 0.00390625 0.003906250 0.0000000000 0.0004882812
[10,] 0 0.0 0.00 0.000 0.0000 0.06250 0.000000 0.0078125 0.00781250 0.000000000 0.0009765625 0.0000000000
[11,] 0 0.0 0.00 0.000 0.1250 0.00000 0.015625 0.0156250 0.00000000 0.001953125 0.0000000000 0.0000000000
[12,] 0 0.0 0.00 0.250 0.0000 0.03125 0.031250 0.0000000 0.00390625 0.000000000 0.0000000000 0.0004882812
[13,] 0 0.0 0.50 0.000 0.0625 0.06250 0.000000 0.0078125 0.00000000 0.000000000 0.0009765625 0.0000000000
[14,] 0 1.0 0.00 0.125 0.1250 0.00000 0.015625 0.0000000 0.00000000 0.001953125 0.0000000000 0.0024414062
[15,] 2 0.0 0.25 0.250 0.0000 0.03125 0.000000 0.0000000 0.00390625 0.000000000 0.0048828125 0.0014648438
[16,] 0 0.5 0.50 0.000 0.0625 0.00000 0.000000 0.0078125 0.00000000 0.009765625 0.0029296875 0.0039062500
[17,] 1 1.0 0.00 0.125 0.0000 0.00000 0.015625 0.0000000 0.01953125 0.005859375 0.0078125000 0.0151367188
[18,] 2 0.0 0.25 0.000 0.0000 0.03125 0.000000 0.0390625 0.01171875 0.015625000 0.0302734375 0.0019531250
[19,] 0 0.5 0.00 0.000 0.0625 0.00000 0.078125 0.0234375 0.03125000 0.060546875 0.0039062500 0.0029296875
[20,] 1 0.0 0.00 0.125 0.0000 0.15625 0.046875 0.0625000 0.12109375 0.007812500 0.0058593750 0.0253906250
[21,] 0 0.0 0.25 0.000 0.3125 0.09375 0.125000 0.2421875 0.01562500 0.011718750 0.0507812500 0.0253906250
[22,] 0 0.5 0.00 0.625 0.1875 0.25000 0.484375 0.0312500 0.02343750 0.101562500 0.0507812500 0.0063476562
[23,] 1 0.0 1.25 0.375 0.5000 0.96875 0.062500 0.0468750 0.20312500 0.101562500 0.0126953125 0.0009765625
[24,] 0 2.5 0.75 1.000 1.9375 0.12500 0.093750 0.4062500 0.20312500 0.025390625 0.0019531250 0.0000000000
[25,] 5 1.5 2.00 3.875 0.2500 0.18750 0.812500 0.4062500 0.05078125 0.003906250 0.0000000000 0.0019531250
[26,] 3 4.0 7.75 0.500 0.3750 1.62500 0.812500 0.1015625 0.00781250 0.000000000 0.0039062500 0.0029296875
[27,] 8 15.5 1.00 0.750 3.2500 1.62500 0.203125 0.0156250 0.00000000 0.007812500 0.0058593750 0.0009765625
[28,] 31 2.0 1.50 6.500 3.2500 0.40625 0.031250 0.0000000 0.01562500 0.011718750 0.0019531250 0.0000000000
[29,] 4 3.0 13.00 6.500 0.8125 0.06250 0.000000 0.0312500 0.02343750 0.003906250 0.0000000000 0.0083007812
[30,] 6 26.0 13.00 1.625 0.1250 0.00000 0.062500 0.0468750 0.00781250 0.000000000 0.0166015625 0.0000000000
[31,] 52 26.0 3.25 0.250 0.0000 0.12500 0.093750 0.0156250 0.00000000 0.033203125 0.0000000000 0.0048828125
[32,] 52 6.5 0.50 0.000 0.2500 0.18750 0.031250 0.0000000 0.06640625 0.000000000 0.0097656250 0.0034179688
[33,] 13 1.0 0.00 0.500 0.3750 0.06250 0.000000 0.1328125 0.00000000 0.019531250 0.0068359375 0.0229492188
[34,] 2 0.0 1.00 0.750 0.1250 0.00000 0.265625 0.0000000 0.03906250 0.013671875 0.0458984375 0.0297851562
[35,] 0 2.0 1.50 0.250 0.0000 0.53125 0.000000 0.0781250 0.02734375 0.091796875 0.0595703125 0.0771484375
[36,] 4 3.0 0.50 0.000 1.0625 0.00000 0.156250 0.0546875 0.18359375 0.119140625 0.1542968750 0.0004882812
[37,] 6 1.0 0.00 2.125 0.0000 0.31250 0.109375 0.3671875 0.23828125 0.308593750 0.0009765625 0.0000000000
[38,] 2 0.0 4.25 0.000 0.6250 0.21875 0.734375 0.4765625 0.61718750 0.001953125 0.0000000000 0.0048828125
[39,] 0 8.5 0.00 1.250 0.4375 1.46875 0.953125 1.2343750 0.00390625 0.000000000 0.0097656250 0.0000000000
[40,] 17 0.0 2.50 0.875 2.9375 1.90625 2.468750 0.0078125 0.00000000 0.019531250 0.0000000000 0.0000000000
[41,] 0 5.0 1.75 5.875 3.8125 4.93750 0.015625 0.0000000 0.03906250 0.000000000 0.0000000000 0.0000000000
[42,] 10 3.5 11.75 7.625 9.8750 0.03125 0.000000 0.0781250 0.00000000 0.000000000 0.0000000000 0.0004882812
[43,] 7 23.5 15.25 19.750 0.0625 0.00000 0.156250 0.0000000 0.00000000 0.000000000 0.0009765625 0.0078125000
[44,] 47 30.5 39.50 0.125 0.0000 0.31250 0.000000 0.0000000 0.00000000 0.001953125 0.0156250000 0.0000000000
[45,] 61 79.0 0.25 0.000 0.6250 0.00000 0.000000 0.0000000 0.00390625 0.031250000 0.0000000000 0.0000000000
[46,] 158 0.5 0.00 1.250 0.0000 0.00000 0.000000 0.0078125 0.06250000 0.000000000 0.0000000000 0.0004882812
[47,] 1 0.0 2.50 0.000 0.0000 0.00000 0.015625 0.1250000 0.00000000 0.000000000 0.0009765625 0.0000000000
[48,] 0 5.0 0.00 0.000 0.0000 0.03125 0.250000 0.0000000 0.00000000 0.001953125 0.0000000000 0.0000000000
[49,] 10 0.0 0.00 0.000 0.0625 0.50000 0.000000 0.0000000 0.00390625 0.000000000 0.0000000000 0.0000000000
[50,] 0 0.0 0.00 0.125 1.0000 0.00000 0.000000 0.0078125 0.00000000 0.000000000 0.0000000000 0.0000000000
[51,] 0 0.0 0.25 2.000 0.0000 0.00000 0.015625 0.0000000 0.00000000 0.000000000 0.0000000000 0.0000000000
[52,] 0 0.5 4.00 0.000 0.0000 0.03125 0.000000 0.0000000 0.00000000 0.000000000 0.0000000000 0.0000000000
[53,] 1 8.0 0.00 0.000 0.0625 0.00000 0.000000 0.0000000 0.00000000 0.000000000 0.0000000000 0.0000000000
[54,] 16 0.0 0.00 0.125 0.0000 0.00000 0.000000 0.0000000 0.00000000 0.000000000 0.0000000000 0.0000000000
[55,] 0 0.0 0.25 0.000 0.0000 0.00000 0.000000 0.0000000 0.00000000 0.000000000 0.0000000000 0.0000000000
[56,] 0 0.5 0.00 0.000 0.0000 0.00000 0.000000 0.0000000 0.00000000 0.000000000 0.0000000000 0.0000000000
OK, I will explain the difference between [ and [[ that is causing your problems. I'll leave it to you to use this information to make the appropriate changes.
Consider the following list:
l <- list(a = matrix(1:25,5,5),b = 1:5,c = letters[1:5],d = NA)
> l
$a
[,1] [,2] [,3] [,4] [,5]
[1,] 1 6 11 16 21
[2,] 2 7 12 17 22
[3,] 3 8 13 18 23
[4,] 4 9 14 19 24
[5,] 5 10 15 20 25
$b
[1] 1 2 3 4 5
$c
[1] "a" "b" "c" "d" "e"
$d
[1] NA
Say we want to select the first element of this list, i.e. the matrix. You are doing something like this:
> l[1]
$a
[,1] [,2] [,3] [,4] [,5]
[1,] 1 6 11 16 21
[2,] 2 7 12 17 22
[3,] 3 8 13 18 23
[4,] 4 9 14 19 24
[5,] 5 10 15 20 25
This is wrong. [ will always return a sub-list of the original list, so what you are seeing here with l[1] is actually a list of length one. Its one element is the matrix that we are actually after.
What you want instead is:
> l[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 6 11 16 21
[2,] 2 7 12 17 22
[3,] 3 8 13 18 23
[4,] 4 9 14 19 24
[5,] 5 10 15 20 25
If you compare the output of str(l[1]) with str(l[[1]]), the difference should be obvious; it should also be clear why the first piece of information requested from you involved the output of str. It is an invaluable debugging tool for ensuring that an object is what you actually think it is.
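Concretely, with the list l from above:

```r
str(l[1])
#> List of 1
#>  $ a: int [1:5, 1:5] 1 2 3 4 5 6 7 8 9 10 ...
str(l[[1]])
#>  int [1:5, 1:5] 1 2 3 4 5 6 7 8 9 10 ...
```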
Finally, as I mentioned in one of my comments, please never, ever do things like:
expert_data_frame$t_labels ~.
Rather, just do:
t_labels ~.
The whole point of the formula interface is that you don't have to write the name of the data frame. The function will look within the data frame you provide for the variables that you name in the formula. If you use $ to explicitly select variables, you introduce a nasty source of bugs, whereby you will potentially force R to use a variable you didn't intend.
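Applied to the question's call, that means (a sketch, assuming expert_data_frame has been built correctly):

```r
library(randomForest)
# refer to t_labels by its bare name; randomForest() finds it in `data`
rf_model <- randomForest(t_labels ~ ., data = expert_data_frame,
                         importance = TRUE, do.trace = 100)
```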
