Feature selection with "fscaret" in R: issue with empty list()

I have a data frame with many factor and numeric variables, and I'm trying to use the fscaret package for feature selection.
Sample of the data:
age  clerical  construc  educ  earns74  gdhlth  inlf  leis1  rlxal
32   0         0         12    0        0       1     3529   3479
40   1         0         14    9500     1       1     3929   3329
20   1         1         10    329      0       0     5300   2309
22   0         0         6     602      1       0     5205   4290
I try to run the following code:
fsMod <- c("gbm", "treebag", "ridge", "lasso", "Boruta", "glm")
myFS<-fscaret(train.sleepDF, test.sleepDF, myTimeLimit = 40, preprocessData=TRUE, Used.funcRegPred = 'fsMod', with.labels=TRUE,
supress.output=FALSE, no.cores=2)
When I then inspect myFS$VarImp, I get list().
I've read the package documentation, and it also mentions this issue without giving a clear solution. There appears to be a troublesome method in the calculations, but how do I identify it?
Is there a way to solve the problem?
Any help is greatly appreciated.
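One detail of the call that may be worth checking (an observation about the code, not a confirmed diagnosis of the empty list): `Used.funcRegPred = 'fsMod'` is quoted, so fscaret receives the one-element string "fsMod" rather than the six model names stored in the fsMod vector. The base R sketch below only shows that difference; the fscaret call at the end is left as a comment because it needs the package and the question's train.sleepDF / test.sleepDF data.

```r
fsMod <- c("gbm", "treebag", "ridge", "lasso", "Boruta", "glm")

# Quoted: a single string that merely spells the variable's name
quoted <- 'fsMod'
length(quoted)    # [1] 1

# Unquoted: the character vector of model names the call presumably meant to pass
unquoted <- fsMod
length(unquoted)  # [1] 6

# Sketch of the call passing the vector itself (not run here; requires fscaret
# and the question's train.sleepDF / test.sleepDF data frames):
# myFS <- fscaret(train.sleepDF, test.sleepDF, myTimeLimit = 40,
#                 preprocessData = TRUE, Used.funcRegPred = fsMod,
#                 with.labels = TRUE, supress.output = FALSE, no.cores = 2)
```

If the empty result persists with the unquoted vector, the next step would be to run with one model at a time to see which method fails.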

afex in R: Error: Empty cells in within-subjects design (i.e., bad data structure). But the data structure is fine

I'm attempting to run an ANCOVA with 1 between-subjects variable and 2 within-subjects variables and I'm running into an error that makes no sense to me. My data looks like this:
Scan  ID  Region  ALFF   Age    Resp
1     20  AID     0.826  Adol   77.25
2     20  AID     1.116  Adol   73.18
1     22  AID     0.362  Adult  78.70
2     22  AID     0.849  Adult  72.58
1     20  MDM     0.826  Adol   79.25
2     20  MDM     1.116  Adol   71.18
1     22  MDM     0.778  Adult  79.70
2     22  MDM     0.291  Adult  73.58
My ANCOVA code is:
Full_Anova_ALFF <- AlFF_Resp %>% group_by(Region) %>% do(fit = aov_car(ALFF ~ Age*Scan + Error(ID/Scan*Resp), data = .))
and I get this error when I run it:
Converting to factor: Age
Error: Empty cells in within-subjects design (i.e., bad data structure).
table(data[c("Scan", "Resp")])
    Resp
Scan X77.25 X73.1777777766667 X63.1944444433333 X70.3333333333333 X78.7 X72.5833333333333
  X1      1                 0                 0                 0     1                 0
  X2      0                 1                 0                 0     0                 1
  X3      0                 0                 1                 0     0                 1
  X4      0                 0                 0                 1     0                 0
    Resp
Scan X72.4833333333333 X78.25 X65.1833333333333 X71.9166666666667 X57.333333335 X65.55
  X1                 0      0                 1                 0             0      1
  X2                 0      0                 0                 1             0      0
  X3                 1      0                 0                 0             0      0
  X4                 0      1                 0
Scan is a factor variable and Resp is numeric, and I have no idea why this error occurs. There are no empty cells! The table printed as part of the error message is also strange: it looks as if respiration is being treated as a factor, even though I've definitely told R that respiration is numeric. When I take respiration out of the model, it runs completely fine. Unfortunately, I do need to include respiration.
Anybody have any idea what's going wrong? Or even just a workaround I can use to get this done?
Thanks in advance for your help!!
For those interested, I figured it out! I ended up using a linear mixed model instead of the ANCOVA above, because my covariate was measured at the scan level rather than being a between-subjects variable.
My command ended up being:
Full_Anova_ALFF<- AlFF_Resp %>% group_by(Region) %>% do(fit=mixed(ALFF~Resp+Age*Scan+(1|ID)+(1|Scan),data=.))
Hope this helps someone in the future!
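The table printed with the original error hints at why aov_car() complained: Resp shows up with one X-prefixed column per distinct value, i.e. it was turned into a factor. The base R sketch below reproduces that effect with the AID rows from the sample data. It does not call afex; the assumption that aov_car() converts the variables inside Error() to factors is inferred from the printed table, not confirmed in the thread.

```r
# Scan and Resp for region AID, taken from the sample data in the question
d <- data.frame(
  Scan = factor(c(1, 2, 1, 2)),
  Resp = c(77.25, 73.18, 78.70, 72.58)
)

# Treating the numeric covariate as a factor gives every distinct value its own level
resp_as_factor <- factor(d$Resp)
nlevels(resp_as_factor)  # 4 levels for 4 observations

# Crossing Scan with those levels leaves most cells empty, as in the error message
tab <- table(Scan = d$Scan, Resp = resp_as_factor)
tab
sum(tab == 0)            # 4 empty cells
```

A continuous covariate measured per scan cannot fill such a grid, which is consistent with the switch to a mixed model where Resp enters as a numeric predictor.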

Creating Tables and Bar Plots in R as a Beginner

I've started using R recently, so this might be simple to solve. I actually have two problems, but I believe they're connected.
I have a simple dataset (.csv file with 3 columns and 7 rows) and I'm trying to create a table out of it and plot a bar graph with the values of the two numerical columns.
Grupo de idade;Freq. Relativa Homens;Freq. Relativa Mulheres
16 a 19;0,411;0,415
20 a 24;0,787;0,701
25 a 34;0,922;0,745
35 a 44;0,923;0,755
45 a 54;0,882;0,760
55 a 64;0,696;0,583
65 ou mais;0,205;0,126
df = read.csv(filename, header = TRUE, sep = ";")
tab = table(df)
sd = cbind(df$Freq.Homens, df$Freq.Mulheres)
barplot(sd, beside = TRUE)
So first my table ends up looking like this, with the values as headers:
Freq..Relativa.Homens
Grupo.de.idade 0,205 0,411 0,696 0,787 0,882 0,922 0,923
16 a 19 0 0 0 0 0 0 0
20 a 24 0 0 0 0 0 0 0
25 a 34 0 0 0 0 0 0 0
35 a 44 0 0 0 0 0 0 0
45 a 54 0 0 0 0 1 0 0
55 a 64 0 0 0 0 0 0 0
65 ou mais 0 0 0 0 0 0 0
And my graph is plotted with integer values like 2, 4, and 6. I noticed that this happens because of the cbind function, but without it I can't plot anything.
First: R assumes the Anglo-American convention, i.e. that the decimal mark is a ".".
The decimal mark in your data is a ",". You have to tell R this by adding the argument dec = ",", i.e.
df = read.csv(filename, header = TRUE, sep = ";", dec = ",")
Otherwise R reads the numbers as character strings.
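To see the effect of dec without a file, the sketch below feeds two rows of the question's data to read.csv() through its text argument (used here instead of filename). It also shows read.csv2(), base R's variant whose defaults are sep = ";" and dec = ",".

```r
# Two rows of the question's data, embedded as text so the sketch runs on its own
csv_text <- "Grupo de idade;Freq. Relativa Homens;Freq. Relativa Mulheres
16 a 19;0,411;0,415
20 a 24;0,787;0,701"

# Without dec = ",", "0,411" cannot be parsed as a number and stays text
raw <- read.csv(text = csv_text, sep = ";")
class(raw$Freq..Relativa.Homens)  # "character" ("factor" before R 4.0.0)

# With dec = ",", the column is numeric
df <- read.csv(text = csv_text, header = TRUE, sep = ";", dec = ",")
class(df$Freq..Relativa.Homens)   # "numeric"

# read.csv2() has sep = ";" and dec = "," as defaults, so it gives the same result
df2 <- read.csv2(text = csv_text)
```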
table makes a contingency table of two variables. This only makes sense for categorical variables, e.g. the number of observations by age and sex.
You have only one categorical variable (Grupo.de.idade) and two continuous variables.
R does its best to make sense of this and simply treats the values of the continuous variables as categories, which makes no sense here: e.g. there is 1 observation in your data set with "Grupo de idade" = "16 a 19" and a value of "0,411" for "Freq. Relativa Homens". That's what table is telling you.
Moreover, your data is already in table form, so if you want to look at it, simply type df at the console:
df
#> Grupo.de.idade Freq..Relativa.Homens Freq..Relativa.Mulheres
#> 1 16 a 19 0.411 0.415
#> 2 20 a 24 0.787 0.701
#> 3 25 a 34 0.922 0.745
#> 4 35 a 44 0.923 0.755
#> 5 45 a 54 0.882 0.760
#> 6 55 a 64 0.696 0.583
#> 7 65 ou mais 0.205 0.126
The easiest way to make a simple barplot is this:
barplot(Freq..Relativa.Homens ~ Grupo.de.idade, data = df)
Put the variable to plot on the left of the "~" and the grouping variable on the right. You also have to pass the data frame via the data argument.
However, instead of a trial-and-error approach to R, I recommend working through the introductory chapters of one of the free tutorials or textbooks available online, such as The Pirate's Guide to R.
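The formula call plots one column at a time. Since the question asks for both numerical columns, the sketch below (an alternative to the answer's formula call, not part of it) builds a 2 x 7 matrix and uses beside = TRUE, so each age group gets one bar for Homens and one for Mulheres. The row labels "Homens"/"Mulheres" are chosen here for the legend; the data are embedded via read.csv2(text = ...) instead of a file.

```r
csv_text <- "Grupo de idade;Freq. Relativa Homens;Freq. Relativa Mulheres
16 a 19;0,411;0,415
20 a 24;0,787;0,701
25 a 34;0,922;0,745
35 a 44;0,923;0,755
45 a 54;0,882;0,760
55 a 64;0,696;0,583
65 ou mais;0,205;0,126"
df <- read.csv2(text = csv_text)  # sep = ";" and dec = "," by default

# barplot() draws one group of bars per matrix column and one bar per row,
# so transpose the two numeric columns into a 2 x 7 matrix
freq <- t(as.matrix(df[, c("Freq..Relativa.Homens", "Freq..Relativa.Mulheres")]))
colnames(freq) <- as.character(df$Grupo.de.idade)
rownames(freq) <- c("Homens", "Mulheres")

# legend.text = TRUE takes the legend labels from the row names
mids <- barplot(freq, beside = TRUE, legend.text = TRUE,
                ylab = "Freq. relativa", xlab = "Grupo de idade")
```

With beside = TRUE, barplot() returns the bar midpoints as a matrix of the same shape, which is handy for adding text labels later.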
Created on 2020-03-27 by the reprex package (v0.3.0)

Print variable names in table() with 2 binary variables in R

I'm sure I'll kick myself for not being able to figure this out, but when you have a table with 2 variables (i.e. cross-tab) and both are binary or otherwise have the same levels, how can you make R show which variable is displayed row-wise and which is column-wise?
For example:
> table(tc$tr, tc$fall_term)
0 1
0 1569 538
1 0 408
is a little confusing because it's not immediately obvious which is which. Of course, I checked out ?table but I don't see an option to do this, at least not a logical switch that doesn't require me to already know which is which.
I tried ftable but had the same problem.
The output I want would be something like this:
> table(tc$tr, tc$fall_term)
tr tr
0 1
fallterm 0 1569 538
fallterm 1 0 408
or
> table(tc$tr, tc$fall_term)
fallterm fallterm
0 1
tr 0 1569 538
tr 1 0 408
You can use the dnn argument:
table(df$tr,df$fall_term) # impossible to tell the difference
0 1
0 18 33
1 15 34
table(df$tr,df$fall_term,dnn=c('tr','fall_term')) # you have the names
fall_term
tr 0 1
0 18 33
1 15 34
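Note that table(df$tr,df$fall_term,dnn=colnames(df)) is shorter, but it only labels the table correctly when df contains exactly these two columns, in this order; table(df[c("tr", "fall_term")]) picks up the column names automatically.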
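Since the question's tc data isn't available, the sketch below uses a small hypothetical data frame to compare three ways of getting labelled dimensions: an explicit dnn, named arguments (table() uses them as dimension names), and a data frame subset (its column names become the dimension names).

```r
# Hypothetical toy data standing in for the question's tc data frame
tc <- data.frame(
  tr        = c(0, 0, 1, 1, 0, 1),
  fall_term = c(0, 1, 1, 1, 0, 0),
  id        = 1:6
)

# 1. Name the dimensions explicitly with dnn
t1 <- table(tc$tr, tc$fall_term, dnn = c("tr", "fall_term"))

# 2. Name the arguments: table() uses argument names as dimension names
t2 <- table(tr = tc$tr, fall_term = tc$fall_term)

# 3. Pass a data frame subset: the column names become the dimension names,
#    and the extra id column does not get in the way
t3 <- table(tc[c("tr", "fall_term")])
names(dimnames(t3))  # [1] "tr"        "fall_term"
```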
Check out dimnames, and in particular their names. I’m using another example here since I don’t have your data:
x = HairEyeColor[, , Sex = 'Male']
names(dimnames(x))
# [1] "Hair" "Eye"
names(dimnames(x)) = c('Something', 'Else')
x
# Else
# Something Brown Blue Hazel Green
# Black 32 11 10 3
# Brown 53 50 25 15
# Red 10 10 7 7
# Blond 3 30 5 8

Expected numbers from data in a data.frame in R

I want to turn this equation into R code: $E_i = N \cdot \frac{e^{-\mathrm{mean}}\,\mathrm{mean}^{i}}{i!}$, where $i$ is the index and $N$ is the sample size.
What I have is this:
x["expected92"]<-((exp(-me92))(me92^(x$multX1992))/(x$multX1992));
I want to create a new column that goes through the index and computes the expected value for each row.
example data:
Drag 1992 multX1992
0 113 0
1 30 30
3 15 30
example of wanted output:
Drag 1992 multX1992 expected92
0 113 0 90.03
1 30 30 58.80
3 15 30 19.20
Can someone help fix my code?
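Three things in the line above differ from the formula: there is no `*` between `(exp(-me92))` and `(me92^...)`, so R tries to call the first value as a function ("attempt to apply non-function"); the denominator uses multX1992 instead of the factorial of the index; and the multiplication by N is missing. The sketch below assumes the index i is the Drag column, N is the total of the 1992 counts, the mean is the count-weighted mean of Drag, and the 1992 column is named X1992 (as read.csv() would name it). These are assumptions, so the numbers will not necessarily match the wanted output shown above.

```r
# Example data from the question; read.csv() would name the "1992" column X1992
x <- data.frame(Drag = c(0, 1, 3), X1992 = c(113, 30, 15), multX1992 = c(0, 30, 30))

# Assumption: N is the total count and the mean is the count-weighted mean of Drag
N    <- sum(x$X1992)
me92 <- sum(x$Drag * x$X1992) / N

# Poisson expected count: N * exp(-mean) * mean^i / i!, with i taken from Drag
x$expected92 <- N * exp(-me92) * me92^x$Drag / factorial(x$Drag)

# Same thing with the built-in Poisson density: dpois(i, lambda) = exp(-lambda) * lambda^i / i!
x$expected92_dpois <- N * dpois(x$Drag, lambda = me92)
x
```

Using dpois() avoids writing the factorial by hand; both columns should agree up to floating-point rounding.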

How to add dummy variables in R

I know there are several questions about this topic, but none of them seem to answer my specific question.
I have a dataset with five independent variables, and I want to add two dummy variables to my regression in R. My data are in Excel, and importing the dataset is not a problem (I use read.csv2). But when I want to look at my dummy variables, D1 and D2, I can't, although I can see all the other variables. Both dummy variables take the values 0 and 1 throughout the dataset.
I can easily see a summary of all my data, including D1 and D2 (with median, mean, etc.), and I can call each of the 5 variables separately without any problems at all, but I can't do that with D1 and D2.
> str(tilskuere)
'data.frame': 180 obs. of 7 variables:
$ ATT : int 3166 4315 7123 6575 7895 7323 3579 9571 5345 6595 ...
$ PRICE : int 80 95 120 100 105 115 80 130 105 100 ...
$ viewers: int 41000 43000 56000 66000 157000 91000 51000 30000 36000 72000 ...
$ CB1 : int 10 10 5 2 7 2 3 1 10 1 ...
$ CB2 : num 1 1 1 0 0.33 ...
$ D1 : int 0 0 0 1 0 0 0 0 0 0 ...
$ D2 : int 1 0 0 0 0 1 1 0 0 0 ...
> summary(tilskuere)
> mean(ATT)
[1] 6856.372
> mean(D1)
Error in mean(D1) : object 'D1' not found
To sum up: I can run regressions in R without D1 and D2, but I can't include these as dummy variables as R can't find these variables, when I run them. R simply says "object D1 not found."
I hope someone can help. Thank you in advance.
Kind regards
Mikkel
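I added the material from your comment to the question and added some line breaks, and it is now clear that you don't understand that columns are not first-class objects in R: a column such as D1 exists only inside the data frame tilskuere, not as a separate object in your workspace. Try: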
mean(tilskuere$D1)
You can see what objects are in your workspace with:
ls()
You appear to have an object named ATT in your workspace as well as a length-180 column by the same name in the object named tilskuere.
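For the regression itself, the formula interface can find the columns through its data argument, so D1 and D2 can be used directly. The sketch below uses a hypothetical 10-row stand-in for tilskuere built from the first values shown by str(); the real data have 180 rows and more columns.

```r
# Hypothetical stand-in for tilskuere, using the first 10 values shown by str()
tilskuere <- data.frame(
  ATT   = c(3166, 4315, 7123, 6575, 7895, 7323, 3579, 9571, 5345, 6595),
  PRICE = c(80, 95, 120, 100, 105, 115, 80, 130, 105, 100),
  D1    = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0),
  D2    = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0)
)

# Columns live inside the data frame, so refer to them through it ...
m1 <- mean(tilskuere$D1)
# ... or evaluate an expression with the data frame's columns in scope
m2 <- with(tilskuere, mean(D1))

# In a regression, data = tilskuere lets the formula find D1 and D2 directly;
# 0/1 dummies can enter as numeric regressors without further conversion
fit <- lm(ATT ~ PRICE + D1 + D2, data = tilskuere)
coef(fit)
```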
