How *not* to remove entire case from analysis, using aov_car - r

I'm running an ANOVA with:
within: Session (Pre vs. Post)
within: Condition (A, B, C)
between: Group (Female, Male)
Three participants are missing all of 'C' (pre and post). I don't want to completely exclude them from my analyses because I think their 'A' and 'B' data are still interesting. I have tried adding na.rm=TRUE to my script, to no avail. Is there any way that I can run my aov_car (mixed-design ANOVA) without completely removing all the data from these three participants?
I keep getting the following message: Contrasts set to contr.sum for the following variables: Group. Warning message: Missing values for following ID(s): P20, P21, P22. Removing those cases from the analysis.
Sample data (note, it's fudged/randomized data here):
my_data <- readr::read_csv("PID,Session,Condition,Group,data
P1,Pre,A,Female,0.935147485
P2,Pre,A,Female,0.290449952
P3,Pre,A,Female,0.652213856
P4,Pre,A,Female,0.349222763
P5,Pre,A,Female,0.235789135
P6,Pre,A,Female,0.268469251
P7,Pre,A,Female,0.419284033
P8,Pre,A,Female,0.797236877
P9,Pre,A,Female,0.784526027
P10,Pre,A,Female,0.44837527
P11,Pre,A,Female,0.359525572
P12,Pre,A,Male,0.923775343
P13,Pre,A,Male,0.431557872
P14,Pre,A,Male,0.425703913
P15,Pre,A,Male,0.39916012
P16,Pre,A,Male,0.168378348
P17,Pre,A,Male,0.260462544
P18,Pre,A,Male,0.945835896
P19,Pre,A,Male,0.495932288
P20,Pre,A,Male,0.045565042
P21,Pre,A,Male,0.748259161
P22,Pre,A,Male,0.426588091
P1,Pre,B,Female,0.761677517
P2,Pre,B,Female,0.985953719
P3,Pre,B,Female,0.657063156
P4,Pre,B,Female,0.166859072
P5,Pre,B,Female,0.850201269
P6,Pre,B,Female,0.227918183
P7,Pre,B,Female,0.701946655
P8,Pre,B,Female,0.079116861
P9,Pre,B,Female,0.094935181
P10,Pre,B,Female,0.376525478
P11,Pre,B,Female,0.725431114
P12,Pre,B,Male,0.922099723
P13,Pre,B,Male,0.664993697
P14,Pre,B,Male,0.450501356
P15,Pre,B,Male,0.201276143
P16,Pre,B,Male,0.735428897
P17,Pre,B,Male,0.304752274
P18,Pre,B,Male,0.393020637
P19,Pre,B,Male,0.452345203
P20,Pre,B,Male,0.697709526
P21,Pre,B,Male,0.130459291
P22,Pre,B,Male,0.210211859
P1,Pre,C,Female,0.280820754
P2,Pre,C,Female,0.206499238
P3,Pre,C,Female,0.127540559
P4,Pre,C,Female,0.001998028
P5,Pre,C,Female,0.554408227
P6,Pre,C,Female,0.235435708
P7,Pre,C,Female,0.341077362
P8,Pre,C,Female,0.101103042
P9,Pre,C,Female,0.834297025
P10,Pre,C,Female,0.256605011
P11,Pre,C,Female,0.65647746
P12,Pre,C,Male,0.110716441
P13,Pre,C,Male,0.075856866
P14,Pre,C,Male,0.518357132
P15,Pre,C,Male,0.222078883
P16,Pre,C,Male,0.414747048
P17,Pre,C,Male,0.525522894
P18,Pre,C,Male,0.758019496
P19,Pre,C,Male,0.213927508
P20,Pre,C,Male,
P21,Pre,C,Male,
P22,Pre,C,Male,
P1,Post,A,Female,0.435204978
P2,Post,A,Female,0.681378597
P3,Post,A,Female,0.928158111
P4,Post,A,Female,0.525061816
P5,Post,A,Female,0.46271948
P6,Post,A,Female,0.649810342
P7,Post,A,Female,0.748819476
P8,Post,A,Female,0.207494638
P9,Post,A,Female,0.060148769
P10,Post,A,Female,0.074998663
P11,Post,A,Female,0.177396477
P12,Post,A,Male,0.61446322
P13,Post,A,Male,0.367348586
P14,Post,A,Male,0.853124208
P15,Post,A,Male,0.268734518
P16,Post,A,Male,0.784226481
P17,Post,A,Male,0.892830959
P18,Post,A,Male,0.950081146
P19,Post,A,Male,0.731274982
P20,Post,A,Male,0.901554267
P21,Post,A,Male,0.170960222
P22,Post,A,Male,0.2337913
P1,Post,B,Female,0.940130538
P2,Post,B,Female,0.575209304
P3,Post,B,Female,0.84527559
P4,Post,B,Female,0.160605498
P5,Post,B,Female,0.547844182
P6,Post,B,Female,0.287795345
P7,Post,B,Female,0.010274473
P8,Post,B,Female,0.408166731
P9,Post,B,Female,0.562733542
P10,Post,B,Female,0.44217795
P11,Post,B,Female,0.390071799
P12,Post,B,Male,0.767768344
P13,Post,B,Male,0.548800315
P14,Post,B,Male,0.489825627
P15,Post,B,Male,0.783939035
P16,Post,B,Male,0.772595033
P17,Post,B,Male,0.252895712
P18,Post,B,Male,0.383513642
P19,Post,B,Male,0.709882712
P20,Post,B,Male,0.517304459
P21,Post,B,Male,0.77186642
P22,Post,B,Male,0.395415627
P1,Post,C,Female,0.649783292
P2,Post,C,Female,0.490853459
P3,Post,C,Female,0.467705056
P4,Post,C,Female,0.988740552
P5,Post,C,Female,0.413980642
P6,Post,C,Female,0.83941706
P7,Post,C,Female,0.111722237
P8,Post,C,Female,0.501984852
P9,Post,C,Female,0.15634255
P10,Post,C,Female,0.547770503
P11,Post,C,Female,0.576203944
P12,Post,C,Male,0.857518274
P13,Post,C,Male,0.176794297
P14,Post,C,Male,0.127501287
P15,Post,C,Male,0.831191664
P16,Post,C,Male,0.257022941
P17,Post,C,Male,0.295366754
P18,Post,C,Male,0.113785049
P19,Post,C,Male,0.621389037
P20,Post,C,Male,
P21,Post,C,Male,
P22,Post,C,Male,")
Current code:
library(tidyverse)
library(car)
library(afex)
library(emmeans)
my_anova <-aov_car(data ~ Group*Session*Condition
+ Error(PID/Session*Condition), na.rm = TRUE,
data=my_data)
I've also tried:
my_anova2 <- aov_ez("PID", "data",
my_data,
within = c("Session", "Condition"),
between = "Group", na.rm=TRUE)
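A possible way forward, sketched here and not part of the original post: the warning itself says that aov_car drops any participant without a complete set of within-subject cells, so na.rm = TRUE cannot rescue the A and B rows. A linear mixed model, by contrast, uses every non-missing row. The example below assumes the lme4 package is installed and uses simulated toy data (the `toy` data frame and its values are hypothetical) that mirror the design in the question.

```r
library(lme4)

set.seed(1)
# Toy long-format data mirroring the design; all values are simulated
toy <- expand.grid(
  PID = paste0("P", 1:22),
  Session = c("Pre", "Post"),
  Condition = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
toy$Group <- ifelse(as.integer(sub("P", "", toy$PID)) <= 11, "Female", "Male")
toy$data <- runif(nrow(toy))

# P20-P22 have no Condition C data, as in the question
toy$data[toy$PID %in% c("P20", "P21", "P22") & toy$Condition == "C"] <- NA

# Random intercept per participant; rows with NA are dropped one by one,
# so the A and B observations of P20-P22 stay in the model
fit <- lmer(data ~ Group * Session * Condition + (1 | PID), data = toy)

nobs(fit)   # number of rows actually used
ngrps(fit)  # number of participants contributing data
```

With purely random toy data lme4 may report a singular fit, which is harmless here. If you prefer to stay with aov_car, another option is to run the full model on the complete cases and a second model restricted to Conditions A and B that includes all participants.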

Related

Create a function with a column name as an argument (with grouping)

I want to write a function that takes a data frame, a grouping variable (a column), and a continuous variable (also a column).
Most of the Stack Overflow answers I have read on this recommend passing the column names as strings.
My initial code, which checks normality with the Shapiro-Wilk test across multiple data frames and variables, was unsuccessful.
check_normality <- function(d, x_grouping_variable, y_cont_var){
d %>%
group_by([[x_grouping_variable]]) %>%
summarise(`W Statistic` = shapiro.test([[y_cont_var]])$statistic,
`p-value` = shapiro.test([[y_cont_var]])$p.value)
return(shapiro.test([[y_cont_var]])$p.value)
}
ERROR:
Error: unexpected '[[' in " return(shapiro.test([["
My attempt to fix it using this code was also unsuccessful.
check_normality <- function(d, x_grouping_variable, y_cont_var){
d %>%
group_by(((!! sym(x_grouping_variable)))) %>%
summarise(`W Statistic` = shapiro.test((!! sym(y_cont_var)))$statistic,
`p-value` = shapiro.test((!! sym(y_cont_var)))$p.value)
return(shapiro.test((!! sym(y_cont_var)))$p.value)
}
check_normality(df, "RHF", "duratoin_days")
The error:
Error in !sym(y_cont_var) : invalid argument type
3. stopifnot(is.numeric(x))
2. shapiro.test((!!sym(y_cont_var)))
1. check_normality(df, "RHF", "duratoin_days")
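One way this could be written, as a minimal sketch assuming dplyr 1.0 or later: the `.data` pronoun accepts column names given as strings inside data-masking verbs such as group_by() and summarise(). Outside those verbs, `!!` is just a double logical negation, which is why the trailing return(shapiro.test(!!sym(...))) fails with "invalid argument type"; the sketch drops that line and returns the summary table instead. The built-in iris data are used only for illustration.

```r
library(dplyr)

check_normality <- function(d, x_grouping_variable, y_cont_var) {
  d %>%
    # .data[[...]] looks up a column by the string held in the argument
    group_by(.data[[x_grouping_variable]]) %>%
    summarise(
      `W Statistic` = shapiro.test(.data[[y_cont_var]])$statistic,
      `p-value`     = shapiro.test(.data[[y_cont_var]])$p.value,
      .groups = "drop"
    )
}

# One row per group, with the Shapiro-Wilk W statistic and p-value
res <- check_normality(iris, "Species", "Sepal.Length")
res
```

The pipeline is the last expression in the function, so its result is returned without an explicit return().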

Problems formatting data (biomod2)

I keep running into an error while trying to run the BIOMOD_FormatingData() function.
I have checked all the arguments and removed any NA values, the explanatory variables are the same for both the testing and training datasets (independent datasets), and I've generated pseudo-absence data for the evaluation dataset (included in eval.resp.var).
Has anyone run into this error before? If so, what was the issue related to? This is my first time using biomod2 for ensemble modelling and I've run out of ideas as to what could be causing this error!
Here is my script and the subsequent error:
library(biomod2)
geranium_data <-
BIOMOD_FormatingData(
resp.var = SG.occ.train['Geranium.lucidum'],
resp.xy = SG.occ.train[, c('Longitude', 'Latitude')],
expl.var = SG.variables,
resp.name = "geranium_data",
eval.resp.var = SG.test.data['Geranium.lucidum'],
eval.expl.var = SG.variables,
eval.resp.xy = SG.test.data[, c('Longitude', 'Latitude')],
PA.nb.rep = 10,
PA.nb.absences = 4650,
PA.strategy = 'random',
na.rm = TRUE
)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= geranium_data Data Formating -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Response variable name was converted into geranium.data
> Pseudo Absences Selection checkings...
> random pseudo absences selection
> Pseudo absences are selected in explanatory variables
Error in `names<-`(`*tmp*`, value = c("calibration", "validation")) : incorrect number of layer names

How to extract columns from a row and save the output as a variable dplyr

I am trying to extract a specific column from a specific row of my Excel sheet (df). However, when I try to do so I get the message:
Error: ... must evaluate to column positions or names, not a list
Call `rlang::last_error()` to see a backtrace.
When I call rlang::last_error() I get:
Backtrace:
1. dplyr::select(., FGA, FTA, TOV, MP, TmFga, TmFta, TmTov, TmMin)
9. tidyselect::vars_select(tbl_vars(.data), !!!enquos(...))
10. tidyselect:::bad_calls(bad, "must evaluate to { singular(.vars) } positions or names, \\\n not { first_type }")
11. tidyselect:::glubort(fmt_calls(calls), ..., .envir = .envir)
12. dplyr::select(., FGA, FTA, TOV, MP, TmFga, TmFta, TmTov, TmMin)
At this point, I am lost. What can I do to get my code to work?
library(readxl)
Lakers_Overall_Stats <- read_excel("Desktop/Lakers Overall Stats.xlsx")
library(readxl)
Lakers_Record <- read_excel("Desktop/Lakers Record.xlsx")
require(dplyr)
require(ggplot2)
##WinPercentage of the Team after season
mydata <- Lakers_Record %>% select(Pts,Opp,W,L)%>%
+ mutate(wpct=Pts^13.91/(Pts^13.91+Opp^13.91),expwin=round(wpct*(W+L)),diff=W-expwin)
head(mydata)
##Specifiying
Lakers_Overall_Stats[23,6] <- TmMin
Lakers_Overall_Stats[23,8] <- TmFga
Lakers_Overall_Stats[23,18] <- TmFta
Lakers_Overall_Stats[23,26] <- TmTov
rlang::last_error()
##Usage Percentage
Usgpct <- Lakers_Overall_Stats %>% select(FGA,FTA,TOV,MP,TmFga,TmFta,TmTov,TmMin)%>%
+ mutate(100*(Fga+0.44*Fta+Tov))*TmMin/(TmFga+0.44*TmFta+TmTov)*5(MP)
##head(Usgpct)
##filter(rank(desc(Usgpct))==1)
Also, am I filtering correctly, or should it be written as:
Usgpct <- Lakers_Overall_Stats %>% select(FGA,FTA,TOV,MP,TmFga,TmFta,TmTov,TmMin)%>%
filter(rank(desc(Usgpct))==1)%>%
mutate(100*(Fga+0.44*Fta+Tov))*TmMin/(TmFga+0.44*TmFta+TmTov)*5(MP)
head(Usgpct)
You have
Lakers_Overall_Stats[23,6] <- TmMin
This will modify the Lakers_Overall_Stats data frame by setting the element at 23,6 etc. to be TmMin. TmMin is an object outside of your data frame.
Maybe you want:
TmMin <- Lakers_Overall_Stats[23,6]
?
Also, you cannot select TmFga,TmFta,TmTov,TmMin since these variables are not part of your data frame. You can refer to those variables in your mutate equation, but because of the way you've set it up, they're stand-alone variables.
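To illustrate the point, here is a minimal sketch with a made-up data frame (the `stats` tibble, its values, and the `UsgPct` column name are hypothetical stand-ins for Lakers_Overall_Stats). It assumes the commonly cited usage-percentage formula, since the expression in the question is incomplete. Note that on a tibble, which is what read_excel() returns, `df[23, 6]` gives a 1x1 tibble and not a bare number, so the team totals are pulled out with `$` and a row index.

```r
library(dplyr)

# Hypothetical stand-in for Lakers_Overall_Stats: two players plus a team-total row
stats <- tibble(
  Player = c("Player 1", "Player 2", "Team"),
  MP  = c(100, 80, 480),
  FGA = c(50, 30, 200),
  FTA = c(10, 20, 50),
  TOV = c(5, 8, 20)
)

team_row <- 3  # row 23 in the original sheet

# Extract the team totals as plain numbers (stand-alone variables)
TmMin <- stats$MP[team_row]
TmFga <- stats$FGA[team_row]
TmFta <- stats$FTA[team_row]
TmTov <- stats$TOV[team_row]

usg <- stats %>%
  slice(-team_row) %>%                  # keep player rows only
  select(Player, FGA, FTA, TOV, MP) %>% # select only columns that exist in the data
  mutate(UsgPct = 100 * (FGA + 0.44 * FTA + TOV) * (TmMin / 5) /
           (MP * (TmFga + 0.44 * TmFta + TmTov)))

# Filter after the column has been created, not before
top_usage <- usg %>% filter(rank(desc(UsgPct)) == 1)
top_usage
```

The filter has to come after mutate(), because the column being ranked does not exist until mutate() creates it.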

Getting error when applying Smbinning in R

I am working on an example from http://r-statistics.co/Logistic-Regression-With-R.html. I have a problem with the smbinning code. I am trying to get the Information Value using smbinning.
library(smbinning)
# segregate continuous and factor variables
factor_vars <- c ("WORKCLASS", "EDUCATION", "MARITALSTATUS", "OCCUPATION", "RELATIONSHIP", "RACE", "SEX", "NATIVECOUNTRY")
continuous_vars <- c("AGE", "FNLWGT","EDUCATIONNUM", "HOURSPERWEEK", "CAPITALGAIN", "CAPITALLOSS")
iv_df <- data.frame(VARS=c(factor_vars, continuous_vars), IV=numeric(14)) # init for IV results
# compute IV for categoricals
for(factor_var in factor_vars){
smb <- smbinning.factor(trainingData, y="ABOVE50K", x=factor_var) # WOE table
if(class(smb) != "character"){ # check if some error occurred
iv_df[iv_df$VARS == factor_var, "IV"] <- smb$iv
}
}
This is the code given. I cannot understand the reason for checking the class of the smbinning result. My general understanding of smbinning is also not that good.
for(vars in factor_vars){
smb <- smbinning.factor(trainingData, y = "ABOVE50K", x = vars )
iv_df[iv_df$VARS == vars, "IV"] <- smb["iv"]
}
When I run this code I get some NA values. So the class check is apparently needed, but why?
Thank you very much.
Following the example to the letter, your problem would be the following:
If you do smb <- smbinning.factor(trainingData, y="ABOVE50K", x="EDUCATION") and then smb, you get
[1] "Too many categories"
str(trainingData) shows that:
$ EDUCATION : Factor w/ 16 levels...
While the smbinning documentation says that
maxcat - Specifies the maximum number of categories. Default value is 10. Name of x must not have a dot.
Therefore your solution is to use: smb <- smbinning.factor(trainingData, y="ABOVE50K", x=factor_var, maxcat=16) in the for loop.
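As for why the class check matters: as the output above shows, a failed call hands back a plain character string instead of a result list. The sketch below mimics that with a hypothetical stand-in function (`fake_binning` and its values are invented for illustration and are not part of smbinning). Indexing a character string with `["iv"]` yields NA, which would explain the NA values in the version of the loop without the check.

```r
# Hypothetical stand-in for smbinning.factor():
# returns a list on success and a message string on failure
fake_binning <- function(n_levels, maxcat = 10) {
  if (n_levels > maxcat) return("Too many categories")
  list(iv = 0.42)
}

ok  <- fake_binning(n_levels = 5)   # a list with an iv element
bad <- fake_binning(n_levels = 16)  # just a character string

ok[["iv"]]  # the numeric information value
bad["iv"]   # NA, because the string has no element named "iv"

iv_df <- data.frame(VARS = c("SEX", "EDUCATION"), IV = numeric(2))
results <- list(SEX = ok, EDUCATION = bad)

for (v in names(results)) {
  smb <- results[[v]]
  # Store the IV only when the call succeeded and returned a list
  if (is.list(smb)) {
    iv_df[iv_df$VARS == v, "IV"] <- smb$iv
  }
}
iv_df
```

Using is.list() or is.character() is a little more robust than comparing class() to a string, since class() can return more than one element.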

In R, multiple age pyramids using pyramidlattice {Giza}

I am trying to compare more than one age pyramid in a single frame. I looked for an example using common packages like ggplot, but I did not find one.
Do you have any suggestions?
The only package I found that can do this comparison is 'Giza'. Examples for this package are hard to find; the only one I could find is the following:
data(EduDat)
data(dictionary)
# select the desired year, country, and education-scenario from EduDat
Years <- c(2010,2030,2050)
Countries <- c("Pakistan","Bangladesh","Indonesia")
Scenarios <- c("GET")
# the male-column needs to be flipped
iEduDat <- subset(EduDat,match(cc,getcode(Countries,dictionary)) & match(yr,Years) & match(scen2,Scenarios))
iEduDat$value[iEduDat$sex == "Male"] <- (-1) * iEduDat$value[iEduDat$sex == "Male"]
agegrs <- paste(seq(15,100,5),seq(19,104,5),sep="-")
agegrs[length(agegrs)] <- "100+"
lattice.options(axis.padding = list(numeric=0))
x <- pyramidlattice(agegr ~ value| factor(sex,levels=c("Male","Female")) *
factor(cc,levels=getcode(Countries,dictionary),labels=Countries) *
factor(yr,levels=Years,labels=Years),
groups=variable,data=iEduDat,layout=c(length(Countries)*2,length(Years)),
type="l",lwd=1,xlab="Population",ylab="Age",main="Population by Highest Level of Education",
strip=TRUE,par.settings = simpleTheme(lwd=3,col=colors()[c(35,76,613,28)]),box.width=1,
scales=list(alternating=3,tick.number=5,relation="same",y=list(at=1:length(4:21),labels=agegrs)),
auto.key=list(text=c("No-edu","Primary","Secondary","Tertiary"),reverse.row=TRUE,
points=FALSE,rectangles=TRUE,space="right",columns=1,border=FALSE,
title="ED-Level",cex.title=1.1,lines.title=2.5,padding.text=1,background="white"),
prepanel=prepanel.default.bwplot2,panel=function(...){
panel.grid(h=length(agegrs),v=5,col="lightgrey",lty=3)
panel.pyramid(...)
})
x # with strips for every factor over each panel
# useOuterStrips(x) # with outer strips, but only in case of two factors
useOuterStrips2(x) # with outer strips in case of three factors
Now, I would like to make some modifications. For example, I would like to change the colors between the year panels. The most important modification I want is to set the x-axis limits. I am trying something like this (the scales parameter):
scales= list(x = list( relation = "free" , limits = list( c(-85000,85000) , c(-260000,260000) , c(-260000,260000)) ) , y = list(relation="same", at=1:length(agegrs)) )
But this results in an error:
Error in abs(x$x.limits) : non-numeric argument to mathematical function
In addition: Warning message:
In valid.charjust(just) : reached elapsed time limit
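For comparison, a rough ggplot2 sketch of several pyramids in one frame. It does not fix the lattice error above; it is an alternative route that assumes ggplot2 3.3.0 or later and uses simulated counts (the `pop` data frame and its age groups are hypothetical, not the Giza EduDat data). Each country column gets its own x-axis range through scales = "free_x".

```r
library(ggplot2)

set.seed(42)
ages <- c("15-34", "35-54", "55-74", "75+")

# Simulated population counts for every age group, sex, country and year
pop <- expand.grid(
  agegr   = factor(ages, levels = ages),
  sex     = c("Male", "Female"),
  country = c("Pakistan", "Bangladesh", "Indonesia"),
  year    = c(2010, 2030, 2050)
)
pop$value <- round(runif(nrow(pop), 1000, 5000))

# Flip the male counts so they plot to the left of zero
pop$value[pop$sex == "Male"] <- -pop$value[pop$sex == "Male"]

p <- ggplot(pop, aes(x = value, y = agegr, fill = sex)) +
  geom_col() +
  # One row per year, one column per country; x range is free per column
  facet_grid(year ~ country, scales = "free_x") +
  # Show absolute values on the axis even though male counts are negative
  scale_x_continuous(labels = abs) +
  labs(x = "Population", y = "Age")
p
```

Education level could be mapped to fill instead of sex to get stacked bars similar to the Giza example, at the cost of encoding sex only by side.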
