I would like to perform a stratified Fisher test.
I've tried with tabulate, without success.
These are my data:
> db$site
[1] 2 2 2 3 1 1 1 1 1 1 1 1 1 1 2 3 1 1 1 1 1 1 1 1 2 2 3 3 1 1 3 1 2 1 1 2 1 1 1 1 1 3 1 1 1 1 1 3 1
[50] 2 1 1 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 1 3 3 1
[99] 3 1 1 1 1 1 1 1 3 3 1 1 1 1 1 3 2 1 1 1 1 1 3 3 1 3 3 3 3 1 3 1 3 3 1 3 1 1 3 3 3 2 3 3 3 3 1 3 3
[148] 3 2 3 3 1 3 1 3 3 3 3 3 3 1 3 3 3 1 3 3 1 3 1 1 1 1 1 1 1 3 2 1 3 2 2 2 3 2 3 2 2 2 2 2 2 2 2 3 2
[197] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 3 1 3 2 1 2 3 1 3 3 1 1 1 1 3 3 3 1 3 2 2 1 3 1 2 3 1
[246] 1 1 1 1 1 2 3 2 1 2 3 3 3 1 1 2 3 2 3 3 3 2 2 1 3 2 3 1 1 3 3 2 1 1 1 1 1 2 1 1 2 2 1 2 3 3 1 1 1
[295] 3 2 2 3 1 1 2 2 2 3 3 2 1 3 1 2 1 3 1 1 3 1 1 3 2 2 2 2 2 1 3 1 1 2 3 3 3 1 3 1 3 2 3 1 1 1 3 3 3
[344] 3 1 2 2 2 3 1 3 1 1 3 1 3 2 1 3 2 2 2 2 2 2 2
Levels: 1 2 3
> db$phq_cat
[1] 1 2 2 3 1 2 2 2 1 1 1 2 1 1 1 2 2 1 2 1 3 2 1 1 1 5 1 2 3 2 3 1 2 4 2 1 1 2 2 1 1 1 1 2 1 2 2 2 2
[50] 2 1 3 1 2 3 2 2 2 3 2 1 1 3 2 2 2 2 2 3 1 1 2 3 2 2 5 3 1 3 1 2 3 2 2 3 3 1 3 1 1 2 2 1 2 2 1 2 4
[99] 1 1 2 2 2 2 2 1 3 2 2 1 1 3 2 1 2 2 2 1 3 2 2 3 2 1 1 1 2 2 2 2 1 3 1 3 2 2 1 2 2 2 2 1 1 3 2 2 1
[148] 3 2 4 1 2 1 2 3 1 3 2 2 2 2 2 2 1 2 2 2 2 2 2 1 1 1 1 1 3 3 2 1 1 3 2 2 3 3 2 4 2 2 2 3 2 1 4 1 2
[197] 3 1 2 2 2 2 3 2 3 3 3 3 5 1 3 2 3 1 3 3 3 2 2 1 2 2 1 3 4 4 2 1 2 2 3 4 3 3 2 1 2 4 1 1 2 1 3 2 3
[246] 3 1 1 2 2 3 3 1 1 4 3 2 1 1 2 2 2 2 1 2 2 2 3 1 1 2 4 1 1 2 3 3 2 1 1 1 2 5 1 2 3 2 2 3 1 1 3 3 1
[295] 3 5 2 1 1 1 2 2 1 1 1 4 4 2 1 1 2 1 2 3 1 2 1 2 4 1 1 1 3 1 4 1 1 1 1 4 3 1 2 1 3 3 3 1 2 1 3 1 1
[344] 2 1 1 2 2 2 1 4 2 2 1 4 1 1 4 2 1 4 2 3 2 1 1
Levels: 1 2 3 4 5
> db$area
[1] 3 1 1 0 3 2 5 3 1 3 3 3 1 5 4 0 3 5 5
[20] 1 3 2 3 1 3 3 0 0 3 3 0 1 3 1 3 3 3 3
[39] 3 3 3 0 1 2 2 2 2 0 2 1 1 1 0 0 0 3 3
[58] 3 3 4 3 3 3 3 3 1 3 3 0 3 3 5 3 2 3 5
[77] 5 3 3 3 3 5 3 2 5 3 3 3 2 0 0 0 0 0 5
[96] 0 0 3 0 3 5 3 3 3 1 3 0 0 3 3 2 1 3 0
[115] 3 2 5 2 5 1 0 0 5 0 0 0 0 1 0 1 0 0 3
[134] 0 3 3 0 0 0 3 0 0 0 0 3 0 0 0 3 0 0 2
[153] 0 3 0 0 0 0 0 0 3 0 0 0 3 0 0 3 0 5 3
[172] 5 3 3 3 3 0 2 3 0 3 2 3 0 3 0 3 2 5 3
[191] 2 3 5 5 0 3 5 3 2 3 3 3 3 2 2 3 1 3 3
[210] 3 3 3 5 3 3 3 3 3 0 3 0 3 2 1 0 3 0 0
[229] 5 3 3 1 0 0 0 3 0 5 1 3 0 3 3 0 1 5 5
[248] 3 1 3 5 0 2 3 2 0 0 0 3 3 5 0 3 0 0 0
[267] 3 3 3 0 3 0 3 4 0 0 3 3 3 3 5 5 3 3 1
[286] 3 1 2 3 0 0 5 3 1 0 5 3 0 3 3 3 2 3 0
[305] 0 <NA> 3 0 3 5 5 0 5 3 0 3 1 0 5 3 3 3 2
[324] <NA> 0 3 5 1 0 0 0 5 0 5 0 5 0 3 3 3 0 0
[343] 0 0 2 3 3 3 0 5 0 3 3 0 3 0 5 1 0 3 3
[362] 2 3 3 3 3
Levels: 0 1 2 3 4 5
library(survival)
library(Exact)
library(plyr)
b <- tabulate(db$site, db$phq_cat, db$area, tests = c("fisher"))
I obtain this error message:
Error in tabulate(db$site, db$phq_cat, db$area, tests = c("fisher")) :
unused arguments (db$AREEDISCIPL, tests = c("fisher"))
How can I handle this?
I would also like to perform a stratified Wilcoxon rank-sum test.
Is there a way?
Thank you!
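As the error shows, tabulate() only counts integer bins and has no tests argument, so it can't do this. One option for a stratified Fisher-type association test (a sketch, not necessarily matching your exact design) is the Cochran-Mantel-Haenszel test via mantelhaen.test() in base stats, which has an exact (Fisher-like) option for 2 x 2 x K tables. Simulated data stand in for db here:

```r
# Hypothetical data mimicking the structure in the question:
# a binary exposure, a binary outcome, and a stratum variable.
set.seed(1)
exposure <- factor(sample(c("a", "b"), 120, replace = TRUE))
outcome  <- factor(sample(c("low", "high"), 120, replace = TRUE))
stratum  <- factor(sample(1:3, 120, replace = TRUE))

tab <- table(exposure, outcome, stratum)   # 2 x 2 x 3 contingency array
res <- mantelhaen.test(tab, exact = TRUE)  # exact conditional test per stratum
res$p.value                                # a p-value between 0 and 1
```

With site (3 levels) and phq_cat (5 levels), mantelhaen.test() still runs as a generalized CMH test, but exact = TRUE is only available for the 2 x 2 x K case. For the stratified Wilcoxon rank-sum test, the coin package's wilcox_test() accepts a blocking factor, e.g. coin::wilcox_test(y ~ group | stratum, data = db), assuming group has two levels.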
This question already has an answer here:
How to use Pivot_longer to reshape from wide-type data to long-type data with multiple variables
(1 answer)
Closed 2 years ago.
I have a dataset of adolescents over 3 waves. I need to reshape the data from wide to long, but I haven't been able to figure out how to use pivot_longer (I've checked other questions, but maybe I missed one?). Below is sample data:
HAVE DATA:
id c1sports c2sports c3sports c1smoker c2smoker c3smoker c1drinker c2drinker c3drinker
1 1 1 1 1 1 4 1 5 2
2 1 1 1 5 1 3 4 1 4
3 1 0 0 1 1 5 2 3 2
4 0 0 0 1 3 3 4 2 3
5 0 0 0 2 1 2 1 5 3
6 0 0 0 4 1 4 4 3 1
7 1 0 1 2 2 3 1 4 1
8 0 1 1 4 4 1 4 5 4
9 1 1 1 3 2 2 3 4 2
10 0 1 0 2 5 5 4 2 3
WANT DATA:
id wave sports smoker drinker
1 1 1 1 1
1 2 1 1 5
1 3 1 4 2
2 1 1 5 4
2 2 1 1 1
2 3 1 3 4
3 1 1 1 2
3 2 0 1 3
3 3 0 5 2
4 1 0 1 4
4 2 0 3 2
4 3 0 3 3
5 1 0 2 1
5 2 0 1 5
5 3 0 2 3
6 1 0 4 4
6 2 0 1 3
6 3 0 4 1
7 1 1 2 1
7 2 0 2 4
7 3 1 3 1
8 1 0 4 4
8 2 1 4 5
8 3 1 1 4
9 1 1 3 3
9 2 1 2 4
9 3 1 2 2
10 1 0 2 4
10 2 1 2 2
10 3 0 5 3
So far the only thing that I've been able to run is:
long_dat <- wide_dat %>%
pivot_longer(., cols = c1sports:c3drinker)
But this doesn't get me separate columns for sports, smoker, drinker.
You could use the names_pattern argument in pivot_longer.
tidyr::pivot_longer(df,
cols = -id,
names_to = c('wave', '.value'),
names_pattern = 'c(\\d+)(.*)')
# id wave sports smoker drinker
# <int> <chr> <int> <int> <int>
# 1 1 1 1 1 1
# 2 1 2 1 1 5
# 3 1 3 1 4 2
# 4 2 1 1 5 4
# 5 2 2 1 1 1
# 6 2 3 1 3 4
# 7 3 1 1 1 2
# 8 3 2 0 1 3
# 9 3 3 0 5 2
#10 4 1 0 1 4
# … with 20 more rows
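One small follow-up: with this pattern, wave comes back as character ("1", "2", "3") rather than the numeric wave shown in the desired output. If that matters, the names_transform argument (available in tidyr >= 1.1.0) can coerce it during the pivot. A sketch on a small hypothetical df with the same wide layout:

```r
library(tidyr)

# Two rows of hypothetical wide data shaped like the question's.
df <- data.frame(id = 1:2,
                 c1sports = c(1, 1), c2sports = c(1, 1), c3sports = c(1, 1),
                 c1smoker = c(1, 5), c2smoker = c(1, 1), c3smoker = c(4, 3),
                 c1drinker = c(1, 4), c2drinker = c(5, 1), c3drinker = c(2, 4))

long <- pivot_longer(df,
                     cols = -id,
                     names_to = c("wave", ".value"),
                     names_pattern = "c(\\d+)(.*)",
                     names_transform = list(wave = as.integer))
# long$wave is now integer 1:3 within each id
```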
I am using the accuracy.meas function of the ROSE package in R. I got the error "Response must have two levels", so I checked both the response parameter and predicted1: both are numeric. Are there limitations on when the accuracy.meas function can be used?
Note: the answer is wrong, but that has nothing to do with the error.
accuracy.meas(test$Walc,predicted1,threshold = 0.5)
Error in accuracy.meas(response=test$Walc,predicted= predicted1, threshold = 0.5) :
Response must have two levels.
>test$Walc
[1] 1 1 1 3 3 3 1 1 2 2 1 2 1 1 3 3 1 1 1 1 3 1 1 4 2 1 1 1 1 4 4 4 5 1 1 1 1 3 1 2 3
[42] 1 5 1 4 4 1 2 2 2 1 2 2 3 2 3 1 2 1 5 1 1 3 2 2 1 1 1 1 1 1 1 2 1 1 3 3 3 2 3 1 2
[83] 2 2 1 1 3 1 1 1 2 3 3 1 1 3 1 2 1 5 2 2 1 2 1 1 2 2 1 1 3 1 2 1 1 1 3 1 1 1 1 1 1
[124] 3 3 3 4 1 1 1 1 4 1 1 1 1 3 2 1 3 3 1 1 1 1 1 1 1 1 5 1 1 1 3 1 1 1 3 4 1 3 2 4 5
[165] 2 1 1 2 1 1 2 3 1 4 1 2 1 4 4 5 1 1 5 3 5 4 5 2 4 2 2 4 1 5 5 4 2 2 1 4 4 4 2 3 4
[206] 2 3 4 4 5 2 3 4 5 5 3 2 4 4 1 5 5 5 3 2 2 4 1 5 5 2 1 1 1 2 3 3 2 1 1 3 4 1 1 1 4
[247] 1 3 1 2 2 3 3 2 2 2 2 1 2 1 1 1 1 3 1 1 1 1 1 1 1 2 1 1 3 1 1 4 3 5 2 2 4 3 4 2 3
[288] 5 5 3 1 1 3 4 4 4 3 4 5 3 3 3 3 3 4 4 3 1 3 3 4 3
> predicted1
[1] 2 2 1 2 2 2 1 1 1 2 2 2 1 1 4 4 1 1 1 1 3 2 2 3 2 2 1 2 2 2 2 2 5 3 3 2 2 2 1 1 2
[42] 1 3 2 3 3 2 2 2 2 2 2 2 3 1 3 2 1 2 4 2 3 2 3 3 1 2 2 2 1 1 2 2 1 1 2 2 3 1 2 2 2
[83] 2 2 1 1 3 2 2 1 1 3 3 1 2 2 2 3 1 3 3 3 1 2 1 2 1 2 3 1 3 2 2 2 2 2 2 2 2 2 2 1 2
[124] 4 1 4 4 2 1 1 2 1 1 2 1 1 2 2 2 3 3 1 1 1 1 2 1 1 1 4 2 1 1 2 2 1 2 2 3 1 2 2 3 4
[165] 2 2 2 3 2 1 2 2 2 4 1 2 2 4 4 5 1 1 5 2 5 4 4 2 4 3 2 2 1 4 4 2 2 2 1 4 2 3 2 3 4
[206] 3 2 4 4 5 2 2 4 4 5 4 3 3 3 2 4 4 4 3 1 2 2 2 4 4 1 1 2 2 2 3 3 1 2 1 2 2 1 1 3 2
[247] 2 2 1 4 2 2 4 2 2 2 2 2 2 2 1 1 3 2 1 2 2 2 2 1 1 2 2 2 4 4 2 3 3 5 2 2 3 3 3 3 3
[288] 3 5 4 2 2 4 4 5 4 3 4 5 3 4 4 3 3 3 3 3 2 4 4 2 3
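accuracy.meas() in ROSE computes precision, recall, and the F measure for a two-class problem: response must take exactly two values, and predicted should be a probability or score for the positive class, which is why a 5-level Walc trips the check even though it is numeric. A sketch of one way around it (the >= 3 cutoff and the simulated stand-in data are assumptions, not from the question):

```r
library(ROSE)

# Simulated stand-ins for test$Walc and predicted1.
set.seed(1)
walc      <- sample(1:5, 100, replace = TRUE)  # 5-level outcome like Walc
pred_prob <- runif(100)                        # scores for the positive class

# Collapse the outcome to two classes so response has exactly two levels;
# predicted1 would likewise need to be a score for the positive class
# (e.g. from a model refit on the binary outcome), not a 5-level prediction.
resp_bin <- factor(ifelse(walc >= 3, "high", "low"))

accuracy.meas(response = resp_bin, predicted = pred_prob, threshold = 0.5)
```

Check the ROSE documentation for which of the two levels is treated as the positive class before interpreting the output.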
I am using the pvclust package in R to get hierarchical clustering dendrograms with p-values.
I want to use the "Ward" clustering method and the "Euclidean" distance. Both work fine with my data when using hclust. In pvclust, however, I keep getting the error message "invalid clustering method". The problem apparently comes from the "ward" method, because other methods such as "average" work fine, as does "euclidean" on its own.
This is my syntax and the resulting error message:
result <- pvclust(t(data2007num), method.hclust="ward", method.dist="euclidean", nboot=100)
Bootstrap (r = 0.5)...
Error in hclust(distance, method = method.hclust) : invalid clustering method
My data matrix has the following form (28 countries x 20 policy dimensions):
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10
AUT 2 3 4 2 1 1 4 3 2 2 2 3 3 4 4 2.0 5 4 0 3
GER 3 5 3 2 1 3 2 4 4 5 4 0 4 5 4 3.0 5 5 3 2
SWE 5 5 1 5 4 3 1 4 4 5 3 4 5 2 4 3.0 3 3 5 0
NLD 4 4 2 3 2 1 0 4 4 0 4 4 4 2 2 4.0 4 4 2 5
ESP 3 4 1 4 5 0 3 2 4 1 4 3 3 1 2 3.0 2 2 0 2
ITA 3 2 0 3 1 1 3 3 5 5 4 2 4 1 1 2.0 0 2 0 2
FRA 3 2 1 3 1 2 4 2 5 2 3 2 3 3 5 4.0 1 2 0 3
DNK 5 2 1 3 4 4 2 4 3 0 4 4 2 3 5 2.0 5 4 5 3
GRE 3 3 2 5 2 1 3 2 2 2 3 2 3 0 2 3.0 0 1 0 2
CHE 5 4 3 3 4 3 2 3 4 1 4 4 2 1 1 3.0 5 4 0 3
BEL 3 2 3 1 4 2 4 2 2 2 3 3 3 1 5 2.0 2 3 2 0
CZE 2 4 3 3 2 2 1 2 5 2 3 1 4 1 2 3.0 1 4 0 2
POL 3 3 4 4 0 1 3 3 2 2 4 2 2 0 3 4.0 2 2 0 3
IRL 3 1 2 1 4 3 2 1 5 4 3 2 2 1 3 2.0 0 1 1 2
LUX 2 1 2 5 3 2 2 5 4 2 2 4 3 2 4 3.0 2 3 0 1
HUN 1 3 2 3 2 1 4 3 5 4 2 3 4 3 3 2.0 3 2 4 2
PRT 3 2 3 5 4 1 4 1 5 5 3 2 2 1 2 2.0 1 1 1 1
AUS 4 1 2 1 2 3 1 1 1 5 4 5 3 1 2 3.0 1 3 5 1
CAN 1 1 1 1 4 1 0 1 1 5 1 1 3 3 2 2.0 1 2 5 4
FIN 5 4 4 3 2 3 2 3 3 3 2 2 4 3 3 3.0 4 4 5 2
GBR 3 1 2 1 2 3 1 1 2 5 4 4 4 3 1 2.0 1 3 5 5
JPN 4 1 0 1 2 2 0 2 5 4 3 1 1 3 3 2.0 2 4 5 3
KOR 3 3 0 1 2 1 0 0 1 4 0 1 1 2 3 2.0 1 2 1 3
MEX 0 3 4 0 3 2 5 2 3 5 2 2 0 0 0 0.0 0 1 0 3
NZL 5 1 2 1 2 3 1 1 5 2 3 5 2 2 2 0.5 0 0 3 3
NOR 5 3 2 4 2 4 2 5 4 2 4 5 4 2 4 4.0 5 4 5 0
SVK 1 4 3 2 4 2 1 2 5 2 3 2 4 2 2 3.0 0 2 0 3
USA 3 0 1 3 2 4 0 3 0 1 0 0 3 4 1 2.0 1 1 5 4
I tried to use "ward" with the dataset provided by the pvclust package (lung), as well as with other data provided in R (such as Boston in the MASS package), without any success. Does anyone know a solution, or whether the "ward" method was disabled in pvclust?
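A likely explanation: from R 3.1.0 on, hclust() renamed the "ward" method to "ward.D" and added "ward.D2" (which squares the dissimilarities, matching Ward's original criterion). Since pvclust() forwards method.hclust to hclust(), passing one of the new names should work. A sketch using the lung data shipped with pvclust:

```r
library(pvclust)

data(lung)  # example dataset shipped with pvclust (genes x samples)
result <- pvclust(lung[1:100, ],
                  method.hclust = "ward.D2",  # or "ward.D" for the pre-3.1.0 "ward"
                  method.dist   = "euclidean",
                  nboot = 10)                 # tiny nboot, just for illustration
plot(result)
```

With your own data the call would be the same as in the question, just with method.hclust = "ward.D2" (or "ward.D").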
Suppose I have this matrix
8 3 1 1 2 2 1 1 1 1 1 1 2 2 1 1 3
3 8 3 1 1 2 2 1 1 1 1 1 1 2 2 1 1
1 3 8 3 1 1 2 2 1 1 1 1 1 1 2 2 1
1 1 3 8 3 1 1 2 2 1 1 1 1 1 1 2 2
2 1 1 3 8 3 1 1 2 2 1 1 1 1 1 1 2
2 2 1 1 3 8 3 1 1 2 2 1 1 1 1 1 1
1 2 2 1 1 3 8 3 1 1 2 2 1 1 1 1 1
1 1 2 2 1 1 3 8 3 1 1 2 2 1 1 1 1
1 1 1 2 2 1 1 3 8 3 1 1 2 2 1 1 1
1 1 1 1 2 2 1 1 3 8 3 1 1 2 2 1 1
1 1 1 1 1 2 2 1 1 3 8 3 1 1 2 2 1
1 1 1 1 1 1 2 2 1 1 3 8 3 1 1 2 2
2 1 1 1 1 1 1 2 2 1 1 3 8 3 1 1 2
2 2 1 1 1 1 1 1 2 2 1 1 3 8 3 1 1
1 2 2 1 1 1 1 1 1 2 2 1 1 3 8 3 1
1 1 2 2 1 1 1 1 1 1 2 2 1 1 3 8 3
3 1 1 2 2 1 1 1 1 1 1 2 2 1 1 3 8
I want to check:
1. Are the off-diagonal elements symmetric? (In the matrix above, they are.)
2. Which distinct elements occur off the diagonal (without repetition)? (In the matrix above: 1, 2, 3.)
3. Are the diagonal elements all the same? If yes, print that element (like 8 in the matrix above).
# 1
all(mat == t(mat))
[1] TRUE
# 2
unique(mat[upper.tri(mat) | lower.tri(mat)])
[1] 3 1 2
# 3
if(length(unique(diag(mat))) == 1) print(diag(mat)[1])
[1] 8
mat <- as.matrix(read.table('abbas.txt'))
isSymmetric(unname(mat))
'Note that a matrix is only symmetric if its 'rownames' and 'colnames' are identical.'
unique(mat[lower.tri(mat)])
all(diag(mat) == rev(diag(mat)))
# I assume you mean the diagonal is symmetric when its reverse is the same with itself.
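The checks from both answers can be sanity-checked on a small matrix of the same shape (a 3 x 3 stand-in for the 17 x 17 one above):

```r
# Small symmetric matrix with a constant diagonal, like the one in the question.
mat <- matrix(c(8, 3, 1,
                3, 8, 3,
                1, 3, 8), nrow = 3, byrow = TRUE)

all(mat == t(mat))                                # 1: symmetry -> TRUE
unique(mat[upper.tri(mat) | lower.tri(mat)])      # 2: off-diagonal values -> 3 1
if (length(unique(diag(mat))) == 1) diag(mat)[1]  # 3: constant diagonal -> 8
```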
I have following type of data:
mydata <- data.frame (yvar = rnorm(200, 15, 5), xv1 = rep(1:5, each = 40),
xv2 = rep(1:10, 20))
table(mydata$xv1, mydata$xv2)
1 2 3 4 5 6 7 8 9 10
1 4 4 4 4 4 4 4 4 4 4
2 4 4 4 4 4 4 4 4 4 4
3 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4
5 4 4 4 4 4 4 4 4 4 4
I want to tabulate again within categories of yvar, using the following cut key:
< 10 - group 1
10-12 - group 2
12-16 - group 3
>16 - group 4
Thus we will have a table like the one above for each element of the cut key, and I want margin sums each time.
< 10 - group 1
1 2 3 4 5 6 7 8 9 10
1 4 4 4 4 4 4 4 4 4 4
2 4 4 4 4 4 4 4 4 4 4
3 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4
5 4 4 4 4 4 4 4 4 4 4
10-12 - group 2
1 2 3 4 5 6 7 8 9 10
1 4 4 4 4 4 4 4 4 4 4
2 4 4 4 4 4 4 4 4 4 4
3 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4
5 4 4 4 4 4 4 4 4 4 4
and so on for all groups
(the numbers will definitely be different)
Is there an easy way to do this?
Yes, using cut, dlply (plyr package) and addmargins:
mydata$yvar1 <- cut(mydata$yvar,breaks = c(-Inf,10,12,16,Inf))
> dlply(mydata,.(yvar1),function(x) addmargins(table(x$xv1,x$xv2)))
$`(-Inf,10]`
1 2 3 4 5 6 7 8 9 10 Sum
1 0 0 0 0 0 0 2 0 1 0 3
2 1 1 0 1 0 0 0 0 2 0 5
3 0 1 0 0 1 1 0 2 0 0 5
4 0 0 2 0 1 1 0 1 0 0 5
5 0 1 1 0 1 1 1 0 0 2 7
Sum 1 3 3 1 3 3 3 3 3 2 25
$`(10,12]`
1 2 3 4 6 7 8 9 10 Sum
1 0 0 0 1 2 0 0 0 0 3
2 0 0 1 0 0 1 0 0 1 3
3 0 1 0 1 1 2 0 0 1 6
4 0 1 0 0 0 0 0 0 0 1
5 1 0 1 1 1 0 1 1 2 8
Sum 1 2 2 3 4 3 1 1 4 21
$`(12,16]`
1 2 3 4 5 6 7 8 9 10 Sum
1 2 3 1 1 1 2 0 3 0 2 15
2 0 1 0 1 3 3 2 0 0 1 11
3 3 1 3 1 0 0 0 2 4 1 15
4 3 2 1 2 2 0 1 1 4 1 17
5 3 1 1 2 0 1 1 1 1 0 11
Sum 11 8 6 7 6 6 4 7 9 5 69
$`(16, Inf]`
1 2 3 4 5 6 7 8 9 10 Sum
1 2 1 3 2 3 0 2 1 3 2 19
2 3 2 3 2 1 1 1 4 2 2 21
3 1 1 1 2 3 2 2 0 0 2 14
4 1 1 1 2 1 3 3 2 0 3 17
5 0 2 1 1 3 1 2 2 2 0 14
Sum 7 7 9 9 11 7 10 9 7 9 85
attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
yvar1
1 (-Inf,10]
2 (10,12]
3 (12,16]
4 (16, Inf]
You can adjust the breaks argument to cut to get the values just how you want them. (Although the margin sums you display in your question don't look like margin sums at all.)
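A base-R equivalent, if you'd rather not depend on plyr, uses split() plus lapply(). Coercing xv1 and xv2 to factors with fixed levels also keeps empty rows and columns in every table (note how column 5 vanished from the (10,12] table above because that subset had no observations there):

```r
set.seed(42)
mydata <- data.frame(yvar = rnorm(200, 15, 5),
                     xv1  = rep(1:5, each = 40),
                     xv2  = rep(1:10, 20))
mydata$yvar1 <- cut(mydata$yvar, breaks = c(-Inf, 10, 12, 16, Inf))

# One table with margin sums per yvar category; fixed factor levels keep
# all rows/columns even when a subset has no observations in one of them.
tabs <- lapply(split(mydata, mydata$yvar1),
               function(x) addmargins(table(factor(x$xv1, levels = 1:5),
                                            factor(x$xv2, levels = 1:10))))
tabs[["(10,12]"]]
```

The grand total of the four "Sum, Sum" cells adds back up to the 200 rows of mydata, which is a quick check that no observations were dropped.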