Mann Whitney U test in R - r

I want to perform a Mann Whitney U test on a small data set of non-parametric data in R, can anyone help me find the right code?
I have been trying to use the following code, to do the test straight from my data set, but I keep getting the error message:
Error: Column.Heading.1 not found.
wilcox.test(Column.Heading.1, Column.Heading.2, data=DATA)
I'm new to stats so not sure if I'm missing something here.
Thanks in advance!

You might be assuming that the wilcox.test() will be able to find Column.Heading.1 and Column.Heading.2 inside DATA.
Unfortunately, this does not happen. The argument data only plays a role if you are giving a formula to the first element, i.e. Column.Heading.1~Column.Heading.2
If you want to use the x,y configuration that you are using you have to write the column in full, as if you trying to see the values on the console. For example.
wilcox.test(DATA$Column.Heading.1, DATA$Column.Heading.2)
Note that the formula and x,y configuration have different meanings. the formula assumes that Column.Heading.2 is a factor grouping the numbers (like "group1" or "group2"). the x, y configuration expects that both columns are filled with numbers.

If you want to specify the data argument to wilcox.test then you need to use the formula interface, which means you will need a dataset in long format with a single column for the response and other for the group.
Otherwise to compare two columns of the same data frame you will need to refer to their locations explicitly. That is if we make a data frame:
dat <- data.frame(x=c(1.1,2.2,3.3),y=c(0.9,2.1,2.8))
Then we can run a Wilcoxon test either using:
wilcox.test(dat$x, dat$y)
or
with(dat , wilcox.test(x,y))

Related

Estimation to plot person-item map not feasible because items "have no 0-responses" in data matrix

I am trying to create a person item map that organizes the questions from a dataset in order of difficulty. I am using the eRm package and the output should looks like follows:
[person-item map] (https://hansjoerg.me/post/2018-04-23-rasch-in-r-tutorial_files/figure-html/unnamed-chunk-3-1.png)
So one of the previous steps, before running the function that outputs the map, I have to fit the data set to have a matrix which is the object that the plotting functions uses to create the actual map, but I am having an error when creating that matrix
I have already tried to follow and review some documentation that might be useful if you want to have some extra-information:
[Tutorial] https://hansjoerg.me/2018/04/23/rasch-in-r-tutorial/#plots
[Ploting function] https://rdrr.io/rforge/eRm/man/plotPImap.html
[Documentation] https://eeecon.uibk.ac.at/psychoco/2010/slides/Hatzinger.pdf
Now, this is the code that I am using. First, I install and load the respective libraries and the data:
> library(eRm)
> library(ltm)
Loading required package: MASS
Loading required package: msm
Loading required package: polycor
> library(difR)
Then I fit the PCM and generate the object of class Rm and here is the error:
*the PCM function here is specific for polytomous data, if I use a different one the output says that I am not using a dichotomous dataset
> res <- PCM(my.data)
>Warning:
The following items have no 0-responses:
AUT_10_04 AUN_07_01 AUN_07_02 AUN_09_01 AUN_10_01 AUT_11_01 AUT_17_01
AUT_20_03 CRE_05_02 CRE_07_04 CRE_10_01 CRE_16_02 EFEC_03_07 EFEC_05
EFEC_09_02 EFEC_16_03 EVA_02_01 EVA_07_01 EVA_12_02 EVA_15_06 FLX_04_01
... [rest of items]
>Responses are shifted such that lowest
category is 0.
Warning:
The following items do not have responses on
each category:
EFEC_03_07 LC_07_03 LC_11_05
Estimation may not be feasible. Please check
data matrix
I must clarify that all the dataset has a range from 1 to 5. Is a Likert polytomous dataset
Finally, I try to use the plot function and it does not have any output, the system just keep loading ad-infinitum with no answer
>plotPImap(res, sorted=TRUE)
I would like to add the description of that particular function and the arguments:
>PCM(X, W, se = TRUE, sum0 = TRUE, etaStart)
#X
Input data matrix or data frame with item responses (starting from 0);
rows represent individuals, columns represent items. Missing values are
inserted as NA.
#W
Design matrix for the PCM. If omitted, the function will compute W
automatically.
#se
If TRUE, the standard errors are computed.
#sum0
If TRUE, the parameters are normed to sum-0 by specifying an appropriate
W.
If FALSE, the first parameter is restricted to 0.
#etaStart
A vector of starting values for the eta parameters can be specified. If
missing, the 0-vector is used.
I do not understand why is necessary to have a score beginning from 0, I think that that what the error is trying to say but I don't understand quite well that output.
I highly appreciate any hint that you can provide me
Feel free to ask for any information that could be useful to reach the solution to this issue
The problem is not caused by the fact that there are no items with 0-responses. The model automatically corrects this by centering the response scale categories on zero. (You'll notice that the PI-map that you linked to is centered on zero. Also, I believe the map you linked to is of dichotomous data. Polytomous data should include the scale categories on the PI-map, I believe.)
Without being able to see your data, it is impossible to know the exact cause though.
It may be that the model is not converging. That may be what this error was alluding to: Estimation may not be feasible. Please check data matrix. You could check by entering > res at the prompt. If the model was able to converge you should see something like:
Conditional log-likelihood: -2.23709
Number of iterations: 27
Number of parameters: 8
...
Does your data contain answers with decimal numbers? I found the same error, I solved it by using dplyr::dense_rank() function:
df_ranked <- sapply(df_decimal_data, dense_rank)
Worked.

Anova Test- Separate Groups with Two Factor Comparison

Good morning,
I am trying to run some ANOVA tests on my dataset (using R) and I keep getting errors. I am trying to compare the average percentage of correct responses, as a factor of what "group" the subjects were in and what session/day it was. However, I have two separate conditions that I need to analyze separately.
So essentially, I need to compare PctCorrect in Condition 1, between groups and sessions and then do the same thing for condition 2.
I attempted using this code:
aov(ext$Pct.Correct[ext$Condition=="NC-EXT"]~ext$Group*ext$Session, data=ext)
and I got the following error:
Error in model.frame.default(formula = ext$Pct.Correct[ext$Condition
== : variable lengths differ (found for 'ext$Group')
I ran this code to make sure that all of my values were even:
mytable <- table(ext$Session, ext$Group, ext$Condition)
ftable(mytable)
And they were all the same value (which was to be expected), so I am not sure what's wrong.
I am very new to R so it's entirely possible that I am doing this completely wrong. Any help would be greatly appreciated.
You are filtering the left side of the equation and not filtering the right side, thus the "variable length error".
You could try filter your dataframe in the data= option like this:
aov(Pct.Correct ~ Group* Session, data=ext[ext$Condition=="NC-EXT",])

Atomic vectors in R and applying function to them

So I have a data set that I'm using from UC Irvine's website("Wine Quality" dataset) and I want to take a look at a plot of the residuals of the data set. The reason I'm doing this is to look to see if there is a an increase in variance so I could run a log based regression. To look at the residuals I apply this code:
residuals(white.wine)
white.wine is how I named my dataframe. However I get this error thrown at me, "NULL". If I want to look at the residuals of a specific predictor variable like Fixed Acidity I get this error:
Error: $ operator is invalid for atomic vectors.
Any way around this? Thanks for any help!
#Hugh was right in saying that "residuals" must be used against a model, but I think your question was also asking about how to apply something over a data frame. In case you just want the variance of each predictor variable, you might want something like:
apply(white.wine, 2, var)
As the ?apply documentation says, you need to provide the data, the margin, and the function. The margin refers to applying over rows or columns, with 1 signaling to apply a function over the rows, and 2 indicating that the function should be applied over columns. I'm assuming you have predictor variables in columns, so I used a 2 in the code above.

Taylor diagram from existing Correlation and Standard Dev values

Is it possible to create a Taylor diagram from already calculated correlation and standard deviation values?
I am doing model evaluation, and I have already the correlation and standard deviations values.I understand that there is already a package plotrix where by giving the observation and the modeled values, the diagram is created. However for the type of work that I am doing, it is easier to start by giving already the correlation and standard deviation values.
Is there any way I can do this in R?
There's no reason it shouldn't be possible, but the authors didn't seem to allow for that when they wrote the function. The function is a bit long and complex, but the part that does the calculation is at the top. It is possible to swap out that code and replace it to allow for the passing of summary statistics. Now, keep in mind what i'm about to do is a hack and i've only tested it with versions 3.5-5 of plotrix. Other version may not work.
Here will will create a new function taylor.diagram2 that takes all the code from taylor.diagram but adds in an extra if statement to check for a list of summarized data as the first argument
taylor.diagram2<-taylor.diagram
bl<-as.list(body(taylor.diagram))
cond<-list(
as.name("if"),
quote(is.list(ref) & missing(model)), #condition
quote({R<-ref$R; sd.r<-ref$sd.r; sd.f<-ref$sd.f}), #if true
as.call(c(as.symbol("{"), bl[3:8]))) #else
bl<-c(bl[1:2], as.call(cond), bl[9:length(bl)]) #splice in new code
body(taylor.diagram2)<-as.call(bl) #update function
Now we can test the function. First, we'll do things the standard way
#test data
aref<-rnorm(30,sd=2)
amodel1<-aref+rnorm(30)/2
#standard behavior function
taylor.diagram2(aref,amodel1, main="Standard Behavior"))
#summarized data
xx<-list(
R=cor(aref, amodel1, use = "pairwise"),
sd.r=sd(aref),
sd.f=sd(amodel1)
)
#modified behavior
taylor.diagram2(xx, main="Modified Behavior")
So the new taylor.diagram2 function can do both. If you pass it two vectors, it will do the standard behavior. If you pass it a list with the names R, sd.r, and sd.f, then it will do the same plot but with the values you passed in. Also, the model parameter must be empty for the modified version to work. That means if you want to set any additional parameter, you must use named parameters rather than positional arguments.

R: partimat function doesn't recognize my classes

I am a relatively novice r user and am attempting to use the partimat() function within the klaR package to plot decision boundaries for a linear discriminant analysis but I keep encountering the same error. I have tried inputing the arguments multiple different ways according to the manual, but keep getting the following error:
Error in partimat.default(x, grouping, ...) :
at least two classes required
Here is an example of the input I've given:
partimat(sources1[,c(3:19)],grouping=sources1[,2],method="lda",prec=100)
where my data table is loaded in under the name "sources1" with columns 3 through 19 containing the explanatory variables and column 2 containing the classes. I have also tried doing it by entering the formula like so:
partimat(sources1$group~sources1$tio2+sources1$v+sources1$cr+sources1$co+sources1$ni+sources1$rb+sources1$sr+sources1$y+sources1$zr+sources1$nb+sources1$la+sources1$gd+sources1$yb+sources1$hf+sources1$ta+sources1$th+sources1$u,data=sources1)
with these being the column heading.
I have successfully run an LDA on this same data set without issue so I'm not quite sure what is wrong.
From the source code of the partimat.default function getAnywhere(partimat.default) it states
if (nlevels(grouping) < 2)
stop("at least two classes required")
Therefore maybe you haven't defined your grouping column as a factor variable. If you try summary(sources1[,2]) what do you get? If it's not a factor, try
sources1[,2] <- as.factor(sources1[,2])
Or in method 2 try removing the "sources1$"on each of your variable names in the formula as you specify the data frame in which to look for these variable names in the data argument. I think you are effectively specifying the dataframe twice and it might be looking, for instance, for
"sources1$sources1$groups"
Rather than
"sources1$groups"
Without further error messages or a reproducible example (i.e. include some data in your post) it's hard to say really.
HTH

Resources