Atomic vectors in R and applying functions to them

So I have a data set from UC Irvine's website (the "Wine Quality" dataset), and I want to look at a plot of its residuals. I'm doing this to check whether the variance increases, in which case I could run a log-based regression. To look at the residuals I use this code:
residuals(white.wine)
white.wine is what I named my data frame. However, all I get back is "NULL". If I try to look at the residuals of a specific predictor variable like Fixed Acidity, I get this error:
Error: $ operator is invalid for atomic vectors.
Any way around this? Thanks for any help!

Hugh was right in saying that residuals() must be applied to a fitted model, but I think your question was also asking how to apply a function over a data frame. If you just want the variance of each predictor variable, you might want something like:
apply(white.wine, 2, var)
As the ?apply documentation says, you need to provide the data, the margin, and the function. The margin refers to applying over rows or columns, with 1 signaling to apply a function over the rows, and 2 indicating that the function should be applied over columns. I'm assuming you have predictor variables in columns, so I used a 2 in the code above.
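A minimal sketch of both points, using the built-in mtcars data as a stand-in for white.wine (the wine data is not bundled with R) and a purely hypothetical model mpg ~ wt + hp. residuals() on a bare data frame returns NULL because there is no model to take residuals from, and on a single column it fails because the default method looks up a residuals component with $, which is what produces the "atomic vectors" error. Plotting residuals against fitted values is the usual way to look for variance that grows with the mean.

```r
# mtcars stands in for the white.wine data frame (hypothetical choice)
df <- mtcars

# On a plain data frame there is no model, so this returns NULL
res_df <- residuals(df)

# On a single column the default method tries object$residuals and errors
err <- tryCatch(residuals(df$wt), error = function(e) conditionMessage(e))

# Fit a model first; this formula is only an illustration
fit <- lm(mpg ~ wt + hp, data = df)
res <- residuals(fit)

# A funnel shape here suggests the variance increases with the fitted values
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# MARGIN = 2 applies var() to each column
col_var <- apply(df, 2, var)

# sapply() iterates over data frame columns directly, without coercing to a matrix
col_var2 <- sapply(df, var)
```

sapply() is often the safer choice on data frames: apply() first converts the data frame to a matrix, so a single non-numeric column would turn every column into character.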

Related

Mann Whitney U test in R

I want to perform a Mann Whitney U test on a small data set of non-parametric data in R. Can anyone help me find the right code?
I have been trying to use the following code to run the test directly on my data set, but I keep getting this error message:
Error: Column.Heading.1 not found.
wilcox.test(Column.Heading.1, Column.Heading.2, data=DATA)
I'm new to stats, so I'm not sure if I'm missing something here.
Thanks in advance!
You might be assuming that the wilcox.test() will be able to find Column.Heading.1 and Column.Heading.2 inside DATA.
Unfortunately, this does not happen. The data argument only plays a role if you give a formula as the first argument, e.g. Column.Heading.1 ~ Column.Heading.2.
If you want to use the x, y form that you are using, you have to write each column out in full, as if you were trying to print its values at the console. For example:
wilcox.test(DATA$Column.Heading.1, DATA$Column.Heading.2)
Note that the formula and x, y forms have different meanings. The formula assumes that Column.Heading.2 is a factor grouping the numbers (like "group1" or "group2"). The x, y form expects both columns to contain numbers.
If you want to specify the data argument to wilcox.test, you need to use the formula interface, which means you need a dataset in long format, with one column for the response and another for the group.
Otherwise, to compare two columns of the same data frame you need to refer to them explicitly. That is, if we make a data frame:
dat <- data.frame(x=c(1.1,2.2,3.3),y=c(0.9,2.1,2.8))
Then we can run a Wilcoxon test either using:
wilcox.test(dat$x, dat$y)
or
with(dat , wilcox.test(x,y))
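For the formula route, a sketch of reshaping the same toy data to long format with base R's stack(), which returns a values column (the response) and an ind factor (the original column name). The names long, values and ind come from stack() itself; everything else mirrors the example above. Both calls should give the same statistic and p-value.

```r
dat <- data.frame(x = c(1.1, 2.2, 3.3), y = c(0.9, 2.1, 2.8))

# stack() gives long format: 'values' holds the numbers, 'ind' the group (x or y)
long <- stack(dat)

# Formula interface: response ~ group, and now data= is actually used
w_formula <- wilcox.test(values ~ ind, data = long)

# Equivalent x, y call on the wide data
w_xy <- wilcox.test(dat$x, dat$y)

w_formula$statistic  # W = 6
```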

Error-The variable must be numeric in cocor

I ran the cocor function to test the difference between correlations, but I got the error message that one of the variables (temporality) must be numeric. So I checked the data type of the variable, and it is double/numeric. I have no problem calculating partial correlations or confidence intervals using the same dataset.
cocor(~temporality+expectability|temporality+positive,data =data2)
is.numeric(data2$temporality) # TRUE
data2 is a dataset with 5 variables (gender and 4 numeric measures).
So what is the real reason behind the issue? Thank you.
I had the same problem with "The variable 'x' must be numeric." for the cocor function. I read somewhere that cocor does not seem to work with tibbles, but when the data is converted to a data.frame it works.
Your script would go like this:
cocor(~temporality+expectability|temporality+positive, data = as.data.frame(data2))
Finally, I used cocor.indep.groups and cocor.dep.groups.overlap to deal with numeric issues.
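cocor's internals are not shown here, so the following is only an assumption about the mechanism behind the tibble problem: tibbles never drop to a vector when you select a single column with [, whereas base data frames do. A check that column is.numeric() on such a one-column result fails even though the column itself is numeric. The sketch reproduces that behaviour in base R with drop = FALSE so it runs without the tibble package; the data2 values are invented.

```r
# Hypothetical stand-in for data2 (values invented for illustration)
data2 <- data.frame(temporality   = c(2.5, 3.1, 4.0),
                    expectability = c(1.0, 2.2, 3.3))

# Base data frame: single-column indexing drops to a plain numeric vector
v_df <- data2[, "temporality"]

# Tibble-like behaviour (tibbles use drop = FALSE by default): still a data frame
v_tbl_like <- data2[, "temporality", drop = FALSE]

is.numeric(v_df)        # TRUE
is.numeric(v_tbl_like)  # FALSE, although the column holds numbers

# Defensive conversion before calling cocor(): strip the tibble classes
if (inherits(data2, "tbl_df")) data2 <- as.data.frame(data2)
```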

Estimation to plot person-item map not feasible because items "have no 0-responses" in data matrix

I am trying to create a person-item map that orders the questions from a dataset by difficulty. I am using the eRm package, and the output should look like this:
[person-item map] (https://hansjoerg.me/post/2018-04-23-rasch-in-r-tutorial_files/figure-html/unnamed-chunk-3-1.png)
One of the steps before running the function that draws the map is to fit a model to the data set, which produces the object that the plotting function uses to create the actual map. However, I get an error when creating that object.
I have already reviewed the following documentation, which may be useful if you want extra information:
[Tutorial] https://hansjoerg.me/2018/04/23/rasch-in-r-tutorial/#plots
[Plotting function] https://rdrr.io/rforge/eRm/man/plotPImap.html
[Documentation] https://eeecon.uibk.ac.at/psychoco/2010/slides/Hatzinger.pdf
Now, this is the code that I am using. First, I install and load the respective libraries and the data:
> library(eRm)
> library(ltm)
Loading required package: MASS
Loading required package: msm
Loading required package: polycor
> library(difR)
Then I fit the PCM to generate an object of class Rm, and this is where the error appears:
*I use PCM() here because it is meant for polytomous data; if I use a different function, the output says that my dataset is not dichotomous.
> res <- PCM(my.data)
Warning:
The following items have no 0-responses:
AUT_10_04 AUN_07_01 AUN_07_02 AUN_09_01 AUN_10_01 AUT_11_01 AUT_17_01
AUT_20_03 CRE_05_02 CRE_07_04 CRE_10_01 CRE_16_02 EFEC_03_07 EFEC_05
EFEC_09_02 EFEC_16_03 EVA_02_01 EVA_07_01 EVA_12_02 EVA_15_06 FLX_04_01
... [rest of items]
Responses are shifted such that lowest category is 0.
Warning:
The following items do not have responses on each category:
EFEC_03_07 LC_07_03 LC_11_05
Estimation may not be feasible. Please check data matrix
I should clarify that all items in the dataset range from 1 to 5; it is a polytomous Likert dataset.
Finally, when I call the plotting function it produces no output; R just keeps running indefinitely without returning anything:
> plotPImap(res, sorted=TRUE)
Here is the documentation for PCM() and its arguments:
PCM(X, W, se = TRUE, sum0 = TRUE, etaStart)
#X
Input data matrix or data frame with item responses (starting from 0);
rows represent individuals, columns represent items. Missing values are
inserted as NA.
#W
Design matrix for the PCM. If omitted, the function will compute W
automatically.
#se
If TRUE, the standard errors are computed.
#sum0
If TRUE, the parameters are normed to sum-0 by specifying an appropriate
W.
If FALSE, the first parameter is restricted to 0.
#etaStart
A vector of starting values for the eta parameters can be specified. If
missing, the 0-vector is used.
I do not understand why the scores need to start from 0. I think that is what the warning is trying to say, but I don't fully understand the output.
I would appreciate any hints you can provide.
Feel free to ask for any information that could help solve this issue.
The problem is not caused by items having no 0-responses. As the warning itself says, PCM() corrects for this automatically by shifting the responses so that the lowest category is 0. (You'll notice that the PI-map you linked to is centered on zero. Also, I believe the map you linked to is of dichotomous data; a PI-map for polytomous data should include the scale categories.)
Without being able to see your data, it is impossible to know the exact cause though.
It may be that the model is not converging, which may be what the warning "Estimation may not be feasible. Please check data matrix" was alluding to. You can check by typing res at the prompt. If the model converged, you should see something like:
Conditional log-likelihood: -2.23709
Number of iterations: 27
Number of parameters: 8
...
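Both warnings can be inspected before fitting. A sketch with an invented 4-person, 3-item matrix: subtracting 1 recodes 1 to 5 responses to the 0-based coding that the PCM() documentation asks for ("item responses (starting from 0)"), and the two checks roughly mirror the two warnings. The assumption that every item should use all of the categories 0 to 4 is labeled in the code; eRm's own check may differ in detail.

```r
# Hypothetical 1-5 Likert responses: 4 persons x 3 items (invented data)
my.data <- data.frame(
  item1 = c(1, 2, 3, 5),
  item2 = c(2, 3, 4, 5),
  item3 = c(1, 1, 5, 5)
)

# Recode 1-5 to 0-4, as PCM()'s documentation expects responses starting from 0
X <- as.matrix(my.data) - 1

# Items with no 0-responses (compare with the first warning)
no_zero <- colnames(X)[colSums(X == 0, na.rm = TRUE) == 0]

# Items that never use at least one category; assumes the full range 0:4 applies
cats <- 0:4
missing_cat <- colnames(X)[apply(X, 2, function(col) !all(cats %in% col))]

no_zero      # "item2"
missing_cat  # "item1" "item2" "item3"
```

Sparse or empty categories like these are a common reason for estimation trouble, so collapsing rarely used categories may be worth trying.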
Does your data contain answers with decimal numbers? I ran into the same error and solved it with the dplyr::dense_rank() function:
df_ranked <- sapply(df_decimal_data, dense_rank)
That worked for me.
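If dplyr is not available, a base-R function with the same dense-ranking behaviour is match() against the sorted unique values: ties share a code, codes are consecutive from 1, and NA stays NA. The helper name dense_rank_base and the data are hypothetical. Because the codes start at 1, you could subtract 1 to get the 0-based coding PCM() documents; note that dense ranking keeps only the order of the values, not the spacing between them.

```r
# Hypothetical data with decimal responses (invented for illustration)
df_decimal_data <- data.frame(a = c(1.5, 2.5, 1.5, 4.0),
                              b = c(3.2, 3.2, 0.7, NA))

# Base-R analogue of dplyr::dense_rank(): 1, 2, 3, ... with no gaps after ties
dense_rank_base <- function(x) match(x, sort(unique(x)))

# Applied column by column, like sapply(df_decimal_data, dense_rank)
df_ranked <- sapply(df_decimal_data, dense_rank_base)
```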

Grabbing R^2 from linear model in R

After running something like:
mod.1<-lm(z~x+y)
I know I can do summary(mod.1) and see the $R^2$ value. I'm wondering how I might grab it from mod.1, sort of like grabbing the coefficients with mod.1$coefficients.
mod.1 = lm(c(1,2,3)~ c(1,2.3,3.4))
summary(mod.1)$r.squared
R-squared is actually not an element of the lm object itself, but of summary(mod.1). That is, if you type str(summary(mod.1)) you will see that the summary is itself a list (with a special print method) and that one of those list items is R-squared.
However, for programmatic use it's inefficient to compute the entire summary just to extract one element. Rolling your own extractor function will generally give faster code, especially if you call lm with the argument y = TRUE so that the response is stored in the fitted object. Then, for a model with an intercept, R-squared is just 1 - sum(mod.1$residuals^2)/sum((mod.1$y - mean(mod.1$y))^2).
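A sketch of such an extractor, using the same toy numbers as the answer above but placed in a data frame. The helper name r_squared is hypothetical; it assumes an unweighted model with an intercept, since summary.lm() uses a different total sum of squares when there is no intercept.

```r
# Same toy data as above, in a data frame
d <- data.frame(x = c(1, 2.3, 3.4), z = c(1, 2, 3))

# y = TRUE stores the response vector in the fitted object as fit$y
fit <- lm(z ~ x, data = d, y = TRUE)

# Hypothetical helper: R^2 = 1 - RSS/TSS (unweighted model with an intercept)
r_squared <- function(mod) {
  rss <- sum(mod$residuals^2)
  tss <- sum((mod$y - mean(mod$y))^2)
  1 - rss / tss
}

r2_manual  <- r_squared(fit)
r2_summary <- summary(fit)$r.squared
```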

R: partimat function doesn't recognize my classes

I am a relatively novice R user attempting to use the partimat() function in the klaR package to plot decision boundaries for a linear discriminant analysis, but I keep encountering the same error. I have tried entering the arguments in several different ways according to the manual, but I keep getting the following error:
Error in partimat.default(x, grouping, ...) :
at least two classes required
Here is an example of the input I've given:
partimat(sources1[,c(3:19)],grouping=sources1[,2],method="lda",prec=100)
where my data table is loaded in under the name "sources1" with columns 3 through 19 containing the explanatory variables and column 2 containing the classes. I have also tried doing it by entering the formula like so:
partimat(sources1$group~sources1$tio2+sources1$v+sources1$cr+sources1$co+sources1$ni+sources1$rb+sources1$sr+sources1$y+sources1$zr+sources1$nb+sources1$la+sources1$gd+sources1$yb+sources1$hf+sources1$ta+sources1$th+sources1$u,data=sources1)
where these are the column headings.
I have successfully run an LDA on this same data set without issue so I'm not quite sure what is wrong.
The source code of partimat.default (which you can view with getAnywhere(partimat.default)) contains:
if (nlevels(grouping) < 2)
stop("at least two classes required")
So perhaps you haven't defined your grouping column as a factor variable. What do you get from summary(sources1[,2])? If it's not a factor, try:
sources1[,2] <- as.factor(sources1[,2])
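A sketch of why that check fails for a non-factor column, with an invented character vector standing in for sources1[, 2]: nlevels() of a character vector is 0, so "at least two classes required" is triggered even when the column holds several distinct labels. Since R 4.0.0, read.csv() defaults to stringsAsFactors = FALSE, so text columns are read in as character.

```r
# Hypothetical stand-in for sources1[, 2]: class labels read in as character
grouping_chr <- c("A", "B", "A", "C")

nlevels(grouping_chr)  # 0: a character vector has no levels, so the check fails

# Converting to a factor gives partimat the levels it counts
grouping_fac <- as.factor(grouping_chr)
nlevels(grouping_fac)  # 3: passes the "at least two classes" check
```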
Alternatively, in your second attempt, try removing the "sources1$" prefix from each variable name in the formula, since the data argument already tells partimat which data frame to look in. I think you are effectively specifying the data frame twice, and it might be looking, for instance, for "sources1$sources1$group" rather than "sources1$group".
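Rather than typing seventeen variable names, the formula can be built from the column names with base R's reformulate(), which produces plain names with no sources1$ prefixes. The toy sources1 below is invented and only mimics the layout described in the question (classes in column 2, explanatory variables from column 3 onward); with the real data, names(sources1)[3:19] would supply the predictors.

```r
# Hypothetical miniature of sources1: classes in column 2, predictors after it
sources1 <- data.frame(id    = 1:4,
                       group = factor(c("a", "b", "a", "b")),
                       tio2  = c(0.5, 0.7, 0.6, 0.9),
                       v     = c(10, 12, 11, 15))

# Build group ~ tio2 + v from the column names
f <- reformulate(names(sources1)[3:ncol(sources1)],
                 response = names(sources1)[2])

# The formula would then go to partimat together with data (requires klaR):
# partimat(f, data = sources1, method = "lda")
```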
Without further error messages or a reproducible example (i.e. include some data in your post) it's hard to say really.
HTH
