I had to teach my friend how to run a t-test in R (using the t.test function) and I wished the function were more interactive. Newcomers could run it easily if the function guided them through the test. I was unable to find such a function online, so I decided to write one myself. Making an interactive function is a huge challenge for me, but it is a fun breather in my graduate school life.
I want my function to be able to run like myttest(x, y, paired = T) so that Rmarkdown could produce the output normally. I also want the function to run interactively by typing myttest(). Thus I decided to base my function on t.test.default and add readline in the source code where needed.
I used getAnywhere(t.test.default) to display the source code, and I put the following code right after the first { so that R could ask for a vector like data$GPA.
if (missing(x)) {x <- readline("What is the name of the data set?")}
However, I got the following error message when I ran myttest() and typed data$GPA in the interactive dialogue.
Error in myttest() : not enough 'x' observations
In addition: Warning message:
In myttest() : NAs introduced by coercion
The data set data actually exists in the Global Environment and it has the column GPA so I think it is the problem with my coding. Why isn't R reading the observations in the GPA column?
(Also, one of my final goals is to make R ask for only a data set. R would read the columns of the data set, display them in the interactive dialogue, and ask "Which variable do you want to use as the DV?". Then I could type in GPA, for example. I also think it would help if R asked for the type of t-test at the beginning (e.g. one-sample, two-sample, or paired-sample). Do you think this level of interaction is possible?)
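On why the observations are not read: readline() always returns a character string, so x holds the text "data$GPA" rather than the GPA column, which is consistent with the coercion warning and the "not enough 'x' observations" error. Below is a minimal sketch of one way around this, assuming the typed text is a valid R expression. resolve_input() and pick_column() are hypothetical helper names, not part of t.test, and the data frame is made up.

```r
# Turn text typed at a prompt (e.g. "data$GPA") into the object it names.
# Assumes the text is a valid R expression; eval(parse()) runs whatever
# is typed, so this is only suitable for trusted, interactive use.
resolve_input <- function(txt, env = globalenv()) {
  eval(parse(text = txt), envir = env)
}

# Return one column of a data frame. If no choice is supplied and the
# session is interactive, offer the column names in a menu.
pick_column <- function(df, choice = NULL) {
  if (is.null(choice) && interactive()) {
    choice <- names(df)[utils::menu(names(df), title = "Which variable is the DV?")]
  }
  stopifnot(length(choice) == 1, choice %in% names(df))
  df[[choice]]
}

# Made-up data standing in for the real data set.
data <- data.frame(GPA = c(3.1, 3.5, 2.9), age = c(20, 22, 21))

x_text <- "data$GPA"                         # what readline() would return
x <- resolve_input(x_text, environment())    # the numeric column itself
gpa <- pick_column(data, "GPA")              # same column, chosen by name
```

Inside myttest(), the same idea would read if (missing(x)) x <- resolve_input(readline("...")). The menu-based variant suggests that the level of interaction described above is feasible with base R alone.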
Problem: I am trying to run a mixed ANOVA in R. When I run the ANOVA, I get an error saying that all of my data values are missing.
This is what the data look like (just the first few rows):
[Screenshot: data after running View(SWL_data)]
When I try to run the ANOVA, I get an error that all the cases are missing values even though they clearly have values when I look at the data file in View(). [Screenshot: ANOVA code and error]
I checked each of the object types. [Screenshot: R code and output of object types]
I am not sure how to share the code so that this problem is reproducible without sharing all of my data. This is my first post so thank you for your patience with me!
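On sharing reproducible data without posting everything: one common approach is dput() on a few rows, which prints R code that recreates the object including its column types. A minimal sketch follows; SWL_example and its columns are made-up stand-ins, not the real SWL_data.

```r
# Hypothetical stand-in for the real data; column names are made up.
SWL_example <- data.frame(
  id    = factor(c(1, 1, 2, 2)),
  time  = factor(c("pre", "post", "pre", "post")),
  group = factor(c("a", "a", "b", "b")),
  swl   = c(4.2, 5.1, 3.8, 4.0)
)

# dput() prints R code that recreates the object, including column
# types, so readers can paste it into their own session.
shared_text <- capture.output(dput(head(SWL_example, 4)))

# Anyone can rebuild the data from that text:
rebuilt <- eval(parse(text = shared_text))

# str() shows each column's type, a useful first check when a model
# reports that all values are missing.
str(rebuilt)
```

Pasting the dput() output of head(SWL_data) into the post, together with the ANOVA call as text rather than as an image, would let others reproduce the error.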
I am working with the multipatt function of the 'indicspecies' package and am unable to extract the values shown in its summary. I can't print the whole summary, so I am left with only partial information for my model. The reason is the huge amount of data that the summary needs to print (300.000 different species, 3 groups, 6 comparable combinations).
This is what happens with summary being saved (pre-code incl.):
x <- multipatt(data, ...)
sumx <-summary(x)
sumx
NULL
str(sumx)
NULL
So, the summary does not work exactly like a generic summary. It seems that the function is based around the older indval function from the 'labdsv' package (which is mentioned in the documentation). I found an archived thread where a similar problem is discussed: http://r.789695.n4.nabble.com/extract-values-from-summary-of-function-indval-of-the-package-labdsv-td4637466.html
but it does not seem to be resolved (and it is not about exactly the same function, but about the underlying indval function).
I was wondering if anyone has experience with the indicspecies package and knows a way to extract the info from the summary.
It is possible to extract significance and other information from the other saved data from the model, but it might be nice to just get a quick complete overview from the data.
ps. I tried
options(max.print=1000000)
but this didn't solve it for me.
I used to capture the summary output for a multipatt object, but I don't any more because the p-values reported are not corrected for multiple testing. To answer the OP's question, you can capture the summary output using capture.output.
ex.
dat.multipatt.summary<-capture.output(summary(dat.multipatt, indvalcomp=TRUE))
Again, I do not recommend this. It is very important to correct the p-values for multiple testing, so the summary output actually isn't helpful. To be clear ?multipatt states:
"sign Data table with results of the best matching pattern, the association value and the degree of statistical significance of the association (i.e. p-values from permutation test). Note that p-values are not corrected for multiple testing."
I just posted an answer for how to correct the p-values here https://stats.stackexchange.com/questions/370724/indiscpecies-multipatt-and-overcoming-multi-comparrisons/401277#401277
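As a rough illustration of correcting the p-values, here is a minimal sketch using base R's p.adjust(). It assumes the multipatt result carries a sign data frame with a p.value column; sign_tab below is a made-up stand-in for that table, and the choice of the "BH" method is only an example.

```r
# Hypothetical stand-in for the `sign` table of a multipatt result;
# the values are made up for illustration.
sign_tab <- data.frame(
  stat    = c(0.91, 0.75, 0.62, 0.40),
  p.value = c(0.001, 0.012, 0.030, 0.200),
  row.names = c("sp1", "sp2", "sp3", "sp4")
)

# Adjust the permutation p-values for multiple testing.
# "BH" (Benjamini-Hochberg) is one common choice, not the only one.
sign_tab$p.adj <- p.adjust(sign_tab$p.value, method = "BH")

# Keep the species that remain significant after adjustment.
significant <- sign_tab[sign_tab$p.adj < 0.05, ]
print(significant)
```

Working on the stored table rather than the printed summary also sidesteps the printing limit mentioned in the question.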
I don't have any experience with this package and since you haven't provided the data, it's difficult to reproduce. But since summary is returning NULL, are you sure your x is computed properly? Check the object.size or class or something else of x to see if it indeed has any content.
Also, instead of accessing all the contents of summary(x) together, you can use @ to access its slots (similar to $ in a data frame).
If you need further assistance, it would be better to provide at least a small subset or some other sample data so that the community can work with it.
This might not be the right place to ask but I'm not sure where else to ask it. I'm trying to use the smbinning package. In particular, I'm trying to bin by multiple predictor variables. The issue is all the examples in the package documentation only deal with one predictor variable. I tried this naively:
result=smbinning(df=training,y="FlagGB",x=".,",p=.05)
which seemed to execute okay, but then if I tried to run result$ivtable I got the error
Error in result$ivtable : $ operator is invalid for atomic vectors
Does anyone know a) how to get smbinning to accept multiple predictors or, if it can't, another package that can; b) how to resolve the specific error listed above?
I have solved the problem. The cause is that training may not be a data frame; you have to convert it with as.data.frame(training). In the smbinning source (https://github.com/cran/smbinning/blob/master/R/smbinning.R#L490) there is this block:
i=which(names(df)==y) # Find Column for dependant
j=which(names(df)==x) # Find Column for independant
if (!is.numeric(df[,i]))
{
return("Target (y) not found or it is not numeric")
}
Second, the target y (FlagGB) must be numeric. If your y variable is a factor, convert it with as.numeric(as.character(y)) rather than calling as.numeric() on it directly.
The problem is similar to "Target (y) not found or it is not numeric" -Package smbinning - R
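To illustrate why the conversion goes through as.character(), here is a small sketch with made-up data. Calling as.numeric() directly on a factor returns its internal level codes, not the labels.

```r
# A 0/1 target stored as a factor (made-up values).
flag <- factor(c("0", "1", "1", "0"))

codes  <- as.numeric(flag)                 # level codes: 1 2 2 1
labels <- as.numeric(as.character(flag))   # the labels as numbers: 0 1 1 0

# Apply the safe conversion to a data frame column before binning.
training <- as.data.frame(list(FlagGB = flag, score = c(10, 20, 30, 40)))
training$FlagGB <- as.numeric(as.character(training$FlagGB))
is.numeric(training$FlagGB)
```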
Have you looked into the "Information" package? It seems to do the job, but there is no facility to recode the variable. Or if there is one, I haven't been able to find it. Otherwise, it is a really great package for exploring and analysing the variables.
To answer b): print result and you will (most probably) see that the function did not in fact execute; the reason is given in the value it returned.
Indeed, it is a bit confusing that the smbinning package returns its errors silently and within the variable itself.
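A minimal sketch of guarding against this behaviour: check whether the returned value is a character string before using $ on it. check_binning() is a hypothetical helper, and the two objects below are stand-ins for what a binning call might return, so the sketch runs without the smbinning package.

```r
# Hypothetical wrapper: turn a silently returned message into a real error.
check_binning <- function(result) {
  if (is.character(result)) {
    stop("binning failed: ", paste(result, collapse = " "), call. = FALSE)
  }
  result
}

# Stand-ins for possible return values; the table contents are made up.
failed <- "Target (y) not found or it is not numeric"
ok     <- list(ivtable = data.frame(Cutpoint = "Total", IV = 0.42))

# A failed call now raises an error instead of breaking later at `$`.
msg <- tryCatch(check_binning(failed), error = function(e) conditionMessage(e))

# A successful result passes through unchanged.
tab <- check_binning(ok)$ivtable
```

This also explains the "$ operator is invalid for atomic vectors" error in the question: result was a character string, which is an atomic vector.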
Question a), on the other hand, is hard to answer without looking at the data. You can try to cross/multiply your variables, but that may result in a very large number of factor levels. I would suggest that you apply the smbinning package to group each of your characteristics into a few groups and then try to cross the groups.
For question a), you can use the sumiv method, which calculates the IV for all variables in one step, with code like:
sumivt=smbinning.sumiv(chileancredit.train,y="FlagGB")
sumivt # Display table with IV by characteristic
Completely new to R here. I ran R in SPSS to solve some complex polynomials from SPSS datasets. I managed to get the result from R back into SPSS, but it was a very inelegant process:
begin program R.
z <- polyroot(unlist(spssdata.GetDataFromSPSS(variables=c("qE","qD","qC","qB","qA"),cases=1),use.names=FALSE))
otherVals <- spssdata.GetDataFromSPSS(variables=c("b0","b1","Lc","tInv","sR","c0","c1","N2","xBar","DVxSq"),cases=1)
b0<-unlist(otherVals["b0"],use.names=FALSE)
b1<-unlist(otherVals["b1"],use.names=FALSE)
Lc<-unlist(otherVals["Lc"],use.names=FALSE)
tInv<-unlist(otherVals["tInv"],use.names=FALSE)
sR<-unlist(otherVals["sR"],use.names=FALSE)
c0<-unlist(otherVals["c0"],use.names=FALSE)
c1<-unlist(otherVals["c1"],use.names=FALSE)
N2<-unlist(otherVals["N2"],use.names=FALSE)
xBar<-unlist(otherVals["xBar"],use.names=FALSE)
DVxSq<-unlist(otherVals["DVxSq"],use.names=FALSE)
z2 <- Re(z[abs(c(abs(b0+b1*Re(z)-tInv*sR*sqrt(1/(c0+c1*Re(z))^2+1/N2+(Re(z)-xBar)^2/DVxSq))-Lc))==min(abs(c(abs(b0+b1*Re(z)-tInv*sR*sqrt(1/(c0+c1*Re(z))^2+1/N2+(Re(z)-xBar)^2/DVxSq))-Lc)))])
varSpec1 <- c("Xd","Xd",0,"F8","scale")
dict <- spssdictionary.CreateSPSSDictionary(varSpec1)
spssdictionary.SetDictionaryToSPSS("results", dict)
new = data.frame(z2)
spssdata.SetDataToSPSS("results", new)
spssdictionary.EndDataStep( )
end program.
Honestly, it was mostly pieced together from somewhat-related examples and seems more complicated than it should be. I had to take the new dataset created by R and run MATCH FILES with my original dataset. All I want to do is a) pull numbers from SPSS into R, b) manipulate them (in this case, finding a polyroot that fits certain criteria), and c) put the results right back into the SPSS dataset without messing up any of the previous data.
Am I missing something that would make this simpler? Keep in mind that I have zero R experience outside of this attempt, but I have decent experience in programming SPSS and matlab.
Thanks in advance for any help you give!
R in SPSS can create new SPSS datasets, but it can't modify an existing one. There are a lot of situations where the data from R would be dimensionally inconsistent with the active SPSS dataset. So you need to create a dictionary and data frame using the APIs above and then do whatever is appropriate on the SPSS side if you need to match back. You might want to submit an enhancement request for SPSS at suggest@us.ibm.com
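The R side of the program can at least be shortened. The sketch below assumes that spssdata.GetDataFromSPSS() returns a one-row data frame, as the otherVals["b0"] indexing in the question suggests; here otherVals and the polynomial coefficients are made-up stand-ins so the code runs outside SPSS. It uses with() instead of the ten unlist() lines and which.min() instead of comparing against min().

```r
# Stand-in for the one-row data frame returned by
# spssdata.GetDataFromSPSS(); all values here are made up.
otherVals <- data.frame(b0 = 1, b1 = 2, Lc = 0.5, tInv = 2, sR = 0.1,
                        c0 = 1, c1 = 0.5, N2 = 10, xBar = 0, DVxSq = 4)

# Hypothetical coefficients (constant term first), i.e. x^2 - 1.
z <- polyroot(c(-1, 0, 1))
x <- Re(z)

# with() looks the variable names up inside otherVals directly.
gap <- with(otherVals, abs(abs(b0 + b1 * x - tInv * sR *
  sqrt(1 / (c0 + c1 * x)^2 + 1 / N2 + (x - xBar)^2 / DVxSq)) - Lc))

# which.min() picks the root with the smallest gap
# (the first one, if several tie).
z2 <- x[which.min(gap)]
```

One difference from the original: comparing against min() keeps every tied root, whereas which.min() keeps only the first.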
First of all, I want to point out that I'm not very familiar with R, so sorry if the answer to any of the following questions is obvious.
My motivation is to write a simple R-script, which should contain:
import data
do regression of form $Y = aX + bZ + \text{intercept}$
some calculations
output
now here are my questions:
This is a very general question: once I have written the R script, I have to load it with source(name.R), right? Must there be an additional command to execute the script?
Suppose I did my regression with lm, like fit<-lm(Y~X+Z,data=database); this gives a nice output. What I really want is to save the coefficients of the model in a vector. How can I do this? Here it would be a 3-dimensional vector (intercept, a, b). EDIT: I've tried coefficient<-coefficient(fit). This does not work! coefficient is not a plain numeric vector: each element also carries a name, e.g. the first element shows the name intercept with the value below it.
If I want to print out the coefficients and some calculations at the very end of the script, how do I do this? Just write print(....)?
I'm very thankful for your help, and hopefully I have followed all the rules and conventions of this forum, since this is my first question. If not, I'm very sorry.
If I wrote the R script, then I have to load it with source(name.R), right? Must there be an additional command to execute the script?
Not if your script directly invokes the commands.
For instance if name.R contains
a <- 1:10
plot(a, a^2, t="l")
Then source("name.R") will directly generate a plot
However, if name.R contains
myfunction <- function()
{
a <- 1:10
plot(a, a^2, t="l")
}
Then sourcing it will only load the function. You will then have to invoke myfunction() to get the plot.
Suppose I did my regression with lm, like fit<-lm(Y~X+Z,data=database); this gives a nice output. What I really want is to save the coefficients of the model in a vector. How can I do this? Here it would be a 3-dimensional vector (intercept, a, b)
If I want to print out the coefficients and some calculations at the very end of the script, how do I do this? Just write print(....)?
print(coef(fit))
will give you what you need (you can store them in a vector with model.coef <- coef(fit))
Also, it can be interesting to run
summary(fit)
See ?coef and ?summary for more info
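Regarding the names mentioned in the question's EDIT: coef() returns a named numeric vector, which is still numeric, and unname() drops the names if a plain vector is wanted. A minimal sketch with made-up data:

```r
# Made-up data following Y = 1 + 2*X + 3*Z plus a little noise.
database <- data.frame(X = c(1, 2, 3, 4, 5), Z = c(2, 1, 4, 3, 6))
database$Y <- 1 + 2 * database$X + 3 * database$Z + c(0.1, -0.1, 0, 0.1, -0.1)

fit <- lm(Y ~ X + Z, data = database)

named <- coef(fit)       # named numeric vector: (Intercept), X, Z
plain <- unname(named)   # the same three numbers without names
a     <- named[["X"]]    # a single coefficient, picked by name

# In a script run with source(), print() is needed to show values.
print(named)
print(plain * 2)         # the vector can be used in further calculations
```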