Applying survey weights to data before compiling contingency tables in R

The sample for a survey I am analysing was not selected randomly, so I need to apply a vector of weights to make the findings representative of the population. I have used wtd.table() (from gmodels) successfully to create frequency tables, but now I want to create a contingency table to compare two categorical variables and conduct a chi-squared test. I'm struggling to find the right function. The svytable() function in the survey package sounds promising, but I don't see where I should input the weight vector. I'm new to R. Could anyone explain how to use svytable() or suggest an alternative?
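A minimal sketch of where the weights usually go: svytable() does not take a weight vector itself; the weights are declared once in a design object built with svydesign(), and that design is then passed on. The data frame df, its columns a, b and w, and the no-clusters design ids = ~1 are assumptions made up for illustration; a real survey may also need strata or cluster information.

```r
# Hypothetical data: two categorical variables and a weight column.
set.seed(1)
df <- data.frame(
  a = factor(sample(c("yes", "no"), 200, replace = TRUE)),
  b = factor(sample(c("low", "mid", "high"), 200, replace = TRUE)),
  w = runif(200, 0.5, 2)
)

# Base R alternative: xtabs() sums the left-hand side within each cell,
# which gives a weighted contingency table.
wtab <- xtabs(w ~ a + b, data = df)
print(wtab)

# survey package (only run if it is installed): declare the weights in the
# design, then hand the design to svytable() and svychisq().
if (requireNamespace("survey", quietly = TRUE)) {
  des  <- survey::svydesign(ids = ~1, weights = ~w, data = df)
  stab <- survey::svytable(~a + b, design = des)
  chi  <- survey::svychisq(~a + b, design = des)  # design-based chi-squared test
  print(stab)
  print(chi)
}
```

The base R table is useful for a quick look, but a plain chisq.test() on weighted counts ignores the survey design, which is why the design-based svychisq() is shown for the test itself.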

Related

Applying a population total variable in R?

I have a weighting variable that I'd like to apply to my dataset so that I have weighted totals. In SPSS, this is straightforward enough. However, in R, I've been multiplying the variable by the weight variable to create a new variable as shown in the following example:
https://stats.stackexchange.com/questions/210697/weighting-variable-based-on-another-variable
Is there a more sophisticated way of applying weights in R?
Thanks.
If you need to work with a weighted dataset and define a complex survey sample, you can use the survey package: https://cran.r-project.org/web/packages/survey/survey.pdf.
Once you have defined the weights to be applied, you can compute all sorts of summary statistics.
However, I would only advise this for complex weighted analysis.
Otherwise, there are several other packages that deal with weights, such as questionr.
It all depends on whether you only need a simple weighted sum or want to go on to other types of analysis that require more sophisticated methods.
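As a rough illustration of the two routes, here is a small sketch. The data frame dat and its columns income, region and wt are invented for the example, and ids = ~1 assumes a design with no clustering.

```r
# Hypothetical data: a numeric variable, a grouping variable and a weight.
dat <- data.frame(
  income = c(10, 20, 30, 40),
  region = c("north", "north", "south", "south"),
  wt     = c(1.5, 0.5, 2.0, 1.0)
)

# Simple cases need only base R.
weighted_total   <- sum(dat$income * dat$wt)            # weighted sum
weighted_avg     <- weighted.mean(dat$income, dat$wt)   # weighted mean
totals_by_region <- tapply(dat$income * dat$wt, dat$region, sum)

# survey package (only run if it is installed): the weights are declared once
# in the design, and the estimates come with standard errors.
if (requireNamespace("survey", quietly = TRUE)) {
  des <- survey::svydesign(ids = ~1, weights = ~wt, data = dat)
  print(survey::svytotal(~income, des))
  print(survey::svymean(~income, des))
}
```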

How to extract variables from tab_model in R to create new data frame?

Example output of tab_model
I have created a table from tab_model that includes multiple models, and I wish to extract all 'p-values' and 'Estimates/Odds Ratios' into a data frame. The output of tab_model is an HTML file. I am unable to find a function that pulls this information out. Any ideas on how I could do this?
For example, I want to retrieve all p-values and Estimates for the variable 'age' in all of my models. There are only 3 in the example image, but I have hundreds.
You should get these values from the regression models themselves, instead of outputting them to an HTML table and then extracting them from it.
Without further knowledge of your process and data it is difficult to provide a more concrete answer.
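A minimal sketch of pulling the values from the models directly, assuming the models are logistic regressions kept in a named list. The three glm() fits on mtcars, the term "wt" and the helper extract_term() are all hypothetical stand-ins for the real models and the 'age' variable.

```r
# Hypothetical models: three logistic regressions that share the term "wt".
models <- list(
  m1 = glm(am ~ wt,      data = mtcars, family = binomial),
  m2 = glm(am ~ wt + hp, data = mtcars, family = binomial),
  m3 = glm(vs ~ wt,      data = mtcars, family = binomial)
)

# Hypothetical helper: one row per model with the odds ratio and p-value
# of a single term, read from the coefficient matrix of summary().
extract_term <- function(model, model_name, term) {
  co <- summary(model)$coefficients
  data.frame(
    model      = model_name,
    term       = term,
    odds_ratio = exp(co[term, "Estimate"]),
    p_value    = co[term, "Pr(>|z|)"]
  )
}

results <- do.call(
  rbind,
  lapply(names(models), function(nm) extract_term(models[[nm]], nm, "wt"))
)
print(results)
```

The same loop scales to hundreds of models as long as they are stored in a list; the column name "Pr(>|z|)" applies to binomial glm fits and differs for other model types.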

How do I find out which observations of my dataset have been used for my mlm in R (nlme)?

I have longitudinal data and specified 3 multilevel models for different outcomes with nlme in R.
'model <- lme (...)'
They all are based on the same dataset.
Now,
'summary(model)'
shows me that the number of observations used for my final three models varies.
This is probably due to missing data that differ for every outcome (the predictors stayed pretty much the same).
Is there a way to see which observations of my dataset were included in each model? Note that lme does not give me an S4 object, but medMer. Therefore,
'model@frame'
unfortunately does not work.
My aim is to give precise sample characteristics for each model. Therefore, I somehow need to address the observations included in each of them.
Thank you for any thoughts on this!
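One possible approach, sketched under the assumption that rows were dropped only because of missing values (na.action = na.omit): flag the rows that are complete on every variable the model uses and compare the count with what lme reports. The Orthodont data with two values set to missing, and the variable list model_vars, are assumptions for illustration.

```r
library(nlme)

# Hypothetical data: Orthodont with two outcome values set to missing.
dat <- as.data.frame(Orthodont)
dat$distance[c(3, 10)] <- NA

# lme() stops on missing values by default, so na.omit is requested explicitly.
model <- lme(distance ~ age, random = ~ 1 | Subject,
             data = dat, na.action = na.omit)

# Rows that are complete on every variable the model uses
# (outcome, predictors and grouping factor).
model_vars <- c("distance", "age", "Subject")
used <- complete.cases(dat[, model_vars])

# Sanity check against the number of observations stored in the fit.
stopifnot(sum(used) == model$dims$N)

# Sample characteristics for the rows that entered this model.
sample_used <- dat[used, ]
print(summary(sample_used$age))
```

Repeating this with the variable list of each outcome gives one logical vector per model, which can then be used to describe each analysis sample separately.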

Applying univariate coxph function to multiple covariates (columns) at once

First, I gathered from this link (Applying a function to multiple columns) that using the "function" function would perhaps do what I'm looking for. However, I have not been able to make the leap from the way it is presented there to making it actually work in my situation (or really even knowing where to start). I'm a beginner in R, so I apologize in advance if this is a really "newb" question. My data is a data frame that consists of an event variable (tumor recurrence) and a time variable (follow-up time/time to recurrence), as well as recurrence risk factors (T-stage, tumor size, age at dx, etc.). Some risk factors are categorical and some are continuous. I have been running my univariate analysis by hand, one variable at a time, like this example: univariateageatdx<-coxph(survobj~agedx), and then collecting the results. This gets very tedious for multiple factors and for a few different recurrence types. I figured there must be a way to write one line of code containing the coxph equation, apply it to all of my variables of interest, and get back the univariate results for each factor. I tried using cbind to bind variables (i.e. x<-cbind("agedx","tumor size")) and then running coxph(recurrencesurvobj~x), but this of course just did the multivariate analysis on these variables and didn't split them out as true univariate analyses.
I also tried the following code based on a similar problem that I found on a different site, but it gave the error shown and I don't know quite what to make of it. Is this on the right track?
f <- as.formula(paste('regionalsurvobj ~', paste(colnames(nodcistradmasvssubcutmasR)[6:9], collapse='+')))
I then ran it as coxph(f).
This gave me the results of a multivariate Cox analysis.
Thanks!
**edit: I just fixed the error, I needed to use the column numbers I suppose not the names. Changes are reflected in the code above. However, it still runs the variables selected as a multivariate analysis and not as the true univariate analysis...
If you want to go the formula route (which, in your case with multiple outcomes and multiple variables, might be the most practical way to go about it), you need to create one formula per model you want to fit. I've split the steps up a bit here (making formulas, making models and extracting data). They can of course be combined, but keeping them separate allows you to inspect all your models.
#example using transplant data from survival package
library(survival)
#make new event-variable: death or no death
#to have dichot outcome
transplant$death <- transplant$event=="death"
#making formulas (one per covariate; the list is named by covariate)
univ_formulas <- sapply(c("age","sex","abo"), function(x) as.formula(paste('Surv(futime,death)~', x)))
#making a list of models (one univariate coxph fit per formula)
univ_models <- lapply(univ_formulas, function(x) coxph(x, data=transplant))
#extract data (here I've gone for HR and confint)
univ_results <- lapply(univ_models, function(x) exp(cbind(HR=coef(x), confint(x))))
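If a single table is more convenient than a list, the same idea can be sketched as follows. This is only one possible layout: the copy tx, the covariate vector and the column names of univ_table are choices made for this example, and the hazard ratios, confidence limits and p-values are read from summary() of each fit.

```r
library(survival)

# Work on a copy so the packaged dataset is left untouched.
tx <- transplant
tx$death <- tx$event == "death"

covariates <- c("age", "sex", "abo")

# Fit one univariate model per covariate and stack the results.
univ_table <- do.call(rbind, lapply(covariates, function(v) {
  fit <- coxph(as.formula(paste("Surv(futime, death) ~", v)), data = tx)
  s <- summary(fit)
  data.frame(
    covariate = v,
    term      = rownames(s$coefficients),   # one row per coefficient
    HR        = s$conf.int[, "exp(coef)"],
    lower95   = s$conf.int[, "lower .95"],
    upper95   = s$conf.int[, "upper .95"],
    p         = s$coefficients[, "Pr(>|z|)"],
    row.names = NULL
  )
}))
print(univ_table)
```

Categorical covariates with more than two levels contribute one row per non-reference level, so the table can have more rows than there are covariates.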

Naive Bayes Classifier in e1071 package [R] - Editing Data

So I'm currently using the Naive Bayes classifier from the e1071 package to classify data, and I was wondering if there was any way to interact with, and edit the data.
For example, using the iris dataset, and the methods described here to extract a classifier from it, I want to be able to select the individual tables in the classifier.
I would like to be able to select a specific data table (such as the Sepal.Length) table, and compare the values against each other in order to get more information.
Does anyone have any methods for doing this?
Just figured it out.
Essentially, the classifier is a list of 4 components: the a priori class distribution, the tables holding the mean and standard deviation of each predictor within each class, the different classes, and the original call.
The tables component is itself a list with one table per predictor, and if you keep delving into the individual lists you can get at the individual items, including the individual probability matrices, and work from there. For a numeric predictor, the first column of each table is the mean and the second is the standard deviation. From there you can pull whatever data you want, and edit to your heart's content.
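A short sketch of that delving on the iris data, guarded so that it only runs when e1071 is installed. The object names nb and sl are made up for the example.

```r
sl <- NULL
if (requireNamespace("e1071", quietly = TRUE)) {
  # Fit the classifier on all four predictors.
  nb <- e1071::naiveBayes(Species ~ ., data = iris)

  # Top-level components of the fitted object.
  print(names(nb))

  # Table for one predictor: rows are classes, column 1 is the mean
  # and column 2 is the standard deviation.
  sl <- nb$tables$Sepal.Length
  print(sl)

  # Compare class means against each other, e.g. virginica minus setosa.
  print(sl["virginica", 1] - sl["setosa", 1])
}
```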
