Binning categorical variables in SPSS [closed] - r

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have two independent variables, both categorical. One is the seriousness of the sickness, measured in degrees. The other is the type of surgery. My dependent variable is the number of recovery days after the surgery. Because the recovery days are not normally distributed, I have to use a non-parametric method.
So I need to combine the independent variables into a new variable: 1: sickness1Surgery1, 2: sickness1Surgery2, 3: sickness2Surgery1, 4: sickness2Surgery2. This way I will be able to test it.
I have checked YouTube, but the videos are all about how to bin scale variables into categories.

If you provide more details about the structure of your data (preferably with some sample data), we could provide better-suited code. Still, the basic idea should be the same:
if sickness=1 and surgery=1 combinedVar=1.
if sickness=2 and surgery=1 combinedVar=2.
if sickness=1 and surgery=2 combinedVar=3.
if sickness=2 and surgery=2 combinedVar=4.
value labels combinedVar
1 "sickness=1, surgery=1"
2 "sickness=2, surgery=1"
3 "sickness=1, surgery=2"
4 "sickness=2, surgery=2".
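Since the question is tagged r, the same combination can be sketched in R. This is only an illustration: the data frame `dat`, its column names, and the sample values are made up, and `kruskal.test()` is just one possible non-parametric test, not something named in the question.

```r
# Hypothetical sample data; column names and values are assumptions.
dat <- data.frame(
  sickness      = c(1, 1, 2, 2, 1, 2, 1, 2),
  surgery       = c(1, 2, 1, 2, 1, 1, 2, 2),
  recovery_days = c(5, 9, 7, 14, 6, 8, 10, 12)
)

# interaction() builds one factor with a level for every
# sickness x surgery combination. By default the first factor varies
# fastest, which matches the coding order used in the SPSS syntax above.
dat$combinedVar <- interaction(dat$sickness, dat$surgery, sep = "_")

# Count of cases per combined category
print(table(dat$combinedVar))

# One possible non-parametric comparison of recovery days across the groups
kw <- kruskal.test(recovery_days ~ combinedVar, data = dat)
print(kw)
```

`as.integer(dat$combinedVar)` gives the numeric codes 1 to 4 if a plain numeric variable is preferred over a factor.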

Related

Interpretation of ACF plot [closed]

Closed 5 years ago.
I need help interpreting an ACF plot that shows a sine-wave pattern.
You may need to examine the PACF. You have a large peak at the first lag, followed by a decaying wave that alternates between positive and negative correlations, which can indicate a higher-order autoregressive term in the data.
Use the partial autocorrelation function to determine the order of the autoregressive term.
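A minimal sketch of that workflow in R, using a simulated series because the original data is not shown. The AR(2) coefficients below are arbitrary assumptions, chosen only so that the ACF shows a damped, sign-alternating wave like the one described.

```r
set.seed(1)

# Simulated AR(2) series (assumed coefficients, not from the question).
# Complex characteristic roots give a damped sine wave in the ACF.
x <- arima.sim(model = list(ar = c(1.2, -0.6)), n = 500)

# Autocorrelations: a$acf[1] is lag 0, a$acf[2] is lag 1, and so on.
a <- acf(x, lag.max = 20, plot = FALSE)

# Partial autocorrelations: p$acf[1] is lag 1, p$acf[2] is lag 2, and so on.
p <- pacf(x, lag.max = 20, plot = FALSE)

plot(a, main = "ACF: decaying wave")
plot(p, main = "PACF: look for the last clearly non-zero lag")
```

For an autoregressive process, the PACF is expected to drop to roughly zero after the lag equal to the order of the process, so the last lag with a clearly significant spike is a reasonable first guess for that order.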

Qualitative data analysis using data mining techniques [closed]

Closed 6 years ago.
I have responses from 22 companies to 22 questions/parameters, stored in a 22x22 matrix. I applied a clustering technique, which gave me several groups based on similarity.
Now I would like to find correlations between the parameters and the companies' preferences. Which technique is most suitable in R?
Normally we build a Bayesian network to find a graphical relationship between different parameters from data. As this data set is very limited, how can I build a Bayesian network for it?
Any suggestions for analyzing this data are welcome.
Try looking at feature selection and feature importance in R; it's simple.
This could get you started: http://machinelearningmastery.com/feature-selection-with-the-caret-r-package/
Some good packages: https://cran.r-project.org/web/packages/FSelector/FSelector.pdf, https://cran.r-project.org/web/packages/varSelRF/varSelRF.pdf
This is a good SE question with good answers: https://stats.stackexchange.com/questions/56092/feature-selection-packages-in-r-which-do-both-regression-and-classification
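For the correlation part of the question, a base R starting point might look like the sketch below. The matrix `resp` is randomly generated stand-in data (responses on an assumed 1 to 5 scale), and using Spearman rank correlation is an assumption made here because such responses are often ordinal; neither comes from the question.

```r
set.seed(42)

# Hypothetical 22 x 22 response matrix: rows are companies, columns are
# parameters. The values are random placeholders on an assumed 1-5 scale.
resp <- matrix(
  sample(1:5, 22 * 22, replace = TRUE),
  nrow = 22,
  dimnames = list(paste0("company", 1:22), paste0("param", 1:22))
)

# Rank correlation between parameters (columns of resp)
param_cor <- cor(resp, method = "spearman")

# Rank correlation between companies (rows of resp, so transpose first)
company_cor <- cor(t(resp), method = "spearman")

# Example: the parameters most strongly correlated with param1
print(head(sort(abs(param_cor["param1", -1]), decreasing = TRUE), 3))
```

With only 22 observations per parameter, individual correlations will be noisy, so these matrices are better read as a rough map than as firm evidence.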

Correlations, Scatter Plots and P-Value [closed]

Closed 6 years ago.
I have a set of data from a customer survey (it's about a shoe company). Two of the columns are GENDER and INCOME. I am supposed to test whether there are any significant differences in income between genders, and give the corresponding p-value.
I'm still a n00b when it comes to R; I'm still learning, and I've been struggling for 3 days now to find the functions to do this. Does anyone have any leads, or could anyone help me with it? That would be awesome.
I am editing this because I realized my other answer was not correct.
What you want is a linear model.
Say
GENDER <- factor(c(0,1,1,0,1))
INCOME <- c(20000,30000,40000,50000,550000)
Then you want
model <- lm(INCOME ~ GENDER)
and
summary(model)
anova(model)
will give you the information you are after.
Good luck,
Bryan
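The steps from the answer can be put together as a runnable sketch that also pulls out the p-value directly. The data are the toy values from the answer; the comparison with `t.test()` is an addition here, included because with two groups the linear model is equivalent to a two-sample t-test that assumes equal variances.

```r
# Toy data from the answer above
GENDER <- factor(c(0, 1, 1, 0, 1))
INCOME <- c(20000, 30000, 40000, 50000, 550000)

# Linear model of income on gender
model <- lm(INCOME ~ GENDER)
print(summary(model))

# p-value for the gender effect, taken from the coefficient table.
# The row is named "GENDER1" because the factor levels are "0" and "1".
p_lm <- summary(model)$coefficients["GENDER1", "Pr(>|t|)"]

# The same p-value from a pooled-variance two-sample t-test
p_t <- t.test(INCOME ~ GENDER, var.equal = TRUE)$p.value

print(c(lm = p_lm, t_test = p_t))
```

If equal variances are not a safe assumption, dropping `var.equal = TRUE` gives Welch's t-test instead, which will generally report a different p-value than the linear model.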

Machine learning - Calculating the importance of a "value" in a variable [closed]

Closed 7 years ago.
I’m analyzing a medical dataset containing 15 variables and 1.5 million data points. I would like to predict hospitalization and, more importantly, which type of medication may be responsible. The medicine variable has around 700 types of drugs. Does anyone know how to calculate the importance of a "value" (a type of drug in this case) within a variable for boosting? I need to know whether ‘drug A’ is better for prediction than ‘drug B’, both being values of a variable called ‘medicine’.
The logistic regression model is able to give such information in terms of p-values for each drug, but I would like to use a more complex method. Of course you can create a binary variable for each type of drug, but this gives 700 extra variables and does not seem to work very well. I’m currently using R. I really hope you can help me solve this problem. Thanks in advance! Kind regards, Peter
See varImp() in the caret library, which supports all the ML algorithms you referenced.
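The binary-variable encoding mentioned in the question can be sketched in base R with `model.matrix()`. The tiny data frame and the drug names are made up for illustration; the point is only that each drug becomes its own column, so any per-column importance measure then refers to an individual drug.

```r
# Hypothetical miniature version of the data; names and values are assumptions.
dat <- data.frame(
  hospitalized = c(1, 0, 0, 1, 0, 1),
  medicine     = factor(c("drugA", "drugB", "drugC", "drugA", "drugB", "drugC"))
)

# One indicator column per drug. The "- 1" drops the intercept so that
# every level of the factor gets its own 0/1 column.
drug_dummies <- model.matrix(~ medicine - 1, data = dat)

print(colnames(drug_dummies))
print(drug_dummies)
```

With around 700 drugs and 1.5 million rows, a dense matrix like this can get large, so a sparse representation may be worth considering.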

When Teaching R, how to avoid the possible confusion with the term ''variable''? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
In the official R docs, the term ''variable'' is used to describe two distinct things:
1. The name we give to any type of object with the <- operator or with assign().
For instance, we could say that in a <- data.frame(0), a is a variable, i.e. a symbol bound to that particular data frame.
2. A vector or a factor, whether or not it belongs to a structure like a matrix or a data frame, containing units of data which, we assume, can take any of several or many values.
In this case it's akin to the statistical sense of the term, as in ''random variable''.
So my question is the following:
How do I help students understand the difference between programmatic and statistical usage of the term variable when teaching R?
(Thanks and credit to @Gregor, who formulated it better than I would have.)
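A small example that puts the two usages side by side may help when presenting the distinction. The object and column names below are invented for illustration.

```r
# Programmatic sense: `heights` is a name (symbol) bound to an object.
heights <- data.frame(height_cm = c(170, 182, 165))

# assign() creates the same kind of binding from a character string.
assign("weights_kg", c(60, 80, 55))

# Both names now exist in the current environment.
print(exists("heights"))
print(exists("weights_kg"))

# Statistical sense: `height_cm` is a variable of the data set, i.e. a
# column whose entries are observed values of some quantity.
print(heights$height_cm)
print(mean(heights$height_cm))

# The same data can be bound to a different name without changing
# the statistical variable it contains.
people <- heights
print(identical(people$height_cm, heights$height_cm))
```

Rebinding the data frame to a new name changes the programmatic variable but leaves the statistical variable `height_cm` untouched, which is one way to show that the two notions are independent.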
