I'm working with a large data set of repeated patients over multiple months, with ordered outcomes on a severity scale from 1 to 5. I was able to analyze the first set of patients using the polr function to run a basic ordinal logistic regression model, but now I want to analyze the association across all the time points using a longitudinal ordinal logistic model. I can't find any clear documentation online or on this site explaining which package to use and how to use it. I am also an R novice, so any simple explanations would be incredibly useful. Based on some initial searching, it seems like the mixor function might be what I need, though I am not sure how it works. I found it here:
https://cran.r-project.org/web/packages/mixor/vignettes/mixor.pdf
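From the vignette, I think the call would look something like the sketch below, though I have not been able to test it. The data here are simulated stand-ins for my real columns (`severity`, `month`, and `id` are made-up names):

```r
# My attempt at the syntax from the vignette -- not tested.
# 'severity' is the 1-5 ordinal outcome, 'month' the time point,
# 'id' identifies the repeated patients (all made-up stand-ins).
library(mixor)

set.seed(1)
dat <- data.frame(id = rep(1:50, each = 6), month = rep(1:6, times = 50))
dat$severity <- with(dat, pmin(5, pmax(1, round(2 + 0.3 * month + rnorm(300)))))

# The vignette requires the data to be sorted by the clustering variable
dat <- dat[order(dat$id), ]

# Random-intercept ordinal logistic regression (proportional odds)
fit <- mixor(severity ~ month, data = dat, id = id, link = "logit")
summary(fit)
```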
I would appreciate a simple explanation of how to use this function if it is the right one, or would happily take any alternative suggestions with an explanation.
Thank you in advance for your help!
My question is about how to implement the nested Dirichlet process (NDP) with R code.
The NDP is suitable for clustering over distributions and simultaneously clustering within a distribution. Rodriguez et al. (2008) provided a simulation example to demonstrate the ability of the NDP to distinguish different distributions. I am trying to learn this approach by reproducing the results for this example, but I have failed because I do not understand how the base distribution is related to the mixture components.
The simulation example used a normal inverse-gamma distribution, NIG(0, 0.01, 3, 1), as the base distribution. But the four different distributions are:
The algorithm provided in Section 4 (Rodriguez et al., 2008, p. 1135) was used to do the simulation. I have trouble understanding and executing this algorithm, especially step 5:
Can you please provide sample code demonstrating this algorithm? Your help is highly appreciated!
I have not been able to do the coding myself, but I have found a recent paper that does the simulation using exact inference instead of a truncation approximation. I think it might help someone else who is interested, just like me, so I am posting the link to that paper here.
enter link description here
What I like about this paper is that it is well written and has source code (in R) that helped me understand this methodology better.
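Separately, here is the closest I got on my own: a minimal base-R sketch of a truncated stick-breaking draw from a single DP with the NIG(0, 0.01, 3, 1) base measure, under a common NIG parametrization. This is only the building block that the NDP nests, not the full algorithm of Section 4:

```r
# Truncated stick-breaking representation of a Dirichlet process:
# G = sum_k w_k * delta_{theta_k}, theta_k ~ G0, here G0 = NIG(0, 0.01, 3, 1)
# under the parametrization sigma2 ~ Inv-Gamma(a, b), mu | sigma2 ~ N(m, sigma2/lambda).
# This is a simplified single-DP sketch, not the full nested algorithm.
set.seed(1)
K     <- 50   # truncation level
alpha <- 1    # DP concentration parameter

# Stick-breaking weights: v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k}(1 - v_j)
v <- rbeta(K, 1, alpha)
w <- v * cumprod(c(1, 1 - v[-K]))

# Atoms from the NIG(m = 0, lambda = 0.01, a = 3, b = 1) base measure
sigma2 <- 1 / rgamma(K, shape = 3, rate = 1)
mu     <- rnorm(K, mean = 0, sd = sqrt(sigma2 / 0.01))

# Draw observations from the resulting normal mixture
n <- 1000
k <- sample.int(K, n, replace = TRUE, prob = w)
y <- rnorm(n, mean = mu[k], sd = sqrt(sigma2[k]))
```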
I asked the following question over on Cross Validated (https://stats.stackexchange.com/questions/272657/determining-the-direction-of-a-significant-spearmans-rho-correlation), and someone pointed me to this site since I am using SPSS. If anyone has any advice, it would be much appreciated.
I have conducted Spearman's rho tests with two ordinal variables (one with 4 possible answers and the other with 6) and obtained a statistically significant correlation between the two. My question is: how can I determine, graphically or some other way, which answers of each variable go together? A scatterplot would not work with my data, since it is not scale (continuous).
A fluctuation plot is often a good way to look at the distribution of pairs of categorical variables. There is a custom dialog available for this if you don't want to figure out the GPL code. It is available from the Community site, but if you can't find it, send me an email (jkpeck@gmail.com) and I'll send it to you.
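If you also have R available, the same idea takes only a few lines of base R: cross-tabulate the two variables and draw one point per cell, sized by the cell count. The data below are simulated placeholders for your two ordinal variables:

```r
# Fluctuation-style plot in base R: cross-tabulate the two ordinal
# variables and draw one point per cell, sized by the cell count.
# 'v1' (4 categories) and 'v2' (6 categories) are simulated placeholders.
set.seed(42)
v1 <- sample(1:4, 200, replace = TRUE)
v2 <- pmin(6, pmax(1, v1 + sample(-1:2, 200, replace = TRUE)))

tab  <- table(v1, v2)
grid <- expand.grid(v1 = as.numeric(rownames(tab)),
                    v2 = as.numeric(colnames(tab)))
plot(grid$v1, grid$v2, cex = sqrt(as.vector(tab)), pch = 16,
     xlab = "Variable 1 (4 answers)", ylab = "Variable 2 (6 answers)",
     main = "Cell counts shown as point sizes")
```

Cells where the counts bunch up show which answer pairs go together, which is exactly the information a scatterplot would hide with ordinal data.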
I have data on a lot of students who were selected by colleges based on their marks. I am new to machine learning. Can I have some suggestions on how I can use Azure Machine Learning to predict which colleges students can get into based on their marks?
Try a multi-class logistic regression - also look at this https://gallery.cortanaanalytics.com/Experiment/da44bcd5dc2d4e059ebbaf94527d3d5b?fromlegacydomain=1
Apart from logistic regression, as @neerajkh suggested, I would also try
one-vs-all classifiers. This method tends to work very well in multiclass problems (I assume you have many inputs, which are the marks of the students, and many outputs, the different colleges).
To implement the one-vs-all approach I would use Support Vector Machines (SVMs). They are among the most powerful algorithms (until deep learning came onto the scene, but you don't need deep learning here).
If you would consider changing frameworks, I would suggest using Python libraries. In Python it is very straightforward, and very fast, to solve the problem you are facing.
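To make the one-vs-all idea concrete without committing to a framework, here is a small sketch in R that fits one binary logistic regression per college ("this college" vs. "everything else") and picks the highest-scoring one. The data are simulated stand-ins:

```r
# One-vs-all multiclass classification with binary logistic regressions.
# Simulated stand-in data: two marks columns, three colleges.
set.seed(1)
n <- 300
marks1 <- runif(n, 0, 100)
marks2 <- runif(n, 0, 100)
college <- cut(marks1 + marks2, breaks = c(-Inf, 80, 140, Inf),
               labels = c("C1", "C2", "C3"))
dat <- data.frame(marks1, marks2, college)

classes <- levels(dat$college)
# One binary model per class: "this college" vs. everything else
models <- lapply(classes, function(cl)
  glm(I(college == cl) ~ marks1 + marks2, data = dat, family = binomial))

# Score every class for every student, then take the argmax
scores <- sapply(models, function(m) predict(m, dat, type = "response"))
pred <- classes[max.col(scores)]
mean(pred == dat$college)  # training accuracy of this toy sketch
```

An SVM (or any other binary classifier) can be dropped into the same loop in place of glm; the one-vs-all scheme itself does not change.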
Use random forests and feed that estimator to OneVsRestClassifier, which is a multi-class classifier.
Keeping in line with other posters' suggestions of using multi-class classification, you could use artificial neural networks (ANNs)/multilayer perceptrons to do this. Each output node could be a college, and because you would be using a sigmoid (logistic) transfer function, the output of each node could be viewed directly as the probability of that college accepting a particular student (when making predictions).
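As a rough sketch of this idea, the nnet package (shipped with standard R installations) fits a single-hidden-layer network; with a factor response of three or more levels it uses a softmax output layer, so each output can be read as a per-college probability. The data here are simulated stand-ins:

```r
# Single-hidden-layer neural network via the nnet package (ships with R).
# Simulated stand-in data: marks predicting one of three colleges.
library(nnet)
set.seed(1)
marks <- runif(300, 0, 100)
college <- factor(ifelse(marks < 40, "C1", ifelse(marks < 70, "C2", "C3")))

fit <- nnet(college ~ marks, data = data.frame(marks, college),
            size = 5, maxit = 500, trace = FALSE)

# One probability column per college, rows summing to 1
probs <- predict(fit, data.frame(marks = c(30, 55, 90)), type = "raw")
```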
Why don't you try softmax regression?
In extremely simple terms, softmax takes an input and produces a probability distribution over your classes. In other words, based on some input (grades in this case), your model outputs the probability distribution representing the "chance" a given student has of being accepted by each college.
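For instance, in R softmax regression is available as multinomial logistic regression via nnet::multinom; here is a sketch with made-up data:

```r
# Softmax (multinomial logistic) regression via nnet::multinom.
# Simulated stand-in data: a single grade predicting one of three colleges.
library(nnet)
set.seed(2)
grade <- runif(300, 0, 100)
college <- factor(ifelse(grade < 40, "A", ifelse(grade < 70, "B", "C")))

fit <- multinom(college ~ grade, trace = FALSE)

# Each row is a probability distribution over the three colleges
p <- predict(fit, data.frame(grade = c(30, 55, 90)), type = "probs")
```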
I know this is an old thread but I will go ahead and add my 2 cents too.
I would recommend a multi-class, multi-label classifier. This allows you to find more than one college for a student. Of course, this is much easier to do with an ANN, but an ANN is much harder to configure (the configuration of the network: the number of nodes/hidden nodes, or even the activation function, for that matter).
The easiest way to do this, as @Hoap Humanoid suggests, is to use a Support Vector Classifier.
For any of these methods, it is a given that you need a suitably diverse data set. I can't say how many data points you need; you have to experiment with that. But the accuracy of the model depends on the number of data points and their diversity.
This is very subjective. Just applying any algorithm that classifies into categories won't be a good idea. Without performing exploratory data analysis and checking the following things (apart from handling missing values), you can't be sure your predictive analytics are sound:
Quantitative and qualitative variables.
Univariate, bivariate, and multivariate distributions.
Each variable's relationship to your response (college) variable.
Outliers (univariate and multivariate).
Required variable transformations.
Whether the Y variable can be broken down into chunks, for example by location: say, whether a candidate could attend colleges in California or in New York. If California is the more likely, then which college there? In this way you could capture linear and non-linear relationships.
For base learners you can fit a softmax regression model or one-vs-all logistic regression (it does not really matter much which), plus CART for non-linear relationships. I would also run k-NN and k-means to check for different groups within the data and then decide on the predictive learners.
I hope this makes sense!
The least-squares support vector machine (LS-SVM) is a powerful algorithm for this application. Visit http://www.esat.kuleuven.be/sista/lssvmlab/ for more information.
If one uses obj <- coxph(... + frailty(id)), then the object also returns (log) frailty estimates for each individual, which can be extracted with obj$frail.
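For concreteness, here is a reproducible version of such a fit, using the kidney data that ships with the survival package:

```r
# Reproducible version of the fit in question, using the kidney data
# that ships with the survival package.
library(survival)
obj <- coxph(Surv(time, status) ~ age + sex + frailty(id), data = kidney)

head(obj$frail)  # per-subject (log) frailty estimates
```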
Does anybody know how these estimates are obtained? Are they empirical Bayes estimates?
Thanks!
Theodor
The default distribution for frailty can be seen in the ?frailty page to be "gamma". If you look at the frailty function (which is not hidden) you see that it simply pastes the name of the distribution onto "frailty." and uses get() to retrieve the proper function. So look at frailty.gamma (also not hidden) to find the answers to your question. Looking back at the help page again, you can see that I should have been able to figure all that out without looking at the code, since it's right up at the top of the page. But there are many routes to knowledge with R. (They are ML, not "empirical Bayes", estimates.)
The help page suggests to me that the author (Therneau) expects you to consult Therneau and Grambsch for further details not obvious from reading the code. If you are doing serious work with survival models in R that is a very useful book to have. It's very clear and helpful in understanding the underpinnings of the 'survival'-package.