Polynomial Regression in Azure Machine Learning Studio - azure-machine-learning-studio

I have data that seems to fit a polynomial regression much better than a linear regression, but Azure Machine Learning Studio doesn't have native support for polynomial regressions. Is there a way to transform data and produce a web service capable of predicting values that fit this type of data?
I found a nice article here that talks about how to train a model to fit a polynomial regression even though Azure ML currently supports only linear regression natively. The TL;DR of the article calls for adding one or more columns to the data with the square, cube, etc. of the label column in order to improve the accuracy of the model. That approach improved my model's coefficient of determination to 0.973346. However, when I create the "predictive experiment" and the web service to predict new values and ask the service to predict a new value, the accuracy is horrendous, not even in the ballpark.
How can I train a predictive model on data that fits a polynomial regression using Azure Machine Learning Studio?
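The feature-expansion trick the article describes can be sketched in plain Python, outside the Studio modules. This is a hypothetical, minimal illustration using numpy (in its standard form the trick expands powers of the input feature). The point relevant to the web-service problem: whatever power columns are added at training time must be recomputed identically for every new input at prediction time; if the predictive experiment scores raw inputs without the added columns, predictions will be wildly off.

```python
import numpy as np

# Toy data following a quadratic trend (hypothetical example).
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 2.0 * x**2 - 1.5 * x + 0.5 + rng.normal(scale=0.1, size=x.size)

def expand(x, degree=2):
    """Add power columns (intercept, x, x^2, ...) for a polynomial fit."""
    return np.column_stack([x**d for d in range(degree + 1)])

# Train: ordinary linear regression on the expanded features.
X = expand(x)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Score: the SAME expansion must be applied to new inputs,
# otherwise predictions will not be remotely in the ballpark.
x_new = np.array([1.5])
y_pred = expand(x_new) @ coef
```

In Azure ML terms: the column-adding step has to be part of the predictive experiment's scoring path, not only the training experiment.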

Related

What models currently support multivariate regression in tidymodels?

I was checking tidymodels for multivariate regression and saw this example here:
https://www.tidymodels.org/learn/models/pls/
This covers multivariate regression for the Partial Least Squares model.
Is there a page that states what models currently support multivariate regression?
I believe the current models that support multivariate (more than one outcome) regression are:
single layer neural network: mlp()
multivariate adaptive regression splines: mars()
good old linear regression: linear_reg()
This list was made by looking for which models use the maybe_multivariate() internal helper, but we should document this better somehow.

Extract feature weights of a linear regression in Azure Machine Learning Studio

Currently, we can only view the feature weights (or coefficient estimates) of a trained linear regression through the 'Visualize' option; it is not possible to save them as a table or dataset.
I am experimenting on a market-mix model to understand the incremental sales lift by each media variable, so I need to save the regression estimates.
Is there any workaround for this other than using the 'Execute R Script' module?

sLDA for predicting categorical response instead of continuous in R

I have a collection of documents that might have latent topics associated with them. It is likely that each document relates to one or more topics. I have a master file of all possible "topics"/categories and descriptions of these topics. I am seeking to create a model that predicts the topics for each document.
I could potentially use Supervised text classification using RTextTools, but that would only help me categorize documents to belong to one category or another. I am seeking to find a solution that would not only help me determine the topic proportions to the document, but also give the term-topic/category distributions.
sLDA seems like a good fit, but it seems to only predict continuous variable outcomes instead of categorical.
LDA is a classification method, predicting classes. An alternative is multinomial logistic regression. LDA can be harder to train than multinomial logistic regression, for a possibly small improvement in fit.
update: LDA is a classification method where, unlike logistic regression, which directly models Pr(Y = k|X = x) using the logit link, LDA uses Bayes' theorem for prediction. It is generally more popular than logistic regression (and its extension to multi-class prediction, namely multinomial logistic regression) for multi-class problems.
LDA assumes that the observations in each class are drawn from a Gaussian distribution with a common covariance matrix, and so it can provide some improvement over logistic regression when this assumption approximately holds. In contrast, logistic regression can outperform LDA when these Gaussian assumptions do not hold. To sum up: while both are appropriate for building linear classification models, linear discriminant analysis makes more assumptions about the underlying data than logistic regression, which makes logistic regression the more flexible and robust method when those assumptions do not hold. So what I meant was: it is important to understand your data well and see which method might fit it better. There are good sources you can read comparing classification methods:
http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf
I suggest An Introduction to Statistical Learning, the chapter on classification. Hope this helps.
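To make the comparison above concrete, here is a small scikit-learn sketch (my own Python illustration, not from the answer, which is framed around R/ISLR) on synthetic data where LDA's common-covariance Gaussian assumption holds exactly, so both methods should do well:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class data drawn from Gaussians with a COMMON covariance,
# i.e. exactly the setting where LDA's assumptions hold.
rng = np.random.default_rng(42)
means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.vstack([rng.normal(m, 1.0, size=(200, 2)) for m in means])
y = np.repeat([0, 1, 2], 200)

lda = LinearDiscriminantAnalysis().fit(X, y)
logit = LogisticRegression(max_iter=1000).fit(X, y)  # multinomial for multi-class

lda_acc = lda.score(X, y)
logit_acc = logit.score(X, y)
print(lda_acc, logit_acc)
```

When the Gaussian assumption is violated (heavy tails, very different class covariances), the gap tends to shift in logistic regression's favor, which is the trade-off described above.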

identifying key columns/features used by decision tree regression

In Azure ML, I have a predictive regression model using boosted decision tree regression and it is reasonably accurate.
The input dataset has over 450 columns and the model has done a good job of predicting against test data sets, without over-fitting.
To report on the result I need to know which features/columns the model mainly used to make predictions, but I can't find this information easily when looking at the trained model data.
How do I identify this information? I'm happy to import the result dataset into R to help find this, but I just need pointers on what direction to start working in.
In Microsoft Azure Machine Learning, the features that matter most for predictions can usually be found in the output of the Train Model module.
When using a decision tree algorithm, however, the output of the Train Model module is the set of constructed trees themselves, which does not directly show feature importance.
To identify the features that impact predictions when using decision tree algorithms, you can use the Permutation Feature Importance module.
The parameters of Permutation Feature Importance are Random Seed and Metric for Measuring Performance (in this case, Regression - Coefficient of Determination)
The left input of Permutation Feature Importance is your trained model, and the right input is your test data.
The output of Permutation Feature Importance is a list of features ranked by their scores.
You can add an Execute R Script module to extract the features and scores from the Permutation Feature Importance module.
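If you want to understand (or reproduce) what the module computes, the idea behind permutation feature importance is simple to sketch by hand. The following hypothetical Python example (illustrative names and data, not the module's API) shuffles one feature column at a time and measures the drop in the coefficient of determination:

```python
import numpy as np

def r2(y, yhat):
    """Coefficient of determination."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    base = r2(y, predict(X))
    scores = []
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # break the link between feature j and y
            drops.append(base - r2(y, predict(Xp)))
        scores.append(np.mean(drops))
    return np.array(scores)

# Toy model: y depends strongly on feature 0, not at all on feature 1.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=500)
coef = np.linalg.lstsq(np.column_stack([X, np.ones(500)]), y, rcond=None)[0]
predict = lambda X_: X_ @ coef[:2] + coef[2]

imp = permutation_importance(predict, X, y)
```

A large positive score means the model relied heavily on that column; scores near zero mean the column barely mattered, which is exactly how to read the module's ranked output.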

How does Azure ML handle categorical columns during training a linear or logistic regression model?

How does Azure ML handle categorical columns when training a linear regression model? A linear regression model takes continuous values. However, even though I haven't transformed the categorical columns in any way, Azure ML trains linear and logistic regression models without error. So I would like to know how Azure ML processes categorical columns behind the scenes. Thanks!
It depends on the model you are using, but you can get clues about how it's done by right-clicking the "Train Model" element in your experiment, then clicking "Trained Model" -> "Visualize". The visualization will show you how the supplied data was used.
The linear regression module will only take numeric independent variables. Are you sure you had this working with categoricals in the linear regression?
https://msdn.microsoft.com/en-us/library/azure/dn905978.aspx
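The thread doesn't pin down exactly what Azure ML does behind the scenes, but the standard technique for feeding categoricals to a linear or logistic regression is indicator (one-hot) encoding, which you can also apply yourself before training. A minimal pandas sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical dataset with one numeric and one categorical column.
df = pd.DataFrame({
    "price": [10.0, 12.5, 9.0, 11.0],
    "city":  ["NYC", "LA", "NYC", "SEA"],
})

# Indicator (one-hot) encoding turns each category into a 0/1 column,
# which a linear or logistic regression can consume directly.
encoded = pd.get_dummies(df, columns=["city"])
print(encoded.columns.tolist())
# → ['price', 'city_LA', 'city_NYC', 'city_SEA']
```

Encoding explicitly like this also makes the resulting coefficient estimates interpretable per category, rather than depending on whatever the module does implicitly.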
