Classifying anger, disgust, fear, joy, sadness, surprise using RTextTools - r

I have been trying to do sentiment analysis of tweets. I am trying to classify those tweets into anger, disgust, fear, joy, sadness, and surprise, which is generally done with RTextTools, but I can't figure out how to do it. It would be helpful if anyone could help.
Any way of doing it would help. I am not trying to achieve positive or negative categorization, which I have already done successfully.
Similar categorization can be done with the sentiment R package, but only the Bayes algorithm can be used there. It would also be fine if I could apply other algorithms in classify_emotion() from the sentiment package.

You should check out the caret package (http://topepo.github.io/caret/index.html). What you are trying to do is two different classifications (one multi-class and one two-class problem). Represent each document as a term-frequency vector and run a classification algorithm of your choice. SVMs usually work well with bag-of-words approaches.
You would need some training data, of course, but there are data sets available: https://www.crowdflower.com/data-for-everyone/
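As a minimal sketch of that pipeline, assuming a data frame tweets with a text column and an emotion factor (hypothetical names, not from the question):
library(tm)
library(caret)

# Build a term-frequency document-term matrix from the tweet text.
corpus <- VCorpus(VectorSource(tweets$text))
dtm <- DocumentTermMatrix(corpus,
                          control = list(removePunctuation = TRUE,
                                         stopwords = TRUE,
                                         weighting = weightTf))
x <- as.data.frame(as.matrix(dtm))
colnames(x) <- make.names(colnames(x))  # ensure syntactic column names
y <- factor(tweets$emotion)             # six emotion classes

# Linear SVM with 5-fold cross-validation; swap in any caret method.
fit <- train(x = x, y = y,
             method = "svmLinear",
             trControl = trainControl(method = "cv", number = 5))
fit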

Related

Specification of a mixed model using glmmLasso package

I have a dataset containing repeated measures and quite a lot of variables per observation. Therefore, I need a smart way to select explanatory variables, and regularized regression methods sound like a good way to address this problem.
While looking for a solution, I found out about the glmmLasso package quite recently. However, I have difficulties defining a model. I found a demo file online, but since I'm a beginner with R, I had a hard time understanding it.
(demo: https://rdrr.io/cran/glmmLasso/src/demo/glmmLasso-soccer.r)
Since I cannot share the original data, I suggest using the soccer dataset (the same dataset used in the glmmLasso demo file). The variable team is repeated across observations and should be treated as a random effect.
# sample data
library(glmmLasso)
data("soccer")
I would appreciate it if you could explain the parameters lambda and family, and how to tune them.
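A minimal sketch along the lines of the linked demo (variable names come from the soccer dataset; the lambda grid is an assumption): lambda is the strength of the lasso penalty on the fixed effects, and family specifies the error distribution and link, just as in glm().
library(glmmLasso)
data("soccer")

# Standardize the continuous covariates, as the demo does.
soccer[, c(4, 5, 9:16)] <- scale(soccer[, c(4, 5, 9:16)],
                                 center = TRUE, scale = TRUE)

# Penalized fixed effects; team as a random intercept.
# family = gaussian() is the default; use e.g. poisson(link = "log")
# for count responses.
fit <- glmmLasso(points ~ transfer.spendings + ave.unfair.score +
                   ball.possession + tackles + ave.attend + sold.out,
                 rnd = list(team = ~1),
                 lambda = 50,
                 family = gaussian(link = "identity"),
                 data = soccer)
summary(fit)

# One common way to tune lambda: fit over a grid and keep the smallest BIC.
lambdas <- seq(500, 5, by = -5)
bics <- sapply(lambdas, function(l)
  glmmLasso(points ~ transfer.spendings + ave.unfair.score +
              ball.possession + tackles + ave.attend + sold.out,
            rnd = list(team = ~1), lambda = l, data = soccer)$bic)
best_lambda <- lambdas[which.min(bics)]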

Decisional boundary SVM caret (R)

I have built an SVM-RBF model in R using caret. Is there a way to plot the decision boundary?
I know it is possible to do so with other R packages, but unfortunately I'm forced to use caret, because it is the only package I have found that lets me calculate variable importance.
Alternatively, can you suggest a package that plots decision boundaries AND also gives variable importance?
Thank you very much.
First of all, unlike some other methods, SVM does not itself produce feature importance. In your case, the importance score caret reports is calculated independently of the method: https://topepo.github.io/caret/variable-importance.html#model-independent-metrics
Second, the decision boundary (or hyperplane) you see in most textbook examples comes from a toy problem with only two or three features. If you have more than three features, it is not trivial to visualize this hyperplane.
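If you do have just two features, one option is to predict over a grid and draw the boundary as a contour. A minimal sketch with a caret RBF-SVM on two iris columns (the dataset and grid resolution are assumptions for illustration):
library(caret)

# Two-class, two-feature toy problem.
data(iris)
d <- iris[iris$Species != "setosa",
          c("Sepal.Length", "Sepal.Width", "Species")]
d$Species <- droplevels(d$Species)

fit <- train(Species ~ ., data = d, method = "svmRadial",
             trControl = trainControl(method = "cv", number = 5))

# Predict on a fine grid covering the feature space.
grid <- expand.grid(
  Sepal.Length = seq(min(d$Sepal.Length), max(d$Sepal.Length),
                     length.out = 200),
  Sepal.Width  = seq(min(d$Sepal.Width), max(d$Sepal.Width),
                     length.out = 200))
grid$pred <- predict(fit, newdata = grid)

# Points plus the predicted class boundary as a contour line.
plot(d$Sepal.Length, d$Sepal.Width, col = d$Species, pch = 19,
     xlab = "Sepal.Length", ylab = "Sepal.Width")
contour(unique(grid$Sepal.Length), unique(grid$Sepal.Width),
        matrix(as.numeric(grid$pred), nrow = 200),
        levels = 1.5, add = TRUE, drawlabels = FALSE)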

Unsupervised discretization to convert continuous into categorical for frequent item set mining

I am using the arules package to mine frequent itemsets in my big data, but I cannot find suitable methods for discretization.
As in the arules examples, several basic unsupervised methods can be used in the discretize() function, but I want to estimate the optimal number of categories in my large dataset; that seems more reasonable than assigning the number of categories myself.
Can you give me advice on this? Thanks.
@Michael Hahsler
I think there is little guidance on this for unsupervised discretization. Look at the histogram for each variable and decide manually. For k-means you could potentially use strategies to find k via internal validation techniques (e.g., the elbow method). For supervised discretization there exist methods that will help you decide. Maybe someone else can help here.
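A minimal sketch of the elbow heuristic for a single numeric vector, followed by k-means discretization with arules (the toy data and the choice k = 3 are assumptions; recent arules versions take a breaks argument):
library(arules)

set.seed(42)
x <- c(rnorm(100, 0), rnorm(100, 5), rnorm(100, 10))  # toy data

# Total within-cluster sum of squares for k = 1..10; look for the bend.
wss <- sapply(1:10, function(k)
  kmeans(x, centers = k, nstart = 10)$tot.withinss)
plot(1:10, wss, type = "b", xlab = "k",
     ylab = "Total within-cluster SS")

# Suppose the elbow suggests k = 3:
x_cat <- discretize(x, method = "cluster", breaks = 3)
table(x_cat)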

Multivariate Classification Trees in R

I'm looking for advice on creating classification trees where each split is based on multiple variables. A bit of background: I'm helping design a vegetation classification system, and we're hoping to use a classification and regression tree algorithm both to classify new vegetation data and to create (or at least help create) visual keys that can be used in publications. The data I'm using is laid out as community data, with tree species as columns, observations as rows, and the first column a factor with classes. I'll also add that I'm very new to this type of analysis, and while I've tried to read about it as much as possible, it's quite likely that I've missed some simple but important aspects. My apologies.
Now the problem: R has excellent packages and great documentation for classification with univariate splits (e.g. rpart, partykit, C5.0). However, I would ideally like to create classification trees where each split is based on multiple criteria - so instead of each split having one decision (e.g. "Percent cover of Species A > 6.67"), it would have several ("Percent cover of Species A > 6.67 AND Percent cover of Species B < 4.2"). I've had a lot of trouble finding packages capable of doing multivariate splits and creating trees. This answer: https://stats.stackexchange.com/questions/4356/does-rpart-use-multivariate-splits-by-default has been very useful, and I've tried all the packages suggested there for multivariate splitting. prim does do multivariate splits but doesn't seem to make trees; the partDSA package seems close to what I'm looking for, but it also only creates trees with one criterion per split; the optpart package also doesn't seem to be able to make classification trees. If anyone has advice on how I could go about making a classification tree based on a multivariate partitioning method, that would be much appreciated.
Also, this is my first question, and I am very open to suggestions about how to ask questions. I didn't feel that providing a full example would be helpful in this case, but a minimal sketch of the data layout is given below, and I can easily add more if necessary.
Many Thanks!
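For reference, a minimal sketch of the community-data layout described above and the univariate baseline (rpart) that the question wants to generalize; all names and values are hypothetical:
library(rpart)

set.seed(1)
cover_a <- runif(60, 0, 100)  # percent cover per species
cover_b <- runif(60, 0, 100)
veg <- data.frame(
  class = factor(ifelse(cover_a > 50, "TypeA",
                 ifelse(cover_b > 50, "TypeB", "TypeC"))),
  SpeciesA = cover_a,
  SpeciesB = cover_b)

# Standard CART: each split tests a single variable only.
fit <- rpart(class ~ ., data = veg, method = "class")
plot(fit); text(fit, use.n = TRUE)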

Analysis of complex survey design with multiple plausible values

I am working with several large databases (e.g. PISA and NAEP) that use a complex survey design with replicate weights and multiple plausible values. I can address the former using the survey package. However, does there exist an R package/function to analyze the latter?
For reference, I have found this article to provide a good overview of the issue: http://www.ierinstitute.org/fileadmin/Documents/IERI_Monograph/IERI_Monograph_Volume_02_Chapter_01.pdf
I'm not sure how the general idea of 'plausible values' differs from using multiple imputation to generate several sets of imputed values (as the Amelia package does, for example). But Thomas Lumley's mitools package can be used to combine the various sets of imputed values, and it may well be able to combine your sets of plausible values to obtain the 'correct' standard errors of the estimates.
Daniel Caro developed an R package for large-scale assessments. You can find it here: http://cran.r-project.org/web/packages/intsvy/index.html
Here is a code example using the regression command over the plausible values in Mathematics:
library(intsvy)

# Table I.2.3a, p. 305, International Report 2012
pisa.reg.pv(pvlabel = "MATH", x = "ST04Q01", by = "IDCNTRYL", data = pisa)
However, I'm not sure whether this package can be used to analyze NAEP data.
I hope this fulfills your purposes, at least partially.
As of survey version 3.36, there's withPV:
library(survey)
library(mitools)

data(pisamaths, package = "mitools")

# Two-stage design: schools, then students within schools.
des <- svydesign(id = ~SCHOOLID + STIDSTD, strata = ~STRATUM, nest = TRUE,
                 weights = ~W_FSCHWT + condwt, data = pisamaths)
options(survey.lonely.psu = "remove")

# Fit the model once per plausible value; combine with Rubin's rules.
results <- withPV(list(maths ~ PV1MATH + PV2MATH + PV3MATH + PV4MATH + PV5MATH),
                  data = des,
                  action = quote(svyglm(maths ~ ST04Q01*(PCGIRLS + SMRATIO) +
                                          MATHEFF + OPENPS, design = des)))
summary(MIcombine(results))
