I have a classification task that I'm seeking to do via Qualtrics. Essentially, each participant sees a randomly generated piece of embedded data (from a total of 400 embedded data fields) and answers 5 questions about this piece of data. Each participant does this 10 times (i.e. they classify 10 pieces of data), and each classification task consists of the same five multiple-choice questions. Is there any way to aggregate the answers to each question and see the aggregate results? For instance, the first question in each task is "was this written before or after purchase?", and I want to see the 10 answers to it together for each participant, and also aggregated across participants (right now Qualtrics aggregates them only across participants, but not within).
Thanks!
You can use Qualtrics scoring functionality to aggregate within participants: https://www.qualtrics.com/support/survey-platform/survey-module/survey-tools/response-management-tools/scoring/
I am analyzing a dataset of open-ended questions from a questionnaire. I now have a list of frequent words/categories; I've done some editing so that the essence of each answer stays the same while the words/categories become more frequent.
What it comes down to is that I now have a list of the most frequent words and categories, which I simply want to plot in a word cloud based on their frequency.
What I am running into is that some word cloud generators do their own separate step of analyzing the text, which simply isn't what I need. I just need my list of words to be plotted based on their frequency.
How do I go about doing this in R?
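If the counts are already tallied, the wordcloud package in R can plot them directly from word/frequency pairs without re-analysing any text. A minimal sketch (the example words and counts below are made up):

library(wordcloud)
library(RColorBrewer)

# your already-counted words and their frequencies (placeholder values)
words <- c("price", "quality", "service", "delivery", "taste")
freqs <- c(40, 30, 22, 15, 8)

set.seed(123)  # reproducible layout
wordcloud(words = words, freq = freqs, min.freq = 1,
          colors = brewer.pal(8, "Dark2"))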
I am not quite sure if this is the right place to post this question.
I have to do a logistic regression using R. The programming part should not be an issue, as there are enough tutorials and similar questions on these forums already.
My question is more about how to get the data into a usable form for this model.
To specify: the survey is about a tax on a specific consumer good, specifically about the change in consumers' purchasing behaviour. Participants were randomly assigned to one of two categories, one with the tax and the other without. Additionally, there were two different situations in which people were asked about their preferences. To sum up: Group A was taxed on the good in both situations, and Group B was not taxed in either situation.
The results are now in a CSV file. The problem, however, is that each of those subgroups got its own column. In this form the data can't be evaluated well; the columns should all be merged into one so that a logistic regression can be run with a 1 if a person chose the taxed good and 0 if they did not. This should then be evaluated to see whether the tax even has an impact on purchasing behaviour and whether it would reduce the amount bought by x percent. (The latter may not apply to this question and is more for clarification; logistic regression alone will not tell me that.)
My question now is: is there even a way to make this work with the chosen design? Is it possible to merge all the data into a usable form without losing or distorting any of it?
I am not sure if this question is stated clearly enough. Let me know if I should add more details for it to be answered properly.
Thank you for your help!
EDIT:
Each column in the CSV file now contains a number corresponding to the choice a participant made in the survey, but since there were different groups, each group got its own column. For a logistic regression they all have to be in the same column (I believe). Can I just stack them using the links posted in the comments and go from there?
Also: does simply stacking the columns distort the data in any way? I am not sure if this is the right place to ask, but I think it's worth a try.
What you could try is splitting the CSV into two separate datasets (one for each group) and using rbind to combine them:
# note: the column names need to be identical in order for the rows to stack
df_final <- rbind(df1, df2)
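If it helps, here is a rough sketch of how the stacked data could then feed into the regression; the column name choice, the answer code 1, and the taxed indicator are assumptions about your CSV, not something the question tells us:

# tag each subset with its group before stacking (keeps the column sets identical)
df1$taxed <- 1   # group A: taxed in both situations
df2$taxed <- 0   # group B: not taxed
df_final <- rbind(df1, df2)

# recode the survey answer into the 0/1 outcome
# (assumed: answer code 1 means the person chose the good; adjust to your actual coding)
df_final$chose_good <- as.integer(df_final$choice == 1)

# logistic regression of the choice on the tax indicator
fit <- glm(chose_good ~ taxed, data = df_final, family = binomial)
summary(fit)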
I have a SAS table imported into RStudio with around 2,500 observations (patient IDs). It is a table with data from different experiments (3 experiments or readings), and for some observations/patients more than one experiment was conducted. I want to know which of those observations have more than one experiment, in order to filter the data and keep just one experiment. This will also help me see the exact number of observations.
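Assuming the imported table (call it df) is in long format with one row per patient per experiment, and that the patient identifier column is called patient_id (both assumptions), a dplyr sketch could look like this:

library(dplyr)

# how many experiment rows each patient has
counts <- df %>% count(patient_id, name = "n_experiments")

# patients with more than one experiment
multi <- counts %>% filter(n_experiments > 1)

# keep only one experiment per patient (here simply the first row)
df_single <- df %>%
  group_by(patient_id) %>%
  slice(1) %>%
  ungroup()

# exact number of distinct patients
n_distinct(df$patient_id)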
I am trying to think of a potential way to predict an ID value given text data. The data is broken down by:
Group: a 4-digit number under which a group of IDs exists
ID: a 13-digit number that is the group number plus a unique value
Text: words coming from documents
Goal: predict an ID number given only the text from a document.
The data I have contains about 1,200 different IDs but only 140 different groups. The document-term matrix is about 186 columns wide with about 20,000 rows, and I have a lot more data I could include. I created a simple neural net that predicts the group number with 70% accuracy. My idea is to use this model first to predict the group number and then build separate models for each group to narrow down the number of IDs in the prediction. A final model would then be trained and used to predict the ID. Below is a drawing of what I had in mind. Is this similar to stacking in ensemble learning? I am relatively new to machine learning and am trying to think of different ways to approach this problem.
Am I on the right path or is there a better way of doing this? Any advice is greatly appreciated.
A lot depends on how well you think you can infer the group_number and unique_value from the text. Does the unique_value depend at all on the group_number? If so, then you will likely want to predict the group_number first and use that in the prediction of the unique_value - as you have suggested doing for each unique group number. You will also have to consider the amount of data you have for each group and whether it's enough to train the respective models. Give it a shot, and if it doesn't work, try a single neural network where you enter the text and the group number you've already predicted!
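As a toy illustration of that two-stage structure (simulated data, and nnet::multinom standing in for whatever model you actually use, so none of this is your real pipeline):

library(nnet)

set.seed(1)
n   <- 500
dtm <- as.data.frame(matrix(rpois(n * 20, 1), nrow = n))          # toy document-term matrix
dtm$group <- factor(sample(paste0("g", 1:5), n, replace = TRUE))  # toy group labels
dtm$id    <- factor(paste0(dtm$group, "_", sample(1:3, n, replace = TRUE)))  # IDs nested in groups

# stage 1: predict the group from the text features
stage1 <- multinom(group ~ . - id, data = dtm, trace = FALSE)

# stage 2: one ID model per group, so each model only has to separate
# the (much smaller) set of IDs inside its own group
stage2 <- lapply(split(dtm, dtm$group), function(d) {
  d$id <- droplevels(d$id)
  multinom(id ~ . - group, data = d, trace = FALSE)
})

# at prediction time: stage 1 picks the group, that group's model picks the ID
new_doc <- dtm[1, ]
g_hat   <- as.character(predict(stage1, newdata = new_doc))
id_hat  <- predict(stage2[[g_hat]], newdata = new_doc)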
Good Luck!
I'm a newbie to Azure Machine Learning and I'm trying to build a model that rates groups of items.
My data is a file with a list of items and their features (a small list - fewer than 80 items), and I need to form groups (of different sizes - groups of 2, 3, 4, ... 10 items, for all the possible combinations) so that the model can rate those groups (ratings from 1 to 10). I also have some group ratings to train the model.
I don't know how to transform the items into groups.
Another thing is, I'm not sure which model is best. From all I can gather, I think multiclass classification is the most suitable for this problem. Is it?
Thank you in advance, and sorry for any grammar errors in my text.
You need to convert the various groups into columns. One such example is where you have sales for a specific day and need the sales for past days as additional features. Here is code that converts rows to columns to get sales for previous days: https://gallery.cortanaintelligence.com/CustomModule/Generate-Lag-Features-1 (source code: https://gist.github.com/nk773/a2ed7cd0ce8020647f5e7711f749b3b5).
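Not what the linked Azure module does, but as a plain-R illustration of going from an item list to one row per group: enumerate the possible combinations with combn() and aggregate each group's item features into group-level features. The data frame items, its columns, and the mean/sum aggregation below are all made-up placeholders:

# made-up item table with two features
items <- data.frame(item_id = paste0("item", 1:5),
                    feat_a  = runif(5),
                    feat_b  = runif(5),
                    stringsAsFactors = FALSE)

group_size <- 3
combos <- combn(items$item_id, group_size)   # one column per possible group

groups <- lapply(seq_len(ncol(combos)), function(j) {
  ids     <- combos[, j]
  members <- items[items$item_id %in% ids, ]
  data.frame(group  = paste(ids, collapse = "+"),
             mean_a = mean(members$feat_a),   # aggregate the features however
             sum_b  = sum(members$feat_b))    # makes sense for your rating
})
group_data <- do.call(rbind, groups)

Each row of group_data, joined with the group ratings you already have, is then the table a learner would be trained on.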