Integrating Qualtrics and MTurk for a multi-wave study

I want to run a study that recruits Mechanical Turk workers to take a survey on Qualtrics. But the survey itself is pretty long, and I want to make sure I get a good number of respondents from hard-to-find groups on MTurk (such as Latino men) without having to pay for thousands and thousands of respondents. So my idea is this: I'll first recruit many people to take a short demographic survey where they answer a few questions about themselves. Then I'll load all those survey responses into R and, using MTurkR, select a subset of those people to recontact (they'll be my sample for the real survey), assign them a qualification, and then send them a message asking them to sign up for my second HIT.
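In MTurkR terms, I imagine the recontact step looking roughly like this (a sketch only; the file name, column names, qualification ID, and selection rule are all placeholders):

library(MTurkR)

# Wave-1 responses exported from Qualtrics (hypothetical file/column names)
wave1 <- read.csv("wave1_demographics.csv", stringsAsFactors = FALSE)

# Pick the subsample to recontact, e.g. oversampling a hard-to-find group
recontact <- subset(wave1, gender == "Male" & ethnicity == "Latino")

# Grant a qualification that the second HIT will require
# ("XXXX" stands in for a QualificationType ID created beforehand)
AssignQualification(qual = "XXXX", workers = recontact$WorkerId, value = "1")

# Send every selected worker the same invitation message
ContactWorker(subjects = "Follow-up survey invitation",
              msgs = "Hi! You qualify for a follow-up HIT.",
              workers = recontact$WorkerId,
              batch = TRUE)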
Okay, so then hopefully a bunch of people from my selected sample will go on to take my second (longer) survey. The challenge is that I'd like some of the questions in this second survey to include demographic data from the first survey. So if you said in the first survey that your religion is Catholic, I'd like there to be a question on the second survey that goes "As a Catholic...". But I don't see an obvious way to pull respondents' answers from the first survey into the second survey. Ideally, I'd have them put in their MTurk worker ID in both surveys, and that ID would serve as a link between surveys. But practically, how do I do this?

After you select the subset of people you want, save the recontact data to a CSV, then upload it as a panel (contact list) in Qualtrics.
At the beginning of your Qualtrics survey flow, define the recontact fields as embedded data variables so they are populated from the panel. You can then use them in your survey as piped text (e.g. ${e://Field/religion}) or in display logic (if embedded data field religion = Catholic).
Finally, create an email invitation in Qualtrics and send it to the people in your panel. Each person receives a unique link that ties their response to their panel data.
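A minimal sketch of the CSV step in R, assuming a selected-workers file with columns email, first_name, WorkerId, and religion (all hypothetical names); Qualtrics treats any column beyond the standard contact fields as embedded data:

# Shape the recontact data into a panel file for Qualtrics:
# standard contact columns first, extra columns become embedded data on upload
recontact <- read.csv("selected_workers.csv", stringsAsFactors = FALSE)
panel <- data.frame(Email = recontact$email,
                    FirstName = recontact$first_name,
                    WorkerId = recontact$WorkerId,
                    religion = recontact$religion)
write.csv(panel, "recontact_panel.csv", row.names = FALSE)

The embedded data field names declared in the survey flow must match these column headers exactly, or ${e://Field/religion} will come back empty.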


Qualtrics: Assign 4-digit Unique Participant ID number after enrolling via QR code

This is for longitudinal human subjects biomedical research. The basic flow of the enrollment process is this:
1. Enter clinic & scan QR code OR click email link
2. Read informed consent form and check "I consent" [survey]
3. Pass study screener question (are you 18 years old?) [survey]
4. Contact form - enter your name, phone, email [survey w/ workflow to create XM directory from this data]
5. Baseline survey [survey w/ ID number attached]
I have it set up right now so that surveys 2, 3, and 4 all flow together, assuming you meet all the criteria. This study is longitudinal, so for every survey I want their ID number attached so we can track changes over time at the person level. We are also collecting blood samples, so I will need each participant's ID number on their blood tubes so we know who's who when we process and analyze them. Therefore the nurse will need to know their ID number between steps 4 & 5.
I've seen on the community boards that assigning a random number does not guarantee unique values and that unless you make the number long you might have a duplicate. Can't have that. The numbers must be unique and 4 digits so the clinic staff doesn't get bogged down writing a 10-digit number on a blood tube.
I'd love for this number to be sequential, but beggars can't be choosers.
Any ideas on how to do this? Here are my thoughts thus far:
In step 4 (the contact form), set a quota larger than the number of participants I plan to enroll and save the quota count as embedded data. Then, in the workflow where I create the contact directory, save the quota count into an ID number field. DRAWBACK: if multiple people are enrolling at once, Qualtrics Customer Service said they can't guarantee that two people won't get the same quota number, but they couldn't define what "at the same time" means. Clicking at the same millisecond? Second? Within a few seconds?
Use the Qualtrics-generated unique Person ID field, which is very long and alphanumeric, and somehow re-code it with... Python magic?
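For what it's worth, option 2 doesn't need Python magic if a batch process is acceptable: export the directory, sort by enrollment time, and assign sequential 4-digit IDs. A sketch in R (the file and column names are hypothetical):

contacts <- read.csv("directory_export.csv", stringsAsFactors = FALSE)
# Order by a hypothetical enrollment timestamp so IDs come out sequential
contacts <- contacts[order(contacts$EnrollmentDate), ]
# Zero-padded 4-digit IDs: 0001, 0002, ... (unique by construction, up to 9999)
contacts$ParticipantID <- sprintf("%04d", seq_len(nrow(contacts)))
write.csv(contacts, "directory_with_ids.csv", row.names = FALSE)

The catch is that this runs after the fact, so on its own it wouldn't give the nurse the number in real time between steps 4 and 5.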
Thank you!!!

How do you compare quota counts in Qualtrics?

I need to use Qualtrics to elicit responses from groups of two subjects. Participants should be randomly assigned to the roles of Participant A and Participant B.
Basically, participant A plays a rock-paper-scissors game against the computer. Participant B needs to place a bet on player A's final outcome of the game. Survey questions will investigate the different reactions of the two players.
At the END of each participant's survey, player A would be informed about whether player B placed the bet or not, i.e. I need to display the previous answer of a DIFFERENT survey participant in the Qualtrics survey.
There is one way I'm thinking of doing this: randomize and assign people into two survey streams, using quota counts for survey streams A and B, then compare the quotas. If the quota count for stream B is greater than for stream A, the next participant would be assigned to survey stream A. However, I do not know how to compare quota counts.
Could someone please help me with this?
You don't need to compare quota counts. Just check the "Evenly Present Elements" box on your randomizer and Qualtrics will keep the A & B counts equal for you.

R: Clustering customers based on similar product interests for an event

I have a dataset with a list of customers and their product preferences. Basically, it is a simple CSV with a column called "CUSTOMER" and 5 other columns called "PRODUCT_WANTED_A", "PRODUCT_WANTED_B" and so on.
I asked these customers if they were interested in knowing more about a particular product, and the answers could simply be YES or NO (1 or 0 in the dataset). The dataset can be downloaded here. Obviously, there will be customers with many different interests, based on the mix of YES and NO in these 5 columns.
My goal is to understand which customers are similar to one another in their interests. This will help me manage an agenda of product presentations and, for each meeting, understand the best grouping. I started with hierarchical clustering like this:
# Read the customer/product matrix
customer_list <- read.csv("customers_products_wanted.csv", sep = ",", header = TRUE)
# Compute distances on the 0/1 product columns only (drop the CUSTOMER column)
customer.hclust <- hclust(dist(customer_list[, -1]))
plot(customer.hclust, labels = customer_list$CUSTOMER)
# rect.hclust() is in base R's stats package, no extra library needed
rect.hclust(customer.hclust, k = 5)
This is the plot I got when asking for 5 clusters:
[dendrogram with 5 cluster rectangles]
I tried the same, but with 10 clusters:
[dendrogram with 10 cluster rectangles]
Question 1: I know it's always hard to tell, but looking at the charts and dataset, what would be your 'cut' to group customers? 5? 10?
I was reviewing the results, and in the same group I had CUSTOMER112 with preferences 1,0,1,0,1 together with CUSTOMER110 (1,1,1,1,1), CUSTOMER106 (1,1,1,1,0), and so on. The distances may be computed correctly, but within a given group I have customers with some relevant differences in their preferences.
Question 2: I don't know if it's a case of total ignorance about clustering, the code I used or even the dataset. Based on your experience, what would be your approach for the best clustering in this case?
Any comments will be highly appreciated. As you can see, I've made some effort, but I'm still in doubt.
Thanks a lot!
Ricardo
All the answers were helpful, but with @Ben's video recommendation and @Samuel Tan's advice on breaking the customers into grids, I found a good way to handle it.
The video gave me a lot of insight into "noisy" variables in hierarchical clustering, and the grid recommendation helped me think about what the data is really trying to tell me.
That said, a basic data-cleaning step eliminated all customers with no interest in any product (this is obvious, but I didn't pay attention to it at first). Then I set aside customers with a single product interest, because these customers wouldn't need to attend the workshop series I'm planning (they just want to hear about one product).
Evaluating all the others (those interested in more than one product), I realized the product mix could point me to a better classification. From there, I grouped customers into 3 clusters: integration opportunities (2 or 3 products), convergence opportunities (4 products), and transformation opportunities (all 5 products).
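A sketch of that cleaning-and-grouping logic in R, assuming the five product columns are the only columns besides CUSTOMER:

customer_list <- read.csv("customers_products_wanted.csv")
# Count how many products each customer said YES to
n_products <- rowSums(customer_list[, -1])
# 0 = no interest, 1 = single product, 2-3 = integration,
# 4 = convergence, 5 = transformation
customer_list$segment <- cut(n_products,
                             breaks = c(-Inf, 0, 1, 3, 4, 5),
                             labels = c("no interest", "single product",
                                        "integration", "convergence",
                                        "transformation"))
# Keep only the customers who should attend the workshop series
workshop_customers <- subset(customer_list,
                             !segment %in% c("no interest", "single product"))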
Now it's clear to me which customers I should focus on for my workshops, and plan my post-workshop sales campaigns leveraging materials that target each customer group (integration, convergence, transformation).
Thanks for all the advice!
Ricardo

Looping through website links with R

I am looking to incorporate a loop in R which goes through every game's boxscore data on the NFL statistics website here: http://www.pro-football-reference.com/years/2012/games.htm
At the moment I am having to manually click on the "boxscore" link for every game every week; is there any way to automate this in R? My code works with the Full play-by-play dataset within each link; it's taking me ages at the moment!
Web scraping may be against the terms of use of some websites. The enforceability of these terms is unclear. While outright duplication of original expression will in many cases be illegal, in the United States the courts ruled in Feist Publications v. Rural Telephone Service that duplication of facts is allowable.
require(RCurl)
require(XML)
# Fetch and parse the season schedule page
bdata <- getURL('http://www.pro-football-reference.com/years/2012/games.htm')
bdata <- htmlParse(bdata)
# Extract the href of every link whose URL contains "boxscore"
boxdata <- xpathSApply(bdata, '//a[contains(@href,"boxscore")]', xmlAttrs)[-1]
The above will get the boxscore stem for the various games.
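From there, the loop itself might look roughly like this (a sketch, untested; it assumes the stems come back as relative paths such as /boxscores/201209050nyg.htm):

base_url <- 'http://www.pro-football-reference.com'
box_pages <- lapply(boxdata, function(stem) {
  html <- htmlParse(getURL(paste0(base_url, stem)))
  # extract the play-by-play table from each page here,
  # e.g. with readHTMLTable(html)
  html
})

Adding a Sys.sleep() pause inside the loop is a good idea so the requests don't hammer the server.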

Get Annual Financial Data for a Stock for many years in R

Suppose I want to regress Gross Profit on Total Revenue in R. I need data for this, and the more, the better.
There is a package on CRAN that I find very useful: quantmod. It does what I need.
library(quantmod)
# Downloads the financial statements and assigns them to AMD.f
getFinancials(Symbol = "AMD", src = "google")
# To see the row names of the annual income statement matrix: rownames(AMD.f$IS$A)
Total.Revenue <- AMD.f$IS$A["Revenue", ]
Gross.Profit <- AMD.f$IS$A["Gross Profit", ]
# Finally:
reg1 <- lm(Gross.Profit ~ Total.Revenue)
The biggest issue I have is that this package only gets me data for 4 years (4 observations, and who runs a regression with only 4 observations?). Is there any other way (maybe other packages) to get data for MORE than 4 years?
I agree that this is not an R programming question, but I'm going to make a few comments anyway before this question is (likely) closed.
It boils down to this: getting reliable fundamental data across sectors and markets is difficult enough even if you have money to spend. If you are looking at the US then there are a number of options, but all the major (read 'relatively reliable') providers require thousands of dollars per month - FactSet, Bloomberg, Datastream and so on. For what it's worth, for working with fundamental data I prefer and use FactSet.
Generally speaking, because the Excel tools offered by each provider are more mature, I have found it easier to populate spreadsheets with the data and then read the data into R. Then again, I typically deal with the fundamentals of a few dozen companies at most, because once you move out of the domain of your "known" companies the time it takes to check anomalies increases exponentially.
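For the spreadsheet-to-R step, something like the following is enough (a sketch; readxl is one option among several, and the file and sheet names are hypothetical):

library(readxl)
# Read an annual income-statement sheet populated via the provider's Excel tools
fundamentals <- read_excel("provider_export.xlsx", sheet = "IS_annual")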
There are numerous potential "gotchas". The most obvious is that definitions vary from sector to sector. "Sales" for an industrial company is very different from "sales" for a bank, for example. Another problem is changes in definitions. Pretty much every year some accounting regulation or other changes and breaks your data series. Last year minority interests were reported here; this year the item has moved to another position in the P&L, and so on.
Another problem is companies themselves changing. How does one deal with mergers, acquisitions and spin-offs, for example? This sort of thing can make measuring organic sales growth next to impossible. Yet another point to bear in mind is that if you're dealing with operating or net profit, you have to consider exceptionals and whether to adjust for them.
Dealing with companies outside the US adds a whole bunch of further problems. Of course, the major data providers try to standardise globally (FactSet Fundamentals for example). This just adds another layer of abstraction and typically it is hard to check to see how the data has been manipulated.
In short, getting the data is onerous and I know of no reliable free sources. Unless you're dealing with the simplest items for a very homogeneous group of companies, this is a can of worms even if you do have the data.
