Qualtrics: Assign 4-digit Unique Participant ID number after enrolling via QR code - css

This is for longitudinal human subjects biomedical research. The basic flow of the enrollment process is this:
1. Enter clinic & scan QR code OR click email link
2. Read informed consent form and check "I consent" [survey]
3. Pass study screener question (are you 18 yo) [survey]
4. Contact form - enter your name, phone, email [survey w/ workflow to create XM directory from this data]
5. Baseline survey [survey w/ ID number attached]
Right now I have surveys 2, 3, and 4 set up to flow together, assuming you meet all the criteria. This study is longitudinal, so for every survey I want the participant's ID number attached so we can track changes over time at the person level. We are also collecting blood samples, so I will need each participant's ID number on their blood tubes so we know who's who when we process and analyze them. Therefore the nurse will need to know the ID number between steps 4 & 5.
I've seen on the community boards that assigning a random number does not guarantee unique values and that unless you make the number long you might have a duplicate. Can't have that. The numbers must be unique and 4 digits so the clinic staff doesn't get bogged down writing a 10-digit number on a blood tube.
I'd love for this number to be sequential, but beggars can't be choosers.
Any ideas on how to do this? Here are my thoughts thus far:
In step 4 (the contact sheet), set a quota larger than the number of participants I plan to enroll and save the quota count as embedded data. Then, in the workflow where I create the contact directory, save the quota number into an ID number field. DRAWBACK: if multiple people are enrolling at once, Qualtrics Customer Service said they can't guarantee that two people won't get the same quota number, but couldn't give details about how to define "at the same time". Like clicking it in the same millisecond? Second? Few seconds?
Use the Qualtrics generated unique Person ID field, which is very long and alphanumeric and somehow re-code it with.... python magic?
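For the second idea, a minimal sketch of the re-coding step in Python, assuming the long IDs and an enrollment timestamp have already been exported. The function name, the example ID strings, and the timestamp format are made up for illustration. Note that this runs after export, so on its own it does not give the nurse a number between steps 4 & 5.

```python
from datetime import datetime

def assign_sequential_ids(enrollments, start=1, width=4):
    """Map long platform IDs to short sequential IDs in enrollment order.

    enrollments: iterable of (long_id, iso_timestamp) pairs. The pair layout
    and the ISO timestamp format are assumptions made for this sketch.
    """
    limit = 10 ** width
    # Collapse repeated long IDs, keeping each person's earliest timestamp.
    earliest = {}
    for long_id, stamp in enrollments:
        when = datetime.fromisoformat(stamp)
        if long_id not in earliest or when < earliest[long_id]:
            earliest[long_id] = when
    # Sort by time, then by long ID so that ties are broken deterministically.
    ordered = sorted(earliest, key=lambda k: (earliest[k], k))
    if start + len(ordered) > limit:
        raise ValueError("more participants than available IDs")
    return {long_id: f"{start + i:0{width}d}" for i, long_id in enumerate(ordered)}

# Hypothetical long IDs; two people enrolled in the same second.
rows = [
    ("R_3kXb9", "2024-03-01T09:15:00"),
    ("R_1aQz7", "2024-03-01T09:15:00"),
    ("R_8mLp2", "2024-03-01T08:59:30"),
]
print(assign_sequential_ids(rows))
# {'R_8mLp2': '0001', 'R_1aQz7': '0002', 'R_3kXb9': '0003'}
```

Because the mapping is built in one pass over the whole export, two people who enrolled at the same moment still get different numbers.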
Thank you!!!

Related

How can I categorize variables, count variables and define their percentages in Rstudio

This is my first time working in RStudio.
I have a database of 36 participants but it has 150600 entries.
There is a column for the participants:
A column for the probes Activityprobe/ Screenprobe, SMSprobe and CallLogprobe
A column for the Activity Level High/low/none, screenon/off etc.
I need code that counts the activity levels of all the participants: high, low, and no activity.
And to help me find out for every Participant what the percentages are of all their high/no/low activity.
For screenprobe I need to count how many times the participant turned their screen on and how many times they turned it off and the percentage of screen on/off.
For callLog I need to count how many times each participant got called and the percentage.
For SMS I need to count the number of SMS for each participant and their percentage.
I also need to categorize the probes. So that my database shows all the activity levels first, organized by none/high/low and then all the screenprobes, organized by on and off etc.
I hope that my description is clear
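The counting logic described above does not depend on the language. Here is a minimal sketch in Python (not R), assuming each row is a (participant, probe, level) triple. The function name and the example values are made up for illustration.

```python
from collections import Counter, defaultdict

def level_percentages(rows):
    """Count levels per participant and probe, with percentages.

    rows: iterable of (participant, probe, level) tuples (assumed layout).
    Returns {(participant, probe): {level: (count, percent)}}.
    """
    counts = defaultdict(Counter)
    for participant, probe, level in rows:
        counts[(participant, probe)][level] += 1
    result = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        # Percent is relative to that participant's rows for that probe.
        result[key] = {
            level: (n, round(100 * n / total, 1))
            for level, n in sorted(counter.items())
        }
    return result

rows = [
    ("P1", "ActivityProbe", "high"),
    ("P1", "ActivityProbe", "high"),
    ("P1", "ActivityProbe", "low"),
    ("P1", "ActivityProbe", "none"),
    ("P1", "ScreenProbe", "on"),
    ("P1", "ScreenProbe", "off"),
]
for key, levels in sorted(level_percentages(rows).items()):
    print(key, levels)
```

Sorting the result keys by probe and then by level gives the grouped ordering asked for in the question.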

How do you compare quota counts in qualtrics?

I need to use Qualtrics to elicit responses from groups of two subjects. Participants should be randomly assigned to be Participant A or Participant B.
Basically, participant A would play a rock-paper-scissors game against the computer. Participant B needs to place a bet on player A's final outcome of the game. Survey questions would be asked to investigate the different reactions of the two players.
At the END of each participant's survey, player A would be informed about whether player B placed the bet or not, i.e. I need to display the previous answer of a DIFFERENT survey participant in the Qualtrics survey.
One way I'm thinking of doing this: randomize and assign people into two survey streams, and use quota counts for survey streams A and B. Then compare the quota counts: if the count for stream B is greater than for stream A, the next participant would be assigned to survey stream A. However, I do not know how to compare quota counts.
Could someone please help me with this?
You don't need to compare quota counts. Just check the "Evenly Present Elements" box on your randomizer and Qualtrics will keep the A & B counts equal for you.
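To show what "keeping the counts equal" means, here is a minimal sketch of the balancing rule in Python. It is not Qualtrics' actual implementation, just the "assign to the smaller stream" idea from the question. The function name is made up.

```python
import random

def next_stream(counts, rng=random):
    """Pick the stream with the fewest assignments; break ties at random.

    counts: dict mapping stream name to number of participants so far.
    The dict is updated in place.
    """
    fewest = min(counts.values())
    candidates = sorted(name for name, n in counts.items() if n == fewest)
    choice = rng.choice(candidates)
    counts[choice] += 1
    return choice

counts = {"A": 0, "B": 0}
assignments = [next_stream(counts) for _ in range(10)]
print(counts)  # {'A': 5, 'B': 5}
```

With two streams the counts never differ by more than one, and they are equal after every even number of participants.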

modeling scenario with mostly semi-additive facts

I'm learning dimensional modeling and I'm trying to create a model. I was thinking about a social media platform which rates hotels. The platform has the following data:
hotel information: name and address
a user can rate hotels (1-5 points)
a user can write comments
platform stores the date of the comments
hotel can answer via comment and it stores the date of it
the platform stores the total number of each rating level (i.e.: all rates with 1 point, all rates with 2 point etc.)
platform stores information of the user: sex, name, total number of votes he/she made and address
First, I tried to define which information belongs to a dimension or fact table
(here I also checked which one is additive/semi additive/non-additive)
I realized my example is kind of difficult, because it’s hard to decide if it belongs to a fact table or dimension.
I would like to hear some advice. Would someone agree with my model?
This is how I would model it:
Hotel information -> hotel dimension
User rating -> additive fact – because I can aggregate them with all dimensions
User comment -> semi-additive? – because I can aggregate them with the date dimension (I don’t know if my argument is correct, but I know I would have new comments every day, which for me is a reason to store them in a fact table)
Answer as comment -> same handling like with the user comments
Date of comment-> dimension
Total number of all votes (1/2/3/4/5) -> semi-additive facts – it makes no sense to aggregate them, since they are already totals, but I could get the average
User information: sex, name, address -> user dimension
User information: total number of votes -> could be dimension or fact. It depends on how often it changes. If it changes often, I store it in a fact. If it’s not that often, then in a dimension
I still have questions; I hope someone can help me:
1st question: should I create two date dimensions, or can I store both dates in one date dimension?
2nd question: each user and hotel has just one address. Are there arguments for separating the address dimension into its own hierarchy? Can I create a 1:1 relationship between a user dimension and an address dimension?
Your model looks well considered, but here are some thoughts:
User comment (and answers to comments): they are an event to be captured (with new ones each day, as you mention) so are factual, with dimensionality of the commenter, type of comment, date, and the measure is at least a 'count' which is additive. But you don't want to store big text in a fact so you would put that in a dimension by itself which is 1:1 with the fact, for situations where you need to query on the comment itself.
Total Number of all votes (1/2/3/4/5) are, as you say, already aggregates, mostly for performance. Totals should be easy from the raw data itself so maybe not worthwhile to store them at all. You might also consider updating the hotel dimension with columns (hotel A has 5 '1' votes and 4 '2' votes) that you'd update as you go on, for easy filtering and categorisation.
User Information: total number of votes: it is factual information about a user (dimension) and it depends on whether you always just want to 'find it out' about a person or whether you are likely to use it to filter other information (i.e. show me all reviews for users who have made 10-20 votes). In that case you might store the total in the user dimension (and/or a banding, like 'number of reviews range' with 10-20, 20-30). You can update dimensions often if you need to, but you're right, it could still just live as a fact only.
As for date dimensions, if the 'grain' is 'day' then you only need one dimension, that you refer to from multiple facts.
As for addresses, you're right that there are arguments on both sides! Many people separate addresses into their own dimension, referred to from the other dimensions that use them. Kimball suggests you can do that behind the scenes if necessary, but prefers for each dimension to have its own set of address columns (but modelled as consistently as possible).
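The suggestions above can be sketched as a small star schema. This is one possible layout, not the only valid one, built with Python's `sqlite3` module. All table and column names are hypothetical. It keeps one shared date dimension, puts the comment text in its own dimension, and derives the per-level vote totals from the raw rating facts instead of storing them.

```python
import sqlite3

# Table and column names are hypothetical, chosen for this sketch.
SCHEMA = """
CREATE TABLE dim_date  (date_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE dim_hotel (hotel_key INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE dim_user  (user_key INTEGER PRIMARY KEY, name TEXT, sex TEXT, address TEXT);
-- Comment text lives in its own dimension, 1:1 with the comment fact.
CREATE TABLE dim_comment_text (comment_key INTEGER PRIMARY KEY, body TEXT);
-- One row per rating event; rating_count is the additive measure.
CREATE TABLE fact_rating (
    date_key INTEGER REFERENCES dim_date(date_key),
    hotel_key INTEGER REFERENCES dim_hotel(hotel_key),
    user_key INTEGER REFERENCES dim_user(user_key),
    points INTEGER CHECK (points BETWEEN 1 AND 5),
    rating_count INTEGER NOT NULL DEFAULT 1
);
-- One row per comment event, written by a user or by the hotel.
CREATE TABLE fact_comment (
    date_key INTEGER REFERENCES dim_date(date_key),
    hotel_key INTEGER REFERENCES dim_hotel(hotel_key),
    user_key INTEGER REFERENCES dim_user(user_key),
    comment_key INTEGER REFERENCES dim_comment_text(comment_key),
    author_type TEXT CHECK (author_type IN ('user', 'hotel')),
    comment_count INTEGER NOT NULL DEFAULT 1
);
"""

def votes_per_level(conn, hotel_key):
    """Derive the per-level vote totals for one hotel from the raw facts."""
    query = (
        "SELECT points, SUM(rating_count) FROM fact_rating "
        "WHERE hotel_key = ? GROUP BY points ORDER BY points"
    )
    return conn.execute(query, (hotel_key,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO dim_date VALUES (20240301, '2024-03-01')")
conn.execute("INSERT INTO dim_hotel VALUES (1, 'Hotel A', '1 Example Street')")
conn.executemany(
    "INSERT INTO dim_user VALUES (?, ?, ?, ?)",
    [(1, "User One", "f", "Address 1"), (2, "User Two", "m", "Address 2"),
     (3, "User Three", "f", "Address 3")],
)
conn.executemany(
    "INSERT INTO fact_rating (date_key, hotel_key, user_key, points) VALUES (?, ?, ?, ?)",
    [(20240301, 1, 1, 5), (20240301, 1, 2, 5), (20240301, 1, 3, 1)],
)
print(votes_per_level(conn, 1))  # [(1, 1), (5, 2)]
```

Both fact tables point at the same `dim_date`, which is the "one date dimension at day grain" point from the answer.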

Integrating Qualtrics and MTurk for a multi-wave study

I want to run a study that recruits Mechanical Turk workers to take a survey on Qualtrics. But the survey itself is pretty long, and I want to make sure I get a good number of respondents from hard-to-find groups on MTurk (such as Latino men) without having to pay for thousands and thousands of respondents. So my idea is this: I'll first recruit many people to take a short demographic survey where they answer a few questions about themselves. Then, I'll load all those survey responses into R and, using MTurkR, I'll select a subset of those people to recontact (they'll be my sample for the real survey) and assign them some qualification, then send them a message asking them to sign up for my second HIT.
Okay, so then hopefully a bunch of people from my selected sample will go on to take my second (longer) survey. The challenge is that I'd like some of the questions in this second survey to include demographic data from the first survey. So if you said in the first survey that your religion is Catholic, I'd like there to be a question on the second survey that goes "As a Catholic...". But I don't see an obvious way to pull respondents' answers from the first survey into the second survey. Ideally, I'd have them put in their MTurk worker ID in both surveys, and that ID would serve as a link between surveys. But practically, how do I do this?
After you select the subset of people you want, save the recontact data to a CSV then upload it into a Qualtrics panel.
At the beginning of your Qualtrics survey flow, define the recontact fields as embedded variables that will be populated from the panel. Then you can use them in your survey as piped text (e.g. ${e://Field/religion}) or in logic (if embedded variable religion = Catholic).
In Qualtrics, create an email invite, then invite the people in your panel to take the survey. They will receive an email invite with a link that ties them to their panel data.
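The first step (saving the recontact data to a CSV) could look roughly like this in Python. This is only a sketch: the field names (`workerId`, `email`, `religion`) and the function name are assumptions, and the exact column headers Qualtrics expects on upload should be checked against its import template.

```python
import csv
import io

def build_panel_csv(responses, selected_ids, fields, id_field="workerId"):
    """Return CSV text with one row per selected respondent.

    responses: list of dicts from the first (demographic) survey.
    selected_ids: set of IDs chosen for recontact.
    fields: columns to keep; other keys in each dict are ignored.
    """
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in responses:
        if row[id_field] in selected_ids:
            writer.writerow(row)
    return out.getvalue()

# Hypothetical first-wave data.
responses = [
    {"workerId": "W1", "email": "a@example.org", "religion": "Catholic", "age": "34"},
    {"workerId": "W2", "email": "b@example.org", "religion": "None", "age": "51"},
]
print(build_panel_csv(responses, {"W1"}, ["email", "workerId", "religion"]))
```

Each kept column then becomes a field that can be referenced in the second survey, e.g. `${e://Field/religion}`.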

Determining the popularity of a video with ratings and views

I am about to embark on a new project - a video website. Users will be able to register, and vote on videos by clicking "like" or "dislike", or something to that effect. In any event, it will be a 2-option voting system, not a 5-star system.
Every X number of days, I will be generating a "chart" of the most popular videos. So my question is: how should I determine the popularity of a given video?
If I went the route of tallying up the videos with the most views, this could have the effect of exceptionally bad videos making it to the top of the charts (just because they're so bad).
If I go the route of a scoring system based on the amount of "like" and "dislike" votes (eg. 100 like votes, and 50 dislike votes equals a score of 2), videos with few views could appear on the top of the charts.
So, what I need to do is a combination of the two. Barring, of course, spammy views and votes.
What's your guys' thoughts on the subject?
Edit: the following tags were removed: [mysql] [postgresql], to make room for other, more representative tags; the SQL technology used in the intended implementation does not seem to bear much on the considerations regarding the rating model per-se.
You seem to be missing the point that likes and dislikes in movies are anything but objective even within the context of a relatively homogeneous group of "voters". Think how the term "Chix Flix" or the success story called "NetFlix", illustrate this subjectivity...
Yet, if you persist in implementing the model you suggest, there are several hidden variables and system dynamics that need to be acknowledged and possibly taken into account in the rating's formula.
the existence of a third, implicit, value of the vote: "No vote"
i.e. when someone views the movie page and yet doesn't vote, either way.
The problem with this extra value is its ambiguity: do people not vote because they didn't see the movie, or because they neither truly liked nor disliked it? Very likely a bit of both. We can therefore use the count of "page views without vote" in the formula to boost (somewhat) the rating of movies that do not generate a strong (positive or negative) sentiment; otherwise the "polarizing" movies will appear more notorious or popular.
the bandwagon effect
Past a certain threshold, and particularly if the rating and/or vote counts are visible before the page view, the rating and vote counts can influence the way people decide to vote (either way) or even decide to abstain from voting. The implication is that the total vote and/or view counts do not relate linearly to the effective rating.
"quality" vs. "notoriety"
Vote ratios in general (eg "likes" / "total" or "likes"/"dislikes" etc.) are indicative of the "quality" of a movie (note the quotes around quality...), whereby the number of votes (and of views) is indicative of the notoriety ("name recognition" etc.) of a movie.
statistical representativity
Very small vote and/or view counts are to be handled carefully because they introduce much volatility in the rating. Phrased otherwise, small samples make for ratings that are not very statistically representative.
trends (the time variable)
At the risk of complicating the model, consider keeping [some] record of when votes/view happened, to allow identifying "hot" (and "cooling") movies in the collection. This info may inform the rating logic, but also may be used to direct the users towards currently hot items. BTW, hence feeding the bandwagon effect mentioned :-( but also, increasing the voting sample size :-).
All these considerations suggest caution in implementing this rating system. They also hint at the likely need to include statistics about the complete set of movies in the rating formula for an individual movie. In other words, do not rate a given movie solely on the basis of its own vote/view counts, but also on, say, the average vote count a movie receives, the maximum number of views a movie page gets, etc. In fact, an iterative process, whereby movies are [roughly] ranked at first and then the ranking is recalculated using the statistics of groups of similarly rated movies, may provide a better system (provided the formulas are "fair" and somehow converge).
A standard trick is to start with a neutral baseline: say 10 likes and 10 dislikes that gives a score of 1. The first few votes don't change the ratio too much, but as votes accumulate, the baseline is overwhelmed. The exact choice of the baseline values will influence the rating of a new movie (the two values don't have to be equal), and how many votes are needed to change the rating substantially.
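A minimal sketch of that baseline trick in Python, using the 10/10 baseline from the text. The function name is made up, and the baseline values are parameters you would tune.

```python
def smoothed_score(likes, dislikes, base_likes=10, base_dislikes=10):
    """Like/dislike ratio with a neutral baseline added to both counts."""
    return (likes + base_likes) / (dislikes + base_dislikes)

print(smoothed_score(0, 0))                   # 1.0: a new video starts neutral
print(smoothed_score(3, 0))                   # 1.3: a few votes barely move it
print(round(smoothed_score(100, 50), 3))      # 1.833: raw ratio would be 2.0
print(round(smoothed_score(1000, 500), 3))    # 1.98: baseline is overwhelmed
```

The baseline also avoids a division by zero for videos that have no dislikes yet.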
