I'm trying to create a fairly standard cognitive psychology experiment that's proving to be problematic in Qualtrics.
I'm randomly presenting a sequence of images using Loop & Merge, but I want some of the images to be repeated multiple times. That part is easy: I just overrepresent those images in the Loop & Merge list. However, I don't want a repeated image to appear twice in a row. Is there a solution to this, or a clever workaround?
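One workaround, since Loop & Merge randomization can't enforce adjacency constraints on its own, is to pre-generate a valid presentation order outside Qualtrics and feed it in (e.g. as embedded data, or by reordering the Loop & Merge rows per participant). Here is a minimal R sketch that builds such an order by rejection sampling: reshuffle until no image appears twice in a row. The stimulus names are made up for illustration.

    # Hypothetical stimulus list: images A and B repeat three times each.
    stimuli <- c(rep("A", 3), rep("B", 3), "C", "D", "E", "F")

    # Rejection sampling: reshuffle until no element appears twice in a row.
    # This is fast whenever repeats are a minority of the list.
    shuffle_no_adjacent <- function(x, max_tries = 10000) {
      for (i in seq_len(max_tries)) {
        s <- sample(x)
        if (all(s[-1] != s[-length(s)])) return(s)
      }
      stop("No valid ordering found; too many repeats for the list length.")
    }

    set.seed(42)
    shuffle_no_adjacent(stimuli)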
I'm trying to do a basic A/B test, where I have to split my observations into two groups that are as equal as possible. The observations are session counts from Google Analytics for different cities.
I therefore need a random 50/50 split of the cities, where the two groups' session totals are as close to equal as possible. What would be my best option?
I've tried a genetic algorithm approach to the problem, but with only one variable there must be an easier and faster way, right?
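For what it's worth, with a single numeric variable this is just the two-way number-partitioning problem, and a simple greedy heuristic (visit cities from largest to smallest, always assigning to the group with the smaller running total) usually gets very close to an even split without a genetic algorithm. A sketch in R, with made-up session counts; the size cap keeps the city split at 50/50:

    # Hypothetical session counts per city.
    sessions <- c(Copenhagen = 5200, Aarhus = 3100, Odense = 1800,
                  Aalborg = 1500, Esbjerg = 900, Randers = 700)

    greedy_split <- function(x) {
      ord <- order(x, decreasing = TRUE)   # largest cities first
      cap <- ceiling(length(x) / 2)        # max cities per group
      group <- integer(length(x))
      totals <- c(0, 0)
      sizes  <- c(0, 0)
      for (i in ord) {
        # Assign to the lighter group unless it is already full.
        g <- if (sizes[1] >= cap) 2
             else if (sizes[2] >= cap) 1
             else which.min(totals)
        group[i]  <- g
        totals[g] <- totals[g] + x[i]
        sizes[g]  <- sizes[g] + 1
      }
      list(group = group, totals = totals)
    }

    greedy_split(sessions)

With only a handful of cities you can also just enumerate every 50/50 split and pick the one with the smallest session difference, which guarantees the optimum.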
Background: I have a very long vector (many millions of values) that I cannot display easily, as there is simply too much data. The data is a time series, so it exhibits temporal dependency.
My goal is to visualize a part (or parts) of it that is representative enough (i.e. not just the first 10k values or so).
Normally, if the data were iid and I wanted to display a part of it, I would just resample with replacement.
Question: Since the data is a time series, I was thinking of using "block resampling" (I don't know if that's a real term; I had in mind something like the block bootstrap, but without actually computing any statistics). Does anybody have a good idea (or even a package) for how to achieve this in a clever way?
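One simple version of that idea, sketched in R: draw a few random contiguous windows from the vector and display them side by side, so you see several representative stretches rather than just the head of the series. The simulated series and the block sizes below are placeholders.

    library(ggplot2)

    # Placeholder for the real data: a long autocorrelated series.
    set.seed(1)
    x <- cumsum(rnorm(5e6))

    # Sample a few random contiguous blocks and stack them for plotting.
    sample_blocks <- function(x, n_blocks = 6, block_len = 2000) {
      starts <- sample(length(x) - block_len + 1, n_blocks)
      do.call(rbind, lapply(starts, function(s) {
        idx <- s:(s + block_len - 1)
        data.frame(block = paste0("start = ", s), t = idx, value = x[idx])
      }))
    }

    ggplot(sample_blocks(x), aes(t, value)) +
      geom_line() +
      facet_wrap(~ block, scales = "free_x")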
So I have created a dataframe from a loop that runs several migration rates over a period of time and outputs the number of people in several categories.
However, when I try plotting this, the output is drawn as a single line instead of several separate lines for the different migration rates.
What would be the easiest way to separate these out?
Thanks!
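In ggplot2 this usually happens because the data is in wide format (one column per migration rate), so everything is drawn as one line. Reshaping to long format and mapping the rate to the colour (or group) aesthetic gives one line per rate. A sketch with invented column names, since I don't know the actual structure of your dataframe:

    library(tidyr)
    library(ggplot2)

    # Hypothetical wide output of the loop: one column per migration rate.
    df <- data.frame(year      = 2000:2010,
                     rate_0.01 = cumsum(rnorm(11, 100, 5)),
                     rate_0.05 = cumsum(rnorm(11, 120, 5)))

    # One row per (year, rate) pair instead of one column per rate.
    long <- pivot_longer(df, -year, names_to = "rate", values_to = "people")

    # colour = rate makes ggplot draw a separate line per migration rate.
    ggplot(long, aes(year, people, colour = rate)) +
      geom_line()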
I tried using sdf_pivot() to widen a column with duplicate values into multiple (a very large number of) columns. I planned to use these columns as the feature space for training an ML model.
Example: I have a sequence of language elements in one column (words), which I want to turn into a binary matrix of huge width (say, 100,000 columns) and run a sentiment analysis using logistic regression.
The first problem is that by default sparklyr does not allow me to create more than 10K columns, citing a possible error in my design.
The second problem is that even if I override this warning and create that many columns, further calculations take forever on such wide data.
Question 1: is it good practice to make extra-wide datasets, or should I work differently with such large feature spaces while still using Spark's fast parallel computation?
Question 2: is it possible to construct a vector-type feature column and avoid generating a very wide matrix?
I just need a small example or practical tips to follow.
https://github.com/rstudio/sparklyr/issues/1322
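On Question 2: Spark ML doesn't need the features spread across physical columns; its estimators expect a single vector-typed column, which is stored sparsely. In sparklyr you can build that column with the feature transformers instead of sdf_pivot(), e.g. ft_tokenizer() plus ft_count_vectorizer() (or ft_hashing_tf()), and feed it straight into ml_logistic_regression(). A sketch, assuming a hypothetical local data frame `reviews` with a text column and a 0/1 label column:

    library(sparklyr)
    library(dplyr)

    sc <- spark_connect(master = "local")

    # Hypothetical data frame `reviews` with columns `text` and `label`.
    reviews_tbl <- copy_to(sc, reviews, overwrite = TRUE)

    pipeline <- ml_pipeline(sc) %>%
      ft_tokenizer(input_col = "text", output_col = "tokens") %>%
      # binary = TRUE gives 0/1 term indicators, stored as sparse vectors
      ft_count_vectorizer(input_col = "tokens", output_col = "features",
                          vocab_size = 1e5, binary = TRUE) %>%
      ml_logistic_regression(features_col = "features", label_col = "label")

    model <- ml_fit(pipeline, reviews_tbl)

Because the 100,000 features live in one sparse vector column, this avoids both the 10K-column warning and the slowdown that comes with a physically wide table.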
I'm pretty new to Tableau but have a lot of experience with R. Every time I use SCRIPT_REAL to call an R function on Tableau aggregates, I get back a number that seems to be some close fractional approximation of the true result. For example, if raw R gives me .741312, Tableau will spit out .777778, and so on. Does anyone have any experience with this issue?
I'm pretty sure this is an aggregation issue.
From the Tableau and R Integration post by Jonathan Drummey on their community site:
Using Every Row of Data - Disaggregated Data

For accurate results for the R functions, sometimes those R functions need to be called with every row in the underlying data. There are two solutions to this:

1. Disaggregate the measures using Analysis->Aggregate Measures->Off. This doesn’t actually cause the measures to stop their aggregations, instead it tells Tableau to return every row in the data without aggregating by the dimensions on the view (which gives the wanted effect). Using this with R scripts can get the desired results, but can cause problems for views where we want to have R work on the non-aggregated data and then display the data with some level of aggregation.

2. The second solution deals with this situation: Add a dimension such as a unique Row ID to the view, and set the Compute Using (addressing) of the R script to be along that dimension. If we’re doing some sort of aggregation with R, then we might need to reduce the number of values returned by filtering them out with something like:

    IF FIRST()==0 THEN SCRIPT_REAL('insert R script here') END

If we need to then perform additional aggregations on that data, we can do so with table calculations with the appropriate Compute Usings that take into account the increased level of detail in the view.
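As a concrete version of that pattern: with a unique [Row ID] dimension on the view, the script's Compute Using set along [Row ID], and a hypothetical [Value] measure, the calculated field might look like:

    IF FIRST() == 0 THEN
        SCRIPT_REAL("mean(.arg1)", ATTR([Value]))
    END

Once the view contains every underlying row, the number coming back from R should match what raw R gives you, rather than an approximation computed over pre-aggregated marks.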