Converting 3 star rating to 5 star - math

I have some data in which the ratings are from 1 to 5.
But another department who gave us data has given it in 1 to 3.
Is there a way to convert those 1 to 3 rating and spread it on 1 to 5? Or a mathematical way to do this?

Related

How do I change the order of multiple grouped values in a row dependent on another variable in that row in R?

I need some help conditionally sorting/switching data based on a factor variable.
I'm not sure if it's a typical use case I just can't formulate properly enough for a search engine to show me a solution or if it is that niche but I haven't found anything yet.
I currently have a dataframe like this:
id group a1 a2 a3 a4 b1 b2 b3 b4
1 1 2 6 6 3 4 4 6 4
2 2 5 2 2 2 2 5 2 3
3 1 6 3 3 1 3 6 4 1
4 1 4 8 4 2 7 8 8 9
5 2 3 1 1 4 2 1 1 7
For context this is from a psychological experiment where people went through two variations of a task and the order of those conditions was determined by the experimental group they were assigned to. The columns represent different measurements from different trials and are currently grouped together for the same variable and in chronological order, meaning a1,a2,a3,a4 are essentially the same variable at consecutive time points, same with b1,b2,b3,b4.
I want to split them up for the different conditions so regardless of which group (=which order of tasks) someone went through, data from one condition should come first in the dataframe and columns should still be grouped together for the same variables and in chronological order within that condition. It should essentially look like this:
id group c1a1 c1a2 c2a1 c2a2 c1b1 c1b2 c2b1 c2b2
1 1 2 6 6 3 4 4 6 4
2 2 2 2 5 2 2 3 2 5
3 1 6 3 3 1 3 6 4 1
4 1 4 8 4 2 7 8 8 9
5 2 1 4 3 1 1 7 2 1
So essentially for group 1 everything stays the same since they happened to go through the conditions in the same order that I want to have in the new dataframe while for group 2 values are being switched where the originally second half of values for each variable is put in front of the originally first one.
I hope I formulated the problem in a way, people can understand it.
My real dataset is a bit more complicated it has 180 columns minus id and group so 178.
I have 13 variables some of which were measured over two conditions with 5 trials for each of those and some which have those 5 trials for each of the 2 main condition but which also have 2 adittional measurements for each condition where the order was determined by the same group variable.
(We essentially asked participants to do the task again in two certain ways, which allowed us to see if they were capable of doing them like that if they wanted to under the circumstences of both main conditions).
So there are an adittional 4 columns for some variables which need to be treated seperately. It should look like this when transformed (x and y are the 2 extra tasks where only b was measured once):
id group c1a1 c1a2 c2a1 c2a2 c1b1 c1b2 c1bx c1by c2b1 c2b2 c2bx c2by
1 1 2 6 6 3 4 4 3 7 6 4 4 2
2 2 2 2 5 2 2 3 4 3 2 5 2 2
3 1 6 3 3 1 3 6 2 2 4 1 1 1
4 1 4 8 4 2 7 8 1 1 8 9 5 8
5 2 1 4 3 1 1 7 8 9 2 1 3 4
What I want to say with this is, I need a pretty general solution.
I already tried formulating a function for creation of two seperate datasets for the groups and then merging them by id but got stuck with the automatic creation and naming of columns which I can't seem to wrap my head around. dplyr is currently loaded and used for some other transformations but since I'm not really good with it, I need to ask for your help regarding a solution with or without it. I'm still pretty new to R and this is for my bachelor thesis.
Thanks in advance!
Your question leaves a few things unclear that make this hard to answer, but here is maybe a start that could help, or at least help clarify your problem.
It would really help if you could clarify 2 pieces of info, what types of column rearrangements you need, and how you distinguish what indicates that a row needs to have this transformation.
I'm also wondering if instead of trying to manipulate your data in its current shape, if it not might be more practical to figure out how to change the shape of your data to better represent your data, perhaps using something like pivot_longer(), I don't know how this data will ultimately be used or what the actual values indicate, but it doesn't seem to be very tidy in its current form, and instead having a "longer" table might be more meaningful, but I'll still provide what I think is a solution to your listed problem.
This creates some example data that looks like it reflects yours in the example table.
ID=seq(1:10)
group=sample(1:2,10,replace=T)
Data=matrix(sample(1:10,80,replace=T),nrow=10,ncol=8)
DataFrame=data.frame('ID'=ID,'Group'=group,Data)
You then define the groups of columns that need to be kept together. I can't tell if there is an automated way for you to indicate which columns are grouped, but this might get bulky if done manually. Some more information on what your column names actually are, and how they are distributed in groups would help.
ColumnGroups=list('One'=c('X1','X2'),'Two'=c('X3','X4'),'Three'=c('X5','X6'),'Four'=c('X7','X8'))
You can then figure out which rows need to have rearranged done by using some conditional. Based on your example, I'm assuming when the group variable equals 2, then the rearranging needs to be done, which is what I've used here.
FlipRows=DataFrame$Group==2
You can then have R only apply the rearrangement needed to those rows that need it, and define the rearrangement based on the ordering of the different column groups. I know you ask for a general solution, but is hard to identify the general solution you need without knowing what types of column rearrangements you need. If it is always flipping two sets of consecutive column groups, that would be easier to define without having to type it all out. What I have done here would require you to manually type out the order of the different column groups that you would like the rows to be rearranged as. The SortedDataFrame object seems to be what you are looking for, but might not actually reflect your real data. I removed columns 1 and 2 in this operation since those are ID and group which you don't want overridden.
SortedDataFrame=DataFrame
SortedDataFrame[FlipRows,-c(1,2)]=DataFrame[FlipRows,c(ColumnGroups$Two,ColumnGroups$One,ColumnGroups$Four,ColumnGroups$Three)]
This solution won't work if you need to rearrange each row differently, but it is unclear if that is the case. Try to provide any of the other info requested here, and let me know where this solution doesn't work for you, and that.

R: Pairwise Matrix Manipulation & Variable Construction with Many Groups

I'm starting with data of scores at the "group-person" level as follows:
group_id person_id score
1 1 3
1 2 1
1 3 5
2 1 3
2 2 3
2 3 6
The goal is to generate data on person-person pairs that looks like the following:
person_id1 person_id2 sumsquarederror
1 2 4
1 3 13
2 3 25
where the "sumsquarederror" variable is defined as the sum across all groups of the squared differences in score values for each possible pair of persons. In mathspeak, this variable would be defined like: for persons i=1 and i=2 and groups j=(1,...,J)
sumsquarederror(i=1,i=2) = sum_j (( score(i=1) - score(i=2) )^2)
Building this data is trivial with small numbers of groups and persons, but I have roughly 1,000 groups and 150,000 persons, so creating matrices/dataframes for all combinations possible quickly becomes computationally burdensome (=150K by 150K by 1K, before collapsing to the sumsquarederror variable)
I'm guessing there might be some linear algebra approaches or regression-type ideas, but am stumped. Any tips or tricks or useful packages would be greatly appreciated!

Grouping and building intervals of data in R and useful visualization

I have some data extracted via HIVE. In the end we are talking of csv with around 500 000 rows. I want to plot them after grouping them in intervals.
Beside the grouping it's not clear how to visualize the data. Since we are talking about low spends and sometimes a high frequency I'm not sure how to handle this problem.
Here is just an overview via head(data)
userid64 spend freq
575033023245123 0.00924205 489
12588968125440467 0.00037 2
13830962861053825 0.00168 1
18983461971805285 0.001500366 333
25159368164208149 0.00215 1
32284253673482883 0.001721303 222
33221593608613197 0.00298 709
39590145306822865 0.001785281 11
45831636009567401 0.00397 654
71526649454205197 0.000949978 1
78782620614743930 0.00552 5
I want to group the data in intervals. So I want an extra columns indicating the groups. The first group should contain all data with an frequency (called freq) between 1 and 100. The second group should contain all rows where there entries have a frequency between 101 and 200... and so on.
The result should look like
userid64 spend freq group
575033023245123 0.00924205 489 5
12588968125440467 0.00037 2 1
13830962861053825 0.00168 1 1
18983461971805285 0.001500366 333 3
25159368164208149 0.00215 1 1
32284253673482883 0.001721303 222 2
33221593608613197 0.00298 709 8
39590145306822865 0.001785281 11 1
45831636009567401 0.00397 654 7
71526649454205197 0.000949978 1 1
78782620614743930 0.00552 5 1
Is there a nice and gentle art to get this? I need this grouping for upcoming plots. I want to do visualization for all intervals to get an overview regarding the spend. If you have any ideas for the visualization please let me know. I thought I should work with boxplots.
If you want to group freq for every 100 units, you can try ceiling function in base R
ceiling(df$freq / 100)
#[1] 5 1 1 4 1 3 8 1 7 1 1
where df is your dataframe.

Simple formula to ensure two teams get mixed up?

Say I have a number of users who are evenly split between two teams (if there is an odd count then one team will have one extra player).
I want to make sure everyone gets to be on the same team as everyone else over the course of 3 games with team changes after each game.
What's an easy mechanism for doing this for any number of players?
If it makes it easier for the purpose of explanation, I can give each player a number from 1 to N (where N is the number of players).
TIA
Let's assume we have 6 players, 1 - 6. You want to create different teams for 3 rounds of play.
For the first round, you deal the players.
1 2
3 4
5 6
For the second round, you put the winning team first, then the losing team. Let's assume that the team with player 1 won. Then the list of players would look like this.
1 3 5 2 4 6
And you would deal them like this.
1 3
5 2
4 6
For the third round, you do the same as the second round. This time, let's assume the team with player 3 won.
3 2 6 1 5 4
And you would deal them like this.
3 2
6 1
5 4
With only 3 rounds and many more than 6 players, everybody isn't going to be able to play everybody. But this shuffling algorithm gives a good mixture and is relatively simple to implement.

how to add variable in longitudinal data using SPSS or R?

I have a file with repeated measures data and another file with single observations for the same persons (e.g. in one file subjects have repeated assessments and the other file just says if subjects are male or female) when I merge the files I get something like this:
ID time gender
1 1 0
1 2
1 3
2 1 1
2 2
3 1 0
3 2
3 3
3 4
but I want that the variable that was measured once (e.g.male/female) to be repeated across time (in each row) for each subject. So I would like to have :
1 1 0
1 2 0
1 3 0
2 1 1
2 2 1
and not do it manually, since I have thousands of cases...
How to do this in SPSS (preferably), or in R ?
You should have used match files with one "file" (multiple record per ID) and one "table" (no duplicate ID's).
But you can probably still fix it by running
sort cases by ID.
if mis(gender) and ID = lag(ID) gender= lag(gender).
Wherever there's no value for gender, it will be filled in with the gender of the previous case if it has the same ID as the current one.

Resources