Formatting data for two-sample t-tests in R

Suppose I have a dataset with the following information:
1) Number (of products bought, for example)
1 2 3
2) Frequency for each number (e.g., how many people purchased that number of products)
2 5 10
Let's say I have the above information for each of two groups: control and test.
How do I format the data such that it would look like this:
controldata<-c(1,1,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
(each number repeated by its frequency, listed as a vector)
testdata<- (similar to above)
so that I can perform a two-sample t-test for independent samples in R?
If I don't even need to make them vectors, or if there's an alternative clever way to format the data for the t-test, please let me know!
It would be simple if the vectors were small like the ones above, but the frequency can exceed 10,000 for each number.
P.S.
Control and test data have a different sample size.
Thanks!

Use rep. Using your data above
rep(c(1, 2, 3), c(2, 5, 10))
# [1] 1 1 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
Or, for your case
control_data = rep(n_bought, frequency)
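Putting it together, a minimal sketch assuming the counts live in vectors named n_bought and frequency as above (the test-group frequencies here are made up for illustration):

```r
# expand number/frequency pairs into vectors of raw observations
n_bought     <- c(1, 2, 3)
control_freq <- c(2, 5, 10)
test_freq    <- c(4, 6, 8)   # hypothetical frequencies for the test group

controldata <- rep(n_bought, control_freq)
testdata    <- rep(n_bought, test_freq)

# two-sample t-test for independent samples; t.test applies the Welch
# correction by default, so unequal sample sizes are not a problem
t.test(controldata, testdata)
```

rep scales fine to frequencies above 10,000, though at that size you could also compute the t statistic directly from the counts without expanding at all.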

Related

How to count subjects on a longitudinal patient study in R?

I have a database with multiple patient visits, like
1 1 1 1 2 2 3 3 3 3 4 4 4 4
They are stored in a column (although shown here in a row), and I would like to know how to count how many subjects I have. In this case: 4.
I don't know which code to use in R.
Thank you.
If I'm not wrong, you just want to know how many subjects you have.
In your case you have 4 subjects: 1, 2, 3 and 4.
If the column you mention is stored in a data.frame, you have this option:
length(unique(data$subjects))
Or if it's stored in a vector:
length(unique(vector.subjects))
I hope this is what you were looking for.
unique returns the distinct values found in the vector, in this case 1, 2, 3 and 4.
length counts the number of elements of that unique vector (here, 4).
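Put together with the visits column from the question, this gives (a minimal reproducible sketch; the vector name visits is made up):

```r
# the patient-visit column from the question
visits <- c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4)
unique(visits)          # distinct subject IDs: 1 2 3 4
length(unique(visits))  # number of subjects: 4
```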

Combining data using R (or maybe Excel) -- looping to match stimuli

I have two sets of data, which correspond to different experiment tasks that I want to merge for analysis. The problem is that I need to search and match up certain rows for particular stimuli and for particular participants. I'd like to use a script to save some trouble. This is probably quite simple, but I've never done it before.
Here's my problem more specifically:
In the first data set, each row corresponds to a two-alternative forced choice task where two stimuli are presented at a time and the participant selects one. In the second data set, each row corresponds to a single item task where the participants are asked if they have ever seen the stimulus before. The stimuli in the second task match the stimuli in the pairs on the first task (twice as many rows). I want to be able to match up and add two columns to the first dataset--one that states if the leftside item was recognized later and one for the rightside stimulus.
I assume this could be done with nested loops, but I'm not sure if there is an elegant way to do this, or perhaps a package for it.
As I understand it, your first dataset looks something like this:
(dat1 <- data.frame(person=1:2, stim1=1:2, stim2=3:4))
# person stim1 stim2
# 1 1 1 3
# 2 2 2 4
This would mean person 1 got stimuli 1 and 3 and person 2 got stimuli 2 and 4. Then your second dataset looks something like this:
(dat2 <- data.frame(person=c(1, 1, 2, 2), stim=c(1, 3, 4, 2), responded=c(0, 1, 0, 1)))
# person stim responded
# 1 1 1 0
# 2 1 3 1
# 3 2 4 0
# 4 2 2 1
This gives information about how each person responded to each stimulus they were given.
You can merge these two by matching person/stimulus pairs with the match function:
dat1$response1 <- dat2$responded[match(paste(dat1$person, dat1$stim1), paste(dat2$person, dat2$stim))]
dat1$response2 <- dat2$responded[match(paste(dat1$person, dat1$stim2), paste(dat2$person, dat2$stim))]
dat1
# person stim1 stim2 response1 response2
# 1 1 1 3 0 1
# 2 2 2 4 1 0
Another option (starting from the original dat1 and dat2) would be to merge twice with the merge function. You have a little less control on the names of the output columns, but it requires a bit less typing:
merged <- merge(dat1, dat2, by.x=c("person", "stim1"), by.y=c("person", "stim"))
merged <- merge(merged, dat2, by.x=c("person", "stim2"), by.y=c("person", "stim"))
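With the double merge, both response columns come from dat2 and get merge's default .x/.y suffixes; renaming them afterwards recovers the response1/response2 names from the first approach (a sketch; the suffixed column names follow merge's defaults):

```r
dat1 <- data.frame(person = 1:2, stim1 = 1:2, stim2 = 3:4)
dat2 <- data.frame(person = c(1, 1, 2, 2), stim = c(1, 3, 4, 2),
                   responded = c(0, 1, 0, 1))

merged <- merge(dat1, dat2, by.x = c("person", "stim1"), by.y = c("person", "stim"))
merged <- merge(merged, dat2, by.x = c("person", "stim2"), by.y = c("person", "stim"))

# responded.x came from the stim1 merge, responded.y from the stim2 merge
names(merged)[names(merged) == "responded.x"] <- "response1"
names(merged)[names(merged) == "responded.y"] <- "response2"
```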

R : Create and transform a dataframe based on existing dataframe

My original question was poorly defined and confusing. Here it is again with extraneous columns removed for clarity and additional background.
My end game is to create a forced network graph using networkD3. To do this I need two dataframes.
The first dataframe (dfNodes) lists each node in the graph and its grouping. Volt, Miata and Prius are cars, so they get the group value of 1.
Crusader is a bus so it is in group 3. Etc.
dfNodes dataframe:
id name group
0 vehicle 0
1 car 1
2 truck 2
3 bus 3
4 volt 1
5 miata 1
6 prius 1
7 tahoe 2
8 suburban 2
9 crusader 3
From this dataframe I need to construct the dataframe dfLinks to provide the linkages between nodes. It must look like:
dfLinks dataframe:
source target
0 4
0 5
0 6
0 7
0 8
0 9
1 4
1 5
1 6
2 7
2 8
3 9
This shows the following:
vehicle is linked to volt, miata, prius, tahoe, suburban, crusader (they are all vehicles). (0, 4; 0, 5...0, 9)
car is linked to volt, miata, prius (1, 4; 1, 5; 1, 6)
truck is linked to tahoe, suburban (2, 7 ; 2, 8)
bus is linked to crusader (3, 9)
It may appear strange to link vehicle-->model name (volt...crusader)--> type (car/bus/truck) instead of
vehicle--> type --> model name
but that is the form that I need for my graph.
I don't understand the criteria for creating the second data frame: for example, 'car' appears three times in the second data frame but is present four times in the first. If you can provide more precise details about the transformation I can try to help more, but my first suggestion would be to look at the melt() and cast() functions in the reshape package:
http://www.statmethods.net/management/reshape.html
Based on the replies from Phil and aosmith, I revisited my question. I have the luxury of creating the source dataframe (dfNodes), so I approached the problem from further upstream: first creating dfNodes by combining smaller dataframes, then creating dfLinks independently rather than deriving it from dfNodes. I feel my approach is very much a kludge, but it does give me the result I am looking for. Here is the code. Other approaches, advice and criticism welcomed! As you can tell, I am new to R.
# Separate dataframes for the different categories of data
vehicle <- data.frame(source=0, name="vehicle", group=0)
carGroup <- data.frame(source=1, name="car", group=1)
truckGroup <- data.frame(source=2, name="truck", group=2)
busGroup <- data.frame(source=3, name="bus", group=3)
cars <- data.frame(source=c(4, 5, 6),
                   name=c("volt", "miata", "prius"),
                   group=c(1, 1, 1))
trucks <- data.frame(source=c(7, 8),
                     name=c("tahoe", "suburban"),
                     group=c(2, 2))
buses <- data.frame(source=9,
                    name="crusader",
                    group=3)
# 1. Build the dfNodes dataframe
dfNodes <- rbind(vehicle, carGroup, truckGroup, busGroup, cars, trucks, buses)
names(dfNodes)[1] <- "id"
dfNodes
# 2. Build dfLinks dataframe. Only source and target columns needed
# Bind vehicle to the 3 different vehicle dataframes
vehicleToCars <- cbind(vehicle, cars)
vehicleToTrucks <- cbind(vehicle, trucks)
vehicleToBuses <- cbind(vehicle, buses)
# Bind the vehicle groups to the vehicles
carGroupToCars <- cbind(carGroup, cars)
truckGroupToTrucks <- cbind(truckGroup, trucks)
busGroupToBuses <- cbind(busGroup, buses)
# Stack into the final dfLinks dataframe
dfLinks <- rbind(vehicleToCars, vehicleToTrucks, vehicleToBuses,
                 carGroupToCars, truckGroupToTrucks, busGroupToBuses)
names(dfLinks)[4] <- "target"
dfLinks
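For comparison, once dfNodes exists the links can also be derived directly from its id and group columns, since each type node's id happens to equal its group value. A sketch of one alternative (dfNodes is rebuilt inline here so the snippet stands alone):

```r
dfNodes <- data.frame(
  id    = 0:9,
  name  = c("vehicle", "car", "truck", "bus",
            "volt", "miata", "prius", "tahoe", "suburban", "crusader"),
  group = c(0, 1, 2, 3, 1, 1, 1, 2, 2, 3)
)

# the concrete models are the rows after the three category nodes
leaves <- dfNodes[dfNodes$id >= 4, ]

# vehicle (id 0) links to every model; each type node links to the
# models whose group value matches its id
dfLinks <- rbind(
  data.frame(source = 0,            target = leaves$id),
  data.frame(source = leaves$group, target = leaves$id)
)
dfLinks <- dfLinks[order(dfLinks$source, dfLinks$target), ]
```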

finding set of multinomial combinations

Let's say I have a vector of integers 1:6
w=1:6
I am attempting to obtain a matrix of 90 rows and 6 columns that contains the multinomial combinations from these 6 integers taken as 3 groups of size 2.
6!/(2!*2!*2!)=90
So, columns 1 and 2 of the matrix would represent group 1, columns 3 and 4 would represent group 2 and columns 5 and 6 would represent group 3. Something like:
1 2 3 4 5 6
1 2 3 5 4 6
1 2 3 6 4 5
1 2 4 5 3 6
1 2 4 6 3 5
...
Ultimately, I would want to expand this to other multinomial combinations of limited size (because the numbers get large rather quickly) but I am having trouble getting things to work. I've found several functions that do binomial combinations (only 2 groups) but I could not locate any functions that do this when the number of groups is greater than 2.
I've tried two approaches to this:
1) building up the matrix from nothing using for loops, and attempting things with the reshape package (thinking there might be something there for this with melt());
2) working backwards from the permutation matrix (720 rows) by attempting to retain unique rows within groups and/or removing duplicated rows within groups.
Neither worked for me.
The permutation matrix can be obtained with
library(gtools)
dat=permutations(6, 6, set=TRUE, repeats.allowed=FALSE)
I think working backwards from the full permutation matrix is a bit excessive, but I'm trying anything at this point.
Is there a package with a prebuilt function for this? Does anyone have ideas on how I should proceed?
Here is how you can implement your "working backwards" approach:
gps <- list(1:2, 3:4, 5:6)  # column indices of the three groups
get.col <- function(x, j) x[, j]
# TRUE for rows whose selected columns are in ascending order
is.ordered <- function(x) !colSums(diff(t(x)) < 0)
# a row is valid when every group's columns are ascending, which keeps
# exactly one representative per within-group ordering (720 / 2^3 = 90)
is.valid <- Reduce(`&`, Map(is.ordered, Map(get.col, list(dat), gps)))
dat <- dat[is.valid, ]
nrow(dat)
# [1] 90
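An alternative that builds the 90 rows directly with base R's combn, without generating all 720 permutations first (a sketch; it relies only on w from the question):

```r
w <- 1:6
# choose 2 elements for group 1, then 2 of the remaining 4 for group 2;
# the last 2 form group 3, so each multinomial split appears exactly once
g1 <- combn(w, 2, simplify = FALSE)
res <- do.call(rbind, lapply(g1, function(a) {
  rest <- setdiff(w, a)
  g2 <- combn(rest, 2, simplify = FALSE)
  do.call(rbind, lapply(g2, function(b) c(a, b, setdiff(rest, b))))
}))
nrow(res)  # 90 = 6!/(2! * 2! * 2!)
```

Because each group's pair comes out of combn already in ascending order, the rows match the question's example ordering, starting with 1 2 3 4 5 6.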

Unit of Analysis Conversion

We are working on a social capital project so our data set has a list of an individual's organizational memberships. So each person gets a numeric ID and then a sub ID for each group they are in. The unit of analysis, therefore, is the group they are in. One of our variables is a three point scale for the type of group it is. Sounds simple enough?
We want to bring the unit of analysis to the individual level and condense the type of group it is into a variable signifying how many different types of groups they are in.
For instance, person one is in eight groups. Of those groups, three are (1s), three are (2s), and two are (3s). What the individual level variable would look like, ideally, is 3, because she is in all three types of groups.
Is this possible in the least?
## simulate data
## individuals
n <- 10
## groups
g <- 5
## group types
gt <- 3
## individuals * group memberships
N <- 20
## individuals data frame
di <- data.frame(individual=sample(1:n, N, replace=TRUE),
                 group=sample(1:g, N, replace=TRUE))
## groups data frame
dg <- data.frame(group=1:g, type=sample(1:gt, g, replace=TRUE))
## merge
dm <- merge(di, dg)
## order - not necessary, but nice
dm <- dm[order(dm$individual), ]
## number of group types per individual
library(plyr)
dr <- ddply(dm, "individual", function(x) length(unique(x$type)))
> head(dm)
group individual type
2 2 1 2
8 2 1 2
20 5 1 1
9 3 3 2
12 3 3 2
17 4 3 2
> head(dr)
individual V1
1 1 2
2 3 1
3 4 2
4 5 1
5 6 1
6 7 1
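The same per-individual count can also be computed in base R without plyr, for example with tapply (a sketch on a small hand-made stand-in for dm):

```r
# minimal stand-in for the merged data frame dm above
dm <- data.frame(individual = c(1, 1, 1, 3, 3, 3),
                 type       = c(2, 2, 1, 2, 2, 2))

# number of distinct group types per individual
types_per_person <- tapply(dm$type, dm$individual, function(x) length(unique(x)))
types_per_person
# individual 1 is in 2 distinct types; individual 3 in 1
```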
I think what you're asking is whether it is possible to count the number of unique types of group to which an individual belongs.
If so, then that is certainly possible.
I wouldn't be able to tell you how to do it in R since I don't know a lot of R, and I don't know what your data looks like. But there's no reason why it wouldn't be possible.
Is this data coming from a database? If so, then it might be easier to write a SQL query to compute the value you want, rather than to do it in R. If you describe your schema, there should be lots of people here who could give you the query you need.
