How to count subjects in a longitudinal patient study in R?

I have a database with multiple patient visits, like
1 1 1 1 2 2 3 3 3 3 4 4 4 4
They are in a column (although here they are shown in a row), and I would like to know how to count how many subjects I have. In this case, the answer would be 4.
I don't know which code to use in R.
Thank you.

If I'm not wrong, you just want to know how many subjects you have.
In your case you have 4 subjects: 1, 2, 3 and 4.
If the column you mention is stored in a data.frame, for example, one option is:
length(unique(data$subjects))
Or if it's stored in a vector:
length(unique(vector.subjects))
I hope this is what you were looking for.
unique returns the distinct values found in the vector. In this case: 1, 2, 3 and 4.
length counts the number of elements in the vector returned by unique (1, 2, 3 and 4), which gives 4.
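As a minimal sketch of the above, assuming a hypothetical data frame named visits with a column named subject (both names are made up for illustration), the number of distinct subjects and the number of visits per subject could be obtained like this:

```r
# Hypothetical data: one row per visit, subject IDs repeat across visits
visits <- data.frame(subject = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4))

# Number of distinct subjects
n_subjects <- length(unique(visits$subject))
n_subjects
# [1] 4

# Number of visits (rows) recorded for each subject
visits_per_subject <- table(visits$subject)
visits_per_subject
```

table is not needed for the count itself; it is only a quick way to check how many rows each subject contributes.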

Related

Combining data using R (or maybe Excel) -- looping to match stimuli

I have two sets of data, which correspond to different experiment tasks that I want to merge for analysis. The problem is that I need to search and match up certain rows for particular stimuli and for particular participants. I'd like to use a script to save some trouble. This is probably quite simple, but I've never done it before.
Here's my problem more specifically:
In the first data set, each row corresponds to a two-alternative forced choice task where two stimuli are presented at a time and the participant selects one. In the second data set, each row corresponds to a single item task where the participants are asked if they have ever seen the stimulus before. The stimuli in the second task match the stimuli in the pairs on the first task (twice as many rows). I want to be able to match up and add two columns to the first dataset--one that states if the leftside item was recognized later and one for the rightside stimulus.
I assume this could be done with nested loops, but I'm not sure if there is an elegant way to do this, or perhaps a package.
As I understand it, your first dataset looks something like this:
(dat1 <- data.frame(person=1:2, stim1=1:2, stim2=3:4))
#   person stim1 stim2
# 1      1     1     3
# 2      2     2     4
This would mean person 1 got stimuli 1 and 3 and person 2 got stimuli 2 and 4. Then your second dataset looks something like this:
(dat2 <- data.frame(person=c(1, 1, 2, 2), stim=c(1, 3, 4, 2), responded=c(0, 1, 0, 1)))
#   person stim responded
# 1      1    1         0
# 2      1    3         1
# 3      2    4         0
# 4      2    2         1
This gives information about how each person responded to each stimulus they were given.
You can merge these two by matching person/stimulus pairs with the match function:
dat1$response1 <- dat2$responded[match(paste(dat1$person, dat1$stim1), paste(dat2$person, dat2$stim))]
dat1$response2 <- dat2$responded[match(paste(dat1$person, dat1$stim2), paste(dat2$person, dat2$stim))]
dat1
#   person stim1 stim2 response1 response2
# 1      1     1     3         0         1
# 2      2     2     4         1         0
Another option (starting from the original dat1 and dat2) would be to merge twice with the merge function. You have a little less control over the names of the output columns (the two response columns end up as responded.x and responded.y), but it requires a bit less typing:
merged <- merge(dat1, dat2, by.x=c("person", "stim1"), by.y=c("person", "stim"))
merged <- merge(merged, dat2, by.x=c("person", "stim2"), by.y=c("person", "stim"))
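If the default output names from the double merge are a concern, one possible workaround is to rename the columns of dat2 before each merge. This is only a sketch built on the toy dat1 and dat2 above; the helper names left and right are made up for illustration:

```r
dat1 <- data.frame(person = 1:2, stim1 = 1:2, stim2 = 3:4)
dat2 <- data.frame(person = c(1, 1, 2, 2), stim = c(1, 3, 4, 2),
                   responded = c(0, 1, 0, 1))

# Two renamed copies of dat2: one keyed on the left stimulus, one on the right
left  <- setNames(dat2, c("person", "stim1", "response1"))
right <- setNames(dat2, c("person", "stim2", "response2"))

# Merge on person + stimulus for each side in turn
merged <- merge(dat1, left, by = c("person", "stim1"))
merged <- merge(merged, right, by = c("person", "stim2"))

# Restore a readable row and column order
merged <- merged[order(merged$person),
                 c("person", "stim1", "stim2", "response1", "response2")]
merged
```

On this toy data the result should contain the same response1 and response2 values as the match-based approach.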

Formatting data for two sample t-tests on R

Suppose I have the dataset that has the following information:
1) Number (of products bought, for example)
1 2 3
2) Frequency for each number (e.g., how many people purchased that number of products)
2 5 10
Let's say I have the above information for each of the 2 groups: control and test data.
How do I format the data such that it would look like this:
controldata<-c(1,1,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
(each number * frequency listed as a vector)
testdata<- (similar to above)
so that I can perform the two independent sample t-test on R?
If I don't even need to make them a vector / if there's an alternative clever way to format the data to perform the t-test, please let me know!
It would be simple if the vector is small like above, but I can have the frequency>10000 for each number.
P.S.
Control and test data have a different sample size.
Thanks!
Use rep, which repeats each value the given number of times. Using your data above:
rep(c(1, 2, 3), c(2, 5, 10))
# [1] 1 1 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
Or, for your case, with the numbers and frequencies stored in vectors:
control_data = rep(n_bought, frequency)
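Putting this together, here is a minimal sketch of expanding both groups and passing them to t.test. The control frequencies are the ones from the question; the test-group frequencies (4, 6, 5) are invented purely for illustration, as are the variable names:

```r
n_bought <- c(1, 2, 3)

# Frequencies from the question (control) and made-up ones (test)
control_freq <- c(2, 5, 10)
test_freq    <- c(4, 6, 5)   # hypothetical values, not from the question

# Expand each frequency table into one observation per person
control_data <- rep(n_bought, control_freq)
test_data    <- rep(n_bought, test_freq)

# Two independent samples; the groups may have different sizes.
# By default t.test() runs Welch's test (unequal variances).
result <- t.test(control_data, test_data)
result$p.value
```

rep handles large frequencies fine, since it only allocates a vector whose length is the sum of the frequencies.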

R select multiple rows by conditional row number

I have an R data frame like this one:
a<-c(1,2,3,4,5)
b<-c(6,7,8,9,10)
df<-data.frame(a,b)
colnames(df)<-c("a","b")
df
  a  b
1 1  6
2 2  7
3 3  8
4 4  9
5 5 10
I would like to get the 1st, 2nd, 3rd AND 5th row of the column a, so 1 2 3 5, by selecting rows by their number.
I have tried df$a[1:3,5] but I get Error in df$a[1:3, 5] : incorrect number of dimensions.
What DOES work is c(df$a[1:3],df$a[5]) but I was wondering if there was an easier way to achieve this with R?
Your data frame has two dimensions (rows and columns). When you use square brackets to extract values, R expects everything before the comma to indicate the desired rows and everything after the comma to indicate the desired columns (see ?"["). Hence, df[1:3,5] means rows 1 through 3 from column 5. Note that df$a is itself a one-dimensional vector, which is why df$a[1:3,5] fails with "incorrect number of dimensions". To select your desired rows together, you need to concatenate the row numbers into a single vector (i.e., c(1:3,5)). That vector goes before the comma; the column indicator, 1 or "a", goes after the comma. Thus, df[c(1:3,5), 1] is what you need.
As an alternative (one that might be more appropriate for a data frame with many more columns), df[c(1:3, 5), "a"], as suggested by @Mamoun Benghezal, would also get it done!
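For reference, a short sketch of the indexing variants discussed here, using the data frame from the question (the names rows, from_vector, from_frame and sub_df are only illustrative):

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(6, 7, 8, 9, 10))

# Row numbers to keep, combined into a single index vector
rows <- c(1:3, 5)

# Index the column vector directly (one dimension, so no comma)
from_vector <- df$a[rows]
from_vector
# [1] 1 2 3 5

# Index the data frame: rows before the comma, column after it
from_frame <- df[rows, "a"]

# Leave the column slot empty to keep every column for those rows
sub_df <- df[rows, ]
```

The first two forms return the same vector; the last one returns a data frame with four rows and both columns.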

Changing vector of 1-10 to vector of 1-3 using R

I am using R to analyze a survey. Several of the columns include numbers 1-10, depending on how survey respondents answered the respective questions. I'd like to change the 1-10 scale to a 1-3 scale. Is there a simple way to do this? I was writing a complicated set of for loops and if statements, but I feel like there must be a better way in R.
I'd like to change numbers 1-3 to 1; numbers 4 and 8 to 2; numbers 5-7 to 3, and numbers 9 and 10 to NA.
So in the snippet below, OriginalColumn would become NewColumn.
OriginalColumn=c(4,9,1,10,8,3,2,7,5,6)
NewColumn=c(2,NA,1,NA,2,1,1,3,3,3)
Is there an easy way to do this without a bunch of crazy for loops? Thanks!
You can do this using positional indexing:
> c(1,1,1,2,3,3,3,2,NA,NA)[OriginalColumn]
[1] 2 NA 1 NA 2 1 1 3 3 3
It is simpler than repeated/nested ifelse calls because the whole recoding is a single vectorized lookup (thus easier to read, write, and understand; and probably faster). In essence, you're creating a new vector that contains the new value for every value you want to replace. So, for values 1:3 you want 1, thus the first three elements of the vector are 1, and so forth. You then use your original vector as an index to extract the new values based on the positions given by the original values.
You could also try
library(car)
recode(OriginalColumn, '1:3=1; c(4,8)=2; 5:7=3; else=NA')
#[1] 2 NA 1 NA 2 1 1 3 3 3
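The positional-indexing idea can be spelled out a little more, as a sketch. It assumes every answer is a whole number from 1 to 10; the names lookup and survey, and the columns q1 and q2, are hypothetical:

```r
OriginalColumn <- c(4, 9, 1, 10, 8, 3, 2, 7, 5, 6)

# Element i of lookup is the new code for an original answer of i
# answer:   1  2  3  4  5  6  7  8   9  10
lookup <- c(1, 1, 1, 2, 3, 3, 3, 2, NA, NA)

NewColumn <- lookup[OriginalColumn]
NewColumn
# [1]  2 NA  1 NA  2  1  1  3  3  3

# The same lookup applied to several columns of a made-up survey data frame
survey <- data.frame(q1 = c(1, 5, 9), q2 = c(4, 8, 10))
cols <- c("q1", "q2")
survey[cols] <- lapply(survey[cols], function(x) lookup[x])
```

Missing answers (NA) in the original column stay NA, since indexing with NA returns NA.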

create new dataframe based on 2 columns

I have a large dataset "totaldata" containing multiple rows relating to each animal. Some of them are LactationNo 1 readings, and others are LactationNo 2 readings. I want to extract all animals that have readings from both LactationNo 1 and LactationNo 2 and store them in another dataframe "lactboth"
There are 16 other columns of variables of varying types in each row that I need to preserve in the new dataframe.
I have tried merge, aggregate and %in%, but perhaps I'm using them incorrectly, e.g.
(lactboth <- totaldata[totaldata$LactationNo %in% c(1,2), ])
AnimalId is column 1, and LactationNo is column 2. I can't figure out how to select only those AnimalId values that have both LactationNo 1 and LactationNo 2.
Have also tried
lactboth <- totaldata[ which(totaldata$LactationNo==1 & totaldata$LactationNo ==2), ]
I feel like this should be simple, but couldn't find an example to follow quite the same. Help appreciated!!
If I understand your question correctly, then your dataset looks something like this:
  AnimalId LactationNo
1        A           1
2        B           2
3        E           2
4        A           2
5        E           2
and you'd like to select animals that happen to have both lactation numbers 1 & 2 (like A in this particular example). If that's the case, then you can simply use merge:
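# Inner join of the LactationNo 1 rows with the LactationNo 2 rows on AnimalId.
# The trailing [, "AnimalId"] keeps only the IDs of animals present in both.
lactboth <- merge(totaldata[totaldata$LactationNo == 1, ],
                  totaldata[totaldata$LactationNo == 2, ],
                  by.x = "AnimalId",
                  by.y = "AnimalId")[, "AnimalId"]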
