Subset a dataframe according to very specific conditions [duplicate] - r

This question already has answers here:
Collapsing data frame by selecting one row per group
(4 answers)
Remove duplicated rows using dplyr
(6 answers)
Closed 6 years ago.
My apologies for the title; I could not find a clearer one.
Here is reproducible code showing what my data looks like:
subject = gl(3,4,12)
item = factor(c("A","B","B","A","A","A","B","B","A","B","A","B"))
set.seed(123)
rt = runif(12, 1000, 2000)
df = data.frame(subject, item, rt)
> df
   subject item       rt
1        1    A 1287.578
2        1    B 1788.305
3        1    B 1408.977
4        1    A 1883.017
5        2    A 1940.467
6        2    A 1045.556
7        2    B 1528.105
8        2    B 1892.419
9        3    A 1551.435
10       3    B 1456.615
11       3    A 1956.833
12       3    B 1453.334
I would like to subset my data.frame to keep only the first occurrence of each item for each subject.
For each subject, the item order is random and each item has been seen twice, but I would like to keep only the first occurrence.
Is there a simple way to do this?
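One possible approach in base R, sketched under the assumption that the rows are already in presentation order within each subject (as in the example data): duplicated() flags every row whose subject/item pair has already appeared, so negating it keeps only the first occurrence of each pair. The name first_seen is just an illustrative choice.

```r
# Rebuild the example data from the question
subject <- gl(3, 4, 12)
item <- factor(c("A", "B", "B", "A", "A", "A", "B", "B", "A", "B", "A", "B"))
set.seed(123)
rt <- runif(12, 1000, 2000)
df <- data.frame(subject, item, rt)

# duplicated() is TRUE for a row when the same subject/item pair occurred
# in an earlier row, so !duplicated() keeps the first occurrence of each pair
first_seen <- df[!duplicated(df[c("subject", "item")]), ]
first_seen
```

If dplyr is available, distinct(df, subject, item, .keep_all = TRUE) should select the same rows.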

Related

split the lines of a data frame into a variable number of lines based on a character in R [duplicate]

This question already has answers here:
Split delimited strings in a column and insert as new rows [duplicate]
(6 answers)
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 10 months ago.
I have this df:
df = data.frame(ID = c(1,2,3),
A = c("h;d;c", "j;k", "k"))
And I want to get a new df in which each row is split on the ";" character, like this:
ID A
1 1 h
2 1 d
3 1 c
4 2 j
5 2 k
6 3 k
I searched for other questions, but their answers require knowing the exact number of pieces in advance. (Split data frame string column into multiple columns)
Thanks for the help!
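A minimal base R sketch of one way to do this, assuming the only delimiter is a literal ";": strsplit() returns one character vector per row, and lengths() says how many times each ID has to be repeated. The names parts and long are illustrative.

```r
df <- data.frame(ID = c(1, 2, 3),
                 A = c("h;d;c", "j;k", "k"),
                 stringsAsFactors = FALSE)

# One character vector of pieces per original row
parts <- strsplit(as.character(df$A), ";", fixed = TRUE)

# Repeat each ID once per piece, then flatten the pieces into one column
long <- data.frame(ID = rep(df$ID, lengths(parts)),
                   A = unlist(parts),
                   stringsAsFactors = FALSE)
long
```

Because the number of pieces is taken from the data itself, rows with one, two, or three values are handled the same way.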

How to make random sample from dataframe on unique values in R [duplicate]

This question already has answers here:
Take random sample by group
(9 answers)
Sample n random rows per group in a dataframe
(5 answers)
Closed 3 years ago.
I have a dataframe from which I want to draw a random sample--not just any sample but one that contains exactly one randomly sampled row from each of the unique values in the column word:
set.seed(123)
df <- data.frame(
word = sample(LETTERS[1:5], 50, replace = TRUE),
value = sample(1:10, 50, replace = TRUE)
)
head(df)
word value
1 B 1
2 D 5
3 C 8
4 E 2
5 E 6
6 A 3
What I've done to solve this problem is this:
1. Store unique words in vector:
UniqueWords <- unique(df$word)
2. Set up a for loop:
for(i in UniqueWords){
df_sample[i,] <- df[sample(1:nrow(df[df$word==UniqueWords[i], ]), 1), ]
}
The loop, however, does not produce the correct result. How can it be tweaked or, alternatively, what other method can be used?
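The loop has a few problems: i already holds the word itself, so UniqueWords[i] indexes by a character name and yields NA; the sampled number is a position within the subset but is used to index the full df; and df_sample is never created before it is assigned to. One possible alternative, sketched here in base R, is to split the row indices by word and draw one index from each group. The names rows_by_word and picked are illustrative.

```r
set.seed(123)
df <- data.frame(
  word = sample(LETTERS[1:5], 50, replace = TRUE),
  value = sample(1:10, 50, replace = TRUE)
)

# Row numbers of df, grouped by the word they belong to
rows_by_word <- split(seq_len(nrow(df)), as.character(df$word))

# Draw one row number per word; sample.int() on the group length avoids
# the pitfall of sample(x, 1) when x is a single number
picked <- vapply(rows_by_word,
                 function(rows) rows[sample.int(length(rows), 1)],
                 integer(1))

df_sample <- df[picked, ]
df_sample
```

Which rows are drawn depends on the seed and the R version, so no expected output is shown.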

How to count the number of occurrences of the first character of each string in a column in R [duplicate]

This question already has answers here:
Counting unique / distinct values by group in a data frame
(12 answers)
Closed 4 years ago.
I have a data set with a single column containing multiple names.
For example:
Alex
Brad
Chrisitne
Alexa
Brandone
There are almost 100 records like this. I want to display the result as
A 2
B 2
C 1
That is, I need to show the frequencies from highest to lowest, and when there is a tie, the values should be shown in alphabetical order.
I have been trying to solve this but have not been able to.
Any pointers?
df <- data.frame(name = c("Alex", "Brad", "Brad"))
first_characters <- substr(df$name, 1, 1)
result <- sort(table(first_characters), decreasing = TRUE)
# convert the sorted table to a data frame (one row per first character)
data.frame(result)
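To make the alphabetical tie-break explicit rather than relying on how sort() treats ties, one option is to order on two keys. This is a sketch using the five names from the question; the column names letter and n are illustrative.

```r
names_vec <- c("Alex", "Brad", "Chrisitne", "Alexa", "Brandone")

# Count how often each first character occurs
counts <- table(substr(names_vec, 1, 1))

result <- data.frame(letter = names(counts),
                     n = as.vector(counts),
                     stringsAsFactors = FALSE)

# Highest count first; ties are broken alphabetically by letter
result <- result[order(-result$n, result$letter), ]
result
```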

How can I reference a specific row(s) in a data frame using an instance of a column variable in r? [duplicate]

This question already has answers here:
Filtering a data frame by values in a column [duplicate]
(3 answers)
Closed 4 years ago.
Let's say I have the following data frame in R:
> patientData
  patientID age diabetes    status
1         1  25   Type 1      Poor
2         2  34   Type 2  Improved
3         3  28   Type 1 Excellent
4         4  52   Type 1      Poor
How can I reference a specific row or group of rows by using the specific value/level of a particular column rather than the row index? For instance, if I wanted to set a variable x to equal all of the rows which contain a patient with Type 1 diabetes or all of the rows that contain a patient in "Improved" status, how would I do that?
Try this one:
library(dplyr)
patientData %>%
filter(diabetes == "Type 1")
Next time, please provide a minimal reproducible example.
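The same selection can also be sketched in base R with logical indexing, which covers the "Improved" case and the combined condition from the question as well. The data frame below is rebuilt by hand from the printed output, so its exact column types are an assumption.

```r
# Reconstructed from the printout in the question (types are assumed)
patientData <- data.frame(
  patientID = 1:4,
  age = c(25, 34, 28, 52),
  diabetes = c("Type 1", "Type 2", "Type 1", "Type 1"),
  status = c("Poor", "Improved", "Excellent", "Poor"),
  stringsAsFactors = FALSE
)

# Rows selected by a column value instead of a row index
type1 <- patientData[patientData$diabetes == "Type 1", ]
improved <- patientData[patientData$status == "Improved", ]

# Rows matching either condition
either <- patientData[patientData$diabetes == "Type 1" |
                        patientData$status == "Improved", ]
```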

Split a data frame by the last column programmatically [duplicate]

This question already has answers here:
Refer to the last column in R
(8 answers)
Closed 5 years ago.
I want to split a data frame, with an arbitrary number of columns, by the last column, without providing a column name or number. Something like [imaginary code land]:
d <- split(MY_DATA, ncol(MY_DATA))
A sample data set might be something like:
pepsi 1
dr_pep 2
coke 1
Our data set has no headers; splitting by the last column would produce a grouping like the following:
dr_pep 2 --> group 2
pepsi 1 --> group 1
coke 1 --> group 1
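# Read the headerless sample data; columns are auto-named V1, V2
df <- read.table(text = 'pepsi 1
dr_pep 2
coke 1', header = FALSE)
# Split on the last column, whatever its position, without naming it
split(df, df[, ncol(df)])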
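As a quick check that this does not depend on the number of columns, here is the same idea on a three-column data frame. The column names brand, size and group are made up for illustration.

```r
df3 <- data.frame(brand = c("pepsi", "dr_pep", "coke"),
                  size = c(12, 20, 12),
                  group = c(1, 2, 1),
                  stringsAsFactors = FALSE)

# df3[[ncol(df3)]] extracts the last column as a vector to split on
groups <- split(df3, df3[[ncol(df3)]])

names(groups)        # group labels come from the last column's values
groups[["1"]]$brand  # rows that fell into group 1
```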
