Removing rows by reference using data.table? [duplicate] - r

This question already has answers here:
How to delete a row by reference in data.table?
(7 answers)
Closed 8 years ago.
I am trying to figure out how to remove a group of rows from a dataset by reference. For example, with this data set:
library(data.table)
testset <- data.table(date = as.Date(c("2013-07-02","2013-08-03","2013-09-04","2013-10-05","2013-11-06")),
yr = c(2013,2013,2013,2013,2013),
mo = c(07,08,09,10,11),
da = c(02,03,04,05,06),
plant = LETTERS[1:5],
product = as.factor(letters[26:22]),
rating = runif(25))
I want to remove all rows where the product is "y". I have no idea how to go about this.

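You can use either of the following commands:
testset_new <- subset(testset, product != "y")
or, in data.table syntax:
testset_new <- testset[product != "y"]
Both return a filtered copy rather than deleting rows in place. data.table has no built-in way to delete rows by reference (which is what the linked duplicate discusses), so assign the result, for example back to testset.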
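A minimal data.table sketch of the same idea, run on a hypothetical 25-row table built with rep() (not the question's exact data). Because the filter returns a copy, the result is assigned back to the same name; `:=` is then used to drop the now-unused factor levels by reference.

```r
library(data.table)

# Hypothetical 25-row table: each of the five products appears five times
testset <- data.table(
  plant   = rep(LETTERS[1:5], times = 5),
  product = factor(rep(letters[26:22], times = 5)),
  rating  = runif(25)
)

# Filtering returns a new data.table, so reassign it to the same name
testset <- testset[product != "y"]

# Several products at once: negate %in%
testset <- testset[!product %in% c("x", "w")]

# Removed products survive as unused factor levels; drop them by reference
testset[, product := droplevels(product)]
testset
```

Only the column update with `:=` happens in place here; the row filtering itself always produces a new table.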

Related

create virtual patient data set [duplicate]

This question already has answers here:
Unique combination of all elements from two (or more) vectors
(6 answers)
Closed 3 years ago.
I would like to create a virtual patient data set in R in which the time points TIME = seq(1, 10, 1) are repeated for each of 3 unique patient IDs.
Thanks to the comment from @markus, you can use expand.grid:
df <- expand.grid(TIME = 1:10, ID = 1:3)
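A small sketch of what expand.grid produces here, plus an equivalent base R construction with rep(), assuming IDs 1 to 3 and TIME 1 to 10 as in the question. expand.grid varies its first argument fastest, so TIME runs from 1 to 10 within each ID.

```r
# Every combination of TIME and ID; TIME varies fastest
df <- expand.grid(TIME = 1:10, ID = 1:3)
nrow(df)  # 30 rows: 10 time points x 3 patients

# Equivalent construction with rep(), with ID as the first column
df2 <- data.frame(ID = rep(1:3, each = 10), TIME = rep(1:10, times = 3))
df2
```

The rep() version makes the row order explicit and lets you choose the column order directly.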

consolidate rows with same value using R [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 4 years ago.
I have a CSV file with 4 columns. I would like to sum the values in the 4th column over rows that share the same value in the 3rd column.
The data that I have looks like this:
Now I want to aggregate the above data like this:
Can anyone help me with some ideas?
Using aggregate would do the trick here. Below I'm summing the value variable using id as the group (notice ids 6 and 10 are repeating).
df <- data.frame(id = c(1,2,3,4,5,6,6,7,8,9,10,10),
value = c(9,5,6,8,4,3,2,5,3,5,1,2))
df_sum <- aggregate(value ~ id, data=df, FUN=sum)
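Because the question refers to columns by position (sum the 4th column grouped by the 3rd), a sketch that indexes by position rather than by name may fit the real file better. The CSV text and its column names below are hypothetical stand-ins; with a real file you would call read.csv() on its path instead.

```r
# Hypothetical 4-column CSV standing in for the real file
csv <- "a,b,group,value
x,1,g1,10
y,2,g2,5
z,3,g1,7
w,4,g2,1"
df <- read.csv(text = csv)

# Sum column 4 within each distinct value of column 3, by position,
# so the code does not depend on the actual column names
totals <- aggregate(df[[4]], by = list(group = df[[3]]), FUN = sum)
totals
```

The summed column comes out named `x`, which is aggregate()'s default name when a bare vector is passed.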

How to count the number of occurrences of the first character of each string of a column in R [duplicate]

This question already has answers here:
Counting unique / distinct values by group in a data frame
(12 answers)
Closed 4 years ago.
I have a data set with a single column containing names.
For example:
Alex
Brad
Christine
Alexa
Brandon
There are almost 100 records like this. I want to count the first character of each name and display the result as:
A 2
B 2
C 1
The counts should be sorted from highest to lowest, and ties should be broken in alphabetical order.
I have been trying to solve this but have not managed to.
Any pointers?
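df <- data.frame(name = c("Alex", "Brad", "Christine", "Alexa", "Brandon"))
# Take the first character of each name
first_characters <- substr(df$name, 1, 1)
# Count each letter and sort from most to least frequent
result <- sort(table(first_characters), decreasing = TRUE)
# Convert the table to a two-column data frame (first_characters, Freq)
data.frame(result)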
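sort() on the table ranks by count, but the question also asks for ties to be broken alphabetically. One way to make that explicit is order() with two keys: the negated count first, then the letter. The names below are made-up sample data, and the toupper() call is an added assumption so that "alex" and "Alex" count toward the same letter.

```r
df <- data.frame(name = c("Alex", "Brad", "Christine", "Alexa", "Brandon",
                          "Dana", "Dan", "Dave"))

# First character of each name, upper-cased so case does not split counts
first_characters <- toupper(substr(df$name, 1, 1))
counts <- table(first_characters)

# Primary key: count, descending (via the minus sign); secondary key: letter, ascending
ord <- order(-as.vector(counts), names(counts))
result <- data.frame(letter = names(counts)[ord], n = as.vector(counts)[ord])
result
```

With this sample data, D (3) comes first, then the tied A and B (2 each) in alphabetical order, then C (1).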

Conditional Replace Values [duplicate]

This question already has answers here:
Update a Value in One Column Based on Criteria in Other Columns
(4 answers)
Closed 5 years ago.
I have a data set called NFL. For rows where PlayType is "Sack", I am trying to replace the NA in PlayerPosition with "QB" and leave all other rows unchanged. I can't figure out the code to make it happen. So far I have this, which is wrong:
NFL$PlayerPosition[NFL$PlayType == "Sack"] <- "QB"
Does this work?
NFL[NFL$PlayType == "Sack",]$PlayerPosition <- "QB"
Is this what you are trying to do? It should work.
# Create dummy data
NFL <- data.frame(PlayType = c("A", "B", "C", "Sack"), PlayerPosition = c(NA, NA, NA, NA))
# Set PlayerPosition to "QB" on the "Sack" rows
NFL[NFL$PlayType == "Sack", ]$PlayerPosition <- "QB"
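A slightly more defensive sketch for real data, assuming (a guess, not stated in the question) that PlayType may contain NA values and that PlayerPosition may have been read in as a factor. It fills only rows where the play is a sack and the position is missing, which matches the goal described above. The sample values are made up.

```r
NFL <- data.frame(
  PlayType       = c("Pass", "Sack", NA, "Sack", "Run"),
  PlayerPosition = factor(c("WR", NA, NA, "QB", "RB"))
)

# In a factor column, a value that is not an existing level becomes NA,
# so convert to character before assigning new labels
NFL$PlayerPosition <- as.character(NFL$PlayerPosition)

# which() keeps only TRUE positions and drops NA comparisons; the data-frame
# form NFL[cond, ]$col <- ... errors if cond contains NA
sack_na <- which(NFL$PlayType == "Sack" & is.na(NFL$PlayerPosition))
NFL$PlayerPosition[sack_na] <- "QB"
NFL
```

The row with a missing PlayType keeps its NA position, because it cannot be confirmed as a sack.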

Padding multiple columns in a data frame or data table [duplicate]

This question already has answers here:
Filling missing dates by group
(3 answers)
Fastest way to add rows for missing time steps?
(4 answers)
Closed 5 years ago.
I have a data frame like the following and would like to pad the dates.
Notice that four days are missing for id 3.
df = data.frame(
id = c(1,1,1,2,2,3,3,3),
date = lubridate::ymd("2017-01-01","2017-01-02","2017-01-03",
"2017-05-10","2017-05-11","2017-01-03",
"2017-01-08","2017-01-09"),
type = c("A","A","A","B","B","C","C","C"),
val1 = rnorm(8),
val2 = rnorm(8))
df
I tried the padr package as I wanted a quick solution, but this doesn't seem to work.
?pad
padr::pad(df)
library(dplyr)
df %>% padr::pad(group = c('id'))
df %>% padr::pad(group = c('id','date'))
Are there any tools or other packages that can pad a data set across multiple columns, based on groupings?
EDIT:
For id 3, the dates present in my df are
"2017-01-03","2017-01-08","2017-01-09"
Thus, I want the final data to include four extra rows for id 3, with the dates
"2017-01-04","2017-01-05","2017-01-06","2017-01-07"
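One base R sketch, with no extra packages, builds the full daily sequence for each id and left-joins the observed rows onto it. It assumes `type` is constant within each id, as in the example, and uses as.Date() in place of lubridate::ymd(); the padded days get NA in val1 and val2.

```r
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3),
  date = as.Date(c("2017-01-01", "2017-01-02", "2017-01-03",
                   "2017-05-10", "2017-05-11", "2017-01-03",
                   "2017-01-08", "2017-01-09")),
  type = c("A", "A", "A", "B", "B", "C", "C", "C"),
  val1 = rnorm(8),
  val2 = rnorm(8)
)

# For each id, list every calendar day between its first and last date,
# carrying along the per-id constant column `type`
grid <- do.call(rbind, lapply(split(df, df$id), function(g) {
  data.frame(id   = g$id[1],
             type = g$type[1],
             date = seq(min(g$date), max(g$date), by = "day"))
}))

# Left-join the observed rows onto the full grid; new days get NA values
padded <- merge(grid, df, by = c("id", "type", "date"), all.x = TRUE)
padded <- padded[order(padded$id, padded$date), ]
rownames(padded) <- NULL
padded
```

Each id is padded only between its own first and last date, so ids 1 and 2 gain no rows and id 3 gains the four days listed above.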
