This question already has answers here:
Remove all rows where length of string is more than n
(4 answers)
Closed 1 year ago.
I'm working with an untidy dataset and want to filter out any object with an ID shorter than 6 digits (these rows contain errors).
I created a new column that calculates the number of characters for each ID, and then I filter for all objects with 6 or more digits, like so:
clean_df <- df %>%
  mutate(chars = nchar(id)) %>%
  filter(chars >= 6)
This is working just fine, but I'm wondering if there's an easier way.
Using str_length() from the stringr package (part of the tidyverse):
library(tidyverse)
clean_df <- df %>%
filter(str_length(id) >= 6)
If the ids are numeric (and positive), you can use log10 instead, since log10(id) >= 5 is the same as id >= 100000:
df %>%
  filter(log10(id) >= 5)
You can skip mutate
df %>%
filter(nchar(id) >= 6)
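For comparison, the same filter can be sketched in base R without any packages. The sample data frame below is made up for illustration and assumes the ids are stored as character strings:

```r
# Hypothetical sample data; ids are stored as character strings
df <- data.frame(id = c("123456", "12345", "1234567", "99"),
                 stringsAsFactors = FALSE)

# Keep only the rows whose id has 6 or more characters
clean_df <- df[nchar(df$id) >= 6, , drop = FALSE]
clean_df$id
```

The drop = FALSE keeps the result a data frame even when only one column is present.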
This question already has answers here:
Aggregate multiple columns at once [duplicate]
(2 answers)
Closed 2 years ago.
I have a data frame with 4 columns. I want to produce a new data frame that groups by the first three columns and gives a count of the instances of "Yes" in the fourth column.
So
becomes
How do I do this in R?
Thanks for your help.
It would be best if I had a set of your actual data to verify this works and returns the output you desire, but the following should work.
library(dplyr)
df %>%
  group_by(across(1:3)) %>%  # group by the first three columns only
  summarize(Count = sum(`Passed Test` == "Y"))
An option with base R
aggregate(`Passed Test` ~ ., df, FUN = function(x) sum(x == "Y"))
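Since the example tables are missing from the question, here is a small sketch of the base R approach on made-up data. The column names Site, Year, Group and Passed are hypothetical, as is the assumption that passes are coded as "Y":

```r
# Hypothetical data: three grouping columns plus a pass/fail column
df <- data.frame(
  Site   = c("A", "A", "B", "B"),
  Year   = c(2020, 2020, 2020, 2021),
  Group  = c("x", "x", "y", "y"),
  Passed = c("Y", "N", "Y", "Y")
)

# Count the "Y" values within each Site/Year/Group combination
res <- aggregate(Passed ~ Site + Year + Group, data = df,
                 FUN = function(x) sum(x == "Y"))
res
```

Each combination of the three grouping columns that occurs in the data gets one row, with the count in the Passed column.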
This question already has answers here:
Lookup value from another column that matches with variable
(3 answers)
Replace values in a dataframe based on lookup table
(8 answers)
Closed 3 years ago.
set.seed(1)
data=data.frame("id"=1:10,
"score"=NA)
data1=data.frame("id"=c(1:3,5,7,9,10),
"score"=sample(50:100,7))
WANT=data.frame("id"=1:10,
"score"=c(83,81,53,NA,59,NA,58,NA,99,67))
I have a complete data frame, "data", but the scores are in a second data frame, "data1", which does not have values for everybody. For administrative reasons I must use the full data. Basically, "WANT" keeps the structure of "data" but fills in the values where they are available.
Here is a simple solution.
set.seed(1)
data=data.frame("id"=1:10,
"score"=NA)
data1=data.frame("id"=c(1:3,5,7,9,10),
"score"=sample(50:100,7))
WANT=data.frame("id"=1:10,
"score"=c(83,81,53,NA,59,NA,58,NA,99,67))
library(tidyverse)
data %>%
  select(-score) %>%
  left_join(data1, by = "id")  # ids missing from data1 get NA
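If you would rather not load a package, a base R lookup with match() is another way to do this. This is only a sketch that reuses the data frames from the question:

```r
set.seed(1)
data  <- data.frame(id = 1:10, score = NA)
data1 <- data.frame(id = c(1:3, 5, 7, 9, 10),
                    score = sample(50:100, 7))

# For each id in data, find its row position in data1 (NA if absent)
idx <- match(data$id, data1$id)

# Indexing with NA yields NA, so ids without a score stay missing
data$score <- data1$score[idx]
data
```

The row order and the number of rows in data are unchanged, which is the main difference from a merge that might reorder rows.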
I may be reaching, but maybe you need to keep the existing scores in data and overwrite them only where data1 has a value:
set.seed(1)
data=data.frame("id"=1:10,
"score"=sample(50:100,10))
data1=data.frame("id"=c(1:3,5,7,9,10),
"score"=sample(50:100,7))
WANT=data.frame("id"=1:10,
"score"=c(83,81,53,NA,59,NA,58,NA,99,67))
library(tidyverse)
data %>%
  mutate(score1 = score) %>%
  select(-score) %>%
  left_join(data1, by = "id") %>%
  mutate(score = if_else(is.na(score),
                         score1,
                         score)) %>%
  select(-score1)
This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 4 years ago.
I have daily data (df$date is the daily field):
I want to group this by week (df$wbm = "week beginning Monday") in a new data frame (df2). When I run the statement below, the data frame that is returned is the same as the original:
df2<- df%>%
group_by(wbm)
The function runs without throwing an error, but it just produces the same data frame.
How can I drop date and ensure that my variables are grouped by wbm?
The group_by step only adds a grouping attribute; we haven't given any command for how to summarise the groups. If we need the sum of the columns whose names are 'var' followed by digits, grouped by 'wbm', then use summarise_at:
library(dplyr)
df%>%
group_by(wbm) %>%
summarise_at(vars(matches('^var\\d+$')), sum)
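In dplyr 1.0.0 and later, summarise_at() is superseded by across(), so the same idea can also be written as below. The sample data is made up for illustration (two weeks of daily rows with columns var1 and var2):

```r
library(dplyr)

# Hypothetical daily data covering two weeks that start on a Monday
df <- data.frame(
  date = as.Date("2021-01-04") + 0:13,
  wbm  = rep(as.Date(c("2021-01-04", "2021-01-11")), each = 7),
  var1 = 1:14,
  var2 = rep(2, 14)
)

# One row per week; date is dropped because it is neither grouped nor summarised
df2 <- df %>%
  group_by(wbm) %>%
  summarise(across(matches("^var\\d+$"), sum))
df2
```

Only the grouping column and the summarised columns survive summarise(), which is why date disappears from df2.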
If only a single column needs to be summarised, plain summarise is enough:
df %>%
group_by(wbm) %>%
summarise(var1 = sum(var1))
This question already has answers here:
Select multiple columns with dplyr::select() with numbers as names
(2 answers)
Closed 6 years ago.
I want to reshape the data and then select a specific column.
data(ChickWeight)
chick <- ChickWeight %>% spread(Time,weight) %>% filter(Diet=="1")
It creates the column names for me, and they are numbers. How can I select the column named "0"? I know that %>% select(3) may work, but I need a solution that selects columns by name when the names are numbers.
Use backticks to select columns whose names are numbers:
data(ChickWeight)
library(dplyr)
library(tidyr)
chick <- ChickWeight %>% spread(Time,weight) %>% filter(Diet==2) %>% select(`0`)
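The backtick rule is not specific to dplyr. Here is a small base R sketch on a made-up data frame (the name wide and its values are hypothetical) showing the same thing with $ and [[:

```r
# Hypothetical data frame whose column names are numbers;
# check.names = FALSE stops data.frame() from renaming them
wide <- data.frame(Chick = c("a", "b"),
                   `0` = c(42, 40),
                   `2` = c(51, 49),
                   check.names = FALSE)

wide$`0`             # backticks let a non-syntactic name be used with $
wide[["0"]]          # a quoted string works with [[ as well
wide[, c("0", "2")]  # and for selecting several such columns
```

Without the backticks or quotes, R would read 0 as the number zero rather than as a column name.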