dplyr filter tens of columns [duplicate]

dplyr filter tens of columns [duplicate] - r

This question already has answers here:
filter for complete cases in data.frame using dplyr (case-wise deletion)
(7 answers)
Closed 5 years ago.
Suppose I have a 27 columns data frame. The first column is the ID, and the rest of columns (A to Z) are just data. I want to take out all the rows whose A to Z columns are NA. How should I do it?
The straightforward way is just
data %>%
filter(!(is.na(A) & is.na(B) .... & is.na(Z)))
Is there a more efficient or easier way to do it?
This question is different from This one because I want to exclude rows whose value are ALL NA, and keep the rows whose value are partially NA.

Using tidyverse:
library(tidyverse)
Load data:
ID <- c(1:8)
Col1<-c(34564,NA,43456,NA,45655,6789,99999,87667)
Col2<-c(34565,43456,55555,NA,65433,22234,NA,98909)
Col3<-c(45673,88789,11123,NA,55676,76566,NA,NA)
mydf <- data_frame(ID,Col1,Col2,Col3)
mydf %>%
slice(which(complete.cases(.)))
Whether you want to preserve selected columns removing rows with all NAs you may run:
mydf %>%
mutate(full_incomplete_cases=rowSums(is.na(.[-1]))) %>%
filter(full_incomplete_cases<length(mydf[,-1])) %>%
select(ID:Col3)

Related

Tidyverse filter by width of variable [duplicate]

This question already has answers here:
Remove all rows where length of string is more than n
(4 answers)
Closed 1 year ago.
I'm working with an untidy dataset and want to filter out any object with an ID shorter than 6 digits (these rows contain errors).
I created a new column that calculates the number of characters for each ID, and then I filter for all objects with 6 or more digits, like so:
clean_df <- df %>%
mutate(chars = nchar(id)) %>%
filter(chars >= 6)
This is working just fine, but I'm wondering if there's an easier way.

Using str_length() from the stringr package (part of the tidyverse):
library(tidyverse)
clean_df <- df %>%
filter(str_length(id) >= 6)

If id's are numeric, just use log10
df %>%
filter(log10(id)>=5)

You can skip mutate
df %>%
filter(nchar(id) >= 6)

Group by and count based on muliple conditions in R [duplicate]

This question already has answers here:
Aggregate multiple columns at once [duplicate]
(2 answers)
Closed 2 years ago.
I have a data frame with 4 columns. I want to produce a new data frame which groups by the first three columns, and provides a count of the instances of "Yes" in the fourth column
So
becomes
How do I do this in R
Thanks for your help

It would be best if I had a set of your actual data to verify this works and returns the output you desire, but the following should work.
library(dplyr)
df %>%
group_by(across(1:4)) %>%
summarize(Count = sum(`Passed Test` == "Y"))

An option with base R
aggregate(`Passed Test` ~ ., df, FUN = function(x) sum(x == "Y"))

Fill In One Data Frame With Another [duplicate]

This question already has answers here:
Lookup value from another column that matches with variable
(3 answers)
Replace values in a dataframe based on lookup table
(8 answers)
Closed 3 years ago.
set.seed(1)
data=data.frame("id"=1:10,
"score"=NA)
data1=data.frame("id"=c(1:3,5,7,9,10),
"score"=sample(50:100,7))
WANT=data.frame("id"=1:10,
"score"=c(83,81,53,NA,59,NA,58,NA,99,67))
I have complete data frame "data" but I do not have values for everybody which is in my second data frame "data1". However for administrative reasons I must use the full data. Basically "WANT" maintains the structure of "data" but fills in the values where they are available.

Here is a simple solution.
set.seed(1)
data=data.frame("id"=1:10,
"score"=NA)
data1=data.frame("id"=c(1:3,5,7,9,10),
"score"=sample(50:100,7))
WANT=data.frame("id"=1:10,
"score"=c(83,81,53,NA,59,NA,58,NA,99,67))
library(tidyverse)
data %>%
select(-score) %>%
left_join(data1)
I may be reaching but maybe you need.
set.seed(1)
data=data.frame("id"=1:10,
"score"=sample(50:100,10))
data1=data.frame("id"=c(1:3,5,7,9,10),
"score"=sample(50:100,7))
WANT=data.frame("id"=1:10,
"score"=c(83,81,53,NA,59,NA,58,NA,99,67))
library(tidyverse)
data %>%
mutate(score1 = score) %>%
select(-score) %>%
left_join(data1) %>%
mutate(score = if_else(is.na(score),
score1,
score)) %>%
select(-score1)

r: Summarizing Data Frame Using Group By [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 4 years ago.
I have daily data (df$date is the daily field):
Which I want to group by week (df$wbm = "week beginning monday") in a new data frame (df2). When I run the below statement, the data frame that is returned is the same as the original:
df2<- df%>%
group_by(wbm)
The function runs without throwing an error, but it just produces the same data frame.
How can I drop date and ensure that my variables are grouped by wbm?

The group_by steps adds a grouping attribute, but we didn't give any command as to how to summarise it. If we need to get the sum of the columns that have column names as 'var' grouped by 'wbm', then use summarise_at
library(dplyr)
df%>%
group_by(wbm) %>%
summarise_at(vars(matches('^var\\d+$')), sum)
If it is only a single column to be summarised, it can be summarise
df %>%
group_by(wbm) %>%
summarise(var1 = sum(var1))

dplyr select column when column name is number [duplicate]

This question already has answers here:
Select multiple columns with dplyr::select() with numbers as names
(2 answers)
Closed 6 years ago.
I want to reshape the data and then select a specific column.
data(ChickWeight)
chick <- ChickWeight %>% spread(Time,weight) %>% filter(Diet=="1")
It creates the column names for me, which are numbers. So how could I select the column that named "0"? I know that %>% select(3) may work, but I need the solution to select columns with their names being number.

Use backticks to select columns with their names being number
data(ChickWeight)
library(dplyr)
library(tidyr)
chick <- ChickWeight %>% spread(Time,weight) %>% filter(Diet==2) %>% select(`0`)

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

dplyr filter tens of columns [duplicate] - r

Related

Tidyverse filter by width of variable [duplicate]

Group by and count based on muliple conditions in R [duplicate]

Fill In One Data Frame With Another [duplicate]

r: Summarizing Data Frame Using Group By [duplicate]

dplyr select column when column name is number [duplicate]

Categories

Resources