Extracting columns from a data frame with at least three values above a cutoff - R

I am new to R programming and need help filtering my data. For example, take the mtcars data set: I want to extract the columns that have at least three values above 18. How do I do that? Thanks.
I have used the sort function, but that only works on one column at a time, not on a whole data frame.

You can get the names of the columns with the following code:
library(dplyr)
library(tidyr)
# n >= 3 keeps the columns with at least three values above 18
columns = mtcars %>% gather() %>% filter(value > 18) %>% count(key) %>% filter(n >= 3) %>%
select(key)
And then filter the dataframe with:
mtcars[, c(t(columns))]
gather transforms the dataframe to one that has two columns:
key is the name of the column
value is the value taken by the observation for the column
Only the values above 18 are kept, and we count the number of remaining observations by key (the name of the column). Columns with at least three such observations are retained.
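The same count-and-filter idea can also be sketched in base R, without reshaping the data. This is a minimal sketch that assumes every column is numeric and contains no NA values, as is the case for mtcars; the names n_above, keep and result are only illustrative.

```r
# Count, for each column, how many values exceed the cutoff
n_above <- colSums(mtcars > 18)

# Names of the columns with at least three values above the cutoff
keep <- names(n_above)[n_above >= 3]

# Subset the data frame; drop = FALSE keeps a data frame even for one column
result <- mtcars[, keep, drop = FALSE]
names(result)
```

For mtcars this keeps mpg, disp, hp and qsec, since no other column has a value above 18.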

Related

Find combinations of rows grouped by a column, picking n rows from each group in data.frame in R

Essentially this problem, but picking 2+ rows from each group.
In the scenario here, each group has 2+ rows, and I need to generate all combinations where I select two of them. Columns contain group, unique IDs, and numerical values.
Example data:
library(dplyr)
dat <- data.frame(Group = c("A","A","A","B","B","C","C","C","D","D","D","D")) %>%
group_by(Group) %>%
mutate(ID = paste(Group, seq_along(Group), sep ="_")) %>%
ungroup() %>%
mutate(value = sample(1:length(ID), replace=TRUE))
How do I find all possible data.frame combinations where n = 2 rows of each group are chosen?
Desired output could be a single list/data.frame of those combinations with unique IDs as in the answer to the linked question above, or preferably, a list of the unique data.frames themselves - each containing the 2 rows per group and their Group, ID, and value columns.
Thank you for your help!
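One possible approach, offered only as a sketch and not as an accepted answer: build every pair of IDs within each group with combn(), then take the Cartesian product of those per-group pairs with expand.grid(). The example data is rebuilt here in base R with a fixed seed, and the names pairs_by_group, idx and combos are hypothetical.

```r
set.seed(42)
dat <- data.frame(Group = c("A","A","A","B","B","C","C","C","D","D","D","D"))
# Number the rows within each group to build unique IDs
dat$ID <- paste(dat$Group,
                ave(seq_along(dat$Group), dat$Group, FUN = seq_along),
                sep = "_")
dat$value <- sample(seq_along(dat$ID), replace = TRUE)

# For each group, every way of choosing 2 of its IDs
pairs_by_group <- lapply(split(dat$ID, dat$Group),
                         function(ids) combn(ids, 2, simplify = FALSE))

# One row per overall combination: an index into each group's list of pairs
idx <- expand.grid(lapply(pairs_by_group, seq_along))

# Build one data frame per combination, holding 2 rows from every group
combos <- lapply(seq_len(nrow(idx)), function(i) {
  ids <- unlist(Map(function(p, j) p[[j]], pairs_by_group, idx[i, ]))
  dat[dat$ID %in% ids, ]
})

length(combos)
```

With group sizes 3, 2, 3 and 4 there are 3 * 1 * 3 * 6 = 54 combinations, so this grows quickly as groups get larger.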

R: Add count for unique values within Group, disregarding other variables within dataframe

I would like to add a new variable to my data frame which, for each group, gives the number of unique entries of one variable (state), while disregarding the others.
Data input
df <- data.frame(id=c(1,2,3,4,5,6,7,8,9),
state=c("CT","CT","AK","TX","TX","AZ","GA","TX","WA"),
group=c(1,1,2,3,3,3,4,4,4),
age=c(12,33,57,98,45,67,16,85,22)
)
df
Desired output
want <- data.frame(id=c(1,2,3,4,5,6,7,8,9),
state=c("CT","CT","AK","TX","TX","AZ","GA","TX","WA"),
group=c(1,1,2,3,3,3,4,4,4),
age=c(12,33,57,98,45,67,16,85,22),
count=c(1,1,1,2,2,2,3,3,3)
)
want
We need to group by group and then count the distinct states with n_distinct:
library(dplyr)
df %>%
group_by(group) %>%
mutate(count = n_distinct(state)) %>%
ungroup
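For comparison, the same per-group distinct count can be sketched in base R with ave(). This is an illustrative alternative, not part of the answer above; it passes row indices to ave() so that the result stays numeric even though state is a character column.

```r
df <- data.frame(id = c(1,2,3,4,5,6,7,8,9),
                 state = c("CT","CT","AK","TX","TX","AZ","GA","TX","WA"),
                 group = c(1,1,2,3,3,3,4,4,4),
                 age = c(12,33,57,98,45,67,16,85,22))

# For the rows of each group, count the distinct states and
# recycle that single number across the group's rows
df$count <- ave(seq_along(df$state), df$group,
                FUN = function(i) length(unique(df$state[i])))
df
```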

Changing a Column to an Observation in a Row in R

I am currently struggling to transition one of my columns in my data to a row as an observation. Below is a representative example of what my data looks like:
library(tidyverse)
test_df <- tibble(unit_name=rep("Chungcheongbuk-do"),unit_n=rep(2),
can=c("Cho Bong-am","Lee Seung-man","Lee Si-yeong","Shin Heung-woo"),
pev1=rep(510014),vot1=rep(457815),vv1=rep(445955),
ivv1=rep(11860),cv1=c(25875,386665,23006,10409),
abstention=rep(52199))
As seen above, the abstention column exists at the end of my data frame, and I would like my data to look like the following:
library(tidyverse)
desired_df <- tibble(unit_name=rep("Chungcheongbuk-do"),unit_n=rep(2),
can=c("Cho Bong-am","Lee Seung-man","Lee Si-yeong","Shin Heung-woo","abstention"),
pev1=rep(510014),vot1=rep(457815),vv1=rep(445955),
ivv1=rep(11860),cv1=c(25875,386665,23006,10409,52199))
Here, abstentions are treated like a candidate, in the can column. Thus, the rest of the data is maintained, and the abstention values are their own observation in the cv1 column.
I have tried using pivot_wider, but I am unsure how to use the arguments to get what I want. I have also considered t() to transpose the column into a row, but I am having a hard time slotting it back into my data. Any help is appreciated! Thanks!
Here's a strategy that will work if you have multiple unit_names:
test_df %>%
group_split(unit_name) %>%
map( function(group_data) {
slice(group_data, 1) %>%
mutate(can="abstention", cv1=abstention) %>%
add_row(group_data, .) %>%
select(-abstention)
}) %>%
bind_rows()
Basically we split the data up by unit_name, then we grab the first row for each group and move the values around. Append that as a new row to each group, and then re-combine all the groups.
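A variation on the same idea, sketched without splitting: build the abstention rows once with distinct() and append them with bind_rows(). This assumes that every column other than can and cv1 is constant within a unit, as in the example data; abst and result are hypothetical names.

```r
library(dplyr)

test_df <- tibble(unit_name = rep("Chungcheongbuk-do"), unit_n = rep(2),
                  can = c("Cho Bong-am","Lee Seung-man","Lee Si-yeong","Shin Heung-woo"),
                  pev1 = rep(510014), vot1 = rep(457815), vv1 = rep(445955),
                  ivv1 = rep(11860), cv1 = c(25875,386665,23006,10409),
                  abstention = rep(52199))

# One row per unit, with abstention recast as a candidate and its vote count
abst <- test_df %>%
  distinct(unit_name, unit_n, pev1, vot1, vv1, ivv1, abstention) %>%
  mutate(can = "abstention", cv1 = abstention)

# Append the new rows (columns are matched by name) and drop the old column
result <- bind_rows(test_df, abst) %>%
  select(-abstention)
result
```

With several units, the abstention rows all land at the end of the result, so a sort by unit_name would be needed to place each one next to its own group.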

R dplyr: select all columns after transforming selected columns

I have a tibble and want to only do some mutation on selected columns. In the case below, all columns which have the word 'date' will be transformed by a function (as.Date()).
After I have performed some transformations on the selected columns, I want to get back all the columns from my tibble df.
Is there a way to do so?
df %>% select(contains('date')) %>% mutate_all(as.Date) %>% select(all)
Thanks
We can use mutate_at instead of select followed by mutate_all. This modifies only the columns of interest and keeps the other columns as they are:
library(dplyr)
df %>%
mutate_at(vars(contains('date')), as.Date)
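Since dplyr 1.0.0, mutate_at() is superseded by across(), so the same transformation can be written as below. The tibble here is made up purely for illustration; its column names are assumptions, not taken from the question.

```r
library(dplyr)

# Hypothetical example data: two date-like character columns and one other column
df <- tibble(id = 1:2,
             start_date = c("2020-01-01", "2020-02-15"),
             end_date = c("2020-03-01", "2020-04-15"),
             note = c("a", "b"))

# Convert only the columns whose names contain "date"; all others are kept as-is
out <- df %>%
  mutate(across(contains("date"), as.Date))
out
```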

Error dplyr summarise

I have a data.frame:
set.seed(1L)
vector <- data.frame(patient=rep(1:5,each=2),medicine=rep(1:3,length.out=10),prob=runif(10))
I want to get the mean of the "prob" column while grouping by patient. I do this with the following code:
vector %>%
group_by(patient) %>%
summarise(average=mean(prob))
This code works perfectly. However, I need to get the same values without using the word "prob" on the "summarise" line. I tried the following code, but it gives me a data.frame in which the column "average" holds 5 identical values, which is not what I want:
vector %>%
group_by(patient) %>%
summarise(average=mean(vector[,3]))
PS: to explain why I need this, I have another data frame with many columns with complex names that all need to be summarised, which is why I can't list them one by one in the summarise command. What I want is to pass a vector there and calculate the mean of each of those columns, grouped by patient.
It appears you want summarise_each
vector %>%
group_by(patient) %>%
summarise_each(funs(mean), matches('prob'))
Using data.table you could do
library(data.table)
setDT(vector)[,lapply(.SD,mean),by=patient,.SDcols='prob']
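In recent dplyr releases summarise_each() and funs() are deprecated, and the same result can be obtained with across(). The sketch below assumes the column names are held in a character vector, here called cols (a hypothetical name), which matches the stated need to avoid typing each column on the summarise line.

```r
library(dplyr)

set.seed(1L)
vector <- data.frame(patient = rep(1:5, each = 2),
                     medicine = rep(1:3, length.out = 10),
                     prob = runif(10))

# Hypothetical vector of the columns to summarise; extend it as needed
cols <- c("prob")

# Mean of every column named in cols, computed within each patient
res <- vector %>%
  group_by(patient) %>%
  summarise(across(all_of(cols), mean))
res
```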
