Operations on single row in dplyr [duplicate] - r

This question already has answers here:
dplyr mutate/replace several columns on a subset of rows
(12 answers)
Closed 3 years ago.
Is it possible to perform dplyr operations with pipes on single rows of a dataframe? For example, say I have the following dataframe (call it df) and want to do some manipulations to its columns:
df <- df %>%
  mutate(col1 = col1 + col2)
This code sets one column equal to the sum of that column and another. What if I want to do this, but only for a single row?
df[1,] <- df[1,] %>%
  mutate(col1 = col1 + col2)
I realize this is an easy operation in base R, but I am super curious and would love to use dplyr operations and piping to make this happen. Is this possible or does it go against dplyr grammar?
Here's an example. Say I have a dataframe:
df = data.frame(a = rep(1, 100), b = rep(1,100))
The first example I showed:
df <- df %>%
  mutate(a = a + b)
Would result in column a being 2 for all rows.
The second example would only result in the first row of column a being 2.

mutate() is for creating and modifying columns, not single rows.
For a single cell, base R is the natural fit: df[1,1] <- df[1,1] + df[1,2]
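A minimal base R sketch of that single-cell update (the data here is made up for illustration):

```r
# Small two-column example data frame
df <- data.frame(col1 = c(3, 5), col2 = c(2, 7))

# Update only the first row: col1 becomes col1 + col2 for that row
df[1, "col1"] <- df[1, "col1"] + df[1, "col2"]

df
#   col1 col2
# 1    5    2
# 2    5    7
```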

You can combine mutate() and case_when() for conditional manipulation.
df %>%
  mutate(a = case_when(row_number(a) == 1 ~ a + b,
                       TRUE ~ a))
results in
# A tibble: 100 x 2
       a     b
   <dbl> <dbl>
 1     2     1
 2     1     1
 3     1     1
 4     1     1
 5     1     1
 6     1     1
 7     1     1
 8     1     1
 9     1     1
10     1     1
# … with 90 more rows
Data
library(tidyverse)
df <- tibble(a = rep(1, 100), b = rep(1,100))
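For just one row, the same update can also be written a little more compactly with if_else() and row_number() (a sketch along the lines of the case_when() answer, assuming dplyr >= 1.0):

```r
library(dplyr)

df <- tibble(a = rep(1, 100), b = rep(1, 100))

# Only row 1 gets a + b; every other row keeps its original value
df <- df %>%
  mutate(a = if_else(row_number() == 1, a + b, a))

head(df$a, 3)
# [1] 2 1 1
```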


Find a set of column names and replace them with new names using dplyr

I have the data frame below:
library(dplyr)
data = data.frame('A' = 1:3, 'CC' = 1:3, 'DD' = 1:3, 'M' = 1:3)
Now let's define a vector of strings representing a subset of the column names of the above data frame:
Target_Col = c('CC', 'M')
Now I want to find the column names in data that match Target_Col and then replace them with
paste0('Prefix_', Target_Col)
I would prefer to do this within a dplyr chain.
Is there any direct function available to perform this?
vars <- cbind.data.frame(Target_Col, paste0('Prefix_', Target_Col))
data <- data %>%
  rename_at(vars$Target_Col, ~ vars$`paste0("Prefix_", Target_Col)`)
or
data %>% rename_with(~ paste0('Prefix_', Target_Col), all_of(Target_Col))
We may use
library(stringr)
library(dplyr)
data %>%
  rename_with(~ str_c('Prefix_', .x), all_of(Target_Col))
A Prefix_CC DD Prefix_M
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
With dplyr's rename_with:
library(dplyr)
rename_with(data, function(x) ifelse(x %in% Target_Col, paste0("Prefix_", x), x))
A Prefix_CC DD Prefix_M
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
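Another option, if you prefer a single rename() call: splice a named character vector in with !!! (a sketch; rename() takes new_name = old_name pairs, so the vector's names become the new column names):

```r
library(dplyr)

data <- data.frame(A = 1:3, CC = 1:3, DD = 1:3, M = 1:3)
Target_Col <- c('CC', 'M')

# Builds c(Prefix_CC = "CC", Prefix_M = "M") and splices it into rename()
data <- data %>%
  rename(!!!setNames(Target_Col, paste0('Prefix_', Target_Col)))

# names(data) is now "A" "Prefix_CC" "DD" "Prefix_M"
names(data)
```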

How to count cumulative number of implied groupings in a single column of a dataframe in base R or dplyr? [duplicate]

This question already has answers here:
How to create a consecutive group number
(13 answers)
Closed 5 months ago.
Suppose we start with this data frame myDF generated by the code immediately beneath:
> myDF
index
1 2
2 2
3 4
4 4
5 6
6 6
7 6
Generating code: myDF <- data.frame(index = c(2,2,4,4,6,6,6))
I'd like to add a column cumGrp to data frame myDF that provides a cumulative count of implicitly grouped elements, as illustrated below. Any suggestions of simple concise base R or dplyr code to do this?
> myDF
index cumGrp cumGrp explained
1 2 1 1st grouping of same index numbers (2) adjacent to each other
2 2 1 Same as above
3 4 2 2nd grouping of same index numbers (4) adjacent to each other
4 4 2 Same as above
5 6 3 3rd grouping of same index numbers (6) adjacent to each other
6 6 3 Same as above
7 6 3 Same as above
Many possible ways:
dplyr::cur_group_id
library(dplyr)
myDF %>%
  group_by(index) %>%
  mutate(cumGrp = cur_group_id())
cumsum
library(dplyr)
myDF %>%
  mutate(cumGrp = cumsum(index != lag(index, default = 0)))
as.numeric + factor
myDF |>
  transform(cumGrp = as.numeric(factor(index)))
data.table::.GRP
library(data.table)
setDT(myDF)[, cumGrp := .GRP, by = index]
match
myDF |>
  transform(cumGrp = match(index, unique(index)))
collapse::group
library(collapse)
myDF |>
  settransform(cumGrp = group(index))
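dplyr >= 1.1.0 also ships consecutive_id(), which numbers runs in order of appearance (unlike cur_group_id(), which numbers groups by sorted value; the two coincide for this data):

```r
library(dplyr)

myDF <- data.frame(index = c(2, 2, 4, 4, 6, 6, 6))

# consecutive_id() increments whenever index changes from the previous row
myDF <- myDF %>%
  mutate(cumGrp = consecutive_id(index))

myDF$cumGrp
# [1] 1 1 2 2 3 3 3
```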

Filter groups by occurrence of multiple values in another column [duplicate]

This question already has answers here:
Extract tuples with specified common values in another column in SQL
(5 answers)
Closed 3 years ago.
Similar to this question but with an added wrinkle. I'd like to filter only groups of rows that have both of two (or all of several) values in a particular column in any row of the group.
For example, say I have this dataframe:
df <- data.frame(Group=LETTERS[c(1,1,1,2,2,2,3,3,3,3)], Value=c(5, 1:4, 1:4, 5))
And I want the groups (letters) that have both a row with value 4 AND a row with value 5, like this:
Group Value
<fct> <dbl>
1 C 2
2 C 3
3 C 4
4 C 5
I can do that with a pair of any calls inside filter like this:
df %>%
  group_by(Group) %>%
  filter(any(Value == 4),
         any(Value == 5))
Is there a way to do the filter call in one line? Something like this (note it doesn't work; all_of() is a tidyselect helper for choosing columns, not a filtering predicate):
df %>%
  group_by(Group) %>%
  filter(all_of(Value == 4 & Value == 5))
all() is a valid function and can be used in combination with %in% (for vectors of length >= 1):
library(dplyr)
df %>%
  group_by(Group) %>%
  filter(all(c(4, 5) %in% Value))
# A tibble: 4 x 2
# Groups: Group [1]
# Group Value
# <fct> <dbl>
#1 C 2
#2 C 3
#3 C 4
#4 C 5
Or with the sum of a logical vector:
df %>%
  group_by(Group) %>%
  filter(!sum(!c(4, 5) %in% Value))
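For comparison, a base R sketch of the same group-wise filter using ave() (the per-group all(...) result is recycled over the group's rows and coerced to 1/0):

```r
df <- data.frame(Group = LETTERS[c(1,1,1,2,2,2,3,3,3,3)],
                 Value = c(5, 1:4, 1:4, 5))

# TRUE (1) for every row of a group that contains both a 4 and a 5
keep <- ave(df$Value, df$Group, FUN = function(v) all(c(4, 5) %in% v)) == 1

df[keep, ]
#    Group Value
# 7      C     2
# 8      C     3
# 9      C     4
# 10     C     5
```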

how to count repetitions of first occurring value with dplyr

I have a dataframe with groups that essentially looks like this
DF <- data.frame(state = c(rep("A", 3), rep("B",2), rep("A",2)))
DF
state
1 A
2 A
3 A
4 B
5 B
6 A
7 A
My question is how to count the number of consecutive rows where the first value is repeated in its first "block". So for DF above, the result should be 3. The first value can appear any number of times, with other values in between, or it may be the only value appearing.
The following naive attempt fails in general, as it counts all occurrences of the first value.
DF %>% mutate(is_first = as.integer(state == first(state))) %>%
  summarize(count = sum(is_first))
The result in this case is 5. So, hints on a (preferably) dplyr solution to this would be appreciated.
You can try:
rle(as.character(DF$state))$lengths[1]
# [1] 3
In your dplyr chain that would just be:
DF %>% summarize(count_first = rle(as.character(state))$lengths[1])
# count_first
# 1 3
Or to be overzealous with piping, using dplyr and magrittr:
library(dplyr)
library(magrittr)
DF %>% summarize(count_first = state %>%
                   as.character %>%
                   rle %$%
                   lengths %>%
                   first)
# count_first
# 1 3
Works also for grouped data:
DF <- data.frame(group = c(rep(1,4),rep(2,3)),state = c(rep("A", 3), rep("B",2), rep("A",2)))
# group state
# 1 1 A
# 2 1 A
# 3 1 A
# 4 1 B
# 5 2 B
# 6 2 A
# 7 2 A
DF %>% group_by(group) %>% summarize(count_first = rle(as.character(state))$lengths[1])
# # A tibble: 2 x 2
# group count_first
# <dbl> <int>
# 1 1 3
# 2 2 1
No need for dplyr here, but you can adapt this example to use it. The key is the function rle():
state = c(rep("A", 3), rep("B",2), rep("A",2))
x = rle(state)
DF = data.frame(len = x$lengths, state = x$values)
DF
# get the longest run of consecutive "A"
max(DF[DF$state == "A",]$len)
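A pipe-friendly sketch without rle(): rows belong to the first block as long as the cumulative count of departures from the first value is still zero:

```r
library(dplyr)

DF <- data.frame(state = c(rep("A", 3), rep("B", 2), rep("A", 2)))

# state != first(state) is FALSE throughout the initial run, so
# cumsum() stays 0 exactly for those rows; count them with sum()
res <- DF %>%
  summarize(count_first = sum(cumsum(state != first(state)) == 0))

res
#   count_first
# 1           3
```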

Product of several columns on a data frame by a vector using dplyr

I would like to multiply several columns on a dataframe by the values of a vector (all values within the same column should be multiplied by the same value, which will be different according to the column), while keeping the other columns as they are.
Since I'm using dplyr extensively I thought that the mutate_each function might be useful, so I can modify all columns at the same time, but I am completely lost on the syntax of the funs() part.
On the other hand, I've read this solution which is simple and works fine, but only works for all columns instead of the selected ones.
That's what I've done so far:
Imagine that I want to multiply all columns in df except letters by the vector weight_df:
df = data.frame(
letters = c("A", "B", "C", "D"),
col1 = c(3, 3, 2, 3),
col2 = c(2, 2, 3, 1),
col3 = c(4, 1, 1, 3)
)
> df
letters col1 col2 col3
1 A 3 2 4
2 B 3 2 1
3 C 2 3 1
4 D 3 1 3
>
weight_df = c(1:3)
If I use select before applying mutate_each, I get rid of the letters column (as expected), but that's not what I want (apart from the fact that the vector is applied on a per-row basis rather than per column, and I want the opposite):
df = df %>%
  select(-letters) %>%
  mutate_each(funs(. * weight_df))
> df
col1 col2 col3
1 3 2 4
2 6 4 2
3 6 9 3
4 3 1 3
But if I don't select any particular columns, all values within letters are removed (which makes a lot of sense, by the way), but that's not what I want either (again, the vector is applied per row rather than per column):
df = df %>%
  mutate_each(funs(. * weight_df))
> df
letters col1 col2 col3
1 NA 3 2 4
2 NA 6 4 2
3 NA 6 9 3
4 NA 3 1 3
(Please note that this is a very simple dataframe and the original one has way more rows and columns -which unfortunately are not labeled in such an easy way and no patterns can be obtained)
The problem here is that you are basically trying to operate over rows rather than columns, so methods such as mutate_* won't work. If you are not satisfied with the many vectorized approaches proposed in the linked question, then using the tidyverse (and assuming letters is a unique identifier) one way to achieve this is to convert to long form first, multiply a single column by group, and then convert back to wide (don't expect this to be overly efficient, though):
library(tidyr)
library(dplyr)
df %>%
  gather(variable, value, -letters) %>%
  group_by(letters) %>%
  mutate(value = value * weight_df) %>%
  spread(variable, value)
#Source: local data frame [4 x 4]
#Groups: letters [4]
# letters col1 col2 col3
# * <fctr> <dbl> <dbl> <dbl>
# 1 A 3 4 12
# 2 B 3 4 3
# 3 C 2 6 3
# 4 D 3 2 9
Using a pipe with base R's sweep() on the numeric columns only. This gives flexibility for choosing columns and returns the new values along with all the other (non-numeric) columns:
index <- which(sapply(df, is.numeric))
df[, index] <- df[, index] %>% sweep(2, weight_df, FUN = "*")
> df
letters col1 col2 col3
1 A 3 4 12
2 B 3 4 3
3 C 2 6 3
4 D 3 2 9
Try this:
library(plyr)
library(dplyr)
df %>% select_if(is.numeric) %>% adply(., 1, function(x) x * weight_df)
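With current dplyr (>= 1.0), mutate(across(...)) together with cur_column() and a named weight vector keeps everything in one pipe (a sketch; the weights here are named to match the example columns and reproduce the outputs above):

```r
library(dplyr)

df <- data.frame(letters = c("A", "B", "C", "D"),
                 col1 = c(3, 3, 2, 3),
                 col2 = c(2, 2, 3, 1),
                 col3 = c(4, 1, 1, 3))

# One weight per column, looked up by column name inside across()
weights <- c(col1 = 1, col2 = 2, col3 = 3)

df <- df %>%
  mutate(across(all_of(names(weights)), ~ .x * weights[[cur_column()]]))

df
#   letters col1 col2 col3
# 1       A    3    4   12
# 2       B    3    4    3
# 3       C    2    6    3
# 4       D    3    2    9
```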
