In RStudio, I have a data frame with 4 columns, and I need the list of distinct triplets from the first 3 columns, sorted in decreasing order by the sum of the 4th column. For example, with:
A B C 2
D E F 5
A B C 4
G H I 5
D E F 3
I need as a result:
D E F 8
A B C 6
G H I 5
I've tried the following approaches, but I can't get exactly the result I need:
df_list<-df_raw_data %>%
group_by(param1, param2, param3) %>%
summarise_all(total = sum(param4))
arrange(df_list, desc(total))
and:
df_list<-unique(df_raw_data[, c('param1', 'param2', 'param3')])
cbind(df_list, total)
for(i in 1:nrow(df_raw_data))
{
filter ???????????
}
I would prefer to use the dplyr package since it's a more elegant solution.
EDIT: Okay, thanks for your working answers. I think that I've lost some time figuring out that the plyr package shouldn't be loaded after dplyr...
We can use group_by_at to select the columns to group.
library(dplyr)
dat2 <- dat %>%
group_by_at(vars(-V4)) %>%
summarise(V4 = sum(V4)) %>%
ungroup()
dat2
# # A tibble: 3 x 4
# V1 V2 V3 V4
# <chr> <chr> <chr> <int>
# 1 A B C 6
# 2 D E F 8
# 3 G H I 5
Or use group_by_if to select columns to group based on column types.
dat2 <- dat %>%
group_by_if(is.character) %>%
summarise(V4 = sum(V4)) %>%
ungroup()
dat2
# # A tibble: 3 x 4
# V1 V2 V3 V4
# <chr> <chr> <chr> <int>
# 1 A B C 6
# 2 D E F 8
# 3 G H I 5
DATA
dat <- read.table(text = "A B C 2
D E F 5
A B C 4
G H I 5
D E F 3",
header = FALSE, stringsAsFactors = FALSE)
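The question also asks for the result sorted by decreasing sum, which the snippets above do not do. A minimal sketch, assuming dplyr 1.0 or later (where `across()` can be used inside `group_by()`); the name `dat_sorted` is only illustrative:

```r
library(dplyr)

dat <- read.table(text = "A B C 2
D E F 5
A B C 4
G H I 5
D E F 3",
                  header = FALSE, stringsAsFactors = FALSE)

dat_sorted <- dat %>%
  group_by(across(-V4)) %>%                      # group by every column except V4
  summarise(V4 = sum(V4), .groups = "drop") %>%  # one row per triplet, no grouping left
  arrange(desc(V4))                              # largest total first

dat_sorted
```

Here `.groups = "drop"` takes the place of the separate `ungroup()` call.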
Would this be what you are looking for?
library(dplyr)
df <- tibble(var1 = c("A", "D", "A", "G", "D"),
var2 = c("B", "E", "B", "H", "E"),
var3 = c("C", "F", "C", "I", "F"),
var4 = c(2, 5, 4, 5, 3))
df %>% group_by(var1, var2, var3) %>%
summarise(sum = sum(var4)) %>%
arrange(desc(sum))
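For comparison, the same grouping and sorting can probably be done without dplyr. This is a sketch in base R using `aggregate()` and `order()`, with the same example values; the name `totals` is an assumption made for illustration:

```r
df <- data.frame(var1 = c("A", "D", "A", "G", "D"),
                 var2 = c("B", "E", "B", "H", "E"),
                 var3 = c("C", "F", "C", "I", "F"),
                 var4 = c(2, 5, 4, 5, 3))

# Sum var4 within each distinct (var1, var2, var3) triplet
totals <- aggregate(var4 ~ var1 + var2 + var3, data = df, FUN = sum)

# Sort by the summed column, largest first
totals <- totals[order(-totals$var4), ]
totals
```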
I'm looking to stack every other column under the previous column in R. Any suggestions?
For example:
c1 c2 c3 c4
A 1 D 4
B 2 E 5
C 3 F 6
dat <- data.frame(
c1 = c("A", "B", "C"),
c2 = c(1, 2, 3),
c3 = c("D", "E", "F"),
c4 = c(4, 5, 6))
To look like this:
c1 c2
A D
1 4
B E
2 5
C F
3 6
dat2 <- data.frame(
c1 = c("A", 1, "B", 2, "C", 3),
c2 = c("D", 4, "E", 5, "F", 6))
Thanks in advance.
A basic way with stack():
as.data.frame(sapply(seq(1, ncol(dat), 2), \(x) stack(dat[x:(x+1)])[[1]]))
# V1 V2
# 1 A D
# 2 B E
# 3 C F
# 4 1 4
# 5 2 5
# 6 3 6
You could also rename the columns to a structured format and pass the data into tidyr::pivot_longer():
library(dplyr)
library(tidyr)
dat %>%
rename_with(~ paste0('c', ceiling(seq_along(.x) / 2), '_', 1:2)) %>%
mutate(across(everything(), as.character)) %>%
pivot_longer(everything(), names_to = c(".value", NA), names_sep = '_')
# # A tibble: 6 × 2
# c1 c2
# <chr> <chr>
# 1 A D
# 2 1 4
# 3 B E
# 4 2 5
# 5 C F
# 6 3 6
The rename line transforms c1 - c4 to c1_1, c1_2, c2_1, c2_2.
For an even number of columns you can do something like:
do.call(cbind, lapply(seq(length(dat)/2), \(x) stack(dat[ ,x + ((x-1):x)])[1])) |>
setNames(names(dat)[seq(length(dat)/2)])
c1 c2
1 A D
2 B E
3 C F
4 1 4
5 2 5
6 3 6
This gets the first column when calling stack and cbinds all the stacked columns.
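The stack() versions return each pair as A, B, C, 1, 2, 3, while the target in the question interleaves the values (A, 1, B, 2, ...). A minimal base R sketch of the interleaved order, assuming an even number of columns; `odd` and `interleave_pair` are hypothetical helper names:

```r
dat <- data.frame(
  c1 = c("A", "B", "C"),
  c2 = c(1, 2, 3),
  c3 = c("D", "E", "F"),
  c4 = c(4, 5, 6))

# Positions of the first column of each pair: 1, 3, ...
odd <- seq(1, ncol(dat), by = 2)

# rbind() puts the two columns in two rows; as.vector() then reads the
# matrix column by column, which alternates between the two inputs
interleave_pair <- function(i) {
  as.vector(rbind(as.character(dat[[i]]), as.character(dat[[i + 1]])))
}

dat2 <- as.data.frame(
  setNames(lapply(odd, interleave_pair), names(dat)[seq_along(odd)]),
  stringsAsFactors = FALSE)
dat2
```

Every value ends up as character, as in the desired dat2, because letters and numbers share a column.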
How do I rearrange the rows in tibble?
I wish to reorder the rows so that the row with x = "c" goes to the bottom of the tibble and everything else stays in the same order.
library(dplyr)
tbl <- tibble(x = c("a", "b", "c", "d", "e", "f", "g", "h"),
y = 1:8)
An alternative to dplyr::arrange(), using base R:
tbl[order(tbl$x == "c"), ] # Thanks to Merijn van Tilborg
Output:
# x y
# <chr> <int>
# 1 a 1
# 2 b 2
# 3 d 4
# 4 e 5
# 5 f 6
# 6 g 7
# 7 h 8
# 8 c 3
tbl |> dplyr::arrange(x == "c")
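Both forms rely on the same idea: the comparison `x == "c"` yields a logical vector, `FALSE` sorts before `TRUE`, and ties keep their original order. A small sketch that makes the intermediate steps visible, using a plain data frame with the same values; `key` and `res` are illustrative names:

```r
tbl <- data.frame(x = c("a", "b", "c", "d", "e", "f", "g", "h"),
                  y = 1:8)

# TRUE only for the row that should move to the bottom
key <- tbl$x == "c"

# Rows with FALSE come first in their original order, then the TRUE row
idx <- order(key)
idx

res <- tbl[idx, ]
res
```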
Using forcats, temporarily treat x as a factor with "c" moved to the last level and arrange by it. Because the releveling happens only inside arrange(), the class of column x is unchanged.
library(forcats)
tbl %>%
arrange(fct_relevel(x, "c", after = Inf))
# # A tibble: 8 x 2
# x y
# <chr> <int>
# 1 a 1
# 2 b 2
# 3 d 4
# 4 e 5
# 5 f 6
# 6 g 7
# 7 h 8
# 8 c 3
If the order of x matters later on, it is better to store x as a factor. The code below converts x from character to factor, with "c" as the last level:
tbl %>%
mutate(x = fct_relevel(x, "c", after = Inf)) %>%
arrange(x)
I have a data frame with five thousand rows. I need to create a new column holding a unique identifier made of the value in column "gender", then the number 21, and then a sequential number starting at 0001. The sequential number must restart for each different letter in column "gender" (gender + 21 + seq#).
df <- tibble::tibble(
name = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
gender = c("F", "F", "F", "M","M","F","M","F","F")
)
df
name gender
<chr> <chr>
1 A F
2 B F
3 C F
4 D M
5 E M
6 F F
7 G M
8 H F
9 I F
With unique identifier:
df
name gender id
1 A F F210001
2 B F F210002
3 C F F210003
4 D M M210001
5 E M M210002
6 F F F210004
7 G M M210003
8 H F F210005
9 I F F210006
Any help on how to achieve this will be very appreciated.
One option is to combine gender with rowid() from data.table, which numbers the rows within each group:
library(dplyr)
library(stringr)
library(data.table)
df1 <- df %>%
mutate(id = str_c(gender, rowid(gender) + 210000))
Or do a group_by/row_number
df1 <- df %>%
group_by(gender) %>%
mutate(id = str_c(cur_group(), row_number() + 210000)) %>%
ungroup()
In base R you could use ave:
transform(df, group = ave(gender, gender, FUN = function(x) sprintf("%s21%04d", x, seq_along(x))))
name gender group
1 A F F210001
2 B F F210002
3 C F F210003
4 D M M210001
5 E M M210002
6 F F F210004
7 G M M210003
8 H F F210005
9 I F F210006
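The `+ 210000` trick in the dplyr versions only gives a four-digit counter while each gender has fewer than 10,000 rows. A hedged alternative is to build the string with `sprintf()`, as the base R answer does, so the "21" prefix and the zero-padded counter stay separate. The name `df_id` is only illustrative:

```r
library(dplyr)

df <- tibble(
  name = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
  gender = c("F", "F", "F", "M", "M", "F", "M", "F", "F")
)

df_id <- df %>%
  group_by(gender) %>%
  # %s = gender letter, literal 21, %04d = counter padded to 4 digits
  mutate(id = sprintf("%s21%04d", gender, row_number())) %>%
  ungroup()

df_id
```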
This is an example we can work with:
df <- tibble(y = c("a", "a", "a", "a", "a", "a"), z = c("b", "b", "b", "b", "b", "b"), a = c("aaa", "aaa", "aaa", "bbb", "bbb", "bbb"),
b = c(1,2,3,1,2,3), c = c(5,10,15,100,95,90))
df
# A tibble: 6 x 5
y z a b c
<chr> <chr> <chr> <dbl> <dbl>
1 a b aaa 1 5
2 a b aaa 2 10
3 a b aaa 3 15
4 a b bbb 1 100
5 a b bbb 2 95
6 a b bbb 3 90
I want to group the values in column y, z and a and combine column b and c to a single string. The final result should look exactly like this:
# A tibble: 2 x 4
y z a result
<chr> <chr> <chr> <chr>
1 a b aaa {"1":5,"2":10,"3":15}
2 a b bbb {"1":100,"2":95,"3":90}
Which I can almost achieve with:
b <- by(df[-1:-3], df$a, function(x)
sprintf("{%s}", toString(Reduce(paste0, c(x, "\"", "\":")[c(3, 1, 4, 2)]))))
data.frame(a=unique(df$a), result=do.call(rbind, as.list(b)), row.names=NULL)
a result
1 aaa {"1":5, "2":10, "3":15}
2 bbb {"1":100, "2":95, "3":90}
This only groups by column a, though, and not by all three columns (y, z and a). I got a hint that I can fix it with the aggregate function, but I'm having a hard time applying it.
Using dplyr, you can make use of sprintf and paste0:
library(dplyr)
df %>%
group_by(y, z, a) %>%
summarise(result = paste0('{', toString(sprintf('"%d":"%d"', b, c)), '}')) %>%
ungroup() %>% data.frame()
# y z a result
#1 a b aaa {"1":"5", "2":"10", "3":"15"}
#2 a b bbb {"1":"100", "2":"95", "3":"90"}
Using by(), this can be written as:
do.call(rbind, by(df, list(df$y, df$z, df$a), function(x)
cbind(unique(x[1:3]),
result = paste0('{', toString(sprintf('"%d":"%d"', x$b, x$c)), '}'))))
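The output above quotes the values and puts a space after each comma, whereas the question asks for strings of the form {"1":5,"2":10,"3":15}. A minimal sketch that should reproduce that exact format, assuming b and c hold whole numbers; `res` is an illustrative name:

```r
library(dplyr)

df <- tibble(y = rep("a", 6), z = rep("b", 6),
             a = rep(c("aaa", "bbb"), each = 3),
             b = c(1, 2, 3, 1, 2, 3), c = c(5, 10, 15, 100, 95, 90))

res <- df %>%
  group_by(y, z, a) %>%
  summarise(
    # "b":c pairs, key quoted, value unquoted, joined by "," without spaces
    result = paste0("{", paste0('"', b, '":', c, collapse = ","), "}"),
    .groups = "drop"
  )

res
```

The difference from the answer is `collapse = ","` in place of `toString()` (which joins with a comma and a space) and no quotes around the value.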
I'm trying to write a function that will allow me to change the case of certain fields in my data frame to lowercase. I'm trying to do this by using the function, for, and tolower commands, but I'm not having any luck. I'm still fairly new to R, so I might be missing something obvious. I would appreciate any help anyone can provide.
standardize_lowercase <- function(df, objs) {
for(i in 1:length(objs)) {
df[i] <- tolower(df[i])
}
}
I'm using df to refer to my main data frame, and objs would be a character vector with the names of the fields from the data frame I would like to convert to lowercase.
We can use the dplyr package as follows. Provide the column names as a string and tolower to the mutate_at function.
library(dplyr)
# Create example data frame
dat <- tibble(A = c("A", "B", "C"),
B = c("A", "B", "C"),
C = c("A", "B", "C"),
D = c("A", "B", "C"),
E = c("A", "B", "C"))
# Assuming that we want to change the column B, C, E to lower case
obj <- c("B", "C", "E")
dat2 <- dat %>%
mutate_at(obj, tolower)
dat2
# # A tibble: 3 x 5
# A B C D E
# <chr> <chr> <chr> <chr> <chr>
# 1 A a a A a
# 2 B b b B b
# 3 C c c C c
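In dplyr 1.0 and later, the scoped verbs such as mutate_at are superseded by across(). A sketch of the equivalent call under that assumption, with all_of() selecting the columns named in the character vector:

```r
library(dplyr)

dat <- tibble(A = c("A", "B", "C"),
              B = c("A", "B", "C"),
              C = c("A", "B", "C"),
              D = c("A", "B", "C"),
              E = c("A", "B", "C"))

obj <- c("B", "C", "E")

# all_of() looks the names up in obj and errors if one is missing from dat
dat2 <- dat %>%
  mutate(across(all_of(obj), tolower))

dat2
```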
Or here is a base R solution using lapply.
dat[obj] <- lapply(dat[obj], tolower)
dat
# # A tibble: 3 x 5
# A B C D E
# <chr> <chr> <chr> <chr> <chr>
# 1 A a a A a
# 2 B b b B b
# 3 C c c C c
Here is how to wrap the second option in a function.
dat_tolower <- function(data, target){
data[target] <- lapply(data[target], tolower)
return(data)
}
dat_tolower(dat, target = obj)
# # A tibble: 3 x 5
# A B C D E
# <chr> <chr> <chr> <chr> <chr>
# 1 A a a A a
# 2 B b b B b
# 3 C c c C c
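For reference, the loop in the question seems to fail for three reasons: it indexes the data frame by position i instead of by the names in objs, it passes a one-column data frame (df[i]) to tolower rather than the column vector, and it never returns the modified data frame. A minimal sketch of the same function with those points changed, keeping the original name and arguments:

```r
standardize_lowercase <- function(df, objs) {
  for (col in objs) {
    # [[ ]] extracts the column vector by name, so tolower works elementwise
    df[[col]] <- tolower(df[[col]])
  }
  df  # return the modified copy; the caller's data frame is not changed in place
}

dat <- data.frame(A = c("A", "B", "C"),
                  B = c("A", "B", "C"),
                  C = c("A", "B", "C"),
                  stringsAsFactors = FALSE)

dat <- standardize_lowercase(dat, c("B", "C"))
dat
```

Since R copies the data frame inside the function, the result has to be assigned back, as in the last assignment.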