I'm trying to write a function that will allow me to change the case of certain fields in my data frame to lowercase. I'm trying to do this by using the function, for, and tolower commands, but I'm not having any luck. I'm still fairly new to R, so I might be missing something obvious. I would appreciate any help anyone can provide.
standardize_lowercase <- function(df, objs) {
for(i in 1:length(objs)) {
df[i] <- tolower(df[i])
}
}
I'm using df to refer to my main data frame, and objs would be a character vector with the names of the fields from the data frame I would like to convert to lowercase.
We can use the dplyr package as follows. Provide the column names as a string and tolower to the mutate_at function.
library(dplyr)
# Create example data frame
dat <- tibble(A = c("A", "B", "C"),
              B = c("A", "B", "C"),
              C = c("A", "B", "C"),
              D = c("A", "B", "C"),
              E = c("A", "B", "C"))
# Assuming that we want to change columns B, C, and E to lower case
obj <- c("B", "C", "E")
# mutate_at accepts a character vector of column names and a function
dat2 <- dat %>%
  mutate_at(obj, tolower)
dat2
# # A tibble: 3 x 5
# A B C D E
# <chr> <chr> <chr> <chr> <chr>
# 1 A a a A a
# 2 B b b B b
# 3 C c c C c
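In dplyr 1.0.0 and later, mutate_at() is superseded by across(). A minimal sketch of the same conversion in that style, assuming a recent dplyr is installed (dat3 is just an illustrative name):

```r
library(dplyr)

dat <- tibble(A = c("A", "B", "C"),
              B = c("A", "B", "C"),
              C = c("A", "B", "C"),
              D = c("A", "B", "C"),
              E = c("A", "B", "C"))
obj <- c("B", "C", "E")

# all_of() selects exactly the columns named in the character vector obj
dat3 <- dat %>%
  mutate(across(all_of(obj), tolower))
```

all_of() raises an error if any name in obj is missing from the data, which tends to surface typos early.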
Or here is a base R solution using lapply.
dat[obj] <- lapply(dat[obj], tolower)
dat
# # A tibble: 3 x 5
# A B C D E
# <chr> <chr> <chr> <chr> <chr>
# 1 A a a A a
# 2 B b b B b
# 3 C c c C c
Here is the second option wrapped in a function.
dat_tolower <- function(data, target){
data[target] <- lapply(data[target], tolower)
return(data)
}
dat_tolower(dat, target = obj)
# # A tibble: 3 x 5
# A B C D E
# <chr> <chr> <chr> <chr> <chr>
# 1 A a a A a
# 2 B b b B b
# 3 C c c C c
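For reference, the loop in the question fails for three reasons: 1:length(objs) produces positions 1, 2, ... rather than the columns named in objs; tolower(df[i]) is applied to a one-column data frame instead of a vector; and the function never returns the modified data frame. A loop-based sketch that addresses these points (the example data frame is made up for illustration):

```r
standardize_lowercase <- function(df, objs) {
  for (nm in objs) {
    # df[[nm]] extracts the column as a vector; df[nm] would be a one-column data frame
    df[[nm]] <- tolower(df[[nm]])
  }
  df  # return the modified copy; the caller's data frame is not changed in place
}

# Hypothetical example data
example <- data.frame(A = c("A", "B"), B = c("X", "Y"), stringsAsFactors = FALSE)
result <- standardize_lowercase(example, objs = "B")
```

Because R passes arguments by value, the result has to be assigned, for example example <- standardize_lowercase(example, "B").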
How do I rearrange the rows in tibble?
I want to reorder the rows so that the row with x == "c" goes to the bottom of the tibble and everything else stays in the same order.
library(dplyr)
tbl <- tibble(x = c("a", "b", "c", "d", "e", "f", "g", "h"),
y = 1:8)
An alternative to dplyr::arrange(), using base R:
tbl[order(tbl$x == "c"), ] # Thanks to Merijn van Tilborg
Output:
# x y
# <chr> <int>
# 1 a 1
# 2 b 2
# 3 d 4
# 4 e 5
# 5 f 6
# 6 g 7
# 7 h 8
# 8 c 3
The dplyr version, which gives the same row order:
tbl |> dplyr::arrange(x == "c")
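Both the order() and arrange() calls work because a logical vector sorts with FALSE before TRUE, and tied values keep their original relative order. A small base R sketch of the intermediate steps (is_c and idx are illustrative names, and a plain data frame stands in for the tibble):

```r
tbl <- data.frame(x = c("a", "b", "c", "d", "e", "f", "g", "h"),
                  y = 1:8,
                  stringsAsFactors = FALSE)

is_c <- tbl$x == "c"   # TRUE only for the row to move
idx  <- order(is_c)    # FALSE rows first, in their original order, then TRUE rows
reordered <- tbl[idx, ]
```

Here idx comes out as 1, 2, 4, 5, 6, 7, 8, 3, so row 3 ends up last.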
Using forcats, relevel x as a factor with "c" as the last level inside arrange(). Because the releveling only happens inside arrange(), the class of column x is not changed.
library(forcats)
tbl %>%
arrange(fct_relevel(x, "c", after = Inf))
# # A tibble: 8 x 2
# x y
# <chr> <int>
# 1 a 1
# 2 b 2
# 3 d 4
# 4 e 5
# 5 f 6
# 6 g 7
# 7 h 8
# 8 c 3
If the order of x matters later on, it is better to keep x as a factor. The code below changes the column from character to factor, with "c" as the last level:
tbl %>%
mutate(x = fct_relevel(x, "c", after = Inf)) %>%
arrange(x)
Let's say I've got some data:
data <- tibble(A = c("a", "b", "c", "d"),
B = c("e", "f", "g", NA_character_),
C = c("h", "i", NA_character_, NA_character_))
Which looks like this:
# A tibble: 4 x 3
A B C
<chr> <chr> <chr>
1 a e h
2 b f i
3 c g NA
4 d NA NA
What I'd like to do is get the value that's furthest to the right into a new column:
# A tibble: 4 x 4
A B C D
<chr> <chr> <chr> <chr>
1 a e h h
2 b f i i
3 c g NA g
4 d NA NA d
I know I could do it with case_when and a bunch of logical !is.na(A) ~ A, statements, but say I've got a load of columns and that's not feasible. I feel like there probably is an easy way that I just don't know about and haven't been able to find. Thanks
coalesce would be easier:
library(dplyr)
data %>%
mutate(D = coalesce(C, B, A))
Output:
# A tibble: 4 x 4
# A B C D
# <chr> <chr> <chr> <chr>
#1 a e h h
#2 b f i i
#3 c g <NA> g
#4 d <NA> <NA> d
Or, if there are many columns, reverse the column names with rev, convert them to symbols, and splice them into coalesce with !!!:
data %>%
mutate(D = coalesce(!!! rlang::syms(rev(names(.)))))
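For comparison, a base R sketch of the same idea that does not depend on dplyr. It uses max.col() to find, for each row, the position of the last non-missing value. The names m and last_col are illustrative, and a plain data frame stands in for the tibble:

```r
data <- data.frame(A = c("a", "b", "c", "d"),
                   B = c("e", "f", "g", NA),
                   C = c("h", "i", NA, NA),
                   stringsAsFactors = FALSE)

m <- as.matrix(data)
# For each row, the index of the right-most column that is not NA
last_col <- max.col(!is.na(m), ties.method = "last")
# Matrix indexing with (row, column) pairs picks one value per row
data$D <- m[cbind(seq_len(nrow(m)), last_col)]
```

A row that is entirely NA would simply get NA in D, since the last column is picked when every entry ties.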
This is an example we can work with:
df <- tibble(y = c("a", "a", "a", "a", "a", "a"), z = c("b", "b", "b", "b", "b", "b"), a = c("aaa", "aaa", "aaa", "bbb", "bbb", "bbb"),
b = c(1,2,3,1,2,3), c = c(5,10,15,100,95,90))
df
# A tibble: 6 x 5
y z a b c
<chr> <chr> <chr> <dbl> <dbl>
1 a b aaa 1 5
2 a b aaa 2 10
3 a b aaa 3 15
4 a b bbb 1 100
5 a b bbb 2 95
6 a b bbb 3 90
I want to group the values in column y, z and a and combine column b and c to a single string. The final result should look exactly like this:
# A tibble: 2 x 4
y z a result
<chr> <chr> <chr> <chr>
1 a b aaa {"1":5,"2":10,"3":15}
2 a b bbb {"1":100,"2":95,"3":90}
I can almost achieve this with:
b <- by(df[-1:-3], df$a, function(x)
sprintf("{%s}", toString(Reduce(paste0, c(x, "\"", "\":")[c(3, 1, 4, 2)]))))
data.frame(a=unique(df$a), result=do.call(rbind, as.list(b)), row.names=NULL)
a result
1 aaa {"1":5, "2":10, "3":15}
2 bbb {"1":100, "2":95, "3":90}
However, this only groups by column a and not by all three columns (y, z and a). I got a hint that I can fix it with the aggregate function, but I'm having a hard time applying it.
Using dplyr, you can make use of sprintf and paste0:
library(dplyr)
df %>%
group_by(y, z, a) %>%
summarise(result = paste0('{', toString(sprintf('"%d":"%d"', b, c)), '}')) %>%
ungroup %>% data.frame()
# y z a result
#1 a b aaa {"1":"5", "2":"10", "3":"15"}
#2 a b bbb {"1":"100", "2":"95", "3":"90"}
Using by, this can be written as:
do.call(rbind, by(df, list(df$y, df$z, df$a), function(x)
cbind(unique(x[1:3]),
result = paste0('{', toString(sprintf('"%d":"%d"', x$b, x$c)), '}'))))
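Both versions above quote the values and put a space after each comma, while the requested output is {"1":5,"2":10,"3":15}. A minimal sketch that follows the aggregate hint from the question and matches that format, assuming b and c always hold whole numbers (the pair column is a helper introduced only for this example):

```r
df <- data.frame(y = rep("a", 6),
                 z = rep("b", 6),
                 a = rep(c("aaa", "bbb"), each = 3),
                 b = c(1, 2, 3, 1, 2, 3),
                 c = c(5, 10, 15, 100, 95, 90),
                 stringsAsFactors = FALSE)

# Build one "key":value fragment per row; only the key is quoted
df$pair <- sprintf('"%d":%d', df$b, df$c)

# Collapse the fragments within each (y, z, a) group, with no space after the comma
res <- aggregate(pair ~ y + z + a, data = df,
                 FUN = function(p) paste0("{", paste(p, collapse = ","), "}"))
names(res)[names(res) == "pair"] <- "result"
```

%d only works here because the doubles are whole numbers; with fractional values, %s would be the safer format.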
In RStudio, I have a data frame with 4 columns. I need the list of distinct triplets from the first 3 columns, sorted in decreasing order by the sum of the 4th column. For example, with:
A B C 2
D E F 5
A B C 4
G H I 5
D E F 3
I need as a result:
D E F 8
A B C 6
G H I 5
I've tried the following approaches, but I can't get exactly the result I need:
df_list<-df_raw_data %>%
group_by(param1, param2, param3) %>%
summarise_all(total = sum(param4))
arrange(df_list, desc(total))
and:
df_list<-unique(df_raw_data[, c('param1', 'param2', 'param3')])
cbind(df_list, total)
for(i in 1:nrow(df_raw_data))
{
filter ???????????
}
I would prefer to use the dplyr package since it's a more elegant solution.
EDIT: Okay, thanks for your working answers. I think that I've lost some time figuring out that the plyr package shouldn't be loaded after dplyr...
We can use group_by_at to select the columns to group.
library(dplyr)
dat2 <- dat %>%
group_by_at(vars(-V4)) %>%
summarise(V4 = sum(V4)) %>%
ungroup()
dat2
# # A tibble: 3 x 4
# V1 V2 V3 V4
# <chr> <chr> <chr> <int>
# 1 A B C 6
# 2 D E F 8
# 3 G H I 5
Or use group_by_if to select columns to group based on column types.
dat2 <- dat %>%
group_by_if(is.character) %>%
summarise(V4 = sum(V4)) %>%
ungroup()
dat2
# # A tibble: 3 x 4
# V1 V2 V3 V4
# <chr> <chr> <chr> <int>
# 1 A B C 6
# 2 D E F 8
# 3 G H I 5
DATA
dat <- read.table(text = "A B C 2
D E F 5
A B C 4
G H I 5
D E F 3",
header = FALSE, stringsAsFactors = FALSE)
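The two results above are grouped and summed but not sorted by the total, which the question also asks for. A minimal base R sketch of the full task using aggregate and order (totals is only an illustrative name):

```r
dat <- read.table(text = "A B C 2
D E F 5
A B C 4
G H I 5
D E F 3",
                  header = FALSE, stringsAsFactors = FALSE)

# Sum V4 within each distinct (V1, V2, V3) triplet
totals <- aggregate(V4 ~ V1 + V2 + V3, data = dat, FUN = sum)
# Sort by the summed column, largest first
totals <- totals[order(totals$V4, decreasing = TRUE), ]
```

In the dplyr pipelines, the same sort can be obtained by adding arrange(desc(V4)) after ungroup().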
Would this be what you are looking for?
library(dplyr)
df <- tibble(var1 = c("A", "D", "A", "G", "D"),
             var2 = c("B", "E", "B", "H", "E"),
             var3 = c("C", "F", "C", "I", "F"),
             var4 = c(2, 5, 4, 5, 3))
df %>% group_by(var1, var2, var3) %>%
summarise(sum = sum(var4)) %>%
arrange(desc(sum))
I would like to perform a non-trivial group_by: grouping and summarizing a data frame by the individual elements of the lists stored in one of its variables.
df <- data.frame(x = 1:5)
df$y <- list("A", c("A", "B"), "C", c("B", "D", "C"), "E")
df
x y
1 1 A
2 2 A, B
3 3 C
4 4 B, D, C
5 5 E
Now grouping by y (and say counting no. of rows), which is a variable holding lists of elements, the required end results should be:
data.frame(group = c("A", "B", "C", "D", "E"), n = c(2,2,2,1,1))
group n
1 A 2
2 B 2
3 C 2
4 D 1
5 E 1
Because "A" appears in 2 rows, "B" in 2 rows, etc.
Note: the sum of n is not necessarily equal to number of rows in the data frame.
We can use a simple base R solution: unlist the list column, compute the frequencies with table, and then create a data.frame from that table object.
tbl <- table(unlist(df$y))
data.frame(group = names(tbl), n = as.vector(tbl))
# group n
#1 A 2
#2 B 2
#3 C 2
#4 D 1
#5 E 1
Or another option with tidyverse
library(dplyr)
library(tidyr)
# tidyr >= 1.0.0 expects the list column to be named via cols
unnest(df, cols = y) %>%
  group_by(group = y) %>%
  summarise(n = n())
# group n
# <chr> <int>
#1 A 2
#2 B 2
#3 C 2
#4 D 1
#5 E 1
Or, as @alexis_laz mentioned in the comments, an alternative is as.data.frame.table:
as.data.frame(table(group = unlist(df$y)), responseName = "n")
A simple base R solution (this is probably a duplicate question, though I was unable to locate the original):
sapply(unique(unlist(df$y)), function(x) sum(grepl(x, df$y)))
# A B C D E
# 2 2 2 1 1
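Two caveats about the approaches above: grepl does pattern matching on the deparsed list elements, so a group named "A" would also match an element such as "AB", and table(unlist(...)) counts occurrences rather than rows, which differs if a value is repeated within one row. A sketch that counts rows by exact membership instead (groups, n and counts are illustrative names):

```r
df <- data.frame(x = 1:5)
df$y <- list("A", c("A", "B"), "C", c("B", "D", "C"), "E")

groups <- sort(unique(unlist(df$y)))

# For each group, count the rows whose list element contains it exactly
n <- vapply(groups, function(g) {
  sum(vapply(df$y, function(el) g %in% el, logical(1)))
}, integer(1))

counts <- data.frame(group = groups, n = unname(n), stringsAsFactors = FALSE)
```

For the example data this gives the same counts as the other answers; the results only differ for overlapping names or repeated values within a row.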