R dplyr mutate_at accessing colnames

R dplyr mutate_at accessing colnames - r

How could one access the column name being processed by dplyr::mutate_at?
Let's say we would like to convert a column of a data frame into factors with levels stored in a separate list.
df <- data.frame("C1"=c("A","B","C"), "C2"=c("D","E","F"))
df
C1 C2
1 A D
2 B E
3 C F
lst <- list("C2"=c("F","E","D"), "C3"=c("G","H","I"))
lst
$C2
[1] "F" "E" "D"
$C3
[1] "G" "H" "I"
All of the following trigger error or replace all the column values by NA:
df %>%
mutate_at(vars(C2), function(x) factor(x, levels=lst$.))
df %>%
mutate_at(vars(C2), function(x) factor(x, levels=lst[[colnames(.)]]))
df %>%
mutate_at(vars(C2), function(x){col = as.name(.); factor(x, levels=lst$col))

You can use Map in base R or map2 from purrr after getting the common columns using intersect.
cols <- intersect(names(lst), names(df))
df[cols] <- Map(function(x, y) factor(x, levels = y), df[cols], lst[cols])
Or
df[cols] <- purrr::map2(df[cols], lst[cols], ~factor(.x, levels = .y))

Related

Row name of the smallest element

I have the following dataframe:
d <- data.frame(a=c(1,2,3,4), b=c(20,19,18,17))
row.names(d) <- c("A", "B", "C", "D")
I want another data.frame, with the same columns and 2 rows, which contain the row names of the 2 smallest elements in that column.
In the example the expected result would be:
# Expected results
exp <- data.frame(a=c("A", "B"), b=c("C","D"))

We loop over the columns with lapply, order the values, use that index to subset the n corresponding row.names of 'd', and wrap with data.frame
n <- 2
data.frame(lapply(d, function(x) sort(head(row.names(d)[order(x)], n))))
-output
# a b
#1 A C
#2 B D
With R 4.1.0, we can also use the |> operator for chaining the functions (applied in the order for easier understanding) along with \(x) - for lambda function in base R
# // ordered the column values
# // get corresponding row names
lapply(d, \(x) row.names(d)[order(x)] |>
head(n) |> # // get the first n values
sort()) |> # // sort them
data.frame() # // convert the list to data.frame
# a b
#1 A C
#2 B D
Or using dplyr
library(dplyr)
d %>%
summarise(across(everything(),
~ sort(head(row.names(d)[order(.)], n))))
# a b
#1 A C
#2 B D

Using sapply in base R -
rn <- rownames(d)
sapply(d, function(x) rn[order(x) %in% 1:2])
# a b
#[1,] "A" "C"
#[2,] "B" "D"

Find unique level of list of data set each column

I have a list of 18 datasets, each dataset has some columns, how I write a loop to find the intersect by the index of column, and return list of index of column.
df1 <- data.frame(id = c(1:5), loc = c("a","b","c","a","b"))
df2 <- data.frame(id = c(3:7), ta = c("c","b","d","a","b"))
df3 <- data.frame(id = c(1:5), az = c("d","a","e","d","b"))
df <- list(df1, df2, df3)
df <- lapply(df, function(i) lapply(i, function(j) as.character(j)))
intersect(df[[1]][1], df[[2]][1], df[[3]][1])
intersect(df[[1]][2], df[[2]][2], df[[3]][2])

With tidyverse, we can use map/reduce
library(purrr)
library(dplyr)
map(df, pull, 1) %>%
reduce(intersect)
#[1] 3 4 5
Or as a function
f1 <- function(lstA, ind) {
map(lstA, pull, ind) %>%
reduce(intersect)
}
f1(df, 1)
#[1] 3 4 5
f1(df, 2)
#[1] "a" "b"

You may use Reduce on the intersect function and the [ in an sapply to choose sub list number.
Single:
Reduce(intersect, sapply(df, `[`, 1))
# [1] "3" "4" "5"
Reduce(intersect, sapply(df, `[`, 2))
# [1] "a" "b"
Or altogether:
lapply(1:2, function(i) Reduce(intersect, sapply(df, `[`, i)))
# [[1]]
# [1] "3" "4" "5"
#
# [[2]]
# [1] "a" "b"

Matching across datasets and columns

I have a vector with words, e.g., like this:
w <- LETTERS[1:5]
and a dataframe with tokens of these words but also tokens of other words in different columns, e.g., like this:
set.seed(21)
df <- data.frame(
w1 = c(sample(LETTERS, 10)),
w2 = c(sample(LETTERS, 10)),
w3 = c(sample(LETTERS, 10)),
w4 = c(sample(LETTERS, 10))
)
df
w1 w2 w3 w4
1 U R A Y
2 G X P M
3 Q B S R
4 E O V T
5 V D G W
6 T A Q E
7 C K L U
8 D F O Z
9 R I M G
10 O T T I
# convert factor to character:
df[] <- lapply(df[], as.character)
I'd like to extract from dfall the tokens of those words that are contained in the vector w. I can do it like this but that doesn't look nice and is highly repetitive and error prone if the dataframe is larger:
extract <- c(df$w1[df$w1 %in% w],
df$w2[df$w2 %in% w],
df$w3[df$w3 %in% w],
df$w4[df$w4 %in% w])
I tried this, using paste0 to avoid addressing each column separately but that doesn't work:
extract <- df[paste0("w", 1:4)][df[paste0("w", 1:4)] %in% w]
extract
data frame with 0 columns and 10 rows
What's wrong with this code? Or which other code would work?

To answer your question, "What's wrong with this code?": The code df[paste0("w", 1:4)][df[paste0("w", 1:4)] %in% w] is the equivalent of df[df %in% w] because df[paste0("w", 1:4)], which you use twice, simply returns the entirety of df. That means df %in% w will return FALSE FALSE FALSE FALSE because none of the variables in df are in w (w contains strings but not vectors of strings), and df[c(F, F, F, F)] returns an empty data frame.
If you're dealing with a single data type (strings), and the output can be a character vector, then use a matrix instead of a data frame, which is faster and is, in this case, a little easier to subset:
mat <- as.matrix(df)
mat[mat %in% w]
#[1] "B" "D" "E" "E" "A" "B" "E" "B"
This produces the same output as your attempt above with extract <- ….
If you want to keep some semblance of the original data frame structure then you can try the following, which outputs a list (necessary as the returned vectors for each variable might have different lengths):
lapply(df, function(x) x[x %in% w])
#### OUTPUT ####
$w1
[1] "B" "D" "E"
$w2
[1] "E" "A"
$w3
[1] "B"
$w4
[1] "E" "B"
Just call unlist or unclass on the returned list if you want a vector.

Making new dataframes from old dataframes by column number

I'm trying to re-organize my dataframes by Column orders
for Example
x <- data.frame("A" = c(1,1), "B" = c(2,2), "C" = c(3,3))
y <- data.frame("A" = c(2,2), "B" = c(3,3), "C" = c(4,4))
z <- data.frame("A" = c(3,3), "B" = c(4,4), "C" = c(5,5))
Say I have dataframes as above.
What I want to do is make new dataframes by column orders of those above dataframes. (Simply put, I want to put all the "A"s ,"B"s and "C"s, to 3 new dataframes.
the below dataframes are my wanted results
a <- data.frame("A" = c(1,1), "A" = c(2,2), "A" = c(3,3))
b <- data.frame("B" = c(2,2), "B" = c(3,3), "B" = c(4,4))
c <- data.frame("C" = c(3,3), "C" = c(4,4), "C" = c(5,5))

We can do this with tidyverse
library(tidyverse)
list(x, y, z) %>%
transpose %>%
map(~ do.call(cbind, .x))
Or with base R
lapply(names(x), function(nm) cbind(x[, nm], y[, nm], z[, nm]))

Assuming you have equal number of columns in all the dataframes, one way is to use lapply over list of dataframes and subset them sequentially.
lst1 <- list(x, y, z)
lapply(seq_len(ncol(x)), function(i) cbind.data.frame(lapply(lst1, `[`, i)))
#[[1]]
# A A A
#1 1 2 3
#2 1 2 3
#[[2]]
# B B B
#1 2 3 4
#2 2 3 4
#[[3]]
# C C C
#1 3 4 5
#2 3 4 5
If your dataframes are not already sorted by names you might want to do that first.
lst1 <- lapply(list(x, y, z), function(i) i[order(names(i))])
We can also use purrr using the same logic
library(purrr)
map(seq_len(ncol(x)), ~cbind.data.frame(map(lst1, `[`, .)))

How to find common variables in different data frames?

I have several data frames with similar (but not identical) series of variables (columns). I want to find a way for R to tell me what are the common variables across different data frames.
Example:
`a <- c(1, 2, 3)
b <- c(4, 5, 6)
c <- c(7, 8, 9)
df1 <- data.frame(a, b, c)
b <- c(1, 3, 5)
c <- c(2, 4, 6)
df2 <- data.frame(b, c)`
With df1 and df2, I would want some way for R to tell me that the common variables are b and c.

1) For 2 data frames:
intersect(names(df1), names(df2))
## [1] "b" "c"
To get the names that are in df1 but not in df2:
setdiff(names(df1), names(df2))
1a) and for any number of data frames (i.e. get the names common to all of them):
L <- list(df1, df2)
Reduce(intersect, lapply(L, names))
## [1] "b" "c"
2) An alternative is to use duplicated since the common names will be the ones that are duplicated if we concatenate the names of the two data frames.
nms <- c(names(df1), names(df2))
nms[duplicated(nms)]
## [1] "b" "c"
2a) To generalize that to n data frames use table and look for the names that occur the same number of times as data frames:
L <- list(df1, df2)
tab <- table(unlist(lapply(L, names)))
names(tab[tab == length(L)])
## [1] "b" "c"

Use intersect:
intersect(colnames(df1),colnames(df2))
OR
We can also check for the colname using %in%:
colnames(df1)[colnames(df1) %in% colnames(df2)]
Output:
[1] "b" "c"

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

R dplyr mutate_at accessing colnames - r

You can use Map in base R or map2 from purrr after getting the common columns using intersect. cols <- intersect(names(lst), names(df)) df[cols] <- Map(function(x, y) factor(x, levels = y), df[cols], lst[cols]) Or df[cols] <- purrr::map2(df[cols], lst[cols], ~factor(.x, levels = .y))

Related

Row name of the smallest element

Find unique level of list of data set each column

Matching across datasets and columns

Making new dataframes from old dataframes by column number

How to find common variables in different data frames?

Categories

Resources