Consider a 3x3 char dataframe:
example <- data.frame(one = c("a","b","c"),
two = c("a","b","b"),
three = c ("c","a","b"))
I want to resize these data to 6x2 and add the following content:
desired <- data.frame(one = c("a","a","b","b",
"c","b"),
two = c("a","c","b","a","b","b"))
For the original example dataframe, I want to rbind() the contents of example[,2:3] beneath each row index.
This can be achieved by:
ex <- as.matrix(example)
des <- as.data.frame(rbind(ex[,1:2], ex[,2:3]))
Maybe using library(tidyverse) for an arbitrary number of columns would be nicer?
For each pair of columns, transpose the sub-data.frame defined by them and coerce to vector. Then coerce to data.frame and set the result's names.
The code that follows scales to any number of columns because it does not hard-code the column count.
desired2 <- as.data.frame(
lapply(seq(names(example))[-1], \(k) c(t(example[(k-1):k])))
)
names(desired2) <- names(example)[-ncol(example)]
identical(desired, desired2)
#[1] TRUE
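To see why transposing before flattening gives the interleaved order, it may help to look at a single pair of columns in isolation. The sketch below is only an illustration (the names pair and interleaved are not part of the answer above), and it relies on R storing matrices column by column.

```r
example <- data.frame(one   = c("a", "b", "c"),
                      two   = c("a", "b", "b"),
                      three = c("c", "a", "b"))

# t() turns the 3 x 2 sub-data.frame into a 2 x 3 character matrix:
# row 1 holds column `one`, row 2 holds column `two`.
pair <- t(example[1:2])

# c() flattens the matrix column by column, so the two values that came
# from the same original row end up next to each other.
interleaved <- c(pair)
interleaved
```

The flattened vector is what becomes the first column of the result; repeating the step for columns 2:3 gives the second column.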
The code above rewritten as a function.
reformat <- function(x){
y <- as.data.frame(
lapply(seq(names(x))[-1], \(k) c(t(x[(k-1):k])))
)
names(y) <- names(x)[-ncol(x)]
y
}
reformat(example)
example %>% reformat() # %>% requires magrittr (or dplyr) to be loaded
Another example, with 6 columns input.
ex1 <- example
ex2 <- example
names(ex2) <- c("fourth", "fifth", "sixth")
ex <- cbind(ex1, ex2)
reformat(ex)
ex %>% reformat()
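A tidyverse approach using tidyr::pivot_longer could look like this. Note that it repeats each value of one, so the last row comes out as c b rather than the b b in desired: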
library(dplyr)
library(tidyr)
pivot_longer(example, -one, values_to = "two") %>%
select(-name)
#> # A tibble: 6 × 2
#> one two
#> <chr> <chr>
#> 1 a a
#> 2 a c
#> 3 b b
#> 4 b a
#> 5 c b
#> 6 c b
A base-R solution with Map:
# Iterate over example$one, example$two, and example$three in parallel,
# building a two-row data frame for each input row.
mylist <- Map(function(x, y, z) {
data.frame(one = c(x, y), two = c(y, z))
},
example$one, # x
example$two, # y
example$three) # z
do.call(rbind, mylist)
one two
a.1 a a
a.2 a c
b.1 b b
b.2 b a
c.1 c b
c.2 b b
I have a vector containing "potential" column names:
col_vector <- c("A", "B", "C")
I also have a data frame, e.g.
library(tidyverse)
df <- tibble(A = 1:2,
B = 1:2)
My goal now is to create all columns mentioned in col_vector that don't yet exist in df.
For the above example, my code below works:
df %>%
mutate(!!sym(setdiff(col_vector, colnames(.))) := NA)
# A tibble: 2 x 3
A B C
<int> <int> <lgl>
1 1 1 NA
2 2 2 NA
The problem is that this code fails as soon as (a) more than one column from col_vector is missing or (b) no column from col_vector is missing. I thought about some sort of if_else, but I don't know how to make the column creation conditional in that way, preferably in a tidyverse style. I know I can loop over all the missing columns, but I'm wondering whether there is a more direct approach.
Example data where code above fails:
df2 <- tibble(A = 1:2)
df3 <- tibble(A = 1:2,
B = 1:2,
C = 1:2)
This should work.
df[,setdiff(col_vector, colnames(df))] <- NA
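The two failure cases from the question (several columns missing, and none missing) can be checked with a small sketch. It assumes plain data.frames rather than tibbles, and the helper name add_missing_cols is hypothetical. The explicit length check is a defensive choice for this sketch, not something the one-liner above requires.

```r
# Hypothetical helper name; not taken from the answers in this thread.
add_missing_cols <- function(df, cols) {
  missing <- setdiff(cols, names(df))
  # Guard the assignment so the "nothing missing" case is an explicit no-op.
  if (length(missing) > 0) {
    df[missing] <- NA
  }
  df
}

col_vector <- c("A", "B", "C")
df2 <- data.frame(A = 1:2)                    # two columns missing
df3 <- data.frame(A = 1:2, B = 1:2, C = 1:2)  # nothing missing

out2 <- add_missing_cols(df2, col_vector)
out3 <- add_missing_cols(df3, col_vector)
names(out2)
names(out3)
```

Because setdiff() is evaluated each time the function is called, the same helper covers one, several, or zero missing columns.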
Solution
This base operation might be simpler than a full-fledged dplyr workflow:
library(tidyverse) # For tibble(); setdiff() itself is available in base R.
# ...
# Code to generate 'df'.
# ...
# Find the subset of missing names, and create them as columns filled with 'NA'.
df[, setdiff(col_vector, names(df))] <- NA
# View results
df
Results
Given your sample col_vector and df here
col_vector <- c("A", "B", "C")
df <- tibble(A = 1:2, B = 1:2)
this solution should yield the following results:
# A tibble: 2 x 3
A B C
<int> <int> <lgl>
1 1 1 NA
2 2 2 NA
Advantages
An advantage of my solution over the alternative linked above by @geoff is that you need not hand-code the set of column names, as both symbols and strings, within the dplyr workflow:
df %>% mutate(
#####################################
A = ifelse("A" %in% names(.), A, NA),
B = ifelse("B" %in% names(.), B, NA),
C = ifelse("C" %in% names(.), C, NA)
# ...
# etc.
#####################################
)
My solution is by contrast more dynamic
##############################
df[, setdiff(col_vector, names(df))] <- NA
##############################
if you ever decide to change (or even dynamically calculate!) your variable names midstream, since it determines the setdiff() at runtime.
Note
Incredibly, @AustinGraves posted their answer at precisely the same time (2021-10-25 21:03:05Z) as I posted mine, so both answers qualify as original solutions.
I have a list of lists of dataframes, each having one column, like this:
list(list(A = data.frame(X = 1:5),
B = data.frame(Y = 6:10),
C = data.frame(Z = 11:15)),
list(A = data.frame(X = 16:20),
B = data.frame(Y = 21:25),
C = data.frame(Z = 26:30)),
list(A = data.frame(X = 31:35),
B = data.frame(Y = 36:40),
C = data.frame(Z = 41:45))) -> dflist
I need to make it so that the column names X, Y and Z inside of each dataframe are changed to A, B and C. An important thing to add is that the names A, B and C are not known beforehand, but must be extracted from the list element names. I have a simple script that can accomplish this:
for(i in 1:3){
for(j in 1:3){
colnames(dflist[[i]][[j]]) <- names(dflist[[i]])[[j]]
}
}
However, I need to do this in tidyverse style. I have found similar questions on here, however, they only deal with lists of dataframes and not with lists of lists of dataframes and I can't find a way to make it work.
Using a combination of map and imap:
library(dplyr)
library(purrr)
map(dflist, function(x)
imap(x, function(data, name)
data %>% rename_with(function(y) name)))
#[[1]]
#[[1]]$A
# A
#1 1
#2 2
#3 3
#4 4
#5 5
#[[1]]$B
# B
#1 6
#2 7
#3 8
#4 9
#5 10
#[[1]]$C
# C
#1 11
#2 12
#3 13
#4 14
#5 15
#...
#...
Also possible without purrr, using lapply and mapply (the latter with SIMPLIFY=FALSE). If dflist is your list of lists:
lapply(dflist, function(x){
mapply(function(y,z){
`colnames<-`(y, z)
}, y=x, z=names(x), SIMPLIFY=FALSE)
})
#or on one line:
lapply(dflist, function(x) mapply(function(y,z) `colnames<-`(y, z), y=x, z=names(x), SIMPLIFY=FALSE))
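A more compact base R variant of the same idea is sketched below. It swaps in Map (which is mapply with SIMPLIFY = FALSE) and setNames in place of the `colnames<-` call, and it uses a shortened two-by-two version of dflist purely for illustration. The result is stored under the hypothetical name renamed.

```r
# Shortened stand-in for the dflist from the question.
dflist <- list(
  list(A = data.frame(X = 1:5),   B = data.frame(Y = 6:10)),
  list(A = data.frame(X = 16:20), B = data.frame(Y = 21:25))
)

# For each inner list, pair every data frame with its list name and
# use that name as the (single) column name.
renamed <- lapply(dflist, function(x) Map(setNames, x, names(x)))

names(renamed[[1]]$A)
names(renamed[[2]]$B)
```

Map keeps the names of the inner list, so the list structure is unchanged and only the column names differ.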
A solution with purrr::walk (note that it modifies dflist in place via <<-):
library(tidyverse)
walk(seq_along(dflist),
function(x)
walk(names(dflist[[x]]), ~ {names(dflist[[x]][[.x]]) <<- .x}))
I would like to iterate through a stored list of columns and procedures to create n new columns based on this list. In the example below, we start with 3 columns, a, b, c, and two simple functions, func1 and func2.
The data frame col_mod contains two sets of modifications that should be applied to the data frame. Each of these modifications should be an addition to the data frame, rather than replacements of the specified columns.
In col_mod row 1, we see that column a should be modified using func1, and in row 2, we see that column c should be modified using func2. The new names of these columns should be a_new and c_new, respectively.
At the bottom of the reprex below, I obtain my desired result, but I would like to do so without hard-coding each modification individually. Is there a way to use something like purrr::map or anything similar?
library(tidyverse)
## fake data
dat <- data.frame(a = 1:5,
b = 6:10,
c = 11:15)
## functions
func1 <- function(x) {x + 2}
func2 <- function(x) {x - 4}
## modification list
col_mod <- data.frame("col" = c("a", "c"),
"func" = c("func1", "func2"),
stringsAsFactors = FALSE)
## desired end result
dat %>%
mutate("a_new" = func1(a),
"c_new" = func2(c))
edit: if it is easier to store the modifications in a list, as shown below, a solution using that would be fine as well, as I am able to store the modifications in either a data frame or list.
col_mod <- list("set1" = list("a", "func1"),
"set2" = list("c", "func2"))
We can do this with Map, using match.fun to look up each function by name:
dat[paste0(col_mod$col, '_new')] <- Map(function(x, y) match.fun(y)(x),
dat[col_mod$col], col_mod$func)
dat
# a b c a_new c_new
#1 1 6 11 3 7
#2 2 7 12 4 8
#3 3 8 13 5 9
#4 4 9 14 6 10
#5 5 10 15 7 11
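If the modifications are stored as a list, as in the question's edit, a plain loop is one possible sketch. It assumes each entry holds the column name first and the function name second, and it uses get(..., mode = "function") in place of match.fun to look the function up by name.

```r
dat <- data.frame(a = 1:5, b = 6:10, c = 11:15)
func1 <- function(x) x + 2
func2 <- function(x) x - 4

# List form of the modifications: (column name, function name) per entry.
col_mod <- list(set1 = list("a", "func1"),
                set2 = list("c", "func2"))

for (m in col_mod) {
  col <- m[[1]]
  fun <- get(m[[2]], mode = "function")    # look the function up by name
  dat[[paste0(col, "_new")]] <- fun(dat[[col]])
}
dat
```

The new columns are appended in the order of col_mod, so the original columns are left untouched.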
Using col_mod as dataframe.
col_mod <- data.frame("col" = c("a", "c"),"func" = c("func1", "func2"))
We can use the tidyverse approach to do this
library(dplyr)
library(purrr)
library(stringr)
library(tibble)
imap_dfc(deframe(col_mod), ~ dat %>%
transmute(!! str_c(.y, "_new") := match.fun(.x)(!! rlang::sym(.y)))) %>%
bind_cols(dat, .)
I need to merge several different dataframes.
On the one hand, I have several data frames with metadata A and, on the other hand, respective information B.
A.
[1] "LOJun_Meta" "LOMay_Meta" "VOJul_Meta" "VOJun_Meta" "VOMay_Meta" "ZOJun_Meta"
[7] "ZOMay_Meta"
B.
[1] "LOJun_All." "LOMay_all." "VOJul_All." "VOJun_all." "VOMay_all." "ZOJun_all."
[7] "ZOMay_all."
The names of the data frames are already in a list format (i.e. list1 and list2) and the data frames are already imported in R.
My aim is to create a loop that merges the respective data frames with dplyr's left_join. For example:
LOJun_Meta + LOJun_All.; LOMay_Meta + LOMay_all.; etc.
What I am having a hard time with is creating the loop that would "synchronize" the merging procedure.
I am unsure whether I should create a function that takes two inputs and does the merging.
It would be something like
merging <- function(list1, list2) {
for (i in seq_along(list1)) {
left_join(list1[[i]], list2[[i]], by = c("PrimaryKey" = "ForeignKey"))
}
}
I reckon the problem is that the function should refer to data frames which are not list1 & list2 values but data frame names stored in list1 & list2.
Any ideas?
Thanks a lot! Cheers
A diagram of what I intend to achieve is presented below:
[Diagram of loop - dplyr / several dataframes]
An example of what I am keen to automate would be this action:
ZOMay<- left_join(ZOMay_Meta, ZOMay_all., by = c("Primary Key" = "Foreign key"))
ZOJun<- left_join(ZOJun_Meta, ZOJun_all., by = c("Primary Key" = "Foreign Key"))
write.csv(ZOMay, file = "ZOMay_Consolidated.csv")
write.csv(ZOJun, file = "ZOJun_Consolidated.csv")
Here's an example of how you could build a reproducible example for your situation:
library(tidyverse)
df1a <- tibble(id = 1:3, var1 = LETTERS[1:3])
df2a <- tibble(id = 1:3, var1 = LETTERS[4:6])
df1b <- tibble(id = 1:3, var2 = LETTERS[7:9])
df2b <- tibble(id = 1:3, var2 = LETTERS[10:12])
list1 <- list(df1a, df2a)
list2 <- list(df1b, df2b)
Now as I understand it you want to do a left_join for df1a and df1b, as well as df2a and df2b. Instead of a loop, you can use map2 from the purrr package. This will iterate over two lists and apply a function to each pair of elements.
map2(list1, list2, left_join)
# [[1]]
# # A tibble: 3 x 3
# id var1 var2
# <int> <chr> <chr>
# 1 1 A G
# 2 2 B H
# 3 3 C I
#
# [[2]]
# # A tibble: 3 x 3
# id var1 var2
# <int> <chr> <chr>
# 1 1 D J
# 2 2 E K
# 3 3 F L
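Since the question stores the names of the data frames rather than the data frames themselves, one way to bridge the gap is to fetch the objects with mget first. The sketch below is an assumption-laden illustration: the small data frames, the key column names PrimaryKey and ForeignKey, and the variables meta_names, all_names and merged are stand-ins, and base merge(..., all.x = TRUE) is used in place of left_join so that the example needs no packages.

```r
# Hypothetical stand-ins for the imported data frames.
LOJun_Meta <- data.frame(PrimaryKey = 1:3, site = c("x", "y", "z"))
LOMay_Meta <- data.frame(PrimaryKey = 1:2, site = c("p", "q"))
LOJun_All  <- data.frame(ForeignKey = 1:2, value = c(10, 20))
LOMay_All  <- data.frame(ForeignKey = 1:2, value = c(30, 40))

# Character vectors of object names, in matching order.
meta_names <- c("LOJun_Meta", "LOMay_Meta")
all_names  <- c("LOJun_All", "LOMay_All")

# Fetch the objects by name, then join them pairwise.
meta_list <- mget(meta_names)
info_list <- mget(all_names)
merged <- Map(
  function(meta, info) {
    merge(meta, info, by.x = "PrimaryKey", by.y = "ForeignKey", all.x = TRUE)
  },
  meta_list, info_list
)
names(merged) <- sub("_Meta$", "", meta_names)
names(merged)
```

Each element of merged could then be written out with write.csv, using its name to build the file name. The pairing depends entirely on the two name vectors being in the same order.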
I have several data.frames in my Global Environment that I need to merge. Many of the data.frames have identical column names. I want to append a suffix to each column that marks its originating data.frame. Because I have many data.frames, I wanted to automate the process as in the following example.
df1 <- data.frame(id = 1:5,x = LETTERS[1:5])
df2 <- data.frame(id = 1:5,x = LETTERS[6:10])
obj <- ls()
for(o in obj){
s <- sub('df','',eval(o))
names(get(o))[-1] <- paste0(names(get(o))[-1],'.',s)
}
# Error in get(o) <- `*vtmp*` : could not find function "get<-"
But the individual pieces of the assignment work fine:
names(get(o))[-1]
# [1] "x"
paste0(names(get(o))[-1],'.',s)
# [1] "x.1"
I've used get in a similar way to write.csv each object to a file.
for(o in obj){
write.csv(get(o),file = paste0(o,'.csv'),row.names = F)
}
Any ideas why it's not working in the assignment to change the column names?
The error "could not find function get<-" is R telling you that you can't use <- to update a "got" object. You could probably use assign, but this code is already difficult enough to read. The better solution is to use a list.
From your example:
df1 <- data.frame(id = 1:5,x = LETTERS[1:5])
df2 <- data.frame(id = 1:5,x = LETTERS[6:10])
# put your data frames in a list
df_names = ls(pattern = "df[0-9]+")
df_names # make sure this is the objects you want
# [1] "df1" "df2"
df_list = mget(df_names)
# now we can use a simple for loop (or lapply, mapply, etc.)
for(i in seq_along(df_list)) {
names(df_list[[i]])[-1] =
paste(names(df_list[[i]])[-1],
sub('df', '', names(df_list)[i]),
sep = "."
)
}
# and the column names of the data frames in the list have been updated
df_list
# $df1
# id x.1
# 1 1 A
# 2 2 B
# 3 3 C
# 4 4 D
# 5 5 E
#
# $df2
# id x.2
# 1 1 F
# 2 2 G
# 3 3 H
# 4 4 I
# 5 5 J
It's also now easy to merge them:
Reduce(f = merge, x = df_list)
# id x.1 x.2
# 1 1 A F
# 2 2 B G
# 3 3 C H
# 4 4 D I
# 5 5 E J
For more discussion and examples, see How do I make a list of data frames?
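For completeness, the get/assign route mentioned at the start of this answer can be sketched as follows. It is shown only to illustrate why the original loop failed (there is no replacement function "get<-", so the object has to be copied, modified, and written back); the list approach above remains easier to read. The names d and suffix are illustrative.

```r
df1 <- data.frame(id = 1:5, x = LETTERS[1:5])
df2 <- data.frame(id = 1:5, x = LETTERS[6:10])

for (o in c("df1", "df2")) {
  d <- get(o)                  # fetch a copy of the object by name
  suffix <- sub("df", "", o)   # "1" for df1, "2" for df2
  names(d)[-1] <- paste0(names(d)[-1], ".", suffix)
  assign(o, d)                 # write the modified copy back under the same name
}
names(df1)
names(df2)
```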
Using setnames from library(data.table), which renames columns by reference, you can do:
for(o in obj) {
oldnames = names(get(o))[-1]
newnames = paste0(oldnames, ".new")
setnames(get(o), oldnames, newnames)
}
You can use eval, which evaluates an R expression in a specified environment.
df1 <- data.frame(id = 1:5,x = LETTERS[1:5])
df2 <- data.frame(id = 1:5,x = LETTERS[6:10])
obj <- ls()
for(o in obj) {
s <- sub('df', '', o)
new_name <- paste0(names(get(o))[-1], '.', s)
eval(parse(text = paste0('names(', o, ')[-1] <- ', substitute(new_name))))
}
This modifies df1 and df2 in place. For example, df1 becomes:
id x.1
1 1 A
2 2 B
3 3 C
4 4 D
5 5 E