I have the following data which I have split by name into separate data frames. After I run the following code, the variables in each data set are automatically named "X..i..".
I would like to rename the variable of each separate data frame so it matches the data set.
library(purrr)  # provides map(), used below
# load data
df1_raw <- data.frame(name = c("A", "B", "C", "A", "C", "B"),
start = c(1, 3, 4, 5, 2, 1),
end = c(6, 5, 7, 8, 6, 7))
df1 <- split(x = df1_raw, f = df1_raw$name) # split data by name
df1 <- lapply(df1, function(x) Map(seq.int, x$start, x$end)) # generate sequence intervals
df1 <- map(df1, unlist) # unlist sequences
df1 <- lapply(df1, data.frame) # convert to df
# rename variables
name <- c("A", "B", "C")
for (i in seq_along(df1)) {
names(df1[i]) <- name[i]
}
The last for loop does not work to rename variables. When I type names(df1$A) I still get "X..i..". The output I would like from names(df1$A) is "A".
Does anyone have any thoughts on how to rename these variables? Thanks!
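You need to use [[]] when indexing into a list. df1[i] returns a one-element sub-list, so names(df1[i]) <- name[i] renames that sub-list rather than the columns of the data frame inside it; df1[[i]] returns the data frame itself: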
for (i in seq_along(df1)) {
names(df1[[i]]) <- name[i]
}
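The difference between the two forms of indexing can be checked directly. A minimal sketch using a small made-up list (`lst` is a hypothetical name, not from the question) that mimics one element of df1:

```r
# Hypothetical list holding one single-column data frame, like df1$A
lst <- list(A = data.frame(X..i.. = 1:3))

class(lst[1])     # "list": a one-element sub-list
class(lst[[1]])   # "data.frame": the element itself

# Renaming through [ ] only touches the temporary sub-list
names(lst[1]) <- "B"
names(lst$A)      # still "X..i.."

# Renaming through [[ ]] changes the column of the stored data frame
names(lst[[1]]) <- "A"
names(lst$A)      # "A"
```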
Alternatively, you could change how you create the list so you don't have to rename the columns after the fact:
df1 <- split(x = df1_raw, f = df1_raw$name) # split data by name
df1 <- lapply(df1, function(x) Map(seq.int, x$start, x$end)) # generate sequence intervals
df1 <- lapply(df1, unlist) # unlist sequences (base R, no purrr needed)
df1 <- Map(function(x,name) {as.data.frame(setNames(list(x), name))}, df1, names(df1))
I think the solution by @MrFlick is enough to address renaming within a for loop.
Here is a base R workaround that may work for you:
lapply(
split(df1_raw, df1_raw$name),
function(x) {
with(
x,
setNames(
data.frame(unlist(mapply(seq, start, end, SIMPLIFY = FALSE))), # SIMPLIFY = FALSE: always a list, never a matrix
unique(name)
)
)
}
)
which gives
$A
A
1 1
2 2
3 3
4 4
5 5
6 6
7 5
8 6
9 7
10 8
$B
B
1 3
2 4
3 5
4 1
5 2
6 3
7 4
8 5
9 6
10 7
$C
C
1 4
2 5
3 6
4 7
5 2
6 3
7 4
8 5
9 6
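Why Map(), or mapply(..., SIMPLIFY = FALSE), is safer here than plain mapply(): when every sequence in a group happens to have the same length, mapply() simplifies the result to a matrix, unlist() leaves a matrix unchanged, and data.frame() then produces several columns instead of one. A minimal sketch of the pitfall, with made-up start/end values:

```r
start <- c(1, 5)
end   <- c(3, 7)   # both sequences have length 3

m <- mapply(seq, start, end)        # simplified to a 3 x 2 matrix
dim(m)                              # 3 2
ncol(data.frame(unlist(m)))         # 2 columns, not 1

v <- unlist(Map(seq, start, end))   # Map() always returns a list
ncol(data.frame(v))                 # 1
```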
Related
I am new to R. I have a data frame that contains start and end values for 45 types of items, and I used dplyr to subset that data into 45 separate data frames. I have written a for loop that outputs a sequence from start to end for each row of the data frame. I would like to use this for loop on all data frames without having to copy and paste the code 45 times and tailor it to the name of each data frame. See below for an example:
A_list <- list()
B_list <- list()
C_list <- list()
dfA <- data.frame(name = c("A", "A"), start = c(1, 3), end = c(6, 5))
dfB <- data.frame(name = c("B", "B"), start = c(2, 1), end = c(7, 8))
dfC <- data.frame(name = c("C", "C"), start = c(1, 2), end = c(4, 7))
for(i in seq_along(dfA$start)) {
output <- seq.int(dfA$start[i], dfA$end[i])
A_list[[i]] <- output
}
I tried making a list of names of each data frame and then referring to it in the for loop, but this didn't work.
list_df_names <- list(dfA, dfB, dfC)
seq.int(list_df_names[1:3]$start[i], list_df_names[1:3]$end[i])
Does anyone have any thoughts on how to do this?
We can loop over the list of datasets and create the sequences between the 'start' and 'end' columns with Map, which gives a list of lists. If you need separate objects (not recommended), use list2env after naming the elements of the outer list with the preferred object names:
out <- lapply(list_df_names, function(x) Map(seq.int, x$start, x$end))
names(out) <- paste0(c('A', 'B', 'C'), "_list")
list2env(out, .GlobalEnv)
Output:
A_list
#[[1]]
#[1] 1 2 3 4 5 6
#[[2]]
#[1] 3 4 5
B_list
#[[1]]
#[1] 2 3 4 5 6 7
#[[2]]
#[1] 1 2 3 4 5 6 7 8
C_list
#[[1]]
#[1] 1 2 3 4
#[[2]]
#[1] 2 3 4 5 6 7
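Since separate objects are not recommended, here is a minimal sketch of the alternative: keep all results in one named list and index it by name. It assumes the 45 subsets come from a single data frame with name, start and end columns (called `items` here, a hypothetical name), so split() can replace the 45 dplyr subsets and the same code works for any number of names:

```r
# Hypothetical combined data: the three example data frames stacked together
items <- data.frame(name  = c("A", "A", "B", "B", "C", "C"),
                    start = c(1, 3, 2, 1, 1, 2),
                    end   = c(6, 5, 7, 8, 4, 7))

# One data frame per name, then one list of sequences per data frame
seq_lists <- lapply(split(items, items$name),
                    function(x) Map(seq.int, x$start, x$end))

seq_lists$A             # same sequences as A_list above
seq_lists[["C"]][[2]]   # 2 3 4 5 6 7
```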
I want to label data frames using a small function in combination with lapply().
I have the following code:
df1 <- data.frame(c(1,2,3), c(3,4,5))
df2 <- data.frame(c(6,7,8), c(9,10,11))
f.generate.name <- function(x) {
x$name <- deparse(substitute(x))
return(x)
}
my_list <- list(df1, df2)
# This works fine.
f.generate.name(df1)
# This does not work.
lapply(my_list, f.generate.name)
which produces the following output
[[1]]
c.1..2..3. c.3..4..5. name
1 1 3 X[[i]]
2 2 4 X[[i]]
3 3 5 X[[i]]
[[2]]
c.6..7..8. c.9..10..11. name
1 6 9 X[[i]]
2 7 10 X[[i]]
3 8 11 X[[i]]
What I want instead is:
[[1]]
c.1..2..3. c.3..4..5. name
1 1 3 df1
2 2 4 df1
3 3 5 df1
[[2]]
c.6..7..8. c.9..10..11. name
1 6 9 df2
2 7 10 df2
3 8 11 df2
What is the best way of doing this without using loops? How can I tweak the lapply() call or the function I created to achieve the desired result?
Base R
lapply() iterates over a single argument. In this case you can use mapply() or its wrapper Map(), which iterates over several arguments in parallel and always returns a list.
my_list <- list(df1 = df1, df2 = df2) # names(my_list) must not be NULL
Map(f = function(x, y){
x$name <- y
x },
my_list,
names(my_list))
$df1
c.1..2..3. c.3..4..5. name
1 1 3 df1
2 2 4 df1
3 3 5 df1
$df2
c.6..7..8. c.9..10..11. name
1 6 9 df2
2 7 10 df2
3 8 11 df2
Tidyverse
If you are open to a purrr solution, you can use imap(), which passes each element's name as the second argument (.y). There is no need to write a separate function then:
library(purrr)
my_list <- list(df1 = df1, df2 = df2)
imap(my_list, ~{
.x$name <- .y
.x
})
$df1
c.1..2..3. c.3..4..5. name
1 1 3 df1
2 2 4 df1
3 3 5 df1
$df2
c.6..7..8. c.9..10..11. name
1 6 9 df2
2 7 10 df2
3 8 11 df2
The real question is where the names come from. An unnamed list like my_list in the question has lost the df1 and df2 names, as we can see by looking at its internals:
dput(my_list) # no df1 or df2 seen
## list(structure(list(c.1..2..3. = c(1, 2, 3), c.3..4..5. = c(3,
## 4, 5)), class = "data.frame", row.names = c(NA, -3L)), structure(list(
## c.6..7..8. = c(6, 7, 8), c.9..10..11. = c(9, 10, 11)), class =
## "data.frame", row.names = c(NA,
## -3L)))
Thus either we need to create a named list in the first place or else provide a vector of names. We show both using only base R.
Named list
First create a named list of the data frames and then use Map as shown:
L <- mget(ls(pattern = "^df")) # create named list of all objects whose names start with "df"
Map(data.frame, L, name = names(L))
Unnamed list
Alternatively, if all you have is an unnamed list, we can Map over it together with a vector of names:
my_list <- list(df1, df2) # unnamed list as in question
Map(data.frame, my_list, name = c("df1", "df2"))
Pass individual data frames
Yet another approach is to pass the individual data frames instead of a list. Because we have not destroyed the original names by creating an unnamed list we can still retrieve them. On R 4.0 and later deparse1 could optionally be used in place of deparse in the code.
add_names <- function(...) {
mc <- match.call()
Map(data.frame, list(...), name = sapply(mc[-1], deparse))
}
add_names(df1, df2)
Below is a function called change_names which works, but only on a specific data frame name. In short, I am having issues understanding how to manipulate the assign function so it can handle different data frame names.
The function basically changes the names on columns of files as I read them in a for loop. For example, one file could have a column name 'A' which should be 'X' while another file could have the column name 'D' which should also be named 'X'.
I have tried a few different approaches to change the original data frame, 'tempPullList', but I need to be able to use the function on a different data frame.
#====example different files====
file1 <- data.frame(A = rep(1:10), Y = rep(c("Yellow","Red","Purpule","Green","Blue"), 2),
Z = rep(c("Drink", "Food"), 5))
file2 <- data.frame(D = rep(1:10), B = rep(c("Brown","Pink","Purpule","Green","Blue"), 2),
Z = rep(c("Drink", "Food"), 5))
file3 <- data.frame(X = rep(1:10), B = rep(c("Brown","Pink","Purpule","Green","Blue"), 2),
C = rep(c("Drink", "Food"), 5))
file_list <- list(file1, file2, file3)
#====Package Bank====
library(data.table)
library(dplyr)
#====Function====
change_names <- function(x){
#a list of columns to be renamed
#through out the files
chgCols <- c("A",
"B",
"C",
"D")
#the names the columns will be changed to
namekey <- c(A = "X",
B = "Y",
C = "Z",
D = "X")
chgCols <- match(chgCols, colnames(x)) #find any unwanted column indexes in data frame
chgCols <- colnames(x[, chgCols[!is.na(chgCols)]]) #match indexes to column names w/o NA's
x <- x %>% #rename associated columns
plyr::rename(namekey[chgCols]) #from 'namekey' in dataframe
assign('tempPullList', x, envir = .GlobalEnv)
}
#====Read in Files====
PullList <- data.frame()
for(file in 1:length(file_list)){
tempPullList <- data.frame(file_list[file])
print(file)
change_names(x = tempPullList)
PullList <- rbindlist(list(PullList, tempPullList),
fill = T)
}
Again, right now this only works when the data frame is called 'tempPullList'; I need it to work with other data frames too.
I am pretty new to writing functions, and especially to assigning variables within functions. I would like this function to be as general as possible. I am currently working on making chgCols and namekey inputs, so any advice on that would also be helpful.
Example data:
column_name_lookup <- data.frame(orig = c("a","b","c","d"),
new = c("X","Y","z","X"),
stringsAsFactors = FALSE)
test_df <- data.frame(a = 1:5,
c = 2:6,
b = 3:7,
e = 4:8,
d = 5:9)
a c b e d
1 1 2 3 4 5
2 2 3 4 5 6
3 3 4 5 6 7
4 4 5 6 7 8
5 5 6 7 8 9
Code to change names:
new_names <- column_name_lookup$new[match(names(test_df),column_name_lookup$orig)]
names(test_df) <- ifelse(is.na(new_names),names(test_df),new_names)
X z Y e X
1 1 2 3 4 5
2 2 3 4 5 6
3 3 4 5 6 7
4 4 5 6 7 8
5 5 6 7 8 9
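The same lookup idea also removes the need for assign() in the question's change_names(): if the function returns the renamed data frame, the caller decides where the result goes, so it works with any data frame name. A minimal sketch, assuming the mapping is passed in as a named character vector; rename_cols and lookup are hypothetical names, and base rbind() stands in for data.table::rbindlist() to keep it dependency-free:

```r
# Hypothetical helper: rename columns from a named lookup vector and
# return the modified data frame instead of assign()-ing into .GlobalEnv
rename_cols <- function(x, lookup) {
  hits <- names(x) %in% names(lookup)               # columns that have a mapping
  names(x)[hits] <- unname(lookup[names(x)[hits]])  # swap in the new names
  x
}

namekey <- c(A = "X", B = "Y", C = "Z", D = "X")

# Shortened versions of the question's three files
file_list <- list(
  data.frame(A = 1:2, Y = c("Yellow", "Red"), Z = c("Drink", "Food")),
  data.frame(D = 3:4, B = c("Brown", "Pink"), Z = c("Drink", "Food")),
  data.frame(X = 5:6, B = c("Brown", "Pink"), C = c("Drink", "Food"))
)

# Rename every file, then stack them; rbind() matches columns by name
PullList <- do.call(rbind, lapply(file_list, rename_cols, lookup = namekey))
names(PullList)   # "X" "Y" "Z"
```

Because the lookup is an argument, chgCols and namekey from the question collapse into a single input.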
I have a very simple question about lapply. I am transitioning from Stata to R, and I think there is some basic concept about looping in R that I am not getting. I have been reading about it all afternoon and can't figure out a reasonable way to do this simple thing.
I have three data frames df1, df2, and df3 that all have the same column names, in the same order, etc.
I want to rename their columns all at once.
I put the data frames in a list:
dflist <- list(df1, df2, df3)
What I want the new names to be:
varlist <- c("newname1", "newname2", "newname3")
Write a function that replaces names with those in varlist, and lapply it over the data frames
ChangeNames <- function(x) {
names(x) <- varlist
return(x)
}
dflist <- lapply(dflist, ChangeNames)
So, as far as I understand, R has changed the names of the copies of the data frames that I put in the list, but not the original data frames themselves. I want the data frames themselves to be renamed, not the elements of the list (which are trapped in a list).
Now, I can go
df1 <- as.data.frame(dflist[1])
df2 <- as.data.frame(dflist[2])
df3 <- as.data.frame(dflist[3])
But that seems weird. You need a loop to get back the elements of a loop?
Basically: once you've put some data frames in a list and run your function on them via lapply, how do you get them back out of the list, without starting back at square one?
If you just want to change the names, that isn't too hard in R. Bear in mind that the assignment operator, <-, can be applied in sequence. Hence:
names(df1) <- names(df2) <- names(df3) <- c("newname1", "newname2", "newname3")
I am not sure I understand correctly: do you want to rename the columns of the data frames, or the components of the list that contain the data frames?
If it is the first, please search before asking; this question has been asked before.
If you have many data frames in the list, you can easily do:
# Creating some sample data first
> dflist <- list(df1 = data.frame(a = 1:3, b = 2:4, c = 3:5),
+ df2 = data.frame(a = 4:6, b = 5:7, c = 6:8),
+ df3 = data.frame(a = 7:9, b = 8:10, c = 9:11))
# See how it looks like
> dflist
$df1
a b c
1 1 2 3
2 2 3 4
3 3 4 5
$df2
a b c
1 4 5 6
2 5 6 7
3 6 7 8
$df3
a b c
1 7 8 9
2 8 9 10
3 9 10 11
# And do the trick
> dflist <- lapply(dflist, setNames, nm = c("newname1", "newname2", "newname3"))
# See how it looks now
> dflist
$df1
newname1 newname2 newname3
1 1 2 3
2 2 3 4
3 3 4 5
$df2
newname1 newname2 newname3
1 4 5 6
2 5 6 7
3 6 7 8
$df3
newname1 newname2 newname3
1 7 8 9
2 8 9 10
3 9 10 11
So the names were changed from a, b and c to newname1, newname2 and newname3 in each data frame in the list.
If it is the second, you can do this:
> names(dflist) <- c("newname1", "newname2", "newname3")
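For the last part of the question (getting the renamed data frames back out of the list as separate objects), one option is list2env(), the same function used in an earlier answer. A minimal sketch, assuming the list elements are named after the objects you want to recreate; note that it overwrites existing objects with those names:

```r
# Example data frames with identical column layouts
df1 <- data.frame(a = 1:3, b = 2:4, c = 3:5)
df2 <- data.frame(a = 4:6, b = 5:7, c = 6:8)
df3 <- data.frame(a = 7:9, b = 8:10, c = 9:11)

# Name the list elements after the objects you want back
dflist <- list(df1 = df1, df2 = df2, df3 = df3)
dflist <- lapply(dflist, setNames, nm = c("newname1", "newname2", "newname3"))

# Copy each element into the global environment under its list name,
# replacing the original df1, df2 and df3
list2env(dflist, envir = .GlobalEnv)
names(df2)   # "newname1" "newname2" "newname3"
```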
I want to find the row numbers of the rows in one data frame that also appear in another data frame.
To make the question clear, say I have data frame A and data frame B:
dfA <- data.frame(NAME = rep(c("a", "b"), each = 3),
TRIAL = rep(1:3, 2),
DATA = runif(6))
dfB <- data.frame(NAME = c("a", "b"),
TRIAL = c(2, 3))
dfA
# NAME TRIAL DATA
# 1 a 1 0.62948592
# 2 a 2 0.88041819
# 3 a 3 0.02479411
# 4 b 1 0.48031827
# 5 b 2 0.86591315
# 6 b 3 0.93448264
dfB
# NAME TRIAL
# 1 a 2
# 2 b 3
I want to get dfA's row number where dfA and dfB have the same NAME and TRIAL, in this case, row numbers are 2 and 6.
I tried the following code, but it gives me rows 2, 3, 5 and 6 because it matches NAME and TRIAL separately:
which(dfA$NAME %in% dfB$NAME & dfA$TRIAL %in% dfB$TRIAL)
# 2 3 5 6
Then I tried creating a dummy column and matching on it. This works, but the code would get verbose if dfB had many columns:
dfA$dummy <- paste0(dfA$NAME, dfA$TRIAL)
dfB$dummy <- paste0(dfB$NAME, dfB$TRIAL)
which(dfA$dummy %in% dfB$dummy)
# 2 6
I'm wondering if there are better ways to solve the problem, thanks for your help!
You can do:
merge(transform(dfA, row.num = 1:nrow(dfA)), dfB)$row.num
# [1] 2 6
And if the whole goal of finding the indices is so that you can subset dfA, then you can just do merge(dfA, dfB).
Or use duplicated:
apply(dfB, 1, function(x)
which(duplicated(rbind(x, dfA[1:2])))-1)
# [1] 2 6
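The dummy-column idea from the question can also be made independent of the number of columns by building the key programmatically. A minimal base R sketch; make_key is a hypothetical helper, and it assumes the key columns compare equal once converted to text (fine for names and whole-number trial IDs, less reliable for floating-point values). The "\r" separator keeps combinations such as "a1" + "1" and "a" + "11" distinct, assuming that character never occurs in the data:

```r
dfA <- data.frame(NAME = rep(c("a", "b"), each = 3),
                  TRIAL = rep(1:3, 2),
                  DATA = runif(6))
dfB <- data.frame(NAME = c("a", "b"),
                  TRIAL = c(2, 3))

# Build one key string per row from every column the two data frames share
key_cols <- intersect(names(dfA), names(dfB))
make_key <- function(d) do.call(paste, c(d[key_cols], sep = "\r"))

which(make_key(dfA) %in% make_key(dfB))   # 2 6
```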