Below is a function called change_names that works, but only for one specific data frame name. In short, I don't understand how to use the assign function so that it can handle different data frame names.
The function renames the columns of files as I read them in a for loop. For example, one file could have a column named 'A' that should be 'X', while another file could have a column named 'D' that should also be 'X'.
I have tried a few different approaches to change the original data frame, 'tempPullList', but I need to be able to use the function on a different data frame.
#====example different files====
file1 <- data.frame(A = rep(1:10), Y = rep(c("Yellow","Red","Purple","Green","Blue"), 2),
Z = rep(c("Drink", "Food"), 5))
file2 <- data.frame(D = rep(1:10), B = rep(c("Brown","Pink","Purple","Green","Blue"), 2),
Z = rep(c("Drink", "Food"), 5))
file3 <- data.frame(X = rep(1:10), B = rep(c("Brown","Pink","Purple","Green","Blue"), 2),
C = rep(c("Drink", "Food"), 5))
file_list <- list(file1, file2, file3)
#====Package Bank====
library(data.table)
library(dplyr)
#====Function====
change_names <- function(x){
#a list of columns to be renamed
#through out the files
chgCols <- c("A",
"B",
"C",
"D")
#the names the columns will be changed to
namekey <- c(A = "X",
B = "Y",
C = "Z",
D = "X")
chgCols <- match(chgCols, colnames(x)) #find any unwanted column indexes in data frame
chgCols <- colnames(x[, chgCols[!is.na(chgCols)]]) #match indexes to column names w/o NA's
x <- x %>% #rename associated columns
plyr::rename(namekey[chgCols]) #from 'namekey' in dataframe
assign('tempPullList', x, envir = .GlobalEnv)
}
#====Read in Files====
PullList <- data.frame()
for(file in 1:length(file_list)){
tempPullList <- data.frame(file_list[file])
print(file)
change_names(x = tempPullList)
PullList <- rbindlist(list(PullList, tempPullList),
fill = T)
}
Again, right now the function only works when the data frame is called 'tempPullList'. I need to be able to use it with any other data frame.
I am pretty new to writing functions, and especially to assigning variables within functions. I would like this function to be as flexible as possible. I am currently working on making chgCols and namekey inputs, so any advice on that would also be helpful.
Example data:
column_name_lookup <- data.frame(orig = c("a","b","c","d"),
new = c("X","Y","z","X"),
stringsAsFactors = FALSE)
test_df <- data.frame(a = 1:5,
c = 2:6,
b = 3:7,
e = 4:8,
d = 5:9)
a c b e d
1 1 2 3 4 5
2 2 3 4 5 6
3 3 4 5 6 7
4 4 5 6 7 8
5 5 6 7 8 9
Code to change names:
new_names <- column_name_lookup$new[match(names(test_df),column_name_lookup$orig)]
names(test_df) <- ifelse(is.na(new_names),names(test_df),new_names)
X z Y e X
1 1 2 3 4 5
2 2 3 4 5 6
3 3 4 5 6 7
4 4 5 6 7 8
5 5 6 7 8 9
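The lookup approach above can be wrapped in a function that returns the renamed data frame instead of calling assign, so it no longer depends on any particular variable name. This is a minimal sketch, not the original poster's code: the function name rename_columns and the shortened sample files are made up for illustration, and the lookup table reuses the A/B/C/D to X/Y/Z/X mapping from the question.

```r
# Hypothetical helper: rename columns of `df` according to a lookup table
# with columns `orig` (current name) and `new` (desired name).
rename_columns <- function(df, lookup) {
  new_names <- lookup$new[match(names(df), lookup$orig)]
  # Keep the existing name wherever the lookup has no entry
  names(df) <- ifelse(is.na(new_names), names(df), new_names)
  df  # return the result; the caller decides where to store it
}

# Lookup table built from the mapping in the question
column_name_lookup <- data.frame(orig = c("A", "B", "C", "D"),
                                 new  = c("X", "Y", "Z", "X"),
                                 stringsAsFactors = FALSE)

# Shortened stand-ins for the three example files
file_list <- list(
  data.frame(A = 1:2, Y = c("Yellow", "Red"), Z = c("Drink", "Food")),
  data.frame(D = 1:2, B = c("Brown", "Pink"), Z = c("Drink", "Food")),
  data.frame(X = 1:2, B = c("Brown", "Pink"), C = c("Drink", "Food"))
)

# Rename every file, then stack them; no global assignment is needed
renamed   <- lapply(file_list, rename_columns, lookup = column_name_lookup)
pull_list <- do.call(rbind, renamed)
names(pull_list)
```

Because the function returns its result, the loop in the question could also keep its shape and write tempPullList <- change_names(tempPullList); data.table::rbindlist(renamed, fill = TRUE) should work in place of the rbind call if the files do not all end up with the same columns.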
I have the following data which I have split by name into separate data frames. After I run the following code, the variables in each data set are automatically named "X..i..".
I would like to rename the variable of each separate data frame so it matches the data set.
# load data
df1_raw <- data.frame(name = c("A", "B", "C", "A", "C", "B"),
start = c(1, 3, 4, 5, 2, 1),
end = c(6, 5, 7, 8, 6, 7))
df1 <- split(x = df1_raw, f = df1_raw$name) # split data by name
df1 <- lapply(df1, function(x) Map(seq.int, x$start, x$end)) # generate sequence intervals
df1 <- purrr::map(df1, unlist) # unlist sequences (map() comes from the purrr package)
df1 <- lapply(df1, data.frame) # convert to df
# rename variables
name <- c("A", "B", "C")
for (i in seq_along(df1)) {
names(df1[i]) <- name[i]
}
The last for loop does not work to rename variables. When I type names(df1$A) I still get "X..i..". The output I would like from names(df1$A) is "A".
Does anyone have any thoughts on how to rename these variables? Thanks!
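You need to use [[ ]] when extracting an element from a list. df1[i] returns a one-element list, whereas df1[[i]] returns the data frame itself, so only the latter lets you change the data frame's column name: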
for (i in seq_along(df1)) {
names(df1[[i]]) <- name[i]
}
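To see why the single bracket fails, it can help to compare what each form returns. A small sketch, using a made-up list lst rather than the data from the question:

```r
# A toy list of one-column data frames (hypothetical stand-in for df1)
lst <- list(A = data.frame(v = 1:3), B = data.frame(v = 4:6))

class(lst[1])    # single bracket: still a list (of length one)
class(lst[[1]])  # double bracket: the data frame stored in the list

# With [[ ]] the assignment reaches the data frame's own column names
for (i in seq_along(lst)) {
  names(lst[[i]]) <- names(lst)[i]
}
names(lst$A)
names(lst$B)
```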
Alternatively you could change how you create the list so you don't have to rename after the fact
df1 <- split(x = df1_raw, f = df1_raw$name) # split data by name
df1 <- lapply(df1, function(x) Map(seq.int, x$start, x$end)) # generate sequence intervals
df1 <- purrr::map(df1, unlist) # unlist sequences (map() comes from the purrr package)
df1 <- Map(function(x,name) {as.data.frame(setNames(list(x), name))}, df1, names(df1))
I think the solution by @MrFlick is enough to fix the renaming inside the for loop.
Here is a base R workaround that may work for you
lapply(
split(df1_raw, df1_raw$name),
function(x) {
with(
x,
setNames(
# SIMPLIFY = FALSE keeps a list even when all sequences have the same length
data.frame(unlist(mapply(seq, start, end, SIMPLIFY = FALSE))),
unique(name)
)
)
}
)
which gives
$A
A
1 1
2 2
3 3
4 4
5 5
6 6
7 5
8 6
9 7
10 8
$B
B
1 3
2 4
3 5
4 1
5 2
6 3
7 4
8 5
9 6
10 7
$C
C
1 4
2 5
3 6
4 7
5 2
6 3
7 4
8 5
9 6
In Stata, I can create a variable after or before another one. E.g. gen age=., after(sex)
I would like to do the same in R. Is it possible?
My data set has 300 variables, so I don't want to count columns to find the numeric position, and the position might also change from time to time.
You could do:
library(tibble)
data <- data.frame(a = c(1,2,3), b = c(1,2,3), c = c(1,2,3))
add_column(data, d = "", .after = "b")
# a b d c
# 1 1 1
# 2 2 2
# 3 3 3
Or another way could be:
data.frame(append(data, list(d = ""), after = match("b", names(data))))
First add the new column to the end of your data frame. Then, find the index of the column after which you want that new column to actually appear, and interpolate it:
df$new_col <- ...
index <- match("col_before", names(df))
# append() also handles the case where "col_before" is the last original column
df <- df[, append(setdiff(names(df), "new_col"), "new_col", after = index)]
Sample:
df <- data.frame(v1=c(1:3), v2=c(4:6), v3=c(7:9))
df$new_col <- c(7,7,7)
index <- match("v2", names(df))
df <- df[, append(setdiff(names(df), "new_col"), "new_col", after = index)]
df
v1 v2 new_col v3
1 1 4 7 7
2 2 5 7 8
3 3 6 7 9
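The same idea can be packaged as a small base R helper so the column is placed by name rather than by counted position. This is only a sketch: the function name insert_after and its arguments are invented for illustration, and it assumes the new column name is not already present in the data frame.

```r
# Hypothetical helper: add column `new_name` directly after column `after`
insert_after <- function(df, new_name, values, after) {
  index <- match(after, names(df))
  stopifnot(!is.na(index))                 # the anchor column must exist
  old_names <- names(df)
  df[[new_name]] <- values                 # first add the column at the end
  # then rebuild the column order with the new name slotted in after `index`
  df[, append(old_names, new_name, after = index), drop = FALSE]
}

df <- data.frame(v1 = 1:3, v2 = 4:6, v3 = 7:9)
names(insert_after(df, "age", NA, after = "v1"))
names(insert_after(df, "age", NA, after = "v3"))  # also fine for the last column
```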
I have a list of variable names in a character vector v and a data table my.dt that contains all of these variables.
> v
[1] "var1" "var2" "var3"
I want to use the variables whose names are in the vector v to create a new variable that is the cbind of these three, or of however many names appear in v, like:
new <- cbind(my.dt[,"var1"],my.dt[,"var2"],my.dt[,"var3"])
new1 <- rowSums(new, na.rm=TRUE) * ifelse(rowSums(is.na(new)) == ncol(new), NA, 1)
How can I do this, bearing in mind that the number of variables is not fixed, so I don't want to refer to each element as v[1], v[2], etc.?
Based on your comment, you are working on a data.table. You will need to add with = FALSE to the code as follows.
library(data.table)
my.dt <- data.table( ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18 )
v <- c("a", "ID")
my.dt[, v, with = FALSE]
# a ID
# 1: 1 b
# 2: 2 b
# 3: 3 b
# 4: 4 a
# 5: 5 a
# 6: 6 c
Notice that if you are working on a data frame, you don't need with = FALSE.
my.dt <- data.frame( ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18 )
v <- c("a", "ID")
my.dt[, v]
# a ID
# 1 1 b
# 2 2 b
# 3 3 b
# 4 4 a
# 5 5 a
# 6 6 c
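Putting the selection together with the row-sum expression from the question might look like the following. It is a sketch under assumptions: the values in my.dt are made up, and as.matrix is added here so that rowSums and is.na operate on a plain numeric matrix.

```r
library(data.table)

# Made-up data with the variable names from the question
my.dt <- data.table(var1 = c(1, NA, 3),
                    var2 = c(NA, NA, 4),
                    var3 = c(5, NA, 6))
v <- c("var1", "var2", "var3")

# Select whatever columns are named in v, however many there are
new <- as.matrix(my.dt[, v, with = FALSE])

# Row sums that ignore NA, but stay NA when every value in the row is NA
new1 <- rowSums(new, na.rm = TRUE) *
  ifelse(rowSums(is.na(new)) == ncol(new), NA, 1)
new1
```

Nothing in this code refers to v[1], v[2] and so on, so it should keep working when v grows or shrinks.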
I have a long list of data frames (e.g., hundreds) named d1, d2, d3, ..., d100. I want to combine them in R, something like df <- cbind(d1:d100). Is there an efficient way of combining them without typing every data frame name?
You could first pack all your data frames into a list and then cbind them using do.call. Here I am assuming your data frames are called d1, d2, ... and that they all have the same number of rows:
## Sample data:
d1 <- data.frame(A = 1:3, B = 4:6)
d2 <- data.frame(C = 7:9)
d3 <- data.frame(D = 10:12, E = 13:15)
## Put them into a list:
myList <- lapply(1:3, function(ii){get(paste0("d", ii))})
## Combine them into one big data frame:
myDataFrame <- do.call('cbind', myList)
myDataFrame
# A B C D E
# 1 1 4 7 10 13
# 2 2 5 8 11 14
# 3 3 6 9 12 15
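The get call inside lapply can also be written with mget, which looks up several objects by name in one step. A sketch under the same assumptions as above (data frames named d1, d2, ... with equal row counts); unname is used because cbind would otherwise prefix the columns of multi-column data frames with the list names.

```r
# Sample data frames named d1, d2, d3, as in the answer above
d1 <- data.frame(A = 1:3, B = 4:6)
d2 <- data.frame(C = 7:9)
d3 <- data.frame(D = 10:12, E = 13:15)

# mget() returns the named objects as a named list
df_names <- paste0("d", 1:3)
myList <- mget(df_names, envir = environment())

# Drop the list names so the original column names are kept as they are
myDataFrame <- do.call(cbind, unname(myList))
names(myDataFrame)
```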
So, I have several dataframes like this
1 2 a
2 3 b
3 4 c
4 5 d
3 5 e
......
1 2 j
2 3 i
3 4 t
3 5 r
.......
2 3 t
2 4 g
6 7 i
8 9 t
......
I want to merge all of these files into a single file that shows, for each pair of values in columns 1 and 2, the values of the third column from every file, with 0 where a file does not contain that pair.
Since there are three files here (in practice there are more), the output would be:
1 2 aj0
2 3 bit
3 4 ct0
4 5 d00
3 5 er0
6 7 00i
8 9 00t
......
What I did was combine all my text .txt files in a single list.
Then,
L <- lapply(seq_along(L), function(i) {
L[[i]][, paste0('DF', i)] <- 1
L[[i]]
})
This adds an indicator column to each data frame, which marks the presence of a value once the data frames are merged.
I don't know how to proceed further. Any inputs will be great. Thanks!
Here is one way to do it with Reduce
# function to generate dummy data
gen_data <- function(){
data.frame(
x = 1:3,
y = 2:4,
z = sample(LETTERS, 3, replace = TRUE),
stringsAsFactors = FALSE # keep z as character so ifelse() returns letters, not factor codes
)
}
# generate list of data frames to merge
L <- lapply(1:3, function(x) gen_data())
# function to merge by x and y and concatenate z
f <- function(x, y){
d <- merge(x, y, by = c('x', 'y'), all = TRUE)
# set merged column to zero if no match is found
d[['z.x']] = ifelse(is.na(d[['z.x']]), 0, d[['z.x']])
d[['z.y']] = ifelse(is.na(d[['z.y']]), 0, d[['z.y']])
d$z <- paste0(d[['z.x']], d[['z.y']])
d['z.x'] <- d['z.y'] <- NULL
return(d)
}
# merge data frames
Reduce(f, L)
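One thing to watch with pasting inside each Reduce step: a key pair that first appears in a later file gets a single "0" for everything merged before it, however many files that was. A variant that keeps one z column per file and pastes only at the end avoids this. The sketch below uses the three example files from the question; the column names x, y, z and the helper names are assumptions, and it returns every key pair found in any file.

```r
# The three example files from the question, with assumed column names
L <- list(
  data.frame(x = c(1, 2, 3, 4, 3), y = c(2, 3, 4, 5, 5),
             z = c("a", "b", "c", "d", "e"), stringsAsFactors = FALSE),
  data.frame(x = c(1, 2, 3, 3), y = c(2, 3, 4, 5),
             z = c("j", "i", "t", "r"), stringsAsFactors = FALSE),
  data.frame(x = c(2, 2, 6, 8), y = c(3, 4, 7, 9),
             z = c("t", "g", "i", "t"), stringsAsFactors = FALSE)
)

# Give each file's value column a unique name: z1, z2, z3, ...
zcols <- paste0("z", seq_along(L))
L <- lapply(seq_along(L), function(i) {
  names(L[[i]])[names(L[[i]]) == "z"] <- zcols[i]
  L[[i]]
})

# Full outer merge on the key pair, one file at a time
merged <- Reduce(function(a, b) merge(a, b, by = c("x", "y"), all = TRUE), L)

# Replace missing values with "0", then paste the per-file columns together
merged[zcols][is.na(merged[zcols])] <- "0"
merged$z <- do.call(paste0, unname(as.list(merged[zcols])))
merged <- merged[c("x", "y", "z")]
merged
```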