I'm working from this answer and trying to optimize the second argument to plyr::rename, as suggested by Jared.
In short, they rename some columns in a data frame using plyr like this:
df <- data.frame(col1=1:3,col2=3:5,col3=6:8)
df
newNames <- c("new_col1", "new_col2", "new_col3")
oldNames <- names(df)
require(plyr)
df <- rename(df, c("col1"="new_col1", "col2"="new_col2", "col3"="new_col3"))
df
In passing Jared writes '[a]nd you can be creative in making that second argument to rename so that it is not so manual.'
I've tried being creative like this,
df <- data.frame(col1=1:3,col2=3:5,col3=6:8)
df
secondArgument <- paste0('"', oldNames, '"','=', '"',newNames, '"',collapse = ',')
df <- rename(df, secondArgument)
df
But it does not work. Can anyone help me automate this?
Thanks!
Update Sun Sep 9 11:55:42PM
I realized I should have been more specific in my question.
I'm using plyr::rename because, in my real-life example, I have other variables and I don't always know the position of the variables I want to rename.
My case looks like this, but with 100+ variables:
df2 <- data.frame(col1=1:3,col2=3:5,col3=6:8)
df2
df2 <- rename(df2, c("col1"="new_col1", "col3"="new_col3"))
df2
df2 <- data.frame(col1=1:3,col2=3:5,col3=6:8)
df2
newNames <- c("new_col1", "new_col3")
oldNames <- names(df[,c('col1', 'col3')])
secondArgument <- paste0('"', oldNames, '"','=', '"',newNames, '"',collapse = ',')
df2 <- rename(df2, secondArgument)
df2
Please add a comment if there is anything I need to clarify.
Solution to modified question:
df2 <- data.frame(col1=1:3,col2=3:5,col3=6:8)
df2
newNames <- c("new_col1", "new_col3")
oldNames <- names(df2[,c('col1', 'col3')])
(Isn't oldNames equal to c('col1', 'col3') by definition?)
Solution with plyr:
secondArgument <- setNames(newNames,oldNames)
library(plyr)
df2 <- rename(df2, secondArgument)
df2
Or in base R you could do:
names(df2)[match(oldNames,names(df2))] <- newNames
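For what it's worth, here is a small base R sketch of why the paste0() attempt in the question cannot work and what the named vector looks like. The variable names as_string, mapping and hits are made up for illustration, and this only mimics the rename with base R; it is not how plyr implements it.

```r
df2 <- data.frame(col1 = 1:3, col2 = 3:5, col3 = 6:8)
oldNames <- c("col1", "col3")
newNames <- c("new_col1", "new_col3")

# paste0(..., collapse = ",") glues everything into ONE string.
# It is text that looks like code, not a named character vector.
as_string <- paste0('"', oldNames, '"="', newNames, '"', collapse = ",")
length(as_string)   # 1
names(as_string)    # NULL

# setNames() builds the mapping: values are the new names, names are the old ones.
mapping <- setNames(newNames, oldNames)

# Base R lookup: rename only the columns that appear in the mapping.
hits <- names(df2) %in% names(mapping)
names(df2)[hits] <- unname(mapping[names(df2)[hits]])
names(df2)          # "new_col1" "col2" "new_col3"
```

Because the lookup goes by name, the position of the columns in the data frame does not matter.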
Set the names on newNames to the names from oldNames:
R> names(newNames) <- oldNames
R> newNames
col1 col2 col3
"new_col1" "new_col2" "new_col3"
R> df <- rename(df, newNames)
R> df
new_col1 new_col2 new_col3
1 1 3 6
2 2 4 7
3 3 5 8
plyr::rename requires a named character vector, with new names as values, and old names as names.
This should work:
names(newNames) <- oldNames
df <- rename(df, newNames)
df
new_col1 new_col2 new_col3
1 1 3 6
2 2 4 7
3 3 5 8
I know that you can apply a function to a list of data frames, but I cannot work out this case:
I have a list of n dataframes, let's say from df1 to df4 (numbered accordingly).
df1 <- data.frame(name= c("mark", "peter", "lily"), column1= c(1,2,3),column2= c(4,5,6))
df2 <- data.frame(name= c("mark", "liam", "peter"), column1= c(7,8,9),column2= c(1,2,3))
df3 <- data.frame(name= c("felix", "liam", "peter"), column1= c(3,5,8),column2= c(1,5,8))
df4 <- data.frame(name= c("felix", "lily", "liam"), column1= c(6,2,6),column2= c(4,2,2))
Now I store my dfs in a list:
df_list = mget(ls(pattern = "df[1-4]"))
Then, I have these functions:
df_combined <- dfx %>%
left_join(dfy, by="name") %>%
mutate(combined=(column1.x + column2.x)/column2.y) %>%
filter(!is.na(combined)) %>%
select(name,combined)
add_match_column<-function(df){
df %>% left_join(df_combined)
}
df_list_matched <- df_list %>%
map(add_match_column)
Now is there a way to apply this function to consecutive dfs?
I.e. in the first "iteration" is dfx = df1, dfy = df2 and in the following "iteration" dfx = df2, dfy = df3 and so on...
Please keep in mind that I have way more dfs in reality, numbered by years.
Edits:
If dfx is the last df of the list, then the code should stop there not making any further iterations
The output should be that the result of the df_combined function is a new column in every dfy. Thus, the first df in my list is left out.
Here is a base R solution using lapply for iteration:
df_list_match <- function(df1, df2){
new <- merge(df1, df2, by="name", all.y=T)
new$combined <- (new$column1.x + new$column2.x)/(new$column2.y)
new <- new[!is.na(new$combined), c(1,4,5,6)]
names(new) <- c("name", "column1", "column2", "combined")
return(new)
}
result <- lapply(2:length(df_list), function(x) {df_list_match(df_list[[x-1]],df_list[[x]]) })
result
[[1]]
name column1 column2 combined
1 mark 7 1 5.000000
2 peter 9 3 2.333333
[[2]]
name column1 column2 combined
1 liam 5 5 2.0
2 peter 8 8 1.5
[[3]]
name column1 column2 combined
1 felix 6 4 1
2 liam 6 2 5
If you want to have all original entries from the data.frame (display NA if an entry in dfy is not in dfx) you can just delete !is.na(new$combined) in the function.
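An alternative sketch for the pairing itself, in case it is useful: instead of indexing with x-1 and x, you can hand Map() the list without its last element and the list without its first element, so each call receives one consecutive pair. The helper name combine_pair is made up here, and the data is a three-frame subset of the example from the question.

```r
df1 <- data.frame(name = c("mark", "peter", "lily"),  column1 = c(1, 2, 3), column2 = c(4, 5, 6))
df2 <- data.frame(name = c("mark", "liam", "peter"),  column1 = c(7, 8, 9), column2 = c(1, 2, 3))
df3 <- data.frame(name = c("felix", "liam", "peter"), column1 = c(3, 5, 8), column2 = c(1, 5, 8))
df_list <- list(df1 = df1, df2 = df2, df3 = df3)

# Same logic as the merge-based function above, selecting columns by name.
combine_pair <- function(dfx, dfy) {
  m <- merge(dfx, dfy, by = "name", all.y = TRUE)
  m$combined <- (m$column1.x + m$column2.x) / m$column2.y
  m <- m[!is.na(m$combined), c("name", "column1.y", "column2.y", "combined")]
  names(m) <- c("name", "column1", "column2", "combined")
  m
}

# head(, -1) drops the last element, tail(, -1) drops the first,
# so Map() walks over (df1, df2), (df2, df3), ... and stops by itself.
pairs <- Map(combine_pair, head(df_list, -1), tail(df_list, -1))

# Label each result after the dfy it belongs to.
names(pairs) <- names(df_list)[-1]
pairs
```

The first data frame is left out of the result automatically, because it never appears as dfy.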
I have ...
X <- list(df1 <- data.frame(A=c("CC","CC(=O)C","CC(=O)O"),conc=c(1.0,2.2,1.0),Vol=c(2,4.5,6),stringsAsFactors=FALSE), df2 <- data.frame(A=c("COC","O=CC"),conc=c(2.0,3.2),Vol=c(10,23),stringsAsFactors=FALSE))
X
[[1]]
A conc Vol
1 CC 1.0 2.0
2 CC(=O)C 2.2 4.5
3 CC(=O)O 1.0 6.0
[[2]]
A conc Vol
1 COC 2.0 10
2 O=CC 3.2 23
and I want to change the first column to rownames.
I have tried the obvious,
X <-lapply(X,function(a) {rownames(a) <- a$A})
but this doesn't work ...
I began experimenting with the mapply function but I didn't get very far ...
I'm assuming that when you said "I want to change the first column to rownames", you meant that you want to define the rownames from column A and then drop column A:
lapply(X, function(df) {
#new df without rowname col
df_out <- df[,-1]
#set rownames as first col from input df
rownames(df_out) <- df[[1]]
df_out
})
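As a side note, here is a small sketch of why the attempt in the question does not do what was hoped. The value of an assignment expression is its right-hand side, so the anonymous function returns the character vector a$A rather than the modified data frame. Returning the data frame explicitly fixes that. The names bad and good are just for illustration, and the data is a shortened version of the example.

```r
X <- list(
  data.frame(A = c("CC", "CC(=O)C"), conc = c(1.0, 2.2), stringsAsFactors = FALSE),
  data.frame(A = c("COC", "O=CC"),   conc = c(2.0, 3.2), stringsAsFactors = FALSE)
)

# The last expression is the assignment, so each element becomes a$A.
bad <- lapply(X, function(a) { rownames(a) <- a$A })
class(bad[[1]])     # "character"

# Return the modified data frame as the last expression instead.
good <- lapply(X, function(a) { rownames(a) <- a$A; a })
rownames(good[[1]]) # "CC" "CC(=O)C"
```

This version keeps column A; dropping it as well is what the code above does.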
If you want to set column A to the current rownames, then the code in @Sotos's comment is a possible answer.
This is possible using dplyr and tibble's column_to_rownames:
X <- list(df1 <- data.frame(A=c("CC","CC(=O)C","CC(=O)O"),conc=c(1.0,2.2,1.0),Vol=c(2,4.5,6),stringsAsFactors=FALSE), df2 <- data.frame(A=c("COC","O=CC"),conc=c(2.0,3.2),Vol=c(10,23),stringsAsFactors=FALSE))
library(dplyr)
library(tibble)
X <- lapply(X, column_to_rownames, "A")
I have a data frame, say df. I extracted a 5% sample of its rows and created a new data frame, df1, to do a few manipulations on the data. Now I need to put df1 back into df, overwriting the corresponding existing rows, since df1 is a subset of df.
I tried to extract the rows of df that are not present in df1 using
df2 <- subset(df, !(rownames(df) %in% rownames(df1[])))
But this didn't work.
Can anyone help, please?
Save the filter and re-use it like so
set.seed(357)
xy <- data.frame(col1 = letters[1:5], col2 = runif(5))
col1 col2
1 a 0.10728121
2 b 0.05504568
3 c 0.27987766
4 d 0.22486212
5 e 0.65348521
your.condition <- xy$col1 %in% c("c", "d")
newxy1 <- xy[your.condition, ]
newxy1$col2 <- 1:2
xy[your.condition, "col2"] <- newxy1$col2
xy
col1 col2
1 a 0.10728121
2 b 0.05504568
3 c 1.00000000
4 d 2.00000000
5 e 0.65348521
You should always try to make a reproducible example so that it is easy for others to help you.
I have tried to do that here with the help of the mtcars dataset:
#Copied mtcars data into df
df = mtcars
# sample 5 rows from df
df1 = df[sample(1:nrow(df), 5), ]
# did few manipulations in the dataset
df1 = df1 * 2
# overwrite the existing rows of df1 as it is a subset of df
df[rownames(df1), ] <- df1
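If your data has no meaningful row names, a sketch of the same idea is to keep the sampled row positions and reuse them for the write-back. The names idx and before are made up for this example, and the toy data only stands in for your real df.

```r
set.seed(1)
df <- data.frame(id = 1:20, value = rnorm(20))
before <- df                      # copy kept only to check the result

# Remember WHICH rows were sampled, not just the sampled rows.
idx <- sample(nrow(df), 3)
df1 <- df[idx, ]

# Some manipulation on the subset.
df1$value <- 0

# Write the subset back into the same positions.
df[idx, ] <- df1

all(df$value[idx] == 0)                  # sampled rows were overwritten
identical(df[-idx, ], before[-idx, ])    # all other rows are unchanged
```

This assumes df is not reordered or filtered between taking the sample and writing it back.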
Objective: change the column names of all the data frames in the global environment to the names in the following list.
0) The Column names are:
colnames = c("USAF","WBAN","YR--MODAHRMN")
1) I have the following data.frames: df1, df2.
2) I put them in a list:
dfList <- list(df1,df2)
3) Loop through the list:
for (df in dfList){
colnames(df)=colnames
}
But this creates a new df with the column names that I need, it doesn't change the original column names in df1, df2. Why? Could lapply be a solution? Thanks
Can something like:
lapply(dfList, function(x) {colnames(dfList)=colnames})
work?
With lapply you can do it as follows.
Create sample data:
df1 <- data.frame(A = 1, B = 2, C = 3)
df2 <- data.frame(X = 1, Y = 2, Z = 3)
dfList <- list(df1,df2)
colnames <- c("USAF","WBAN","YR--MODAHRMN")
Then, lapply over the list using setNames and supply the vector of new column names as second argument to setNames:
lapply(dfList, setNames, colnames)
#[[1]]
# USAF WBAN YR--MODAHRMN
#1 1 2 3
#
#[[2]]
# USAF WBAN YR--MODAHRMN
#1 1 2 3
Edit
If you want to assign the data.frames back to the global environment, you can modify the code like this:
dfList <- list(df1 = df1, df2 = df2)
list2env(lapply(dfList, setNames, colnames), .GlobalEnv)
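If you want to see what list2env does before letting it overwrite objects in your workspace, a cautious sketch is to point it at a throwaway environment first. The names sandbox, renamed and new_names are hypothetical; the list has to be named, because the list names become the object names.

```r
df1 <- data.frame(A = 1, B = 2, C = 3)
df2 <- data.frame(X = 1, Y = 2, Z = 3)
new_names <- c("USAF", "WBAN", "YR--MODAHRMN")

# Named list: the names decide which objects get created or overwritten.
dfList <- list(df1 = df1, df2 = df2)
renamed <- lapply(dfList, setNames, new_names)

# 'sandbox' is a scratch environment used here instead of .GlobalEnv.
sandbox <- new.env()
list2env(renamed, envir = sandbox)

names(sandbox$df1)  # "USAF" "WBAN" "YR--MODAHRMN"
names(df1)          # "A" "B" "C", the original is untouched
```

Swapping sandbox for .GlobalEnv gives the behaviour shown above, where df1 and df2 themselves are replaced.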
Just change your for-loop into an index for-loop like this:
Data
df1 <- data.frame(a=runif(5), b=runif(5), c=runif(5))
df2 <- data.frame(a=runif(5), b=runif(5), c=runif(5))
dflist <- list(df1,df2)
colnames = c("USAF","WBAN","YR--MODAHRMN")
Solution
for (i in seq_along(dflist)){
colnames(dflist[[i]]) <- colnames
}
Output
> dflist
[[1]]
USAF WBAN YR--MODAHRMN
1 0.8794153 0.7025747 0.2136040
2 0.8805788 0.8253530 0.5467952
3 0.1719539 0.5303908 0.5965716
4 0.9682567 0.5137464 0.4038919
5 0.3172674 0.1403439 0.1539121
[[2]]
USAF WBAN YR--MODAHRMN
1 0.20558383 0.62651334 0.4365940
2 0.43330717 0.85807280 0.2509677
3 0.32614750 0.70782919 0.6319263
4 0.02957656 0.46523151 0.2087086
5 0.58757198 0.09633181 0.6941896
By using for (df in dfList) you are essentially creating a new object df on each iteration and changing the column names of that copy, which leaves the original list (dfList) untouched.
If you want the for loop to work, iterate over the indices of the list rather than over the data frames themselves:
for (df in 1:length(dfList))
colnames(dfList[[df]]) <- colnames
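A quick sketch to see the difference side by side. The loop variable is a copy of each list element, so renaming it changes nothing in the list, whereas assigning through dfList[[i]] does. The name new_names is used here only to avoid shadowing the colnames function.

```r
dfList <- list(data.frame(a = 1, b = 2, c = 3),
               data.frame(x = 1, y = 2, z = 3))
new_names <- c("USAF", "WBAN", "YR--MODAHRMN")

# Element loop: 'df' is a copy, the list itself is not modified.
for (df in dfList) {
  names(df) <- new_names
}
names(dfList[[1]])  # "a" "b" "c"

# Index loop: assigns into the list element directly.
for (i in seq_along(dfList)) {
  names(dfList[[i]]) <- new_names
}
names(dfList[[1]])  # "USAF" "WBAN" "YR--MODAHRMN"
```

Note that even the index loop only changes the copies stored in dfList, not the separate objects df1 and df2 in the workspace.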
dfList <- lapply(dfList, `names<-`, colnames)
Create the sample data:
df1 <- data.frame(A = 1, B = 2, C = 3)
df2 <- data.frame(X = 1, Y = 2, Z = 3)
dfList <- list(df1,df2)
name <- c("USAF","WBAN","YR--MODAHRMN")
Then create a function to set the colnames:
res=lapply(dfList, function(x){colnames(x)=c(name);x})
[[1]]
USAF WBAN YR--MODAHRMN
1 1 2 3
[[2]]
USAF WBAN YR--MODAHRMN
1 1 2 3
A tidyverse solution with rename_with:
library(dplyr)
library(purrr)
map(dflist, ~ rename_with(., ~ colnames))
Or, if it's only for one column:
map(dflist, ~ rename(., new_col = old_col))
This also works with lapply:
lapply(dflist, rename_with, ~ colnames)
lapply(dflist, rename, new_col = old_col)
Every week I receive an incomplete dataset for an analysis. It looks like this:
df1 <- data.frame(var1 = c("a","","","b",""),
var2 = c("x","y","z","x","z"))
Some var1 values are missing. The dataset should end up looking like this:
df2 <- data.frame(var1 = c("a","a","a","b","b"),
var2 = c("x","y","z","x","z"))
Currently I use an Excel macro to do this. But this makes it harder to automate the analysis. From now on I would like to do this in R. But I have no idea how to do this.
Thanks for your help.
QUESTION UPDATE AFTER COMMENT
var2 is not relevant for my question. The only thing I am trying to do is get from df1 to df2.
df1 <- data.frame(var1 = c("a","","","b",""))
df2 <- data.frame(var1 = c("a","a","a","b","b"))
Here is one way of doing it by making use of run-length encoding (rle) and its inverse, inverse.rle:
fillTheBlanks <- function(x, missing=""){
rle <- rle(as.character(x))
empty <- which(rle$values==missing)
rle$values[empty] <- rle$values[empty-1]
inverse.rle(rle)
}
df1$var1 <- fillTheBlanks(df1$var1)
The results:
df1
var1 var2
1 a x
2 a y
3 a z
4 b x
5 b z
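To make the mechanics a bit more visible, here is the same idea stepped through on the example vector. The variable names x, r, empty and filled are just for illustration.

```r
x <- c("a", "", "", "b", "")

# Run-length encoding: one entry per run of identical values.
r <- rle(x)
r$values    # "a" ""  "b" ""
r$lengths   # 1 2 1 1

# Runs of blanks take the value of the run immediately before them.
empty <- which(r$values == "")
r$values[empty] <- r$values[empty - 1]

# Expanding the runs again gives the filled vector.
filled <- inverse.rle(r)
filled      # "a" "a" "a" "b" "b"
```

One thing to keep in mind: this relies on the vector not starting with a blank, since the first run has no previous run to copy from.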
Here is a simpler way:
library(zoo)
df1$var1[df1$var1 == ""] <- NA
df1$var1 <- na.locf(df1$var1)
The tidyr package has the fill() function, which does the trick.
library(tidyr)
df1 <- data.frame(var1 = c("a",NA,NA,"b",NA), stringsAsFactors = FALSE)
df1 %>% fill(var1)
Here is another way which is slightly shorter and doesn't coerce to character:
Fill <- function(x,missing="")
{
Log <- x != missing
y <- x[Log]
y[cumsum(Log)]
}
Results:
# For factor:
Fill(df1$var1)
[1] a a a b b
Levels: a b
# For character:
Fill(as.character(df1$var1))
[1] "a" "a" "a" "b" "b"
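In case the cumsum trick looks like magic, here is a short walk-through on a character vector. The names x, present, kept and filled are made up for the example.

```r
x <- c("a", "", "", "b", "")

# TRUE wherever a real value is present.
present <- x != ""
present           # TRUE FALSE FALSE TRUE FALSE

# Counting the TRUEs so far tells each position which
# non-missing value it belongs to.
cumsum(present)   # 1 1 1 2 2

# Keep only the real values, then index them by that counter.
kept <- x[present]
filled <- kept[cumsum(present)]
filled            # "a" "a" "a" "b" "b"
```

If the vector starts with a blank, the counter begins at 0, and indexing with 0 drops that position, so the result comes out shorter than the input. It is worth checking for that case before assigning the result back into a data frame column.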
Below is my unfill function. I encountered the same problem, so I hope it helps.
unfill <- function(df,cols){
col_names <- names(df)
unchanged <- df[!(names(df) %in% cols)]
changed <- df[names(df) %in% cols] %>%
map_df(function(col){
col[col == col %>% lag()] <- NA
col
})
unchanged %>% bind_cols(changed) %>% select(one_of(col_names))
}