List of Dataframes First Column to rownames - r

I have ...
X <- list(df1 <- data.frame(A=c("CC","CC(=O)C","CC(=O)O"),conc=c(1.0,2.2,1.0),Vol=c(2,4.5,6),stringsAsFactors=FALSE), df2 <- data.frame(A=c("COC","O=CC"),conc=c(2.0,3.2),Vol=c(10,23),stringsAsFactors=FALSE))
X
[[1]]
A conc Vol
1 CC 1.0 2.0
2 CC(=O)C 2.2 4.5
3 CC(=O)O 1.0 6.0
[[2]]
A conc Vol
1 COC 2.0 10
2 O=CC 3.2 23
and I want to change the first column to rownames.
I have tried the obvious,
X <-lapply(X,function(a) {rownames(a) <- a$A})
but this doesn't work ...
I began experimenting with the mapply function but I didn't get very far ...

I assume that by "I want to change the first column to rownames" you mean that you want to use column A as the row names and then drop column A.
lapply(X, function(df) {
#new df without rowname col
df_out <- df[, -1, drop = FALSE]
#set rownames as first col from input df
rownames(df_out) <- df[[1]]
df_out
})
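For context on why the attempt in the question fails: a function returns the value of its last expression, and the value of an assignment such as rownames(a) <- a$A is its right-hand side, so lapply collects the character vectors rather than the modified data frames. A minimal sketch of the difference (the names broken and fixed are only illustrative, not from the original post):

```r
X <- list(
  data.frame(A = c("CC", "CC(=O)C", "CC(=O)O"), conc = c(1.0, 2.2, 1.0),
             Vol = c(2, 4.5, 6), stringsAsFactors = FALSE),
  data.frame(A = c("COC", "O=CC"), conc = c(2.0, 3.2),
             Vol = c(10, 23), stringsAsFactors = FALSE)
)

# The assignment is the last expression, so its value (a$A) is what gets returned
broken <- lapply(X, function(a) { rownames(a) <- a$A })

# Returning the data frame explicitly keeps the modification
fixed <- lapply(X, function(a) { rownames(a) <- a$A; a })

class(broken[[1]])    # a character vector, not a data frame
rownames(fixed[[1]])  # the values of column A
```

Ending the function body with the data frame itself is the only change needed; dropping column A afterwards is a separate step.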
If you instead want to set column A to the current rownames, then the code in @Sotos's comment is a possible answer.

This is possible using column_to_rownames from the tibble package:
X <- list(df1 <- data.frame(A=c("CC","CC(=O)C","CC(=O)O"),conc=c(1.0,2.2,1.0),Vol=c(2,4.5,6),stringsAsFactors=FALSE), df2 <- data.frame(A=c("COC","O=CC"),conc=c(2.0,3.2),Vol=c(10,23),stringsAsFactors=FALSE))
library(dplyr)
library(tibble)
X <- lapply(X, column_to_rownames, "A")

Related

Extract and append data to new datasets in a for loop

I have what I think is a really simple question, but I can't figure out how to do it. I'm fairly new to lists, loops, etc.
I have a small dataset:
df <- c("one","two","three","four")
df <- as.data.frame(df)
df
I need to loop through this dataset and create a list of datasets, such that this is the outcome:
[[1]]
one
[[2]]
one
two
[[3]]
one
two
three
This is more or less as far as I've gotten:
blah <- list()
for(i in 1:3){
blah[[i]]<- i
}
The length will be variable when I use this in the future, so I need to automate it in a loop. Otherwise, I would just do
one <- df[1,]
two <- df[2,]
list(one, rbind(one, two))
Any ideas?
You can try using lapply:
result <- lapply(seq(nrow(df)), function(x) df[seq_len(x), , drop = FALSE])
result
#[[1]]
# df
#1 one
#[[2]]
# df
#1 one
#2 two
#[[3]]
# df
#1 one
#2 two
#3 three
#[[4]]
# df
#1 one
#2 two
#3 three
#4 four
seq(nrow(df)) creates a sequence from 1 to the number of rows in your data (4 in this case). function(x) is an anonymous function to which each value from 1 to 4 is passed in turn. seq_len(x) creates a sequence from 1 to x, i.e. 1 to 1 in the first iteration, 1 to 2 in the second, and so on. We use this sequence to subset the rows of the data frame (df[seq_len(x), ]). Because the data frame has only one column, subsetting it would simplify the result to a vector; adding drop = FALSE prevents that.
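The effect of drop = FALSE can be checked directly. This is a small sketch using the same one-column data; the names as_vector and as_frame are only illustrative:

```r
df <- data.frame(df = c("one", "two", "three", "four"))

# Without drop = FALSE, a single-column result is simplified to a vector
# (a factor in R versions before 4.0, where strings default to factors)
as_vector <- df[seq_len(2), ]

# With drop = FALSE, the result stays a one-column data frame
as_frame <- df[seq_len(2), , drop = FALSE]

is.data.frame(as_vector)
is.data.frame(as_frame)
```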
Base R solution:
# Coerce the df column to character and store as a new data.frame: str_df => data.frame
str_df <- transform(df, df = as.character(df))
# Split the character version of the data.frame into a list as required
# (lapply builds the list itself, so no pre-allocation is needed):
# df_list => list of character vectors
df_list <- lapply(seq_len(nrow(str_df)), function(i) {
  str_df[seq_len(i), "df"]
})
Data:
df <- c("one","two","three","four")
df <- as.data.frame(df)
df

Speeding up an R for loop to paste multiple variables together

I'm new here but could use some help. I have a list of data frames, and for each element within my list (i.e., data.frame) I want to quickly paste one column in a data set to multiple other columns in the same data set, separated only by a period (".").
So if I have one set of data in a list of data frames:
list1[[1]]
A B C
2 1 5
4 2 2
Then I want the following result:
list1[[1]]
A B C
2.5 1.5 5
4.2 2.2 2
Where C is pasted to A and B individually. I then want this operation to take place for each data frame in my list.
I have tried the following:
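pasteX <- function(df) {
  # seq_len(ncol(df) - 1) avoids the precedence trap in 1:dim(df)[2]-1,
  # which evaluates to 0:(n-1) rather than 1:(n-1)
  for (i in seq_len(ncol(df) - 1)) {
    df[, i] <- as.numeric(sprintf("%s.%s", df[, i], df$C))
  }
  return(df)
}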
list2<-lapply(list1, pasteX)
But this approach is verrrry slow for larger matrices and lists. Any recommendations for making this code faster? Thanks!
Assuming all values are non-negative integers less than 10, plain arithmetic gives the same result without any string pasting:
lapply(list1, function(x){
x[,-3] <- x[,-3] + x[,3]/10
x})
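As a quick check of that assumption, the arithmetic version can be compared with the paste-based result on the sample data from the question. This is only a sketch; list1 is rebuilt here from the values shown above, and by_arithmetic and by_pasting are illustrative names:

```r
list1 <- list(data.frame(A = c(2, 4), B = c(1, 2), C = c(5, 2)))

# Arithmetic version: add C / 10 to every column except the third
by_arithmetic <- lapply(list1, function(x) {
  x[, -3] <- x[, -3] + x[, 3] / 10
  x
})

# Paste-based version, one column at a time
by_pasting <- lapply(list1, function(x) {
  x[1:2] <- lapply(x[1:2], function(col) as.numeric(sprintf("%s.%s", col, x$C)))
  x
})

# all.equal() compares with a numeric tolerance
all.equal(by_arithmetic, by_pasting)
```

The shortcut stops matching the pasted result as soon as C has more than one digit (for example C = 12 would need a division by 100), so it is only a fit for data that meets the stated assumption.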
We can use Map
list1[[1]][-3] <- Map(function(x, y) as.numeric(sprintf('%s.%s', x, y)),
list1[[1]][-3], list1[[1]][3])
If there are many datasets, loop using lapply, convert the first two columns to matrix and paste with the third column, update the output, and return the dataset
lapply(list1, function(x) {
x[1:2] <- as.numeric(sprintf('%s.%s', as.matrix(x[1:2]), x[,3]));
x })
#[[1]]
# A B C
#1 2.5 1.5 5
#2 4.2 2.2 2
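Or using tidyverse (across() is used here in place of mutate_at() with the deprecated funs(); it needs dplyr 1.0.0 or later)
library(tidyverse)
map(list1, function(d) d %>%
  mutate(across(1:2, ~ as.numeric(sprintf('%s.%s', .x, C)))))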
Or with data.table
library(data.table)
lapply(list1, function(x) setDT(x)[, (1:2) :=
lapply(.SD, function(x) as.numeric(sprintf('%s.%s', x, C))) ,
.SDcols = 1:2][])
Try this:
df <- data.frame(a = c(1,2,3), b = c(3,2,1), c = c(2,1,1))
pastex <- function(x){
  # use the argument x, not the global df, so each list element is processed
  m <- sapply(x[,1:2], function(col) as.numeric(paste(col, x$c, sep = '.')))
  m <- as.data.frame(m)
  m <- cbind(m, x["c"])
  return(m)
}
mylist <- list(df1 = df, df2 = df)
lapply(mylist, pastex)

Changing Column Names in a List of Data Frames in R

Objective: change the column names of all the data frames in the global environment to the names in the following list.
0) The Column names are:
colnames = c("USAF","WBAN","YR--MODAHRMN")
1) I have the following data.frames: df1, df2.
2) I put them in a list:
dfList <- list(df1,df2)
3) Loop through the list:
for (df in dfList){
colnames(df)=colnames
}
But this creates a new df with the column names that I need; it doesn't change the original column names in df1 and df2. Why? Could lapply be a solution? Thanks
Can something like:
lapply(dfList, function(x) {colnames(dfList)=colnames})
work?
With lapply you can do it as follows.
Create sample data:
df1 <- data.frame(A = 1, B = 2, C = 3)
df2 <- data.frame(X = 1, Y = 2, Z = 3)
dfList <- list(df1,df2)
colnames <- c("USAF","WBAN","YR--MODAHRMN")
Then, lapply over the list using setNames and supply the vector of new column names as second argument to setNames:
lapply(dfList, setNames, colnames)
#[[1]]
# USAF WBAN YR--MODAHRMN
#1 1 2 3
#
#[[2]]
# USAF WBAN YR--MODAHRMN
#1 1 2 3
Edit
If you want to assign the data.frames back to the global environment, you can modify the code like this:
dfList <- list(df1 = df1, df2 = df2)
list2env(lapply(dfList, setNames, colnames), .GlobalEnv)
Just change your for-loop into an index for-loop like this:
Data
df1 <- data.frame(a=runif(5), b=runif(5), c=runif(5))
df2 <- data.frame(a=runif(5), b=runif(5), c=runif(5))
dflist <- list(df1,df2)
colnames = c("USAF","WBAN","YR--MODAHRMN")
Solution
for (i in seq_along(dflist)){
colnames(dflist[[i]]) <- colnames
}
Output
> dflist
[[1]]
USAF WBAN YR--MODAHRMN
1 0.8794153 0.7025747 0.2136040
2 0.8805788 0.8253530 0.5467952
3 0.1719539 0.5303908 0.5965716
4 0.9682567 0.5137464 0.4038919
5 0.3172674 0.1403439 0.1539121
[[2]]
USAF WBAN YR--MODAHRMN
1 0.20558383 0.62651334 0.4365940
2 0.43330717 0.85807280 0.2509677
3 0.32614750 0.70782919 0.6319263
4 0.02957656 0.46523151 0.2087086
5 0.58757198 0.09633181 0.6941896
With for (df in dfList), each iteration creates a new copy named df and changes the column names of that copy, leaving the original list (dfList) untouched.
If you want the for loop to work, iterate over the positions in the list rather than over the data frames themselves.
for (i in seq_along(dfList)) {
  colnames(dfList[[i]]) <- colnames
}
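The copy behaviour described here can be seen side by side in a short sketch. The vector is called new_names (a hypothetical name) to avoid masking the colnames function, and before is only there to record the intermediate state:

```r
df1 <- data.frame(A = 1, B = 2, C = 3)
df2 <- data.frame(X = 1, Y = 2, Z = 3)
dfList <- list(df1, df2)
new_names <- c("USAF", "WBAN", "YR--MODAHRMN")

# Iterating over elements: df is a copy, so dfList is left unchanged
for (df in dfList) {
  colnames(df) <- new_names
}
before <- names(dfList[[1]])   # still "A" "B" "C"

# Iterating over positions: the assignment targets the list element itself
for (i in seq_along(dfList)) {
  colnames(dfList[[i]]) <- new_names
}
names(dfList[[1]])   # now the new names
names(df1)           # the standalone df1 is still unchanged
```

Note that even the index loop only changes the copies stored in the list; the separate objects df1 and df2 in the global environment keep their old names unless they are assigned back.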
dfList <- lapply(dfList, `names<-`, colnames)
Create the sample data:
df1 <- data.frame(A = 1, B = 2, C = 3)
df2 <- data.frame(X = 1, Y = 2, Z = 3)
dfList <- list(df1,df2)
name <- c("USAF","WBAN","YR--MODAHRMN")
Then create a function to set the colnames:
res=lapply(dfList, function(x){colnames(x)=c(name);x})
[[1]]
USAF WBAN YR--MODAHRMN
1 1 2 3
[[2]]
USAF WBAN YR--MODAHRMN
1 1 2 3
A tidyverse solution with rename_with:
library(dplyr)
library(purrr)
map(dflist, ~ rename_with(., ~ colnames))
Or, if it's only for one column:
map(dflist, ~ rename(., new_col = old_col))
This also works with lapply:
lapply(dflist, rename_with, ~ colnames)
lapply(dflist, rename, new_col = old_col)

R - co-locate columns with the same name after merge

Situation
I have two data frames, df1 and df2, with the same column headings
x <- c(1,2,3)
y <- c(3,2,1)
z <- c(3,2,1)
names <- c("id","val1","val2")
df1 <- data.frame(x, y, z)
names(df1) <- names
a <- c(1, 2, 3)
b <- c(1, 2, 3)
c <- c(3, 2, 1)
df2 <- data.frame(a, b, c)
names(df2) <- names
and I am performing a merge:
#library(dplyr) # not needed for merge
joined_df <- merge(x=df1, y=df2, by="id", all=TRUE)
This gives me the columns in the joined_df as id, val1.x, val2.x, val1.y, val2.y
Question
Is there a way to co-locate the columns that had the same heading in the original data frames, to give the column order in the joined data frame as id, val1.x, val1.y, val2.x, val2.y?
Note that in my actual data frame I have 115 columns, so I'd like to steer clear of using joined_df <- joined_df[, c(1, 2, 4, 3, 5)] if possible.
Update/Edit: I would also like to maintain the original order of the column headings, so sorting alphabetically is not an option on my actual data (I realise it would work with the example I have given).
My desired output is
id val1.x val1.y val2.x val2.y
1 1 3 1 3 3
2 2 2 2 2 2
3 3 1 3 1 1
Update with solution for general case
The accepted answer solves my issue nicely.
I've adapted the code slightly here to use the original column names, without having to hard-code them in the rep function.
#specify columns used in merge
merge_cols <- c("id")
# identify duplicate columns and remove those used in the 'merge'
dup_cols <- names(df1)
dup_cols <- dup_cols [! dup_cols %in% merge_cols]
# replicate each duplicate column name and append an 'x' and 'y'
dup_cols <- rep(dup_cols, each=2)
var <- c("x", "y")
newnames <- paste(dup_cols, ".", var, sep = "")
#create new column names and sort the joined df by those names
newnames <- c(merge_cols, newnames)
joined_df <- joined_df[newnames]
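For reference, here is a self-contained variant of the same idea that can be run end to end. It swaps in a different way of building the names: the .x and .y names are interleaved with rbind() and as.vector() instead of rep() and paste(). It is a sketch that assumes merge's default suffixes (".x", ".y") and that every non-key column appears in both data frames; the name shared is illustrative:

```r
df1 <- data.frame(id = c(1, 2, 3), val1 = c(3, 2, 1), val2 = c(3, 2, 1))
df2 <- data.frame(id = c(1, 2, 3), val1 = c(1, 2, 3), val2 = c(3, 2, 1))

merge_cols <- "id"
joined_df <- merge(df1, df2, by = merge_cols, all = TRUE)

# Shared non-key columns, kept in their original order
shared <- setdiff(intersect(names(df1), names(df2)), merge_cols)

# rbind() stacks the .x names over the .y names; as.vector() then reads the
# matrix column by column, which interleaves them
newnames <- c(merge_cols,
              as.vector(rbind(paste0(shared, ".x"), paste0(shared, ".y"))))

joined_df <- joined_df[newnames]
names(joined_df)
```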
How about something like this:
numrep <- rep(1:2, each = 2)
numrep
var <- c("x", "y")
var
newnames <- paste("val", numrep, ".", var, sep = "")
newdf <- cbind(joined_df$id, joined_df[newnames])
names(newdf)[1] <- "id"
Which should give you the dataframe like this
id val1.x val1.y val2.x val2.y
1 1 3 1 3 3
2 2 2 2 2 2
3 3 1 3 1 1

paste0-build an argument inside plyr:rename (now with update)

I'm working from this answer, trying to optimize the second argument in plyr::rename, as suggested by Jared.
In short they are renaming some columns in a data frame using plyr like this,
df <- data.frame(col1=1:3,col2=3:5,col3=6:8)
df
newNames <- c("new_col1", "new_col2", "new_col3")
oldNames <- names(df)
require(plyr)
df <- rename(df, c("col1"="new_col1", "col2"="new_col2", "col3"="new_col3"))
df
In passing Jared writes '[a]nd you can be creative in making that second argument to rename so that it is not so manual.'
I've tried being creative like this,
df <- data.frame(col1=1:3,col2=3:5,col3=6:8)
df
secondArgument <- paste0('"', oldNames, '"','=', '"',newNames, '"',collapse = ',')
df <- rename(df, secondArgument)
df
But it does not work. Can anyone help me automate this?
Thanks!
Update Sun Sep 9 11:55:42PM
I realized I should have been more specific in my question.
I'm using plyr::rename because, in my real-life example, I have other variables and I don't always know the position of the variables I want to rename. I'll add an update to my question.
My case looks like this, but with 100+ variables:
df2 <- data.frame(col1=1:3,col2=3:5,col3=6:8)
df2
df2 <- rename(df2, c("col1"="new_col1", "col3"="new_col3"))
df2
df2 <- data.frame(col1=1:3,col2=3:5,col3=6:8)
df2
newNames <- c("new_col1", "new_col3")
oldNames <- names(df2[,c('col1', 'col3')])
secondArgument <- paste0('"', oldNames, '"','=', '"',newNames, '"',collapse = ',')
df2 <- rename(df2, secondArgument)
df2
Please add a comment if there is anything I need to clarify.
Solution to modified question:
df2 <- data.frame(col1=1:3,col2=3:5,col3=6:8)
df2
newNames <- c("new_col1", "new_col3")
oldNames <- names(df2[,c('col1', 'col3')])
(Isn't oldNames equal to c('col1','col3') by definition?)
Solution with plyr:
secondArgument <- setNames(newNames,oldNames)
library(plyr)
df2 <- rename(df2, secondArgument)
df2
Or in base R you could do:
names(df2)[match(oldNames,names(df2))] <- newNames
Set the names on newNames to the names from oldNames:
R> names(newNames) <- oldNames
R> newNames
col1 col2 col3
"new_col1" "new_col2" "new_col3"
R> df <- rename(df, newNames)
R> df
new_col1 new_col2 new_col3
1 1 3 6
2 2 4 7
3 3 5 8
plyr::rename requires a named character vector, with new names as values, and old names as names.
This should work:
names(newNames) <- oldNames
df <- rename(df, newNames)
df
new_col1 new_col2 new_col3
1 1 3 6
2 2 4 7
3 3 5 8
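To see why the paste0 attempt from the question cannot work, it helps to compare the two objects directly. The sketch below uses base R only, so it does not call plyr::rename itself; as_string and as_named are illustrative names:

```r
oldNames <- c("col1", "col3")
newNames <- c("new_col1", "new_col3")

# The attempt from the question: one unnamed string that merely looks like code
as_string <- paste0('"', oldNames, '"', '=', '"', newNames, '"', collapse = ',')
length(as_string)   # 1
names(as_string)    # NULL

# A named character vector: old names as names, new names as values
as_named <- setNames(newNames, oldNames)
identical(as_named, c("col1" = "new_col1", "col3" = "new_col3"))

# The same mapping applied in base R
df2 <- data.frame(col1 = 1:3, col2 = 3:5, col3 = 6:8)
names(df2)[match(names(as_named), names(df2))] <- unname(as_named)
names(df2)
```

The string holds text that resembles the c(...) call, whereas setNames builds the actual named vector that the hand-written c("col1"="new_col1", ...) produces.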

Resources