I am in the process of reformatting a few data frames and was wondering if there is a more efficient way to add named columns to data frames, rather than the below:
colnames(df) <- c("c1", "c2")
to rename the current columns and:
df$c3 <- ""
to create a new column.
Is there a way to do this in a quicker manner? I'm trying to add dozens of named columns and this seems like an inefficient way of going through the process.
You can use your own method in a shorter form:
cols_2_add=c("a","b","c","f")
df[,cols_2_add]=""
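If the new columns should not all be empty strings, the same assignment can take a named list of defaults. This is only a sketch: the data frame and the names in new_cols are made up for illustration, and each single default value is recycled to the number of rows.

```r
# Hypothetical example data; replace with your own data frame.
df <- data.frame(x = 1:3, y = 4:6)

# Illustrative column names, each with a typed missing value as default.
new_cols <- list(a = NA_character_, b = NA_real_, c = NA_integer_)

# Assign all new columns in one step; each default is recycled to nrow(df).
df[names(new_cols)] <- new_cols

str(df)
```

Using typed NA values keeps each column's type explicit, which avoids surprises when real values are filled in later.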
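Additional columns can also be added using merge. Merge the existing data frame with a one-row data frame that holds the desired columns; because the two share no column names, merge returns every combination of rows, so the new columns are appended to each existing row. This will be helpful if you want to create columns of different types.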
For example:
# Existing dataframe
df <- data.frame(x=1:3, y=4:6)
# Use merge to create the desired columns, say a, b, c, d and e
merge(df, data.frame(a="", b="", c="", d="", e=""))
# Result
# x y a b c d e
#1 1 4
#2 2 5
#3 3 6
# Desired columns of different types
library(dplyr)
bind_rows(df, data.frame(a=character(), b=numeric(), c=double(), d=integer(),
e=as.Date(character()), stringsAsFactors = FALSE))
# x y a b c d e
#1 1 4 <NA> NA NA NA <NA>
#2 2 5 <NA> NA NA NA <NA>
#3 3 6 <NA> NA NA NA <NA>
A simple loop can help here:
name_list <- c('a1','b1','c1','d1')
# example df
df <- data.frame(a = runif(3))
# this adds a new column
for(i in name_list)
{
df[[i]] <- runif(3)
}
# output
a a1 b1 c1 d1
1 0.09227574 0.08225444 0.4889347 0.2232167 0.8718206
2 0.94361151 0.58554887 0.7095412 0.2886408 0.9803941
3 0.22934864 0.73160433 0.6781607 0.7598064 0.4663031
# in case of data.table, a for loop with set() provides a faster version:
library(data.table)
# example df
df <- data.table(a = runif(3))
for(i in name_list)
set(df, j=i, value = runif(3))
How can I simply "paste" two data frames next to each other, filling the missing rows of the shorter one with NAs (e.g. because I want to make a "kable" or something similar)?
df1 <- data.frame(a = c(1,2,3),
b = c(3,4,5))
df2 <- data.frame(a = c(4,5),
b = c(5,6))
# The desired "merge"
a b a b
1 3 4 5
2 4 5 6
3 5 NA NA
Thanks to Ronak Shah, I found an easy answer in the answers to this post: How to cbind or rbind different lengths vectors without repeating the elements of the shorter vectors?
Without having to hack anything together, one can use cbind.na from the qpcR package:
df1 <- data.frame(a = c(1,2,3),
b = c(3,4,5))
df2 <- data.frame(a = c(4,5),
b = c(5,6))
comb <- qpcR:::cbind.na(df1, df2)
As this answer is 4 years old, I wonder if there are more "modern" solutions in popular packages like the tidyverse et al.
In base R you could do:
nr <- max(nrow(df1), nrow(df2))
cbind(df1[1:nr, ], df2[1:nr, ])
# a b a b
# 1 1 3 4 5
# 2 2 4 5 6
# 3 3 5 NA NA
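The same idea extends to any number of data frames. The helper below is a minimal sketch, not an existing function: the name cbind_fill is made up here, and it relies on the fact that indexing a data frame past its last row yields rows of NA.

```r
# Hypothetical helper: column-bind data frames of unequal length,
# padding the shorter ones with NA rows.
cbind_fill <- function(...) {
  dfs <- list(...)
  nr <- max(vapply(dfs, nrow, integer(1)))
  # Row indices beyond nrow(d) produce all-NA rows.
  padded <- lapply(dfs, function(d) d[seq_len(nr), , drop = FALSE])
  out <- do.call(cbind, padded)
  rownames(out) <- NULL
  out
}

df1 <- data.frame(a = c(1, 2, 3), b = c(3, 4, 5))
df2 <- data.frame(a = c(4, 5), b = c(5, 6))
comb <- cbind_fill(df1, df2)
comb
```

Note that the result keeps the duplicated column names a and b, so columns are best accessed by position.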
I have a vector of variable names and several matrices with a single column each.
I want to create a new matrix by matching/merging the row names of the single-column matrices.
Example:
A vector of variable names
Complete_names <- c("D","C","A","B")
Several matrices with a single column each
Matrix_1 <- matrix(c(1,2,3),3,1)
rownames(Matrix_1) <- c("D","C","B")
Matrix_2 <- matrix(c(4,5,6),3,1)
rownames(Matrix_2) <- c("A","B","C")
Desired output:
Desired_output <- matrix(c(1,2,NA,3,NA,6,4,5),4,2)
rownames(Desired_output) <- c("D","C","A","B")
[,1] [,2]
D 1 NA
C 2 6
A NA 4
B 3 5
I know there are several similar postings like this, but those previous answers do not work perfectly for this one.
The main job can be done with merge, returning a data frame:
merge(Matrix_1, Matrix_2, by = "row.names", all = TRUE)
# Row.names V1.x V1.y
# 1 A NA 4
# 2 B 3 5
# 3 C 2 6
# 4 D 1 NA
Depending on your purposes you may then further modify names or get rid of Row.names.
The answers offered by Julius Vainora and achimneyswallow work well, but to obtain exactly the desired output, I use:
temp <- merge(Matrix_1, Matrix_2, by = "row.names", all = TRUE)
temp$Row.names <- factor(temp$Row.names, levels=Complete_names)
temp <- temp[order(temp$Row.names),]
rownames(temp) <- temp[,1]
Desired_output <- as.matrix(temp[,-1])
V1.x V1.y
D 1 NA
C 2 6
A NA 4
B 3 5
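If the result only needs to be a matrix, the merge step can be skipped by looking up each name with match. This is a sketch under the assumption that every input is a one-column matrix with row names; the list name mats is introduced here for illustration.

```r
Complete_names <- c("D", "C", "A", "B")

Matrix_1 <- matrix(c(1, 2, 3), 3, 1)
rownames(Matrix_1) <- c("D", "C", "B")
Matrix_2 <- matrix(c(4, 5, 6), 3, 1)
rownames(Matrix_2) <- c("A", "B", "C")

# Hypothetical container for all single-column matrices.
mats <- list(Matrix_1, Matrix_2)

# For each matrix, pick the value for every name in Complete_names;
# names that are absent give NA.
out <- sapply(mats, function(m) unname(m[match(Complete_names, rownames(m)), 1]))
rownames(out) <- Complete_names
out
```

This keeps the row order of Complete_names directly, so no factor reordering is needed afterwards.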
Suppose I have a dataframe with 3 columns. I would like to create separate sub-dataframes for each of the unique combinations of a few columns.
For example, suppose we have just 3 columns,
a <- c(1,5,2,3,4,5,3,2,1,3)
b <- c("a","a","f","d","f","c","a","r","a","c")
c <- c(.2,.6,.4,.545,.98,.312,.112,.4,.9,.5)
df <- data.frame(a,b,c)
I would like to get a separate dataframe for each of the unique combinations of Column 'a' and 'b'
I started with using unique to get a list of the unique combinations as the following,
factors <- unique(df[,c('a','b')])
a b
1 1 a
2 5 a
3 2 f
4 3 d
5 4 f
6 5 c
7 3 a
8 2 r
10 3 c
But I am not sure what to do next.
The code below is for illustration purposes. Ideally this would be done in a loop that uses each row of factors to create the dataframes.
df_1_a <- df %>% filter(a==1, b=='a')
a b c
1 1 a 0.2
2 1 a 0.9
df_3_a <- df %>% filter(a==3, b=='a')
a b c
1 3 a 0.112
.
.
.
This is kind of dirty, and I'm not sure it answers your question, but try this:
a <- c(1,5,2,3,4,5,3,2,1,3)
b <- c("a","a","f","d","f","c","a","r","a","c")
c <- c(.2,.6,.4,.545,.98,.312,.112,.4,.9,.5)
d <- paste0(a,b)
df <- data.frame(a,b,c,d)
df_splited <- split(df,df$d)
You obtain a list of dataframes, one for each unique combination of a and b.
You can use split after you get the unique combinations you are after.
a <- c(1,5,2,3,4,5,3,2,1,3)
b <- c("a","a","f","d","f","c","a","r","a","c")
c <- c(.2,.6,.4,.545,.98,.312,.112,.4,.9,.5)
df <- data.frame(a,b,c,stringsAsFactors = FALSE)
fx <- unique(df[,c('a','b')])
fx_list <- split(fx,rownames(fx))
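To get the sub-data frames themselves rather than the list of combinations, split also accepts a list of grouping columns. A minimal sketch, assuming the same example data; the name parts is made up here, and drop = TRUE discards combinations that never occur.

```r
a <- c(1, 5, 2, 3, 4, 5, 3, 2, 1, 3)
b <- c("a", "a", "f", "d", "f", "c", "a", "r", "a", "c")
c <- c(.2, .6, .4, .545, .98, .312, .112, .4, .9, .5)
df <- data.frame(a, b, c, stringsAsFactors = FALSE)

# One data frame per observed (a, b) combination; list names look like "1.a".
parts <- split(df, list(df$a, df$b), drop = TRUE)

parts[["1.a"]]
```

Each element can then be transformed with lapply, which avoids creating separately named objects such as df_1_a.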
I have many data.frames, for example:
df1 = data.frame(names=c('a','b','c','c','d'),data1=c(1,2,3,4,5))
df2 = data.frame(names=c('a','e','e','c','c','d'),data2=c(1,2,3,4,5,6))
df3 = data.frame(names=c('c','e'),data3=c(1,2))
and I need to merge these data.frames without deleting the duplicated names:
> result
names data1 data2 data3
1 'a' 1 1 NA
2 'b' 2 NA NA
3 'c' 3 4 1
4 'c' 4 5 NA
5 'd' 5 6 NA
6 'e' NA 2 2
7 'e' NA 3 NA
I can't find a function like merge with an option to handle duplicated names. Thank you for your help.
To define my problem: the data come from a biological experiment in which each sample has a different number of replicates. I need to merge all the experiments and produce this table. I can't generate a unique identifier for the replicates.
First define a function, run.seq, which provides sequence numbers for duplicates since it appears from the output that what is desired is that the ith duplicate of each name in each component of the merge be associated. Then create a list of the data frames and add a run.seq column to each component. Finally use Reduce to merge them all.
run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))
L <- list(df1, df2, df3)
L2 <- lapply(L, function(x) cbind(x, run.seq = run.seq(x$names)))
out <- Reduce(function(...) merge(..., all = TRUE), L2)[-2]
The last line gives:
> out
names data1 data2 data3
1 a 1 1 NA
2 b 2 NA NA
3 c 3 4 1
4 c 4 5 NA
5 d 5 6 NA
6 e NA 2 2
7 e NA 3 NA
EDIT: Revised run.seq so that input need not be sorted.
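To see what run.seq contributes, it helps to look at its output on its own. The short sketch below repeats the definition from above and applies it to the names column of df2; the variable name idx is only for illustration.

```r
run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))

# Names as they appear in df2: the second "e" and second "c" get index 2.
idx <- run.seq(c("a", "e", "e", "c", "c", "d"))
idx
```

Merging on both names and this index is what pairs the first "c" of one data frame with the first "c" of the next, the second with the second, and so on.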
See other questions:
How to join data frames in R (inner, outer, left, right)
recombining-a-list-of-data-frames-into-a-single-data-frame
...
Examples:
library(reshape)
out <- merge_recurse(L)
or
library(plyr)
out<-join(df1, df2, type="full")
out<-join(out, df3, type="full")
(the repeated join calls can be done in a loop)
or
library(plyr)
out<-ldply(L)
I think there is just not enough information in your example data frames to do this. Which 'c' in dataframe 1 should be paired with which 'c' in data frame 2? We cannot tell, so R can't either. I suspect you will have to add another variable to each of your dataframes that uniquely identifies these duplicate cases.
While merging 3 data.frames using the plyr library, I encounter columns that have the same name but different values in the different data.frames.
How does do.call(rbind.fill, list) treat this problem: by taking the arithmetic or the geometric average?
From the help page for rbind.fill:
Combine data.frames by row, filling in missing columns. rbinds a list of data frames
filling missing columns with NA.
So I'd expect it to fill columns that do not match with NA. It is also not necessary to use do.call() here.
library(plyr)
dat1 <- data.frame(a = 1:2, b = 4:5)
dat2 <- data.frame(b = 3:2, c = 8:9)
dat3 <- data.frame(a = 5:6, c = 1:2)
rbind.fill(dat1, dat2, dat3)
a b c
1 1 4 NA
2 2 5 NA
3 NA 3 8
4 NA 2 9
5 5 NA 1
6 6 NA 2
Are you expecting something different?
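To make the behaviour concrete without plyr, here is a rough base R approximation. It is a sketch only, not how rbind.fill is implemented, and the function name rbind_fill_base is made up. It shows that rows are simply stacked and nothing is averaged.

```r
# Hypothetical base R stand-in for stacking data frames with differing columns.
rbind_fill_base <- function(...) {
  dfs <- list(...)
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    # Add each missing column as NA, then put columns in a common order.
    for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
    d[all_cols]
  })
  out <- do.call(rbind, padded)
  rownames(out) <- NULL
  out
}

dat1 <- data.frame(a = 1:2, b = 4:5)
dat2 <- data.frame(b = 3:2, c = 8:9)
dat3 <- data.frame(a = 5:6, c = 1:2)
res <- rbind_fill_base(dat1, dat2, dat3)
res
```

Values that share a column name end up in the same column on separate rows, so any averaging would have to be done as an explicit extra step afterwards.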