How to merge all dataframe columns into a single one in R

I know how to manually merge specific columns of a dataframe into a single column:
df_new <- data.frame(paste(df$a, df$b, df$c))
My question is how can I do this dynamically with all of the dataframe's columns?

You can use do.call: ‘do.call’ constructs and executes a function call from a name or a function and a list of arguments to be passed to it.
do.call(paste, df)
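As a minimal sketch of how this works: do.call() passes each column of df to paste() as a separate argument, so extra arguments such as sep can be appended to the argument list. The data frame and its column names below are made up for illustration.

```r
# Hypothetical example data; the column names are illustrative only
df <- data.frame(a = c("x", "y"), b = c("p", "q"), c = 1:2)

# Each column becomes one argument to paste(); the default separator is a space
merged_space <- do.call(paste, df)

# Append sep to the argument list to choose a different separator
merged_us <- do.call(paste, c(df, sep = "_"))

# Wrap the result in a one-column data frame, as in the question
df_new <- data.frame(merged = merged_us)
```

Note that this idiom would misbehave if a column happened to be named sep or collapse, since paste() would then treat that column as the corresponding argument.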

A solution from the tidyverse could be tidyr::unite():
df <- data.frame(x = letters[1:4], y = LETTERS[1:4], z = 1:4)
df_new <- tidyr::unite(df, col = "union", sep = " ")
where col is the name of the newly constructed column in the dataframe. Because no columns are selected, unite() combines all of them. sep is equivalent to its use in paste.

Related

How to dynamically complement colnames in a list of data.frames by information from a vector

I have a list of data.frames whereas the first column's colname in each data.frame is supposed to be complemented by dynamic information from a vector.
Example:
set.seed(1)
df1 <- data.frame(matrix(sample(32), ncol = 8))
names(df1) <- paste(rep(c("a", "b"), each = 4), 1:4, sep = "")
set.seed(2)
df2 <- data.frame(matrix(sample(32), ncol = 8))
names(df2) <- paste(rep(c("a", "b"), each = 4), 1:4, sep = "")
list_dfs <- list(df1, df2)
add_info <- c("add1", "other")
How can I add information from add_info to change the colname for a1 in df1 to "a1 add1" and a1 in df2 to "a1 other" in a scalable way within the given list structure? The other colnames are not supposed to be changed.
I tried several approaches setting colnames using paste0 within lapply or a for loop and reviewed similar questions on SO but couldn't solve this problem so far.
You can do the following:
list_dfs <- lapply(seq_along(list_dfs), function(i) {
  setNames(list_dfs[[i]], paste(names(list_dfs[[i]]), add_info[[i]]))
})
Now the first data frame in the list has each of its column names concatenated with the first element of add_info, and the second has its column names concatenated with the second element of add_info. You can easily scale this to longer lists of data.frames and corresponding add_info vectors.
Update:
If you only want to change the first name, do
list_dfs <- lapply(seq_along(list_dfs), function(i) {
  # all names except the first; [-1] also works for one-column data frames
  lastNames <- names(list_dfs[[i]])[-1]
  firstName <- paste(names(list_dfs[[i]])[1], add_info[[i]])
  setNames(list_dfs[[i]], c(firstName, lastNames))
})
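An alternative sketch of the same idea uses Map() to walk over the list and the vector in parallel, which avoids indexing by position. The two small data frames here are stand-ins for the ones in the question, not the original data.

```r
# Small stand-in data frames (assumed for illustration)
list_dfs <- list(
  data.frame(a1 = 1:2, a2 = 3:4),
  data.frame(a1 = 5:6, a2 = 7:8)
)
add_info <- c("add1", "other")

# Map() pairs each data frame with its matching element of add_info
renamed <- Map(function(d, info) {
  names(d)[1] <- paste(names(d)[1], info)  # change only the first column name
  d
}, list_dfs, add_info)

names(renamed[[1]])
names(renamed[[2]])
```

Assigning into names(d)[1] leaves all other column names untouched, so no special handling of the remaining names is needed.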

How do you replace an entire column in one dataframe with another column in another dataframe?

I have two dataframes. I want to replace the ids in dataframe1 with generic ids. In dataframe2 I have mapped the ids from dataframe1 with the generic ids.
Do I have to merge the two dataframes and after it is merged do I delete the column I don't want?
Thanks.
With dplyr
library(dplyr)
left_join(df1, df2, by = 'ids')
We can use merge and then delete the ids.
dataframe1 <- data.frame(ids = 1001:1010, variable = runif(min=100,max = 500,n=10))
dataframe2 <- data.frame(ids = 1001:1010, generics = 1:10)
result <- merge(dataframe1,dataframe2,by="ids")[,-1]
Alternatively we can use match and replace by assignment.
dataframe1$ids <- dataframe2$generics[match(dataframe1$ids,dataframe2$ids)]
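To make the match() approach concrete, here is a small sketch with made-up ids. It also shows what happens when an id in dataframe1 has no entry in dataframe2: match() returns NA for it, so the replaced id becomes NA.

```r
# Made-up example data; 1005 deliberately has no mapping in dataframe2
dataframe1 <- data.frame(ids = c(1003, 1001, 1005), variable = c(10, 20, 30))
dataframe2 <- data.frame(ids = c(1001, 1003), generics = c(1, 2))

# For each id in dataframe1, find its row position in dataframe2
idx <- match(dataframe1$ids, dataframe2$ids)

# Look up the generic id at those positions and overwrite the original ids
dataframe1$ids <- dataframe2$generics[idx]
```

If unmatched ids are possible, it may be worth checking anyNA(idx) before overwriting the column.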
Subsetting data frames isn't very difficult in R. You didn't provide much code, so here is a self-contained example that I hope will help:
#create 4 random columns (vectors) of data, and merge them into data frames:
a <- rnorm(n=100,mean = 0,sd=1)
b <- rnorm(n=100,mean = 0,sd=1)
c <- rnorm(n=100,mean = 0,sd=1)
d <- rnorm(n=100,mean = 0,sd=1)
df_ab <- as.data.frame(cbind(a,b))
df_cd <- as.data.frame(cbind(c,d))
#if you want column d in df_cd to equal column a in df_ab simply use the assignment operator
df_cd$d <- df_ab$a
#you can also use the subsetting with square brackets:
df_cd[,"d"] <- df_ab[,"a"]

R Paste List to Bind

data1 = data.frame("time" = c(1:10))
data2 = data.frame("time" = c(11:20))
data3 = data.frame("time" = c(21:30))
data4 = data.frame("time" = c(31:40))
rbind(data1, data2, data3, data4)
rbind(paste("'","data","'",1:4,sep=","))
I want to bind together a whole bunch of data frames, but instead of spelling out all of them I want to use paste functions. In my simple example you will see it doesn't work as desired, but when I spell out the data frames it works.
We can use mget on the pasted strings to return the values of the object names in a list and then rbind the elements with do.call
`row.names<-`(do.call(rbind, mget(paste0('data', 1:4))), NULL)
Or use pattern in ls
do.call(rbind, mget(ls(pattern = '^data\\d+$')))
With data.table, it would be rbindlist
library(data.table)
rbindlist(mget(paste0('data', 1:4)))
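A different route, sketched here as an assumption about how the data could be created rather than as part of the answers above, is to keep the data frames in a list from the start. Then no lookup by pasted name is needed and do.call(rbind, ...) can be applied directly.

```r
# Build the four data frames directly into a list instead of data1..data4
frames <- lapply(0:3, function(k) {
  data.frame(time = (k * 10 + 1):(k * 10 + 10))
})

# Bind all list elements row-wise in one call
combined <- do.call(rbind, frames)

nrow(combined)
```

Because the list is unnamed, the combined data frame gets plain sequential row names, so the row.names<- step is not required.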

Add different suffix to column names on multiple data frames in R

I'm trying to add different suffixes to my data frames so that I can distinguish them after I've merged them. I have my data frames in a list and created a vector for the suffixes but so far I have not been successful.
data2016 is the list containing my 7 data frames
new_names <- c("june2016", "july2016", "aug2016", "sep2016", "oct2016", "nov2016", "dec2016")
data2016v2 <- lapply(data2016, paste(colnames(data2016)), new_names)
Your query is not quite clear. Therefore two solutions.
The beginning is the same for either solution. Suppose you have these four dataframes:
df1x <- data.frame(v1 = rnorm(50),
v2 = runif(50))
df2x <- data.frame(v3 = rnorm(60),
v4 = runif(60))
df3x <- data.frame(v1 = rnorm(50),
v2 = runif(50))
df4x <- data.frame(v3 = rnorm(60),
v4 = runif(60))
Suppose further you assemble them in a list, something akin to your data2016, using mget and ls and describing a pattern to match them:
my_list <- mget(ls(pattern = "^df\\d+x$"))
The names of the dataframes in this list are the following:
names(my_list)
[1] "df1x" "df2x" "df3x" "df4x"
Solution 1:
Suppose you want to change the names of the dataframes thus:
new_names <- c("june2016", "july2016","aug2016", "sep2016")
Then you can simply assign new_names to names(my_list):
names(my_list) <- new_names
And the result is:
names(my_list)
[1] "june2016" "july2016" "aug2016" "sep2016"
Solution 2:
You want to add the new_names literally as suffixes to the 'old' names, in which case you would use paste or paste0 thus:
names(my_list) <- paste0(names(my_list), "_", new_names)
And the result is:
names(my_list)
[1] "df1x_june2016" "df2x_july2016" "df3x_aug2016" "df4x_sep2016"
You could use an index number within lapply to reference both the list and your vector of suffixes. Because there are a couple steps, I'll wrap the process in a function(). (Called an anonymous function because we aren't assigning a name to it.)
data2016v2 <- lapply(1:7, function(i) {
this_data <- data2016[[i]] # Double brackets for a list
names(this_data) <- paste0(names(this_data), new_names[i]) # Single bracket for vector
this_data # The renamed data frame to be placed into data2016v2
})
Notice in the paste0() line we are recycling the term in new_names[i], so for example if new_names[i] is "june2016" and your first data.frame has columns "A", "B", and "C" then it would give you this:
> paste0(c("A", "B", "C"), "june2016")
[1] "Ajune2016" "Bjune2016" "Cjune2016"
(You may want to add an underscore in there?)
As an aside, it sounds like you might be better served by adding the "june2016" as a column in your data (like say a variable named month with "june2016" as the value in each row) and combining your data using something like bind_rows() from the dplyr package, running it "long" instead of "wide".
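The "long" layout suggested in the aside can be sketched in base R as follows. The two small data frames and the column name month are assumptions for illustration; bind_rows() from dplyr could replace the do.call(rbind, ...) step.

```r
# Stand-in for the real list of monthly data frames (assumed structure)
data2016 <- list(
  data.frame(A = 1:2, B = 3:4),
  data.frame(A = 5:6, B = 7:8)
)
new_names <- c("june2016", "july2016")

# Tag every row with its month instead of renaming the columns
with_month <- Map(function(d, m) {
  d$month <- m
  d
}, data2016, new_names)

# Stack the tagged data frames into one long data frame
long <- do.call(rbind, with_month)
rownames(long) <- NULL
```

In this layout the column names stay identical across months, and the month column carries the information that the suffixes would otherwise encode.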

R strsplit function in a data frame

I created a data frame, and now I want to split its first column at the ":" so that the part after the colon becomes a new column.
data frame:
unc.edu.0057f9f7-779b-4914-8290-abbad2a0d81e.2556919.rsem.genes.normalized_results:ASL|435 214.4421
unc.edu.0057f9f7-779b-4914-8290-abbad2a0d81e.2556919.rsem.genes.normalized_results:ASS1|445 2863.8055
unc.edu.0057f9f7-779b-4914-8290-abbad2a0d81e.2556919.rsem.genes.normalized_results:OTC|5009 0
unc.edu.050c2191-b96c-41e7-abdb-e52cbe82f268.2456235.rsem.genes.normalized_results:ASL|435 332.7522
unc.edu.050c2191-b96c-41e7-abdb-e52cbe82f268.2456235.rsem.genes.normalized_results:ASS1|445 3322.629
unc.edu.050c2191-b96c-41e7-abdb-e52cbe82f268.2456235.rsem.genes.normalized_results:OTC|5009 0
desired output:
unc.edu.0057f9f7-779b-4914-8290-abbad2a0d81e.2556919.rsem.genes.normalized_results ASL|435 214.4421
unc.edu.0057f9f7-779b-4914-8290-abbad2a0d81e.2556919.rsem.genes.normalized_results ASS1|445 2863.8055
unc.edu.0057f9f7-779b-4914-8290-abbad2a0d81e.2556919.rsem.genes.normalized_results OTC|5009 0
unc.edu.050c2191-b96c-41e7-abdb-e52cbe82f268.2456235.rsem.genes.normalized_results ASL|435 332.7522
unc.edu.050c2191-b96c-41e7-abdb-e52cbe82f268.2456235.rsem.genes.normalized_results ASS1|445 3322.629
unc.edu.050c2191-b96c-41e7-abdb-e52cbe82f268.2456235.rsem.genes.normalized_results OTC|5009 0
I have tried
strsplit(df$V1, split = "\\:")
but I get this error: Error in strsplit(t$V1, split = "\:") : non-character argument. Thank you.
The error is because we have a variable of class factor. Convert it to character and it should work
lst <- strsplit(as.character(df$V1), split = ":", fixed = TRUE)
If we need to create two columns, one easy way is with read.table
df1 <- read.table(text = as.character(df$V1), sep=":", stringsAsFactors=FALSE)
Or using separate from tidyr
library(tidyr)
separate(df, V1, into = c("sample", "gene"), sep = ":")  # new column names are illustrative; sep = ":" splits on the colon only
tidyr::separate(data = df, col = V1, into = c('a', 'b'), sep = ':')
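For a base R route that builds on the strsplit() call above, the list of split pieces can be bound into a matrix and then turned into columns. This is a sketch with shortened, made-up identifiers; the names sample, gene and value are assumptions, and it relies on every string containing exactly one colon.

```r
# Shortened, hypothetical rows; V1 is a factor to mirror the reported error
df <- data.frame(
  V1 = c("sampleA.results:ASL|435", "sampleA.results:ASS1|445",
         "sampleB.results:OTC|5009"),
  V2 = c(214.4421, 2863.8055, 0),
  stringsAsFactors = TRUE
)

# strsplit() needs character input, so convert the factor first
parts <- strsplit(as.character(df$V1), split = ":", fixed = TRUE)

# Each element has two pieces, so rbind gives a two-column character matrix
mat <- do.call(rbind, parts)

result <- data.frame(sample = mat[, 1], gene = mat[, 2], value = df$V2,
                     stringsAsFactors = FALSE)
```

Using fixed = TRUE treats the colon as a literal string, so no regular expression escaping is needed.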
