Create an ordered list in a data frame [duplicate] - r

This question already has answers here:
Sorting each row of a data frame [duplicate]
(2 answers)
Row wise Sorting in R
(2 answers)
Row-wise sort then concatenate across specific columns of data frame
(2 answers)
Closed 5 years ago.
I have the following data frame:
col1 <- c("a", "b", "c")
col2 <- c("c", "a", "d")
col3 <- c("b", "c", "a")
df <- data.frame(col1,col2,col3)
I want to add a new column to this data frame that contains, for each row, the sorted values of col1, col2 and col3. For the first row, for example, that would be "a", "b", "c".
Currently I do this with a loop, but since I have 50k rows it is quite slow, so I'm looking for a better solution.
# Row-by-row loop (slow for 50k rows): sort the three values of each row
# and paste them into one space-separated string
df$col.list <- NA_character_
for (i in seq_len(nrow(df))) {
  vals <- c(as.character(df$col1[i]),
            as.character(df$col2[i]),
            as.character(df$col3[i]))
  df$col.list[i] <- paste(sort(vals), collapse = " ")
}
Any ideas on how to make this code more efficient?
EDIT: The linked questions don't cover my case. I need to paste the three values together after sorting each row, so the sort only determines the order inside the pasted string; I'm not trying to reorder the data frame itself.
Expected output:
col1 col2 col3 col.list
a c b a b c
b a c a b c
c d a a c d
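A sketch of two loop-free alternatives, assuming the columns are named col1, col2 and col3 as in the example. The first applies sort() row-wise with apply(); the second works only for exactly three columns and computes the smallest, middle and largest value of every row at once with the vectorized pmin()/pmax(), which should scale better to 50k rows. The helper names x1, x2, x3, lo, mid, hi and the extra column col.list2 are illustrative only.

```r
col1 <- c("a", "b", "c")
col2 <- c("c", "a", "d")
col3 <- c("b", "c", "a")
df <- data.frame(col1, col2, col3)

# Option 1: apply() converts the selected columns to a character matrix,
# so each row arrives as a character vector that can be sorted and pasted
cols <- c("col1", "col2", "col3")
df$col.list <- apply(df[cols], 1, function(r) paste(sort(r), collapse = " "))

# Option 2 (exactly three columns): vectorized min / middle / max per row
x1 <- as.character(df$col1)
x2 <- as.character(df$col2)
x3 <- as.character(df$col3)
lo  <- pmin(x1, x2, x3)
hi  <- pmax(x1, x2, x3)
mid <- pmax(pmin(x1, x2), pmin(pmax(x1, x2), x3))  # median of three values
df$col.list2 <- paste(lo, mid, hi)

df
```

Option 1 generalizes to any number of columns; Option 2 avoids calling an R function once per row.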

Related

Assigning complex values to character elements of data frame in R

My data frame has three character columns containing "A", "B" and "C" (the order can vary between data frames). I want to replace these letters with the values A = 1+0i, B = 2+3i and C = 3+2i. I used as.complex(factor(col1)), and the same for columns two and three, but all three columns end up equal to 1+0i!
col1 <- c("A","A", "A")
col2 <- c("B", "B","B")
col3 <- c("C","C","C")
df <- data.frame(col1,col2,col3)
print(df)
A= 1+0i
B=2+3i
C=3+2i
df2<- transform(df, col1=as.complex(as.factor(col1)),col2=as.complex(as.factor(col2)),col3=as.complex(as.factor(col3)))
sapply(df2,class)
View(df2)
So this is a weird thing you're doing. You have a column of strings, letters like "A" and "B". Then you have objects with the same names, A = 1 + 0i, etc. Normally we don't treat object names as "data", but you're sort of mixing the two here. The solution I'd propose is to make everything data: combine your A, B, and C values into a vector, and give the vector names accordingly. Then we can replace the values in the data frame with the corresponding values from our named vector:
vec = c(A, B, C)
names(vec) = c("A", "B", "C")
df[] = lapply(df, \(x) vec[x])
df
# col1 col2 col3
# 1 1+0i 2+3i 3+2i
# 2 1+0i 2+3i 3+2i
# 3 1+0i 2+3i 3+2i
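Why the original attempt gives 1+0i everywhere: each column holds a single distinct letter, so factor() gives it one level and every element's integer code is 1; as.complex() on a factor appears to convert those codes rather than the labels. The same pitfall affects `vec[x]` if the columns are factors (the data.frame() default before R 4.0.0), because indexing by a factor uses the integer codes, not the labels. A sketch, assuming the named vector `vec` from the answer, that looks values up with match(), which converts factors to character before comparing, so it behaves the same for character and factor columns:

```r
vec <- c(A = 1+0i, B = 2+3i, C = 3+2i)

# Factor columns, as data.frame() produced by default before R 4.0.0
df <- data.frame(col1 = c("A", "A", "A"),
                 col2 = c("B", "B", "B"),
                 col3 = c("C", "C", "C"),
                 stringsAsFactors = TRUE)

# Each column has a single level, so every integer code is 1
codes <- as.integer(df$col2)
codes

# match() compares labels, so the lookup is correct for factors too
df[] <- lapply(df, function(x) unname(vec[match(x, names(vec))]))
df
```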

How to create new columns of duplicate and unique values in R

I'm trying to use a .csv file with two columns (e.g. col1 and col2) to generate three new columns: col3, containing the values that col1 and col2 have in common (duplicates); col4, containing the values unique to col1; and col5, containing the values unique to col2. I'd then like to export the result.
For this I hoped to use dplyr's mutate() and the %>% pipe, together with duplicated(), unique() and write_csv(). However, nothing I've tried so far has worked.
My data isn't practical to paste here, as it is tens of thousands of rows of categorical values (names), so I can provide a simple example of the type of data I am working with:
col1 <- data.frame(c("a", "b", "c", "d", "e"))
col2 <- data.frame(c("a", "c", "e", "f", "g"))
The code should create three new objects (col3 for the shared values, col4 and col5 for the values unique to each original column) containing:
col3 = a, c, e
col4 = b, d
col5 = f, g
and then save this file as a .csv, for example:
write_csv(col_umns, file = "data/col_umns.csv")
Many thanks for any help!
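A base R sketch using set operations instead of duplicated()/unique(). It assumes col1 and col2 are plain character vectors; with the one-column data frames from the question, `col1[[1]]` and `col2[[1]]` extract them. Because the three results have different lengths (3, 2 and 2 here), they are padded with NA so they fit in one rectangular data frame. The file is written to a temporary directory for illustration; the question's "data/col_umns.csv" path would go there instead, and readr's write_csv() could replace write.csv().

```r
col1 <- c("a", "b", "c", "d", "e")
col2 <- c("a", "c", "e", "f", "g")

col3 <- intersect(col1, col2)  # values present in both columns
col4 <- setdiff(col1, col2)    # values only in col1
col5 <- setdiff(col2, col1)    # values only in col2

# Pad every result with NA up to the longest length
n <- max(length(col3), length(col4), length(col5))
pad <- function(x) c(x, rep(NA, n - length(x)))
col_umns <- data.frame(col3 = pad(col3), col4 = pad(col4), col5 = pad(col5))

# Write without row names; NA cells become empty fields in the .csv
out <- file.path(tempdir(), "col_umns.csv")
write.csv(col_umns, out, row.names = FALSE, na = "")
col_umns
```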

How can I make a data.frame from chr vectors [duplicate]

This question already has answers here:
aggregating unique values in columns to single dataframe "cell" [duplicate]
(2 answers)
Closed 2 years ago.
I have the data.frame below.
Chr Chr
A E
A F
A E
B G
B G
C H
C I
D E
I want to convert the data set as shown below, collapsing all values for each key in the first column into a single row:
chr chr
A E,F
B G
C H,I
D E
All columns are character vectors. Here is what I tried.
First, I used unique() on the first column (FILTER <- unique(chr[,15])) and tried to subset the data by FILTER, combining the results with rbind() or bind_rows().
Second, I checked whether the idea works at all:
FILTER <- unique(Top[,15])
NN <- data.frame()
for(i in 1:nrow(FILTER)){
  result = unique(Top10Data[TGT == FILTER[i]]$`NM`)
  print(result)
}
Up to this point it seems to work.
The problem is that when I combine both steps, the resulting data frame has only one column, and the other values (the second variable in the data frame above) are all dropped.
This works only when a key has a single value (chr[1,1]); when a key has several values (chr[1,n]), they cannot be coerced into one row.
Here is my code for reference:
FILTER <- unique(Top[,15])
NN <- data.frame()
for(i in 1:nrow(FILTER)){
  CGONM <- rbind(NN,unique(Top10Data[TGT == FILTER[i]]$`NM`))
}
Base R solutions:
# Solution 1:
df_str_agg1 <- aggregate(var2 ~ var1, df, FUN = function(x) {
  paste0(unique(x), collapse = ",")
})
# Solution 2:
df_str_agg2 <- data.frame(
  do.call("rbind", lapply(split(df, df$var1), function(x) {
    data.frame(var1 = unique(x$var1),
               var2 = paste0(unique(x$var2), collapse = ","))
  })),
  row.names = NULL
)
Tidyverse solution:
library(tidyverse)
df_str_agg3 <-
  df %>%
  group_by(var1) %>%
  summarise(var2 = str_c(unique(var2), collapse = ",")) %>%
  ungroup()
Data:
df <- data.frame(var1 = c("A", "A", "A", "B", "B", "C", "C", "D"),
var2 = c("E", "F", "E", "G", "G", "H", "I", "E"), stringsAsFactors = FALSE)

Remove rows from data.table that meet condition

I have a data table
DT <- data.table(col1=c("a", "b", "c", "c", "a"), col2=c("b", "a", "c", "a", "b"), condition=c(TRUE, FALSE, FALSE, TRUE, FALSE))
col1 col2 condition
1: a b TRUE
2: b a FALSE
3: c c FALSE
4: c a TRUE
5: a b FALSE
and would like to remove rows on the following conditions:
each row for which condition==TRUE (rows 1 and 4)
each row that has the same values for col1 and col2 as a row for which the condition==TRUE (that is row 5, col1=a, col2=b)
finally, each row that has the same values for col1 and col2 as a row for which condition==TRUE, but with col1 and col2 switched (that is row 2, col1=b and col2=a)
So only row 3 should stay.
I'm doing this by making a new data table DTcond with all rows meeting the condition, looping over the values for col1 and col2, and collecting the indices from DT which will be removed.
DTcond <- DT[condition==TRUE,]
indices <- c()
for (i in 1:nrow(DTcond)) {
  n1 <- DTcond[i, col1]
  n2 <- DTcond[i, col2]
  indices <- c(indices, DT[ ((col1 == n1 & col2 == n2) | (col1==n2 & col2 == n1)), which=T])
}
DT[!indices,]
col1 col2 condition
1: c c FALSE
This works but is terribly slow for large data sets, and I suspect data.table offers a way to do this without loops or apply. Any suggestions on how I could improve this (I'm new to data.table)?
You can do an anti join:
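library(data.table)
# Flagged (col1, col2) pairs without the condition column, stacked with the
# same pairs in swapped order: rev(.SD) swaps the two columns and
# use.names = FALSE stacks them by position under col1 and col2
mDT = DT[(condition), !"condition"][, rbind(.SD, rev(.SD), use.names = FALSE)]
# Anti join: keep only the rows of DT that match no pair in mDT
DT[!mDT, on=names(mDT)]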
# col1 col2 condition
# 1: c c FALSE
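For comparison, a base R sketch of the same logic without a join (a swapped-in technique, not the data.table answer): each pair gets an order-independent key with the smaller value first, so (a, b) and (b, a) share a key, and every row whose key belongs to a flagged row is dropped. It assumes the values never contain a tab character, which is used as the key separator; the names dat, key and drop_keys are illustrative.

```r
dat <- data.frame(col1 = c("a", "b", "c", "c", "a"),
                  col2 = c("b", "a", "c", "a", "b"),
                  condition = c(TRUE, FALSE, FALSE, TRUE, FALSE),
                  stringsAsFactors = FALSE)

# Order-independent key: smaller value first, separated by a tab
key <- paste(pmin(dat$col1, dat$col2), pmax(dat$col1, dat$col2), sep = "\t")

# Keys of the flagged rows; drop every row sharing one of them
drop_keys <- unique(key[dat$condition])
kept <- dat[!key %in% drop_keys, ]
kept
```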

Conditional adding of values in R with two dataframes [duplicate]

This question already has answers here:
How to join (merge) data frames (inner, outer, left, right)
(13 answers)
Closed 6 years ago.
I have two dataframes (df1, df2).
df1 <- data.frame(term = c("A", "B", "C", "D", "E", "F"))
df2 <- data.frame(term = c("C", "F", "G"), freq = c(7, 3, 5))
In df1, I want to add a column ("freq") based on the values of "freq" in df2: if a term in df1 matches a term in df2, the count ("freq") of that term should be added to df1; otherwise it should be 0.
How can I do this with the smallest possible processing time? Is there a way to do it with dplyr? I cannot figure it out!
If we need a faster option, a data.table join can be used along with assigning (:=) the NA values to 0 in place.
library(data.table)
setDT(df2)[df1, on = "term"][is.na(freq), freq := 0][]
Or, to avoid copies, as @Arun mentioned, create a 'freq' column in 'df1' and then join on 'term', replacing 'freq' with the corresponding 'i.freq' values.
setDT(df1)[, freq := 0][df2, freq := i.freq, on = "term"]
Or use left_join
library(dplyr)
left_join(df1, df2, by = 'term') %>%
  mutate(freq = replace(freq, is.na(freq), 0))
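A base R sketch without a join, using only the two data frames from the question: match() returns the position of each df1 term in df2 (NA when the term is absent), and the missing frequencies are then set to 0. It assumes the terms in df2 are unique, as in the example; with duplicated terms, match() would pick the first occurrence.

```r
df1 <- data.frame(term = c("A", "B", "C", "D", "E", "F"))
df2 <- data.frame(term = c("C", "F", "G"), freq = c(7, 3, 5))

# Position of each df1 term in df2, NA if not found
idx <- match(df1$term, df2$term)
df1$freq <- df2$freq[idx]

# Terms without a match get a frequency of 0
df1$freq[is.na(df1$freq)] <- 0
df1
```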
