Dynamic column rename based on a separate data frame in R

Generate df1 and df2 like this:
pro <- c("Hide-Away", "Hide-Away")
sourceName <- c("New Rate2", "FST")
standardName <- c("New Rate", "SFT")
df1 <- data.frame(pro, sourceName, standardName, stringsAsFactors = FALSE)
A <- 1; B <- 2; C <- 3; D <- 4; G <- 5; H <- 6; E <- 7; FST <- 8; Z <- 8
df2 <- data.frame(A, B, C, D, G, H, E, FST)
colnames(df2)[1] <- "New Rate2"
Then run this code:
df1 <- df1[,c(2,3)]
index<-which(colnames(df2) %in% df1[,1])
index2<-which(df1[,1] %in% colnames(df2) )
colnames(df2)[index] <- df1[index2,2]
Before the rename, df2 looks like this:
New Rate2 B C D G H E FST
1 2 3 4 5 6 7 8
After the rename, df2 looks like this:
New Rate B C D G H E SFT
1 2 3 4 5 6 7 8
So the code clearly worked and swapped the names correctly. Now create df2 with the code below instead, making sure to regenerate df1 as it was before:
df2<- data.frame(FST,B,C,D,G,H,E,Z)
colnames(df2)[8]<- "New Rate2"
and then run the same code again:
df1 <- df1[,c(2,3)]
index<-which(colnames(df2) %in% df1[,1])
index2<-which(df1[,1] %in% colnames(df2) )
colnames(df2)[index] <- df1[index2,2]
Before the rename, df2 is
FST B C D G H E New Rate2
8 2 3 4 5 6 7 8
After the rename, df2 is
New Rate B C D G H E SFT
8 2 3 4 5 6 7 8
So the new names have been assigned to the wrong columns: FST became New Rate and New Rate2 became SFT. I know this is because of the %in% code, but I am not sure of an easy fix to make the column renaming more dynamic.

I am not totally sure about the question, as it seems a little vague, but I'll try my best. The best way I know to dynamically set column names is setnames from the data.table package. So let's say that I have a set of source names and a set of standard names, and I want to swap the source for the standard (which I take to be the question).
Given the data above, I have a data.frame structured like so:
> df2
A B C D G H E FST
1 1 2 3 4 5 6 7 8
as well as two vectors, sourceName and standardName.
sourceName <- c("A", "FST")
standardName <- c("New A", "FST 2: Electric Boogaloo")
I want to dynamically swap sourceName for standardName, and I can do this with setnames like so:
library(data.table)
df3 <- as.data.table(df2)
setnames(df3, sourceName, standardName)
> df3
New A B C D G H E FST 2: Electric Boogaloo
1: 1 2 3 4 5 6 7 8
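
If data.table is not an option, the same lookup can be sketched in base R with match(), which returns, for each column of df2, the row of the mapping table that holds its source name (or NA when there is none). This is only a sketch using the df1 and df2 from the question's second example; the helper names pos and hit are made up for illustration.

```r
# Mapping table and second df2 from the question
df1 <- data.frame(
  sourceName   = c("New Rate2", "FST"),
  standardName = c("New Rate", "SFT"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(FST = 8, B = 2, C = 3, D = 4, G = 5, H = 6, E = 7, Z = 8)
colnames(df2)[8] <- "New Rate2"

# For every column of df2, find the matching row in df1 (NA if not listed)
pos <- match(colnames(df2), df1$sourceName)
hit <- !is.na(pos)

# Rename only the matched columns, each with its own standard name
colnames(df2)[hit] <- df1$standardName[pos[hit]]
colnames(df2)
```

Because each column looks up its own replacement, the result no longer depends on the order in which the names appear in either data frame: here FST becomes SFT and New Rate2 becomes New Rate, each in its original position.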

Trying to follow your example, in your second pass I get an empty index (integer(0)):
> df2
New Rate B C D G H E SFT
1 8 2 3 4 5 6 7 8
> df1
sourceName standardName
1 New Rate2 New Rate
2 FST SFT
> index<-which(colnames(df2) %in% df1[,1])
> index
integer(0)
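At that point the columns of df2 have already been renamed, so none of the source names match any more. That would account for the ordering you see after the assignment to the column names.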

Related

put duplicated rows in different data.frame(s)

Let
x=c(1,2,2,3,4,1)
y=c("A","B","C","D","E","F")
df=data.frame(x,y)
df
x y
1 1 A
2 2 B
3 2 C
4 3 D
5 4 E
6 1 F
How can I put the duplicated rows of this data frame into separate data frames, like this:
df1
x y
1 A
1 F
df2
x y
2 B
2 C
Thank you for help

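You could use split:
split(df, f = df$x)
Here f = df$x specifies the grouping column; check ?split for more details.
To keep only the groups that contain duplicated values, you could use
dups <- sort(unique(df$x[duplicated(df$x)])) # values of x that occur more than once
mylist <- split(df, f = df$x)[as.character(dups)] # index the list by name, not by position
names(mylist) <- paste0("df", seq_along(mylist))
list2env(mylist, envir = .GlobalEnv) # to separate the data frames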

r reshape data using row with NA to identify new column

I have a dataset in R that looks like this:
DF <- data.frame(name=c("A","b","c","d","B","e","f"),
x=c(NA,1,2,3,NA,4,5))
I would like to reshape it into:
rDF <- data.frame(name=c("b","c","d","e","f"),
x=c(1,2,3,4,5),
head=c("A","A","A","B","B"))
where the first row with an NA identifies a new column, and takes that "row value" until the next row with an NA, and then changes "row value".
I have tried both spread and melt, but it does not give me what I want.
library(tidyr)
DF %>% spread(name,x)
library(reshape2)
melt(DF, id=c('name'))
Any suggestions?

Here's a possible data.table/zoo packages combination solution
library(data.table) ; library(zoo)
setDT(DF)[is.na(x), head := name]
na.omit(DF[, head := na.locf(head)], "x")
# name x head
# 1: b 1 A
# 2: c 2 A
# 3: d 3 A
# 4: e 4 B
# 5: f 5 B
Or, as suggested by @Arun, just using data.table:
na.omit(setDT(DF)[, head := name[is.na(x)], by=cumsum(is.na(x))])

You can try:
library(data.table)
library(magrittr)
split(DF, cumsum(is.na(DF$x))) %>%
lapply(function(u) transform(u[-1,], head=u[1,1])) %>%
rbindlist
# name x head
#1: b 1 A
#2: c 2 A
#3: d 3 A
#4: e 4 B
#5: f 5 B

Here's an approach using only base R functions:
idx <- is.na(DF$x)
x <- rle(cumsum(idx))$lengths
DF$head <- rep(DF$name[idx], x)
DF[!idx,]
# name x head
#2 b 1 A
#3 c 2 A
#4 d 3 A
#6 e 4 B
#7 f 5 B
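
The rule stated in the question (each NA row labels the rows that follow it) can also be written as a direct lookup: cumsum(is.na(x)) numbers the groups, and that number indexes the vector of header names. This is a minimal base R sketch, assuming the first row of DF is always a header row; res is just an illustrative name.

```r
DF <- data.frame(name = c("A", "b", "c", "d", "B", "e", "f"),
                 x = c(NA, 1, 2, 3, NA, 4, 5),
                 stringsAsFactors = FALSE)

idx <- is.na(DF$x)                    # TRUE on header rows
DF$head <- DF$name[idx][cumsum(idx)]  # header name for each row's group
res <- DF[!idx, ]                     # drop the header rows themselves
res
```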

data.frame() : make object's string value the object name to use for columns [duplicate]

Is there a way in R to have a variable evaluated as a column name when creating a data frame (or in similar situations like using cbind)?
For example
a <- "mycol";
d <- data.frame(a=1:10)
this creates a data frame with one column named a rather than mycol.
This is less important than the case that would help me remove quite a few lines from my code:
a <- "mycol";
d <- cbind(some.dataframe, a=some.sequence)
My current code has the tortured:
names(d)[dim(d)[2]] <- a;
which is aesthetically barftastic.

> d <- setNames( data.frame(a=1:10), a)
> d
mycol
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10

Is structure(data.frame(1:10),names="mycol") aesthetically pleasing to you? :-)
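
For the cbind case from the question, one possible sketch is to name the new column before binding, or to add it with [[<-, which takes the column name as a string. some.dataframe and some.sequence are placeholders in the question, so the values below are made up for illustration.

```r
a <- "mycol"
some.dataframe <- data.frame(id = 1:3)  # hypothetical stand-in
some.sequence <- c(10, 20, 30)          # hypothetical stand-in

# Option 1: name the new column first, then bind
d <- cbind(some.dataframe, setNames(data.frame(some.sequence), a))

# Option 2: assign by name; the string stored in `a` becomes the column name
d2 <- some.dataframe
d2[[a]] <- some.sequence

names(d)
names(d2)
```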

Just use colnames after creation, e.g.
a <- "mycolA"
b<- "mycolB"
d <- data.frame(a=1:10, b=rnorm(1:10))
colnames(d)<-c(a,b)
d
mycolA mycolB
1 1 -1.5873866
2 2 -0.4195322
3 3 -0.9511075
4 4 0.2259858
5 5 -0.6619433
6 6 3.4669774
7 7 0.4087541
8 8 -0.3891437
9 9 -1.6163175
10 10 0.7642909

Simple solution:
df <- data.frame(1:5, letters[1:5])
logics <- c(T,T,F,F,T)
cities <- c("Warsaw","London","Paris","NY","Tokio")
m <- as.matrix(logics)
m2 <- as.matrix(cities)
name <- "MyCities"
colnames(m) <- deparse(substitute(logics))
colnames(m2) <- eval(name)
df<-cbind(df,m)
cbind(df,m2)
X1.5 letters.1.5. logics MyCities
1 1 a TRUE Warsaw
2 2 b TRUE London
3 3 c FALSE Paris
4 4 d FALSE NY
5 5 e TRUE Tokio

pair-wise duplicate removal from dataframe [duplicate]

This question already has an answer here:
Select equivalent rows [A-B & B-A] [duplicate]
This seems like a simple problem but I can't seem to figure it out. I'd like to remove duplicates from a dataframe (df) if two columns have the same values, even if those values are in the reverse order. What I mean is, say you have the following data frame:
a <- c(rep("A", 3), rep("B", 3), rep("C",2))
b <- c('A','B','B','C','A','A','B','B')
df <-data.frame(a,b)
a b
1 A A
2 A B
3 A B
4 B C
5 B A
6 B A
7 C B
8 C B
If I now remove duplicates, I get the following data frame:
df[duplicated(df),]
a b
3 A B
6 B A
8 C B
However, I would also like to remove the row 6 in this data frame, since "A", "B" is the same as "B", "A". How can I do this automatically?
Ideally I could specify which two columns to compare since the data frames could have varying columns and can be quite large.
Thanks!

Extending Ari's answer, to specify columns to check if other columns are also there:
a <- c(rep("A", 3), rep("B", 3), rep("C",2))
b <- c('A','B','B','C','A','A','B','B')
df <-data.frame(a,b)
df$c = sample(1:10,8)
df$d = sample(LETTERS,8)
df
a b c d
1 A A 10 B
2 A B 8 S
3 A B 7 J
4 B C 3 Q
5 B A 2 I
6 B A 6 U
7 C B 4 L
8 C B 5 V
cols = c(1,2)
newdf = df[,cols]
for (i in 1:nrow(df)){
newdf[i, ] = sort(df[i,cols])
}
df[!duplicated(newdf),]
a b c d
1 A A 10 B
2 A B 8 S
4 B C 3 Q

One solution is to first sort each row of df:
for (i in 1:nrow(df))
{
df[i, ] = sort(df[i, ])
}
df
a b
1 A A
2 A B
3 A B
4 B C
5 A B
6 A B
7 B C
8 B C
At that point it's just a matter of removing the duplicated elements:
df = df[!duplicated(df),]
df
a b
1 A A
2 A B
4 B C
As thelatemail mentioned in the comments, your code actually keeps the duplicates. You need to use !duplicated to remove them.

The other answers use a for loop to assign a value for each and every row. While this is not an issue if you have 100 rows, or even a thousand, you're going to be waiting a while if you have large data of the order of 1M rows.
Stealing from the other linked answer using data.table, you could try something like:
df[!duplicated(data.frame(list(do.call(pmin,df),do.call(pmax,df)))),]
A comparison benchmark with a larger dataset (df2):
df2 <- df[sample(1:nrow(df),50000,replace=TRUE),]
system.time(
df2[!duplicated(data.frame(list(do.call(pmin,df2),do.call(pmax,df2)))),]
)
# user system elapsed
# 0.07 0.00 0.06
system.time({
for (i in 1:nrow(df2))
{
df2[i, ] = sort(df2[i, ])
}
df2[!duplicated(df2),]
}
)
# user system elapsed
# 42.07 0.02 42.09
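
The question also asks for a way to choose which two columns are compared. A sketch of the same pmin/pmax idea restricted to two named columns follows; cols, key and res are illustrative names, and the columns are assumed to be character (not factor) so that pmin and pmax compare them as strings.

```r
a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c("A", "B", "B", "C", "A", "A", "B", "B")
df <- data.frame(a, b, c = 1:8, stringsAsFactors = FALSE)

cols <- c("a", "b")  # the two columns to treat as an unordered pair

# Build an order-independent key: smaller value first, larger second
key <- data.frame(lo = pmin(df[[cols[1]]], df[[cols[2]]]),
                  hi = pmax(df[[cols[1]]], df[[cols[2]]]),
                  stringsAsFactors = FALSE)

res <- df[!duplicated(key), ]  # keep the first row of each pair
res
```

Any other columns (here c) are carried along untouched, since only the key is used to decide which rows are duplicates.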

Using apply will be a better option than loops.
newDf <- data.frame(t(apply(df,1,sort)))
All you need to do now is remove duplicates.
newDf <- newDf[!duplicated(newDf),]
