Match row names and column names to values in another data frame - r

I have two data frames as follows :
df1 <- t(data.frame(seq(1,6,by=1),seq(6,1,by=-1)))
colnames(df1) <- c("A","B","C","D","E","F")
rownames(df1) <- c("a","b")
df2 <- data.frame(rep(colnames(df1),2),rep(rownames(df1),6))
colnames(df2) <- c("Vector1","Vector2")
Such that
df1
A B C D E F
a 1 2 3 4 5 6
b 6 5 4 3 2 1
df2
Vector1 Vector2
A a
B b
C a
D b
E a
F b
A a
B b
C a
D b
E a
F b
I want to match the column values of df2 to column names and row names of df1, and fill the corresponding value to a new column in df2 as follows:
Vector1 Vector2 Newcol
A a 1
B b 5
C a 3
D b 3
E a 5
F b 1
A a 1
B b 5
C a 3
D b 3
E a 5
F b 1
Any suggestions would be much appreciated. Thanks.

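We can use merge with melt. melt turns the matrix into a three-column data.frame (Var1, Var2, value), which is then merged with the second dataset to create the new column. Note that merge sorts the result by the key columns by default, so the rows may not come back in the original order of df2.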
library(reshape2)
merge(df2, melt(df1), by.x = c("Vector1", "Vector2"), by.y = c("Var2", "Var1"))
Or a base R option would be to get the numeric index with match after pasting the columns of 'df2' row-wise (do.call(paste, df2)) and comparing against the pasted column names and row names of 'df1' built with outer. Using that numeric index, we extract the values from 'df1' to create 'Newcol'.
df2$Newcol <- df1[match(do.call(paste, df2),
t(outer(colnames(df1), rownames(df1), FUN = paste)))]
df2$Newcol
#[1] 1 5 3 3 5 1 1 5 3 3 5 1
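Because df1 is a matrix with row and column names (t() of a data frame returns a matrix), another base R option is to index it with a two-column character matrix of (row name, column name) pairs. A minimal sketch, rebuilding the example data with matrix() and wrapping the columns in as.character() in case they were created as factors:

```r
# Rebuild the example data; df1 is a 2 x 6 matrix with dimnames
df1 <- matrix(c(1:6, 6:1), nrow = 2, byrow = TRUE,
              dimnames = list(c("a", "b"), c("A", "B", "C", "D", "E", "F")))
df2 <- data.frame(Vector1 = rep(colnames(df1), 2),
                  Vector2 = rep(rownames(df1), 6),
                  stringsAsFactors = FALSE)

# Each row of the index matrix is one (row name, column name) pair
idx <- cbind(as.character(df2$Vector2), as.character(df2$Vector1))
df2$Newcol <- df1[idx]
df2$Newcol
```

This keeps the rows of df2 in their original order and avoids pasting keys together.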

Related

How to fill in the columns of R dataframe with the corresponding column names as values

I have the following dataframe in R
DF_1<-data.frame("SL:NO"= c(1:3))
DF_1$A<-NA
DF_1$B<-NA
SL.NO A B
1 NA NA
2 NA NA
3 NA NA
How do I fill the empty columns so that columns A and B are filled with their own names, A and B? The result should be
SL.NO A B
1 A B
2 A B
3 A B
I have used a nested for loop as follows.
namelist <- c("A", "B")  # assumed: the columns to fill
for (i in namelist) {
  for (j in 1:nrow(DF_1)) {
    DF_1[j, i] <- i
  }
}
Is there a simpler, more elegant way to do the same?
We can use Map to replace NA values in each column
DF_1[] <- Map(function(x, y) replace(x, is.na(x), y), DF_1, names(DF_1))
DF_1
# SL.NO A B
#1 1 A B
#2 2 A B
#3 3 A B
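If only some columns should be filled, a single loop over the column names may be enough. The sketch below assumes a hypothetical vector namelist holding the columns to fill; it overwrites only the missing entries and leaves the other columns untouched:

```r
DF_1 <- data.frame(SL.NO = 1:3, A = NA, B = NA)
namelist <- c("A", "B")  # hypothetical: columns to fill with their own names

for (nm in namelist) {
  col <- DF_1[[nm]]
  col[is.na(col)] <- nm   # only overwrite missing entries
  DF_1[[nm]] <- col
}
DF_1
```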

Dynamic column rename based on a separate data frame in R

Generate df1 and df2 like this
pro <- c("Hide-Away", "Hide-Away")
sourceName <- c("New Rate2", "FST")
standardName <- c("New Rate", "SFT")
df1 <- data.frame(pro, sourceName, standardName, stringsAsFactors = F)
A <- 1; B <- 2; C <-3; D <- 4; G <- 5; H <- 6; E <-7; FST <-8; Z <-8
df2<- data.frame(A,B,C,D,G,H,E,FST)
colnames(df2)[1]<- "New Rate2"
Then run this code.
df1 <- df1[,c(2,3)]
index<-which(colnames(df2) %in% df1[,1])
index2<-which(df1[,1] %in% colnames(df2) )
colnames(df2)[index] <- df1[index2,2]
The input of DF2 will be like
New Rate2 B C D G H E FST
1 2 3 4 5 6 7 8
The output of DF2 will be like
New Rate B C D G H E SFT
1 2 3 4 5 6 7 8
So the code worked and swapped the names correctly. Now create df2 with the code below instead, and make sure to regenerate df1 as it was before.
df2<- data.frame(FST,B,C,D,G,H,E,Z)
colnames(df2)[8]<- "New Rate2"
and then run
df1 <- df1[,c(2,3)]
index<-which(colnames(df2) %in% df1[,1])
index2<-which(df1[,1] %in% colnames(df2) )
colnames(df2)[index] <- df1[index2,2]
The input of df2 will be
FST B C D G H E New Rate2
8 2 3 4 5 6 7 8
The output of df2 will be
New Rate B C D G H E SFT
8 2 3 4 5 6 7 8
So the order of the columns has not been preserved. I know this is because of the %in% code, but I am not sure of an easy fix to make the column swapping more dynamic.
I am not totally sure about the question, as it seems a little vague, but I'll try my best. The best way I know to set column names dynamically is setnames from the data.table package. Say I have a set of source names and a set of standard names, and I want to swap the source names for the standard ones (which I take to be the question).
Given the data above, I have a data.frame structured like so:
> df2
A B C D G H E FST
1 1 2 3 4 5 6 7 8
as well as two vectors, sourceName and standardName.
sourceName <- c("A", "FST")
standardName <- c("New A", "FST 2: Electric Boogaloo")
I want to dynamically swap sourceName for standardName, and I can do this with setnames like so:
library(data.table)
df3 <- as.data.table(df2)
setnames(df3, sourceName, standardName)
> df3
New A B C D G H E FST 2: Electric Boogaloo
1: 1 2 3 4 5 6 7 8
Trying to follow your example, in your second pass I get an empty index (integer(0)),
> df2
New Rate B C D G H E SFT
1 8 2 3 4 5 6 7 8
> df1
sourceName standardName
1 New Rate2 New Rate
2 FST SFT
> index<-which(colnames(df2) %in% df1[,1])
> index
integer(0)
which would account for your expected ordering on assignment to column names.
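The mismatch in the question arises because which() applied to %in% returns positions in ascending order, so the pairing between each source name and its standard name is lost. A base R sketch using match(), which returns the position of each source name individually (the data is rebuilt from the question's second example):

```r
# Lookup table of source -> standard names (from the question)
df1 <- data.frame(sourceName   = c("New Rate2", "FST"),
                  standardName = c("New Rate", "SFT"),
                  stringsAsFactors = FALSE)

# Second layout from the question: FST first, "New Rate2" last
df2 <- data.frame(FST = 8, B = 2, C = 3, D = 4, G = 5, H = 6, E = 7, Z = 8)
colnames(df2)[8] <- "New Rate2"

# match() gives, for each source name, its position in colnames(df2)
pos   <- match(df1$sourceName, colnames(df2))
found <- !is.na(pos)                      # skip source names that are absent
colnames(df2)[pos[found]] <- df1$standardName[found]
colnames(df2)
```

With this pairing, FST becomes SFT and New Rate2 becomes New Rate regardless of where the columns sit in df2.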

put duplicated rows in different data.frame(s)

Let
x=c(1,2,2,3,4,1)
y=c("A","B","C","D","E","F")
df=data.frame(x,y)
df
x y
1 1 A
2 2 B
3 2 C
4 3 D
5 4 E
6 1 F
How can I put the duplicated rows of this data frame into separate data frames, like this:
df1
x y
1 A
1 F
df2
x y
2 B
2 C
Thank you for help
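You could use split:
split(df, f = df$x)
Here f = df$x specifies the grouping column; see ?split for more details.
To drop the groups that have no duplicates, you could use
dup_vals <- sort(unique(df$x[duplicated(df$x)]))  # values of x that occur more than once
mylist <- split(df, f = df$x)[as.character(dup_vals)]  # index by name, not by position
names(mylist) <- paste0("df", seq_along(mylist))
list2env(mylist, envir = .GlobalEnv)  # create df1, df2 as separate data frames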
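An alternative sketch filters the duplicated rows first and splits afterwards, so the resulting list contains only groups with more than one row. Calling duplicated() in both directions also flags the first occurrence of each repeated value:

```r
df <- data.frame(x = c(1, 2, 2, 3, 4, 1),
                 y = c("A", "B", "C", "D", "E", "F"),
                 stringsAsFactors = FALSE)

# TRUE for every row whose x occurs more than once (first occurrence included)
is_dup <- duplicated(df$x) | duplicated(df$x, fromLast = TRUE)

# Keep only those rows, then split them by x
dup_groups <- split(df[is_dup, ], df$x[is_dup])
names(dup_groups)
```

Keeping the result as a list is often more convenient than creating separate objects in the global environment.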

Filtering a R DataFrame with repeated values in columns

I have an R data frame and I want to make another data frame from it, keeping only the rows whose value appears more than X times in a given column.
>DataFrame
Value Column
1 a
4 a
2 b
6 c
3 c
4 c
9 a
1 d
For example, I want a new data frame with only the values in Column that appear more than 2 times, to get something like this:
>NewDataFrame
Value Column
1 a
4 a
6 c
3 c
4 c
9 a
Thank you very much for your time.
We can use table to get the count of each value in 'Column' and subset the dataset ('DataFrame') to the values whose names in 'tbl' have a count greater than 'n'.
n <- 2
tbl <- table(DataFrame$Column) > n
NewDataFrame <- subset(DataFrame, Column %in% names(tbl)[tbl])
# Value Column
#1 1 a
#2 4 a
#4 6 c
#5 3 c
#6 4 c
#7 9 a
Or using ave from base R
NewDataFrame <- DataFrame[with(DataFrame, ave(seq_along(Column), Column, FUN = length) > n), ]
Or using data.table
library(data.table)
NewDataFrame <- setDT(DataFrame)[, .SD[.N>n] , by = Column]
Or
NewDataFrame <- setDT(DataFrame)[, if(.N > n) .SD, by = Column]
Or dplyr
library(dplyr)
NewDataFrame <- DataFrame %>%
  group_by(Column) %>%
  filter(n() > 2)
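For a self-contained check of the table-based idea, the sketch below rebuilds the example data and looks up each row's count by name. It assumes Column is a character vector; a factor would need as.character() first:

```r
DataFrame <- data.frame(
  Value  = c(1, 4, 2, 6, 3, 4, 9, 1),
  Column = c("a", "a", "b", "c", "c", "c", "a", "d"),
  stringsAsFactors = FALSE
)
n <- 2

counts <- table(DataFrame$Column)               # occurrences of each value
keep   <- as.vector(counts[DataFrame$Column] > n)  # per-row count, compared to n
NewDataFrame <- DataFrame[keep, ]
NewDataFrame
```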

Match one column of a data.frame with all the columns in another data.frame

I have two data.frames:
DF1
Col1 Col2 ...... ...... Col2000
A H
c d
d e
n b
e A
b n
H c
DF2
A
b
c
d
e
n
H
I simply need to match the single column in DF2 against each column in DF1, because I need to know the exact rank (position) of each match. I tried to write a function, but since I'm not an R expert, something goes wrong in my code:
lapply(DF1, function(x) match(DF1[,i], DF2[,1]))
The anonymous function never uses its argument x, and i is not defined. Pass x to match instead:
lapply(DF1, function(x) match(x, DF2[,1]))
does what you're trying to do. Take:
DF1 <- data.frame(
Col1 = c('A','c','d','n','e','b','H'),
Col2 = c('H','d','e','b','A','n','c')
)
DF2 <- data.frame(c('A','b','c','d','e','n','H'))
Then:
> lapply(DF1, function(x) match(x, DF2[,1]))
$Col1
[1] 1 3 4 6 5 2 7
$Col2
[1] 7 4 5 2 1 6 3
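With many columns (the question mentions 2000), a matrix of positions may be easier to work with than a list. A minimal sketch using sapply, which simplifies the equal-length results into a matrix; the column name ref is a hypothetical label for the single column of DF2:

```r
DF1 <- data.frame(
  Col1 = c("A", "c", "d", "n", "e", "b", "H"),
  Col2 = c("H", "d", "e", "b", "A", "n", "c"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(ref = c("A", "b", "c", "d", "e", "n", "H"),
                  stringsAsFactors = FALSE)

# One column of positions per column of DF1; NA marks values not found in DF2
ranks <- sapply(DF1, match, table = DF2[[1]])
ranks
```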
