join matching columns in a data.frame or data.table - r

I have the following data.frames:
a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'))
b <- data.frame(id = 1:3, v1 = c(NA, 'B', 'C'), v2 = c("A", NA, NA))
> a
id v1 v2
1 1 a <NA>
2 2 <NA> b
3 3 <NA> c
> b
id v1 v2
1 1 <NA> A
2 2 B <NA>
3 3 C <NA>
note: There are no ids for which v1 or v2 are defined in both tables; there is only a single unique non-NA value in each column for each id value
I would like to merge these data frames on matching values of "id":
ab <- merge(a, b, by = "id")
but I would also like to combine the two columns v1 and v2, so that the data.frame ab will look like this:
ab <- data.frame(id = 1:3, v1 = c("a", "B", "C"), v2 = c("A", "b", "c"))
> ab
id v1 v2
1 1 a A
2 2 B b
3 3 C c
instead, I get this:
> merge(a, b, by = "id")
id v1.x v2.x v1.y v2.y
1 1 a <NA> <NA> A
2 2 <NA> b B <NA>
3 3 <NA> c C <NA>
It would be helpful to have examples using both data.frame and data.table, so here are the data.table versions of the above:
A <- data.table(a, key = 'id')
B <- data.table(b, key = 'id')
A[B]

The type of merge you specify probably won't be possible using merge (with data frames), although saying that usually invites being proved wrong.
You also omit some details: will there always be a single unique non-NA value in each column for each id value? If so, this will work:
library(plyr)
ab <- rbind(a, b)
# keep only the non-NA values of a column
colFun <- function(x) x[which(!is.na(x))]
> ddply(ab, .(id), function(x) colwise(colFun)(x))
id v1 v2
1 1 a A
2 2 B b
3 3 C c
A similar strategy should work with data.tables as well:
library(data.table)
abDT <- data.table(ab, key = "id")
> abDT[, list(colFun(v1), colFun(v2)), by = id]
id V1 V2
[1,] 1 a A
[2,] 2 B b
[3,] 3 C c
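For comparison, the same "stack, then keep the non-NA value per id" idea can be sketched in base R without plyr. This is only a sketch under the question's assumption (at most one non-NA value per column per id); first_non_na is a helper name made up here, not something from the answer above.

```r
# Data from the question; stringsAsFactors = FALSE keeps the columns as character
a <- data.frame(id = 1:3, v1 = c("a", NA, NA), v2 = c(NA, "b", "c"),
                stringsAsFactors = FALSE)
b <- data.frame(id = 1:3, v1 = c(NA, "B", "C"), v2 = c("A", NA, NA),
                stringsAsFactors = FALSE)

# Hypothetical helper: first non-NA value of a vector (NA if there is none)
first_non_na <- function(x) x[!is.na(x)][1]

# Stack the two frames, then collapse each column within each id
stacked <- rbind(a, b)
ab <- aggregate(stacked[c("v1", "v2")],
                by = list(id = stacked$id),
                FUN = first_non_na)
ab
```

If an id could carry two different non-NA values in the same column, this silently keeps the one from a, so it is worth checking that assumption first.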

If your data is as simple as it is above, joran's answer is likely the simplest way. Here's my approach in base:
a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'))
b <- data.frame(id = 1:3, v1 = c(NA, 'B', 'C'), v2 = c("A", NA, NA))
decider <- function(x, y) factor(ifelse(is.na(x), as.character(y), as.character(x)))
data.frame(mapply(a, b, FUN = decider))
If your data has different ids (some overlap and some do not), then here's a different approach:
a <- data.frame(id = c(1,2,4,5), v1 = c('a', NA, "q", NA), v2 = c(NA, 'b', 'c', "e"))
b <- data.frame(id = 1:4, v1 = c(NA, "A", "C", 'B'), v2 = c("A", NA, "D", NA))
decider <- function(x, y) factor(ifelse(is.na(x), as.character(y), as.character(x)))
DF <- data.frame(mapply(a, b, FUN = decider))
DF2 <- rbind(b[!b$id %in% DF$id , ], DF)
DF2 <- DF2[order(DF2$id), ]
rownames(DF2) <- 1:nrow(DF2)
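Another way to handle ids that only partly overlap is to let merge do the matching by id and then combine the .x/.y pairs. The following is a minimal sketch of that alternative, not the approach above; it assumes that values from a should win when both tables have one, and the variable names (m, shared, ab) are chosen here for illustration.

```r
a <- data.frame(id = c(1, 2, 4, 5), v1 = c("a", NA, "q", NA),
                v2 = c(NA, "b", "c", "e"), stringsAsFactors = FALSE)
b <- data.frame(id = c(1, 2, 3, 4), v1 = c(NA, "A", "C", "B"),
                v2 = c("A", NA, "D", NA), stringsAsFactors = FALSE)

# Full outer join on id; shared columns come back as v1.x / v1.y etc.
m <- merge(a, b, by = "id", all = TRUE)

# For every shared column take the value from a, falling back to b where a is NA
shared <- setdiff(intersect(names(a), names(b)), "id")
ab <- data.frame(id = m$id, stringsAsFactors = FALSE)
for (col in shared) {
  x <- m[[paste0(col, ".x")]]
  y <- m[[paste0(col, ".y")]]
  ab[[col]] <- ifelse(is.na(x), y, x)
}
ab
```

Because the rows are matched by id rather than by position, ids that appear in only one table are kept and simply take whatever values that table has.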

Related

How do I remove na in R and make below value to go up

I have a data frame like the one below.
How do I remove the NAs and shift the remaining values up?
Thanks
id name.america name.europe name.asia
1 a <NA> <NA>
2 <NA> b <NA>
3 <NA> <NA> c
4 d <NA> <NA>
Change to:
id name.america name.europe name.asia
1 a b c
2 d
We can loop through the columns and remove the NAs, then pad each list element with NA at the end so that all elements have the length of the longest one. Based on that length, subset the 'id' column of the dataset and bind it to the output
lst <- lapply(df1[-1], na.omit)
lst1 <- lapply(lst, `length<-`, max(lengths(lst)))
out <- data.frame(lst1)
out1 <- cbind(id = df1$id[seq_len(nrow(out))], out)
out1
# id name.america name.europe name.asia
#1 1 a b c
#2 2 d <NA> <NA>
If we need NA to be changed to blanks ("") - not recommended
out1[is.na(out1)] <- ""
data
df1 <- structure(list(id = 1:4, name.america = c("a", NA, NA, "d"),
name.europe = c(NA, "b", NA, NA), name.asia = c(NA, NA, "c",
NA)), class = "data.frame", row.names = c(NA, -4L))
tidyverse-based solution
library(tidyverse)
df1 %>%
gather(key = "name", value = "val", -id) %>%
na.omit() %>%
select(-id) %>%
group_by(name) %>%
mutate(id = 1:n()) %>%
spread(key = name, value = val)
Results
# A tibble: 2 x 4
id name.america name.asia name.europe
<int> <chr> <chr> <chr>
1 1 a c b
2 2 d NA NA
Notes
If desired, you can reorder the columns with select, or reorder that variable before the transformation.
NAs are left as such. If desired, you can use tidyr::replace_na to insert some string or space. I would discourage you from doing that.
Data
Taken from @akrun's answer above.
df1 <- structure(
list(
id = 1:4,
name.america = c("a", NA, NA, "d"),
name.europe = c(NA, "b", NA, NA),
name.asia = c(NA, NA, "c",
NA)
),
class = "data.frame",
row.names = c(NA, -4L)
)
df1[, -1] <- lapply(df1[,-1], function(x) c(na.omit(x), rep("",length(x)-length(na.omit(x)))))
df1[1:max(colSums(!(df1[,-1]==""))),]
# id name.america name.europe name.asia
#1 1 a b c
#2 2 d

Copy selective row values from 1 dataframe to another in R

I have a dataframe:
df <- data.frame(id = c('1','2','3'), b = c('b1', 'NA', 'b3'), c = c('c1', 'c2', 'NA'), d = c('d1', 'NA', 'NA'))
id b c d
1 b1 c1 d1
2 NA c2 NA
3 b3 NA NA
I have extracted values with id = 1 from df to another dataframe say df2 so df2 has 1 row
id b c d
1 b1 c1 d1
I need to copy the values from df2 into df wherever df has an NA
Result Table:
id b c d
1 b1 c1 d1
2 b1 c2 d1
3 b3 c1 d1
Thank you in advance. I asked a similar question before, but I am deleting it.
Based on your last comment that df2[3,3] should be c2 and not c1, a straightforward answer is to use zoo::na.locf.
library(zoo)
df2 <- na.locf(df)
# id b c d
# 1 1 b1 c1 d1
# 2 2 b1 c2 d1
# 3 3 b3 c2 d1
Data
df <- structure(list(id = c(1, 2, 3), b = c("b1", NA, "b3"), c = c("c1",
"c2", NA), d = c("d1", NA, NA)), class = "data.frame", row.names = c(NA,
-3L))
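If zoo is not available, the carry-forward can be sketched in base R. The helper name locf below is made up for this example, and it is not zoo's implementation: unlike na.locf's default, it leaves leading NAs in place instead of dropping them.

```r
df <- data.frame(id = c(1, 2, 3), b = c("b1", NA, "b3"),
                 c = c("c1", "c2", NA), d = c("d1", NA, NA),
                 stringsAsFactors = FALSE)

# Hypothetical helper: last observation carried forward for one vector
locf <- function(x) {
  # index of the most recent non-NA element at or before each position (0 if none yet)
  idx <- cummax(seq_along(x) * !is.na(x))
  idx[idx == 0] <- NA   # nothing to carry forward yet: stay NA
  x[idx]
}

df2 <- df
df2[] <- lapply(df, locf)   # apply column by column, keep the data.frame shape
df2
```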
Assuming that there is a mistake in your question (df2 will be equal to b1-c1-d1, not b1-c2-d1), here is the solution:
Initialize dataframe
df <- data.frame(id = c('1','2','3'), b = c('b1', 'NA', 'b3'), c = c('c1', 'c2', 'NA'), d = c('d1', 'NA', 'NA'))
Converting string NAs to actual detectable NAs
df <- data.frame(lapply(df, function(x) { gsub("NA", NA, x) }))
Obtaining default value row
df2<-df[df$id==1,]
For all rows, check if the column cell is na, then fill it with the df2 cell of the same column
for (r in 1:nrow(df)) for( c in colnames(df)) df[r,c]<-ifelse(is.na(df[r,c]),as.character(df2[1,c]),as.character(df[r,c]))
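The cell-by-cell loop can also be written one column at a time. This is a rough sketch that assumes the NAs are already real NA values (not the string "NA") and that the row with id "1" holds the defaults; defaults and filled are names chosen here.

```r
df <- data.frame(id = c("1", "2", "3"), b = c("b1", NA, "b3"),
                 c = c("c1", "c2", NA), d = c("d1", NA, NA),
                 stringsAsFactors = FALSE)

# One-row data frame holding the default value for every column
defaults <- df[df$id == "1", ]

# Pair each column with its default and fill only the NA cells
filled <- df
filled[] <- Map(function(x, d) ifelse(is.na(x), d, x), df, defaults)
filled
```

Map walks over the columns of df and defaults in parallel, so each column is filled with the default from the same column.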

Column names into data frame

How can I create a data frame which contains the column names of all Environment objects (df)
Example: suppose these 3 data frames are all the objects in the global environment.
chocolate <- data.frame(a = 1, b = 2, c = 3)
banana <- data.frame(a = 2, d = 4, c = 3)
pear <- data.frame(d = 1, e = 4)
Desired output
output <- data.frame(id = c("chocolate","banana", "pear"),
v2 = c("a", "a", NA),
v3 = c("b", NA, NA),
v4 = c("c", "c", NA),
v5 = c(NA, "d", "d"),
v6 = c(NA, NA, "e"))
output
We can try
library(data.table)
lst <- mget(c("chocolate", "banana", "pear"))  # the data frames from the question
setnames(rbindlist(lapply(setNames(lst, seq_along(lst)), function(x) {
x[] <- names(x)
x}), fill = TRUE, idcol = 'id'), 2:6, paste0("V", 1:5))[]
# id V1 V2 V3 V4 V5
#1: 1 a b c NA NA
#2: 2 a NA c d NA
#3: 3 NA NA NA d e
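To get the object names into the id column and the v2 to v6 headers from the desired output, something along these lines might work in base R. It is a sketch only: the list is built by hand here, and the commented mget line is just one possible way to collect every data frame from the global environment.

```r
chocolate <- data.frame(a = 1, b = 2, c = 3)
banana    <- data.frame(a = 2, d = 4, c = 3)
pear      <- data.frame(d = 1, e = 4)

# Named list of the data frames. One possible way to collect them automatically:
# lst <- Filter(is.data.frame, mget(ls(globalenv()), envir = globalenv()))
lst <- list(chocolate = chocolate, banana = banana, pear = pear)

# Every column name that occurs in any of the data frames
all_cols <- sort(unique(unlist(lapply(lst, names))))

# One row per data frame: the column name where it is present, NA otherwise
present <- unname(t(sapply(lst, function(x) {
  ifelse(all_cols %in% names(x), all_cols, NA)
})))

output <- data.frame(id = names(lst), present, stringsAsFactors = FALSE)
names(output)[-1] <- paste0("v", seq_along(all_cols) + 1)
output
```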

Get elements by position from one data frame to another

Let's say we have two data frames:
df1 <- data.frame(A = letters[1:3], B = letters[4:6], C = letters[7:9], stringsAsFactors = FALSE)
A B C
1 a d g
2 b e h
3 c f i
df2 <- data.frame(V1 = 1:3, V2 = 4:6, V3 = 7:9)
V1 V2 V3
1 1 4 7
2 2 5 8
3 3 6 9
I need to build a function that takes as input a single value or a vector containing elements from one of the data frames and returns the elements from the other data frame according to their positional indexes.
The function should work like this:
> matchdf(values = c("a", "e", "i"), dfin = df1, dfout = df2)
[1] 1 5 9
> matchdf(values = c(1, 5, 9), dfin = df2, dfout = df1)
[1] "a" "e" "i"
> matchdf(values = c(1, 1, 1), dfin = df2, dfout = df1)
[1] "a" "a" "a"
This is what I have tried so far:
library(dplyr)
toVec <- function(df) df %>% as.matrix %>% as.vector
matchdf <- function(values, dfin, dfout) toVec(dfout)[toVec(dfin) %in% values]
# But sometimes the output values aren't in correct order:
> matchdf(c("c", "i", "h"), df1, df2)
[1] 3 8 9
# should output 3 9 8
> matchdf(values = c("a", "a", "a"), dfin = df1, dfout = df2)
[1] 1
# Should output 1 1 1
Feel free to use data.table or/and dplyr if it eases the task. I would prefer a solution without for loops.
Assumptions:
elements from df1 are different from df2
dim(df1) = dim(df2)
matchdf <- function(values, dfin, dfout){
unlist(sapply(values,
function(val) dfout[dfin == val],
USE.NAMES = FALSE)
)
}
matchdf(c("c", "i", "h"), df1, df2)
#should output 3 9 8
[1] 3 9 8
matchdf(values = c("a", "a", "a"), dfin = df1, dfout = df2)
#should output 1 1 1
[1] 1 1 1
matchdf(values = c("X", "Y", "a"), dfin = df1, dfout = df2)
#should output vector, not list
[1] 1
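A fully vectorised variant is possible with match on the flattened matrices. This is a sketch under the question's assumptions (same dimensions, no repeated values inside dfin); matchdf2 is a name made up here. It behaves differently from the sapply version in one respect: values that are not found come back as NA instead of being dropped.

```r
df1 <- data.frame(A = letters[1:3], B = letters[4:6], C = letters[7:9],
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = 1:3, V2 = 4:6, V3 = 7:9)

# Hypothetical helper: look up the position of each value in dfin (column-major
# order) and return the element at the same position in dfout
matchdf2 <- function(values, dfin, dfout) {
  pos <- match(values, as.matrix(dfin))
  as.matrix(dfout)[pos]
}

matchdf2(c("c", "i", "h"), df1, df2)
matchdf2(c(1, 5, 9), df2, df1)
matchdf2(c("a", "a", "a"), df1, df2)
matchdf2(c("X", "Y", "a"), df1, df2)   # unmatched values give NA
```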

R matching two columns in the same data.frame

How can I combine two columns in the same data frame into one column? A simple example would be:
a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'))
a
id v1 v2
1 a <NA>
2 <NA> b
3 <NA> c
And the output I need would look like this:
a
id v1 v2 v3
1 a <NA> a
2 <NA> b b
3 <NA> c c
I found a similar post, "join matching columns in a data.frame or data.table", but I cannot figure out how to apply it to my own case. Please help, thanks
It's not clear exactly what you want. What happens if v1 and v2 have different values?
This method will prefer the value of v1
a <- data.frame(id = 1:4, v1 = c('a', NA, NA,'d'), v2 = c(NA, 'b', 'c','e'))
library(data.table)
a <- as.data.table(a)
a[,v3 := v1]
a[is.na(v1), v3 := v2]
Using traditional data.frame methods:
a$v3 <- as.character(a$v1)
a[is.na(a$v1),"v3"] <- as.character(a[is.na(a$v1),"v2"])
Hmm, ifelse() maybe?
> a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'),
stringsAsFactors=FALSE)
> a$v3 <- ifelse(is.na(a$v1), a$v2, a$v1)
> a
id v1 v2 v3
1 1 a <NA> a
2 2 <NA> b b
3 3 <NA> c c
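If there are more than two columns to combine, the ifelse idea can be folded over all of them with Reduce. This is a small sketch; coalesce_cols is a made-up helper name, and like the answers above it prefers the leftmost non-NA value.

```r
a <- data.frame(id = 1:4, v1 = c("a", NA, NA, "d"), v2 = c(NA, "b", "c", "e"),
                stringsAsFactors = FALSE)

# Hypothetical helper: element-wise first non-NA value across any number of vectors
coalesce_cols <- function(...) {
  Reduce(function(x, y) ifelse(is.na(x), y, x), list(...))
}

a$v3 <- coalesce_cols(a$v1, a$v2)
a
```

In row 4 both v1 and v2 have a value, and v1 ("d") is kept because it comes first in the argument list.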
