How can I combine two columns in the same data frame into one column? A simple example would be:
a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'))
a
id v1 v2
1 a <NA>
2 <NA> b
3 <NA> c
And the output I need would look like this:
a
id v1 v2 v3
1 a <NA> a
2 <NA> b b
3 <NA> c c
I found a similar post, "join matching columns in a data.frame or data.table", but I cannot figure out how to apply it to my own case. Please help, thanks.
It's not clear exactly what you want. What happens if v1 and v2 have different values?
This method will prefer the value of v1
a <- data.frame(id = 1:4, v1 = c('a', NA, NA,'d'), v2 = c(NA, 'b', 'c','e'))
library(data.table)
a <- as.data.table(a)
a[,v3 := v1]
a[is.na(v1), v3 := v2]
Using traditional data.frame methods:
a$v3 <- as.character(a$v1)
a[is.na(a$v1),"v3"] <- as.character(a[is.na(a$v1),"v2"])
Hmm, ifelse() maybe?
> a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'),
stringsAsFactors=FALSE)
> a$v3 <- ifelse(is.na(a$v1), a$v2, a$v1)
> a
id v1 v2 v3
1 1 a <NA> a
2 2 <NA> b b
3 3 <NA> c c
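If more than two columns ever need combining, the ifelse() idea can probably be generalized with Reduce(). This is only a minimal base R sketch; the helper name first_non_na is hypothetical and does not come from any package. Like the answers above, it prefers the leftmost non-NA value.

```r
# Hypothetical helper: element-wise first non-NA value across several vectors
first_non_na <- function(...) {
  Reduce(function(x, y) ifelse(is.na(x), y, x), list(...))
}

a <- data.frame(id = 1:4,
                v1 = c("a", NA, NA, "d"),
                v2 = c(NA, "b", "c", "e"),
                stringsAsFactors = FALSE)

# v1 wins where both columns have a value (row 4)
a$v3 <- first_non_na(a$v1, a$v2)
a
```

Additional columns can be passed as further arguments, e.g. first_non_na(a$v1, a$v2, a$v3).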
I have a data frame like the one below.
How do I remove the NA values and shift the remaining values up?
Thanks
id name.america name.europe name.asia
1 a <NA> <NA>
2 <NA> b <NA>
3 <NA> <NA> c
4 d <NA> <NA>
Change to:
id name.america name.europe name.asia
1 a b c
2 d
We can loop through the columns and remove the NA values, then pad each list element with NA at the end so that all elements have the same length as the longest one. Based on that length, subset the 'id' column of the dataset and bind it to the output:
lst <- lapply(df1[-1], na.omit)
lst1 <- lapply(lst, `length<-`, max(lengths(lst)))
out <- data.frame(lst1)
out1 <- cbind(id = df1$id[seq_len(nrow(out))], out)
out1
# id name.america name.europe name.asia
#1 1 a b c
#2 2 d <NA> <NA>
If the NA values need to be changed to blanks ("") (not recommended):
out1[is.na(out1)] <- ""
data
df1 <- structure(list(id = 1:4, name.america = c("a", NA, NA, "d"),
name.europe = c(NA, "b", NA, NA), name.asia = c(NA, NA, "c",
NA)), class = "data.frame", row.names = c(NA, -4L))
tidyverse-based solution
require(tidyverse)
df1 %>%
gather(key = "name", value = "val", -id) %>%
na.omit() %>%
select(-id) %>%
group_by(name) %>%
mutate(id = 1:n()) %>%
spread(key = name, value = val)
Results
# A tibble: 2 x 4
id name.america name.asia name.europe
<int> <chr> <chr> <chr>
1 1 a c b
2 2 d NA NA
Notes
If desired, you can re-order the columns with select, or order that variable prior to the transformation.
NAs are left as such. If desired, you can use tidyr::replace_na to insert some string or space. I would discourage you from doing that.
Data
Taken from @akrun's answer above.
df1 <- structure(
list(
id = 1:4,
name.america = c("a", NA, NA, "d"),
name.europe = c(NA, "b", NA, NA),
name.asia = c(NA, NA, "c",
NA)
),
class = "data.frame",
row.names = c(NA, -4L)
)
df1[, -1] <- lapply(df1[,-1], function(x) c(na.omit(x), rep("",length(x)-length(na.omit(x)))))
df1[1:max(colSums(!(df1[,-1]==""))),]
# id name.america name.europe name.asia
#1 1 a b c
#2 2 d
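A variant of the same idea that keeps NA instead of empty strings might look like the sketch below. The helper name shift_up is made up for illustration, and the data frame is the df1 from the earlier answers.

```r
# Hypothetical helper: move non-NA values to the top, pad the rest with NA
shift_up <- function(x) c(x[!is.na(x)], rep(NA, sum(is.na(x))))

df1 <- data.frame(id = 1:4,
                  name.america = c("a", NA, NA, "d"),
                  name.europe  = c(NA, "b", NA, NA),
                  name.asia    = c(NA, NA, "c", NA),
                  stringsAsFactors = FALSE)

shifted <- df1
shifted[-1] <- lapply(df1[-1], shift_up)

# keep only rows where at least one name column still holds a value
out <- shifted[rowSums(!is.na(shifted[-1])) > 0, ]
out
```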
I have a dataframe:
df <- data.frame(id = c('1','2','3'), b = c('b1', 'NA', 'b3'), c = c('c1', 'c2', 'NA'), d = c('d1', 'NA', 'NA'))
id b c d
1 b1 c1 d1
2 NA c2 NA
3 b3 NA NA
I have extracted the row with id = 1 from df into another dataframe, say df2, so df2 has 1 row:
id b c d
1 b1 c1 d1
I need to copy the values from df2 into df wherever there is an NA in df.
Result Table:
id b c d
1 b1 c1 d1
2 b1 c2 d1
3 b3 c1 d1
Thank you in advance. I asked a similar question before but am deleting it.
Based on your last comment that df2[3,3] should be c2 and not c1, a straightforward answer is to use zoo::na.locf.
library(zoo)
df2 <- na.locf(df)
# id b c d
# 1 1 b1 c1 d1
# 2 2 b1 c2 d1
# 3 3 b3 c2 d1
Data
df <- structure(list(id = c(1, 2, 3), b = c("b1", NA, "b3"), c = c("c1",
"c2", NA), d = c("d1", NA, NA)), class = "data.frame", row.names = c(NA,
-3L))
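If adding the zoo dependency is not desirable, "last observation carried forward" can be sketched in base R. The function name locf below is an assumption for illustration, not zoo's implementation; unlike na.locf's default, it leaves leading NA values in place rather than dropping them.

```r
# Hypothetical base R helper: carry the last non-NA value forward
locf <- function(x) {
  idx <- cumsum(!is.na(x))   # index of the most recent non-NA value
  idx[idx == 0] <- NA        # nothing to carry forward yet
  x[!is.na(x)][idx]
}

df <- data.frame(id = c(1, 2, 3),
                 b = c("b1", NA, "b3"),
                 c = c("c1", "c2", NA),
                 d = c("d1", NA, NA),
                 stringsAsFactors = FALSE)

df2 <- df
df2[] <- lapply(df, locf)   # apply column by column, keep the data frame shape
df2
```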
Assuming that there is a mistake in your question (df2 should be equal to b1-c1-d1, not b1-c2-d1), here is the solution:
Initialize dataframe
df <- data.frame(id = c('1','2','3'), b = c('b1', 'NA', 'b3'), c = c('c1', 'c2', 'NA'), d = c('d1', 'NA', 'NA'))
Converting string NAs to actual detectable NAs
df <- data.frame(lapply(df, function(x) { gsub("NA", NA, x) }))
Obtaining default value row
df2<-df[df$id==1,]
For all rows, check if the column cell is na, then fill it with the df2 cell of the same column
for (r in 1:nrow(df)) for( c in colnames(df)) df[r,c]<-ifelse(is.na(df[r,c]),as.character(df2[1,c]),as.character(df[r,c]))
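The cell-by-cell loop can likely be replaced by one vectorized assignment per column. A minimal sketch, assuming the NAs are already real NA values and the default row is the one with id "1" (the name defaults is introduced here for illustration):

```r
df <- data.frame(id = c("1", "2", "3"),
                 b = c("b1", NA, "b3"),
                 c = c("c1", "c2", NA),
                 d = c("d1", NA, NA),
                 stringsAsFactors = FALSE)

# row that supplies the default values
defaults <- df[df$id == "1", ]

for (col in names(df)) {
  missing <- is.na(df[[col]])
  # fill every NA in this column with the default row's value
  df[[col]][missing] <- defaults[[col]][1]
}
df
```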
How can I create a data frame which contains the column names of all the data frame objects in the environment?
For example, take these 3 data frames as all the objects in the global environment.
chocolate <- data.frame(a = 1, b = 2, c = 3)
banana <- data.frame(a = 2, d = 4, c = 3)
pear <- data.frame(d = 1, e = 4)
Desired output
output <- data.frame(id = c("chocolate","banana", "pear"),
v2 = c("a", "a", NA),
v3 = c("b", NA, NA),
v4 = c("c", "c", NA),
v5 = c(NA, "d", "d"),
v6 = c(NA, NA, "e"))
output
We can try
library(data.table)
# assumes the three data frames are named df1, df2 and df3
lst <- mget(paste0("df", 1:3))
setnames(rbindlist(lapply(setNames(lst, seq_along(lst)), function(x) {
x[] <- names(x)
x}), fill = TRUE, idcol = 'id'), 2:6, paste0("V", 1:5))[]
# id V1 V2 V3 V4 V5
#1: 1 a b c NA NA
#2: 2 a NA c d NA
#3: 3 NA NA NA d e
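For comparison, here is a base R sketch that uses the object names from the question and produces the v2 to v6 layout of the desired output. The data frames are collected in an explicit named list here; gathering them from the global environment (for example with Filter(is.data.frame, mget(ls(globalenv()), envir = globalenv()))) is an assumption about how one might do it in practice.

```r
chocolate <- data.frame(a = 1, b = 2, c = 3)
banana    <- data.frame(a = 2, d = 4, c = 3)
pear      <- data.frame(d = 1, e = 4)

dfs <- list(chocolate = chocolate, banana = banana, pear = pear)

# all distinct column names, in order of first appearance
all_cols <- unique(unlist(lapply(dfs, names)))

# one row per data frame: the column name where present, NA otherwise
rows <- lapply(dfs, function(x) ifelse(all_cols %in% names(x), all_cols, NA_character_))
mat  <- unname(do.call(rbind, rows))

output <- data.frame(id = names(dfs), mat, stringsAsFactors = FALSE)
names(output)[-1] <- paste0("v", seq_along(all_cols) + 1)
output
```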
I have a data set with millions of rows and I need to apply a 'group by' operation to it using R.
The data is of the form
V1 V2 V3
a u 1
a v 2
b w 3
b x 4
c y 5
c z 6
When performing the 'group by' in R, I want to add up the values in column 3 and concatenate the values in column 2, like this:
V1 V2 V3
a uv 3
b wx 7
c yz 11
I have tried doing the operation in Excel, but because there are so many rows I can't use Excel. I am new to R, so any help would be appreciated.
Many possible ways to solve, here are two
library(data.table)
setDT(df)[, .(V2 = paste(V2, collapse = ""), V3 = sum(V3)), by = V1]
# V1 V2 V3
# 1: a uv 3
# 2: b wx 7
# 3: c yz 11
Or
library(dplyr)
df %>%
group_by(V1) %>%
summarise(V2 = paste(V2, collapse = ""), V3 = sum(V3))
# Source: local data table [3 x 3]
#
# V1 V2 V3
# 1 a uv 3
# 2 b wx 7
# 3 c yz 11
Data
df <- structure(list(V1 = structure(c(1L, 1L, 2L, 2L, 3L, 3L), .Label = c("a",
"b", "c"), class = "factor"), V2 = structure(1:6, .Label = c("u",
"v", "w", "x", "y", "z"), class = "factor"), V3 = 1:6), .Names = c("V1",
"V2", "V3"), class = "data.frame", row.names = c(NA, -6L))
Another option, using aggregate
# Group column 2
ag.2 <- aggregate(df$V2, by=list(df$V1), FUN = paste0, collapse = "")
# Group column 3
ag.3 <- aggregate(df$V3, by=list(df$V1), FUN = sum)
# Merge the two
res <- cbind(ag.2, ag.3[,-1])
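As a hedged variant of the aggregate approach, the formula interface keeps the original column names, and merge() joins the two results on the key instead of relying on both having the same row order. The small df below is rebuilt with plain character columns purely for illustration.

```r
df <- data.frame(V1 = rep(c("a", "b", "c"), each = 2),
                 V2 = c("u", "v", "w", "x", "y", "z"),
                 V3 = 1:6,
                 stringsAsFactors = FALSE)

# concatenate V2 within each group; collapse is passed on to paste()
ag2 <- aggregate(V2 ~ V1, data = df, FUN = paste, collapse = "")
# sum V3 within each group
ag3 <- aggregate(V3 ~ V1, data = df, FUN = sum)

# join the two summaries on the grouping column
res <- merge(ag2, ag3, by = "V1")
res
```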
Another option with sqldf
library(sqldf)
sqldf('select V1,
group_concat(V2,"") as V2,
sum(V3) as V3
from df
group by V1')
# V1 V2 V3
#1 a uv 3
#2 b wx 7
#3 c yz 11
Or using base R
do.call(rbind,lapply(split(df, df$V1), function(x)
with(x, data.frame(V1=V1[1L], V2= paste(V2, collapse=''), V3= sum(V3)))))
using ddply
library(plyr)
ddply(df, .(V1), summarize, V2 = paste(V2, collapse=''), V3 = sum(V3))
# V1 V2 V3
#1 a uv 3
#2 b wx 7
#3 c yz 11
You could also just use the groupBy function in the 'caroline' package:
x <- cbind.data.frame(V1=rep(letters[1:3],each=2), V2=letters[21:26], V3=1:6, stringsAsFactors=FALSE)
groupBy(df=x, clmns=c('V2','V3'),by='V1',aggregation=c('paste','sum'),collapse='')
I have the following data.frames:
a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'))
b <- data.frame(id = 1:3, v1 = c(NA, 'B', 'C'), v2 = c("A", NA, NA))
> a
id v1 v2
1 1 a <NA>
2 2 <NA> b
3 3 <NA> c
> b
id v1 v2
1 1 <NA> A
2 2 B <NA>
3 3 C <NA>
note: There are no ids for which v1 or v2 are defined in both tables; there is only a single unique non-NA value in each column for each id value
I would like to merge these data frames on matching values of "id":
ab <- merge(a, b, by = "id")
but I would also like to combine the two columns v1 and v2, so that the data.frame ab will look like this:
ab <- data.frame(id = 1:3, v1 = c("a", "B", "C"), v2 = c("A", "b", "c"))
> ab
id v1 v2
1 1 a A
2 2 B b
3 3 C c
instead, I get this:
> merge(a, b, by = "id")
id v1.x v2.x v1.y v2.y
1 1 a <NA> <NA> A
2 2 <NA> b B <NA>
3 3 <NA> c C <NA>
it would be helpful to have examples using both data.frame and data.table, so here are the data.table versions of above:
A <- data.table(a, key = 'id')
B <- data.table(b, key = 'id')
A[B]
The type of merge you specify probably won't be possible using merge (with data frames), although saying that usually invites being proved wrong.
You also omit some details: will there always be a single unique non-NA value in each column for each id value? If so, this will work:
> library(plyr)
> ab <- rbind(a,b)
> colFun <- function(x){x[which(!is.na(x))]}
> ddply(ab,.(id),function(x){colwise(colFun)(x)})
id v1 v2
1 1 a A
2 2 B b
3 3 C c
A similar strategy should work with data.tables as well:
> abDT <- data.table(ab,key = "id")
> abDT[,list(colFun(v1),colFun(v2)),by = id]
id V1 V2
[1,] 1 a A
[2,] 2 B b
[3,] 3 C c
If your data is as simple as it is above, joran's answer is likely the simplest way. Here's my approach in base:
a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'))
b <- data.frame(id = 1:3, v1 = c(NA, 'B', 'C'), v2 = c("A", NA, NA))
decider <- function(x, y) factor(ifelse(is.na(x), as.character(y), as.character(x)))
data.frame(mapply(a, b, FUN = decider))
If your data has different ids (some overlap and some do not), then here's a different approach:
a <- data.frame(id = c(1,2,4,5), v1 = c('a', NA, "q", NA), v2 = c(NA, 'b', 'c', "e"))
b <- data.frame(id = 1:4, v1 = c(NA, "A", "C", 'B'), v2 = c("A", NA, "D", NA))
decider <- function(x, y) factor(ifelse(is.na(x), as.character(y), as.character(x)))
DF <- data.frame(mapply(a, b, FUN = decider))
DF2 <- rbind(b[!b$id %in% DF$id , ], DF)
DF2 <- DF2[order(DF2$id), ]
rownames(DF2) <- 1:nrow(DF2)
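Note that mapply pairs the two data frames row by row, by position. When the ids differ, an alternative is to do an outer merge on id first and then combine each .x/.y pair. The following is only a sketch under that assumption; as with decider, values from a are preferred when both are present.

```r
a <- data.frame(id = c(1, 2, 4, 5), v1 = c("a", NA, "q", NA),
                v2 = c(NA, "b", "c", "e"), stringsAsFactors = FALSE)
b <- data.frame(id = 1:4, v1 = c(NA, "A", "C", "B"),
                v2 = c("A", NA, "D", NA), stringsAsFactors = FALSE)

# outer join keeps ids that appear in only one of the data frames
m <- merge(a, b, by = "id", all = TRUE)

for (col in c("v1", "v2")) {
  x <- m[[paste0(col, ".x")]]   # value from a
  y <- m[[paste0(col, ".y")]]   # value from b
  m[[col]] <- ifelse(is.na(x), y, x)
}

ab <- m[, c("id", "v1", "v2")]
ab
```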