I have two factor columns that I want to merge. Since I have a lot of observations, I wonder whether there is a quick option with dplyr or tidyr.
Col1 Col2
A NA
B NA
NA C
A A
NA B
A NA
B B
I know this shouldn't be difficult, but I'm clearly missing something here. I've tried several options, but since I want to keep the factors, none of the ones I know worked.
Note that when both columns have a value, the two values are always the same. This is a property of my data.
I expect to get something like:
Col1 Col2 Col3
A NA A
B NA B
NA C C
A A A
NA B B
A NA A
B B B
I think this should do it using dplyr:
library(dplyr)
dat %>%
  mutate(Col3 = if_else(is.na(Col1), Col2, Col1))
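Depending on the dplyr version, if_else can be strict when the two factors have different level sets (dplyr::coalesce(Col1, Col2) is another option worth trying). As a fallback, here is a minimal base R sketch that goes through character values and rebuilds the factor afterwards. The example data frame dat is reconstructed from the question, and the choice to use the union of both level sets is my assumption.

```r
# Example data mirroring the question (values typed in by hand).
dat <- data.frame(
  Col1 = factor(c("A", "B", NA, "A", NA, "A", "B")),
  Col2 = factor(c(NA, NA, "C", "A", "B", NA, "B"))
)

# Compare and combine as character so differing factor levels cannot clash.
merged <- ifelse(is.na(dat$Col1), as.character(dat$Col2), as.character(dat$Col1))

# Rebuild a factor over the union of both level sets (an assumption about
# which levels the result should carry).
dat$Col3 <- factor(merged, levels = union(levels(dat$Col1), levels(dat$Col2)))
dat
```

Col3 stays a factor, and rows where both columns are NA would simply remain NA.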
My data are comma-separated strings with different numbers of elements, like this.
variable <- c("A,B,C","A,B","A,C","B,C")
I have tried strsplit and other similar functions, but I can't solve my problem.
I need to get a data.frame like this:
A B C
1 A B C
2 A B NA
3 A NA C
4 NA B C
Thanks
We could split the data on commas, turn each piece into a one-row data frame whose column names are its own values, and then bind the rows by column name using bind_rows from dplyr.
dplyr::bind_rows(lapply(strsplit(variable, ","), function(x)
  setNames(as.data.frame(t(x)), x)))
# A B C
#1 A B C
#2 A B <NA>
#3 A <NA> C
#4 <NA> B C
We can use rbindlist from data.table:
library(data.table)
rbindlist(lapply(strsplit(variable, ","),
                 function(x) setNames(as.list(x), x)), fill = TRUE)
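If you would rather avoid extra packages, the same idea can be sketched in base R: collect all distinct values, then mark for each row which of them occur. This is only a sketch; the helper names parts, lev and res are mine, and it assumes each value appears at most once per string.

```r
variable <- c("A,B,C", "A,B", "A,C", "B,C")

parts <- strsplit(variable, ",")
lev <- sort(unique(unlist(parts)))  # all distinct values across the strings

# For each string, keep a value where it occurs and NA where it does not.
res <- t(vapply(parts, function(x) ifelse(lev %in% x, lev, NA_character_),
                character(length(lev))))
res <- as.data.frame(res, stringsAsFactors = FALSE)
names(res) <- lev
res
```

The columns come out as character. Wrap them in factor() if you need factors.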
I have a data frame with three variables. Their levels never overlap; in fact, they are all levels of ONE variable. I need to recreate this variable. How do I do it in R?
The data frame looks like this:
Var1: NA NA NA A NA
Var2: NA NA B NA NA
Var3: C C NA NA C
I want to have a single variable out of these three so it would look like this:
Final Var: C C B A C
Suppose your data frame is called dat and contains only the variables listed (no other variables that you don't want included), all of the same type, viz. character. Then you can coerce the data frame to a matrix and then to a vector, and drop the NAs. Note that the values come out in column order, not row order:
> dat
V1 V2 V3
1 <NA> <NA> C
2 <NA> <NA> C
3 <NA> B <NA>
4 A <NA> <NA>
> final <- as.vector(as.matrix(dat))
> final <- final[!is.na(final)]
> final
[1] "A" "B" "C" "C"
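Since the question asks for one value per row, in row order, a small variation is to transpose the matrix before dropping the NAs. This is a sketch that assumes exactly one non-NA value per row. The data below is typed in from the question, with the column names V1 to V3 borrowed from this answer.

```r
dat <- data.frame(
  V1 = c(NA, NA, NA, "A", NA),
  V2 = c(NA, NA, "B", NA, NA),
  V3 = c("C", "C", NA, NA, "C"),
  stringsAsFactors = FALSE
)

# t() puts each original row into a column, so the column-major
# read-out of the matrix now walks the data row by row.
m <- t(as.matrix(dat))
final <- m[!is.na(m)]
final
```

If a row had no value or more than one, the result would no longer line up with the rows, so it is worth checking length(final) == nrow(dat).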
You would first need to replace the NAs with empty strings (this assumes character columns, not factors), like so
df[is.na(df)] <- ""
And then use the paste function with sep = "" to concatenate the variables.
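A minimal sketch of that idea, assuming character columns and replacing the NAs with empty strings (not spaces) so that nothing but the real value is left after pasting. The column names Var1 to Var3 are taken from the question.

```r
df <- data.frame(
  Var1 = c(NA, NA, NA, "A", NA),
  Var2 = c(NA, NA, "B", NA, NA),
  Var3 = c("C", "C", NA, NA, "C"),
  stringsAsFactors = FALSE
)

# Blank out the missing values, then glue the columns together row by row.
df[is.na(df)] <- ""
final <- paste0(df$Var1, df$Var2, df$Var3)
final
```

Rows with no value at all end up as "" rather than NA, so you may want to turn those back into NA afterwards.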
Cheers
I have a df with 15,105 rows and 127 columns. I'd like to exclude the rows that have NA in some specific columns. I'm using the following command:
wave1b <- na.omit(wave1, cols=c("Bx", "Deq", "Gef", "Has", "Pla", "Ty"))
However, when I run it, it returns only 19 rows, when I expected 14,561 rows (it should have excluded only the rows with NA in the requested columns). I can affirm this because I took a subset of the df to check the missing-value deletion.
Could anyone help me solve this issue? Thank you!
I think this code is not efficient but it could work:
df <- data.frame(A = rep(NA,3), B = c(NA,2,3),C=c(1,NA,2))
df
A B C
1 NA NA 1
2 NA 2 NA
3 NA 3 2
It removes only the rows that have a missing value in column B or C:
df[!(is.na(df$B) | is.na(df$C)), ]
A B C
3 NA 3 2
You can use complete.cases
> df[complete.cases(df[, -1]), ]
A B C
3 NA 3 2
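A possible reason for the 19 rows: as far as I know, the cols argument belongs to data.table's na.omit method, and on a plain data.frame it is silently ignored, so every row with an NA in any column gets dropped. Here is a sketch of complete.cases restricted to named columns. The tiny wave1 below is made up for illustration, and Other is a hypothetical column standing in for the remaining ones.

```r
# Made-up stand-in for the real data; only the column names Bx and Deq
# come from the question.
wave1 <- data.frame(
  Bx    = c(1, NA, 3, 4),
  Deq   = c(1, 2, NA, 4),
  Other = c(NA, NA, 1, 5)
)

cols <- c("Bx", "Deq")

# Keep the rows that are complete in the selected columns only.
wave1b <- wave1[complete.cases(wave1[, cols]), ]

# On a plain data.frame the cols argument has no effect, so this drops
# every row with an NA anywhere.
all_omitted <- na.omit(wave1, cols = cols)

nrow(wave1b)
nrow(all_omitted)
```

With your data you would put all six column names in cols. Converting to a data.table first should also make the original na.omit call behave as intended.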
If I have a dataframe like so
a <- c(NA,1,2,NA,4)
b <- c(6,7,8,9,10)
c <- c(NA,12,13,14,15)
d <- c(16,NA,18,NA,20)
df <- data.frame(a,b,c,d)
How can I delete columns "a" and "c" by asking R to delete those columns that contain an NA in the first row?
My actual dataset is much bigger, and this is only by way of a reproducible example.
Please note that this isn't the same as asking to delete columns with any NAs in it. My columns may have other NA values in it. I'm looking to delete just the ones with an NA in the first row.
You can use a logical vector indicating whether the first row is missing in each column.
res <- df[,!is.na(df[1,])]
> res
b d
1 6 16
2 7 NA
3 8 18
4 9 NA
5 10 20
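A slightly more defensive variant of the same idea, as a sketch: unlist the first row into a plain vector and add drop = FALSE, so the result stays a data frame even if only one column survives. The name keep is mine.

```r
a <- c(NA, 1, 2, NA, 4)
b <- c(6, 7, 8, 9, 10)
c <- c(NA, 12, 13, 14, 15)
d <- c(16, NA, 18, NA, 20)
df <- data.frame(a, b, c, d)

# TRUE for the columns whose first-row value is present.
keep <- !is.na(unlist(df[1, ]))

# drop = FALSE keeps a data frame even when a single column is left.
res <- df[, keep, drop = FALSE]
res
```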
I have many data.frames, for example:
df1 = data.frame(names=c('a','b','c','c','d'),data1=c(1,2,3,4,5))
df2 = data.frame(names=c('a','e','e','c','c','d'),data2=c(1,2,3,4,5,6))
df3 = data.frame(names=c('c','e'),data3=c(1,2))
and I need to merge these data.frames without deleting the duplicated names:
> result
names data1 data2 data3
1 'a' 1 1 NA
2 'b' 2 NA NA
3 'c' 3 4 1
4 'c' 4 5 NA
5 'd' 5 6 NA
6 'e' NA 2 2
7 'e' NA 3 NA
I can't find a function like merge with an option to handle duplicated names. Thank you for your help.
To define my problem: the data come from a biological experiment in which each sample has a different number of replicates. I need to merge all the experiments and produce this table. I can't generate a unique identifier for the replicates.
First define a function, run.seq, which provides sequence numbers for duplicates since it appears from the output that what is desired is that the ith duplicate of each name in each component of the merge be associated. Then create a list of the data frames and add a run.seq column to each component. Finally use Reduce to merge them all.
run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))
L <- list(df1, df2, df3)
L2 <- lapply(L, function(x) cbind(x, run.seq = run.seq(x$names)))
out <- Reduce(function(...) merge(..., all = TRUE), L2)[-2]
The last line gives:
> out
names data1 data2 data3
1 a 1 1 NA
2 b 2 NA NA
3 c 3 4 1
4 c 4 5 NA
5 d 5 6 NA
6 e NA 2 2
7 e NA 3 NA
EDIT: Revised run.seq so that input need not be sorted.
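To see what the helper contributes, here is a small sketch that applies run.seq to the names column of df2 and to an unsorted input. Each value gets the running count of its own occurrences, and that count is what lets merge pair the ith duplicate in one data frame with the ith duplicate in another. The variable names seq_df2 and seq_unsorted are mine.

```r
run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))

# names column of df2: the second 'e' and the second 'c' get sequence number 2
seq_df2 <- run.seq(c("a", "e", "e", "c", "c", "d"))

# the input does not have to be sorted
seq_unsorted <- run.seq(c("c", "a", "c"))

seq_df2
seq_unsorted
```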
See other questions:
How to join data frames in R (inner, outer, left, right)
recombining-a-list-of-data-frames-into-a-single-data-frame
...
Examples:
library(reshape)
out <- merge_recurse(L)
or
library(plyr)
out <- join(df1, df2, type = "full")
out <- join(out, df3, type = "full")
*can be looped
or
library(plyr)
out <- ldply(L)
I think there is just not enough information in your example data frames to do this. Which 'c' in dataframe 1 should be paired with which 'c' in data frame 2? We cannot tell, so R can't either. I suspect you will have to add another variable to each of your dataframes that uniquely identifies these duplicate cases.