ifelse rows the same in R [duplicate] - r

This question already has answers here:
ifelse matching vectors in r
(2 answers)
Closed 9 years ago.
I have a dataframe that looks like this:
> df<-data.frame(A=c(NA,1,2,3,4),B=c(NA,5,NA,3,4),C=c(NA,NA,NA,NA,4))
> df
A B C
1 NA NA NA
2 1 5 NA
3 2 NA NA
4 3 3 NA
5 4 4 4
I am trying to create a "D" column based on the row values in df, where D gets an NA if the values in the row are different (i.e. row 2) or all NAs (i.e. row 1), and the value in the row if the values in that row are the same, excluding NAs (i.e. rows 3, 4, 5). This would produce a vector and dataframe that looks like this:
> df$D<-c(NA,NA,2,3,4)
> df
A B C D
1 NA NA NA NA
2 1 5 NA NA
3 2 NA NA 2
4 3 3 NA 3
5 4 4 4 4
Thank you in advance for your suggestions.

You can use apply() to do calculation for each row and then use unique() and !is.na(). With !is.na() you select values that are not NA. With unique() you get unique values and then with length() get number of unique values. If number is 1 then use first non NA value, if not then NA.
df$D<-apply(df,1,function(x)
ifelse(length(unique(x[!is.na(x)]))==1,x[!is.na(x)][1],NA))

Here is one possible approach:
FUN <- function(x) {
no.na <- x[!is.na(x)]
len <- length(no.na)
if (len == 0) return(NA)
if (len == 1) return(no.na)
runs <- rle(no.na)[[2]]
if(length(runs) > 1) return(NA)
runs
}
df$D <- apply(df, 1, FUN)
## > df
## A B C D
## 1 NA NA NA NA
## 2 1 5 NA NA
## 3 2 NA NA 2
## 4 3 3 NA 3
## 5 4 4 4 4

Related

adding two variables which has NA present

lets say data is 'ab':
a <- c(1,2,3,NA,5,NA)
b <- c(5,NA,4,NA,NA,6)
ab <-c(a,b)
I would like to have new variable which is sum of the two but keeping NA's as follows:
desired output:
ab$c <-(6,2,7,NA,5,6)
so addition of number + NA should equal number
I tried following but does not work as desired:
ab$c <- a+b
gives me : 6 NA 7 NA NA NA
Also don't know how to include "na.rm=TRUE", something I was trying.
I would also like to create third variable as categorical based on cutoff <=4 then event 1, otherwise 0:
desired output:
ab$d <-(1,1,1,NA,0,0)
I tried:
ab$d =ifelse(ab$a<=4|ab$b<=4,1,0)
print(ab$d)
gives me logical(0)
Thanks!
a <- c(1,2,3,NA,5,NA)
b <- c(5,NA,4,NA,NA,6)
dfd <- data.frame(a,b)
dfd$c <- rowSums(dfd, na.rm = TRUE)
dfd$c <- ifelse(is.na(dfd$a) & is.na(dfd$b), NA_integer_, dfd$c)
dfd$d <- ifelse(dfd$c >= 4, 1, 0)
dfd
a b c d
1 1 5 6 1
2 2 NA 2 0
3 3 4 7 1
4 NA NA NA NA
5 5 NA 5 1
6 NA 6 6 1

Data.table: rbind a list of data tables with unequal columns [duplicate]

This question already has answers here:
rbindlist data.tables with different number of columns
(1 answer)
Rbind with new columns and data.table
(5 answers)
Closed 4 years ago.
I have a list of data tables that are of unequal lengths. Some of the data tables have 35 columns and others have 36.
I have this line of code, but it generates an error
> lst <- unlist(full_data.lst, recursive = FALSE)
> model_dat <- do.call("rbind", lst)
Error in rbindlist(l, use.names, fill, idcol) :
Item 1362 has 35 columns, inconsistent with item 1 which has 36 columns. If instead you need to fill missing columns, use set argument 'fill' to TRUE.
Any suggestions on how I can modify that so that it works properly.
Here's a minimal example of what you are trying to do.
No need to use any other package to do this. Just set fill=TRUE in rbindlist.
You can do this:
df1 <- data.table(m1 = c(1,2,3))
df2 <- data.table(m1 = c(1,2,3), m2=c(3,4,5))
df3 <- rbindlist(list(df1, df2), fill=T)
print(df3)
m1 m2
1: 1 NA
2: 2 NA
3: 3 NA
4: 1 3
5: 2 4
6: 3 5
If I understood your question correctly, I could possibly see only two options for having your data tables appended.
Option A: Drop the extra variable from one of the datasets
table$column_Name <- NULL
Option B) Create the variable with missing values in the incomplete dataset.
full_data.lst$column_Name <- NA
And then do rbind function.
Try to use rbind.fill from package plyr:
Input data, 3 dataframes with different number of columns
df1<-data.frame(a=c(1,2,3,4,5),b=c(1,2,3,4,5))
df2<-data.frame(a=c(1,2,3,4,5,6),b=c(1,2,3,4,5,6),c=c(1,2,3,4,5,6))
df3<-data.frame(a=c(1,2,3),d=c(1,2,3))
full_data.lst<-list(df1,df2,df3)
The solution
library("plyr")
rbind.fill(full_data.lst)
a b c d
1 1 1 NA NA
2 2 2 NA NA
3 3 3 NA NA
4 4 4 NA NA
5 5 5 NA NA
6 1 1 1 NA
7 2 2 2 NA
8 3 3 3 NA
9 4 4 4 NA
10 5 5 5 NA
11 6 6 6 NA
12 1 NA NA 1
13 2 NA NA 2
14 3 NA NA 3

Replace values within a range in a data frame in R

I have ranked rows in a data frame based on values in each column.Ranking 1-10. not every column in picture
I have code that replaces values to NA or 1. But I can't figure out how to replace range of numbers, e.g. 3-6 with 1 and then replace the rest (1-2 and 7-10) with NA.
lag.rank <- as.matrix(lag.rank)
lag.rank[lag.rank > n] <- NA
lag.rank[lag.rank <= n] <- 1
At the moment it only replaces numbers above or under n. Any suggestions? I figure it should be fairly simple?
Is this what your are trying to accomplish?
> x <- sample(1:10,20, TRUE)
> x
[1] 1 2 8 2 6 4 9 1 4 8 6 1 2 5 8 6 9 4 7 6
> x <- ifelse(x %in% c(3:6), 1, NA)
> x
[1] NA NA NA NA 1 1 NA NA 1 NA 1 NA NA 1 NA 1 NA 1 NA 1
If your data aren't integers but numeric you can use between from the dplyr package:
x <- ifelse(between(x,3,6), 1, NA)

How do I lag a data.frame?

I'd like to lag whole dataframe in R.
In python, it's very easy to do this, using shift() function
(ex: df.shift(1))
However, I could not find any as an easy and simple method as in pandas shift() in R.
How can I do this?
> x = data.frame(a=c(1,2,3),b=c(4,5,6))
> x
a b
1 1 4
2 2 5
3 3 6
What I want is,
> lag(x,1)
>
a b
1 NA NA
2 1 4
3 2 5
Any good idea?
Pretty simple in base R:
rbind(NA, head(x, -1))
a b
1 NA NA
2 1 4
3 2 5
head with -1 drops the final row and rbind with NA as the first argument adds a row of NAs.
You can also use row indexing [, like this
x[c(NA, 1:(nrow(x)-1)),]
a b
NA NA NA
1 1 4
2 2 5
This leaves an NA in the row name of the first variable, to "fix" this, you can strip the data.frame class and then reassign it:
data.frame(unclass(x[c(NA, 1:(nrow(x)-1)),]))
a b
1 NA NA
2 1 4
3 2 5
Here, you can use rep to produce the desired lags
data.frame(unclass(x[c(rep(NA, 2), 1:(nrow(x)-2)),]))
a b
1 NA NA
2 NA NA
3 1 4
and even put this into a function
myLag <- function(dat, lag) data.frame(unclass(dat[c(rep(NA, lag), 1:(nrow(dat)-lag)),]))
Give it a try
myLag(x, 2)
a b
1 NA NA
2 NA NA
3 1 4
library(dplyr)
x %>% mutate_all(lag)
a b
1 NA NA
2 1 4
3 2 5
Just for completeness this would be analogous to how zoo implements it (but for a data.frame since the zoo lag(...) method doesn't work on data.frame objects):
lag.df <- function(x, lag) {
if (lag < 0)
rbind(NA, head(x, lag))
else
rbind(tail(x, -lag), NA)
}
and use like this:
x <- data.frame(dt=c(as.Date('2019-01-01'), as.Date('2019-01-02'), as.Date('2019-01-03')), a=c(1,2,3),b=c(4,5,6))
lag.df(x, -1)
lag.df(x, 1)
or you can just use zoo:
library(zoo)
x <- data.frame(dt=c(as.Date('2019-01-01'), as.Date('2019-01-02'), as.Date('2019-01-03')), a=c(1,2,3),b=c(4,5,6))
x.zoo <- read.zoo(x)
lag(x.zoo, -1)
lag(x.zoo, 1)

Conditionals calculations across rows R

First, I'm brand new to R and am making the switch from SAS. I have a dataset that is 1000 rows by 24 columns, where the columns are different treatments. I want to count the number of times an observation meets a criteria across rows of my dataset listed below.
Gene A B C D
1 AARS_3 NA NA 4.168365 NA
2 AASDHPPT_21936 NA NA NA -3.221287
3 AATF_26432 NA NA NA NA
4 ABCC2_22 4.501518 3.17992 NA NA
5 ABCC2_26620 NA NA NA NA
I was trying to create column vectors that counted
1) Number of NAs
2) Number of columns <0
3) Number of columns >0
I would then use cbind to add these to my large dataset
I solved the first one with :
NA.Count <- (apply(b01,MARGIN=1,FUN=function(x) length(x[is.na(x)])))
I tried to modify this to count evaluate the !is.na and then count the number of times the value was less than zero with this:
lt0 <- (apply(b01,MARGIN=1,FUN=function(x) ifelse(x[!is.na(x)],count(x[x<0]))))
which didn't work at all.
I tried a dozen ways to get dplyr mutate to work with this and did not succeed.
What I want are the last two columns below; and if you had a cleaner version of the NA.Count I did, that would also be greatly appreciated.
Gene A B C D NA.Count lt0 gt0
1 AARS_3 NA NA 4.168365 NA 3 0 1
2 AASDHPPT_21936 NA NA NA -3.221287 3 1 0
3 AATF_26432 NA NA NA NA 4 0 0
4 ABCC2_22 4.501518 3.17992 NA NA 2 0 2
5 ABCC2_26620 NA NA NA NA 4 0 0
Here is one way to do it taking advantage of the fact that TRUE equals 1 in R.
# test data frame
lil_df <- data.frame(Gene = c("AAR3", "ABCDE"),
A = c(NA, 3),
B = c(2, NA),
C = c(-1, -2),
D = c(NA, NA))
# is.na
NA.count <- rowSums(is.na(lil_df[,-1]))
# less than zero
lt0 <- rowSums(lil_df[,-1]<0, na.rm = TRUE)
# more that zero
mt0 <- rowSums(lil_df[,-1]>0, na.rm = TRUE)
# cbind to data frame
larger_df <- cbind(lil_df, NA.count, lt0, mt0 )
larger_df
Gene A B C D NA.count lt0 mt0
1 AAR3 NA 2 -1 NA 2 1 1
2 ABCDE 3 NA -2 NA 2 1 1

Resources