How to apply coalesce to a matrix - r

I know I can coalesce three vectors in this way:
y <- c(1, NA, NA, NA, 5)
z <- c(NA, NA, 3, 4, 6)
k <- c(NA, 1, NA, NA, 8)
coalesce(y,z,k)
However, in real data, I only get the whole dataset like:
d <- rbind(y,z,k)
In practice, d has a different number of rows each time (sometimes 3, sometimes 4), which means we cannot just write:
coalesce(d[1,],d[2,],d[3,])
I have thought about split function:
split(t(d),rep(1:nrow(d),each = ncol(d)))
And got:
$`1`
[1] 1 NA NA NA 5
$`2`
[1] NA NA 3 4 6
$`3`
[1] NA 1 NA NA 8
But the list returned by split cannot be passed directly to the coalesce function.
Is there a way to do this?

You can use it within do.call(), i.e.
do.call(coalesce, as.data.frame(t(d)))
#[1] 1 1 3 4 5
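For readers who want to see what the do.call() line is doing, here is a minimal base R sketch of the same idea. It assumes the matrix d from the question; coalesce2 is a hypothetical helper written for this illustration and is not dplyr's coalesce. Reduce applies it pairwise across however many rows d happens to have.

```r
y <- c(1, NA, NA, NA, 5)
z <- c(NA, NA, 3, 4, 6)
k <- c(NA, 1, NA, NA, 8)
d <- rbind(y, z, k)

# hypothetical two-argument helper: keep a where present, otherwise take b
coalesce2 <- function(a, b) ifelse(is.na(a), b, a)

# turn the matrix into a list of row vectors, whatever nrow(d) is
rows <- lapply(seq_len(nrow(d)), function(i) d[i, ])

# fold the rows together from first to last
result <- Reduce(coalesce2, rows)
result
#[1] 1 1 3 4 5
```

Because the rows are collected with seq_len(nrow(d)), the same code works unchanged when d has 3 rows, 4 rows, or any other number.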


Getting the class of all columns of data.frames in a list [duplicate]

This question already has answers here:
Determine the data types of a data frame's columns
(11 answers)
Closed 2 years ago.
Example Data:
df1 <- as.data.frame(rbind(c(1,2,3), c(1, NA, 4), c(NA, NA, NA), c(4,6,7), c(4, 8, NA)))
df2 <- as.data.frame(rbind(c(1,2,3), c(1, NA, 4), c(4,6,7), c(NA, NA, NA), c(4, 8, NA)))
dfList <- list(df1,df2)
colnames <- c("A","B","C")
dfList[[1]]
V1 V2 V3
1 1 2 3
2 1 NA 4
3 NA NA NA
4 4 6 7
5 4 8 NA
dfList[[2]]
V1 V2 V3
1 1 2 3
2 1 NA 4
3 4 6 7
4 NA NA NA
5 4 8 NA
I would like to do some checks on my list of data.frames. For example, see if all the columns in the first data.frame have the same classes as the columns in the other data.frames. I thought I'd try:
dfList <- lapply(dfList , function(x) lapply(class(x)))
But that does not work. What would be the correct syntax for this?
Credit to @det (see comments).
dfList <- lapply(dfList , function(x) lapply(x, class))
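To go one step further toward the check described in the question (do all data frames share the same column classes?), a sketch along these lines might help. The helper names col_classes and same_classes are hypothetical, and only the first class of each column is compared, which is an assumption that may not suit columns with several classes.

```r
df1 <- data.frame(A = c(1, NA), B = c("x", "y"), stringsAsFactors = FALSE)
df2 <- data.frame(A = c(3, 4), B = c("z", NA), stringsAsFactors = FALSE)
df3 <- data.frame(A = c("a", "b"), B = c("c", "d"), stringsAsFactors = FALSE)

# hypothetical helper: first class of every column, as a named character vector
col_classes <- function(x) vapply(x, function(col) class(col)[1], character(1))

# hypothetical helper: TRUE if every data frame matches the first one
same_classes <- function(dfs) {
  ref <- col_classes(dfs[[1]])
  all(vapply(dfs[-1], function(x) identical(col_classes(x), ref), logical(1)))
}

same_classes(list(df1, df2))  # TRUE: numeric and character in both
same_classes(list(df1, df3))  # FALSE: column A is character in df3
```

Note that identical() also compares the column names, so data frames with the same classes but different names are reported as different.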

R function composition for the substitution of values in dataframe

Given the reproducible example below, my objective is to substitute, row-wise, the original values in adjacent columns of a data frame with NA. I know this problem (in many variants) has already been posted, but I have not yet found a solution using the approach I am after, i.e. applying a function composition.
In the reproducible example, the column driving the substitution with NA of the original values is column a.
This is what I've done so far; the very last code snippet is a failing attempt at what I'm actually looking for...
#-----------------------------------------------------------
# ifelse approach, it works but...
# it's error prone: i.e. copy and paste for all columns can introduce a lot of troubles
df<-data.frame(a=c(1, 2, NA), b=c(3, NA, 4), c=c(NA, 5, 6))
df
df$b<-ifelse(is.na(df$a), NA, df$b)
df$c<-ifelse(is.na(df$a), NA, df$c)
df
#--------------------------------------------------------
# extraction and substitution approach
# same as above
df<-data.frame(a=c(1, 2, NA), b=c(3, NA, 4), c=c(NA, 5, 6))
df
df$b[is.na(df$a)]<-NA
df$c[is.na(df$a)]<-NA
df
#----------------------------------------------------------
# definition of a function
# it's a bit better, but still error prone because of the copy and paste
df<-data.frame(a=c(1, 2, NA), b=c(3, NA, 4), c=c(NA, 5, 6))
df
fix<-function(x,y){
ifelse(is.na(x), NA, y)
}
df$b<-fix(df$a, df$b)
df$c<-fix(df$a, df$c)
df
#------------------------------------------------------------
# this approach is not working as expected!
# the idea behind is of function composition;
# lapply does the fix to some columns of data frame
df<-data.frame(a=c(1, 2, NA), b=c(3, NA, 4), c=c(NA, 5, 6))
df
fix2<-function(x){
x[is.na(x[1])]<-NA
x
}
df[]<-lapply(df, fix2)
df
any help for this particular approach?
I'm stuck on how to properly conceive the substitute function passed to lapply
thanx
Using lexical closure
With a lexical closure, you first define a function that generates the function you need.
You can then use the generated function as you wish.
# given a column all other columns' values at that row should become NA
# if the driver column's value at that row is NA
# using lexical scoping of R function definitions, one can reach that.
df<-data.frame(a=c(1, 2, NA), b=c(3, NA, 4), c=c(NA, 5, 6))
df
# whatever vector given, this vector's value should be changed
# according to first column's value
na_accustomizer <- function(df, driver_col) {
## Returns a function which will accustomize any vector/column
## to driver column's NAs
function(vec) {
vec[is.na(df[, driver_col])] <- NA
vec
}
}
df[] <- lapply(df, na_accustomizer(df, "a"))
df
## a b c
## 1 1 3 NA
## 2 2 NA 5
## 3 NA NA NA
#
# na_accustomizer(df, "a") returns
#
# function(vec) {
# vec[is.na(df[, "a"])] <- NA
# vec
# }
#
# which then can be used like you want:
# df[] <- lapply(df, na_accustomizer(df, "a"))
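It may help to see why fix2 in the question fails: lapply hands each column to the function on its own, so x[1] is the first element of that column, not the driver column. The sketch below is a stripped-down variant of the closure idea, under the assumption that the driver is column a; mask is a name introduced here for illustration.

```r
df <- data.frame(a = c(1, 2, NA), b = c(3, NA, 4), c = c(NA, 5, 6))

# compute the NA positions of the driver column once, outside lapply
mask <- is.na(df$a)

# the anonymous function sees mask through lexical scoping
df[] <- lapply(df, function(col) {
  col[mask] <- NA
  col
})
df
##    a  b  c
## 1  1  3 NA
## 2  2 NA  5
## 3 NA NA NA
```

Computing the mask before the call also means it does not change while the columns are being rewritten.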
Using normal functions
df<-data.frame(a=c(1, 2, NA), b=c(3, NA, 4), c=c(NA, 5, 6))
df
# define it for one column
overtake_NA <- function(df, driver_col, target_col) {
df[, target_col] <- ifelse(is.na(df[, driver_col]), NA, df[, target_col])
df
}
# define it for all columns of df
overtake_driver_col_NAs <- function(df, driver_col) {
for (i in 1:ncol(df)) {
df <- overtake_NA(df, driver_col, i)
}
df
}
overtake_driver_col_NAs(df, "a")
# a b c
# 1 1 3 NA
# 2 2 NA 5
# 3 NA NA NA
Generalize for any predicate function
driver_col_to_other_cols <- function(df, driver_col, pred) {
## overtake any value of the driver column to the other columns of df,
## whenever predicate function (pred) is fulfilled.
# define it for one column
overtake_ <- function(df, driver_col, target_col, pred) {
selectors <- do.call(pred, list(df[, driver_col]))
if (deparse(substitute(pred)) != "is.na") {
# this is to 'recorrect' NAs which intrude into the selector vector
# when driver_col has NAs. For sure "is.na" is not the only possible
# way to check for NA - so this edge case is not covered fully
selectors[is.na(selectors)] <- FALSE
}
df[, target_col] <- ifelse(selectors, df[, driver_col], df[, target_col])
df
}
for (i in 1:ncol(df)) {
df <- overtake_(df, driver_col, i, pred)
}
df
}
driver_col_to_other_cols(df, "a", function(x) x == 1)
# a b c
# 1 1 1 1
# 2 2 NA 5
# 3 NA 4 6
## if the "is.na" check is not done, then this would give
## (because of NA in selectorvector):
# a b c
# 1 1 1 1
# 2 2 NA 5
# 3 NA NA NA
## hence in the case that pred doesn't check for NA in 'a',
## these NA values have to be reverted to the original columns' value.
driver_col_to_other_cols(df, "a", is.na)
# a b c
# 1 1 3 NA
# 2 2 NA 5
# 3 NA NA NA
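One detail of the generalized version is worth checking: because pred is passed through an inner function, deparse(substitute(pred)) sees the argument name rather than the name of the function the caller supplied. The sketch below illustrates this with hypothetical functions outer_fn and inner_fn; it is an observation about R's substitute, not a change to the answer's results.

```r
# outer_fn and inner_fn are hypothetical names used only for this illustration
outer_fn <- function(pred) {
  inner_fn <- function(pred) deparse(substitute(pred))
  inner_fn(pred)
}
label <- outer_fn(is.na)
label
# [1] "pred"   (not "is.na")

# is.na() never returns NA, so resetting NA selectors to FALSE is a no-op
# for it; the name check could therefore be dropped
selectors <- is.na(c(1, 2, NA))
selectors[is.na(selectors)] <- FALSE
selectors
# [1] FALSE FALSE  TRUE
```

The outputs shown in the answer are unaffected, since running the NA reset unconditionally gives the same selectors for is.na.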
Try this function: it takes your original dataset as input and returns the cleaned one:
Input
df<-data.frame(a=c(1, 2, NA), b=c(3, NA, 4), c=c(NA, 5, 6))
> df
a b c
1 1 3 NA
2 2 NA 5
3 NA 4 6
Function
fix<-function(df,var_x,list_y)
{
df[is.na(df[,var_x]),list_y]<-NA
return(df)
}
Output
fix(df,"a",c("b","c"))
a b c
1 1 3 NA
2 2 NA 5
3 NA NA NA

Subtract columns in R data frame but keep values of var1 or var2 when the other is NA

I wanted to subtract one column from the other in R and this turned out more complicated than I thought.
Suppose this is my data (columns a and b) and column c is what I want, namely a - b but keeping a when b==NA and vice versa:
a b c
1 2 1 1
2 2 NA 2
3 NA 3 3
4 NA NA NA
Now I tried different things but most of the time it returned NA when at least one column was NA. For example:
matrixStats::rowDiffs(data, na.rm=T) # only works for matrix-format, and returns NA's
dat$c <- dat$a - dat$b + ifelse(is.na(dat$b),dat$a,0) + ifelse(is.na(dat$a),dat$b,0) # seems like a desperately basic solution, but not even this does the trick as it also returns NA's
apply(dat[,(1:2)], MARGIN = 1,FUN = diff, na.rm=T) # returns NA's
dat$b<-dat$b*(-1)
dat$c<-rowSums(dat,na.rm=T) # this kind of works but it's a really ugly workaround
Also, if you can think of a dplyr solution, please share your knowledge. I didn't even know what to try.
Will delete this question if you think it's a duplicate of an existing one, though none of the existing threads were particularly helpful.
Try this (Base R Solution):
If df$b is NA then simply take the value of df$a else if df$a is NA then simply take the value of df$b else do df$a-df$b
df$c=ifelse(is.na(df$b),df$a,ifelse(is.na(df$a),df$b,df$a-df$b))
Output:
df
a b c
1 2 1 1
2 2 NA 2
3 NA 3 3
4 NA NA NA
You may try using the coalesce function from the dplyr package:
library(dplyr)  # provides coalesce()
dat <- data.frame(a=c(2, 2, NA, NA), b=c(1, NA, 3, NA))
dat$c <- coalesce(dat$a - coalesce(dat$b, 0), dat$b)
dat
a b c
1 2 1 1
2 2 NA 2
3 NA 3 3
4 NA NA NA
The idea here is to take a minus b, or a alone if b is NA. If that entire expression is still NA, then it implies that a is also NA, in which case we take b.
Here is one option with base R where we replace the NA elements with 0, Reduce it to a single vector by taking the rowwise difference, and change the rows that have all NA elements to NA:
df1$c <- abs(Reduce(`-`, replace(df1, is.na(df1), 0))) *
NA^ (!rowSums(!is.na(df1)) )
df1$c
#[1] 1 2 3 NA
Or using similar method with data.table
library(data.table)
setDT(df1)[!is.na(a) | !is.na(b), c := abs(Reduce(`-`,
replace(.SD, is.na(.SD), 0)))]
data
df1 <- structure(list(a = c(2L, 2L, NA, NA), b = c(1L, NA, 3L, NA)),
row.names = c("1", "2", "3", "4"), class = "data.frame")
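One caveat about the Reduce-based answers above: abs() returns the absolute difference, so a row where both values are present and a is smaller than b loses its sign. The sketch below adds such a row (an assumption beyond the original data) and compares the two behaviours; signed and abs_version are names introduced here.

```r
# same data as above, plus an extra fifth row where a < b (added for illustration)
df1 <- data.frame(a = c(2, 2, NA, NA, 1), b = c(1, NA, 3, NA, 2))

# sign-preserving variant: subtract only where both values are present
both <- !is.na(df1$a) & !is.na(df1$b)
signed <- ifelse(both, df1$a - df1$b,
                 ifelse(is.na(df1$a), df1$b, df1$a))
signed
# [1]  1  2  3 NA -1

# the abs()-based expression from the answer, for comparison
abs_version <- abs(Reduce(`-`, replace(df1, is.na(df1), 0))) *
  NA^(!rowSums(!is.na(df1)))
abs_version
# [1]  1  2  3 NA  1
```

Which of the two is right depends on whether a negative difference is meaningful in the data; the question's example does not contain such a row.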

How to get two likert variables properly into one (ggplot2) sjp.likert plot of the sjPlot package?

I have a dataframe with two likert variables. I want to plot these two variables by using the sjp.likert function of the sjPlot package. The plot doesn't make sense.
My data (mydf) looks like this:
structure(list(var1 = c(1, 1, 5, NA, 3, NA, 1, NA, 4, 3, 5, 5,
4, 2, 2, NA, NA, 5, NA, NA), var2 = c(NA, NA, NA, 3, NA, 3, NA,
5, NA, NA, NA, 2, NA, NA, NA, 4, 4, NA, 1, 1)), .Names = c("var1",
"var2"), row.names = c(NA, 20L), class = "data.frame")
var1 var2
1 1 NA
2 1 NA
3 5 NA
4 NA 3
5 3 NA
6 NA 3
7 1 NA
8 NA 5
9 4 NA
10 3 NA
11 5 NA
12 5 2
13 4 NA
14 2 NA
15 2 NA
16 NA 4
17 NA 4
18 5 NA
19 NA 1
20 NA 1
This is the code I use:
library(sjPlot)
library(RColorBrewer)
likert_5 <- mydf
levels_5 <- list(c(1,2,3,4,5))
varnames <- names(likert_5)
sjp.likert(likert_5, legendLabels=levels_5, barColor="brewer",legendSize=0.5,axisLabelSize=0.5,valueLabelSize=2,colorPalette="BrBG", orderBy="pos",legendPos="bottom",axisLabels.y=varnames)
This is the result:
I think you agree that this doesn't make sense at all. The two variable names are the same and there are four levels instead of five. Does anyone know what's going wrong here?
Many thanks in advance!
I believe this is a bug in the sjp.likert function. Adding arguments one by one, I found that the plot works fine until the argument orderBy = "pos" is included. Examining the source code of the function shows:
sjp.likert
# ...
# questionCount <- nrow(pos)/(length(legendLabels)/2)
# if (!is.null(orderBy)) {
# ...
# orderUniqueItems <- rev(1 + questionCount - orderUniqueItems)
# axisLabels.y <- axisLabels.y[orderUniqueItems]
# }
# ...
Using your data, I end up with the following:
questionCount
# [1] 1.6
orderUniqueItems
# [1] 1.6 0.6
varnames[c(1.6, 0.6)]
# [1] "var1"
I think the author actually wanted questionCount <-ceiling(orderUniqueItems <- c(unique(orderRelatedItems))), which with your data would produce:
questionCount
# [1] 2
orderUniqueItems
# [1] 2 1
varnames[c(2, 1)]
# [1] "var2" "var1"
A quick fix would be to save the returned plot and modify the labels manually (using the author's code to create the labels with 'n=' pasted on).
for (i in 1:length(varnames)) {
varnames[i] <- paste(varnames[i], sprintf(" (n=%i)", length(na.omit(likert_5[,i]))), sep = "")
}
myplot <- sjp.likert(likert_5, legendLabels=levels_5, barColor="brewer", legendSize=0.5, axisLabelSize=0.5, valueLabelSize=2, colorPalette="BrBG", orderBy="pos", legendPos="bottom")
myplot$plot + scale_x_discrete(labels=varnames[c(2,1)])
Edit:
Regarding the missing middle level, I also found this in the code:
if (!is.null(neutral)) {
out <- out[out$Response != neutral, ]
}
Which deletes the middle 'neutral' category from the output. There doesn't seem to be an option to change this, and none of the author's examples use an odd number of categories. So it seems to be a feature, rather than a bug.
You might consider the likert package, specifically the function likert.bar.plot with the argument include.center = TRUE.
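The duplicated label described in the answer follows from how R handles non-integer subscripts: they are truncated toward zero, and an index of 0 selects nothing. A small sketch with the two variable names from the question:

```r
varnames <- c("var1", "var2")

# 1.6 is truncated to 1 and 0.6 to 0; index 0 is silently dropped
frac_idx <- varnames[c(1.6, 0.6)]
frac_idx
# [1] "var1"

# rounding up first gives the intended order
ceil_idx <- varnames[ceiling(c(1.6, 0.6))]
ceil_idx
# [1] "var2" "var1"
```

This is why a fractional questionCount ends up producing a single label instead of two.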

propagating data within a vector

I'm learning R and I'm curious... I need a function that does this:
> fillInTheBlanks(c(1, NA, NA, 2, 3, NA, 4))
[1] 1 1 1 2 3 3 4
> fillInTheBlanks(c(1, 2, 3, 4))
[1] 1 2 3 4
and I produced this one... but I suspect there's a more R way to do this.
fillInTheBlanks <- function(v) {
## replace each NA with the latest preceding available value
orig <- v
result <- v
for(i in 1:length(v)) {
value <- v[i]
if (!is.na(value))
result[i:length(v)] <- value
}
return(result)
}
Package zoo has a function na.locf():
R> library("zoo")
R> na.locf(c(1, 2, 3, 4))
[1] 1 2 3 4
R> na.locf(c(1, NA, NA, 2, 3, NA, 4))
[1] 1 1 1 2 3 3 4
na.locf: Last Observation Carried Forward;
Generic function for replacing each ‘NA’ with the most recent non-‘NA’ prior to it.
See the source code of the function na.locf.default: it doesn't need a for-loop.
I'm doing some minimal copy&paste from the zoo library (thanks again rcs for pointing me at it) and this is what I really needed:
fillInTheBlanks <- function(S) {
## NA in S are replaced with observed values
## accepts a vector possibly holding NA values and returns a vector
## where all observed values are carried forward and the first is
## also carried backward. cfr na.locf from zoo library.
L <- !is.na(S)
c(S[L][1], S[L])[cumsum(L)+1]
}
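The one-liner above is compact, so here is a step-by-step trace of the same indexing trick on a vector that starts with NA. The intermediate names observed and idx are introduced only for this walkthrough.

```r
S <- c(NA, 1, NA, NA, 2, 3, NA, 4)

L <- !is.na(S)        # TRUE where a value was observed
observed <- S[L]      # 1 2 3 4
idx <- cumsum(L)      # 0 1 1 1 2 3 3 4: how many values seen so far

# prepend the first observed value so that idx == 0 (leading NAs)
# maps to it; this is the "carried backward" part
filled <- c(observed[1], observed)[idx + 1]
filled
# [1] 1 1 1 1 2 3 3 4
```

Each position simply looks up the most recent observed value by its running count, which is why no loop is needed.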
Just for fun (since it's slower than fillInTheBlanks), here's a version of na.locf relying on rle function:
my.na.locf <- function(v,fromLast=F){
if(fromLast){
return(rev(my.na.locf(rev(v))))
}
nas <- is.na(v)
e <- rle(nas)
v[nas] <- rep.int(c(NA,v[head(cumsum(e$lengths),-1)]),e$lengths)[nas]
return(v)
}
e.g.
v1 <- c(3,NA,NA,NA,1,2,NA,NA,5)
v2 <- c(NA,NA,NA,1,7,NA,NA,5,NA)
my.na.locf(v1)
#[1] 3 3 3 3 1 2 2 2 5
my.na.locf(v2)
#[1] NA NA NA 1 7 7 7 5 5
my.na.locf(v1,fromLast=T)
#[1] 3 1 1 1 1 2 5 5 5
my.na.locf(v2,fromLast=T)
#[1] 1 1 1 1 7 5 5 5 NA
Another simple answer. This one takes care of the first value being NA: that's a dead end, so my loop starts from index 2.
my_vec <- c(1, NA, NA, 2, 3, NA, 4)
fill.it <- function(vector){
new_vec <- vector
# seq_along(...)[-1] starts at index 2 and is empty for vectors shorter than 2
for (i in seq_along(new_vec)[-1]){
if(is.na(new_vec[i])) {
new_vec[i] <- new_vec[i-1]
} else {
next
}
}
return(new_vec)
}
Multiple R packages include a na.locf function that does exactly this (imputeTS, zoo, spacetime, ...).
Here is an example with imputeTS:
library("imputeTS")
x <- c(1, NA, NA, 2, 3, NA, 4)
na.locf(x)
There are also more advanced methods for replacing missing values provided by the imputeTS package. (and by zoo also)
