Adding a column name to a table column without a name - r

I have data as follows:
dat <- structure(c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE), dim = c(3L, 3L), dimnames = list(c("A", "B",
"C"), c("[0,25) D", "[0,25) E", NA)))
name_vec <- "[0,25) F"
What I want to do is the following:
colnames(dat )[length(dat )] <- name_vec[i]
But this gives the error:
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
I am failing to understand why this does not work or what the error means.
Any help would be appreciated.

I guess you want to do the following:
colnames(dat)[ncol(dat)] <- name_vec
[0,25) D [0,25) E [0,25) F
A TRUE TRUE TRUE
B FALSE TRUE TRUE
C TRUE TRUE TRUE
Since your dat is a matrix, length(dat) returns the number of elements in the matrix, which is 9 in your case. Indexing the column names at position 9 stretches the names vector to length 9, which no longer matches the 3 columns, and that is why you get the error.
class(dat)
[1] "matrix" "array"
length(dat)
[1] 9
ncol(dat)
[1] 3
So the correct function to use should be ncol.
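As a quick check of the difference, here is a minimal sketch on a small made-up matrix `m` (hypothetical data, not the `dat` from the question):

```r
# Hypothetical 2 x 3 matrix whose last column has no name
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("x", "y", NA)))

length(m)  # total number of cells
ncol(m)    # number of columns

# Index the column names by ncol(m), not by length(m)
colnames(m)[ncol(m)] <- "z"
colnames(m)
```

The same idea should carry over to `dat`: the index into `colnames()` has to stay within the number of columns.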


discard elements from list recursively r

I have a nested list with some NAs, and I want to discard the NAs from the list.
purrr::discard does not work recursively:
l <- list(a = NA, b = T, c = c(F, F))
purrr::discard(l, is.na)
Throws this error:
Error: Predicate functions must return a single TRUE or FALSE, not a logical vector of length 2
I would like to end up with the following list in this case:
l2 <- list(b = T, c = c(F, F))
(purrr version: 0.3.2)
is.na(c(T,T,T)) returns c(F,F,F). To use discard, the function needs to return a single value for each list element as the error suggests.
This should work.
purrr::discard(l,function(x) all(is.na(x)))
This discards a list element only if all of its values are NA.
To remove every NA, including those inside a longer vector, this should work:
library(tidyverse)
l <- list(a = NA, b = c(T,NA), c = c(F, F)) # Define a list
lapply(l,function(x) x[!is.na(x)])%>% # Remove all nested NA's
purrr::discard(.,function(x) length(x) == 0) # Remove all empty elements
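EDIT (another option; it drops any element that contains at least one NA, shown here on the original l from the question)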
purrr::discard(l,function(x) isTRUE(anyNA(x)))
$b
[1] TRUE
$c
[1] FALSE FALSE
You can identify all NA elements and zap them:
purrr::list_modify(l,a=purrr::zap())
$b
[1] TRUE
$c
[1] FALSE FALSE
EDIT 2
If you want to remove all nested NAs, you can write up a helper zap_if():
zap_if <- function(x){
unlist(lapply(x, function(z) z[!is.na(z)]))
}
purrr::map(l,zap_if)
Result:
$a
[1] 1
$b
[1] TRUE
$c
[1] FALSE FALSE
Data for the zap_if part:
l <- list(a = c(NA,1), b = T, c = c(F, F))
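None of the snippets above actually recurses into sub-lists, which is what the title asks for. A rough base R sketch of that could look like the following. The helper name `drop_na_recursive` and the `nested` example list are made up for illustration and do not come from purrr:

```r
# Hypothetical helper: drop NAs at every level of a nested list
drop_na_recursive <- function(x) {
  if (is.list(x)) {
    # Recurse into each element, then drop elements that ended up empty
    cleaned <- lapply(x, drop_na_recursive)
    cleaned[lengths(cleaned) > 0]
  } else {
    # Atomic vector: keep only the non-NA values
    x[!is.na(x)]
  }
}

# Made-up nested input with NAs at two levels
nested <- list(a = NA, b = c(TRUE, NA), c = list(d = NA, e = c(FALSE, FALSE)))
str(drop_na_recursive(nested))
```

Elements that become empty after cleaning are removed, so `a` and `c$d` disappear while `b` keeps its non-NA value.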

CSV to CSV comparison in R

I need to compare two CSV files in R and write out the records that do not match between the two files. I was able to do this with the code below:
library(dplyr)
a <- c("ads", "ads", "abc")
b <- c(121, 345, 23.300)
c <- c(21,22,23)
srce <- cbind.data.frame(a,b,c)
d <- c("ads", "ds", "abc")
e <- c(121, 345, 23)
f <- c(21,22,23)
trgt <- cbind.data.frame(d, e, f)
colnames(trgt) <- colnames(srce)
#Compare csv files
nn <- anti_join(srce, trgt)
The final output gives me the rows with a mismatch,
but I need to find the cells that mismatch between the two files.
Is there a way to identify the mismatching cells rather than the entire records?
Thanks
Balaji.SJ
If you use the stringsAsFactors = FALSE argument with cbind.data.frame, a simple logical comparison will do the trick:
library(dplyr)
a <- c("ads", "ads", "abc")
b <- c(121, 345, 23.300)
c <- c(21,22,23)
srce <- cbind.data.frame(a,b,c, stringsAsFactors = FALSE)
d <- c("ads", "ds", "abc")
e <- c(121, 345, 23)
f <- c(21,22,23)
trgt <- cbind.data.frame(d, e, f, stringsAsFactors = FALSE)
colnames(trgt) <- colnames(srce)
# logical comparison:
srce == trgt
a b c
[1,] TRUE TRUE TRUE
[2,] FALSE TRUE TRUE
[3,] TRUE FALSE TRUE
To compare two dataframes and extract the column names of columns with nonmatches, calculate the colSums of the matrix srce != trgt and display the names of all columns with colSums > 0:
names(which(colSums(srce != trgt, na.rm = TRUE) > 0))
[1] "a" "b"
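If you need the exact cell positions rather than just the column names, one possible sketch is to pass the same comparison to which() with arr.ind = TRUE. The helper `cell_value` and the `mismatches` data frame are names I made up for this illustration:

```r
srce <- data.frame(a = c("ads", "ads", "abc"),
                   b = c(121, 345, 23.3),
                   c = c(21, 22, 23),
                   stringsAsFactors = FALSE)
trgt <- data.frame(a = c("ads", "ds", "abc"),
                   b = c(121, 345, 23),
                   c = c(21, 22, 23),
                   stringsAsFactors = FALSE)

# Row/column positions of every mismatching cell
idx <- which(srce != trgt, arr.ind = TRUE)

# Hypothetical helper: pull single cells out of a data frame as text
cell_value <- function(df, pos) {
  mapply(function(r, k) as.character(df[r, k]), pos[, "row"], pos[, "col"])
}

# One row per mismatch, with the value from each side
mismatches <- data.frame(
  row    = idx[, "row"],
  column = names(srce)[idx[, "col"]],
  source = cell_value(srce, idx),
  target = cell_value(trgt, idx),
  stringsAsFactors = FALSE
)
mismatches
```

This assumes both data frames have the same dimensions and column order, as in the example above.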

Selecting logical vector elements based on certain conditions

I have a situation where I would like to detect conditions between two logical, named vectors based on the TRUE / FALSE combination at each position in the vector. For example:
x <- c(TRUE, FALSE, FALSE, TRUE)
names(x) <- c("a", "b", "c", "d")
y <- c(TRUE, TRUE, FALSE, FALSE)
names(y) <- names(x)
For each element in these two vectors I want to detect 3 conditions:
x[i] is TRUE and y[i] is TRUE;
x[i] is FALSE and y[i] is TRUE,
x[i] is TRUE and y[i] is FALSE.
The length of x and y are the same but could be longer than this example. I want to retrieve the name of the element for each condition and assign the element name to a new variable. For this example:
v1 <- "a"
v2 <- "b"
v3 <- "d"
In a longer version of these two vectors I might end up with something like:
v1 <- c("a", "e")
v2 <- c("b", "f", "g")
v3 <- c("d", "i", "k", "l")
What is the best vectorized way to do this? I think it is simple but I am unable to come up with the answer. Thanks in advance.
We can efficiently use split, but before that, we need a single grouping index. Here is a possibility:
g <- x + y + x
split(names(x), g)
To understand the above grouping index, consider this:
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)
x + y + x
#[1] 3 2 1 0
So you can see that 4 combinations of TRUE and FALSE are mapped to 4 integer values.
Ah, so "a" get assigned to T-T, "b" to T-F, etc. But, why the x + y + x?? I don't follow adding x twice.
If you only do x + y, the result is only 0, 1 and 2. You won't be able to differentiate T-F and F-T as they are both 1.
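To make the collision concrete, here is a small sketch on the vectors from the question. Writing the index as 2 * x + y is just my spelling of x + y + x; the variable `groups` is a name chosen for this illustration:

```r
x <- c(a = TRUE, b = FALSE, c = FALSE, d = TRUE)
y <- c(a = TRUE, b = TRUE, c = FALSE, d = FALSE)

x + y        # "b" and "d" collide on the same value
2 * x + y    # same as x + y + x: every combination gets its own value

# Group the names by the collision-free index
groups <- split(names(x), 2 * x + y)

v1 <- groups[["3"]]  # x TRUE,  y TRUE
v2 <- groups[["1"]]  # x FALSE, y TRUE
v3 <- groups[["2"]]  # x TRUE,  y FALSE
```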
@thelatemail offers a more readable way:
split(names(x), interaction(x, y, drop=TRUE))
Update
Ah... stupid me... Why did I bother creating g. I suddenly remember that we can pass a list to f argument in split:
split(names(x), list(x, y))
Note, internally in split.default:
if (is.list(f))
f <- interaction(f, drop = drop, sep = sep)
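For completeness, a sketch of how the three requested variables could be pulled out of that result. It relies on interaction() labelling each group as the x value, a dot, then the y value; `res` is just a name I picked:

```r
x <- c(a = TRUE, b = FALSE, c = FALSE, d = TRUE)
y <- c(a = TRUE, b = TRUE, c = FALSE, d = FALSE)

# Groups are labelled "<x value>.<y value>"
res <- split(names(x), list(x, y))

v1 <- res[["TRUE.TRUE"]]    # x TRUE,  y TRUE
v2 <- res[["FALSE.TRUE"]]   # x FALSE, y TRUE
v3 <- res[["TRUE.FALSE"]]   # x TRUE,  y FALSE
```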

Replace elements of vector by vector

I want to replace a few elements of a vector with a whole second vector. The condition is that the replaced elements of the first vector are equal to a third vector. Here is an example:
a <- 1:10
b <- 5:7
v <- rnorm(2, mean = 1, sd = 5)
my output should be
c(a[1:4], v, a[8:10])
I have already tried
replace(a, a == b, v)
a[a == b] <- v
but with little success. Can anyone help?
The == operator is best used to compare vectors of the same length, or when one of the vectors has length 1.
Try this, and notice that in neither case do you get the positional match you want.
> a == b
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Warning message:
In a == b : longer object length is not a multiple of shorter object length
> b == a
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Warning message:
In b == a : longer object length is not a multiple of shorter object length
Instead, use match() - this gives you the index position where there is a match in the values.
> match(b, a)
[1] 5 6 7
Then:
a <- 1:10
b <- 5:7
v <- rnorm(3, mean=1, sd=5)
a[match(b, a)] <- v
The results:
a
[1] 1.0000000 2.0000000 3.0000000 4.0000000 -4.6843669 0.9014578 -0.7601413 8.0000000
[9] 9.0000000 10.0000000
Here's another option:
a[a %in% b] <- v
Since in the example described in the OP there are three common numbers in the vectors a and b while v <- rnorm(2, mean = 1, sd = 5) contains only 2 numbers, the vector v will be recycled and a warning will be issued.
The warning and recycling can be prevented, e.g., by defining v as
v <- rnorm(sum(a %in% b), mean = 1, sd = 5)
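One thing to keep in mind when choosing between the two: match() pairs each replacement with the corresponding element of b, while %in% fills the matching positions in the order they occur in a. A small sketch with made-up fixed values (instead of rnorm) so the difference is visible:

```r
a <- 1:10
b <- c(7, 5, 6)       # hypothetical: same values as 5:7, different order
v <- c(70, 50, 60)    # intended replacements for 7, 5 and 6

by_match <- replace(a, match(b, a), v)  # pairs v[i] with b[i]
by_in    <- replace(a, a %in% b, v)     # fills positions 5, 6, 7 in that order

by_match[5:7]
by_in[5:7]
```

When b is already sorted the same way as a, as in the question, both approaches should give the same result.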

R - Vectorized implementation of ternary operator?

The title says it about as well as I can. What I have:
A B
TRUE FALSE
FALSE TRUE
TRUE TRUE
what I want:
C
if(A[1]&&B[1]){some.value.here}else if(A[1]){other.value}else{another.value}
if(A[2]&&B[2]){some.value.here}else if(A[2]){other.value}else{another.value}
if(A[3]&&B[3]){some.value.here}else if(A[3]){other.value}else{another.value}
I've tried ifelse but only got atomic results not vectors.
Using ifelse works fine with a little nesting. (It would have been nice to see your attempt to figure out where you went wrong.)
A = c(TRUE, FALSE, TRUE)
B = c(FALSE, TRUE, TRUE)
C = ifelse(A & B, "both", ifelse(A, "A only", "not A"))
cbind(A, B, C)
# A B C
# [1,] "TRUE" "FALSE" "A only"
# [2,] "FALSE" "TRUE" "not A"
# [3,] "TRUE" "TRUE" "both"
If you have a data frame with two columns, try logical indexing with conditions.
As a placeholder for your real replacement values, I chose "justA", "justB", and "both".
df$result[df$A & df$B] <- "both"
df$result[df$A & !df$B] <- "justA"
df$result[df$B & !df$A] <- "justB"
df
A B result
1 TRUE FALSE justA
2 FALSE TRUE justB
3 TRUE TRUE both
4 FALSE TRUE justB
Data
df <- data.frame(A=sample(c(T,F), 4, T), B=sample(c(T,F), 4, T))
df$result <- NA
If A and B are vectors:
> A = c(TRUE, FALSE, TRUE)
> B = c(FALSE, TRUE, TRUE)
You can use mapply():
> mapply(function (x, y) ifelse(x && y, 1, 2), A, B)
[1] 2 2 1
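As a side note, the element-wise loop is probably not needed here: the vectorized & operator with ifelse should give the same answer. A small sketch comparing the two (the names `via_mapply` and `via_ifelse` are mine):

```r
A <- c(TRUE, FALSE, TRUE)
B <- c(FALSE, TRUE, TRUE)

# Element-by-element version: scalar && inside mapply()
via_mapply <- mapply(function(x, y) if (x && y) 1 else 2, A, B)

# Fully vectorized version: & works on whole vectors at once
via_ifelse <- ifelse(A & B, 1, 2)

all(via_mapply == via_ifelse)
```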
