How can I keep NA when I change levels

I build a vector of factors containing NA.
my_vec <- factor(c(NA,"a","b"),exclude=NULL)
levels(my_vec)
# [1] "a" "b" NA
I change one of those levels.
levels(my_vec)[levels(my_vec) == "b"] <- "c"
NA disappears.
levels(my_vec)
# [1] "a" "c"
How can I keep it ?
EDIT
@rawr gave a nice solution that works most of the time, including for my original example, but not for the one shown below.
@Hack-R offered a pragmatic option using addNA. I could make it work with that, but I'd rather have a fully general solution.
See this generalized issue
my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
[1] "a" NA "b1" "b2"
levels(my_vec)[levels(my_vec) %in% c("b1","b2")] <- "c"
levels(my_vec)
[1] "a" "c" # NA disappeared
@rawr's solution:
my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
[1] "a" NA "b1" "b2"
attr(my_vec, 'levels')[levels(my_vec) %in% c("b1","b2")] <- "c"
levels(my_vec)
droplevels(my_vec)
[1] "a" NA "c" "c" # c is duplicated
@Hack-R's solution:
my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
[1] "a" NA "b1" "b2"
levels(my_vec)[levels(my_vec) %in% c("b1","b2")] <- "c"
my_vec <- addNA(my_vec)
levels(my_vec)
[1] "a" "c" NA # NA is in the end
I want levels(my_vec) == c("a",NA,"c")

The simplest workaround is to quote NA, so that it becomes an ordinary string level ("NA") rather than a missing value. Factor levels sort alphabetically by default, but that's not always useful, so you can specify a different order by passing the reordered levels to the levels argument of factor().
library(plyr)
my_vec <- factor(c("NA","a","b1","b2"))
vec2 <- revalue(my_vec,c("b1"="c","b2"="c"))
#now reorder levels
my_vec2 <- factor(vec2, levels(vec2)[c(1,3,2)])
Levels: a NA c
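The same merge and reorder can also be sketched in base R, without plyr, by assigning a named list to levels(). This is only an illustration of the idea above and still assumes that the quoted string "NA" is acceptable in place of a real missing value. Each list name is a new level, each list value holds the old levels to fold into it, and the list order becomes the level order.

```r
# The string "NA" is an ordinary level here, not a missing value.
my_vec <- factor(c("NA", "a", "b1", "b2"))

# Names are the new levels, values are the old levels to merge.
# The order of the list fixes the order of the resulting levels.
levels(my_vec) <- list(a = "a", "NA" = "NA", c = c("b1", "b2"))

levels(my_vec)
as.character(my_vec)
```

Because the new order is spelled out in the list, no separate reordering step is needed.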

I finally wrote a function, level_sub (defined below), that first replaces the NA level with a temporary one (inspired by @lmo), then does the replacement I wanted the standard way, then puts NA back in its place using @rawr's suggestion.
my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
my_vec <- level_sub(my_vec,c("b1","b2"),"c")
my_vec
# [1] <NA> a c c
# Levels: a <NA> c
As a bonus level_sub can be used with na_rep = NULL which will remove the NA, and it will look good in pipe chains :).
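level_sub <- function(x, from, to, na_rep = "NA") {
  # temporarily turn the NA level into a placeholder string
  if (!is.null(na_rep)) { levels(x)[is.na(levels(x))] <- na_rep }
  # standard replacement (merges levels if 'from' has several values)
  levels(x)[levels(x) %in% from] <- to
  # put NA back in place of the placeholder, bypassing levels<-
  if (!is.null(na_rep)) { attr(x, 'levels')[levels(x) == na_rep] <- NA }
  x
}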
Nevertheless it seems that R really doesn't want you to add NA to factors.
levels(my_vec) <- c(NA,"a") behaves strangely, and it doesn't stop there. While subset keeps NA levels in your columns, rbind quietly removes them! I wouldn't be surprised if further investigation revealed that half of R's functions remove NA levels, making them very unsafe to work with...
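An alternative that avoids the temporary placeholder is to compute the new level vector by hand and rebuild the factor from its integer codes. The sketch below is not from the answers above. The helper name recode_keep_na is hypothetical, and it relies on factor(..., exclude = NULL) keeping an NA level, as in the question.

```r
# Hypothetical helper: rename or merge levels while keeping an NA level in place.
recode_keep_na <- function(x, from, to) {
  old <- levels(x)
  # %in% never returns NA, so an NA level passes through unchanged
  new <- ifelse(old %in% from, to, old)
  # map every element to its new label via the integer codes,
  # then rebuild with de-duplicated levels in their original order
  factor(new[as.integer(x)], levels = unique(new), exclude = NULL)
}

my_vec <- factor(c(NA, "a", "b1", "b2"),
                 levels = c("a", NA, "b1", "b2"), exclude = NULL)
out <- recode_keep_na(my_vec, c("b1", "b2"), "c")
levels(out)
```

Since the factor is rebuilt rather than modified through levels<-, the NA level is never handed to the replacement function that drops it.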

Related

Adding values to a vector using for loop

I have a list named mylist that has character elements in it which I'm trying to merge and save in another object.
The following piece of code:
result <- c()
for (i in length(mylist)) {
temp <- paste(mylist[[i]][2], mylist[[i]][3], mylist[[i]][4], sep="")
result[i] <- temp
}
result
Results in the following output:
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA ...
Why am I getting NA's instead of the merged characters for EVERY result[i]?

The reason for the unexpected result has already been explained by Brent and damir.
However, I suggest using seq_along(mylist), as it is safer than 1:length(mylist) in case mylist is empty for some reason.
result <- c()
for (i in seq_along(mylist)) {
result[i] <- paste(mylist[[i]][2:4], collapse = "")
}
result
[1] "BCD" "CDE" "DEF" "EFG" "FGH"
If mylist is empty, length(mylist) is 0, but the loop would still be executed twice, because 1:0 counts down and yields c(1, 0).
In addition, the collapse parameter tells paste() to concatenate the elements of a vector thereby saving a lot of typing.
By the way, the same result can be achieved by using sapply():
sapply(mylist, function(x) paste(x[2:4], collapse = ""))
[1] "BCD" "CDE" "DEF" "EFG" "FGH"
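The difference between 1:length() and seq_along() on an empty list can be checked directly. The snippet below is a small sketch using the made-up data from this answer. The vapply() call is an addition that is not part of the original answer: it behaves like sapply() here but also asserts that each result is a single string.

```r
empty <- list()
1:length(empty)    # counts down from 1 to 0, so it has two elements
seq_along(empty)   # zero-length, so a loop over it never runs

runs <- 0
for (i in 1:length(empty)) runs <- runs + 1
runs               # the loop body ran twice

# Same made-up data as in the answer
mylist <- lapply(1:5, function(i) LETTERS[i + (0:3)])
result <- vapply(mylist, function(x) paste(x[2:4], collapse = ""), character(1))
result
```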
Data
The OP has not provided a reproducible example but says that he has "a list named mylist that has character elements". So, here are some made-up data:
mylist <- lapply(1:5, function(i) LETTERS[i + (0:3)])
mylist
[[1]]
[1] "A" "B" "C" "D"
[[2]]
[1] "B" "C" "D" "E"
[[3]]
[1] "C" "D" "E" "F"
[[4]]
[1] "D" "E" "F" "G"
[[5]]
[1] "E" "F" "G" "H"

length(mylist) is just a scalar (i.e., a single value), so the loop visits only that one index. You need to add the starting value, like this: 1:length(mylist). I am guessing that only the last element of result holds an actual value?

Your loop starts and ends at length(mylist), because that is the only value i ever takes. So your loop "loops" only once.
To change this, you need to specify the start and end values of i. Something like:
for (i in 1:length(mylist))
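The effect is easy to reproduce. The sketch below uses made-up data (the question does not include any) and runs the loop from the question unchanged. Only the last index is ever assigned, so R pads all earlier positions of result with NA.

```r
# Made-up data: five character vectors of length 4
mylist <- lapply(1:5, function(i) LETTERS[i + (0:3)])

result <- c()
for (i in length(mylist)) {          # i takes the single value 5
  result[i] <- paste(mylist[[i]][2:4], collapse = "")
}
result   # positions 1 to 4 are NA, position 5 holds the pasted string
```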

Extracting one column based on max of other columns of a Dataframe in R

I am trying to fetch the value in column 'a' corresponding to the max value of each of the columns 'c', 'd' and 'e', and then store it in a vector.
I have written the code below, which returns the column 'a' data preceded by two empty strings.
Can somebody help me fetch the exact data using sapply?
a<-c('A','B','C','D','E')
b<-c(10,30,45,25,40)
c<-c(19,23,25,37,39)
d<-c(43,21,17,14,26)
e<-c(NA,23,45,32,NA)
df<-data.frame(a,b,c,d,e)
A1<-vector("character",3)
for (i in 3:5){
A1[i]<-c(df[which(df[,i]==max(df[,i],na.rm = TRUE)),1])
A1
}
Actual Result: > A1
[1] "" "" "E" "A" "C"
Expected Result: A1 should have "E" "A" "C"
Please suggest a solution using sapply.
Thanks

We can use mapply
unname(mapply(function(x, y) x[which(y == max(y, na.rm = TRUE))], df[1], df[3:5]))
#[1] "E" "A" "C"
In the loop, i runs over 3:5, which are column indices of df, while A1 is initialized with only 3 elements. Assigning to A1[3], A1[4] and A1[5] therefore extends the vector and leaves the first 2 (empty) elements untouched.
A1<-vector("character",3)
A1
#[1] "" "" ""
A2 <- A1
A2[3:5] <- 15
A2
#[1] "" "" "15" "15" "15" #### this is the same thing happening in the loop
Instead, we can loop over the sequence and then assign
i1 <- 3:5
for(i in seq_along(i1)) {
A1[i] <- df[which(df[,i1[i]]==max(df[,i1[i]],na.rm = TRUE)),1]
}
A1
#[1] "E" "A" "C"
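Since the question asks for sapply specifically, here is one possible sketch. It swaps in which.max() for the which(... == max(...)) idiom used above. which.max() skips NA values, but it returns only the first maximum, so the result differs if a column has ties.

```r
df <- data.frame(
  a = c("A", "B", "C", "D", "E"),
  b = c(10, 30, 45, 25, 40),
  c = c(19, 23, 25, 37, 39),
  d = c(43, 21, 17, 14, 26),
  e = c(NA, 23, 45, 32, NA),
  stringsAsFactors = FALSE
)

# For each of columns 3 to 5, pick the value of 'a' in the row of the maximum
A1 <- unname(sapply(df[3:5], function(col) df$a[which.max(col)]))
A1
```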

Extraction with names when names are repetitive

I came across this challenge. How to extract when there is repetition in names?
X <- 1:5
names(X) <- c(letters[1:4], "a")
X
a b c d a
1 2 3 4 5
names(X)
[1] "a" "b" "c" "d" "a"
X["a"]
a
1

To extract when there is repetition in names:
X[names(X) %in% "a"]
# a a
# 1 5
Why is R accepting repetitive names?
Note that names is a generic accessor function. You can set names to anything; they don't have to be unique.

Other solutions:
X[grepl("a", names(X))]
X[names(X) == "a"]
Also, in general it is better to have unique names, so that you can reference elements without confusion.
The following command generates unique names for you:
make.unique(names(X))
[1] "a" "b" "c" "d" "a.1"
BTW, the first of the solutions I proposed above (grepl) would also pick every element whose name merely contains "a", not just exact matches.
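The difference between the pattern match and the exact match shows up as soon as another name contains the letter. The vector below is made up for illustration. Anchoring the pattern with ^ and $ is one way to make grepl() behave like the exact comparison.

```r
# Made-up vector: "a" appears twice, and "ab" merely contains "a"
X <- c(a = 1, ab = 2, b = 3, a = 4)

partial  <- X[grepl("a", names(X))]     # also picks "ab"
exact    <- X[names(X) == "a"]          # only the two "a" elements
anchored <- X[grepl("^a$", names(X))]   # anchored pattern, same as exact

names(partial)
names(exact)
```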

R: are there built-in functions to sort lists?

in R I have produced the following list L:
>L
[[1]]
[1] "A" "B" "C"
[[2]]
[1] "D"
[[3]]
[1] NULL
I would like to convert the list L into a data frame df like
>df
df[,1] df[,2]
"A" 1
"B" 1
"C" 1
"D" 2
where the 2nd column gives the position in the list L of the corresponding element in column 1.
My question is: are there built-in R functions that can do this manipulation quickly? I can do it using "brute force", but my solution does not scale well to much bigger lists.
I thank you all!

You'll get a warning because of your NULL value, but you can use stack if you give your list items names:
L <- list(c("A", "B", "C"), "D", NULL)
stack(setNames(L, seq_along(L)))
# values ind
# 1 A 1
# 2 B 1
# 3 C 1
# 4 D 2
# Warning message:
# In stack.default(setNames(L, seq_along(L))) :
# non-vector elements will be ignored
If the warning displeases you, you can, of course, run stack on the non-NULL elements, but do it after you name your list elements so that the "ind" column reflects the correct value.
I'll show in 2 steps just for clarity:
names(L) <- seq_along(L)
stack(L[!sapply(L, is.null)])
Similarly, if you've gotten rid of the NULL list elements, you can use melt from "reshape2". You don't gain anything in brevity, and I'm not sure that you gain anything in efficiency either, but I thought I'd share it as an option.
library(reshape2)
names(L) <- seq_along(L)
melt(L[!sapply(L, is.null)])

Ananda's answer is seemingly better than this, but I'll put it up anyway:
> cbind(unlist(L), rep(1:length(L), sapply(L, length)))
[,1] [,2]
[1,] "A" "1"
[2,] "B" "1"
[3,] "C" "1"
[4,] "D" "2"
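Note that cbind() on a character vector and a numeric vector returns a character matrix, so the positions come back as strings. A variant that keeps them numeric is sketched below. It is not part of either answer. It uses lengths(), which reports 0 for the NULL element, so no warning or filtering step is involved.

```r
L <- list(c("A", "B", "C"), "D", NULL)

df <- data.frame(
  value = unlist(L),                      # NULL contributes nothing
  pos   = rep(seq_along(L), lengths(L)),  # list position, repeated per element
  stringsAsFactors = FALSE
)
df
```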

Shuffling a vector - all possible outcomes of sample()?

I have a vector with five items.
my_vec <- c("a","b","a","c","d")
If I want to re-arrange those values into a new vector (shuffle), I could use sample():
shuffled_vec <- sample(my_vec)
Easy - but the sample() function only gives me one possible shuffle. What if I want to know all possible shuffling combinations? The various "combn" functions don't seem to help, and expand.grid() gives me every possible combination with replacement, when I need it without replacement. What's the most efficient way to do this?
Note that in my vector, I have the value "a" twice - therefore, in the set of shuffled vectors returned, they all should each have "a" twice in the set.

I think permn from the combinat package does what you want
library(combinat)
permn(my_vec)
A smaller example
> x
[1] "a" "a" "b"
> permn(x)
[[1]]
[1] "a" "a" "b"
[[2]]
[1] "a" "b" "a"
[[3]]
[1] "b" "a" "a"
[[4]]
[1] "b" "a" "a"
[[5]]
[1] "a" "b" "a"
[[6]]
[1] "a" "a" "b"
If the duplicates are a problem, you could do something like this to get rid of them:
strsplit(unique(sapply(permn(my_vec), paste, collapse = ",")), ",")
Or, probably a better approach to removing duplicates:
dat <- do.call(rbind, permn(my_vec))
dat[!duplicated(dat),]
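Generating all 120 permutations and then discarding duplicates works for five items, but the distinct arrangements can also be built directly. The recursive helper below is a base R sketch and is not part of the combinat package. The name unique_perms is hypothetical, and it assumes a vector of length 1 or more. At each position it tries every distinct remaining value exactly once, so no duplicate rows are ever produced.

```r
# Hypothetical helper: one row per distinct permutation of v
unique_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  out <- list()
  for (val in unique(v)) {
    rest <- v[-match(val, v)]                     # drop one occurrence of val
    out[[length(out) + 1]] <- cbind(val, unique_perms(rest))
  }
  res <- do.call(rbind, out)
  dimnames(res) <- NULL
  res
}

my_vec <- c("a", "b", "a", "c", "d")
p <- unique_perms(my_vec)
dim(p)   # 5!/2! = 60 rows, 5 columns
```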

Noting that your data is effectively 5 levels from 1-5, encoded as "a", "b", "a", "c", and "d", I went looking for ways to get the permutations of the numbers 1-5 and then remap those to the levels you use.
Let's start with the input data:
my_vec <- c("a","b","a","c","d") # the character
my_vec_ind <- seq(1,length(my_vec),1) # their identifier
To get the permutations, I applied the function given at Generating all distinct permutations of a list in R:
permutations <- function(n){
  if(n==1){
    return(matrix(1))
  } else {
    sp <- permutations(n-1)
    p <- nrow(sp)
    A <- matrix(nrow=n*p,ncol=n)
    for(i in 1:n){
      A[(i-1)*p+1:p,] <- cbind(i,sp+(sp>=i))
    }
    return(A)
  }
}
First, create a data.frame with the permutations:
tmp <- data.frame(permutations(length(my_vec)))
You now have a data frame tmp of 120 rows, where each row is a unique permutation of the numbers, 1-5:
>tmp
X1 X2 X3 X4 X5
1 1 2 3 4 5
2 1 2 3 5 4
3 1 2 4 3 5
...
119 5 4 3 1 2
120 5 4 3 2 1
Now you need to remap them to the strings you had. You can remap them using a variation on the theme of gsub(), proposed here: R: replace characters using gsub, how to create a function?
gsub2 <- function(pattern, replacement, x, ...) {
  for(i in 1:length(pattern))
    x <- gsub(pattern[i], replacement[i], x, ...)
  x
}
gsub() won't work because you have more than one value in the replacement array.
You also need a function you can call using lapply() to use the gsub2() function on every element of your tmp data.frame.
remap <- function(x, old, new){
  return(gsub2(pattern = old,
               replacement = new,
               fixed = TRUE,
               x = as.character(x)))
}
Almost there. We do the mapping like this:
shuffled_vec <- as.data.frame(lapply(tmp,
remap,
old = as.character(my_vec_ind),
new = my_vec))
which can be simplified to...
shuffled_vec <- as.data.frame(lapply(data.frame(permutations(length(my_vec))),
remap,
old = as.character(my_vec_ind),
new = my_vec))
.. should you feel the need.
That gives you your required answer:
> shuffled_vec
X1 X2 X3 X4 X5
1 a b a c d
2 a b a d c
3 a b c a d
...
119 d c a a b
120 d c a b a
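The remapping step can also be sketched without any string substitution, because each permuted number is simply a position in my_vec. The snippet below is not the answer's method, only a shorter variant under that observation. It repeats the permutations() function from above so that it runs on its own, then indexes my_vec with the whole permutation matrix.

```r
# Same recursive generator as above, repeated to keep the snippet self-contained
permutations <- function(n) {
  if (n == 1) return(matrix(1))
  sp <- permutations(n - 1)
  p <- nrow(sp)
  A <- matrix(nrow = n * p, ncol = n)
  for (i in 1:n) {
    A[(i - 1) * p + 1:p, ] <- cbind(i, sp + (sp >= i))
  }
  A
}

my_vec <- c("a", "b", "a", "c", "d")
idx <- permutations(length(my_vec))

# Indexing with a matrix returns a plain vector in column order,
# so it is reshaped to the dimensions of idx
shuffled <- matrix(my_vec[idx], nrow = nrow(idx))
shuffled[1, ]
```

Like the data frame above, this still has 120 rows, so arrangements that differ only by swapping the two "a" values appear twice.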

Looking at a previous question (R: generate all permutations of vector without duplicated elements), I can see that the gtools package has a function for this. I couldn't however get this to work directly on your vector as such:
library(gtools)
permutations(n = 5, r = 5, v = my_vec)
#Error in permutations(n = 5, r = 5, v = my_vec) :
# too few different elements
You can adapt it however like so:
apply(permutations(n = 5, r = 5), 1, function(x) my_vec[x])
# [,1] [,2] [,3] [,4]
#[1,] "a" "a" "a" "a" ...
#[2,] "b" "b" "b" "b" ...
#[3,] "a" "a" "c" "c" ...
#[4,] "c" "d" "a" "d" ...
#[5,] "d" "c" "d" "a" ...
