Subsetting in R (Index Explanation)

a <- c("a", "b", "c", "d", "e")
u <- a > "a"
a[u]
The code gives me the output: "b" "c" "d" "e".
What does a[u] mean? Does vector a now have a new index u that is itself a vector?

u is a logical vector which is used to subset a.
u
#[1] FALSE TRUE TRUE TRUE TRUE
Because only the first element of u is FALSE, a[u] drops the first element of a and keeps every element whose position in u is TRUE:
a[u]
#[1] "b" "c" "d" "e"
This becomes clearer with another example. Consider
a <- 11:15
u <- c(FALSE, TRUE, TRUE, FALSE, TRUE)
a[u]
#[1] 12 13 15
So the elements of a at the positions where u is TRUE are selected, i.e. 12, 13 and 15.
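To tie the two examples together, logical subsetting can be compared with selecting by position. The following is a minimal sketch using only base R; the names pos, by_logical and by_position are chosen here for illustration.

```r
a <- c("a", "b", "c", "d", "e")
u <- a > "a"              # element-wise comparison: one TRUE/FALSE per element of a

pos <- which(u)           # integer positions at which u is TRUE
by_logical  <- a[u]       # subset with the logical vector
by_position <- a[pos]     # subset with the equivalent positions

identical(by_logical, by_position)
```

Note that u is an ordinary vector stored separately from a; a itself is not modified and does not gain a new index.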

You can figure this out yourself by looking at the contents of the u vector:
u <- a > "a"
u
[1] FALSE TRUE TRUE TRUE TRUE
When you then subset the vector a using this logical vector u, you are telling R to return only those elements whose corresponding entry in u is TRUE. This leaves you with just:
[1] "b" "c" "d" "e"
To be more explicit:
"a"  "b"  "c"  "d"  "e"
 F    T    T    T    T
 ^^   |______________|
drop    keep the rest

Related

Reorder vector so no certain items are positioned next to each other

Please consider the following example:
[[1]]
[1] 11 12 13 14
[[2]]
[1] 1 2 3
[[3]]
[1] 4
[[4]]
[1] 5
[[5]]
[1] 6
[[6]]
[1] 7
[[7]]
[1] 8
[[8]]
[1] 9
[[9]]
[1] 10
[[10]]
[1] 15
[[11]]
[1] 16
[[12]]
[1] 17
In this example, I have 12 unique values in a vector that is 17 elements long. For simplicity, let's say that this vector is:
foo_bar <- c("b","b","b","c","d","e","f","g","h","i","a","a","a","a", "j", "k", "l")
The first code block shows the index positions in foo_bar of each of the unique values (the letters a–l).
I am attempting to write an algorithm that reorders foo_bar so that, for every index i except the final one (index 17 in the foo_bar example), positions i and i+1 never contain the same value. Here is an example of an acceptable outcome:
reordered_foo_bar <- c("b","c","b","d","b","e","f","g","h","a","i","a","j","a","k","a", "l")

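Something like this?
foo_bar <- c("b","b","b","c","d","e","f","g","h","i","a","a","a","a", "j", "k", "l")
# Reshuffle until no two neighbouring elements are equal.
# Note: this loops forever if no valid ordering exists.
test <- FALSE
while (test == FALSE) {
  new_foo_bar <- sample(foo_bar, size = length(foo_bar), replace = FALSE)
  # rle() yields one run per element only when no neighbours match
  test <- length(rle(new_foo_bar)$lengths) == length(foo_bar)
}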
new_foo_bar
# [1] "f" "a" "g" "b" "h" "d" "j" "c" "e" "i" "a" "b" "k" "a" "l" "a" "b"
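The condition from the question can also be checked directly on any candidate ordering. The sketch below uses base R only; has_adjacent_dupes is a hypothetical helper name introduced here, not part of either answer.

```r
# Hypothetical helper: TRUE if any element equals its immediate successor
has_adjacent_dupes <- function(v) {
  any(head(v, -1) == tail(v, -1))   # compare positions i and i + 1
}

foo_bar <- c("b","b","b","c","d","e","f","g","h","i","a","a","a","a","j","k","l")
reordered_foo_bar <- c("b","c","b","d","b","e","f","g","h","a","i","a","j","a","k","a","l")

has_adjacent_dupes(foo_bar)            # the original vector has repeats side by side
has_adjacent_dupes(reordered_foo_bar)  # the acceptable outcome from the question
```

Comparing the vector without its last element against the vector without its first element is equivalent to the rle() test used in the loop above.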

First we identify the indices of the unique values in the vector.
library(magrittr) # provides the %>% pipe
indices <-
  unique(foo_bar) %>%
  sort() %>%
  lapply(function(x) which(foo_bar == x))
Then we create a position score for each element based on 1) the rank of its value when the values are ordered by decreasing frequency (the code orders the values with sort(), which in this example places the most frequent values a and b first) and 2) how many times that value has occurred so far, and we add these two numbers together. However, to ensure that a different value gets inserted between repeats, we divide 1) by 2, which is the x / 2 term in the code. Finally, we order the position scores and reorder foo_bar with this new order.
This solution still returns a result when it is impossible to keep duplicate values apart (for example because the values are c("a","a","b","a")).
out <-
  lengths(indices) %>%
  lapply(., function(x) 1:x) %>%
  {lapply(seq_along(.), function(x) (unlist(.[x]) + x / 2))} %>%
  unlist() %>%
  order() %>%
  {unlist(indices)[.]} %>%
  foo_bar[.]
The output is then:
> "a" "b" "a" "c" "b" "d" "a" "e" "b" "f" "a" "g" "h" "i" "j" "k" "l"

Unable to change name value in a vector

named_vector=c(a=1,b=2,c=3,d=4,e=5,f=6,g=7)
names(named_vector)[names(named_vector)=='c'] <- 'k'
names(named_vector[names(named_vector)])=='c'<-'k'
Line 3 fails to rename the member 'c' in named_vector, whereas line 2 works fine.
Line 3 produces this error message:
Error in names(named_vector[names(named_vector)]) == "c" <- "k" :
could not find function "==<-"

You can index by numeric position:
names(named_vector)[3] <- "new name"

Line 3 doesn't work because the indexing is nested one level too deep. If you break this down
names(named_vector[names(named_vector)]) == 'c' <- 'k'
you get
# Gives you all the names back
names(named_vector)
# [1] "a" "b" "c" "d" "e" "f" "g"
# Putting it back in, you simply get all the values again
names(named_vector[c("a", "b", "c", "d", "e", "f", "g")])
# The inner part simply gives you the `named_vector` again
named_vector[c("a", "b", "c", "d", "e", "f", "g")]
# a b c d e f g
# 1 2 3 4 5 6 7
On top of that, the left-hand side of the assignment is a comparison, which evaluates to a logical vector rather than to something a value can be assigned to:
names(named_vector[names(named_vector)]) == 'c'
# [1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE
So line 2 works because you index the names of the vector with a logical test for the label you wish to change.
names(named_vector)[names(named_vector) == 'c'] <- 'k'
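The working form can be split into two steps to make the roles of the comparison and the assignment explicit. This is a small sketch on a shortened vector; the intermediate name hit is introduced here for illustration.

```r
named_vector <- c(a = 1, b = 2, c = 3, d = 4)

# Step 1: the comparison only produces a logical index into the names
hit <- names(named_vector) == "c"

# Step 2: the assignment targets names(named_vector), subset by that index
names(named_vector)[hit] <- "k"

names(named_vector)
```

The values of the vector are untouched; only the matching name is replaced.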

how to break a vector into subvectors in R

I have a vector like:
A B C A B A B D D E
and I'd like to break it into as many vectors as the number of "A" I have, like:
A B C
A B
A B D D E
Is there a way to accomplish this task?

You can use split and cumsum:
split(x, cumsum(x == "A"))
What you get in return is a list of vectors. A list seems most useful to me here since it allows vectors of different sizes in each element (unlike a data.frame for instance).
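A self-contained sketch of this approach on the vector from the question; the names grp and parts are chosen here for illustration.

```r
x <- c("A", "B", "C", "A", "B", "A", "B", "D", "D", "E")

# cumsum() over the logical vector increases by one at every "A",
# so each element is labelled with the number of "A"s seen so far
grp <- cumsum(x == "A")

# split() collects the elements that share a label into one list entry
parts <- split(x, grp)
parts
```

If the vector does not start with "A", the leading elements get the label 0 and end up in a group of their own.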

Not as elegant as the split approach, but strsplit also works (here vec is the input vector, which must start with "A" and contain only single-character elements):
strsplit(paste0("A", strsplit(paste0(vec, collapse = ""), "A")[[1]][-1]),"")
# [[1]]
# [1] "A" "B" "C"
# [[2]]
# [1] "A" "B"
# [[3]]
# [1] "A" "B" "D" "D" "E"

R subset with condition using %in% or ==. Which one should be used? [duplicate]

Usually, if I want to subset a data frame based on certain values of a variable, I use subset and %in%:
x <- data.frame(u=1:10,v=LETTERS[1:10])
x
subset(x, v %in% c("A","D"))
Now I have found out that == gives the same result:
subset(x, v == c("A","D"))
I'm just wondering whether they are identical or whether there is a reason to prefer one over the other.
Thanks for your help.
Edit (@MrFlick): This question is not the same as the linked one, which asks how to exclude several values: (!x %in% c('a','b')). I am asking why I get the same result whether I use == or %in%.

You should use the first one, %in%. With == you got the expected result only because the rows of the example dataset happen to line up with the recycled values A, D. What == actually compares against is
rep(c("A", "D"), length.out= nrow(x))
#[1] "A" "D" "A" "D" "A" "D" "A" "D" "A" "D"
x$v==rep(c("A", "D"), length.out= nrow(x))# only because of coincidence
#[1] TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
With the order of the two values reversed, == matches nothing:
subset(x, v == c("D","A"))
#[1] u v
#<0 rows> (or 0-length row.names)
because the comparison above amounts to
x$v==rep(c("D", "A"), length.out= nrow(x))
#[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
whereas %in% works regardless of order:
subset(x, v %in% c("D","A"))
# u v
#1 1 A
#4 4 D
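The same effect shows up when the rows are reordered instead of the values. The following sketch reuses the question's data frame; x2, eq_rows and in_rows are names introduced here for illustration.

```r
x <- data.frame(u = 1:10, v = LETTERS[1:10])

# Swap the first two rows so that "A" now sits in position 2
x2 <- x[c(2, 1, 3:10), ]

# == recycles c("A", "D") to A, D, A, D, ... and compares position by position
eq_rows <- subset(x2, v == c("A", "D"))

# %in% tests every value of v for membership, independent of position
in_rows <- subset(x2, v %in% c("A", "D"))

nrow(eq_rows)
nrow(in_rows)
```

Here == keeps only the D row, because A no longer lines up with a recycled A, while %in% still returns both rows.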

Shuffling a vector - all possible outcomes of sample()?

I have a vector with five items.
my_vec <- c("a","b","a","c","d")
If I want to re-arrange those values into a new vector (shuffle), I could use sample():
shuffled_vec <- sample(my_vec)
Easy - but the sample() function only gives me one possible shuffle. What if I want to know all possible shuffling combinations? The various "combn" functions don't seem to help, and expand.grid() gives me every possible combination with replacement, when I need it without replacement. What's the most efficient way to do this?
Note that my vector contains the value "a" twice, so every shuffled vector in the returned set should also contain "a" twice.

I think permn from the combinat package does what you want
library(combinat)
permn(my_vec)
A smaller example
> x
[1] "a" "a" "b"
> permn(x)
[[1]]
[1] "a" "a" "b"
[[2]]
[1] "a" "b" "a"
[[3]]
[1] "b" "a" "a"
[[4]]
[1] "b" "a" "a"
[[5]]
[1] "a" "b" "a"
[[6]]
[1] "a" "a" "b"
If the duplicates are a problem, you could do something like this to get rid of them:
strsplit(unique(sapply(permn(my_vec), paste, collapse = ",")), ",")
Or, probably a better approach to removing duplicates:
dat <- do.call(rbind, permn(my_vec))
dat[!duplicated(dat), ]
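As an alternative to filtering duplicates afterwards, the distinct orderings can be generated directly. The following is a minimal base R sketch, not part of combinat; unique_perms is a hypothetical helper name introduced here.

```r
# Hypothetical helper: all distinct orderings of a vector that may contain repeats
unique_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (val in unique(v)) {             # each distinct value leads exactly once
    rest <- v[-match(val, v)]          # drop a single occurrence of that value
    for (p in unique_perms(rest)) {    # permute what is left
      out[[length(out) + 1]] <- c(val, p)
    }
  }
  out
}

my_vec <- c("a", "b", "a", "c", "d")
perms <- unique_perms(my_vec)
length(perms)
```

Because each distinct value is placed in the leading position only once per level of recursion, no ordering is produced twice. For my_vec this gives 5!/2! = 60 orderings instead of 120.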

Noting that your data is effectively 5 levels from 1-5, encoded as "a", "b", "a", "c", and "d", I went looking for ways to get the permutations of the numbers 1-5 and then remap those to the levels you use.
Let's start with the input data:
my_vec <- c("a","b","a","c","d") # the character
my_vec_ind <- seq(1,length(my_vec),1) # their identifier
To get the permutations, I applied the function given at Generating all distinct permutations of a list in R:
permutations <- function(n){
  if(n==1){
    return(matrix(1))
  } else {
    sp <- permutations(n-1)
    p <- nrow(sp)
    A <- matrix(nrow=n*p,ncol=n)
    for(i in 1:n){
      A[(i-1)*p+1:p,] <- cbind(i,sp+(sp>=i))
    }
    return(A)
  }
}
First, create a data.frame with the permutations:
tmp <- data.frame(permutations(length(my_vec)))
You now have a data frame tmp of 120 rows, where each row is a unique permutation of the numbers, 1-5:
>tmp
X1 X2 X3 X4 X5
1 1 2 3 4 5
2 1 2 3 5 4
3 1 2 4 3 5
...
119 5 4 3 1 2
120 5 4 3 2 1
Now you need to remap them to the strings you had. You can remap them using a variation on the theme of gsub(), proposed here: R: replace characters using gsub, how to create a function?
gsub2 <- function(pattern, replacement, x, ...) {
  # Apply each pattern/replacement pair in turn
  for(i in seq_along(pattern))
    x <- gsub(pattern[i], replacement[i], x, ...)
  x
}
Plain gsub() won't work here because it accepts only a single pattern and a single replacement, whereas you have several of each.
You also need a function you can call with lapply() to apply gsub2() to every column of your tmp data.frame.
remap <- function(x,
                  old,
                  new){
  return(gsub2(pattern = old,
               replacement = new,
               fixed = TRUE,
               x = as.character(x)))
}
Almost there. We do the mapping like this:
shuffled_vec <- as.data.frame(lapply(tmp,
remap,
old = as.character(my_vec_ind),
new = my_vec))
which can be simplified to...
shuffled_vec <- as.data.frame(lapply(data.frame(permutations(length(my_vec))),
remap,
old = as.character(my_vec_ind),
new = my_vec))
... should you feel the need.
That gives you your required answer:
> shuffled_vec
X1 X2 X3 X4 X5
1 a b a c d
2 a b a d c
3 a b c a d
...
119 d c a a b
120 d c a b a
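The remapping step can also be done without gsub(), by using the index permutations to subset my_vec directly. This is a sketch on a shorter vector with a hand-written index matrix idx (an assumption for illustration; in the answer above it would be the output of permutations()).

```r
my_vec <- c("a", "b", "a")

# All 3! orderings of the positions 1..3, one per row
idx <- rbind(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
             c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))

# Look up the value at every position, then restore the matrix shape
shuffled <- matrix(my_vec[as.vector(idx)], nrow = nrow(idx))

# unique() on a matrix drops repeated rows, i.e. repeated orderings
distinct <- unique(shuffled)
distinct
```

Indexing by position avoids the string replacement entirely, so it also works when the labels are longer than one character or contain digits.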

Looking at a previous question (R: generate all permutations of vector without duplicated elements), I can see that the gtools package has a function for this. However, I couldn't get it to work directly on your vector:
library(gtools)
permutations(n = 5, r = 5, v = my_vec)
#Error in permutations(n = 5, r = 5, v = my_vec) :
# too few different elements
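You can adapt it, however, by permuting the positions and using them to index my_vec (each column of the result is one permutation; the duplicates caused by the repeated "a" are still included):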
apply(permutations(n = 5, r = 5), 1, function(x) my_vec[x])
# [,1] [,2] [,3] [,4]
#[1,] "a" "a" "a" "a" ...
#[2,] "b" "b" "b" "b" ...
#[3,] "a" "a" "c" "c" ...
#[4,] "c" "d" "a" "d" ...
#[5,] "d" "c" "d" "a" ...
