Splitting vector based on vector of chunk-lengths - r

I've got a vector of binary numbers. I know the consecutive length of each group of objects; how can I split based on that information (without for loop)?
x = c("1","0","1","0","0","0","0","0","1")
.length = c(group1 = 2,group2=4, group3=3)
x is the binary number vector that I need to split. .length is the information that I am given. .length essentially tells me that the first group has 2 elements, namely the first two elements 1,0. The second group has 4 elements and contains the 4 numbers that follow the group 1 numbers, 1,0,0,0, and so on.
Is there a way of splitting the vector and returning the pieces in a list?
The ugly way is to use a for loop that keeps track of the current cumsum, but I am looking for a more elegant way if there is one.

You can use rep to set up the split-by variable, then use split:
x = c("1","0","1","0","0","0","0","0","1")
.length = c(group1 = 2,group2=4, group3=3)
split(x, rep.int(seq_along(.length), .length))
# $`1`
# [1] "1" "0"
#
# $`2`
# [1] "1" "0" "0" "0"
#
# $`3`
# [1] "0" "0" "1"
If you want to carry the group names over to the split list, you can replicate the names instead:
split(x, rep.int(names(.length), .length))
# $group1
# [1] "1" "0"
#
# $group2
# [1] "1" "0" "0" "0"
#
# $group3
# [1] "0" "0" "1"
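A small wrapper can make the rep.int/split approach a little more defensive. The sketch below is only an illustration: the function name split_by_lengths is made up here and is not part of the original answer, and it assumes the names of the lengths vector are unique. It checks that the lengths add up to length(x) and sets the factor levels explicitly so that the groups come back in the order given rather than in sorted order.

```r
# Hypothetical helper (not from the original answer): split x into
# consecutive chunks whose sizes are given by lens.
split_by_lengths <- function(x, lens) {
  # The chunk sizes must account for every element of x.
  stopifnot(sum(lens) == length(x))
  # Use the names of lens as labels when present, otherwise 1, 2, 3, ...
  labels <- if (is.null(names(lens))) seq_along(lens) else names(lens)
  # Explicit levels keep the groups in the order given, not sorted order.
  grp <- factor(rep.int(labels, lens), levels = labels)
  split(x, grp)
}

x <- c("1", "0", "1", "0", "0", "0", "0", "0", "1")
.length <- c(group1 = 2, group2 = 4, group3 = 3)
res <- split_by_lengths(x, .length)
```

Without the explicit levels, split would order the groups by the sorted labels, which matters once the names are not already in alphabetical order.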

Another option is
split(x,cumsum(sequence(.length)==1))
#$`1`
#[1] "1" "0"
#$`2`
#[1] "1" "0" "0" "0"
#$`3`
#[1] "0" "0" "1"
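To get the group names (this relies on sequence() returning names derived from names(.length), which older versions of R did but newer versions may not; it also strips only one trailing character, so it assumes every group has fewer than 10 elements):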
split(x, sub('.$', '', names(sequence(.length))))
#$group1
#[1] "1" "0"
#$group2
#[1] "1" "0" "0" "0"
#$group3
#[1] "0" "0" "1"

Related

Pipe that leads to a map ends up giving a list of incorrect length

Using the combn function, I want to generate all possible combinations of the vector c("1", "2", "3") when choosing 2 elements (m = 2). The code looks like this:
comparisons <- combn(c("1", "2", "3"), m = 2)
[,1] [,2] [,3]
[1,] "1" "1" "2"
[2,] "2" "3" "3"
I then transpose this matrix, so it becomes this:
comparisons <- t(comparisons)
[,1] [,2]
[1,] "1" "2"
[2,] "1" "3"
[3,] "2" "3"
The last step is to generate a list, where each element is a row from this transposed matrix. I used map, and it gave me exactly what I wanted:
comparisons <- map(1:3, ~ comparisons[.x, ])
[[1]]
[1] "1" "2"
[[2]]
[1] "1" "3"
[[3]]
[1] "2" "3"
This is all fine and dandy, but when I try to pipe all of these together in one nice assignment, the resulting list is incorrect.
comparisons <- combn(c("1", "2", "3"), m = 2) %>%
t() %>%
map(1:3, ~ .[.x, ])
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
NULL
[[5]]
NULL
[[6]]
NULL
When I turn your matrix into a tibble and then into a list, I get your desired output. Every data frame or tibble is also a list, so each column corresponds to one element of the list.
library(purrr)
library(tibble) # provides as_tibble()
comparisons %>%
as_tibble() %>%
as.list() %>% # The list at this point is already your desired output; run the last line only if you also want it transposed.
transpose()
$a # Before running transpose
[1] "1" "2"
$b
[1] "1" "3"
$c
[1] "2" "3"
# After running transpose
[[1]]
[[1]]$a
[1] "1"
[[1]]$b
[1] "1"
[[1]]$c
[1] "2"
[[2]]
[[2]]$a
[1] "2"
[[2]]$b
[1] "3"
[[2]]$c
[1] "3"
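As for why the piped version returns six NULLs: as far as I understand magrittr, the left-hand side is still inserted as the first argument when the dot only appears inside a nested call such as a formula, so the call effectively becomes map(t(...), 1:3, ~ ...) and map iterates over the six matrix elements instead of over 1:3. A base R sketch that sidesteps the pipe altogether (not part of the original answer) is to ask combn for a list directly, or to split the untransposed matrix by column with asplit, which I believe requires R 3.6.0 or later:

```r
# combn() can return a list of combinations directly.
comparisons <- combn(c("1", "2", "3"), m = 2, simplify = FALSE)

# Alternative: split the 2 x 3 matrix by column (MARGIN = 2) and
# strip the dim attribute from each piece.
m <- combn(c("1", "2", "3"), m = 2)
by_column <- lapply(asplit(m, 2), as.vector)
```

Either way there is no need to transpose first, because each column of the combn matrix is already one combination.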

R subset string values including vertical bar(|)

I am trying to subset a data set based on the values in one column: I want to keep only the rows where that column holds a single level of information. Here is what my data look like.
data <- cbind(v1=c("a", "ab", "a|12|bc", "a|b", "ac","bc|2","b|bc|12"),
v2=c(1,2,3,5,3,1,2))
> data
v1 v2
[1,] "a" "1"
[2,] "ab" "2"
[3,] "a|12|bc" "3"
[4,] "a|b" "5"
[5,] "ac" "3"
[6,] "bc|2" "1"
[7,] "b|bc|12" "2"
I want to keep only the rows whose character values do not include "|", like below:
> data
v1 v2
[1,] "a" "1"
[2,] "ab" "2"
[3,] "ac" "3"
Basically, I am trying to get rid of two-level (x|y) or three-level (x|y|z) values. Any thoughts on this?
Thanks!
We can use grep to find the rows that contain |, use the invert option to get the row indices of the elements that have no |, and use those to subset the rows of the matrix:
data[grep("|", data[,1], invert = TRUE, fixed = TRUE), ]
# v1 v2
#[1,] "a" "1"
#[2,] "ab" "2"
#[3,] "ac" "3"
NOTE: fixed = TRUE is used because otherwise the pattern is treated as a regular expression, in which | is a metacharacter meaning OR. Other options are to escape it (\\|) or to place it inside square brackets ([|]) to match the literal character (when fixed = FALSE).
Using the logical grepl, this can be done as follows. I will leave it as two lines of code for clarity, but it is straightforward to turn it into a one-liner.
i <- !grepl("\\|", data[, 1])
data[i, ]
# v1 v2
#[1,] "a" "1"
#[2,] "ab" "2"
#[3,] "ac" "3"
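The three ways of matching a literal | mentioned in these answers can be compared side by side. This is only a sketch on the data from the question; the variable names are made up for illustration. It also uses drop = FALSE, which is my addition, so that the result stays a matrix even when a single row matches.

```r
# Same data as in the question: a character matrix with columns v1 and v2.
data <- cbind(v1 = c("a", "ab", "a|12|bc", "a|b", "ac", "bc|2", "b|bc|12"),
              v2 = c(1, 2, 3, 5, 3, 1, 2))

# Three equivalent ways to test for a literal "|".
has_bar_fixed   <- grepl("|", data[, "v1"], fixed = TRUE)
has_bar_escaped <- grepl("\\|", data[, "v1"])
has_bar_class   <- grepl("[|]", data[, "v1"])

# drop = FALSE keeps the result a matrix even if only one row matches.
no_bar <- data[!has_bar_fixed, , drop = FALSE]
```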

Append value to more than one position in vector [duplicate]

How do we append a single value to multiple positions in a vector?
x=c(1,2,3)
append(x, "a", c(1,3))
[1] "1" "a" "2" "3"
Warning messages:
1: In if (!after) c(values, x) else if (after >= lengx) c(x, values) else c(x[1L:after], :
the condition has length > 1 and only the first element will be used
2: In if (after >= lengx) c(x, values) else c(x[1L:after], values, :
the condition has length > 1 and only the first element will be used
3: In 1L:after : numerical expression has 2 elements: only the first used
4: In (after + 1L):lengx :
numerical expression has 2 elements: only the first used
With the above code, only the first position is registered, with a warning message.
lapply(c(1,3), function(y) append(x, 'a', y))
yields this result:
[[1]]
[1] "1" "a" "2" "3"
[[2]]
[1] "1" "2" "3" "a"
Expected output:
1 a 2 3 a
You can use the Reduce function:
x=1:10
pos=c(3,5,7,10)
Reduce(function(i,j)append(i,"a",j),cumsum(c(pos[1],diff(pos)+1)),init=x)
[1] "1" "2" "3" "a" "4" "5" "a" "6" "7" "a" "8" "9" "10" "a"
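A variation on the same idea, offered as a sketch rather than as the answer's own method, is to insert from the highest position downwards. An insertion then never shifts a position that still has to be processed, so the positions can be used as given without the cumsum adjustment. Applied to the vector from the question:

```r
x <- c(1, 2, 3)
pos <- c(1, 3)   # insert "a" after the 1st and after the 3rd element

# Insert from the highest position down, so that an insertion never
# shifts a position that still has to be processed.
res <- Reduce(function(v, p) append(v, "a", after = p),
              sort(pos, decreasing = TRUE),
              x)
```

Here res is the character vector "1" "a" "2" "3" "a", which matches the expected output in the question.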

Data Frame creating column from existing column only takes into account the first row

I have a dataframe like this
head(test)
sku array
1 AQ665ELABLKLANID-81796 0,0,0,1,1,1,2
2 AQ665ELABLKMANID-81797 2,0,0,0,1,1,0,0,1
3 AQ665ELABLKNANID-81798 0,1,2,1,1,0,4,1
4 AQ665ELABLKOANID-81799 0,1,0,1
5 AQ665ELABLKPANID-81800 1,4,4,2,3,7,2,2
6 AQ665ELABLKRANID-81802 0,1,1,0
And I would like to add a column named first that contains for each row the first element of array:
test$first = strsplit(test$array,",")[[1]][1]
But what I get is the following :
head(test)
sku array first
1 AQ665ELABLKLANID-81796 0,0,0,1,1,1,2 0
2 AQ665ELABLKMANID-81797 2,0,0,0,1,1,0,0,1 0
3 AQ665ELABLKNANID-81798 0,1,2,1,1,0,4,1 0
4 AQ665ELABLKOANID-81799 0,1,0,1 0
5 AQ665ELABLKPANID-81800 1,4,4,2,3,7,2,2 0
6 AQ665ELABLKRANID-81802 0,1,1,0 0
I don't understand why all the rows get the value from the array of the first row only.
I think you actually want:
test$first <- sapply(strsplit(test$array,","),"[",1)
test
# sku array first
#1 AQ665ELABLKLANID-81796 0,0,0,1,1,1,2 0
#2 AQ665ELABLKMANID-81797 2,0,0,0,1,1,0,0,1 2
#3 AQ665ELABLKNANID-81798 0,1,2,1,1,0,4,1 0
#4 AQ665ELABLKOANID-81799 0,1,0,1 0
#5 AQ665ELABLKPANID-81800 1,4,4,2,3,7,2,2 1
#6 AQ665ELABLKRANID-81802 0,1,1,0 0
In your attempt,
strsplit(test$array,",")[[1]]
gives you the split-apart version of test$array[1], from which you then subset the first element, which happens to be 0. That single value is then recycled to every row, so all your values end up being 0.
I suppose some regex could also be of use here. Something along the lines of the following might come in handy:
gsub("(^[0-9]+)(,.*)", "\\1", test$array)
# [1] "0" "2" "0" "0" "1" "0"
gsub("(^.*?),(.*)", "\\1", test$array, perl=TRUE)
# [1] "0" "2" "0" "0" "1" "0"
There are some packages (like "stringi" and "stringr") that make this kind of stuff easier to do.
library(stringi)
stri_extract_first_regex(test$array, pattern="[0-9]+")
# [1] "0" "2" "0" "0" "1" "0"
This also lets you easily extract the last value with:
stri_extract_last_regex(test$array, pattern="[0-9]+")
# [1] "2" "1" "1" "1" "2" "0"
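The strsplit approach can also be written with vapply, which checks that every row yields exactly one character value. The sketch below uses a small made-up data frame in the same shape as the one in the question (the sku values are placeholders), and it assumes the array column is character rather than factor.

```r
# Made-up data in the same shape as the question's data frame.
test <- data.frame(sku = c("sku-1", "sku-2", "sku-3"),
                   array = c("0,0,1", "2,0,0", "1,4"),
                   stringsAsFactors = FALSE)

# One character vector of pieces per row.
parts <- strsplit(test$array, ",", fixed = TRUE)

# First and last piece of each row; vapply enforces one string per row.
test$first <- vapply(parts, `[`, character(1), 1)
test$last  <- vapply(parts, function(p) p[length(p)], character(1))
```

Splitting once and reusing parts avoids repeating the strsplit call for each new column.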
