Remove blanks from strsplit in R

> dc1
V1 V2
1 20140211-0100 |Box
2 20140211-1782 |Office|Ball
3 20140211-1783 |Office
4 20140211-1784 |Office
5 20140221-0756 |Box
6 20140203-0418 |Box
> strsplit(as.character(dc1[,2]),"^\\|")
[[1]]
[1] "" "Box"
[[2]]
[1] "" "Office" "Ball"
[[3]]
[1] "" "Office"
[[4]]
[1] "" "Office"
[[5]]
[1] "" "Box"
[[6]]
[1] "" "Box"
How do I remove the blank ("") from the strsplit results? The result should look like:
[[1]]
[1] "Box"
[[2]]
[1] "Office" "Ball"

You can use lapply on your list to drop the empty strings. I changed the pattern in your strsplit call (removing the ^ anchor) to match your intended output.
dc1 <- read.table(text = 'V1 V2
1 20140211-0100 |Box
2 20140211-1782 |Office|Ball
3 20140211-1783 |Office
4 20140211-1784 |Office
5 20140221-0756 |Box
6 20140203-0418 |Box', header = TRUE)
out <- strsplit(as.character(dc1[,2]),"\\|")
> lapply(out, function(x){x[!x ==""]})
[[1]]
[1] "Box"
[[2]]
[1] "Office" "Ball"
[[3]]
[1] "Office"
[[4]]
[1] "Office"
[[5]]
[1] "Box"
[[6]]
[1] "Box"

You could use:
library(stringr)
str_extract_all(dc1[,2], "[[:alpha:]]+")
[[1]]
[1] "Box"
[[2]]
[1] "Office" "Ball"
[[3]]
[1] "Office"
[[4]]
[1] "Office"
[[5]]
[1] "Box"
[[6]]
[1] "Box"

I do not have a general solution, but for your example you could try:
strsplit(sub("^\\|", "", as.character(dc1[,2])),"\\|")
It removes the leading | (matched by the regex "^\\|"), which is what produces the "", before performing the split.

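In this case, you can just remove the first element of each vector by calling "[" in sapply. Note that sapply simplifies its result when every element has the same length, so use lapply instead if you always need a list back.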
> sapply(strsplit(as.character(dc1[,2]), "\\|"), "[", -1)
# [[1]]
# [1] "Box"
# [[2]]
# [1] "Office" "Ball"
# [[3]]
# [1] "Office"
# [[4]]
# [1] "Office"
# [[5]]
# [1] "Box"
# [[6]]
# [1] "Box"

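Another method uses nzchar() after unlisting the result of strsplit(). Note that unlist() flattens everything into a single character vector, so the per-row grouping is lost:
out <- unlist(strsplit(as.character(dc1[,2]),"\\|"))
out[nzchar(x=out)] # drops the extraneous "" elements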
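If the per-row grouping needs to be preserved, the same nzchar() filter can be applied inside lapply() instead of after unlist(). This is a minimal sketch in base R; the vector v2 is a hypothetical stand-in for as.character(dc1[,2]).

```r
# Hypothetical stand-in for as.character(dc1[, 2])
v2 <- c("|Box", "|Office|Ball", "|Office")

# Split on a literal "|" (fixed = TRUE avoids regex escaping)
parts <- strsplit(v2, "|", fixed = TRUE)

# Keep only the non-empty strings within each list element
cleaned <- lapply(parts, function(x) x[nzchar(x)])
```

Unlike the version with a negative index, this drops empty strings wherever they occur, not only in the first position.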

library("stringr")
lapply(str_split(dc1$V2, "\\|"), function(x) x[-1])
[[1]]
[1] "Box"
[[2]]
[1] "Office" "Ball"
[[3]]
[1] "Office"
[[4]]
[1] "Office"
[[5]]
[1] "Box"
[[6]]
[1] "Box"

This post is old, but in case this helps someone:
library(magrittr) # provides the %>% pipe
strsplit(as.character(dc1[,2]),"^\\|") %>%
  lapply(function(x){paste0(x, collapse="")})

Related

How to split a list into n groups in all possible combinations of group length and elements within group in R?

My goal is to divide a list into n groups in all possible combinations (where the group has a variable length).
I found the same question answered here (Python environment), but I'm unable to replicate it in the R environment.
Could anyone kindly help me? Thanks a lot.
If you want an easy implementation for a similar objective, you can try listParts from the partitions package, e.g.,
> x <- 4
> partitions::listParts(x)
[[1]]
[1] (1,2,3,4)
[[2]]
[1] (1,2,4)(3)
[[3]]
[1] (1,2,3)(4)
[[4]]
[1] (1,3,4)(2)
[[5]]
[1] (2,3,4)(1)
[[6]]
[1] (1,4)(2,3)
[[7]]
[1] (1,2)(3,4)
[[8]]
[1] (1,3)(2,4)
[[9]]
[1] (1,4)(2)(3)
[[10]]
[1] (1,2)(3)(4)
[[11]]
[1] (1,3)(2)(4)
[[12]]
[1] (2,4)(1)(3)
[[13]]
[1] (2,3)(1)(4)
[[14]]
[1] (3,4)(1)(2)
[[15]]
[1] (1)(2)(3)(4)
where x is the number of elements in the set, and each partition is written in terms of the indices of the elements.
If you want to fix the number of groups, the user-defined function below may help
f <- function(x, n) {
  # all set partitions of 1..x
  res <- partitions::listParts(x)
  # keep only the partitions made of exactly n groups
  subset(res, lengths(res) == n)
}
such that
> f(x, 2)
[[1]]
[1] (1,2,4)(3)
[[2]]
[1] (1,2,3)(4)
[[3]]
[1] (1,3,4)(2)
[[4]]
[1] (2,3,4)(1)
[[5]]
[1] (1,4)(2,3)
[[6]]
[1] (1,2)(3,4)
[[7]]
[1] (1,3)(2,4)
> f(x, 3)
[[1]]
[1] (1,4)(2)(3)
[[2]]
[1] (1,2)(3)(4)
[[3]]
[1] (1,3)(2)(4)
[[4]]
[1] (2,4)(1)(3)
[[5]]
[1] (2,3)(1)(4)
[[6]]
[1] (3,4)(1)(2)
Update
Given x <- LETTERS[1:4], we can run
res <- rapply(partitions::listParts(length(x)), function(v) x[v], how = "replace")
such that
> res
[[1]]
[1] (A,B,C,D)
[[2]]
[1] (A,B,D)(C)
[[3]]
[1] (A,B,C)(D)
[[4]]
[1] (A,C,D)(B)
[[5]]
[1] (B,C,D)(A)
[[6]]
[1] (A,D)(B,C)
[[7]]
[1] (A,B)(C,D)
[[8]]
[1] (A,C)(B,D)
[[9]]
[1] (A,D)(B)(C)
[[10]]
[1] (A,B)(C)(D)
[[11]]
[1] (A,C)(B)(D)
[[12]]
[1] (B,D)(A)(C)
[[13]]
[1] (B,C)(A)(D)
[[14]]
[1] (C,D)(A)(B)
[[15]]
[1] (A)(B)(C)(D)

How do I split up this object and insert its elements into the list as individual objects in R?

I have the following list. As you can see, the 5th element contains multiple values. I want to split the 5th element up and insert each value into the overall list as an individual element.
[[1]]
[1] "319"
[[2]]
[1] "321"
[[3]]
[1] "328"
[[4]]
[1] "333"
[[5]]
[1] "344" " 345" " 346"
[[6]]
[1] "353"
I'm coding in RStudio. I want the result to look like this:
[[1]]
[1] "319"
[[2]]
[1] "321"
[[3]]
[1] "328"
[[4]]
[1] "333"
[[5]]
[1] "344"
[[6]]
[1] "345"
[[7]]
[1] "346"
[[8]]
[1] "353"
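One possible approach, sketched here in base R, is to flatten the list with unlist(), strip the stray leading spaces with trimws(), and convert back to a list with as.list(). The name x is a hypothetical stand-in for the list shown above, and this assumes every element is a character vector.

```r
# Hypothetical reconstruction of the list from the question
x <- list("319", "321", "328", "333", c("344", " 345", " 346"), "353")

# Flatten to a character vector, trim whitespace, then make
# each value its own list element again
flat <- as.list(trimws(unlist(x)))

length(flat)
```

Because unlist() coerces everything to a common type, this sketch is only appropriate when all elements share one type, as they do here.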

multicombine in parallel processing in R

I use the following code:
library(foreach)
library(doParallel)
N<-5
cl<-makeCluster(4)
registerDoParallel(cl)
comb <- function(x, ...) {
lapply(seq_along(x),
function(i) c(x[[i]], lapply(list(...), function(y) y[[i]])))
}
oper <- foreach(i=1:10, .combine='comb', .multicombine=TRUE,
.init=list(list(), list(), list())) %dopar% {
list(i+4, i+3, i+2)
}
stopCluster(cl)
Suppose I need to use K different functions. Is there a way to define the .init=list(list(), list(), list()) argument as a function of K (K=3 in this case) instead of adding another ,list() by hand? Also, does each iteration of oper run on a separate core (CPU)?
The output is:
> oper[[1]]
[[1]]
[1] 5
[[2]]
[1] 6
[[3]]
[1] 7
[[4]]
[1] 8
[[5]]
[1] 9
[[6]]
[1] 10
[[7]]
[1] 11
[[8]]
[1] 12
[[9]]
[1] 13
[[10]]
[1] 14
> oper[[2]]
[[1]]
[1] 4
[[2]]
[1] 5
[[3]]
[1] 6
[[4]]
[1] 7
[[5]]
[1] 8
[[6]]
[1] 9
[[7]]
[1] 10
[[8]]
[1] 11
[[9]]
[1] 12
[[10]]
[1] 13
> oper[[3]]
[[1]]
[1] 3
[[2]]
[1] 4
[[3]]
[1] 5
[[4]]
[1] 6
[[5]]
[1] 7
[[6]]
[1] 8
[[7]]
[1] 9
[[8]]
[1] 10
[[9]]
[1] 11
[[10]]
[1] 12
I would like to add more functions (K in total) without having to add another ,list() in the relevant place, so that oper[[K]] gives me the corresponding result.
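One way the initial value could be built from K is with rep(), which repeats an empty list K times. The sketch below is base R only: it reuses the comb function from the question and calls it directly on two made-up iteration results rather than starting a cluster. The names K and init are hypothetical, and passing init as .init to foreach is assumed to behave like the hand-written list.

```r
# Hypothetical K: the number of values each iteration returns
K <- 3

# Same structure as list(list(), list(), list()), but sized by K
init <- rep(list(list()), K)

# Combine function copied from the question
comb <- function(x, ...) {
  lapply(seq_along(x),
         function(i) c(x[[i]], lapply(list(...), function(y) y[[i]])))
}

# Simulate combining the results of two iterations (i = 1 and i = 2)
oper <- comb(init, list(5, 4, 3), list(6, 5, 4))
```

With this construction, changing K is enough to resize the accumulator, provided the loop body returns a list of K values.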

R: relisting a flat list

This question has a nice solution of flattening lists while preserving their data types (which unlist does not):
flatten = function(x, unlist.vectors=FALSE) {
  while(any(vapply(x, is.list, logical(1)))) {
    if (! unlist.vectors)
      x = lapply(x, function(x) if(is.list(x)) x else list(x))
    x = unlist(x, recursive=FALSE)
  }
  x
}
If I give it the following list, it behaves as expected:
> a = list(c(1,2,3), list(52, 561), "a")
> flatten(a)
[[1]]
[1] 1 2 3
[[2]]
[1] 52
[[3]]
[1] 561
[[4]]
[1] "a"
Now I'd like to restructure the flat list like a. relist fails miserably:
> relist(flatten(a), skeleton=a)
[[1]]
[[1]][[1]]
[1] 1 2 3
[[1]][[2]]
[1] 52
[[1]][[3]]
[1] 561
[[2]]
[[2]][[1]]
[[2]][[1]][[1]]
[1] "a"
[[2]][[2]]
[[2]][[2]][[1]]
NULL
[[3]]
[[3]][[1]]
NULL
Now, I could of course do relist(unlist(b), a), where b is the flattened list, but that loses data types again. What is a good way to restructure a flat list?
Bonus points if it handles the analogous attribute to unlist.vectors correctly.
One way to do it is:
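relist2 = function(x, like, relist.vectors=FALSE) {
  if (! relist.vectors)
    # replace every leaf of the skeleton with a single placeholder
    # (uses the like argument, not the global a)
    like = rapply(like, function(f) NA, how='replace')
  lapply(relist(x, skeleton=like), function(e) unlist(e, recursive=FALSE))
}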
This retains the classes and distinguishes between lists and vectors:
> relist2(flatten(a), like=a)
[[1]]
[1] 1 2 3
[[2]]
[[2]][[1]]
[1] 52
[[2]][[2]]
[1] 561
[[3]]
[1] "a"
> relist2(flatten(a, unlist.vectors=T), like=a, relist.vectors=T)
[[1]]
[1] 1 2 3
[[2]]
[[2]][[1]]
[1] 52
[[2]][[2]]
[1] 561
[[3]]
[1] "a"

adding a field to each element of a list

I have a list
> (mylist <- list(list(a=1),list(a=2),list(a=3)))
[[1]]
[[1]]$a
[1] 1
[[2]]
[[2]]$a
[1] 2
[[3]]
[[3]]$a
[1] 3
and I want to add field b to each sublist from 11:13 to get something like
> (mylist <- list(list(a=1,b=11),list(a=2,b=12),list(a=3,b=13)))
[[1]]
[[1]]$a
[1] 1
[[1]]$b
[1] 11
[[2]]
[[2]]$a
[1] 2
[[2]]$b
[1] 12
[[3]]
[[3]]$a
[1] 3
[[3]]$b
[1] 13
How do I do this?
(Note that I have a large number of such relatively small lists, so this will be called in apply and has to be reasonably fast.)
mylist <- list(list(a=1),list(a=2),list(a=3))
b.vals <- 11:13
mylist <- lapply(
  seq_along(mylist),  # safer than 1:length(mylist) for empty lists
  function(x) {
    mylist[[x]]$b <- b.vals[[x]]
    mylist[[x]]
  } )
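An alternative sketch uses Map() to walk over the sublists and the new values in parallel, which avoids indexing by position. The names b_vals and result are hypothetical, and no claim is made here about its speed relative to the lapply version.

```r
mylist <- list(list(a = 1), list(a = 2), list(a = 3))
b_vals <- 11:13  # hypothetical name for the values to attach

# Map() pairs each sublist with the matching value of b_vals
result <- Map(function(el, b) {
  el$b <- b  # add field b to this sublist
  el
}, mylist, b_vals)

str(result)
```

Map() requires both inputs to have compatible lengths, so a mismatch between the list and the vector of new values will produce a recycling result or an error rather than silently misaligned fields.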
