I am working on text mining using the tm package. I have a corpus with 320 documents, and I would like to search its contents for a keyword and get back the numbers of the documents that contain it. So far I have written:
miningCases <- lapply(myCorpusCopy,function(x){grepl(as.character(x), pattern = "\\<mining")})
Here are the first 8 results when I print miningCases:
[[1]]
[1] TRUE
[[2]]
[1] FALSE
[[3]]
[1] FALSE
[[4]]
[1] FALSE
[[5]]
[1] FALSE
[[6]]
[1] FALSE
[[7]]
[1] TRUE
[[8]]
[1] FALSE
I want to get something like 1 7, meaning the pattern was found in the 1st and 7th documents. Is there any way to do this?
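One possible approach, sketched here on a hand-built list that mimics the printed output above (the real miningCases comes from the corpus, which is not reproduced here): flatten the list of single logicals with unlist() and ask which() for the positions that are TRUE.

```r
# Stand-in for the first 8 results shown above (assumed values, copied from the printout)
miningCases <- list(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)

# unlist() turns the list into a logical vector; which() returns the TRUE positions
hit_docs <- which(unlist(miningCases))
print(hit_docs)  # [1] 1 7
```

The positions are document numbers only as long as the list is in the same order as the corpus.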
Related
I have a list like this:
$`1`
[1] TRUE
$`5`
[1] TRUE
$`14`
[1] FALSE
$`17`
[1] TRUE
$`19`
[1] TRUE
$`20`
[1] TRUE
Is there an easy way to count the total number of TRUE values in the list?
I tried trucount <- function(z){sum(z,na.rm = TRUE)}, but it doesn't work.
In the above example, the solution would return 5.
You can use isTRUE():
> ll = list(`1`=TRUE, `5`=TRUE, `14`=TRUE, `17`=TRUE, `19`=TRUE, `20`=TRUE, `21`=FALSE)
> length(which(sapply(ll, isTRUE)))
[1] 6
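As an alternative sketch, using the list from the question rather than the one in the answer above: sum() raises an error when handed a list directly, which is likely why the original attempt failed, but it works once the list is flattened to a logical vector. The name true_count is only illustrative.

```r
# The list from the question: five TRUE values and one FALSE
ll <- list(`1` = TRUE, `5` = TRUE, `14` = FALSE, `17` = TRUE, `19` = TRUE, `20` = TRUE)

# unlist() gives a named logical vector; sum() counts TRUE as 1 and FALSE as 0
true_count <- sum(unlist(ll), na.rm = TRUE)
print(true_count)  # [1] 5
```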
I am practicing basic for loops to compare them against their purrr::map() equivalents. However, I don't understand why a simple print call appears to double the output compared with the equivalent for loop.
#this simple for loop behaves as expected and gives us the numbers 1 through 10.
.x <- 1:10
for (i in .x){
print(i)
}
#result
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
#this doubles the output in an embedded list - I don't understand why
map(.x=.x,~print(.x))
#results below
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 4
[[5]]
[1] 5
[[6]]
[1] 6
[[7]]
[1] 7
[[8]]
[1] 8
[[9]]
[1] 9
[[10]]
[1] 10
I would have thought they would produce the same output (although I know the map results will be in a list unless I specify the output type, e.g. map_chr or map_df).
According to the R documentation, print prints its argument and returns it invisibly (via invisible(x)).
So your map call is essentially doing this:
.x <- 1:10
funcy <- function() {
  out = list()
  for (i in .x){
    out[[i]] = print(i)
  }
  return(out)
}
funcy()
The print function gets called on every iteration, and when the loop ends the function returns the stored values as a list, which the console then prints as well.
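The invisible return value can be checked directly in base R. This is a small sketch (the variable names are illustrative) showing that print() both writes to the console and hands its argument back, so collecting the results of repeated print() calls yields a list of the printed values.

```r
# print() writes "[1] 5" to the console and returns 5 invisibly
x <- print(5)
identical(x, 5)  # TRUE

# lapply() collects those returned values, just as map() does
res <- lapply(1:3, print)
str(res)  # a list holding the integers 1, 2 and 3
```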
The purrr library has a function specifically designed for tasks such as this: walk.
If you don't want to return anything and are only calling functions for the purpose of their downstream effects (print or write_csv), you can use walk instead of map.
walk(1:10, print)
# [1] 1
# [1] 2
# [1] 3
# [1] 4
# [1] 5
# [1] 6
# [1] 7
# [1] 8
# [1] 9
# [1] 10
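If purrr is not available, a rough base R analogue (a sketch, not a full equivalent of walk) is to wrap lapply() in invisible(), so the side effects still happen but the returned list is not auto-printed at the console.

```r
values <- 1:3

# print() runs once per element; invisible() suppresses printing of the result list
invisible(lapply(values, print))
```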
I need to check whether an object is a list and, if it is, skip to the next iteration of the loop.
if(is.list(x)) next
The above code is throwing the error below
Error: no loop for break/next, jumping to top level
Using this dummy data
my_list <- list(1,2,3 , list(12) , 5 , list(23))
You can use this for loop
for(i in seq_along(my_list)){
  if(is.list(my_list[[i]])) next
  else my_list[[i]] <- my_list[[i]] + 100
}
Output
[[1]]
[1] 101
[[2]]
[1] 102
[[3]]
[1] 103
[[4]]
[[4]][[1]]
[1] 12
[[5]]
[1] 105
[[6]]
[[6]][[1]]
[1] 23
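The error in the question typically appears when next is evaluated outside a for, while or repeat loop, for example inside a function passed to lapply(). In that situation an early return() plays the same role. A minimal sketch on the same dummy data (the names result and el are illustrative):

```r
my_list <- list(1, 2, 3, list(12), 5, list(23))

result <- lapply(my_list, function(el) {
  # Inside a function there is no loop to skip, so return the element unchanged
  if (is.list(el)) return(el)
  el + 100
})
str(result)
```

This builds a new list rather than modifying my_list in place.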
> dc1
V1 V2
1 20140211-0100 |Box
2 20140211-1782 |Office|Ball
3 20140211-1783 |Office
4 20140211-1784 |Office
5 20140221-0756 |Box
6 20140203-0418 |Box
> strsplit(as.character(dc1[,2]),"^\\|")
[[1]]
[1] "" "Box"
[[2]]
[1] "" "Office" "Ball"
[[3]]
[1] "" "Office"
[[4]]
[1] "" "Office"
[[5]]
[1] "" "Box"
[[6]]
[1] "" "Box"
How do I remove the blank ("") from the strsplit results? The result should look like:
[[1]]
[1] "Box"
[[2]]
[1] "Office" "Ball"
You can use lapply on your list to drop the empty strings. I changed the pattern in your strsplit call to match your intended output.
dc1 <- read.table(text = 'V1 V2
1 20140211-0100 |Box
2 20140211-1782 |Office|Ball
3 20140211-1783 |Office
4 20140211-1784 |Office
5 20140221-0756 |Box
6 20140203-0418 |Box', header = TRUE)
out <- strsplit(as.character(dc1[,2]),"\\|")
> lapply(out, function(x){x[!x ==""]})
[[1]]
[1] "Box"
[[2]]
[1] "Office" "Ball"
[[3]]
[1] "Office"
[[4]]
[1] "Office"
[[5]]
[1] "Box"
[[6]]
[1] "Box"
You could use:
library(stringr)
str_extract_all(dc1[,2], "[[:alpha:]]+")
[[1]]
[1] "Box"
[[2]]
[1] "Office" "Ball"
[[3]]
[1] "Office"
[[4]]
[1] "Office"
[[5]]
[1] "Box"
[[6]]
[1] "Box"
I do not have a general solution, but for your example you could try:
strsplit(sub("^\\|", "", as.character(dc1[,2])),"\\|")
It removes the first | (this is what the regex "^\\|" says), which is the reason for the "", before performing the split.
In this case, you can just remove the first element of each vector by calling "[" in sapply:
> sapply(strsplit(as.character(dc1[,2]), "\\|"), "[", -1)
# [[1]]
# [1] "Box"
# [[2]]
# [1] "Office" "Ball"
# [[3]]
# [1] "Office"
# [[4]]
# [1] "Office"
# [[5]]
# [1] "Box"
# [[6]]
# [1] "Box"
Another method uses nzchar() after unlisting the result of strsplit():
out <- unlist(strsplit(as.character(dc1[,2]),"\\|"))
out[nzchar(x=out)] # removes the extraneous "" marks
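Note that unlist() flattens everything into a single vector, so the per-row grouping is lost. A sketch that keeps one vector per row applies nzchar() inside lapply() instead. The vector v2 below is a shortened, hypothetical stand-in for dc1[,2].

```r
# Hypothetical stand-in for the first three values of dc1[,2]
v2 <- c("|Box", "|Office|Ball", "|Office")

# fixed = TRUE treats "|" as a literal character, so no escaping is needed
parts <- strsplit(v2, "|", fixed = TRUE)

# Keep only the non-empty strings within each row
cleaned <- lapply(parts, function(x) x[nzchar(x)])
str(cleaned)
```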
library("stringr")
lapply(str_split(dc1$V2, "\\|"), function(x) x[-1])
[[1]]
[1] "Box"
[[2]]
[1] "Office" "Ball"
[[3]]
[1] "Office"
[[4]]
[1] "Office"
[[5]]
[1] "Box"
[[6]]
[1] "Box"
This post is old, but in case it helps someone:
library(magrittr)  # provides %>%
strsplit(as.character(dc1[,2]),"^\\|") %>%
lapply(function(x){paste0(x, collapse="")})
I have the following two lists:
First list:
[[1]]
[1] "ab" "iew" "rer" "fdd"
[[2]]
[1] "ff" "de"
[[3]]
[1] "cc"
Second list:
[[1]]
[1] "iew" "vfr"
[[2]]
[1] "ff" "cdc"
[[3]]
[1] "vf" "cde"
My goal is to compare these two lists element by element, so that the result would be:
[[1]]
[1] FALSE TRUE FALSE FALSE
[[2]]
[1] TRUE FALSE
[[3]]
[1] FALSE
What is the best vectorized way to perform this intersect()-style comparison?
Here's an alternative using mapply:
> mapply("%in%", First.list, Second.list)
[[1]]
[1] FALSE TRUE FALSE FALSE
[[2]]
[1] TRUE FALSE
[[3]]
[1] FALSE
Where First.list and Second.list are:
First.list <- list(c("ab", "iew", "rer", "fdd" ), c("ff", "de"), c("cc"))
Second.list <- list(c("iew", "vfr"), c("ff", "cdc"), c("vf", "cde"))
If you want to know which values the lists have in common, then try this:
> mapply("intersect", First.list, Second.list)
[[1]]
[1] "iew"
[[2]]
[1] "ff"
[[3]]
character(0)
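One caveat worth sketching: mapply() simplifies its result to a vector or matrix whenever every pairwise result has the same length, so the list shape above depends on the data. Map() (which is mapply() with SIMPLIFY = FALSE) always returns a list. The names first_list, second_list, matches and common are illustrative.

```r
first_list  <- list(c("ab", "iew", "rer", "fdd"), c("ff", "de"), c("cc"))
second_list <- list(c("iew", "vfr"), c("ff", "cdc"), c("vf", "cde"))

# One logical vector per pair: is each element of the first found in the second?
matches <- Map(`%in%`, first_list, second_list)

# The shared values themselves, one character vector per pair
common <- Map(intersect, first_list, second_list)

str(matches)
str(common)
```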