Extract words from a string and make a list in R
str <- "qwerty keyboard"
result <- strsplit(str,"[[:space:]]")
What I get is (below):
result
[[1]]
[1] "qwerty" "keyboard"
What I need is (below):
result
[[1]]
[1] "qwerty"
[[2]]
[1] "keyboard"
[OR]
result
[[1]]
[1] "qwerty"
[2] "keyboard"
I am looking for a solution; if someone knows one, please post it here.
Thanks in advance.
Try:
str <- "qwerty keyboard"
result_1 <- strsplit(str,"[[:space:]]")[[1]][1]
result_2 <- strsplit(str,"[[:space:]]")[[1]][2]
result <- list(result_1,result_2)
Or
as.list(strsplit(str, '\\s+')[[1]])
as.list(unlist(strsplit(str, '[[:space:]]')))
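The two patterns above are not quite interchangeable when the words are separated by more than one whitespace character. A small sketch of the difference (the input string `x` is made up for illustration and is not from the question):

```r
# Hypothetical input with two spaces between the words
x <- "qwerty  keyboard"

# A single-character class splits at every space, leaving an empty string
single <- as.list(strsplit(x, "[[:space:]]")[[1]])

# '\\s+' treats a whole run of whitespace as one separator
run <- as.list(strsplit(x, "\\s+")[[1]])

length(single)  # 3
length(run)     # 2
```

So if the input may contain repeated spaces or tabs, the '\\s+' form is probably the safer choice.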
As an alternative to strsplit(), you can make a list out of the result from scan().
as.list(scan(text=str, what=""))
# Read 2 items
# [[1]]
# [1] "qwerty"
#
# [[2]]
# [1] "keyboard"
I have a list like this:
map_tmp <- list("ABC",
c("EGF", "HIJ"),
c("KML", "ABC-IOP"),
"SIN",
"KMLLL")
> grep("ABC", map_tmp)
[1] 1 3
> grep("^ABC$", map_tmp)
[1] 1 # by using regex, I get the index of "ABC" in the list
> grep("^KML$", map_tmp)
[1] 5 # I wanted 3, but I got 5. Anchoring the end of the string with "$" didn't help in this case.
> grep("^HIJ$", map_tmp)
integer(0) # the regex does not return the index of a string inside a vector
How can I get the index of a string (exact match) in the list?
I'm ok not to use grep. Is there any way to get the index of a certain string (exact match) in the list? Thanks!
Using lapply:
which(lapply(map_tmp, function(x) grep("^HIJ$", x))!=0)
lapply() runs grep() on each element of the list and returns, for each element, the positions inside it that match (an empty result if there is no match). Wrapping the != 0 comparison in which() then gives the indices of the list elements where your string occurs.
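As far as I can tell, comparing the list with != 0 only works while each element holds at most one match. A slightly more defensive sketch of the same lapply/grep idea counts the matches per element instead (the helper name `find_in_list` is hypothetical, not part of the answer above):

```r
map_tmp <- list("ABC", c("EGF", "HIJ"), c("KML", "ABC-IOP"), "SIN", "KMLLL")

# Hypothetical helper: indices of the list elements that contain a match
find_in_list <- function(lst, pattern) {
  # grep() is run per element, so anchors apply to each string separately
  hits <- lapply(lst, function(x) grep(pattern, x))
  # keep the elements with at least one match
  which(lengths(hits) > 0)
}

find_in_list(map_tmp, "^HIJ$")  # 2
find_in_list(map_tmp, "^KML$")  # 3
find_in_list(map_tmp, "ABC")    # 1 3
```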
Use either mapply or Map with str_detect to find the position. I have run it for only one string, "KML", but you can run it for any of the others. I hope this is helpful.
First of all, we pad the list elements to the same length so that they can be processed easily:
library(stringr)
map_tmp_1 <- lapply(map_tmp, `length<-`, max(lengths(map_tmp)))
### Making the list even
val <- t(mapply(str_detect,map_tmp_1,"^KML$"))
> which(val[,1] == T)
[1] 3
> which(val[,2] == T)
integer(0)
In case of "ABC" string:
val <- t(mapply(str_detect,map_tmp_1,"ABC"))
> which(val[,1] == T)
[1] 1
> which(val[,2] == T)
[1] 3
I had the same question. I cannot explain why grep works well on a list with a plain string but not with an anchored regex. Anyway, the best way I found to match a character string using base R is:
map_tmp <- list("ABC",
c("EGF", "HIJ"),
c("KML", "ABC-IOP"),
"SIN",
"KMLLL")
sapply( map_tmp , match , 'ABC' )
It returns a list with the same structure as the input, holding 1 where an element matches and NA where it does not:
[[1]]
[1] 1
[[2]]
[1] NA NA
[[3]]
[1] NA NA
[[4]]
[1] NA
[[5]]
[1] NA
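One way to see what grep is doing with the list: it coerces its input with as.character(), so a vector inside the list is turned into a single deparsed string before the pattern is applied. A short sketch of that, followed by a variant of the match idea that returns the index directly (using %in% and vapply here is my own choice, not part of the answer above):

```r
map_tmp <- list("ABC", c("EGF", "HIJ"), c("KML", "ABC-IOP"), "SIN", "KMLLL")

# What grep() actually searches when it is handed the list
flat <- as.character(map_tmp)
flat[3]  # the whole vector as one string: c("KML", "ABC-IOP")

# Exact match per list element, reduced to one TRUE/FALSE each
has_kml <- vapply(map_tmp, function(x) "KML" %in% x, logical(1))
which(has_kml)  # 3
```

Because the third element starts with the text c(" after coercion, an anchored pattern such as "^KML$" cannot match it.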
This is probably a very simple question, however I have googled for hours without a satisfying answer. Let's suppose I have a list like the following one:
theList <- list(c("de", "labore", "solis"), c("sapiento", "post", "eventum"), c("sursum", "corda"))
> theList
[[1]]
[1] "de" "labore" "solis"
[[2]]
[1] "sapiento" "post" "eventum"
[[3]]
[1] "sursum" "corda"
If I want to print all the vectors that compose the list I would think of something like
for(i in 1:length(theList)) {
print(theList[[i]])
}
[1] "de" "labore" "solis"
[1] "sapiento" "post" "eventum"
[1] "sursum" "corda"
However, there must be a more elegant solution, probably using some member of the apply family...
I think I got an answer after doing some more googling and experimenting...
extract <- function(m) {
sapply(seq_along(m), function(x) m[[x]])
}
> extract(theList)
[[1]]
[1] "de" "labore" "solis"
[[2]]
[1] "sapiento" "post" "eventum"
[[3]]
[1] "sursum" "corda"
So it would be a matter of performing the desired operation (e.g. printing) on the result from the sapply
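For printing alone, a shorter sketch is to loop over the elements directly, or to call print() through lapply(). invisible() is used here only to hide the list that lapply() itself returns:

```r
theList <- list(c("de", "labore", "solis"),
                c("sapiento", "post", "eventum"),
                c("sursum", "corda"))

# Iterate over the elements themselves rather than over indices
for (v in theList) print(v)

# Same output via the apply family; invisible() suppresses lapply's return value
invisible(lapply(theList, print))
```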
I am having trouble figuring out how to use do.call to call and run a list of functions.
For example:
make.draw = function(i){i;function()runif(i)}
function.list = list()
for (i in 1:3) function.list[[i]] = make.draw(i)
This results in:
> function.list[[1]]()
[1] 0.2996515
> function.list[[2]]()
[1] 0.7276203 0.4704813
> function.list[[3]]()
[1] 0.9092999 0.7307774 0.4647443
What I want to do is create a function that calls all three functions in the list in one go. From what I understand, as.call() can be used to do this, but I am having trouble connecting the dots and getting 6 uniform random draws from function.list.
Did you want something like this?
lapply(function.list, do.call, list())
# [[1]]
# [1] 0.5777857
# [[2]]
# [1] 0.8970102 0.5892031
# [[3]]
# [1] 0.4712016 0.2624851 0.2353192
make.draw = function(i){runif(i)}
Map(make.draw, 1:3)
#[[1]]
#[1] 0.03442084
#[[2]]
#[1] 0.6899443 0.8896434
#[[3]]
#[1] 0.3899678 0.2845898 0.4920698
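If the goal is to get the six draws back as a single vector, a minimal sketch is to call each function in the list and flatten the result with unlist(). The name `draws` is mine, and force(i) is just a more explicit way of writing the bare `i;` from the question:

```r
# Closure factory as in the question; force(i) fixes i at creation time
make.draw <- function(i) { force(i); function() runif(i) }
function.list <- lapply(1:3, make.draw)

# Call every function in the list, then flatten the three results
draws <- unlist(lapply(function.list, function(f) f()))
length(draws)  # 6
```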
Is there a function in R that matches regexp and returns only the matched parts?
Something like grep -o, so:
> ogrep('.b.',c('abc','1b2b3b4'))
[[1]]
[1] abc
[[2]]
[1] 1b2 3b4
Try stringr:
library(stringr)
str_extract_all(c('abc','1b2b3b4'), '.b.')
# [[1]]
# [1] "abc"
#
# [[2]]
# [1] "1b2" "3b4"
I can't believe nobody ever mentioned regmatches!
x <- c('abc','1b2b3b4')
regmatches(x, gregexpr('.b.', x))
# [[1]]
# [1] "abc"
# [[2]]
# [1] "1b2" "3b4"
It makes me wonder, didn't regmatches exist two and a half years ago?
You should probably give Gabor Grothendieck the check for writing the gsubfn package:
require(gsubfn)
#Loading required package: gsubfn
strapply(c('abc','1b2b3b4'), ".b.", I)
#Loading required package: tcltk
#Loading Tcl/Tk interface ... done
[[1]]
[1] "abc"
[[2]]
[1] "1b2" "3b4"
This just applies the identity function, I, to the matches of the pattern.
You need to combine gregexpr with substring, I reckon:
> s = c('abc','1b2b3b4')
> m = gregexpr('.b.',s)
> substring(s[1],m[[1]],m[[1]]+attr(m[[1]],'match.length')-1)
[1] "abc"
> substring(s[2],m[[2]],m[[2]]+attr(m[[2]],'match.length')-1)
[1] "1b2" "3b4"
The returned list 'm' has the start and lengths of matches. Loop over s to get all the substrings.
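The loop over s can be sketched with Map(), which pairs each string with its own match data. The helper name `extract_matches` is hypothetical, and the check for -1 is there because gregexpr reports -1 when a string has no match:

```r
s <- c('abc', '1b2b3b4')
m <- gregexpr('.b.', s)

# For one string and its match positions, cut out the matched substrings
extract_matches <- function(string, pos) {
  if (pos[1] == -1) return(character(0))  # no match in this string
  substring(string, pos, pos + attr(pos, 'match.length') - 1)
}

# Map() walks over s and m in parallel; unname() drops the names taken from s
result <- unname(Map(extract_matches, s, m))
result
```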