I have a vector of character strings (v1) like so:
> head(v1)
[1] "do_i_need_to_even_say_it_do_i_well_here_i_go_anyways_chris_cornell_in_chicago_tonight"
[2] "going_to_see_harry_sunday_happiness"
[3] "this_motha_fucka_stay_solid_foh_with_your_naieve_ass_mentality_your_synapsis_are_lacking_read_a_fucking_book_for_christ_sake"
[4] "why_twitter_will_soon_become_obsolete_http_www.imediaconnection.com_content_23465_asp"
[5] "like_i_said_my_back_still_fucking_hurts_and_im_going_to_complain_about_it_like_no_ones_business_http_tumblr.com_x6n25amd5"
[6] "my_picture_with_kris_karmada_is_gone_forever_its_not_in_my_comments_on_my_mysapce_or_on_my_http_tumblr.com_xzg1wy4jj"
And another vector of character strings (v2) like so:
> head(v2)
[1] "here_i_go" "going" "naieve_ass" "your_synapsis" "my_picture_with" "roll"
What is the quickest way to return a list with one element per item of v1, where each element is the vector of v2 items that match (as regular expressions) within that v1 item, like so:
[[1]]
[1] "here_i_go"
[[2]]
[1] "going"
[[3]]
[1] "naieve_ass" "your_synapsis"
[[4]]
[[5]]
[1] "going"
[[6]]
[1] "my_picture_with"
I'd like to leave another option with stri_extract_all_regex() in the stringi package. You can create your regular expression directly from v2 and use it in pattern.
library(stringi)
stri_extract_all_regex(str = v1, pattern = paste(v2, collapse = "|"))
[[1]]
[1] "here_i_go"
[[2]]
[1] "going"
[[3]]
[1] "naieve_ass" "your_synapsis"
[[4]]
[1] NA
[[5]]
[1] "going"
[[6]]
[1] "my_picture_with"
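For comparison, here is a minimal base R sketch of the same idea. It assumes the patterns contain no regex metacharacters that should be taken literally, and x and pats are shortened, illustrative stand-ins for v1 and v2. Unlike the stringi call above, regmatches returns character(0) rather than NA when nothing matches.

```r
# Illustrative inputs (shortened stand-ins for v1 and v2)
x <- c("well_here_i_go_anyways",
       "your_naieve_ass_mentality_your_synapsis",
       "why_twitter_will_soon")
pats <- c("here_i_go", "naieve_ass", "your_synapsis", "roll")

# One alternation pattern; gregexpr finds every match in each string,
# regmatches extracts the matched substrings
pattern <- paste(pats, collapse = "|")
res <- regmatches(x, gregexpr(pattern, x))
res
```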
If you want speed, I'd use stringi. You don't seem to have any regex, just fixed patterns, so we can use a fixed stri_extract, and (since you don't mention what to do with multiple matches) I'll assume only extracting the first match is fine, giving us a little more speed with stri_extract_first_fixed.
It's probably not worth benchmarking on such a small example, but this should be quite fast.
library(stringi)
matches = lapply(v1, stri_extract_first_fixed, v2)
lapply(matches, function(x) x[!is.na(x)])
# [[1]]
# [1] "here_i_go"
#
# [[2]]
# [1] "going"
#
# [[3]]
# [1] "naieve_ass" "your_synapsis"
#
# [[4]]
# character(0)
#
# [[5]]
# [1] "going"
#
# [[6]]
# [1] "my_picture_with"
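If stringi is not available, the same fixed-string idea can be sketched in base R with grepl(fixed = TRUE). This is only an illustration, not a benchmarked replacement; x and pats are shortened stand-ins for v1 and v2.

```r
# Illustrative inputs (shortened stand-ins for v1 and v2)
x <- c("going_to_see_harry",
       "your_naieve_ass_mentality_your_synapsis",
       "why_twitter")
pats <- c("going", "naieve_ass", "your_synapsis", "roll")

# For each string, keep the patterns that occur in it as literal substrings
found <- lapply(x, function(s) {
  hit <- vapply(pats, function(p) grepl(p, s, fixed = TRUE), logical(1))
  pats[hit]
})
found
```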
Thanks for sharing data, but next time please share it copy/pasteably. dput is nice for that. Here's a copy/pasteable input:
v1 = c(
"do_i_need_to_even_say_it_do_i_well_here_i_go_anyways_chris_cornell_in_chicago_tonight" ,
"going_to_see_harry_sunday_happiness" ,
"this_motha_fucka_stay_solid_foh_with_your_naieve_ass_mentality_your_synapsis_are_lacking_read_a_fucking_book_for_christ_sake",
"why_twitter_will_soon_become_obsolete_http_www.imediaconnection.com_content_23465_asp" ,
"like_i_said_my_back_still_fucking_hurts_and_im_going_to_complain_about_it_like_no_ones_business_http_tumblr.com_x6n25amd5" ,
"my_picture_with_kris_karmada_is_gone_forever_its_not_in_my_comments_on_my_mysapce_or_on_my_http_tumblr.com_xzg1wy4jj")
v2 = c("here_i_go", "going", "naieve_ass", "your_synapsis", "my_picture_with", "roll" )
Related
In R, when I run two functions inside lapply, it runs the first function on the entire list, then runs the second function on the list. Is it possible to force it to run both functions on the first element of the list before moving on to the second element?
I am using print and nchar for illustration purposes; I wrote more complex functions that generate data.frames.
lapply(c("a","bb","cdd"), function(x) {
print(x)
nchar(x)
})
the output would be
[1] "a"
[1] "bb"
[1] "cdd"
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
I would like to have something like this:
[[1]]
[1] "a"
[1] 1
[[2]]
[1] "bb"
[1] 2
[[3]]
[1] "cdd"
[1] 3
is this possible?
Juan Antonio Roladan Diaz and cash2 both suggested using list, which kind of works:
lapply(c("a","bb","cdd"), function(x) {
list(x, nchar(x))
})
[[1]]
[[1]][[1]]
[1] "a"
[[1]][[2]]
[1] 1
[[2]]
[[2]][[1]]
[1] "bb"
[[2]][[2]]
[1] 2
[[3]]
[[3]][[1]]
[1] "cdd"
[[3]][[2]]
[1] 3
But it is a bit too messy.
Using print gives a better result:
lapply(c("a","bb","cdd"), function(x) {
print(x)
print(nchar(x))
})
[1] "a"
[1] 1
[1] "bb"
[1] 2
[1] "cdd"
[1] 3
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
But is there a way to suppress the nchar values from being printed again?
invisible(lapply(c("a","bb","cdd"), function(x) { print(x); print(nchar(x)) }))
This happens because the function prints x, then returns nchar(x); the returned elements are put into a list by lapply and returned, and printed out on the REPL.
Replace nchar(x) with print(nchar(x)). Or, if you want the list returned, just return list(x, nchar(x)) from the inner function.
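A small sketch of the behaviour described above: the returned list is only auto-printed when the lapply call is itself the top-level expression, so assigning the result (or wrapping it in invisible) leaves just the explicit print output. The name res is illustrative.

```r
# Assigning the result suppresses auto-printing of the returned list;
# only the explicit print() calls inside the function produce output.
res <- lapply(c("a", "bb", "cdd"), function(x) {
  print(x)
  print(nchar(x))  # print() returns its argument invisibly
})

# The values are still available afterwards
str(res)
```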
letters[2] is equivalent to '['(letters, i = 2); the second argument is i.
What is the name of the first argument, so that the following two expressions would be equivalent?
lapply(1:3,function(x){letters[x]})
lapply(1:3,`[`,param1 = letters) # param1 to be replaced with solution
To define a function similar to the one above, you have to pass two arguments to it. The function [ takes several inputs. We can use Map instead of lapply to give it both the data to extract from and the indices of the parts to extract:
Map("[",list(letters),1:3)
[[1]]
[1] "a"
[[2]]
[1] "b"
[[3]]
[1] "c"
This is similar to what you have above. Hope this helps
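If the goal is only to fix the vector and vary the index, one hedged alternative is a small wrapper with its own named parameters, which avoids relying on how the primitive [ handles argument names. The helper name pick is made up for this sketch.

```r
# Hypothetical helper: named parameters make the intent explicit
pick <- function(i, x) x[i]

# lapply supplies each index as i; x is fixed by name
res_pick <- lapply(1:3, pick, x = letters)
res_pick
```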
You have to be more specific than "[", for instance:
lapply(1:3, `[.numeric_version`, x = letters)
# [[1]]
# [1] "a"
#
# [[2]]
# [1] "b"
#
# [[3]]
# [1] "c"
(Not sure [.numeric_version is the most appropriate, though... I'm digging a bit more)
rlang::as_closure and purrr::as_mapper, both based on rlang::as_function (see doc), will both convert [ to a function with named parameters:
lapply(1:3, purrr::as_mapper(`[`), .x = letters)
lapply(1:3, rlang::as_closure(`[`), .x = letters)
# [[1]]
# [1] "a"
#
# [[2]]
# [1] "b"
#
# [[3]]
# [1] "c"
I have 2 lists. I want to check whether each element of the first list is in the second list and, if it is, paste the letters "a", "b", ... onto that element of the first list.
list1 <- list("Year","Age","Enrollment","SES","BOE")
list2 <- list("Year","Enrollment","SES")
I tried to use lapply:
text <- letters[1:length(list2)]
listText<- lapply(list1,function(i) ifelse(i %in% list2,paste(i,text[i],sep="^"),i))
but I got the wrong output:
> listText
[[1]]
[1] "Year^NA"
[[2]]
[1] "Age"
[[3]]
[1] "Enrollment^NA"
[[4]]
[1] "SES^NA"
[[5]]
[1] "BOE"
This is the output I want
[[1]]
[1] "Year^a"
[[2]]
[1] "Age"
[[3]]
[1] "Enrollment^b"
[[4]]
[1] "SES^c"
[[5]]
[1] "BOE"
We can use match to find the index and then use it to subset the first list and paste the letters
i1 <- match(unlist(list2), unlist(list1))
list1[i1] <- paste(list1[i1], letters[seq(length(i1))], sep="^")
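A self-contained version of the match approach, for reference. It is a sketch that assumes every element of list2 occurs in list1 (otherwise match returns NA and the assignment fails); seq_along is used in place of seq(length(...)).

```r
list1 <- list("Year", "Age", "Enrollment", "SES", "BOE")
list2 <- list("Year", "Enrollment", "SES")

# Positions in list1 of the elements of list2, in list2's order
i1 <- match(unlist(list2), unlist(list1))

# Append a, b, c, ... to the matched elements only
list1[i1] <- paste(unlist(list1[i1]), letters[seq_along(i1)], sep = "^")
str(list1)
```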
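You just need to give text names, so that text[i] can look each letter up by name (in your code i is a string such as "Year", and indexing an unnamed vector by a string returns NA):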
text <- as.character(letters[1:length(list2)])
names(text) <- unlist(list2)
The result is :
> listText
[[1]]
[1] "Year^a"
[[2]]
[1] "Age"
[[3]]
[1] "Enrollment^b"
[[4]]
[1] "SES^c"
[[5]]
[1] "BOE"
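Put together, the named-lookup fix might look like the sketch below. Using if/else instead of ifelse and text[[i]] instead of text[i] are choices made for this illustration, not part of the original answer.

```r
list1 <- list("Year", "Age", "Enrollment", "SES", "BOE")
list2 <- list("Year", "Enrollment", "SES")

# Named lookup: names are the list2 items, values are the letters
text <- setNames(letters[seq_along(list2)], unlist(list2))

# Each i is a single string, so a plain if/else is enough;
# text[[i]] looks the letter up by name
listText <- lapply(list1, function(i) {
  if (i %in% names(text)) paste(i, text[[i]], sep = "^") else i
})
listText
```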
I have a string in R that looks like this:
"{[PP]}{[BGH]}{[AC]}{[ETL]}....{[D]}"
I want to convert it into a list so that:
List[[1]] = {[PP]}
List[[2]] = {[BGH]}
....
List[[N]] = {[D]}
If there were commas you could use strsplit, but I want to keep the brackets rather than get rid of them. I'm not sure how to do this in R.
Without lookarounds, by splitting on { and pasting it back:
s <- "{[PP]}{[BGH]}{[AC]}{[ETL]}{[D]}"
as.list(paste("{", strsplit(s, "\\{")[[1]][-1], sep = ""))
[[1]]
[1] "{[PP]}"
[[2]]
[1] "{[BGH]}"
[[3]]
[1] "{[AC]}"
[[4]]
[1] "{[ETL]}"
[[5]]
[1] "{[D]}"
strsplit still works if you pass the regular expression (?<=})(?={), which constrains the split to the position between a } and a {:
strsplit(s, "(?<=})(?={)", perl = TRUE)
# [[1]]
# [1] "{[PP]}" "{[BGH]}" "{[AC]}" "{[ETL]}" "{[D]}"
Or as @thelatemail suggested:
strsplit(s, "(?<=})", perl = TRUE)
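Instead of splitting between the groups, one can also extract the groups themselves. A base R sketch, assuming the {...} groups are never nested; the name parts is illustrative.

```r
s <- "{[PP]}{[BGH]}{[AC]}{[ETL]}{[D]}"

# Match each group: an opening brace, any characters other than "}",
# then a closing brace
parts <- regmatches(s, gregexpr("\\{[^}]*\\}", s, perl = TRUE))[[1]]

# One list element per group, brackets kept
as.list(parts)
```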
obligatory stringi answer:
library(stringi)
dat <- "{[PP]}{[BGH]}{[AC]}{[ETL]}{[more]{[D]}"
as.list(stri_match_all_regex(dat, "(\\{\\[[[:alpha:]]+\\]\\})")[[1]][,2])
## [[1]]
## [1] "{[PP]}"
##
## [[2]]
## [1] "{[BGH]}"
##
## [[3]]
## [1] "{[AC]}"
##
## [[4]]
## [1] "{[ETL]}"
##
## [[5]]
## [1] "{[D]}"
There is a convenient function in qdap for this, namely bracketXtract:
library(qdap)
setNames(as.list(bracketXtract(s, "curly", TRUE)), NULL)
#[[1]]
#[1] "{[PP]}"
#[[2]]
#[1] "{[BGH]}"
#[[3]]
#[1] "{[AC]}"
#[[4]]
#[1] "{[ETL]}"
#[[5]]
#[1] "{[D]}"
By default, with = FALSE, so unless you pass with = TRUE the brackets are removed.
data
s <- "{[PP]}{[BGH]}{[AC]}{[ETL]}{[D]}"
After several operations on an igraph object (g), I have ended up with the "id" attribute becoming full of nested lists.
It looks like this:
head(V(g)$id)
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
[1] "http://www.parliament.uk/"
[[2]]
[[2]][[1]]
[[2]][[1]][[1]]
[1] "http://www.businesslink.gov.uk/"
[[3]]
[[3]][[1]]
[[3]][[1]][[1]]
[1] "http://www.number10.gov.uk/"
... and so forth.
I need to 'unnest' this list so it becomes:
head(V(g)$id)
[1] "http://www.parliament.uk/" "http://www.businesslink.gov.uk/"
[3] "http://www.number10.gov.uk/" "http://www.ombudsman.org.uk/"
[5] "http://www.hm-treasury.gov.uk/" "http://data.gov.uk/"
The nested list is causing problems when igraph exports the object to a graphml file. It results in the "id" being assigned default labels (e.g. n0, n1, n2...).
I have tried the answers to several other questions, particularly this one. However, I cannot get them to work. It is really frustrating!
Are you just looking for unlist, perhaps?
L <- list(list(list("A")), list(list("B")))
L
# [[1]]
# [[1]][[1]]
# [[1]][[1]][[1]]
# [1] "A"
#
#
#
# [[2]]
# [[2]][[1]]
# [[2]][[1]][[1]]
# [1] "B"
#
#
#
unlist(L)
# [1] "A" "B"
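Applied to data shaped like the question's, a hedged sketch could look like this. The list ids is an illustrative stand-in for V(g)$id, and the write-back line is an assumption about how the result would be stored on the graph, so it is left as a comment.

```r
# Illustrative stand-in for V(g)$id: each URL wrapped in two extra list levels
ids <- list(
  list(list("http://www.parliament.uk/")),
  list(list("http://www.number10.gov.uk/"))
)

flat <- unlist(ids, use.names = FALSE)

# unlist() silently drops NULLs, so check that no vertex lost its id
stopifnot(is.character(flat), length(flat) == length(ids))

# Assumption: with igraph loaded, the result would be written back with
# V(g)$id <- flat
```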