I have a list of character vectors. Its type and a sample of its contents are as follows:
typeof(data)
[1] "list"
print(data[1:3])
$...
[1] ",75=20140102,268=18,"
[2] "0,83=337407,"
[3] "0,83=337408,"
[4] "0,83=3374779,"
$...
[1] ",75=20140122,268=336,"
[2] "3,273=143000000,1020=50,"
[3] "1,270=422,271=1,273=143000000,"
[4] "0,83=337427,107=ZCH4,"
[5] "1020=58,"
$...
[1] ",52=20140102143000085,75=20140102,268=17,"
[2] "0,83=33744562,107=ZCH4,"
For each element of the list, I want to combine data[[i]][1] with each of its remaining elements. I am doing it with a loop now; it works, but it is very slow.
My current code is:
for (j in 1:length(data)){
for (k in 2:length(data[[j]])){
table[j+k,1]<- paste0(data[[j]][1], data[[j]][k]) #record every combination
} }
Since the data is pretty large, the loop runs very slowly.
Desired results:
[1] ",75=20140102,268=18, 0,83=337407,"
[2] ",75=20140102,268=18, 0,83=337408,"
[3] ",75=20140102,268=18, 0,83=3374779,"
[4] ",75=20140122,268=336, 3,273=143000000,1020=50,"
[5] ",75=20140122,268=336, 1,270=422,271=1,273=143000000,"
[6] ",75=20140122,268=336, 0,83=337427,107=ZCH4,"
[7] ",75=20140122,268=336, 1020=58,"
[8] ",52=20140102143000085,75=20140102,268=17, 0,83=33744562,107=ZCH4,"
Thanks in advance to anyone who can help speed this up.
lapply(data, function(x) paste0(x[1], x[2:length(x)]))
will do that faster: paste0 is vectorized, so each list element needs a single call instead of an inner loop.
Example:
test <- list(a = list("test", "again", "meep"), b = list("and", "again", "doot"))
> test
$a
$a[[1]]
[1] "test"
$a[[2]]
[1] "again"
$a[[3]]
[1] "meep"
$b
$b[[1]]
[1] "and"
$b[[2]]
[1] "again"
$b[[3]]
[1] "doot"
> lapply(test, function(x) paste0(x[1], x[2:length(x)]))
$a
[1] "testagain" "testmeep"
$b
[1] "andagain" "anddoot"
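The question asks for one flat character vector rather than a list, and some elements may contain only a header. A minimal sketch of that variant follows, assuming a shortened copy of the asker's data; the helper name combine_header is made up for illustration. It guards against length-1 elements because x[2:length(x)] would index backwards there.

```r
# Shortened sample modelled on the question's data
data <- list(
  c(",75=20140102,268=18,", "0,83=337407,", "0,83=337408,"),
  c(",75=20140122,268=336,", "1020=58,"),
  c(",52=20140102143000085,75=20140102,268=17,")  # header only
)

# Hypothetical helper: paste the header onto every remaining element.
# Elements with nothing after the header contribute no rows.
combine_header <- function(x) {
  if (length(x) < 2) return(character(0))
  paste0(x[1], x[-1])
}

# unlist() flattens the per-element results into one character vector
combined <- unlist(lapply(data, combine_header))
print(combined)
```

If a separator is wanted between header and record, paste(x[1], x[-1]) could be used in place of paste0.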
I have a list of vectors of length 2. The first element of each vector holds one kind of data and the second element holds another:
[[1]]
[1] "51224.99" "0.879"
[[2]]
[1] "51224.50" "0.038"
[[3]]
[1] "51224.00" "0.038"
[[4]]
[1] "51223.50" "0.038"
[[5]]
[1] "51223.00" "0.062"
[[6]]
[1] "51222.50" "0.038"
[[7]]
[1] "51222.00" "0.038"
[[8]]
[1] "51221.86" "0.370"
[[9]]
[1] "51221.82" "0.015"
[[10]]
[1] "51221.50" "0.038"
[[11]]
[1] "51221.44" "2.100"
[[12]]
[1] "51221.39" "0.196"
[[13]]
[1] "51221.00" "0.038"
[[14]]
[1] "51220.50" "0.038"
[[15]]
[1] "51220.19" "0.292"
[[16]]
[1] "51220.00" "0.038"
[[17]]
[1] "51219.97" "0.012"
[[18]]
[1] "51219.62" "0.684"
[[19]]
[1] "51219.50" "0.038"
[[20]]
[1] "51219.02" "2.311"
I need to find the vector whose second element is the largest. That is, I should end up with the following result:
[1] "51219.02" "2.311"
since the largest second value among the vectors is 2.311.
Assuming your list is called yourList, you can do the following:
secondValuesNumeric <- as.numeric(sapply(yourList,"[[",2))
maxIndex <- which.max(secondValuesNumeric)
result <- yourList[[maxIndex]]
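The three steps above can be checked on a shortened copy of the list. This is only a sketch: it swaps sapply for vapply so that the extracted values are guaranteed to be a numeric vector, and the variable name second is an assumption.

```r
# Shortened copy of the list from the question
yourList <- list(
  c("51224.99", "0.879"),
  c("51221.44", "2.100"),
  c("51219.02", "2.311"),
  c("51219.50", "0.038")
)

# Pull out the second element of each vector as a number
second <- vapply(yourList, function(v) as.numeric(v[2]), numeric(1))

# which.max returns the position of the first maximum
result <- yourList[[which.max(second)]]
print(result)  # [1] "51219.02" "2.311"
```

If several vectors share the maximum, only the first is returned; yourList[second == max(second)] would keep all of them.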
You could just turn it into a data frame! Here's how I would do it.
First I make an example list to mimic yours:
ex <- purrr::map(seq(12), function(i) c(rnorm(1), rnorm(1)))
Then you can use purrr to turn it into a data frame, and dplyr::filter to keep the row where the second value is the maximum of that column (the %>% pipe needs dplyr or magrittr to be attached):
library(dplyr)
purrr::map_df(ex, function(x) data.frame(val1 = x[1], val2 = x[2])) %>%
dplyr::filter(val2 == max(val2))
You should be able to use this example by replacing ex with the name of your list. If your values are stored as strings, as in the question, convert val2 with as.numeric first; otherwise max compares them as text.
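Here is an option with rbind: bind the vectors into a matrix, extract the second column, convert it to numeric, and use the index of the maximum to subset the list. Note that single brackets return a list of length one; use [[ to get the vector itself.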
lst1[which.max(as.numeric(do.call(rbind, lst1)[,2]))]
In R, when I run two functions inside lapply, it seems to run the first function on the entire list and then the second function on the entire list. Is it possible to force it to run both functions on the first element of the list before moving on to the second element?
I am using print and nchar for illustration purposes; my real functions are more complex and generate data frames.
lapply(c("a","bb","cdd"), function(x) {
print(x)
nchar(x)
})
The output is:
[1] "a"
[1] "bb"
[1] "cdd"
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
I would like to have something like this:
[[1]]
[1] "a"
[1] 1
[[2]]
[1] "bb"
[1] 2
[[3]]
[1] "cdd"
[1] 3
Is this possible?
Juan Antonio Roladan Diaz and cash2 both suggested using list, which kind of works:
lapply(c("a","bb","cdd"), function(x) {
list(x, nchar(x))
})
[[1]]
[[1]][[1]]
[1] "a"
[[1]][[2]]
[1] 1
[[2]]
[[2]][[1]]
[1] "bb"
[[2]][[2]]
[1] 2
[[3]]
[[3]][[1]]
[1] "cdd"
[[3]][[2]]
[1] 3
But it is a bit too messy.
Using print for both values gives a better result:
lapply(c("a","bb","cdd"), function(x) {
print(x)
print(nchar(x))
})
[1] "a"
[1] 1
[1] "bb"
[1] 2
[1] "cdd"
[1] 3
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
But is there a way to stop the nchar results from being printed again at the end?
invisible(lapply(c("a","bb","cdd"), function(x) { print(x); print(nchar(x)) }))
This happens because the function prints x, then returns nchar(x); the returned elements are put into a list by lapply and returned, and printed out on the REPL.
Replace nchar(x) with print(nchar(x)). Or, if you want the list returned, just return list(x, nchar(x)) from the inner function.
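The distinction between printing as a side effect and auto-printing the returned list can be sketched as follows. The variable names words and counts are assumptions; the point is only that a for loop returns nothing to echo, and that assigning the result of lapply suppresses the console's automatic printing.

```r
words <- c("a", "bb", "cdd")

# Side effects only: a for loop has no return value for the console to echo.
for (w in words) {
  print(w)
  print(nchar(w))
}

# Side effects plus results: assigning the list prevents auto-printing,
# while the character counts stay available for later use.
counts <- lapply(words, function(w) {
  print(w)
  n <- nchar(w)
  print(n)
  n
})
```

Both variants print each string followed by its length, element by element, without the trailing list.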
I have a vector of character strings (v1) like so:
> head(v1)
[1] "do_i_need_to_even_say_it_do_i_well_here_i_go_anyways_chris_cornell_in_chicago_tonight"
[2] "going_to_see_harry_sunday_happiness"
[3] "this_motha_fucka_stay_solid_foh_with_your_naieve_ass_mentality_your_synapsis_are_lacking_read_a_fucking_book_for_christ_sake"
[4] "why_twitter_will_soon_become_obsolete_http_www.imediaconnection.com_content_23465_asp"
[5] "like_i_said_my_back_still_fucking_hurts_and_im_going_to_complain_about_it_like_no_ones_business_http_tumblr.com_x6n25amd5"
[6] "my_picture_with_kris_karmada_is_gone_forever_its_not_in_my_comments_on_my_mysapce_or_on_my_http_tumblr.com_xzg1wy4jj"
And another vector of character strings (v2) like so:
> head(v2)
[1] "here_i_go" "going" "naieve_ass" "your_synapsis" "my_picture_with" "roll"
What is the quickest way to return a list with one element per item of v1, where each element is a vector of the items of v2 that match somewhere in that v1 item? For example:
[[1]]
[1] "here_i_go"
[[2]]
[1] "going"
[[3]]
[1] "naieve_ass" "your_synapsis"
[[4]]
[[5]]
[1] "going"
[[6]]
[1] "my_picture_with"
I'd like to leave another option with stri_extract_all_regex() in the stringi package. You can create your regular expression directly from v2 and use it in pattern.
library(stringi)
stri_extract_all_regex(str = v1, pattern = paste(v2, collapse = "|"))
[[1]]
[1] "here_i_go"
[[2]]
[1] "going"
[[3]]
[1] "naieve_ass" "your_synapsis"
[[4]]
[1] NA
[[5]]
[1] "going"
[[6]]
[1] "my_picture_with"
If you want speed, I'd use stringi. You don't seem to have any regex, just fixed patterns, so we can use a fixed stri_extract, and (since you don't mention what to do with multiple matches) I'll assume only extracting the first match is fine, giving us a little more speed with stri_extract_first_fixed.
It's probably not worth benchmarking on such a small example, but this should be quite fast.
library(stringi)
matches = lapply(v1, stri_extract_first_fixed, v2)
lapply(matches, function(x) x[!is.na(x)])
# [[1]]
# [1] "here_i_go"
#
# [[2]]
# [1] "going"
#
# [[3]]
# [1] "naieve_ass" "your_synapsis"
#
# [[4]]
# character(0)
#
# [[5]]
# [1] "going"
#
# [[6]]
# [1] "my_picture_with"
Thanks for sharing data, but next time please share it copy/pasteably. dput is nice for that. Here's a copy/pasteable input:
v1 = c(
"do_i_need_to_even_say_it_do_i_well_here_i_go_anyways_chris_cornell_in_chicago_tonight" ,
"going_to_see_harry_sunday_happiness" ,
"this_motha_fucka_stay_solid_foh_with_your_naieve_ass_mentality_your_synapsis_are_lacking_read_a_fucking_book_for_christ_sake",
"why_twitter_will_soon_become_obsolete_http_www.imediaconnection.com_content_23465_asp" ,
"like_i_said_my_back_still_fucking_hurts_and_im_going_to_complain_about_it_like_no_ones_business_http_tumblr.com_x6n25amd5" ,
"my_picture_with_kris_karmada_is_gone_forever_its_not_in_my_comments_on_my_mysapce_or_on_my_http_tumblr.com_xzg1wy4jj")
v2 = c("here_i_go", "going", "naieve_ass", "your_synapsis", "my_picture_with", "roll" )
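For comparison, the same fixed-pattern idea can be written without stringi. This is a base R sketch, not a benchmarked alternative; strings and patterns are shortened, hypothetical stand-ins for v1 and v2.

```r
# Shortened stand-ins for v1 and v2
strings <- c(
  "well_here_i_go_anyways",
  "going_to_see_harry",
  "your_naieve_ass_mentality_your_synapsis",
  "why_twitter"
)
patterns <- c("here_i_go", "going", "naieve_ass", "your_synapsis",
              "my_picture_with", "roll")

# For each string, test every pattern literally (fixed = TRUE, no regex)
# and keep the patterns that occur in it.
matches <- lapply(strings, function(s) {
  hit <- vapply(patterns, function(p) grepl(p, s, fixed = TRUE), logical(1))
  patterns[hit]
})
print(matches)
```

Strings with no match yield character(0) rather than NA, which matches the empty element in the desired output.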
I have two lists. For every element of the first list that also appears in the second list, I want to append a letter ("a", "b", ...) to that element.
list1 <- list("Year","Age","Enrollment","SES","BOE")
list2 <- list("Year","Enrollment","SES")
I tried to use lapply:
text <- letters[1:length(list2)]
listText<- lapply(list1,function(i) ifelse(i %in% list2,paste(i,text[i],sep="^"),i))
but I got the wrong output:
> listText
[[1]]
[1] "Year^NA"
[[2]]
[1] "Age"
[[3]]
[1] "Enrollment^NA"
[[4]]
[1] "SES^NA"
[[5]]
[1] "BOE"
This is the output I want
[[1]]
[1] "Year^a"
[[2]]
[1] "Age"
[[3]]
[1] "Enrollment^b"
[[4]]
[1] "SES^c"
[[5]]
[1] "BOE"
We can use match to find the positions of the list2 elements in list1, then use those positions to subset list1 and paste on the letters:
i1 <- match(unlist(list2), unlist(list1))
list1[i1] <- paste(list1[i1], letters[seq(length(i1))], sep="^")
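You just need to name the elements of text, so that text[i] can look each letter up by name (indexing an unnamed vector with a string returns NA). Change it to:
text <- as.character(letters[1:length(list2)])
names(text) <- unlist(list2)
The result is: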
> listText
[[1]]
[1] "Year^a"
[[2]]
[1] "Age"
[[3]]
[1] "Enrollment^b"
[[4]]
[1] "SES^c"
[[5]]
[1] "BOE"
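Put together, the named-lookup fix might look like the sketch below. It uses setNames and a plain if/else instead of ifelse, which is a stylistic substitution rather than part of the answer above.

```r
list1 <- list("Year", "Age", "Enrollment", "SES", "BOE")
list2 <- list("Year", "Enrollment", "SES")

# Letters named by the list2 entries: c(Year = "a", Enrollment = "b", SES = "c")
text <- setNames(letters[seq_along(list2)], unlist(list2))

# Append the letter when the element has one, otherwise leave it unchanged
listText <- lapply(list1, function(i) {
  if (i %in% names(text)) paste(i, text[[i]], sep = "^") else i
})
print(listText)
```

Because each element is a single string, if/else is sufficient here; ifelse is only needed for vectorized conditions.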
Suppose I've got the following list, which represents a directory structure:
> pages <- list("about.Rmd", "index.Rmd", c("stats", "index.Rmd"), c("stats", "substats", "index.Rmd"))
> pages
[[1]]
[1] "about.Rmd"
[[2]]
[1] "index.Rmd"
[[3]]
[1] "stats" "index.Rmd"
[[4]]
[1] "stats" "substats" "index.Rmd"
I'd like to create a recursive version of this list, something that would look like this :
> rpages <- list("about.Rmd", "index.Rmd", stats=list("index.Rmd", substats=list("index.Rmd")))
> rpages
[[1]]
[1] "about.Rmd"
[[2]]
[1] "index.Rmd"
$stats
$stats[[1]]
[1] "index.Rmd"
$stats$substats
$stats$substats[[1]]
[1] "index.Rmd"
I've tried different ways to do it, but I fear I'm now lost in a sea of lapply and sapply.
Thanks in advance for any hint.
I think this does it:
build_list = function(item, res) {
  if (length(item) > 1) {
    res[[item[1]]] = build_list(tail(item, -1), res[[item[1]]])
  } else {
    res = c(res, list(item))
  }
  res
}
res = list()
for (i in seq_along(pages)) res = build_list(pages[[i]], res)
res
#[[1]]
#[1] "about.Rmd"
#
#[[2]]
#[1] "index.Rmd"
#
#$stats
#$stats[[1]]
#[1] "index.Rmd"
#
#$stats$substats
#$stats$substats[[1]]
#[1] "index.Rmd"
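The for loop that accumulates into res is a left fold, so it could also be written with Reduce. The sketch below repeats build_list so that it runs on its own; the wrapper function passed to Reduce only swaps the argument order.

```r
# build_list as defined in the answer above
build_list <- function(item, res) {
  if (length(item) > 1) {
    # Descend into (or create) the sub-list named after the first path part
    res[[item[1]]] <- build_list(tail(item, -1), res[[item[1]]])
  } else {
    # A single part is a file: append it to the current level
    res <- c(res, list(item))
  }
  res
}

pages <- list("about.Rmd", "index.Rmd",
              c("stats", "index.Rmd"),
              c("stats", "substats", "index.Rmd"))

# Reduce calls f(accumulator, element), starting from an empty list
rpages <- Reduce(function(res, item) build_list(item, res), pages, list())

str(rpages)
print(rpages$stats$substats[[1]])
```

Nested entries can then be reached by name, as in rpages$stats$substats.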