Check list in list in R - r

I have two lists. I want to check whether each element of the first list also appears in the second list and, if it does, append a letter ("a", "b", ...) to that element of the first list.
list1 <- list("Year","Age","Enrollment","SES","BOE")
list2 <- list("Year","Enrollment","SES")
I tried to use lapply:
text <- letters[1:length(list2)]
listText<- lapply(list1,function(i) ifelse(i %in% list2,paste(i,text[i],sep="^"),i))
but I got the wrong output:
> listText
[[1]]
[1] "Year^NA"
[[2]]
[1] "Age"
[[3]]
[1] "Enrollment^NA"
[[4]]
[1] "SES^NA"
[[5]]
[1] "BOE"
This is the output I want:
[[1]]
[1] "Year^a"
[[2]]
[1] "Age"
[[3]]
[1] "Enrollment^b"
[[4]]
[1] "SES^c"
[[5]]
[1] "BOE"

We can use match to find the positions of the list2 elements within list1, then use those positions to subset list1 and paste on the letters:
i1 <- match(unlist(list2), unlist(list1))
list1[i1] <- paste(list1[i1], letters[seq(length(i1))], sep="^")

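Your text vector has no names, so text[i] with a character i returns NA. You just need to name text by the elements of list2, so that text[i] looks up the letter by name: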
text <- as.character(letters[1:length(list2)])
names(text) <- unlist(list2)
The result is:
> listText
[[1]]
[1] "Year^a"
[[2]]
[1] "Age"
[[3]]
[1] "Enrollment^b"
[[4]]
[1] "SES^c"
[[5]]
[1] "BOE"
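For reference, here is a minimal self-contained sketch that combines the question's lapply with a named lookup vector. It uses if/else instead of ifelse, since each element is a single string, and setNames as a shorthand for the names(text) <- ... step; both are stylistic assumptions rather than part of the original answers.

```r
list1 <- list("Year", "Age", "Enrollment", "SES", "BOE")
list2 <- list("Year", "Enrollment", "SES")

# Lookup vector: letters named by the elements of list2
text <- setNames(letters[seq_along(list2)], unlist(list2))

# Append the matching letter when the element is found in list2
listText <- lapply(list1, function(i) {
  if (i %in% names(text)) paste(i, text[[i]], sep = "^") else i
})

unlist(listText)
```

Indexing with text[[i]] works here because i is a name, not a position.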

Related

find the maximum value in a list of vectors in R

I have a list of character vectors of length 2. The first element of each vector holds one kind of value, and the second element holds another:
[[1]]
[1] "51224.99" "0.879"
[[2]]
[1] "51224.50" "0.038"
[[3]]
[1] "51224.00" "0.038"
[[4]]
[1] "51223.50" "0.038"
[[5]]
[1] "51223.00" "0.062"
[[6]]
[1] "51222.50" "0.038"
[[7]]
[1] "51222.00" "0.038"
[[8]]
[1] "51221.86" "0.370"
[[9]]
[1] "51221.82" "0.015"
[[10]]
[1] "51221.50" "0.038"
[[11]]
[1] "51221.44" "2.100"
[[12]]
[1] "51221.39" "0.196"
[[13]]
[1] "51221.00" "0.038"
[[14]]
[1] "51220.50" "0.038"
[[15]]
[1] "51220.19" "0.292"
[[16]]
[1] "51220.00" "0.038"
[[17]]
[1] "51219.97" "0.012"
[[18]]
[1] "51219.62" "0.684"
[[19]]
[1] "51219.50" "0.038"
[[20]]
[1] "51219.02" "2.311"
I need to find the vector whose second element is the largest. That is, I should end up with the following result:
[1] "51219.02" "2.311"
since the largest second element among the vectors is 2.311.
Assuming your list is called yourList, you can do the following:
secondValuesNumeric <- as.numeric(sapply(yourList,"[[",2))
maxIndex <- which.max(secondValuesNumeric)
result <- yourList[[maxIndex]]
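As a quick check, the same three steps can be run on a shortened list. The three vectors below are copied from the question; the truncation to three elements is only for illustration.

```r
# A shortened version of the list from the question
yourList <- list(
  c("51224.99", "0.879"),
  c("51221.44", "2.100"),
  c("51219.02", "2.311")
)

# Pull out every second element and convert the strings to numbers
secondValuesNumeric <- as.numeric(sapply(yourList, "[[", 2))

# Position of the largest second element, then the matching vector
maxIndex <- which.max(secondValuesNumeric)
result <- yourList[[maxIndex]]
result
```

Note that which.max returns only the first position if the maximum occurs more than once.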
You could just turn it into a dataframe! Here's how I would do it:
(first I make an example list to mimic yours):
ex <- purrr::map(seq(12), function(i) c(rnorm(1), rnorm(1)))
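Then you can use purrr to turn it into a data frame, and dplyr's filter to keep the row where the second value is the max of that column:
library(dplyr)  # attaches the %>% pipe used below
purrr::map_df(ex, function(x) data.frame(val1 = x[1], val2 = x[2])) %>%
dplyr::filter(val2 == max(val2))
You should be able to use this example by replacing ex with the name of your list. Because your values are stored as strings, you would also need to wrap x[1] and x[2] in as.numeric() so that max compares numbers rather than text.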
Here is an option with rbind: bind the vectors into a matrix, extract the second column, convert it to numeric, and use the index of the maximum to subset the list:
lst1[which.max(as.numeric(do.call(rbind, lst1)[,2]))]

getting words from the list of characters

I have a list of character vectors (hashtags), where some of the elements include more than one word.
It looks like this:
head(hashtags)
[[1]]
[1] "FutureofAccountechs"
[[2]]
[1] "internet" "Cornwall"
[[3]]
[1] "beer"
[[4]]
[1] NA
[[5]]
[1] "popsicle" "natural"
[[6]]
[1] "coffee" "ethical"
I need to extract the individual words from this list. I can access them by element number:
> hashtags[[5]]
[1] "popsicle" "natural"
> hashtags[5][[1]]
[1] "popsicle" "natural"
Any ideas how I can "flatten" them, so they become:
"FutureofAccountechs"
"internet"
"Cornwall"
"beer"
NA
"popsicle"
"natural"
"coffee"
"ethical"
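One possible approach, sketched here with base R's unlist, which flattens a list of vectors into a single vector. The hashtags list below is rebuilt by hand from the head() output in the question, so it is an assumption about the real data.

```r
# Hand-built copy of the six elements shown in the question
hashtags <- list(
  "FutureofAccountechs",
  c("internet", "Cornwall"),
  "beer",
  NA,
  c("popsicle", "natural"),
  c("coffee", "ethical")
)

# Flatten into one character vector; the NA is kept as a missing value
flat <- unlist(hashtags)
flat
```

If the missing entries are not wanted, flat[!is.na(flat)] drops them.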

R - lapply several functions in one lapply by elements

In R, when I run two functions inside lapply, it appears to run the first function on the entire list and then run the second function on the list. Is it possible to force it to run both functions on the first element of the list before moving on to the second element?
I am using print and nchar for illustration purposes; my real functions are more complex and generate data frames.
lapply(c("a","bb","cdd"), function(x) {
print(x)
nchar(x)
})
The output is:
[1] "a"
[1] "bb"
[1] "cdd"
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
I would like to have something like this:
[[1]]
[1] "a"
[1] 1
[[2]]
[1] "bb"
[1] 2
[[3]]
[1] "cdd"
[1] 3
Is this possible?
Juan Antonio Roladan Diaz and cash2 both suggested using list, which kind of works:
lapply(c("a","bb","cdd"), function(x) {
list(x, nchar(x))
})
[[1]]
[[1]][[1]]
[1] "a"
[[1]][[2]]
[1] 1
[[2]]
[[2]][[1]]
[1] "bb"
[[2]][[2]]
[1] 2
[[3]]
[[3]][[1]]
[1] "cdd"
[[3]][[2]]
[1] 3
But it is a bit too messy.
Using print gives a better result:
lapply(c("a","bb","cdd"), function(x) {
print(x)
print(nchar(x))
})
[1] "a"
[1] 1
[1] "bb"
[1] 2
[1] "cdd"
[1] 3
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
But is there a way to stop the nchar results from being printed a second time?
invisible(lapply(c("a","bb","cdd"), function(x) { print(x); print(nchar(x)) }))
This happens because the function prints x and then returns nchar(x); lapply collects the returned values into a list, and the REPL prints that list once lapply has finished.
Replace nchar(x) with print(nchar(x)). Or, if you want the list returned, just return list(x, nchar(x)) from the inner function.
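If both the per-element printing and the returned values are needed, one option (a sketch, not the only way) is to assign the result of lapply to a variable. An assignment is not auto-printed, so the list is kept without being echoed. The names words and counts are made up for this example.

```r
words <- c("a", "bb", "cdd")

# Assigning the result keeps the returned list from being auto-printed
counts <- lapply(words, function(x) {
  print(x)        # printed while this element is processed
  n <- nchar(x)
  print(n)        # printed right after x, before the next element
  n               # value stored in `counts`
})
```

Each element is printed together with its character count, and counts still holds the values for later use.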

R - Grepl vector over vector

I have a vector of character strings (v1) like so:
> head(v1)
[1] "do_i_need_to_even_say_it_do_i_well_here_i_go_anyways_chris_cornell_in_chicago_tonight"
[2] "going_to_see_harry_sunday_happiness"
[3] "this_motha_fucka_stay_solid_foh_with_your_naieve_ass_mentality_your_synapsis_are_lacking_read_a_fucking_book_for_christ_sake"
[4] "why_twitter_will_soon_become_obsolete_http_www.imediaconnection.com_content_23465_asp"
[5] "like_i_said_my_back_still_fucking_hurts_and_im_going_to_complain_about_it_like_no_ones_business_http_tumblr.com_x6n25amd5"
[6] "my_picture_with_kris_karmada_is_gone_forever_its_not_in_my_comments_on_my_mysapce_or_on_my_http_tumblr.com_xzg1wy4jj"
And another vector of character strings (v2) like so:
> head(v2)
[1] "here_i_go" "going" "naieve_ass" "your_synapsis" "my_picture_with" "roll"
What is the quickest way to return a list with one vector per item of v1, where each vector contains the items of v2 that matched (as regular expressions) within that v1 item? Like so:
[[1]]
[1] "here_i_go"
[[2]]
[1] "going"
[[3]]
[1] "naieve_ass" "your_synapsis"
[[4]]
[[5]]
[1] "going"
[[6]]
[1] "my_picture_with"
I'd like to leave another option with stri_extract_all_regex() in the stringi package. You can create your regular expression directly from v2 and use it in pattern.
library(stringi)
stri_extract_all_regex(str = v1, pattern = paste(v2, collapse = "|"))
[[1]]
[1] "here_i_go"
[[2]]
[1] "going"
[[3]]
[1] "naieve_ass" "your_synapsis"
[[4]]
[1] NA
[[5]]
[1] "going"
[[6]]
[1] "my_picture_with"
If you want speed, I'd use stringi. You don't seem to have any regex, just fixed patterns, so we can use a fixed stri_extract, and (since you don't mention what to do with multiple matches) I'll assume only extracting the first match is fine, giving us a little more speed with stri_extract_first_fixed.
It's probably not worth benchmarking on such a small example, but this should be quite fast.
library(stringi)
matches = lapply(v1, stri_extract_first_fixed, v2)
lapply(matches, function(x) x[!is.na(x)])
# [[1]]
# [1] "here_i_go"
#
# [[2]]
# [1] "going"
#
# [[3]]
# [1] "naieve_ass" "your_synapsis"
#
# [[4]]
# character(0)
#
# [[5]]
# [1] "going"
#
# [[6]]
# [1] "my_picture_with"
Thanks for sharing data, but next time please share it copy/pasteably. dput is nice for that. Here's a copy/pasteable input:
v1 = c(
"do_i_need_to_even_say_it_do_i_well_here_i_go_anyways_chris_cornell_in_chicago_tonight" ,
"going_to_see_harry_sunday_happiness" ,
"this_motha_fucka_stay_solid_foh_with_your_naieve_ass_mentality_your_synapsis_are_lacking_read_a_fucking_book_for_christ_sake",
"why_twitter_will_soon_become_obsolete_http_www.imediaconnection.com_content_23465_asp" ,
"like_i_said_my_back_still_fucking_hurts_and_im_going_to_complain_about_it_like_no_ones_business_http_tumblr.com_x6n25amd5" ,
"my_picture_with_kris_karmada_is_gone_forever_its_not_in_my_comments_on_my_mysapce_or_on_my_http_tumblr.com_xzg1wy4jj")
v2 = c("here_i_go", "going", "naieve_ass", "your_synapsis", "my_picture_with", "roll" )
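For comparison, here is a base R sketch using grepl with fixed = TRUE, which needs no extra package. The three v1 strings are shortened, made-up stand-ins for the question's data, and no claim is made about its speed relative to stringi.

```r
# Shortened stand-ins for the question's data (assumption for illustration)
v1 <- c(
  "going_to_see_harry_sunday_happiness",
  "stay_solid_with_your_naieve_ass_mentality_your_synapsis_are_lacking",
  "why_twitter_will_soon_become_obsolete"
)
v2 <- c("here_i_go", "going", "naieve_ass", "your_synapsis", "my_picture_with", "roll")

# For each string in v1, keep the v2 entries that occur in it literally
matches <- lapply(v1, function(s) {
  hit <- vapply(v2, grepl, logical(1), x = s, fixed = TRUE)
  v2[hit]
})
matches
```

Strings with no match yield character(0) rather than NA.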

R: Multidimensional array index assignment

Consider the following array assignments:
temp=array(list(),2)
temp[[2]][[2]]=c("a","b")
temp[[1]][[2]]="c"
This produces the following result:
temp
[[1]]
[1] NA "c"
[[2]]
[[2]][[1]]
NULL
[[2]][[2]]
[1] "a" "b"
Instead, I want the result to be:
temp
[[1]]
[[1]][[1]]
NULL
[[1]][[2]]
[1] "c"
[[2]]
[[2]][[1]]
NULL
[[2]][[2]]
[1] "a" "b"
How do I make the assignment so that the latter is produced rather than the former?
You can initialize the list(s) with replicate instead of array, because lists and arrays behave differently:
x <- replicate(2, list())
x[[1]][[2]] <- "c"
x[[2]][[2]] <- c("a", "b")
x
Note:
is.array(x)
# [1] FALSE
sapply(x, is.array)
# [1] FALSE FALSE
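The same idea can be sketched with an explicit list of empty lists built by rep, which makes the starting structure visible. Using rep here is an alternative chosen for illustration, not something the answer above prescribes.

```r
# Two empty lists inside an outer list
x <- rep(list(list()), 2)

# Assigning past the end of an empty list pads it with NULL
x[[1]][[2]] <- "c"
x[[2]][[2]] <- c("a", "b")
str(x)
```

Because every inner element starts as a list rather than NULL, the assignment to x[[1]][[2]] produces a list holding NULL and "c" instead of the character vector c(NA, "c").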
