Convert a list of characters partially to int - r

I have a list of data:
$nPerm
[1] "1000"
$minGSSize
[1] "10"
$maxGSSize
[1] "100"
$by
[1] "DOSE"
$seed
[1] "TRUE"
This list is meant to be flexible, so the values may differ or be something else entirely.
Every element in the list is of class character, numbers and words alike. Is it possible to convert only the numbers to numeric and leave the others as character strings?
Thank you in advance!

L <- list(a="1000", b="DOSE", c="99")
type.convert(L, as.is = TRUE)
# $a
# [1] 1000
# $b
# [1] "DOSE"
# $c
# [1] 99

Evan's answer is very neat; just for completeness, here is a {purrr} option as well:
L <- list(a="1000", b="DOSE", c="99")
L |> purrr::map(~ifelse(stringr::str_detect(.x,"^[:digit:]+$"), as.numeric(.x), .x))
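One thing to watch with the sample data in the question: type.convert() also turns "TRUE" into a logical, and it returns whole numbers as integer rather than double. If only the numbers should change, a minimal base R sketch could look like this (the helper name to_numeric_if_possible and the shortened sample list are made up for illustration):

```r
# Hypothetical helper: convert a single string to numeric only if it
# parses as a number; otherwise return it unchanged.
to_numeric_if_possible <- function(x) {
  num <- suppressWarnings(as.numeric(x))  # non-numbers become NA
  if (length(x) == 1L && !is.na(num)) num else x
}

params <- list(nPerm = "1000", minGSSize = "10", by = "DOSE", seed = "TRUE")
converted <- lapply(params, to_numeric_if_possible)
str(converted)
```

Here "DOSE" and "TRUE" stay character, while "1000" and "10" become doubles. Note that as.numeric() also accepts forms such as "1e3", so tighten the check if those should stay strings.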

Related

Sort matrix colnames to match element order in a list

I have a particular list such as:
my_list<-list("Cluster18904", "Cluster6294", "Cluster17424", "Cluster26257",
"Cluster27053", "Cluster2905", "Cluster16096", "Cluster14552")
which looks like:
my_list
[[1]]
[1] "Cluster18904"
[[2]]
[1] "Cluster6294"
[[3]]
[1] "Cluster17424"
[[4]]
[1] "Cluster26257"
[[5]]
[1] "Cluster27053"
[[6]]
[1] "Cluster2905"
[[7]]
[1] "Cluster16096"
[[8]]
[1] "Cluster14552"
and I have a matrix with the same colnames. I'm looking for a way to order the columns of the matrix so that they match the order in my_list.
I tried:
as.data.frame(matrix)[,my_list]
But I get :
Error in .subset(x, j) : 'list' incorrect index type
We need to unlist first, because a list cannot be used as a column index:
as.data.frame(matrix)[, unlist(my_list)]
If some column names do not match, use intersect:
dat1 <- as.data.frame(matrix)
nm1 <- intersect(names(dat1), unlist(my_list))
dat1[nm1]
Another base R option
as.data.frame(matrix)[simplify2array(my_list)]
This reorders the columns to follow the list, by first converting the list to a character vector and then looking up the position of each name among the column names.
matrix[, match(as.character(my_list), colnames(matrix))]
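To check the direction of the lookup, here is a small self-contained sketch on made-up data (the matrix m and the three cluster names are assumptions, not the asker's data). Indexing by name and indexing by match() position should give the same result:

```r
# Toy data: columns are in a different order than the list.
my_list <- list("Cluster3", "Cluster1", "Cluster2")
m <- matrix(1:6, nrow = 2,
            dimnames = list(NULL, c("Cluster1", "Cluster2", "Cluster3")))

wanted <- unlist(my_list)

# Option 1: index the matrix directly by column name.
reordered <- m[, wanted, drop = FALSE]

# Option 2: look up where each wanted name sits among the column names.
same <- m[, match(wanted, colnames(m)), drop = FALSE]

colnames(reordered)
identical(reordered, same)
```

The names to be ordered by go first in match(), and the column names go second. Swapping the two gives the inverse permutation, which only looks right when the order happens to be its own inverse.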

R - Grepl vector over vector

I have a vector of character strings (v1) like so:
> head(v1)
[1] "do_i_need_to_even_say_it_do_i_well_here_i_go_anyways_chris_cornell_in_chicago_tonight"
[2] "going_to_see_harry_sunday_happiness"
[3] "this_motha_fucka_stay_solid_foh_with_your_naieve_ass_mentality_your_synapsis_are_lacking_read_a_fucking_book_for_christ_sake"
[4] "why_twitter_will_soon_become_obsolete_http_www.imediaconnection.com_content_23465_asp"
[5] "like_i_said_my_back_still_fucking_hurts_and_im_going_to_complain_about_it_like_no_ones_business_http_tumblr.com_x6n25amd5"
[6] "my_picture_with_kris_karmada_is_gone_forever_its_not_in_my_comments_on_my_mysapce_or_on_my_http_tumblr.com_xzg1wy4jj"
And another vector of character strings (v2) like so:
> head(v2)
[1] "here_i_go" "going" "naieve_ass" "your_synapsis" "my_picture_with" "roll"
What is the quickest way that I can return a list of vectors where each list item represents each vector item in v1 and each vector item is a regular expression match where an item in v2 appeared in that v1 item, like so:
[[1]]
[1] "here_i_go"
[[2]]
[1] "going"
[[3]]
[1] "naieve_ass" "your_synapsis"
[[4]]
[[5]]
[1] "going"
[[6]]
[1] "my_picture_with"
I'd like to leave another option with stri_extract_all_regex() in the stringi package. You can create your regular expression directly from v2 and use it in pattern.
library(stringi)
stri_extract_all_regex(str = v1, pattern = paste(v2, collapse = "|"))
[[1]]
[1] "here_i_go"
[[2]]
[1] "going"
[[3]]
[1] "naieve_ass" "your_synapsis"
[[4]]
[1] NA
[[5]]
[1] "going"
[[6]]
[1] "my_picture_with"
If you want speed, I'd use stringi. You don't seem to have any regex, just fixed patterns, so we can use a fixed stri_extract, and (since you don't mention what to do with multiple matches) I'll assume only extracting the first match is fine, giving us a little more speed with stri_extract_first_fixed.
It's probably not worth benchmarking on such a small example, but this should be quite fast.
library(stringi)
matches = lapply(v1, stri_extract_first_fixed, v2)
lapply(matches, function(x) x[!is.na(x)])
# [[1]]
# [1] "here_i_go"
#
# [[2]]
# [1] "going"
#
# [[3]]
# [1] "naieve_ass" "your_synapsis"
#
# [[4]]
# character(0)
#
# [[5]]
# [1] "going"
#
# [[6]]
# [1] "my_picture_with"
Thanks for sharing data, but next time please share it copy/pasteably. dput is nice for that. Here's a copy/pasteable input:
v1 = c(
"do_i_need_to_even_say_it_do_i_well_here_i_go_anyways_chris_cornell_in_chicago_tonight" ,
"going_to_see_harry_sunday_happiness" ,
"this_motha_fucka_stay_solid_foh_with_your_naieve_ass_mentality_your_synapsis_are_lacking_read_a_fucking_book_for_christ_sake",
"why_twitter_will_soon_become_obsolete_http_www.imediaconnection.com_content_23465_asp" ,
"like_i_said_my_back_still_fucking_hurts_and_im_going_to_complain_about_it_like_no_ones_business_http_tumblr.com_x6n25amd5" ,
"my_picture_with_kris_karmada_is_gone_forever_its_not_in_my_comments_on_my_mysapce_or_on_my_http_tumblr.com_xzg1wy4jj")
v2 = c("here_i_go", "going", "naieve_ass", "your_synapsis", "my_picture_with", "roll" )
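If installing stringi is not an option, the same idea can be sketched in base R with grepl(), as the title suggests. This is only a sketch on shortened, made-up inputs. It will likely be slower than stringi on large vectors, and it assumes the patterns are fixed strings rather than regular expressions:

```r
# Shortened stand-ins for the vectors in the question.
v1 <- c("well_here_i_go_anyways", "going_to_see_harry", "nothing_to_see")
v2 <- c("here_i_go", "going", "roll")

# For each string in v1, keep the patterns from v2 that occur in it.
hits <- lapply(v1, function(s) {
  found <- vapply(v2, grepl, logical(1), x = s, fixed = TRUE)
  v2[found]
})
hits
```

Strings without any match come back as character(0), which matches the empty [[4]] in the desired output.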

What's the name of the first argument of `[`?

letters[2] is equivalent to `[`(letters, i = 2), so the second argument is named i.
What is the name of the first argument, so that the following two expressions would be equivalent?
lapply(1:3,function(x){letters[x]})
lapply(1:3,`[`,param1 = letters) # param1 to be replaced with solution
To define a function similar to the one above, you have to pass two arguments to it. The function [ takes various inputs. We can use Map instead of lapply to give it both the data to extract from and the indices of the parts to be extracted:
Map("[",list(letters),1:3)
[[1]]
[1] "a"
[[2]]
[1] "b"
[[3]]
[1] "c"
This is similar to what you have above. Hope this helps
You have to be more specific than "[", for instance:
lapply(1:3, `[.numeric_version`, x = letters)
# [[1]]
# [1] "a"
#
# [[2]]
# [1] "b"
#
# [[3]]
# [1] "c"
(Not sure [.numeric_version is the most appropriate, though... I'm digging a bit more)
rlang::as_closure and purrr::as_mapper, both based on rlang::as_function (see doc),
will both convert [ to a function with named parameters:
lapply(1:3, purrr::as_mapper(`[`), .x = letters)
lapply(1:3, rlang::as_closure(`[`), .x = letters)
# [[1]]
# [1] "a"
#
# [[2]]
# [1] "b"
#
# [[3]]
# [1] "c"
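Some background on why the question has no direct answer: `[` is a primitive, so it has no R-level formals to name, and as far as I can tell it matches its arguments by position. A plain wrapper with its own argument names is a minimal base R workaround. The argument names i and x in the wrapper below are my own choice, not something `[` defines:

```r
is.primitive(`[`)   # TRUE
formals(`[`)        # NULL: primitives have no formals to inspect

# The original form from the question.
res <- lapply(1:3, function(i) letters[i])

# A wrapper with named arguments; lapply() fills i, x is passed by name.
res2 <- lapply(1:3, function(i, x) x[i], x = letters)

identical(res, res2)
```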

R-programming: How can I change the variable names when I combine lists?

I have data in this format:
x = c(list(a=1,b=2),list(a=5,b=6))
How can I change it to the following format?
x = list(a=1,b=2,a1=5,b1=6)
I am aware that I can achieve the above by using
names(x)[3:4]=c('a1','b1')
but that isn't practical, as the length of each list varies in the data set that I have.
We can use make.unique and it works for all cases without doing any conversion
names(x) <- make.unique(names(x), sep="")
names(x)
#[1] "a" "b" "a1" "b1"
How about this...
as.list(as.data.frame(x))
$a
[1] 1
$b
[1] 2
$a.1
[1] 5
$b.1
[1] 6
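Since the real lists vary in length, here is a small sketch of make.unique on made-up lists of different lengths (the parts object is an assumption for illustration):

```r
# Three lists of different lengths with overlapping names.
parts <- list(list(a = 1, b = 2), list(a = 5, b = 6, c = 7), list(a = 9))

# Combine them into one flat list, then de-duplicate the names.
x <- do.call(c, parts)
names(x) <- make.unique(names(x), sep = "")
names(x)
# [1] "a"  "b"  "a1" "b1" "c"  "a2"
```

The suffix counts repeats per name, so it is not tied to which input list an element came from.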

R: lapply function over a list (selecting/sub-setting)

I am having a difficult time "sub-setting" a list.
For example,
test <- data.frame(x = c("5353-66", "55-110-4000","6524-533", "62410-165", "653-520-2410"))
test$x <- as.character(test$x)
strsplit(test$x, "-")
strsplit gives me a list as below:
[[1]]
[1] "5353" "66"
[[2]]
[1] "55" "110" "4000"
[[3]]
[1] "6524" "533"
[[4]]
[1] "62410" "165"
[[5]]
[1] "653" "520" "2410"
When I run lapply(strsplit(test$x, "-"), "[[", 1), it gives me the first character string from each component of the list as below:
[[1]]
[1] "5353"
[[2]]
[1] "55"
[[3]]
[1] "6524"
[[4]]
[1] "62410"
[[5]]
[1] "653"
Then... How do I select entire [[1]] and [[2]] and [[3]]... separately?
For example, I want to assign test$y[1] as c("5353", "66") and test$y[2] as c("55" , "110" , "4000") and so on.
test$y <- lapply(strsplit(test$x, "-"), "[", 1)
The line above gave me the same result.
While it can get messy it's also fairly easy to do. You were on the right track but adding an unlist() and using strsplit() with the lapply() will get you what you want.
test$y <- lapply(1:length(test$x),function(i) unlist(strsplit(test$x[[i]],"-")))
test$y[[1]]
[1] "5353" "66"
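As a possibly simpler variant, strsplit() is already vectorised over its input and returns one character vector per row, so its result can be stored as a list column directly. A minimal sketch on a shortened copy of the data:

```r
test <- data.frame(x = c("5353-66", "55-110-4000"), stringsAsFactors = FALSE)

# strsplit() returns a list with one element per row of test.
test$y <- strsplit(test$x, "-")

test$y[[1]]
test$y[[2]]
```

This relies on the list having exactly one element per row. Use [[ ]] rather than [ ] to get the vector itself instead of a one-element list.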
This is where the magic of sapply comes in handy --
test <- data.frame(x = c("5353-66", "55-110-4000","6524-533", "62410-165", "653-520-2410"))
test$x <- as.character(test$x)
sapply(test$x,strsplit,'-')
$`5353-66`
[1] "5353" "66"
$`55-110-4000`
[1] "55" "110" "4000"
$`6524-533`
[1] "6524" "533"
$`62410-165`
[1] "62410" "165"
$`653-520-2410`
[1] "653" "520" "2410"
What you do with the data from here is up to you. Because your data is ragged, i.e., it will not fit into a rectangular matrix or data frame that needs a fixed number of cells per row, you should keep the data as a list. Data frames are in fact lists, so many of the functions you would use on a data frame work on a list as well.
If you must have a data frame, you can add on NAs for missing cells and then convert it back to a data frame in wide format:
out_list <- sapply(test$x,strsplit,'-')
max_length <- max(sapply(out_list,length))
out_list <- lapply(out_list, function(x) {
  if (length(x) < max_length) {
    x <- c(x, rep(NA, times = max_length - length(x)))
  }
  return(x)
})
out_data <- as.data.frame(out_list)
  X5353.66 X55.110.4000 X6524.533 X62410.165 X653.520.2410
1     5353           55      6524      62410           653
2       66          110       533        165           520
3     <NA>         4000      <NA>       <NA>          2410
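The padding step can also be sketched more compactly with the replacement function `length<-`, which extends a vector with NA up to the requested length. This is an alternative to the if block above rather than the answer's own method, shown on a shortened input:

```r
pieces <- strsplit(c("5353-66", "55-110-4000"), "-")

# Longest element decides the target length.
n <- max(lengths(pieces))

# `length<-`(x, n) pads x with NA up to length n.
padded <- lapply(pieces, `length<-`, n)
padded
```

From there the padded list can go into as.data.frame() or do.call(rbind, padded), depending on whether the pieces should become columns or rows.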
