Select items of list of strings that contain certain characters - r

I've got a list of names.
l1 <- rep(paste("Session", 1:6, sep=""), each=4)
l2 <- rep(paste("ID", 1:4, sep=""), 6)
list <- paste(l1, l2, sep="")
With real data the list is far more complicated ;)
How do I create a new list from this list that includes only the items from Sessions 1-4?
In dplyr there is the >>select(contains("Session1"|"Session2"))<< which is used to select variables in data.frames.
I am looking for something similar to use for lists.

Is this what you want?
list[grepl("Session(1|2|3|4)ID", list)]
[1] "Session1ID1" "Session1ID2" "Session1ID3" "Session1ID4" "Session2ID1" "Session2ID2" "Session2ID3" "Session2ID4"
[9] "Session3ID1" "Session3ID2" "Session3ID3" "Session3ID4" "Session4ID1" "Session4ID2" "Session4ID3" "Session4ID4"

Here is another option with a regex lookahead that matches "Session" followed by a digit 1-4 that is not followed by another digit (the negative lookahead (?![0-9])):
grep("Session[1-4](?![0-9])", c(list, "Session10ID"), value = TRUE, perl = TRUE)
#[1] "Session1ID1" "Session1ID2" "Session1ID3" "Session1ID4" "Session2ID1" "Session2ID2" "Session2ID3" "Session2ID4" "Session3ID1" "Session3ID2"
#[11] "Session3ID3" "Session3ID4" "Session4ID1" "Session4ID2" "Session4ID3" "Session4ID4"
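As a complement to the regex answers above, here is a minimal sketch that extracts the session number and compares it as an integer, which may be easier to adapt when the range of sessions changes. The names `sessions`, `session_no` and `selected` are made up for this illustration, and the example data is rebuilt under the name `sessions` to avoid masking base::list.

```r
# Rebuild the example vector from the question
l1 <- rep(paste0("Session", 1:6), each = 4)
l2 <- rep(paste0("ID", 1:4), 6)
sessions <- paste0(l1, l2)

# Pull out the digits that directly follow "Session" and convert them to integers
session_no <- as.integer(sub("^Session([0-9]+).*$", "\\1", sessions))

# Keep only the items whose session number is between 1 and 4
selected <- sessions[session_no %in% 1:4]
length(selected)
```

Because the comparison is numeric, an item such as "Session10ID1" would not be picked up by accident.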

Related

How do you change the way unlist use.names structures the names it puts together

I've got a set of nested lists, such as this:
setoflists <- list(first.list = list(letter.a=1, letter.b=2, letter.c=3),
second.list = list(letter.d=4, letter.e=5, letter.f=6))
I want to flatten it to a single list. However, I want the names of the objects in the list to have the sublist first, then the top list, separated by an underscore "_". One reason is my list names already have lots of fullstops (.) in them.
I can flatten the list with unlist like so:
newlist <- unlist(setoflists, use.names = T, recursive = F)
but the names produced have top list, then sublist, separated by "."
> names(newlist)
[1] "first.list.letter.a" "first.list.letter.b" "first.list.letter.c" "second.list.letter.d" "second.list.letter.e"
[6] "second.list.letter.f"
The format I want is:
letter.a_first.list
letter.b_first.list ...
I don't think there's a way to control this with the current approach, since unlist() builds the combined names internally. You probably need to rename afterwards:
names(newlist) <- unlist(Map(paste, lapply(setoflists, names), names(setoflists), sep = "_"))
names(newlist)
[1] "letter.a_first.list" "letter.b_first.list" "letter.c_first.list" "letter.d_second.list" "letter.e_second.list"
[6] "letter.f_second.list"
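The renaming step above could also be wrapped in a small helper. This is only a sketch: `flatten_with_names` and its `sep` argument are hypothetical names, and it assumes exactly one level of nesting with every sublist fully named.

```r
# Flatten a one-level nested list, naming each element "<inner><sep><outer>"
flatten_with_names <- function(x, sep = "_") {
  flat  <- unlist(x, recursive = FALSE, use.names = FALSE)  # drop the default names
  inner <- unlist(lapply(x, names), use.names = FALSE)      # names inside each sublist
  outer <- rep(names(x), lengths(x))                        # top-level name, repeated per element
  names(flat) <- paste(inner, outer, sep = sep)
  flat
}

setoflists <- list(first.list  = list(letter.a = 1, letter.b = 2, letter.c = 3),
                   second.list = list(letter.d = 4, letter.e = 5, letter.f = 6))
newlist <- flatten_with_names(setoflists)
names(newlist)
```

Since the names are built directly from the two levels, full stops inside either name do not need to be parsed back out.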
One solution with regex, following the first idea of @ritchie-sacramento:
setoflists <- list(firstlist = list(letter.a=1, letter.b=2, letter.c=3),
secondlist = list(letter.d=4, letter.e=5, letter.f=6))
newlist <- unlist(setoflists, use.names = T, recursive = F)
names(newlist) <- sub("(.*)(?:[.])(letter.+)", "\\2_\\1", names(newlist))
names(newlist)
Output:
> names(newlist)
[1] "letter.a_firstlist" "letter.b_firstlist" "letter.c_firstlist" "letter.d_secondlist" "letter.e_secondlist" "letter.f_secondlist"

Find strings where the first half matches the second

I have a list of IP address pairs separated by "::".
ip_pairs <- c("104.124.199.136::192.168.1.67", "104.124.199.136::192.168.137.174", "192.168.1.67::104.124.199.136", "192.168.137.174::104.124.199.136")
As you can see, the third and fourth elements of the vector are the same as the first two, but reversed. (My actual problem is to find all unique pairings of IPs, so the solution would drop the pair B::A if A::B is already present.) This could be solved using stringr or regex, I'm guessing.
One option:
library(stringr)
split_function = function(x) {
x = sort(x)
paste(x, collapse="::")
}
pairs = str_split(ip_pairs, "::")
unique(sapply(pairs, split_function))
[1] "104.124.199.136::192.168.1.67" "104.124.199.136::192.168.137.174"
Use read.table to create a two column data frame from the pairs, sort each row and find the duplicates using duplicated. Then extract out the non-duplicates. No packages are used.
DF <- read.table(text = ip_pairs, sep = ":")[-2]
ip_pairs[! duplicated(t(apply(DF, 1, sort)))]
## [1] "104.124.199.136::192.168.1.67" "104.124.199.136::192.168.137.174"
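The same idea (sort the two halves of each pair, then drop repeats) can be sketched in base R with strsplit and duplicated. The names `canonical` and `unique_pairs` are invented for this illustration.

```r
ip_pairs <- c("104.124.199.136::192.168.1.67", "104.124.199.136::192.168.137.174",
              "192.168.1.67::104.124.199.136", "192.168.137.174::104.124.199.136")

# Build an order-independent key for each pair by sorting its two halves
canonical <- vapply(strsplit(ip_pairs, "::", fixed = TRUE),
                    function(p) paste(sort(p), collapse = "::"),
                    character(1))

# Keep the first occurrence of each key, in its original orientation
unique_pairs <- ip_pairs[!duplicated(canonical)]
unique_pairs
```

Indexing the original vector with !duplicated() keeps each pair as it was first written, rather than in sorted order.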

In R, how do I split each string in a vector to return everything before the Nth instance of a character?

Example:
df <- data.frame(Name = c("J*120_234_458_28", "Z*23_205_a834_306", "H*_39_004_204_99_04902"))
I would like to be able to select everything before the third underscore for each row in the dataframe. I understand how to split the string apart:
df$New <- sapply(strsplit((df$Name),"_"), `[`)
But this places a list in each row. I've thus far been unable to figure out how to use sapply to unlist() each row of df$New and select the first N elements of the list to paste/collapse them back together. Because the length of each subelement can be distinct, and the number of subelements can also be distinct, I haven't been able to figure out an alternative way of getting this info.
We specify 'n' and, after splitting the character column by '_', extract the first n - 1 components:
n <- 4
lapply(strsplit(as.character(df$Name), "_"), `[`, seq_len(n - 1))
If we need to paste them back together, we can use an anonymous function (function(x)) while looping over the list with lapply/sapply, take the first n - 1 elements with head, and paste them together:
sapply(strsplit(as.character(df$Name), "_"), function(x)
paste(head(x, n - 1), collapse="_"))
#[1] "J*120_234_458" "Z*23_205_a834" "H*_39_004"
Or use regex method
sub("^([^_]+_[^_]+_[^_]+)_.*", "\\1", df$Name)
#[1] "J*120_234_458" "Z*23_205_a834" "H*_39_004"
Or if the 'n' is really large, then
pat <- sprintf("^(([^_]+_){%d}[^_]+)_.*", n - 2) # with n = 4 this gives the {2} pattern shown below
sub(pat, "\\1", df$Name)
Or
sub("^(([^_]+_){2}[^_]+)_.*", "\\1", df$Name)
#[1] "J*120_234_458" "Z*23_205_a834" "H*_39_004"
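The split-and-paste approach above can be generalised into a small helper. This is a sketch only: `before_nth` and its arguments are hypothetical names, and here `n` counts the separator occurrence directly, so n = 3 means "everything before the third underscore".

```r
# Return everything before the n-th occurrence of `sep` in each string.
# Strings with fewer than n separators are returned unchanged.
before_nth <- function(x, n, sep = "_") {
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  vapply(parts, function(p) paste(head(p, n), collapse = sep), character(1))
}

x <- c("J*120_234_458_28", "Z*23_205_a834_306", "H*_39_004_204_99_04902")
res <- before_nth(x, 3)
res
```

Using fixed = TRUE treats the separator literally, so the helper also works for separators such as "." that have a special meaning in a regex.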

How to convert several characters to vectors in R?

I am struggling with converting several characters to vectors and making them as a list in R.
The converting rule is as follows:
Assign a number to each character. ex. A=1, B=2, C=3,...
Make a vector when the length of characters is ">=2". ex. AB = c(1,2), ABC = c(1,2,3)
Make lists containing several vectors.
For example, suppose there is an object ex with three components. I want to turn each component into a list object: list1, list2, and list3.
ex = c("(A,B,C,D)", "(AB,BC,CD)","(AB,C)")
# 3 lists to be returned from ex object
list1 = "list(1,2,3,4)" # from (A,B,C,D)
list2 = "list(c(1,2), c(2,3), c(3,4))" # from (AB,BC,CD)
list3 = "list(c(1,2), c(3))" # from (AB,C)
Please let me know a good R function to solve the example above.
* The minor change is reflected.
lookUpTable = as.numeric(1:4) #map numbers to their respective strings
names(lookUpTable) = LETTERS[1:4]
step1<- #get rid of parentheses and split by ",".
strsplit(gsub("[()]", "", ex), ",")
result<- #split again to make things like "AB" into "A", "B", also convert the strings to numbers acc. to lookUpTable
lapply(step1, function(x){ lapply(strsplit(x, ""), function(u) unname(lookUpTable[u])) })
# assign to the global environment.
invisible(
lapply(seq_along(result), function(x) {assign(paste0("list", x), result[[x]], envir = globalenv()); NULL})
)
# get it as strings:
invisible(
lapply(seq_along(result), function(x) {assign(paste0("list_string", x), capture.output(dput(result[[x]])), envir = globalenv()); NULL})
)
data:
ex = c("(A,B,C,D)", "(AB,BC,CD)","(AB,C)")
tips and tricks:
I make use of regular expressions in gsub (and strsplit). Learn regex!
I made a lookUpTable that maps the individual strings to numbers. Make sure your lookUpTable is set up analogously.
Have a look at the apply family of functions, in this case ?lapply.
Lastly, I assign the result to the global environment. I don't recommend this step, but it's what you requested.
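If assigning into the global environment is not strictly required, the results could be kept in a single named list instead. The following is a sketch under that assumption; `lookup` and `to_vectors` are names made up for this example, and the lookup covers all 26 letters rather than only A-D.

```r
ex <- c("(A,B,C,D)", "(AB,BC,CD)", "(AB,C)")

# Map each capital letter to its position in the alphabet (A = 1, B = 2, ...)
lookup <- setNames(seq_along(LETTERS), LETTERS)

# Convert one string such as "(AB,C)" into list(c(1, 2), 3)
to_vectors <- function(s) {
  tokens <- strsplit(gsub("[()]", "", s), ",", fixed = TRUE)[[1]]  # "AB", "C"
  lapply(strsplit(tokens, ""), function(ch) unname(lookup[ch]))   # letters to numbers
}

# One named list holding list1, list2, list3
result <- setNames(lapply(ex, to_vectors), paste0("list", seq_along(ex)))
str(result$list2)
```

Individual results are then available as result$list1, result$list2 and so on, without creating separate global variables.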

Split the dataframe into subset dataframes and naming them on-the-fly (for loop)

I have 9880 records in a data frame. I am trying to split it into 9 groups of 1000 records each, plus a last group of 880 records, and to name them accordingly. I used a for loop for the first 9 groups and handled the last 880 records manually, but I am sure there are better ways to achieve this:
library(sqldf)
for (i in 0:8)
{
assign(paste("test",i,sep="_"),as.data.frame(final_9880[((1000*i)+1):(1000*(i+1)), (1:53)]))
}
test_9<- final_9880[9001:9880,1:53]
I am also unable to append all the parts in one for loop!
#append all parts
all_9880<-rbind(test_0,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9)
Any help is appreciated, thanks!
A small variation on this solution
ls <- split(final_9880, rep(0:9, each = 1000, length.out = 9880)) # edited to Roman's suggestion
for(i in 1:10) assign(paste("test",i,sep="_"), ls[[i]])
Your command for binding should work once the names are adjusted to test_1 through test_10, which is what the loop above creates.
Edit
If you have many dataframes you can use a parse-eval combo. I use the package gsubfn for readability.
library(gsubfn)
nms <- paste("test", 1:10, sep="_", collapse=",")
eval(fn$parse(text='do.call(rbind, list($nms))'))
How does this work? First I create a string containing the comma-separated list of the dataframes
> paste("test", 1:10, sep="_", collapse=",")
[1] "test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10"
Then I use this string to construct the list
list(test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10)
using parse and eval with string interpolation.
eval(fn$parse(text='list($nms)'))
String interpolation is implemented via the fn$ prefix of parse. Its effect is to intercept and substitute $nms with the string contained in the variable nms. Parsing and evaluating the string "list($nms)" creates the list needed. In the solution the rbind is included in the parse-eval combo.
EDIT 2
You can collect all variables with a certain pattern, put them in a list and bind them by rows.
do.call("rbind", sapply(ls(pattern = "test_"), get, simplify = FALSE))
ls finds all variables with a pattern "test_"
sapply retrieves all those variables and stores them in a list
do.call passes the elements of that list to rbind, which binds them row-wise.
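When the variable names are known in advance, base R's mget can fetch them by name without parse/eval. A minimal sketch with two tiny stand-in data frames (`test_1`, `test_2`, `parts` and `combined` are illustrative only):

```r
# Two small stand-in data frames
test_1 <- data.frame(x = 1:2)
test_2 <- data.frame(x = 3:4)

# Fetch the objects by name into a list, then bind them row-wise
parts <- mget(paste("test", 1:2, sep = "_"))
combined <- do.call(rbind, unname(parts))  # unname() avoids "test_1.1"-style row names
combined
```

Building the names with paste keeps the order explicit, whereas ls(pattern = ...) returns names in alphabetical order, which would place test_10 before test_2.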
No for loop required -- use split
data <- data.frame(a = 1:9880, b = sample(letters, 9880, replace = TRUE))
splitter <- (data$a-1) %/% 1000
.list <- split(data, splitter)
lapply(0:9, function(i){
assign(paste('test',i,sep='_'), .list[[(i+1)]], envir = .GlobalEnv)
return(invisible())
})
all_9880<-rbind(test_0,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9)
identical(all_9880,data)
## [1] TRUE
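If the separate test_0 ... test_9 variables are not needed, the pieces can simply stay in the list returned by split. A sketch under that assumption, using the same toy data as above; `chunk_id` and `chunks` are names chosen for this example.

```r
set.seed(1)  # only so the toy data is reproducible
data <- data.frame(a = 1:9880, b = sample(letters, 9880, replace = TRUE))

# Group index 0..9 by row position: rows 1-1000 -> 0, ..., rows 9001-9880 -> 9
chunk_id <- (seq_len(nrow(data)) - 1) %/% 1000
chunks <- split(data, chunk_id)
names(chunks) <- paste("test", names(chunks), sep = "_")

sapply(chunks, nrow)  # row count of each chunk

# Re-assemble all chunks in one call, whatever their number
all_9880 <- do.call(rbind, unname(chunks))
```

Each chunk is then accessed as chunks$test_0, chunks[["test_9"]] and so on, and the binding step does not need to list the names by hand.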
