R: Removing an element from a list element

This is more of a general question about the behavior of lists in R, but the specific problem is:
I have a list of groups of words from which I'm trying to manually remove specific words (no word appears more than once).
Currently, I'm using this method:
l = strsplit(c("a b", "c d"), " ")
> l
[[1]]
[1] "a" "b"
[[2]]
[1] "c" "d"
# remove the value "d"
l = lapply(l, function(x) { x[x != "d"] })
> l
[[1]]
[1] "a" "b"
[[2]]
[1] "c"
Is there any sort of built-in list indexing method that would be preferable? I feel like I should be able to process the list without using lapply. If not, could someone explain why that is the case?
Thanks

You need to go through each element of the list and check whether the vector contains "d" in order to filter it out.
One of the reasons is that a list can contain any type of data (functions, data frames, numeric, character and logical vectors, other lists, objects of any class), so there cannot be vectorized operations on it; vectorized operations, as the name suggests, are for vectors.
What you do is filter on the front end, i.e. once you already have the list. It could be preferable to filter on the back end, i.e. on your vector before the list is created:
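# '\\bd\\b' matches "d" only as a whole word; trimws() and " +" clean up the gap it leaves
l = strsplit(trimws(gsub('\\bd\\b', '', c("a b", "c d"))), " +")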
#[[1]]
#[1] "a" "b"
#[[2]]
#[1] "c"
An alternative solution for front-end filtering (drop the elements that are exactly "d"):
lapply(l, grep, pattern = '^d$', invert = TRUE, value = TRUE)
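Two more options, as a sketch in base R: setdiff() applied per element is a compact built-in for this case, and flattening with unlist() lets the comparison itself run once, vectorized, over all words before split() regroups them by their original list position. The names flat, grp and keep are only illustrative.

```r
l <- strsplit(c("a b", "c d"), " ")

# 1. setdiff() per element: drops "d" (it would also drop duplicate words,
#    which is harmless here because no word occurs twice)
res_setdiff <- lapply(l, setdiff, "d")

# 2. Vectorized on the flattened list: remember which element each word
#    came from, filter the single character vector, then regroup
flat <- unlist(l)
grp  <- factor(rep(seq_along(l), lengths(l)), levels = seq_along(l))
keep <- flat != "d"
res_flat <- unname(split(flat[keep], grp[keep]))

print(res_flat)
```

Because grp is a factor whose levels cover every original position, an element whose words are all removed comes back as an empty character(0) instead of disappearing from the result.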

Related

How to access all sub list elements in R at once?

I have split the strings of a vector like this:
library(stringr)  # provides str_split()
df <- c("Test A:No1", "Test B:No2")
l <- str_split(df, ":")
l
which returns
[[1]]
[1] "Test A" "No1"
[[2]]
[1] "Test B" "No2"
Now I want to access all first elements and all last elements separately, i.e. create vectors like
[1] "Test A" "Test B"
and
[1] "No1" "No2"
I tried several combinations of single and double brackets, with and without commas, but l[[x]][1] or l[[x]][2] only gives me an entry of the single list element x.
How can I access all elements at once (e.g. l[[]][1] )?
You may use sapply.
sapply(l, `[`, 1)
# [1] "Test A" "Test B"
sapply(l, `[`, 2)
# [1] "No1" "No2"
Explanation: in R almost everything is a function, and even the bracket `[` is actually a function. The following example makes clear why the sapply calls above work.
Example
Consider this vector
x <- c("A", "B")
When we write
x[1]
# [1] "A"
x[2]
# [1] "B"
we're actually calling the `[` function, which can equally be written in prefix form:
`[`(x, 1)
# [1] "A"
`[`(x, 2)
# [1] "B"
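For the original question (all first and all last elements), here is a sketch of two base-R variants: vapply() behaves like sapply() but checks that each result is a single string, and when every element has the same length, do.call(rbind, l) turns the list into a matrix whose columns are exactly the requested vectors. This assumes base strsplit(), which returns the same list as stringr::str_split() for this input.

```r
# Base R strsplit() used here instead of stringr::str_split()
l <- strsplit(c("Test A:No1", "Test B:No2"), ":")

# vapply() is like sapply() but insists on exactly one string per element
first <- vapply(l, `[`, character(1), 1)
last  <- vapply(l, function(x) x[length(x)], character(1))

# If every element has the same length, bind them into a matrix
# and take whole columns
m <- do.call(rbind, l)
m[, 1]
m[, 2]
```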
Maybe using unlist and lapply can get the job done.
library(stringr)  # provides str_split()
df <- c("Test A:No1", "Test B:No2")
l <- str_split(df, ":")
> unlist(lapply(l,function(x) x[1]))
[1] "Test A" "Test B"
> unlist(lapply(l,function(x) x[length(x)]))
[1] "No1" "No2"

Searching for specified pattern in R list

I have a dataset stored as a list DataList
[[1]]
[1] a
[2] f
[3] e
[4] a
[[2]]
[1] f
[2] f
[3] e
I am trying to create a function GetFrequence, which returns the frequency of a given pattern in the list DataList:
GetFrequence <- function(pattern, DataList)
{
  freq <- 0
  i <- 1
  while (i <= length(DataList))
  {
    if (.....)
      freq <- freq + 1
    i <- i + 1
  }
  return(freq)
}
My question is how can I search if the given pattern exists in the list?
I assume that by "pattern" you mean the different elements of your list. If so, something like this might be helpful:
First, let us create a list roughly similar to the one you have provided above:
a <- list(letters[1:3], letters[1:2], letters[1:5])
[[1]]
[1] "a" "b" "c"
[[2]]
[1] "a" "b"
[[3]]
[1] "a" "b" "c" "d" "e"
Now, to get the frequency of each item across the whole list, we can unlist the list and stack everything into one vector. Once we have a simple vector, we can use table.
table(unlist(a))
a b c d e
3 3 2 1 1
Note that you may have to use unlist several times, depending on your actual list-structure. That is, if you have a list of lists, it might be necessary to adjust the code somewhat. In that case, please post str(your_list).
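To answer the question in the shape it was asked, here is a sketch of GetFrequence that counts how often pattern occurs as a whole element, assuming DataList holds character vectors (the printout in the question shows no quotes, so the real data might be factors). The loop mirrors the question's skeleton; GetFrequence2 is a hypothetical name for the vectorized equivalent based on unlist().

```r
# Assumption: DataList holds character vectors
DataList <- list(c("a", "f", "e", "a"), c("f", "f", "e"))

# Loop version following the question's skeleton
GetFrequence <- function(pattern, DataList) {
  freq <- 0
  for (i in seq_along(DataList)) {
    # Add the number of matching entries in the i-th vector
    freq <- freq + sum(DataList[[i]] == pattern)
  }
  freq
}

# Hypothetical vectorized equivalent: flatten once, then compare
GetFrequence2 <- function(pattern, DataList) sum(unlist(DataList) == pattern)

GetFrequence("f", DataList)
GetFrequence2("f", DataList)
```

If you instead want the number of list elements that contain the pattern at least once, sum(vapply(DataList, function(x) pattern %in% x, logical(1))) counts those.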

R: trim consecutive trailing and leading special characters from set of strings

I have a list of character vectors, all equal lengths. Example data:
> a = list('**aaa', 'bb*bb', 'cccc*')
> a = sapply(a, strsplit, '')
> a
[[1]]
[1] "*" "*" "a" "a" "a"
[[2]]
[1] "b" "b" "*" "b" "b"
[[3]]
[1] "c" "c" "c" "c" "*"
I would like to identify the indices of all leading and trailing consecutive occurrences of the character *. Then I would like to remove these indices from all three vectors in the list. By trailing and leading consecutive characters I mean e.g. either only a single occurrence as in the third one (cccc*) or multiple consecutive ones as in the first one (**aaa).
After the removal, all three character vectors should still have the same length.
So the first two and the last character should be removed from all three vectors.
[[1]]
[1] "a" "a"
[[2]]
[1] "*" "b"
[[3]]
[1] "c" "c"
Note that the second vector of the desired result still has a leading *; however, it only became the first character after the operation, so it should stay.
I tried using which to identify the indices (sapply(a, function(x)which(x=='*'))) but this would still require some code to detect the trailing ones.
Any ideas for a simple solution?
I would replace the leading and trailing stars with NA:
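aa <- lapply(setNames(a, seq_along(a)), function(x) {
  star = x == "*"
  # TRUE for stars before the first non-star (leading run)
  # or after the last non-star (trailing run)
  toNA = cumsum(!star) == 0 | rev(cumsum(rev(!star))) == 0
  replace(x, toNA, NA)
})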
Store in a data.frame:
DF <- do.call(data.frame, c(aa, list(stringsAsFactors=FALSE)) )
Omit all rows with NA:
res <- na.omit(DF)
# X1 X2 X3
# 3 a * c
# 4 a b c
If you don't want a data frame and would rather have your list back, use lapply(res, I) or c(unclass(res)), which gives
$X1
[1] "a" "a"
$X2
[1] "*" "b"
$X3
[1] "c" "c"
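If you would rather avoid the detour through a data frame, here is a sketch that computes the common cut points directly: count the leading and trailing stars of each vector, trim by the largest counts on both ends, and subset every vector with the same index range. It assumes, as in the question, that all vectors have the same length; n_lead and n_trail are illustrative helper names.

```r
a <- sapply(list('**aaa', 'bb*bb', 'cccc*'), strsplit, '')

# Number of leading "*" in one vector (the whole length if it is all stars)
n_lead <- function(x) {
  w <- which(x != "*")
  if (length(w)) w[1] - 1L else length(x)
}
# Trailing stars are the leading stars of the reversed vector
n_trail <- function(x) n_lead(rev(x))

# Trim as much as the worst vector requires, from both ends
from <- 1L + max(vapply(a, n_lead, integer(1)))
to   <- length(a[[1]]) - max(vapply(a, n_trail, integer(1)))
trimmed <- lapply(a, function(x) if (from <= to) x[from:to] else x[0])
trimmed
```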
First off, as Richard Scriven asked in his comment on your question: your desired output is not the same as what you describe. You ask for removal of leading and trailing characters, but your ideal output is just the 3rd and 4th elements of each character vector.
This would be easily achievable by something like
a <- list('**aaa', 'bb*bb', 'cccc*')
alist = sapply(a, strsplit, '')
lapply(alist, function(x) x[3:4])
Now for an answer as you asked it:
IMHO, sapply() isn't necessary here.
You need a function from the grep family, which operates directly on your strings; they all share one help page, opened with ?grep.
I would propose gsub() and a bit of Regular Expressions for your problem:
a <- list('**aaa', 'bb*bb', 'cccc*')
b <- gsub(pattern = "^(\\*)*", x = a, replacement = "")
c <- gsub(pattern = "(\\*)*$", x = b, replacement = "")
> c
[1] "aaa" "bb*bb" "cccc"
This is probably doable in one regex, but I think you would then need a backreference for the part in between, and I didn't get that to work.
If you are familiar with the magrittr package and its excellent pipe operator, you can do this more elegantly:
library(magrittr)
gsub(pattern = "^(\\*)*", x = a, replacement = "") %>%
gsub(pattern = "(\\*)*$", x = ., replacement = "")
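No backreference seems to be needed for this: an alternation of two anchored patterns removes leading and trailing stars in a single gsub() call. Since R 3.6.0, trimws() also accepts a whitespace argument, which can be pointed at the star character. Both sketches trim each string independently, so unlike the equal-length result in the question, the strings may end up with different lengths.

```r
a <- list('**aaa', 'bb*bb', 'cccc*')

# "^\\*+" matches a run of leading stars, "\\*+$" a run of trailing stars;
# gsub() removes both kinds in one pass
trimmed <- gsub("^\\*+|\\*+$", "", unlist(a))
trimmed

# Same result with trimws(), treating "*" as the character to strip
trimws(unlist(a), whitespace = "[*]")
```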

Sapply different than individual application of function

When applied individually to each element of the vector, my function gives a different result than using sapply. It's driving me nuts!
The object I'm using is this (simplified) list of the arguments another function was called with:
f <- as.list(match.call()[-1])
> f
$ampm
c(1, 4)
To replicate this you can run the following:
foo <- function(ampm) {as.list(match.call()[-1])}
f <- foo(ampm = c(1,4))
Here is my function. It just strips the 'c(...)' from a string.
stripConcat <- function(string) {
sub(')','',sub('c(','',string,fixed=TRUE),fixed=TRUE)
}
When applied directly, it works like this, which is what I want:
> stripConcat(f)
[1] "1, 4"
But when used with sapply, it gives something totally different, which I do NOT want:
> sapply(f, stripConcat)
ampm
[1,] "c"
[2,] "1"
[3,] "4"
Lapply doesn't work either:
> lapply(f, stripConcat)
$ampm
[1] "c" "1" "4"
And neither do any of the other apply functions. This is driving me nuts--I thought lapply and sapply were supposed to be identical to repeated applications to the elements of the list or vector!
The discrepancy you are seeing is, I believe, simply due to how as.character coerces the elements of a list.
x2 <- list(1:3, quote(c(1, 5)))
as.character(x2)
[1] "1:3" "c(1, 5)"
lapply(x2, as.character)
[[1]]
[1] "1" "2" "3"
[[2]]
[1] "c" "1" "5"
f is not a call, but a list whose first element is a call.
is(f)
[1] "list" "vector"
as.character(f)
[1] "c(1, 4)"
> is(f[[1]])
[1] "call" "language"
> as.character(f[[1]])
[1] "c" "1" "4"
sub attempts to coerce anything that is not a character vector into one.
When you pass sub a list, it calls as.character on the list.
When you pass it a call, it calls as.character on that call.
It looks like for your stripConcat function, you would prefer a list as input.
In that case, I would recommend the following for that function:
stripConcat <- function(string) {
if (!is.list(string))
string <- list(string)
sub(')','',sub('c(','',string,fixed=TRUE),fixed=TRUE)
}
Note, however, that string is a misnomer, since it doesn't appear that you are ever planning to pass stripConcat a string. (not that this is an issue, of course)
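Another option that keeps the per-element apply, sketched under the assumption that every list element is a call like c(1, 4): deparse() each element back into its source text before calling stripConcat, so that sub receives one string per element rather than the character vector c("c", "1", "4").

```r
foo <- function(ampm) as.list(match.call()[-1])
f <- foo(ampm = c(1, 4))

stripConcat <- function(string) {
  sub(')', '', sub('c(', '', string, fixed = TRUE), fixed = TRUE)
}

# deparse() turns the call back into its source text, "c(1, 4)",
# so stripConcat() sees a single string for each element
res <- vapply(f, function(e) stripConcat(deparse(e)), character(1))
res
```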

In R, how can a string be split without using a separator

I am trying the split method, and I want the second character of a string that contains only 2 characters.
Example:
string = "AC"
The result should be a split after the first letter ("A"), so that I get:
res
     [,1] [,2]
[1,] "A"  "C"
I tried it with split, but I have no idea how to split after the first character.
strsplit() will do what you want (if I understand your question). You need to split on "" to split the string into its individual characters. Here is an example showing how to do what you want on a vector of strings:
strs <- rep("AC", 3) ## your string repeated 3 times
Next, split each of the three strings:
sstrs <- strsplit(strs, "")
which produces
> sstrs
[[1]]
[1] "A" "C"
[[2]]
[1] "A" "C"
[[3]]
[1] "A" "C"
This is a list, so we can process it with lapply() or sapply(). We need to subset each element of sstrs to select the second element. For this we apply the `[` function:
sapply(sstrs, `[`, 2)
which produces:
> sapply(sstrs, `[`, 2)
[1] "C" "C" "C"
If all you have is one string, then
strsplit("AC", "")[[1]][2]
which gives:
> strsplit("AC", "")[[1]][2]
[1] "C"
split isn't used for this kind of string manipulation. What you're looking for is strsplit, which in your case would be used something like this:
strsplit(string,"",fixed = TRUE)
You may not need fixed = TRUE, but it's a habit of mine as I tend to avoid regular expressions. You seem to indicate that you want the result to be something like a matrix. strsplit will return a list, so you'll want something like this:
strsplit(string,"",fixed = TRUE)[[1]]
and then pass the result to matrix.
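A sketch of that last step: matrix() with nrow = 1 gives the one-row layout from the question, and for several strings of equal length, do.call(rbind, ...) stacks one row per string. The second string "GT" is only an illustrative value.

```r
string <- "AC"
# strsplit() returns a list; [[1]] takes the character vector for the one input
chars <- strsplit(string, "", fixed = TRUE)[[1]]
# One row, one column per character
res <- matrix(chars, nrow = 1)
res

# For several strings of equal length, bind the pieces into one matrix
strs <- c("AC", "GT")
res2 <- do.call(rbind, strsplit(strs, "", fixed = TRUE))
res2
```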
If you are sure that it is always a two-character string (check with all(nchar(x) == 2)) and you only want the second character, you could use sub or substr:
x <- c("ab", "12")
sub(".", "", x)
# [1] "b" "2"
substr(x, 2, 2)
# [1] "b" "2"
