Use the grep function to find words which contain either blue or red - R

I'm using the grep function to select certain column heads. The heads I want to select should contain exactly "red" or "blue".
I got the red thing to work using the following (I stored the column names in a variable called x):
x <- c("Red", "Blue", "blue", "green")
grep("^red$", x, value=TRUE)
But I can't figure out how to look for red OR blue... Any thoughts?
grep("^(red|blue)$", x, value=TRUE)
This doesn't seem to work...

If the search is not supposed to be case-sensitive, then I'd suggest the following:
> x <- c("Red", "Blue", "blue", "green")
> grep("^(red|blue)$",tolower(x))
[1] 1 2 3

grep("red|blue", x, ignore.case=TRUE, value=TRUE) # returns [1] "Red" "Blue" "blue"
If you require the match to be case-sensitive, remove ignore.case=TRUE.
If you require a case-sensitive match to the entire string (which is what you get when you use the assertions ^ and $) then you are basically asking for x[x=="blue"|x=="red"], which may be more efficient than a regex.
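The three behaviours discussed above (anchored alternation, case-insensitive matching, and plain equality) can be compared side by side. This is a Python sketch using the standard `re` module rather than R, and the helper names `grep_value` and `exact_filter` are hypothetical; it mirrors the regex logic, not R's grep() itself.

```python
import re

x = ["Red", "Blue", "blue", "green"]


def grep_value(pattern, items, ignore_case=False):
    """Return the items matching `pattern`, similar to grep(..., value=TRUE)."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [item for item in items if regex.search(item)]


def exact_filter(items, wanted):
    """Whole-string, case-sensitive selection without any regex."""
    wanted = set(wanted)
    return [item for item in items if item in wanted]


# Anchored alternation, case-sensitive: only the lowercase "blue" qualifies.
print(grep_value(r"^(red|blue)$", x))
# Same pattern, case-insensitive: "Red", "Blue" and "blue" all qualify.
print(grep_value(r"^(red|blue)$", x, ignore_case=True))
# Plain equality gives the same result as the anchored, case-sensitive regex.
print(exact_filter(x, ["red", "blue"]))
```

The equality filter is the analogue of x[x=="blue"|x=="red"]: once both anchors are present and case matters, the regex adds nothing over a direct comparison.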

Related

Is there a list of all available color options for `col` in R plot? [duplicate]

Short question, if I have a string, how can I test if that string is a valid color representation in R?
Two things I tried, first uses the function col2rgb() to test if it is a color:
isColor <- function(x)
{
res <- try(col2rgb(x),silent=TRUE)
return(!"try-error"%in%class(res))
}
> isColor("white")
[1] TRUE
> isColor("#000000")
[1] TRUE
> isColor("foo")
[1] FALSE
Works, but doesn't seem very pretty and isn't vectorized. The second thing I tried is to check whether the string is in the colors() vector or is a # followed by 6 to 8 hexadecimal digits:
isColor2 <- function(x)
{
return(x%in%colors() | grepl("^#(\\d|[a-f]){6,8}$",x,ignore.case=TRUE))
}
> isColor2("white")
[1] TRUE
> isColor2("#000000")
[1] TRUE
> isColor2("foo")
[1] FALSE
This works, though I am not sure how stable it is. But shouldn't there be a built-in function for this check?
Your first idea (using col2rgb() to test color names' validity for you) seems good to me, and just needs to be vectorized. As for whether it seems pretty or not ... lots/most R functions aren't particularly pretty "under the hood", which is a major reason to create a function in the first place! Hides all those ugly internals from the user.
Once you've defined areColors() below, using it is easy as can be:
areColors <- function(x) {
sapply(x, function(X) {
tryCatch(is.matrix(col2rgb(X)),
error = function(e) FALSE)
})
}
areColors(c(NA, "black", "blackk", "1", "#00", "#000000"))
# <NA> black blackk 1 #00 #000000
# TRUE TRUE FALSE TRUE FALSE TRUE
Update, given the edit
?par gives a thorough description of the ways in which colours can be specified in R. Any check for a valid colour must consider:
A named colour as listed in colors()
A hexadecimal representation, as a character string, of the form "#RRGGBB" or "#RRGGBBAA", specifying the red, green, blue and (optionally) alpha channels. The alpha channel is for transparency, which not all devices support; hence, while it is valid to specify a colour this way with 8 hex digits, it may not be supported on a specific device.
NA is a valid "colour". It means transparent, but as far as R is concerned it is a valid colour representation.
Likewise "transparent" is also valid, but it is not in colors(), so it needs to be handled as well.
1 is a valid colour representation as it is the index of a colour in a small palette of colours as returned by palette()
> palette()
[1] "black" "red" "green3" "blue" "cyan" "magenta" "yellow"
[8] "gray"
Hence you need to cope with 1:8. This matters because ?par tells us that these indices may also be given as character strings, so you need to accept "1" as a valid colour representation. However (as noted by @hadley in the comments) this only covers the default palette. A user may set a different palette, in which case you have to consider a character index into a palette vector of whatever maximum length your version of R allows.
Once you've handled all those you should be good to go ;-)
To the best of my knowledge there isn't a user-visible function that does this. All of this is buried away inside the C code that does the plotting; very quickly you end up in .Internal(....) land and there be dragons!
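The checklist above can be sketched as a vectorised validator. This is a Python illustration of the rules listed in the answer, not a port of R's internal C code: `NAMED_COLOURS` is a hypothetical, tiny stand-in for colors(), `None` plays the role of NA, and only palette indices 1 to 8 are accepted, following the default-palette rule above. R itself may disagree with this sketch on edge cases, so col2rgb() remains the authoritative test.

```python
import re

# Hypothetical stand-in for R's colors(); the real list is far longer.
NAMED_COLOURS = {"white", "black", "red", "green3", "blue", "goldenrod"}
# "#RRGGBB" or "#RRGGBBAA": exactly 6 or 8 hex digits, any letter case.
HEX_COLOUR = re.compile(r"#(?:[0-9a-f]{6}|[0-9a-f]{8})", re.IGNORECASE)
PALETTE_INDEX = re.compile(r"[0-9]+")


def is_colour(value, named=NAMED_COLOURS, palette_size=8):
    """Apply the answer's checklist to a single value."""
    if value is None:  # NA counts as a valid (transparent) colour
        return True
    if value == "transparent":  # valid, although not part of colors()
        return True
    if value in named:  # a named colour
        return True
    if HEX_COLOUR.fullmatch(value):  # hex form, alpha pair optional
        return True
    if PALETTE_INDEX.fullmatch(value):  # palette index given as a string
        return 1 <= int(value) <= palette_size
    return False


def are_colours(values, **kwargs):
    """Vectorised wrapper: one boolean per input, like areColors()."""
    return [is_colour(v, **kwargs) for v in values]


print(are_colours([None, "black", "blackk", "1", "#00", "#000000"]))
```

Using fullmatch rather than a character-class regex with {6,8} means a 7-digit string such as "#1234567" is rejected, which the isColor2() pattern in the question would accept.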
Original
[To be pedantic #000000 isn't a colour name in R.]
The only colour names R knows are those returned by colors(). Yes, #000000 is one of the colour representations that R understands but you specifically ask about a name and the definitive list or solution is x %in% colors() as you have in your second example.
This is about as stable as it gets. When you use a colour like col = "goldenrod", R internally matches this with a "proper" representation of the colour for whichever device you are plotting on. colors() returns the list of colour names for which R can do this lookup. If it isn't in colors() then it isn't a colour name.

pyparsing how to pass identifiers to the parser

I am trying to pass a list of valid identifiers to the parser. That is to say: I have a list with the identifiers and the parser should use them; I'm passing them as a parameter to the constructor.
Instead of identifiers = Literal('identifier1') | Literal('identifier2') | Literal('identifier whatever') I have an array of identifiers identifiers = ['identifier1', 'identifier2', 'identifier whatever', ... 'identifier I can not what'] that I need to tell pyparsing to use as identifiers.
This is what I've done so far:
def __init__(self, idents):
    if isinstance(idents, list) and idents:
        for identifier in idents:
            # and this is where I got stuck
            # I tried:
            #     identifiers = Literal(identifier)
            # but this keeps only the last one
            pass
How can I achieve this?
The easiest way to convert a list of strings to a list of alternative parse expressions is to use oneOf:
import pyparsing as pp
color_expr = pp.oneOf(["red", "orange", "yellow", "green", "blue", "purple"])
# for convenience could also write as pp.oneOf("red orange yellow green blue purple")
# but since you are working with a list, I am showing code using a list
parsed_colors = pp.OneOrMore(color_expr).parseString("blue orange yellow purple green green")
# use pprint() to list out results because I am lazy
parsed_colors.pprint()
sum(color_expr.searchString("blue 1000 purple, red red swan okra kale 5000 yellow")).pprint()
Prints:
['blue', 'orange', 'yellow', 'purple', 'green', 'green']
['blue', 'purple', 'red', 'red', 'yellow']
So oneOf(["A", "B", "C"]) and the easy-button version oneOf("A B C") are the same as Literal("A") | Literal("B") | Literal("C")
One thing to be careful of with oneOf is that it does not enforce word boundaries
pp.OneOrMore(color_expr).parseString("redgreen reduce").pprint()
will print:
['red', 'green', 'red']
even though the initial 'red' and 'green' are not separate words, and the final 'red' is just the first part of 'reduce'. This is exactly the behavior you would get using an explicit expression built up with Literals.
To enforce word boundaries, you must use the Keyword class, and now you have to use a bit more Python to build this up.
You will need to build up an Or or MatchFirst expression for your alternatives. Usually you build these up using '^' or '|' operators, respectively. But to create one of these using a list of expressions, then you would call the constructor form Or(expression_list) or MatchFirst(expression_list).
If you have a list of strings, you could just create Or(list_of_identifiers), but this would default to converting the strings to Literals, and we've already seen you don't want that.
Instead, use your strings to create Keyword expressions using a Python list comprehension or generator expression, and pass that to the MatchFirst constructor (MatchFirst will be more efficient than Or, and Keyword matching will be safe to use with MatchFirst's short-circuiting logic). The following will all work the same, with slight variations in how the sequence of Keywords is built and passed to the MatchFirst constructor:
# list comprehension
MatchFirst([Keyword(ident) for ident in list_of_identifiers])
# generator expression
MatchFirst(Keyword(ident) for ident in list_of_identifiers)
# map built-in
MatchFirst(map(Keyword, list_of_identifiers))
Here is the color matching example, redone using Keywords. Note how colors embedded in larger words are not matched now:
colors = ["red", "orange", "yellow", "green", "blue", "purple"]
color_expr = pp.MatchFirst(pp.Keyword(color) for color in colors)
sum(color_expr.searchString("redgreen reduce skyblue boredom purple 100")).pprint()
Prints:
['purple']
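pyparsing handles all of this for you; purely for comparison, here is the same Literal-versus-Keyword distinction expressed with Python's standard `re` module. This swaps in a different technique (regular expressions, not pyparsing), and the helper name `alternation` is hypothetical. The lookbehind/lookahead pair approximates Keyword's rule that a match must not touch an identifier character, although `\w` is not exactly the same character set pyparsing uses by default.

```python
import re

colors = ["red", "orange", "yellow", "green", "blue", "purple"]


def alternation(words, whole_words):
    """Build one regex from a list of words, trying longer words first."""
    # Longest-first ordering stops a short word such as "red" from
    # shadowing a longer alternative that starts with it, such as "reddish".
    parts = [re.escape(w) for w in sorted(words, key=len, reverse=True)]
    body = "|".join(parts)
    if whole_words:
        # Keyword-like: no word character directly before or after a match.
        return re.compile(rf"(?<!\w)(?:{body})(?!\w)")
    # Literal-like: matches anywhere, even inside longer words.
    return re.compile(body)


text = "redgreen reduce skyblue boredom purple 100"
print(alternation(colors, whole_words=False).findall(text))
print(alternation(colors, whole_words=True).findall(text))
```

The first call finds 'red', 'green', 'red', 'blue', 'red' and 'purple' embedded in the larger words, while the second finds only 'purple', matching the Keyword result above.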

How to specify a repeated pattern within one string using `sub()` instead of `gsub()` in R

I am aware of the many answers showing how to match multiple occurrences within a single string. However, I couldn't yet find an answer that would provide context as to why the following doesn't work:
## A string for which I want to replace `red` and `Red` with `RED`
x <- c("redflag flagred red and Red")
## This one works using `gsub()`
gsub("\\b(?:red|Red)\\b", "RED", x)
#[1] "redflag flagred RED and RED"
But is there a way to use sub() instead? The following doesn't work. It only matches the first occurrence and then stops:
sub("\\b(?:red|Red)\\b", "RED", x)
#[1] "redflag flagred RED and Red"
When checking the actual pattern it should match: https://regex101.com/r/X7DSB0/1 I am assuming this has something to do with the "global flag"?
I also tried adding a + or {1,} to get multiple matches but that doesn't work either:
## using a `+` doesn't work either
sub("\\b(?:red|Red)+\\b", "RED", x)
#[1] "redflag flagred RED and Red"
## using `{1,}` doesn't work either
sub("\\b(?:red|Red){1,}\\b", "RED", x)
#[1] "redflag flagred RED and Red"
What am I not understanding? How could I use sub() instead of gsub() for such an operation?
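The g in gsub stands for "global," which means that you are telling the regex engine to apply the substitution to every match in the string. On the other hand, sub only replaces the first match it encounters. Adding + or {1,} does not change this: those quantifiers only let a single match span several consecutive repetitions (for example "redRed"); they do not make sub continue on to later, separate matches.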
So the answer to your question is that you should use gsub if you intend to make every possible replacement:
gsub("\\b(?:red|Red)\\b", "RED", x)
[1] "redflag flagred RED and RED"
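The first-match versus every-match distinction is not specific to R. As a comparison sketch only, Python's `re.sub` exposes it through its `count` argument: `count=1` behaves like sub() and the default `count=0` like gsub(). The last line illustrates why a quantifier does not help.

```python
import re

x = "redflag flagred red and Red"
pattern = r"\b(?:red|Red)\b"

# count=1 replaces only the first match, like R's sub().
first_only = re.sub(pattern, "RED", x, count=1)
# The default (count=0) replaces every match, like R's gsub().
every_match = re.sub(pattern, "RED", x)
print(first_only)
print(every_match)

# "+" only lets one match cover the consecutive run "redRed"; the
# later, separate "red" is still left alone when count=1.
quantified = re.sub(r"\b(?:red|Red)+\b", "RED", "redRed and red", count=1)
print(quantified)
```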

Barplot legend with entry only for second sub-bar

I'm new to R and currently experimenting with drawing bar plots for a contingency table. Now I'd like to have a legend in my plot with only one label named "Extra", which corresponds to the second row in my table. I tried
legend.text = c("","Extra")
but this draws two labels, while
legend.text = c(NULL,"Extra")
draws only one label, but with the color of the first sub bar.
Thanks in advance!
I think that I understand what you are asking. You can manipulate the legend through args.legend
barplot(2:1, legend.text=c("", "B"), col=2:3,
args.legend=list(fill=c(NA,3), border=c(NA,1)))
"" is a character vector with a length of 1.
You can check it like this:
length("")
# [1] 1
Also, as I mentioned in my comments (and as is obvious from the above):
identical(NULL, "")
# [1] FALSE
is.null("")
# [1] FALSE
If you pass any character vector to your legend, including this "blank" one, it gets printed, whereas NULL is simply dropped: c(NULL, "Extra") is just "Extra", so the single label ends up paired with the first fill colour.

using gsub to find all values that are NOT equal in R

I am trying to use gsub to change values in an igraph vertex variable to colors before I plot a network graph.
The issue is that my graph has 3 values that I care about, and many others that I'd just like to group as "other" and assign 1 color to.
For example, if I had data that looks like this:
Name  Value
A     1
B     2
C     3
D     4
E     5
and I had code like this:
V(g)$color=V(g)$value #assign the "Value" attribute as the vertex color
V(g)$color=gsub("1","red",V(g)$color) #1 will be red
V(g)$color=gsub("2","blue",V(g)$color) #2 will be blue
V(g)$color=gsub("3", "yellow", V(g)$color) #3 is yellow
What line of code could I add to make 4 and 5 into some other color, (green for example)? Thanks so much for any help you might have!
I would avoid sub (this is not about matching patterns) and do:
my.colors <- c("red", "blue", "yellow", "green")
V(g)$color <- my.colors[match(V(g)$value, c(1, 2, 3), nomatch = 4)]
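The lookup-table idea behind match(..., nomatch = 4) can be sketched with a dictionary and a default value. This is a Python illustration, not igraph code: the helper names `colour_for` and `sequential_replace` and the value 12 are hypothetical. The second part shows one concrete reason to avoid chained text substitutions: they operate on characters, so a multi-digit value gets rewritten piece by piece.

```python
def colour_for(values, lookup, default):
    """Map each value through `lookup`, using `default` when it is absent."""
    return [lookup.get(v, default) for v in values]


def sequential_replace(values, pairs):
    """Chained substring replacement, like the repeated gsub() calls."""
    out = [str(v) for v in values]
    for old, new in pairs:
        out = [s.replace(old, new) for s in out]
    return out


# Hypothetical mapping mirroring my.colors and nomatch = 4 above.
lookup = {1: "red", 2: "blue", 3: "yellow"}
print(colour_for([1, 2, 3, 4, 5], lookup, default="green"))

# "12" becomes "red2" and then "redblue" instead of a single colour.
pairs = [("1", "red"), ("2", "blue"), ("3", "yellow")]
print(sequential_replace([1, 12], pairs))
print(colour_for([1, 12], lookup, default="green"))
```

A whole-value lookup treats 12 as one value and falls back to the default, which is the behaviour the question is after.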
It looks like this suffices for what you want to do:
x <- c("1","2","3","4","5")
gsub("4|5", "green", x)
[1] "1" "2" "3" "green" "green"
Or this
gsub("[^1-3]", "green", x)
[1] "1" "2" "3" "green" "green"
However as pointed out in other answers it looks like a better idea to set up a lookup table mapping numbers to colors and use match to determine the color.
Assuming that after you have made the initial substitutions, the only numbers left are the ones you want to be one uniform color, you can use a regex to match all contiguous digits and put the same color for them.
V(g)$color=gsub("\\d+", "green",V(g)$color)
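The difference between \d+ and a single-character class such as [^1-3] shows up once a value has more than one digit. As a Python comparison sketch (using `re` instead of R's gsub; the values, including "45", are hypothetical), \d+ consumes a whole run of digits in one match, while the character class replaces each character separately.

```python
import re

# Hypothetical state after the initial substitutions: only "other"
# values are still digits.
values = ["red", "blue", "yellow", "4", "5", "45"]

# \d+ matches a whole run of digits, so each leftover value becomes one name.
whole_runs = [re.sub(r"\d+", "green", v) for v in values]
print(whole_runs)

# A single-character class replaces each character on its own, so a
# multi-digit value turns into repeated colour names.
per_char = [re.sub(r"[^1-3]", "green", v) for v in ["4", "5", "45"]]
print(per_char)
```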
See this page for gsub regular expressions.
