Is there a "quote words" operator in R? [duplicate]

This question already has answers here:
Does R have quote-like operators like Perl's qw()?
(6 answers)
Closed 5 years ago.
Is there a "quote words" operator in R, analogous to qw in Perl? qw is a quoting operator that allows you to create a list of quoted items without having to quote each one individually.
Here is how you would do it without qw (i.e. using dozens of quotation marks and commas):
#!/usr/bin/env perl
use strict;
use warnings;
my @NAM_founders = ("B97", "CML52", "CML69", "CML103", "CML228", "CML247", "CML277",
"CML322", "CML333", "Hp301", "Il14H", "Ki3", "Ki11", "Ky21",
"M37W", "M162W", "Mo18W", "MS71", "NC350", "NC358", "Oh43",
"Oh7B", "P39", "Tx303", "Tzi8",
);
print(join(" ", @NAM_founders)); # Prints array, with elements separated by spaces
Here is the same thing with qw, which is much cleaner:
#!/usr/bin/env perl
use strict;
use warnings;
my @NAM_founders = qw(B97 CML52 CML69 CML103 CML228 CML247 CML277
CML322 CML333 Hp301 Il14H Ki3 Ki11 Ky21
M37W M162W Mo18W MS71 NC350 NC358 Oh43
Oh7B P39 Tx303 Tzi8
);
print(join(" ", @NAM_founders)); # Prints array, with elements separated by spaces
I have searched but not found anything.

Try using scan() with a text connection; its text argument opens the connection and closes it again for you:
qw <- function(s) scan(text = s, what = "")
NAM <- qw("B97 CML52 CML69 CML103 CML228 CML247 CML277
CML322 CML333 Hp301 Il14H Ki3 Ki11 Ky21
M37W M162W Mo18W MS71 NC350 NC358 Oh43
Oh7B P39 Tx303 Tzi8")
This will always return a vector of strings even if the data in quotes is numeric:
> qw("1 2 3 4")
Read 4 items
[1] "1" "2" "3" "4"
I don't think you'll get much simpler, since space-separated bare words aren't valid syntax in R, even wrapped in curly brackets or parens. You've got to quote them.
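If you are willing to separate the words with commas instead of spaces, the quotes can be dropped by capturing the unevaluated arguments. This is a minimal sketch with a hypothetical helper (qw_bare is not part of base R); it only works for words that are syntactically valid R names, so something like 2abc would still be a parse error.

```r
# Hypothetical helper, not part of base R: turns bare, comma-separated
# words into a character vector without evaluating them.
qw_bare <- function(...) {
  # substitute() returns the call list(B97, CML52, ...) unevaluated,
  # so the bare words are never looked up as variables
  args <- as.list(substitute(list(...)))[-1L]
  # deparse() turns each captured symbol back into its text
  vapply(args, deparse, character(1))
}

qw_bare(B97, CML52, CML69, Hp301, Il14H, Ki3)
#[1] "B97"   "CML52" "CML69" "Hp301" "Il14H" "Ki3"
```

The trade-off is that you swap quotes for commas, so it is only marginally less typing than c("B97", ...).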

For R, the closest thing that I can think of, or that I've found so far, is to create a single block of text and then break it up using strsplit, thus:
#!/usr/bin/env Rscript
NAM_founders <- "B97 CML52 CML69 CML103 CML228 CML247 CML277
CML322 CML333 Hp301 Il14H Ki3 Ki11 Ky21
M37W M162W Mo18W MS71 NC350 NC358 Oh43
Oh7B P39 Tx303 Tzi8"
NAM_founders <- unlist(strsplit(NAM_founders,"[ \n]+"))
print(NAM_founders)
Which prints
[1] "B97" "CML52" "CML69" "CML103" "CML228" "CML247" "CML277" "CML322"
[9] "CML333" "Hp301" "Il14H" "Ki3" "Ki11" "Ky21" "M37W" "M162W"
[17] "Mo18W" "MS71" "NC350" "NC358" "Oh43" "Oh7B" "P39" "Tx303"
[25] "Tzi8"
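One thing to watch with the "[ \n]+" pattern: a string that begins with a space or newline produces an empty "" as its first element, and tabs are not treated as separators. A minimal sketch of a more forgiving variant (qw_split is a hypothetical name) trims both ends first and splits on the [:space:] class:

```r
# Hypothetical helper: split a block of text on any run of whitespace.
qw_split <- function(s) {
  # trimws() removes leading/trailing whitespace first; without it, a string
  # that starts with whitespace gives an empty "" first element
  strsplit(trimws(s), "[[:space:]]+")[[1]]
}

NAM_founders <- qw_split("
  B97 CML52 CML69
  CML103\tCML228 CML247
")
print(NAM_founders)
#[1] "B97"    "CML52"  "CML69"  "CML103" "CML228" "CML247"
```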

Related

Insert characters when a string changes its case R

I would like to insert characters in the places where a string changes its case. I tried this to insert a '\n' after a fixed number of characters and then a ' ', since I can't figure out how to detect the case change:
s <-c("FloridaIslandE7", "FloridaIslandE9", "Meta")
gsub('^(.{7})(.{6})(.*)$', '\\1\\\n\\2 \\3', s )
[1] "Florida\nIsland E7" "Florida\nIsland E9" "Meta"
This works because the positions are fixed but I would like to know how to do it for the general case.
Surely there's a less convoluted regex for this, but you could try:
gsub('([A-Z][0-9])', ' \\1', gsub('([a-z])([A-Z][a-z])', '\\1\n\\2', s))
Output:
[1] "Florida\nIsland E7" "Florida\nIsland E9" "Meta"
Here is an option using stringr with zero-width lookarounds:
library(stringr)
str_replace_all(s, "(?<=[a-z])(?=[A-Z])", "\n")
#[1] "Florida\nIsland\nE7" "Florida\nIsland\nE9" "Meta"
If you really want to insert \n, try this:
gsub("([a-z])([A-Z])", "\\1\\\n\\2", s)
[1] "Florida\nIsland\nE7" "Florida\nIsland\nE9" "Meta"
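The stringr answer can also be written in base R, because gsub() supports lookarounds when perl = TRUE. The sketch below also reproduces the exact output asked for, under an assumption taken from the examples rather than stated in the question: an uppercase letter followed by a lowercase one starts a new word, while an uppercase letter followed by a digit is a code such as E7.

```r
s <- c("FloridaIslandE7", "FloridaIslandE9", "Meta")

# Base-R equivalent of the str_replace_all() answer: (?<=[a-z]) and (?=[A-Z])
# are zero-width, so "\n" is inserted between the two characters
# without consuming either of them.
every_change <- gsub("(?<=[a-z])(?=[A-Z])", "\n", s, perl = TRUE)
every_change
#[1] "Florida\nIsland\nE7" "Florida\nIsland\nE9" "Meta"

# Variant for the asker's target format (assumed rules):
# newline before Upper+lower, space before Upper+digit.
out <- gsub("(?<=[a-z])(?=[A-Z][a-z])", "\n", s, perl = TRUE)
out <- gsub("(?<=[a-z])(?=[A-Z][0-9])", " ", out, perl = TRUE)
out
#[1] "Florida\nIsland E7" "Florida\nIsland E9" "Meta"
```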

Remove space in print statement in python

While using the below print command:
print(k,':',dict[k])
I get the output shown below, but I want to remove the space between the key and the colon. How can I do it?
Current Output:
Sam : 40
Required Output:
Sam: 40
You could try printing a single string built by concatenation (str() is needed when the value is not already a string, such as the integer 40):
print(k + ': ' + str(dict[k]))
Python's print() function has a sep parameter that defaults to a single space, so each of the comma-separated arguments you pass is printed with a space between it and the next one.
I think what you are looking for is
print(k, ': ', dict[k], sep='')
Sam: 40
Simply specifying the sep parameter solves your issue.

How can I use R Regular Expressions to catch a Hebrew word?

I've been trying to catch the word
עונה ("season")
plus the number that follows it in a string such as
כל הילדים אוכלים, עונה 2 , פרק 8-לזניית ירקות וסלמון בדבש
(roughly: "All the kids eat, season 2, episode 8 - vegetable lasagna and salmon in honey")
Demonstrating it on Regex101.com was straightforward enough, with עונה(\s+\d+|\d+), but with R I came up empty.
library(stringr)
str<-"כל הילדים אוכלים, עונה 2 , פרק 8-לזניית ירקות וסלמון בדבש"
exp<-"עונה(\\s+\\d+|\\d+)"
str_extract_all(str,exp)
Output:
[[1]]
character(0)
You can match Hebrew letters by their Unicode block (U+0590 to U+05FF) instead of typing them literally:
"[\u0590-\u05FF]+"
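The question does not show why the R pattern found nothing; one common cause, assumed here rather than confirmed, is that literal Hebrew in the script is read in a different encoding than the data. A minimal sketch that sidesteps this by writing the word with Unicode escapes and using base regmatches()/gregexpr() instead of stringr; the input string is a hypothetical stand-in built from the escaped word.

```r
# The word from the question, written with Unicode escapes
# (U+05E2 U+05D5 U+05E0 U+05D4) so the source file's encoding is irrelevant
word <- "\u05E2\u05D5\u05E0\u05D4"

# Same idea as the question's pattern: (\s+\d+|\d+) is equivalent to \s*\d+
pattern <- paste0(word, "\\s*\\d+")

# Hypothetical input: the escaped word embedded in otherwise plain text
x <- paste0("title, ", word, " 2 , episode 8")

hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
hits                                   # the word followed by " 2"
sub(paste0(word, "\\s*"), "", hits)    # keep only the number
#[1] "2"
```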

R regex match things other than known characters

For a text field, I would like to expose those that contain invalid characters. The list of invalid characters is unknown; I only know the list of accepted ones.
For example, for French the accepted list is
A-z, 1-9, [:punct:], space, àéèçè, hyphen, etc.
The list of invalid characters is unknown, yet I want anything unusual to surface. For example, I would want
This is an 2-piece à-la-carte dessert to pass, while
'Ã this Øs an apple' pops up as an anomaly.
The 'not contain' notion in R does not behave as I would like, for example
grep("[^(abc)]",c("abcdef", "defabc", "apple") )
(intended to find those that do not contain 'abc') matches all three, while
grep("(abc)",c("abcdef", "defabc", "apple") )
behaves correctly and matches only the first two. Am I missing something?
How can we do that in R? Also, how can we include the hyphen in the list of accepted characters?
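The reason the first grep matches everything is that square brackets define a character class, not a substring. A short sketch of the difference, together with the usual way to select elements that do not contain a substring:

```r
x <- c("abcdef", "defabc", "apple")

# [^(abc)] is a character class: it matches ONE character that is not
# "(", "a", "b", "c" or ")". Every element has such a character
# (d, e, f, p, l), so all three match.
grep("[^(abc)]", x)
#[1] 1 2 3

# To find elements that do NOT contain the substring "abc",
# negate the match itself instead:
grep("abc", x, invert = TRUE)
#[1] 3
x[!grepl("abc", x)]
#[1] "apple"
```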
[a-z1-9[:punct:] àâæçéèêëîïôœùûüÿ-]+
The above regex matches any of the following (one or more times). Note that the argument ignore.case = TRUE used in the code below makes it also match the uppercase variants of the letters.
a-z Any lowercase ASCII letter
1-9 Any digit in the range from 1 to 9 (excludes 0)
[:punct:] Any punctuation character
The space character
àâæçéèêëîïôœùûüÿ Any valid French character with a diacritic mark
- The hyphen character
x <- c("This is an 2-piece à-la-carte dessert", "Ã this Øs an apple")
gsub("[a-z1-9[:punct:] àâæçéèêëîïôœùûüÿ-]+", "", x, ignore.case = TRUE)
The code above replaces all valid characters with nothing. The result is all invalid characters that exist in the string. The following is the output:
[1] "" "ÃØ"
If by "expose the invalid characters" you mean delete the "accepted" ones, then a regex character class should be helpful. From the ?regex help page we can see that a hyphen is already part of the punctuation character class:
[:punct:]
Punctuation characters:
! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
So the code could be:
x <- 'Ã this Øs an apple'
gsub("[A-z1-9[:punct:] àéèçè]+", "", x)
#[1] "ÃØ"
Note that regex has a predefined, locale-specific "[:alpha:]" named character class that would probably be both safer and more compact than "[A-zàéèçè]", especially since the post from ctwheels suggests that you missed a few. The range A-z is also wider than it looks: in ASCII it includes the characters [ \ ] ^ _ and ` that sit between Z and a. The ?regex page indicates that ranges such as "[0-9A-Za-z]" might be both locale- and encoding-specific.
If by "expose" you instead meant "identify the position within the string", then you could use the negation operator "^" within the character class and apply gregexpr:
gregexpr("[^A-z1-9[:punct:] àéèçè]+", x)
[[1]]
[1] 1 8
attr(,"match.length")
[1] 1 1
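Combining the two answers, here is a sketch that flags whole strings and also lists the offending characters. Assumptions: the allow-list is the one from the first answer, spelled with \u escapes so the script's file encoding does not matter; digits are 0-9 rather than 1-9; uppercase accented letters (É, À, ...) are not allowed. The function names are hypothetical. The hyphen is placed last in the class, which is how to include it literally (it is also already in [:punct:]).

```r
# French letters with diacritics, as \u escapes so the pattern does not depend
# on the script's file encoding: à â æ ç é è ê ë î ï ô œ ù û ü ÿ
accents <- paste0("\u00e0\u00e2\u00e6\u00e7\u00e9\u00e8\u00ea\u00eb",
                  "\u00ee\u00ef\u00f4\u0153\u00f9\u00fb\u00fc\u00ff")

# Negated class: matches ONE character that is not in the allow-list.
# The hyphen sits just before "]" so it is read literally, not as a range.
invalid_pattern <- paste0("[^A-Za-z0-9[:punct:] ", accents, "-]")

# Hypothetical helpers (names are illustrative only)
has_invalid   <- function(x) grepl(invalid_pattern, x)
invalid_chars <- function(x) regmatches(x, gregexpr(invalid_pattern, x))

x <- c("This is an 2-piece \u00e0-la-carte dessert",  # à-la-carte
       "\u00c3 this \u00d8s an apple")                 # Ã ... Ø
has_invalid(x)
#[1] FALSE  TRUE
invalid_chars(x)[[2]]   # the two offending characters, Ã and Ø
```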

grep on two strings

I'm working to grab two different elements in a string.
The strings look like this:
str <- c('a_abc', 'b_abc', 'abc', 'z_zxy', 'x_zxy', 'zxy')
I have tried the different options in ?grep, but I can't get it right. I'm doing something like this:
grep('[_abc]:[_zxy]',str, value = TRUE)
and what I would like is:
[1] "a_abc" "b_abc" "z_zxy" "x_zxy"
Any help would be appreciated.
Use normal parentheses ( ) for grouping, not square brackets [ ], which define a character class that matches a single character:
grep('_(abc|zxy)',str, value = TRUE)
[1] "a_abc" "b_abc" "z_zxy" "x_zxy"
To make the grep a bit more flexible, you could do something like:
grep('_.{3}$',str, value = TRUE)
Which will match an underscore _ followed by any character . three times {3} followed immediately by the end of the string $
This should work: grep('_abc|_zxy', str, value = TRUE)
X|Y matches when either X matches or Y matches
In this case just doing:
str[grep("_",str)]
will work... is it more complicated in your specific case?
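A small sketch contrasting the original attempt with the working patterns, plus a regex-free option using base endsWith(). The $ anchor is an addition here and assumes the suffix should end the string:

```r
x <- c("a_abc", "b_abc", "abc", "z_zxy", "x_zxy", "zxy")

# The original attempt needs a literal ":" that never occurs, so nothing matches
grep("[_abc]:[_zxy]", x, value = TRUE)
#character(0)

# A character class matches ONE character: TRUE for any element containing
# "_", "a", "b" or "c" anywhere ("zxy" is the only one without)
grepl("[_abc]", x)
#[1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

# Alternation inside a group matches whole alternatives; "$" anchors them
# to the end of the string
grep("_(abc|zxy)$", x, value = TRUE)
#[1] "a_abc" "b_abc" "z_zxy" "x_zxy"

# Regex-free alternative: fixed-string suffix tests
x[endsWith(x, "_abc") | endsWith(x, "_zxy")]
#[1] "a_abc" "b_abc" "z_zxy" "x_zxy"
```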
