What characters are valid for use in SCSS variable names?
If you check out the source for the SASS lexer, you'll see:
# A hash of regular expressions that are used for tokenizing.
REGULAR_EXPRESSIONS = {
:whitespace => /\s+/,
:comment => COMMENT,
:single_line_comment => SINGLE_LINE_COMMENT,
:variable => /(\$)(#{IDENT})/,
:ident => /(#{IDENT})(\()?/,
:number => /(-)?(?:(\d*\.\d+)|(\d+))([a-zA-Z%]+)?/,
:color => HEXCOLOR,
:bool => /(true|false)\b/,
:null => /null\b/,
:ident_op => %r{(#{Regexp.union(*IDENT_OP_NAMES.map{|s| Regexp.new(Regexp.escape(s) + "(?!#{NMCHAR}|\Z)")})})},
:op => %r{(#{Regexp.union(*OP_NAMES)})},
}
Which references the IDENT character set defined in a separate file:
s = if Sass::Util.ruby1_8?
'\200-\377'
elsif Sass::Util.macruby?
'\u0080-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF'
else
'\u{80}-\u{D7FF}\u{E000}-\u{FFFD}\u{10000}-\u{10FFFF}'
end
H = /[0-9a-fA-F]/
UNICODE = /\\#{H}{1,6}[ \t\r\n\f]?/
NONASCII = /[#{s}]/
ESCAPE = /#{UNICODE}|\\[ -~#{s}]/
NMSTART = /[_a-zA-Z]|#{NONASCII}|#{ESCAPE}/
NMCHAR = /[a-zA-Z0-9_-]|#{NONASCII}|#{ESCAPE}/
IDENT = /-?#{NMSTART}#{NMCHAR}*/
So, it looks like variable names can contain:
Any ASCII letter.
Any number 0-9 (as long as it is not the first character in the name).
Underscores and hyphens.
ASCII punctuation (!"#$%&'()*+,./:;<=>?#[]^{|}~) and spaces, if escaped with a backslash.
Unicode characters in the ranges 0080-D7FF, E000-FFFD, or 10000-10FFFF.
Unicode hex escape sequences such as \00E4.
Related
I would like to replace the 70 in between the brackets with a specific string lets say '80'.
from filter[70.00-100.00] --> filter[80.00-100.00]
However when using the following code:
str_replace('filter [70.00-140.00]'," *\\[.*?\\. *",'80')
The output is:
filter8000-140.00]
Is there any way to replace the string between the \ and . (in this case 70) without removing the \ and . ?
To replace any one or more digits after [ use
library(stringr)
str_replace('filter [70.00-140.00]','(?<=\\[)\\d+', '80')
sub('\\[\\d+', '[80', 'filter [70.00-140.00]')
sub('(?<=\\[)\\d+', '80', 'filter [70.00-140.00]', perl=TRUE)
See the online R demo. The (?<=\[)\d+ matches a location immediately preceded with [ and then one or more digits. \[\d+ matches [ and one or more digits, so [ must be restored and thus is added into the replacement pattern.
To replace exactly 70, you can use
library(stringr)
str_replace('filter [70.00-140.00]',"(\\[[^\\]\\[]*)70",'\\180')
# => [1] "filter [80.00-140.00]"
sub('(\\[[^][]*)70','\\180', 'filter [70.00-140.00]')
# => [1] "filter [80.00-140.00]"
See the regex demo. Details:
(\[[^\]\[]*) - Group 1: [, then zero or more chars other than [ and ]
70 - a 70 string.
In the replacement, \1 inserts Group 1 value.
Another solution could be replacing all occurrences of 70 strings inside square brackets that end with a word boundary and are not preceeded with a digit or digit+. (that is, to only match 70 as a whole integer part of a number) with
str_replace_all(
'filter [70.00-140.00]',
'\\[[^\\]\\[]*]',
function(x) gsub('(?<!\\d|\\d\\.)70\\b', '80', x, perl=TRUE))
# => [1] "filter [80.00-140.00]"
Here, \[[^\]\[]*] matches strings between two square brackets having no other square brackets in between and the gsub('(?<!\\d|\\d\\.)70\\b', '80', x, perl=TRUE) is run on the these matched substrings only. The (?<!\d|\d\.)70\b matches any 70 that is not preceded with digit or digit + . and is not followed by another word char (letter, digit or _, or connector punctuation, since ICU regexps are Unicode aware by default).
You can use this. It will replace all digits between the [ and the . :
preg_replace('/(?<=\[)(\d*)(?=\.)/', '80', 'filter [70.00-140.00]');
I'm using HERE's Platform Data Extension to retrieve road names. However, I don't understand the strings that I'm getting. I suspect they're encoded somehow but I don't know how to decode them.
For example:
ENGBNFDR Dr NNASN"e|fe "de "e|rre "dri|ve "nol|te;NASY"e|fe "de "e|rre;<snip>
If I split them by a "record separator" character, e.g. link_names.split('\x1e') the values look slightly more intelligible, but only slightly. There are still bizarre abbreviations I don't understand, e.g. ENGBN.
The PDE Layers documents can be found here: http://pde.cit.api.here.com/1/doc/content.html?detail=1&app_id=xxx&app_code=yyy
Layers > ROAD_NAME_FC1 > NAMES.
List of all names for this object, in all languages, latin1/pinyin/phonetic transliterations.
For convenience, non-exonym base names are listed first.
Format:
NAMES = NAME1 \u001D NAME2 \u001D NAME3 ...
NAME = NAME_TEXT \u001E TRANSLIT1 ; TRANSLIT2 ; ... \u001E PHONEME1 ; PHONEME2 ; ... NAME_TEXT = LANGUAGE_CODE NAME_TYPE IS_EXONYM text
TRANSLIT = LANGUAGE_CODE text
PHONEME = LANGUAGE_CODE IS_PREFERRED text
LANGUAGE_CODE is a 3 character string
NAME_TYPE is one letter (A = abbreviation, B = base name, E = exonym, K = shortened name, S = synonym)
IS_EXONYM = Y if the name is a translation into another language
IS_PREFERRED = Y if this is the preferred phoneme.
Please note, the delimiters are:
\u001D between languages (NAMES level)
\u001E between name text, transliterations, and phonemes ';' between different transliterations and phonemes of the same name.
For a text field, I would like to expose those that contain invalid characters. The list of invalid characters is unknown; I only know the list of accepted ones.
For example for French language, the accepted list is
A-z, 1-9, [punc::], space, àéèçè, hyphen, etc.
The list of invalid charactersis unknown, yet I want anything unusual to resurface, for example, I would want
This is an 2-piece à-la-carte dessert to pass when
'Ã this Øs an apple' pumps up as an anomalie
The 'not contain' notion in R does not behave as I would like, for example
grep("[^(abc)]",c("abcdef", "defabc", "apple") )
(those that does not contain 'abc') match all three while
grep("(abc)",c("abcdef", "defabc", "apple") )
behaves correctly and match only the first two. Am I missing something
How can we do that in R ? Also, how can we put hypen together in the list of accepted characters ?
[a-z1-9[:punct:] àâæçéèêëîïôœùûüÿ-]+
The above regex matches any of the following (one or more times). Note that the parameter ignore.case=T used in the code below allows the following to also match uppercase variants of the letters.
a-z Any lowercase ASCII letter
1-9 Any digit in the range from 1 to 9 (excludes 0)
[:punct:] Any punctuation character
The space character
àâæçéèêëîïôœùûüÿ Any valid French character with a diacritic mark
- The hyphen character
See code in use here
x <- c("This is an 2-piece à-la-carte dessert", "Ã this Øs an apple")
gsub("[a-z1-9[:punct:] àâæçéèêëîïôœùûüÿ-]+", "", x, ignore.case=T)
The code above replaces all valid characters with nothing. The result is all invalid characters that exist in the string. The following is the output:
[1] "" "ÃØ"
If by "expose the invalid characters" you mean delete the "accepted" ones, then a regex character class should be helpful. From the ?regex help page we can see that a hyphen is already part of the punctuation character vector;
[:punct:]
Punctuation characters:
! " # $ % & ' ( ) * + , - . / : ; < = > ? # [ \ ] ^ _ ` { | } ~
So the code could be:
x <- 'Ã this Øs an apple'
gsub("[A-z1-9[:punct:] àéèçè]+", "", x)
#[1] "ÃØ"
Note that regex has a predefined, locale-specific "[:alpha:]" named character class that would probably be both safer and more compact than the expression "[A-zàéèçè]" especially since the post from ctwheels suggests that you missed a few. The ?regex page indicates that "[0-9A-Za-z]" might be both locale- and encoding-specific.
If by "expose" you instead meant "identify the postion within the string" then you could use the negation operator "^" within the character class formalism and apply gregexpr:
gregexpr("[^A-z1-9[:punct:] àéèçè]+", x)
[[1]]
[1] 1 8
attr(,"match.length")
[1] 1 1
declare variable $fb := doc("factbook.xml")/mondial;
for $c in $fb//country
where ($c/encompassed/#continent = 'f0_119') and ($c/#population < 100000)
return concat('Country: ',$c/name, ', Population: ',$c/#population);
it returns:
Type Error: Type of value '
()
' does not match sequence type: xs:anyAtomicType?
At characters 11681-11698
At File "q2_3.xq", line 4, characters 13-67
At File "q2_3.xq", line 4, characters 13-67
At File "q2_3.xq", line 4, characters 13-67
however, if i do not do a concat return, just name or population it will work, and most strange thing is i have another program :
declare variable $fb := doc("factbook.xml")/mondial;
for $c in $fb//country
where $c/religions = 'Seventh-Day Adventist'
order by $c/name
return concat('Country: ',$c/name, ', Population: ',$c/#population);
The return syntax is exactly same, however, it works.
Why this happens?
Without seeing an example of your data it's impossible to say for sure, but if $c/name returns more than one value, then your error would make sense. Do you have any results where there are more than one name element?
I'm storing some files in database which has filename like 1839341255115211butterflies.jpg.I need to show this filename to the user as butterflies.jpg.I need to remove the first 16 digit and then show the filename.Added to it I also have few filenames which don't have this 16digit addition prior to the filename.Now my question is how do I identify if this string has 16digit numeric value prior to the filename, based on it remove the 1st 16digit and display just the filename. I'm aware of how to remove the first 16digit and retrive the filename but need help on how to identify a string that has 16digit.
Any suggestion is much appreciated.
A regular expression looks like a good fit here:
^[0-9]{16}
The above will match on strings that start with 16 digits (0 to 9).
Usage:
if(Regex.Match(fileName, #"^[0-9]{16}").Success)
{
fileName = fileName.Remove(0, 16);
}
string.Remove will work quite nicely:
var str = "1839341255115211butterflies.jpg";
str = str.Remove(0, 16);
Console.WriteLine(str);
With Linq:
remove all digits at the beginning until 16 digits:
string file = "1839341255115211butterflies.jpg";
string extension = Path.GetExtension(file);
string fileName = Path.GetFileNameWithoutExtension(file);
fileName = new string(fileName.Where((c, i) => i >= 17 || !Char.IsDigit(c)).ToArray());
file = fileName + extension;
Demo
Edit: If you just want to know if the first 16 chars are digits, it's easier and more readable:
bool startsWith16Digits = file.Take(16).All(Char.IsDigit);