I've tried to follow a tutorial to add a comment rule in Beyond Compare, but I am still unable to mark the commented lines as unimportant differences. I would like to compare R files. This is how I configured the grammar rules.
If possible I would like to ignore the commented line only if the content of the line is equal. In other words if by removing the comment the two lines would actually differ I would still like to have them marked as important differences.
Here is the actual result of the comparison. Strangely, when there are two comment symbols (#), the line appears as a minor difference.
Beyond Compare doesn't support what you're trying to do. The comparison for each character checks both the character itself and the grammar type of the element. For example, comparing an identifier to a string will always show the characters as completely different even if the strings themselves are identical.
In your example, since they're different grammar types, every character is considered a difference. On the left they're comments, so unimportant and normally drawn as blue differences, but you're ignoring unimportant differences so they're shown as matching/black instead. On the right, they're important text, so they're drawn as red differences.
The lines that are comments on both sides are showing as matching because (A) they're all the same character and grammar type, so, aside from the # leading character, they are treated as matches, and (B) you're ignoring unimportant differences. (B) means that you could actually have anything for the content of the comments on each side and it would still show up as matching.
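To make the rule above concrete, here is a toy model in Python. It is not Beyond Compare's code or API; the `Element` type and the `classify` and `same` helpers are hypothetical names invented for illustration, and "a leading # starts a comment" is an assumed grammar. It only shows why identical text with a different grammar type still counts as a difference.

```python
from typing import NamedTuple


class Element(NamedTuple):
    """A line reduced to its text and its grammar type (toy model only)."""
    text: str
    kind: str  # "comment" or "code"


def classify(line: str) -> Element:
    """Assumed grammar: a line whose first non-blank character is # is a comment."""
    stripped = line.strip()
    if stripped.startswith("#"):
        return Element(stripped[1:].strip(), "comment")
    return Element(stripped, "code")


def same(left: str, right: str) -> bool:
    """Two lines match only if BOTH the text and the grammar type agree."""
    return classify(left) == classify(right)


# Same text, different grammar type: still reported as different.
commented_vs_code = same("# x <- 1", "x <- 1")
# Same text, same grammar type: reported as matching.
code_vs_code = same("x <- 1", "x <- 1")
```

In this model the commented-out line and the live line never compare equal, which mirrors the behaviour described in the answer.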
I'm working on a PDF to HTML project. In the original .ai file, some numeric characters are displayed in a box:
Although I know the font used in the file is GothicMB101Pro DeBold-83pv-RKSJ-H, I don't have the font file on my machine (and of course the original designer is long gone). In my copy of Illustrator, it appears like this:
The 1) part is one single character - not "1" and ")", so at least I know it's not some form of kerning but some unicode character. But I couldn't find any match in my search. The "enclosed numeric" characters ① aren't the same.
Since I'm not sure which character it is, and since I'm not very knowledgeable in Japanese (where this seems to be a very common occurrence), I haven't been able to satisfy my client's requirement.
What are those characters and how do I get them onscreen?
I would guess that, since the output you see without the original font installed consists of two characters, the original also consisted of two characters: the first a regular one (here, the digit 1) and the second a combining character. There is a combining enclosing square, and this is probably what is rendered as the closing parenthesis ")" in your output. Using the digit 1 followed by the enclosing square (at least in my browser, in the Stack Overflow answer editor) gives me the required result, as shown below:
1⃞
If your font does not render the enclosing square, it is probably the fault of your font, that is used as a fallback. But without knowing which font exactly is used as a replacement, it is hard to say if it is possible to work around the issue.
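If you want to check which code points are involved, a small Python sketch like the following may help. It assumes the second character really is U+20DE (COMBINING ENCLOSING SQUARE), which is my guess and not something confirmed from the original .ai file; the variable names are only illustrative.

```python
import unicodedata

# Digit one followed by the combining enclosing square (assumed to be U+20DE).
boxed_one = "1" + "\u20DE"

# List each code point with its official Unicode name and general category.
details = [
    (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
    for ch in boxed_one
]

for code_point, name, category in details:
    print(code_point, name, category)
```

Running the same loop over the text copied out of the document should tell you whether it contains two code points or a single precomposed one.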
This is my first entry on Stack Overflow, so please be indulgent if my post is lacking in quality.
I want to learn some web scraping with R and started with a simple example: extracting a table from a Wikipedia page.
I managed to download the specific page and identified the HTML sections I am interested in:
<td style="text-align:right">511.000.000\n</td>
Now I want to extract the number from the table data by using a regex. So I created a regex which, from my point of view, should match the structure of the number:
pattern<-"\\d*\\.\\d*\\.\\d*\\.\\d*\\."
I also tried other variations, but none of them found the number within the HTML code. I wanted to keep the pattern open, as the numbers might be in the hundreds, thousands, millions, or billions.
My questions:
1. The number is within the HTML code. Might it be necessary to include some code for the non-number code (which should not be extracted)?
2. What would be the correct version of the pattern to identify the number correctly?
Thank you very much for your support!!
So many stars implies a lot of backtracking.
One further point: using \\d* would match more than 3 digits in any group, and would also match a group with no digits at all.
Assuming your numbers are always integers, formatted using a . as thousand separator, you could use the following: \\d{1,3}(?:\\.\\d{3})* (note the usage of non-capturing group construct (?:...) - implying the use of perl = TRUE in arguments, as mentioned in Regular Expressions as used in R).
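Here is a minimal sketch of that pattern in action, written in Python and not in R (the regex constructs used are the same; in a Python raw string the backslashes are single). The `extract_number` helper and the extra sample cells are hypothetical, and the sketch assumes the first digit run in the cell is the number you want.

```python
import re

# The table cell from the question.
html = '<td style="text-align:right">511.000.000\n</td>'

# 1 to 3 leading digits, then any number of ".ddd" groups.
THOUSANDS = re.compile(r"\d{1,3}(?:\.\d{3})*")


def extract_number(cell):
    """Return the first dot-separated integer in `cell`, or None if absent."""
    match = THOUSANDS.search(cell)
    if match is None:
        return None
    # Drop the thousand separators before converting.
    return int(match.group().replace(".", ""))


population = extract_number(html)
print(population)
```

Because the surrounding tags contain no digits, there is no need to describe the non-number part of the HTML in the pattern; searching skips over it.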
Look closely at your regex. You are assuming that the number will have 4 periods (\\.) in it, but in your own example there are only two periods. It's not going to match because while the asterisk marks \\d as optional (zero or more), the periods are not marked as optional. If you add a ? modifier after the 3rd and 4th period, you may find that your pattern starts matching.
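A quick way to see the effect of the mandatory periods is to try both versions on the number alone. This is a Python sketch and not R, so the backslashes are single; `relaxed` is just my reading of "add a ? after the 3rd and 4th period".

```python
import re

text = "511.000.000"

# Pattern from the question: four mandatory periods.
original = r"\d*\.\d*\.\d*\.\d*\."
# Same pattern with the 3rd and 4th periods made optional.
relaxed = r"\d*\.\d*\.\d*\.?\d*\.?"

original_match = re.search(original, text)
relaxed_match = re.search(relaxed, text)

print(original_match)          # no match: the text has only two periods
print(relaxed_match.group())   # the whole number
```

Note that the relaxed version still insists on two periods, so it would miss a number such as 511.000.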
The base R function factor() interprets character elements consisting of blank space as valid factor elements instead of NA. What is the benefit of interpreting blank space character elements like this? Is it a legacy feature that is kept as it is to maintain compatibility?
Example:
factor(c("a","a","","b"))
I realize that this isn't an ordinary problem that can be solved with a reproducible example as a starting point, but I decided to give it a try anyway. The design decision to have factor() interpret blank space character elements like this confounds me. It seems to me that it would simplify things with no clear disadvantages to interpret these elements as NA instead.
What is the benefit of interpreting blank space character elements like this?
Because empty string data usually means “this is an empty string”, and not “this is missing data”.
It depends on the usage, of course: an empty “name” field is most likely missing data. But an empty “title” field is just that: no title. How else would you encode the lack of a title (assuming “Mr” and “Mrs” have a separate field, which may not be the case)?
For factors, having empty labels makes less sense. However, R tends to convert strings to factors quite liberally (especially when reading tabular data from files), and treating all those empty values as NA would cause a lot of mis-annotated data. In general, such implicit conversions should always be lossless, i.e. preserve the whole domain of values being converted.
Alright, I've been given a program that requires me to take a .txt file of varying symbols in rows and columns that would look like this.
..........00
...0....0000
...000000000
0000.....000
............
..#########.
..#...#####.
......#####.
...00000....
and, using command arguments to specify row and column, requires me to select a symbol and replace that symbol with an asterisk. The problem I have is that it then requires me to recurse up, down, left, and right onto any of the same symbol and change those into asterisks as well.
As I understand it, if I were to enter "1 2" into my argument list, it would change the above text into:
**********00
***0....0000
***000000000
0000.....000
............
..#########.
..#...#####.
......#####.
...00000....
While selecting the specified character itself isn't a problem, how do I get any similar, adjacent symbols to change, and then the ones next to those? I have looked around but can't find any information, and as my teacher has had different subs for the last 3 weeks, I haven't had a chance to clarify my questions with them. I've been told that recursion can be used, but my actual experience with recursion is limited. Any suggestions or links I can follow to get a better idea of what to do? Would it make sense to add a recursive method that takes the given coordinates, adds to and subtracts from the row and column respectively to check whether the symbol is the same, and repeats?
Load the file char by char, row by row, into a 2D array of characters. That'll make it a lot easier to move up, down, left, and right: all you need to do is change one of the array indexes.
You can also take advantage of recursion. Make a function that changes all adjacent matching characters, and then call that same function on all adjacent matching characters.
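A minimal sketch of that idea in Python (the question doesn't say which language the assignment uses, so treat this as pseudocode that happens to run). The names `flood_fill` and `fill_from` are made up, and I'm assuming the row and column arguments are zero-based.

```python
def flood_fill(grid, row, col, target, replacement="*"):
    """Replace the 4-connected region of `target` that contains (row, col)."""
    # Stop when we walk off the grid.
    if row < 0 or row >= len(grid) or col < 0 or col >= len(grid[row]):
        return
    # Stop when the cell is a different symbol (or already replaced).
    if grid[row][col] != target:
        return
    grid[row][col] = replacement
    # Recurse up, down, left, and right.
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        flood_fill(grid, row + d_row, col + d_col, target, replacement)


def fill_from(text, row, col):
    """Build the 2D character array, fill from (row, col), and rejoin the rows."""
    grid = [list(line) for line in text.splitlines()]
    target = grid[row][col]
    if target != "*":  # guard: filling "*" with "*" would never terminate
        flood_fill(grid, row, col, target)
    return "\n".join("".join(cells) for cells in grid)


picture = "\n".join([
    "..........00",
    "...0....0000",
    "...000000000",
    "0000.....000",
    "............",
    "..#########.",
    "..#...#####.",
    "......#####.",
    "...00000....",
])

result = fill_from(picture, 1, 2)
print(result)
```

One thing to be aware of: with up/down/left/right adjacency, the four dots in the middle of the second row touch the first row, so they get filled too (`***0****0000`), which differs slightly from the expected output written in the question. For very large regions, deep recursion can hit the recursion limit, in which case an explicit stack or queue does the same job.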
I thought to ask this as an update to my previous similar question but it became too long.
I was trying to understand a regex given on w3.org that matches CSS comments, and this question came up.
Why do they use
\/\*[^*]*\*+([^/*][^*]*\*+)*\/
----------------^
instead of just
\/\*[^*]*\*+([^/][^*]*\*+)*\/
?
Both are working similarly. Why do they have an extra star there?
Let's look at this part:
\*+([^/*][^*]*\*+)*
-A- --B--     -C-
The regex engine will parse the A part and match all the stars until there are no more stars or there is a line break. So once A is done, the next character must be a line break or anything else that's not a star. Then why did they use [^/*] instead of [^/]?
Also look at the repeating capturing group.
([any one char that's not / or *][zero or more chars that's not *][one or more stars])
It captures groups of characters ending with one or more stars. So C will take all the stars, leaving B with no stars to match in the next round.
So the B part won't get a chance to meet any stars at all. That is why I think there's no need to put a star there.
But that regex is in w3.org so I guess my understanding may be wrong. Please explain what I'm missing.
This has already been corrected in the CSS3 Syntax module:
\/\*[^*]*\*+([^/][^*]*\*+)*\/ /* ignore comments */
Notice that the extraneous asterisk is gone, making this expression identical to what you have.
So it would seem that it was simply a mistake on their part while writing the grammar for CSS2. I'm digging the mailing list archives to see if there's any discussion there that could be relevant.
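For what it's worth, here is a quick spot check of the two expressions in Python. It is only a sketch: the sample strings are my own, and it only shows that both patterns return the same result for these inputs with Python's backtracking `re.match`. It is not a proof of equivalence, and it says nothing about how a lex-style longest-match scanner would treat them.

```python
import re

# CSS2 version, with the extra asterisk inside the character class.
CSS2_COMMENT = re.compile(r"\/\*[^*]*\*+([^/*][^*]*\*+)*\/")
# CSS3 version, without it.
CSS3_COMMENT = re.compile(r"\/\*[^*]*\*+([^/][^*]*\*+)*\/")

samples = [
    "/* simple */",
    "/**/",
    "/** doc **/",
    "/* a * b */ tail",
    "/***/ x */",
    "/* unterminated",
]


def matched_text(pattern, text):
    """Return the text matched at the start of `text`, or None."""
    match = pattern.match(text)
    return match.group() if match else None


results = [
    (text, matched_text(CSS2_COMMENT, text), matched_text(CSS3_COMMENT, text))
    for text in samples
]

for text, css2, css3 in results:
    print(repr(text), repr(css2), repr(css3))
```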