I am trying to execute update query with subtraction inside:
UPDATE categories_ns
SET
nsright = nsright – 10
WHERE
nsright > 9
And I am getting [Err] 1 - near "–": syntax error.
Could you please help me to understand why its happens ?
Thanks!
And yet again someone is having issues with Unicode having so many similar symbols and some of them getting into code by accident.
– and - are different symbols. The former is not a valid minus, the latter is.
The difference in dashes' lengths is often unclear in many monospaced fonts. You can view your code in a non-monospaced one so the difference becomes obvious. But first and foremost, avoid copying code that may not be what it looks like.
Some document processors and websites out there, for instance:
Replace quotes with fancier ones (like ˝)
Replace << and >> with « and »
Replace a "minus" constructs like - with a proper dash (–, —?)
...all of which make sense for prose or poems, but not code.
Related
Working with postgres 9.4 and using PGAdmin III: I have a schema with a character field that MAY contain "new lines" (NLs), or MAY contain double spaces, or neither. I need to divide the field into 3 segments by either of the above factors or by 26 characters in length.
If the total length is 26 or less, I just copy it as is.
If it has NLs, then I split it by those.
If no NLs but it has double spaces, I split it by those.
Finally, if none of the above, I want to split it into 3 26 character chunks.
For context; the column is data entered by customers into a 3 line box. anywhere from 4 characters up to 78. Some use NLs, some use spaces to push the text to the next line, some have luck with spacing so use neither.
Here's the part of the query to gather the first line:
CASE WHEN LENGTH(detail) < 27 THEN detail
WHEN detail LIKE E'%\n%' THEN SPLIT_PART(detail,E'\n',1)
WHEN detail LIKE E'% %' THEN SPLIT_PART(detail,E' ',1)
ELSE LEFT(detail, 26)
END AS line1,
When I test this on a sample schema, the first three parts work exactly as I need but the "ELSE" part never does. The LEFT statement workout outside of the CASE statement but not within.
I've tried enclosing it with a SELECT clause, nesting several CASE-WHEN-ELSE statements, and other things to no avail.
Oddly, it DOES work if I take out the "SPLIT_PART" line referring to '% %'.
Short of giving up on the double-space split, is there another solution, or have I just formatted something wrong?
CASE WHEN LENGTH(detail) < 27 THEN detail
WHEN detail LIKE '%\n%' THEN SPLIT_PART(detail,'\n',1)
WHEN position(' ' in detail) > 0 THEN SPLIT_PART(detail,' ',1)
ELSE left(detail,26)
After another bit of time mucking about, I suspected the LIKE E'% %' might be the issue - and it was. Simply changing it to LIKE E' ' allows it to work like I wanted.
Added:
Turns out the best solution is a combination of "position" to determine if the double space exists, and split-part to separate correctly at the location.
I want to search for multiple codes appearing in a cell. There are so many codes that I'd like to write parts of the code in succeeding lines. For example, let's say I am looking for "^a11","^b12", "^c67$" or "^d13[[:blank:]]". I am using:
^a11|^b12|^c67$|^d13[[:blank:]]
This seems to work. Now, I tried:
^a11|^b12|
^c67$|^d13[[:blank:]]
That also seemed to work. However when I tried:
^a11|^b12|^c67$|
^d13[[:blank:]]
It did not count the last one.
Note that my code is wrapped into a function. So the above is an argument that I feed the function. I'm thinking that's the problem, but I still don't know why one truncation works while the other does not.
I realized the answer today. The problem is that since I am feeding the regex argument, it will count the next line in the succeeding code.
Thus, the code below was only "working" because ^c67$ is empty.
^a11|^b12|
^c67$|^d13[[:blank:]]
And the code below was not working because ^d13 is not empty but also this setup looks for (next line)^d13[[:blank:]] instead of just ^d13[[:blank:]]
^a11|^b12|^c67$|
^d13[[:blank:]]
So an inelegant fix is:
^a11|^b12|^c67$|
^nothinghere|^d13[[:blank:]]
This inserts a burner code that is empty which is affected by the line break.
The appearance of "textparcali" in RStudio Source Editor was as follows.
In textparcali (tbl_df), I ran the following code to delete single strings.
textparcali$word<-gsub("\\W*\\b\\w\\b\\W*",'', textparcali$word)
But the deletion was interesting. You can see the picture below. Please note lines 67 and 50.
Everything was fine for line 50 and lines like that. However, this was not the case for line 67 (and I think there are others like it).
I focused on one line(67) to understand why you deleted it wrong. I've already seen what it says on this line in the editor. But I also wanted to look at the console. I wrote the following code to the console.
textparcali$word[67]
The word on line 67 looks different in the console. The value that doesn't appear when you make a copy paste but surprisingly appears on the console:
The reason I put it as a picture is because this character disappears after the copy-paste command.
You can download the file containing this character from the link below. However, you should open it with Notepad ++.
Character.txt
Gsub did his job right. How is that possible? What's the name of this character? When I try to write code that destroys this character, the " sign changes and does not delete.
textparcali$word<-gsub('[[:punct:]]+',' ',textparcali$word) command also does not work.
What is the explanation of my experience? I do not know. Is there a way to destroy this character? What caused this? I ve asked a lot.
Thank you all.
(I apologize for the bad scribbles in the pictures.)
I found the surprise character.
Above Right, Combining Dot ͘ ͘
The following is the code required to eliminate this character.
c<-"surprise character"
c
[1] "\u0358"
textparcali$word<-gsub("\u0358","",textparcali$word,ignore.case = FALSE)
textparcali$word<-gsub("\u307","",textparcali$word,ignore.case = FALSE)
Code 307 did the job for me. However, you should determine what the actual code is. If not, your character code may be incorrect.
More detailed information can be found in the links below.
https://gist.github.com/ngs/2782436
https://www.charbase.com/0358-unicode-combining-dot-above-right
Thanks a lot!
As part of my feature engineering, I need to parse text strings from different languages and keep text enclosed within parentheses. Everything was going well until I encountered a very strange phenomenon. For some languages, the parentheses I need to find look slightly different, and various regexp options fail.
I'm pasting screen-shots because strangely, copying and pasting the strange parentheses changes it to a 'normal' one, so I can't set up a different regex to find those separately.
Notice that the parentheses in the first entry look normal, but for the second entry, it appears sort of 'sharp'
If I use stringr's str_extract, the first instance works fine, but the second fails.
But, the encodings are the same. Anyone know what's going on?
[Edit: here are the results of dput on these same examples. dput apparently sees the parentheses as equivalent, even though grep does not]
c("Obnaružena poterâ šaga na (Motor šprica pipettora R1).", "(STAT tàn zhen Z zhóu ma dá) tàn cè dào diu bù<U+3002>")
Finally, I am actually copy and pasting the two parentheses from R into the code window below; they do appear different this way. First is normal, second is the strange one.
( (
I am trying to write a custom report in Spiceworks, which uses SQLite queries. This report will fetch me hard drive serial numbers that are unfortunately stored in a few different ways depending on what version of Windows and WMI were on the machine.
Three common examples (which are enough to get to the actual question) are as follows:
Actual serial number: 5VG95AZF
Hexadecimal string with leading spaces: 2020202057202d44585730354341543934383433
Hexadecimal string with leading zeroes: 3030303030303030313131343330423137454342
The two hex strings are further complicated in that even after they are converted to ASCII representation, each pair of numbers are actually backwards. Here is an example:
3030303030303030313131343330423137454342 evaluates to 00000000111430B17ECB
However, the actual serial number on that hard drive is 1141031BE7BC, without leading zeroes and with the bytes swapped around. According to other questions and answers I have read on this site, this has to do with the "endianness" of the data.
My temporary query so far looks something like this (shortened to only the pertinent section):
SELECT pd.model as HDModel,
CASE
WHEN pd.serial like "30303030%" THEN
cast(('X''' || pd.serial || '''') as TEXT)
WHEN pd.serial like "202020%" THEN
LTRIM(X'2020202057202d44585730354341543934383433')
ELSE
pd.serial
END as HDSerial
The result of that query is something like this:
HDModel HDSerial
----------------- -------------------------------------------
Normal Serial 5VG95AZF
202020% test case W -DXW05CAT94843
303030% test case X'3030303030303030313131343330423137454342'
This shows that the X'....' notation style does convert into the correct (but backwards) result of W -DXW05CAT94843 when given a fully literal number (the 202020% line). However, I need to find a way to do the same thing to the actual data in the column, pd.serial, and I can't find a way.
My initial thought was that if I could build a string representation of the X'...' notation, then perhaps cast() would evaluate it. But as you can see, that just ends up spitting out X'3030303030303030313131343330423137454342' instead of the expected 00000000111430B17ECB. This means the concatenation is working correctly, but I can't find a way to evaluate it as hex the same was as in the manual test case.
I have been googling all morning to see if there is just some syntax I am missing, but the closest I have come is this concatenation using the || operator.
EDIT: Ultimately I just want to be able to have a simple case statement in my query like this:
SELECT pd.model as HDModel,
CASE
WHEN pd.serial like "30303030%" THEN
LTRIM(X'pd.serial')
WHEN pd.serial like "202020%" THEN
LTRIM(X'pd.serial')
ELSE
pd.serial
END as HDSerial
But because pd.serial gets wrapped in single quotes, it is taken as a literal string instead of taken as the data contained in that column. My hope was/is that there is just a character or operator I need to specify, like X'$pd.serial' or something.
END EDIT
If I can get past this first hurdle, my next task will be to try and remove the leading zeroes (the way LTRIM eats the leading spaces) and reverse the bytes, but to be honest, I would be content even if that part isn't possible because it wouldn't be hard to post-process this report in Excel to do that.
If anyone can point me in the right direction I would greatly appreciate it! It would obviously be much easier if I was using PHP or something else to do this processing, but because I am trying to have it be an embedded report in Spiceworks, I have to do this all in a single SQLite query.
X'...' is the binary representation in sqlite. If the values are string, you can just use them as such.
This should be a start:
sqlite> select X'3030303030303030313131343330423137454342';
00000000111430B17ECB
sqlite> select ltrim(X'3030303030303030313131343330423137454342','0');
111430B17ECB
I hope this puts you on the right path.