I have a fread in my program:
{ok, [S]} = io:fread("entry: \n", "~s")
But I get {ok, "string"}
I want to get just the string and not the quotation marks.
So I can use it in:
digraph:add_vertex(G, S)
And receive a vertex string and not "string"
How can I do that?
It's an illusion. The quotation marks aren't really there. The shell and other display mechanisms may show them when rendering data in certain contexts, in order to represent what you are displaying, but in this case the quotes are really just metadata:
Here's your case:
1> {ok, [S]} = io:fread("entry: \n", "~s").
entry:
foo
{ok,["foo"]}
If you display S strictly, though, you will see that it is a list of only 3 integers (the character codes):
2> io:format("~w~n", [S]).
[102,111,111]
ok
If you ask io:format/2 to display the data generically, using its best interpretation of the data, though, it thinks 'aha, this is a string, I shall display it as a string':
3> io:format("~p~n", [S]).
"foo"
ok
4>
Strings are just lists, so in this case a decision has to be made whether to display the value as a string or as a list, and it is displayed as a string because all of the list elements represent printable characters. Therefore, adding a non-printable character changes the behaviour, like this:
5> io:format("~p~n", [[2|S]]).
[2,102,111,111]
ok
6>
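The decision ~p makes can be mimicked in a few lines. The following is a simplified Python sketch, not Erlang's actual implementation: format_like_w and format_like_p are hypothetical names, and the sketch assumes only the codes 32..126 count as printable, whereas Erlang's real rule accepts more characters.

```python
def format_like_w(codes):
    """~w-style: always print the raw list of integers."""
    return "[" + ",".join(str(c) for c in codes) + "]"


def format_like_p(codes):
    """~p-style (simplified): print as a string only if every code is printable.

    Assumption for this sketch: "printable" means 32..126 (plain ASCII).
    """
    if codes and all(32 <= c <= 126 for c in codes):
        text = "".join(chr(c) for c in codes)
        # escape backslashes and double quotes so the output stays a valid literal
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return format_like_w(codes)


s = [102, 111, 111]            # what io:fread returned for "foo"
print(format_like_w(s))        # [102,111,111]
print(format_like_p(s))        # "foo"
print(format_like_p([2] + s))  # [2,102,111,111]
```

The data is identical in all three calls; only the rendering differs, which is exactly what happens in the shell session above.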
import ruamel.yaml

yaml = ruamel.yaml.YAML()
yaml.indent(mapping=4)
with open("test.yaml") as test_yaml_file:
    test_file = yaml.load(test_yaml_file)
# test = LiteralScalarString('*clvm')
test = "*testing"
test_file['test_perf'] = test
with open("test.yaml", 'w') as changed_file:
    yaml.dump(test_file, changed_file)
With this code, the expected output was
test_perf: *testing
but the output is
test_perf: '*testing'
How can I achieve this using ruamel?
Your scalar starts with a *, which is used in YAML to indicate an alias node. To prevent *testing from being interpreted as an alias during loading (even though the corresponding anchor (&testing) is not specified in the document), the scalar must be quoted or represented as a literal or folded block scalar.
So there is no way to prevent the quotes other than choosing to represent the scalar as a literal or folded block scalar (where you don't get the quotes, but do get the | or > indicator, respectively).
You should not worry about these quotes: after loading, you'll again have the string *testing, not something that has suddenly acquired extra (unwanted) quotes.
There are other characters that have special meaning in YAML (&, !, etc.), and when one of them appears at the beginning of a scalar, the scalar gets quoted. What the dump routine actually does is dump the string and read it back; if that results in a different value, the dumper knows that quoting is needed. This also works for strings like 2022-01-28, which would be read back as a date, so such strings are quoted automatically when dumped as well (the same goes for strings that look like floats, integers, or true/false values).
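To illustrate the round-trip check described above, here is a toy Python sketch. Everything in it is hypothetical: load_plain and dump_scalar are made-up names, and the resolver understands only a handful of YAML rules (a few indicator characters, true/false, integers, floats, ISO dates). ruamel.yaml's real logic is considerably more thorough.

```python
import datetime
import re

# Hypothetical, heavily simplified resolver for *unquoted* scalars. It mimics
# only a few YAML rules and is NOT ruamel.yaml's real resolver.
INDICATORS = set("*&!|>'\"%@`#")


def load_plain(text):
    """Return the value a loader would build from `text` written without quotes."""
    if not text or text[0] in INDICATORS:
        raise ValueError("cannot be written as a plain scalar")
    if text in ("true", "false"):
        return text == "true"
    if re.fullmatch(r"[-+]?[0-9]+", text):
        return int(text)
    if re.fullmatch(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?", text):
        return float(text)
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text):
        try:
            return datetime.date.fromisoformat(text)
        except ValueError:  # e.g. 2022-13-45 is not a real date
            return text
    return text


def dump_scalar(text):
    """Emit `text` plain if reading it back gives the same string, else quote it."""
    try:
        round_trips = load_plain(text) == text
    except ValueError:
        round_trips = False
    if round_trips:
        return text
    return "'" + text.replace("'", "''") + "'"  # YAML doubles ' inside '...'


for value in ["testing", "*testing", "2022-01-28", "42", "true"]:
    print(dump_scalar(value))
```

Only "testing" survives the read-back unchanged, so it is the only value emitted without quotes.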
I am currently using the following code to open a Word document (and then save it, but that is irrelevant to opening the file at the moment):
import win32com.client as win32

word = win32.Dispatch('Word.Application')
try:
    doc = word.Documents.Open('S:\problem\file.docx')
except Exception as e:
    print(e)
(-2147352567, 'Exception occurred.', (0, 'Microsoft Word', 'Sorry, we couldn’t find your file. Is it possible it was moved, renamed or deleted?\r (S:\\problem\\file.docx)', 'wdmain11.chm', 24654, -2146823114), None)
It seems that the "problem" directory is the only directory the win32 client is not able to recognize. I have renamed it several times, even to single letters, to see whether the name was the problem for some reason, but that does not seem to be the case.
The file path is, however, recognized by the docx function docx.Document, which is able to read the files in the directory. Here is the same code and its result for the docx snippet:
Document('S://problem/file.docx')
<docx.document.Document at 0x83d3c18>
In Python strings, the backslash ("\") is one of the characters with a special meaning: together with the char that follows it, it's used to create escape sequences (special chars); this comes from C. Here's what the Python 3 documentation on String and Bytes literals states:
The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character.
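In your string, you have "\p" (which is OK: it is not a recognized escape sequence, so the backslash is kept, although Python 3.12+ emits a SyntaxWarning for it) and "\f", which is interpreted as a single char (form feed, i.e. new page), making your path invalid.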
In order to fix this, either:
Escape (double) every "\" in the string (strictly speaking, this is a precaution, since you only have to escape the ones that would otherwise produce an escape sequence; in our example, "\p" works as is), except the ones that you actually want to produce an escape sequence: 'S:\\problem\\file.docx'
Make the string raw by prefixing it with r (note that a raw string still cannot end with a single "\": it would escape the closing quote (' or ") that comes after it, yielding a SyntaxError): r'S:\problem\file.docx'
As a general rule, in order to ensure that strings are what you think they are, either:
Check their length: if it's smaller than the number of chars you see (in the code), it means that there is at least one escape sequence
Use repr
Example:
>>> import sys
>>> sys.version
'3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)]'
>>>
>>> s0 = 'S:\problem\file.docx'
>>> s1 = 'S:\\problem\\file.docx'
>>> s2 = r'S:\problem\file.docx'
>>>
>>> len(s0), len(s1), len(s2)
(19, 20, 20)
>>>
>>> s0 == s1, s1 == s2
(False, True)
>>>
>>> repr(s0), repr(s1), repr(s2)
("'S:\\\\problem\\x0cile.docx'", "'S:\\\\problem\\\\file.docx'", "'S:\\\\problem\\\\file.docx'")
I am writing a zsh completion function to complete IDs from a database. There is a program listnotes which outputs a list like this:
bf848bf6-63d2-474b-a2c0-e7e3c4865ce8 Note Title
aba21e55-22c6-4c50-8bf6-bf3b337468e2 Another one
09ead915-bf2d-449d-a943-ff589e79794a yet another "one"
...
How do I generate an associative array note_ids from the output of the listnotes command such that I get an associative array like this?
( bf848bf6-63d2-474b-a2c0-e7e3c4865ce8 "Note Title" aba21e55-22c6-4c50-8bf6-bf3b337468e2 "Another one" 09ead915-bf2d-449d-a943-ff589e79794a "yet another \"one\"" )
Note that there may be whitespace in the values (the titles). I tried to generate something with sed:
note_ids=($(listnotes | sed 's/^\(.*\) \(.*\)$/\1 "\2"/'))
but quoting strings like this doesn’t seem to work, and double quotes in the title make it even more difficult.
Try something like
typeset -A note_ids
for line in ${(f)"$(listnotes)"}; do
    note_ids+=(${line%% *} ${line#* })
done
${(f)PARAM}: split the result of the expansion of $PARAM at newlines
"$(listnotes)": put the output of listnotes verbatim into the expansion.
for line in LIST: iterate over the items in LIST as split by ${(f)…}.
note_ids+=(key value): add a key-value pair to the associative array note_ids.
${line%% *}: cut the largest portion matching " *" (a space followed by anything) from the end of the expansion of $line. This removes everything from the first space onward, leaving only the key.
${line#* }: cut the smallest portion matching "* " (anything followed by a space) from the beginning of the expansion of $line. This removes the key and the separating space, leaving only the title. (If listnotes separates key and title with several spaces, repeat the space in this pattern accordingly.)
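To see what the two expansions do, here is a rough Python model of them using fnmatch globbing. It is only a sketch (zsh patterns support much more than fnmatch, and remove_largest_suffix/remove_smallest_prefix are hypothetical names), and the sample lines assume a single space between ID and title, as in the listing above.

```python
from fnmatch import fnmatchcase


def remove_largest_suffix(value, pattern):
    """Rough equivalent of zsh ${value%%pattern}."""
    # Try the longest candidate suffix first (start index 0, 1, 2, ...).
    for start in range(len(value) + 1):
        if fnmatchcase(value[start:], pattern):
            return value[:start]
    return value


def remove_smallest_prefix(value, pattern):
    """Rough equivalent of zsh ${value#pattern}."""
    # Try the shortest candidate prefix first (end index 0, 1, 2, ...).
    for end in range(len(value) + 1):
        if fnmatchcase(value[:end], pattern):
            return value[end:]
    return value


lines = [
    "bf848bf6-63d2-474b-a2c0-e7e3c4865ce8 Note Title",
    "aba21e55-22c6-4c50-8bf6-bf3b337468e2 Another one",
    '09ead915-bf2d-449d-a943-ff589e79794a yet another "one"',
]
note_ids = {remove_largest_suffix(l, " *"): remove_smallest_prefix(l, "* ") for l in lines}
for key, title in note_ids.items():
    print(key, repr(title))
```

Because the IDs contain no spaces, cutting at the first space is enough to separate key and title, while any spaces or quotes inside the title are kept untouched.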
Instead of using the parameter expansion flag (f) you could also read the output of listnotes line by line with read:
listnotes | while read -r; do
    note_ids+=(${REPLY%% *} ${REPLY#* })
done
If no parameter name is given, read puts the line it reads into the REPLY parameter; -r keeps backslashes in the titles from being interpreted as escapes.
I am currently comparing SICStus 3 and SICStus 4, and I have run into one issue: SICStus 4 will not consult any case whose comment string contains carriage controls, tab characters, etc., as shown below.
Here is an example case. It has 3 comma-separated arguments:
case('pr_ua_sfochi',"
Response:
answer(amount(2370.09,usd),[[01AUG06SFO UA CHI Q9.30 1085.58FUA2SFS UA SFO Q9.30 1085.58FUA2SFS NUC2189.76END ROE1.0 XT USD 180.33 ZPSFOCHI 164.23US6.60ZP5.00AY XF4.50SFO4.5]],amount(2189.76,usd),amount(2189.76,usd),amount(180.33,usd),[[fua2sfs,fua2sfs]],amount(6.6,usd),amount(4.5,usd),amount(0.0,usd),amount(18.6,usd),lasttktdate([20061002]),lastdateafterres(200712282]),[[fic_ticketinfo(fare(fua2sfs),fic([]),nvb([]),nva([]),tktiss([]),penalty([]),tktendorsement([]),tourinfo([]),infomsgs([])),fic_ticketinfo(fare(fua2sfs),fic([]),nvb([]),nva([]),tktiss([]),penalty([]),tktendorsement([]),tourinfo([]),infomsgs([]))]],<>,<>,cat35(cat35info([])))
.
02/20/2006 17:05:10 Transaction 35 served by static.static.server1 (usclsefat002:7551) running E*Fare version $Name: build-2006-02-19-1900 $
",price(pnr(
user('atl','1y',<>,<>,dept(<>,'0005300'),<>,<>,<>),
[
passenger(adt,1,[ptconly(n)])
],
[
segment(1,sfo,chi,'ua','<>','100',20140901,0800,f,20140901,2100,'737',res(20140628,1316),hk,pf2(n,[],[],n),<>,flags(no,no,no,no,no,no,no,no,no)),
segment(2,chi,sfo,'ua','<>','101',20140906,1000,f,20140906,1400,'737',res(20140628,1316),hk,pf2(n,[],[],n),<>,flags(no,no,no,no,no,no,no,no,no))
]),[
rebook(n),
ticket(20140301,131659),
dbaccess(20140301,131659),
platingcarrier('ua'),
tax_exempt([]),
trapparm("trap:ffil"),
city(y)
])).
The predicate below removes the comment section from the case above:
flatten-cases :-
    getmessage(M1),
    write_flattened_case(M1),
    flatten-cases.
flatten-cases.

write_flattened_case(M1) :-
    M1 = case(Case,_Comment,Entry), !,
    M2 = case(Case,Entry),
    writeq(M2), write('.'), nl.

getmessage(M) :-
    read(M),
    !,
    M \== end_of_file.

:- flatten-cases.
Now my requirement is to convert the comment string to an ASCII character list.
Layout characters other than a regular space cannot occur literally in a quoted atom or a double-quoted list. This is a requirement of the ISO standard and has been fully implemented in SICStus since 3.9.0 when SICStus 3 is invoked with the --iso option. Since SICStus 4, only ISO syntax is supported.
You need to insert \n and \t accordingly. So instead of
log('Response:
yes'). % BAD!
write
log('Response:\n\tyes').
Or, to make it more readable, use a continuation escape sequence:
log('Response:\n\
\tyes').
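If many cases have to be converted, the escaping can be automated. A minimal Python sketch, assuming the comment text is already available as a Python string (extracting it from the case files is left out); the function name to_iso_prolog_string is hypothetical. It replaces layout characters with the ISO escapes \n, \t, \r, and uses the ISO hexadecimal escape (\x...\) for any other control character.

```python
def to_iso_prolog_string(text):
    """Return `text` as an ISO-compliant double-quoted Prolog string literal.

    Layout characters are replaced by escape sequences, so the literal itself
    contains no raw newlines or tabs.
    """
    named = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t", "\r": "\\r"}
    out = []
    for ch in text:
        if ch in named:
            out.append(named[ch])
        elif ord(ch) < 32:
            # ISO hexadecimal escape: backslash, x, hex digits, closing backslash
            out.append("\\x%x\\" % ord(ch))
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


print(to_iso_prolog_string("Response:\n\tyes"))  # "Response:\n\tyes"
```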
Note that using literal tabs and literal newlines is highly problematic: on a printout you do not see them! Think of a quoted atom such as 'A ' followed by a literal line break and B: the printout would show neither the trailing spaces nor any trailing tabs.
But there are also many other situations like: Making a screenshot of program text, making a photo of program text, using a 3270 terminal emulator and copying the output. In the past, punched cards. The text-mode when reading files (which was originally motivated by punched cards). Similar arguments hold for the tabulator which comes from typewriters with their manually settable tab stops.
And then on SO it is quite difficult to type in a TAB. The browser refuses to type it (very wisely), and if you copy it in, you get it rendered as spaces.
While I am at it, there is also another problem: the name flatten-cases should rather be written flatten_cases, since flatten-cases is read as the compound term -(flatten, cases).
I need to be able to delimit a stream of binary data. I was thinking of using something like the ASCII EOT (End of Transmission) character to do this.
However I'm a bit concerned -- how can I know for sure that the particular binary sequence used for this (0b00000100) won't appear in my own binary sequences, thus giving a false positive on delimitation?
In other words, how is binary delimiting best handled?
EDIT: ...Without using a length header. Sorry guys, should have mentioned this before.
You've got five options:
Use a delimiter character that is unlikely to occur. This runs the risk of you guessing incorrectly. I don't recommend this approach.
Use a delimiter character and an escape sequence to include the delimiter. You may need to double the escape character, depending upon what makes for easier parsing. (Think of the C \0 to include an ASCII NUL in some content.)
Use a delimiter phrase that you can determine does not occur. (Think of the mime message boundaries.)
Prepend a length field of some sort, so you know to read the following N bytes as data. This has the downside of requiring you to know this length before writing the data, which is sometimes difficult or impossible.
Use something far more complicated, like ASN.1, to completely describe all your content for you. (I don't know if I'd actually recommend this unless you can make good use of it -- ASN.1 is awkward to use in the best of circumstances, but it does allow completely unambiguous binary data interpretation.)
Usually, you wrap your binary data in a well-known format, for example with a fixed header that describes the subsequent data. If you are trying to find delimiters in an unknown stream of data, you usually need an escape sequence. For example, in something like HDLC, 0x7E is the frame delimiter. Data must be encoded such that any 0x7E inside the data is replaced with 0x7D followed by the original byte XORed with 0x20. A 0x7D in the data stream is escaped the same way.
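Option 2 (delimiter plus escape), in the HDLC-like form just described, can be sketched in Python as follows. This is a minimal illustration with made-up function names (stuff, unstuff, encode_frames, decode_frames), not a complete HDLC implementation: there is no checksum, and empty frames are not representable.

```python
FLAG = 0x7E      # frame delimiter
ESC = 0x7D       # escape byte
XOR_MASK = 0x20  # escaped bytes are sent XORed with this mask


def stuff(payload):
    """Escape every FLAG/ESC byte so FLAG never appears inside a frame."""
    out = bytearray()
    for b in payload:
        if b in (FLAG, ESC):
            out += bytes((ESC, b ^ XOR_MASK))
        else:
            out.append(b)
    return bytes(out)


def unstuff(data):
    """Undo stuff(): the byte following ESC is XORed back."""
    out = bytearray()
    escaped = False
    for b in data:
        if escaped:
            out.append(b ^ XOR_MASK)
            escaped = False
        elif b == ESC:
            escaped = True
        else:
            out.append(b)
    if escaped:
        raise ValueError("frame ends with a dangling escape byte")
    return bytes(out)


def encode_frames(payloads):
    return b"".join(bytes((FLAG,)) + stuff(p) + bytes((FLAG,)) for p in payloads)


def decode_frames(stream):
    # Stuffed data never contains FLAG, so a plain split is safe. Empty pieces
    # (between back-to-back flags) are skipped, so empty payloads are dropped.
    return [unstuff(piece) for piece in stream.split(bytes((FLAG,))) if piece]


payloads = [b"\x04\x7e\x7d\x00", b"hello", bytes(range(256))]
assert decode_frames(encode_frames(payloads)) == payloads
```

The cost is variable overhead: in the worst case (data made only of 0x7E/0x7D bytes) the encoded size doubles.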
If the binary records can really contain any data, try adding a length before the data instead of a marker after the data. This is sometimes called a prefix length because the length comes before the data.
Otherwise, you'd have to escape the delimiter in the byte stream (and escape the escape sequence).
You can prepend the size of the binary data before it. If you are dealing with streamed data and don't know its size beforehand, you can divide it into chunks and have each chunk begin with a size field.
If you set a maximum size for a chunk, you will end up with all but the last chunk the same length which will simplify random access should you require it.
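Chunking with a size field per chunk can be sketched as follows. The 4-byte little-endian size field, the default chunk size, and the zero-size end-of-record chunk are assumptions chosen for this illustration (similar in spirit to HTTP's chunked transfer coding), and write_chunked/read_chunked are hypothetical names.

```python
import io
import struct

HEADER = struct.Struct("<I")  # assumed: 4-byte little-endian unsigned size field


def write_chunked(out, pieces, chunk_size=4096):
    """Write a record of unknown total length as size-prefixed chunks.

    Data is buffered so that every chunk except the last non-empty one has
    exactly chunk_size bytes; a zero-size chunk marks the end of the record.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    buf = bytearray()
    for piece in pieces:
        buf += piece
        while len(buf) >= chunk_size:
            out.write(HEADER.pack(chunk_size))
            out.write(bytes(buf[:chunk_size]))
            del buf[:chunk_size]
    if buf:
        out.write(HEADER.pack(len(buf)))
        out.write(bytes(buf))
    out.write(HEADER.pack(0))


def read_chunked(inp):
    """Read one record written by write_chunked()."""
    data = bytearray()
    while True:
        header = inp.read(HEADER.size)
        if len(header) < HEADER.size:
            raise EOFError("truncated chunk header")
        (size,) = HEADER.unpack(header)
        if size == 0:
            return bytes(data)
        chunk = inp.read(size)
        if len(chunk) < size:
            raise EOFError("truncated chunk")
        data += chunk


stream = io.BytesIO()
write_chunked(stream, [b"abc", b"\x04" * 7], chunk_size=4)  # 10 bytes -> chunks of 4, 4, 2
write_chunked(stream, [b"second record"], chunk_size=4)
stream.seek(0)
print(read_chunked(stream))
print(read_chunked(stream))
```

The writer never needs to know the total length up front, only how much it has buffered, and the data itself may contain any byte values.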
As a space-efficient, fixed-overhead alternative to prepending your data with size fields or escaping the delimiter character, the escapeless encoding can be used to remove that delimiter character (and possibly other characters that should have a special meaning) from your data.
sarnold's answer is excellent, and here I want to share some code to illustrate it.
First, here is a wrong way to do it: using a \n delimiter. Don't do it! The binary data could contain \n, and it would be mixed up with the delimiters:
import os, random

with open('test', 'wb') as f:
    for i in range(100):                     # create 100 binary sequences of random
        length = random.randint(2, 100)      # length (between 2 and 100)
        f.write(os.urandom(length) + b'\n')  # separated with the character b"\n"

with open('test', 'rb') as f:
    for i, l in enumerate(f):
        print(i, l)  # oops, we get more than 100 sequences (123 in this run)! wrong!
...
121 b"L\xb1\xa6\xf3\x05b\xc9\x1f\x17\x94'\n"
122 b'\xa4\xf6\x9f\xa5\xbc\x91\xbf\x15\xdc}\xca\x90\x8a\xb3\x8c\xe2\x07\x96<\xeft\n'
Now the right way to do it (option #4 in sarnold's answer):
import os, random

with open('test', 'wb') as f:
    for i in range(100):
        length = random.randint(2, 100)
        f.write(length.to_bytes(2, byteorder='little'))  # prepend the data with the length of the next data chunk, packed in 2 bytes
        f.write(os.urandom(length))

with open('test', 'rb') as f:
    i = 0
    while True:
        l = f.read(2)  # read the length of the next chunk
        if l == b'':   # end of file
            break
        length = int.from_bytes(l, byteorder='little')
        s = f.read(length)
        print(i, s)
        i += 1
...
98 b"\xfa6\x15CU\x99\xc4\x9f\xbe\x9b\xe6\x1e\x13\x88X\x9a\xb2\xe8\xb7(K'\xf9+X\xc4"
99 b'\xaf\xb4\x98\xe2*HInHp\xd3OxUv\xf7\xa7\x93Qf^\xe1C\x94J)'