pyparsing multiple lines optional missing data in result set

I am quite a new pyparsing user and I have a missing match that I don't understand.
Here is the text I would like to parse:
polraw="""
set policy id 800 from "Untrust" to "Trust" "IP_10.124.10.6" "MIP(10.0.2.175)" "TCP_1002" permit
set policy id 800
set dst-address "MIP(10.0.2.188)"
set service "TCP_1002-1005"
set log session-init
exit
set policy id 724 from "Trust" to "Untrust" "IP_10.16.14.28" "IP_10.24.10.6" "TCP_1002" permit
set policy id 724
set src-address "IP_10.162.14.38"
set dst-address "IP_10.3.28.38"
set service "TCP_1002-1005"
set log session-init
exit
set policy id 233 name "THE NAME is 527 ;" from "Untrust" to "Trust" "IP_10.24.108.6" "MIP(10.0.2.149)" "TCP_1002" permit
set policy id 233
set service "TCP_1002-1005"
set service "TCP_1006-1008"
set service "TCP_1786"
set log session-init
exit
"""
I set up the grammar this way:
from pyparsing import *

KPOL = Suppress(Keyword('set policy id'))
NUM = Regex(r'\d+')
KSVC = Suppress(Keyword('set service'))
KSRC = Suppress(Keyword('set src-address'))
KDST = Suppress(Keyword('set dst-address'))
SVC = dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
ADDR = dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
EXIT = Suppress(Keyword('exit'))
EOL = LineEnd().suppress()
P_SVC = KSVC + SVC + EOL
P_SRC = KSRC + ADDR + EOL
P_DST = KDST + ADDR + EOL
x = KPOL + NUM('PId') + EOL + Optional(ZeroOrMore(P_SVC)) + Optional(ZeroOrMore(P_SRC)) + Optional(ZeroOrMore(P_DST))
for z in x.searchString(polraw):
    print z
The result set looks like this:
['800', 'MIP(10.0.2.188)']
['724', 'IP_10.162.14.38', 'IP_10.3.28.38']
['233', 'TCP_1002-1005', 'TCP_1006-1008', 'TCP_1786']
The 800 entry is missing its service tag???
What's wrong here?
Thanks in advance,
Laurent

The problem you are seeing is that, in your expression, SVC's are only looked for before SRC's and DST's. Once the parser has skipped over the optional SVC's and SRC's and matched a DST - as in policy 800, where the dst-address line comes first - it never goes back to pick up a service line that follows. You have a couple of options; I'll go through each so you can get a sense of what all is going on here.
(But first, there is no point in writing "Optional(ZeroOrMore(anything))" - ZeroOrMore already implies Optional, so I'm going to drop the Optional part in any of these choices.)
If you are going to get SVC's, SRC's, and DST's in any order, you could refactor your ZeroOrMore to accept any of the three data types, like this:
x = KPOL + NUM('PId') + EOL + ZeroOrMore(P_SVC|P_SRC|P_DST)
This will allow you to intermix different types of statements, and they will all get collected as part of the ZeroOrMore repetition.
If you want to keep these different types of statements in groups, then you can add a results name to each:
x = KPOL + NUM('PId') + EOL + ZeroOrMore(P_SVC("svc*") |
                                         P_SRC("src*") |
                                         P_DST("dst*"))
Note the trailing '*' on each name - this is equivalent to calling setResultsName with the listAllMatches argument equal to True. As each different expression is matched, the results for the different types will get collected into the "svc", "src", or "dst" results name. Calling z.dump() will list the tokens and the results names and their values, so you can see how this works. For example, this input:
set policy id 233
set service "TCP_1002-1005"
set dst-address "IP_10.3.28.38"
set service "TCP_1006-1008"
set service "TCP_1786"
set log session-init
exit
shows this for z.dump():
['233', 'TCP_1002-1005', 'IP_10.3.28.38', 'TCP_1006-1008', 'TCP_1786']
- PId: 233
- dst: [['IP_10.3.28.38']]
- svc: [['TCP_1002-1005'], ['TCP_1006-1008'], ['TCP_1786']]
If you wrap the P_xxx expressions in ungroup, maybe like this:
P_SVC,P_SRC,P_DST = (ungroup(expr) for expr in (P_SVC,P_SRC,P_DST))
then the output is even cleaner-looking:
['233', 'TCP_1002-1005', 'IP_10.3.28.38', 'TCP_1006-1008', 'TCP_1786']
- PId: 233
- dst: ['IP_10.3.28.38']
- svc: ['TCP_1002-1005', 'TCP_1006-1008', 'TCP_1786']
This is actually looking pretty good, but let me pass on one other option. There are a number of cases where parsers have to look for several sub-expressions in any order. Let's say they are A,B,C, and D. To accept these in any order, you could write something like OneOrMore(A|B|C|D), but this would accept multiple A's, or A, B, and C, but not D. The exhaustive/exhausting combinatorial explosion of (A+B+C+D) | (A+B+D+C) | etc. could be written, or you could maybe automate it with something like
from itertools import permutations
mixNmatch = MatchFirst(And(p) for p in permutations((A,B,C,D),4))
But there is a class in pyparsing called Each that allows you to write the same kind of thing:
Each([A,B,C,D])
meaning "must have one each of A, B, C, and D, in any order". And like And, Or, NotAny, etc., there is an operator shortcut too:
A & B & C & D
which means the same thing.
If you want "must have A, B, and C, and optionally D", then write:
A & B & C & Optional(D)
and this will parse with the same kind of behavior, looking for A, B, C, and D, regardless of the incoming order, and whether D is last or mixed in with A, B, and C. You can also use OneOrMore and ZeroOrMore to indicate optional repetition of any of the expressions.
So you could write your expression as:
x = KPOL + NUM('PId') + EOL + (ZeroOrMore(P_SVC) &
                               ZeroOrMore(P_SRC) &
                               ZeroOrMore(P_DST))
I looked at using results names with this expression, and the ZeroOrMore's seem to be confusing things, maybe still a bug in how this is done. So you may have to reserve using Each for more basic cases like my A,B,C,D example. But I wanted to make you aware of it.
Some other notes on your parser:
dblQuotedString.setParseAction(lambda t: t[0].replace('"','')) is probably better written
dblQuotedString.setParseAction(removeQuotes). You don't have any embedded quotes in your examples, but it's good to be aware of where your assumptions might not translate to a future application. Here are a couple of ways of removing the defining quotes:
dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
print dblQuotedString.parseString(r'"This is an embedded quote \" and an ending quote \""')[0]
# prints 'This is an embedded quote \ and an ending quote \'
# removed leading and trailing "s, but also internal ones too, which are
# really part of the quoted string
dblQuotedString.setParseAction(lambda t: t[0].strip('"'))
print dblQuotedString.parseString(r'"This is an embedded quote \" and an ending quote \""')[0]
# prints 'This is an embedded quote \" and an ending quote \'
# removed leading and trailing "s, and leaves the internal escaped one, but strips off
# the escaped ending quote
dblQuotedString.setParseAction(removeQuotes)
print dblQuotedString.parseString(r'"This is an embedded quote \" and an ending quote \""')[0]
# prints 'This is an embedded quote \" and an ending quote \"'
# just removes leading and trailing " characters, leaves escaped "s in place
KPOL = Suppress(Keyword('set policy id')) is a bit fragile, as it will break if there are any extra spaces between 'set' and 'policy', or between 'policy' and 'id'. I usually define these kind of expressions by first defining all the keywords individually:
SET,POLICY,ID,SERVICE,SRC_ADDRESS,DST_ADDRESS,EXIT = map(Keyword,
    "set policy id service src-address dst-address exit".split())
and then define the separate expressions using:
KSVC = Suppress(SET + SERVICE)
KSRC = Suppress(SET + SRC_ADDRESS)
KDST = Suppress(SET + DST_ADDRESS)
Now your parser will cleanly handle extra whitespace (or even comments!) between individual keywords in your expressions.
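Putting those suggestions together, here is a rough sketch of what the whole parser might look like (untested against your full config, reusing the polraw string from the question; adjust names as needed):
from pyparsing import (Keyword, Regex, Suppress, LineEnd, ZeroOrMore,
                       dblQuotedString, removeQuotes, ungroup)

SET, POLICY, ID, SERVICE, SRC_ADDRESS, DST_ADDRESS, EXIT = map(Keyword,
    "set policy id service src-address dst-address exit".split())

NUM = Regex(r'\d+')
EOL = LineEnd().suppress()
QUOTED = dblQuotedString.setParseAction(removeQuotes)

KPOL  = Suppress(SET + POLICY + ID)
P_SVC = ungroup(Suppress(SET + SERVICE) + QUOTED + EOL)
P_SRC = ungroup(Suppress(SET + SRC_ADDRESS) + QUOTED + EOL)
P_DST = ungroup(Suppress(SET + DST_ADDRESS) + QUOTED + EOL)

# Accept service/src/dst lines in any order and any number, collecting
# each kind under its own listAllMatches results name.
policy = KPOL + NUM('PId') + EOL + ZeroOrMore(P_SVC("svc*") |
                                              P_SRC("src*") |
                                              P_DST("dst*"))

for z in policy.searchString(polraw):
    print(z.dump())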

Related

How to change 3 concatenated strings to bytes in Python?

I am getting an error when running the following code in Python 3. I looked all over but could not find the right way to do it. Any help will be appreciated.
raise TypeError('unicode strings are not supported, please encode to bytes: {!r}'.format(seq))
TypeError: unicode strings are not supported, please encode to bytes: 'relay read 7\n\r'
I need to send the following string via serial port: relay read #of relay.
import sys
import serial

if (len(sys.argv) < 2):
    print ("Usage: relayread.py <PORT> <RELAYNUM>\nEg: relayread.py COM1 0")
    sys.exit(0)
else:
    portName = sys.argv[1];
    relayNum = sys.argv[2];

#Open port for communication
serPort = serial.Serial(portName, 19200, timeout=1)

if (int(relayNum) < 10):
    relayIndex = str(relayNum)
else:
    relayIndex = chr(55 + int(relayNum))

serPort.write("relay read "+ relayIndex + "\n\r")
response = serPort.read(25)

if(response.find("on") > 0):
    print ("Relay " + str(relayNum) +" is ON")
elif(response.find("off") > 0):
    print ("Relay " + str(relayNum) +" is OFF")

#Close the port
serPort.close()
Use the string's encode method to construct the corresponding byte sequence.
In this case all of the characters in the string are in the ASCII range, so it doesn't really matter which encoding scheme you use. (Differences between encoding schemes generally only matter when you're dealing with non-ASCII characters, ones whose ord() value is greater than 127.) So in this case you don't even need to specify a particular encoding scheme: you can simply call the encode method with no argument and let Python apply its default encoding, which is UTF-8 in Python 3.
To do that, change this:
serPort.write("relay read "+ relayIndex + "\n\r")
to this:
serPort.write(("relay read "+ relayIndex + "\n\r").encode())
You'll probably have to do the reverse operation to get a string from the byte sequence returned by serPort.read. Change this:
response = serPort.read(25)
to:
response = serPort.read(25).decode()
BTW, it's typical for line endings in transmitted data to be represented by a Carriage Return followed by a Line Feed, or "\r\n". In your serPort.write call you're using the reverse of that, "\n\r". That's unusual but if that's what your device needs then so be it.
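Putting both changes together, a minimal sketch of the read/write portion (reusing serPort, relayIndex, and relayNum from the script above) might look like this:
# Encode the command to bytes before writing, decode the reply back to str.
command = "relay read " + relayIndex + "\n\r"   # the device's expected terminator, per the question
serPort.write(command.encode())                 # str -> bytes
response = serPort.read(25).decode()            # bytes -> str

if response.find("on") > 0:
    print("Relay " + str(relayNum) + " is ON")
elif response.find("off") > 0:
    print("Relay " + str(relayNum) + " is OFF")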

(Maybe) Illegal character in ODBC SQL Server Connection String PWD=

According to what I have researched there are no illegal characters in the PWD= field of a SQL Server Connection String.
However, using SQL Server Express 2008 I changed the SA password to a GUID, specifically:
{85C86BD7-B15F-4C51-ADDA-3B6A50D89386}
So when connecting via ODBC I use this connection string:
"Driver={SQL Server};Server=.\\MyInstance;Database=Master;UID=SA;PWD={85C86BD7-B15F-4C51-ADDA-3B6A50D89386};"
But it comes back as Login failed for SA.
However, if I change the SA password to something just as long but without {}- it succeeds! Are there certain characters in PWD= that need to be escaped? I tried all different combinations with no luck.
As Microsoft's documentation states (emphasis added) --
Connection strings used by ODBC have the following syntax:
connection-string ::= empty-string[;] | attribute[;] | attribute; connection-string
empty-string ::=
attribute ::= attribute-keyword=[{]attribute-value[}]
attribute-value ::= character-string
attribute-keyword ::= identifier
Attribute values can optionally be enclosed in braces, and it is good practice to do so. This avoids problems when attribute values contain non-alphanumeric characters. The first closing brace in the value is assumed to terminate the value, so values cannot contain closing brace characters.
I would suggest you simply remove the braces when you set the password, and then the connect string you provided above should work fine.
ADDITION
I dug a bit further on Microsoft's site, and found some ABNF rules which may be relevant --
SC = %x3B ; Semicolon
LCB = %x7B ; Left curly brackets
RCB = %x7D ; Right curly brackets
EQ = %x3D ; Equal sign
ESCAPEDRCB = 2RCB ; Double right curly brackets
SpaceStr = *(SP) ; Any number (including 0) spaces
ODBCConnectionString = *(KeyValuePair SC) KeyValuePair [SC]
KeyValuePair = (Key EQ Value / SpaceStr)
Key = SpaceStr KeyName
KeyName = (nonSP-SC-EQ *nonEQ)
Value = (SpaceStr ValueFormat1 SpaceStr) / (ValueContent2)
ValueFormat1 = LCB ValueContent1 RCB
ValueContent1 = *(nonRCB / ESCAPEDRCB)
ValueContent2 = SpaceStr / SpaceStr (nonSP-LCB-SC) *nonSC
nonRCB = %x01-7C / %x7E-FFFF ; not "}"
nonSP-LCB-SC = %x01-1F / %x21-3A / %x3C-7A / %x7C-FFFF ; not space, "{" or ";"
nonSP-SC-EQ = %x01-1F / %x21-3A / %x3C / %x3E-FFFF ; not space, ";" or "="
nonEQ = %x01-3C / %x3E-FFFF ; not "="
nonSC = %x01-3A / %x3C-FFFF ; not ";"
...
ValueFormat1 is recommended to use when there is a need for Value to contain LCB, RCB, or EQ. ValueFormat1 MUST be used when the Value contains SC or starts with LCB.
ValueContent1 MUST be enclosed by LCB and RCB. Spaces before the enclosing LCB and after the enclosing RCB MUST be ignored.
ValueContent1 MUST be contained in ValueFormat1. If there is an RCB in the ValueContent1, it MUST use the two-character sequence ESCAPEDRCB to represent the one-character value RCB.
All of which comes down to... I believe the following connect string should work for you (note that there are 2 left/open braces and 3 right/close braces on the PWD value) --
"Driver={SQL Server};Server=.\\MyInstance;Database=Master;UID=SA;PWD={{85C86BD7-B15F-4C51-ADDA-3B6A50D89386}}};"
According to this page, the only legal "special character" in a name (I think they're talking about the DSN) is the UNDERSCORE:
The ODBC specification (and the SQL specification) states that names
must be in the format of " letter[digit | letter | _]...". The only
special character allowed is an underscore.
There was no reference to "the ODBC Specification". This page says it's the ODBC 4.0 Spec.

Can R read html-encoded emoji characters?

Question
My question, explained below, is:
How can R be used to read a string that includes HTML emoji codes like &#55358;&#56599;?
I'd like to:
(1) represent the emoji symbol (e.g., as a unicode symbol: 🤗) in the parsed string, OR (2) convert it into its text equivalent (":hugging face:")
Background
I have an XML dataset of text messages (from the Android/iOS app Signal) that I am reading into R for a text mining project. The data look like this, with each text message represented in an sms node:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!-- File Created By Signal -->
<smses count="1">
<sms protocol="0" address="+15555555555" contact_name="Jane Doe" date="1483256850399" readable_date="Sat, 31 Dec 2016 23:47:30 PST" type="1" subject="null" body="Hug emoji: &#55358;&#56599;" toa="null" sc_toa="null" service_center="null" read="1" status="-1" locked="0" />
</smses>
Problem
I am currently reading the data using the xml2 package for R. When I use the xml2::read_xml function, however, I get the following error message:
Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html, :
xmlParseCharRef: invalid xmlChar value 55358
Which, as I understand, indicates that the emoji character is not recognized as valid XML.
Using the xml2::read_html function does work, but drops the emoji character. A small example of this is here:
example_text <- "Hugging emoji: &#55358;&#56599;"
xml2::xml_text(xml2::read_html(paste0("<x>", example_text, "</x>")))
(Output: [1] "Hugging emoji: ")
This code is valid HTML -- Googling &#55358;&#56599; actually converts it in the search bar to the "hugging face" emoji, and brings up results relating to that emoji.
Other information I've found that seems relevant to this question
I've been searching Stack Overflow, and have not found any questions relating to this particular issue. I've also not been able to find a table that straightforwardly gives HTML codes next to the emoji they represent, and so am not able to do an (albeit inefficient) conversion of these HTML codes to their textual equivalents in a big loop before parsing the dataset; for example, neither this list nor its underlying dataset seem to include the string 55358.
tl;dr: the emoji aren't valid HTML entities; UTF-16 numbers have been used to build them instead of Unicode code points. I describe an algorithm at the bottom of the answer to convert them so that they are valid XML.
Identifying the Problem
R definitely handles emoji.
In fact, a few packages exist for handling emoji in R. For example, the emojifont and emo packages both let you retrieve emoji based on Slack-style keywords. It's just a question of getting your source characters through from the HTML-escaped format so that you can convert them.
xml2::read_xml seems to do fine with other HTML entities, like an ampersand or double quotes. I looked at this SO answer to see whether there were any XML-specific constraints on HTML entities, and it seemed like they were storing emoji fine. So I tried changing the emoji codes in your reprex to the ones in that answer:
body="Hug emoji: 😀😃"
And, sure enough, they were preserved (though they're obviously not the hug emoji anymore):
> test8 = read_html('Desktop/test.xml')
> test8 %>% xml_child() %>% xml_child() %>% xml_child() %>% xml_attr('body')
[1] "Hug emoji: \U0001f600\U0001f603"
I looked up the hug emoji on this page, and the decimal HTML entity given there is not &#55358;&#56599;. It looks like the UTF-16 decimal codes for the emoji have been wrapped in &# and ;.
In conclusion, I think the answer is that your emoji are, in fact, not valid HTML entities. If you can't control the source, you might need to do some pre-processing to account for these errors.
So, why does the browser convert them properly? I'm wondering if the browser is a little more flexible with these things and is making some guesses about what those codes could be. I'm just speculating, though.
Converting UTF-16 to Unicode code points
After some more investigation, it looks like valid emoji HTML entities use the Unicode code point (in decimal, if it's &#...;, or hex, if it's &#x...;). The Unicode code point is different from the UTF-8 or UTF-16 code. (That link explains a lot about how emoji and other characters are variously encoded, BTW! Good read.)
So we need to convert the UTF-16 codes used in your source data to Unicode code points. Referring to this Wikipedia article on UTF-16, I've verified how it's done. Each Unicode code point (our target) is a 20-bit number, or five hex digits. When going from Unicode to UTF-16, you split it up into two 10-bit numbers (the middle hex digit gets cut in half, with two of its bits going to each block), do some maths on them, and get your result.
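As a quick sanity check of that description, here is a small Python sketch (mine, purely for illustration) that splits a supplementary-plane code point into its UTF-16 surrogate pair:
def to_surrogate_pair(codepoint):
    # Split a code point above 0xFFFF into its UTF-16 high/low surrogates.
    v = codepoint - 0x10000        # 20-bit value
    high = 0xD800 + (v >> 10)      # top 10 bits
    low = 0xDC00 + (v & 0x3FF)     # bottom 10 bits
    return high, low

print(to_surrogate_pair(0x1F917))  # (55358, 56599) -- the numbers seen in the question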
Going backwards, as you want to, it's done like this:
Your decimal UTF-16 number (which is in two separate blocks for now) is 55358 56599
Converting those blocks to hex (separately) gives 0x0d83e 0x0dd17
You subtract 0xd800 from the first block and 0xdc00 from the second to give 0x3e 0x117
Converting them to binary, padding them out to 10 bits and concatenating them, it's 0b0000 1111 1001 0001 0111
Then we convert that back to hex, which is 0x0f917
Finally, we add 0x10000, giving 0x1f917
Therefore, our (hex) HTML entity is &#x1f917;. Or, in decimal, &#129303;.
So, to preprocess this dataset, you'll need to extract the existing numbers, use the algorithm above, then put the result back in (with one &#...;, not two).
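For reference, the whole pre-processing step is also small in Python (a sketch of my own; it assumes every adjacent pair of &#...; numbers is a UTF-16 surrogate pair, as in your data):
import re

def fix_surrogate_entities(text):
    # Replace each "&#high;&#low;" surrogate pair with one "&#xXXXXX;" entity.
    def combine(m):
        high, low = int(m.group(1)), int(m.group(2))
        codepoint = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
        return "&#x%x;" % codepoint
    return re.sub(r"&#(\d+);&#(\d+);", combine, text)

print(fix_surrogate_entities("Hug emoji: &#55358;&#56599;"))
# Hug emoji: &#x1f917;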
Displaying emoji in R
As far as I'm aware, there's no solution to printing emoji in the R console: they always come out as "\U0001f600" (or what have you). However, the packages I described above can help you plot emoji in some circumstances (I'm hoping to expand ggflags to display arbitrary full-colour emoji at some point). They can also help you search for emoji to get their codes, but they can't get names given the codes AFAIK. But maybe you could try importing the emoji list from emojilib into R and doing a join with your data frame, if you've extracted the emoji codes into a column, to get the English names.
JavaScript Solution
I had this exact same problem, but needed the solution in JavaScript, not R. Using rensa's comment above (hugely helpful!), I created the following code to solve this issue, and I just wanted to share it in case anyone else happens across this thread as I did, but needed it in JavaScript.
str.replace(/(&#\d+;){2}/g, function(match) {
    match = match.replace(/&#/g, '').split(';');
    var binFirst = (parseInt('0x' + parseInt(match[0]).toString(16)) - 0xd800).toString(2);
    var binSecond = (parseInt('0x' + parseInt(match[1]).toString(16)) - 0xdc00).toString(2);
    binFirst = '0000000000'.substr(binFirst.length) + binFirst;
    binSecond = '0000000000'.substr(binSecond.length) + binSecond;
    return '&#x' + (('0x' + (parseInt(binFirst + binSecond, 2).toString(16))) - (-0x10000)).toString(16) + ';';
});
And, here's a full snippet of it working if you'd like to run it:
var str = '&#55357;&#56842;&#55357;&#56856;&#55357;&#56832;&#55357;&#56838;&#55357;&#56834;&#55357;&#56833;'
str = str.replace(/(&#\d+;){2}/g, function(match) {
    match = match.replace(/&#/g, '').split(';');
    var binFirst = (parseInt('0x' + parseInt(match[0]).toString(16)) - 0xd800).toString(2);
    var binSecond = (parseInt('0x' + parseInt(match[1]).toString(16)) - 0xdc00).toString(2);
    binFirst = '0000000000'.substr(binFirst.length) + binFirst;
    binSecond = '0000000000'.substr(binSecond.length) + binSecond;
    return '&#x' + (('0x' + (parseInt(binFirst + binSecond, 2).toString(16))) - (-0x10000)).toString(16) + ';';
});
document.getElementById('result').innerHTML = str;
// &#55357;&#56842;&#55357;&#56856;&#55357;&#56832;&#55357;&#56838;&#55357;&#56834;&#55357;&#56833;
// is turned into
// &#x1f60a;&#x1f618;&#x1f600;&#x1f606;&#x1f602;&#x1f601;
// which is rendered by the browser as the emojis
Original:<br>😊😘😀😆😂😁<br><br>
Result:<br>
<div id='result'></div>
My SMS XML Parser application is working great now, but it stalls out on large XML files, so I'm thinking about rewriting it in PHP. If/when I do, I'll post that code as well.
I've implemented the algorithm described by rensa above in R, and am sharing it here. I am happy to release the code snippet below under a CC0 dedication (i.e., putting this implementation into the public domain for free reuse).
This is a quick and unpolished implementation of rensa's algorithm, but it works!
# Requires the stringr, stringi, BMS, and magrittr (for %>%) packages.
utf16_double_dec_code_to_utf8 <- function(utf16_decimal_code){
  string_elements <- stringr::str_match_all(utf16_decimal_code, "&#(.*?);")[[1]][,2]
  string3a <- string_elements[1]
  string3b <- string_elements[2]
  string4a <- sprintf("0x0%x", as.numeric(string3a))
  string4b <- sprintf("0x0%x", as.numeric(string3b))
  string5a <- paste0(
    # "0x",
    as.hexmode(string4a) - 0xd800
  )
  string5b <- paste0(
    # "0x",
    as.hexmode(string4b) - 0xdc00
  )
  string6 <- paste0(
    stringi::stri_pad(
      paste0(BMS::hex2bin(string5a), collapse = ""),
      10,
      pad = "0"
    ) %>%
      stringr::str_trunc(10, side = "left", ellipsis = ""),
    stringi::stri_pad(
      paste0(BMS::hex2bin(string5b), collapse = ""),
      10,
      pad = "0"
    ) %>%
      stringr::str_trunc(10, side = "left", ellipsis = "")
  )
  string7 <- BMS::bin2hex(as.numeric(strsplit(string6, split = "")[[1]]))
  string8 <- as.hexmode(string7) + 0x10000
  unicode_pattern <- string8
  unicode_pattern
}

make_unicode_entity <- function(x) {
  paste0("\\U000", utf16_double_dec_code_to_utf8(x))
}

make_html_entity <- function(x) {
  paste0("&#x", utf16_double_dec_code_to_utf8(x), ";")
}

# An example string, using the "hug" emoji:
example_string <- "test &#55358;&#56599; test"

output_string <- stringr::str_replace_all(
  example_string,
  "(&#[0-9]*?;){2}",  # Find all paired "&#...;&#...;" codes.
  make_unicode_entity
  # make_html_entity
)

cat(output_string)

# To print the Unicode string (doesn't display in the R console, but can be
# copied and pasted elsewhere):
# (This assumes you've used 'make_unicode_entity' above in the str_replace_all
# call):
stringi::stri_unescape_unicode(output_string)
Translated Chad's JavaScript answer to Go since I too had the same issue, but needed a solution in Go.
https://play.golang.org/p/h9JBFzqcd90
package main

import (
    "fmt"
    "html"
    "regexp"
    "strconv"
    "strings"
)

func main() {
    emoji := "&#55357;&#56842;&#55357;&#56856;&#55357;&#56832;&#55357;&#56838;&#55357;&#56834;&#55357;&#56833;"
    re := regexp.MustCompile(`(&#\d+;){2}`)
    matches := re.FindAllString(emoji, -1)

    var builder strings.Builder
    for _, match := range matches {
        s := strings.Replace(match, "&#", "", -1)
        parts := strings.Split(s, ";")
        a := parts[0]
        b := parts[1]
        c, err := strconv.Atoi(a)
        if err != nil {
            panic(err)
        }
        d, err := strconv.Atoi(b)
        if err != nil {
            panic(err)
        }
        c = c - 0xd800
        d = d - 0xdc00
        e := strconv.FormatInt(int64(c), 2)
        f := strconv.FormatInt(int64(d), 2)
        // Left-pad each binary string to 10 bits before concatenating.
        g := "0000000000"[len(e):] + e
        h := "0000000000"[len(f):] + f
        j, err := strconv.ParseInt(g+h, 2, 64)
        if err != nil {
            panic(err)
        }
        k := j + 0x10000
        _, err = builder.WriteString("&#x" + strconv.FormatInt(k, 16) + ";")
        if err != nil {
            panic(err)
        }
    }
    fmt.Println(html.UnescapeString(emoji))
    emoji = html.UnescapeString(builder.String())
    fmt.Println(emoji)
}

Concat 2 strings in Erlang and send with HTTP

I'm trying to concatenate 2 variables, Address and Payload. After that I want to send them over HTTP to a server, but I have 2 problems. When I try to concatenate the 2 variables with a delimiter ';' it doesn't work. Also, sending the data of Payload or Address doesn't work. This is my code:
handle_rx(Gateway, #link{devaddr=DevAddr}=Link, #rxdata{port=Port, data=RxData}, RxQ) ->
    Data = base64:encode(RxData),
    Devaddr = base64:encode(DevAddr),
    TextAddr = "Device address: ",
    TextPayload = "Payload: ",
    Address = string:concat(TextAddr, Devaddr),
    Payload = string:concat(TextPayload, Data),
    Json = string:join([Address, Payload], "; "),
    file:write_file("/tmp/foo.txt", io_lib:fwrite("~s.\n", [Json])),
    inets:start(),
    ssl:start(),
    httpc:request(post, {"http://192.168.0.121/apiv1/lorapacket/rx", [], "application/x-www-form-urlencoded", Address}, [], []),
    ok;
handle_rx(_Gateway, _Link, RxData, _RxQ) ->
    {error, {unexpected_data, RxData}}.
I have no errors that I can show you. When I write Address or Payload individually to the file it works but sending doesn't work...
Thank you for your help!
When i try to concat the 2 variables with a delimiter ';' it doesn't work.
5> string:join(["hello", <<"world">>], ";").
[104,101,108,108,111,59|<<"world">>]
6> string:join(["hello", "world"], ";").
"hello;world"
base64:encode() returns a binary, yet string:join() requires string arguments. You can do this:
7> string:join(["hello", binary_to_list(<<"world">>)], ";").
"hello;world"
Response to comment:
In Erlang the string "abc" is equivalent to the list [97,98,99]. However, the binary syntax <<"abc">> is not equivalent to <<[97,98,99]>>; rather, the binary syntax <<"abc">> is special shorthand notation for the binary <<97, 98, 99>>.
Therefore, if you write:
Address = [97,98,99].
then the code:
Bin = <<Address>>.
after variable substitution becomes:
Bin = <<[97,98,99]>>.
and that isn't legal binary syntax.
If you need to convert a string/list contained in a variable, like Address, to a binary, you use list_to_binary(Address)--not <<Address>>.
In your code here:
Json = string:join([binary_to_list(<<Address>>),
                    binary_to_list(<<Payload>>)],
                   ";").
Address and Payload were previously assigned the return value of string:concat(), which returns a string, so there is no reason to (attempt to) convert Address to a binary with <<Address>>, then immediately convert the binary back to a string with binary_to_list(). Instead, you would just write:
Json = string:join([Address, Payload], ";")
The problem with your original code is that you called string:concat() with a string as the first argument and a binary as the second argument--yet string:concat() takes two string arguments. You can use binary_to_list() to convert a binary to the string that you need for the second argument.
Sorry I'm new to Erlang
As with any language, you have to study the basics and write numerous toy examples before you can start writing code that actually does something.
You don't have to concatenate strings. This is called an iolist, and it is one of the best things in Erlang:
1> RxData = "Hello World!", DevAddr = "Earth",
1> Data = base64:encode(RxData), Devaddr = base64:encode(DevAddr),
1> TextAddr="Device address", TextPayload="Payload",
1> Json=["{'", TextAddr, "': '", Devaddr, "', '", TextPayload, "': '", Data, "'}"].
["{'","Device address","': '",<<"RWFydGg=">>,"', '",
"Payload","': '",<<"SGVsbG8gV29ybGQh">>,"'}"]
2> file:write_file("/tmp/foo.txt", Json).
ok
3> file:read_file("/tmp/foo.txt").
{ok,<<"{'Device address': 'RWFydGg=', 'Payload': 'SGVsbG8gV29ybGQh'}">>}

I want to replace all commas between the 13th comma starting from left and 12th comma starting from right in Unix

Present
1856292496,-1863203096,302,918468087151,808648712,405670043170066,919015026101,M,6,T,0,15,2c,Dear Customer, Your Request is under Process,03,11/05/2017 10:00:00,11/05/2017 10:00:00,11/,11/05/2017 10:00:00,0,03,,255,,333,ERecharge_RCOM,919015540301
Requirement
1856292496,-1863203096,302,918468087151,808648712,405670043170066,919015026101,M,6,T,0,15,2c,Dear Customer Your Request is under Process,03,11/05/2017 10:00:00,11/05/2017 10:00:00,11/,11/05/2017 10:00:00,0,03,,255,,333,ERecharge_RCOM,919015540301
Current
1856292499,-1863203087,301,918081224379,808648711,405540046666191,919026240102,M,6,T,0,15,8d,Dear Business Partner,your current Core balance is Rs.29.8,GSM balance is Rs.12892.14,MRCOM balance is Rs.1 and MRTL balance is Rs.1.Reliance,03,11/05/2017 10:00:00,11/05/2017 10:00:00,11/,11/05/2017 10:00:00,0,01,,255,,333,BalQuery_RCOM,919835853611
Requirement
1856292499,-1863203087,301,918081224379,808648711,405540046666191,919026240102,M,6,T,0,15,8d,Dear Business Partner your current Core balance is Rs.29.8 GSM balance is Rs.12892.14 MRCOM balance is Rs.1 and MRTL balance is Rs.1.Reliance,03,11/05/2017 10:00:00,11/05/2017 10:00:00,11/,11/05/2017 10:00:00,0,01,,255,,333,BalQuery_RCOM,919835853611
I need to replace all the commas between the 13th comma from the left and the 12th comma from the right with a space, on a Unix system.
Here's a moderately succinct but mostly inscrutable (if not incomprehensible) solution using Perl.
#!/usr/bin/perl -anlF,
use strict;
use warnings;

my $lhs = 13;
my $rhs = 13;

$, = ",";   # Perl is obscure on occasion!
my($nflds) = scalar(@F);
print @F[0 .. $lhs-1], "@F[$lhs .. $nflds-$rhs-1]", @F[$nflds-$rhs .. $nflds-1]
    if ($nflds > $lhs + $rhs);
The shebang line uses -l to make Perl handle newlines automatically. See perldoc perlrun.
It also uses -F, which, in Perl 5.20.0 and later, is explicitly documented to automatically put Perl into -a (awk) mode and -n (read loop but don't print) mode. The input lines are automatically split into the array F using , as the delimiter. Earlier versions of Perl do not infer -a and -n from the presence of -F, so the code (now) uses -an as well as -F,. The updated code has been shown to work with Perl 5.10.0 and with each major release up to 5.18, as well as 5.20 and later.
The use strict; and use warnings; lines set Perl to fussy. You should always use them.
The two assignments set up the values you specified in the question, except that it seems to be the 13th field from the right, rather than the 12th, that you want combined. They're easily fungible if you need to.
The $, = ","; line sets the output field separator (OFS in Awk, and $OFS in Perl under use English qw( -no_match_vars );). See perldoc English and perldoc perlvars.
The my($nflds) = scalar(@F); line determines the number of fields.
The print line is conditional on there being enough fields.
It uses Perl's array slices to:
print fields 0..$lhs-1 as separate comma-separated fields
combine fields $lhs..$nflds-$rhs-1 as a single space-separated field (by virtue of the string around the slice)
print fields $nflds-$rhs..$nflds-1 as separate comma-separated fields
The output from that, given your input data, is:
1856292496,-1863203096,302,918468087151,808648712,405670043170066,919015026101,M,6,T,0,15,2c,Dear Customer Your Request is under Process,03,11/05/2017 10:00:00,11/05/2017 10:00:00,11/,11/05/2017 10:00:00,0,03,,255,,333,ERecharge_RCOM,919015540301
1856292499,-1863203087,301,918081224379,808648711,405540046666191,919026240102,M,6,T,0,15,8d,Dear Business Partner your current Core balance is Rs.29.8 GSM balance is Rs.12892.14 MRCOM balance is Rs.1 and MRTL balance is Rs.1.Reliance,03,11/05/2017 10:00:00,11/05/2017 10:00:00,11/,11/05/2017 10:00:00,0,01,,255,,333,BalQuery_RCOM,919835853611
Note that the leading space on one of the fields in the first line is preserved.
I didn't come up with that immediately. I generated a more verbose solution like this, first:
#!/usr/bin/env perl -l
use strict;
use warnings;

my $lhs = 13;
my $rhs = 13;

while (<>)
{
    chomp;
    my(@fields) = split /,/;
    my($nflds) = scalar(@fields);
    my(@output) = @fields;
    if ($nflds > $lhs + $rhs)
    {
        my(@combine) = @fields[$lhs .. $nflds-$rhs-1];
        my $composite = "@combine";
        @output = (@fields[0 .. $lhs-1], $composite, @fields[$nflds-$rhs .. $nflds-1]);
    }
    local $, = ",";
    print @output;
}
This produces the same output as the other script. I had the scripts called rs13.pl (verbose) and rs17.pl (compact) and checked them like this (data contains your two lines of input data):
diff <(perl rs13.pl data) <(perl rs17.pl data)
There was no difference.
There are ways to make the compact solution more compact, but I'm not sure they help much.
Here is another version that uses the splice and join functions instead of array slices. In some ways, it is tidier than the other two, but it doesn't have the same protection against lines with too few fields in them.
#!/usr/bin/perl -anlF,
use strict;
use warnings;

my $lhs = 13;
my $rhs = 13;

$, = ",";   # Perl is obscure on occasion!
my($nflds) = scalar(@F);
splice(@F, $lhs, $nflds - $lhs - $rhs, join(' ', @F[$lhs .. $nflds-$rhs-1]));
print @F;
It produces the same result as the other two scripts.
Yes, you could write the code in Awk; it wouldn't be as compact as this.
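For comparison, here is a rough sketch of the same slice-and-join idea in Python (not part of the Perl answer above; it reads from standard input and uses the same 13/13 field counts):
import sys

LHS, RHS = 13, 13   # fields to keep intact on the left and on the right

for line in sys.stdin:
    fields = line.rstrip("\n").split(",")
    if len(fields) > LHS + RHS:
        # Join the middle fields with spaces, keeping the outer fields as-is.
        middle = " ".join(fields[LHS:len(fields) - RHS])
        fields = fields[:LHS] + [middle] + fields[len(fields) - RHS:]
    print(",".join(fields))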
