HtmlProvider parses Fraction as DateTime

Using HtmlProvider to access a web-based table sometimes returns a fraction as a string (correct) and, at other times, returns a DateTime (incorrect).
What am I missing?
module Test =
    open System.IO
    open FSharp.Data

    let [<Literal>] url = "https://www.example.com/fractions"
    type Profile = HtmlProvider<url>
    let profile = Profile.Load(url)

    let [<Literal>] resultFile = @"C:\temp\data\Profile.csv"

    // Write the table out as a semicolon-separated CSV file
    let CsvResult =
        use writer = new StreamWriter(resultFile, false)
        writer.WriteLine "\"Date\";\"Fraction\""
        for row in profile.Tables.Table1.Rows do
            "\"" + row.``Date``.ToString() + "\"" + ";" |> writer.Write
            "\"" + row.``Fraction``.ToString() + "\"" + ";" |> writer.WriteLine
        // the writer is flushed and closed automatically by 'use'

    let csvResult = CsvResult

Without seeing sample data I can't be 100% certain, but I'm guessing that it's parsing fractions as dates if the numbers involved would be valid dates in the culture you're using: e.g., 1/4 would be a valid date in any culture that uses / as a separator, and would be treated either as April 1st or as January 4th, depending on which parsing culture your system defaults to.
Other type providers in FSharp.Data (such as the CSV type provider) let you configure how each column is parsed, but the HTML type provider doesn't give you that option (a missing feature, admittedly). It does, however, let you specify the culture used for date and number parsing, so one way you might be able to work around this is to specify a culture that does not use / as a date separator (but still uses . as a decimal point, since otherwise, if the HTML you're parsing writes numbers like 1,000 for one thousand, they could be interpreted as 1). One such culture is en-IN ("English (India)"), where the date separator is "-" and the decimal point is ".".
So try passing Culture="en-IN" as a static parameter in your HtmlProvider options (static parameters have to be literals, so you pass the culture name rather than a CultureInfo object), and see if that helps it stop treating fractions as dates.
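For reference, here is a minimal sketch of what that could look like. It assumes the Culture static parameter on HtmlProvider (a culture-name string) and reuses the placeholder URL and column names from the question; whether the Fraction column then comes through as a string depends on the actual page, so treat it as something to try rather than a guaranteed fix.
module CultureTest =
    open FSharp.Data

    let [<Literal>] url = "https://www.example.com/fractions"

    // en-IN uses "-" as the date separator and "." as the decimal point,
    // so a cell like "13/20" should no longer look like a date to the provider.
    type Profile = HtmlProvider<url, Culture="en-IN">

    let printRows () =
        let profile = Profile.Load(url)
        for row in profile.Tables.Table1.Rows do
            // %O calls ToString on whatever type was inferred for each column
            printfn "%O ; %O" row.``Date`` row.``Fraction``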

The following combination of functions worked:
// http://www.fssnip.net/29/title/Regular-expression-active-pattern
module Solution =
    open System
    open System.Text.RegularExpressions
    open FSharp.Data

    // Active pattern that returns the regex capture groups on a match
    let (|Regex|_|) pattern input =
        let m = Regex.Match(input, pattern)
        if m.Success then Some(List.tail [ for g in m.Groups -> g.Value ])
        else None

    let ptrnFraction = @"^([0-9]?[0-9]?)(\/)([0-9]?[0-9]?)$"
    let ptrnDateTime = @"(\d{2})\/(\d{2})\/(\d{4}) (\d{2}):(\d{2}):(\d{2})"

    // If the value is already a fraction, keep it; if the provider turned it
    // into a DateTime, rebuild the fraction from the day and month parts
    let ToFraction input =
        match input with
        | Regex ptrnFraction [ numerator; operator; denominator ] ->
            numerator + operator + denominator
        | Regex ptrnDateTime [ day; month; year; hours; minutes; seconds ] ->
            day + "/" + month
        | _ -> "Not valid!"

    let dtInput = @"05/09/2017 00:00:00"
    let frcInput = @"13/20"

    let outDate = ToFraction dtInput
    printfn "Out Date: %s" outDate
    let outFraction = ToFraction frcInput
    printfn "Out Fraction: %s" outFraction

    // Output:
    // Out Date: 05/09
    // Out Fraction: 13/20
Thus, I was able to replace:
"\"" + row.``Fraction``.ToString() + "\"" + ";" |> writer.WriteLine
with:
"\"" + ToFraction(row.``Fraction``.ToString()) + "\"" + ";" |> writer.Write
Thanks to @rmunn for the clarity of his explanations and the benefit of his expertise.

Related

pyparsing parse c/cpp enums with values as user defined macros

I have a use case where I need to match enums whose values can be user-defined macros.
Example enum:
typedef enum
{
    VAL_1 = -1,
    VAL_2 = 0,
    VAL_3 = 0x10,
    VAL_4 = TEST_ENUM_CUSTOM(1,2),
}MyENUM;
I am using the code below. It works if I don't use the format in VAL_4, but I need to match the VAL_4 format as well. I am new to pyparsing, so any help is appreciated.
My code:
from pyparsing import (Suppress, Word, alphas, alphanums, Group, Optional,
                       ZeroOrMore, cppStyleComment, cStyleComment)

LBRACE, RBRACE, EQ, COMMA = map(Suppress, "{}=,")
_enum = Suppress("enum")
identifier = Word(alphas, alphanums + "_")
integer = Word("-" + alphanums)  # I have tried adding "_(,)" to this, but it does not match.
enumValue = Group(identifier("name") + Optional(EQ + integer("value")))
enumList = Group(enumValue + ZeroOrMore(COMMA + enumValue) + Optional(COMMA))
enum = _enum + Optional(identifier("enum")) + LBRACE + enumList("names") + RBRACE + Optional(identifier("typedef"))
enum.ignore(cppStyleComment)
enum.ignore(cStyleComment)
Thanks
-Purna
Simply adding more characters to integer is the wrong way to go. Even this expression:
integer = Word("-"+alphanums)
isn't super-great, since it would match "---", "xyz", "q--10-", and many other non-integer strings.
Better to define integer properly. You could do:
integer = Combine(Optional('-') + Word(nums))
but I've found that for these low-level expressions that occur in many places in your parse string, a Regex is best:
integer = Regex(r"-?\d+") # Regex(r"-?[0-9]+") if you like more readable re's
Then define one for hex_integer as well.
Then to add macros, we need a recursive expression, to handle the possibility of macros having arguments that are also macros.
So at this point, we should just stop writing code for a bit, and do some design. In parser development, this design usually looks like a BNF, where you describe your parser in a sort of pseudocode:
enum_expr ::= "typedef" "enum" [identifier]
              "{"
              enum_item_list
              "}" [identifier] ";"
enum_item_list ::= enum_item ["," enum_item]... [","]
enum_item ::= identifier "=" enum_value
enum_value ::= integer | hex_integer | macro_expression
macro_expression ::= identifier "(" enum_value ["," enum_value]... ")"
Note the recursion of macro_expression: it is used in defining enum_value, but it includes enum_value as part of its own definition. In pyparsing, we use a Forward to set up this kind of recursion.
See how that BNF is implemented in the code below. I build on some of the items you posted, but the macro expression required some rework. The bottom line is "don't just keep adding characters to integer trying to get something to work."
from pyparsing import (Suppress, Keyword, Word, alphas, alphanums, Forward,
                       Regex, Group, Optional, delimitedList, cppStyleComment)

LBRACE, RBRACE, EQ, COMMA, LPAR, RPAR, SEMI = map(Suppress, "{}=,();")
_typedef = Keyword("typedef").suppress()
_enum = Keyword("enum").suppress()
identifier = Word(alphas, alphanums + "_")

# define an enumValue expression that is recursive, so that enumValues
# that are macros can take parameters that are enumValues
enumValue = Forward()

# add more types as needed - parse action on hex_integer will do parse-time
# conversion to int
integer = Regex(r"-?\d+").addParseAction(lambda t: int(t[0]))
# or just use the signed_integer expression found in pyparsing_common
# integer = pyparsing_common.signed_integer
hex_integer = Regex(r"0x[0-9a-fA-F]+").addParseAction(lambda t: int(t[0], 16))

# a macro defined using enumValue for parameters
macro_expr = Group(identifier + LPAR + Group(delimitedList(enumValue)) + RPAR)

# use '<<=' operator to attach recursive definition to enumValue
enumValue <<= hex_integer | integer | macro_expr

# remaining enum expressions
enumItem = Group(identifier("name") + Optional(EQ + enumValue("value")))
enumList = Group(delimitedList(enumItem) + Optional(COMMA))
enum = (_typedef
        + _enum
        + Optional(identifier("enum"))
        + LBRACE
        + enumList("names")
        + RBRACE
        + Optional(identifier("typedef"))
        + SEMI
        )

# this comment style includes cStyleComment too, so no need to
# ignore both
enum.ignore(cppStyleComment)
Try it out:
enum.runTests([
    """
    typedef enum
    {
        VAL_1 = -1,
        VAL_2 = 0,
        VAL_3 = 0x10,
        VAL_4 = TEST_ENUM_CUSTOM(1,2)
    }MyENUM;
    """,
])
runTests is for testing and debugging your parser during development. Use enum.parseString(some_enum_expression) or enum.searchString(some_c_header_file_text) to get the actual parse results.
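As a small usage sketch (assuming the enum expression defined above and the pyparsing 2.x-style parseString / ParseResults.get API; the sample text is the enum from the question):
sample = """
typedef enum
{
    VAL_1 = -1,
    VAL_2 = 0,
    VAL_3 = 0x10,
    VAL_4 = TEST_ENUM_CUSTOM(1, 2),
} MyENUM;
"""

result = enum.parseString(sample)
print(result["typedef"])              # -> MyENUM
for item in result["names"]:
    # each item carries a "name" and, optionally, a "value" result
    print(item["name"], item.get("value", "<no value>"))
Each item in result["names"] carries a "name" result and, when present, a "value" result: an int for plain or hex integers, or a nested group for a macro call.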
Using the new railroad diagram feature in the upcoming pyparsing 3.0 release, you can also generate a visual railroad-diagram representation of this parser (the diagram image is not reproduced here).

Need help understanding how gsub and tonumber are used to encode lua source code?

I'm new to Lua, but I've figured out that gsub is a global substitution function and tonumber is a conversion function. What I don't understand is how the two functions are used together to produce an encoded string.
I've already tried reading parts of PIL (Programming in Lua) and the reference manual, but I'm still a bit confused.
local L0_0, L1_1
function L0_0(A0_2)
  return (A0_2:gsub("..", function(A0_3)
    return string.char((tonumber(A0_3, 16) + 256 - 13 + 255999744) % 256)
  end))
end
encodes = L0_0
L0_0 = gg
L0_0 = L0_0.toast
L1_1 = "__loading__\226\128\166"
L0_0(L1_1)
L0_0 = encodes
L1_1 = --"The Encoded String"
L0_0 = L0_0(L1_1)
L1_1 = load
L1_1 = L1_1(L0_0)
pcall(L1_1)
I removed the encoded string where I put the comment because of how long it was. If needed I can upload the encoded string as well.
gsub is being used to grab 2-digit sections of A0_2. This means the string A0_3 is a 2-digit hexadecimal number, but it is in string form, so we cannot perform math on the value directly. That A0_3 is a hex number can be inferred from how tonumber is used.
tonumber from Lua 5.1 Reference Manual:
Tries to convert its argument to a number. If the argument is already a number or a string convertible to a number, then tonumber returns this number; otherwise, it returns nil.
An optional argument specifies the base to interpret the numeral. The base may be any integer between 2 and 36, inclusive. In bases above 10, the letter 'A' (in either upper or lower case) represents 10, 'B' represents 11, and so forth, with 'Z' representing 35. In base 10 (the default), the number can have a decimal part, as well as an optional exponent part (see §2.1). In other bases, only unsigned integers are accepted.
So tonumber(A0_3, 16) means we are expecting for A0_3 to be a base 16 number (hexadecimal).
Once we have the numeric value of A0_3 we do some math and finally convert it back to a character. Note that 256 and 255999744 are both multiples of 256, so the whole expression boils down to (byte - 13) % 256: every byte is simply shifted down by 13.
function L0_0(A0_2)
  return (A0_2:gsub("..", function(A0_3)
    return string.char((tonumber(A0_3, 16) + 256 - 13 + 255999744) % 256)
  end))
end
This block of code takes a string of hex digits and converts them into chars. tonumber is being used to allow for the manipulation of the values.
Here is an example of how this works with Hello World:
local str = "Hello World"
local hex_str = ''
for i = 1, #str do
hex_string = hex_string .. string.format("%x", str:byte(i,i))
end
function L0_0(A0_2)
return (A0_2:gsub("..", function(A0_3)
return string.char((tonumber(A0_3, 16) + 256 - 13 + 255999744) % 256)
end))
end
local encoded = L0_0(hex_str)
print(encoded)
Output
;X__bJbe_W
And taking it back to the original string:
function decode(A0_2)
  return (A0_2:gsub("..", function(A0_3)
    return string.char((tonumber(A0_3, 16) + 13) % 256)
  end))
end

hex_str = ''
for i = 1, #encoded do
    hex_str = hex_str .. string.format("%02x", encoded:byte(i, i))
end
print(decode(hex_str))

Why is the conversion from string to date incorrect?

In my Kotlin code:
const val TS_DATE_PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSS"
val ts = responseJsonObject.get("TS").getAsString()
val tsDate = SimpleDateFormat(TS_DATE_PATTERN).parse(ts)
val tsDateAsString = SimpleDateFormat(TS_DATE_PATTERN).format(tsDate)
logger.info("ts = " + ts + " -> tsDate = " + tsDate + " -> tsDateAsString = " + tsDateAsString)
And here the (formatted for readability) result:
ts = 2019-01-14T22:56:30.429582
tsDate = Mon Jan 14 23:03:39 EET 2019
tsDateAsString = 2019-01-14T23:03:39.582
As you can see the ts and tsDateAsString have different times, although they came from the same starting point.
E.g. ts = 22:56:30 but in tsDateAsString = 23:03:39
Why?
As a suggestion: whenever you can, use the java.time utilities.
SimpleDateFormat has special handling for the milliseconds: everything parsed by the S pattern letter is treated as milliseconds. As long as you deal with 3-digit milliseconds everything is fine (you could even use a single S, i.e. .S, to parse them), but if your input has 6 digits after the seconds, you also get a 6-digit millisecond(!) value.
Those 6-digit "milliseconds" are really 3-digit seconds plus 3-digit milliseconds: 429582 ms is about 7 minutes 9.582 seconds, which gets added on top of the parsed time. That is exactly the deviation you see: 22:56:30 + 7:09.582 = 23:03:39.582.
How to solve that? Either shorten the input time string and lose some precision, or (preferably) use DateTimeFormatter with a pattern matching your input, i.e. yyyy-MM-dd'T'HH:mm:ss.SSSSSS:
val TS_DATE_PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSSSSS"
val formatter = DateTimeFormatter.ofPattern(TS_DATE_PATTERN)
val tsDate = formatter.parse(ts) // now the value as you would expect it...
Transforming that to a TimeStamp will work as follows:
Timestamp.valueOf(LocalDateTime.from(tsDate))
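For completeness, here is a minimal, self-contained Kotlin sketch showing both behaviours side by side; the timestamp literal is the one from the question and the JSON plumbing is omitted:
import java.sql.Timestamp
import java.text.SimpleDateFormat
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

fun main() {
    val ts = "2019-01-14T22:56:30.429582"

    // SimpleDateFormat reads all six digits as milliseconds (429582 ms, about 7 min 9.582 s),
    // so the parsed instant drifts forward by that amount.
    val legacy = SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS").parse(ts)
    println(legacy)                     // something like: Mon Jan 14 23:03:39 ... 2019

    // java.time handles a six-digit fraction-of-second correctly.
    val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSS")
    val tsDate = LocalDateTime.parse(ts, formatter)
    println(tsDate)                     // 2019-01-14T22:56:30.429582
    println(Timestamp.valueOf(tsDate))  // 2019-01-14 22:56:30.429582
}
LocalDateTime.parse(ts, formatter) gives the same result as the formatter.parse plus LocalDateTime.from combination shown above.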

How do you access name of a ProtoField after declaration?

How can I access the name property of a ProtoField after I declare it?
For example, something along the lines of:
myproto = Proto("myproto", "My Proto")
myproto.fields.foo = ProtoField.int8("myproto.foo", "Foo", base.DEC)
print(myproto.fields.foo.name)
Where I get the output:
Foo
An alternate method that's a bit more terse:
local fieldString = tostring(field)
local i, j = string.find(fieldString, ": .* myproto")
print(string.sub(fieldString, i + 2, j - (1 + string.len("myproto")))
EDIT: Or an even simpler solution that works for any protocol:
local fieldString = tostring(field)
local i, j = string.find(fieldString, ": .* ")
print(string.sub(fieldString, i + 2, j - 1))
Of course the 2nd method only works as long as there are no spaces in the field name. Since that's not necessarily always going to be the case, the 1st method is more robust. Here is the 1st method wrapped up in a function that ought to be usable by any dissector:
-- field is the ProtoField whose name you want to print.
-- protoStr is the name (abbreviation) of the relevant protocol.
function printFieldName(field, protoStr)
  local fieldString = tostring(field)
  local i, j = string.find(fieldString, ": .* " .. protoStr)
  print(string.sub(fieldString, i + 2, j - (1 + string.len(protoStr))))
end
... and here it is in use:
printFieldName(myproto.fields.foo, "myproto")
printFieldName(someproto.fields.bar, "someproto")
Ok, this is janky, and certainly not the 'right' way to do it, but it seems to work.
I discovered this after looking at the output of
print(tostring(myproto.fields.foo))
This seems to spit out the value of each of the members of ProtoField, but I couldn't figure out the correct way to access them. So, instead, I decided to parse the string. This function will return 'Foo', but could be adapted to return the other fields as well.
function getname(field)
    --First, convert the field into a string
    --this is going to result in a long string with
    --a bunch of info we don't need
    local fieldString = tostring(field)
    -- fieldString looks like:
    -- ProtoField(188403): Foo myproto.foo base.DEC 0000000000000000 00000000 (null)
    --Split the string on the first '.' character
    a, b = fieldString:match"([^.]*).(.*)"
    --Split the first half of the previous result (a) on the ':' character
    a, b = a:match"([^.]*):(.*)"
    --At this point, b will equal " Foo myproto"
    --and we want to strip out that trailing protocol abbreviation ("myproto") part
    --Count the number of spaces in the string
    local spaceCount = select(2, string.gsub(b, " ", ""))
    --Declare a counter
    local counter = 0
    --Declare the name we are going to return
    local constructedName = ''
    --Step through each word in (b), separated by spaces
    for word in b:gmatch("%w+") do
        --If we have reached the last space, go ahead and return
        if counter == spaceCount - 1 then
            return constructedName
        end
        --Add the current word to our name
        constructedName = constructedName .. word .. " "
        --Increment the counter
        counter = counter + 1
    end
end

In Scala 2.8, how to access a substring by its length and starting index?

I've got date and time in separate fields, in yyyyMMdd and HHmmss formats respectively. To parse them I plan to construct a yyyy-MM-ddTHH:mm:ss string and feed it to the joda-time constructor. So I am looking to get the first 4 digits, then 2 digits starting from index 5, etc. How do I achieve this? List.fromString(String) (which I found here) seems to be broken.
The substring method certainly can get you there but String in Scala 2.8 also supports all other methods on sequences. The ScalaDoc for class StringOps gives a complete list.
In particular, the splitAt method comes in handy. Here's a REPL interaction which shows how.
scala> val ymd = "yyyyMMdd"
ymd: java.lang.String = yyyyMMdd
scala> val (y, md) = ymd splitAt 4
y: String = yyyy
md: String = MMdd
scala> val (m, d) = md splitAt 2
m: String = MM
d: String = dd
scala> y+"-"+m+"-"+d
res3: java.lang.String = yyyy-MM-dd
Just use the substring() method on the string. Note that Scala strings behave like Java strings (with some extra methods), so anything that's in java.lang.String can also be used on Scala strings.
val s = "20100903"
val t = s.substring(0, 4) // t will contain "2010"
(Note that the arguments are not length and starting index, but starting index (inclusive) and ending index (exclusive)).
But if this is about parsing dates, why don't you just use java.text.SimpleDateFormat, like you would in Java?
val s = "20100903"
val fmt = new SimpleDateFormat("yyyyMMdd")
val date = fmt.parse(s) // will give you a java.util.Date object
If you're using Joda Time, you should be able to use
val date = DateTimeFormat.forPattern("yyyyMMdd, HHmmss")
.parseDateTime(field1 + ", " + field2)
For the more general problem of parsing Strings like this, it can be helpful to use a Regex (although I wouldn't recommend it in this case):
scala> val Date = "(\\d\\d\\d\\d)(\\d\\d)(\\d\\d)".r
Date: scala.util.matching.Regex = (\d\d\d\d)(\d\d)(\d\d)
scala> "20100903" match {
| case Date(year, month, day) => year + "-" + month + "-" + day
| }
res1: java.lang.String = 2010-09-03
val field1="20100903"
val field2="100925"
val year = field1.substring(1,5)
val month = field1.substring(5,7)
val day = ...
...
val toYodaTime = year + "-" + month+"-"+day+ ...
