Encryption or Hashing of Date Value

I have an old program that has been discontinued which communicates with an SQL database. When I enter certain information in the defunct software, it is encrypted, encoded, or hashed before being entered into the database.
I am creating another application to interact with the same data, and I need to figure out how the end result is being produced.
Here's an example:
I enter 6/18/2017, I get y/7w/iXIE
I enter 6/18/2099, I get y/7w/iXBM
I enter 6/12/2017, I get y/7c/iXIE
I enter 12/11/2018, I get SN/u0/ZmWk
The last one throws me for a loop... what method is being used and how can I replicate this?

It might be format-preserving encryption or just substitutions. In every sample, each section delimited by / has the same number of characters as the corresponding section of the input. With enough samples (all 12 months, all 31 days, and the years you care about) you should be able to match the method.
6/18/2017  -> y/7w/iXIE
6/18/2099  -> y/7w/iXBM
6/12/2017  -> y/7c/iXIE
12/11/2018 -> SN/u0/ZmWk
months: 6 -> y, 12 -> SN
days: 11 -> u0, 12 -> 7c, 18 -> 7w
years: 2017 -> iXIE, 2018 -> ZmWk, 2099 -> iXBM
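If it really is a per-field substitution, one practical way to replicate it without ever identifying the algorithm is to harvest plaintext/ciphertext pairs from the database and build lookup tables. Below is a minimal Python sketch of that idea using only the four samples above; the function name and structure are mine, not anything from the original software, and it assumes each field (month, day, year) is encoded independently.
samples = [
    ("6/18/2017", "y/7w/iXIE"),
    ("6/18/2099", "y/7w/iXBM"),
    ("6/12/2017", "y/7c/iXIE"),
    ("12/11/2018", "SN/u0/ZmWk"),
]

# One substitution table per field position: month, day, year.
tables = [{}, {}, {}]
for plain, coded in samples:
    for i, (p, c) in enumerate(zip(plain.split("/"), coded.split("/"))):
        tables[i][p] = c

def encode(date_string):
    # Raises KeyError for any field value not yet harvested as a sample.
    return "/".join(tables[i][part] for i, part in enumerate(date_string.split("/")))

print(encode("6/12/2099"))  # -> y/7c/iXBM, if the per-field assumption holds
If the same field value ever encodes differently in two rows, the per-field substitution assumption is wrong and something keyed or tweaked is more likely.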

Related

Testing bitmask when stored as integer and available as string

I have a bitmask (really a 'flagmask') of integer values (1, 2, 4, 8, 16 etc.) which apply to a field and I need to store this in a (text) log file. What I effectively store is something like "x=296" which indicates that for field "x", flags 256, 32 and 8 were set.
When searching the logs, how can I easily search this text string ("x=nnn") and determine from the value of "nnn" whether a specific flag was set? For instance, how could I look at the number and know that flag 8 was set?
I know this is a somewhat trivial question if we're doing 'true' bitmask processing, but I've not seen it asked this way before - the log searching will just be doing string matching, so it just sees a value of "296" and there is no way to convert it to its constituent flags - we're just using basic string searching with maybe some easy SQL in there.
Your best bet is to do full string regex matching.
For bit 8 (value 128) set, do this:
SELECT log_line
FROM log_table
WHERE log_line
RLIKE 'x=(128|129|130|131|132|133|134|135|136|137|138|139|140|141|142|143|144|145|146|147|148|149|150|151|152|153|154|155|156|157|158|159|160|161|162|163|164|165|166|167|168|169|170|171|172|173|174|175|176|177|178|179|180|181|182|183|184|185|186|187|188|189|190|191|192|193|194|195|196|197|198|199|200|201|202|203|204|205|206|207|208|209|210|211|212|213|214|215|216|217|218|219|220|221|222|223|224|225|226|227|228|229|230|231|232|233|234|235|236|237|238|239|240|241|242|243|244|245|246|247|248|249|250|251|252|253|254|255)[^0-9]'
Or simplified to:
RLIKE 'x=(12[89]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])[^0-9]'
The string x= must exist in the log, followed by the decimal number and a non-digit character after the number.
For bit 2 (value 2) set, do this:
SELECT log_line FROM log_table WHERE log_line RLIKE 'x=(2|3|6|7|10|11|14|15|18|19|22|23|26|27|30|31|34|35|38|39|42|43|46|47|50|51|54|55|58|59|62|63|66|67|70|71|74|75|78|79|82|83|86|87|90|91|94|95|98|99|102|103|106|107|110|111|114|115|118|119|122|123|126|127|130|131|134|135|138|139|142|143|146|147|150|151|154|155|158|159|162|163|166|167|170|171|174|175|178|179|182|183|186|187|190|191|194|195|198|199|202|203|206|207|210|211|214|215|218|219|222|223|226|227|230|231|234|235|238|239|242|243|246|247|250|251|254|255)[^0-9]'
I tested the bit 2 regex on some live data:
SELECT count(log_line)
...
count(log_line)
128
1 row in set (0.002 sec)
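If you need the same kind of pattern for other flags, you do not have to type the alternation by hand. Here is a small Python sketch (my own helper, not part of any existing tool) that generates it, assuming the stored values never exceed 255 like the lists above:
def flag_regex(flag, max_value=255):
    # Every value in 0..max_value that has the given flag bit set.
    values = [str(v) for v in range(max_value + 1) if v & flag]
    return "x=(" + "|".join(values) + ")[^0-9]"

print(flag_regex(2))    # reproduces the bit 2 (value 2) alternation above
print(flag_regex(128))  # reproduces the 128..255 list from the first example
Raise max_value to whatever your flag field can actually hold, otherwise larger values such as 296 will never match.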
Much depends on how "easy" the "easy SQL" must be.
If you can use string splitting to separate "X" from "296", then using a bitwise AND on the value 296 is easy.
If you cannot, then, as you observed, the string "296" yields no trace of whether the 8 flag is set. You'd need to check for all the values that have that bit set, and of course that means exactly half of the available values. You can compress the regexp somewhat:
248 249 250 251 252 253 254 255 => 24[89]|25[0-5]
264 265 266 267 268 269 270 271 => 26[4-9]|27[01]
280 281 282 283 284 285 286 287 => 28[0-7]
296 297 298 299 300 301 302 303 => 29[6-9]|30[0-3]
...24[89]|25[0-5]|26[4-9]|27[01]|28[0-7]|29[6-9]|30[0-3]...
but I think the evaluation tree won't change very much; this kind of optimization is already present in most regex engines. Performance is going to be awful.
If at all possible I would try to rework the logs (maybe with a filter) to rewrite them as "X=000100101000b" or at least "X=0128h". The latter hexadecimal form would allow you to search for bit 8 by looking for "X=...[89A-F]h" only.
I have had to make changes like these several times. It involves preparing a pipe filter to create the new logs (it is more complicated if the logging process writes straight to a file; at worst, you might be forced to never run searches on the current log file, or to rotate the logs when such a search is needed). Then, slowly, during low-load periods, the old logs are retrieved, decompressed, reparsed, recompressed, and stored back. It's long and boring, but doable.
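As a rough illustration only (the field name "x" and the 4-digit padding are assumptions about your log format), a filter of that kind can be a few lines of Python:
import re
import sys

def to_hex(match):
    # "x=296" becomes "x=0128h"
    return "x=%04Xh" % int(match.group(1))

for line in sys.stdin:
    sys.stdout.write(re.sub(r"x=(\d+)\b", to_hex, line))
Run it as a pipe (for example: producer | python rewrite_flags.py >> app.log, with the script name being hypothetical), and reuse the same script when reprocessing the old logs during low-load periods.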
For database-stored logs, it's much easier and you can even avoid any alterations to the process that does the logs - you add a flag to the rows to be converted, or persist the timestamps of the catch-up conversion:
SELECT ... FROM log WHERE creation > lowWaterMark ORDER BY creation DESC;
SELECT ... FROM log WHERE creation < highWaterMark ORDER BY creation ASC;
The retrieved rows are updated, and the appropriate watermark is updated with the last creation value that was retrieved. The lowWaterMark chases the logging process. This way the only range you can't search is from the current instant down to lowWaterMark, which will lag behind (ultimately by just a few seconds, except under heavy load).
One way to determine whether a specific flag is set in the value "nnn" is to use bitwise operators. For example, to check whether flag 8 is set in the value 296, you can use the bitwise AND operator:
nnn = 296
if (nnn & 8) == 8:
    print("flag 8 is set")
Another option is to extract the numeric value from the log string "x=nnn" with a regular expression and then test the flag on the extracted number. (Simply searching for the digit "8" in the string would not work: "x=296" contains no "8" even though flag 8 is set.)
import re
log_string = "x=296"
match = re.search(r"x=(\d+)", log_string)
if match and int(match.group(1)) & 8:
    print("flag 8 is set")
You could also use a function that takes the log string and flag as input and returns a boolean indicating whether the flag is set in the log string.
def check_flag(log_string, flag):
    nnn = int(log_string.split('=')[1])
    return (nnn & flag) == flag

if check_flag("x=296", 8):
    print("flag 8 is set")
Use the bitwise operator & to check whether a specific flag is set in the value:
function hasFlag(value, flag) {
    return (value & flag) === flag;
}
const value = 296;
console.log(hasFlag(value, 256)); // true
console.log(hasFlag(value, 32)); // true
console.log(hasFlag(value, 8)); // true
console.log(hasFlag(value, 4)); // false

Reading MarkLogic logs from Query Console using XQuery

I want to read MarkLogic logs (e.g. ErrorLog.txt) from Query Console using XQuery. I have the code below, but the output is not formatted the way I need; a sample of the result follows.
xquery version "1.0-ml";
for $hid in xdmp:hosts()
let $h := xdmp:host-name($hid)
return
xdmp:filesystem-file("file://" || $h || "/" || xdmp:data-directory($hid) || "/Logs/ErrorLog.txt")
The problem is that the result comes back host by host: first all log entries from one host, then entries starting again at 00:00:01 for host 2, then 00:00:01 for host 3. For example:
2019-07-02 00:00:35.668 Info: Merging 2 MB from /cams/q06data02/testQA2/Forests/testQA2-2.2/0002b4cd to /cams/q06data02/testQA2/Forests/testQA2-2.2/0002b4ce, timestamp=15620394303480170
2019-07-02 00:00:36.007 Info: Merged 3 MB at 9 MB/sec to /cams/q06data02/testQA2/Forests/test2-2.2/0002b4ce
2019-07-02 00:00:38.161 Info: Deleted 3 MB at 399 MB/sec /cams/q06data02/test2/Forests/test2-2.2/0002b4cd
Is it possible to include the hostname with each log entry and sort the combined output by time, something like:
host 1 : 2019-07-02 00:00:01 : Info Merging ....
host 2 : 2019-07-02 00:00:02 : Info Deleted 3 MB at 399 MB/sec ...
Log files are text files. You can parse and sort them like any other text file.
They can get very large (GB+), though, so simple methods may not perform well.
Plus you need to be able to parse the text into fields in order to sort by a field.
Since the start of every line is the timestamp, and that timestamp is in ISO format, which sorts lexically in the same order as chronologically, you can split the files into lines and sort them with basic string collation to get time-ordered output across multiple files.
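Outside of MarkLogic, that line-oriented approach is easy to sketch. The following Python is only an illustration of the merge-and-sort idea (the host-to-path mapping is a placeholder, and it assumes the files are small enough to read into memory):
logs = {
    "host1": "/path/to/host1/Logs/ErrorLog.txt",
    "host2": "/path/to/host2/Logs/ErrorLog.txt",
}

entries = []
for host, path in logs.items():
    with open(path, encoding="utf-8") as f:
        for line in f:
            # The leading "YYYY-MM-DD hh:mm:ss.mmm" stamp sorts lexically in time order.
            entries.append((line[:23], host, line.rstrip("\n")))

entries.sort()
for stamp, host, line in entries:
    print(host, ":", line)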
In V9 one can use the pair of xdmp:logfile-scan and xdmp:logmessage-parse to efficiently search over log files (remote as well as local) and then transform the results into text, XML (attribute or element format) or JSON.
One can also use the REST API for the same.
see: https://docs.marklogic.com/REST/GET/manage/v2/logs
Once the log files (ideally a selected subset of log messages small enough to manage) are converted to a structured format (XML, JSON, or text lines), sorting, searching, enriching, etc. are easy to perform.
For something much better take a look at Ops Director https://docs.marklogic.com/guide/opsdir/intro

Kibana: how to do a visualisation with a mathematical expression?

So I have 3 searches.
I'm interested in 3 lines of log (each line is a document, msg is a field)
S1 : msg = Sending to ELK
S2 : msg = ELK failure - rejected
S3 : msg = ELK failure due to us
Search 1 is a try; searches 2 and 3 are failures. I need a graph that displays this:
(CountS1 - (CountS2 + CountS3)) / (CountS1 / 100) on the Y axis and the date of the log on the X axis.
I know how to use the date of the logs on the X axis, but for the Y axis I can only do things such as count, average, sum, etc. of a single search.
Any ideas?
Thanks.
Yes, the best solution is to go to Scripted Fields and create the field that you need.
You can do it this way, for example:
(doc['CountS1'].value - (doc['CountS2'].value + doc['CountS3'].value)) / (doc['CountS1'].value / 100)
With this you have a new field that you can use just by referencing the name you gave it. For example, if you name the field Example1, it will appear among the available fields in your visualization.

How to connect data dictionaries to the unlabeled data

I'm working with some large government datasets from the Department of Transportation that are available as tab-delimited text files accompanied by data dictionaries. For example, the auto complaints file is a 670Mb file of unlabeled data (when unzipped), and comes with a dictionary. Here are some excerpts:
Last updated: April 24, 2014
FIELDS:
=======
Field# Name Type/Size Description
------ --------- --------- --------------------------------------
1 CMPLID CHAR(9) NHTSA'S INTERNAL UNIQUE SEQUENCE NUMBER.
IS AN UPDATEABLE FIELD,THUS DATA FOR A
GIVEN RECORD POTENTIALLY COULD CHANGE FROM
ONE DATA OUTPUT FILE TO THE NEXT.
2 ODINO CHAR(9) NHTSA'S INTERNAL REFERENCE NUMBER.
THIS NUMBER MAY BE REPEATED FOR
MULTIPLE COMPONENTS.
ALSO, IF LDATE IS PRIOR TO DEC 15, 2002,
THIS NUMBER MAY BE REPEATED FOR MULTIPLE
PRODUCTS OWNED BY THE SAME COMPLAINANT.
Some of the fields have foreign keys listed like so:
21 CMPL_TYPE CHAR(4) SOURCE OF COMPLAINT CODE:
CAG =CONSUMER ACTION GROUP
CON =FORWARDED FROM A CONGRESSIONAL OFFICE
DP =DEFECT PETITION,RESULT OF A DEFECT PETITION
EVOQ =HOTLINE VOQ
EWR =EARLY WARNING REPORTING
INS =INSURANCE COMPANY
IVOQ =NHTSA WEB SITE
LETR =CONSUMER LETTER
MAVQ =NHTSA MOBILE APP
MIVQ =NHTSA MOBILE APP
MVOQ =OPTICAL MARKED VOQ
RC =RECALL COMPLAINT,RESULT OF A RECALL INVESTIGATION
RP =RECALL PETITION,RESULT OF A RECALL PETITION
SVOQ =PORTABLE SAFETY COMPLAINT FORM (PDF)
VOQ =NHTSA VEHICLE OWNERS QUESTIONNAIRE
There are import instructions for Microsoft Access, which I don't have and would not use if I did. But I THINK this data dictionary was meant to be machine-readable.
My question: Is this data dictionary a standard format of some kind? I've tried to Google around, but it's hard to do so without the right terminology. I would like to import into R, though I'm flexible so long as it can be done programmatically.
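There may not be a formal standard behind that layout, but as a sketch of the "programmatic" route, the following Python scrapes the field names from a dictionary laid out like the excerpt above and uses them as column names for the tab-delimited data. The file names, the encoding, and the line-matching regex are all assumptions about this particular dictionary, not a general parser; in R the same idea works with read.delim() and a col.names vector.
import csv
import re

# Pull field names from dictionary lines that look like "  1 CMPLID CHAR(9) ...".
field_names = []
with open("cmpl_dictionary.txt", encoding="latin-1") as f:   # assumed file name and encoding
    for line in f:
        m = re.match(r"\s*\d+\s+([A-Z][A-Z0-9_]*)\s+[A-Z]+\(?", line)
        if m:
            field_names.append(m.group(1))

# Attach those names to the tab-delimited complaints file.
with open("FLAT_CMPL.txt", encoding="latin-1", newline="") as f:  # assumed file name
    for row in csv.reader(f, delimiter="\t"):
        record = dict(zip(field_names, row))
        print(record.get("CMPLID"), record.get("CMPL_TYPE"))
        break  # just the first record, to check the mapping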

SSRS: How to create 'Color Analytics Map' in Reporting Services 2008 / Report Builder 3.0

I'm setting up a map report in Reporting Services (Report Builder 3.0) running on SQL Server 2008 R2.
I've got a dataset like this:
CODE VAL
AF 11
AL 7
DZ 7
VI 15
AD 7
AO 6
AI 8
AQ 10
I've downloaded the shape (.shp) file from here: http://www.naturalearthdata.com/downloads/110m-cultural-vectors/. I imported this file to my report and in the dialog 'Select fields to match spatial data and analytical data' I've registered a match between the spatial data field 'ISO_A2' and my own 'CODE' field. I've checked manually that these columns match.
I've set my VAL-field as the field to 'visualize' in the Wizard.
I've changed the distribution-range in the 'Map Color Rules Properties' dialog as starting from 1 through 20.
When I preview my report, all the countries are rendered with no color (all backgrounds are grey, the same as the report background)! I tried a lot of different color setups but no luck. It seems like the spatial data and my test dataset don't get joined correctly!? :(
How can I ensure that the spatial data and my analytical data are joined correctly?
Am I doing something obviously wrong?
Any solutions/hints highly appreciated! ;)
Regards
Alex
