awk; comparing time -- puzzling behavior of strftime - datetime

Use GNU-Awk (gawk) in UNXUTILS on a Win-7 PC. This question deals with strftime(.) and a comparison of times using that.
Have followed the discussion on how to compare strftime values because I have a similar problem. In financial market data I have a date-time string ($25) given as "03-APR-2006 09:55:25" (so time = substr($25, 13, 8)) and my objective is to count records (especially order cancellation records) that come in after 14:00:00 (2 pm).
In my code I have a line which reads
{ if ($3==3) {
{ ++CK_NR}
{ ++CO_NR[$4, substr($25, 1, 11)] }
{ if (strftime(substr($25, 13, 8)) > ("14:00:00"))\
{
{ ++CK_LATE_NR }
{ ++CO_LATE_NR[$4, substr($25, 1, 11)] }
}
}
}}
Just realized that the inequality I used -- if (strftime(substr($25, 13, 8)) > ("14:00:00")) -- has only a string in the RHS, and I didn't make this string the argument of another strftime(.). What's puzzling is that it did NOT give me an error.
Am concerned that while it has not generated any errors, and the code has run, perhaps it is giving me something other than what I intended with the code. In a Command Prompt Window, doing
gawk "{ print (strftime(\"09:55:25\") > (\"14:00:00\")) }"
does yield "0" and
gawk "{ print (strftime(\"09:55:25\") < (\"14:00:00\")) }"
does yield "1". The GNU Awk Manual (http://www.gnu.org/software/gawk/manual/gawk.html#Time-Functions) yields little information on what is required for a meaningful comparison. Just now tried the above deleting the "strftime" even from the LHS, as under
gawk "{ print ((\"09:55:25\") > (\"14:00:00\")) }"
and
gawk "{ print ((\"09:55:25\") < (\"14:00:00\")) }"
and got the same results. I want to be sure that I am getting the correct True/False result because GAWK is comparing time, and not for some other internal rule it uses in making string comparisons (causing the limited test to be only a coincidence). Can someone resolve this puzzle? Thanks. Best,
Murgie

You seem to be using string comparisons. Strings are compared by comparing the first character of each, then the second character of each, and so on. Thus, "10" is less than "9", since the ASCII value of "1" is less than that of "9"; see http://www.gnu.org/software/gawk/manual/gawk.html#Variable-Typing and http://en.wikibooks.org/wiki/An_Awk_Primer/Search_Patterns_%282%29#Logic_Operators
If you want to do a numeric comparison for strings of the form "xx:yy:zz" (for instance "10:22:45"), you can convert the string to a number first by stripping the colons with gsub(/:/,"",str), which modifies str in place, and then comparing str+0 against 140000.
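To illustrate the difference between the two kinds of comparison, here is a minimal sketch in Python rather than awk. The helper name hms_to_seconds is made up for this example, and it assumes the time part is always a zero-padded "HH:MM:SS" string, as in the sample "03-APR-2006 09:55:25". Python's character-by-character string comparison is only standing in for awk's here.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an "HH:MM:SS" string to seconds since midnight (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


cutoff = "14:00:00"
stamp = "03-APR-2006 09:55:25"
time_part = stamp[12:20]  # same characters as awk's substr($25, 13, 8)

# Strings are compared character by character, so "10" sorts before "9".
ten_before_nine = "10" < "9"

# For zero-padded, fixed-width times the string order matches the clock order.
late_by_string = time_part > cutoff

# A numeric comparison does not depend on the padding.
late_by_number = hms_to_seconds(time_part) > hms_to_seconds(cutoff)

# Without the leading zero the string comparison goes wrong: "9" sorts after "1".
unpadded_late_by_string = "9:55:25" > cutoff
unpadded_late_by_number = hms_to_seconds("9:55:25") > hms_to_seconds(cutoff)

print(ten_before_nine, late_by_string, late_by_number)
print(unpadded_late_by_string, unpadded_late_by_number)
```

With zero-padded times the string and numeric comparisons agree, which would explain why the limited test looked right. They diverge as soon as a value loses its leading zero.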

Related

Comparison of Hex Numbers in Tcl

I am trying to compare the two following hex numbers in a TCL Script:
0x00001111
32'h00001111
They should be equal to each other, but I am having trouble comparing them because of the difference in formatting.
I have tried using "scan" command to convert, but have not gotten the correct output I am looking for.
Code snippet:
set read1 0x00001111
set read2 32'h00001111
set new_read [scan $read2 %x]
if ($read1 == $read2) {
puts "Two values equal"
}
This code does not work and sets new_read to 4369. Any help is appreciated.
You want to change the format specifier passed to scan to something like:
if {[scan $read2 "%d'h%x" bw new_read] == 2} {
# on success
} else {
# on failure
}
This way, you will be able to compare [scan $read1 %x] and $new_read using == and the other comparison operators.
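The same idea, splitting off the width prefix and parsing only the hex digits, can be sketched in Python for comparison. This is an illustration, not Tcl: the function name parse_hex is made up, and it assumes the second format is always "<width>'h<hex digits>".

```python
import re


def parse_hex(text: str) -> int:
    """Parse "0x..." or "<width>'h..." into an integer (hypothetical helper)."""
    sized = re.fullmatch(r"(\d+)'h([0-9a-fA-F]+)", text)
    if sized:
        # Group 1 is the bit width (ignored here), group 2 holds the hex digits.
        return int(sized.group(2), 16)
    # int() with base 16 accepts an optional 0x prefix.
    return int(text, 16)


read1 = "0x00001111"
read2 = "32'h00001111"

values_equal = parse_hex(read1) == parse_hex(read2)
print(parse_hex(read1), parse_hex(read2), values_equal)
```

Both strings parse to 4369, so the comparison succeeds once each value has been converted to a number.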

r json mongodb query $in operator syntax error due to double quotes?

I'm building a json query to pass to a mongodb database in R.
In one scenario, I have a vector of dates and I want to query the database to return all records which have a date in the relevant field that matches a date in my vector of dates.
The second scenario is the same as the first, but this time I have a vector of character strings (IDs) and need to return all the records with matching IDs.
I understood the correct way to do this in a json query is to use the $in operator, and then put my vector in an array.
However, when I pass the query to my mongodb database, the exportLogId returns NULL. I'm quite sure that the problem is something to do with how I am representing the $in operator in the final query, since I have very similarly structured queries without the $in operator and they are all working. If I look for just one of my target dates or character strings, I get the desired result.
I followed the mongodb manual here to construct my query, and the only issue I can see is that the $in operator in the output of jsonlite::toJSON() is enclosed in double quotes; whereas I think it might need to be in single quotes (or no quotes at all, but I don't know how to write the syntax for that).
I'm creating my query in two steps:
Create the query as a series of nested lists
Convert the list object to json with jsonlite::toJSON()
Here is my code:
# Load libraries:
library(jsonlite)
# Create list of example dates to query in mongodb format:
sampledates <- c("2022-08-11T00:00:00.000Z",
"2022-08-15T00:00:00.000Z",
"2022-08-16T00:00:00.000Z",
"2022-08-17T00:00:00.000Z",
"2022-08-19T00:00:00.000Z")
# Create query as a list object:
query_list_l <- list(filter =
# Add where clause:
list(where =
# Filter results by list of sample dates:
list(dateSampleTaken = list('$in' = sampledates),
# Define format of column names and values:
useDbColumns = "true",
dontTranslateValues = "true",
jsonReplaceUndefinedWithNull = "true"),
# Define columns to return:
fields = c("id",
"updatedAt",
"person.visualId",
"labName",
"sampleIdentifier",
"dateSampleTaken",
"sequence.hasSequence")))
# Convert list object to JSON:
query_json = jsonlite::toJSON(x = query_list_l,
pretty = TRUE,
auto_unbox = TRUE)
The JSON query now looks like this:
> query_json
{
"filter": {
"where": {
"dateSampleTaken": {
"$in": ["2022-08-11T00:00:00.000Z", "2022-08-15T00:00:00.000Z", "2022-08-16T00:00:00.000Z", "2022-08-17T00:00:00.000Z", "2022-08-19T00:00:00.000Z"]
},
"useDbColumns": "true",
"dontTranslateValues": "true",
"jsonReplaceUndefinedWithNull": "true"
},
"fields": ["id", "updatedAt", "person.visualId", "labName", "sampleIdentifier", "dateSampleTaken", "sequence.hasSequence"]
}
}
As you can see, $in is now enclosed in double quotes, even though I put it in single quotes when I created the query as a list object. I have tried replacing with sprintf() but that just adds a lot of backslashes to my query. I also tried:
query_fixed <- gsub(pattern = "\\"\\$\\in\\"",
replacement = "\\'$in\\'",
x = query_json)
... but this fails with an error.
I would be very grateful to know if:
The syntax problem that is preventing $in from working is actually the double quotes?
If double quotes is the problem, how do I replace them with single quotes without messing up the JSON format?
UPDATE:
The issue seems to occur when R is passing the query to the database, but I still can't work out exactly why.
If I try the query out in loopback explorer in the database, it works and using the export log ID produced, I can then fetch the results with httr::GET() in R. Example query results are shown below (sorry for the hashes - the main point is you can see the format of the returned values):
[1] "[{\"_id\":\"e59953b6-a106-4b69-9e25-1c54eef5264a\",\"updatedAt\":\"2022-09-12T20:08:39.554Z\",\"dateSampleTaken\":\"2022-08-16T00:00:00.000Z\",\"labName\":\"LNG_REFERENCE_DATA_CATEGORY_LAB_NAME_LAB_A\",\"sampleIdentifier\":\"LS0044-SCV2-PCR\",\"sequence\":{\"hasSequence\":false},\"person\":{\"visualId\":\"C-2022-0002\"}},{\"_id\":\"af5cd9cc-4813-4194-b60b-7d130bae47bc\",\"updatedAt\":\"2022-09-12T20:11:07.467Z\",\"dateSampleTaken\":\"2022-08-17T00:00:00.000Z\",\"labName\":\"LNG_REFERENCE_DATA_CATEGORY_LAB_NAME_LAB_A\",\"sampleIdentifier\":\"LS0061-SCV2-PCR\",\"sequence\":{\"hasSequence\":false},\"person\":{\"visualId\":\"C-2022-0003\"}},{\"_id\":\"b5930079-8d57-43a8-85c0-c95f7e0338d9\",\"updatedAt\":\"2022-09-12T20:13:54.378Z\",\"dateSampleTaken\":\"2022-08-16T00:00:00.000Z\",\"labName\":\"LNG_REFERENCE_DATA_CATEGORY_LAB_NAME_LAB_A\",\"sampleIdentifier\":\"LS0043-SCV2-PCR\",\"sequence\":{\"hasSequence\":false},\"person\":{\"visualId\":\"C-2022-0004\"}}]"
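On the quoting question itself: JSON only allows double-quoted strings for keys and values, so a serializer will always emit "$in" with double quotes. The sketch below uses Python's standard json module instead of jsonlite to show this, with a shortened copy of the query from the question. It says nothing about how the database API in the question handles the request.

```python
import json

# Shortened version of the query from the question.
sample_dates = ["2022-08-11T00:00:00.000Z", "2022-08-15T00:00:00.000Z"]
query = {"filter": {"where": {"dateSampleTaken": {"$in": sample_dates}}}}

# Serializing always produces double-quoted keys, including "$in".
query_json = json.dumps(query)
print(query_json)

# A single-quoted key is not valid JSON and is rejected by the parser.
try:
    json.loads("{'$in': []}")
    single_quotes_accepted = True
except json.JSONDecodeError:
    single_quotes_accepted = False

# The serialized text parses back to the original structure.
round_trip_ok = json.loads(query_json) == query
print(single_quotes_accepted, round_trip_ok)
```

If this holds for the API in question as well, swapping the double quotes for single quotes would make the query invalid JSON rather than fix it.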

Elixir integer list to string causes (UnicodeConversionError) invalid code point

The problem
I need to create a string interpolating a list of integers in it.
"""
SomeQuery {
someQuery(articleIds: #{inspect article_ids}) {
edges {
node {
id
}
}
}
}
"""
Failing example
For example, the list [725553234] makes the example above fail:
article_ids = [725553234]
"""
SomeQuery {
someQuery(articleIds: #{article_ids}) {
edges {
node {
id
}
}
}
}
"""
** (exit) an exception was raised:
** (UnicodeConversionError) invalid code point 725553234
(elixir) lib/list.ex:839: List.to_string/1
(commsapp_api) lib/my_project/client.ex:70: CommsappApi.News.Clients.CommunicationMs.Client.articles_feed/3
System
Erlang/OTP 20 [erts-9.3] [source] [64-bit] [smp:8:8] [ds:8:8:10]
[async-threads:10] [hipe] [kernel-poll:false] [dtrace]
Elixir 1.6.3 (compiled with OTP 20)
Tried solutions
I tried the following:
Using inspect is not working: articleIds: #{inspect(article_ids)}
Using IO.inspect with the :char_lists opt set to :as_lists: IO.inspect(article_ids, char_lists: :as_lists)
Trying to join the integer list as string with: articleIds: [#{Enum.join(article_ids, ", ")}]
Interpolating the integers parsed to string with: Enum.map(article_ids, &Integer.to_string/1) |> Enum.join(", ")
I tried using a single line instead of a multiline string, not working
Many things I don't remember after trying different solutions... >.<
Guesses
The problem comes when using the brackets in the string: Elixir treats the interpolation as a list and raises the error because it cannot find the code points.
Ideas?
Thanks in advance!
In the documentation on strings, binaries and charlists, a charlist is defined as a list of code points, i.e. integers. The code
article_ids = [725553234]
"#{article_ids}"
attempts to print the character whose code point is 725553234. This code point is not defined, so you get the error. Replace 725553234 with 65 and you should get an A character.
To interpolate your list of integers, you may want to do things like this:
iex(5)> a=[65, 66, 67]
'ABC'
iex(6)> "#{Enum.map(a, fn(c) -> Integer.to_string(c)<>" " end)}"
"65 66 67 "
BTW, note that iex displays the list [65, 66, 67] as 'ABC'.
Use Enum.join to get a single string with the integers separated by ",":
Enum.join(list, ",")
or use Enum.map to get a list in which every integer has been converted to a string:
Enum.map(list, &Integer.to_string/1)
For example:
list = [1, 2, 3, 4]
Enum.join(list, ",") # => "1,2,3,4"
Enum.map(list, &Integer.to_string/1) # => ["1", "2", "3", "4"]
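The two points made in the answers above can also be illustrated outside Elixir: a list of integers can be read as code points, and the integers have to be turned into text before they are joined. The following is a rough Python analogue, not Elixir code. The variable names are made up, and chr() stands in for the code point lookup.

```python
article_ids = [725553234]

# 725553234 is far beyond the largest Unicode code point (0x10FFFF),
# so treating it as a character fails.
try:
    chr(article_ids[0])
    is_valid_code_point = True
except (ValueError, OverflowError):
    is_valid_code_point = False

# Small integers such as 65, 66, 67 are valid code points and map to "ABC".
as_characters = "".join(chr(code) for code in [65, 66, 67])

# Converting each integer to its decimal text first avoids the problem.
joined_ids = ", ".join(str(article_id) for article_id in article_ids)
query_fragment = f"someQuery(articleIds: [{joined_ids}])"

print(is_valid_code_point, as_characters)
print(query_fragment)
```

The last line produces the kind of fragment the question is after, with the brackets written literally in the template and only the joined digits interpolated.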

Test for exact string in testthat

I'd like to test that one of my functions gives a particular message (or warning, or error).
good <- function() message("Hello")
bad <- function() message("Hello!!!!!")
I'd like the first expectation to succeed and the second to fail.
library(testthat)
expect_message(good(), "Hello", fixed=TRUE)
expect_message(bad(), "Hello", fixed=TRUE)
Unfortunately, both of them pass at the moment.
For clarification: this is meant to be a minimal example, rather than the exact messages I'm testing against. If possible I'd like to avoid adding complexity (and probably errors) to my test scripts by needing to come up with an appropriate regex for every new message I want to test.
You can use ^ and $ anchors to indicate that the string must begin and end with your pattern.
expect_message(good(), "^Hello\\n$")
expect_message(bad(), "^Hello\\n$")
#Error: bad() does not match '^Hello\n$'. Actual value: "Hello!!!!!\n"
The \\n is needed to match the new line that message adds.
For warnings it's a little simpler, since there's no newline:
expect_warning(warning("Hello"), "^Hello$")
For errors it's a little harder:
good_stop <- function() stop("Hello")
expect_error(good_stop(), "^Error in good_stop\\(\\) : Hello\n$")
Note that any regex metacharacters, i.e. . \ | ( ) [ { ^ $ * + ?, will need to be escaped.
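The difference between an unanchored and an anchored match, and the need to escape metacharacters, can be sketched in Python's re module. This only illustrates the regex behaviour and is not testthat code. The helper name matches_exactly is made up, and re.escape is used so the expected text does not have to be escaped by hand.

```python
import re


def matches_exactly(expected: str, actual: str) -> bool:
    """True only if actual is exactly the literal text expected (hypothetical helper)."""
    # re.escape neutralises metacharacters; fullmatch anchors at both ends.
    return re.fullmatch(re.escape(expected), actual) is not None


# An unanchored search finds "Hello" inside the longer message as well.
unanchored_good = re.search("Hello", "Hello") is not None
unanchored_bad = re.search("Hello", "Hello!!!!!") is not None

# An anchored, escaped match accepts only the exact text.
exact_good = matches_exactly("Hello", "Hello")
exact_bad = matches_exactly("Hello", "Hello!!!!!")

# Metacharacters in the expected text are treated literally after escaping.
literal_dot = matches_exactly("a.b (c)", "a.b (c)")
dot_not_wildcard = matches_exactly("a.b (c)", "axb (c)")

print(unanchored_good, unanchored_bad, exact_good, exact_bad)
print(literal_dot, dot_not_wildcard)
```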
Alternatively, borrowing from Mr. Flick's answer here, you could convert the message into a string and then use expect_true, expect_identical, etc.
messageToText <- function(expr) {
con <- textConnection("messages", "w")
sink(con, type="message")
eval(expr)
sink(NULL, type="message")
close(con)
messages
}
expect_identical(messageToText(good()), "Hello")
expect_identical(messageToText(bad()), "Hello")
#Error: messageToText(bad()) is not identical to "Hello". Differences: 1 string mismatch
Your regex matches "Hello" in both cases, thus it doesn't return an error. You'll need to set up word boundaries \\b on both sides. That would suffice if you didn't use punctuation or spaces here. In order to exclude those too, you'll need to add [^\\s ^\\w]
library(testthat)
expect_message(good(), "\\b^Hello[^\\s ^\\w]\\b")
expect_message(bad(), "\\b^Hello[^\\s ^\\w]\\b")
## Error: bad() does not match '\b^Hello[^\s ^\w]\b'. Actual value: "Hello!!!!!\n"

Why are string lengths different between JavaScript and VB.NET (posted back via ASP.NET form)?

When I trim a multi-line textarea in JavaScript to a certain length (JS: .substring(0, x)), and that field is then posted back, checking the length in VB.NET will still find a length greater than the trim length from JavaScript (VB: .Length > x).
I have already determined this was a problem with line breaks, but I wanted to make sure no one else had to spend so long finding the answer (apparently it also applies to some implementations of JSP).
Somewhere in the whole ASP.NET scheme of things, a multi-line value is being massaged from the land of "\n" (vbLf) line breaks into the land of "\r\n" (vbCrLf) line breaks. This difference in line breaks is the reason the lengths do not agree. Here is the simple way of addressing it in VB.NET (though a regex could probably do it too):
SomeString = SomeString.Replace(vbCrLf, vbLf)
Handling it in VB.NET opens me up to potential duplication and would still leave it easy to miss this logic when someone adds another textarea; handling it in JavaScript could do the same thing. Is there some way to keep VB.NET/ASP.NET from handling line breaks this way, or is there some better way of making this a non-issue? The answer to this best-practices question would definitely be the correct answer to this question.
The culprit seems to be an internal type in System.Web.dll: System.Web.HttpMultipartContentTemplateParser. Using Reflector, I found this code:
private bool GetNextLine()
{
int num = this._pos;
this._lineStart = -1;
while (num < this._length)
{
if (this._data[num] == 10)
{
this._lineStart = this._pos;
this._lineLength = num - this._pos;
this._pos = num + 1;
if ((this._lineLength > 0) && (this._data[num - 1] == 13))
{
this._lineLength--;
}
break;
}
if (++num == this._length)
{
this._lineStart = this._pos;
this._lineLength = num - this._pos;
this._pos = this._length;
}
}
return (this._lineStart >= 0);
}
Note some of the magic numbers, especially 10 and 13. These are vbLf and vbCr. It seems to me that this is processing the raw bytes that come in from the post, and 'a line' is considered to be anything ending with vbLf (10).
As the raw bytes are parsed (see the ParsePartData method, too), the complexities of vbCr and vbLf are being cleaned out.
Ultimately, then, I think it's safe to just replace CrLf with Lf again.
Welcome to the land of Unix line endings versus Windows OS based line endings.
Doesn't IIS have some sort of HTTP request filter or even a configuration option to not modify the request as it comes in? That would be the best answer.
Otherwise, search and replace is your best answer.
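The length mismatch and the search-and-replace fix can be sketched as follows. Python is used only for illustration: the function name normalize_newlines and the sample text are made up, and the assumption is that the posted value differs from the browser value only in its line breaks.

```python
def normalize_newlines(text: str) -> str:
    """Turn Windows-style CRLF line breaks back into single LF characters."""
    return text.replace("\r\n", "\n")


# What JavaScript measured in the textarea: one character per line break.
browser_value = "line one\nline two\nline three"

# What arrives on the server after the post: two characters per line break.
posted_value = "line one\r\nline two\r\nline three"

# Each line break adds one extra character on the server side.
extra_characters = len(posted_value) - len(browser_value)

# After normalizing, the lengths agree again.
normalized = normalize_newlines(posted_value)
lengths_match = len(normalized) == len(browser_value)

print(extra_characters, lengths_match)
```

Putting the replacement in one shared helper, rather than next to each textarea, is one way to reduce the duplication concern raised in the question.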

Resources