How to make spaces in a URI readable by SPARQL queries in R?

When writing SPARQL in Virtuoso, it is easy to handle a space within a URI by encoding the space as %20. However, with the SPARQL package in R, the same query fails. The SPARQL function has a curl_args argument, which I expected to work around this issue, but it does not. Here is my R script:
###Step 1: Building up the query
query <-"select ?instance {
?form a <URI name> .
?instance a <http://StemAddress/Where My Question Is> .
}"
###Step 2: Executing the query
qd <- SPARQL(endpoint,queryC,curl_args = curlPercentEncode("http://StemAddress/Where My Question Is", amp = TRUE, codes = " ", post.amp = TRUE))
#### In Step 1, what works in Virtuoso is:
select ?instance {
?form a <URI name> .
?instance a <http://StemAddress/Where%20My%20Question%20Is> .
}
#### But this just throws an error in the R environment.

Writing %20 is not an escape for a space. It really is the three characters %-2-0 in the URI. Encode != escape. Spaces in URIs are illegal.
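As a rough illustration of the point above, here is a minimal Python sketch (not the R SPARQL package) that percent-encodes the IRI before it is placed in the query text. The IRI and the `<URI name>` placeholder are taken from the question. This only helps if the data really stores the IRI with the literal characters %20, as the answer describes.

```python
from urllib.parse import quote

# IRI from the question; the spaces make it illegal as written.
raw_iri = "http://StemAddress/Where My Question Is"

# Percent-encode everything except the scheme separator and slashes.
encoded_iri = quote(raw_iri, safe=":/")

# Build the query text with the encoded IRI ("<URI name>" is the
# placeholder used in the question, kept as-is).
query = (
    "select ?instance {\n"
    "  ?form a <URI name> .\n"
    f"  ?instance a <{encoded_iri}> .\n"
    "}"
)
print(query)
```

The encoded form is a different sequence of characters from the one with spaces, so the query matches only triples whose IRI was stored in the encoded form.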

Related

Test Regex with new line string \n via PHPUnit

I created a PHP library that parses content via regex. One of the patterns is '#\n-{3,}#', which should match --- only when it is preceded by a line break.
I also have PHPUnit tests for all methods, and the tests for the newline regex always fail. I always test with assertSame().
I tried the following strings as input:
$input = PHP_EOL . '---';
$input = '<br>---';
$input = '
---'; // with break in code
As the expected value I set:
'<hr/>'
However, the test always fails with an error. When I pass these variables to the assertion, it fails and the new line is not parsed. Only without the \n in the regex, as in '#-{3,}#', do the tests pass without error.
It also works if the test input has some text before the new line, like
$input = "test\n---";
But I would also like to test input that starts with a new line, with no text before it.
On the front end the parsing works fine: the regex replaces the match in my Markdown file whenever the content contains a line break followed by three - characters.
How can I pass a string that starts with a new line as input to assertSame() in PHPUnit?
The problem is you are using single quotes in your pattern.
In PHP \n means "new line" in a string only if double quotes are used.
$input = PHP_EOL . '---';
preg_match("#\n-{3,}#", $input); // this will match
See https://3v4l.org/EpEGf

SQLite: isolating the file extension from a path

I need to isolate the file extension from a path in SQLite. I've read the post here (SQLite: How to select part of string?), which gets 99% there.
However, the solution:
select distinct replace(column_name, rtrim(column_name, replace(column_name, '.', '' ) ), '') from table_name;
fails if a file has no extension (i.e. no '.' in the filename), for which it should return an empty string. Is there any way to trap this please?
Note that the filename in this context is the part after the final '\'; it shouldn't search for '.' characters in the full path, as it currently does.
I think it should be possible to do it using further nested rtrims and replaces.
Thanks.
Yes, you can do it like this:
1) create a scalar function called "extension" in QtScript in SQLiteStudio
2) The code is as follows:
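// '\u005C' is a backslash: look for a '.' only in the filename part of the path
if (arguments[0].substring(arguments[0].lastIndexOf('\u005C')).lastIndexOf('.') == -1)
{
    // No '.' in the filename, so there is no extension
    return "";
}
else
{
    // Return everything from the last '.' onwards, including the dot
    return arguments[0].substring(arguments[0].lastIndexOf('.'));
}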
3) Then, in the SQL query editor you can use
select distinct extension(PATH) from DATA
... to itemise the distinct file extensions from the column called PATH in the table called DATA.
Note that the PATH field must contain a backslash ('\') in this implementation - i.e. it must be a full path.
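For readers not using SQLiteStudio, the same idea can be sketched with Python's standard sqlite3 module, which also allows registering a scalar function. This is an illustrative sketch only: the sample paths are made up, and the table and column names DATA and PATH are reused from the answer above.

```python
import sqlite3

def extension(path):
    """Return the extension (including the dot) of the filename, or ''."""
    if path is None:
        return ""
    # Only look at the part after the final backslash.
    filename = path.rsplit("\\", 1)[-1]
    dot = filename.rfind(".")
    return filename[dot:] if dot != -1 else ""

conn = sqlite3.connect(":memory:")
# Register the Python function as a one-argument SQL scalar function.
conn.create_function("extension", 1, extension)

conn.execute("CREATE TABLE DATA (PATH TEXT)")
conn.executemany(
    "INSERT INTO DATA (PATH) VALUES (?)",
    [
        (r"C:\docs\report.txt",),      # hypothetical sample paths
        (r"C:\my.folder\README",),     # dot in a folder name, no extension
        (r"C:\pics\photo.tar.gz",),
    ],
)

rows = conn.execute("SELECT DISTINCT extension(PATH) FROM DATA ORDER BY 1").fetchall()
extensions = [row[0] for row in rows]
print(extensions)
```

Because the function splits on the last backslash first, a dot inside a folder name is not mistaken for the start of an extension.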

Test for exact string in testthat

I'd like to test that one of my functions gives a particular message (or warning, or error).
good <- function() message("Hello")
bad <- function() message("Hello!!!!!")
I'd like the first expectation to succeed and the second to fail.
library(testthat)
expect_message(good(), "Hello", fixed=TRUE)
expect_message(bad(), "Hello", fixed=TRUE)
Unfortunately, both of them pass at the moment.
For clarification: this is meant to be a minimal example, rather than the exact messages I'm testing against. If possible I'd like to avoid adding complexity (and probably errors) to my test scripts by needing to come up with an appropriate regex for every new message I want to test.
You can use the ^ and $ anchors to indicate that the string must begin and end with your pattern.
expect_message(good(), "^Hello\\n$")
expect_message(bad(), "^Hello\\n$")
#Error: bad() does not match '^Hello\n$'. Actual value: "Hello!!!!!\n"
The \\n is needed to match the new line that message adds.
For warnings it's a little simpler, since there's no newline:
expect_warning(warning("Hello"), "^Hello$")
For errors it's a little harder:
good_stop <- function() stop("Hello")
expect_error(good_stop(), "^Error in good_stop\\(\\) : Hello\n$")
Note that any regex metacharacters, i.e. . \ | ( ) [ { ^ $ * + ?, will need to be escaped.
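The combination of anchoring and escaping can be sketched outside R as well. The following Python snippet is only an analogy to the testthat calls above, not testthat itself. The helper names are hypothetical, and the trailing newline mirrors the one that message() appends.

```python
import re

def exact_message_pattern(text):
    # Escape regex metacharacters so the text is matched literally,
    # then require the trailing newline that R's message() appends.
    return re.escape(text) + r"\n"

def is_exact_message(expected, actual):
    # fullmatch anchors the pattern at both ends, like ^...$ above.
    return re.fullmatch(exact_message_pattern(expected), actual) is not None

good_ok = is_exact_message("Hello", "Hello\n")
bad_ok = is_exact_message("Hello", "Hello!!!!!\n")
# Metacharacters such as $ ( ) . are matched literally after escaping.
literal_ok = is_exact_message("Cost: $5 (approx.)", "Cost: $5 (approx.)\n")
print(good_ok, bad_ok, literal_ok)
```

Escaping the expected text programmatically avoids having to hand-write a regex for every new message.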
Alternatively, borrowing from Mr. Flick's answer here, you could convert the message into a string and then use expect_true, expect_identical, etc.
messageToText <- function(expr) {
con <- textConnection("messages", "w")
sink(con, type="message")
eval(expr)
sink(NULL, type="message")
close(con)
messages
}
expect_identical(messageToText(good()), "Hello")
expect_identical(messageToText(bad()), "Hello")
#Error: messageToText(bad()) is not identical to "Hello". Differences: 1 string mismatch
Your regex matches "Hello" in both cases, so it doesn't raise an error. You'll need to set up word boundaries \\b on both sides. That would suffice if there were no punctuation or spaces involved. To rule those out too, you'll need to add [^\\s ^\\w]
library(testthat)
expect_message(good(), "\\b^Hello[^\\s ^\\w]\\b")
expect_message(bad(), "\\b^Hello[^\\s ^\\w]\\b")
## Error: bad() does not match '\b^Hello[^\s ^\w]\b'. Actual value: "Hello!!!!!\n"

read.csv.sql filter for fields with columns

I haven't been able to solve this one using the answers in this question or from the sqldf FAQ's.
LOC_NAME,BIRTH_DTTM,MOM_PAT_MRN_ID,EMPI,MOM_PAT_NAME,MOM_HOSP_ADMSN_TIME,MOM_HOSP_DISCH_TIME,DEL_PROV_NAME,ATTND_PROV_NAME,DELIVERY_TYPE,PRIM.REPT,COUNT_OF_BABIES,CHILD_PED_GEST_AGE_NUM,REASON_FOR_DEL,REASON_DEL_COM,INDUCT_METHOD,INDUCT_COM,AUGMENTATION
HOSPITAL,1/1/2000 10:00,abc,Eabc,"Surname1, Given1",1/1/2000 10:00,1/3/2000 10:00,"Doctor, First","Doctor, First","C-Section, Low Transverse",Repeat,1,38,,,1) None,,1) None
HOSPITAL,1/2/2000 11:00,def,Edef,"Surname2, Given2",1/2/2000 11:00,1/5/2000 11:00,"Doctor2, First2","Doctor2, First2","C-Section, Low Transverse",Primary,1,36,Ruptured Membranes;Labor;Other (see comment),"PPROM, Preterm labor",1) None,,1) None
HOSPITAL,1/3/2000 12:00,ghi,Eghi,"Surname3, Given3",1/3/2000 12:00,1/6/2000 12:00,"Doctor3, First3","Doctor3, First3","C-Section, Low Transverse",Repeat,1,31,Other (see comment),,1) None,,1) None
HOSPITAL,1/4/2000 13:00,jkl,Ejkl,"Surname4, Given4",1/4/2000 13:00,1/7/2000 13:00,,"Doctor4, First4","Vaginal, Spontaneous Delivery",,1,28,Other (see comment),Fetal anomaly,1) oxytocin (Pitocin),,
To read in the data, I have tried:
read.csv.sql(file)
read.csv.sql(file, filter = 'tr.exe -d ^" ')
read.csv.sql(file, filter = list('gawk -f prog', prog = '{ gsub(/"/, ""); print }'))
read.csv.sql(file,
filter = "perl -e 's{(\"[^\",]+),([^\"]+\")}{$_= $&, s/,/_/g, $_}eg'")
I'm working in R 3.0.0 with RStudio Server on Ubuntu.
Unfortunately, changing the delimiter isn't an option (nor would it be very effective for some of the files I need to query). Some of my files are pathology reports, so no matter what delimiter I use, I'm going to run into this problem.
Any hints on what I'm missing to get this to read in?
Try csvfix as in sqldf FAQ #13 but use the write_dsv's default | symbol rather than ; since there are semicolons in your file:
read.csv.sql("myfile.csv", sep = "|", filter = "csvfix write_dsv")
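To show roughly what such a filter has to do, here is a small Python sketch using the standard csv module: it parses the quoted fields properly and re-emits each row with | as the separator. It is an analogy, not a description of how csvfix is implemented. The function name and the two-line sample are hypothetical, and the sketch assumes no field contains a | character.

```python
import csv
import io

def csv_to_pipe_delimited(text):
    """Parse quoted CSV text and re-emit it with '|' as the separator."""
    reader = csv.reader(io.StringIO(text))
    lines = []
    for row in reader:
        for field in row:
            # Assumption: '|' never appears in the data; fail loudly if it does.
            if "|" in field:
                raise ValueError("field contains the output delimiter: " + field)
        lines.append("|".join(row))
    return "\n".join(lines) + "\n"

sample = 'MOM_PAT_NAME,DELIVERY_TYPE,COUNT_OF_BABIES\n"Surname1, Given1","C-Section, Low Transverse",1\n'
converted = csv_to_pipe_delimited(sample)
print(converted)
```

After conversion, the commas inside the former quoted fields are plain data, so a reader that splits on | no longer miscounts columns.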

Find word (not containing substrings) in comma separated string

I'm using a LINQ query where I do something like this:
viewModel.REGISTRATIONGRPS = (From a In db.TABLEA
Select New SubViewModel With {
.SOMEVALUE1 = a.SOMEVALUE1,
...
...
.SOMEVALUE2 = If(commaseparatedstring.Contains(a.SOMEVALUE1), True, False)
}).ToList()
Now my problem is that this doesn't search for whole words but for substrings. For example:
commaseparatedstring = "EWM,KI,KP"
SOMEVALUE1 = "EW"
It returns True because "EW" is contained in "EWM".
What I need is to find whole words (not substrings) in the comma-separated string.
Option 1: Regular Expressions
Regex.IsMatch(commaseparatedstring, @"\b" + Regex.Escape(a.SOMEVALUE1) + @"\b")
The \b parts are called "word boundaries" and tell the regex engine that you are looking for a "full word". The Regex.Escape(...) ensures that the regex engine will not try to interpret "special characters" in the text you are trying to match. For example, if you are trying to match "one+two", the Regex.Escape method will return "one\+two".
Also, be sure to import the System.Text.RegularExpressions namespace at the top of your code file.
See Regex.IsMatch Method (String, String) on MSDN for more information.
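The word-boundary idea is not specific to .NET. As a minimal sketch in Python, using the values from the question (the function name is hypothetical):

```python
import re

def contains_whole_word(comma_separated, word):
    # \b marks a word boundary; re.escape makes the word match literally.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, comma_separated) is not None

values = "EWM,KI,KP"
ew_found = contains_whole_word(values, "EW")   # "EW" is only part of "EWM"
ki_found = contains_whole_word(values, "KI")
print(ew_found, ki_found)
```

Note that \b relies on the searched value starting and ending with word characters (letters, digits, underscore), so this sketch assumes values like the ones in the question.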
Option 2: Split the String
You could also try splitting the string which would be a bit simpler, though probably less efficient.
commaseparatedstring.Split(new Char[] { ',' }).Contains( a.SOMEVALUE1 )
What about:
- separating the commaseparatedstring by comma
- calling equals() on each substring instead of contains() on whole thing?
.SOMEVALUE2 = If(commaseparatedstring.Split(',').Contains(a.SOMEVALUE1), True, False)
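The split-and-compare approach can be sketched in Python as follows, again with the values from the question. The function name is hypothetical, and stripping whitespace around each item is an added assumption that the original code does not make.

```python
def contains_item(comma_separated, word):
    # Split into items and compare each one for equality,
    # instead of searching the whole string for a substring.
    items = [item.strip() for item in comma_separated.split(",")]
    return word in items

values = "EWM,KI,KP"
ew_is_item = contains_item(values, "EW")   # substring of "EWM", but not an item
kp_is_item = contains_item(values, "KP")
print(ew_is_item, kp_is_item)
```

Equality on the split items avoids the substring false positive without needing any regex escaping.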
