I have a template class that grabs HTML and basically returns html to the caller. How do I test the caller using PHP Unit? Do I just assertTrue(is_string(call_function))? It seems like a stupid test, and I thought I may be testing it improperly.
Is the returned HTML supposed to be well-formed? If so you could validate it.
And/or if there is always supposed to be a certain node, or string of text, present you could check for its existence. Using strpos, regexes, or a proper DOM parser.
This StackOverflow question gives you some ideas for ways to parse and query your HTML: How do you parse and process HTML/XML in PHP?
More generally, the way I usually approach how to test a function that returns a string is to use:
$html=call_function();
$this->assertEquals("dummy",$html);
Then it fails, but tells me the correct output, so I paste that in:
$html=call_function();
$expected=<<<EOD
<html>
...
</html>
EOD;
$this->assertEquals($expected,$html);
If it fails again I then study the differences between the two correct answers I have. If this is a good unit test should they really even be different? Do I want to use a mock object to replace some uncontrollable aspect of the system? (E.g. if the HTML it is returning is google search results, then maybe I want a mock object to simulate calling google, but always return exactly the same search results page.)
If the only differences are timestamps I might use regexes to hunt-and-destroy them, to give me a string that should always be the same, e.g.
$html=preg_replace('/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/','[TIMESTAMP]',$html);
ADDITION
If the HTML string is very big, one alternative is to use md5() to reduce it to a short string. This will still warn you when something breaks, but the (big) downside is when it breaks you won't know where. If you are concerned about that then it is better to use the DOM approach (or its poor cousin, regexes) to just cherry-pick a few key parts of the HTML to test.
Related
I have been trying for days to move forward with this little code for getting the headers and the links of the news from a journal website.
using HTTP
function website_parser(website_url::AbstractString)
r = readstring(get(website_url))
splitted = split(r, "\n")
end
website_parser("https://www.nature.com/news/newsandviews")
The problem is that I could not figure out how to proceed on once I got the text from the website. How can I retrieve specific elements (as header and link of the news in this case)?
Any help is very much appreciated, thank you
You need some kind of HTML parsing. For only extracting the header, you probably can get away with regex, which are built in.
If it gets more complicated than that, regular expressions don't generalize, and you should use a full-fledged HTML parser. Gumbo.jl seems to be state of the art in Julia and has a rather simple interface.
In the latter case, it's unneccessary to split the document; in the former, it at least makes things more complicated, since then you have to think about line breaks. So, better parse first, then split.
Specific elements can be extracted using the library Cascadia git repo
for instance, the class attribute for elements in the HTML page can be extracted via qs = eachmatch(Selector(".classID"),h.root) so that all the class elements such as <div class="classID"> get selected/extracted for the returned query string (qs).
Is there any way of matching non JSON bodies (either XML, byte or whatever). Looking for the Python solution, however will appreciate any ideas behind that (even monkeypatching).
It's possible, but not directly supported.
Currently there's only the ability to match JSON. You can fake non-JSON matching by expecting a string body, but then you won't be able to use pact's built in matchers- which might mean your tests will be data dependent unless you do a bit of leg work.
There is a stub for xml support, but it's not currently implemented.
If you're willing to get your hands dirty in Ruby (not that different to Python!) you can write your own matcher. I can show you how to configure the pact-provider-verifier to use the custom matching code. Currently, if you use a content type that is not JSON, as J_A_X says, it will do an exact string diff.
We are building an API in-house and often are passing a parameter with multiple values.
They use: mysite.com?id=1&id=2&id=3
Instead of: mysite.com?id=1,2,3
I favor the second approach but I was curious if it was actually incorrect to do the first?
I'm not an HTTP guru, but from what I understand there's not a definitive standard on the query part of the URL regarding multiple values, it's typically up to the CGI that handles the request to parse the query string.
RFC 1738 section 3.3 mentions a searchpart and that it should go after the ? but doesn't seem to elaborate on its format.
http://<host>:<port>/<path>?<searchpart>
I did not (bother to) check which RFC standard defines it. (Anyone who knows about this please leave a reference in the comment.) But in practice, the mysite.com?id=1&id=2&id=3 way is already how a browser would produce when a form contains duplicated fields, typically the checkboxes. See it in action in this w3schools example page. So there is a good chance that the whatever programming language you are using, already provides some helper functions to parse an input like that and probably returns a list.
You could, of course, go with your own approach such as mysite.com?id=1,2,3, which is not bad at all in this particular case. But you will need to implement your own logic to produce and to consume such format. Now you may or may not need to think about handling some corner cases by yourself, such as: what if the input is not well-formed, like mysite.com?id=1,2,? And do you need to invent yet another separator, if the comma sign itself can also be a valid input, like mysite.com?name=Doe,John|Doe,Jane? Would you reach to a point that you will use a json string as the value, like mysite.com?name=["John Doe", "Jane Doe"]? etc. etc.. Your mileage may vary.
Worth adding that inconsistend handling of duplicate parameters in the URL on the server is may lead to vulnerabilities, specifically server-side HTTP parameter pollution, with a practical example - Client side Http Parameter Pollution - Yahoo! Classic Mail Video Poc.
in your first approach you will get an array of querystring values but in second approach you will get a string of querystring values.
I guess it depends on technology you use, how it becomes convenient. I am currently standing in front of the same question using currency=USD,CHF or currency=USD¤cy=CHF
I am using Thymeleaf and using the second option makes it easy to work, I can then request something like: ${param.currency.contains(currency.value)}. When I try to use the first option it seems it takes the "array" like a string, so I need to split first and then do contain, what leads me to a more mess code.
Just my 50 cents :-)
When accessing a form or query string value from code-behind in ASP.NET, what are the pros and cons of using, say:
// short way
string p = Request["param"];
instead of:
// long way
string p = Request.QueryString["param"]; // if it's in the query string or
string p = Request.Form["param"]; // for posted form values
I've thought about this many times, and come up with:
Short way:
Shorter (more readable, easier for newbies to remember, etc)
Long way:
No problems if there are a form value and query string value with same name (though that's not usually an issue)
Someone reading the code later knows whether to look in URLs or form elements to find the source of the data (probably the most important point)
.
So what other advantages/disadvantages are there to each approach?
the param collection includes all (4) collections:
Query-string parameters
Form fields
Cookies
Server variables
you can debate that searching in the combined collection is slower than looking into a specific one, but it is negligible to make a difference
The long way is better because:
It makes it easier (when reading the code later) to find where the value is coming from (improving readability)
It's marginally faster (though this usually isn't significant, and only applies to first access)
In ASP.NET (as well as the equivalent concept in PHP), I always use what you are calling the "long form." I do so out of the principle that I want to know exactly from where my input values are coming, so that I am ensuring that they get to my application the way I expect. So, it's for input validation and security that I prefer the longer way. Plus, as you suggest, I think the maintainability is worth a few extra keystrokes.
Background: I have a complex search form that stores the query and it's hash in a cache. Once the cache is set, I redirect to something like /searchresults/e6c86fadc7e4b7a2d068932efc9cc358 where that big long string on the end is the md5 hash of my query. I need to make a new argument for views to know what the hash is good for.
The reason for all this hastle is because my original search form is way to complex and has way to many arguments to consider putting them all into the path and expecting to do the filtering with the normal views arguments.
Now for my question. I have been reading views 2 documentation but not figuring out how to accomplish this custom argument. It doesn't seem to me like this should be as hard as it seems to me like it must be. Leaving aside any knowledge of the veiws api, it would seem that all I need is a callback function that will take the argument from the path as it's only argument and return a list of node id's to filter to.
Can anyone point me to a solution or give me some example code?
Thanks for your help! You guys are great.
PS. I am pretty sure that my design is the best I can come up with, lets don't get off my question and into cross checking my design logic if we can help it.
It's not as easy as you would like to make it.
In views, arguments are used to return objects, fx user, node, term, custom object. So you could make some custom code, to get the "query object". That would only be first step. You then need to get the info from the query object. You could either try making a custom relationship bond with the nodes or build your own filter to make the SQL needed. This can quickly become a confusing time sink.
Instead, I would suggest that you use hook_views_query_alter, which will allow you to alter the query. Since you already have the SQL, it's just a matter of checking for the hash, and if it's there, alter the query. Should be a pretty simple thing to do. Only thing that is a bit tricky, is that you have to make the query with the query object that views uses, but it's not that hard to figure out.