the database that i am scraping is located here: https://www2.cslb.ca.gov/OnlineServices/CheckLicenseII/CheckLicense.aspx
what i would like to do is:
use a wildcard search to find companies with "roof" or "roofing" in the name.
i can perform this search "roof%". however, "%roof" or "* roof%" doesn't work. i am more interested in figuring out how to make the later query work.
example:
xyz roofing co
cal roofers inc
can someone help me with this?
There's no such thing as an aspx database.
There's no way for you to derive a substring search, as you are attempting to do, if the site doesn't support it. The 'search tips' page tells you exactly what is valid:
https://www2.cslb.ca.gov/OnlineServices/CheckLicenseII/CheckLicense.aspx
We have no 'magical' way of making the site do more than it was designed to do.
Related
Looking for advice please on methods to scrape the gender of clothing items on a website that doesn't specify the gender on the product page.
The website I'm crawling is www.very.co.uk and an example of a product page would be this - https://www.very.co.uk/berghaus-combust-reflect-long-jacket-red/1600352465.prd
Looking at that page, there looks to be no easy way to create a script that could identify this item as womenswear. Other websites might have breadcrumbs to use, or the gender might be in the title / URL but this has nothing.
As I'm using scrapy, with the crawl template and Rules to build a hierarchy of links to scrape, I was wondering if it's possible to pass a variable in one of the rules or the starting_URL to identify all items scraped following this rule / starting URL would have a variable as womenswear? I can then feed this variable into a method / loader statement to tag the item as womenswear before putting it into a database.
If not, would anyone have any other ideas on how to categorise this item as womenswear. I saw an example where you could use an excel spreadsheet to create the start_urls and in that excel spreadsheet tag each row as womenswear, mens etc. However, I feel this method might cause issues further down the line and would prefer to avoid it if possible. I'll spare the details of why I think this would be problematic unless anyone asks.
Thanks in advance
There does seem to be a breadcrumb in your example, however for an alternative you can usually check the page source by simply searching your term - maybe there's some embedded javascript/json that can be extract?
Here you can see some javascript for subcategory that indicates that it's a "womens_everyday_sports_jacket".
You can parse it quite easily with some regex:
re.findall('subcategory: "(.+?)"', response.body_as_unicode())
# womens_everyday_sports_jacket
Is there a way to open a specified document, eg "production order 123" or form, eg "purchase orders" in Ax2012 from an external application directly?
In detail, I'm looking for something similiar like AXPath, but this doesn't work with versions greater then 2009.
Is there any ( maybe included ) way to achieve this?
There is! It's using AX's drilldown functionality which uses AxHLink.exe to handle dynamics:// URLs, which are passed to the Classes\SysStartupCmd function. You could also create some custom code there if you wanted to launch the AX client executable directly.
My question I asked some while back should have a great deal of useful information in it here:
What handles dynamics:// URLs?
Some more can be found: http://technet.microsoft.com/en-us/library/aa834337.aspx
EDIT:
It sounds like you are confused or the posts weren't clear enough. I think you have 3 basic options.
Dynamics:// URLs are handled by AxHLink.exe and they only seem to handle drilldown, viewalert, and viewalertrule. So if you want to use Dynamics:// URLs, you will need to hi-jack those somehow. There is a pastbin from Jan in that other stack post.
Create a custom URI handler and event poller (lot of work) see http://axcoder.blogspot.dk/2010/10/how-to-open-form-in-running-ax-from.html
Extend SysStartupCmd and then instead of using Dynamics:// URLs, just call Ax32.exe -startupCmd directly and a parameter can be passed to your custom class.
i have site where i need to develop site search functionality. the data may reside in database table or may in aspx page as static word. i search google and found that lucene.net may be appropriate for the site search functionality. but i never use lucene.net so i dont know how to create lucene.net index file. i want to develop 2 utility in my site like
1) one for create & update index file reading data from database table & physical aspx file.
2) utility which search multiple single or multiple keyword against index file.
i found a bit of code snippet which i just do not understand
string indexFileLocation = #"C:\Index";
string stopWordsLocation = #"C:\Stopwords.txt";
var directory = FSDirectory.Open(new DirectoryInfo(indexFileLocation));
Analyzer analyzer = new StandardAnalyzer(
Lucene.Net.Util.Version.LUCENE_29, new FileInfo(stopWordsLocation));
what is Lucene.Net.Util.Version.LUCENE_29 what is stopWordsLocation
how data need to store in Stopwords.txt
but have no concept to develop the above 2 utility. so please guide me how search my DB and as well as aspx files with lucene.net....i will be glad if some one discuss here with bit of sample code. thanks
Lucene.Net.Util.Version.LUCENE_29 just indicates the Lucene version your are using, you should always use the most up to date in new code. It is there for backward compatibility in case you upgrade your Lucene with a version that changes the StandardAnalyzer, but you dont want to re-index all your data.
The stopWordsLocation is the location of a file with your stop words, words you dont want to index.
IE: it, he, she, the, or, and etc...
Its a regular text file, each line should contain 1 stop word, and separate each line with a linebreak.
http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/all/org/apache/lucene/analysis/WordlistLoader.html#getWordSet(java.io.Reader)
I have a Drupal 7 website that is running apachesolr search and is using faceting through the facetapi module.
When I use the facets to narrow my searches, everything works perfectly and I can see the filters being added to the search URL, so I can copy them as links (ready-made narrowed searches) elsewhere on the site.
Here is an example of how the apachesolr URL looks after I select several facets/filters:
search_url/search_keyword?f[0]=im_field_tag_term1%3A1&f[1]=im_field_tag_term2%3A100
Where the 'search_keyword' portion is the text I'm searching for and the '%3A' is just the url encoded ':' (colon).
Knowing this format, I can create any number of ready-made searches by creating the correct format for the URL. Perfect!
However, these filters are always ANDed, the same way they are when using the facet interface. Does anyone know if there is a syntax I can use, specifically in the search URL, to OR my filters/facets? Meaning, to make it such that the result is all entries that contains EITHER of the two filters?
Thanks in advance for any help or pointers in the right direction!
New edit:
I do know how to OR terms within the same vocabulary through the URL, I'm just wondering how to do it for terms in different vocabularies. ;-)
You can write a filter query that looks like:
fq=field1:value1 OR field2:value2
Alternatively you can use localparams to specify the query operator:
fq={!q.op=OR}field1:value1 field2:value2
As far as I know, there's not any easier way to do this. There is, in fact, an rather old bug asking for a way to OR the fq parameters...
I finally found a way to do this in Drupal
Enable the fq parameter setting.
Go to admin/config/search/apachesolr/[your_search_page]/core_search/edit or just navigate to the settings of the search page you're trying to modify
Check the 'Allow user input using the URL' setting
URL Syntax
Add the following at the end of the URL: ?fq=tid:(16 OR 38), where 16 and 38 are the term ids
Is there some kind of SQL Statement that I can used to do a search on 1 Column in my table that is similar to what I am looking for.
Like if I am looking for something to do with a "Car" but I spell it as "Kar" it must return items of "car".
Or
If I am looking for "My Company" and I spell it as "MyCompany" it must still retun "My Company".
Select * from TableName where Column Like '%company%' will return "My Company" but I need more as the user will not always know how to spell. So I need something more advanced like some small Google App or something...
That feature is in the text services so if you build a full-text search index, you should be able to use the feature.
Have a look here:
http://msdn.microsoft.com/en-us/library/ms187384.aspx
http://msdn.microsoft.com/en-us/library/ms142571.aspx
This is quite an involved problem. The quick answer is to use the SQL Server soundex algorithm, but it's pretty hopeless. Try the suggestions on this SO answer. (Updated)
Read this blog post: http://googlesystem.blogspot.com/2007/04/simplified-version-of-googles-spell.html
This is something you could implement with SQL, but it's not a built in feature.
Another way to help you users find what they are looking for is to implement type-ahead on the search field. If the user type: "my" he will get "My Company" as a suggestion and likely go with that.
You can easily implement type ahead using jquery or other javascript libraries. Here's a list of type ahead plugins for jQuery: http://plugins.jquery.com/plugin-tags/typeahead
No. A full text index might get you closer, depending on your requirements (what exact features are you looking for?) One option would be roll you own .NET assembly with the desired features add it (CREATE ASSEMBLY) to sql server and use that to search.