Is there a place I can find the top x search terms (preferably from Google) and import them via something such as an HTTP request/POST?
For that you'll need to use Google's Custom Search API. Specifically, you can use the num parameter to tell it how many results to return. Also, items[].title will give you the title in plain text, items[].snippet will give you the snippet in plain text.
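As a rough sketch of what that request could look like, the snippet below builds a Custom Search JSON API URL with the num parameter and pulls items[].title and items[].snippet out of a decoded response. The function names and the placeholder key and engine id are made up for illustration; you need your own API key and search engine id (cx) from Google.

```python
from urllib.parse import urlencode

# Endpoint of the Custom Search JSON API.
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id, num=10):
    """Return a request URL asking for `num` results (the API accepts 1 to 10 per request)."""
    if not 1 <= num <= 10:
        raise ValueError("num must be between 1 and 10")
    # api_key and engine_id are placeholders supplied by the caller
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num}
    return CSE_ENDPOINT + "?" + urlencode(params)


def titles_and_snippets(response):
    """Extract (title, snippet) pairs from a decoded JSON response."""
    return [(item.get("title"), item.get("snippet"))
            for item in response.get("items", [])]
```

To actually run the search you would fetch the URL (e.g. with urllib.request.urlopen), decode the body with json.load and pass the result to titles_and_snippets.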
The text I'm searching for is all contained within a CSS class called "content-center", and within that is a series of CSS classes, all with the same name, that hold similar but different information. It seems to only be returning [<JSHandle preview=JSHandle#node>] rather than the text itself, as if saying "yes, this text is on the page X times".
page.wait_for_selector('.content-center')
print(page.query_selector_all(".content-center:has-text('Bob Johnson')"))
page.query_selector_all returns a list of ElementHandle objects, one for each element that was found. You can loop over these and call the text_content() method to get the text out of each element.
Also, in most cases it's enough to use text selectors to verify that something is on the page or that an element has text; see the Playwright documentation on selectors for reference.
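A minimal sketch of that loop, assuming page is a Playwright sync-API Page that has already navigated to the site (the helper name matching_texts is made up here):

```python
def matching_texts(page, selector):
    """Return the text of every element on `page` that matches `selector`."""
    handles = page.query_selector_all(selector)  # list of ElementHandle objects
    # text_content() can return None, so fall back to an empty string
    return [(handle.text_content() or "").strip() for handle in handles]
```

With the code from the question, this would be called as print(matching_texts(page, ".content-center:has-text('Bob Johnson')")) right after the wait_for_selector line.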
I am wondering if there's any function that I can create in order to modify the magic tags behaviour.
Ideally, I would like to use a tag like this {#post_content|120} which would go through my custom function and check if there's a | character, then execute the original magic tag, while trimming text down to 120 characters.
But I don't know where to hook in order to filter this content.
I know that I can pass a function name with the magic tag, but this isn't really helpful as I need to pass the character limit as a parameter, which PODS doesn't support.
Also, I can't create a function for every character limit, as I have a lot of places where I need different limits; I would end up with tons of functions and no dynamic solution.
Can I somehow trigger a magic tag with a parameter? Any other thoughts about doing this another way?
Thank you!
I don't think that's possible. {#your_field, your_function} is how it works (the function takes the field value as input). You could use different function names like trim_120 and trim_100 and do what you need in there. I guess this is to create excerpts of different lengths, although there are other ways to do that, e.g. using the the_content filter.
Can I create a reblog link programmatically?
Is it against the terms of service? I can't tell...
Anatomy of a tumblr reblog link: (unique numbers made up)
http://www.tumblr.com/reblog/85728493821/7vu4jf89
In my RSS feed I have:
myblog.tumblr.com/post/85728493821
So it's safe to say the 85... number is a unique post id.
But what is the other code? (7vu4jf89)
The 2nd value differs for each reblog link, so it's not just my unique identifier.
Arbitrary values do not work either.
I was thinking maybe it's something Tumblr implemented specifically to prevent people from doing the sort of thing I'm attempting? Maybe it's some sort of hash value combining my account identifier and the post?
Any insight is appreciated.
Tumblr Reblogs
Ignoring the RSS part for the moment, I believe there are two official methods to achieve a working reblog link.
Use the template variable {ReblogButton} (http://www.tumblr.com/docs/en/custom_themes#like_and_reblog_buttons)
Use the Tumblr API (http://www.tumblr.com/docs/en/api/v2#reblogging)
In reply to your question about the other code: I believe this is a unique, randomly generated key, the makeup of which I am not 100% sure of. The key seems to be unique per post and per site.
For example, if the original reblog key is 12345678 and the post is reblogged, a new key is generated for the site that reblogged the post.
Back to the RSS part, sadly as you have probably gathered, getting the reblog key inside the RSS feed by default is impossible. My advice would be to find the permalink in the RSS feed and use an API call to return the corresponding key for a reblog.
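Here is a sketch of that lookup, under the assumption that the API v2 posts endpoint accepts api_key and id and returns each post with a reblog_key field (that is how I read the API docs linked above, so double-check before relying on it). The helper names are made up, and you need your own API key.

```python
from urllib.parse import urlencode, urlparse


def post_lookup_url(permalink, api_key):
    """Build a Tumblr API v2 URL that fetches the single post behind `permalink`."""
    if "://" not in permalink:
        # feed entries like myblog.tumblr.com/post/123 have no scheme
        permalink = "http://" + permalink
    parts = urlparse(permalink)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "post":
        raise ValueError("not a post permalink: " + permalink)
    query = urlencode({"api_key": api_key, "id": segments[1]})
    return "https://api.tumblr.com/v2/blog/%s/posts?%s" % (parts.netloc, query)


def reblog_key(api_response):
    """Pull reblog_key out of the decoded JSON (assumed v2 response layout)."""
    posts = api_response["response"]["posts"]
    return posts[0]["reblog_key"] if posts else None
```

Fetch the URL, decode the JSON body, and pass it to reblog_key to get the second value of the reblog link.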
There is a way to construct the reblog URL manually, if you have access to the post’s HTML page:
search for rk= in the HTML source code (it's in the block opened by <!-- BEGIN TUMBLR CODE -->)
copy the value of this parameter (e.g. "1234" if you find rk=1234)
now manipulate the URL:
append this value to the URL (add a slash before it if there is none); if the URL ends with a slug, you can replace the slug with the value
replace "post" with "reblog"
remove the subdomain
call this crafted URL
This rk value (maybe "reblog key"?) doesn’t seem to be included in the feed.
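The URL manipulation above could be sketched like this, assuming you have already copied the rk value out of the page source (the function name is made up, and the www.tumblr.com host is taken from the example link in the question):

```python
from urllib.parse import urlparse


def reblog_url(post_url, rk):
    """Turn a post permalink plus its rk value into a reblog URL."""
    parts = urlparse(post_url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "post":
        raise ValueError("not a post URL: " + post_url)
    post_id = segments[1]
    # "post" becomes "reblog", any slug is replaced by rk,
    # and the blog subdomain is dropped in favour of www.tumblr.com
    return "%s://www.tumblr.com/reblog/%s/%s" % (parts.scheme, post_id, rk)
```

For example, reblog_url("http://myblog.tumblr.com/post/85728493821/some-slug", "7vu4jf89") gives a link in the same shape as the one in the question.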
We have a simple search interface which calls the search:search($query-text) function. Is there a syntax to include control for wildcarding, stemming and case sensitivity within the single text string that the function accepts? I haven't been able to find anything in the MarkLogic docs.
See the $options parameter and the <term> and <term-option> constraint at https://docs.marklogic.com/search:search . There is a guide at http://developer.marklogic.com/learn/2009-07-search-api-walkthrough
and some details http://developer.marklogic.com/learn/2009-07-search-api-walkthrough#ndbba3437f320a4a4
I don't know of any existing syntax for those options, aside from the built-in behavior of turning on wildcards when a term contains '*' or '?' and turning on case-sensitivity when the term contains capital letters.
You could develop a syntax. Implementing it might involve a custom parser along the lines of https://github.com/mblakele/xqysp then feeding the resulting cts:query into search:resolve.
Piggybacking on Eric Bloch's answer... you can always dynamically construct your options node based on input in the user interface.
For example, I often do this in order to separate the facet selection portion of the query from the text search portion and put the facet selection query in the additional-query element in the options node.
Searching through the Web by using the Google search engine is a de facto standard for Internet users.
Google provides a basic and an advanced form for preparing a query string for its search engine. If you would rather not use the web form, you can simply make an HTTP GET request to the search URL with a query string built from the search conditions.
For instance, I can search for results containing the word "hello" with an HTTP request to:
http://www.google.com/search?q=hello
I can add another word, e.g. "world", as follows:
http://www.google.com/search?q=hello+world
The search can be made more specific with parameters such as:
or condition(s)
exact phrase(s)
search on specific domain(s)
avoid specific word(s)
search with a specific language
limit search by geographical area
search for document type
etc.
How can I modify the query string to account for the above search parameters?
I carefully examined the answers by Pratik Chowdhury and Robbie Vercammen. They provide links to web documents listing the textual filters that can be used within the Google search form. Although this is interesting, they don't answer the question. Hence, I studied the problem at length and found the following solution.
Suppose that you need to make a one-off HTTP call (e.g. from a PHP class run via cron once a month) to Google Search in order to retrieve the results for a particular query, e.g. all the pages containing some words (i.e. "hello" and "world") on your website (i.e. mywebsite.com). You can then make an HTTP GET call to the following address:
http://www.google.com/search?q=hello+world+site:mywebsite.com
The q parameter can contain the whole search query; however, Google also defines a foolproof list of dedicated parameters.
Notice that the AND operator can be represented by the as_q parameter instead.
To get pages containing either "hello" or "world" (i.e. an OR), the q parameter must be changed to:
q=hello+OR+world
while a more compact representation uses the as_oq parameter:
as_oq=hello+world
If one looks for the exact phrase "hello world", the q parameter is:
q="hello+world"
while, again, another compact representation uses the as_epq parameter:
as_epq=hello+world
If one looks for all the results that do not contain the words "hello" and "world", the q parameter is:
q=-hello+-world
while, again, another compact representation uses the as_eq parameter:
as_eq=hello+world
Of course, as_q, as_oq, as_epq, as_eq, etc. can be combined in a single search query as usual (i.e. by using the & character). Thus, for instance, I can search for both words "hello" and "world" plus either "programming" or "code" as follows:
q=hello+world&as_oq=programming+code
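If you assemble the URL in code rather than by hand, a small helper along these lines takes care of the + and & escaping. It is only a sketch: the function and argument names are mine, and the mapping simply follows the as_q, as_oq, as_epq and as_eq parameters described above.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH = "http://www.google.com/search"


def google_search_url(all_words=None, any_words=None,
                      exact_phrase=None, without_words=None):
    """Map the four conditions onto as_q, as_oq, as_epq and as_eq."""
    fields = [
        ("as_q", all_words),        # AND
        ("as_oq", any_words),       # OR
        ("as_epq", exact_phrase),   # exact phrase
        ("as_eq", without_words),   # excluded words
    ]
    # leave out conditions that were not given
    params = [(name, value) for name, value in fields if value]
    return GOOGLE_SEARCH + "?" + urlencode(params)
```

For example, google_search_url(all_words="hello world", any_words="programming code") expresses the same search as the query above, using as_q in place of q.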
One can restrict the search to a specific domain (again, mydomain.com) as follows:
as_sitesearch=mydomain.com
However, if you want to exclude a specific domain (e.g. because it is a spam source), you must resort to the standard notation. E.g.:
q=hello+-site:mydomain.com
returns all the pages with the word "hello" that are not on the site mydomain.com.
To search for a specific file type, e.g. a PDF, you can use as_filetype:
as_filetype=pdf
More complex search parameters can be used, as described in the Google support docs.
For instance, to also get results with synonyms of a word, simply put the ~ operator in front of the word, e.g.
q=~hello
Moreover, if you want to use wildcards, e.g. to get all the exact phrases that start with "hello" and end with "world", you should use the * operator:
q="hello+*+world"
which will probably return something like "hello to the world" and "hello sweet world".
One can also search for specific words inside the page title or in the page url by using the following keywords (read here for more details):
intitle
allintitle
inurl
allinurl
For instance, the following returns all the pages such that both words "hello" and "world" are in the URL:
q=allinurl:hello+world
To set the language of the Google interface page (not that of the results), pass the language code (e.g. en for English, fr for French, it for Italian, etc.) in the hl parameter. In other words, to search with the English version of Google, the query string becomes:
http://www.google.com/search?hl=en&q=hello+world+site:mywebsite.com
To restrict the results to a specific language, e.g. Italian, use the lr query parameter:
lr=lang_it
One can also select pages published in a specific geographical region by using the cr parameter. E.g., to find all the pages published in Italy:
cr=countryIT
To create complex and / or queries, you can use () and OR.
For example if we want to search for
("tschakk buff" AND "boom bang") OR ("zata tong" AND "zong klirr")
The query would look like this:
https://www.google.com/search?q=("tschakk%20buff"%20"boom%20bang")%20OR%20("zata%20tong"%20"zong%20klirr")
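A sketch of building such a query in code, so the quotes, spaces and parentheses get percent-encoded for you. The helper names are made up; note that quote() also encodes the parentheses and double quotes (as %28, %29 and %22), which is equivalent to the hand-written URL above.

```python
from urllib.parse import quote


def all_of(*phrases):
    """Group exact phrases that must all match: ("a b" "c d")."""
    return "(" + " ".join('"%s"' % phrase for phrase in phrases) + ")"


def any_of(*groups):
    """Join groups with OR so that any one of them may match."""
    return " OR ".join(groups)


def search_url(expression):
    """Percent-encode the whole expression into the q parameter."""
    return "https://www.google.com/search?q=" + quote(expression)


expression = any_of(all_of("tschakk buff", "boom bang"),
                    all_of("zata tong", "zong klirr"))
url = search_url(expression)
```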
The title of this book may sound dangerous, but it will answer all your questions, as long as you don't misuse it.
The book is "Dangerous Google – Searching for Secrets" by Michał Piotrowski, from hackin9 magazine.
Wish ya luck
If you are trying to assemble your own url by manually typing the url before using it, this site should prove helpful: http://www.googleguide.com/advanced_operators.html
Advangle is a nice free service where you can construct web-search queries visually and get a query string (or URL to Google and Bing) as the result.