Riak: indexing and querying kv data

Riak: indexing and querying kv data - riak

I would like to implement "indexing and querying kv data" as described in the Riak docs at http://docs.basho.com/riak/latest/cookbooks/Riak-Search---Indexing-and-Querying-Riak-KV-Data/.
While there is a little documentation about how to setting up indexing using the HTTP API, the documentation of basho lacks any information on how to query such indexed kv data using the HTTP API. Apparently it is not working like when i index contents from the file system, at least i did not get it to work like that.
Could anybody help in posting some simple examples using cURL? Thanks in advance!

When querying Riak Search over HTTP I believe you have 2 options, which are described in greater detail here. The easiest one is probably to use the Solr compatible interface, but it is also possible to feed Riak Search results to a MapReduce job and get the results that way.

Related

PageSpeed Insights API: usability, security and metrics missing from json response

I am using PageSpeed Insights API to grab speed metrics of different websites and integrate the data in a tool I'm creating.
If I try a query using the API test tool (https://developers.google.com/speed/docs/insights/v4/reference/pagespeedapi/runpagespeed), then everything is fine and I get the info I need.
However, when I perform the very same query (as far as I can see) from my server, the response json does not include the same information. Some information is just missing.
Basically, other than the 'initial_url', all the metrics information that should be included in the 'loadingExperience' branch is missing. No info on 'FIRST_CONTENTFUL_PAINT_MS' or 'DOM_CONTENT_LOADED_EVENT_FIRED_MS'.
On the other hand, I can't seem to find the way to request info on USABILITY and SECURITY under the 'ruleGroups' branch. According to the API reference, this branch should feature information on these aspects too, but nothing like that is return after the query. Just the SPEED branch info is returned.
This is the URL I use to query the API:
https://www.googleapis.com/pagespeedonline/v4/runPagespeed?url=https://stackoverflow.com&strategy=mobile&screenshot=true&locale=en&key=XXXXXXXXmyAPIKeyXXXXXXXX';
Am I missing anything? I have checked the API documentation and Google'd for more info on this, but I can't seem to find any parameter to force request this information.
(By the way, this is my first question at StackOverFlow, so I hope I have shared all the necessary information. And apologies if my english is bad. I do my best.)

I'm having the same issue with some websites. The problem is related to the website. Some websites are providing the userExperience.metrics object and some are not. I have no idea what is causing this.
However you can try to use strategy=desktop parameter to get the userExperience.metrics object in version 5. This worked for me.
Working URL: https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://stackoverflow.com&strategy=desktop&key=[SetYourApiKeyHere]

Zenoss Graph results of postgresql database query

So I just added my servers to Zenoss and installed some postgresql zenpacks. Unfortunatly I do not care for most of the postgresql monitoring tools. Instead of what they gave me, I am wondering if it is possible to send a custom query that I wrote, then graph the result using Zenoss? How do I go about doing this? Are there any good resources that you know of?
Thanks.

You will need to write your own ZenPack (or at least Template). Check Development Guide http://wiki.zenoss.org/ZenPack_Development_Guide or http://zenosslabs.readthedocs.org/en/latest/zenpack_development/
IMHO you will need zencommand datasource, which will execute your custom SQL query and this query output (number only) will be metric value, which will be processed by Zenoss.
Or you can expose metric(s) via SNMP and then it'll be only standard SNMP metric in Zenoss.
It's up to you how do you implement it. I recommend you to use community forum http://www.zenoss.org/forum for Zenoss related questions.

Best UI interface/Language to query MarkLogic Data

We will be moving from Oracle and use MarkLogic 8 as our datastore and will be using MarkLogic's Java api to talk with data.
I am exploring for any UI tool (like SQL Developer is there for Oracle), which can be used for ML. I found that ML's Query Manager can used for accessing data. But I see multiple options wrt language:
SQL
SPARQL
XQuery
JavaScript
We need to perform CRUD operations and search for data, and our testing team is aware of SQL (for Oracle), so I am confused which route I should follow and on what basis I should decide which one/two will be better to explore. We are most likely to use JSON document type.
Any help/suggestions would be helpful.

You already mention you will be using the MarkLogic Java Client API, that should provide most of the common needs you could have, including search, CRUD, facets, lexicon values, and also custom extension though REST extensions as the Client API will be leveraging the MarkLogic REST API. It saves you from having to code inside MarkLogic to a large extent.
Apart from that you can run ad hoc commands from the Query Console, using one of the above mentioned languages. SQL will require the presence of a so-called SQL view (see also your earlier question Using SQL in Query Manager in MarkLogic). SPARQL will require enabling the triple index, and ingestion of RDF data.
That leaves XQuery and JavaScript, that have pretty much identical expression power, and performance. If you are unfamiliar with XQuery and XML languages in general, JavaScript might be more appealing.
HTH!

RCurl Twitter Streaming API Keyword Filtering

I saw this previous post but I have not been able to adapt the answer to get my code to work.
I am trying to filter on the term bruins and need to reference cacert.pem since for authentication on my Windows machine. Lastly, I have written a function to parse each response (my.function) and need to include this as well.
postForm("https://stream.twitter.com/1/statuses/sample.json",
userpwd="user:pass",
cainfo = "cacert.pem",
a = "bruins",
write=my.function)
I am looking to stay completely within R and unfortunately need to use Windows.
Simply, how can I include the search term(s) that I want such that the response is filtered?
Thanks in advance.

Alright, so I've looked at what you're doing, and some of what you're working on may be helped by examining the Twitter API methods, although it can be difficult to figure out how to translate some of the examples into R (via the RCurl Package).
What you're currently trying is very close to what you need to do, you simply need to change two things.
First of all, you're querying the url for the random sample of statuses. This url returns a random sample of roughly 1% of all tweets.
If you're interested in collecting only tweets about specific keywords, you want to use the filter API url: "https://stream.twitter.com/1/statuses/filter.json"
After changing that, you simply need to change your parameter from "a" to "postfields", and the parameter you'd be passing would look like: "track=bruins"
Finally, you should use the getURL function, to open a continuous stream, so all tweets with your keywords can be collected, rather than using the postForm command (which I believe is intended for HTML forms).
so your final function call should look like the following:
getURL("https://stream.twitter.com/1/statuses/filter.json",
userpwd="Username:Password",
cainfo = "cacert.pem",
write=my.function,
postfields="track=bruins")

For manipulating twitter, use the twitteR package.
library(twitteR)
searchTwitter("bruins")
You can include other parameters (like cainfo) in the call to searchTwitter, and they should get passed getForm underneath.

I don't think the Streaming API is currently included in twitteR - the search api is different (it's backward looking, whereas streaming is "current looking").
From my understanding, streaming is quite different to how lots of APIs work typically work; rather than pulling data from a web service and having a defined object returned, you're setting up a "pipe" for Twitter to push data to you and you then listen for that response.
You also need to worry about OAuth I think (which twitteR does deal with).
Is there any reason that you want to keep in R? I've used python successfully with the Streaming API and a package called tweepy to write data to a MySQL database and then use R to query and analyse the data.

Last time I checked, twitteR did not talk to the streaming API. Moreover, as far as I know, very few publicly-available Twitter Streaming API connection libraries in any language honor Twitter's recommendations on reconnecting when Streaming disconnects / throws an error.
My recommendation is to access Streaming via a library that's actively maintained, write the re-connection protocol yourself if you have to, and persist the data into a database that handles JSON natively. I'm about to embark on a project of this nature and will be writing the collector in Perl, doing my own re-connect logic and persisting into either PostgreSQL or MongoDB. Most likely it will be MongoDB; PostgreSQL doesn't get native JSON till 9.2.

Late to the game, I know, but you'll want to use the "streamR" package for access to Twitter's streaming API.

Best way to keyword search Amazon SimpleDB using EC2 and Asp.Net?

I am wondering if anyone has any thoughts on the best way to perform keyword searches on Amazon SimpleDB from an EC2 Asp.Net application.
A couple options I am considering are:
1) Add keywords to a multi-value attribute and search with a query like:
select id from keywordTable where keyword ='firstword' intersection keyword='secondword' intersection keyword = 'thirdword'
Amazon Query Example
2) Create a webservice frontend to Katta:
Katta on EC2
3) A queued Lucene.Net update service that periodically pushes the Lucene index to the cloud. (to get around the 'locking' issue)
Load balance Lucene(StackOverflow post)
Lucene on S3 (blog post)

If you are looking for a strictly SimpleDB solution (as per the question as stated) Katta and Lucene won't help you. If you are looking for merely an 'Amazon infrastructure' based solution then any of the choices will work.
All three options differ in terms of how much setup and management you'll have to do and deciding which is best depends on your actual requirements.
SimpleDB with a multi-valued attribute named Keyword is your best choice if you need simplicity and minimum administration. And if you don't need to sort by relevance. There is nothing to set up or administer and you'll only be charged for your actual cpu & bandwidth.
Lucene is a great choice if you need more than keyword searching but you'll have manage updates to the index yourself. You'll also have to manage the load balancing, backups and fail over that you would have gotten with SimpleDB. If you don't care about fail over and can tolerate down time while you do a restore in the event of EC2 crash then that's one less thing to worry about and one less reason to prefer SimpleDB.
With Katta on EC2 you'd be managing everything yourself. You'd have the most flexibility and the most work to do.

Just to tidy up this question... We wound up using Lightspeed's SimpleDB provider, Solr and SolrNet by writing a custom search provider for Lightspeed.
Info on implementing ISearchEngine interface for Lightspeed:
http://www.mindscape.co.nz/blog/index.php/2009/02/25/lightspeed-writing-a-custom-search-engine/
And this is the Solr Library we are using:
http://code.google.com/p/solrnet/
Since Solr can be easily scaled using EC2 machines, this made the most sense to us.

Simple Savant is an open-source .NET persistence library for SimpleDB which includes integrated support for full-text search using Lucene.NET (I'm the Simple Savant creator).
The full-text indexing approach is described here.