How do I get a full list of genres from the Gracenote Music API?

We are currently working with the Gracenote Music API and are wondering if there is a full list of genres and mappings between the different hierarchies of genres. Ideally, we'd love a dump of those tables from the backend Gracenote system. If CSVs, text files, or even XML are easier to provide, we will figure out a way to import that data into our system.
If a full mapping isn't available, a list of top level genres would be very helpful.

I'm afraid there is no way to iterate the list of genres via the Web API. Most of the client SDKs have this capability.

It turns out that there are at least three sources for example code in the GNSDK:
Properly maintained samples in the "samples" directory. These compile into full applications with minimal effort (once you've settled on a makefile solution for your platform, as a complete Automake setup is not yet part of the package).
samples/code_snippets - These are useful to look at, but do not necessarily build into full apps, and may not be completely up to date with the SDK.
Code linked from the documentation. This is a problem if you downloaded the SDK as an archive and the documentation as a PDF, as the links will resolve as relative file links, not HTTP links, and you won't have the files. You need to look at the HTML version of the documentation on the server to find these files. They are also apparently outdated and will not build without some (relatively minor) rework, which can be done using the primary samples as a guide.
So, all of that said, what you want to look at in the GNSDK Developer's Guide is "Advanced Topics : Using Lists". You will want to read that entire section, then find and work with the sample application referenced on page 93.

To get the list of genres (or moods, or eras), you need to make a call to the "fieldvalues" API; you can see how to do it here:
https://developer.gracenote.com/rhythm-api#attribute-station
This call will give you the list of supported genres:
https://cXXXXXXX.web.cddbp.net/webapi/json/1.0/radio/fieldvalues?fieldname=RADIOGENRE&client=CLIENT_ID&user=USER_ID
You can then use the returned IDs with pygn.createRadio().
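For reference, here is a minimal Python sketch of that fieldvalues call using the requests library. The cXXXXXXX host prefix is your client-specific prefix as in the URL above, and the exact shape of the returned JSON is an assumption; inspect the actual response and adjust.

    import requests

    # Substitute your client-specific host prefix, client ID, and user ID.
    BASE = "https://cXXXXXXX.web.cddbp.net/webapi/json/1.0/radio"

    # Ask the fieldvalues API for the list of supported genres.
    resp = requests.get(BASE + "/fieldvalues", params={
        "fieldname": "RADIOGENRE",
        "client": "CLIENT_ID",
        "user": "USER_ID",
    })
    resp.raise_for_status()
    data = resp.json()

    # Assumed response shape -- print each genre's ID and name.
    for genre in data.get("genre", []):
        print(genre.get("id"), genre.get("value"))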

Related

Possible to search through all JSPs in Adobe CQ5 repository with CRXDE?

We have a couple of relatively simple websites running on Adobe CQ 5.5 that were developed by a third party. I'm pretty familiar with how CQ works, but I'm working with somebody else's code here and I need to be able to search through all components in the system for a particular string.
The issue is that I can't seem to find a way to search across all of the various .jsp files stored with the various system components. I would have figured that the query tool in CRXDE Lite would have done the trick with something like this:
/jcr:root//*[jcr:contains(., 'Find this exact string in a JSP')] order by @jcr:score
But I've had no luck.
What I am looking for is some sort of global search that includes JSP files. Is that possible? Were I using a regular Java system, any IDE worth the download would be able to do this.
Thanks.
It might not be the easiest way, but you can use the VLT tool to check out the repository into your filesystem. Then you can search it using whatever tool you prefer. It might even be faster in the long run.
I don't have the actual answer but I suppose the JSPs are indexed via a filter that strips out some of their content.
It should be possible to configure the repository to index them as is instead, based on the info at http://wiki.apache.org/jackrabbit/IndexingConfiguration and http://jackrabbit.apache.org/jackrabbit-text-extractors.html
Sorry about the vagueness of this answer - I know the basic principles but to provide the details I would need more time than I can afford now ;-)

Is there a scraper application like KimonoLabs?

I have used Scrapy and Beautiful Soup many times, but I find the KimonoLabs solution much easier and faster. The only problem is that jobs sometimes need a bit of tweaking, which is not possible there (e.g., crawling using a unique pattern).
Is there any other solution that combines that ease of use with optional complexity? Mainly, I want to define a page-scraping template using a WYSIWYG interface, and then write the crawler programmatically.
Use an Import.io extractor.
Download the Import.io browser
Create an extractor (what you call a "scraping template")
From your code, use the extractor's REST API
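For that last step, a hedged Python sketch (the endpoint path, query parameter names, and response shape here are assumptions based on Import.io's legacy query API; the dashboard shows the exact URL and key for your extractor):

    import requests

    # Placeholder values -- Import.io's dashboard shows the real ones.
    EXTRACTOR_ID = "YOUR_EXTRACTOR_ID"
    API_KEY = "YOUR_API_KEY"

    resp = requests.get(
        "https://api.import.io/store/data/" + EXTRACTOR_ID + "/_query",
        params={
            "input/webpage/url": "http://example.com/page-to-scrape",
            "_apikey": API_KEY,
        },
    )
    resp.raise_for_status()
    print(resp.json())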
Full disclosure: I'm one of the founders of ParseHub.
ParseHub tries to solve exactly this problem. It gives you a GUI and powerful tools for defining templates visually, and falls back to a subset of JavaScript if you need more fine-grained control. All of the programming primitives that you're familiar with (if, for, break, recursion, etc.) are available.
You can find it at www.parsehub.com
Try Agenty
Agenty has the exact same features for scraping websites, plus a Chrome extension to set up the scraping agents. You can just install the extension and create agents to scrape any site.
FYI: We also plan to launch a hosted solution and REST API by April 2016 (update: the API is available now).
You can see more details on our website (www.datascraping.co), now Agenty.com.
Disclosure: I'm one of the founding members.

Frama-C: access to the cil/src/ext modules data and few others questions as well

First of all, I will explain what I would like to do here: given a big C program, I would like to output a list of producers/consumers for a piece of data, and a list of calling/called-by functions for the function where this data is used.
To do this, I am thinking about using what some Frama-C modules compute, like dataflow.ml or callgraph.ml, in my own plugin.
However, as I read the plugin developer documentation, I can't manage to see how to get access to the data of those modules.
Is an "open Cil_types" sufficient here in my own plugin?
Moreover, here are my other questions:
I tried using the PDG plugin for my purposes, but when I call it and it says "pdg graph computed", how can I access the graph?
Is there anything documenting the "impact" plugin more in depth than the official web page, i.e., how it fundamentally works? (I have to say that I'm in a pre-project phase, that I installed Frama-C with apt-get on Ubuntu, and that I did not get a working impact plugin; I'll see about compiling from source.)
By the way, do you think I'm using the right method for my purposes?
Your question is quite unclear, and this answer is thus very generic. As mentioned in the developer documentation, there are two main classes of plugins: static plugins, compiled with the kernel, whose API is exposed in a module (usually of the same name as the plugin) in Db; and dynamic plugins, such as Semantic_callgraph, which register their entry points dynamically through the Dynamic module.
If you do make doc in Frama-C sources (I'm not sure that there is a corresponding package in Ubuntu) you can access documentation for the Db module in FRAMAC_SOURCE_DIR/doc/code/html/Db.html and the list of functions registered by dynamic plugins in FRAMAC_SOURCE_DIR/doc/code/dynamic_plugins/Dynamic_plugins.html.
I think that, following Virgile's advice, you should get the source code anyway, because you will most of the time need to browse the code to find what you are looking for. Besides, you can have a look at the hello_world plug-in (in src/dummy/hello_world) for an example of a very simple plug-in. You can also find some examples on my web site at https://anne.pacalet.fr/Notes/doku.php?id=notes:0061_frama_c_scripts to find out how to access some information in the AST.

How to scrape websites such as Hype Machine?

I'm curious about website scraping (i.e., how it's done, etc.); specifically, I'd like to write a script to perform the task for the site Hype Machine.
I'm actually a software engineering undergraduate (4th year); however, we don't really cover any web programming, so my understanding of JavaScript/RESTful APIs/all things web is pretty limited, as we're mainly focused on theory and client-side applications.
Any help or directions greatly appreciated.
The first thing to look for is whether the site already offers some sort of structured data, or if you need to parse through the HTML yourself. Looks like there is an RSS feed of latest songs. If that's what you're looking for, it would be good to start there.
You can use a scripting language to download the feed and parse it. I use Python, but you could pick a different scripting language if you like. There are docs on how you might download a URL in Python and parse XML in Python.
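For example, a minimal sketch with just the standard library (the feed URL is a placeholder):

    import urllib.request
    import xml.etree.ElementTree as ET

    # Placeholder -- substitute the site's actual RSS feed URL.
    FEED_URL = "http://example.com/rss"

    with urllib.request.urlopen(FEED_URL) as f:
        tree = ET.parse(f)

    # RSS 2.0 wraps each entry in an <item> element.
    for item in tree.getroot().iter("item"):
        print(item.findtext("title"), item.findtext("link"))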
Another thing to be conscious of when you write a program that downloads a site or RSS feed is how often your scraping script runs. If you have it run constantly so that you'll get the new data the second it becomes available, you'll put a lot of load on the site, and there's a good chance they'll block you. Try not to run your script more often than you need to.
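A simple way to keep the load down is to sleep between fetches instead of looping as fast as possible; a sketch (the interval and the fetch_feed() helper are placeholders for whatever suits your case):

    import time

    POLL_INTERVAL = 30 * 60  # seconds; an arbitrary, polite half hour

    while True:
        fetch_feed()  # your download-and-parse routine from above
        time.sleep(POLL_INTERVAL)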
You may want to check the following books:
"Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL"
http://www.amazon.com/Webbots-Spiders-Screen-Scrapers-Developing/dp/1593271204
"HTTP Programming Recipes for C# Bots"
http://www.amazon.com/HTTP-Programming-Recipes-C-Bots/dp/0977320677
"HTTP Programming Recipes for Java Bots"
http://www.amazon.com/HTTP-Programming-Recipes-Java-Bots/dp/0977320669
I believe the most important thing to analyze is what kind of information you want to extract. If you want to extract entire websites like Google does, your best option is probably to look at tools like Nutch from Apache.org or the Flaptor solution at http://ww.hounder.org. If you need to extract particular areas of unstructured data documents (websites, docs, PDFs), you can probably extend Nutch plugins to fit your particular needs. See nutch.apache.org.
On the other hand, if you need to extract particular text or clipping areas of a website where you set rules using the DOM of the page, what you need is probably closer to tools like mozenda.com. With those tools you will be able to set up extraction rules to scrape particular information from a website. Take into consideration that any change to a web page will break your robot.
Finally, if you are planning to develop a website using such information sources, you could purchase information from companies such as spinn3r.com, which sell particular niches of information ready to be consumed. You would save lots of money on infrastructure.
Hope it helps!
Sebastian.
Python has the feedparser module (see feedparser.org), which handles RSS and Atom in their various flavours. No reason to reinvent the wheel.
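A minimal usage sketch (the feed URL is a placeholder; feedparser detects the flavour for you):

    import feedparser  # pip install feedparser

    # Point this at the RSS or Atom feed you want.
    d = feedparser.parse("http://example.com/rss")

    for entry in d.entries:
        print(entry.title, entry.link)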

Categorized Document Management System

At the company I work for, we have an intranet that provides employees with access to a wide variety of documents. These documents fall into several categories and subcategories, and each of these categories have their own web page. Below is one such page (each of the links shown will link to a similar view for that category):
http://img16.imageshack.us/img16/9800/dmss.jpg
We currently store each document as a file on the web server and hand-code links to these documents whenever we need to add a new one. This is tedious and error-prone, and it also means we lack any sort of security for accessing these documents. I began looking into document management systems (like KnowledgeTree and OpenKM); however, none of these systems seems to provide a categorized view like the one in the preview above.
My question is ... does anyone know of any Document Management System that allows for the type of flexibility we currently have with hand-coding links to our documents into various webpages (major and minor categories), while also providing security, ease of use, and (less important) version control? Or do you think I'd be better off developing such a system from scratch?
If you are trying to categorize the files or folders in the document management system, that's not a difficult task. You only need access to the admin panel to maintain or categorize the folders.
In Laserfiche, you can easily categorize your folders by department, and they can also be subcategorized.
You should look into Alfresco. It's extremely extensible and provides a lot of ways of accessing the repository.
Note: click the "Developers" tab for the community edition.
"My question is ... does anyone know of any Document Management System that allows for the type of flexibility we currently have with hand-coding links to our documents into various webpages, while also providing security, ease of use, and (less important) version control? Or do you think I'd be better off developing such a system from scratch?"
Well, there are companies that make a living selling doc management software. Anything you can get off the shelf is going to be a huge time saver, and it's going to be better than anything you could reasonably develop by hand.
I've used a few systems:
SharePoint: I hear some people don't like it, and I didn't either ;)
HyperOffice worked really well for my company of around 150 employees and has all the features you describe.
My current company uses Confluence, and I like it :) But it's probably one of those tools whose price tag isn't worth it, especially if you're only using a subset of its features, like doc management.
I haven't used it, but one guy I know raves about Alfresco, a free and open-source doc management system. I looked at its website; it seems simple enough to use.
We also faced a similar problem. However, version control was higher on our priority list, and we looked into many solutions. We found Globodox extremely easy to install and use, and, more importantly, the support team was absolutely fantastic.
Try Mayan EDMS; it's Django-based and open source. Use it as a base and build the custom features you want on top of it.
Code location: https://gitlab.com/mayan-edms/mayan-edms
Homepage at: http://www.mayan-edms.com
The project is also available via PyPI at: https://pypi.python.org/pypi/mayan-edms/
