Categories of tags - vocabulary

I'm starting a pro bono project that is the web interface to the world's largest collection of lute music and it's a challenging collection from several points of view. The pieces are largely from 1400 to 1600, but they range from the mid-1200s to the present day. Needless to say, there is tremendous variability in how the pieces are categorized and who they are attributed to. It is obvious that any sort of rigid, DB-enforced hierarchy isn't going to work with this collection, so my thoughts turn to tags.
But not all tags are the same. I'll have tags that represent a person/role (composer, translator, entabulator, etc.), tags that represent the instrument(s) the piece is written for, and tags that represent how the piece has been classified by any one of half a dozen different classification systems used over the centuries.
We will be using a semi-controlled tag vocabulary to prevent runaway tag proliferation (e.g. del.icio.us), but I want to treat the tags as belonging to different groups. People tags should not be offered when the editor is doing instrument tagging, etc.
Has anyone done something like this? I have several ways I can think of to do it, but if there is an existing system that is well-done it would save me time implementing/debugging.
FWIW: This is a Django system and I'm looking at starting with Django-tagging and then hacking from there, possibly adding a category field or ...

There's an issue #14 for django-tagging filed back in 2007 which is trying to address this problem. Don't know whether developers are planning to add this feature or not.
However, there's a machinetags branch of django-tagging maintained by Gregor Müllegger at https://code.launchpad.net/~gregor-muellegger/django-tagging/machinetags/. It lets you assign namespaces (and/or values) to tags and makes it easier to query tags by namespace or value. So you'd be able to tag a piece with instrument:<instrument_name> or instrument=<instrument_name>, for example.
It's mostly in sync with the django-tagging trunk (there are a number of commits missing, though). I remember working on a project using that branch about a year ago; it worked fine. Read the documentation on the branch and the comments on the issue for more details.
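To make the namespace idea concrete, here is a minimal plain-Python sketch of how tags written as instrument:<instrument_name> or instrument=<instrument_name> could be split and grouped, so that an editor doing instrument tagging is only offered instrument tags. It does not use django-tagging or the machinetags branch; the names MachineTag, parse_tag and group_by_namespace are hypothetical, and the sample vocabulary is made up.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class MachineTag(NamedTuple):
    namespace: Optional[str]  # e.g. "instrument", or None for a plain tag
    value: str                # e.g. "lute"


def parse_tag(raw: str) -> MachineTag:
    """Split 'namespace:value' or 'namespace=value' into its two parts."""
    raw = raw.strip()
    for sep in (":", "="):
        namespace, found, value = raw.partition(sep)
        if found and namespace and value:
            return MachineTag(namespace.strip().lower(), value.strip())
    # No separator found: treat it as an ordinary, un-namespaced tag.
    return MachineTag(None, raw)


def group_by_namespace(raw_tags):
    """Map each namespace to the sorted list of values seen in it."""
    groups = defaultdict(set)
    for raw in raw_tags:
        tag = parse_tag(raw)
        groups[tag.namespace].add(tag.value)
    return {ns: sorted(values) for ns, values in groups.items()}


# Hypothetical semi-controlled vocabulary.
vocabulary = [
    "instrument:lute",
    "instrument=theorbo",
    "composer:John Dowland",
    "galliard",
]
groups = group_by_namespace(vocabulary)

# Only these would be offered while the editor is tagging instruments.
instrument_choices = groups["instrument"]
```

In a Django project the namespace would more likely live in a column on the tag model, but the grouping logic stays the same.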

Related

What is the proper way to tag pages in Google Analytics?

I don't even know if "tagging pages" is what I mean.
Essentially, I have a large education website with many types of pages. Specifically, I want to tag our program pages by faculty, level, etc. For example, the Biology program page would be tagged with Science (as its faculty), and Undergraduate (as its level). It's possible that a program could belong to multiple faculties and/or levels (Psychology, for instance, is both a Science program and an Arts program). There is nothing in the URL to signify faculty or level. The website is built in Drupal, in case you know of any modules that could facilitate this.
I want to understand how different faculties/levels/etc perform. I will be building reports in Google Data Studio.
Any guidance would be appreciated!
What you are looking for is called 'content grouping'. If the information isn't in the URL, you can define some rules that run when the page loads and pass the information to Analytics along with the pageviews.
You can find more information here:
https://support.google.com/analytics/answer/2853423?hl=en
You can then pull this information into Data Studio.
Because of your multi-value needs, nothing in GA is going to satisfy your requirements out of the box. You will have to do some post-processing, and I am not familiar enough with Data Studio to know where its limits are in that regard.
As the previous poster suggested, Content Grouping is the standard way to create custom aggregations of pages. You can have multiple content groupings, such as Faculty and Level, but a page can be in only one group per grouping (not the clearest terminology but it appears to be what Google uses).
A different option is Custom Dimensions. There are two options here. One is to create custom dimensions for Level and Faculty. Each page can still have only one value per dimension, but you could send a comma-delimited string when a department is in multiple faculties (for instance) and then pull it apart again in a spreadsheet.
The second option is to create a custom dimension for Department directly, and associate each department to the appropriate one or more faculties and levels in your reporting.
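The post-processing for both options can be sketched in a few lines of Python rather than a spreadsheet. This is only an illustration under assumptions: the rows stand in for whatever you export from GA or Data Studio, and the function names, page paths, numbers and department-to-faculty mapping are all made up.

```python
from collections import Counter

# Option 1: hypothetical exported rows of
# (page path, comma-delimited Faculty dimension, pageviews).
rows = [
    ("/programs/biology", "Science", 120),
    ("/programs/psychology", "Science,Arts", 80),
    ("/programs/history", "Arts", 50),
]


def pageviews_by_value(rows, delimiter=","):
    """Credit each row's pageviews to every value in its delimited dimension."""
    totals = Counter()
    for _page, dimension, pageviews in rows:
        for value in dimension.split(delimiter):
            value = value.strip()
            if value:
                totals[value] += pageviews
    return dict(totals)


# Option 2: hypothetical rows of (Department dimension, pageviews), with the
# department-to-faculty association kept on the reporting side.
department_rows = [("Biology", 120), ("Psychology", 80), ("History", 50)]
department_faculties = {
    "Biology": ["Science"],
    "Psychology": ["Science", "Arts"],
    "History": ["Arts"],
}


def pageviews_by_faculty(department_rows, mapping):
    """Credit each department's pageviews to all of its faculties."""
    totals = Counter()
    for department, pageviews in department_rows:
        for faculty in mapping.get(department, []):
            totals[faculty] += pageviews
    return dict(totals)


by_faculty_split = pageviews_by_value(rows)
by_faculty_lookup = pageviews_by_faculty(department_rows, department_faculties)
```

Either way, a page that belongs to two faculties is counted once for each, so the per-faculty totals add up to more than the site total.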
How you set the custom dimensions or content grouping will depend on your implementation of GA. If you are using the Google Analytics Drupal module, it lists setting custom dimensions as a supported feature. If you are using Google Tag Manager you can set the dimension value in your tags directly, though of course it will need to decide what value to set, based either on fully enumerated rules you write or on something it can read out of the page. Here is some Tag Manager documentation: Content Grouping via GTM; Custom Dimensions via GTM.
If the department is present in the page in some consistently marked-up way you can grab it; if not the Metatag module or one of its schema.org extensions might be able to provide you a spot to set a value for GTM to retrieve.

Drupal 8: How do I create a content reference/textarea compound field?

I have a company content type being used on a website-in-progress. Companies consist of a company name, name sorting mechanism, logo image and URL. I have a requirement to show companies taking part in different use cases in various parts of the website.
Challenge:
In the past I used a company content type for a "corporate members" list on a page, also assigning a "member level" field. But then companies started being listed as event sponsors, with specifics for what they were sponsoring, and also with different membership levels in a "corporate giving club". Adding all of this to the old company content type made it messy, and it confused content editors and developers.
Possible Solution:
In other content types, create compound fields such as "Sponsors", "Corporate members", "Donors", select lists with company entities, or sponsorship detail text areas with company entities. That may keep companies flexible and not messy (keeping company just a company).
Progress:
I found the Double field module, splitting core fields up into two separate parts. This is almost what I'm looking for, providing compound field sets. However, it won't allow references to be selected and is out of the scope of this project per this issue. I'm finding a lot of Drupal 7 examples, but not much Drupal 8 yet.
I'm starting to dig into the examples for developers module for inspiration and will post a solution unless you beat me to it. I'm also open to alternative ideas.
What is the best way to proceed? Specific code examples are not required. I would accept clues that lead me to a final working solution.
The solution turned out to be simpler than creating a custom module.
The Field Collection module allowed for the simple creation of compound fields containing any number and most (if not all) field types, including reference fields. It's also available as a Drupal 8 module (alpha at the time of writing, seems to work fine).

Using events as page section usage

I'm currently researching a solution to monitor the performance of specific sections of a page. For example, you have a simple page with 2 images with links to other pages. You are driving lots of traffic to this page and you are experimenting with different contents on that page.
Six months later, you want to see which section of the page performed better, and with what kind of images.
Let's imagine you require a report that should tell you the following: on average, the first spot performs better, but last week the image was bad and that's why you had less conversion from that spot.
I'd like to use such a system on a high-traffic homepage of an eCommerce website, in order to better monitor the usage of the selling spots.
I was thinking of using Google Analytics events with a positioning scheme (splitting the website into columns and rows and giving each cell an ID such as a1 for column a, row 1) and keeping a local data warehouse of creatives (images, promotions, etc.). But apparently, above 10,000,000 hits per month, Analytics recommends the premium version, which is quite pricey (12k USD per month, 1 year upfront payment).
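A rough sketch of that positioning scheme in Python, to show how the cell IDs and the local creative warehouse would fit together. Everything here is hypothetical: the weekly granularity, the file names and the click counts are made up, and the events stand in for whatever your analytics tool exports.

```python
import string
from collections import defaultdict


def cell_id(column: int, row: int) -> str:
    """Return an ID such as 'a1' for zero-based column 0 and one-based row 1."""
    return f"{string.ascii_lowercase[column]}{row}"


# Hypothetical local warehouse: which creative occupied which cell in which week.
creatives = {
    ("2024-W01", "a1"): "winter-sale.jpg",
    ("2024-W02", "a1"): "blurry-banner.jpg",
    ("2024-W01", "b1"): "new-arrivals.jpg",
    ("2024-W02", "b1"): "new-arrivals.jpg",
}

# Hypothetical exported event counts: (week, cell ID, clicks).
events = [
    ("2024-W01", "a1", 900),
    ("2024-W02", "a1", 300),
    ("2024-W01", "b1", 400),
    ("2024-W02", "b1", 420),
]


def average_clicks_by_cell(events):
    """Average weekly clicks per cell, i.e. how each spot performs overall."""
    per_cell = defaultdict(list)
    for _week, cell, clicks in events:
        per_cell[cell].append(clicks)
    return {cell: sum(values) / len(values) for cell, values in per_cell.items()}


def report(events, creatives):
    """Join each event row with the creative shown in that cell that week."""
    return [
        (week, cell, creatives.get((week, cell), "unknown"), clicks)
        for week, cell, clicks in events
    ]


averages = average_clicks_by_cell(events)
rows = report(events, creatives)
```

With this sample data the first spot averages more clicks than the second, while the joined rows show which image was in that spot during its weak week.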
I was thinking about PIWIK as an alternative, but there is no event tracking there - or am I missing anything?
Looking forward to hearing your input on this matter.
You're better off with a provider like Optimizely for this use case. Still gonna be expensive, but it'll more quickly get you the information you need to make decisions.
We normally use multivariate tests or A/B tests to measure the success of user interfaces. Google Analytics has this feature and it is free.
These links may be useful:
https://www.youtube.com/watch?v=yDWTMOC_Dp4
https://support.google.com/analytics/answer/1745147?hl=en

Scrape all google search result for a specific name

I think this question has been answered here before, but I could not find the relevant topic. I am a newbie in web scraping. I have to develop a script that will take all the Google search results for a specific name. It will then grab the related data for that name and, if more than one match is found, group the data by name.
All I know is that Google has some kind of restriction on scraping. They provide a Custom Search API. I have not used that API yet, but I am hoping to get all the result links for a query from it. However, I could not work out what the ideal process would be for scraping the information from those links. Any tutorial link or suggestion is very much appreciated.
You should have said a bit more about what you have tried; it does not sound like you have even attempted to solve it yourself.
Anyway, if you are still on it:
You can scrape Google in two ways: one is allowed, one is not.
a) Use their API; you can get around 2k results a day.
You can up it to around 3k a day for 2000 USD/year. You can up it more by getting in contact with them directly.
You will not be able to get accurate ranking positions with this method. If you only need a small number of requests and are mainly interested in getting some websites for a keyword, that's the choice.
Starting point would be here: https://code.google.com/apis/console/
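For route a), a minimal Python sketch of building paginated request URLs might look like the following. It assumes the Custom Search JSON API endpoint and its key, cx, q, start and num query parameters as I understand them, so check the current documentation before relying on it. The API key and engine ID are placeholders, build_search_urls is a hypothetical helper, and no network call is made here.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Custom Search JSON API; verify against the docs.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_urls(query, api_key, engine_id, total=30, page_size=10):
    """Build one request URL per page of results (nothing is fetched)."""
    urls = []
    for start in range(1, total + 1, page_size):
        params = {
            "key": api_key,      # placeholder API key
            "cx": engine_id,     # placeholder search engine ID
            "q": query,
            "start": start,      # 1-based index of the first result on the page
            "num": min(page_size, total - start + 1),
        }
        urls.append(f"{API_ENDPOINT}?{urlencode(params)}")
    return urls


urls = build_search_urls("John Smith", "YOUR_API_KEY", "YOUR_ENGINE_ID")
```

Each URL would then be fetched with an HTTP client of your choice, and the links in the JSON responses collected for the per-name grouping step.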
b) You can scrape the real search results
That's the only way to get the true ranking positions, for SEO purposes or to track website positions. It also lets you get a large number of results, if done right.
You can Google for code; the most advanced free (PHP) code I know is at http://scraping.compunect.com
However, there are other projects and code snippets.
You can start off at 300-500 requests per day, and this can be multiplied by using multiple IPs. Look at the linked article if you want to go that route; it explains it in more detail and is quite accurate.
That said, if you choose route b) you break Google's terms, so either do not accept them or make sure you are not detected. If Google detects you, your script will be banned by IP/captcha. Not getting detected should be a priority.

Using Yahoo! Pipes

Have you used pipes.yahoo.com to quickly and easily do... anything? I've recently created a quick mashup of StackOverflow tags (via rss) so that I can browse through new questions in fields I like to follow.
This has been around for some time, but I've just recently revisited it and I'm completely impressed with its ease of use. It's almost to the point where I could set up a pipe and then give a client privileges to go in and edit feed sources... and I didn't have to write more than a few lines of code.
So, what other practical uses can you think of for pipes?
It's nice for aggregating feeds, yes, but the other handy thing to do is filtering the feeds. A while back, I created a feed for Digg (before Digg fell into the Fark pit of despair). I didn't care about the overwhelming Apple and Ubuntu news, so I filtered those keywords out of Technology, which I then combined with Science and World & Business feeds.
Anyway, you can do a lot more than just combine things. If you wanted to be smart about it, you could set up per-subfeed and whole-feed filters to give granular or over-arching filtering abilities as the news changes and you get bored with one topic or another.
The one thing I have really used Y! Pipes for (rather than just playing around with it) is to clean up item titles, merge and finally de-dupe the feeds I got from querying multiple blog search engines with the same search term. This is something I’ve done in several very different contexts, e.g. for my own ego surfing, in another case for the planet site set up by some conference’s organisers to keep an eye on their conference’s buzz, etc. Highly recommended.
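For anyone who wants to see the same steps outside of Pipes, here is a small Python sketch of the clean-up, merge, keyword-filter and de-dupe chain described in the answers above. It is only an illustration: the feeds are hard-coded lists of hypothetical items rather than parsed RSS, and all function names are made up.

```python
import re


def clean_title(title):
    """Collapse runs of whitespace so near-identical titles compare equal."""
    return re.sub(r"\s+", " ", title).strip()


def merge_feeds(*feeds):
    """Concatenate several feeds into one list, cleaning titles on the way."""
    merged = []
    for feed in feeds:
        for item in feed:
            merged.append({**item, "title": clean_title(item["title"])})
    return merged


def drop_keywords(items, blocked):
    """Remove items whose title contains any blocked keyword."""
    blocked = [word.lower() for word in blocked]
    return [
        item for item in items
        if not any(word in item["title"].lower() for word in blocked)
    ]


def dedupe(items):
    """Keep the first item seen for each link."""
    seen = set()
    unique = []
    for item in items:
        if item["link"] not in seen:
            seen.add(item["link"])
            unique.append(item)
    return unique


# Hypothetical feed items.
technology = [
    {"title": "Apple  ships new laptop", "link": "http://example.com/1"},
    {"title": "New compiler released", "link": "http://example.com/2"},
]
science = [
    {"title": "New compiler   released", "link": "http://example.com/2"},
    {"title": "Probe reaches Mars", "link": "http://example.com/3"},
]

combined = dedupe(drop_keywords(merge_feeds(technology, science), ["apple", "ubuntu"]))
titles = [item["title"] for item in combined]
```

De-duping on the link is a simplification; feeds from different search engines may point at the same post through different URLs, in which case the cleaned title is a better key.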
You can do tons of things with pipes. For example for sites like digg or reddit, you can make one to bypass the site and go directly to the linked article (rewriting the RSS).
I also like to filter webcomics' feeds to keep just the comics, and then mix them all into a single feed.
I've taken the liberty of copying your pipe and rearranging it a bit so that it's easier to add and remove tags:
Yahoo Pipe: StackOverflow Merge Tags
Tags are now listed in a string builder, so to add a tag you just have to hit the + button on the string builder and type in the tag preceded by a slash.
Well, pipes are real fast and useful.
Other effective uses might be:
1) combine many feeds into one, then sort, filter and translate it.
2) geocode your favorite feeds and browse the items on an interactive map.
3) power widgets/badges on your web site.
4) grab the output of any Pipes as RSS, JSON, KML, and other formats.
This is by no means a comprehensive list.
One of my favorite things to do with Yahoo! Pipes is to aggregate multiple craigslist feeds into a single feed. You can make a feed out of any category or search criteria on craigslist. I live in a university town and am always on the lookout for tickets to sporting events, for example. I have a half-dozen craigslist searches all being combined into a single feed via Yahoo! Pipes. This works a lot better for me than simply monitoring the entire "Tickets" category; filters out most of the tickets I am not interested in. Yes, this is another aggregating feeds example, but the craigslist usage is quite valuable with the ability to aggregate feeds that are themselves based upon searches.
I've used Pipes to translate blogs into English. I would have liked to use it to fetch the full text for blogs which only provide a summary of the content in the feed, but unfortunately they don't provide any input which fetches the content from a parameterizable source :-(.
Just stumbled on this while looking for ways to connect Excel to Pipes. A bit necromancer-ish, but here goes.
One thing I've done is take an HTML page (science data) which has links to tons of CSV files for a bunch of Army Corps measurement stations. Each station has a big table of data files, all organized individually by month and year. I use YQL to parse out and organize the links to the individual CSV files in a way that Pipes can read them. Then, I use that as input into a Pipe, which has a user input for "Station" and "Date."
Using this, I can go to the Pipes page, type in those values and get the values only for a specific station and date, rather than have to find the station on a website, find the year and month in a big table, click the link, open the CSV file, and find the values for a day within that month's worth of data. I can even change the pipe to specify the hour, and the parameter, and then get a single value returned.
Now, I wish I could figure out how to program Excel so that I can use "=yahoo_function(station, datetime)" to place that value automatically into a cell given the values of other columns!

Resources