How to visually design a mashup query for programmatic extraction - RSS

I'm developing an application that fetches various inputs from internet pages, where each information snippet comes from a different location (a mashup).
I would like to generate the mashup building blocks (snippets) through a visual tool.
Do you know of anything suitable for such a project? (An already-made control, sample code, an article, etc.)
Preferred development environment is .NET - but not mandatory.

IMO the major challenge will be to extract the appropriate information from each feed in semantic form. Wikipedia describes mashups as:
There are many types of mashups, such as consumer mashups, data mashups, and enterprise mashups. The most common type of mashup is the consumer mashup, aimed at the general public.
Data mashups combine similar types of media and information from multiple sources into a single representation. One example is AlertMap, which combines data from over 200 sources related to severe weather conditions, biohazard threats, and seismic information, and displays them on a map of the world; another is Chicago Crime Map, which indicates the crime rate and location of crime in Chicago.
The classic mashup - Chicago crime - works because key information such as dates and geolocations is available semantically. Other common types of information are persons, organisations, and domain-specific identifiers.
When you have identified these, you may wish to consider the RDF-based tools that the semantic web community is developing. Note that governments are starting to publish their data in RDF, so I would see this as a key technology.
If your web pages do not expose semantic information directly, you will probably have to write screen scrapers and HTML parsers. That's not very glamorous; there are no special tools, and it tends to be just hard work.
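As a rough illustration of the RDF route, here is a minimal sketch in Python using the rdflib library; the data URL and the properties queried are placeholders, not a specific government feed:
    # Minimal sketch: load an RDF graph and run a SPARQL query over it.
    # The source URL and the exact vocabulary are hypothetical placeholders.
    from rdflib import Graph

    g = Graph()
    g.parse("http://example.org/crime-data.rdf")  # hypothetical RDF data set

    # Ask for anything that carries a date and a geographic position.
    query = """
        PREFIX dc:  <http://purl.org/dc/elements/1.1/>
        PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
        SELECT ?item ?date ?lat ?long
        WHERE {
            ?item dc:date ?date ;
                  geo:lat ?lat ;
                  geo:long ?long .
        }
    """
    for row in g.query(query):
        print(row.item, row.date, row.lat, row.long)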
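For the scraping route, a minimal sketch in Python with requests and BeautifulSoup; the URL and the CSS selectors are hypothetical and would have to be tailored to each target site:
    # Minimal scraping sketch: fetch a page and pull out date/location snippets.
    # The URL and selectors are invented; every target site needs its own.
    import requests
    from bs4 import BeautifulSoup

    resp = requests.get("http://example.org/incidents", timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    snippets = []
    for row in soup.select("div.incident"):          # hypothetical markup
        date = row.select_one("span.date")
        place = row.select_one("span.location")
        snippets.append({
            "date": date.get_text(strip=True) if date else None,
            "location": place.get_text(strip=True) if place else None,
        })

    print(snippets)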

Related

Intent Data - How exactly are traceable URLs used to track interest in B2B topics?

I've been doing some research on intent data and I have some technical questions, especially about how two businesses might be collecting "contact level" (i.e. personally identified) web traffic details without using third-party cookies.
Some quick background: most of the large providers of intent data (Bombora, The Big Willow/Aberdeen/Spiceworks Ziff Davis, TechTarget, etc.) offer "account"-based intent data - essentially, when users visit websites in their network, they do a reverse IP lookup, match the addresses to known IP addresses of large companies (usually companies with at least 250 employees), and note which topics are "surging", i.e. showing unusual traffic in a given week. This largely makes sense to me.
I'm assuming that when a visitor shows up at your site, Google Analytics and similar tools can tell you what Google search keywords were used to arrive at your site, and that's how they can say things like: we can "observe intent signals across an unlimited number of contextual keyword categories, allowing you to customize your keywords and layer these insights onto your campaigns for optimal performance." Third-party cookies and data from DSPs (demand-side platforms, which enable ad buyers to buy ads across many platforms) are also involved in providing data, though these will be less useful sources once Google sunsets third-party cookies in Chrome.
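Just to make the account-level matching step concrete, a sketch in Python of mapping a visitor IP to a known company network; the company/IP-range table is invented, and real vendors license commercial IP-to-company databases:
    # Sketch: map a visitor IP to a known company network (account-level matching).
    # The ranges below are documentation addresses, invented for illustration.
    import ipaddress

    COMPANY_NETWORKS = {
        "Acme Corp":  ipaddress.ip_network("203.0.113.0/24"),
        "Globex Inc": ipaddress.ip_network("198.51.100.0/24"),
    }

    def company_for_ip(visitor_ip):
        ip = ipaddress.ip_address(visitor_ip)
        for company, network in COMPANY_NETWORKS.items():
            if ip in network:
                return company
        return None  # unknown network / consumer ISP

    print(company_for_ip("203.0.113.42"))   # -> "Acme Corp"
    print(company_for_ip("192.0.2.7"))      # -> None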
Two providers - intentdata.io and intentflow.com - are offering contact-level intent data. You can imagine why that would be of interest: if a director of sales is interested in your sales SaaS tool, you have a better idea of how qualified that lead is and whom to reach out to. Only one of the two providers is specific about what exactly they're collecting - i.e. what "intent" they are capturing and how they're collecting it.
Intentdata.io:
Intentdata.io looks like a tiny company (two employees on LinkedIn). The most specific description I've found of their data was in an Impact+ podcast interview - Ed, the CRO at intentdata.io, mentions that the data is analogous to commenting on a Forbes article or a conversation on LinkedIn. But he's clear: "that's just an analogy." They also say elsewhere that the data they provide specifies exactly what action the contact took that landed them in the data.
Ed from intentdata.io is also asked about GDPR compliance in his Impact+ interview - he basically says some lawyers will disagree, but he believes their data to be GDPR compliant, and it is in use by some firms in the EU. He does mention, though, that some firms have asked them to exclude certain columns from the data, such as email addresses.
Edit: Found a bit more on intentdata.io - it looks like they build a custom setup to pull "intent" data for each customer. They don't have a database monitoring companies' interactions with content across social media and B2B sites; instead you provide them with "lists (names and URLs) of customers, competitors, influencers, events, target accounts and key terms that would indicate intent at different stages in the buying journey. Pull together important hashtags, details on your ideal buyer (job titles, functions, seniority) and firmographics (size, industry, location)" - then they create a custom "algorithm" from this info, and they iterate on that "algorithm" a little over time.
They also make this statement on their site: "IntentData.io's data is collected from observing public actions that users are taking around the web. That means that first, we observe action (not reading, searching, browsing, being shown an ad, etc.) which we believe is a more concrete manifestation of intent. Second, people are taking these actions publicly for the world to see. We do not use any cookies, bidstream data or reverse IP lookups."
Finally, one piece of their sales collateral asks: what ad budget do you have for PPC nurturing ads? So there may be some targeted PPC ads involved in the "algorithm."
Edit 2: Their sales collateral also states that they use "a third-party intent data methodology that uses multi-variable linear regression analysis to correlate observed actions with a specific contact. This is the method that the LeadSift engine of IntentData.io data uses."
Intentflow.com:
Intentflow.com seems like the sketchier of the two providers, if I'm honest. They provide a video walkthrough of how they get their data at intentflow.com/thesis - but I'm not following how using "traceable URLs", with no cookies involved, could give you contact-level information. They also say they look up the most popular articles/pages for 5k to 40k unique keywords or phrases related to the 10-50 keywords or phrases you give them to target, and they use "traceable URLs" to track who visits those sites. Again - no cookies involved. Supposedly fully compliant, at least with US laws. They don't provide data for the EU "by design", so presumably they're not GDPR compliant? They also claim they can identify the individuals who are visiting your website, again using "traceable URLs" - it seems clear from the pitch that you're asked to reach out to your backlink providers around the web to use this traceable URL.
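Purely as a guess at the mechanics, a cookie-less "traceable URL" could work along these lines (Python sketch; the tokens, contacts and URLs are invented, and this is not a description of intentflow.com's actual system):
    # Speculative sketch: a per-contact token appended to links seeded around the
    # web; the landing server reads the token back from its logs, no cookies used.
    # Everything here (tokens, contacts, URLs) is invented for illustration.
    import uuid
    from urllib.parse import urlencode

    contacts = {}  # token -> known contact details

    def traceable_url(base_url, contact):
        token = uuid.uuid4().hex
        contacts[token] = contact
        return f"{base_url}?{urlencode({'ref': token})}"

    # A link placed in an article, an email, or a backlink:
    url = traceable_url("https://example.com/whitepaper",
                        {"name": "Jane Doe", "title": "Director of Sales"})
    print(url)

    # Later, when the landing page is hit, the server maps the token to a contact:
    def identify_visitor(ref_token):
        return contacts.get(ref_token)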
I've seen an interview where a rep from Bombora says they tried for a while to do contact level intent data and it wasn't very useful - and it wasn't really doable in a compliant way. Ed seems to be aware they've said that publicly, and he says "that's just not true."
So what's going on here? How exactly are these two small firms getting contact level intent data? Do you think they're doing it in a compliant way?
Got more information:
Intentdata.io uses public comments, likes, shares, etc. on blogs and social posts, gathered via web crawling and scraping of the events, influencers, hashtags, articles, etc. that the customer deems worth tracking. They do some work to try to connect the commenters with an identifiable contact. They bill on a quarterly basis for this.
Intentflow.com doesn't seem to use "traceable URLs" at all. They take bidstream data and identify the individual visitors via an "identity graph." They provide a minimum of 5k contacts per month at $2 per contact, making their data very expensive ($120k+ per year). You can't go lower than however many contacts their system spits out per month, so it seems like there's no firm limit on what you will be charged. They say they can identify ~70% of web traffic, and they only provide data on US site visitors. Each row of their output would include not just the contact but also the site on which that contact was shown an ad. Definitely interesting data - but I'm guessing they will be heavily affected by the upcoming changes to third-party cookies, privacy laws, etc.

Does aggregation/reporting of scoring across distinct but related SCORM packages usually fall under the remit of the containing LMS?

I am designing a SCORM 1.2-based e-learning solution for a client who manages their existing training courses via the SABA LMS. I am considering providing each section of the course as a separate SCO with its own score tracking.
I am wondering whether it is usually a function of the LMS to aggregate tracking scores across distinct SCOs for a user, or whether I should be creating a multi-SCO package which aggregates the scores of each of its child SCOs.
I'd say it's a mix: some LMSs will do some aggregation of scores and some won't.
It's common practice for content vendors to deliver a complete block of content as a single package and often as a single SCO so that they can control the look and feel of the navigation. This means they often just present an aggregated score to the LMS.
It is possible to do more complex things with SCORM 2004, including multi-SCO packages that include their own navigation menus, but this is not commonly done.
Here are some statistics on which features of SCORM are commonly used.
I would say "yes". The vendor who publishes the content needs to configure each section of the course as a separate SCO so that they can be tracked separately in the LMS.
A SCORM 1.2 course can be delivered as multiple SCOs by using the parameters attribute of the item element in the manifest file. This can also be achieved by using a different start page for each SCO rather than one start page that is common to the entire course.
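To make the separate-start-page approach concrete, a rough sketch (Python, writing a stripped-down imsmanifest.xml with one item and resource per SCO); the identifiers and file names are placeholders, and a real SCORM 1.2 manifest also needs the full schema and metadata declarations:
    # Sketch: a stripped-down SCORM 1.2 imsmanifest.xml with two separately
    # tracked SCOs, each with its own start page. Placeholder identifiers only.
    import xml.etree.ElementTree as ET

    manifest = ET.Element("manifest", identifier="com.example.course")
    orgs = ET.SubElement(manifest, "organizations", default="ORG-1")
    org = ET.SubElement(orgs, "organization", identifier="ORG-1")
    ET.SubElement(org, "title").text = "Example Course"
    resources = ET.SubElement(manifest, "resources")

    # One item + resource per SCO, so the LMS tracks each section's score.
    sections = [
        ("ITEM-1", "SCO-1", "Section 1", "section1/index.html"),
        ("ITEM-2", "SCO-2", "Section 2", "section2/index.html"),
    ]
    for item_id, res_id, title, href in sections:
        item = ET.SubElement(org, "item", identifier=item_id, identifierref=res_id)
        ET.SubElement(item, "title").text = title
        ET.SubElement(resources, "resource",
                      {"identifier": res_id, "type": "webcontent",
                       "adlcp:scormtype": "sco", "href": href})

    ET.ElementTree(manifest).write("imsmanifest.xml",
                                   encoding="utf-8", xml_declaration=True)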

Insta-like filters on Google Street View imagery

Is it permissible to apply real-time Insta-like filters to Google Street View imagery, retrieved via the Google Street View API, that will be published online as part of a slideshow in a web app?
No Google Street View content will be cached on a third-party server in the process.
Do you think this would be in alignment with Google's terms?
Street View seems to fall under the Google Maps umbrella as far as the API TOS goes. The document isn't that long, but your use case doesn't seem to be in direct violation. I would still be wary of something such as Section 8.3 (a):
Content (including but not limited to map data, traffic, directions, and places) is provided for planning purposes only. You may find that weather conditions, construction projects, closures, or other events may cause road conditions or directions to differ from the results depicted in the Content. You should exercise judgment in your use of the Content.
Although that reads more like a caveat emptor warning, absolving them of liability if their data is inaccurate and tells you to turn left off a cliff. You should really read the actual terms: there are a lot of little things they seem to be worried about in there, and I would definitely recommend having an attorney help you navigate what's going on.
I'd hazard that you'd be fine, especially with no caching, but you should really consult an attorney and read the terms.
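On the purely technical side, the fetch-and-filter step is straightforward; a rough sketch in Python with requests and Pillow against the Street View Static API (the coordinates and the crude sepia filter are just examples, YOUR_API_KEY is a placeholder, and none of this answers the terms-of-service question above):
    # Sketch: fetch a Street View Static API image and apply a simple sepia tint.
    # YOUR_API_KEY is a placeholder; republishing the result is a ToS question.
    from io import BytesIO

    import requests
    from PIL import Image, ImageOps

    params = {
        "size": "640x400",
        "location": "48.8584,2.2945",   # example coordinates
        "key": "YOUR_API_KEY",
    }
    resp = requests.get("https://maps.googleapis.com/maps/api/streetview",
                        params=params, timeout=10)
    resp.raise_for_status()

    img = Image.open(BytesIO(resp.content)).convert("RGB")
    # A crude "insta-like" look: desaturate, then re-tint towards sepia.
    sepia = ImageOps.colorize(ImageOps.grayscale(img),
                              black="#2e1f0f", white="#f5e9d0")
    sepia.show()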

Travel APIs - how to integrate them all?

I may start working on a project very similar to Hipmunk.com, which pulls hotel cost information by calling different APIs (Expedia, Orbitz, Travelocity, Hotels.com, etc.).
I did some research on this, but I am not able to find any unique hotel ID or any field to match hotels between the several APIs. Does anyone have experience with how to match a hotel from Expedia with the same hotel from Orbitz or Travelocity, etc.?
Thanks
EDIT: Google is also doing the same thing: http://www.google.com/hotelfinder/
From what I have seen of GDS systems and these APIs, there is rarely a unique identifier shared between systems for, e.g., hotels.
Airports, airlines and countries have unique ISO identifiers: http://www.iso-code.com/airports.2.html
I would guess you are going to have to build your own internal mapping to identify and disambiguate the properties.
:|
When you get started with hotel APIs, the choice of free ones isn't really that big, see e.g. here for an overview.
The most extensive and accessible one is Expedia's EAN (http://developer.ean.com/), which includes Sabre and Venere with unique IDs, but each is still structured differently.
That is, you are effectively looking into different database tables.
You do get several identifiers such as name, address, and coordinates, which can serve for unique identification, assuming they are free of errors. Which is an assumption.
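As a rough illustration of that kind of internal mapping, a Python sketch that treats two records as the same property when their normalized names are similar and their coordinates are close; the sample records and thresholds are invented, and real matching also needs address normalization and manual review:
    # Sketch: match hotels across providers by normalized name + coordinate distance.
    # Sample records and thresholds are invented for illustration.
    import math
    import re
    from difflib import SequenceMatcher

    def normalize(name):
        return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()

    def distance_km(lat1, lon1, lat2, lon2):
        # Haversine distance between two lat/long points.
        r = 6371.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp = math.radians(lat2 - lat1)
        dl = math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def same_property(a, b):
        name_score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
        close = distance_km(a["lat"], a["lon"], b["lat"], b["lon"]) < 0.15
        return name_score > 0.85 and close

    expedia = {"name": "Grand Hotel Paris", "lat": 48.8721, "lon": 2.3317}
    orbitz = {"name": "Grand Hôtel, Paris", "lat": 48.8722, "lon": 2.3315}
    print(same_property(expedia, orbitz))   # True under these thresholds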

Does anyone know of a service/DB I can use for businesses and geocodes?

I was wondering how and where companies like Foursquare/Gowalla find their lists of locations/businesses and keep them up to date.
Is it a web service? Do they buy a directory and enter it into a database?
This is from a comment I found at http://www.quora.com/Where-or-how-does-a-company-like-Foursquare-get-a-directory-of-all-locations-and-their-addresses
Companies usually get place data from one of the following:
Data licenses: Companies like Localeze, InfoUSA, Amacai, etc. license location data. Big players like TeleAtlas and Navteq serve as global aggregators of this data. There are also lots of small niche players that license e.g. restaurant data only, or ATM data only, on a per-country basis.
Crowd sourcing: Some companies crowd-source their data.
Open data sets: There are some data sets with a Creative Commons or other license from which location-related data can be extracted, e.g. GeoCommons and Wikipedia.
APIs: A number of companies provide APIs by which you can access data on the fly. These include GeoAPI.com, Google, Yelp, and others.
In general, this data is fragmented both in type (e.g. POI vs neighborhood or geocode) and place (US vs UK vs South Africa vs Wherever)
Google has a geocoding service that's freely available for personal use.
For business use it costs a few dollars, but it's still pretty reasonable.
And the API is pretty straightforward:
http://code.google.com/apis/maps/documentation/javascript/v2/services.html
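For reference, calling Google's geocoding web service looks roughly like this (Python sketch; YOUR_API_KEY is a placeholder, and quota/pricing depend on Google's current terms):
    # Sketch: geocode a business address with the Google Geocoding web service.
    # YOUR_API_KEY is a placeholder; pricing and quota depend on your account.
    import requests

    params = {
        "address": "1600 Amphitheatre Parkway, Mountain View, CA",
        "key": "YOUR_API_KEY",
    }
    resp = requests.get("https://maps.googleapis.com/maps/api/geocode/json",
                        params=params, timeout=10)
    resp.raise_for_status()
    data = resp.json()

    if data["status"] == "OK":
        loc = data["results"][0]["geometry"]["location"]
        print(loc["lat"], loc["lng"])
    else:
        print("Geocoding failed:", data["status"])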
