GDPR and Google Translate - google-translate

Is there any detailed information on Google Translate and the GDPR?
In my opinion, translating personal data with the Google Translate widget is a big issue here, especially if you run an online store and the user translates pages while checking out (e.g. the last checkout step, where all the personal data, including cart positions, billing information and user contact information, is present).
There is a way to exclude parts of the website from being translated (adding the "notranslate" class attribute), but I assume the data itself is sent to Google Translate servers anyway?
Looking forward to an answer.
Best regards,
Andrea

Collect some statistics about your visitors and implement localization for major languages they use.
This will probably prevent users from using google translate for your checkout process.
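One way to collect such statistics is to count the preferred language in the Accept-Language header of incoming requests. The following is only a minimal sketch: the function names and the sample headers are made up for illustration, and in practice the headers would come from your own web server or shop logs.

```python
from collections import Counter
from typing import Optional


def primary_language(accept_language: str) -> Optional[str]:
    """Return the primary language subtag with the highest q-value, or None."""
    best_tag, best_q = None, -1.0
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        tag = tag.strip().lower()
        if not tag or tag == "*":
            continue
        q = 1.0  # a language without an explicit q-value has weight 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if q > best_q:
            # keep only the primary subtag, e.g. "de" for "de-DE"
            best_tag, best_q = tag.split("-")[0], q
    return best_tag


def language_stats(headers):
    """Count the preferred primary language over many header values."""
    counts = Counter()
    for header in headers:
        lang = primary_language(header)
        if lang is not None:
            counts[lang] += 1
    return counts


# Hypothetical sample data standing in for real request headers.
sample_headers = [
    "de-DE,de;q=0.9,en;q=0.8",
    "en-US,en;q=0.9",
    "fr;q=0.7,de;q=0.9",
    "de",
]
stats = language_stats(sample_headers)
print(stats.most_common())
```

The languages at the top of that list are the candidates for a proper localization of the checkout pages.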


Firebase - Machine Learning and interest tracking to create an algorithm for sorting posts

One of my applications includes user-generated posts and functions in a similar way to Instagram. When a user opens the app they see a feed of posts sorted by date. This works when there is just one small demographic using the app, but as the user base becomes more diverse, not everyone is interested in the same posts. This is why apps like TikTok and Instagram have algorithms to decide which posts to show to a user. Where do I even start with this? I understand that there need to be tags on each post for what it is about (this is where I think I can use machine learning), and then each user's information needs to include their interests (I'm not sure what can be used to change these as they like or dislike posts). Is there a simple pre-built way of doing this, or are there any examples? It seems to be a pretty big secret that mostly big tech companies understand and use.
You could use Google's Cloud Vision API for images (https://cloud.google.com/vision) and the Video Intelligence API for videos (https://cloud.google.com/video-intelligence/docs).
The Video Intelligence API could handle images too, from a byte stream.
Build a Firebase function that analyses posted media with these APIs.
Build the rest of the logic from here. Find a way to detect each user's interests from posts, and save those interests.
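The "rest of the logic" could start as simply as a per-user score for each tag. This is a rough sketch under assumptions, not a recommendation algorithm used by any of the apps mentioned: the tags are assumed to come from the media analysis step, and the function names, the like/dislike step size and the post structure are all hypothetical.

```python
from collections import defaultdict


def update_interests(interests, post_tags, liked, step=1.0):
    """Raise the score of each tag on a liked post, lower it on a disliked one."""
    delta = step if liked else -step
    for tag in post_tags:
        interests[tag] += delta
    return interests


def score_post(interests, post_tags):
    """Sum the user's scores for the tags on a post; unknown tags count as 0."""
    return sum(interests.get(tag, 0.0) for tag in post_tags)


def rank_feed(interests, posts):
    """Order posts by score, highest first.

    sorted() is stable, so posts with equal scores keep their incoming
    order (for example, the date order the feed already has).
    """
    return sorted(
        posts,
        key=lambda post: score_post(interests, post["tags"]),
        reverse=True,
    )


# Hypothetical user activity and posts.
interests = defaultdict(float)
update_interests(interests, ["dog", "outdoors"], liked=True)
update_interests(interests, ["dog"], liked=True)
update_interests(interests, ["cars"], liked=False)

posts = [
    {"id": "a", "tags": ["cars"]},
    {"id": "b", "tags": ["dog", "beach"]},
    {"id": "c", "tags": ["outdoors"]},
]
ranked_ids = [post["id"] for post in rank_feed(interests, posts)]
print(ranked_ids)
```

The interest scores would be stored per user alongside their other data, and updated whenever they like or dislike a post.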

Import.io - Can it replace Kimonolabs

I use Kimonolabs right now for scraping data from websites that have the same goal. To make it easy, let's say these websites are online shops selling stuff online (actually they are job websites with online application possibilities, but technically they look a lot like a webshop).
This works great. For each website a scraper API is created that goes through the available advanced search page to crawl all product URLs. Let's call this API the 'URL list'. Then a 'product API' is created for the product detail page that scrapes all necessary elements, e.g. the title, product text and specs like the brand, category, etc. The product API is set to crawl daily using all the URLs gathered in the 'URL list'.
Then the gathered information for all products is fetched by our own service through the Kimonolabs JSON endpoint.
However, Kimonolabs will quit its service at the end of February 2016 :-(. So, I'm looking for an easy alternative. I've been looking at import.io, but I'm wondering:
Does it support automatic updates (letting the API scrape hourly/daily/etc.)?
Does it support fetching all product URLs from a paginated advanced search page?
I'm tinkering around with the service. Basically, it seems to extract data via the same easy process as Kimonolabs. Only, it's unclear to me whether paginating the URLs necessary for the product API and automatically keeping it up to date are supported.
Are there any import.io users here who can advise whether import.io is a useful alternative for this? Maybe even give some pointers in the right direction?
Look into Portia. It's an open source visual scraping tool that works like Kimono.
Portia is also available as a service and it fulfills the requirements you have for import.io:
automatic updates, by scheduling periodic jobs to crawl the pages you want, keeping your data up-to-date.
navigation through pagination links, based on URL patterns that you can define.
Full disclosure: I work at Scrapinghub, the lead maintainer of Portia.
Maybe you want to give Extracty a try. It's a free web scraping tool that allows you to create endpoints that extract any information and return it in JSON. It can easily handle paginated searches.
If you know a bit of JS you can write CasperJS endpoints and integrate any logic that you need to extract your data. It has a similar goal to Kimonolabs and can solve the same problems (if not more, since it's programmable).
If Extracty does not solve your needs you can check out these other market players that aim for similar goals:
Import.io (as you already mentioned)
Mozenda
Cloudscrape
TrooclickAPI
FiveFilters
Disclaimer: I am a co-founder of the company behind Extracty.
I'm not that fond of Import.io, but it seems to me that it allows pagination through bulk input URLs. Read here.
So far there has not been much progress in getting a whole website through the API:
Chain more than one API/Dataset: It is currently not possible to fully automate the extraction of a whole website with Chain API.
For example if I want data that is found within category pages or paginated lists. I first have to create a list of URLs, run Bulk Extract, save the result as an import data set, and then chain it to another Extractor. Once set up once, I would like to be able to do this in one click more automatically.
P.S. If you are somewhat familiar with JS you might find this useful.
Regarding automatic updates:
This is a beta feature right now. I'm testing it myself after migrating from Kimonolabs. You can enable it for your own APIs by appending &bulkSchedule=1 to your API URL. Then you will see a "Schedule" tab. In the "Configure" tab, select "Bulk Extract" and add your URLs; after this, the scheduler will run daily or weekly.

Is there a built-in alternative to Google Analytics in Liferay

I'm looking for a portlet-like solution that would collect and report usage analytics in Liferay... but Google Analytics is not an option, unfortunately.
Stats by community, group, session tracking, apart from the usual bounce and exit rates, referrals, origin, etc. I know I'm kind of asking for a reinvention of the wheel, but there is plenty of usage data that can be collected by Liferay that Google can't collect. I've already checked Piwik, and it looks very impressive.
Any suggestions? TIA,
As of 2015 there is the Audience Targeting plugin, which (at least for Liferay 6.2) comes bundled with the analytics-api / analytics-hook modules, which collect some useful analytics data. Keep in mind, though:
So far there doesn't seem to be any standalone use for them, as they were introduced, I believe, to enable the content-visited, page-visited and similar rules in Audience Targeting itself; you can't see the raw events in any of the provided portlets.
The events are stored as rows in a SQL database, so I would be concerned about its performance in the long run (with thousands of clicks every minute etc.), although I say this purely theoretically, as I haven't done any tests myself nor checked whether any performance-enhancing measures are implemented.
What you can do, however, is to put together your own portlet which would create some graphs etc. based on the data stored in CT_Analytics_AnalyticsEvent table.
Right now, I don't think there is any out-of-the-box feature available for this; you might need to create it yourself. There are two options:
1) You need to create a JavaScript library if you need real-time/web analysis (this is the same as creating a Google Analytics-style library).
2) This option is quite easy. Liferay stores everything in the database, so you can have a report portlet which shows a report based on that data. We did this for one project where we were tracking the session IDs/IPs and logged-in user details for portlets.
To achieve point 2) you can create a new Liferay service, which will be used to store and retrieve this data.
Hope this helps
You already mention Piwik, which is similar to Google Analytics. You probably have your own theme (almost everybody changes the appearance to look like their own site), and it's quite appropriate to place the relevant Piwik stats snippet in there.
You can also, as Felix suggests, mine your log files. Liferay stores some data, and your web server access logs are also well worth mining. And, of course, you can change your theme to log even more for every page access; just take care that you don't create a performance bottleneck by writing too much during one page request.
So, coming back to your question: Built-in like Google Analytics: No. Easily integrable (like Piwik): Yes, of course. Completely customizable: Yes, of course.
Edit: It just happens that David has created and documented an integration that makes using Piwik even easier
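As a rough illustration of mining the web server access logs mentioned above, the sketch below counts successful page requests per path. It assumes the logs are in the Common Log Format that Apache and many other servers can write; the regular expression, the function name and the sample lines are illustrative only and would need adapting to your actual log configuration.

```python
import re
from collections import Counter

# Assumed layout: Common Log Format, e.g.
# 10.0.0.1 - - [10/Oct/2015:13:55:36 +0200] "GET /path HTTP/1.1" 200 2326
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def page_hits(lines):
    """Count successful GET requests per path, ignoring query strings."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not look like access log entries
        if match.group("method") == "GET" and match.group("status").startswith("2"):
            path = match.group("path").split("?", 1)[0]
            hits[path] += 1
    return hits


# Hypothetical sample lines standing in for a real access log file.
sample = [
    '10.0.0.1 - - [10/Oct/2015:13:55:36 +0200] "GET /web/guest/home HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2015:13:55:40 +0200] "GET /web/guest/home?p=1 HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2015:13:56:01 +0200] "GET /web/guest/missing HTTP/1.1" 404 512',
    "not a log line",
]
hits = page_hits(sample)
print(hits.most_common())
```

The same loop can be fed from an open log file instead of a list, since it only iterates over lines.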

Access to old, no longer available, feed entries

I am working on a project that requires reliable access to historic feed entries which are not necessarily available in the current feed of the website. I have found several ways to access such data, but none of them give me all the characteristics I need.
Look at this as a brainstorm. I will tell you what I have found so far, and you can contribute if you have any other ideas.
Google AJAX Feed API - will limit you to 250 items
Unofficial Google Reader API - Perfect but unofficial and therefore unreliable (and perhaps quasi-illegal?). Also, the authentication seems to be tricky.
Spinn3r - Costs a lot of money
Spidering the internet archive at the site of the feed - Lots of complexity, spotty coverage, only useful as a last resort
Yahoo! Feed API or Yahoo! Search BOSS - The first looks more like an aggregator, meaning I'd need a different registration for each feed and the second should give more access to Yahoo's data but I can find no mention of feeds.
(thanks to Lou Franco) Bloglines Sync API - Besides the problem of needing an account and being designed more as an aggregator, it does not have a way to add feeds to the account. So no retrieval of arbitrary feeds. You need to manually add them through the reader first.
Other search engines/blog search/whatever?
This is a really irritating problem as we are talking about semantic information that was once out there, is still (usually) valid, yet is difficult to access reliably, freely and without limits. Anybody know any alternative sources for feed entry goodness?
Bloglines has an API to sync accounts:
http://www.bloglines.com/services/api/sync
You have to make an account and subscribe to the feed you want to download, but then you can download based on date, which can be way in the past. Not sure of the terms.
The best answer I've found so far is this: Google Reader's unofficial API turns out to have a public access point for its feeds, which means there is no authentication needed. Usage is as follows:
http://www.google.com/reader/public/atom/feed/{your feed uri here}?n=1000
Replace the text in the squigglies (including the squigglies themselves) with the feed URI you're interested in. More information about the precise arguments can be found here:
http://blog.martindoms.com/2009/10/16/using-the-google-reader-api-part-2/
But remember to use the /public/ URL if you don't want to mess with authentication.
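Building that URL for an arbitrary feed can be sketched as follows. The helper name is made up, and percent-encoding the feed URI is an assumption on my part (the answer only says to substitute the URI). The sketch only constructs the URL and does not perform a request.

```python
from urllib.parse import quote

# Base of the public access point quoted in the answer above.
READER_PUBLIC_BASE = "http://www.google.com/reader/public/atom/feed/"


def reader_public_url(feed_uri, count=1000):
    """Build the public feed URL for feed_uri, asking for `count` entries.

    Assumption: the feed URI is percent-encoded so that its own slashes
    and colon do not get mixed up with the surrounding URL.
    """
    return f"{READER_PUBLIC_BASE}{quote(feed_uri, safe='')}?n={count}"


# Hypothetical feed address used purely for illustration.
url = reader_public_url("http://example.com/feed.xml", count=500)
print(url)
```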

RSS/Atom for professional use

I wondered if anyone can give an example of a professional use of RSS/Atom feeds in a company product. Does anyone use feeds for other things than updating news?
For example, did you create a product that gives results as RSS/Atom feeds? Like price listings or current inventory, or maybe dates of training lessons?
Or am I thinking about use cases for RSS/Atom feeds in the wrong way anyway?
Edit: @abyx has a really good example of a somewhat unexpected use of RSS as a way to get debug information from program transactions. I like the idea of this process. This is the type of use I was thinking of, besides publishing search results or last changes (like MediaWiki).
Some of my team's new systems generate RSS feeds that the developers syndicate.
These feeds push out events that interest the developers at certain times and the information is controlled using different loggers. Thus when debugging you can get the debugging feed, when you want to see completed transactions you go to the transactions feeds etc.
This allows all the developers to get the information they want in a comfortable way and without any need to mess a lot with configuration. If you don't want to get it there's no need to remove yourself from a mailing list or edit a configuration file - simply remove the feed and be done with it.
Very cool, and the idea was stolen from Pragmatic Project Automation.
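To give an idea of how a logger-backed feed like this might look, here is a minimal sketch using only the Python standard library. It is not the implementation described above: the handler class, the logger name and the example URL are all hypothetical, and a real system would serve the generated XML over HTTP.

```python
import logging
import xml.etree.ElementTree as ET
from collections import deque
from email.utils import formatdate


class RssFeedHandler(logging.Handler):
    """Keep the most recent log records and render them as an RSS 2.0 feed."""

    def __init__(self, title, link, max_items=50):
        super().__init__()
        self.title = title
        self.link = link
        self.records = deque(maxlen=max_items)  # oldest records drop off

    def emit(self, record):
        self.records.append(record)

    def to_rss(self):
        rss = ET.Element("rss", version="2.0")
        channel = ET.SubElement(rss, "channel")
        ET.SubElement(channel, "title").text = self.title
        ET.SubElement(channel, "link").text = self.link
        ET.SubElement(channel, "description").text = f"Log events for {self.title}"
        for record in reversed(self.records):  # newest entry first
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = (
                f"{record.levelname}: {record.getMessage()}"
            )
            ET.SubElement(item, "pubDate").text = formatdate(
                record.created, usegmt=True
            )
        return ET.tostring(rss, encoding="unicode")


# One handler per logger gives one feed per topic (debugging, transactions, ...).
handler = RssFeedHandler("transactions", "http://example.com/feeds/transactions")
logger = logging.getLogger("app.transactions")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the example quiet on the console

logger.info("order %s completed", 42)
feed_xml = handler.to_rss()
print(feed_xml)
```

Subscribing or unsubscribing is then just a matter of adding or removing the feed URL in a reader, which matches the "no configuration" point made above.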
Most digital libraries use RSS/Atom to expose their search results and data updates, according to the OAI-PMH protocol.
With our internal TRAC server, I'm subscribed to the timeline view for each project that I work on. It's great for keeping track of checkins and bug tickets. This is pretty exclusive to a developer position though.
I am also subscribed to the recent changes for our installation of MediaWiki that we use for our intranet. That way it's easy to see if documents that I need have been changed, or if there are new policies etc.
Our website has a news page that I wrote an RSS feed for as well. While you mentioned that you weren't really interested in recent news, it is nice to keep up with our press releases.
I have seen RSS used to syndicate gas prices from a service for a specific zip code.
There are many examples. Here are a couple.
SharePoint provides RSS feeds from its lists.
Many faceted navigation products allow you to get an RSS feed based on a selected filter. For example, you can navigate to view 24" LCD Monitors on newegg.com and then get an RSS feed of that view.
Mantis bug tracker includes RSS feeds although I wish they were more configurable. Also we use MediaWiki for documentation which has all sorts of RSS Feeds including a per page watch, and recent changes.
I just added RSS feeds to the ticketing system I use at work (TicketDesk) and that feature should be in the next release of the product.
It's nice because it basically provides me with a custom search view of outstanding trouble tickets or work requests that comes to me, rather than me having to go to the application. It also allows users to get feeds of issues they may be interested in, without requiring them to get emails on each update.
I'm looking at implementing an RSS feed for calls for service that our agency takes, to provide the administrators a quick and easy way to see what has been going on.
Atom feed documents and Atom entry documents are used as the representation format for RESTful web services that follow the Atom Publication Protocol (AtomPub).
I personally have used syndication feeds to expose a sub-set of the Windows Event Log information so that I could subscribe and be notified of critical events on a server.
immobilienscout24: they use RSS feeds for updates on your search.
