Google Dataprep - replace data in columns - google-cloud-datastore

I have started to use Google's Dataprep solution to cleanse eCommerce product feeds. As I receive data from 100s of eCommerce stores, I want to cleanse the data for consistency and rename the various spellings of brand names. For example, I have a column 'Vendor' that has millions of rows with Adidas spelt differently:
adidas
Adidas
Adidas classic
Adidas orginals
adidas originals
adidas skateboarding
Adidas Skateboarding
For my purposes, I want to rename all of these to 'adidas'. I looked at the various routines in Dataprep, and the Replace function could do the work, but it's not a scalable solution.
Is there a way in Dataprep to have a master file of brand names, do a lookup against it, and replace the incorrect instances? In Excel, a simple VLOOKUP might work, and I am wondering whether something similar exists in Dataprep.
I hope the above makes sense, thank you to those who can help.
Craig

If you have a master file that maps incorrect spellings to a standardized name, the lookup dialog (in the column menu) might be what you're looking for: https://cloud.google.com/dataprep/docs/html/Lookup-Wizard_57344860
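To make the master-file idea concrete, here is a minimal sketch of the same lookup-and-replace logic in plain Python. It is not Dataprep's API; the column names `variant` and `standard`, the inline CSV, and the helper names are all assumptions made for illustration. Matching is done on a trimmed, lower-cased key so that 'Adidas' and 'adidas' hit the same row, and unknown vendors are left unchanged.

```python
import csv
import io

# Hypothetical master file: one row per known variant and its standard name.
MASTER_CSV = """variant,standard
adidas,adidas
adidas classic,adidas
adidas orginals,adidas
adidas originals,adidas
adidas skateboarding,adidas
"""

def load_mapping(csv_text):
    """Build a lookup keyed on the trimmed, lower-cased variant."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["variant"].strip().lower(): row["standard"] for row in reader}

def standardize(vendor, mapping):
    """Return the standard name, or the original value if there is no match."""
    return mapping.get(vendor.strip().lower(), vendor)

mapping = load_mapping(MASTER_CSV)
vendors = ["Adidas", "Adidas orginals", "adidas skateboarding", "Nike"]
cleaned = [standardize(v, mapping) for v in vendors]
```

Values that fall through unchanged (like 'Nike' here) are the ones you would review and add to the master file.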

Related

Is it possible to scrape multiple data points from multiple URLs with data on different pages into a CSV?

I'm trying to build a directory on my website and want to get that data from SERPs. The sites from my search results could have data on different pages.
For example, I want to build a directory of adult sports leagues in the US. I get my SERPs to gather my URLs for leagues. Then from that list, I want to search those individual URLs for: name of league, location, sports offered, contact info, description, etc.
Each website will have that info in different places, obviously. But I'd like to be able to get the data I'm looking for (which not every site will have) and put that in a CSV and then use it to build the directory on my website.
I'm not a coder but trying to find out if this is even feasible from my limited understanding of data scraping. Would appreciate any feedback!
I've looked at some data scraping software. Put requests on Fiverr with no response.

CSV list of all universities - google maps

I have a CSV list of university names around the world - about 13,000 university names. I'm looking for a way to pull the addresses of these universities. Google Maps API / Google Places API looks promising, but requires lat/long to map the locations.
End game is to mark to each school as a 1 if the school is in the US, and 0 if the school is outside of the US.
Any thoughts on how to search these colleges in maps and pull out the addresses - or at least the country?
Example:
Is there nothing else in the CSV, only the names? That's going to make it hard; I'd bet the names aren't always unique in the world.
You could write something that takes several bites at the apple: for instance, if the university has a US state name in it, check those off as 1's, then find another rule to take "another bite", and so on until the apple is gone.
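A rough sketch of that multi-pass idea in Python, under stated assumptions: the state list is a truncated sample, the second pass (foreign-language words for "university") is a hypothetical rule not taken from the answer, and both rules can misfire (for example, "Georgia" is both a US state and a country). Each pass returns 1, 0, or None for "can't tell", and undecided names are handed to the next pass.

```python
# Truncated sample; a real run would list every US state.
US_STATES = ("Alabama", "California", "Texas", "Ohio")

def pass_state_name(name):
    """Pass 1: a US state name in the title suggests a US school."""
    lowered = name.lower()
    return 1 if any(state.lower() in lowered for state in US_STATES) else None

def pass_foreign_word(name):
    """Pass 2 (hypothetical rule): non-English words for 'university'."""
    hints = ("universidad", "université", "universität")
    lowered = name.lower()
    return 0 if any(hint in lowered for hint in hints) else None

def classify(names, passes):
    """Apply each pass in turn to the names still undecided."""
    labels = {}
    remaining = list(names)
    for rule in passes:
        undecided = []
        for name in remaining:
            result = rule(name)
            if result is None:
                undecided.append(name)
            else:
                labels[name] = result
        remaining = undecided
    return labels, remaining

names = ["University of Texas at Austin", "Universidad de Chile",
         "Imperial College London"]
labels, remaining = classify(names, [pass_state_name, pass_foreign_word])
```

Whatever is left in `remaining` after the cheap passes is the part worth sending to a geocoding or places lookup.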
On top of @WEBjuju's answer: since you only want to mark whether the school is in the US or outside of the US, you can use the "country" type in Place Types in the Google Places API, by setting the option as country='us'.
https://developers.google.com/places/supported_types?csw=1#table2
You may also want to cross check with this list of schools.
https://www.4icu.org/reviews/index2.htm
https://en.wikipedia.org/wiki/Lists_of_universities_and_colleges_by_country

Woocommerce reuse values in variable products

I hope someone can help me with this :)
I will be selling a product that has 10 variables, and the choices need to be the same for all of them. How can I add the values once and reuse them instead of copying and pasting them 10 times?
What I will be selling is a 10 meal bulk package and each meal needs the same choice values.
Picture of what I'm trying to achieve here
Thanks for any help in advance :) Scott
I have 3 different products (5 meals, 7 meals and 10 meals) in which the customer can choose their own meals. The attribute values will also be changing weekly, so at the moment I am pasting in new values 22 times on the edit product screen.
If I delete and duplicate the product I lose the URL.
When I go to Products > Attributes it is still the same: I have to make 10 attributes like meal 1, meal 2, meal 3, etc. I can't duplicate them, so I have to do it all manually, up to 40 times, with new values.
[Picture Front end here](https://dl.dropboxusercontent.com/u/61411490/Screen%20Shot%202015-09-17%20at%2009.33.32.png).
This is a very late answer, but I have encountered the same issue and my solution might be of interest to the next person who happens on this question.
I am using WP All Import and the Woocommerce extension (both free, but variable products require a license) to maintain product details in Excel. Updating data on the website is as easy as uploading a new CSV. Keeping data in sync in an Excel file, with references, formulas, etc., becomes much easier, and you can often completely factor out duplicate data.
Instead of using Woocommerce's variable products, I created each version as a simple product and linked between them.

Trying to build a 'random text' WordPress plugin. No idea where to start

Here’s what I’ve done in a spreadsheet:
I’ve assigned people to one or more categories (ie. male, female, tall, short)
I’ve assigned weights to these people (ie. 200 lbs, 120 lbs, 300 lbs)
I’ve assigned names to these people (ie. John, Jane, Bill)
Here’s what I need to do in a plugin:
Find some way to get my data into it (maybe through an admin interface, or via my spreadsheet)
Filter results by one or more categories (ie. only male; only tall + female, etc)
From those filtered results, pull 2 or 3 people (as many as I can fit) whose combined weights equal X or less
Display the names of those 2 or 3 people as a list to front-end users
At the press of a button, randomly generate another 2 or 3 person team
I don’t mind getting my hands dirty, but I don't know where to begin. If you guys could give me any advice, best practices, code to get me started, or names of plugins that already do this, etc, I’d really appreciate it.
Also, if I’m biting off too much for a complete noob, feel free to let me know. Because if it comes down to it, I’ll just create the teams manually and throw them into a random text plugin, or something.
1. This depends heavily on what format the data is in at present. If it's in a spreadsheet, you can import it fairly easily by saving it as a CSV and processing it with PHP's fgetcsv.
2. Assuming this is going into MySQL (as most WP plugin data does), this is just a SQL query (i.e. WHERE wp_custom_person_record_weight > 100 AND wp_custom_person_record_name != 'Bill').
3. Same as #2 but with a JOIN and a SUM and a WHERE query against that sum.
4. This is the same SQL query; if you call it through mysqli_query you'll get a result set back whose rows you can fetch and output on the page.
5. Random records can be gleaned a number of ways, either by using a LIMIT of X,2 where X is a randomly generated number between 0 and the number of records, or through MySQL itself (although that is not recommended for performance reasons).
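For anyone who wants to see the filter / weight-limit / random-pick logic on its own before wiring it into a plugin, here is a minimal in-memory sketch. It is written in Python rather than PHP/SQL purely for illustration; the data layout, the function names, and the extra person "Ann" are assumptions, not part of the question. It tries 3-person teams first and falls back to 2-person teams, which is one reading of "as many as I can fit".

```python
import random
from itertools import combinations

# Sample data in the spirit of the spreadsheet; "Ann" is made up.
PEOPLE = [
    {"name": "John", "weight": 200, "categories": {"male", "tall"}},
    {"name": "Jane", "weight": 120, "categories": {"female", "tall"}},
    {"name": "Bill", "weight": 300, "categories": {"male", "short"}},
    {"name": "Ann", "weight": 150, "categories": {"female", "short"}},
]

def filter_people(people, required):
    """Keep people who belong to every required category."""
    return [p for p in people if required <= p["categories"]]

def valid_teams(people, max_weight, sizes=(3, 2)):
    """Return all teams of the largest size whose combined weight fits."""
    for size in sizes:
        teams = [team for team in combinations(people, size)
                 if sum(p["weight"] for p in team) <= max_weight]
        if teams:
            return teams
    return []

def random_team(people, required, max_weight, rng=random):
    """Pick one valid team at random; empty list if none fits."""
    teams = valid_teams(filter_people(people, required), max_weight)
    return [p["name"] for p in rng.choice(teams)] if teams else []
```

Calling `random_team(PEOPLE, {"female"}, 300)` on each button press would give a fresh pick. Enumerating every combination is fine for a small roster but grows quickly with the number of people.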

Using Yahoo! Pipes

Have you used pipes.yahoo.com to quickly and easily do... anything? I've recently created a quick mashup of StackOverflow tags (via rss) so that I can browse through new questions in fields I like to follow.
This has been around for some time, but I've just recently revisited it and I'm completely impressed with its ease of use. It's almost to the point where I could set up a pipe and then give a client privileges to go in and edit feed sources... and I didn't have to write more than a few lines of code.
So, what other practical uses can you think of for pipes?
It's nice for aggregating feeds, yes, but the other handy thing to do is filtering the feeds. A while back, I created a feed for Digg (before Digg fell into the Fark pit of despair). I didn't care about the overwhelming Apple and Ubuntu news, so I filtered those keywords out of Technology, which I then combined with Science and World & Business feeds.
Anyway, you can do a lot more than just combine things. If you wanted to be smart about it, you could set up per-subfeed and whole-feed filters to give granular or over-arching filtering abilities as the news changes and you get bored with one topic or another.
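The keyword filtering described above is easy to picture outside Pipes too. A minimal sketch in Python, assuming feed items have already been parsed into dicts with a `title` key (the item shape, function name, and sample titles are made up for illustration):

```python
def filter_items(items, blocked_keywords):
    """Drop items whose title contains any blocked keyword (case-insensitive)."""
    blocked = [word.lower() for word in blocked_keywords]
    return [item for item in items
            if not any(word in item["title"].lower() for word in blocked)]

technology = [
    {"title": "Apple unveils new laptop"},
    {"title": "Ubuntu release candidate is out"},
    {"title": "New exoplanet found"},
]
kept = filter_items(technology, ["apple", "ubuntu"])
```

A per-subfeed filter would call this on each source before combining; a whole-feed filter would call it once on the combined list.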
The one thing I have really used Y! Pipes for (rather than just playing around with it) is to clean up item titles, merge and finally de-dupe the feeds I got from querying multiple blog search engines with the same search term. This is something I’ve done in several very different contexts, eg. for my own ego surfing, in another case for the planet site set up by some conference’s organisers to keep an eye on their conference’s buzz, etc. Highly recommended.
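As a rough illustration of that clean / merge / de-dupe flow, here is a small Python sketch. It assumes each feed is already a list of dicts with `title` and `link` keys and treats two items as duplicates when their links match; both the item shape and the duplicate rule are assumptions, not a description of how the original pipe worked.

```python
import re

def clean_title(title):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", title).strip()

def merge_and_dedupe(*feeds):
    """Merge feeds in order, keeping the first item seen for each link."""
    seen_links = set()
    merged = []
    for feed in feeds:
        for item in feed:
            if item["link"] in seen_links:
                continue
            seen_links.add(item["link"])
            merged.append({"title": clean_title(item["title"]),
                           "link": item["link"]})
    return merged

engine_a = [{"title": "  Conference   recap ", "link": "http://example.com/1"}]
engine_b = [
    {"title": "Conference recap", "link": "http://example.com/1"},
    {"title": "Day two notes", "link": "http://example.com/2"},
]
combined = merge_and_dedupe(engine_a, engine_b)
```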
You can do tons of things with pipes. For example for sites like digg or reddit, you can make one to bypass the site and go directly to the linked article (rewriting the RSS).
I also like to filter webcomics' feeds to keep just the comics, and then mix them all into a single feed.
I've taken the liberty of copying your pipe and rearranging it a bit so that it's easier to add and remove tags:
Yahoo Pipe: StackOverflow Merge Tags
Tags are now listed in a string builder, so to add a tag you just have to hit the + button on the string builder and type in the tag preceded by a slash.
Well, pipes are real fast and useful.
Other effective uses might be:
1) combine many feeds into one, then sort, filter and translate it.
2) geocode your favorite feeds and browse the items on an interactive map.
3) power widgets/badges on your web site.
4) grab the output of any Pipes as RSS, JSON, KML, and other formats.
This is by no means a comprehensive list.
One of my favorite things to do with Yahoo! Pipes is to aggregate multiple craigslist feeds into a single feed. You can make a feed out of any category or search criteria on craigslist. I live in a university town and am always on the lookout for tickets to sporting events, for example. I have a half-dozen craigslist searches all being combined into a single feed via Yahoo! Pipes. This works a lot better for me than simply monitoring the entire "Tickets" category; filters out most of the tickets I am not interested in. Yes, this is another aggregating feeds example, but the craigslist usage is quite valuable with the ability to aggregate feeds that are themselves based upon searches.
I've used Pipes to translate blogs into English. I would have liked to use it to fetch the full text for blogs which only provide a summary of the content in the feed, but unfortunately they don't provide any input which fetches the content from a parameterizable source :-(.
Just stumbled on this while looking for ways to connect Excel to Pipes. A bit necromancer-ish, but here goes.
One thing I've done is take an HTML page (science data) which has links to tons of CSV files for a bunch of Army Corps measurement stations. Each station has a big table of datafiles, all organized individually by month and year. I use YQL to parse out and organize the links to the individual CSV files in a way that Pipes can read them. Then, I use that as input into a Pipe, which has a user input for "Station" and "Date."
Using this, I can go to the Pipes page, type in those values and get the values only for a specific station and date, rather than have to find the station on a website, find the year and month in a big table, click the link, open the CSV file, and find the values for a day within that month's worth of data. I can even change the pipe to specify the hour, and the parameter, and then get a single value returned.
Now, I wish I could figure out how to program Excel so that I can use "=yahoo_function(station, datetime)" to place that value automatically into a cell given the values of other columns!
