Perhaps a trivial question, but I couldn't find anything relevant.
If I have a long list of, let's say, surnames, or better, institution names (which are usually long), how do I do lookup queries when the user types the first few characters?
e.g.
technology institute
science institute
literature seminar
techinvest
technology park
animal shelter, etc.
Simply put, when the user starts typing "tech", I want to provide
technology institute
techinvest
technology park
as the result for the dropdown.
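One possible way to sketch this kind of prefix lookup, assuming the whole list fits in memory: keep the names sorted by their lowercased form and binary-search for the typed prefix. The function names `build_index` and `prefix_lookup` and the `limit` parameter are hypothetical and exist only for this illustration. A database or a trie would be reasonable alternatives for very large lists.

```python
from bisect import bisect_left


def build_index(names):
    """Return (lowercased key, original name) pairs sorted by key."""
    return sorted((name.lower(), name) for name in names)


def prefix_lookup(index, prefix, limit=10):
    """Return up to `limit` names whose lowercased form starts with `prefix`."""
    key = prefix.lower()
    # Binary search for the first entry that sorts at or after the prefix.
    start = bisect_left(index, (key,))
    matches = []
    for lowered, original in index[start:start + limit]:
        if not lowered.startswith(key):
            break  # sorted order: no later entry can match either
        matches.append(original)
    return matches


names = [
    "technology institute",
    "science institute",
    "literature seminar",
    "techinvest",
    "technology park",
    "animal shelter",
]
index = build_index(names)
print(prefix_lookup(index, "tech"))
# ['techinvest', 'technology institute', 'technology park']
```

The index is built once. Each keystroke then costs one binary search plus at most `limit` comparisons, which should be fast enough to feed a dropdown.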
I’m doing research on Solo Brands which owns the Solo Stove, Chubbies (apparel), Oru (Kayak), and ISLE (paddleboards) brands. Their current wholesale partners include Dick’s, REI, ACE Hardware, Scheels, Academy Sports, and Costco (which just launched). I wanted to do a web scraping analysis to determine how many locations its brands are currently in for each (and how many carry 3-4 of their brands) to size the opportunity for further distribution gains in the wholesale channel. For example, the company claims they have been in all REI locations since 2017, so I’d like to corroborate that. I also assume they’re only in a handful of Costco locations since they just kicked off there. Hope that makes sense.
I am not a coder by trade and would appreciate any guidance.
Thanks so much,
B-Mac
I want to record the job posting information from this search. Is anyone aware of an API, or can you confirm it's possible to scrape it with Python's Beautiful Soup? (I'm familiar with scraping, I just can't see how to get the data from this website.)
Disclosure: I work at SerpApi.
You can use the google-search-results package to get data from Google Jobs listings. Check a demo at Repl.it.
from serpapi import GoogleSearch

params = {
    "engine": "google_jobs",
    "q": "sustainability jobs in mi",
    "google_domain": "google.com",
    "api_key": "API_KEY",
}

client = GoogleSearch(params)
data = client.get_dict()

# Print the main fields of every job posting in the response.
print("Job results")
for job_result in data['jobs_results']:
    print(f"""Title: {job_result['title']}
Company name: {job_result['company_name']}
Description: {job_result['description']}
""")

# Print the available search filters ("chips") and their options.
print("Filters")
for chip in data['chips']:
    print(f"Type: {chip['type']}\n")
    print("Options")
    for option in chip['options']:
        print(option['text'])
Response
{
"jobs_results":[
{
"title":"Sustainability Analyst",
"company_name":"Amcor",
"location":"Ann Arbor, MI",
"via":"via LinkedIn",
"description":"Amcor Limited Job Posting\n\nRole: Sustainability Analyst\n\nLocation: TBD, ideally in the US (Ann Arbor, MI)\n\nAbout Amcor\n\nAmcor (ASX: AMC;\n\nAmcor is proud of its recent pledge to design all of our packaging to be recyclable or reusable by 2025. The job holder will play a very important and exciting role in Amcor’s journey to deliver this important commitment.\n\nPosition Overview\n\nRead more about Amcor’s sustainability commitment:\n\nThe Sustainability function plays a key role in positioning Amcor as THE leading packaging company for the environment delivering on Amcor’s sustainability strategy, the 2025 pledge and as a supplier of choice for responsible packaging.\n\nThe Sustainability Analyst is responsible for analyzing, reporting, and coordinating selected global Sustainability activities with direction from the VP Sustainability.\n\nEssential Responsibilities And Duties\n• Track legislative activity, analyze for risk and opportunity, help to prioritize actions\n• Assist with drafting... 
positions, coordinate Amcor activity and governance around advocacy (mostly in industry group participation)\n• Assists with internal reporting and communications, including preparing decks for internal meetings\n• Partnership administration, tracking projects and payments, and liaising with corporate finance on dept budget\n• Manage compliance statements, including anti-slavery statements, conflict minerals etc.\n• Coordinates the International Costal Cleanup, as needed with other partners\n• Other similar duties as required to support the corporate sustainability program\n\nQualifications\n• Education: Master's Degree or equivalent in related field preferred\n• Three to five years of experience\n• Strong analytical skills, including ability to interpret and graphically display environmental performance data\n• Excellent written and verbal communications skills\n• Excellent working knowledge of Microsoft Office\n• Demonstrated professional work characteristics including high initiative, dependability, and ability to manage confidential information\n• Must be well organized and comfortable interfacing with all levels of management\nAmcor Leadership Framework Competencies\n• Drive for Results\n• Influencing Others\n• Customer Focus\n• Learning on the Fly\n• Interpersonal Savvy\n• Organizational Awareness\n• Priority Setting\n• Organizing\n• Functional / Technical Skills\n• Strong Computer Skills\n\nRelationships\n• Amcor Leadership\n• Direct Reports\n• External Vendors\n• Government agencies\n• Global partners/ Nonprofit organizations\n• Industry organizations\nExpected Travel: 10% Travel\n\nThe information contained herein is not intended to be an all-inclusive list of the duties and responsibilities of the job, nor are they intended to be an all-inclusive list of the skills and abilities required to do the job.\n\n#North America",
"extensions":[
"Over 1 month ago",
"Full-time"
]
},
{
"title":"Environmental Jobs in Michigan,USA",
"company_name":"freelancejobopenings.com",
"location":"Michigan",
"via":"via Freelance Job Openings",
"description":"Environmental Jobs in Michigan,USA\n\nSummer Camp Instructor\n\nenvironmental learning center at barr lake state park with a satellite office in fort collins and fieldwork outposts in environmental science, leadership, and or outdoor adventure programs for diverse audiences in formal and non formal outdoor and classroom environmental studies, biological sciences, natural resource management, or related field, with a focus in ornithology.\n\n strong summer, birding, camp, education, colorado, outdoors, teaching\n\nwebsite: barefoot student summer camp\n\nSITE LEAD\n\nenvironmental changes, and sudden work schedule changes.\n• tech savvy: frito lay is an industry leader site: fritolay the site lead is accountable for ensuring the building is operating at top performance to deliver the zone sops strategy and ensures a safe working environment. the role requires cross functional understanding in order to drive operations success.\n\nwe are open 24 hours a day, which means\n\nField Service ... Chromatography Spectrometry Instruments - Grand Rapids, MI\n\nenvironmental testing, and forensic toxicology looking to hire field service engineer to support lcms and gcms platforms. travel to client labs to perform calibrations, diagnose problems with equipment field service chromatography spectrometry instruments grand rapids, mi\n\nleader in liquid chromatography mass spectrometry and gas chromatography mass spectrometry, supporting clinical research, drug discovery, food and environmental testing, and forensic toxicology looking to hire field service engineer to support\n\nUTA Test Engineer\n\nenvironmental demands may be referenced in an attempt to municate the manner in which this position traditionally is performed. 
about capgemini:\n\na global leader in consulting, technology services and digital transformation, capgemini is at the forefront of innovation to address the entire breadth of clients’ opportunities in the evolving world of cloud, digital and platforms. building on its strong 50 year heritage and deep industry specific expertise, capgemini enables organizations to realize\n\nIndustrial Water/Wastewater Design Engineer\n\nenvironmental, civil, or chemical\n• 4+ years of industrial water wastewater system environmental, civil or chemical\n• water wastewater treatment design experience in variety industrial markets\n• experience with biological and physical chemical treatment design build experience\n\nwhat we offer engineering water wastewater\n\nbusiness line design and consulting services group (dcs)\n\ncountry",
"extensions":[
"13 hours ago",
"Full-time"
]
}
]
}
If you want more information, check out SerpApi documentation.
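Since the goal is to record the postings, here is a minimal sketch of how a response shaped like the one above could be flattened into CSV rows. The helper name `jobs_to_csv` is hypothetical, and the sample dictionary is a trimmed copy of the response shown earlier, not live data. Only the fields visible in that response are used.

```python
import csv
import io


def jobs_to_csv(data, out):
    """Write one CSV row per entry in data['jobs_results'] to file object `out`."""
    fields = ["title", "company_name", "location", "via", "extensions"]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for job in data.get("jobs_results", []):
        row = {name: job.get(name, "") for name in fields}
        # "extensions" is a list such as ["13 hours ago", "Full-time"].
        row["extensions"] = "; ".join(job.get("extensions", []))
        writer.writerow(row)


# Trimmed-down sample shaped like the response above.
sample = {
    "jobs_results": [
        {
            "title": "Sustainability Analyst",
            "company_name": "Amcor",
            "location": "Ann Arbor, MI",
            "via": "via LinkedIn",
            "extensions": ["Over 1 month ago", "Full-time"],
        }
    ]
}

buffer = io.StringIO()
jobs_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open("jobs.csv", "w", newline="")`.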
I have a list of data. It's all a single column, each row is a comment from a post asking for book recommendations. Here's an example, containing the first 2 entries:
"My recommendations from books I read this year:<p>Bad Blood : Man, this book really does read like a Hollywood movie screenplay. The rise and fall of Theranos, documented through interviews with hundreds of ex-employees by the very author who came up with the first expose of Theranos. Truly shows the flaws in the "fake it before you make it" mindset and how we glorify "geniuses".<p>Shoe Dog : Biography of the founder of Nike. Really liked how it's not just a book glorifying the story of Nike, but tells the tale of how much effort, balance and even pure luck went into making the company the household name it is today.<p>Master Algorithm : It's a book about the different fields of Machine learning (from Bayesian to Genetic evolution algos) and talks about the pros and cons of each and how these can play together to create a "master algorithm" for learning. It's a good primer for people entering the field and while it's not a DIY, it shows the scope of the problem of learning as a whole.<p>Three Body Problem: Finally, after years of people telling me to read this (on HN and off), I read the trilogy (Remembrance of Earth's Past), and I must say, the series does live up to the hype. Not only is it fast paced and deeply philosophical, but it's presented in a format very accessible to casual readers as well (unlike many hard sci-fi books which seem to revel in complexity). If I had to describe this series in a single line, it's "What would happen if China was the country that made first contact with an alien race?"","A selection:<p>Sapiens (Yuval Noah Harari, 2014 [English]) - A bit late to the party on this one. Mostly enjoyed it, especially the early ancient history stuff, but I felt it got a bit contrived in the middle - like the author was forcing it. Overall a good read though.<p>How to Invent Everything (Ryan North, 2018) - First book I've pre-ordered in a long time. A look at the history of civilization and technology through a comedic lens. 
Pretty funny and enjoyable.<p>The Rise of Theodore Roosevelt (Edmund Morris, 1979) - Randomly happened across this book while browsing a used bookstore for some stuff to read on a summer vacation. Loved it. It's big, but reads pretty quick for a biography. I've been a fan of TR since I first really learned about him in High School and I would recommend this for anyone interested in TR/The West/Americana.<p>Jaws (Peter Benchley, 1974) - Quite a bit darker than the movie.<p>Sharp Objects (Gillian Flynn, 2006) - I enjoyed Gone Girl (book and film) so I wanted to read this before the HBO series. To be honest...not my cup of tea. It was <i>okay</i>.<p>The Art of Racing in the Rain (Garth Stein, 2008) - Made me cry on an airplane. Thankfully my coworkers were on a different flight."
(Note that the comments are separated by ",".)
I'm trying to load this list into a data table in an R sandbox (rapporter.net). But because of browser security, I can't load a local file (fread, read.table).
How can I read raw data into a data table in R?
I am trying to scrape and parse the following RSS feed: http://www.nestle.com/_handlers/rss.ashx?q=068f9d6282034061936dbe150c72d197. I have no problem extracting the basic items that I need (e.g., title, description, pubDate) using the following code:
library(RCurl)
library(XML)
xml.url <- "http://www.nestle.com/_handlers/rss.ashx?q=068f9d6282034061936dbe150c72d197"
script <- getURL(xml.url)
doc <- xmlParse(script)
titles <- xpathSApply(doc,'//item/title',xmlValue)
descriptions <- xpathSApply(doc,'//item/description',xmlValue)
pubdates <- xpathSApply(doc,'//item/pubDate',xmlValue)
My problem is that the output for item "description" includes not only the actual text but also a lot of style formatting expressions. For example, the first element is:
descriptions[1]
[1] "<p><iframe height=\"322\" src=\"https://www.youtube-nocookie.com/embed/fhESDXnlMa0?rel=0\" frameBorder=\"0\" width=\"572\"></iframe><br />\n<br />\n<p><em>Nescafé</em> is partnering with Facebook to launch an immersive video, pioneering new technology just released for the platform.</p>\n<p>\nThe <em>Nescafé</em> <a class=\"externalLink\" title=\"Opens in a new window: Nescafé on Facebook\" href=\"https://www.facebook.com/Nescafe/videos/vb.203900255471/10156233581755472/?type=2&theater\" target=\"_blank\">‘Good Morning World’ video</a> stars people in kitchens across the world, performing the hit song ‘Don’t Worry’ using spoons, cups, forks and a jar of coffee. Uniquely, viewers can rotate their smartphones through 360˚ to explore the video, the first time this has been possible on Facebook.</p>\n<p>\n“We know young coffee lovers pick up their phone at the start of every day looking to be entertained by real experiences. The 360˚ video allows us to be engaging in an innovative way,” said Carsten Fredholm, Senior Vice President of Nestlé’s Beverage Strategic Business Unit.\n</p>\n<p><em>Nescafé</em> recently teamed up with Google to offer the first virtual reality coffee experience through the <em>Nescafé 360˚</em> app. It also became the first global brand to move its website onto Tumblr, to strengthen connections with younger fans by allowing them to create and share content.</p>\n<p>The Nestlé brand is one of only six globally to partner Facebook for the launch of this technology.</p></p>"
I can think of a regex approach to strip the unwanted markup. However, is there a way to access the plain-text content of the "description" item directly through XPath?
Any help with this issue is very much appreciated. Thank you.
You can do:
descriptions <- sapply(descriptions, function(x) {
  xmlValue(xmlRoot(htmlParse(x)))
}, USE.NAMES=FALSE)
which gives (via cat(stringr::str_wrap(descriptions[[1]], 70))):
In a move that will provide young Europeans increased access to
jobs and training opportunities, Nestlé and the Alliance for YOUth
have joined the European Pact for Youth as founding members. Seven
million people in Europe under the age of 25 are still inactive -
neither in employment, education or training. The European Pact for
Youth, created by European CSR business network CSR Europe and the
European Commission, aims to work together with businesses, youth
organisations, education providers and other stakeholders to reduce
skills gaps and increase youth employability. As part of the Pact, the
Alliance for YOUth will focus on setting up “dual learning” schemes
across Europe, combining formal education with apprenticeships and on-
the-job training to help match skills with jobs on the market. The
Alliance for YOUth is a group of almost 200 companies mobilised by
Nestlé to help young people in Europe find work. It has pledged to
create 100,000 employability opportunities by 2017 and has already met
half of this target in its first year. Luis Cantarell, Executive Vice
President for Nestlé and co-initiator of the European Pact for Youth,
said: “Promoting a cultural shift to dual learning schemes based on
business-education collaboration is at the heart of Nestlé’s youth
employment initiative since its start in 2013. The European Pact for
Youth will help to build a skilled workforce and will tackle youth
unemployment.” Learn more about the European Pact for Youth and read
their press release.
There are \n characters at various points in the resultant text (in almost all the descriptions) but you can gsub those away.
I need to do a pretty complex matching of phrases.
I have large bodies of text in files which exceed 1000 words each.
The phrases I am searching for (searchphrase) are like this:
Investment does not mean:
i. Claims to money that arise solely from:
1. Commercial contracts for the sale of goods or
services by a national or an enterprise of a party
to an enterprise in the territory of the other party,
or
2. The extension of credit in connection with a
commercial transaction, such as trade financing
other than loans or claims to money previously
covered.
I want to know whether the phrase occurs in each of the files I have. However, the files will not contain exact replicas of the phrase. Instead, the file (textfile) will be a large document with a paragraph like:
But investment does not mean claims to money derived solely from
commercial transactions designed exclusively for the sale of goods or
services by a national or legal person in the territory of one
Contracting Party to a national or legal person in the territory of the
other Contracting Party, credits to finance commercial transactions such
as trade financing, and other credits with a duration of less than three
years, as well as credits granted to the State or to a State enterprise.
As you can see, searchphrase is pretty similar in actual meaning to this paragraph from textfile. There is also considerable overlap in the keywords. Hence, I should get a match.
What sort of algorithm should I try and use to code this? Are pre-coded modules available anywhere that do this job?
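As a starting point only, here is a rough sketch of one simple approach: slide a window of words over the document and measure what fraction of the search phrase's keywords appears in each window. It captures keyword overlap, not meaning, so it is a baseline rather than a full answer. The names `keywords` and `best_window_score`, the stop-word list, and the window and step sizes are all assumptions made for illustration. Weighted schemes such as TF-IDF with cosine similarity are a common next step.

```python
import re

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "of", "or", "and", "to", "in", "by", "for",
              "from", "that", "such", "as", "with", "than", "i"}


def keywords(text):
    """Lowercase the text and return its set of non-stop-word tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def best_window_score(search_phrase, document, window=80, step=20):
    """Return the highest fraction of search-phrase keywords found in any
    window of up to `window` consecutive words of the document."""
    target = keywords(search_phrase)
    if not target:
        return 0.0
    words = document.split()
    best = 0.0
    # Windows near the end are shorter, but every word is covered.
    for start in range(0, max(1, len(words)), step):
        chunk = " ".join(words[start:start + window])
        score = len(target & keywords(chunk)) / len(target)
        best = max(best, score)
    return best


phrase = ("Commercial contracts for the sale of goods or services "
          "by a national or an enterprise")
doc = ("Preamble text. But investment does not mean claims to money derived "
       "solely from commercial transactions designed exclusively for the sale "
       "of goods or services by a national or legal person. Closing text.")
score = best_window_score(phrase, doc)
print(f"best overlap: {score:.2f}")
```

A file would then count as a match when the score exceeds some threshold chosen by inspecting known matches and non-matches. The right value depends on the corpus.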