RSS Google news language - rss

I am creating an RSS feed from Google News and it's working so far, but I'd like to get news in two languages, not just English.
This is my RSS URL so far:
https://news.google.com/rss/search?q=energy+efficiency
It's working fine; I just need to add a filter for two languages (German + English).
This is what I've found in different blogs, but I do not wish to filter the news by location, just by language:
"If you wish to have news in English and located from the United States sources, add the following query string to the URL to change country and language:"
&hl=en-US&gl=US&ceid=US:en
No matter how I modify the above URL, I get an error...

After reading lots of posts and playing around, I found the solution.
In case someone needs it:
https://news.google.com/rss?q=energy+efficiency&hl=en
Add the language code at the end of the link:
&hl=en (English)
&hl=de (German)
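The answer above sets one language code per link, so one way to cover both English and German is to build one feed URL per language and subscribe to (or merge) both feeds. A minimal standard-library sketch; the helper name feed_urls is hypothetical, and the /rss?q=...&hl=... shape is copied from the answer rather than from any Google documentation:

```python
from urllib.parse import urlencode

BASE = "https://news.google.com/rss"

def feed_urls(query, languages):
    """Return a dict mapping each language code to its own feed URL (hypothetical helper)."""
    # urlencode turns spaces into "+", matching "energy+efficiency" in the question
    return {lang: f"{BASE}?{urlencode({'q': query, 'hl': lang})}" for lang in languages}

for lang, url in feed_urls("energy efficiency", ["en", "de"]).items():
    print(lang, url)
```

This prints one URL ending in &hl=en and one ending in &hl=de; combining the two result lists is left to the feed reader.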

Related

Scraping the gender of clothing items

I'm looking for advice on methods to scrape the gender of clothing items on a website that doesn't specify the gender on the product page.
The website I'm crawling is www.very.co.uk and an example of a product page would be this - https://www.very.co.uk/berghaus-combust-reflect-long-jacket-red/1600352465.prd
Looking at that page, there seems to be no easy way to write a script that could identify this item as womenswear. Other websites might have breadcrumbs to use, or the gender might be in the title / URL, but this one has nothing.
As I'm using Scrapy, with the crawl template and Rules to build a hierarchy of links to scrape, I was wondering if it's possible to pass a variable in one of the rules or start URLs so that every item scraped via that rule / start URL is marked as womenswear. I could then feed this variable into a method / loader statement to tag the item as womenswear before putting it into a database.
If not, does anyone have any other ideas on how to categorise this item as womenswear? I saw an example where you could use an Excel spreadsheet to create the start_urls and tag each row in that spreadsheet as womenswear, menswear etc. However, I feel this method might cause issues further down the line and would prefer to avoid it if possible. I'll spare the details of why I think this would be problematic unless anyone asks.
Thanks in advance
There does seem to be a breadcrumb in your example; however, as an alternative, you can usually check the page source by simply searching for your term. Maybe there's some embedded JavaScript/JSON that can be extracted?
Here you can see some javascript for subcategory that indicates that it's a "womens_everyday_sports_jacket".
You can parse it quite easily with some regex:
import re
# response.text replaces the deprecated response.body_as_unicode()
re.findall(r'subcategory: "(.+?)"', response.text)
# ['womens_everyday_sports_jacket']
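To turn that subcategory into the womenswear tag the question asks for, one option is to map its prefix to a gender. A minimal sketch, assuming the page embeds text of the form subcategory: "..." and that values start with womens_ or mens_; both the GENDER_PREFIXES table and the gender_from_page helper are hypothetical and should be checked against real pages:

```python
import re

# Assumed embed format on the product page: subcategory: "womens_everyday_sports_jacket"
SUBCATEGORY_RE = re.compile(r'subcategory: "(.+?)"')

# Hypothetical prefix-to-gender table; verify against real subcategory values
GENDER_PREFIXES = {"womens_": "womenswear", "mens_": "menswear"}

def gender_from_page(html):
    """Return a gender tag derived from the embedded subcategory, or None if absent."""
    match = SUBCATEGORY_RE.search(html)
    if match is None:
        return None
    subcategory = match.group(1)
    for prefix, gender in GENDER_PREFIXES.items():
        if subcategory.startswith(prefix):
            return gender
    return None

# In a Scrapy callback you would pass response.text instead of this literal
print(gender_from_page('var data = {subcategory: "womens_everyday_sports_jacket"};'))
```

The returned tag can then be set on the item (for example through the loader) before it goes into the database.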

RSS feed for BlogSpot

Is it possible to get RSS feed for BlogSpot for specific keywords?
I have tried the URLs below, but they do not seem to work:
Atom 1.0: https://blogname.blogspot.com/feeds/posts/default/-/[label]
RSS 2.0: https://blogname.blogspot.com/feeds/posts/default/-/[label]?alt=rss
For keyword-specific feeds, use one of the following endpoints:
https://yourblogname.blogspot.com/feeds/posts/default?q=KEYWORD
https://www.blogger.com/feeds/BLOGID/posts/default?q=KEYWORD
The keyword will need to be passed as a query string to the q query parameter.
Be sure to enable the blog feed:
Go to Settings > Others > Site Feed > Allow Blog Feed, then select Full.
Blogger labels are case sensitive: Food is treated differently from food.
An example: https://fordemos.blogspot.com/feeds/posts/default/-/Food?alt=rss
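Both feed shapes from this answer can be assembled programmatically. A minimal sketch; the helper names are hypothetical, and the URL patterns are the ones quoted above (label path with optional ?alt=rss, keyword in the q query parameter):

```python
from urllib.parse import quote, urlencode

def label_feed_url(blog, label, rss=True):
    """Label feed; labels are case sensitive, so "Food" and "food" differ."""
    url = f"https://{blog}.blogspot.com/feeds/posts/default/-/{quote(label)}"
    # ?alt=rss requests RSS 2.0 instead of the default Atom feed
    return url + "?alt=rss" if rss else url

def keyword_feed_url(blog, keyword):
    """Keyword feed; the keyword is passed in the q query parameter."""
    return f"https://{blog}.blogspot.com/feeds/posts/default?{urlencode({'q': keyword})}"

print(label_feed_url("fordemos", "Food"))
print(keyword_feed_url("fordemos", "chicken soup"))
```

The first call reproduces the example URL above; the second yields ...?q=chicken+soup.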

URL format for Google News RSS feed

Google deprecated the old RSS feed URL format on December 1st, 2017 (deprecation notice). In addition, they dropped the button in the Google News interface that generated an RSS URL (news mentioning this change).
This means that there is no public or documented method of generating a new RSS link. The only documentation they have is out of date since they changed the interface.
What is the new format for generating an RSS feed for a Google News topic?
Found an up-to-date library (1) that uses Google News RSS.
The new URL format seems to be:
Top news:
https://news.google.com/news/rss
By major topic:
https://news.google.com/news/rss/headlines/section/topic/{topic}
Where {topic} is one of the following values: WORLD NATION BUSINESS TECHNOLOGY ENTERTAINMENT SPORTS SCIENCE HEALTH
By any/custom topic:
Once at https://news.google.com, browse to the desired topic, for example this. Identify the topic ID in its URL, e.g. CAAqIQgKIhtDQkFTRGdvSUwyMHZNR056T1hFU0FtVnVLQUFQAQ, and use the format:
https://news.google.com/rss/topics/{id}?hl={lang}
In the format above, essentially rss/ is added after https://news.google.com/.
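Because the feed URL is the browser URL with rss/ inserted after the host, the conversion can be automated. A minimal sketch with a hypothetical to_rss_url helper; it assumes the browser URL looks like https://news.google.com/topics/{id}?... and keeps any existing query string (such as hl) unchanged:

```python
from urllib.parse import urlsplit, urlunsplit

def to_rss_url(page_url):
    """Insert "/rss" after news.google.com, keeping path and query as they are."""
    parts = urlsplit(page_url)
    if parts.netloc != "news.google.com":
        raise ValueError("expected a news.google.com URL")
    if parts.path.startswith("/rss/"):
        return page_url  # already a feed URL
    return urlunsplit((parts.scheme, parts.netloc, "/rss" + parts.path, parts.query, parts.fragment))

print(to_rss_url("https://news.google.com/topics/CAAqIQgKIhtDQkFTRGdvSUwyMHZNR056T1hFU0FtVnVLQUFQAQ?hl=en"))
```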
By geolocation:
https://news.google.com/news/rss/headlines/section/geo/{location}
I'm not sure about the format of the {location} parameter.
By search query:
New link: https://news.google.com/rss/search?q={query}
Old link: https://news.google.com/news/rss/search/section/q/{query}
Here {query} is a free-text search.
Specifying country and language:
For example, if you wish to have news in Swedish from Swedish sources, add the following query string to the URL to change country and language to sv-SE:
?hl=sv&gl=SE&ceid=SE%3Asv
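When the URL already carries ?q=... (as the search feed does), these parameters have to be joined with & rather than a second ?. Letting urlencode build the whole query string avoids that mistake and also writes the ":" in ceid as %3A, as in the example above. A minimal sketch; localized_search_url is a hypothetical helper:

```python
from urllib.parse import urlencode

def localized_search_url(query, lang, country):
    """Search feed restricted to one language/country pair, e.g. ("sv", "SE")."""
    params = {
        "q": query,
        "hl": lang,                    # language, e.g. "sv"
        "gl": country,                 # country, e.g. "SE"
        "ceid": f"{country}:{lang}",   # urlencode encodes ":" as %3A
    }
    return "https://news.google.com/rss/search?" + urlencode(params)

print(localized_search_url("energy efficiency", "sv", "SE"))
```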
Requests to the Geo endpoint seem to be working again.
e.g. https://news.google.com/news/rss/headlines/section/geo/{place_name}
Also, if you use the non-geo search, you can specify a 7-day window by adding +when:7d to your search.
e.g. https://news.google.com/rss/search?q={key_words}+when:7d
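Note that when:7d is part of the search text itself, not a separate parameter, so it is appended to the keywords before encoding. A minimal sketch with a hypothetical recent_search_url helper; it assumes the server treats the encoded form when%3A7d the same as the literal when:7d shown above, which is how percent-encoding is normally decoded:

```python
from urllib.parse import urlencode

def recent_search_url(keywords, days=7):
    """Search feed limited to the last `days` days via the when:Nd operator."""
    query = f"{keywords} when:{days}d"  # operator lives inside q
    return "https://news.google.com/rss/search?" + urlencode({"q": query})

print(recent_search_url("energy efficiency"))
```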
This option isn't valid anymore:
https://news.google.com/news/rss/headlines/section/topic/{topic}
It produces an Error 500.
This seems to work:
https://news.google.com/news?cf=all&hl=en&pz=1&ned=us&q=astronomy&output=rss
The geolocation endpoint mentioned above still works too. You can also specify city and state:
https://news.google.com/news/rss/headlines/section/geo/DenverCo
Updated Google RSS News Feed
You can try this as well.
https://news.google.com/rss?hl=en-NG&gl=NG&ceid=NG:en
I was also looking for documentation. This is the best article I found.
https://blog.newscatcherapi.com/google-news-rss/
When using search, you can also exclude articles that contain a certain keyword.
For example, to search for pages that contain the word "apple" but not the word "pie", specify:
q=apple%20-pie
or in full
https://news.google.com/rss/search?q=apple%20-pie&hl=en-GB&gl=GB&ceid=GB:en
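Excluded words are simply prefixed with "-" inside q, so several exclusions can be chained. A minimal sketch with a hypothetical exclusion_search_url helper; urlencode writes the space as "+" instead of %20, which should be equivalent in a query string (the question's own energy+efficiency URL relies on the same form):

```python
from urllib.parse import urlencode

def exclusion_search_url(required, excluded):
    """Search for `required` while dropping articles that mention any excluded word."""
    # A leading "-" marks a word to exclude, e.g. "apple -pie"
    query = " ".join([required] + [f"-{word}" for word in excluded])
    return "https://news.google.com/rss/search?" + urlencode({"q": query})

print(exclusion_search_url("apple", ["pie", "cider"]))
```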
The RSS feed for top stories is the simplest one. Just append /rss to https://news.google.com and you get the RSS feed of the top stories for your location.
https://news.google.com/rss
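Any of these feeds can be read with the standard library. A minimal sketch, assuming the response is plain RSS 2.0 (rss > channel > item with title and link children); parse_items and fetch_top_stories are hypothetical helpers, and Google may require extra request headers (such as a User-Agent) that this sketch does not send:

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

TOP_STORIES_URL = "https://news.google.com/rss"

def parse_items(rss_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]

def fetch_top_stories():
    """Download the top-stories feed and parse it (performs a network request)."""
    with urlopen(TOP_STORIES_URL, timeout=10) as response:
        return parse_items(response.read())
```

Usage: for title, link in fetch_top_stories()[:5]: print(title, link). Parsing is kept separate from fetching so it can be reused for the search and topic feeds above.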

Wordpress feed - (A feed could not be found at)

I had set up two WordPress development sites; the URLs were something like mysite.com/dev/blog1 and mysite.com/dev/blog2. blog2 would fetch posts from categories on blog1 using fetch_feed(), e.g. fetch_feed(mysite.com/dev/blog1/category/fun/feed), and everything worked fine.
However, since moving the sites over to mysite.com/blog1 and mysite.com/blog2, the feed does not work. I get the following error:
A feed could not be found at mysite.com/blog1/category/fun/feed. A feed with an invalid mime type may fall victim to this error, or SimplePie was unable to auto-discover it.. Use force_feed() if you are certain this URL is a real feed." } } ["error_data"]=> array(0) { } }
When I go to the feed URL though, the feed does exist.
Any ideas?
The fetch_feed() function is still looking at the dev path, as highlighted below:
"A feed could not be found at mysite.com/dev/blog1/category/fun/feed. A feed with an invalid mime type may fall vic....."
You will need to remove the dev part from the url.
I had the same problem. I tried different WordPress RSS plugins and got the same result: "RSS Error: A feed could not be found at ..."
(For almost all RSS feeds except for feeds from feedburner!)
I contacted our provider / web hosting service, they "unlocked" the domain of the rss feed within a minute and now it works perfectly!
If you're using a web host, you may need to contact their support for help.

Get Comments posted on a page on facebook

I am trying to learn the Facebook SDK using ASP.NET. So far I have managed to get the friend list, the athletes' pages I like, and the music I like. Now I am trying to get the comments that are posted on my timeline. Which keyword should I use to fetch the comments? I am using the following line to fetch friends, favourite athletes, music etc. How do I fetch comments?
dynamic me = fb.Get("me?fields=friends,name,email,favorite_athletes,music,likes");
