For an assignment I need to count the number of news items for different stocks and compare them in a graph. Below is the link to the site, and I attached an example for one of the stocks, where you can see a count of 49 for that specific stock.
What would be the best way to attack this, and with which package: bs4 or another one? What is the best approach here?
url='https://www.calcalist.co.il/stocks/home/0,7340,L-3959-1102532--4,00.html'
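For a static page like this, requests plus BeautifulSoup (bs4) is usually enough. A minimal sketch, assuming the news entries sit in some repeated container element; the div.news-item selector is a placeholder you'd replace after inspecting the page, and if the list turns out to be rendered by JavaScript you'd need a browser-driving tool such as Selenium instead:

# Minimal sketch: fetch the page and count the news items with bs4.
# 'div.news-item' is a placeholder selector -- inspect the page to find
# the element that actually wraps each news entry.
import requests
from bs4 import BeautifulSoup

url = 'https://www.calcalist.co.il/stocks/home/0,7340,L-3959-1102532--4,00.html'
headers = {'User-Agent': 'Mozilla/5.0'}  # some sites reject the default client

response = requests.get(url, headers=headers)
response.raise_for_status()

soup = BeautifulSoup(response.text, 'html.parser')
items = soup.select('div.news-item')  # placeholder selector
print(f'Found {len(items)} news items')

Repeating this per stock URL gives you the counts to feed into a bar chart.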
I'm trying to get rankings data for NFT collections sorted by their highest all-time volume. It seems that the OpenSea API does not currently support ranked lists as an endpoint. As a workaround, I'm looking at web scraping to fetch the all-time volume rankings from https://opensea.io/rankings?sortBy=total_volume.
However, I am having difficulty fetching data for any entry in the rankings list past 100 items, i.e. page 2 of the rankings and onwards. The OpenSea URL does not change when I click through the list of ranks at the bottom of the page (101-201).
Any ideas on how I could automate web scraping for ranks past the first 100 entries?
I'd appreciate any help here. And thanks for your help in advance!
Have you checked out this library, which does the scraping for you under the hood? I have tested some endpoints and it appears to return data: https://github.com/dcts/opensea-scraper
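If you'd rather script it yourself, a common pattern when the URL doesn't change between pages is to drive a real browser and click the pagination controls. A rough Selenium (Python) sketch; both selectors below are placeholders that would need checking against OpenSea's live, client-side-rendered markup:

# Rough sketch: page through opensea.io/rankings by clicking the
# pagination controls, since the URL itself never changes.
# Both CSS selectors are placeholders, not OpenSea's actual markup.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://opensea.io/rankings?sortBy=total_volume')
time.sleep(5)  # crude wait for the client-side render; WebDriverWait is nicer

for page in range(5):
    rows = driver.find_elements(By.CSS_SELECTOR, '[role="listitem"]')  # placeholder
    for row in rows:
        print(row.text)
    next_button = driver.find_element(By.CSS_SELECTOR, 'button[aria-label="Next"]')  # placeholder
    next_button.click()
    time.sleep(5)

driver.quit()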
I am trying to learn about web-scraping and as an application I figured I'd build an aggregator that crawls retailers for certain products and sets up a price comparison for the same product from different retailers.
As I got started on this I realized exactly how large a task this is.
First, I need to crawl sites that vary not only in their DOM structures but also in the names they use for the same products, and in how they format items' prices, including sale prices.
Second, after I've somehow decoded the DOM for x number of sites (doing it for one or two is easy, but I want to make the crawler scalable!) and fetched the data for various items, I need to be able to match the different names of the same product so I can compare the differing prices between retailers (convert them to the same currency, check whether the returned price is the original or on-sale price, etc.).
I am writing my crawlers using Scrapy. Can someone recommend an approach for adapting the crawler to a variety of retailers, and any libraries/approaches that would work well for the second problem of matching like (and unlike) items?
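One way to keep a Scrapy crawler scalable across retailers is to push every site-specific detail (start URLs, CSS selectors) into a per-retailer config and run a single generic spider over it. A sketch of that idea; every retailer entry and selector below is made up:

# Sketch: one generic Scrapy spider driven by per-retailer configs.
# All domains and selectors here are illustrative placeholders.
import scrapy

RETAILERS = {
    'shop_a': {
        'start_urls': ['https://shop-a.example.com/products'],
        'item_selector': 'div.product',
        'name_selector': 'h2.title::text',
        'price_selector': 'span.price::text',
    },
    'shop_b': {
        'start_urls': ['https://shop-b.example.com/catalog'],
        'item_selector': 'li.item',
        'name_selector': 'a.name::text',
        'price_selector': 'div.cost::text',
    },
}

class RetailerSpider(scrapy.Spider):
    name = 'retailer'

    def __init__(self, retailer='shop_a', *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.retailer = retailer
        self.config = RETAILERS[retailer]
        self.start_urls = self.config['start_urls']

    def parse(self, response):
        cfg = self.config
        for product in response.css(cfg['item_selector']):
            yield {
                'retailer': self.retailer,
                'name': product.css(cfg['name_selector']).get(),
                'price': product.css(cfg['price_selector']).get(),
            }

You'd run it per retailer with scrapy crawl retailer -a retailer=shop_b, and onboarding a new retailer then means adding a config entry rather than writing a new spider.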
For the comparison, you can convert the product-name strings to token lists, compare them, and apply a threshold to decide whether two products are the same or not.
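A minimal sketch of that idea, using token sets and Jaccard similarity; the 0.6 threshold is arbitrary and would need tuning against real data:

# Sketch: token-based similarity between product names with a threshold.
def tokens(name):
    return set(name.lower().split())

def same_product(name_a, name_b, threshold=0.6):
    a, b = tokens(name_a), tokens(name_b)
    union = len(a | b)
    # Jaccard similarity: shared tokens over all distinct tokens
    return union > 0 and len(a & b) / union >= threshold

print(same_product('Sony WH-1000XM4 Headphones',
                   'Sony WH-1000XM4 Wireless Headphones'))  # True
print(same_product('Sony WH-1000XM4 Headphones', 'Bose QC45'))  # False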
I'm trying to get all unique visitors for a selected time period, but I want to filter them by date on the server. However, the sum of unique visitors for each day isn't the number of unique visitors for the time period.
For example:
Monday: 2 unique visitors
Tuesday: 3 unique visitors
The unique visitors for the two-day period isn't necessarily 5.
Is there a way to get the results I want using the Google Analytics API (v3)?
You're right that Users aren't additive, so you can't simply add them day by day. There are several ways around this.
The first and most obvious is that if you've implemented the User-ID feature, you should be able to pull and interrogate the data directly about which users saw your site on which days.
Another way I've implemented before is to dynamically pull the number of Users from the Google Analytics API whenever you need it. Obviously this only works if you're populating a live web dashboard or similar, but since it's just the one figure you're asking for, it wouldn't slow down the load time by much. E.g. if you're using a dashboarding tool such as Klipfolio, you may be able to define a dynamic data source and query Google whenever you need the figure (https://support.klipfolio.com/hc/en-us/articles/216183237-BETA-Working-with-dynamic-data-sources).
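As a sketch of that dynamic pull: the v3 Core Reporting API returns a single deduplicated ga:users figure for whatever date range you ask for. The view ID and key-file path below are placeholders:

# Sketch: pull deduplicated Users for an arbitrary date range from the
# Google Analytics Core Reporting API (v3).
from googleapiclient.discovery import build
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    'service-account.json',  # placeholder key file
    scopes=['https://www.googleapis.com/auth/analytics.readonly'])
service = build('analytics', 'v3', credentials=credentials)

result = service.data().ga().get(
    ids='ga:12345678',        # placeholder view ID
    start_date='2023-01-02',  # the Monday from the example
    end_date='2023-01-03',    # the Tuesday
    metrics='ga:users').execute()

print(result['rows'])  # one Users figure, deduplicated across the whole range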
You could also limit the number of ways that the data can be interrogated, and calculate all of them. For example, if you only allow users to look at data month-by-month or day-by-day, then you only need those figures.
Finally, you can estimate the figure with reasonable accuracy by splitting it into two parts. New Users are equal to New Sessions (you're only new on your first Session), which is additive, so that figure can be separated out and combined as required.
Then, you could take a rough ratio of new to returning Users (% New Users) from, say, 1 year of data, and use that with the New Users figure to generate an estimate at any level.
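A worked example of that estimate, with made-up numbers:

# Sketch of the estimate described above; both inputs are made up.
new_users = 400        # additive, so summing across days is safe
pct_new_users = 0.55   # rough ratio taken from ~1 year of data

estimated_total_users = new_users / pct_new_users
print(round(estimated_total_users))  # ~727 users for the period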
I'm trying to use searchTwitter() to find a certain topic on Twitter. For example:
searchTwitter("#Fast and Furious 7", n = 10000)
can only give me a few thousand results. I have also done some research on other topics; judging by the dates in the results, it seems it can only return results from about 9 days back. (There are arguments called since and until for specifying a time range, but they don't work.)
So I'm wondering: is there a way to get all the information for this topic, or at least to control the date range?
Apart from this, can I use the XML package in R to achieve the same purpose?
Twitter provides search for the last few days only.
The cost of keeping the data indexed is too high, given the few users interested. Twitter's business model is live information.
If you want historical data, you will have to buy this from third party providers. I don't remember the name, but a company offering such data was linked from the Twitter web page where they explained this limitation of their search API.
Our website has many related products from a large number of different brands.
I'd like to try to use Google Analytics data to find out how many different brands (which I have set as a hit-level custom dimension) a user will look at, on average, over a given time period.
I'm not sure if this is possible, but it would be really cool to know!
It would help with understanding brand loyalty/defection.
I started by creating a simple custom report to see how many manufacturers are seen by how many users, but I don't know where to go from there to answer the question!
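One possible next step (a sketch under assumptions, not a tested recipe): pull ga:users broken down by the brand dimension, sum the per-brand user counts to get user-brand pairs, then divide by total users for a rough average. The view ID, key file, and dimension index (ga:dimension1) are all placeholders:

# Sketch: approximate the average number of brands seen per user via the
# Core Reporting API (v3). Adjust ga:dimension1 to your brand dimension.
from googleapiclient.discovery import build
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    'service-account.json',  # placeholder key file
    scopes=['https://www.googleapis.com/auth/analytics.readonly'])
service = build('analytics', 'v3', credentials=credentials)

# Users per brand: a user who saw three brands is counted once in each row.
per_brand = service.data().ga().get(
    ids='ga:12345678', start_date='30daysAgo', end_date='today',
    metrics='ga:users', dimensions='ga:dimension1').execute()

# Total deduplicated users over the same period.
total = service.data().ga().get(
    ids='ga:12345678', start_date='30daysAgo', end_date='today',
    metrics='ga:users').execute()

user_brand_pairs = sum(int(row[1]) for row in per_brand['rows'])
total_users = int(total['rows'][0][0])
print('average brands per user ~', user_brand_pairs / total_users)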