What's the best way to scrape Spectrum community data? - web-scraping

I'd like to scrape Spectrum channel data. I was thinking about using a scraping tool like Puppeteer to log in as a user and scrape a particular channel. One problem I can think of is that I may not have access to the chat history from before I joined. How do you solve this problem?

Create a user on that site and then use those credentials in the scraping code. That account will then have all the access it needs. I used HtmlAgilityPack (a .NET HTML parser), which I find great for this.
https://www.nuget.org/packages/HtmlAgilityPack/
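The answer shows no code, and HtmlAgilityPack is .NET-only. As a language-neutral illustration, here is a minimal Python sketch of the same idea using only the standard library: keep the scraper account's session cookies so that every later request is made as a logged-in member. It assumes the site uses a plain username/password form; the form field names `username` and `password` and any URLs you pass in are hypothetical, so inspect the real login form first. Sites that sign in through third-party OAuth providers need a different approach, which is one reason a headless browser such as Puppeteer can be simpler there.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical form field names; copy the real ones from the site's login form.
USERNAME_FIELD = "username"
PASSWORD_FIELD = "password"


def make_session() -> urllib.request.OpenerDirector:
    """Return an opener that stores cookies, so a login persists across requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def build_login_request(login_url: str, user: str, password: str) -> urllib.request.Request:
    """Build a form-encoded POST carrying the scraper account's credentials."""
    body = urllib.parse.urlencode({USERNAME_FIELD: user, PASSWORD_FIELD: password})
    return urllib.request.Request(
        login_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_as_member(opener, login_url, user, password, page_url) -> str:
    """Log in once, then fetch a members-only page with the same cookie jar."""
    with opener.open(build_login_request(login_url, user, password)):
        pass  # the session cookie is captured by the opener's cookie jar
    with opener.open(page_url) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

The returned HTML can then go to whatever parser you prefer; whether older messages are visible still depends on what the site shows to a newly joined member.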

Related

I am an app developer and I am unable to access the Create Passenger Name Record API

I have sent a couple of emails to the support team about becoming a Sabre customer, and I have submitted the application for access at the following link:
https://www.sabretravelnetwork.com/home/solutions/travel_agency/contract_selector/without_arc2
Please let me know if I am missing anything.
Thanks
Access to the PNR (Passenger Name Record) API requires a contract with Sabre. They only give this access to travel agents or companies writing services for travel, and there are associated fees. You also need to be aware that there is a cost for every PNR you create. So it's not as easy as just getting access to the PNR.
I know this is not the answer you want, but it's how it works.
If you're just trying to build a small booking engine, I would suggest looking into Expedia's API toolkit. It's much easier and a lot less expensive to get into.

Is there a good way to link registered users' emails with data in google analytics?

If I build a website for my new awesome mobile app (or web service or whatever), I might want to do a slow launch, sending email invites to the first x people to register on the site.
Is there a good way to link each registered email to the corresponding data in Google Analytics (or any similar service), and query them based on location, language, etc.?
Maybe the Spanish version isn't quite done yet, so I don't want to invite people who signed up using a Spanish-language browser. Or maybe my app is location-dependent (like timetables for buses) and just doesn't work at all outside my home town.
I really want to have a simple email-only "registration".
It is completely possible, although it may breach some of GA's terms of use if done wrong.
You should not store email addresses in any way as part of your GA data because it would be considered personally identifiable data. However, there is nothing saying that you couldn't store a kind of GUID for each user, and then compare that with email addresses offline - although the user should be made aware that any actions they take while using your service/application/whatever are being tracked with the capability of being personally identified.
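A minimal sketch of the offline side of this idea, assuming an in-memory dictionary as the store (use your real database in practice): each sign-up gets a random GUID, only the GUID is sent to analytics, and the email lookup happens on your own server. The shape of the exported analytics rows (`guid`, `language`, `city` keys) is hypothetical; it stands in for whatever report you export.

```python
import uuid

# Offline store mapping an anonymous GUID to the registered email.
# Only the GUID is ever sent to analytics; the email stays on your server.
registrations: dict[str, str] = {}


def register(email: str) -> str:
    """Record a sign-up and return the GUID to attach to analytics hits."""
    guid = str(uuid.uuid4())
    registrations[guid] = email
    return guid


def pick_invitees(analytics_rows, excluded_language_prefix="es", city=None):
    """Join exported analytics rows (keyed by GUID) back to emails offline.

    Each row is assumed to be a dict with 'guid', 'language' and 'city'
    keys: a hypothetical shape for a report exported from your analytics tool.
    """
    invitees = []
    for row in analytics_rows:
        email = registrations.get(row["guid"])
        if email is None:
            continue  # hit from a visitor who never registered
        if row["language"].lower().startswith(excluded_language_prefix):
            continue  # e.g. skip Spanish browsers until that version is ready
        if city is not None and row["city"] != city:
            continue  # location-dependent app: only invite the home town
        invitees.append(email)
    return invitees
```

Keeping the join offline is what lets you filter by language or location without ever putting an email address into the analytics data.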
As far as getting the actual data that you are discussing, language and location are stored by GA by default, so no headache there!
The best way to store the user's GUID would probably be in a custom dimension. How you do this is going to depend on how you build your product. I had to write a tracking library using the Measurement Protocol for an AS3 project a while back because there isn't a supported AS3 library anymore. If you are using JavaScript, it will be much easier, as Google offers native JS libraries to handle web analytics.
Finally, try taking a look at the documentation. It's pretty easy to understand.
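To make the custom-dimension idea concrete, here is a hedged sketch of a pageview hit for the Universal Analytics (v1) Measurement Protocol, the version this answer refers to. Google has since retired Universal Analytics in favour of GA4, whose Measurement Protocol uses a different JSON-based format, so treat this as illustrative and check the current documentation. The tracking ID, client ID and custom dimension index `1` are placeholders; in Universal Analytics the dimension had to be defined in the property's admin settings first.

```python
import urllib.parse
import urllib.request

COLLECT_URL = "https://www.google-analytics.com/collect"


def build_hit(tracking_id: str, client_id: str, page: str, user_guid: str,
              dimension_index: int = 1) -> bytes:
    """Encode a pageview hit that carries the user's GUID in a custom dimension."""
    params = {
        "v": "1",                 # protocol version
        "tid": tracking_id,       # property ID, e.g. "UA-XXXXX-Y" (placeholder)
        "cid": client_id,         # anonymous client ID
        "t": "pageview",          # hit type
        "dp": page,               # document path
        f"cd{dimension_index}": user_guid,  # custom dimension slot (assumed index 1)
    }
    return urllib.parse.urlencode(params).encode("ascii")


def send_hit(payload: bytes) -> int:
    """POST the hit to the collection endpoint and return the HTTP status code."""
    req = urllib.request.Request(COLLECT_URL, data=payload, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Note that the GUID, not the email address, is what goes into `cd1`, which keeps the hit free of personally identifiable data.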

Retrieving Linkedin Group discussion posts

I am making an application that has to connect to LinkedIn and, after connecting, retrieve group discussion posts and any other group-related information I can get.
I don't know where to start :(
I have an API key and a secret key.
Can anyone provide sample code?
Regards
I don't know where to start
The documentation of the API you are trying to use is generally a good place to start.
Here's how I would proceed if I were in your place:
Head over to the documentation
Read it carefully
Start designing a POC (proof-of-concept). Could be a simple console application in which you attempt to consume the API.
Once you have a working POC integrate it into your actual application
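A first POC can be as small as a console script that makes one authenticated call and prints the raw response. The sketch below assumes you have already obtained an OAuth 2.0 access token (typically through the OAuth flow using your API key and secret). The endpoint URL is deliberately left as a command-line argument rather than guessed, because the right one must come from LinkedIn's current documentation, and access to some of its APIs is restricted.

```python
import json
import sys
import urllib.request


def build_request(endpoint: str, access_token: str) -> urllib.request.Request:
    """Attach an OAuth 2.0 bearer token to a GET request."""
    return urllib.request.Request(
        endpoint,
        headers={"Authorization": f"Bearer {access_token}",
                 "Accept": "application/json"},
    )


def main(argv: list[str]) -> int:
    """Console entry point: main(["poc.py", ENDPOINT_URL, ACCESS_TOKEN])."""
    if len(argv) != 3:
        print("usage: poc.py ENDPOINT_URL ACCESS_TOKEN", file=sys.stderr)
        return 2
    endpoint, token = argv[1], argv[2]
    with urllib.request.urlopen(build_request(endpoint, token)) as resp:
        data = json.load(resp)
    # Dump the raw response so you can see what the API actually returns.
    print(json.dumps(data, indent=2))
    return 0
```

Call `sys.exit(main(sys.argv))` from your script's entry point. Once this prints something sensible, the same request-building code can move into the real application.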
If you encounter specific problems along the way, don't hesitate to show your progress here and ask a concrete question instead of a give-me-the-code type of question.

Script or Library to find contact means on a website

Does anyone know a script/recipe/library to find most relevant contact information on a website?
Some possible case:
Find contact phone number on a personal web page
Find owner email address on a blog
Find url of the contact page
Check out WSO2's Mashup Server. You can run it on your local machine and follow the tutorial for scraping. You could pass the dynamic parameters you need into the <http> element of the scraper to loop through multiple sites running the same scrape, then push everything to a collection source (AJAX application for capturing the information or store inside WSO2 server). You can write very complex search patterns using XPath and XSLT to capture the information you want.
I don't have enough information about the specific sites you are scraping to help with the script, but any way you go, it's going to take a lot of trial and error until you get the result you are looking for.
Happy scraping!
I'm not aware of any libraries that do this.
Hm, I would use regular expressions to match for phone numbers and email addresses, combined with a web spider that walks the site, and then a method for ranking the contact information.
Contact information is also typically paired with one of a few common labels such as "Support", "Support email", "Sales", etc. There are probably a dozen or so variations of these labels that will cover 95% of all sites in English.
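The regex-plus-labels ranking described above can be sketched roughly as follows. The regular expressions are deliberately loose (the phone pattern only covers North-American-style numbers), and the label list, the 40-character window and the scoring rule (count of labels found just before a match) are all assumptions to tune, not a tested heuristic.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Loose North-American-style pattern; real sites need more variants.
PHONE_RE = re.compile(r"\+?\d?[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
# Assumed set of common labels that tend to precede contact details.
LABELS = ("contact", "support", "sales", "email", "phone", "call us", "owner")


def extract_contacts(text: str, window: int = 40) -> list[tuple[int, str]]:
    """Return (score, value) pairs for emails and phones, highest score first.

    The score is the number of LABELS found in the `window` characters
    immediately before each match.
    """
    results = []
    for pattern in (EMAIL_RE, PHONE_RE):
        for m in pattern.finditer(text):
            context = text[max(0, m.start() - window):m.start()].lower()
            score = sum(label in context for label in LABELS)
            results.append((score, m.group().strip()))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results
```

For example, on a page reading "Posted by bob@example.com ... Support email: help@example.com", the labelled support address outranks the author's address.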
So, basically I would start by building a simple recursive web spider that walks all the publicly accessible pages in a given domain, parsing the HTML for email addresses and phone numbers, and making a list of them, and then ranking them based on whether or not they are listed near to any of the common labels.
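Here is a minimal sketch of the spider part, with one deliberate swap: it walks the site breadth-first with a queue instead of recursion (same set of pages, but no recursion-depth limit on large sites). The `fetch` function is supplied by the caller and is hypothetical here; in practice it would wrap an HTTP client, respect robots.txt, and return `None` for errors or non-HTML responses. `max_pages` is an assumed safety cap.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=200):
    """Breadth-first walk of pages on the same host as start_url.

    `fetch(url)` must return the page's HTML as a string, or None to skip it.
    Yields (url, html) for every page successfully fetched.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        fetched += 1
        yield url, html
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so pages aren't revisited.
            link, _ = urldefrag(urljoin(url, href))
            # Stay on the same domain; mailto: and external links are skipped.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
```

Each yielded page's text can then be scanned for emails and phone numbers and ranked by the nearby labels.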
It won't be perfect, but then again, that's part of the value of the algorithm - making it smarter, and tweaking it over time until it gets better.

Membership bulk email software

We have a website on the Microsoft web stack and a members database.
We want to start doing mass-emails to our (opted in) membership.
I don't particularly want to re-invent the wheel with a system for having a web form submit and then mass send emails looping through thousands of records without timing out and overloading the server...
I guess I'm looking for something like Mailman that can run on a Windows (ASP/ASP.NET + SQL Server) backend and do the work for me, with suitable APIs to manage the subscriber list and send emails to that list.
Suggestions please?
I agree with acrosman: third parties that host email lists are a good way to go. A very reliable site I've found for mass emailing is http://mailing-list-services.com/. They do a good job of making sure their servers are never blacklisted or marked as spam. I've used them a few times; their website design is poor, but their service is awesome. The Lyris ListManager software they use has a pretty extensive API.
Advanced Intellect has some great tools, like aspNetEmail and ListNanny.
MaxBulkMailer might be a solution for you? The organisation I work for uses it to connect to www.authsmtp.com which gives us credits for a certain number of e-mails that we can send per month. You can import a spreadsheet of your mailing list or tap straight into a SQL server and pull the names and addresses. Available for Mac and Windows.
(not a sales pitch)
My company offers mail manager, but it's a hosted service. It does have a full API, though.
You can also check out how DotNetNuke does this.
Unless you're running a business that specializes in email, I'd suggest finding a hosted solution. Hundreds of little issues come up over time when you run your own service. A hosted solution can save you a lot of time and effort (and therefore money).
