Web scraping through Office Scripts without an API, using getElementById, etc. - web-scraping

Please assist:
I'm familiar with VBA and C++, but not with Java. I now want to delve into Office Scripts.
However, I want to know whether I can achieve the same things I do in VBA:
I log into niche websites and fetch data from tables using VBA Internet Controls (getElementById()), etc.
As far as I know, these niche websites do not offer an API, unlike the sample web-scraping scenario on the Microsoft website:
https://learn.microsoft.com/en-us/office/dev/scripts/resources/scenarios/noaa-data-fetch
I would like to know whether I can log onto these websites and then fetch information from the HTML (getElementById() or similar).
I am just unsure whether I can use Office Scripts directly, or whether I need to include some library or something.
Any guidance would be appreciated.

Currently, there is no way to do this through Office Scripts alone. The fetch command and REST APIs are the only ways for a script to get data directly from web services. If you'd like to request the addition of a specific library, please use the Send feedback button in the Office Scripts Code Editor.
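For reference, here is a minimal sketch of the fetch approach in an Office Script, in the spirit of the NOAA scenario linked above (the URL is a placeholder, not a real endpoint):

```typescript
async function main(workbook: ExcelScript.Workbook) {
  // Placeholder endpoint: Office Scripts can only reach services that expose
  // an HTTP API; there is no DOM in this environment, so getElementById() is not available.
  const response = await fetch("https://example.com/api/data.json");
  const data: string = await response.text();

  // Write the raw response into the active worksheet for inspection.
  workbook.getActiveWorksheet().getRange("A1").setValue(data);
}
```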
The discussion in the comments about using Power Automate is a reasonable path to pursue. The linked video (https://www.youtube.com/watch?v=_O9eEotCT0U) is a good place to start.

Related

How to develop a web browser using C# .NET with installed .NET libraries and without using the web browser control?

I have searched the internet for two days and found no answer to the requirement below. What I mostly found were GeckoFX and CefSharp, which are external packages and not installed libraries. How can this be done?
I have been asked to do the following:
Use a suitable library function out of the set of libraries installed with the .NET platform. You must not use the C# WebBrowser class but perform the required HTTP-level communication directly from within your code. The code must clearly identify the HTTP-level client-server communication and must explicitly manage Home page, Favourite, History Lists and Tabs.
Optionally, you may add functionality to render a web page, but there must be an option to disable this functionality and to show only the raw HTML that has been retrieved.
Thanks
What have you attempted so far and what problem are you encountering?
Maybe read this first :)
Currently it sounds like you have been given an interview or homework task that you don't know how to solve. If so, then you should have some idea of where to start, or you are in the wrong course or job interview. If you want help, then try to solve the question yourself and ask for help when you are stuck. Tell us what you have tried, show the code you currently have, and let us know where you are stuck or what doesn't work as expected.
Where are you stuck? Fetching the webpage? Building the user interface?

Getting started with databases

I am currently a front end developer. I know HTML and CSS pretty well, I'm OK with jQuery and know some Vanilla JS. I have an idea for a website I want to create where I will be storing data for products (data that I will be grabbing from various websites around the web). It's basically a help me choose application where the user will go through some steps and be given some choices based on their selections. This site is nothing new, but it's more for learning purposes/portfolio work.
Most of my co-workers use ASP.NET, and I've seen that you can set up a website like this using ASP.NET and the provided server controls along with C#. However, I want to take another route that lets me do the same thing without ASP.NET (C# is OK, and preferred if that is possible): grab data, store data, and bind data to my page.
In addition to this, I would like to do this on the Mac.
Here's a list of things I have looked at:
MongoDB (I was really confused by the setup and didn't read anywhere that this would definitely be the solution).
AngularJS
EmberJS
BackboneJS
Several other JS frameworks
Ruby on Rails
Note about the above: Some of the above might be the solution, but I don't want to start spending time learning them only to realize a week in that this is not going to help me get to my goal.
If this post would be better suited for another stack site please let me know. Thank you.
To create a basic website with persistence you'll need to deal with three parts: the front end (client), the back end (server), and the persistence layer (database).
Of the things you've listed, Angular, Ember, and Backbone are all front-end frameworks. They each have their own way of approaching the problem, but they all work in the client-facing part of the project: views, interaction, and dispatching data to the back end.
Rails is the only thing you've listed that's a back-end framework; another option for the back end, if you're more familiar with JS, might be Node and Express. Node allows you to build a server in JS, and Express is one of the more popular Node frameworks. That part is responsible for receiving requests for data (and requests carrying data) from the front end and dispatching the appropriate response.
Rails typically works with a SQL database like MySQL or Postgres out of the box, because Rails' Active Record is meant to work with SQL. Mongo is a NoSQL database; people are getting it working with Rails, but I don't know that it's very common. Mongo's shell is pretty much JavaScript and it stores data as JSON (not technically, but close enough), so it's been a comfortable choice for JS developers learning back-end work.
Either Rails or Node can get a server up and running locally on your machine so you can work with the full architecture. So what it comes down to is picking one of each from those sections and making them play nicely together. For your purposes, I would think the way to go would be either a basic Rails app (probably with MySQL) using jQuery AJAX calls from the front end, or something built with the so-called MEAN stack (Mongo, Express, Angular, Node), which is all JS, using Angular's built-in HTTP functionality to handle those calls.
Hope this at least narrows the field of research a bit. It's really a pretty open question and there are a lot of options.
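For illustration, here is a minimal sketch of the back-end piece using Node and Express (written in TypeScript; the route and the data are made up):

```typescript
// Minimal Express back end: the front end (jQuery AJAX, Angular HTTP, etc.)
// calls this route and receives JSON. Route name and data are hypothetical.
import express from "express";

const app = express();
app.use(express.json());

// Hypothetical endpoint returning product data for the "help me choose" flow.
app.get("/api/products", (_req, res) => {
  res.json([
    { id: 1, name: "Sample product", category: "demo" },
    { id: 2, name: "Another product", category: "demo" },
  ]);
});

app.listen(3000, () => console.log("API listening on http://localhost:3000"));
```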
What is your web hosting site? I suggest phpMyAdmin or a MySQL database. You can create tables and strings where you can put the websites you want to "grab" data from, and add a little JavaScript to tell your website: if blahblahblah = blahblahblah, then get id="website1".
Some clarifications:
First, you need to distinguish between a server-side language (used to program the functionality) and a database (used to store the data).
C# is a language of the .NET Framework. For websites, there's no C# without ASP.NET.
There are two major camps for building back-end solutions: PHP (market share ~40%) and ASP.NET (~25%). PHP is a programming language; ASP.NET incorporates several programming languages (mainly C# and VB.NET).
Both worlds can connect to databases: for PHP this is mainly MySQL, for ASP.NET it is mostly Microsoft SQL Server.

How to collect contact information from websites?

Does anyone know a web crawler tool for collecting contact details from a website? Say I have www.website/contact; I want to pull out the address, phone number, etc. There are two tools I've been looking at: crawler4j, an open-source JAR for Java, and Scrapy, open source in Python. But I am finding them a bit hard to use for my scenario.
Any suggestions would be great. Thanks
You might google for "simple web crawler" to find a solution that fits you best. On the net there are plenty of "pure Python" web crawlers. Starting from skeleton code, you add the database wrapper yourself. I think the biggest problem would be setting up the database and saving the data in it.
What if there are millions of websites to crawl? Is there a way to crawl all websites in my area?
No problem for scripting. Just put the millions of addresses in a file (or files) and open it for reading in Python or another scripting language. Then take the links from it one by one and crawl/scrape to your pleasure. You might also want to save the results to a file (CSV, JSON).
I'd also recommend a ready-made simple Python crawler.
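The answers above suggest Python, but the loop itself is simple in any language; here is a rough equivalent in Node/TypeScript (Node 18+ for the built-in fetch; the file names are hypothetical):

```typescript
// A rough equivalent of the loop described above: read URLs from a file,
// fetch each one, and save the results as JSON. File names are hypothetical.
import { readFileSync, writeFileSync } from "fs";

async function crawl(): Promise<void> {
  // One URL per line in urls.txt.
  const urls = readFileSync("urls.txt", "utf8")
    .split("\n")
    .map(u => u.trim())
    .filter(u => u.length > 0);

  const results: { url: string; status: number }[] = [];

  for (const url of urls) {
    try {
      const response = await fetch(url);
      results.push({ url, status: response.status });
    } catch {
      results.push({ url, status: -1 }); // record failures instead of stopping
    }
  }

  // Save the results as JSON, as suggested above (CSV would work just as well).
  writeFileSync("results.json", JSON.stringify(results, null, 2));
}

crawl();
```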

Build an Offline website - Burn it on a CD

I need to build a website that can be downloaded to a CD.
I'd like to use some CMS (WordPress, Kentico, MojoPortal) to set up my site, and then download it to a CD.
There are many programs that can download a website to a local drive, but how to make the search work is beyond my understanding.
Any ideas?
The project is supposed to be an index of local community services, for communities without a proper internet connection.
If you need to make something that can be viewed from a CD, the best approach is to use only HTML.
WordPress, for example, needs Apache and MySQL to run. And although somebody can "install" the website on his own computer if you supply the content via a CD, most of your users will not be knowledgeable enough to do this task.
Assuming you are just after the content of the site, in general you should be able to find a tool to "crawl" or mirror most sites and create an offline version that can be burned to a CD (for example, using wget).
This will not produce offline versions of application functionality like search or login, so you would need to design your site with those limitations in mind.
For example:
Make sure your site can be fully navigated without JavaScript (most "crawl" tools will discover pages by following links in the HTML and will have limited or no JavaScript support).
Include some pages which are directory listings of resources on the site (rather than relying on a search).
Possibly implement your search using a client-side technology like JavaScript so that it works offline as well (see the sketch after this list).
Use relative HTML links for images/JavaScript, and between pages. The tool you use to create the offline version of the site should ideally be able to rewrite/correct internal links for the site, but it is best to minimise any need to do so.
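As a sketch of that last point: the build step could emit a small JSON index of pages, and a few lines of client-side script (TypeScript here) can search it entirely offline. The index shape and entries below are made up:

```typescript
// Minimal offline, client-side search: the site build writes a JSON index of
// pages (title + text), and this function filters it in the browser.
interface PageEntry {
  title: string;
  url: string;
  text: string;
}

function searchIndex(index: PageEntry[], query: string): PageEntry[] {
  const q = query.toLowerCase();
  // Simple substring match; good enough for a small directory-style site on a CD.
  return index.filter(p =>
    p.title.toLowerCase().includes(q) || p.text.toLowerCase().includes(q)
  );
}

// Example usage with an inline index (in practice it would be loaded from a bundled JSON file).
const index: PageEntry[] = [
  { title: "Community health clinic", url: "clinic.html", text: "opening hours, address, phone" },
  { title: "Public library", url: "library.html", text: "book lending, reading room" },
];
console.log(searchIndex(index, "library"));
```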
Another approach you could consider is distributing using a clientside wiki format, such as TiddlyWiki.
Blurb from the TiddlyWiki site:
TiddlyWiki allows anyone to create personal SelfContained hypertext
documents that can be published to a WebServer, sent by email,
stored in a DropBox or kept on a USB thumb drive to make a WikiOnAStick.
I think you need to clarify what you would like downloaded to the CD. As Stennie said, you could download the content and anything else you would need to create the site, either with a "crawler" or TiddlyWiki, but otherwise I think what you're wanting to develop is actually an application, in which case you would need to do more development than standard CMS packages provide. I'm not happy to suggest it, but you could look into something like the Salesforce platform. It's a cloud-based platform that may facilitate what you're really working towards.
You could create the working CMS on a small web/db server image using VirtualBox and put the virtual disk in a downloadable place. The end user would need the VirtualBox client (free!) and the downloaded virtual disk, but you could configure it to run with minimal effort for the creation, deployment and running phases.

What language can be used to automate web tests?

I'm working on a web service associated with a form that requires input from the user. The problem is that the form is quite large: approximately 200 fields need to be filled in. I would like to ask what language would serve best in this case, in terms of automating the inputs.
I tried Ruby with Watir, but it doesn't work as expected when dealing with iframes. So I'm looking for an alternative solution. Any feedback or suggestions would be greatly appreciated. My web service is developed using ASP.NET and JavaScript.
Thanks,
Chan
Selenium may provide the functionality you're looking for. It is a testing framework that supports recording tests, so writing scripts is optional. It does, however, also provide scripting functionality in a variety of languages, including Java, C#, Ruby, Python and more.
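As a rough sketch of how Selenium's WebDriver bindings handle the iframe case that tripped up Watir, here is what it looks like in TypeScript using the selenium-webdriver package (the URL and element IDs are placeholders):

```typescript
// Sketch of form automation with Selenium's JavaScript/TypeScript bindings.
// URL and element IDs are placeholders, not real values.
import { Builder, By, until } from "selenium-webdriver";

async function fillForm(): Promise<void> {
  const driver = await new Builder().forBrowser("chrome").build();
  try {
    await driver.get("https://example.com/form"); // placeholder URL

    // Switch into the iframe that contains the form fields.
    const frame = await driver.wait(until.elementLocated(By.css("iframe")), 10000);
    await driver.switchTo().frame(frame);

    // Fill one of the ~200 fields; in practice this would loop over a data set.
    await driver.findElement(By.id("field1")).sendKeys("value 1"); // placeholder id
  } finally {
    await driver.quit();
  }
}

fillForm();
```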
Have you tried WatiN? It is an open-source automated test framework for web applications, and it supports C# and other managed languages.
