Automate existing web browser session - accessibility

How can I programmatically interact with an existing web page in a web browser launched in the standard way? For example, I navigate to a specific page and want to be able to run a Python script that fills in some edit fields or clicks some elements.
This should be possible, at least through IAccessible2, for the major browsers, but I did not find any pointers. To put it another way, how do screen readers do it? And a bonus question: is there a Python library for it?
EDIT: I am looking for something more than user input simulation. I would like to programmatically read the DOM at least, and write to it if possible. So far I have looked at the code in NVDA, which is very low-level and complex. Is there anything easier?

How can I programmatically interact with an existing web page in a web browser launched in the standard way? For example, I navigate to a specific page and want to be able to run a Python script that fills in some edit fields or clicks some elements.
If you need to watch the browser while it happens, the answer is keyboard/mouse macros. You can search for macro programs for your OS.
But you are most likely looking for a headless browser such as PhantomJS, HtmlUnit, TrifleJS, Splash, or SimpleBrowser.
See https://saucelabs.com/blog/headless-browser-testing-101
When you mention 'interact with an existing webpage in a web browser launched in the standard way', you are talking about the DOM (Document Object Model).
Many QA environments run test scripts against code that has not been rendered by the browser into a DOM (you see the DOM when you inspect a page using your browser tools). A headless browser creates the DOM and then runs all the tests as if a human were clicking, without anything having to be shown on screen.
See https://css-tricks.com/dom/
To put it another way, how do screen readers do it? And a bonus question: is there a Python library for it?
Screen readers interact with the DOM at a low level. I do not know whether there is a Python library. Most likely this would be overkill, though, unless you are building a desktop app that interacts with browsers the way a screen reader does.
Edit:
I did some more digging and found this article, which gives a much more detailed explanation of how screen readers interact with the browser/DOM.
Also, there is a Python API for manipulating the DOM, and this library seemed popular.
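One further route for reading and writing the DOM of a running browser, sketched here under stated assumptions: Chromium-based browsers expose the DevTools protocol on a local port when started with the --remote-debugging-port flag (so the browser is launched with one extra flag rather than in a fully standard way). The sketch uses only the Python standard library to pick a tab from the target list served at /json and to build a Runtime.evaluate command. The port 9222, the helper names, and the sample data are illustrative assumptions; actually sending the command needs a WebSocket client, which is left out.

```python
import json
from urllib.request import urlopen

# Assumed port; it must match the --remote-debugging-port value the browser was started with.
DEBUG_URL = "http://127.0.0.1:9222/json"


def fetch_targets(url=DEBUG_URL):
    """Ask a running browser for its list of debuggable targets (tabs, workers, ...)."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)


def find_page(targets, url_fragment):
    """Return the first tab whose URL contains url_fragment, or None."""
    for target in targets:
        if target.get("type") == "page" and url_fragment in target.get("url", ""):
            return target
    return None


def evaluate_command(expression, message_id=1):
    """Build the JSON text of a DevTools Runtime.evaluate command."""
    return json.dumps({
        "id": message_id,
        "method": "Runtime.evaluate",
        "params": {"expression": expression},
    })


# Hypothetical sample of what the /json endpoint returns, so the sketch
# runs without a browser. With a live browser, use fetch_targets() instead.
sample_targets = [
    {"type": "background_page", "url": "chrome-extension://abc/bg.html"},
    {"type": "page", "url": "https://example.com/login",
     "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/page/1"},
]

page = find_page(sample_targets, "example.com")
command = evaluate_command("document.querySelector('#user').value = 'alice'")
```

The command text would be sent over the WebSocket named in the tab's webSocketDebuggerUrl field. The expression runs as JavaScript inside the page, so it can both read and modify the DOM.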

Related

Is there any way to control programs by finding them in Task Manager and managing their contents?

Hello. I guess my title does not explain the question well, but I am trying to understand whether there is any way to control and automate programs just by finding their tasks in Task Manager. I have seen "Spy++" in Visual Studio. At first I didn't understand what its aim is or how far one can go with it; all I gathered is that it can provide a wide range of logs.
I would like to give an example.
I want to log in to Facebook/Twitter and do casual things with software I developed myself (I don't want to use Selenium or anything of that kind), or I want to get information from a game, such as a character's current health, attack power, ability power..., or send commands to that game from my software, like pressing a, b or 1.
Can someone tell me the exact name of the subject I am talking about?
Terminology: Selenium / AutoIt: "UI automation". Reading and modifying in-game values: "memory editor" or "trainer".
There is no universal way to control programs if you want your tool to be transparent. A browser may listen to OS input events (Windows messages telling it which keys were pressed or where the mouse was clicked), games may use DirectInput and yet other apps may subscribe to low-level system events or hooks.
For example, browser automation:
Using plugins/extensions gives you a JavaScript API that allows you to inspect pages and the forms on those pages, modify browser behavior and whatnot.
Browsers can also have their own external API. This can be done by linking to their DLLs, by passing command line arguments, or by passing messages in other ways. For Firefox, this API is named "Marionette".
Then there's Selenium, which provides a common API for various browsers. It controls them using "drivers".
Selenium "knows" how to drive a browser, as it's coded against the browser's APIs. Spy++ "knows" that it's inspecting a Win32 window and looks for known controls, their classes and their names so you could write another program to send specific messages to those specific controls of those specific applications.
As for "log in to Facebook": no, you cannot do that in a reasonable amount of time for the currently popular browsers if you want to code it from the ground up.
You'll have to, in one way or the other, interface with the browser and ask for a handle to the username/password textboxes, enter data into them and then submit the form. Then you'll practically be rebuilding Selenium, so why not use that tool in the first place?
Or you'll have to scrape the pixels on the screen, recognize those textboxes, click the mouse there and send some keys. And then Facebook redesigns their login form and you'll have to start over.
tl;dr: use the right tool for the job. If you want to automate a site's UI, then use Selenium.
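The "ask for a handle, enter data, submit" steps described above correspond to a handful of HTTP requests in the W3C WebDriver protocol that Selenium's drivers speak. The sketch below only builds those requests (method, path, JSON body), so it runs without a browser or driver. The session id, element ids, and CSS selectors are hypothetical; a real client would read the ids from the driver's responses.

```python
import json

CSS = "css selector"  # locator strategy name defined by the W3C WebDriver specification


def find_element(session_id, selector):
    """Request that asks the driver for a handle to one element."""
    return ("POST", f"/session/{session_id}/element",
            {"using": CSS, "value": selector})


def send_keys(session_id, element_id, text):
    """Request that types text into a previously located element."""
    return ("POST", f"/session/{session_id}/element/{element_id}/value",
            {"text": text})


def click(session_id, element_id):
    """Request that clicks a previously located element."""
    return ("POST", f"/session/{session_id}/element/{element_id}/click", {})


# Hypothetical ids and selectors; a real driver returns the session id and
# element ids in its responses, and the selectors depend on the page.
steps = [
    find_element("s1", "#email"),
    send_keys("s1", "e1", "alice@example.com"),
    find_element("s1", "#pass"),
    send_keys("s1", "e2", "secret"),
    find_element("s1", "button[name=login]"),
    click("s1", "e3"),
]

for method, path, body in steps:
    print(method, path, json.dumps(body))
```

Selenium's client libraries wrap this traffic, which is why rebuilding it by hand rarely pays off.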

Recommended Electron App configuration - iframe? webview? local server? other?

I am building my first Electron desktop app. It creates a formatted document/book from spreadsheet data, to be either printed or made into a PDF. I am trying to figure out the best way to prevent performance loss from reflow/repaint with a large document (lots of divs). I have found that if I put the book in an iframe, I do not suffer reflows/repaints from UI changes and can control when it is loaded. If I try to create a PDF, however, I only get the part of the iframe that is visible.
Simply, I'm looking for the best solution to prevent reflow in a complex HTML element while still being able to print it to PDF.
I've found a solution to the problem.
As far as I understand, in an Electron desktop app that is not running a server, you cannot directly use myIframe.contentDocument for actions such as append, innerHTML, or offsetHeight. The only thing you can do is call contentDocument.write(); however, once you have used this method, you have access to all of the other regular DOM methods. My best understanding is that contentDocument.write() essentially creates a virtual HTML document. This is my workaround at the moment, and it works like a charm in giving me control over which elements reflow and which do not.
Hope this helps anyone dealing with the same issue.

Using RStudio as a pseudo Shiny app

This is a rather general, and curious question.
I am working on a moderately complex Shiny app, using custom HTML and JavaScript code (with menus and independent dialogs), and using Shiny as a communication protocol with base R. Everything looked very nice, until I realised that RStudio itself is a web page (or am I wrong?).
The main reason to design a GUI as a Shiny app is that it is cross-platform, but it still needs to be opened in a web browser. RStudio is also a web page, yet it opens just like any other installed software. To me, it looks like a self-contained web browser with different menus.
Now the question: is it possible to use parts of RStudio in a different "app"?
For example, I would love to separate the code editor and the console from RStudio and use them in conjunction with other HTML and Javascript code to produce a GUI similar to RStudio but with different purposes.
To better explain why: RStudio is fantastic, but it has the one big disadvantage (no flame intended, others think this is a feature) that everything must fit in the same page. In order to make the code editor larger, one needs to shrink other parts of the interface. I would like to make them separate dialogs, creating divs when a menu is selected.
Thanks in advance,
Adrian
Engineer from RStudio here. You are correct about RStudio itself being "a web page"; the whole UI is effectively done in HTML. There's even a version of RStudio which already runs in a web browser, called RStudio Server.
There are unfortunately no extensibility points to do what you want. RStudio internals are largely anonymized and insulated from external access, which makes them difficult to separate, re-use, or connect to other services. Here are a few pointers that may be helpful, however:
As a commenter pointed out, it is now possible to pop out the editor window.
You can make an RStudio Add-in which runs in a separate browser window when invoked. Depending on what you want to do in your separate window, you may be able to accomplish it with an add-in.
If you can't use add-ins, the easiest thing to do is actually to just change RStudio itself. It's an open source project, so you're welcome to hack on it and make improvements in reusability or UI flexibility. We welcome pull requests. :-)

Is it possible to "step through" a browser's applying of CSS rules for web development?

Is there a way or tool that could let me step through the painting of CSS rules, one by one?
Similar to what one would do in an IDE with program code, but with CSS. (But I would prefer not to do it by taking the browser's source code and stepping through its underlying functions. I just mean stepping through "updates" by CSS rules, in a form similar to a Web Developer Toolbar.)
I expect this is usually more tedious than useful, but in some cases it would really help in web development, like debugging cats and owls or finding out how a particular effect is achieved.
Edit to clarify: by "stepping through" I mean something like potentially stopping the browser from painting another rule, after each and every rule I choose and before the next one is applied (each before the "final paint" of the page is finished), to inspect what happens.
Edit 2: after BoltClock's comment, I replaced the word 'render' with 'paint' to be clearer. I removed the original to reduce clutter.
Besides the web tools already mentioned, I guess this is only possible if the complete source code of the browser is available, so that you can debug the browser application itself, either locally or remotely, with breakpoints set on the interesting "top-level" functions.
For example, it is no problem to download the source of the Java-based open source browser Lobo, which can then be debugged like any other application directly from an IDE such as Eclipse or IntelliJ.
However, I don't think the complete source of products like MS Internet Explorer will ever be fully available to let you debug its deepest magic (which, in the case of MS Internet Explorer, would probably also take a lifetime...).
So, coming back to a browser that has source code available, you can either:
Have the browser compiled and run inside an IDE and debug your local code directly
Have the browser running as an application that allows remote debugging, with the corresponding source code attached to a remote debugger (usually also from within your IDE).
This way you can analyse the deep magic of such a browser and see how the different resources like images, CSS etc. are collected, validated, parsed, processed and in the end displayed.
Once the interesting functions are located and a good set of (conditional) breakpoints is set, this could be very useful for understanding the behaviour of a specific browser.
If, however, that is too detailed for your context, I guess there is no other possibility but to rely on the existing functionality for analysing the browser's behaviour, such as Chrome's DevTools or the Mozilla plugin Firebug. No doubt this will be integrated more and more into such plugins/tools, as the comment of user BoltClock suggests, and it is always worthwhile to study the functionality of such plugins/tools to take the greatest possible advantage of them.

Showing a form from a webpage

I have a problem I am trying to solve in an elegant manner. I have a .NET application that I have created. I am trying to get one of its forms to be shown from a webpage. This sounds strange, I'll admit, so here is the backstory.
We have some large monitors at work that show information. I have no control over how the information is displayed. Currently they just use a browser and tab through it to show each piece of information on the screen. Most of the info they show is just standard HTML stuff, text and images.
Now along comes my WinForms application. The part of the application I need to show is a graphical display. Everything on this display is drawn using GDI+, if that matters. I need to get this form into a format that I can show. Below is my own solution. I am pretty sure it is not the best method, but it may be the only one I can use.
Create a console application. The application would do the following:
1. Run as a service on a server
2. Create the display in memory, and save it to a bitmap every so often
3. Save the bitmap to a location on the network.
4. Have an HTML file that links to the image so that it can be shown in the browser
I thought about doing something with the clients; however, the clients are not always up, so I could have periods where the image wouldn't be updated.
I was also thinking about an ASP.NET solution, but that would require me to learn ASP.NET, and I am not quite ready to take on that challenge.
In IE you can host a winforms app/control as an ActiveX control, like so:
<object id="DateTimePicker" height="31" width="177"
classid="bin/Web.Controls.DateTime.dll#Web.Controls.DateTime.DateTimePicker" VIEWASTEXT>
</object>
See this article for more information: http://www.codeproject.com/KB/miscctrl/htmlwincontrol.aspx
Now, I'm not claiming that this is any more elegant than your solution, but it is an alternative.
I think using ASP.NET to serve a dynamic image through an HttpHandler would be the best approach, but depending on your skills and time this may not be an option. Here is a nice tutorial: http://www.codeguru.com/columns/dotnet/article.php/c11013
IMHO the best way to build this would be as a browser plug-in, like how Flash works. Microsoft has created a plug-in framework called SpicIE that allows you to develop managed plug-ins for IE. This is probably your best bet.
The old unmanaged way is to build your WinForms app as a DLL, package it in a signed cab file, and then reference that cab file with an HTML object tag (the codebase argument is the one you need).
i.e.,
document.write("<object CLASSID='clsid:DC187740-46A9-11D5-A815-00B0D0428C0C' CODEBASE='/MyFormsApp/MyFormsApp.cab#Version=1,00,0000' />");
The first time users hit the page, they will be asked to allow the installer to load its payload (DLLs). Once they do, they will have a fully fledged WinForms desktop app running in a browser window.
I took the easy route on this one. I created a small WinForms app that converts the GDI objects to a bitmap, and then I save the bitmap to a network share. This file is referenced in a simple HTML file that is displayed on the monitor.
I chose the WinForms app because it makes it really easy for me to set this up in task manager and run it every 10 minutes to update.
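The HTML side of this approach (a simple page that shows the saved bitmap) can be sketched as follows. This is only an illustration in Python, not the poster's WinForms code. The file name display.bmp, the function name, and the meta refresh tag that reloads the page every 600 seconds (to match the 10-minute update) are all assumptions.

```python
import html


def build_display_page(image_path, refresh_seconds=600):
    """Return an HTML page that shows one image and reloads itself periodically."""
    # Escape the path so characters such as & or " cannot break the attribute.
    safe_src = html.escape(image_path, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        f'  <meta http-equiv="refresh" content="{int(refresh_seconds)}">\n'
        "  <title>Display</title>\n"
        "</head>\n"
        "<body>\n"
        f'  <img src="{safe_src}" alt="Current display">\n'
        "</body>\n"
        "</html>\n"
    )


# Hypothetical file name; the real one is whatever the scheduled app writes to the share.
page = build_display_page("display.bmp")
print(page)
```

Whether the browser fetches the updated image on each reload depends on its caching behaviour, which this sketch does not address.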
