Using the Chrome console to extract data from a page

I'm looking to pull out all of the companies from this page (https://angel.co/finder#AL_claimed=true&AL_LocationTag=1849&render_tags=1) in plain text. I saw someone use the Chrome Developer Tools console to do this and was wondering if anyone could point me in the right direction?
TL;DR: How do I use the Chrome console to select and extract data from a page?

Note: since jQuery is available on this page, I'll just go ahead and use it.
First, we need to select the elements we want, e.g. the names of the companies. These are kept in the list with ID startups_content, inside elements with class items, in a field with class name. A selector for them can therefore look like this:
$('#startups_content .items .name a')
As a result, we get a bunch of HTMLElements. Since we want plain text, we need to extract it from these elements:
.map(function(idx, item){ return $(item).text(); }).toArray()
This gives us an array of company names. Let's join it into a single plain-text list:
.join('\n')
Putting all the steps together, we get:
$('#startups_content .items .name a').map(function(idx, item){ return $(item).text(); }).toArray().join('\n');
which should be executed in the DevTools console.
If you need other data, e.g. company URLs, follow the same steps with the appropriate changes.

Related

Extract number data from HTML with RobotFramework

I need to extract a number from an HTML page and convert it into a variable in my test case.
The problem is that this element has no ID of its own. Here is the HTML; I want to get the 54 (the number can change, which is why I need another way to identify the element). I tried Get Text with "resultat", but I get "54 ligne(s) trouvée(s)" when I only want "54":
<div class="tab-interpage">
  <div class="resultat">
    <b>54</b>
    ligne(s) trouvée(s)
  </div>
...
You have other options for locating an element; see the Locating elements section of the SeleniumLibrary documentation.
This might be a situation that requires XPath. I imagine this one works (I can't see the whole DOM, so I can't be 100% sure):
//div[@class="resultat"]/b
combined with the keyword:
${var}=    Get Text    //div[@class="resultat"]/b
Obviously, if there are more div elements with the class "resultat", you might run into problems here. In that case, explore the DOM a bit more and look for other ways to reach the element you need.
I think this would be much easier if the HTML elements had proper attributes, such as:
a class attribute on the enclosing form
unique ids, which usually work best
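An XPath of this shape (a div selected by its class, then its b child) can be tried offline before it goes into a test case. The sketch below is only an illustration in Python, not Robot Framework: it uses the standard library's ElementTree, which supports a limited XPath subset, and it assumes the snippet from the question is well-formed once the outer div is closed. The name extract_count is hypothetical.

```python
import xml.etree.ElementTree as ET

# Snippet from the question; the closing tag of the outer div is an assumption,
# because the original markup is cut off after the inner div.
SNIPPET = """
<div class="tab-interpage">
  <div class="resultat">
    <b>54</b>
    ligne(s) trouvée(s)
  </div>
</div>
"""


def extract_count(markup: str) -> int:
    """Return the number inside <b> under the div with class 'resultat'."""
    root = ET.fromstring(markup.strip())
    # ElementTree understands the [@attr='value'] predicate and child steps.
    node = root.find(".//div[@class='resultat']/b")
    if node is None or node.text is None:
        raise ValueError("no <b> element found under div.resultat")
    return int(node.text.strip())


if __name__ == "__main__":
    print(extract_count(SNIPPET))
```

Because the expression selects only the b element, the trailing "ligne(s) trouvée(s)" text never reaches the result.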

How can I automate data extraction in this website? (can't see query options, if any)

I want to extract election results from this website (currently all data is at zero because voting is ongoing). In the menu you can select the data you want (I need "En Chile" -> "División Geográfica" -> "Comunas"). This is the voting results for each municipality ("comuna"). When you select the desired "comuna", then you can click the excel file and results are downloaded.
The "problem" is that no url is shown along the process. I cannot see the url link to the particular "comuna" I need. I was hoping to get a specific link so then I can automate using wget with the name of each municipality. Instead, the data seems to be masked. I know you can extract JSON data, perhaps using the properties shown in the website's HTML code:
<select class="form-control" id="selComunas" ng-model="comuSelected"
ng-options="item.d for item in comunas" name="comunas"
ng-show="(vistaVertical==='G' || vistaVertical==='E') && subMenu === null"
ng-change="updateComuna()">
<option value="">Comunas...</option>
</select>
But where do I make the call? I have no idea. I also see no link to the Excel file in the HTML. Everything seems to be hidden inside "ng" attributes, which appear to be AngularJS as far as I can tell from reading online. As you can see, I'm new to this, so any help is more than welcome.
Open Chrome DevTools, go to the Network tab, and watch the requests that appear while you click through the UI. One of them is:
http://www.servelelecciones.cl/data/elecciones_constitucion/filters/comunas/all.json
This link returns all comunas with their ids:
[{"c":2564,"d":"ALGARROBO"},{"c":2801,"d":"ALHUE"},{"c":2674,"d":"ALTO BIOBIO"},...
Next, when we select a value from the dropdown, another request appears in the Network tab. The number at the end of its URL is a comuna id (the "c" field) from that list:
http://www.servelelecciones.cl/data/elecciones_constitucion/computo/comunas/2570.json
"d":"Integer",
"e":"Integer",
"f":"Integer",
"sd":null
},
"data":[
{
"a":"Apruebo",
"b":null,
"c":"0",
"d":"0,00%",
"e":null,
"f":"",
"sd":null
},
{
"a":"Rechazo",
"b":null,
"c":"0",
"d":"0,00%",
"e":null,
"f":"",
"sd":null
}
From this point you can process the JSON directly in whatever programming language you use.
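As a rough sketch of that last step in Python, using only the standard library: the two URL patterns are the ones quoted above, while the function names are hypothetical and the assumption that the results document has a top-level "data" list with "a" (option) and "c" (votes) fields is based only on the excerpt shown. Network access is kept in a separate helper so the parsing can be checked without calling the site.

```python
import json
from urllib.request import urlopen

# Base path taken from the two URLs quoted in the answer.
BASE = "http://www.servelelecciones.cl/data/elecciones_constitucion"


def comunas_url() -> str:
    return f"{BASE}/filters/comunas/all.json"


def results_url(comuna_id: int) -> str:
    return f"{BASE}/computo/comunas/{comuna_id}.json"


def parse_comunas(payload: str) -> dict:
    """Map comuna name ("d") to comuna id ("c")."""
    return {item["d"]: item["c"] for item in json.loads(payload)}


def parse_results(payload: str) -> dict:
    """Map option name ("a") to vote count string ("c").

    Assumes a top-level "data" list, as in the excerpt above.
    """
    return {row["a"]: row["c"] for row in json.loads(payload)["data"]}


def fetch(url: str, timeout: float = 30.0) -> str:
    """Download a URL and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    comunas = parse_comunas(fetch(comunas_url()))
    for name, comuna_id in comunas.items():
        print(name, parse_results(fetch(results_url(comuna_id))))
```

Looping over the ids from the first call replaces the manual dropdown selection, which is what makes the download scriptable.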

Qt: how to save QTextEdit contents with custom properties

I have a text editor (QTextEdit). Some words in my editor have additional information attached (namely, two corresponding integer positions in a wave file for that word).
They are stored in a Python object as custom properties on QTextCharFormat objects (I attach them with code like this: self.editor.textCursor().setCharFormat(QTextCharFormat().setProperty(MyPropertyID, myWordAttachment) )
Unfortunately, if I save my document to HTML, all of that additional formatting is lost.
So I want to perform the simplest task: save my document with all of its formatting, including myWordAttachment (and load it back from disk).
Am I right that Qt5 has nothing ready-made for this, and that I have to write all of the document's serialization code myself? (I still hope there is a simple function that does the job.)
1. Loop over the text character by character.
2. For each character, get its charFormat().
3. Read the properties from that format.
4. The properties are ultimately plain values (int, str, ...), so you can read them with charFormat().property(id) for each property id, or all at once with properties().
5. The most important thing is each character's position and the range the format covers. You get the position during the loop in step 1.
6. As you collect the char formats, store them in a container such as a list, and don't forget to store each format's position with it.
7. Save your document together with the positions and properties.
My suggestion for your solution:
1. Get characterCount() from the QTextDocument object.
2. Loop over the range given by characterCount().
3. Before the loop, create a QTextCursor object.
4. Put the text cursor at the first position (movePosition() with the Start move operation and the KeepAnchor flag).
5. Move the cursor right one character at a time.
6. At each step, check the character's format with tc.charFormat() and its position with tc.position().
7. This is the point to think twice. A char format almost always applies to a run of characters, so you will probably get the same charFormat() for several consecutive characters. You can prepare for that: set an objectType() or a property id on the QTextCharFormat in advance (while the document is being edited) so that each format can be identified. You could also store the text itself in the properties for use after saving and loading. Expect some debugging and trial and error here.
8. When you get a charFormat, check its objectType().
9. If the objectType() is the same as the one you just handled, skip it and move on.
10. The second important thing is to call clearSelection() on every step.
11. Save your document() as an HTML string as before, and save the charFormat properties separately.
12. When you load the document, the HTML comes back first. Then load the properties: create a QTextCursor, call setPosition() with the position saved earlier, move the cursor to select the target text, and apply the charFormat properties again.
Summary
The important thing is how you identify each charFormat().
You can read the charFormat without any problem, but a charFormat() applies to a range of text, so you must be able to tell the ranges apart. Two ways I can think of:
1. Store the target text in the QTextCharFormat's property.
2. Have the QTextCursor skip ahead while it is still in the same QTextCharFormat object.
I hope this helps.
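The range bookkeeping is the part that is easy to get wrong, so here is a minimal sketch of it in plain Python with the Qt calls left out. It assumes you have already walked the document and collected one (position, attachment) pair per character, where the attachment is None for plain text or a two-item list of wave positions. The names group_runs, dump_runs and load_runs are hypothetical, and the JSON side file is only one possible storage format.

```python
import json


def group_runs(per_char):
    """Collapse per-character attachments into [start, end) ranges.

    per_char: iterable of (position, attachment) in ascending position order;
    attachment is None for characters without a custom property.
    """
    runs = []
    for position, attachment in per_char:
        if attachment is None:
            continue
        last = runs[-1] if runs else None
        # Extend the previous run only if it is adjacent and carries the same data.
        if last and last["end"] == position and last["attachment"] == attachment:
            last["end"] = position + 1
        else:
            runs.append({"start": position, "end": position + 1,
                         "attachment": attachment})
    return runs


def dump_runs(runs) -> str:
    """Serialize the ranges, to be stored next to the saved HTML."""
    return json.dumps(runs)


def load_runs(text: str):
    """Read the ranges back; each one is reapplied with a cursor selection."""
    return json.loads(text)


if __name__ == "__main__":
    chars = [(0, None), (1, [100, 200]), (2, [100, 200]), (4, [300, 400])]
    print(dump_runs(group_runs(chars)))
```

On load, each range would be turned back into a cursor selection from start to end, and the stored attachment set on the format again.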

R Selenium - Difficulty Extracting Data from Complex Table

I'm trying to scrape some soccer data. I'm able to loop through all of the necessary web pages, but I'm having trouble getting the data that I need from each page. I think the tables that hold the data are rendered by some form of JavaScript, which makes it difficult.
I'm trying to get the goal times for each team from the following website:
http://www.scoreboard.com/uk/match/arsenal-west-brom-2014-2015/AyTNt38e/#match-summary|match-statistics;0|lineups;1
but I can't seem to distinguish between goals/cards/other events that are present. Can anyone help me, or is this simply a lost cause on this website?
My code to get the time of the first event (goal/card/other) is:
library("RSelenium")
# Start the Selenium server and open a browser session
startServer()
mybrowser <- remoteDriver()
mybrowser$open()
mybrowser$navigate("http://www.scoreboard.com/uk/match/arsenal-west-brom-2014-2015/AyTNt38e/#match-summary|match-statistics;0|lineups;1")
# Find every element with class "time-box" and read the text of the first one
x <- mybrowser$findElements(using = 'css selector', ".time-box")
x[[1]]$getElementText()
You need to pick a specific parent element that contains all of the elements you want and nothing else. In this case, "#summary-content div.time-box" works as the CSS selector.
If you want the event type (goal, card, and so on), use the CSS selector "#summary-content div.icon-box" and then look at the other class on the div element: soccer-ball for a goal, y-card for a yellow card, and so on. For example:
<div class="icon-box soccer-ball">
That should be enough to get you started. You should be able to get the rest of them yourself.
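To illustrate the "look at the other class" idea outside the browser, here is a small sketch in Python rather than R, using only the standard library's html.parser. It maps the second class of each icon-box div to an event name. Only soccer-ball and y-card come from the answer; any other class is passed through unchanged rather than guessed, and the names IconBoxParser and event_types are hypothetical.

```python
from html.parser import HTMLParser

# Only these two class names are confirmed by the answer above.
EVENT_BY_CLASS = {"soccer-ball": "goal", "y-card": "yellow card"}


class IconBoxParser(HTMLParser):
    """Collect the event type of every <div class="icon-box ...">."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if "icon-box" not in classes:
            return
        for name in classes:
            if name != "icon-box":
                # Unknown classes are kept as-is instead of being guessed.
                self.events.append(EVENT_BY_CLASS.get(name, name))


def event_types(markup: str) -> list:
    parser = IconBoxParser()
    parser.feed(markup)
    return parser.events


if __name__ == "__main__":
    print(event_types('<div class="icon-box soccer-ball"></div>'))
```

The same mapping can be applied in R to the class attribute of each element returned for the icon-box selector.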

Nested REST Routing

Simple situation: I have a server with thousands of pictures on it. I want to create a restful business layer which will allow me to add tags (categories) to each picture. That's simple. I also want to get lists of pictures that match a single tag. That's simple too. But now I also want to create a method that accepts a list of tags and which will return only pictures that match all these tags. That's a bit more complex, but I can still do that.
The problem is this, however. Say, my rest service is at pictures.example.com, I want to be able to make the following calls:
pictures.example.com/Image/{ID} - Should return a specific image
pictures.example.com/Images - Should return a list of image IDs.
pictures.example.com/Images/{TAG} - Should return a list of image IDs with this tag.
pictures.example.com/Images/{TAG}/{TAG} - Should return a list of image IDs with these tags.
pictures.example.com/Images/{TAG}/{TAG}/{TAG} - Should return a list of image IDs with these tags.
pictures.example.com/Images/{TAG}/{TAG}/{TAG}/{TAG}/{TAG} - Should return a list of image IDs with these tags.
etcetera...
So, how do I set up a RESTful web service project that will allow me to nest tags like this and still be able to read them all, without any limit on the number of tags (although the URL length would be a limit)? I might want to have up to 30 tags in a selection, and I don't want to set up 30 different routing thingies to get it to work. I want one routing thingie that could technically allow unlimited tags.
Yes, I know there could be other ways to send such a list back and forth. Better even, but I want to know if this is possible. And if it's easy to create. So the URL cannot be different from above examples.
Must be simple, I think. Just can't come up with a good solution...
The URL structure you choose should be based on whatever is easy to implement with your web framework. I would expect something like:
http://pictures.example.com/images?tags=tag1,tag2,tag3,tag4
is going to be much easier to handle on the server, and I can see no advantage to the path-segment approach that you are having trouble with.
I assume you can figure out how to actually write the SQL or filesystem query to filter by multiple tags. In CherryPy, for example, hooking that up to a URL is as simple as:
import cherrypy

class Images:
    # GET /images: list every image URL
    @cherrypy.tools.json_out()
    def index(self):
        return [cherrypy.url("/images/" + x.id)
                for x in mylib.images()]
    index.exposed = True

    # GET /images/{TAG}/{TAG}/...: every extra path segment arrives in tags
    @cherrypy.tools.json_out()
    def default(self, *tags):
        return [cherrypy.url("/images/" + x.id)
                for x in mylib.images(*tags)]
    default.exposed = True
...where the *tags argument is a tuple of all the /{TAG} path segments the client sends. Other web frameworks will have similar options.
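For the part the answer leaves out, here is a framework-independent sketch in plain Python of turning the path into a tag tuple and keeping only the images that carry every tag. The in-memory index, the /Images prefix handling and the function names are assumptions for illustration; a real service would query its database or filesystem instead.

```python
from urllib.parse import unquote


def tags_from_path(path: str, prefix: str = "/Images") -> tuple:
    """Return the tag segments that follow the prefix, however many there are."""
    if path != prefix and not path.startswith(prefix + "/"):
        raise ValueError(f"path does not start with {prefix}")
    rest = path[len(prefix):]
    # Empty segments (from a trailing or doubled slash) are ignored.
    return tuple(unquote(segment) for segment in rest.split("/") if segment)


def images_with_all_tags(index: dict, tags) -> list:
    """index maps image id -> set of tags; keep ids that have every wanted tag."""
    wanted = set(tags)
    return sorted(image_id for image_id, image_tags in index.items()
                  if wanted <= image_tags)


if __name__ == "__main__":
    index = {"1": {"cat", "outdoor"}, "2": {"cat"}, "3": {"dog", "outdoor"}}
    print(images_with_all_tags(index, tags_from_path("/Images/cat/outdoor")))
```

With no tags the subset test is true for every image, so /Images returns the full list and a single code path serves any number of tags.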
