I'm attempting to scrape a page that has about 10 columns using Ruby and Nokogiri. Most of the columns are straightforward because they have unique class names. However, some of them can only be identified by ids that consist of a standard name with a long string of numbers appended.
For example, game times are all picked up with .eventLine-time and team names with .team-name, but this particular column has, for example:
<div class="eventLine-book-value" id="eventLineOpener-118079-19-1522-1">-3 -120</div>
.eventLine-book-value is not specific to this column, so it's not useful. The 13 digits differ for every game, and trying something like:
def nodes_by_selector(filename,selector)
  file = open(filename)
  doc = Nokogiri::HTML(file)
  doc.css(^selector)
end
has left me with errors. I've seen ^ and ~ used in other languages, but I'm new to this, and my searches for a way to pick up all data under id=eventLineOpener-XXXX have turned up nothing.
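To pick up all data under id=eventLineOpener-XXXX, pass 'div[id*=eventLineOpener]' as the selector. The ^ you tried belongs inside an attribute selector: [id*=...] matches ids that contain the string, and 'div[id^=eventLineOpener]' matches ids that start with it.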
require 'nokogiri'

def nodes_by_selector(filename, selector)
  doc = File.open(filename) { |file| Nokogiri::HTML(file) } # file is closed after parsing
  doc.css(selector) # e.g. doc.css('div[id*=eventLineOpener]')
end
The above method returns a Nokogiri::XML::NodeSet (an array-like collection) of the Nokogiri::XML::Element objects whose id contains eventLineOpener.
To extract the content of these elements, call the text method on each of them. For example, for the first element:
doc.css('div[id*=eventLineOpener]')[0].text
or for all of them at once:
doc.css('div[id*=eventLineOpener]').map(&:text)
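For comparison, here is a minimal sketch of the same prefix match written in Python with the standard-library html.parser module instead of Nokogiri; it is an illustrative stand-in, not part of the Ruby answer. The class name PrefixIdCollector and the sample HTML are hypothetical. It collects the text of every div whose id starts with a given prefix, which is what the CSS selector div[id^=eventLineOpener] does.

```python
from html.parser import HTMLParser


class PrefixIdCollector(HTMLParser):
    """Collect the text of <div> elements whose id starts with a prefix.

    Hypothetical helper, roughly equivalent to the CSS selector div[id^=prefix].
    """

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.depth = 0      # > 0 while inside a matching div (tracks nested divs)
        self.buffer = []    # text chunks of the current matching div
        self.results = []   # collected texts, in document order

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1  # nested div inside a match
            return
        element_id = dict(attrs).get("id") or ""  # id may be absent or valueless
        if tag == "div" and element_id.startswith(self.prefix):
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


html = (
    '<div class="eventLine-book-value" id="eventLineOpener-118079-19-1522-1">-3 -120</div>'
    '<div class="eventLine-book-value" id="eventLineCurrent-118079-19-1522-1">-3.5 -110</div>'
    '<div class="eventLine-book-value" id="eventLineOpener-118080-19-1522-1">+3 +100</div>'
)
parser = PrefixIdCollector("eventLineOpener-")
parser.feed(html)
parser.close()
print(parser.results)  # ['-3 -120', '+3 +100']
```

Matching on the prefix rather than on the full id is what makes the varying numeric suffix irrelevant.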
I have two files that contain domains. After reading them with readlines(), I get two lists:
a = ['abc.com','cde.com','efg.com']
b = ['yabc.com','cde.com','abce.com','efg.com']
Now I need to find the domains common to both.
No partial matches are allowed (abc.com above has two partial matches).
Order does not matter.
The output should be: ['cde.com','efg.com']
There is one problem I am currently handling manually: some lines in the files contain multiple domains separated by "|", like:
abc.com|cde.com|efg.com which is treated as one string, giving me a list like:
['abc.com|cde.com|efg.com\n','xyz.com']
In this case abc.com, cde.com and efg.com would again be missed.
I tried sets, intersection, two nested for loops and re.search, but did not get accurate results.
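A minimal sketch in Python, assuming the lines come from readlines() as described: split every line on "|", strip the trailing newline and surrounding whitespace, and intersect the two resulting sets. Because a set intersection compares whole strings, abc.com does not match yabc.com or abce.com. The function names load_domains and common_domains are hypothetical, and lowercasing the domains is an added assumption (domain names are case-insensitive).

```python
def load_domains(lines):
    """Turn raw lines (possibly 'a.com|b.com\\n') into a set of single domains."""
    domains = set()
    for line in lines:
        for part in line.split("|"):
            part = part.strip().lower()  # assumption: compare case-insensitively
            if part:                     # skip empty pieces, e.g. blank lines
                domains.add(part)
    return domains


def common_domains(lines_a, lines_b):
    """Exact (whole-string) matches only; sorted for a stable result."""
    return sorted(load_domains(lines_a) & load_domains(lines_b))


a = ['abc.com', 'cde.com', 'efg.com']
b = ['yabc.com', 'cde.com', 'abce.com', 'efg.com']
print(common_domains(a, b))  # ['cde.com', 'efg.com']

piped = ['abc.com|cde.com|efg.com\n', 'xyz.com']
print(common_domains(piped, b))  # ['cde.com', 'efg.com']
```

With real files, iterating the file object yields the same lines as readlines(), e.g. `with open('a.txt') as f: domains_a = load_domains(f)`, where 'a.txt' is a placeholder name.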
I am trying to use RQDA for qualitative text analysis. I want to code text passages containing the same characters automatically.
Let's say I have the category dog, and I marked "dog" in the first sentence and "dogfood" in the fourth. I want RQDA to mark "dog" in the second sentence and "dogfood" in the fifth as well.
In MAXQDA, for example, this is done automatically if I enable that option. Is there a function to do this?
If I understand correctly, you want to do automatic coding with RQDA. The function for that is codingBySearch:
codingBySearch(pattern, fid = getFileIds(), cid, seperator,
concatenate = FALSE)
However, this function only accepts one pattern at a time. If you have a list of patterns, a loop will sort it out:
X <- c("pattern1", "pattern2", "pattern3", "pattern4", "pattern5", "pattern6")
for (i in X) {
  codingBySearch(i, fid = getFileIds(), cid = cid_number, seperator = "[.!?]", ignore.case = TRUE)
}
Here cid_number is the ID of the code you created in the GUI. You can also adapt the separator as you see fit.
I am not a .NET programmer, but I need to migrate .NET code to Java, and I'm having trouble understanding the following piece.
Let's say SpecificTerminal and ShipTo have a latitude property with different values. What happens when we use Concat: what will the final value be, e.g. 23.10+43.10, or something else?
List<OrderDispatchItemDTO> locations =(List<OrderDispatchItemDTO>) msg.Details.Select(x => x.SpecificTerminal).Concat(msg.Details.Select(x => x.ShipTo));
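The line of code you provided builds a sequence of OrderDispatchItemDTO objects containing the SpecificTerminal value of every Details object, followed by the ShipTo value of every Details object. Note that Concat returns an IEnumerable<OrderDispatchItemDTO>, not a List, so the explicit (List<OrderDispatchItemDTO>) cast throws an InvalidCastException at runtime; calling .ToList() on the result is the usual way to get a List.
It doesn't perform any calculation between the values of the SpecificTerminal and ShipTo properties; it only puts both of them into one common sequence.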
In more detail:
The Select method returns a new IEnumerable of the selected objects.
The Concat method returns a new sequence with the elements of the second collection appended after those of the first; neither input is modified.
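To make the behaviour concrete, here is a minimal analogue in Python, with itertools.chain playing the role of Concat. The Location and Detail classes are hypothetical stand-ins for OrderDispatchItemDTO and the elements of msg.Details. The latitudes stay separate objects in one sequence; nothing is added together.

```python
from dataclasses import dataclass
from itertools import chain


@dataclass
class Location:          # hypothetical stand-in for OrderDispatchItemDTO
    latitude: float


@dataclass
class Detail:            # hypothetical stand-in for an element of msg.Details
    specific_terminal: Location
    ship_to: Location


details = [
    Detail(Location(23.10), Location(43.10)),
    Detail(Location(1.5), Location(2.5)),
]

# Equivalent of Select(x => x.SpecificTerminal).Concat(Select(x => x.ShipTo)).ToList():
# all terminals first, then all ship-to locations.
locations = list(chain((d.specific_terminal for d in details),
                       (d.ship_to for d in details)))

print([loc.latitude for loc in locations])  # [23.1, 1.5, 43.1, 2.5]
```

In Java, the same result is typically obtained by joining two mapped streams with Stream.concat(...) and collecting them with collect(Collectors.toList()).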
If the latitudes were strings, string concatenation (String.Concat) of "23.10" and "43.10" would give "23.1043.10": the two strings are simply joined together.
To do any calculation in C#, you have to convert the strings to a numeric data type that fits the values.
For example, you can convert the two values to float and add them:
float sum = Convert.ToSingle("23.10", CultureInfo.InvariantCulture) + Convert.ToSingle("43.10", CultureInfo.InvariantCulture); // needs: using System; using System.Globalization;
I think this won't be very complex, but I'm unable to figure this out.
In MATLAB I have a 17x1 struct array named photolist with 6 fields. I only want to export the name field, to use in R.
photolist.name gives me the list I need, but when I want to store it in a variable:
name = photolist.name
I only get the first value. The same happens with
name = getfield(photolist, 'name')
and while
name = [photolist.name]
gives me all the values, it returns them as one long string without spaces.
Using
save('temp.mat', 'photolist')
gives me something I can import into R, but then I have to go several nested layers deep to get the values I need, which works but is not very satisfying.
How do I save just the .name fields to a variable?
Found it; this was already answered here:
names = extractfield(photolist, 'name')
(extractfield is part of the Mapping Toolbox.) Another way to get the same cell array, without any toolbox, is:
names = {photolist.name}
I have two .csv files containing information that I would like to link. I read each .csv file into a dictionary and named them as follows:
Dictionary1 = {Complex: Protein}
Dictionary2 = {Protein: Absorbance}
(In each dictionary the first item is the key and the second is the value.)
I would like to link the proteins in Dictionary1 to Dictionary2, so that if I look up a complex in Dictionary1, I get the absorbances of all its proteins from Dictionary2.
Perhaps I have taken the wrong approach putting both the data sets into dictionaries...
You can use the value resulting from the lookup in the first dictionary as the key for the second. This assumes the values are immutable (hashable), since they are used as dictionary keys.
Python:
dict1 = {'Complex': 'Protein'}
dict2 = {'Protein': 'Absorbance'}
dict2[dict1['Complex']] # 'Absorbance'
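Since one complex can contain several proteins, a plain {complex: protein} dictionary keeps only one protein per complex. A minimal sketch, assuming the CSV files have header rows named complex,protein and protein,absorbance (hypothetical column names and sample data), maps each complex to a list of proteins and then chains the two lookups for every protein:

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for the two .csv files.
complexes_csv = """complex,protein
C1,P1
C1,P2
C2,P3
"""
absorbance_csv = """protein,absorbance
P1,0.42
P2,0.17
P3,0.88
"""


def load_complexes(f):
    """Map each complex to the list of all its proteins."""
    result = defaultdict(list)
    for row in csv.DictReader(f):
        result[row["complex"]].append(row["protein"])
    return dict(result)


def load_absorbances(f):
    """Map each protein to its absorbance as a float."""
    return {row["protein"]: float(row["absorbance"]) for row in csv.DictReader(f)}


def absorbances_for(complex_name, complexes, absorbances):
    """Chain the lookups: complex -> proteins -> absorbance of each protein."""
    return {p: absorbances[p]
            for p in complexes.get(complex_name, [])
            if p in absorbances}


complexes = load_complexes(io.StringIO(complexes_csv))
absorbances = load_absorbances(io.StringIO(absorbance_csv))
print(absorbances_for("C1", complexes, absorbances))  # {'P1': 0.42, 'P2': 0.17}
```

For the real files, open them with open('complexes.csv', newline='') and pass the file objects to the two loaders; the file names are placeholders.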