Morning Star Integration - web-scraping

For a school project, I've been trying to build a comprehensive Google Sheet of stock information in which stocks can be added and removed dynamically. To that end, I've done some research into loading data from tables with IMPORTHTML, and come up with this:
=IMPORTHTML(CONCAT("http://financials.morningstar.com/ratios/r.html?t=", B3),"table", 1)
Here B3 is NFLX or any other stock ticker that would be added. However, Sheets returns "Imported content is empty", and I've got no clue why.

Partial answer:
Yes, after a bit of fiddling I found that the best URL to look at for the financial data is
http://financials.morningstar.com/finan/financials/getFinancePart.html?t=XNAS:NFLX&region=usa&culture=en-US&ops=clear
You can use this and combine it with
https://github.com/fastfedora/google-docs/blob/master/scripts/ImportJSON/Code.gs
which adds ImportJSON functionality to Google Sheets.
This should get you started. However, the data comes as a bunch of divs/tds that you would have to clean up, but it's possible after a bit of fiddling :)
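As a rough illustration of how that URL is put together for a given ticker, here is a minimal Python sketch. The helper name finance_part_url and the default exchange code are assumptions made for illustration; only the URL pattern itself comes from the link above.

```python
from urllib.parse import urlencode

BASE = "http://financials.morningstar.com/finan/financials/getFinancePart.html"

def finance_part_url(ticker, exchange="XNAS"):
    """Build the getFinancePart URL for one ticker (hypothetical helper)."""
    params = {
        "t": f"{exchange}:{ticker}",
        "region": "usa",
        "culture": "en-US",
        "ops": "clear",
    }
    # safe=":" keeps the colon in "XNAS:NFLX" unescaped, matching the URL above
    return BASE + "?" + urlencode(params, safe=":")

print(finance_part_url("NFLX"))
```

Whether the endpoint still answers in this form is not something the sketch checks; it only reproduces the query string.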

I can't tell you where Ahmed Masud got the link (and I would like to know too). I get a lot of data from Morningstar and they can be strange. However, using his link, you can get the Key Ratios, clean the resulting data with IMPORTHTML, and also reference your B3 cell with this:
=arrayformula(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(IMPORTHTML("http://financials.morningstar.com/finan/financials/getFinancePart.html?t=XNAS:"& B3 &"&region=usa&culture=en-US&ops=clear","table", 1), "<\/td>" , "" ),"<\/tr>",""),"<\/th>",""),"<\/thead>",""),"<\/span>",""))
This formula reads the ticker from B3.
This will get ownership:
=arrayformula(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(IMPORTHTML("http://investors.morningstar.com/ownership/shareholders-overview.html?t=XNAS:"& B3 &"&region=usa&culture=en-US&ops=clear","table", 4), "<\/td>" , "" ),"<\/tr>",""),"<\/th>",""),"<\/thead>",""),"<\/span>",""))
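Again, the stock symbol should be in B3. Note that XNAS is the NASDAQ exchange code, so tickers listed on another exchange need a different prefix.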
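The nested SUBSTITUTE calls in these formulas simply delete a fixed list of leftover closing tags from every cell. The same cleanup can be sketched outside Sheets in Python; the function name and the sample cell contents below are made up for illustration.

```python
# Escaped closing tags removed by the formulas above (the backslash is literal)
LEFTOVER_TAGS = ["<\\/td>", "<\\/tr>", "<\\/th>", "<\\/thead>", "<\\/span>"]

def strip_leftover_tags(cell):
    """Remove each leftover tag from one cell value, like the SUBSTITUTE chain."""
    for tag in LEFTOVER_TAGS:
        cell = cell.replace(tag, "")
    return cell

sample = "Revenue USD Mil<\\/th>"  # made-up cell content
print(strip_leftover_tags(sample))
```

Keeping the tags in one list makes it easy to add another tag if a new kind of residue shows up in the imported data.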

Related

readHTMLTable does not recognize URL

Okay guys, I have what I'm sure is an entry-level problem. Still, I cannot explain it. Here's my code and its error:
> sample1 = readHTMLTable(http://www.pro-football-reference.com/boxscores/201609150buf.htm, which = 16)
Error: unexpected '/' in "sample1 = readHTMLTable(http:/"
Is it having a problem with the second forward slash? Every URL has two forward slashes, and I've pored over countless examples of this function, both on this site and others, and they all format the code this way. So, what am I doing wrong?
Additionally, I've tried it without the http:// prefix:
> sample1 = readHTMLTable(www.pro-football-reference.com/boxscores/201609150buf.htm, which = 16)
Error: unexpected symbol in "sample1 = readHTMLTable(www.pro-football-reference.com/boxscores/201609150buf.htm"
Here, I'm not even sure which symbol it's talking about.
Please explain.
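The issue is that you need to place your URL in quotes (""). Without them, R parses the URL as code rather than as a string: it reads http: as the start of a sequence expression and then hits a / where it expects a value. The following does return the table from your specified URL: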
sample1 = readHTMLTable("www.pro-football-reference.com/boxscores/201609150buf.htm")
As you probably know, the "which=" parameter selects which of the tables on that page you would like to retrieve. However, my own attempts show that only 1 and 2 work. Could you tell me which table you are attempting to read into R? If this method doesn't end up working, you can also read in the entire webpage and parse out the table in question.
Hope this helps get things started!
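The fallback mentioned above, reading the whole page and picking out one table, could look roughly like this in Python using only the standard library. This is a sketch, not a replacement for readHTMLTable: the class and function names are hypothetical, the sample page is made up, and nested tables are not handled.

```python
from html.parser import HTMLParser

class TableCollector(HTMLParser):
    """Collect the cell text of every <table> on a page, row by row."""

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables; each table is a list of rows
        self._cell = None  # text pieces of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.tables[-1][-1].append("".join(self._cell).strip())
            self._cell = None

def read_table(html, which):
    """Return table number `which` (1-based, like the which= argument)."""
    parser = TableCollector()
    parser.feed(html)
    return parser.tables[which - 1]

# Made-up page with two tables
page = """
<table><tr><th>Team</th><th>Pts</th></tr><tr><td>A</td><td>31</td></tr></table>
<table><tr><td>x</td></tr></table>
"""
print(read_table(page, 1))
```

Printing len(parser.tables) for a real page is a quick way to see how many tables the parser actually finds before choosing an index.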

phpexcel select cell after freezePane()

A) I would like to have a PHPExcel-generated file to open with cell A1 selected. Not a problem: I can do that.
B) I would like to have a PHPExcel-generated file with frozen panes (at 'E6', but that's not the real issue). Again, not a problem: I can do that.
Now, when trying to do A and B, that's when I hit a real problem: the file always opens with cell E6 selected, no matter what I try...
I've tried using
$objPHPExcel->getActiveSheet()->freezePane('E6');
in different stages of the file construction (right at the beginning, at the end, in the middle), always with
$objPHPExcel->getActiveSheet()->setSelectedCell('A1');
AFTER freezing the panes, but no luck...
I searched and searched and found no solution to this (except a perhaps-related-but-unanswered request here at SO). Either I'm overlooking something obviously simple or I've uncovered a small bug... :-) Can someone help?
Many thanks in anticipation.
Looking at the code, the Excel2007 Writer overrides the selected cell when there's a split pane, changing it to the top-left cell of the split.
Quick and dirty fix: in Classes/PHPExcel/Writer/Excel2007/Worksheet.php, change line 262, which should read
$activeCell = $topLeftCell;
to
$activeCell = empty($activeCell) ? $topLeftCell : $activeCell;
I haven't tested it fully, but it should work for now. I really should be testing to see which "pane" the selected cell falls into, and setting the selection appropriately in that pane.
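The pane test mentioned above can be sketched as follows. This is a Python illustration of the logic only, not PHPExcel code: the function names are hypothetical, the pane names follow the values used by the xlsx pane attribute, and the sketch assumes the freeze cell splits the sheet in both directions, as E6 does.

```python
import re

def column_index(letters):
    """Convert a column label such as 'A' or 'AB' to a 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index

def split_cell(ref):
    """Split a reference such as 'E6' into (column index, row number)."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    return column_index(match.group(1)), int(match.group(2))

def pane_for(selected, freeze):
    """Name the pane containing `selected` when panes are frozen at `freeze`."""
    col, row = split_cell(selected)
    fcol, frow = split_cell(freeze)
    vertical = "top" if row < frow else "bottom"
    horizontal = "Left" if col < fcol else "Right"
    return vertical + horizontal

print(pane_for("A1", "E6"))   # topLeft
print(pane_for("F10", "E6"))  # bottomRight
```

With the panes frozen at E6, a selection of A1 lands in the top-left pane, which is why forcing the active cell to the top-left cell of the split loses it.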

Undefined columns selected error in R

I apologize in advance because I'm extremely new to coding and was thrust into it just a few days ago by my boss for a project.
My data set is called s1. It has 123 variables, and 4 of them have some form of "QISSUE" in their name. I want to duplicate these four variables, adding "Rec" to the end of each name, so that I can freely play with the new variables while keeping the originals intact.
Running this line of code keeps giving me an error:
b <- llply(s1[, str_c(names(s1)[str_detect(names(s1), fixed("QISSUE"))], "Rec")], table)
The error is as such:
Error in `[.data.frame`(s1, , str_c(names(s1)[str_detect(names(s1), fixed("QISSUE")) & :
undefined columns selected
Thank you!
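The error occurs because the columns with "Rec" appended do not exist in s1 yet, so they cannot be selected. Select the existing QISSUE columns first, then rename the copies. Of course, there are other ways to do this with simpler code.
library(plyr)     # llply
library(stringr)  # str_detect, str_c
# copy the existing QISSUE columns
qcols <- names(s1)[str_detect(names(s1), fixed("QISSUE"))]
b <- llply(s1[, qcols], c)
# name the copies with the "Rec" suffix
nwnam <- str_c(qcols, "Rec")
ndf <- data.frame(do.call(cbind, b)); colnames(ndf) <- nwnam
ndf
# of course you can then do
cbind(s1, ndf)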
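For comparison, the same idea (find the matching columns, copy them, and add the copies under new names) can be sketched in plain Python. This is only an analogue of the R code above, with a dict of lists standing in for the data frame; the function name and the sample data are made up.

```python
def add_rec_copies(columns, pattern="QISSUE", suffix="Rec"):
    """Return a new dict with a '<name>Rec' copy of every matching column."""
    result = dict(columns)
    for name, values in columns.items():
        if pattern in name:
            result[name + suffix] = list(values)  # independent copy
    return result

s1 = {"ID": [1, 2], "QISSUE1": ["a", "b"], "QISSUE2": ["c", "d"]}  # made-up data
s1_rec = add_rec_copies(s1)
print(list(s1_rec))
```

Because each copy is a new list, changing a Rec column leaves the original column untouched.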

How to use create-<breeds>-with between two breed turtle agents?

I've been stuck on this issue for a long time. I have two networks in my model, so I want to create different types of links with different breed turtle agentsets.
Let's name the 1st turtle agentset T1 and the 2nd T2, so what I did is the following:
breed [T1s T1]
undirected-link-breed [TL1s TL1]
breed [T2s T2]
undirected-link-breed [TL2s TL2]
;;Got error report
ask T1s [create-TL1s-with other n-of 10 T1s]
The last line gave an error reporting that "You cannot use breeded and unbreeded links in the same world". I'm quite confused about what this means.
And then, I changed the last line to:
ask T1s [create-links-with other n-of 10 T1s]
It worked this time, but if that's the case, how can I define two different types of links, i.e., TL1 and TL2, with different turtle agentsets T1s and T2s?
Can anybody help me out? I really appreciate it!
Thanks
That error means that you've created some links that have no breed (probably with create-link-with or its plural form create-links-with) before creating links with a breed, or vice versa. If you want to use link breeds, you can never use create-link-with, create-link-to, or create-link-from, nor their plural forms. You must always use create-<breed>-with, create-<breed>-to, and create-<breed>-from.
So, search your code for instances of create-link-with, create-link-to, or create-link-from (singular or plural) and either delete them or change them to create-<breed>-with, create-<breed>-to, or create-<breed>-from. If you're still getting the error, call clear-all or clear-links to make sure you've removed all unbreeded links.

Trying get the price of products with RCurl

I'm scraping the prices of some products from a website. In Python I used urllib2 without problems, but when I tried RCurl in R I couldn't download the source code.
I paste the product code onto the base path to build the URL, then extract the price from the page source. The path of a product is: http://www.americanas.com.br/produto/code_of_product.
However, I can't download the source code of a product with RCurl. When I try, for example, getURL('http://www.americanas.com.br/produto/111467594'), it returns "".
I tried using getURL('.../produtos/111467594') and I could download the source, but that way I'm unable to get the price. :(
Does anyone know how I could get the price of the products?
Thanks.
P.S.: Sorry for my bad English. :)
Welcome to StackOverflow.
It's hard for me to say why it doesn't work; could you include verbose=TRUE in the getURL call? Also, I notice there are different prices on the webpage you linked. Do you want all of them or just the first? How about this to get the "Por" price:
library("stringr")
productwebpage<-readLines("http://www.americanas.com.br/produto/111467594")
pricerow<-productwebpage[grep("p class=\"sale price\"",productwebpage)]
price<-str_extract_all(pricerow,"\\(?[0-9,.]+\\)?")[[1]]
You could also replace the grep("p class=\"sale price\"",productwebpage) with either grep("<p><span class=\"regular price\">",productwebpage) (to get the "de" price, i.e. the old price) or grep("<span class=\"p-v interest\">",productwebpage) (which will give you the "sem juros" price, i.e. the monthly payment). For the last example you will get the number of months first and the payment after, so it will be:
> price
[1] "12" "83,25"
This should hopefully work for other products as well (just tried 5 and seemed to work for all of them).
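Since the question mentions already using Python, here is how the same number-extraction pattern behaves there. It is a sketch on a made-up line of HTML, not on the live page, and the function name is hypothetical.

```python
import re

# Same pattern as in the str_extract_all call above
NUMBER = re.compile(r"\(?[0-9,.]+\)?")

def extract_numbers(line):
    """Return every run of digits, commas and periods found in the line."""
    return NUMBER.findall(line)

# Made-up line shaped like the installment row
line = '<span class="p-v interest">12x de R$ 83,25 sem juros</span>'
print(extract_numbers(line))
```

As with the R version, the number of months comes first and the payment second. Any other digits on the matched line would be picked up as well, so the grep step that selects the right row still matters.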

Resources