Can Quantmod Get Capital Gains from Yahoo Finance?

The Yahoo Finance Historical Prices page appears to have a relatively new feature. It now offers four options: "Historical Prices", "Dividends Only", "Stock Splits", and "Capital Gain". The "Capital Gain" option seems to be quite recent. From my tests, quantmod's getDividends() retrieves only the "Dividends Only" data. The Yahoo Capital Gain data appears to be the sum of short-term and long-term capital gains, if any. quantmod doesn't seem to have a function to retrieve the capital gains.
My questions are:
How can we use quantmod to retrieve capital gains?
adjustOHLC(), used with quantmod's getSymbols(), seems to use only the dividend data. Does the capital gains data need to be included in the adjustment?

Related

How to scrape a div element using BeautifulSoup and output it as text

I have a Python script for extracting some content. It loads URLs from a CSV file and writes the results to a CSV. Some pages have a div class that contains unformatted text, and scraping that is proving difficult. How can I tweak my code to capture it? The unformatted text is not present on all pages, so I have added an error-handling statement.
Also, is there a way to put the unformatted text in the same column as Content rather than in its own column?
import requests
from bs4 import BeautifulSoup

# Placeholder: the original snippet did not show how `headers` was defined
headers = {"User-Agent": "Mozilla/5.0"}

urls = [
    'https://www.studypool.com/discuss/18233577/obtain-a-copy-of-the-financial-statements-for-a-publicly-traded-company-then-complete-a-ratio-analysis',
    'https://www.studypool.com/discuss/18898929/financial-accounting-questions-multiple-choice-about-the-chapter-cash-amp-investments',
    'https://www.studypool.com/discuss/18237517/compare-forms-of-fundamental-and-technical-analyses',
]

def transform(url):
    # Fetch the page once (the original requested it twice)
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    title = soup.find('h1', {'class': 'question-title'})
    content = soup.find('div', {'class': 'user-generated-description'})
    textbox = soup.find('div', {'class': 'unformatted-text-box'})
    try:
        textbox = textbox.find('a', {'rel': 'unformatted-text-box'}).text.strip()
    except AttributeError:
        # The text box (or the link inside it) is missing on this page
        textbox = ''
    row = {'Title': title.text,
           'Content': content.text,
           'Textbox': textbox}
    return row

This is one way of achieving your goal:
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs
from tqdm import tqdm

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
}
urls = [
    'https://www.studypool.com/discuss/18233577/obtain-a-copy-of-the-financial-statements-for-a-publicly-traded-company-then-complete-a-ratio-analysis',
    'https://www.studypool.com/discuss/18898929/financial-accounting-questions-multiple-choice-about-the-chapter-cash-amp-investments',
    'https://www.studypool.com/discuss/18237517/compare-forms-of-fundamental-and-technical-analyses',
]
big_list = []
s = requests.Session()
s.headers.update(headers)
for url in tqdm(urls):
    r = s.get(url)
    soup = bs(r.text, 'html.parser')
    title = soup.select_one('h1.question-title').get_text(strip=True)
    content = soup.select_one('div.user-generated-description').text.strip()
    try:
        textbox = soup.select_one('div.unformatted-text-box').text.strip()
    except Exception as e:
        textbox = 'not specified'
    big_list.append((title, content + '\n' + textbox))
df = pd.DataFrame(big_list, columns = ['Title', 'Content'])
df.to_csv('saved_data.csv')
print(df)
Result printed in terminal:
Title Content
0 University of Illinois at Chicago Accurate Reporting of Social Media Use Discussion This is an assignment on ratio analysis. You need to obtain a copy of the financial statements for a publicly traded company. Then choose as many ratios as possible. (I suggest choosing 4-5, the professor's requirement is at least three) I will send you the specific requirements as an attachment.\nFinancial Statement Analysis\nNew Focus Consulting, 2014\nChapter 13\nRatios and Trend Analysis\nChapter 13\nHorizontal Analysis: source, value investing basics\nChapter 13\nCalculate Income Statement 2013 vertical\namounts and 2011 to 2013 horizontal\namounts.\nChapter 13\nCurrent Ratio: Used to determine a company’s\nability to repay short-term debts.\nCurrent Assets\nCurrent Liabilities\nChapter 13\nQuick Ratio: Addressed liquidity by using cash\nand current assets that can be most quickly\nconverted to cash(quick assets).\nQuick Assets\nCurrent Liabilities\nChapter 13\nInventory Turnover Ratio: Number of times the inventory of\na company is sold and replaced over a specified period\nof time.\nCost of Goods Sold\nAverage Inventory at Cost\nChapter 13\nAccounts Receivable Turnover Ratio: Calculates\nhow quickly a company turns it credit sales into\ncash.\nCredit Sales\nAverage Accounts Receivable\nChapter 13\nAverage Collection Period Ratio: The\naverage number of days it takes for a\ncompany to collect its accounts receivable.\nAvg. Accounts Receivable\n(Sales/360)\nChapter 13\nDebt to Equity Ratio: Calculates the amount\nof debt as a percentage of equity. Some\nanalysts will use total liabilities as debt.\nTotal Debt\nTotal Equity\nChapter 13\nGross Profit Margin Ratio: Determines the\nprofitability of a company through direct\nexpenses. 
Used to evaluate efficiency of\noperations.\nSales – Cost of Goods Sold\nSales\nChapter 13\nOperating Margin Ratio: Determines the\nprofitability percentage from a company’s\noperations.\nOperating Income\nSales\nChapter 13\nNet Profit Margin Ratio: Determines the profit\nof a company after it meets the obligations\nfor a specific period.\nNet Profit\nSales\nChapter 13\nReturn on Equity Ratio: Indicates the return\nearned by the owners(investors) for a\nperiod.\nNet Profit\nAverage Owners Equity\nChapter 13\nEarnings Per Share Ratio: The theoretical\nearnings per each outstanding share.\nNet Income – Preferred Dividends\nAverage Number of Common\nShares Outstanding\nChapter 13\nThe prior ratios were some examples of\nratios and analysis. There are a number\nmore. Some not presented were ratios\nusing assets as a denominator. In my\nopinion, they are less telling than other\nratios.\nNew Focus Consulting\nFinancial Statement & Ratio Assignment\nObtain a copy of the financial statements for a publicly traded company.\nSelect three of the ratios presented in class or from Financial Statement Analysis\nand show the calculations for your selected company.\nCALCULATE FOR AT LEAST THE LAST THRE YEARS. ONE OF THE\nYEARS MUST BE DURING THE YEAR ENDED IN 2018.\nRemember, ratios are most relevant when compared to a companies' own\nhistorical, industry or competitors trends. For the above calculations, what\nstory do they tell? Provide an explanation for each of the three ratios presented.\nThe assignment will be at least two pages, not more than four pages.\nNote: Apple Inc, Samsung or Tesla are not allowed to be used for this assignment.\nNew Focus Consulting\n2007\nNew Focus Consulting\nFinancial Indicators & Ratios\nUsed to understand trends of a company. Most useful when compared to\na company's historical information or industry average.\nAccounts Receivable Turnover: Net credit sales over average accounts receivable. 
Measures\nhow quickly customers pay their bills.\nCapitalization Rate: Calculated as net income over owners investment, and\n(Cap Rate) reflects the rate of return a property will produce on an\ninvestment.\nCash Debt Coverage Ratio: Net cash from operating activities over total liabilities.\nMeasures a company's ability to repay its liabilities from cash\ngenerated from operations without liquidating assets.\nCost/Income Ratio: Total expenses divided by total expenses.\nCurrent Ratio: Current assets over current liabilities. Used by lending\ninstitutions to determine a company's ability to repay\nshort-term debts.\nDebt Coverage Ratio: Net income of an investment over the debt service of the\ninvestment.\nDebt to Equity Ratio: Total debt(longterm and shortterm) over total equity. Lending\ninstitutions will usuall be concerned with a companies\nDebt to Equity ratio over .5 to .75.\nDividend Yield Ratio: Annual dividends over current market share price of stock.\nLong Term Debt to Equity Ratio: Long term debt over owner's equity. In general, a zero to .3\nNew Focus a\nConsulting\nratio is considered\nrelatively low debt exposure.\n2006\nOperating Ratio: Operating revenues over operating expenses. When\ncompared to other periods or industry averages, helps\nmeasure a company's operating efficiency.\nPrice/Earnings Ratio: Current price of a stock divided by actual earning per share.\n(P/E Ratio)\nReturn on Investment: Net Income divided by net book value(total assets minus\n(ROI) intangible assets and liabilities).\nNew Focus Consulting\n2006\n\nPurchase answer to see full\nattachment
1 Financial Accounting Cash & Investments Multiple Choice Questions 1)The following information regarding the cash activities of Roves Ltd. for the month of April 20x5 is given below:Cash balance per books, April 12522𝐶𝑎𝑠ℎ𝑟𝑒𝑐𝑒𝑖𝑣𝑒𝑑𝑑𝑢𝑟𝑖𝑛𝑔𝐴𝑝𝑟𝑖𝑙53427𝐶𝑎𝑠ℎ𝑝𝑎𝑦𝑚𝑒𝑛𝑡𝑠𝑚𝑎𝑑𝑒𝑑𝑢𝑟𝑖𝑛𝑔𝐴𝑝𝑟𝑖𝑙38371𝑁𝑆𝐹𝑐ℎ𝑒𝑞𝑢𝑒𝑠𝑓𝑟𝑜𝑚𝑐𝑢𝑠𝑡𝑜𝑚𝑒𝑟𝑠𝑠ℎ𝑜𝑤𝑛𝑜𝑛𝑡ℎ𝑒𝑏𝑎𝑛𝑘𝑠𝑡𝑎𝑡𝑒𝑚𝑒𝑛𝑡1580𝐵𝑎𝑛𝑘𝑠𝑒𝑟𝑣𝑖𝑐𝑒𝑐ℎ𝑎𝑟𝑔𝑒𝑠578𝐼𝑛𝑝𝑟𝑒𝑝𝑎𝑟𝑖𝑛𝑔𝑡ℎ𝑒𝑏𝑎𝑛𝑘𝑟𝑒𝑐𝑜𝑛𝑐𝑖𝑙𝑖𝑎𝑡𝑖𝑜𝑛𝑓𝑜𝑟𝑡ℎ𝑒𝑚𝑜𝑛𝑡ℎ𝑜𝑓𝐴𝑝𝑟𝑖𝑙,𝑤ℎ𝑎𝑡𝑖𝑠𝑡ℎ𝑒𝑎𝑑𝑗𝑢𝑠𝑡𝑒𝑑𝑐𝑎𝑠ℎ𝑏𝑎𝑙𝑎𝑛𝑐𝑒𝑝𝑒𝑟𝑏𝑜𝑜𝑘𝑠𝑎𝑡𝐴𝑝𝑟𝑖𝑙30𝑡ℎ?𝑆𝑒𝑙𝑒𝑐𝑡𝑜𝑛𝑒:𝑎.364 b. 17578𝑐. 15420 d. 159982)𝑇𝑟𝑖𝑑𝑒𝐿𝑡𝑑.𝑝𝑢𝑟𝑐ℎ𝑎𝑠𝑒𝑑1085𝑠ℎ𝑎𝑟𝑒𝑠𝑜𝑓𝑁𝑒𝑥𝑡𝐿𝑡𝑑.𝑜𝑛𝐽𝑢𝑙𝑦1,20𝑥5𝑎𝑡 15.80 per share. On December 31, 20x6, the market value had increased to 12.30.𝑂𝑛𝐹𝑒𝑏𝑟𝑢𝑎𝑟𝑦28,20𝑥7,𝑡ℎ𝑒𝑠ℎ𝑎𝑟𝑒𝑠𝑜𝑓𝑁𝑒𝑥𝑡𝐿𝑡𝑑.𝑤𝑒𝑟𝑒𝑠𝑜𝑙𝑑,𝑎𝑛𝑑𝑇𝑟𝑖𝑑𝑒𝐿𝑡𝑑.𝑟𝑒𝑝𝑜𝑟𝑡𝑒𝑑𝑎𝐺𝑎𝑖𝑛𝑜𝑛𝐹𝑉𝑇𝑃𝐿𝐼𝑛𝑣𝑒𝑠𝑡𝑚𝑒𝑛𝑡𝑠𝑜𝑓 7143.Assuming that the investment in the shares of Next Ltd. is classified as FVTPL, how much were the Next Ltd. shares sold for on February 28, 20x7?Select one:a. 10000𝑏. 6202 c. 20488𝑑. 242863)A company purchased shares costing 103344𝑑𝑢𝑟𝑖𝑛𝑔𝑡ℎ𝑒𝑦𝑒𝑎𝑟.𝑇ℎ𝑒𝑠𝑒𝑠ℎ𝑎𝑟𝑒𝑠𝑎𝑟𝑒𝑐𝑙𝑎𝑠𝑠𝑖𝑓𝑖𝑒𝑑𝑎𝑠𝐹𝑉𝑇𝑃𝐿.𝐴𝑡𝑡ℎ𝑒𝑒𝑛𝑑𝑜𝑓𝑡ℎ𝑒𝑦𝑒𝑎𝑟,𝑡ℎ𝑒𝑐𝑜𝑚𝑝𝑎𝑛𝑦𝑟𝑒𝑐𝑒𝑖𝑣𝑒𝑑 6831 in dividends from these shares. At year-end, the fair value of the shares is 122955.𝑊ℎ𝑎𝑡𝑖𝑠𝑡ℎ𝑒𝑛𝑒𝑡𝑖𝑚𝑝𝑎𝑐𝑡𝑜𝑛𝑡ℎ𝑒𝑆𝑡𝑎𝑡𝑒𝑚𝑒𝑛𝑡𝑜𝑓𝐶𝑜𝑚𝑝𝑟𝑒ℎ𝑒𝑛𝑠𝑖𝑣𝑒𝐼𝑛𝑐𝑜𝑚𝑒𝑓𝑜𝑟𝑡ℎ𝑒𝑦𝑒𝑎𝑟?𝑆𝑒𝑙𝑒𝑐𝑡𝑜𝑛𝑒:𝑎. 6831 revenue in profit and loss and 0𝑖𝑛𝑜𝑡ℎ𝑒𝑟𝑐𝑜𝑚𝑝𝑟𝑒ℎ𝑒𝑛𝑠𝑖𝑣𝑒𝑖𝑛𝑐𝑜𝑚𝑒𝑏. 0 revenue in profit and loss and 26442𝑖𝑛𝑜𝑡ℎ𝑒𝑟𝑐𝑜𝑚𝑝𝑟𝑒ℎ𝑒𝑛𝑠𝑖𝑣𝑒𝑖𝑛𝑐𝑜𝑚𝑒𝑐. 26442 revenue in profit and loss and 0𝑖𝑛𝑜𝑡ℎ𝑒𝑟𝑐𝑜𝑚𝑝𝑟𝑒ℎ𝑒𝑛𝑠𝑖𝑣𝑒𝑖𝑛𝑐𝑜𝑚𝑒𝑑. 6831 revenue in profit and loss and 19611𝑖𝑛𝑜𝑡ℎ𝑒𝑟𝑐𝑜𝑚𝑝𝑟𝑒ℎ𝑒𝑛𝑠𝑖𝑣𝑒𝑖𝑛𝑐𝑜𝑚𝑒4)𝑇𝑟𝑖𝑑𝑒𝐿𝑡𝑑.𝑝𝑢𝑟𝑐ℎ𝑎𝑠𝑒𝑑988𝑠ℎ𝑎𝑟𝑒𝑠𝑜𝑓𝑁𝑒𝑥𝑡𝐿𝑡𝑑.𝑜𝑛𝐽𝑢𝑙𝑦1,20𝑥5𝑎𝑡 18.85 per share. On December 31, 20x5, the market value of the Next shares was 10.35𝑎𝑛𝑑𝑜𝑛𝐷𝑒𝑐𝑒𝑚𝑏𝑒𝑟31,20𝑥6,𝑡ℎ𝑒𝑚𝑎𝑟𝑘𝑒𝑡𝑣𝑎𝑙𝑢𝑒ℎ𝑎𝑑𝑖𝑛𝑐𝑟𝑒𝑎𝑠𝑒𝑑𝑡𝑜 13.13. On February 28, 20x7, the shares of Next Ltd. were sold for 22.79.𝑊ℎ𝑎𝑡𝑖𝑠𝑡ℎ𝑒𝑏𝑎𝑙𝑎𝑛𝑐𝑒𝑜𝑓𝑡ℎ𝑒𝐼𝑛𝑣𝑒𝑠𝑡𝑚𝑒𝑛𝑡𝑖𝑛𝑁𝑒𝑥𝑡𝐿𝑡𝑑.𝑎𝑐𝑐𝑜𝑢𝑛𝑡𝑎𝑡𝐽𝑎𝑛𝑢𝑎𝑟𝑦1,20𝑥7?𝑆𝑒𝑙𝑒𝑐𝑡𝑜𝑛𝑒:𝑎. 12972 b. 18624𝑐. 22517 d. 102265)𝑇𝑟𝑖𝑑𝑒𝐿𝑡𝑑.𝑝𝑢𝑟𝑐ℎ𝑎𝑠𝑒𝑑985𝑠ℎ𝑎𝑟𝑒𝑠𝑜𝑓𝑁𝑒𝑥𝑡𝐿𝑡𝑑.𝑜𝑛𝐽𝑢𝑙𝑦1,20𝑥5𝑎𝑡 16.89 per share. 
On December 31, 20x5, the market value of the Next shares was 9.86𝑎𝑛𝑑𝑜𝑛𝐷𝑒𝑐𝑒𝑚𝑏𝑒𝑟31,20𝑥6,𝑡ℎ𝑒𝑚𝑎𝑟𝑘𝑒𝑡𝑣𝑎𝑙𝑢𝑒ℎ𝑎𝑑𝑖𝑛𝑐𝑟𝑒𝑎𝑠𝑒𝑑𝑡𝑜 13.51. On February 28, 20x7, the shares of Next Ltd. were sold for24.63.𝐴𝑠𝑠𝑢𝑚𝑖𝑛𝑔𝑡ℎ𝑎𝑡𝑡ℎ𝑒𝑖𝑛𝑣𝑒𝑠𝑡𝑚𝑒𝑛𝑡𝑖𝑛𝑡ℎ𝑒𝑠ℎ𝑎𝑟𝑒𝑠𝑜𝑓𝑁𝑒𝑥𝑡𝐿𝑡𝑑.𝑖𝑠𝑐𝑙𝑎𝑠𝑠𝑖𝑓𝑖𝑒𝑑𝑎𝑠𝐹𝑉𝑇𝑃𝐿,𝑤ℎ𝑖𝑐ℎ𝑜𝑓𝑡ℎ𝑒𝑓𝑜𝑙𝑙𝑜𝑤𝑖𝑛𝑔𝑤𝑜𝑢𝑙𝑑𝑏𝑒𝑝𝑎𝑟𝑡𝑜𝑓𝑡ℎ𝑒𝐹𝑒𝑏𝑟𝑢𝑎𝑟𝑦28,20𝑥7𝑗𝑜𝑢𝑟𝑛𝑎𝑙𝑒𝑛𝑡𝑟𝑦?𝑆𝑒𝑙𝑒𝑐𝑡𝑜𝑛𝑒:𝑎.𝐷𝑒𝑏𝑖𝑡𝐺𝑎𝑖𝑛𝑜𝑛𝐹𝑉𝑇𝑃𝐿𝐼𝑛𝑣𝑒𝑠𝑡𝑚𝑒𝑛𝑡𝑠10953 b. Credit OCI - Gain on FVTOCI Investments 10953𝑐.𝐶𝑟𝑒𝑑𝑖𝑡𝑅𝑒𝑡𝑎𝑖𝑛𝑒𝑑𝐸𝑎𝑟𝑛𝑖𝑛𝑔𝑠 7624 d. Credit Gain on FVTPL Investments $10953\nnot specified
2 Rasmussen College Compare Forms of Fundamental and Technical Analysis Presentation You have just completed your first training for the new class of interns at your employer, Bank of Wealth Investment Brokers. Part of your role as the new Portfolio Analyst is to train the new research interns on all of the facets of investing. You have now been asked to conduct another training on the purpose of fundamental and technical analyses with examples and explanations of equations.\nYou will need to develop a PowerPoint presentation that explains the differences and similarities of fundamental and technical analyses. Include in your presentation a few examples of equations used for company analysis such as ROE, EPS, PE ratio. A PowerPoint presentation will provide brief and clear information on the required subject. Often, bullet points are utilized in a PowerPoint presentation; however, since interns will be expected to know and understand the material thoroughly, your presentation should be more detailed and offer supporting evidence, including a reference list. Be sure to use the Notes section under each slide to add information. Here is a link to information about adding speaker notes.\n\nThe presentation should give the interns enough information to understand the similarities and differences of fundamental and technical analyses. Be sure to use audience-specific language and tone in the presentation. Remember, you are writing this presentation for the interns; however, the Portfolio Manager may attend.\nnot specified
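The single-column merge used in the answer can also be isolated from the scraping step. The following is a minimal sketch, assuming a hypothetical helper `merge_content` and made-up sample rows (neither is part of the original answer). It appends the optional text to Content only when it is present, and it writes the CSV with the standard `csv` module instead of pandas so that it runs without any third-party package.

```python
import csv
import io


def merge_content(content, textbox=None):
    """Return content with the optional textbox text appended on a new line."""
    content = (content or "").strip()
    textbox = (textbox or "").strip()
    if not textbox:
        # Nothing extra to add: keep the Content cell unchanged
        return content
    return content + "\n" + textbox


def rows_to_csv(rows):
    """Serialize (title, content, textbox) tuples to two-column CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Title", "Content"])
    for title, content, textbox in rows:
        writer.writerow([title, merge_content(content, textbox)])
    return buffer.getvalue()


# Made-up sample rows standing in for scraped pages
sample_rows = [
    ("Page with extra text", "Main question body", "Extra unformatted text"),
    ("Page without extra text", "Main question body", None),
]
csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

Compared with writing a fixed 'not specified' marker, this variant leaves the Content cell untouched when a page has no unformatted text box; which behaviour is preferable depends on how the CSV is consumed later.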

Query in back-testing strategy in R- Indian trader perspective

There is documentation for backtesting in R on GitHub (https://timtrice.github.io/backtesting-strategies/).
I have a question about two lines of code in this document (https://timtrice.github.io/backtesting-strategies/using-quantstrat.html#settings-and-variables).
First line:
Sys.setenv(TZ = "UTC")
Second line:
currency('USD')
As I understand it, the first line sets the system time zone (to UTC) and the second line sets the currency in which trading occurs to US dollars. I am an Indian trader and my job is to backtest with equity data for Indian companies. I use the quantstrat and quantmod packages along with their dependencies. The data is downloaded from Yahoo Finance through R.
Which arguments should an Indian trader pass to these two functions (Sys.setenv and currency)? The currency of the Indian market is INR (Indian rupee) and Indian time is GMT+5:30.
I tried passing "GMT+5:30" to Sys.setenv and it returned an error. Passing "GMT" gave no error, but Indian time is GMT+5:30.

I found the answer. To determine the time zone, type OlsonNames() in R. You will get a comprehensive list of time zone names; choose the one that matches your location. For me (an Indian trader), that is Sys.setenv(TZ = "Asia/Kolkata"). For the currency, set it with currency("INR"). I thank Ilya Kipnis for helping me arrive at the solution.

Use of adjusted vs. unadjusted prices for stock strategy backtesting?

This is more of a methodological (rather than a programming) issue, yet it feels like SO is the right place for it. Following the ups and downs after Yahoo changed its defaults for fetching daily data in May 2017 (discussed at https://github.com/joshuaulrich/quantmod/issues/174, http://blog.fosstrading.com/2017/06/yahoo-finance-alternatives.html and also on SO in "Why Open,High,Low prices are wrong when using quantmod?"), I am probably not the only one who is not 100% certain which data to use in a backtesting procedure, or whether quantmod's getSymbols.yahoo and adjustOHLC still provide data suitable for quality backtesting.
quantmod 0.4.11 also includes Alpha Vantage as an (adjusted stock) data provider, but I am not familiar with its reliability.
How should the (stock and index) data obtained from getSymbols calls be prepared? Which data should be used: adjusted (for stock splits and dividends) or unadjusted? Which transformations do you use? The adjustOHLC function also appears to contain a bug, as its output is not split-adjusted. This is easily seen on AAPL by calling
getSymbols("AAPL")
chart_Series(adjustOHLC(AAPL))
and observing a jump in 2014.

You should always use adjusted prices. When a data provider does not offer a separate adjusted series, its close prices are usually already adjusted. There is no point in backtesting on raw close prices. I once made the mistake of downloading close prices instead of adjusted prices, and at the end of the backtest my strategy told me that, among all S&P constituents, Mastercard was the worst performer. After looking at the MA chart it was obvious why.
Because of a split on January 22, 2014, my data had a single return of about -90%! In conclusion, raw close data can give you utterly false backtesting results.
How to deal with splits
Divide every price before a split by the split ratio. For example, Mastercard had a 10-for-1 split, so you should divide every price before the January 22, 2014 split by 10. Splits are easy to find in the data: just look for returns around or below -50%.
Dividends
Subtract the dividend amount from every price before the dividend day. To find dividend days you need a dividend calendar; they cannot be reliably inferred from the price series alone.
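The split rule described above can be sketched in a few lines. The example is written in Python purely for illustration and uses made-up prices; `detect_split_days`, `adjust_for_split` and the `threshold` parameter are hypothetical names, not part of quantmod. It flags days whose simple return is at or below the threshold, then divides every earlier price by the split ratio.

```python
from datetime import date


def detect_split_days(prices, threshold=-0.5):
    """Return dates whose simple return versus the previous row is <= threshold.

    prices is a list of (date, close) tuples sorted by date.
    """
    flagged = []
    for (_, prev_close), (day, close) in zip(prices, prices[1:]):
        if close / prev_close - 1.0 <= threshold:
            flagged.append(day)
    return flagged


def adjust_for_split(prices, split_day, ratio):
    """Divide every close strictly before split_day by ratio."""
    return [
        (day, close / ratio if day < split_day else close)
        for day, close in prices
    ]


# Made-up closes around a 10-for-1 split taking effect on 2014-01-22
raw = [
    (date(2014, 1, 17), 800.0),
    (date(2014, 1, 21), 820.0),
    (date(2014, 1, 22), 83.0),
    (date(2014, 1, 23), 84.0),
]

split_days = detect_split_days(raw)
adjusted = adjust_for_split(raw, split_days[0], ratio=10)
print(split_days)
print(adjusted)
```

A return threshold cannot tell a split from a genuine one-day collapse, so any flagged date is best confirmed against a corporate actions calendar before the series is adjusted.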

What's the exchange suffix for German and Australian stocks for GoogleFinance API?

What's the exchange suffix for German and Australian stocks for the Google Finance API? For London stocks, it's .L (e.g. VOD.L). I just wonder what the suffixes for Germany and Australia are.
I tried .DE for Germany, but it didn't work (that's the exchange suffix for Yahoo Finance anyway).
By the way, below is my code to call the Google Finance API with R:
ticker <- "VOD.L"
a <- getSymbols(ticker, src="google",
from = as.Date("2010-01-01"), to = as.Date("2017-05-16"))

Here in Australia, our main exchange is the Australian Securities Exchange (ASX).
Personally, when I query Google Finance manually (i.e. through the web interface), I write my queries as ASX:WOW, like so. Note that some vendors treat this differently. E.g. Yahoo Finance prefers the WOW.AX convention (I believe Bloomberg does also, from memory).

Example for Germany (Software AG): ETR:SOW or FRA:SOW (ETR refers to the Xetra electronic exchange, where a large majority of the volume is nowadays traded. It is also the exchange that is most commonly used for reference data. FRA, on the other hand, refers to the "manual" trading floor. The main reason why you might sometimes want to use FRA is because it has longer trading hours than ETR. See here for more details.)
Example for Australia (Australia and New Zealand Banking Group): ASX:ANZ
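The two naming conventions mentioned in these answers (suffix style such as WOW.AX versus prefix style such as ASX:WOW) can be converted mechanically. Here is a small Python sketch, for illustration only: the function name and the mapping table are hypothetical, cover only the Australian and German exchanges named in this thread, and should be checked against each vendor's current symbol list.

```python
# Hypothetical mapping from suffix-style exchange codes to prefix-style codes.
# Only the exchanges discussed in this thread are included.
SUFFIX_TO_PREFIX = {
    "AX": "ASX",  # Australian Securities Exchange
    "DE": "ETR",  # Xetra
}


def suffix_to_prefix(ticker):
    """Convert 'SYMBOL.SUFFIX' into 'EXCHANGE:SYMBOL' using the table above."""
    symbol, sep, suffix = ticker.rpartition(".")
    if not sep:
        raise ValueError(f"no exchange suffix in {ticker!r}")
    try:
        exchange = SUFFIX_TO_PREFIX[suffix.upper()]
    except KeyError:
        raise ValueError(f"unknown exchange suffix {suffix!r}") from None
    return f"{exchange}:{symbol}"


for example in ("WOW.AX", "ANZ.AX", "SOW.DE"):
    print(example, "->", suffix_to_prefix(example))
```

Unknown suffixes raise an error instead of passing through unchanged, so an unmapped exchange is noticed immediately rather than producing a silently wrong query.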

use conditional statement to extract and map multiple XML nodes with XpathSApply in R

I retrieved an XML file from a site using this code:
library (XML)
abstract <- xmlParse(file = 'http://ieeexplore.ieee.org/gateway/ipsSearch.jsp?querytext=%28systematic%20review%20OR%20systematic%20literature%20review%20AND%20text%20mining%20techniques%29&pys=2009&&hc=1000', isURL = T)
The returned XML looks like:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<totalfound>40420</totalfound>
<totalsearched>3735435</totalsearched>
<document>
<rank>1</rank>
<title><![CDATA[Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics]]></title>
<authors><![CDATA[Ghose, A.; Ipeirotis, P.G.]]></authors>
<affiliations><![CDATA[Dept. of Inf., Oper., & Manage. Sci., New York Univ., New York, NY, USA]]></affiliations>
<controlledterms>
<term><![CDATA[Internet]]></term>
<term><![CDATA[data mining]]></term>
<term><![CDATA[electronic commerce]]></term>
<term><![CDATA[pattern classification]]></term>
</controlledterms>
<thesaurusterms>
<term><![CDATA[Communities]]></term>
<term><![CDATA[Economics]]></term>
<term><![CDATA[History]]></term>
<term><![CDATA[Marketing and sales]]></term>
<term><![CDATA[Measurement]]></term>
</thesaurusterms>
<pubtitle><![CDATA[Knowledge and Data Engineering, IEEE Transactions on]]></pubtitle>
<punumber><![CDATA[69]]></punumber>
<pubtype><![CDATA[Journals & Magazines]]></pubtype>
<publisher><![CDATA[IEEE]]></publisher>
<volume><![CDATA[23]]></volume>
<issue><![CDATA[10]]></issue>
<py><![CDATA[2011]]></py>
<spage><![CDATA[1498]]></spage>
<epage><![CDATA[1512]]></epage>
<abstract><![CDATA[With the rapid growth of the Internet, the ability of users to create and publish content has created active electronic communities that provide a wealth of product information. However, the high volume of reviews that are typically published for a single product makes harder for individuals as well as manufacturers to locate the best reviews and understand the true underlying quality of a product. In this paper, we reexamine the impact of reviews on economic outcomes like product sales and see how different factors affect social outcomes such as their perceived usefulness. Our approach explores multiple aspects of review text, such as subjectivity levels, various measures of readability and extent of spelling errors to identify important text-based features. In addition, we also examine multiple reviewer-level features such as average usefulness of past reviews and the self-disclosed identity measures of reviewers that are displayed next to a review. Our econometric analysis reveals that the extent of subjectivity, informativeness, readability, and linguistic correctness in reviews matters in influencing sales and perceived usefulness. Reviews that have a mixture of objective, and highly subjective sentences are negatively associated with product sales, compared to reviews that tend to include only subjective or only objective information. However, such reviews are rated more informative (or helpful) by other users. By using Random Forest-based classifiers, we show that we can accurately predict the impact of reviews on sales and their perceived usefulness. We examine the relative importance of the three broad feature categories: “reviewer-related” features, “review subjectivity” features, and “review readability” features, and find that using any of the three feature sets results in a statistically equivalent performance as in the case of using all available features. 
This paper is the first study that integrates econometric, text mining, and predictive modeling techniques toward a more complete analysis of the information captured by user-generated online reviews in order to estimate their helpfulness and economic impact.]]></abstract>
<issn><![CDATA[1041-4347]]></issn>
<htmlFlag><![CDATA[1]]></htmlFlag>
<arnumber><![CDATA[5590249]]></arnumber>
<doi><![CDATA[10.1109/TKDE.2010.188]]></doi>
<publicationId><![CDATA[5590249]]></publicationId>
<partnum><![CDATA[5590249]]></partnum>
<mdurl><![CDATA[http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=5590249&contentType=Journals+%26+Magazines]]></mdurl>
<pdf><![CDATA[http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5590249]]></pdf>
</document>
I want to extract each title and match it with its authors. I used getNodeSet and xpathSApply on "//title" and "//authors":
getNodeSet(abstract, "//title")
getNodeSet(abstract, "//authors")
titlenodes <- xpathSApply(abstract, "//title")
Then I discovered that some documents have no title. If I extract the two node sets separately, it is impossible to match each title to its corresponding authors. I need a way to detect which documents have no title and, for those documents, pick up only the authors while returning NA for the title.

Consider importing all of the XML content into a data frame from the parent node, document. That way, you can see which rows have missing titles and/or authors.
xmldf <- xmlToDataFrame(nodes = getNodeSet(abstract, "//document"))
# subset data frame of only title and author (to see NAs)
titleauthorsdf <- xmldf[, c("title", "authors")]
# character vector of authors with no titles
notitleauthorslist <- c(xmldf$authors[is.na(xmldf$title)])

If all you want is a list of authors for documents that have no title, you can do it this way:
xpathSApply(abstract,"//document[not(title)]/authors", xmlValue)
# [1] "Armstrong, R.; Baillie, C.; Cumming-Potvin, W." "Stede, M."
# [3] "Government Documents" "Piotrowski, M."
# ...
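To keep each title paired with its authors, and get a placeholder where the title is missing, the lookup can be done per document node rather than with two separate "//title" and "//authors" queries. A minimal sketch follows, written in Python with the standard xml.etree.ElementTree module rather than R's XML package; the sample XML is a made-up, shortened stand-in for the gateway response, `title_author_pairs` is a hypothetical helper, and None plays the role of NA.

```python
import xml.etree.ElementTree as ET

# Made-up, shortened stand-in for the gateway response
SAMPLE_XML = """<root>
  <document>
    <rank>1</rank>
    <title><![CDATA[A titled paper]]></title>
    <authors><![CDATA[Ghose, A.; Ipeirotis, P.G.]]></authors>
  </document>
  <document>
    <rank>2</rank>
    <authors><![CDATA[Stede, M.]]></authors>
  </document>
</root>
"""


def title_author_pairs(xml_text):
    """Return one (title, authors) tuple per <document>; a missing child gives None."""
    root = ET.fromstring(xml_text)
    pairs = []
    for doc in root.iter("document"):
        # findtext returns None when the child element does not exist
        pairs.append((doc.findtext("title"), doc.findtext("authors")))
    return pairs


pairs = title_author_pairs(SAMPLE_XML)
authors_without_title = [authors for title, authors in pairs if title is None]
print(pairs)
print(authors_without_title)
```

Because every tuple comes from a single document node, a missing title can never shift the alignment between titles and authors.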
