There seems to be a relatively new feature on the Yahoo Finance Historical Prices page. It now offers four options: "Historical Prices", "Dividends Only", "Stock Splits", and "Capital Gain". The "Capital Gain" option seems to be quite recent. From my tests, quantmod's getDividends() retrieves only the "Dividends Only" data. The Yahoo Capital Gain data appears to be the sum of short-term and long-term capital gains, if any. Quantmod doesn't seem to have a function to retrieve the capital gains.
My questions are:
How can we use quantmod to retrieve capital gains?
The adjustOHLC() used by quantmod's getSymbols seems to use only the div data. Does the capital gains data need to be included in the adjustment?
I have a Python script for extracting some content. It loads URLs from a CSV file and writes the results to another CSV. On some pages, part of the content sits in a div whose class marks it as unformatted text, and scraping that is proving difficult. How can I tweak my code to capture it? The unformatted text is not present on every page, so I have added an error-handling statement.
Also, is there a way to put the unformatted text in the same column as Content rather than in a column of its own?
import requests
from bs4 import BeautifulSoup

# Assumed for completeness: any browser-like User-Agent header.
headers = {"User-Agent": "Mozilla/5.0"}

urls = [
    'https://www.studypool.com/discuss/18233577/obtain-a-copy-of-the-financial-statements-for-a-publicly-traded-company-then-complete-a-ratio-analysis',
    'https://www.studypool.com/discuss/18898929/financial-accounting-questions-multiple-choice-about-the-chapter-cash-amp-investments',
    'https://www.studypool.com/discuss/18237517/compare-forms-of-fundamental-and-technical-analyses',
]

def transform(url):
    # Fetch the page once, sending the headers defined above.
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    title = soup.find('h1', {'class': 'question-title'})
    content = soup.find('div', {'class': 'user-generated-description'})
    textbox = soup.find('div', {'class': 'unformatted-text-box'})
    try:
        textbox = textbox.find('a', {'rel': 'unformatted-text-box'}).text.strip()
    except AttributeError:
        # Not every page has the unformatted text box.
        textbox = ''
    row = {'Title': title.text,
           'Content': content.text,
           'Textbox': textbox}
    return row
This is one way of achieving your goal:
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs
from tqdm import tqdm
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
}
urls = ['https://www.studypool.com/discuss/18233577/obtain-a-copy-of-the-financial-statements-for-a-publicly-traded-company-then-complete-a-ratio-analysis','https://www.studypool.com/discuss/18898929/financial-accounting-questions-multiple-choice-about-the-chapter-cash-amp-investments',
'https://www.studypool.com/discuss/18237517/compare-forms-of-fundamental-and-technical-analyses'
]
big_list = []
s = requests.Session()
s.headers.update(headers)
for url in tqdm(urls):
    r = s.get(url)
    soup = bs(r.text, 'html.parser')
    title = soup.select_one('h1.question-title').get_text(strip=True)
    content = soup.select_one('div.user-generated-description').text.strip()
    try:
        textbox = soup.select_one('div.unformatted-text-box').text.strip()
    except Exception as e:
        textbox = 'not specified'
    big_list.append((title, content + '\n' + textbox))
df = pd.DataFrame(big_list, columns = ['Title', 'Content'])
df.to_csv('saved_data.csv')
print(df)
Result printed in terminal (long Content cells abbreviated with ...):
Title Content
0 University of Illinois at Chicago Accurate Reporting of Social Media Use Discussion This is an assignment on ratio analysis. You need to obtain a copy of the financial statements for a publicly traded company. ...\nFinancial Statement Analysis\nNew Focus Consulting, 2014\nChapter 13\nRatios and Trend Analysis\n...\nPurchase answer to see full\nattachment
1 Financial Accounting Cash & Investments Multiple Choice Questions 1)The following information regarding the cash activities of Roves Ltd. for the month of April 20x5 is given below: ...\nnot specified
2 Rasmussen College Compare Forms of Fundamental and Technical Analysis Presentation You have just completed your first training for the new class of interns at your employer, Bank of Wealth Investment Brokers. ...\nnot specified
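The answer above appends the placeholder 'not specified' to Content when a page has no text box. If you would rather leave Content untouched in that case, a small helper along these lines could be used instead. This is only a sketch: merge_content and the sample rows are hypothetical names and data for illustration, not part of the original script.

```python
def merge_content(content, textbox=None, sep="\n"):
    """Join the main content and the optional text box into one string.

    Empty or missing parts are skipped, so pages without a text box
    keep their content unchanged (no placeholder is appended).
    """
    parts = [p.strip() for p in (content, textbox) if p and p.strip()]
    return sep.join(parts)


# Hypothetical scraped values: (title, content, textbox or None).
rows = [
    ("Title A", "Main question text", "Extra unformatted text"),
    ("Title B", "Main question text only", None),
]

# One 'Content' value per row, ready to pass to pandas.DataFrame.
merged = [(title, merge_content(content, textbox)) for title, content, textbox in rows]
print(merged)
```

In the loop above, the call would replace the expression content + '\n' + textbox, with textbox set to None in the except branch.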
There is documentation for backtesting in R on GitHub (https://timtrice.github.io/backtesting-strategies/).
I have a question about two lines of code in that document (https://timtrice.github.io/backtesting-strategies/using-quantstrat.html#settings-and-variables).
First line
Sys.setenv(TZ = "UTC")
Second line
currency('USD')
The first line sets the session time zone (to UTC in the example) and the second sets the currency in which trading occurs (to US dollars). I am an Indian trader and my job is to backtest with equity data for Indian companies. I use the quantstrat and quantmod packages along with their dependencies. The data is downloaded from Yahoo Finance through R.
What arguments should an Indian trader pass to these two functions (Sys.setenv and currency)? The currency of the Indian market is INR (Indian rupee) and Indian time is GMT+5:30.
I tried passing "GMT+5:30" to Sys.setenv and got an error. Passing "GMT" gave no error, but Indian time is GMT+5:30.
I found the answer. To find valid time zone names, run OlsonNames() in R, which returns a comprehensive list of time zones, and pick the one that matches yours. For me (an Indian trader) that is Sys.setenv(TZ = "Asia/Kolkata"). For the currency, use currency("INR"). Thanks to Ilya Kipnis for helping me arrive at the solution.
This is more of a methodological (rather than a programming) issue, yet it feels SO is the right place for it. Following the ups and downs after Yahoo changed its defaults in May 2017 for fetching daily data (discussed on https://github.com/joshuaulrich/quantmod/issues/174, http://blog.fosstrading.com/2017/06/yahoo-finance-alternatives.html and also on SO Why Open,High,Low prices are wrong when using quantmod?) I am probably not the only one not 100% certain which data to use in a backtesting procedure and whether quantmod getSymbols.yahoo and adjustOHLC still provide the relevant data for quality backtesting.
Quantmod 0.4.11 also includes AlphaVantage as (adjusted stock) data provider, but I am not familiar with their reliability.
How should the (stock and index) data obtained from getSymbols calls be prepared? Which data should be used: adjusted (for splits and dividends) or unadjusted? Which transformations do you use? The adjustOHLC function also appears to contain a bug, as its output is not split-adjusted. This is easily seen on AAPL by calling
library(quantmod)
getSymbols("AAPL")
chart_Series(adjustOHLC(AAPL))
and observing a jump in 2014.
You should always use adjusted prices. When a data provider does not publish a separate adjusted series, its close prices are usually already adjusted. There is no point in backtesting on raw close prices. I once made the mistake of downloading close prices instead of adjusted ones, and at the end of the backtest my strategy told me that Mastercard was the worst performer among all S&P constituents. After looking at the MA chart it was obvious why.
Because of a split on January 22, 2014, my data contained a single return of about -90%! In conclusion, raw close data can give you utterly false backtesting results.
How to deal with splits
Divide every price before a split by the split ratio. For example, Mastercard had a 10-for-1 split, so you should divide every price before the split date (January 22, 2014) by 10. Splits are easy to find in the data: just look for returns around or below -50%.
Dividends
Subtract the dividend amount from every price before the dividend day. To find dividend days you need a dividend calendar; they cannot be reliably identified from the price series alone.
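The two rules above can be sketched in a few lines of Python. This is a minimal illustration of the procedure as described in this answer (division for splits, subtraction for dividends), not what quantmod or any data vendor does internally. The function name back_adjust, the event tuple layout, and the sample prices are all made up for the example.

```python
from datetime import date


def back_adjust(prices, events):
    """Back-adjust closing prices for splits and dividends.

    prices: list of (day, close) tuples.
    events: list of (event_day, kind, amount) tuples, where kind is
            "split" (amount = split ratio, e.g. 10.0 for 10-for-1) or
            "dividend" (amount = cash dividend per share).
    Every price dated before an event is adjusted for that event.
    Events are applied oldest first, so a dividend paid before a later
    split is subtracted in pre-split units and then divided.
    """
    ordered = sorted(events, key=lambda e: e[0])
    adjusted = []
    for day, close in prices:
        value = close
        for event_day, kind, amount in ordered:
            if day < event_day:
                if kind == "split":
                    value /= amount
                elif kind == "dividend":
                    value -= amount
                else:
                    raise ValueError(f"unknown event kind: {kind}")
        adjusted.append((day, value))
    return adjusted


# Illustrative (not real) closes around a 10-for-1 split on 2014-01-22.
prices = [
    (date(2014, 1, 20), 800.0),
    (date(2014, 1, 21), 820.0),
    (date(2014, 1, 22), 82.5),
    (date(2014, 1, 23), 83.0),
]
events = [(date(2014, 1, 22), "split", 10.0)]

adjusted = back_adjust(prices, events)
for day, value in adjusted:
    print(day, value)
```

After adjustment the artificial drop at the split date disappears, which is the effect you want before computing returns for a backtest.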
What are the exchange suffixes for German and Australian stocks in the Google Finance API? For London stocks it is .L (e.g. VOD.L).
I tried .DE for German stocks, but it didn't work (that is the Yahoo Finance suffix anyway).
Below is my code for calling the Google Finance API from R:
ticker <- "VOD.L"
a <- getSymbols(ticker, src="google",
from = as.Date("2010-01-01"), to = as.Date("2017-05-16"))
Here in Australia, our main exchange is the Australian Securities Exchange (ASX).
Personally, when I query Google Finance manually (i.e. through the web interface), I write my queries as ASX:WOW, like so. Note that some vendors treat this differently. E.g. Yahoo Finance prefers the WOW.AX convention (I believe Bloomberg does also, from memory).
Example for Germany (Software AG): ETR:SOW or FRA:SOW (ETR refers to the Xetra electronic exchange, where a large majority of the volume is nowadays traded. It is also the exchange that is most commonly used for reference data. FRA, on the other hand, refers to the "manual" trading floor. The main reason why you might sometimes want to use FRA is because it has longer trading hours than ETR. See here for more details.)
Example for Australia (Australia and New Zealand Banking Group): ASX:ANZ
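If you keep tickers in the Yahoo style (WOW.AX) and need the prefix style shown above (ASX:WOW), a small lookup can do the conversion. The sketch below is hypothetical: the function name and the mapping table are assumptions built only from the examples in this thread (AX for the ASX, DE for Xetra), and they would need to be extended and checked for any other exchange.

```python
# Assumed mapping from Yahoo-style suffix to Google-style exchange prefix,
# limited to the exchanges mentioned in this thread.
YAHOO_SUFFIX_TO_GOOGLE_PREFIX = {
    "AX": "ASX",  # Australian Securities Exchange
    "DE": "ETR",  # Xetra
}


def yahoo_to_google(ticker):
    """Convert 'SYMBOL.SUFFIX' to 'PREFIX:SYMBOL'.

    Tickers without a suffix are returned unchanged; an unknown suffix
    raises KeyError rather than guessing an exchange.
    """
    symbol, sep, suffix = ticker.rpartition(".")
    if not sep:
        return ticker
    prefix = YAHOO_SUFFIX_TO_GOOGLE_PREFIX.get(suffix.upper())
    if prefix is None:
        raise KeyError(f"no known prefix for suffix '.{suffix}'")
    return f"{prefix}:{symbol}"


print(yahoo_to_google("WOW.AX"))
print(yahoo_to_google("SOW.DE"))
```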
I retrieved an XML file from a site using the following code:
library (XML)
abstract <- xmlParse(file = 'http://ieeexplore.ieee.org/gateway/ipsSearch.jsp?querytext=%28systematic%20review%20OR%20systematic%20literature%20review%20AND%20text%20mining%20techniques%29&pys=2009&&hc=1000', isURL = T)
The returned XML looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<totalfound>40420</totalfound>
<totalsearched>3735435</totalsearched>
<document>
<rank>1</rank>
<title><![CDATA[Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics]]></title>
<authors><![CDATA[Ghose, A.; Ipeirotis, P.G.]]></authors>
<affiliations><![CDATA[Dept. of Inf., Oper., & Manage. Sci., New York Univ., New York, NY, USA]]></affiliations>
<controlledterms>
<term><![CDATA[Internet]]></term>
<term><![CDATA[data mining]]></term>
<term><![CDATA[electronic commerce]]></term>
<term><![CDATA[pattern classification]]></term>
</controlledterms>
<thesaurusterms>
<term><![CDATA[Communities]]></term>
<term><![CDATA[Economics]]></term>
<term><![CDATA[History]]></term>
<term><![CDATA[Marketing and sales]]></term>
<term><![CDATA[Measurement]]></term>
</thesaurusterms>
<pubtitle><![CDATA[Knowledge and Data Engineering, IEEE Transactions on]]></pubtitle>
<punumber><![CDATA[69]]></punumber>
<pubtype><![CDATA[Journals & Magazines]]></pubtype>
<publisher><![CDATA[IEEE]]></publisher>
<volume><![CDATA[23]]></volume>
<issue><![CDATA[10]]></issue>
<py><![CDATA[2011]]></py>
<spage><![CDATA[1498]]></spage>
<epage><![CDATA[1512]]></epage>
<abstract><![CDATA[With the rapid growth of the Internet, the ability of users to create and publish content has created active electronic communities that provide a wealth of product information. However, the high volume of reviews that are typically published for a single product makes harder for individuals as well as manufacturers to locate the best reviews and understand the true underlying quality of a product. In this paper, we reexamine the impact of reviews on economic outcomes like product sales and see how different factors affect social outcomes such as their perceived usefulness. Our approach explores multiple aspects of review text, such as subjectivity levels, various measures of readability and extent of spelling errors to identify important text-based features. In addition, we also examine multiple reviewer-level features such as average usefulness of past reviews and the self-disclosed identity measures of reviewers that are displayed next to a review. Our econometric analysis reveals that the extent of subjectivity, informativeness, readability, and linguistic correctness in reviews matters in influencing sales and perceived usefulness. Reviews that have a mixture of objective, and highly subjective sentences are negatively associated with product sales, compared to reviews that tend to include only subjective or only objective information. However, such reviews are rated more informative (or helpful) by other users. By using Random Forest-based classifiers, we show that we can accurately predict the impact of reviews on sales and their perceived usefulness. We examine the relative importance of the three broad feature categories: “reviewer-related” features, “review subjectivity” features, and “review readability” features, and find that using any of the three feature sets results in a statistically equivalent performance as in the case of using all available features. This paper is the first study that integrates econometric, text mining, and predictive modeling techniques toward a more complete analysis of the information captured by user-generated online reviews in order to estimate their helpfulness and economic impact.]]></abstract>
<issn><![CDATA[1041-4347]]></issn>
<htmlFlag><![CDATA[1]]></htmlFlag>
<arnumber><![CDATA[5590249]]></arnumber>
<doi><![CDATA[10.1109/TKDE.2010.188]]></doi>
<publicationId><![CDATA[5590249]]></publicationId>
<partnum><![CDATA[5590249]]></partnum>
<mdurl><![CDATA[http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=5590249&contentType=Journals+%26+Magazines]]></mdurl>
<pdf><![CDATA[http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5590249]]></pdf>
</document>
I want to extract each title and match it with its authors. I used getNodeSet and xpathSApply on "//title" and "//authors":
getNodeSet(abstract, "//title")
getNodeSet(abstract, "//authors")
titlenodes <- xpathSApply(abstract, "//title")
Then I discovered that some documents have no title. If I extract titles and authors separately, it is impossible to match each title to its corresponding authors. I need a way to detect which documents have no title and, for those documents, pick only the authors and return NA for the title.
Consider importing all of the XML content into a data frame built from the parent node, document. That way you can see which rows have missing titles and/or authors.
xmldf <- xmlToDataFrame(nodes = getNodeSet(abstract, "//document"))
# subset data frame of only title and author (to see NAs)
titleauthorsdf <- xmldf[, c("title", "authors")]
# character vector of authors with no titles
notitleauthorslist <- c(xmldf$authors[is.na(xmldf$title)])
If all you want is a list of authors for documents that have no title, you can do it this way:
xpathSApply(abstract,"//document[not(title)]/authors", xmlValue)
# [1] "Armstrong, R.; Baillie, C.; Cumming-Potvin, W." "Stede, M."
# [3] "Government Documents" "Piotrowski, M."
# ...
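The underlying idea in both answers is to work per document node rather than querying titles and authors separately. For comparison, here is the same idea sketched in Python with the standard library's xml.etree.ElementTree. The sample XML is a shortened, made-up stand-in for the IEEE response, and title_author_pairs is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the real response: the second document has no title.
SAMPLE = """<root>
  <document>
    <title><![CDATA[Paper with a title]]></title>
    <authors><![CDATA[Ghose, A.; Ipeirotis, P.G.]]></authors>
  </document>
  <document>
    <authors><![CDATA[Stede, M.]]></authors>
  </document>
</root>"""


def title_author_pairs(xml_text):
    """Return one (title, authors) tuple per <document> element.

    findtext() returns None when the child element is absent, so a
    missing title stays aligned with the authors of the same document.
    """
    root = ET.fromstring(xml_text)
    return [
        (doc.findtext("title"), doc.findtext("authors"))
        for doc in root.iter("document")
    ]


pairs = title_author_pairs(SAMPLE)
for title, authors in pairs:
    print(title, "|", authors)
```

Because each pair is built from a single document element, a missing title shows up as None (the counterpart of NA in R) instead of shifting the alignment.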