Given the stocks in the S&P 500, how can I find which sector each stock belongs to (e.g. financial, energy, ...), using an R package or other sources?
The term "sector" is, by itself, an ambiguous term. What one data provider calls "consumer services" may be called "restaurants" by another. That said, TTR provides a function called stockSymbols that returns some information including Sector, from NASDAQ, for ~6400 NMS stocks.
library(TTR)
ss <- stockSymbols()
#Fetching AMEX symbols...
#Fetching NASDAQ symbols...
#Fetching NYSE symbols...
head(ss)
# Symbol Name LastSale MarketCap IPOyear Sector Industry Exchange
#1 AA-P Alcoa Inc. 92.300 0 NA Capital Goods Metal Fabrications AMEX
#2 AAU Almaden Minerals, Ltd. 1.620 97228060 NA Basic Industries Precious Metals AMEX
#3 ACU Acme United Corporation. 12.984 40798351 1988 Capital Goods Industrial Machinery/Components AMEX
#4 ACY AeroCentury Corp. 20.280 31297252 NA Technology Diversified Commercial Services AMEX
#5 ADGE American DG Energy Inc. 1.720 83404061 NA Energy Electric Utilities: Central AMEX
#6 ADK Adcare Health Systems Inc 5.800 85018494 NA Health Care Hospital/Nursing Management AMEX
If you just want stocks that are in the S&P 500, you can cheat and use the holdings of SPY (there are also plenty of other places to find the holdings of the S&P 500, including the Standard & Poor's website):
#install.packages("qmao", repos="http://r-forge.r-project.org")
library(qmao)
spyh <- getHoldings("SPY", auto.assign=FALSE)
head(ss[ss$Symbol %in% rownames(spyh), ])
# Symbol Name LastSale MarketCap IPOyear Sector
#455 AAPL Apple Inc. 452.97 425179837530 1980 Technology
#490 ADBE Adobe Systems Incorporated 44.02 22095230291 1986 Technology
#493 ADI Analog Devices, Inc. 46.79 14317018779 NA Technology
#495 ADP Automatic Data Processing, Inc. 70.03 33980125863 NA Technology
#500 ADSK Autodesk, Inc. 39.75 8896050000 NA Technology
#535 AKAM Akamai Technologies, Inc. 46.70 8333728621 1999 Miscellaneous
# Industry Exchange
#455 Computer Manufacturing NASDAQ
#490 Computer Software: Prepackaged Software NASDAQ
#493 Semiconductors NASDAQ
#495 EDP Services NASDAQ
#500 Computer Software: Prepackaged Software NASDAQ
#535 Business Services NASDAQ
I am trying to get stock symbols with these functions (both failed)
TTR::stockSymbols("AMEX")
Error in symbols[, sort.by] : incorrect number of dimensions
tidyquant::tq_exchange("AMEX")
Getting data...
Error: Can't rename columns that don't exist.
x Column Symbol doesn't exist.
Do these functions work for you? What fixes do you know to correct them? Thank you!
I get the same error. It seems there have been some changes to the website from which these packages get their information. There is an open issue about this.
The same thread mentions that you can read the data directly from the underlying JSON endpoint:
tmp <- jsonlite::fromJSON('https://api.nasdaq.com/api/screener/stocks?tableonly=true&limit=25&offset=0&exchange=AMEX&download=true')
head(tmp$data$rows)
# symbol name
#1 AAMC Altisource Asset Management Corp Com
#2 AAU Almaden Minerals Ltd. Common Shares
#3 ACU Acme United Corporation. Common Stock
#4 ACY AeroCentury Corp. Common Stock
#5 AE Adams Resources & Energy Inc. Common Stock
#6 AEF Aberdeen Emerging Markets Equity Income Fund Inc. Common Stock
# lastsale netchange pctchange volume marketCap country ipoyear
#1 $24.60 -0.3595 -1.44% 15183 40595215.00 United States
#2 $0.846 0.0359 4.432% 2272603 101984125.00 Canada 2015
#3 $33.82 0.61 1.837% 7869 112922038.00 United States 1988
#4 $11.76 2.01 20.615% 739133 18179596.00 United States
#5 $28.31 0.11 0.39% 6217 120099060.00 United States
#6 $9.10 0.09 0.999% 40775 461841180.00 United States
# industry sector
#1 Real Estate Finance
#2 Precious Metals Basic Industries
#3 Industrial Machinery/Components Capital Goods
#4 Diversified Commercial Services Technology
#5 Oil Refining/Marketing Energy
#6
# url
#1 /market-activity/stocks/aamc
#2 /market-activity/stocks/aau
#3 /market-activity/stocks/acu
#4 /market-activity/stocks/acy
#5 /market-activity/stocks/ae
#6 /market-activity/stocks/aef
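For readers working in Python rather than R, the same payload can be reduced to a symbol-to-sector lookup. This is a minimal sketch and not part of the original answer: it assumes the response keeps the data/rows layout and the symbol and sector fields shown in the output above, and sector_by_symbol is a hypothetical helper name. The sample payload is hand-built from rows printed above so that the block runs without network access.

```python
def sector_by_symbol(payload):
    """Map each ticker symbol to its sector; blank sectors become None.

    Assumes the layout shown above: payload["data"]["rows"] is a list of
    dicts that each carry "symbol" and "sector" keys.
    """
    rows = (payload.get("data") or {}).get("rows") or []
    lookup = {}
    for row in rows:
        symbol = (row.get("symbol") or "").strip()
        if not symbol:
            continue  # skip malformed rows without a ticker
        sector = (row.get("sector") or "").strip()
        lookup[symbol] = sector or None
    return lookup


# Hand-built sample mirroring three of the rows printed above.
sample = {
    "data": {
        "rows": [
            {"symbol": "AAU", "sector": "Basic Industries"},
            {"symbol": "ACU", "sector": "Capital Goods"},
            {"symbol": "AEF", "sector": ""},  # this row has an empty sector
        ]
    }
}

sectors = sector_by_symbol(sample)
print(sectors["ACU"])  # Capital Goods
```

With a live response, the decoded JSON body could be passed in directly. Whether the endpoint accepts requests without browser-like headers is not something this sketch checks.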
Can someone explain how to follow a pagination link in an ASPX form using MechanicalSoup? I updated __EVENTTARGET and __EVENTARGUMENT, but the submit still returns the current page instead of the next one.
form = browser.select_form('#form1')
form["__EVENTTARGET"] = "ctl00$ContentPlaceHolder1$gvData"
form["__EVENTARGUMENT"] = "Page$2"
print(form.form.find("input", {"name": "__EVENTTARGET"}).attrs)
print(form.form.find("input", {"name": "__EVENTARGUMENT"}).attrs)
new_response = browser.submit_selected()
print(new_response.content)
This script goes from page 1 to 9 and gets the information:
import requests
from bs4 import BeautifulSoup
url = 'https://www.bseindia.com/corporates/List_Scrips.aspx'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
data = {}
for i in soup.select('input'):
    data[i['name']] = i.get('value', '')
for page in range(1, 10): # <--- increase the number of pages here
    print('Page {}...'.format(page))
    print('-' * 80)
    soup = BeautifulSoup(requests.post(url, headers=headers, data=data).content, 'html.parser')
    for tr in soup.select('tr.TTHeader ~ tr:not(:has(td[colspan]))'):
        print(tr.get_text(strip=True, separator=' '))
    data = {}
    for i in soup.select('input'):
        data[i['name']] = i.get('value', '')
    data['__EVENTTARGET'] = 'ctl00$ContentPlaceHolder1$gvData'
    data['__EVENTARGUMENT'] = 'Page${}'.format(page+1)
    del data['ctl00$ContentPlaceHolder1$btnSubmit']
Prints:
Page 1...
--------------------------------------------------------------------------------
500002 ABB ABB India Limited Active B 2.00 INE117A01022 Heavy Electrical Equipment Equity
500003 AEGISLOG AEGIS LOGISTICS LTD. Active A 1.00 INE208C01025 Oil Marketing & Distribution Equity
500004 TPAEC TORRENT POWER AEC LTD. Delisted B 10.00 INE424A01014 Equity
500005 AKARLAMIN AKAR LAMINATORS LTD. Delisted XD 10.00 INE984C01013 Iron & Steel Products Equity
500006 ALPHADR ALPHA DRUG INDIA LTD. Delisted B 10.00 INE256B01026 Equity
500008 AMARAJABAT AMARA RAJA BATTERIES LTD. Active A 1.00 INE885A01032 Auto Parts & Equipment Equity
500009 AMBALALSA AMBALAL SARABHAI ENTERPRISES LTD. Active X 10.00 INE432A01017 Pharmaceuticals Equity
500010 HDFC HOUSING DEVELOPMENT FINANCE CORP.LTD. Active A 2.00 INE001A01036 Housing Finance Equity
500011 AMRTMIL-BDM AMRUT INDUSTRIES LTD. Delisted Z 10.00 NA Equity
500012 ANDHRAPET ANDHRA PETROCHEMICALS LTD. Active X 10.00 INE714B01016 Commodity Chemicals Equity
500013 ANSALAPI ANSAL PROPERTIES & INFRASTRUCTURE LTD. Active T 5.00 INE436A01026 Realty Equity
500014 UTIQUE Utique Enterprises Ltd Active X 10.00 INE096A01010 Finance (including NBFCs) Equity
500015 ICICIDM ICICI LTD. Delisted B 10.00 INE005A01011 Equity
500016 ARUNAHTEL ARUNA HOTELS LTD. Active XT 10.00 INE957C01019 Hotels Equity
500018 ARPOLDM ARPOLDM Delisted B 10.00 INE035A01018 Equity
500019 BOR BANK OF RAJASTHAN LTD. Delisted B 10.00 INE320A01014 Banks Equity
500020 BOMDYEING BOMBAY DYEING & MFG.CO.LTD. Active A 2.00 INE032A01023 Textiles Equity
500021 ASINCOF ASINCOF Delisted Z 10.00 NA Equity
500023 ASIANHOTNR Asian Hotels (North) Limited Active B 10.00 INE363A01022 Hotels Equity
500024 ASSAMCO Assam Company (India) Limited Delisted T 1.00 INE442A01024 Tea & Coffee Equity
500025 ASSAMBR ASSAMBROOK LTD.-$ Delisted X 10.00 INE353C01011 Tea & Coffee Equity
500026 ATSHIND ATASH INDUSTRIES LTD. Delisted Z 10.00 NA Equity
500027 ATUL ATUL LTD. Active A 10.00 INE100A01010 Specialty Chemicals Equity
500028 ATVPR ATV PROJECTS INDIA LTD. Active XT 10.00 INE447A01015 Construction & Engineering Equity
500029 AUTOLITIND AUTOLITE (INDIA) LTD. Active B 10.00 INE448A01013 Auto Parts & Equipment Equity
Page 2...
--------------------------------------------------------------------------------
500030 AUTORIDFIN AUTORIDERS FINANCE LTD. Suspended T 10.00 INE450A01019 Finance (including NBFCs) Equity
500031 BAJAJELEC BAJAJ ELECTRICALS LTD.-$ Active A 2.00 INE193E01025 Household Appliances Equity
500032 BAJAJHIND Bajaj Hindusthan Sugar Limited Active B 1.00 INE306A01021 Sugar Equity
... and so on.
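The key step in the script above is rebuilding the POST payload from the page that was just returned: every input field is copied, the two event fields are set, and the submit button's field is removed (presumably so the server does not treat the request as a click on that button). A minimal sketch of that step in isolation, using only the standard library: build_postback is a hypothetical helper, and the sample HTML, including its __VIEWSTATE value, is made up for illustration.

```python
from html.parser import HTMLParser


class _InputCollector(HTMLParser):
    """Collect name/value pairs from every <input> tag that has a name."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            self.fields[name] = attributes.get("value") or ""


def build_postback(html, target, argument, drop=()):
    """Copy the form's inputs, set the event fields, and drop unwanted names."""
    parser = _InputCollector()
    parser.feed(html)
    data = dict(parser.fields)
    data["__EVENTTARGET"] = target
    data["__EVENTARGUMENT"] = argument
    for name in drop:
        data.pop(name, None)
    return data


# Made-up page fragment; a real page carries much longer hidden values.
sample_page = """
<form id="form1">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTTARGET" value="" />
  <input type="hidden" name="__EVENTARGUMENT" value="" />
  <input type="submit" name="ctl00$ContentPlaceHolder1$btnSubmit" value="Submit" />
</form>
"""

payload = build_postback(
    sample_page,
    target="ctl00$ContentPlaceHolder1$gvData",
    argument="Page$2",
    drop=("ctl00$ContentPlaceHolder1$btnSubmit",),
)
print(payload)
```

The payload has to be rebuilt from each new response rather than reused, which is why the script above re-reads the input fields inside the loop.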
I have a function to loop through weblinks and extract the relevant tables.
List.Of.Tabs <- map(pages2, ~ {
  name <- .x[1]
  link <- .x[2]
  Sys.sleep(2)
  webpage <- read_html(link)
  tbls <- html_nodes(webpage, "table")
  tbls_ls <- html_table(tbls, fill = TRUE)
  pos1 <- possibly(function(tbls) bind_rows(tbls) %>%
                     filter_all(any_vars(. %in% c("Singapore", "SGP"))) %>%
                     mutate(name = name),
                   otherwise = NA)
  pos1(tbls_ls)
})
However, on some occasions, I got this message:
Error in matrix(NA_character_, nrow = n, ncol = maxp) :
invalid 'ncol' value (too large or NA)
In addition: Warning messages:
1: In max(p) : no non-missing arguments to max; returning -Inf
2: In matrix(NA_character_, nrow = n, ncol = maxp) :
NAs introduced by coercion to integer range
I figured it was due to some lists in pages2 having more than 2 elements.
print(pages2)
[[48]]
[1] "DICK'S SPORTING GOODS, INC. "
[2] "https://www.sec.gov/Archives/edgar/data/1089063/000108906319000017/dks-exhibit_21x20190202.htm"
[[49]]
[1] "DIEBOLD NIXDORF, Inc "
[2] "https://www.sec.gov/Archives/edgar/data/28823/000002882319000069/dbd12312018ex-211.htm"
[[50]]
[1] "DIGITAL REALTY TRUST, INC. "
[2] "https://www.sec.gov/Archives/edgar/data/1297996/000129799619000032/dlr10kex211_2018ss1.htm"
[3] "https://www.sec.gov/Archives/edgar/data/1297996/000129799619000032/dlr10kex212_2018ss1.htm"
How can I amend my code such that the error doesn't appear?
I cannot replicate the error messages with the provided links, but I can revise your code. Each element of your list holds a company name followed by one or more URLs. Because you specify link <- .x[2], you only ever use the first URL in each element, so any additional URLs are ignored. Arranging the data as a data frame avoids this. I created a data frame from the list you provided: one column has company names and the other contains URLs, with one row per URL. map2_df() then takes the company name and URL from each row and scrapes the tables. This way, you can be sure that every URL is visited.
library(rvest)
library(tidyverse)
page2 <- list(c("DICK'S SPORTING GOODS, INC.", "https://www.sec.gov/Archives/edgar/data/1089063/000108906319000017/dks-exhibit_21x20190202.htm"),
c("DIEBOLD NIXDORF, Inc", "https://www.sec.gov/Archives/edgar/data/28823/000002882319000069/dbd12312018ex-211.htm"),
c("DIGITAL REALTY TRUST, INC.", "https://www.sec.gov/Archives/edgar/data/1297996/000129799619000032/dlr10kex211_2018ss1.htm", "https://www.sec.gov/Archives/edgar/data/1297996/000129799619000032/dlr10kex212_2018ss1.htm"))
# Create a data frame based on page2
map_dfr(.x = page2,
.f = function(x){tibble(company = x[1],
url = grep(x = x, pattern = "https", value = TRUE))}) -> mytemp
# For each pair of company and url (for each row with company and url),
# scrape tables, bind lists, add a new column with a company name, and
# get rows that have Singapore or SGP
map2_df(.x = mytemp$url,
.y = mytemp$company,
.f = function(x, y){read_html(x) %>%
html_nodes("table") %>%
html_table(fill = TRUE) %>%
bind_rows() %>%
mutate(name = y)}) %>%
filter_all(any_vars(. %in% c("Singapore", "SGP")))
name X1 X2 X3
<chr> <chr> <chr> <chr>
1 DIEBOLD NIXDORF, Inc Aisino Wincor Engineering Pte. Ltd. Singapore 43.56%(71)
2 DIEBOLD NIXDORF, Inc Diebold Nixdorf Manufacturing Pte. Ltd. Singapore 94.72%(66)
3 DIEBOLD NIXDORF, Inc Diebold Nixdorf Singapore Pte. Ltd. Singapore 94.72%(51)
4 DIGITAL REALTY TRUST, INC. Digital Investment Management Pte. Ltd. NA Singapore
5 DIGITAL REALTY TRUST, INC. Digital Japan 1 Pte. Ltd. NA Singapore
6 DIGITAL REALTY TRUST, INC. Digital Japan 2 Pte. Ltd. NA Singapore
7 DIGITAL REALTY TRUST, INC. Digital Japan Holding Pte. Ltd. NA Singapore
8 DIGITAL REALTY TRUST, INC. Digital Singapore 1 Pte. Ltd. NA Singapore
9 DIGITAL REALTY TRUST, INC. Digital Singapore 2 Pte. Ltd. NA Singapore
10 DIGITAL REALTY TRUST, INC. Digital Singapore Jurong East Pte. Ltd. NA Singapore
11 DIGITAL REALTY TRUST, INC. Digital Investment Management Pte. Ltd. NA Singapore
12 DIGITAL REALTY TRUST, INC. Digital Japan 1 Pte. Ltd. NA Singapore
13 DIGITAL REALTY TRUST, INC. Digital Japan 2 Pte. Ltd. NA Singapore
14 DIGITAL REALTY TRUST, INC. Digital Japan Holding Pte. Ltd. NA Singapore
15 DIGITAL REALTY TRUST, INC. Digital Singapore 1 Pte. Ltd. NA Singapore
16 DIGITAL REALTY TRUST, INC. Digital Singapore 2 Pte. Ltd. NA Singapore
17 DIGITAL REALTY TRUST, INC. Digital Singapore Jurong East Pte. Ltd. NA Singapore
BONUS
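Given your comment, I think the following would work. It converts every column of each table to character before bind_rows(), so tables whose columns have different types can still be combined. Please note that I do not have your real data, so this may or may not work; it is the best I can do.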
map2_dfr(.x = mytemp$url,
.y = mytemp$company,
.f = function(x, y){read_html(x) %>%
html_nodes("table") %>%
html_table(fill = TRUE) %>%
map(.f = ~mutate_all(., .funs = list(~as.character(.)))) %>%
bind_rows() %>%
mutate(name = y)}) %>%
filter_all(any_vars(. %in% c("Singapore", "SGP")))
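The reshaping step used in this answer (one row per company/URL pair, so that entries with several URLs are not truncated) is independent of R. A minimal Python sketch of the same idea, assuming each entry is a list whose first element is the company name and whose remaining elements are URLs; expand_pairs is a hypothetical helper name.

```python
def expand_pairs(pages):
    """Turn [name, url, url, ...] entries into (name, url) pairs, one per URL."""
    pairs = []
    for entry in pages:
        if not entry:
            continue  # ignore empty entries
        name = entry[0].strip()
        for item in entry[1:]:
            if item.startswith("https"):
                pairs.append((name, item))
    return pairs


pages2 = [
    ["DICK'S SPORTING GOODS, INC. ",
     "https://www.sec.gov/Archives/edgar/data/1089063/000108906319000017/dks-exhibit_21x20190202.htm"],
    ["DIGITAL REALTY TRUST, INC. ",
     "https://www.sec.gov/Archives/edgar/data/1297996/000129799619000032/dlr10kex211_2018ss1.htm",
     "https://www.sec.gov/Archives/edgar/data/1297996/000129799619000032/dlr10kex212_2018ss1.htm"],
]

for company, url in expand_pairs(pages2):
    print(company, url)
```

Each pair can then be fetched and parsed on its own, so an entry with three elements simply produces two requests instead of silently dropping the last URL.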
I have been trying to calculate the quarter-over-quarter change in shares, with no luck. I have a data.table with approximately 15 million rows.
What I need to calculate is the absolute change in shares from quarter to quarter for each Holder and the stock they own.
My data table looks like this:
stock Holder Quarter Shares
1: GOOGLE Advance Capital Management, Inc. 2015 Q3 5800
2: GOOGLE Advance Capital Management, Inc. 2015 Q4 9000
3: GOOGLE Advance Capital Management, Inc. 2016 Q1 7000
4: GOOGLE Advance Capital Management, Inc. 2016 Q2 7560
5: GOOGLE Advest, Inc. 2015 Q3 12000
6: GOOGLE Advest, Inc. 2015 Q3 13450
I'm trying to use data.table functions, using
df[, qoq := c(NA, diff(Shares)), by = "Holder,stock,Quarter"]
However, I get only NA.
I was expecting something like this:
stock Holder Quarter Shares qoq
1: GOOGLE Advance Capital Management, Inc. 2015 Q3 5800 NA
2: GOOGLE Advance Capital Management, Inc. 2015 Q4 9000 3200
3: GOOGLE Advance Capital Management, Inc. 2016 Q1 7000 -2000
4: GOOGLE Advance Capital Management, Inc. 2016 Q2 7560 560
5: GOOGLE Advest, Inc. 2015 Q3 12000 NA
6: GOOGLE Advest, Inc. 2015 Q3 13450 1450
After that, I need to calculate the variance of this result, again by Holder and stock. Is there a general function for calculating statistics grouped by several columns? I tried aggregate, but it is taking ages:
aggregate(REPORTED_HOLDING~Quarter+FILER_NAME+STOCK_NAME, FUN=sum, data=df)
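With dplyr, assuming df is your data.frame. Note that grouping by Quarter as well, as in your data.table attempt, leaves most groups with a single row, so diff() has nothing to subtract and you only get NA; group by stock and Holder only:
library(dplyr)
df %>%
  group_by(stock, Holder) %>%
  mutate(qoq = Shares - lag(Shares)) %>%
  summarise(qvar = var(qoq, na.rm = TRUE))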
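The grouping logic itself (difference between consecutive rows within each stock/Holder group, then the sample variance of those differences) can be sketched without any data-frame library. This is an illustrative Python version, not the answer's method: it assumes the rows are already sorted by quarter within each group, and qoq_changes and qoq_variance are hypothetical helper names.

```python
from collections import defaultdict
from statistics import variance


def qoq_changes(rows):
    """Return {(stock, holder): [change, ...]} with rows kept in input order."""
    shares = defaultdict(list)
    for stock, holder, _quarter, held in rows:
        shares[(stock, holder)].append(held)
    return {
        key: [later - earlier for earlier, later in zip(values, values[1:])]
        for key, values in shares.items()
    }


def qoq_variance(rows):
    """Sample variance of the changes per group; None with fewer than two changes."""
    return {
        key: variance(changes) if len(changes) >= 2 else None
        for key, changes in qoq_changes(rows).items()
    }


# Rows copied from the question's sample table.
rows = [
    ("GOOGLE", "Advance Capital Management, Inc.", "2015 Q3", 5800),
    ("GOOGLE", "Advance Capital Management, Inc.", "2015 Q4", 9000),
    ("GOOGLE", "Advance Capital Management, Inc.", "2016 Q1", 7000),
    ("GOOGLE", "Advance Capital Management, Inc.", "2016 Q2", 7560),
    ("GOOGLE", "Advest, Inc.", "2015 Q3", 12000),
    ("GOOGLE", "Advest, Inc.", "2015 Q3", 13450),
]

changes = qoq_changes(rows)
print(changes[("GOOGLE", "Advance Capital Management, Inc.")])  # [3200, -2000, 560]
print(qoq_variance(rows)[("GOOGLE", "Advest, Inc.")])  # None
```

The first row of each group has no previous quarter, so it contributes no change here, which corresponds to the NA that lag() produces in the dplyr version.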
I have a field of data containing company names, such as
company <- c("Microsoft", "Apple", "Cloudera", "Ford")
> company
Company
1 Microsoft
2 Apple
3 Cloudera
4 Ford
and so on.
The package tm.plugin.webmining allows you to query data from Yahoo! Finance based on ticker symbols:
require(tm.plugin.webmining)
results <- WebCorpus(YahooFinanceSource("MSFT"))
I'm missing the in-between step. How can I query ticker symbols programmatically based on company names?
I couldn't manage to do this with the tm.plugin.webmining package, but I came up with a rough solution: pulling and parsing data from this web file: ftp://ftp.nasdaqtrader.com/SymbolDirectory/nasdaqlisted.txt. I say rough because my original calls with httr::content(httr::GET(...)) didn't work every time. I suspect it has to do with the ftp:// address, but I don't do much web scraping, so I can't really explain it. It seemed to work better on my Linux machine than on my Mac, but that could be irrelevant. Thanks to @thelatemail's comment, reading the file directly with read.csv seems to work much more smoothly:
library(quantmod) ## optional
symbolData <- read.csv(
"ftp://ftp.nasdaqtrader.com/SymbolDirectory/nasdaqlisted.txt",
sep="|")
##
> head(symbolData,10)
Symbol Security.Name Market.Category Test.Issue Financial.Status Round.Lot.Size
1 AAIT iShares MSCI All Country Asia Information Technology Index Fund G N N 100
2 AAL American Airlines Group, Inc. - Common Stock Q N N 100
3 AAME Atlantic American Corporation - Common Stock G N N 100
4 AAOI Applied Optoelectronics, Inc. - Common Stock G N N 100
5 AAON AAON, Inc. - Common Stock Q N N 100
6 AAPL Apple Inc. - Common Stock Q N N 100
7 AAVL Avalanche Biotechnologies, Inc. - Common Stock G N N 100
8 AAWW Atlas Air Worldwide Holdings - Common Stock Q N N 100
9 AAXJ iShares MSCI All Country Asia ex Japan Index Fund G N N 100
10 ABAC Aoxin Tianli Group, Inc. - Common Shares S N N 100
Edit:
As per @GSee's suggestion, a (presumably) more robust way to obtain the source data is with the stockSymbols() function in the package TTR:
> symbolData2 <- stockSymbols(exchange="NASDAQ")
Fetching NASDAQ symbols...
> ##
> head(symbolData2)
Symbol Name LastSale MarketCap IPOyear Sector
1 AAIT iShares MSCI All Country Asia Information Technology Index Fun 34.556 6911200 NA <NA>
2 AAL American Airlines Group, Inc. 40.500 29164164453 NA Transportation
3 AAME Atlantic American Corporation 4.020 83238028 NA Finance
4 AAOI Applied Optoelectronics, Inc. 20.510 303653114 2013 Technology
5 AAON AAON, Inc. 18.420 1013324613 NA Capital Goods
6 AAPL Apple Inc. 103.300 618546661100 1980 Technology
Industry Exchange
1 <NA> NASDAQ
2 Air Freight/Delivery Services NASDAQ
3 Life Insurance NASDAQ
4 Semiconductors NASDAQ
5 Industrial Machinery/Components NASDAQ
6 Computer Manufacturing NASDAQ
I don't know if you just wanted to get ticker symbols from names, but if you are also looking for actual share price information you could do something like this:
namedStock <- function(name="Microsoft",
start=Sys.Date()-365,
end=Sys.Date()-1){
ticker <- symbolData[agrep(name,symbolData[,2]),1]
getSymbols(
Symbols=ticker,
src="yahoo",
env=.GlobalEnv,
from=start,to=end)
}
##
## an xts object named MSFT will be added to
## the global environment, no need to assign
## to an object
namedStock()
##
> str(MSFT)
An ‘xts’ object on 2013-09-03/2014-08-29 containing:
Data: num [1:251, 1:6] 31.8 31.4 31.1 31.3 31.2 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:6] "MSFT.Open" "MSFT.High" "MSFT.Low" "MSFT.Close" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 2
$ src : chr "yahoo"
$ updated: POSIXct[1:1], format: "2014-09-02 21:51:22.792"
> chartSeries(MSFT)
So, like I said, this isn't the cleanest solution, but hopefully it helps you out. Also note that my data source only covers companies listed on NASDAQ, so NYSE-listed companies such as Ford will be missing; you could easily combine this with other sources.
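The name-to-ticker lookup at the heart of namedStock() can also be sketched in Python for a pipe-delimited symbol directory. This is a rough illustration under stated assumptions: the header names (Symbol, Security Name) are inferred from the column names R printed above, a plain case-insensitive substring match stands in for agrep's approximate matching, and find_tickers is a hypothetical helper. The sample text is hand-built from rows shown above so the block runs offline.

```python
import csv
import io


def find_tickers(directory_text, company):
    """Return symbols whose security name contains `company`, ignoring case."""
    reader = csv.DictReader(io.StringIO(directory_text), delimiter="|")
    needle = company.strip().lower()
    return [
        row["Symbol"]
        for row in reader
        if needle and needle in (row.get("Security Name") or "").lower()
    ]


# Hand-built sample in the assumed file layout.
sample_directory = """Symbol|Security Name|Market Category|Test Issue|Financial Status|Round Lot Size
AAL|American Airlines Group, Inc. - Common Stock|Q|N|N|100
AAON|AAON, Inc. - Common Stock|Q|N|N|100
AAPL|Apple Inc. - Common Stock|Q|N|N|100
"""

print(find_tickers(sample_directory, "Apple"))  # ['AAPL']
```

A substring match can return several symbols for a short or generic name, so the caller should expect a list and decide how to disambiguate.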