Programmatically look up a ticker symbol in R - r

I have a field of data containing company names, such as
company <- c("Microsoft", "Apple", "Cloudera", "Ford")
> company
Company
1 Microsoft
2 Apple
3 Cloudera
4 Ford
and so on.
The package tm.plugin.webmining allows you to query data from Yahoo! Finance based on ticker symbols:
require(tm.plugin.webmining)
results <- WebCorpus(YahooFinanceSource("MSFT"))
I'm missing the in-between step. How can I query ticket symbols programmatically based on company names?

I couldn't manage to do this with the tm.plugin.webmining package, but I came up with a rough solution - pulling & parsing data from this web file: ftp://ftp.nasdaqtrader.com/SymbolDirectory/nasdaqlisted.txt. I say rough because for some reason my calls with httr::content(httr::GET(...)) don't work every time - I think it has to do with the type of web address (ftp://) but I don't do that much web scraping so I can't really explain this. It seemed to work better on my Linux than my Mac, but that could be irrelevant. Regardless, here's what I got: Thanks to #thelatemail's comment, this seems to be working much smoother:
library(quantmod) ## optional
symbolData <- read.csv(
"ftp://ftp.nasdaqtrader.com/SymbolDirectory/nasdaqlisted.txt",
sep="|")
##
> head(symbolData,10)
Symbol Security.Name Market.Category Test.Issue Financial.Status Round.Lot.Size
1 AAIT iShares MSCI All Country Asia Information Technology Index Fund G N N 100
2 AAL American Airlines Group, Inc. - Common Stock Q N N 100
3 AAME Atlantic American Corporation - Common Stock G N N 100
4 AAOI Applied Optoelectronics, Inc. - Common Stock G N N 100
5 AAON AAON, Inc. - Common Stock Q N N 100
6 AAPL Apple Inc. - Common Stock Q N N 100
7 AAVL Avalanche Biotechnologies, Inc. - Common Stock G N N 100
8 AAWW Atlas Air Worldwide Holdings - Common Stock Q N N 100
9 AAXJ iShares MSCI All Country Asia ex Japan Index Fund G N N 100
10 ABAC Aoxin Tianli Group, Inc. - Common Shares S N N 100
Edit:
As per #GSee's suggestion, a (presumably) more robust way to obtain the source data is with the stockSymbols() function in the package TTR:
> symbolData2 <- stockSymbols(exchange="NASDAQ")
Fetching NASDAQ symbols...
> ##
> head(symbolData2)
Symbol Name LastSale MarketCap IPOyear Sector
1 AAIT iShares MSCI All Country Asia Information Technology Index Fun 34.556 6911200 NA <NA>
2 AAL American Airlines Group, Inc. 40.500 29164164453 NA Transportation
3 AAME Atlantic American Corporation 4.020 83238028 NA Finance
4 AAOI Applied Optoelectronics, Inc. 20.510 303653114 2013 Technology
5 AAON AAON, Inc. 18.420 1013324613 NA Capital Goods
6 AAPL Apple Inc. 103.300 618546661100 1980 Technology
Industry Exchange
1 <NA> NASDAQ
2 Air Freight/Delivery Services NASDAQ
3 Life Insurance NASDAQ
4 Semiconductors NASDAQ
5 Industrial Machinery/Components NASDAQ
6 Computer Manufacturing NASDAQ
I don't know if you just wanted to get ticker symbols from names, but if you are also looking for actual share price information you could do something like this:
namedStock <- function(name="Microsoft",
start=Sys.Date()-365,
end=Sys.Date()-1){
ticker <- symbolData[agrep(name,symbolData[,2]),1]
getSymbols(
Symbols=ticker,
src="yahoo",
env=.GlobalEnv,
from=start,to=end)
}
##
## an xts object named MSFT will be added to
## the global environment, no need to assign
## to an object
namedStock()
##
> str(MSFT)
An ‘xts’ object on 2013-09-03/2014-08-29 containing:
Data: num [1:251, 1:6] 31.8 31.4 31.1 31.3 31.2 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:6] "MSFT.Open" "MSFT.High" "MSFT.Low" "MSFT.Close" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 2
$ src : chr "yahoo"
$ updated: POSIXct[1:1], format: "2014-09-02 21:51:22.792"
> chartSeries(MSFT)
So like I said, this isn't the cleanest solution but hopefully it helps you out. Also note that my data source was pulling companies traded on NASDAQ (which is most major companies), but you could easily combine this with other sources.

Related

Using spacyr for named entity recognition - inconsistent results

I plan to use the spacyr R library to perform named entity recognition across several news articles (spacyr is an R wrapper for the Python spaCy package). My goal is to identify partners for network analysis automatically. However, spacyr is not recognising common entities as expected. Here is sample code to illustrate my issue:
library(quanteda)
library(spacyr)
text <- data.frame(doc_id = c(1:5),
sentence = c("Brightmark LLC, the global waste solutions provider, and Florida Keys National Marine Sanctuary (FKNMS), today announced a new plastic recycling partnership that will reduce landfill waste and amplify concerns about ocean plastics.",
"Brightmark is launching a nationwide site search for U.S. locations suitable for its next set of advanced recycling facilities, which will convert hundreds of thousands of tons of post-consumer plastics into new products, including fuels, wax, and other products.",
"Brightmark will be constructing the facility in partnership with the NSW government, as part of its commitment to drive economic growth and prosperity in regional NSW.",
"Macon-Bibb County, the Macon-Bibb County Industrial Authority, and Brightmark have mutually agreed to end discussions around building a plastic recycling plant in Macon",
"Global petrochemical company SK Global Chemical and waste solutions provider Brightmark have signed a memorandum of understanding to create a partnership that aims to take the lead in the circular economy of plastic by construction of a commercial scale plastics renewal plant in South Korea"))
corpus <- corpus(text, text_field = "sentence")
spacy_initialize(model = "en_core_web_sm")
parsed <- spacy_parse(corpus)
entity <- entity_extract(parsed)
I expect the company "Brightmark" to be recognised in all 5 sentences. However this is what I get:
entity
doc_id sentence_id entity entity_type
1 1 1 Florida_Keys_National_Marine_Sanctuary ORG
2 1 1 FKNMS ORG
3 2 1 U.S. GPE
4 3 1 NSW ORG
5 4 1 Macon_-_Bibb_County ORG
6 4 1 Brightmark ORG
7 4 1 Macon GPE
8 5 1 SK_Global_Chemical ORG
9 5 1 South_Korea GPE
"Brightmark" only appears as an ORG entity type in the 4th sentence (doc_id refers to sentence number). It should show up in all the sentences. The "NSW Government" does not appear at all.
I am still figuring out spaCy and spacyr. Perhaps someone can advise me why this is happening and what steps I should take to remedy this issue. Thanks in advance.
I changed the model and achieved better results:
spacy_initialize(model = "en_core_web_trf")
parsed <- spacy_parse(corpus)
entity <- entity_extract(parsed)
entity
doc_id sentence_id entity entity_type
1 1 1 Brightmark_LLC ORG
2 1 1 Florida_Keys GPE
3 1 1 FKNMS ORG
4 2 1 Brightmark ORG
5 2 1 U.S. GPE
6 3 1 Brightmark ORG
7 3 1 NSW GPE
8 3 1 NSW GPE
9 4 1 Macon_-_Bibb_County GPE
10 4 1 the_Macon_-_Bibb_County_Industrial_Authority ORG
11 4 1 Brightmark ORG
12 4 1 Macon GPE
13 5 1 SK_Global_Chemical ORG
14 5 1 Brightmark ORG
15 5 1 South_Korea GPE
The only downside is that NSW Government and Florida Keys National Marine Sanctuary are not resolved. I also get this warning: UserWarning: User provided device_type of 'cuda', but CUDA is not available.

How to learn about possible variables in R [duplicate]

This question already has answers here:
List distinct values in a vector in R
(7 answers)
Closed 2 years ago.
In the nycflights13 package, how do I see all of the carriers in the carrier variable? I can pull up the list of variables from air_time to year, but I want to see the list of all the carriers. Please let me know!
The carriers are listed in the airlines data frame.
nycflights13::airlines
# # A tibble: 16 x 2
# carrier name
# <chr> <chr>
# 1 9E Endeavor Air Inc.
# 2 AA American Airlines Inc.
# 3 AS Alaska Airlines Inc.
# 4 B6 JetBlue Airways
# 5 DL Delta Air Lines Inc.
# 6 EV ExpressJet Airlines Inc.
# 7 F9 Frontier Airlines Inc.
# 8 FL AirTran Airways Corporation
# 9 HA Hawaiian Airlines Inc.
# 10 MQ Envoy Air
# 11 OO SkyWest Airlines Inc.
# 12 UA United Air Lines Inc.
# 13 US US Airways Inc.
# 14 VX Virgin America
# 15 WN Southwest Airlines Co.
# 16 YV Mesa Airlines Inc.
Or am I not understanding the question?
I presume you are looking at the 'flights' table in that package, since it contains a carrier variable.
For a list of the unique values in a variable, you could use base unique function:
unique(nycflights13::flights$carrier)
[1] "UA" "AA" "B6" "DL" "EV" "MQ" "US" "WN" "VX" "FL" "AS" "9E" "F9" "HA" "YV" "OO"
or table to count the number of appearances:
table(nycflights13::flights$carrier)
9E AA AS B6 DL EV F9 FL HA MQ OO UA US VX WN YV
18460 32729 714 54635 48110 54173 685 3260 342 26397 32 58665 20536 5162 12275 601

Couldn't get tq_exchange() or stockSymbols() to work

I am trying to get stock symbols with these functions (both failed)
TTR::stockSymbols("AMEX")
Error in symbols[, sort.by] : incorrect number of dimensions
tidyquant::tq_exchange("AMEX")
Getting data...
Error: Can't rename columns that don't exist.
x Column Symbol doesn't exist.
Do these functions work for you? What fixes do you know to correct them? Thank you!
I get the same error. It seems like there has been some changes in the website from which these packages get the information. There is an open issue about this.
In the same thread it is mentioned that you can get the information from the underlying JSON which returns this information.
tmp <- jsonlite::fromJSON('https://api.nasdaq.com/api/screener/stocks?tableonly=true&limit=25&offset=0&exchange=AMEX&download=true')
head(tmp$data$rows)
# symbol name
#1 AAMC Altisource Asset Management Corp Com
#2 AAU Almaden Minerals Ltd. Common Shares
#3 ACU Acme United Corporation. Common Stock
#4 ACY AeroCentury Corp. Common Stock
#5 AE Adams Resources & Energy Inc. Common Stock
#6 AEF Aberdeen Emerging Markets Equity Income Fund Inc. Common Stock
# lastsale netchange pctchange volume marketCap country ipoyear
#1 $24.60 -0.3595 -1.44% 15183 40595215.00 United States
#2 $0.846 0.0359 4.432% 2272603 101984125.00 Canada 2015
#3 $33.82 0.61 1.837% 7869 112922038.00 United States 1988
#4 $11.76 2.01 20.615% 739133 18179596.00 United States
#5 $28.31 0.11 0.39% 6217 120099060.00 United States
#6 $9.10 0.09 0.999% 40775 461841180.00 United States
# industry sector
#1 Real Estate Finance
#2 Precious Metals Basic Industries
#3 Industrial Machinery/Components Capital Goods
#4 Diversified Commercial Services Technology
#5 Oil Refining/Marketing Energy
#6
# url
#1 /market-activity/stocks/aamc
#2 /market-activity/stocks/aau
#3 /market-activity/stocks/acu
#4 /market-activity/stocks/acy
#5 /market-activity/stocks/ae
#6 /market-activity/stocks/aef

Splitting the rows that has "|" using separate() fn not splitted

my data looks like
> company
name category_list
11 1-4 All Entertainment|Games|Software
12 1.618 Technology Networking|Real Estate|Web Hosting
13 1-800-DENTIST Health and Wellness
14 1-800-DOCTORS Health and Wellness
15 1-800-PublicRelations, Inc. Internet Marketing|Media|Public Relations
i will have to split the category_list column based the values. when the values are pipe separated, the row should be split.
i tried the same using separate function but the column is not populated with any values
c1 <- company %>% separate(category_list,into=c("primary_Sector"), sep="|")
Actual output:
name primary_Sector
11 1-4 All
12 1.618 Technology
13 1-800-DENTIST
14 1-800-DOCTORS
15 1-800-PublicRelations, Inc.
Expected output
name category_list
11 1-4 All Entertainment
12 1-4 All Games
13 1-4 All Software
can someone tell me what is wrong?
tidyr::separate() does the column-wise separation, tidyr::separate_rows() does the row-wise separation:
library(tidyr)
read.table(
text="name;category_list
1-4 All;Entertainment|Games|Software
1.618 Technology;Networking|Real Estate|Web Hosting
1-800-DENTIST;Health and Wellness
1-800-DOCTORS;Health and Wellness
1-800-PublicRelations, Inc.;Internet Marketing|Media|Public Relations",
sep=";", header = TRUE, stringsAsFactors = FALSE
) %>%
separate_rows(category_list, sep = "\\|")
## name category_list
## 1 1-4 All Entertainment
## 2 1-4 All Games
## 3 1-4 All Software
## 4 1.618 Technology Networking
## 5 1.618 Technology Real Estate
## 6 1.618 Technology Web Hosting
## 7 1-800-DENTIST Health and Wellness
## 8 1-800-DOCTORS Health and Wellness
## 9 1-800-PublicRelations, Inc. Internet Marketing
## 10 1-800-PublicRelations, Inc. Media
## 11 1-800-PublicRelations, Inc. Public Relations

Find which sector a stock belongs to

Given stocks in S&P500, how can I find which sector each stock belongs to, e.g. financial, energy ...., using R package, or other sources?
The term "sector" is, by itself, an ambiguous term. What one data provider calls "consumer services" may be called "restaurants" by another. That said, TTR provides a function called stockSymbols that returns some information including Sector, from NASDAQ, for ~6400 NMS stocks.
library(TTR)
ss <- stockSymbols()
#Fetching AMEX symbols...
#Fetching NASDAQ symbols...
#Fetching NYSE symbols...
head(ss)
# Symbol Name LastSale MarketCap IPOyear Sector Industry Exchange
#1 AA-P Alcoa Inc. 92.300 0 NA Capital Goods Metal Fabrications AMEX
#2 AAU Almaden Minerals, Ltd. 1.620 97228060 NA Basic Industries Precious Metals AMEX
#3 ACU Acme United Corporation. 12.984 40798351 1988 Capital Goods Industrial Machinery/Components AMEX
#4 ACY AeroCentury Corp. 20.280 31297252 NA Technology Diversified Commercial Services AMEX
#5 ADGE American DG Energy Inc. 1.720 83404061 NA Energy Electric Utilities: Central AMEX
#6 ADK Adcare Health Systems Inc 5.800 85018494 NA Health Care Hospital/Nursing Management AMEX
If you just want stocks that are in the S&P 500, you can cheat and use the holdings of SPY (or there are tons of places that you can find the holdings of the S&P 500, including the Standard & Poors website)
#install.packages("qmao", repos="http://r-forge.r-project.org")
library(qmao)
spyh <- getHoldings("SPY", auto.assign=FALSE)
head(ss[ss$Symbol %in% rownames(spyh), ])
# Symbol Name LastSale MarketCap IPOyear Sector
#455 AAPL Apple Inc. 452.97 425179837530 1980 Technology
#490 ADBE Adobe Systems Incorporated 44.02 22095230291 1986 Technology
#493 ADI Analog Devices, Inc. 46.79 14317018779 NA Technology
#495 ADP Automatic Data Processing, Inc. 70.03 33980125863 NA Technology
#500 ADSK Autodesk, Inc. 39.75 8896050000 NA Technology
#535 AKAM Akamai Technologies, Inc. 46.70 8333728621 1999 Miscellaneous
# Industry Exchange
#455 Computer Manufacturing NASDAQ
#490 Computer Software: Prepackaged Software NASDAQ
#493 Semiconductors NASDAQ
#495 EDP Services NASDAQ
#500 Computer Software: Prepackaged Software NASDAQ
#535 Business Services NASDAQ

Resources