Couldn't get tq_exchange() or stockSymbols() to work in R

I am trying to get stock symbols with these functions (both failed)
TTR::stockSymbols("AMEX")
Error in symbols[, sort.by] : incorrect number of dimensions
tidyquant::tq_exchange("AMEX")
Getting data...
Error: Can't rename columns that don't exist.
x Column Symbol doesn't exist.
Do these functions work for you? What fixes do you know to correct them? Thank you!

I get the same error. It seems the website these packages pull their information from has changed. There is an open issue about this.
In the same thread it is mentioned that you can get the information directly from the underlying JSON endpoint:
tmp <- jsonlite::fromJSON('https://api.nasdaq.com/api/screener/stocks?tableonly=true&limit=25&offset=0&exchange=AMEX&download=true')
head(tmp$data$rows)
# symbol name
#1 AAMC Altisource Asset Management Corp Com
#2 AAU Almaden Minerals Ltd. Common Shares
#3 ACU Acme United Corporation. Common Stock
#4 ACY AeroCentury Corp. Common Stock
#5 AE Adams Resources & Energy Inc. Common Stock
#6 AEF Aberdeen Emerging Markets Equity Income Fund Inc. Common Stock
# lastsale netchange pctchange volume marketCap country ipoyear
#1 $24.60 -0.3595 -1.44% 15183 40595215.00 United States
#2 $0.846 0.0359 4.432% 2272603 101984125.00 Canada 2015
#3 $33.82 0.61 1.837% 7869 112922038.00 United States 1988
#4 $11.76 2.01 20.615% 739133 18179596.00 United States
#5 $28.31 0.11 0.39% 6217 120099060.00 United States
#6 $9.10 0.09 0.999% 40775 461841180.00 United States
# industry sector
#1 Real Estate Finance
#2 Precious Metals Basic Industries
#3 Industrial Machinery/Components Capital Goods
#4 Diversified Commercial Services Technology
#5 Oil Refining/Marketing Energy
#6
# url
#1 /market-activity/stocks/aamc
#2 /market-activity/stocks/aau
#3 /market-activity/stocks/acu
#4 /market-activity/stocks/acy
#5 /market-activity/stocks/ae
#6 /market-activity/stocks/aef
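The output above suggests that price and percentage fields arrive as formatted strings ("$24.60", "-1.44%"). A minimal sketch of converting them to numbers, assuming every column comes back as character; the small data frame below is hand-copied from the printed rows rather than fetched, and to_number is a hypothetical helper name:

```r
# Three rows copied from the output above (assumed to be character columns)
rows <- data.frame(
  symbol    = c("AAMC", "AAU", "ACU"),
  lastsale  = c("$24.60", "$0.846", "$33.82"),
  pctchange = c("-1.44%", "4.432%", "1.837%"),
  marketCap = c("40595215.00", "101984125.00", "112922038.00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop "$", "," and "%" before converting to numeric
to_number <- function(x) as.numeric(gsub("[$,%]", "", x))

num_cols <- c("lastsale", "pctchange", "marketCap")
rows[num_cols] <- lapply(rows[num_cols], to_number)

str(rows)
```

The same helper could be applied to tmp$data$rows if the live response has the same shape.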


Using spacyr for named entity recognition - inconsistent results

I plan to use the spacyr R library to perform named entity recognition across several news articles (spacyr is an R wrapper for the Python spaCy package). My goal is to identify partners for network analysis automatically. However, spacyr is not recognising common entities as expected. Here is sample code to illustrate my issue:
library(quanteda)
library(spacyr)
text <- data.frame(doc_id = c(1:5),
sentence = c("Brightmark LLC, the global waste solutions provider, and Florida Keys National Marine Sanctuary (FKNMS), today announced a new plastic recycling partnership that will reduce landfill waste and amplify concerns about ocean plastics.",
"Brightmark is launching a nationwide site search for U.S. locations suitable for its next set of advanced recycling facilities, which will convert hundreds of thousands of tons of post-consumer plastics into new products, including fuels, wax, and other products.",
"Brightmark will be constructing the facility in partnership with the NSW government, as part of its commitment to drive economic growth and prosperity in regional NSW.",
"Macon-Bibb County, the Macon-Bibb County Industrial Authority, and Brightmark have mutually agreed to end discussions around building a plastic recycling plant in Macon",
"Global petrochemical company SK Global Chemical and waste solutions provider Brightmark have signed a memorandum of understanding to create a partnership that aims to take the lead in the circular economy of plastic by construction of a commercial scale plastics renewal plant in South Korea"))
corpus <- corpus(text, text_field = "sentence")
spacy_initialize(model = "en_core_web_sm")
parsed <- spacy_parse(corpus)
entity <- entity_extract(parsed)
I expect the company "Brightmark" to be recognised in all 5 sentences. However this is what I get:
entity
doc_id sentence_id entity entity_type
1 1 1 Florida_Keys_National_Marine_Sanctuary ORG
2 1 1 FKNMS ORG
3 2 1 U.S. GPE
4 3 1 NSW ORG
5 4 1 Macon_-_Bibb_County ORG
6 4 1 Brightmark ORG
7 4 1 Macon GPE
8 5 1 SK_Global_Chemical ORG
9 5 1 South_Korea GPE
"Brightmark" only appears as an ORG entity type in the 4th sentence (doc_id refers to sentence number). It should show up in all the sentences. The "NSW Government" does not appear at all.
I am still figuring out spaCy and spacyr. Perhaps someone can advise me why this is happening and what steps I should take to remedy this issue. Thanks in advance.
I changed the model and achieved better results:
spacy_initialize(model = "en_core_web_trf")
parsed <- spacy_parse(corpus)
entity <- entity_extract(parsed)
entity
doc_id sentence_id entity entity_type
1 1 1 Brightmark_LLC ORG
2 1 1 Florida_Keys GPE
3 1 1 FKNMS ORG
4 2 1 Brightmark ORG
5 2 1 U.S. GPE
6 3 1 Brightmark ORG
7 3 1 NSW GPE
8 3 1 NSW GPE
9 4 1 Macon_-_Bibb_County GPE
10 4 1 the_Macon_-_Bibb_County_Industrial_Authority ORG
11 4 1 Brightmark ORG
12 4 1 Macon GPE
13 5 1 SK_Global_Chemical ORG
14 5 1 Brightmark ORG
15 5 1 South_Korea GPE
The only downside is that NSW Government and Florida Keys National Marine Sanctuary are not resolved. I also get this warning: UserWarning: User provided device_type of 'cuda', but CUDA is not available.
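Since the stated goal is network analysis, here is a rough sketch of turning an entity table like the one above into a partner edge list. It uses a hand-built data frame copied from the en_core_web_trf output instead of calling spacyr, and it treats every entity that shares a document with "Brightmark" as a candidate partner, which is an assumption and not something spacyr does for you:

```r
# Entity table copied from the en_core_web_trf output above
entity <- data.frame(
  doc_id = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5),
  entity = c("Brightmark_LLC", "Florida_Keys", "FKNMS", "Brightmark", "U.S.",
             "Brightmark", "NSW", "NSW", "Macon_-_Bibb_County",
             "the_Macon_-_Bibb_County_Industrial_Authority", "Brightmark",
             "Macon", "SK_Global_Chemical", "Brightmark", "South_Korea"),
  entity_type = c("ORG", "GPE", "ORG", "ORG", "GPE", "ORG", "GPE", "GPE",
                  "GPE", "ORG", "ORG", "GPE", "ORG", "ORG", "GPE"),
  stringsAsFactors = FALSE
)

# Rows that refer to the focal company (covers "Brightmark" and "Brightmark_LLC")
focal <- grepl("^Brightmark", entity$entity)
docs_with_focal <- unique(entity$doc_id[focal])

# Every other entity in those documents is a candidate partner; drop duplicates
partners <- unique(entity[!focal & entity$doc_id %in% docs_with_focal, ])

edges <- data.frame(
  from   = "Brightmark",
  to     = partners$entity,
  type   = partners$entity_type,
  doc_id = partners$doc_id,
  stringsAsFactors = FALSE
)
edges
```

Filtering edges on type == "ORG" would keep only organisations, at the cost of dropping places such as NSW that the model tagged as GPE.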

Need help pulling JSON data with RSocrata from a website API

I need help drafting code that pulls public data directly from a website that is in Socrata format. Here is a link:
https://data.cityofchicago.org/Administration-Finance/Current-Employee-Names-Salaries-and-Position-Title/xzkq-xp2w
There is an API endpoint:
https://data.cityofchicago.org/resource/xzkq-xp2w.json
After the data is loaded, null values in the "Annual Salary" column should be replaced with 50000.
We can use the RSocrata package:
library(RSocrata)
url <- "https://data.cityofchicago.org/resource/xzkq-xp2w.json"
data <- RSocrata::read.socrata(url)
head(data)
# name job_titles department full_or_part_time salary_or_hourly annual_salary typical_hours hourly_rate
#1 AARON, JEFFERY M SERGEANT POLICE F Salary 111444 <NA> <NA>
#2 AARON, KARINA POLICE OFFICER (ASSIGNED AS DETECTIVE) POLICE F Salary 94122 <NA> <NA>
#3 AARON, KIMBERLEI R CHIEF CONTRACT EXPEDITER DAIS F Salary 118608 <NA> <NA>
#4 ABAD JR, VICENTE M CIVIL ENGINEER IV WATER MGMNT F Salary 117072 <NA> <NA>
#5 ABARCA, FRANCES J POLICE OFFICER POLICE F Salary 48078 <NA> <NA>
The following will replace the NAs in annual_salary with 50000.
data[is.na(data$annual_salary),"annual_salary"] <- 50000
However, if you'd like to do what the City of Chicago website suggests, you could consider multiplying typical_hours by hourly_rate to estimate the salary.
ind <- is.na(data$annual_salary)
data[ind,]$annual_salary <- as.numeric(data[ind,]$typical_hours) * as.numeric(data[ind,]$hourly_rate) * 52
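The two approaches above can also be combined: estimate the salary from hours and rate where possible, then fall back to 50000 for anything still missing. A small sketch on a made-up three-row data frame (the names and values are hypothetical), assuming the numeric fields arrive as character, as the as.numeric() calls above imply:

```r
# Hypothetical rows mimicking the shape of the Socrata result
data <- data.frame(
  name             = c("EMPLOYEE A", "EMPLOYEE B", "EMPLOYEE C"),
  salary_or_hourly = c("Salary", "Hourly", "Hourly"),
  annual_salary    = c("111444", NA, NA),
  typical_hours    = c(NA, "40", NA),
  hourly_rate      = c(NA, "25.00", NA),
  stringsAsFactors = FALSE
)

# Convert the numeric fields once, up front
num_cols <- c("annual_salary", "typical_hours", "hourly_rate")
data[num_cols] <- lapply(data[num_cols], as.numeric)

# Step 1: estimate from weekly hours * hourly rate * 52 weeks where salary is missing
estimate <- data$typical_hours * data$hourly_rate * 52
missing  <- is.na(data$annual_salary)
data$annual_salary[missing] <- estimate[missing]

# Step 2: anything still missing gets the flat default
data$annual_salary[is.na(data$annual_salary)] <- 50000

data$annual_salary
```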

Convert List to Tibble Plus Add Column With List Names

I'm working on a web scraping / mapping project where I've scraped address data from a restaurant website and I've stored the results as a list - in this example, called loc_list.
The question is how best to convert these list items into a single data.frame / tibble (I'm currently using bind_rows()) and ALSO have, in the new data.frame, a column titled metro that holds the name of the list item each row came from. In my example, the output would have 3 alpharetta rows, followed by 3 atlanta rows, then 1 buford row.
loc_list
$alpharetta
# A tibble: 3 x 2
names address
<chr> <chr>
1 East Roswell US 2630 Holcomb Bridge Rd Alpharetta, GA 30022
2 Old Milton US 4305 Old Milton Parkway Ste 101 Alpharetta, GA 30022
3 Windward US 875 N Main Street Ste 306 Alpharetta, GA 30009
$atlanta
# A tibble: 3 x 2
names address
<chr> <chr>
1 Philips Arena US 100 Techwood Drive Atlanta, GA 30303
2 Virginia Highlands US 1006 N Highland Ave Atlanta, GA 30306
3 Perimeter US 1211 Ashford Crossing Atlanta, GA 30346
$buford
# A tibble: 1 x 2
names address
<chr> <chr>
1 Woodward US 3250 Woodward Crossing Blvd Buford, GA 30519
Targeted output:
names address metro
East Ros... US 2630... alpharetta
As alistaire pointed out, bind_rows with .id is enough. Here is example data:
library(dplyr)
alpharetta <- tibble(names=c("East Roswell", "Old Milton"),
address = c("US 2630 Holcomb Bridge Rd Alpharetta, GA 30022", "4305 Old Milton Parkway Ste 101 Alpharetta, GA 30022"))
atlanta <- tibble(names=c("Philips Arena", "Virginia Highlands"),
address = c("US 100 Techwood Drive Atlanta, GA 30303", "US 1006 N Highland Ave Atlanta, GA 30306"))
loc_list <- list(alpharetta = alpharetta, atlanta = atlanta)
bind_rows(loc_list, .id="metro")
# A tibble: 4 x 3
metro names address
<chr> <chr> <chr>
1 alpharetta East Roswell US 2630 Holcomb Bridge Rd Alpharetta, GA 30022
2 alpharetta Old Milton 4305 Old Milton Parkway Ste 101 Alpharetta, GA 30022
3 atlanta Philips Arena US 100 Techwood Drive Atlanta, GA 30303
4 atlanta Virginia Highlands US 1006 N Highland Ave Atlanta, GA 30306
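If dplyr is not available, roughly the same result can be had in base R. This is only a sketch of an alternative, not what the answer above uses: it tags each list element with its name via Map() and stacks the pieces with rbind(), using plain data frames in place of tibbles:

```r
# Same example data as above, as plain data frames
loc_list <- list(
  alpharetta = data.frame(
    names   = c("East Roswell", "Old Milton"),
    address = c("US 2630 Holcomb Bridge Rd Alpharetta, GA 30022",
                "4305 Old Milton Parkway Ste 101 Alpharetta, GA 30022"),
    stringsAsFactors = FALSE
  ),
  atlanta = data.frame(
    names   = c("Philips Arena", "Virginia Highlands"),
    address = c("US 100 Techwood Drive Atlanta, GA 30303",
                "US 1006 N Highland Ave Atlanta, GA 30306"),
    stringsAsFactors = FALSE
  )
)

# Prepend a metro column holding the list name, then stack all pieces
tagged <- Map(
  function(df, nm) data.frame(metro = nm, df, stringsAsFactors = FALSE),
  loc_list, names(loc_list)
)
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL  # drop the "alpharetta.1"-style row names

combined
```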

Programmatically look up a ticker symbol in R

I have a field of data containing company names, such as
company <- c("Microsoft", "Apple", "Cloudera", "Ford")
> company
Company
1 Microsoft
2 Apple
3 Cloudera
4 Ford
and so on.
The package tm.plugin.webmining allows you to query data from Yahoo! Finance based on ticker symbols:
require(tm.plugin.webmining)
results <- WebCorpus(YahooFinanceSource("MSFT"))
I'm missing the in-between step. How can I query ticker symbols programmatically based on company names?
I couldn't manage to do this with the tm.plugin.webmining package, but I came up with a rough solution: pulling and parsing data from this web file: ftp://ftp.nasdaqtrader.com/SymbolDirectory/nasdaqlisted.txt. I say rough because my original calls with httr::content(httr::GET(...)) didn't work every time. I think it has to do with the type of web address (ftp://), but I don't do much web scraping, so I can't really explain it. It seemed to work better on my Linux machine than on my Mac, but that could be irrelevant. Thanks to @thelatemail's comment, reading the file directly seems to work much more smoothly:
library(quantmod) ## optional
symbolData <- read.csv(
"ftp://ftp.nasdaqtrader.com/SymbolDirectory/nasdaqlisted.txt",
sep="|")
##
> head(symbolData,10)
Symbol Security.Name Market.Category Test.Issue Financial.Status Round.Lot.Size
1 AAIT iShares MSCI All Country Asia Information Technology Index Fund G N N 100
2 AAL American Airlines Group, Inc. - Common Stock Q N N 100
3 AAME Atlantic American Corporation - Common Stock G N N 100
4 AAOI Applied Optoelectronics, Inc. - Common Stock G N N 100
5 AAON AAON, Inc. - Common Stock Q N N 100
6 AAPL Apple Inc. - Common Stock Q N N 100
7 AAVL Avalanche Biotechnologies, Inc. - Common Stock G N N 100
8 AAWW Atlas Air Worldwide Holdings - Common Stock Q N N 100
9 AAXJ iShares MSCI All Country Asia ex Japan Index Fund G N N 100
10 ABAC Aoxin Tianli Group, Inc. - Common Shares S N N 100
Edit:
As per @GSee's suggestion, a (presumably) more robust way to obtain the source data is with the stockSymbols() function in the package TTR:
> symbolData2 <- stockSymbols(exchange="NASDAQ")
Fetching NASDAQ symbols...
> ##
> head(symbolData2)
Symbol Name LastSale MarketCap IPOyear Sector
1 AAIT iShares MSCI All Country Asia Information Technology Index Fun 34.556 6911200 NA <NA>
2 AAL American Airlines Group, Inc. 40.500 29164164453 NA Transportation
3 AAME Atlantic American Corporation 4.020 83238028 NA Finance
4 AAOI Applied Optoelectronics, Inc. 20.510 303653114 2013 Technology
5 AAON AAON, Inc. 18.420 1013324613 NA Capital Goods
6 AAPL Apple Inc. 103.300 618546661100 1980 Technology
Industry Exchange
1 <NA> NASDAQ
2 Air Freight/Delivery Services NASDAQ
3 Life Insurance NASDAQ
4 Semiconductors NASDAQ
5 Industrial Machinery/Components NASDAQ
6 Computer Manufacturing NASDAQ
I don't know if you just wanted to get ticker symbols from names, but if you are also looking for actual share price information you could do something like this:
namedStock <- function(name="Microsoft",
start=Sys.Date()-365,
end=Sys.Date()-1){
ticker <- symbolData[agrep(name,symbolData[,2]),1]
getSymbols(
Symbols=ticker,
src="yahoo",
env=.GlobalEnv,
from=start,to=end)
}
##
## an xts object named MSFT will be added to
## the global environment, no need to assign
## to an object
namedStock()
##
> str(MSFT)
An ‘xts’ object on 2013-09-03/2014-08-29 containing:
Data: num [1:251, 1:6] 31.8 31.4 31.1 31.3 31.2 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:6] "MSFT.Open" "MSFT.High" "MSFT.Low" "MSFT.Close" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 2
$ src : chr "yahoo"
$ updated: POSIXct[1:1], format: "2014-09-02 21:51:22.792"
> chartSeries(MSFT)
So like I said, this isn't the cleanest solution but hopefully it helps you out. Also note that my data source was pulling companies traded on NASDAQ (which is most major companies), but you could easily combine this with other sources.
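For the original question (names in, tickers out), a vectorised lookup might look like the sketch below. It deliberately swaps agrep() for a case-insensitive exact substring match, because fuzzy matching can pair "Apple" with "Applied Optoelectronics"; lookup_ticker is a hypothetical helper name, and the small symbolData frame is copied from the listing above rather than downloaded:

```r
# A few rows copied from the nasdaqlisted.txt output above
symbolData <- data.frame(
  Symbol = c("AAL", "AAME", "AAOI", "AAPL"),
  Security.Name = c("American Airlines Group, Inc. - Common Stock",
                    "Atlantic American Corporation - Common Stock",
                    "Applied Optoelectronics, Inc. - Common Stock",
                    "Apple Inc. - Common Stock"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one output row per (company, matching symbol);
# companies with no match get NA
lookup_ticker <- function(companies, symbols) {
  rows <- lapply(companies, function(nm) {
    hit <- grepl(tolower(nm), tolower(symbols$Security.Name), fixed = TRUE)
    data.frame(
      company = nm,
      Symbol  = if (any(hit)) symbols$Symbol[hit] else NA_character_,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

res <- lookup_ticker(c("Apple", "Cloudera", "American"), symbolData)
res
```

Ambiguous names such as "American" return several rows, so some manual review of the matches is still needed before passing tickers on to getSymbols().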

Find which sector a stock belongs to

Given the stocks in the S&P 500, how can I find which sector each stock belongs to (e.g. financial, energy, ...) using an R package or other sources?
The term "sector" is, by itself, an ambiguous term. What one data provider calls "consumer services" may be called "restaurants" by another. That said, TTR provides a function called stockSymbols that returns some information including Sector, from NASDAQ, for ~6400 NMS stocks.
library(TTR)
ss <- stockSymbols()
#Fetching AMEX symbols...
#Fetching NASDAQ symbols...
#Fetching NYSE symbols...
head(ss)
# Symbol Name LastSale MarketCap IPOyear Sector Industry Exchange
#1 AA-P Alcoa Inc. 92.300 0 NA Capital Goods Metal Fabrications AMEX
#2 AAU Almaden Minerals, Ltd. 1.620 97228060 NA Basic Industries Precious Metals AMEX
#3 ACU Acme United Corporation. 12.984 40798351 1988 Capital Goods Industrial Machinery/Components AMEX
#4 ACY AeroCentury Corp. 20.280 31297252 NA Technology Diversified Commercial Services AMEX
#5 ADGE American DG Energy Inc. 1.720 83404061 NA Energy Electric Utilities: Central AMEX
#6 ADK Adcare Health Systems Inc 5.800 85018494 NA Health Care Hospital/Nursing Management AMEX
If you just want stocks that are in the S&P 500, you can cheat and use the holdings of SPY (there are also plenty of other places to find the holdings of the S&P 500, including the Standard & Poor's website).
#install.packages("qmao", repos="http://r-forge.r-project.org")
library(qmao)
spyh <- getHoldings("SPY", auto.assign=FALSE)
head(ss[ss$Symbol %in% rownames(spyh), ])
# Symbol Name LastSale MarketCap IPOyear Sector
#455 AAPL Apple Inc. 452.97 425179837530 1980 Technology
#490 ADBE Adobe Systems Incorporated 44.02 22095230291 1986 Technology
#493 ADI Analog Devices, Inc. 46.79 14317018779 NA Technology
#495 ADP Automatic Data Processing, Inc. 70.03 33980125863 NA Technology
#500 ADSK Autodesk, Inc. 39.75 8896050000 NA Technology
#535 AKAM Akamai Technologies, Inc. 46.70 8333728621 1999 Miscellaneous
# Industry Exchange
#455 Computer Manufacturing NASDAQ
#490 Computer Software: Prepackaged Software NASDAQ
#493 Semiconductors NASDAQ
#495 EDP Services NASDAQ
#500 Computer Software: Prepackaged Software NASDAQ
#535 Business Services NASDAQ
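Once a table like ss is in hand, mapping an arbitrary vector of tickers to sectors is a match() lookup. A small sketch, using a hand-built stand-in for ss copied from the rows above; sector_of is a hypothetical helper name and "XYZ" is a made-up ticker included to show the no-match case:

```r
# Stand-in for the stockSymbols() result, copied from the output above
ss <- data.frame(
  Symbol   = c("AAPL", "ADBE", "ADI", "AKAM"),
  Sector   = c("Technology", "Technology", "Technology", "Miscellaneous"),
  Industry = c("Computer Manufacturing",
               "Computer Software: Prepackaged Software",
               "Semiconductors",
               "Business Services"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sector for each ticker, NA when the ticker is unknown
sector_of <- function(tickers, symbols) {
  out <- symbols$Sector[match(tickers, symbols$Symbol)]
  names(out) <- tickers
  out
}

sectors <- sector_of(c("AKAM", "AAPL", "XYZ"), ss)
sectors

# Number of stocks per sector
sector_counts <- table(ss$Sector)
sector_counts
```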
