Retrieving S-1 filings from EDGAR based on SIC using R

I am trying to analyse the S-1 filings of all Special Purpose Acquisition Companies (SIC=6770), but I am having trouble getting this data from SEC EDGAR efficiently. I have looked into the "edgar" and "edgarWebR" R packages but have yet to find a way of extracting the S-1 filings for such a large number of companies based only on their SIC code. I think if I could get the CIK codes of all the companies I'm looking for, I could work with the existing packages to get the information I need.
If anyone has experience working with EDGAR, which package did you find useful? How could I get the CIK codes for an entire industry?

This isn't a complete answer, but it's too long for a comment and should at least get you started.
With the caveat that I have no familiarity with R, you can start by using the EDGAR API. For example, to get an alphabetical list of all 237 Form S-1 filings made year to date by filers with SIC code 6770, you can use this link:
https://www.sec.gov/cgi-bin/srch-edgar?text=FORM-TYPE=S-1+and+ASSIGNED-SIC=6770+&first=2021&last=2021
What you do with this list once you get it is a different issue. I know what I would do with it using Python, but for R you'll need the help of someone more familiar with R-based tools.
Good luck - the task ahead isn't easy...
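As a rough starting point, here is a minimal R sketch of fetching and parsing that results page. It assumes the srch-edgar endpoint renders its hits as an HTML table (the exact column layout may differ), and it sets a User-Agent header, which the SEC asks automated clients to provide:
```r
# Hedged sketch: scrape the EDGAR search results into a data frame.
# Assumes the results render as an HTML table; adjust if the layout differs.
library(httr)
library(rvest)

url <- paste0(
  "https://www.sec.gov/cgi-bin/srch-edgar",
  "?text=FORM-TYPE=S-1+and+ASSIGNED-SIC=6770",
  "&first=2021&last=2021"
)

# The SEC asks automated clients to identify themselves via User-Agent.
resp <- GET(url, user_agent("your.name@example.com"))

page   <- read_html(content(resp, as = "text"))
tables <- page %>% html_nodes("table") %>% html_table(fill = TRUE)

filings <- tables[[1]]   # pick whichever table holds the filing list
head(filings)
```
From there you could pull the CIK column out of the parsed table and feed it to edgar or edgarWebR.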

Related

Hydrating Tweets

Hi everyone!
I'm doing research using COVID-19 tweets. I've downloaded some COVID-19 tweets from https://zenodo.org/record/3970127#.Xy12rChKiUk. However, the data only include the tweet IDs. Does anyone know how to hydrate the data in RStudio and get the JSON file with the text? It seems I can use the twarc package, but I'd like to do the whole process in the R environment, not in Python.
I realize this is a tad late, but here goes: twarc's package description mentions a similar package for R, which would answer OP's question.
"For R there is academictwitteR. Unlike twarc, it focuses solely on querying the Twitter Academic Research Product Track v2 API endpoint. Data gathered in twarc can be imported into R for analysis as a dataframe if you export the data into CSV using twarc-csv."
Here is the source.
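For completeness, a minimal sketch of the hydration step with academictwitteR, assuming your installed version provides the hydrate_tweets() helper and that you have an Academic Research bearer token configured (see set_bearer()); the input file name here is hypothetical:
```r
# Hedged sketch: hydrate tweet IDs with academictwitteR.
# Requires an Academic Research bearer token (see set_bearer()).
library(academictwitteR)

ids <- readLines("covid19_tweet_ids.txt")  # one tweet ID per line (hypothetical file)

tweets <- hydrate_tweets(ids)  # returns a data frame of tweet objects
head(tweets$text)
```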

R -Eye Tracking packages for SMI's BeGaze users

Currently, we are working on an eye-tracking study and we are not quite satisfied with the analysis options of SMI's BeGaze. Hence, I would like to ask if you know a good way to extract data from BeGaze that can be processed by a handy R package that still works under R 3.6.1 (unlike ETRAN).
It would be great to add AOIs manually, create heat maps, and analyze saccades, fixation times, and ratios between AOIs.
We came across eyetrackingR, but we are still struggling with extracting BeGaze's data in a processable way.
Any help, tutorial, hint, etc. is much appreciated.
David
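For anyone in the same situation, a minimal sketch of what eyetrackingR expects once you have samples exported from BeGaze as a delimited text file; the file name and column names below are hypothetical and need to be mapped to your own export settings:
```r
# Hedged sketch: load a BeGaze text export and shape it for eyetrackingR.
# File name and column names are hypothetical; adjust to your export.
library(eyetrackingR)

raw <- read.delim("begaze_export.txt")

data <- make_eyetrackingr_data(
  raw,
  participant_column = "Participant",
  trial_column       = "Trial",
  time_column        = "Time",
  trackloss_column   = "Trackloss",
  aoi_columns        = c("AOI1", "AOI2"),
  treat_non_aoi_looks_as_missing = TRUE
)
```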

How to find Sentiment Score and Magnitude (polarity) in R without using Google API?

I am trying to do sentiment analysis on my data. The data contain consumer survey open-ended question answers in multiple columns. I want a sentiment score and magnitude for each column in R. Currently I have tried using the Google API: I created the account and supplied all the required keys to gl_nlp() from the "googleLanguageR" package in R, but it throws an error saying billing is not enabled, even though I have checked in the Google Cloud account that billing is enabled.
So, how can I find sentiment score and magnitude (Polarity) in R without using Google API and with the same accuracy?
Here is a good approach using the tidytext package developed by Julia Silge and David Robinson. The package follows the tidy approach of the tidyverse. The linked book mentions three general-purpose lexicons:
AFINN from Finn Årup Nielsen,
bing from Bing Liu and collaborators,
and nrc from Saif Mohammad and Peter Turney.
As is also mentioned there, the get_sentiments() function allows you "to get specific sentiment lexicons without the columns that are not used in that lexicon."
Hope this answers your question; if not, let me know!
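A minimal sketch of that workflow, where the data frame df and its text column answer are made up for illustration:
```r
# Hedged sketch: per-response sentiment scores with tidytext and the
# bing lexicon. `df` and the column `answer` are hypothetical.
library(dplyr)
library(tidyr)
library(tidytext)

df <- tibble(id = 1:2,
             answer = c("I love this product",
                        "Terrible support, very slow"))

df %>%
  unnest_tokens(word, answer) %>%                      # one row per word
  inner_join(get_sentiments("bing"), by = "word") %>%  # attach pos/neg labels
  count(id, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(score = positive - negative)                  # net score per response
```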
If you already have the dataset extracted from the Google API, then just apply the syuzhet package. Documentation can be found here: https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html
Simply write data.frame(get_sentiment(df[, col])), where col is the number of your text column, and that should give you numerical sentiment scores.
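A quick hedged sketch of that one-liner on some made-up answers:
```r
# Hedged sketch: numeric sentiment scores with syuzhet.
library(syuzhet)

answers <- c("I love this product", "Terrible support, very slow")
scores  <- get_sentiment(answers, method = "syuzhet")
scores   # one numeric score per answer; higher = more positive
```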
I agree with Tito Sanz: the tidytext way is the best and most transparent.
The way it employs tidy methods is a good habit to get into.
You can also use the qdap package if you are working with English-language text. It will do what you want with polarity, but its results are difficult to justify (my opinion).
I also used the Google API. It's a dog to set up, and if you are using large datasets there are restrictions on passing more than 1 million characters per 24-hour period. Also, once you exceed the free credit they charge you a lot.
PM me if you have more specific questions on sentiment analysis.

R Package to Analyse Eye Tracking data

I was wondering if anyone out there has found a nice package for R to analyse eye-tracking data?
I came across eyetrackR but as far as I can tell there is no English support documentation available:
http://read.psych.uni-potsdam.de/pmr2/index.php?option=com_content&view=article&id=43:eyetrackr&catid=13:r-playground&Itemid=15
I will move on to other freeware that handles eye-tracking data if I need to, but I was really hoping there would be something accessible in R.
Ideas?
Cheers.
It would help if you could explain which kind of analyses you are intending to do. There are many different approaches depending on the research question and the research field. Many approaches involve the detection of fixations and saccades as a first step. An R package that can be used for fixation detection is called saccades and is available on CRAN. See also the Github page of the package for examples and screenshots.
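A minimal sketch with that package, using the demo dataset it ships with (samples, a data frame with the package's documented time, trial, x, y columns):
```r
# Hedged sketch: fixation detection with the saccades package (CRAN).
library(saccades)

data(samples)            # demo data shipped with the package: time, trial, x, y
head(samples)

fixations <- detect.fixations(samples)
head(fixations)          # one row per detected fixation, with start/end times
```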
A new eye-tracking analysis package for R (eyetrackingR) was recently released. It provides a variety of methods that handle data preparation/cleaning, visualization, and analysis.
Here's a list of several dozen instances of researcher contributed code (FOSS) for post-acquisition summarization and analysis of eye-movement data. You may be able to find something to suit your needs there.
The list is provided in case anyone stumbling across this thread finds it useful.
https://github.com/davebraze/FDBeye/wiki/Researcher-Contributed-Eye-Tracking-Tools

Basic analysis of large CSV with FF package in R

I have been messing around with R for the last year and now want to get a little deeper. I want to learn more about the ff and big-data packages, but I have had trouble getting through some of the documentation.
I like to learn by doing, so let's say I have a huge CSV called data.csv and it's 300 MB. It has five headers: Url, PR, tweets, likes, age. I want to deduplicate the list based on URLs. Then I want to plot PR and likes on a scatter plot to see if there is any correlation. How would I go about doing that basic analysis?
I always get confused by the chunking of big-data processes and how you can't load everything in at once.
What are some common problems you have run into using the ff package or big data?
Is there another package that works better?
Basically any information to get started using a lot of data in R would be useful.
Thanks!
Nico
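For what it's worth, a hedged sketch of one way to do this, assuming the column names from the question. Note that 300 MB usually fits comfortably in memory, so this uses data.table (a deliberately swapped-in alternative to ff chunking) rather than ff itself:
```r
# Hedged sketch: deduplicate on Url and check the PR/likes relationship.
# data.table is used instead of ff, since 300 MB typically fits in RAM.
library(data.table)

dt <- fread("data.csv")            # columns assumed: Url, PR, tweets, likes, age

dt <- unique(dt, by = "Url")       # drop rows with duplicate URLs

plot(dt$PR, dt$likes, xlab = "PR", ylab = "likes",
     main = "PR vs. likes")
cor(dt$PR, dt$likes, use = "complete.obs")
```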
