Hi everyone!
I'm doing research using COVID-19 tweets. I've downloaded some COVID-19-related tweets from https://zenodo.org/record/3970127#.Xy12rChKiUk. However, the data only includes the Twitter IDs. Does anyone know how to hydrate the data in RStudio and get the JSON files with the tweet text? It seems I can use the twarc package, but I'd like to do the whole process in the R environment, not in Python.
I realize this is a tad late, but here goes: twarc's package description mentions a similar package for R, which would answer OP's question.
"For R there is academictwitteR. Unlike twarc, it focuses solely on querying the Twitter Academic Research Product Track v2 API endpoint. Data gathered in twarc can be imported into R for analysis as a dataframe if you export the data into CSV using twarc-csv."
Here is the source.
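For what it's worth, a minimal sketch of hydrating the IDs with academictwitteR might look like the following. This assumes a recent version of the package (which provides hydrate_tweets()) plus Academic Research API access, and the file and directory names are placeholders.

library(academictwitteR)

# set_bearer()  # one-time setup: stores your bearer token in .Renviron

# Read the tweet IDs as character strings (they are too large for R's numeric type)
ids <- readLines("covid_tweet_ids.txt")

tweets <- hydrate_tweets(
  ids,
  data_path   = "tweet_json/",  # the raw JSON responses are written here
  bind_tweets = TRUE            # also return the tweets bound into a data frame
)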
I am trying to analyse the S-1 filings of all Special Purpose Acquisition Companies (SIC = 6770), but I am having trouble finding an efficient way of getting this data from SEC EDGAR. I have looked into the "edgar" and "edgarWebR" R packages but have yet to find a way of extracting the S-1 filings for such a large number of companies based only on their SIC code. I think if I could get the CIK codes of all the companies I'm looking for, I could work with the existing packages to get the information I need.
If anyone has experience working with EDGAR, which package did you find useful? And how could I get the CIK codes for an entire industry?
This isn't a complete answer, but it's too long for a comment and at least will get you started.
With the caveat that I have no familiarity with R, you can start by using the EDGAR API. For example, to get an alphabetical list of all 237 Form S-1 filings made year-to-date by filers with SIC code 6770, you can use this link:
https://www.sec.gov/cgi-bin/srch-edgar?text=FORM-TYPE=S-1+and+ASSIGNED-SIC=6770+&first=2021&last=2021
What you do with this list once you have it is a different issue. I know what I would do with it using Python, but for R you'll need the help of someone more familiar with R-based tools.
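That said, here is a rough, untested sketch of pulling that results page into R with httr and rvest (both the package choice and the table layout are my assumptions, so inspect the result and adjust):

library(httr)
library(rvest)

url <- paste0(
  "https://www.sec.gov/cgi-bin/srch-edgar?",
  "text=FORM-TYPE=S-1+and+ASSIGNED-SIC=6770+&first=2021&last=2021"
)

# SEC asks for a descriptive User-Agent; put your own contact details here
resp <- GET(url, user_agent("Your Name your.email@example.com"))

page   <- read_html(content(resp, as = "text"))
tables <- html_table(page)

# Inspect the tables to find the filings list, then pull out the CIK / company columns
str(tables, max.level = 1)

From there you could feed the CIK codes into the edgar or edgarWebR packages.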
Good luck - the task ahead isn't easy...
I have large TSV files containing the tweet IDs of millions of tweets whose content I would like to analyze in R. How do I get the metadata of the tweets (message, user, date, etc.) into a dataset without looking up every individual tweet?
I know this is possible in Python, but is it also possible in R, since I do not know Python well? Is there an R package for this purpose?
If you use the rtweet package (which is usually preferred over twitteR, as the latter is no longer maintained), you can use the lookup_statuses() function to get the metadata for large batches of tweets.
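For example, a minimal sketch might be the following; the file name, column position, chunk size and authentication step are assumptions on my part.

library(rtweet)
library(readr)

# Keep the IDs as character: tweet IDs overflow R's numeric type
ids <- read_tsv("tweet_ids.tsv", col_types = cols(.default = "c"))[[1]]

# Look the statuses up in batches to stay within the API rate limits
chunks <- split(ids, ceiling(seq_along(ids) / 90000))
tweets <- do.call(rbind, lapply(chunks, lookup_statuses))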
I am new to R and have just started to use it. I am currently experimenting with the quantmod, rugarch and rmgarch packages.
In particular, I'm using the last package to run a multivariate portfolio analysis for the European markets. For this I need to download the 3-month German treasury bill series to use as the risk-free rate. However, as far as I know, I can't download that series from the Yahoo, Google or FRED databases, so I have downloaded it from investing.com and want to load it into R.
The thing is, my data is different from what the getSymbols() function returns from Yahoo: in this case I only have two columns, a date column and a closing-price column. To sum up, the question is: is there any way to load this type of data into R for rmgarch purposes?
Thanks in advance!
Not sure if this is the issue, but this is how you might go about reading the data from a CSV file:
data <- read.csv(file = "file/path/data.csv")
head(data)  # take a look at your data

# If you only want a single column, replace ColumnName with the actual column name
data_only <- data$ColumnName
It looks like the input data for rugarch needs to be an xts object, so you might want to take a look at that. You might also want to take a look at ?read.csv.
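If your file really is just a date column and a closing-price column, a rough sketch of turning it into an xts object could look like this (the column names and date format are assumptions you would adapt to your file):

library(xts)

bills <- read.csv("file/path/data.csv")
bills$Date <- as.Date(bills$Date, format = "%b %d, %Y")  # adjust the format to your file

bills_xts <- xts(bills$Close, order.by = bills$Date)
colnames(bills_xts) <- "DE3M"
head(bills_xts)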
I am trying to do sentiment analysis on my data. The data contains answers to open-ended consumer survey questions in multiple columns, and I want a sentiment score and magnitude for each column in R. So far I have tried the Google API: I created an account and supplied all the required keys to gl_nlp() from the "googleLanguageR" package, but it throws an error saying billing is not enabled, even though I have checked in the Google Cloud account that billing is enabled.
So, how can I get sentiment score and magnitude (polarity) in R without using the Google API, and with comparable accuracy?
Here is a good approach using the tidytext package developed by Julia Silge and David Robinson. The package follows the tidy approach of the tidyverse. The linked book mentions:
The three general-purpose lexicons are
AFINN from Finn Årup Nielsen,
bing from Bing Liu and collaborators,
and nrc from Saif Mohammad and Peter Turney.
As it also mentions, the get_sentiments() function allows you "to get specific sentiment lexicons without the columns that are not used in that lexicon."
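To make that concrete, a minimal sketch with the AFINN lexicon might look like the following, assuming your answers sit in a data frame called survey with one question per column (all names here are placeholders):

library(dplyr)
library(tidyr)
library(tidytext)

scores <- survey %>%
  mutate(respondent = row_number()) %>%
  pivot_longer(-respondent, names_to = "question", values_to = "answer") %>%
  unnest_tokens(word, answer) %>%
  # may prompt a one-time download of the lexicon via the textdata package
  inner_join(get_sentiments("afinn"), by = "word") %>%
  group_by(respondent, question) %>%
  summarise(sentiment = sum(value), .groups = "drop")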
Hope this answered your question; if not, let me know!
If you already have the dataset extracted from the Google API, then just apply the syuzhet package. Documentation can be found here: https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html
Simply write data.frame(get_sentiment(df[, i])), replacing i with the column number, and that should give you numerical sentiment scores.
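To score several columns at once, a short sketch (assuming df holds one open-ended question per column) could be:

library(syuzhet)

# One sentiment score per response, per column, using the default syuzhet lexicon
scores <- as.data.frame(lapply(df, function(col) get_sentiment(as.character(col))))
head(scores)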
I agree with Tito Sanz - the tidytext way is the best and most transparent.
The way it employs 'tidy methods' is a good habit to get into.
You can also use the 'qdap' package if you are working with English-language text. It will do what you want with polarity, but it's difficult to justify (my opinion).
I also used the Google API - it's a dog to set up, and if you are using large datasets there are restrictions on passing more than 1 million characters per 24-hour period. Also, once you exceed the free credit, they charge you a lot.
PM me if you have more specific questions on sentiment analysis.
I have been messing around with R for the last year and now want to go a little deeper. I want to learn more about the ff and other big-data packages, but I have been having trouble getting through some of the documentation.
I like to learn by doing, so let's say I have a huge CSV called data.csv and it's 300 MB. It has 5 columns: Url, PR, tweets, likes, age. I want to deduplicate the rows based on URL. Then I want to plot PR and likes on a scatter plot to see if there is any correlation. How would I go about doing that basic analysis?
I always get confused by the chunking in big-data workflows and how you can't load everything in at once.
What are some common problems you have run into using the ff package or big data in general?
Is there another package that works better?
Basically, any information on getting started with large amounts of data in R would be useful.
Thanks!
Nico
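For reference, a hedged sketch of that workflow using data.table rather than ff (a swap on my part; a 300 MB file usually fits in memory, and the column names come from the question) might look like this:

library(data.table)

dt <- fread("data.csv")

# Deduplicate on the Url column, keeping the first occurrence of each URL
dt_unique <- unique(dt, by = "Url")

# Scatter plot of PR against likes, plus a quick correlation check
plot(dt_unique$PR, dt_unique$likes, xlab = "PR", ylab = "likes")
cor(dt_unique$PR, dt_unique$likes, use = "complete.obs")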