rtweet giving error in rbind when collecting large numbers of tweets

I'm using the rtweet package in R to pull tweets for data analysis.
When I run the following line of code requesting 18,000 tweets, everything works fine:
t <- search_tweets("at", n=18000, lang='en', geocode='-25.609139,134.361949,3500km', since='2017-08-01', type='recent', retryonratelimit=FALSE)
But when I try to extend this to 100,000 tweets, I get an error message:
t <- search_tweets("at", n=100000, lang='en', geocode='-25.609139,134.361949,3500km', since='2017-08-01', type='recent', retryonratelimit=TRUE)
Finished collecting tweets!
Error in rbind(deparse.level, ...) :
invalid list argument: all variables should have the same length
Why is this occurring and how do I solve this? Thanks

I suggest updating to the dev version of rtweet. It fixed this issue for me.
devtools::install_github("mkearney/rtweet")
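After installing the development version, restart R and reload the package; the call from the question should then complete. A sketch reusing the question's own parameters:
library(rtweet)
# retryonratelimit = TRUE lets the request span several rate-limit windows,
# which a 100,000-tweet pull will need
t <- search_tweets("at", n = 100000, lang = "en",
                   geocode = "-25.609139,134.361949,3500km",
                   since = "2017-08-01", type = "recent",
                   retryonratelimit = TRUE)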

Related

Quantmod/getOptionChain Error in .Date(Exp/86400) : could not find function ".Date"

I'm getting an error when running getOptionChain from quantmod package.
The code should get Option chain data and subset into a new list that contains only the puts.
The error I get is: Error in .Date(Exp/86400) : could not find function ".Date"
The same code sometimes runs without error. If I shorten the list of Symbols, there's no error, but as far as I know the error is not tied to a specific symbol, because I have managed to run it successfully. It seems like a random but frequent error.
All symbols are weekly expirations and the desired output is the next weekly expiration, so as I understand it, there's no need to specify an expiration date.
library(quantmod)
library(xts)
Symbols <- c("AA","AAL","AAOI","AAPL","ABBV","ABC","ABNB","ABT","ACAD","ACB","ACN","ADBE","ADI","ADM","ADP",
"ADSK","AEO","AFL","AFRM","AG","AGNC","AHT","AIG","AKAM","ALGN","AMAT","AMBA","AMC","AMD","AMGN",
"AMPX","AMRN","AMRS","AMZN","ANET","ANF","ANY","APA","APO","APPH","APPS","APRN","APT","AR","ARVL")
Options.20221118 <- lapply(Symbols, getOptionChain)
names(Options.20221118) <- Symbols
only_puts_list <- lapply(Options.20221118, function(x) x$puts)
After upgrading to R 4.2.2, the issue was fixed.
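If upgrading right away isn't an option, one possible workaround (not from the original answer, just a sketch) is to wrap each call in tryCatch so a sporadic failure on one symbol doesn't abort the whole lapply:
# returns NULL for any symbol whose option chain download fails, instead of stopping
Options.20221118 <- lapply(Symbols, function(sym) {
  tryCatch(getOptionChain(sym), error = function(e) NULL)
})
names(Options.20221118) <- Symbols
only_puts_list <- lapply(Options.20221118, function(x) if (!is.null(x)) x$puts)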

How to fix "unused argument" error message in R studio?

I tried to run my code like I usually do, and I got an "unused argument" error message. I have previously run the code multiple times and everything worked perfectly fine; this is the first time I have gotten an error message (I haven't changed the code). The only thing I've done differently is that I cleared the workspace at the end of my previous session (though I have no idea whether that would actually affect anything).
Below is the code:
pacman::p_load(
  rio,        # importing data
  here,       # relative file pathways
  janitor,    # data cleaning and tables
  lubridate,  # working with dates
  epikit,     # age_categories() function
  tidyverse,  # data management and visualization
  skimr,
  psych,
  reshape2,   # for reshaping dataset
  dplyr,
  miscFuncs,
  foreign,    # read data formats
  rcompanion, # group means
  eeptools,
  plyr)
mesh_dat <- import(here("R", "BTmeshdata.xlsx"))
The error message:
Error in here("R", "BTmeshdata.xlsx") :
unused argument ("BTmeshdata.xlsx")
The issue seems to be in how the dataset is imported because I have the same issue with importing a dataset from a different project.
.here and my folder "R" are located in my Documents folder.
Thanks!
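One common cause of this exact message (not confirmed in this thread, so treat it as an assumption) is that another package loaded later, such as plyr or lubridate, also exports a here() function that masks here::here() and accepts fewer arguments. Namespacing the call avoids the masking:
# call the here package's function explicitly so any masking here() is bypassed
mesh_dat <- import(here::here("R", "BTmeshdata.xlsx"))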

Problems parsing StreamR JSON data

I am attempting to use the streamR package in R to download and analyze Twitter data, on the premise that this library can overcome the limitations of the twitteR package.
When downloading data everything seems to work fabulously, using the filterStream function (to clarify: the function captures Twitter data, and just running it produces the JSON file, saved in the working directory, that is needed in the later steps):
filterStream( file.name="tweets_test.json",
track="NFL", tweets=20, oauth=credential, timeout=10)
Capturing tweets...
Connection to Twitter stream was closed after 10 seconds with up to 21 tweets downloaded.
However, when moving on to parse the json file, I keep getting all sorts of errors:
readTweets("tweets_test.json", verbose = TRUE)
0 tweets have been parsed.
list()
Warning message:
In readLines(tweets) : incomplete final line found on 'tweets_test.json'
Or with this function from the same package:
tweet_df <- parseTweets(tweets='tweets_test.json')
Error in `$<-.data.frame`(`*tmp*`, "country_code", value = NA) :
replacement has 1 row, data has 0
In addition: Warning message:
In stream_in_int(path.expand(path)) : Parsing error on line 0
I have tried reading the json file with jsonlite and rjson with the same results.
Originally, it seemed that the error came from special characters ({, then \) within the JSON file, which I tried to clean up following the suggestion from this post; however, not much came out of it.
I found out about the streamR package from this post, which presents the process as very straightforward and simple (which it is, except for the parsing part!).
If any of you have experience with this library and/or these parsing issues, I'd really appreciate your input. I have been searching non-stop but haven't been able to locate a solution.
Thanks!
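Not an answer from this thread, but a diagnostic sketch that may help, assuming the file is newline-delimited JSON (which is what the streaming API returns): keep only the lines that are complete, valid JSON objects, for example dropping a final line that was truncated when the stream closed, and then parse the cleaned file.
library(jsonlite)
raw_lines <- readLines("tweets_test.json", warn = FALSE)
# keep only lines that parse as valid JSON; a truncated final line is dropped
ok <- vapply(raw_lines, function(l) isTRUE(jsonlite::validate(l)), logical(1))
writeLines(raw_lines[ok], "tweets_clean.json")
tweet_df <- parseTweets(tweets = "tweets_clean.json")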

twitteR package date range issue in R

So I am trying to find tweets based on a date range with this code:
tweets <- searchTwitter(c("Alzheimer"), n=500, lang="en",
since="2011-03-01", until="2011-03-02")
and I get the warning message
In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit
= retryOnRateLimit, :
500 tweets were requested but the API can only return 0
BUT I don't get this warning message with the code
tweets <- searchTwitter(c("Alzheimer"), n=500, lang="en", since="2011-08-01")
I've read many posts previously about Twitter not allowing a date range going back more than a few days; is this still the case? I'm new to coding, so any help is greatly appreciated.
MrFlick is right in his comment: the Twitter API only returns tweets from the past few days. As for why you are not getting the warning on the second command, my guess is that it's working. I tried that command from my terminal and got back 500 tweets.
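A hedged sketch (the dates are illustrative) that keeps the requested range inside the roughly one-week window the standard search API covers:
# the free search API only reaches back about a week, so anchor the range to today
tweets <- searchTwitter("Alzheimer", n = 500, lang = "en",
                        since = as.character(Sys.Date() - 7),
                        until = as.character(Sys.Date()))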

tm_map has parallel::mclapply error in R 3.0.1 on Mac

I am using R 3.0.1 on Platform: x86_64-apple-darwin10.8.0 (64-bit)
I am trying to use tm_map from the tm library. But when I execute this code:
library(tm)
data('crude')
tm_map(crude, stemDocument)
I get this error:
Warning message:
In parallel::mclapply(x, FUN, ...) :
all scheduled cores encountered errors in user code
Does anyone know a solution for this?
I suspect you don't have the SnowballC package installed, which seems to be required. tm_map is supposed to run stemDocument on all the documents using mclapply. Try just running the stemDocument function on one document, so you can extract the error:
stemDocument(crude[[1]])
For me, I got an error:
Error in loadNamespace(name) : there is no package called ‘SnowballC’
So I just went ahead and installed SnowballC and it worked. Clearly, SnowballC should be a dependency.
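A minimal sketch of that fix, assuming the missing package really is the cause:
install.packages("SnowballC")   # stemDocument() needs this package for stemming
library(tm)
data("crude")
tm_map(crude, stemDocument)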
I just ran into this. It took me a bit of digging but I found out what was happening.
I had a line of code 'rdevel <- tm_map(rdevel, asPlainTextDocument)'
Running this produced the error
In parallel::mclapply(x, FUN, ...) :
all scheduled cores encountered errors in user code
It turns out that 'tm_map' calls some code in 'parallel' which attempts to figure out how many cores you have. To see what it's thinking, type
> getOption("mc.cores", 2L)
[1] 2
>
Aha moment! Tell the 'tm_map' call to only use one core!
> rdevel <- tm_map(rdevel, asPlainTextDocument, mc.cores=1)
Error in match.fun(FUN) : object 'asPlainTextDocument' not found
> rdevel <- tm_map(rdevel, asPlainTextDocument, mc.cores=4)
Warning message:
In parallel::mclapply(x, FUN, ...) :
all scheduled cores encountered errors in user code
>
So ... with more than one core, rather than giving you the error message, 'parallel' just tells you there was an error in each core. Not helpful, parallel!
I forgot the dot - the function name is supposed to be 'as.PlainTextDocument'!
So - if you get this error, add 'mc.cores=1' to the 'tm_map' call and run it again.
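Putting both fixes from this answer together (a sketch using the answer's own variable name):
# mc.cores = 1 surfaces the real error instead of the generic parallel warning,
# and the function name needs the dot: as.PlainTextDocument
rdevel <- tm_map(rdevel, as.PlainTextDocument, mc.cores = 1)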
I found an answer to this that was successful for me in this question:
Charles Copley, in his answer, indicates he thinks the new tm package requires lazy = TRUE to be explicitly defined.
So, your code would look like this:
library(tm)
data('crude')
tm_map(crude, stemDocument, lazy = TRUE)
I also tried it without SnowballC to see if it was a combination of those two answers. It did not appear to affect the result either way.
I have been facing the same issue but finally got it fixed. My guess is that if I name the corpus "longName" or "companyNewsCorpus", I get the issue, but if I name the corpus "a", it works well. Really weird.
The code below gives the same error message mentioned in this thread:
companyNewsCorpus <-Corpus(DirSource("SourceDirectory"),
readerControl = list(language="english"))
companyNewsCorpus <- tm_map(companyNewsCorpus,
removeWords, stopwords("english"))
But if I change it to the following, it works without issues:
a <-Corpus(DirSource("SourceDirectory"),
readerControl = list(language="english"))
a <- tm_map(a, removeWords, stopwords("english"))
I ran into the same problem in tm using an Intel quad core I7 running on Mac OS X 10.10.5, and got the following warning:
In mclapply(content(x), FUN, ...) scheduled core 1 encountered error in user code, all values of the job will be affected
I was creating a corpus after downloading Twitter data.
Charles Copley's solution worked for me as well.
I used: tm_map(*filename*, stemDocument, lazy = TRUE) after creating my corpus and then tm worked correctly.
I also ran into this same issue while using the tm library's removeWords function. Some of the other answers, such as setting the number of cores to 1, did work for removing the set of English stop words; however, I also wanted to remove a custom list of first names and surnames from my corpus, and these lists were upwards of 100,000 words long each.
None of the other suggestions helped, and it turned out through some trial and error that removeWords seemed to have a limitation of 1,000 words in a vector. So I wrote this function, which solved the issue for me:
# Let x be a corpus
# Let y be a vector containing words to remove
removeManyWords <- function(x, y) {
  n <- ceiling(length(y) / 1000)   # number of 1000-word chunks
  s <- 1
  e <- 1000
  for (i in 1:n) {
    # clamp the end index so the last chunk doesn't run past the end of y
    x <- tm_map(x, content_transformer(removeWords), y[s:min(e, length(y))])
    s <- s + 1000
    e <- e + 1000
  }
  x
}
This function essentially counts how many words are in the vector of words I want to remove, divides that by 1,000, and rounds up to the nearest whole number, n. We then loop over the vector of words to remove in n chunks of at most 1,000 words each. With this method I didn't need to use lazy = TRUE or change the number of cores, as can be seen from the actual removeWords call in the function. Hope this helps!
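A usage sketch (the object names are illustrative, not from the original answer):
# names_to_remove holds the 100,000+ first names and surnames to strip from the corpus
corpus <- removeManyWords(corpus, names_to_remove)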
I was working on Twitter data and got the same error as in the original question while trying to convert all text to lower case with the tm_map() function:
Warning message: In parallel::mclapply(x, FUN, ...) :
all scheduled cores encountered errors in user code
Installing and loading package SnowballC resolved the problem completely. Hope this helps.
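A sketch of the steps described above (the corpus name is illustrative), assuming SnowballC was the missing piece:
install.packages("SnowballC")
library(SnowballC)
library(tm)
# lower-case all text in the corpus; content_transformer wraps a plain function for tm_map
corpus <- tm_map(corpus, content_transformer(tolower))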
Slightly related to this question, but what fixed the error Error in library(SnowballC) : there is no package called ‘SnowballC’ for me was to run R as administrator in Windows 10 and retry the installation; this time it worked.
