Error in str.default(tweets_text) : invalid multibyte string 53

I'm able to get 500 tweets into R; however, when converting them to character with the code below, I get the error: Error in str.default(tweets_text) : invalid multibyte string 53
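library(twitteR)  # assumes Twitter OAuth is already set up, e.g. with setup_twitter_oauth()
tweets_b <- searchTwitter('bahubali', lang = "en", n = 500, resultType = "recent")
tweets_txt <- sapply(tweets_b, function(x) x$getText())
str(tweets_txt)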
Can someone help me out?

The error is caused by the encoding. Check the encoding with:
Encoding(tweets_txt)
If any elements are marked "UTF-8", you can get past the error by re-marking the strings as latin1:
Encoding(tweets_txt) <- "latin1"
After this, str() works. Note that this only changes the declared encoding; the bytes are not converted, so non-ASCII characters such as emoji will print as garbled text.
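To see what the relabelling does, here is a minimal base R sketch on a made-up string (not tweet data). `Encoding<-` changes only the declared encoding, so the bytes stay the same and each byte of a multibyte UTF-8 character is then read as a separate latin1 character; iconv() is shown for contrast because it actually rewrites the bytes.

```r
x <- "caf\u00e9"            # \u escapes give a string marked as UTF-8
Encoding(x)                 # "UTF-8"

y <- x
Encoding(y) <- "latin1"     # relabel only: no bytes are converted
identical(charToRaw(x), charToRaw(y))  # TRUE: same underlying bytes
nchar(x)                    # 4 characters when read as UTF-8
nchar(y)                    # 5: the two bytes of "é" are read as two latin1 characters

# A real conversion changes the bytes instead
z <- iconv(x, from = "UTF-8", to = "latin1")
charToRaw(z)                # 63 61 66 e9
```

Relabelling is enough to stop str() from failing, but if the text is analysed later, a real conversion (or removal of the offending bytes) is usually the safer choice.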

Thank you for your answer; however, I figured out the solution a long time ago and implemented it successfully.
FYI, here is the code I used:
bahubali_text <- sapply(bahubali_tweets, function(x) x$getText())
# Remove the non-ASCII characters from the corpus
b_convert_text <- sapply(bahubali_text, function(row) iconv(row, "latin1", "ASCII", sub = ""))
Thanks again!
Cheers
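The same approach can be tried without Twitter access. A minimal sketch, assuming a made-up character vector in place of the tweet texts: iconv() is already vectorised, so the sapply() wrapper is optional, and iconv() ignores any declared encoding and reads the bytes as latin1. Every byte >= 0x80 therefore becomes a character with no ASCII equivalent, and sub = "" drops it, whatever the real encoding was.

```r
# Made-up stand-ins for tweet texts (real ones would come from x$getText())
tweets <- c("Baahubali 2 \u2764", "caf\u00e9 screening tonight", "plain ASCII")

# Reading the bytes as latin1 turns every byte >= 0x80 into a character with
# no ASCII equivalent; sub = "" removes those bytes instead of returning NA.
ascii_only <- iconv(tweets, from = "latin1", to = "ASCII", sub = "")

str(ascii_only)  # only ASCII remains, so str() has nothing to choke on
```

The trade-off is that accented letters are lost along with emoji ("café" becomes "caf"), which may or may not matter for a text-mining corpus.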

Related

I'm getting a Unicode error that prevents my code from running

I think my problem is an error caused by Turkish characters in R.
Warning message:
In normalizePath(path.expand(path), winslash, mustWork) : path[1]="C:/Users/sample/OneDrive - Sa?l?k �r/Belgeler":The filename, directory name, or volume label syntax is incorrect inside batch
How can I fix it?
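You could try checking the encoding with
enc <- Encoding(path)
and then converting the path to UTF-8. Encoding() returns "unknown" for strings in the native encoding, which iconv() spells as "", so map that case first:
if (enc == "unknown") enc <- ""
enc.path <- iconv(path, enc, "UTF-8")
Then use enc.path as the path.
Honestly, I don't know if this works, but you could try it (it's free!)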
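A base R alternative worth trying is enc2utf8(), which converts a string from its declared encoding (or from the native encoding, if it is unmarked) to UTF-8. A minimal sketch; the path below is a hypothetical stand-in built with \u escapes, not the real folder name, and whether normalizePath() resolves it still depends on the folder existing on your machine.

```r
# Hypothetical path containing a Turkish letter (Ö = \u00d6)
path <- "C:/Users/sample/Belgeler/\u00d6rnek"

utf8_path <- enc2utf8(path)   # convert from the declared/native encoding to UTF-8
validUTF8(utf8_path)          # TRUE: the bytes now form valid UTF-8

# mustWork = FALSE returns the path without a warning if it cannot be resolved
normalizePath(utf8_path, winslash = "/", mustWork = FALSE)
```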

Lesson containing non-ASCII characters produces an error when I try to run a test in swirlify

It seems that swirlify cannot handle non-ASCII characters (such as accented characters). When I try to test or run the demo with test_lesson() or demo_lesson(), I get a file read error:
1: In readLines(con) :
invalid input found on input connection '..../lesson.yaml'
2: In readLines(con) :
incomplete final line found on '..../lesson.yaml'
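The error comes from the line
con <- file(input, encoding = "UTF-8")
in the yaml.load_file function, which reads the file as UTF-8; a lesson.yaml saved in another encoding therefore triggers the "invalid input" warning.
The solution is to re-save the YAML file in RStudio via File > Save with Encoding... and choose UTF-8.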
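If the file has to be fixed outside RStudio, the same re-save can be scripted in base R. A minimal sketch: reencode_to_utf8() is a hypothetical helper, not part of swirlify or yaml, and it assumes the file was saved as latin1 (for files written by Windows editors, "CP1252" may be the more accurate source encoding).

```r
# Hypothetical helper: rewrite a text file as UTF-8, assuming it is in `from`
reencode_to_utf8 <- function(input, output = input, from = "latin1") {
  # readLines() without an encoding argument returns the bytes unconverted
  lines <- readLines(input, warn = FALSE)
  # iconv() interprets those bytes as `from` and converts them to UTF-8;
  # NA marks a line whose bytes are not valid in `from`
  utf8 <- iconv(lines, from = from, to = "UTF-8")
  if (anyNA(utf8)) stop("some lines are not valid ", from, " text")
  # useBytes = TRUE writes the UTF-8 bytes as-is, whatever the session locale;
  # writeLines() also ends the file with a newline, which avoids the
  # "incomplete final line" warning
  writeLines(utf8, output, useBytes = TRUE)
  invisible(output)
}

# Usage: reencode_to_utf8("lesson.yaml")
```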

Encoding in developing a R package

When I run devtools::check(), one warning appears:
checking data for non-ASCII characters ... WARNING
Warning: found non-ASCII string
'Tanaid<c3><a6>' in object 'data_m'
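I ran the following checks:
library(stringi)
stri_enc_mark("Tanaid<c3><a6>") which shows "[1] "ASCII""
all(stri_enc_isutf8("Tanaid<c3><a6>")) which shows "[1] TRUE"
(Typed literally like this, "<c3><a6>" is just ASCII text, so these checks cannot see the two non-ASCII bytes; the escaped form "\xc3\xa6" used below is what actually contains them.)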
UPDATE
I followed up on the comment and attempted to convert the string from the native encoding to ASCII.
I tried iconv("Tanaid\xc3\xa6>", "native", "UTF-8")
However, iconv() does not accept "native" as an encoding name (the native encoding is written as "") and reports: Error in iconv("Tanaidæ>", "native", "UTF-8") :
unsupported conversion from 'native' to 'UTF-8'
iconv("Tanaid\xc3\xa6", "latin1", "ASCII") or iconv("Tanaid\xc3\xa6", "latin2", "ASCII") also does not yield the right string.
A better solution is stri_trans_general("Tanaid\xc3\xa6", "latin-ascii"), which transliterates "æ" to "ae".
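Before transliterating, it can help to list every offending string in the package data. A minimal base R sketch: non_ascii_report() is a hypothetical helper, and data_m below is a made-up stand-in for the real object. It assumes the strings are correctly declared (enc2utf8() then converts natively encoded ones to UTF-8).

```r
# Hypothetical helper: which character/factor columns hold non-ASCII strings?
non_ascii_report <- function(df) {
  out <- list()
  for (nm in names(df)) {
    col <- df[[nm]]
    if (!(is.character(col) || is.factor(col))) next
    x <- enc2utf8(as.character(col))
    # iconv() returns NA for strings that cannot be represented in ASCII
    bad <- !is.na(x) & is.na(iconv(x, from = "UTF-8", to = "ASCII"))
    if (any(bad)) out[[nm]] <- unique(x[bad])
  }
  out
}

# Made-up stand-in for the package data
data_m <- data.frame(name = c("Tanaid\u00e6", "Oslo"), n = 1:2)
report <- non_ascii_report(data_m)
report  # $name: "Tanaidæ"
```

Each reported column can then be passed through stri_trans_general(x, "latin-ascii") as in the answer above before the data is saved again.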

Errors while converting from numbers to character in R

In R, I tried to view the summary of my data after converting the columns from numbers to characters, and this message kept appearing:
Error in nchar(x, type = "w"): invalid multibyte string, element 18
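This sounds like a system locale issue. See this blog post for an example that may be similar to your use case.
In a nutshell, try running this before the rest of your code:
Sys.setlocale("LC_ALL", "English")
("English" is a Windows locale name; on Linux or macOS a name such as "en_US.UTF-8" is needed instead.)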
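If changing the locale is not an option, a different approach (not the one suggested above) is to find the elements that are not valid UTF-8 and re-encode only those. A minimal sketch, assuming the stray bytes are latin1; the vector x is made up for illustration.

```r
# Made-up data: element 2 holds a latin1 byte (0xE9) that is invalid as UTF-8
x <- c("ok", "caf\xe9", "na\u00efve")

# Positions that can trigger "invalid multibyte string" in a UTF-8 locale
bad <- which(!validUTF8(x))

# Re-encode just those elements, assuming they are latin1
x[bad] <- iconv(x[bad], from = "latin1", to = "UTF-8")
stopifnot(all(validUTF8(x)))
```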

CharToDate(x) Error in R

I have a function named currency. It takes a parameter A, which is a currency pair such as "EUR/USD". Then I do this:
library(quantmod)
n <- getSymbols(A, src = "oanda", from = "2016-01-01", to = Sys.Date(), auto.assign = FALSE)
The strange thing is that the program was running fine two days ago.
The error message is: Error in charToDate(x) : character string is not in a standard unambiguous format.
This is the traceback()
Thanks in advance!
I think this is a duplicate of
quantmod::getFX function returns "character in a standard unambiguous format"
In any case, I stepped through the code of getSymbols.oanda, and it seems that OANDA changed their API: instead of a .csv file, you now get an .xml file.
For my part, I will use a different data source until the quantmod maintainers resolve this.
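The error message itself comes from base R's as.Date(): when a string matches none of the formats it tries by default ("%Y-%m-%d" and "%Y/%m/%d"), it stops with this message. A minimal sketch with a made-up string standing in for a non-CSV response (the actual content OANDA returned is not shown in the thread):

```r
# Made-up stand-in for a response that is XML instead of CSV dates
not_a_date <- "<?xml version=\"1.0\"?>"

msg <- tryCatch(as.Date(not_a_date), error = conditionMessage)
msg  # "character string is not in a standard unambiguous format"

# With an explicit format, unparseable strings become NA instead of an error
as.Date(not_a_date, format = "%Y-%m-%d")
```

So the error points at the downloaded content rather than at the getSymbols() arguments, which is consistent with the finding that the download format changed.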
