I want to download all the words in Spanish from A-Z. No definitions or any of that, just the words.
When I search "download Spanish dictionary" I just get either an English to Spanish dictionary or some app that has the words.
Related
I have a CSV file with Hindi and English words. Currently, R is not reading the Hindi words. How do I enable support for other languages alongside English in RStudio?
I need to study a corpus of various .txt files in R with the tm package. My files have to be stored in a single folder called "english" since their text is in English, but the file names are sometimes in Chinese, Korean, Russian or Arabic.
I have tried using the code below to create my corpus.
setwd("C:/Users/etomilina/Documents/M2/mails/english")
data <- Corpus(DirSource(encoding="UTF-8"))
However, I get the following error messages: "non-existent or non-readable files : ./??.txt, ./?????.txt" (and so on, with question marks instead of the file names, which should be 陳國.txt or 정극지생명과학연구부.txt, for example).
Currently Sys.getlocale() returns "LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252", but even when specifying Sys.setlocale("English") the error persists.
If I only had Chinese filenames I would simply switch my system locale to Chinese, but here I have some in Korean, Arabic and Russian as well besides some English ones.
Is there any way to handle all these languages at once?
I am using RStudio 1.4.1106 and my R version is 4.2.1.
I would like to replace the abbreviations, numbers and symbols in my text.
As my text is in German and not in English, I have problems converting it.
I tried:
review_text <- replace_abbreviation(review_text)
review_text <- replace_number(review_text)
review_text <- replace_symbol(review_text)
But this only works for English text, not for German.
What should I add so that these functions also work for German?
qdap and the qdap-related packages are solely for use with the English language. If you want to work with German text, umlauts and all, packages like quanteda and udpipe can handle it, but they do not handle abbreviations and symbols. The replace_symbol function is easy to adjust: inspect the function, copy the code into your own function, and replace the English words with their German translations.
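As a rough illustration of that idea, here is a minimal base-R sketch. The function name replace_symbol_de and the symbol list are my own assumptions, not copied from qdap, so adjust them to whatever the original function actually covers.

```r
# Hypothetical German counterpart to a symbol-replacement function.
# The symbols and German words below are illustrative assumptions.
replace_symbol_de <- function(x) {
  symbols <- c("%", "&", "@", "$")
  words   <- c("Prozent", "und", "at", "Dollar")
  for (i in seq_along(symbols)) {
    # fixed = TRUE treats "$" and friends as literal text, not regex
    x <- gsub(symbols[i], paste0(" ", words[i], " "), x, fixed = TRUE)
  }
  # Collapse the extra spaces introduced by the padding
  x <- gsub("\\s+", " ", x)
  trimws(x)
}

review_text <- replace_symbol_de("5 % Rabatt & gratis Versand")
```

Padding each replacement with spaces and collapsing whitespace afterwards keeps the words from being glued to their neighbours when the symbol was written without spaces.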
The replace_abbreviation function points to a replacement table where the abbreviations are stored with their corresponding values. You need to create your own table for German.
The biggest issue is translating the numbers to text. This is different for each language and is not really available online. Searching for it tends to lead to results about converting numbers to text in Excel. But if you can read Python, you can translate a Python function to R (or use reticulate) to solve this. See this link to a Python library on GitHub which can do this for a few languages, including German. But I'm not sure whether it can be used in a text mining context.
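A small sketch of what such a table could look like, in base R only. The table abbreviations_de, its column names, and the helper replace_abbreviation_de are all hypothetical; the four entries are just examples.

```r
# Hypothetical lookup table: German abbreviation -> spelled-out form.
abbreviations_de <- data.frame(
  abv = c("z.B.", "bzw.", "usw.", "ca."),
  rep = c("zum Beispiel", "beziehungsweise", "und so weiter", "circa"),
  stringsAsFactors = FALSE
)

# Replace every abbreviation in the table with its spelled-out form.
replace_abbreviation_de <- function(x, lookup = abbreviations_de) {
  for (i in seq_len(nrow(lookup))) {
    # fixed = TRUE so the dots are matched literally
    x <- gsub(lookup$abv[i], lookup$rep[i], x, fixed = TRUE)
  }
  x
}

review_text <- replace_abbreviation_de("Obst, z.B. Birnen, kostet ca. 3 Euro.")
```

Note that this literal matching is case-sensitive and does not check word boundaries, so a short entry like "ca." would also match at the end of a longer word; a real table probably needs more careful patterns.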
I use a keyboard app which suggests the correct spelling of most Bangla words. Is there any way to extract its dictionary as some sort of text file, so that I can use it on Windows for Bangla spell checking?
I am using Windows 7 Home Premium and RStudio 0.99.896.
I have a csv file containing a column with text in several different languages, e.g. English, European languages, Korean, Simplified Chinese, Traditional Chinese, Greek, Japanese, etc.
I read it into R using
table<-read.csv("broker.csv",stringsAsFactors =F, encoding="UTF-8")
so that all the text is readable in its own language.
Most of the text is in a column named "content". In the console, when I have a look at
toplines<-head(table$content,10)
I can see all the languages as they are, but when I write to a csv file and open it in Excel, I can no longer see them. I typed in
write.csv(toplines,file="toplines.csv",fileEncoding="UTF-8")
then I opened toplines.csv in Excel 2013 and it looked like this
1 [<U+5916><U+5A92>:<U+4E2D><U+56FD><U+6C1.....
2 [<U+4E2D><U+56FD><U+6C11><U+822A><U+51C6.....
3 [<U+5916><U+5A92>:<U+4E2D><U+56FD><U+6C1.....
and so forth
Would anyone be able to tell me how I can write to a csv or Excel file so that the languages can be read as they are in Excel 2013? Thank you very much.
write_excel_csv() from the readr package, as suggested by #PhiSeu in the comments, has solved it for me.
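For anyone landing here, a minimal sketch of how that call might look. It assumes readr is installed; the sample strings and the temporary output path are made up for illustration. As far as I know, write_excel_csv() expects a data frame, so the character vector is wrapped in one first.

```r
library(readr)

# Stand-in for head(table$content, 10); escapes keep the script ASCII-only
toplines <- c("\u5916\u5a92", "\u4e2d\u56fd")

# Hypothetical output location; use any path you like
out_file <- file.path(tempdir(), "toplines.csv")

# write_excel_csv() writes UTF-8 with a byte order mark,
# which is what lets Excel detect the encoding
write_excel_csv(data.frame(content = toplines, stringsAsFactors = FALSE), out_file)
```

The difference from write.csv(..., fileEncoding="UTF-8") is the byte order mark at the start of the file; without it, Excel tends to fall back to the system code page.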