I'm currently building a Shiny application that needs to be translated into different languages. I have the whole structure in place, but I'm struggling to retrieve values such as "Validació" that contain accents.
The structure I've followed is this: I have a dictionary, which is simply a CSV file with the translations, with a key column and then a column for each language. The dictionary looks like this:
key, cat, en
"selecció", "selecció", "Selection"
"Diferències","Diferències", "Differences"
"Descarregar","Descarregar", "Download"
"Diagnòstics","Diagnòstics", "Diagnoses"
I have a script that, whenever dictionary.csv is modified, generates a .bin file that is later loaded by the code.
In strings.R I have all the strings that appear in the code, and I use a function to translate them into the current language. The function is the following:
Code:
tr <- function(text){
  sapply(text, function(s) translation[[s]][["cat"]], USE.NAMES = FALSE)
}
Since I do the translation in another file, I assign the result to a variable, something like:
str_seleccio <- tr('Selecció')
The problem I'm facing is that, for example, translating 'Selecció' with tr('Selecció') gives a correct answer when I execute it in the RStudio console, but in the Shiny application it comes back as NULL. If the word has no accents, such as "Hello", tr("Hello") gives a correct answer in the Shiny application as well, and I can see it through the code.
So tr(word) produces the correct value, but on assignment it "loses the value", and I'm a bit lost as to why.
I know you can do something like Encoding(str_seleccio) <- "UTF-8", but in this case it's not working. It used to help for plain words, but since the assignment yields NULL there is nothing to re-encode.
Any idea? Any suggestion? Ideally I would like to add something to the tr function itself.
The main idea comes from this repository, which is the simplest version you can find, but the author has problems with UTF-8 as well:
https://github.com/chrislad/multilingualShinyApp
As suggested in http://shiny.rstudio.com/articles/unicode.html, (re)save all files with UTF-8 encoding.
Additionally, within updateTranslation.R, change:
translationContent <- read.delim("dictionary.csv", header = TRUE, sep = "\t", as.is = TRUE)
to:
translationContent <- read.delim("dictionary.csv", header = TRUE, sep = "\t", as.is = TRUE, fileEncoding = "UTF-8")
Warning: when you (re)save ui.R, your c-cedilla might get destroyed. Just re-insert it if that happens.
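If the lookup still fails after re-saving, one workaround is to normalise the encoding inside tr itself. This is only a sketch: it assumes `translation` is a named list of named lists, as in the chrislad/multilingualShinyApp repository linked in the question (that structure is an assumption, not something I've verified in your code):

```r
# Sketch: force UTF-8 on both the lookup key and the returned string.
# Assumes `translation` is a named list of per-key lists, e.g.
# translation[["Selecció"]][["cat"]] — adjust to your actual structure.
tr <- function(text, lang = "cat") {
  sapply(text, function(s) {
    s   <- enc2utf8(s)                 # normalise the lookup key
    res <- translation[[s]][[lang]]
    if (is.null(res)) NA_character_ else enc2utf8(res)
  }, USE.NAMES = FALSE)
}
```

Returning NA instead of NULL for missing keys also makes failed lookups visible in the Shiny app rather than silently dropping the value.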
Happy easter :)
In R, I am extracting data from PDF tables using the tabulizer library, and the names are in Nepali. After extracting I get this table:
[1]: https://i.stack.imgur.com/Ltpqv.png
Now I want to change the names in column 2 to their English equivalents. Is there any way to do this in R?
The R code I wrote was:
library(tabulizer)
location <- "https://citizenlifenepal.com/wp-content/uploads/2019/10/2nd-AGM.pdf"
out <- extract_tables(location,pages = 113)
##write.table(out,file = "try.txt")
final <- do.call(rbind,out)
final <- as.data.frame(final) ### creating df
col_name <- c("S.No.","Types of Insurance","Inforce Policy Count", "","Sum Assured of Inforce Policies","","Sum at Risk","","Sum at Risk Transferred to Re-Insurer","","Sum At Risk Retained By Insurer","")
names(final) <- col_name
final <- final[-1,]
write.csv(final,file = "/cloud/project/Extracted_data/Citizen_life.csv",row.names = FALSE)
View(final)
It appears that the document is using a non-Unicode encoding. This web site https://www.ashesh.com.np/preeti-unicode/ can convert some Nepali encodings to Unicode, which would display properly in R, assuming you have the right fonts loaded. When I tried it on the output of your code, it produced something that looked okay to me, but I don't know Nepali:
> out[[1]][1,2]
[1] ";fjlws hLjg aLdf"
When I convert the contents of that string, I get
सावधिक जीवन बीमा
which looks to me like the text on that page of the document. If it's actually written correctly, then converting it to English will need a Nepali speaker to do the translation: hopefully that's you, but Google Translate gives
Term life insurance
So here's my suggestion: contact the owner of the www.ashesh.com.np website and ask if they can give you the conversion rules. If you can't find an existing implementation by someone else, write an R function to apply them. Then do the English translations manually.
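Such a converter could be sketched as a lookup table applied character by character. The two mappings below are illustrative placeholders inferred from the single example string above, not the real Preeti rules (the full mapping is more involved, and some characters reorder around consonants):

```r
# Sketch of a lookup-table converter from a legacy font encoding to Unicode.
# The entries are ILLUSTRATIVE ONLY, inferred from ";f" -> "सा" in the
# example above; obtain the full rules before using this for real.
preeti_map <- c(
  ";" = "\u0938",  # placeholder: ';' -> स
  "f" = "\u093e"   # placeholder: 'f' -> ा
)

preeti_to_unicode <- function(x) {
  chars <- strsplit(x, "")[[1]]
  out <- ifelse(chars %in% names(preeti_map), preeti_map[chars], chars)
  paste(out, collapse = "")
}
```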
This question is related to the utf8 package for R. I have a weird problem: I want emojis in a data set I'm working with to stay in code-point representation (i.e. as '\U0001f602'). I want to use the FindReplace function from the DataCombine package to turn UTF-8 encodings into prose descriptions of emojis in a dataset of YouTube comments (using a dictionary I made available here). The only issue is that when I save the output as an object in R, the nice UTF-8 encoding generated by utf8_encode, for which I can use my dictionary, disappears...
First I have to adjust the dictionary a bit:
emojis$YouTube <- tolower(emojis$Codepoint)
emojis$YouTube <- gsub("u\\+","\\\\U000", emojis$YouTube)
Convert to character so as to be able to use utf8_encode:
emojimovie$test <- as.character(emojimovie$textOriginal)
This works great: when it prints in the console, the output (\U0001f595 etc.) can be matched with dictionary entries:
utf8_encode(emojimovie$test)
BUT, when I do this:
emojimovie$text2 <- utf8_encode(emojimovie$test)
and then:
emoemo <- FindReplace(data = emojimovie, Var = "text2", replaceData = emojis, from = "YouTube", to = "Name", exact = TRUE)
I get all NAs. When I look at the output in $text2 with View, I don't see \U0001f595; I see actual emojis. I think this is why FindReplace isn't working: when the result gets saved to an object, it is represented as emojis again, so the function can't find any matches. When I try gsub("\U0001f602", "lolface", emojimovie$text2), I can actually match and replace things, but I don't want to do this for all ~2,000 or so emojis. I've tried reading as much as I can about UTF-8, but I can't understand why this is happening. I'm stumped!
It looks like you are trying to convert the UTF-8 emoji to a text version. I would recommend going the other direction. Something like:
emojis <- read.csv('Emoji Dictionary 2.1.csv', stringsAsFactors = FALSE)
# change U+1F469 U+200D U+1F467 to \U1F469\U200D\U1F467
escapes <- gsub("[[:space:]]*\\U\\+", "\\\\U", emojis$Codepoint)
# convert to UTF-8 using the R parser
codes <- sapply(parse(text = paste0("'", escapes, "'"),
                      keep.source = FALSE), eval)
This will convert the text representations like U+1F469 to UTF-8 strings. Then, you can search for these strings in the original data.
Note: if you are using Windows, make sure you have the latest release of R; in older versions, the parser gives you the wrong result for strings like "\U1F469".
utf8::utf8_encode should really only be used if you have UTF-8 text and are trying to print it to the screen.
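The search-and-replace step on the original data could then be sketched like this (a minimal loop; it assumes a parallel vector of prose names, such as the `Name` column of the dictionary mentioned above):

```r
# Sketch: replace each emoji (as a UTF-8 string) with its prose name.
# `codes` would come from the parse/eval step above; `nms` is assumed to
# be a parallel vector of descriptions (e.g. emojis$Name).
replace_emojis <- function(text, codes, nms) {
  for (i in seq_along(codes)) {
    text <- gsub(codes[i], nms[i], text, fixed = TRUE)
  }
  text
}
```

Using fixed = TRUE avoids the emoji strings being interpreted as regular expressions.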
I would like to printout to the same txt (outfile.txt) file items one after the other.
For instance, first I would like to print a dataframe u to outfile.txt, then the message 'hello', and finally a summary of a model.
How can I do it? Is sink('outfile.txt') appropriate for this case?
It is generally a very bad idea to mix data in the same file, and I advise against it in the strongest terms: it makes the file next to unusable for other programs.
That said, most functions to save data have an append argument. You can set this to TRUE to append to an existing file rather than overwriting its contents. No need for sink.
Where you do need sink (or equivalent) is when you want to write contents formatted in the same way as it’s written on the console. This, for instance, is the case for summary.
Here’s an example similar to your requirements:
filename = 'test.txt'
write.table(head(cars), filename, quote = FALSE, col.names = NA)
cat('\nHello\n\n', file = filename, append = TRUE)
capture.output(print(summary(cars)), file = filename, append = TRUE)
Rather than sink, this uses capture.output, which is a convenience wrapper around sink.
I have a working directory:
setwd("C:/Patient migration")
then I have other directories where I save my workspace data and where I get the source data from.
C:/Patient migration/source data
C:/Patient migration/workspace
These directories appear many times in the syntax (as part of complete path names), and other people should be able to work with my syntax as well. Such a directory later in the syntax looks like this:
save (SCICases2010,file="C:/Patient migration/Workspace/SCICases2010.RData")
Data22 <- read.table(file = "C:/Patient migration/source data/DATA_BFS_MS_GEO_NiNo_2010_2.dat", sep = "|", header = TRUE)
Is it possible to define a directory once, at the beginning, so that all occurrences of the same directory further down change as well? My goal is to name the two or three directories at the beginning of my syntax; other users can change those, and consequently all the paths elsewhere in the syntax change too. Do you understand what I want to do? Is there perhaps a smarter way to do it? I don't really want all this data in the working directory.
Hopefully somebody can help. Thanks a lot!
You can first define the paths as variables at the beginning of your syntax, like this:
source.file <- "C:/Patient migration/source data"
work.file <- "C:/Patient migration/workspace"
Then you can just use these variables rather than typing the full path every time. Other users of your syntax can set the paths at the beginning and need not change the following code any more.
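For example, with file.path the later calls only reference the variables (a sketch; the read.table and save calls are commented out since they need the actual files):

```r
# Define the directories once at the top of the script ...
source.file <- "C:/Patient migration/source data"
work.file   <- "C:/Patient migration/workspace"

# ... then build full paths from them everywhere else.
data_path <- file.path(source.file, "DATA_BFS_MS_GEO_NiNo_2010_2.dat")
out_path  <- file.path(work.file, "SCICases2010.RData")

# Later in the script (sketch):
# Data22 <- read.table(file = data_path, sep = "|", header = TRUE)
# save(SCICases2010, file = out_path)
```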
I found a solution that works for me. I use relative paths which start with the subfolder the data comes from or the output goes to. This subfolder lies in the working directory, so I just need to change the working directory; everything else can stay the same.
save (SCICases2010,file="C:/Patient migration/Workspace/SCICases2010.RData")
becomes
save(SCICases2010, file = "Workspace/SCICases2010.RData")
and
Data22 <- read.table(file = "C:/Patient migration/source data/DATA_BFS_MS_GEO_NiNo_2010_2.dat", sep = "|", header = TRUE)
becomes
Data22 <- read.table(file = "source data/DATA_BFS_MS_GEO_NiNo_2010_2.dat", sep = "|", header = TRUE)
Guys, thanks for reading this. This is my first time writing a program, so pardon me if I ask stupid questions.
I have a bunch of .csv files named like 001-XXX.csv, 002-XXX.csv, ..., 150-XXX.csv, where XXX is a very long name tag. It's a little annoying to type read.csv("001-xxx.csv") every time, so I want to make a function called "newread" that only asks me for the first three digits (the real id number) to read the .csv files. I thought "newread" should look like this:
newread <- function(id){
  as.character(id)
  a <- paste(id, "-XXX.csv", sep = "")
  read.csv(a)
}
But R shows Error: unexpected '}' in "}". What's going wrong? It looks logical.
I am running Rstudio on Windows 8.
as.character(id) on its own will not change id into a character string, because the result is discarded. Change it to:
id = as.character(id)
Edit: according to the comments, you should call newread() with a character parameter; note that there is no difference between newread(001) and newread(1), since both are parsed as the number 1.
This is not specifically an answer to your question (others have covered that), but rather some advice that may be helpful for accomplishing your task in a different way.
First, some of the GUIs for R have file-name completion. You can type the first part, read.csv("001-, and then hit a key or key combination (in the Windows GUI you press TAB), and the rest of the filename will be filled in for you (as long as it is unique).
You can use the file.choose or choose.files functions to open a dialog box to choose your file using the mouse: read.csv(file.choose()).
If you want to read in all the above files, you can do it in one step using lapply and either sprintf or list.files (or others):
mycsvlist <- lapply( 1:150, function(x) read.csv( sprintf("%03d-XXX.csv", x) ) )
or
mycsvlist <- lapply( list.files(pattern="\\.csv$"), read.csv )
You could also use list.files to get a list of all the files matching a pattern and then pass one of the returned values to read.csv:
tmp <- list.files(pattern="001.*csv$")
read.csv(tmp[1])