Use Greek letters to name list elements in R

Is there any way to name list elements in R using Greek letters? I'm asked to create a list that should look like list(α = 42).
The actual result of this expression is equivalent to the result of list(a = 42).
I know it is possible to use Greek letters in plots using expression(symbol("a")), but I couldn't find a solution for using Greek letters as names of list elements. Using as.character("\U03B1") results in an error, and using just "\U03B1" leads to an "a". I doubt that it makes sense to use Greek letters anywhere other than in plots, but this is homework, so I have to find a way (if there is one).

I have not tested this thoroughly, but R seems to accept pretty much any symbol as a variable name without problems, and I could not find any specific rule about variable naming aside from this, taken from ?name:
Names are limited to 10,000 bytes (and were limited to 256 bytes in versions of R before 2.13.0).
The code you posted works perfectly for me (R 3.0.1 running under Fedora Core 18):
> a <- list(α = 42)
> a
$α
[1] 42
That said, I would definitely suggest just spelling the letter out as "alpha", as it is more practical to write and maintain.
a <- list(alpha = 42)

Alternatively, name the list afterwards, using the Unicode escape in a character vector:
ll <- as.list(42)
names(ll) <- "\u03B1"
ll
#$α
#[1] 42
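As a minimal sketch of reading the value back out (my addition, assuming a UTF-8 locale, where the escape "\u03B1" and the literal α denote the same string):

ll <- as.list(42)
names(ll) <- "\u03B1"
ll[["\u03B1"]]   # [1] 42 -- index with the escape string
ll$`α`           # [1] 42 -- backticks work for direct access to the name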
Interesting that you need to do this. Also your list holds the meaning of life.

Related

Rename a column with R

I'm trying to rename a specific column in my R script using the colnames function, but with no success so far.
I'm kind of new to programming, so it may be something simple to solve.
Basically, I'm trying to rename a column called Reviewer Overall Notes to Nota Final in a data frame called notas, with this code:
colnames(notas$`Reviewer Overall Notes`) <- `Nota Final`
and it returns to me:
> colnames(notas$`Reviewer Overall Notes`) <- `Nota Final`
Error: object 'Nota Final' not found
I also found in this post code that goes:
colnames(notas) [13] <- `Nota Final`
But it also returns the same message.
What am I doing wrong?
P.S.: Sorry for any misspelling; English is not my first language.
You probably want
colnames(notas)[colnames(notas) == "Reviewer Overall Notes"] <- "Nota Final"
(@Whatif's answer shows how you can do this with the numeric index, but it's probably better practice to do it this way; working with strings rather than column indices makes your code both easier to read [you can see what you're renaming] and more robust [in case the order of columns changes in the future].)
Alternatively,
notas <- notas %>% dplyr::rename(`Nota Final` = `Reviewer Overall Notes`)
Here you do use back-ticks, because tidyverse (of which dplyr is a part) prefers its arguments to be passed as symbols rather than strings.
Why use backticks? Use normal quotation marks.
colnames(notas)[13] <- 'Nota Final'
This seems to matter:
df <- data.frame(a = 1:4)
colnames(df)[1] <- `b`
Error: object 'b' not found
You should not use single or double quotes in naming:
I have learned that we should not use spaces in names. A name with spaces still works, but it is called a non-syntactic name, and according to Hadley Wickham's description in the Advanced R book, this is for historical reasons:
"You can also create non-syntactic bindings using single or double quotes (e.g. "_abc" <- 1) instead of backticks, but you shouldn’t, because you’ll have to use a different syntax to retrieve the values. The ability to use strings on the left hand side of the assignment arrow is an historical artefact, used before R supported backticks."
To get an overview of what syntactic names are, see ?make.names:
make.names("Nota Final")
[1] "Nota.Final"

How to use hunspell (or maybe another package) to correct words using my own "dictionary"?

I would like to correct some misspelled names based on another list, where the names are correct.
For example, I have this text:
ff Kazaroy, Sengir Pureblood S aE Didcono Ungido Ae yf Soldado do Bispo Ue ra Lamina Celeste daLegiao L
and I have this list:
Kazarov, Sengir Pureblood/
Diácono Ungido/
Soldado do Bispo/
Lâmina Celeste da Legião
I don't want hunspell to correct the words in my text based on the English dictionary, or any standard dictionary (Kazarov is a Russian name, some words are in English and others in Portuguese), so I'd rather use my list as a "dictionary". I tried adding custom words with the add_words function.
I tried, as an example
text2 <- hunspell(text, dict = dictionary(add_words = "Kazarov, Sengir Pureblood"))
print(text2[[1]])
hunspell_suggest(text2[[1]])
But not only does it not work, it still uses the English dictionary. I am considering creating a custom dictionary somehow, but I feel it would not be very efficient (and I don't even know how to do it yet).
Any suggestions?
I think the issue is with:
dictionary(add_words = "Kazarov, Sengir Pureblood")
add_words should be a character vector.
Try:
dictionary(add_words = c("Kazarov", "Sengir", "Pureblood"))
It still looks like it has trouble with Kazarov/Kazaroy, but the other two words are handled fine.
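For what it's worth, here is a minimal sketch of how the pieces could fit together (my addition; the word list is split from the reference list in the question, and note that dictionary() still needs a base dictionary, en_US here, so English words remain "correct"):

library(hunspell)

# Split the reference list into individual tokens for add_words
extra <- c("Kazarov", "Sengir", "Pureblood", "Diácono", "Ungido",
           "Soldado", "Bispo", "Lâmina", "Celeste", "Legião")
dict  <- dictionary("en_US", add_words = extra)

text <- "ff Kazaroy, Sengir Pureblood S aE Didcono Ungido"
bad  <- hunspell(text, dict = dict)       # tokens not found in the dictionary
print(bad[[1]])
hunspell_suggest(bad[[1]], dict = dict)   # suggestions can now include your words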

Insert specific unicode symbols inside values of data.frame variable

In my data.frame I would like to add two variables, "A" and "B", whose values contain respectively an n with the i subscript and an n with the s subscript.
As far as I understand, it's not possible to specify an expression for the values of a variable, so to add special characters it's necessary to use Unicode escapes. Some of these escapes work in R, for example the Greek letter "mu" (\U00B5) or numeric subscripts, as you can see by running this reprex in your R console:
x <- data.frame("A" = c("\U00B5"),
"B" = c("B\U2082"))
print(x)
These escapes also work if I put the variable in a ggplot() object: the correct symbol ("mu", for example) is displayed in the axis text or the facets.
The problem is that when I do the same for the subscripts i (Unicode \U1D62) and s (Unicode \u209B), R doesn't recognise the escape and prints the literal string instead.
Do you know how I can resolve this issue, and whether these escapes work on every operating system?
Thanks
Is there a reason you can't use the expression() function? It seems this would solve your problem (at least concerning Greek letters).
Here is the site I used to learn how to put Greek letters into my R/ggplot legends:
https://stats.idre.ucla.edu/r/codefragments/greek_letters/
Although it is not exactly the answer you are looking for, I still hope it helps!
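To make the suggestion concrete, here is a small sketch (my addition, with made-up data) that uses plotmath expressions for the n-with-subscript labels, which sidesteps the missing glyphs entirely:

library(ggplot2)

d <- data.frame(x = 1:3, y = c(2, 4, 8))   # made-up data
ggplot(d, aes(x, y)) +
  geom_point() +
  labs(x = expression(n[i]),   # n with subscript i
       y = expression(n[s]))   # n with subscript s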
If you are on Windows 10 with the April 2018 Update:
Use the Windows key + '.' (i.e. hold the Windows key and press the period key) in your text editor. This brings up the Microsoft emoji keyboard.
Select the Greek letter you want for your variable.
The R console will not accept the Greek letters as variables directly, only from the editor script. Some of the Greek letters don't translate to English (like "µ" or "ß"); you can copy and paste them from ls() output to access them. You may be able to use some math symbols as well for variable names. I can't, however, get this to work with source(). That must be a text-encoding problem.
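On the source() point, a possible workaround (an assumption on my part, untested on every platform) is to declare the file's encoding explicitly, or to avoid literal Greek in the file altogether by using Unicode escapes with assign()/get():

source("myscript.R", encoding = "UTF-8")  # tell source() the file is UTF-8

assign("\u03B1", 42)   # create a variable named α without typing the glyph
get("\u03B1")          # [1] 42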

Using grep() with Unicode characters in R

(strap in!)
Hi, I'm running into issues involving Unicode encoding in R.
Basically, I'm importing data sets that contain Unicode (UTF-8) characters, and then running grep() searches to match values. For example, say I have:
bigData <- c("foo","αβγ","bar","αβγγ (abgg)", ...)
smallData <- c("αβγ","foo", ...)
What I'm trying to do is take the entries in smallData and match them to entries in bigData. (The actual sets are matrixes with columns of values, so what I'm trying to do is find the indexes of the matches, so I can tell what row to add the values to.) I've been using
matches <- grepl(smallData[i], bigData, fixed=T)
which usually results in a vector of matches. For i = 2 it returns TRUE at position 1, since "foo" is element 1 of bigData. This is peachy and all is well. But RStudio seems not to be handling Unicode characters properly. When I import the sets and view them, the Unicode characters are shown as their code IDs.
dataset <- read_csv("[file].csv", col_names = FALSE, locale = locale())
Using View(dataset) shows "aß<U+03B3>" instead of "αβγ." The same goes for
dataset[1]
A tibble: 1x1 <chr>
[1] aß<U+03B3>
print(dataset[1])
A tibble: 1x1 <chr>
[1] aß<U+03B3>
However, and this is why I'm stuck rather than just adjusting the encoding:
paste(dataset[1])
[1] "αβγ"
Encoding(toString(dataset[1]))
[1] "UTF-8"
So it appears that R is recognizing in certain contexts that it should display Unicode characters, while in others it just sticks to--ASCII? I'm not entirely sure, but certainly a more limited set.
In any case, regardless of how it displays, what I want to do is be able to get
grep("αβγ", bigData)
[1] 2 4
However, none of the following work:
grep("αβ", bigData) #(Searching the two letters that do appear to convert)
grep("<U+03B3>",bigData,fixed=T) #(Searching the code ID itself)
grep("αβ", toString(bigData)) #(converts the whole thing to one string)
grep("\\β", bigData) #(only mentioning because it matches, bizarrely, to ß)
The only solution I've found is:
grep("\u03B3", bigData)
[1] 2 4
Which is not ideal for a couple of reasons, most jarringly that it doesn't look like it's possible to just take every <U+####> and replace it with \u####, since not every Unicode character is converted to the <U+####> format, but none of them can be searched. (That is, α and ß didn't turn into their Unicode keys, but they're also not searchable by themselves. So I'd have to turn them into their keys, then alter the keys into a form grep() can use, then search.)
That means I can't just regex the keys into a searchable format. And even if I could, I have a lot of entries containing characters that would need to be escaped (e.g., parentheses), so having to drop the fixed=T argument would be its own headache involving nested escapes.
Anyway...I realize that a significant part of the problem is that my set apparently involves every sort of character under the sun, and it seems I have thoroughly entrapped myself in a net of regular expressions.
Is there any way of forcing a search with (arbitrary) unicode characters? Or do I have to find a way of using regular expressions to escape every ( and α in my data set? (coordinate to that second question: is there a method to convert a unicode character to its key? I can't seem to find anything that does that specific function.)
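Not a full answer, but a sketch of one approach (my assumption being that the mismatch comes from mixed encodings rather than from grep() itself): force both vectors into UTF-8 before matching, and use stringi both for fixed matching and for converting characters to their \uXXXX keys.

library(stringi)

big   <- enc2utf8(bigData)    # declare/convert everything to UTF-8
small <- enc2utf8(smallData)

which(stri_detect_fixed(big, small[1]))  # [1] 2 4 -- literal match, no escaping needed
stri_escape_unicode(small[1])            # "\\u03b1\\u03b2\\u03b3" -- character -> key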

Unicode normalization (form C) in R : convert all characters with accents into their one-unicode-character form?

In Unicode, letters with accents can be represented in two ways: as the accented letter itself, and as the combination of the bare letter plus the combining accent. For example, é (U+00E9) and e followed by ´ (U+0065 U+0301) are usually displayed the same way.
R renders the following (version 3.0.2, Mac OS 10.7.5):
> "\u00e9"
[1] "é"
> "\u0065\u0301"
[1] "é"
However, of course:
> "\u00e9" == "\u0065\u0301"
[1] FALSE
Is there a function in R which converts such two-code-point letters into their one-code-point form? In particular, here it would collapse "\u0065\u0301" into "\u00e9".
That would be extremely handy for processing large quantities of strings. Plus, the one-character forms can easily be converted to other encodings via iconv -- at least for the usual Latin-1 characters -- and are better handled by plot.
Thanks a lot in advance.
Ok, it appears that a package has been developed to enhance and simplify the string manipulation toolbox in R (finally!). It is called stringi and looks very promising. Its documentation is very well written, and in particular I find the pages about encodings and locales much more enlightening than some of the standard R documentation on the subject.
It has Unicode normalization functions, as I was looking for (here form C):
> stri_trans_nfc('\u00e9') == stri_trans_nfc('\u0065\u0301')
[1] TRUE
It also contains a smart comparison function which integrates these normalization questions and lessens the pain of having to think about them:
> stri_compare('\u00e9', '\u0065\u0301')
[1] 0
# i.e. equal ;
# otherwise it returns 1 or -1, i.e. greater or lesser, in the alphabetic order.
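As a usage note (my addition): in practice you would normalize a whole vector once before any comparison or merge:

> x <- c('\u00e9', '\u0065\u0301')
> unique(stri_trans_nfc(x))
[1] "é"
# a single element after normalization, so matching and merging behave as expected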
Thanks to the developers, Marek Gągolewski and Bartek Tartanus, and to Kurt Hornik for the info!