Change decimal character in sprintf() - r

I can change the decimal character used in output with:
> 1/2
[1] 0.5
> options(OutDec = ',')
> 1/2
[1] 0,5
But this change does not affect the sprintf() function.
> sprintf('%.1f', 1/2)
[1] "0.5"
So, my question is: is there an easy way to change it (the decimal character)? I think I can't use a 'simple' regular expression, because not every . needs to be replaced by ,.
I don't have any idea how to do it, so I can't show what I've already tried.

I think you can do this by setting your locale appropriately, making sure that the LC_NUMERIC component is set to a locale that uses a comma as the decimal separator (http://docs.oracle.com/cd/E19455-01/806-0169/overview-9/index.html).
Sys.setlocale("LC_NUMERIC","es_ES.utf8")
sprintf("%f",1.5)
## "1,500000"
This gives a warning that R may behave strangely; you probably want to switch LC_NUMERIC back to C as soon as you're done generating output.
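A minimal sketch of that advice (with_comma_decimal is a hypothetical helper name, and es_ES.utf8 is an assumed locale that may be spelled differently on your platform): save the current LC_NUMERIC, switch it only for one call, and restore it on exit.
with_comma_decimal <- function(expr, locale = "es_ES.utf8") {
  old <- Sys.getlocale("LC_NUMERIC")
  on.exit(Sys.setlocale("LC_NUMERIC", old), add = TRUE)  # restore afterwards
  Sys.setlocale("LC_NUMERIC", locale)  # triggers the "may behave strangely" warning
  force(expr)  # expr is evaluated lazily, i.e. after the locale switch
}
with_comma_decimal(sprintf("%.1f", 1/2))
## "0,5"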

Try this
sprintf("%s",format(1.5,decimal.mark=","))

Or try this in other cases (e.g. I wanted "%+3.1f %%" in sprintf) :
gsub("\\.",",", sprintf("%+3.1f %%",1.99))

Related

iconv() returns NA when given a string with a specific special character

I am trying to convert some strings from an input file from UTF-8 to ASCII. For most of the strings I give it, the conversion works perfectly fine with iconv(). However, on some of them it returns NA. While manually fixing the issue in the file seems like the simplest option, it is unfortunately not an option available to me at the moment.
I have made a reproducible example of my problem, but please assume that I have to find a way for iconv() to convert the string in s1 without getting NA.
Here is the reproducible example:
s1 <- "Besançon" #as read from an input file I cannot modify
s2 <- "Paris"
s3 <- "Linköping"
s4 <- "Besançon" #Manual input for testing
s1 <- iconv(s1, to='ASCII//TRANSLIT')
s2 <- iconv(s2, to='ASCII//TRANSLIT')
s3 <- iconv(s3, to='ASCII//TRANSLIT')
s4 <- iconv(s4, to='ASCII//TRANSLIT')
I get the following output:
> s1
[1] NA
> s2
[1] "Paris"
> s3
[1] "Link\"oping"
> s4
[1] "Besancon"
After playing around with the code, I figured out that something was wrong with the entry "Besançon", which is copied here exactly as it appears in the input file. When I type it in manually myself, the problem is solved. Since I can't modify the input file at all, what do you think is the exact issue, and would you have any idea on how to solve it?
Thanks in advance,
Edit:
After closer inspection, there is something odd in the characters of the first line. It seems to be stripped out by SO's formatting.
The best I can do to reproduce it is these two images describing it. The first image places my cursor just before the #.
The second image is after pressing delete, which should delete the whitespace... it turns out it deletes the ". So there is definitely something weird there.
It turns out that using sub='' actually solved the issue although I am quite unsure why.
iconv(s1, to='ASCII//TRANSLIT', sub='')
From the documentation of the sub argument:
character string. If not NA it is used to replace any non-convertible
bytes in the input. (This would normally be a single character, but
can be more.) If "byte", the indication is "<xx>" with the hex code of
the byte. If "Unicode" and converting from UTF-8, the Unicode point in
the form "<U+xxxx>".
So I eventually figured out that there was a character I couldn't convert (nor see) in the string and using sub was a way to eliminate it. I am still not sure what this character is though. But the problem is solved.
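If you want to see what the hidden character actually is, sub = "byte" (described in the quoted documentation above) shows the hex code of the offending byte instead of silently dropping it:
iconv(s1, to = 'ASCII//TRANSLIT', sub = 'byte')
The non-convertible byte then appears as <xx> in the result.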
There is probably a latin1 (or other encoding) character in your supposedly utf8 file. For example:
> latin=iconv('Besançon','utf8','latin1')
> iconv(latin,to='ascii//translit')
[1] NA
> iconv(latin,'utf8','ascii//translit')
[1] NA
> iconv(latin,'latin1','ascii//translit')
[1] "Besancon"
> iconv(latin,'Windows-1250','ascii//translit')
[1] "Besancon"
You can, for example, make one new vector or data column with the result of each candidate encoding, and if one is NA, fall back to the next one, e.g.
utf8 = iconv(x,'utf8','ascii//translit')
latin1 = iconv(x,'latin1','ascii//translit')
win1250 = iconv(x,'Windows-1250','ascii//translit')
result = ifelse(
  is.na(utf8),
  ifelse(
    is.na(latin1),
    win1250,
    latin1
  ),
  utf8
)
If these encodings don't work, make a file with just the problem word, then use the unix/linux file command to detect the encoding, or else try some likely encodings.
I have in the past listed all of iconv's supported encodings, tried them all with lapply(), and then used whichever result worked for each string. However, some "from" encodings will return a non-NA but incorrect result, so it is best to try this on each unique character in your data in order to decide which subset of iconv's encodings to use, and in which order.
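A rough sketch of that approach (s1 is the problem string from the question; the results need manual review, since some encodings convert without error but give the wrong characters):
candidates <- iconvlist()  # every encoding the local iconv supports
tries <- lapply(candidates, function(enc) iconv(s1, from = enc, to = 'ASCII//TRANSLIT'))
names(tries) <- candidates
Filter(Negate(is.na), tries)  # keep only the encodings that gave a non-NA result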

R - f_num, but with comma

The f_num function from the numform package will remove leading zeros from a number:
f_num(0.1)
Output:
.1
I need this very same thing, but with a comma instead of the period. It would also be great if the f_num functionality that lets you round to a given number of decimals were kept.
Here is a custom alternative (see note below):
detrail <- function(num, round_dec = NULL){
  if(!is.null(round_dec)){
    num <- round(num, round_dec)
  }
  gsub("^\\d\\.", ",", num)
}
detrail(0.1)
[1] ",1"
detrail(1.1)
[1] ",1"
detrail(0.276,2)
[1] ",28"
NOTE:
To read this back as numeric, you would need to change options(OutDec) from . to ,, i.e. options(OutDec = ","). I have not done this, as I do not like changing global options.
This also strips any leading digit, not just zero (see detrail(1.1) above). Disable this by using 0 instead of \\d in the pattern.
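A hedged alternative sketch (f_num_comma is just a made-up helper name, not part of numform): format with a comma decimal mark, keep rounding via a digits argument, and strip only a leading zero.
f_num_comma <- function(x, digits = 1){
  out <- format(round(x, digits), nsmall = digits, decimal.mark = ",")
  sub("^0,", ",", out)  # drop only a leading zero, unlike detrail()
}
f_num_comma(0.1)
[1] ",1"
f_num_comma(0.276, 2)
[1] ",28"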

R sanitize pattern for regular expression detection [duplicate]

Here is a sample string:
x <- "My name is XYZ, I'm from ABc, working at PQR"
and I want to detect "," in the string using two forms:
> str_detect(x,",")
[1] TRUE
>
> str_detect(x,fixed(","))
[1] TRUE
Both return the same result. So what is the difference between these two?
For this, we may need a different example with a regex metacharacter. Here, we are trying to check whether the upper-case letter 'R' is at the end ($) of the string. With fixed() as a wrapper, it checks whether we have the literal characters "R$"; without it, $ is evaluated as the end of the string, since it is a metacharacter.
str_detect(x,fixed("R$"))
#[1] FALSE
str_detect(x,"R$")
#[1] TRUE
The , is not a metacharacter and is evaluated as , whether we use fixed() or not. In general, if we are specifically looking for a literal character, use the fixed() wrapper; it should be faster as well.
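A hedged aside (not part of the original answer): escaping the metacharacter in the pattern is another way to match it literally, equivalent to the fixed("R$") call above.
str_detect(x,"R\\$")
#[1] FALSE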

grepping special characters in R

I have a variable named full.path.
I am checking whether the string contained in it has certain special characters or not.
In my code below, I am trying to grep for some special characters. Those characters are not in the string, yet the output I get is TRUE.
Could someone explain and help. Thanks in advance.
full.path <- "/home/xyz"
#This returns TRUE :(
grepl("[?.,;:'-_+=()!##$%^&*|~`{}]", full.path)
By plugging this regex into https://regexr.com/ I was able to spot the issue: an unescaped - inside a character class creates a range. The range from ' to _ happens to include the uppercase letters, the digits, and characters such as /, so you get spurious matches (here, the / in "/home/xyz").
To avoid this behaviour, you can put - first in the character class, which is how you signal you want to actually match - and not a range:
> grepl("[-?.,;:'_+=()!##$%^&*|~`{}]", full.path)
[1] FALSE
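A small illustration of the range (not from the original answer): the span from ' (ASCII 39) to _ (ASCII 95) covers /, the digits and A-Z, while lowercase letters fall outside it.
> grepl("['-_]", "/home/xyz")
[1] TRUE
> grepl("['-_]", "homexyz")
[1] FALSE
> grepl("[-'_]", "/home/xyz")
[1] FALSE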

Variable name restrictions in R

What are the restrictions as to what characters (and maybe other restrictions) can be used for a variable name in R?
(This screams of general reference, but I can't seem to find the answer)
You might be looking for the discussion from ?make.names:
A syntactically valid name consists of letters, numbers and the dot or
underline characters and starts with a letter or the dot not followed
by a number. Names such as ".2way" are not valid, and neither are the
reserved words.
In the help file itself, there's a link to a list of reserved words, which are:
if else repeat while function for in next break
TRUE FALSE NULL Inf NaN NA NA_integer_ NA_real_ NA_complex_
NA_character_
Many other good notes from the comments include the point by James to the R FAQ addressing this issue and Josh's pointer to a related SO question dealing with checking for syntactically valid names.
Almost NONE! You can use 'assign' to make ridiculous variable names:
assign("1",99)
ls()
# [1] "1"
Yes, that's a variable called '1'. Digit 1. Luckily it doesn't change the value of integer 1, and you have to work slightly harder to get its value:
1
# [1] 1
get("1")
# [1] 99
The "syntactic restrictions" some people might mention are purely imposed by the parser. Fundamentally, there's very little you can't call an R object. You just can't do it via the '<-' assignment operator. "get" will set you free :)
The following may not directly address your question, but is of great help.
Try the exists() function to see if a name is already in use; that way you know not to reuse system names for your variables or functions.
Example...
> exists('for')
[1] TRUE
> exists('myvariable')
[1] FALSE
Using the make.names() function from the built-in base package may help:
is_valid_name <- function(x)
{
  length_condition <- if(getRversion() < "2.13.0") 256L else 10000L
  is_short_enough <- nchar(x) <= length_condition
  is_valid_name <- (make.names(x) == x)
  final_condition <- is_short_enough && is_valid_name
  return(final_condition)
}
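A quick illustrative check of the helper above:
is_valid_name("my_var")
# [1] TRUE
is_valid_name(".2way")
# [1] FALSE  (make.names() would change it to "X.2way")
is_valid_name("for")
# [1] FALSE  (reserved word)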
