Issue with multibyte character in Unix

We are facing an issue with multibyte characters when we run the command below:
awk 'length<30'
The file content is:
ASDFGHJKLQWERTYUIOPZXJM0000023 حكمت مزبان إبراهيم العزاوي
ASDFGHJKLQWERTYUIOPZXJM000
So it should give only one record.

length<30
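In POSIX awk a bare length is shorthand for length($0), not an uninitialized variable, so length<30 is not always true in a conforming awk. The explicit function call is still the clearer form and rules the shorthand out as the cause: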
awk 'length($0)<30'
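As a quick check of how a particular awk behaves, the sketch below runs both spellings on the two records from the question and then prints what that awk reports for a single two-byte UTF-8 character. The variable names are illustrative only, and whether length counts bytes or characters for multibyte text depends on the awk implementation and the locale, so no value is asserted for that last line.

```shell
# The two records from the question. The first is 30 ASCII characters followed
# by Arabic text; the second is 26 ASCII characters.
records=$(printf '%s\n' \
  'ASDFGHJKLQWERTYUIOPZXJM0000023 حكمت مزبان إبراهيم العزاوي' \
  'ASDFGHJKLQWERTYUIOPZXJM000')

# Explicit function call on the whole record.
explicit=$(printf '%s\n' "$records" | awk 'length($0) < 30')

# Bare length, which POSIX defines as shorthand for length($0).
bare=$(printf '%s\n' "$records" | awk 'length < 30')

echo "explicit: $explicit"
echo "bare:     $bare"

# Report how this awk counts one two-byte UTF-8 character (bytes C3 A9).
# The answer may be 1 (characters) or 2 (bytes), depending on awk and locale.
printf '\303\251\n' | awk '{ print "reported length:", length($0) }'
```

If the last line reports 2, that awk is counting bytes, and any length threshold applied to multibyte records needs to allow for that.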

Related

How to parse @{TEST TAGS} into only the Tags, eliminating current formatting?

I have two tags defined, and I am trying to output them to the console. What comes out looks like a list, but I'd like to remove the formatting and output only the actual words.
Here's what I currently have:
[Tags]    ready    ver10
Log To Console    \n@{TEST TAGS}
And the result is
['ready', 'ver10']
So, how would I chuck the [', the ', ' and the '], thus only retaining the words ready and ver10?
Note: I was getting [u'ready', u'ver10'] - but once I got some advice to make sure I was running Python3 RobotFramework - after uninstalling robotframework via pip, and now only having robotframework installed via pip3, the u has vanished. That's great!

There are several ways to do it. For example, you could use a loop, or you could convert the list to a string before calling log to console.
Using a loop
Since the data is a list, it's easy to iterate over it:
FOR    ${tag}    IN    @{Test Tags}
    log to console    ${tag}
END
Converting to a string
You can use the evaluate keyword to convert the list to a string of values separated by newlines. Note: you have to use two backslashes in the call to evaluate, since both Robot Framework and Python use the backslash as an escape character. The first backslash escapes the second, so that Python sees \n and converts it to a newline.
${tags}=    evaluate    "\\n".join($test_tags)
log to console    \n${tags}

R customize error message when string contains unrecognized escape

I would like to give a more informative error message when users of my R functions supply a string with an unrecognized escape
my_string <- "sql\sql"
# Error: '\s' is an unrecognized escape in character string starting ""sql\s"
Something like this would be ideal.
my_string <- "sql\sql"
# Error: my_string contains an unrecognized escape. Try sql\\sql with double backslashes instead.
I have tried an if statement that looks for single backslashes
if (stringr::str_detect("sql\sql", "\")) stop("my error message")
but I get the same error.
Almost all of my users are Windows users running R 3.3 and up.

Code execution in R happens in two phases. First, R takes the raw string you enter and parses that into commands that can be run; then, R actually runs those commands. The parsing step makes sure what you've written actually makes sense as code. If it doesn't make any sense, then R can't even turn it into anything it can attempt to run.
The error message you are getting about the unrecognized escape sequence happens at the parsing stage. That means R isn't even attempting to execute the command; it simply can't understand what you are saying. There is no way to catch an error like this in code, because no user code is running at that point.
So if you are counting on your users writing code like my_string <- "something", then they need to write valid code. They can't change how strings are encoded, what the assignment operator looks like, or how variables can be named. They also can't type !my_string! <=== %something% because R can't parse that either. R can't parse my_string <- "sql\sql" but it can parse my_string <- "sql\\sql" (backslashes must be escaped in string literals). If they are not savvy users, you might want to consider providing an alternative interface that can sanitize user input before trying to run it as code. Maybe make a shiny front end, or have users pass arguments to your scripts via command line parameters.

If you capture the user input at run time, for example with readline(), a typed \ is stored in my_string as a single backslash, which R prints in escaped form as \\.
readline()
\
[1] "\\"
readline()
sql\sql
[1] "sql\\sql"
That is the same value you would get from the literal:
my_string <- "sql\\sql"
However
cat(my_string)
sql\sql
To check the input for a backslash, you need to escape it twice: once for the regular expression (\\) and once more for the R string literal, which gives "\\\\".
stringr::str_detect(my_string, "\\\\")
Which returns TRUE if the input string is sql\sql. So the full line is:
if (stringr::str_detect("sql\\sql", "\\\\")) stop("my error message")

printf not printing past '.' in string

I am having a problem using printf on my unix system. It is throwing an error every time I try to print the following
printf "%-15s %-15.2s" "Total Acounts:\nChecks=$" checks
checks should be a decimal, but I have tried printing it as a float and a decimal and get the same error.
fatal: not enough arguments to satisfy format string
`%-15s %-15.2sTotal Acounts:
Checks=$2135.92'
^
I have been working at this for a while now and can't figure it out, so any help is appreciated.

That's not how you call printf in awk. You are missing the commas indicating arguments.
You've given printf only a format string (the concatenation of "%-15s %-15.2s", "Total Acounts:\nChecks=$" and the value of checks).
You can see this in the error message, which shows the entire concatenated string, including the value of checks, as the format string.
You probably meant:
printf "%-15s %-15.2s", "Total Acounts:\nChecks=$", checks
#---------------------^---------------------------^
Note that %-15s isn't really doing anything useful for you there, as "Total Acounts:\nChecks=$" is longer than 15 characters.
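A minimal sketch of the corrected call, run from a shell, is shown below. It also compares the two precision forms, since the .2 in %-15.2s applies to a string conversion and keeps at most two characters, which may be what the question title describes. The value of checks is taken from the error message in the question, and the use of %.2f is an assumption about what was intended.

```shell
# Value copied from the error message in the question.
checks=2135.92

# %.2s is a string conversion: the precision keeps at most two characters.
truncated=$(LC_ALL=C awk -v checks="$checks" 'BEGIN { printf "%.2s", checks }')

# %.2f is a floating-point conversion: the precision is the number of decimals.
formatted=$(LC_ALL=C awk -v checks="$checks" 'BEGIN { printf "%.2f", checks }')

echo "with %.2s: $truncated"
echo "with %.2f: $formatted"

# Commas separate the format string from each of its arguments.
LC_ALL=C awk -v checks="$checks" \
  'BEGIN { printf "%s\n%-15s %.2f\n", "Total Acounts:", "Checks=$", checks }'
```

Splitting the label into two shorter strings also lets %-15s pad "Checks=$" to a fixed column, which the single long string could not do.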

R error: regular expression is invalid in this locale

I am trying to gather all instances of "Walloni\xeb" within a data-frame column in order to remove "\" using the grep function. However, I'm getting the following error message as shown below:
grep("Walloni\xeb", InvoAndinfo2$Regio)
Error in grep("Walloni\xeb", InvoAndinfo2$Regio) :
regular expression is invalid in this locale
Does anyone know what to do to resolve this?

The backslash is a special character in regular expressions. If you want to look for a string that contains a backslash, escape it by adding another backslash in front of it.
Try:
grep("Walloni\\xeb", InvoAndinfo2$Regio)

R: invalid multibyte string [duplicate]

I use read.delim(filename) without any parameters to read a tab delimited text file in R.
df = read.delim(file)
This worked as intended. Now I have a weird error message and I can't make any sense of it:
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
invalid multibyte string at '<fd>'
Calls: read.delim -> read.table -> type.convert
Execution halted
Can anybody explain what a multibyte string is? What does fd mean? Are there other ways to read a tab file in R? I have column headers and lines which do not have data for all columns.

I realize this is pretty late, but I had a similar problem and I figured I'd post what worked for me. I used the iconv utility (e.g., "iconv file.pcl -f UTF-8 -t ISO-8859-1 -c"). The "-c" option skips characters that can't be translated.
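The '<fd>' in the error message is most likely R's way of showing a raw byte, hexadecimal FD, that it could not interpret in the current encoding; that byte is never valid in UTF-8. As a rough sketch of the iconv approach, the example below assumes the file is really ISO-8859-1 (the question does not say which encoding it uses) and converts it to UTF-8, the opposite direction from the command quoted above. The function name and sample text are illustrative only.

```shell
# One line of Latin-1 text. Octal \375 is byte 0xFD, which is "y with acute"
# in ISO-8859-1 and is never a valid byte in UTF-8.
latin1_line() { printf 'ab\375cd\n'; }

# Re-encode from ISO-8859-1 to UTF-8.
converted=$(latin1_line | iconv -f ISO-8859-1 -t UTF-8)

# In UTF-8 the same character (U+00FD) is the two bytes C3 BD (octal \303\275).
expected=$(printf 'ab\303\275cd')

if [ "$converted" = "$expected" ]; then
  echo "converted to valid UTF-8"
fi
```

For a real file the same conversion would read from the file and write to a new one, after which read.delim can be pointed at the converted copy.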

If you want an R solution, here's a small convenience function I sometimes use to find where the offending (multibyte) character is lurking. Note that the offending character is the one after the last character printed. This works because print will work fine, but substr throws an error when multibyte characters are present.
find_offending_character <- function(x, maxStringLength = 256) {
  print(x)
  for (i in seq_len(maxStringLength)) {
    offendingChar <- substr(x, i, i)
    # print(offendingChar)  # uncomment if you want the individual characters printed
    # the character after the last one printed is the offending multibyte character
  }
}
string_vector <- c("test", "Se\x96ora", "works fine")
lapply(string_vector, find_offending_character)
I fix that character and run this again. Hope that helps someone who encounters the invalid multibyte string error.

I had a similarly strange problem with a file from the program e-prime (edat -> SPSS conversion), but then I discovered that there are many additional encodings you can use. This did the trick for me:
tbl <- read.delim("dir/file.txt", fileEncoding="UCS-2LE")

This happened to me because I had the 'copyright' symbol in one of my strings! Once it was removed, problem solved.
A good rule of thumb: if you are seeing this error, make sure that characters not appearing on your keyboard are removed.

I found Leafpad to be an adequate and simple text editor for viewing files and saving or converting them in particular character sets, at least in the Linux world.
I used it to re-save the Latin-15 file as UTF-8, and it worked.
