Should I use utf-8 encoding for an online course? - r

Hello this is my question:
I am currently working on an introductory course on R programming for people with zero background in programming (people studying biology, veterinary science, medicine, economics, ...), so they tend not to be very tech savvy and to use Windows. After they download and open the R scripts that I prepared, they will find badly encoded characters every now and then (the course is in Spanish and has many accented characters). This happens because my scripts are saved with UTF-8 encoding, which is not the default on Windows.
The options to avoid this nuisance are:
change all my scripts to the encoding WINDOWS-1252
instruct everyone to change their encoding to UTF-8
The first option is more annoying for me, but it keeps the students from being distracted by a fairly minor detail.
The second option has no clear advantages from the pedagogic point of view, so I'd like to ask which virtues do you think it has...
Thanks in advance!

I would highly recommend instructing them to change their encoding to UTF-8. I've had the same issue on numerous occasions with web-app scripting, and generally speaking it's a lot more hassle to go through the code than to instruct the customer (or in your case, student) to use the UTF-8 encoding.
After all, the course you're holding is an introductory course, so you might want to consider briefly covering the topic and explaining the differences between the two, and more specifically: what happens when it doesn't work?
You have a golden opportunity to save yourself some time down the road, and possibly avoid the "Why is there question marks all over my screen"-question altogether!
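If you do cover "what happens when it doesn't work", a small demonstration may help. The following is only a sketch, written in Python rather than R purely for illustration, and the sample text is made up. It shows the same bytes read with the right and the wrong encoding:

```python
# Hypothetical sample line, as it might appear in a Spanish course script.
text = "días y años"

# The bytes an editor writes when the script is saved as UTF-8.
utf8_bytes = text.encode("utf-8")

# What a program shows if it wrongly assumes Windows-1252 for those bytes.
misread = utf8_bytes.decode("cp1252")

# Reading the same bytes as UTF-8 recovers the original text.
recovered = utf8_bytes.decode("utf-8")

print(repr(misread))
print(recovered)
```

Each accented letter takes two bytes in UTF-8, so the misread version shows two wrong characters where one accented letter should be.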

Maybe you can avoid non-ASCII characters in your scripts. For example, to represent the Greek "mu" character, you could use
> mu <- "\u03BC"
> Encoding(mu) <- "UTF-8"
> mu
[1] "μ"
Now if you print mu on the console, it is displayed correctly. In the script, you did not use any non-ASCII character at all.

What's preventing additions to the current set of R reserved words/symbols?

Is there a historical precedent of internal changes to the R parser, adding new reserved words or symbols?
If I remember correctly data.table uses a serendipitous := that was once defined but left unused in R internals, but I'm not aware of others. However, as the language evolves, it would sometimes seem useful to define new symbols.
An obvious case could be made for magrittr's pipe %>% which has become ubiquitous for many, but remains a pain to type (sure, there are keyboard tricks, but still). Similarly, dplyr/rlang introduce/repurpose notations for "tidy evaluation" (!!, !!!, :=, ~, etc.).
Another case I'm seeing is the verbosity of lambda functions. Would it be possible, theoretically, to define internally something like f = λ(x) x+1 instead of f = function(x) x+1, or are there character restrictions on top of other reasons?
Why add an ergonomics feature if you risk breaking a runtime that hosts a huge ecosystem? Also, once you add one feature, you are on a slippery slope and are staring straight in the face of feature bloat.
And if you say that we can be smart and judicious about what features we add, how do we structure that decision process? R does not have a "benevolent dictator" having a final word in decisions like this so you are left with design by committee with all that it entails.
The big thing with R has always been the package ecosystem, in which if you want a feature you write it yourself -- as in your magrittr example. The language itself has remained close to its S roots and has successfully served as a stable platform for all the development that has been happening.

Math ML MO uses

What do the following snippets of code do in MathML files? I removed those lines and it still worked fine for me.
<mo>&#x2061;</mo>
<mo>&#x2062;</mo>
<mo></mo>
An answer about any of them, or just letting me know what they are, would be very much appreciated.
The first two are U+2061 function application and U+2062 invisible times. They help indicate semantic information; see this Wikipedia entry.
The last one could be anything, since it lies in the Unicode Private Use Area, which is provided so that font developers can store glyphs that do not correspond to regular Unicode positions. (Unless it's a typo and really 6349, in which case it's a Han character.)

Can I implement a small subset of Curses in pure C++ (or any similar language) easily?

(I couldn't find anything related to this, as I don't know what keywords to search for).
I want a simple function - one that prints 3 lines, then erases the 3 lines and replaces with new ones. If it were a single line, I could just print \r or \b and overwrite it.
How can I do this without a Curses library? There must be some escape codes or something for this.
I found some escape codes to print colored text, so I'm guessing there is something similar to overwrite previous lines.
I want this to run on OSX and Ubuntu at least.
Edit: I found this - http://www.perlmonks.org/?displaytype=displaycode;node_id=575125
Is there a list of ALL such available commands?
(Short answer: Yes. See "ANSI Escape code" in Wikipedia for a complete list of ANSI sequences. Your terminal may or may not be ANSI, but ANSI sequence support seems pretty common - a good starting point at least).
The commands depend on the terminal you are using or, these days, the terminal emulator.
Back in the day there were physical boxes with names such as "VT-100" or "Ontel".
Each implemented whatever set of escape sequence commands they chose.
Lately of course we only use emulators. Nearly every sort of command line type interface operates in a text-window that emulates something or other.
Curses is a library that allows your average programmer to write code to manipulate the terminal without having to know how to code for each of the many different terminals out there. Kind of like how printer drivers let you print without having to know the details of any particular printer.
First you need to find out what kind of terminal you are using.
Then you can look up the specific commands.
One possible answer is here.
"ANSI" is a common one, typical of MSDOS.
Or, use curses and be happy for it :-)
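If your terminal does understand ANSI sequences, the three-line redraw from the question can be sketched roughly like this. It is in Python for brevity (the same strings can be written from C++), the function names are made up for illustration, and it assumes every frame has the same number of lines:

```python
import sys
import time

CURSOR_UP = "\x1b[{n}A"   # CSI n A: move the cursor up n lines
ERASE_LINE = "\x1b[2K"    # CSI 2 K: erase the entire current line

def redraw_sequence(lines, previous_count):
    """Build the string that overwrites `previous_count` printed lines with `lines`."""
    out = []
    if previous_count:
        # Jump back up to the first line of the previous frame.
        out.append(CURSOR_UP.format(n=previous_count))
    for line in lines:
        # Return to column 0, wipe the old line, write the new one.
        out.append("\r" + ERASE_LINE + line + "\n")
    return "".join(out)

def demo():
    """Print one three-line frame, wait, then replace it with another."""
    frames = [["one", "two", "three"], ["1", "2", "3"]]
    shown = 0
    for frame in frames:
        sys.stdout.write(redraw_sequence(frame, shown))
        sys.stdout.flush()
        shown = len(frame)
        time.sleep(1)
```

Calling demo() in a terminal shows the first three lines, then replaces them in place. On a terminal that does not understand these sequences you will see the raw escape characters instead, which is the portability problem curses hides.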

Coding mathematical algorithms - should I use variables in the book or more descriptive ones?

I'm maintaining code for a mathematical algorithm that came from a book, with references in the comments. Is it better to have variable names that are descriptive of what the variables represent, or should the variables match what is in the book?
For a simple example, I may see this code, which reflects the variable in the book.
A_c = v*v/r
I could rewrite it as
centripetal_acceleration = velocity*velocity/radius
The advantage of the latter is that anyone looking at the code could understand it. However, the advantage of the former is that it is easier to compare the code with what is in the book. I may do this in order to double check the implementation of the algorithms, or I may want to add additional calculations.
Perhaps I am over-thinking this, and should simply use comments to describe what the variables are. I tend to favor self-documenting code however (use descriptive variable names instead of adding comments to describe what they are), but maybe this is a case where comments would be very helpful.
I know this question can be subjective, but I wondered if anyone had any guiding principles in order to make a decision, or had links to guidelines for coding math algorithms.
I would prefer to use the more descriptive variable names. You can't guarantee everyone that is going to look at the code has access to "the book". You may leave and take your copy, it may go out of print, etc. In my opinion it's better to be descriptive.
We use a lot of mathematical reference books in our work, and we reference them in comments, but we rarely use the same mathematically abbreviated variable names.
A common practise is to summarise all your variables, indexes and descriptions in a comment header before starting the code proper. eg.
// A_c = Centripetal Acceleration
// v = Velocity
// r = Radius
A_c = (v^2)/r
I write a lot of mathematical software. IF I can insert in the comments a very specific reference to a book or a paper or (best) web site that explains the algorithm and defines the variable names, then I will use the SHORT names like a = v * v / r because it makes the formulas easier to read and write and verify visually.
IF not, then I will write very verbose code with lots of comments and long descriptive variable names. Essentially, my code becomes a paper that describes the algorithm (anyone remember Knuth's "Literate Programming" efforts, years ago? Though the technology for it never took off, I emulate the spirit of that effort). I use a LOT of ascii art in my comments, with box-and-arrow diagrams and other descriptive graphics. I use Jave.de -- the Java Ascii Vmumble Editor.
I will sometimes write my math with short, angry little variable names, easier to read and write for ME because I know the math, then use REFACTOR to replace the names with longer, more descriptive ones at the end, but only for code that is much more informal.
I think it depends almost entirely upon the audience for whom you're writing -- and don't ever mistake the compiler for the audience either. If your code is likely to be maintained by more or less "general purpose" programmers who may not/probably won't know much about physics so they won't recognize what v and r mean, then it's probably better to expand them to be recognizable for non-physicists. If they're going to be physicists (or, for another example, game programmers) for whom the textbook abbreviations are clear and obvious, then use the abbreviations. If you don't know/can't guess which, it's probably safer to err on the side of the names being longer and more descriptive.
I vote for the "book" version. 'v' and 'r' etc. are pretty well understood as abbreviations for velocity and radius, and they are more compact.
How far would you take it?
Most (non-Greek :-)) keyboards don't provide easy access to Δ, but it's valid as part of an identifier in some languages (e.g. C#):
int Δv;
int Δx;
Anyone coming afterwards and maintaining the code may curse you every day. Similarly for a lot of other symbols used in maths. So if you're not going to use those actual symbols (and I'd encourage you not to), I'd argue you ought to translate the rest, where it doesn't make for code that's too verbose.
In addition, what if you need to combine algorithms, and those algorithms have conflicting usage of variables?
A compromise could be to code and debug as contained in the book, and then perform a global search and replace for all of your variables towards the end of your development, so that it is easier to read. If you do this I would change the names of the variables slightly so that it is easier to change them later.
e.g. A_c# = v#*v#/r#

Is there any decryption algorithm that uses a dictionary to decrypt an encrypted algorithm?

Well, I have been working on an assignment and it states:
A program has to be developed, and coded in C language, to decipher a document written
in Italian that is encoded using a secret key. The secret key is obtained as random
permutation of all the uppercase letters, lowercase letters, numbers and blank space. As
an example, let us consider the following two strings:
Plain: “ABCDEFGHIJKLMNOPQRSTUVXWYZabcdefghijklmnopqrstuvwxyz0123456789 ”
Code: “BZJ9y0KePWopxYkQlRjhzsaNTFAtM7H6S24fC5mcIgXbnLOq8Uid 3EDv1ruVGw”
The secret key modifies only letters, numbers, and spaces of the original document, while
the remaining characters are left unchanged. The document is stored in a text file whose
length is unknown.
The program has to read the document, find the secret key (which by definition is
unknown; the above table is just an example and it is not the key used for preparing the
sample files available on the web course) using a suitable decoding algorithm, and write
the decoded document to a new text file.
And I know that I have to upload an English dictionary into the program, but I don't know why it has been asked (maybe not in that statement, but I have to do THAT). My question is: if I can write that program using a simple encryption/decryption algorithm, then what's the use of uploading the English dictionary into our program? So is there any decryption algorithm that uses a dictionary to decrypt an encrypted algorithm? Or can somebody tell me what approach or algorithm I should use to solve that problem?
An early reply (and also authentic one) will be highly appreciated from you.
Thank you guys.
This is a simple substitution cipher. It can be broken using frequency analysis. The Wikipedia articles explain both concepts thoroughly. What you need to do is:
Find the statistical frequency of characters in Italian texts. If you can't find this published anywhere, you can build it yourself by analyzing a large corpus of Italian texts.
Analyze the frequency of characters in the cipher text, and match it to the statistical data.
The first Wikipedia article links to a set of tools that implement all of the above. You just need to use and possibly adapt it to your use case.
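The two steps above can be sketched as follows. This is a rough illustration in Python (the assignment itself asks for C), the function names are made up, and the reference text stands in for whatever Italian corpus you collect. Matching by rank only gives a first guess; rare symbols will usually come out wrong and need correcting by hand or with a dictionary.

```python
from collections import Counter

def rank_by_frequency(text, alphabet):
    """Return the symbols of `alphabet` that occur in `text`, most frequent first."""
    counts = Counter(ch for ch in text if ch in alphabet)
    # Sort by descending count; ties are broken by symbol so the result is stable.
    return [ch for ch, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

def guess_key(cipher_text, reference_text, alphabet):
    """First-guess mapping from cipher symbol to plain symbol, matched by frequency rank."""
    cipher_rank = rank_by_frequency(cipher_text, alphabet)
    plain_rank = rank_by_frequency(reference_text, alphabet)
    return dict(zip(cipher_rank, plain_rank))

def decode(cipher_text, key):
    """Apply the guessed key; characters not in the key are left unchanged."""
    return "".join(key.get(ch, ch) for ch in cipher_text)
```

For the assignment, `alphabet` would be the 63 symbols the key permutes (letters, digits and the space), so punctuation passes through `decode` untouched.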
Your cipher is a substitution cipher. That is, it substitutes one letter for another.
Consider the ciphertext
"yjr,1drv2ry1od1q1..."
We can use a dictionary to find the plaintext.
Find punctuation: since a space always follows a comma, you can find the substitution rule for spaces,
which gives you
"yjr, drv2ry od q..."
Notice the word lengths. Since there are only two one-letter words in the English language, the q is probably i or a. "yjr" is probably "why", "the", "how", etc.
We try "why", with the result
"why, dyv2yw od q..."
There are no English words with two y's that end in w.
So we try "the" and get
"the, dev2et od q..."
We conclude that "the" is a likely answer.
Now we search our dictionary for words that look like ?e??et.
Rinse and repeat.
That is, find some set of words which fit into the lengths available and do not break each other's substitution rules.
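The dictionary search step might look something like this. It is a sketch in Python rather than C, the function name and the tiny word list in the usage note are made up, and a real program would load a full dictionary. It keeps only the words that fit the length and agree with the substitutions guessed so far:

```python
def candidates(cipher_word, known, words):
    """Return the dictionary words that could be the plaintext of `cipher_word`.

    `known` maps cipher symbols to the plain letters guessed so far.
    """
    matches = []
    for word in words:
        if len(word) != len(cipher_word):
            continue
        forward = dict(known)                        # cipher symbol -> plain letter
        backward = {p: c for c, p in known.items()}  # plain letter -> cipher symbol
        ok = True
        for c, p in zip(cipher_word, word):
            # A substitution is one-to-one: each cipher symbol stands for exactly
            # one plain letter, and each plain letter has exactly one cipher symbol.
            if forward.setdefault(c, p) != p or backward.setdefault(p, c) != c:
                ok = False
                break
        if ok:
            matches.append(word)
    return matches
```

With the guesses from "the" (y is t, j is h, r is e), calling candidates("drv2ry", {"y": "t", "j": "h", "r": "e"}, ["secret", "regret", "street", "decent"]) keeps only "secret", because the other words would need one letter to have two different substitutes.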
Personally I just do the frequency analysis suggested above.
Frequency analysis, as both other respondents said, is the way to go, and you can use digrams and trigrams to make it much stronger. Just grab tons of Italian text from the web and churn ahead! It's really pretty simple programming.
