In computer science, what is NOT a formal language? [closed] - math

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 6 years ago.
On the Wikipedia page https://www.wikiwand.com/en/Formal_language, I found the definition of a formal language:
In mathematics, computer science, and linguistics, a formal language
is a set of strings of symbols that may be constrained by rules that
are specific to it.
This looks quite abstract to me, and I can't imagine any language that doesn't fit this definition. Does anyone have ideas about what an informal language looks like and how it fails to fit the definition?

Let me get to your question first. Good non-examples of formal languages are the natural languages: English and Slovene, for instance, or Tagalog and Tarifit Berber. Unfortunately, linguists don't seem to have a definition of natural language that all would agree upon.
Noam Chomsky famously tried to model natural language using context-free grammars in his 1956 paper Three Models for the Description of Language. He invented (or discovered, if you prefer) them in that paper, although he didn't call them that. While they weren't useful for modeling the English language, they revolutionized computer science.
Formally, a formal language is just a set of strings over a finite alphabet. That's it.
Examples include all valid C programs, all valid HTML files, all valid XML files, all strings of "balanced" parentheses (e.g. (), ()(), ((()))()(()), ...), the set of all deterministic Turing machines that always halt (their codes under some encoding), the set of all simple graphs that can be colored with k colors (again, their codes under some encoding), the set of all binary strings that begin and end with a 1, etc.
Some are easy to recognize using regex (or, equivalently, DFAs); some are impossible to recognize using DFAs, but can be recognized using PDAs (or, equivalently, can be described with a context-free grammar); others don't admit such a description, but can be recognized by a Turing machine; and some aren't recognizable even by a Turing machine (these are called uncomputable).
This is why the definition is so useful. Many things we encounter in CS every day can be cast in terms of formal languages.
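To make two of the examples above concrete, here is a minimal Python sketch of membership tests. The function names are hypothetical and chosen only for illustration. The first language (binary strings that begin and end with a 1) is regular, so a regex suffices; the second (balanced parentheses) is not regular, but a counter standing in for a one-symbol stack recognizes it. The sketch assumes the single string "1" counts as both beginning and ending with a 1, and that the empty string is balanced.

```python
import re

# Regular language: binary strings that begin and end with a 1.
# Assumption: the one-character string "1" belongs to the language.
BEGINS_AND_ENDS_WITH_1 = re.compile(r"1(?:[01]*1)?")


def in_regular_language(s: str) -> bool:
    """Return True if s is a binary string that begins and ends with 1."""
    return BEGINS_AND_ENDS_WITH_1.fullmatch(s) is not None


def is_balanced(s: str) -> bool:
    """Return True if s consists only of parentheses and they are balanced.

    The counter plays the role of a pushdown automaton's stack: it is the
    number of '(' symbols still waiting for a matching ')'.
    """
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared with nothing left to close
                return False
        else:  # any other symbol is outside the alphabet
            return False
    return depth == 0
```

In both cases the "language" is nothing more than the set of strings for which the function returns True.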
For a good introduction to the subject, I highly recommend the superb book Introduction to Automata Theory, Languages, and Computation by Hopcroft et al.

English isn't a formal language. It's not just a set of strings; it has a spoken form, and evolution over time, and dialects, and all sorts of other things a formal language doesn't have. A formal language couldn't gain the word "email" from one decade to the next.

A language is a set of sequences made up from given symbols. It can be either finite or infinite (the set of English sentences is infinite, even though there are sentences, e.g. excessively long ones, which cannot be comprehended even by a native speaker). If it is finite, then any description of it is a formal definition.
If the language is infinite, say the language of arithmetic expressions involving numbers, two binary operators '+' and '*', and variables, then you can't possibly list all strings which belong to the language, but sometimes (see blazs's comment below) you can give a finite description as a set of rules.
E := NUM | v | E '+' E | E '*' E
(where NUM is a sequence of digits, v is a variable) is a finite description of an infinite set. That's what makes it formal.
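As a rough illustration of how such a finite description decides membership in an infinite set, here is a small Python recognizer for the grammar above. It is only a sketch under stated assumptions: a variable v is taken to be a single ASCII letter, whitespace is not allowed, and the names `tokenize` and `is_expression` are hypothetical. Because the grammar has no parentheses, every string it generates is just operands separated by operators, so the recognizer checks exactly that shape rather than building a parse tree.

```python
import re

# NUM is a run of digits; a variable is assumed to be one ASCII letter.
TOKEN = re.compile(r"\d+|[A-Za-z]|[+*]")


def tokenize(text):
    """Split text into tokens, or return None if some character fits no token."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            return None
        tokens.append(match.group())
        pos = match.end()
    return tokens


def is_expression(text):
    """Return True if text is derivable from E := NUM | v | E '+' E | E '*' E."""
    tokens = tokenize(text)
    if not tokens:  # covers both None (bad character) and the empty string
        return False
    if len(tokens) % 2 == 0:  # operand (operator operand)* has odd length
        return False
    for index, token in enumerate(tokens):
        is_operator = token in ("+", "*")
        # Operands sit at even positions, operators at odd positions.
        if is_operator != (index % 2 == 1):
            return False
    return True
```

For example, `is_expression("1+x*23")` holds, while `is_expression("1+")` and `is_expression("ab")` do not.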
The various other aspects like speech or the evolution of the language are different issues. Those can also be formalised.

Related

What is denotational semantics?

I am looking for an accurate and understandable definition. The ones I have found differ from each other:
From a book on functional reactive programming
Denotational semantics is a mathematical expression of the
formal meaning of a programming language.
However, Wikipedia refers to it as an approach and not a mathematical expression:
Denotational semantics is an approach of formalizing the meanings of
programming languages by constructing mathematical objects (called
denotations) that describe the meanings of expressions from the
languages
The term "denotational semantics" refers to both the mathematical meanings of programs and the approach of giving such meanings to programs. It is like, say, the word "history", which means the history of something as well as the entire research field on histories of things.
I've never found the definitions of the term "denotational semantics" useful for understanding the concept and its significance. Rather, I think it's best approached instead by considering the forms of reasoning that denotational semantics enables.
Specifically, denotational semantics enables equational reasoning with referentially transparent programs. Wikipedia gives this introductory definition of referential transparency:
An expression is said to be referentially transparent if it can be replaced with its value without changing the behavior of a program (in other words, yielding a program that has the same effects and output on the same input).
But a more precise definition wouldn't talk about replacing an expression with a "value", but rather replacing it with another expression. Then, referential transparency is the property where, if you replace parts with replacements that have the same denotation, then the resulting wholes also have the same denotation.
So IMHO, as a programmer, that's the key thing to understand: denotational semantics is about how to give mathematical "teeth" to the concept of referential transparency, so we can give principled answers to claims about correctness of substitution. In the context of functional programming, for example, one of the key applications is: when can we say that two function-valued expressions actually denote "the same" function, and thus either can safely substitute for the other? The classic denotational answer is extensional equality: two functions are equal if and only if they map the same inputs to the same outputs, so we just have to prove whether the expressions in question denote extensionally equivalent functions. So for example, Quicksort and Bubblesort are notably different algorithms, but denotationally they are the same function.
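The Quicksort/Bubblesort point can be sketched in Python. The two implementations below are deliberately naive and written only for illustration, and `agree_on` is a hypothetical helper. Note that it tests agreement on finitely many inputs; that is evidence of extensional equality, not the proof that denotational reasoning would call for.

```python
import itertools


def quicksort(xs):
    """Sort by partitioning around the first element."""
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)


def bubblesort(xs):
    """Sort by repeatedly swapping adjacent out-of-order elements."""
    out = list(xs)
    for end in range(len(out) - 1, 0, -1):
        for i in range(end):
            if out[i] > out[i + 1]:
                out[i], out[i + 1] = out[i + 1], out[i]
    return out


def agree_on(f, g, inputs):
    """Return True if f and g give equal outputs on every sample input."""
    return all(f(x) == g(x) for x in inputs)


# Every list of length 0..4 over the values {0, 1, 2}.
samples = [
    list(p)
    for n in range(5)
    for p in itertools.product(range(3), repeat=n)
]
```

Operationally the two functions do very different work, yet `agree_on(quicksort, bubblesort, samples)` holds: as mappings from lists to lists they are indistinguishable, which is why either can replace the other in a larger program without changing its meaning.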
In the context of reactive programming, the big question would be: when can we say that two different expressions nevertheless denote the same event stream or time-dependent value?

What do functional programmers mean by "moral"? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 6 years ago.
I've noticed the word "moral" keeps coming up in functional programming contexts. A couple examples:
Fast and Loose Reasoning is Morally Correct
Purescript Aff documentation:
This is moral equivalent of ErrorT (ContT Unit (Eff e)) a.
I'm unfamiliar with these usages of the word. I can mostly infer what they're trying to say, but can we clarify more precisely what it means?
(Cross-posted on English Language & Usage)
The term "moral equivalence" in (formalized) logics, and by extension in programming, has nothing to do with an appeal to morality (as in ethical or philosophical questions). It co-opts the term "morally", but means something different. It is generally supposed to mean "P holds, but only under certain side-conditions". These conditions are often omitted if they have no educational value, or are trivial, technical and/or boring. Hence, the linked article about "moral equivalence" has nothing to do with it: there are no value judgements involved here.
I don't know much about Purescript, but I'd interpret the statement you mentioned as "you can achieve the same thing with Aff as with ErrorT (ContT Unit (Eff e)) a."
To give another example: Let's say you have two functions and you are only interested in a specific (maybe large) subset of their domains. Let's also say that these two functions agree on this subset, that is, for all x ∈ dom, f(x) = g(x). But for the sake of the example, maybe they do something different on 0, but you will never pass 0 into them (because 0 violates some assumption). One could reasonably say that f and g "are morally equivalent".
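A toy Python version of this example, with hypothetical functions f and g invented purely for illustration: they agree on every nonzero integer and differ only at 0, which is the input the side-condition rules out.

```python
def f(x):
    """Double x, except for an arbitrary special case at 0."""
    if x == 0:
        return 1  # behaviour at 0 is irrelevant under the side-condition
    return 2 * x


def g(x):
    """Double x everywhere."""
    return 2 * x


# The subset of the domain we actually care about: nonzero integers
# (a finite sample of them, since we can only test finitely many).
dom = [x for x in range(-5, 6) if x != 0]

agree_on_dom = all(f(x) == g(x) for x in dom)
differ_at_zero = f(0) != g(0)
```

Here `agree_on_dom` and `differ_at_zero` are both true: strictly speaking f and g are different functions, but under the side-condition "x is never 0" one would call them morally equivalent.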
Especially in the logic community, there are other uses of "moral", for example in the phrase "the proof is morally questionable", which means that the author considers the proof to be sloppy and possibly gappy, but technically fixable. In a particular case, namely carrying out proofs about potentially non-terminating programs, the paper you have mentioned gives such a justification, which is echoed in the title "Fast and Loose Reasoning is Morally Correct."
As Conor McBride points out on Twitter, this usage stems from the category theory community, which inspires much in FP.
https://twitter.com/pigworker/status/739971825128607744
Eugenia Cheng has a good paper describing the concept of morality as used in mathematics.
http://www.cheng.staff.shef.ac.uk/morality/morality.pdf

Expression Simplification Algorithm

I'm currently working on a calculator app and I want the output to be in both simplified-expression and decimal form. An example would be sqrt 2 * sqrt 3 = sqrt 6, which can also be output as 2.44948... What is the best approach to this, and are there any well-established algorithms for it?
Yes. What you likely want is a computer algebra system, which understands formulas as artifacts to be manipulated by explicit mathematics rules.
Mathematica and Macsyma are applications which do this. However, these are quite sophisticated systems and it is not easy to see how they "work".
What you need to do is:
1. Represent formulas as abstract syntax trees
2. Parse text formulas (your example equations) into such trees
3. Encode a set of tree-manipulation rules that represent algebra operations
4. Apply these rules to your algebra trees
5. Prettyprint the algebra trees back as text when done
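The steps above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Mathematica or Macsyma work internally: the node classes and the functions `simplify`, `show` and `evaluate` are hypothetical names, parsing is skipped (the tree is built by hand), and only two rules are encoded: sqrt a * sqrt b becomes sqrt (a * b), assumed valid for non-negative a and b, and products of constants are folded.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Sqrt:
    arg: object


@dataclass(frozen=True)
class Mul:
    left: object
    right: object


def simplify(expr):
    """Apply the rewrite rules bottom-up."""
    if isinstance(expr, Sqrt):
        return Sqrt(simplify(expr.arg))
    if isinstance(expr, Mul):
        left, right = simplify(expr.left), simplify(expr.right)
        # Rule: sqrt a * sqrt b -> sqrt (a * b), assuming a, b >= 0.
        if isinstance(left, Sqrt) and isinstance(right, Sqrt):
            return simplify(Sqrt(Mul(left.arg, right.arg)))
        # Rule: constant folding.
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(left.value * right.value)
        return Mul(left, right)
    return expr


def show(expr):
    """Prettyprint a tree back to text."""
    if isinstance(expr, Num):
        return str(expr.value)
    if isinstance(expr, Sqrt):
        inner = show(expr.arg)
        return f"sqrt {inner}" if isinstance(expr.arg, Num) else f"sqrt ({inner})"
    return f"{show(expr.left)} * {show(expr.right)}"


def evaluate(expr):
    """Compute the decimal value of a tree."""
    if isinstance(expr, Num):
        return float(expr.value)
    if isinstance(expr, Sqrt):
        return math.sqrt(evaluate(expr.arg))
    return evaluate(expr.left) * evaluate(expr.right)


example = Mul(Sqrt(Num(2)), Sqrt(Num(3)))
```

With this, `show(simplify(example))` gives the simplified form `sqrt 6`, and `evaluate(example)` gives the decimal form. A real system would need many more rules plus a parser for the input text.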
Rules are best written in the surface syntax of algebra. (Mathematica doesn't do this; it represents formulas as trees using a kind of prefix S-expressions, and rules as the same kind of trees with special variable nodes).
One of the issues is deciding how many "algebra" rules you are willing to encode. Mathematics is a lot more than pure 9th grade algebra, and people using such systems tend to want to extend what is there by adding more knowledge (the point of Mathematica and Macsyma: they are infinitely extendable).
Here's a very simple version of this. You can see all the "gears" and how things are described, in terms of parse trees and rewrite rules.
http://www.semdesigns.com/Products/DMS/SimpleDMSDomainExample.html

Which FP language follows lambda calculus the closest? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Which FP language follows lambda calculus the closest in terms of its code looking, feeling, acting like lambda calculus abstractions?
This might not be a real answer, it's more of a guess about what you actually want.
In general, there's very little in the lambda calculus -- you basically need (first-class) functions, function applications, and variables. These days you'll have a hard time finding a language that does not provide you with these things... However, things can get confusing when you're trying to learn about it -- for example, it's very easy to just use plain numbers and then get them mixed up with Church numerals. (I've seen this happen with many students; adapting to the kind of formal thinking that you need for this material is hard enough that throwing encodings onto the pile doesn't really help...)
As Don said, Scheme is very close to the "plain" untyped lambda calculus, and it's probably very fitting in your case if you're going through The Little Schemer. If you really want to use a "proper" LC, you need to make sure that you use only functions (problems as above); but there are some additional problems that you'll run into, especially when you read various other texts on the subject. First, most texts will use lazy evaluation which you don't get in Scheme. Second, since LC has only unary functions, it's very common to shorten terms and use, for example, λxyz.zxy instead of the "real" form which in this case is λx.(λy.(λz.((z x) y))) or (lambda (x) (lambda (y) (lambda (z) ((z x) y)))) in Scheme. (This is called Currying.)
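The shorthand described above can also be written with nested one-argument lambdas in Python, used here only as a neutral illustration language. Like Scheme, Python evaluates eagerly, so this says nothing about the lazy evaluation used in most texts. The names `curried`, `uncurried` and `make_pair` are hypothetical.

```python
# The "real" form: λx.(λy.(λz.((z x) y))), one argument per lambda.
curried = lambda x: lambda y: lambda z: z(x)(y)

# What the shorthand λxyz.zxy suggests if read as a three-argument function.
uncurried = lambda x, y, z: z(x)(y)

# A curried two-argument function to pass in as z.
make_pair = lambda a: lambda b: (a, b)

result_curried = curried(1)(2)(make_pair)
result_uncurried = uncurried(1, 2, make_pair)
```

Both calls apply `make_pair` to 1 and then to 2, so both results are the tuple `(1, 2)`; the curried version simply takes its arguments one at a time.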
So yes, Scheme is pretty close to LC, but that's not saying much with all of these issues. Haskell is arguably a better candidate since it's both lazy, and does that kind of currying for multiple arguments to functions. OTOH, you're dealing with a typed language which is a pretty big piece of baggage to bring into this game -- and you'll get in some serious mud if you try to do TLS-style examples...
If you do want to get everything (lazy, shorthands, untyped, close enough to Scheme), then Racket has another point to consider. At a high-level, it's very close to Scheme, but it goes much farther in that you can quickly slap up a language that is a restriction of the Racket language to just lambda expressions and function applications. With some more work, you can also get it to do currying and you can even make it lazy. That's not really an exercise that you should try doing yourself at this point -- but if it sounds like what you want, then I can point you to my course (look for "Schlac" in the class notes) where we use a language that is doing all of the above, and it's extremely restricted so you get nothing more than the basic LC constructs. (For example, 3 is an unbound identifier until you define it.) Note that this is not some interpreter -- it's compiled into Racket code which means that it runs fast enough that you can even write code that uses numbers. You can get the implementation for that language there too, and once you install that, you get this language if you start files with #lang pl schlac.
Lambda calculus is a very, very restricted programming model. You have only functions. No literals, no built in arithmetic operators, no data structures. Everything is encoded as functions. As such, most functional languages try to extend the lambda calculus in ways to make it more convenient for everyday programming.
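To give a feel for "everything is encoded as functions", here is the standard Church encoding of natural numbers, written as Python lambdas. The helper `to_int` is not part of the encoding; it is a hypothetical convenience that steps outside pure lambda calculus so the results can be inspected.

```python
# A Church numeral n is a function that applies f to x exactly n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# Addition: apply f n times, then m more times.
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

# Multiplication: repeat "apply f n times" m times.
mul = lambda m: lambda n: lambda f: m(n(f))


def to_int(n):
    """Convert a Church numeral to a Python int by counting applications."""
    return n(lambda k: k + 1)(0)


two = succ(succ(zero))
three = succ(two)
```

Here `to_int(add(two)(three))` is 5 and `to_int(mul(two)(three))` is 6, even though no number literal or arithmetic operator appears in the definitions of `zero`, `succ`, `add` or `mul`.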
Haskell uses a modern extension of lambda calculus as its core language: System F, extended with data types. (GHC has since extended this further to System Fc, supporting type equality coercions).
As all Haskell can be written directly in its core language, and its core language is an extension of typed lambda calculus (specifically, second-order lambda calculus), it could be said that Haskell follows lambda calculus closely, modulo its builtin operators for concurrency, parallelism, and memory side effects (and the FFI). This makes development of new compiler optimizations significantly easier, and also makes the semantics of a given program more tractable to understand.
On the other hand, Scheme is a variant of the untyped lambda calculus, extended with side effects and other non-lambda calculus concepts (such as concurrency primitives). It can be said to closely follow the untyped lambda calculus.
The only people that this matters to are: people learning the lambda calculus; and compiler writers.

What is your preferred style for naming variables in R? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
Which conventions for naming variables and functions do you favor in R code?
As far as I can tell, there are several different conventions, all of which coexist in cacophonous harmony:
1. Use of period separator, e.g.
stock.prices <- c(12.01, 10.12)
col.names <- c('symbol','price')
Pros: Has historical precedent in the R community, prevalent throughout the R core, and recommended by Google's R Style Guide.
Cons: Rife with object-oriented connotations, and confusing to R newbies
2. Use of underscores
stock_prices <- c(12.01, 10.12)
col_names <- c('symbol','price')
Pros: A common convention in many programming langs; favored by Hadley Wickham's Style Guide, and used in ggplot2 and plyr packages.
Cons: Not historically used by R programmers; is annoyingly mapped to '<-' operator in Emacs-Speaks-Statistics (alterable with 'ess-toggle-underscore').
3. Use of mixed capitalization (camelCase)
stockPrices <- c(12.01, 10.12)
colNames <- c('symbol','price')
Pros: Appears to have wide adoption in several language communities.
Cons: Has recent precedent, but not historically used (in either R base or its documentation).
Finally, as if it weren't confusing enough, I ought to point out that the Google Style Guide argues for dot notation for variables, but mixed capitalization for functions.
The lack of consistent style across R packages is problematic on several levels. From a developer standpoint, it makes maintaining and extending others' code difficult (esp. where its style is inconsistent with your own). From an R user standpoint, the inconsistent syntax steepens R's learning curve by multiplying the ways a concept might be expressed (e.g. is that date casting function asDate(), as.date(), or as_date()? No, it's as.Date()).
Good previous answers so just a little to add here:
underscores are really annoying for ESS users; given that ESS is pretty widely used you won't see many underscores in code authored by ESS users (and that set includes a bunch of R Core as well as CRAN authors, exceptions like Hadley notwithstanding);
dots are evil too because they can get mixed up in simple method dispatch; I believe I once read comments to this effect on one of the R lists: dots are a historical artifact and no longer encouraged;
so we have a clear winner still standing in the last round: camelCase. I am also not sure if I really agree with the assertion of 'lacking precedent in the R community'.
And yes: pragmatism and consistency trump dogma. So whatever works and is used by colleagues and co-authors. After all, we still have white-space and braces to argue about :)
I did a survey of which naming conventions are actually used on CRAN, and it got accepted to the R Journal :) The original post included a graph summarizing the results.
It turns out (no surprises, perhaps) that lowerCamelCase was most often used for function names and period.separated names were most often used for parameters. Using UpperCamelCase, as advocated by Google's R style guide, is really rare, however, and it is a bit strange that they advocate that naming convention.
The full paper is here:
http://journal.r-project.org/archive/2012-2/RJournal_2012-2_Baaaath.pdf
Underscores all the way! Contrary to popular opinion, there are a number of functions in base R that use underscores. Run grep("^[^\\.]*$", apropos("_"), value = T) to see them all.
I use the official Hadley style of coding ;)
I like camelCase when the camel actually provides something meaningful -- like the datatype.
dfProfitLoss, where df = dataframe
or
vdfMergedFiles(), where the function takes in a vector and spits out a dataframe
While I think _ really adds to the readability, there just seem to be too many issues with using ., -, _ or other characters in names. Especially if you work across several languages.
This comes down to personal preference, but I follow the Google style guide because it's consistent with the style of the core team. I have yet to see an underscore in a variable in base R.
As I point out here:
How does the verbosity of identifiers affect the performance of a programmer?
it's worth bearing in mind how understandable your variable names are to your co-workers/users if they are non-native speakers...
For that reason I'd say underscores and periods are better than capitalisation, but as you point out consistency is essential within your script.
As others have mentioned, underscores will screw up a lot of folks. No, it's not verboten but it isn't particularly common either.
Using dots as a separator gets a little hairy with S3 classes and the like.
In my experience, it seems like a lot of the high muckity mucks of R prefer the use of camelCase, with some dot usage and a smattering of underscores.
I have a preference for mixedCapitals.
But I often use periods to indicate what the variable type is:
mixedCapitals.mat is a matrix.
mixedCapitals.lm is a linear model.
mixedCapitals.lst is a list object.
and so on.
Usually I name my variables using a mix of underscores and mixed capitalization (camelCase). Simple variables are named using underscores, for example:
PSOE_votes -> number of votes for the PSOE (a political group in Spain).
PSOE_states -> Categorical, indicates the state where PSOE wins {Aragon, Andalucia, ...}
PSOE_political_force -> Categorical, indicates the position of PSOE among the political groups {first, second, third}
PSOE_07 -> Union of PSOE_votes + PSOE_states + PSOE_political_force in 2007 (header -> votes, states, position)
If my variable is the result of applying a function to one or two variables, I use mixed capitalization.
Example:
positionXstates <- xtabs(~states+position, PSOE_07)

Resources