What is binding occurrences? - functional-programming

I am learning Jason Hickey's Introduction to Objective Caml.
I got several simple questions:
What is binding?
What is occurrences?
What is binding occurrences?
I ask these question because the book says:

I have not read this text, and am just a humble practitioner (not a theorist) but I'm pretty sure I know what the terms mean.
Binding is the association of a name with a value.
An occurrence is a single appearance of a name in an expression. If the name shows up twice, there are two occurrences.
A binding occurrence is an appearance of a name in a spot that causes it to be bound to a value. In let x = 4 in x + 2 the first occurrence of x is a binding occurrence and the second is just an occurrence.
Edit: What the quoted text is telling you is that when a name shows up in a pattern, then a successful match of the pattern causes the name to be bound to a value.

Related

Align Text To Row Identified by Number and to ID Matching that Embedded in String

I need to align text in an ALERT STRING column with the row identified by number in an ID ROW column.
Additionally, I need to also align the same ALERT STRING text with the same ID ROW number AND with the ID matching that embedded in a string in the TEXT WITH ID column. (This double-check will sometimes be necessary with the real-world data.)
So far, I've only figured out how to align the ALERT STRING with the ID matching that embedded in the TEXT WITH ID column:
=LOOKUP(2,1/SEARCH(A2,$F$2:$F$11),$G$2:$G$11)
I appreciate any help folks can offer. You can find an editable copy of the workbook here:
https://1drv.ms/x/s!ArQ7Kw6ayNMY2zktTW3pDCbMmJZ_
UPDATE: Nayan provided a solution to the first part of this question (please see answer below). I'm still trying to work out a formula for the column D part of this question, in which the row reference shown in column E is combined with a match of the ID shown in column A with its corresponding value in one of the text strings in column F.
The best I've been able to come up with so far is a formula with a high failure rate:
=INDEX($G$2:$G$11,MATCH(ROW(D2),$E$2:$E$11,MATCH("*"&A2&"*",$F$2:$F$11,0)))
Any help with this part of the question will be greatly appreciated.
ROW([reference])
Returns the row number of a reference
E.g.: Row(B2) returns 2. If nothing provided like ROW() will also
return row number based on position of cell where it is called.
VLOOKUP(loolup_value, table_array, col_index_num, [range_lookup])
Looks for a value in the leftmost column of a table, and then returns a value in the same row from a column you specify (col_index_num)
By default - the table must be sorted in an ascending order.
Try this:
=VLOOKUP(ROW(B2),$E$2:$G$11,3,FALSE)
INDEX(array, row_num, [column_num]) INDEX(reference, row_num,
[column_num], [area_num])
Returns a value or reference of the cell at the intersection of a particular row and column, in a given range.
In this case, you have to get row_num with MATCH function.
MATCH(lookup_value, lookup_array, [match_type])
Returns a relative position of an item in an array that matches a specified value in a specified order.
match_type: 1 (Less than), 0 (Exact match), -1 (Greater than)
Try this:
=INDEX($G$2:$G$11,MATCH(ROW(B2),$E$2:$E$11,0))
Identity Data with Multiple Criteria Condition using MATCH()
=INDEX($G$2:$G$11,MATCH(1, (ROW(D2) = $E$2:$E$11) * (ISNUMBER(SEARCH(A2, $F$2:$F$11))),0))
References:
https://exceljet.net/excel-functions/excel-vlookup-function
https://exceljet.net/excel-functions/excel-index-function
https://exceljet.net/formula/index-and-match-with-multiple-criteria
This is the formula I was looking for in column D:
=INDEX($G$2:$G$11,MATCH(ROW(D2)&"*"&A2&"*",INDEX($E$2:$E$11&$F$2:$F$11,),0))
You can see it working here.
Nayan provided a great deal of help with answering this question, so I will mark his answer as the accepted solution.
Syeda Fahima Nazreen provided the example I referenced to figure out the formula shown above.
Reference:
Nested Excel Formula with Two INDEX Functions and a MATCH Function with Multiple Criteria

How can I return a vector with a dataframe inside in R?

Here is a challenge for you: I was trying to make a tic tac toe based on R. First, the players have to configure putting in the name of the players, and the game should check if the name exists in a file called "Players.txt" (if not, the game will create one), if the name exists, the game will ask for a new one. The last part of the game is that the game should record all the punctuation of the players (each gambling chip used will subtract 5 points of 100 that the player has at the beginning of the game). The problem is when a player wins, the game shows the following error: "Error in table[location_name1, 3]: Incorrect number of dimension in R".
A vector can either be atomic or a list. Atomic vectors can only contain elements of one and the same data type. That means, you are "accidentally" creating a list with
vector=c(win,name1,name2,table)
with the result that each column of the data frame should become an entry.
You can solve it with
vector <- list(win, name1, name2, table)
vector is still a list but now it has the format I believe you want.
Having done that you still get errors. The reason is that these assignments fail.
location_name1=which(grepl(name1,table$gamers))
location_name2=which(grepl(name2,table$gamers))
They return an empty vector because earlier in the code you set win=vector[1]... table=vector[4]. Since vector is now a list, you have to subset it accordingly. That means you have to chance the statements to table=vector[[4]].
Now you are going to get another problem. The reason is that you treat the columns table$scores as text. When you read the data you need to make sure that this columns is not interpreted as text. You also have to eliminate all statements that coerce the column into text. Otherwise table[location_name1,3]=table[location_name1,3]+pointsx will obviously fail because you cannot add a number to a string.
For example, you coerce the column into a character column with this statement:
name1 <- data.frame(gamers=name1,games="1",scores="100")
games and scores are strings not numbers. Another example is the assigment after reading the table from the file. You can make sure that scoresare numeric by doing this.
scores <- as.numeric(table[,3])
Please get familiar with Rstudio debugging capabilities (https://support.rstudio.com/hc/en-us/articles/205612627-Debugging-with-RStudio). This way you can go through your code line by line and check consequences of each assignment to the data frame.

Parsing - Adding a capturing group

I am attempting to use a fairly complex REGEX expression (see REGEX101 demos below), which I amended slightly from one created by an expert on this site. It parses specific patterns of log events:
1EXE_IN1EXE_CO2CONTENT_ACCESS3CONTENT_ACCESS
These log sequences will always begin with a random selection of EXE_IN or EXE_CO events, preceded sequence numbers. These selections can be any number, in any order. In this case, we just have two EXE events but this may be 200. Or 1. Note that there is a sequence number and we need to capture it.
The second part of the sequence will always be a series of digit-prefaced CONTENT.ACCESS events. Again from 1 to infinity in length.
The following demo shows a working example and probably conveys the concept better than I can : Demo 1
It nicely captures a full match, sequence number, and event in separate groups.
I need to add a timestamp to the pattern (after the sequence number, with a preceding underscore), and then parse this event log e.g.
1_11/08/2014 23:03EXE_IN1_11/08/2014 23:03EXE_CO2_12/08/2014 09:17CONTENT_ACCESS3_13/08/2014 09:17CONTENT_ACCESS
I need to capture the timestamps as well.
I attempted to adjust the regex expression, with mixed results. Please see this demo: demo2
Ideally I'd like to see something like this for each event:
Match n
Full match 266-308 `2_12/08/2014 09:17CONTENT_ACCESS`
Group 1. 266-267 `2`
Group 2. 268-284 `12/08/2014 09:17`
Group 3. 284-308 `CONTENT_ACCESS`
I hope you can help me. REGEX101 pcre testing is sufficient (for the record, I am using perl-compatible str_match_all_perl function in R).
Many thanks in advance.
(\d+)_(.*?)(EXE_CO|EXE_IN|CONTENT_ACCESS)
https://regex101.com/r/EHHcKm/1
Due to comments it was changed to (?:\G(?!^)(?(?=\d+_\d{2}\/\d{2}\/\d{4}\s\d{2}\:\d{2}(?:EXE_CO|EXE_IN))(?<!\d_\d{2}\/\d{2}\/\d{4}\s\d{2}\:\d{2}CONTENT_ACCESS))|(?=(?:\d+_\d{2}\/\d{2}\/\d{4}\s\d{2}\:\d{2}(?:EXE_CO|EXE_IN))+(?:\d+_\d{2}\/\d{2}\/\d{4}\s\d{2}\:\d{2}CONTENT_ACCESS)+))(\d+)_(\d{2}\/\d{2}\/\d{4}\s\d{2}\:\d{2})(EXE_CO|EXE_IN|CONTENT_ACCESS)
https://regex101.com/r/EHHcKm/3
Ans also another version, which is shorter
(?:\G(?!^)(?(?=\d+_.{16}(?:EXE_CO|EXE_IN))(?<!\d_.{16}CONTENT_ACCESS))|(?=(?:\d+_.{16}(?:EXE_CO|EXE_IN))+(?:\d+_.{16}CONTENT_ACCESS)+))(\d+)_(.{16})(EXE_CO|EXE_IN|CONTENT_ACCESS)
https://regex101.com/r/EHHcKm/4
And even more shorter (?:\G(?!^)(?(?=\d+_.{16}E)(?<!S))|(?=(?:\d+_.{16}(?:EXE_CO|EXE_IN))+\d+_.{16}C))(\d+)_(.{16})(EXE_CO|EXE_IN|CONTENT_ACCESS)
https://regex101.com/r/EHHcKm/5
And super short (?:\G|(?=\d+_.{16}E.*CON))(\d+)_(.*?)(EXE_CO|EXE_IN|CONTENT_ACCESS)
https://regex101.com/r/EHHcKm/8

How to find out the longest definition entry in an English dictionary text file?

I asked over at the English Stack Exchange, "What is the English word with the longest single definition?" The best answer they could give is that I would need a program that could figure out the longest entry in a (text) file listing dictionary definitions, by counting the amount of characters or words in a given entry, and then provide a list of the longest entries. I also asked at Superuser but they couldn't come up with an answer either, so I decided to give it a shot here.
I managed to find a dictionary file which converted to text has the following format:
a /a/ indefinite article (an before a vowel) 1 any, some, one (have a cookie). 2 one single thing (there’s not a store for miles). 3 per, for each (take this twice a day).
aardvark /ard-vark/ n an African mammal with a long snout that feeds on ants.
abacus /a-ba-kus, a-ba-kus/ n a counting frame with beads.
As you can see, each definition comes after the pronunciation (enclosed by slashes), and then either:
1) ends with a period, or
2) ends before an example (enclosed by parenthesis), or
3) follows a number and ends with a period or before an example, when a word has multiple definitions.
What I would need, then, is a function or program that can distinguish each definition (including considering multiple definitions of a single word as separate ones), then count the amount of characters and/or words within (ignoring the examples in parenthesis since that is not the proper definition), and finally provide a list of the longest definitions (I don't think I would need more than say, a top 20 or so to compare). If the file format was an issue, I can convert the file to PDF, EPUB, etc. with no problem. And, I guess ideally I would want to be able to choose between counting length by characters and by words, if it was possible.
How should I go to do this? I have little experience from programming classes I took a long time ago, but I think it's better to assume I know close to nothing about programming at all.
Thanks in advance.
I'm not going to write code for you, but I'll help think the problem through. Pick the programming language you're most familiar with from long ago, and give it a whack. When you run in to problems, come back and ask for help.
I'd chop this task up into a bunch of subproblems:
Read the dictionary file from the filesystem.
Chunk the file up into discrete entries. If it's a text file like you show, most programming languages have a facility to easily iterate linewise through a file (i.e. take a line ending character or character sequence as the separator).
Filter bad entries: in your example, your lines appear separated by an empty line. As you iterate, you'll just drop those.
Use your human observation and judgement to look for strong patterns in the data that you can give communicate as firm rules -- this is one of the central activities of programming. You've already started identifying some patterns in your question, i.e.
All entries have a preamble with the pronounciation and part of speech.
A multiple definition entry will be interspersed with lone numerals.
Otherwise, a single definition just follows the preamble.
Write the rules you've invented into code. It'll go something like this: First find a way to lop off the word itself and the preamble. With the remainder, identify multiple-def entries by presence of lone numerals or whatever; if it's not, treat it as single-def.
For each entry, iterate over each of the one-or-more definitions you've identified.
Write a function that will count a definition either word-wise or character-wise. If word-wise, you'll probably tokenize based on whitespace. Counting the length of a string character-wise is trivial in most programming languages. Why not implement both!
Keep a data structure in memory as you iterate the file to track "longest". For each definition in each entry, after you apply the length calculation, you'll compare against the previous longest entry. If the new one is longer, you'll record this new leading word and its word count in your data structure. Comparing 'greater than' and storing a variable are fundamental in most programming languages, so while this is the real meat of your program, this shouldn't be hard.
Implement some way to display your results once iteration is done. This may be as simple as a print statement.
Finally, write the glue code that lets you execute the program easily. A program like this could easily be a command-line tool that takes one or two arguments (the path to the file to be analyzed, perhaps you pass your desired counting method 'character|word' as an argument too, since you implemented both). Different languages vary in how easy it is to create an executable to run from the command line, but most support it, so it's a good option for tasks like this.

Matching specific items in several discrete collections

I have a problem whereby I have several discrete lists of ID's eg.
List (A) 1,2,3,4,5,7,8
List (B) 2,3,4,5
List (C) 4,2,8,9,1
etc...
I then have another collection of ID's...
For example: 1,2,4
I need to try and match one into each list. If I can perfectly match all ID's in my secondary collection (one collection ID matched with an ID from each list) then I get a true result....
I have found that it becomes complicated because if you simply iterate over the lists matching the first collection/list pair that you encounter it may result in you precluding a possible combination further on down the line hence returning a false negative result.
For example:
List (A) 1,2,3,4
List (B) 1,2,3,4
List (C) 3,4
Collection is: 3,1,2
The first ID from the collection (3) matches with an entry in list A, the second ID in the collection (1) matches an item in list B, however the final ID in the collection (2) DOESNT match any entry in list C however if you rearrange the order of the collection to be: 2,1,3 then a match is found.... Therefore I am looking for some form of logic for attempting a match on all possible combinations in an efficient manner(?)
To make it more complicated the ID's are actually GUID's so cant just be sorted in ascending order
I hope I have described this well enough to make it clear what I am attempting and with a bit of luck somebody will be able to tell me that what I need to do is very easy and I am missing something real simple!
I am forced to code this in VB6 but any methods or pseudo code would be great. The backend of this is SQL server so if a solution using TSQL was possible this would be even better as all of the ID's are held in tables already.
Many thanks in advance.
Jake, yep the lists and the collection both contain GUIDS. I used plain integers to simplify the problem a bit.
Once a list has been matched it cant be searched again, hence the ordering problem that I tried to explain. If you say that a list as 'matched' then no further attempts to match this will be performed. It is this very behaviour that can cause a false negative.
'Sending' the collection in in every possible combination of orders would work but would be a massive job .....
I feel I must be missing a really straightforward concept or solution here??!!
Thanks for your assistance so far.
I don't see a way around checking each GUID contained in the lists against each GUID in the collection. You would have to keep record of in which lists each GUID in the collection occurs.
To use your example of the Collection (3, 1, 2), 3 occurs in List A, B and C.
You will basically be left with this dataset.
3 (A, B, C)
1 (A, B)
2 (A, B)
Once you have distilled it down to this dataset you can determine whether there are any GUIDs with zero occurrences in the lists which would result in a negative.
I am not at all well versed in algorithms, but this is how I would proceed after that :
Start with the first set (A, B, C), and check how many times it occurs further on in the dataset. In this case no occurrences are found.
Moving on to the next set (A, B), if the number of occurrences of this set is found to be greater than the length of this set, i.e. more than two occurrences, would result in a negative. If the number of occurrences match the length exactly, as is the case here, the set (A, B) can be removed from any further consideration.
3 (C)
1 ()
2 ()
I guess you would continue to repeat the process until a negative is identified or all the occurrences have been excluded. There is probably a recognized algorithm for this sort of problem, but my knowledge is a bit lacking in that respect. :(

Resources