Length of the longest possible string that contains no repeated 3-mers

I'm trying to find the length of the longest possible string of consecutive digits that contains no repeated 3-mers.
This is a bioinformatics question, and I'm trying to solve it for protein sequences.
Basically, something like 0102340109 does not work because 010 repeats.
But something like 0002223589765 works because no 3-digit substring appears twice.
I need to find the longest sequence and I'm kinda stuck and clueless.

The following code is written in ES6. You can make a sliding procedure that takes a window size and a step, then a string input, and returns an Iterable of substring "windows":
Array.from(sliding (3,1) ('012345'))
// [ '012', '123', '234', '345' ]
Array.from(sliding (2,2) ('012345'))
// [ '01', '23', '45' ]
Array.from(sliding (4,2) ('012345'))
// [ '0123', '2345' ]
Then, using this, you can define a seqIsRepeated procedure which iterates through the sliding windows. Instead of pre-computing the entire list of windows, we look at them one by one, adding each window to a Set. If the window already exists in the Set, true is returned immediately and iteration stops. If the procedure makes it through all windows without finding a duplicate, false is returned.
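// sliding (m,n): yield every window of length m, advancing n characters each time
const sliding = (m,n) => function* (xs) {
  for (let i = 0; i + m <= xs.length; i += n)
    yield xs.slice(i, i + m);
};
// seqIsRepeated (n): true as soon as any length-n window is seen a second time
const seqIsRepeated = n => xs => {
  let set = new Set();
  for (let seq of sliding (n,1) (xs))
    if (set.has(seq))
      return true;
    else
      set.add(seq);
  return false;
};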
console.log (seqIsRepeated (3) ('0102340109')); // true
console.log (seqIsRepeated (3) ('0002223589765')); // false
This doesn't find the longest sequence for you, but hopefully it gives you a start. From here, you'd look at substrings of your input sequence and use seqIsRepeated(3) to eliminate the candidates that contain a repeat.
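As for the length itself, a counting argument gives an upper bound: a string of length L contains L - 2 windows of length 3, and an alphabet of k symbols has only k^3 distinct 3-mers, so L can be at most k^3 + 2 (1002 for the ten digits, 8002 for 20 amino acids). A de Bruijn sequence reaches that bound. The following is only a sketch and not part of the answer above: deBruijn, longestNoRepeat and hasRepeat are hypothetical helper names, and the generator is the standard recursive Lyndon-word construction.

```javascript
// Build a cyclic de Bruijn sequence B(k, n) over the symbols 0..k-1
// (standard recursive Lyndon-word construction).
const deBruijn = (k, n) => {
  const a = new Array(n + 1).fill(0);
  const seq = [];
  const db = (t, p) => {
    if (t > n) {
      if (n % p === 0) for (let j = 1; j <= p; j++) seq.push(a[j]);
    } else {
      a[t] = a[t - p];
      db(t + 1, p);
      for (let j = a[t - p] + 1; j < k; j++) {
        a[t] = j;
        db(t + 1, t);
      }
    }
  };
  db(1, 1);
  return seq;
};

// Unroll the cycle into a linear string by repeating its first n-1 symbols.
const longestNoRepeat = (alphabet, n) => {
  const cyc = deBruijn(alphabet.length, n).map(i => alphabet[i]).join('');
  return cyc + cyc.slice(0, n - 1);
};

// Same check as seqIsRepeated, written out so this block is self-contained.
const hasRepeat = (n, s) => {
  const seen = new Set();
  for (let i = 0; i + n <= s.length; i++) {
    const w = s.slice(i, i + n);
    if (seen.has(w)) return true;
    seen.add(w);
  }
  return false;
};

const s = longestNoRepeat('0123456789', 3);
console.log(s.length);        // 1002
console.log(hasRepeat(3, s)); // false
```

Appending any further digit to s must create a repeat, because all 1000 possible 3-mers are already used.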

Related

Combination Sum in Go

/*
Given an array: [1,2] and a target: 4
Find the solution set that adds up to the target
in this case:
[1,1,1,1]
[1,1,2]
[2,2]
*/
import (
    "fmt"
    "sort"
)
func combinationSum(candidates []int, target int) [][]int {
    sort.Ints(candidates)
    return combine(0, target, []int{}, candidates)
}
func combine(sum int, target int, curComb []int, candidates []int) [][]int {
    var tmp [][]int
    var result [][]int
    if sum == target {
        fmt.Println(curComb)
        return [][]int{curComb}
    } else if sum < target {
        for i, v := range candidates {
            tmp = combine(sum+v, target, append(curComb, v), candidates[i:])
            result = append(result, tmp...)
        }
    }
    return result
}
This is a LeetCode problem, and I use recursion to solve it.
In the sum == target branch, I print every combination whose sum equals the target.
The output is:
[1,1,1,1]
[1,1,2]
[2,2]
And that is the answer that I want!
But why is the final answer (two-dimensional):
[[1,1,1,2],[1,1,2],[2,2]]
Expected answer is : [[1,1,1,1],[1,1,2],[2,2]]
Please help me find the mistake in the code. Thanks for your time.

This happens because of the way slices work. A slice object is a reference to an underlying array, along with the length of the slice, a pointer to the start of the slice in the array, and the slice's capacity. The capacity of a slice is the number of elements from the beginning of the slice to the end of the array. When you append to a slice, if there is available capacity for the new element, it is added to the existing array. However, if there isn't sufficient capacity, append allocates a new array and copies the elements. The new array is allocated with extra capacity so that an allocation isn't required for every append.
In your for loop, when curComb is [1, 1, 1], its capacity is 4. On successive iterations of the loop, you append 1 and then 2, neither of which causes a reallocation because there's enough room in the array for the new element. When curComb is [1, 1, 1, 1], it is put on the results list, but in the next iteration of the for loop, the append changes the last element to 2 (remember that it's the same underlying array), so that's what you see when you print the results at the end.
The solution to this is to return a copy of curComb when the sum equals the target:
if sum == target {
    fmt.Println(curComb)
    // Copy curComb so that later appends cannot overwrite the stored result.
    tmpCurComb := make([]int, len(curComb))
    copy(tmpCurComb, curComb)
    return [][]int{tmpCurComb}
}
This article gives a good explanation of how slices work.

Creating Sequence of Sequences is Causing a StackOverflowException

I'm trying to take a large file and split it into many smaller files. The location where each split occurs is based on a predicate returned from examining the contents of each given line (isNextObject function).
I have attempted to read in the large file via the File.ReadLines function so that I can iterate through the file one line at a time without having to hold the entire file in memory. My approach was to group the sequence into a sequence of smaller sub-sequences (one per file to be written out).
I found a useful function that Tomas Petricek created on fssnip called groupWhen. This function worked great for my initial testing on a small subset of the file, but a StackoverflowException is thrown when using the real file. I am not sure how to adjust the groupWhen function to prevent this (I'm still an F# greenie).
Here is a simplified version of the code showing only the relevant parts that will recreate the StackOverflowException:
// This is the function created by Tomas Petricek where the StackOverflowException is occurring
module Seq =
    /// Iterates over elements of the input sequence and groups adjacent elements.
    /// A new group is started when the specified predicate holds about the element
    /// of the sequence (and at the beginning of the iteration).
    ///
    /// For example:
    /// Seq.groupWhen isOdd [3;3;2;4;1;2] = seq [[3]; [3; 2; 4]; [1; 2]]
    let groupWhen f (input:seq<_>) = seq {
        use en = input.GetEnumerator()
        let running = ref true
        // Generate a group starting with the current element. Stops generating
        // when it finds an element such that 'f en.Current' is 'true'
        let rec group() =
            [ yield en.Current
              if en.MoveNext() then
                  if not (f en.Current) then yield! group() // *** Exception occurs here ***
              else running := false ]
        if en.MoveNext() then
            // While there are still elements, start a new group
            while running.Value do
                yield group() |> Seq.ofList }
This is the gist of the code that makes use of Tomas's function:
module Extractor =
    open System
    open System.IO
    open Microsoft.FSharp.Reflection
    // ... elided a few functions, including "isNextObject", which is
    // a string -> bool (examines the line and returns true
    // if the string meets the criteria indicating that we are at the
    // start of the next inner file)
    let writeFile outputDir file =
        // ... write out "file" to the file system
        // NOTE: file is a seq<string>
    let writeFiles outputDir (files : seq<seq<_>>) =
        files
        |> Seq.iter (fun file -> writeFile outputDir file)
And here is the relevant code in the console application that makes use of the functions:
let lines = inputFile |> File.ReadLines
writeFiles outputDir (lines |> Seq.groupWhen isNextObject)
Any ideas on the proper way to stop groupWhen from blowing the stack? I'm not sure how I would convert the function to use an accumulator (or to use a continuation instead, which I think is the correct terminology).

The problem is that the group() function returns a list, which is an eagerly evaluated data structure: every time you call group(), it has to run to the end, collect all results in a list, and return the list. The recursive call therefore happens within that same evaluation (i.e. it is truly recursive), so every further element of a group adds another stack frame.
To mitigate this problem, you could just replace the list with a lazy sequence:
let rec group() = seq {
    yield en.Current
    if en.MoveNext() then
        if not (f en.Current) then yield! group()
    else running := false }
However, I would consider less drastic approaches. This example is a good illustration of why you should avoid doing recursion yourself and resort to ready-made folds instead.
For example, judging by your description, it seems that Seq.windowed may work for you.
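To show the non-recursive shape being asked about, here is a rough sketch in JavaScript rather than F# (chosen only for illustration; groupWhen and startsGroup are hypothetical names, and this is not Tomas Petricek's implementation). It builds each group in a loop, so stack depth stays constant however long a group gets. It assumes each group fits in memory, since a group is buffered in an array before it is yielded.

```javascript
// Lazily group adjacent items. A new group starts whenever startsGroup(item)
// is true (and at the very first item). A plain loop replaces the recursion.
function* groupWhen(startsGroup, items) {
  let group = [];
  for (const item of items) {
    if (group.length > 0 && startsGroup(item)) {
      yield group; // hand over the finished group before starting the next
      group = [];
    }
    group.push(item);
  }
  if (group.length > 0) yield group; // the last group has no terminator
}

const isOdd = x => x % 2 === 1;
console.log(JSON.stringify(Array.from(groupWhen(isOdd, [3, 3, 2, 4, 1, 2]))));
// [[3],[3,2,4],[1,2]]
```

The output matches the example in the doc comment of the original function.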

It's easy to overuse sequences in F#, IMO. You can accidentally get stack overflows, plus they are slow.
So (not actually answering your question), personally I would just fold over the seq of lines using something like this:
let isNextObject line =
    line = "---"
type State = {
    fileIndex : int
    filename: string
    writer: System.IO.TextWriter
    }
let makeFilename index =
    sprintf "File%i" index
let closeFile (state:State) =
    //state.writer.Close() // would use this in real code
    state.writer.WriteLine("=== Closing {0} ===",state.filename)
let createFile index =
    let newFilename = makeFilename index
    let newWriter = System.Console.Out // dummy
    newWriter.WriteLine("=== Creating {0} ===",newFilename)
    // create new state with new writer
    {fileIndex=index + 1; writer = newWriter; filename=newFilename }
let writeLine (state:State) line =
    if isNextObject line then
        // finish old file here
        closeFile state
        // create new file here and return updated state
        createFile state.fileIndex
    else
        // write the line to the current file
        state.writer.WriteLine(line)
        // return the unchanged state
        state
let processLines (lines: string seq) =
    // setup
    let initialState = createFile 1
    // process the file
    let finalState = lines |> Seq.fold writeLine initialState
    // tidy up
    closeFile finalState
(Obviously a real version would use files rather than the console.)
Yes, it is crude, but it is easy to reason about, with no unpleasant surprises.
Here's a test:
processLines [
    "a"; "b"
    "---";"c"; "d"
    "---";"e"; "f"
    ]
And here's what the output looks like:
=== Creating File1 ===
a
b
=== Closing File1 ===
=== Creating File2 ===
c
d
=== Closing File2 ===
=== Creating File3 ===
e
f
=== Closing File3 ===

Mapping a given value to an action depending on certain characteristics

Suppose I have a certain value, and I want to do something with it depending on certain characteristics it might have.
For example, suppose the value is a string, and I want to print it to the screen if it starts with the letter L, save it to a file if its length is less than 20 characters, and play a sound if the last character is the same as the first one.
One option of course is a simple if / else if construct:
if (value[0] == 'L')
    ....
else if (value.Length < 20)
    ....
else if (value[0] == value.Last())
    ....
However with a lot of conditions, this can get ugly really fast. So the other option is a Dictionary. However I'm not sure how I can use a Dictionary to achieve this.
How can this be done?

You can construct a dictionary that contains conditions and actions that should be performed if a condition is met. In general, if you need to work with type T, this dictionary will have a type Dictionary<Predicate<T>, Action<T>>. For a string it can be:
var conditions = new Dictionary<Predicate<string>, Action<string>>
{
    {s => s.StartsWith("L"), s => Console.WriteLine("Starts with L")},
    {s => s.Length < 20, s => Console.WriteLine("Has fewer than 20 symbols")},
};
string input = "some input";
foreach (var condition in conditions)
{
    if (condition.Key(input)) condition.Value(input);
}
In fact, you don't even need a Dictionary here: you can use List<Tuple<Predicate<string>, Action<string>>> or, even better, introduce a small class that contains a predicate and an action.
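The "small class" idea could look roughly like the sketch below. It is written in JavaScript rather than C# purely for illustration, and every name in it (rules, when, then, applyAll, applyFirst) is hypothetical. The actions return strings as stand-ins for printing, saving and playing a sound. An ordered list also makes the evaluation order explicit, which enumerating a Dictionary does not guarantee.

```javascript
// Each rule pairs a predicate ("when") with an action ("then").
const rules = [
  { when: s => s.startsWith('L'), then: s => `print: ${s}` },
  { when: s => s.length < 20, then: s => `save: ${s}` },
  { when: s => s.length > 0 && s[0] === s[s.length - 1], then: s => `beep: ${s}` },
];

// Run every rule whose predicate matches (like the foreach loop above).
const applyAll = (rules, value) =>
  rules.filter(rule => rule.when(value)).map(rule => rule.then(value));

// Run only the first matching rule (like the original if / else if chain).
const applyFirst = (rules, value) => {
  const rule = rules.find(r => r.when(value));
  return rule ? rule.then(value) : undefined;
};

console.log(applyAll(rules, 'LOL'));   // all three rules match
console.log(applyFirst(rules, 'LOL')); // print: LOL
```

Whether all matching actions or only the first should run is a design choice; the question's if / else if chain implies the latter.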

How can this imperative code be rewritten to be more functional?

I found an answer on SO that explained how to write a randomly weighted drop system for a game. I would prefer to write this code in a more functional-programming style but I couldn't figure out a way to do that for this code. I'll inline the pseudo code here:
R = (some random int);
T = 0;
for o in os
    T = T + o.weight;
    if T > R
        return o;
How could this be written in a style that's more functional? I am using CoffeeScript and underscore.js, but I'd prefer this answer to be language agnostic because I'm having trouble thinking about this in a functional way.

Here are two more functional versions in Clojure and JavaScript, but the ideas here should work in any language that supports closures. Basically, we use recursion instead of iteration to accomplish the same thing, and instead of breaking in the middle we just return a value and stop recursing.
Original pseudo code:
R = (some random int);
T = 0;
for o in os
    T = T + o.weight;
    if T > R
        return o;
Clojure version (objects are just treated as Clojure maps):
(defn recursive-version
  [r objects]
  (loop [t 0
         others objects]
    (let [obj (first others)
          new_t (+ t (:weight obj))]
      (if (> new_t r)
        obj
        (recur new_t (rest others))))))
JavaScript version (using Underscore for convenience).
Be careful, because this could blow out the stack.
This is conceptually the same as the Clojure version.
var js_recursive_version = function(objects, r) {
    var main_helper = function(t, others) {
        var obj = _.first(others);
        var new_t = t + obj.weight;
        if (new_t > r) {
            return obj;
        } else {
            return main_helper(new_t, _.rest(others));
        }
    };
    return main_helper(0, objects);
};

You can implement this with a fold (aka Array#reduce, or Underscore's _.reduce):
An SSCCE:
items = [
  {item: 'foo', weight: 50}
  {item: 'bar', weight: 35}
  {item: 'baz', weight: 15}
]
r = Math.random() * 100
{item} = items.reduce (memo, {item, weight}) ->
  if memo.sum > r
    memo
  else
    {item, sum: memo.sum + weight}
, {sum: 0}
console.log 'r:', r, 'item:', item
You can run it many times at coffeescript.org and see that the results make sense :)
That being said, I find the fold a bit contrived, as you have to remember both the selected item and the accumulated weight between iterations, and it doesn't short-circuit when the item is found.
Maybe a compromise solution between pure FP and the tedium of reimplementing a find algorithm can be considered (using _.find):
total = 0
{item} = _.find items, ({weight}) ->
  total += weight
  total > r
Runnable example.
I find (no pun intended) this algorithm much more accessible than the first one (and it should perform better, as it doesn't create intermediate objects, and it does short-circuit).
Update/side note: the second algorithm is not "pure", because the function passed to _.find is not referentially transparent (it has the side effect of modifying the external total variable), but the algorithm as a whole is referentially transparent. If you encapsulate it in a findItem = (items, r) -> function, that function will be pure and will always return the same output for the same input. That's important, because it means you can get the benefits of FP while using some non-FP constructs (for performance, readability, or whatever reason) under the hood :D
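A minimal sketch of that encapsulation, in plain JavaScript instead of CoffeeScript and with the native Array.prototype.find standing in for _.find (both substitutions are assumptions made for this example):

```javascript
// Pure from the outside: the running total is local to each call,
// so the same (items, r) always yields the same result.
const findItem = (items, r) => {
  let total = 0;
  return items.find(({ weight }) => {
    total += weight;
    return total > r;
  });
};

const items = [
  { item: 'foo', weight: 50 },
  { item: 'bar', weight: 35 },
  { item: 'baz', weight: 15 },
];
console.log(findItem(items, 10).item); // foo
console.log(findItem(items, 60).item); // bar
console.log(findItem(items, 99).item); // baz
console.log(findItem(items, 100));     // undefined (r is not below the total weight)
```

Callers should keep r strictly below the total weight, or handle the undefined result.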

I think the underlying task is randomly selecting 'events' (objects) from array os with a frequency defined by their respective weights. The approach is to map (i.e. search) a random number (with uniform distribution) onto the stairstep cumulative probability distribution function.
With positive weights that are normalized to sum to 1, their cumulative sum increases from 0 to 1. The code you gave us simply searches starting at the 0 end. To maximize speed with repeated calls, precalculate the sums, and order the events so the largest weights come first.
It really doesn't matter whether you search with iteration (looping) or recursion. Recursion is nice in a language that tries to be 'purely functional' but doesn't help understanding the underlying mathematical problem. And it doesn't help you package the task into a clean function. The underscore functions are another way of packaging the iterations, but don't change the basic functionality. Only find, any and all exit early when the target is found.
For a small os array this simple search is sufficient. But with a large array, a binary search will be faster. Looking in underscore I find that sortedIndex uses this strategy. From Lo-Dash (an underscore drop-in), "Uses a binary search to determine the smallest index at which the value should be inserted into array in order to maintain the sort order of the sorted array"
The basic use of sortedIndex is:
os = [{name:'one',weight:.7},
      {name:'two',weight:.25},
      {name:'three',weight:.05}]
t=0; cumweights = (t+=o.weight for o in os)
i = _.sortedIndex(cumweights, R)
os[i]
You can hide the cumulative sum calculation with a nested function like:
osEventGen = (os)->
  t=0; xw = (t+=y.weight for y in os)
  return (R) ->
    i = _.sortedIndex(xw, R)
    return os[i]
osEvent = osEventGen(os)
osEvent(.3)
# { name: 'one', weight: 0.7 }
osEvent(.8)
# { name: 'two', weight: 0.25 }
osEvent(.99)
# { name: 'three', weight: 0.05 }
In CoffeeScript, Jed Clinger's recursive search could be written like this:
foo = (x, r, t=0)->
  [y, x...] = x
  t += y
  return [y, t] if x.length==0 or t>r
  return foo(x, r, t)
A loop version using the same basic idea is:
foo=(x,r)->
  t=0
  while x.length and t<=r
    [y,x...]=x # the [first, rest] split
    t+=y
  y
Tests on jsPerf http://jsperf.com/sortedindex
suggest that sortedIndex is faster when os.length is around 1000, but slower than the simple loop when the length is more like 30.

Sorting in AdvancedDatagrid in Flex 3

I am using AdvancedDataGrid in Flex 3. One column of the AdvancedDataGrid contains both numbers and letters. When I sort this column, numbers come before letters (the default behavior of AdvancedDataGrid's internal sorting). But I want letters to come before numbers when I sort.
I know I will have to write a custom sort function, but can anybody give me some idea of how to proceed?
Thanks in advance.

Use sortCompareFunction
The AdvancedDataGrid control uses this function to sort the elements of the data provider collection. The function signature of the callback function takes two parameters and has the following form:
mySortCompareFunction(obj1:Object, obj2:Object):int
obj1 — A data element to compare.
obj2 — Another data element to compare with obj1.
The function should return a value based on the comparison of the objects:
-1 if obj1 should appear before obj2 in ascending order.
0 if obj1 = obj2.
1 if obj1 should appear after obj2 in ascending order.
<mx:AdvancedDataGridColumn sortCompareFunction="mySort"
dataField="colData"/>
Try the following sort compare function.
public function mySort(obj1:Object, obj2:Object):int
{
    var s1:String = obj1.colData;
    var s2:String = obj2.colData;
    var result:Number = s1.localeCompare(s2);
    if(result != 0)
        result = result > 0 ? 1 : -1;
    if(s1.match(/^\d/))
    {
        if(s2.match(/^\d/))
            return result;
        else
            return 1;
    }
    else if(s2.match(/^\d/))
        return -1;
    else
        return result;
}
It checks the first character of each string and pushes the ones that start with a digit down in the sort order. It uses localeCompare to compare two strings if they both start with letters or both start with digits; otherwise, the one starting with a letter comes before the one starting with a digit. Thus abc will precede 123, but a12 will still come before abc.
If you want a totally different sort where letters always precede numbers irrespective of their position in the string, you would have to write one from scratch; String::charCodeAt might be a good place to start.
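One possible shape for such a from-scratch comparator is sketched below in JavaScript (not tested in ActionScript 3; rank and lettersFirst are hypothetical names). It assumes ASCII letters and digits only, treats any other character as sorting last, and compares characters of the same kind by code unit, so it is case-sensitive.

```javascript
// Rank a single character: letters first, then digits, then anything else.
const rank = ch => (/[a-z]/i.test(ch) ? 0 : /\d/.test(ch) ? 1 : 2);

// Compare character by character; returns -1, 0 or 1, the same contract
// as a sortCompareFunction.
const lettersFirst = (s1, s2) => {
  const n = Math.min(s1.length, s2.length);
  for (let i = 0; i < n; i++) {
    const a = s1[i];
    const b = s2[i];
    if (a === b) continue;
    if (rank(a) !== rank(b)) return rank(a) < rank(b) ? -1 : 1;
    return a < b ? -1 : 1; // same kind of character: plain code-unit order
  }
  return Math.sign(s1.length - s2.length); // a shorter prefix sorts first
};

const sorted = ['a12', '123', 'abc', 'ab1'].sort(lettersFirst);
console.log(sorted.join(' ')); // abc ab1 a12 123
```

Unlike mySort above, this puts abc before a12, because the letter b outranks the digit 1 at the second position.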
