Let’s say you have a collection of N wines placed next to each other on a shelf. The price of the ith wine is pi (prices of different wines can be different). Because the wines get better every year, supposing today is year 1, in year y the price of the ith wine will be y*pi, i.e. y times its current value.
You want to sell all the wines you have, but you want to sell exactly one wine per year, starting on this year. One more constraint - on each year you are allowed to sell only either the leftmost or the rightmost wine on the shelf and you are not allowed to reorder the wines on the shelf (i.e. they must stay in the same order as they are in the beginning).
You want to find out the maximum profit you can get if you sell the wines in the optimal order.
int N; // number of wines
int p[N]; // array of wine prices
int cache[N][N]; // all values initialized to -1
int profit(int be, int en) {
if (be > en)
return 0;
if (cache[be][en] != -1)
return cache[be][en];
int year = N - (en-be+1) + 1;
return cache[be][en] = max(profit(be+1, en) + year * p[be],profit(be, en-1) + year * p[en]);
}
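For reference, here is one way the snippet above can be wired into a complete program (just a sketch: the global arrays become vectors, the cache is renamed memo and is filled with -1 before the first call, since `int p[N]` with a runtime N is not valid C++ at global scope):

#include <bits/stdc++.h>
using namespace std;

int N;                       // number of wines
vector<int> p;               // wine prices
vector<vector<int>> memo;    // memo[be][en], -1 means "not computed yet"

int profit(int be, int en) {
    if (be > en) return 0;
    if (memo[be][en] != -1) return memo[be][en];
    int year = N - (en - be + 1) + 1;            // wines already sold + 1
    return memo[be][en] = max(profit(be + 1, en) + year * p[be],
                              profit(be, en - 1) + year * p[en]);
}

int main() {
    cin >> N;
    p.resize(N);
    for (int i = 0; i < N; ++i) cin >> p[i];
    memo.assign(N, vector<int>(N, -1));
    cout << profit(0, N - 1) << endl;
}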
Time Complexity: O(n^2).
I have already found this O(n^2) solution. Can we do it in O(n) ? (Better time complexity)
You are supposed to find the optimal cost of selling all the wines from the shelf. The only constraint is that you are allowed to pick only the leftmost or the rightmost wine (you can't pick a bottle from the middle of the shelf).
As we are only allowed to pick the left or the right wine, the optimal sequence of sales will start with either the leftmost or the rightmost bottle.
Let's find a recursive solution for that.
Pick up the left bottle and calculate its cost
Pick up the right bottle and calculate its cost
Compare both costs and choose the maximum
Write the necessary condition for the base case
Let's write a C++ program for this:
#include<bits/stdc++.h>
using namespace std;
int max_cost(int wine[], int cost, int counter, int i, int j){
    // Here `counter` keeps track of the current year
    // `i` is the left index of the shelf
    // `j` is the right index of the shelf
    // `cost` is the cost accumulated so far (the caller passes 0)
if(i > j)
return cost;
else if(i == j){
cost += counter * wine[i];
return cost;
}
else{
int cost1 = counter * wine[i] + max_cost(wine, 0, counter + 1, i + 1, j);
int cost2 = counter * wine[j] + max_cost(wine, 0, counter + 1, i, j - 1);
cost += max(cost1, cost2);
return cost;
}
}
int main(){
int n;
cin >> n;
int wine[n];
for(int j = 0; j < n; ++j)
cin >> wine[j];
cout << max_cost(wine, 0, 1, 0, n - 1) << endl;
return 0;
}
I think the above code is self-explanatory.
Let's run it:
Input1:
5
1
3
1
5
2
Output:
43
Input2:
4
10
1
10
9
Output:
79
The time complexity of the above code is O(2^n), where n is the number of wine bottles on the shelf.
Can we improve the time complexity?
Of course. We are basically calculating some subproblems again and again, which can be avoided with memoization.
The recurrence relation stays basically the same. In addition, we will memoize the value for each specific i and j, and hence we will not have to calculate the value for the same i and j again and again.
The C++ code will be:
#include<bits/stdc++.h>
using namespace std;
int find_cost(vector<int>& box, vector<vector<int>>& dp, int i, int j){
    if(i == j) // base case: one bottle left, sold in the last year (year n)
        dp[i][j] = box[i] * box.size();
    else if(!dp[i][j]){ // If not calculated so far
        int n = box.size();
        // With bottles i..j still on the shelf, the current year is n - (j - i)
        dp[i][j] = max(find_cost(box, dp, i, j - 1) + box[j] * (n - (j - i)),
                       find_cost(box, dp, i + 1, j) + box[i] * (n - (j - i)));
}
return dp[i][j];
}
void cost_wine(vector<int> box){
int n = box.size();
vector<vector<int>> dp(n + 1, vector<int>(n + 1)); // Initialize dp array
cout << find_cost(box, dp, 0, n - 1);
return;
}
int main(){
int n;
cin >> n;
vector<int> box(n);
for(int i = 0; i < n; ++i)
cin >> box[i];
cost_wine(box);
return 0;
}
Now the time complexity of the above code is O(n^2), which is far better than the plain recursive method.
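For what it's worth, the same recurrence can also be filled bottom-up, iterating over interval lengths instead of recursing; a small sketch (mine, not from the original answer, with the same O(n^2) time and the same input format as above):

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<int> box(n);
    for (int i = 0; i < n; ++i) cin >> box[i];
    // dp[i][j] = best profit obtainable from the remaining bottles i..j
    vector<vector<long long>> dp(n, vector<long long>(n, 0));
    for (int i = 0; i < n; ++i) dp[i][i] = (long long)n * box[i];   // single bottle: last year
    for (int len = 2; len <= n; ++len) {
        for (int i = 0; i + len - 1 < n; ++i) {
            int j = i + len - 1;
            long long year = n - (j - i);       // year in which bottle i or j is sold
            dp[i][j] = max(dp[i + 1][j] + year * box[i],
                           dp[i][j - 1] + year * box[j]);
        }
    }
    cout << dp[0][n - 1] << endl;
}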
Related
I'm an amateur playing with discrete math. This isn't a
homework problem though I am doing it at home.
I want to solve ax + by = c for natural numbers, with a, b and c
given and x and y to be computed. I want to find all x, y pairs
that will satisfy the equation.
This has a similar structure to Bezout's identity for integers
where there are multiple (infinite?) solution pairs. I thought
the similarity might mean that the extended Euclidean algorithm
could help here. Below are two implementations of the EEA that
seem to work; they're both adapted from code found on the net.
Could these be adapted to the task, or perhaps can someone
find a more promising avenue?
typedef long int Int;
#ifdef RECURSIVE_EEA
Int // returns the GCD of a and b and finds x and y
// such that ax + by == GCD(a,b), recursively
eea(Int a, Int b, Int &x, Int &y) {
if (0==a) {
x = 0;
y = 1;
return b;
}
Int x1; x1=0;
Int y1; y1=0;
Int gcd = eea(b%a, a, x1, y1);
x = y1 - b/a*x1;
y = x1;
return gcd;
}
#endif
#ifdef ITERATIVE_EEA
Int // returns the GCD of a and b and finds x and y
// such that ax + by == GCD(a,b), iteratively
eea(Int a, Int b, Int &x, Int &y) {
x = 0;
y = 1;
Int u; u=1;
Int v; v=0; // does this need initialising?
Int q; // quotient
Int r; // remainder
Int m;
Int n;
while (0!=a) {
q = b/a; // quotient
r = b%a; // remainder
m = x - u*q; // ?? what are the invariants?
n = y - v*q; // ?? When does this overflow?
b = a; // A candidate for the gcd - a's last nonzero value.
a = r; // a becomes the remainder - it shrinks each time.
// When a hits zero, the u and v that are written out
// are final values and the gcd is a's previous value.
x = u; // Here we have u and v shuffling values out
y = v; // via x and y. If a has gone to zero, they're final.
u = m; // ... and getting new values
v = n; // from m and n
}
return b;
}
#endif
If we slightly change the equation form:
ax + by = c
by = c - ax
y = (c - ax)/b
Then we can loop x through all numbers in its range (a*x <= c) and check whether a viable natural y exists. So no, there is not an infinite number of solutions; the number of solutions is bounded by min(c/a, c/b)... Here is a small C++ example of the naive solution:
int a=123,b=321,c=987654321;
int x,y,ax;
for (x=1,ax=a;ax<=c;x++,ax+=a)
{
y = (c-ax)/b;
if (ax+(b*y)==c) here output x,y solution somewhere;
}
If you want to speed this up, then iterate y too and just check if c-ax is divisible by b. Something like this:
int a=123,b=321,c=987654321;
int x,y,ax,cax,by;
for (x=1,ax=a,y=(c/b),by=b*y;ax<=c;x++,ax+=a)
{
cax=c-ax;
while (by>cax){ by-=b; y--; if (!y) break; }
if (by==cax) here output x,y solution somewhere;
}
As you can see, both x and y are now iterated in opposite directions in the same loop, and no division or multiplication is present inside the loop anymore, so it is much faster. Here are the first few results:
method1 method2
[ 78.707 ms] | [ 21.277 ms] // time needed for computation
75044 | 75044 // found solutions
-------------------------------
75,3076776 | 75,3076776 // first few solutions in x,y order
182,3076735 | 182,3076735
289,3076694 | 289,3076694
396,3076653 | 396,3076653
503,3076612 | 503,3076612
610,3076571 | 610,3076571
717,3076530 | 717,3076530
824,3076489 | 824,3076489
931,3076448 | 931,3076448
1038,3076407 | 1038,3076407
1145,3076366 | 1145,3076366
I expect that for a really huge c and small a, b this
while (by>cax){ by-=b; y--; if (!y) break; }
might be slower than actual division using GCD ...
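To come back to the original question about adapting the extended Euclidean algorithm: it can indeed replace the scanning entirely. The idea (a sketch of mine, not from the answer above): compute g = gcd(a,b); if c is not divisible by g there are no solutions at all; otherwise scale the Bezout pair by c/g to get one particular solution, and then step along the solution line x = x0 + (b/g)*t, y = y0 - (a/g)*t, keeping only the t for which both x and y are at least 1. For the same a=123, b=321, c=987654321 this yields the same 75044 solutions, with no loop over candidates:

#include <cstdio>

// Extended Euclid, same shape as the recursive eea in the question:
// returns gcd(a,b) and fills x,y with a*x + b*y == gcd(a,b).
long long eea(long long a, long long b, long long &x, long long &y) {
    if (a == 0) { x = 0; y = 1; return b; }
    long long x1, y1;
    long long g = eea(b % a, a, x1, y1);
    x = y1 - (b / a) * x1;
    y = x1;
    return g;
}

int main() {
    long long a = 123, b = 321, c = 987654321;   // same numbers as above
    long long x0, y0;
    long long g = eea(a, b, x0, y0);
    if (c % g != 0) { printf("no solutions\n"); return 0; }
    x0 *= c / g;                   // particular solution of a*x + b*y = c
    y0 *= c / g;                   // (can be huge, hence 64-bit integers)
    long long bs = b / g, as = a / g;
    // General solution: x = x0 + bs*t, y = y0 - as*t.  Keep x >= 1 and y >= 1:
    // tmin = ceil((1 - x0) / bs), tmax = floor((y0 - 1) / as), with care for signs.
    long long p = 1 - x0;
    long long tmin = (p >= 0) ? (p + bs - 1) / bs : -((-p) / bs);
    long long q = y0 - 1;
    long long tmax = (q >= 0) ? q / as : -((-q + as - 1) / as);
    long long count = (tmax >= tmin) ? (tmax - tmin + 1) : 0;
    printf("%lld solutions\n", count);
    if (count > 0)
        printf("first: x=%lld y=%lld\n", x0 + bs * tmin, y0 - as * tmin);
    return 0;
}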
Given a bag with a maximum of 100 chips, each chip has its value written on it.
Determine the most fair division between two persons. This means that the difference between the amounts each person obtains should be minimized. The value of a chip varies from 1 to 1000.
Input: The number of chips m, and the value of each chip.
Output: Minimal positive difference between the amount the two persons obtain when they divide the chips from the corresponding bag.
I am finding it difficult to form a DP solution for it. Please help me.
Initially I tried it as a non-DP solution; I hadn't actually thought of solving it using DP. I simply sorted the value array, assigned the largest value to one of the persons, and then incrementally assigned the other values to one of the two depending on which assignment creates the minimum difference. But that solution didn't work.
I am posting my solution here :
#include <bits/stdc++.h>
using namespace std;

bool myfunction(int i, int j)
{
    return i > j;   // strict comparison; using >= here is undefined behaviour with std::sort
}
int main()
{
int T, m, sum1, sum2, temp_sum1, temp_sum2,i ;
cin >> T ;
while(T--)
{
cin >> m ;
sum1 = 0 ; sum2 = 0 ; temp_sum1 = 0 ; temp_sum2 = 0 ;
vector<int> arr(m) ;
for(i=0 ; i < m ; i++)
{
cin>>arr[i] ;
}
if(m==1 )
{
if(arr[0]%2==0)
cout<<0<<endl ;
else
cout<<1<<endl ;
}
else {
sort(arr.begin(), arr.end(), myfunction) ;
// vector<int> s1 ;
// vector<int> s2 ;
for(i=0 ; i < m ; i++)
{
temp_sum1 = sum1 + arr[i] ;
temp_sum2 = sum2 + arr[i] ;
if(abs(temp_sum1 - sum2) <= abs(temp_sum2 -sum1))
{
sum1 = sum1 + arr[i] ;
}
else
{
sum2 = sum2 + arr[i] ;
}
temp_sum1 = 0 ;
temp_sum2 = 0 ;
}
cout<<abs(sum1 -sum2)<<endl ;
}
}
return 0 ;
}
What I understand from your question is that you want to divide the chips between two persons so as to minimize the difference between the sums of the values written on them.
If my understanding is correct, then you can potentially follow the approach below to arrive at a solution.
Sort the values array, i.e. int values[100]
Start adding elements from both ends of the array in a for loop, i.e. for(i=0, j=values.length-1; i<j; i++, j--)
The sum from odd-numbered iterations belongs to one person & the sum from even-numbered iterations to the other person
Run the loop while i < j
Now, the difference between the two sums obtained in the odd & even iterations should be minimal, as the array was sorted earlier.
If my understanding of the question is correct, then this solution should resolve your problem.
Reflect as appropriate.
Thanks
Ravindra
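Since the original question explicitly asks for a DP formulation: the standard approach here is a subset-sum DP. With at most 100 chips of value at most 1000, the total sum is at most 100000, so a boolean table of reachable sums up to half the total is tiny; the answer is total - 2*best, where best is the largest reachable sum not exceeding total/2. A sketch (mine, not from the thread, using the same T test-case input format as the code above):

#include <bits/stdc++.h>
using namespace std;

int main() {
    int T;
    cin >> T;
    while (T--) {
        int m;
        cin >> m;
        vector<int> v(m);
        int total = 0;
        for (int i = 0; i < m; ++i) { cin >> v[i]; total += v[i]; }
        int half = total / 2;
        vector<char> reach(half + 1, 0);   // reach[s] = can some subset sum to s?
        reach[0] = 1;
        for (int i = 0; i < m; ++i)
            for (int s = half; s >= v[i]; --s)
                if (reach[s - v[i]]) reach[s] = 1;
        int best = 0;
        for (int s = half; s >= 0; --s)
            if (reach[s]) { best = s; break; }
        cout << total - 2 * best << endl;  // minimal difference between the two persons
    }
    return 0;
}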
I am very new to OpenCL and am going through the Altera OpenCL examples.
In their matrix multiplication example, they have used the concept of blocks, where dimensions of the input matrices are multiple of block size. Here's the code:
void matrixMult( // Input and output matrices
__global float *restrict C,
__global float *A,
__global float *B,
// Widths of matrices.
int A_width, int B_width)
{
// Local storage for a block of input matrices A and B
__local float A_local[BLOCK_SIZE][BLOCK_SIZE];
__local float B_local[BLOCK_SIZE][BLOCK_SIZE];
// Block index
int block_x = get_group_id(0);
int block_y = get_group_id(1);
// Local ID index (offset within a block)
int local_x = get_local_id(0);
int local_y = get_local_id(1);
// Compute loop bounds
int a_start = A_width * BLOCK_SIZE * block_y;
int a_end = a_start + A_width - 1;
int b_start = BLOCK_SIZE * block_x;
float running_sum = 0.0f;
for (int a = a_start, b = b_start; a <= a_end; a += BLOCK_SIZE, b += (BLOCK_SIZE * B_width))
{
A_local[local_y][local_x] = A[a + A_width * local_y + local_x];
B_local[local_x][local_y] = B[b + B_width * local_y + local_x];
#pragma unroll
for (int k = 0; k < BLOCK_SIZE; ++k)
{
running_sum += A_local[local_y][k] * B_local[local_x][k];
}
}
// Store result in matrix C
C[get_global_id(1) * get_global_size(0) + get_global_id(0)] = running_sum;
}
Assume block size is 2, then: block_x and block_y are both 0; and local_x and local_y are both 0.
Then A_local[0][0] would be A[0] and B_local[0][0] would be B[0].
Sizes of A_local and B_local are 4 elements each.
In that case, how would A_local and B_local access other elements of the block in that iteration?
Also would separate threads/cores be assigned for each local_x and local_y?
There is definitely a barrier missing in your code sample. The outer for loop as you have it will only produce correct results if all work items are executing instructions in lockstep fashion, thus guaranteeing the local memory is populated before the for k loop.
Maybe this is the case for Altera and other FPGAs, but this is not correct for CPUs and GPUs.
You should add barrier(CLK_LOCAL_MEM_FENCE); if you are getting unexpected results, or want to be compatible with other type of hardware.
float running_sum = 0.0f;
for (int a = a_start, b = b_start; a <= a_end; a += BLOCK_SIZE, b += (BLOCK_SIZE * B_width))
{
A_local[local_y][local_x] = A[a + A_width * local_y + local_x];
B_local[local_x][local_y] = B[b + B_width * local_y + local_x];
barrier(CLK_LOCAL_MEM_FENCE);
#pragma unroll
for (int k = 0; k < BLOCK_SIZE; ++k)
{
running_sum += A_local[local_y][k] * B_local[local_x][k];
        }
        // A second barrier is typically needed here as well, so that no work item starts
        // overwriting A_local/B_local for the next tile while others are still reading them:
        barrier(CLK_LOCAL_MEM_FENCE);
    }
A_local and B_local are both shared by all work items of the work group, so all their elements are loaded in parallel (by all work items of the work group) at each step of the encompassing for loop.
Then each work item uses some of the loaded values (not necessarily the values the work item loaded itself) to do its share of the computation.
And finally, the work item stores its individual result into the global output matrix.
It is a classical tiled implementation of a matrix-matrix multiplication. However, I'm really surprised not to see any sort of call to a memory synchronisation function, such as work_group_barrier(CLK_LOCAL_MEM_FENCE) between the load of A_local and B_local and their use in the k loop... But I might very well have overlooked something here.
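Regarding the last part of the question (whether separate threads are assigned for each local_x and local_y): yes. The host launches one work item per element of C, grouped into BLOCK_SIZE x BLOCK_SIZE work groups, so every (local_x, local_y) pair is an independent work item, and each one loads exactly one element of A_local and one element of B_local per tile. A sketch of the host-side launch (a fragment only; queue, kernel, C_width and C_height are assumed to be set up elsewhere and are not taken from the Altera example):

// One work item per output element of C; work groups of BLOCK_SIZE x BLOCK_SIZE.
// Both global sizes must be multiples of BLOCK_SIZE for this kernel.
size_t global_size[2] = { (size_t)C_width, (size_t)C_height };
size_t local_size[2]  = { BLOCK_SIZE, BLOCK_SIZE };
cl_int err = clEnqueueNDRangeKernel(queue, kernel, 2, NULL,
                                    global_size, local_size, 0, NULL, NULL);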
Hi all, I have an array of length N, and I'd like to divide it as best as possible between 'size' processors. N/size has a remainder, e.g. 1000 array elements divided by 7 processes, or 14 elements divided by 3 processes.
I'm aware of at least a couple of ways of work sharing in MPI, such as:
for (i=rank; i<N;i+=size){ a[i] = DO_SOME_WORK }
However, this does not divide the array into contiguous chunks, which I'd like to do, as I believe that is faster for IO reasons.
Another one I'm aware of is:
int count = N / size;
int start = rank * count;
int stop = start + count;
// now perform the loop
int nloops = 0;
for (int i=start; i<stop; ++i)
{
a[i] = DO_SOME_WORK;
}
However, with this method, for my first example we get 1000/7 = 142 = count. So the last rank starts at 852 and stops at index 993, which means the last 6 elements (994..999) are ignored.
Would the best solution be to append something like this to the previous code?
int remainder = N%size;
int start = N-remainder;
if (rank == 0){
for (i=start;i<N;i++){
a[i] = DO_SOME_WORK;
    }
}
This seems messy, and if its the best solution I'm surprised I haven't seen it elsewhere.
Thanks for any help!
If I had N tasks (e.g., array elements) and size workers (e.g., MPI ranks), I would go as follows:
int count = N / size;
int remainder = N % size;
int start, stop;
if (rank < remainder) {
// The first 'remainder' ranks get 'count + 1' tasks each
start = rank * (count + 1);
stop = start + count;
} else {
// The remaining 'size - remainder' ranks get 'count' tasks each
start = rank * count + remainder;
stop = start + (count - 1);
}
for (int i = start; i <= stop; ++i) { a[i] = DO_SOME_WORK(); }
That is how it works:
/*
# ranks: remainder size - remainder
/------------------------------------\ /-----------------------------\
rank: 0 1 remainder-1 size-1
+---------+---------+-......-+---------+-------+-------+-.....-+-------+
tasks: | count+1 | count+1 | ...... | count+1 | count | count | ..... | count |
+---------+---------+-......-+---------+-------+-------+-.....-+-------+
^ ^ ^ ^
| | | |
task #: rank * (count+1) | rank * count + remainder |
| |
task #: rank * (count+1) + count rank * count + remainder + count - 1
\------------------------------------/
# tasks: remainder * count + remainder
*/
Here's a closed-form solution.
Let N = array length and P = number of processors.
From j = 0 to P-1,
Starting point of array on processor j = floor(N * j / P)
Length of array on processor j = floor(N * (j + 1) / P) – floor(N * j / P)
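In code, the same closed-form split might look like this (a sketch; 64-bit arithmetic so that N * j cannot overflow):

// Range handled by processor j, 0 <= j < P, for an array of length N.
long long start_of(long long N, long long P, long long j) { return (N * j) / P; }
long long count_of(long long N, long long P, long long j) {
    return (N * (j + 1)) / P - (N * j) / P;
}
// For N = 1000, P = 7 this gives counts 142,143,143,143,143,143,143 (sum 1000):
// every processor gets either floor(N/P) or ceil(N/P) contiguous elements.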
Consider your "1000 steps and 7 processes" example.
simple division won't work because integer division (in C) gives you the floor, and you are left with some remainder: i.e. 1000 / 7 is 142, and there will be 6 doodads hanging out
ceiling division has the opposite problem: ceil(1000/7) is 143, but then the last processor overruns the array, or ends up with less to do than the others.
You are asking for a scheme to evenly distribute the remainder over processors. Some processes should have 142, others 143. There must be a more formal approach but considering the attention this question's gotten in the last six months maybe not.
Here's my approach. Every process needs to do this algorithm, and just pick out the answer it needs for itself.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char ** argv)
{
#define NR_ITEMS 1000
int i, rank, nprocs;
int *bins;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
bins = calloc(nprocs, sizeof(int));
int nr_alloced = 0;
for (i=0; i<nprocs; i++) {
    int remainder = NR_ITEMS - nr_alloced;   /* items still to hand out */
    int buckets = (nprocs - i);              /* ranks still to be served */
/* if you want the "big" buckets up front, do ceiling division */
bins[i] = remainder / buckets;
nr_alloced += bins[i];
}
if (rank == 0)
for (i=0; i<nprocs; i++) printf("%d ", bins[i]);
MPI_Finalize();
return 0;
}
I know this is long since gone, but a simple way to do this is to give each process floor((number of items) / (number of processes)), plus 1 if process_num < num_items mod num_procs. In Python, an array with the work counts:
# Number of items
NI=128
# Number of processes
NP=20
# Items per process
[NI // NP + (1 if P < NI % NP else 0) for P in range(0, NP)]
Improving on @Alexander's answer: make use of min to condense the logic.
int count = N / size;
int remainder = N % size;
int start = rank * count + min(rank, remainder);
int stop = (rank + 1) * count + min(rank + 1, remainder);
for (int i = start; i < stop; ++i) { a[i] = DO_SOME_WORK(); }
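A quick throwaway check that the condensed formula still assigns every element exactly once (assuming C++ and std::min):

#include <algorithm>
#include <cassert>

int main() {
    int N = 1000, size = 7, total = 0;
    int count = N / size, remainder = N % size;
    for (int rank = 0; rank < size; ++rank) {
        int start = rank * count + std::min(rank, remainder);
        int stop  = (rank + 1) * count + std::min(rank + 1, remainder);
        total += stop - start;          // each rank gets count or count + 1 elements
    }
    assert(total == N);                 // all elements covered, none twice
    return 0;
}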
I think that the best solution is to write yourself a little function for splitting work across processes evenly enough. Here's some pseudo-code, I'm sure you can write C (is that C in your question ?) better than I can.
function split_evenly_enough(num_steps, num_processes)
return = repmat(0, num_processes) ! pseudo-Matlab for an array of num_processes 0s
steps_per_process = ceiling(num_steps/num_processes)
return = steps_per_process - 1 ! set all elements of the return vector to this number
return(1:mod(num_steps, num_processes)) = steps_per_process ! some processes have 1 more step
end
How about this?
int* distribute(int total, int processes) {
    int* distribution = new int[processes](); // zero-initialize before counting
int last = processes - 1;
int remaining = total;
int process = 0;
while (remaining != 0) {
++distribution[process];
--remaining;
if (process != last) {
++process;
}
else {
process = 0;
}
}
return distribution;
}
The idea is that you assign an element to the first process, then an element to the second process, then an element to the third process, and so on, jumping back to the first process whenever the last one is reached.
This method works even when the number of processes is greater than the number of elements. It uses only very simple operations, though note that it loops once per element, so for very large totals the closed-form splits above are cheaper.
I had a similar problem, and here is my non-optimal solution with Python and the mpi4py API. An optimal solution would take into account how the processors are laid out; here, extra work is distributed to the lower ranks. The uneven workloads differ by one task at most, so it should not be a big deal in general.
from mpi4py import MPI
import sys
def get_start_end(comm,N):
"""
Distribute N consecutive things (rows of a matrix , blocks of a 1D array)
as evenly as possible over a given communicator.
Uneven workload (differs by 1 at most) is on the initial ranks.
Parameters
----------
comm: MPI communicator
N: int
Total number of things to be distributed.
Returns
----------
rstart: index of first local row
rend: 1 + index of last row
Notes
----------
Index is zero based.
"""
P = comm.size
rank = comm.rank
rstart = 0
rend = N
if P >= N:
if rank < N:
rstart = rank
rend = rank + 1
else:
rstart = 0
rend = 0
else:
n = N//P # Integer division PEP-238
remainder = N%P
rstart = n * rank
rend = n * (rank+1)
if remainder:
if rank >= remainder:
rstart += remainder
rend += remainder
else:
rstart += rank
rend += rank + 1
return rstart, rend
if __name__ == '__main__':
comm = MPI.COMM_WORLD
n = int(sys.argv[1])
print(comm.rank,get_start_end(comm,n))
Have you tried the latest Codility test?
I felt like there was an error in the definition of what a K-sparse number is, which left me confused, and I wasn't sure what the right way to proceed was. It starts out by defining a K-sparse number:
In the binary number "100100010000" there are at least two 0s between
any two consecutive 1s. In the binary number "100010000100010" there
are at least three 0s between any two consecutive 1s. A positive
integer N is called K-sparse if there are at least K 0s between any
two consecutive 1s in its binary representation. (My emphasis)
So the first number you see, 100100010000 is 2-sparse and the second one, 100010000100010, is 3-sparse. Pretty simple, but then it gets down into the algorithm:
Write a function:
class Solution { public int sparse_binary_count(String S,String T,int K); }
that, given:
string S containing a binary representation of some positive integer A,
string T containing a binary representation of some positive integer B,
a positive integer K.
returns the number of K-sparse integers within the range [A..B] (both
ends included)
and then states this test case:
For example, given S = "101" (A = 5), T = "1111" (B=15) and K=2, the
function should return 2, because there are just two 2-sparse integers
in the range [5..15], namely "1000" (i.e. 8) and "1001" (i.e. 9).
Basically it is saying that 8, or 1000 in base 2, is a 2-sparse number, even though it does not have two consecutive ones in its binary representation. What gives? Am I missing something here?
Tried solving that one. The assumption the problem makes about binary representations of "power of two" numbers being K-sparse by default is somewhat confusing and contrary.
What I understood was: 8 --> 1000 is 2 to the power 3, so 8 is 3-sparse; 16 --> 10000 is 2 to the power 4, and hence 4-sparse.
Even if we assume that is true, below is my solution code (C) for this problem, if you are interested. It doesn't handle some cases correctly where there are powers of two between the two input numbers; I am trying to see if I can fix that:
char* myitoa(int val, char *buf, int base);  // forward declaration (defined below)
int sparse_binary_count (const string &S,const string &T,int K)
{
char buf[50];
char *str1,*tptr,*Sstr,*Tstr;
int i,len1,len2,cnt=0;
long int num1,num2;
char *pend,*ch;
Sstr = (char *)S.c_str();
Tstr = (char *)T.c_str();
str1 = (char *)malloc(300001);
tptr = str1;
num1 = strtol(Sstr,&pend,2);
num2 = strtol(Tstr,&pend,2);
for(i=0;i<K;i++)
{
buf[i] = '0';
}
buf[i] = '\0';
for(i=num1;i<=num2;i++)
{
str1 = tptr;
if( (i & (i-1))==0)
{
if(i >= (pow((float)2,(float)K)))
{
cnt++;
continue;
}
}
str1 = myitoa(i,str1,2);
ch = strstr(str1,buf);
if(ch == NULL)
continue;
else
{
if((i % 2) != 0)
cnt++;
}
}
return cnt;
}
char* myitoa(int val, char *buf, int base){
int i = 299999;
int cnt=0;
for(; val && i ; --i, val /= base)
{
buf[i] = "0123456789abcdef"[val % base];
cnt++;
}
buf[i+cnt+1] = '\0';
return &buf[i+1];
}
There was a note within the test details showing this specific case. According to it, any power of 2 is considered K-sparse for any K.
You can solve this simply with binary operations on integers. You can even tell that you will find no K-sparse integers bigger than some specific integer and lower than (or equal to) the integer represented by T.
As far as I can see, you must also pay a lot of attention to performance, as there are sometimes hundreds of millions of integers to be checked.
My own solution, written in Python, working very efficiently even on large ranges of integers and successfully tested on many inputs, still failed. The results were not very descriptive, saying only that it does not work as required by the question (although it meets all the requirements in my opinion).
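One way to get that kind of performance is to count K-sparse numbers up to a bound directly from the bits instead of testing every integer, so the answer is count(B) - count(A - 1). Below is a sketch of such a bit-by-bit DP (my own code, not the graded solution; it assumes the bounds fit in 64-bit integers rather than being arbitrarily long binary strings as in the real task):

#include <algorithm>
#include <array>
#include <functional>
#include <iostream>
#include <vector>

// Number of K-sparse integers in [1, x]: every pair of consecutive 1 bits has
// at least K zeros between them (a single 1 bit, e.g. a power of two, is
// K-sparse for every K, matching the test's interpretation).
long long countUpTo(long long x, int K) {
    if (x <= 0) return 0;
    std::vector<int> d;                        // bits of x, most significant first
    for (long long t = x; t > 0; t >>= 1) d.push_back((int)(t & 1));
    std::reverse(d.begin(), d.end());
    int L = (int)d.size();
    // memo for the "not tight" states: position, zeros since last 1 (capped at K), started?
    std::vector<std::vector<std::array<long long, 2>>> memo(
        L, std::vector<std::array<long long, 2>>(K + 1, std::array<long long, 2>{-1, -1}));
    std::function<long long(int, int, int, bool)> go =
        [&](int pos, int gap, int started, bool tight) -> long long {
        if (pos == L) return started ? 1 : 0;            // count positive numbers only
        if (!tight && memo[pos][gap][started] != -1) return memo[pos][gap][started];
        int hi = tight ? d[pos] : 1;
        long long total = 0;
        for (int b = 0; b <= hi; ++b) {
            bool ntight = tight && (b == d[pos]);
            if (b == 0)
                total += go(pos + 1, started ? std::min(gap + 1, K) : 0, started, ntight);
            else if (!started || gap >= K)               // a 1 needs >= K zeros before it
                total += go(pos + 1, 0, 1, ntight);
        }
        if (!tight) memo[pos][gap][started] = total;
        return total;
    };
    return go(0, 0, 0, true);
}

int main() {
    long long A = 5, B = 15; int K = 2;                  // the example from the test
    std::cout << countUpTo(B, K) - countUpTo(A - 1, K) << std::endl;  // prints 2
}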
Solution with bitwise operators: with 32 bits per int on a 32-bit system, check for the pattern (for K=2, something like 1001 or 1000) at each shift and increment the count; repeat this for every number in the range.
#include <stdio.h>
#include <math.h>
int KsparseNumbers(int a, int b, int s) {
int nbits = sizeof(int)*8;
int slen = 0;
int lslen = pow(2, s);
int scount = 0;
int i = 0;
for (; i < s; ++i) {
slen += pow(2, i);
}
printf("\n slen = %d\n", slen);
for(; a <= b; ++a) {
int num = a;
for(i = 0 ; i < nbits-2; ++i) {
if ( (num & slen) == 0 && (num & lslen) ) {
scount++;
printf("\n Scount = %d\n", scount);
break;
}
num >>=1;
}
}
return scount;
}
int main() {
printf("\n No of 2-sparse numbers between 5 and 15 = %d\n", KsparseNumbers(5, 15, 2));
}