While preparing for an exam I came across a question about hash tables.
I am given a table of length 11 with the following hash function:
h(k,i) = ( k mod 13 + i * (1 + k mod 7) ) mod 11
The hash table is then resized to size 12. So the new hash function becomes:
h'(k,i) = ( k mod 13 + i * (1 + k mod 7) ) mod 12
What problems occur?
The problem is that the hash function becomes worse.
In the first case, the distribution of the different combinations of k and i is very even among the 11 hash bins. In the second case, the distribution is not so even - in particular, the number of combinations of k and i for which the result of the hash function is 0 is noticeably higher.
Of course, during an exam, one would probably have to argue why it is this way. It's somehow related to
k mod 13 being a value between 0 and 12
k mod 7 being a value between 0 and 6 (which divides 12)
maybe, somehow: 11 is a prime number and 12 has many divisors...
but (at least for me) it is hard to find a convincing argument that goes beyond these trivial insights. Maybe you have another idea based on that.
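To make the divisor intuition concrete (my own sketch, not part of the original question): for a fixed k, the probe sequence is an arithmetic progression with step size 1 + (k mod 7), i.e. some value between 1 and 7, and modulo a table of size m such a sequence can only ever visit m / gcd(step, m) distinct slots. For m = 11 every possible step is coprime to the table size, so each probe sequence can reach all 11 slots; for m = 12 the steps 2, 3, 4 and 6 share a factor with 12, so those probe sequences cycle through only 6, 4, 3 or 2 slots, respectively. A tiny program shows this:
public class ProbeCycles
{
    public static void main(String[] args)
    {
        // For each possible step size s = 1 + (k mod 7), print how many distinct
        // slots the probe sequence i -> (start + i * s) mod m can reach.
        for (int m : new int[] { 11, 12 })
        {
            System.out.println("Table size " + m + ":");
            for (int s = 1; s <= 7; s++)
            {
                System.out.println("  step " + s + " reaches " +
                    (m / gcd(s, m)) + " of " + m + " slots");
            }
        }
    }
    private static int gcd(int a, int b)
    {
        return b == 0 ? a : gcd(b, a % b);
    }
}
The frequency test below then shows the same effect in aggregate over many (k, i) pairs.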
import java.util.LinkedHashMap;
import java.util.Map;
public class HashTest
{
public static void main(String[] args)
{
int maxK = 30;
int maxI = 30;
System.out.println(computeFrequencies(h0, maxK, maxI));
System.out.println(computeFrequencies(h1, maxK, maxI));
}
private static Map<Integer, Integer> computeFrequencies(
Hash hash, int maxK, int maxI)
{
Map<Integer, Integer> frequencies =
new LinkedHashMap<Integer, Integer>();
for (int k=0; k<maxK; k++)
{
for (int i=0; i<maxI; i++)
{
int value = hash.compute(k, i);
Integer count = frequencies.get(value);
if (count == null)
{
count = 0;
}
frequencies.put(value, count+1);
}
}
return frequencies;
}
private static interface Hash
{
int compute(int k, int i);
}
private static final Hash h0 = new Hash()
{
@Override
public int compute(int k, int i)
{
return ((k % 13) + i * (1 + (k % 7))) % 11;
}
};
private static final Hash h1 = new Hash()
{
@Override
public int compute(int k, int i)
{
return ((k % 13) + i * (1 + (k % 7))) % 12;
}
};
}
Here is a heapsort program I've created in Java, but I'm having an issue where it won't run.
I'm not getting any errors during compilation, which made the problem hard to pin down, but if I comment out the size decrement in my extract-maximum function the program will run, so I assume that's where the error is. Unfortunately, that line is crucial to the program functioning properly.
If there's anything simple causing this problem, or if major adjustments need to be made to the program, I'd like to know either way.
All input is welcome.
Update
Added the main function.
The code can now be copied and pasted to run.
public class Heap
{
private int [] data;
private int [] fin;
private int size;
private int tmp = 0;
/**
* Constructor for objects of class Heap
*/
public Heap(int[] A)
{
data = A;
size = data.length;
fin = new int [size];
this.buildHeap(0);
for(int n = size - 1; n >= 0; n--)
{
fin[n] = this.extractMax();
}
}
public int getSize()
{
return size;
}
private void setSize(int i)
{
size = i;
}
public void print()
{
for(int i = 0; i < this.getSize(); i++)
System.out.printf("%d\n", fin[i]);
}
/**
* build heap using top down method
*
* @param i the index of the node being built upon
*/
private void buildHeap(int i)
{
if(i <= (size - 2)/2)
{
buildHeap(2*i + 1);
buildHeap(2*i + 2);
heapify(i);
}
}
/**
* Extract maximum number
*
* @return maximum number of heap
*/
private int extractMax()
{
int n = size;
int store = 0;
store = data[0];
data[0] = data[n - 1];
size--;
this.heapify(0);
return store;
}
/**
* Heapify array
*
* @param i the index to heapify upon
*/
private void heapify(int i)
{
if(2*i + 1 < size && data[2*i + 1] > data[i])
{
if(2*i + 2 < size && data[2*i + 2] > data[2*i + 1])
{
this.exchange(i, 2*i + 2);
heapify(2*i + 2);
}
else
{
this.exchange(i, 2*i + 1);
heapify(2*i + 1);
}
}
if(2*i + 2 < size && data[2*i + 2] > data[i])
{
this.exchange(i, 2*i + 2);
heapify(2*i + 2);
}
}
private boolean exchange(int i, int k)
{
tmp = data[i];
data[i] = data[k];
data[k] = tmp;
return true;
}
public static void main(String [] args)
{
int [] arr = {5,13,2,25,7,17,20,8,4};
Heap heapsort = new Heap(arr);
heapsort.print();
}
}
I am having trouble with this piece of CUDA code I have written. It is supposed to be a CUDA implementation of Dijkstra's algorithm. The code is as follows:
__global__ void cuda_dijkstra_kernel_1(float* Va, int* Ea, int* Sa, float* Ca, float* Ua, char* Ma, unsigned int* lock){
int tid = blockIdx.x;
if(Ma[tid]=='1'){
Ma[tid] = '0';
int ind_Ea = Sa[tid * 2];
int num_edges = Sa[(tid * 2) + 1];
int v;
float wt = 0;
unsigned int leaveloop;
leaveloop = 0u;
while(leaveloop==0u){
if(atomicExch(lock, 1u) == 0u){
for(v = 0; v < num_edges; v++){
wt = (Va[tid * 3] - Va[Ea[ind_Ea + v] * 3]) * (Va[tid * 3] - Va[Ea[ind_Ea + v] * 3]) +
(Va[(tid * 3) + 1] - Va[(Ea[ind_Ea + v] * 3) + 1]) * (Va[(tid * 3) + 1] - Va[(Ea[ind_Ea + v] * 3) + 1]) +
(Va[(tid * 3) + 2] - Va[(Ea[ind_Ea + v] * 3) + 2]) * (Va[(tid * 3) + 2] - Va[(Ea[ind_Ea + v] * 3) + 2]) ;
wt = sqrt(wt);
if(Ca[Ea[ind_Ea + v]] > (Ca[tid] + wt)){
Ca[Ea[ind_Ea + v]] = Ca[tid] + wt;
Ma[Ea[ind_Ea + v]] = '1';
}
__threadfence();
leaveloop = 1u;
atomicExch(lock, 0u);
}
}
}
}
}
The problem is in the relaxation phase of Dijkstra's algorithm. I have implemented that phase as a critical section. If there is a vertex (let's say a) which is a neighbor of more than one vertex (i.e., connected to other vertices by edges), then all of the threads for those vertices will try to write to the location of vertex a in the cost array Ca. My goal is to have the smaller value written to that location. To do that, I am trying to serialize the process and am applying __threadfence() as well, so that the value written by one thread is visible to the others and eventually the smaller value is retained in the location of vertex a. But the problem is that this logic is not working: the location of vertex a does not end up with the smallest value of all the threads trying to write to that location, and I don't understand why. Any help will be highly appreciated.
There is a "classical" (at least, mostly referenced) implementation of Dijkstra's Single-Source Shortest Path (SSSP) algorithm on the GPU contained in the paper
Accelerating large graph algorithms on the GPU using CUDA by Parwan Harish and P.J. Narayanan
However, the implementation in that paper has been recognized to be bugged, see
CUDA Solutions for the SSSP Problem by Pedro J. MartÃn, Roberto Torres, and Antonio Gavilanes
I'm reporting below the implementation suggested in the first paper, fixed according to the remark in the second. The code also contains a CPU (C++) version for comparison.
#include <sstream>
#include <vector>
#include <iostream>
#include <stdio.h>
#include <float.h>
#include "Utilities.cuh"
#define NUM_ASYNCHRONOUS_ITERATIONS 20 // Number of async loop iterations before attempting to read results back
#define BLOCK_SIZE 16
/***********************/
/* GRAPHDATA STRUCTURE */
/***********************/
// --- The graph data structure is an adjacency list.
typedef struct {
// --- Contains the integer offset to point to the edge list for each vertex
int *vertexArray;
// --- Overall number of vertices
int numVertices;
// --- Contains the "destination" vertices each edge is attached to
int *edgeArray;
// --- Overall number of edges
int numEdges;
// --- Contains the weight of each edge
float *weightArray;
} GraphData;
/**********************************/
/* GENERATE RANDOM GRAPH FUNCTION */
/**********************************/
void generateRandomGraph(GraphData *graph, int numVertices, int neighborsPerVertex) {
graph -> numVertices = numVertices;
graph -> vertexArray = (int *)malloc(graph -> numVertices * sizeof(int));
graph -> numEdges = numVertices * neighborsPerVertex;
graph -> edgeArray = (int *)malloc(graph -> numEdges * sizeof(int));
graph -> weightArray = (float *)malloc(graph -> numEdges * sizeof(float));
for (int i = 0; i < graph -> numVertices; i++) graph -> vertexArray[i] = i * neighborsPerVertex;
int *tempArray = (int *)malloc(neighborsPerVertex * sizeof(int));
for (int k = 0; k < numVertices; k++) {
for (int l = 0; l < neighborsPerVertex; l++) tempArray[l] = INT_MAX;
for (int l = 0; l < neighborsPerVertex; l++) {
bool goOn = false;
int temp;
while (goOn == false) {
goOn = true;
temp = (rand() % graph->numVertices);
for (int t = 0; t < neighborsPerVertex; t++)
if (temp == tempArray[t]) goOn = false;
if (temp == k) goOn = false;
if (goOn == true) tempArray[l] = temp;
}
graph -> edgeArray [k * neighborsPerVertex + l] = temp;
graph -> weightArray[k * neighborsPerVertex + l] = (float)(rand() % 1000) / 1000.0f;
}
}
}
/************************/
/* minDistance FUNCTION */
/************************/
// --- Finds the vertex with minimum distance value, from the set of vertices not yet included in shortest path tree
int minDistance(float *shortestDistances, bool *finalizedVertices, const int sourceVertex, const int N) {
// --- Initialize minimum value
int minIndex = sourceVertex;
float min = FLT_MAX;
for (int v = 0; v < N; v++)
if (finalizedVertices[v] == false && shortestDistances[v] <= min) min = shortestDistances[v], minIndex = v;
return minIndex;
}
/************************/
/* dijkstraCPU FUNCTION */
/************************/
void dijkstraCPU(float *graph, float *h_shortestDistances, int sourceVertex, const int N) {
// --- h_finalizedVertices[i] is true if vertex i is included in the shortest path tree
// or the shortest distance from the source node to i is finalized
bool *h_finalizedVertices = (bool *)malloc(N * sizeof(bool));
// --- Initialize h_shortestDistances as infinite and h_finalizedVertices as false
for (int i = 0; i < N; i++) h_shortestDistances[i] = FLT_MAX, h_finalizedVertices[i] = false;
// --- The shortest distance of the source vertex from itself is always 0
h_shortestDistances[sourceVertex] = 0.f;
// --- Dijkstra iterations
for (int iterCount = 0; iterCount < N - 1; iterCount++) {
// --- Selecting the minimum distance vertex from the set of vertices not yet
// processed. currentVertex is always equal to sourceVertex in the first iteration.
int currentVertex = minDistance(h_shortestDistances, h_finalizedVertices, sourceVertex, N);
// --- Mark the current vertex as processed
h_finalizedVertices[currentVertex] = true;
// --- Relaxation loop
for (int v = 0; v < N; v++) {
// --- Update dist[v] only if it is not in h_finalizedVertices, there is an edge
// from u to v, and the cost of the path from the source vertex to v through
// currentVertex is smaller than the current value of h_shortestDistances[v]
if (!h_finalizedVertices[v] &&
graph[currentVertex * N + v] &&
h_shortestDistances[currentVertex] != FLT_MAX &&
h_shortestDistances[currentVertex] + graph[currentVertex * N + v] < h_shortestDistances[v])
h_shortestDistances[v] = h_shortestDistances[currentVertex] + graph[currentVertex * N + v];
}
}
}
/***************************/
/* MASKARRAYEMPTY FUNCTION */
/***************************/
// --- Check whether all the vertices have been finalized. This tells the algorithm whether it needs to continue running or not.
bool allFinalizedVertices(bool *finalizedVertices, int numVertices) {
for (int i = 0; i < numVertices; i++) if (finalizedVertices[i] == true) { return false; }
return true;
}
/*************************/
/* ARRAY INITIALIZATIONS */
/*************************/
__global__ void initializeArrays(bool * __restrict__ d_finalizedVertices, float* __restrict__ d_shortestDistances, float* __restrict__ d_updatingShortestDistances,
const int sourceVertex, const int numVertices) {
int tid = blockIdx.x * blockDim.x + threadIdx.x;
if (tid < numVertices) {
if (sourceVertex == tid) {
d_finalizedVertices[tid] = true;
d_shortestDistances[tid] = 0.f;
d_updatingShortestDistances[tid] = 0.f; }
else {
d_finalizedVertices[tid] = false;
d_shortestDistances[tid] = FLT_MAX;
d_updatingShortestDistances[tid] = FLT_MAX;
}
}
}
/**************************/
/* DIJKSTRA GPU KERNEL #1 */
/**************************/
__global__ void Kernel1(const int * __restrict__ vertexArray, const int* __restrict__ edgeArray,
const float * __restrict__ weightArray, bool * __restrict__ finalizedVertices, float* __restrict__ shortestDistances,
float * __restrict__ updatingShortestDistances, const int numVertices, const int numEdges) {
int tid = blockIdx.x*blockDim.x + threadIdx.x;
if (tid < numVertices) {
if (finalizedVertices[tid] == true) {
finalizedVertices[tid] = false;
int edgeStart = vertexArray[tid], edgeEnd;
if (tid + 1 < (numVertices)) edgeEnd = vertexArray[tid + 1];
else edgeEnd = numEdges;
for (int edge = edgeStart; edge < edgeEnd; edge++) {
int nid = edgeArray[edge];
atomicMin(&updatingShortestDistances[nid], shortestDistances[tid] + weightArray[edge]);
}
}
}
}
/**************************/
/* DIJKSTRA GPU KERNEL #2 */
/**************************/
__global__ void Kernel2(const int * __restrict__ vertexArray, const int * __restrict__ edgeArray, const float* __restrict__ weightArray,
bool * __restrict__ finalizedVertices, float* __restrict__ shortestDistances, float* __restrict__ updatingShortestDistances,
const int numVertices) {
int tid = blockIdx.x * blockDim.x + threadIdx.x;
if (tid < numVertices) {
if (shortestDistances[tid] > updatingShortestDistances[tid]) {
shortestDistances[tid] = updatingShortestDistances[tid];
finalizedVertices[tid] = true; }
updatingShortestDistances[tid] = shortestDistances[tid];
}
}
/************************/
/* dijkstraGPU FUNCTION */
/************************/
void dijkstraGPU(GraphData *graph, const int sourceVertex, float * __restrict__ h_shortestDistances) {
// --- Create device-side adjacency-list, namely, vertex array Va, edge array Ea and weight array Wa from G(V,E,W)
int *d_vertexArray; gpuErrchk(cudaMalloc(&d_vertexArray, sizeof(int) * graph -> numVertices));
int *d_edgeArray; gpuErrchk(cudaMalloc(&d_edgeArray, sizeof(int) * graph -> numEdges));
float *d_weightArray; gpuErrchk(cudaMalloc(&d_weightArray, sizeof(float) * graph -> numEdges));
// --- Copy adjacency-list to the device
gpuErrchk(cudaMemcpy(d_vertexArray, graph -> vertexArray, sizeof(int) * graph -> numVertices, cudaMemcpyHostToDevice));
gpuErrchk(cudaMemcpy(d_edgeArray, graph -> edgeArray, sizeof(int) * graph -> numEdges, cudaMemcpyHostToDevice));
gpuErrchk(cudaMemcpy(d_weightArray, graph -> weightArray, sizeof(float) * graph -> numEdges, cudaMemcpyHostToDevice));
// --- Create mask array Ma, cost array Ca and updating cost array Ua of size V
bool *d_finalizedVertices; gpuErrchk(cudaMalloc(&d_finalizedVertices, sizeof(bool) * graph->numVertices));
float *d_shortestDistances; gpuErrchk(cudaMalloc(&d_shortestDistances, sizeof(float) * graph->numVertices));
float *d_updatingShortestDistances; gpuErrchk(cudaMalloc(&d_updatingShortestDistances, sizeof(float) * graph->numVertices));
bool *h_finalizedVertices = (bool *)malloc(sizeof(bool) * graph->numVertices);
// --- Initialize mask array Ma to false, cost array Ca and updating cost array Ua to infinity
initializeArrays <<<iDivUp(graph->numVertices, BLOCK_SIZE), BLOCK_SIZE >>>(d_finalizedVertices, d_shortestDistances,
d_updatingShortestDistances, sourceVertex, graph -> numVertices);
gpuErrchk(cudaPeekAtLastError());
gpuErrchk(cudaDeviceSynchronize());
// --- Read mask array from device -> host
gpuErrchk(cudaMemcpy(h_finalizedVertices, d_finalizedVertices, sizeof(bool) * graph->numVertices, cudaMemcpyDeviceToHost));
while (!allFinalizedVertices(h_finalizedVertices, graph->numVertices)) {
// --- In order to improve performance, we run some number of iterations without reading the results. This might result
// in running more iterations than necessary at times, but it will in most cases be faster because we are doing less
// stalling of the GPU waiting for results.
for (int asyncIter = 0; asyncIter < NUM_ASYNCHRONOUS_ITERATIONS; asyncIter++) {
Kernel1 <<<iDivUp(graph->numVertices, BLOCK_SIZE), BLOCK_SIZE >>>(d_vertexArray, d_edgeArray, d_weightArray, d_finalizedVertices, d_shortestDistances,
d_updatingShortestDistances, graph->numVertices, graph->numEdges);
gpuErrchk(cudaPeekAtLastError());
gpuErrchk(cudaDeviceSynchronize());
Kernel2 <<<iDivUp(graph->numVertices, BLOCK_SIZE), BLOCK_SIZE >>>(d_vertexArray, d_edgeArray, d_weightArray, d_finalizedVertices, d_shortestDistances, d_updatingShortestDistances,
graph->numVertices);
gpuErrchk(cudaPeekAtLastError());
gpuErrchk(cudaDeviceSynchronize());
}
gpuErrchk(cudaMemcpy(h_finalizedVertices, d_finalizedVertices, sizeof(bool) * graph->numVertices, cudaMemcpyDeviceToHost));
}
// --- Copy the result to host
gpuErrchk(cudaMemcpy(h_shortestDistances, d_shortestDistances, sizeof(float) * graph->numVertices, cudaMemcpyDeviceToHost));
free(h_finalizedVertices);
gpuErrchk(cudaFree(d_vertexArray));
gpuErrchk(cudaFree(d_edgeArray));
gpuErrchk(cudaFree(d_weightArray));
gpuErrchk(cudaFree(d_finalizedVertices));
gpuErrchk(cudaFree(d_shortestDistances));
gpuErrchk(cudaFree(d_updatingShortestDistances));
}
/****************/
/* MAIN PROGRAM */
/****************/
int main() {
// --- Number of graph vertices
int numVertices = 8;
// --- Number of edges per graph vertex
int neighborsPerVertex = 6;
// --- Source vertex
int sourceVertex = 0;
// --- Allocate memory for arrays
GraphData graph;
generateRandomGraph(&graph, numVertices, neighborsPerVertex);
// --- From adjacency list to adjacency matrix.
// Initializing the adjacency matrix
float *weightMatrix = (float *)malloc(numVertices * numVertices * sizeof(float));
for (int k = 0; k < numVertices * numVertices; k++) weightMatrix[k] = FLT_MAX;
// --- Displaying the adjacency list and constructing the adjacency matrix
printf("Adjacency list\n");
for (int k = 0; k < numVertices; k++) weightMatrix[k * numVertices + k] = 0.f;
for (int k = 0; k < numVertices; k++)
for (int l = 0; l < neighborsPerVertex; l++) {
weightMatrix[k * numVertices + graph.edgeArray[graph.vertexArray[k] + l]] = graph.weightArray[graph.vertexArray[k] + l];
printf("Vertex nr. %i; Edge nr. %i; Weight = %f\n", k, graph.edgeArray[graph.vertexArray[k] + l],
graph.weightArray[graph.vertexArray[k] + l]);
}
for (int k = 0; k < numVertices * neighborsPerVertex; k++)
printf("%i %i %f\n", k, graph.edgeArray[k], graph.weightArray[k]);
// --- Displaying the adjacency matrix
printf("\nAdjacency matrix\n");
for (int k = 0; k < numVertices; k++) {
for (int l = 0; l < numVertices; l++)
if (weightMatrix[k * numVertices + l] < FLT_MAX)
printf("%1.3f\t", weightMatrix[k * numVertices + l]);
else
printf("--\t");
printf("\n");
}
// --- Running Dijkstra on the CPU
float *h_shortestDistancesCPU = (float *)malloc(numVertices * sizeof(float));
dijkstraCPU(weightMatrix, h_shortestDistancesCPU, sourceVertex, numVertices);
printf("\nCPU results\n");
for (int k = 0; k < numVertices; k++) printf("From vertex %i to vertex %i = %f\n", sourceVertex, k, h_shortestDistancesCPU[k]);
// --- Allocate space for the h_shortestDistancesGPU
float *h_shortestDistancesGPU = (float*)malloc(sizeof(float) * graph.numVertices);
dijkstraGPU(&graph, sourceVertex, h_shortestDistancesGPU);
printf("\nGPU results\n");
for (int k = 0; k < numVertices; k++) printf("From vertex %i to vertex %i = %f\n", sourceVertex, k, h_shortestDistancesGPU[k]);
free(h_shortestDistancesCPU);
free(h_shortestDistancesGPU);
return 0;
}
OK, so I found this article and I am confused by some parts of it. If anyone can explain this process in more depth to me I would greatly appreciate it, because I have been trying to code this for two months now and still have not gotten a correct version working.
I am specifically confused about the Persistence part of the article, because I mostly do not understand what the author is trying to explain about it. At the bottom of the article he talks about a 2D pseudocode implementation of this, but the PerlinNoise_2D function does not make sense to me: after the random value is smoothed and interpolated, it is an integer value, yet the function takes float values?
Underneath the persistence portion there is the octaves part, which I also do not quite understand, because he "adds" the smoothed functions together to get the Perlin function. What does he mean by "adds", because you obviously do not just add the values together?
So if anyone can explain these parts to me I would be very happy. Thanks.
Here is my code:
import java.awt.Color;
import java.awt.Graphics;
import java.util.Random;
import javax.swing.JFrame;
import javax.swing.JPanel;
@SuppressWarnings("serial")
public class TerrainGen extends JPanel {
public static int layers = 3;
public static float[][][][] noise = new float[16][16][81][layers];
public static int[][][][] octaves = new int[16][16][81][layers];
public static int[][][][] perlin = new int[16][16][81][layers];
public static int[][][] perlinnoise = new int[16][16][81];
public static int SmoothAmount = 3;
public static int interpolate1 = 0;
public static int interpolate2 = 10;
public static double persistence = 0.25;
//generate noise
//smooth noise
//interpolate noise
//perlin equation
public TerrainGen() {
for(int t = 0; t < layers; t++) {
for(int z = 0; z < 81; z++) {
for(int y = 0; y < 16; y++) {
for(int x = 0; x < 16; x++) {
noise[x][y][z][t] = GenerateNoise();
}
}
}
}
for(int t = 0; t < layers; t++) {
SmoothNoise(t);
}
for(int t = 0; t < layers; t++) {
for(int z = 0; z < 81; z++) {
for(int y = 0; y < 16; y++) {
for(int x = 0; x < 16; x++) {
octaves[x][y][z][t] = InterpolateNoise(interpolate1, interpolate2, noise[x][y][z][t]);
}
}
}
}
for(int t = 0; t < layers; t++) {
PerlinNoise(t);
}
}
public static Random generation = new Random(5);
public float GenerateNoise() {
float i = generation.nextFloat();
return i;
}
public void SmoothNoise(int t) {
//Huge smoothing algorithm
}
//Cosine interpolation
public int InterpolateNoise(int base, int top, float input) {
return (int) ((1 - ((1 - Math.cos(input * 3.1415927)) * 0.5)) + top * ((1 - Math.cos(input * 3.1415927)) * 0.5));
}
public void PerlinNoise(int t) {
double f = Math.pow(2.0, new Double(t));
double a = Math.pow(persistence, new Double(t));
for(int z = 0; z < 81; z++) {
for(int y = 0; y < 16; y++) {
for(int x = 0; x < 16; x++) {
perlin[x][y][z][t] = (int) ((octaves[x][y][z][t] * f) * a);
}
}
}
}
public static void main(String [] args) {
JFrame frame = new JFrame();
frame.setSize(180, 180);
frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
TerrainGen test = new TerrainGen();
frame.add(test);
frame.setVisible(true);
}
public static int size = 5;
public void paintComponent(Graphics g) {
super.paintComponent(g);
int i = 0;
for(int t = 0; t < 9; t++) {
for(int z = 0; z < 9; z++) {
for(int y = 0; y < 16; y++) {
for(int x = 0; x < 16; x++) {
g.setColor(new Color(perlin[x][y][i][0] * 10, perlin[x][y][i][0] * 10, perlin[x][y][i][0] * 10));
g.fillRect((z * (16 * size)) + (x * size), (t * (16 * size)) + (y * size), size, size);
}
}
i++;
}
}
repaint();
}
}
And I did not include the smoothing part because that was about 400 lines of code to smooth between chunks.
What the article calls persistence is how the amplitude of the higher frequency noises "falls off" when they are combined.
"octaves" are just what the article calls the noise functions at different frequencies.
You take 1.0 and repeatedly multiply by the persistence to get the list of amplitudes to multiply each octave by - e.g. a persistence of 0.8 gives factors 1.0, 0.8, 0.64, 0.512.
The noise is not an integer: his function Noise1 produces noise in the range 0..1, i.e. the variable n is an Int32 but the function returns a float.
The input parameters are integers, i.e. the Noise1 function is only evaluated at points like (1, 0) or (2, 2).
After smoothing/smearing the noise a bit in SmoothNoise_1, the values get interpolated to produce the values in between.
Hope that helped!!
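To illustrate the "adding" of the octaves in code, here is a minimal sketch (my own, not the article's PerlinNoise_2D; valueNoise is just a cheap deterministic placeholder standing in for whatever smoothed/interpolated noise function you already have):
public class OctaveDemo
{
    // Placeholder lattice noise in [0, 1): stands in for your smoothed/interpolated noise.
    static double valueNoise(double x, double y)
    {
        long n = (long) Math.floor(x) + (long) Math.floor(y) * 57;
        n = (n << 13) ^ n;
        return ((n * (n * n * 15731 + 789221) + 1376312589) & 0x7fffffff) / 2147483648.0;
    }
    // "Adding octaves": sample the noise at doubling frequencies and sum the samples,
    // each weighted by amplitude = persistence^octave.
    static double octaveNoise(double x, double y, int octaves, double persistence)
    {
        double total = 0.0, frequency = 1.0, amplitude = 1.0;
        for (int o = 0; o < octaves; o++)
        {
            total += valueNoise(x * frequency, y * frequency) * amplitude;
            frequency *= 2.0;
            amplitude *= persistence;
        }
        return total;
    }
    public static void main(String[] args)
    {
        System.out.println(octaveNoise(3.7, 8.2, 4, 0.25));
    }
}
Each pass through the loop doubles the frequency and multiplies the amplitude by the persistence, and the weighted samples are simply summed - that sum is what the article means by "adding" the octaves.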
This loop makes octaves from 2D noise; the same loop would work for 3D Perlin...
function octaves( vtx: Vector3 ): float
{
var total = 0.0;
for (var i:int = 1; i < 7; i ++)//num octaves
{
total+= PerlinNoise(Vector3 (vtx.x*(i*i),0.0,vtx.z*(i*i)))/(i*i);
}
return total;//adds multiple Perlin samples at increasing frequencies, each weighted down by 1/(i*i)
}
The best thing I have seen for learning Perlin is the following code. Instead of hash tables, it uses a sin-based semi-random function. Using 2-3 octaves it becomes high-quality Perlin... The amazing thing is that I ran 30 octaves of this on a realtime landscape and it didn't slow down, whereas I used one Voronoi once and it was slowing things down. So... amazing code to learn from.
#ifndef __noise_hlsl_
#define __noise_hlsl_
// hash based 3d value noise
// function taken from https://www.shadertoy.com/view/XslGRr
// Created by inigo quilez - iq/2013
// License Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
// ported from GLSL to HLSL
float hash( float n )
{
return frac(sin(n)*43758.5453);
}
float noise( float3 x )
{
// The noise function returns a value in the range -1.0f -> 1.0f
float3 p = floor(x);
float3 f = frac(x);
f = f*f*(3.0-2.0*f);
float n = p.x + p.y*57.0 + 113.0*p.z;
return lerp(lerp(lerp( hash(n+0.0), hash(n+1.0),f.x),
lerp( hash(n+57.0), hash(n+58.0),f.x),f.y),
lerp(lerp( hash(n+113.0), hash(n+114.0),f.x),
lerp( hash(n+170.0), hash(n+171.0),f.x),f.y),f.z);
}
Note that sin is expensive on the CPU; instead you would use:
function hash ( n: float ): float
{//random -1, 1
var e = ( n *73.9543)%1;
return (e*e*142.05432)%2-1;// fast cpu random by me :) uses e*e rather than sin
}
Implement BigInteger multiply
Use an integer array to store a big integer, so e.g. 297897654 will be stored as {2,9,7,8,9,7,6,5,4}.
Implement the multiply function for such big integers.
Example: {2, 9, 8, 8, 9, 8} * {3,6,3,4,5,8,9,1,2} = {1,0,8,6,3,7,1,4,1,8,7,8,9,7,6}
I failed to implement this and have thought about it for a few weeks, but couldn't get the answer.
Can anybody help me implement it in C#/Java?
Thanks a lot.
Do you know how to do multiplication on paper?
  123
x 456
-----
  738
 615
492
-----
56088
I would just implement that algorithm in code.
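For example, here is a minimal sketch of that paper-and-pencil method in Java (digit arrays are most significant digit first, as in the question; the class and method names are just mine):
import java.util.Arrays;
public class LongMultiplication
{
    // Multiply two non-negative numbers given as digit arrays, most significant digit first.
    static int[] multiply(int[] a, int[] b)
    {
        int[] result = new int[a.length + b.length]; // enough room for any product
        for (int i = a.length - 1; i >= 0; i--)
        {
            int carry = 0;
            for (int j = b.length - 1; j >= 0; j--)
            {
                int t = result[i + j + 1] + a[i] * b[j] + carry;
                result[i + j + 1] = t % 10;
                carry = t / 10;
            }
            result[i] += carry;
        }
        int start = 0; // strip leading zeros, but keep at least one digit
        while (start < result.length - 1 && result[start] == 0) start++;
        return Arrays.copyOfRange(result, start, result.length);
    }
    public static void main(String[] args)
    {
        int[] a = {2, 9, 8, 8, 9, 8};
        int[] b = {3, 6, 3, 4, 5, 8, 9, 1, 2};
        System.out.println(Arrays.toString(multiply(a, b)));
        // prints [1, 0, 8, 6, 3, 7, 1, 4, 1, 8, 7, 8, 9, 7, 6]
    }
}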
C++ Implementation:
Source Code:
#include <iostream>
using namespace std;
int main()
{
int a[10] = {8,9,8,8,9,2}; // digits of 298898, stored least significant digit first
int b[10] = {2,1,9,8,5,4,3,6,3}; // digits of 363458912, stored least significant digit first
// INPUT DISPLAY
for(int i=9;i>=0;i--) cout << a[i];
cout << " x ";
for(int i=9;i>=0;i--) cout << b[i];
cout << " = ";
int c[20] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
for(int i=0;i<10;i++)
{
int carry = 0;
for(int j=0;j<10;j++)
{
int t = (a[j] * b[i]) + c[i+j] + carry;
carry = t/10;
c[i+j] = t%10;
}
}
// RESULT DISPLAY
for(int i=19;i>=0;i--) cout << c[i];
cout << endl;
}
Output:
0000298898 x 0363458912 = 00000108637141878976
There is a superb algorithm called the Karatsuba algorithm (Here), which uses a divide-and-conquer strategy and with which you can multiply large numbers.
I have implemented it in Java, using some manipulation.
package aoa;
import java.io.*;
public class LargeMult {
/**
* @param args the command line arguments
*/
public static void main(String[] args) throws IOException
{
// TODO code application logic here
BufferedReader br=new BufferedReader(new InputStreamReader(System.in));
System.out.println("Enter 1st number");
String a=br.readLine();
System.out.println("Enter 2nd number");
String b=br.readLine();
System.out.println("Result:"+multiply(a,b));
}
static String multiply(String t1,String t2)
{
if(t1.length()>1&&t2.length()>1)
{
int mid1=t1.length()/2;
int mid2=t2.length()/2;
String a=t1.substring(0, mid1);//Al
String b=t1.substring(mid1, t1.length());//Ar
String c=t2.substring(0, mid2);//Bl
String d=t2.substring(mid2, t2.length());//Br
String s1=multiply(a, c);
String s2=multiply(a, d);
String s3=multiply(b, c);
String s4=multiply(b, d);
long ans;
ans=Long.parseLong(s1)*(long)Math.pow(10,
b.length()+d.length())+Long.parseLong(s3)*(long)Math.pow(10,d.length())+
Long.parseLong(s2)*(long)Math.pow(10, b.length())+Long.parseLong(s4);
return ans+"";
}
else
{
return (Integer.parseInt(t1)*Integer.parseInt(t2))+"";
}
}
}
I hope this helps! Enjoy.
Give the numbers you want to multiply as integer arrays, i.e. int[] one and int[] two.
public class VeryLongMultiplication {
public static void main(String args[]){
int[] one={9,9,9,9,9,9};
String[] temp=new String[100];
int c=0;
String[] temp1=new String[100];
int c1=0;
int[] two={9,9,9,9,9,9};
int car=0,mul=1; int rem=0; int sum=0;
String str="";
////////////////////////////////////////////
for(int i=one.length-1;i>=0;i--)
{
for(int j=two.length-1;j>=0;j--)
{
mul=one[i]*two[j]+car;
rem=mul%10;
car=mul/10;
if(j>0)
str=rem+str;
else
str=mul+str;
}
temp[c]=str;
c++;
str="";
car=0;
}
////////////////////////////////////////
for(int jk=0;jk<c;jk++)
{
for(int l=c-jk;l>0;l--)
str="0"+str;
str=str+temp[jk];
for(int l=0;l<=jk-1;l++)
str=str+"0";
System.out.println(str);
temp1[c1]=str;
c1++;
str="";
}
///////////////////////////////////
String ag="";int carry=0;
System.out.println("========================================================");
for(int jw=temp1[0].length()-1;jw>=0;jw--)
{
for(int iw=0;iw<c1;iw++)
{
int x=temp1[iw].charAt(jw)-'0';
sum+=x;
}
sum+=carry;
int n=sum;
sum=n%10;carry=n/10;
ag=sum+ag;
sum=0;
}
System.out.println(ag);
}
}
Output:
0000008999991
0000089999910
0000899999100
0008999991000
0089999910000
0899999100000
______________
0999998000001
If you do it the long-hand way, you'll have to implement an Add() method too, to add up all the parts at the end. I started there just to get the ball rolling. Once you have the Add() down, the Multiply() method gets implemented along the same lines.
public static int[] Add(int[] a, int[] b) {
var maxLen = (a.Length > b.Length ? a.Length : b.Length);
var carryOver = 0;
var result = new List<int>();
for (int i = 0; i < maxLen; i++) {
var idx1 = a.Length - i - 1;
var idx2 = b.Length - i - 1;
var val1 = (idx1 < 0 ? 0 : a[idx1]);
var val2 = (idx2 < 0 ? 0 : b[idx2]);
var addResult = (val1 + val2) + carryOver;
var strAddResult = String.Format("{0:00}", addResult);
carryOver = Convert.ToInt32(strAddResult.Substring(0, 1));
var partialAddResult = Convert.ToInt32(strAddResult.Substring(1));
result.Insert(0, partialAddResult);
}
if (carryOver > 0) result.Insert(0, carryOver);
return result.ToArray();
}
Hint: use divide and conquer to split the number into halves; this (Karatsuba's algorithm) can effectively reduce the time complexity from O(n^2) to O(n^(log2 3)) ≈ O(n^1.585). The gist is reducing the number of multiplication operations per split from four to three.
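To show where that saving comes from, here is a minimal sketch of the three-multiplication recursion on plain longs, only to illustrate the structure; a real big-integer version would apply exactly the same split to the digit arrays, and the names here are mine:
public class KaratsubaSketch
{
    static long karatsuba(long x, long y)
    {
        if (x < 10 || y < 10) return x * y;                // base case: a single-digit factor
        int n = Math.max(Long.toString(x).length(), Long.toString(y).length());
        long p = (long) Math.pow(10, n / 2);
        long xh = x / p, xl = x % p;                       // split x into high/low halves
        long yh = y / p, yl = y % p;                       // split y into high/low halves
        long a = karatsuba(xh, yh);                        // high * high
        long d = karatsuba(xl, yl);                        // low * low
        long e = karatsuba(xh + xl, yh + yl) - a - d;      // both cross terms from ONE multiply
        return a * p * p + e * p + d;
    }
    public static void main(String[] args)
    {
        System.out.println(karatsuba(298898L, 363458912L)); // 108637141878976
    }
}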
I'm posting Java code that I wrote. Hope this will help.
import org.junit.Test;
import static org.junit.Assert.*;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
/**
* Created by ${YogenRai} on 11/27/2015.
*
* Multiplies BigIntegers stored as digits in integer arrays and returns the result
*/
public class BigIntegerMultiply {
public static List<Integer> multiply(int[] num1,int[] num2){
BigInteger first=new BigInteger(toString(num1));
BigInteger result=new BigInteger("0");
for (int i = num2.length-1,k=1; i >=0; i--,k=k*10) {
result = (first.multiply(BigInteger.valueOf(num2[i]))).multiply(BigInteger.valueOf(k)).add(result);
}
return convertToArray(result);
}
private static List<Integer> convertToArray(BigInteger result) {
List<Integer> rs=new ArrayList<>();
while (result.signum() != 0){
int digit=result.mod(BigInteger.TEN).intValue();
rs.add(digit);
result = result.divide(BigInteger.TEN);
}
Collections.reverse(rs);
return rs;
}
public static String toString(int[] array){
StringBuilder sb=new StringBuilder();
for (int element:array){
sb.append(element);
}
return sb.toString();
}
@Test
public void testArray(){
int[] num1={2, 9, 8, 8, 9, 8};
int[] num2 = {3,6,3,4,5,8,9,1,2};
System.out.println(multiply(num1, num2));
}
}