How to get the dimensions of a two-dimensional zeroinitializer GlobalVariable array in LLVM .ll - llvm-ir

I am writing an LLVM pass that reads the global variables from a .ll file, and I have a problem getting the dimensions of an array that is initialized to zero. For example:
@b = common local_unnamed_addr global [5 x [10 x i32]] zeroinitializer, align 16
In this example, I want to get the dimensions 5 and 10.
I can get the first dimension, 5, but not the second, 10. My code looks like this:
virtual bool runOnModule(Module &M)
{
for (auto gv_iter = M.global_begin();gv_iter != M.global_end(); gv_iter++)
{
GlobalVariable *gv = &*gv_iter;
Constant *const_gv = gv->getInitializer();
if (const_gv->isNullValue())
{
ConstantArray *constArrayd = cast<ConstantArray>(const_gv) ;
int firstdim = datafile << constArrayd->getType()->getNumElements();
int seconddim = ?????
}
}
}
Any ideas?

Related

C5.0 package: Error in paste(apply(x, 1, paste, collapse = ","), collapse = "\n") : result would exceed 2^31-1 bytes

When trying to train a model with a dataset of around 3 million rows and 600 columns using the C5.0 CRAN package I get the following error:
Error in paste(apply(x, 1, paste, collapse = ","), collapse = "\n") : result would exceed 2^31-1 bytes
From what the owner of the repository answered to a similar issue, it is due to an R limitation in the number of bytes in a character string, which is limited to 2^31 - 1.
Long answer ahead:
So, as stated in the question, the error occurs in the last line of the makeDataFile function from the Cubist package, used in C5.0, which concatenates all rows into one string. This string is needed to pass the data to the C5.0 function in C, but it is not needed for any operation in R, and C has no string length limit other than the machine's memory, so the approach I have taken is to build the string in C instead. To do this, the R code passes the data as a character vector of several strings, none of which exceeds the length limit, instead of a single string, and these elements are concatenated once they reach C.
However, leaving every row as a separate element of the character vector and concatenating them in C with strcat in a loop turned out to be quite slow. So I wrote another R function, create_max_len_strings, that concatenates the rows into strings that are as long as possible without reaching the limit, so that strcat only needs to be applied a few times to join these longer strings.
The last line of the original makeDataFile() function is therefore replaced so that each row is kept as a separate element of a character vector, with a line break appended to each row so that the rows remain distinguishable once create_max_len_strings() joins them into longer strings:
makeDataFile.R:
create_max_len_strings <- function(original_vector) {
vector_length = length(original_vector)
nchars = sum(nchar(original_vector, type = "chars"))
## Check if the length of the string would reach 1900000000, which is close to the memory limitation
if(nchars >= 1900000000){
## Calculate how many strings we could create of the maximum length
nchunks = 0
while(nchars > 0){
nchars = nchars - 1900000000
nchunks = nchunks + 1
}
## Get the number of rows that would be contained in each string
chunk_size = vector_length/nchunks
## Get the rounded number of rows in each string
chunk_size = floor(chunk_size)
index = chunk_size
## Create a vector with the indexes of the rows that delimit each string
indexes_vector = c()
indexes_vector = append(indexes_vector, 0)
n = nchunks
while(n > 0){
indexes_vector = append(indexes_vector, index)
index = index + chunk_size
n = n - 1
}
## Get the last few rows if the division had remainder
remainder = vector_length %% nchunks
if (remainder != 0){
indexes_vector = append(indexes_vector, vector_length)
nchunks = nchunks + 1
}
## Create the strings pasting together the rows from the indexes in the indexes vector
strings_vector = c()
i = 2
while (i <= length(indexes_vector)){
## Sum 1 to the index_init so that the next string does not contain the last row of the previous string
index_init = indexes_vector[i-1] + 1
index_end = indexes_vector[i]
## Paste the rows from the vector from index_init to index_end
string <- paste0(original_vector[index_init:index_end], collapse="")
## Create vector containing the strings that were created
strings_vector <- append(strings_vector, string)
i = i + 1
}
}else {
strings_vector = paste0(original_vector, collapse="")
}
strings_vector
}
makeDataFile <- function(x, y, w = NULL) {
## Previous code stays the same
...
x = apply(x, 1, paste, collapse = ",")
x = paste(x, "\n", sep="")
char_vec = create_max_len_strings(x)
}
CALLING C5.0
Now, in order to create the final string to pass to the c50() function in C, an intermediate function is created and called instead. In order to do this, the .C() statement that calls c50() in R is replaced with a .Call() statement calling this function, as .Call() allows for complex objects such as vectors to be passed to C. Also, it allows for the result to be returned in the variable result instead of having to pass back the variables tree, rules and output by reference. The result of calling C5.0 will be received in the character vector result containing the strings corresponding to the tree, rules and output in the first three positions:
C5.0.R:
C5.0.default <- function(x,
y,
trials = 1,
rules = FALSE,
weights = NULL,
control = C5.0Control(),
costs = NULL,
...) {
## Previous code stays the same
...
dataString <- makeDataFile(x, y, weights)
num_chars = sum(nchar(dataString, type = "chars"))
result <- .Call(
"call_C50",
as.character(namesString),
dataString,
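format(num_chars, scientific = FALSE), ## The length of the resulting string is passed as a character string because it is too large for an integer; format() keeps it out of scientific notation (e.g. "3e+09"), which strtol() in C could not parse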
as.character(costString),
as.logical(control$subset),
# -s "use the Subset option" var name: SUBSET
as.logical(rules),
# -r "use the Ruleset option" var name: RULES
## for the bands option, I'm not sure what the default should be.
as.integer(control$bands),
# -u "sort rules by their utility into bands" var name: UTILITY
## The documentation has two options for boosting:
## -b use the Boosting option with 10 trials
## -t trials ditto with specified number of trial
## I think we should use -t
as.integer(trials),
# -t : " ditto with specified number of trial", var name: TRIALS
as.logical(control$winnow),
# -w "winnow attributes before constructing a classifier" var name: WINNOW
as.double(control$sample),
# -S : use a sample of x% for training
# and a disjoint sample for testing var name: SAMPLE
as.integer(control$seed),
# -I : set the sampling seed value
as.integer(control$noGlobalPruning),
# -g: "turn off the global tree pruning stage" var name: GLOBAL
as.double(control$CF),
# -c: "set the Pruning CF value" var name: CF
## Also, for the number of minimum cases, I'm not sure what the
## default should be. The code looks like it dynamically sets the
## value (as opposed to a static, universal integer
as.integer(control$minCases),
# -m : "set the Minimum cases" var name: MINITEMS
as.logical(control$fuzzyThreshold),
# -p "use the Fuzzy thresholds option" var name: PROBTHRESH
as.logical(control$earlyStopping)
)
## Get the first three positions of the character vector that contain the tree, rules and output returned by C5.0 in C
result_tree = result[1]
result_rules = result[2]
result_output = result[3]
modelContent <- strsplit(
if (rules)
result_rules
else
result_tree, "\n"
)[[1]]
entries <- grep("^entries", modelContent, value = TRUE)
if (length(entries) > 0) {
actual <- as.numeric(substring(entries, 10, nchar(entries) - 1))
} else
actual <- trials
if (trials > 1) {
boostResults <- getBoostResults(result_output)
## This next line is here to avoid a false positive warning in R
## CMD check:
## * checking R code for possible problems ... NOTE
## C5.0.default: no visible binding for global variable 'Data'
Data <- NULL
size <-
if (!is.null(boostResults))
subset(boostResults, Data == "Training Set")$Size
else
NA
} else {
boostResults <- NULL
size <- length(grep("[0-9])$", strsplit(result_output, "\n")[[1]]))
}
out <- list(
names = namesString,
cost = costString,
costMatrix = costs,
caseWeights = !is.null(weights),
control = control,
trials = c(Requested = trials, Actual = actual),
rbm = rules,
boostResults = boostResults,
size = size,
dims = dim(x),
call = funcCall,
levels = levels(y),
output = result_output,
tree = result_tree,
predictors = colnames(x),
rules = result_rules
)
class(out) <- "C5.0"
out
}
Now onto the C code. The function call_C50() acts as an intermediary between the R code and the C code: it concatenates the elements of the dataString vector to obtain the single string needed by the C function c50(), accessing each element with CHAR(STRING_ELT(x, i)) and joining them with strcat. The rest of the variables are then cast to their respective C types and the c50() function in the file top.c (where call_C50() should also be placed) is called. The result of calling c50() is returned to the R routine by creating a character vector and placing the strings corresponding to the tree, rules and output in its three positions.
Lastly, the c50() function is left essentially as is, except for the variables treev, rulesv and outputv. Because these values are now returned through .Call() instead of being passed by reference, they no longer need to be arguments of the function. As they are all strings, they can be returned in a single array, c50_return, with one string per position.
top.c:
SEXP call_C50(SEXP namesString, SEXP data_vec, SEXP datavec_len, SEXP costString, SEXP subset, SEXP rules, SEXP bands, SEXP trials, SEXP winnow, SEXP sample,
SEXP seed, SEXP noGlobalPruning, SEXP CF, SEXP minCases, SEXP fuzzyThreshold, SEXP earlyStopping){
char* string;
char* concat;
long n = 0;
long size;
int i;
char* eptr;
// Get the length of the data vector
n = length(data_vec);
// Read the string holding the length of the final string and convert it to long
// (no intermediate copy is needed, which also avoids leaking a malloc'd buffer)
size = strtol(CHAR(STRING_ELT(datavec_len, 0)), &eptr, 10);
// Allocate memory for the number of characters indicated by datavec_len
string = malloc((size+1)*sizeof(char));
// Copy the first element of data_vec into the string variable
strcpy(string, CHAR(STRING_ELT(data_vec, 0)));
// Loop over the data vector until all elements are concatenated in the string variable
for (i = 1; i < n; i++) {
strcat(string, CHAR(STRING_ELT(data_vec, i)));
}
// Copy the value of namesString into a char*
char* namesv = malloc((strlen(CHAR(STRING_ELT(namesString, 0)))+1)*sizeof(char));
strcpy(namesv, CHAR(STRING_ELT(namesString, 0)));
// Copy the value of costString into a char*
char* costv = malloc((strlen(CHAR(STRING_ELT(costString, 0)))+1)*sizeof(char));
strcpy(costv, CHAR(STRING_ELT(costString, 0)));
// Call c50() function casting the rest of arguments into their respective C types
char** c50_return = c50(namesv, string, costv, asLogical(subset), asLogical(rules), asInteger(bands), asInteger(trials), asLogical(winnow), asReal(sample), asInteger(seed), asInteger(noGlobalPruning), asReal(CF), asInteger(minCases), asLogical(fuzzyThreshold), asLogical(earlyStopping));
free(string);
free(namesv);
free(costv);
// Create a character vector to be returned to the C5.0 R function
SEXP out = PROTECT(allocVector(STRSXP, 3));
SET_STRING_ELT(out, 0, mkChar(c50_return[0]));
SET_STRING_ELT(out, 1, mkChar(c50_return[1]));
SET_STRING_ELT(out, 2, mkChar(c50_return[2]));
UNPROTECT(1);
return out;
}
static char** c50(char *namesv, char *datav, char *costv, int subset,
int rules, int utility, int trials, int winnow,
double sample, int seed, int noGlobalPruning, double CF,
int minCases, int fuzzyThreshold, int earlyStopping) {
int val; /* Used by setjmp/longjmp for implementing rbm_exit */
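char ** c50_return = malloc(3 * sizeof(char*));
// Default every slot to an empty string so the caller never reads an uninitialized pointer
c50_return[0] = c50_return[1] = c50_return[2] = "";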
// Initialize the globals to the values that the c50
// program would have at the start of execution
initglobals();
// Set globals based on the arguments. This is analogous
// to parsing the command line in the c50 program.
setglobals(subset, rules, utility, trials, winnow, sample, seed,
noGlobalPruning, CF, minCases, fuzzyThreshold, earlyStopping,
costv);
// Handles the strbufv data structure
rbm_removeall();
// Deallocates memory allocated by NewCase.
// Not necessary since it's also called at the end of this function,
// but it doesn't hurt, and I'm feeling paranoid.
FreeCases();
// XXX Should this be controlled via an option?
// Rprintf("Calling setOf\n");
setOf();
// Create a strbuf using *namesv as the buffer.
// Note that this is a readonly strbuf since we can't
// extend *namesv.
STRBUF *sb_names = strbuf_create_full(namesv, strlen(namesv));
// Register this strbuf using the name "undefined.names"
if (rbm_register(sb_names, "undefined.names", 0) < 0) {
error("undefined.names already exists");
}
// Create a strbuf using *datav and register it as "undefined.data"
STRBUF *sb_datav = strbuf_create_full(datav, strlen(datav));
// XXX why is sb_datav copied? was that part of my debugging?
// XXX or is this the cause of the leak?
if (rbm_register(strbuf_copy(sb_datav), "undefined.data", 0) < 0) {
error("undefined data already exists");
}
// Create a strbuf using *costv and register it as "undefined.costs"
if (strlen(costv) > 0) {
// Rprintf("registering cost matrix: %s", *costv);
STRBUF *sb_costv = strbuf_create_full(costv, strlen(costv));
// XXX should sb_costv be copied?
if (rbm_register(sb_costv, "undefined.costs", 0) < 0) {
error("undefined.cost already exists");
}
} else {
// Rprintf("no cost matrix to register\n");
}
/*
* We need to initialize rbm_buf before calling any code that
* might call exit/rbm_exit.
*/
if ((val = setjmp(rbm_buf)) == 0) {
// Real work is done here
c50main();
if (rules == 0) {
// Get the contents of the the tree file
STRBUF *treebuf = rbm_lookup("undefined.tree");
if (treebuf != NULL) {
char *treeString = strbuf_getall(treebuf);
c50_return[0] = R_alloc(strlen(treeString) + 1, 1);
strcpy(c50_return[0], treeString);
c50_return[1] = "";
} else {
// XXX Should *treev be assigned something in this case?
// XXX Throw an error?
}
} else {
// Get the contents of the the rules file
STRBUF *rulesbuf = rbm_lookup("undefined.rules");
if (rulesbuf != NULL) {
char *rulesString = strbuf_getall(rulesbuf);
c50_return[1] = R_alloc(strlen(rulesString) + 1, 1);
strcpy(c50_return[1], rulesString);
c50_return[0] = "";
} else {
// XXX Should *rulesv be assigned something in this case?
// XXX Throw an error?
}
}
} else {
Rprintf("c50 code called exit with value %d\n", val - JMP_OFFSET);
}
// Close file object "Of", and return its contents via argument outputv
char *outputString = closeOf();
c50_return[2] = R_alloc(strlen(outputString) + 1, 1);
strcpy(c50_return[2], outputString);
// Deallocates memory allocated by NewCase
FreeCases();
// We reinitialize the globals on exit out of general paranoia
initglobals();
return c50_return;
}
IMPORTANT: if the string created is longer than 2147483647 characters, you will also need to change the declarations of the variables i and j in the function strbuf_gets() in strbuf.c. This function iterates through each position of the string, so increasing these variables beyond the int limit to access those positions in the array will cause a segmentation fault. I suggest changing their declared type to long to avoid this issue.
C5.0 PREDICTIONS
However, as the makeDataFile function is not only used to create the model but also to pass the data to the predictions() function, this function will also have to be modified. Just like previously, the .C() statement in predict.C5.0() used to call predictions() will be replaced with a .Call() statement in order to be able to pass the character vector to C, and the result will be returned in the result variable instead of being passed by reference:
predict.C5.0.R:
predict.C5.0 <- function (object,
newdata = NULL,
trials = object$trials["Actual"],
type = "class",
na.action = na.pass,
...) {
## Previous code stays the same
...
caseString <- makeDataFile(x = newdata, y = NULL)
num_chars = sum(nchar(caseString, type = "chars"))
## When passing trials to the C code, convert to
## zero if the original version of trials is used
if (trials <= 0)
stop("'trials' should be a positive integer", call. = FALSE)
if (trials == object$trials["Actual"])
trials <- 0
## Add trials (not object$trials) as an argument
results <- .Call(
"call_predictions",
caseString,
format(num_chars, scientific = FALSE),
as.character(object$names),
as.character(object$tree),
as.character(object$rules),
as.character(object$cost),
pred = integer(nrow(newdata)),
confidence = double(length(object$levels) * nrow(newdata)),
trials = as.integer(trials)
)
predictions = as.numeric(unlist(results[1]))
confidence = as.numeric(unlist(results[2]))
output = as.character(results[3])
if(any(grepl("Error limit exceeded", output)))
stop(output, call. = FALSE)
if (type == "class") {
out <- factor(object$levels[predictions], levels = object$levels)
} else {
out <-
matrix(confidence,
ncol = length(object$levels),
byrow = TRUE)
if (!is.null(rownames(newdata)))
rownames(out) <- rownames(newdata)
colnames(out) <- object$levels
}
out
}
In the file top.c, the predictions() function is modified to receive the variables passed by the .Call() statement, so that, just as before, the caseString vector is concatenated into a single string and the rest of the variables are cast to their respective types. In this case the variables pred and confidence are also received as integer and double vectors, so they need to be converted to int* and double*. The rest of the function is left as it was in order to create the predictions, and the resulting variables predv, confidencev and output are placed in the first three positions of a list, respectively.
top.c:
SEXP call_predictions(SEXP caseString, SEXP case_len, SEXP names, SEXP tree, SEXP rules, SEXP cost, SEXP pred, SEXP confidence, SEXP trials){
char* casev;
char* outputv = "";
char* eptr;
// Convert the length string directly; no intermediate copy is needed
long size = strtol(CHAR(STRING_ELT(case_len, 0)), &eptr, 10);
casev = malloc((size+1)*sizeof(char));
strcpy(casev, CHAR(STRING_ELT(caseString, 0)));
int n = length(caseString);
for (int i = 1; i < n; i++) {
strcat(casev, CHAR(STRING_ELT(caseString, i)));
}
char* namesv = malloc((strlen(CHAR(STRING_ELT(names, 0)))+1)*sizeof(char));
strcpy(namesv, CHAR(STRING_ELT(names, 0)));
char* treev = malloc((strlen(CHAR(STRING_ELT(tree, 0)))+1)*sizeof(char));
strcpy(treev, CHAR(STRING_ELT(tree, 0)));
char* rulesv = malloc((strlen(CHAR(STRING_ELT(rules, 0)))+1)*sizeof(char));
strcpy(rulesv, CHAR(STRING_ELT(rules, 0)));
char* costv = malloc((strlen(CHAR(STRING_ELT(cost, 0)))+1)*sizeof(char));
strcpy(costv, CHAR(STRING_ELT(cost, 0)));
int variable;
int* predv = &variable;
int npred = length(pred);
predv = malloc((npred+1)*sizeof(int));
for (int i = 0; i < npred; i++) {
predv[i] = INTEGER(pred)[i];
}
double variable1;
double* confidencev = &variable1;
int nconf = length(confidence);
confidencev = malloc((nconf+1)*sizeof(double));
for (int i = 0; i < nconf; i++) {
confidencev[i] = REAL(confidence)[i];
}
int* trialsv = &variable;
*trialsv = asInteger(trials);
/* Original code for predictions starts */
int val;
// Announce ourselves for testing
// Rprintf("predictions called\n");
// Initialize the globals
initglobals();
// Handles the strbufv data structure
rbm_removeall();
// XXX Should this be controlled via an option?
// Rprintf("Calling setOf\n");
setOf();
STRBUF *sb_cases = strbuf_create_full(casev, strlen(casev));
if (rbm_register(sb_cases, "undefined.cases", 0) < 0) {
error("undefined.cases already exists");
}
STRBUF *sb_names = strbuf_create_full(namesv, strlen(namesv));
if (rbm_register(sb_names, "undefined.names", 0) < 0) {
error("undefined.names already exists");
}
if (strlen(treev)) {
STRBUF *sb_treev = strbuf_create_full(treev, strlen(treev));
if (rbm_register(sb_treev, "undefined.tree", 0) < 0) {
error("undefined.tree already exists");
}
} else if (strlen(rulesv)) {
STRBUF *sb_rulesv = strbuf_create_full(rulesv, strlen(rulesv));
if (rbm_register(sb_rulesv, "undefined.rules", 0) < 0) {
error("undefined.rules already exists");
}
setrules(1);
} else {
error("either a tree or rules must be provided");
}
// Create a strbuf using *costv and register it as "undefined.costs"
if (strlen(costv) > 0) {
// Rprintf("registering cost matrix: %s", *costv);
STRBUF *sb_costv = strbuf_create_full(costv, strlen(costv));
// XXX should sb_costv be copied?
if (rbm_register(sb_costv, "undefined.costs", 0) < 0) {
error("undefined.cost already exists");
}
} else {
// Rprintf("no cost matrix to register\n");
}
if ((val = setjmp(rbm_buf)) == 0) {
// Real work is done here
// Rprintf("\n\nCalling rpredictmain\n");
rpredictmain(trialsv, predv, confidencev);
// Rprintf("predict finished\n\n");
} else {
// Rprintf("predict code called exit with value %d\n\n", val - JMP_OFFSET);
}
// Close file object "Of", and return its contents via argument outputv
char *outputString = closeOf();
char *output = R_alloc(strlen(outputString) + 1, 1);
strcpy(output, outputString);
// We reinitialize the globals on exit out of general paranoia
initglobals();
/* Original code for predictions ends */
free(namesv);
free(treev);
free(rulesv);
free(costv);
SEXP predx = PROTECT(allocVector(INTSXP, npred));
for (int i = 0; i < npred; i++) {
INTEGER(predx)[i] = predv[i];
}
SEXP confidencex = PROTECT(allocVector(REALSXP, nconf));
for (int i = 0; i < nconf; i++) {
REAL(confidencex)[i] = confidencev[i];
}
SEXP outputx = PROTECT(allocVector(STRSXP, 1));
SET_STRING_ELT(outputx, 0, mkChar(output));
SEXP vector = PROTECT(allocVector(VECSXP, 3));
SET_VECTOR_ELT(vector, 0, predx);
SET_VECTOR_ELT(vector, 1, confidencex);
SET_VECTOR_ELT(vector, 2, outputx);
UNPROTECT(4);
free(predv);
free(confidencev);
return vector;
}

panic: assignment to entry in nil map on single simple map

I was under the impression that the "assignment to entry in nil map" error would only happen when assigning into a nested map, that is, when an entry of the inner map is assigned while the inner map itself doesn't exist yet, e.g.:
var mm map[int]map[int]int
mm[1][2] = 3
But it also happens for a simple map (though with a struct as the key):
package main
import "fmt"
type COO struct {
x int
y int
}
var neighbours map[COO][]COO
func main() {
for i := 0; i < 30; i++ {
for j := 0; j < 20; j++ {
var buds []COO
if i < 29 {
buds = append(buds, COO{x: i + 1, y: j})
}
if i > 0 {
buds = append(buds, COO{x: i - 1, y: j})
}
if j < 19 {
buds = append(buds, COO{x: i, y: j + 1})
}
if j > 0 {
buds = append(buds, COO{x: i, y: j - 1})
}
neighbours[COO{x: i, y: j}] = buds // <--- yields error
}
}
fmt.Println(neighbours)
}
What could be wrong?
You need to initialize neighbours: var neighbours = make(map[COO][]COO)
See the second section in: https://blog.golang.org/go-maps-in-action
You'll get a panic whenever you try to insert a value into a map that hasn't been initialized.
In Go, every variable is initialized to the zero value of its type; that is the default value for variables that are not explicitly initialized.
A map's zero value is nil. Reading from a nil map is allowed and yields zero values, but assigning to an entry of a nil map panics (somewhat like a null pointer exception).
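To make the difference between reading and writing concrete, here is a minimal sketch. The helper name writePanics is made up for this illustration; it uses recover only so that the program can report the panic instead of crashing.

```go
package main

import "fmt"

// writePanics is a hypothetical helper: it reports whether
// storing a value into m causes a run-time panic.
func writePanics(m map[string]int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	m["a"] = 1
	return false
}

func main() {
	var m map[string]int // declared but never made: the zero value is nil

	// Reads are safe on a nil map and yield zero values.
	fmt.Println(m == nil, len(m)) // true 0
	v, ok := m["missing"]
	fmt.Println(v, ok) // 0 false

	// Writes are what trigger "assignment to entry in nil map".
	fmt.Println(writePanics(m))                    // true
	fmt.Println(writePanics(make(map[string]int))) // false
}
```

In real code you would not recover from this panic; you would simply create the map with make (or a map literal) before the first assignment.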
Sometimes it can be useful, because if you know the zero value of something you don't have to initialize it explicitly:
var str string
str += "42"
fmt.Println(str)
// 42 ; A string zero value is ""
var i int
i++
fmt.Println(i)
// 1 ; An int zero value is 0
var b bool
b = !b
fmt.Println(b)
// true ; A bool zero value is false
If you have a Java background, it is similar: primitive types have a default value and objects are initialized to null.
Now, for more complex types like chan and map, the zero value is nil, which is why you have to use make to instantiate them. Pointers also have a nil zero value. The case of arrays and slices is a bit trickier:
var a [2]int
fmt.Println(a)
// [0 0]
var b []int
fmt.Println(b)
// [] ; the zero value is a nil slice, which prints as []
The compiler knows the length of the array (it cannot be changed) and its type, so it can already allocate the right amount of memory. All of the values are initialized to their zero value (unlike C, where an uninitialized array can contain anything). The slice's zero value is nil, but a nil slice has length 0 and works with append, so you can use it normally.
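A short sketch of the array and slice cases side by side. The helper describe is a made-up name for this illustration; it only reports whether a slice compares equal to nil and how long it is.

```go
package main

import "fmt"

// describe is a hypothetical helper that reports whether s is nil
// and how many elements it holds.
func describe(s []int) (isNil bool, n int) {
	return s == nil, len(s)
}

func main() {
	var a [2]int // array: storage exists immediately, elements are zeroed
	var s []int  // slice: declared but unassigned

	fmt.Println(a)           // [0 0]
	fmt.Println(describe(s)) // true 0

	// append allocates backing storage on demand, so no make is needed first.
	s = append(s, 7)
	fmt.Println(describe(s)) // false 1
	fmt.Println(s)           // [7]
}
```

This is why the question's `var buds []COO` followed by append works without any explicit initialization, while the map does not.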
Now, for structs, it is the same as for arrays. Go creates a struct with all its fields initialized to their zero values. The initialization is deep, as this example shows:
type Point struct {
x int
y int
}
type Line struct {
a Point
b Point
}
func main() {
var line Line
// the %#v format prints Golang's deep representation of a value
fmt.Printf("%#v\n", line)
}
// main.Line{a:main.Point{x:0, y:0}, b:main.Point{x:0, y:0}}
Finally, the interface and func types are also initialized to nil.
That's really all there is to it. When working with complex types, you just have to remember to initialize them. The only exception is arrays, because you can't do make([2]int).
In your case, you have a map of slices, so there are two things to set up: the map itself, which must be created with make before you can assign to it, and the slice stored under each key:
var buds []COO
neighbours := make(map[COO][]COO)
neighbours[COO{}] = buds
// alternative (shorter)
neighbours := make(map[COO][]COO)
// Use = rather than := here, because neighbours[COO{}] is a map element, not a new variable
neighbours[COO{}] = make([]COO, 0)
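Putting this together for the program in the question, a corrected version could look like the sketch below. Wrapping the loops in a function named buildNeighbours with width and height parameters is a choice made for this illustration; the original uses a package-level variable and the fixed sizes 30 and 20.

```go
package main

import "fmt"

type COO struct {
	x int
	y int
}

// buildNeighbours (a name chosen for this sketch) returns, for every cell
// of a w-by-h grid, the list of its horizontally and vertically adjacent cells.
func buildNeighbours(w, h int) map[COO][]COO {
	// The map must exist before the first assignment.
	neighbours := make(map[COO][]COO, w*h)
	for i := 0; i < w; i++ {
		for j := 0; j < h; j++ {
			var buds []COO // a nil slice is fine: append allocates as needed
			if i < w-1 {
				buds = append(buds, COO{x: i + 1, y: j})
			}
			if i > 0 {
				buds = append(buds, COO{x: i - 1, y: j})
			}
			if j < h-1 {
				buds = append(buds, COO{x: i, y: j + 1})
			}
			if j > 0 {
				buds = append(buds, COO{x: i, y: j - 1})
			}
			neighbours[COO{x: i, y: j}] = buds
		}
	}
	return neighbours
}

func main() {
	n := buildNeighbours(30, 20)
	// Number of cells, neighbours of a corner, neighbours of an interior cell.
	fmt.Println(len(n), len(n[COO{0, 0}]), len(n[COO{5, 5}])) // 600 2 4
}
```

Keeping the package-level variable works just as well, as long as it is declared as `var neighbours = make(map[COO][]COO)`.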

How to make comparator for a map of pairs cpp

My code is shown below. I want to sort this map of pairs in increasing order of the second value of each pair. I have tried to write a comparator, but it does not seem to be correct. Could someone please help me correct it?
bool cmp1(map < string, pair < int , int > > a, map < string, pair <int , int > > b)
{
return a.second < b.second;
}
int main()
{
map < string, pair <int , int > > mapi;
mapi["peter"]=make_pair(2,4);
mapi["ravsal"]=make_pair(4,23);
sort(mapi.begin(), mapi.end(), cmp);
return 0;
}

pointer to arrays of struct

struct a{
double array[2][3];
};
struct b{
double array[3][4];
};
void main(){
a x = {{1,2,3,4,5,6}};
b y = {{1,2,3,4,5,6,7,8,9,10,11,12}};
}
I have two structs, each containing a two-dimensional array of a different size. I want to define a single function that can handle both x and y (one at a time), i.e., a function that accepts either x.array or y.array as its argument. How should I declare the parameter? I think I should use a pointer, but **x.array does not seem to work.
For example, I want to write a function PrintArray which can print the input array.
void PrintArray( ){}
What should I put inside the parentheses? double ** does not seem to work for me. (The dimensions can be passed to PrintArray as arguments as well, telling it that this is a 2*3 array.)
Write a function that takes three parameters: a pointer, the number of rows, and the number of columns. When you call the function, pass a pointer to the first element of the array.
#include <stdio.h>
void PrintArray(const double *a, int rows, int cols) {
int r, c;
for (r = 0; r < rows; ++r) {
for (c = 0; c < cols; ++c) {
printf("%3.1f ", a[r * cols + c]);
}
printf("\n");
}
}
int main(){
struct a x = {{{1,2,3},{4,5,6}}};
struct b y = {{{1,2,3,4},{5,6,7,8},{9,10,11,12}}};
PrintArray(&x.array[0][0], 2, 3);
PrintArray(&y.array[0][0], 3, 4);
return 0;
}

Handling large groups of numbers

Project Euler problem 14:
The following iterative sequence is defined for the set of positive integers:
n → n/2 (n is even)
n → 3n + 1 (n is odd)
Using the rule above and starting with 13, we generate the following sequence:
13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.
Which starting number, under one million, produces the longest chain?
My first instinct is to create a function to calculate the chains, and run it with every number between 1 and 1 million. Obviously, that takes a long time. Way longer than solving this should take, according to Project Euler's "About" page. I've run into several Project Euler problems involving large groups of numbers where my program ran for hours without finishing. Clearly, I'm doing something wrong.
How can I handle large groups of numbers quickly?
What am I missing here?
Have a read about memoization. The key insight is that if you've got a sequence starting at A that has length 1001, and then you get a sequence starting at B that reaches A, you don't need to repeat all that work again.
This is the code in Mathematica, using memoization and recursion. Just four lines :)
f[x_] := f[x] = If[x == 1, 1, 1 + f[If[EvenQ[x], x/2, (3 x + 1)]]];
Block[{$RecursionLimit = 1000, a = 0, j},
Do[If[a < f[i], a = f[i]; j = i], {i, Reverse@Range[10^6]}];
Print@a; Print[j];
]
Output: chain length 525, and the number is ... ohhhh ... font too small! :)
BTW, here you can see a plot of the frequency for each chain length
Starting with 1,000,000, generate the chain. Keep track of each number that was generated in the chain, as you know for sure that their chains are shorter than the chain for the starting number. Once you reach 1, store the starting number along with its chain length. Take the next biggest number that has not been generated before, and repeat the process.
This will give you the list of numbers and chain length. Take the greatest chain length, and that's your answer.
I'll make some code to clarify.
public static long nextInChain(long n) {
if (n==1) return 1;
if (n%2==0) {
return n/2;
} else {
return (3 * n) + 1;
}
}
public static void main(String[] args) {
long iniTime=System.currentTimeMillis();
HashSet<Long> numbers=new HashSet<Long>();
HashMap<Long,Long> lenghts=new HashMap<Long, Long>();
long currentTry=1000000l;
int i=0;
do {
doTry(currentTry,numbers, lenghts);
currentTry=findNext(currentTry,numbers);
i++;
} while (currentTry!=0);
Set<Long> longs = lenghts.keySet();
long max=0;
long key=0;
for (Long aLong : longs) {
if (max < lenghts.get(aLong)) {
key = aLong;
max = lenghts.get(aLong);
}
}
System.out.println("number = " + key);
System.out.println("chain length = " + max);
System.out.println("Elapsed = " + ((System.currentTimeMillis()-iniTime)/1000));
}
private static long findNext(long currentTry, HashSet<Long> numbers) {
for(currentTry=currentTry-1;currentTry>=0;currentTry--) {
if (!numbers.contains(currentTry)) return currentTry;
}
return 0;
}
private static void doTry(Long tryNumber,HashSet<Long> numbers, HashMap<Long, Long> lenghts) {
long i=1;
long n=tryNumber;
do {
numbers.add(n);
n=nextInChain(n);
i++;
} while (n!=1);
lenghts.put(tryNumber,i);
}
Suppose you have a function CalcDistance(i) that calculates the "distance" to 1. For instance, CalcDistance(1) == 0 and CalcDistance(13) == 9. Here is a naive recursive implementation of this function (in C#):
public static int CalcDistance(long i)
{
if (i == 1)
return 0;
return (i % 2 == 0) ? CalcDistance(i / 2) + 1 : CalcDistance(3 * i + 1) + 1;
}
The problem is that this function has to calculate the distance of many numbers over and over again. You can make it a little bit smarter (and a lot faster) by giving it a memory. For instance, let's create a static array that can store the distance for the first million numbers:
static int[] list = new int[1000000];
We prefill each value in the list with -1 to indicate that the value for that position is not yet calculated. After this, we can optimize the CalcDistance() function:
public static int CalcDistance(long i)
{
if (i == 1)
return 0;
if (i >= 1000000)
return (i % 2 == 0) ? CalcDistance(i / 2) + 1 : CalcDistance(3 * i + 1) + 1;
if (list[i] == -1)
list[i] = (i % 2 == 0) ? CalcDistance(i / 2) + 1: CalcDistance(3 * i + 1) + 1;
return list[i];
}
If i >= 1000000, then we cannot use our list, so we must always calculate it. If i < 1000000, then we check if the value is in the list. If not, we calculate it first and store it in the list. Otherwise we just return the value from the list. With this code, it took about 120 ms to process all million numbers.
This is a very simple example of memoization. I use a simple list to store intermediate values in this example. You can use more advanced data structures like hashtables, vectors or graphs when appropriate.
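The same idea can be sketched in Go. This is an illustrative version, not a port of any of the answers here: the names chainLength and longestChain are made up, the function counts terms (so 13 gives 10, as in the problem statement) instead of the distance to 1, and a zero cache entry means "not computed yet", so no prefill is needed.

```go
package main

import "fmt"

// chainLength returns the number of terms in the chain that starts at n.
// Results for n < len(cache) are stored in cache; 0 marks an empty slot.
func chainLength(n uint64, cache []uint32) uint32 {
	if n == 1 {
		return 1
	}
	inRange := n < uint64(len(cache))
	if inRange && cache[n] != 0 {
		return cache[n] // already known: no need to walk the chain again
	}
	next := 3*n + 1
	if n%2 == 0 {
		next = n / 2
	}
	length := 1 + chainLength(next, cache)
	if inRange {
		cache[n] = length
	}
	return length
}

// longestChain returns the starting number below limit with the longest
// chain, together with the number of terms in that chain.
func longestChain(limit int) (start int, length uint32) {
	cache := make([]uint32, limit)
	for i := 1; i < limit; i++ {
		if l := chainLength(uint64(i), cache); l > length {
			start, length = i, l
		}
	}
	return start, length
}

func main() {
	fmt.Println(chainLength(13, make([]uint32, 100))) // 10
	fmt.Println(longestChain(1000000))
}
```

Intermediate values can grow well beyond the limit, which is why n is a uint64 and only values that fit in the cache are stored.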
Minimize how many levels deep your loops are, and use an efficient data structure such as IList or IDictionary, that can auto-resize itself when it needs to expand. If you use plain arrays they need to be copied to larger arrays as they expand - not nearly as efficient.
This variant doesn't use a HashMap; it only tries not to repeat work for the first 1000000 numbers. I don't use a hash map because the biggest number found is around 56 billion, and a hash map could crash.
I have already done some premature optimization. Instead of / I use >>, instead of % I use &. Instead of * I use some +.
void Main()
{
var elements = new bool[1000000];
int longestStart = -1;
int longestRun = -1;
long biggest = 0;
for (int i = elements.Length - 1; i >= 1; i--) {
if (elements[i]) {
continue;
}
elements[i] = true;
int currentStart = i;
int currentRun = 1;
long current = i;
while (current != 1) {
if (current > biggest) {
biggest = current;
}
if ((current & 1) == 0) {
current = current >> 1;
} else {
current = current + current + current + 1;
}
currentRun++;
if (current < elements.Length) {
elements[current] = true;
}
}
if (currentRun > longestRun) {
longestStart = i;
longestRun = currentRun;
}
}
Console.WriteLine("Longest Start: {0}, Run {1}", longestStart, longestRun);
Console.WriteLine("Biggest number: {0}", biggest);
}
