Undefined reference when using intrinsic - intel

I want to test the SIMD intrinsics of the Xeon Phi, so I wrote the following code:
#pragma offload target(mic) in(a:length(N))
#pragma omp parallel for
for(int i=0;i<16;++i){
__m512i p ;
p = _mm512_loadunpackhi_epi64(p, &a[i*10]);
}
When compiling, icpc gave me an undefined reference error:
/tmp/icpc3kLMRg.o: In function `main':
./src/test.cc:(.text+0x2e8): undefined reference to `_mm512_extloadunpackhi_epi64'
make: *** [test.cc] Error 1
Are there any other header files that need to be included besides immintrin.h?

The compiler compiles the code for the host as well as for the Xeon Phi. The host doesn't support the function you are trying to call, so you need to do this:
#ifdef __MIC__
#pragma offload target(mic) in(a:length(N))
#pragma omp parallel for
for(int i=0;i<16;++i){
__m512i p ;
p = _mm512_loadunpackhi_epi64(p, &a[i*10]);
}
#else
<do something different on the host (or nothing)>
#endif
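A minimal sketch of the same guard placed inside a helper function, so that one source file builds for both targets. The helper name `first_element` and the scalar fallback are illustrative assumptions, not part of the answer above; the `__MIC__` branch assumes the Intel compiler's coprocessor intrinsics (`_mm512_loadunpacklo_epi64`, `_mm512_loadunpackhi_epi64`, `_mm512_store_epi64`) and has not been checked on hardware.

```cpp
#include <cstdint>

#ifdef __MIC__
#include <immintrin.h>  // coprocessor intrinsics; only pulled in for the Xeon Phi pass
#endif

// Hypothetical helper: returns the first of eight 64-bit values starting at src.
std::int64_t first_element(const std::int64_t* src) {
#ifdef __MIC__
    // Coprocessor pass: unaligned 512-bit load built from the lo/hi unpack pair.
    __m512i v = _mm512_setzero_epi32();
    v = _mm512_loadunpacklo_epi64(v, src);
    v = _mm512_loadunpackhi_epi64(v, src + 8);
    alignas(64) std::int64_t lanes[8];
    _mm512_store_epi64(lanes, v);
    return lanes[0];
#else
    // Host pass: no coprocessor intrinsic is referenced, so the linker has nothing to miss.
    return src[0];
#endif
}
```

On the host pass `__MIC__` is not defined, so only the scalar branch is compiled and no reference to the 512-bit intrinsics ends up in the host object file.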

Related

OpenMP code in CUDA source file not compiling on Google Colab

I am trying to run a simple Hello World program with OpenMP directives on Google Colab using OpenMP library and CUDA. I have followed this tutorial but I am getting an error even if I am trying to include %%cu in my code. This is my code-
%%cu
#include<stdio.h>
#include<stdlib.h>
#include<omp.h>
/* Main Program */
int main(int argc , char **argv)
{
int Threadid, Noofthreads;
printf("\n\t\t---------------------------------------------------------------------------");
printf("\n\t\t Objective : OpenMP program to print \"Hello World\" using OpenMP PARALLEL directives\n ");
printf("\n\t\t..........................................................................\n");
/* Set the number of threads */
/* omp_set_num_threads(4); */
/* OpenMP Parallel Construct : Fork a team of threads */
#pragma omp parallel private(Threadid)
{
/* Obtain the thread id */
Threadid = omp_get_thread_num();
printf("\n\t\t Hello World is being printed by the thread : %d\n", Threadid);
/* Master Thread Has Its Threadid 0 */
if (Threadid == 0) {
Noofthreads = omp_get_num_threads();
printf("\n\t\t Master thread printing total number of threads for this execution are : %d\n", Noofthreads);
}
}/* All thread join Master thread */
return 0;
}
And this is the error I am getting-
/tmp/tmpxft_00003eb7_00000000-10_15fcc2da-f354-487a-8206-ea228a09c770.o: In function `main':
tmpxft_00003eb7_00000000-5_15fcc2da-f354-487a-8206-ea228a09c770.cudafe1.cpp:(.text+0x54): undefined reference to `omp_get_thread_num'
tmpxft_00003eb7_00000000-5_15fcc2da-f354-487a-8206-ea228a09c770.cudafe1.cpp:(.text+0x78): undefined reference to `omp_get_num_threads'
collect2: error: ld returned 1 exit status
Without OpenMP directives, a simple Hello World program is running perfectly as can be seen below-
%%cu
#include <iostream>
int main()
{
std::cout << "Welcome To GeeksforGeeks\n";
return 0;
}
Output-
Welcome To GeeksforGeeks
There are two problems here:
nvcc doesn't enable or natively support OpenMP compilation. This has to be enabled by additional command line arguments passed through to the host compiler (gcc by default)
The standard Google Colab/Jupyter notebook plugin for nvcc doesn't allow passing of extra compilation arguments, meaning that even if you solve the first issue, it doesn't help in Colab or Jupyter.
You can solve the first problem as described here, and you can solve the second as described here and here.
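As a rough illustration of the first problem, the sketch below guards every OpenMP call with the standard `_OPENMP` macro, so the file links with or without OpenMP enabled. The helper name `count_threads` is hypothetical. With nvcc, the host compiler flag is usually forwarded as `nvcc -Xcompiler -fopenmp hello.cu -o hello`, possibly with `-lgomp` added at link time; treat the exact command as an assumption for your setup.

```cpp
#ifdef _OPENMP
#include <omp.h>  // only needed when the host compiler has OpenMP enabled
#endif

// Hypothetical helper: number of threads that ran the parallel region
// (1 when the file was built without OpenMP support).
int count_threads() {
    int total = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        // One thread records the team size; the others wait at the implicit barrier.
        #pragma omp single
        total = omp_get_num_threads();
    }
#endif
    return total;
}
```

If the OpenMP flag never reaches the host compiler, `_OPENMP` stays undefined and the function simply reports one thread instead of failing at link time with an undefined reference to `omp_get_num_threads`.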
Combining these in Colab got me this: [screenshot not preserved]
and then this: [screenshot not preserved]

Rcpp not compiling with Intel MIC

I am trying to get this minimal Rcpp/Intel pragma code to work, but I am running into some pretty big errors that I am struggling to overcome.
Code
This is the full code that I am trying to run. It is a simple text readout showing whether the target Xeon Phi device is engaged, as per this website: Offload Computations from Servers with an Intel® Xeon Phi™ Processor. It is based on sample code found here: Lightning-Fast R Machine Learning Algorithms:
library(Rcpp)
library(inline)
# Create and register a Rcpp plugin
plug <- Rcpp:::Rcpp.plugin.maker(
include.before = "#include <stdint.h>
#include <stdio.h>
#include <omp.h>"
)
registerPlugin("daalNB", plug)
whatCPU <-
'
#pragma omp declare target
void what_cpu()
{
uint32_t eax;
const uint32_t xeon_phi_x100_id = 0x00010;
const uint32_t xeon_phi_x200_id = 0x50070;
__asm volatile("cpuid":"=a"(eax):"a"(1));
uint32_t this_cpu_id = eax & 0xF00F0;
if (this_cpu_id == xeon_phi_x100_id)
printf("This CPU is Intel(R) XeonPhi(TM) x100 Processor!");
else
if (this_cpu_id == xeon_phi_x200_id)
printf("This CPU is Intel(R) XeonPhi(TM) x200 Processor!");
else
printf("This CPU is other Intel(R) Processor.");
}
'
offloadExampleRcpp <-
'
//[[Rcpp::plugins(openmp)]]
printf("Running on host: ");
what_cpu();
#pragma offload target(mic:0)
{
printf("Running on target: ");
what_cpu();
}
'
runOffloadExample <- cxxfunction(sig = signature(), body = offloadExampleRcpp, plugin="daalNB", includes = '
//[[Rcpp::plugins(openmp)]]
#pragma omp declare target
void what_cpu()
{
uint32_t eax;
const uint32_t xeon_phi_x100_id = 0x225d;
const uint32_t xeon_phi_x200_id = 0x50070;
__asm volatile("cpuid":"=a"(eax):"a"(1));
uint32_t this_cpu_id = eax & 0xF00F0;
if (this_cpu_id == xeon_phi_x100_id)
printf("This CPU is Intel(R) XeonPhi(TM) x100 Processor!");
else
if (this_cpu_id == xeon_phi_x200_id)
printf("This CPU is Intel(R) XeonPhi(TM) x200 Processor!");
else
printf("This CPU is other Intel(R) Processor.");
}
', verbose = 2)
runOffloadExample()
What I have tried and Errors:
I have set up the software stack for the Xeon Phi properly. This is confirmed by the fact that when I compile the .c code (the code wrapped inside Rcpp above) outside of R using the Intel icc compiler, it succeeds; namely, I get the exact output shown on the Intel website.
However, when the same .c code is wrapped inside Rcpp, the following errors arise and prevent compilation (this is a sample from a much longer readout):
Compilation argument:
/usr/local/lib64/R/bin/R CMD SHLIB file306737f15222.cpp 2> file306737f15222.cpp.err.txt
/opt/intel/compilers_and_libraries_2017.1.132/linux/bin/intel64/icc -I/usr/local/lib64/R/include -DNDEBUG -I"/home/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include" -I/usr/local/include -fpic -qopenmp -c file306737f15222.cpp -o file306737f15222.o
file306737f15222.cpp(52): warning #2571: variable has not been declared with compatible "target" attribute
BEGIN_RCPP
^
file306737f15222.cpp(63): warning #2570: function has not been declared with compatible "target" attribute
END_RCPP
^
file306737f15222.cpp(63): warning #2570: function has not been declared with compatible "target" attribute
END_RCPP
^
Furthermore, the above error readout is followed by tonnes and tonnes of the below errors (again, I have only included a sample for the sake of brevity, however I believe that the other errors are all related to the same issue as posed in my question):
/home/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include/Rcpp/protection/Shelter.h(34): warning #2570: function has not been declared with compatible "target" attribute
Rcpp_unprotect(nprotected) ;
^
detected during:
instantiation of "Rcpp::Shelter<T>::~Shelter() [with T=SEXP]" at line 323 of "/home/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include/Rcpp/exceptions.h"
instantiation of "SEXP exception_to_condition_template(const Exception &, bool) [with Exception=Rcpp::exception]" at line 339 of "/home/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include/Rcpp/exceptions.h"
/home/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include/Rcpp/protection/Shelter.h(30): warning #2570: function has not been declared with compatible "target" attribute
return Rcpp_protect(x) ;
^
detected during:
instantiation of "SEXP Rcpp::Shelter<T>::operator()(SEXP) [with T=SEXP]" at line 326 of "/home/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include/Rcpp/exceptions.h"
instantiation of "SEXP exception_to_condition_template(const Exception &, bool) [with Exception=Rcpp::exception]" at line 339 of "/home/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include/Rcpp/exceptions.h"
/home/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include/Rcpp/utils/tinyformat/tinyformat.h(218): warning #2570: function has not been declared with compatible "target" attribute
static void invoke(std::ostream& /*out*/, const T& /*value*/) { TINYFORMAT_ASSERT(0); }
^
detected during:
instantiation of "void tinyformat::detail::formatValueAsType<T, fmtT, convertible>::invoke(std::ostream &, const T &) [with T=const char *, fmtT=char, convertible=false]" at line 329
instantiation of "void tinyformat::formatValue(std::ostream &, const char *, const char *, int, const T &) [with T=const char *]" at line 528
instantiation of "void tinyformat::detail::FormatArg::formatImpl<T>(std::ostream &, const char *, const char *, int, const void *) [with T=const char *]" at line 504
instantiation of "tinyformat::detail::FormatArg::FormatArg(const T &) [with T=const char *]" at line 881
instantiation of "tinyformat::detail::FormatListN<N>::FormatListN(const Args &...) [with N=1, Args=<const char *>]" at line 930
instantiation of "tinyformat::detail::FormatListN<<expression>> tinyformat::makeFormatList(const Args &...) [with Args=<const char *>]" at line 966
instantiation of "void tinyformat::format(std::ostream &, const char *, const Args &...) [with Args=<const char *>]" at line 975
instantiation of "std::string tinyformat::format(const char *, const Args &...) [with Args=<const char *>]" at line 226 of "/home/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include/Rcpp/exceptions.h"
instantiation of "Rcpp::not_compatible::not_compatible(const char *, Args &&...) [with Args=<const char *const &>]" at line 37 of "/home/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include/Rcpp/r_cast.h"
Can anyone help point me to the issues that are causing the above errors? I recognise that I might be missing something fundamental owing to my unfamiliarity with C and the Rcpp package, so please excuse this.
Many thanks in advance,

Error compiling E-ACSL FRAMA-C

I am new to Frama-C framework and I am trying to do some contract testing with C programs. I intend to use E-ACSL plugin for this, and I tried a test program to see how it works, but I get some compilation errors. Here is my code:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
int x = 0;
/*@ assert x == 1;*/
/*@ assert x == 0;*/
return 0;
}
Then, here is the Frama-C annotated code:
/* Generated by Frama-C */
#include "stdio.h"
#include "stdlib.h"
struct __e_acsl_mpz_struct {
int _mp_alloc ;
int _mp_size ;
unsigned long *_mp_d ;
};
typedef struct __e_acsl_mpz_struct __e_acsl_mpz_struct;
typedef __e_acsl_mpz_struct ( __attribute__((__FC_BUILTIN__)) __e_acsl_mpz_t)[1];
/*@ ghost extern int __e_acsl_init; */
/*@ ghost extern int __e_acsl_internal_heap; */
extern size_t __e_acsl_heap_allocation_size;
/*@
predicate diffSize{L1, L2}(ℤ i) =
\at(__e_acsl_heap_allocation_size,L1) -
\at(__e_acsl_heap_allocation_size,L2) ≡ i;
*/
int main(void)
{
int __retres;
int x = 0;
/*@ assert x ≡ 1; */ ;
/*@ assert x ≡ 0; */ ;
__retres = 0;
return __retres;
}
Finally, I try to compile it with gcc and the flags the manual indicates (page 13) but I get the following errors (and warnings):
$ gcc monitored_second.c -o monitored_second -leacsl -leacsl-gmp -leacsl-jemalloc -lpthread -lm
monitored_second.c:10:1: warning: ‘__FC_BUILTIN__’ attribute directive ignored [-Wattributes]
typedef __e_acsl_mpz_struct ( __attribute__((__FC_BUILTIN__)) __e_acsl_mpz_t)[1];
monitored_second.c:18:55: warning: ‘__FC_BUILTIN__’ attribute directive ignored [-Wattributes]
int line);
^
monitored_second.c:25:60: warning: ‘__FC_BUILTIN__’ attribute directive ignored [-Wattributes]
size_t ptr_size);
^
/usr/bin/ld: cannot find -leacsl
/usr/bin/ld: cannot find -leacsl-jemalloc
collect2: error: ld returned 1 exit status
I've also removed the "-rtl-bittree" label because it returns another error.
Frama-C version is the latest: Sulfur-20171101
Got any idea of what is happening?
Thanks!
Normally, you should have a script called e-acsl-gcc.sh installed in the same directory as the frama-c binary, which takes care of calling gcc with the appropriate options. Its basic usage is documented in section 2.2 of the manual, and man e-acsl-gcc.sh gives more details on the options that can be used. In short, you should be able to type
e-acsl-gcc.sh -c \
--oexec-eacsl=first_monitored \
--oexec=first \
--ocode=first_monitored.i \
first.i
to obtain
an executable first_monitored with the e-acsl instrumentation
an executable first with the original program
a source file first_monitored.i with the e-acsl generated C code
Edit: Looking at the linking command used by the script, I'd say that the command line proposed earlier in the manual is out of date (in particular, it refers to eacsl-jemalloc, whereas e-acsl-gcc.sh seems to prefer eacsl-dlmalloc). This could probably be reported as a bug at https://bts.frama-c.com

breakpad does not generate a minidump when erasing an iterator twice

I find that breakpad sometimes does not handle SIGSEGV.
I wrote a simple example to reproduce it:
#include <vector>
#include <breakpad/client/linux/handler/exception_handler.h>
int InitBreakpad()
{
char core_file_folder[] = "/tmp/cores/";
google_breakpad::MinidumpDescriptor descriptor(core_file_folder);
auto exception_handler_ =
new google_breakpad::ExceptionHandler(descriptor,
nullptr,
nullptr,
nullptr,
true,
-1);
}
int main()
{
InitBreakpad();
// int* ptr = nullptr;
// *ptr = 1;
std::vector<int> sum;
sum.push_back(1);
auto it = sum.begin();
sum.erase(it);
sum.erase(it);
return 0;
}
My gcc version is 4.8.5 and my compile command is
g++ test_breakpad.cpp -I./include -I./include/breakpad -L./lib -lbreakpad -lbreakpad_client -std=c++11 -lpthread
When I run a.out, I get "Segmentation fault" but no minidump is generated.
If I uncomment the nullptr write, breakpad works!
What should I do to correct it?
GDB debug output:
(gdb) b google_breakpad::ExceptionHandler::~ExceptionHandler()
Breakpoint 2 at 0x402ed0: file src/client/linux/handler/exception_handler.cc, line 264.
(gdb) c
The program is not being run.
(gdb) r
Starting program: /home/zen/tmp/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Breakpoint 1, google_breakpad::ExceptionHandler::ExceptionHandler (this=0x619040, descriptor=..., filter=0x0, callback=0x0, callback_context=0x0, install_handler=true, server_fd=-1) at src/client/linux/handler/exception_handler.cc:224
224 ExceptionHandler::ExceptionHandler(const MinidumpDescriptor& descriptor,
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 libgcc-4.8.5-11.el7.x86_64 libstdc++-4.8.5-11.el7.x86_64
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff712f19d in __memmove_ssse3_back () from /lib64/libc.so.6
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff712f19d in __memmove_ssse3_back () from /lib64/libc.so.6
(gdb) c
Continuing.
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
I also tried breakpad's out-of-process dump, but still got nothing (the nullptr write works).
After some debugging, I think the reason that sum.erase(it) does not create a minidump in your example is stack corruption.
While debugging you can see that the variable g_handler_stack_ in src/client/linux/handler/exception_handler.cc is correctly initialized and the google_breakpad::ExceptionHandler instance is correctly added to the vector. However, when google_breakpad::ExceptionHandler::SignalHandler is called, the vector is reported empty, despite no calls to google_breakpad::ExceptionHandler::~ExceptionHandler or to any of the std::vector methods that would change the vector.
A further data point suggesting stack corruption is that the code works with clang++. Additionally, as soon as we change std::vector<int> sum; to a std::vector<int>* sum, which ensures that we don't corrupt the stack, the minidump is written to disk.
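Separately from breakpad, the second sum.erase(it) in the question reuses an iterator that the first call already invalidated, which is undefined behaviour, so how the crash shows up can differ between compilers. A minimal sketch of a well-defined version follows; the helper name `erase_front` is hypothetical and is not taken from the question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: erases up to `count` leading elements and
// returns how many were actually removed.
std::size_t erase_front(std::vector<int>& values, std::size_t count) {
    std::size_t removed = 0;
    auto it = values.begin();
    while (removed < count && it != values.end()) {
        it = values.erase(it);  // erase() returns a valid iterator to the next element
        ++removed;
    }
    return removed;
}
```

Called with a one-element vector and a count of 2, the loop stops after the first erase because the returned iterator equals end(), so no invalid iterator is ever passed to erase.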

How to load a dynamic library on demand from a C++ function/Qt method

I have a dynamic library created as follows:
cat myfile.cc
struct Tcl_Interp;
extern "C" int My_Init(Tcl_Interp *) { return 0; }
1) Compile the .cc file
g++ -fPIC -c myfile.cc
2) Create a shared library
g++ -static-libstdc++ -static-libgcc -shared -o libmy.so myfile.o -L/tools/linux64/qt-4.6.0/lib -lQtCore -lQtGui
3) Load the library from a Tcl proc
Then I run the command
tclsh
and enter the command
% load libmy.so
Is there any C++ function or Qt equivalent of load that can load the shared library on demand from another C++ function?
My requirement is to load the dynamic library at run time inside the function and then use the Qt functions directly:
1) load the Qt shared libraries (for lib1.so)
2) call the functions directly, without any call to resolve
For example, we have dlopen, but with it we have to call dlsym for each function. My requirement is to make only one call to load the shared library and then call those functions directly.
You want boilerplate-less delay loading. On Windows, MSVC implements delay loading by emitting a stub that resolves the function through a function pointer. You can do the same. First, let's observe that function pointers and functions are interchangeable if all you do is call them. The syntax for invoking a function or a function pointer is the same:
void foo_impl() {}
void (*foo)() = foo_impl;
int main() {
foo_impl();
foo();
}
The idea is to set the function pointer initially to a thunk that will resolve the real function at runtime:
#include <QLibrary>
#include <cstdlib>
extern void (*foo)();
void foo_thunk() {
foo = QLibrary::resolve("libmylib", "foo");
if (!foo) abort();
return foo();
}
void (*foo)() = foo_thunk;
int main() {
foo(); // calls foo_thunk to resolve foo and calls foo from libmylib
foo(); // calls foo from libmylib
}
When you first call foo, it will really call foo_thunk, resolve the function address, and call real foo implementation.
To do this, you can split the library into two libraries:
The library implementation. It is unaware of demand-loading.
A demand-load stub.
The executable will link to the demand-load stub library; that is either static or dynamic. The demand-load stub will automatically resolve the symbols at runtime and call into the implementation.
If you're clever, you can design the header for the implementation such that the header itself can be used to generate all the stubs without having to enter their details twice.
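The thunk mechanism described above can be sketched without Qt or a real shared library. In this sketch `real_add`, `fake_resolve`, and `resolve_calls` are hypothetical stand-ins: a real stub would call dlsym or QLibrary::resolve where `fake_resolve` does a string comparison.

```cpp
#include <cstdlib>
#include <cstring>

// Stand-in for the implementation that would live in the shared library.
static int real_add(int i, int j) { return i + j; }

// Counts resolver invocations, to show that resolution happens only once.
static int resolve_calls = 0;

// Stand-in resolver (assumption): maps a symbol name to a generic function pointer.
static void (*fake_resolve(const char* name))() {
    ++resolve_calls;
    if (std::strcmp(name, "My_Add") == 0)
        return reinterpret_cast<void (*)()>(&real_add);
    return nullptr;
}

extern int (*My_Add)(int, int);

// Thunk: resolves the symbol, patches the pointer, then forwards the call.
static int My_Add_thunk(int i, int j) {
    My_Add = reinterpret_cast<int (*)(int, int)>(fake_resolve("My_Add"));
    if (!My_Add) std::abort();
    return My_Add(i, j);
}

// The pointer starts out aimed at the thunk.
int (*My_Add)(int, int) = My_Add_thunk;
```

The first call to My_Add(1, 2) goes through the thunk and patches the pointer; every later call jumps straight to the resolved function, so the caller never writes a resolve call itself.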
Complete Example
The complete example follows; it is also available from https://github.com/KubaO/stackoverflown/tree/master/questions/demand-load-39291032
The top-level project consists of:
lib1 - the dynamic library
lib1_demand - the static demand-load thunk for lib1
main - the application that uses lib1_demand
demand-load-39291032.pro
TEMPLATE = subdirs
SUBDIRS = lib1 lib1_demand main
main.depends = lib1_demand
lib1_demand.depends = lib1
We can factor out the cleverness into a separate header. This header allows us to define the library interface so that the thunks can be automatically generated.
The heavy use of preprocessor and a somewhat redundant syntax is needed due to limitations of C. If you wanted to implement this for C++ only, there'd be no need to repeat the argument list.
demand_load.h
// Configuration macros:
// DEMAND_NAME - must be set to a unique identifier of the library
// DEMAND_LOAD - if defined, the functions are declared as function pointers, **or**
// DEMAND_BUILD - if defined, the thunks and function pointers are defined
#if defined(DEMAND_FUN)
#error Multiple inclusion of demand_load.h without undefining DEMAND_FUN first.
#endif
#if !defined(DEMAND_NAME)
#error DEMAND_NAME must be defined
#endif
#if defined(DEMAND_LOAD)
// Interface via a function pointer
#define DEMAND_FUN(ret,name,args,arg_call) \
extern ret (*name)args;
#elif defined(DEMAND_BUILD)
// Implementation of the demand loader stub
#ifndef DEMAND_CAT
#define DEMAND_CAT_(x,y) x##y
#define DEMAND_CAT(x,y) DEMAND_CAT_(x,y)
#endif
void (* DEMAND_CAT(resolve_,DEMAND_NAME)(const char *))();
#if defined(__cplusplus)
#define DEMAND_FUN(ret,name,args,arg_call) \
extern ret (*name)args; \
ret name##_thunk args { \
name = reinterpret_cast<decltype(name)>(DEMAND_CAT(resolve_,DEMAND_NAME)(#name)); \
return name arg_call; \
}\
ret (*name)args = name##_thunk;
#else
#define DEMAND_FUN(ret,name,args,arg_call) \
extern ret (*name)args; \
ret name##_impl args { \
name = (void*)DEMAND_CAT(resolve_,DEMAND_NAME)(#name); \
return name arg_call; \
}\
ret (*name)args = name##_impl;
#endif // __cplusplus
#else
// Interface via a function
#define DEMAND_FUN(ret,name,args,arg_call) \
ret name args;
#endif
Then, the dynamic library itself:
lib1/lib1.pro
TEMPLATE = lib
SOURCES = lib1.c
HEADERS = lib1.h
INCLUDEPATH += ..
DEPENDPATH += ..
Instead of declaring the functions directly, we'll use DEMAND_FUN from demand_load.h. If DEMAND_LOAD_LIB1 is defined when the header is included, it will offer a demand-load interface to the library. If DEMAND_BUILD is defined, it'll define the demand-load thunks. If neither is defined, it will offer a normal interface.
We take care to undefine the implementation-specific macros so that the global namespace is not polluted. We can then include multiple libraries in the project, each one individually selectable between demand- and non-demand loading.
lib1/lib1.h
#ifndef LIB_H
#define LIB_H
#ifdef __cplusplus
extern "C" {
#endif
#define DEMAND_NAME LIB1
#ifdef DEMAND_LOAD_LIB1
#define DEMAND_LOAD
#endif
#include "demand_load.h"
#undef DEMAND_LOAD
DEMAND_FUN(int, My_Add, (int i, int j), (i,j))
DEMAND_FUN(int, My_Subtract, (int i, int j), (i,j))
#undef DEMAND_FUN
#undef DEMAND_NAME
#ifdef __cplusplus
}
#endif
#endif
The implementation is uncontroversial:
lib1/lib1.c
#include "lib1.h"
int My_Add(int i, int j) {
return i+j;
}
int My_Subtract(int i, int j) {
return i-j;
}
For the user of such a library, demand loading is reduced to defining one macro and using the thunk library lib1_demand instead of the dynamic library lib1.
main/main.pro
if (true) {
# Use demand-loaded lib1
DEFINES += DEMAND_LOAD_LIB1
LIBS += -L../lib1_demand -llib1_demand
} else {
# Use direct-loaded lib1
LIBS += -L../lib1 -llib1
}
QT = core
CONFIG += console c++11
CONFIG -= app_bundle
TARGET = demand-load-39291032
TEMPLATE = app
INCLUDEPATH += ..
DEPENDPATH += ..
SOURCES = main.cpp
main/main.cpp
#include "lib1/lib1.h"
#include <QtCore>
int main() {
auto a = My_Add(1, 2);
Q_ASSERT(a == 3);
auto b = My_Add(3, 4);
Q_ASSERT(b == 7);
auto c = My_Subtract(5, 7);
Q_ASSERT(c == -2);
}
Finally, the implementation of the thunk. Here we have a choice between using dlopen+dlsym or QLibrary. For simplicity, I opted for the latter:
lib1_demand/lib1_demand.pro
QT = core
TEMPLATE = lib
CONFIG += staticlib
INCLUDEPATH += ..
DEPENDPATH += ..
SOURCES = lib1_demand.cpp
HEADERS = ../demand_load.h
lib1_demand/lib1_demand.cpp
#define DEMAND_BUILD
#include "lib1/lib1.h"
#include <QLibrary>
void (* resolve_LIB1(const char * name))() {
auto f = QLibrary::resolve("../lib1/liblib1", name);
return f;
}
Quite apart from the process of loading a library into your C++ code (which Kuba Ober's answer covers just fine), the code that you are loading is wrong; even if you manage to load it, your code will crash! This is because you have a variable of type Tcl_Interp at file scope; that's wrong use of the Tcl library. Instead, the library provides only one way to obtain a handle to an interpreter context, Tcl_CreateInterp() (and a few other functions that are wrappers round it), and that returns a Tcl_Interp* that has already been initialised correctly. (Strictly, it actually returns a handle to what is effectively an internal subclass of Tcl_Interp, so you really can't usefully allocate one yourself.)
The correct usage of the library is this:
Tcl_FindExecutable(NULL); // Or argv[0] if you have it
Tcl_Interp *interp = Tcl_CreateInterp();
// And now, you can use the rest of the API as you see fit
That's for putting a Tcl interpreter inside your code. To do it the other way round, you create an int My_Init(Tcl_Interp*) function as you describe and it is used to tell you where the interpreter is, but then you wouldn't be asking how to load the code, as Tcl has reasonable support for that already.

Resources