How to use go's TestMain with terratest? - integration-testing

I would like to use go test and the terratest library for an integration test of a cluster with ~10 different components (pods, services, load balancers, links between components, etc.). The infrastructure is built with terraform, kubernetes, and helm. Building it takes approx. 10 minutes, so I do not want to repeat that for every test. My idea is to follow the usual pattern of setting up the test infrastructure once in TestMain(*testing.M) and grouping the tests into functions like TestAuth(*testing.T), TestMonitoring(*testing.T), etc. That, however, requires calling terratest functions such as terraform.InitAndApply(t, terraformOptions) outside of a test function, which, as far as I can tell, is not possible because there is no *testing.T available there.
I tried the following:
func TestMain(m *testing.M) {
    setupInfrastructure()
    rc := m.Run()
    tearDownInfrastructure()
    os.Exit(rc)
}

func setupInfrastructure() {
    terraformOptions := &terraform.Options{
        TerraformDir: testFolder,
        EnvVars: map[string]string{
            "TF_VAR_cluster_size": "3",
        },
    }
    terraform.InitAndApply(t, terraformOptions) // <-- this is the problem: there is no *testing.T here
}
As this seems like the natural way of setting up a comprehensive test infrastructure, what am I missing?
I saw that the terratest samples (https://github.com/gruntwork-io/terratest/tree/master/test) all use a single test function with stages and sub-tests, which I would rather avoid because it gives up most of the features of Go's testing package. Is this really the only way to do the job?
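For reference, the workaround I am currently considering (a sketch only; setupInfrastructure, testFolder and TestAuth are placeholders from my setup) is to run the apply lazily from inside the tests, guarded by sync.Once so it executes only once per package:

import (
    "sync"
    "testing"

    "github.com/gruntwork-io/terratest/modules/terraform"
)

var (
    setupOnce        sync.Once
    terraformOptions *terraform.Options
)

// setupInfrastructure runs terraform init/apply exactly once, using the
// *testing.T of whichever test calls it first.
func setupInfrastructure(t *testing.T) *terraform.Options {
    setupOnce.Do(func() {
        terraformOptions = &terraform.Options{
            TerraformDir: testFolder,
            EnvVars: map[string]string{
                "TF_VAR_cluster_size": "3",
            },
        }
        terraform.InitAndApply(t, terraformOptions)
    })
    return terraformOptions
}

func TestAuth(t *testing.T) {
    opts := setupInfrastructure(t)
    _ = opts
    // ... test the auth components against the running cluster ...
}

Tear-down has the same issue, since terraform.Destroy also expects a t, so the destroy would still have to happen in a test (or be left to an external cleanup step), which is why I am asking whether TestMain can be made to work.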

Related

Is it possible to create dynamic jobs with Dagster?

Consider this example - you need to load table1 from a source database, do some generic transformations (like converting time zones for timestamped columns) and write the resulting data into Snowflake. This is an easy one and can be implemented using 3 dagster ops.
Now, imagine you need to do the same thing but with 100s of tables. How would you do it with dagster? Do you literally need to create 100 jobs/graphs? Or can you create one job that will be executed 100 times? Can you throttle how many of these jobs will run at the same time?
You have two main options for doing this:
Use a single job with Dynamic Outputs:
With this setup, all of your ETLs would happen in a single job. You would have an initial op that would yield a DynamicOutput for each table name that you wanted to do this process for, and feed that into a set of ops (probably organized into a graph) that would be run on each individual DynamicOutput.
Depending on what executor you're using, it's possible to limit the overall step concurrency (for example, the default multiprocess_executor supports this option).
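As a rough sketch of that shape (the table list and the per-table op body are placeholders):

from dagster import DynamicOut, DynamicOutput, job, op

@op(out=DynamicOut())
def table_names():
    # one DynamicOutput per table; mapping_key must be a valid identifier
    for name in ["table1", "table2", "table3"]:
        yield DynamicOutput(name, mapping_key=name)

@op
def run_etl_for_table(table_name: str):
    # extract, transform and load a single table here
    ...

@job
def dynamic_etl():
    table_names().map(run_etl_for_table)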
Create a configurable job (I think this is more likely what you want)
from dagster import job, op
import pandas as pd

@op(config_schema={"table_name": str})
def extract_table(context) -> pd.DataFrame:
    table_name = context.op_config["table_name"]
    # do some load...
    return pd.DataFrame()

@op
def transform_table(table: pd.DataFrame) -> pd.DataFrame:
    # do some transform...
    return table

@op(config_schema={"table_name": str})
def load_table(context, table: pd.DataFrame):
    table_name = context.op_config["table_name"]
    # load to snowflake...

@job
def configurable_etl():
    load_table(transform_table(extract_table()))

# this is what the configuration would look like to extract from table
# src_foo and load into table dest_foo
configurable_etl.execute_in_process(
    run_config={
        "ops": {
            "extract_table": {"config": {"table_name": "src_foo"}},
            "load_table": {"config": {"table_name": "dest_foo"}},
        }
    }
)
Here, you create a job that can be pointed at a source table and a destination table by giving the relevant ops a config schema. Depending on those config options, (which are provided when you create a run through the run config), your job will operate on different source / destination tables.
The example shows explicitly running this job using python APIs, but if you're running it from Dagit, you'll also be able to input the yaml version of this config there. If you want to simplify the config schema (as it's pretty nested as shown), you can always create a Config Mapping to make the interface nicer :)
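A config mapping for this job could look roughly like the sketch below (the flat source_table/dest_table schema is just an illustration, not the only way to write it):

from dagster import config_mapping, job

@config_mapping(config_schema={"source_table": str, "dest_table": str})
def simplified_config(val):
    # expand the flat, user-facing config into the full per-op run config
    return {
        "ops": {
            "extract_table": {"config": {"table_name": val["source_table"]}},
            "load_table": {"config": {"table_name": val["dest_table"]}},
        }
    }

@job(config=simplified_config)
def configurable_etl():
    load_table(transform_table(extract_table()))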
From here, you can limit run concurrency by supplying a unique tag to your job, and using a QueuedRunCoordinator to limit the maximum number of concurrent runs for that tag.
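Concretely, that might look like the sketch below (the tag key/value and the limit are made up, and the exact module path can vary between Dagster versions): tag the job, e.g. @job(tags={"pipeline": "configurable_etl"}), and configure the queued run coordinator in the instance's dagster.yaml:

run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    tag_concurrency_limits:
      - key: "pipeline"
        value: "configurable_etl"
        limit: 5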

K6 Load Testing - How to run different scenarios at the same time

I have written a simple K6 Load testing script that performs a successful login.
I have written a separate K6 Load testing script that performs an unsuccessful login attempt
They are currently separate scripts that you have to run on their own.
What I want to know is how do you simulate users performing different scenarios in one load test? e.g. valid login, invalid login, logout, any other actions.
Do you put the different scenarios all in one script?
There are two approaches: the "old" one and the "new" one (available from v0.27.0 onward).
The old approach is to have a single default function that chooses what to do based on some condition; for example, every third VU iteration performs the unsuccessful login and the others perform the successful one:
export default function() {
    if (__ITER % 3 == 2) {
        call_to_unsuccessful_login();
    } else {
        call_to_successful_login();
    }
}
In the above example, you obviously need to define the two functions, either in the same script or by importing them from another one.
Since v0.27.0 and the new execution model, you can have multiple scenarios, each using its own executor and each executing a different function instead of the single default one.
So in this case, instead of having one default function that chooses, we configure separate execution plans for the successful and unsuccessful logins and point each one directly at the function that performs it:
export let options = {
    "scenarios": {
        "successful": {
            "executor": "constant-vus",
            "vus": 2,
            "duration": "1m",
            "exec": "call_to_successful_login"
        },
        "unsuccessful": {
            "executor": "constant-vus",
            "vus": 1,
            "duration": "1m",
            "exec": "call_to_unsuccessful_login"
        }
    }
};
In this case both call... functions need to also be exported in the main script.
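For example, a minimal sketch of those two exported functions (the URL, payloads and checks are made-up placeholders):

import http from "k6/http";
import { check } from "k6";

export function call_to_successful_login() {
    const res = http.post("https://example.com/login", {
        username: "valid_user",
        password: "valid_password",
    });
    check(res, { "login succeeded": (r) => r.status === 200 });
}

export function call_to_unsuccessful_login() {
    const res = http.post("https://example.com/login", {
        username: "valid_user",
        password: "wrong_password",
    });
    check(res, { "login rejected": (r) => r.status === 401 });
}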
You can read more on how to configure scenarios and their different options in the documentation.

How can I be sure that my process has a user interface?

When running a class that can be used either interactively or silently in batch, I want to display an hourglass, but only in interactive mode.
I found the function xGlobal::clientKind(), see below, but I am not sure it is sufficient (can't batches also run on the client?).
if (xGlobal::clientKind() == ClientType::Client)
    startLengthyOperation();

// here do the process

if (xGlobal::clientKind() == ClientType::Client)
    endLengthyOperation();
Do not bother to test the client kind when using startLengthyOperation; the method does a sufficient test itself.
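So the calls can simply be made unconditionally:

startLengthyOperation(); // does nothing when there is no interactive client
// here do the process
endLengthyOperation();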
If you do test, it should look like this:
if (clientKind() == ClientType::Client)
...
Don't use xGlobal::clientKind; use clientKind without qualification.
The ClientType has four values, matching what you see in "Online Users".
Batch can be called interactively in Basic/Periodic/Batch, but it should be rarely used.

File locks in R

The short version
How would I go about blocking the access to a file until a specific function that both involves read and write processes to that very file has returned?
The use case
I often want to create some sort of central registry and there might be more than one R process involved in reading from and writing to that registry (in kind of a "poor man's parallelization" setting where different processes run independently from each other except with respect to the registry access).
I would not like to depend on any DBMS such as SQLite, PostgreSQL, MongoDB etc. early on in the development process. And even though I later might use a DBMS, a filesystem-based solution might still be a handy fallback option. Thus I'm curious how I could realize this with base R functionality, ideally.
I'm aware that having a lot of reads and writes to the file system in a parallel setting is not very efficient compared to DBMS solutions.
I'm running on MS Windows 8.1 (64 Bit)
What I'd like to get a deeper understanding of
What exactly happens when two or more R processes try to write to or read from a file at the same time? Does the OS figure out the "access order" automatically, and does the process that "came in second" wait, or does it trigger an error because the file access is blocked by the first process? How could I prevent the second process from returning with an error and instead have it "just wait" until it is its turn?
Shared workspace of processes
Besides the rredis Package: are there any other options for shared memory on MS Windows?
Illustration
Path to registry file:
path_registry <- file.path(tempdir(), "registry.rdata")
Example function that registers events:
registerEvent <- function(
  id=gsub("-| |:", "", Sys.time()),
  values,
  path_registry
) {
  if (!file.exists(path_registry)) {
    registry <- new.env()
    save(registry, file=path_registry)
  } else {
    load(path_registry)
  }
  message("Simulated additional runtime between reading and writing (5 seconds)")
  Sys.sleep(5)
  if (!exists(id, envir=registry, inherits=FALSE)) {
    assign(id, values, registry)
    save(registry, file=path_registry)
    message(sprintf("Registering with ID %s", id))
    out <- TRUE
  } else {
    message(sprintf("ID %s already registered", id))
    out <- FALSE
  }
  out
}
Example content that is registered:
x <- new.env()
x$a <- TRUE
x$b <- letters[1:5]
Note that the content usually is "nested", i.e. an RDBMS would not really be "useful" anyway, or at least would involve some normalization steps before writing to the DB. That's why I prefer environments (unique variable IDs, and pass-by-reference is possible) over lists, and if I do take the step to a true DBMS, I would rather turn to NoSQL approaches such as MongoDB.
Registration cycle:
The actual calls might be spread over different processes, so there is a possibility of concurrent access attempts.
I want to have other processes/calls "wait" until a registerEvent read-write cycle is finished before doing their read-write cycle (without triggering errors).
registerEvent(values=list(x_1=x, x_2=x), path_registry=path_registry)
registerEvent(values=list(x_1=x, x_2=x), path_registry=path_registry)
registerEvent(id="abcd", values=list(x_1=x, x_2=x),
path_registry=path_registry)
registerEvent(id="abcd", values=list(x_1=x, x_2=x),
path_registry=path_registry)
Check registry content:
load(path_registry)
ls(registry)
See the filelock R package, available since 2018. It is cross-platform; I am using it on Windows and have not found a single problem.
Make sure to read the documentation.
?filelock::lock
Although the docs suggest leaving the lock file in place, I have had no problems removing it on function exit in a multi-process environment:
on.exit({filelock::unlock(lock); file.remove(path.lock)})
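Applied to the registry example above, a minimal sketch (the wrapper name and the lock-file path are my own additions) could be:

library(filelock)

registerEventSafely <- function(..., path_registry) {
  path_lock <- paste0(path_registry, ".lock")
  # blocks by default until the exclusive lock is acquired
  lock <- filelock::lock(path_lock)
  on.exit({
    filelock::unlock(lock)
    file.remove(path_lock)
  })
  registerEvent(..., path_registry = path_registry)
}

registerEventSafely(values = list(x_1 = x, x_2 = x), path_registry = path_registry)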

Apache camel using seda

I want to have a behavior like this:
Camel reads a file from a directory, splits it into chunks (using streaming), sends each chunk to a seda queue for concurrent processing, and after the processing is done, a report generator is invoked.
This is my camel route:
from("file://c:/mydir?move=.done")
.to("bean:firstBean")
.split(ExpressionBuilder.beanExpression("splitterBean", "split"))
.streaming()
.to("seda:processIt")
.end()
.to("bean:reportGenerator");
from("seda:processIt")
.to("bean:firstProcessingBean")
.to("bean:secondProcessingBean");
When I run this, the reportGenerator bean is run concurrently with the seda processing.
How to make it run once after the whole seda processing is done?
The splitter has built-in parallel processing, so you can do this more easily as follows:
from("file://c:/mydir?move=.done")
.to("bean:firstBean")
.split(ExpressionBuilder.beanExpression("splitterBean", "split"))
.streaming().parallelProcessing()
.to("bean:firstProcessingBean")
.to("bean:secondProcessingBean");
.end()
.to("bean:reportGenerator");
You can see more details about the parallel option at the Camel splitter page: http://camel.apache.org/splitter
I think you can use the delayer pattern of Camel on the second route to achieve the purpose.
delay(long), where the argument is the delay in milliseconds. You can read more about this pattern here.
For example:
from("seda:processIt").delay(2000)
    .to("bean:firstProcessingBean"); // delays this route by 2 seconds
I'd also suggest using startupOrder to configure the startup order of the routes.
The official documentation provides good details on the topic. Kindly read it here
Point to note - " The routes with the lowest startupOrder is started first. All startupOrder defined must be unique among all routes in your CamelContext."
So, I'd suggest something like this -
from("endpoint1").startupOrder(1)
.to("endpoint2");
from("endpoint2").startupOrder(2)
.to("endpoint3");