Is there a very simple Graphite tutorial available somewhere?

Given that I have Graphite installed within Docker, does anyone know of a very simple Graphite tutorial somewhere that shows how to feed in data and then plot that data on a graph in the Graphite webapp? I mean the very basic things, not the endless configuration and pages after pages of setting up various components.
I know there is the actual Graphite documentation, but it is setup after setup after setup of the various components. It is enough to drive anyone away from using Graphite.
Given that Graphite is running within Docker, as a start I just need to know the steps for feeding in data as text, displaying the data in the Graphite webapp, and querying the data back.

I assume that you have containerized and configured all the Graphite components.
First, be sure that you have published the plaintext and pickle ports if you plan to feed Graphite from the local host or an external host (defaults: 2003-2004).
After that, according to the documentation, you can use a simple netcat command to send metrics to carbon over TCP/UDP in the format <metric path> <metric value> <metric timestamp>:
SERVER=127.0.0.1 # host where the carbon plaintext receiver is published
PORT=2003        # default carbon plaintext receiver port
while true; do
# one metric per line: <metric path> <metric value> <metric timestamp>
echo "local.random.diceroll $RANDOM `date +%s`" | nc -q 1 ${SERVER} ${PORT}
done
You should then see the path local/random/diceroll in the graphite-web GUI, with a graph of random integers.
Ref: https://graphite.readthedocs.io/en/latest/feeding-carbon.html
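For the third step, querying the data back, you can use graphite-web's render API. A minimal sketch, assuming the webapp is published on port 8080 of the same host; match the host and port to your container's mapping:
# Fetch the last 10 minutes of the metric as JSON via the render API.
# Host and port are assumptions; adjust them to your published container ports.
curl "http://127.0.0.1:8080/render?target=local.random.diceroll&from=-10min&format=json"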

Related

How to send 50,000 HTTP requests in a few seconds?

I want to create a load test for a feature of my app. It uses Google App Engine and a VM. The user sends HTTP requests to the App Engine app. Realistically, this app can get thousands of requests in a few seconds, so I want to create a load test where I send 20,000-50,000 requests in a timeframe of 1-10 seconds.
How would you solve this problem?
I started by trying Google Cloud Tasks, because it seems perfect for this: you schedule HTTP requests for a specific point in time. The docs say that there is a limit of 500 tasks per second per queue; if you need more tasks per second, you can split the tasks into multiple queues. I did this, but Google Cloud Tasks does not execute all the scheduled tasks at the given time. One queue needs 2-5 minutes to execute 500 requests that are all scheduled for the same second.
I also tried a TypeScript script running asynchronous node-fetch requests, but 5,000 requests take 77 seconds on my MacBook.
I don't think you can get 50,000 HTTP requests "in a few seconds" from your MacBook; it's better to consider a dedicated load testing tool (which can be deployed onto a GCP virtual machine in order to minimize network latency and traffic costs).
The tool choice is up to you: either you need a machine type powerful enough to conduct 50k requests "in a few seconds" from a single virtual machine, or the tool needs to be able to run in clustered mode so you can kick off several machines that send the requests together at the same moment in time.
Given that you mention TypeScript, you might want to try the k6 tool (it doesn't scale across machines, though), or check out Open Source Load Testing Tools: Which One Should You Use? to see what the other options are; none of them provides a JavaScript API, but several don't require programming knowledge at all.
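As a rough illustration of the k6 route, a minimal sketch; the URL is a placeholder and the virtual-user count is illustrative, not a tuned setup:
# Write a minimal k6 script and run it; all numbers here are examples only.
cat > loadtest.js <<'EOF'
import http from 'k6/http';

// 500 virtual users hitting the endpoint continuously for 10 seconds
export const options = { vus: 500, duration: '10s' };

export default function () {
  http.get('https://YOUR-APP.appspot.com/endpoint'); // placeholder URL
}
EOF
k6 run loadtest.js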
A tool you could consider using is siege.
It is Linux-based, and running it inside GCP avoids the additional cost of testing from a system outside GCP.
You could deploy siege on a relatively large machine, or on a few machines, inside GCP.
It is fairly simple to set up, but since you mention that you need 20-50k requests in a span of a few seconds: by default siege only allows 255 concurrent users. You can raise this limit, though, so it can fit your needs.
You would need to experiment with how many connections a machine can establish, since each machine has a certain limit based on CPU, memory, and the number of network sockets. You could keep increasing the -c number until the machine gives an "Error: system resources exhausted" error or something similar. Experiment with what your virtual machine on GCP can handle.
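A minimal sketch of that, with a placeholder URL; the siegerc path and the 'limit' directive are assumptions based on a default install:
# Raise siege's default cap of 255 concurrent users; if the 'limit' line is
# commented out in ~/.siege/siege.conf, add it rather than editing it in place.
echo "limit = 1000" >> ~/.siege/siege.conf

# 1000 concurrent users in benchmark mode (no delay between requests) for 10s.
siege -b -c 1000 -t 10S "https://YOUR-APP.appspot.com/endpoint"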

WSO2 clustering in a distributed deployment

I am trying to understand the clustering concept of WSO2. My basic understanding of a cluster is that there are 2 or more servers with the same function, with a VIP or load balancer in front. So I would like to know which of the WSO2 components can be clustered. I am trying to achieve the configuration mentioned in this diagram.
Image of Config I am trying to achieve:
Is this configuration achievable or not?
Can we cluster 2 Publisher nodes and 2 Store nodes or not?
And how do we cluster the Key Manager? Do we use the same settings as for the Identity Manager?
Should we use a port offset when running 2 components on the same server? And if yes, how do we make sure that the components are using the ports specified by the port offset?
Should we create a separate external database for each CarbonDB datasource entry in the master-datasources.xml file, or can we keep using the local H2 database for this? I have created the following databases; let me know whether I am correct in doing this or not. wso2 databases I created:
I made several copies of the wso2 binary files, as shown in the image, and copied them to the servers where I want to run 2 components on the same server. Is this the correct way of running 2 components on the same server?
For load balancing, which components should we load balance, and which ports should be used for load balancing?
That configuration is achievable, but the Analytics servers are best run on separate servers, as they use a lot of resources.
Yes, you can.
Yes, you need a port offset. If you're on Linux, you can use the netstat -pln command and filter by the server's PID (see the sketch at the end of this answer).
Every server needs a local database, and the other databases are shared, as described in https://docs.wso2.com/display/CLUSTER44x/Clustering+API+Manager+2.0.0
Having copies is one way of doing it. Another way is letting a single server act as multiple components; for example, you can run the Publisher and Store components together. You can see the recommended patterns in https://docs.wso2.com/display/AM210/Deployment+Patterns.
Except for the Traffic Manager, you can load balance every other component. For the Traffic Manager, you can use fail-over. Here are the ports you need to load balance:
Servlet port - 9443(https)/9763 (For admin console and admin services)
NIO port - 8243(https)/8280 (For API calls at gateway)
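For the port-offset point above, a minimal sketch; the offset value is illustrative, and the pid-file path assumes a default CARBON_HOME layout (the offset can also be set in repository/conf/carbon.xml):
# Start a second WSO2 instance on the same box with every port shifted by 1
# (9443 -> 9444, 9763 -> 9764, 8243 -> 8244, 8280 -> 8281, and so on).
sh bin/wso2server.sh -DportOffset=1

# Verify which ports the server actually bound to, filtered by its PID.
netstat -plnt | grep "$(cat wso2carbon.pid)"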

How to target a single back end "Node" to process a client request

I have a Java EE application that resides on multiple servers over multiple sites.
Each instance of the application produces logs locally.
The Java EE application also communicates with IBM Mainframe CICS applications via SOAP/HTTP.
These CICS applications execute in multiple CICS regions, over multiple mainframe LPARs, over multiple sites.
Like the Java EE application, the CICS applications produce their logs locally.
Attempting to troubleshoot issues is extremely time-consuming. It entails support staff manually logging onto UNIX servers and/or mainframe LPARs, tracking down all the logs related to a particular issue.
One solution we are looking at is to create a single point that collects all distributed logs from both UNIX and Mainframe.
Another area we are looking at is whether or not it's possible to drive client traffic to designated Java EE servers and IBM mainframe LPARs, right down to a particular application server node and a single IBM CICS region.
We would only want to do this for "synthetic" client calls, e.g. calls generated by our support staff, not "real" customer traffic.
Is this possible?
So, for example, say we had 10 UNIX servers distributed over two geographical sites as follows:
Geo One: UNIX_1, UNIX_3, UNIX_5, UNIX_7, UNIX_9
Geo Two: UNIX_2, UNIX_4, UNIX_6, UNIX_8, UNIX_0
Four IBM mainframe LPARs over two geographical sites as follows:
Geo One: lpar_a, lpar_c
Geo Two: lpar_b, lpar_d
Each LPAR has 8 CICS regions:
cicsa_1, cicsa_2... cicsa_8
cicsb_1, cicsb_2... cicsb_8
cicsc_1, cicsc_2... cicsc_8
cicsd_1, cicsd_2... cicsd_8
We would want to target a single route for our synthetic traffic of
unix_5 > lpar_b > cicsb_6
This way we will know where to look for the log output on all platforms.
UPDATE - 0001
By "synthetic traffic" I mean that our support staff would make client calls to our back end API's instead of "Real" front end users.
If our support staff could specify the exact route these synthetic calls traversed, they would know exactly which log files to search at each step.
These log files are very large, 10s of MB each, and there are many of them.
For example, one of our applications runs on 64 physical UNIX servers split across 2 geographical locations. Each UNIX server hosts multiple application server nodes, each node produces multiple log files, and each of these log files is 10MB+. The log files roll over, so log output can be lost very quickly.
One solution we are looking at is to create a single point that collects all distributed logs from both UNIX and Mainframe.
I believe collecting all logs into a single point is the way to go. When the log files roll over, perhaps you could SFTP them to your single point as part of that rollover process, or use NFS mounts to copy them.
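As a rough sketch of that idea; the paths, user, and collector hostname are all placeholders:
#!/bin/sh
# Hypothetical rollover hook or cron job: ship each rolled-over log to a
# central collector, tagged with this host's name, then mark it as shipped.
for f in /var/log/myapp/*.log.1; do
  [ -e "$f" ] || continue
  scp "$f" loguser@log-collector:/incoming/"$(hostname)"/ && mv "$f" "$f.shipped"
done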
I think you can make your synthetic traffic solution work, but I'm not sure what it accomplishes.
You could have your Java applications send to a synthetic URL, which is mapped by DNS to a single CICS region containing a synthetic WEBSERVICE definition, synthetic PIPELINE definition, and a synthetic URIMAP definition which in turn maps to a synthetic transaction which is defined to run locally. The local part of the definition should keep it from being routed to another CICS region in the CICSPlex.
In order to get the synthetic URIMAP you would have to run your WSDL through the IBM tooling (DFHWS2LS or DFHLS2WS) with a URI control card indicating your synthetic URL. You would also use the TRANSACTION control card to point to your synthetic transaction defined to run locally.
I think this is seriously twisting the CICS definitions such that your setup barely resembles your non-synthetic environment - and that's provided it would work at all. I am not a CICS Systems Programmer, and yours might read this and conclude my sanity is in question. Your auditors, on the other hand, may simply ask for my head on a platter.
All of the extra definitions are needed (IMHO) to defeat the function of the CICSPlex, which is to load balance incoming requests, sending them to the CICS region that is best able to service them. You need some requests to go to a specific region, short-circuiting all the load balancing being done for you.

Machine's uptime in OpenStack

I would like to know (and retrieve via REST API) the uptime of individual VMs running in OpenStack.
I was quite surprised that the OpenStack web UI has a column called "Uptime", but it actually shows the time since the VM was created. If I stop the VM, the UI shows Status=Shutoff and Power State=Shutdown, but the Uptime keeps incrementing...
Is there a "real" uptime (I mean, for a machine that is actually up)?
Can I retrieve it somehow via the OpenStack's REST API?
I saw the comment at How can I get VM instance running time in openstack via python API?, but the page with the extension mentioned there does not exist, and it looks to me like this extension will not be available in all OpenStack environments. I would like to have some standard way to retrieve the uptime.
Thanks.
(Version Havana)
I haven't seen any documentation saying this is the reason, but the nova-scheduler doesn't differentiate between a running and a powered-off instance, so your cloud can't be over-allocated or leave an instance in a position where it would be unable to be powered on. I would like to see a metric of actual system runtime as well, but at the moment the only ways to gather that would be through Ceilometer or via Rackspace's StackTach.
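In the meantime, one workaround is to approximate the running time yourself by polling the power state. A minimal sketch, assuming the nova CLI is configured (OS_* environment variables set) and using a hypothetical instance name my-vm:
UPTIME=0
while true; do
  # nova show prints a table row like: | OS-EXT-STS:power_state | 1 |
  STATE=$(nova show my-vm | awk '/OS-EXT-STS:power_state/ {print $4}')
  # power_state 1 means "Running" in nova's numeric encoding
  [ "$STATE" = "1" ] && UPTIME=$((UPTIME + 60))
  echo "accumulated running time: ${UPTIME}s"
  sleep 60
done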

Should I run carbon-relay or carbon-cache, or both?

I want to ask about the Graphite carbon daemons.
https://graphite.readthedocs.org/en/latest/carbon-daemons.html
While running carbon-relay.py, should I also run carbon-cache.py, or is the relay alone okay?
Regards,
Murtaza
Carbon-relay is used when you set up a cluster of Graphite instances. Carbon-cache, however, does not need a cluster.
Regarding carbon-cache: as we all know, write operations are expensive. Graphite therefore collects incoming data in a cache, which the Graphite webapp reads (cluster or not) to display the most recent data recorded into Graphite, irrespective of whether it has been written to disk yet.
Hope this answers your question.
Carbon-relay only resends data to one or more destinations, so it is needed only if you want to fork data to several points. Example schemas can be:
save locally and resend to another node (cache or temporary-storage and relay)
resend all data into multiple remote daemons (multiple remote storages)
save all data in multiple local daemons (parallel storage & redundancy)
save different data sets in multiple local daemons (performance)
... other cases ...
So,
in case you need to store data locally, you have to use carbon-cache.
in case you need to fork the data flow on the node, you have to use carbon-relay before, or instead of, carbon-cache.
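As a rough sketch of the "save locally and resend" schema; the ports, paths, and destination list are illustrative, and a stock carbon.conf usually already contains a [relay] section that you would edit rather than duplicate:
# Relevant carbon.conf settings: the relay listens on 2013/2014 and forwards
# everything to the local carbon-cache on its pickle port (2004).
#
# [relay]
# LINE_RECEIVER_PORT = 2013
# PICKLE_RECEIVER_PORT = 2014
# RELAY_METHOD = consistent-hashing
# DESTINATIONS = 127.0.0.1:2004:a

# Then run both daemons: the relay in front, the local cache behind it.
/opt/graphite/bin/carbon-relay.py start
/opt/graphite/bin/carbon-cache.py start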
