How does the resource manager acquire resources in HAWQ?

In HAWQ, how does the resource manager acquire resources? And what granularity of resource can be specified by the user?

The documentation covers this pretty well.

The granularity of resource is the virtual segment. There are various strategies and GUCs to control the number of virtual segments for different clusters and workloads.
For details, please refer to the documentation:
http://hdb.docs.pivotal.io/20/query/query-performance.html#topic_wv3_gzc_d5

Generally, the HAWQ resource manager acquires resources from YARN in the Hadoop cluster and organizes them into internal resources that serve queries as needed.
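For instance, virtual segments can be tuned per session through GUCs. Below is a minimal sketch, assuming a PostgreSQL-compatible driver (HAWQ speaks the PostgreSQL wire protocol) and using a GUC name from the linked documentation; verify both against your HAWQ version:

using System;
using Npgsql;

// Hedged sketch: set a virtual-segment GUC for the current session
// before running a query. Connection details and the table name are
// placeholders.
class HawqGucExample
{
    static void Main()
    {
        using var conn = new NpgsqlConnection(
            "Host=hawq-master;Port=5432;Database=postgres;Username=gpadmin");
        conn.Open();

        using var cmd = conn.CreateCommand();
        // Cap the number of virtual segments this query may use per segment host.
        cmd.CommandText = "SET hawq_rm_nvseg_perquery_perseg_limit = 6;";
        cmd.ExecuteNonQuery();

        cmd.CommandText = "SELECT count(*) FROM my_table;";
        Console.WriteLine(cmd.ExecuteScalar());
    }
}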

Application Insights: Filter website health checks?

I'm using Azure Application Insights on the free tier. We also use Amazon AWS health checks that hit a pre-determined page expecting a 200 response; they then take action if they get a different response.
All the requests from AWS are filling up telemetry pretty quickly.
Is there a simple way to filter or exclude these requests?
Can it be done from the App Insights console, or does it require modifying the telemetry collector in the actual application? I'd rather not create my own implementation of ITelemetryProcessor...
And if I am stuck going that route, would this work to filter AWS Route 53 checks?
public void Process(ITelemetry item)
{
    // Drop any telemetry flagged as synthetic (web tests, health checks, etc.).
    if (!string.IsNullOrEmpty(item.Context.Operation.SyntheticSource)) { return; }
    this.Next.Process(item);
}
Edit-Update
Has anyone seen this part of ApplicationInsights.config? I'm not certain what it means that requests will not have correlation headers.
<ExcludeComponentCorrelationHttpHeadersOnDomains>
  <!--
    Requests to the following hostnames will not be modified by adding correlation headers.
    This is only applicable if Profiler is installed via either StatusMonitor or Azure Extension.
    Add entries here to exclude additional hostnames.
    NOTE: this configuration will be lost upon NuGet upgrade.
  -->
  <Add>core.windows.net</Add>
  <Add>core.chinacloudapi.cn</Add>
  <Add>core.cloudapi.de</Add>
  <Add>core.usgovcloudapi.net</Add>
  <Add>localhost</Add>
  <Add>127.0.0.1</Add>
</ExcludeComponentCorrelationHttpHeadersOnDomains>
Does anyone have any other resources or tutorials? The only one I was able to find is: https://learn.microsoft.com/en-us/azure/application-insights/app-insights-api-filtering-sampling#filtering
It seems the simplest way to implement this is to grab a collection from web.config, define the processor in its own class file, and then insert the processor into the chain in the global config...
You'll have to write a telemetry processor like the one you have above.
However, you might want to look more closely at the synthetic source, verify its content, and throw away only the Amazon health checks instead of all synthetic traffic (you could also look at the request name, etc. to make your decision), as I'm not exactly sure what information is in those inbound requests from Amazon.
Otherwise, you might be throwing away incoming requests/dependencies/exceptions that come from your own web tests, which also show up as synthetic.
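To make that concrete, here is a hedged sketch of a processor that drops only requests for the health-check page rather than all synthetic traffic. The "/health-check" path is an assumption; substitute whatever URL your Route 53 checks actually hit:

using System;
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;

// Drops only requests that hit the (assumed) health-check URL, so
// telemetry from your own web tests still gets through.
public class HealthCheckFilter : ITelemetryProcessor
{
    private readonly ITelemetryProcessor _next;

    public HealthCheckFilter(ITelemetryProcessor next)
    {
        _next = next;
    }

    public void Process(ITelemetry item)
    {
        var request = item as RequestTelemetry;
        if (request?.Url != null &&
            request.Url.AbsolutePath.Equals("/health-check",
                StringComparison.OrdinalIgnoreCase))
        {
            return; // swallow the health-check request
        }
        _next.Process(item);
    }
}

The processor can then be registered in ApplicationInsights.config, or in code with TelemetryConfiguration.Active.TelemetryProcessorChainBuilder.Use(next => new HealthCheckFilter(next)).Build();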

Does U-SQL allow custom code to call external services?

In U-SQL custom code (code-behind or assemblies), can external services be called, e.g. Bing search or maps?
Thanks,
Nasir
This is currently not supported for the following reason:
Imagine that you write a UDF or UDO (e.g., an extractor) that calls a REST endpoint of a service that is used to getting a few calls per minute from the same originating IP address. But now you execute this user code in a U-SQL job that is scaled out over millions of rows, running possibly on hundreds of vertices concurrently. This is a - hopefully unintended - distributed denial-of-service attack against that service, and it will most likely lead to that service experiencing an outage and our IP ranges getting blocked.
Thus, we are currently closing off our containers and recommend that you use other mechanisms (like getting a data set for coordinate translations) instead.

WSO2 ESB Clustered databases and Data Services

I was able to set up a WSO2 ESB cluster, following the dedicated doc, with a manager and two workers.
I am not sure about two points:
Does each worker node need its own REGISTRY_LOCAL database?
With both workers using the same DB it works, but I'm not sure that is the intended way to do it, and the doc isn't clear about that.
Adding Data Services as a feature?
There is almost no documentation about that, but not being able to fetch more than one row is a big limitation for me. So is it possible to add this feature in a clustered environment, or is it better to separate the Data Services servers from the ESB ones?
If someone has experience with this kind of setup, I would really appreciate feedback.
Thanks
You can use the same registry database for all the members in the cluster so that they can communicate with each other using it. You may refer to my blog [1] for further information. Personally, I have never used local registry DBs for each member in a cluster setup.
You cannot fetch multiple records when you install the DSS feature into the ESB. Also, installing the feature will incur some performance overhead in the ESB. Therefore I strongly recommend using a separate DSS instance to get your work done. It also separates the concerns clearly, which is good.
[1] http://ravindraranwala.blogspot.com/2015/09/wso2-esb-worker-manager-cluster-without.html

How do you retrieve cpu usage from Node in Kubernetes via API?

I want to calculate and show node-specific CPU usage as a percentage in my own web application using the Kubernetes API.
I need the same information that Kube UI and cAdvisor display, but I want to get it through the Kubernetes API.
I have found some CPU metrics under node-ip:10255/stats, which contain a timestamp and CPU usage (total, user, and system) as large numbers I don't understand. Also, the CPU limit is reported as 1024.
How does Kube UI calculate CPU usage, and is it possible to do the same via the API?
If you use Kubernetes v1.2, there is a new, cleaner metrics summary API. From the release notes:
Kubelet exposes a new Alpha metrics API - /stats/summary in a user friendly format with reduced system overhead.
You can access the endpoint at <node-ip>:10255/stats/summary, and the detailed API objects are described here.
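For example, here is a hedged sketch of turning that endpoint into a CPU figure. The large numbers you saw are cumulative nanosecond counters, so you sample twice and take the delta; the node IP is a placeholder and the JSON field names should be verified against your kubelet version:

using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Samples the kubelet summary API twice and derives CPU usage from the
// delta of the cumulative usageCoreNanoSeconds counter.
class NodeCpu
{
    static async Task Main()
    {
        using var http = new HttpClient();

        var first = await SampleAsync(http);
        await Task.Delay(TimeSpan.FromSeconds(10));
        var second = await SampleAsync(http);

        // CPU-seconds consumed divided by wall-clock seconds = cores in use;
        // divide by the node's core count and multiply by 100 for a percentage.
        double coresUsed = (second.cpuNs - first.cpuNs) / 1e9
                           / (second.time - first.time).TotalSeconds;
        Console.WriteLine($"node is using ~{coresUsed:F2} cores");
    }

    static async Task<(double cpuNs, DateTime time)> SampleAsync(HttpClient http)
    {
        var json = await http.GetStringAsync("http://10.0.0.1:10255/stats/summary");
        using var doc = JsonDocument.Parse(json);
        var cpu = doc.RootElement.GetProperty("node").GetProperty("cpu");
        return (cpu.GetProperty("usageCoreNanoSeconds").GetDouble(),
                cpu.GetProperty("time").GetDateTime());
    }
}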
The way CPU usage metrics are usually collected in Kubernetes is with cAdvisor (https://github.com/google/cadvisor), which reads the cgroups to get metrics, mostly CPU plus some memory metrics. cAdvisor can then push its data into a metrics DB like Heapster, InfluxDB, or Prometheus. Kubernetes does not deal with metrics directly, so it does not expose them through the API; you can query the metrics DB instead. Alternatively, you can run an additional container in your pod to collect metrics and place them into your metrics DB. You can also get resource quotas through the API, but not usage. This proposal may be of some interest to you as well: https://github.com/kubernetes/kubernetes/blob/release-1.2/docs/proposals/metrics-plumbing.md.

ASP.NET: Location for storing files that should be shared between several web-applications

I have two web-applications. One is an outwards-facing application that will be accessible from the internet. The other is an application to manage the first, that will only be accessible from the intranet.
They keep their data in files on the filesystem (I think a database would be overkill for these applications).
The management application should be able to write files that the outwards-facing application can read (data files used to supply responses to requests from the internet), and the outwards-facing application should be able to write a file that the management application can read (a log file).
My question is: what is the best place to store these files?
Application Data/[Company Name]/[Product Name]?
An App_Data folder under one of the web applications?
Somewhere else?
Some factors to consider: What extra permissions does the solution need? Can the web applications discover the location without needing to know where the other application is installed?
Thanks in advance for any suggestions!
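To make the first option concrete, here is a minimal sketch of how both applications could derive the same folder without knowing where the other is installed. The company and product names are placeholders, and both application pool identities would still need read/write ACLs on the resulting directory:

using System;
using System.IO;

// Both applications call this to resolve the shared data directory under
// the machine-wide Application Data folder.
static class SharedPaths
{
    public static string GetDataDirectory()
    {
        string root = Environment.GetFolderPath(
            Environment.SpecialFolder.CommonApplicationData);
        string dir = Path.Combine(root, "CompanyName", "ProductName");
        Directory.CreateDirectory(dir); // no-op if it already exists
        return dir;
    }
}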
I know you said a database would be overkill, but a two-sided app, with one side potentially giving access to internal systems, would be much more secure (though not entirely secure) if resources were stored in a DB. It just adds an extra layer. I think internet users should be given the bare minimum of permissions on the host file system, whether through a web-layer identity such as NETWORK SERVICE or not.
Otherwise, why not a "sandbox" path on a physically separate device (one that can be disconnected if needed, e.g. on suspicious activity), such as a USB hard disk?
