Making a basic https-get request from a pipeline transform results in "connectionError".
How should one consume an API using the "requests" library to extend some data within a pipeline?
from transforms.api import Input, Output, transform_pandas
import requests

@transform_pandas(
    Output("..."),
    df=Input("..."),
)
def compute(df):
    # Random example
    response = requests.get('https://api.github.com')
    print(response.content)
    return df
This results in a "connectionError". Is this a configuration issue?
Following @nicornk's comment and the docs on external transforms, calling external APIs from within Palantir is restricted by default and requires several administrative steps. The steps to call external APIs from within pipeline transforms are:
Check the settings of your code repository.
Add the library "transforms-external-systems" to your code repository's libraries.
Adding the library adds a new icon in the control panel on the left.
Click on the new icon and import an egress policy, if one is available. Otherwise try to create a "network egress policy", which usually involves an approval process.
After having managed to import/install an egress policy into your code repository, import the needed methods from the package "transforms.external.systems" and decorate your transformation, following the docs; a rough sketch is shown below.
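A minimal sketch of what the decorated transform could look like, assuming the library exposes use_external_systems and EgressPolicy and that the placeholder egress policy RID is replaced with the one imported into your repository (the exact names, decorator stacking and parameter injection should be verified against the external transforms docs):

from transforms.api import Input, Output, transform_pandas
from transforms.external.systems import use_external_systems, EgressPolicy
import requests

@use_external_systems(
    # Placeholder RID of the network egress policy imported into the repository
    egress=EgressPolicy("ri.resource-policy-manager.global.network-egress-policy.<rid>"),
)
@transform_pandas(
    Output("..."),
    df=Input("..."),
)
def compute(egress, df):
    # Same random example as in the question, now allowed through the egress policy
    response = requests.get('https://api.github.com')
    print(response.content)
    return df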
Related
We have a microservice that needs to be integration tested (real calls, but no network communication with anything outside of the test namespace in kubernetes) in our pipeline. It also relies on an external gRPC server which we have no control over.
Above is a picture of what we'd like to have happen. The white box on the left is code that provides the Microservice Boundary with 'external' data. It then keeps calling the Code via REST until it gets back the proper number of records or it times out. The Code pulls records from an internal database, as well as data associated to those records from a gRPC call. Since we do not own the gRPC service, but are doing integration tests, we need a few pre-defined responses to the two gRPC services we call (blue box).
Since our integration tests are self-contained right now, and we don't want to write an entirely new actual gRPC server implementation just to mimic calls, is there a way to stand up a real gRPC server and configure it to return predefined responses? The request is pretty much a mock setup, except with an actual server.
We need to be able to:
give the server multiple proto files to interpret and have it expose those as endpoints; the proto files must be able to have different package names
configure the responses to each call using files we can store in source control
run in a Linux Docker container (no Windows)
I did find GripMock, which seemed to be almost exactly what we need, but it only serves one proto file per container. It supposedly can serve more than one, but I can't get that to work, and their example that serves two files implies each proto file must have the same package name, which will likely never happen in our scenarios. In the meantime we are using it, but if we have 10 gRPC call dependencies, we now have to run 10 GripMock servers.
Wikipedia contains a list of API mocking tools. Looking at that list today there is a commercial tool that supports gRPC called Traffic Parrot which allows you to create gRPC mocks based on your Proto files. You can give it multiple proto files, store the mocks in Git and run the tool in Docker.
There are also open-source tools like GripMock, but it does not generate stubs based on proto files; you have to create them manually. Also, as of today the project has not kept up with proto and gRPC developments, e.g. the package-name issue you discovered yourself above (it works only if the package names in the different proto files are the same). There are a few other open-source tools like grpc-wiremock, grpc-mock or bloomrpc-mock, but they still lack widespread adoption and hence might be risky to adopt for an important enterprise project.
Keep in mind that the generated mock will only be a test double; it will not replicate the full behaviour of the system the proto file corresponds to. If you also want to partially replicate the semantics of the messages, consider recording the gRPC messages to create the mocks; that way you can see sample data as well.
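For reference, such a test double is not much code if you hand-roll it. Below is a minimal Python sketch using grpcio, assuming example_pb2 / example_pb2_grpc were generated by protoc from the proto file being mocked (module, service and message names are illustrative):

from concurrent import futures
import grpc

# Hypothetical modules generated by protoc from the proto file being mocked
import example_pb2
import example_pb2_grpc

class StubExampleService(example_pb2_grpc.ExampleServiceServicer):
    def Ex1(self, request, context):
        # Return a canned response instead of real business logic
        return example_pb2.ExampleResponse(msg="the response message")

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    example_pb2_grpc.add_ExampleServiceServicer_to_server(StubExampleService(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()

if __name__ == "__main__":
    serve()

Servicers generated from several different proto files (with different package names) can be registered on the same server instance, so one container can cover multiple gRPC dependencies.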
Take a look at this JS library which hopefully does what you need:
https://github.com/alenon/grpc-mock-server
Usage example:
private static readonly PROTO_PATH: string = __dirname + "/example.proto";
private static readonly PKG_NAME: string = "com.alenon.example";
private static readonly SERVICE_NAME: string = "ExampleService";
...
const implementations = {
    ex1: (call: any, callback: any) => {
        const response: any =
            new this.proto.ExampleResponse.constructor({msg: "the response message"});
        callback(null, response);
    },
};
this.server.addService(PROTO_PATH, PKG_NAME, SERVICE_NAME, implementations);
this.server.start();
I have created a brand new free tier project, cloned the Puppeteer Firebase Functions demo repository and only changed the default project name in the .firebaserc file.
When I run the simple test or version functions I get the correct result. When I open the .com/screenshot page without any parameter I get the correct ("Please provide a URL...") response.
But when I try any URL, e.g. .com/screenshot?url=https://en.wikipedia.org/wiki/Google, I get Error: net::ERR_NAME_RESOLUTION_FAILED at https://en.wikipedia.org/wiki/Google thrown in the response.
I tried looking for any name-resolution errors related to Puppeteer but could not find anything. Could this be a problem with using the free tier?
The free Spark payment plan restricts all outgoing connections except to API endpoints that are fully controlled by Google. As a result, I expect that Puppeteer would not be able to make any outgoing connections to external web sites.
In Cloudera, is there a way to update a list of configurations at once using the CM API or cURL?
Currently I am updating them one by one using the CM API call below.
services_api_instance.update_service_config()
How can we update all configurations stored in a JSON/config file in one go?
The CM API endpoint you're looking for is PUT /cm/deployment. From the CM API documentation:
Apply the supplied deployment description to the system. This will create the clusters, services, hosts and other objects specified in the argument. This call does not allow for any merge conflicts. If an entity already exists in the system, this call will fail. You can request, however, that all entities in the system are deleted before instantiating the new ones.
This basically allows you to configure all your services with one call rather than doing them one at a time.
If you are using services that require a database (Hive, Hue, Oozie, ...), make sure you set them up before you call the API. It expects all the parameters you pass in to work, so external dependencies must be resolved first.
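As a rough sketch with the requests library (host, port, API version, credentials and file name are placeholders, and the deleteCurrentDeployment flag should be checked against your CM API version):

import json
import requests

# Load the full deployment description, e.g. one previously exported via GET /cm/deployment
with open("deployment.json") as f:
    deployment = json.load(f)

resp = requests.put(
    "http://cm-host:7180/api/v19/cm/deployment",
    params={"deleteCurrentDeployment": "true"},  # remove existing entities first
    auth=("admin", "admin"),
    json=deployment,
)
resp.raise_for_status()
print(resp.json())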
I have a scheduled job (I'm using the apscheduler.scheduler lib) that needs access to the Plone site object, but I don't have a context in this case. I subscribed to the IProcessingStart event, but unfortunately the getSite() function returns None.
Also, is there a programmatic way to obtain a specific Plone site from the Zope server root?
Additional info:
I have a job like this:
from apscheduler.scheduler import Scheduler
from Products.CMFCore.utils import getToolByName
from zope.site import hooks

sched = Scheduler()

@sched.cron_schedule(day_of_week="*", hour="9", minute="0")
def myjob():
    site = hooks.getSite()
    print site
    print site.absolute_url()
    catalogtool = getToolByName(site, "portal_catalog")
    print catalogtool
The site variable is always None inside an APScheduler job, and we need information about the site for the job to run correctly.
We have avoided exposing this as a public URL, because a user could then trigger the job directly.
Build a context first with setSite(), and perhaps a request object:
from zope.app.component.hooks import setSite
from Testing.makerequest import makerequest
app = makerequest(app)
site = app[site_id]
setSite(site)
This does require that you open a ZODB connection yourself and traverse to the site object yourself.
However, it is not clear how you are accessing the Plone site from your scheduler. Instead of running a full new Zope process, consider calling a URL from your scheduling job. If you integrated APScheduler into your Zope process, you'd have to create a new ZODB connection in the job, traverse to the Plone site from the root, and use the above method to set up the site hooks (needed for a lot of local components anyway).
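A rough sketch of that second option, assuming APScheduler runs inside the Zope process and the Plone site id is "Plone" (both are assumptions; adapt the names and connection handling to your setup):

import transaction
import Zope2
from Products.CMFCore.utils import getToolByName
from Testing.makerequest import makerequest
from zope.app.component.hooks import setSite

def myjob():
    # Zope2.app() opens a fresh ZODB connection and returns the Zope root
    app = makerequest(Zope2.app())
    try:
        site = app['Plone']           # placeholder site id
        setSite(site)                 # register the local component site
        catalog = getToolByName(site, "portal_catalog")
        print len(catalog())          # do the real work here
        transaction.commit()
    finally:
        setSite(None)
        app._p_jar.close()            # close the connection opened by Zope2.app()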
In the book 'ActionScript Developer’s Guide to Robotlegs' it is said that services should use parsers for transforming data. In which package should I put a parser for a service in com.stackoverflow.services.fooService?
There is no strict rule for this; do whatever works well for you.
A possible setup could be:
besides:
com.stackoverflow.control
com.stackoverflow.event
com.stackoverflow.model
you could have:
com.stackoverflow.remote
with a further split according to the different APIs that are used (divided by endpoint and type, e.g. JSON/AMF/SOAP/...):
com.stackoverflow.remote.api -> own back-end
com.stackoverflow.remote.twitterapi
from there you can go further:
com.stackoverflow.remote.api.service OR com.stackoverflow.remote.api.resource
com.stackoverflow.remote.api.host -> endpoint setup, security,... to inject in the service class
com.stackoverflow.remote.api.parser -> data parsers
Additionally, you could include a control package containing the commands that configure the API (setting up events, hosts, parser injection, ...) and wire it into the main context.