OpenStack SDK - How to create image with Kernel id and Ramdisk parameters? - openstack

I've been trying to create an OpenStack image informing the Kernel Id and Ramdisk Id, using the OpenStack Unified SDK (https://github.com/openstack/python-openstacksdk), but without success. I know this is possible, because the OpenStack CLI has these parameters, as shown on this page (http://docs.openstack.org/cli-reference/glance.html#glance-image-create), where the CLI has the "--kernel-id" and "--ramdisk-id" parameters. I've used these parameters in the terminal and confirmed they work, but I need to use them in Python.
I'm trying to use the upload_image method, as described here http://developer.openstack.org/sdks/python/openstacksdk/users/proxies/image.html#image-api-v2, but I can't get the attrs parameter right. The documentation only says it is supposed to be a dictionary. Here is the code I'm using:
...
atrib = {
    'properties': {
        'kernel_id': 'd84e1f2b-8d8c-4a4a-8858-77a8d5a93cb1',
        'ramdisk_id': 'cfef18e0-006e-477a-a098-593d43435a1e'
    }
}
with open(file) as fimage:
    image = image_service.upload_image(
        name=name,
        data=fimage,
        disk_format='qcow2',
        container_format='bare',
        **atrib)
....
And here is the error I'm getting:
File "builder.py", line 121, in main
**atrib
File "/usr/lib/python2.7/site-packages/openstack/image/v2/_proxy.py", line 51, in upload_image
**attrs)
File "/usr/lib/python2.7/site-packages/openstack/proxy2.py", line 193, in _create
return res.create(self.session)
File "/usr/lib/python2.7/site-packages/openstack/resource2.py", line 570, in create
json=request.body, headers=request.headers)
File "/usr/lib/python2.7/site-packages/keystoneauth1/session.py", line 675, in post
return self.request(url, 'POST', **kwargs)
File "/usr/lib/python2.7/site-packages/openstack/session.py", line 52, in map_exceptions_wrapper
http_status=e.http_status, cause=e)
openstack.exceptions.HttpException: HttpException: Bad Request, 400 Bad Request
Provided object does not match schema 'image': {u'kernel_id': u'd84e1f2b-8d8c-4a4a-8858-77a8d5a93cb1', u'ramdisk_id': u'cfef18e0-006e-477a-a098-593d43435a1e'} is not of type 'string' Failed validating 'type' in schema['additionalProperties']: {'type': 'string'} On instance[u'properties']: {u'kernel_id': u'd84e1f2b-8d8c-4a4a-8858-77a8d5a93cb1', u'ramdisk_id': u'cfef18e0-006e-477a-a098-593d43435a1e'}
I already tried the update_image method, but without success; passing kernel id and ramdisk id as strings creates the instance, but it does not boot.
Does anyone know how to solve this?

Which version of the Glance API are you using?
I have read the code in openstackclient/image/v1/images.py and openstackclient/v1/shell.py:
## in shell.py
def do_image_create(gc, args):
    ...
    fields = dict(filter(lambda x: x[1] is not None, vars(args).items()))
    raw_properties = fields.pop('property')
    fields['properties'] = {}
    for datum in raw_properties:
        key, value = datum.split('=', 1)
        fields['properties'][key] = value
    ...
    image = gc.images.create(**fields)

## in images.py
def create(self, **kwargs):
    ...
    for field in kwargs:
        if field in CREATE_PARAMS:
            fields[field] = kwargs[field]
        elif field == 'return_req_id':
            continue
        else:
            msg = 'create() got an unexpected keyword argument \'%s\''
            raise TypeError(msg % field)
    hdrs = self._image_meta_to_headers(fields)
    ...
    resp, body = self.client.post('/v1/images',
                                  headers=hdrs,
                                  data=image_data)
    ...
and openstackclient/v2/shell.py, openstackclient/image/v2/images.py (and I have debugged this too):
## in shell.py
def do_image_create(gc, args):
    ...
    raw_properties = fields.pop('property', [])
    for datum in raw_properties:
        key, value = datum.split('=', 1)
        fields[key] = value
    ...
    image = gc.images.create(**fields)

## in images.py
def create(self, **kwargs):
    """Create an image."""
    url = '/v2/images'
    image = self.model()
    for (key, value) in kwargs.items():
        try:
            setattr(image, key, value)
        except warlock.InvalidOperation as e:
            raise TypeError(utils.exception_to_str(e))
    resp, body = self.http_client.post(url, data=image)
    ...
It seems that you can create an image your way with API version 1.0, but with version 2.0 you should pass kernel_id and ramdisk_id as top-level attributes, as below:
atrib = {
    'kernel_id': 'd84e1f2b-8d8c-4a4a-8858-77a8d5a93cb1',
    'ramdisk_id': 'cfef18e0-006e-477a-a098-593d43435a1e'
}
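For illustration, this is roughly what the question's call would look like with those attributes passed flat rather than nested under 'properties' (a sketch only; name, file and image_service are the names from the question):
with open(file) as fimage:
    # Sketch: kernel_id / ramdisk_id passed as top-level string attrs
    image = image_service.upload_image(
        name=name,
        data=fimage,
        disk_format='qcow2',
        container_format='bare',
        **atrib)   # atrib is the flat dict shown above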
However, the OpenStack SDK apparently can't pass those two arguments through to the request (because there is no Body definition for them in openstack/image/v2/image.py), so you would have to modify the OpenStack SDK to support this.
BTW, the OpenStack code differs a little from version to version, but many things are the same.

Related

Why is the YouTube API v3 inconsistent with the number of comments it lets you download before an error 400?

I am downloading YouTube comments with a python script that uses API keys and the YouTube Data API V3, but sooner or later I run into the following error:
{'error': {'code': 400, 'message': "The API server failed to successfully process the request. While this can be a transient error, it usually indicates that the request's input is invalid. Check the structure of the commentThread resource in the request body to ensure that it is valid.", 'errors': [{'message': "The API server failed to successfully process the request. While this can be a transient error, it usually indicates that the request's input is invalid. Check the structure of the commentThread resource in the request body to ensure that it is valid.", 'domain': 'youtube.commentThread', 'reason': 'processingFailure', 'location': 'body', 'locationType': 'other'}]}}
I am using the following code:
import argparse
import requests
import json
import time

start_time = time.time()


class YouTubeApi():

    YOUTUBE_COMMENTS_URL = 'https://www.googleapis.com/youtube/v3/commentThreads'
    comment_counter = 0

    def is_error_response(self, response):
        error = response.get('error')
        if error is None:
            return False
        print("API Error: "
              f"code={error['code']} "
              f"domain={error['errors'][0]['domain']} "
              f"reason={error['errors'][0]['reason']} "
              f"message={error['errors'][0]['message']!r}")
        print(self.comment_counter)
        return True

    def format_comments(self, results, likes_required):
        comments_list = []
        try:
            for item in results["items"]:
                comment = item["snippet"]["topLevelComment"]
                likes = comment["snippet"]["likeCount"]
                if likes < likes_required:
                    continue
                author = comment["snippet"]["authorDisplayName"]
                text = comment["snippet"]["textDisplay"]
                str = "Comment by {}:\n \"{}\"\n\n".format(author, text)
                str = str.encode('ascii', 'replace').decode()
                comments_list.append(str)
                self.comment_counter += 1
                print("Comments downloaded:", self.comment_counter, end="\r")
        except(KeyError):
            print(results)
        return comments_list

    def get_video_comments(self, video_id, likes_required):
        with open("API_keys.txt", "r") as f:
            key_list = f.readlines()
        comments_list = []
        key_list = [key.strip('/n') for key in key_list]
        params = {
            'part': 'snippet,replies',
            'maxResults': 100,
            'videoId': video_id,
            'textFormat': 'plainText',
            'key': key_list[0]
        }
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
        }
        comments_data = requests.get(self.YOUTUBE_COMMENTS_URL, params=params, headers=headers)
        results = comments_data.json()
        if self.is_error_response(results):
            return []
        nextPageToken = results.get("nextPageToken")
        comments_list = []
        comments_list += self.format_comments(results, likes_required)
        while nextPageToken:
            params.update({'pageToken': nextPageToken})
            if self.comment_counter <= 900000:
                params.update({'key': key_list[0]})
            elif self.comment_counter <= 1800000:
                params.update({'key': key_list[1]})
            elif self.comment_counter <= 2700000:
                params.update({'key': key_list[2]})
            elif self.comment_counter <= 3600000:
                params.update({'key': key_list[3]})
            elif self.comment_counter <= 4500000:
                params.update({'key': key_list[4]})
            else:
                params.update({'key': key_list[5]})
            if self.comment_counter % 900001 == 0:
                print(params["key"])
            comments_data = requests.get(self.YOUTUBE_COMMENTS_URL, params=params, headers=headers)
            results = comments_data.json()
            if self.is_error_response(results):
                return comments_list
            nextPageToken = results.get("nextPageToken")
            comments_list += self.format_comments(results, likes_required)
        return comments_list

    def get_video_id_list(self, filename):
        try:
            with open(filename, 'r') as file:
                URL_list = file.readlines()
        except FileNotFoundError:
            exit("File \"" + filename + "\" not found")
        list = []
        for url in URL_list:
            if url == "\n":  # ignore empty lines
                continue
            if url[-1] == '\n':  # delete '\n' at the end of line
                url = url[:-1]
            if url.find('='):  # get id
                id = url[url.find('=') + 1:]
                list.append(id)
            else:
                print("Wrong URL")
        return list


def main():
    yt = YouTubeApi()
    parser = argparse.ArgumentParser(add_help=False, description=("Download youtube comments from many videos into txt file"))
    required = parser.add_argument_group("required arguments")
    optional = parser.add_argument_group("optional arguments")
    optional.add_argument("--likes", '-l', help="The amount of likes a comment needs to be saved", type=int)
    optional.add_argument("--input", '-i', help="URL list file name")
    optional.add_argument("--output", '-o', help="Output file name")
    optional.add_argument("--help", '-h', help="Help", action='help')
    args = parser.parse_args()
    # --------------------------------------------------------------------- #
    likes = 0
    if args.likes:
        likes = args.likes
    input_file = "URL_list.txt"
    if args.input:
        input_file = args.input
    output_file = "Comments.txt"
    if args.output:
        output_file = args.output
    list = yt.get_video_id_list(input_file)
    if not list:
        exit("No URLs in input file")
    try:
        vid_counter = 0
        with open(output_file, "a") as f:
            for video_id in list:
                vid_counter += 1
                print("Downloading comments for video ", vid_counter, ", id: ", video_id, sep='')
                comments = yt.get_video_comments(video_id, likes)
                if comments:
                    for comment in comments:
                        f.write(comment)
        print('\nDone!')
    except KeyboardInterrupt:
        exit("User Aborted the Operation")
    # --------------------------------------------------------------------- #


if __name__ == '__main__':
    main()
In another thread, it was discovered that Google does not currently permit downloading all the comments on a popular video; however, you would expect it to cut off at the same point each time. Instead, I have found that it can range anywhere between 1.5 million and 200k comments downloaded before it returns a code 400. Is this due to a bug in my code, or is the YouTube API rejecting my request because it is clear that it is a script? Would adding a time.sleep clause help with this?
(I bring forward this answer -- which I prepared for the question above at the time of its initial post -- because my assertions below seem to be confirmed once again by recent SO posts of this very kind.)
Your observations are correct. But, unfortunately, nobody but Google itself is able to provide a sound and complete answer to your question. We non-Googlers (myself included!), and even the Googlers themselves (since they all sign NDAs), can only guess about the things implied.
Here is my educated guess, based on the investigations I made recently when responding to a very much related question (which you quoted above, yourself!):
As you already know, the API uses pagination to return result sets whose cardinality exceeds the internal limit of 50 or, in some cases, 100 items per API endpoint invocation.
If you log the nextPageToken property that you obtain from CommentThreads.list via your results object, you'll see that those page tokens get bigger and bigger. Each such page token has to be passed on to the next CommentThreads.list call as the pageToken parameter.
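To observe this yourself, here is a minimal, hypothetical sketch of such logging (log_token_growth is an illustrative name; results is the parsed JSON the script already obtains from each CommentThreads.list call):
def log_token_growth(results):
    """Hypothetical helper: show how long nextPageToken has become."""
    token = results.get("nextPageToken")
    if token is not None:
        # Very long tokens inflate the GET request line; if an internal
        # request-size limit exists, that is where processingFailure surfaces.
        print("nextPageToken length:", len(token), "characters")
Calling it right after each results = comments_data.json() in the paging loop would show the growth described above.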
The problem is that internally (not specified publicly, not documented) the API has a limit on the sheer length of the HTTP requests it accepts from its callers. (This happens for various reasons; e.g. security.) Therefore, when a given page token is sufficiently long, the HTTP request that the API user issues will exceed that internal limit, producing an internal error. That error surfaces to the API caller as the processingFailure error that you've encountered.
Many questions remain to be answered (e.g. why is it that the page tokens have unbounded length?), but, again, those questions belong very much to the internal realm of the back-end system behind the API we're using. And those questions cannot be answered publicly, since they are very much Google's internal business.

Openstack Not able to connect to using rest api

I have a local install of OpenStack on my VirtualBox. I am trying to use the Libcloud API to connect and get a list of images, flavors, etc.
Below is the code that I am trying to execute
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver
# Authentication information so you can authenticate to DreamCompute
# copy the details from the OpenStack RC file
# https://dashboard.dreamcompute.com/project/access_and_security/api_access/openrc/
auth_username = 'admin'
auth_password = 'f882e2f4eaad434c'
TENANT_NAME = 'admin'
project_name = 'admin'
auth_url = 'http://192.168.56.101:5000/v3/tokens'
region_name = 'RegionOne'
provider = get_driver(Provider.OPENSTACK)
conn = provider(auth_username,
                auth_password,
                ex_force_auth_url=auth_url,
                ex_force_auth_version='2.0_password',
                ex_tenant_name=project_name,
                ex_force_service_type='compute',
                ex_force_service_name='compute',
                ex_force_base_url='http://192.168.56.101:8774/v2.1/29a8949bc3a04bfead0654be8e552017')
# Get the image that we want to use by its id
# NOTE: the image_id may change. See the documentation to find
# all the images available to your user
image_id = '4525d442-e9f4-4d19-889f-49ab03be93df'
image = conn.get_image(image_id)
# Get the flavor that we want to use by its id
flavor_id = '100'
flavor = conn.ex_get_size(flavor_id)
# Create the node with the name “PracticeInstance”
# and the size and image we chose above
instance = conn.create_node(name='PracticeInstance', image=image, size=flavor)
When I run the above code I am getting below error:
C:\Python dev\website\music\openstack>python openstack.py
Traceback (most recent call last):
File "openstack.py", line 30, in <module>
image = conn.get_image(image_id)
File "C:\Users\C5265680\AppData\Local\Programs\Python\Python36\lib\site-packages\libcloud\compute\drivers\openstack.py", line 2028, in get_image
'/images/%s' % (image_id,)).object['image'])
File "C:\Users\C5265680\AppData\Local\Programs\Python\Python36\lib\site-packages\libcloud\common\openstack.py", line 223, in request
raw=raw)
File "C:\Users\C5265680\AppData\Local\Programs\Python\Python36\lib\site-packages\libcloud\common\base.py", line 536, in request
action = self.morph_action_hook(action)
File "C:\Users\C5265680\AppData\Local\Programs\Python\Python36\lib\site-packages\libcloud\common\openstack.py", line 290, in morph_action_hook
self._populate_hosts_and_request_paths()
File "C:\Users\C5265680\AppData\Local\Programs\Python\Python36\lib\site-packages\libcloud\common\openstack.py", line 324, in _populate_hosts_and_request_paths
osa = osa.authenticate(**kwargs) # may throw InvalidCreds
File "C:\Users\C5265680\AppData\Local\Programs\Python\Python36\lib\site-packages\libcloud\common\openstack_identity.py", line 855, in authenticate
return self._authenticate_2_0_with_password()
File "C:\Users\C5265680\AppData\Local\Programs\Python\Python36\lib\site-packages\libcloud\common\openstack_identity.py", line 880, in _authenticate_2_0_with_password
return self._authenticate_2_0_with_body(reqbody)
File "C:\Users\C5265680\AppData\Local\Programs\Python\Python36\lib\site-packages\libcloud\common\openstack_identity.py", line 885, in _authenticate_2_0_with_body
method='POST')
File "C:\Users\C5265680\AppData\Local\Programs\Python\Python36\lib\site-packages\libcloud\common\base.py", line 637, in request
response = responseCls(**kwargs)
File "C:\Users\C5265680\AppData\Local\Programs\Python\Python36\lib\site-packages\libcloud\common\base.py", line 157, in __init__
message=self.parse_error())
libcloud.common.exceptions.BaseHTTPError: {"error": {"message": "The resource could not be found.", "code": 404, "title": "Not Found"}}
I have checked the logs on my server at /var/log/keystone and they do not show any error, so I am guessing that I am able to log in.
Also there are many examples which show that after the above steps I should be able to get list of images/flavors/servers.
Not sure why I am not able to connect. Can someone please help me with this?
I think you should modify the auth_url and provider arguments.
The following argument set works in my environment:
auth_username = 'admin'
auth_password = 'f882e2f4eaad434c'
TENANT_NAME = 'admin'
project_name = 'admin'
auth_url = 'http://192.168.56.101:5000'
region_name = 'RegionOne'
provider = get_driver(Provider.OPENSTACK)
conn = provider(auth_username,
                auth_password,
                ex_force_auth_url=auth_url,
                ex_force_auth_version='2.0_password',
                ex_tenant_name=project_name)
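For completeness, a small sketch of how the connection could then be exercised; list_images() and list_sizes() are standard libcloud compute driver calls, and the image and flavor ids below are the question's example values:
# Sketch: verify the connection works before creating a node.
for image in conn.list_images():
    print(image.id, image.name)
for size in conn.list_sizes():
    print(size.id, size.name)

# Then look up the specific image/flavor and boot, as in the question.
image = conn.get_image('4525d442-e9f4-4d19-889f-49ab03be93df')
flavor = conn.ex_get_size('100')
instance = conn.create_node(name='PracticeInstance', image=image, size=flavor)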
Updated 2017.11.16
#user8040338 Your error message is the first clue:
The resource could not be found. with status code 404; the most common cause of this message and status code is a wrongly formatted REST API URL.
Firstly, you need to check the Keystone v2.0 REST API format.
At the same time, check the Libcloud reference again.
For instance, the argument ex_force_auth_version specified API version 2.0_password (v2), but the auth_url variable contained a URL including the version resource /v3, which is the wrong version and the wrong usage for the libcloud argument you specified. auth_url should be the base URL, without the version path.
A similar check of each API argument should be repeated until the issue is solved.
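Independently of libcloud, the Keystone v2.0 endpoint can be sanity-checked with a plain HTTP request. The following sketch assumes the host, tenant and credentials from the question; a 404 here would reproduce the wrong-URL symptom, while a token response confirms the base URL and credentials:
import requests

# Sketch: POST directly to the Keystone v2.0 tokens endpoint.
auth_url = 'http://192.168.56.101:5000/v2.0/tokens'
payload = {
    'auth': {
        'tenantName': 'admin',
        'passwordCredentials': {
            'username': 'admin',
            'password': 'f882e2f4eaad434c',
        },
    }
}
resp = requests.post(auth_url, json=payload)
print(resp.status_code)
print(resp.json())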

Ejabberd: error in simple module to handle offline messages

I have an Ejabberd 17.01 installation where I need to push a notification in case a recipient is offline. This seems to be a common task, and solutions using a customized Ejabberd module can be found everywhere. However, I just can't get it running. First, here's my module:
-module(mod_offline_push).
-behaviour(gen_mod).

-export([start/2, stop/1]).
-export([push_message/3]).

-include("ejabberd.hrl").
-include("logger.hrl").
-include("jlib.hrl").

start(Host, _Opts) ->
    ?INFO_MSG("mod_offline_push loading", []),
    ejabberd_hooks:add(offline_message_hook, Host, ?MODULE, push_message, 10),
    ok.

stop(Host) ->
    ?INFO_MSG("mod_offline_push stopping", []),
    ejabberd_hooks:add(offline_message_hook, Host, ?MODULE, push_message, 10),
    ok.

push_message(From, To, Packet) ->
    ?INFO_MSG("mod_offline_push -> push_message", [To]),
    Type = fxml:get_tag_attr_s(<<"type">>, Packet), % Supposedly since 16.04
    %Type = xml:get_tag_attr_s(<<"type">>, Packet), % Supposedly since 13.XX
    %Type = xml:get_tag_attr_s("type", Packet),
    %Type = xml:get_tag_attr_s(list_to_binary("type"), Packet),
    ?INFO_MSG("mod_offline_push -> push_message", []),
    ok.
The problem is the Type = ... line in push_message; without that line, the last info message is logged (so the hook definitely works). When browsing online, I can find all kinds of function calls to extract elements from Packet. As far as I understand, this changed over time with new releases. But none of them work; all variants lead to some kind of error. The current way returns:
2017-01-25 20:38:08.701 [error] <0.21678.0>#ejabberd_hooks:run1:332 {function_clause,[{fxml,get_tag_attr_s,[<<"type">>,{message,<<>>,normal,<<>>,{jid,<<"homer">>,<<"xxx.xxx.xxx.xxx">>,<<"conference">>,<<"homer">>,<<"xxx.xxx.xxx.xxx">>,<<"conference">>},{jid,<<"carl">>,<<"xxx.xxx.xxx.xxx">>,<<>>,<<"carl">>,<<"xxx.xxx.xxx.xxx">>,<<>>},[],[{text,<<>>,<<"sfsdfsdf">>}],undefined,[],#{}}],[{file,"src/fxml.erl"},{line,169}]},{mod_offline_push,push_message,3,[{file,"mod_offline_push.erl"},{line,33}]},{ejabberd_hooks,safe_apply,3,[{file,"src/ejabberd_hooks.erl"},{line,382}]},{ejabberd_hooks,run1,3,[{file,"src/ejabberd_hooks.erl"},{line,329}]},{ejabberd_sm,route,3,[{file,"src/ejabberd_sm.erl"},{line,126}]},{ejabberd_local,route,3,[{file,"src/ejabberd_local.erl"},{line,110}]},{ejabberd_router,route,3,[{file,"src/ejabberd_router.erl"},{line,87}]},{ejabberd_c2s,check_privacy_route,5,[{file,"src/ejabberd_c2s.erl"},{line,1886}]}]}
running hook: {offline_message_hook,[{jid,<<"homer">>,<<"xxx.xxx.xxx.xxx">>,<<"conference">>,<<"homer">>,<<"xxx.xxx.xxx.xxx">>,<<"conference">>},{jid,<<"carl">>,<<"xxx.xxx.xxx.xxx">>,<<>>,<<"carl">>,<<"xxx.xxx.xxx.xxx">>,<<>>},{message,<<>>,normal,<<>>,{jid,<<"homer">>,<<"xxx.xxx.xxx.xxx">>,<<"conference">>,<<"homer">>,<<"xxx.xxx.xxx.xxx">>,<<"conference">>},{jid,<<"carl">>,<<"xxx.xxx.xxx.xxx">>,<<>>,<<"carl">>,<<"xxx.xxx.xxx.xxx">>,<<>>},[],[{text,<<>>,<<"sfsdfsdf">>}],undefined,[],#{}}]}
I'm new to Ejabberd and Erlang, so I cannot really interpret the error, but line 33 as mentioned in {mod_offline_push,push_message,3,[{file,"mod_offline_push.erl"}, {line,33}]} is definitely the line calling get_tag_attr_s.
UPDATE 2017/01/27: Since this cost me a lot of headache -- and I'm still not perfectly happy -- I post my current working module here in the hope it might help others. My setup is Ejabberd 17.01 running on Ubuntu 16.04. Most stuff I tried and failed with seems to be for older versions of Ejabberd:
-module(mod_fcm_fork).
-behaviour(gen_mod).

%% public methods for this module
-export([start/2, stop/1]).
-export([push_notification/3]).

%% included for writing to ejabberd log file
-include("ejabberd.hrl").
-include("logger.hrl").
-include("xmpp_codec.hrl").

%% Copied this record definition from jlib.hrl
%% Including "xmpp_codec.hrl" and "jlib.hrl" resulted in errors ("XYZ already defined")
-record(jid, {user = <<"">> :: binary(),
              server = <<"">> :: binary(),
              resource = <<"">> :: binary(),
              luser = <<"">> :: binary(),
              lserver = <<"">> :: binary(),
              lresource = <<"">> :: binary()}).

start(Host, _Opts) ->
    ?INFO_MSG("mod_fcm_fork loading", []),
    % Providing the most basic API to the clients and servers that are part of the Inets application
    inets:start(),
    % Add hook to handle message to user who are offline
    ejabberd_hooks:add(offline_message_hook, Host, ?MODULE, push_notification, 10),
    ok.

stop(Host) ->
    ?INFO_MSG("mod_fcm_fork stopping", []),
    ejabberd_hooks:add(offline_message_hook, Host, ?MODULE, push_notification, 10),
    ok.

push_notification(From, To, Packet) ->
    % Generate JID of sender and receiver
    FromJid = lists:concat([binary_to_list(From#jid.user), "#", binary_to_list(From#jid.server), "/", binary_to_list(From#jid.resource)]),
    ToJid = lists:concat([binary_to_list(To#jid.user), "#", binary_to_list(To#jid.server), "/", binary_to_list(To#jid.resource)]),
    % Get message body
    MessageBody = Packet#message.body,
    % Check if MessageBody is not empty
    case MessageBody/=[] of
        true ->
            % Get first element (no idea when this list can have more elements)
            [First | _ ] = MessageBody,
            % Get message data and convert to string
            MessageBodyText = binary_to_list(First#text.data),
            send_post_request(FromJid, ToJid, MessageBodyText);
        false ->
            ?INFO_MSG("mod_fcm_fork -> push_notification: MessageBody is empty",[])
    end,
    ok.

send_post_request(FromJid, ToJid, MessageBodyText) ->
    %?INFO_MSG("mod_fcm_fork -> send_post_request -> MessageBodyText = ~p", [Demo]),
    Method = post,
    PostURL = gen_mod:get_module_opt(global, ?MODULE, post_url,fun(X) -> X end, all),
    % Add data as query string. Not nice, query body would be preferable
    % Problem: message body itself can be in a JSON string, and I couldn't figure out the correct encoding.
    URL = lists:concat([binary_to_list(PostURL), "?", "fromjid=", FromJid,"&tojid=", ToJid,"&body=", edoc_lib:escape_uri(MessageBodyText)]),
    Header = [],
    ContentType = "application/json",
    Body = [],
    ?INFO_MSG("mod_fcm_fork -> send_post_request -> URL = ~p", [URL]),
    % ADD SSL CONFIG BELOW!
    %HTTPOptions = [{ssl,[{versions, ['tlsv1.2']}]}],
    HTTPOptions = [],
    Options = [],
    httpc:request(Method, {URL, Header, ContentType, Body}, HTTPOptions, Options),
    ok.
Actually it fails on the second argument, Packet, that you pass to fxml:get_tag_attr_s in the push_message function:
{message,<<>>,normal,<<>>,
{jid,<<"homer">>,<<"xxx.xxx.xxx.xxx">>,<<"conference">>,
<<"homer">>,<<"xxx.xxx.xxx.xxx">>,<<"conference">>},
{jid,<<"carl">>,<<"xxx.xxx.xxx.xxx">>,<<>>,<<"carl">>,
<<"xxx.xxx.xxx.xxx">>,<<>>},
[],
[{text,<<>>,<<"sfsdfsdf">>}],
undefined,[],#{}}
because it is not an xmlel.
It looks like it is the record "message" defined in tools/xmpp_codec.hrl,
with an empty (<<>>) id and type 'normal'.
xmpp_codec.hrl
-record(message, {id :: binary(),
                  type = normal :: 'chat' | 'error' | 'groupchat' | 'headline' | 'normal',
                  lang :: binary(),
                  from :: any(),
                  to :: any(),
                  subject = [] :: [#text{}],
                  body = [] :: [#text{}],
                  thread :: binary(),
                  error :: #error{},
                  sub_els = [] :: [any()]}).
Include this file and use just
Type = Packet#message.type
or, if you expect a binary value
Type = erlang:atom_to_binary(Packet#message.type, utf8)
The newest way to do that seems to be with xmpp:get_type/1:
Type = xmpp:get_type(Packet),
It returns an atom, in this case normal.

scrapy InitSpider: set Rules in __init__?

I am building a recursive web spider with an optional login. I want to make most settings dynamic via a JSON config file.
In my __init__ function, I am reading this file and trying to populate all variables; however, this does not work with Rules.
class CrawlpySpider(InitSpider):

    ...

    #----------------------------------------------------------------------
    def __init__(self, *args, **kwargs):
        """Constructor: overwrite parent __init__ function"""

        # Call parent init
        super(CrawlpySpider, self).__init__(*args, **kwargs)

        # Get command line arg provided configuration param
        config_file = kwargs.get('config')

        # Validate configuration file parameter
        if not config_file:
            logging.error('Missing argument "-a config"')
            logging.error('Usage: scrapy crawl crawlpy -a config=/path/to/config.json')
            self.abort = True

        # Check if it is actually a file
        elif not os.path.isfile(config_file):
            logging.error('Specified config file does not exist')
            logging.error('Not found in: "' + config_file + '"')
            self.abort = True

        # All good, read config
        else:
            # Load json config
            fpointer = open(config_file)
            data = fpointer.read()
            fpointer.close()

            # convert JSON to dict
            config = json.loads(data)

            # config['rules'] is simply a string array which looks like this:
            # config['rules'] = [
            #     'password',
            #     'reset',
            #     'delete',
            #     'disable',
            #     'drop',
            #     'logout',
            # ]

            CrawlpySpider.rules = (
                Rule(
                    LinkExtractor(
                        allow_domains=(self.allowed_domains),
                        unique=True,
                        deny=tuple(config['rules'])
                    ),
                    callback='parse',
                    follow=False
                ),
            )
Scrapy still crawls the pages that are listed in config['rules'] and therefore also hits the logout page, so the specified pages are not being denied. What am I missing here?
Update:
I have already tried setting CrawlpySpider.rules = ... as well as self.rules = ... inside __init__. Neither variant works.
Spider: InitSpider
Rules: LinkExtractor
Before crawl: Doing login prior crawling
I even tried to deny those URLs in my parse function:
# Dive deeper?
# The nesting depth is now handled via a custom middle-ware (middlewares.py)
#if curr_depth < self.max_depth or self.max_depth == 0:
links = LinkExtractor().extract_links(response)
for link in links:
    for ignore in self.ignores:
        if (ignore not in link.url) and (ignore.lower() not in link.url.lower()) and link.url.find(ignore) == -1:
            yield Request(link.url, meta={'depth': curr_depth+1, 'referer': response.url})
You are setting a class attribute where you want to set an instance attribute:
# this:
CrawlpySpider.rules = (
# should be this:
self.rules = (
<...>
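Spelled out, a sketch of the relevant fragment inside the question's __init__, with the rules set on the instance (config, Rule and LinkExtractor are the names already used in the question):
# Sketch: assign the rules to the instance rather than the class.
self.rules = (
    Rule(
        LinkExtractor(
            allow_domains=(self.allowed_domains),
            unique=True,
            deny=tuple(config['rules'])
        ),
        callback='parse',
        follow=False
    ),
)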

Create a portal_user_catalog and have it used (Plone)

I'm creating a fork of my Plone site (which has not been forked for a long time). This site has a special catalog object for user profiles (a special Archetypes-based object type) which is called portal_user_catalog:
$ bin/instance debug
>>> portal = app.Plone
>>> print [d for d in portal.objectMap() if d['meta_type'] == 'Plone Catalog Tool']
[{'meta_type': 'Plone Catalog Tool', 'id': 'portal_catalog'},
{'meta_type': 'Plone Catalog Tool', 'id': 'portal_user_catalog'}]
This looks reasonable because the user profiles don't have most of the indexes of the "normal" objects, but have a small set of own indexes.
Since I found no way to create this object from scratch, I exported it from the old site (as portal_user_catalog.zexp) and imported it in the new site. This seemed to work, but I can't add objects to the imported catalog, not even by explicitly calling the catalog_object method. Instead, the user profiles are added to the standard portal_catalog.
Now I found a module in my product which seems to serve the purpose (Products/myproduct/exportimport/catalog.py):
"""Catalog tool setup handlers.
$Id: catalog.py 77004 2007-06-24 08:57:54Z yuppie $
"""
from Products.GenericSetup.utils import exportObjects
from Products.GenericSetup.utils import importObjects
from Products.CMFCore.utils import getToolByName
from zope.component import queryMultiAdapter
from Products.GenericSetup.interfaces import IBody
def importCatalogTool(context):
"""Import catalog tool.
"""
site = context.getSite()
obj = getToolByName(site, 'portal_user_catalog')
parent_path=''
if obj and not obj():
importer = queryMultiAdapter((obj, context), IBody)
path = '%s%s' % (parent_path, obj.getId().replace(' ', '_'))
__traceback_info__ = path
print [importer]
if importer:
print importer.name
if importer.name:
path = '%s%s' % (parent_path, 'usercatalog')
print path
filename = '%s%s' % (path, importer.suffix)
print filename
body = context.readDataFile(filename)
if body is not None:
importer.filename = filename # for error reporting
importer.body = body
if getattr(obj, 'objectValues', False):
for sub in obj.objectValues():
importObjects(sub, path+'/', context)
def exportCatalogTool(context):
"""Export catalog tool.
"""
site = context.getSite()
obj = getToolByName(site, 'portal_user_catalog', None)
if tool is None:
logger = context.getLogger('catalog')
logger.info('Nothing to export.')
return
parent_path=''
exporter = queryMultiAdapter((obj, context), IBody)
path = '%s%s' % (parent_path, obj.getId().replace(' ', '_'))
if exporter:
if exporter.name:
path = '%s%s' % (parent_path, 'usercatalog')
filename = '%s%s' % (path, exporter.suffix)
body = exporter.body
if body is not None:
context.writeDataFile(filename, body, exporter.mime_type)
if getattr(obj, 'objectValues', False):
for sub in obj.objectValues():
exportObjects(sub, path+'/', context)
I tried to use it, but I have no idea how it is supposed to be done;
I can't call it TTW (should I try to publish the methods?!).
I tried it in a debug session:
$ bin/instance debug
>>> portal = app.Plone
>>> from Products.myproduct.exportimport.catalog import exportCatalogTool
>>> exportCatalogTool(portal)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File ".../Products/myproduct/exportimport/catalog.py", line 58, in exportCatalogTool
    site = context.getSite()
AttributeError: getSite
So, if this is the way to go, it looks like I need a "real" context.
Update: To get this context, I tried an External Method:
# -*- coding: utf-8 -*-
from Products.myproduct.exportimport.catalog import exportCatalogTool
from pdb import set_trace


def p(dt, dd):
    print '%-16s%s' % (dt+':', dd)


def main(self):
    """
    Export the portal_user_catalog
    """
    g = globals()
    print '#' * 79
    for a in ('__package__', '__module__'):
        if a in g:
            p(a, g[a])
    p('self', self)
    set_trace()
    exportCatalogTool(self)
However, when I called it, I got the same <PloneSite at /Plone> object as the argument to the main function, which didn't have the getSite attribute. Perhaps my site doesn't call such External Methods correctly?
Or would I need to mention this module somehow in my configure.zcml, and if so, how? I searched my directory tree (especially below Products/myproduct/profiles) for exportimport, the module name, and several other strings, but I couldn't find anything; perhaps there was an integration once but it is broken now ...
So how do I make this portal_user_catalog work?
Thank you!
Update: Another debug session suggests the source of the problem to be some transaction matter:
>>> portal = app.Plone
>>> puc = portal.portal_user_catalog
>>> puc._catalog()
[]
>>> profiles_folder = portal.some_folder_with_profiles
>>> for o in profiles_folder.objectValues():
...     puc.catalog_object(o)
...
>>> puc._catalog()
[<Products.ZCatalog.Catalog.mybrains object at 0x69ff8d8>, ...]
This population of the portal_user_catalog doesn't persist; after termination of the debug session and starting fg, the brains are gone.
It looks like the problem was indeed related to transactions.
I had
import transaction
...
class Browser(BrowserView):
    ...
    def processNewUser(self):
        ...
        transaction.commit()
before, but apparently this was not good enough (and/or perhaps not done correctly).
Now I start the transaction explicitly with transaction.begin(), save intermediate results with transaction.savepoint(), abort the transaction explicitly with transaction.abort() in case of errors (try / except), and have exactly one transaction.commit() at the end, in the case of success. Everything seems to work.
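For illustration, a minimal sketch of that pattern (processNewUser, createProfile and the cataloguing line are placeholders based on this question; transaction.begin/savepoint/abort/commit are the standard calls from the transaction package):
import transaction

def processNewUser(self):
    """Sketch of the explicit transaction handling described above."""
    transaction.begin()                  # start the transaction explicitly
    try:
        profile = self.createProfile()   # hypothetical helper
        self.context.portal_user_catalog.catalog_object(profile)
        transaction.savepoint()          # save intermediate results
        # ... further work ...
    except Exception:
        transaction.abort()              # roll back on any error
        raise
    transaction.commit()                 # exactly one commit, on success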
Of course, Plone still doesn't take this non-standard catalog into account; when I "clear and rebuild" it, it is empty afterwards. But for my application it works well enough.
