Set Max erros while uploading data from bucket using R - r

Im bigrquery to upload data from google bucket to bigquery and i like to set the maximum number of errors i will allow (default is 0 )
for that im using bq_perform_load
bq_job<-
bq_perform_load(bq_table(destination_project,destination_dataset,destination_table_temp),
file_name,
source_format ="CSV",
nskip="1",
create_disposition = "CREATE_IF_NEEDED",
write_disposition = "WRITE_TRUNCATE",
fields = bq_fields,
billing = destination_project)
as part of the documentation it looks like there is an ability to add additional arguments
...
Additional arguments passed on to the underlying API call. snake_case names are automatically converted to camelCase.
but im not sure how to use it
ive tried something like that
bq_perform_load(bq_table(destination_project,destination_dataset,destination_table_temp),
file_name,
source_format ="CSV",
nskip="1",
create_disposition = "CREATE_IF_NEEDED",
write_disposition = "WRITE_TRUNCATE",
fields = bq_fields,
maxBadRecords = "500",
billing = destination_project)
but it didn't work

Related

CosmosDB SDK python odata.error InvalidInput

I have a table in a cosmosDB where I would like to store some query coming in from a azure data explorer.
In data explorer (using python sdk) I run a query to filter and aggregate my data and I would like to push this query result into a cosmosDB.
I did set all the configurations and connection, but when I run the command upset_item or create_item
I get the following error:
azure.cosmos.exceptions.CosmosHttpResponseError: Status code: 400
{"odata.error":{"code":"InvalidInput","message":{"lang":"en-us","value":"Request url is invalid.\r\nActivityId: d8994113-504e-4198-9976-41316aaafb5f, documentdb-dotnet-sdk/2.11.0 Host/64-bit MicrosoftWindowsNT/6.2.9200.0\nRequestID:d8994113-504e-4198-9976-41316aaafb5f\n"}}}
this is how I configured the azure-cosmos:
url = "url"
key = {"masterkey": "my-key"}
client = CosmosClient(url,key)
database_name = "database"
container_name = "table"
database = client.get_database_client(database_name)
container = database.get_container_client(container_name)
for i in df:
container.upsert_item(i)
df is the result of my query from azure data explorer
the error I believe is due how I am passing the body of the query to the upsert_item
Any advice about this?
UPDATE:
so I tried to follow the sample. I updated my cosmosdb to a sql API.
my current code now looks like this:
url = "url"
key = {"masterkey": "my-key"}
client = CosmosClient(url,key)
database_name = "database"
container_name = "table"
database = client.get_database_client(database_name)
container = database.get_container_client(container_name)
data = json.dumps(test)
data_dict = json.loads(data)
for i in data_dict:
container.create_item(body=i)
I converted the data frame to a json. but when I run the for I get this error:
AttributeError: 'str' object has no attribute 'get'
And I have no idea what am I doing wrong
Thanks for your reply and I think we need more contact on debug.
Here's my code and it worked well.
from azure.cosmos import exceptions, CosmosClient, PartitionKey,ProxyConfiguration
import family
# Initialize the Cosmos client
endpoint = "https://cosmosdb_name.documents.azure.com:443/"
key = 'primary_key_in_keys_blade'
client = CosmosClient(endpoint, key)
database_name = 'AzureSampleFamilyDatabase'
database = client.create_database_if_not_exists(id=database_name)
# </create_database_if_not_exists>
# <create_container_if_not_exists>
container_name = 'FamilyContainer'
container = database.create_container_if_not_exists(
id=container_name,
partition_key=PartitionKey(path="/lastName"),
offer_throughput=400
)
family_items_to_create = [
{
"id":"112233",
"name":"andersen"
},
{
"id":"455646",
"name":"johnson"
},
{
"id":"999999",
"name":"smith"
}
]
# <create_item>
for family_item in family_items_to_create:
container.upsert_item(body=family_item)
Per your error, I think you need to check the data format of data_dict or you can use my sample data for testing to rule out other problems. In my position, I don't know why you'd like to use json.loads() here.

API call from Call of Duty API in R - authentication problem

I am trying to call the stats of a list of players from the call of duty API. This API requires firstly the login in website https://profile.callofduty.com/cod/login. Once logged in, the user can see the stats of a player using the call-of-duty API. For example, the stats of the streamer savyultras90 from Warzone can be seen through the following link: https://my.callofduty.com/api/papi-client/stats/cod/v1/title/mw/platform/psn/gamer/savyultras90/profile/type/wz.
If I log in from the website and try to see the stats of a player and the related json, I am able to do via browser. However, this doesn't seem straightforward in R.
I try to log in using the GET function from httr package as follows
respo <- GET('https://profile.callofduty.com/cod/login', authenticate('USER', 'PWD'))
But when I try to have access to the api and download the JSON file using the function fromJSON from the package jsonlite as follows
data <- fromJSON('https://my.callofduty.com/api/papi-client/stats/cod/v1/title/mw/platform/psn/gamer/savyultras90/profile/type/wz')
I get the error message "Not permitted: not authenticated".
How can I authenticate in one website and stay logged in to call from the API which relies on that authentication?
Seeing I've recently had to develop a PHP API for Warzone, I might be able to guide you in the right direction on how to handle this. But first a few remarks:
You need to authenticate each user individually with the appropriate platform if you want to request that player's data
There is a throttle limit on the amount of API requests
The API of Call of Duty is under strict usage guidelines and should only be used by registered partners. Making usage of the API could result in claims and eventually lawsuits: link
There is no public documentation of the API and the API has changed in the past, breaking several 3rd party tools.
Nevertheless, the process involves several steps as described below:
Register the device making the call
https://profile.callofduty.com/cod/mapp/registerDevice
with a json body in the form of {"deviceId":"INSERT_ID_HERE"}
This will return a response with the authHeader which we will use for as Token in the next calls
Login with Activision credentials
https://profile.callofduty.com/cod/mapp/login
Set the following headers:
Authorization: "INSERT_AUTHHEADER_HERE"
x_cod_device_id: "INSERT_PREVIOUSLY_USED_DEVICEID_HERE"
This in terms will generate a dataset where we will save the following data from:
rtkn, ACT_SSO_COOKIE and atkn.
Make the wanted API call for data
We have all the data now required to make the API call.
For each request we will submit 3 headers:
Authorization: "INSERT_AUTHHEADER_HERE"
x_cod_device_id: "INSERT_PREVIOUSLY_USED_DEVICEID_HERE"
Cookie: ACT_SSO_LOCALE=en_GB;country=GB;API_CSRF_TOKEN=**GENERATE_CSRF_TOKEN**;rtkn=**RTKN_HERE**;ACT_SSO_COOKIE=**ACT_SSO_COOKIE_HERE**;atkn=**ATKN_HERE**
For more reference, you can always look through a Python library or NodeJS library which succesfully implemented the API.
I struggled with this yesterday but finally made some progress. The issue is that you have to obtain an authentication token. The steps can be followed here: https://documenter.getpostman.com/view/7896975/SW7aXSo5#a37a2e5b-84bb-441d-b978-0fd8d42ffd29 but not available in R though.
My code works at first, as long as you don't authenticate again (still trying to figure out why). Basically what I did was to translate the steps in the link and extracted the content in the response from GET:
# Get token ---------------------------------------------------------------
resp <- GET('https://profile.callofduty.com/cod/login')
cookies = c(
'XSRF-TOKEN' = resp$cookies$value[1]
,'new_SiteId' = resp$cookies$value[2]
,'comid' = resp$cookies$value[3]
,'bm_sz' = resp$cookies$value[4]
,'_abck' = resp$cookies$value[5]
# ,'ACT_SSO_COOKIE' = resp$cookies$value[6]
# ,'ACT_SSO_COOKIE_EXPIRY' = resp$cookies$value[7]
# ,'atkn' = resp$cookies$value[8]
# ,'ACT_SSO_REMEMBER_ME' = resp$cookies$value[9]
# ,'ACT_SSO_EVENT' = resp$cookies$value[10]
# ,'pgacct' = resp$cookies$value[11]
# ,'CRM_BLOB' = resp$cookies$value[12]
# ,'tfa_enrollment_seen' = resp$cookies$value[13]
)
headers = c(
)
params = list(
`new_SiteId` = 'cod',
`username` = 'USER',
`password` = 'PWD',
`remember_me` = 'true',
`_csrf` = resp$cookies$value[1]
)
# Authenticate ------------------------------------------------------------
resp_post <- POST('https://profile.callofduty.com/do_login?new_SiteId=cod',
httr::add_headers(.headers=headers),
query = params,
httr::set_cookies(.cookies = cookies))
cookies = c(
'XSRF-TOKEN' = resp_post$cookies$value[1]
,'new_SiteId' = resp_post$cookies$value[2]
,'comid' = resp_post$cookies$value[3]
,'bm_sz' = resp_post$cookies$value[4]
,'_abck' = resp_post$cookies$value[5]
,'ACT_SSO_COOKIE' = resp_post$cookies$value[6]
,'ACT_SSO_COOKIE_EXPIRY' = resp_post$cookies$value[7]
,'atkn' = resp_post$cookies$value[8]
,'ACT_SSO_REMEMBER_ME' = resp_post$cookies$value[9]
,'ACT_SSO_EVENT' = resp_post$cookies$value[10]
,'pgacct' = resp_post$cookies$value[11]
,'CRM_BLOB' = resp_post$cookies$value[12]
,'tfa_enrollment_seen' = resp_post$cookies$value[13]
)
headers = c(
)
params = list(
`new_SiteId` = 'cod',
`username` = 'USER',
`password` = 'PWD',
`remember_me` = 'true',
`_csrf` = resp_post$cookies$value[1]
)
# Get data:
resp_psn <- httr::GET(url = 'https://my.callofduty.com/api/papi-client/stats/cod/v1/title/mw/platform/psn/gamer/savyultras90/profile/type/wz',
httr::add_headers(.headers=headers),
query = params,
httr::set_cookies(.cookies = cookies))
resp_psn_json <- content(resp_psn)
Let me know if you've already managed to resolve this!

Export data from Google Analytics API with rga in R

Let's assume that the google api link start like this:
https://www.googleapis.com/analytics/v3/data/ga?ids=ga%29A5366921&start-date...
And now I need to export data from, but for the code below I got errors:
ga$getData(29A5366921, batch = TRUE, '2017-12-01', '2017-12-10',
metrics = "ga:visits", dimensions = "ga:date",
sort = "", filters = "", segment = "",
start = 1, max = 1000)
One of the errors (others are pretty much the same, just for different parameter): unexpected ',' in " start = 1,"
I don't have ',' anywhere between '', commas are here just to separate the parameters, but this is the first time in my life I trying to use R and maybe there is some hidden rule.
Not sure what to type at the beginning of the code, instead ga$getData did I need to type 29A5366921$getData?
I'm using rga library.
Any help is welcomed. Thanks in advance.

Get ObjectID in mongolite R library

I can successfully retrieve data from my mongoDB instance but need to re-use the objectID for a depending query.
The following code seems to get my entire object but NOT the id. What am I missing?
# Perform a query and retrieve data
mongoOBj <- m$find('{"em": "test#test.com"}')
I realise this is an old question and OP has probably figured it out by now, but I think the answer should be
mongoOBj <- m$find(query = '{"em": "test#test.com"}', field = '{}')
instead of
mongoOBj <- m$find(query = '{"em": "test#test.com"}', field = '{"_id": 1}')
In the second case, the result will be a data frame containing ONLY the IDs. The first line will result in a data frame containing the queried data, including the IDs.
By default, field = '{"_id": 0}', meaning _id is not part of the output.
If you look at the documentation you see that the find method takes a field argument, where you specify the fields you want:
find(query = ’{}’, fields = ’{"_id" : 0}’, sort = ’{}’, skip = 0, limit = 0, handler = NULL, pagesize = NULL)
So in your case it will be something like
mongoOBj <- m$find(query = '{"em": "test#test.com"}', field = '{"_id": 1}')
FYI So the easiest way to get all fields is to do the query with field="{}". That will overwrite the default in the mongolite:: package find() arguments list.
It was driving me nuts for a little while too.

Losing some z3c relation data on restart

I have the following code which is meant to programmatically assign relation values to a custom content type.
publications = # some data
catalog = getToolByName(context, 'portal_catalog')
for pub in publications:
if pub['custom_id']:
results = catalog(custom_id=pub['custom_id'])
if len(results) == 1:
obj = results[0].getObject()
measures = []
for m in pub['measure']:
if m in context.objectIds():
m_id = intids.getId(context[m])
relation = RelationValue(m_id)
measures.append(relation)
obj.measures = measures
obj.reindexObject()
notify(ObjectModifiedEvent(obj))
Snippet of schema for custom content type
measures = RelationList(
title=_(u'Measure(s)'),
required=False,
value_type=RelationChoice(title=_(u'Measure'),
source=ObjPathSourceBinder(object_provides='foo.bar.interfaces.measure.IMeasure')),
)
When I run my script everything looks good. The problem is when my template for the custom content tries to call "pub/from_object/absolute_url" the value is blank - only after a restart. Interestingly, I can get other attributes of pub/from_object after a restart, just not it's URL.
from_object retrieves the referencing object from the relation catalog, but doesn't put the object back in its proper Acquisition chain. See http://docs.plone.org/external/plone.app.dexterity/docs/advanced/references.html#back-references for a way to do it that should work.

Resources