Join elements of a list into a path - filepath

I've got a list and I need to join the elements to form a path. os.path.join does not seem to work. The list is obtained as:
file_path.split("\\")[:-1]
this returns:
['L:', 'JM6', 'jm6', 'test', 'turb', 'results', 'v6.2', 'examples']
Using:
print(os.path.join(file_path.split("\\")[:-1]))
returns exactly the same list without joining it into a path:
['L:', 'JM6', 'jm6', 'test', 'turb', 'results', 'v6.2', 'examples']
Using:
print(os.path.join(os.path.sep, file_path.split("\\")[:-1]))
returns the error:
print(os.path.join(os.path.sep, file_path.split("\\")[:-1]))
File "C:\Python\lib\ntpath.py", line 73, in join
elif isabs(b):
File "C:\Python\lib\ntpath.py", line 58, in isabs
return s != '' and s[:1] in '/\\'
TypeError: 'in <string>' requires string as left operand, not list
Thanks

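os.path.join() doesn't take a list as an argument; it takes the path components as separate arguments.
Unpacking the list with * (the 'splat' operator) should work. The list is named parts here to avoid shadowing the built-in list:
parts = ['L:', 'JM6', 'jm6', 'test', 'turb', 'results', 'v6.2', 'examples']
os.path.join(*parts)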
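A minimal sketch of the unpacking approach, written against ntpath (the Windows implementation behind os.path) so it behaves the same on any platform. The file name results.txt is a made-up placeholder, since the question does not show the full path. One thing worth knowing: joining a bare drive such as 'L:' with further components gives a drive-relative path with no backslash after the colon, so a plain string join or dirname may be closer to what is wanted.

```python
import ntpath  # Windows flavour of os.path, importable on any OS

# Hypothetical full path; only the directory components come from the question.
file_path = r"L:\JM6\jm6\test\turb\results\v6.2\examples\results.txt"
parts = file_path.split("\\")[:-1]

joined = ntpath.join(*parts)         # each list element becomes its own argument
with_sep = "\\".join(parts)          # plain string join keeps the backslash after the drive
parent = ntpath.dirname(file_path)   # directory part, without splitting at all

print(joined)    # L:JM6\jm6\test\turb\results\v6.2\examples
print(with_sep)  # L:\JM6\jm6\test\turb\results\v6.2\examples
print(parent)    # L:\JM6\jm6\test\turb\results\v6.2\examples
```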

r json mongodb query $in operator syntax error due to double quotes?

I'm building a json query to pass to a mongodb database in R.
In one scenario, I have a vector of dates and I want to query the database to return all records which have a date in the relevant field that matches a date in my vector of dates.
The second scenario is the same as the first, but this time I have a vector of character strings (IDs) and need to return all the records with matching IDs.
I understood the correct way to do this in a json query is to use the $in operator, and then put my vector in an array.
However, when I pass the query to my mongodb database, the exportLogId returns NULL. I'm quite sure that the problem is something to do with how I am representing the $in operator in the final query, since I have very similarly structured queries without the $in operator and they are all working. If I look for just one of my target dates or character strings, I get the desired result.
I followed the mongodb manual here to construct my query, and the only issue I can see is that the $in operator in the output of jsonlite::toJSON() is enclosed in double quotes; whereas I think it might need to be in single quotes (or no quotes at all, but I don't know how to write the syntax for that).
I'm creating my query in two steps:
Create the query as a series of nested lists
Convert the list object to json with jsonlite::toJSON()
Here is my code:
# Load libraries:
library(jsonlite)
# Create list of example dates to query in mongodb format:
sampledates <- c("2022-08-11T00:00:00.000Z",
"2022-08-15T00:00:00.000Z",
"2022-08-16T00:00:00.000Z",
"2022-08-17T00:00:00.000Z",
"2022-08-19T00:00:00.000Z")
# Create query as a list object:
query_list_l <- list(filter =
# Add where clause:
list(where =
# Filter results by list of sample dates:
list(dateSampleTaken = list('$in' = sampledates),
# Define format of column names and values:
useDbColumns = "true",
dontTranslateValues = "true",
jsonReplaceUndefinedWithNull = "true"),
# Define columns to return:
fields = c("id",
"updatedAt",
"person.visualId",
"labName",
"sampleIdentifier",
"dateSampleTaken",
"sequence.hasSequence")))
# Convert list object to JSON:
query_json = jsonlite::toJSON(x = query_list_l,
pretty = TRUE,
auto_unbox = TRUE)
The JSON query now looks like this:
> query_json
{
"filter": {
"where": {
"dateSampleTaken": {
"$in": ["2022-08-11T00:00:00.000Z", "2022-08-15T00:00:00.000Z", "2022-08-16T00:00:00.000Z", "2022-08-17T00:00:00.000Z", "2022-08-19T00:00:00.000Z"]
},
"useDbColumns": "true",
"dontTranslateValues": "true",
"jsonReplaceUndefinedWithNull": "true"
},
"fields": ["id", "updatedAt", "person.visualId", "labName", "sampleIdentifier", "dateSampleTaken", "sequence.hasSequence"]
}
}
As you can see, $in is now enclosed in double quotes, even though I put it in single quotes when I created the query as a list object. I have tried replacing with sprintf() but that just adds a lot of backslashes to my query. I also tried:
query_fixed <- gsub(pattern = "\\"\\$\\in\\"",
replacement = "\\'$in\\'",
x = query_json)
... but this fails with an error.
I would be very grateful to know if:
The syntax problem that is preventing $in from working is actually the double quotes?
If double quotes is the problem, how do I replace them with single quotes without messing up the JSON format?
UPDATE:
The issue seems to occur when R is passing the query to the database, but I still can't work out exactly why.
If I try the query out in loopback explorer in the database, it works and using the export log ID produced, I can then fetch the results with httr::GET() in R. Example query results are shown below (sorry for the hashes - the main point is you can see the format of the returned values):
[1] "[{\"_id\":\"e59953b6-a106-4b69-9e25-1c54eef5264a\",\"updatedAt\":\"2022-09-12T20:08:39.554Z\",\"dateSampleTaken\":\"2022-08-16T00:00:00.000Z\",\"labName\":\"LNG_REFERENCE_DATA_CATEGORY_LAB_NAME_LAB_A\",\"sampleIdentifier\":\"LS0044-SCV2-PCR\",\"sequence\":{\"hasSequence\":false},\"person\":{\"visualId\":\"C-2022-0002\"}},{\"_id\":\"af5cd9cc-4813-4194-b60b-7d130bae47bc\",\"updatedAt\":\"2022-09-12T20:11:07.467Z\",\"dateSampleTaken\":\"2022-08-17T00:00:00.000Z\",\"labName\":\"LNG_REFERENCE_DATA_CATEGORY_LAB_NAME_LAB_A\",\"sampleIdentifier\":\"LS0061-SCV2-PCR\",\"sequence\":{\"hasSequence\":false},\"person\":{\"visualId\":\"C-2022-0003\"}},{\"_id\":\"b5930079-8d57-43a8-85c0-c95f7e0338d9\",\"updatedAt\":\"2022-09-12T20:13:54.378Z\",\"dateSampleTaken\":\"2022-08-16T00:00:00.000Z\",\"labName\":\"LNG_REFERENCE_DATA_CATEGORY_LAB_NAME_LAB_A\",\"sampleIdentifier\":\"LS0043-SCV2-PCR\",\"sequence\":{\"hasSequence\":false},\"person\":{\"visualId\":\"C-2022-0004\"}}]"
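On the quoting question itself: JSON requires double quotes around every key and string, so "$in" in the toJSON() output is well-formed, and swapping in single quotes would make the document invalid. The sketch below (Python is used only for illustration, with the field name taken from the query above) shows a strict JSON parser accepting the double-quoted form and rejecting the single-quoted one. It does not explain why the request fails when sent from R.

```python
import json

# A cut-down version of the "where" clause from the question.
query = {"dateSampleTaken": {"$in": ["2022-08-11T00:00:00.000Z",
                                     "2022-08-15T00:00:00.000Z"]}}
encoded = json.dumps(query)


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


double_quoted_ok = is_valid_json(encoded)
single_quoted_ok = is_valid_json(encoded.replace('"$in"', "'$in'"))

print(encoded)
print(double_quoted_ok, single_quoted_ok)  # True False
```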

Using Element and Split Gets First Item Rather than Last Item in Terraform

We're trying to apply a dynamic name to a firewall rule for opening 8089 and 8843 in GCP using terraform based on the list of instance group urls. Instead of taking that result and giving us the last item in the url, it gives us https:
tf:
#This is to resolve an error when deploying to nginx
resource "google_compute_firewall" "ingress" {
for_each = toset(google_container_cluster.standard-cluster.instance_group_urls)
description = "Allow traffic on ports 8843, 8089 for nginx ingress"
direction = "INGRESS"
name = element(split("/", each.key), length(each.key))
network = "https://www.googleapis.com/compute/v1/projects/${local.ws_vars["project-id"]}/global/networks/${local.ws_vars["environment"]}"
priority = 1000
source_ranges = google_container_cluster.standard-cluster.private_cluster_config.*.master_ipv4_cidr_block
target_tags = [
element(split("/", each.key), length(each.key))
]
allow {
ports = [
"8089",
]
protocol = "tcp"
}
allow {
ports = [
"8443",
]
protocol = "tcp"
}
}
Result:
Error: "name" ("https:") doesn't match regexp "^(?:[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?)$"
on main.tf line 133, in resource "google_compute_firewall" "ingress":
133: name = element(split("/", each.key), length(each.key))
What is the solution here? Why is it not giving the last item in the array? Is there a better way?
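As in many languages, Terraform/HCL uses zero-based indexing, so to get the last element of a list you need to subtract one from its length. (In the question, length(each.key) is also the number of characters in the URL string, not the number of elements returned by split, so the index should be length(split("/", each.key)) - 1.) For example: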
locals {
list = ["foo", "bar", "baz"]
}
output "last_element" {
value = element(local.list, length(local.list) - 1)
}
The element function is causing this confusion because, instead of raising an out-of-bounds error when you index past the end of the list, it wraps around, which is how you ended up with the first element. The documentation says:
The index is zero-based. This function produces an error if used with
an empty list. The index must be a non-negative integer.
Use the built-in index syntax list[index] in most cases. Use this
function only for the special additional "wrap-around" behavior
described below.
To get the last element from the list use length to find the size of
the list (minus 1 as the list is zero-based) and then pick the last
element:
> element(["a", "b", "c"], length(["a", "b", "c"])-1)
c
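The wrap-around rule can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not Terraform's implementation, and the instance group URL is a made-up example.

```python
def element(items, index):
    """Mimic the documented element() rule: the index wraps modulo the list length."""
    if not items:
        raise ValueError("element() cannot be used with an empty list")
    if index < 0:
        raise ValueError("index must be a non-negative integer")
    return items[index % len(items)]


letters = ["a", "b", "c"]
print(element(letters, len(letters)))      # a  (wraps back to the start)
print(element(letters, len(letters) - 1))  # c  (the last element)

# Hypothetical instance group URL, split the same way as in the question.
url = ("https://www.googleapis.com/compute/v1/projects/demo/zones/"
       "us-east1-b/instanceGroupManagers/gke-demo-grp")
parts = url.split("/")
print(element(parts, len(parts) - 1))      # gke-demo-grp
```

Note that the index is computed from the length of the split list, not from the length of the URL string.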
Unfortunately, at the time of writing, Terraform doesn't support negative indexes in the built-in index syntax:
locals {
list = ["foo", "bar", "baz"]
}
output "last_element" {
value = local.list[-1]
}
throws the following error:
Error: Invalid index
on main.tf line 6, in output "last_element":
6: value = local.list[-1]
|----------------
| local.list is tuple with 3 elements
The given key does not identify an element in this collection value.
As suggested in the comments, a better approach here would be to first reverse the list and then take the first element from the reversed list using the reverse function:
output "last_element" {
value = reverse(local.list)[0]
}

Build query with aggregate functions in HAVING clause

I am trying to figure out how to have aggregate functions in the having clause with CakePHP's query builder.
Background: the intent is to correct all rows in a table with compound primary-keys (page-ID and URL) such that each page-ID-group has only one default video. There are some groups with no, and some groups with more than one "default" row, which needs to be corrected. I've figured out all the steps – except for this detail.
This is the query that I'm trying to build.
SELECT
video_page_id, video_url
FROM page_video
WHERE
video_page_id IN (
SELECT video_page_id
FROM page_video
GROUP BY video_page_id
HAVING SUM(video_is_default) < 1
)
AND video_order = 0
;
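To make the intent of that query concrete, here is a small sketch that runs it against an in-memory SQLite database from Python. The table layout and the four sample rows are assumptions made up for illustration; only the column names and the query come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page_video (
        video_page_id    INTEGER,
        video_url        TEXT,
        video_order      INTEGER,
        video_is_default INTEGER,
        PRIMARY KEY (video_page_id, video_url)
    )
""")
conn.executemany(
    "INSERT INTO page_video VALUES (?, ?, ?, ?)",
    [
        (1, "a", 0, 1),  # page 1 already has a default row
        (1, "b", 1, 0),
        (2, "c", 0, 0),  # page 2 has no default row at all
        (2, "d", 1, 0),
    ],
)

rows = conn.execute("""
    SELECT video_page_id, video_url
    FROM page_video
    WHERE video_page_id IN (
        SELECT video_page_id
        FROM page_video
        GROUP BY video_page_id
        HAVING SUM(video_is_default) < 1
    )
    AND video_order = 0
""").fetchall()

print(rows)  # [(2, 'c')]
```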
And this is what I have built:
// sub-select: all groups that have too few defaults.
// Returns list of page-IDs.
$qb = $this->getQueryBuilder();
$group_selection = $qb
->select(array(
'video_page_id',
))
->from('page_video')
->group('video_page_id')
->having(array(
'1 >' => $qb->func()->sum('video_is_default'),
))
;
// sub-select: compound-primary-key identifiers of all rows where
// `video_is_default` has to be modified from `0` to `1`.
// Returns list of two columns.
$qb = $this->getQueryBuilder();
$modifiable_selection = $qb
->select(array(
'video_page_id',
'video_url',
))
->from('page_video')
->where(array(
'video_page_id IN' => $group_selection,
'video_order = 0',
))
;
But then I get this exception: Column not found: 1054 Unknown column '1' in 'having clause'
The crux is the HAVING clause. I basically don't know how to combine the aggregate function with the attribute-value properties of an array. Usually, in order to craft lower/greater-than clauses, you write it like this: array('col1 >' => $value). But here, I needed to flip the equation because the complex expression can't fit into an array key. And now the 1 gets interpreted as a column name.
Writing it as a concatenated string doesn't seem to help either.
array(
$qb->func()->sum('video_is_default') .' > 1',
)
Exception: PHP Recoverable fatal error: Object of class Cake\Database\Expression\FunctionExpression could not be converted to string
I know I could do …
SELECT (…), SUM(video_is_default) AS default_sum FROM (…) HAVING default_sum < 1 (…)
… but then the sub-select column count doesn't match anymore.
Exception: ERROR 1241 (21000): Operand should contain 1 column(s)
I feel silly for figuring out the solution so soon after asking the question.
The lt method accepts complex values, such as a function expression, as its first parameter.
->having(function($exp, $qb) {
$default_sum = $qb->func()->sum('video_is_default');
return $exp->lt($default_sum, 1);
})

Invalid type for parameter error when using put_item dynamodb

I want to write the data in a dataframe to a DynamoDB table:
item = {}
for row in datasource_archived_df_join_repartition.rdd.collect():
    item['x'] = row.x
    item['y'] = row.y
    client.put_item(TableName='tryfail',
                    Item=item)
but I'm getting this error:
Invalid type for parameter Item.x, value: 478.2, type: <type 'float'>, valid types: <type 'dict'>
Invalid type for parameter Item.y, value: 696- 18C 12, type: <type 'unicode'>, valid types: <type 'dict'>
Old question, but it still comes up high in a search and hasn't been answered properly, so here we go.
When putting an item in a DynamoDB table it must be a dictionary in a particular nested form that indicates to the database engine the data type of the value for each attribute. The form looks like below. The way to think of this is that an AttributeValue is not a bare variable value but a combination of that value and its type. For example, an AttributeValue for the AlbumTitle attribute below is the dict {'S': 'Somewhat Famous'} where the 'S' indicates a string type.
response = client.put_item(
TableName='Music',
Item={
'AlbumTitle': { # <-------------- Attribute
'S': 'Somewhat Famous', # <-- Attribute Value with type string ('S')
},
'Artist': {
'S': 'No One You Know',
},
'SongTitle': {
'S': 'Call Me Today',
},
'Year': {
'N': '2021' # <----------- Note that numeric values are supplied as strings
}
}
)
In your case, judging by the error message, x is a number and y is a string, so you might want something like this:
for row in datasource_archived_df_join_repartition.rdd.collect():
    item = {
        'x': {'N': str(row.x)},
        'y': {'S': row.y}
    }
    client.put_item(TableName='tryfail', Item=item)
Two things to note here. First, each item corresponds to a row, so if you are putting items in a loop you must instantiate a new one with each iteration. Second, regarding the conversion of numeric values into strings, the DynamoDB docs explain that the reason the AttributeValue dict requires this is "to maximize compatibility across languages and libraries. However, DynamoDB treats them as number type attributes for mathematical operations." For fuller documentation on the DynamoDB type system, see the DynamoDB documentation on AttributeValue or, since you are using Python, the Boto3 documentation for put_item.
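If the rows have more than a couple of columns, the wrapping can be done by a small helper. The sketch below is one possible way to do it; to_attribute_value and row_to_item are hypothetical names, not part of boto3, and the helper covers only strings, numbers, booleans and None.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Wrap a plain Python value in DynamoDB's typed AttributeValue form."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {'BOOL': value}
    if isinstance(value, (int, float, Decimal)):
        return {'N': str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {'S': value}
    if value is None:
        return {'NULL': True}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def row_to_item(row_dict):
    """Build a fresh Item dict for one row (hypothetical helper)."""
    return {name: to_attribute_value(v) for name, v in row_dict.items()}


# Values taken from the error message in the question.
item = row_to_item({'x': 478.2, 'y': '696- 18C 12'})
print(item)  # {'x': {'N': '478.2'}, 'y': {'S': '696- 18C 12'}}
```

The resulting dict can be passed as Item=item to client.put_item. boto3 also ships a TypeSerializer class in boto3.dynamodb.types that performs this kind of conversion.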
The error message indicates that you are using the wrong type: each value assigned to item['x'] and item['y'] needs to be a dictionary keyed by a DynamoDB type descriptor, e.g.
item['x'] = {'N': str(row.x)}
item['y'] = {'S': row.y}

How to attach relationships in existing nodes in neo4j?

I'm trying to build a graph from a CSV file, but I'm not able to add additional relationships to the existing nodes.
My actual code is:
USING PERIODIC COMMIT 10000
LOAD CSV FROM 'my_file.csv' AS line
MERGE (p:Title { title: line[0]})
MERGE (a:Author { name: line[1]})
MERGE (a)-[:COLABORATE_IN]->(p)
WITH line WHERE line[2] IS NOT NULL
MERGE (b:Author {name: line[2]})
MERGE (b)-[:COLABORATE_IN]->(p) //not working
RETURN line[2]
It should be simple. It creates the nodes and the first relationships correctly, but for line[2] it only creates the relationships to new nodes. What could I do?
Thanks
Anything that is not carried through the WITH clause is not available to the next part of the query:
MERGE (a:Author { name: line[1]})
MERGE (a)-[:COLABORATE_IN]->(p)
WITH line WHERE line[2] IS NOT NULL
// p is no longer available here
Just add the p identifier to the WITH clause to make it available in the remaining part of the query:
USING PERIODIC COMMIT 10000
LOAD CSV FROM 'my_file.csv' AS line
MERGE (p:Title { title: line[0]})
MERGE (a:Author { name: line[1]})
MERGE (a)-[:COLABORATE_IN]->(p)
WITH p, line
WHERE line[2] IS NOT NULL
MERGE (b:Author {name: line[2]})
MERGE (b)-[:COLABORATE_IN]->(p) // p is in scope here
RETURN line[2]
