Unable to get vDataFrame for a table - vertica-python

I am not able to create a vDataFrame in VerticaPy.
https://www.vertica.com/python/documentation_last/vdataframe/object/index.php
In my Jupyter notebook, I read a CSV file:
train_vdf = read_csv("train.csv")
type(train_vdf)
The train_vdf is of type vDataFrame, which is fine. read_csv also creates a table called train in v_temp_schema.
When I try to create a vDataFrame directly from the temp table, I get the error "No table or views 'v_temp_schema' found":
vdf = vDataFrame("v_temp_schema.train")
25 type(train_vdf)
26 train_vdf
---> 27 vdf = vDataFrame("kaggle_titanic.v_temp_schema.train")
28 #vdf
29
~\Anaconda3\lib\site-packages\verticapy\vdataframe.py in __init__(self, input_relation, cursor, dsn, usecols, schema, empty)
217 if not (usecols):
218 self._VERTICAPY_VARIABLES_["allcols_ind"] = len(columns)
--> 219 assert columns != [], MissingRelation("No table or views '{}' found.".format(self._VERTICAPY_VARIABLES_["input_relation"]))
220 self._VERTICAPY_VARIABLES_["columns"] = [elem for elem in columns]
221 for col_dtype in columns_dtype:
AssertionError: No table or views 'v_temp_schema' found.
Interestingly, I can create a vDataFrame if I use some other table in the public schema. E.g., this works, where titanic_train_flex_view is a view I created directly in the database:
vdf = vDataFrame("titanic_train_flex_view")
vdf
The issue seems to be only with v_temp_schema. Why can't I create a vDataFrame for the temp table created by the VerticaPy APIs?
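From the traceback, the vDataFrame constructor also takes a separate schema argument, so one variant worth trying (a sketch, assuming the temp table really is named train inside v_temp_schema):

from verticapy import vDataFrame

# Sketch: pass the schema separately instead of embedding it
# in the relation name, per the __init__ signature in the traceback.
vdf = vDataFrame(input_relation="train", schema="v_temp_schema")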

Related

Kubeflow Pipelines SDK: use dsl.ParallelFor to build a loop

@dsl.pipeline(name='classfier')
def classifiertest():
    make_classification_com_res = make_classification_com()
    rng_res = np_random_random_state()
    uniform_res = rng_uniform(make_classification_com_res.output, rng_res.output)
    all_datas_res = get_all_datas(x_input=uniform_res.output, y_input=make_classification_com_res.output)
    forlist = list([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    with dsl.ParallelFor(forlist) as item_index:
        for_outter_func(item_index, ds_input=all_datas_res.output)
When I run this pipeline, the following error occurs after clicking the start button of the run:
{"error":"Failed to create a new run.: InternalServerError: Failed to store run classfier-9xbrk to table: Error 1366: Incorrect string value: '\xE8\xBF\x99\xE4\xB8\x80...' for column 'WorkflowRuntimeManifest' at row 1","code":13,"message":"Failed to create a new run.: InternalServerError: Failed to store run classfier-9xbrk to table: Error 1366: Incorrect string value: '\xE8\xBF\x99\xE4\xB8\x80...' for column 'WorkflowRuntimeManifest' at row 1","details":[{"#type":"type.googleapis.com/api.Error","error_message":"Internal Server Error","error_details":"Failed to create a new run.: InternalServerError: Failed to store run classfier-9xbrk to table: Error 1366: Incorrect string value: '\xE8\xBF\x99\xE4\xB8\x80...' for column 'WorkflowRuntimeManifest' at row 1"}]}
When I delete these two lines of code, the pipeline commits and runs successfully:
with dsl.ParallelFor(forlist) as item_index:
    for_outter_func(item_index, ds_input=all_datas_res.output)
Delete the inline comments. The following writing style is the cause of the problem in kfp:
from kfp.components import create_component_from_func

# This comment is OK.
def inputdata_outputtable(input_arr, output_path):
    import numpy as np
    # This comment is also OK.
    _input_arr = np.array(input_arr)  # This comment causes an error.
    np.save(output_path, _input_arr)
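For reference, here is a version of the same component with the trailing comment moved onto its own line (a minimal sketch of the fix described above; everything else is unchanged):

from kfp.components import create_component_from_func

# This comment is OK.
def inputdata_outputtable(input_arr, output_path):
    import numpy as np
    # This comment is also OK.
    # Moved up from the end of the next line, which avoids the error.
    _input_arr = np.array(input_arr)
    np.save(output_path, _input_arr)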

HBase table mapped in Hive: how to get the real table size

I'm using HBase tables from Hive. For example:
CREATE EXTERNAL TABLE `ddid_link_msisdn`(
  `msisdn` string COMMENT 'from deserializer',
  `ddid` string COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping'=':key,fc:ddid',
  'serialization.format'='1')
TBLPROPERTIES (
  'hbase.table.name'='unifieddata:ddid_link_msisdn_ddid',
  'transient_lastDdlTime'='1535099920')
And everything is fine as usual, but...
When I execute analyze table ddid_link_msisdn compute statistics and later ask for the table description using describe extended ddid_link_msisdn, I get:
... a lot of info ...etc etc...
{totalSize=35566884, numRows=427422, rawDataSize=35139462,
COLUMN_STATS_ACCURATE=true, numFiles=1, transient_lastDdlTime=1591783620}
... other info ...
numRows is exactly the number of rows, no surprise, and totalSize and rawDataSize are about 35 MB, but if I look at HDFS I see:
root#mid1-db138hd-12 [10:42am] ~> hdfs dfs -du -h /hbase/data/unifieddata
337 1011 /hbase/data/unifieddata/ddid_hub
265.7 M 797.2 M /hbase/data/unifieddata/ddid_link_cat_ddid
>> 45.2 M 135.5 M /hbase/data/unifieddata/ddid_link_msisdn_ddid <---
The real size is 45 MB.
Can anyone kindly explain this extra ~28% to me and, if there is one, how to get the correct table size from Hive?

ConvertFrom-StringData values stored in variable

I'm new to PowerShell and I'm putting together a script that will populate all variables from data stored in an Excel file, basically to create numerous VMs.
This works fine apart from where I have a variable with multiple name/value pairs, which PowerShell needs to be a hashtable.
As each VM will need multiple tags applied, I have a column in Excel called Tags.
The data in the field would look something like: "Top = Red `n Bottom = Blue".
However, I'm struggling to use ConvertFrom-StringData to create the hashtable for these tags.
If I run:
ConvertFrom-StringData $exceldata.Tags
I end up with something like:
Name Value
---- -----
Top Red `n bottom = blue
I need help, please, with formatting the Excel field correctly so that ConvertFrom-StringData creates the hashtable properly, or with a better way of achieving this.
Thanks.
Sorted it: I formatted the Excel field as Dept=it;env=prod;owner=Me and then ran the following commands. No ConvertFrom-StringData required.
$artifacts = Import-Excel -Path "C:\temp\Artifacts.xlsx" -WorkSheetname "VM"
foreach ($artifact in $artifacts) {
    $inputkeyvalues = $artifact.Tags
    # Create the hashtable
    $tags = @{}
    # Split the input string into pairs
    $inputkeyvalues.Split(';') | ForEach-Object {
        # Split each pair into key and value
        $key, $value = $_.Split('=')
        # Populate $tags
        $tags[$key] = $value
    }
}

Repetition of data with import_from_csv_file() in web2py for an SQLite database table

I want to load data from a CSV file into an SQLite database. My code is:
Model:
db.define_table('data_table',
    Field('Title', requires=IS_NOT_EMPTY()),
    Field('Link', requires=IS_NOT_EMPTY()))
db.data_table.import_from_csv_file(open('mycsv', 'rb'))
Controller:
def index():
    query = ((db.data_table.id))
    fields = (db.data_table.id, db.data_table.Title, db.data_table.Link)
    headers = {'data_table.id': 'ID',
               'db.data_table.Title': 'Title',
               'db.data_table.Link': 'Link'}
    default_sort_order = [db.data_table.id]
    form = SQLFORM.grid(query=query, fields=fields, headers=headers, orderby=default_sort_order,
                        create=False, deletable=False, editable=False, maxtextlength=64, paginate=10)
    return dict(form=form)
When I load this form in index.html, I am getting repeated rows in the grid view.
E.g., if I have 5 rows in the CSV file, the very first time it shows 5 records, but each time I refresh index.html those records are added again.
On the 1st refresh it gives me 45 records; on the 2nd refresh it gives 100 records. It goes on increasing in this way.
My doubt is where I should write this line:
db.data_table.import_from_csv_file(open('mycsv', 'rb'))
In the model, the controller, or the view?
Please help me in this.
Thank you in advance.
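For context, web2py executes model files on every request, so an unconditional import_from_csv_file in a model re-imports the CSV each time the page loads. A minimal guard (a sketch, assuming the table only needs to be seeded once) is to import only when the table is empty:

# In the model: seed the table only if it is empty,
# since model files run on every request in web2py.
if db(db.data_table.id > 0).isempty():
    db.data_table.import_from_csv_file(open('mycsv', 'rb'))
    db.commit()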

Cannot unpublish component

I am using Tridion R5.3. I am trying to delete a component, but it is showing as published. No matter what I do, I cannot get it to unpublish. I ran the following query against the database to determine where the component is published.
SELECT *
FROM [dbo].[PUBLISH_STATES] WITH (NOLOCK)
WHERE REFERENCE_ID = 268494
And I received the following information:
ID : 45173
REFERENCE_ID : 268494
ITEM_TYPE : 16
PUBLICATION_ID : 4
STATE : 1
STATE_CHANGE_DATE : 2006-08-18 12:50:25.597
PUBLICATION_TARGET_ID : 2
TRUSTEE_ID : 43
TEMPLATE_REFERENCE_ID : 89798
TEMPLATE_ITEM_TYPE : 32
I have tried to unpublish the component from the publication target with an ID of 2, but no luck.
Would I be safe to just delete the row in the database?
Update
On the suggestion of Nuno, and after reading another question, I figure I have to unpublish the associated component template. I have tried the following, and I am getting a Type Mismatch when executing the SetPublishedTo() method.
TDS.TDSE tdse = new TDS.TDSE();
// Open the component template (tcm:4-89798-32) in view mode.
var componentTemplate = (TDS.ComponentTemplate)tdse.GetObject("tcm:4-89798-32", TDSDefines.EnumOpenMode.OpenModeView);
// Mark the component as no longer published to the target.
componentTemplate.SetPublishedTo("tcm:4-268494", "tcm:0-2-65537", false, tdse.User);
After contacting SDL Support, the solution was to set the STATE field to 0 in both the ITEM_STATES and PUBLISH_STATES tables for the relevant component:
UPDATE dbo.ITEM_STATES
SET STATE = 0
WHERE ITEM_REFERENCE_ID = 268494

UPDATE dbo.PUBLISH_STATES
SET STATE = 0
WHERE REFERENCE_ID = 268494
