I need a function wide_to_long that turns a wide table into a long table and that accepts an argument id_vars naming the columns whose values have to be repeated (see example).
Sample input
let T_wide = datatable(name: string, timestamp: datetime, A: int, B: int) [
'abc','2022-01-01 12:00:00',1,2,
'def','2022-01-01 13:00:00',3,4
];
Desired output
Calling wide_to_long(T_wide, dynamic(['name', 'timestamp'])) should produce the following table.
let T_long = datatable(name: string, timestamp: datetime, variable: string, value: int) [
'abc','2022-01-01 12:00:00','A',1,
'abc','2022-01-01 12:00:00','B',2,
'def','2022-01-01 13:00:00','A',3,
'def','2022-01-01 13:00:00','B',4
];
Attempt
I've come pretty far with the following code.
let wide_to_long = (T:(*), id_vars: dynamic) {
// get names of keys to remove later
let all_columns = toscalar(T | getschema | summarize make_list(ColumnName));
let remove = set_difference(all_columns, id_vars);
// expand columns not contained in id_vars
T
| extend packed1 = pack_all()
| extend packed1 = bag_remove_keys(packed1, id_vars)
| mv-expand kind=array packed1
| extend variable = packed1[0], value = packed1[1]
// remove unwanted columns
| project packed2 = pack_all()
| project packed2 = bag_remove_keys(packed2, remove)
| evaluate bag_unpack(packed2)
| project-away packed1
};
The problems are that the solution feels clunky (is there a better way?) and the columns in the result are ordered randomly. The second issue is minor, but annoying.
I have two Python files. The first one is named employee.py and this is its code:
import requests


class Employee:
    """A sample Employee class"""

    raise_amt = 1.05

    def __init__(self, first, last, pay):
        self.first = first
        self.last = last
        self.pay = pay

    @property
    def email(self):
        return '{}.{}@email.com'.format(self.first, self.last)

    @property
    def fullname(self):
        return '{} {}'.format(self.first, self.last)

    def apply_raise(self):
        self.pay = int(self.pay * self.raise_amt)

    def monthly_schedule(self, month):
        response = requests.get(f'http://company.com/{self.last}/{month}')
        if response.ok:
            return response.text
        else:
            return 'Bad Response!'
The other file is named test_employee.py, and this is its code:
import unittest
from unittest.mock import patch
from employee import Employee


class TestEmployee(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        print('setupClass')

    @classmethod
    def tearDownClass(cls):
        print('teardownClass')

    def setUp(self):
        print('setUp')
        self.emp_1 = Employee('Corey', 'Schafer', 50000)
        self.emp_2 = Employee('Sue', 'Smith', 60000)

    def tearDown(self):
        print('tearDown\n')

    def test_email(self):
        print('test_email')
        self.assertEqual(self.emp_1.email, 'Corey.Schafer@email.com')
        self.assertEqual(self.emp_2.email, 'Sue.Smith@email.com')
        self.emp_1.first = 'John'
        self.emp_2.first = 'Jane'
        self.assertEqual(self.emp_1.email, 'John.Schafer@email.com')
        self.assertEqual(self.emp_2.email, 'Jane.Smith@email.com')

    def test_fullname(self):
        print('test_fullname')
        self.assertEqual(self.emp_1.fullname, 'Corey Schafer')
        self.assertEqual(self.emp_2.fullname, 'Sue Smith')
        self.emp_1.first = 'John'
        self.emp_2.first = 'Jane'
        self.assertEqual(self.emp_1.fullname, 'John Schafer')
        self.assertEqual(self.emp_2.fullname, 'Jane Smith')

    def test_apply_raise(self):
        print('test_apply_raise')
        self.emp_1.apply_raise()
        self.emp_2.apply_raise()
        self.assertEqual(self.emp_1.pay, 52500)
        self.assertEqual(self.emp_2.pay, 63000)

    def test_monthly_schedule(self):
        with patch('employee.requests.get') as mocked_get:
            mocked_get.return_value.ok = True
            mocked_get.return_value.text = 'Success'
            schedule = self.emp_1.monthly_schedule('May')
            mocked_get.assert_called_with('http://company.com/Schafer/May')
            self.assertEqual(schedule, 'Success')


if __name__ == '__main__':
    unittest.main()
When I run test_employee.py, I get this error:
ModuleNotFoundError: No module named 'employee.requests'; 'employee' is not a package
The code runs fine if I delete the function test_monthly_schedule from test_employee.py and delete monthly_schedule from employee.py.
I don't know if it makes a difference, but I'm using Python 3.8 on a Mac.
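In case it is relevant, here is a quick check (not part of my two files) that shows which employee module actually gets imported and whether requests is visible on it, which is what patch('employee.requests.get') relies on:

# Hypothetical diagnostic, run from the same directory as test_employee.py.
import employee

print(employee.__file__)              # should point at the employee.py shown above
print(hasattr(employee, 'requests'))  # patch('employee.requests.get') needs this to be True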
I have 2 datastore models:
class KindA(ndb.Model):
    field_a1 = ndb.StringProperty()
    field_a2 = ndb.StringProperty()


class KindB(ndb.Model):
    field_b1 = ndb.StringProperty()
    field_b2 = ndb.StringProperty()
    key_to_kind_a = ndb.KeyProperty(KindA)
I want to query KindB and output it to a CSV file, but if an entity of KindB points to an entity of KindA, I want those fields to be present in the CSV as well.
If I were able to use ndb inside of a transform, I would set up my pipeline like this:
def format(element):  # element is an `entity_pb2` object of KindB
    try:
        obj_a_key_id = element.properties.get('key_to_kind_a', None).key_value.path[0]
    except:
        obj_a_key_id = None
    # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<< HOW DO I DO THIS
    obj_a = ndb.Key(KindA, obj_a_key_id).get() if obj_a_key_id else None
    return ",".join([
        element.properties.get('field_b1', None).string_value,
        element.properties.get('field_b2', None).string_value,
        obj_a.properties.get('field_a1', None).string_value if obj_a else '',
        obj_a.properties.get('field_a2', None).string_value if obj_a else '',
    ])
def build_pipeline(project, start_date, end_date, export_path):
    query = query_pb2.Query()
    query.kind.add().name = 'KindB'
    filter_1 = datastore_helper.set_property_filter(query_pb2.Filter(), 'field_b1', PropertyFilter.GREATER_THAN, start_date)
    filter_2 = datastore_helper.set_property_filter(query_pb2.Filter(), 'field_b1', PropertyFilter.LESS_THAN, end_date)
    datastore_helper.set_composite_filter(query.filter, CompositeFilter.AND, filter_1, filter_2)

    p = beam.Pipeline(options=pipeline_options)
    _ = (p
         | 'read from datastore' >> ReadFromDatastore(project, query, None)
         | 'format' >> beam.Map(format)
         | 'write' >> apache_beam.io.WriteToText(
             file_path_prefix=export_path,
             file_name_suffix='.csv',
             header='field_b1,field_b2,field_a1,field_a2',
             num_shards=1)
         )
    return p
I suppose I could use ReadFromDatastore to query all entities of KindA and then use CoGroupByKey to merge them, but KindA has millions of records and that would be very inefficient.
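Just to make the comparison concrete, here is a rough sketch of what that CoGroupByKey join would look like (query_a, query_b, the numeric-id assumption, and the re-keying helpers are only illustrative, not code I actually run):

import apache_beam as beam

def key_b_by_a_id(b):  # b is an entity_pb2 object of KindB
    kv = b.properties.get('key_to_kind_a', None)
    # assumes KindA keys use numeric ids
    return (kv.key_value.path[0].id if kv else None, b)

def key_a_by_id(a):  # a is an entity_pb2 object of KindA
    return (a.key.path[0].id, a)

kind_b = (p | 'read KindB' >> ReadFromDatastore(project, query_b, None)
            | 'key KindB' >> beam.Map(key_b_by_a_id))
kind_a = (p | 'read KindA' >> ReadFromDatastore(project, query_a, None)  # millions of entities
            | 'key KindA' >> beam.Map(key_a_by_id))

joined = (
    {'b': kind_b, 'a': kind_a}
    | 'join' >> beam.CoGroupByKey()  # -> (key_id, {'b': [...], 'a': [...]})
    | 'pair' >> beam.FlatMap(
        lambda kv: [(b, a) for b in kv[1]['b'] for a in (list(kv[1]['a']) or [None])])
)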
Per the recommendations in this answer: https://stackoverflow.com/a/49130224/4458510
I created the following utils, which were inspired by the source code of DatastoreWriteFn in apache_beam.io.gcp.datastore.v1.datastoreio and of write_mutations and fetch_entities in apache_beam.io.gcp.datastore.v1.helper.
import logging
import time
from socket import error as _socket_error
from apache_beam.metrics import Metrics
from apache_beam.transforms import DoFn, window
from apache_beam.utils import retry
from apache_beam.io.gcp.datastore.v1.adaptive_throttler import AdaptiveThrottler
from apache_beam.io.gcp.datastore.v1.helper import make_partition, retry_on_rpc_error, get_datastore
from apache_beam.io.gcp.datastore.v1.util import MovingSum
from apache_beam.utils.windowed_value import WindowedValue
from google.cloud.proto.datastore.v1 import datastore_pb2, query_pb2
from googledatastore.connection import Datastore, RPCError
_WRITE_BATCH_INITIAL_SIZE = 200
_WRITE_BATCH_MAX_SIZE = 500
_WRITE_BATCH_MIN_SIZE = 10
_WRITE_BATCH_TARGET_LATENCY_MS = 5000
def _fetch_keys(project_id, keys, datastore, throttler, rpc_stats_callback=None, throttle_delay=1):
    req = datastore_pb2.LookupRequest()
    req.project_id = project_id
    for key in keys:
        req.keys.add().CopyFrom(key)

    @retry.with_exponential_backoff(num_retries=5, retry_filter=retry_on_rpc_error)
    def run(request):
        # Client-side throttling.
        while throttler.throttle_request(time.time() * 1000):
            logging.info("Delaying request for %ds due to previous failures", throttle_delay)
            time.sleep(throttle_delay)
            if rpc_stats_callback:
                rpc_stats_callback(throttled_secs=throttle_delay)
        try:
            start_time = time.time()
            response = datastore.lookup(request)
            end_time = time.time()
            if rpc_stats_callback:
                rpc_stats_callback(successes=1)
            throttler.successful_request(start_time * 1000)
            commit_time_ms = int((end_time - start_time) * 1000)
            return response, commit_time_ms
        except (RPCError, _socket_error):
            if rpc_stats_callback:
                rpc_stats_callback(errors=1)
            raise

    return run(req)
# Copied from _DynamicBatchSizer in apache_beam.io.gcp.datastore.v1.datastoreio
class _DynamicBatchSizer(object):
    """Determines request sizes for future Datastore RPCs."""

    def __init__(self):
        self._commit_time_per_entity_ms = MovingSum(window_ms=120000, bucket_ms=10000)

    def get_batch_size(self, now):
        """Returns the recommended size for datastore RPCs at this time."""
        if not self._commit_time_per_entity_ms.has_data(now):
            return _WRITE_BATCH_INITIAL_SIZE
        recent_mean_latency_ms = (self._commit_time_per_entity_ms.sum(now) / self._commit_time_per_entity_ms.count(now))
        return max(_WRITE_BATCH_MIN_SIZE,
                   min(_WRITE_BATCH_MAX_SIZE,
                       _WRITE_BATCH_TARGET_LATENCY_MS / max(recent_mean_latency_ms, 1)))

    def report_latency(self, now, latency_ms, num_mutations):
        """Reports the latency of an RPC to Datastore.

        Args:
          now: double, completion time of the RPC as seconds since the epoch.
          latency_ms: double, the observed latency in milliseconds for this RPC.
          num_mutations: int, number of mutations contained in the RPC.
        """
        self._commit_time_per_entity_ms.add(now, latency_ms / num_mutations)
class LookupKeysFn(DoFn):
    """A `DoFn` that looks up keys in the Datastore."""

    def __init__(self, project_id, fixed_batch_size=None):
        self._project_id = project_id
        self._datastore = None
        self._fixed_batch_size = fixed_batch_size
        self._rpc_successes = Metrics.counter(self.__class__, "datastoreRpcSuccesses")
        self._rpc_errors = Metrics.counter(self.__class__, "datastoreRpcErrors")
        self._throttled_secs = Metrics.counter(self.__class__, "cumulativeThrottlingSeconds")
        self._throttler = AdaptiveThrottler(window_ms=120000, bucket_ms=1000, overload_ratio=1.25)
        self._elements = []
        self._batch_sizer = None
        self._target_batch_size = None

    def _update_rpc_stats(self, successes=0, errors=0, throttled_secs=0):
        """Callback function, called by _fetch_keys()"""
        self._rpc_successes.inc(successes)
        self._rpc_errors.inc(errors)
        self._throttled_secs.inc(throttled_secs)

    def start_bundle(self):
        """(Re)initialize: connection with datastore, _DynamicBatchSizer obj"""
        self._elements = []
        self._datastore = get_datastore(self._project_id)
        if self._fixed_batch_size:
            self._target_batch_size = self._fixed_batch_size
        else:
            self._batch_sizer = _DynamicBatchSizer()
            self._target_batch_size = self._batch_sizer.get_batch_size(time.time() * 1000)

    def process(self, element):
        """Collect elements and process them as a batch"""
        self._elements.append(element)
        if len(self._elements) >= self._target_batch_size:
            return self._flush_batch()

    def finish_bundle(self):
        """Flush any remaining elements"""
        if self._elements:
            objs = self._flush_batch()
            for obj in objs:
                yield WindowedValue(obj, window.MAX_TIMESTAMP, [window.GlobalWindow()])

    def _flush_batch(self):
        """Fetch all of the collected keys from datastore"""
        response, latency_ms = _fetch_keys(
            project_id=self._project_id,
            keys=self._elements,
            datastore=self._datastore,
            throttler=self._throttler,
            rpc_stats_callback=self._update_rpc_stats)
        logging.info("Successfully read %d keys in %dms.", len(self._elements), latency_ms)
        if not self._fixed_batch_size:
            now = time.time() * 1000
            self._batch_sizer.report_latency(now, latency_ms, len(self._elements))
            self._target_batch_size = self._batch_sizer.get_batch_size(now)
        self._elements = []
        return [entity_result.entity for entity_result in response.found]
class LookupEntityFieldFn(LookupKeysFn):
    """Looks up a field on an entity_pb2 object.

    Expects an entity_pb2 object as input.
    Outputs a tuple, where the first element is the input object and the
    second element is the object found during the lookup.
    """

    def __init__(self, project_id, field_name, fixed_batch_size=None):
        super(LookupEntityFieldFn, self).__init__(project_id=project_id, fixed_batch_size=fixed_batch_size)
        self._field_name = field_name

    @staticmethod
    def _pb2_key_value_to_tuple(kv):
        """Converts a key_value object into a tuple, so that it can be a dictionary key"""
        path = []
        for p in kv.path:
            path.append(p.name)
            path.append(p.id)
        return tuple(path)

    def _flush_batch(self):
        _elements = self._elements
        keys_to_fetch = []
        for element in self._elements:
            kv = element.properties.get(self._field_name, None)
            if kv and kv.key_value and kv.key_value.path:
                keys_to_fetch.append(kv.key_value)
        self._elements = keys_to_fetch
        read_keys = super(LookupEntityFieldFn, self)._flush_batch()

        _by_key = {self._pb2_key_value_to_tuple(entity.key): entity for entity in read_keys}
        output_pairs = []
        for input_obj in _elements:
            kv = input_obj.properties.get(self._field_name, None)
            output_obj = None
            if kv and kv.key_value and kv.key_value.path:
                output_obj = _by_key.get(self._pb2_key_value_to_tuple(kv.key_value), None)
            output_pairs.append((input_obj, output_obj))
        return output_pairs
The key to this is the line response = datastore.lookup(request), where:
datastore = get_datastore(project_id) (from apache_beam.io.gcp.datastore.v1.helper.get_datastore)
request is a LookupRequest from google.cloud.proto.datastore.v1.datastore_pb2
response is a LookupResponse from google.cloud.proto.datastore.v1.datastore_pb2
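A stripped-down sketch of that round trip outside of Beam (the project id and key values are placeholders; the retry and throttling from the code above are omitted):

from apache_beam.io.gcp.datastore.v1.helper import get_datastore
from google.cloud.proto.datastore.v1 import datastore_pb2

datastore = get_datastore('my-project')  # placeholder project id

req = datastore_pb2.LookupRequest()
req.project_id = 'my-project'
path = req.keys.add().path.add()         # one key in the request
path.kind = 'KindA'
path.id = 12345                          # placeholder numeric id

resp = datastore.lookup(req)             # a LookupResponse
for result in resp.found:                # EntityResult messages
    print(result.entity)                 # the entity_pb2 entity for that key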
The rest of the above code does things like:
using a single connection to the datastore for a DoFn bundle
batching keys together before performing a lookup request
throttling interactions with the datastore if requests start to fail
(Honestly, I don't know how critical these bits are; I just came across them when browsing the apache_beam source code.)
The resulting utility LookupEntityFieldFn(project_id, field_name) is a DoFn that takes an entity_pb2 object as input, extracts and fetches the key property stored in the field field_name, and outputs the result as a tuple (the fetched entity is paired with the input object).
My pipeline code then became:
def format(element):  # element is a tuple of `entity_pb2` objects
    kind_b_element, kind_a_element = element
    return ",".join([
        kind_b_element.properties.get('field_b1', None).string_value,
        kind_b_element.properties.get('field_b2', None).string_value,
        kind_a_element.properties.get('field_a1', None).string_value if kind_a_element else '',
        kind_a_element.properties.get('field_a2', None).string_value if kind_a_element else '',
    ])
def build_pipeline(project, start_date, end_date, export_path):
    query = query_pb2.Query()
    query.kind.add().name = 'KindB'
    filter_1 = datastore_helper.set_property_filter(query_pb2.Filter(), 'field_b1', PropertyFilter.GREATER_THAN, start_date)
    filter_2 = datastore_helper.set_property_filter(query_pb2.Filter(), 'field_b1', PropertyFilter.LESS_THAN, end_date)
    datastore_helper.set_composite_filter(query.filter, CompositeFilter.AND, filter_1, filter_2)

    p = beam.Pipeline(options=pipeline_options)
    _ = (p
         | 'read from datastore' >> ReadFromDatastore(project, query, None)
         | 'extract field' >> apache_beam.ParDo(LookupEntityFieldFn(project_id=project, field_name='key_to_kind_a'))
         | 'format' >> beam.Map(format)
         | 'write' >> apache_beam.io.WriteToText(
             file_path_prefix=export_path,
             file_name_suffix='.csv',
             header='field_b1,field_b2,field_a1,field_a2',
             num_shards=1)
         )
    return p
import wx
import sqlite3


class Frame(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self, None)
        self.panel = wx.Panel(self)
        self.text = wx.StaticText(self.panel)
        self.conn = sqlite3.connect("test.db")
        self.cursor = self.conn.cursor()
        self.autoRefersh()

    def autoRefersh(self):
        self.LoadList()
        wx.CallLater(1000, self.autoRefersh)

    def LoadList(self):
        self.cursor.execute("SELECT * FROM CLINIC1")
        for date1 in self.cursor: pass
        self.staticText2_1 = wx.StaticText(self.panel, label=date1[1], style=wx.ALIGN_CENTER, pos=(100, 100))


if __name__ == '__main__':
    app = wx.App()
    frame = Frame()
    frame.Show()
    app.MainLoop()
I save the combobox data in sqlite3, so why does the panel show it differently? Is this a bug?
I do not know why this is happening.
You missed one crucial step: getting the data itself.
You are using the cursor object, not the data returned by the cursor.
def LoadList(self):
    self.cursor.execute("SELECT * FROM CLINIC1")
    data = self.cursor.fetchall()
    for date1 in data: pass
    self.staticText2_1 = wx.StaticText(self.panel, label=date1[1], style=wx.ALIGN_CENTER, pos=(100, 100))
As you are just "passing" in your for loop, perhaps what you actually want is only a single record, in which case use
data = self.cursor.fetchone()
and drop the for loop
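A minimal sketch of that single-record variant (still assuming the second column of CLINIC1 holds the text to display):

def LoadList(self):
    self.cursor.execute("SELECT * FROM CLINIC1")
    date1 = self.cursor.fetchone()  # first row only, or None if the table is empty
    if date1:
        self.staticText2_1 = wx.StaticText(
            self.panel, label=date1[1], style=wx.ALIGN_CENTER, pos=(100, 100))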
Even better, read the tutorial
https://www.blog.pythonlibrary.org/2012/07/18/python-a-simple-step-by-step-sqlite-tutorial/
In the heading of your question you mention combobox, so I assume that you want to replace the statictext with a combobox. The following should get you started; I'll leave the wx.EVT_COMBOBOX event binding for you to add, as you will need it to do something when you select an item (a minimal sketch of that binding follows the code below).
import wx
import sqlite3


class Frame(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self, None)
        self.selected_data = []
        self.panel = wx.Panel(self)
        self.combo = wx.ComboBox(self.panel, -1, choices=self.selected_data, size=(130, 30))
        self.conn = sqlite3.connect("test.db")
        self.cursor = self.conn.cursor()
        self.combo.SetValue("Choose an Item")
        self.autoRefresh()

    def autoRefresh(self):
        self.LoadList()

    def LoadList(self):
        self.combo.Clear()
        self.cursor.execute("SELECT * FROM CLINIC1")
        data = self.cursor.fetchall()
        for date1 in data:
            self.selected_data.append(date1[1])
        for i in self.selected_data:
            self.combo.Append(i)


if __name__ == '__main__':
    app = wx.App()
    frame = Frame()
    frame.Show()
    app.MainLoop()
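For completeness, a minimal sketch of the wx.EVT_COMBOBOX binding mentioned above (the handler name and body are just placeholders):

# In Frame.__init__, after creating self.combo:
self.combo.Bind(wx.EVT_COMBOBOX, self.OnSelect)

# And somewhere in the Frame class:
def OnSelect(self, event):
    # event.GetString() is the text of the item that was just selected
    print("Selected:", event.GetString())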
Edit:
It should look like this (the screenshot of the resulting frame is not reproduced here).
I have already lost so much time on this, but I don't get it.
Is it possible to use a string as an argument in a function?
My string is defined as:
mergesetting <- "all = FALSE"
(Sometimes I use "all.y = TRUE" or "all.x = TRUE" instead)
I tried to pass that string as an argument in the following function call:
merged = merge.data.frame(x = DataframeA ,y = DataframeB ,by = "date_new", mergesetting )
But I get an error message: Error in fix.by(by.x, x)
The function does work if I use the argument directly:
merged = merge.data.frame(x = DataframeA,y = DataframeB,by = "date_new", all = FALSE )
Two other approaches found in Use character string as function argument didn't work either:
L<- list(x = DataframeA,y = DataframeB,by = "date_new", mergesetting)
merged <- do.call(merge.data.frame, L)
Any help is much appreciated.
I'm not sure of the point, but suppose you had a list with your data and arguments.
Say dfA is this data frame
kable(head(dfA))
|dates | datum|
|:----------|-----:|
|2010-05-11 | 1130|
|2010-05-12 | 1558|
|2010-05-13 | 1126|
|2010-05-14 | 131|
|2010-05-15 | 2223|
|2010-05-16 | 4005|
and dfB is this...
kable(head(dfB))
|dates | datum|
|:----------|-----:|
|2010-05-11 | 3256|
|2010-05-12 | 50|
|2010-05-13 | 2280|
|2010-05-14 | 4981|
|2010-05-15 | 2117|
|2010-05-16 | 791|
Your preset list:
arg.li <- list(dfA = dfA,dfB = dfB,all = T,by = 'dates')
The wrapper function that takes the list...
f <- function(x) do.call('merge.data.frame', list(x = x$dfA, y = x$dfB, all = x$all))
results in:
kable(summary(f(arg.li)))
| | dates | datum |
|:--|:------------------|:------------|
| |Min. :2010-05-11 |Min. : 24 |
| |1st Qu.:2010-09-03 |1st Qu.:1288 |
| |Median :2010-12-28 |Median :2520 |
| |Mean :2011-01-09 |Mean :2536 |
| |3rd Qu.:2011-04-22 |3rd Qu.:3785 |
| |Max. :2011-12-01 |Max. :5000 |