I've been trying to code an auto-clicker, and first I needed to learn how to use the threading library, but I don't understand why the code seems to stop after the first pass through the loop.
import time
from threading import Thread

def countdown(n):
    while n > 0:
        print('T-minus', n)
        n -= 1
        time.sleep(5)

t = Thread(target=countdown, args=(10,))
t.start()
The output is only:
>>> T-minus 10
Can anyone help me?
I just figured out what was going on! If you run a thread in a Jupyter cell, it won't show the output. I changed the code a bit to test it:
import time
from threading import Thread

TIMES = 0

def countdown(n):
    global TIMES
    while n > 0:
        TIMES += 1
        n -= 1  # decrement so the loop actually terminates
        time.sleep(5)

t = Thread(target=countdown, args=(10,))
t.start()
And in the next cell I just kept printing the TIMES value, and it was changing! The thread was running, just in the background:
for i in range(10):
    time.sleep(3)
    print(TIMES)
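One way to make sure all of the thread's output lands in the cell is to block with join() until the thread finishes. A minimal sketch (reusing the countdown above, shortened so it finishes quickly):

import time
from threading import Thread

def countdown(n):
    while n > 0:
        print('T-minus', n)
        n -= 1
        time.sleep(1)

t = Thread(target=countdown, args=(3,))
t.start()
t.join()  # keep the cell running until countdown() returns,
          # so all of its prints end up in this cell's output
print('done')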
This is a snippet of my PyTorch code. My Jupyter notebook gets stuck when I use num_workers > 0, and I've spent a lot of time on this problem without finding an answer. I don't have a GPU; I work only with a CPU.
class IndexedDataset(Dataset):
    def __init__(self, data, targets, test=False):
        self.dataset = data
        if not test:
            self.labels = targets.numpy()
            self.mask = np.concatenate((np.zeros(NUM_LABELED), np.ones(NUM_UNLABELED)))

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        image = self.dataset[idx]
        return image, self.labels[idx]

    def display(self, idx):
        plt.imshow(self.dataset[idx], cmap='gray')
        plt.show()

train_set = IndexedDataset(train_data, train_target, test=False)
test_set = IndexedDataset(test_data, test_target, test=True)

train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, num_workers=2)
test_loader = DataLoader(test_set, batch_size=BATCH_SIZE, num_workers=2)
Any help is appreciated.
When num_workers is greater than 0, PyTorch uses multiple processes for data loading.
Jupyter notebooks have known issues with multiprocessing.
One way to resolve this is not to use Jupyter notebooks: just write a normal .py file and run it from the command line.
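For illustration, a minimal sketch of such a standalone script (the file name train.py and the random TensorDataset are placeholders of mine, not from the question):

# train.py -- run from the terminal with: python train.py
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Placeholder data; substitute your own Dataset here.
    dataset = TensorDataset(torch.randn(100, 1, 28, 28),
                            torch.randint(0, 10, (100,)))
    loader = DataLoader(dataset, batch_size=32, num_workers=2)
    for images, labels in loader:
        print(images.shape)

if __name__ == '__main__':  # the guard matters: worker processes re-import this file
    main()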
Or try what's suggested here: Jupyter notebook never finishes processing using multiprocessing (Python 3).
Since Jupyter Notebook doesn't support Python multiprocessing, there are two thin wrapper libraries; you should install one of them, as mentioned here: 1 and 2.
I preferred to solve my problem in two ways, without using any external libraries:
By converting my file from .ipynb to .py format and running it in the terminal, with my code placed under the if __name__ == '__main__': guard, as follows:
...
...
train_set = IndexedDataset(train_data, train_target, test=False)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, num_workers=4)

if __name__ == '__main__':
    for images, label in train_loader:
        print(images.shape)
By using the multiprocessing library, as follows:
In try.ipynb:
import multiprocessing as mp
import processing as ps  # the local processing.py module shown below

...
...
train_set = IndexedDataset(train_data, train_target, test=False)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE)

if __name__ == "__main__":
    p = mp.Pool(8)
    r = p.map(ps.getShape, train_loader)
    print(r)
    p.close()
In the processing.py file:

def getShape(data):
    # `data` is one batch ([images, labels]); return the shape of the first image.
    for i in data:
        return i[0].shape
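Note that getShape has to live in a separate importable file (processing.py here) rather than in a notebook cell: the worker processes import the function by module name, and functions defined directly in a notebook generally can't be pickled for them.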
I'm using an Amazon SageMaker notebook that has 72 cores and 144 GB of RAM, and I carried out 2 tests on a sample of the whole dataset to check whether the Dask cluster was working.

The sample has 4500 rows and 735 columns from 5 different "assets" (that is, 147 columns per asset). The code filters the columns and creates a feature matrix for each filtered DataFrame.

First, I initialized the cluster as follows; I received 72 workers, and the run took 17 minutes. (I assume I created 72 workers with one core each.)
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(processes=True, n_workers=72, threads_per_worker=72)
client = Client(cluster)  # `client` is referenced below in dask_kwargs
def main():
    import featuretools as ft
    from tqdm.notebook import tqdm

    list_columns = list(df_concat_02.columns)
    list_df_features = []
    for asset in tqdm(list_columns, total=len(list_columns)):
        dataframe = df_sma.filter(regex="^" + asset, axis=1).reset_index()
        es = ft.EntitySet()
        es = es.entity_from_dataframe(entity_id='MARKET',
                                      dataframe=dataframe,
                                      index='index',
                                      time_index='Date')
        fm, features = ft.dfs(entityset=es,
                              target_entity='MARKET',
                              trans_primitives=['divide_numeric'],
                              agg_primitives=[],
                              max_depth=1,
                              verbose=True,
                              dask_kwargs={'cluster': client.scheduler.address})
        list_df_features.append(fm)
    return list_df_features

if __name__ == "__main__":
    list_df = main()
Second, I initialized the cluster as follows; I received 9 workers, and the run took 3.5 minutes. (I assume I created 9 workers with 8 cores each.)
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(processes=True)
client = Client(cluster)  # `client` is referenced below in dask_kwargs
def main():
    import featuretools as ft
    from tqdm.notebook import tqdm

    list_columns = list(df_concat_02.columns)
    list_df_features = []
    for asset in tqdm(list_columns, total=len(list_columns)):
        dataframe = df_sma.filter(regex="^" + asset, axis=1).reset_index()
        es = ft.EntitySet()
        es = es.entity_from_dataframe(entity_id='MARKET',
                                      dataframe=dataframe,
                                      index='index',
                                      time_index='Date')
        fm, features = ft.dfs(entityset=es,
                              target_entity='MARKET',
                              trans_primitives=['divide_numeric'],
                              agg_primitives=[],
                              max_depth=1,
                              verbose=True,
                              dask_kwargs={'cluster': client.scheduler.address})
        list_df_features.append(fm)
    return list_df_features

if __name__ == "__main__":
    list_df = main()
For me, this is mind-blowing, because I thought that 72 workers would carry out the work faster! Since I'm not a specialist in either Dask or Featuretools, I guess I'm setting something up wrong.
I would appreciate any kind of help and advice!
Thank you!
You are correctly setting dask_kwargs in DFS. I think the slowdown happens as a result of additional overhead and fewer cores per worker. The more workers there are, the more overhead there is from transmitting data between them. Additionally, 8 cores in 1 worker can be leveraged to make computations run faster than 1 core in each of 8 workers.
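As a rough sketch of that trade-off (the numbers are illustrative for a 72-core machine, not a prescription):

from dask.distributed import Client, LocalCluster

# Many single-threaded workers: maximum inter-process data transfer overhead.
# cluster = LocalCluster(processes=True, n_workers=72, threads_per_worker=1)

# Fewer workers with several threads each: less data is shipped between
# processes, and each worker can parallelize internally across its threads.
cluster = LocalCluster(processes=True, n_workers=9, threads_per_worker=8)
client = Client(cluster)
print(client)  # inspect the resulting worker/thread layout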
With reconfigurable model execution it is possible to resize the inputs and outputs of components. How are the connections updated when reconfigured outputs and inputs are connected?

In the example below, the output c2.y and the input c3.y are resized at each model run. This output and input are supposed to be connected, as shown in the N2 chart. However, after the reconfiguration the connection size does not seem to be updated automatically, and it throws the following error:
ValueError: The source and target shapes do not match or are ambiguous for the connection 'c2.y' to 'c3.y'. Expected (1,) but got (2,).
I included 3 tests below: one with a promoted connection, one with an absolute connection, and a last one with reconfiguration but without the connection (which works).
A last resort would be to declare the connection in the parent group of the components, which I have not tried yet.
The tests:

1. Promoted connection
2. Absolute connection
3. No connection
Reconfigurable component classes and tests:
from __future__ import division
import logging
import numpy as np
import unittest

from openmdao.api import Problem, Group, IndepVarComp, ExplicitComponent
from openmdao.utils.assert_utils import assert_rel_error


class ReconfComp(ExplicitComponent):

    def initialize(self):
        self.size = 1
        self.counter = 0

    def reconfigure(self):
        logging.info('reconf started {}'.format(self.pathname))
        self.counter += 1
        logging.info('reconf ended {}'.format(self.pathname))
        if self.counter % 2 == 0:
            self.size += 1
            return True
        else:
            return False

    def setup(self):
        logging.info('setup started {}'.format(self.pathname))
        self.add_input('x', val=1.0)
        self.add_output('y', val=np.zeros(self.size))
        # All derivatives are defined.
        self.declare_partials(of='*', wrt='*')
        logging.info('setup ended {}'.format(self.pathname))

    def compute(self, inputs, outputs):
        logging.info('compute started {}'.format(self.pathname))
        outputs['y'] = 2 * inputs['x']
        logging.info('compute ended {}'.format(self.pathname))

    def compute_partials(self, inputs, jacobian):
        jacobian['y', 'x'] = 2 * np.ones((self.size, 1))


class ReconfComp2(ReconfComp):
    """The size of the y input changes in the same way as in ReconfComp."""

    def setup(self):
        logging.info('setup started {}'.format(self.pathname))
        self.add_input('y', val=np.zeros(self.size))
        self.add_output('f', val=np.zeros(self.size))
        # All derivatives are defined.
        self.declare_partials(of='*', wrt='*')
        logging.info('setup ended {}'.format(self.pathname))

    def compute(self, inputs, outputs):
        logging.info('compute started {}'.format(self.pathname))
        outputs['f'] = 2 * inputs['y']
        logging.info('compute ended {}'.format(self.pathname))

    def compute_partials(self, inputs, jacobian):
        jacobian['f', 'y'] = 2 * np.ones((self.size, 1))
class TestReconfConnections(unittest.TestCase):

    def test_reconf_comp_promoted_connections(self):
        p = Problem()
        p.model = Group()
        p.model.add_subsystem('c1', IndepVarComp('x', 1.0), promotes_outputs=['x'])
        p.model.add_subsystem('c2', ReconfComp(), promotes_inputs=['x'], promotes_outputs=['y'])
        p.model.add_subsystem('c3', ReconfComp2(), promotes_inputs=['y'],
                              promotes_outputs=['f'])
        p.setup()
        p['x'] = 3.

        # First run the model once; counter = 1, size of y = 1
        p.run_model()
        totals = p.compute_totals(wrt=['x'], of=['y'])
        assert_rel_error(self, p['x'], 3.0)
        assert_rel_error(self, p['y'], 6.0)
        assert_rel_error(self, totals['y', 'x'], [[2.0]])
        print(p['x'], p['y'], totals['y', 'x'].flatten())

        # Run the model again, which will trigger reconfiguration; counter = 2, size of y = 2
        p.run_model()  # FIXME Fails with ValueError

    def test_reconf_comp_connections(self):
        p = Problem()
        p.model = Group()
        p.model.add_subsystem('c1', IndepVarComp('x', 1.0), promotes_outputs=['x'])
        p.model.add_subsystem('c2', ReconfComp(), promotes_inputs=['x'])
        p.model.add_subsystem('c3', ReconfComp2(), promotes_outputs=['f'])
        p.model.connect('c2.y', 'c3.y')
        p.setup()
        p['x'] = 3.

        # First run the model once; counter = 1, size of y = 1
        p.run_model()

        # Run the model again, which will trigger reconfiguration; counter = 2, size of y = 2
        p.run_model()  # FIXME Fails with ValueError

    def test_reconf_comp_not_connected(self):
        p = Problem()
        p.model = Group()
        p.model.add_subsystem('c1', IndepVarComp('x', 1.0), promotes_outputs=['x'])
        p.model.add_subsystem('c2', ReconfComp(), promotes_inputs=['x'])
        p.model.add_subsystem('c3', ReconfComp2(), promotes_outputs=['f'])
        # c2.y not connected to c3.y
        p.setup()
        p['x'] = 3.

        # First run the model once; counter = 1, size of y = 1
        p.run_model()

        # Run the model again, which will trigger reconfiguration; counter = 2, size of y = 2
        fail, _, _ = p.run_model()
        self.assertFalse(fail)


if __name__ == '__main__':
    unittest.main()
UPDATE:
It seems that in Group._var_abs2meta only the source size is updated, not the target's. The setup of the connections starts before the setup of the parent group or of the other component is called.
UPDATE 2:
This happens with the default NonlinearRunOnce solver; with a NewtonSolver or NonlinearBlockGS there is no error, but the variable sizes also don't change.
As of OpenMDAO V2.5, reconfigurable model variables are not an officially supported feature in the framework. The bare bones of the capability have been in the code since that research was done, but it wasn't high-priority enough for us to finalize. A recent major refactor in V2.4 reworked how some underlying data structures work and must have broken this functionality.

It is on our development priority list to get this working again, but it's not super high on that list. We focus development mainly on features that have direct in-house applications, and we don't have one of those yet.
If you could provide a decently complete set of tests for it, we could take a look at getting the functionality working.
I want to print the elements of a list in reverse order, recursively.
def f3(alist):
    if alist == []:
        print()
    else:
        print(alist[-1])
        f3(alist[:-1])
I know it works, but I don't know the difference between
return f3(alist[:-1])
and
f3(alist[:-1])
Actually, both work fine.
My test inputs are:
f3([1,2,3])
f3([])
f3([3,2,1])
There is a difference between the two, although it isn't noticeable in this program. Look at the following example, where all I am doing is passing a value as an argument and incrementing it, so the function returns the value once it hits 10 or greater:
from sys import exit

a = 0

def func(a):
    a += 1
    if a >= 10:
        return a
        exit(1)  # never reached: the return above already exits the function
    else:
        # Modifications are made to the following line
        return func(a)

g = func(3)
print(g)
Here, the output is 10.

Now, if I rewrite the code the second way, without the return keyword:
from sys import exit

a = 0

def func(a):
    a += 1
    if a >= 10:
        return a
        exit(1)  # never reached: the return above already exits the function
    else:
        # Modifications are made to the following line
        func(a)

g = func(3)
print(g)
The output is None.

This is because we are not returning any value for the caller to handle. In short, return passes the result back up the chain of recursive calls so the original caller can use it; without it, the function runs again but its result is discarded.
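A variant of the original problem where the difference is visible: building and returning the reversed list instead of printing it (a sketch of mine, not from the question). Here each level must return the inner call's result, or the top-level call yields None:

def reverse_list(alist):
    if alist == []:
        return []
    # Propagate the inner result with `return`; dropping the `return`
    # here would make reverse_list([1, 2, 3]) evaluate to None.
    return [alist[-1]] + reverse_list(alist[:-1])

print(reverse_list([1, 2, 3]))  # [3, 2, 1]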
How can I load a CSV file that is too big for memory in IPython? It seems it cannot be loaded into memory all at once.
You can use this code to read the file in chunks; it will also distribute the work over multiple processes:
import pandas as pd
import multiprocessing as mp

LARGE_FILE = "yourfile.csv"
CHUNKSIZE = 100000  # process 100,000 rows at a time

def process_frame(df):
    # Process one chunk; here we just count its rows.
    return len(df)

if __name__ == '__main__':
    reader = pd.read_csv(LARGE_FILE, chunksize=CHUNKSIZE)
    pool = mp.Pool(4)  # use 4 processes

    funclist = []
    for df in reader:
        # Process each chunk asynchronously.
        f = pool.apply_async(process_frame, [df])
        funclist.append(f)

    result = 0
    for f in funclist:
        result += f.get(timeout=10)  # timeout of 10 seconds per chunk

    pool.close()
    pool.join()
    print(result)  # total number of rows processed
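If you don't need multiple processes (for example inside a notebook, where multiprocessing can be troublesome, as discussed above), a plain sequential version of the same chunked read is a sketch like this (yourfile.csv is a placeholder):

import pandas as pd

total_rows = 0
for chunk in pd.read_csv("yourfile.csv", chunksize=100000):
    # Each chunk is an ordinary DataFrame; only one chunk
    # is held in memory at a time.
    total_rows += len(chunk)
print(total_rows)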