Jupyter notebook crashing server

I just bought a GPU to put in a computer where I'm hosting a Jupyter notebook. I tunnel the output of the notebook from the tower to my laptop via SSH. I'm running some code, and the notebook freezes at the same line every time. Not only does the notebook freeze, but so does everything on the tower: any other SSH connections freeze, and the GUI on the tower itself is frozen. Nothing is responsive until I hit the power button to reset the computer.
The odd thing is that while nothing is responding, nothing is timing out either. The SSH sessions keep their connections, and the Jupyter notebook homepage claims it's still connected. It's very odd, and I'm not sure whether this is a problem with the code or with the tower somehow, so I'm not sure if I should post this here or somewhere else. But here's the code:
def show_img(x):
    x = x.clone().detach().permute(1, 2, 0).numpy()
    print(x.shape)
    x = rio.convert_tensor_to_rgb(x)
    print(x.shape)
    plt.figure(figsize=(8, 8))
    plt.axis('off')
    _ = plt.imshow(x)
# define generator, discriminator, dataloader, other stuff....
G.cuda()
D.cuda()
g_optim = optim.RMSprop(G.parameters(), lr=lr)
d_optim = optim.RMSprop(D.parameters(), lr=lr)
g_losses = []
d_losses = []
i_losses = []
for epoch in range(n_epochs):
    dataloader = DataLoader(train_dataset, batch_size=batch_size,
                            shuffle=True, pin_memory=True,
                            num_workers=num_workers)
    g_loss = 0.0  # g_loss is the generator's loss
    d_loss = 0.0  # d_loss is the discriminator's loss
    i_loss = 0.0  # i_loss is the generator's loss for not being invertible
    i_weight = 10  # prioritize being invertible ten times more than minimizing g_loss
    x, y, z = train_dataset[0]
    print("image")
    show_img(x)
    print("target")
    show_img(y)
    print("generated")
    x, y = x.cuda(), y.cuda()
    g = G(x.unsqueeze(0), y.unsqueeze(0))
    print(g.shape)
    show_img(g.squeeze().cpu())
    loop = tqdm(total=len(dataloader), position=0, file=sys.stdout)
    print("just to be sure")  # prints this
    for minibatch, (image, batchImage, exp_batch) in enumerate(dataloader):  # this is the line it freezes on
        print("image ", image.shape, " batchImage ", batchImage.shape, " experiment batch ", exp_batch)  # doesn't print this; already frozen
EDIT: The GUI and SSH are responsive, but unusually slow. I guess the main problem is that the code still freezes on that line, and I don't know why.

I found the problem. The batch size was 32, which was much too large given the size of the images. I think it just crashed after trying to load them all.
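That would also explain the system-wide freeze: with pin_memory=True and several workers, an oversized batch of large images can exhaust host RAM and push the machine into swap thrashing, which matches the "frozen but not timing out" symptom. As a sanity check before a full training run, here is a minimal sketch (hedged: it assumes PyTorch with a CUDA GPU, and the dataset is a made-up stand-in for the question's train_dataset) that probes how large a batch actually fits:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the question's train_dataset: 64 large RGB images.
train_dataset = TensorDataset(torch.randn(64, 3, 1024, 1024))

for batch_size in (2, 4, 8, 16, 32):
    try:
        loader = DataLoader(train_dataset, batch_size=batch_size, pin_memory=True)
        batch = next(iter(loader))[0].cuda()  # pin one batch, then move it to the GPU
        print(batch_size, "ok,", torch.cuda.memory_allocated() // 1024**2, "MiB allocated")
        del batch
        torch.cuda.empty_cache()
    except RuntimeError as e:  # a CUDA out-of-memory error raises RuntimeError
        print(batch_size, "failed:", e)
        break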

Related

Any way to use future.apply parallelization without PBS/TORQUE killing the job?

I frequently use the packages future.apply and future to parallelize tasks in R. This works perfectly well on my local machines. However, if I try to use them on a computer cluster managed by PBS/TORQUE, the job gets killed for violating the resource policy. After reviewing the processes, I noticed that resources_used.mem and resources_used.vmem as reported by qstat are ridiculously high. Is there any way to fix this?
Note: I already know and use the packages batchtools and future.batchtools, but they produce jobs to launch to the queues, which requires me to organize the scripts in a particular way, so I would like to avoid that for this specific example.
I have prepared the following MVE. As you can see, the code simply allocates a vector with 10^9 elements and then, in parallel using future_lapply, performs some operations on it (here just a trivial check).
library(future.apply)
plan(multicore, workers = 12)

sample <- rnorm(n = 10^9, mean = 10, sd = 10)
print(object.size(sample) / (1024 * 1024))  # fills ~8 GB of RAM

options(future.globals.maxSize = +Inf)
options(future.gc = TRUE)

future_lapply(future.seed = TRUE,
              X = 1:12, function(idx) {
                # just do some stuff
                for (i in sample) {
                  if (i > 0) dummy <- 1
                }
                return(dummy)
              })
If run on my local computer (no PBS/TORQUE involved), this works well (meaning no problem with the RAM), assuming 32 GB of RAM are available. However, if run through TORQUE/PBS on a machine that has enough resources, like this:
qsub -I -l mem=60Gb -l nodes=1:ppn=12 -l walltime=72:00:00
the job gets automatically killed for violating the resource policy. I am pretty sure this has to do with PBS/TORQUE not measuring the resources used correctly, since if I check
qstat -f JOBNAME | grep used
I get:
resources_used.cput = 00:05:29
resources_used.mem = 102597484kb
resources_used.vmem = 213467760kb
resources_used.walltime = 00:02:06
This tells me the process is using ~102 GB of mem and ~213 GB of vmem. It is not: you can actually monitor the node with e.g. htop and see that it is using the correct amount of RAM, but TORQUE/PBS is measuring much more.
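For what it's worth, the reported numbers line up with a simple explanation (my reading, not confirmed TORQUE behavior): plan(multicore) forks 12 workers that all share the ~8 GB vector copy-on-write, and an accounting scheme that naively sums per-process usage counts those shared pages once per worker, so 12 × 8 GB ≈ 96 GB, close to the reported resources_used.mem of ~102 GB. vmem, which sums address space rather than resident pages, comes out even higher.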

Baffling: println prints then the line erases in Atom for arguments > 1e8

function foo(x)
    n = 0
    t = time()
    while n < x
        n += 1
    end
    sec = time() - t
    println("done in $sec seconds $(x / sec) operations/sec")
end
foo(1e7)
I'm on Windows using Atom, with the latest version of everything. I run the above code and it prints fine for 1e1, ..., 1e7.
But for foo(1e8) and above, it prints the line and then the line DISAPPEARS. I'm completely baffled by that.
It only happens in Atom (VS Code works fine). I use Ctrl-Enter on the foo(1e8) line to evaluate it, and I can see it print the line and then erase it by itself. For foo(1e7) and below, it prints fine.
Here's the video of this with 1e8 then 1e7; it happens on Linux too. As you can see, the video managed to capture the printing and erasing in one of the attempts (see the 5-second mark). When I changed to 1e7, it printed fine every single time.
Everything is up to date: Julia 1.4.1, Atom 1.46, Juno 0.8.1, and I did a complete Julia package update as well.
This turned out to be a Juno issue: github.com/JunoLab/Juno.jl/issues/560
(credit to pfitzseb)

Python script run in background not writing to file?

I've created my first ever Python script, and it works fine in the foreground, but when I run it in the background it creates the file but fails to write anything to it. I run the script with the command: python -u testit.py &
Please help me understand why this happens.
#!/usr/bin/env python
import time
import datetime

dt = datetime.datetime.now()
dtLog = dt.strftime("ThermoLogs/TempLOG%Y%m%d")
f = open(dtLog, "a")
while True:
    dt = datetime.datetime.now()
    print('{:%Y-%m-%d %H:%M:%S}'.format(dt))
    f.write('{:%Y-%m-%d %H:%M:%S}'.format(dt))
    f.write('\n')
    time.sleep(5)
The output arrives in bursts because it is being buffered. With the sleep set to 0.01 I get bursts about every 2 seconds, so with a delay of 5 you will see them much less frequently. You may also note that all pending output is written when the script terminates.
To make the output arrive immediately, call f.flush() after each write.
I don't see any difference between foreground and background here; however, Python should not buffer stdout when it is going to a terminal, which is why the foreground run appears to work.
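Here is a minimal sketch of both fixes applied to the script above (assuming Python 3: buffering=1 requests line buffering for text-mode files, and print accepts flush=True):

#!/usr/bin/env python
import datetime
import os
import time

os.makedirs("ThermoLogs", exist_ok=True)  # make sure the log directory exists
dtLog = datetime.datetime.now().strftime("ThermoLogs/TempLOG%Y%m%d")

# buffering=1 = line-buffered: each completed line goes to the file right away
f = open(dtLog, "a", buffering=1)
while True:
    dt = datetime.datetime.now()
    print('{:%Y-%m-%d %H:%M:%S}'.format(dt), flush=True)  # flush stdout explicitly
    f.write('{:%Y-%m-%d %H:%M:%S}\n'.format(dt))
    f.flush()  # belt and braces: push the line to the OS immediately
    time.sleep(5)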

pyserial code working on Windows (COM1) but not on Linux (/dev/ttyS0)

I am using python 3.6.
The following code works just fine on Windows 10 Pro:
import serial
import binascii

ser = serial.Serial("COM1")  # "COM1" will be "/dev/ttyS0" on Linux
if ser.is_open == True:
    print("COM open")

ser.baudrate = 2400

print('Port configuration:')
print('baudrate:', ser.baudrate)
print('parity:', ser.parity)
print('stopbits:', ser.stopbits)
print('bytesize:', ser.bytesize)
print('xonxoff:', ser.xonxoff)
print('timeout:', ser.timeout)
print()

print('sending...')
frame = bytearray()
frame.append(0x7e)
frame.append(0x03)
frame.append(0x02)
frame.append(0x21)
frame.append(0x00)
frame.append(0xa4)
ser.write(frame)
print(binascii.hexlify(frame))
print()

print('receiving...')
recv = ser.readline()
recv_len = len(recv)
print(binascii.hexlify(recv))
print()

ser.close()
if ser.is_open == False:
    print("COM closed")
But it gets stuck at ser.readline() when I run it under CentOS 6.8, as there was no cable attached to the port.
It looks like a trivial issue, but I cannot figure out what's wrong or missing.
If you can't either, I hope the sample code at least proves useful to someone.
False alarm: the code works using ttyS1 instead of ttyS0 (I knew it was something trivial).
Anyway, it is very useful to check
cat /proc/tty/driver/serial
which shows tx/rx statistics and line signals such as DTR, RTS, and RI next to each port.
For example, next to ttyS1 I noticed an 'RI', which was the same signal that the Hercules terminal on Windows showed me (graphically) when I tried to open COM1. A very intuitive way to identify a serial port!
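A related defensive habit (a sketch, not part of the original fix): give the port a read timeout so readline() returns instead of blocking forever when you have opened a silent or wrong port:

import serial

# timeout is in seconds; readline() then returns whatever has arrived
# (possibly b"") instead of blocking indefinitely on a dead port.
ser = serial.Serial("/dev/ttyS0", baudrate=2400, timeout=2)
recv = ser.readline()
if not recv:
    print("no data within 2 s; check the port with: cat /proc/tty/driver/serial")
ser.close()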

How to Read Data from Serial Port in R

I want to plot live data from the serial port, and I figured R would be a good tool for the job. I'm stumbling on trying to read data from the serial port (COM4). I've verified the data is coming in through Tera Term (and I close the session before trying R), but I can't seem to get anything in R.
I've checked a few places, including these threads:
How to invoke script that uses scan() on Windows?
How to include interactive input in script to be run from the command line
I've also found this old thread on the R forum:
https://stat.ethz.ch/pipermail/r-help/2005-September/078929.html
These have gotten me this far, but I can't seem to actually get any data into R from the serial port.
At this point I can stream in the data in excel using VBA, but I'd like to do it in R for some nicer live plotting and filtering of the data.
Edit: Thanks for the help so far. I just got it working while writing up this edit, so here's the code:
#
# Reset environment
#
rm(list = ls())  # Remove environment variables
graphics.off()   # Close any open graphics

#
# Libraries
#
library(serial)

#
# Script
#
con <- serialConnection(name = "test_con",
                        port = "COM11",
                        mode = "115200,n,8,1",
                        buffering = "none",
                        newline = 1,
                        translation = "cr")

open(con)

stopTime <- Sys.time() + 2
foo <- ""
textSize <- 0

while (Sys.time() < stopTime)
{
  newText <- read.serialConnection(con)
  if (0 < nchar(newText))
  {
    foo <- paste(foo, newText)
  }
}

cat("\r\n", foo, "\r\n")
close(con)
foo ends up being a long string with newlines the way I want them:
3181, -53120, -15296, 2,
3211, -53088, -15328, 2,
3241, -53248, -15456, 1,
3271, -53216, -15424, 2,
3301, -53184, -15488, 2,
3331, -53344, -15360, 1,
3361, -53440, -15264, 1,
Thanks again for all the help!
I am working with the serial package (here), available on CRAN. It was developed to do exactly what you need: reading and sending data from and to RS232 (and similar) connections.
I really recommend it, because mode.exe seems not to work for virtual COM ports (see NPort servers etc.).
Tera Term and Windows use different mechanisms to configure serial devices.
Are your system connection settings OK compared to what is configured in Tera Term?
Re-check the configuration parameters in Tera Term and then use them to set your COM4: configuration in R:
system("mode COM4: BAUD=115200 PARITY=N DATA=8 STOP=1")
See mode /? at your command prompt for further parameters.
It might also be helpful to read data character by character using readChar().
It sometimes happens that Tera Term doesn't close RS232 connections properly.
I realize that this is from five years ago, but I found that your code does not set a handshake.
I am working with something similar, except that I use PuTTY instead of Tera Term, where I could see all of the following inputs for my COM device.
My command is as follows:
con <- serialConnection(name = "Prolific USB-to-Serial Comm Port(Com3)",
                        port = "COM3",
                        mode = "9600,n,8,1",
                        newline = 0,
                        translation = "lf",
                        handshake = "xonxoff")
