PyTorch RuntimeError: CUDA error: out of memory at loss.backward(), no error when using CPU

I'm training a fully convolutional network (FCN32) for semantic segmentation on a Tesla K80 with more than 11 GB of memory.
The input image is pretty large: 352x1216. The network structure is shown below. I used batch_size=1 but still hit the
out-of-memory error.
The criterion is nn.BCEWithLogitsLoss().
The network works fine when I run it on the CPU.
Layer (type) Output Shape # Param
Conv2d-1 [-1, 64, 352, 1216] 1,792
Conv2d-2 [-1, 64, 352, 1216] 36,928
MaxPool2d-3 [-1, 64, 176, 608] 0
Conv2d-4 [-1, 128, 176, 608] 73,856
Conv2d-5 [-1, 128, 176, 608] 147,584
MaxPool2d-6 [-1, 128, 88, 304] 0
Conv2d-7 [-1, 256, 88, 304] 295,168
Conv2d-8 [-1, 256, 88, 304] 590,080
Conv2d-9 [-1, 256, 88, 304] 590,080
MaxPool2d-10 [-1, 256, 44, 152] 0
Conv2d-11 [-1, 512, 44, 152] 1,180,160
Conv2d-12 [-1, 512, 44, 152] 2,359,808
Conv2d-13 [-1, 512, 44, 152] 2,359,808
MaxPool2d-14 [-1, 512, 22, 76] 0
Conv2d-15 [-1, 512, 22, 76] 2,359,808
Conv2d-16 [-1, 512, 22, 76] 2,359,808
Conv2d-17 [-1, 512, 22, 76] 2,359,808
MaxPool2d-18 [-1, 512, 11, 38] 0
Conv2d-19 [-1, 4096, 11, 38] 102,764,544
Conv2d-20 [-1, 4096, 11, 38] 16,781,312
Conv2d-21 [-1, 1, 11, 38] 4,097
ConvTranspose2d-22 [-1, 1, 352, 1216] 4,096
Error message:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
in ()
36 print (loss)
37 #torch.cuda.empty_cache()
---> 38 loss.backward()
39 optimizer.step()
40
/anaconda/envs/py35/lib/python3.5/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
91 products. Defaults to False.
92 """
---> 93 torch.autograd.backward(self, gradient, retain_graph, create_graph)
94
95 def register_hook(self, hook):
/anaconda/envs/py35/lib/python3.5/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
88 Variable._execution_engine.run_backward(
89 tensors, grad_tensors, retain_graph, create_graph,
---> 90 allow_unreachable=True) # allow_unreachable flag
91
92
RuntimeError: CUDA error: out of memory

This usually happens because the GPU runs out of memory. A GPU with more memory would solve the problem (as you mentioned in your answer).
If that is not an option, you can scale your images down to roughly 256*x.
Smaller inputs are also good practice for performance's sake.
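To get a feel for how much the input size matters, here is a rough sketch that adds up the float32 activations listed in the summary table from the question. It assumes every listed output is kept for the backward pass and ignores weights, gradients, optimizer state, and cuDNN workspace, so treat it as a lower bound rather than a measurement. The names `LAYERS` and `activation_bytes` are made up for this illustration.

```python
# Rough float32 activation-memory estimate for the layer table in the question.
# Assumption: every listed output is kept in memory for the backward pass.

BYTES_PER_FLOAT32 = 4

# (channels, downsampling factor relative to the input) for each layer output,
# read off the summary table in the question (22 rows).
LAYERS = [
    (64, 1), (64, 1), (64, 2),
    (128, 2), (128, 2), (128, 4),
    (256, 4), (256, 4), (256, 4), (256, 8),
    (512, 8), (512, 8), (512, 8), (512, 16),
    (512, 16), (512, 16), (512, 16), (512, 32),
    (4096, 32), (4096, 32), (1, 32),
    (1, 1),  # ConvTranspose2d back to the input resolution
]


def activation_bytes(height, width, batch_size=1):
    """Sum the sizes of all layer outputs for one forward pass, in bytes."""
    total_elements = 0
    for channels, factor in LAYERS:
        total_elements += batch_size * channels * (height // factor) * (width // factor)
    return total_elements * BYTES_PER_FLOAT32


full = activation_bytes(352, 1216)
half = activation_bytes(176, 608)
print(f"352x1216: {full / 2**20:.0f} MiB, 176x608: {half / 2**20:.0f} MiB")
```

Halving both image dimensions cuts the activation total to roughly a quarter, which is why downscaling is usually the first thing to try.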

I found out the reason... It's hardware related. I changed to another machine and the error disappeared.

Related

Math problem trying to make a progress bar in Project Zomboid

I am making a mod for Project Zomboid and can't figure out a math problem. The value I am getting ranges from 0 to 1, and I want my progress bar to start at the maximum width and shrink as the value increases.
The first bar was easy because I got a value between 100 and 0, so how do I do this with a value that starts at 0?
I tried searching on Google, but I am really bad at math and could not find an answer.
function panel:render()
    self:drawRectBorder(30, 30, self:getWidth() - 1, 50, 1.0, 1.0, 1.0, 1.0);
    --print((bt_core.player:getBodyDamage():getOverallBodyHealth() / 100) * self:getWidth());
    self:drawRect(31, 31, (bt_core.player:getBodyDamage():getOverallBodyHealth() / 100) * self:getWidth() - 3, 48, 1, 0, 0, 1);
    self:drawRectBorder(30, 110, self:getWidth() - 1, 50, 1.0, 1.0, 1.0, 1.0);
    print(bt_core.player:getStats():getFatigue());
    if bt_core.player:getStats():getFatigue() == 0 then
        self:drawRect(31, 111, self:getWidth() - 3 , 48, 1, 0, 0, 1);
    else
        self:drawRect(32, 111,bt_core.player:getStats():getFatigue() / (self:getWidth() - 3) , 48, 1, 0, 0, 1);
    end
end
To get a value in the range 100..0 from a value in the range 0..1, you can use y = 100 - x*100.
So you have a value in 0..1 and you want to map it to 100..0.
Multiplying your value by 100 gives you 0..100.
To invert this, subtract the result from 100: 100-0 is 100, 100-100 is 0, and so on.
local newVal = 100 - val * 100
or
local newVal = 100 * (1-val)
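The same inversion works for any bar width, not just 100. As a small sketch (in Python rather than Lua, with a hypothetical helper name `bar_width` and an added clamp that the answers above do not mention):

```python
def bar_width(value, max_width):
    """Map value in [0, 1] to a width in [max_width, 0]: full bar at 0, empty at 1."""
    # Clamp first so that out-of-range inputs cannot produce a negative or oversized bar.
    clamped = min(max(value, 0.0), 1.0)
    return max_width * (1.0 - clamped)


print(bar_width(0.0, 200))   # full bar
print(bar_width(0.25, 200))  # three quarters of the bar
print(bar_width(1.0, 200))   # empty bar
```

In the mod, `max_width` would play the role of `self:getWidth() - 3` and `value` the fatigue reading.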

how many n-length binary sequence problem

How can I find the solution to this problem in Python/Java or any other language:
Thanks in advance
Since a program isn't a proof and you would still need to prove it, here is some Python code:
def zig_zag(seq):
    """Test whether the binary sequence seq satisfies the zig-zag pattern."""
    for i in range(len(seq) - 1):
        if (i % 2 == 0 and seq[i] > seq[i+1]) or (i % 2 == 1 and seq[i] < seq[i+1]):
            return False
    return True

def count_zig_zags(n):
    """Count the binary zig-zag patterns of length n."""
    count = 0
    for i in range(2**n):
        # Pad with leading zeros so that every candidate has exactly n digits.
        b = bin(i)[2:].zfill(n)
        if zig_zag(b):
            count += 1
    return count
For example:
>>> [count_zig_zags(n) for n in range(1,12)]
[2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233]
A proof would be via strong induction.
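The brute-force count grows as 2**n, so as a cross-check here is a sketch that counts the same patterns in linear time by tracking only how many valid prefixes end in 0 and in 1. It assumes the pattern tested by zig_zag above (seq[i] <= seq[i+1] at even i, seq[i] >= seq[i+1] at odd i) and n >= 1; the function name `count_zig_zags_dp` is made up for this illustration.

```python
def count_zig_zags_dp(n):
    """Count binary zig-zag sequences of length n (n >= 1) without enumerating them."""
    end0, end1 = 1, 1  # the two length-1 sequences: "0" and "1"
    for i in range(n - 1):
        if i % 2 == 0:
            # Need seq[i] <= seq[i+1]: a 0 can follow only a 0, a 1 can follow either.
            end0, end1 = end0, end0 + end1
        else:
            # Need seq[i] >= seq[i+1]: a 0 can follow either, a 1 can follow only a 1.
            end0, end1 = end0 + end1, end1
    return end0 + end1


print([count_zig_zags_dp(n) for n in range(1, 12)])
```

Each step adds the previous two totals together, which is the recurrence an induction proof would formalise.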

How can I shuffle a range of numbers and then split it into subarrays of a certain length?

Suppose I have a range 1:N, I want to shuffle the range into a random order and then split the resulting shuffled array into subarrays that are at most 128 elements long. How can I do this?
This question is based on one that appeared on the JuliaLang Slack channel.
The function shuffle from the Random standard library can be used to shuffle a container into random order:
julia> using Random: shuffle
julia> shuffle(1:10)
10-element Array{Int64,1}:
6
9
3
2
10
1
8
7
5
4
The function Iterators.partition from Julia's Base can be used to iterate over an iterable in chunks of a fixed length:
julia> using Base.Iterators: partition
julia> partition(1:20, 7)
Base.Iterators.PartitionIterator{UnitRange{Int64}}(1:20, 7)
However, partition returns a lazy iterator by default so if we want to materialize the actual result, we'll need to collect it:
julia> collect(partition(1:20, 7))
3-element Array{UnitRange{Int64},1}:
1:7
8:14
15:20
Putting this all together, we have
julia> using Random: shuffle
julia> using Base.Iterators: partition
julia> shuffle_partition(N; chunk_size=128) = (collect ∘ partition)(shuffle(1:N), chunk_size)
shuffle_partition (generic function with 1 method)
julia> shuffle_partition(503)
4-element Array{SubArray{Int64,1,Array{Int64,1},Tuple{UnitRange{Int64}},true},1}:
[313, 51, 117, 373, 381, 340, 342, 415, 423, 453 … 201, 178, 167, 242, 2, 76, 146, 439, 363, 448]
[115, 121, 306, 440, 295, 181, 30, 280, 388, 227 … 362, 39, 317, 171, 55, 214, 261, 251, 96, 9]
[486, 248, 161, 319, 325, 176, 80, 369, 434, 209 … 442, 350, 273, 419, 130, 305, 192, 482, 265, 234]
[460, 31, 400, 466, 220, 447, 119, 446, 198, 141 … 226, 438, 74, 152, 203, 303, 378, 231, 458, 194]
julia> length.(ans)
4-element Array{Int64,1}:
128
128
128
119
This answer is based on the answer found on Slack.
using Iterators:
I think the simplest approach is to use randperm (if the values are between 1 and N), so
using Base.Iterators: partition
using Random: randperm
N = 513
k = 128
collect(partition(randperm(N), k))
should work.
parts = view.(Ref(shuffle(1:N)),(i:min(i+k-1, N) for i in 1:k:N))
This assumes N is the number of elements and the partition size is k. The result is a list of views (hence the shuffled 1:N is stored only once in memory). Note how Ref is used to avoid broadcasting over the shuffled list.
Sample test code:
julia> using Random
julia> N, k = 20, 4;
julia> parts = view.(Ref(shuffle(1:N)),(i:min(i+k-1, N) for i in 1:k:N))
5-element Array{SubArray{Int64,1,Array{Int64,1},Tuple{UnitRange{Int64}},true},1}:
[18, 15, 1, 6]
[10, 20, 4, 14]
[17, 9, 19, 16]
[5, 8, 12, 3]
[11, 13, 2, 7]
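For readers coming from other languages, the same shuffle-then-chunk idea can be sketched in Python with only the standard library. This is an illustration, not a translation of the Julia answers: the slices below are copies, not views, and the function name simply mirrors shuffle_partition from above.

```python
import random


def shuffle_partition(n, chunk_size=128):
    """Shuffle 1..n and split the result into lists of at most chunk_size elements."""
    values = list(range(1, n + 1))
    random.shuffle(values)  # in-place shuffle
    # Step through the shuffled list in strides of chunk_size; the last chunk may be shorter.
    return [values[i:i + chunk_size] for i in range(0, n, chunk_size)]


chunks = shuffle_partition(503)
print([len(chunk) for chunk in chunks])
```

For N = 503 this yields three chunks of 128 elements and one of 119, matching the lengths shown in the Julia session.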

if statement argument length 0

Hopefully a simple question:
I am getting an "argument is of length zero" for the if statement line:
for (i in 1:(length(MixedDF))) {
if (MixedDF[i,1] - MixedDF[i-1,1] == 1) {
SwitchInd[i] = MixedDF$trial[i]
}
}
Where MixedDF is a large matrix and SwitchInd is a matrix of zeroes that is supposed to get filled in with the indices identified in the if statement. MixedDF$trial or MixedDF[i,1] is the first column in the matrix. This column contains integers starting at 51 and going to 74, where there are many rows with the same value. So for example MixedDF$trial <- c(51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52, 52, 52, 52, 53, 53.....). I want to identify the indices where the trial changes, so 51 to 52, 52 to 53 and so on. More generally I want to understand why the if statement isn't working, it seems straightforward.
This gives the indices where the value changes
x <- c(51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52, 52, 52, 52, 53, 53)
which(diff(x)!= 0) + 1
#[1] 6 15
Make sure you don't have NULL values. Also, you are not defining the cases where the if statement fails. Add an 'else' condition.
if (MixedDF[i,1] - MixedDF[i-1,1] == 1) {
  SwitchInd[i] = MixedDF$trial[i]
} else {
  SwitchInd[i] = SOME VALUE
}
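Another problem is that the loop can't start at 1: when i is 1, i-1 is 0, and MixedDF[0,1] has length zero. The comparison inside if() then has length zero too, which is exactly what the "argument is of length zero" error reports.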
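The index logic is the same in any language: compare each element with its predecessor and start at the second element, because the first one has no predecessor. A small Python sketch using the sample vector from the answer above (the helper name `change_indices` is made up, and the indices are 0-based, so they are one less than the R result):

```python
def change_indices(values):
    """Return the 0-based positions where a value differs from the one before it."""
    # Start at 1: position 0 has no predecessor to compare against.
    return [i for i in range(1, len(values)) if values[i] != values[i - 1]]


x = [51] * 5 + [52] * 9 + [53] * 2
print(change_indices(x))  # 0-based, so one less than R's 6 and 15
```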

Interpreting GDB registers (SSE registers)

I've been using GDB for 1 day and I've accumulated a decent understanding of it.
However when I set a breakpoint at the final semicolon using GDB and print registers I can't fully interpret the meaning of the data stored into the XMM register.
I don't know if the data is in (MSB > LSB) format or vice versa.
__m128i S = _mm_load_si128((__m128i*)Array16Bytes);
}
So this is the result that I'm getting.
(gdb) print $xmm0
$1 = {
v4_float = {1.2593182e-07, -4.1251766e-18, -5.43431603e-31, -2.73406277e-14},
v2_double = {4.6236050467459811e-58, -3.7422963639201271e-245},
v16_int8 = {52, 7, 55, -32, -94, -104, 49, 49, -115, 48, 90, -120, -88, -10, 67, 50},
v8_int16 = {13319, 14304, -23912, 12593, -29392, 23176, -22282, 17202},
v4_int32 = {872888288, -1567084239, -1926210936, -1460255950},
v2_int64 = {3749026652749312305, -8273012972482837710},
uint128 = 0x340737e0a29831318d305a88a8f64332
}
So would someone kindly guide me how to interpret the data.
SSE (XMM) registers can be interpreted in several different ways. The register itself has no knowledge of the implicit data representation; it just holds 128 bits of data. An XMM register can represent:
4 x 32 bit floats __m128
2 x 64 bit doubles __m128d
16 x 8 bit ints __m128i
8 x 16 bit ints __m128i
4 x 32 bit ints __m128i
2 x 64 bit ints __m128i
128 individual bits __m128i
So when gdb displays an XMM register it gives you all possible interpretations, as seen in your example above.
If you want to display a register using a specific interpretation (e.g. 16 x 8 bit ints) then you can do it like this:
(gdb) p $xmm0.v16_int8
$1 = {0, 0, 0, 0, 0, 0, 0, 0, -113, -32, 32, -50, 0, 0, 0, 2}
As for endianness, gdb displays the register contents in natural order, i.e. left-to-right, from MS to LS.
Consider the following code:
#include <stdio.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2: __m128i and _mm_load_si128 */

int main(int argc, char *argv[])
{
    int8_t buff[16] __attribute__ ((aligned(16))) = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
    __m128i v = _mm_load_si128((__m128i *)buff);

    /* %vd is a non-standard vector format; not every C library supports it */
    printf("v = %vd\n", v);

    return 0;
}
If you compile and run this you will see:
v = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
However if you step through the code in gdb and examine v you will see:
v16_int8 = {15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0}
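One way to check the ordering against the dump in the question is to take the uint128 value that gdb printed and split it by hand. The Python sketch below does that with the standard struct module, reading the 16 bytes from most significant to least significant. This only decodes the numbers shown in the question; how a particular gdb version orders the fields is an assumption you should verify on your own setup.

```python
import struct

# The uint128 value gdb printed for $xmm0 in the question.
value = 0x340737e0a29831318d305a88a8f64332

# Lay the 128 bits out as 16 bytes, most significant byte first.
raw = value.to_bytes(16, byteorder="big")

# Reinterpret the same bytes as signed 8-, 16- and 32-bit lanes (">" = big-endian).
v16_int8 = list(struct.unpack(">16b", raw))
v8_int16 = list(struct.unpack(">8h", raw))
v4_int32 = list(struct.unpack(">4i", raw))

print("v16_int8 =", v16_int8)
print("v8_int16 =", v8_int16)
print("v4_int32 =", v4_int32)
```

Read this way, the lanes come out in the same order as the v16_int8, v8_int16 and v4_int32 fields in the question, which is consistent with that dump listing elements from most significant to least significant.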
