torch/rnn won't use CUDA - torch

I'm trying to use the torch/rnn toolkit to run RNNs on my nVidia graphics card. I've got an Ubuntu 16.04 VM with the nVidia driver, CUDA toolkit, Torch, and cuDNN working. I can run the mnistCUDNN example and nvidia-smi shows it using the graphics card. In Torch, I can require('cunn'); and it loads happily.
BUT when I dofile('./rnn/examples/recurrent-visual-attention.lua' ); inside Torch, I get
{
batchsize : 20
cuda : false
cutoff : -1
dataset : "Mnist"
device : 1
earlystop : 200
glimpseDepth : 1
glimpseHiddenSize : 128
glimpsePatchSize : 8
glimpseScale : 2
hiddenSize : 256
id : "ptb:brain:1508585440:1"
imageHiddenSize : 256
locatorHiddenSize : 128
locatorStd : 0.11
lstm : false
maxepoch : 2000
maxnormout : -1
minlr : 1e-05
momentum : 0.9
noTest : false
overwrite : false
progress : false
rewardScale : 1
saturate : 800
savepath : "/home/tom/save/rmva"
seqlen : 7
silent : false
startlr : 0.01
stochastic : false
trainsize : -1
transfer : "ReLU"
uniform : 0.1
unitPixels : 13
validsize : -1
version : 13
}
and since cuda:false, it runs using just the CPU.
Any ideas how to work out what I've missed? Thanks.

I'm an idiot. When I finally worked up the courage to read the source code, I discovered that it doesn't automatically try to use CUDA. There's a -cuda flag to ask it to.
In my defence, the examples are undocumented...

Running into troubles with data path (os.path.isdir(path) returning false when it exists) using FB XLM

I'm trying to run an evaluation on FB's Transcoder (https://github.com/facebookresearch/TransCoder) which implements FB's XLM (cross language model): https://github.com/facebookresearch/XLM#iii-applications-supervised--unsupervised-mt
I have everything configured and followed the instructions on the GitHub page: I unzipped the validation/test set and placed it in a specific directory: /home/username/TransCoder/data/bpe.cpp-java-python.with_comments/XLM-cpp-java-python-with-comments-functions-sa-cl-split-test.
When I run the following command from the "Run an evaluation" section:
python XLM/train.py
--n_heads 8
--bt_steps 'python_sa-cpp_sa-python_sa,cpp_sa-python_sa-cpp_sa,java_sa-cpp_sa-java_sa,cpp_sa-java_sa-cpp_sa,python_sa-java_sa-python_sa,java_sa-python_sa-java_sa' # The evaluator will use this parameter to infer the languages to test on
--max_vocab '-1'
--word_blank '0.1'
--n_layers 6
--generate_hypothesis true
--max_len 512
--bptt 256
--fp16 true
--share_inout_emb true
--tokens_per_batch 6000
--has_sentences_ids true
--eval_bleu true
--split_data false
--data_path '/home/salzubi/TransCoder/data/bpe.cpp-java-python.with_comments/XLM-cpp-java-python-with-comments-functions-sa-cl-split-test'
--eval_computation true
--batch_size 32
--reload_model 'model_1.pth,model_1.pth'
--amp 2
--max_batch_size 128
--ae_steps 'cpp_sa,python_sa,java_sa'
--emb_dim 1024
--eval_only True
--beam_size 10
--retry_mistmatching_types 1
--dump_path '/tmp/'
--exp_name='eval_final_model_wc_30'
--lgs 'cpp_sa-java_sa-python_sa'
--encoder_only=False
I get the following error:
Traceback (most recent call last):
  File "XLM/train.py", line 337, in <module>
    check_data_params(params)
  File "/home/username/TransCoder/XLM/src/data/loader.py", line 246, in check_data_params
    assert os.path.isdir(params.data_path), params.data_path
AssertionError
I'm not sure why this is the case?
$ python
import os
print(os.path.isdir("/home/username/TransCoder/data/bpe.cpp-java-python.with_comments/XLM-cpp-java-python-with-comments-functions-sa-cl-split-test"))
output:
True
Any ideas on how to fix this?
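One way to narrow this down is to check the path exactly as the process receives it, rather than retyping it in a separate interpreter. The traceback above ends in a bare AssertionError with no message, which would be consistent with params.data_path being an empty string at that point (for example, if the multi-line command was not passed to the shell as one command); that is only a guess from the output shown. The sketch below is a hypothetical helper, not part of XLM: it returns the first prefix of a path that is not a directory, assuming a POSIX-style filesystem.

```python
import os


def first_missing_component(path):
    """Return the shortest prefix of `path` that is not a directory.

    Returns None if every component exists, and returns the input
    unchanged if it is empty (an empty path is itself the problem).
    Hypothetical debugging helper; assumes POSIX-style paths.
    """
    if not path:
        return path
    path = os.path.abspath(os.path.expanduser(path))
    current = os.sep
    for part in path.split(os.sep):
        if not part:
            continue  # skip the empty piece before the leading separator
        current = os.path.join(current, part)
        if not os.path.isdir(current):
            return current
    return None


if __name__ == "__main__":
    import sys

    for candidate in sys.argv[1:]:
        # repr() exposes stray quotes, spaces, or an empty string
        print(repr(candidate), "->", repr(first_missing_component(candidate)))
```

Printing repr(params.data_path) just before the assert in loader.py would show the same thing from inside the failing process.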

xl create: unable to retrieve domain configuration error

I'm working with Xen for the first time, using Alpine Linux as Dom0.
I'm following the Alpine Linux guide to set up a PV guest in Xen, and I'm receiving the following error:
alpine-xen:~# xl create -f /etc/xen/a1.cfg -c
libxl: error: libxl_mem.c:202:libxl_set_memory_target: unable to retrieve domain configuration: No such file or directory
failed to free memory for the domain
I could not find any details about this error online. Could someone help me out with this?
Below are my outputs for xl list and xl info:
alpine-xen:~# xl list
Name ID Mem VCPUs State Time(s)
Domain-0 0 64622 24 r----- 10.8
alpine-xen:~# xl info
host : alpine-xen
release : 4.19.80-0-vanilla
version : #1-Alpine SMP Fri Oct 18 11:27:53 UTC 2019
machine : x86_64
nr_cpus : 24
max_cpu_id : 31
nr_nodes : 1
cores_per_socket : 12
threads_per_core : 2
cpu_mhz : 3835.933
hw_caps : 178bf3ff:f6d8320b:2e500800:244037ff:0000000f:219c91a9:00400004:00000500
virt_caps : pv hvm
total_memory : 65466
free_memory : 128
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
xen_major : 4
xen_minor : 12
xen_extra : .1
xen_version : 4.12.1
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit2
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset :
xen_commandline : placeholder no-real-mode edd=off
cc_compiler : gcc (Alpine 8.3.0) 8.3.0
cc_compile_by : buildozer
cc_compile_domain : [unknown]
cc_compile_date : Tue Aug 13 14:24:26 UTC 2019
build_id : f004ef86de8db93d5fbbb90e9b5fa21a70823d67
xend_config_format : 4
It looks like you forgot to restrict the memory for dom0 at boot.
Depending on your hardware, dom0 needs somewhere between 256M and 2048M.
You need to set it in these two files:
/boot/extlinux.conf
...
LABEL xen-vanilla
MENU LABEL Xen + Linux vanilla
COM32 mboot.c32
APPEND xen.gz dom0_mem=256M --- vmlinuz-vanilla root=...
...
and
/etc/update-extlinux.conf
...
xen_opts=dom0_mem=256M
...
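The symptom is visible in the xl info output from the question: free_memory is 128 out of a total_memory of 65466, because Domain-0 holds 64622. A minimal Python sketch that parses that output and checks whether a guest of a given size would fit without ballooning dom0 might look like this. The function names and the guest size are hypothetical (the contents of a1.cfg are not shown), and the values are assumed to be in MiB.

```python
def parse_xl_info(text):
    """Parse `xl info` style 'key : value' lines into a dict of strings."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as hw_caps contain colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def guest_fits(info, guest_mib):
    """Return True if free_memory (assumed MiB) already covers the guest."""
    return int(info["free_memory"]) >= guest_mib


sample = """\
total_memory : 65466
free_memory : 128
hw_caps : 178bf3ff:f6d8320b
"""

info = parse_xl_info(sample)
# 512 is a hypothetical guest size, not taken from the question.
print(info["free_memory"], "of", info["total_memory"], "free;",
      "fits" if guest_fits(info, 512) else "does not fit")
```

With dom0_mem set as described above, free_memory should be large after a reboot and the same check should pass.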

http service not starting error 1009

I was trying to print a document for one of my games, but the page viewer couldn't see the printer, so I checked the Print Spooler service:
C:\WINDOWS\system32>sc qc spooler
[SC] QueryServiceConfig SUCCESS
SERVICE_NAME: spooler
TYPE : 110 WIN32_OWN_PROCESS (interactive)
START_TYPE : 2 AUTO_START
ERROR_CONTROL : 1 NORMAL
BINARY_PATH_NAME : C:\WINDOWS\System32\spoolsv.exe
LOAD_ORDER_GROUP : SpoolerGroup
TAG : 0
DISPLAY_NAME : Print Spooler
DEPENDENCIES : RPCSS
: http
SERVICE_START_NAME : LocalSystem
C:\WINDOWS\system32>sc query spooler
SERVICE_NAME: spooler
TYPE : 110 WIN32_OWN_PROCESS (interactive)
STATE : 1 STOPPED
WIN32_EXIT_CODE : 1068 (0x42c)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT : 0x0
WAIT_HINT : 0x0
C:\WINDOWS\system32>
And tried to start it, then this happened
C:\WINDOWS\system32>net start spooler
System error 1068 has occurred.
The dependency service or group failed to start.
C:\WINDOWS\system32>
Ok so I checked the dependencies
C:\WINDOWS\system32>sc qc rpcss
[SC] QueryServiceConfig SUCCESS
SERVICE_NAME: rpcss
TYPE : 20 WIN32_SHARE_PROCESS
START_TYPE : 2 AUTO_START
ERROR_CONTROL : 1 NORMAL
BINARY_PATH_NAME : C:\WINDOWS\system32\svchost.exe -k rpcss
LOAD_ORDER_GROUP : COM Infrastructure
TAG : 0
DISPLAY_NAME : Remote Procedure Call (RPC)
DEPENDENCIES : RpcEptMapper
: DcomLaunch
SERVICE_START_NAME : NT AUTHORITY\NetworkService
C:\WINDOWS\system32>sc query rpcss
SERVICE_NAME: rpcss
TYPE : 20 WIN32_SHARE_PROCESS
STATE : 4 RUNNING
(NOT_STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)
WIN32_EXIT_CODE : 0 (0x0)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT : 0x0
WAIT_HINT : 0x0
C:\WINDOWS\system32>
Ok RPCSS is good, next one
C:\WINDOWS\system32>sc qc http && sc query http
[SC] QueryServiceConfig SUCCESS
SERVICE_NAME: http
TYPE : 1 KERNEL_DRIVER
START_TYPE : 3 DEMAND_START
ERROR_CONTROL : 1 NORMAL
BINARY_PATH_NAME : system32\drivers\HTTP.sys
LOAD_ORDER_GROUP :
TAG : 0
DISPLAY_NAME : HTTP Service
DEPENDENCIES :
SERVICE_START_NAME :
SERVICE_NAME: http
TYPE : 1 KERNEL_DRIVER
STATE : 1 STOPPED
WIN32_EXIT_CODE : 1009 (0x3f1)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT : 0x0
WAIT_HINT : 0x0
C:\WINDOWS\system32>
OK seeing it stopped I tried to start it again
C:\WINDOWS\system32>net start http
System error 1009 has occurred.
The configuration registry database is corrupt.
C:\WINDOWS\system32>
So I ran SFC to try to fix this, BUT...
C:\WINDOWS\system32>sfc /scannow
Beginning system scan. This process will take some time.
Beginning verification phase of system scan.
Verification 100% complete.
Windows Resource Protection did not find any integrity violations.
C:\WINDOWS\system32>
A fat lot of help this is, it can't even fix something so inherently wrong...
So this is where I ask the community for help, I don't know what to do past this point. Help is very much appreciated.
In my case, I had a sub-key under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HTTP\Parameters\SslBindingInfo that was missing information. Normally every sub-key there, such as 0.0.0.0:40015, has values like "AppId", "DefaultFlags", etc.
I had one sub-key with no values at all. I deleted that "empty" key and HTTP was able to start up.
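To look for such empty sub-keys without clicking through regedit, something along these lines could be used. It is a sketch using Python's standard winreg module, so the registry part only runs on Windows and probably needs an elevated prompt. The function names are my own, and it only reports candidates; it deliberately deletes nothing.

```python
SSL_BINDING_PATH = r"SYSTEM\CurrentControlSet\Services\HTTP\Parameters\SslBindingInfo"


def read_bindings(path=SSL_BINDING_PATH):
    """Return {subkey_name: [value_name, ...]} for each sub-key (Windows only)."""
    import winreg  # imported lazily so empty_bindings() stays usable anywhere

    bindings = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as root:
        subkey_count = winreg.QueryInfoKey(root)[0]
        for i in range(subkey_count):
            name = winreg.EnumKey(root, i)
            with winreg.OpenKey(root, name) as sub:
                value_count = winreg.QueryInfoKey(sub)[1]
                bindings[name] = [
                    winreg.EnumValue(sub, j)[0] for j in range(value_count)
                ]
    return bindings


def empty_bindings(bindings):
    """Return the sorted names of sub-keys that carry no values at all."""
    return sorted(name for name, values in bindings.items() if not values)


if __name__ == "__main__":
    for name in empty_bindings(read_bindings()):
        print("sub-key with no values:", name)
```

Exporting the SslBindingInfo key before deleting anything it reports would be a sensible precaution.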

Very slow artifactory query language

I've been trying to implement an artifact cleanup process based on https://www.jfrog.com/blog/advanced-cleanup-using-artifactory-query-language-aql/ ... but found that AQL is really slow. I've seen queries still running after 24 hours with the server pinned at 100% CPU.
Using simple curl :
time curl -vv -u user:password -d "${query}" "http://server:8081/artifactory/api/search/aql"
I've measured :
items.find({"repo" : "ep-snapshots", "#milestone.keep": {"$ne" : "true"}, "#milestone.complete": {"$ne" : "true"}, "created": {"$before" : "3d"}}).limit(100)
33 minutes
items.find({"repo" : "ep-snapshots", "#milestone.keep": {"$ne" : "true"}, "#milestone.complete": {"$ne" : "true"}}).limit(100)
42 minutes
items.find({"repo" : "ep-snapshots", "#milestone.keep": {"$ne" : "true"}, "#milestone.complete": "false"}).limit(100)
574 minutes
Are there any techniques to investigate the performance of AQL? Is anything obviously broken with the above?
Thanks,
Pete

Why won't MediaInfo CLI let me isolate "Inform='Other'" tags?

I'm trying to use MediaInfo CLI (v0.7.77 on Mac OS X 10.9.5) to grab the first-frame timecode of QuickTime files using this Inform= syntax:
mediainfo --Inform="Other;%TimeCode_FirstFrame%" FILENAME.MOV
But it seems that MediaInfo ignores the existence of the "Other" section of the MediaInfo dump. It just outputs all of the file's metadata, as if I didn't enter an Inform= option at all.
However if I do something like this...
mediainfo --Inform="Video;%ColorSpace%" FILENAME.MOV
It simply returns "YUV", as expected, with no extraneous information.
So why won't my example using the Other; option work?
Here's an example of what I get when I do a language=raw dump of a typical source file:
General
CompleteName : /Volumes/SCARY_RAID/projects.local/WaikikiStock.local/KBank-BrollSelects_20151026.local/resolve_renders/waikiki_selects/667_2942_01-0001_1920x1080-23.976.mov
Format : QuickTime
Format/Info : Original Apple specifications
FileSize/String : 380 MiB
Duration/String : 17s 59ms
OverallBitRate_Mode/String : VBR
OverallBitRate/String : 187 Mbps
Encoded_Date : UTC 2015-10-30 21:55:35
Tagged_Date : UTC 2015-10-30 21:55:41
Encoded_Library/String : Apple QuickTime
Video
ID/String : 1
Format : ProRes
Format_Version : Version 0
Format_Profile : 422 HQ
CodecID : apch
Duration/String : 17s 59ms
BitRate_Mode/String : VBR
BitRate/String : 187 Mbps
Width/String : 1920 pixel3
Height/String : 1080 pixel3
DisplayAspectRatio/String : 16:9
FrameRate_Mode/String : CFR
FrameRate/String : 23.976 fps2
ColorSpace : YUV
ChromaSubsampling : 4:2:2
ScanType/String : Progressive
Bits-(Pixel*Frame) : 3.761
StreamSize/String : 380 MiB (100%)
Encoded_Library/String : abm0
Language/String : en
Encoded_Date : UTC 2015-10-30 21:55:35
Tagged_Date : UTC 2015-10-30 21:55:41
colour_primaries : BT.709
transfer_characteristics : BT.709
matrix_coefficients : BT.709
matrix_coefficients_Original : BT.709
Other
ID/String : 2
Type : Time code
Format : QuickTime TC
Duration/String : 17s 59ms
TimeCode_FirstFrame : 15:02:53:00
TimeCode_Striped/String : Yes
Title : Untitled
Language/String : en
Encoded_Date : UTC 2015-10-30 21:55:41
Tagged_Date : UTC 2015-10-30 21:55:41
This is a MediaInfo bug; it will be corrected soon.
Jérôme, developer of MediaInfo.
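Until a fixed version is available, one possible workaround is to take the full dump (which does include the "Other" section, as shown in the question) and pull the field out yourself. The Python sketch below assumes the dump format shown above: a bare section name on its own line followed by "Key : Value" lines. The function names are hypothetical, and the --Language=raw option is assumed to be how the dump in the question was produced.

```python
import subprocess


def parse_mediainfo_dump(text):
    """Parse a MediaInfo text dump into {section: {key: value}}."""
    sections = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first ' : ' so timecodes like 15:02:53:00 stay intact.
        key, sep, value = line.partition(" : ")
        if sep:
            if current is not None:
                sections[current][key.strip()] = value.strip()
        else:
            current = line.strip()
            sections.setdefault(current, {})
    return sections


def first_frame_timecode(path):
    """Return TimeCode_FirstFrame from the 'Other' section, or None."""
    # Assumption: `mediainfo --Language=raw FILE` prints the dump shown above.
    result = subprocess.run(
        ["mediainfo", "--Language=raw", path],
        capture_output=True, text=True, check=True,
    )
    dump = parse_mediainfo_dump(result.stdout)
    return dump.get("Other", {}).get("TimeCode_FirstFrame")
```

Calling first_frame_timecode("FILENAME.MOV") on the file from the question should then give the value listed under Other in the dump.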
