XTDB unable to start a node

From the XTDB getting-started guide:
(require '[clojure.java.io :as io]
         '[xtdb.api :as xt])

(defn start-xtdb! []
  (letfn [(kv-store [dir]
            {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                        :db-dir (io/file dir)
                        :sync? true}})]
    (xt/start-node
     {:xtdb/tx-log (kv-store "data/dev/tx-log")
      :xtdb/document-store (kv-store "data/dev/doc-store")
      :xtdb/index-store (kv-store "data/dev/index-store")})))

(def xtdb-node (start-xtdb!))

(defn stop-xtdb! []
  (.close xtdb-node))
Upon starting the node, it throws
Execution error (RocksDBException) at org.rocksdb.RocksDB/open (RocksDB.java:-2).
lock hold by current process, acquire time 1649604606 acquiring thread 123145548206080:
/Users/faiz.halde/Workspace/personal/data/proj/data/dev/index-store/LOCK: No locks available
Even tried deleting the data directory
CLJ - 1.10.3
openjdk version "1.8.0_292"

I don't know the cause of the problem, but I restarted the REPL and it worked.

This usually indicates that you still have an active node running: a RocksDB instance can only be accessed by one node at a time. However, if you lose the reference to the original node you can't shut it down directly, and I believe the only option then is to restart the JVM.
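If you are working from a REPL, one way to avoid losing the reference is to keep the node in a single well-known var and always close the previous node before starting a new one. A minimal sketch reusing the start-xtdb! function from the question (the xtdb-node* atom and restart-xtdb! are just illustrative names):

(defonce xtdb-node* (atom nil))

(defn restart-xtdb! []
  ;; close any previously started node first so its RocksDB LOCK is released
  (when-let [node @xtdb-node*]
    (.close node))
  (reset! xtdb-node* (start-xtdb!)))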

Related

How to restart the SHA256MigrationJob without restarting Artifactory?

I keep encountering the error message below in sha256_migration.log. The job doesn't restart after failure; however, if I restart the Artifactory service, the SHA256 migration resumes from where it left off until it fails again.
2018-11-13 10:24:35,060 [art-exec-3] [ERROR] (o.a.s.j.m.s.Sha256MigrationJob:78) - Caught unexpected exception during SHA256 Migration job, operation will break.
org.springframework.core.task.TaskRejectedException: Executor [org.artifactory.schedule.ArtifactoryConcurrentExecutor@70a2137a] did not accept task: org.artifactory.schedule.aop.AsyncAdvice$$Lambda$654/1640835804@7dbf55d3
at org.springframework.core.task.support.TaskExecutorAdapter.submit(TaskExecutorAdapter.java:93)
at org.springframework.scheduling.concurrent.ConcurrentTaskExecutor.submit(ConcurrentTaskExecutor.java:143)
at org.artifactory.schedule.aop.AsyncAdvice.submitWorkQueueTask(AsyncAdvice.java:235)
at org.artifactory.schedule.aop.AsyncAdvice.submit(AsyncAdvice.java:217)
at org.artifactory.schedule.aop.AsyncAdvice.executeInvocation(AsyncAdvice.java:146)
at org.artifactory.schedule.aop.AsyncAdvice.invoke(AsyncAdvice.java:124)
at org.artifactory.schedule.aop.AsyncAdvice.invoke(AsyncAdvice.java:62)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:207)
at com.sun.proxy.$Proxy144.updateSha2(Unknown Source)
at org.artifactory.storage.jobs.migration.sha256.Sha256MigrationJob.migrationLogic(Sha256MigrationJob.java:134)
at org.artifactory.storage.jobs.migration.MigrationJobBase.migrationLoop(MigrationJobBase.java:106)
at org.artifactory.storage.jobs.migration.MigrationJobBase.runMigration(MigrationJobBase.java:83)
at org.artifactory.storage.jobs.migration.MigrationJobBase.onExecute(MigrationJobBase.java:73)
at org.artifactory.schedule.quartz.QuartzCommand.execute(QuartzCommand.java:48)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.artifactory.concurrent.ArtifactoryRunnable.run(ArtifactoryRunnable.java:30)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.RejectedExecutionException: Task org.artifactory.concurrent.ArtifactoryRunnable@4afb003 rejected from java.util.concurrent.ThreadPoolExecutor@33daf5aa[Running, pool size = 64, active threads = 64, queued tasks = 10000, completed tasks = 120723]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
at org.artifactory.schedule.ArtifactoryConcurrentExecutor.execute(ArtifactoryConcurrentExecutor.java:69)
at org.springframework.core.task.support.TaskExecutorAdapter.submit(TaskExecutorAdapter.java:88)
... 19 common frames omitted
My artifactory.system.properties regarding sha256migration
##SHA2 Migration block
artifactory.sha2.migration.job.enabled=true
artifactory.sha2.migration.job.queue.workers=100
My setup:
Cloned production instances of Artifactory infrastructure into test instance (save for IP address and DNS records).
Ensure that DB and filestore configurations (db.properties and binarystore.xml) were updated accordingly on the cloned instances.
Things I've tried without luck:
I ran the Artifactory GC a couple of times.
Increased the CPU count to 16.
Increased the RAM to 16 GB.
Ensured that I am running the latest Oracle Java 8u192.
What I know:
It keeps running fine for a while until it crashes.
When I restart the Artifactory service, the migration resumes and the total number of artifacts to migrate is lower.
I cannot keep restarting Artifactory in production to finish the SHA256 migration job; I have over 500k artifacts.
My question:
Is there any way to restart the SHA256MigrationJob without restarting Artifactory?
Is there a way to find the artifact that it has trouble migrating to SHA256?
In the stack trace above, I suspect the issue is at com.sun.proxy.$Proxy144.updateSha2(Unknown Source).
-- Workaround --
I ended up creating a new VM and installed a clean copy of Artifactory 6.5.3 (with the latest Oracle Java 8 Server-JRE). In the issue above, I was doing an in-place upgrade, just in a new folder.
I moved the necessary files in etc to the new VM, such as master.key, binarystore.xml, db.properties, etc. I then executed bin/installService.sh [user] [group], which creates the /etc/opt/jfrog/artifactory configuration symlink/folder. My filestore and Artifactory database run on different VMs, so only Artifactory and its file configurations needed to be ported.
The new Artifactory 6.5.3 version started up without issues.
The SHA256 migration job is now running without any problems. On the last upgrade run I did, it worked fine without the job dying.
Note: I also sensibly adjusted the configuration values for the queue workers. https://www.jfrog.com/confluence/display/RTF/Checksum-Based+Storage#Checksum-BasedStorage-ConfiguringtheMigrationProcess
tl;dr: decrease artifactory.sha2.migration.job.queue.workers to roughly 2 × the number of cores.
What you are experiencing is exhaustion of the ThreadPoolExecutor.
By default the thread pool size is 4 × [number of cores], in your case 64 (16 cores × 4).
However, the number of threads you have configured for the SHA256 migration job is 100.
On an Artifactory instance without any load this will not cause failures, because a queue backs the thread pool (in your case its size is 10000).
In your case both the thread pool and the queue are full.
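To illustrate the mechanism (a toy example, not Artifactory's code): a ThreadPoolExecutor with the default AbortPolicy throws RejectedExecutionException as soon as every thread is busy and the bounded queue is full, which is exactly the state shown in the stack trace (64 active threads, 10000 queued tasks). With deliberately tiny limits:

import java.util.concurrent.*;

public class RejectionDemo {
    public static void main(String[] args) {
        // 2 threads and a queue of 2, instead of Artifactory's 64 and 10000
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(2),
                new ThreadPoolExecutor.AbortPolicy()); // default policy: reject by throwing

        Runnable slow = () -> {
            try { Thread.sleep(5_000); } catch (InterruptedException ignored) { }
        };

        // tasks 0-1 occupy the threads, 2-3 fill the queue, task 4 is rejected
        for (int i = 0; i < 5; i++) {
            try {
                executor.execute(slow);
                System.out.println("accepted task " + i);
            } catch (RejectedExecutionException e) {
                System.out.println("rejected task " + i);
            }
        }
        executor.shutdownNow();
    }
}

The migration job's executor is in the same saturated state, so every task it submits beyond the queue capacity is rejected and the job breaks.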
If I understand correctly, you have overcome the issue. But for other people bumping into this thread, I would recommend decreasing the value of artifactory.sha2.migration.job.queue.workers to no more than half the thread pool's number of threads.
So in your case 32: (16 × 4) / 2.
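In concrete terms, that means lowering the worker count in artifactory.system.properties. For the 16-core machine described above it would look roughly like this (as far as I know, a restart is needed for the property to take effect):

##SHA2 Migration block
artifactory.sha2.migration.job.enabled=true
artifactory.sha2.migration.job.queue.workers=32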

Pintos - UserProg all tests fail is_kernel_vaddr()

I am doing the Pintos project on the side to learn more about operating systems. I had tons of devops trouble at first with it not running well on an 18.04 Ubuntu droplet. I am now running it on the VirtualBox image that UCCS tells students to download for pintos.
I finished project 1 and started to map out my solution to project 2. Following the instructions to create a file system, I ran
pintos-mkdisk filesys.dsk --filesys-size=2
pintos -- -f -q
but I am getting the error
Kernel PANIC at ../../threads/vaddr.h:87 in vtop(): assertion
`is_kernel_vaddr (vaddr)' failed.
I then tried running make check (all the tests). They are all failing for the same reason.
Am I missing something? Is there something I need to implement to fix this? I reread the instructions and didn't see anything.
Would appreciate help!
Thanks
I had a similar problem. My code for Project 1 ran fine, but I could not format the filesystem for Project 2.
The failure for me came from the following call chain:
thread_init() -> ... -> thread_schedule_tail() -> process_activate() -> pagedir_activate() -> vtop()
The problem is that init_page_dir is still NULL when pagedir_activate() is called. init_page_dir should have been initialized in paging_init() but this is called after thread_init().
The root cause was that my scheduler was being called too early, i.e. before the call to thread_start(). The reason was that I had built in a call to thread_yield() upon completion of every call to lock_release(), which makes sense from a priority-donation standpoint. Unfortunately, locks are used before the scheduler is ready! To fix this, I added a flag called threading_started that bails out in the first line of my thread_block() and thread_yield() functions if thread_start() has not yet been called.
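For reference, the guard described above can be a simple file-local flag in threads/thread.c. A rough, elided sketch (the threading_started name and the exact placement inside thread_start() are my own, not part of stock Pintos):

/* threads/thread.c */
static bool threading_started = false;   /* true once thread_start() has run */

void
thread_start (void)
{
  /* ... existing code: create the idle thread, enable interrupts ... */
  threading_started = true;
}

void
thread_yield (void)
{
  /* Locks are released before the scheduler is ready; bail out so we never
     reach thread_schedule_tail() -> process_activate() -> pagedir_activate()
     while init_page_dir is still NULL. */
  if (!threading_started)
    return;
  /* ... original thread_yield body ... */
}

void
thread_block (void)
{
  if (!threading_started)
    return;
  /* ... original thread_block body ... */
}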
Good luck!

How to properly save Common Lisp image using SBCL?

If I want to create a Lisp-image of my program, how do I do it properly? Are there any prerequisites? And doesn't it play nicely with QUICKLISP?
Right now, if I start SBCL (with just QUICKLISP pre-loaded) and save the image:
(save-lisp-and-die "core")
And then try to start SBCL again with this image
sbcl --core core
And then try to do:
(ql:quickload :cl-yaclyaml)
I get the following:
To load "cl-yaclyaml":
Load 1 ASDF system:
cl-yaclyaml
; Loading "cl-yaclyaml"
.......
debugger invoked on a SB-INT:EXTENSION-FAILURE in thread
#<THREAD "main thread" RUNNING {100322C613}>:
Don't know how to REQUIRE sb-sprof.
See also:
The SBCL Manual, Variable *MODULE-PROVIDER-FUNCTIONS*
The SBCL Manual, Function REQUIRE
Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
0: [RETRY ] Retry completing load for #<REQUIRE-SYSTEM "sb-sprof">.
1: [ACCEPT ] Continue, treating completing load for #<REQUIRE-SYSTEM "sb-sprof"> as having been successful.
2: Retry ASDF operation.
3: [CLEAR-CONFIGURATION-AND-RETRY] Retry ASDF operation after resetting the configuration.
4: [ABORT ] Give up on "cl-yaclyaml"
5: Exit debugger, returning to top level.
(SB-IMPL::REQUIRE-ERROR "Don't know how to ~S ~A." REQUIRE "sb-sprof")
0]
Alternatively, if I try:
(require 'sb-sprof)
when SBCL is started with the saved core, I get the same error. If SBCL is started just as sbcl, no error is reported.
In fact, pre-loading QUICKLISP is not a problem: the same problem happens if sbcl is called initially with sbcl --no-userinit --no-sysinit.
Am I doing it wrong?
PS. If I use roswell, ros -L sbcl-bin -m core run somehow doesn't pick up the image (tested by declaring variable *A* before saving and not seeing it once restarted).
PS2. So far it looks like SBCL does not provide extension modules (SB-SPROF, SB-POSIX, etc.) unless they are explicitly required prior to saving the image.
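If that is indeed the case, one possible workaround (an untested sketch on my part) is to require the contrib modules you need while a plain sbcl can still find them, and only then save the image:

;; load the contribs first, then save, so they end up baked into the image
(require 'sb-sprof)
(require 'sb-posix)
(save-lisp-and-die "core")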
Thanks to the help from @jkiiski, here is the full explanation and solution:
SBCL uses extra modules (SB-SPROF, SB-POSIX and others) that are not always loaded into the image. These modules reside in the contrib directory, located either where the SBCL_HOME environment variable points (if it is set) or where the image resides (for example, in /usr/local/lib/sbcl/).
When an image is saved in another location and SBCL_HOME is not set, SBCL won't be able to find contrib, hence the errors that I saw.
Setting SBCL_HOME to point to the contrib location (or copying contrib to the image location, or the new image to the contrib location) solves the problem.
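Concretely, something along these lines before starting from the saved core (the path is just the example location mentioned above; use wherever your SBCL installation keeps its contrib directory):

export SBCL_HOME=/usr/local/lib/sbcl   # the directory that contains contrib/
sbcl --core core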
Finally, about roswell: the roswell parameter -m searches for images in a specific location. For SBCL (sbcl-bin) it would be something like ~/.roswell/impls/x86-64/linux/sbcl-bin/1.3.7/dump/. Secondly, the image name for SBCL must have the form <name>.core. To start it, use: ros -m <name> -L sbcl-bin run. (Quick edit: better to use ros dump for saving images with roswell, as was pointed out to me.)
If you want to create executables, you could try the following:
(sb-ext:save-lisp-and-die
 "core"
 :compression t
 ;; this is the main function:
 :toplevel (lambda ()
             (print "hell world")
             0)
 :executable t)
With this you should be able to call QUICKLOAD as you wish. Maybe you want to check out my extension to CL-PROJECT for creating executables: https://github.com/ritschmaster/cl-project

Riak: "Failed to read test value: {error,{insufficient_vnodes,0,need,1}}" after running "riak-admin test"

Getting this error soon after running riak start despite a config file that should be working correctly.
Turns out that this is a limit of Riak's error messaging: you will get the above message if you try to do a riak-admin test on your setup before the configuration has finished loading.
I encountered the same problem while starting new Riak clusters over and over again during automated testing. My solution was, in my test fixture setup, to execute code that keeps trying to put an object into a Riak bucket until it eventually succeeds.
Granted, my solution here is an Erlang snippet, but it generally solves this problem in lieu of any Riak-supplied admin/wait functions. I've used a number of different Riak versions, and this technique seems to work for all of them.
wait_for_riak() ->
    {ok, C} = riak:local_client(),
    io:format("Waiting for Riak..."),
    wait_for_riak(C),
    io:format("and had a successful put.~n").

wait_for_riak(C) ->
    Strawman = riak_object:new(<<"test">>, <<"strawman">>, []),
    case C:put(Strawman, 1) of
        ok ->
            ok;
        _Error ->
            receive after 1000 -> ok end,
            wait_for_riak(C)
    end.
adding sleep 4 like so:
brew install riak
riak start
sleep 4
riak-admin test
should help

escript: /riak-1.1.2/rel/riak/erts-5.8.5/bin/nodetool

I keep getting this error when trying to run riak commands.
The nodetool file does not exist in that directory. When I copy the nodetool file from 5.8.4 I start getting this error:
{"init terminating in do_boot",{'cannot get bootfile','start_clean.boot'}}
EDIT
I followed this great advice here: http://onerlang.blogspot.co.uk/2009/10/fighting-with-riak.html. Now when I run riak start I get:
Crash dump was written to: erl_crash.dump
init terminating in do_boot ()
Crash dump was written to: erl_crash.dump
init terminating in do_boot ()
Error reading /abc/def/otp_src_R14B03/riak-1.1.2/dev/dev1/etc/app.config
{"init terminating in do_boot",{'cannot get bootfile','start_clean.boot'}}
EDIT 2
I seem to be getting this problem http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-November/006436.html.
Whenever I build from source (required for multiple nodes on the same machine), riak tries to use erts-5.8.5, whereas riak requires(?) erts-5.8.4.
Is it possible to tell it not to use 5.8.5 and to use 5.8.4 instead?
