Cosmos DB Emulator won't start on Windows 10 - azure-cosmosdb

I’m recieving this error when I try to start Cosmos Emulator 2.7.2.0:
Tried it on a fresh win10 machine, and it's working here. Any ideas on what might course this?
I tried all the suggestions in this article.
Looking at the etl file didn't give my anything either.
It's the first time I try to install it on my machine.
-- UPDATE --
I opened the dmp file, and found this:
Loading Dump File [C:\Users\xxx\AppData\Local\CrashDumps\Microsoft.Azure.Cosmos.StartupEntryPoint.exe(1).12596.dmp]
User Mini Dump File: Only registers, stack and portions of memory are available
Symbol search path is: srv*
Executable search path is:
Windows 10 Version 18363 MP (12 procs) Free x64
Product: WinNt, suite: SingleUserTS
18362.1.amd64fre.19h1_release.190318-1202
Machine Name:
Debug session time: Mon Jan 20 09:35:26.000 2020 (UTC + 1:00)
System Uptime: not available
Process Uptime: 0 days 0:00:02.000
................................................................
..................................
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(3134.41b0): Unknown exception - code c000000d (first/second chance not available)
For analysis of this file, run !analyze -v
ntdll!NtWaitForMultipleObjects+0x14:
00007ff9`562fcc14 c3 ret
0:000> .ecxr
rax=0000000000000000 rbx=0000000000000000 rcx=0000003629dfd270
rdx=00000000e6bc7d4e rsi=0000000000000000 rdi=0000000000000000
rip=000000006748b0ec rsp=0000003629dfd190 rbp=0000000000000000
r8=000001ba53134730 r9=0000000000000000 r10=0000000000000026
r11=000001ba6d7662e0 r12=0000000000000004 r13=000001ba6d7974d0
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl zr na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246
msvcr80!_invalid_parameter+0x6c:
00000000`6748b0ec 488d4c2440 lea rcx,[rsp+40h]
Seems like something is wrong with my version of msvcr80

This may be caused by corrupted performance counters on your machine.
To fix your performance counters try the following.
Open cmd as administrator
Run "lodctr /R" (must use capital R)
If this doesn't work, see this link here that shows other options to reset your counters. Older article but works the same on Windows 10.
https://answers.microsoft.com/en-us/insider/forum/insider_wintp-insider_perf/possible-performance-counter-problem-on-win10/3a5c22cb-1425-4d26-99e7-4ec46940b9a1
btw, one more option here, is to run the emulator in a Docker container. The docs on that are in that article you referenced in your question.
Hope this is helpful.

For me this confusing error message simply meant that I needed to use another port! Or fix problems with the default 8081 one (read update 2 way below).
Here's the text of the error so that it's searchable:
Error: multiple attempts to restart one of the Azure Cosmos Emulator
processes were detected. The emulator will shutdown. If the problem
persist please try uninstalling the Azure Cosmos Emulator, remove the
CosmosDBEmulator directory from your %%LOCALAPPDATA%% and reinstall.
You can also reach out to the Azure Cosmos team using the 'Feedback'
link in the Data Explorer browser window.
Details:
In my case this happened by itself after Windows 10 updated 2004. I swear something breaks in Cosmos DB emulator every feature update!
For me there were no crash dumps and rebuilding performance counters didn't help. In the Event Viewer for System I've noticed this error A LOT:
Unable to bind to the underlying transport for 127.0.0.1:8081. The IP Listen-Only list may contain a reference to an interface which may not exist on this machine. The data field contains the error number.
No solutions for this error helped, so I first used netstat -ao to see if any program is already listening at 8081. Nope.
Then I used the "Port listener" utility from the www, and it turned out it can't listen to 8081 either, not even under admin rights!
Since I couldn't fix this 8081 problem at all & turning off Windows Firewall didn't help, I simply started the Cosmos DB emulator at a different port and it worked!
C:\Program Files\Azure Cosmos DB Emulator>Microsoft.Azure.Cosmos.Emulator.exe /Port=18001
If you've configured cosmos db to be started automatically on system startup, make sure to update that script as well.
In my case this cmdline got me into the Data explorer, but it wasn't actually working. The "Explorer" part of it was "fetching offers" infinitely. I had to again uninstall Cosmos DB emulator, clean this folder: %LOCALAPPDATA%\CosmosDBEmulator, reboot, install again without starting at 8081, and only start at the new port.
Cosmos DB devs, you're awesome people, but this is too much hassle. And if you can't use the port just say so, you don't have to be cryptic!
UPDATE:
This worked, but I forgot to update the start up script to use the new port, and noticed that after one more reboot the cosmos db started on 8081 without issues. What??! But it wasn't just any old reboot - I already did several of them before and they didn't help. This reboot may have been special in that it came after Windows found one more update (kb4576478), probably realizing it's now needed since I'm at version 2004, installed it, and THAT reboot (or the kb4576478 update specifically) fixed the 8081 problem, whatever it was. Yikes!
UPDATE 2:
Had another port-related problem with another tool. Ugh. Found this solution (see "General workaround") https://superuser.com/a/1568476/251491 This page's accepted answer also mentions the magic of multiple reboots.

Finally got it working!
By using WinDbg and opening the dmp files %LOCALAPPDATA%\CrashDumps, I found out that a dll called perf-MSSQL$SQLEXPRESS-sqlctr10.1.2531.0.dll was the root cause.
By deleting this in the %WINDIR%\System32 the error was "fixed" and I can now run the emulator.

Related

Azure Data Factory - Self-Hosted Integration Runtime - ODBC driver mystery

We are using a Self-Hosted Integration Runtime for Azure Data Factory.
On that machine there was installed an Exasol ODBC driver of version 6. We wanted to upgrade the driver, deleted an old one and installed a new driver of version 7.
Weird thing is that now in Exasol logs we can see that Data Factory is sometimes connecting via driver version 7, and sometimes via driver version 6.
I made an experiment and deleted Exasol ODBC driver from the machine completely. After that Data Factory still was able to connect to Exasol using the driver I just deleted.
Looks like drivers' DLLs are cached somewhere. What can it be?
Update 1
I captured following actions in Process Monitor when Data Fatory connected to Exasol with ODBC driver of version 6:
Where these C:\Config.Msi\3739be5*.rbfASolution-6.1\ODBC\ DLLs may come from? There is no C:\Config.Msi\ directory on the machine.
Update 2
I noticed that when I test connection via Microsoft Integration Runtime Configuration Manager on the machine or in Data Factory Linked Service, then connection is always performed with ODBC driver of version 7.
But when I test connection via Data Factory Dataset, then in some cases connection is done with ODBC driver of version 6.
You could check the registry but clean at your own risk. An alternative might be the SysIternals tools, Process Monitor or Process Explorer which might help you get to the bottom of this. Install them on the SHIR VM if you are allowed to. Process Explorer in particular is a bit like SQL Profiler (if you've ever used that) so will be able to tell you which registry keys external processes are using. It will give you a lot a lot of information so you will have to make judicious use of timestamp and filtering. The proposed steps:
Start a trace using Process Monitor
Start a pipeline using the Exasol driver
Wait til it completes (or at least you know it has started)
Stop the Process Monitor trace Spend time going through the millions
of records it has captured, trying to filter down, or search for your
process
An alternative would be to build a clean SHIR and install only the new driver. Then swap it in for the old one. You may have to get the new SHIR added to the firewall if this is an issue for you.
Honestly I would propose both of these approached in parallel for a production problem. Procmon / Process Explorer can be quite labour and time expensive but should help you get to the bottom of the issue. Building a cleaner SHIR is probably a safer option in the long-term, but requires new infrastructure.
It may sound silly, but rebooting the server where SHIR is working solved the problem.
We noticed, that this server was running for more than 30 days, and decided to reboot it. Maybe restarting Integration Runtime service itself would also help, but we didn't do it.
Thanks to everyone for you help.

Derby won't let me re-create a data base

I'm developing a JSF/JPA application under Glassfish which uses Derby (JavaDB) as it's default data base. It turns out the "DROP AND CREATE" policy of the Persistence Unit doesn't work reliably, so i have taken to deleting the data base and then re-creating it it when I change the schema.
Or at least I am trying to. If I delete the data base, it won't let me create a "new" data base with the same time as the deleted one. Nor will it let me open the old one.
My work around for now is just create a data base with a new name and use that (have to edit the glassfish resources xml file each time) but I would like to know what is going on. Anybody else have this problem and/or know how to fix it?
I am not 100% sure I understand what the pathology of this problem is, but I have a work-around. It would be good someone could enter a bug-report to the Netbeans community or give me a link to do it.
The problem is created on a MacOS version of NetBeans 7.2, with the GlassFish 3.1 server under its management. When you start the server to run your app, it automatically starts Derby (the Java DB) with it. However when you stop the GlassFish server or even exit Netbeans, the Derby installation stays running.
When you start NetBeans & GlassFish again, you will note that when it attempts to start the Derby server it will complain that port 1527 is already in use. Normally this is not a problem, since the application will continue to run and communicate to the previously-started Derby process via the port it is holding open. However I suspect that the communications path used by the Netbeans menu system to delete and create a data base does NOT use this data path, and consequently attempts to do a delete/create operation on a data structure that is held open by a process with which it is not communicating. Hence the lock up and failure.
The work around is to kill the Derby process in the background, then do the Delete/Create operation, and it works fine. On MacOS or Linux, open a command window and do a
ps axe | grep -i derby
and you will find a java JVM running Derby. Just copy the process ID and do
kill <pid>
(the -9 seems not to be necessary) and do the ps command again and you should see the process is gone. Derby will be started by Netbeans the next time it is needed.

Glassfish admin console slow loading

Today I stopped/started my GlassfishV3 instance and now I cannot access the addmin console located at http://servername:4848/. The screen says: "The admin console is loading..." This is going on forever now.
I have tried as follows:
I have tried adding the following entry to my domain.xml located at /glassfishv3/glassfish/domains/domain1/config as suggested in another Stack Overflow Q&A but after restarting the server still no luck.
<java-options>-Dcom.sun.enterprise.tools.admingui.NO_NETWORK=true</java-options>
I have also installed glassfishv3 on my local machine and cannot recreate the problem.I can go to http://localhost:4848 without any problem.
I have also looked at the server.log and jvm.log files located under the /glassfishv3/glassfish/domains/domain1/logs and nothing there that shed some light.
Any help would be very much appreciated
I had similar symptoms, and I tried some of what Dario had suggested as well, but it didn't work. It could be that I had a unique configuration for my dev env: I'm running Glassfish 3.1 on a VirtualBox Ubuntu 11.04 64-bit guest on a Windows 7 64-bit host. Quite by accident, I discovered an additional symptom: if I turned off the network on the Ubuntu guest, the console would load successfully on a localhost browser instance. That is, on the Ubuntu guest with the network off, I could successfully navigate to http://localhost:4848 and show the Glassfish admin console as expected. However, if the Ubuntu guest's network was on, I had the exact behavior suggested by the original poster: http://localhost:4848 would just sit forever on the inial loading page.
To make a long story short, I found that adding the following argument to the JVM options for server-config fixed the problem:
-Djava.net.preferIPv4Stack=true
When I made that change and restarted the Glassfish server, everything worked.
(Note that I also had in place some of the other settings recommended above, i.e., NO_NETWORK=true, and I'd adjusted the JVM memory footprint and set it to -server instead of -client. It could be that these settings are required as well, though they weren't sufficient on their own in my case.)
I was having this exact same problem. I could deploy in run mode, but it would hang forever in Debug mode. IntelliJ was hanging on the breakpoints. I muted the breakpoints, and glassfish3 worked good as new. I didn't have to change any domain.xml settings. Check your breakpoints!
I found a solution to my problem. Setting the java-option to NO_NETWORK to true did not work so I upgraded from 3.0.1 to 3.1 and it got fixed. Not immediately though, I had to stop/start the Glassfish server a couple of times before I got into the admin console without any really long delays.
Solution
The solution was to upgrade from the command line using the pkg utility.
You can find the steps in this link:
http://download.oracle.com/docs/cd/E18930_01/html/821-2437/gkthu.html#gktjf
Or do as follows:
Go to as-install-parent/bin
./pkg image-update
as-install-parent/glassfish/bin/asadmin start-domain --upgrade domain-name
as-install-parent/glassfish/bin/asadmin start-domain domain-name
UPDATE
I had peformance issues again and I found this other solution in Joshi's tech blog:
http://joshitech.blogspot.com/2009/09/glassfish-application-server.html
Basically add the following jvm options in the domain.xml. It should increase Glassfish boot up and deployment performance:
<jvm-options>-server</jvm-options>
<jvm-options>-Xms3000m</jvm-options>
<jvm-options>-Xmx3000m</jvm-options>
<jvm-options>-XX:MaxPermSize=192m</jvm-options>
<jvm-options>-XX:NewRatio=2</jvm-options>
<jvm-options>-XX:+AggressiveHeap</jvm-options>
<jvm-options>-XX:+AggressiveOpts</jvm-options>
<jvm-options>-XX:+UseParallelGC</jvm-options>
<jvm-options>-XX:+UseParallelOldGC</jvm-options>
<jvm-options>-XX:ParallelGCThreads=5</jvm-options>
I don't know if you are referencing this answer, but there is a second step described (disabling update module).
Two more ideas:
Check if the NO_NETWORK=true option really works (there should be no ads in GF admin console)
Watch the server.log (glassfish-install-dir/glassfis/domains/domain1/logs) during startup and look for the last log entry before the delay occurs. This could be a hint for the source of the delay.
Beware of blindly following Dario's example unless you've lots more RAM than most do.
-Xms3000m gives 3gb to Glassfish. Do YOU have that much spare RAM?
I tried this on my 4gb Mac with 1gb for Glassfish. Made no discernable difference at all...performance still sux.

"Database is locked" error in SQLite over a Mac network

I have created a simple database using SQLite (actually PySQLite). It works fine when I'm querying or writing to the database from the local machine (ie program and database file on the windows machine drive). However when I copy the database file to my network drive (a time capsule), then Windows machines, although they can see the files and have full read/write access to the drive, give me a "SQL Error: database is locked" even when performing a simple select!
Queries work fine over the network from Macs.
There is no fancy multi-access going on - only one machine has the database open. Seems like some weird Mac networking issue. Happens in either the Python program, or in the SQLite3 command line. I am using SQLite 3.6.14.2.
Anybody seen this problem? Any way of fixing it? Don't really want to get heavy with MYSQL because this is a simple single-user program, but i'd like to use it from multiple machines.
Thanks.
I don't know if it can be done on MAC, on Debian I have to mount the samba directory with the nobrl option.
From mount.cifs(8):
nobrl
Do not send byte range lock requests to the server. This is
necessary for certain applications that break with cifs
style mandatory byte range locks (and most cifs servers do
not yet support requesting advisory byte range locks).
Read the sqlite FAQ: http://www.sqlite.org/faq.html#q5
"People who have a lot of experience
with Windows tell me that file locking
of network files is very buggy and is
not dependable. If what they say is
true, sharing an SQLite database
between two or more Windows machines
might cause unexpected problems."
So it doesn't work on Windows, it doesn't tell about MAC.
Possibly it fails to lock the file over the network, I think you use SMB protocol so the bugginess comes with the package. If you would like to use SQLite over the network see SQLite Network for alternatives.
I've had a similar problem and I solved it by installing a newer sqlite version. Since Python 2.6 the problem has disappeared too because it uses a newer sqlite dll.
Thank you Carlos. Cherrytree depends on SQLite, and for some reason it recently stopped working with my samba-mounted SQLite database file, complaining about a locked database. Adding "nobrl" to my ubuntu fstab entry solved the problem.
//192.168.3.122/Files /mnt/Files cifs username=public,password=asdf,rw,noperm,nobrl 0 0

How do I debug a memory dump of a spiking ASP.NET process?

Sorry, I couldn't figure out a good way to phrase my real question.
I run a high-traffic ASP.NET site on a 64-bit machine. I have IIS running in 32-bit mode, however, due to some legacy components of the app. I am running this particular web app inside an application pool that has the web garden option on (running 6 processes inside an 8 core machine).
Once or twice a week one of the processes will skyrocket into 100% CPU utilization, causing a giant slowdown for the site, so my plan was to wait until that happens, memory dump the offending process, then poke around WinDbg to zero in on the thread that's spiking to see where the code is spinning its wheels.
I've debugged using WinDbg before to figure out what was causing a deadlock on the site, but that was several months ago and I can't remember how I got it working. (As a side note, this is a lesson to document everything you do.)
I'm running WinDbg on the Windows 2003 server that's running the site, so as to prevent any DLL version problems. Here have been my steps so far, please let me know where I'm going wrong to get the error message that I'm getting.
I first memory dump the spiking process using UserDump, with the following command, where 3389 is the ID of the process:
userdump -k 3389
I load the dump into the x86 edition of WinDbg.
Since I'm running 32-bit on a 64-bit machine, I first load the memory dump and then:
.load wow64exts
.effmach x86
I make sure that my symbol path includes the directory that contains my apps PDB files:
.sympath+ c:\inetpub\myapp\bin
Running just `.load SOS' fails with an error of "The system cannot find the file specified", so I go the fully qualified route of the following, which works:
.load c:\windows\microsoft.net\framework\v2.0.50727\sos
From here, I'm lost. I try any of the SOS commands, like !threads, only to get this error:
Failed to load data access DLL, 0x80004005
That error is also accompanied by the numbered list of items that I should be verifying.
I have verified that I am running the most current version of the debugger, mscordacwks.dll is in fact in the same directory as the mscorwks.dll file, and I'm debugging on the same architecture as the dump file.
I've also run the magical ".cordll -ve -u -l" command, but that doesn't solve anything. I'm always greeted with "CLR DLL status: No load attempts" when I execute that. Then I try ".reload", which yields a handful of warnings like "WARNING: wldap32 overlaps dnsapi". I wish it said something like "CLRDLL: Loaded DLL C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\mscordacwks.dll". But it doesn't.
Try executing !sw before running the sos commands. See this blog post.
(MSDN) How To: Use CLR Profiler
(MSDN Magazine) Investigating Memory Issues
In my experience, spiking app pool can be due to it being recycled. Have you tried IIS Crash / Hang agent and IIS Dump ?
http://www.microsoft.com/downloads/details.aspx?FamilyID=01c4f89d-cc68-42ba-98d2-0c580437efcf&DisplayLang=en
Also included with them is a dumpfile analyzer which will tell you about memory leaks and even suggest areas of your code that need fixing (complete with links to the applicable MSKB articles!)
Dude - not sure if this helps, but maybe try this.
Copy c:\windows\microsoft.net\framework\v2.0.50727\sos.dll to the same directory where windbg is installed to (eg. c:\program files\Debugging Tools for Windows\ ). Why? make it easy to load the sos file
run windbg
load the memory dump file. for me, i use ctrl-D or File -> open crash dump
.load sos <-- take note of the fullstop BEFORE the load command
.symfix c:\temp\debug_symbols
.reload
Ok.. take note of the commandline. this tells me the current THREAD that the dump was in. That might be useless for a HIGH CPU scenario .. because we could be in any thread.
so from here i look at the threads that were running and check out the busiest thread
8 !threadpool <-- this is so i can see the cpu utilization to check we are in a crap (busy) state... eg 100% cpu or what not.
9 !runaway <-- list the threads that have ben around the longest...
eg.
0:027 !runaway
User Mode Time
Thread Time
18:704 0 days 0:00:17.843 <-- Thread #18
19:9f4 0 days 0:00:13.328 <-- Thread #19
16:1948 0 days 0:00:10.718
26:a7c 0 days 0:00:01.375
24:114 0 days 0:00:01.093
27:d54 0 days 0:00:00.390
28:1b70 0 days 0:00:00.328
0:b7c 0 days 0:00:00.171
25:3f8 0 days 0:00:00.000
23:1968 0 days 0:00:00.000
thread 18 and 19 have been hanging around awhile.. hmm.... are they stuck in a loop?
~18s <-- goto thread 18.
!clrstack <-- clr call stack .. which is just like debugging in windows.
.. and from here u can dump objects and stuff by giving the address references and stuff.
check out !help to list some commands to try and use .. i think !help.sos also works?
HTH .. if u still get stuck, ask away at what worked and what didn't.
I just had to deal with a similar problem. In my case, it turned out that WinDbg wasn't able to find the correct version of mscorwks.dll. In addition to the Framework version, there is also a revision of the DLL which can be different between the same framework version.
In theory, the Microsoft symbol servers should be able to supply the necessary DLL, but it wasn't happening for me. To solve it, I used !sym noisy to get additional information on symbol loading. When I did !dumpstack, I got the error message:
SYMSRV: http://msdl.microsoft.com/download/symbols/mscorwks.dll/492B82C1590000/mscorwks.dll not found
To fix this, I created the appropriate folders in my local symbol cache, and copied mscorwks.dll from the machine the dump came from. After a .reload, WinDbg found the necessary DLL in the local symbol cache, and continued on happily.
Alternatively, you can find the exact version of mscorwks being used with lm v m mscorwks. You can then find the update that contains the version you need from this list. You will need to extract the necessary DLLs from the particular update to the right location.

Resources