I cannot retrieve all entries from an OpenLDAP database

I set up OpenLDAP on my Ubuntu server and filled the database via python-ldap with 10,000 persons.
Now, when trying to search for all of them, at first I only got 500 entries.
$ ldapsearch -x -h 192.168.1.222 -b dc=ldap-test,dc=xxx,dc=xx
I googled for a solution and read about a server-side limit.
So I changed the following value from 500 to:
olcSizeLimit: unlimited
I also tried 15,000, but with the same effect.
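(For reference, since olcSizeLimit lives in cn=config, that is where I made the change; a python-ldap equivalent would look roughly like the sketch below, assuming ldapi:// access with SASL EXTERNAL as root and an olcDatabase={1}mdb,cn=config entry, which may be {1}hdb or similar on other setups.)
# Sketch: raising/removing the server-side size limit via cn=config.
import ldap

conn = ldap.initialize("ldapi:///")
conn.sasl_external_bind_s()  # authenticates as the local root user
conn.modify_s(
    "olcDatabase={1}mdb,cn=config",  # the database entry name is an assumption
    [(ldap.MOD_REPLACE, "olcSizeLimit", [b"unlimited"])],
)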
Now, with the same search command I get:
# numResponses: 992
# numEntries: 991
I cannot find a limit of 992 or 991 documented anywhere. I also grepped for sizelimit; the only result is the setting above.
I also read about client-side restrictions, but I tried the same search command against the old, deprecated test server, and there I get all 10,000 results.
I'd appreciate any help.

The problem was the generation of the test data.
I used the Python package Faker to fake last names, which I then used as the cn.
As Faker only provides a limited pool of last names, duplicate cn values were generated, and the errors on adding those entries were silently swallowed.
I fixed the problem by using the complete name as the cn.
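For illustration, here is a minimal sketch of the fixed generation loop; the DN layout (ou=people), the admin bind, and the exact attribute set are assumptions made for the example, not taken from my original script:
# Sketch of the fixed data generation: full name as cn/RDN, duplicates reported.
import ldap
import ldap.modlist
from faker import Faker

fake = Faker()
conn = ldap.initialize("ldap://192.168.1.222")
conn.simple_bind_s("cn=admin,dc=ldap-test,dc=xxx,dc=xx", "secret")  # assumed admin DN

for _ in range(10000):
    first, last = fake.first_name(), fake.last_name()
    cn = f"{first} {last}"  # complete name instead of the last name only
    dn = f"cn={cn},ou=people,dc=ldap-test,dc=xxx,dc=xx"
    entry = {
        "objectClass": [b"inetOrgPerson"],
        "cn": [cn.encode()],
        "sn": [last.encode()],
        "givenName": [first.encode()],
    }
    try:
        conn.add_s(dn, ldap.modlist.addModlist(entry))
    except ldap.ALREADY_EXISTS:
        # Do not swallow duplicates silently; at least report them.
        print(f"duplicate skipped: {dn}")
With the full name as the RDN far fewer collisions occur, and any remaining duplicates are now reported instead of being dropped silently.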


Fuseki configuration

As outlined in http://wiki.bitplan.com/index.php/Apache_Jena#Script_to_start_Fuseki_server
I have been avoiding the complexity of Fuseki configuration files and have started the server from a script for my use cases, in which I only need one dataset/endpoint. For multiple datasets/endpoints I simply used multiple servers.
Descriptions like:
https://jena.apache.org/documentation/fuseki2/fuseki-config-endpoint.html
and questions like:
fuseki Multiple services found exception
have been intimidating me, since there seem to be so many options and no straightforward way to simply say: please use these datasets from the following directories, as the command-line version can do for a single dataset.
Just look at:
https://users.jena.apache.narkive.com/MNZHLT25/multiple-datasets-on-fuseki
where the user expectation:
java -jar fuseki-0.1.0-server.jar --update --loc=data /dataset
--loc=data2 /dataset2
can be seen, which was unfortunately not fulfilled. Instead:
http://jena.apache.org/documentation/serving_data/index.html#fuseki-configuration-file
was the answer at the time; that link is now outdated.
So obviously there are people out there getting Fuseki to work with multiple datasets. But how do they do it?
I know how to load a TDB store from a triple file via the command line. I know that I could use the web UI to set up datasets and load data, but that won't work for my multi-million (and partly multi-billion) triple files.
What is a (hopefully simple) example of loading multiple triple files, making the results available as different datasets on the same Fuseki server, and having the SPARQL endpoints running (some of them read-only)?
https://jena.apache.org/documentation/fuseki2/fuseki-layout.html gives a hint on the layout of files.
Using the script to start Fuseki, I inspected the run directory, which in my case was found at:
apache-jena-fuseki-3.16.0/run
There are two subdirectories which are initially empty and stay so if you run things from the command line:
configuration
databases
By adding a dataset via the web UI at http://localhost:3030,
a database directory named after the dataset, in this case:
databases/cr
and a configuration file:
configuration/cr.ttl
are created.
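For orientation, such a generated file contains a service description plus a dataset description. A rough sketch of what configuration/cr.ttl might look like is shown below; the exact vocabulary, prefixes, and TDB1 vs. TDB2 class names vary between Fuseki versions, so the file generated by your own install is the authoritative template. (Stripping the update/upload services from such a file is also how an endpoint can be made read-only.)
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tdb2:   <http://jena.apache.org/2016/tdb#> .

# SPARQL query service published at /cr (sketch only)
<#service> rdf:type fuseki:Service ;
    fuseki:name                  "cr" ;
    fuseki:serviceQuery          "sparql" , "query" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:dataset               <#dataset> .

# Persistent TDB2 dataset stored under run/databases/cr
<#dataset> rdf:type tdb2:DatasetTDB2 ;
    tdb2:location "run/databases/cr" .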
For smaller datasets, data can now be added via the web UI. For bigger datasets, a copy of or symlink to the originally loaded TDB data in the databases directory is necessary.
Example symlinks:
zeus:databases wf$ ls -l
total 48
drwxr-xr-x 4 wf admin 136 Sep 14 07:43 cr
lrwxr-xr-x 1 wf admin 27 Sep 15 11:53 dblp -> /Volumes/Torterra/dblp/data
lrwxr-xr-x 1 wf admin 26 Sep 14 08:10 gnd -> /Volumes/Torterra/gnd/data
lrwxr-xr-x 1 wf admin 42 Sep 14 07:55 wikidata -> /Volumes/Torterra/wikidata2020-08-15/data/
By restarting the server without a --loc:
nohup java -jar fuseki-server.jar &
the configurations are automatically picked up.
The good news is that you do not have to bother with the details of the config files this way as long as you do not have any special needs.
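Once the server is running, a quick scripted check that every dataset actually answers queries can be handy. Here is a small sketch using Python and the requests library; the port 3030 and the dataset names are taken from the listing above and are otherwise assumptions:
# Check that each dataset's SPARQL endpoint is alive and answering queries.
import requests

for ds in ["cr", "dblp", "gnd", "wikidata"]:
    resp = requests.post(
        f"http://localhost:3030/{ds}/sparql",
        data={"query": "ASK { ?s ?p ?o }"},
        headers={"Accept": "application/sparql-results+json"},
    )
    status = resp.json().get("boolean") if resp.status_code == 200 else "unavailable"
    print(ds, resp.status_code, status)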

How to correctly set up keys with Hadley's secure package

I would like to use Hadley Wickham's secure package from GitHub.
The example usage isn't explicit about how to create keys and where to store them and I'm messing something up (possibly more than one thing).
I installed the package
# install.packages("devtools")
devtools::install_github("s-u/PKI") # needed for bug fixes not currently on CRAN
devtools::install_github("hadley/secure")
set up a vault folder:
dir.create("vault")
Then the next step is to add a user / key:
secure::add_user("hackr", local_key())
and of course if I literally run that last line as-is it says
Error: No key matches id_rsa
because I don't have a key. So I used PuTTYgen to create a public/private RSA key pair.
I saved them to my desktop and tried putting the full path in the command above:
secure::add_user("hackr", local_key("C:/Users/hackr/Desktop/r_public_key"))
But that didn't work:
Error: No key matches
Then I tried saving the public key in the vault and doing:
secure::add_user("hackr", local_key("r_public_key"))
but I got the same error. Next I tried putting the public key in the working directory (one directory higher than the vault) but got the same error.
Finally, I tried copying the keys to C:\Users\hackr\.ssh but that also led to the same error.
I suspect I need to save the key somewhere special (in Windows I'm not sure where that would be?) and/or I am using the wrong type of key, since PuTTYgen is for SSH (?).
It looks like local_key() is assuming your key is stored in ~/.ssh (which is a reasonable assumption). By default it assumes the file is named id_rsa.pub, so if you've renamed it you'll need to pass the name into local_key().
I haven't used this package, but always remember those wise words: "Hack-R, view the source".
The issue is that Hadley's local_key() function assumes your key is stored in ~/.ssh, which is where the commands below will place it by default, and that it is named id_rsa.pub. If you have a different setup, you can change the defaults, or you could simply follow the steps below.
Step 1
Go to https://help.github.com/articles/generating-an-ssh-key/
Read up. It's useful stuff to know.
It will tell you to do this in the console:
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
Set a passphrase. Remember it.
Then enter this:
ssh-add ~/.ssh/id_rsa
Enter your passphrase.
Step 2
Your secure::add_user("hackr", local_key()) should work now.

Lucene index gets broken segments after every restart of liferay-tomcat

I have a corrupted Lucene index. If I run "CheckIndex -fix" the problem is resolved, but as soon as I restart Tomcat it becomes corrupted again.
The index directory is shared between two application servers running Liferay on Tomcat. I am fixing the index on one server and restarting it whilst the other is running. This is a production environment, so I cannot bring them both down.
Any suggestions please?
Before fix, CheckIndex says:
Opening index # /usr/local/tomcat/liferay/lucene/0
Segments file=segments_5yk numSegments=1 version=FORMAT_SINGLE_NORM_FILE [Lucene 2.2]
1 of 1: name=_2vg docCount=31
compound=false
hasProx=true
numFiles=8
size (MB)=0.016
no deletions
test: open reader.........FAILED
WARNING: fixIndex() would remove reference to this segment; full exception:
java.io.IOException: read past EOF
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:78)
at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:335)
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:119)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:652)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:605)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:491)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
WARNING: 1 broken segments (containing 31 documents) detected
WARNING: would write new segments file, and 31 documents would be lost, if -fix were specified
If you access your search index from more than one application server, I would suggest integrating a Solr server, so you don't have the problem of two app servers trying to write to the same files. This is error-prone, as you have already found out.
To get Solr up and running you have to follow these steps:
Install a Solr server on any machine you like. A machine running only Solr would be preferable.
Install the Solr Search portlet in Liferay.
Adjust the config files according to the setup document of the Solr Search portlet.
Here are some additional links:
http://www.liferay.com/de/marketplace/-/mp/application/15193648
http://www.liferay.com/de/community/wiki/-/wiki/Main/Pluggable+Enterprise+Search+with+Solr

Rmpi, OpenCPU, and Apparmor: DENIED request for "/"

I have an R package that sends out a job to the OpenMPI cluster I have running by means of the Rmpi package. All works as expected within an R session run from the console. However, when I try to execute the relevant function from my OpenCPU server like this (details changed to protect the innocent):
curl -XPOST http://99.999.999.99/ocpu/library/MyPackage/R/my_cluster_function
I get this error:
R call failed: process died.
(Other functions within the package that don't call the cluster work as expected via OpenCPU.) I noticed in /var/log/kern.log a variety of requests being DENIED by AppArmor, and I have been able to resolve most of them by adding entries to /etc/apparmor.d/opencpu.d/custom to allow OpenMPI to access the files it needs. However, I cannot resolve these two issues (again, IP address changed) related to "open" requests for location "/":
Oct 26 03:49:58 99.999.999.99 kernel: [142952.551234] type=1400 audit(1414295398.849:957): apparmor="DENIED" operation="open" profile="opencpu-main" name="/" pid=22486 comm="orted" requested_mask="r" denied_mask="r" fsuid=33 ouid=0
Oct 26 03:49:58 99.999.999.99 kernel: [142952.556422] type=1400 audit(1414295398.857:958): apparmor="DENIED" operation="open" profile="opencpu-main" name="/" pid=22485 comm="apache2" requested_mask="r" denied_mask="r" fsuid=33 ouid=0
Adding this to my AppArmor rules did not help:
/* r,
Two questions:
Why is OpenCPU trying to read from my root-level directory (or does this mean something else)?
More urgently, how can I resolve this apparmor issue?
Thanks.
You might need to add both AppArmor rules:
/ r,
/* r,
The first rule allows directory listing of / and the second rule allows read access to any file under /.
I don't understand why Rmpi wants to read /, or why you were getting a "process died" error instead of "access denied". Are you sure the problem is completely resolved?

How to find which process generates the most read/write disk operations

My cloud server has begun generating a lot of disk read/write operations. I am looking for a script that generates a list of the top processes (process name | TOTAL | READ | WRITE).
You can use iotop to see the reads and writes of each process using a top-like interface.
Another way is to look at the /proc/[PID]/io files.
Example:
$ cat /proc/1944/io
read_bytes: 17961091072
write_bytes: 8192000
cancelled_write_bytes: 32768
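Building on /proc/[PID]/io, a short script can produce exactly the kind of per-process summary asked for (process name | TOTAL | READ | WRITE). A sketch follows; it must run as root to read other users' processes, and the counters are cumulative since each process started, not a rate:
# Rank processes by total disk I/O using /proc/[PID]/io (run as root).
import glob

rows = []
for io_path in glob.glob("/proc/[0-9]*/io"):
    pid_dir = io_path.rsplit("/", 1)[0]
    try:
        with open(pid_dir + "/comm") as f:
            name = f.read().strip()
        stats = {}
        with open(io_path) as f:
            for line in f:
                key, value = line.split(":")
                stats[key] = int(value)
        rows.append((name, stats.get("read_bytes", 0), stats.get("write_bytes", 0)))
    except (OSError, ValueError):
        continue  # process exited or entry not readable

rows.sort(key=lambda r: r[1] + r[2], reverse=True)
print(f"{'PROCESS':20} {'TOTAL':>15} {'READ':>15} {'WRITE':>15}")
for name, rd, wr in rows[:20]:
    print(f"{name:20} {rd + wr:>15} {rd:>15} {wr:>15}")
For current activity rather than lifetime totals, take two snapshots a few seconds apart and diff the numbers, or use iotop as suggested below.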
There's a monitor much like top available: Iotop.
Since you're using Debian Linux, you can simply retrieve it via APT:
apt-get install iotop
Done.
