Plain text encryption that maintains sort order - encryption

Is there a generally available encryption scheme that maintains the sort order of plain text? Use case: Do not want to store the names in DB; the names are stored in encrypted form; however display time we need pagination based on names.
Some research papers are available, but that would ential translating those research papers into code.
Alternative approaches in consideration: generate a psuedo key, which maintains the sort order. Have not entirely thought through it. Before spending time on figuring out this scheme, wanted to know if I can perhaps use the experience of folks in here. Thanks in advance.

Related

Hard coded encryption without need for a key

sorry if there is an answer to this somewhere, but I could not find a clear explanation.
If I need to encrypt a sensitive field before storing it in my db, can the following approach possibly work?
I write a hard-coded encryption-cipher algorithm (in my php file) without any random behavior, like shuffling a deck of cards in a certain sequence, but only I know the exact sequence. I then also write the exact opposite algorithm to de-crypt the ciphered field, thus (hopefully) eliminating the need for a decryption key.
My understanding is that only the resulting string will be exposed to http(s) traffic and even if the result is posted back it is over https.
Can this approach work?

Using LMDB to implement a sqlite-alike relational database, relevant resources?

For educational reasons I wish to build a functional, full, relational database. I'm aware LMDB was used to be the storage backend of sqlite, but I don't know C. I'm on .NET and I'm not interested in just duplicate a "traditional" RDBMS (so, for example, I not worry about implement a sql parser but my own custom scripting language that I'm building), but expose the full relational model.
Consider this question similar to "How I implement a programming language on top of LLVM" before worry about why I'm not using sqlite or similar.
From the material I read, LMDB look great, specially because It provide transactions and reliability, plus the low-level plumbing. How that translate to changes that could touch several rows at several tables is another question..
Exist material that explain how is implemented a relational layer on top of something like LMDB? Is using LMDB (or their competitors) optimal enough or exist another better way to get results?
Is possible to use LMDB to store other structures like hashtables, arrays and (the one I'm more interested for a columnar database) bitmap arrays?, ie, similar to redis?
P.D: Exist a forum or another place to talk more about this subject?
I had this idea too. You should realize that this is tons of work and most likely no one will care. I haven't built full-blown relational db as this is crazy to do for one person. You could check it out here
Anyway I've used leveldb (and later rocksdb) and so you have keys-values sorted by key, ability to get value by key, iterate keys, have atomic writes of many values (WriteBatch) and consistent view of data at given time - snapshots. These features are enough to build correct thread-safe reading of table rows (using snapshots), correct writing of data and related indexes - all or nothing (using writebatch) and even transactions.
Each column has it's on disk index - keys sorted by values - so you could efficiently do various operations on it and keys with values themselves so you could efficiently read values with given id.
This setup is efficient for writing and reading using available operations on tables with little data (say less than a million rows). However, if table grows iterating over many keys can become not so fast. To solve this and to add a group-by statement I've decided to add memory indexes, but that's another story. So all-in-all it might be fun idea but in reality a lot of work and often frustrating results - why would you want to do that?

Scalable solution for extracting semantic data from websites?

Let's say I have a (rather large) number of websites on my disk, scraped or fetched from e.g. Common Crawl. I have no prior knowledge about the HTML structure, assume that each page is structured differently (which is usually the case). From each of them I want to extract some kinds of semantic information (known in advance) like articles/posts with metadata (date, author, tags, comments, etc.).
One straightforward way to go would be to write a simple parser for each of the websites, given good quality parsing libraries out there it should be easy enough. But this approach obviously does not scale. Is there a more clever solution to this problem? How would I proceed and what is actually the difficulty of this task?
You can include paid services, if anything like this exists. If you're aware of any better way of getting this kind of data (on a specific topic; instead of manual scraping / Common Crawl) please also include it.

Do hashes resemble a format language when decrypting

I am fairly new to cryptography, but I have come across this :
ea706916-4d0a-460d-9778-4d1a7195b229
which looks like a familiar format. It's original value is tjotol.
Would anyone know what format the above code is in? I know that if it has hashes it can be a giveaway. Base64? HTML? Something else?
It does not look like Base64, it may be MD5 with dashes in-between. However, remember that a hash is a one-way function (ie. it's not reversible), while a cryptographic function is two-way (you can encrypt and decrypt it). Hence, it's not correct to speak about "hash decrypting". I don't know what you mean by "format language", would you care to elaborate on that?
A quick google search took me to this article that seems to be well written an covering many issues regarding your concern related to hashes being a "giveaway".
Note: Base64 is hardly an encryption algorithm, it is indeed just an encoding/representation format.
This have the format of a Globally unique identifier (GUID). Take a look here: Globally unique identifier

Is there a way to use arbitrary type of value as key in environment or named list in R?

I've been looking for a proper implementation of hash map in R, with functionalities similar to the map type in Python.
After some googling and searching the R documentations, I found that environment and named list are the ONLY options I can use (is that really so?).
But the problem with the two is that they can only take charaters as key for the hashing, not even a number, let alone other type of things.
So is there a way to use arbitrary things as key? or at least more than just characters.
Or is there a better implemtation of hash map that I didn't find with better functionalities ?
Thanks in advance.
Edit:
My current problem: I need a map to store the distance relationship between data points. That is, the key of the map is a tuple (p1, p2) and the value is a number.
The reason I asked a generic question instead of a concrete one is that I'm learning R recently and I want to know how to manipulate some of the most fundamental data structures, not only what my problem refers to. So I may need to use other things as key in the future, and I want to avoid asking similar questions with only minor difference every time I run into them.
Edit 2:
I got a lot of very good advices on this topic. It seems I'm still thinking quite in the Pythonic way, rather than the should-be R way. I should really get more R-ly ! I think my purpose can easily be satisfied by a matrix in R. Thanks All !
The reason people keep asking you for a specific example is that most problems for which hash tables are the appropriate technique in Python have a good solution in R that does not involve hash tables.
That said, there are certainly times when a real hash table is useful in R, and I recommend you check out the hash package for R. It uses environments as its base but lets you do a lot of R-like vector work with them. It's efficient and I've never run into a problem with it.
Just keep in mind that if you're using hash tables a lot while working with R and your code is running slowly or is buggy, you may be able to get some mileage from figuring out a more R-like way of doing it :)

Resources