Creating RocksDB SST file in Java for bulk loading - rocksdb

I am new to RocksDB and am trying to create an SST file in Java for bulk loading. The eventual use case is to create the file in Apache Spark.
I am using rocksdbjni 6.3.6 on Ubuntu 18.04.03.
I keep getting this error:
org.rocksdb.RocksDBException: Keys must be added in order
at org.rocksdb.SstFileWriter.put(Native Method)
at org.rocksdb.SstFileWriter.put(SstFileWriter.java:104)
at CreateSSTFile.main(CreateSSTFile.java:34)
The sample code is
public static void main(String[] args) throws RocksDBException {
RocksDB.loadLibrary();
final Random random = new Random();
final EnvOptions envOptions = new EnvOptions();
final StringAppendOperator stringAppendOperator = new StringAppendOperator();
Options options1 = new Options();
SstFileWriter fw = null;
ComparatorOptions comparatorOptions = new ComparatorOptions();
try {
options1 = options1
.setCreateIfMissing(true)
.setEnv(Env.getDefault())
.setComparator(new BytewiseComparator(comparatorOptions));
fw = new SstFileWriter(envOptions, options1);
fw.open("/tmp/db/sst_upload_01");
for (int index = 0; index < 1000; index++) {
Slice keySlice = new Slice(("Key" + "_" + index).getBytes());
Slice valueSlice = new Slice(("Value_" + index + "_" + random.nextLong()).getBytes());
fw.put(keySlice, valueSlice);
}
fw.finish();
} catch (RocksDBException ex) {
ex.printStackTrace();
} finally {
stringAppendOperator.close();
envOptions.close();
options1.close();
if (fw != null) {
fw.close();
}
}
}
If the loop bound is less than 10, the file is created successfully and I was able to ingest it into RocksDB.
Thanks in advance.

I think I found the problem with the code.
The keys must be added to the SST file in order. Generating the keys in a loop and relying on bytewise (lexicographic) comparison produces the wrong order: "Key_10" sorts before "Key_9", so the put fails as soon as the index reaches 10. If I sort all the keys before inserting them into the SST file, it works.
Map<String, String> data = new HashMap<>();
for (int index = 0; index < 1000; index++) {
data.put("Key-" + random.nextLong(), "Value-" + random.nextDouble());
}
List<String> keys = new ArrayList<String>(data.keySet());
Collections.sort(keys);
for (String key : keys) {
Slice keySlice = new Slice(key);
Slice valueSlice = new Slice(data.get(key));
fw.put(keySlice, valueSlice);
}
When I tried with integer keys I found the issue.
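The ordering problem can be checked without RocksDB at all. The following is a minimal sketch, assuming ASCII keys and an unsigned bytewise comparison like the one a bytewise comparator performs; the class and helper names (`PaddedKeys`, `paddedKey`, `keys`) are hypothetical and not part of rocksdbjni. It shows that zero-padding the index to a fixed width keeps loop order and byte order aligned, so no separate sort is needed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class PaddedKeys {
    // Hypothetical helper: fixed-width, zero-padded key such as "Key_0042".
    static String paddedKey(int index) {
        return String.format("Key_%04d", index);
    }

    // Builds keys for 0..count-1, either padded or in the "Key_" + index form.
    static List<String> keys(int count, boolean padded) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            result.add(padded ? paddedKey(i) : "Key_" + i);
        }
        return result;
    }

    // Unsigned byte-by-byte comparison; a shorter prefix sorts first.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // True when every key is strictly greater than its predecessor.
    static boolean isStrictlyIncreasing(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            byte[] prev = keys.get(i - 1).getBytes(StandardCharsets.UTF_8);
            byte[] curr = keys.get(i).getBytes(StandardCharsets.UTF_8);
            if (compareBytes(prev, curr) >= 0) {
                return false;
            }
        }
        return true;
    }
}
```

With this check, `keys(10, false)` is in order, `keys(1000, false)` is not, and `keys(1000, true)` is. The width of 4 is an assumption that only covers indexes below 10000.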

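The keys in an SST file must be added in increasing order.
Starting from index=10 only works while every key has the same number of digits (10 to 99); "Key_100" again sorts before "Key_99". Zero-padding the index to a fixed width keeps lexicographic and numeric order aligned.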

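In fact, you can create a Comparator that extends DirectComparator to avoid sorting. Note that a comparator that always returns 1 does not define a consistent ordering, so it only silences the order check; lookups against a file written this way may not behave correctly.
class MyComparator extends DirectComparator {
    public MyComparator(ComparatorOptions copt) {
        super(copt);
    }
    @Override
    public String name() {
        return "MyComparator";
    }
    @Override
    public int compare(DirectSlice a, DirectSlice b) {
        // always reports a > b, so any insertion order is accepted
        return 1;
    }
}
Then set the option:
options1.setComparator(new MyComparator(comparatorOptions));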

Related

Are Guids unique when using a U-SQL Extractor?

As these questions point out, Guid.NewGuid will return the same value for all rows because U-SQL enforces determinism: if the job is scaled out and an element (vertex) needs retrying, it should return the same value.
Guid.NewGuid() always return same Guid for all rows
auto_increment in U-SQL
However, the code example in the official documentation for a User Defined Extractor purposefully uses Guid.NewGuid().
I'm not questioning the validity of the answers to the questions above, as they come from an authoritative source (the programme manager for U-SQL, so very authoritative!). What I'm wondering is whether using an Extractor means NewGuid can be used as normal. Is NewGuid unsafe only within C# expressions in U-SQL and in User Defined Functions?
[SqlUserDefinedExtractor(AtomicFileProcessing = true)]
public class FullDescriptionExtractor : IExtractor
{
private Encoding _encoding;
private byte[] _row_delim;
private char _col_delim;
public FullDescriptionExtractor(Encoding encoding, string row_delim = "\r\n", char col_delim = '\t')
{
this._encoding = ((encoding == null) ? Encoding.UTF8 : encoding);
this._row_delim = this._encoding.GetBytes(row_delim);
this._col_delim = col_delim;
}
public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
{
string line;
//Read the input line by line
foreach (Stream current in input.Split(_encoding.GetBytes("\r\n")))
{
using (System.IO.StreamReader streamReader = new StreamReader(current, this._encoding))
{
line = streamReader.ReadToEnd().Trim();
//Split the input by the column delimiter
string[] parts = line.Split(this._col_delim);
int count = 0; // start with first column
foreach (string part in parts)
{
if (count == 0)
{ // for column “guid”, re-generated guid
Guid new_guid = Guid.NewGuid();
output.Set<Guid>(count, new_guid);
}
else if (count == 2)
{
// for column “user”, convert to UPPER case
output.Set<string>(count, part.ToUpper());
}
else
{
// keep the rest of the columns as-is
output.Set<string>(count, part);
}
count += 1;
}
}
yield return output.AsReadOnly();
}
yield break;
}
}
https://learn.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-programmability-guide#use-user-defined-extractors

ConcurrentModificationException when reinserting a node JavaFX [duplicate]

We all know you can't do the following because of ConcurrentModificationException:
for (Object i : l) {
if (condition(i)) {
l.remove(i);
}
}
But this apparently works sometimes, but not always. Here's some specific code:
public static void main(String[] args) {
Collection<Integer> l = new ArrayList<>();
for (int i = 0; i < 10; ++i) {
l.add(4);
l.add(5);
l.add(6);
}
for (int i : l) {
if (i == 5) {
l.remove(i);
}
}
System.out.println(l);
}
This, of course, results in:
Exception in thread "main" java.util.ConcurrentModificationException
Even though multiple threads aren't doing it. Anyway.
What's the best solution to this problem? How can I remove an item from the collection in a loop without throwing this exception?
I'm also using an arbitrary Collection here, not necessarily an ArrayList, so you can't rely on get.
Iterator.remove() is safe, you can use it like this:
List<String> list = new ArrayList<>();
// This is a clever way to create the iterator and call iterator.hasNext() like
// you would do in a while-loop. It would be the same as doing:
// Iterator<String> iterator = list.iterator();
// while (iterator.hasNext()) {
for (Iterator<String> iterator = list.iterator(); iterator.hasNext();) {
String string = iterator.next();
if (string.isEmpty()) {
// Remove the current element from the iterator and the list.
iterator.remove();
}
}
Note that Iterator.remove() is the only safe way to modify a collection during iteration; the behavior is unspecified if the underlying collection is modified in any other way while the iteration is in progress.
Source: docs.oracle > The Collection Interface
And similarly, if you have a ListIterator and want to add items, you can use ListIterator#add, for the same reason you can use Iterator#remove — it's designed to allow it.
In your case you tried to remove from a list, but the same restriction applies if trying to put into a Map while iterating its content.
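To illustrate the Map case, here is a minimal sketch, assuming a plain `HashMap` with three entries; the class and method names are made up for the example. The first method adds a new key during iteration and reports whether the iterator noticed; the second stages the new entries in a separate map and applies them after the loop.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class MapPutDuringIteration {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Adds a new key while iterating; returns true if the iterator detected it.
    static boolean putWhileIteratingThrows() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                // Structural change made outside the iterator.
                map.put(key + "-copy", 0);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Safe variant: collect the new entries first, then add them after the loop.
    static Map<String, Integer> putAfterIterating() {
        Map<String, Integer> map = sample();
        Map<String, Integer> pending = new HashMap<>();
        for (String key : map.keySet()) {
            pending.put(key + "-copy", 0);
        }
        map.putAll(pending);
        return map;
    }
}
```

Overwriting the value of an existing key (for example through `Map.Entry.setValue`) is not a structural modification, so it does not trigger the exception; adding or removing keys does.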
This works:
Iterator<Integer> iter = l.iterator();
while (iter.hasNext()) {
if (iter.next() == 5) {
iter.remove();
}
}
I assumed that since a foreach loop is syntactic sugar for iterating, using an iterator wouldn't help... but it gives you this .remove() functionality.
With Java 8 you can use the new removeIf method. Applied to your example:
Collection<Integer> coll = new ArrayList<>();
//populate
coll.removeIf(i -> i == 5);
Since the question has been already answered i.e. the best way is to use the remove method of the iterator object, I would go into the specifics of the place where the error "java.util.ConcurrentModificationException" is thrown.
Every collection class has a private class which implements the Iterator interface and provides methods like next(), remove() and hasNext().
The code for next looks something like this...
public E next() {
checkForComodification();
try {
E next = get(cursor);
lastRet = cursor++;
return next;
} catch(IndexOutOfBoundsException e) {
checkForComodification();
throw new NoSuchElementException();
}
}
Here the method checkForComodification is implemented as
final void checkForComodification() {
if (modCount != expectedModCount)
throw new ConcurrentModificationException();
}
So, as you can see, if you explicitly remove an element from the collection, modCount becomes different from expectedModCount, which results in a ConcurrentModificationException.
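Because the check lives in next(), the exception depends on whether next() is called again after the removal. The sketch below, assuming a `java.util.ArrayList` (whose iterator's hasNext() compares its cursor with the current size), shows why the for-each version "works sometimes": removing the second-to-last element makes hasNext() return false, so the loop ends quietly and the last element is never visited. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

class FailFastDemo {
    // Removes `target` inside a for-each loop over a copy of `source`
    // and reports what happened.
    static String removeInForEach(List<Integer> source, Integer target) {
        List<Integer> list = new ArrayList<>(source);
        try {
            for (Integer value : list) {
                if (value.equals(target)) {
                    // remove(Object) on the list itself: bumps modCount
                    // without updating the iterator's expectedModCount.
                    list.remove(value);
                }
            }
            return "no exception, list=" + list;
        } catch (ConcurrentModificationException e) {
            return "ConcurrentModificationException";
        }
    }
}
```

For the list [1, 2, 3], removing 2 finishes without an exception and leaves [1, 3], while removing 1 or 3 throws. This is an implementation detail of the iterator, not behavior to rely on.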
You can either use the iterator directly like you mentioned, or else keep a second collection and add each item you want to remove to the new collection, then removeAll at the end. This allows you to keep using the type-safety of the for-each loop at the cost of increased memory use and cpu time (shouldn't be a huge problem unless you have really, really big lists or a really old computer)
public static void main(String[] args)
{
Collection<Integer> l = new ArrayList<Integer>();
Collection<Integer> itemsToRemove = new ArrayList<>();
for (int i=0; i < 10; i++) {
l.add(Integer.valueOf(4));
l.add(Integer.valueOf(5));
l.add(Integer.valueOf(6));
}
for (Integer i : l)
{
if (i.intValue() == 5) {
itemsToRemove.add(i);
}
}
l.removeAll(itemsToRemove);
System.out.println(l);
}
In such cases a common trick is (was?) to go backwards:
for(int i = l.size() - 1; i >= 0; i --) {
if (l.get(i) == 5) {
l.remove(i);
}
}
That said, I'm more than happy that you have better ways in Java 8, e.g. removeIf or filter on streams.
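For the stream variant mentioned here, a minimal sketch could look like the following. Unlike removeIf, it leaves the source collection untouched and builds a new list; the class and method names are only for illustration.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

class StreamFilterDemo {
    // Returns a new list holding every element of `source` except the fives.
    static List<Integer> withoutFives(Collection<Integer> source) {
        return source.stream()
                .filter(i -> i != 5) // keep everything that is not 5
                .collect(Collectors.toList());
    }
}
```

Since no element is removed from the collection being traversed, there is nothing for a fail-fast iterator to detect.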
Same answer as Claudius with a for loop:
for (Iterator<Object> it = objects.iterator(); it.hasNext();) {
Object object = it.next();
if (test) {
it.remove();
}
}
With Eclipse Collections, the method removeIf defined on MutableCollection will work:
MutableList<Integer> list = Lists.mutable.of(1, 2, 3, 4, 5);
list.removeIf(Predicates.lessThan(3));
Assert.assertEquals(Lists.mutable.of(3, 4, 5), list);
With Java 8 Lambda syntax this can be written as follows:
MutableList<Integer> list = Lists.mutable.of(1, 2, 3, 4, 5);
list.removeIf(Predicates.cast(integer -> integer < 3));
Assert.assertEquals(Lists.mutable.of(3, 4, 5), list);
The call to Predicates.cast() is necessary here because a default removeIf method was added on the java.util.Collection interface in Java 8.
Note: I am a committer for Eclipse Collections.
Make a copy of existing list and iterate over new copy.
for (String str : new ArrayList<String>(listOfStr))
{
listOfStr.remove(/* object reference or index */);
}
People are asserting one can't remove from a Collection being iterated by a foreach loop. I just wanted to point out that is technically incorrect and describe exactly (I know the OP's question is so advanced as to obviate knowing this) the code behind that assumption:
for (TouchableObj obj : untouchedSet) { // <--- This is where ConcurrentModificationException strikes
if (obj.isTouched()) {
untouchedSet.remove(obj);
touchedSt.add(obj);
break; // this is key to avoiding returning to the foreach
}
}
It isn't that you can't remove from the iterated Collection, but rather that you can't continue iterating once you do. Hence the break in the code above.
Apologies if this answer is a somewhat specialist use-case and more suited to the original thread I arrived here from, that one is marked as a duplicate (despite this thread appearing more nuanced) of this and locked.
With a traditional for loop
ArrayList<String> myArray = new ArrayList<>();
for (int i = 0; i < myArray.size(); ) {
String text = myArray.get(i);
if (someCondition(text))
myArray.remove(i);
else
i++;
}
ConcurrentHashMap or ConcurrentLinkedQueue or ConcurrentSkipListMap may be another option, because they will never throw any ConcurrentModificationException, even if you remove or add item.
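As a small sketch of that option, assuming a `ConcurrentHashMap` with a handful of made-up entries: its iterators are weakly consistent, so removing entries through the map itself while looping over it does not throw.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WeaklyConsistentDemo {
    // Removes every entry whose value is 5 while iterating over the same map.
    static Map<String, Integer> removeFives() {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 4);
        map.put("b", 5);
        map.put("c", 6);
        map.put("d", 5);
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            if (entry.getValue() == 5) {
                // Direct removal on the map; the iterator tolerates it.
                map.remove(entry.getKey());
            }
        }
        return map;
    }
}
```

The absence of the exception does not make a sequence of operations atomic; if several threads read and then modify the same keys, that still needs its own coordination.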
Another way is to use a copy of your arrayList just for iteration:
List<Object> l = ...
List<Object> iterationList = ImmutableList.copyOf(l);
for (Object curr : iterationList) {
if (condition(curr)) {
l.remove(curr);
}
}
A ListIterator allows you to add or remove items in the list. Suppose you have a list of Car objects:
List<Car> cars = new ArrayList<>();
// add cars here...
for (ListIterator<Car> carIterator = cars.listIterator(); carIterator.hasNext(); )
{
    // next() must be called before remove() or add() can act on an element
    Car car = carIterator.next();
    if (<some-condition>)
    {
        carIterator.remove();
    }
    else if (<some-other-condition>)
    {
        carIterator.add(aNewCar);
    }
}
Now, You can remove with the following code
l.removeIf(current -> current == 5);
I know this question is too old to be about Java 8, but for those using Java 8 you can easily use removeIf():
Collection<Integer> l = new ArrayList<Integer>();
for (int i=0; i < 10; ++i) {
l.add(new Integer(4));
l.add(new Integer(5));
l.add(new Integer(6));
}
l.removeIf(i -> i.intValue() == 5);
Java Concurrent Modification Exception
Single thread
Iterator<String> iterator = list.iterator();
while (iterator.hasNext()) {
String value = iterator.next();
if (value.equals("A")) {
list.remove(value); // modifies the list behind the iterator; a later iterator.next() typically throws ConcurrentModificationException
}
}
Solution: iterator remove() method
Iterator<String> iterator = list.iterator();
while (iterator.hasNext()) {
String value = iterator.next();
if (value.equals("A")) {
iterator.remove();
}
}
Multi thread
copy/convert and iterate over another collection (for small collections)
synchronize
thread-safe collection
I have a suggestion for the problem above that needs no secondary list. The example below does the same thing in a different way; note that it restarts the scan from index 0 after every removal.
//"list" is ArrayList<Object>
//"state" is some boolean variable, which when set to true, Object will be removed from the list
int index = 0;
while(index < list.size()) {
Object r = list.get(index);
if( state ) {
list.remove(index);
index = 0;
continue;
}
index += 1;
}
This would avoid the Concurrency Exception.
for (Integer i : l)
{
if (i.intValue() == 5){
itemsToRemove.add(i);
break;
}
}
The catch is that after removing the element from the list, if you skip the internal iterator.next() call, it still works! Though I don't propose writing code like this, it helps to understand the concept behind it :-)
Cheers!
Example of thread safe collection modification:
public class Example {
private final List<String> queue = Collections.synchronizedList(new ArrayList<String>());
public void removeFromQueue() {
synchronized (queue) {
Iterator<String> iterator = queue.iterator();
String string = iterator.next();
if (string.isEmpty()) {
iterator.remove();
}
}
}
}
I know this question assumes just a Collection, and not more specifically any List. But for those reading this question who are indeed working with a List reference, you can avoid ConcurrentModificationException with a while-loop (while modifying within it) instead if you want to avoid Iterator (either if you want to avoid it in general, or avoid it specifically to achieve a looping order different from start-to-end stopping at each element [which I believe is the only order Iterator itself can do]):
*Update: See comments below that clarify the analogous is also achievable with the traditional-for-loop.
final List<Integer> list = new ArrayList<>();
for(int i = 0; i < 10; ++i){
list.add(i);
}
int i = 1;
while(i < list.size()){
if(list.get(i) % 2 == 0){
list.remove(i++);
} else {
i += 2;
}
}
No ConcurrentModificationException from that code.
There we see looping not start at the beginning, and not stop at every element (which I believe Iterator itself can't do).
FWIW we also see get being called on list, which could not be done if its reference was just Collection (instead of the more specific List-type of Collection) - List interface includes get, but Collection interface does not. If not for that difference, then the list reference could instead be a Collection [and therefore technically this Answer would then be a direct Answer, instead of a tangential Answer].
FWIW the same code still works after being modified to start at the beginning and stop at every element (just like Iterator order):
final List<Integer> list = new ArrayList<>();
for(int i = 0; i < 10; ++i){
list.add(i);
}
int i = 0;
while(i < list.size()){
if(list.get(i) % 2 == 0){
list.remove(i);
} else {
++i;
}
}
One solution could be to rotate the list and remove the first element to avoid the ConcurrentModificationException or IndexOutOfBoundsException
int n = list.size();
for(int j=0;j<n;j++){
//you can also put a condition before remove
list.remove(0);
Collections.rotate(list, 1);
}
Collections.rotate(list, -1);
Try this one (removes all elements in the list that equal i):
for (Object i : l) {
if (condition(i)) {
l = (l.stream().filter((a) -> a != i)).collect(Collectors.toList());
}
}
You can use a while loop.
Iterator<Map.Entry<String, String>> iterator = map.entrySet().iterator();
while(iterator.hasNext()){
Map.Entry<String, String> entry = iterator.next();
if(entry.getKey().equals("test")) {
iterator.remove();
}
}
I ended up with this ConcurrentModificationException while iterating the list using the stream().map() method. However, the for(:) loop did not throw the exception while iterating and modifying the list.
Here is the code snippet, in case it helps anyone:
Here I'm iterating over an ArrayList<BuildEntity> and modifying it using list.remove(obj):
for(BuildEntity build : uniqueBuildEntities){
if(build!=null){
if(isBuildCrashedWithErrors(build)){
log.info("The following build crashed with errors , will not be persisted -> \n{}"
,build.getBuildUrl());
uniqueBuildEntities.remove(build);
if (uniqueBuildEntities.isEmpty()) return EMPTY_LIST;
}
}
}
if(uniqueBuildEntities.size()>0) {
dbEntries.addAll(uniqueBuildEntities);
}
If using HashMap, in newer versions of Java (8+) you can choose any of these 3 options:
public class UserProfileEntity {
private String Code;
private String mobileNumber;
private LocalDateTime inputDT;
// getters and setters here
}
HashMap<String, UserProfileEntity> upMap = new HashMap<>();
// remove by value
upMap.values().removeIf(value -> !value.getCode().contains("0005"));
// remove by key
upMap.keySet().removeIf(key -> key.contentEquals("testUser"));
// remove by entry / key + value
upMap.entrySet().removeIf(entry -> (entry.getKey().endsWith("admin") || entry.getValue().getInputDT().isBefore(LocalDateTime.now().minusMinutes(3))));
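One option is the java.util.concurrent package. CopyOnWriteArrayList iterates over a snapshot of the list, so removing elements during a for-each loop does not throw this exception; each write copies the underlying array, so it suits small or rarely modified collections.
Modified code: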
public static void main(String[] args) {
Collection<Integer> l = new CopyOnWriteArrayList<Integer>();
for (int i=0; i < 10; ++i) {
l.add(new Integer(4));
l.add(new Integer(5));
l.add(new Integer(6));
}
for (Integer i : l) {
if (i.intValue() == 5) {
l.remove(i);
}
}
System.out.println(l);
}
Iterators are not always helpful when another thread also modifies the collection. I had tried many ways but then realized traversing the collection manually is much safer (backward for removal):
for (i in myList.size-1 downTo 0) {
myList.getOrNull(i)?.also {
if (it == 5)
myList.remove(it)
}
}
For ArrayList.remove(int index): if the index is the last element's position, no System.arraycopy() call is needed and the removal takes constant time.
The arraycopy cost grows as the index decreases, because more elements have to be shifted; it also shrinks as the list gets shorter.
The most efficient way to remove elements is therefore in descending index order:
while(list.size()>0)list.remove(list.size()-1);//O(1) per removal, O(n) in total
while(list.size()>0)list.remove(0);//O(n) per removal, O(n^2) in total
//region prepare data
ArrayList<Integer> ints = new ArrayList<Integer>();
ArrayList<Integer> toRemove = new ArrayList<Integer>();
Random rdm = new Random();
long millis;
for (int i = 0; i < 100000; i++) {
Integer integer = rdm.nextInt();
ints.add(integer);
}
ArrayList<Integer> intsForIndex = new ArrayList<Integer>(ints);
ArrayList<Integer> intsDescIndex = new ArrayList<Integer>(ints);
ArrayList<Integer> intsIterator = new ArrayList<Integer>(ints);
//endregion
// region for index
millis = System.currentTimeMillis();
for (int i = 0; i < intsForIndex.size(); i++)
if (intsForIndex.get(i) % 2 == 0) intsForIndex.remove(i--);
System.out.println(System.currentTimeMillis() - millis);
// endregion
// region for index desc
millis = System.currentTimeMillis();
for (int i = intsDescIndex.size() - 1; i >= 0; i--)
if (intsDescIndex.get(i) % 2 == 0) intsDescIndex.remove(i);
System.out.println(System.currentTimeMillis() - millis);
//endregion
// region iterator
millis = System.currentTimeMillis();
for (Iterator<Integer> iterator = intsIterator.iterator(); iterator.hasNext(); )
if (iterator.next() % 2 == 0) iterator.remove();
System.out.println(System.currentTimeMillis() - millis);
//endregion
for index loop: 1090 msec
for desc index: 519 msec---the best
for iterator: 1043 msec
you can also use Recursion
Recursion in java is a process in which a method calls itself continuously. A method in java that calls itself is called recursive method.

Lucene number of occurrences

I am using Lucene.net in my Web App.
Everything works fine, but now I have to show the number of occurrences of my 'searchstring' in every single document of the hits array.
How can I do this? I use the usual BooleanQuery.
This is my search:
BooleanQuery bq = new BooleanQuery();
bq.Add(QueryParser.Parse(Lquery, "", CurIndexDescritor.GetLangAnalizer()), false,false);
BooleanQuery.SetMaxClauseCount(int.MaxValue);
IndexSearcher searcher = new IndexSearcher(indexPath);
Hits hits = (filter != null) ? searcher.Search(bq, filter) : searcher.Search(bq);
for (int i = 0; i < hits.Length(); i++)
{
Document doc = hits.Doc(i);
SearchResultItem MyDb = new SearchResultItem();
MyDb.key = doc.Get(KeyField);
MyDb.score = hits.Score(i);
result.Add(MyDb);
}
Where can i get the number of occurrences?
Thanks!
If you don't want the score back and don't want to order the results by score, you could probably build a custom Similarity implementation.
I quickly tested the following code, and it appears to work fine with TermQueries and PhraseQueries; I didn't test more query types, though. A PhraseQuery hit counts as a single occurrence.
public class OccurenceSimilarity : DefaultSimilarity
{
public override float Tf(float freq)
{
return freq;
}
public override float Idf(int docFreq, int numDocs)
{
return 1;
}
public override float Coord(int overlap, int maxOverlap)
{
return 1;
}
public override float QueryNorm(float sumOfSquaredWeights)
{
return 1;
}
public override Explanation.IDFExplanation idfExplain(System.Collections.ICollection terms, Searcher searcher)
{
return CACHED_IDF_EXPLAIN;
}
public override Explanation.IDFExplanation IdfExplain(Term term, Searcher searcher)
{
return CACHED_IDF_EXPLAIN;
}
public override float SloppyFreq(int distance)
{
return 1;
}
private static Explanation.IDFExplanation CACHED_IDF_EXPLAIN = new ExplainIt();
private class ExplainIt : Explanation.IDFExplanation
{
public override string Explain()
{
return "1";
}
public override float GetIdf()
{
return 1.0f;
}
}
}
To use it:
Similarity.SetDefault(new OccurenceSimilarity());

Raven DB DocumentStore - throws out of memory exception

I have code like this:
public bool Set(IEnumerable<WhiteForest.Common.Entities.Projections.RequestProjection> requests)
{
var documentSession = _documentStore.OpenSession();
//{
try
{
foreach (var request in requests)
{
documentSession.Store(request);
}
//requests.AsParallel().ForAll(x => documentSession.Store(x));
documentSession.SaveChanges();
documentSession.Dispose();
return true;
}
catch (Exception e)
{
_log.LogDebug("Exception in RavenRequstRepository - Set. Exception is [{0}]", e.ToString());
return false;
}
//}
}
This code gets called many times. After around 50,000 documents have passed through it, I get an OutOfMemoryException.
Any idea why? Perhaps after a while I need to declare a new DocumentStore?
Thank you.
UPDATE:
I ended up using the Batch/Patch API to perform the update I needed.
You can see the discussion here: https://groups.google.com/d/topic/ravendb/3wRT9c8Y-YE/discussion
Basically, since I only needed to update 1 property on my objects, and after considering Ayende's comments about re-serializing all the objects back to JSON, I did something like this:
internal void Patch()
{
List<string> docIds = new List<string>() { "596548a7-61ef-4465-95bc-b651079f4888", "cbbca8d5-be45-4e0d-91cf-f4129e13e65e" };
using (var session = _documentStore.OpenSession())
{
session.Advanced.DatabaseCommands.Batch(GenerateCommands(docIds));
}
}
private List<ICommandData> GenerateCommands(List<string> docIds )
{
List<ICommandData> retList = new List<ICommandData>();
foreach (var item in docIds)
{
retList.Add(new PatchCommandData()
{
Key = item,
Patches = new[] { new Raven.Abstractions.Data.PatchRequest () {
Name = "Processed",
Type = Raven.Abstractions.Data.PatchCommandType.Set,
Value = new RavenJValue(true)
}}});
}
return retList;
}
Hope this helps ...
Thanks a lot.
I just did this for my current project. I chunked the data into pieces and saved each chunk in a new session. This may work for you, too.
Note, this example shows chunking by 1024 documents at a time, but needing at least 2000 before we decide it's worth chunking. So far, my inserts got the best performance with a chunk size of 4096. I think that's because my documents are relatively small.
internal static void WriteObjectList<T>(List<T> objectList)
{
int numberOfObjectsThatWarrantChunking = 2000; // Don't bother chunking unless we have at least this many objects.
if (objectList.Count < numberOfObjectsThatWarrantChunking)
{
// Just write them all at once.
using (IDocumentSession ravenSession = GetRavenSession())
{
objectList.ForEach(x => ravenSession.Store(x));
ravenSession.SaveChanges();
}
return;
}
int numberOfDocumentsPerSession = 1024; // Chunk size
List<List<T>> objectListInChunks = new List<List<T>>();
for (int i = 0; i < objectList.Count; i += numberOfDocumentsPerSession)
{
objectListInChunks.Add(objectList.Skip(i).Take(numberOfDocumentsPerSession).ToList());
}
Parallel.ForEach(objectListInChunks, listOfObjects =>
{
using (IDocumentSession ravenSession = GetRavenSession())
{
listOfObjects.ForEach(x => ravenSession.Store(x));
ravenSession.SaveChanges();
}
});
}
private static IDocumentSession GetRavenSession()
{
return _ravenDatabase.OpenSession();
}
Are you trying to save it all in one call?
The DocumentSession needs to turn all of the objects that you pass it into a single request to the server. That means that it may allocate a lot of memory for the write to the server.
Usually we recommend batches of about 1,024 items if you are doing bulk saves.
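The batching idea itself is independent of the client library. Here is a minimal, language-neutral sketch written in Java rather than C#, with hypothetical names (`Batches`, `partition`, `range`); it only splits a list into chunks of a given size and makes no RavenDB calls. Each chunk would then be stored and saved in its own session.

```java
import java.util.ArrayList;
import java.util.List;

class Batches {
    // Splits `items` into consecutive sublists of at most `batchSize` elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    // Sample data for trying it out: the integers 0..n-1.
    static List<Integer> range(int n) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            result.add(i);
        }
        return result;
    }
}
```

With 2,500 items and a batch size of 1,024 this yields three batches of 1,024, 1,024 and 452 items.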
DocumentStore is a disposable class, so I worked around this problem by disposing the instance after each chunk. I highly doubt this is the most efficient way to run operations, but it will prevent significant memory overhead from happening.
I was running a sort of "delete all" operation like so. You can see the using blocks disposing both the DocumentStore and the IDocumentSession objects after each chunk.
static DocumentStore GetDataStore()
{
DocumentStore ds = new DocumentStore
{
DefaultDatabase = "test",
Url = "http://localhost:8080"
};
ds.Initialize();
return ds;
}
static IDocumentSession GetDbInstance(DocumentStore ds)
{
return ds.OpenSession();
}
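static void Main(string[] args)
{
int deleteCount; // documents deleted in the current chunk
int deleteSum = 0; // running total of deleted documents
do
{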
using (var ds = GetDataStore())
using (var db = GetDbInstance(ds))
{
//The `Take` operation will cap out at 1,024 by default, per Raven documentation
var list = db.Query<MyClass>().Skip(deleteSum).Take(5000).ToList();
deleteCount = list.Count;
deleteSum += deleteCount;
foreach (var item in list)
{
db.Delete(item);
}
db.SaveChanges();
list.Clear();
}
} while (deleteCount > 0);
}

How do I add ROW_NUMBER to a LINQ query or Entity?

I'm stumped by this easy data problem.
I'm using the Entity framework and have a database of products. My results page returns a paginated list of these products. Right now my results are ordered by the number of sales of each product, so my code looks like this:
return Products.OrderByDescending(u => u.Sales.Count());
This returns an IQueryable dataset of my entities, sorted by the number of sales.
I want my results page to show the rank of each product (in the dataset). My results should look like this:
Page #1
1. Bananas
2. Apples
3. Coffee
Page #2
4. Cookies
5. Ice Cream
6. Lettuce
I'm expecting that I just want to add a column in my results using the SQL ROW_NUMBER variable...but I don't know how to add this column to my results datatable.
My resulting page does contain a foreach loop, but since I'm using a paginated set I'm guessing using that number to fake a ranking number would NOT be the best approach.
So my question is, how do I add a ROW_NUMBER column to my query results in this case?
Use the indexed overload of Select:
var start = page * rowsPerPage;
Products.OrderByDescending(u => u.Sales.Count())
.Skip(start)
.Take(rowsPerPage)
.AsEnumerable()
.Select((u, index) => new { Product = u, Index = index + start });
Actually using OrderBy and then Skip + Take generates ROW_NUMBER in EF 4.5 (you can check with SQL Profiler).
I was searching for a way to do the same thing you are asking for and I was able to get what I need through a simplification of Craig's answer:
var start = page * rowsPerPage;
Products.OrderByDescending(u => u.Sales.Count())
.Skip(start)
.Take(rowsPerPage)
.ToList();
By the way, the generated SQL uses ROW_NUMBER > start and TOP rowsPerPage.
Try this
var x = Products.OrderByDescending(u => u.Sales.Count());
var y = x.ToList();
for(int i = 0; i < y.Count; i++) {
int myNumber = i; // this is your order number
}
This holds as long as the list stays in the same order, which should be the case unless the sales numbers change, so you should be able to get an accurate rank.
There is also this way of doing it.
var page = 2;
var count = 10;
var startIndex = page * count;
var x = Products.OrderByDescending(u => u.Sales.Count());
var y = x.Skip(startIndex).Take(count);
This gives the start index for the page, plus it gives you a small set of sales to display on the page. You just start the counting on your website at startIndex.
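The arithmetic behind "start the counting at startIndex" can be sketched independently of LINQ. The example below is in Java, assumes zero-based page numbers and one-based ranks as in the question's sample output, and uses hypothetical names (`PageRanks`, `rank`, `label`).

```java
import java.util.ArrayList;
import java.util.List;

class PageRanks {
    // Rank of the item at zero-based position `indexOnPage`
    // on zero-based page `page`.
    static int rank(int page, int pageSize, int indexOnPage) {
        return page * pageSize + indexOnPage + 1;
    }

    // Prefixes each item on a page with its overall rank, e.g. "4. Cookies".
    static List<String> label(List<String> pageItems, int page, int pageSize) {
        List<String> labeled = new ArrayList<>();
        for (int i = 0; i < pageItems.size(); i++) {
            labeled.add(rank(page, pageSize, i) + ". " + pageItems.get(i));
        }
        return labeled;
    }
}
```

With a page size of 3, the second page (page index 1) containing Cookies, Ice Cream and Lettuce is labeled 4, 5 and 6, matching the layout in the question.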
Here is a long winded answer. First create a class to house the number/item pair like so:
public class NumberedItem<T>
{
public readonly int Number;
public readonly T Item;
public NumberedItem(int number, T item)
{
Item = item;
Number = number;
}
}
Next comes an abstraction around a page of items (numbered or not):
class PageOf<T> : IEnumerable<T>
{
private readonly int startsAt;
private IEnumerable<T> items;
public PageOf(int startsAt, IEnumerable<T> items)
{
this.startsAt = startsAt;
this.items = items;
}
public IEnumerable<NumberedItem<T>> NumberedItems
{
get
{
int index = 0;
foreach (var item in items)
yield return new NumberedItem<T>(startsAt + index++, item);
yield break;
}
}
public IEnumerator<T> GetEnumerator()
{
foreach (var item in items)
yield return item;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
Once you have that you can "Paginate" a particular queryable collection using this:
class PaginatedQueryable<T>
{
private readonly int PageSize;
private readonly IQueryable<T> Source;
public PaginatedQueryable(int PageSize, IQueryable<T> Source)
{
this.PageSize = PageSize;
this.Source = Source;
}
public PageOf<T> Page(int pageNum)
{
var start = (pageNum - 1) * PageSize;
return new PageOf<T>(start + 1, Source.Skip(start).Take(PageSize));
}
}
And finally a nice extension method to cover the ugly:
static class PaginationExtension
{
public static PaginatedQueryable<T> InPagesOf<T>(this IQueryable<T> target, int PageSize)
{
return new PaginatedQueryable<T>(PageSize, target);
}
}
Which means you can now do this:
var products = Products.OrderByDescending(u => u.Sales.Count()).InPagesOf(20).Page(1);
foreach (var product in products.NumberedItems)
{
Console.WriteLine("{0} {1}", product.Number, product.Item);
}
