I'm new to JGit, so I may be missing something, but I can't work out how to get the result of the pull command.
By that I mean a list of the files that were affected by the pull (modified, added, or deleted).
I tried several things, but none of them gave me the information I need:
PullResult pull_res = git.pull().setCredentialsProvider(cred).call();
MergeResult merge_res = pull_res.getMergeResult();
FetchResult fetch_res = pull_res.getFetchResult();
java.lang.System.out.println("MergeStat1: "+merge_res.getMergeStatus());
java.lang.System.out.println("####################################");
java.lang.System.out.println("MergeStat2: "+merge_res.getMergedCommits());
java.lang.System.out.println("####################################");
java.lang.System.out.println("FetchRes: "+fetch_res.getMessages());
java.lang.System.out.println("####################################");
java.lang.System.out.println("PullRes1: "+pull_res.getFetchResult());
java.lang.System.out.println("####################################");
java.lang.System.out.println("PullRes2: "+pull_res.getFetchResult().getMessages());
java.lang.System.out.println("####################################");
java.lang.System.out.println("PullRes3: "+pull_res.getFetchResult().toString());
java.lang.System.out.println("####################################");
java.lang.System.out.println("PullRes4: "+pull_res.getFetchResult().getMessages().toString());
java.lang.System.out.println("####################################");
java.lang.System.out.println("PullRes5: "+pull_res.getMergeResult());
java.lang.System.out.println("####################################");
java.lang.System.out.println("PullRes6: "+pull_res.getRebaseResult());
java.lang.System.out.println("####################################");
java.lang.System.out.println("PullRes7: "+pull_res.getFetchedFrom());
This results in the following output:
MergeStat1: Fast-forward
####################################
MergeStat2: [Lorg.eclipse.jgit.lib.ObjectId;@1....
####################################
FetchRes:
####################################
PullRes1: org.eclipse.jgit.transport.FetchResult@3....
####################################
PullRes2:
####################################
PullRes3: org.eclipse.jgit.transport.FetchResult@3....
####################################
PullRes4:
####################################
PullRes5: Merge of revisions a...., 1a.. with base 1a... using strategy recursive resulted in: Fast-forward.
####################################
PullRes6: null
####################################
PullRes7: origin
What can I do to get a list of all the affected files from that pull? (also how they were affected: modified, added, deleted..)
UPDATE
With Rüdiger's answer I came to the following solution of the problem:
Repository repo = git.getRepository();
ObjectReader reader = repo.newObjectReader();
// Tree of HEAD before the pull: the OLD side of the diff.
CanonicalTreeParser oldTreeIter = new CanonicalTreeParser();
ObjectId tree = repo.resolve("HEAD^{tree}");
oldTreeIter.reset(reader, tree);
UsernamePasswordCredentialsProvider cred = new UsernamePasswordCredentialsProvider(USER, PASSWORD);
PullResult pull_res = git.pull().setCredentialsProvider(cred).call();
// Tree of HEAD after the pull: the NEW side of the diff.
CanonicalTreeParser newTreeIter = new CanonicalTreeParser();
tree = repo.resolve("HEAD^{tree}");
newTreeIter.reset(reader, tree);
DiffFormatter df = new DiffFormatter(new ByteArrayOutputStream());
df.setRepository(repo);
// Old tree first, new tree second, so files added by the pull are reported as ADD.
List<DiffEntry> entries = df.scan(oldTreeIter, newTreeIter);
for (DiffEntry entry : entries) {
    java.lang.System.out.println(entry);
}
Buffer the commit ID of the current branch before calling pull, for example using git.getRepository().resolve("HEAD").
When pull is done, read the HEAD commit ID again. Now you should be able to create a diff between the two commits.
How to show the changed files between two commits has been answered here: How to show changes between commits with JGit
See here for a detailed discussion of JGit's diff APIs: https://www.codeaffine.com/2016/06/16/jgit-diff/
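As a language-neutral illustration of what such a diff between the two commits boils down to, here is a minimal sketch in JavaScript. It does not use JGit: the two `Map`s are hypothetical stand-ins for the trees of the HEAD commit before and after the pull (path mapped to a content id), and `diffSnapshots` is an assumed helper name, not a JGit API.

```javascript
// Hypothetical stand-in for a tree diff: each snapshot maps a file path to a
// content id (in Git this role is played by the blob ids stored in a tree).
function diffSnapshots(before, after) {
  const changes = [];
  for (const [path, id] of after) {
    if (!before.has(path)) {
      changes.push({ type: "ADD", path });
    } else if (before.get(path) !== id) {
      changes.push({ type: "MODIFY", path });
    }
  }
  for (const path of before.keys()) {
    if (!after.has(path)) {
      changes.push({ type: "DELETE", path });
    }
  }
  // Sort by path so the result does not depend on insertion order.
  return changes.sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
}

// Snapshot remembered before the pull, and the one read after it.
const beforePull = new Map([["README.md", "a1"], ["src/App.java", "b2"], ["old.txt", "c3"]]);
const afterPull = new Map([["README.md", "a1"], ["src/App.java", "b9"], ["new.txt", "d4"]]);

for (const change of diffSnapshots(beforePull, afterPull)) {
  console.log(change.type + " " + change.path);
}
```

The order of the arguments matters: passing the snapshots the other way round would report the added file as deleted and vice versa.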
// but the code is throwing unexpected terminal operator new
function MovePokemon(argument0, argument1) {
old = argument0;
new = argument1;
TPartyID = global.PartyID[old]
global.PartyID[old] = global.PartyID[new]
global.PartyID[new] = TPartyID;
}
new is a keyword in the current versions of GameMaker, so you'll need to rename that variable (say, to _new).
The project in question may leave something to be desired, given the complete absence of local variable declarations (var).
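The same pitfall exists in other languages where `new` is reserved. As a rough sketch of the two suggestions above (rename the variable, and declare temporaries locally), here is the swap written in JavaScript rather than GML; `swapPartySlots`, `partyIds`, `oldIndex` and `newIndex` are hypothetical names chosen for illustration.

```javascript
// Swaps two entries of a party list in place and returns the list.
function swapPartySlots(partyIds, oldIndex, newIndex) {
  // "new" is a reserved word in JavaScript too, hence the name newIndex.
  // The temporary is local to this call, comparable to a "var" local in GML.
  const temp = partyIds[oldIndex];
  partyIds[oldIndex] = partyIds[newIndex];
  partyIds[newIndex] = temp;
  return partyIds;
}

console.log(swapPartySlots([25, 4, 7], 0, 2));
```

Keeping the temporary local means it cannot leak into, or be clobbered by, other scripts that happen to use the same name.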
Try using this code in your script to avoid using "new":
function MovePokemon(argument0, argument1) {
TPartyID = global.PartyID[argument0];
global.PartyID[argument0] = global.PartyID[argument1];
global.PartyID[argument1] = TPartyID;
}
In my current spreadsheet work, every inserted value passes through a test that checks whether the same value is found at the same index in other sheets. If the test fails, a caution message is put in the current cell instead.
// minimalist algorithm
function safeInsertion(data, row_, col_)
{
let rrow = row_ - 1; // range row
let rcol = col_ - 1; // range col
const active_sheet_name = getActiveSheetName(); // does as its name suggests
const all_sheets = SpreadsheetApp.getActiveSpreadsheet().getSheets();
//test to evaluate the value to be inserted in the sheet
for (let sh of all_sheets)
{
if (sh.getName() === active_sheet_name)
continue;
// getSheetValues does as its name suggests.
if( getSheetValues(sh)[rrow][rcol] === data )
return "prohibited insertion"
}
return data;
}
// usage (in cell): =safeInsertion("A scarce data", ROW(), COLUMN())
The problems are:
Cached values sometimes confuse me. When the script or the data changes, the sheet does not pick up the change until I manually re-enter the cell's content or refresh the whole table. Is there any relevant configuration for this issue?
Sometimes, on loading, a messed-up result appears. For example, almost all data is marked as prohibited (originally, all was fine!).
What can I do to obtain a stable sheet using this approach?
PS: The original function does more testing on each data insertion. Those tests consist of counting the frequency in the current sheet and in all sheets.
EDIT:
In fact, I can't create a stable sheet. For testing, here is a copy of my code with minimal adaptations.
function safelyPut(data, max_onesheet, max_allsheet, row, col)
{
// general initialization
const data_regex = "^\\s*" + data + "\\s*$"; // backslashes must be doubled inside a string literal
const spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
const activesheet = spreadsheet.getActiveSheet();
const active_text_finder = activesheet.createTextFinder(data_regex)
.useRegularExpression(true)
.matchEntireCell(true);
const all_text_finder = spreadsheet.createTextFinder(data_regex)
.useRegularExpression(true)
.matchEntireCell(true);
const all_occurrences = all_text_finder.findAll();
//test general data's environment
const active_freq = active_text_finder.findAll().length;
if (max_onesheet <= active_freq)
return "Too much in a sheet";
const all_freq = all_occurrences.length;
if (max_allsheet <= all_freq)
return "Too much in the work";
//test unicity in a position
const active_sname = activesheet.getName();
for (const occurrence of all_occurrences)
{
const sname = occurrence.getSheet().getName();
//if (SYSTEM_SHEETS.includes(sname))
//continue;
if (sname != active_sname)
if (occurrence.getRow() == row && occurrence.getColumn() == col)
if (occurrence.getValue() == data)
{
return `${sname} contains same data with the same indexes.`;
};
}
return data;
}
Create two or three sheets and, at random cells within a short range, enter a value following this usage:
=safelyPut("Scarce Data", 3, 5, ROW(), COLUMN())
Do this and you will probably get an unstable sheet.
About the cached values (the script is changed, but the sheet does not notice until the cell's content is renewed manually or the whole table is refreshed): when you want to refresh your custom function safeInsertion, I thought that this thread might be useful.
About the messed-up results that sometimes appear on loading (for example, almost all data being prohibited although all was fine originally) and your question of how to obtain a stable sheet with this approach: how about reducing the process cost of your script? I thought that reducing the process cost might make your situation a bit more stable.
To reduce the process cost, how about the following modification?
Modified script:
function safeInsertion(data, row_, col_) {
const ss = SpreadsheetApp.getActiveSpreadsheet();
const range = ss.createTextFinder(data).matchEntireCell(true).findNext();
return range && range.getRow() == row_ && range.getColumn() == col_ && range.getSheet().getSheetName() != ss.getActiveSheet().getSheetName() ? "prohibited insertion" : data;
}
The usage is the same as with your current script, e.g. =safeInsertion("A scarce data", ROW(), COLUMN()).
This modification uses TextFinder, because I thought that TextFinder is suitable for reducing the process cost when a value is searched for across all sheets in a Google Spreadsheet.
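For reference, the rule that both the original script and the modification are trying to enforce can be sketched without any Spreadsheet calls. This is only an illustration under stated assumptions: plain 2D arrays stand in for sheet contents (in Apps Script they would come from something like `Range.getValues()`), and `findConflictingSheet` and `safeInsertionSketch` are hypothetical names.

```javascript
// Returns the name of the first other sheet that holds `data` at the same
// 1-based (row, col) position, or null when there is no such sheet.
function findConflictingSheet(sheets, activeName, data, row, col) {
  for (const sheet of sheets) {
    if (sheet.name === activeName) continue; // skip the sheet being edited
    const rowValues = sheet.values[row - 1]; // ROW()/COLUMN() are 1-based
    if (rowValues !== undefined && rowValues[col - 1] === data) {
      return sheet.name;
    }
  }
  return null;
}

// Mirrors the return convention of safeInsertion: the data itself when the
// insertion is allowed, a caution message otherwise.
function safeInsertionSketch(sheets, activeName, data, row, col) {
  const conflict = findConflictingSheet(sheets, activeName, data, row, col);
  return conflict === null ? data : "prohibited insertion";
}

const demoSheets = [
  { name: "Jan", values: [["a", "b"], ["c", "x"]] },
  { name: "Feb", values: [["a", "q"], ["c", "d"]] },
];
console.log(safeInsertionSketch(demoSheets, "Jan", "q", 1, 2));
```

The guard on `rowValues` avoids an exception when another sheet has fewer rows than the position being checked.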
References:
createTextFinder(findText) of Class Spreadsheet
findNext()
Realm allows you to receive the results of a query in sorted order.
let realm = try! Realm()
let dogs = realm.objects(Dog.self)
let dogsSorted = dogs.sorted(byKeyPath: "name", ascending: false)
I ran this test to see how quickly realm returns sorted data
import Foundation
import RealmSwift
class TestModel: Object {
@Persisted(indexed: true) var value: Int = 0
}
class RealmSortTest {
let documentCount = 1000000
var smallestValue: TestModel = TestModel()
func writeData() {
let realm = try! Realm()
var documents: [TestModel] = []
for _ in 0 ... documentCount {
let newDoc = TestModel()
newDoc.value = Int.random(in: 0 ... Int.max)
documents.append(newDoc)
}
try! realm.write {
realm.deleteAll()
realm.add(documents)
}
}
func readData() {
let realm = try! Realm()
let sortedResults = realm.objects(TestModel.self).sorted(byKeyPath: "value")
let start = Date()
self.smallestValue = sortedResults[0]
let end = Date()
let delta = end.timeIntervalSinceReferenceDate - start.timeIntervalSinceReferenceDate
print("Time Taken: \(delta)")
}
func updateSmallestValue() {
let realm = try! Realm()
let sortedResults = realm.objects(TestModel.self).sorted(byKeyPath: "value")
smallestValue = sortedResults[0]
print("Originally loaded smallest value: \(smallestValue.value)")
let newSmallestValue = TestModel()
newSmallestValue.value = smallestValue.value - 1
try! realm.write {
realm.add(newSmallestValue)
}
print("Originally loaded smallest value after write: \(smallestValue.value)")
let readStart = Date()
smallestValue = sortedResults[0]
let readEnd = Date()
let readDelta = readEnd.timeIntervalSinceReferenceDate - readStart.timeIntervalSinceReferenceDate
print("Reloaded smallest value \(smallestValue.value)")
print("Time Taken to reload the smallest value: \(readDelta)")
}
}
With documentCount = 100000, readData() output:
Time taken to load smallest value: 0.48901796340942383
and updateSmallestValue() output:
Originally loaded smallest value: 2075613243102
Originally loaded smallest value after write: 2075613243102
Reloaded smallest value 2075613243101
Time taken to reload the smallest value: 0.4624580144882202
With documentCount = 1000000, readData() output:
Time taken to load smallest value: 4.807577967643738
and updateSmallestValue() output:
Originally loaded smallest value: 4004790407680
Originally loaded smallest value after write: 4004790407680
Reloaded smallest value 4004790407679
Time taken to reload the smallest value: 5.2308430671691895
The time taken to retrieve the first document from a sorted result set is scaling with the number of documents stored in realm rather than the number of documents being retrieved. This indicates to me that realm is sorting all of the documents at query time rather than when the documents are being written. Is there a way to index your data so that you can quickly retrieve a small number of sorted documents?
Edit:
Following discussion in the comments, I updated the code to load only the smallest value from the sorted collection.
Edit 2
I updated the code to observe the results as suggested in the comments.
import Foundation
import RealmSwift
class TestModel: Object {
@Persisted(indexed: true) var value: Int = 0
}
class RealmSortTest {
let documentCount = 1000000
var smallestValue: TestModel = TestModel()
var storedResults: Results<TestModel> = (try! Realm()).objects(TestModel.self).sorted(byKeyPath: "value")
var resultsToken: NotificationToken? = nil
func writeData() {
let realm = try! Realm()
var documents: [TestModel] = []
for _ in 0 ... documentCount {
let newDoc = TestModel()
newDoc.value = Int.random(in: 0 ... Int.max)
documents.append(newDoc)
}
try! realm.write {
realm.deleteAll()
realm.add(documents)
}
}
func observeData() {
let realm = try! Realm()
print("Loading Data")
let startTime = Date()
self.storedResults = realm.objects(TestModel.self).sorted(byKeyPath: "value")
self.resultsToken = self.storedResults.observe { changes in
let observationTime = Date().timeIntervalSince(startTime)
print("Time to first observation: \(observationTime)")
let firstTenElementsSlice = self.storedResults[0..<10]
let elementsArray = Array(firstTenElementsSlice) //print this if you want to see the elements
elementsArray.forEach { print($0.value) }
let moreElapsed = Date().timeIntervalSince(startTime)
print("Time to printed elements: \(moreElapsed)")
}
}
}
and I got the following output
Loading Data
Time to first observation: 5.252112984657288
3792614823099
56006949537408
Time to printed elements: 5.253015995025635
Reading the data with an observer did not reduce the time taken to read the data.
At this time it appears that Realm sorts data when it is accessed rather than when it is written, and there is not a way to have Realm sort data at write time. This means that accessing sorted data scales with the number of documents in the database rather than the number of documents being accessed.
The actual time taken to access the data varies by use case and platform.
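To make the distinction between ordering at access time and ordering at write time concrete, here is a small sketch in JavaScript. It is an illustration only and says nothing about how Realm stores or sorts data internally; `insertSorted` and `smallestK` are hypothetical helpers operating on a plain array.

```javascript
// Keeps `sortedValues` ordered on every write: binary search finds the slot,
// then splice shifts the tail to make room (so a write is still linear time).
function insertSorted(sortedValues, value) {
  let low = 0;
  let high = sortedValues.length;
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (sortedValues[mid] < value) {
      low = mid + 1;
    } else {
      high = mid;
    }
  }
  sortedValues.splice(low, 0, value);
}

// Because the order is maintained at write time, reading the first k values
// costs time proportional to k, not to the number of stored values.
function smallestK(sortedValues, k) {
  return sortedValues.slice(0, k);
}

const stored = [];
for (const v of [5, 1, 4, 2, 3]) {
  insertSorted(stored, v);
}
console.log(smallestK(stored, 2));
```

The trade-off is the same one that applies to indexes in general: reads of a few sorted items become cheap, while every write pays extra to keep the order.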
dogs and dogsSorted are Realm Results collection objects that essentially contain pointers to the underlying data, not the data itself.
Defining a sort order does NOT load all of the objects and they remain lazy - only loading as needed, which is one of the huge benefits of Realm; giant datasets can be used without worrying about overloading memory.
It's also one of the reasons that Realm Results objects always reflect the current state of the underlying data; that data can change many times and what you see in your app's Results vars (and Realm Collections in general) will always show the updated data.
As a side note, at this time working with Realm Collection objects with Swift high-level functions causes that data to load into memory - so don't do that. Sort, filter, etc. with Realm functions and everything stays lazy and memory friendly.
Indexing is a trade-off; on one hand it can improve the performance of certain queries like an equality ( "name == 'Spot'" ) but on the other hand it can slow down write performance. Additionally, adding indexes takes up a bit more space.
Generally speaking, indexing is best for specific use cases; maybe in a situation where you are doing some kind of type-ahead autofill where performance is critical. We have several apps with very large datasets (GBs) and nothing is indexed because the performance advantage received is offset by slower writes, which are done frequently. I suggest starting without indexing.
EDIT:
Going to update the answer based on additional discussion.
First and foremost, copying data from one object to another is not a measure of database loading performance. The real objective here is the user experience and/or being able to access that data - from the time the user expects to see the data to when it's shown. So let's provide some code to demonstrate general performance:
We'll first start with a similar model to what the OP used
class TestModel: Object {
@Persisted(indexed: true) var value: Int = 0
convenience init(withIndex: Int) {
self.init()
self.value = withIndex
}
}
Then define a couple of vars to hold the Results from disk and a notification token which allows us to know when that data is available to be displayed to the user. And then lastly a var to hold the time of when the loading starts
var modelResults: Results<TestModel>!
var modelsToken: NotificationToken?
var startTime = Date()
Here's the function that writes lots of data. The objectCount var will be changed from 10,000 objects on the first run to 1,000,000 objects on the second. Note this is bad coding as I am creating a million objects in memory so don't do this; for demonstration purposes only.
func writeLotsOfData() {
let realm = try! Realm()
let objectCount = 1000000
autoreleasepool {
var testModelArray = [TestModel]()
for _ in 0..<objectCount {
let m = TestModel(withIndex: Int.random(in: 0 ... Int.max))
testModelArray.append(m)
}
try! realm.write {
realm.add(testModelArray)
}
print("data written: \(testModelArray.count) objects")
}
}
and then finally the function that loads those objects from realm and outputs when the data is available to be shown to the user. Note they are sorted per the original question - and in fact will maintain their sort as data is added and changed! Pretty cool stuff.
func loadBigData() {
let realm = try! Realm()
print("Loading Data")
self.startTime = Date()
self.modelResults = realm.objects(TestModel.self).sorted(byKeyPath: "value")
self.modelsToken = self.modelResults?.observe { changes in
let elapsed = Date().timeIntervalSince(self.startTime)
print("Load completed of \(self.modelResults.count) objects - elapsed time of \(elapsed)")
}
}
and the results. Two runs, one with 10,000 objects and one with 1,000,000 objects
data written: 10000 objects
Loading Data
Load completed of 10000 objects - elapsed time of 0.0059670209884643555
data written: 1000000 objects
Loading Data
Load completed of 1000000 objects - elapsed time of 0.6800119876861572
There are three things to note:
1. A Realm notification fires an event when the data has completed loading, and also when there are additional changes. We are leveraging that to notify the app when the data has completed loading and is available to be used - shown to the user, for example.
2. We are lazily loading all of the objects! At no point are we going to run into a memory overloading issue. Once the objects have loaded into the results, they are then freely available to be shown to the user or processed in whatever way is needed. It is super important to work with Realm objects in a Realm way when working with large datasets. Generally speaking, if it's 10 objects, no problem tossing them into an array, but when there are 1 million objects - let Realm do its lazy job.
3. The app is protected using the above code and techniques. There could be 10 objects or 1,000,000 objects and the memory impact is minimal.
EDIT 2
(see comment to the OP's question for more info about this edit)
Per a request from the OP, they wanted to see the same exercise with printed values and times. Here's the updated code
self.modelsToken = self.modelResults?.observe { changes in
let elapsed = Date().timeIntervalSince(self.startTime)
print("Load completed of \(self.modelResults.count) objects - elapsed time of \(elapsed)")
print("print first 10 object values")
let firstTenElementsSlice = self.modelResults[0..<10]
let elementsArray = Array(firstTenElementsSlice) //print this if you want to see the elements
elementsArray.forEach { print($0.value)}
let moreElapsed = Date().timeIntervalSince(self.startTime)
print("Printing of 10 elements completed: \(moreElapsed)")
}
and then the output
Loading Data
Load completed of 1000000 objects - elapsed time of 0.6730009317398071
print first 10 object values
12264243738520
17242140785413
29611477414437
31558144830373
32913160803785
45399774467128
61700529799916
63929929449365
73833938586206
81739195218861
Printing of 10 elements completed: 0.6745189428329468
I want to get the list of files changed between two commits, not the diff content itself. How can I do that with JGit?
With two refs pointing at the two commits it should suffice to do the following to iterate all changes between the commits:
ObjectId oldHead = repository.resolve("HEAD^^^^{tree}");
ObjectId head = repository.resolve("HEAD^{tree}");
// prepare the two iterators to compute the diff between
try (ObjectReader reader = repository.newObjectReader()) {
CanonicalTreeParser oldTreeIter = new CanonicalTreeParser();
oldTreeIter.reset(reader, oldHead);
CanonicalTreeParser newTreeIter = new CanonicalTreeParser();
newTreeIter.reset(reader, head);
// finally get the list of changed files
try (Git git = new Git(repository)) {
List<DiffEntry> diffs= git.diff()
.setNewTree(newTreeIter)
.setOldTree(oldTreeIter)
.call();
for (DiffEntry entry : diffs) {
System.out.println("Entry: " + entry);
}
}
}
There is a ready-to-run example snippet contained in the jgit-cookbook
After reading the documentation, I'm having a hard time conceptualizing the change feed. Let's take the code from the documentation below. The second change feed is picking up the changes from the last time it was run via the checkpoints. Let's say it is being used to create summary data and there was an issue and it needed to be re-run from a prior time. I don't understand the following:
How do I specify a particular time at which the checkpoint should start? I understand I can save the checkpoint dictionary and use that for each run, but how do you get the changes from time X onward, for example to rerun some summary data?
Secondly, let's say we are rerunning some summary data and we save the last checkpoint used for each summarized data so we know where that one left off. How does one know whether a record falls at or before that checkpoint?
Code that runs from collection beginning and then from last checkpoint:
Dictionary<string, string> checkpoints = await GetChanges(client, collection, new Dictionary<string, string>());
await client.CreateDocumentAsync(collection, new DeviceReading {
    DeviceId = "xsensr-201", MetricType = "Temperature", Unit = "Celsius", MetricValue = 1000
});
await client.CreateDocumentAsync(collection, new DeviceReading {
    DeviceId = "xsensr-212", MetricType = "Pressure", Unit = "psi", MetricValue = 1000
});
// Returns only the two documents created above.
checkpoints = await GetChanges(client, collection, checkpoints);

// Reads the change feed of every partition key range, resuming from the given checkpoints.
private async Task<Dictionary<string, string>> GetChanges(
    DocumentClient client,
    string collection,
    Dictionary<string, string> checkpoints) {
    List<PartitionKeyRange> partitionKeyRanges = new List<PartitionKeyRange>();
    FeedResponse<PartitionKeyRange> pkRangesResponse;
    do {
        pkRangesResponse = await client.ReadPartitionKeyRangeFeedAsync(collection);
        partitionKeyRanges.AddRange(pkRangesResponse);
    }
    while (pkRangesResponse.ResponseContinuation != null);
    foreach (PartitionKeyRange pkRange in partitionKeyRanges) {
        string continuation = null;
        checkpoints.TryGetValue(pkRange.Id, out continuation);
        IDocumentQuery<Document> query = client.CreateDocumentChangeFeedQuery(
            collection,
            new ChangeFeedOptions {
                PartitionKeyRangeId = pkRange.Id,
                StartFromBeginning = true,
                RequestContinuation = continuation,
                MaxItemCount = 1
            });
        while (query.HasMoreResults) {
            FeedResponse<DeviceReading> readChangesResponse = query.ExecuteNextAsync<DeviceReading>().Result;
            foreach (DeviceReading changedDocument in readChangesResponse) {
                Console.WriteLine(changedDocument.Id);
            }
            // Remember where this partition key range left off.
            checkpoints[pkRange.Id] = readChangesResponse.ResponseContinuation;
        }
    }
    return checkpoints;
}
DocumentDB supports check-pointing only by the logical timestamp returned by the server. If you would like to retrieve all changes from X minutes ago, you would have to "remember" the logical timestamp corresponding to the clock time (ETag returned for the collection in the REST API, ResponseContinuation in the SDK), then use that to retrieve changes.
Change feed uses logical time in place of clock time because it can be different across various servers/partitions. If you would like to see change feed support based on clock time (with some caveats on skew), please propose/upvote at https://feedback.azure.com/forums/263030-documentdb/.
To save the last checkpoint per partition key/document, you can just save the corresponding version of the batch in which it was last seen (ETag returned for the collection in the REST API, ResponseContinuation in the SDK), like Fred suggested in his answer.
How to specify a particular time the checkpoint should start.
You could try to provide a logical version/ETag (such as 95488) instead of providing a null value as RequestContinuation property of ChangeFeedOptions.
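The idea of "remembering" which logical timestamp corresponds to which clock time can be sketched as a small lookup structure. The following JavaScript is only an illustration of that bookkeeping, not SDK code: `recordCheckpoint` and `continuationAtOrBefore` are hypothetical helpers, the token values other than 95488 are made up, and in the question's code one such log would be kept per partition key range id.

```javascript
// Appends one entry pairing the wall-clock time at which a batch was read
// with the continuation token (logical timestamp) returned for that batch.
function recordCheckpoint(log, clockTimeMs, continuation) {
  log.push({ clockTimeMs, continuation });
}

// Returns the token of the latest checkpoint taken at or before clockTimeMs,
// or null when there is none (meaning: read the feed from the beginning).
function continuationAtOrBefore(log, clockTimeMs) {
  let best = null;
  for (const entry of log) {
    if (entry.clockTimeMs <= clockTimeMs &&
        (best === null || entry.clockTimeMs >= best.clockTimeMs)) {
      best = entry;
    }
  }
  return best === null ? null : best.continuation;
}

const checkpointLog = [];
recordCheckpoint(checkpointLog, Date.parse("2017-03-01T10:00:00Z"), "95488");
recordCheckpoint(checkpointLog, Date.parse("2017-03-01T11:00:00Z"), "95502");
recordCheckpoint(checkpointLog, Date.parse("2017-03-01T12:00:00Z"), "95530");

// Token to hand to RequestContinuation when rerunning from 11:30 onward.
console.log(continuationAtOrBefore(checkpointLog, Date.parse("2017-03-01T11:30:00Z")));
```

Resuming from the checkpoint at or before time X replays changes from slightly earlier than X, so whatever recomputes the summary data should tolerate seeing some records again.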