Checksum is the same for different images - hashtable

I am sending thousands of images from one system to another over FTP. Initially, I'll dump all the images, but later on, I want to send only those images which are changed.
I haven't found a reliable way to detect changed images from the last-modified timestamp on Windows, so I decided on the following approach:
1.) Generate checksums for all the files and store them somewhere. Maybe database or filesystem.
2.) Every time I send files to another system, compare the checksums and send only the files which have different checksums.
In order to test the above, I tried to generate a checksum (SHA and MD5) for two different images, and the checksum was the same.
Following is the sample code:
package com.test;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.commons.codec.digest.DigestUtils;
public class TestHash {
public static void main(String[] args) throws IOException {
String checksumSHA256 = DigestUtils.sha256Hex(new FileInputStream("monkey_11.jpg"));
System.out.println("checksumSHA256 : " + checksumSHA256);
String checksumMD5 = DigestUtils.md5Hex(new FileInputStream("monkey_11.jpg"));
System.out.println("checksumMD5 : " + checksumMD5);
String checksumSHA256_1 = DigestUtils.sha256Hex(new FileInputStream("monkey.jpg"));
System.out.println("checksumSHA256 : " + checksumSHA256_1);
String checksumMD5_1 = DigestUtils.md5Hex(new FileInputStream("monkey.jpg"));
System.out.println("checksumMD5 : " + checksumMD5_1);
}
}
I'm wondering why the checksums are the same? Is there another way to identify updated images?
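The two steps described above (store checksums, then send only files whose checksum changed) can be sketched with the JDK alone. This is a minimal illustration, not a drop-in solution: the class name ChecksumDiff, the method names, and the choice to key the stored map by file name are all assumptions made for the example. Since two files with different bytes should not share a SHA-256 digest in practice, it is also worth confirming that the two test images really differ byte for byte.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; names are illustrative only.
class ChecksumDiff {

    /** Streams the file through SHA-256 and returns the digest as lowercase hex. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /**
     * Returns the files whose current checksum differs from the stored one,
     * or that have no stored checksum yet. The map is keyed by file name
     * (an assumption of this sketch).
     */
    static List<Path> changedFiles(List<Path> files, Map<String, String> storedChecksums)
            throws IOException, NoSuchAlgorithmException {
        List<Path> changed = new ArrayList<>();
        for (Path file : files) {
            String current = sha256Hex(file);
            String previous = storedChecksums.get(file.getFileName().toString());
            if (!current.equals(previous)) {
                changed.add(file);
            }
        }
        return changed;
    }
}
```

If the images live in nested directories, keying the map by a path relative to the root folder would avoid clashes between files that share a name.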


DynamoDb streams, just get new updates since

I'm trying to work with DynamoDb streams, using the example code shown in this article. I've modified it to work in a basic Spring Boot app (initializr) against an existing DynamoDb table which has streams enabled. Everything appears to work; however, I'm not seeing any new updates.
This particular database gets a bulk update once per day at a specific time, and it may get some minor changes now and then during the day. I'm trying to monitor these minor updates. When I run the application I can see the records from the bulk update; however, if my application is running and I use the AWS Console to modify, create, or delete a record, I don't get any output.
I'm using:
Spring Boot:2.3.9.RELEASE
amazon-kinesis-client:1.14.2
Java 11
Running on Mac Catalina (though that shouldn't matter)
In my test application I did the following:
package com.test.dynamodb_streams_test_kcl.service;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreams;
import com.amazonaws.services.dynamodbv2.model.*;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import javax.annotation.PostConstruct;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.List;
@Slf4j
@Service
@RequiredArgsConstructor
public class LowLevelKclProcessor {
private static final String dynamoDbTableName = "global-items";
private final AmazonDynamoDB dynamoDB;
private final AmazonDynamoDBStreams dynamoDBStreams;
private final ZonedDateTime startTime = ZonedDateTime.now();
@PostConstruct
public void initialize() {
log.info("Describing table={}", dynamoDbTableName);
DescribeTableResult itemTableDescription = dynamoDB.describeTable(dynamoDbTableName);
log.info("Got description");
String itemTableStreamArn = itemTableDescription.getTable().getLatestStreamArn();
log.info("Got stream arn ({}) for table={} tableArn={}", itemTableStreamArn,
itemTableDescription.getTable().getTableName(), itemTableDescription.getTable().getTableArn());
// Get all the shard IDs from the stream. Note that DescribeStream returns
// the shard IDs one page at a time.
String lastEvaluatedShardId = null;
do {
DescribeStreamResult describeStreamResult = dynamoDBStreams.describeStream(
new DescribeStreamRequest()
.withStreamArn(itemTableStreamArn)
.withExclusiveStartShardId(lastEvaluatedShardId));
List<Shard> shards = describeStreamResult.getStreamDescription().getShards();
// Process each shard on this page
for (Shard shard : shards) {
String shardId = shard.getShardId();
System.out.println("Shard: " + shard);
// Get an iterator for the current shard
GetShardIteratorRequest getShardIteratorRequest = new GetShardIteratorRequest()
.withStreamArn(itemTableStreamArn)
.withShardId(shardId)
.withShardIteratorType(ShardIteratorType.LATEST);
GetShardIteratorResult getShardIteratorResult =
dynamoDBStreams.getShardIterator(getShardIteratorRequest);
String currentShardIter = getShardIteratorResult.getShardIterator();
// Shard iterator is not null until the Shard is sealed (marked as READ_ONLY).
// To prevent running the loop until the Shard is sealed, which will be on average
// 4 hours, we process only the items that were written into DynamoDB and then exit.
int processedRecordCount = 0;
while (currentShardIter != null && processedRecordCount < 100) {
System.out.println(" Shard iterator: " + currentShardIter.substring(380));
// Use the shard iterator to read the stream records
GetRecordsResult getRecordsResult = dynamoDBStreams.getRecords(new GetRecordsRequest()
.withShardIterator(currentShardIter));
List<Record> records = getRecordsResult.getRecords();
for (Record record : records) {
// I set a breakpoint on the line below, but it was never hit after the bulk update info
if (startTime.isBefore(ZonedDateTime.ofInstant(record.getDynamodb()
.getApproximateCreationDateTime().toInstant(), ZoneId.systemDefault()))) {
System.out.println(" " + record.getDynamodb());
}
}
processedRecordCount += records.size();
currentShardIter = getRecordsResult.getNextShardIterator();
}
}
// If LastEvaluatedShardId is set, then there is
// at least one more page of shard IDs to retrieve
lastEvaluatedShardId = describeStreamResult.getStreamDescription().getLastEvaluatedShardId();
} while (lastEvaluatedShardId != null);
}
}
Note that your test is based on the low-level API, not on the Kinesis Client Library, so it's normal to have some tricky technical details to deal with.
Your test application has some similarities with the example given in the docs, but it has issues:
"When I run the application I can see the records from the bulk update"
ShardIteratorType.LATEST will not return old records written before the test started (it starts reading just after the most recent stream record in the shard).
So I will assume that the iterator type was different (e.g. TRIM_HORIZON) and was changed to LATEST later during your tests.
The main issue is that your application polls shards sequentially, and it will block on the first shard until it finds 100 new records in that shard (due to the LATEST iterator type).
So you may not see the new minor changes while the test is running if they belong to a different shard.
Solutions:
1- Poll shards in parallel using threads.
2- Filter returned shards using the sequence number of the last logged record, and try to guess the shard that may contain minor changes.
3- Dangerous & I'm not sure if it works :)
In a test table, and if your data model allows this: close the current stream, and enable a new one, then make sure that all your writes belong to one partition. In the majority of cases, table partitions have a one-to-one relationship with active shards. Theoretically, you have only one active shard to deal with.
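Solution 1 (polling shards in parallel) could look roughly like the sketch below. It deliberately leaves out the AWS SDK: ShardReader is a hypothetical stand-in for the per-shard getShardIterator/getRecords loop from the question, and the class and method names are assumptions made for illustration. The point is only that each shard gets its own thread, so a quiet shard cannot hold up the others.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not part of the AWS SDK or the Kinesis Client Library.
class ParallelShardPolling {

    /** Stand-in for the per-shard read loop (get an iterator, then call getRecords). */
    interface ShardReader {
        List<String> readShard(String shardId) throws Exception;
    }

    /** Reads every shard on its own thread and returns the results in shard order. */
    static List<String> pollAll(List<String> shardIds, ShardReader reader) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shardIds.size()));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String shardId : shardIds) {
                Callable<List<String>> task = () -> reader.readShard(shardId);
                futures.add(pool.submit(task));
            }
            List<String> records = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                // Waits for this shard's task; the other tasks keep running meanwhile.
                records.addAll(future.get());
            }
            return records;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a long-running monitor each task would loop and hand records to a callback instead of returning a list, but the threading structure would stay the same.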

Traditional pdf indexing solution compared to graph-based version

My intention is to index an arbitrary directory containing PDF files (among other file types) with keywords stored in a list. I have a traditional solution, and I heard that graph-based solutions using e.g. SimpleGraph could be more elegant/efficient and independent of directory structures.
What would a graph-based solution (e.g. SimpleGraph) look like?
Traditional solution
// https://stackoverflow.com/a/14051951/1497139
List<File> pdfFiles = this.explorePath(TestPDFFiles.RFC_DIRECTORY, "pdf");
List<PDFFile> pdfs = this.getPdfsFromFileList(pdfFiles);
…
for (PDFFile pdf:pdfs) {
// https://stackoverflow.com/a/9560307/1497139
if (org.apache.commons.lang3.StringUtils.containsIgnoreCase(pdf.getText(), keyWord)) {
foundList.add(pdf.file.getName()); // here we access by structure (early binding)
// - in the graph solution by name (late binding)
}
}
Basically with SimpleGraph you'd use a combination of the modules
FileSystem
PDFSystem
With the FileSystem module you collect your graph of files in the directory and filter it to include only files with the extension pdf - then you analyze the PDFs using the PDFSystem to get the page/text structure - there is already a test case for this in the simplegraph-bundle module showing how it works with some RFC pdfs as input.
TestPDFFiles.java
I have now added the indexing test; see below.
The core functionality has been taken from the old test with searching for a single keyword and allowing this as a parameter:
List<Object> founds = pdfSystem.g().V().hasLabel("page")
.has("text", RegexPredicate.regex(".*" + keyWord + ".*")).in("pages")
.dedup().values("name").toList();
This is a Gremlin query that does most of the work by searching a whole tree of PDF files with just one call. I consider this more elegant since you do not have to care about the structure of the input (tree/graph/filesystem/database, etc.).
JUnit Testcase
@Test
/**
* test for https://github.com/BITPlan/com.bitplan.simplegraph/issues/12
*/
public void testPDFIndexing() throws Exception {
FileSystem fs = getFileSystem(RFC_DIRECTORY);
int limit = Integer.MAX_VALUE;
PdfSystem pdfSystem = getPdfSystemForFileSystem(fs, limit);
Map<String, List<String>> index = this.getIndex(pdfSystem, "ARPA",
"proposal", "plan");
// debug=true;
if (debug) {
for (Entry<String, List<String>> indexEntry : index.entrySet()) {
List<String> fileNameList = indexEntry.getValue();
System.out.println(String.format("%15s=%3d %s", indexEntry.getKey(),
fileNameList.size(), fileNameList));
}
}
assertEquals(14,index.get("ARPA").size());
assertEquals(9,index.get("plan").size());
assertEquals(8,index.get("proposal").size());
}
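The getIndex helper used in the test is not shown here. For comparison, the traditional approach from the question, extended from one keyword to a keyword list, might be sketched in plain Java as follows. The class KeywordIndex and the textByFileName input (file name mapped to already extracted text) are assumptions for the example, and matching is case-insensitive as in the question's containsIgnoreCase call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; not part of SimpleGraph.
class KeywordIndex {

    /**
     * Builds a map from each keyword to the names of the files whose text
     * contains it, ignoring case. textByFileName is an assumed input:
     * file name mapped to the text extracted from that file.
     */
    static Map<String, List<String>> build(Map<String, String> textByFileName, String... keyWords) {
        Map<String, List<String>> index = new LinkedHashMap<>();
        for (String keyWord : keyWords) {
            String needle = keyWord.toLowerCase(Locale.ROOT);
            List<String> found = new ArrayList<>();
            for (Map.Entry<String, String> entry : textByFileName.entrySet()) {
                if (entry.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                    found.add(entry.getKey());
                }
            }
            index.put(keyWord, found);
        }
        return index;
    }
}
```

The resulting Map<String, List<String>> has the same shape as the index asserted in the test case, which makes the two approaches easy to compare side by side.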

Blackberry not creating a valid sqlite database

I have a very unusual problem.
I'm trying to create a simple database (6 tables, 4 of which only have 2 columns).
I'm using an in-house database library which I've used in a previous project, and it does work.
However with my current project there are occasional bugs. Basically the database isn't created correctly. It is added to the sdcard but when I access it I get a DatabaseException.
When I access the device from the desktop manager and try to open the database (with SQLite Database Browser v2.0b1) I get "File is not a SQLite 3 database".
UPDATE
I found that this happens when I delete the database manually off the sdcard.
Since there's no way to stop a user doing that, is there anything I can do to handle it?
CODE
public static boolean initialize()
{
boolean memory_card_available = ApplicationInterface.isSDCardIn();
String application_name = ApplicationInterface.getApplicationName();
if (memory_card_available == true)
{
file_path = "file:///SDCard/" + application_name + ".db";
}
else
{
file_path = "file:///store/" + application_name + ".db";
}
try
{
uri = URI.create(file_path);
FileClass.hideFile(file_path);
} catch (MalformedURIException mue)
{
}
return create(uri);
}
private static boolean create(URI db_file)
{
boolean response = false;
try
{
db = DatabaseFactory.create(db_file);
db.close();
response = true;
} catch (Exception e)
{
}
return response;
}
My only suggestion is keep a default database in your assets - if there is a problem with the one on the SD Card, attempt to recreate it by copying the default one.
Not a very good answer I expect.
Since it looks like your problem is that the user is deleting your database, just make sure to catch exceptions when you open it (or access it ... wherever you're getting the exception):
try {
    URI uri = URI.create("file:///SDCard/Databases/database1.db");
    // Pass the URI created above (the original referenced an undeclared myURI)
    sqliteDB = DatabaseFactory.open(uri);
    Statement st = sqliteDB.createStatement( "CREATE TABLE 'Employee' ( " +
                                             "'Name' TEXT, " +
                                             "'Age' INTEGER )" );
    st.prepare();
    st.execute();
} catch ( DatabaseException e ) {
    System.out.println( e.getMessage() );
    // TODO: decide if you want to create a new database here, or
    // alert the user if the SDCard is not available
}
Note that even though it's probably unusual for a user to delete a private file that your app creates, it's perfectly normal for the SDCard to be unavailable because the device is connected to a PC via USB. So, you really should always be testing for this condition (file open error).
See this answer regarding checking for SDCard availability.
Also, read this about SQLite db storage locations, and make sure to review this answer by Michael Donohue about eMMC storage.
Update: SQLite Corruption
See this link describing the many ways SQLite databases can be corrupted. It definitely sounded to me like maybe the .db file was deleted, but not the journal / wal file. If that was it, you could try deleting database1* programmatically before you create database1.db. But, your comments seem to suggest that it was something else. Perhaps you could look into the file locking failure modes, too.
If you are desperate, you might try changing your code to use a different name (e.g. database2, database3) each time you create a new db, to make sure you're not getting artifacts from the previous db.
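One way to detect a leftover or damaged file before opening it is to look at its header: a SQLite 3 database file starts with the 16-byte string "SQLite format 3" followed by a zero byte. The sketch below checks that on any InputStream. The class and method names are made up for the example, and how you obtain the stream on the device (for instance through a file connection) is left open.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; not part of the BlackBerry database API.
class SqliteHeaderCheck {

    // The 16-byte magic string at the start of a SQLite 3 database file.
    private static final byte[] MAGIC = {
        'S', 'Q', 'L', 'i', 't', 'e', ' ', 'f', 'o', 'r', 'm', 'a', 't', ' ', '3', 0
    };

    /** Returns true if the stream starts with the SQLite 3 header bytes. */
    static boolean looksLikeSqlite3(InputStream in) throws IOException {
        byte[] header = new byte[MAGIC.length];
        int filled = 0;
        while (filled < header.length) {
            int read = in.read(header, filled, header.length - filled);
            if (read == -1) {
                return false; // stream is shorter than the header
            }
            filled += read;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Note that this sketch reports false for an empty file, so a freshly created database that has not been written to yet may need separate handling.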

Need a cq5 example

I am new to Adobe CQ5. I went through many online blogs and tutorials but could not get much out of them. Can anyone provide an Adobe CQ5 application example, with a detailed explanation, that stores and retrieves data in the JCR?
Thanks in advance.
Here's a snippet for CQ 5.4 to get you started. It inserts a content page and text (as a parsys) at an arbitrary position in the content hierarchy. The position is supplied by a workflow payload, but you could write something that runs from the command line and use any valid CRX path instead. The advantage of making it a process step is that you get a session established for you, and the navigation to the insert point has been taken care of.
import java.text.SimpleDateFormat;
import java.util.Date;
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import org.apache.sling.jcr.resource.JcrResourceConstants;
import org.apache.felix.scr.annotations.Component;
import org.apache.felix.scr.annotations.Properties;
import org.apache.felix.scr.annotations.Property;
import org.apache.felix.scr.annotations.Service;
import org.osgi.framework.Constants;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.day.cq.workflow.WorkflowException;
import com.day.cq.workflow.WorkflowSession;
import com.day.cq.workflow.exec.WorkItem;
import com.day.cq.workflow.exec.WorkflowData;
import com.day.cq.workflow.exec.WorkflowProcess;
import com.day.cq.workflow.metadata.MetaDataMap;
import com.day.cq.wcm.api.NameConstants;
@Component
@Service
@Properties({
@Property(name = Constants.SERVICE_DESCRIPTION,
value = "Makes a new tree of nodes, subordinate to the payload node, from the content of a file."),
@Property(name = Constants.SERVICE_VENDOR, value = "Acme Coders, LLC"),
@Property(name = "process.label", value = "Make new nodes from file")})
public class PageNodesFromFile implements WorkflowProcess {
private static final Logger log = LoggerFactory.getLogger(PageNodesFromFile.class);
private static final String TYPE_JCR_PATH = "JCR_PATH";
public void execute(WorkItem workItem, WorkflowSession workflowSession, MetaDataMap args)
throws WorkflowException {
//get the payload
WorkflowData workflowData = workItem.getWorkflowData();
if (!workflowData.getPayloadType().equals(TYPE_JCR_PATH)) {
log.warn("unusable workflow payload type: " + workflowData.getPayloadType());
workflowSession.terminateWorkflow(workItem.getWorkflow());
return;
}
String payloadString = workflowData.getPayload().toString();
//the text to be inserted
String lipsum = "Lorem ipsum...";
//set up some node info
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("d-MMM-yyyy-HH-mm-ss");
String newRootNodeName = "demo-page-" + simpleDateFormat.format(new Date());
SimpleDateFormat simpleDateFormatSpaces = new SimpleDateFormat("d MMM yyyy HH:mm:ss");
String newRootNodeTitle = "Demo page: " + simpleDateFormatSpaces.format(new Date());
//insert the nodes
try {
Node parentNode = (Node) workflowSession.getSession().getItem(payloadString);
Node pageNode = parentNode.addNode(newRootNodeName);
pageNode.setPrimaryType(NameConstants.NT_PAGE); //cq:Page
Node contentNode = pageNode.addNode(Node.JCR_CONTENT); //jcr:content
contentNode.setPrimaryType("cq:PageContent"); //or use MigrationConstants.TYPE_CQ_PAGE_CONTENT
//from com.day.cq.compat.migration
contentNode.setProperty(javax.jcr.Property.JCR_TITLE, newRootNodeTitle); //jcr:title
contentNode.setProperty(NameConstants.PN_TEMPLATE,
"/apps/geometrixx/templates/contentpage"); //cq:template
contentNode.setProperty(JcrResourceConstants.SLING_RESOURCE_TYPE_PROPERTY,
"geometrixx/components/contentpage"); //sling:resourceType
Node parsysNode = contentNode.addNode("par");
parsysNode.setProperty(JcrResourceConstants.SLING_RESOURCE_TYPE_PROPERTY,
"foundation/components/parsys");
Node textNode = parsysNode.addNode("text");
textNode.setProperty(JcrResourceConstants.SLING_RESOURCE_TYPE_PROPERTY,
"foundation/components/text");
textNode.setProperty("text", lipsum);
textNode.setProperty("textIsRich", true);
workflowSession.getSession().save();
}
catch (RepositoryException e) {
log.error(e.toString(), e);
workflowSession.terminateWorkflow(workItem.getWorkflow());
return;
}
}
}
I have posted further details and discussion.
A few other points:
I incorporated a timestamp into the name and title of the content page to be inserted. That way, you can run many code and test cycles without cleaning up your repository, and you know which test was the most recently run. Added bonus: no duplicate file names, no ambiguity.
Adobe and Day have been inconsistent about providing constants for property values, node types, and suchlike. I used the constants that I could find, and used literal strings elsewhere.
I did not fill in properties like the last-modified date. In code for production I would do so.
I found myself confused by Node.setPrimaryType() and Node.getPrimaryNodeType(). The two methods are only rough complements; the setter takes a string but the getter returns a NodeType with various info inside it.
In my original version of this code, I read the text to be inserted from a file, rather than just using the static string "Lorem ipsum..."
Once you've worked through this example, you should be able to use the Adobe docs to write code that reads data back from the CRX.
If you want to learn how to write a CQ application that can store and query data from the CQ JCR, see this article:
http://scottsdigitalcommunity.blogspot.ca/2013/02/querying-adobe-experience-manager-data.html
This provides a step-by-step guide and walks you right through the entire process, including building the OSGi bundle using Maven.
From the comments above, I see a reference to a BND file. You should stay away from CRXDE for creating OSGi bundles and use Maven instead.

Blackberry - Cannot create SQLite database

I am making an app that runs in the background, and starts on device boot.
I have read the docs, and have the SQLiteDemo files from RIM, and I am using them to try to create a database on my SD Card in the simulator.
Unfortunately, I am getting this error:
DatabasePathException:Invalid path name. Path does not contains a proper root list. See FileSystemRegistry class for details.
Here's my code:
public static Database storeDB;
public static final String DATABASE_NAME = "testDB";
private String DATABASE_LOCATION = "file:///SDCard/Databases/MyDBFolder/";
public static URI dbURI;
dbURI = URI.create(DATABASE_LOCATION+DATABASE_NAME);
storeDB = DatabaseFactory.openOrCreate(dbURI);
I took out a try/catch for URI.create and DatabaseFactory.openOrCreate for the purposes of this post.
So, can anyone tell me why I can't create a database on my simulator?
If I load it up and go into media, I can create a folder manually. The SD card is pointing to a folder on my hard drive, and if I create a folder in there, it is shown on the simulator too, so I can create folders, just not programmatically.
Also, I have tried this from the developer docs:
// Determine if an SDCard is present
boolean sdCardPresent = false;
String root = null;
Enumeration enum = FileSystemRegistry.listRoots();
while (enum.hasMoreElements())
{
root = (String)enum.nextElement();
System.err.println("root="+root);
if(root.equalsIgnoreCase("sdcard/"))
{
sdCardPresent = true;
}
}
But it only picks up store/ and never sdcard/.
Can anyone help?
Thanks.
FYI,
I think I resolved this.
The problem was I was trying to write to storage during boot-up, but the storage wasn't ready. Once the device/simulator was loaded, and a few of my listeners were triggered, the DB was created.
See here:
http://www.blackberry.com/knowledgecenterpublic/livelink.exe/fetch/2000/348583/800332/832062/How_To_-_Write_safe_initialization_code.html?nodeid=1487426&vernum=0
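If the work cannot simply be moved to a later listener, another option is to poll until storage reports ready and only then create the database. The following is a generic plain-Java sketch of that idea, not the approach from the linked article. StorageWait and ReadyCheck are hypothetical names, and on a device the check could wrap the FileSystemRegistry.listRoots() loop shown in the question.

```java
// Hypothetical helper; names are illustrative only.
class StorageWait {

    /** Readiness probe, e.g. "does the sdcard/ root show up yet?". */
    interface ReadyCheck {
        boolean isReady();
    }

    /**
     * Polls the check until it passes or the timeout elapses.
     * Returns true if the check passed, false if the timeout was reached first.
     */
    static boolean waitUntilReady(ReadyCheck check, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (check.isReady()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            Thread.sleep(pollMillis);
        }
    }
}
```

Because the method sleeps between attempts, it would need to run on a background thread rather than on the thread that handles startup or UI events.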
