How to determine which file in the previous version a text block came from?

The problem is as follows:
Suppose I have a list of files in one version (say A, B, C, D). In the next version I have the files A, E, F, G. There are some similarities in their contents. The files in the later version come from the previous version through renaming, content addition, deletion or partial modification, or without any change (for example, A is unchanged).
I take a block of text from a file (E, in the 2nd version) and check which files in the 1st version contain this text block. I find that B, C and D all contain the fragment. I want to determine which file (B, C or D) this text block actually comes from. (I assume that E is a file whose name changed in the second version.)
Since content may be changed, added or deleted in the later version, I use the LCS (longest common subsequence) algorithm to measure similarity. But I still cannot map the file to its previous version.
I think one possible approach might be to use the location of the matched text blocks, but this heuristic does not always work. Is there any research or an existing algorithm for this? Any direction will be helpful. Thanks in advance.

I think it may be helpful to take a look at Subversion, and its capability to track file renaming between versions. http://svnbook.red-bean.com/
It's tried and tested, because it's used by so many developers. Renaming has to be done with Subversion tools, but there are many of them (command line, file explorer integration for different OSes, GUIs, IDEs, you name it). It also covers moving files between directories, and merging several lines of changes (branches).
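For files that were not renamed through a version-control tool, the whole-file comparison idea from the question can be sketched roughly as follows. This is only an illustration, not an established algorithm: it ranks every candidate file from the old version by how similar its full content is to the new file, rather than looking at the shared block alone. It uses Python's difflib.SequenceMatcher, which is a matching-blocks heuristic and not a strict LCS, and the function names similarity and most_likely_origin are made up for this example.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two texts, compared line by line."""
    matcher = SequenceMatcher(None, a.splitlines(), b.splitlines(), autojunk=False)
    return matcher.ratio()


def most_likely_origin(new_text: str, old_files: dict[str, str]) -> list[tuple[str, float]]:
    """Rank the old files by whole-file similarity to the new file, best first."""
    scores = [(name, similarity(text, new_text)) for name, text in old_files.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # B, C and D all contain the shared block; only C shares further lines with E.
    shared = "common block line 1\ncommon block line 2\n"
    old_version = {
        "B": shared + "b1\nb2\nb3\n",
        "C": shared + "alpha\nbeta\ngamma\ndelta\n",
        "D": shared + "d1\n",
    }
    new_e = shared + "alpha\nbeta\ngamma\nepsilon\n"
    for name, score in most_likely_origin(new_e, old_version):
        print(f"{name}: {score:.2f}")
```

The design choice here is to let the rest of the file break the tie: when a block appears in several old files, the file that shares the most surrounding content with E is the most plausible ancestor. It will still be ambiguous when the candidates are near-identical copies of each other.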

Related

Ada `Gprbuild` Shorter File Names, Organized into Directories

Over the past few weeks I have been getting into Ada, for various different reasons. But there is no doubt that information regarding my personal reasons as to why I'm using Ada is out of scope for this question.
As of the other day I started using the gprbuild command that comes with the Windows version of GNAT, in order to get the benefits of a system for managing my applications in a project-related manner. That is, being able to define certain attributes on a per-project basis, rather than manually setting up the compile-phase myself.
Currently when naming my files, their names are based on what seems to be a standard for gprbuild, although I could very much be wrong. For periods (in the package structure), a - is put in the name of the file; for underscores, an _ is put accordingly. As such, a package by the name App.Test.File_Utils would have a file name of app-test-file_utils, with the extensions .ads and .adb accordingly.
In the .gpr project file I have specified:
for Source_Dirs use ("app/src/**");
so that I am allowed to use multiple directories for storing my files, rather than needing to have them all in the same directory.
The Problem
The problem that arises, however, is that file names tend to get very long. As I am already putting the files in a directory based on the package name contained by the file, I was wondering if there is a way to somehow make the compiler understand that the package name can be retrieved from the file's directory name.
That is, rather than having to name the App.Test.File_Utils' file name app-test-file_utils, I would like it to reside under the app/test directory by the name file_utils.
Is this doable, or will I be stuck with the horrors of eventually having to name my files along the lines of: app-test-some-then-one-has-more_files-another_package-knew-test-more-important_package.ads? That is, assuming I have not missed something about how an Ada application should actually be structured.
What I have tried
I tried looking for answers in the documentation for the Naming package of .gpr files, but to no avail. I have also been browsing the web for information, but decided it might be better to get help through Stack Overflow, so that other people who struggle with this problem in the future (assuming it is a problem in the first place) might also get help.
Any pointers in the right direction would be very helpful!
In the top-secret GNAT documentation there is a description of how to use non-default file names. It's a great deal of effort. You will probably give up, use the default names, and put them all in a single directory.
You can also simplify much of the effort by using GPS and letting it build your project file as you add files to your source directories.
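To give an idea of what the "great deal of effort" looks like, here is a rough Python sketch that generates the entries for a Naming package from a directory tree. It assumes a hypothetical layout in which the directories mirror the package hierarchy (app/test/file_utils.ads holds App.Test.File_Utils), and it relies on the Spec and Body attributes of the project file's Naming package. The helper names naming_entries and naming_package are invented for this example, and whether gprbuild accepts identical short file names in different source directories is something to verify before relying on it.

```python
from __future__ import annotations

from pathlib import Path


def naming_entries(source_root: str) -> list[str]:
    """Build Naming lines that map directory-derived unit names to short file names."""
    root = Path(source_root)
    lines = []
    for path in sorted(root.rglob("*.ad[sb]")):
        # Unit name = directory components plus file stem, joined with dots,
        # e.g. app/test/file_utils.ads -> app.test.file_utils
        parts = path.relative_to(root).with_suffix("").parts
        unit = ".".join(parts)
        attribute = "Spec" if path.suffix == ".ads" else "Body"
        lines.append(f'for {attribute} ("{unit}") use "{path.name}";')
    return lines


def naming_package(source_root: str) -> str:
    """Wrap the generated lines in a Naming package, ready to paste into a .gpr file."""
    body = "\n".join("   " + line for line in naming_entries(source_root))
    return f"package Naming is\n{body}\nend Naming;"


if __name__ == "__main__":
    # Assumed to be run from the directory that contains the top-level "app" folder.
    print(naming_package("."))
```

Every unit needs its own entry, so the generated block has to be refreshed whenever a file is added or moved, which is the maintenance cost the answer warns about.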

How to find and replace across multiple documents using Adobe Brackets

I've decided to give the text editor [brackets] a trial. I'm just wondering: is there a way to find and replace in all open documents using Brackets?
I can see a find and a replace option, but I've got to hunt down a lot of content across multiple pages.
I can't seem to find a shortcut or an option in any of the drop-down menus to replace all matches across multiple files.
Any help greatly appreciated.
Update: To use this feature now that it's available, just choose Find > Replace in Files. You can also right-click any folder in the left-hand pane and choose Replace In... to only replace within that subtree of your project.
Original answer:
This feature (replacing across multiple files at once) is not available yet, but it's currently under development and will probably ship in the next release of Brackets, Sprint 40. You can track its progress by following the Replace Across Multiple Files card on Trello. (Note: once it ships that link will break, because the card will be moved onto the Brackets History board).

Change Word 2013 autocorrect behaviour

This question involves bending Microsoft Word 2013 to one's will.
I have been asked to help fix a problem with Word 2013's autocorrect.
We are working on a spell checker for my native language (Afrikaans), and many Afrikaans words contain a diacritic such as an umlaut (ë, ö, Ü, etc.).
The spell checker consists of a .dic file which is basically just a text file that contains about 508 000 words, and an autocorrect list (.acl) file that is used to automatically replace text as you type.
The spell checker works very well for the most part. It replaces the text as you type, which is the desired effect. The problem is that autocorrect doesn't work with all words.
For example, if I want to type the Afrikaans word 'pêrels' (which means 'pearls'), I should only have to type 'perels' (without the ^ character on the 'e'), and autocorrect should automatically change it to the correct form.
Same with 'reën' (rain). If I type 'reen' (without the umlaut), it is supposed to automatically correct it.
However, in both of the above cases, the words remain unchanged. A red line appears under the words, and when you right-click, you can select the correct word from the pop-up autocorrect menu as shown in the image below.
As you can see, the correct form of the word is the first one in the context menu. I need autocorrect to automatically change the wrong word into the first word that appears in said menu. It should completely ignore the other menu items, and just go with the first word.
My initial instinct was to manually add the words to the *.acl file using a text editor, but the file is encrypted and not readable (I used Notepad++).
I then tried adding them inside Word's autocorrect options menu. However, Word 2013 has a maximum autocorrect memory of 64 KB, and the size of the file is already at that maximum. Whenever I add more words, it bombs out and basically wipes the file contents. This doesn't seem like the most efficient strategy anyway, since I would need to manually enter hundreds, if not thousands, of autocorrect cases. Ain't nobody got time for that!
What makes this even more complicated (ironically), is that there is no real "program". In other words, this isn't a C# program with source code that I can manipulate. I have the two files mentioned above, and Word's built-in options (which I have already explored). That's it. Nothing else.
I'm stuck. Does anyone have any ideas?
Is it perhaps possible for me to hack Word to increase the autocorrect memory to, let's say, 128 KB? Google hasn't turned up anything of use.
Or, is there a way to set Word to not give the autocorrect context menu, and instead default to the first matching word in the dictionary, as mentioned above?
I can probably write a batch script, C# program, or edit the registry if need be. I just need to know where to start.
Thanks for any help!
In case you are still looking for a solution, you might consider using AutoHotkey (http://www.autohotkey.com). It is a very powerful free open-source utility, and can handle substitutions similar to AutoCorrect. Whenever the built-in features of Word and other programs fail to handle my needs, I use AutoHotkey. It has the added benefit of not being tied to any specific program (e.g., Word), so the substitutions can occur anywhere needed. I hope it helps you. I have used and depended on AutoHotkey for years, across new Windows versions and new Office versions, and highly recommend having a look. You might even get new ideas about time-saving automation with AutoHotkey. Good luck!

How to query the containing partition of a file with KDE/Qt4?

I'm using KDE, and I'm toying with the idea of hacking the code for Dolphin File Manager (and potentially Konqueror if necessary) to get context-sensitive drag and drop behaviour (i.e. files are moved within the same partition, or copied if they're moved across partitions or the source is read only).
To do this, I think I'd need to find out the containing partition of the source and destination (easy enough on Windows using the drive letter, but on Linux, as mount points can be almost anywhere, it can't be reliably derived from the file path), and compare them.
Does anyone know how I can find out the partition that contains a given file?
It must be possible - I know Nautilus provides this sort of behaviour, but I'm not familiar enough with GTK to track down the appropriate section in the source code to see how it's done...
Qt doesn't provide an API for this. For POSIX, have a look at stat.
For KDE, you can use KIO::stat() to get mostly the same info as the POSIX stat function, but asynchronously.
The device id should be in the field UDS_DEVICE_ID of the result.
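The POSIX side of this can be illustrated with a short Python sketch (not Qt or KIO code). It assumes that comparing the st_dev field reported by stat is enough to tell whether two paths sit on the same mounted filesystem; the function names same_device and default_drop_action are made up for the example, and the read-only check is a simplification based on write permission on the source's parent directory.

```python
import os


def same_device(source: str, destination_dir: str) -> bool:
    """Return True when both paths report the same device id (st_dev)."""
    return os.stat(source).st_dev == os.stat(destination_dir).st_dev


def default_drop_action(source: str, destination_dir: str) -> str:
    """Pick 'move' within one filesystem, 'copy' across filesystems or from unwritable sources."""
    source_parent = os.path.dirname(os.path.abspath(source))
    # Moving a file away requires write permission on its parent directory.
    if not os.access(source_parent, os.W_OK):
        return "copy"
    return "move" if same_device(source, destination_dir) else "copy"


if __name__ == "__main__":
    # Compare the current directory with the filesystem root as a quick demonstration.
    print(default_drop_action(os.getcwd(), "/"))
```

In the file manager itself the same comparison would be made on the device id from the stat result, as described above.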

Mass Thunderbird folder to Gnus nnfolder conversions

I'm pondering the idea of importing a few thousand Thunderbird folders, each folder containing many emails of course, as a set of Emacs' Gnus mailgroups. Each mailgroup name would be derived from the folder hierarchy. Because of the quantity, the work is going to be fairly tedious, so I would automate this massive import if possible.
Among the available backends, nnfolder seems the most promising in this case. I presume it would be better to populate the mailgroups from within Gnus. Otherwise, I would have to thoroughly understand the nnfolder format, and this might require many iterations before I really get it right. Moreover, as email continues to flow in, iterations may become difficult to organize properly without losing anything.
I guess I have to respool everything, under the constraint that the selected mailgroup is a function of the Thunderbird origin, overriding the standard Gnus selection mechanism. I did some Gnus coding in the past, but since I have not touched Emacs for a dozen years, it is all very rusty. I'm a bit lost about how to approach this task as efficiently and quickly as possible. So my question: how would you handle it? Or is there some clever hidden corner of Gnus that I should explore more deeply? :-)
François
P.S. After I wrote this question, I found out that Gnus has a nice, helpful function for this purpose. The idea is to first copy all Thunderbird folder files into the ~/Mail directory, with their contents untouched but properly renamed. Once this is done, M-x nnfolder-generate-active-file does the following at once for each copied folder: edits the contents, leaves a ~ backup, generates NOV data, creates one mailgroup and, of course, adjusts the ~/Mail/active file.
To copy the folders underneath the ~/.thunderbird/LOGIN/Mail/Local Folders/ directory, I wrote a small Python script. It ignores all .msf files and recurses into .sbd directories. The folder path name, relative to Local Folders/, has all its .sbd/ strings turned into periods to produce the mailgroup name, also lowering case, turning spaces and underscores into dashes, and handling other special characters appropriately. Non-ASCII characters, however, are not handled properly: nnfolder confuses UTF-8 and ISO-8859-1 here and there. The script also has to skip msgfilterrules.dat and likely drafts, junk and such things.
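The original script is not shown, so the following is only a guess at its core, written as a minimal sketch. It covers the rules spelled out above (skip .msf files, descend only into .sbd directories, turn .sbd/ into a period, lower the case, turn spaces and underscores into dashes, skip msgfilterrules.dat). The names mailgroup_name, iter_folders and SKIPPED_FILES are hypothetical, and the handling of other special characters, drafts and junk folders is deliberately left out because the text does not describe it.

```python
import os
import re

# Files that are not mail folders; the list is an assumption based on the text above.
SKIPPED_FILES = {"msgfilterrules.dat"}


def mailgroup_name(relative_path: str) -> str:
    """Turn a path relative to 'Local Folders/' into a mailgroup name."""
    name = relative_path.replace(".sbd/", ".")  # sub-folder markers become periods
    name = name.lower()
    return re.sub(r"[ _]", "-", name)  # spaces and underscores become dashes


def iter_folders(local_folders: str):
    """Yield (folder_file_path, mailgroup_name) pairs found under local_folders."""
    for dirpath, dirnames, filenames in os.walk(local_folders):
        # Only descend into Thunderbird's .sbd sub-folder directories.
        dirnames[:] = [d for d in dirnames if d.endswith(".sbd")]
        for filename in filenames:
            if filename.endswith(".msf") or filename.lower() in SKIPPED_FILES:
                continue
            full = os.path.join(dirpath, filename)
            relative = os.path.relpath(full, local_folders).replace(os.sep, "/")
            yield full, mailgroup_name(relative)


if __name__ == "__main__":
    print(mailgroup_name("Work.sbd/Project X.sbd/To_Do"))
```

Each yielded pair gives the file to copy and the name it should get under ~/Mail before running M-x nnfolder-generate-active-file.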
I notice two details requiring attention :
Thunderbird itself can be used to compact folders before copying them, otherwise one might unwillingly recover messages which were already deleted.
(setq nnmail-use-long-file-names t) is needed in ~/.emacs prior to the whole operation.
The batch transformation aborted, saying it was not able to decrypt one of the messages. I moved the offending folder out of the way, and then the lengthy operation succeeded.

Resources