Automatically log changes to system files and allow revert - unix

I'm trying to learn about the guts of Unix right now, mostly through experimentation. When I was first starting, I found myself looking through forum posts, copying and pasting bash code. When I broke something, I often had to do a fresh install because I couldn't remember what exactly I had changed where. Now, the simple solution is to record a log of all the system files I've changed and keep original copies of all the default files so I can revert if necessary. It would be great if there was a cl tool which did this for me automatically. It would be even greater if I could step back through changes. Basically, I'm looking to version control my entire OS.
Does anything like this exist? I would also accept alternative strategies for spelunking through Unix without causing permanent damage if you think I'm going about this wrong.
Using debian if it matters.

Related

R and Rstudio Docker vs Binder

My problem is that I can't use R-studio at my work place as the IT does not support it . I want to use R and R-studio that installed on my personnel laptop on my company laptop ( using a modern browser which is behind firewall ) . Some of the options I am thinking of two two things
should I need to build a docker for R and R-studio (I see base images are already available) , I am mostly interested in basic R , Dplyr (haven ,xporter, and Reticulate ) packages .
Should I have to use a binder . I am not technical person and my programming skills are very limited can any one suggest me way .
What exactly are the difference between using Docker option vs Binder ?
I know I can use R-Studio online and get my work done but with the new paid account I am running out of project hours and very slow sometimes . Thanks in advance
Here are some examples beyond the modern RStudio MyBinder example:
https://github.com/fomightez/pythonista_skewedf
https://github.com/fomightez/r_phylogenetics_worshop
https://github.com/fomightez/chapter7/tree/master/binder
The modern RStudio MyBinder example has been set as a template on GitHub so you can use
The first one is for a special use of a package not on conda. And I started that one from square one.
The other two were converted from content by others to aid in making them Binder-ready.
You essentially list everything you need from conda in the environment.yml along with the appropriate channels. If you need special stuff not on conda, you need the other configuration files included there.
Getting everything working can take some iterations on adding things, letting the image get built, and testing your libraries are available. Although you seem to think your situation is not overly complex.
The binder launch badges you see are just images where you modify the URL to point the MyBinder federation site at your repository. Look at the URL and you should see the pattern where you put studio at the end of the URL pointing at your repo. The form at MyBinder.org site can help with this; however, most often it is easier to just adapt a working launch badge's code copied from elsewhere. The form isn't set up at this time for making the URLs for launching to RStudio.
Download anything useful your create in a running session. The sessions timeout after 10 minutes, although RStudio usually keeps them active.
Lack of Persistence and limited memory, storage, & power can be drawbacks. The inherent reproducibility and portability are advantages.
MyBinder.org doesn't work with private repos. If you have code you don't want to share, you can upload it to the temporary session, using the repo for specifying the environment. You could host a private binderhub that does allow the use of private git repositories; however, that is probably overkill for your use case and exceed your ability level at this time.
GitHub isn't the only place to host repositories that can be pointed at the MyBinder system. If you go to the MyBinder.org page and click where it says 'GitHub' on the left side of the top line of the form, you can see a list of the sources at which you can host a repository and point the system to build an image and launch a container with that specified image.
Building the image from a source repository takes some minutes the first time. Once the image is built though on the service, launch is typically less than 30 seconds. Each time you make a change on the source repo, a build is necessary. Some changes don't cause the new build to be as long as the initial one as some optimizing is done to only build what is necessary after a change. Keep in mind there are several members of the federation around the workd and if traffic on the internet gets sent to where the built image isn't yet available, it will be built from scratch again first.
The Holepunch project is out there to offer some help for users working in the R ecosystem; however, with the R-Conda system that is now integrated into MyBinder it is pretty much as easy to do it the way I described. Last I knew, the Holepunch route makes a Dockerfile that isn't as easy to troubleshoot as using the current the R-Conda system route. Dockerfiles are essentially a last ditch configuration file that MyBinder can handle. The reason being the other configuration files are much easier and don't require knowing Dockerfile syntax. MyBinder aims to offer the ability to take advantage of Docker offering containers with a specified environment without users needing to know anything about Docker.
There is a Binder Help category for posting to get help at the Jupyter Discourse Forum. Some other examples of posts already there may help you troubleshoot.
Notice of a common pitfall
Most of the the configuration files for making a repository Binder-ready are simply text and can be edited right in the GitHub browser interface, without need to git or even cloning the repo locally.
Last I knew, there are two exceptions to this. The postBuild and start configuration files have settings that allow them to be run as scripts and these get altered in a way they no longer work if you edit them via the GitHub browser interface. (This was my experience when last I tried. Your mileage may vary or things may have changed now.) To edit those, you have to have git available on a system you have and pull one from some other source. Then edit that on your machine that has git working & add it your repo and push it back up from your local computer.
(If this is a problem, you can post in the Jupyter Discourse Forum Binder help category and you and I could coordinate where I fork and edit those files in your repo to your specifications and then make a pull request to update your source of the fork with those changes.)
If you are using Jupyter notebooks extensively then it may make sense to use Binder
But if you simply want to use R and Rstudio, then all you need is docker. A good resource is
https://github.com/rocker-org/rocker

RStudio R File Corruption

I had a R script open in RStudio. The file was saved many times over the course of several weeks and worked perfectly fine when RStudio was opened and closed. However, today, I restarted my computer and when I opened RStudio and more specifically the script that I mentioned, all of the R code vanished, leaving a single long row of "....." with red highlighting.
When I tried to open the R file in other text processors such as Sublime Text and Notepad++, only a line of zeroes was visible. None of my other R files were affected. I'm currently running Windows 8.1 and have the latest version of R and RStudio. What can I do to recover the code in the file and prevent something like this from happening again?
It might be an old thread and it might have been covered in 'user4458796' answer in suggestion #1 ("Use the history..."), but:
My friend had the same problem and we managed to recover the code from a 'history_database' files located on Windows at:
'C:\Users\%user%\AppData\Local\RStudio-Desktop\'
I assume there is an equivalent location in Linux in general.
Hopefully I won't get downvoted, just sharing my 2cents.
Ben.
It's not clear what happened to corrupt your file (and thus how to fix it if possible) and it is kind of ominous that you're just seeing 0's in other text editors, but I'll give you my best suggestion and some tips.
Suggestions for Attempting Recovery
Since your other R files were unaffected, you should have a messy record of your code in the history. Use the history to reconstruct your code.
Access a copy of your file from any version control, cloud, or offline backup you may have used -- git, SVN, iCloud, SugarSync, Dropbox, etc (I realize you probably wouldn't have posted this question if that were an option, but I had to throw it out there).
Use a Hex or sector editor to try to recover the data.
Use a data recover program to find an old version of your file.
Inspect your trash or recycling bin to see if it has an old version. Depending on your OS and the settings of how you (insecurely or securely) delete files, then you may be able to undelete a deleted version, even if it's not immediately available.
Try different methods of recovering text data from corrupted text files like OpenOffice's and Microsoft's suggestions.
Tips for the Future
I know that hindsight is 20/20, but a few quick tips for good measure:
Use version control. Git is supported in RStudio's GUI interface.
Have more than one version of your file. Many professors and professionals recommend writing/storing code in a text editor and using your IDE only for the working copy.
Make backups. Distinct from #2, you should backup your files to a hard drive, flash drive, or cloud service like Dropbox or Spideroak.

Is there a tool to know which lines from a CSS file has been changed?

For example, I'm working on a .CSS file and I handed it over to another person and that person made some changes.
When the file is handed back to me, is there a tool that I can run to help me know which lines in the CSS has been changed?
What you need is a diff tool. For Windows:
WinMerge
Linux and Mac OS X have the tool diff, which can be used via command line.
You could do a command line diff most platforms have this.
There are lots of free, and paid for, comparison tools available, personally I like kdiff3, but the better option would be to use a version control tool such as Mercurial, (hg), as you work and before handing anything over to anybody - then you can carry on working and still merge their changes back in even if you have also made changes.

How can I uninstall Win32 assemblies and cleanup WinSxS?

After a lot of trial and error (mostly due to lack of documentation and examples) I have managed to create MSI installers that install custom DLLs to WinSxS as side-by-side assembly. There is only one problem: Uninstalling leaves all files (DLLs, manifests and catalogs) in the WinSxS directory. How can or should I best clean that up? I know for sure that nothing else references it.
I have read somewhere that WinSxS has a self-scavenging process that cleans up over time but I could not find more information about that. Can you manually invoke this to clean up stuff?
The only other way I see is manually deleting those bits. First you have to change the owner of all files (assembly, catalog, manifest and their respective directory) from SYSTEM to an administrator account, adjust the permissions and delete them. There are also pieces left in the registry (I think HKLM\COMPONENTS\DerivedData\Components may be one place), but since WinSxS should be treated as opaque it is hard to find any information.
Scavenging isn't exposed anywhere that I know of. I'm not even sure when it is kicked off automatically. Maybe on uninstall of a service pack? Maybe some tool admins can run? I really forget.
Anyway, my suggestion is don't fight it. There are so many twisty turns down there that it just isn't worth trying to get the disk space back. Once uninstalled the bits still in the SxS cache will not be activated so they are just wasting space.
It's a dumb design but blame Microsoft and don't try to overcompensate.
Here is an article, it's kinda complete guide to WinSxS.
So, shortly, you can only uninstall some components (all their versions are in this folder), and you can run Service Pack bridge burning utility (in Vista it is named VSP1CLN.EXE and shipped with SP1). Note, that after execution, you shouldn't be able to uninstall SP or any components to state, prior to SP release date.
No-one is convinced you can - short of a complete reinstall, your bloaty WinSxS directory is there to stay.
There's been a long "discussion" of the problem on technet.
There is no documentation of the format, or any instructions how to remove files that are no longer needed - MS seems to think that disc space is cheap. There is a self-scavenging feature, but no-one's convinced it works, or if it does, it is very conservative (as you'd hope as you don't want it to break your OS)
You can tell is the scavenger is working by checking the "C:\Windows\winsxs\Temp\PendingDeletes." folder, as this is where files are moved by windows update or an installer moves them to - the scavenger just deletes the files in here.
You'll notice that after you uninstall your assembly, while the files are still there, they can no longer be bound to - so they are just "staged", or cached, but not really installed.
Rob & gbjbaanb are correct - you cannot manually invoke a scavenge yourself. Don't try to delete the files yourself - there are multiple places in the registry where they are registered, DerivedData\Components being only one of the many references.
I think the rule for Vista is scavenging is kicked off by the TrustedInstaller service after 10 minutes of machine inactivity, after the last servicing operation (service pack, hotfix, etc). But it's very fickle, so it doesn't run as often as it should. So just be patient, and the files will disappear on their own.
Well i was having some issues as i have an 80GB SSD for my windows and the WinSxs folder was about 12gb's
I was searching the net and i found this command:
DISM.exe /online /Cleanup-Image /spsuperseded
And now my WinSxs is 7gb which was wonderful news.
There are a few updates regarding the cleanup method that apply to newer OS. Check http://www.karafilis.net/winsxs-cleanup

What artifacts to save for a nightly build?

Assume that I set up an automatic nightly build. What artifacts of the build should I save?
For example:
Input source code
output binaries
Also, how long should I save them, and where?
Do your answers change if I do Continuous Integration?
You shouldn't save anything for the sake of saving it. you should save it because you need it (i.e., QA uses nightly builds to test). At which point, "how long to save it" becomes however long QA wants them.
i wouldn't "save" source code so much as tag/label it. I don't know what source control you're using, but tagging is trivial (performance & disk space) for any quality source control system. Once your build is tagged, unless you need binaries, there really isn't any benefit to just having them around because you can simply re-compile when necessary from source.
Most CI tools let you tag on each successful build. This can become problematic for some systems as you can easily have 100+ tags a day. For such cases I recommend still running a nightly build and only tagging that.
Here are some artifacts/information that I'm used to keep at each build:
The tag name of the snapshot you are building (tag and do a clean checkout before you build)
The build scripts themselfs or their version number (if you treat them as a separate project with its own version control)
The output of the build script: logs and final product
A snapshot of your environment:
compiler version
build tool version
libraries and dll/libs versions
database version (client & server)
ide version
script interpreter version
OS version
source control version (client and server)
versions of other tools used in the process and everything else that might influence the content of your build products. I usually do this with a script that queries all this information and logs it to a text file that should be stored with the other build artifacts.
Ask yourself this question: "if something destroys entirely my build/development environment what information would I need to create a new one so I can redo my build #6547 and end up with the exact same result I got the first time?"
Your answer is what you should keep at each build and it will be a subset or superset of the things I already mentioned.
You can store everything in your SCM (I'd recommend a separate repository), but in this case your question on how long you should keep the items looses sense. Or you should store it to zipped folders or burn a cd/dvd with the build result and artifacts. Whatever you choose, have a backup copy.
You should store them as long as you might need them. How long, will depend on your development team pace and your release cycle.
And no, I don't think it changes if you do continous integration.
This isn't a direct answer to your question, but don't forget to version control the nightly build setup itself. When the project structure changes, you may have to change the build process, which will break older builds from that point on.
In addition to the binaries as everyone else has mentioned I would recomend setting up a symbol server and a source server and making sure you get the correct information out and into those. It will aid in debugging tremendously.
We save the binaries, stripped and unstripped (so we have the exactly same binary, once with and once without debug symbols). Further we build everything twice, once with debug output enabled and once without (again, stripped and unstripped, so every build result in 4 binaries). The build is stored to a directory according to SVN revision number. That way we can always retain the source from the SVN repository by simply checking out this very revision (that way the source is archived as well).
A surprising one I learned about recently: If you're in an environment that might be audited you'll want to save all the output of your build, the script output, the compiler output, etc.
That's the only way you can verify your compiler settings, build steps, etc.
Also, how long to save them for, and where to save them?
Save them until you know that build won't be going to production, iow as long as you have the compiled bits around.
One logical place to save them is your SCM system. Another option is to use a tool that will automatically save them for you, like AnthillPro and its ilk.
We're doing something close to "embedded" development here, and I can tell you what we save:
the SVN revision number and timestamp, as well as the machine it was built on and by whom (also burned into the build binaries)
a full build log, showing whether it was a full/incremental build, any interesting (STDERR) output the data baking tools produced, a list of files compiled and any compiler warnings (this compresses very well, being text)
the actual binaries (for anywhere from 1-8 build configurations)
files produced as a side effect of linking: a linker command file, address map, and a sort of "manifest" file indicating what was burned into the final binaries (CRC and size for each), as well as the debugging database (.pdb equivalent)
We also mail out the result of running some tools over the "side-effect" files to interested users. We don't actually archive these since we can reproduce them later, but these reports include:
total and delta of filesystem size, broken down by file type and/or directory
total and delta of code section sizes (.text, .data, .rodata, .bss, .sinit, etc)
When we have unit tests or functional tests (e.g. smoke tests) running, those results show up in the build log.
We've not thrown out anything yet -- given, our target builds usually end up at ~16 or 32 MiB per configuration, and they're fairly compressible.
We do keep uncompressed copies of the binaries around for 1 week for ease of access; after that we keep only the lightly compressed version. About once a month we have a script that extracts each .zip that the build process produces and 7-zips a whole month of build outputs together (which takes advantage of only having small differences per build).
An average day might have a dozen or two builds per project... The buildserver wakes up about every 5 minutes to check for relevant differences and builds. A full .7z on a large very active project for one month might be 7-10GiB, but it's certainly affordable.
For the most part, we've been able to diagnose everything this way. Occasionally there's a hiccup on the buildsystem and a file isn't actually a the revision it's supposed to be when a build happens, but there's usually enough evidence of this in the logs. Sometimes we have to dig out a tool that understands the debugging database format and feed it a few addresses to diagnose a crash (we have automatic stackdumps built into the product). But usually all the information needed is there.
We haven't had to crack the .7z archives yet, to mention. But we have the info there, and I have some interesting ideas on how to mine bits of useful data from it.
Save what can't be reproduced easily. I work on FPGAs where only the FPGA team have the tools and some cores (libraries) of the design are licensed to compile on only one machine. So we save the output bitstreams. But try to check them over one another rather than with a date/time/version stamp.
Save as in check in to source code control or just on disk? Save nothing to source code control. All derived files should be visible in the file system and available to developers. Don't checkin binaries, code generated from XML files, message digests etc. A separate packaging step will make these end products available. As you have the change number you can always reproduce the build if necessary assuming of course everything you need to do a build is completely in the tree and is available to all builds by syncing.
I would save your built binaries for exactly as long as they have a chance to go into production or be used by some other team (like a QA group). Once something has left production, what you do with it can vary a lot. For a lot of teams, they'll keep just their most recent prior build around (for rollback) and otherwise discard their builds.
Others have regulatory requirements to keep anything that went into production around for as long as seven years (banks). If you are a product company, I'd keep around any binary a customer might have installed in case a tech support guy wants to install the same version.

Resources