Unzip in Google Colab corrupts after first read

I have uploaded an 11 GB images file to my Google Drive and am trying to unzip it in Colab for processing. The first time, it processes properly. After the Colab session is closed and restarted the next day, the same unzip command fails, saying the zip file is corrupted. So I had to remove the corrupted zip file, upload the original zip file again, and use it in Colab. Again, the first time works perfectly fine, but the second time it fails again. Every time, uploading the 11 GB file to Google Drive takes a lot of time and uses a lot of bandwidth.
I am using !unzip '/content/drive/My Drive/CheXpert-v1.0-small.zip' to unzip.
The second time it also starts unzipping, but after a few records it throws a read error on a specific image, which is different every time. If I restart the unzip, it gives offset errors without unzipping any images.
Is there any way to fix this problem, so that I can unzip successfully any number of times?
Thanks in advance for any quick help.
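For reference, a minimal sketch of what I plan to try next (untested; the paths assume the same Drive mount as above, and the /content/CheXpert output folder is just a placeholder): copy the archive onto the Colab local disk first, test its integrity, and only then extract, so the read does not stream repeatedly from the Drive mount.

# Copy the zip off the Drive mount onto Colab's local disk (needs ~11 GB free there)
!cp '/content/drive/My Drive/CheXpert-v1.0-small.zip' /content/
# Test the archive without extracting; a clean report here rules out a corrupted copy
!unzip -t /content/CheXpert-v1.0-small.zip
# Extract from the local copy
!unzip /content/CheXpert-v1.0-small.zip -d /content/CheXpert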

Related

Resume download of _.gstmp files after downloading Sentinel-2 SAFE products using the sen2r R package

I have downloaded a large number of Sentinel-2 SAFE files using the R package 'sen2r', which has implemented a Google Cloud download method to retrieve products stored in the Long Term Archive. This has worked for me, but after checking the files I have found a decent number of empty files appended with _.gstmp, which according to this represent partially downloaded temporary files that are supposed to be resumed by gsutil. I have re-run the sen2r() command (with the server = "gcloud" setting) but it does not resume and correct the downloads, as the folders are already there.

I would like to resume downloading just the _.gstmp files, as it took over a week to download all of the SAFE products and I don't want to start all over again. I'm guessing I can fix this by using 'gsutil' directly, but I'm a bit out of my element, as this is my first experience using Google Cloud, and I can't ask the sen2r author, as they no longer have time to respond to issues on GitHub. If you have any tips for resuming these downloads manually using the gsutil command line, it would be much appreciated.
I have searched Stack Exchange and also the sen2r manual and GitHub issues and have not found any other reports of this problem.
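In case a manual route is acceptable, here is a rough, untested sketch of what I have in mind: walk the local archive, find the .SAFE products that still contain _.gstmp temp files, and re-run gsutil against the public Sentinel-2 bucket for just those products, on the assumption that gsutil's resumable downloads will pick up the existing temp files instead of starting over. The local root path, the L1C bucket prefix, and the tile parsing are all assumptions about my layout.

# Hypothetical local archive root written by sen2r
cd /path/to/SAFE_archive
# Re-sync only the products that still contain partial *_.gstmp files.
# Assumes L1C products mirrored at gs://gcp-public-data-sentinel-2/tiles/<zone>/<band>/<square>/
# (L2A products live under an L2/ prefix instead).
for safe in $(find . -type d -name "*.SAFE"); do
    find "$safe" -name "*_.gstmp" | grep -q . || continue   # skip products that finished cleanly
    product=$(basename "$safe")
    tile=$(echo "$product" | sed -n 's/.*_T\([0-9][0-9][A-Z][A-Z][A-Z]\)_.*/\1/p')   # e.g. 33TWM
    gsutil -m rsync -r \
        "gs://gcp-public-data-sentinel-2/tiles/${tile:0:2}/${tile:2:1}/${tile:3:2}/${product}" \
        "$safe"
done

If that looks roughly right, any corrections to the bucket layout or the tile parsing would be very welcome.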

Jupyter notebook: uploading a 10 MB source data file

I have a question about uploading a 10 MB source data file.
I tried multiple ways to upload it: the original version, a zipped version, and a txt version.
However, every time I click the uploaded data source file, I see the following error message:
out of memory.
I need your advice on how to resolve this.

How can you get a second copy of a running log file without deleting it?

I am trying to troubleshoot an issue in an application running on a flavor of UNIX.
The default logging level puts out a reasonable amount of messages and does not affect performance.
But when there is an issue, I need to change the logging level to verbose, which writes thousands of lines a second and affects performance.
Deleting the trace log file would crash the application.
Being able to change the logging level back as quickly as possible helps avoid a production performance hit.
The code is running in production, so a performance hit is not good.
How can one create a second instance of the log for just the second or two during which the problem is reproduced?
This would save having to copy the whole large file and then edit out the log entries that are not relevant to the problem at hand.
I have answered my own question because I have found this tip to be very useful at times and hope it helps others.
The steps below show how to quickly get a small section of the log into a separate file; a consolidated sketch of the same commands follows the steps.
1) Navigate to the directory with the log file that you want to clear.
cd /logs
2) At the command prompt, enter the following line (include the ">"):
> trace.log
This will clear the file trace.log without the application's file pointer changing.
3) Now quickly reproduce the error.
4) Quickly go back to the command line and copy the trace.log file to a new file.
cp trace.log snippet_of_trace.log
5) Now you have a much smaller log to analyze.
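The same commands as a single copy-paste sequence (this relies, as the steps above do, on the application keeping its file handle open and appending to trace.log):

cd /logs                            # 1) directory holding the live log
> trace.log                         # 2) truncate in place; the application keeps writing to its open handle
#    3) reproduce the error in the application now
cp trace.log snippet_of_trace.log   # 4) capture only the entries written since the truncation
#    5) analyze the much smaller snippet_of_trace.log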

How to load the actual .RData file that is just called .RData (the compressed file that gets saved from a session)

Similar questions, but not the question I have, are about loading a file that someone saved as somefilename.RData. I am trying to do something different.
What I am trying to do is load the actual .RData file that gets saved from an R session. The context is that I am using 2 different computers and am trying to download the .RData file from one computer and then load this same .RData file on a different computer in RStudio.
When I download the .RData file, it shows up without the “.” (i.e., it shows up as RData). When I try to rename it to “.RData”, Windows will not allow me to do so.
Is there a way to do what I am trying to do?
Thanks!
After playing around with this, I was able to load the file (even though it was called “RData” and not “.RData”) in RStudio by going to Session > Load Workspace... and then navigating to that file. I had used File > Open File..., which did not work.
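For anyone who prefers the console route, the same thing can be done with load(), which does not care what the file is called; the path below is just a placeholder for wherever the browser saved the download.

load("C:/Users/me/Downloads/RData", envir = .GlobalEnv)   # restore the workspace objects from the downloaded file
ls()                                                      # confirm the objects are back in the global environment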

Dropbox permissions on ggplot2 saved charts

I have an R script that I run on a regular basis with launchd (OS X 10.8.3 Mountain Lion), calling it with Rscript myscript.R
The script generates some ggplot2 plots and saves them into my Dropbox folder with the ggsave() function.
The problem I am having is that the saved plots don't sync to Dropbox properly - they get the little blue "syncing" icon and never upload. I can fix it by going into the Dropbox preferences and using "fix permissions", but I'd like to have it so that when I output the files they will sync without any problems.
What could be the problem? If I run through the same script manually in RStudio, the plots save properly and synch to Dropbox without this happening.
It turns out that this was indeed a file ownership issue. I had launchd set up to run my script as root, and because the files had root as owner, the .png charts saved from ggplot2 would not sync to Dropbox, which runs under my user account.
The odd thing is that my script also outputs .html files, which do sync even with the root owner.
When I changed it to run under my user name, the output of the script synced to Dropbox as it should. Now, my only problem is that launchd will not run the script if I'm not logged in :/
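For anyone hitting the same symptom, a quick way to confirm and clear the ownership problem from a terminal (the chart folder path is a placeholder):

ls -l ~/Dropbox/charts/*.png                 # root-owned files here match the sync failure described above
sudo chown "$USER" ~/Dropbox/charts/*.png    # hand the charts back to the user account Dropbox runs under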
