I'm producing an .exe file via PyInstaller and I wonder whether there is a specific option (or something similar) to speed this up while iterating through changes. For example, if I add a simple print statement somewhere in the code, do I always have to go through the whole build stage of the .exe file?
I'm asking because it takes a while just to try something out. In my case it takes between 5 and 10 minutes to create the .exe. In general that's ok, but when you do it a dozen or more times a day it adds up.
When closing RStudio at the end of an R session, I am asked via a dialog box: "Save workspace image to [working directory]?"
What does that mean? If I choose to save the workspace image, where is it saved? I always choose not to save the workspace image; are there any disadvantages to saving it?
I looked on Stack Overflow but did not find posts explaining what the question means. I only found a question about how to disable the prompt (with no simple answers...): How to disable "Save workspace image?" prompt in R?
What does that mean?
It means that R saves the objects in your global environment (i.e. where your normal work happens) into a file. When R next starts, these objects are by default restored (at least partially; there are cases where it won't work).
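Concretely, what the prompt offers is roughly equivalent to the following base R calls (a minimal sketch; the objects here are just examples):

```r
# What answering "Save" effectively does on exit:
x   <- 1:10
fit <- lm(dist ~ speed, data = cars)
save.image(".RData")   # writes every object in the global environment to .RData

# What R/RStudio then does by default on the next startup:
load(".RData")         # x and fit reappear in the global environment
ls()                   # "fit" "x" ... plus whatever else was lying around
```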
A consequence is that restarting R does not give you a clean slate. Instead, your workspace is cluttered with existing stuff, which is generally not what you want. People then resort to all kinds of hacks to try to clean their workspace. But none of these hacks are reliable, and none are necessary if you simply don’t save/restore your workspace.
If I choose to save the workspace image, where is it saved?
R creates a (hidden) file called .RData in your current working directory.
I always choose not to save the workspace image, are there any disadvantages to saving it?
The advantage is that, under some circumstances, you avoid recomputing results when you continue your work later. However, there are other, better ways of achieving this. On the flip side, starting R without a clean slate has many disadvantages: any new analysis you start won't begin from a clean environment, and it won't be reproducible when executed again.
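For example, if only one step is genuinely expensive, you can cache that single result explicitly instead of the whole workspace; here is a minimal sketch using saveRDS()/readRDS() (the model and the file name are just illustrations):

```r
# Cache one expensive object explicitly rather than the entire workspace.
if (file.exists("fit.rds")) {
  fit <- readRDS("fit.rds")                # reuse the earlier result
} else {
  fit <- lm(mpg ~ wt + hp, data = mtcars)  # stand-in for a slow computation
  saveRDS(fit, "fit.rds")                  # save only this object
}
```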
So you are doing the right thing by not saving the workspace! It's one of the rules of creating reproducible R code. For more information, I recommend Jenny Bryan's article on using R with a Project-oriented workflow.
But having to manually reject saving the workspace every time is annoying and error-prone. You can disable the dialog box in the RStudio options (Tools > Global Options > General, then set "Save workspace to .RData on exit" to "Never").
The workspace will include any of your saved objects, e.g. data frames, matrices, functions, etc.
Saving it into your working directory will allow you to load it back in the next time you open RStudio, so you can continue exactly where you left off. There is no real disadvantage to not saving it if you can recreate everything from your script next time and your script doesn't take a long time to run.
The only thing I have to add here is that you should consider seriously that some people may be working on ongoing projects, i.e. things that aren't accomplished in one day, and thus must save their workspace image so as not to start from the beginning again.
I think best practice is: it's ok to save your workspace, but your code only really works if you can clear your entire workspace and then rerun it completely with no errors!
I have a list of ~90 zipped files. I have written a loop to unzip them (each becomes roughly 1 GB once unzipped), do some computations, save the output for each file, and delete the unzipped file. One iteration of this process takes 30-60 minutes per file (not all files are exactly the same size).
I am not too worried about the time, as I can leave it working over the weekend. However, R doesn't manage to get all the way through. I left it on Friday night, but it only ran for 12 hours, so it only processed 30 of the 90 files.
I don't deal with this type of heavy process often, but the same has happened in the past with analogous processes. Is there any command I need to insert in my loops to prevent the computer from freezing during these intensive processes? I tried gc() at the end of the loop, to no avail.
Is there a list of "good practice" recommendations for this type of procedure?
If your session is freezing, you are likely running into a problem you need to isolate: it may be caused by a single file, or you may be running short of memory and relying heavily on swap.
Regardless, here are some tips or ideas you could implement:
Write your code so that it processes one file as a single case, e.g. a function like process_gz_folder(). Then loop over the file paths and invoke that function for each one; this keeps the global environment clean (see the sketch after this list).
As you already tried, gc() can sometimes help, but it depends on the situation and on whether memory is actually being freed (after running rm(), for example). It could be called after invoking the function from the first point.
Are you keeping the results of each file in memory? Does this set of results grow with each iteration? If so, it may be taking up memory you need; storing the results to disk in a suitable format lets you accumulate them after each file has been processed.
Adding to the previous point: if the files produce outputs, make sure their names are informative, and consider adding a timestamp (e.g. inputfile_results_YYYYMMDD).
The code could check whether a file has already been processed and skip to the next one; this saves you from restarting from scratch, especially if your check is based on the existence of an output file (with timestamp!).
Use try() to make sure failures do not stop future iterations; however, it should produce warnings/output to notify you of a failure so that you can come back to it at a later point.
Another approach could be to create a single script that processes a single file; it could just contain the function from the first point, preceded by setTimeLimit() with a time after which the run stops if the file has not been processed. Iterate over this script with a bash script that invokes the R script via Rscript, which can be passed arguments (file paths, for example). This approach may help avoid freezes, but it depends on you knowing and setting an acceptable time limit.
Determine whether the files are too large for memory; if so, the code may need to be adjusted to be more memory efficient, or changed to process the data incrementally so that it does not run out of memory.
Reduce other tasks on the computer that consume resources and may contribute to a freeze.
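Putting several of these points together, here is a minimal sketch of what the loop could look like. It assumes each zip contains a single CSV; do_computations() is a hypothetical placeholder for your own analysis, and the paths are made up.

```r
library(tools)  # for file_path_sans_ext()

process_gz_folder <- function(zip_path, out_dir) {
  out_file <- file.path(out_dir,
                        paste0(file_path_sans_ext(basename(zip_path)),
                               "_results.csv"))

  # Skip files that already have an output, so the run can be restarted safely.
  if (file.exists(out_file)) return(invisible("skipped"))

  unzipped <- unzip(zip_path, exdir = tempdir())
  on.exit(unlink(unzipped), add = TRUE)   # always delete the unzipped copy

  dat     <- read.csv(unzipped[1])        # assumes one CSV per zip
  results <- do_computations(dat)         # hypothetical: your analysis here

  # Write results to disk instead of accumulating them in memory.
  write.csv(results, out_file, row.names = FALSE)
  invisible("done")
}

dir.create("results", showWarnings = FALSE)
zips <- list.files("zipped_data", pattern = "\\.zip$", full.names = TRUE)

for (z in zips) {
  status <- try(process_gz_folder(z, out_dir = "results"))
  if (inherits(status, "try-error")) message("FAILED: ", z)  # keep going
  gc()  # may help return memory to the OS between files
}
```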
These are just some ideas that spring to mind as things to consider in your case (given the info provided). It would help to see some code and to understand what kind of processing you are doing on each file.
Given how little information you have provided, it is hard to tell what the problem really is.
If possible, I would first unzip and concatenate the files. Then preprocess the data and strip off all fields that are not required for analysis. The resulting file would then be used as input for R.
Also be aware that parsing the input strings as e.g. timestamps may be quite time consuming.
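As a sketch of that idea, assuming the concatenated data ends up in a single CSV and only a few columns are actually needed (the file and column names below are made up), you can read just those columns and parse the timestamps once up front:

```r
library(data.table)

# Read only the columns required for the analysis; skipping the rest
# saves both time and memory.
dat <- fread("combined.csv", select = c("id", "timestamp", "value"))

# Parse the timestamp strings once, rather than repeatedly during the analysis.
dat[, timestamp := as.POSIXct(timestamp,
                              format = "%Y-%m-%d %H:%M:%S", tz = "UTC")]
```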
I'm trying to find a way to automate a series of processes that uses several different programmes (on either Ubuntu or Windows, indifferently).
For each programme, I've boiled the process down to either a macro or a script, so I feel confident that the entire process can be almost entirely automated. I just can't figure out what I can do to create a unifying tool.
The process is the following:
I have a simple text file with data. I use a jEdit macro to tidy the data. This then goes into an OpenCalc template to create a graph. That data is then imported into a programme called TXM which (after many clicks) generates a column of data; this is exported to a CSV file, and that CSV file is imported into an R session where a script is executed.
I have to repeat this process (and a few more similar ones) dozens of times a day, and it's driving me nuts.
My research into how to automate the import-treatment-export process has shown a few glimmers of hope, but I haven't been able to make any real progress.
I tried Autoexpect, but couldn't make it work on Ubuntu. Tcl, I think, only works for internet applications, and Fabric I also haven't been able to make work.
I'm willing to spend a lot of time learning and developing a tool to achieve this, but at the moment I'm not even sure what terms to search for.
I've figured it out for Windows: I created a .bat file in a text editor which, when clicked, prompts the user for names, etc., and rewrites another text file. It then executes that .txt file as an R script with the
command R.exe CMD BATCH c:\user\desktop\mymacro.txt
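For reference, a minimal sketch of what that called script (mymacro.txt, an ordinary R script despite the .txt extension) might contain; the file paths and column names here are hypothetical:

```r
# mymacro.txt -- run non-interactively via: R.exe CMD BATCH c:\user\desktop\mymacro.txt
dat <- read.csv("C:/user/desktop/txm_output.csv")           # output of the TXM step

result <- aggregate(value ~ group, data = dat, FUN = mean)  # stand-in analysis

write.csv(result, "C:/user/desktop/final_output.csv", row.names = FALSE)
```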
I am very new to R and I am using RStudio. I am building a new "user-defined" function, which entails a huge amount of trial and error. Every time I make the slightest change to the function I need to select the entire function and press Ctrl+Enter in order to "commit" the function to the workspace.
I am hoping there is a better way of doing it, perhaps in a separate window that automatically "commits" when I save.
I am coming from Matlab and am used to just saving the function after which it is already "committed".
Ctrl+Shift+P re-runs the previously executed region, so you won't have to highlight your function again. This will work unless you have executed something else in the interim.
If you want to run some part of your code in RStudio, you simply have to use Ctrl+Enter. If the code were run every time you saved it, it could have very bad effects: imagine a huge script that runs for a long time and uses a lot of computer resources; you would have to kill R to stop the script every time you saved it!
What you could do is save the function in an external file and then call it from your main script using source("some_directory/myscript.R").
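A minimal sketch of that workflow (the paths and names are just examples):

```r
# some_directory/myscript.R -- contains only the function definition
my_fun <- function(x) {
  x^2 + 1
}

# main script -- re-run this single line after each edit to the file
source("some_directory/myscript.R")
my_fun(3)   # the updated definition is now "committed" to the workspace
```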
I have a Qt project which takes around 15 minutes to build. Each time I make a small modification I have to wait a long time.
Is there a method to reduce this build time? Or is there a way to rebuild only the files I have modified and then run the program?
Well, check whether you are rebuilding on every run of your program, and also whether you are cleaning the project before running it. If you do, it has to regenerate all the moc files and object (.o) files needed for execution.
Normally a small change shouldn't require much build time. I have been developing a project with Qt Creator, and it doesn't take much time to build after a small modification. But if there is a clean/rebuild step before every execution, it will definitely take the whole build time. Even if you build your project manually from the terminal without the help of the IDE, it won't take that much time, provided you don't manually remove the moc files and .o files before building after a small modification.
If you are using a qrc resource file in your app and it contains many files, it takes a long time to generate the qrc.cpp and qrc.obj files. So try to minimize the number of files in the qrc system and also try to minimize the size of each file.
As far as I know, using precompiled headers may help, but the effect varies according to your program structure.
The general C++ ways to reduce compile time apply: don't #include what you don't need; use forward class declarations instead of #include'ing the full definitions where possible; etc.
It depends where you make that small modification: if you modify something in a header that is included in pretty much every file in your project...
In addition to the above responses, if you work with Linux, check out ccache (https://ccache.samba.org/).