How can I load a PDB file with a varying number of atoms? - vmd

I'm trying to visualize a droplet simulation.
I deleted water molecules that are far from the system of interest during the simulation.
Therefore, the number of atoms decreases as the simulation proceeds.
However, VMD cannot read such a PDB file because of the way it works: it expects every frame to contain the same number of atoms.
I found a way to use the MultiMolAnim plugin with separate files, but I have more than 1000 frames...
Is there another way to visualize such a PDB file?

Related

How to save my trained Random Forest model and apply it to test data files one by one?

This is a long shot and more of a code-design question from a rookie like me, but I think it has real value for real-world applications.
The core questions are:
Can I save a trained ML model, such as a Random Forest (RF), in R and call/use it later without needing to reload all the data used to train it?
When, in real life, I have a massive folder with hundreds of thousands of data files to be tested, can I load that saved model in R, have it read the unknown files one by one (so I am not limited by RAM), run the regression/classification analysis on each file as it is read in, and store ALL the output together in one file?
For example,
Suppose I have 100,000 CSV files of data in a folder, and I want to use 30% of them as the training set and the rest as the test set for a Random Forest (RF) classification.
I can select the files of interest and call them "control files", read them with fread(), randomly sample 50% of the data in those files, call the caret or randomForest library, and train my "model":
model <- train(x, y, method = "rf")
Now, can I save the model somewhere so I don't have to load all the control files each time I want to use it?
Then I want to apply this model to all the remaining CSV files in the folder, reading them one by one as the model is applied, instead of reading them all in at once, because of the RAM issue.
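To make the workflow concrete, here is a rough sketch of what I'm imagining; the folder names and the target column are placeholders I made up, not a working setup.

```r
library(caret)
library(data.table)

# --- Training, done once ---
train_files <- list.files("control_files", pattern = "\\.csv$", full.names = TRUE)
training    <- rbindlist(lapply(train_files, fread))
training    <- training[sample(.N, floor(.N * 0.5))]    # random 50% sample of the control data

model <- train(x = training[, !"target"], y = training$target, method = "rf")
saveRDS(model, "rf_model.rds")                           # persist the fitted model to disk

# --- Later, in a fresh session: score the remaining files one at a time ---
model      <- readRDS("rf_model.rds")                    # no training data needed
test_files <- list.files("test_files", pattern = "\\.csv$", full.names = TRUE)

results <- lapply(test_files, function(f) {
  dt <- fread(f)                                         # only one file in RAM at a time
  data.table(file = basename(f), prediction = predict(model, newdata = dt))
})
fwrite(rbindlist(results), "all_predictions.csv")        # all output stored together
```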

How to store where a passenger gets on and off a train whilst minimising the size of the file for plotting?

I have 500GB of .csv data which includes these three (and other) variables: 1. where a passenger gets on a train, 2. where they get off, and 3. the time the journey takes.
I need to make box plots of the journey time based on where they got on and where they got off in an interactive R Shiny app - this is straightforward. But first I need to minimise the size of the file, as reading 500GB into an R Shiny app is prohibitive. Is there a way to store these variables that makes this possible?
Even with vroom it takes too long, and I don't think {disk.frame} would work either. Any thoughts?
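To illustrate the kind of reduction I have in mind (the column names origin, destination and duration are just guesses at my own schema): since a box plot only needs a five-number summary per route, a one-off pass over the CSVs could shrink the 500GB to a tiny summary table that the Shiny app reads instead.

```r
library(data.table)

files <- list.files("csv_dir", pattern = "\\.csv$", full.names = TRUE)  # assumed layout

# Read only the three columns needed, file by file, then combine
trips <- rbindlist(lapply(files, fread,
                          select = c("origin", "destination", "duration")))

# Five-number summary per (origin, destination) pair - all a box plot needs
route_stats <- trips[, as.list(quantile(duration, probs = c(0, .25, .5, .75, 1))),
                     by = .(origin, destination)]
setnames(route_stats, c("origin", "destination",
                        "ymin", "lower", "middle", "upper", "ymax"))

saveRDS(route_stats, "route_stats.rds")   # the Shiny app only ever reads this small file
```

In the app, geom_boxplot(stat = "identity") could then draw the boxes directly from these precomputed statistics.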

Fastest way to read a data.frame into a shiny app on load?

For a Shiny app in a repository containing a single static data file, what is the optimal format for that flat file (and the corresponding function to read it) that minimises the time to read the file into a data.frame?
For example, suppose that when a Shiny app starts it reads an .RDS file, but that takes ~30 seconds and we wish to decrease it. Are there other ways of saving the file, and corresponding read functions, that would save time?
Here's what I know already:
I have been reading some speed-comparison articles, but none seem to comprehensively benchmark all the methods in the context of a Shiny app (and the possible cores/threading implications). Some offer sound advice, like trying to load less data.
I notice languages like Julia can sometimes be faster, but I'm not sure whether reading a file in another language would help, since the result would have to be converted to an object R recognises, and presumably that conversion would take longer than simply reading it as an R object in the first place.
I have noticed that identical data seems to produce smaller files when saved as .RDS compared to .csv; however, I'm not sure whether file size necessarily has an effect on read time.
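To show the kind of comparison I'm after, here is a rough benchmarking sketch over a few formats I've seen suggested (fst, qs, and Parquet via arrow, alongside .RDS and .csv); the packages, data and file names are my own placeholders rather than anything established.

```r
library(microbenchmark)
library(data.table)
library(fst)
library(qs)
library(arrow)

df <- as.data.frame(matrix(rnorm(1e6), ncol = 10))   # stand-in for the app's real data

# Write the same data in each candidate format
saveRDS(df, "data.rds")
fwrite(df, "data.csv")
write_fst(df, "data.fst")
qsave(df, "data.qs")
write_parquet(df, "data.parquet")

# Compare read times; this would need to run on the real data and the real server
microbenchmark(
  rds     = readRDS("data.rds"),
  csv     = fread("data.csv"),
  fst     = read_fst("data.fst"),
  qs      = qread("data.qs"),
  parquet = as.data.frame(read_parquet("data.parquet")),
  times   = 10
)
```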

Read only rows meeting a condition from a compressed RData file in a Shiny App?

I am trying to make a Shiny app that can be hosted for free on shinyapps.io. Free hosting requires that all uploaded data/code be <1GB, and that the app use <1GB of memory at any time while running.
The data
The underlying data (that I'm uploading) is 1000 iterations of a network with ~3050 nodes. Each interaction between nodes (~415,000 interactions per network) has 9 characteristics--of the origin, destination, and the interaction itself--that I need to keep track of. The app needs to read in data from all 1000 networks for user-selected node(s) meeting user-input criteria (those 9 characteristics) and summarize it (in a map & table). I can use 1000 one-per-network RData files (more on the format below) and the app works, but it takes ~10 minutes to load, and I'd like to speed that up.
A couple of notes about what I've done/tried, but I'm not tied to any of this if you have better ideas.
The data is too large to store as CSVs (and stay under the 1GB upload limit), so I've been saving it as RData files of a data.frame with "xz" compression.
To further reduce size, I've turned the data into frequency tables of the 9 variables of interest.
In a desktop version, I created 10 summary files that each contained the data for 100 networks (~5 minutes to load), but these are too large to be read into memory in a free Shiny app.
I tried making RData files for each node (instead of splitting by network), but they're too large for the 1GB upload limit.
I'm not sure there are better ways to package the data (but again, happy to hear ideas!), so I'm looking to optimize processing it.
Finally, a question
Is there a way to read only certain rows from a compressed RData file, based on some value (e.g. nodeID)? This post (quickly load a subset of rows from data.frame saved with `saveRDS()`) makes me think that might not be possible because the file is compressed. In looking at other options, awk keeps coming up, but I'm not sure whether that would work with an RData file (I only seem to see data.frame/data.table/CSV implementations).
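To illustrate one alternative I've been wondering about (not something I've confirmed fits the size limits): repackaging the frequency tables as a Parquet dataset partitioned by nodeID with the arrow package, so the app only touches the partitions that match the selected node(s). Object and input names here are placeholders.

```r
library(arrow)
library(dplyr)

# One-off preprocessing, run locally: 'freq_data' stands in for the combined
# frequency tables; partitioning by nodeID writes one small file set per node.
write_dataset(freq_data, path = "app_data", partitioning = "nodeID")

# Inside the Shiny app: open the dataset lazily and pull only matching rows;
# only the partitions for the requested nodeIDs are actually read from disk.
ds <- open_dataset("app_data")
selected <- ds %>%
  filter(nodeID %in% input$nodes) %>%   # 'input$nodes' is a placeholder UI input
  collect()
```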

Locating temporary files from raster processes in R: 140 GB missing

I recently ran a script that was meant to stack multiple large rasters and run a randomForest classification on the stack. I've done this numerous times with success, though it always takes up a tremendous amount of storage.
I'm aware of ways to check and clear the temporary folder in the raster package: rasterTmpFile(prefix = 'r_tmp_'), showTmpFiles(), removeTmpFiles(h = 24), tmpDir().
Typically, when the process is complete and I no longer need the temp files, I go to the folder and delete them. Last night the process ran and 140 GB of storage space was consumed, but there is no temp data (in the raster tmp folder, or anywhere else I can find). These files were also not written to .tif.
I do not understand what is happening. Where is the data? How can I remove it?
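For reference, here is how I normally inspect and clear the raster temp area (the redirected path at the end is just an example):

```r
library(raster)

tmpDir()              # where raster currently writes temporary grids
tempdir()             # the R session's own temp directory

showTmpFiles()        # list raster temp files
removeTmpFiles(h = 0) # h = 0 removes all raster temp files, not just old ones

# Redirect future temp files to a folder I can monitor directly
rasterOptions(tmpdir = "D:/raster_tmp")   # example path only
```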
