Strange noise after trying to process large amounts of data - R

I apologize in advance for the somewhat vague question, but I don't have anyone else to ask. I was running the following code in R:
library(SAScii)
parse.SAScii("16crime.sas", beginline = 14)
x <- read.SAScii("opafy16nid.dat", "16crime.sas", beginline = 14)
The .dat file is 2.3 GB, which is large, at least for my computer's specs, but seemingly doable. Originally I tried running it from my Windows OneDrive folder, which should have about 9 GB of free space; I have 8 GB of RAM. After it had run for about 8 hours and R showed about 16,000 records processed, my computer started making a very bizarre noise. It almost sounded like a fax machine being dialled (this is the vague part, I apologize) for about 3 minutes, and after that the noise stopped. Everything froze and I was unable to do a thing. Against my better judgement, after about an hour of waiting for something to happen, I forced a shutdown and rebooted. Everything seems to be working fine now, and scans of the hard drive came back with no errors. I want to try it again because the data never finished loading into R, but I don't want my computer to blow up either. Can anyone comment on that noise and whether I should try this again?
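For anyone retrying this, here is a minimal sketch (not the poster's exact code) of reading the file in chunks so the full 2.3 GB table never has to be built in memory at once. It assumes the layout data frame returned by parse.SAScii() has varname and width columns, with negative widths marking filler columns to skip, and that opafy16nid.dat is plain fixed-width text; the chunk size is an arbitrary placeholder.

library(SAScii)

# parse the SAS layout once; assumed columns: varname, width (negative = skip)
layout <- parse.SAScii("16crime.sas", beginline = 14)
ends   <- cumsum(abs(layout$width))
starts <- ends - abs(layout$width) + 1
keep   <- layout$width > 0

con <- file("opafy16nid.dat", open = "r")
chunk_size <- 50000                       # lines per chunk; tune to your RAM
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break

  # cut each line into fields using the parsed layout
  fields <- lapply(which(keep), function(j) substring(lines, starts[j], ends[j]))
  names(fields) <- layout$varname[keep]
  chunk <- as.data.frame(fields, stringsAsFactors = FALSE)
  chunk[] <- lapply(chunk, type.convert, as.is = TRUE)   # characters -> numbers where possible

  # ... filter or summarise 'chunk' here and keep only what you actually need ...
}
close(con)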

Related

RStudio: My code now runs many times slower than it did before on the same computer

I'm looking for some advice, please. After about six months I came back to code I had written that at the time took around 30 minutes to finish. Now when I run it, it is much slower; it looks like it could take days. The hardware hasn't changed since then, I'm using Windows 10, I have updated RStudio to the current version (2022.07.2 Build 576), and I didn't update R, which is still version 4.1.2 (2021-11-01).
I noticed that, unlike before, RStudio now never uses more than around 400 MB of RAM; previously it used much more. I'm not running any other software and there is plenty of RAM available.
I suspected the antivirus might be causing this, even though I hadn't changed any of its settings. I added RStudio and R to its exceptions, and it made no difference.
I also updated RStudio from the previous version, which didn't help.
Please, does anyone have an idea what could be causing this? Sorry if the description isn't optimal; this is my first post here and I'm not a programmer, I just use R for data analysis for my biology-related diploma thesis.
Thanks a lot!
Daniel
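One way to narrow something like this down, sketched below under the assumption that a representative step of the script can be run on its own, is to time the same chunk of work both inside RStudio and in a plain R console (Rgui/Rterm): if the plain console is fast, the IDE or its settings are suspect; if both are slow, the cause is in R or the environment. The workload below is a placeholder, not the poster's code.

# time one representative step; replace the body with a real piece of the script
system.time({
  x <- replicate(50, sum(sort(rnorm(1e6))))   # placeholder workload
})

# on R 4.1.x on Windows, confirm R is allowed to use the installed RAM (value in MB)
memory.limit()

# see how much memory R is actually holding, and trigger a garbage collection
gc()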

Why do I get a RAM error for big data when using gganimate (ggplot2)?

I am trying to run my code to create a nice transition_reveal for my line graphs.
The data I've got is very large as it is daily data over 20 years for about 130 different variables.
When I run my code I sometimes get the following error:
Sometimes this error happens; other times it runs successfully, but only if I cut the data into smaller parts, and they have to be very small parts. If I do that, since it is an animation, I'd have to create overlap between the parts and it gets complicated. I'd much prefer to run the whole thing; I don't mind if it takes hours, as I can do other things in the meantime.
But it doesn't make sense to me: it's not as if my RAM is storing all the data at the same time, it's just storing what it needs before replacing it, so it should never fill up. Here is an image of my Task Manager while running the code:
The RAM usually sits around 95% full, sometimes a bit lower and sometimes higher. Then, seemingly by random chance, it hits the 100% maximum and the code just fails.
This is why splitting my data into 20 parts is difficult: I can't simply loop over them, because there is always a chance that even a small part hits 100% RAM and causes an error.
I don't know whether I'm doing anything wrong. I don't think buying more RAM would solve the problem. Maybe there is a way to let R use my SSD as extra RAM as well, but I don't know how to do that.
Any help would be much appreciated. Thanks.
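Two things that usually lower peak memory with gganimate are sketched below, with placeholder names that are assumptions on my part (p for the plot object built with transition_reveal(), a data frame df with date, variable, and value columns): render fewer and smaller frames, and thin the daily data before plotting, since weekly points are rarely distinguishable in a 20-year line animation.

library(gganimate)
library(dplyr)

# (1) cap the number and size of frames: the tweened data and the rendered
#     frames both grow with nframes, so fewer/smaller frames means less RAM
anim <- animate(
  p,                              # placeholder: the ggplot + transition_reveal() object
  nframes = 300,
  width = 800, height = 500,
  renderer = gifski_renderer()
)

# (2) thin the input: one point per week instead of per day (placeholder columns)
df_small <- df %>%
  mutate(week = as.Date(cut(date, "week"))) %>%
  group_by(week, variable) %>%                  # 'variable' identifies each of the ~130 series
  summarise(value = mean(value), .groups = "drop")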

vector memory exhausted (R) workaround?

I tried what turned out to be a quite memory-intensive operation in R: writing an xlsx file from a dataset with 500k observations and 2,000 variables.
I tried the method explained here. (First comment)
I set the max VSIZE to 10 GB, as I did not want to try more because I was afraid of damaging my computer (I saved up for it for a long time :)), and it still did not work.
I then looked up Cloud Computing with R, which I found to be quite difficult as well.
So finally, I wanted to ask here whether anyone can tell me how high I can set the VSIZE without damaging my computer, or whether there is another way to solve my problem. (The goal is to convert a SAS file to an xlsx or xls file. The files are between 1.4 GB and 1.6 GB. I have about 8 GB of RAM.) I am open to downloading other programs if that's not too complicated.
Cheers.
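For what it's worth, a sketch under the assumption that the linked method is the macOS R_MAX_VSIZE setting: that value only caps how much address space R may request, so raising it cannot physically damage the machine; beyond physical RAM the operating system swaps to disk, which is slow but safe. If csv is acceptable instead of xlsx, the second part avoids building a spreadsheet object in memory at all. File names below are placeholders.

# in ~/.Renviron (then restart R): lets R address up to 16 GB, spilling to swap
# R_MAX_VSIZE=16Gb

# lighter-weight route than xlsx for a 500k x 2000 table: read the SAS file,
# write csv, which streams to disk instead of holding a workbook in memory
library(haven)
library(data.table)

dat <- haven::read_sas("myfile.sas7bdat")   # placeholder SAS file name
data.table::fwrite(dat, "myfile.csv")       # placeholder output name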

Speed up performance of an R script; performance changes between runs

I have a script that I want to run many times (5,000 to 10,000 runs). Each run usually takes around 0.10 seconds, but sometimes it goes up to 1-2 seconds and occasionally even up to 7 seconds. It's not that common for it to take that long, but I would like to know why this can happen.
I have a script file that calls other script files. My only warnings are these:
"closing unused connection #NUMBER", which I'm trying to fix.
I try to call rm() at the end of each script file.
My script reads from and writes to some files (XML and txt).
Does anyone have any idea what the problem could be? I know it's hard to say, but maybe someone has experience with this (the time a script takes varying between runs).
I would also appreciate any tips on how to track down the problem. I'm a bit of a beginner at this; maybe there's a good guide to debugging in R?
Thanks!
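A couple of hedged starting points, assuming the top-level script can be sourced from one file (main_script.R is a placeholder): profile one complete run to see where the occasional slow runs spend their time, and close file connections explicitly rather than leaving them to the garbage collector, which is what the "closing unused connection" warning indicates.

# profile one complete run
Rprof("profile.out")
source("main_script.R")        # placeholder for the top-level script file
Rprof(NULL)
summaryRprof("profile.out")    # shows which calls dominate the run time

# close connections deterministically instead of leaving them to the GC
con <- file("output.txt", open = "w")
writeLines("result", con)
close(con)                     # inside a function: on.exit(close(con), add = TRUE)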

Why is R slowing down as time goes on, when the computations are the same?

So I think I don't quite understand how memory works in R. I've been running into problems where the same piece of code gets slower later in the week (using the same R session, sometimes even when I clear the workspace). I've tried to develop a toy problem that I think reproduces the "slowing down" effect I have been observing when working with large objects. Note that the code below is somewhat memory intensive (don't blindly run it without adjusting n and N to match what your setup can handle), and that it will likely take about 5-10 minutes before you start to see the slowing-down pattern (possibly even longer).
N = 4e7   # number of simulation runs
n = 2e5   # number of simulation runs between calculating time elapsed
meanStorer = rep(0, N)
toc = rep(0, N/n)
x = rep(0, 50)
for (i in 1:N) {
  if (i %% n == 1) { tic = proc.time()[3] }
  x[] = runif(50)
  meanStorer[i] = mean(x)
  if (i %% n == 0) { toc[i/n] = proc.time()[3] - tic; print(toc[i/n]) }
}
plot(toc)
meanStorer is certainly large, but it is pre-allocated, so I am not sure why the loop slows down as time goes on. If I clear my workspace and run this code again, it starts just as slowly as the last few calculations ended! I am using RStudio (in case that matters). Here is some of my system information:
OS: Windows 7
System Type: 64-bit
RAM: 8gb
R version: 2.15.1 ($platform yields "x86_64-pc-mingw32")
Here is a plot of toc, prior to using pre-allocation for x (i.e. using x=runif(50) in the loop)
Here is a plot of toc, after using pre-allocation for x (i.e. using x[]=runif(50) in the loop)
Is rm() not doing what I think it's doing? What's going on under the hood when I clear the workspace?
Update: with the newest version of R (3.1.0), the problem no longer occurs, even when increasing N to 3e8 (note that R doesn't allow vectors much larger than this).
It is rather unsatisfying that the fix is just updating R to the newest version, though, because I can't figure out why there were problems in version 2.15. It would still be nice to know what caused them, so I am going to leave this question open.
As you state in your update, the high-level answer is that you were using an old version of R with a bug: with the newest version (3.1.0), the problem no longer occurs.
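For anyone who wants to confirm the garbage collector's role on an affected version, a small hedged check: turn on GC reporting and timing before running the loop, and see whether the per-chunk times in toc climb together with increasingly frequent collections.

gcinfo(TRUE)    # print a message each time the garbage collector runs
gc.time(TRUE)   # start accumulating time spent in garbage collection

# ... run the simulation loop from the question here ...

gc.time()       # user/system/elapsed seconds spent in GC so far
gcinfo(FALSE)   # turn the messages back off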
