I have read all transactions from my dataset and then ran apriori, but it ate all of my RAM.
Is it possible to avoid this? Is it possible to run apriori without loading everything into RAM, or to somehow merge the results?
Generally, one can increase the memory available to R processes using command line parameters. See Increasing (or decreasing) the memory available to R processes
However, apriori has some optimization options itself. Add a list of control parameters to your call to apriori using control = list(memopt = TRUE) to minimize memory usage and control = list(load = FALSE) to disable loading transactions into memory.
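For concreteness, here is a minimal sketch of such a call (the transactions object trans and the support/confidence thresholds are placeholders, not values from the question):
library(arules)
rules <- apriori(trans,
                 parameter = list(supp = 0.01, conf = 0.8),  # placeholder thresholds
                 control = list(memopt = TRUE,               # minimize memory usage
                                load = FALSE))               # do not load transactions into memory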
I'm using a REST API to GET a model (.rda) file. I'm trying to load the file/model in memory, since I already have the binary object (obtained successfully), but I'm finding this rather difficult. I'd include some sample code, but to be honest I'm struggling even to make an attempt. The load() documentation notes that it can take "a (readable binary-mode) connection", so my attempts have involved trying these out, with no success. I always struggle with these I/O tasks, so any help would be very appreciated.
Here's how I was able to load an .rda model in one step:
endpoint <- 'v2/asset_files/data_asset/sample_model.rda'
r <- httr::GET(url = paste(DATA_API_URL, endpoint, sep = '/'),
               query = list(project_id = project_id))
# class(r$content) is "raw"
load(rawConnection(r$content))
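As a small follow-up (my addition, not part of the original answer): load() returns the names of the objects it restored, so you can retrieve the model without knowing its name in advance, and closing the connection explicitly avoids leaving it open. This assumes the .rda file contains a single object:
con <- rawConnection(r$content)
loaded <- load(con)      # character vector of restored object names
close(con)
model <- get(loaded[1])  # assumes the .rda file held exactly one object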
I'm writing R code that calls C++, and the C++ functions make heavy use of parallel computing based on OpenMP. This is my first code using OpenMP, and what I see is that even when I set the same C++ random seed, the code never gives the same results.
I have read a lot of posts here where it seems that this is an issue with OpenMP, but they are all old (between 12 and 5 years ago).
I want to know whether there are solutions now, and whether there are published articles that explain this problem and/or possible solutions.
Thanks
You need to read up on parallel random number generation. This is not an OpenMP problem, but one that will afflict any use of random numbers in a parallel code.
Start with
Parallel Random Numbers: As Easy as 1, 2, 3 (Salmon et al.)
Using the sommer package quite extensively, I often get messages telling me my version is out of date, even though I'm using the latest version of the package. This can be especially annoying when using sommer in some sort of loop.
Is there a way to avoid these messages?
Hi BartJan, welcome to SO!
Have you tried calling the library with quietly = TRUE?
library(sommer, quietly = TRUE)
If that doesn't work, you can make R suppress all messages and warnings:
suppressMessages(suppressWarnings(require(sommer)))
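A further option (my addition, not part of the original answer): if the message is emitted as a package startup message, base R's suppressPackageStartupMessages() silences only those, which is less drastic than suppressing all messages and warnings:
suppressPackageStartupMessages(library(sommer))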
I am working with a large data set with more than 1 million entries. When I run scripts, it sometimes takes a while until I get any output. Sometimes there seems to be no output whatsoever, even if I let it run for hours. Is there a way to track the progress of the computation (or at least to see that it is not stuck)?
1. Start small
Write your analysis script and then test it using trivially small amounts of data. Gradually scale up and see how the runtime increases. The microbenchmark package is great for this. In the example below, I compare the amount of time it takes to run the same function on three different-sized chunks of data.
library(microbenchmark)
long_running_function <- function(x) {
  for (i in 1:nrow(x)) {
    Sys.sleep(0.01)
  }
}
microbenchmark(long_running_function(mtcars[1:5, ]),
               long_running_function(mtcars[1:10, ]),
               long_running_function(mtcars[1:15, ]))
2. Look for functions that provide progress bars
I'm not sure what kind of analysis you're performing, but some packages already have this functionality. For example, ranger gives you more progress updates than the equivalent randomForest functions.
3. Write your own progress updates
I regularly add print() or cat() statements to large code blocks to tell me when R has finished running a particular part of my analysis. Functions like txtProgressBar() let you add your own progress bars to functions as well.
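Here is a minimal sketch of that third approach using base R's txtProgressBar(); the loop body (Sys.sleep()) is just a stand-in for one unit of real work:
n <- 100
pb <- txtProgressBar(min = 0, max = n, style = 3)
for (i in seq_len(n)) {
  Sys.sleep(0.01)            # stand-in for one unit of real work
  setTxtProgressBar(pb, i)   # update the bar after each iteration
}
close(pb)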
I have a species abundance dataset with quite a few zeros in it, and even when I set trymax = 1000 for metaMDS() the program is unable to find a stable solution for the stress. I have already tried combining data (collapsing multiple years together to reduce the number of zeros) and I can't do any more. I was just wondering if anyone knows: is it scientifically valid to pick what R gives me at the end (the lowest of the 1000 solutions), or should I not be using NMDS because it cannot find a stable spot? There seems to be very little information about this on the internet.
One explanation for this is that you are trying to use too few dimensions for the mapping. I presume you are using the default k = 2? If so, try k = 3 and compare the stress of its best solution with that of the best k = 2 solution from your 1000 tries.
I would be a little concerned to take one solution out of 1000 just because it had the best/lowest stress.
You could also try 1000 more random starts and see whether it converges with more iterations. If you saved the output from metaMDS(), you can supply that object to another call to metaMDS() via the previous.best argument. It will then do trymax further random starts, but it compares any lower-stress solution with the previous best and converges if it finds one similar to it, rather than having to find two similar low-stress solutions within the 1000 starts.
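To illustrate that workflow (my sketch, not from the original answer; comm is a placeholder for your species abundance matrix):
library(vegan)
ord2  <- metaMDS(comm, k = 2, trymax = 1000)                        # default two dimensions
ord2b <- metaMDS(comm, k = 2, trymax = 1000, previous.best = ord2)  # continue from the best solution so far
ord3  <- metaMDS(comm, k = 3, trymax = 1000)                        # three dimensions for comparison
ord2b$stress  # compare the stress of the two-dimensional solution...
ord3$stress   # ...with that of the three-dimensional one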