I've heard from many people that R is built for processing petabytes of data; on the other hand, I also hear very often that if you want to process, say, 8 GB of data, you had better have at least 8 GB of memory, otherwise you'll run into problems.
My question is: if I need to process something like 20 GB of data (which I think is fairly common in many projects), how much memory and what kind of processor do I need? If you have relevant experience, I'd also be happy to hear what it would take to handle 2 petabytes of data.
I don't think you can process 2 petabytes of data at once in any language (well, maybe with some specific software and/or hardware you could). Parallel solutions, or processing the data in smaller pieces, are always needed. In R, objects are stored in memory, so there is a clear limit on how much data you can have in R at one time. Check the "Memory Limits in R" help page (help("Memory-limits")).
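For data that is bigger than memory but smaller than your disk, chunked processing in base R looks roughly like this (a minimal sketch, assuming a hypothetical big.csv whose first column is numeric):

# Read a large CSV in fixed-size chunks, keeping only a running
# summary rather than the whole table. big.csv is a hypothetical file.
con <- file("big.csv", open = "r")
header <- strsplit(readLines(con, n = 1), ",")[[1]]
running_total <- 0
repeat {
  chunk <- tryCatch(
    read.csv(con, header = FALSE, nrows = 10000, col.names = header),
    error = function(e) NULL)  # read.csv errors once the input is exhausted
  if (is.null(chunk)) break
  running_total <- running_total + sum(chunk[[1]])  # summarise, don't store
}
close(con)
running_total

Packages such as ff and bigmemory wrap the same out-of-memory idea in friendlier interfaces.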
I have two XML documents, say Sample1.xml and Sample2.xml. I need to compare them (parent nodes, child nodes, attributes and their values) and return the differences between them in XQuery. I know I can use the deep-equal function to tell whether the documents are identical, but I don't know how to compute and return the actual differences. Please help.
Depending on the degree of generality we're talking about here, this is a non-trivial problem (PDF). If your question is, "How do I write this algorithm?", then it's way too open-ended for Stack Overflow (see the FAQ). If, on the other hand, you are asking, "Is there any XQuery library code out there that will do this?", then simply Googling "XML difference XQuery" will lead you straight to the answer. Faster, even, than having someone else do the Googling for you on Stack Overflow.
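That said, a first approximation is easy to sketch. The following is a minimal recursive comparison in XQuery, not a real diff algorithm: it assumes child elements appear in the same order in both documents, ignores namespaces and mixed content, and does not report things present only in the second document:

(: Minimal sketch: walk two elements in parallel and report mismatches. :)
declare function local:diff($a as element(), $b as element()) as xs:string* {
  if (node-name($a) ne node-name($b)) then
    concat("element mismatch: ", name($a), " vs ", name($b))
  else (
    (: attributes of $a that are missing or different in $b :)
    for $attr in $a/@*
    where string($b/@*[node-name(.) eq node-name($attr)]) ne string($attr)
    return concat("attribute ", name($attr), " differs in <", name($a), ">"),
    (: text content of leaf elements :)
    if (empty($a/*) and string($a) ne string($b)) then
      concat("text differs in <", name($a), ">: '",
             string($a), "' vs '", string($b), "'")
    else (),
    (: recurse into child elements pairwise :)
    for $child at $i in $a/*
    return
      if (exists($b/*[$i])) then local:diff($child, $b/*[$i])
      else concat("child <", name($child), "> missing from second document")
  )
};

local:diff(doc("Sample1.xml")/*, doc("Sample2.xml")/*)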
What happens internally when I mount a file system on UNIX using the following command:
mount -t ext3 /dev/sda1 /home/users
Please give references (articles, books, etc.).
First, consider whether you want to, and can, read file-system source code, because there is no finer detail than the source itself:
http://freebsd.active-venture.com/FreeBSD-srctree/newsrc/ufs/ffs/ffs_softdep.c.html
That is McKusick's base code for the FFS file system, which is generally considered the parent of modern UNIX file systems. The reason I posted it: when I taught this material long ago, there was a textbook, and then I presented example code; the students who actually worked through the material seemed to get a lot out of it. The FFS code was a kind of de facto model, so it shows how we got from there to here.
Now all you need to do is get a copy of Linux Device Drivers (Corbet, Rubini, and Kroah-Hartman):
http://www.amazon.com/Linux-Device-Drivers-Jonathan-Corbet/dp/0596005903/ref=sr_1_1?s=books&ie=UTF8&qid=1354930353&sr=1-1&keywords=linux+drivers
Then download the ext3 source and read it.
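To make the chain concrete: whatever bookkeeping the mount command does (consulting /etc/fstab, parsing options), on Linux it ends in the mount(2) system call. A minimal sketch of that last step (run as root, and only with a device and mount point you can safely use):

/* The essence of: mount -t ext3 /dev/sda1 /home/users
 * The real mount(8) also reads /etc/fstab, handles user mounts,
 * and records the result; none of that is shown here. */
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    if (mount("/dev/sda1", "/home/users", "ext3", 0, NULL) != 0) {
        perror("mount");   /* e.g. EPERM when not root */
        return 1;
    }
    printf("mounted /dev/sda1 on /home/users\n");
    return 0;
}

From there the kernel looks up the registered ext3 file-system type, reads the superblock from /dev/sda1, and grafts the new tree onto /home/users.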
Hi, all. I downloaded the Qt source and then started building it on a Mac. It has now been four hours. Is it supposed to take this long? If not, what am I doing wrong? It just keeps building and building, and using a lot of resources. It's confusing.
Yes, it is meant to take this long.
You should consider using parallel make: make -j4 runs four jobs in parallel, roughly one per CPU core.
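Rather than hard-coding 4, you can ask the machine how many cores it has (sysctl -n hw.ncpu is the Mac way; nproc is the Linux equivalent):

# Mac: one make job per CPU core
make -j"$(sysctl -n hw.ncpu)"

# Linux equivalent, for comparison
make -j"$(nproc)"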
I have a stock market application that will be frequently used by millions of members at a time.
Which is the better option for retrieving data from my database, in terms of retrieval time and database load: DataReader or DataSet?
If you will have a large number of people reading the data, then you should use the DataReader paradigm. That way you can get in and out quickly, without the overhead of schema inference. I would also recommend caching the data once you pull it from the server: even if it is only cached for one second, that cuts down the number of database connections retrieving the same data. Otherwise, if you are not careful, you could quickly saturate your connection pool and run into locking problems.
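A minimal sketch of that pattern (the connection string, Quotes table, and Quote class are hypothetical placeholders, and a real cache would need an eviction policy and error handling):

// Read forward-only with SqlDataReader and cache the result for
// one second, as described above.
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

public class Quote
{
    public string Symbol;
    public decimal Price;
}

public static class QuoteStore
{
    private static readonly object Sync = new object();
    private static List<Quote> _cached;
    private static DateTime _cachedAt;

    public static List<Quote> GetQuotes(string connectionString)
    {
        lock (Sync)
        {
            // serve from cache if it is less than a second old
            if (_cached != null &&
                (DateTime.UtcNow - _cachedAt).TotalSeconds < 1)
                return _cached;

            var quotes = new List<Quote>();
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                       "SELECT Symbol, Price FROM Quotes", conn))
            {
                conn.Open();
                // forward-only, read-only: cheaper than filling a DataSet
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        quotes.Add(new Quote
                        {
                            Symbol = reader.GetString(0),
                            Price = reader.GetDecimal(1)
                        });
                }
            }

            _cached = quotes;
            _cachedAt = DateTime.UtcNow;
            return quotes;
        }
    }
}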
I would like to know the best practices for building predictive-modeling solutions organically. Some of the questions I have are:
If I have multiple R model files, what are efficient ways of storing them? Two options I can see (sketched in code after this list):
Save them as .RData files on the file system
Serialize them to a database as binary objects
Since the data is processed into an interim, model-specific format, is it helpful to use standards such as PMML?
Also, should one consider practices such as MVC? (I'm not a trained software developer, so any insights into such development practices would be very helpful.)
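On the storage question above, the two options look roughly like this in R (a minimal sketch; the model, directory, and file name are hypothetical, and the raw vector produced by serialize() is what you would write to a BLOB column with your database driver):

# Option 1: one file per model on the file system.
# saveRDS stores a single object; save()/load() is the .RData route.
fit <- lm(mpg ~ wt, data = mtcars)          # stand-in for a real model
dir.create("models", showWarnings = FALSE)  # hypothetical model store
saveRDS(fit, file = "models/fit-v1.rds")
fit_from_disk <- readRDS("models/fit-v1.rds")

# Option 2: serialize to a raw vector for a database BLOB column
bytes <- serialize(fit, connection = NULL)  # raw vector of bytes
fit_from_blob <- unserialize(bytes)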
I apologize for the open-ended nature of this question. I wish to understand even simple things such as the recommended folder structure for data staging, the model store, script collections, and the other elements of a data-mining solution.
I would be very grateful to members of the community for sharing their experiences and recommendations.
Thank you for your time.