As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I am looking for some suggestions on using R for analysing Big Data - i.e., data that runs into TBs.
Typically I think it is better to preprocess the data and load only the information the user needs for the analysis. However, if information from a large dataset (say, 200 GB) needs to be aggregated, then first, storing the data in a column-oriented database rather than a row-oriented DBMS would probably be more efficient. Second, for CPU-intensive analysis it is probably worthwhile to have some distributed computing capability via RHadoop / RHIPE. Also, if there are multiple enterprise users (say 10 researchers working concurrently on large datasets), what would be the best way to set this up?
I have found some resources on the web, such as the indexing and mmap packages for doing efficient computations in R, but I wanted to get feedback from people who have actually implemented these at an enterprise level.
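For example, for the single-machine aggregation case I am picturing something along these lines (the file and column names below are made up for illustration):

```r
## A minimal sketch, assuming the raw extract has already been staged as a
## flat file ("sales_2012.csv" with columns region and amount -- both
## hypothetical names). data.table reads only the columns it needs and
## aggregates far more compactly than a plain data.frame.
library(data.table)

dt <- fread("sales_2012.csv", select = c("region", "amount"))

## Group-wise aggregation of the whole extract
totals <- dt[, .(total = sum(amount), n = .N), by = region]
print(totals)
```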
Thanks in advance for your suggestions,
Regards.
Closed 9 years ago.
I am looking for a package to download historical futures data (not stock data).
Could someone advise me on a good R package?
Thanks!
P.S. I know there are many packages, but they only seem to retrieve stock prices, not futures. I only need futures.
There is no free source of futures data, at least not a comprehensive one.
You can look at the FinancialInstrument package's source on R-Forge, especially this file:
https://r-forge.r-project.org/scm/viewvc.php/pkg/FinancialInstrument/inst/parser/download.tblox.R?view=markup&root=blotter
It will download historical data for a selection of futures that TradingBlox publishes daily. Mind you, this is back-adjusted continuous contract data, created using TradingBlox's own back-adjustment methodology.
Among paid sources, CSI data is reasonably priced for smaller traders; it provides data in multiple formats and lets you customize the back-adjustment logic.
There is an interesting project called Quandl, but I haven't seen it mentioned around much, nor have I tested the accuracy of its data.
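If you do want to try it, a rough sketch with their R package follows. The dataset code below is only a guess at a continuous front-month contract, so treat it as illustrative rather than verified:

```r
## Hedged sketch using the Quandl R package. The dataset code is only an
## example of a continuous front-month contract -- check Quandl's catalogue
## for the exact code you need, and verify the data yourself.
library(Quandl)

Quandl.api_key("your_api_key_here")   # free key from quandl.com

## Continuous front-month crude oil, returned as an xts object
cl1 <- Quandl("CHRIS/CME_CL1", type = "xts", start_date = "2000-01-01")
head(cl1)
```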
Closed 10 years ago.
I'm developing a web platform that will show data tables and charts using Highcharts.
There is also an export function that lets users download a CSV/Excel file.
But I'm wondering why I should build this. Exporting CSV or Excel seems to be the standard, but is there any real advantage?
I think I could export an HTML file instead, and the page would be interactive and much nicer than an Excel file. Does anyone have thoughts on this?
A CSV allows the user greater control over how they display the data - you may see some great visualisations from your users if you give them access to these.
Likely, the people who will actually use the CSVs (which will be a small percentage) would scrape the data anyway - so offering them up saves bandwidth and increases the happiness of your userbase because they don't have to go through the effort of writing a scraper.
It all depends on the type of data, though.
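To illustrate what a CSV makes possible, here is a hypothetical example of what one of your users might do with the export in R (the file name and column names are invented):

```r
## Hypothetical example of a user working with an exported CSV --
## "export.csv" and the columns "date" and "visits" are made up.
library(ggplot2)

dat <- read.csv("export.csv", stringsAsFactors = FALSE)
dat$date <- as.Date(dat$date)

## A custom view that the site's canned charts may not offer
ggplot(dat, aes(x = date, y = visits)) +
  geom_line() +
  geom_smooth(method = "loess") +
  labs(title = "My own take on the exported data")
```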
Closed 10 years ago.
I know that a GPU commonly has many compute units or CUDA cores, which makes it suitable for compute-intensive algorithms.
But why does it have so many more cores than a CPU? When rendering an image, which kinds of algorithms are parallelizable?
This huge number of compute units is necessary for fast processing of frames when applying shaders.
This type of computation is highly parallelizable, as each shader is applied n times (perhaps once per pixel), often independently, across the same frame.
Note that each compute unit is made up of many shader cores.
This is why shader support is a prerequisite for OpenCL: it implies dedicated cores for the rendering job, cores that can be "hijacked" to do other things. This is what is called GPGPU.
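To make the per-pixel independence concrete, here is a toy "shader" written in plain R (the frame and the gamma value are invented): the same pure function is applied to every pixel with no dependence on its neighbours, which is exactly the pattern that maps onto thousands of shader cores.

```r
## Toy illustration of why shading is embarrassingly parallel: the same
## pure function is applied to every pixel with no dependence on its
## neighbours, so pixels can be processed in any order, on any core.
set.seed(1)
frame <- matrix(runif(1920 * 1080), nrow = 1080)   # fake greyscale frame

gamma_shader <- function(p, gamma = 2.2) p ^ (1 / gamma)

## Vectorised: conceptually one "thread" per pixel, which is the model a
## GPU's thousands of shader cores are built for.
shaded <- gamma_shader(frame)
```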
Closed 10 years ago.
Nowadays there is one buzzword everywhere: Big Data. I'm curious to know what it is. I have gleaned some information about Big Data, but I want to know more.
Thanks
Think of the difference between a database for a coffee shop and one for Facebook. It's easy to get something to work with 200 users, but when you have 200,000 users... that's a different story.
Table scans become impossible. Indexes become very important.
A single server cannot handle all the load, so solutions such as clustering are employed so that more than one server can host the application. That way you can keep adding servers to the cluster whenever the load grows too big and performance starts to suffer.
You'll also hear a lot about NoSQL databases such as MongoDB, where the database simply stores documents as collections of key/value pairs. Such databases are better suited to massive scaling (via sharding) than relational database systems.
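As a small illustration of the document model (using the mongolite R client against a local MongoDB instance; the collection and field names are invented):

```r
## Hedged sketch using the mongolite R client; assumes a MongoDB instance
## is running locally, and the collection/field names are made up.
library(mongolite)

users <- mongo(collection = "users", db = "demo",
               url = "mongodb://localhost")

## Documents are just key/value records; no fixed schema required
users$insert(data.frame(name = c("ann", "bob"), visits = c(12, 7)))

## Without an index this lookup is a collection scan; with one it is a seek
users$index(add = '{"name" : 1}')
users$find('{"name" : "ann"}')
```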
Closed 10 years ago.
We are going to develop a web application using ASP.NET that may have to handle millions of records,
so I am confused about the database selection.
Which should I prefer, SQL Server or Oracle, with respect to performance and other criteria?
Please guide me on this.
Thanks
Your question looks subjective, but I'd like to answer it anyway:
If someone handed you a Formula One car to drive, how many seconds would it take you to crash it? You would probably not even manage to get it moving.
The same applies to these products. Both are like Formula One cars: each has features the other lacks, but both can run very fast if you know how to "drive" them.
So it's up to you to design the database well and make it really fast, or badly and make it slow and bloated. It's not the machine that makes it fast; it's you. It's not the Formula One car that wins races; it's the driver (and the rest of the team).
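To make that concrete: either engine will answer the query below quickly or painfully slowly depending almost entirely on whether you designed an index for it. A rough sketch, using R's DBI with SQLite as a stand-in engine (the table and column names are invented):

```r
## Illustration only: the engine matters far less than the design.
## RSQLite is used as a stand-in; table and column names are invented.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "orders",
             data.frame(customer_id = sample(1e4, 1e6, replace = TRUE),
                        amount      = runif(1e6)))

## Full table scan: the planner has nothing better to use
dbGetQuery(con, "SELECT SUM(amount) FROM orders WHERE customer_id = 42")

## One line of design work turns the scan into an index seek
dbExecute(con, "CREATE INDEX idx_orders_customer ON orders(customer_id)")
dbGetQuery(con, "SELECT SUM(amount) FROM orders WHERE customer_id = 42")

dbDisconnect(con)
```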