How to create cubes with Druid using HDFS files - OLAP

I am using Druid for OLAP on big data. I load data from HDFS files that contain many measures and dimensions, and I want to know how Druid creates cubes from these files and how to query them.
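Druid does not pre-materialize cubes the way classic MOLAP engines do. You describe the dimension and metric columns in an ingestion spec, Druid rolls the HDFS data up into indexed segments at ingestion time, and the aggregation happens at query time against the broker. As a minimal sketch, assuming a broker on localhost:8082 and a hypothetical datasource called "sales" with a "country" dimension and a "revenue" metric, a native groupBy query can be posted from Python:

```python
import json
import requests

# Native Druid groupBy query; the datasource, dimension, metric and
# interval are hypothetical placeholders for your own schema.
query = {
    "queryType": "groupBy",
    "dataSource": "sales",
    "granularity": "day",
    "dimensions": ["country"],
    "aggregations": [
        {"type": "doubleSum", "name": "total_revenue", "fieldName": "revenue"}
    ],
    "intervals": ["2016-01-01/2016-02-01"],
}

# Brokers accept native JSON queries on the /druid/v2/ endpoint.
resp = requests.post(
    "http://localhost:8082/druid/v2/?pretty",
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
)
resp.raise_for_status()

# Each result row carries the grouped dimension values and the
# aggregates under its "event" key.
for row in resp.json():
    print(row["timestamp"], row["event"])
```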

Related

Importing and querying a mongodb bson file from external harddrive

I am new to MongoDB. I have a BSON file (collect.bson) on my external hard drive. It is very large, about 200 GB, and I want to run a query on it from my terminal. Do I need to create a database first in order to do that? Considering that it is such a large file, I don't know how much space it will consume. I installed MongoDB from my terminal and I was curious how I can proceed to extract the attributes and columns into a CSV/R? Please suggest.
Thanks
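If you need to run real MongoDB queries against it, you would first restore the dump into a running mongod with mongorestore (which needs roughly the file's size in disk space again) and then query or mongoexport from that database. If the goal is only to pull attributes out into a CSV for R, you can skip the database entirely and stream-decode the BSON file. Here is a minimal sketch, assuming pymongo is installed (it provides the bson package); the field names are hypothetical, and the file is read one document at a time, so the 200 GB never has to fit in memory:

```python
import csv
from bson import decode_file_iter

FIELDS = ["_id", "name", "value"]  # hypothetical attribute names

with open("/path/to/collect.bson", "rb") as bson_file, \
     open("collect.csv", "w", newline="") as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=FIELDS)
    writer.writeheader()
    # decode_file_iter streams documents lazily from the file handle.
    for doc in decode_file_iter(bson_file):
        writer.writerow({k: doc.get(k) for k in FIELDS})
```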

Analyzing distributed data using Spark in Qt

I have some terabytes of data and I want to analyze them in Qt. On a local system, analyzing these data is time-consuming, so the best choice is to use Spark and HDFS.
I know that I can load my data into HDFS with the put command on the master node.
I have seen some code that executes queries in Scala or Python, but I want to know how I can analyze my distributed data from my Qt program, for example by executing queries or drawing graphs and plots.
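Qt has no Spark bindings of its own, so one common pattern is to keep the analysis in a small PySpark (or Scala) script, launch it from the Qt application, for example with QProcess running spark-submit, and then load the written summary back into Qt for plotting. A minimal sketch of such a script, with a hypothetical HDFS path and column names:

```python
# analyze.py: submitted with spark-submit, e.g. from QProcess in Qt.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("qt-analysis").getOrCreate()

# Read the distributed input from HDFS (hypothetical path and layout).
df = spark.read.csv("hdfs:///data/measurements.csv",
                    header=True, inferSchema=True)

# Run an aggregation with Spark SQL.
df.createOrReplaceTempView("measurements")
result = spark.sql(
    "SELECT sensor, avg(value) AS avg_value "
    "FROM measurements GROUP BY sensor"
)

# Write a small summary that the Qt program can read back and plot.
result.coalesce(1).write.mode("overwrite") \
      .csv("hdfs:///results/summary", header=True)
spark.stop()
```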

Microsoft R .xdf file

I have some questions about the .xdf file:
What is it exactly?
How does this type of file work?
How does Microsoft R work with this type of file?
What are the advantages compared to data.frames?
I'm really looking forward to your answers.
Greetings R123456789
An XDF file is a compressed binary file format with user-selectable levels of compression; some quick facts can be found here: https://support.microsoft.com/en-us/help/3104260/qa-what-is-the-.xdf-file-format XDF files come in two forms, Standalone and Composite. For a Standalone XDF file, you will see a single file stored on disk with the .xdf extension. For Composite, the XDF file is represented by a directory, which contains metadata and data subdirectories. Also, for Composite, the metadata and data files in their respective directories are split and individually compressed as XDF part files.
It is a proprietary implementation inside of Microsoft R Server. I can expand on this answer, but I would need you to refine the question, "How does this type of file work?"
An XDF file is stored on the disk and does not sit in memory. Microsoft R Server, with a call to RxXdfData() or rxImport(), will read the XDF file and decompress it, then insert it into memory as a Data Frame. Many Microsoft R "rx" functions can take a path to an XDF directly as a data source or sink, and will manage reading segments into memory as required.
The advantage of using XDF as a data source/sink is that you do not need to buffer the entire file into memory for Microsoft R Server to work with it. It allows for partial reads and writes, as well as other optimizations around disk space via compression. It will also operate faster than reading/writing flat files, as metadata is used to index the XDF. The disadvantage is primarily performance: in-memory data (data.frames) will be faster to operate on than data on disk in all cases.
Note: As with all files, the underlying operating system controls when a file is written from memory to disk. For the purposes of your question, the assumption can be made that the XDF file resides on disk as a standard file.
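For illustration, here is a minimal sketch of that workflow using revoscalepy, the Python counterpart of the R "rx" functions named above (rxImport()/RxXdfData()). It assumes a Microsoft ML Server installation, and the file and column names are hypothetical:

```python
from revoscalepy import RxTextData, RxXdfData, rx_import

# Convert a CSV into a compressed on-disk XDF file; nothing is held
# in memory beyond the chunk currently being processed.
rx_import(input_data=RxTextData("measurements.csv"),
          output_file="measurements.xdf", overwrite=True)

# Point at the XDF on disk: "rx" functions can consume this data
# source directly, reading it chunk by chunk.
xdf = RxXdfData("measurements.xdf")

# Materialize an in-memory data frame only when you actually need one.
df = rx_import(xdf)
print(df.head())
```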

SQLite Multiple Attached rather than single large database

I'm going to end up with a rather large CubeSQLite database in the cloud, cloned on the local machine. In my current databases I already have 185 tables, and growing. I store them in 6 SQLite databases and begin by attaching them together using the ATTACH DATABASE command. There are views that point to information in other databases and, as a result, Navicat won't open the SQLite tables individually. It finds them to be corrupted, although they are not and are working fine.
My actual question is this:
Considering the potential size of the files, is it better/faster/slower to do it this way or to put them all into one really large SQLite DB?
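For reference, this is what the attach-and-join pattern looks like from Python's sqlite3 module, with hypothetical file and table names. One practical limit to weigh in the decision: SQLite caps attached databases at 10 by default (at most 125 with a compile-time option), and views that span files only resolve once everything is attached, which is exactly why a tool opening one file in isolation reports it as corrupted:

```python
import sqlite3

conn = sqlite3.connect("main.db")
# Attach the other database files under schema names.
conn.execute("ATTACH DATABASE 'orders.db' AS orders_db")
conn.execute("ATTACH DATABASE 'customers.db' AS customers_db")

# Cross-database joins work once the files are attached.
rows = conn.execute(
    """
    SELECT c.name, COUNT(o.id)
    FROM customers_db.customers AS c
    JOIN orders_db.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    """
).fetchall()
conn.close()
```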

A file storage format for file sharing site

I am implementing a file sharing system in ASP.NET MVC3. I suppose most file sharing sites store files in a standard binary format on a server's file system, right?
I have two options storage-wise: the file system, or a binary data field in a database.
Are there any advantages to storing files (including large ones) in a database, rather than on the file system?
MORE INFO:
The expected average file size is 800 MB. Typically, about 3 files per minute are requested for download by users.
If the files are as big as that, then using the filesystem is almost certainly a better option. Databases are designed to contain relational data grouped into small rows and are optimized for consulting and comparing the values in these relations. Filesystems are optimized for storing fairly large blobs and recalling them by name as a bytestream.
Putting files that big into a database will also make it difficult to manage the space occupied by the database. The tools for querying the space used in a filesystem, and for removing and replacing data, are better.
The only caveat to using the filesystem is that your application has to run under an account that has the necessary permission to write the (portion of the) filesystem you use to store these files.
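To make the filesystem option concrete, here is a minimal sketch of the usual pattern: stream the upload to disk under a generated name and keep only metadata in the database. It is written in Python for brevity (the same shape applies in ASP.NET MVC3), and the storage root and table schema are hypothetical:

```python
import os
import shutil
import sqlite3
import uuid

# The application account needs write permission on this directory.
STORAGE_ROOT = "/var/files"

def save_upload(stream, original_name, db):
    """Stream an uploaded file to disk and record its metadata."""
    file_id = uuid.uuid4().hex
    path = os.path.join(STORAGE_ROOT, file_id)
    # Copy in 1 MB chunks so an 800 MB upload never sits fully in memory.
    with open(path, "wb") as out:
        shutil.copyfileobj(stream, out, length=1024 * 1024)
    # Assumes a table: files(id TEXT, name TEXT, path TEXT).
    db.execute("INSERT INTO files (id, name, path) VALUES (?, ?, ?)",
               (file_id, original_name, path))
    db.commit()
    return file_id
```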
Use FileStream when:
Objects that are being stored are, on average, larger than 1 MB.
Fast read access is important.
You are developing applications that use a middle tier for application logic.
Here is the MSDN link: https://msdn.microsoft.com/en-us/library/gg471497.aspx
How to use it: https://www.simple-talk.com/sql/learn-sql-server/an-introduction-to-sql-server-filestream/
