Judy @esproc_spl

Big Data Performance: Beyond Slogans And Benchmarks

Big data products' performance claims are often exaggerated. "Queries over trillion-row tables in seconds" usually means using an index to search for a few target records, not traversing the whole table. Efficient algorithms and adequate resources are the keys to performance, not merely the ability to handle big data.

We often hear advertisements claiming that a big data product can run “queries over trillion-row tables in seconds”, meaning it can find and return the data that meets a specified condition from one trillion rows within seconds.
Is this true?
You probably doubt it if you have read the article "How Much Is One Terabyte of Data?". One trillion rows amount to dozens of terabytes, or even a hundred. Processing all of that within seconds would take tens of thousands, or even hundreds of thousands, of hard disks working in parallel. This is almost impracticable.
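As a rough check of the estimate above, the arithmetic can be sketched in a few lines of Python. The figures here are assumptions for illustration, not numbers from the article: 100 bytes per row, about 150 MB/s of sequential read throughput per hard disk, and a time budget of 1 to 10 seconds. The helper `disks_needed` is hypothetical.

```python
def disks_needed(rows, bytes_per_row, disk_mb_per_s, target_seconds):
    """Number of disks that must be read in parallel to scan every row in time.

    All inputs are illustrative assumptions, not measured values.
    """
    total_bytes = rows * bytes_per_row
    # Bytes a single disk can deliver within the time budget.
    bytes_per_disk = disk_mb_per_s * 1_000_000 * target_seconds
    # Ceiling division using integers only, to avoid float rounding.
    return (total_bytes + bytes_per_disk - 1) // bytes_per_disk


rows = 10**12                      # one trillion rows
total_tb = rows * 100 // 10**12    # 100 TB at an assumed 100 bytes per row

print(total_tb)                            # 100
print(disks_needed(rows, 100, 150, 10))    # 66667
print(disks_needed(rows, 100, 150, 1))     # 666667
```

Under these assumptions, a full scan needs roughly 67,000 disks for a 10-second answer and roughly 667,000 for a 1-second answer, which lines up with the "tens of thousands, even hundreds of thousands" range. Faster storage or narrower rows would shrink the count, but not by enough to make a full traversal in seconds ordinary.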
However,...