The most advanced open-table optimization for data lakes

Built for Seamless Integration

Store your data where you want, analyze it how you like — our platform works out of the box with leading object storage solutions and is fully compatible with your favorite BI tools and data processing engines.
Works in any Data Lake
Compatible with all BI and Transformation Tools

How we organize our data

With Delta Lake format, Qbeast adds the necessary information to query efficiently
qbeast:/tmp/qbeast-table$ tail -1 _delta_log/00000000000000000000.json | jq
{
 "add": {
   "path": "e24973a3-b8ba-4be2-8b2f-4ac60a55458c.parquet",
   "partitionValues": {},
   "size": 2565,
   "modificationTime": 1634732079000,
   "dataChange": true,
   "stats": "",
   "tags": {
     "state": "FLOODED",
     "rowCount": "177",
     "cube": "Qw",
     "space": "{\"timestamp\":1634732047219,\"transformations\":[{\"min\":-960393.5,\"max\":2881188.5,\"scale\":6.945409084595083E-5}]}",
     "minWeight": "-2147483648",
     "maxWeight": "2147483647",
     "indexedColumns": "ss_cdemo_sk,ss_hdemo_sk"
   }
 }
}
qbeast:/tmp/qbeast-table$
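
The tag values in the log entry above are plain strings, so any JSON tool can read them. As a minimal sketch, the Python snippet below copies a subset of the tags from that entry and decodes them. The variable names are illustrative, and nothing here uses a Qbeast API.

```python
import json

# A subset of the "tags" object from the log entry above.
# Every tag value is stored as a string, including the nested "space" JSON.
tags = {
    "rowCount": "177",
    "space": '{"timestamp":1634732047219,"transformations":'
             '[{"min":-960393.5,"max":2881188.5,"scale":6.945409084595083E-5}]}',
    "minWeight": "-2147483648",
    "maxWeight": "2147483647",
    "indexedColumns": "ss_cdemo_sk,ss_hdemo_sk",
}

indexed_columns = tags["indexedColumns"].split(",")
row_count = int(tags["rowCount"])
weight_range = (int(tags["minWeight"]), int(tags["maxWeight"]))
space = json.loads(tags["space"])  # decode the nested JSON document

print(indexed_columns)
print(row_count, weight_range)
for transformation in space["transformations"]:
    print(transformation["min"], transformation["max"], transformation["scale"])
```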
On top of open table formats, Qbeast adds the necessary information to query efficiently
qbeast: $ cd /tmp/qbeast-table/
qbeast: /tmp/qbeast-table$ tree ./
./
├── 03ac7b97-99a3-4d48-a258-23b91ee53e15.parquet
├── 3c642315-e0f9-4f94-a2ce-41b28fc0964f.parquet
├── 55876ada-cdde-4214-b92b-001b83c0f2ae.parquet
├── ae0142b8-c720-453a-aab4-de05a262ff7f.parquet
├── bbc47b42-86f9-4c72-9e2c-b7ccfd978e12.parquet
├── _delta_log
│   └── 00000000000000000000.json
├── e24973a3-b8ba-4be2-8b2f-4ac60a55458c.parquet
├── e25dfb75-96d2-4887-86b3-d3c463fc2c29.parquet
├── e921d672-dc3a-4f49-b592-e6005b2ea89.parquet
├── eb140b57-1e8f-4e1a-85b3-9df71069bc80.parquet
├── f7ef973-a796-43ee-a0fd-889f8819034.parquet
├── fea16f6f-c33a-4bde-aa11-863845e83b90.parquet
└── _qbeast

2 directories, 13 files
qbeast:/tmp/qbeast-table$

Data Skipping

Using an index avoids reading the entire dataset, which reduces data transfer and speeds up queries. Qbeast lets you index your data on as many columns as you need and selects only the files required to answer a query.
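
To make the idea of file-level skipping concrete, here is a minimal Python sketch. It is not Qbeast's index. It assumes each file records the minimum and maximum value of every indexed column, and it keeps only the files whose ranges overlap all of the query's filters. The `FileStats` class, the file names and the `age` and `income` columns are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str
    min_values: dict  # column -> smallest value stored in the file
    max_values: dict  # column -> largest value stored in the file


def may_contain(stats, column, low, high):
    """True if the file's [min, max] range for `column` overlaps [low, high]."""
    return stats.min_values[column] <= high and stats.max_values[column] >= low


def files_to_read(files, ranges):
    """Keep only the files whose ranges overlap every filter in `ranges`."""
    return [
        f.path
        for f in files
        if all(may_contain(f, col, lo, hi) for col, (lo, hi) in ranges.items())
    ]


# Four hypothetical files, each covering one quadrant of the (age, income) space.
files = [
    FileStats("a.parquet", {"age": 0, "income": 0}, {"age": 49, "income": 49}),
    FileStats("b.parquet", {"age": 0, "income": 50}, {"age": 49, "income": 99}),
    FileStats("c.parquet", {"age": 50, "income": 0}, {"age": 99, "income": 49}),
    FileStats("d.parquet", {"age": 50, "income": 50}, {"age": 99, "income": 99}),
]

selected = files_to_read(files, {"age": (60, 70), "income": (10, 20)})
print(selected)  # ['c.parquet']
```

Only one of the four files can hold matching rows, so the other three are never opened.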

Approximate Queries

Qbeast enables approximate queries, which return approximate answers at a fraction of the cost of executing the full query. With Qbeast-Spark, you can access a statistically representative sample of the dataset and return the query result within a margin of error.
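
The trade-off between sample size and accuracy can be illustrated in plain Python. The sketch below does not describe Qbeast's internals. It assumes a synthetic dataset of ages, gives every row a random weight, and treats "all rows with a weight below f" as a sample of fraction f. All names are hypothetical.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical dataset: 100,000 ages drawn uniformly from 18 to 90.
ages = [random.randint(18, 90) for _ in range(100_000)]

# Give every row a random weight in [0, 1).
weights = [random.random() for _ in ages]


def sample(values, row_weights, fraction):
    """Return the rows whose weight falls below the requested fraction."""
    return [v for v, w in zip(values, row_weights) if w < fraction]


subset = sample(ages, weights, 0.01)
estimate = statistics.mean(subset)
# Standard error of the mean, used as a rough margin of error.
std_error = statistics.stdev(subset) / len(subset) ** 0.5
exact = statistics.mean(ages)

print(f"rows read: {len(subset)} of {len(ages)}")
print(f"estimate: {estimate:.2f} +/- {1.96 * std_error:.2f}, exact: {exact:.2f}")
```

The estimate is computed from roughly 1% of the rows, and the standard error gives a sense of how far it may sit from the exact average.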

File Optimization

Writing new data can degrade the file layout, producing many small files or very large ones, which makes it harder to retrieve results without reading irrelevant data. Optimization fixes the overflowed areas and increases the share of useful data each query reads by organizing it into more fine-grained files.
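
As a rough illustration of the small-files half of this problem, the Python sketch below plans a generic compaction. It greedily groups files so that each group stays within a target size. It is not Qbeast's optimization algorithm, it does not split oversized files, and the function name, file names and sizes are hypothetical.

```python
def plan_compaction(file_sizes, target_size):
    """Group files, smallest first, so each group's total size fits target_size.

    A file larger than target_size ends up alone in its own group.
    """
    groups, current, current_size = [], [], 0
    for path, size in sorted(file_sizes.items(), key=lambda item: item[1]):
        # Close the current group if adding this file would exceed the target.
        if current and current_size + size > target_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Hypothetical file sizes in megabytes.
sizes = {"f1": 10, "f2": 20, "f3": 30, "f4": 60, "f5": 90}
plan = plan_compaction(sizes, target_size=100)
print(plan)  # [['f1', 'f2', 'f3'], ['f4'], ['f5']]
```

Each group in the plan would be rewritten as a single file, turning five files into three.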

Easy to Deploy

It works with any data lake storage (S3, Azure and GCS) and is compatible with the BI/ML tools of your choice. It takes only 10 minutes to deploy and start enjoying the benefits of querying Qbeast tables.

Getting Started

Pick your base

// Scala: read an existing Qbeast table
val qbeast_df =
 spark
   .read
   .format("qbeast")
   .load("s3://my-bucket/my-qbeast-table")
# Python (PySpark): read an existing Qbeast table
qbeast_df = (
    spark.read
    .format("qbeast")
    .load("s3://my-bucket/my-qbeast-table")
)
// Scala: assumes the CSV has a header row naming user_id and product_id
val df =
  spark.read.format("csv").option("header", "true").load(srcPath)

df.write
  .mode("overwrite")
  .format("qbeast")
  .option("columnsToIndex", "user_id,product_id")
  .save(destPath)
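# Python (PySpark): assumes the CSV has a header row naming user_id and product_id
df = spark.read.format("csv").option("header", "true").load(srcPath)

(
    df.write
    .mode("overwrite")
    .format("qbeast")
    .option("columnsToIndex", "user_id,product_id")
    .save(destPath)
)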
CREATE TABLE purchases (
 id INT,
 user_id INT,
 product_id STRING
)
USING qbeast
OPTIONS
('columnsToIndex'='user_id,product_id');

INSERT INTO TABLE purchases
SELECT id, user_id, product_id FROM raw_purchases;

Examples

Seamlessly integrate Qbeast with Databricks, Snowflake and more. Automate your data workflows and unlock faster, sharper insights so your team can focus on what matters.

Multicolumn Filtering

SELECT * FROM customers WHERE age > 20 AND city = 'Barcelona'
This query filters on two columns at once. With both columns indexed, Qbeast selects only the files that can contain matching rows instead of reading the entire dataset, which reduces data transfer and speeds up the query.

Approximate Queries

SELECT avg(age) FROM customers TABLESAMPLE (1 PERCENT) WHERE city = 'Barcelona'
Qbeast enables approximate queries, which return approximate answers at a fraction of the cost of executing the full query. Here, Qbeast reads a statistically representative 1% sample of the dataset and returns the result of the query within a margin of error.

Optimization

QbeastTable.forPath(spark, tmpDir).optimize()
As your table grows, Qbeast optimization dynamically adjusts to the shape and density of your data. Our index delivers balanced file sizes, with records grouped by the dimensions that matter to your business and use cases.