Novel Format
TileDB introduces a novel on-disk format for storing multi-dimensional arrays. Unlike other popular systems (e.g., HDF5), which are optimized mostly for dense arrays, TileDB is optimized for both dense and sparse arrays and exposes a unified array API. In addition, TileDB organizes writes into immutable, append-only fragments, which allows for efficient updates.
dense and sparse arrays, coupled with rapid updates
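To make the fragment idea concrete, here is a minimal conceptual sketch in plain Python. It is not TileDB's on-disk format or API: the class `FragmentedArray` and its methods are hypothetical names chosen for illustration, and the "newest fragment wins" read rule is an assumption about how append-only updates are typically resolved.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


class FragmentedArray:
    """Toy sparse 2D array whose writes are kept as immutable fragments.

    Hypothetical illustration only; not TileDB's API.
    """

    def __init__(self) -> None:
        self._fragments: List[Dict[Coord, float]] = []

    def write(self, cells: Dict[Coord, float]) -> None:
        # Each write appends a new fragment; earlier fragments are never modified.
        self._fragments.append(dict(cells))

    def read(self, coord: Coord, default: float = 0.0) -> float:
        # Scan from newest to oldest so the most recent write wins (assumed rule).
        for fragment in reversed(self._fragments):
            if coord in fragment:
                return fragment[coord]
        return default

    def fragment_count(self) -> int:
        return len(self._fragments)


arr = FragmentedArray()
arr.write({(0, 0): 1.0, (1, 2): 2.0})  # initial write
arr.write({(1, 2): 5.0})               # update: appended, nothing rewritten
```

In this sketch an update costs only the cells it touches, because existing fragments are left in place rather than rewritten.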
Filters
Specify data transformations (such as compression, encryption, and byteshuffle) to be applied to data tiles before they are written to disk. TileDB decomposes each tile into chunks that fit in L1 cache and applies the filters to those chunks in parallel.
parallel chunk-based filters
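The chunk-based approach can be sketched with the Python standard library. This is an illustration of the general technique, not TileDB's filter pipeline: the 32 KiB `CHUNK_SIZE`, the function names, and the choice of `zlib` as the only filter are all assumptions made for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Assumed chunk size; L1 data caches are commonly tens of KiB.
CHUNK_SIZE = 32 * 1024


def split_into_chunks(tile: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Cut a tile into consecutive chunks of at most chunk_size bytes."""
    return [tile[i:i + chunk_size] for i in range(0, len(tile), chunk_size)]


def filter_tile(tile: bytes, workers: int = 4) -> List[bytes]:
    """Compress each chunk independently, using a pool of worker threads."""
    chunks = split_into_chunks(tile)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map() returns results in input order, so chunk order is kept.
        return list(pool.map(zlib.compress, chunks))


def unfilter_tile(filtered: List[bytes], workers: int = 4) -> bytes:
    """Reverse the filter: decompress every chunk and reassemble the tile."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, filtered))


tile = bytes(range(256)) * 1024        # 256 KiB of sample data
filtered = filter_tile(tile)           # one compressed blob per chunk
restored = unfilter_tile(filtered)     # round-trips back to the original
```

Because every chunk is filtered independently, the chunks can be processed by separate workers and reassembled in order afterwards.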
Parallelism
Build powerful parallel analytics on top of the TileDB array storage manager, relying either on TileDB's internal parallelism or on its thread and process safety together with asynchronous reads and writes.
internal parallelism or parallel programming
Portability
TileDB works on Linux, macOS and Windows, offering easy installation packages, binaries and Docker containerization. Integrate TileDB with the tools of your favorite platform to manage massive multi-dimensional array data.
cross-platform use
Language Bindings
Enable your data science applications to work with immense amounts of data, beyond what can be stored in main memory. TileDB is built in C and C++ for performance, providing Python, R, Go and Java APIs for interoperability and ease of use. Run SQL queries on TileDB data via PrestoDB.
built in C and C++ for performance, integrated with high-level languages and SQL engines
Multiple Backends
Transparently store your arrays across multiple backends such as HDFS or S3-compliant object stores (like AWS S3, minio, or Ceph). TileDB's API is the same regardless of where the array is stored.
HDFS and AWS S3 support
Key-Value Store
Store any persistent metadata with TileDB's key-value storage functionality. A TileDB key-value store is implemented as a TileDB sparse array and inherits all its benefits (such as compression, parallelism, and multiple backend support).
persistent maps/dictionaries via sparse arrays
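One way to picture a key-value store layered on a sparse array is to hash each key into a pair of coordinates and store the value in the cell at that position. The sketch below follows that idea in plain Python. It is an assumption about the general approach, not a description of TileDB's implementation: `key_to_coord`, `SparseArrayKV`, and the use of SHA-256 are hypothetical choices for illustration, and a dictionary stands in for the sparse array.

```python
import hashlib
from typing import Any, Dict, Optional, Tuple

Coord = Tuple[int, int]


def key_to_coord(key: str) -> Coord:
    """Map a key to a 2D coordinate by hashing it (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 16 bytes of the digest as two unsigned 64-bit coordinates.
    return (
        int.from_bytes(digest[:8], "big"),
        int.from_bytes(digest[8:16], "big"),
    )


class SparseArrayKV:
    """Toy key-value store; a dict stands in for the sparse array's cells."""

    def __init__(self) -> None:
        self._cells: Dict[Coord, Tuple[str, Any]] = {}

    def put(self, key: str, value: Any) -> None:
        # Only cells that hold a value exist, as in a sparse array.
        self._cells[key_to_coord(key)] = (key, value)

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        cell = self._cells.get(key_to_coord(key))
        # Compare the stored key as well, to guard against hash collisions.
        if cell is None or cell[0] != key:
            return default
        return cell[1]


kv = SparseArrayKV()
kv.put("units", "meters")
kv.put("sensor_id", 42)
```

Since the store is just a sparse array underneath, anything the array layer offers (such as compression or alternative storage backends) would apply to the key-value data without extra work.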
Virtual Filesystem
Add general file management and IO to your applications for any supported storage backend using TileDB's unified "virtual filesystem" (VFS) API.
generic file IO on multiple backends
© 2018 TileDB, Inc. All rights reserved.