DataEngr.com

Data Engineering Blogroll

Discover the latest articles, tutorials, and insights curated from DataLakehouseHub.com.

April 29, 2026

Migrating to Apache Iceberg: Strategies for Every Source System

Migrate to Iceberg from Hive, data warehouses, or raw files using in-place migration, full rewrite, or the zero-downtime view s...

Read Article
April 29, 2026

Hands-On with Apache Iceberg Using Dremio Cloud

A practical walkthrough of creating, querying, and optimizing Iceberg tables on Dremio Cloud, from account setup to AI-powered ...

Read Article
April 29, 2026

Approaches to Streaming Data into Apache Iceberg Tables

Stream data into Iceberg with Spark Structured Streaming, Flink, or Kafka Connect. Here is how each works and the trade-offs be...

Read Article
April 29, 2026

Using Apache Iceberg with Python and MPP Query Engines

Access Iceberg tables from Python with PyIceberg, DuckDB, and Polars, or through MPP engines like Dremio, Spark, and Trino. Her...

Read Article
April 29, 2026

Apache Iceberg Metadata Tables: Querying the Internals

Iceberg metadata tables let you query snapshots, files, manifests, and partitions using SQL. Here is every metadata table and h...

Read Article
April 29, 2026

Maintaining Apache Iceberg Tables: Compaction, Expiry, and Cleanup

Keep Iceberg tables fast with compaction, snapshot expiry, orphan cleanup, and manifest rewriting. Here is when and how to run ...

Read Article
April 29, 2026

Concurrency, Isolation, and MVCC: How Engines Handle Contention

Databases handle concurrent access using locks, MVCC, or optimistic concurrency control. Here is how each approach works and wh...

Read Article
April 29, 2026

How Data Lake Table Storage Degrades Over Time

Iceberg tables degrade through small files, orphan files, metadata bloat, sort order decay, and partition skew. Here is how to ...

Read Article
April 29, 2026

Hash, Sort-Merge, Broadcast: How Distributed Joins Work

Distributed joins move data across the network using shuffle, broadcast, or co-location strategies. Here is how each works and ...

Read Article
April 29, 2026

When Catalogs Are Embedded in Storage

S3 Tables and MinIO AI Stor embed the Iceberg catalog directly in the storage layer. Here is when embedded catalogs make sense ...

Read Article
April 29, 2026

Partitioning, Sharding, and Data Distribution Strategies

Hash partitioning distributes data evenly. Range partitioning enables fast range scans. Both create tradeoffs. Here is how data...

Read Article
April 29, 2026

What Are Lakehouse Catalogs? The Role of Catalogs in Apache Iceberg

Lakehouse catalogs store metadata pointers, manage namespaces, and enforce access control. Here is the complete catalog landsca...

Read Article
April 29, 2026

Buffer Pools, Caches, and the Memory Hierarchy

Databases use buffer pools, column caches, and result caches to keep hot data in RAM. Here is how each caching strategy works a...

Read Article
April 29, 2026

Writing to an Apache Iceberg Table: How Commits and ACID Actually Work

Here is exactly how an engine writes to an Iceberg table, step by step, from data files through the atomic commit that makes AC...

Read Article
April 29, 2026

Volcano, Vectorized, Compiled: How Engines Execute Your Query

The Volcano model processes one row at a time. Vectorized execution processes batches with SIMD. Code generation fuses operator...

Read Article
April 29, 2026

Hidden Partitioning: How Iceberg Eliminates Accidental Full Table Scans

Iceberg's hidden partitioning separates physical layout from user queries using transform functions. Here is how it works and w...

Read Article
April 29, 2026

Inside the Query Optimizer: How Engines Pick a Plan

Query optimizers transform SQL into execution plans using rule-based rewrites, cost-based search, and adaptive runtime adjustme...

Read Article
April 29, 2026

Partition Evolution: Change Your Partitioning Without Rewriting Data

Iceberg lets you change partition schemes without rewriting data. Here is how partition evolution works internally and why Hive...

Read Article
April 29, 2026

B-Trees, LSM Trees, and the Indexing Tradeoff Spectrum

B-trees balance reads and writes for OLTP. LSM trees maximize write throughput. Bitmap indexes accelerate OLAP filtering. Here ...

Read Article
April 29, 2026

Performance and Apache Iceberg's Metadata

Iceberg's three-layer metadata tree eliminates directory listing and enables multi-level data skipping. Here is how scan planni...

Read Article
April 29, 2026

How Databases Organize Data on Disk: Pages, Blocks, and File Formats

Databases structure data on disk as heap files, sorted files, or LSM trees, then wrap it in formats like Parquet with metadata ...

Read Article
April 29, 2026

The Metadata Structure of Modern Table Formats

Iceberg uses a metadata tree, Delta Lake uses a transaction log, Hudi uses a timeline. Here is exactly how each format organize...

Read Article
April 29, 2026

Row vs. Column: How Storage Layout Shapes Everything

Row stores keep records together for fast transactions. Column stores keep field values together for fast analytics. Here is ho...

Read Article
April 29, 2026

What Are Table Formats and Why Were They Needed?

Table formats like Apache Iceberg solved the ACID, schema, and performance problems that turned data lakes into data swamps. He...

Read Article
April 29, 2026

How Query Engines Think: The Tradeoffs Behind Every Data System

Every database is a collection of engineering tradeoffs. Learn the 9 design decisions that shape how query engines store, index...

Read Article
April 13, 2026

Agentic Analytics on the Apache Lakehouse

*Read the complete Open Source and the Lakehouse series:* * [Part 1: Apache Software Foundation: History, Purpose, and Process](/blog/2026-04-apache-s...

Read Article
April 13, 2026

What is Apache Iceberg? The Table Format Revolution

*Read the complete Open Source and the Lakehouse series:* * [Part 1: Apache Software Foundation](/blog/2026-04-apache-software-foundation) * [Part 2: ...

Read Article
April 13, 2026

What is Apache Arrow? Erasing the Serialization Tax

*Read the complete Open Source and the Lakehouse series:* * [Part 1: Apache Software Foundation: History, Purpose, and Process](/blog/2026-04-apache-s...

Read Article
April 13, 2026

What is Apache Parquet? Columns, Encoding, and Performance

*Read the complete Open Source and the Lakehouse series:* * [Part 1: Apache Software Foundation: History, Purpose, and Process](/blog/2026-04-apache-s...

Read Article
April 13, 2026

What is Apache Polaris? Unifying the Iceberg Ecosystem

*Read the complete Open Source and the Lakehouse series:* * [Part 1: Apache Software Foundation: History, Purpose, and Process](/blog/2026-04-apache-s...

Read Article
April 13, 2026

Assembling the Apache Lakehouse: The Modular Architecture

*Read the complete Open Source and the Lakehouse series:* * [Part 1: Apache Software Foundation: History, Purpose, and Process](/blog/2026-04-apache-s...

Read Article
April 13, 2026

Apache Software Foundation: History, Purpose, and Process

*Read the complete Open Source and the Lakehouse series:* * [Part 1: Apache Software Foundation](/blog/2026-04-apache-software-foundation) * [Part 2: ...

Read Article
March 7, 2026

The Model Context Protocol (MCP) Explained: A Complete Guide to How Every Major AI Tool Connects to External Data

The Model Context Protocol (MCP) has become the universal standard for connecting AI models to external tools, data sources, and services. Originally ...

Read Article
March 7, 2026

Context Management Strategies for VS Code with LLM Plugins: A Complete Guide to Building Your Own AI-Powered IDE

Visual Studio Code is the most widely used code editor in the world, and its extensibility means you can integrate AI capabilities through a growing e...

Read Article
March 7, 2026

Context Management Strategies for T3 Chat: A Complete Guide to the Unified Multi-Model AI Interface

T3 Chat is a modern web-based AI chat interface that gives you access to multiple AI models through a single unified platform. Its primary value propo...

Read Article
March 7, 2026

Context Management Strategies for Zed: A Complete Guide to the High-Performance AI Code Editor

Zed is a high-performance code editor built in Rust that prioritizes speed, simplicity, and real-time collaboration. Its AI integration is designed to...

Read Article
March 7, 2026

Context Management Strategies for Windsurf: A Complete Guide to the AI Flow IDE

Windsurf is an AI-powered IDE built on the VS Code foundation that introduces the concept of "Flows," a paradigm where the AI maintains deep awareness...

Read Article
March 7, 2026

Context Management Strategies for Perplexity AI: A Complete Guide to Research-First AI Conversations

Perplexity AI occupies a unique position in the AI landscape: it is a research-first tool that combines conversational AI with real-time web search to...

Read Article
March 7, 2026

Context Management Strategies for Cursor: A Complete Guide to the AI-Native Code Editor

Cursor is an AI-native code editor built on the VS Code foundation that integrates AI deeply into every aspect of the development workflow. Its contex...

Read Article
March 7, 2026

Context Management Strategies for OpenWork: A Complete Guide to the Desktop AI Agent Framework

OpenWork is a desktop-native AI agent framework designed for local, multi-step task execution on your computer. Unlike browser-based AI tools or termi...

Read Article
March 7, 2026

Context Management Strategies for OpenCode: A Complete Guide to the Open-Source Terminal AI Agent

OpenCode is an open-source terminal-based AI coding agent that prioritizes privacy, local-first operation, and broad model provider support. Built as ...

Read Article
March 7, 2026

Context Management Strategies for Google Antigravity: A Complete Guide to the Agent-First IDE

Google Antigravity is an agent-first IDE built by Google DeepMind's Advanced Agentic Coding team. It approaches context management differently from ot...

Read Article
March 7, 2026

Context Management Strategies for Gemini CLI: A Complete Guide to Terminal-Native AI Development

Gemini CLI is an open-source terminal agent powered by Gemini models that operates directly in your command line. It brings Google's AI capabilities i...

Read Article
March 7, 2026

Context Management Strategies for Gemini Web and NotebookLM: A Complete Guide to Google's AI Knowledge Ecosystem

Google's AI ecosystem for knowledge work consists of two deeply integrated tools: Gemini (the conversational AI at gemini.google.com) and NotebookLM (...

Read Article
March 7, 2026

Context Management Strategies for Claude Code: A Complete Guide for Developers

Claude Code is a terminal-native agentic coding assistant that lives in your command line and operates directly on your codebase. Unlike chat-based in...

Read Article
March 7, 2026

Context Management Strategies for Claude CoWork: A Complete Guide for Knowledge Workers

Claude CoWork represents a fundamentally different approach to AI context management. Unlike chat interfaces where you send messages and receive respo...

Read Article
March 7, 2026

Context Management Strategies for Claude Desktop: A Complete Guide to MCP, Computer Use, and Local File Access

Claude Desktop takes everything available in Claude Web and adds three capabilities that fundamentally change how you manage context: MCP server conne...

Read Article
March 7, 2026

Context Management Strategies for Claude Web: A Complete Guide to Projects, Artifacts, and Intelligent Context

Claude's web interface at claude.ai combines one of the largest context windows in the industry with a structured Project system that makes it genuine...

Read Article
March 7, 2026

Context Management Strategies for OpenAI Codex: A Complete Guide Across Browser, CLI, and App

OpenAI Codex is not a chatbot. It is an autonomous software engineering agent that runs tasks in isolated cloud sandboxes, operates across a browser i...

Read Article
March 7, 2026

Context Management Strategies for ChatGPT: A Complete Guide to Getting Better Results

Getting consistently useful results from ChatGPT requires more than writing good prompts. The real differentiator is how you manage context: the backg...

Read Article
March 5, 2026

How to Use Dremio with OpenWork: Connect, Query, and Build Data Apps

OpenWork is an open-source desktop AI agent built on the OpenCode engine. It runs entirely on your machine with your own API keys, giving you full con...

Read Article
March 5, 2026

How to Use Dremio with OpenCode: Connect, Query, and Build Data Apps

OpenCode is an open-source, terminal-based AI coding agent released under the MIT license. It provides a TUI with split panes, uses the Language Serve...

Read Article
March 5, 2026

How to Use Dremio with Zed: Connect, Query, and Build Data Apps

Zed is an open-source, GPU-accelerated code editor written in Rust. It is designed for speed and collaboration, with a built-in AI assistant that supp...

Read Article
March 5, 2026

How to Use Dremio with OpenAI Codex CLI: Connect, Query, and Build Data Apps

OpenAI Codex CLI is a terminal-based coding agent built in Rust. It reads your codebase, writes files, executes commands, and supports MCP for connect...

Read Article
March 5, 2026

How to Use Dremio with Amazon Kiro: Connect, Query, and Build Data Apps

Amazon Kiro is an agentic AI IDE from AWS that introduces spec-driven development to the coding workflow. Instead of jumping straight to code, Kiro he...

Read Article
March 5, 2026

How to Use Dremio with JetBrains AI Assistant: Connect, Query, and Build Data Apps

JetBrains AI Assistant is built into IntelliJ IDEA, PyCharm, DataGrip, and every JetBrains IDE. It provides AI chat, inline code generation, multi-fil...

Read Article
March 5, 2026

How to Use Dremio with Gemini CLI: Connect, Query, and Build Data Apps

Gemini CLI is Google's open-source terminal-based AI agent. It runs directly in your terminal, powered by Gemini models with a 1-million token context...

Read Article
March 5, 2026

How to Use Dremio with Google Antigravity: Connect, Query, and Build Data Apps

Google Antigravity is an agent-first IDE built by Google DeepMind. Its autonomous agents plan multi-step tasks, write code, browse documentation, and ...

Read Article
March 5, 2026

How to Use Dremio with Windsurf: Connect, Query, and Build Data Apps

Windsurf is an AI-native code editor built as a fork of VS Code. Its standout feature is Cascade, an agentic AI system that plans and executes multi-s...

Read Article
March 5, 2026

How to Use Dremio with GitHub Copilot: Connect, Query, and Build Data Apps

GitHub Copilot is the most widely adopted AI coding assistant, integrated into VS Code, JetBrains IDEs, and the GitHub platform. Its agent mode allows...

Read Article
March 5, 2026

How to Use Dremio with Claude CoWork: Connect, Query, and Build Data Apps

Claude CoWork is Anthropic's desktop agentic assistant. Unlike Claude Code (a terminal coding agent), CoWork operates as a general-purpose autonomous ...

Read Article
March 5, 2026

How to Use Dremio with Claude Code: Connect, Query, and Build Data Apps

Claude Code is Anthropic's terminal-based coding agent. It reads your files, writes code, runs commands, and maintains context across a session. Dremi...

Read Article
March 5, 2026

How to Use Dremio with Cursor: Connect, Query, and Build Data Apps

Cursor is an AI-native code editor built as a fork of VS Code. It integrates AI directly into the editing experience with features like Chat, Composer...

Read Article
March 1, 2026

The 2025 State of the Apache Iceberg Ecosystem Results

![2025 Survey](https://imgur.com/eSwOYfd.png) **Raw Results at Bottom of Post** **Apache Iceberg Literature from Alex Merced and/or Andrew Madsen:**...

Read Article
March 1, 2026

Connect Dremio Software to Dremio Cloud: Hybrid Federation Across Deployments

Dremio Cloud can connect to Dremio Software (self-managed) instances as a federated data source. This creates a hybrid deployment where Dremio Cloud s...

Read Article
March 1, 2026

Dremio's Built-in Open Catalog: Your Zero-Configuration Apache Iceberg Lakehouse

Every Dremio Cloud account starts with a built-in Open Catalog — a fully managed Apache Iceberg catalog with integrated storage. When you create a Dre...

Read Article
March 1, 2026

Connect Any Iceberg REST Catalog to Dremio Cloud: Universal Lakehouse Access

The Apache Iceberg REST Catalog specification defines a standard HTTP API for managing Iceberg table metadata. Any catalog implementation that conform...

Read Article
March 1, 2026

Connect Databricks Unity Catalog to Dremio Cloud: Query Delta Lake Tables with Federation and AI

Databricks Unity Catalog is Databricks' governance layer for data and AI assets. It manages Delta Lake tables, machine learning models, feature stores...

Read Article
March 1, 2026

Connect Snowflake Open Catalog to Dremio Cloud: Multi-Engine Iceberg Analytics

Snowflake Open Catalog is Snowflake's managed implementation of the Apache Iceberg REST catalog specification, based on the open-source Apache Polaris...

Read Article
March 1, 2026

Connect AWS Glue Data Catalog to Dremio Cloud: Query and Manage Your AWS Iceberg Tables

AWS Glue Data Catalog is AWS's managed metadata service for data lakes. It stores table definitions, schemas, partition information, and statistics fo...

Read Article
March 1, 2026

Connect Apache Druid to Dremio Cloud: Add SQL Joins, AI, and Governance to Your Real-Time Analytics

Apache Druid is a real-time analytics database designed for sub-second queries on high-ingestion-rate event data. Clickstream analytics, application m...

Read Article
March 1, 2026

Connect MongoDB to Dremio Cloud: SQL Analytics on Document Data

MongoDB is the most popular NoSQL document database. It stores data in flexible JSON-like documents, making it ideal for applications with evolving sc...

Read Article
March 1, 2026

Connect Vertica to Dremio Cloud: Federation for Analytics-Optimized Data

Vertica is a columnar analytics database engineered for fast aggregate queries on large datasets. It was built from the ground up for analytical workl...

Read Article
March 1, 2026

Connect Azure Synapse Analytics to Dremio Cloud: Multi-Cloud Data Warehouse Federation

Microsoft Azure Synapse Analytics combines big data analytics and enterprise data warehousing into a single Azure-integrated platform. If your organiz...

Read Article
March 1, 2026

Connect Snowflake to Dremio Cloud: Federate, Govern, and Accelerate Beyond Snowflake

Snowflake is a popular cloud data warehouse known for its separation of storage and compute, near-zero maintenance, and broad ecosystem. Many organiza...

Read Article
March 1, 2026

Connect Google BigQuery to Dremio Cloud: Cross-Cloud Analytics Without Data Movement

Google BigQuery is Google Cloud's serverless data warehouse. If your organization uses Google Cloud Platform, BigQuery is where your analytics data, m...

Read Article
March 1, 2026

Connect Amazon Redshift to Dremio Cloud: Extend Your Warehouse with Federation and AI Analytics

Amazon Redshift is AWS's managed data warehouse, designed for petabyte-scale analytics. If your organization chose Redshift for analytical workloads, ...

Read Article
March 1, 2026

Connect Azure Storage to Dremio Cloud: Query Your Microsoft Data Lake with SQL and AI

Azure Storage is Microsoft's cloud storage platform, spanning Blob Storage, Azure Data Lake Storage Gen2 (ADLS Gen2), and Azure Files. If your organiz...

Read Article
March 1, 2026

Connect Amazon S3 to Dremio Cloud: Query Your Data Lake with SQL, Federation, and AI

Amazon S3 is the default landing zone for data in the cloud. Log files, Parquet datasets, CSV exports, JSON events, IoT telemetry, and raw data dumps ...

Read Article
March 1, 2026

Connect SAP HANA to Dremio Cloud: Unlock Analytics Beyond the SAP Ecosystem

SAP HANA is the in-memory database platform that powers SAP S/4HANA, SAP BW/4HANA, and custom enterprise applications across finance, manufacturing, l...

Read Article
March 1, 2026

Connect IBM Db2 to Dremio Cloud: Modernize Mainframe Analytics with Federation and AI

IBM Db2 is the relational database that powers critical applications across banking, insurance, government, healthcare, and manufacturing. For organiz...

Read Article
March 1, 2026

Connect Microsoft SQL Server to Dremio Cloud: Federate Enterprise Data Without ETL

Microsoft SQL Server is one of the most widely deployed enterprise databases in the world. ERP systems, CRM platforms, financial applications, and cus...

Read Article
March 1, 2026

Connect Oracle Database to Dremio Cloud: Enterprise Analytics Without Data Movement

Oracle Database runs the most critical enterprise applications in the world — ERP systems, financial ledgers, supply chain management, and HR platform...

Read Article
March 1, 2026

Connect MySQL to Dremio Cloud: Federated Analytics Without ETL

MySQL runs more web applications, SaaS platforms, and e-commerce backends than any other database. It's fast for transactional reads and writes, but i...

Read Article
March 1, 2026

Connect PostgreSQL to Dremio Cloud: Query, Federate, and Accelerate Your Data

PostgreSQL powers more production applications than almost any other open-source database. It's where your customer records, transaction logs, product...

Read Article
March 1, 2026

Extract Structured Data from Text with Dremio's AI_GENERATE Function

Unstructured text is the most underused data in most organizations. Customer emails sit in inboxes. Contract notes live in text fields. Meeting summar...

Read Article
March 1, 2026

Generate Summaries and Insights with Dremio's AI_COMPLETE Function

Every data team has a version of this problem: a table full of raw data that needs human-readable summaries, translations, or narrative descriptions. ...

Read Article
March 1, 2026

Classify Your Data with SQL: A Hands-On Guide to Dremio's AI_CLASSIFY Function

Most classification workflows require exporting data to Python, running a model, and importing results back into your warehouse. Dremio's `AI_CLASSIFY...

Read Article
February 18, 2026

Semantic Layer Best Practices: 7 Mistakes to Avoid

![Semantic layer best practices checklist — checks and mistakes](/images/blog/semantic-layer/best-practices.png) Semantic layers don't fail because t...

Read Article
February 18, 2026

How a Self-Documenting Semantic Layer Reduces Data Team Toil

![Self-documenting semantic layer — AI generating descriptions and labels automatically](/images/blog/semantic-layer/self-documenting.png) Every data...

Read Article
February 18, 2026

Headless BI: How a Universal Semantic Layer Replaces Tool-Specific Models

![Headless BI — one semantic layer serving all consumers](/images/blog/semantic-layer/headless-bi.png) Your organization uses Tableau for executive d...

Read Article
February 18, 2026

Data Virtualization and the Semantic Layer: Query Without Copying

![Data virtualization — connecting sources to a unified semantic layer without copying](/images/blog/semantic-layer/data-virtualization.png) Every da...

Read Article
February 18, 2026

The Role of the Semantic Layer in Data Governance

![Data governance through a semantic layer — centralized policies and documentation](/images/blog/semantic-layer/governance-semantic.png) Most organi...

Read Article
February 18, 2026

Why Your AI Initiatives Fail Without a Semantic Layer

![AI with vs without a semantic layer — failure modes and fixes](/images/blog/semantic-layer/ai-semantic-layer.png) Your team builds an AI agent. It ...

Read Article
February 18, 2026

Semantic Layer vs. Data Catalog: Complementary, Not Competing

![Data catalog and semantic layer — complementary systems bridged together](/images/blog/semantic-layer/catalog-vs-semantic.png) "We already have a d...

Read Article
February 18, 2026

Semantic Layer vs. Metrics Layer: What's the Difference?

![Semantic layer vs metrics layer — the metrics layer is a subset](/images/blog/semantic-layer/semantic-vs-metrics.png) Both terms appear in every mo...

Read Article
February 18, 2026

How to Build a Semantic Layer: A Step-by-Step Guide

![Building a semantic layer — Bronze, Silver, and Gold tiers](/images/blog/semantic-layer/build-semantic-layer.png) Most teams start building a seman...

Read Article
February 18, 2026

What Is a Semantic Layer? A Complete Guide

![Semantic layer concept — translating raw data into business terms](/images/blog/semantic-layer/semantic-layer-concept.png) Ask three teams in your ...

Read Article
February 18, 2026

Data Engineering Best Practices: The Complete Checklist

![Comprehensive data engineering checklist organized by categories with status indicators](/images/blog/debp/de-checklist.png) Best practices documen...

Read Article
February 18, 2026

Pipeline Observability: Know When Things Break

![Pipeline observability dashboard showing metrics, logs, and data lineage](/images/blog/debp/observability-dashboard.png) An analyst messages you on...

Read Article
February 18, 2026

Testing Data Pipelines: What to Validate and When

![Data pipeline testing pyramid with schema tests at the base, contract tests in the middle, and regression tests at the top](/images/blog/debp/testin...

Read Article
February 18, 2026

Partition and Organize Data for Performance

![Table data split into partitions by date with query scanning only the relevant partition](/images/blog/debp/partition-overview.png) A table with 50...

Read Article
February 18, 2026

Batch vs. Streaming: Choose the Right Processing Model

![Batch processing in scheduled groups vs streaming in continuous flow](/images/blog/debp/batch-vs-streaming.png) "We need real-time data." This is o...

Read Article
February 18, 2026

Schema Evolution Without Breaking Consumers

![Schema as a contract between producers and consumers with version tracking](/images/blog/debp/schema-contract.png) A source team renames a column f...

Read Article
February 18, 2026

Idempotent Pipelines: Build Once, Run Safely Forever

![Pipeline running multiple times and converging to the same result](/images/blog/debp/idempotent-pipeline.png) A pipeline runs, processes 100,000 re...

Read Article
February 18, 2026

Data Quality Is a Pipeline Problem, Not a Dashboard Problem

![Data quality checks enforced at the pipeline validation stage before data reaches consumers](/images/blog/debp/data-quality-pipeline.png) When an a...

Read Article
February 18, 2026

How to Design Reliable Data Pipelines

![Data pipeline architecture with four layers flowing from ingestion through staging, transformation, and serving](/images/blog/debp/pipeline-architec...

Read Article
February 18, 2026

How to Think Like a Data Engineer

![Data flowing through a system of interconnected pipeline stages from sources to consumers](/images/blog/debp/data-engineer-mindset.png) The median ...

Read Article
February 18, 2026

Data Modeling Best Practices: 7 Mistakes to Avoid

![Checklist of data modeling quality markers with warning symbols on common mistakes](/images/blog/data-modeling/best-practices-checklist.png) A bad ...

Read Article
February 18, 2026

Data Vault Modeling: Hubs, Links, and Satellites

![Data Vault model showing Hubs, Links, and Satellites as interconnected components](/images/blog/data-modeling/data-vault-overview.png) Dimensional ...

Read Article
February 18, 2026

Denormalization: When and Why to Flatten Your Data

![Normalized model with many interconnected tables vs. denormalized wide flat table](/images/blog/data-modeling/denormalization-overview.png) Normali...

Read Article
February 18, 2026

Data Modeling for Analytics: Optimize for Queries, Not Transactions

![OLTP normalized model vs. OLAP denormalized model side by side](/images/blog/data-modeling/analytics-data-modeling.png) The data model that runs yo...

Read Article
February 18, 2026

Slowly Changing Dimensions: Types 1-3 with Examples

![Dimension timeline showing attribute values changing across time periods](/images/blog/data-modeling/slowly-changing-dimensions.png) Dimensions cha...

Read Article
February 18, 2026

Dimensional Modeling: Facts, Dimensions, and Grains

![Dimensional model showing a central fact table connected to surrounding dimension tables](/images/blog/data-modeling/dimensional-modeling.png) Dime...

Read Article
February 18, 2026

Data Modeling for the Lakehouse: What Changes

![Traditional data warehouse model vs. open lakehouse model with flexible schema and views](/images/blog/data-modeling/lakehouse-data-modeling.png) T...

Read Article
February 18, 2026

Star Schema vs. Snowflake Schema: When to Use Each

![Star schema with central fact table surrounded by denormalized dimension tables](/images/blog/data-modeling/star-vs-snowflake.png) Both star schema...

Read Article
February 18, 2026

Conceptual, Logical, and Physical Data Models Explained

![Three layers of data modeling from business concepts to database implementation](/images/blog/data-modeling/types-of-data-models.png) Most data tea...

Read Article
February 18, 2026

What Is Data Modeling? A Complete Guide

![Data entities connected by relationship lines forming a structured data model](/images/blog/data-modeling/data-modeling-overview.png) Every databas...

Read Article
February 13, 2026

A 2026 Introduction to Apache Iceberg

Apache Iceberg is an open-source table format for large analytic datasets. It defines how data files stored on object storage (S3, ADLS, GCS) are orga...

Read Article
January 15, 2026

A Practical Guide to AI-Assisted Coding Tools

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](ht...

Read Article
January 10, 2026

What Are Recursive Language Models?

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

Read Article
January 6, 2026

RAG Isn’t a Modeling Problem. It’s a Data Engineering Problem.

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

Read Article
January 2, 2026

Building Pangolin - My Holiday Break, an AI IDE, and a Lakehouse Catalog for the Curious

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

Read Article
December 29, 2025

2025 Year in Review Apache Iceberg, Polaris, Parquet, and Arrow

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

Read Article
December 5, 2025

dremioframe & iceberg - Pythonic interfaces for Dremio and Apache Iceberg

Modern data teams want simple tools to work with Iceberg tables and Dremio. Two new Python libraries now make that work easier. The first is DremioFra...

Read Article
November 29, 2025

Introducing dremioframe - A Pythonic DataFrame Interface for Dremio

If you're a data analyst or Python developer who prefers chaining expressive `.select()` and `.mutate()` calls over writing raw SQL, you're going to l...

Read Article
November 12, 2025

Comprehensive Hands-on Walk Through of Dremio Cloud Next Gen (Hands-on with Free Trial)

[Video Playlist of this Walkthrough](https://www.youtube.com/playlist?list=PL-gIUf9e9CCvY0bcRBGu2SzFFR-yJGIB6) On November 13, at the [Subsurface Lake...

Read Article
October 23, 2025

2025-2026 Guide to Learning about Apache Iceberg, Data Lakehouse & Agentic AI

The data world is evolving fast. Just a few years ago, building a modern analytics stack meant stitching together tools, ETL pipelines, and compromise...

Read Article
October 21, 2025

An Exploration of the Commercial Iceberg Catalog Ecosystem

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

Read Article
October 17, 2025

Building a Universal Lakehouse Catalog - Beyond Iceberg Tables

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

Read Article
October 16, 2025

Intro to Apache Iceberg with Apache Polaris and Apache Spark

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

Read Article
October 14, 2025

The State of Apache Iceberg v4 - October 2025 Edition

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

Read Article
September 24, 2025

The Ultimate Guide to Open Table Formats - Iceberg, Delta Lake, Hudi, Paimon, and DuckLake

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

Read Article
September 23, 2025

The 2025 & 2026 Ultimate Guide to the Data Lakehouse and the Data Lakehouse Ecosystem

- [Join the Data Lakehouse Community](https://www.datalakehousehub.com) - [Data Lakehouse Blog Listings](https://lakehouseblogs.com) *Year-end 2025 r...

Read Article
September 17, 2025

Composable Analytics with Agents - Leveraging Virtual Datasets and the Semantic Layer

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Read Article
September 16, 2025

The Endgame — Building an Autonomous Optimization Pipeline for Apache Iceberg

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Read Article
September 9, 2025

Managing Large-Scale Optimizations — Parallelism, Checkpointing, and Fail Recovery

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Read Article
September 5, 2025

Unlocking the Power of Agentic AI with Apache Iceberg and Dremio

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
September 2, 2025

Hidden Pitfalls — Compaction and Partition Evolution in Apache Iceberg

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Read Article
August 26, 2025

Using Iceberg Metadata Tables to Determine When Compaction Is Needed

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Read Article
August 19, 2025

Designing the Ideal Cadence for Compaction and Snapshot Expiration

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Read Article
August 12, 2025

Avoiding Metadata Bloat with Snapshot Expiration and Rewriting Manifests

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Read Article
August 5, 2025

Smarter Data Layout — Sorting and Clustering Iceberg Tables

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Read Article
July 29, 2025

Optimizing Compaction for Streaming Workloads in Apache Iceberg

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Read Article
July 22, 2025

The Basics of Compaction — Bin Packing Your Data for Efficiency

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Read Article
July 15, 2025

The Cost of Neglect — How Apache Iceberg Tables Degrade Without Optimization

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Read Article
July 3, 2025

How to Discover or Organize Lakehouse & Apache Iceberg Meetups

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
June 23, 2025

What is an API? And Why Data Architecture Depends on Them

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
June 18, 2025

Decoding AWS EC2 Instance Type Names

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | What is Data Engineering?

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | Understanding Data Sources and Ingestion

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | ETL vs ELT – Understanding Data Pipelines

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | Batch Processing Fundamentals

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | Streaming Data Fundamentals

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | Data Modeling Basics

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | Data Warehousing Fundamentals

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | Data Lakes Explained

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | Storage Formats and Compression

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | Data Quality and Validation

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | Metadata, Lineage, and Governance

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | Scheduling and Workflow Orchestration

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | Building Scalable Pipelines

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | Cloud Data Platforms and the Modern Stack

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | DevOps for Data Engineering

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | Data Lakehouse Architecture Explained

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | Apache Iceberg, Arrow, and Polaris

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
May 2, 2025

Introduction to Data Engineering Concepts | The Power of Dremio in the Modern Lakehouse

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
April 14, 2025

A Journey from AI to LLMs and MCP - 10 - Sampling and Prompts in MCP — Making Agent Workflows Smarter and Safer

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
April 13, 2025

A Journey from AI to LLMs and MCP - 9 - Tools in MCP — Giving LLMs the Power to Act

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
April 12, 2025

A Journey from AI to LLMs and MCP - 8 - Resources in MCP — Serving Relevant Data Securely to LLMs

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
April 11, 2025

A Journey from AI to LLMs and MCP - 7 - Under the Hood — The Architecture of MCP and Its Core Components

## Free Resources - **[Free Apache Icebe...

Read Article
April 10, 2025

Journey from AI to LLMs and MCP - 6 - Enter the Model Context Protocol (MCP) — The Interoperability Layer for AI Agents

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
April 9, 2025

A Journey from AI to LLMs and MCP - 5 - AI Agent Frameworks — Benefits and Limitations

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
April 8, 2025

A Journey from AI to LLMs and MCP - 4 - What Are AI Agents — And Why They're the Future of LLM Applications

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
April 7, 2025

A Journey from AI to LLMs and MCP - 3 - Boosting LLM Performance — Fine-Tuning, Prompt Engineering, and RAG

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
April 6, 2025

A Journey from AI to LLMs and MCP - 2 - How LLMs Work — Embeddings, Vectors, and Context Windows

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
April 5, 2025

A Journey from AI to LLMs and MCP - 1 - What Is AI and How It Evolved Into LLMs

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
April 4, 2025

Building a Basic MCP Server with Python

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
February 19, 2025

Using Helm with Kubernetes - A Guide to Helm Charts and Their Implementation

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
February 1, 2025

Crash Course on Developing AI Applications with LangChain

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
January 31, 2025

The Data Lakehouse - The Benefits and Enhancing Implementation

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Read Article
January 20, 2025

2025 Comprehensive Guide to Apache Iceberg

- [Free Apache Iceberg Crash Course](https://university.dremio.com/?utm_source=ev_external_blog&utm_medium=influencer&utm_campaign=2025-iceberg-comp-g...

Read Article
January 7, 2025

When to use Apache Xtable or Delta Lake Uniform for Data Lakehouse Interoperability

- [Blog: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-lakehouse-and-a-table-fo...

Read Article
December 9, 2024

2025 Guide to Architecting an Iceberg Lakehouse

- [Blog: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-lakehouse-and-a-table-fo...

Read Article
November 25, 2024

10 Future Apache Iceberg Developments to Look forward to in 2025

- [Blog: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-lakehouse-and-a-table-fo...

Read Article
November 15, 2024

Deep Dive into Dremio's File-based Auto Ingestion into Apache Iceberg Tables

- [Blog: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-lakehouse-and-a-table-fo...

Read Article
November 8, 2024

Intro to SQL using Apache Iceberg and Dremio

- [Blog: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-lakehouse-and-a-table-fo...

Read Article
November 5, 2024

Dremio, Apache Iceberg and their role in AI-Ready Data

- [Blog: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-lakehouse-and-a-table-fo...

Read Article
November 5, 2024

Introduction to Cargo and cargo.toml

When working with Rust, Cargo is your go-to tool for managing dependencies, building, and running your projects. Acting as Rust's package manager and ...

Read Article
November 1, 2024

Leveraging Python's Pattern Matching and Comprehensions for Data Analytics

- [Blog: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-lakehouse-and-a-table-fo...

Read Article
October 31, 2024

Hands-on with Apache Iceberg & Dremio on Your Laptop within 10 Minutes

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=ev_external_b...

Read Article
October 30, 2024

Data Modeling - Entities and Events

Structuring data thoughtfully is critical for both operational efficiency and analytical value. Data modeling helps us define the relationships, const...

Read Article
October 21, 2024

All About Parquet Part 01 - An Introduction

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

Read Article
October 21, 2024

All About Parquet Part 02 - Parquet's Columnar Storage Model

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

Read Article
October 21, 2024

All About Parquet Part 03 - Parquet File Structure | Pages, Row Groups, and Columns

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

Read Article
October 21, 2024

All About Parquet Part 04 - Schema Evolution in Parquet

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

Read Article
October 21, 2024

All About Parquet Part 05 - Compression Techniques in Parquet

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

Read Article
October 21, 2024

All About Parquet Part 06 - Encoding in Parquet | Optimizing for Storage

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

Read Article
October 21, 2024

All About Parquet Part 07 - Metadata in Parquet | Improving Data Efficiency

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

Read Article
October 21, 2024

All About Parquet Part 08 - Reading and Writing Parquet Files in Python

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

Read Article
October 21, 2024

All About Parquet Part 09 - Parquet in Data Lake Architectures

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

Read Article
October 21, 2024

All About Parquet Part 10 - Performance Tuning and Best Practices with Parquet

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

Read Article
October 19, 2024

Orchestrating Airflow DAGs with GitHub Actions - A Lightweight Approach to Data Curation Across Spark, Dremio, and Snowflake

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

Read Article
October 19, 2024

A Deep Dive Into GitHub Actions From Software Development to Data Engineering

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_content=alexmerced&u...

Read Article
October 18, 2024

A Guide to dbt Macros - Purpose, Benefits, and Usage

- [Apache Iceberg 101](https://www.dremio.com/lakehouse-deep-dives/apache-iceberg-101/?utm_source=ev_external_blog&utm_medium=influencer&utm_campaign=...

Read Article
October 16, 2024

Data Lakehouse Roundup 1 - News and Insights on the Lakehouse

I’m excited to kick off a new series called "Data Lakehouse Roundup," where I’ll cover the latest developments in the data lakehouse space, approximat...

Read Article
October 15, 2024

Getting Started with Data Analytics Using PyArrow in Python

- [Apache Iceberg Crash Course: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-l...

Read Article
October 9, 2024

What is Three-Tier Data (Bronze, Silver, Gold) and How Dremio Simplifies It

- [Apache Iceberg 101](https://www.dremio.com/lakehouse-deep-dives/apache-iceberg-101/?utm_source=ev_external_blog&utm_medium=influencer&utm_campaign=...

Read Article
October 7, 2024

A Brief Guide to the Governance of Apache Iceberg Tables

- [Apache Iceberg Crash Course: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-l...

Read Article
October 7, 2024

Exploring Data Operations with PySpark, Pandas, DuckDB, Polars, and DataFusion in a Python Notebook

- [Apache Iceberg Crash Course: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-l...

Read Article
October 5, 2024

Ultimate Directory of Apache Iceberg Resources

This article is a comprehensive directory of Apache Iceberg resources, including educational materials, tutorials, and hands-on exercises. Whether you...

Read Article
October 4, 2024

Change Data Capture (CDC) when there is no CDC

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&u...

Read Article
September 25, 2024

Virtualization + Lakehouse + Mesh = Data At Scale

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=ev_external_...

Read Article
September 22, 2024

Deep Dive into Data Apps with Streamlit

# Introduction The ability to quickly develop and deploy interactive applications is invaluable. **Streamlit** is a powerful tool that enables data s...

Read Article
September 21, 2024

A Deep Dive into Docker Compose

## Understanding the Docker Compose File Structure Docker Compose uses a YAML file (`docker-compose.yml`) to define services, networks, and volumes t...

Read Article
September 12, 2024

Hands-on with Apache Iceberg on Your Laptop - Deep Dive with Apache Spark, Nessie, Minio, Dremio, Polars and Seaborn

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=ev_external_...

Read Article
September 10, 2024

Why Data Analysts, Engineers, Architects and Scientists Should Care about Dremio and Apache Iceberg

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=ev_external_...

Read Article
September 1, 2024

5 Trends in the Data Lakehouse Space

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=ev_external_...

Read Article
August 30, 2024

Using the alexmerced/datanotebook Docker Image

- [Watch My Intro to Data Playlist](https://www.youtube.com/watch?v=nq8ETrTgT7o&list=PLsLAVBjQJO0p_4Nqz99tIjeoDYE97L0xY&pp=iAQB) - [Download Free Copy...

Read Article
August 29, 2024

Understanding Apache Iceberg Delete Files

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=ev_external_...

Read Article
August 27, 2024

Understanding the Apache Iceberg Manifest

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=ev_external_...

Read Article
August 25, 2024

Understanding the Apache Iceberg Manifest List (Snapshot)

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=ev_external_...

Read Article
August 21, 2024

Understanding Apache Iceberg's Metadata.json

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&u...

Read Article
August 18, 2024

What Apache Iceberg REST Catalog is and isn't

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&u...

Read Article
August 15, 2024

ACID Guarantees and Apache Iceberg - Turning Any Storage into a Data Warehouse

Apache Iceberg has become a prominent name in the data world, with numerous platforms integrating support for Iceberg tables as part of the growing op...

Read Article
August 5, 2024

Data Lakehouse 101 - The Who, What and Why of Data Lakehouses

- [Sign-up for this free Apache Iceberg Crash Course](https://bit.ly/am-2024-iceberg-live-crash-course-1) - [Get a free copy of Apache Iceberg the Def...

Read Article
July 31, 2024

Understanding the Polaris Iceberg Catalog and Its Architecture

NOTE: I am working on a hands-on tutorial for Polaris, so please watch for the [Dremio Blog](https://www.dremio.com/blog) in the coming days. Also, ch...

Read Article
July 26, 2024

Apache Iceberg Reliability

- [Get a Free Copy of "Apache Iceberg: The Definitive Guide"](https://bit.ly/am-iceberg-book) - [Sign Up for the Free Apache Iceberg Crash Course](htt...

Read Article
July 20, 2024

Upcoming Data Talks from Alex Merced (And how to follow)

In this article, I will provide you with a list of events I'm currently scheduled to speak at. New events are regularly being added, so here are a cou...

Read Article
July 12, 2024

Databases Deconstructed - The Value of Data Lakehouses and Table Formats

- [Check out my Apache Iceberg Crash Course](https://bit.ly/am-2024-iceberg-live-crash-course-1) - [Get a free copy of Apache Iceberg the Definitiv...

Read Article
June 26, 2024

Video Course - Basics of Lakehouse Engineering - Apache Iceberg, Nessie, Dremio

[Get a Free Copy of "Apache Iceberg: The Definitive Guide"](https://bit.ly/am-iceberg-book) ## #1 - Intro - Basics of Lakehouse Engineering - Apache ...

Read Article
May 29, 2024

Partitioning with Apache Iceberg - A Deep Dive

- [Apache Iceberg 101](https://www.dremio.com/blog/apache-iceberg-101-your-guide-to-learning-apache-iceberg-concepts-and-practices/) - [Get Hands-on W...

Read Article
May 15, 2024

3 Reasons Data Engineers Should Embrace Apache Iceberg

Data engineers are constantly seeking ways to streamline workflows and enhance data management efficiency. [Apache Iceberg, a high-performance table f...

Read Article
May 3, 2024

Running SQL on your Excel Files From Your Laptop with Dremio

Being able to quickly analyze and gain insights from your data is crucial. Excel is widely used for data storage, but when it comes to complex queries...

Read Article
April 4, 2024

Understanding the Future of Apache Iceberg Catalogs

[Apache Iceberg](https://www.dremio.com/blog/apache-iceberg-101-your-guide-to-learning-apache-iceberg-concepts-and-practices/) is revolutionizing the ...

Read Article
April 4, 2024

A Deep Intro to Apache Iceberg and Resources for Learning More

For a long time, siloed data systems such as databases and data warehouses were sufficient. These systems provided convenient abstractions for various...

Read Article
April 1, 2024

End-to-End Basic Data Engineering Tutorial (Spark, Dremio, Superset)

Data engineering aims to make data accessible and usable for data analytics and data science purposes. This involves several key aspects: - Transferr...

Read Article
March 19, 2024

5 Open Source Data Projects You Should Be Following

[Follow Me On Social](https://bio.alexmerced.com/data) [Subscribe to my SubStack](https://amdatalakehouse.substack.com) Open source technology signif...

Read Article
March 9, 2024

5 Reasons Dremio is the Ideal Apache Iceberg Lakehouse Platform

[The Apache Iceberg table format](https://www.dremio.com/blog/apache-iceberg-101-your-guide-to-learning-apache-iceberg-concepts-and-practices/) has se...

Read Article
March 6, 2024

The Apache Iceberg Lakehouse - The Great Data Equalizer

> [Get a Free Copy of "Apache Iceberg: The Definitive Guide"](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html) > [Build an I...

Read Article
March 1, 2024

10 Reasons to Make Apache Iceberg and Dremio Part of Your Data Lakehouse Strategy

> [Get a Free Copy of "Apache Iceberg: The Definitive Guide"](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html) > [Build an I...

Read Article
March 1, 2024

A deep dive into the concept and world of Apache Iceberg Catalogs

> [Get a Free Copy of "Apache Iceberg: The Definitive Guide"](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html) > [Build an I...

Read Article
February 24, 2024

Introduction to ANSI SQL - Understanding the Syntax and Concepts

[Subscribe to my Data Youtube Channel and Podcasts, Links Here](https://bio.alexmerced.com/data) [Subscribe to my web development youtube channel and...

Read Article
February 24, 2024

The Role of Ontologies in Data Management

The concept of ontologies plays a pivotal role in organizing and making sense of the vast information available. In data management, ontologies are cr...

Read Article
February 21, 2024

What is the Data Lakehouse and the Role of Apache Iceberg, Nessie and Dremio?

Organizations are constantly seeking more efficient, scalable, and flexible solutions to manage their ever-growing data assets. This quest has led to ...

Read Article
February 12, 2024

Partitioning Practices in Apache Hive and Apache Iceberg

# Partitioning Practices in Apache Hive and Apache Iceberg ## Introduction The efficiency of query execution is paramount. One of the key strategies ...

Read Article
February 3, 2024

Columnar vs. Row-based Data Structures in OLTP and OLAP Systems

[Follow my Data Youtube Channel](https://www.youtube.com/@alexmerceddata) The decision between using columnar and row-based data structures can signi...

Read Article
February 2, 2024

Introduction to Data Vault Modeling

[Subscribe to my Data Youtube Channel and Podcasts, Links Here](https://bio.alexmerced.com/data) Data Vault modeling is an approach to data warehouse...

Read Article
February 2, 2024

Table Format FUD - Thinking Through the Table Format Conversion (Apache Iceberg, Apache Hudi, Delta Lake)

## Context This article is meant to be a sober reflection on the data lakehouse table format conversation I have had as a participant over the last t...

Read Article
January 25, 2024

Embracing the Future of Data Management - Why Choose Lakehouse, Iceberg, and Dremio?

Data is not just an asset but the cornerstone of business strategy. The way we manage, store, and process this invaluable resource has evolved dramati...

Read Article
January 19, 2024

Open Lakehouse Engineering/Apache Iceberg Lakehouse Engineering - A Directory of Resources

The concept of the **Open Lakehouse** has emerged as a beacon of flexibility and innovation. An Open Lakehouse represents a specialized form of data lake...

Read Article
January 8, 2024

Nessie - An Alternative to Hive & JDBC for Self-Managed Apache Iceberg Catalogs

Unlike traditional table formats, Apache Iceberg provides a comprehensive solution for handling big data's complexity, volume, and diversity. It's des...

Read Article
January 3, 2024

Apache Iceberg, Git-Like Catalog Versioning and Data Lakehouse Management - Pillars of a Robust Data Lakehouse Platform

Managing vast amounts of data efficiently and effectively is crucial for any organization aiming to leverage its data for strategic decisions. The key...

Read Article