#DDQ2025-05 Multi-Source Excel data to Tableau Hyper with SQL Queries (Intermediate)

Welcome to the intermediate DataDev Quest Challenge for May 2025! This challenge is designed to help you learn how to use Tableau Hyper API to create datasources for your dashboards.

Difficulty Level: Intermediate
Time Estimate: 45 minutes
Technology: Python, Tableau, hyperapi, pantab

Challenge Overview

Objective

Hey there, data wizard! Ready to level up your Tableau skills? Today we’re tackling something super practical, converting multiple Excel files into Tableau’s lightning-fast Hyper format using Python, and then running SQL queries against those Hyper files!

You’ll implement two different approaches to accomplish these tasks and compare their performance and functionality.

Why this challenge matters

In the real world, data rarely comes from just one source, right? Being able to combine multiple Excel files into a single Hyper database and then query that data is a game-changer for:

Getting those cross-departmental insights without the spreadsheet chaos
Building dashboards that don’t make your users fall asleep waiting for them to load
Setting up automated data pipelines that handle all your Excel sources
Filtering and transforming data using SQL before it even hits your Tableau dashboard

Master this workflow and watch your colleagues suddenly want to “pick your brain” about their data challenges!

Learning Goals

After crushing this challenge, you’ll be able to:

Work with multiple data sources and combine them into a single Tableau Hyper file
Run SQL queries directly against Hyper files (seriously cool stuff!)
Understand when to use Pantab vs. Hyper API for different scenarios
Create more efficient data workflows for your Tableau projects
Impress your colleagues with your Python-Tableau integration skills

These skills are pure gold for anyone working with data visualization!

Submission Guidelines

Source Code: Publish your project publically in the GitHub profile
Add README: Include setup instructions and describe how to run the program.
Video of Solution: Include a video of your solution in the README file, which you can publish on YouTube and embed the iframe. Or save the video file in the directory root inside the repository.
Comments: Ensure your code is well-commented.
Submission: Submit your challenge in the following **forms**

Additional Resources

The Challenge

Here’s the scenario: You’ve got two Excel files: “RWFD_Supply_Chain.xlsx” with the “OrderList” sheet, and “RWFD_Solar_Energy.xlsx” with the “Actuals” sheet. Your mission is to:

You can find both datasets here!

Convert BOTH Excel files into a SINGLE Tableau Hyper file using TWO different methods:
- Using the Pantab library – perfect for quick conversions
- Using the Tableau Hyper API – giving you maximum control
Run SQL queries against your Hyper files to filter data:
- Filter OrderList for items with Weight > 50
- Filter Actuals for locations with Latitude < 40
Save the query results back to Hyper files (in the same file, overwrite it!)

For each method, you’ll:

Pull both Excel datasets into pandas DataFrames
Create a Hyper file containing both tables
Run SQL queries against the data
Save the filtered results
Time each process (let’s see which approach wins the speed race!)

Getting Started

Prerequisites

Before diving in, make sure you’ve got:

Python fundamentals down (variables, loops, functions, etc.)
Familiarity with pandas for data manipulation
Basic SQL knowledge (SELECT, WHERE clauses)
Python 3.6 or newer installed
These Python packages are ready to go:
- pandas (for data wrangling)
- pantab (for the easy-mode conversion)

Don’t have these installed? No worries! Just run:

pip install pandas pantab tableauhyperapi

Python

Challenge Instructions

Part 1: Setup and Data Loading

First, let’s get everything organized:

Install the required packages:

pip install pandas pantab tableauhyperapi

Python

Create a new Python script or Jupyter notebook
Import the necessary libraries:

 import pandas as pd
 import time
 # Use this lib for Version 1
 import pantab 
 # Use this lib for Version 2
 from tableauhyperapi import HyperProcess, Connection, Telemetry, TableDefinition, SqlType, TableName, Inserter, CreateMode, Nullability

Python

Define paths to your Excel files:

# Path to the first Excel file
excel_path1 = "path/to/your/RWFD_Supply_Chain.xlsx"
sheet_name1 = "OrderList"
 
# Path to the second Excel file
excel_path2 = "path/to/your/RWFD_Solar_Energy.xlsx" 
sheet_name2 = "Actuals"

Python

Read both Excel files into pandas DataFrames:

# Read the Excel files into DataFrames
df1 = pd.read_excel(excel_path1, sheet_name=sheet_name1)
df2 = pd.read_excel(excel_path2, sheet_name=sheet_name2)

Python

Part 2: Method 1 – The Pantab Approach

Now let’s implement our first method using pantab:

Set up a path for your Hyper file
Time the process with `time.time()`
Use `pantab.frames_to_hyper()` to create a Hyper file with both tables
Write SQL queries to filter each table (Weight > 50 for OrderList, Latitude < 40 for Actuals)
Use `pantab.frame_from_hyper_query()` to execute each SQL query
Save the filtered results back to a Hyper file
Calculate and print the processing times

The pantab approach is like using a food processor: quick, efficient, and gets the job done with minimal fuss!

Part 3: Method 2 – The Hyper API Approach

Time for the more detailed approach:

Define a path for your second Hyper file
Start timing with `time.time()`
Use HyperProcess and Connection to create a Hyper database
Create appropriate table schemas based on your DataFrame columns
Insert data from both DataFrames into your tables
Run SQL queries against your tables using the Hyper API
Create a new Hyper file with the filtered results
Calculate and print the processing times

This approach is like custom-crafting your database, more steps but ultimate control!

Part 4: Compare Results

Now for the analysis:

Compare the processing times for both methods
Think about which approach was easier to implement
Consider which method gives you more flexibility
Reflect on when you would choose one method over the other

It’s like comparing automatic vs. manual transmission, each has its perfect use case!

Solution Expectations

Your awesome solution should:

Successfully create Hyper files containing both tables using both methods
Run SQL queries against both tables and save filtered results
Include timing information for performance comparison
Handle different data types appropriately in both approaches
Follow good practices for resource management (closing connections, etc.)

Testing Your Solution

Wanna make sure everything worked? Try this:

Open both resulting Hyper files in Tableau Desktop
Check that both tables are present and contain the expected data
Verify that your filtered results match the query criteria:
- OrderList items have Weight > 50
- Actual locations have Latitude < 40
Ensure all data types are preserved correctly

If everything looks good in Tableau, you’ve nailed it!

Common Mistakes to Avoid

Keep an eye out for these potential pitfalls:

Forgetting to close Hyper connections properly
Mismatching data types when using the Hyper API
Using incorrect schema or table names in SQL queries
Not handling potential null values in your data
Forgetting to execute the Inserter when adding rows

Extensions and Variations

Want to push your skills further? Try these bonus challenges:

Add a join or union query that combines data from both tables
Implement aggregate queries (COUNT, SUM, AVG) against your data
Create a simple command-line tool that allows users to specify their own SQL filters
Optimize the code for very large datasets
Add error handling for common issues like missing columns or data type mismatches

Solution

Coming soon…

Special Thanks

To Marcelo Has and Cristian Saavedra Desmoineaux, the dynamic duo who invited me to create this challenge.
To William Ayd, one of the biggest contributors to pantab, a true DataDev Legend!
To DataDevQuest Organizers and DataDev Ambassadors team who keep pushing the boundaries of what’s possible with data.

Who am I?

I’m Nik Dutra, data enthusiast, Tableau lover, and proud DataDev Ambassador. I’m passionate about helping everyone level up their data skills!
Connect on LinkedIn or find me in the DataDev Slack as @nikdutra.

Happy coding, and may the data be with you!

Tagged Intermediate

#DDQ2025-05 Multi-Source Excel data to Tableau Hyper with SQL Queries (Intermediate)

Challenge Overview

Objective

Why this challenge matters

Learning Goals

Submission Guidelines

Additional Resources

The Challenge

Getting Started

Prerequisites

Challenge Instructions

Part 1: Setup and Data Loading

Part 2: Method 1 – The Pantab Approach

Part 3: Method 2 – The Hyper API Approach

Part 4: Compare Results

Solution Expectations

Testing Your Solution

Common Mistakes to Avoid

Extensions and Variations

Solution

Special Thanks

Who am I?

Leave a Reply Cancel reply

About

Categories

Monthly Newsletter