Welcome to the intermediate DataDev Quest Challenge for May 2025! This challenge is designed to help you learn how to use Tableau Hyper API to create datasources for your dashboards.
Difficulty Level: Intermediate
Time Estimate: 45 minutes
Technology: Python, Tableau, Hyper API, pantab
Challenge Overview
Objective
Hey there, data wizard! Ready to level up your Tableau skills? Today we’re tackling something super practical: converting multiple Excel files into Tableau’s lightning-fast Hyper format using Python, then running SQL queries against those Hyper files!
You’ll implement two different approaches to accomplish these tasks and compare their performance and functionality.
Why this challenge matters
In the real world, data rarely comes from just one source, right? Being able to combine multiple Excel files into a single Hyper database and then query that data is a game-changer for:
- Getting those cross-departmental insights without the spreadsheet chaos
- Building dashboards that don’t make your users fall asleep waiting for them to load
- Setting up automated data pipelines that handle all your Excel sources
- Filtering and transforming data using SQL before it even hits your Tableau dashboard
Master this workflow and watch your colleagues suddenly want to “pick your brain” about their data challenges!
Learning Goals
After crushing this challenge, you’ll be able to:
- Work with multiple data sources and combine them into a single Tableau Hyper file
- Run SQL queries directly against Hyper files (seriously cool stuff!)
- Understand when to use Pantab vs. Hyper API for different scenarios
- Create more efficient data workflows for your Tableau projects
- Impress your colleagues with your Python-Tableau integration skills
These skills are pure gold for anyone working with data visualization!
Submission Guidelines
- Source Code: Publish your project publicly on your GitHub profile
- Add README: Include setup instructions and describe how to run the program.
- Video of Solution: Include a video of your solution in the README file, either published on YouTube and embedded as an iframe, or saved as a video file in the root directory of the repository.
- Comments: Ensure your code is well-commented.
- Submission: Submit your challenge via the following **form**
Additional Resources
- Tableau Hyper API Documentation
- Pantab GitHub Repository
- Pandas Documentation
- Tableau Developer Program
The Challenge
Here’s the scenario: you’ve got two Excel files, “RWFD_Supply_Chain.xlsx” with the “OrderList” sheet and “RWFD_Solar_Energy.xlsx” with the “Actuals” sheet. (You can find both datasets here!) Your mission is to:
- Convert BOTH Excel files into a SINGLE Tableau Hyper file using TWO different methods:
- Using the Pantab library – perfect for quick conversions
- Using the Tableau Hyper API – giving you maximum control
- Run SQL queries against your Hyper files to filter data:
- Filter OrderList for items with Weight > 50
- Filter Actuals for locations with Latitude < 40
- Save the query results back to Hyper files (in the same file, overwrite it!)
For each method, you’ll:
- Pull both Excel datasets into pandas DataFrames
- Create a Hyper file containing both tables
- Run SQL queries against the data
- Save the filtered results
- Time each process (let’s see which approach wins the speed race!)
Getting Started
Prerequisites
Before diving in, make sure you’ve got:
- Python fundamentals down (variables, loops, functions, etc.)
- Familiarity with pandas for data manipulation
- Basic SQL knowledge (SELECT, WHERE clauses)
- Python 3.6 or newer installed
- These Python packages ready to go:
- pandas (for data wrangling)
- pantab (for the easy-mode conversion)
- tableauhyperapi (for the full-control approach)
Don’t have these installed? No worries! Just run:
pip install pandas pantab tableauhyperapi
Challenge Instructions
Part 1: Setup and Data Loading
First, let’s get everything organized:
- Install the required packages:
pip install pandas pantab tableauhyperapi
- Create a new Python script or Jupyter notebook
- Import the necessary libraries:
import pandas as pd
import time
# Use this lib for Version 1
import pantab
# Use this lib for Version 2
from tableauhyperapi import HyperProcess, Connection, Telemetry, TableDefinition, SqlType, TableName, Inserter, CreateMode, Nullability
- Define paths to your Excel files:
# Path to the first Excel file
excel_path1 = "path/to/your/RWFD_Supply_Chain.xlsx"
sheet_name1 = "OrderList"
# Path to the second Excel file
excel_path2 = "path/to/your/RWFD_Solar_Energy.xlsx"
sheet_name2 = "Actuals"
- Read both Excel files into pandas DataFrames:
# Read the Excel files into DataFrames
df1 = pd.read_excel(excel_path1, sheet_name=sheet_name1)
df2 = pd.read_excel(excel_path2, sheet_name=sheet_name2)
Part 2: Method 1 – The Pantab Approach
Now let’s implement our first method using pantab:
- Set up a path for your Hyper file
- Time the process with `time.time()`
- Use `pantab.frames_to_hyper()` to create a Hyper file with both tables
- Write SQL queries to filter each table (Weight > 50 for OrderList, Latitude < 40 for Actuals)
- Use `pantab.frame_from_hyper_query()` to execute each SQL query
- Save the filtered results back to a Hyper file
- Calculate and print the processing times
The pantab approach is like using a food processor: quick, efficient, and gets the job done with minimal fuss!
Part 3: Method 2 – The Hyper API Approach
Time for the more detailed approach:
- Define a path for your second Hyper file
- Start timing with `time.time()`
- Use HyperProcess and Connection to create a Hyper database
- Create appropriate table schemas based on your DataFrame columns
- Insert data from both DataFrames into your tables
- Run SQL queries against your tables using the Hyper API
- Create a new Hyper file with the filtered results
- Calculate and print the processing times
This approach is like custom-crafting your database: more steps, but ultimate control!
Part 4: Compare Results
Now for the analysis:
- Compare the processing times for both methods
- Think about which approach was easier to implement
- Consider which method gives you more flexibility
- Reflect on when you would choose one method over the other
It’s like comparing automatic vs. manual transmission: each has its perfect use case!
Solution Expectations
Your awesome solution should:
- Successfully create Hyper files containing both tables using both methods
- Run SQL queries against both tables and save filtered results
- Include timing information for performance comparison
- Handle different data types appropriately in both approaches
- Follow good practices for resource management (closing connections, etc.)
Testing Your Solution
Wanna make sure everything worked? Try this:
- Open both resulting Hyper files in Tableau Desktop
- Check that both tables are present and contain the expected data
- Verify that your filtered results match the query criteria:
- OrderList items have Weight > 50
- Actuals locations have Latitude < 40
- Ensure all data types are preserved correctly
If everything looks good in Tableau, you’ve nailed it!
Common Mistakes to Avoid
Keep an eye out for these potential pitfalls:
- Forgetting to close Hyper connections properly
- Mismatching data types when using the Hyper API
- Using incorrect schema or table names in SQL queries
- Not handling potential null values in your data
- Forgetting to execute the Inserter when adding rows
Extensions and Variations
Want to push your skills further? Try these bonus challenges:
- Add a join or union query that combines data from both tables
- Implement aggregate queries (COUNT, SUM, AVG) against your data
- Create a simple command-line tool that allows users to specify their own SQL filters
- Optimize the code for very large datasets
- Add error handling for common issues like missing columns or data type mismatches
Solution
Coming soon…
Special Thanks
To Marcelo Has and Cristian Saavedra Desmoineaux, the dynamic duo who invited me to create this challenge.
To William Ayd, one of the biggest contributors to pantab, a true DataDev Legend!
To DataDevQuest Organizers and DataDev Ambassadors team who keep pushing the boundaries of what’s possible with data.
Who am I?

I’m Nik Dutra, data enthusiast, Tableau lover, and proud DataDev Ambassador. I’m passionate about helping everyone level up their data skills!
Connect on LinkedIn or find me in the DataDev Slack as @nikdutra.
Happy coding, and may the data be with you!