Welcome to the intermediate DataDev Quest Challenge for May 2025! This challenge is designed to help you learn how to use Tableau Hyper API to create datasources for your dashboards.

Difficulty Level: Intermediate
Time Estimate: 45 minutes
Technology: Python, Tableau, hyperapi, pantab


Challenge Overview

Objective

Hey there, data wizard! Ready to level up your Tableau skills? Today we’re tackling something super practical, converting multiple Excel files into Tableau’s lightning-fast Hyper format using Python, and then running SQL queries against those Hyper files!

You’ll implement two different approaches to accomplish these tasks and compare their performance and functionality.

Why this challenge matters

In the real world, data rarely comes from just one source, right? Being able to combine multiple Excel files into a single Hyper database and then query that data is a game-changer for:

Master this workflow and watch your colleagues suddenly want to “pick your brain” about their data challenges!

Learning Goals

After crushing this challenge, you’ll be able to:

These skills are pure gold for anyone working with data visualization!

Submission Guidelines

Additional Resources


The Challenge

Here’s the scenario: You’ve got two Excel files: “RWFD_Supply_Chain.xlsx” with the “OrderList” sheet, and “RWFD_Solar_Energy.xlsx” with the “Actuals” sheet. Your mission is to:

You can find both datasets here!

  1. Convert BOTH Excel files into a SINGLE Tableau Hyper file using TWO different methods:
    • Using the Pantab library – perfect for quick conversions
    • Using the Tableau Hyper API – giving you maximum control
  2. Run SQL queries against your Hyper files to filter data:
    • Filter OrderList for items with Weight > 50
    • Filter Actuals for locations with Latitude < 40
  3. Save the query results back to Hyper files (in the same file, overwrite it!)

For each method, you’ll:


Getting Started

Prerequisites

Before diving in, make sure you’ve got:

Don’t have these installed? No worries! Just run:

pip install pandas pantab tableauhyperapi
Python

Challenge Instructions

Part 1: Setup and Data Loading

First, let’s get everything organized:

pip install pandas pantab tableauhyperapi
Python
 import pandas as pd
 import time
 # Use this lib for Version 1
 import pantab 
 # Use this lib for Version 2
 from tableauhyperapi import HyperProcess, Connection, Telemetry, TableDefinition, SqlType, TableName, Inserter, CreateMode, Nullability
Python
# Path to the first Excel file
excel_path1 = "path/to/your/RWFD_Supply_Chain.xlsx"
sheet_name1 = "OrderList"
 
# Path to the second Excel file
excel_path2 = "path/to/your/RWFD_Solar_Energy.xlsx" 
sheet_name2 = "Actuals"
Python
# Read the Excel files into DataFrames
df1 = pd.read_excel(excel_path1, sheet_name=sheet_name1)
df2 = pd.read_excel(excel_path2, sheet_name=sheet_name2)
Python

Part 2: Method 1 – The Pantab Approach

Now let’s implement our first method using pantab:

  1. Set up a path for your Hyper file
  2. Time the process with `time.time()`
  3. Use `pantab.frames_to_hyper()` to create a Hyper file with both tables
  4. Write SQL queries to filter each table (Weight > 50 for OrderList, Latitude < 40 for Actuals)
  5. Use `pantab.frame_from_hyper_query()` to execute each SQL query
  6. Save the filtered results back to a Hyper file
  7. Calculate and print the processing times

The pantab approach is like using a food processor: quick, efficient, and gets the job done with minimal fuss!

Part 3: Method 2 – The Hyper API Approach

Time for the more detailed approach:

  1. Define a path for your second Hyper file
  2. Start timing with `time.time()`
  3. Use HyperProcess and Connection to create a Hyper database
  4. Create appropriate table schemas based on your DataFrame columns
  5. Insert data from both DataFrames into your tables
  6. Run SQL queries against your tables using the Hyper API
  7. Create a new Hyper file with the filtered results
  8. Calculate and print the processing times

This approach is like custom-crafting your database, more steps but ultimate control!

Part 4: Compare Results

Now for the analysis:

  1. Compare the processing times for both methods
  2. Think about which approach was easier to implement
  3. Consider which method gives you more flexibility
  4. Reflect on when you would choose one method over the other

It’s like comparing automatic vs. manual transmission, each has its perfect use case!

Solution Expectations

Your awesome solution should:

Testing Your Solution

Wanna make sure everything worked? Try this:

  1. Open both resulting Hyper files in Tableau Desktop
  2. Check that both tables are present and contain the expected data
  3. Verify that your filtered results match the query criteria:
    • OrderList items have Weight > 50
    • Actual locations have Latitude < 40
  4. Ensure all data types are preserved correctly

If everything looks good in Tableau, you’ve nailed it!

Common Mistakes to Avoid

Keep an eye out for these potential pitfalls:

Extensions and Variations

Want to push your skills further? Try these bonus challenges:


Solution

Coming soon…


Special Thanks

To Marcelo Has and Cristian Saavedra Desmoineaux, the dynamic duo who invited me to create this challenge.
To William Ayd, one of the biggest contributors to pantab, a true DataDev Legend!
To DataDevQuest Organizers and DataDev Ambassadors team who keep pushing the boundaries of what’s possible with data.


Who am I?

I’m Nik Dutra, data enthusiast, Tableau lover, and proud DataDev Ambassador. I’m passionate about helping everyone level up their data skills!
Connect on LinkedIn or find me in the DataDev Slack as @nikdutra.

Happy coding, and may the data be with you!

Leave a Reply

Your email address will not be published. Required fields are marked *