Hyper API Beginner

#DDQ2025-05 Convert Excel to Tableau Hyper Files (Beginner)

Hey there, data enthusiast! Today, you’re going to learn how to convert Excel data into Tableau’s speedy Hyper format using Python.

Nik Dutra

Author

2025-05-29T00:00:42
7 min read

Welcome to the beginner DataDev Quest Challenge for May 2025! This challenge is designed to help you learn how to use Tableau Hyper API to create datasources for your dashboards.

Difficulty Level: Beginner
Time Estimate: 30 minutes
Technology: Python, Tableau, Tableau Hyper API, Pantab


Challenge Overview

Objective

Hey there, data enthusiast! Today, we’re tackling a super useful skill for anyone working with Tableau – converting Excel data into Tableau’s speedy Hyper format using Python.

You’ll implement two different approaches to accomplish this task and compare their performance and ease of use.

Why this challenge matters

Let’s be real, nobody likes waiting for dashboards to load, right? That’s where Tableau’s Hyper format comes in clutch! Converting your Excel files to Hyper gives you:

  • Dashboards that load faster than your boss can say “is it ready yet?”
  • Queries that run so quickly you’ll have time for that extra coffee break
  • The ability to handle chunky datasets without your computer having a meltdown
  • Data refreshes that don’t take forever and a day

Trust me, your future self (and your teammates) will thank you for mastering this skill!

Learning Goals

Once you crush this challenge, you’ll be able to:

  • Read Excel data using pandas like a pro
  • Convert data to Tableau Hyper format in not one but TWO cool ways
  • Flex your Tableau Hyper API muscles
  • Make smart decisions about which approach to use when

These are seriously handy skills that’ll make you look like a data wizard!

Submission Guidelines

  • Source Code: Publish your project publicly on your GitHub profile
  • Add README: Include setup instructions and describe how to run the program.
  • Video of Solution: Include a video of your solution in the README, either published on YouTube and embedded as an iframe, or saved as a file in the repository root.
  • Comments: Ensure your code is well-commented.
  • Submission: Submit your challenge using the following forms

Additional Resources


The Challenge

So here’s the deal: You’ve got an Excel file called RWFD_Supply_Chain.xlsx with a bunch of order data in the “OrderList” sheet. Your mission (should you choose to accept it) is to convert this Excel data to Tableau Hyper format using TWO different methods:

You can find the dataset here!

  1. Using the Pantab library – think of this as the “easy button” way
  2. Using the Tableau Hyper API – the “I want more control” way

For each method, you’ll:

  • Pull the Excel data into a pandas DataFrame
  • Work your magic to create a shiny new Hyper file
  • Time how long each method takes (let’s see which one’s faster!)
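To keep the race fair, you can measure both methods with the same tiny timing helper. Here's a sketch (the `timed` name is just illustrative):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Then each conversion becomes, say, `_, secs = timed(pantab.frame_to_hyper, df, "out.hyper", table="OrderList")`, and the two `secs` values are directly comparable.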

Getting Started

Prerequisites

Before diving in, make sure you’ve got:

  • Python basics under your belt (you know, variables, functions, that sort of thing)
  • Some familiarity with pandas (our trusty data manipulation sidekick)
  • Python 3.6 or newer installed
  • These Python packages are ready to rock:
    • pandas
    • pantab
    • tableauhyperapi

Don’t have these installed? No worries! Just run:

CODE
pip install pandas pantab tableauhyperapi

Challenge Instructions

Part 1: Setup and Data Loading

  • Install the packages:
CODE
pip install pandas pantab tableauhyperapi
  • Start a new Python script or Jupyter notebook
  • Import the libraries:
CODE
import pandas as pd
import time
import pantab
from tableauhyperapi import HyperProcess, Connection, Telemetry, TableDefinition, SqlType, TableName, Inserter, CreateMode, Nullability
  • Set the path to your Excel file:
CODE
excel_path = "path/to/your/RWFD_Supply_Chain.xlsx"
sheet_name = "OrderList"
  • Read the Excel data into a DataFrame:
CODE
df = pd.read_excel(excel_path, sheet_name=sheet_name)

Part 2: Method 1 – The Pantab Express Lane

This is the quick and easy route. Just:

  1. Decide where you want your Hyper file to go
  2. Use the awesome pantab.frame_to_hyper() function to work its magic
  3. Time how long it takes (we’re keeping score, remember?)

This approach is like using a microwave — just push a button and done! 👨‍🍳

Part 3: Method 2 – The Hyper API Adventure

This is the scenic route with more control. Steps:

  1. Pick a spot for your second Hyper file
  2. Create a table definition that matches your DataFrame columns
  3. Use the Hyper API to create your Hyper file with the perfect schema
  4. Get all your data from the DataFrame into the Hyper file
  5. Time this process too!

This approach is more like cooking from scratch – more steps, but hey, you control the seasonings! 🧂

Part 4: Compare Results

Now for the fun part:

  1. Which method was faster? By how much?
  2. Which code was easier to write?
  3. Think about when you might prefer one method over the other

It’s like comparing fast food to home cooking – each has its time and place!
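Once you have both timings, the comparison itself is one line of arithmetic. A sketch with stand-in numbers (not real measurements):

```python
# Hypothetical timings from the two runs, in seconds
t_pantab, t_hyperapi = 0.8, 1.4

faster = "Pantab" if t_pantab < t_hyperapi else "Hyper API"
ratio = max(t_pantab, t_hyperapi) / min(t_pantab, t_hyperapi)
print(f"{faster} was {ratio:.2f}x faster")  # → Pantab was 1.75x faster
```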

Solution Expectations

Your awesome solution should:

  • Successfully turn your Excel data into Hyper files both ways
  • Include some timing info so we can see which method wins the race
  • Work with any similar Excel file (because reusable code rocks!)
  • Handle different data types appropriately

Testing Your Solution

Wanna make sure everything worked? Try this:

  1. Open up Tableau Desktop
  2. Connect to each of your fresh Hyper files
  3. Make sure all your data made it across safely
  4. Check that dates still look like dates, numbers like numbers, etc. (this one's very important!)

If Tableau loads your data without complaining, you’re golden!

Common Mistakes to Avoid

Watch out for these gotchas:

  • Getting data types mixed up when using the Hyper API (numbers aren’t text, people!)
  • Forgetting to close your connections (nobody likes a memory leak)
  • Using the wrong schema name (psst… Tableau likes “Extract”)
  • Skipping error handling (because data is rarely perfectly behaved)

Extensions and Variations

Finished already? You overachiever! Try these bonus challenges:

  • Make your code handle multiple Excel sheets (the more the merrier!)
  • Create a command-line tool for easy use (impress your colleagues!)
  • Add proper error handling (because stuff happens)
  • Optimize for really big datasets (the bigger, the better!)
  • Add an option to update existing Hyper files instead of starting from scratch

Solution

CODE
# Install before running:
# pip install pantab tableauhyperapi pandas

import pandas as pd
import time
# Use this lib for Version 1
import pantab 
# Use this lib for Version 2
from tableauhyperapi import HyperProcess, Connection, Telemetry, TableDefinition, SqlType, \
    TableName, Inserter, CreateMode, Nullability



# Path to the Excel file
excel_path = r"C:\Users\usuario\Downloads\RWFD_Supply_Chain.xlsx"
sheet_name = "OrderList"


# Read the Excel file into a DataFrame
df = pd.read_excel(excel_path, sheet_name=sheet_name)

### -----------------------------------
### Version 1: Using Pantab
### -----------------------------------

start_time_pantab = time.time()

# Define the Hyper file path
hyper_path_pantab = r"C:\Users\usuario\Downloads\RWFD_CHA1_Supply_Chain_pantab.hyper"

# Create Hyper file from the DataFrame
pantab.frame_to_hyper(df, hyper_path_pantab, table=sheet_name)

end_time_pantab = time.time()

print(f"[Pantab] Hyper file created at: {hyper_path_pantab}") # Shows where it's located
print(f"[Pantab] Processing time: {end_time_pantab - start_time_pantab:.4f} seconds") # Shows the processing time

### -----------------------------------
### Version 2: Using Tableau Hyper API
### -----------------------------------

# Define the output Hyper file path
hyper_path = r"C:\Users\usuario\Downloads\RWFD_CHA1_Supply_Chain_hyperapi.hyper"

# Define the schema and table name
# 'Extract' is the default schema used by Tableau
table_name = TableName("Extract", "OrderList")

# Build table columns based on DataFrame dtypes
columns = []
for col, dtype in zip(df.columns, df.dtypes):
    if pd.api.types.is_integer_dtype(dtype):
        sql_type = SqlType.big_int()
    elif pd.api.types.is_float_dtype(dtype):
        sql_type = SqlType.double()
    elif pd.api.types.is_datetime64_any_dtype(dtype):
        sql_type = SqlType.date()  # date() drops any time component; use SqlType.timestamp() to keep it
    else:
        sql_type = SqlType.text()
    # Append as Column objects, not tuples
    columns.append(TableDefinition.Column(col, sql_type, Nullability.NULLABLE))

# Define the table
table_def = TableDefinition(
    table_name=table_name,
    columns=columns
)

start_time_hyperapi = time.time()  # Record the processing start time

with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint, database=hyper_path, create_mode=CreateMode.CREATE_AND_REPLACE) as connection:
        # Create the schema if it doesn't exist
        connection.catalog.create_schema("Extract")
        # Create the table
        connection.catalog.create_table(table_def)

        # Insert the data into the table
        with Inserter(connection, table_def) as inserter:
            # Insert rows as simple tuples
            inserter.add_rows(df.itertuples(index=False, name=None))
            inserter.execute()

end_time_hyperapi = time.time()  # Record the processing end time

print(f"[Hyper API] Hyper file created at: {hyper_path}") # Shows where it's located
print(f"[Hyper API] Processing time: {end_time_hyperapi - start_time_hyperapi:.4f} seconds") # Shows the processing time

# The timings above let you compare the two approaches head to head


Special Thanks


To Marcelo Has and Cristian Saavedra Desmoineaux for inviting me to create this challenge.
To William Ayd, one of the biggest contributors to Pantab — a true DataDev Legend!
And to the DataDevQuest organizers and DataDev Ambassadors team — always great examples for me.


Who am I?

I’m Nik Dutra, data enthusiast, Tableau lover, and proud DataDev Ambassador. I’m passionate about helping everyone level up their data skills!
Connect on LinkedIn or find me in the DataDev Slack as @nikdutra.

Happy coding, and may the data be with you!