Hyper API Beginner

#DDQ2025-05 Convert Excel to Tableau Hyper Files (Beginner)

Hey there, data enthusiast! Today, you’re going to learn how to convert Excel data into Tableau’s speedy Hyper format using Python.

Nik Dutra

Author

2025-05-29T00:00:42
7 min read

Welcome to the beginner DataDev Quest Challenge for May 2025! This challenge is designed to help you learn how to use Tableau Hyper API to create datasources for your dashboards.

Difficulty Level: Beginner
Time Estimate: 30 minutes
Technology: Python, Tableau, Tableau Hyper API, Pantab


Challenge Overview

Objective

Hey there, data enthusiast! Today, we’re tackling a super useful skill for anyone working with Tableau – converting Excel data into Tableau’s speedy Hyper format using Python.

You’ll implement two different approaches to accomplish this task and compare their performance and ease of use.

Why this challenge matters

Let’s be real, nobody likes waiting for dashboards to load, right? That’s where Tableau’s Hyper format comes in clutch! Converting your Excel files to Hyper gives you:

  • Dashboards that load faster than your boss can say “is it ready yet?”
  • Queries that run so quickly you’ll have time for that extra coffee break
  • The ability to handle chunky datasets without your computer having a meltdown
  • Data refreshes that don’t take forever and a day

Trust me, your future self (and your teammates) will thank you for mastering this skill!

Learning Goals

Once you crush this challenge, you’ll be able to:

  • Read Excel data using pandas like a pro
  • Convert data to Tableau Hyper format in not one but TWO cool ways
  • Flex your Tableau Hyper API muscles
  • Make smart decisions about which approach to use when

These are seriously handy skills that’ll make you look like a data wizard!

Submission Guidelines

  • Source Code: Publish your project publicly on your GitHub profile
  • Add README: Include setup instructions and describe how to run the program.
  • Video of Solution: Include a video of your solution in the README, either published on YouTube and embedded as an iframe, or saved as a file in the repository root.
  • Comments: Ensure your code is well-commented.
  • Submission: Submit your challenge using the following forms

Additional Resources


The Challenge

So here’s the deal: You’ve got an Excel file called RWFD_Supply_Chain.xlsx with a bunch of order data in the “OrderList” sheet. Your mission (should you choose to accept it) is to convert this Excel data to Tableau Hyper format using TWO different methods:

You can find the dataset here!

  1. Using the Pantab library – think of this as the “easy button” way
  2. Using the Tableau Hyper API – the “I want more control” way

For each method, you’ll:

  • Pull the Excel data into a pandas DataFrame
  • Work your magic to create a shiny new Hyper file
  • Time how long each method takes (let’s see which one’s faster!)
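To keep the race fair, you can measure both methods with the same tiny timing helper. Here's a sketch (the `timed` name is just illustrative):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Then each conversion becomes, say, `_, secs = timed(pantab.frame_to_hyper, df, "out.hyper", table="OrderList")`, and the two `secs` values are directly comparable.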

Getting Started

Prerequisites

Before diving in, make sure you’ve got:

  • Python basics under your belt (you know, variables, functions, that sort of thing)
  • Some familiarity with pandas (our trusty data manipulation sidekick)
  • Python 3.6 or newer installed
  • These Python packages are ready to rock:
    • pandas
    • pantab
    • tableauhyperapi

Don’t have these installed? No worries! Just run:

CODE
pip install pandas pantab tableauhyperapi

Challenge Instructions

Part 1: Setup and Data Loading

  • Install the packages:
CODE
pip install pandas pantab tableauhyperapi
  • Start a new Python script or Jupyter notebook
  • Import the libraries:
CODE
import pandas as pd
import time
import pantab
from tableauhyperapi import HyperProcess, Connection, Telemetry, TableDefinition, SqlType, TableName, Inserter, CreateMode, Nullability
  • Set the path to your Excel file:
CODE
excel_path = "path/to/your/RWFD_Supply_Chain.xlsx"
sheet_name = "OrderList"
  • Read the Excel data into a DataFrame:
CODE
df = pd.read_excel(excel_path, sheet_name=sheet_name)

Part 2: Method 1 – The Pantab Express Lane

This is the quick and easy route. Just:

  1. Decide where you want your Hyper file to go
  2. Use the awesome pantab.frame_to_hyper() function to work its magic
  3. Time how long it takes (we’re keeping score, remember?)

This approach is like using a microwave — just push a button and done! 👨‍🍳

Part 3: Method 2 – The Hyper API Adventure

This is the scenic route with more control. Steps:

  1. Pick a spot for your second Hyper file
  2. Create a table definition that matches your DataFrame columns
  3. Use the Hyper API to create your Hyper file with the perfect schema
  4. Get all your data from the DataFrame into the Hyper file
  5. Time this process too!

This approach is more like cooking from scratch – more steps, but hey, you control the seasonings! 🧂

Part 4: Compare Results

Now for the fun part:

  1. Which method was faster? By how much?
  2. Which code was easier to write?
  3. Think about when you might prefer one method over the other

It’s like comparing fast food to home cooking – each has its time and place!
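Once you have both timings, the comparison itself is one line of arithmetic. A sketch with stand-in numbers (not real measurements):

```python
# Hypothetical timings from the two runs, in seconds
t_pantab, t_hyperapi = 0.8, 1.4

faster = "Pantab" if t_pantab < t_hyperapi else "Hyper API"
ratio = max(t_pantab, t_hyperapi) / min(t_pantab, t_hyperapi)
print(f"{faster} was {ratio:.2f}x faster")  # → Pantab was 1.75x faster
```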

Solution Expectations

Your awesome solution should:

  • Successfully turn your Excel data into Hyper files both ways
  • Include some timing info so we can see which method wins the race
  • Work with any similar Excel file (because reusable code rocks!)
  • Handle different data types appropriately

Testing Your Solution

Wanna make sure everything worked? Try this:

  1. Open up Tableau Desktop
  2. Connect to each of your fresh Hyper files
  3. Make sure all your data made it across safely
  4. Check that dates still look like dates, numbers like numbers, etc. (this one's very important!)

If Tableau loads your data without complaining, you’re golden!

Common Mistakes to Avoid

Watch out for these gotchas:

  • Getting data types mixed up when using the Hyper API (numbers aren’t text, people!)
  • Forgetting to close your connections (nobody likes a memory leak)
  • Using the wrong schema name (psst… Tableau likes “Extract”)
  • Skipping error handling (because data is rarely perfectly behaved)

Extensions and Variations

Finished already? You overachiever! Try these bonus challenges:

  • Make your code handle multiple Excel sheets (the more the merrier!)
  • Create a command-line tool for easy use (impress your colleagues!)
  • Add proper error handling (because stuff happens)
  • Optimize for really big datasets (the bigger, the better!)
  • Add an option to update existing Hyper files instead of starting from scratch

Solution

CODE
# Install before running:
# pip install pantab tableauhyperapi pandas

import pandas as pd
import time
# Use this lib for Version 1
import pantab 
# Use this lib for Version 2
from tableauhyperapi import HyperProcess, Connection, Telemetry, TableDefinition, SqlType, \
    TableName, Inserter, CreateMode, Nullability



# Path to the Excel file
excel_path = r"C:\Users\usuario\Downloads\RWFD_Supply_Chain.xlsx"
sheet_name = "OrderList"


# Read the Excel file into a DataFrame
df = pd.read_excel(excel_path, sheet_name=sheet_name)

### -----------------------------------
### Version 1: Using Pantab
### -----------------------------------

start_time_pantab = time.time()

# Define the Hyper file path
hyper_path_pantab = r"C:\Users\usuario\Downloads\RWFD_CHA1_Supply_Chain_pantab.hyper"

# Create Hyper file from the DataFrame
pantab.frame_to_hyper(df, hyper_path_pantab, table=sheet_name)

end_time_pantab = time.time()

print(f"[Pantab] Hyper file created at: {hyper_path_pantab}") # Shows where it's located
print(f"[Pantab] Processing time: {end_time_pantab - start_time_pantab:.4f} seconds") # Shows the processing time

### -----------------------------------
### Version 2: Using Tableau Hyper API
### -----------------------------------

# Define the output Hyper file path
hyper_path = r"C:\Users\usuario\Downloads\RWFD_CHA1_Supply_Chain_hyperapi.hyper"

# Define the schema and table name
# 'Extract' is the default schema used by Tableau
table_name = TableName("Extract", "OrderList")

# Build table columns based on DataFrame dtypes
columns = []
for col, dtype in zip(df.columns, df.dtypes):
    if pd.api.types.is_integer_dtype(dtype):
        sql_type = SqlType.big_int()
    elif pd.api.types.is_float_dtype(dtype):
        sql_type = SqlType.double()
    elif pd.api.types.is_datetime64_any_dtype(dtype):
        sql_type = SqlType.date()  # date() drops any time component; use SqlType.timestamp() to keep it
    else:
        sql_type = SqlType.text()
    # Append as Column objects, not tuples
    columns.append(TableDefinition.Column(col, sql_type, Nullability.NULLABLE))

# Define the table
table_def = TableDefinition(
    table_name=table_name,
    columns=columns
)

start_time_hyperapi = time.time()  # Record the processing start time

with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint, database=hyper_path, create_mode=CreateMode.CREATE_AND_REPLACE) as connection:
        # Create the schema if it doesn't exist
        connection.catalog.create_schema("Extract")
        # Create the table
        connection.catalog.create_table(table_def)

        # Insert the data into the table
        with Inserter(connection, table_def) as inserter:
            # Insert rows as simple tuples
            inserter.add_rows(df.itertuples(index=False, name=None))
            inserter.execute()

end_time_hyperapi = time.time()  # Record the processing end time

print(f"[Hyper API] Hyper file created at: {hyper_path}") # Shows where it's located
print(f"[Hyper API] Processing time: {end_time_hyperapi - start_time_hyperapi:.4f} seconds") # Shows the processing time

# The timings above let you compare the two approaches head to head


Special Thanks


To Marcelo Has and Cristian Saavedra Desmoineaux for inviting me to create this challenge.
To William Ayd, one of the biggest contributors to Pantab — a true DataDev Legend!
And to the DataDevQuest organizers and DataDev Ambassadors team — always great examples for me.


Who am I?

I’m Nik Dutra, data enthusiast, Tableau lover, and proud DataDev Ambassador. I’m passionate about helping everyone level up their data skills!
Connect on LinkedIn or find me in the DataDev Slack as @nikdutra.

Happy coding, and may the data be with you!