Rest API TSC Intermediate

#DDQ2026-02 Fetch recent refreshes (Intermediate)

Learn how to find extract performance details.

Jordan Woods

Author

2026-02-26T17:00:00
6 min read

Welcome to the DataDev Quest Challenge for February 2026! This challenge is designed to teach you
how to use the filtering mechanisms in TSC (Tableau Server Client) and Tableau’s REST API.

Challenge Overview

Objective:

Learn how to filter, sort, and slice background jobs effectively.

Why this challenge?

When you start a refresh on Tableau Server, you may want your script to wait until you can be reasonably certain the job has completed. The history of past refreshes tells you how long that wait should be.

Learning Goals

  • Filter the responses from Tableau Server
  • Interact with iterable interfaces in Python
  • Track job history

Submission Guidelines

  • Source Code: Publish your project publicly in your GitHub profile
  • Add README: Include any setup instructions and describe how to run the program.
  • Video of Solution: Include a video of your solution in the README file. You can publish it on YouTube and embed the iframe, or save the video file in the repository’s root directory
  • Comments: Ensure your code is well-commented
  • Submission: Submit your challenge in the following forms

Additional Resources


Getting Started

The first step is to get and set up your Tableau Developer Sandbox. Cristian Saavedra Desmoineaux has a Medium Post that provides a step-by-step guide to configuring it.

1. In your Tableau Site, identify a workbook or datasource that has refreshes occurring. You can check the admin views for this.

2. Identify the workbook/datasource refreshing

You will need the name of the object for which you are retrieving refresh history. Have you checked if that name is unique? How might you guarantee uniqueness?
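One possible way to approach the uniqueness question is to narrow the match by project as well as by name, and to work with the item's id once it is resolved. The sketch below is only an illustration: `Item` and `resolve_unique` are hypothetical names, and `Item` is a stand-in that copies just the `id`, `name`, and `project_name` attributes found on TSC workbook and datasource items so the example runs without a server.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a workbook or datasource record."""

    id: str
    name: str
    project_name: str


def resolve_unique(
    items: Iterable[Item], name: str, project_name: Optional[str] = None
) -> Item:
    """Return the single item matching the name (and project, if given)."""
    matches = [
        item
        for item in items
        if item.name == name
        and (project_name is None or item.project_name == project_name)
    ]
    if not matches:
        raise ValueError(f"No item named {name!r} found")
    if len(matches) > 1:
        # Report where the duplicates live so the caller can disambiguate.
        projects = sorted({m.project_name for m in matches})
        raise ValueError(f"{name!r} is ambiguous; found in projects: {projects}")
    return matches[0]


# Made-up site contents: two workbooks share the name "Sales".
site_items = [
    Item("a1", "Sales", "Finance"),
    Item("b2", "Sales", "Marketing"),
    Item("c3", "Inventory", "Operations"),
]

print(resolve_unique(site_items, "Sales", project_name="Finance").id)
```

Calling `resolve_unique(site_items, "Sales")` without a project raises a `ValueError` that lists both projects, which is a clearer failure than silently picking the first match.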

3. Install Python

I recommend installing Python with uv. Install Python 3.13 or newer if you don’t already have it installed.

4. Install TSC

With uv, you can install TSC with the following command. Quote the requirement so your shell does not treat > as an output redirect.

CODE
uv add "tableauserverclient>=0.40"

Challenge

Now that you’ve identified the object, retrieve the last 10 successful extract refreshes for it. Calculate the average and standard deviation of their runtimes. Is there a lot of variability? If you were going to wait before polling the server for job status, how would you determine how long to wait?
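To make "a lot of variability" concrete, one option is to compare the standard deviation to the mean (the coefficient of variation) and to look at the median alongside the mean. The sketch below uses made-up runtimes and a hypothetical helper named `summarize`; it is not part of the challenge solution, just one way to read the numbers.

```python
from statistics import mean, median, stdev

# Hypothetical runtimes, in seconds, for the last 10 successful refreshes.
# One slow run (240 s) is included on purpose to show its effect.
runtimes = [118.0, 125.0, 131.0, 122.0, 119.0, 240.0, 127.0, 121.0, 124.0, 129.0]


def summarize(values: list[float]) -> dict[str, float]:
    """Return the mean, median, sample standard deviation, and their ratio."""
    avg = mean(values)
    sd = stdev(values)  # sample standard deviation, needs at least 2 values
    return {
        "mean": avg,
        "median": median(values),
        "stdev": sd,
        # Coefficient of variation: spread expressed as a fraction of the mean.
        "cv": sd / avg,
    }


summary = summarize(runtimes)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

When the mean sits well above the median, as it does with these sample values, a single slow run is pulling the average up, and a wait based on the mean alone may be longer than most refreshes need.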


Extra Challenge

Do you want to learn more?

  • Can you do it without keeping hundreds or thousands of jobs in memory?
  • How might you preserve execution history?
  • What indications might you have that the query is not giving you the results you expect?
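For the memory question above, one technique worth considering is Welford's online algorithm, which updates the mean and variance one value at a time so the runtimes never need to be collected into a list. This is a minimal sketch under assumptions: `RunningStats` and `fake_runtimes` are hypothetical names, and the generator stands in for a lazily paged stream of job runtimes.

```python
from collections.abc import Iterator
from dataclasses import dataclass
from math import sqrt


@dataclass
class RunningStats:
    """Accumulate count, mean, and variance without storing the values."""

    count: int = 0
    mean: float = 0.0
    _m2: float = 0.0  # running sum of squared deviations from the mean

    def add(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        # Uses the updated mean, which is what keeps the update stable.
        self._m2 += delta * (value - self.mean)

    @property
    def stdev(self) -> float:
        """Sample standard deviation, or 0.0 with fewer than two values."""
        if self.count < 2:
            return 0.0
        return sqrt(self._m2 / (self.count - 1))


def fake_runtimes() -> Iterator[float]:
    """Hypothetical stand-in for runtimes streamed from the server."""
    yield from (118.0, 125.0, 131.0, 122.0)


stats = RunningStats()
for runtime in fake_runtimes():
    stats.add(runtime)

print(f"n={stats.count} mean={stats.mean:.2f} stdev={stats.stdev:.2f}")
```

Because `add` only needs the next value, the same loop works over any iterator, and wrapping that iterator in `itertools.islice` stops it after the first n values.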

Solutions


CODE
import argparse
from collections.abc import Sequence
from dataclasses import dataclass
from enum import StrEnum
from itertools import islice
import os
from statistics import mean, stdev
import sys

from dotenv import load_dotenv
import tableauserverclient as TSC

load_dotenv()


class RefreshableTypes(StrEnum):
    WORKBOOK = "workbook"
    DATASOURCE = "datasource"
    FLOW = "flow"


@dataclass
class RecentRefreshes:
    item_name: str
    item_type: RefreshableTypes
    n: int


def parse_args(args: Sequence[str] | None = None) -> RecentRefreshes:
    if args is None:
        args = sys.argv[1:]
    parser = argparse.ArgumentParser(description="...")
    parser.add_argument(
        "item_name",
        type=str,
        help="The name of the item to retrieve refresh details for.",
    )
    parser.add_argument(
        "item_type",
        type=RefreshableTypes,
        choices=RefreshableTypes,
        help="The type of the item to retrieve refresh details for.",
    )
    parser.add_argument(
        "n",
        # nargs="?" makes this positional optional so the default applies.
        nargs="?",
        default=10,
        type=int,
        help="The number of recent refreshes to retrieve.",
    )
    parsed_args = parser.parse_args(args)
    return RecentRefreshes(**vars(parsed_args))


def get_flow_runs(server: TSC.Server, flow_name: str, n: int) -> list[TSC.FlowRunItem]:
    flows = server.flows.filter(name=flow_name)
    if len(flows) == 0:
        raise ValueError(f"No flow found with name '{flow_name}'")
    if len(flows) > 1:
        raise ValueError(f"Multiple flows found with name '{flow_name}'")
    flow = flows[0]

    flow_runs = server.flow_runs.filter(
        flow_id=flow.id, progress=100, page_size=min(n, 1_000)
    ).order_by("-started_at")
    flow_runs = list(islice((r for r in flow_runs if r.status == "Success"), n))

    return flow_runs


def get_jobs(
    server: TSC.Server, item_name: str, item_type: RefreshableTypes, n: int
) -> list[TSC.BackgroundJobItem]:
    if item_type == RefreshableTypes.WORKBOOK:
        items = server.workbooks.filter(name=item_name)
    elif item_type == RefreshableTypes.DATASOURCE:
        items = server.datasources.filter(name=item_name)
    else:
        raise ValueError(f"Unsupported item type '{item_type}' for job retrieval")

    if len(items) == 0:
        raise ValueError(f"No {item_type} found with name '{item_name}'")
    if len(items) > 1:
        raise ValueError(f"Multiple {item_type}s found with name '{item_name}'")

    recent_refreshes: list[TSC.BackgroundJobItem] = server.jobs.filter(
        job_type="refresh_extracts",
        # The job's notes contain the workbook or datasource name.
        notes__has=item_name,
        # Keep only jobs that completed successfully.
        status="Success",
        # Max possible page size is 1,000.
        page_size=min(n, 1_000),
        # Sort newest first on the server, then keep the first n jobs.
    ).order_by("-started_at")[:n]

    return recent_refreshes


def get_recent_refreshes(
    server: TSC.Server, item_type: RefreshableTypes, item_name: str, n: int
) -> list[TSC.FlowRunItem] | list[TSC.BackgroundJobItem]:
    if item_type == RefreshableTypes.FLOW:
        return get_flow_runs(server, item_name, n)
    else:
        return get_jobs(server, item_name, item_type, n)


def get_wait_time(runtimes: Sequence[float]) -> float:
    avg_runtime = mean(runtimes)
    std_dev = stdev(runtimes) if len(runtimes) > 1 else 0
    # Add a buffer of 2 standard deviations to the average runtime to account for variability.
    wait_time = avg_runtime + 2 * std_dev
    return wait_time


def main():
    args = parse_args()
    server = TSC.Server(os.environ["TABLEAU_SERVER"], use_server_version=True)
    auth = TSC.PersonalAccessTokenAuth(
        os.environ["TABLEAU_TOKEN_NAME"],
        os.environ["TABLEAU_TOKEN_SECRET"],
        site_id=os.getenv("TABLEAU_SITE", ""),
    )
    with server.auth.sign_in(auth):
        recent_refreshes = get_recent_refreshes(
            server, args.item_type, args.item_name, args.n
        )

    # mean() raises on an empty sequence, so stop early with a clear message.
    if not recent_refreshes:
        sys.exit(
            f"No successful refreshes found for {args.item_type} '{args.item_name}'"
        )

    # Flow runs expose completed_at, while background jobs expose ended_at.
    if args.item_type == RefreshableTypes.FLOW:
        runtimes = [
            (r.completed_at - r.started_at).total_seconds() for r in recent_refreshes
        ]
    else:
        runtimes = [
            (j.ended_at - j.started_at).total_seconds() for j in recent_refreshes
        ]

    # Report the number actually found, which may be fewer than requested.
    print(
        f"Average runtime for last {len(runtimes)} refreshes of {args.item_type} '{args.item_name}': {mean(runtimes):.2f} seconds"
    )
    if len(runtimes) > 1:
        print(f"Standard deviation: {stdev(runtimes):.2f} seconds")
        print(
            f"Recommended wait time before checking for completion: {get_wait_time(runtimes):.2f} seconds"
        )


if __name__ == "__main__":
    main()

Who am I?

I am Jordan Woods, a Tableau DataDev Ambassador and co-founder of DataDevQuest. I am passionate about data, Python, and automation. You can contact me on LinkedIn.

Special Thanks to Marcelo Has for the beautiful redesign of DataDevQuest!