AWS QUICK SUITE • Concept 2025

Evaluating model outputs

My Role

UX Lead, Product requirements, User research

Timeline

3-week sprint

The Product

Amazon Quick Flows is an AI-powered automation tool that lets business users build workflows for repetitive tasks using natural language prompts, with no technical skills required. It connects data across Amazon Quick Suite and third-party apps like Jira, enables real-time web data retrieval and browser automation, and supports conversational refinement of outputs. The tool can also generate images and visualizations using AI models and QuickSight dashboards. Examples include sales teams automating meeting prep and reports, marketers generating social media content, and HR teams creating job descriptions and onboarding materials.

The Challenge

Vision for the future of Quick Flows

In Q4 2025, our product and engineering teams lacked a unified direction, leading to stakeholder misalignment and stalled velocity. I was tasked with defining a long-term UX vision to move the team from ambiguity to execution. The vision was executed in two phases: the first focused on redesigning the tool to address usability challenges surfaced in previous research studies, and the second on making the experience sticky for both users and creators of Flows.

Key question: How might we enable Flow creators to define a sense of success or quality?

Solution

Evaluate model-generated outputs against test cases and success criteria with ease

Generate test cases

The goal is to give users an easy way to generate test cases using AI-generated input data. Users therefore start from a simple screen that generates test cases for them; this is the primary path.

Add your own data

In research, we heard that some users want to add their own data while testing, especially for use cases like financial data.

Edit test cases

Edit the success criteria and inputs for each test case to test the output against.
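
Conceptually, each test case pairs input data (generated or user-supplied) with plain-language success criteria that the Flow's output is checked against. The sketch below is a hypothetical illustration of that shape only; the interface name and fields are assumptions, not the actual Quick Flows data model.

    // Hypothetical sketch of a Flow test case: inputs plus success criteria
    // that the model-generated output is evaluated against.
    // All names and fields are illustrative assumptions.
    interface FlowTestCase {
      id: string;
      description: string;
      inputs: Record<string, string>;   // generated or user-supplied input data
      successCriteria: string[];        // plain-language checks applied to the output
    }

    const revenueSummaryCase: FlowTestCase = {
      id: "tc-01",
      description: "Quarterly revenue summary for an enterprise account",
      inputs: { account: "Acme Corp", period: "Q3 2025" },
      successCriteria: [
        "Output cites only figures present in the input data",
        "Summary stays under 200 words",
        "Revenue totals match the connected data source",
      ],
    };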

Test summary and fixes interface showing AI-driven suggestions to resolve errors

Test Summary and fixes

Replaced verbose logs with a scannable summary view to highlight critical failures immediately. AI-driven fixes provide contextual suggestions to resolve errors instantly.

Team Ideation

I was given ambiguous key questions to investigate and explore within the theme. I conducted ideation sessions with the design team and product partners to gather input, which I then synthesized into a coherent UX flow.

Team ideation session showing participant contributions with sticky notes, wireframes and sketches

Iterations

To meet aggressive timelines, I bypassed low-fidelity wireframing in favor of rapid, high-fidelity iteration. This allowed for immediate, high-signal reviews with Engineering and Product stakeholders. To ensure quality wasn't sacrificed for speed, I conducted targeted user research sessions to validate the desirability of the new feature and refine the core concepts early in the cycle.

Iteration 1 (Tested with users)

Customer Metrics Analyzer - Empty state showing Test Flow panel with Generate test cases button

Empty state communicates a clear Generate intent

Customer Metrics Analyzer - Generated state showing Test Flow with 8 tests generated and test cases list

Show what has been generated

Customer Metrics Analyzer - Test Flow with KPIs, Test Summary, and recommendations

KPIs, Test Summary, and recommendations

Empty state iterations

Empty state iterations showing 4 different design approaches for the Test Flow panel

User Insights

What went well

1. The AI-generated performance summary provides details like accuracy and hallucination rates, which builds trust.

2. Test cases solve a real problem for users today, especially for use cases like data handling, connections, and duplicate prevention.

Opportunities to improve

1. Optimize the AI evaluation loop

Mitigate cognitive overload by reducing signal noise in test cases, setting clearer performance benchmarks, and translating dense logs into actionable insights.

2. 'One-click' error resolution is the primary value proposition

Users see AI-assisted fixes as the critical aha moment that turns friction into resolution.

3. AI-generated test data and test cases need flexibility

Allow users to add their own data and test cases as required.