37 changes: 37 additions & 0 deletions content/patterns/lemonade-stand-quickstart/_index.adoc
@@ -0,0 +1,37 @@
---
title: Lemonade Stand AI Quickstart
date: 2026-06-25
tier: sandbox
summary: This pattern deploys an AI guardrails demonstration with a multi-layered safety pipeline, interactive chatbot, and real-time monitoring on OpenShift.
rh_products:
- Red Hat OpenShift Container Platform
- Red Hat OpenShift AI
industries:
- General
focus_areas:
- AI
- Safety
- AI Quickstart
aliases: /lemonade-stand-quickstart/
links:
  github: https://github.com/validatedpatterns-sandbox/ai-quickstart-lemonade-stand
  install: getting-started
  bugs: https://github.com/validatedpatterns-sandbox/ai-quickstart-lemonade-stand/issues
  feedback: https://docs.google.com/forms/d/e/1FAIpQLScI76b6tD1WyPu2-d_9CCVDr3Fu5jYERthqLKJDUGwqBg7Vcg/viewform
---
:toc:
:imagesdir: /images
:_content-type: ASSEMBLY
include::modules/comm-attributes.adoc[]

include::modules/lemonade-stand-quickstart-about.adoc[leveloffset=+1]

include::modules/lemonade-stand-quickstart-architecture.adoc[leveloffset=+1]

[id="next-steps-lemonade-stand-quickstart"]
== Next steps

* link:getting-started[Install this pattern]
* link:cluster-sizing[Cluster sizing]
* link:customizing-this-pattern[Customizing this pattern]
* link:troubleshooting[Troubleshooting]
29 changes: 29 additions & 0 deletions content/patterns/lemonade-stand-quickstart/cluster-sizing.adoc
@@ -0,0 +1,29 @@
---
title: Cluster sizing
weight: 30
aliases: /lemonade-stand-quickstart/cluster-sizing/
---

:toc:
:imagesdir: /images
:_content-type: ASSEMBLY
include::modules/comm-attributes.adoc[]
include::modules/ai-quickstart-lemonade-stand/metadata-ai-quickstart-lemonade-stand.adoc[]

include::modules/cluster-sizing-template.adoc[]

[id="lemonade-stand-quickstart-gpu-node-requirements"]
== GPU node requirements

In addition to the worker nodes listed above, this pattern requires at least 1 GPU-equipped node for LLM inference. On AWS, the pattern automatically provisions a `g5.2xlarge` instance with an NVIDIA A10G GPU. On other providers and bare metal, a GPU node must already be part of the cluster before deploying the pattern.

.GPU node minimum requirements
[cols="<,^,<,<"]
|===
| Cloud provider | Node type | Number of nodes | Instance type

| Amazon Web Services
| GPU Worker
| 1
| g5.2xlarge
|===
@@ -0,0 +1,138 @@
---
title: Customizing this pattern
weight: 20
aliases: /lemonade-stand-quickstart/customizing/
---

:toc:
:imagesdir: /images
:_content-type: ASSEMBLY
include::modules/comm-attributes.adoc[]

[id="customizing-lemonade-stand-quickstart"]
== Customizing the Lemonade Stand AI Quickstart pattern

This pattern deploys an AI chatbot with a multi-layered guardrails pipeline, including model-based detectors, a rule-based language detector, and regex-based competitor filtering. You can customize the served LLM, the detector configuration, and the monitoring settings.

[id="changing-model-lemonade-stand"]
=== Changing the LLM model

The pattern serves Llama 3.2 3B Instruct (FP8-quantized) by default through vLLM on KServe. The model is defined in the lemonade-stand-assistant Helm chart's `values.yaml`.

To change the locally served model, update the model configuration in the Helm chart values. The model must be compatible with vLLM and fit within the available GPU VRAM on the provisioned node (NVIDIA A10G with 24 GB VRAM on `g5.2xlarge`).
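
As a rough first check on whether a candidate model fits, the memory needed for the weights alone is the parameter count multiplied by the bytes per parameter. The following sketch is a hypothetical helper, not part of the pattern. It ignores the KV cache, activations, and runtime overhead, which must also fit within the 24 GB, so treat the result as a lower bound.

```shell
# Hypothetical helper (not part of the pattern): estimate the GPU memory
# needed for model weights alone, in GiB.
#   $1 = parameter count in billions, $2 = bits per parameter
estimate_weights_gib() {
  awk -v params="$1" -v bits="$2" \
    'BEGIN { printf "%.1f\n", params * 1e9 * (bits / 8) / (1024 ^ 3) }'
}

estimate_weights_gib 3 8     # a 3B-parameter model at 8 bits (FP8)
estimate_weights_gib 8 16    # an 8B-parameter model at 16 bits
```

A model whose weights alone approach the VRAM limit is unlikely to leave enough room for inference.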

[id="using-external-model-lemonade-stand"]
=== Using an external model endpoint (BYOM)

Instead of serving a model locally on GPU, you can configure the pattern to use an external Model-as-a-Service endpoint. This eliminates the GPU node requirement for inference.

. Make a local copy of the secrets template outside of your repository:
+
[WARNING]
====
Do not add, commit, or push this file to your repository. Doing so might expose personal credentials to GitHub.
====
+
[source,terminal]
----
$ cp values-secret.yaml.template ~/values-secret-ai-quickstart-lemonade-stand.yaml
----

. Edit the secrets file and set the API key for your external model endpoint:
+
[source,terminal]
----
$ vim ~/values-secret-ai-quickstart-lemonade-stand.yaml
----
+
[source,yaml]
----
- name: lemonade-stand
  vaultPrefixes:
  - global
  fields:
  - name: vllm-api-key
    value: <your-external-api-key>
----

. Set the `model` section in the Helm chart values to point to your external endpoint:
+
[source,yaml]
----
model:
  name: my-model
  endpoint: my-maas-instance
  port: 443
----

When using an external model endpoint, the vLLM InferenceService is not deployed and the GPU node is not required for LLM inference. The guardrails pipeline continues to function normally with the external model.
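
Before switching the pattern over, it can help to confirm that the external endpoint responds to your API key. The following sketch assumes that the endpoint exposes an OpenAI-compatible API with a `/v1/models` route and accepts the key as a bearer token. The helper name, the example host `my-maas-instance`, and the `VLLM_API_KEY` variable are placeholders; check your provider's documentation for the actual path and authentication scheme.

```shell
# Hypothetical helper: build the model-listing URL for an endpoint that is
# assumed to expose an OpenAI-compatible API.
#   $1 = endpoint host, $2 = port
models_url() {
  printf 'https://%s:%s/v1/models\n' "$1" "$2"
}

models_url my-maas-instance 443

# Example call against a real endpoint (requires network access and a key):
#   curl -sS -H "Authorization: Bearer $VLLM_API_KEY" "$(models_url my-maas-instance 443)"
```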

[id="enabling-gpu-detectors-lemonade-stand"]
=== Enabling GPU for detector models

By default, the HAP and prompt injection detector models run on CPU. You can enable GPU acceleration for these models to reduce inference latency, but this requires additional GPU resources.

To enable GPU for the detector models, set the `useGpu` flag in the Helm chart values:

[source,yaml]
----
detectors:
  hap:
    useGpu: true
  promptInjection:
    useGpu: true
----

[NOTE]
====
Enabling GPU for both detectors requires 2 additional GPUs beyond the 1 GPU used for the LLM, for a total of 3 GPUs. You must provision additional GPU nodes before enabling this option.
====
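
To check how many GPUs the cluster currently exposes before enabling this option, you can sum the allocatable `nvidia.com/gpu` resource across nodes. This is a sketch: `sum_gpus` is a hypothetical helper, and the `oc` command in the comment assumes that the NVIDIA GPU Operator has already labeled the GPU nodes with `nvidia.com/gpu.present=true`.

```shell
# Hypothetical helper: sum per-node GPU counts read from stdin,
# one "<node-name> <gpu-count>" pair per line.
sum_gpus() {
  awk '{ total += $2 } END { print total + 0 }'
}

# Sample input standing in for real cluster output.
printf 'gpu-node-a 1\ngpu-node-b 2\n' | sum_gpus

# Against a live cluster (assumes GPU nodes carry the label shown):
#   oc get nodes -l nvidia.com/gpu.present=true --no-headers \
#     -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu' | sum_gpus
```

The total must be at least 3 to run the LLM and both detectors on GPU.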

[id="configuring-detector-thresholds-lemonade-stand"]
=== Configuring detector thresholds

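The guardrails pipeline uses three detectors, two model-based detectors and one rule-based language detector, each with a configurable threshold. For the HAP and prompt injection detectors, lower thresholds increase sensitivity (block more content), while higher thresholds reduce false positives. The Lingua threshold is a confidence level for English-language detection.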

The default thresholds are:

[cols="1,1,2",options="header"]
|===
| Detector | Default threshold | Description

| IBM Granite Guardian HAP
| 0.5
| Hate speech, abuse, and profanity detection

| DeBERTa v3 Prompt Injection
| 0.5
| Prompt injection and jailbreak detection

| Lingua Language
| 0.88
| English language confidence threshold
|===

To adjust detector thresholds, modify the Guardrails Orchestrator configuration in the `fms-orchestr8-config-nlp` ConfigMap within the lemonade-stand-assistant Helm chart.
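
The effect of a threshold can be illustrated with a small comparison. The following sketch assumes that content is flagged when the detector score is greater than or equal to the threshold. The function name and the scores are made up for illustration, and the orchestrator's actual logic might differ.

```shell
# Illustrative only: flag content when score >= threshold (assumed semantics).
#   $1 = detector score, $2 = threshold
is_flagged() {
  awk -v score="$1" -v threshold="$2" 'BEGIN { exit !(score >= threshold) }'
}

# The same score of 0.6 is flagged at the default 0.5 but passes at 0.7.
is_flagged 0.6 0.5 && echo "flagged at 0.5"
is_flagged 0.6 0.7 || echo "not flagged at 0.7"
```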

[id="configuring-regex-detector-lemonade-stand"]
=== Configuring the regex detector

The FastAPI application includes a regex-based detector that blocks mentions of competitor fruit names (oranges, apples, bananas, and others) in at least 13 languages. This detector runs locally in the application before the request reaches the Guardrails Orchestrator.

To modify the blocked terms or supported languages, edit the regex patterns in the `app_fastapi.py` file in the lemonade-stand-assistant repository.
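
To get a feel for how such a filter behaves before editing the application, you can prototype a pattern on the command line. The following sketch is not the pattern used in `app_fastapi.py`, which is not reproduced here. The term list, the variable name, and the function name are made-up samples that cover only a few words in English, Spanish, and French.

```shell
# Illustrative only: this term list is a made-up sample, not the real one.
COMPETITOR_PATTERN='apples?|oranges?|bananas?|manzanas?|naranjas?|pommes?'

# Return success (0) when the text mentions a listed term as a whole word,
# ignoring case.
mentions_competitor() {
  printf '%s\n' "$1" | grep -Eiqw "$COMPETITOR_PATTERN"
}

mentions_competitor "Do you sell Oranges?" && echo "blocked"
mentions_competitor "One lemonade, please" || echo "allowed"
```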

[id="configuring-shiny-dashboard-lemonade-stand"]
=== Adjusting the monitoring dashboard

The R Shiny dashboard polls the FastAPI application's `/metrics` endpoint to display guardrail activation statistics in real time. The default polling interval is 1 second.

To adjust the refresh interval, modify the `shinyDashboard.metrics.refreshInterval` value in the Helm chart values:

[source,yaml]
----
shinyDashboard:
  metrics:
    refreshInterval: 5
----

Push your changes to your forked repository so the GitOps framework applies the updated configuration.
158 changes: 158 additions & 0 deletions content/patterns/lemonade-stand-quickstart/getting-started.adoc
@@ -0,0 +1,158 @@
---
title: Getting started
weight: 10
aliases: /lemonade-stand-quickstart/getting-started/
---

:toc:
:imagesdir: /images
:_content-type: ASSEMBLY
include::modules/comm-attributes.adoc[]

[id="deploying-lemonade-stand-quickstart-pattern"]
== Deploying the Lemonade Stand AI Quickstart pattern

.Prerequisites

* An OpenShift cluster (version 4.18 or later). This pattern requires at least 1 NVIDIA GPU node for LLM inference.
** *AWS*: The pattern automatically provisions 1 `g5.2xlarge` GPU worker node (NVIDIA A10G) during installation. No GPU nodes need to be present before you deploy.
** *Other providers and bare metal*: A GPU node must already be part of the OpenShift cluster before you deploy this pattern. The pattern installs all required operators automatically.
** To create an OpenShift cluster, go to the https://console.redhat.com/[Red Hat Hybrid Cloud console].
** Select *OpenShift \-> Red Hat OpenShift Container Platform \-> Create cluster*.
* The Helm binary. For instructions, see link:https://helm.sh/docs/intro/install/[Installing Helm].
* The `oc` CLI tool. For instructions, see link:https://docs.openshift.com/container-platform/latest/cli_reference/openshift_cli/getting-started-cli.html[Getting started with the OpenShift CLI].
* Additional installation tool dependencies. For details, see link:https://validatedpatterns.io/learn/quickstart/[Patterns quick start].

[id="preparing-for-deployment-lemonade-stand"]
== Preparing for deployment
.Procedure

. Fork the link:https://github.com/validatedpatterns-sandbox/ai-quickstart-lemonade-stand[ai-quickstart-lemonade-stand] repository on GitHub. You must fork the repository to customize this pattern.

. Clone the forked copy of this repository.
+
[source,terminal]
----
$ git clone git@github.com:your-username/ai-quickstart-lemonade-stand.git
----

. Go to the root directory of your Git repository:
+
[source,terminal]
----
$ cd ai-quickstart-lemonade-stand
----

. Run the following command to set the upstream repository:
+
[source,terminal]
----
$ git remote add -f upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-lemonade-stand.git
----

. Verify the setup of your remote repositories by running the following command:
+
[source,terminal]
----
$ git remote -v
----
+
.Example output
+
[source,terminal]
----
origin git@github.com:your-username/ai-quickstart-lemonade-stand.git (fetch)
origin git@github.com:your-username/ai-quickstart-lemonade-stand.git (push)
upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-lemonade-stand.git (fetch)
upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-lemonade-stand.git (push)
----

. Optional: To customize the deployment, create and switch to a new branch by running the following command:
+
[source,terminal]
----
$ git checkout -b my-branch
----
+
Make your changes, then stage and commit them:
+
[source,terminal]
----
$ git add <changed-files>
$ git commit -m "Customize deployment"
----
+
Push the changes to your forked repository:
+
[source,terminal]
----
$ git push origin my-branch
----

[id="deploying-cluster-using-patternsh-file-lemonade-stand"]
== Deploying the pattern by using the pattern.sh file

To deploy the pattern by using the `pattern.sh` file, complete the following steps:

. Log in to your cluster by following this procedure:

.. Obtain an API token by visiting link:https://oauth-openshift.apps.<your_cluster>.<domain>/oauth/token/request[https://oauth-openshift.apps.<your_cluster>.<domain>/oauth/token/request].

.. Log in to the cluster by running the following command:
+
[source,terminal]
----
$ oc login --token=<retrieved-token> --server=https://api.<your_cluster>.<domain>:6443
----
+
Alternatively, point `oc` at an existing kubeconfig file by setting the `KUBECONFIG` environment variable:
+
[source,terminal]
----
$ export KUBECONFIG=~/<path_to_kubeconfig>
----

. Deploy the pattern to your cluster. Run the following command:
+
[source,terminal]
----
$ ./pattern.sh make install
----

.Verification

To verify a successful installation, check the health of the Argo CD applications:

. Run the following command:
+
[source,terminal]
----
$ ./pattern.sh make argo-healthcheck
----
+
It might take several minutes for all applications to synchronize and reach a healthy state. This includes downloading detector models, initializing the GPU operator, and starting the vLLM inference service.

. Verify that the Operators are installed by navigating to *Operators -> Installed Operators* in the {ocp} web console. Confirm the following Operators are present:
+
* NVIDIA GPU Operator
* {rhoai}
* Node Feature Discovery Operator
* External Secrets Operator

. After all applications are healthy, verify that the inference service is ready by running the following command:
+
[source,terminal]
----
$ oc get inferenceservice -A
----

. Access the Lemonade Stand chatbot UI. Navigate to *Networking -> Routes* in the `lemonade-stand` namespace and open the route URL for the `lemonade-stand` service.

. Access the R Shiny monitoring dashboard. Navigate to *Networking -> Routes* in the `lemonade-stand` namespace and open the route URL for the `shiny-dashboard` service.
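
Because the applications can take several minutes to become healthy, you might prefer to repeat the health check from the first verification step automatically instead of rerunning it by hand. The following sketch uses a hypothetical `retry_until` helper that is not part of the pattern. It assumes that `./pattern.sh make argo-healthcheck` exits with a nonzero status while any application is unhealthy; confirm that behavior in your environment before relying on it.

```shell
# Hypothetical helper: run a command until it succeeds, up to a maximum
# number of attempts, sleeping between failed attempts.
#   $1 = max attempts, $2 = delay in seconds, remaining args = command
retry_until() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example: check every 60 seconds for up to 30 attempts.
#   retry_until 30 60 ./pattern.sh make argo-healthcheck
```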

[id="next-steps-getting-started-lemonade-stand"]
== Next steps

* link:customizing-this-pattern[Customizing this pattern]
* link:cluster-sizing[Cluster sizing]
* link:troubleshooting[Troubleshooting]