You got the AI budget. Now everyone is watching to see what you do with it. The pressure to show something is real, and so is the risk of spending 30 days "exploring AI" and ending up with nothing to show for it except a Slack thread full of screenshots. Most small business operators running their first AI pilot have no data science team, no ML engineer, no internal AI expert. They have a browser, a credit card, and a workflow that needs to get faster or cheaper or both.
That is not a disadvantage. It is actually the right starting point. The AI workflows worth piloting first do not require technical teams. They require a specific task, a clear measurement standard, and the discipline to call the result at day 30, whatever it is.
This guide teaches you how to run that pilot. By the end, you will have a scoped workflow, written success criteria, written kill criteria, and a week-by-week structure that produces a real go or no-go decision at day 30. Before you start, skim two companion pieces that will save you from the most common pilot failure modes: The Readiness Theater, which explains why most AI assessments produce motion instead of decisions, and Why Most Small-Business AI Pilots Fail, which catalogs the patterns I see repeatedly across clients.
Why this matters for small business operators specifically
Enterprise companies running AI pilots have IT departments, legal teams, and project managers whose full-time job is to structure these evaluations. A 12-person services firm or an $8M distribution company does not have that bench. The pilot design has to be simple enough for one person to run alongside their existing job, structured enough to produce a real answer, and scoped tightly enough that 30 days is actually enough time to see signal.
The other thing that is different about small business pilots: the decision authority is usually in the room. The owner or operator both runs the pilot and makes the call on whether to expand. That is actually a structural advantage. Enterprise pilots often stall because the person running the evaluation cannot make the decision without three levels of approval. You can move faster and with less political overhead. Use that advantage. Design a pilot you can run yourself, measure yourself, and decide on yourself.
What an AI workflow pilot actually is
A 30-day AI pilot is not an exploration. It is an experiment with one workflow, one measurement standard, and a defined decision date.
An exploration is what happens when someone sets up an AI tool and says "let's see what it can do." Explorations are fine for learning. They do not produce deployment decisions. A pilot is different: you pick one task the business does repeatedly, define what better looks like in measurable terms, run AI on that task for 30 days, and compare.
The three tools you need to run the pilot in this guide:
- A paid Claude or ChatGPT account ($20 to $30 per month per seat); the individual Pro or Plus tier is fine for anonymized template work, but move to the Business tier before any real customer or employee data goes in (see the compliance section below)
- The Scope Sketcher at /scope, which walks you through picking the right first workflow and sizing the time savings before you commit
- A shared doc or spreadsheet where you track daily measurements for the four weeks
Think of this as a job trial for AI. You would not hire a new employee without a 90-day review. Do not expand AI without a 30-day pilot with a clear standard.
Why pilot discipline matters for small business specifically
Small businesses are the category that benefits most from AI on repeatable tasks and suffers most from failed pilots that generate skepticism and kill future adoption. A single poorly run pilot, one with no criteria and no real measurement, can set an organization back 12 to 18 months because the owner now has anecdotal evidence that "we tried AI and it did not work."
The compliance frame for small business AI is not industry-specific regulation (though if your niche has one, it applies). It is general hygiene: customer data, employee data, and vendor NDAs. We have a dedicated section on this below. Businesses that handle those three categories correctly in the pilot design avoid the majority of real-world problems. The ones that skip this step run into a data incident or a contractual violation that overshadows any productivity gain.
Picking the right workflow
The most important decision in the whole pilot is the workflow you choose to test. The wrong workflow produces no useful signal. The right one produces a clear answer in 30 days.
A good pilot workflow has four properties:
It is repeated at least three times per week. One-off tasks are hard to measure. A task that happens 20 or 30 times per month gives you enough instances to distinguish signal from noise. Common candidates: drafting customer emails, summarizing meeting notes, writing first drafts of proposals or SOPs, extracting key information from inbound vendor or client documents.
The current time cost is measurable. Before you start, you should be able to say "this task takes us X minutes per instance." If you have never measured it, spend the first three days of week one measuring. You need a baseline. Without a baseline, "it feels faster" is the best you can say at day 30, which is not a deployment decision.
AI can help with the actual work, not just adjacent work. The task should produce text, a draft, a summary, or a structured output. AI is strong on generation and extraction. It is less useful on tasks that require physical action, real-time system integration, or judgment calls that depend on information that lives outside any document.
The cost of a bad output is recoverable. In the pilot phase, a wrong AI draft gets caught by a human review before it goes anywhere. Do not pilot AI on a workflow where a mistake has immediate external consequences (regulatory filings, public communications, client-facing numbers that go unchecked). Pilot it on internal workflows first. Expand to external-facing workflows after the quality bar is established.
Use the Scope Sketcher at /scope to pressure-test your candidate workflow against all four criteria before you commit. It takes about 10 minutes and will tell you whether you've picked a workflow worth piloting or whether you should look at a different one.
Writing your success and kill criteria
This is the section most pilots skip. It is the most important one.
Success criteria and kill criteria are two separate documents. They both get written before day one. They do not change during the pilot. If you write them after you see preliminary results, they will reflect what you want to be true rather than what you actually measured.
Success criteria define what "this worked, let's expand" looks like. They should be specific and measurable. Generic success criteria are useless. "We save time" is not a success criterion. "The average time to draft a vendor proposal drops from 45 minutes to under 20, with no increase in revision requests from clients, measured across at least 15 drafts" is a success criterion.
For most small business pilots, the success criteria cover three areas:
- Time: how many minutes per task before versus after
- Quality: a measurable proxy for output quality (revision rate, approval rate, error rate, customer response rate)
- Adoption: whether the person doing the task actually used AI on every eligible instance or skipped it
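If it helps to make the three areas concrete, here is one way to write them down as numbers you can actually check at day 30. This is a sketch with invented thresholds, not a recommendation; derive your own targets from your baseline.

```python
# Illustrative success thresholds for a proposal-drafting pilot.
# Every number here is an example; replace with targets derived from your baseline.
SUCCESS = {
    "max_avg_ai_minutes": 20,    # time: average draft time with AI, down from a 45-minute baseline
    "max_revision_rate": 0.20,   # quality: at most 20% of drafts need substantial rewrite
    "min_adoption_rate": 0.90,   # adoption: AI used on at least 90% of eligible instances
    "min_instances": 15,         # evidence: do not call it on fewer than 15 drafts
}
```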
Kill criteria define what "this is not worth expanding, move on" looks like. They are equally important. Without kill criteria, a mediocre pilot never ends. Someone always argues for more time, a different workflow, a different prompt approach. The kill criteria close that loop.
Kill criteria examples:
- AI time savings are below 30% of current task time after the first two weeks, with no improving trend
- More than 20% of AI outputs require substantial rewrite (not light editing, but substantial revision)
- The person doing the task skips AI on eligible instances more than twice without noting a reason in the measurement doc
- Any compliance or data security issue surfaces during the pilot
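The kill side can be written the same way. Again, a sketch with example numbers matching the bullets above:

```python
# Kill thresholds mirroring the examples above; check these at the week-three review.
# Numbers are illustrative; set your own before day one and do not move them.
KILL = {
    "min_time_savings": 0.30,    # kill if savings stay below 30% after two weeks, with no trend up
    "max_rewrite_rate": 0.20,    # kill if more than 20% of outputs need substantial rewrite
    "max_unexplained_skips": 2,  # kill if AI is skipped more than twice with no reason noted
}
```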
Write both documents. Put them somewhere the team can see. Do not edit them during the pilot.
Week 1: Setup and baseline
Week one is not about AI. It is about measurement.
The failure pattern: operators who want to show progress in week one start using AI immediately and skip the baseline. By day 30, they have anecdote and feeling but no comparison. They cannot say what changed because they did not measure what it was before.
Days 1 and 2: Complete all account setup. Get the AI account live at the right tier. If you need a Data Processing Addendum, start that process now because it may take a few days. Build the first two or three prompt templates for the workflow you're testing. Prompt templates are not finished prompts; they are scaffolds with brackets where the specific information goes: [client name], [document type], [key dates], [desired output format]. Store the templates somewhere easy to find (a pinned doc, a desktop text file, a bookmark).
Days 3 through 5: Measure the baseline. Run the target workflow exactly as you normally would, with no AI. Time each instance. Record it. Three to five instances of baseline data is enough to establish a starting point. If the workflow does not occur naturally in three days, review past instances and reconstruct the time cost from memory or from calendar records.
Days 6 and 7: Run AI on one or two instances of the workflow as a test, not yet in production. Compare the draft to what you would have produced manually. Note what needed editing. Adjust the prompt templates based on what the test outputs reveal. The goal is to arrive at week two with prompt templates that are ready for real use, not prototypes.
What to track in week one (set up the measurement doc now, before day one):
- Task instance date
- Time spent before AI (manual baseline)
- Time spent with AI
- Number of edits needed after AI draft
- Any notes on output quality or problems
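If the measurement doc lives in a spreadsheet or CSV, one row per task instance covers all five items. The rows below are invented, just to show the shape:

```csv
day,task,baseline_minutes,ai_minutes,edits_needed,notes
3,vendor proposal,44,,,baseline week - no AI
4,vendor proposal,47,,,baseline week - no AI
9,vendor proposal,,19,2,tone too formal; fixed with one follow-up prompt
11,vendor proposal,,23,4,pricing section needed a full rewrite
```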
Weeks 2 and 3: Active pilot
Weeks two and three are the core measurement window. This is where the data accumulates that will drive the day-30 decision.
The failure pattern for the active weeks: inconsistent use. The operator uses AI on some instances and not others, sometimes because they're in a hurry and default to the old method, sometimes because a slightly different version of the task feels like it does not fit the prompt. Inconsistent use produces data you cannot interpret. You need AI used on every eligible instance to get a clean comparison.
Days 8 through 21: Use AI on every eligible instance of the target workflow. No exceptions during this window. If an instance comes up and you're tempted to skip AI, note why in the measurement doc. That note is data.
Two things to watch for in weeks two and three that are early signals:
Prompt drift. The templates you built in week one will need small adjustments as you encounter real instances. That is expected. Log the adjustments. If you're making major structural edits to the prompt on every instance, the original scope was too broad and you may need to narrow it. A prompt that needs to be rebuilt from scratch every time is not a repeatable workflow; it is a custom task each time.
Review pattern. Watch how much time you spend reviewing and editing AI output versus drafting from scratch. In week two, review may take as much time as drafting did before. By week three, it should be noticeably faster. If it is not trending toward faster, that is a signal the workflow is not a good AI fit, and it should surface in your week-three check against the kill criteria.
Mid-pilot check (end of week three): Pull the measurement data. Calculate the average time per task with AI versus without. Calculate the revision rate. Compare both against the kill criteria you wrote in week one. If you have already crossed a kill criterion, the pilot is done. Do not keep running it hoping the last week changes the trajectory. The kill criteria exist precisely for this situation.
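If your log follows the CSV shape sketched in week one, the week-three math is a few lines. A minimal sketch, assuming that layout and a hypothetical pilot_log.csv filename; "substantial rewrite" is approximated here as three or more edits, which is a judgment call you should set for yourself:

```python
import csv

baseline, ai_times, edits = [], [], []
with open("pilot_log.csv") as f:  # hypothetical filename; match whatever you named the doc
    for row in csv.DictReader(f):
        if row["baseline_minutes"]:
            baseline.append(float(row["baseline_minutes"]))
        if row["ai_minutes"]:
            ai_times.append(float(row["ai_minutes"]))
            edits.append(int(row["edits_needed"] or 0))

avg_base = sum(baseline) / len(baseline)
avg_ai = sum(ai_times) / len(ai_times)
savings = 1 - avg_ai / avg_base
# Treating 3+ edits as a substantial rewrite is an example threshold, not a rule.
rewrite_rate = sum(1 for e in edits if e >= 3) / len(edits)

print(f"avg baseline: {avg_base:.0f} min, avg with AI: {avg_ai:.0f} min")
print(f"time savings: {savings:.0%} (kill threshold: below 30%)")
print(f"substantial-rewrite rate: {rewrite_rate:.0%} (kill threshold: above 20%)")
```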
Week 4: Decision
Week four is not an extension of the pilot. It is the decision phase.
The failure pattern: treating week four as more data collection and deferring the decision to week five, six, or "once we get more examples." If the pilot is designed correctly, you have enough data at the end of week three to make the call. Week four is for finalizing the measurement, writing the decision memo, and preparing the implementation plan or the post-mortem.
Days 22 through 28: Continue running AI on the workflow as in weeks two and three. No changes to the process. At day 28, close the measurement doc.
Day 29: Analyze the results. Calculate the final averages. Check every success criterion. Check every kill criterion. The decision should follow directly from what you wrote before day one. If it does not, that means the criteria were wrong or something unexpected happened that genuinely changes the evaluation. Write it down either way.
Day 30: Write the decision memo. Two pages maximum. What you tested, what the baseline was, what the results were, what the decision is, and if the decision is to expand, what the next workflow is and what the timeline looks like. If the decision is no-go, write one sentence on what you would look for in a different workflow before trying again.
One practical note: the decision is not binary. "Expand" and "kill" are the clear outcomes, but a third outcome is common and valid: "the workflow saves time but output quality is not consistent enough to use without additional prompt work." That outcome gets a defined next step (refine the prompts, test two more weeks on a narrower version of the task) rather than a restart from zero.
The small-business prompts that actually work
After working through dozens of small business AI pilots, the difference between pilots that produce clear results and pilots that produce noise comes down to four prompt disciplines.
Name the output format explicitly. AI defaults to a format that may or may not match what you need. "Summarize this meeting" produces a paragraph. "Summarize this meeting in five bullet points, each under 20 words, that I can paste into our project tracker" produces something usable. Every prompt should state the format before the AI starts generating.
Give it the constraint that matters for your situation. Generic prompts produce generic output. The constraint that tightens the output is usually a voice, length, audience, or data restriction. "Under 150 words, written for our operations director who does not know this vendor's history, no technical jargon" is a constraint set. "Write a summary" is not.
Include a negative instruction when stakes are higher. For workflows where certain language or certain content could cause a problem, tell AI what to leave out. "Do not include any pricing or commitment language" for a draft vendor email. "Do not include any language that reads as a legal opinion" for a contract summary. AI follows explicit exclusions more reliably than it infers them.
Treat the first output as a draft, not a final. The fastest pilot teams I have seen develop a habit of reading the first AI output, noting the two or three things that need adjustment, and giving AI one follow-up instruction before copying the output. "Shorten the first paragraph by half" or "make the tone slightly warmer" as a second prompt takes 15 seconds and often produces a better final product than editing the draft manually.
The general compliance non-negotiables
This section is short because the rule is simple, but it is the most important section in this guide.
Do not put any of the following into the consumer tier of any AI tool:
- Customer names paired with financial data, purchase history, or account information
- Employee names paired with performance data, compensation, HR records, or health information
- Any information covered by a vendor or client NDA you have signed
- Trade secrets, unreleased product information, or proprietary formulas or processes
- Any data subject to a state consumer privacy law you are obligated to comply with (CCPA, Virginia CDPA, and similar)
- Personally identifiable information that you collected for one stated purpose, where feeding it into an AI tool would be a new use the original purpose does not cover
The practical workflow that respects these rules: build the prompt templates and test the workflow structure using anonymized or invented examples first. Client X, Vendor Y, Employee Z. Once the templates work on anonymized data, move to the Business tier account with the appropriate Data Processing Addendum before any real names or real data go in.
For most general small businesses running a workflow pilot, the real exposure is simpler than it sounds: do not describe a specific customer by name alongside their spending history, and do not describe a specific employee alongside their performance or compensation. Keep those data points separated or anonymized in the prompt. The AI can still do the work. The names and the data categories just do not appear in the same prompt.
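For example, an anonymized customer-email prompt (every detail below is invented) might read: "Draft a renewal reminder email to Client X, a mid-size distribution customer whose annual contract ends next month. Under 120 words, warm but direct. Do not include pricing or contract terms; I will add those after the draft comes back." The AI has everything it needs to draft well, and no name ever sits next to a spending figure.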
If your business has signed a Business agreement with an AI vendor that includes a Data Processing Addendum, the rules on data retention and training use change materially. Ask your IT contact or your general counsel what your specific agreement covers. Do not assume the Business tier is automatically safe for everything. Read what you signed.
When NOT to use AI in the pilot
Not every workflow that is slow and manual is a good AI candidate. Part of what a 30-day pilot teaches you is where the boundaries are.
- Anything where the cost of a wrong output is immediate and external. Regulatory filings, financial disclosures, client-facing contracts, and public communications all have stakes that a 30-day pilot should not carry. Run AI on internal drafts first. Expand to external-facing work after quality is established over time.
- Workflows where the judgment is the whole task. AI is useful for drafting, summarizing, and extracting from structured inputs. It is not useful as the decision-maker when the decision requires knowing the history, relationships, and context that live in someone's head. Pilot the drafting. Keep the decision.
- Tasks that happen fewer than three times per month. Not enough instances to measure. Pilot a higher-frequency workflow, then apply the learned prompt techniques to lower-frequency tasks later.
- Anything where your team cannot review the output before it acts. If the AI is generating output that immediately triggers an action (sends an email, updates a record, moves a file) without human review, a 30-day pilot is not the time to run it. Add the human review step first. Automate after the quality is proven.
A simple rule: AI is an unfair advantage on the 80% of a workflow that is producing a first draft, extracting key information, or organizing existing content into a usable format. Trust the human for the 20% where judgment, relationship context, or real-time information is what makes the output right.
The quick-start template
Here is the prompt scaffold that works across most small business pilot workflows. Fill in the brackets, paste into Claude or ChatGPT at the Business tier, adjust based on what the first output reveals.
I need a [document type: draft email, meeting summary, vendor comparison, SOP draft, proposal section].
Context: [one to three sentences describing the specific situation, the parties involved, and any key facts the output needs to reflect].
Output format: [bullet points, numbered list, single paragraph, two-paragraph structure, specific length target].
Audience: [who will read this and what they need to understand].
Constraints: [what to include, what to leave out, any tone or language requirements].
Do not include: [specific language, categories, or claims that would cause a problem for this workflow].
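A filled-in version for a meeting-summary workflow might look like this (every detail invented):
I need a meeting summary.
Context: Weekly ops call with the warehouse team covering Q3 staffing, the delayed scanner rollout, and the new vendor onboarding checklist.
Output format: Five bullet points, each under 20 words, ready to paste into the project tracker.
Audience: The operations director, who missed the call and needs decisions and owners, not discussion detail.
Constraints: Include every decision and the name of its owner; neutral tone.
Do not include: The names of the two candidates discussed for the staffing role.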
For recurring use, store the filled-in version for your most common workflow variant as a pinned note or a text file on your desktop. Every new instance gets a copy of the scaffold with the context brackets updated. The structural work is done once. The per-instance work takes two minutes.
Bigger wins beyond the first pilot
Once the first pilot wraps and you have a decision, the compound gains come from building on what you learned rather than starting fresh each time.
A prompt library that accumulates. Every workflow you test generates prompt templates. Store them. A small business that runs three 30-day pilots over a year ends up with a library of 10 to 15 prompt templates covering its most common repeatable tasks. New employees onboard onto the templates. The learning does not restart with every person.
A quality threshold that informs the next pilot. The revision rate data you collect in the first pilot tells you what "good enough" looks like for AI output in your organization. The second pilot starts with that baseline already established. The question becomes "does this workflow meet the same quality bar," not "how do we define quality."
A decision pattern that builds organizational confidence. The most important long-term output of a well-run first pilot is not the time savings. It is the fact that the organization now knows how to evaluate an AI workflow. The second pilot is faster to design. The third one faster still. Teams that run pilots well develop an institutional muscle for evaluating new tools that applies far beyond AI.
A no-go result that is still a win. A clean no-go at day 30 is genuinely valuable. It tells you this workflow is not an AI candidate today, frees the budget and attention for a different workflow, and gives you specific data on why it did not work (quality threshold, time savings below target, inconsistent use) that informs the next attempt.
The small business AI consulting connection
A 30-day pilot is one workflow, one team, one 30-day window. The bigger AI question for a small or mid-market business is structural: which workflows across the whole operation are AI candidates, in what order to address them, what the full implementation picture looks like 18 months out, and where the compliance or data risks are that need program-level attention before individual pilots proceed.
That is the work I do with clients through AI Consulting for Small Business. It covers the full picture: workflow audit, priority sequencing, vendor evaluation, compliance framing, and change management with non-technical teams. If you're reading this guide, you're probably at the stage where one well-run pilot makes sense. If the pilot works and you want to scale it across the operation without starting from zero on every workflow, that page explains how an engagement works.
Closing
A 30-day pilot with written success criteria and written kill criteria is not a project. It is a forcing function. It makes you define what you believe before you see the data, and it holds you to the answer the data produces. That discipline is what separates operators who use AI well from operators who spend two years exploring without deciding.
Pick one workflow this week. Run it through the Scope Sketcher at /scope to confirm it is worth piloting. Write the success and kill criteria before you open the AI tool. Start the measurement doc on day one. Call the result on day 30.
The 30 days will teach you more about where AI fits in your operation than any amount of reading, attending webinars, or waiting for clarity. Start the clock.
If you want to talk about how AI fits into your business at the program level, the AI Consulting for Small Business page lays out the full picture and how an engagement works.