You got the AI budget. Now everyone is watching to see what you do with it. The pressure to show something is real, and so is the risk of spending 30 days "exploring AI" and ending up with nothing to show for it except a Slack thread full of screenshots. Most small business operators running their first AI pilot have no data science team, no ML engineer, no internal AI expert. They have a browser, a credit card, and a workflow that needs to get faster or cheaper or both.
That is not a disadvantage. It is actually the right starting point. The AI workflows worth piloting first do not require technical teams. They require a specific task, a clear measurement standard, and the discipline to call the result at day 30, whatever it is.
This guide teaches you how to run that pilot. By the end, you will have a scoped workflow, written success criteria, written kill criteria, and a week-by-week structure that produces a real go or no-go decision at day 30. Before you start, skim two companion pieces that will save you from the most common pilot failure modes: The Readiness Theater, which explains why most AI assessments produce motion instead of decisions, and Why Most Small-Business AI Pilots Fail, which catalogs the patterns I see repeatedly across clients.
Why this matters for small business operators specifically
Enterprise companies running AI pilots have IT departments, legal teams, and project managers whose full-time job is to structure these evaluations. A 12-person services firm or an $8M distribution company does not have that bench. The pilot design has to be simple enough for one person to run alongside their existing job, structured enough to produce a real answer, and scoped tightly enough that 30 days is actually enough time to see signal.
The other thing that is different about small business pilots: the decision authority is usually in the room. The owner or operator both runs the pilot and makes the call on whether to expand. That is actually a structural advantage. Enterprise pilots often stall because the person running the evaluation cannot make the decision without three levels of approval. You can move faster and with less political overhead. Use that advantage. Design a pilot you can run yourself, measure yourself, and decide on yourself.
What an AI workflow pilot actually is
A 30-day AI pilot is not an exploration. It is an experiment with one workflow, one measurement standard, and a defined decision date.
An exploration is what happens when someone sets up an AI tool and says "let's see what it can do." Explorations are fine for learning. They do not produce deployment decisions. A pilot is different: you pick one task the business does repeatedly, define what better looks like in measurable terms, run AI on that task for 30 days, and compare.
The three tools you need to run the pilot in this guide:
- A paid Claude or ChatGPT account ($20 to $30 per month per seat); the individual Pro or Plus tier is fine for anonymized template work, but move to the Business tier before any real customer or employee data goes in (see the compliance section below)
- The Scope Sketcher at /scope, which walks you through picking the right first workflow and sizing the time savings before you commit
- A shared doc or spreadsheet where you track daily measurements for the four weeks
Think of this as a job trial for AI. You would not hire a new employee without a 90-day review. Do not expand AI without a 30-day pilot with a clear standard.
Why pilot discipline matters for small business specifically
Small businesses are the category that benefits most from AI on repeatable tasks and suffers most from failed pilots that generate skepticism and kill future adoption. A single poorly run pilot, one with no criteria and no real measurement, can set an organization back 12 to 18 months because the owner now has anecdotal evidence that "we tried AI and it did not work."
The compliance frame for small business AI is not industry-specific regulation (though if your niche has one, it applies). It is general hygiene: customer data, employee data, and vendor NDAs. We have a dedicated section on this below. Businesses that handle those three categories correctly in the pilot design avoid the majority of real-world problems. The ones that skip this step run into a data incident or a contractual violation that overshadows any productivity gain.
Picking the right workflow
The most important decision in the whole pilot is the workflow you choose to test. The wrong workflow produces no useful signal. The right one produces a clear answer in 30 days.
A good pilot workflow has four properties:
It is repeated at least three times per week. One-off tasks are hard to measure. A task that happens 20 or 30 times per month gives you enough instances to distinguish signal from noise. Common candidates: drafting customer emails, summarizing meeting notes, writing first drafts of proposals or SOPs, extracting key information from inbound vendor or client documents.
The current time cost is measurable. Before you start, you should be able to say "this task takes us X minutes per instance." If you have never measured it, spend the first three days of week one measuring. You need a baseline. Without a baseline, "it feels faster" is the best you can say at day 30, which is not a deployment decision.
AI can help with the actual work, not just adjacent work. The task should produce text, a draft, a summary, or a structured output. AI is strong on generation and extraction. It is less useful on tasks that require physical action, real-time system integration, or judgment calls that depend on information that lives outside any document.
The cost of a bad output is recoverable. In the pilot phase, a wrong AI draft gets caught by a human review before it goes anywhere. Do not pilot AI on a workflow where a mistake has immediate external consequences (regulatory filings, public communications, client-facing numbers that go unchecked). Pilot it on internal workflows first. Expand to external-facing workflows after the quality bar is established.
Use the Scope Sketcher at /scope to pressure-test your candidate workflow against all four criteria before you commit. It takes about 10 minutes and will tell you whether you've picked a workflow worth piloting or whether you should look at a different one.
Writing your success and kill criteria
This is the section most pilots skip. It is the most important one.
Success criteria and kill criteria are two separate documents. They both get written before day one. They do not change during the pilot. If you write them after you see preliminary results, they will reflect what you want to be true rather than what you actually measured.
Success criteria define what "this worked, let's expand" looks like. They should be specific and measurable. Generic success criteria are useless. "We save time" is not a success criterion. "The average time to draft a vendor proposal drops from 45 minutes to under 20, with no increase in revision requests from clients, measured across at least 15 drafts" is a success criterion.
For most small business pilots, the success criteria cover three areas:
- Time: how many minutes per task before versus after
- Quality: a measurable proxy for output quality (revision rate, approval rate, error rate, customer response rate)
- Adoption: whether the person doing the task actually used AI on every eligible instance or skipped it
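If it helps to make the three areas concrete, here is one way to write them down as numbers you can actually check at day 30. This is a sketch with invented thresholds, not a recommendation; derive your own targets from your baseline.

```python
# Illustrative success thresholds for a proposal-drafting pilot.
# Every number here is an example; replace with targets derived from your baseline.
SUCCESS = {
    "max_avg_ai_minutes": 20,    # time: average draft time with AI, down from a 45-minute baseline
    "max_revision_rate": 0.20,   # quality: at most 20% of drafts need substantial rewrite
    "min_adoption_rate": 0.90,   # adoption: AI used on at least 90% of eligible instances
    "min_instances": 15,         # evidence: do not call it on fewer than 15 drafts
}
```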
Kill criteria define what "this is not worth expanding, move on" looks like. They are equally important. Without kill criteria, a mediocre pilot never ends. Someone always argues for more time, a different workflow, a different prompt approach. The kill criteria close that loop.
Kill criteria examples:
- AI time savings are below 30% of current task time after the first two weeks, with no improving trend
- More than 20% of AI outputs require substantial rewrite (not light editing, but substantial revision)
- The person doing the task skips AI on eligible instances more than twice without noting a reason in the measurement doc
- Any compliance or data security issue surfaces during the pilot
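The kill side can be written the same way. Again, a sketch with example numbers matching the bullets above:

```python
# Kill thresholds mirroring the examples above; check these at the week-three review.
# Numbers are illustrative; set your own before day one and do not move them.
KILL = {
    "min_time_savings": 0.30,    # kill if savings stay below 30% after two weeks, with no trend up
    "max_rewrite_rate": 0.20,    # kill if more than 20% of outputs need substantial rewrite
    "max_unexplained_skips": 2,  # kill if AI is skipped more than twice with no reason noted
}
```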
Write both documents. Put them somewhere the team can see. Do not edit them during the pilot.
Week 1: Setup and baseline
Week one is not about AI. It is about measurement.
The failure pattern: operators who want to show progress in week one start using AI immediately and skip the baseline. By day 30, they have anecdote and feeling but no comparison. They cannot say what changed because they did not measure what it was before.
Days 1 and 2: Complete all account setup. Get the AI account live at the right tier. If you need a Data Processing Addendum, start that process now because it may take a few days. Build the first two or three prompt templates for the workflow you're testing. Prompt templates are not finished prompts; they are scaffolds with brackets where the specific information goes: [client name], [document type], [key dates], [desired output format]. Store the templates somewhere easy to find (a pinned doc, a desktop text file, a bookmark).
Days 3 through 5: Measure the baseline. Run the target workflow exactly as you normally would, with no AI. Time each instance. Record it. Three to five instances of baseline data is enough to establish a starting point. If the workflow does not occur naturally in three days, review past instances and reconstruct the time cost from memory or from calendar records.
Days 6 and 7: Run AI on one or two instances of the workflow as a test, not yet in production. Compare the draft to what you would have produced manually. Note what needed editing. Adjust the prompt templates based on what the test outputs reveal. The goal is to arrive at week two with prompt templates that are ready for real use, not prototypes.
What to track in week one (set up the measurement doc now, before day one):
- Task instance date
- Time spent before AI (manual baseline)
- Time spent with AI
- Number of edits needed after AI draft
- Any notes on output quality or problems
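If the measurement doc lives in a spreadsheet or CSV, one row per task instance covers all five items. The rows below are invented, just to show the shape:

```csv
day,task,baseline_minutes,ai_minutes,edits_needed,notes
3,vendor proposal,44,,,baseline week - no AI
4,vendor proposal,47,,,baseline week - no AI
9,vendor proposal,,19,2,tone too formal; fixed with one follow-up prompt
11,vendor proposal,,23,4,pricing section needed a full rewrite
```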
Weeks 2 and 3: Active pilot
Weeks two and three are the core measurement window. This is where the data accumulates that will drive the day-30 decision.
The failure pattern for the active weeks: inconsistent use. The operator uses AI on some instances and not others, sometimes because they're in a hurry and default to the old method, sometimes because a slightly different version of the task feels like it does not fit the prompt. Inconsistent use produces data you cannot interpret. You need AI used on every eligible instance to get a clean comparison.
Days 8 through 21: Use AI on every eligible instance of the target workflow. No exceptions during this window. If an instance comes up and you're tempted to skip AI, note why in the measurement doc. That note is data.
Two things to watch for in weeks two and three that are early signals:
Prompt drift. The templates you built in week one will need small adjustments as you encounter real instances. That is expected. Log the adjustments. If you're making major structural edits to the prompt on every instance, the original scope was too broad and you may need to narrow it. A prompt that needs to be rebuilt from scratch every time is not a repeatable workflow; it is a custom task each time.
Review pattern. Watch how much time you spend reviewing and editing AI output versus drafting from scratch. In week two, review may take as much time as drafting did before. By week three, it should be noticeably faster. If it is not trending toward faster, that is a signal the workflow is not a good AI fit, and it should surface in your week-three check against the kill criteria.
Mid-pilot check (end of week three): Pull the measurement data. Calculate the average time per task with AI versus without. Calculate the revision rate. Compare both against the kill criteria you wrote in week one. If you have already crossed a kill criterion, the pilot is done. Do not keep running it hoping the last week changes the trajectory. The kill criteria exist precisely for this situation.
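If your log follows the CSV shape sketched in week one, the week-three math is a few lines. A minimal sketch, assuming that layout and a hypothetical pilot_log.csv filename; "substantial rewrite" is approximated here as three or more edits, which is a judgment call you should set for yourself:

```python
import csv

baseline, ai_times, edits = [], [], []
with open("pilot_log.csv") as f:  # hypothetical filename; match whatever you named the doc
    for row in csv.DictReader(f):
        if row["baseline_minutes"]:
            baseline.append(float(row["baseline_minutes"]))
        if row["ai_minutes"]:
            ai_times.append(float(row["ai_minutes"]))
            edits.append(int(row["edits_needed"] or 0))

avg_base = sum(baseline) / len(baseline)
avg_ai = sum(ai_times) / len(ai_times)
savings = 1 - avg_ai / avg_base
# Treating 3+ edits as a substantial rewrite is an example threshold, not a rule.
rewrite_rate = sum(1 for e in edits if e >= 3) / len(edits)

print(f"avg baseline: {avg_base:.0f} min, avg with AI: {avg_ai:.0f} min")
print(f"time savings: {savings:.0%} (kill threshold: below 30%)")
print(f"substantial-rewrite rate: {rewrite_rate:.0%} (kill threshold: above 20%)")
```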
Week 4: Decision
Week four is not an extension of the pilot. It is the decision phase.
The failure pattern: treating week four as more data collection and deferring the decision to week five, six, or "once we get more examples." If the pilot is designed correctly, you have enough data at the end of week three to make the call. Week four is for finalizing the measurement, writing the decision memo, and preparing the implementation plan or the post-mortem.
Days 22 through 28: Continue running AI on the workflow as in weeks two and three. No changes to the process. At day 28, close the measurement doc.
Day 29: Analyze the results. Calculate the final averages. Check every success criterion. Check every kill criterion. The decision should follow directly from what you wrote before day one. If it does not, that means the criteria were wrong or something unexpected happened that genuinely changes the evaluation. Write it down either way.
Day 30: Write the decision memo. Two pages maximum. What you tested, what the baseline was, what the results were, what the decision is, and if the decision is to expand, what the next workflow is and what the timeline looks like. If the decision is no-go, write one sentence on what you would look for in a different workflow before trying again.
One practical note: the decision is not binary. "Expand" and "kill" are the clear outcomes, but a third outcome is common and valid: "the workflow saves time but output quality is not consistent enough to use without additional prompt work." That outcome gets a defined next step (refine the prompts, test two more weeks on a narrower version of the task) rather than a restart from zero.
The small-business prompts that actually work
After working through dozens of small business AI pilots, the difference between pilots that produce clear results and pilots that produce noise comes down to four prompt disciplines.
Name the output format explicitly. AI defaults to a format that may or may not match what you need. "Summarize this meeting" produces a paragraph. "Summarize this meeting in five bullet points, each under 20 words, that I can paste into our project tracker" produces something usable. Every prompt should state the format before the AI starts generating.
Give it the constraint that matters for your situation. Generic prompts produce generic output. The constraint that tightens the output is usually a voice, length, audience, or data restriction. "Under 150 words, written for our operations director who does not know this vendor's history, no technical jargon" is a constraint set. "Write a summary" is not.
Include a negative instruction when stakes are higher. For workflows where certain language or certain content could cause a problem, tell AI what to leave out. "Do not include any pricing or commitment language" for a draft vendor email. "Do not include any language that reads as a legal opinion" for a contract summary. AI follows explicit exclusions more reliably than it infers them.
Treat the first output as a draft, not a final. The fastest pilot teams I have seen develop a habit of reading the first AI output, noting the two or three things that need adjustment, and giving AI one follow-up instruction before copying the output. "Shorten the first paragraph by half" or "make the tone slightly warmer" as a second prompt takes 15 seconds and often produces a better final product than editing the draft manually.
The general compliance non-negotiables
This section is short because the rule is simple, but it is the most important section in this guide.
Do not put any of the following into the consumer tier of any AI tool:
- Customer names paired with financial data, purchase history, or account information
- Employee names paired with performance data, compensation, HR records, or health information
- Any information covered by a vendor or client NDA you have signed
- Trade secrets, unreleased product information, or proprietary formulas or processes
- Any data subject to a state consumer privacy law you are obligated to comply with (CCPA, Virginia CDPA, and similar)
- Personally identifiable information that you collected for one stated purpose, where feeding it into an AI tool would be a new use the original purpose does not cover
The practical workflow that respects these rules: build the prompt templates and test the workflow structure using anonymized or invented examples first. Client X, Vendor Y, Employee Z. Once the templates work on anonymized data, move to the Business tier account with the appropriate Data Processing Addendum before any real names or real data go in.
For most general small businesses running a workflow pilot, the real exposure is simpler than it sounds: do not describe a specific customer by name alongside their spending history, and do not describe a specific employee alongside their performance or compensation. Keep those data points separated or anonymized in the prompt. The AI can still do the work. The names and the data categories just do not appear in the same prompt.
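For example, an anonymized customer-email prompt (every detail below is invented) might read: "Draft a renewal reminder email to Client X, a mid-size distribution customer whose annual contract ends next month. Under 120 words, warm but direct. Do not include pricing or contract terms; I will add those after the draft comes back." The AI has everything it needs to draft well, and no name ever sits next to a spending figure.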
If your business has signed a Business agreement with an AI vendor that includes a Data Processing Addendum, the rules on data retention and training use change materially. Ask your IT contact or your general counsel what your specific agreement covers. Do not assume the Business tier is automatically safe for everything. Read what you signed.
When NOT to use AI in the pilot
Not every workflow that is slow and manual is a good AI candidate. Part of what a 30-day pilot teaches you is where the boundaries are.
- Anything where the cost of a wrong output is immediate and external. Regulatory filings, financial disclosures, client-facing contracts, and public communications all have stakes that a 30-day pilot should not carry. Run AI on internal drafts first. Expand to external-facing work after quality is established over time.
- Workflows where the judgment is the whole task. AI is useful for drafting, summarizing, and extracting from structured inputs. It is not useful as the decision-maker when the decision requires knowing the history, relationships, and context that live in someone's head. Pilot the drafting. Keep the decision.
- Tasks that happen fewer than three times per month. Not enough instances to measure. Pilot a higher-frequency workflow, then apply the learned prompt techniques to lower-frequency tasks later.
- Anything where your team cannot review the output before it acts. If the AI is generating output that immediately triggers an action (sends an email, updates a record, moves a file) without human review, a 30-day pilot is not the time to run it. Add the human review step first. Automate after the quality is proven.
A simple rule: AI is an unfair advantage on the 80% of a workflow that is producing a first draft, extracting key information, or organizing existing content into a usable format. Trust the human for the 20% where judgment, relationship context, or real-time information is what makes the output right.
The quick-start template
Here is the prompt scaffold that works across most small business pilot workflows. Fill in the brackets, paste into Claude or ChatGPT at the Business tier, adjust based on what the first output reveals.
I need a [document type: draft email, meeting summary, vendor comparison, SOP draft, proposal section].
Context: [one to three sentences describing the specific situation, the parties involved, and any key facts the output needs to reflect].
Output format: [bullet points, numbered list, single paragraph, two-paragraph structure, specific length target].
Audience: [who will read this and what they need to understand].
Constraints: [what to include, what to leave out, any tone or language requirements].
Do not include: [specific language, categories, or claims that would cause a problem for this workflow].
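A filled-in version for a meeting-summary workflow might look like this (every detail invented):
I need a meeting summary.
Context: Weekly ops call with the warehouse team covering Q3 staffing, the delayed scanner rollout, and the new vendor onboarding checklist.
Output format: Five bullet points, each under 20 words, ready to paste into the project tracker.
Audience: The operations director, who missed the call and needs decisions and owners, not discussion detail.
Constraints: Include every decision and the name of its owner; neutral tone.
Do not include: The names of the two candidates discussed for the staffing role.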
For recurring use, store the filled-in version for your most common workflow variant as a pinned note or a text file on your desktop. Every new instance gets a copy of the scaffold with the context brackets updated. The structural work is done once. The per-instance work takes two minutes.
Bigger wins beyond the first pilot
Once the first pilot wraps and you have a decision, the compound gains come from building on what you learned rather than starting fresh each time.
A prompt library that accumulates. Every workflow you test generates prompt templates. Store them. A small business that runs three 30-day pilots over a year ends up with a library of 10 to 15 prompt templates covering its most common repeatable tasks. New employees onboard onto the templates. The learning does not restart with every person.
A quality threshold that informs the next pilot. The revision rate data you collect in the first pilot tells you what "good enough" looks like for AI output in your organization. The second pilot starts with that baseline already established. The question becomes "does this workflow meet the same quality bar," not "how do we define quality."
A decision pattern that builds organizational confidence. The most important long-term output of a well-run first pilot is not the time savings. It is the fact that the organization now knows how to evaluate an AI workflow. The second pilot is faster to design. The third one faster still. Teams that run pilots well develop an institutional muscle for evaluating new tools that applies far beyond AI.
A no-go result that is still a win. A clean no-go at day 30 is genuinely valuable. It tells you this workflow is not an AI candidate today, frees the budget and attention for a different workflow, and gives you specific data on why it did not work (quality threshold, time savings below target, inconsistent use) that informs the next attempt.
The small business AI consulting connection
A 30-day pilot is one workflow, one team, one 30-day window. The bigger AI question for a small or mid-market business is structural: which workflows across the whole operation are AI candidates, in what order to address them, what the full implementation picture looks like 18 months out, and where the compliance or data risks are that need program-level attention before individual pilots proceed.
That is the work I do with clients through AI Consulting for Small Business. It covers the full picture: workflow audit, priority sequencing, vendor evaluation, compliance framing, and change management with non-technical teams. If you're reading this guide, you're probably at the stage where one well-run pilot makes sense. If the pilot works and you want to scale it across the operation without starting from zero on every workflow, that page explains how an engagement works.
Closing
A 30-day pilot with written success criteria and written kill criteria is not a project. It is a forcing function. It makes you define what you believe before you see the data, and it holds you to the answer the data produces. That discipline is what separates operators who use AI well from operators who spend two years exploring without deciding.
Pick one workflow this week. Run it through the Scope Sketcher at /scope to confirm it is worth piloting. Write the success and kill criteria before you open the AI tool. Start the measurement doc on day one. Call the result on day 30.
The 30 days will teach you more about where AI fits in your operation than any amount of reading, attending webinars, or waiting for clarity. Start the clock.
If you want to talk about how AI fits into your business at the program level, the AI Consulting for Small Business page lays out the full picture and how an engagement works.