Article

Baby Food: A Complete Guide from SillyTavern Setup to API Acquisition

AI Summary

This document serves as a detailed, beginner-friendly guide to setting up and using SillyTavern, a front-end for AI role-playing. It contrasts free, open-source options with paid services, providing resources for installation and troubleshooting. The guide explains how to obtain both free APIs (like Google Gemini and Deepseek) and paid ones, clarifying technical concepts such as API tokens and pricing. It also covers SillyTavern's interface, including configuration, character cards, and plugins. Advanced topics like memory optimization using Retrieval-Augmented Generation (RAG) to create "long-term memory" for the AI and the importance of community support channels are also introduced, aiming to provide users with a comprehensive manual for a personalized AI interaction setup.

TinkeringPublished at·Updated at·Language Chinese -> English·AI Translation
#Reprint#Notes#AI

Baby FoodH1#

Chit-chat *1H2#

dzmm character card
dzmm character card

The whole thing started because I saw this picture in a group. Someone dug up the dzmm I found a year ago; this damn thing often breaks out of its circle because of various weird character cards.

To be honest, it's hard for me to judge things like dzmm, Fengyue AI, or other miscellaneous AI role-playing and NSFW websites. Many of them directly take free open-source community projects to make money, which is quite disgusting. (Especially since dzmm charges in USD. It's fine to play with the free credits, but there is absolutely no need to top up.)

The commercialization of open-source projects is actually beyond reproach, and most open-source licenses usually allow commercial use. Most of these platforms sell their own services, targeting newbies with no technical skills. The most typical example is Mirror Chan. As long as it complies with the corresponding open-source license, it's fine.

So, is there a free platform for AI role-playing and NSFW content? Yes, brother, I have nine ways to do this (crossed out)!

SillyTavern: Your AI Interactive TavernH2#

"SillyTavern (ST for short) is a locally installed user interface that allows you to interact with text generation LLMs, image generation engines, and TTS voice models" SillyTavern.

SillyTavern/SillyTavern: LLM Frontend for Power Users.

Simply put, it's a frontend

It doesn't matter if you can't understand the GitHub page, let's make an analogy:

缩放:
Ctrl+滚轮

If you trick the beautiful girl into the tavern, you can chat with her, and after getting her drunk, you can do NSFW things.

Technically, this corresponds to "Jailbreaking". This usually involves using specific prompts, character settings, or model parameter adjustments to bypass the LLM's safeguards and content moderation mechanisms, causing it to generate sensitive or inappropriate content that is normally restricted.

There are ready-made video tutorials on Bilibili for setting up the operating environment, and the links are attached here:

Below are the patches; the errors do not originate from the network...

Patch 1: Proxy Setting TipsH3#

System proxy address error
System proxy address error

Who made this video? How can you edit yaml using Notepad? No highlighting, no obvious indentation, and non-monospace fonts will blind people's eyes...

Here I recommend Visual Studio Code: Download Visual Studio Code - Mac, Linux, Windows

Simply put, vscode is a cross-platform code editor developed by Microsoft (beginners don't need to worry about the terminology, just remember "editor = tool for writing code"). Its core features can be summarized in 3 keywords:

  • Free and open-source: It costs nothing, everyone can download and use it for free, and the code is open and transparent (guaranteed security);

  • Cross-platform: Whether you use a Windows, Mac, or Linux computer, you can install and use it, with exactly the same interface and functions;

  • Lightweight but can "become powerful": It starts fast without lagging; but it can extend its functions through "extensions", such as supporting all programming languages like Python, Java, and frontend (HTML/CSS/JS), and can also achieve code auto-completion, syntax checking, running and debugging, syntax highlighting, etc.

Compared to other programming tools, vscode is quite friendly to beginners, mainly for 4 reasons:

  1. Zero-barrier to start: Simple interface, no complicated settings, but more essential programming functions;
  2. Supports most programming languages: Whether you modify configuration files (JSON, YAML, TOML, etc.), learn programming, or even write Markdown notes, as long as you install the corresponding extension, vscode can handle it, without needing to install different tools for different languages;
  3. Features "added on demand", not bloated: Based on this article, beginners only need to install the yaml extension; if more complex functions are needed later (such as modifying code, doing project management), you can slowly add extensions. It won't be like some tools with dozens of incomprehensible buttons right after opening;
  4. Strong community support, easy to solve problems: Because there are so many users, if you encounter any problems (like "how to run Python code", "extension can't be installed"), just search on Bing / Xiaohongshu / Bilibili, and you can find tutorials that beginners can understand.
Clash port
Clash port

If you are looking for the system proxy address under the magic tool (proxy), and then fill it in, or the port 7897 (clash) or 10808 (v2ray), it will report an error (see common errors). Personally, I recommend using the tun mode.

Patch 2: Repeat Operation to Solve ErrorsH3#

Similar error 1
Similar error 2

If similar errors occur, it is recommended to repeat the second step once (referring to the document steps of https://gitee.com/LimeOnTop/sill-tavern).

API Acquisition: Your Exclusive WaiterH2#

What is an API? Let's continue with an analogy:

缩放:
Ctrl+滚轮

The guest passes a message to the kitchen through the waiter, and then cooks a dish for you. This is how the API works. Moreover, because every time you order a dish (number of calls), you have to collect a dish fee (call fee).

Generally speaking, there are also some API providers here that charge by the number of times.

Here we discuss free APIs and paid APIs separately. I will still put the meaning of the specific model names at the end. (Unless otherwise specified, all websites below require a proxy).

Free APIs: The Gospel for FreebiesH3#

Gemini: Produced by Google, Must Be a MasterpieceH4#

Gemini Logo
Gemini Logo

Method:

  1. Register a Google account (If you don't know how to register, please scroll to the end of the article).
  2. Open https://aistudio.google.com/ and follow the method in the picture above (By the way, if you are watching the PC version Bilibili tutorial, it contains the way to obtain the Gemini API).

Google AI Studio is very strict in judging IP. It is recommended to use a cleaner proxy, and do not use the proxy of Hong Kong, China nodes.

Deepseek: Saving the Nation by a Curve through Alibaba's ModelScope CommunityH4#

Deepseek Logo
Deepseek Logo

(Note, the official website of the deepseek API is charged, here we use the channel of Alibaba's ModelScope community)

Method:

  1. Open https://modelscope.cn from the PC (mobile phone cannot open), and register an account.

  2. On the PC (mobile phone cannot open), operate as shown above.

SiliconFlow can also be used

Claude: The Wonderful Use of Web CookiesH4#

(Note, Claude's official API is still charged, and it's in USD. Here we use a clever method utilizing web cookies)

  • What are cookies?
  • Fetching cookies with an anti-detect browser
  • Setting up a Claude polling pool

Community Public Welfare Stations: Free but Use with CautionH4#

Public welfare station fraud warning
Public welfare station fraud warning

This depends on the specific community. It is recommended to read the rules in detail. This is responsible for you and the webmaster of the public welfare station.

For example, most public welfare stations in the Linuxdo community actually do not support high-concurrency applications like SillyTavern or Immersive Translate, and there is a risk of account bans.

(Note, public welfare station means literally no money is charged. It may accept AfD (Afdian) or account sponsorship. Anyone charging money under the banner of a public welfare station or semi-public welfare station is an account reseller! Is an account reseller! Is an account reseller!)

(Moreover, such account resellers are even too lazy to buy accounts to make an account pool, but use the APIs of public welfare stations to fraudulently swipe, which is extremely hateful) (As shown in the picture above)

Chit-chat *2: Considerations for Model SelectionH4#

It's not that only these three models are free, but because only these three models are relatively popular in the community which means it's easier for newcomers to get started, and there is more community support and character cards.

For example, some names you may have never even heard of, such as Mistral AI's La Plateforme (French, considered the only one in Europe), they provide a free API for mistral-medium-2505 (which has poor support for Chinese).

(Here we only list the official website API recharge channels and the latest model prices, without specific tutorials)

Before using a paid API, let's use the restaurant example again to explain two important concepts when you order a dish to the waiter (API), how does he know how much it will cost?

Someone might have thought, there are prices on the menu. But this restaurant can cook anything. We cannot list all the dishes into a huge menu and charge them one by one.

At this time, a universal standard is needed, which is a token (similar to game coins).

When we give the description to the waiter (API), the waiter will first give it to the sous-chef (tokenizer) in the kitchen, asking him to help calculate the "game coin content" (input tokens) of this sentence, and then pass it on to the head chef.

After the head chef is done, you also need to pay the head chef a fee for cooking the dish (output tokens). And the price of the dish we finally eat depends on how much we ask for, and the head chef's cooking fee.

That is, (Input Tokens + Output Tokens) = Total Price.

(The more requests (input tokens), the more expensive the price, just like the difference between ordering a home-style side dish and braised fish eyes with steamed fish teeth)

Generally speaking, the unit price of output tokens is much higher than that of input tokens.

Once you know this principle, you can understand the charging standards of the models.

Okay, here is the supplemented information for Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 4.1 Opus, as well as Gemini 2.5 Pro and Flash:

ChatGPT: OpenAI's Flagship ProductH4#

Grok: X.ai's Unique AttemptH4#

Claude 4 Sonnet: Anthropic's Balanced Choice of Intelligence and SpeedH4#

  • Model: Claude 4 Sonnet
  • Official Website: https://www.anthropic.com/ (Accessible via Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI)
  • Price: Input: 15.00/1M tokens
    • Long context pricing (over 200K input tokens): Input: 22.50/1M tokens
    • 50% discount for batch processing

Claude 3.5 Haiku: Anthropic's Fastest, Most Cost-Effective ModelH4#

  • Model: Claude 3.5 Haiku
  • Official Website: https://www.anthropic.com/ (Accessible via Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI)
  • Price: Input: 4.00/1M tokens
    • 50% discount for batch processing

Claude 4.1 Opus: Anthropic's Most Intelligent ModelH4#

  • Model: Claude Opus 4.1
  • Official Website: https://www.anthropic.com/ (Accessible via Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI)
  • Price: Input: 75.00/1M tokens
    • 50% discount for batch processing

Gemini 2.5 Pro: Google's Advanced Reasoning ModelH4#

  • Model: Gemini 2.5 Pro
  • Official Website: https://ai.google.dev/ (Accessible via Google AI Studio and Vertex AI)
  • Price: Input: 2.50/1M tokens (for >200K tokens) | Output: 15.00/1M tokens (for >200K tokens)

Gemini 2.5 Flash: Google's Highly Cost-Effective Work ModelH4#

  • Model: Gemini 2.5 Flash
  • Official Website: https://ai.google.dev/ (Accessible via Google AI Studio and Vertex AI)
  • Price: Input: 0.40/1M tokens
    • Image output price is 0.039 per image.

SillyTavern Page Information Simple TutorialH2#

Basic InformationH3#

SillyTavern basic information interface
SillyTavern basic information interface

PresetsH3#

SillyTavern preset interface
SillyTavern preset interface

API link setup 1
API link setup 2
API link setup 3

Virtual Network Card (TUN Mode)H4#

TUN mode schematic diagram
TUN mode schematic diagram

Or called TUN mode (the tutorial for using proxies is not here, the demonstration page is Clash V).

Principle:

For example

缩放:
Ctrl+滚轮

Actually, the content in the first yellow box is the working principle of the TUN mode.

It will create a virtual network card (or use a TUN device) in the operating system. All traffic that should have been sent directly to the physical network card will be routed to this virtual network card by the operating system. The virtual network card acts like a "valve", controlling the flow of data.

That is, all the water flowing through the reservoir will pass through the valve of the TUN mode.

Combined with global configuration, it can prevent SillyTavern from not connecting to the proxy, leading to the following errors.

Common ErrorsH4#

Error 1
Error 2

World BookH3#

World book interface
World book interface

Character CardsH3#

Character card interface
Character card interface

Extension ImportH3#

Extension import 1
Extension import 2

Joining Communities: Huddling for Warmth, Doing More with LessH2#

I highly recommend joining a community. Not only are there a large number of character cards here, but there are also enough presets and experts to help you solve problems. (Most of the tutorials below are based on tutorials developed by experts. If you don't join, it will be much more troublesome).

But before that, you need to register a Discord account. If you don't have one, please see the account registration tutorial below.

Tavern Assistant: Essential Prerequisite ExtensionH2#

The prerequisite required for the vast majority of extensions and character cards.

Documentation and tutorial links: https://n0vi028.github.io/JS-Slash-Runner-Doc/guide

Memory Optimization: Giving AI "Long-Term Memory"H2#

(Please make sure you have installed Tavern Assistant before installing this)

Originally, I wanted to try writing this with a few specialized extensions (specialized RAG extensions, specialized table extensions), but after putting it into practice, Amily No. 2 is still the most suitable for newcomers.

https://docs.google.com/document/u/0/d/11E7HIFg59up0afv-lV0cAF5G3jzJXCkZK8cBCOMZ9zo/mobilebasic#heading=h.xq53yl5o80i7
(Tutorial written personally by the author of Amily)

Reverse Proxy: Baby Food Level (Beginner) TutorialH2#

https://discord.com/channels/1291925535324110879/1406084681564160030
With polling, a beginner-level tutorial (a total of three steps to install, two steps to use).

Magic (Proxy): The Bridge to FreedomH2#

The reprinted article bypassed this part due to content censorship. You can check the original content here: #Baby Food - Mubu

Google Account: The Passport to the AI WorldH2#

Registration: Get it for FreeH3#

How to skip the mobile phone verification code step when registering a Google account: https://b23.tv/79xTl6J
(If it's invalid, just follow the instructions to register...)

SMS receiving platform
Xianyu purchase example

https://sms-activate.io/
This is an SMS receiving platform (but because the numbers in it are somewhat too "dirty", do not buy a mobile number to receive codes for registration. Even if it's very cheap, registration will fail 100%) (Figure 1).

The second method is to search "Gemini 2.5 Pro student verification" on Xianyu. They will send you a Pro account along with a Google account (Figure 2).

Science Popularization You Might Need (Optional): AI Naming and Working PrinciplesH2#

Just like NaOH, the naming of AI also follows certain rules. Take the recently popular Gemini 2.5 Flash Image Preview as an example, let's break down its naming conventions one by one:

  • Gemini - Google's (the name of the large model series developed by Google DeepMind)
  • 2.5 - Version number (the larger the number, the faster/better)
  • flash - Fast (similarly, there is Pro as a reasoning model)
  • image - Picture (different from the dedicated Imagen image generation model)
  • preview - Preview version (similarly, there is EXP experimental version)

That is: Google's 2.5th generation fast multimodal model preview version.
Similarly, Gemini 2.5 Pro Preview 0605 means the preview version of Google's 2.5th generation reasoning model released on May 6th.

Chat Models: Deep UnderstandingH3#

What is a Token?H4#

We previously compared tokens to game coins. Although they are very similar, this is actually an imprecise description (but if you see the description of game coins, it still refers to tokens). As mentioned earlier, in every kitchen, apart from the head chef (reasoning model) who actually cooks, there is also a sous-chef, which is the tokenizer. The sous-chef's job is to chop up the text entered by the user, separating it just like chopping vegetables, and then hand it over to the head chef for processing. The separated individual characters, words, or phrases (in English) are called tokens.

We will occasionally see cache hits and cache misses on the pricing pages of large models. What does this mean?
Let's take another example. Suppose you are in a library and I want a book. There are two situations:

  1. My book is right beside me, or nearby, and I can easily grab it (cache hit).
  2. My book is on the bookshelf, or in a place that requires a long walk to get (cache miss).
    Applying this concept to AI, the process of looking for a book in the library becomes the process of looking for tokens in the cache.

Parameter Limits: AI's "Appetite"H4#

When we call an API, the model is mainly limited by the following parameters:

  • Input/Output method — whether it inputs text or pictures, and outputs text
  • Maximum number of input/output tokens

(Similarly, you can use the methods for picture input and output found in the beautification tutorial, which will not be detailed here).

Token limits have a quota. This is easy to understand. After all, model companies are not doing charity. Still using the previous restaurant example. Suppose you have a membership card for this restaurant, then:

  • Requests Per Minute (RPM) — The maximum number of dishes you can let the kitchen cook within a minute.
  • Requests Per Day (RPD) — The maximum number of dishes this membership card can let the kitchen cook every day.
  • Tokens Per Minute (Input) (TPM) — How many game coins are in the card per minute.
  • Tokens Per Day (Input) (RPM) — How many game coins are in the card in total.

Similarly, taking Google's free Gemini 2.5 Pro as an example, its input methods are: picture/text/audio/video and PDF, and the output method is text, which means it does not support image generation. The maximum input tokens are 1 million tokens, and output tokens are 65536 tokens. RPM is 2, RPD is 50, TPM is 125,000, and daily tokens (RPM) is 3 million.

(In other words, free accounts cannot enjoy the 1 million input tokens, and can only play 50 times).

Related error: 429 error (Too Many Requests). It will prompt you with the "RESOURCE_EXHAUSTED" error. The reason may be that you have violated one of the four limits mentioned above. At this time, you need to switch to another API or change to another account.

Here I have to say that some APIs provided by public welfare stations are realized by polling a super large number of API keys, because Google will give each AI Studio API key a certain daily quota, so this can bypass the limits. (But this method is very unstable, and it is easy to blow up the keys).

Working Principles: The Mystery of TransformerH4#

[10 Minutes to Let You Completely Understand Transformer - Bilibili] https://b23.tv/3b89PXv
(Since this part of the content is really irrelevant to the tutorial, please search for related content on Bilibili by yourself).

Memory Principles: AI's "Short-Term Memory" and "Forgetting"H4#

We talked about in the parameter limits above that for the maximum input tokens, models often have a certain parameter standard (for example, Gemini's maximum input tokens is 100). And this is usually a hard indicator, even large model companies cannot change it arbitrarily.

Of course, no one can input 1 million tokens at once, so a large part is "embezzled" by the large model itself. But please note that this embezzlement is necessary because large models have no memory. But you can see that it will continue to answer based on your previous message. Didn't we say that large models have no memory? Why is this?
Answer: Because it will send back all your previous questions, regardless of good or bad, all at once, until it reaches the upper limit of maximum input tokens (taking Gemini as an example, that is 1 million tokens).

For example, AI's memory is like us taking scrolling screenshots; the picture gets longer and longer. And sometimes, we want to send the scrolling screenshot we took to a friend, but QQ often limits long pictures, and it may even fail to send (429 or exceeding the maximum input tokens). At this time, we have to delete a part of the picture. Converted to AI, it is the AI directly "forgetting" this part of the content this forgetting follows the principle of time: what is asked first is deleted first, and what is asked later is deleted later.

If you understand the above content, congratulations on understanding the mechanism of AI's short-term memory which is to send all the context over, and "forget" the excess parts. This inevitably accompanies an extremely huge consumption of tokens, so it must be optimized.

RAG: Building AI's "Long-Term Memory"H3#

In order to not limit AI to that short-term memory of 1 million tokens, and at the same time to consume fewer tokens, we must let AI have its own long-term memory how to do it?

Still using the example of screenshots and sending messages on QQ, we can find that long screenshots are just blurrier than short screenshots and who would read such a long string. What's more pity is that there may be only a few places in a long screenshot that need the other party's attention. Analogous to AI, it means that AI cannot distinguish primary from secondary, so the answers are totally irrelevant. The solution is also very simple, which is to cut each key point of the long screenshot into short screenshots, mark their sequence, save them to the album, and then send them to a friend.

By comparing it back, you can understand the meaning of several professional terms:

  • Long cut to short - Chunking
  • Marking sequence - Indexing
  • Saving to album - Vectorization
  • Sending to friend - Retrieval

This is the entire principle of RAG, Retrieval-Augmented Generation.
Video version: https://b23.tv/JOf2Zoq (Sourced from the network).

Bonus: More ResourcesH2#

Bonus resource 1
Bonus resource 2

I originally actually wrote a lot here, but when I was browsing Leinao, I found that someone had already summarized the tutorials. To prevent reinventing the wheel, I will directly attach the Discord link here:
https://discord.com/channels/1134557553011998840/1392379963239301221/1392379963239301221

Article reprinted from #Baby Food - Mubu Authorized for reprint by the original author Luochen

Authorization information
Authorization information
Copyright & License
© 2025 天翔TNXG
Baby Food: A Complete Guide from SillyTavern Setup to API Acquisition
CCCreative Commons License
BYAttribution: You must give appropriate credit
NCNonCommercial: You may not use the material for commercial purposes
SAShareAlike: You must distribute your contributions under the same license
License:BY-NC-SA