# Generative Agents
This serves as yet another cobbled-together application of generative agents, using LangChain as the core dependency and substituting a "proxy" for GPT-4.
In short, by using a language model to summarize, rank, and query stored information through natural-language queries/instructions, immersive agents can be attained.
## Features
- gradio web UI
- saving and loading of agents
- works with non-OpenAI LLMs and embeddings (tested with llamacpp)
- modified prompts for use with vicuna
## Installation

```sh
pip install -r requirements.txt
```
## Usage
Set your environment variables accordingly:
- `LLM_TYPE`: (`oai`, `llamacpp`): the LLM backend to use in LangChain. OpenAI requires some additional environment variables:
  - `OPENAI_API_BASE`: URL for your target OpenAI
  - `OPENAI_API_KEY`: authentication key for OpenAI
  - `OPENAI_API_MODEL`: target model
- `LLM_MODEL`: (`./path/to/your/llama/model.bin`): path to your GGML-formatted LLaMA model, if using `llamacpp` as the LLM backend
- `LLM_EMBEDDING_TYPE`: (`oai`, `llamacpp`, `hf`): the embedding model to use for similarity computing
- `LLM_PROMPT_TUNE`: (`oai`, `vicuna`, `supercot`, `cocktail`): prompt formatting to use, for variants with specific finetunes for instructions, etc.
- `LLM_CONTEXT`: sets maximum context size
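
For example, a minimal llamacpp setup might look like the following (Linux/macOS shell; the model path and context size are placeholders, and Windows users would use `set` or `$env:` instead of `export`):

```sh
# Illustrative only: placeholder paths/values for a local llamacpp backend.
export LLM_TYPE=llamacpp
export LLM_MODEL=./path/to/your/llama/model.bin   # GGML-formatted LLaMA model
export LLM_EMBEDDING_TYPE=llamacpp
export LLM_PROMPT_TUNE=vicuna
export LLM_CONTEXT=2048
```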
To run:

```sh
python .\src\main.py
```
## Plans
I do not plan on making this as über-user-friendly as mrq/ai-voice-cloning, as this is just a stepping stone for a bigger project integrating generative agents.
## Caveats
A local LM is quite slow. Things seem to be getting faster as llama.cpp is being developed.
Even using one that's more instruction-tuned, like Vicuna (with a `SYSTEM:\nUSER:\nASSISTANT:` prompt structure), it's still inconsistent.
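
For reference, the general shape of that prompt format is roughly (illustrative, not the exact template used here):

```
SYSTEM: <system prompt>
USER: <instruction>
ASSISTANT: <model response>
```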
However, I seem to be getting consistent results with SuperCOT 33B; it's just, well, slow. SuperCOT 13B seems to give better answers than Vicuna-1.1 13B, and Cocktail 13B seems to be the best of the 13Bs.
A lot of prompt wrangling is needed, and a lot of the routines could be polished up. For example, an observation queries the LM for a rating, and each response/reaction requires querying for the observed entity, then for the relationship between the agent and that entity (which ends up just summarizing relevant context/memories), and then querying for a response; if any one of these steps fails, the overall fail rate is higher. If anything, I might as well just work from the ground up and only really salvage the use of FAISS to store embedded vectors.
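
As a rough illustration of that chain (not the actual code in `src/`; `llm`, the prompts, and the argument names are hypothetical placeholders), the flow looks something like this:

```python
# Hypothetical sketch of the observation -> reaction chain described above;
# names and prompt strings are illustrative, not the repo's actual routines.
from typing import Callable


def react(
    llm: Callable[[str], str],   # any text-in/text-out LLM call (OpenAI, llamacpp, ...)
    agent: str,
    observation: str,
    memories: list[str],
) -> str | None:
    # 1. Ask the LM to rate how important the observation is.
    rating = llm(f"On a scale of 1-10, rate the importance of: {observation}\nRating:")

    # 2. Ask the LM what entity is being observed.
    entity = llm(f"What entity is being observed in: {observation}\nEntity:")

    # 3. Summarize the relationship between the agent and that entity
    #    from whatever memories are relevant.
    context = "\n".join(memories)
    relationship = llm(
        f"Given these memories:\n{context}\n"
        f"Summarize the relationship between {agent} and {entity}.\nSummary:"
    )

    # 4. Finally, ask for the agent's reaction. If any earlier step returned
    #    garbage, this step inherits it, which is why failures compound with
    #    a weaker local model.
    for step in (rating, entity, relationship):
        if not step.strip():
            return None
    return llm(
        f"{agent} observes: {observation}\n"
        f"Relationship: {relationship}\n"
        f"How does {agent} react?\nReaction:"
    )
```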
GPT-4 seems to Just Work, unfortunately.