# Generative Agents

This serves as yet another cobbled-together application of [generative agents](https://arxiv.org/pdf/2304.03442.pdf), utilizing [LangChain](https://github.com/hwchase17/langchain/tree/master/langchain) as the core dependency and substituting a "proxy" for GPT-4. In short, by utilizing a language model to summarize, rank, and query information through natural-language queries/instructions, immersive agents can be attained.

## Features

* Gradio web UI
* saving and loading of agents
* works with non-OpenAI LLMs and embeddings (tested with llamacpp)
* modified prompts for use with Vicuna

## Installation

```
pip install -r requirements.txt
```

## Usage

Set your environment variables accordingly (an example configuration is sketched at the end of this README):

* `LLM_TYPE`: (`oai`, `llamacpp`): the LLM backend to use in LangChain. OpenAI requires some additional environment variables:
  - `OPENAI_API_BASE`: URL of your target OpenAI endpoint
  - `OPENAI_API_KEY`: authentication key for OpenAI
  - `OPENAI_API_MODEL`: target model
* `LLM_MODEL`: (`./path/to/your/llama/model.bin`): path to your GGML-formatted LLaMA model, if using `llamacpp` as the LLM backend
* `LLM_EMBEDDING_TYPE`: (`oai`, `llamacpp`, `hf`): the embedding model to use for similarity computation
* `LLM_VECTORSTORE_TYPE`: (`chromadb`): the vector store to use for "infinite" context
* `LLM_PROMPT_TUNE`: (`oai`, `vicuna`, `supercot`, `cocktail`): the prompt format to use, for model variants finetuned on specific instruction styles
* `LLM_CONTEXT`: sets the maximum context size

To run:

```
python .\src\main.py
```

## Plans

* I ***do not*** plan on making this uber user-friendly like [mrq/ai-voice-cloning](https://git.ecker.tech/mrq/ai-voice-cloning), as this is just a stepping stone for a bigger project integrating generative agents.
* I need to re-implement grabbing relevant context, as I've moved almost entirely away from the provided LangChain implementation (not to knock it; it's just not in my nature to piggyback off of it).
* The """grand""" end goal is either to have this hosted as an addon server (more likely, given the... prohibitive VRAM requirements) or to have it live in a C++ library to be used in other programs.

## Caveats

A local LLM isn't *quite* ready to be truly relied on yet:

* It works, but it's a bit slow.
* You're really, really dependent on how well your model variant performs.
  - You ***have*** to condition your prompts properly for decent results.
  - Some flavors handle the instruction-based method used here better than others.
* Model size is also a factor in how much memory it will consume.
  - For example, if this were to be used strictly in C++ in a game engine, it would have to compete for limited (V)RAM.
    + However, I don't think mine consumes all that many resources.
* ChromaDB seems to be strictly Python (the Node.js bindings still use Python, to my knowledge), yet everyone seems to prefer it over FAISS as a vector store.

Utilizing GPT-4 (or Anthropic Claude) Just Works a little too nicely, even without conditioning the prompts for a "chat" model. But SaaS models are inherently limited by cost per generation, and not everyone will have that luxury for `[enter use case for this application of generative agents here]`.
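
## Example configuration

As a concrete illustration of the variables described under Usage, a `llamacpp` setup might look something like the following. The model path and values here are placeholders for illustration, not defaults shipped with this repo:

```
# point the LLM backend and embeddings at a local GGML model (path is a placeholder)
export LLM_TYPE=llamacpp
export LLM_MODEL=./models/your-ggml-model.bin
export LLM_EMBEDDING_TYPE=llamacpp
# use ChromaDB as the vector store and Vicuna-style prompt formatting
export LLM_VECTORSTORE_TYPE=chromadb
export LLM_PROMPT_TUNE=vicuna
export LLM_CONTEXT=2048
python ./src/main.py
```

On Windows, use `set` in `cmd` (or `$env:` in PowerShell) instead of `export`, with the `python .\src\main.py` invocation shown above.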