Core Capabilities
Aide provides access to various pretrained foundational models, allowing users to leverage advanced capabilities for diverse tasks.
Prompt Engineering and LLM Customization
Aide supports prompt engineering and customization of Large Language Models (LLMs) to tailor responses to specific needs.
Fine-tuning and Inference
Users can fine-tune pretrained models and perform inference locally for customized tasks and efficient text generation.
Local Instance Architecture
Aide operates using local instances for model hosting and fine-tuning, enabling efficient management of computational resources.
How Does Aide Work?
Text Input -> Aide Service -> Text Output
Aide is designed to process and generate human language efficiently at scale. Typical use cases include:
- Text Generation
- Summarization
- Data Extraction
- Classification
- Conversation
Pretrained Foundational Models
Text Generation Models
Text generation models produce coherent, contextually relevant text from a prompt or instruction. This section lists models available through Ollama, covering both text-only language models and multimodal options.
Llama 2
Llama 2 is a family of openly licensed large language models developed by Meta. It is available in several sizes and suits a wide range of natural language processing tasks.
- Variants: 7B, 13B, 70B parameters
- Use cases: General text generation, question-answering, summarization
ollama run llama2
Mistral
Mistral is a 7B-parameter language model from Mistral AI that performs strongly for its size across a range of tasks.
- Variants: 7B; the related 8x7B mixture-of-experts model is published separately as Mixtral (ollama run mixtral)
- Key features: Efficient architecture, strong performance on diverse tasks
ollama run mistral
Phi-2
Phi-2 is a small language model developed by Microsoft Research, known for its impressive performance despite its compact size.
- Size: 2.7B parameters
- Key features: Compact size, efficient performance
ollama run phi
Stable Beluga
Stable Beluga is a fine-tuned version of Llama 2 that excels in instruction-following and conversational tasks.
- Base model: Llama 2
- Specialization: Instruction-following, conversation
ollama run stable-beluga
Orca 2
Orca 2 is a Llama 2-based model series from Microsoft Research, optimized for reasoning and task completion.
- Base model: Llama 2
- Specialization: Reasoning, task completion
ollama run orca2
Yi
A series of large language models developed by 01.AI, available in various sizes.
- Variants: 6B to 34B parameters
- Use cases: General text generation, analysis
ollama run yi
Neural Chat
Neural Chat is an instruction-following model fine-tuned by Intel on top of Mistral and aimed at conversational AI applications.
- Specialization: Conversational AI
- Key features: Mistral-based, fine-tuned by Intel
ollama run neural-chat
Multimodal Models
These models extend beyond text, incorporating visual understanding capabilities.
LLaVA (Large Language and Vision Assistant)
LLaVA combines a vision encoder with a Llama 2-based language model (Vicuna) so that it can reason over text and images together.
- Base model: Llama 2
- Additional capability: Visual processing
ollama run llava
BakLLaVA
BakLLaVA is a multimodal model that pairs the Mistral 7B base model with the LLaVA architecture, so it can process both text and images.
- Base model: Mistral 7B
- Additional capability: Image processing
ollama run bakllava
CLIP
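While not a full LLM, CLIP is a multimodal model that learns the relationship between images and text by embedding both in a shared vector space. Confirm that a CLIP model is listed in the Ollama library before relying on the command below.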
- Key feature: Image-text relationship understanding
- Use cases: Image classification, visual search
ollama run clip
Usage Notes
- Ensure you have Ollama installed on your system.
- Use the ollama run command followed by the model name to start an interaction.
- For multimodal models, make sure you have the necessary setup to input both text and images.
- Consider the trade-offs between model size, performance, and resource requirements when choosing a model for your specific use case.
Remember to check the Ollama documentation for the most up-to-date information on available models and their usage.
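Besides the interactive ollama run command, a running Ollama server also exposes a local REST API, which maps directly onto the Text Input -> Aide Service -> Text Output flow. The sketch below builds a non-streaming request for the /api/generate endpoint on Ollama's default port (11434). The helper names build_generate_request and generate are illustrative assumptions, not part of Aide or Ollama, and the call to generate only succeeds when a server is running and the model has been pulled.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With a server running, generate("llama2", "Summarize the benefits of local inference.") would return the model's reply as a string.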
Text Summarization
Summarize text according to specific formats, lengths, and tones.
Models:
- Command: Utilized for generating summaries with user-specified parameters.
Embedding Models
Convert text into numerical vector embeddings for tasks like semantic search and classification.
Models:
- embed-english-v3.0 / embed-multilingual-v3.0: Provides vector embeddings for English and multilingual text.
- embed-english-light-v3.0 / embed-multilingual-light-v3.0: A smaller, faster version for efficient embedding.
- embed-english-light-v2.0: Previous generation model for English text.
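To illustrate how embeddings support semantic search, the sketch below ranks documents by cosine similarity to a query vector. The three-dimensional vectors and document names are made-up stand-ins; real embedding models return vectors with hundreds or thousands of dimensions, and no embedding API is called here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; a real model would produce these from text.
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-reference": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]
ranking = rank_by_similarity(query, docs)
```

The same ranking step works for classification: embed labeled examples, then assign a new text the label of its nearest neighbors.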
Fine-tuning and Inference in Aide
Fine-tuning Workflow
- Create a Local Instance: Set up a local environment for model fine-tuning.
- Gather Training Data: Prepare and organize your domain-specific dataset.
- Kickstart Fine-tuning: Initiate the fine-tuning process on your local instance.
- Generate Fine-tuned Model: The model is refined based on the provided data.
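The "Gather Training Data" step usually ends with a file of input/output pairs. The source does not specify Aide's training data format, so the sketch below assumes a JSON Lines layout with prompt and completion fields, which is a common convention; the function name to_jsonl is likewise an assumption.

```python
import json


def to_jsonl(examples):
    """Validate prompt/completion pairs and serialize them one JSON object per line."""
    lines = []
    for index, example in enumerate(examples):
        prompt = example.get("prompt", "").strip()
        completion = example.get("completion", "").strip()
        if not prompt or not completion:
            raise ValueError(f"example {index} needs a non-empty prompt and completion")
        lines.append(json.dumps({"prompt": prompt, "completion": completion}, ensure_ascii=False))
    # An empty dataset yields an empty string rather than a lone newline.
    return "\n".join(lines) + "\n" if lines else ""
```

Rejecting empty fields up front is cheaper than discovering a malformed record partway through a fine-tuning run.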
Inference Workflow
- Create a Local Instance: Set up an instance to host the fine-tuned model.
- Create Endpoint: Define a local endpoint for the model.
- Serve Model: Handle inference requests and generate responses based on the fine-tuned model.
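The "Create Endpoint" and "Serve Model" steps can be sketched with Python's standard library alone. Everything here is an assumption made for illustration: the request shape ({"prompt": ...}), the port, and fake_model, which stands in for a call into the fine-tuned model. Request validation is kept in a plain function so it can be exercised without starting a server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_model(prompt: str) -> str:
    """Stand-in for the fine-tuned model; replace with a real inference call."""
    return f"echo: {prompt}"


def handle_inference(raw_body: bytes, model=fake_model):
    """Validate one request body and return (HTTP status, JSON-serializable reply)."""
    try:
        payload = json.loads(raw_body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return 400, {"error": "body must be valid JSON"}
    prompt = payload.get("prompt") if isinstance(payload, dict) else None
    if not isinstance(prompt, str) or not prompt.strip():
        return 400, {"error": "'prompt' must be a non-empty string"}
    return 200, {"response": model(prompt)}


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client declared.
        length = int(self.headers.get("Content-Length", 0))
        status, reply = handle_inference(self.rfile.read(length))
        data = json.dumps(reply).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Block and serve inference requests until interrupted."""
    HTTPServer((host, port), InferenceHandler).serve_forever()
```

Calling serve() starts the endpoint on the loopback interface only, which keeps it unreachable from other machines by default.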
T-Few Fine-tuning
T-Few Fine-tuning is an efficient method that updates a subset of the model's weights, resulting in reduced training time and cost compared to traditional fine-tuning. It involves:
- Utilizing initial weights and annotated data.
- Generating a supplementary set of model weights.
- Confining updates to specific transformer layers.
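The core idea, a frozen base model plus a small set of added trainable weights, can be shown on a single linear unit. This is a toy illustration of that general idea under simplifying assumptions, not the T-Few algorithm itself: the base weights are never modified, and only a per-input scaling vector (initialized to ones) is trained by gradient descent. All names and numbers are made up for the example.

```python
def predict(weights, scale, x):
    """Linear unit whose frozen weights are rescaled element-wise by `scale`."""
    return sum(w * s * xi for w, s, xi in zip(weights, scale, x))


def mean_squared_error(weights, scale, data):
    return sum((predict(weights, scale, x) - target) ** 2 for x, target in data) / len(data)


def train_scale(weights, data, epochs=200, lr=0.1):
    """Train only the scaling vector; `weights` is read but never written."""
    scale = [1.0] * len(weights)
    for _ in range(epochs):
        grads = [0.0] * len(scale)
        for x, target in data:
            error = predict(weights, scale, x) - target
            for i, (w, xi) in enumerate(zip(weights, x)):
                # d(error^2)/d(scale_i) = 2 * error * w_i * x_i, averaged over the data
                grads[i] += 2 * error * w * xi / len(data)
        scale = [s - lr * g for s, g in zip(scale, grads)]
    return scale


base_weights = (0.5, -1.0, 2.0)  # frozen "pretrained" weights (a tuple, so immutable)
data = [((1.0, 0.0, 0.0), 1.0), ((0.0, 1.0, 0.0), -0.5), ((0.0, 0.0, 1.0), 3.0)]

loss_before = mean_squared_error(base_weights, [1.0] * 3, data)
learned_scale = train_scale(base_weights, data)
loss_after = mean_squared_error(base_weights, learned_scale, data)
```

Because only the three scaling values change, the artifact to store per task is the small added vector rather than a full copy of the model.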
Fine-tuning Parameters
- Total Training Epochs: Number of complete passes over the training dataset (default: 3).
- Batch Size: Number of samples processed before updating parameters (default: 8 for Command).
- Learning Rate: Rate at which parameters are updated (default: 0.1 for T-Few).
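- Early Stopping Threshold: Minimum improvement in the monitored metric that counts as progress; smaller changes are treated as stagnation (default: 0.01).
- Early Stopping Patience: Number of consecutive evaluations without sufficient improvement that are tolerated before training stops (default: 6).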
- Log Model Metrics Interval: Frequency of logging model metrics (default: 10 steps).
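The sketch below collects the defaults listed above in one place and shows one common way the early stopping threshold and patience interact. The class and function names are assumptions, and so is the exact stopping rule: here an evaluation counts as progress only if the loss beats the best value so far by more than the threshold.

```python
from dataclasses import dataclass


@dataclass
class FineTuneConfig:
    # Defaults taken from the parameter list above.
    total_training_epochs: int = 3
    batch_size: int = 8
    learning_rate: float = 0.1
    early_stopping_threshold: float = 0.01
    early_stopping_patience: int = 6
    log_model_metrics_interval: int = 10


def early_stop_index(losses, threshold, patience):
    """Return the index of the evaluation at which training would stop, or None."""
    best = float("inf")
    stale = 0
    for index, loss in enumerate(losses):
        if best - loss > threshold:
            best = loss   # enough improvement: record it and reset the counter
            stale = 0
        else:
            stale += 1    # improvement too small (or loss got worse)
            if stale >= patience:
                return index
    return None
```

For example, with a threshold of 0.01 and a patience of 2, the loss sequence 1.0, 0.8, 0.795, 0.794 stops at the fourth evaluation, because the last two improvements are both below the threshold.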
Prompt Engineering
Prompt
The initial text provided to the model.
Prompt Engineering involves refining prompts to elicit desired responses, leveraging techniques such as in-context learning and few-shot prompting.
In-context Learning and Few-shot Prompting
- In-context Learning: Provides context and instructions within the prompt.
- Few-shot Prompting: Includes examples in the prompt to guide the model’s responses.
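A few-shot prompt is ordinary text assembled from an instruction, some worked examples, and the new input. The sketch below shows one possible layout; the Input:/Output: labels and the function name are assumptions, and any consistent format the model can follow would do.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, labeled examples, and a final unanswered query."""
    parts = [instruction.strip(), ""]
    for text, label in examples:
        parts.append(f"Input: {text}")
        parts.append(f"Output: {label}")
        parts.append("")
    # The prompt ends with an open "Output:" so the model completes the pattern.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("The battery lasts all day.", "positive"), ("It broke within a week.", "negative")],
    "Setup was quick and painless.",
)
```

Passing an empty examples list turns the same function into a zero-shot prompt, which makes it easy to compare the two approaches on the same queries.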
Advanced Prompting Strategies
- Chain-of-Thought: Incorporates reasoning steps in the prompt to improve response quality.
- Zero-Shot Chain-of-Thought: Elicits step-by-step reasoning without worked examples, typically by appending a cue such as "Let's think step by step" to the prompt.
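The difference between the two strategies is visible in the prompt text itself. In this sketch (function names and the Q:/A: layout are assumptions), the few-shot variant shows reasoning inside each worked example, while the zero-shot variant supplies no examples and relies on a short cue.

```python
COT_CUE = "Let's think step by step."  # widely used zero-shot reasoning cue


def chain_of_thought_prompt(question, worked_examples):
    """Few-shot chain-of-thought: each example shows its reasoning before the answer."""
    blocks = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in worked_examples
    ]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def zero_shot_cot_prompt(question):
    """Zero-shot chain-of-thought: no examples, only a cue that invites reasoning."""
    return f"Q: {question}\nA: {COT_CUE}"
```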
Retrieval Augmented Generation (RAG)
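RAG improves model output by retrieving relevant passages from external knowledge bases and adding them to the prompt, without altering the underlying model. It compares with the other customization approaches as follows: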
- Few-shot Prompting: Simple and quick to implement, but may increase latency.
- Fine-tuning: Enhances model performance for specific tasks, but requires labeled datasets.
- RAG: Effective for integrating up-to-date information and grounding responses in current data.
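The retrieval step that RAG adds can be sketched in a few lines. This example scores passages by simple keyword overlap as a stand-in for embedding-based retrieval, then places the best matches in the prompt. The knowledge base contents, function names, and prompt wording are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, top_k=2):
    """Return up to top_k documents that share at least one word with the query."""
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda doc: len(query_words & tokenize(doc)), reverse=True)
    return [doc for doc in ranked[:top_k] if query_words & tokenize(doc)]


def build_rag_prompt(query, documents, top_k=2):
    """Ground the question in retrieved passages; the model itself is unchanged."""
    passages = retrieve(query, documents, top_k)
    context = "\n".join(f"- {p}" for p in passages) or "- (no relevant passages found)"
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


knowledge_base = [
    "Aide hosts models on local instances.",
    "Fine-tuning updates model weights with labeled data.",
    "Embeddings map text to numerical vectors.",
]
rag_prompt = build_rag_prompt("Where does Aide host models?", knowledge_base)
```

Updating the knowledge base changes what the model can cite immediately, with no retraining, which is why RAG suits information that changes often.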
Choosing the Right Approach
- Start with a Simple Prompt: Test and refine basic prompts.
- Add Few-shot Prompting: Incorporate examples for improved performance.
- Utilize RAG: Integrate retrieval mechanisms for enhanced context and accuracy.
- Fine-tune the Model: Apply fine-tuning for domain-specific needs.
- Optimize Retrieval: Fine-tune retrieval processes for more accurate results.
Local Instance Setup
Instance Configuration
Set up local instances based on your needs for fine-tuning and inference. Instances can be scaled according to model requirements and expected throughput.
- Fine-tuning Cost: On a local machine there is no metered fee; the cost is the hardware and bandwidth consumed, which grows with the number of instance hours, the fine-tuning duration, and the size of the model being tuned.
- Hosting Cost: Reflects the cost of keeping inference instances running, including cloud-based instances if the deployment is scaled up beyond local hardware.
Security
Aide ensures that customer data and models are isolated and secure within the local instance environment, with access restricted to the customer’s tenancy.