LLM

GitHub - Hannibal046/Awesome-LLM (opens in a new tab)

LLM - Applications

LLM - Applications - Chat UI

GitHub - oobabooga/text-generation-webui (opens in a new tab)

Doesn't support MCP server integration yet

Certification

NCA-GENL (opens in a new tab)

NVIDIA-Certified Associate - Generative AI LLMs
NCP-GENL (opens in a new tab)

NVIDIA-Certified Professional - Generative AI LLMs
NCP-AAI (opens in a new tab)

NVIDIA-Certified Professional - Agentic AI
NCP-ADS (opens in a new tab)

NVIDIA-Certified Professional - Accelerated Data Science
AWS Certified ML Engineer - Associate (opens in a new tab)
- Preparing for AWS Certified ML Engineer Associate Exam (opens in a new tab)

Study target

AI Agent & Agentic AI Expertise

Advanced proficiency in AI agent frameworks (LangChain, LangGraph, AutoGen, CrewAI, or similar).

Deep understanding of multi-agent systems, agent communication protocols, and coordination mechanisms.

Extensive experience with Large Language Models (LLMs) and their integration into agent workflows.

Expert-level prompt engineering and chain-of-thought reasoning implementation.

Proven experience with autonomous agent architectures (ReAct, Plan-and-Execute, Reflection patterns).

Knowledge of agent memory systems, tool integration, and environment interaction.

AI-assisted Development Workflow

Coding agent

Cheat sheet

Cheat sheet - Downloading models from Hugging Face

Ensure Git LFS is installed.
```
git lfs install
```

Clone the model repo

# e.g. MODEL_ID=Qwen/Qwen3-8B-GGUF
git clone https://huggingface.co/$MODEL_ID

Cheat sheet - Model reference

An open-source database of AI models (opens in a new tab)

Resources

Performance of llama.cpp on Apple Silicon M-series (opens in a new tab)

Nov, 2023

Framework for LLM use case evaluation

Type of customer need	Example	ML Implementation (Yes/No/Depends)	Type of ML Implementation
Repetitive tasks where a customer needs the same output for the same input	Add my email across various forms online	No	Creating a rules-based system is more than sufficient to help you with your outputs
Repetitive tasks where a customer needs different outputs for the same input	The customer is in “discovery mode” and expects a new experience when taking the same action: - Generate a new artwork per click - StumbleUpon style exploration	Yes	- Image generation LLMs - Recommendation algorithms (collaborative filtering)
Repetitive tasks where a customer needs the same/similar output for different inputs	- Grading essays - Generating themes from customer feedback	Depends	If simple: rules-based system works. If complex combinations: - Classifiers - Topic modelling Use LLMs for patternless or one-off cases
Repetitive tasks where a customer needs different outputs for different inputs	- Answering customer support questions - Search	Yes	Too many permutations for rules-based systems. Consider: - LLMs with retrieval-augmented generation (RAG) - Decision trees for products such as search
Non-repetitive tasks with different outputs	Review of a hotel/restaurant	Yes	Pre-LLMs needed specialized models: - Recurrent neural networks (RNNs) - LSTMs LLMs are a great fit for this type of scenario

Not everything needs an LLM: A framework for evaluating when AI makes sense | VentureBeat (opens in a new tab)

LLMOps

GitHub - tensorchord/Awesome-LLMOps (opens in a new tab)

An awesome & curated list of the best LLMOps tools for developers.

LLMOps - API

Gist
- Most LLM providers have an API compatible with OpenAI API.
Resources
- Common OpenAI API Endpoints (opens in a new tab)
- OpenAI Platform - API Reference (opens in a new tab)

API - Model Provider

Gist
- Local models
Providers
- vLLM
  - Red Hat - What is vLLM? (opens in a new tab)
- KoboldCpp
  - Forked from llama.cpp, optimized for serving LLMs via an API.
  - Kobold API on port 5001 at http://localhost:5001/api/
  - OpenAI Compatible API on port 5001 at http://localhost:5001/v1/
Resources

API - API Gateway

Requirements
- Must be able to integrate custom OpenAI API compatible endpoints.
Providers
- LiteLLM (opens in a new tab)
- TensorZero (opens in a new tab)

Coding agent

Providers
- aider (opens in a new tab) - CLI based, working on all platforms
Workflow
- Create a feature branch
- Create a plan / roadmap in markdown - either manually or chat to LLM to create and flesh it out.
- Then work through the plan, making frequent commits in local feature branch. You can go step by step (my approach, lets me micro manage / write code where its quicker for me than the llm), or just YOLO and ask Sonnet 3.7 / Gemini 2.5 pro to try and one shot (probably not a good idea if you are actually a developer, but the option is there!)
- Commit messages are unimportant, "WIP" is fine, its just the local branch, and enables easy rollbacks.
- You can of course follow changes in the terminal, but using the git window > local changes is often helpful, if you are following the step by step approach.
- When done, squash merge the local feature branch, writing a meaningful, semantic commit message

Concepts

Context length

The maximum number of tokens that an LLM can process in a single input sequence.

In simpler terms, context length acts as the model’s attention span determining how much information it can consider at once when generating responses.

GitHub - NVIDIA/RULER (opens in a new tab)

RULER: What’s the Real Context Size of Your Long-Context Language Models?

Training

Training - Pretraining

Train the model on a large corpus of text (raw text).
Pretraining creates an initial pretrained LLM, called a base or foundation model.

Training - Fine-tuning

Train the model to predict the next word in the text on labeled data, aka fine-tuning.
2 most popular fine-tuning categories
- Instruction fine-tuning
  
  Labeled dataset consists of instruction and answer pairs, such as a query to translate a text accompanied by the correct translation.
- Classification fine-tuning
  
  Labeled dataset consists of text and associated labels, such as emails with spam or not spam labels.

Devcontainer Reading