YAML Metadata
Warning:
empty or missing yaml metadata in repo card
(https://huggingface.co./docs/hub/model-cards#model-card-metadata)
Content Classification LoRA Adapter for Gemma-2B
A LoRA adapter for unsloth/gemma-2b that determines content indexing suitability using chain-of-thought reasoning.
Used in a pipeline.
Technical Specifications
Base Model
- Model: unsloth/gemma-2b
- LoRA Rank: 64
- Target Modules: q_proj, up_proj, down_proj, gate_proj, o_proj, k_proj, v_proj
- Task: CAUSAL_LM
- Dropout: 0
- Alpha: 32
Input/Output Format
Input XML structure:
<instruction>Determine true or false if the following content is suitable and should be indexed.</instruction>
<suitable>
<content>{input_text}</content>
Output XML structure:
<thinking>{reasoning_process}</thinking>
<category>{content_type}</category>
<should_index>{true|false}</should_index>
</suitable>
The model then expects an indefinite list of <suitable> ... </suitable>
that you may not want. But you can use this to do fewshots with incontext learning to correct a mistake or enhance the results.
Your stop token should be </suitable>
.
Deployment
VLLM Server Setup
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=1
export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
vllm serve unsloth/gemma-2-2b \
--gpu-memory-utilization=1 \
--port 6002 \
--served-model-name="gemma" \
--trust-remote-code \
--max-model-len 8192 \
--disable-log-requests \
--enable-lora \
--lora-modules lora=./dataset/output/unsloth/lora_model \
--max-lora-rank 64
Processing Pipeline
- Install Dependencies:
pip install requests tqdm concurrent.futures
- Run Content Processor:
python process.py --input corpus.jsonl --output results.jsonl --threads 24
Client Implementation
import requests
def classify_content(text: str, vllm_url: str = "http://localhost:6002/v1/completions") -> dict:
xml_content = (
'<instruction>Determine true or false if the following content is '
'suitable and should be indexed.</instruction>\n'
'<suitable>\n'
f' <content>{text}</content>'
)
response = requests.post(
vllm_url,
json={
"prompt": xml_content,
"max_tokens": 6000,
"temperature": 1,
"model": "lora",
"stop": ["</suitable>"]
},
timeout=30000
)
completion = response.json()["choices"][0]["text"]
# Parse XML tags
import re
def extract_tag(tag: str) -> str:
match = re.search(f'<{tag}>(.*?)</{tag}>', completion, re.DOTALL)
return match.group(1).strip() if match else ""
return {
"thinking": extract_tag("thinking"),
"category": extract_tag("category"),
"should_index": extract_tag("should_index")
}
Example Usage
text = """Multiservice Tactics, Techniques, and Procedures
for
Nuclear, Biological, and Chemical Aspects of Consequence
Management
TABLE OF CONTENTS..."""
result = classify_content(text)
print(result)
Example output:
{
"thinking": "This is a table of contents for a document, not the actual content.",
"category": "table of contents",
"should_index": "false"
}
Batch Processing
The included processor supports parallel processing of JSONL files:
from request_processor import RequestProcessor
processor = RequestProcessor(
input_file="corpus.jsonl",
output_file="results.jsonl",
num_threads=24
)
processor.process_file()
Input JSONL format:
{
"pid": "document_id",
"docid": "path/to/source",
"content": "document text",
"metadata": {
"key": "value"
}
}
Output JSONL format:
{
"pid": "document_id",
"docid": "path/to/source",
"content": "document text",
"metadata": {
"key": "value"
},
"thinking": "reasoning process",
"category": "content type",
"should_index": "true/false",
"processed_at": "2024-10-22 02:52:33"
}
Implementation and Performance Considerations
- Use thread pooling for parallel processing
- Implement atomic writes with file locking
- Progress tracking with tqdm
- Automatic error handling and logging
- Configurable thread count for optimization
Error Handling
Errors are captured in the output JSONL:
{
"error": "error message",
"processed_at": "timestamp"
}
Monitor errors in real-time:
tail -f results.jsonl | grep error