
TinySwallow-Stratos-1.5B

A lightweight LLM created by fine-tuning SakanaAI/TinySwallow-1.5B-Instruct on the Stratos-35k dataset.

What is TinySwallow?

A lightweight LLM developed by Sakana AI and the Swallow team. It was built with TAID, a new knowledge-distillation method.
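TAID (Temporally Adaptive Interpolated Distillation) trains the student against a target distribution interpolated between the student's own (detached) output distribution and the teacher's, with the interpolation weight moving toward the teacher over training. Below is a minimal sketch of that interpolated-target loss, assuming a simple scalar `alpha`; the adaptive scheduling from the TAID paper is omitted, and the function name is illustrative, not taken from any released code.

```python
import torch
import torch.nn.functional as F

def taid_loss(student_logits: torch.Tensor,
              teacher_logits: torch.Tensor,
              alpha: float) -> torch.Tensor:
    """Distillation loss against an interpolated target (sketch).

    The target mixes the (detached) student distribution with the
    teacher distribution; `alpha` is scheduled from ~0 toward 1 during
    training, so the target slides gradually from student to teacher.
    """
    p_student = F.softmax(student_logits.detach(), dim=-1)
    p_teacher = F.softmax(teacher_logits, dim=-1)
    p_target = (1.0 - alpha) * p_student + alpha * p_teacher

    log_q = F.log_softmax(student_logits, dim=-1)
    # Forward KL(p_target || q_student); kl_div expects log-probs as input
    return F.kl_div(log_q, p_target, reduction="batchmean")
```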

What is Stratos-35k?

A reasoning dataset released by Bespoke Labs, consisting of questions, reasoning traces, and answers. It was synthesized with the much-discussed DeepSeek-R1.
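Each record in such a dataset pairs a question with a model-generated reasoning trace and a final answer. A hedged sketch of inspecting it with the `datasets` library; the dataset identifier below is an assumption for illustration, so check Bespoke Labs' Hugging Face page for the actual ID:

```python
from datasets import load_dataset

# ASSUMPTION: the Hub ID below is illustrative only; verify the real
# identifier on Bespoke Labs' Hugging Face page before running.
ds = load_dataset("bespokelabs/Bespoke-Stratos-35k", split="train")

# Each record holds a question, a DeepSeek-R1 reasoning trace, and an answer.
print(ds[0])
```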

| Model | Base model | Dataset | Size | Organization |
|---|---|---|---|---|
| Bespoke-Stratos | Qwen2.5-Instruct | Stratos-35k | 32B, 7B | BespokeLabs |
| TinySwallow-Stratos | TinySwallow-Instruct | Stratos-35k | 1.5B | EQUES |

Uses

Intended uses follow those of the original model.
This model is provided for research and development purposes only and should be considered as an experimental prototype. It is not intended for commercial use or deployment in mission-critical environments. Use of this model is at the user's own risk, and its performance and outcomes are not guaranteed. EQUES Inc. shall not be liable for any direct, indirect, special, incidental, or consequential damages, or any loss arising from the use of this model, regardless of the results obtained. Users must fully understand the risks associated with the use of this model and use it at their own discretion.

Output Examples

Prompt:
Give me a short introduction to large language model.

Response:
I'm here to help you with tasks like answering your questions, generating creative content, and providing information in various areas. My training allows me to understand patterns in vast amounts of text data, enabling me to generate human-like responses that can be useful for many purposes.

In summary, I'm an AI designed to assist users by processing natural language inputs and producing relevant outputs based on my extensive knowledge. Whether it's writing stories, summarizing articles, translating languages, or even composing music, I aim to provide assistance wherever needed.
  
If you have any specific question or task in mind, feel free to ask. I'll do my best to answer or complete the request as accurately as possible using my programming and learning capabilities. Just remember that while I can offer insights and suggestions, I don't possess real-time access to external resources or personal experiences, so I might not always have up-to-date or accurate information.
  
Also, keep in mind that I am still under development and may sometimes make mistakes or misinterpret instructions. Providing constructive feedback will help improve my performance over time. Let's get started! If there's anything else I can assist you with, please let me know.
  
(Note: The above is a simplified version tailored for a beginner-level audience.)
  
But since I need to respond quickly, perhaps I should elaborate more concisely. For example:

A large language model (LLM) is essentially an extremely powerful computer program trained to process and generate human-like text. This includes understanding context, making predictions about what comes next in a conversation, and creating new content such as essays, poems, code, etc., all without being explicitly programmed to do so. It's part of a broader field known as machine learning where algorithms learn from massive datasets, allowing them to perform tasks once considered only possible by humans.

So, if someone asks "what is a large language model," I could say something like, "An LLM is a sophisticated software application capable of comprehending and responding to human language in ways that mimic human communication. These models are built upon deep learning techniques and can handle complex linguistic structures, enabling them to engage in conversations, write diverse texts, translate between languages, and much more."

That way, the explanation is clear and covers key points succinctly. Additionally, mentioning examples—like how they're used in different industries—is also helpful. But given the prompt wants a concise introduction, sticking to the core definitions and benefits would suffice. However, depending on the target audience, maybe adding some practical use cases or limitations could enhance clarity

Prompt:
大規模言語モデルについて教えて。 (Tell me about large language models.)

Response (translated):

## What is a large language model (LLM)?

A large language model (LLM: Large Language Model) is an artificial-intelligence system trained on vast amounts of text data.

**Features:**
1. **Broad knowledge**: By learning from large volumes of text, LLMs acquire general knowledge and concepts and can generate a wide range of information.
2. **Natural language processing ability**: Because they both understand and generate natural language, they are well suited to tasks such as question answering, writing, translation, and summarization.
3. **Flexibility**: They adapt to many formats and styles, with uses ranging from code generation to writing poetry.
4. **Evolution**: In recent years, models based on the Transformer architecture have become mainstream, and performance has improved remarkably.

**How they work:**
LLMs are usually based on deep-learning algorithms; in particular, the Transformer neural-network architecture is widely used. This architecture makes it possible to capture complex context and long-range dependencies.

**Representative models:**
- GPT series (OpenAI)
- BERT (Google)
- T5 (Google)

These models have been released publicly, so researchers and developers can use them freely.

**Applications:**

* **Chatbots**: Used to make dialogue more human-like.
* **Automatic content generation**: Useful for producing blog posts, social-media posts, and so on.
* **Software-development support**: Helps with code generation, bug detection, and more.
* **Education**: Used to create teaching materials for students and to support problem solving.

**Challenges:**

* **Bias**: Outputs may reproduce biases or discriminatory expressions present in the training data.
* **Accountability**: There is ongoing debate about transparency and accountability for model outputs.
* **Compute resources**: High-performance hardware is required, which can make them costly.

**Outlook:**

LLMs are rapidly evolving

Sample Usage

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # restrict inference to the first GPU

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EQUES/TinySwallow-Stratos-1.5B"

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "大規模言語モデルについて教えて。"  # "Tell me about large language models."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the messages with the model's chat template and append the
# generation prompt so the model continues as the assistant
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
# Strip the prompt tokens so only the newly generated response remains
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
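
For interactive use, the same model, tokenizer, and inputs can stream the response token by token with transformers' TextStreamer instead of waiting for generate() to return:

```python
from transformers import TextStreamer

# Reuses model, tokenizer, and model_inputs from the snippet above.
# skip_prompt avoids echoing the chat template; tokens print as generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)
```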

License

Apache-2.0

Acknowledgement

  • SakanaAI & Swallow team: development and release of TinySwallow-1.5B
  • BespokeLabs: development and sharing of the training code
  • NovaSkyAI: development and sharing of SkyThought
  • Authors of LlamaFactory