Text Generation
Transformers
Safetensors
English
Chinese
llama
conversational
text-generation-inference
Inference Endpoints

Full of refusals

#3
by rombodawg - opened

This model needs work, Either you didnt filter the dataset for refusals, or you did it poorly. I didnt test the base model, but the instruct model cant do basic coding because it acts like "An AI language model isnt capable of doing such a task" Which is the typical AI refusal slop found in most chatgpt generated datasets. Definitely needs retrained before this is at all useful.

infly-ai org

Hi~ Would you mind providing some examples about this issue with reproducible settings?

@Simingh Sure, I used LM studio, running at temp 0, GGUF model Q8_0 version. I used a prompt that is normally used to test modern coding LLM's regularly for their effectiveness.

Prompt:
"Code the classic game "tetris" in python using pygame. Include block falling, stacking, rotating, multiple blocks, and the game ending when the blocks reach the top"

Response:
"Im sorry but as an AI language model I cannot produce and entire tetris game in python..."

It gives this type of response to this and many other prompts Ive given it if they aren't the most simple coding problem. Even Qwen-2.5-3b-instruct will at least attempt to create the code in full without refusing, and its only a 3b parameter model. So this is not acceptable by todays standards.

Hi~I tried to use the same setting on the our original version, but I cannot reproduce this issue. I suspect that this issue is caused by the quantized model. Feel free to verify this on our demo page: https://huggingface.co./spaces/OpenCoder-LLM/OpenCoder-8B-Instruct

I did try the space, although it does attempt to write the code, it does it poorly, and doesnt follow instructions very well. It even says:

"Please note that this is a very basic implementation and does not include all the features you requested."

Dont get me wrong, for a version 1, I think its a good start. But it just needs improvements.

My guess is there is 1 of 2 problems:

  1. The model was trained with enough refusals to be "lazy".

  2. The model wasn't trained with enough data to be able to complete the request, so it leaves out parts of the answer in order to compensate for its lack of knowledge.

So overall, good job for a first coding model, you are already ahead of starcoder and codegemma when they first came out, however I think you can learn and grow from here. Use higher quality models (O1, Claude-3.5, qwen-coder-32b-instruct, ect.) to improve your datasets and continue pretraining. See where things go from there.

If you want to use a model locally I actually have a much improved version of the qwen-coder-32b-instruct that combined the pretrained and finetuned weights to improve its overall performance, you may want to look into it to improve your datasets. Ill link it bellow:

https://huggingface.co./rombodawg/Rombos-Coder-V2.5-Qwen-32b

rombodawg changed discussion status to closed
infly-ai org

Thank you for your kind suggestions! As you said, we will continue to update our dataset to make our model perform better, and more important, perform more practical.

Welcome to follow this project for future updates!

Sign up or log in to comment