Spaces: ml6team/README

Open source CC-BY dataset and classifier?

by burtenshaw (HF staff), opened 20 days ago

Discussion

burtenshaw

20 days ago · edited 20 days ago

Hey ML6,

Great work on this blog post. Are the dataset and/or classifier available? It would be great to use them in community projects.

cc @nielsr @BramVanroy

nielsr

ML6 Team org 20 days ago

Hi,

Yes, the dataset is available here: https://huggingface.co/datasets/fondant-ai/fondant-cc-25m

BramVanroy

20 days ago

@nielsr Is the algorithm available somewhere? This blog post roughly describes it (https://blog.ml6.eu/ai-image-generation-without-copyright-infringement-a9901b64541c), but seeing an implementation would be helpful.

nielsr

ML6 Team org 20 days ago

The code is here: https://github.com/ml6team/fondant-usecase-filter-creative-commons

BramVanroy

20 days ago

@nielsr Unless I am missing it, that's just the download code, not the processing code that was used to identify copyrighted material.

RobinVC

15 days ago · edited 15 days ago

@BramVanroy, is @burtenshaw part of your team? Can you send me your email addresses through LinkedIn? We might be able to share the code with you. (My name is Robin Van Craenenbroek.)

BramVanroy

15 days ago

@RobinVC I think @burtenshaw works at Argilla!

RobinVC

15 days ago · edited 15 days ago

@burtenshaw and @BramVanroy, we will refactor our code and make the source code publicly available on the ml6team GitHub page. This will take a bit of time (probably 1-2 weeks at most), but I will keep you posted once we release it.

BramVanroy

15 days ago

Awesome, thanks!

RobinVC

13 days ago · edited 13 days ago

@BramVanroy, @burtenshaw: we published a branch containing the dataset extraction logic on our GitHub page: https://github.com/ml6team/fondant-usecase-filter-creative-commons/tree/add-fondant-usecase-cc-image-extraction. The code has not been fully cleaned up or documented yet, but you can already use it as inspiration. The branch will be merged into main once it is presentable. If you are interested, the dataset extraction logic is in the image_extraction folder. Hope this helps!
