yolo-doclaynet / README.md
metadata
datasets:
  - ds4sd/DocLayNet
language:
  - en
tags:
  - YOLO
  - document-analysis

For more details, refer to the GitHub repository.

Introduction

RAG is very popular these days, and many applications support talking to documents. However, performance drops sharply on complex documents because of their complex structure. Extracting content from a complex document and organizing it into a parsable form is therefore a challenge. This repo aims to solve that challenge with a method that is both fast and accurate.

Detection Sample

image

Method

  1. YOLO, as developed by Ultralytics, is one of the most advanced detection models. It comes in 5 base model sizes and has a powerful framework for training and deployment, so I chose YOLO to solve this challenge.
  2. DocLayNet is a human-annotated document layout segmentation dataset containing 80863 pages from a broad variety of document sources. As far as I know, it's the highest-quality document layout analysis dataset available.

Usage

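from ultralytics import YOLO

# Load the trained weights; replace the placeholder with the path to the model file.
model = YOLO("{path to model file}")
# Run detection; calling the model returns a list with one Results object per image.
pred = model("{path to test image}")
print(pred)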
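Printing the raw prediction is fine for a quick check, but downstream code usually wants plain labels and coordinates. Below is a minimal sketch of one way to flatten a result into dictionaries. The function names `detections_to_records` and `detect_layout` are hypothetical and not part of this repo. The sketch assumes the standard Ultralytics result attributes `boxes.xyxy`, `boxes.cls`, `boxes.conf`, and `names`.

```python
def detections_to_records(result):
    """Flatten one Ultralytics Results object into a list of plain dicts.

    Assumes result.boxes exposes xyxy, cls and conf, and that result.names
    maps class ids to label strings.
    """
    boxes = result.boxes
    records = []
    for xyxy, cls_id, conf in zip(
        boxes.xyxy.tolist(), boxes.cls.tolist(), boxes.conf.tolist()
    ):
        records.append(
            {
                "label": result.names[int(cls_id)],   # e.g. "Table"
                "confidence": float(conf),
                "box": [float(v) for v in xyxy],      # [x1, y1, x2, y2] in pixels
            }
        )
    return records


def detect_layout(model_path, image_path):
    """Hypothetical convenience wrapper: load a model and parse one image."""
    # Imported here so detections_to_records can be used without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(model_path)
    # Calling the model returns one Results object per input image.
    return detections_to_records(model(image_path)[0])
```

Calling `detect_layout` with the paths to a model file and a test image would return one dictionary per detected region, which is easier to serialize or filter than the raw result object.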

Dataset

More details about DocLayNet, along with the download, can be found at this link. It has 11 labels:

  • Text: Regular paragraphs.
  • Picture: A graphic or photograph.
  • Caption: Special text outside a picture or table that introduces this picture or table.
  • Section-header: Any kind of heading in the text, except overall document title.
  • Footnote: Typically small text at the bottom of a page, with a number or symbol that is referred to in the text above.
  • Formula: Mathematical equation on its own line.
  • Table: Material arranged in a grid alignment with rows and columns, often with separator lines.
  • List-item: One element of a list, in a hanging shape, i.e., from the second line onwards the paragraph is indented more than the first line.
  • Page-header: Repeating elements like page number at the top, outside of the normal text flow.
  • Page-footer: Repeating elements like page number at the bottom, outside of the normal text flow.
  • Title: Overall title of a document, (almost) exclusively on the first page and typically appearing in large font.
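When preparing detected regions for a RAG pipeline, the labels above make it easy to drop repeating page furniture and keep only body content. The following is a rough sketch under stated assumptions, not something this repo provides: each detection is a dictionary with a `label` string and a `box` of `[x1, y1, x2, y2]`, and the name `body_regions_in_reading_order` is hypothetical. The top-to-bottom sort is a naive stand-in for real reading-order detection and will not handle multi-column pages correctly.

```python
# Labels described above as repeating elements outside the normal text flow.
PAGE_FURNITURE = {"Page-header", "Page-footer"}


def body_regions_in_reading_order(records, drop=PAGE_FURNITURE):
    """Drop unwanted labels and sort the rest top-to-bottom, then left-to-right.

    records: iterable of dicts with "label" (str) and "box" ([x1, y1, x2, y2]).
    """
    kept = [r for r in records if r["label"] not in drop]
    # Sort by the top edge (y1) first, then by the left edge (x1).
    return sorted(kept, key=lambda r: (r["box"][1], r["box"][0]))


if __name__ == "__main__":
    sample = [
        {"label": "Page-footer", "box": [50, 980, 550, 995]},
        {"label": "Text", "box": [50, 200, 550, 400]},
        {"label": "Title", "box": [50, 40, 550, 90]},
        {"label": "Page-header", "box": [50, 5, 550, 20]},
    ]
    for region in body_regions_in_reading_order(sample):
        print(region["label"], region["box"])
```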