xsum_22457_3000_1500_validation

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/xsum_22457_3000_1500_validation")

topic_model.get_topic_info()
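
Once the model is loaded, it can also assign topics to new text. The following is a minimal sketch, not part of the original card: the helper name `describe_predictions` and the sample sentence are hypothetical, and it relies only on BERTopic's `transform` and `get_topic` methods. It assumes an embedding model is available at inference time, which this card does not state.

```python
def describe_predictions(topic_model, docs, top_n=5):
    """Assign each document to a topic and list that topic's top keywords.

    `topic_model` is expected to be a loaded BERTopic model (hypothetical helper).
    """
    # transform returns one topic id per document plus optional probabilities
    topics, _probabilities = topic_model.transform(docs)
    results = []
    for doc, topic_id in zip(docs, topics):
        topic_id = int(topic_id)
        # get_topic yields (word, score) pairs, or a falsy value for an unknown topic
        words = topic_model.get_topic(topic_id) or []
        keywords = [word for word, _score in words[:top_n]]
        results.append((doc, topic_id, keywords))
    return results


# Usage with the model loaded as shown above (downloads from the Hub):
# from bertopic import BERTopic
# topic_model = BERTopic.load("KingKazma/xsum_22457_3000_1500_validation")
# sample = ["The court heard that police arrested the man."]  # hypothetical input
# for doc, topic_id, keywords in describe_predictions(topic_model, sample):
#     print(topic_id, keywords)
```

Topic -1 is BERTopic's outlier topic, so a prediction of -1 means the document was not placed in any of the other topics.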

Topic overview

  • Number of topics: 26
  • Number of training documents: 1500
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - people - would - one - year | 5 | -1_said_people_would_one |
| 0 | said - police - court - mr - heard | 646 | 0_said_police_court_mr |
| 1 | labour - party - mr - scotland - vote | 242 | 1_labour_party_mr_scotland |
| 2 | race - olympic - gold - team - medal | 56 | 2_race_olympic_gold_team |
| 3 | president - un - mr - south - said | 51 | 3_president_un_mr_south |
| 4 | united - foul - half - kick - win | 48 | 4_united_foul_half_kick |
| 5 | price - bank - rose - share - said | 44 | 5_price_bank_rose_share |
| 6 | attack - taliban - militant - killed - said | 41 | 6_attack_taliban_militant_killed |
| 7 | care - health - nhs - hospital - patient | 32 | 7_care_health_nhs_hospital |
| 8 | england - cricket - wicket - test - ball | 27 | 8_england_cricket_wicket_test |
| 9 | specie - tiger - bird - said - breeding | 27 | 9_specie_tiger_bird_said |
| 10 | rugby - wales - player - coach - world | 27 | 10_rugby_wales_player_coach |
| 11 | celtic - league - season - game - rangers | 26 | 11_celtic_league_season_game |
| 12 | album - music - song - show - singer | 26 | 12_album_music_song_show |
| 13 | open - round - world - play - american | 25 | 13_open_round_world_play |
| 14 | school - education - schools - said - child | 24 | 14_school_education_schools_said |
| 15 | film - best - actor - star - actress | 21 | 15_film_best_actor_star |
| 16 | eu - uk - brexit - trade - would | 21 | 16_eu_uk_brexit_trade |
| 17 | data - us - internet - said - information | 21 | 17_data_us_internet_said |
| 18 | league - transfer - season - club - appearance | 20 | 18_league_transfer_season_club |
| 19 | parking - council - said - road - ringgo | 19 | 19_parking_council_said_road |
| 20 | trump - mr - clinton - republican - president | 15 | 20_trump_mr_clinton_republican |
| 21 | water - supply - affected - flooding - customer | 12 | 21_water_supply_affected_flooding |
| 22 | fifa - corruption - scala - also - president | 12 | 22_fifa_corruption_scala_also |
| 23 | testimonial - match - tevez - united - player | 6 | 23_testimonial_match_tevez_united |
| 24 | hiv - outbreak - disease - kong - hong | 6 | 24_hiv_outbreak_disease_kong |
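
Each label in the listing above appears to combine the topic ID with the first four keywords, joined by underscores, and the frequencies add up to the 1500 training documents. The sketch below illustrates how such a label could be split back into its parts and how a frequency translates into a share of the corpus. The function names are hypothetical, and the label pattern is inferred from the rows shown here rather than from any documented guarantee.

```python
TOTAL_DOCUMENTS = 1500  # "Number of training documents" from this card


def parse_label(label: str) -> tuple[int, list[str]]:
    """Split a label such as '0_said_police_court_mr' into (topic_id, keywords)."""
    # Split only on the first underscore so the outlier ID '-1' stays intact
    topic_id, _, rest = label.partition("_")
    return int(topic_id), rest.split("_")


def corpus_share(frequency: int, total: int = TOTAL_DOCUMENTS) -> float:
    """Return the percentage of training documents assigned to a topic."""
    return 100.0 * frequency / total


topic_id, keywords = parse_label("0_said_police_court_mr")
print(topic_id, keywords, f"{corpus_share(646):.1f}%")
```

By this measure, topic 0 alone covers roughly 43% of the training documents, while the smallest topics (23 and 24) hold 6 documents each.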

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
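
For reference, the settings above map directly onto keyword arguments of the `BERTopic` constructor. The following is a sketch only: the names `HYPERPARAMETERS` and `build_topic_model` are hypothetical, and because this card does not list the embedding model or the UMAP and HDBSCAN settings, the sketch falls back on library defaults and is not guaranteed to reproduce the topics listed here.

```python
# Hyperparameters copied from this card
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_topic_model():
    """Create an untrained BERTopic model with the settings listed on this card."""
    # Imported lazily so the settings can be inspected without bertopic installed
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)


# Usage (assumes `docs` is a list of strings supplied by the caller):
# topic_model = build_topic_model()
# topics, probabilities = topic_model.fit_transform(docs)
```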

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
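
Loading a saved model in an environment with different library versions can sometimes cause problems, so it may help to compare the installed packages against the list above. This is a minimal sketch using only the standard library. The helper name `version_report` is hypothetical, and the distribution names (for example `umap-learn` for UMAP) are the PyPI names assumed here, not something this card specifies. The Python version itself is not checked.

```python
from importlib import metadata

# Versions from this card, keyed by assumed PyPI distribution name
EXPECTED_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def version_report(expected=EXPECTED_VERSIONS):
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, matches) in version_report().items():
    status = "ok" if matches else "differs"
    print(f"{name}: expected {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the report only shows where the environment differs from the one used for training.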