xsum_108_3000_1500_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

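from bertopic import BERTopic

# Download the fitted model from the Hugging Face Hub and load it
topic_model = BERTopic.load("KingKazma/xsum_108_3000_1500_train")

# Returns a pandas DataFrame with one row per topic (ID, document count, name, ...)
topic_model.get_topic_info()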
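Once loaded, the model can also assign topics to new text. The sketch below is a minimal example. It assumes the standard BERTopic methods `transform`, which returns the predicted topic IDs plus probabilities, and `get_topic`, which returns a topic's `(word, score)` pairs or `False` for an unknown ID. The helper name `label_documents` is hypothetical and is not part of BERTopic.

```python
from typing import Any, Sequence


def label_documents(topic_model: Any, docs: Sequence[str], n_words: int = 5):
    """Return (document, topic ID, top keywords) for each new document.

    Hypothetical helper: `topic_model` is assumed to be a fitted BERTopic
    model, e.g. one loaded with BERTopic.load(...).
    """
    # transform() embeds the documents and predicts one topic per document;
    # the second return value holds probabilities, which are not used here.
    topics, _ = topic_model.transform(list(docs))
    results = []
    for doc, topic_id in zip(docs, topics):
        # get_topic() returns [(word, score), ...], or False for unknown IDs.
        words = topic_model.get_topic(int(topic_id)) or []
        results.append((doc, int(topic_id), [word for word, _ in words[:n_words]]))
    return results


# Example (requires the model loaded as above and network access):
# for doc, topic_id, words in label_documents(topic_model, ["The striker scored twice."]):
#     print(topic_id, words)
```

The helper only reads from the model, so it can be tested with any object that provides the same two methods.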

Topic overview

  • Number of topics: 32 (including the outlier topic, -1)
  • Number of training documents: 3000
| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | said - mr - people - would - also | 7 | -1_said_mr_people_would |
| 0 | win - game - right - goal - shot | 841 | 0_win_game_right_goal |
| 1 | police - said - court - mr - told | 815 | 1_police_said_court_mr |
| 2 | party - labour - mr - election - vote | 438 | 2_party_labour_mr_election |
| 3 | care - nhs - patient - health - cancer | 111 | 3_care_nhs_patient_health |
| 4 | rate - bank - growth - market - price | 77 | 4_rate_bank_growth_market |
| 5 | film - song - show - story - one | 76 | 5_film_song_show_story |
| 6 | school - education - student - teacher - child | 71 | 6_school_education_student_teacher |
| 7 | syria - syrian - said - killed - force | 46 | 7_syria_syrian_said_killed |
| 8 | trump - mr - clinton - russian - campaign | 45 | 8_trump_mr_clinton_russian |
| 9 | rescue - helicopter - ship - search - crew | 37 | 9_rescue_helicopter_ship_search |
| 10 | google - apple - mobile - said - company | 37 | 10_google_apple_mobile_said |
| 11 | fire - torch - building - burner - blaze | 35 | 11_fire_torch_building_burner |
| 12 | museum - coin - art - museums - work | 32 | 12_museum_coin_art_museums |
| 13 | rail - train - network - service - passenger | 32 | 13_rail_train_network_service |
| 14 | energy - gas - coal - fracking - industry | 26 | 14_energy_gas_coal_fracking |
| 15 | wales - welsh - assembly - uk - government | 25 | 15_wales_welsh_assembly_uk |
| 16 | facebook - company - social - said - site | 24 | 16_facebook_company_social_said |
| 17 | president - maduro - mr - macri - venezuelan | 23 | 17_president_maduro_mr_macri |
| 18 | president - mr - crocodile - boko - haram | 22 | 18_president_mr_crocodile_boko |
| 19 | union - strike - rmt - staff - said | 21 | 19_union_strike_rmt_staff |
| 20 | earthquake - quake - kathmandu - people - nepal | 20 | 20_earthquake_quake_kathmandu_people |
| 21 | migrant - asylum - le - pen - hungary | 18 | 21_migrant_asylum_le_pen |
| 22 | virus - disease - health - ebola - malaria | 18 | 22_virus_disease_health_ebola |
| 23 | cat - animal - rspca - dog - said | 17 | 23_cat_animal_rspca_dog |
| 24 | species - forest - frog - specie - tree | 16 | 24_species_forest_frog_specie |
| 25 | space - earth - surface - mars - mission | 15 | 25_space_earth_surface_mars |
| 26 | site - council - centre - pool - plan | 14 | 26_site_council_centre_pool |
| 27 | mr - gandhi - minister - indias - state | 13 | 27_mr_gandhi_minister_indias |
| 28 | plaque - memorial - died - war - akikusa | 12 | 28_plaque_memorial_died_war |
| 29 | korea - north - missile - china - us | 8 | 29_korea_north_missile_china |
| 30 | tax - rate - 50p - budget - chancellor | 8 | 30_tax_rate_50p_budget |
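Two properties of the table can be checked directly:

* Every Label is the topic ID followed by that topic's first four keywords, joined with underscores.
* The Topic Frequency column (documents per topic) sums to 3000, the number of training documents.

Topic -1 is BERTopic's outlier topic, which holds documents that were not assigned to any cluster. The sketch below reproduces both checks from values copied out of the table. `make_label` is an illustrative helper, not a BERTopic function.

```python
# Topic Frequency column from the table, in order from topic -1 to topic 30.
FREQUENCIES = [
    7, 841, 815, 438, 111, 77, 76, 71, 46, 45, 37, 37, 35, 32, 32, 26,
    25, 24, 23, 22, 21, 20, 18, 18, 17, 16, 15, 14, 13, 12, 8, 8,
]


def make_label(topic_id: int, keywords: list) -> str:
    """Illustrative helper: topic ID plus the first four keywords, joined by '_'."""
    return "_".join([str(topic_id), *keywords[:4]])


# A few rows copied from the table: (topic ID, keywords, label).
SAMPLE_ROWS = [
    (-1, ["said", "mr", "people", "would", "also"], "-1_said_mr_people_would"),
    (3, ["care", "nhs", "patient", "health", "cancer"], "3_care_nhs_patient_health"),
    (30, ["tax", "rate", "50p", "budget", "chancellor"], "30_tax_rate_50p_budget"),
]

for topic_id, keywords, label in SAMPLE_ROWS:
    assert make_label(topic_id, keywords) == label

total = sum(FREQUENCIES)
# Number of topics, total documents, and the share in the outlier topic -1
print(len(FREQUENCIES), total, f"{FREQUENCIES[0] / total:.2%}")  # prints: 32 3000 0.23%
```

The two largest topics (0 and 1) together account for 1656 of the 3000 documents.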

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
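Each listed value corresponds to a keyword argument of the `BERTopic` constructor. The sketch below configures a new, unfitted model with the same settings.

This card does not record the embedding model, the UMAP and HDBSCAN settings, or a random seed. Re-fitting on the same data is therefore not expected to reproduce the topics above exactly. `build_model` is a hypothetical helper name.

```python
# Hyperparameters as listed on this model card; each key is a keyword
# argument accepted by the BERTopic constructor.
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model():
    """Hypothetical helper: an unfitted BERTopic model with the card's settings."""
    # Imported lazily so this file can be loaded without BERTopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)


# Usage (requires `pip install -U bertopic` and a list of strings `docs`):
# topic_model = build_model()
# topics, probs = topic_model.fit_transform(docs)
```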

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
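A saved model may fail to load, or behave differently, in an environment that differs from the one it was trained in. It can therefore help to compare a local setup against the versions listed.

The sketch below uses only the standard library's `importlib.metadata`. Two things in it are assumptions: the mapping from the names above to PyPI distribution names (for example UMAP to `umap-learn`), and `find_mismatches`, which is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Versions listed on this card, keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def find_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], str] = version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs to that version (None if missing)."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    for package, installed in find_mismatches(EXPECTED).items():
        print(f"{package}: card lists {EXPECTED[package]}, installed {installed}")
```

The `lookup` parameter defaults to `importlib.metadata.version`. It can be replaced with a stub so the comparison logic can be tested without the packages installed.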