xsum_6789_5000000_2500000_v1_50topics_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/xsum_6789_5000000_2500000_v1_50topics_train")

topic_model.get_topic_info()
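Beyond the overview table, a single topic can be inspected by its ID. The following is a minimal sketch, assuming `get_topic` returns a list of `(word, score)` pairs for a known topic and `False` for an unknown one, as recent BERTopic releases do. `load_model` and `top_words` are hypothetical helper names, not part of the library.

```python
from typing import List

REPO_ID = "KingKazma/xsum_6789_5000000_2500000_v1_50topics_train"


def load_model(repo_id: str = REPO_ID):
    """Load the model from the Hugging Face Hub (requires bertopic)."""
    # Imported lazily so that top_words() has no hard dependency on bertopic.
    from bertopic import BERTopic
    return BERTopic.load(repo_id)


def top_words(topic_model, topic_id: int, n: int = 5) -> List[str]:
    """Return up to n keywords for a topic, or [] if the ID is unknown."""
    representation = topic_model.get_topic(topic_id)
    if not representation:  # assumption: False is returned for unknown IDs
        return []
    return [word for word, _score in representation[:n]]


# Example usage (downloads the model, so it is left commented out):
# topic_model = load_model()
# print(top_words(topic_model, 15))
```

Keeping the helper independent of the import makes it easy to try against any object that exposes a compatible `get_topic` method.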

Topic overview

  • Number of topics: 50 (including the outlier topic -1)
  • Number of training documents: 204045
Topic ID | Topic Keywords | Topic Frequency | Label
-1 | said - mr - would - one - year | 6 | -1_said_mr_would_one
0 | win - game - half - second - right | 120934 | 0_win_game_half_second
1 | said - would - labour - eu - party | 18917 | 1_said_would_labour_eu
2 | said - police - mr - court - would | 14269 | 2_said_police_mr_court
3 | mr - president - said - government - us | 12871 | 3_mr_president_said_government
4 | bank - company - sale - said - year | 9344 | 4_bank_company_sale_said
5 | fire - said - flood - people - water | 2912 | 5_fire_said_flood_people
6 | airport - flight - space - plane - aircraft | 2642 | 6_airport_flight_space_plane
7 | trump - mr - us - clinton - said | 2601 | 7_trump_mr_us_clinton
8 | energy - oil - rate - growth - price | 2301 | 8_energy_oil_rate_growth
9 | sea - water - ship - said - coastguard | 2183 | 9_sea_water_ship_said
10 | dog - animal - bird - said - zoo | 1899 | 10_dog_animal_bird_said
11 | president - fifa - mr - government - cuba | 1707 | 11_president_fifa_mr_government
12 | film - bbc - actor - star - show | 1665 | 12_film_bbc_actor_star
13 | drug - alcohol - smoking - pollution - health | 1340 | 13_drug_alcohol_smoking_pollution
14 | music - album - song - band - chart | 1193 | 14_music_album_song_band
15 | virus - ebola - health - infection - vaccine | 827 | 15_virus_ebola_health_infection
16 | yn - ar - wedi - ei - bod | 765 | 16_yn_ar_wedi_ei
17 | robot - car - research - technology - science | 560 | 17_robot_car_research_technology
18 | india - indian - delhi - indias - modi | 469 | 18_india_indian_delhi_indias
19 | found - museum - site - fossil - roman | 431 | 19_found_museum_site_fossil
20 | church - bishop - pope - vatican - cardinal | 429 | 20_church_bishop_pope_vatican
21 | australia - australian - mr - nauru - prime | 381 | 21_australia_australian_mr_nauru
22 | unsupported - updated - playback - device - media | 342 | 22_unsupported_updated_playback_device
23 | prince - royal - queen - duchess - duke | 293 | 23_prince_royal_queen_duchess
24 | marriage - gay - law - samesex - transgender | 266 | 24_marriage_gay_law_samesex
25 | updated - bst - gmt - last - 2017 | 235 | 25_updated_bst_gmt_last
26 | food - product - meat - horsemeat - cheese | 232 | 26_food_product_meat_horsemeat
27 | woman - fashion - dress - women - wear | 209 | 27_woman_fashion_dress_women
28 | book - novel - author - prize - writer | 202 | 28_book_novel_author_prize
29 | mountain - avalanche - climber - everest - snow | 201 | 29_mountain_avalanche_climber_everest
30 | trident - defence - submarine - nuclear - army | 180 | 30_trident_defence_submarine_nuclear
31 | christmas - bell - cake - poppy - minster | 152 | 31_christmas_bell_cake_poppy
32 | cosby - clown - constand - mr - cosbys | 144 | 32_cosby_clown_constand_mr
33 | water - utilities - customer - company - said | 125 | 33_water_utilities_customer_company
34 | sleep - suicide - life - people - health | 118 | 34_sleep_suicide_life_people
35 | flag - confederate - white - university - statue | 115 | 35_flag_confederate_white_university
36 | nba - warriors - lakers - cavaliers - game | 109 | 36_nba_warriors_lakers_cavaliers
37 | picture - scotlandpicturesbbccouk - bbcscotlandpics - photo - selection | 98 | 37_picture_scotlandpicturesbbccouk_bbcscotlandpics_photo
38 | flag - emojis - deaf - emoji - language | 89 | 38_flag_emojis_deaf_emoji
39 | ring - diamond - jewellery - carat - jewel | 65 | 39_ring_diamond_jewellery_carat
40 | pokemon - game - go - ai - player | 49 | 40_pokemon_game_go_ai
41 | takata - airbags - recall - honda - airbag | 37 | 41_takata_airbags_recall_honda
42 | follow - - - - | 35 | 42_follow___
43 | depp - joyce - dog - boo - pistol | 29 | 43_depp_joyce_dog_boo
44 | leaguebyleague - list - managerial - below - appear | 26 | 44_leaguebyleague_list_managerial_below
45 | name - top - girls - boys - boy | 17 | 45_name_top_girls_boys
46 | film - potter - scotland - beasts - grindelwald | 15 | 46_film_potter_scotland_beasts
47 | balcony - irish - berkeley - student - donohoe | 10 | 47_balcony_irish_berkeley_student
48 | flower - garden - flowered - botanic - arum | 6 | 48_flower_garden_flowered_botanic
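
The topic frequencies are heavily skewed toward topic 0. As a rough illustration, the sketch below computes the share of the 204045 training documents held by the largest topics, using frequencies copied from the overview above; the `share` helper is a hypothetical name introduced only for this example.

```python
# Frequencies copied from the topic overview (outlier topic -1 and topics 0-4).
TOTAL_DOCUMENTS = 204045
frequencies = {-1: 6, 0: 120934, 1: 18917, 2: 14269, 3: 12871, 4: 9344}


def share(topic_ids, total=TOTAL_DOCUMENTS):
    """Percentage of training documents assigned to the given topics."""
    return 100 * sum(frequencies[t] for t in topic_ids) / total


print(f"Topic 0 alone: {share([0]):.1f}%")       # 59.3%
print(f"Topics 0-4:    {share(range(5)):.1f}%")  # 86.4%
print(f"Outliers (-1): {share([-1]):.3f}%")      # 0.003%
```

With about 59% of documents in a single topic, downstream uses that rely on topic assignments may want to treat topic 0 as a broad catch-all rather than a focused theme.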

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: 50
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
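
For reference, the listed values map directly onto `BERTopic` constructor arguments. The following is a sketch only: the card does not state the embedding model or the UMAP and HDBSCAN settings, so this configuration would not necessarily reproduce the published topics. `build_model` is a hypothetical helper name.

```python
# Hyperparameters as listed in this model card; everything else is left
# at BERTopic's defaults, which is an assumption.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 50,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model():
    """Create an unfitted BERTopic instance with the listed settings."""
    # Imported lazily so the dictionary above can be inspected without bertopic.
    from bertopic import BERTopic
    return BERTopic(**HYPERPARAMETERS)


# Example usage (requires bertopic and a list of documents named docs):
# topic_model = build_model()
# topics, probabilities = topic_model.fit_transform(docs)
```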

Framework versions

  • Numpy: 1.23.5
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.15.0
  • Python: 3.10.12
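
Loading a saved model can be sensitive to library versions, so it may help to compare the local environment against the list above. The sketch below uses only the standard library; the distribution names (for example `umap-learn` for UMAP) are assumed to be the usual PyPI names, and `find_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions from this model card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.15.0",
}


def find_mismatches(expected=EXPECTED, get_version=metadata.version):
    """Map each package whose installed version differs to that version (None if missing)."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


for package, installed in find_mismatches().items():
    print(f"{package}: expected {EXPECTED[package]}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the environment differs from the one used for training.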