xsum_6789_5000000_2500000_v1_50topics_train
This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
Usage
To use this model, please install BERTopic:
pip install -U bertopic
You can use the model as follows:
from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/xsum_6789_5000000_2500000_v1_50topics_train")
topic_model.get_topic_info()
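Beyond inspecting the topic table, a loaded model can assign topics to new documents with its `transform` method. The following is a minimal sketch: `assign_topics` is an illustrative helper, not part of BERTopic or of this model card, and it assumes the object passed in is a loaded BERTopic model whose embedding backend (for example `sentence-transformers`) is installed.

```python
def assign_topics(topic_model, docs):
    """Return (document, topic_id, label) triples for each input document.

    `topic_model` is assumed to be a loaded BERTopic model; only its
    `transform` and `get_topic_info` methods are used here.
    """
    # transform returns the predicted topic IDs and, optionally, probabilities.
    topics, _probs = topic_model.transform(docs)

    # get_topic_info exposes the topic ID in "Topic" and the label in "Name".
    info = topic_model.get_topic_info()
    labels = dict(zip(info["Topic"], info["Name"]))

    return [
        (doc, int(topic), labels.get(int(topic), "unknown"))
        for doc, topic in zip(docs, topics)
    ]
```

A call might look like `assign_topics(BERTopic.load("KingKazma/xsum_6789_5000000_2500000_v1_50topics_train"), ["Some news article text"])`. Documents assigned to topic -1 are treated by BERTopic as outliers.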
Topic overview
- Number of topics: 50 (topic IDs -1 to 48 in the table below; -1 is BERTopic's outlier topic)
- Number of training documents: 204045
Topic ID | Topic Keywords | Topic Frequency | Label |
---|---|---|---|
-1 | said - mr - would - one - year | 6 | -1_said_mr_would_one |
0 | win - game - half - second - right | 120934 | 0_win_game_half_second |
1 | said - would - labour - eu - party | 18917 | 1_said_would_labour_eu |
2 | said - police - mr - court - would | 14269 | 2_said_police_mr_court |
3 | mr - president - said - government - us | 12871 | 3_mr_president_said_government |
4 | bank - company - sale - said - year | 9344 | 4_bank_company_sale_said |
5 | fire - said - flood - people - water | 2912 | 5_fire_said_flood_people |
6 | airport - flight - space - plane - aircraft | 2642 | 6_airport_flight_space_plane |
7 | trump - mr - us - clinton - said | 2601 | 7_trump_mr_us_clinton |
8 | energy - oil - rate - growth - price | 2301 | 8_energy_oil_rate_growth |
9 | sea - water - ship - said - coastguard | 2183 | 9_sea_water_ship_said |
10 | dog - animal - bird - said - zoo | 1899 | 10_dog_animal_bird_said |
11 | president - fifa - mr - government - cuba | 1707 | 11_president_fifa_mr_government |
12 | film - bbc - actor - star - show | 1665 | 12_film_bbc_actor_star |
13 | drug - alcohol - smoking - pollution - health | 1340 | 13_drug_alcohol_smoking_pollution |
14 | music - album - song - band - chart | 1193 | 14_music_album_song_band |
15 | virus - ebola - health - infection - vaccine | 827 | 15_virus_ebola_health_infection |
16 | yn - ar - wedi - ei - bod | 765 | 16_yn_ar_wedi_ei |
17 | robot - car - research - technology - science | 560 | 17_robot_car_research_technology |
18 | india - indian - delhi - indias - modi | 469 | 18_india_indian_delhi_indias |
19 | found - museum - site - fossil - roman | 431 | 19_found_museum_site_fossil |
20 | church - bishop - pope - vatican - cardinal | 429 | 20_church_bishop_pope_vatican |
21 | australia - australian - mr - nauru - prime | 381 | 21_australia_australian_mr_nauru |
22 | unsupported - updated - playback - device - media | 342 | 22_unsupported_updated_playback_device |
23 | prince - royal - queen - duchess - duke | 293 | 23_prince_royal_queen_duchess |
24 | marriage - gay - law - samesex - transgender | 266 | 24_marriage_gay_law_samesex |
25 | updated - bst - gmt - last - 2017 | 235 | 25_updated_bst_gmt_last |
26 | food - product - meat - horsemeat - cheese | 232 | 26_food_product_meat_horsemeat |
27 | woman - fashion - dress - women - wear | 209 | 27_woman_fashion_dress_women |
28 | book - novel - author - prize - writer | 202 | 28_book_novel_author_prize |
29 | mountain - avalanche - climber - everest - snow | 201 | 29_mountain_avalanche_climber_everest |
30 | trident - defence - submarine - nuclear - army | 180 | 30_trident_defence_submarine_nuclear |
31 | christmas - bell - cake - poppy - minster | 152 | 31_christmas_bell_cake_poppy |
32 | cosby - clown - constand - mr - cosbys | 144 | 32_cosby_clown_constand_mr |
33 | water - utilities - customer - company - said | 125 | 33_water_utilities_customer_company |
34 | sleep - suicide - life - people - health | 118 | 34_sleep_suicide_life_people |
35 | flag - confederate - white - university - statue | 115 | 35_flag_confederate_white_university |
36 | nba - warriors - lakers - cavaliers - game | 109 | 36_nba_warriors_lakers_cavaliers |
37 | picture - scotlandpicturesbbccouk - bbcscotlandpics - photo - selection | 98 | 37_picture_scotlandpicturesbbccouk_bbcscotlandpics_photo |
38 | flag - emojis - deaf - emoji - language | 89 | 38_flag_emojis_deaf_emoji |
39 | ring - diamond - jewellery - carat - jewel | 65 | 39_ring_diamond_jewellery_carat |
40 | pokemon - game - go - ai - player | 49 | 40_pokemon_game_go_ai |
41 | takata - airbags - recall - honda - airbag | 37 | 41_takata_airbags_recall_honda |
42 | follow - - - - | 35 | 42_follow___ |
43 | depp - joyce - dog - boo - pistol | 29 | 43_depp_joyce_dog_boo |
44 | leaguebyleague - list - managerial - below - appear | 26 | 44_leaguebyleague_list_managerial_below |
45 | name - top - girls - boys - boy | 17 | 45_name_top_girls_boys |
46 | film - potter - scotland - beasts - grindelwald | 15 | 46_film_potter_scotland_beasts |
47 | balcony - irish - berkeley - student - donohoe | 10 | 47_balcony_irish_berkeley_student |
48 | flower - garden - flowered - botanic - arum | 6 | 48_flower_garden_flowered_botanic |
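The labels in the table appear to follow the pattern `<topic id>_<keyword>_<keyword>_...`, and the frequencies sum to the 204045 training documents. As a rough sketch under that assumption, the helpers below (`parse_label` and `share` are hypothetical names introduced here) split a label into its parts and express a topic's frequency as a fraction of the training set.

```python
TRAINING_DOCUMENTS = 204045  # "Number of training documents" from the overview


def parse_label(label):
    """Split a label such as '0_win_game_half_second' into (topic_id, keywords)."""
    topic_id, *words = label.split("_")
    # Blank keyword slots, as in '42_follow___', produce empty strings; drop them.
    return int(topic_id), [word for word in words if word]


def share(frequency, total=TRAINING_DOCUMENTS):
    """Fraction of the training documents assigned to a topic."""
    return frequency / total


print(parse_label("-1_said_mr_would_one"))  # (-1, ['said', 'mr', 'would', 'one'])
print(parse_label("42_follow___"))          # (42, ['follow'])
print(f"{share(120934):.1%}")               # 59.3%
```

The last line shows that topic 0 alone covers well over half of the training documents, which is worth keeping in mind when interpreting the smaller topics.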
Training hyperparameters
- calculate_probabilities: True
- language: english
- low_memory: False
- min_topic_size: 10
- n_gram_range: (1, 1)
- nr_topics: 50
- seed_topic_list: None
- top_n_words: 10
- verbose: False
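The hyperparameters above correspond to keyword arguments of the `BERTopic` constructor. The sketch below collects them in a dictionary and shows how an unfitted model could be created from it; `HYPERPARAMETERS` and `build_untrained_model` are illustrative names. The embedding model, UMAP and HDBSCAN settings, and any preprocessing are not listed in this card, so this should not be expected to reproduce the published topics.

```python
# Settings copied from the "Training hyperparameters" list.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 50,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_untrained_model():
    """Create a fresh, unfitted BERTopic instance with the listed settings."""
    # Imported lazily so the dictionary can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

Calling `build_untrained_model().fit_transform(docs)` on a list of documents would then train a new model with these settings and library defaults for everything else.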
Framework versions
- Numpy: 1.23.5
- HDBSCAN: 0.8.33
- UMAP: 0.5.3
- Pandas: 1.5.3
- Scikit-Learn: 1.2.2
- Sentence-transformers: 2.2.2
- Transformers: 4.31.0
- Numba: 0.57.1
- Plotly: 5.15.0
- Python: 3.10.12
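Loading a saved topic model can be sensitive to library versions, so it may help to compare the local environment against the list above. This is a minimal sketch using the standard library's `importlib.metadata`; the mapping from the names in the list to PyPI distribution names (for example `UMAP` to `umap-learn`) is an assumption, and `find_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions from "Framework versions", keyed by assumed PyPI distribution name.
PINNED = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.15.0",
}


def find_mismatches(pinned=PINNED, get_version=metadata.version):
    """Return {name: (wanted, installed)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

An empty dictionary from `find_mismatches()` means every listed package matches; a differing version does not necessarily prevent the model from loading.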