cnn_dailymail_6789_200000_100000_v1_50topics_train
This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
Usage
To use this model, please install BERTopic:
pip install -U bertopic
You can use the model as follows:
from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_6789_200000_100000_v1_50topics_train")
topic_model.get_topic_info()
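Once the model is loaded, two common next steps are reading the keywords of a single topic and assigning topics to new documents. The following is a minimal sketch, not part of the original card: the helper names `top_words` and `assign_topics` are hypothetical, and it assumes the loaded model exposes the standard BERTopic `get_topic` and `transform` methods and ships with a usable embedding model.

```python
def top_words(topic_model, topic_id, n=5):
    """Return up to n keywords for one topic, or [] if the topic is unknown."""
    # get_topic returns a list of (word, score) pairs, or False for a missing topic
    words = topic_model.get_topic(topic_id)
    if not words:
        return []
    return [word for word, _score in words[:n]]


def assign_topics(topic_model, docs):
    """Pair each document with the topic ID that the model predicts for it."""
    # transform returns (topics, probabilities); probabilities are ignored here
    topics, _probs = topic_model.transform(docs)
    return list(zip(docs, topics))


# Usage (not executed here, because loading downloads the model from the Hub):
# from bertopic import BERTopic
# topic_model = BERTopic.load("KingKazma/cnn_dailymail_6789_200000_100000_v1_50topics_train")
# print(top_words(topic_model, 12))
# print(assign_topics(topic_model, ["NASA announced a new Mars rover mission."]))
```

Topic ID -1 is BERTopic's outlier bucket, so a prediction of -1 means the document was not assigned to any of the learned topics.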
Topic overview
- Number of topics: 50 (including the outlier topic -1)
- Number of training documents: 200000
Overview of all topics:
Topic ID | Topic Keywords | Topic Frequency | Label |
---|---|---|---|
-1 | said - one - year - people - would | 5 | -1_said_one_year_people |
0 | league - player - game - team - cup | 104194 | 0_league_player_game_team |
1 | said - police - told - court - family | 27178 | 1_said_police_told_court |
2 | said - government - us - military - president | 16379 | 2_said_government_us_military |
3 | car - said - flight - fire - plane | 12476 | 3_car_said_flight_fire |
4 | per - cent - said - year - school | 6776 | 4_per_cent_said_year |
5 | obama - president - said - state - republican | 4128 | 5_obama_president_said_state |
6 | film - show - movie - cosby - the | 3747 | 6_film_show_movie_cosby |
7 | said - mexico - mexican - government - border | 2967 | 7_said_mexico_mexican_government |
8 | dog - animal - cat - zoo - pet | 2258 | 8_dog_animal_cat_zoo |
9 | fashion - weight - art - painting - dress | 2176 | 9_fashion_weight_art_painting |
10 | apple - user - iphone - google - facebook | 2139 | 10_apple_user_iphone_google |
11 | food - energy - climate - per - gas | 1861 | 11_food_energy_climate_per |
12 | ebola - virus - health - disease - outbreak | 1846 | 12_ebola_virus_health_disease |
13 | war - soldier - british - mr - said | 1693 | 13_war_soldier_british_mr |
14 | shark - whale - ship - oil - water | 1686 | 14_shark_whale_ship_oil |
15 | cancer - drug - marijuana - smoking - study | 1576 | 15_cancer_drug_marijuana_smoking |
16 | space - earth - planet - mars - nasa | 1361 | 16_space_earth_planet_mars |
17 | prince - royal - queen - duchess - princess | 1230 | 17_prince_royal_queen_duchess |
18 | ancient - found - site - archaeologist - discovered | 769 | 18_ancient_found_site_archaeologist |
19 | pope - vatican - church - francis - cardinal | 605 | 19_pope_vatican_church_francis |
20 | lottery - ticket - jackpot - million - winning | 604 | 20_lottery_ticket_jackpot_million |
21 | game - robot - console - xbox - 3d | 494 | 21_game_robot_console_xbox |
22 | park - hotel - island - beach - resort | 428 | 22_park_hotel_island_beach |
23 | hollande - sarkozy - trierweiler - french - francois | 354 | 23_hollande_sarkozy_trierweiler_french |
24 | teeth - eye - hand - ear - surgery | 180 | 24_teeth_eye_hand_ear |
25 | kyle - routh - sniper - littlefield - gun | 137 | 25_kyle_routh_sniper_littlefield |
26 | country - population - corruption - per - city | 121 | 26_country_population_corruption_per |
27 | dubai - hajj - pilgrim - mecca - mme | 88 | 27_dubai_hajj_pilgrim_mecca |
28 | ballet - filin - bolshoi - dancer - dmitrichenko | 66 | 28_ballet_filin_bolshoi_dancer |
29 | oldest - age - guinness - worlds - dangi | 50 | 29_oldest_age_guinness_worlds |
30 | fragrance - scent - perfume - smell - bottle | 45 | 30_fragrance_scent_perfume_smell |
31 | dna - cell - graphene - genome - synthetic | 44 | 31_dna_cell_graphene_genome |
32 | accent - favourite - fan - language - top | 35 | 32_accent_favourite_fan_language |
33 | nobel - prize - peace - award - committee | 33 | 33_nobel_prize_peace_award |
34 | violin - orchestra - stradivarius - instrument - symphony | 31 | 34_violin_orchestra_stradivarius_instrument |
35 | turing - bletchley - enigma - code - machine | 30 | 35_turing_bletchley_enigma_code |
36 | gandolfini - sopranos - gandolfinis - soprano - actor | 26 | 36_gandolfini_sopranos_gandolfinis_soprano |
37 | nelson - napoleon - battle - trafalgar - hms | 26 | 37_nelson_napoleon_battle_trafalgar |
38 | redskins - name - native - snyder - washington | 25 | 38_redskins_name_native_snyder |
39 | eurovision - contest - song - conchita - country | 25 | 39_eurovision_contest_song_conchita |
40 | evolution - creationism - scientific - intelligent - believe | 21 | 40_evolution_creationism_scientific_intelligent |
41 | prabowo - indonesia - jakarta - widodo - jokowi | 17 | 41_prabowo_indonesia_jakarta_widodo |
42 | dmlaterbundle - twittervia - lanza - zann - ilfracombe | 15 | 42_dmlaterbundle_twittervia_lanza_zann |
43 | clock - time - hour - daylight - westworth | 13 | 43_clock_time_hour_daylight |
44 | ikea - furniture - ikeas - kamprad - refugee | 12 | 44_ikea_furniture_ikeas_kamprad |
45 | vick - vicks - nfl - dog - virginia | 10 | 45_vick_vicks_nfl_dog |
46 | bulb - light - leds - paddle - bulbs | 8 | 46_bulb_light_leds_paddle |
47 | port - cairo - ministry - egypt - fan | 7 | 47_port_cairo_ministry_egypt |
48 | sanford - sanfords - jenny - carolina - mark | 5 | 48_sanford_sanfords_jenny_carolina |
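The frequencies in the table are heavily skewed toward the first few topics. As a rough illustration, the sketch below converts a few of the counts into shares of the 200000 training documents. The counts are copied from the table rows for topics -1, 0, 1 and 2; the names `TOTAL_DOCS`, `counts` and `share` are introduced here for illustration only.

```python
# Total taken from "Number of training documents" in the topic overview.
TOTAL_DOCS = 200_000

# Topic ID -> topic frequency, copied from the first rows of the table.
counts = {-1: 5, 0: 104_194, 1: 27_178, 2: 16_379}


def share(topic_id):
    """Fraction of all training documents assigned to the given topic."""
    return counts[topic_id] / TOTAL_DOCS


for topic_id, n in counts.items():
    print(f"topic {topic_id:>2}: {n:>7} docs ({share(topic_id):.1%})")
```

Topic 0 (league, player, game, team, cup) alone covers roughly 52% of the training documents, which is worth keeping in mind when using the topic assignments downstream.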
Training hyperparameters
- calculate_probabilities: False
- language: english
- low_memory: False
- min_topic_size: 10
- n_gram_range: (1, 1)
- nr_topics: 50
- seed_topic_list: None
- top_n_words: 10
- verbose: False
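The hyperparameters listed above correspond to keyword arguments of the `BERTopic` constructor. The sketch below shows how a model with the same settings might be instantiated. It is an assumption-laden illustration: the card does not list the embedding model, the UMAP or HDBSCAN settings, or any random seed, so this would not reproduce the exact topics in the table. The names `HYPERPARAMS` and `build_model` are hypothetical.

```python
# Values copied from the "Training hyperparameters" list.
HYPERPARAMS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 50,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model():
    """Create an untrained BERTopic model with the listed hyperparameters."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)


# Usage (docs is a list of strings; the training data is not part of this card):
# topic_model = build_model()
# topics, probs = topic_model.fit_transform(docs)
```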
Framework versions
- Numpy: 1.23.5
- HDBSCAN: 0.8.33
- UMAP: 0.5.3
- Pandas: 1.5.3
- Scikit-Learn: 1.2.2
- Sentence-transformers: 2.2.2
- Transformers: 4.31.0
- Numba: 0.57.1
- Plotly: 5.15.0
- Python: 3.10.12