cnn_dailymail_108_50000_25000_validation

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

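from bertopic import BERTopic

# Load the fitted topic model from the Hugging Face Hub
topic_model = BERTopic.load("KingKazma/cnn_dailymail_108_50000_25000_validation")

# Return a DataFrame with one row per topic
topic_model.get_topic_info()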
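Beyond the overview table, a loaded model can be queried per topic or applied to new text. The following is a minimal sketch, assuming the standard BERTopic methods `get_topic` (a list of `(word, score)` pairs for a topic ID) and `transform` (topic assignments for unseen documents). The helper names `format_keywords` and `describe_documents` are hypothetical and not part of BERTopic. Loading needs network access to the Hub, and depending on how the checkpoint was saved you may have to pass an `embedding_model` to `BERTopic.load` before `transform` works.

```python
from typing import List, Sequence, Tuple

REPO_ID = "KingKazma/cnn_dailymail_108_50000_25000_validation"


def format_keywords(words: Sequence[Tuple[str, float]], top_n: int = 5) -> str:
    """Join the first top_n words of a topic with ' - ', as in the topic overview."""
    return " - ".join(word for word, _score in words[:top_n])


def describe_documents(docs: List[str], repo_id: str = REPO_ID) -> List[str]:
    """Assign each document to a topic and return that topic's keyword string."""
    # Imported lazily so format_keywords stays usable without BERTopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load(repo_id)
    topics, _probabilities = topic_model.transform(docs)
    return [format_keywords(topic_model.get_topic(int(t))) for t in topics]
```

Calling `describe_documents(["The striker scored twice as his side won the league title."])` would return one keyword string per input document. Which topic a given sentence lands in depends on the embedding model, so no output is shown here.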

Topic overview

  • Number of topics: 92 (topic IDs -1 to 90; -1 is the outlier topic)
  • Number of training documents: 13368
Topic ID | Topic Keywords | Topic Frequency | Label
-1 | said - police - one - year - also | 5 | -1_said_police_one_year
0 | league - game - player - goal - season | 4918 | 0_league_game_player_goal
1 | isis - syria - islamic - group - iraq | 2700 | 1_isis_syria_islamic_group
2 | dog - animal - elephant - bear - cat | 415 | 2_dog_animal_elephant_bear
3 | labour - mr - party - election - cameron | 386 | 3_labour_mr_party_election
4 | flight - plane - aircraft - pilot - crash | 340 | 4_flight_plane_aircraft_pilot
5 | hair - fashion - dress - look - model | 248 | 5_hair_fashion_dress_look
6 | car - driver - driving - road - police | 227 | 6_car_driver_driving_road
7 | food - cent - sugar - health - per | 221 | 7_food_cent_sugar_health
8 | police - officer - shooting - shot - said | 215 | 8_police_officer_shooting_shot
9 | clinton - email - obama - president - state | 213 | 9_clinton_email_obama_president
10 | cricket - england - cup - world - zealand | 191 | 10_cricket_england_cup_world
11 | property - house - home - room - price | 184 | 11_property_house_home_room
12 | fight - pacquiao - mayweather - manny - floyd | 171 | 12_fight_pacquiao_mayweather_manny
13 | hamilton - mercedes - race - prix - rosberg | 135 | 13_hamilton_mercedes_race_prix
14 | baby - hospital - birth - mother - child | 127 | 14_baby_hospital_birth_mother
15 | murray - wells - tennis - andy - match | 127 | 15_murray_wells_tennis_andy
16 | eclipse - earth - solar - sun - planet | 102 | 16_eclipse_earth_solar_sun
17 | police - abuse - sex - sexual - child | 98 | 17_police_abuse_sex_sexual
18 | apple - watch - device - user - google | 96 | 18_apple_watch_device_user
19 | netanyahu - iran - nuclear - israel - israeli | 83 | 19_netanyahu_iran_nuclear_israel
20 | putin - russian - nemtsov - moscow - russia | 82 | 20_putin_russian_nemtsov_moscow
21 | weight - fat - diet - size - stone | 81 | 21_weight_fat_diet_size
22 | race - armstrong - doping - world - tour | 78 | 22_race_armstrong_doping_world
23 | court - fraud - money - bank - mr | 76 | 23_court_fraud_money_bank
24 | cheltenham - hurdle - horse - race - jockey | 74 | 24_cheltenham_hurdle_horse_race
25 | mcilroy - round - masters - woods - golf | 72 | 25_mcilroy_round_masters_woods
26 | prince - charles - royal - duchess - camilla | 72 | 26_prince_charles_royal_duchess
27 | fraternity - university - sae - chapter - oklahoma | 68 | 27_fraternity_university_sae_chapter
28 | chan - sukumaran - bali - indonesian - mack | 65 | 28_chan_sukumaran_bali_indonesian
29 | ebola - sierra - virus - leone - disease | 64 | 29_ebola_sierra_virus_leone
30 | school - teacher - student - girl - sexual | 58 | 30_school_teacher_student_girl
31 | fire - building - explosion - blaze - firefighter | 52 | 31_fire_building_explosion_blaze
32 | nfl - borland - football - 49ers - season | 52 | 32_nfl_borland_football_49ers
33 | clarkson - bbc - gear - top - jeremy | 50 | 33_clarkson_bbc_gear_top
34 | ski - skier - mountain - avalanche - rock | 47 | 34_ski_skier_mountain_avalanche
35 | patient - nhs - ae - cancer - hospital | 46 | 35_patient_nhs_ae_cancer
36 | india - rape - documentary - indian - singh | 45 | 36_india_rape_documentary_indian
37 | mr - death - court - emery - miss | 43 | 37_mr_death_court_emery
38 | show - corden - host - stewart - williams | 42 | 38_show_corden_host_stewart
39 | car - vehicle - electric - cars - tesla | 40 | 39_car_vehicle_electric_cars
40 | school - child - education - porn - sex | 38 | 40_school_child_education_porn
41 | boko - haram - nigeria - nigerian - nigerias | 37 | 41_boko_haram_nigeria_nigerian
42 | marijuana - drug - cannabis - colorado - lsd | 34 | 42_marijuana_drug_cannabis_colorado
43 | law - indiana - gay - marriage - religious | 33 | 43_law_indiana_gay_marriage
44 | ferguson - department - police - justice - report | 32 | 44_ferguson_department_police_justice
45 | image - photographer - photography - photograph - photo | 31 | 45_image_photographer_photography_photograph
46 | snow - inch - winter - ice - storm | 30 | 46_snow_inch_winter_ice
47 | basketball - ncaa - coach - tournament - game | 30 | 47_basketball_ncaa_coach_tournament
48 | tsarnaev - boston - dzhokhar - tamerlan - tsarnaevs | 30 | 48_tsarnaev_boston_dzhokhar_tamerlan
49 | durst - dursts - berman - orleans - robert | 29 | 49_durst_dursts_berman_orleans
50 | jesus - ancient - stone - cave - circle | 29 | 50_jesus_ancient_stone_cave
51 | zayn - band - direction - singer - dance | 29 | 51_zayn_band_direction_singer
52 | film - movie - vivian - hollywood - script | 23 | 52_film_movie_vivian_hollywood
53 | korean - korea - kim - north - lippert | 23 | 53_korean_korea_kim_north
54 | weather - rain - temperature - snow - today | 23 | 54_weather_rain_temperature_snow
55 | robbery - woodger - store - cash - police | 22 | 55_robbery_woodger_store_cash
56 | parade - patricks - st - irish - green | 21 | 56_parade_patricks_st_irish
57 | secret - clancy - service - agent - white | 20 | 57_secret_clancy_service_agent
58 | hernandez - lloyd - jenkins - hernandezs - lloyds | 20 | 58_hernandez_lloyd_jenkins_hernandezs
59 | nazi - anne - nazis - war - camp | 20 | 59_nazi_anne_nazis_war
60 | snowden - intelligence - gchq - security - agency | 18 | 60_snowden_intelligence_gchq_security
61 | huang - chinese - china - mingxi - chen | 17 | 61_huang_chinese_china_mingxi
62 | wedding - married - marlee - platt - woodyard | 17 | 62_wedding_married_marlee_platt
63 | drug - cocaine - jailed - cannabis - tobacco | 17 | 63_drug_cocaine_jailed_cannabis
64 | cnn - transcript - student - news - roll | 17 | 64_cnn_transcript_student_news
65 | pope - francis - vatican - naples - pontiff | 17 | 65_pope_francis_vatican_naples
66 | richard - iii - leicester - king - iiis | 17 | 66_richard_iii_leicester_king
67 | chinese - tourist - temple - thailand - buddhist | 16 | 67_chinese_tourist_temple_thailand
68 | china - chinese - internet - chai - stopera | 16 | 68_china_chinese_internet_chai
69 | execution - lethal - gissendaner - injection - drug | 16 | 69_execution_lethal_gissendaner_injection
70 | woman - marriage - men - attractive - chalmers | 15 | 70_woman_marriage_men_attractive
71 | vanuatu - cyclone - vila - port - pam | 15 | 71_vanuatu_cyclone_vila_port
72 | poldark - turner - demelza - aidan - drama | 15 | 72_poldark_turner_demelza_aidan
73 | point - rebound - scored - points - harden | 14 | 73_point_rebound_scored_points
74 | rail - calais - parking - migrant - dickens | 13 | 74_rail_calais_parking_migrant
75 | johnson - student - virginia - charlottesville - uva | 13 | 75_johnson_student_virginia_charlottesville
76 | cuba - havana - cuban - rousseff - us | 13 | 76_cuba_havana_cuban_rousseff
77 | paris - attack - synagogue - hebdo - charlie | 13 | 77_paris_attack_synagogue_hebdo
78 | duckenfield - mr - gate - hillsborough - disaster | 12 | 78_duckenfield_mr_gate_hillsborough
79 | gordon - bobbi - kristina - phil - dr | 12 | 79_gordon_bobbi_kristina_phil
80 | knox - sollecito - kercher - raffaele - amanda | 12 | 80_knox_sollecito_kercher_raffaele
81 | coin - medal - war - auction - cross | 12 | 81_coin_medal_war_auction
82 | starbucks - schultz - race - racial - campaign | 12 | 82_starbucks_schultz_race_racial
83 | cosby - cosbys - thompson - bill - welles | 11 | 83_cosby_cosbys_thompson_bill
84 | jeffs - flds - rivette - compound - speer | 10 | 84_jeffs_flds_rivette_compound
85 | selma - alabama - march - bridge - civil | 8 | 85_selma_alabama_march_bridge
86 | jobs - naomi - fortune - redballoon - bn | 8 | 86_jobs_naomi_fortune_redballoon
87 | brain - object - retina - neuron - word | 8 | 87_brain_object_retina_neuron
88 | netflix - tv - content - streaming - screen | 8 | 88_netflix_tv_content_streaming
89 | social - user - tweet - twitter - tool | 7 | 89_social_user_tweet_twitter
90 | cunard - bird - darshan - ship - liner | 6 | 90_cunard_bird_darshan_ship
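
The frequency column is easier to interpret as a share of the 13368 training documents. The sketch below uses a handful of counts copied from the topic list above. The `share` helper is a hypothetical name introduced only for illustration. It suggests that topics 0 and 1 together cover roughly 57% of the documents, while the outlier topic -1 holds only five.

```python
# Document counts copied from the topic list (topic ID -> frequency).
TOTAL_DOCUMENTS = 13368
topic_counts = {-1: 5, 0: 4918, 1: 2700, 2: 415, 3: 386}


def share(count: int, total: int = TOTAL_DOCUMENTS) -> float:
    """Fraction of the training documents assigned to a topic."""
    return count / total


# Print each topic's count alongside its percentage of all documents.
for topic_id, count in topic_counts.items():
    print(f"topic {topic_id:>2}: {count:>5} docs ({share(count):.1%})")

# Combined share of the two largest topics.
top_two = share(topic_counts[0] + topic_counts[1])
print(f"topics 0 and 1 together: {top_two:.1%}")
```

A distribution this skewed is worth keeping in mind when sampling documents per topic, since most of the remaining topics contain fewer than 100 documents each.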

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
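
These settings correspond to keyword arguments of the `BERTopic` constructor. The sketch below shows how they could be passed when fitting a comparable model. The function name `build_topic_model` is hypothetical. This card does not state the embedding model, the UMAP and HDBSCAN settings, or any random seed, so a model built this way should not be expected to reproduce the topics listed here exactly.

```python
# Values copied from the "Training hyperparameters" list.
TRAINING_HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_topic_model():
    """Create an unfitted BERTopic instance configured with the listed settings."""
    # Imported lazily so the settings dict can be inspected without BERTopic installed.
    from bertopic import BERTopic

    return BERTopic(**TRAINING_HYPERPARAMETERS)
```

Fitting would then follow the usual pattern, `topics, probs = build_topic_model().fit_transform(docs)`, where `docs` is a list of strings you supply.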

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
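
If loading fails or results look different, a version mismatch is one possible cause. The following sketch compares the installed packages against the versions listed above using only the standard library. The mapping from the names in this card to PyPI distribution names (for example `umap-learn` for UMAP) is an assumption made here, and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions copied from the list above; keys are assumed PyPI distribution names.
CARD_VERSIONS: Dict[str, str] = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_versions() -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (version listed in this card, version installed locally)."""
    return {name: (listed, installed_version(name)) for name, listed in CARD_VERSIONS.items()}


for name, (listed, found) in compare_versions().items():
    status = "ok" if found == listed else "differs"
    print(f"{name:<22} card={listed:<8} installed={found or 'not installed'} ({status})")
```

A "differs" line is not necessarily a problem. It only flags where the local environment departs from the one used for training.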