cnn_dailymail_55555_3000_1500_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_55555_3000_1500_train")

topic_model.get_topic_info()  # DataFrame with one row per topic
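`get_topic_info()` gives the overview of all topics. To look at a single topic, BERTopic's `get_topic(topic_id)` returns that topic's keywords as `(word, score)` pairs, or `False` if the ID does not exist. The sketch below wraps both steps in small helpers. `load_model` and `top_words` are hypothetical names, not part of BERTopic, and the BERTopic import is deferred so the helpers can be defined without the library installed.

```python
from typing import List, Sequence, Tuple, Union

REPO_ID = "KingKazma/cnn_dailymail_55555_3000_1500_train"


def load_model(repo_id: str = REPO_ID):
    """Load the published model (hypothetical helper; needs `pip install -U bertopic`)."""
    from bertopic import BERTopic  # deferred import: only required when actually loading

    return BERTopic.load(repo_id)


def top_words(
    topic: Union[Sequence[Tuple[str, float]], bool], n: int = 5
) -> List[str]:
    """Return the n highest-scoring words from a `get_topic()` result.

    `get_topic()` returns False for an unknown topic ID, which is treated as empty.
    """
    if topic is False:
        return []
    # Sort by score, highest first, in case the pairs arrive unordered.
    ranked = sorted(topic, key=lambda pair: pair[1], reverse=True)
    return [word for word, _score in ranked[:n]]
```

For example, `top_words(load_model().get_topic(2), n=4)` lists the four highest-weighted words of topic 2, and an ID outside -1 to 59 yields an empty list.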

Topic overview

  • Number of topics: 61 (topic IDs -1 to 59, where -1 is BERTopic's outlier topic)
  • Number of training documents: 3000
Topic ID | Topic Keywords | Topic Frequency | Label
-1 | said - one - year - people - mr | 10 | -1_said_one_year_people
0 | league - game - player - cup - goal | 961 | 0_league_game_player_cup
1 | police - death - said - murder - family | 313 | 1_police_death_said_murder
2 | obama - republican - senate - president - republicans | 182 | 2_obama_republican_senate_president
3 | fashion - hair - look - makeup - brand | 91 | 3_fashion_hair_look_makeup
4 | dog - animal - cat - bird - pet | 69 | 4_dog_animal_cat_bird
5 | syria - isis - syrian - iraq - fighter | 54 | 5_syria_isis_syrian_iraq
6 | mexico - said - cuba - president - cartel | 53 | 6_mexico_said_cuba_president
7 | police - court - cash - jailed - said | 53 | 7_police_court_cash_jailed
8 | space - nasa - mars - planet - earth | 51 | 8_space_nasa_mars_planet
9 | property - house - price - room - london | 48 | 9_property_house_price_room
10 | patient - hospital - nhs - doctor - cancer | 48 | 10_patient_hospital_nhs_doctor
11 | tax - bank - minister - mr - pay | 46 | 11_tax_bank_minister_mr
12 | car - fire - crash - bus - train | 45 | 12_car_fire_crash_bus
13 | milk - food - raw - restaurant - chocolate | 44 | 13_milk_food_raw_restaurant
14 | gold - olympic - horse - race - medal | 36 | 14_gold_olympic_horse_race
15 | album - song - joel - music - show | 35 | 15_album_song_joel_music
16 | show - film - movie - award - les | 35 | 16_show_film_movie_award
17 | baby - born - hospital - birth - pregnancy | 34 | 17_baby_born_hospital_birth
18 | prince - queen - royal - william - duchess | 31 | 18_prince_queen_royal_william
19 | chinese - china - bo - beijing - chen | 30 | 19_chinese_china_bo_beijing
20 | labour - mr - party - ukip - miliband | 30 | 20_labour_mr_party_ukip
21 | school - student - teacher - book - fraternity | 29 | 21_school_student_teacher_book
22 | somalia - dala - african - alshabaab - mali | 28 | 22_somalia_dala_african_alshabaab
23 | ukraine - russian - russia - putin - moscow | 26 | 23_ukraine_russian_russia_putin
24 | woods - golf - golfer - hole - round | 26 | 24_woods_golf_golfer_hole
25 | sterling - nba - clippers - donald - said | 26 | 25_sterling_nba_clippers_donald
26 | found - scientist - stonehenge - researcher - frog | 26 | 26_found_scientist_stonehenge_researcher
27 | apple - iphone - apples - phone - device | 24 | 27_apple_iphone_apples_phone
28 | formula - race - schumacher - prix - ecclestone | 23 | 28_formula_race_schumacher_prix
29 | ebola - virus - outbreak - health - vaccine | 22 | 29_ebola_virus_outbreak_health
30 | church - pope - priest - francis - vatican | 21 | 30_church_pope_priest_francis
31 | sharapova - open - wimbledon - tennis - slam | 21 | 31_sharapova_open_wimbledon_tennis
32 | pakistani - pakistan - taliban - musharraf - afghanistan | 21 | 32_pakistani_pakistan_taliban_musharraf
33 | storm - weather - tornado - water - rain | 21 | 33_storm_weather_tornado_water
34 | north - korea - korean - kim - south | 21 | 34_north_korea_korean_kim
35 | war - medal - soldier - army - afghanistan | 21 | 35_war_medal_soldier_army
36 | marijuana - cigarette - alcohol - drug - smoking | 20 | 36_marijuana_cigarette_alcohol_drug
37 | internet - google - user - facebook - online | 19 | 37_internet_google_user_facebook
38 | plane - flight - crash - passenger - airport | 19 | 38_plane_flight_crash_passenger
39 | weight - diet - fat - stone - food | 18 | 39_weight_diet_fat_stone
40 | israeli - israel - gaza - hamas - palestinian | 17 | 40_israeli_israel_gaza_hamas
41 | beach - art - resort - festival - painting | 17 | 41_beach_art_resort_festival
42 | petraeus - cia - broadwell - justice - fbi | 17 | 42_petraeus_cia_broadwell_justice
43 | garner - wilson - officer - police - black | 16 | 43_garner_wilson_officer_police
44 | ship - cruise - ships - crew - pirate | 16 | 44_ship_cruise_ships_crew
45 | nfl - patriots - rice - seahawks - chris | 15 | 45_nfl_patriots_rice_seahawks
46 | dolphin - sea - creature - cuttlefish - fisherman | 14 | 46_dolphin_sea_creature_cuttlefish
47 | weather - rain - winter - temperature - warm | 14 | 47_weather_rain_winter_temperature
48 | mandela - african - africa - south - mandelas | 14 | 48_mandela_african_africa_south
49 | disney - snow - million - wars - movie | 14 | 49_disney_snow_million_wars
50 | price - bag - plastic - cent - energy | 13 | 50_price_bag_plastic_cent
51 | spartan - cliff - parachute - matthew - obstacle | 12 | 51_spartan_cliff_parachute_matthew
52 | zoo - panda - cub - giraffe - park | 12 | 52_zoo_panda_cub_giraffe
53 | iran - iranian - irans - ahmadinejad - nuclear | 12 | 53_iran_iranian_irans_ahmadinejad
54 | bin - laden - us - qaeda - al | 12 | 54_bin_laden_us_qaeda
55 | crocodile - snake - python - bascoules - alligator | 12 | 55_crocodile_snake_python_bascoules
56 | woman - ivf - men - dna - fertility | 11 | 56_woman_ivf_men_dna
57 | driver - driving - police - meracle - text | 11 | 57_driver_driving_police_meracle
58 | mitchell - mr - evans - mp - gate | 10 | 58_mitchell_mr_evans_mp
59 | france - police - mosque - salah - donetsk | 10 | 59_france_police_mosque_salah
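Two properties of the overview can be checked directly. Each Label is the topic ID followed by the first four keywords, joined by underscores, and Topic Frequency counts training documents per topic, so the 61 frequencies should add up to the 3,000 training documents. The plain-Python sketch below checks both, using frequencies copied from the table; `default_label` and `summarize` are hypothetical helper names, not BERTopic functions.

```python
from typing import Dict, List

# Topic Frequency column from the overview, ordered by topic ID -1, 0, 1, ..., 59.
FREQUENCIES: List[int] = [
    10, 961, 313, 182, 91, 69, 54, 53, 53, 51, 48, 48, 46, 45, 44, 36,
    35, 35, 34, 31, 30, 30, 29, 28, 26, 26, 26, 26, 24, 23, 22, 21,
    21, 21, 21, 21, 21, 20, 19, 19, 18, 17, 17, 17, 16, 16, 15, 14,
    14, 14, 14, 13, 12, 12, 12, 12, 12, 11, 11, 10, 10,
]


def default_label(topic_id: int, keywords: List[str]) -> str:
    """Rebuild a Label value: the topic ID, then the first four keywords, joined by '_'."""
    return "_".join([str(topic_id)] + keywords[:4])


def summarize(frequencies: List[int], first_id: int = -1) -> Dict[str, int]:
    """Count topics and documents, and find the smallest regular (non-outlier) topic."""
    regular = frequencies[1:] if first_id == -1 else frequencies
    return {
        "n_topics": len(frequencies),
        "n_documents": sum(frequencies),
        "last_topic_id": first_id + len(frequencies) - 1,
        "smallest_regular_topic": min(regular),
    }
```

`summarize(FREQUENCIES)` reports 61 topics, 3,000 documents, a last topic ID of 59, and a smallest regular topic of 10 documents, which is consistent with the `min_topic_size: 10` setting listed under the training hyperparameters.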

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
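The settings above are keyword arguments of the `BERTopic` constructor. The card does not record the embedding model, the UMAP and HDBSCAN settings, or the exact training documents, so the sketch below leaves those at BERTopic's defaults. A refit with it is therefore unlikely to reproduce the published topics exactly. `TRAINING_KWARGS` and `build_topic_model` are hypothetical names, and the BERTopic import is deferred until the model is built.

```python
from typing import Any, Dict

# Values copied from the "Training hyperparameters" list on this card.
TRAINING_KWARGS: Dict[str, Any] = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_topic_model(**overrides: Any):
    """Create an unfitted BERTopic model with this card's settings (hypothetical helper).

    Requires `pip install -U bertopic`. Keyword overrides replace individual settings.
    """
    from bertopic import BERTopic  # deferred import: only required when building

    kwargs = {**TRAINING_KWARGS, **overrides}
    return BERTopic(**kwargs)
```

A model built this way would be fitted with `topics, probs = model.fit_transform(docs)` on a list of article strings. Because `calculate_probabilities` is True, `probs` holds a topic distribution for every document rather than only the probability of the assigned topic. BERTopic's documentation notes that this can slow down fitting on very large corpora.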

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.6
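`pip install -U bertopic` installs current releases, which may differ from the versions listed above. The standard-library sketch below compares a local environment against this list. The mapping from the card's names to PyPI distribution names (for example UMAP to `umap-learn`) is an assumption, and `version_mismatches` and `python_minor_matches` are hypothetical helpers.

```python
import platform
from importlib import metadata
from typing import Callable, Dict, Optional

# Versions listed on this card, keyed by (assumed) PyPI distribution name.
CARD_VERSIONS: Dict[str, str] = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}
CARD_PYTHON = "3.10.6"


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from the card to what is installed."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, want in expected.items():
        found = lookup(name)
        if found != want:
            mismatches[name] = found  # None means the package is missing
    return mismatches


def python_minor_matches(expected: str = CARD_PYTHON) -> bool:
    """Compare only the major.minor Python version (e.g. 3.10) with the card."""
    return platform.python_version_tuple()[:2] == tuple(expected.split(".")[:2])
```

`version_mismatches(CARD_VERSIONS)` returns an empty dict when the environment matches the card exactly; any entry shows which package differs and what is installed instead.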