cnn_dailymail_108_3000_1500_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

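from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it.
topic_model = BERTopic.load("KingKazma/cnn_dailymail_108_3000_1500_train")

# Returns a pandas DataFrame with one row per topic.
topic_model.get_topic_info()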
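
The table returned by `get_topic_info()` includes BERTopic's outlier topic `-1` alongside the real topics. The following is a minimal sketch of ranking topics while skipping the outliers. It assumes the rows have been converted to plain dictionaries with `Topic`, `Count`, and `Name` keys, for example via `topic_model.get_topic_info().to_dict("records")`. The helper name `largest_topics` is hypothetical and not part of BERTopic; the sample rows are copied from the topic overview in this card.

```python
def largest_topics(rows, n=3):
    """Return the n most frequent topics, skipping the outlier topic -1.

    `rows` is a list of dicts with "Topic", "Count", and "Name" keys
    (an assumed shape; see the lead-in above).
    """
    real_topics = [row for row in rows if row["Topic"] != -1]
    return sorted(real_topics, key=lambda row: row["Count"], reverse=True)[:n]


# Sample rows taken from this card's topic overview.
rows = [
    {"Topic": -1, "Count": 10, "Name": "-1_said_one_people_year"},
    {"Topic": 0, "Count": 954, "Name": "0_league_player_cup_club"},
    {"Topic": 1, "Count": 308, "Name": "1_police_said_court_told"},
    {"Topic": 2, "Count": 290, "Name": "2_dog_animal_cat_elephant"},
]

for row in largest_topics(rows, n=2):
    print(row["Topic"], row["Count"], row["Name"])
```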

Topic overview

  • Number of topics: 51 (topics 0 to 49, plus the outlier topic -1)
  • Number of training documents: 3000
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - one - people - year - would | 10 | -1_said_one_people_year |
| 0 | league - player - cup - club - game | 954 | 0_league_player_cup_club |
| 1 | police - said - court - told - murder | 308 | 1_police_said_court_told |
| 2 | dog - animal - cat - elephant - zoo | 290 | 2_dog_animal_cat_elephant |
| 3 | mr - minister - labour - cameron - prime | 113 | 3_mr_minister_labour_cameron |
| 4 | obama - clinton - president - republican - campaign | 104 | 4_obama_clinton_president_republican |
| 5 | school - teacher - student - nfl - said | 84 | 5_school_teacher_student_nfl |
| 6 | food - milk - drink - wine - bottle | 72 | 6_food_milk_drink_wine |
| 7 | flight - plane - passenger - pilot - aircraft | 49 | 7_flight_plane_passenger_pilot |
| 8 | user - facebook - google - ipad - device | 48 | 8_user_facebook_google_ipad |
| 9 | olympic - gold - race - games - medal | 46 | 9_olympic_gold_race_games |
| 10 | doll - dress - fashion - look - style | 44 | 10_doll_dress_fashion_look |
| 11 | afghan - afghanistan - taliban - military - pakistan | 43 | 11_afghan_afghanistan_taliban_military |
| 12 | transplant - patient - heart - hospital - cancer | 42 | 12_transplant_patient_heart_hospital |
| 13 | iran - syrian - said - president - egypt | 42 | 13_iran_syrian_said_president |
| 14 | show - film - million - like - movie | 39 | 14_show_film_million_like |
| 15 | property - house - price - home - apartment | 38 | 15_property_house_price_home |
| 16 | earth - asteroid - moon - volcano - planet | 34 | 16_earth_asteroid_moon_volcano |
| 17 | federer - djokovic - match - murray - seed | 33 | 17_federer_djokovic_match_murray |
| 18 | jackson - jacksons - album - song - music | 31 | 18_jackson_jacksons_album_song |
| 19 | ship - boat - coast - said - vessel | 30 | 19_ship_boat_coast_said |
| 20 | russia - russian - putin - ukraine - moscow | 30 | 20_russia_russian_putin_ukraine |
| 21 | snow - weather - temperature - climate - water | 29 | 21_snow_weather_temperature_climate |
| 22 | police - station - mr - man - gang | 28 | 22_police_station_mr_man |
| 23 | ebola - disease - vaccine - virus - health | 28 | 23_ebola_disease_vaccine_virus |
| 24 | weight - fat - diet - burn - exercise | 28 | 24_weight_fat_diet_burn |
| 25 | syria - isis - islamic - muslims - alqudsi | 23 | 25_syria_isis_islamic_muslims |
| 26 | boko - haram - nigeria - nigerian - turkana | 23 | 26_boko_haram_nigeria_nigerian |
| 27 | korea - north - korean - kim - pyongyang | 22 | 27_korea_north_korean_kim |
| 28 | driver - driving - road - car - speed | 22 | 28_driver_driving_road_car |
| 29 | school - child - education - internet - english | 21 | 29_school_child_education_internet |
| 30 | mcilroy - woods - pga - tournament - round | 20 | 30_mcilroy_woods_pga_tournament |
| 31 | race - car - driver - team - f1 | 19 | 31_race_car_driver_team |
| 32 | princess - prince - diana - royal - palace | 18 | 32_princess_prince_diana_royal |
| 33 | climbing - climb - mountain - everest - ang | 18 | 33_climbing_climb_mountain_everest |
| 34 | wedding - bieber - couple - together - love | 18 | 34_wedding_bieber_couple_together |
| 35 | nhs - care - patient - hospital - health | 17 | 35_nhs_care_patient_hospital |
| 36 | iraq - iraqi - isis - baghdad - kurdish | 16 | 36_iraq_iraqi_isis_baghdad |
| 37 | cartel - drug - mexican - mexico - crack | 15 | 37_cartel_drug_mexican_mexico |
| 38 | painting - picasso - art - artist - gogh | 15 | 38_painting_picasso_art_artist |
| 39 | castro - zelaya - fidel - micheletti - president | 14 | 39_castro_zelaya_fidel_micheletti |
| 40 | french - ford - traveller - southampton - taxi | 14 | 40_french_ford_traveller_southampton |
| 41 | fire - florissant - bell - firefighter - burned | 14 | 41_fire_florissant_bell_firefighter |
| 42 | fight - ali - heavyweight - pacquiao - title | 13 | 42_fight_ali_heavyweight_pacquiao |
| 43 | fish - sea - jellyfish - manta - swell | 13 | 43_fish_sea_jellyfish_manta |
| 44 | pope - francis - vatican - falkland - islands | 12 | 44_pope_francis_vatican_falkland |
| 45 | gay - samesex - lgbt - marriage - state | 12 | 45_gay_samesex_lgbt_marriage |
| 46 | castle - tower - building - brent - lego | 12 | 46_castle_tower_building_brent |
| 47 | chinese - china - xinhua - chinas - communist | 12 | 47_chinese_china_xinhua_chinas |
| 48 | delivery - customer - market - vacuum - coin | 10 | 48_delivery_customer_market_vacuum |
| 49 | water - rain - storm - flooding - methane | 10 | 49_water_rain_storm_flooding |
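
In every row above, the label is the topic ID followed by the first four keywords, joined with underscores. The following sketch reproduces that pattern as observed in this table; it is not BERTopic's internal implementation, and the helper name `topic_label` is hypothetical.

```python
def topic_label(topic_id, keywords, n_words=4):
    """Build a label such as "0_league_player_cup_club".

    Mirrors the pattern visible in the table: the topic ID, then the
    first `n_words` keywords, joined with underscores.
    """
    return "_".join([str(topic_id)] + list(keywords[:n_words]))


# Keywords for topic 0, copied from the table.
keywords = ["league", "player", "cup", "club", "game"]
print(topic_label(0, keywords))
```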

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
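
The values listed above correspond to keyword arguments of the `BERTopic` constructor. The following sketch shows how they might be passed when fitting a comparable model. The card does not list the embedding model or the UMAP and HDBSCAN settings, so this assumes library defaults for everything else and is not a reproduction of the original training run. The names `HYPERPARAMETERS` and `build_topic_model` are hypothetical.

```python
# Hyperparameters copied from this card.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_topic_model():
    """Create an unfitted BERTopic model with the card's hyperparameters."""
    # Imported lazily so the dictionary above can be inspected
    # without BERTopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

Calling `build_topic_model().fit_transform(docs)` on a list of document strings would then train a new model; the resulting topics may differ from those in this card.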

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.6
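
A model saved under one set of library versions may not load cleanly under another. The following sketch compares the installed packages against the versions listed above, using only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP) are assumptions based on how these packages are usually published, and the helper name `version_mismatches` is hypothetical.

```python
from importlib import metadata

# Versions copied from this card, keyed by assumed PyPI distribution name.
PINNED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def version_mismatches(pinned=PINNED):
    """Map each package whose installed version differs to (wanted, installed).

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in pinned.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in version_mismatches().items():
    print(f"{package}: card lists {wanted}, found {installed or 'not installed'}")
```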