cnn_dailymail_22457_3000_1500_train
This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
Usage
To use this model, please install BERTopic:
pip install -U bertopic
You can use the model as follows:
from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_22457_3000_1500_train")
topic_model.get_topic_info()
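Beyond inspecting the topic table, a loaded model can also assign topics to unseen text. The following is a minimal sketch rather than part of the original card: `assign_topics` and `demo` are hypothetical helper names, the two headlines are made up, and it assumes the saved model can embed new documents (depending on how the model was serialized, an `embedding_model` argument may need to be passed to `BERTopic.load`).

```python
def assign_topics(topic_model, docs):
    """Pair each document with the topic ID predicted by the model."""
    # BERTopic.transform returns (topics, probabilities); only topics are used here.
    topics, _probabilities = topic_model.transform(docs)
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


def demo():
    """Load the published model and label two made-up headlines."""
    # Imported lazily so the helper above works without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load("KingKazma/cnn_dailymail_22457_3000_1500_train")
    docs = [
        "The striker scored twice as the club won the league cup final.",
        "Doctors at the hospital began a new cancer treatment for the child.",
    ]
    for doc, topic in assign_topics(topic_model, docs):
        # Topic -1 is the outlier bucket for documents that fit no topic.
        print(topic, doc)
```

Calling `demo()` downloads the model from the Hub. Which topic each headline receives is not guaranteed, so treat the printed IDs as something to look up in the topic overview rather than as a fixed result.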
Topic overview
- Number of topics: 49
- Number of training documents: 3000
Topic ID | Topic Keywords | Topic Frequency | Label
---|---|---|---
-1 | said - one - year - people - police | 10 | -1_said_one_year_people |
0 | league - player - club - game - cup | 1050 | 0_league_player_club_game |
1 | said - syria - government - iraq - islamic | 317 | 1_said_syria_government_iraq |
2 | obama - president - house - state - republican | 140 | 2_obama_president_house_state |
3 | cancer - hospital - baby - treatment - child | 122 | 3_cancer_hospital_baby_treatment |
4 | google - apple - tablet - car - device | 84 | 4_google_apple_tablet_car |
5 | fashion - dress - hair - look - woman | 78 | 5_fashion_dress_hair_look |
6 | police - officer - shooting - said - shot | 66 | 6_police_officer_shooting_said |
7 | film - movie - show - actor - comedy | 65 | 7_film_movie_show_actor |
8 | murder - death - said - home - police | 55 | 8_murder_death_said_home |
9 | mr - labour - minister - mp - blair | 52 | 9_mr_labour_minister_mp |
10 | storm - water - weather - ice - rain | 51 | 10_storm_water_weather_ice |
11 | shark - bear - turtle - crocodile - bird | 50 | 11_shark_bear_turtle_crocodile |
12 | flight - plane - passenger - airport - pilot | 49 | 12_flight_plane_passenger_airport |
13 | house - property - home - per - room | 49 | 13_house_property_home_per |
14 | drug - police - court - stealing - robbery | 40 | 14_drug_police_court_stealing |
15 | police - murder - mr - court - clavell | 36 | 15_police_murder_mr_court |
16 | games - gold - olympic - race - sport | 34 | 16_games_gold_olympic_race |
17 | student - school - teacher - said - cardosa | 34 | 17_student_school_teacher_said |
18 | country - minister - energy - cent - greece | 32 | 18_country_minister_energy_cent |
19 | golf - mcilroy - course - round - ryder | 31 | 19_golf_mcilroy_course_round |
20 | police - harris - abuse - allegation - officer | 30 | 20_police_harris_abuse_allegation |
21 | ebola - virus - africa - health - liberia | 29 | 21_ebola_virus_africa_health |
22 | chinese - china - cable - bo - beijing | 28 | 22_chinese_china_cable_bo |
23 | federer - tennis - murray - wimbledon - match | 28 | 23_federer_tennis_murray_wimbledon |
24 | dog - animal - dogs - owner - simmons | 26 | 24_dog_animal_dogs_owner |
25 | cent - per - woman - men - pickens | 23 | 25_cent_per_woman_men |
26 | ship - boat - rescue - water - sea | 23 | 26_ship_boat_rescue_water |
27 | hamilton - race - rosberg - mercedes - formula | 22 | 27_hamilton_race_rosberg_mercedes |
28 | galaxy - planet - universe - earth - telescope | 22 | 28_galaxy_planet_universe_earth |
29 | russian - russia - putin - ukraine - moscow | 22 | 29_russian_russia_putin_ukraine |
30 | pakistan - pakistani - karachi - taliban - anwar | 22 | 30_pakistan_pakistani_karachi_taliban |
31 | korea - north - korean - south - kim | 21 | 31_korea_north_korean_south |
32 | car - driver - train - accident - cope | 21 | 32_car_driver_train_accident |
33 | food - fruit - taste - cake - cream | 20 | 33_food_fruit_taste_cake |
34 | painting - art - auction - artist - gallery | 20 | 34_painting_art_auction_artist |
35 | base - drone - soldier - afghan - us | 19 | 35_base_drone_soldier_afghan |
36 | weight - fat - eating - healthy - size | 18 | 36_weight_fat_eating_healthy |
37 | mafia - wine - money - fraud - court | 18 | 37_mafia_wine_money_fraud |
38 | aguilar - bravo - brewer - rambold - court | 18 | 38_aguilar_bravo_brewer_rambold |
39 | missing - search - found - family - disappeared | 17 | 39_missing_search_found_family |
40 | juarez - quezada - mexico - mexican - cartel | 15 | 40_juarez_quezada_mexico_mexican |
41 | knicks - lin - chicago - blackhawks - game | 15 | 41_knicks_lin_chicago_blackhawks |
42 | duchess - prince - kate - royal - william | 15 | 42_duchess_prince_kate_royal |
43 | price - supermarket - asda - shop - food | 14 | 43_price_supermarket_asda_shop |
44 | school - child - pupil - teacher - xxx | 14 | 44_school_child_pupil_teacher |
45 | nhs - patient - ae - hospital - staff | 13 | 45_nhs_patient_ae_hospital |
46 | zsa - francesca - rhodes - vongtau - gabor | 12 | 46_zsa_francesca_rhodes_vongtau |
47 | medal - war - bomb - graf - vc | 10 | 47_medal_war_bomb_graf |
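Each label in the table appears to be the topic ID followed by the first four keywords, joined with underscores, and the frequencies add up to the 3000 training documents. As a small illustration of reading the table programmatically, the sketch below splits a label back into its parts and converts a frequency into a share of the corpus. `parse_label` and `share` are hypothetical helpers written for this example, not BERTopic functions.

```python
def parse_label(label):
    """Split a label such as '0_league_player_club_game' into (topic_id, keywords)."""
    # Partition on the first underscore only, so the ID "-1" stays intact.
    topic_id, _, keywords = label.partition("_")
    return int(topic_id), keywords.split("_")


def share(frequency, total_documents=3000):
    """Fraction of the training documents assigned to a topic."""
    return frequency / total_documents


# A few rows copied from the table above: (label, frequency).
rows = [
    ("-1_said_one_year_people", 10),
    ("0_league_player_club_game", 1050),
    ("1_said_syria_government_iraq", 317),
]

for label, frequency in rows:
    topic_id, keywords = parse_label(label)
    print(f"{topic_id:>3} {share(frequency):6.1%} {' '.join(keywords)}")
```

Topic 0 (football) covers 1050 of the 3000 documents, or 35% of the corpus, while the outlier topic -1 holds only 10 documents.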
Training hyperparameters
- calculate_probabilities: True
- language: english
- low_memory: False
- min_topic_size: 10
- n_gram_range: (1, 1)
- nr_topics: None
- seed_topic_list: None
- top_n_words: 10
- verbose: False
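The listed values correspond to keyword arguments of the `BERTopic` constructor. The sketch below shows how a model with the same settings might be configured. It is an assumption-laden illustration: `HYPERPARAMETERS` and `build_untrained_model` are names invented here, and the card does not state the embedding model, the UMAP or HDBSCAN settings, or a random seed, so a model trained this way would not necessarily reproduce the topic table.

```python
# Values copied from the "Training hyperparameters" list.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_untrained_model():
    """Create an unfitted BERTopic model configured with the settings above."""
    # Imported lazily; requires `pip install -U bertopic`.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

Fitting would then be a call such as `build_untrained_model().fit_transform(docs)`, where `docs` is a list of strings supplied by the reader.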
Framework versions
- Numpy: 1.22.4
- HDBSCAN: 0.8.33
- UMAP: 0.5.3
- Pandas: 1.5.3
- Scikit-Learn: 1.2.2
- Sentence-transformers: 2.2.2
- Transformers: 4.31.0
- Numba: 0.56.4
- Plotly: 5.13.1
- Python: 3.10.6
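Loading a saved model can be sensitive to library versions, so it may help to compare the local environment against the list above. The sketch below is one possible way to do that with the standard library. `version_mismatches` is a hypothetical helper, and the distribution names (for example `umap-learn` for UMAP) are assumed PyPI names rather than something the card states. The Python version itself is not checked.

```python
from importlib import metadata

# Versions copied from the "Framework versions" list; keys are assumed PyPI names.
PINNED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def version_mismatches(pinned=PINNED, get_version=metadata.version):
    """Return {package: (expected, installed)} for every package that differs."""
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches
```

Calling `version_mismatches()` with no arguments inspects the current environment. An empty dictionary means every listed package matches the version recorded here.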