blbooksgenre_topics

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("davanstrien/blbooksgenre_topics")

topic_model.get_topic_info()
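get_topic_info() returns a pandas DataFrame with one row per topic, including the Topic, Count and Name columns. Topic -1 is BERTopic's outlier topic. The sketch below defines a hypothetical helper, largest_topics, that drops the outlier row and keeps the n largest topics. So that it runs offline, a few rows copied from the topic overview stand in for the downloaded model's output.

```python
import pandas as pd


def largest_topics(info: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Return the n largest non-outlier topics from a get_topic_info()-style table.

    Assumes the "Topic", "Count" and "Name" columns that BERTopic's
    get_topic_info() provides; topic -1 holds BERTopic's outlier documents.
    """
    topics = info[info["Topic"] != -1]
    return topics.nlargest(n, "Count")[["Topic", "Count", "Name"]].reset_index(drop=True)


# Hypothetical stand-in for topic_model.get_topic_info(): the first five rows
# of the topic overview on this card.
SAMPLE_INFO = pd.DataFrame(
    {
        "Topic": [-1, 0, 1, 2, 3],
        "Count": [11, 18624, 4698, 3576, 3104],
        "Name": [
            "-1_poems_novel_poem_prose",
            "0_poems_poem_poetry_poets",
            "1_novel_author_poem_heir",
            "2_ireland_dublin_scotland_irish",
            "3_geography_geographical_maps_map",
        ],
    }
)

print(largest_topics(SAMPLE_INFO, n=3))
```

With the downloaded model, largest_topics(topic_model.get_topic_info(), n=10) would list the ten biggest topics. topic_model.get_topic(0) returns the top (word, score) pairs for a single topic.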

Topic overview

  • Number of topics: 57
  • Number of training documents: 43752

| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | poems - novel - poem - prose - book | 11 | -1_poems_novel_poem_prose |
| 0 | poems - poem - poetry - poets - poetical | 18624 | 0_poems_poem_poetry_poets |
| 1 | novel - author - poem - heir - tales | 4698 | 1_novel_author_poem_heir |
| 2 | ireland - dublin - scotland - irish - edinburgh | 3576 | 2_ireland_dublin_scotland_irish |
| 3 | geography - geographical - maps - map - history | 3104 | 3_geography_geographical_maps_map |
| 4 | shakespeare - acts - prose - comedy - theatre | 1377 | 4_shakespeare_acts_prose_comedy |
| 5 | county - counties - pennsylvania - hampshire - history | 1089 | 5_county_counties_pennsylvania_hampshire |
| 6 | france - spain - europe - pyrenees - paris | 990 | 6_france_spain_europe_pyrenees |
| 7 | sailing - nautical - maritime - boat - voyages | 986 | 7_sailing_nautical_maritime_boat |
| 8 | antiquity - greeks - rome - romans - greece | 744 | 8_antiquity_greeks_rome_romans |
| 9 | illustrations - drawings - pencil - drawn - sketches | 631 | 9_illustrations_drawings_pencil_drawn |
| 10 | africa - transvaal - cape - zululand - african | 610 | 10_africa_transvaal_cape_zululand |
| 11 | egypt - egyptians - cairo - sinai - egyptian | 610 | 11_egypt_egyptians_cairo_sinai |
| 12 | england - britain - british - george - english | 570 | 12_england_britain_british_george |
| 13 | california - alaska - regions - tour - states | 546 | 13_california_alaska_regions_tour |
| 14 | italia - italy - sicily - italian - italians | 491 | 14_italia_italy_sicily_italian |
| 15 | crimean - crimea - turkey - turks - russia | 481 | 15_crimean_crimea_turkey_turks |
| 16 | mexico - rio - honduras - colombia - panama | 433 | 16_mexico_rio_honduras_colombia |
| 17 | wales - maoriland - otago - zealand - auckland | 423 | 17_wales_maoriland_otago_zealand |
| 18 | waterloo - poem - battle - napoleon - battles | 405 | 18_waterloo_poem_battle_napoleon |
| 19 | mining - mineralogy - minerals - metallurgy - metals | 396 | 19_mining_mineralogy_minerals_metallurgy |
| 20 | history - america - states - historical - american | 377 | 20_history_america_states_historical |
| 21 | geology - geological - geologists - cambrian - fossils | 305 | 21_geology_geological_geologists_cambrian |
| 22 | quebec - scotia - canadas - ontario - province | 204 | 22_quebec_scotia_canadas_ontario |
| 23 | rambles - ramble - south - lands - scrambles | 194 | 23_rambles_ramble_south_lands |
| 24 | edition - second - series - third - revised | 159 | 24_edition_second_series_third |
| 25 | rudge - barnaby - hutton - rivers - osborne | 149 | 25_rudge_barnaby_hutton_rivers |
| 26 | memorials - anniversary - memorial - london - address | 134 | 26_memorials_anniversary_memorial_london |
| 27 | railway - railways - railroad - railroads - railroadiana | 115 | 27_railway_railways_railroad_railroads |
| 28 | forest - foresters - woods - trees - forestalled | 112 | 28_forest_foresters_woods_trees |
| 29 | philosophy - humanity - philosophie - moralities - conscience | 97 | 29_philosophy_humanity_philosophie_moralities |
| 30 | gazetteer - geography - geographical - dictionary - topographical | 96 | 30_gazetteer_geography_geographical_dictionary |
| 31 | goldsmith - goldsmiths - novel - writings - epistle | 93 | 31_goldsmith_goldsmiths_novel_writings |
| 32 | regulations - members - committees - rules - committee | 89 | 32_regulations_members_committees_rules |
| 33 | odes - poems - poem - ode - hymno | 87 | 33_odes_poems_poem_ode |
| 34 | doctor - doctors - physician - patients - physicians | 79 | 34_doctor_doctors_physician_patients |
| 35 | geography - schools - longmans - colleges - school | 77 | 35_geography_schools_longmans_colleges |
| 36 | juan - juana - sequel - carlos - genista | 63 | 36_juan_juana_sequel_carlos |
| 37 | sporting - sports - sport - sportsmans - rugby | 56 | 37_sporting_sports_sport_sportsmans |
| 38 | detective - detectives - crime - policeman - city | 52 | 38_detective_detectives_crime_policeman |
| 39 | blanc - mont - blanche - montserrat - montacute | 47 | 39_blanc_mont_blanche_montserrat |
| 40 | jack - jacks - jackdaw - house - author | 46 | 40_jack_jacks_jackdaw_house |
| 41 | dutch - netherlands - holland - dutchman - dutchesse | 43 | 41_dutch_netherlands_holland_dutchman |
| 42 | spider - spiders - adventure - web - webs | 35 | 42_spider_spiders_adventure_web |
| 43 | madrasiana - madras - malabar - mysore - district | 31 | 43_madrasiana_madras_malabar_mysore |
| 44 | doncaster - 1835 - gazette - 1862 - 1868 | 31 | 44_doncaster_1835_gazette_1862 |
| 45 | lays - lay - land - empire - sea | 28 | 45_lays_lay_land_empire |
| 46 | cyprus - syria - palestine - island - asia | 28 | 46_cyprus_syria_palestine_island |
| 47 | gipsies - gipsy - snakes - encyclopaedia - bunyan | 20 | 47_gipsies_gipsy_snakes_encyclopaedia |
| 48 | abydos - bride - turkish - marriage - euphrosyne | 18 | 48_abydos_bride_turkish_marriage |
| 49 | derby - castleton - buxton - matlock - nottingham | 16 | 49_derby_castleton_buxton_matlock |
| 50 | corsair - tale - carlo - mystery - monte | 16 | 50_corsair_tale_carlo_mystery |
| 51 | bushman - bushranger - bushrangers - australian - novel | 13 | 51_bushman_bushranger_bushrangers_australian |
| 52 | months - italy - weeks - six - france | 12 | 52_months_italy_weeks_six |
| 53 | kitty - kittys - catspaw - catriona - father | 12 | 53_kitty_kittys_catspaw_catriona |
| 54 | lighthouses - lighthouse - beacons - lights - lighting | 12 | 54_lighthouses_lighthouse_beacons_lights |
| 55 | balfour - kidnapped - balfouriana - memoirs - adventures | 11 | 55_balfour_kidnapped_balfouriana_memoirs |
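The overview table has two properties that can be checked mechanically. First, each Label is the topic ID followed by the topic's first four keywords, joined by underscores. Second, the Topic Frequency values add up to the 43752 training documents, so every document is counted in exactly one topic, with the -1 outlier topic included. The minimal sketch below checks both. It uses a hypothetical helper, parse_label, which assumes that keywords never contain underscores.

```python
# Topic Frequency column of the overview table, in Topic ID order (-1 to 55).
FREQUENCIES = [
    11, 18624, 4698, 3576, 3104, 1377, 1089, 990, 986, 744,
    631, 610, 610, 570, 546, 491, 481, 433, 423, 405,
    396, 377, 305, 204, 194, 159, 149, 134, 115, 112,
    97, 96, 93, 89, 87, 79, 77, 63, 56, 52,
    47, 46, 43, 35, 31, 31, 28, 28, 20, 18,
    16, 16, 13, 12, 12, 12, 11,
]


def parse_label(label: str) -> tuple[int, list[str]]:
    """Split a label such as "3_geography_geographical_maps_map" into its ID and words.

    Assumes no keyword contains an underscore (true for every label in the table).
    """
    topic_id, *words = label.split("_")
    return int(topic_id), words


# The label's words should equal the first four entries of the keyword column.
keywords = "geography - geographical - maps - map - history".split(" - ")
topic_id, words = parse_label("3_geography_geographical_maps_map")
print(topic_id, words == keywords[:4])  # 3 True
print(sum(FREQUENCIES), len(FREQUENCIES))  # 43752 57
```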

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: 57
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
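Every value listed above is a keyword argument of the BERTopic constructor, so the sketch below shows how a model with the same settings might be created. The card does not say which embedding, UMAP, HDBSCAN or vectorizer models were used, so this sketch assumes BERTopic's defaults for all of them. build_topic_model is a hypothetical helper name. With nr_topics=57, BERTopic reduces the clustered topics to 57 after fitting. With calculate_probabilities=False, it skips computing the full per-document topic probability distribution.

```python
# Hyperparameters as listed on this card.
TRAINING_KWARGS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 57,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_topic_model():
    """Create an unfitted BERTopic model with the card's hyperparameters.

    Assumption: sub-models (embedding model, UMAP, HDBSCAN, vectorizer) are left
    at BERTopic's defaults, since the card does not list them.
    """
    # Imported lazily so the settings above can be inspected without BERTopic installed.
    from bertopic import BERTopic

    return BERTopic(**TRAINING_KWARGS)
```

Fitting would then be topics, probs = build_topic_model().fit_transform(docs), where docs stands for the list of 43752 training texts.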

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.29
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.29.2
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.11
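To compare a local environment against the versions above, the minimal sketch below uses the standard library's importlib.metadata. Two parts of it are assumptions: the mapping from the card's names to PyPI distribution names (for example UMAP to umap-learn, Scikit-Learn to scikit-learn), and the name of the helper, version_mismatches, which is hypothetical.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Versions from "Framework versions", keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.29.2",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def version_mismatches(expected=EXPECTED, get_version=version):
    """Return {package: (expected, installed)} for every package whose version differs.

    Packages that are not installed are reported with installed=None.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in version_mismatches().items():
        print(f"{package}: card lists {wanted}, found {installed}")
    print(f"Python: card lists 3.10.11, found {platform.python_version()}")
```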