bertopic_github_dataset_viewer_issues

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("asoria/bertopic_github_dataset_viewer_issues")

topic_model.get_topic_info()
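
Beyond listing topics, the loaded model can assign a topic to an unseen issue title. The following is a minimal sketch, assuming the saved model can embed new text on its own (if it cannot, an embedding model may have to be passed to `BERTopic.load` via `embedding_model=`). The helper names `top_keywords_for` and `demo`, and the sample issue title, are illustrative and not part of the model card.

```python
def top_keywords_for(topic_model, text, n=5):
    """Return (topic_id, keywords) for a single document.

    `topic_model` is expected to be a loaded BERTopic instance.
    """
    # transform() returns (topics, probabilities), with one topic ID per document.
    topics, _ = topic_model.transform([text])
    topic_id = int(topics[0])
    # get_topic() returns a list of (word, score) pairs, or False for an unknown ID.
    words = topic_model.get_topic(topic_id) or []
    return topic_id, [word for word, _ in words[:n]]


def demo():
    # Imported lazily so the helper above has no hard dependency on bertopic.
    from bertopic import BERTopic

    model = BERTopic.load("asoria/bertopic_github_dataset_viewer_issues")
    # Hypothetical issue title, used only for illustration.
    return top_keywords_for(model, "Parquet conversion fails for a large dataset")
```

Calling `demo()` downloads the model from the Hub, so it needs network access and an installed `bertopic`.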

Topic overview

  • Number of topics: 78
  • Number of training documents: 3066
Overview of all topics:
Topic ID | Topic Keywords | Topic Frequency | Label
-1 | jobs - datasets - cache - fix - pandas | 11 | -1_jobs_datasets_cache_fix
0 | issue - viewer - dataset - for - bigsciencep3 | 534 | 0_issue_viewer_dataset_for
1 | parquet - files - metadata - parquetanddatasetinfo - configparquetandinfo | 144 | 1_parquet_files_metadata_parquetanddatasetinfo
2 | vulnerability - cryptography - dependencies - 4106 - update | 132 | 2_vulnerability_cryptography_dependencies_4106
3 | docs - doc - page - add - md | 109 | 3_docs_doc_page_add
4 | rows - firstrows - row - truncated - response | 90 | 4_rows_firstrows_row_truncated
5 | duckdb - index - splitduckdbindex - fts - try | 78 | 5_duckdb_index_splitduckdbindex_fts
6 | hub - hubcache - timeout - datasethubcache - tags | 75 | 6_hub_hubcache_timeout_datasethubcache
7 | audio - opus - extension - torchaudio - torch | 59 | 7_audio_opus_extension_torchaudio
8 | filter - endpoint - isvalid - column - parameters | 54 | 8_filter_endpoint_isvalid_column
9 | datasets - update - upgrade - dependency - to | 54 | 9_datasets_update_upgrade_dependency
10 | docker - images - build - image - compose | 53 | 10_docker_images_build_image
11 | cache - refresh - entries - entry - warm | 51 | 11_cache_refresh_entries_entry
12 | mongo - mongodb - indexes - atlas - index | 48 | 12_mongo_mongodb_indexes_atlas
13 | image - images - modality - support - pdf2image | 47 | 13_image_images_modality_support
14 | unblock - block - blocked - blocklist - datasets | 46 | 14_unblock_block_blocked_blocklist
15 | error - expected - xerrorcode - messages - catch | 44 | 15_error_expected_xerrorcode_messages
16 | backfill - cron - job - time - move | 44 | 16_backfill_cron_job_time
17 | jobs - waiting - job - finishedat - started | 44 | 17_jobs_waiting_job_finishedat
18 | env - config - configs - vars - default | 41 | 18_env_config_configs_vars
19 | gitpython - 3137 - 3141 - github - builddepsdev | 41 | 19_gitpython_3137_3141_github
20 | assets - s3 - cachedassets - cached - fsspec | 40 | 20_assets_s3_cachedassets_cached
21 | splitnamesfromstreaming - split - streaming - rename - names | 39 | 21_splitnamesfromstreaming_split_streaming_rename
22 | statistics - stats - descriptive - splitdescriptivestatistics - class | 38 | 22_statistics_stats_descriptive_splitdescriptivestatistics
23 | private - gated - datasets - public - gatedauto | 35 | 23_private_gated_datasets_public
24 | metrics - healthcheck - port - adminmetrics - admin | 33 | 24_metrics_healthcheck_port_adminmetrics
25 | steps - processing - step - triggers - graph | 32 | 25_steps_processing_step_triggers
26 | ci - codecov - pr - fork - invalid | 31 | 26_ci_codecov_pr_fork
27 | splits - split - list - configs - returned | 31 | 27_splits_split_list_configs
28 | openapi - openapijson - spec - publish - spectral | 31 | 28_openapi_openapijson_spec_publish
29 | queue - incremental - based - field - jobs | 31 | 29_queue_incremental_based_field
30 | error - datasetwithscriptnotsupportederror - exist - no - datasetgenerationerror | 31 | 30_error_datasetwithscriptnotsupportederror_exist_no
31 | ram - 5gb - heavy - reduce - overcommitment | 31 | 31_ram_5gb_heavy_reduce
32 | workers - number - reduce - increase - heavy | 30 | 32_workers_number_reduce_increase
33 | admin - ui - app - difficulty - prefix | 30 | 33_admin_ui_app_difficulty
34 | chart - fixchart - helm - alb - featchart | 28 | 34_chart_fixchart_helm_alb
35 | aiohttp - 386 - bump - 392 - 391 | 27 | 35_aiohttp_386_bump_392
36 | e2e - tests - test - ci - testmetrics | 27 | 36_e2e_tests_test_ci
37 | huggingfacehub - upgrade - 0151 - version - branch | 27 | 37_huggingfacehub_upgrade_0151_version
38 | test - tests - unit - pytestmemray - fixtures | 26 | 38_test_tests_unit_pytestmemray
39 | webhook - webhooks - payload - visibility - hub | 26 | 39_webhook_webhooks_payload_visibility
40 | migration - migrations - database - scripts - databases | 26 | 40_migration_migrations_database_scripts
41 | refactor - dead - code - remove - abstractions | 25 | 41_refactor_dead_code_remove
42 | retry - retryable - codes - every - createcommiterror | 25 | 42_retry_retryable_codes_every
43 | log - logs - debug - level - crashes | 25 | 43_log_logs_debug_level
44 | croissant - jsonld - fields - either - recordset | 25 | 44_croissant_jsonld_fields_either
45 | pods - pod - number - scale - reverseproxy | 24 | 45_pods_pod_number_scale
46 | scan - urls - spawning - presidio - optinouturls | 24 | 46_scan_urls_spawning_presidio
47 | resources - feat - reduce - increase - production | 22 | 47_resources_feat_reduce_increase
48 | download - manual - require - enum - extracted | 21 | 48_download_manual_require_enum
49 | comment - issues - close - fix - tag | 20 | 49_comment_issues_close_fix
50 | cache - entries - clean - hf - blocked | 19 | 50_cache_entries_clean_hf
51 | worker - generic - workerjobtypesblocked - treccartools - dependencies | 19 | 51_worker_generic_workerjobtypesblocked_treccartools
52 | datasetviewer - rename - datasetsserver - domain - server | 18 | 52_datasetviewer_rename_datasetsserver_domain
53 | across - group - pip - directories - bump | 18 | 53_across_group_pip_directories
54 | runner - runners - validation - job - parent | 18 | 54_runner_runners_validation_job
55 | upgrade - datasets - feat - 221 - 1162dev0 | 18 | 55_upgrade_datasets_feat_221
56 | jwt - array - authorization - cookies - bypass | 18 | 56_jwt_array_authorization_cookies
57 | allow - script - scriptbased - scripts - redpajamadata1t | 17 | 57_allow_script_scriptbased_scripts
58 | unique - metrics - metric - cache - cron | 16 | 58_unique_metrics_metric_cache
59 | aiohttp - libslibcommon - libslibapi - 386 - 385 | 16 | 59_aiohttp_libslibcommon_libslibapi_386
60 | pillow - 1001 - 1020 - bump - from | 16 | 60_pillow_1001_1020_bump
61 | storage - disk - storageclient - storageadmin - client | 15 | 61_storage_disk_storageclient_storageadmin
62 | resources - increase - 108010 - reduce - 2468 | 15 | 62_resources_increase_108010_reduce
63 | poetry - dependabot - align - version - 20 | 14 | 63_poetry_dependabot_align_version
64 | upgrade - datasets - 188 - pufanyimimicit - meaning | 14 | 64_upgrade_datasets_188_pufanyimimicit
65 | auth - authentication - asynchronous - authcheck - 307 | 14 | 65_auth_authentication_asynchronous_authcheck
66 | lock - locks - finishing - release - ttl | 14 | 66_lock_locks_finishing_release
67 | nginx - proxy - reverse - reverseproxy - 1253 | 14 | 67_nginx_proxy_reverse_reverseproxy
68 | orjson - 3915 - 390 - bump - from | 13 | 68_orjson_3915_390_bump
69 | gradio - 3340 - 4110 - frontadminui - upgrade | 13 | 69_gradio_3340_4110_frontadminui
70 | starlette - 0280 - 0362 - bump - 0231 | 13 | 70_starlette_0280_0362_bump
71 | secrets - fixs3 - correct - secret - name | 13 | 71_secrets_fixs3_correct_secret
72 | search - elastic - functionality - times - currently | 13 | 72_search_elastic_functionality_times
73 | token - hftoken - app - secret - hf | 12 | 73_token_hftoken_app_secret
74 | efs - nfs - mount - parquetmetadata - storage | 12 | 74_efs_nfs_mount_parquetmetadata
75 | ruff - vscode - 045 - settings - ruffcache | 12 | 75_ruff_vscode_045_settings
76 | kubernetes - kube - infrastructure - pdb - disruption | 12 | 76_kubernetes_kube_infrastructure_pdb
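
Each label in the overview is the topic ID followed by its top keywords, joined with underscores; topic -1 is BERTopic's bucket for outlier documents. The sketch below shows one way to split a label back into its parts and to express a topic frequency as a share of the 3066 training documents. The function names `parse_label` and `share_of_corpus` are illustrative.

```python
TRAINING_DOCUMENTS = 3066  # "Number of training documents" from the overview


def parse_label(label):
    """Split a label such as '5_duckdb_index_splitduckdbindex_fts' into (id, keywords)."""
    topic_id, *keywords = label.split("_")
    return int(topic_id), keywords


def share_of_corpus(frequency, total=TRAINING_DOCUMENTS):
    """Fraction of the training documents assigned to a topic."""
    return frequency / total


print(parse_label("-1_jobs_datasets_cache_fix"))
print(f"{share_of_corpus(534):.1%}")  # topic 0, the largest topic
```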

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
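
The values above correspond to keyword arguments of the `BERTopic` constructor. As a rough sketch, a model with the same top-level settings could be configured as follows. This is not a reproduction of the training run: the card does not list the embedding, UMAP, HDBSCAN, or vectorizer settings, so library defaults are assumed for those, and the names `HYPERPARAMETERS` and `build_model` are illustrative.

```python
# Hyperparameters copied from the list above.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_model():
    """Create an unfitted BERTopic model with the settings above."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    # Sub-models (embedding, UMAP, HDBSCAN, vectorizer) fall back to defaults here,
    # which is an assumption; the card does not say what was used in training.
    return BERTopic(**HYPERPARAMETERS)
```

Fitting would then be `build_model().fit_transform(docs)` on a list of issue texts, which are not included in this card.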

Framework versions

  • Numpy: 1.26.4
  • HDBSCAN: 0.8.38.post1
  • UMAP: 0.5.6
  • Pandas: 2.1.4
  • Scikit-Learn: 1.5.2
  • Sentence-transformers: 3.1.1
  • Transformers: 4.44.2
  • Numba: 0.60.0
  • Plotly: 5.24.1
  • Python: 3.10.12
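
Loading a saved model can be sensitive to library versions, so it may help to compare the local environment against the list above. The following is a small sketch using only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP) and the helper name `version_mismatches` are assumptions added for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the list above, keyed by PyPI distribution name.
CARD_VERSIONS = {
    "numpy": "1.26.4",
    "hdbscan": "0.8.38.post1",
    "umap-learn": "0.5.6",
    "pandas": "2.1.4",
    "scikit-learn": "1.5.2",
    "sentence-transformers": "3.1.1",
    "transformers": "4.44.2",
    "numba": "0.60.0",
    "plotly": "5.24.1",
}


def version_mismatches(expected=CARD_VERSIONS):
    """Return {package: (expected, installed)} for packages that differ or are missing."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            report[package] = (wanted, installed)
    return report


for name, (wanted, installed) in version_mismatches().items():
    print(f"{name}: card lists {wanted}, installed {installed}")
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the environment differs from the one recorded here.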