transformers_issues_topics

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic

# Load the pretrained topic model from the Hugging Face Hub
topic_model = BERTopic.load("u571/transformers_issues_topics")

# Inspect the discovered topics, their sizes, and representative keywords
topic_model.get_topic_info()
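
You can also assign one of the learned topics to new documents. A minimal sketch (the issue titles below are invented examples, not part of the training data):

# Hypothetical new documents to classify
docs = [
    "Tokenizer adds unexpected special tokens",
    "CUDA out of memory when fine-tuning on a single GPU",
]

# transform() returns the assigned topic ID for each document, plus
# probabilities (whose shape depends on how the model was configured)
topics, probs = topic_model.transform(docs)

print(topics)                             # illustrative output: [0, 1]
print(topic_model.get_topic(topics[0]))   # top keywords for the first document's topic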

Topic overview

  • Number of topics: 30
  • Number of training documents: 9000
An overview of all topics is given in the table below.
| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | tensorflow - pytorch - tokenizers - tokenizer - bert | 10 | -1_tensorflow_pytorch_tokenizers_tokenizer |
| 0 | tokenizer - tokenizers - tokenization - tokenize - token | 2107 | 0_tokenizer_tokenizers_tokenization_tokenize |
| 1 | cuda - memory - gpu - gpus - tensorflow | 1271 | 1_cuda_memory_gpu_gpus |
| 2 | tf - trainer - tf2 - tpu - trainertrain | 901 | 2_tf_trainer_tf2_tpu |
| 3 | summarization - summaries - summary - sentences - sentencepiece | 543 | 3_summarization_summaries_summary_sentences |
| 4 | modelcard - modelcards - card - model - cards | 483 | 4_modelcard_modelcards_card_model |
| 5 | gpt2 - gpt2tokenizer - gpt2tokenizerfast - gpt2xl - gpt | 431 | 5_gpt2_gpt2tokenizer_gpt2tokenizerfast_gpt2xl |
| 6 | xlnet - xlnetlmheadmodel - xlm - xlmr - xla | 423 | 6_xlnet_xlnetlmheadmodel_xlm_xlmr |
| 7 | typos - typo - fix - fixed - correction | 334 | 7_typos_typo_fix_fixed |
| 8 | s2s - exampless2s - seq2seqtrainer - seq2seq - runseq2seq | 324 | 8_s2s_exampless2s_seq2seqtrainer_seq2seq |
| 9 | testing - tests - test - slow - ci | 316 | 9_testing_tests_test_slow |
| 10 | readmemd - readmetxt - readme - modelcard - file | 296 | 10_readmemd_readmetxt_readme_modelcard |
| 11 | transformerscli - transformers - transformer - transformerxl - importerror | 262 | 11_transformerscli_transformers_transformer_transformerxl |
| 12 | ner - pipeline - pipelines - nerpipeline - fillmaskpipeline | 223 | 12_ner_pipeline_pipelines_nerpipeline |
| 13 | rag - ragtokenforgeneration - ragmodel - ragsequenceforgeneration - tokenizer | 166 | 13_rag_ragtokenforgeneration_ragmodel_ragsequenceforgeneration |
| 14 | trainertrain - checkpoint - checkpoints - trainer - training | 146 | 14_trainertrain_checkpoint_checkpoints_trainer |
| 15 | datacollatorforlanguagemodeling - datacollatorforpermutationlanguagemodeling - datacollatorforlanguagemodelling - labelsmoothingfactor - maskedlmlabels | 128 | 15_datacollatorforlanguagemodeling_datacollatorforpermutationlanguagemodeling_datacollatorforlanguagemodelling_labelsmoothingfactor |
| 16 | onnx - onnxonnxruntime - onnxexport - 04onnxexport - 04onnxexportipynb | 99 | 16_onnx_onnxonnxruntime_onnxexport_04onnxexport |
| 17 | longformer - longformers - longform - longformerforqa - longformerlayer | 84 | 17_longformer_longformers_longform_longformerforqa |
| 18 | benchmark - benchmarks - results - datasets - v100a100 | 78 | 18_benchmark_benchmarks_results_datasets |
| 19 | generationbeamsearchpy - generatebeamsearch - beamsearch - nonbeamsearch - beam | 75 | 19_generationbeamsearchpy_generatebeamsearch_beamsearch_nonbeamsearch |
| 20 | wav2vec2 - wav2vec - wav2vec20 - wav2vec2forctc - wav2vec2xlrswav2vec2 | 71 | 20_wav2vec2_wav2vec_wav2vec20_wav2vec2forctc |
| 21 | flax - flaxelectraformaskedlm - flaxelectraforpretraining - flaxjax - flaxelectramodel | 52 | 21_flax_flaxelectraformaskedlm_flaxelectraforpretraining_flaxjax |
| 22 | wandbproject - wandb - wandbcallback - wandbdisabled - wandbdisabledtrue | 47 | 22_wandbproject_wandb_wandbcallback_wandbdisabled |
| 23 | cachedir - cache - cachedpath - caching - cached | 37 | 23_cachedir_cache_cachedpath_caching |
| 24 | layoutlm - layout - layoutlmtokenizer - layoutlmbaseuncased - tf | 33 | 24_layoutlm_layout_layoutlmtokenizer_layoutlmbaseuncased |
| 25 | dict - dictstr - returndict - parse - arguments | 18 | 25_dict_dictstr_returndict_parse |
| 26 | pplm - pr - deprecated - variable - ppl | 16 | 26_pplm_pr_deprecated_variable |
| 27 | colab - cola - crashes - crash - tcmalloc | 14 | 27_colab_cola_crashes_crash |
| 28 | ctrl - ctrlsum - shortcuts - model - navigate | 12 | 28_ctrl_ctrlsum_shortcuts_model |
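
The keywords of any single topic can also be queried programmatically. A minimal sketch using the topic IDs from the table above (the weights in the comment are illustrative, not actual model output):

# Top keywords and their c-TF-IDF weights for topic 0 (tokenization issues)
topic_model.get_topic(0)
# illustrative result: [("tokenizer", 0.04), ("tokenizers", 0.03), ...]

# Find the topics most similar to a free-text query
topic_model.find_topics("beam search generation", top_n=3)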

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: 30
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
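
These values correspond to the BERTopic constructor arguments. As a hedged sketch, a model with the same configuration could be fitted as follows; issue_texts is a placeholder for the roughly 9,000 training documents, which are not distributed with this card:

from bertopic import BERTopic

topic_model = BERTopic(
    language="english",
    top_n_words=10,
    n_gram_range=(1, 1),
    min_topic_size=10,
    nr_topics=30,
    low_memory=False,
    calculate_probabilities=False,
    seed_topic_list=None,
    verbose=True,
)

# issue_texts: list of raw document strings (placeholder, not provided here)
topics, probs = topic_model.fit_transform(issue_texts)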

Framework versions

  • Numpy: 1.23.5
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.32.0
  • Numba: 0.56.4
  • Plotly: 5.15.0
  • Python: 3.10.12