noahho committed on
Commit f2499f0 · 1 Parent(s): 758fe2b
Files changed (1)
  1. README.md +37 -183
README.md CHANGED
@@ -1,200 +1,54 @@
1
- ---
2
- extra_gated_prompt: |-
3
- By accessing TabPFN, you agree to:
4
- 1. Not use the model in ways that could harm individuals or communities
5
- 2. Comply with all applicable laws and regulations
6
- 3. Properly cite the model and its creators in any resulting publications
7
- 4. Report any discovered vulnerabilities or safety concerns to Prior Labs
8
- extra_gated_fields:
9
- Organization:
10
- type: text
11
- required: true
12
- description: Company or institution you represent
13
- Role:
14
- type: text
15
- required: true
16
- description: Your role in the organization
17
- Country:
18
- type: country
19
- required: true
20
- description: Country where you or your organization is based
21
- Intended Use:
22
- type: select
23
- required: true
24
- options:
25
- - Academic Research
26
- - Education/Teaching
27
- - Commercial Evaluation
28
- - Non-profit Use
29
- - Personal Learning
30
- - label: Other
31
- value: other
32
- description: Primary intended use of TabPFN
33
- Industry:
34
- type: select
35
- required: true
36
- options:
37
- - Healthcare/Life Sciences
38
- - Financial Services
39
- - Technology
40
- - Education
41
- - Manufacturing
42
- - Research Institution
43
- - label: Other
44
- value: other
45
- description: Your industry sector
46
- Dataset Size:
47
- type: select
48
- required: true
49
- options:
50
- - <1000 rows
51
- - 1000-10000 rows
52
- - 10000-100000 rows
53
- - '>100000 rows'
54
- description: Typical size of datasets you plan to use
55
- License Agreement:
56
- type: checkbox
57
- required: true
58
- label: >-
59
- I agree to the terms of the non-commercial license for research and
60
- evaluation
61
- Contact Permission:
62
- type: checkbox
63
- required: false
64
- label: Prior Labs may contact me about my use case and provide support (optional)
65
- pipeline_tag: tabular-classification
66
- ---
67
-
68
- # Model Card for TabPFN-v2
69
 
70
  TabPFN is a transformer-based foundation model for tabular data that leverages prior-data based learning to achieve strong performance on small tabular datasets without requiring task-specific training.
71
 
72
- ## Model Details
73
-
74
- ### Model Description
75
-
76
- TabPFN is a novel approach to tabular data modeling that uses transformer architectures combined with prior knowledge injection to create a foundation model specifically designed for tabular data tasks.
77
 
 
78
  - **Developed by:** Prior Labs
79
  - **Model type:** Transformer-based foundation model for tabular data
80
- - **Language(s):** Python
81
- - **License:** Dual licensing - Open source for research/non-commercial use
82
- - **Finetuned from model:** Custom architecture, trained from scratch
83
-
84
- ### Model Sources
85
-
86
- - **Repository:** https://github.com/priorlabs/tabpfn
87
- - **Paper:** [More Information Needed]
88
- - **Demo:** Available via API access
89
-
90
- ## Uses
91
-
92
- ### Direct Use
93
-
94
- TabPFN can be directly used for:
95
- - Classification tasks on small to medium-sized tabular datasets
96
- - Automated machine learning workflows
97
- - Quick prototyping and baseline model creation
98
- - Transfer learning applications for tabular data
99
 
100
- ### Downstream Use
101
-
102
- The model can be used as:
103
- - A feature extractor for downstream tasks
104
- - A foundation for transfer learning on domain-specific tabular data
105
- - A component in automated ML pipelines
106
- - A baseline model for benchmarking
107
-
108
- ### Out-of-Scope Use
109
-
110
- - The model is not designed for:
111
- - Very large datasets (currently optimized for smaller datasets)
112
- - Non-tabular data formats
113
- - Time series forecasting
114
- - Direct regression tasks
115
-
116
- ## Bias, Risks, and Limitations
117
-
118
- - Performance may vary based on dataset size and characteristics
119
- - Model behavior heavily depends on the quality and representativeness of training data
120
- - May not perform optimally on highly imbalanced datasets
121
- - Resource intensive for very large datasets
122
-
123
- ### Recommendations
124
-
125
- - Use on datasets with clear structure and well-defined features
126
- - Validate model outputs especially for sensitive applications
127
- - Consider dataset size limitations when applying the model
128
- - Monitor performance across different subgroups in the data
129
-
130
- ## How to Get Started with the Model
131
 
 
132
  ```python
133
  from tabpfn import TabPFNClassifier
134
 
135
  # Initialize model
136
  classifier = TabPFNClassifier()
137
-
138
- # Fit and predict
139
  classifier.fit(X_train, y_train)
140
  predictions = classifier.predict(X_test)
141
  ```
142
 
143
- ## Training Details
144
-
145
- ### Training Data
146
-
147
- [More Information Needed]
148
-
149
- ### Training Procedure
150
-
151
- #### Training Hyperparameters
152
-
153
- - **Training regime:** Mixed precision training
154
-
155
- ## Evaluation
156
-
157
- ### Testing Data, Factors & Metrics
158
-
159
- #### Metrics
160
-
161
- - Classification accuracy
162
- - F1 score
163
- - ROC-AUC
164
- - Precision-Recall curves
165
-
166
- ### Results
167
-
168
- [More Information Needed]
169
-
170
- ## Environmental Impact
171
-
172
- - **Hardware Type:** [More Information Needed]
173
- - **Hours used:** [More Information Needed]
174
- - **Cloud Provider:** [More Information Needed]
175
- - **Compute Region:** [More Information Needed]
176
- - **Carbon Emitted:** [More Information Needed]
177
-
178
- ## Technical Specifications
179
-
180
- ### Model Architecture and Objective
181
-
182
- TabPFN uses a transformer-based architecture specifically designed for tabular data processing, with modifications to handle varying input sizes and feature types.
183
-
184
- ### Compute Infrastructure
185
-
186
- #### Hardware
187
-
188
- Recommended minimum specifications:
189
- - CPU: Modern multi-core processor
190
- - RAM: 16GB+
191
- - GPU: Optional, CPU inference supported
192
-
193
- #### Software
194
-
195
- - Python 3.7+
196
- - Key dependencies: PyTorch, NumPy, Pandas
197
-
198
- ## Model Card Contact
199
-
200
- For more information, contact Prior Labs.
 
1
+ # TabPFN v2: A Tabular Foundation Model
2
 
3
  TabPFN is a transformer-based foundation model for tabular data that leverages prior-data based learning to achieve strong performance on small tabular datasets without requiring task-specific training.
4
 
5
+ ## Installation
6
+ ```bash
7
+ pip install tabpfn
8
+ ```
 
9
 
10
+ ## Model Details
11
  - **Developed by:** Prior Labs
12
  - **Model type:** Transformer-based foundation model for tabular data
13
+ - **License:** TBD
14
+ - **Paper:** Published in Nature (January 2024)
15
+ - **Repository:** [GitHub - priorlabs/tabpfn](https://github.com/priorlabs/tabpfn)
16
 
17
+ ### Citation
18
+ TBD
19
 
20
+ ## Quick Start
21
  ```python
22
  from tabpfn import TabPFNClassifier
23
 
24
  # Initialize model
25
  classifier = TabPFNClassifier()
26
  classifier.fit(X_train, y_train)
27
  predictions = classifier.predict(X_test)
28
  ```
29
 
30
+ ## Technical Requirements
31
+ - Python ≥ 3.9
32
+ - PyTorch ≥ 2.1
33
+ - scikit-learn ≥ 1.0
34
+ - Hardware: 16GB+ RAM, CPU (GPU optional)
35
+
36
+ ## Limitations
37
+ - Not designed for very large datasets
38
+ - Not suitable for non-tabular data formats
39
+
40
+ ## Resources
41
+ - **Documentation:** https://priorlabs.ai/docs
42
+ - **Source:** https://github.com/priorlabs/tabpfn
43
+ - **Paper:** https://doi.org/10.1038/s41586-024-08328-6
44
+
45
+ ### Team
46
+ - Noah Hollmann
47
+ - Samuel Müller
48
+ - Lennart Purucker
49
+ - Arjun Krishnakumar
50
+ - Max Körfer
51
+ - Shi Bin Hoo
52
+ - Robin Tibor Schirrmeister
53
+ - Frank Hutter
54
+ - Eddie Bergman
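
The Technical Requirements list added in this commit (Python ≥ 3.9, PyTorch ≥ 2.1, scikit-learn ≥ 1.0) can be checked programmatically before running the Quick Start snippet. The following is a minimal sketch and not part of the commit: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, the PyPI distribution names `torch` and `scikit-learn` are assumed to be how those requirements are installed, and the version parsing is deliberately simplified rather than a full PEP 440 implementation.

```python
import sys
from importlib import metadata

# Minimum versions copied from the "Technical Requirements" list in the diff.
# The keys are assumed PyPI distribution names (an assumption, not from the README).
MIN_PYTHON = (3, 9)
MINIMUMS = {"torch": (2, 1), "scikit-learn": (1, 0)}


def parse_version(text):
    """Return the leading numeric components of a version string as ints.

    Simplified on purpose: local tags ("+cu121") and suffixes ("rc1", "dev")
    are dropped, so pre-releases compare equal to the final release.
    """
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version string is at least `minimum` (a tuple)."""
    found = parse_version(installed)
    # Pad short versions with zeros so that "1" is treated like "1.0".
    padded = found + (0,) * (len(minimum) - len(found))
    return padded >= minimum


def check_environment():
    """Map each requirement name to True/False; missing packages count as False."""
    report = {"python": sys.version_info[:2] >= MIN_PYTHON}
    for dist, minimum in MINIMUMS.items():
        try:
            report[dist] = meets_minimum(metadata.version(dist), minimum)
        except metadata.PackageNotFoundError:
            report[dist] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

Running the script prints one line per requirement. The hardware item (16GB+ RAM) is not checked here, since the standard library has no portable way to query installed memory.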