Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training Paper • 2401.05566 • Published Jan 10, 2024 • 26
Specific versus General Principles for Constitutional AI Paper • 2310.13798 • Published Oct 20, 2023 • 2
Measuring Faithfulness in Chain-of-Thought Reasoning Paper • 2307.13702 • Published Jul 17, 2023 • 27
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning Paper • 2307.11768 • Published Jul 17, 2023 • 12
Towards Measuring the Representation of Subjective Global Opinions in Language Models Paper • 2306.16388 • Published Jun 28, 2023 • 6
Opportunities and Risks of LLMs for Scalable Deliberation with Polis Paper • 2306.11932 • Published Jun 20, 2023 • 6
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting Paper • 2305.04388 • Published May 7, 2023 • 1