
Questions for large LLM labs they'll probably not want to answer

May 30, 2023 at 03:22 PM

In [[GPT4 Summary Evaluation]] I worked through finding a high-quality prompt for summarizing a YouTube video with GPT-4. Keeping with the theme of subjective LLM evaluation (going deep on one example judged subjectively, rather than going broad with quantifiable metrics), this post plays around with Claude, specifically Claude-V1.

The video being summarized is a CS25 lecture by Jan Leike on AI Alignment. The content alternates between general and technical material. The transcript quality is very poor (it is YouTube's auto-generated transcript).

Most memorable insights (BUT CAVEAT)

I don’t think one should generalize from one example and limited iteration, but I think the notes contain interesting observations. I would just treat these as threads to follow up on and verify, not mental models to form. Given limited time, I am explicitly trading off rigor for depth with one example, and this is written like a notebook log, not a refined blog post.


Transcript and tokenization

print(sample_transcript_raw_text)
it's my pleasure to welcome Jan from
openai I'm he leads the alignment team
there and he was previously a researcher
at deepmind as well what's a PhD in
reinforcement learning theory has been
thinking about the alignment problem for
over 10 years and today he'll be giving
a very interesting topic so hope you
guys enjoy
yeah thanks a lot for the intro and
thanks a lot for having me
um
I'm very excited to talk about this
stuff
numtokens_oai(sample_transcript_raw_text), numtokens_anthropic(sample_transcript_raw_text)
(12146, 12176)
anthropic_tokenzier.encode(sample_transcript_raw_text[:100]).tokens
['it', "'s", 'Ġmy', 'Ġpleasure', 'Ġto', 'Ġwelcome', 'ĠJan', 'Ġfrom', 'Ċ', 'open', 'ai', 'ĠI', "'m", 'Ġhe', 'Ġleads', 'Ġthe', 'Ġalignment', 'Ġteam', 'Ċ', 'there', 'Ġand', 'Ġhe', 'Ġwas', 'Ġprev', 'iou']
oai_tokenizer = oai.tiktoken.encoding_for_model("gpt-4")
[oai_tokenizer.decode([t]) for t in oai_tokenizer.encode(sample_transcript_raw_text[:100])]
['it', "'s", ' my', ' pleasure', ' to', ' welcome', ' Jan', ' from', '\n', 'open', 'ai', ' I', "'m", ' he', ' leads', ' the', ' alignment', ' team', '\n', 'there', ' and', ' he', ' was', ' prev', 'iou']

Splitting (only needed for OpenAI)

# Claude 100k won't need splitting, so just use the OpenAI tokenizer as the reference to keep the transcript splits consistent across models

split_up_transcript = split_transcript(
	sample_transcript_raw_text,
	oai_yt.TOKEN_LIMIT_FOR_SUMMARY,
	numtokens_oai)

print(len(split_up_transcript), [numtokens_oai(t) for t in split_up_transcript])
print(len(split_up_transcript), [numtokens_anthropic(t) for t in split_up_transcript])

4 [3997, 4000, 3973, 173]
4 [4009, 4012, 3978, 174]
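
split_transcript itself isn't shown; here is a rough sketch under the assumption that it greedily packs whitespace-delimited words into chunks up to the token budget (the real implementation may differ):

# Hypothetical sketch of split_transcript. Greedily pack words until the token
# budget would be exceeded. Not optimized: it re-tokenizes the growing chunk on
# every word, which is fine for a ~12k-token transcript.
def split_transcript(text: str, token_limit: int, numtokens) -> list[str]:
	chunks, current = [], []
	for word in text.split(" "):
		candidate = " ".join(current + [word])
		if current and numtokens(candidate) > token_limit:
			chunks.append(" ".join(current))
			current = [word]
		else:
			current.append(word)
	if current:
		chunks.append(" ".join(current))
	return chunks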

Observations

GPT-4 summary as a reminder

We already converged on a preferred summarizer in [[GPT4 Summary Evaluation]], so we'll continue working with that. Anthropic exposes a document-completion-style API rather than a chat-style one, so we may have to keep iterating on the prompt.

def clarify_verbose_caveated(t: str) -> str:
	return ask("repeat the following from a youtube audio transcript as a bullet point list. do not remove information. be very verbose. the transcript could be the full transcript or just a section of the video." + t)

Result

- Speaker: Jan from OpenAI, leads alignment team, previously at DeepMind, PhD in reinforcement learning theory
- Topic: Alignment problem
- Observations:
  - Team EI joining the game, with stronger and stronger players over time
  - Team AI has many advantages over Team Humans (speed, cost, etc.)
  - Team Humans has an advantage in choosing which players join and when
- Main objectives for humans:
  1. Recruit players from Pmei to play on Team Humans (alignment)
  2. Write the rules of the game so that Team Human doesn't lose
- Alignment: build AI systems that follow human intent
  - Two main categories of intent: faithful, other intents (e.g., don't make up stuff, ask follow-up questions)
- Technique for alignment: Iterate reward modeling
  1. Train reward model from human preferences
  2. Optimize for the reward model with RL
- How well does it work?
  - InstructGPT: even the smallest version is preferred over the largest GPT-3 model
  - Training costs: fine-tuning on the largest GPT-3 version is about 2% of pre-training cost
- ChatGPT vs InstructGPT
  - ChatGPT: dialogue as universal interface, better at refusing harmful tasks, but still has limitations (hallucination, sensitivity to prompting)
  - InstructGPT: Upgrade to make models more aligned and useful to humans, but still room for improvement
- Principle: evaluation is easier than generation
  - Easier to spot good behavior than doing the task yourself
  - Useful for providing comparative feedback to the AI system to improve its performance

Notes

Claude V1 (4k tokens only for dev)

Notes

def summarize_anthropic(input: str, max_tokens_to_sample=5000) -> dict:
	return client.completion(
		prompt=f"""{anthropic.HUMAN_PROMPT}
Repeat the following from a youtube audio transcript as a bullet point list. do not remove information. be very verbose. the transcript could be the full transcript or just a section of the video
---
{input}
---
	{anthropic.AI_PROMPT}""",
	stop_sequences=[anthropic.HUMAN_PROMPT],
	model="claude-v1",
	max_tokens_to_sample=max_tokens_to_sample,
	)
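
client isn't constructed anywhere in these snippets; with the anthropic SDK version used here (the one exposing client.completion plus HUMAN_PROMPT / AI_PROMPT), the setup was presumably something along these lines:

import os
import anthropic

# Assumed setup for the 0.2.x-era anthropic SDK; the real key handling isn't shown.
client = anthropic.Client(os.environ["ANTHROPIC_API_KEY"])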

Result (GPT4 for reference)

result = summarize_anthropic(split_up_transcript[0])
result['completion']
• It's my pleasure to welcome Jan from openai. He leads the alignment team there and he was previously a researcher at DeepMind as well. Has a PhD in reinforcement learning theory and has been thinking about the alignment problem for over 10 years.
•Today he'll be giving a very interesting topic so hope you guys enjoy. Yeah thanks a lot for the intro and thanks a lot for having me. I'm very excited to talk about this stuff. I'm also super happy to keep it interactive if you have questions and then you find please interrupt me.
•I wanted to start out with a few very basic observations on kind of what I think is going on.So the first one is team EI is joining the game. So IMEI has a lot of different players they own All Join at the same time but rather they Join one by one.Not all of the their players are like very a lot in how good they are and right now a lot of the players that have joined so far.They're really that's mine and usually you can do it on your very narrow setup.
•But one thing that we've kind of observed is that over time you know we're seeing stronger and stronger players drawing and this is kind of where we where we are now.And then in general we expect that hemiah has um incredibly strong pairs. Those will be players that are able to think so much better than humans so much faster and so much more cheaply. And these haven't joined yet.
•So the have like Anchor Point that we have if you think for example about chat Epp chat TBT can already be any human at like knowing more facts or speaking more languages and it can write about 50 words per second and can do so about 100 times cheaper than humans could at minimum wage. And so you know there's tomorrow also has some really important limitations and there's a lot of things that you can't do yet.
•But it is kind of an indicator of you know some of the players that maybe will be done in the future. So it seems like in the long run kimai will have all the advantages over team humans. But and there's one there's an important caveat which is there's one important advantage that team humans has which is team human gets to pick which players from Kimi joined and when. And so this is kind of like an advantage that should we should really be leaning into when we're thinking about what to do and when we're thinking about you know this game that we're playing with cmei and then we'll be playing with pmei.
•So I think two of the main objectives of what we as team human should do it's like first we should try to recruit players from pmei to play on 10 teams and so this is kind of what I would broadly call on it. And this is kind of like the problem that I'm working on.And then there's also other objectives so another objective that I think is going to be really important is you want to write the rules of the game so that team human doesn't lose. And right now team human kind of has the ball and we get to write the rules so we should write rules that you know make sense and uh still playing this game in the future.
numtokens_anthropic(result['completion'])
676
result['stop_reason']
'stop_sequence'

Observations

Summarization V2: max_tokens_to_sample=8000

Changes

def summarize_anthropic_v2(input: str, max_tokens_to_sample=8000) -> dict:
	return client.completion(
		prompt=f"""{anthropic.HUMAN_PROMPT}
Repeat the following from a youtube audio transcript as a bullet point list. do not remove information. be very verbose. the transcript could be the full transcript or just a section of the video
---
{input}
---
{anthropic.AI_PROMPT}""",
		stop_sequences=[anthropic.HUMAN_PROMPT],
		model="claude-v1",
		max_tokens_to_sample=max_tokens_to_sample,
		)
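
The invocation isn't shown for V2; presumably it was the same first chunk as before, something like:

# Assumed call (not shown in the post): same first ~4k-token chunk as V1.
result_v2 = summarize_anthropic_v2(split_up_transcript[0])
print(result_v2['completion'])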

System Observations

Result (GPT4 for reference)

Here is the response as a verbose bullet point list:

• The speaker begins by welcoming Jan from OpenAI who leads the alignment team there and previously worked at DeepMind. Jan has a PhD in reinforcement learning theory and has been thinking about the alignment problem for over 10 years.

•Jan expresses excitement to talk about this topic and is happy to keep the discussion interactive with questions.

•Jan wanted to start with some basic observations about what is going on. The first observation is that AI is joining the game gradually with players joining one by one. The players have varying skill levels but over time the players get stronger. AI will eventually have incredibly strong players that can think much better than humans.

•An example is ChatGPT which can beat any human at knowing more facts or speaking more languages. It can write 50 words per second 100 times cheaper than a human. ChatGPT has important limitations but shows what future AI may be capable of.

•AI will eventually have all the advantages but humans get to decide which AI systems join and when. Humans should recruit AI systems to play on the human team. Jan calls this alignment. Humans also get to set the rules of the game so they don't lose. Humans currently have the opportunity to set good rules.

•Jan wants to talk about setting good rules. He phrases alignment as building AI systems that follow human intent and preferences. He will discuss past work trying to make current models as aligned as possible and future work that needs to be done.

•Jan breaks down what it means for AI to follow human intent. AI should follow direct instructions as an assistant but also follow implied intent like not doing harmful things, asking follow up questions when unsure, and not making things up. Capturing implied intent is difficult to specify precisely but important.

•The main technique used today is preference learning through model comparison. Humans compare model responses and the models learn to predict which responses humans would prefer. The models then optimize to generate the preferred responses. Different people may have different preferences and the models average across preferences. This is a simple but general technique that can apply to many AI models.

•Jan shows a graph demonstrating how well models of different sizes scored on human preferences. Surprisingly, a model 100 times smaller was preferred. Optimization and mixing in pre-training data mitigated regressions. The small, optimized model outperformed the much larger model.

•The training costs are also surprisingly cheap relative to pre-training costs. The most expensive optimized model was less than 2% of the pre-training compute. Larger models could still use this technique but there are still important limitations to address with more compute and data.

•Jan considers ChatGPT an upgrade to instructions GPT for alignment and usefulness. It is better at using dialogue, asking follow up questions, avoiding harmful tasks but still hallucinates facts and is sensitive to prompting showing misalignment. Evaluation is easier than generation meaning evaluating model responses is easier for humans than generating responses from scratch. This allows models to do better than individuals could alone.

Content Observations

Summarization V3: a Claude-specific prompt (but still 4k tokens)

Changes

def summarize_anthropic_v3(input: str, max_tokens_to_sample=8000) -> dict:
	return client.completion(prompt=f"""{anthropic.HUMAN_PROMPT}
Summarize the following from a youtube audio transcript as a bullet point list.
Use strict markdown. Use nested bullet points if appropriate.
Correct spelling and grammar errors.
The transcript could be the full transcript or just a section of the video.
---
{input}
---
{anthropic.AI_PROMPT}""",
	stop_sequences=[anthropic.HUMAN_PROMPT],
	model="claude-v1",
	max_tokens_to_sample=max_tokens_to_sample,
)
result_v3 = summarize_anthropic_v3(split_up_transcript[0])  # assumed call (not shown in the post); same first chunk as before
result_v3['stop_reason']
'stop_sequence'

Result (GPT4 for reference)

Here is a summary of the transcript in bullet points:

• OpenAI is developing increasingly capable AI systems
• The AI systems joining "the game" (interacting with the world) are getting stronger over time
• OpenAI expects to develop "incredibly strong players" (AI systems) that will far surpass human capabilities

•OpenAI's objectives:
› Recruit AI systems to play on "team human" (ensure AI is aligned with human values) through a process called "Constitutional AI"
› Develop rules for how AI interacts with the world that ensures "team human" does not lose

•Alignment means building AI systems that:
› Follow instructions and human preferences
› Infer human intent beyond direct instructions

• Current technique for alignment:
›Gather human preferences through comparisons of model outputs
›Train a reward model on these preferences
›Optimize the model with reinforcement learning using the reward model

•Surprisingly, fine-tuning a smaller model (InstructGPT) to be aligned results in higher human preference than a much larger model (GPT-3)

• Fine-tuning models to be aligned is very computationally cheap relative to pre-training the models

• OpenAI's chatbot CLIP is an improvement on InstructGPT but still has significant limitations, like hallucinating facts and being sensitive to how it's prompted

• Evaluation (comparing model outputs) is easier for humans than generation (creating responses from scratch)
• This allows the model to perform better than humans could on their own
print(numtokens_anthropic(result_v3['completion']))
339

Observations

GPT 3.5 Turbo comparison

Maybe it’s unfair to compare Claude to GPT-4 since Claude is more of a 3.5-class model, so I added a quick comparison to GPT-3.5 as well.

ask = oai.create_simple_completer(model="gpt-3.5-turbo")

def clarify_verbose_caveated(t: str) -> str:
	return ask("""repeat the following from a youtube audio transcript as a bullet point list.
do not remove information. be very verbose. the transcript could be the full transcript or just a section of the video.
--- TRANSCRIPT ---\n\n""" + t)
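
Again the call isn't shown; presumably the 3.5 completer was pointed at the same first chunk:

# Assumed call (not shown): same first chunk, now through gpt-3.5-turbo.
result_35 = clarify_verbose_caveated(split_up_transcript[0])
print(result_35)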

Result

- Jan from OpenAI leads the alignment team and has a PhD in reinforcement learning theory - He previously worked at DeepMind
- Jan will be discussing the alignment problem
- OpenAI is working to align AI systems with human intent
-

Observations

Summarization V4: Trying for accuracy, last attempt with the smaller (4k) sample

Changes

def summarize_anthropic_v4(input: str, max_tokens_to_sample=8000) -> dict:
	return client.completion(prompt=f"""{anthropic.HUMAN_PROMPT}
Summarize the following from a youtube audio transcript as a bullet point list.
Use strict markdown. Use nested bullet points if appropriate.
Correct spelling and grammar errors. If you're not sure about a point, leave it out instead of making false claims. Be verbose and include everything you do understand.
The transcript could be the full transcript or just a section of the video.
---
{input}
---
{anthropic.AI_PROMPT}""",
	stop_sequences=[anthropic.HUMAN_PROMPT],
	model="claude-v1",
	max_tokens_to_sample=max_tokens_to_sample,
)
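
The call that produced the result below isn't shown; presumably it was the same first 4k chunk again, along the lines of:

# Assumed invocation (not shown); the post only reports the output and its token count.
result_v4 = summarize_anthropic_v4(split_up_transcript[0])
print(result_v4['completion'])
print(numtokens_anthropic(result_v4['completion']), "tokens")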

Result

 Here's a summary of the transcript in bullet point form:

• Jan from OpenAI leads the alignment team.

•He has a PhD in reinforcement learning theory and has worked on the alignment problem for over 10 years.

• AI is getting progressively stronger over time. Eventually, there will be AI far beyond human capabilities.

• The objectives for humans should be:

	› Recruit AI to play on team human (alignment)

	› Set rules so team human doesn't lose

• Alignment means building AI that follows human  intent and preferences. This includes:

	› Instructions: Following specific commands

	› Implicit preferences: Avoiding harm, asking clarifying questions, etc.

•The main technique for alignment is:

	› Train a  reward model on human preferences (comparisons between model outputs)

	› Then optimize the model with reinforcement learning and the reward model

• An experiment showed an  "instructed GPT" model 100x smaller than GPT-3 but preferred by humans. This shows the power of alignment techniques.

• Aligning models is surprisingly cheap compared to pre-training them. The most expensive instructed GPT was <2% of the cost of pre-training GPT-3.

• There are still important limitations and more work is needed.

• ChatGPT is an improvement over InstructGPT. It's better at:

	› Using dialogue

	› Avoiding harmful responses

•But ChatGPT still has issues like hallucinating facts and being sensitive to prompts.

• A key principle: Evaluation is easier than generation. Having humans compare model outputs allows spotting good and bad behavior even for very difficult tasks.

• This principle applies broadly, e.g. in distinguishing GPT-3 and CLIP. Generation is harder than evaluation.
398 tokens

Observations

Claude 100k — all at once summary

Because more tokens are expensive, I’m going to mimic a real-world setting and do this one-shot, using my favorite prompt so far, the one developed in V4.

def summarize_anthropic_v5(input: str, max_tokens_to_sample=90000) -> dict:
	return client.completion(prompt=f"""{anthropic.HUMAN_PROMPT}
Summarize the following from a youtube audio transcript as a bullet point list.
Use strict markdown. Use nested bullet points if appropriate.
Correct spelling and grammar errors. If you're not sure about a point, leave it out instead of making false claims. Be verbose and include everything you do understand.
The transcript could be the full transcript or just a section of the video.
---
{input}
---
{anthropic.AI_PROMPT}""",
	stop_sequences=[anthropic.HUMAN_PROMPT],
	model="claude-v1-100k",
	max_tokens_to_sample=max_tokens_to_sample,
)
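
The one-shot call itself isn't shown; since this run goes over the whole transcript at once, it was presumably something like:

# Assumed one-shot call over the full ~12k-token transcript with claude-v1-100k.
result_v5 = summarize_anthropic_v5(sample_transcript_raw_text)
print(result_v5['completion'])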

Observations

Result

* AI progress is accelerating as increasingly capable AI systems are developed
* Team human currently has the advantage of determining which AI systems are deployed and when
* A key objective should be aligning AI systems with human values and preferences to ensure human control
* Techniques like reward modeling and Constitutional AI show promise for AI alignment using feedback from humans
* Evaluating AI systems may be easier for humans than generating them from scratch, allowing human-AI collaboration to evaluate more advanced systems
* AI assistance could be leveraged to augment human capability to evaluate more advanced AI systems through critique generation, explanation, fact-checking, etc.
* Targeted perturbations could be used to evaluate the effectiveness of AI assistance by inserting subtle flaws for the AI system to detect.
* The gap between human capability to evaluate systems and AI capability to generate them is a concern, as AI could become too advanced for humans to reliably evaluate without AI assistance
* Scalable oversight may be needed to evaluate very advanced AI systems, but ensuring the effectiveness of scalable oversight itself is challenging without ground truth
* Interpretability may be useful for detecting deception or flaws in AI systems but likely insufficient and possibly unnecessary for full alignment
* The key may be aligning the actual decisions and behaviors of AI systems with human values, not just their internal reasoning processes

Does the all-at-once summary help at the chunk boundaries?

Split 1

print(split_up_transcript[0][-400:])
print("---")
print(split_up_transcript[1][-400:])

're giving this kind of like feedback now let's you know the system should do better than you actually could and I think that's a very general principle that holds in lots of domains so kind of like you're probably most familiar if you start yes you know that pivas MP and everyone you know we don't actually know whether they're different but in practice it seems like and pitas are just much harder
---
gain I think what gets harder is that you know like as you've deployed the model and people started using it for all kinds of you know tasks that they want to build their company around like they um if you're update and you change the model then they also have to do a bunch of work into like adapting their prompts to whatever they have to do what they're doing and so it doesn't come as a zero cost

Split 2

print(split_up_transcript[1][-400:])
print("---")
print(split_up_transcript[2][-400:])

gain I think what gets harder is that you know like as you've deployed the model and people started using it for all kinds of you know tasks that they want to build their company around like they um if you're update and you change the model then they also have to do a bunch of work into like adapting their prompts to whatever they have to do what they're doing and so it doesn't come as a zero cost
---
en't you just selecting for models that have misalignments they're really hard to find with you and show you the body tools sure sure does not be necessary yes so why would it not be necessary so again this is kind of like an open question but basically what stance you could take is that at the end of the day what really is going to matter is the decisions that the model actually takes and not the

Nested Summary with GPT-4 (for comparison)

final_oai = ask(f"""
Summarize the following bullet points into a high level summary. Use bullet points. Be very verbose:
---

{oai0}

{oai1}

{oai2}

""")

Result

 - Jan from OpenAI discusses AI alignment and building systems that follow human intent
- AI skill levels increasing, with humans "recruiting" AI players for various tasks
- Methods: inferring reward models, optimizing with Proximal Policy Optimization, and using Instruct-GPT
- Limitation: human evaluation level stagnates as AI progresses
- Utilizing AI for code, critiques, and evaluation assistance
- Challenges: avoiding bias and manipulation, handling uncertainty, balancing preference updates
- GPT-3: immense knowledge potential, learns effectively with less data through fine-tuning
- Addressing style preferences, risks of over-reliance on AI in education, and safety concerns
- Importance of developing AI interpretability, outer alignment signals, and leveraging AI assistance for evaluation

Observations