Mark's Remarks

Thoughts and observations

§ the first token problem

Ruminations from a year ago…

In the “Deep Dive into LLMs like ChatGPT” video, Andrej Karpathy talks about what he calls “models need tokens to think”. He shows us a prompt with two possible responses. Both responses are correct; the question is which one we should prefer. The prompt is a math question, and one response gives the answer followed by an explanation, while the other works out the problem and then provides the answer. As humans we might prefer the former, with the answer first, but as a fine-tuning target we should prefer the one that works out the problem first. The problem with training the model to answer first is that we are expecting it to solve the problem in one or very few passes, using very few tokens, and thus doing very little thinking. If a prompt can’t be solved in only one or a few passes, the model is going to struggle, get it wrong, or have to reset.

This is what I think of as the “first token” problem.

We don’t want to require the model to figure out the entire answer in the first few tokens.

However, we tend to dislike verbose answers and often prompt models with “Give a concise answer” or “no yapping”. Some response formats, such as JSON, do effectively the same thing because they require the shape of the answer to be known right away (JSON in particular requires the first token to be punctuation like {, [, or ", or a number, or the literal true, false, or null).

So, if we want models to be able to work out the problem before providing the answer, we need to give them the opportunity. This could be as simple as “Show your work, then write Answer: and give the exact answer.” As it happened, the entire industry was thinking about this problem and invented “thinking tags” in late 2024. These permit the model to spend lots of tokens working out a problem while keeping that output separate from the answer.
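
As a rough sketch of what consuming that kind of “work first, answer last” response might look like (the marker text and the parsing here are my own assumptions, not a prescribed format):

# a minimal sketch: let the model work first, then pull out whatever
# follows the final "Answer:" marker; the marker is an assumption
ANSWER_MARKER = "Answer:"

def extract_answer(model_output: str) -> str:
    """Return the text after the last "Answer:" marker, or the whole output if absent."""
    if ANSWER_MARKER not in model_output:
        return model_output.strip()
    return model_output.rsplit(ANSWER_MARKER, 1)[-1].strip()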

I definitely think that any prompt requiring a JSON response should only be sent to a thinking model. Even with non-JSON response formats (say, DSPy’s chat format), it can be beneficial to ask for a “thoughts” field first. For example:

import dspy

# `categories` is assumed to be a predefined list of allowed labels; these values are placeholders
categories = ["hardware", "software", "services"]

class Classification(dspy.Signature):
    item_name: str = dspy.InputField()
    item_description: str = dspy.InputField()

    thoughts: str = dspy.OutputField(desc="Thoughts about which category to output for this question")
    category: str = dspy.OutputField(desc="One of: " + ", ".join(categories))

This invites the model to think before committing to the strict categorization. In my experiments, just adding the thoughts field produced a 1% increase in accuracy (and accuracy was already high to begin with, so that is a bigger deal than it sounds).
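
For completeness, here is a minimal sketch of how that signature might be run, continuing from the class above; the model name and the example inputs are placeholders, not my actual setup:

# a minimal sketch of invoking the signature with dspy.Predict;
# the model and the inputs below are placeholders for illustration
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

classify = dspy.Predict(Classification)
result = classify(
    item_name="Wireless mouse",
    item_description="A compact 2.4 GHz mouse with a USB receiver",
)
print(result.thoughts)   # the free-form reasoning comes back first
print(result.category)   # then the strict category label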

§ cursor skills

Erik Zakariasson posted about some Cursor “commands”, which seem similar to Claude “skills”:

  • deslop: Removes stuff that a human wouldn’t add.
  • create pr: How to use gh to open a good PR.
  • commit: How to review the code to be sure it is relevant to the work we just did, and commit it.
  • weekly review: How to review the past week of commits to create a summary of what we’ve done.
  • fix merge conflict: How to detect and resolve conflicts, validate the result with tests, and finally commit it.

These all look like good starting points but will need revision to work for me.

§ Spoken Word Programming

Gergely Orosz, in his Pragmatic Engineer newsletter, wrote about companies using voice input. He visited Wispr Flow, which makes dictation software that can automatically fix up what you say, e.g. turning “let’s start a server on localhost 3000… no, sorry, localhost 8000” into “let’s start a server on localhost:8000” (notice that the software is URL-aware). People are apparently fond of the BOYA microphone, which is directional enough that you can speak very quietly into it even in an open office and still be picked up without capturing surrounding noise.

§ Git worktree for agents

I discovered git worktree a few days ago; it lets you create a second working directory, with its own checked-out branch, connected to the same local git repo.

git worktree add ../other-project other-project

Clean up with:

git worktree remove ../other-project

Apparently this is very useful when running multiple agents, to keep them from colliding with each other.
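
For example, a sketch of giving each agent its own worktree on its own branch (the paths and branch names here are made up for illustration):

git worktree add -b agent-a ../myproject-agent-a
git worktree add -b agent-b ../myproject-agent-b
git worktree list    # shows every checkout attached to this repo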