Summary of the fix #21

Merged
navicore merged 1 commit from parse into main 2026-05-07 01:29:56 +00:00
Owner

server/src/providers/ollama.rs had two bugs that combined to drop tokens — sometimes entire responses — even though the
upstream Ollama call returned 200 OK:

  1. TCP chunks split mid-JSON-line. bytes_stream() returns raw byte chunks with no newline alignment. The old code parsed each
    chunk in isolation, so any object spanning a packet boundary failed to parse on both sides. Bigger/slower models trickle
    tokens, making this far more likely — which matches your symptom of "smaller models work, big ones go silent."
  2. Multiple complete lines per chunk → only the first survived. The closure returned a single Result per input chunk
    and returned on the first successful parse, silently discarding any further lines in the same packet.

The rewrite (lines 149-228) keeps a Vec buffer across chunks, drains only up to the last \n (any trailing partial waits for
the next chunk), and uses scan + flat_map to emit one ChatChunk per parsed JSON line. Stop-pattern detection and the
parse-error logging behavior are preserved.

If you want to see the previously-hidden parse errors before the fix, the existing tracing::error! lines on parse failure are
still there — but with the fix, they should go quiet for normal traffic instead of firing on every TCP boundary.

server/src/providers/ollama.rs had two bugs that combined to drop tokens — sometimes entire responses — even though the upstream Ollama call returned 200 OK: 1. TCP chunks split mid-JSON-line. bytes_stream() returns raw byte chunks with no newline alignment. The old code parsed each chunk in isolation, so any object spanning a packet boundary failed to parse on both sides. Bigger/slower models trickle tokens, making this far more likely — which matches your symptom of "smaller models work, big ones go silent." 2. Multiple complete lines per chunk → only the first survived. The closure returned a single Result<ChatChunk> per input chunk and returned on the first successful parse, silently discarding any further lines in the same packet. The rewrite (lines 149-228) keeps a Vec<u8> buffer across chunks, drains only up to the last \n (any trailing partial waits for the next chunk), and uses scan + flat_map to emit one ChatChunk per parsed JSON line. Stop-pattern detection and the parse-error logging behavior are preserved. If you want to see the previously-hidden parse errors before the fix, the existing tracing::error! lines on parse failure are still there — but with the fix, they should go quiet for normal traffic instead of firing on every TCP boundary.
Summary of the fix
All checks were successful
CI / ci (pull_request) Successful in 3m1s
d5a5dce473
server/src/providers/ollama.rs had two bugs that combined to drop tokens — sometimes entire responses — even though the
  upstream Ollama call returned 200 OK:

  1. TCP chunks split mid-JSON-line. bytes_stream() returns raw byte chunks with no newline alignment. The old code parsed each
  chunk in isolation, so any object spanning a packet boundary failed to parse on both sides. Bigger/slower models trickle
  tokens, making this far more likely — which matches your symptom of "smaller models work, big ones go silent."
  2. Multiple complete lines per chunk → only the first survived. The closure returned a single Result<ChatChunk> per input chunk
   and returned on the first successful parse, silently discarding any further lines in the same packet.

  The rewrite (lines 149-228) keeps a Vec<u8> buffer across chunks, drains only up to the last \n (any trailing partial waits for
   the next chunk), and uses scan + flat_map to emit one ChatChunk per parsed JSON line. Stop-pattern detection and the
  parse-error logging behavior are preserved.

  If you want to see the previously-hidden parse errors before the fix, the existing tracing::error! lines on parse failure are
  still there — but with the fix, they should go quiet for normal traffic instead of firing on every TCP boundary.
Sign in to join this conversation.
No description provided.