Summary of the fix #21

Merged

navicore merged 1 commit from parse into main

2026-05-07 01:29:56 +00:00

navicore commented

2026-05-07 01:24:48 +00:00

Owner

server/src/providers/ollama.rs had two bugs that combined to drop tokens — sometimes entire responses — even though the
upstream Ollama call returned 200 OK:

1. TCP chunks split mid-JSON-line. bytes_stream() returns raw byte chunks with no newline alignment. The old code parsed each
chunk in isolation, so any object spanning a packet boundary failed to parse on both sides. Bigger/slower models trickle
tokens, making this far more likely, which matches your symptom of "smaller models work, big ones go silent."
2. Multiple complete lines per chunk → only the first survived. The closure returned a single Result<ChatChunk> per input chunk
and returned on the first successful parse, silently discarding any further lines in the same packet.
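
The two failure modes can be reproduced in isolation. The sketch below is a hypothetical reconstruction, not the old code from ollama.rs: looks_like_complete_object is a stand-in for a real JSON parse, and first_line_per_chunk and naive_collect are illustrative names.

```rust
/// Stand-in for a real JSON parse (assumption for illustration only):
/// a line "parses" only if it is a complete `{...}` object.
fn looks_like_complete_object(line: &str) -> bool {
    let trimmed = line.trim();
    trimmed.starts_with('{') && trimmed.ends_with('}')
}

/// Hypothetical model of the old behavior: each chunk is handled in
/// isolation, and at most one parsed line is returned per chunk.
fn first_line_per_chunk(chunk: &[u8]) -> Option<String> {
    let text = String::from_utf8_lossy(chunk);
    text.lines()
        .find(|line| looks_like_complete_object(line))
        .map(|line| line.trim().to_string())
}

/// Applies the naive strategy to a sequence of chunks and collects
/// whatever survives.
fn naive_collect(chunks: &[&[u8]]) -> Vec<String> {
    chunks
        .iter()
        .filter_map(|chunk| first_line_per_chunk(chunk))
        .collect()
}
```

With this model, an object split across two chunks yields nothing at all, and a chunk carrying two complete lines yields only the first.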

The rewrite (lines 149-228) keeps a Vec<u8> buffer across chunks, drains only up to the last \n (any trailing partial waits for
the next chunk), and uses scan + flat_map to emit one ChatChunk per parsed JSON line. Stop-pattern detection and the
parse-error logging behavior are preserved.
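
A minimal sketch of the buffering approach, using only the standard library. It is not the code merged in this PR: LineBuffer and collect_lines are hypothetical names, the chunks come from a plain iterator rather than an async stream, and JSON parsing into ChatChunk is left out so that each complete line is returned as a String.

```rust
/// Accumulates raw bytes across chunks and yields only complete lines.
#[derive(Default)]
struct LineBuffer {
    pending: Vec<u8>,
}

impl LineBuffer {
    /// Appends `chunk`, then drains everything up to and including the
    /// last `\n`. A trailing partial line stays buffered for the next call.
    fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let Some(last_newline) = self.pending.iter().rposition(|&b| b == b'\n') else {
            return Vec::new(); // no complete line yet
        };
        let complete: Vec<u8> = self.pending.drain(..=last_newline).collect();
        let text = String::from_utf8_lossy(&complete);
        text.lines()
            .map(str::trim)
            .filter(|line| !line.is_empty())
            .map(str::to_owned)
            .collect()
    }
}

/// Mirrors the scan + flat_map shape on a synchronous iterator:
/// `scan` carries the buffer between chunks, `flat_map` emits one item
/// per complete line instead of one item per chunk.
fn collect_lines(chunks: &[&[u8]]) -> Vec<String> {
    chunks
        .iter()
        .scan(LineBuffer::default(), |buf, chunk| Some(buf.push(chunk)))
        .flat_map(|lines| lines)
        .collect()
}
```

Because bytes are decoded only after a full line is available, a multi-byte UTF-8 character split across two chunks is also reassembled before decoding.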

The existing tracing::error! lines on parse failure are still there, so the previously hidden parse errors can be seen by
running the code from before the fix. With the fix, they should go quiet for normal traffic instead of firing on every TCP boundary.

navicore added 1 commit

2026-05-07 01:24:48 +00:00

Summary of the fix

CI / ci (pull_request) Successful in 3m1s

Details

d5a5dce473

navicore merged commit 96e0ac71f8 into main

2026-05-07 01:29:56 +00:00

navicore referenced this pull request from a commit

2026-05-07 01:29:57 +00:00

Merge pull request 'Summary of the fix' (#21) from parse into main