Three Days, 1,864 Pages, and One Upside-Down Line Number.

Published on March 19, 2026

Being the "tech guy" of the family, even if that just means knowing how to open a laptop, has its perks. And its curses.

Family reached out asking how to tenth-line a PDF document.

My immediate reaction: please, please dumb it down.

"What do you mean by tenth-line?"

"The way the document is numbered. 10, 20, after every tenth line."

Oh. That kind of tenth-lining.

A quick Google told me one or two sites already did this. But family couldn't afford them. The Kasongo economy has done us dirty.

"Give me a day," I said.

Famous last words.

First, what even is tenth-lining?

If you have never set foot in a courtroom, here is the thing: when lawyers argue a case, they reference documents constantly. Not just "page 12" but "page 12, line 30." Judges, opposing counsel, everyone needs to be on the same page. Literally.

So before filing, every document has to be numbered. Every tenth line gets a marker. Line 10. Line 20. Line 30. All the way through, sometimes across hundreds of pages.

In Kenya, this is not optional. It is a court requirement. And different courts have different formatting rules. The Court of Appeal wants it one way, the High Court another, the Supreme Court its own thing.

So how do lawyers currently do it? Word macros that break. Manual counting. Or expensive desktop software built for UK and US courts that does not quite understand Kenyan formatting. There is a gap, and people fall into it every day.

That was my "give me a day" problem.

Day one. Easy, right?

I sat down, confident. How hard can it be? You open a PDF, count the lines, stamp a number every tenth one. Done.

Then I learned something that changed everything: a PDF is not a document.

A PDF is closer to a set of drawing instructions. It tells a viewer: draw this character at this x,y coordinate, in this font, at this size. There are no "lines" in a PDF. No paragraphs. No sentences. Just shapes and positions.

So "count the lines" is actually: figure out which characters are at similar vertical positions, group them, sort them, figure out which groups are actual lines of text and which are decorative borders or headers or artifacts, then number every tenth group.
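To make that grouping step concrete, here is a minimal sketch rather than the tool's actual code. It assumes a hypothetical Glyph shape (a text fragment with x and y already converted so that y grows down the page), a baseline tolerance of 2 units, and a very crude "is this a real line" test that only rejects whitespace. Real documents need smarter filtering for borders, headers, and artifacts.

```typescript
// Hypothetical shape: one positioned text fragment, as a PDF text extractor might report it.
interface Glyph {
  text: string;
  x: number; // left edge, in page units
  y: number; // baseline, in page units, measured down from the top of the page (assumption)
}

interface Line {
  y: number;       // running average baseline of the group
  glyphs: Glyph[]; // sorted left to right
}

interface Marker {
  label: number; // 10, 20, 30, ...
  y: number;     // where to stamp the number
}

// Group glyphs whose baselines sit within `tolerance` units of the current line's average.
function groupIntoLines(glyphs: Glyph[], tolerance = 2): Line[] {
  const sorted = [...glyphs].sort((a, b) => a.y - b.y);
  const lines: Line[] = [];
  for (const g of sorted) {
    const last = lines[lines.length - 1];
    if (last !== undefined && Math.abs(g.y - last.y) <= tolerance) {
      last.glyphs.push(g);
      last.y += (g.y - last.y) / last.glyphs.length; // update the running average
    } else {
      lines.push({ y: g.y, glyphs: [g] });
    }
  }
  for (const line of lines) {
    line.glyphs.sort((a, b) => a.x - b.x);
  }
  return lines;
}

// Crude stand-in for artifact filtering: a group counts only if it has visible text.
function isRealLine(line: Line): boolean {
  return line.glyphs.some((g) => g.text.trim().length > 0);
}

// Return one marker for every `step`-th real line, counted from the top of the page.
function tenthLineMarkers(lines: Line[], step = 10): Marker[] {
  const markers: Marker[] = [];
  lines.filter(isRealLine).forEach((line, i) => {
    if ((i + 1) % step === 0) {
      markers.push({ label: i + 1, y: line.y });
    }
  });
  return markers;
}
```

The tolerance is the fiddly part: too small and superscripts split a line in two, too large and tightly spaced lines merge.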

Day one became day two.

Then the scanned pages showed up.

Halfway through celebrating, I ran into a document that laughed at everything I had just built.

A lot of legal documents in Kenya are not born digital. They are physical papers that got scanned into a PDF. Which means instead of drawing instructions, the PDF contains a photograph. A picture of a page.

You cannot extract text from a photograph. There is nothing to extract. It is pixels.

The solution is OCR, Optical Character Recognition. Software that looks at an image and reads it the way a human would. Normally this runs on a server somewhere. But I had already made a quiet promise to myself: no uploads. A lawyer's documents are confidential. The whole thing had to run in the browser, on the user's machine, nothing leaving their computer.

So I had to run OCR in the browser.

Turns out this is possible. There is a library that does exactly this: it reads images and extracts text, locally, in JavaScript. The catch? It is slow. Running OCR on a 300-page scanned document, one page at a time, is the kind of thing you start and go make tea. Then lunch. Then wonder if CAF is just a reflection of the state of most African countries.

It worked. Slowly, but it worked. I filed that under "good enough for now" and moved on.

The page that was sideways. And upside down.

I thought I was nearly done when a test document broke everything.

Some PDFs store pages rotated. The page is physically saved sideways but the PDF has a flag that says "rotate this 90 degrees before displaying." Most viewers handle this invisibly. You never notice.

I noticed.

My line numbers were appearing horizontally across the top of the page. Then after I fixed that, they appeared on the wrong side. Then upside down. Then upside down on the wrong side, counting from the bottom up.

This one bug took longer to fix than the entire first version of the tool. The problem is that PDF coordinate systems and display coordinate systems disagree about where (0,0) is, which direction is "up," and how rotation affects all of that. Getting text to appear in the right place, right-side up, on the correct margin, on a page that has been rotated 90 or 270 degrees, requires thinking in four different coordinate spaces simultaneously.
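For the curious, the disagreement can be written down as a small mapping. This is a sketch under stated assumptions, not necessarily the fix that ended up in the tool: the page box starts at (0,0), PDF user space has its origin at the bottom-left with y pointing up, display space has its origin at the top-left with y pointing down, and the page's rotate flag means "turn clockwise by this many degrees before showing". The function names are made up for illustration.

```typescript
type Rotation = 0 | 90 | 180 | 270;

interface Point {
  x: number;
  y: number;
}

// Map a point in PDF user space to where it lands on the displayed (rotated) page.
// `w` and `h` are the width and height of the page as stored, before rotation.
function userToDisplay(p: Point, w: number, h: number, rotation: Rotation): Point {
  switch (rotation) {
    case 0:
      return { x: p.x, y: h - p.y };
    case 90:
      return { x: p.y, y: p.x };
    case 180:
      return { x: w - p.x, y: p.y };
    case 270:
      return { x: h - p.y, y: w - p.x };
  }
}

// The inverse: "I want the number here on screen, where do I draw it in the file?"
function displayToUser(d: Point, w: number, h: number, rotation: Rotation): Point {
  switch (rotation) {
    case 0:
      return { x: d.x, y: h - d.y };
    case 90:
      return { x: d.y, y: d.x };
    case 180:
      return { x: w - d.x, y: d.y };
    case 270:
      return { x: w - d.y, y: h - d.x };
  }
}

// Text drawn in user space at this counterclockwise angle reads upright once the
// viewer has applied the page's clockwise rotation.
function uprightTextAngle(rotation: Rotation): number {
  return rotation;
}

// A marker 30 units from the left and 100 units from the top of the *displayed* page,
// on a 612 x 792 page stored with a 90 degree rotation flag:
const stampAt = displayToUser({ x: 30, y: 100 }, 612, 792, 90);
console.log(stampAt, uprightTextAngle(90));
```

Written out like this it looks trivial, which is rather the point: the table is four lines long, and each wrong entry produces one of the symptoms above.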

The fix, when it finally clicked, was almost embarrassingly simple. The debug journey was not.

1,864 pages.

The real stress test came quietly. A record of appeal. 1,864 pages. I hit process, leaned back, and watched the progress bar move like it had somewhere else to be. Fifteen minutes later it was done.

Fifteen minutes!

Nobody has time for that.

The problem was sequential processing. The tool was handling one page, finishing, moving to the next.

Fix: batch text pages in groups of twenty, all running in parallel. Run four OCR workers simultaneously. Drop the OCR image resolution slightly. Quality barely changes, speed improves significantly.
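The two scheduling ideas in that fix can be sketched in a few lines. This is an illustration with hypothetical helper names, not the tool's code: one function runs items in fixed-size batches with every item in a batch in flight at once (the "groups of twenty" idea), and the other keeps at most a fixed number of tasks running at a time (the "four OCR workers" idea). What each task actually does, extracting text or running OCR, is left to the caller.

```typescript
// Process items in fixed-size batches; everything inside a batch runs in parallel,
// and the next batch starts only when the current one has finished.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    const batchResults = await Promise.all(batch.map((item) => task(item)));
    results.push(...batchResults);
  }
  return results;
}

// Run tasks with at most `limit` in flight. Each "worker" pulls the next
// unclaimed index until none are left, so a slow page never blocks the others.
async function runWithWorkers<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index] as T);
    }
  }
  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Under these assumptions, text pages would go through processInBatches(pages, 20, ...) and scanned pages through runWithWorkers(pages, 4, ...). Both return results in the original page order, which matters when line numbers have to land on the right page.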

Fifteen minutes became five. And because of the two-phase approach, text first and OCR in the background, the tool is interactive in about thirty seconds. You are reviewing and navigating a 1,864-page document while the scanned pages catch up behind the scenes.

What "done" actually looks like

Here is what the final tool handles:

A normal text PDF? Line numbers, done.
A scanned document where every page is a photograph? OCR runs, text is extracted, line numbers applied.
A mixed document with some text pages, some scanned, some image-only pages with no text? It detects each type, handles them differently, skips numbering on pure image pages, applies page numbers throughout.

You can adjust where the line numbers sit. Margin, size, font. There are presets for Court of Appeal, High Court, and Supreme Court formatting built in. You can start reviewing page one while page 1,400 is still being processed.

And nothing ever leaves your computer.

The promise, kept

I told family: give me a day.

It took longer than a day. It took debugging coordinate systems at midnight, learning more about PDF internals than any sane person should know, discovering that "just add line numbers" is one of those problems that looks simple until you are three weeks in wondering why page 47 is upside down.

But the tool exists. It works. And the next time someone in the family, or anyone, needs to tenth-line a court document, they can do it, in a browser, in minutes, without uploading their confidential files anywhere.

That felt worth the extra days.

Try the tool

***********

PS: the speed problem, solved (two months later)

Two months on, the tool has numbered 35,667 pages and is in steady use by advocates, paralegals, and law firms preparing real filings. And not just family 🎉

Which is exactly why the slowness mattered. It was real, and eventually an advocate said so plainly in a review: "Quality work, though there is a need to improve on speed." Five stars, but the message was clear.

They were right, so I went back in.

The fix turned out to be a lesson I should have learned the first time round: I was solving a harder problem than the one in front of me.

To find where to place a line number on a scanned page, I had been running full OCR. Tesseract would read every character on the page, recognise the words, return the text, and then I would throw all of that away and keep one thing: the vertical position of each line. I was paying for a full transcription of the document and using none of it.

But tenth-line numbering never needs to know what a line says. It only needs to know where the line sits. And you can find that without reading a single character.

The technique is called a horizontal projection profile, and it is almost insultingly simple. Render the page to a canvas, then walk down it row of pixels by row of pixels, counting how many are dark. Rows that fall inside a line of text are full of ink; the gaps between lines are nearly empty. Plot those counts and the text lines announce themselves as peaks. Group the consecutive dark rows into bands, take the centre of each band, and that is your line position. It is pure pixel arithmetic.
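Here is roughly what that looks like, as a minimal sketch rather than the tool's exact code. It assumes the rendered page has already been reduced to one grayscale byte per pixel (0 is black, 255 is white); turning a canvas's RGBA data into that form is left out. The function name and the three thresholds are illustrative guesses, not tuned values.

```typescript
interface Band {
  top: number;    // first inked pixel row of the text line
  bottom: number; // last inked pixel row of the text line
  centre: number; // vertical position to align a line number with
}

// Horizontal projection profile: count dark pixels per row, then group
// consecutive inked rows into bands. `gray` is row-major, one byte per pixel.
function findLineBands(
  gray: Uint8Array,
  width: number,
  height: number,
  darkThreshold = 128,   // below this a pixel counts as ink (assumption)
  minInkFraction = 0.01, // a row needs this share of dark pixels to count (assumption)
  minBandHeight = 3,     // shorter bands are treated as specks or noise (assumption)
): Band[] {
  const minInk = Math.max(1, Math.floor(width * minInkFraction));
  const bands: Band[] = [];
  let start = -1; // row where the current band began, or -1 if between lines

  // Loop one row past the end so a band touching the bottom edge is closed too.
  for (let row = 0; row <= height; row++) {
    let dark = 0;
    if (row < height) {
      const offset = row * width;
      for (let col = 0; col < width; col++) {
        if ((gray[offset + col] ?? 255) < darkThreshold) dark++;
      }
    }
    const inked = row < height && dark >= minInk;
    if (inked && start < 0) {
      start = row;
    } else if (!inked && start >= 0) {
      const bottom = row - 1;
      if (bottom - start + 1 >= minBandHeight) {
        bands.push({ top: start, bottom, centre: (start + bottom) / 2 });
      }
      start = -1;
    }
  }
  return bands;
}
```

Every tenth band's centre is then where a line number goes. A sketch this plain will struggle with skewed scans, where lines bleed into each other's rows, so it is a starting point and not a finished detector.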

The difference is not subtle. OCR spent seconds on each scanned page. The projection profile spends milliseconds. The phase that used to take minutes on a long document now finishes in seconds, and as a bonus the tool no longer downloads an eleven-megabyte language model before it can start, which matters a great deal on a Nairobi connection.