Programming

24501 readers
150 users here now

Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!

Cross-posting is strongly encouraged on the instance. If you feel your post or another person's post makes sense in another community, cross-post it there.

Hope you enjoy the instance!

Rules

  • Follow the programming.dev instance rules
  • Keep content related to programming in some way
  • If you're posting long videos, try to add some form of TL;DR for those who don't want to watch them

Wormhole

Follow the wormhole through a path of communities !webdev@programming.dev



founded 2 years ago
MODERATORS
1
 
 

Hi all, I'm relatively new to this instance but reading through the instance docs I found:

Donations are currently made using snowe's GitHub Sponsors page. If you see another place to donate that is not this, it is fake and should be reported to us.

Going to the sponsor page, we see the following goal:

@snowe2010's goal is to earn $200 per month

pay for our πŸ“« SendGrid Account: $20 a month πŸ’» Vultr VPS for prod and beta sites: Prod is $115-130 a month, beta is $6-10 a month πŸ‘©πŸΌ Paying our admins and devops any amount ◀️ Upgrade tailscale membership: $6-? dollars a month (depends on number of users) Add in better server infrastructure including paid account for Pulsetic and Graphana. Add in better server backups, and be able to expand the team so that it's not so small.

Currently only 30% of the break-even goal is being met. Please consider setting up a sponsorship, even if it's just $1. Decentralized platforms are great, but they still have real costs behind the scenes.

Note: I'm not affiliated with the admin team, just sharing something I noticed.

2
3
4
 
 

Today's software tools have weird names. We call a "library" some collection of functions that you can use in your program.

I think that software repositories (where apt downloads your programs from) should be the actual libraries, since that's where you go to get your information; meanwhile, individual packages of information should be called books, because each is one solid object containing a bundle of information.

5
6
7
 
 

I wrote a proof of concept that allows the user to sign up to a service using their Matrix ID, e.g. @user:server.test. The user then receives an activation link from the service in an encrypted room. It worked quite easily: within 2 days of fumbling around with the Matrix SDK in Python and FastAPI, here we are.

This has been in my head for a while and I just wanted to see if it's possible (the proof is in the ~~pudding~~ code). Emails are insecure, and national services are starting to implement communication services on top of Matrix. It's not inconceivable that citizens might get a government-issued Matrix account and communicate safely with the government over a secure protocol. Why not allow other services to do the same?

Imagine if, instead of providing your email address when signing up for services, you used Matrix instead. Your host wouldn't be able to read your messages, and it could replace things like 2FA codes over SMS, activation links in emails, or health documents sent from your doctor's CMS to your email inbox.
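
For illustration, the service side of this flow might look roughly like the following sketch using matrix-js-sdk (a different stack from the author's Python SDK; the homeserver URL, account names, and environment variable are hypothetical, and E2E encryption setup is elided):

import * as sdk from "matrix-js-sdk";

// Sketch: the service opens a direct room with the user's Matrix ID
// and sends the activation link there instead of by email.
async function sendActivationLink(matrixUserId: string, link: string) {
  const client = sdk.createClient({
    baseUrl: "https://server.test",           // the service's homeserver (hypothetical)
    accessToken: process.env.MATRIX_TOKEN!,   // service account token (hypothetical)
    userId: "@service:server.test",
  });
  const { room_id } = await client.createRoom({
    invite: [matrixUserId],
    is_direct: true,
  });
  await client.sendTextMessage(room_id, `Activate your account: ${link}`);
}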

Should there be enough time, I'd like to try to contribute this login method to Forgejo (the software behind Codeberg, which hosts this repository), but let's see. First, it would require learning Go 😅

8
9
69
How FOSS Won and Why It Matters (www.softwaremaxims.com)
submitted 3 days ago* (last edited 3 days ago) by HaraldvonBlauzahn@feddit.org to c/programming@programming.dev
 
 

See also this post: https://feddit.org/post/24404539

10
 
 

I am an iOS Application Developer.

I am now trying to learn Android application development using the newly released Kotlin Jetpack Compose.

Learning alone can be challenging, especially when you have other responsibilities.

I am looking for buddies who want to learn Kotlin Jetpack Compose. I have already learned the basics from a Kotlin Jetpack Compose course.

So, I am looking for a project-based Kotlin Jetpack journey that can teach us how to tackle real-world challenges.

11
21
Dead Simple CI (deadsimpleci.sparrowhub.io)
submitted 3 days ago* (last edited 3 days ago) by melezhik@programming.dev to c/programming@programming.dev
 
 

Dead Simple CI (http://deadsimpleci.sparrowhub.io/) can be thought of as an extension to any modern CI system - GitHub, Gitea, GitLab, Forgejo, you name it. It adds to the default pipeline mechanism (usually based on YAML) the convenience of using general-purpose programming languages, and it uses webhooks and the commit-status API to report results back to the native CI.

12
 
 

A stupid question, I know... but I left Reddit and came here, so please bear with me.

Can we share our work here? Like apps, libraries, etc.?

13
 
 

When you employ AI agents, there's a significant volume problem for document analysis. Reading one file of 1,000 lines consumes about 10,000 tokens, and token consumption incurs both cost and time penalties. Codebases with dozens or hundreds of files, a common case for real-world projects, can easily exceed 100,000 tokens when the whole thing must be considered. The agent must read and comprehend these files and determine the interrelationships among them. And when the task requires multiple passes over the same documents, perhaps one pass to divine the structure and one to mine the details, costs multiply rapidly.

Matryoshka is a tool for document analysis that achieves over 80% token savings while enabling interactive, exploratory analysis. The key insight of the tool is to save tokens by caching past analysis results and reusing them, so the same document lines never have to be processed twice. These ideas come from recent research on recursive language models and from retrieval-augmented generation, with a focus on efficiency. We'll see how Matryoshka unifies these ideas into one system that maintains persistent analytical state. Finally, we'll take a look at some real-world results from analyzing the anki-connect codebase.


The Problem: Context Rot and Token Costs

A common task is to analyze a codebase to answer a question such as "What is the API surface of this project?" Such work includes identifying and cataloguing all the entry points the codebase exposes.

Traditional approach:

  1. Read all source files into context (~95,000 tokens for a medium project)
  2. The LLM analyzes the entire codebase’s structure and component relationships
  3. For follow-up questions, the full context is round-tripped every turn

This creates two problems:

Token Costs Compound

On every turn, the entire context has to go to the API. In a 10-turn conversation about a codebase of 7,000 lines, almost a million tokens might be processed by the system. Most of those tokens are the same document contents being dutifully resent, over and over: the same core code is sent with every new question. This redundant traffic is a massive waste, forcing the model to process the same blocks of text repeatedly rather than concentrating on what's actually novel.
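
A back-of-envelope check, using the ~95,000-token context figure from above (the per-line token rate is an assumption that varies by language and formatting):

// Rough estimate of tokens processed across a 10-turn conversation
// when the full context is resent every turn.
const contextTokens = 95_000; // ~7,000 lines at roughly 13.5 tokens/line
const turns = 10;
console.log(turns * contextTokens); // 950,000 - "almost a million tokens"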

Context Rot Degrades Quality

As described in the Recursive Language Models paper, even the most capable models exhibit a phenomenon called context degradation, in which performance declines as input length grows. The deterioration depends on task complexity: in information-dense contexts, where the correct output requires synthesizing facts scattered across widely dispersed locations in the prompt, the degradation can be especially steep. It can set in at relatively modest context lengths, and is understood to reflect the model's failure to maintain connections between large numbers of informational fragments long before it reaches its maximum token capacity.

The authors argue that we should not be stuffing entire documents into the prompt, since this clutters the model's context and compromises its performance. Instead, documents should be treated as external environments with which the LLM can interact: querying, navigating through structured sections, and retrieving specific information as needed. This treats the document as a separate knowledge base and frees the model from having to hold everything in context.


Prior Work: Two Key Insights

Matryoshka builds on two research directions:

Recursive Language Models (RLM)

The RLM paper introduces a methodology that treats documents as external state against which step-by-step queries can be issued, without loading them in their entirety. Symbolic operations (search, filter, aggregate) are issued against this state, and only the specific, relevant results are returned, keeping the context window small while permitting analysis of arbitrarily large documents.

The key point is that the documents stay outside the model, and only search results enter the context. This separation of concerns ensures that the model never sees complete files; instead, searches retrieve just the information needed.
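
A minimal sketch of that separation, with hypothetical names (an illustration of the idea, not the RLM authors' code):

// The document lives outside the model; only matches cross into context.
class DocumentStore {
  private lines: string[] = [];

  load(text: string) {
    this.lines = text.split("\n");
  }

  // Returns only the matching lines; the full document never leaves the store.
  grep(pattern: RegExp): string[] {
    return this.lines.filter((line) => pattern.test(line));
  }
}

const store = new DocumentStore();
store.load("class A {}\nclass B {}\nlet x = 1;");
console.log(store.grep(/^class /)); // ["class A {}", "class B {}"]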

Barliman: Synthesis from Examples

Barliman, a tool developed by William Byrd and Greg Rosenblatt, shows that program synthesis is possible without precise code specifications. Instead, input/output examples are used, with a relational programming system in the spirit of miniKanren serving as the solver engine. Barliman uses this system to synthesize functions that satisfy the specified constraints: the examples are interpreted as relational rules, and the synthesis engine tries to satisfy them. This makes it possible to describe what is desired through concrete test cases.

The approach is simply to show examples of the behavior one wants and let the system derive the implementation on its own. The emphasis thus shifts from writing long, detailed step-by-step recipes to declaratively portraying the desired goal.


Matryoshka: Combining the Insights

Matryoshka incorporates these insights into a working system for LLM agents: a practical tool that lets agents decompose challenging tasks into a sequence of smaller, more manageable objectives.

1. Nucleus: A Declarative Query Language

Instead of issuing commands, the LLM describes what it wants, using Nucleus, a simple S-expression query language. This changes the focus from describing each step to specifying the desired outcome.

(grep "class ")           ; Find all class definitions
(count RESULTS)           ; Count them
(map RESULTS (lambda x    ; Extract class names
  (match x "class (\\w+)" 1)))

The declarative interface stays robust even when the LLM phrases a request with different vocabulary or sentence structure, because the system targets the underlying intent of a query rather than its surface wording.

2. Pointer-Based State

The key new insight is that we can separate the results from the context. Results are now stored in the REPL state, rather than in the context.

When the agent runs (grep "def ") and gets 150 matches:

  • Traditional tools: All 150 lines are fed into context, and round-tripped every turn
  • Matryoshka: Binds matches to RESULTS in the REPL, returning only "Found 150 results"

The variable RESULTS is bound to the actual value in the REPL. The binding acts as a pointer to the data's location in the server's memory. Subsequent operations (further queries or transformations) use this reference to access the data, but the data itself never actually enters the conversation:

Turn 1: (grep "def ")         β†’ Server stores 150 matches as RESULTS
                              β†’ Context gets: "Found 150 results"

Turn 2: (count RESULTS)       β†’ Server counts its local RESULTS
                              β†’ Context gets: "150"

Turn 3: (filter RESULTS ...)  β†’ Server filters locally
                              β†’ Context gets: "Filtered to 42 results"

The LLM never sees the 150 function definitions, just the aggregated answers from these functions.
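
A minimal sketch of this binding mechanism, with hypothetical names (Matryoshka's actual implementation may differ):

// Results are bound server-side; only short summaries go back to the model.
class Repl {
  private bindings = new Map<string, string[]>();

  grep(lines: string[], pattern: RegExp): string {
    const matches = lines.filter((line) => pattern.test(line));
    this.bindings.set("RESULTS", matches);    // full data stays here
    return `Found ${matches.length} results`; // only this enters context
  }

  count(name: string): string {
    return String(this.bindings.get(name)?.length ?? 0);
  }

  filter(name: string, pred: (line: string) => boolean): string {
    const kept = (this.bindings.get(name) ?? []).filter(pred);
    this.bindings.set(name, kept);            // rebind the refined results
    return `Filtered to ${kept.length} results`;
  }
}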

3. Synthesis from Examples

When queries need custom parsing, Matryoshka synthesizes functions from examples:

(synthesize_extractor
  "$1,250.00" 1250.00
  "€500" 500
  "$89.99" 89.99)

The synthesizer learns the pattern directly from the examples, extracting numeric values from the currency strings without requiring a hand-written regex.
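
For these three examples, the synthesized extractor might amount to something like the following hand-written equivalent (an illustration, not Matryoshka's actual output):

// Strip the currency symbol and thousands separators, then parse the number.
function extractAmount(s: string): number {
  const m = s.match(/([\d,]+(?:\.\d+)?)/);
  if (!m) throw new Error(`no numeric value in ${JSON.stringify(s)}`);
  return parseFloat(m[1].replace(/,/g, ""));
}

extractAmount("$1,250.00"); // 1250
extractAmount("€500");      // 500
extractAmount("$89.99");    // 89.99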


The Lifecycle

A typical Matryoshka session:

1. Load Document

(load "./plugin/__init__.py")
β†’ "Loaded: 2,244 lines, 71.5 KB"

The document is parsed and stored server-side. Only metadata enters the context.

2. Query Incrementally

(grep "@util.api")
β†’ "Found 122 results, bound to RESULTS"
   [402] @util.api()
   [407] @util.api()
   ... (showing first 20)

Each query returns a preview plus the count. Full data stays on server.

3. Chain Operations

(count RESULTS)           β†’ 122
(filter RESULTS ...)      β†’ "Filtered to 45 results"
(map RESULTS ...)         β†’ Transforms bound to RESULTS

Operations chain through the RESULTS binding. Each step refines without re-querying.

4. Close Session

(close)
β†’ "Session closed, memory freed"

Sessions auto-expire after 10 minutes of inactivity.
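
The auto-expiry could be as simple as a periodic sweep over last-used timestamps; a minimal sketch (hypothetical, not Matryoshka's actual implementation):

interface Session {
  lastUsed: number;
  // document contents, bindings, etc. would live here
}

const SESSION_TTL_MS = 10 * 60 * 1000; // 10 minutes of inactivity
const sessions = new Map<string, Session>();

// Periodic sweep: drop any session idle past the TTL, freeing its memory.
setInterval(() => {
  const now = Date.now();
  for (const [id, session] of sessions) {
    if (now - session.lastUsed > SESSION_TTL_MS) sessions.delete(id);
  }
}, 60_000);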


How Agents Discover and Use Matryoshka

Matryoshka integrates with LLM agents via the Model Context Protocol (MCP).

Tool Discovery

When the agent starts, it launches Matryoshka as an MCP server and receives a tool manifest:

{
  "tools": [
    {
      "name": "lattice_load",
      "description": "Load a document for analysis..."
    },
    {
      "name": "lattice_query",
      "description": "Execute a Nucleus query..."
    },
    {
      "name": "lattice_help",
      "description": "Get Nucleus command reference..."
    }
  ]
}

The agent sees the available tools and their descriptions. When a user asks to analyze a file, it decides which tools to use based on the task.
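
Over MCP, each of these tools is invoked through a standard tools/call request. The argument key below is an assumption about Matryoshka's schema, but the envelope is plain MCP JSON-RPC:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "lattice_query",
    "arguments": { "query": "(grep \"@util.api\")" }
  }
}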

Guided Discovery

The lattice_help tool returns a command reference, teaching the LLM the query language on-demand:

; Search commands
(grep "pattern")              ; Regex search
(fuzzy_search "query" 10)     ; Fuzzy match, top N
(lines 10 20)                 ; Get line range

; Aggregation
(count RESULTS)               ; Count items
(sum RESULTS)                 ; Sum numeric values

; Transformation
(map RESULTS fn)              ; Transform each item
(filter RESULTS pred)         ; Keep matching items

The agent learns capabilities incrementally rather than needing upfront training.

Session Flow

User: "How many API endpoints does anki-connect have?"

Agent: [Calls lattice_load("plugin/__init__.py")]
        β†’ "Loaded: 2,244 lines"

Agent: [Calls lattice_query('(grep "@util.api")')]
        β†’ "Found 122 results"

Agent: [Calls lattice_query('(count RESULTS)')]
        β†’ "122"

Agent: "The anki-connect plugin exposes 122 API endpoints,
         decorated with @util.api()."

The server maintains state across tool invocations within the conversation: when a document is loaded, its content is retained in memory, and the results of any executed query are saved and available for later use.


Real-World Example: Analyzing anki-connect

Let's walk through a complete analysis of the anki-connect Anki plugin. Here we have a real-world codebase with 7,770 lines across 17 files.

The Task

"Analyze the anki-connect codebase: find all classes, count API endpoints, extract configuration defaults, and document the architecture."

The Workflow

The agent uses Matryoshka's prompt hints to accomplish the following workflow:

  1. Discover files with Glob
  2. Read small files directly (<300 lines)
  3. Use Matryoshka for large files (>500 lines; see the sketch after this list)
  4. Aggregate across all files
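
A minimal sketch of that routing rule (the thresholds come from the list above; the handling of mid-sized files is an assumption):

// Decide per file whether to read it whole or query it through Matryoshka.
function chooseStrategy(lineCount: number): "read" | "matryoshka" {
  if (lineCount < 300) return "read";        // small: full content fits cheaply
  if (lineCount > 500) return "matryoshka";  // large: load server-side and query
  return "read";                             // mid-sized: either works; default to reading
}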

Step 1: File Discovery

Glob **/*.py β†’ 15 Python files
Glob **/*.md β†’ 2 markdown files

File sizes:
  plugin/__init__.py    2,244 lines  β†’ Matryoshka
  plugin/edit.py          458 lines  β†’ Read directly
  plugin/web.py           301 lines  β†’ Read directly
  plugin/util.py          107 lines  β†’ Read directly
  README.md             4,660 lines  β†’ Matryoshka
  tests/*.py           11 files      β†’ Skip (tests)

Step 2: Read Small Files

Reading util.py (107 lines) reveals configuration defaults:

DEFAULT_CONFIG = {
    'apiKey': None,
    'apiLogPath': None,
    'apiPollInterval': 25,
    'apiVersion': 6,
    'webBacklog': 5,
    'webBindAddress': '127.0.0.1',
    'webBindPort': 8765,
    'webCorsOrigin': None,
    'webCorsOriginList': ['http://localhost/'],
    'ignoreOriginList': [],
    'webTimeout': 10000,
}

Reading web.py (301 lines) reveals the server architecture:

  • Classes: WebRequest, WebClient, WebServer
  • JSON-RPC style API with jsonschema validation
  • CORS support with configurable origins

Step 3: Query Large Files with Matryoshka

; Load the main plugin file
(load "plugin/__init__.py")
β†’ "Loaded: 2,244 lines, 71.5 KB"

; Find all classes
(grep "^class ")
β†’ "Found 1 result: [65] class AnkiConnect:"

; Count methods
(grep "def \\w+\\(self")
β†’ "Found 148 results"

; Count API endpoints
(grep "@util.api")
β†’ "Found 122 results"

; Load README for documentation
(load "README.md")
β†’ "Loaded: 4,660 lines, 107.2 KB"

; Find documented action categories
(grep "^### ")
β†’ "Found 13 sections"
   [176] ### Card Actions
   [784] ### Deck Actions
   [1231] ### Graphical Actions
   ...

Complete Findings

Metric                   Value
Total files              17 (15 .py + 2 .md)
Total lines              7,770
Classes                  8 (1 main + 3 web + 4 edit)
Instance methods         148
API endpoints            122
Config settings          11
Imports                  48
Documentation sections   8 categories, 120 endpoints

Token Usage Comparison

Approach          Lines Processed   Tokens Used   Coverage
Read everything   7,770             ~95,000       100%
Matryoshka only   6,904             ~6,500        65%
Hybrid            7,770             ~17,000       100%

The hybrid method achieves an 82% token savings while retaining 100% coverage. It combines two complementary strategies: Matryoshka queries compress the large, redundant files, while direct reads preserve every detail of the small ones.

The pure Matryoshka approach ends up missing details from small files (configuration defaults, web server classes), because the agent only uses the tool to query large ones. The hybrid workflow does direct, full-content reads of small files while leveraging Matryoshka to analyze the bigger ones, in a kind of divide-and-conquer strategy. All that's needed is to give the agent an explicit hint about which strategy to use.
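
The savings figure follows directly from the table:

// 1 - 17,000 / 95,000 ≈ 0.821, i.e. roughly an 82% token savings
const savings = 1 - 17_000 / 95_000;
console.log(savings.toFixed(3)); // "0.821"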

Why Hybrid Works

Small files (<300 lines) contain critical details:

  • util.py: All configuration defaults, the API decorator implementation
  • web.py: Server architecture, CORS handling, request schema

These fit comfortably in context, and there's no need to do anything different. Matryoshka adds value for:

  • __init__.py (2,244 lines): Query specific patterns without loading everything
  • README.md (4,660 lines): Search documentation sections on demand

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     Adapters                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚   Pipe   β”‚  β”‚   HTTP   β”‚  β”‚   MCP Server          β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚       β”‚             β”‚                     β”‚             β”‚
β”‚       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜             β”‚
β”‚                          β”‚                               β”‚
β”‚                β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”‚
β”‚                β”‚   LatticeTool     β”‚                    β”‚
β”‚                β”‚   (Stateful)      β”‚                    β”‚
β”‚                β”‚   β€’ Document      β”‚                    β”‚
β”‚                β”‚   β€’ Bindings      β”‚                    β”‚
β”‚                β”‚   β€’ Session       β”‚                    β”‚
β”‚                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                    β”‚
β”‚                          β”‚                               β”‚
β”‚                β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”‚
β”‚                β”‚  NucleusEngine    β”‚                    β”‚
β”‚                β”‚  β€’ Parser         β”‚                    β”‚
β”‚                β”‚  β€’ Type Checker   β”‚                    β”‚
β”‚                β”‚  β€’ Evaluator      β”‚                    β”‚
β”‚                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                    β”‚
β”‚                          β”‚                               β”‚
β”‚                β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”‚
β”‚                β”‚    Synthesis      β”‚                    β”‚
β”‚                β”‚  β€’ Regex          β”‚                    β”‚
β”‚                β”‚  β€’ Extractors     β”‚                    β”‚
β”‚                β”‚  β€’ miniKanren     β”‚                    β”‚
β”‚                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Getting Started

Install from npm:

npm install matryoshka-rlm

As MCP Server

Add to your MCP configuration:

{
  "mcpServers": {
    "lattice": {
      "command": "npx",
      "args": ["lattice-mcp"]
    }
  }
}

Programmatic Use

import { NucleusEngine } from "matryoshka-rlm";

const engine = new NucleusEngine();
await engine.loadFile("./document.txt");

const result = engine.execute('(grep "pattern")');
console.log(result.value); // Array of matches
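
Follow-up queries presumably reuse the bindings from earlier calls, matching the REPL trace shown above (an assumption about the API, not confirmed by the project's docs):

// RESULTS was bound by the previous execute() call.
const count = engine.execute("(count RESULTS)");
console.log(count.value); // number of matches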

Interactive REPL

npx lattice-repl
lattice> :load ./data.txt
lattice> (grep "ERROR")
lattice> (count RESULTS)

Conclusion

Matryoshka embodies the principle, emerging from RLM research, that documents are to be treated as external environments rather than as contexts to be parsed. This changes the fundamental character of the model's engagement: no longer a passive reader but an active agent, navigating and interrogating a document to extract specific information, much as a programmer browses through code. Combined with Barliman-style synthesis from input/output examples and pointer-based state management, it achieves:

  • 82% token savings on real-world codebase analysis
  • 100% coverage when combined with direct reads for small files
  • Incremental exploration where each query builds on previous results
  • No context rot because documents stay outside the model

Variable bindings such as RESULTS refer to REPL state rather than holding data directly in the model's context. Queries sent to the server carry only these references, placeholders indicating where the actual computation should occur. The server executes the substantive computational work and returns only the distilled results.

source here: https://git.sr.ht/~yogthos/matryoshka

14
15
 
 

This is an older blog post I came across while reading this related one on syntax highlighting:

I am sorry, but everyone is getting syntax highlighting wrong @ tonsky.me. It was posted here 3 months ago.

I think both make great points, and they pushed me into a rabbit hole of rewriting my current Nord theme into something a bit more minimal. I eventually realized that Nord with barely any syntax highlighting (mostly white text) looks very bleak, and I didn't want to spend the time hunting down all the highlight groups to make things look good. So I tried out the Alabaster theme, which the author of the second article created, and I love it; it feels like it really hits that middle spot between too much highlighting and not enough.

Here's the theme I used for nvim :

https://github.com/p00f/alabaster.nvim?tab=readme-ov-file

I changed some things (matching-bracket background color for visibility, comments grayed out, and property names of tables in yellow instead of green).

You can see the picture of how it looks here

16
17
18
 
 

I know JavaScript is a very special boi but c’mon, you’re embarrassing me in front of the wizards.

19
20
21
 
 

I find this OS/VM and what it can do fascinating, and thought I would share now that they've started to share their system in full. If you're interested, it's a great talk from start to finish!

https://100r.co/site/uxn.html

https://100r.co/site/projects.html

22
 
 

We recently wrote about Torvalds' atypically subtle and nuanced position on the use of LLM bots in coding. It seems that the reasons have suddenly become a little clearer.

Google's Antigravity LLM has been winning other friends of late, including Register columnist Mark Pesce, who wrote that "vibe coding will deliver a wonderful proliferation of personalized software." Some other big names in the world of FOSS have also come out in favor of LLM coding assistants recently, including Redis creator Salvatore "Antirez" Sanfilippo, who wrote "don't fall into the anti-AI hype." Said hype is, of course, a subject about which Torvalds opined previously.

Torvalds' position has been more moderate, which is not entirely like his former self. He is famed for his outbursts at Nvidia, GitHub, third-party companies, and kernel contributors. We could go on, but you get the picture.

23
24
-5
submitted 4 days ago* (last edited 4 days ago) by mohyoo@lemmy.world to c/programming@programming.dev
 
 

Hello everybody!

Not to brag, but I finally found a place (the first!) to publish my humble work. It's a small & simple AI chat CLI written in Python.

Main features are privacy control & simplicity.

Why? Gemini (like other websites) is too slow for users on poor internet or potato PCs; this fixes that!

I'm not asking for credit, just for opinions, suggestions, or, if possible, testers; but please show mercy, because I'm a beginner and I have a long list of bugs.

Thanks in advance!

25