Sunday, August 17, 2025

 Mama, don't let your kids grow up to be Vibe Coders!

 

Gary Marcus and Nathan Hamiel explain why in this article. 

 

"Cybersecurity has always been a game of cat and mouse, back to early malware like the Morris Worm in 1988 and the anti-virus solutions that followed. Attackers seek vulnerabilities, defenders try to patch those vulnerabilities, and then attackers seek new vulnerabilities. The cycle repeats. There is nothing new about that.

But two new technologies are radically increasing what is known as the attack surface (or the space for potential vulnerabilities): LLMs and coding agents.

... 

The best defense would be not using agentic coding altogether. But the tools are so seductive that we doubt many developers will resist. Still, the arguments for abstinence, given the risks, are strong enough to merit consideration.

...

 

Don’t treat LLM coding agents as highly capable superintelligent systems

 

Treat them as lazy, intoxicated robots

 "

https://open.substack.com/pub/garymarcus/p/llms-coding-agents-security-nightmare?r=joc82&utm_campaign=post&utm_medium=email

 

 

 

Thursday, August 14, 2025

 

AI critic vindicated.

"I endlessly challenged these people to debate, to discuss the facts at hand. None of them accepted. Not once. Nobody ever wanted to talk science."

https://open.substack.com/pub/garymarcus/p/openais-waterloo


Tuesday, August 12, 2025

"Critically, as I argued at the end of June (and going back to 2019) LLMs never induce proper world models, which is why, for example, they still can’t even play chess reliably, and continue to make stupid, head-scratching errors with startling regularity."

LLMs are not like you and me - and never will be

 

From its first miracles to its latest incarnation, LLMs, the mystery religion of ML-based AI has announced from Day Zero: "we don't need no steenkin' models." Classic anti-intellectual techbro arrogance.

Monday, August 11, 2025

 Posted this on LinkedIn first: a response to the unveiling of GPT-5.

 

'Reading the abstract (Chain of Thought reasoning is “a brittle mirage that vanishes when it is pushed beyond training distributions”) practically gave me deja vu. In 1998 I wrote that “universals are pervasive in language and reasoning” but showed experimentally that neural networks of that era could not reliably “extend universals outside [a] training space of examples”.

The ASU team showed that exactly the same thing was true even in the latest, greatest models. Throw in every gadget invented since 1998, and the Achilles’ Heel I identified then still remains. That’s startling. Even I didn’t expect that.

And, crucially, the failure to generalize adequately outside distribution tells us why all the dozens of shots on goal at building “GPT-5 level models” keep missing their target. It’s not an accident. That failing is principled.'


And the principle is far older than LLMs: it goes back to the AI wars of the 60s and 70s. ML-based AI was a mystery religion that produced miracles that could not be explained. The miracles were flashy enough to get the plodding tortoises of symbolic logic and linguistics out of Big AI (universities and tech bro startups) and banish them to the margins. Gary Marcus, who wrote the critique below, was one of the survivors.


"In his first book, The Algebraic Mind (2001), Marcus challenged the idea that the mind might consist of largely undifferentiated neural networks. He argued that understanding the mind would require integrating connectionism with classical ideas about symbol-manipulation."

Gary Marcus Wikipedia entry

 

GPT-5: Overdue, overhyped and underwhelming. And that’s not the worst of it.

 

Monday, April 1, 2024

Waterfalling down the Staircase

(Reposted from Groups.io extremeprogramming group)


For those who haven't been watching 3 Body Problem on Netflix, the last episode of Season 1 includes a resounding lesson about the difference between Waterfall and Agile.   (It would be great if someone with more video savvy than me were to capture the clip and link to it here.)

The Staircase Project is the old Project Orion concept reimagined to send a probe to recon an alien enemy fleet many light years away. Without going into spoilers, the basic idea is to accelerate a probe with an EM sail to near light speed by shooting it past 300 nuclear bombs, each to be exploded at just the right moment to blast it with radiation.  This is a purely ballistic launch: the probe has no power or steering capabilities, so the explosions have to be timed perfectly and the trajectory is locked in.

Sounding familiar? 

Early on, the shock of an explosion disconnects one of the tethers connecting the probe to the sail, the probe goes off course, and the entire project is lost - a world-threatening catastrophe.

It would not have been rocket science to give the probe the minimal intelligence and power to adjust its trajectory, perhaps by "trimming" the sail with a tug on one of its many tethers.  But no: the finest minds on earth agree it's necessary to lock the trajectory in up front.
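Just to make the Waterfall/Agile point concrete, here's a toy sketch (mine, not the show's physics; the numbers and the "trim half the error" rule are made up purely for illustration): an open-loop launch that can never respond to drift, next to a closed-loop one that measures its error after each blast and trims the sail a little.

    // Toy illustration only: "waterfall" = a plan locked in up front,
    // "agile" = measure drift after each step and trim accordingly.
    public class StaircaseToy {

        // Drift injected by an imperfect detonation at step 3 (arbitrary).
        static double disturbance(int step) {
            return step == 3 ? 0.8 : 0.0;
        }

        // Open loop: the trajectory is fixed before launch; drift is never corrected.
        static double openLoop(int steps) {
            double offCourse = 0.0;
            for (int step = 0; step < steps; step++) {
                offCourse += disturbance(step);   // nothing on board can respond
            }
            return offCourse;
        }

        // Closed loop: after each blast, measure the error and trim a tether.
        static double closedLoop(int steps) {
            double offCourse = 0.0;
            for (int step = 0; step < steps; step++) {
                offCourse += disturbance(step);
                offCourse -= 0.5 * offCourse;     // partial correction is enough
            }
            return offCourse;
        }

        public static void main(String[] args) {
            System.out.printf("open loop:   %.3f off course%n", openLoop(300));
            System.out.printf("closed loop: %.3f off course%n", closedLoop(300));
        }
    }

The open-loop run ends exactly as far off course as the disturbance left it; the closed-loop run trims its way back to essentially zero.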

(Disclaimer - I've read only the first book in the trilogy on which the series is based, which ends before the probe project is undertaken, so I don't know if the author, Liu Cixin, is responsible for the waterfall.)



Friday, December 14, 2018

Perils of Messing with the Speed Force


This Twitter thread opened my eyes.

Wile E. faces a starvation deadline and focuses so intently on speed that he neglects the need to analyze the domain before implementing the chase story.  This applies equally to the other major Roadrunner trope: the painted tunnel on the rockface.

As of this writing, all the commenters on that thread apparently believe that because "edge" and "run" are common English words there's no need to dig any deeper into them. Wrong: "edge" and "run" are only the tip of the domain iceberg; cliffs, gravity, inertia, cartoon physics and the like have to be understood before we can even look at the structure of the implementation.
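To make that concrete, here's a hypothetical sketch (the names are mine, not the thread's) of what digging into the domain might look like before any chase code gets written. The point is that "which physics is in force" is a domain decision, and it has to be made explicit somewhere.

    // Hypothetical domain sketch: make the parts of the iceberg explicit
    // before writing any "run off the edge" code.
    public class ChaseDomain {

        enum Terrain { MESA, CLIFF_EDGE, PAST_THE_EDGE }

        record Runner(String name, boolean hasLookedDown) {}

        // The part everyone skips: which physics is in force?
        sealed interface Physics {
            boolean falls(Runner runner, Terrain terrain);
        }

        // Real-world physics: past the edge, gravity applies immediately.
        record RealPhysics() implements Physics {
            public boolean falls(Runner r, Terrain t) {
                return t == Terrain.PAST_THE_EDGE;
            }
        }

        // Cartoon physics: gravity politely waits until the coyote looks down.
        record CartoonPhysics() implements Physics {
            public boolean falls(Runner r, Terrain t) {
                return t == Terrain.PAST_THE_EDGE && r.hasLookedDown();
            }
        }

        public static void main(String[] args) {
            Runner wileE = new Runner("Wile E.", false);
            System.out.println(new RealPhysics().falls(wileE, Terrain.PAST_THE_EDGE));    // true
            System.out.println(new CartoonPhysics().falls(wileE, Terrain.PAST_THE_EDGE)); // false, until he looks down
        }
    }

None of this is hard; it's just the part that gets skipped when the only question asked is "how fast can he run?"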

The increasing micromanagement and microsiloization brought on by "Dark Industrial Agile", together with the vulture-capital pressure for the short-term thinking and asset-stripping that has done so much damage to the economy, have had an equally destructive effect on the culture of development.  We are all coyotes one paycheck away from starvation, so if management says "Don't look back (or forward or down), just run!", we run.  It's not just testing and refactoring that get thrown away.

The Cloud is just another kind of plumbing, but so many architects and developers apparently think it's the only domain we need to organize around.

One example is the fetishizing of so-called dynamic languages because they let you generate a lot of code really fast.  Benchmarks that "prove" Node is faster than Java (like this one) succeed only by comparing current reactive JS implementations to old servlet implementations.  The interoperable JVM ecosystem provides much more modern options than servlets. In spite of ES6, JavaScript is intrinsically slower, and npm is currently suffering the torture of a thousand tiny libraries.
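To make "more modern options" concrete, here is a minimal non-blocking HTTP endpoint on the JVM, using Vert.x purely as an illustration (any of the current reactive JVM toolkits would do):

    // Minimal non-blocking HTTP endpoint on the JVM (illustration only).
    // Assumes io.vertx:vertx-core is on the classpath.
    import io.vertx.core.Vertx;

    public class ReactiveHello {
        public static void main(String[] args) {
            Vertx vertx = Vertx.vertx();
            vertx.createHttpServer()
                 // The handler runs on an event loop instead of tying up a
                 // thread per request the way a classic servlet container does.
                 .requestHandler(req -> req.response()
                     .putHeader("content-type", "text/plain")
                     .end("hello from the event loop"))
                 .listen(8080);
            System.out.println("listening on http://localhost:8080");
        }
    }

The point isn't Vert.x specifically; it's that comparing Node to ten-year-old servlet code tells you nothing about the JVM of today.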

Slow down to speed up, look around at the domain (I'd say "master it" but that's a whole nother thing) and optionally inhale that warm smell of colitas.




Sunday, July 2, 2017

Category Theory and Corporate Culture

A mediocre poet once wrote

  
    Imagine me at 
             my age looking for 
    a job. Strangely exciting.

A sudden end to a contract - less than a week's notice, probably because the manager I reported to suddenly resigned.

Like a kid whose parents are divorcing, I ask myself if it's my fault.

It's not, really.

I was hired to help automate an operations support system for a communications infrastructure provider. Working alongside a developer (DW) whose assignment was to develop "automations" using a commercial application I'll call Grit (not its real name), I quickly saw that Grit's promise of easy development (even by non-programmers) was absurd. The "development environment" lacked basic amenities like unit testing and source control. Operations support in an industry like this requires a realtime, event-driven system, but both Grit and the surrounding software environment were heavily batch- and file-oriented and depended on slow, manual, human-driven workflows.

My coworker was so unhappy with the tools he was tempted to walk. I suggested we come up with a plan for a true realtime system based on microservices. We brought this to our manager (SN) who, in spite of not having software development experience, quickly understood what we were driving at.

DW had a lot of experience with operations support and with web infrastructure. He provided the expertise that allowed me to concentrate on applying what I had learned in years of object-oriented development and reading up on functional programming and category theory. Furthermore, he supplied a perspective on system and application monitoring that my test-driven approach to development really needed. And he set up our "disjunct" development environment.

Due to some cultural issues with security, our Windows laptops were so locked down that we could not change even the most trivial settings in our browsers, and all software could only be installed from an approved list after a long drawn-out request process. There was no way we could do modern software development in that environment.

Thanks to the loan of a server from a middle manager, we were able to set up an IDE (IntelliJ IDEA Community Edition) and access it through SecureCRT and an X server.  (Since we would sometimes lose the connection to the server, IDEA was a safer choice than Eclipse because it saves files in the background by default.)

We successfully implemented the first phase of the framework, based on a fractal view of systems as consisting of total functions, dependent types and hexagonal architecture. The second phase would involve organizing the functions around models-as-types.
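I won't reproduce the framework's code here, but a rough, hypothetical sketch of the flavor might look like this: a port at the boundary of the hexagon, and a core built from total functions that return explicit results instead of nulls or exceptions (Java stand-ins for what dependent types would express more directly).

    // Rough, hypothetical sketch of the flavor only (not the framework's real code):
    // a hexagonal "port" at the boundary and a total function in the core.
    import java.util.function.Function;

    public class HexagonSketch {

        // Every core operation returns an explicit Ok or Err: no nulls, no
        // exceptions, so the function is total over its input type.
        sealed interface Result<T> {
            record Ok<T>(T value) implements Result<T> {}
            record Err<T>(String reason) implements Result<T> {}
        }

        // A port: the core only sees this interface; adapters for Grit, files,
        // or real event streams live outside the hexagon.
        interface AlarmSource {
            String nextRawAlarm();
        }

        record Alarm(String deviceId, String severity) {}

        // The core: a total function from raw text to a Result, usable anywhere
        // a plain Function<String, Result<Alarm>> is expected.
        static final Function<String, Result<Alarm>> parseAlarm = raw -> {
            String[] fields = raw.split(",");
            return fields.length == 2
                ? new Result.Ok<>(new Alarm(fields[0].trim(), fields[1].trim()))
                : new Result.Err<>("expected 'deviceId,severity' but got: " + raw);
        };

        public static void main(String[] args) {
            AlarmSource source = () -> "router-17, critical";   // stand-in adapter
            System.out.println(parseAlarm.apply(source.nextRawAlarm()));
            System.out.println(parseAlarm.apply("garbage"));
        }
    }

Even this much buys the separation that matters: the core is a pure function, testable with no Grit, no files and no network in sight.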

We couldn't put our code into production without official approval for the (extremely popular and well-tested) open source libraries we were using. This was another sign of the cultural chasm: there was no provision for software development because, as we were actually told at one point, "this company doesn't do software development."

In subsequent posts, I'll spell out how the framework was organized as it evolved, and how I was able to bring some of the mathematical power of category theory and functional programming to a practical task: an evolutionary development process that got maximum leverage out of the batch/file/command-line legacy systems we had to communicate with - and would perhaps eventually disintermediate, by applying a more efficient variant of the Strangler pattern.
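As a hypothetical sketch of the Strangler idea in this setting (the names are mine, not the project's): a facade that serves a request from the new realtime path when it can, and falls back to the legacy batch path when it can't, so legacy capabilities can be retired one at a time.

    // Hypothetical sketch of the Strangler pattern (names invented for illustration):
    // route each request to the new service when it can handle it, otherwise fall
    // back to the legacy batch/file path, and retire legacy capabilities over time.
    import java.util.Optional;

    public class StranglerFacade {

        interface TicketSystem {
            String statusOf(String ticketId);
        }

        // Legacy path: imagine this wrapping slow file/batch/cmdline workflows.
        static class LegacyBatchAdapter implements TicketSystem {
            public String statusOf(String ticketId) {
                return "legacy batch lookup for " + ticketId;
            }
        }

        // New path: handles only the tickets it has been migrated to cover so far.
        static class RealtimeService {
            Optional<String> statusOf(String ticketId) {
                return ticketId.startsWith("RT-")
                    ? Optional.of("realtime status for " + ticketId)
                    : Optional.empty();
            }
        }

        // The facade callers see; over time more and more calls stay on the new path.
        static class Facade implements TicketSystem {
            private final RealtimeService modern = new RealtimeService();
            private final TicketSystem legacy = new LegacyBatchAdapter();

            public String statusOf(String ticketId) {
                return modern.statusOf(ticketId).orElseGet(() -> legacy.statusOf(ticketId));
            }
        }

        public static void main(String[] args) {
            Facade facade = new Facade();
            System.out.println(facade.statusOf("RT-42"));   // served by the new path
            System.out.println(facade.statusOf("OLD-7"));   // still falls back to legacy
        }
    }

The "more efficient variant" is the part I'd have spelled out in those posts; the sketch above is only the standard shape of the pattern.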

For now, assessing the matter of responsibility for what appears to be the end of that framework and that team, I have to blame myself because my background in anthropology and linguistics made me the only person who might have had a chance of understanding the cultural mechanisms that underlie Conway's Law.

But I don't blame myself much, because our team was doing some pretty serious development work, and DW, SN and our rookie developer JS were reasonably clear on the concepts. The cultural problems were operating at a "higher" level. Yes, if I had known at the beginning what I know now, I might have been able to do something about the cultural problem.

But hey - I'm not a $500-an-hour consultant with all the flashy creds.  Just a developer with too many years of experience.