<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Clintford Tech Review]]></title><description><![CDATA[A blog where I (_Jason_Clintford_) go on rants and explanations of things happening in the field of technology]]></description><link>https://clintford.tech</link><generator>RSS for Node</generator><lastBuildDate>Fri, 17 Apr 2026 18:01:51 GMT</lastBuildDate><atom:link href="https://clintford.tech/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[My agentic setup and using Remotion with Codex]]></title><description><![CDATA[Always sandbox and then Yolo
This is the ethos that I've been practising when it comes to agentic workflows, especially on my own hardware. I was a bit averse to the notion of giving an AI access to ]]></description><link>https://clintford.tech/my-agentic-setup-and-using-remotion-with-codex</link><guid isPermaLink="true">https://clintford.tech/my-agentic-setup-and-using-remotion-with-codex</guid><dc:creator><![CDATA[Jason Clintford]]></dc:creator><pubDate>Sun, 01 Mar 2026 20:49:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6939f4550da17f921e461f8c/ad47f18f-23ea-465c-8b7d-3e8e5daac548.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Always sandbox and then Yolo</h3>
<p>This is the ethos that I've been practising when it comes to agentic workflows, especially on my own hardware. I was a bit averse to the notion of giving an AI access to my system, even with limited permissions. But given my recent experience of these agents excelling when handed a free, capable environment with everything they need to get a task done, it's rather pointless to run them so restricted, in a sandboxed corner of your main system.</p>
<p>And given the restrictive nature of containers, along with known container escape vectors, the only way I could bring myself to run agents was on a hypervisor, bare metal or desktop, so I ended up using VMware Workstation Pro (which is still a thing, and is now free).</p>
<h3>Choice of virtualised environment</h3>
<p>For Codex or any other agent to properly test the things it builds, it needs to run them in a given environment, ideally the end environment they are ultimately supposed to run in. Therefore, I made two VMware boxes: one for Windows, updated to the latest version, and another for Ubuntu 24.04.</p>
<p>Most agents perform a lot better in Debian/Ubuntu environments under yolo mode, but for those times when you need to build something for Windows, or use something that is only available on Windows, it's much better to have that environment natively available than to rely on something like Wine.</p>
<h3>Most AI platforms as a service, aka Replit, are selling the same thing</h3>
<p>Replit recently released its animated videos functionality, which is fundamentally an AI agent with skills (documented functionality, structured prompts) and a custom library, something most users can replicate with Codex, Claude Code or opencode, given a bit of tweaking. It's possible to recreate this with a subscription to a SOTA model and a few fancy frameworks and skills, as Replit doesn't have a moat of non-recreatable custom technology around it.</p>
<p>And the beauty of finding the best combination of skills and methods these days is that specialist knowledge is democratised, given access to a decent model. It's easy to follow a chain of thought yourself and then flesh it out using a model, because modern chatbots and agents are very good at browsing the internet. Around the middle of 2025, I came to the realisation that my workflow of searching for things manually was bested by the results I was getting from models, at least when it came to general knowledge.</p>
<p>Specialised knowledge would still require you to go out and find resources yourself, given how models underperform in domains where they lack training data. But nowadays, with Codex 5.3 and Claude Opus 4.6, I barely search the internet manually at all, since these models can access whatever domains are necessary for the task being undertaken.</p>
<p>There are curated skills for Remotion that one can get from <a href="http://skill.sh">skill.sh</a> to help with this particular task, or you can tell Codex to go out and search for skills and best practices itself, provided you have the tokens for that as well. There really are no bounds to what you can make Codex do; it's just a matter of the turnaround time you are looking for.</p>
<p>OpenAI recently sped up Codex as well, so if you really feel like multitasking, you can write a <a href="http://prompt.md">prompt.md</a> that instructs Codex to set itself up, research, plan, build, test and iterate, while you focus on a totally different task yourself!</p>
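<p>For a sense of shape, a prompt.md for such a run might look like the sketch below. The phases and file names (notes.md, plan.md) are my own hypothetical convention, not anything Codex prescribes; adapt them freely:</p>

```markdown
# Task: produce a short Remotion video documenting <topic>

## Setup
- Verify Node and a Remotion starter project work in this VM; repair the environment if not.

## Research
- Look up Remotion skills and best practices; record sources and decisions in notes.md.

## Plan
- Draft a scene list with timings and subtitle text in plan.md before writing any code.

## Build, test, iterate
- Implement the scenes, render a low-resolution preview, review it against plan.md,
  and iterate. Only then produce the full-quality render.
```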
<p>My usual way is to first test that Codex can do what it needs to autonomously in the environment that has been set up, so I usually let it produce a demo artefact before moving onwards.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6939f4550da17f921e461f8c/8ad23b67-4778-4046-8258-9a0de32b4e50.png" alt="" style="display:block;margin:0 auto" />

<blockquote>
<p><em>Codex can research and make videos all by itself</em></p>
</blockquote>
<p>For the first test, I made it look up skills to help it achieve this task, then draft, time-scale, add subtitles, formulate, render and iterate until a proper video was produced. The idea in mind was to document the progression of OpenAI models to date.</p>
<p><a href="https://files.catbox.moe/mjqjd2.mp4">https://files.catbox.moe/mjqjd2.mp4</a>  </p>
<p>While the end result isn't entirely ideal, it's still quite impressive for what is a general coding model flexing and doing multiple tasks given tools. Just as a human with a tool can achieve wonders compared to a human without it, a generalist pre-trained AI gains the same leverage when tasked to work with tools it can control.</p>
]]></content:encoded></item><item><title><![CDATA[The evolution of abstraction ]]></title><description><![CDATA[Technology, and in general the way we build on the shoulders of the older generations, is an evolution of the ability of abstraction that humans possess. Humans grow up not questioning society, commer]]></description><link>https://clintford.tech/the-evolution-of-abstraction</link><guid isPermaLink="true">https://clintford.tech/the-evolution-of-abstraction</guid><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[AI]]></category><category><![CDATA[agents]]></category><dc:creator><![CDATA[Jason Clintford]]></dc:creator><pubDate>Fri, 20 Feb 2026 19:46:32 GMT</pubDate><enclosure url="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/6939f4550da17f921e461f8c/36070c51-dde3-4a68-a95a-03281809996c.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Technology, and in general the way we build on the shoulders of the older generations, is an evolution of the ability of abstraction that humans possess. Humans grow up not questioning society, commerce, technology and life in general, but with the ability to compound upon and build further on advancing abstraction to a higher level, which shows that it is indeed a tangible technical method that is effective with real-world implications.</p>
<p>Yet there is a part of me that strongly believes in understanding life in an end-to-end manner. That would lead to better utilisation, discoveries, optimisations and understanding, but it stands in stark contrast to how humans at large operate.</p>
<p>As we approach the point of maturity in big data and big compute, with artificial intelligence getting ever closer to general intelligence and a bit more, this is ever more apparent. The benefits of abstraction simply outweigh full-scope understanding, because humans are computationally limited, in the same way modern LLMs have to rely on pretrained data, unlike AlphaGo Zero, which had a far more manageable sample space.</p>
<h3>Agents</h3>
<p>Agents are the latest wave of abstraction machines, set to take the place of a human in the loop: consulting sources, implementing, reading documentation, and then troubleshooting using tools.</p>
<p>Humans no longer have to be an intermediary layer between these discrete processes; we have harnesses that can take an overarching idea and put it into motion, letting us move to a higher order of abstraction and thought. This new paradigm helps the human retain more of the context relevant to the project as a whole, improving overall directional control, while not having to concern themselves with, or provide oversight for, the smaller mechanical implementations. It also lets the human iterate on the idea and direction through the pipeline, making the human more akin to a consumer, looking at end goals and implementations; indeed, the best skill in this department these days is the writing of requirements-elicitation documents, better known as a host of markdown documents.</p>
<p>This structure already existed as clusters of human workforces: architects and product designers worked at a higher level, while implementation was abstracted away to different stratified workers. The main achievement of this technology is the same as that of every other technology throughout history: it is a work multiplier. What once took the effort of multiple people can now be done by one, leading to a mass climb up the ladder of abstraction, after which people resettle in a completely new stratified environment.</p>
<blockquote>
<p>Anybody can be a Steve Jobs these days</p>
</blockquote>
<p>Agents, and the advancements in AI in general, have simply made being a visionary more attainable to the masses. Individuals can spend more time observing, thinking and contemplating while abstracting away technical requirements, years of mechanical experience and repetitive enumeration. Much like how the printing press eased the dissemination of ideas, since you no longer needed scribes who had to be both learned and willing to slave away at a desk, the current advancements in machine learning will bring the ability to build, to visualise one's own imagination, to almost everybody.</p>
]]></content:encoded></item><item><title><![CDATA[Tools in the age of prompting]]></title><description><![CDATA[We’ve amassed the power of big data and crunched through the majority of written human history to gain the ability that lets us expand little sentences into fully fledged and almost functional things. These things may be simply a mix of software, aud...]]></description><link>https://clintford.tech/tools-in-the-age-of-prompting</link><guid isPermaLink="true">https://clintford.tech/tools-in-the-age-of-prompting</guid><category><![CDATA[#PromptEngineering]]></category><category><![CDATA[ideas]]></category><dc:creator><![CDATA[Jason Clintford]]></dc:creator><pubDate>Sat, 03 Jan 2026 22:45:17 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767480246934/15d2e788-8ddd-4a7b-98b7-5373a9c5e650.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We’ve amassed the power of big data and crunched through the majority of written human history to gain the ability that lets us expand little sentences into fully fledged and almost functional things. These things may be simply a mix of software, audio, visual, and textual content, but it is still an amazing feat, harnessing the power of linguistic expression.  </p>
<p>Linguistic expression is one of the most important skills, if not the most important skill, to master at the moment. As humans, we have tokenised and quantified our perceptions and thoughts through words, and fed said tokenisation into a computer, which has, in turn, gained the ability to tokenise the tokenisation and mimic human thought to such an extent that we can see tangible improvements and capabilities stemming from what is basically linguistic inference performed by LLMs. This is still such a groundbreaking event to me: seeing language have a tangible effect beyond its impact on the human brain.</p>
<p>I do believe that everything has a tangible effect, even though it may not be apparent at first. We as humans work through heavy abstraction for survival, so seeing the underlying workings of everything isn’t our top priority.</p>
<p>Regardless, the best tool that we can have right now is something that lets us think, lets us detail things linguistically, and orchestrate a great visualisation in the mind of a reader, or in this case, within the context of a given LLM.  </p>
<p>This had been lingering in the back of my head for a while: the power of the written word, and the pipeline from thought to tokenised thought. I believe it is the task that mainly bottlenecks modern technology involving humans, and, contrary to what most people think, humans are essential for this technology to exist (well, currently…). All said and done, tokenised data is useless without a human to acknowledge, understand and consume it, realising the value it had in the first place. But since the main bottleneck is articulation, the rate at which humans can begin their internal tokenisation, a tool can be of great help at the prompt/idea generation stage, not just the prompt-to-execution stage.</p>
<p>Yes, it is possible to simply prompt an existing SOTA model about an idea and have it interactively ask questions to refine it, but something more structured, visual and laid out will have a bigger impact on most individuals, because for humans, more than just chunks of text leaves an impression. That is the main thing yet to improve in the field of AI: a captivating, engaging UI through which it can interact with the regular user. I believe this is why adoption of the technology by the layman isn’t as widespread as it could be.</p>
<p>So, to try to tackle this problem myself, I ventured into aistudio.google.com, where I found that Google had somewhat done what I was looking for, something called the Proactive Co-Creator, but it mainly focuses on image, story and video prompting assistance. It lacked the touch of thought generation and refinement that I was looking for.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767478940912/52cb53e4-2ac6-4893-85fa-34b19aae5b5d.png" alt="Proactive Co Creator" class="image--center mx-auto" /></p>
<p>So, as there is plenty of vibe coding present within the Google ecosystem, I laid out my requirements after great introspection and let Google take the wheel to spin up a version that best suited them. With it, I gained a tool that is indispensable in almost all the branches in which I have vested interests.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767479411940/4ea94318-610c-4db7-9dc1-d67c1341340c.png" alt class="image--center mx-auto" /></p>
<p>My version of the Proactive Co-Creator has an Idea section, where I can still use the amazing Mind Map and Clarify sections as normal, but can now explore a given idea as different variations and expand it in ways that feel like a very inquisitive friend dissecting your output.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767479622033/53deed6a-de21-4c65-8590-e04a3b7a0b22.png" alt class="image--center mx-auto" /></p>
<p>The default versions of Clarify and Mind Map are still quite effective when used by themselves; they really make you think about what you are thinking, in a very metacognitive way that breaks down abstractions and helps in the discovery of fundamentals.</p>
<p>I use this tool alongside targeted reading that helps me textualise my ideas, such as The Art of Description: World into Word. <a target="_blank" href="https://www.amazon.co.uk/Art-Description-Mark-Doty/dp/1555975631">https://www.amazon.co.uk/Art-Description-Mark-Doty/dp/1555975631</a>.  </p>
<p>As a human in this worldwide economy of information consumption and creation, the best thing that can be done to improve efficiency with modern technology is to focus on the root that drives all we have right now: idea generation.</p>
]]></content:encoded></item><item><title><![CDATA[The world of audio technology and the audiophile]]></title><description><![CDATA[The distinct difference between the two worlds of audio and video is quite stark when viewed from an external perspective. If a person were to go ahead and buy a projector, TV or a monitor, there are ]]></description><link>https://clintford.tech/the-world-of-audio-technology-and-the-audiophile</link><guid isPermaLink="true">https://clintford.tech/the-world-of-audio-technology-and-the-audiophile</guid><category><![CDATA[Audiophile]]></category><category><![CDATA[iem]]></category><category><![CDATA[headphones]]></category><category><![CDATA[reference]]></category><dc:creator><![CDATA[Jason Clintford]]></dc:creator><pubDate>Wed, 10 Dec 2025 23:46:41 GMT</pubDate><enclosure url="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/6939f4550da17f921e461f8c/2cc3dec2-eab7-491b-b978-6aa46e4752e5.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765410720603/16752dee-6b42-46a3-9499-9e60a93ba292.jpeg" alt="" style="display:block;margin:0 auto" />

<p>The distinct difference between the two worlds of audio and video is quite stark when viewed from an external perspective. If a person were to go ahead and buy a projector, TV or a monitor, there are measurable and reliable metrics to judge the value proposition that you get compared to a given price tag. Be it size, resolution, dot pitch, pixel density or colour gamut, it’s easy to pin a scaling value on the money spent, or the price tags which accompany products.</p>
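<p>That mechanical relationship is easy to demonstrate. Here is a toy sketch (pure Python; the 27-inch 4K monitor is just an illustrative spec, not a recommendation) showing pixel density falling straight out of resolution and diagonal size:</p>

```python
import math

def ppi(width_px, height_px, diagonal_in):
    """Pixels per inch: diagonal resolution divided by diagonal size."""
    return math.hypot(width_px, height_px) / diagonal_in

# A 27-inch 4K (3840x2160) panel:
print(round(ppi(3840, 2160, 27), 1))  # → 163.2
```

<p>No listening panel required: the number is the same no matter who computes it, which is exactly the property enthusiast audio lacks.</p>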
<p>There are obviously exceptions to the rule, be it bedazzled frames or handcrafted oak panelling on rare occasions, but that is still tangible value with an objective measurement.</p>
<p>But if one were to take a single look at enthusiast audio equipment, be it headphones, IEMs, or floor-standing speakers, they would be washed in a flurry of marketing buzzwords, empty reviews, and little to no indication of an objective performance metric.</p>
<div>
<div>💡</div>
<div>The world of Audio exists on a totally different spectrum</div>
</div>

<p>And this is not a good thing overall. This spectrum draws parallels to the worlds of luxury watches, wine, and modern art. There is nothing inherently wrong with that in itself, but audio does it while masquerading as a piece of technology that can provide a better experience than another given piece of technology.</p>
<p>In other words, audio gear borrows the language of engineering while behaving like a lifestyle trinket. It is sold as precision hardware, then defended as if it were a painting that nobody is allowed to criticise because it is all a matter of “personal taste”.</p>
<h3>Two tribes sharing one hobby</h3>
<p>There are two schools of thought within this hobby, and they stand in stark contrast to one another. One group is mainly concerned with audio reproduction being faithful to the source. They use audio equipment to relive a moment in time in the real world, to recreate a composition performed by a real orchestra, to recreate the human voice as if it were spoken by a human. This group is mainly concerned with minimising any and all artefacts introduced by the transducer system in place.</p>
<p>They use measurements, science and reasoned logic spanning the basics of sound: frequency response, distortion, phase, impulse behaviour, channel matching, and maintaining a baseline reference to ground truth. Equipment is evaluated not just by ear, but through repeatable, reproducible tests, scientific graphs and known comparisons.</p>
<p>The second group abstracts away all elements of sound and resorts purely to the neurological perception of sound and the subjective choice of how it is presented to them. Gone is any understanding of what sound is; it is redefined as an emotion, almost a religious attribute. Sound is described through vague, subjective terms which carry no value outside one hive-minded frame of reference, using words such as "liquid mids", "inky timbre", "tight bass" and "warmth". Science is no longer an element of thought, and pseudoscience and scams run rampant in this field, but the biggest flaw stemming from this school is the attempt to pass off subjective attributes as objective, inherent value.</p>
<p>And did I mention that all of the above is done without any reproducible scientific basis? That should go without saying, given the cultish nature of the group.</p>
<p>The problem is not that people enjoy their equipment or use colourful language. The problem is that this second world loudly claims technical superiority while actively rejecting the very tools that exist to verify technical performance. It wants the status and pricing of engineering, with the accountability of poetry.</p>
<h3>What can actually be measured</h3>
<p>If someone genuinely wants better audio performance, there are hard, boring, unglamorous numbers that matter. At the most basic level you have:</p>
<ul>
<li><p>Frequency response, which tells you how loud different frequencies are relative to each other. This is the main determinant of how something sounds. Peaks and dips map cleanly to “brightness”, “muddiness”, “shout”, “hollowness”.</p>
</li>
<li><p>Harmonic and intermodulation distortion, which quantify how much the driver adds content that was not in the original signal. At high enough levels, this is not “character”, it is simply error.</p>
</li>
<li><p>Time domain behaviour, such as impulse response, group delay and cumulative spectral decay, which show how quickly the driver starts and stops, how cleanly it handles transients, and whether it rings or smears detail over time.</p>
</li>
<li><p>Channel matching and unit consistency, which indicate whether both ears are hearing the same thing, and whether one person’s glowing review even applies to the unit that another person will buy.</p>
</li>
</ul>
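<p>To make the distortion bullet concrete, here is a minimal sketch of estimating THD on a synthetic signal. It is pure Python with a single-bin DFT per harmonic (no audio libraries, no measurement rig; the 1&nbsp;kHz test tone and its 1% second harmonic are made-up illustrative numbers, and a real measurement would capture an actual driver's output instead):</p>

```python
import math

def tone(freqs_amps, fs, n):
    """Synthesise a signal as a sum of sinusoids from (frequency, amplitude) pairs."""
    return [sum(a * math.sin(2 * math.pi * f * t / fs) for f, a in freqs_amps)
            for t in range(n)]

def bin_amplitude(signal, fs, freq):
    """Single-bin DFT: amplitude of `signal` at `freq` (freq must land on a DFT bin)."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * t / fs) for t, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * t / fs) for t, s in enumerate(signal))
    return 2 * math.sqrt(re * re + im * im) / n

def thd(signal, fs, fundamental, n_harmonics=5):
    """THD: RMS sum of harmonic amplitudes relative to the fundamental's amplitude."""
    fund = bin_amplitude(signal, fs, fundamental)
    harm = math.sqrt(sum(bin_amplitude(signal, fs, k * fundamental) ** 2
                         for k in range(2, n_harmonics + 2)))
    return harm / fund

# 0.1 s of a 1 kHz tone carrying a 1% second harmonic: THD should come out near 0.01.
fs, n = 48000, 4800
sig = tone([(1000, 1.0), (2000, 0.01)], fs, n)
print(round(thd(sig, fs, 1000), 4))  # → 0.01
```

<p>The point being that the "added content" described above is a number you can compute and reproduce, not a vibe.</p>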
<p>None of these are mystical. They are direct descriptions of how the device behaves. Measurement rigs are not perfect, human ears are not identical, and there is always some uncertainty, but the escape route of “measurements do not matter” collapses once you realise that every listening experience is still grounded in the same physical event: pressure at the eardrum.</p>
<p>If two headphones produce the same pressure curve over time into the same ear, they will sound the same. There is no hidden fifth dimension where a cable changes “microdynamics” without affecting any measurable variable. That fantasy only survives when nobody insists on looking at the data.</p>
<h3>Snake oil thrives in the gaps</h3>
<p>The lack of standardised, consumer-facing metrics in audio is not an accident. It is fertile ground for nonsense.</p>
<p>Once you remove objective anchors, almost anything can be sold. A multi-driver IEM with chaotic crossover behaviour and phase issues can be framed as “tri-brid technology that unlocks holographic layering”. A cable that cannot be distinguished from a cheap copper one in a blind test becomes “cryogenically treated silver that reveals inner detail”. Power conditioners, magic fuses, exotic stands, “audiophile” network switches, all of it slots neatly into a universe where the only rule is that nothing needs to be proven.</p>
<p>The language also flips the burden of proof. If a person questions a claim, they are told their “chain is not resolving enough”, their ears are “not trained”, or they lack the right emotional connection to the music. In any other technical field, the person making the claim would be expected to show data. In this one, scepticism is treated as a character flaw.</p>
<p>The irony is that when these devices are finally put under proper scrutiny, they either do nothing measurable at all, or they actively degrade the signal. The supposed “warmth” or “musicality” often correlates with simple things like a bass boost, a treble roll off, or elevated distortion. You do not need alchemy to explain it, a graph will do.</p>
<h3>The single goal that actually matters</h3>
<p>At the core, high fidelity has one job. Take a recording and reproduce it as faithfully as possible. Everything else is a side quest.</p>
<p>From that perspective, many of the modern “innovations” in enthusiast audio look less like progress and more like ornamental detours. Complex multi-driver arrays that struggle with crossover integration, hybrid designs that trade coherence for brochure bullet points, artificial “air” from sharp treble peaks, all of it drifts away from the simple aim of getting out of the way.</p>
<p>A well-designed single-driver headphone or IEM that keeps distortion low, maintains a sensible tuning and behaves predictably across volume levels does more for actual fidelity than a labyrinth of drivers pretending to be a miniature line array in your ear canal. The physics of small acoustic volumes already favour low excursion and low distortion, yet much of the industry chooses unnecessary complexity because complexity photographs better.</p>
<p>From an analytical standpoint, the hierarchy is straightforward. If a device cannot demonstrate clean, repeatable, predictable behaviour in the basic measurable domains, then it has no business being marketed as a reference or high performance product, regardless of how many walnut boxes or braided cables it ships with.</p>
]]></content:encoded></item></channel></rss>