My agentic setup and using Remotion with Codex

Always sandbox and then Yolo
This is the ethos I've been practising with agentic workflows, especially on my own hardware. I was initially averse to the notion of giving an AI access to my system, even with limited permissions. But my recent experience has been that these agents excel when given a free and capable environment with everything they need to get a task done, so it's rather pointless to run them heavily restricted in a sandbox inside your main system.
And given the restrictive nature of containers and the container-escape vectors they carry, the only way I could bring myself to run agents unrestricted was under a proper hypervisor, either bare-metal or desktop, so I ended up using VMware Workstation Pro (which is still a thing and is now free).
Choice of virtualised environment
For Codex or any other agent to properly test what it builds, it needs to run it in a real environment, ideally the environment it is ultimately meant to run in. I therefore made two VMware boxes: one for Windows, updated to the latest version, and another for Ubuntu 24.04.
Most agents perform noticeably better on Debian/Ubuntu environments under YOLO mode, but for those times when you need to build something for Windows, or use something that only exists on Windows, it's much better to have that environment natively available than to rely on something like Wine.
Most AI platforms-as-a-service, like Replit, are selling the same thing
Replit recently released its animated-videos functionality, which is fundamentally an AI agent with skills (documented functionality, structured prompts) and a custom library. Most users could try to replicate this with Codex, Claude Code, or opencode, given a bit of tweaking: a subscription to a SOTA model plus some frameworks and skills gets you most of the way, as Replit doesn't have a moat of unreplicable custom technology around it.
And the beauty of finding the best combination of skills and methods these days is that specialist knowledge is, in a way, democratised, provided you have access to a decent model. It's easy to follow a chain of thought yourself and then flesh it out with a model, because modern chatbots and agents are very good at browsing the internet. Around the middle of 2025 I came to the realisation that my workflow of searching for things manually was being bested by what models were returning, at least for general knowledge.
Specialised knowledge still requires you to go out and find resources yourself, given models' weaker performance in domains where they lack training data. But nowadays, with Codex 5.3 and Claude Opus 4.6, I rarely search the internet manually, since these models can reach the domains the task at hand requires.
There are curated Remotion skills on skill.sh that help with this particular task, or you can tell Codex to go and search for skills and best practices itself, provided you have the tokens for that too. There really are no bounds to what you can make Codex do; it's just a matter of the turnaround time you are looking for.
OpenAI also recently sped up Codex, so if you really feel like multitasking, you can write a prompt.md that instructs Codex to set itself up, research, plan, build, test, and iterate while you focus on a totally different task yourself!
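As a rough sketch (the headings and steps here are my own, not any canonical format), such a prompt.md might look like:

```markdown
# Task: produce a short Remotion explainer video

## Setup
- Check that Node.js is available; scaffold a Remotion project if one does not exist.

## Research
- Search for Remotion skills and best practices, and summarise what you will apply.

## Plan
- Draft the script, scene timing, and subtitle cues before writing any code.

## Build, test, iterate
- Implement the composition, render a preview, review it, and fix issues.
- Repeat until the render matches the plan, then produce the final MP4.
```

The point is less the exact wording and more that each phase gives the agent a checkpoint to verify before moving on.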
My usual approach is to first confirm that Codex can do what it needs to autonomously in the environment I've set up, so I have it produce a demo artefact before moving on.
Codex can research and make videos all by itself
For the first test, I had it look up skills to help it achieve the task, then draft, time-scale, add subtitles, formulate, render, and iterate until a proper video was produced. The idea was to document the progression of OpenAI's models to date.
https://files.catbox.moe/mjqjd2.mp4
While the end result isn't entirely ideal, it's still quite impressive for what is a general coding model flexing across multiple tasks when given tools. Just as a human with a tool can achieve wonders compared to one without, a generalist pre-trained AI gains the same leverage when tasked to work with tools it can control.
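To make the time-scaling and subtitle step above concrete, here is a minimal sketch of the kind of helper an agent might write for it. The names (`Cue`, `toFrameCues`, `cueAtFrame`) are my own illustration, not code from my actual run: cues authored in seconds are mapped to frame ranges at the composition's fps, mirroring how Remotion addresses time by frame number.

```typescript
// A subtitle cue as a human would author it, in seconds.
interface Cue {
  text: string;
  startSec: number; // when the subtitle appears
  endSec: number;   // when it disappears
}

// The same cue expressed in frames, as Remotion thinks about time.
interface FrameCue {
  text: string;
  fromFrame: number;        // first frame the cue is visible
  durationInFrames: number; // how many frames it stays on screen
}

// Convert second-based cues into frame-based ones for a given fps.
function toFrameCues(cues: Cue[], fps: number): FrameCue[] {
  return cues.map((cue) => ({
    text: cue.text,
    fromFrame: Math.round(cue.startSec * fps),
    durationInFrames: Math.max(1, Math.round((cue.endSec - cue.startSec) * fps)),
  }));
}

// Find which cue, if any, should be rendered at a given frame.
function cueAtFrame(frameCues: FrameCue[], frame: number): FrameCue | undefined {
  return frameCues.find(
    (c) => frame >= c.fromFrame && frame < c.fromFrame + c.durationInFrames,
  );
}
```

In an actual Remotion composition, each FrameCue would typically back a `<Sequence from={...} durationInFrames={...}>` wrapping the subtitle text, so the renderer shows exactly one cue per frame range.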


