On crushing older tests
About 6 years ago, I wrote this, about a benchmark test that did a 2TB write in 73s or so, on pure spinning disk. That result was just so far out there, compared to pretty much anything else available, in terms of performance density (single rack of storage units). The hardware was Scalable Informatics Unison storage, designed to be an IO monster in all respects. It was. Way ahead of its time.
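For a sense of scale, here is a quick back-of-the-envelope calculation of the sustained write rate that figure implies. It is only a sketch: it takes the "73s or so" at face value, and since the excerpt does not say whether "2TB" is decimal or binary, both readings are shown. The function name is made up for illustration.

```python
# Rough sustained-throughput estimate for a "2TB write in 73s".
# Assumptions (not stated in the post): exactly 73 seconds, and either
# decimal terabytes (10**12 bytes) or binary tebibytes (2**40 bytes).

def throughput_gb_per_s(total_bytes: float, seconds: float) -> float:
    """Return average throughput in decimal gigabytes per second."""
    return total_bytes / seconds / 1e9

decimal_tb = throughput_gb_per_s(2 * 10**12, 73)  # 2 TB, decimal reading
binary_tib = throughput_gb_per_s(2 * 2**40, 73)   # 2 TiB, binary reading

print(f"{decimal_tb:.1f} GB/s if 2TB means 2 * 10**12 bytes")
print(f"{binary_tib:.1f} GB/s if 2TB means 2 * 2**40 bytes")
```

Either way, that is somewhere in the neighborhood of 27 to 30 GB/s of sustained writes.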
On using legacy tooling in modern HPC systems
Or as House MD may have put it So there you are, working on a system with a group, when you realize that something is out of kilter. And you think to yourself … It’s not DNS. There is no way it’s DNS. It was DNS. So your team works on resolving the issue. And the tooling they use … the tooling. Is from the late 80s/early 90s. There are so many … better … easier to use … tools.
On risk and how to mitigate it
On 1-March-2020, I wrote this article. In it I argued that the risk benefit/reward equations have been thrown out of kilter by the pandemic. Or, rather than being thrown out of kilter, maybe they are reverting to a more natural state, where risks that had previously been discounted are now showing their true (or more nearly true) values. A former SGI colleague, and now HBS professor, Willy Shih, wrote a great article on how management might wish to adapt to this reconfiguration of risk strength/value.
Not a math post, I promise. Really about teams. You have a team of people. You have a mission. You have a (short) timeline. You need them to focus on the problem, and find the minimum temporal path length process to achieve a resolution. You have a process, albeit informal, to address issues, which is in place, functioning well, solving problems. Then someone loops someone else into the effort. Who starts quoting paragraph and verse out of how they would like it to work.
Thoughts on configuration management vs image artefact management
Years ago … ok … decades ago, when I was building my first large clusters, I worried about configuration and drift. OS installers are notoriously finicky, and one of the hard lessons is that you should spend as little time inside them as you possibly can. Do the bare minimum work you need to in order to get a functional system, and handle everything else after the first boot. I actually learned this lesson at SGI, while writing Autoinst, a tool to handle large-scale OS deployment.
The link to his obituary. A wonderful person, deeply insightful, excellent communicator. Gone too soon. I will miss him. I think everyone in #HPC will.
On optimizing scripting languages, and where they are useful and where they are not
So yesterday, an article and discussion appeared on Hacker News. In the article, the author asks reasonable questions about how to optimize some Python code. What happened next probably wasn’t what they intended. The article was, ostensibly, on optimizing Python code. After the 4th attempt at source-level optimization, it switched languages. It was no longer written in Python, this attempt to optimize … Python code. Ok. So the code they were “optimizing” was trivial, and not really indicative of any particular workload.
Updating compressors for NyBLE
Compression tools like gzip and bzip2 have been around quite a long time. They are, well, mature. Almost boring. You depend upon them for many things. You don’t really pay attention to them until you use them for significant work. With NyBLE, compression and decompression are important, time-sensitive steps … well … decompression is anyway … in the boot process. My preference is generally for tools that enable me to use the full processing power of an underlying machine.
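To make the "full processing power" point concrete, here is a minimal sketch of one way a compressor can spread work across cores: split the input into chunks, gzip each chunk independently, and concatenate the results. This is not what NyBLE does and not a stand-in for any particular tool; the function name, chunk size, and worker count are all assumptions picked for illustration. It relies on the fact that concatenated gzip members form a valid gzip stream, which Python's `gzip.decompress` reads back as one payload.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

def parallel_gzip(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Compress fixed-size chunks independently and join them.

    Hypothetical helper for illustration only. Each chunk becomes its own
    gzip member; the concatenation is still a valid gzip stream.
    """
    # Slice the input into chunk_size pieces (one empty chunk for empty input).
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    # Threads are enough here as long as the underlying zlib calls release
    # the interpreter lock while compressing, as CPython's do.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = list(pool.map(gzip.compress, chunks))  # map preserves order
    return b"".join(members)

# Made-up, highly repetitive payload standing in for real data.
payload = b"example boot image payload " * 200_000
packed = parallel_gzip(payload)
restored = gzip.decompress(packed)  # handles the multi-member stream
print(len(payload), len(packed), restored == payload)
```

The trade-off is a slightly worse compression ratio, since each chunk starts with an empty dictionary, in exchange for keeping more than one core busy.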
My urgent #HPC computational project, COVID-19 related
This is the project that I alluded to. We tuned the system, the code, the environment. We wrote tooling to massively simplify job creation and submission. Moreover, we worked around numerous issues that arise in each technological layer. Multiple simultaneous tools are being deployed to work on this, and I am hopeful that the net result of this is a few small molecules that will have action against this disease.
Time to replace some hardware
I built an updated RAID pair of drives with a brand new OS for the system that underlies this blog and other services. Basically, the previous system load had been updated from Debian 7 through Debian 9 and had accumulated lots of cruft. So I rebuilt this using my wonderful NyBLE system on a lab machine. I moved most of the config over from the live system. Switched over and spent about 2 hours fixing up the missing services (things I forgot to enable, etc.