Midnight Pub

mildly infuriating

~madqubit

Alright. One small gripe about my job.

We are attempting to transition our batch processing and Event Stream Processing from a centralized system (Mainframe) to a Distributed computing system.

There are many flaws with the ideologies behind both, and there are valid points on both sides of this argument. The big dollar guys can’t seem to stand on their preferred side though. The IBM subscription fees are enormous. The cost to upgrade the Distributed servers is huge every five years. In all reality it’s a capex for the company so they can write them all off. The infuriating part for me is that half of the jobs that take place for our operations are scheduled on the mainframe’s ESP to run in the Distributed network.

So if the job fails, there’s no way for the operators to get a log of any sort, since it’s all generated “in the cloud” and only the analysts have access to those logs. But half of the time the analysts are asking for logs and we as OPs can’t give them any. Now if we chucked those jobs onto the Distributed ESP (yes, we have two separate ESP schedulers. I know, gross) I could barf out a spool file without issue.

TLDR

If you’re a big dollar manager for a company that is in the same boat:

The operators and analysts don’t care. Just pick a platform.


tetris

Distributed as in SLURM? Or distributed as in some P2P scheduler?

IMO nothing beats a centralised service, just due to how easy it is to manage. Yes, it runs hot and is always overloaded, but you don't have to manage the nightmare of streaming resources over a spotty network or running from machine to machine trying to figure out why a job didn't take.


madqubit

Much like SLURM. We use Broadcom (CA) workload directors. One server acts as the scheduler, then all of the workloads get pushed to the servers. Mainly Oracle’s suite of computation, Informatica, Dataserv, etc. We have a lot of internal data processing as well, but they’re really trying to get off of it and shoehorn it into something else. It’s quite sad really. The in-house tools are so reliable and beautifully done on the backend. The front end… well. They were definitely made by a backend dev. It isn’t really pretty to look at, but dang it, everything is there and it makes logical sense where everything is.

Spotty networks are the bane of my existence. Oh? This super important job failed because of a ping spike? Let me call an analyst at 3AM and deal with their grumpiness because of a reroute that caused latency to go up for a few seconds.

Only time something fails on the mainframe is if the analyst broke the code, forgot to update the JCL, or a resource like IMS is down for maintenance and they forgot to hold the job.


tetris

We used the Sun Grid Engine in my old group before it got bought and destroyed by Oracle, and it was janky and ugly as hell -- but it worked, and everyone just needed a day or two to understand how it worked before submitting jobs to it.

Very small, easy-to-configure piece of software for a specific task. I look at new systems like Hadoop or whatever Amazon is doing these days, and I'm completely lost at how to interact with them.

(Do I need to create an account? Is it a local account? Does every job need to be registered in order to run? etc.)
