ITECS

IBM Workload Scheduler/Automation Delivery Consultant - Riverwoods IL

Riverwoods, IL

Other

Posted 3 days ago


Job description

Location: Riverwoods, IL

IBM Workload Scheduler (IWS) Administrator / Infrastructure Engineer

Role Overview

  • Ability to modernize, implement, install, configure, upgrade, migrate, develop, or design IBM Workload Scheduler (IWS) / IBM Workload Automation (IWA) solutions.
  • Support migration activities across pre-production and production environments.
  • Participate in knowledge transfer and documentation to enable team self-sufficiency.

Position Details

  • Title: IBM Workload Scheduler Administrator / Infrastructure Engineer
  • Reports To: Senior Manager, Software Engineering
  • Work Schedule: Monday through Friday, 9:00 AM to 5:00 PM (US Central)
  • Full-time position with:
    • Occasional weekend change-control support
    • Rotating on-call schedule shared with two other team members
  • Location: Riverwoods, IL (Preferred: 3 days onsite per week)
    • Remote may be considered for exceptional candidates

Job Summary

  • Seeking a highly skilled professional with 3 to 5+ years of dedicated IBM Workload Scheduler administration experience.
  • Responsible for managing, maintaining, and optimizing enterprise batch scheduling infrastructure.
  • Primary environment hosted on Red Hat Enterprise Linux (RHEL).
  • Requires strong expertise in:
    • IBM Workload Scheduler (IWS)
    • Linux System Administration
    • Scripting and Automation
  • Focus on ensuring high availability and reliable execution of critical business workloads.

Key Responsibilities

IBM Workload Scheduler Administration

  • Administer Production IBM Workload Scheduler (formerly Tivoli Workload Scheduler) environment:
    • 28,000 unique daily jobs
    • Approximately 350,000 daily job runs
    • 44 servers
    • Three additional change-control environments
  • Install, configure, administer, patch, and upgrade IWS components:
    • Master Domain Manager (MDM)
    • Dynamic Agents
    • Dynamic Pools
    • Dynamic Workload Console (DWC)
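The volume figures above work out to about 12.5 runs per unique job per day and roughly 4 job runs per second around the clock. A minimal Python sketch of that arithmetic, assuming runs are spread evenly (real batch load is bursty, and not all 44 servers execute jobs, so the per-server figure is only a coarse average):

```python
# Figures quoted in the posting for the Production IWS environment.
UNIQUE_DAILY_JOBS = 28_000
DAILY_JOB_RUNS = 350_000
SERVERS = 44
SECONDS_PER_DAY = 24 * 60 * 60


def workload_profile(unique_jobs: int, daily_runs: int, servers: int) -> dict[str, float]:
    """Derive rough daily averages, assuming load is spread evenly."""
    return {
        "runs_per_unique_job": daily_runs / unique_jobs,
        "runs_per_second": daily_runs / SECONDS_PER_DAY,
        "runs_per_server": daily_runs / servers,
    }


if __name__ == "__main__":
    for name, value in workload_profile(UNIQUE_DAILY_JOBS, DAILY_JOB_RUNS, SERVERS).items():
        print(f"{name}: {value:,.2f}")
```

Run as a script, it prints 12.50 runs per unique job, 4.05 runs per second, and 7,954.55 runs per server.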

Change Management & Governance

  • Work closely with Product Owners and communicate workstreams through Jira.
  • Manage job promotions using a Workload Application Template-based process.
  • Perform safety and stability assessments for all job promotions.
  • Manage change control across four separate environments.
  • Enforce change management standards, policies, and governance.

Platform Availability & Operations

  • Maintain and continuously improve Production platform uptime target of 99.17% per month.
  • Follow SOPs, DevOps practices, and disciplined change-control processes.
  • Coordinate platform-impacting communications to a user community of approximately 500 developers and data engineers.
  • Support Production infrastructure consisting of:
    • 44 servers
    • MDM, DWC, and Agent environments
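A 99.17% monthly uptime target leaves 0.83% of the month as the downtime budget, about 358.6 minutes (just under six hours) in a 30-day month. A minimal Python sketch of the calculation, assuming the target is measured against wall-clock minutes in the calendar month:

```python
def downtime_budget_minutes(uptime_pct: float, days_in_month: int) -> float:
    """Minutes of allowed downtime in a month for a given uptime target (percent)."""
    minutes_in_month = days_in_month * 24 * 60
    return minutes_in_month * (1 - uptime_pct / 100)


if __name__ == "__main__":
    # Month length changes the budget, so show the common cases.
    for days in (28, 30, 31):
        budget = downtime_budget_minutes(99.17, days)
        print(f"{days}-day month: {budget:.1f} min (~{budget / 60:.2f} h)")
```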

Troubleshooting & Support

  • Resolve:
    • Complex job failures
    • Performance bottlenecks
    • Agent-related issues
    • Infrastructure-related issues
  • Provide guidance on complex job scheduling designs to less experienced team members.

Monitoring, Security & Compliance

  • Monitor scheduler platform health and performance.
  • Manage database maintenance activities.
  • Perform backup, disaster recovery, and monthly failover testing.
  • Define and maintain:
    • Security policies
    • User authorizations
    • Authentication for Dynamic Workload Console (DWC)
  • Respond to:
    • Cybersecurity vulnerability assessments
    • PCI compliance audits
    • Other regulatory audit requests
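Routine health monitoring often starts with scanning the WebSphere Liberty messages.log for warnings and errors. A minimal Python sketch, assuming IBM-style message IDs that end in a severity letter (the regex and the file path are illustrative assumptions, not official IWS tooling):

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Assumption: IBM-style message IDs are 4-6 uppercase letters, 3-4 digits, and a
# one-letter severity suffix (I = info, W = warning, E = error), e.g. "CWWKZ0002E".
# Adjust the pattern to match the IDs that actually appear in your logs.
MESSAGE_ID = re.compile(r"\b([A-Z]{4,6}\d{3,4})([IWE])\b")


def count_problem_messages(lines: Iterable[str]) -> Counter:
    """Count warning (W) and error (E) message IDs found in log lines."""
    counts: Counter = Counter()
    for line in lines:
        for prefix, severity in MESSAGE_ID.findall(line):
            if severity in ("W", "E"):
                counts[prefix + severity] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py <path to messages.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for message_id, n in count_problem_messages(log).most_common(10):
            print(f"{message_id}: {n}")
```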

Automation & DevOps

  • Design and implement Ansible-based automation solutions.
  • Develop self-healing mechanisms to reduce unplanned outages.
  • Coordinate with offshore teams performing SOP activities during non-business hours.
  • Develop automation scripts using:
    • Python
    • IWS REST APIs
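A self-healing check can be as simple as confirming that a service is active and restarting it if it is not. A minimal Python sketch using `systemctl is-active` and `systemctl restart`; the unit name passed on the command line is a hypothetical placeholder, the script needs sufficient privileges to restart units, and a production mechanism would also log and escalate repeated failures:

```python
import subprocess
import sys
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def run_cmd(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command and capture its output as text (no exception on failure)."""
    return subprocess.run(list(cmd), capture_output=True, text=True, check=False)


def ensure_active(unit: str, run: Runner = run_cmd) -> str:
    """Restart a systemd unit if `systemctl is-active` reports it is not running.

    Returns "healthy", "restarted", or "restart-failed" so the caller can
    log the outcome or escalate (for example, page the on-call engineer).
    """
    if run(["systemctl", "is-active", "--quiet", unit]).returncode == 0:
        return "healthy"
    if run(["systemctl", "restart", unit]).returncode != 0:
        return "restart-failed"
    # Confirm the unit actually came back rather than trusting the restart exit code.
    if run(["systemctl", "is-active", "--quiet", unit]).returncode == 0:
        return "restarted"
    return "restart-failed"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical unit name): python self_heal.py iws-agent.service
    print(f"{sys.argv[1]}: {ensure_active(sys.argv[1])}")
```

The command runner is injectable so the restart logic can be tested without touching systemd.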

Required Technical Skills

  • Strong experience with IBM Workload Scheduler V10.1+ architecture, especially Dynamic Workload Broker and high availability of MDMs managing Fault Tolerant Agent and Dynamic Agent architectures.
  • Strong conceptual understanding of Master Domain Manager (MDM), Backup MDM (BMDM), Dynamic Workload Console (DWC), Fault Tolerant Agent (FTA), Dynamic Agent (DA).
  • Strong grasp of the conman CLI to monitor and control the production plan and to check job, job stream, and resource status.
  • Strong grasp of the composer CLI to define, modify, and extract scheduling objects.
  • Strong grasp of the planman CLI to control the pre-production plan and GUI mirroring.
  • Strong grasp of the daily production planning lifecycle, including the phases of JnextPlan and the FINAL job stream.
  • Proficiency in navigating the DWC web-based GUI to monitor workloads, manage user access security, and define scheduling objects.
  • Experience installing IWS components and applying Fix Packs and Interim Fixes.
  • Troubleshooting with logs under TWSDATA/stdlist and adjusting trace levels for netman, batchman, writer, mailman, etc.
  • Strong experience with IBM WebSphere Liberty.
  • Strong grasp of reading Liberty messages.log, trace.log, and FFDC logs.
  • Strong grasp of configuring JVM heap sizes.
  • Strong grasp of configuring trace scope, trace levels, and trace retention.
  • Strong experience with Red Hat Enterprise Linux 8+.
  • Deep familiarity with bash/shell commands for text processing (for example, grep, awk, sed), file manipulation, and system navigation.
  • Ability to manage, start, stop, and troubleshoot systemd services using systemctl and journalctl for IWS agents and the MDM.
  • Managing user accounts, groups, and service accounts, with deep knowledge of Linux file permissions (chmod, chown, ACLs on local filesystems and NFS).
  • Ability to monitor system performance using tools like top, htop, vmstat, iostat, and sar to troubleshoot bottlenecks and platform unresponsiveness.
  • Understanding of Logical Volume Manager (LVM) and filesystem usage.
  • Checking TCP port availability, firewall rules (firewalld/iptables), and connectivity between MDM and Dynamic Agents using netstat, ss, ping, curl, etc.
  • Managing SSL/TLS certificates, private keystores, and public truststores, and working with a Certificate Authority (CA).
  • Strong experience with scripting (Bash Shell, Python, etc.) for automation.
  • Understanding of networking principles.
  • Understanding of basic Oracle database administration, enough to troubleshoot alongside DBAs and demonstrate when an issue originates in Oracle.
  • Understanding of basic SQL to query job metadata.
  • Ability to check database connectivity.
  • Understanding of AWS cloud infrastructure.
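Connectivity checks between the MDM and Dynamic Agents can also be scripted as a counterpart to the manual `ping` and `curl` checks listed above. A minimal Python sketch that attempts a TCP connection to each host:port; the hosts and ports are assumptions to be replaced with those configured in your environment:

```python
import socket
import sys
from typing import Iterable


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return False


def check_endpoints(endpoints: Iterable[tuple[str, int]]) -> dict[tuple[str, int], bool]:
    """Check each (host, port) pair and report which are reachable."""
    return {(host, port): port_open(host, port) for host, port in endpoints}


if __name__ == "__main__":
    # Usage: python portcheck.py host:port [host:port ...]
    targets = []
    for arg in sys.argv[1:]:
        host, _, port = arg.rpartition(":")
        targets.append((host, int(port)))
    for (host, port), ok in check_endpoints(targets).items():
        print(f"{host}:{port} {'OPEN' if ok else 'UNREACHABLE'}")
```

A successful connect only proves the port is reachable through firewalls; certificate or protocol problems still need the log and TLS checks above.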