My Experience
As an IT Engineer, Staff, on the Data Center Management Team at Qualcomm, also known as ITOS, I held a leadership role spanning project management, software development, and advanced analytics. My expertise and dedication were reflected in my progression from IT Engineer to IT Engineer, Sr, and eventually to IT Engineer, Staff.
Collaborating closely with team members, managers, and directors from the Server Management Group (SMG), IT Operations Services (ITOS), IT Network Group (ITNET), Data Management Group (DMG), and Mission Critical Facilities Team (MCFT), I played a pivotal role in ensuring the reliability and efficiency of Qualcomm's global data centers, server rooms, and enclaves.
Key responsibilities and contributions included:
- Tier 2 Escalation Support: Provided 24/7/365 Tier 2 escalation support for global data centers, server rooms, and enclaves, ensuring uninterrupted operations on a global scale.
- Project Management: Successfully managed various projects, encompassing infrastructure planning, software development, and DevOps initiatives, ensuring seamless project execution and delivery.
- Software Development & DevOps: Leveraged software development expertise to design, develop, and maintain tools, websites, APIs, and automation scripts critical to team operations, including the management of numerous GitHub repositories, Jenkins services, Grafana services, Prometheus services, and multiple Linux servers.
- Advanced Analytics & Data Collection: Implemented advanced analytics and data collection techniques to gain insights into server, service, and tool performance, contributing to significant cost savings and informed decision-making.
- Team Server and Tool Management: Effectively managed and maintained server and tool infrastructure, ensuring optimal performance and reliability.
- Global Data Center Training: Facilitated annual Data Center Training for the global team, ensuring training compliance and the availability of up-to-date training materials.
- Major Incident Handling & Root Cause Analysis: Led the response to major incidents, conducting in-depth root cause analyses to prevent future occurrences and enhance system stability.
- Change Management Evaluations & Approvals: Played a key role in evaluating and approving Change Management requests, ensuring minimal disruption and optimal system performance.
- Management of Direct Reports: Managed two Data Center Software Engineers based in India and five on-site interns at Qualcomm's HQ Campus, providing mentorship and guidance.
- IT Advisor (DCIM / ELCM): As an Administrator of IT Advisor, I resolved permissions management issues, developed Role-Based Access Controls (RBAC), and automated quarterly permissions audits, enhancing access management efficiency.
- Data Center Expert (Environment Monitoring): As an Administrator of Data Center Expert (DCE), I ensured system maintenance and the health of electrical, HVAC, and generation endpoints, overseeing eight DCE servers in production and developing a Prometheus Exporter for system metrics.
- PagerDuty Implementation: Led the evaluation, onboarding, and adoption of PagerDuty, significantly improving response times, on-call scheduling, alert management, and multi-team collaboration while reducing operational costs.
- ServiceNow Knowledge Base: Implemented and configured a ServiceNow Knowledge Base for the team, publishing nearly 1,000 well-written and maintained articles, ranging from SOPs and standards to technical articles.
- Prometheus & Grafana Deployment: Successfully deployed Prometheus and Grafana for observability, saving the company significant costs and providing valuable insights into future growth and sustainability.
My role as an IT Engineer, Staff, underscored my commitment to innovation, efficiency, and reliability within a complex and dynamic IT environment, driving forward the capabilities of Qualcomm's data center operations.