Your Responsibilities:
- Using Azure Data Lake tools, you will take a hands-on role with Data Fabric tools (Informatica BDM suite) and Big Data technology (Hadoop – Hortonworks/Cloudera).
- Responsible for DevOps tools and best practices across the organization, including automated testing, provisioning, configuration management, service orchestration, and dynamic monitoring.
- You will work on both development and infrastructure activities.
- Supervise Hadoop jobs using a scheduler and optimize them.
- Manage YARN queues to improve performance.
- Support MapReduce/Spark/Hive/Informatica programs running on the Hadoop cluster.
- Preserve security and data privacy.
- Develop highly scalable, high-performance services for data tracking.
- Conduct POCs independently and take responsibility for building, testing, and performance tuning in the Hadoop ecosystem.
- Responsible for troubleshooting Hadoop technologies such as HDFS, Hive, Sqoop, ZooKeeper, Spark2, MapReduce2, YARN, HBase, Tez, Kafka, Kibana, Solr, and Elasticsearch.
- Responsible for managing authorization and audit tools in the Hadoop stack, such as Ranger/Sentry.
- Fine-tune applications and systems for high performance and higher-volume throughput.
- Administer Hadoop clusters (Cloudera/Hortonworks), including adding or removing nodes.
- Transform, load, and present disparate data sets in various formats and from various sources, such as JSON, text files, Kafka queues, and log data.
Your Profile:
Qualifications & Experience:
- 5+ years of total IT experience
- 3+ years with data sourcing, quality, warehousing, mining, and ETL tools
- 3+ years on a Big Data platform such as Hadoop (Hortonworks HDP 3.5.x / Cloudera 6.x / CDP)
Mandatory Skills
- Hands-on experience with a Hadoop data platform, preferably Hortonworks (HDP)
- Hands-on experience with big data ETL workloads (Sqoop, Flume, Spark Core/SQL API (Scala), Hive (Tez/LLAP), and Phoenix)
- Hands-on experience with storage, covering schema layout, partitioning, and read/write APIs for relational, MPP, and NoSQL stacks, including file formats (Avro, Parquet), Hive, HBase, MongoDB, and Oracle
- Job monitoring, debugging, scheduling, and performance tuning using data platform operations tools such as Oozie, YARN settings, Spark tuning, Ambari configuration, and Grafana
- Hadoop administrator skills covering Hadoop cluster operation, tuning, and troubleshooting in a multi-tenant environment; strong Linux shell, Java, and Python scripting is preferred
- Informatica administrator skills covering the Informatica 10.2.2/10.4.x suite (BDM/EDL/EDC/BDQ/TDM/PDM) and its integration with an external Hadoop cluster, including operation, tuning, and troubleshooting in a multi-tenant environment; strong Linux shell, Java, and Python scripting is preferred
- Knowledge of data encryption algorithms, creation of encryption zones, and encryption key management tools such as Ranger KMS or Hadoop KMS
- Experience with RDBMSs such as SQL/Oracle
- Experience in Informatica administration for tools such as BDM/EDL/EDC/BDQ/DDM/TDM
- Experience with SAML/SMTP/TCP/HTTPS protocols
- Experience with TLS and keystore/truststore creation
- Hands-on experience with DevOps tools such as Git/Jenkins/Nexus
Good to Have
- Experience in setting up data governance, data security, metadata management, and lineage tracking on the Hadoop platform using Kerberos, Ranger policies, Atlas, and Ambari
- Networking and DMZ setup.
- Knowledge of the AutoSys scheduling tool or similar, Git, Ansible, and Docker
- ITIL practices.
- Agile methodologies with a focus on the Scrum framework