
KEY RESPONSIBILITIES AND SKILL SET
- Design and develop highly scalable, real-time systems using Hadoop ecosystem components (Iceberg, Spark, Ozone, Trino, Hive, Ranger, Kafka, Flink, and NiFi).
- Build robust data ingestion and transformation frameworks using Java, Spark, Python, and shell scripting to ingest multimodal data (image, audio, video, unstructured documents) in both batch and real-time modes.
- Develop full-stack applications and internal engineering tools using Python, shell scripting, and modern web frameworks (e.g., Flask, React).
- Collaborate closely with data scientists to operationalize machine learning models using Cloudera Machine Learning (CML).
- Tune and optimize the performance of data applications on Hadoop to ensure efficient resource utilization.
- Experience working with ML platforms such as CML, Spark MLlib, and Python ML libraries (scikit-learn, XGBoost), including model deployment.
KEY SKILLS
Job ID: 148950449
Skills:
Java, Ranger, Hadoop, Kafka, React, Hive, XGBoost, Shell scripting, Spark, Flask, Python, scikit-learn, Flink, Ozone, Iceberg, Spark MLlib, Trino, NiFi