Multimodal Large Model Data Engineer

This job is no longer accepting applications

  • Posted a month ago

Job Description

Responsibilities

1. Design and develop large-scale pre-training data processing pipelines that provide stable, reliable, high-quality data processing for base model pre-training, including data sourcing, data capture/collection, and data analysis (OCR, images, web pages).
2. Design and develop a data platform that serves large model pre-training: manage data life cycle elements such as metadata, lineage, and storage; provide visualization and observability for pre-training data; and explore the engineering upper limit of data experiments and data release.
3. Build data synthesis solutions and frameworks for models such as LLMs and VLMs to support data scaling and related work.
4. Based on the characteristics of large model training data, abstract and develop an efficient, reliable data processing framework that improves the engineering efficiency of all large model algorithm engineers when processing data.

Qualifications

1. Familiarity with at least one programming language, such as Go, Python, or Java.
2. Bonus points for an in-depth understanding of big data technology and for proficiency with tools such as Spark, Flink, Kafka, Hive, and HDFS.
3. Bonus points for system platform development experience and in-depth usage experience related to data centers and machine learning.
4. Bonus points for an in-depth understanding of large model technology and the product ecosystem.
5. Enthusiasm for technical challenges, the ability to think independently, curiosity, and the ability to learn quickly.

About Company

ByteDance is a technology company operating a range of content platforms that inform, educate, entertain and inspire people across languages, cultures, and geographies.
Dedicated to building global platforms of creation and interaction, ByteDance now has a portfolio of applications available in over 150 markets and 75 languages, including TikTok, Helo, Vigo Video, Douyin, and Huoshan.

Job ID: 104658531