Responsibilities
Team introduction: We are responsible for search algorithm innovation and architecture R&D for Douyin, Toutiao, and other products. We apply cutting-edge machine learning to end-to-end modeling and keep pushing for new breakthroughs. We also focus on building and optimizing distributed systems and machine learning systems, working on everything from memory and disk optimization to algorithms for index compression, recall, and ranking. In addition, we apply RAG technology to bring the value of AI to hundreds of millions of Douyin users. We offer students ample opportunities to grow.
1. Cutting-edge retrieval: work across content types such as videos, livestreams, image-text posts, and group-buying deals, from foundational NLP to recall based on multimodal understanding, user behavior understanding, embedding similarity, and more.
2. Large-scale ranking: building on existing BERT and large-scale sparse models, explore and put into production the scaling laws of ultra-large autoregressive models.
3. Ultra-large-scale AI search RAG engine: make full use of Douyin's massive traffic to build an ultra-large-scale, multi-agent collaborative AI search architecture that delivers untapped value to users.
4. Large-scale streaming machine learning: ultra-high-throughput real-time data streams and large-scale streaming machine learning, so that personalized search understands each user better.
5. Architecture for data at the scale of hundreds of billions of records: in-depth research and innovation across the stack, from large-scale offline computing and distributed-system performance and scheduling optimization to highly available, high-throughput, low-latency online services.
1. Build a customized distributed-system foundation for ByteDance's search business, supporting the rapid construction and sustainable growth of search in products such as Douyin, Toutiao, and Tomato Novels.
2. Participate in developing and maintaining the next-generation distributed search retrieval system and feature storage system, continuously optimizing performance, cost, and stability while strengthening their extensibility and customization capabilities.
Qualifications
1. Proficiency in C/C++.
2. Solid knowledge of Linux; proficiency in multithreaded and network programming in at least one language.
3. Mastery of key distributed storage and distributed computing technologies, with hands-on experience.
4. Proficiency with common development and debugging tools.
Preferred qualifications:
1. Active contributors to the open source community are preferred.
2. Familiarity with the source code of any of RocksDB, Redis, HBase, etcd, Elasticsearch, or Xapian is preferred.
3. Familiarity with consensus protocols such as Paxos and Raft is preferred.