(1) Team & Project Introduction
We are dedicated to reshaping enterprise productivity through AI. Our team's core project is KNemo, a next-generation intelligent meeting organization software. KNemo integrates Large Language Models (LLMs) with advanced speech processing (ASR and speaker diarization) to provide meeting summaries, speaker identification, and action item extraction on edge servers.
As part of our team, you will work on cutting-edge tech integrations. We highly value engineering quality and the ability to ship real-world applications, and we encourage our members to leverage AI-assisted tools for maximum efficiency. You will collaborate with an outstanding product and engineering team to transform the latest AI models into reliable, low-latency, and privacy-focused products.
(2) Responsibilities
- AI Core Feature Development: Design and implement LLM- and audio-based applications, including the development of RAG architectures, Chat Q&A systems, and highly accurate long-context/meeting summarization features.
- Systematic Prompt Engineering: Optimize prompts across complex contexts and various use cases. Ensure the system consistently outputs structured data to accurately extract key meeting takeaways and Action Items.
- Backend Architecture & Deployment: Participate in API development and architectural planning for AI applications. Containerize models and applications with Docker, deploy them to Linux servers, and help maintain CI/CD pipelines and day-to-day system operations.
- Model Inference Optimization & Tech Integration: Evaluate and adopt LLM acceleration frameworks (e.g., vLLM, llama.cpp) to enhance system performance. Assist in researching and designing architectures to deploy audio processing or AI models onto edge devices (NPUs).
- Cross-Functional Collaboration: Work closely with PMs and other engineering teams to define product requirements and design system workflows. Actively utilize AI-assisted tools to optimize the team's overall development efficiency.
(3) Job Requirements
Basic Qualifications
- LLM Application Development: Practical experience developing LLM applications. Proficient in Chat Q&A, RAG (Retrieval-Augmented Generation) architectures, and long-context/document summarization techniques.
- Prompt Engineering: Proven ability to systematically tune and optimize prompts to consistently output structured data (e.g., JSON) for precise extraction of meeting summaries and Action Items.
- System Deployment & Version Control: Familiarity with Python and Linux environments. Solid understanding of Docker containerization, GitLab version control, and basic CI/CD pipelines, along with hands-on server-side deployment experience.
- AI Tool Utilization: Proficiency in leveraging various AI-assisted development tools to enhance overall engineering and workflow efficiency.
How to Stand Out
- Audio & Text AI Processing: Experience in developing and training ASR (Automatic Speech Recognition), Speaker Diarization, or speech frontend technologies. Familiarity with low-level audio processing (e.g., FFmpeg) or real-time streaming (e.g., WebRTC, WebSocket) is a strong plus.
- Workflow & Context Design: Practical experience in designing complex system workflows and Context Engineering.
- Model Deployment & Acceleration: Hands-on experience with LLM inference deployment and acceleration techniques. Familiarity with frameworks such as llama.cpp, sglang, vLLM, or BentoML.
- Edge Computing Integration: Ability to optimize AI models, with conceptual knowledge of and practical experience in deploying models to edge devices or NPUs.
- Backend & API Development: Familiarity with backend frameworks (e.g., FastAPI, Flask, or Node.js). Experience in API integration and connecting internal/external systems.
- Database Management: Experience in the architectural design and operation of relational and non-relational databases (e.g., PostgreSQL, Redis).
- Server Operations & Management: Experience managing model execution across multiple servers, configuring load balancing, and handling daily system operations.