Clay Labs

Senior Software Engineer, Search Infrastructure

Remote · Full-time · New York

Job Overview

Job Type: 100% Remote (work from anywhere)
Employment Type: Full-time (flexible schedule)
Location Preference: New York (preferred time zone)
Experience Level: Senior
Job Categories: Software Engineering

Job Description

About Clay

Clay is a creative tool for growth. Our mission is to help businesses grow without huge investments in tooling or manual labor. We're already helping over 100,000 people grow their business with Clay. From local pizza shops to enterprises like Anthropic and Notion, our tool lets you instantly translate any idea you have for growing your company into reality.

We believe that modern GTM teams win by finding GTM alpha—a unique competitive edge powered by data, experimentation, and automation. Clay is the platform they use to uncover hidden signals, build custom plays, and launch faster than their competitors. We’re looking for sharp, low-ego people to help teams find their GTM alpha.

Why is Clay the best place to work?

  • Customers love the product (100K+ users and growing)

  • We’re growing a lot (6x YoY last year, and 10x YoY the two years before that)

  • Incredible culture (our customers keep applying to work here)

  • Well-resourced (raised a Series B expansion in January 2025 from investors like Sequoia and Meritech)

Read more about why people love working at Clay and explore our wall of love to learn more about the product.

Data Engineering, Search @ Clay

As a Senior Data Engineer on the Search team, you'll be responsible for building and maintaining the data pipelines that power Clay's comprehensive datasets of companies, people, and job postings. You'll be tackling fundamental challenges in entity resolution—matching millions of records across datasets without common identifiers—while building the foundation for next-generation natural language search capabilities. Our team is scaling from processing millions to billions of records, requiring innovative approaches to data quality, validation, and infrastructure. Strong candidates will have experience building production data pipelines at scale and a deep understanding of search infrastructure.
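
To make the entity-resolution challenge concrete: the standard pattern is blocking (a cheap key limits which records are even compared) followed by pairwise fuzzy matching within each block. The sketch below is a minimal Python illustration of that pattern, not Clay's actual pipeline; the record fields, suffix list, and threshold are invented for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class CompanyRecord:
    provider: str        # which data provider the record came from
    name: str            # raw company name, e.g. "Acme Inc."
    domain: str | None   # website domain, when the provider supplies one


def normalize_name(name: str) -> str:
    """Strip casing and common legal suffixes so near-duplicates compare equal."""
    cleaned = name.lower().strip()
    for suffix in (" inc.", " inc", " llc", " ltd", " corp."):
        cleaned = cleaned.removesuffix(suffix)
    return cleaned


def block_key(record: CompanyRecord) -> str:
    """Blocking: only records sharing this cheap key are compared pairwise,
    which keeps matching tractable at millions of records."""
    if record.domain:
        return record.domain.lower()
    return normalize_name(record.name)[:4]  # crude fallback block


def is_match(a: CompanyRecord, b: CompanyRecord, threshold: float = 0.9) -> bool:
    """Fuzzy name similarity decides matches within a block."""
    score = SequenceMatcher(
        None, normalize_name(a.name), normalize_name(b.name)
    ).ratio()
    return score >= threshold
```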

What You'll Do

  • Design and implement robust entity resolution systems that match and merge records from multiple providers using advanced matching algorithms, enabling large-scale enrichment of customer data

  • Build scalable data pipelines that process billions of profiles while maintaining data accuracy through sophisticated validation and quarantine frameworks

  • Implement modern data architecture patterns that enable point-in-time recovery, analytics at scale, and real-time data quality monitoring

  • Develop systems to normalize and standardize messy real-world data (like locations, company names, and job titles) across billions of records

  • Create intelligent data validation systems that prevent bad data from reaching customers while providing feedback loops for continuous improvement (a minimal sketch of this pattern follows this list)

  • Collaborate with ML engineers to build the data foundation for embedding-based search, enabling users to describe what they're looking for in natural language
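
On the validation-and-quarantine points above: since Pydantic appears in the stack listed later, a schema-plus-quarantine step could look roughly like the sketch below. The PersonProfile schema and its fields are hypothetical, purely for illustration.

```python
from pydantic import BaseModel, ValidationError


class PersonProfile(BaseModel):
    full_name: str
    company_domain: str
    job_title: str | None = None


def validate_batch(raw_rows: list[dict]) -> tuple[list[PersonProfile], list[dict]]:
    """Route rows that fail schema validation into a quarantine set
    instead of letting them reach downstream consumers."""
    accepted: list[PersonProfile] = []
    quarantined: list[dict] = []
    for row in raw_rows:
        try:
            accepted.append(PersonProfile(**row))
        except ValidationError as err:
            # Keep both the row and the reason, so the feedback loop
            # can trace bad data back to its source.
            quarantined.append({"row": row, "errors": err.errors()})
    return accepted, quarantined
```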

What You'll Bring

  • Experience building and maintaining production data pipelines that process millions of records daily

  • Strong proficiency in Python and SQL, with experience in data processing frameworks (Apache Airflow, Prefect, Dagster, or similar)

  • Hands-on experience with search engines (Elasticsearch, OpenSearch, Solr), including data modeling and indexing strategies (illustrated in the sketch after this list)

  • Understanding of entity resolution, record linkage, and deduplication techniques at scale

  • Experience with both batch and streaming data processing patterns

  • Familiarity with cloud data platforms (AWS, GCP, or Azure) and their data services

  • Strong problem-solving skills with the ability to debug complex data issues across distributed systems
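
Regarding the search-engine bullet above: "data modeling and indexing strategies" typically means choices like text vs. keyword field mappings and bulk (rather than per-document) indexing. A toy example using the opensearch-py client; the host, index name, and mapping here are invented, not Clay's schema.

```python
from opensearchpy import OpenSearch, helpers

# Hypothetical local cluster for illustration only.
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.indices.create(
    index="companies",
    body={
        "mappings": {
            "properties": {
                "name": {"type": "text"},          # analyzed, full-text searchable
                "domain": {"type": "keyword"},     # exact-match filterable field
                "employee_count": {"type": "integer"},
            }
        }
    },
)

# Bulk indexing batches many documents per request instead of one call each.
actions = (
    {"_index": "companies", "_id": doc["domain"], "_source": doc}
    for doc in [{"name": "Acme", "domain": "acme.com", "employee_count": 120}]
)
helpers.bulk(client, actions)
```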

Nice To Haves

  • Experience with workflow orchestration using Dagster or similar modern data orchestration tools (a short Dagster sketch follows this section)

  • Knowledge of ML approaches to entity resolution and experience with embedding pipelines

  • Familiarity with Apache Iceberg or similar table formats for data versioning and time travel

  • Experience with geocoding and location normalization at scale

  • Background in building data platforms that dramatically scale processing capabilities

  • Exposure to our current tech stack:

    • Orchestration: Dagster

    • Search: OpenSearch

    • Databases: PostgreSQL (Aurora), Redis

    • Cloud: AWS (S3, Lambda, ECS)

    • Languages: Python, TypeScript

    • Infrastructure as Code: Terraform

    • Data Validation: Pydantic
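
For the orchestration items above: in Dagster, pipeline steps are declared as software-defined assets, and dependencies are wired by parameter name. A minimal, hypothetical two-asset sketch; the asset names and logic are invented, not part of Clay's codebase.

```python
from dagster import Definitions, asset


@asset
def raw_profiles() -> list[dict]:
    """Pull a batch of raw provider records (stubbed here)."""
    return [{"full_name": "ada lovelace", "company_domain": "example.com"}]


@asset
def normalized_profiles(raw_profiles: list[dict]) -> list[dict]:
    """Depends on raw_profiles by parameter name; Dagster wires the edge."""
    return [{**row, "full_name": row["full_name"].title()} for row in raw_profiles]


defs = Definitions(assets=[raw_profiles, normalized_profiles])
```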

Ready to Join Clay Labs?

Take the next step in your remote career by applying directly on Clay Labs' official careers page.
