As a Data Engineer at GoodHabitz, you’ll be part of an exciting journey as we migrate to AWS, enhancing our data infrastructure to support our growing business. In our scale-up environment, adaptability and problem-solving are key. This role is crucial in designing, building, and optimizing our data architecture to meet evolving needs. With expertise in data pipeline stacks (both open-source and AWS), you’ll develop scalable, high-performance solutions that drive efficiency and reliability. Collaborating closely with engineering teams, you’ll help shape robust data pipelines that align with our business objectives. If you're eager to make a real impact and be part of this transformation, we’d love to hear from you.
Key Responsibilities:
- Data Pipeline Design: Design and implement scalable, high-performance data architectures using AWS services and OLAP databases (ClickHouse/Snowflake).
- Data Pipeline Development: Design, build, and maintain robust ETL pipelines that efficiently handle large-scale data ingestion, transformation, and storage using solutions like Databricks.
- Cloud Infrastructure: Combine open-source data-stack tools with AWS technologies to build and optimize data workflows.
- Data Governance & Quality: Ensure data accuracy and consistency through best practices in data governance, lineage, and monitoring.
- Performance Optimization: Optimize data storage, retrieval, and processing to support high-performance analytical workloads using partitioning, indexing, and query optimization techniques.
- Collaboration & Leadership: Work closely with data analysts and software engineers to understand requirements and deliver data-driven solutions, and mentor junior engineers.
- Automation & CI/CD: Implement automated data pipeline deployment and monitoring strategies.
Requirements:
- 5+ years of experience in data engineering, with a solid background in open-source data stacks and cloud-native platforms.
- Deep understanding of ETL processes, data modeling, and data warehousing (experience with the medallion architecture and Delta Lake).
- Strong experience in designing and architecting large-scale data systems.
- Proficiency in Python and PySpark (or comparable scripting languages and data-processing libraries) for data processing and pipeline development.
- Experience with orchestration tools such as Apache Airflow, AWS Step Functions, or Dagster.
- Hands-on experience with infrastructure-as-code (Terraform, CloudFormation, CDK).
- Strong problem-solving skills and ability to work in a fast-paced environment.
- Knowledge of SQL query performance tuning, materialized views, and sharding strategies for large datasets.
Nice to have:
- Expertise in ClickHouse, Snowflake, or similar OLAP databases.
- Familiarity with containerization (Docker, Kubernetes) and serverless computing.
- Experience with monitoring and observability tools such as Prometheus, Grafana, or AWS CloudWatch.
Here's a glimpse of what's waiting for you:
- A competitive salary package that rewards your hard work.
- 25 paid vacation days. And if that's not enough, you can purchase up to 10 more.
- A world of growth and development opportunities to enhance your skills. You'll have unlimited access to our treasure trove of GoodHabitz resources and MyAcademy.
- Access to mental coaching through our partner, OpenUp, to keep your mind in top shape.
- An annual do-good-day, fully paid, so you can contribute to a cause you're passionate about.
- Travel and expense reimbursement because we've got your journey covered.
- Pension and disability insurance, securing your financial well-being in the long run.
- A hybrid way of working.
- Working in a company that welcomes artificial intelligence and uses it to improve internal processes and push AI-powered features quickly.
- A MacBook Pro.