Contractual Python Web Scraping Jobs in California

Senior Data Engineer

Calabasas, CA · On-site

$145K - $160K/yr

Snowpark Python stored procs, External Access Integrations, INGESTION_CONFIG and RUN_LOG admin ... Web scraping and source integration * Use Playwright with persistent browser profiles for SSO ...

Applied AI Engineer

San Francisco, CA · On-site

$180K - $220K/yr

... scale web scraping for Facebook Jobs and Meta Reality Labs. * Vishruth (Founding Applied AI ... Work primarily across LLM-driven systems, data pipelines, and AI-enabled services, using Python ...

Senior Data Engineer

San Diego, CA · On-site

$117.30K - $158.70K/yr

Extensive knowledge of coding in Python or Scala with a focus on data processing. * Experience ... Comfortable working with unstructured and semi-structured data (Web scraping). * Experience working ...

Principal Data Engineer

San Francisco, CA · On-site +1

$157.25K - $212.75K/yr

Extensive knowledge of coding in Python or Scala with a focus on data processing. * Experience ... Comfortable working with unstructured and semi-structured data (Web scraping). * Experience working ...

Senior Data Engineer

San Francisco, CA · On-site +1

$127.50K - $172.50K/yr

Extensive knowledge of coding in Python or Scala with a focus on data processing. * Experience ... Comfortable working with unstructured and semi-structured data (Web scraping). * Experience working ...

Senior Data Engineer

San Francisco, CA · On-site

$127.50K - $172.50K/yr

Extensive knowledge of coding in Python or Scala with a focus on data processing. * Experience ... Comfortable working with unstructured and semi-structured data (Web scraping). * Experience working ...

Principal Data Engineer

San Francisco, CA · On-site

$157.25K - $212.75K/yr

Extensive knowledge of coding in Python or Scala with a focus on data processing. * Experience ... Comfortable working with unstructured and semi-structured data (Web scraping). * Experience working ...

Advanced in SQL and Python, experience with Retool, API & Web Scraping a bonus * Experience driving results as an IC and working with technical teams to drive scalable architecture * Ability to drive ...

Contractual Python Web Scraping information

What are the key skills and qualifications needed to thrive as a Contractual Python Web Scraping specialist, and why are they important?

To thrive as a Contractual Python Web Scraping specialist, you need strong proficiency in Python programming, web scraping libraries like BeautifulSoup and Scrapy, and a solid understanding of HTML, CSS, and HTTP protocols. Familiarity with version control systems (e.g., Git), API usage, and cloud platforms is often required, along with experience in handling data storage formats like JSON or CSV. Attention to detail, problem-solving skills, and effective communication are vital soft skills for interpreting client requirements and overcoming scraping challenges. These skills ensure accurate data extraction, reliable project delivery, and successful collaboration with clients or teams.

What are some common challenges faced in a contractual Python web scraping role, and how can job seekers prepare for them?

In a contractual Python web scraping role, professionals often encounter challenges such as dealing with websites that implement anti-scraping measures, handling frequent changes in website structures, and managing large volumes of data efficiently. To prepare, job seekers should familiarize themselves with tools like Selenium, BeautifulSoup, and Scrapy, as well as techniques for rotating proxies and handling CAPTCHAs. Staying up to date with legal and ethical considerations regarding web scraping is also crucial. Proactive communication with clients to clarify project requirements and timelines can further help ensure successful project delivery.
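One way to cope with changing page structures is to match on several candidate selectors instead of a single one. The sketch below is only an illustration under stated assumptions: it uses the standard-library html.parser rather than BeautifulSoup or Scrapy so that it runs without extra installs, and the class names ("job-title", "listing-name") are hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not change nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TitleExtractor(HTMLParser):
    """Collect text inside elements whose class matches any candidate name."""

    def __init__(self, candidate_classes):
        super().__init__()
        self.candidates = set(candidate_classes)
        self.depth = 0      # > 0 while inside a matching element
        self.titles = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1             # nested tag inside a match
        elif self.candidates.intersection(classes):
            self.depth = 1              # entering a matching element
            self._buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:             # leaving the matching element
            text = "".join(self._buf).strip()
            if text:
                self.titles.append(text)

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


def extract_titles(html, candidate_classes):
    parser = TitleExtractor(candidate_classes)
    parser.feed(html)
    parser.close()
    return parser.titles


# The same call works before and after a (hypothetical) site redesign.
old_page = '<div><h2 class="job-title">Senior Data Engineer</h2></div>'
new_page = '<div><a class="listing-name card"><span>Principal</span> Data Engineer</a></div>'
classes = ["job-title", "listing-name"]
print(extract_titles(old_page, classes))
print(extract_titles(new_page, classes))
```

Keeping the candidate list in configuration rather than in code means a layout change usually needs a one-line update instead of a new release.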

What are Contractual Python Web Scraping jobs?

Contractual Python Web Scraping jobs involve hiring professionals on a temporary or project basis to extract data from websites using the Python programming language. These jobs typically require knowledge of Python libraries such as BeautifulSoup, Scrapy, or Selenium to automate the process of collecting and processing web data. Contractual roles are ideal for companies or individuals who need specific data sets or one-time scraping projects completed efficiently, without committing to a long-term hire.

What is the difference between Contractual Python Web Scraping vs Contractual Data Analyst?

Aspect | Contractual Python Web Scraping | Contractual Data Analyst
Required Skills | Python, web scraping libraries, data extraction | Data analysis, SQL, Excel, visualization tools
Work Environment | Project-based, remote or on-site, technical focus | Business-focused, collaborative, reporting-oriented
Industry Usage | Web data collection, market research, e-commerce | Business intelligence, marketing, finance

Contractual Python Web Scraping specialists focus on extracting data from websites using Python, while Contractual Data Analysts interpret and analyze data to support business decisions. Both roles often collaborate but differ in technical skills and primary responsibilities.


Senior Data Engineer

AmaWaterways, LLC

Calabasas, CA • On-site

$145K - $160K/yr

Full-time

Posted 22 days ago


Job description

At AmaWaterways, we believe meaningful careers begin with purpose, passion and a shared commitment to delivering unforgettable experiences. For those who value curiosity, connection and personal enrichment, AmaWaterways offers the opportunity to help craft meaningful river journeys that invite travelers to follow their own current. Built on a foundation of heartfelt hospitality, we treat our guests—and each other—with genuine care, warmth and respect. AmaWaterways fosters a collaborative environment both onboard our ships and across our global network of offices, where team members grow together, support one another and take pride in upholding the high standards and thoughtful service our company is known for.

We invite talented, motivated professionals to explore our career opportunities and begin their journey with AmaWaterways today.

Role Summary

AmaWaterways is hiring our first Senior Data Engineer to scale the modern data platform we are actively building on Snowflake, dbt, AWS, and Airflow. You will own the next generation of warehouse-native ingestion pipelines that are replacing our remaining Fivetran connectors, partner directly with the Director of Data Engineering on platform architecture, and help establish the engineering standards for a growing team. You will also become a power user of AI-native development tooling. This is not a 'we are exploring AI' role. Our daily engineering environment is Claude Code with custom skills and MCP servers, Snowflake Cortex for in-warehouse AI, and multi-model sparring through zen-mcp for architecture review. Candidates who already work this way will be productive in week one.

What You Will Build

You will inherit and extend an active portfolio of Snowflake-native pipelines. Twelve are already in production, and roughly twenty are on the roadmap. Recent and near-term work includes:

  • A unified Brand Intelligence pipeline ingesting reviews, social, trade press, and awards across the river-cruise segment, with Snowflake Cortex driving sentiment, classification, translation, and entity extraction.
  • A Competitive Intelligence pipeline with ten direct competitor scrapers feeding a unified pricing and promotions schema.
  • An EDW build-out across Bronze, Silver, Gold, Reporting, and Activation layers, including dbt-mesh project structuring, the dbt Semantic Layer, dbt unit tests, and SCD2 modeling for conformed dimensions.
  • The migration of the remaining Fivetran connectors to our standard Snowflake-native ingestion pattern: Snowpark Python stored procs, External Access Integrations, INGESTION_CONFIG and RUN_LOG admin tables, and Snowflake Tasks for scheduling. AWS Lambda handles the workloads that cannot run inside Snowpark.
  • An Astronomer or AWS MWAA layer to govern task graphs once the pipeline count exceeds what Snowflake Tasks can cleanly manage. You will help decide which path we take.
  • Salesforce Data Cloud, LiveRamp, and SFTP outbound integrations from our ACTIVATE layer.

Day to Day Responsibilities

Snowflake-native ingestion engineering

  • Build pipelines using our standard template: Snowpark Python stored procs, External Access Integrations, network rules, Snowflake SECRETs hydrated from AWS Secrets Manager, idempotent deploy.py scripts, and config-driven INGESTION_CONFIG tables.
  • Author and tune incremental load patterns (watermark cursors, MERGE statements, change-data capture where supported).
  • Design conformed dimensions with SCD2 snapshots and append-only fact tables in dbt.
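For readers unfamiliar with the watermark-plus-MERGE pattern mentioned above, the following is a minimal in-memory sketch of the idea, not the team's actual Snowpark procedure. The function name, the "id" key, and the "updated_at" cursor column are all hypothetical; in the warehouse the upsert step would be a SQL MERGE.

```python
from datetime import datetime


def incremental_merge(target, source_rows, watermark, key="id", cursor="updated_at"):
    """Upsert rows newer than `watermark` into `target`; return the new watermark.

    `target` is a dict keyed by primary key, standing in for a warehouse table.
    """
    new_watermark = watermark
    for row in source_rows:
        if row[cursor] <= watermark:
            continue                     # already loaded by an earlier run
        target[row[key]] = row           # insert if new, overwrite if matched
        new_watermark = max(new_watermark, row[cursor])
    return new_watermark


table = {}
batch = [
    {"id": 1, "updated_at": datetime(2024, 1, 1), "value": "a"},
    {"id": 2, "updated_at": datetime(2024, 1, 2), "value": "b"},
]
wm = incremental_merge(table, batch, datetime.min)

# Re-running with the stored watermark only picks up the changed row.
batch.append({"id": 1, "updated_at": datetime(2024, 1, 3), "value": "c"})
wm = incremental_merge(table, batch, wm)
```

Because rows at or below the stored watermark are skipped and the write is keyed, re-running a batch leaves the table unchanged. A strict "greater than" comparison can miss late rows that share the watermark timestamp, so real pipelines often re-read a small overlap window.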

Transformation and modeling

  • Build and maintain dbt models across Bronze, Silver, Gold, and Reporting layers in our medallion warehouse.
  • Use dbt Core and dbt Cloud, dbt unit tests, dbt Mesh for cross-project refs, and the dbt Semantic Layer for governed metrics.
  • Keep VARIANT columns confined to Bronze. Gold and Reporting models are strictly typed.

Cloud infrastructure and DevOps

  • Manage the AWS side of our pipelines: S3 staging, IAM roles, Lambda functions in Python, API Gateway where needed.
  • Author Terraform for every AWS resource. No ad-hoc console work.
  • Use AWS Secrets Manager as the source of truth for machine credentials. Naming convention is ama/{env}/{domain}/{name}. Never put credentials in .env files or GitHub repo secrets.
  • Build and own CI/CD pipelines in GitHub Actions. The standard automation identity is SVC_ETL_RUNNER with RSA keypair auth to Snowflake.
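The ama/{env}/{domain}/{name} naming convention lends itself to a small helper that builds and validates secret names before they reach AWS Secrets Manager. This is a sketch only: the helper name, the set of allowed environments, and the allowed character pattern are assumptions, since the posting specifies only the path layout.

```python
import re

ALLOWED_ENVS = {"dev", "staging", "prod"}           # hypothetical environments
_SEGMENT = re.compile(r"[a-z0-9][a-z0-9_-]*")       # assumed segment rule


def secret_name(env, domain, name):
    """Return a secret name following the ama/{env}/{domain}/{name} layout."""
    if env not in ALLOWED_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    for label, value in (("domain", domain), ("name", name)):
        if not _SEGMENT.fullmatch(value):
            raise ValueError(f"invalid {label}: {value!r}")
    return f"ama/{env}/{domain}/{name}"


print(secret_name("prod", "salesforce", "api_key"))
```

Centralizing the format in one function keeps Terraform inputs, deploy scripts, and pipeline code from drifting apart on naming.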

Orchestration

  • Operate Snowflake Tasks for the current generation of pipelines.
  • Help design and stand up the next-tier orchestration layer in Astronomer or AWS MWAA, including dbt Cosmos integration and DAG migration from Snowflake Tasks.

Observability and quality

  • Configure Snowflake Alerts (failure, zero-row, missed-run, freshness) and Microsoft Teams notifications through Power Automate.
  • Build data quality checks into every pipeline using our ADMIN.DQ_EXCEPTION_LOG pattern and dedicated QC layers for cross-system reconciliation.
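The four alert types named above (failure, zero-row, missed-run, freshness) can be read as simple predicates over a run-log record. The sketch below shows that logic in plain Python for illustration; Snowflake Alerts themselves are defined in SQL, and the field names ("status", "rows_loaded", "finished_at", "max_source_ts") are hypothetical rather than the real RUN_LOG schema.

```python
from datetime import datetime, timedelta


def check_run(last_run, now, expected_interval, max_staleness):
    """Return the alert names that apply to the most recent run record."""
    if last_run is None:
        return ["missed-run"]            # nothing has ever run
    alerts = []
    if now - last_run["finished_at"] > expected_interval:
        alerts.append("missed-run")      # the schedule was skipped
    if last_run["status"] != "SUCCESS":
        alerts.append("failure")
    elif last_run["rows_loaded"] == 0:
        alerts.append("zero-row")        # succeeded but loaded nothing
    if now - last_run["max_source_ts"] > max_staleness:
        alerts.append("freshness")       # newest source data is too old
    return alerts


run = {
    "status": "SUCCESS",
    "rows_loaded": 0,
    "finished_at": datetime(2024, 1, 2, 11),
    "max_source_ts": datetime(2024, 1, 1),
}
now = datetime(2024, 1, 2, 12)
print(check_run(run, now, timedelta(hours=2), timedelta(hours=6)))
```

Treating zero-row as distinct from failure matters because a run can succeed technically while the upstream source has silently stopped producing data.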

Web scraping and source integration

  • Use Playwright with persistent browser profiles for SSO-protected and API-less sources (TrueVault, Tableau Admin Insights, internal SharePoint).
  • Author OAuth, MSAL certificate auth, and refresh token flows for source APIs.
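A persistent browser profile lets a scraper reuse cookies from an earlier interactive SSO login instead of authenticating on every run. The following is a minimal sketch using Playwright's launch_persistent_context, assuming the third-party playwright package and its Chromium browser are installed. The profile location and the helper names are hypothetical.

```python
from pathlib import Path

PROFILE_ROOT = Path.home() / ".scraper-profiles"    # hypothetical location


def profile_dir(source):
    """Return the per-source directory that holds cookies and local storage."""
    return PROFILE_ROOT / source


def fetch_title(source, url, headless=True):
    """Open `url` with the stored profile for `source` and return the page title."""
    # Imported lazily so the module loads even where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        context = p.chromium.launch_persistent_context(
            user_data_dir=str(profile_dir(source)),
            headless=headless,
        )
        try:
            # A persistent context may already have a page open.
            page = context.pages[0] if context.pages else context.new_page()
            page.goto(url)
            return page.title()
        finally:
            context.close()              # flushes session state to the profile
```

A typical workflow would run once with headless=False to complete the SSO login by hand, then run headless on a schedule while the session remains valid.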

AI-native engineering

  • Use Claude Code as your primary engineering environment, including custom skills, MCP servers, and the Claude Agent SDK for sub-agent fan-out work.
  • Use Snowflake Cortex for in-warehouse LLM tasks. Author Cortex Analyst semantic YAML.
  • Use multi-model sparring (Gemini, GPT-5, Ollama) through zen-mcp for architecture review and race-condition debugging.
  • Author and maintain shared team skills in our internal AmaWaterways-IT/data-team-skills marketplace, following our skill conventions (under 500 lines, templates folder for heavy SQL or Jinja, trigger-only descriptions).
  • Apply the Trail of Bits differential-review workflow to significant diffs.

Required Qualifications

  • 6+ years of professional data engineering experience, including a stretch as a senior engineer on a small-to-mid-sized team.
  • Expert SQL on Snowflake. You can read a query profile, identify spillage and partition pruning issues, and rewrite the query.
  • Strong production Python. Code is ruff-clean, has pytest coverage, uses type hints, and never contains hardcoded secrets.
  • Hands-on production experience with dbt (Core or Cloud), including incremental models, SCD2 snapshots, and dbt tests.
  • Hands-on experience with Airflow or Astronomer for production orchestration.
  • AWS fundamentals: IAM, S3, Lambda, and Terraform.
  • GitHub plus GitHub Actions CI/CD with branch protection and code review discipline.
  • Shipped at least one production pipeline that replaced a managed ELT tool (Fivetran, Stitch, Airbyte, ADF) with custom warehouse-native code.
  • You already use Claude Code, Cursor, Aider, or equivalent agent tooling daily. You can speak concretely about what works, what does not, and your context-management practices.

Strongly Preferred

  • Snowflake Snowpark Python: writing and deploying stored procs, External Access Integrations, network rules, Snowflake SECRETs.
  • dbt Mesh, dbt Semantic Layer, dbt unit tests.
  • Snowflake Cortex (Complete, Search, Analyst) used in production.
  • MCP server authoring or Claude Agent SDK applications.
  • Playwright for browser automation.
  • Salesforce Data Cloud, LiveRamp, or SFTP outbound delivery.
  • Travel, hospitality, or consumer industry context.

Nice to Have

  • Cortex Analyst semantic YAML authoring.
  • Power Automate flows for Teams notifications.
  • Tableau REST API or GraphQL Metadata API work.
  • Microsoft Graph API with MSAL certificate auth.
  • Familiarity with the DeepMind agent-trap threat taxonomy or similar agent-security thinking.