Freelance Python Web Scraping Jobs in California

Senior Data Engineer

Calabasas, CA · On-site

$145K - $160K/yr

Snowpark Python stored procs, External Access Integrations, INGESTION_CONFIG and RUN_LOG admin ... Web scraping and source integration * Use Playwright with persistent browser profiles for SSO ...

Applied AI Engineer

San Francisco, CA · On-site

$180K - $220K/yr

... scale web scraping for Facebook Jobs and Meta Reality Labs. * Vishruth (Founding Applied AI ... Work primarily across LLM-driven systems, data pipelines, and AI-enabled services, using Python ...

Senior Data Engineer

San Diego, CA · On-site

$117.30K - $158.70K/yr

Extensive knowledge of coding in Python or Scala with a focus on data processing. * Experience ... Comfortable working with unstructured and semi-structured data (Web scraping). * Experience working ...

Principal Data Engineer

San Francisco, CA · On-site +1

$157.25K - $212.75K/yr

Extensive knowledge of coding in Python or Scala with a focus on data processing. * Experience ... Comfortable working with unstructured and semi-structured data (Web scraping). * Experience working ...

Senior Data Engineer

San Francisco, CA · On-site +1

$127.50K - $172.50K/yr

Extensive knowledge of coding in Python or Scala with a focus on data processing. * Experience ... Comfortable working with unstructured and semi-structured data (Web scraping). * Experience working ...

Principal Data Engineer

San Francisco, CA · On-site

$157.25K - $212.75K/yr

Extensive knowledge of coding in Python or Scala with a focus on data processing. * Experience ... Comfortable working with unstructured and semi-structured data (Web scraping). * Experience working ...

Senior Data Engineer

San Francisco, CA · On-site

$127.50K - $172.50K/yr

Extensive knowledge of coding in Python or Scala with a focus on data processing. * Experience ... Comfortable working with unstructured and semi-structured data (Web scraping). * Experience working ...

Advanced in SQL and Python, experience with Retool, API & Web Scraping a bonus * Experience driving results as an IC and working with technical teams to drive scalable architecture * Ability to drive ...

Senior Data Engineer

San Diego, CA · On-site +1

$117.30K - $158.70K/yr

Extensive knowledge of coding in Python or Scala with a focus on data processing. * Experience ... Comfortable working with unstructured and semi-structured data (Web scraping). * Experience working ...

Core Skills (Required) * Python (Pandas, NumPy, scikit-learn, SciPy) * Strong grounding ... Web scraping/development (Selenium, JavaScript, HTTP, CSS) * Forecasting or time-series analysis

Freelance Python Web Scraping information

What are the key skills and qualifications needed to thrive as a Freelance Python Web Scraping specialist, and why are they important?

To thrive as a Freelance Python Web Scraping specialist, you need strong programming skills in Python, a deep understanding of web protocols, and knowledge of data extraction techniques. Familiarity with libraries like BeautifulSoup, Scrapy, and Selenium, experience using APIs, and basic knowledge of version control systems like Git are typically required. Problem-solving, attention to detail, and effective communication are crucial soft skills for managing client requirements and navigating technical challenges. Together, these skills support efficient, ethical, and accurate data extraction, which is vital for delivering reliable results to clients.

What are some common challenges faced by freelance Python web scraping professionals, and how can they be addressed?

Freelance Python web scraping professionals often encounter challenges such as dealing with websites that have anti-scraping measures, handling frequent changes in website structures, and managing large volumes of data efficiently. To address these issues, it's important to stay updated with the latest scraping libraries and techniques, utilize rotating proxies and user-agent strings, and write modular code that can be easily adapted when websites update their layouts. Additionally, maintaining clear communication with clients about legal considerations and project scope helps set realistic expectations and ensures a smooth workflow.
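As a rough illustration of two of the points above (rotating user-agent strings and keeping site-specific details separate from scraping logic), here is a minimal sketch using only the Python standard library. The user-agent strings, the site name, the URL, and the selector are all hypothetical placeholders, not values from any real project.

```python
import itertools
import urllib.request

# Hypothetical pool of user-agent strings; a real project would maintain its own.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_0) ExampleBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
]

# Site-specific details live in one place, so a layout change means editing
# configuration rather than scraping logic. Names and selectors are illustrative.
SITE_CONFIG = {
    "example-shop": {
        "url": "https://example.com/products",
        "price_selector": "span.price",
    },
}

# Endless iterator that hands out user agents in round-robin order.
_agent_cycle = itertools.cycle(USER_AGENTS)


def build_request(site: str) -> urllib.request.Request:
    """Return a Request for the site's URL with the next user agent in the pool."""
    config = SITE_CONFIG[site]
    return urllib.request.Request(
        config["url"],
        headers={"User-Agent": next(_agent_cycle)},
    )
```

The request is only built here, not sent. Proxies could be rotated with the same round-robin idea, and whether any of this is appropriate for a given site depends on its terms of service and the legal considerations discussed with the client.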

What is freelance Python web scraping?

Freelance Python web scraping involves using the Python programming language to extract data from websites on a project or contract basis, rather than as a full-time employee. Freelancers in this field use libraries like BeautifulSoup, Scrapy, or Selenium to gather, parse, and organize data from various web sources according to client needs. Projects range from collecting product prices and aggregating news articles to monitoring social media trends and compiling research datasets. Freelance web scrapers must ensure they comply with relevant legal and ethical guidelines, as well as website terms of service. The work requires technical proficiency, problem-solving skills, and clear communication with clients.
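To make the "gather, parse, and organize" step concrete, here is a small sketch that pulls product prices out of an HTML fragment. It uses the standard library's html.parser rather than BeautifulSoup or Scrapy so that it runs without extra installs; the sample markup and the "price" class name are invented for illustration.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect the text of every <span class="price"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.prices: list[str] = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples for the opening tag.
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def extract_prices(html: str) -> list[str]:
    """Return the price strings found in the given HTML, in document order."""
    parser = PriceParser()
    parser.feed(html)
    return parser.prices


# Invented sample markup standing in for a downloaded product page.
SAMPLE_HTML = """
<ul>
  <li><span class="name">Widget</span> <span class="price">$9.99</span></li>
  <li><span class="name">Gadget</span> <span class="price">$24.50</span></li>
</ul>
"""

print(extract_prices(SAMPLE_HTML))  # prints ['$9.99', '$24.50']
```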

What is the difference between Freelance Python Web Scraping and a Freelance Data Analyst?

Aspect | Freelance Python Web Scraping | Freelance Data Analyst
Skills & Credentials | Python, web scraping libraries, basic data handling | Excel, SQL, data visualization, statistical analysis
Work Environment | Remote, project-based, client-specific | Remote or on-site, consulting or project-based
Industry Usage | Web data extraction for research, marketing, or business insights | Data interpretation, reporting, and strategic recommendations

Freelance Python Web Scraping focuses on extracting data from websites using Python, while Freelance Data Analysts interpret and analyze data to provide insights. Both roles often work remotely and require technical skills, but their core functions differ: one is data collection, the other is data analysis.

Senior Data Engineer

AmaWaterways, LLC

Calabasas, CA • On-site

$145K - $160K/yr

Full-time

Posted 20 days ago


Job description

At AmaWaterways, we believe meaningful careers begin with purpose, passion and a shared commitment to delivering unforgettable experiences. For those who value curiosity, connection and personal enrichment, AmaWaterways offers the opportunity to help craft meaningful river journeys that invite travelers to follow their own current. Built on a foundation of heartfelt hospitality, we treat our guests—and each other—with genuine care, warmth and respect. AmaWaterways fosters a collaborative environment both onboard our ships and across our global network of offices, where team members grow together, support one another and take pride in upholding the high standards and thoughtful service our company is known for.

We invite talented, motivated professionals to explore our career opportunities and begin their journey with AmaWaterways today.

Role Summary

AmaWaterways is hiring our first Senior Data Engineer to scale the modern data platform we are actively building on Snowflake, dbt, AWS, and Airflow. You will own the next generation of warehouse-native ingestion pipelines that are replacing our remaining Fivetran connectors, partner directly with the Director of Data Engineering on platform architecture, and help establish the engineering standards for a growing team. You will also become a power user of AI-native development tooling. This is not a 'we are exploring AI' role. Our daily engineering environment is Claude Code with custom skills and MCP servers, Snowflake Cortex for in-warehouse AI, and multi-model sparring through zen-mcp for architecture review. Candidates who already work this way will be productive in week one.

What You Will Build

You will inherit and extend an active portfolio of Snowflake-native pipelines. Twelve are already in production, and roughly twenty are on the roadmap. Recent and near-term work includes:

  • A unified Brand Intelligence pipeline ingesting reviews, social, trade press, and awards across the river-cruise segment, with Snowflake Cortex driving sentiment, classification, translation, and entity extraction.
  • A Competitive Intelligence pipeline with ten direct competitor scrapers feeding a unified pricing and promotions schema.
  • An EDW build-out across Bronze, Silver, Gold, Reporting, and Activation layers, including dbt-mesh project structuring, the dbt Semantic Layer, dbt unit tests, and SCD2 modeling for conformed dimensions.
  • The migration of the remaining Fivetran connectors to our standard Snowflake-native ingestion pattern: Snowpark Python stored procs, External Access Integrations, INGESTION_CONFIG and RUN_LOG admin tables, and Snowflake Tasks for scheduling. AWS Lambda handles the workloads that cannot run inside Snowpark.
  • An Astronomer or AWS MWAA layer to govern task graphs once the pipeline count exceeds what Snowflake Tasks can cleanly manage. You will help decide which path we take.
  • Salesforce Data Cloud, LiveRamp, and SFTP outbound integrations from our ACTIVATE layer.

Day to Day Responsibilities

Snowflake-native ingestion engineering

  • Build pipelines using our standard template: Snowpark Python stored procs, External Access Integrations, network rules, Snowflake SECRETs hydrated from AWS Secrets Manager, idempotent deploy.py scripts, and config-driven INGESTION_CONFIG tables.
  • Author and tune incremental load patterns (watermark cursors, MERGE statements, change-data capture where supported).
  • Design conformed dimensions with SCD2 snapshots and append-only fact tables in dbt.
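The incremental load patterns named above (watermark cursors and MERGE statements driven by configuration) might look roughly like the following sketch. It is not the team's actual template: the IngestionConfig fields are hypothetical stand-ins for whatever INGESTION_CONFIG really holds, and the SQL is generic MERGE text that should be checked against Snowflake's dialect before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestionConfig:
    """One configuration row. All field names here are hypothetical."""

    target_table: str
    staging_table: str
    key_column: str
    watermark_column: str
    columns: tuple[str, ...]


def build_merge_sql(cfg: IngestionConfig) -> str:
    """Render a MERGE that upserts staged rows into the target table.

    Rerunning it with the same staged data changes nothing, because matched
    rows are only updated when the staged watermark is strictly newer.
    """
    non_key = [c for c in cfg.columns if c != cfg.key_column]
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in non_key)
    insert_cols = ", ".join(cfg.columns)
    insert_vals = ", ".join(f"s.{c}" for c in cfg.columns)
    return (
        f"MERGE INTO {cfg.target_table} AS t\n"
        f"USING {cfg.staging_table} AS s\n"
        f"  ON t.{cfg.key_column} = s.{cfg.key_column}\n"
        f"WHEN MATCHED AND s.{cfg.watermark_column} > t.{cfg.watermark_column} THEN\n"
        f"  UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN\n"
        f"  INSERT ({insert_cols}) VALUES ({insert_vals})"
    )


def next_watermark(current: str, batch: list[dict], column: str) -> str:
    """Advance the cursor to the newest value in the batch, never backwards.

    Assumes watermarks are ISO-8601 strings in one fixed format, so that
    string comparison matches chronological order.
    """
    return max([current, *(row[column] for row in batch)])
```

Keeping the SQL generation a pure function of the config row makes it easy to unit test with pytest before anything touches the warehouse.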

Transformation and modeling

  • Build and maintain dbt models across Bronze, Silver, Gold, and Reporting layers in our medallion warehouse.
  • Use dbt Core and dbt Cloud, dbt unit tests, dbt Mesh for cross-project refs, and the dbt Semantic Layer for governed metrics.
  • Keep VARIANT columns confined to Bronze. Gold and Reporting models are strictly typed.
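For readers less familiar with the SCD2 modeling this posting refers to, the core idea can be shown in a few lines of plain Python. In practice dbt snapshots would handle this inside the warehouse; the sketch below only illustrates the bookkeeping, and the DimRow fields and sample values are made up.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    key: str                  # business key of the dimension member
    attrs: tuple              # tracked attribute values
    valid_from: str           # when this version became current
    valid_to: Optional[str]   # None marks the current version


def apply_scd2(history: list[DimRow], key: str, attrs: tuple, as_of: str) -> list[DimRow]:
    """Return the history after observing (key, attrs) at time as_of."""
    current = next(
        (r for r in history if r.key == key and r.valid_to is None), None
    )
    if current is not None and current.attrs == attrs:
        return history  # nothing changed, so a rerun is a no-op
    # Close the current version (if any) and append the new one.
    result = [replace(r, valid_to=as_of) if r is current else r for r in history]
    result.append(DimRow(key, attrs, as_of, None))
    return result
```

Each change closes the previous version by setting valid_to and appends a new open-ended row, so the full history of a dimension member stays queryable.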

Cloud infrastructure and DevOps

  • Manage the AWS side of our pipelines: S3 staging, IAM roles, Lambda functions in Python, API Gateway where needed.
  • Author Terraform for every AWS resource. No ad-hoc console work.
  • Use AWS Secrets Manager as the source of truth for machine credentials. Naming convention is ama/{env}/{domain}/{name}. Never put credentials in .env files or GitHub repo secrets.
  • Build and own CI/CD pipelines in GitHub Actions. The standard automation identity is SVC_ETL_RUNNER with RSA keypair auth to Snowflake.
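The ama/{env}/{domain}/{name} naming convention for secrets lends itself to a small helper that builds and validates names in one place. This is a sketch only: the set of environment names and the allowed-character rule are assumptions, since the posting states the pattern but not its constraints.

```python
import re

# Assumed environment names; the posting does not list them.
VALID_ENVS = {"dev", "staging", "prod"}

# Assumed character rule for the domain and name segments.
_SEGMENT = re.compile(r"^[a-z0-9][a-z0-9_-]*$")


def secret_name(env: str, domain: str, name: str) -> str:
    """Build a secret name following the ama/{env}/{domain}/{name} pattern."""
    if env not in VALID_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    for segment in (domain, name):
        if not _SEGMENT.match(segment):
            raise ValueError(f"invalid segment: {segment!r}")
    return f"ama/{env}/{domain}/{name}"
```

A caller could then pass the result as the SecretId argument of boto3's Secrets Manager get_secret_value call, so that no pipeline spells out a secret path by hand.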

Orchestration

  • Operate Snowflake Tasks for the current generation of pipelines.
  • Help design and stand up the next-tier orchestration layer in Astronomer or AWS MWAA, including dbt Cosmos integration and DAG migration from Snowflake Tasks.

Observability and quality

  • Configure Snowflake Alerts (failure, zero-row, missed-run, freshness) and Microsoft Teams notifications through Power Automate.
  • Build data quality checks into every pipeline using our ADMIN.DQ_EXCEPTION_LOG pattern and dedicated QC layers for cross-system reconciliation.
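The four alert conditions listed above (failure, zero-row, missed-run, freshness) can be expressed as a single check over the most recent run of a pipeline. The sketch below is illustrative and is not the ADMIN.DQ_EXCEPTION_LOG pattern itself: the RunRecord shape, the status values, and the issue labels are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical shape of one run-log entry."""

    pipeline: str
    status: str            # assumed values: "SUCCESS" or "FAILED"
    rows_loaded: int
    finished_at: datetime


def check_run(
    last_run: Optional[RunRecord], now: datetime, max_age: timedelta
) -> list[str]:
    """Return the alert conditions triggered by the latest run, if any."""
    if last_run is None:
        return ["missed-run"]  # no run was recorded at all
    issues = []
    if last_run.status != "SUCCESS":
        issues.append("failure")
    elif last_run.rows_loaded == 0:
        issues.append("zero-row")  # succeeded but loaded nothing
    if now - last_run.finished_at > max_age:
        issues.append("freshness")  # the latest data is older than allowed
    return issues
```

An empty list means the run is healthy; anything else could be written to an exception log or forwarded to a notification channel.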

Web scraping and source integration

  • Use Playwright with persistent browser profiles for SSO-protected and API-less sources (TrueVault, Tableau Admin Insights, internal SharePoint).
  • Author OAuth, MSAL certificate auth, and refresh token flows for source APIs.
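A persistent browser profile lets Playwright reuse cookies and SSO sessions between runs instead of logging in each time. The sketch below shows one way this could be wired up with Playwright's synchronous Python API. The profile location and the one-directory-per-source layout are assumptions, and running fetch_page_title requires Playwright and a Chromium build to be installed.

```python
from pathlib import Path

# Hypothetical location for stored browser profiles.
PROFILE_ROOT = Path.home() / ".scraper-profiles"


def profile_dir(source: str) -> Path:
    """One profile directory per source keeps SSO sessions separate."""
    return PROFILE_ROOT / source


def fetch_page_title(source: str, url: str) -> str:
    """Open url with the source's persistent profile and return the page title."""
    # Imported lazily so profile_dir works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Cookies and local storage in user_data_dir survive between runs.
        context = p.chromium.launch_persistent_context(
            user_data_dir=str(profile_dir(source)),
            headless=True,
        )
        try:
            page = context.pages[0] if context.pages else context.new_page()
            page.goto(url)
            return page.title()
        finally:
            context.close()
```

The first login against an SSO-protected source is often done once with headless=False so a person can complete the sign-in; later runs can then reuse the stored session until it expires.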

AI-native engineering

  • Use Claude Code as your primary engineering environment, including custom skills, MCP servers, and the Claude Agent SDK for sub-agent fan-out work.
  • Use Snowflake Cortex for in-warehouse LLM tasks. Author Cortex Analyst semantic YAML.
  • Use multi-model sparring (Gemini, GPT-5, Ollama) through zen-mcp for architecture review and race-condition debugging.
  • Author and maintain shared team skills in our internal AmaWaterways-IT/data-team-skills marketplace, following our skill conventions (under 500 lines, templates folder for heavy SQL or Jinja, trigger-only descriptions).
  • Apply the Trail of Bits differential-review workflow to significant diffs.

Required Qualifications

  • 6+ years of professional data engineering experience, including a stretch as a senior engineer on a small-to-mid-sized team.
  • Expert SQL on Snowflake. You can read a query profile, identify spillage and partition pruning issues, and rewrite the query.
  • Strong production Python. Code is ruff-clean, has pytest coverage, uses type hints, and never contains hardcoded secrets.
  • Hands-on production experience with dbt (Core or Cloud), including incremental models, SCD2 snapshots, and dbt tests.
  • Hands-on experience with Airflow or Astronomer for production orchestration.
  • AWS fundamentals: IAM, S3, Lambda, and Terraform.
  • GitHub plus GitHub Actions CI/CD with branch protection and code review discipline.
  • Shipped at least one production pipeline that replaced a managed ELT tool (Fivetran, Stitch, Airbyte, ADF) with custom warehouse-native code.
  • You already use Claude Code, Cursor, Aider, or equivalent agent tooling daily. You can speak concretely about what works, what does not, and your context-management practices.

Strongly Preferred

  • Snowflake Snowpark Python: writing and deploying stored procs, External Access Integrations, network rules, Snowflake SECRETs.
  • dbt Mesh, dbt Semantic Layer, dbt unit tests.
  • Snowflake Cortex (Complete, Search, Analyst) used in production.
  • MCP server authoring or Claude Agent SDK applications.
  • Playwright for browser automation.
  • Salesforce Data Cloud, LiveRamp, or SFTP outbound delivery.
  • Travel, hospitality, or consumer industry context.

Nice to Have

  • Cortex Analyst semantic YAML authoring.
  • Power Automate flows for Teams notifications.
  • Tableau REST API or GraphQL Metadata API work.
  • Microsoft Graph API with MSAL certificate auth.
  • Familiarity with the DeepMind agent-trap threat taxonomy or similar agent-security thinking.