How to Crawl Freelancer Jobs Using Beautiful Soup in Python

Code: https://gist.github.com/KhaledHawwas/a1cdd070c7bbe07bc8476c609938d0ab

Freelancer job platforms like Freelancer.com contain thousands of live opportunities across categories such as Android development, web design, and data science. Crawling these listings with Beautiful Soup in Python allows developers to extract structured job data for analytics, automation, and AI workflows.
In this article, we will walk through the structure and logic behind a real-world freelancer job crawling script, based on the shared gist. Instead of rewriting the code, we will explain what each section does and how it enables scalable job extraction.
Target: Android Jobs on Freelancer
The script focuses on crawling:
https://www.freelancer.com/jobs/android
This category page lists active Android-related freelance jobs. The goal is to crawl multiple pages of listings and extract structured job information.
Understanding the Core Configuration
The script begins with four important variables:
1. API_KEY
API_KEY = "YOUR API KEY"
This is your authentication credential. It authorizes requests to the Crawleo crawling API.
Instead of scraping directly with raw HTTP requests, the script uses an API endpoint to handle:
- Page fetching
- HTML cleaning
- Anti-bot handling
- Structured output formatting
You must replace this placeholder with your actual Crawleo API key.
2. API_ENDPOINT
API_ENDPOINT = "https://api.crawleo.dev/crawl"
This defines the crawling endpoint used to fetch and process the target page.
Rather than manually handling headers, sessions, and parsing complexities, the script delegates crawling to the API endpoint. This simplifies large-scale job scraping and reduces maintenance overhead.
3. base_url
base_url = "https://www.freelancer.com/jobs/android"
This is the root category page containing Android freelancer jobs.
Freelancer uses paginated URLs. When crawling job listings at scale, you must iterate across multiple pages to collect complete datasets.
4. total_pages
total_pages = 3
This defines how many pages of job listings will be crawled.
For example:
- Page 1: /jobs/android
- Page 2: /jobs/android/2
- Page 3: /jobs/android/3
By setting total_pages = 3, the script ensures broader coverage beyond just the first page of listings.
How the Crawling Flow Works
Even without examining the full code, the logic typically follows this structure:
Step 1: Generate Paginated URLs
The script dynamically constructs URLs based on base_url and total_pages.
This enables scalable crawling across multiple result pages instead of scraping a single HTML document.
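Under the pagination pattern described above (page 1 uses the bare category URL, later pages append the page number), URL generation can be sketched in a few lines. The variable names match the configuration section; the list-comprehension approach is illustrative, not necessarily how the gist writes it:

```python
base_url = "https://www.freelancer.com/jobs/android"
total_pages = 3

# Page 1 is the bare category URL; pages 2+ append the page number.
page_urls = [
    base_url if page == 1 else f"{base_url}/{page}"
    for page in range(1, total_pages + 1)
]

print(page_urls)
```

Raising `total_pages` is then the only change needed to widen coverage.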
Step 2: Send Each URL to the Crawling API
Instead of scraping directly using requests, the script sends each constructed page URL to the crawling endpoint defined in API_ENDPOINT.
This approach provides several advantages:
- Cleaner HTML extraction
- Reduced risk of IP blocking
- Automatic handling of dynamic content
- Structured output options such as markdown or text
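A request to the crawling endpoint can be sketched as below. Note that the payload field (`url`), the `Authorization` header format, and the `html` key in the response are assumptions for illustration; check the Crawleo documentation or the linked gist for the exact request shape:

```python
import requests

API_KEY = "YOUR API KEY"  # replace with your real Crawleo API key
API_ENDPOINT = "https://api.crawleo.dev/crawl"

def fetch_page(page_url: str) -> str:
    """Ask the crawling API to fetch a page and return its HTML.

    The payload and header names here are illustrative; consult the
    Crawleo docs (or the linked gist) for the exact request format.
    """
    response = requests.post(
        API_ENDPOINT,
        json={"url": page_url},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("html", "")
```

Each paginated URL from Step 1 would then be passed through `fetch_page` in a loop.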
Step 3: Parse the Returned HTML Using Beautiful Soup
After receiving the page content, Beautiful Soup is used to:
- Locate job containers
- Extract job titles
- Capture budgets
- Retrieve short descriptions
- Collect posting metadata
Beautiful Soup remains essential here because it allows structured navigation of the returned HTML.
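The parsing step above can be sketched against a small static sample. The CSS class names below (`JobSearchCard-item` and friends) are hypothetical placeholders; Freelancer's real markup changes over time, so inspect the live page to find the current selectors:

```python
from bs4 import BeautifulSoup

# A static sample standing in for the HTML returned by the crawling API.
# Class names are illustrative only; verify them against the live page.
sample_html = """
<div class="JobSearchCard-item">
  <a class="JobSearchCard-primary-heading-link">Build an Android app</a>
  <div class="JobSearchCard-primary-price">$250</div>
  <p class="JobSearchCard-primary-description">Need a Kotlin developer...</p>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
jobs = []
for card in soup.select("div.JobSearchCard-item"):
    jobs.append({
        "title": card.select_one("a.JobSearchCard-primary-heading-link").get_text(strip=True),
        "budget": card.select_one("div.JobSearchCard-primary-price").get_text(strip=True),
        "description": card.select_one("p.JobSearchCard-primary-description").get_text(strip=True),
    })

print(jobs)
```

The resulting list of dictionaries can then be written to CSV or JSON for downstream analytics.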
This hybrid approach combines:
- API-level crawling infrastructure
- Python-level HTML parsing
The result is a scalable and maintainable freelancer job scraping pipeline.
Why This Approach Is Better Than Basic Scraping
Many developers attempt to crawl freelancer job listings using only requests and Beautiful Soup. While that works for small projects, it often fails at scale due to:
- Rate limits
- Anti-bot protections
- Frequent layout changes
- Dynamic content loading
Using an API endpoint for crawling provides:
- Infrastructure abstraction
- Consistent HTML cleaning
- Better reliability
- Production-ready scaling
Beautiful Soup then handles the structured parsing locally.