Web Scraping Domain.com.au with Golang
Web scraping enables developers to extract and process information from websites, supporting a range of uses such as data analysis, market research, and API building. For Australian real estate data, Domain.com.au is a valuable resource. This blog will walk you through scraping Domain.com.au using Golang, so you can gather real estate data with ease.
For non-technical users or those looking for a ready-made solution, I’ve already built an Apify actor for Domain.com.au, which you can use without needing to code. The actor automates the data extraction, making it simple for users to retrieve property details without needing technical expertise. This blog will guide you through:
Setting up a Golang project for scraping.
Choosing libraries for web scraping.
Writing code to extract data.
Handling anti-bot measures.
Storing and utilizing scraped data.
1. Project Setup
First, let's create a new project directory for our Golang code.
mkdir domain-scraper
cd domain-scraper
go mod init domain-scraper
This will initialize a new Golang project with a go.mod file. We’ll use colly, a popular scraping library for Go that is both efficient and feature-rich.
2. Choosing the Right Libraries
The Golang colly library is an excellent choice for web scraping due to its ease of use and support for handling cookies, headers, and sessions. We’ll also use goquery, which integrates with colly to simplify HTML parsing.
Install colly and goquery:
go get -u github.com/gocolly/colly
go get -u github.com/PuerkitoBio/goquery
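colly exposes each matched element as a goquery selection through e.DOM, so you can drop down to goquery whenever ChildText isn’t enough. Here’s a minimal sketch of that integration (the .listing-features selector is a placeholder, not a real Domain.com.au class):

c.OnHTML(".listing-features", func(e *colly.HTMLElement) {
	// e.DOM is a *goquery.Selection for the matched element,
	// so the full goquery API is available here
	e.DOM.Find("li").Each(func(_ int, s *goquery.Selection) {
		fmt.Println("Feature:", s.Text())
	})
})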
3. Scraping Basic Data
Identify the Structure of Domain.com.au
Before diving into code, inspect the structure of Domain.com.au pages using browser developer tools. Typical data points on a listing might include:
Property Title
Price
Location
Description
Agent Details
Writing the Scraper Code
Let's write a basic scraper to extract these details.
package main

import (
	"fmt"
	"log"

	"github.com/gocolly/colly"
)

func main() {
	// Instantiate a collector restricted to Domain.com.au
	// (include the www subdomain, or colly will reject the request)
	c := colly.NewCollector(
		colly.AllowedDomains("domain.com.au", "www.domain.com.au"),
	)

	// Set up error handling
	c.OnError(func(_ *colly.Response, err error) {
		log.Println("Request failed:", err)
	})

	// Extract property details. These class names are examples;
	// inspect the live page and adjust them to the current markup.
	c.OnHTML(".css-1kkm9qk", func(e *colly.HTMLElement) {
		title := e.ChildText(".css-164r41r")
		price := e.ChildText(".css-1rzse3v")
		address := e.ChildText(".css-t54e5i")
		agent := e.ChildText(".css-1gkcyyc")
		description := e.ChildText(".css-1yuhvjn")

		fmt.Printf("Title: %s\nPrice: %s\nAddress: %s\nAgent: %s\nDescription: %s\n",
			title, price, address, agent, description)
	})

	// URL of the property listing (replace with a real listing URL)
	err := c.Visit("https://www.domain.com.au/some-property-url")
	if err != nil {
		log.Fatal("Failed to scrape the page:", err)
	}
}
In this code:
We initialize a colly.Collector with AllowedDomains set to domain.com.au (and its www subdomain) to restrict the scraper.
We define CSS selectors to target specific elements, such as the title, price, and agent information.
The OnHTML callback takes a CSS selector to extract content. Here, .css-1kkm9qk is the main container for listing details (you’ll need to adjust the selectors to match the current structure of Domain.com.au).
The Visit function sends the HTTP request to the provided URL.
4. Handling Anti-Bot Measures
Domain.com.au, like many large websites, might use anti-bot mechanisms. Here are some ways to work around common issues:
User-Agent Spoofing
Changing the User-Agent string can prevent the server from blocking requests.
c.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
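Beyond the User-Agent, you can make requests look more browser-like by setting additional headers in an OnRequest hook. A small sketch (the header values here are illustrative):

c.OnRequest(func(r *colly.Request) {
	// Illustrative header values; mirror a real browser profile
	r.Headers.Set("Accept-Language", "en-AU,en;q=0.9")
	r.Headers.Set("Referer", "https://www.domain.com.au/")
})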
Delay and Random Intervals
To avoid rate limiting, add random delays between requests:
// Requires "time" in your imports
err := c.Limit(&colly.LimitRule{
	DomainGlob:  "*domain.com.au",
	RandomDelay: 2 * time.Second,
})
if err != nil {
	log.Fatal(err)
}
This code snippet sets a random delay of up to 2 seconds between requests to domain.com.au.
Proxy Rotation
Use a proxy pool to rotate IPs between requests, which is particularly useful if your IP is blocked frequently. Here’s how to set up a proxy:
c.SetProxy("http://yourproxy:port")
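SetProxy routes every request through a single proxy. For true rotation, colly ships a round-robin proxy switcher in its proxy subpackage; here’s a minimal sketch (the proxy URLs are placeholders for your own pool):

import "github.com/gocolly/colly/proxy"

rp, err := proxy.RoundRobinProxySwitcher(
	"http://proxy1:8080",
	"http://proxy2:8080",
)
if err != nil {
	log.Fatal(err)
}
c.SetProxyFunc(rp)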
5. Saving Data to a Database
To store the scraped data, use a database like PostgreSQL or MongoDB. Here’s an example of saving data in SQLite for simplicity.
Install the SQLite driver:
go get -u github.com/mattn/go-sqlite3
Then, create a function to save the data.
import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3"
)

func saveToDatabase(title, price, address, agent, description string) {
	database, err := sql.Open("sqlite3", "./domain.db")
	if err != nil {
		log.Fatal(err)
	}
	defer database.Close()

	// Create the table on first run
	if _, err := database.Exec("CREATE TABLE IF NOT EXISTS properties (title TEXT, price TEXT, address TEXT, agent TEXT, description TEXT)"); err != nil {
		log.Fatal(err)
	}

	// Insert the scraped fields
	if _, err := database.Exec(
		"INSERT INTO properties (title, price, address, agent, description) VALUES (?, ?, ?, ?, ?)",
		title, price, address, agent, description,
	); err != nil {
		log.Println("Failed to insert data:", err)
	} else {
		fmt.Println("Data inserted successfully!")
	}
}
Call saveToDatabase() within your OnHTML callback:

saveToDatabase(title, price, address, agent, description)
6. Running the Scraper
Compile and run your scraper:
go run main.go
If everything is set up correctly, the script will visit the Domain.com.au page, extract the details, and save them to the SQLite database.
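To double-check the stored rows, you can query the database directly with the sqlite3 command-line tool (assuming you have it installed):

sqlite3 domain.db "SELECT title, price, address FROM properties;"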
For Non-Technical Users: Apify Actor
For those who want to collect Domain.com.au data but aren’t comfortable with programming, I’ve built an Apify actor that automates the data extraction for Domain.com.au, allowing you to gather property details without any coding. You simply configure the settings on Apify, and the actor does the rest.
Conclusion
In this post, we built a simple but powerful scraper for Domain.com.au using Golang and colly. We also discussed techniques to handle anti-bot measures, like user-agent spoofing and request delays. Remember to scrape responsibly, and always check a website’s Terms of Service to ensure compliance.
This setup should serve as a foundation to expand your scraper with additional features, such as concurrent requests or broader database storage options. With Golang's efficiency, you’ll find scraping with it to be fast and reliable.
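If you want to experiment with those concurrent requests, colly supports asynchronous collectors out of the box. A minimal sketch:

// Enable asynchronous requests; colly manages the goroutines
c := colly.NewCollector(
	colly.Async(true),
)

// Cap per-domain parallelism so you don't hammer the site
c.Limit(&colly.LimitRule{
	DomainGlob:  "*domain.com.au",
	Parallelism: 2,
})

// ...register your OnHTML/OnError callbacks and call c.Visit()...

// Block until all in-flight requests have completed
c.Wait()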