How to Scrape Google Reviews Using Python — Complete 2025 Guide
Google reviews are a goldmine of customer feedback, but manually collecting thousands of them? That's inefficient and impractical.
With Python, you can automate the entire extraction process and gather thousands of reviews in hours. This guide walks you through everything — from basic setup to production-ready scrapers that handle Google's anti-scraping defenses.
By the end, you'll understand how to build reliable review scrapers, avoid detection, and extract actionable customer insights at scale.
What is Google Reviews Scraping?
Google reviews scraping is the automated extraction of customer feedback data from Google Maps and Google Business profiles using Python scripts.
When you scrape Google reviews, you capture:
- Star ratings (1–5 stars)
- Review text (full customer comments)
- Reviewer names and profiles
- Review dates and timestamps
- Business responses (if the owner replied)
- Helpful vote counts
Unlike Google's official API, which is paid and returns only a small sample of reviews per place, scraping can reach every publicly visible review. The practical limits are Google's anti-scraping defenses rather than API quotas.
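One way to hold these fields in code is a small record type. The sketch below is illustrative only: the `Review` class and its field names are assumptions chosen to mirror the list above, not a structure that Google exposes.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Review:
    """One scraped review. Field names are illustrative, not a Google schema."""
    reviewer_name: str
    rating: Optional[int]            # 1-5 stars, None if it could not be parsed
    review_text: str
    review_date: str                 # kept as displayed, e.g. "2 weeks ago"
    owner_response: Optional[str] = None
    helpful_count: int = 0

    def is_valid(self) -> bool:
        """A rating, when present, must fall in the 1-5 range."""
        return self.rating is None or 1 <= self.rating <= 5


sample = Review("Jane D.", 5, "Great coffee, fast service.", "2 weeks ago")
row = asdict(sample)  # plain dict, ready for csv.DictWriter or a DataFrame
```

Keeping the date as displayed text avoids guessing at a format; it can be converted later once you know what the page actually shows.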
Why Scrape Instead of Using Google's API?
Google's official Places API for reviews has serious limitations:
| Feature | Official API | Web Scraping |
|---|---|---|
| Reviews per request | Up to 5 per place | All available |
| Cost | $7 per 1,000 requests | Free to low-cost |
| Rate limits | 100 requests/second max | Configurable |
| Competitor reviews | ❌ No | ✅ Yes |
| Setup complexity | Moderate (authentication) | Higher (technical) |
For competitive analysis, market research, or large-scale data collection, scraping is the practical choice.
Why Python is the Best Language for This Task
Python dominates web scraping for four core reasons:
1. Rich Scraping Ecosystem
Python has mature libraries for every stage of a scraping job:
- Playwright — Modern, fast browser automation
- Selenium — Battle-tested, maximum compatibility
- BeautifulSoup — Lightweight HTML parsing
- Scrapy — Enterprise-grade scraping framework
- Pandas — Data manipulation and export
You can go from raw HTML to cleaned CSV in one script.
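As a rough illustration of that HTML-to-CSV path, here is a minimal sketch that uses only the standard library so it runs anywhere; BeautifulSoup and Pandas would make the same steps shorter. The sample markup and its class names (`review`, `author`, `text`) are made up for the example and do not reflect Google's real pages.

```python
import csv
import io
from html.parser import HTMLParser

# Made-up markup for illustration; real Google pages look different.
SAMPLE_HTML = """
<div class="review"><span class="author">Jane</span><p class="text">Great   coffee!</p></div>
<div class="review"><span class="author">Omar</span><p class="text">Slow service.</p></div>
"""


class ReviewParser(HTMLParser):
    """Collects author/text pairs from the sample markup above."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # which field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "review":
            self.rows.append({"author": "", "text": ""})
        elif css_class in ("author", "text"):
            self._field = css_class

    def handle_data(self, data):
        if self._field and self.rows:
            # Collapse runs of whitespace while storing the value
            self.rows[-1][self._field] += " ".join(data.split())

    def handle_endtag(self, tag):
        self._field = None


def html_to_csv(html: str) -> str:
    """Parse the markup and return the rows as CSV text."""
    parser = ReviewParser()
    parser.feed(html)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["author", "text"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(parser.rows)
    return buffer.getvalue()


csv_text = html_to_csv(SAMPLE_HTML)
```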
2. JavaScript Execution
Google reviews load dynamically through JavaScript, not in the initial HTML. Python's browser automation tools (Playwright, Selenium) execute JavaScript just like a real browser, allowing you to:
- Trigger infinite scroll
- Click "Show more" buttons
- Wait for content to appear
- Interact with page elements
3. Built-In Anti-Detection Features
Browser automation libraries let you mask the most obvious automation signals. In Selenium, for example, where options is a ChromeOptions object:
# Hide automation indicators
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
These flags make the script harder to identify as a bot, but they do not guarantee that Google will not detect it.
4. Seamless Data Processing Pipeline
With Python, you scrape → clean → analyze → export all in one script. No tool-switching required.
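To make the "analyze" step concrete, here is a small sketch that summarizes ratings with the standard library. The `reviews` records are hypothetical and only shaped like the rows a scraper might produce; `summarize` is an illustrative helper, not part of any library.

```python
from collections import Counter
from statistics import mean

# Hypothetical records, shaped like the rows a scraper might produce
reviews = [
    {"rating": 5, "review_text": "Great coffee"},
    {"rating": 4, "review_text": "Good, but crowded"},
    {"rating": 1, "review_text": "Order was wrong"},
    {"rating": 5, "review_text": "Friendly staff"},
    {"rating": None, "review_text": "Rating could not be parsed"},
]


def summarize(rows):
    """Count, average, and star distribution, ignoring missing ratings."""
    ratings = [row["rating"] for row in rows if row["rating"] is not None]
    return {
        "count": len(ratings),
        "average": round(mean(ratings), 2) if ratings else None,
        "distribution": dict(sorted(Counter(ratings).items())),
    }


summary = summarize(reviews)
```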
The Challenge: Why Google Makes Scraping Difficult
Google employs multiple anti-scraping layers. Understanding these helps you build resilient scrapers.
1. Dynamic Content Loading
Google reviews don't load all at once. Instead:
- Initial page load shows 10–20 reviews
- Additional reviews load via AJAX as you scroll
- Each batch requires separate network requests
- The mechanism changes frequently
Solution: Use browser automation (Playwright/Selenium) that executes JavaScript and simulates scrolling.
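The scroll-until-nothing-new logic can be sketched independently of any browser. In this example `load_until_stalled` and `FakePage` are hypothetical names: the two callables stand in for whatever your automation tool uses to scroll and to count loaded reviews.

```python
from typing import Callable


def load_until_stalled(
    scroll_once: Callable[[], None],
    count_items: Callable[[], int],
    max_items: int = 100,
    max_stalls: int = 3,
) -> int:
    """Scroll until enough items are loaded or the count stops growing."""
    loaded = count_items()
    stalls = 0
    while loaded < max_items and stalls < max_stalls:
        scroll_once()
        current = count_items()
        if current > loaded:
            loaded, stalls = current, 0   # progress: reset the stall counter
        else:
            stalls += 1                   # no new items after this scroll
    return loaded


class FakePage:
    """Stand-in for a page that serves 10 reviews per scroll, 35 in total."""

    def __init__(self, total=35, batch=10):
        self.total, self.batch, self.visible = total, batch, batch
        self.scrolls = 0

    def scroll(self):
        self.scrolls += 1
        self.visible = min(self.total, self.visible + self.batch)

    def count(self):
        return self.visible


page = FakePage()
loaded = load_until_stalled(page.scroll, page.count, max_items=100)
```

The stall counter is what ends the loop when a business has fewer reviews than `max_items`; without it the scraper would scroll forever.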
2. Sophisticated Bot Detection
Google analyzes:
- Browser fingerprinting — Screen resolution, installed fonts, timezone
- Behavioral patterns — Mouse movements, scroll speed, click timing
- Request frequency — Detecting non-human request patterns
- IP reputation — Flagging suspicious addresses
Solution: Rotate proxies, add random delays, use realistic user agents.
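A minimal sketch of that rotation, assuming you already have a pool of proxies you are authorized to use. The proxy addresses below are placeholders, and `session_profiles` is a hypothetical helper rather than a library function.

```python
import itertools
import random

# Placeholder values: substitute your own proxies, and user-agent strings
# that match the browser you actually launch.
PROXIES = ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def session_profiles(proxies, user_agents, rng=random):
    """Yield one profile per session: proxies in round-robin order,
    user agent and pause length chosen at random."""
    for proxy in itertools.cycle(proxies):
        yield {
            "proxy": {"server": proxy},
            "user_agent": rng.choice(user_agents),
            "delay_seconds": rng.uniform(1.0, 3.0),
        }


profiles = session_profiles(PROXIES, USER_AGENTS, rng=random.Random(42))
first, second, third = (next(profiles) for _ in range(3))
```

With Playwright, the `proxy` and `user_agent` values from a profile can be passed to `browser.new_context(proxy=..., user_agent=...)`, so each session starts with a different identity.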
3. Rate Limiting and CAPTCHAs
Hit Google too hard, and you'll face:
- Temporary IP blocks (24–48 hours)
- CAPTCHA challenges
- Complete access denial
- Progressive throttling
Solution: Implement request throttling (1–3 second delays between actions).
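Throttling can be kept in one small object so every action goes through the same pacing rules. This is a sketch under assumptions: the `Throttle` class, its exponential backoff, and the 300-second cap are illustrative choices, not values taken from Google.

```python
import random
import time


class Throttle:
    """Jittered pause between actions, with exponential backoff after a block."""

    def __init__(self, min_delay=1.0, max_delay=3.0, max_backoff=300.0,
                 sleep=time.sleep, rng=random):
        self.min_delay, self.max_delay = min_delay, max_delay
        self.max_backoff = max_backoff
        self.failures = 0          # consecutive blocked or CAPTCHA responses
        self._sleep, self._rng = sleep, rng

    def next_delay(self) -> float:
        base = self._rng.uniform(self.min_delay, self.max_delay)
        # Double the wait for each consecutive failure, up to max_backoff
        return min(base * (2 ** self.failures), self.max_backoff)

    def wait(self) -> float:
        delay = self.next_delay()
        self._sleep(delay)
        return delay

    def record(self, blocked: bool) -> None:
        """Call after each action; a success resets the backoff."""
        self.failures = self.failures + 1 if blocked else 0


slept = []
throttle = Throttle(sleep=slept.append)   # record delays instead of sleeping
normal = throttle.wait()
throttle.record(blocked=True)
throttle.record(blocked=True)
backed_off = throttle.wait()
```

Injecting `sleep` makes the pacing testable without real waiting; in a scraper you would leave it at the default `time.sleep`.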
4. Constantly Evolving HTML Structure
Google updates their page structure regularly, breaking CSS selectors overnight.
Solution: Use multiple fallback selectors and attribute-based queries instead of fragile class names.
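The fallback idea can be expressed as a small helper that tries selectors in order of expected stability. Everything here is hypothetical: `first_match`, the selector list, and the `fake_query` stand-in are for illustration, and the selectors are not verified against Google's current markup.

```python
from typing import Callable, Optional, Sequence


def first_match(
    selectors: Sequence[str],
    query: Callable[[str], Optional[str]],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the first non-empty result; a selector that raises is skipped."""
    for selector in selectors:
        try:
            value = query(selector)
        except Exception:
            continue               # broken selector: fall through to the next
        if value:
            return value
    return default


# Hypothetical selectors, most stable first: ARIA attributes, then data
# attributes, then a class-name guess as a last resort.
RATING_SELECTORS = [
    '[role="img"][aria-label*="star"]',
    "[data-rating]",
    "span.rating",
]

# Stand-in for a page where only the data attribute still matches
fake_dom = {"[data-rating]": "4 stars"}


def fake_query(selector: str) -> Optional[str]:
    return fake_dom.get(selector)


rating_label = first_match(RATING_SELECTORS, fake_query)
```

In a real scraper, `query` would wrap a lookup in your automation tool, ideally with a short timeout so a missing element fails fast instead of stalling the run.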
Method 1: Playwright — The Modern Approach
Playwright is the recommended choice for 2025. It's faster than Selenium, has better anti-detection features, and handles modern JavaScript-heavy sites effortlessly.
Setting Up Playwright
First, create a virtual environment and install dependencies:
# Create virtual environment
python -m venv google_scraper
source google_scraper/bin/activate # On Windows: google_scraper\Scripts\activate
# Install required packages
pip install playwright pandas beautifulsoup4 lxml emoji
# Install browser
playwright install chromium
Complete Playwright Scraper Code
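Here's a complete scraper that handles the main complexities of Google reviews. The CSS selectors reflect assumptions about Google Maps' markup, so verify them against the live page before relying on the output: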
from playwright.sync_api import sync_playwright
import pandas as pd
import re
import emoji
import logging
import time
import random
from datetime import datetime

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)


class GoogleReviewsScraper:
    def __init__(self, headless=True):
        self.headless = headless
        self.reviews_data = []

    def clean_text(self, text):
        """Remove emojis and normalize text"""
        if not text:
            return ""
        text = emoji.replace_emoji(text, replace='')
        text = re.sub(r'\s+', ' ', text).strip()
        return text

    def random_delay(self, min_delay=1, max_delay=3):
        """Add random delays to mimic human behavior"""
        delay = random.uniform(min_delay, max_delay)
        time.sleep(delay)

    def initialize_browser(self):
        """Initialize Playwright with stealth settings"""
        playwright = sync_playwright().start()
        browser = playwright.chromium.launch(
            headless=self.headless,
            args=[
                '--disable-blink-features=AutomationControlled',
                '--disable-extensions',
                '--no-sandbox',
                '--disable-dev-shm-usage',
                '--disable-gpu'
            ]
        )
        context = browser.new_context(
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
            viewport={'width': 1366, 'height': 768}
        )
        page = context.new_page()
        # Hide webdriver property
        page.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined,
            });
        """)
        return playwright, browser, page

    def search_business(self, page, business_name):
        """Search for a business on Google Maps"""
        try:
            page.goto("https://www.google.com/maps", wait_until="networkidle")
            self.random_delay(2, 4)
            # Find search box and enter business name
            search_box = page.locator("input[id='searchboxinput']")
            search_box.fill(business_name)
            search_box.press("Enter")
            # Wait for results to load
            page.wait_for_timeout(5000)
            logger.info(f"Searched for: {business_name}")
            return True
        except Exception as e:
            logger.error(f"Error searching for business: {e}")
            return False

    def navigate_to_reviews(self, page):
        """Navigate to the reviews section"""
        try:
            # Look for reviews tab; .first avoids a strict-mode error
            # when more than one tab matches
            reviews_tab = page.get_by_role("tab", name=re.compile("reviews", re.IGNORECASE)).first
            if reviews_tab.is_visible():
                reviews_tab.click()
                page.wait_for_timeout(3000)
                logger.info("Navigated to reviews section")
                return True
            else:
                logger.warning("Reviews tab not found")
                return False
        except Exception as e:
            logger.error(f"Error navigating to reviews: {e}")
            return False

    def scroll_and_load_reviews(self, page, max_reviews=100):
        """Scroll to load more reviews dynamically"""
        loaded_reviews = 0
        scroll_attempts = 0
        max_scroll_attempts = 20
        review_cards = page.locator('[data-review-id]')
        while loaded_reviews < max_reviews and scroll_attempts < max_scroll_attempts:
            try:
                # Scroll to bottom. Assumption: Google Maps renders reviews in
                # their own scrollable panel, so the last loaded review is also
                # scrolled into view in case scrolling the window has no effect.
                page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
                if review_cards.count() > 0:
                    review_cards.last.scroll_into_view_if_needed()
                self.random_delay(2, 4)
                # Count current reviews
                current_reviews = review_cards.count()
                if current_reviews > loaded_reviews:
                    loaded_reviews = current_reviews
                    logger.info(f"Loaded {loaded_reviews} reviews...")
                    scroll_attempts = 0  # Reset counter
                else:
                    scroll_attempts += 1
                    # Try to click "More reviews" button if available
                    try:
                        # :has-text() is case-insensitive; .first avoids a
                        # strict-mode error when several buttons match
                        more_button = page.locator("button:has-text('More')").first
                        if more_button.is_visible():
                            more_button.click()
                            self.random_delay(2, 3)
                    except Exception:
                        pass
            except Exception as e:
                logger.error(f"Error during scrolling: {e}")
                break
        logger.info(f"Finished loading. Total reviews: {loaded_reviews}")
        return loaded_reviews

    def extract_review_data(self, page):
        """Extract individual review data from loaded page"""
        reviews = []
        try:
            review_elements = page.locator('[data-review-id]').all()
            for element in review_elements:
                try:
                    review_data = {}
                    # Extract reviewer name
                    try:
                        name_element = element.locator('div[class*="name"] span').first
                        review_data['reviewer_name'] = name_element.inner_text() if name_element.is_visible() else "Anonymous"
                    except Exception:
                        review_data['reviewer_name'] = "Anonymous"
                    # Extract rating
                    try:
                        rating_element = element.locator('[role="img"][aria-label*="star"]').first
                        if rating_element.is_visible():
                            rating_text = rating_element.get_attribute('aria-label')
                            rating_match = re.search(r'(\d+)', rating_text)
                            review_data['rating'] = int(rating_match.group(1)) if rating_match else None
                        else:
                            review_data['rating'] = None
                    except Exception:
                        review_data['rating'] = None
                    # Extract review text
                    try:
                        text_elements = element.locator('span[class*="review-text"], div[class*="review-text"]').all()
                        review_text = ""
                        for text_elem in text_elements:
                            if text_elem.is_visible():
                                review_text += text_elem.inner_text() + " "
                        review_data['review_text'] = self.clean_text(review_text.strip())
                    except Exception:
                        review_data['review_text'] = ""
                    # Extract date
                    try:
                        date_element = element.locator('span[class*="date"], div[class*="date"]').first
                        review_data['review_date'] = date_element.inner_text() if date_element.is_visible() else "Unknown"
                    except Exception:
                        review_data['review_date'] = "Unknown"
                    # Extract helpful count
                    try:
                        helpful_element = element.locator('[aria-label*="helpful"], [aria-label*="Helpful"]').first
                        if helpful_element.is_visible():
                            helpful_text = helpful_element.get_attribute('aria-label')
                            helpful_match = re.search(r'(\d+)', helpful_text)
                            review_data['helpful_count'] = int(helpful_match.group(1)) if helpful_match else 0
                        else:
                            review_data['helpful_count'] = 0
                    except Exception:
                        review_data['helpful_count'] = 0
                    # Only add reviews with actual text content
                    if review_data['review_text']:
                        reviews.append(review_data)
                except Exception as e:
                    logger.warning(f"Error extracting individual review: {e}")
                    continue
            logger.info(f"Successfully extracted {len(reviews)} reviews")
            return reviews
        except Exception as e:
            logger.error(f"Error extracting reviews: {e}")
            return []

    def scrape_reviews(self, business_name, max_reviews=100):
        """Main scraping method"""
        playwright, browser, page = self.initialize_browser()
        try:
            # Search for business
            if not self.search_business(page, business_name):
                return []
            # Navigate to reviews tab
            if not self.navigate_to_reviews(page):
                return []
            # Load reviews by scrolling
            self.scroll_and_load_reviews(page, max_reviews)
            # Extract review data
            reviews = self.extract_review_data(page)
            self.reviews_data = reviews
            return reviews
        except Exception as e:
            logger.error(f"Scraping failed: {e}")
            return []
        finally:
            browser.close()
            playwright.stop()

    def save_to_csv(self, filename="google_reviews.csv"):
        """Save reviews to CSV file"""
        if self.reviews_data:
            df = pd.DataFrame(self.reviews_data)
            df['extraction_date'] = datetime.now().isoformat()
            df.to_csv(filename, index=False, encoding='utf-8')
            logger.info(f"Reviews saved to {filename}")
            print(f"✅ Exported {len(self.reviews_data)} reviews to {filename}")
        else:
            logger.warning("No reviews to save")


# Usage example
if __name__ == "__main__":
    scraper = GoogleReviewsScraper(headless=False)  # Set to True for production
    business_name = "Starbucks Times Square New York"
    reviews = scraper.scrape_reviews(business_name, max_reviews=50)
    if reviews:
        scraper.save_to_csv(f"reviews_{business_name.replace(' ', '_')}.csv")
        print(f"✅ Successfully scraped {len(reviews)} reviews!")
    else:
        print("❌ No reviews were scraped.")
How This Scraper Works
1. Stealth Configuration
The scraper masks two common automation indicators, a Chromium launch flag and the navigator.webdriver property:
browser = playwright.chromium.launch(args=["--disable-blink-features=AutomationControlled"])
page.add_init_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
2. Random Delays
Instead of firing requests instantly, the scraper waits 1–3 seconds between actions to mimic human browsing.
3. Dynamic Scrolling
It scrolls to the bottom of the reviews section repeatedly, triggering Google's infinite scroll mechanism to load more reviews.
4. Robust Error Handling
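Each field is extracted inside its own try/except block. If a selector fails (because Google changed the HTML), the scraper falls back to a default such as "Anonymous" or None instead of crashing, and the text and date lookups each try two alternative selectors.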
5. Data Cleaning
Reviews are cleaned to remove emojis, normalize whitespace, and ensure quality data.
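For readers who want to see the cleaning step in isolation, here is a standard-library sketch. It is an approximation, not the scraper's own method: the scraper above uses `emoji.replace_emoji`, while this version drops Unicode symbol characters, which also removes signs such as "°" and "©". The `parse_rating` and `dedupe` helpers are hypothetical additions.

```python
import re
import unicodedata
from typing import Optional


def clean_text(text: Optional[str]) -> str:
    """Drop symbol characters (which covers most emoji) and collapse whitespace."""
    if not text:
        return ""
    # "So" = Symbol, other; "Cf" covers the zero-width joiners used in emoji
    # sequences; U+FE0F is the emoji variation selector.
    kept = [
        ch for ch in text
        if unicodedata.category(ch) not in ("So", "Cf") and ch != "\ufe0f"
    ]
    return re.sub(r"\s+", " ", "".join(kept)).strip()


def parse_rating(aria_label: Optional[str]) -> Optional[int]:
    """Pull the star count out of a label such as '4 stars'."""
    match = re.search(r"\d+", aria_label or "")
    if not match:
        return None
    rating = int(match.group())
    return rating if 1 <= rating <= 5 else None


def dedupe(reviews):
    """Keep the first review for each (name, text) pair."""
    seen, unique = set(), []
    for review in reviews:
        key = (review["reviewer_name"], review["review_text"])
        if key not in seen:
            seen.add(key)
            unique.append(review)
    return unique
```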
Method 2: Selenium — The Battle-Tested Alternative
While Playwright is faster, Selenium remains the gold standard for compatibility and has a massive community. Use Selenium if you need maximum browser support or have existing Selenium infrastructure.
Selenium Installation
pip install selenium webdriver-manager pandas beautifulsoup4
# Download ChromeDriver (automatically managed by webdriver-manager)
Complete Selenium Implementation
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import TimeoutException, NoSuchElementException
import pandas as pd
import time
import random
import re
import logging
from datetime import datetime

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class SeleniumGoogleReviewsScraper:
    def __init__(self, headless=True):
        self.headless = headless
        self.driver = None
        self.wait = None
        self.reviews_data = []

    def setup_driver(self):
        """Configure and initialize Chrome driver"""
        options = Options()
        if self.headless:
            options.add_argument("--headless")
        # Anti-detection measures
        options.add_argument("--disable-blink-features=AutomationControlled")
        options.add_experimental_option("excludeSwitches", ["enable-automation"])
        options.add_experimental_option('useAutomationExtension', False)
        options.add_argument("--disable-extensions")
        options.add_argument("--no-sandbox")
        options.add_argument("--disable-dev-shm-usage")
        options.add_argument("--disable-gpu")
        options.add_argument("--window-size=1366,768")
        options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
        self.driver = webdriver.Chrome(options=options)
        # Hide webdriver property. A plain execute_script() call only affects
        # the current page, so the script is registered through the DevTools
        # protocol to run on every new document.
        self.driver.execute_cdp_cmd(
            "Page.addScriptToEvaluateOnNewDocument",
            {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"},
        )
        self.wait = WebDriverWait(self.driver, 20)
        logger.info("Chrome driver initialized")

    def random_delay(self, min_seconds=1, max_seconds=3):
        """Add random delays between actions"""
        delay = random.uniform(min_seconds, max_seconds)
        time.sleep(delay)

    def search_google_maps(self, business_name):
        """Search for business on Google Maps"""
        try:
            self.
```