Guides & How-tos · 2026-02-12 · 7 min read

How to Scrape Google Reviews Using Python — Complete 2025 Guide

By Ibrahim Demol, CEO, IBLead · Updated March 26, 2026

Google reviews are a goldmine of customer feedback, but manually collecting thousands of them? That's inefficient and impractical.

With Python, you can automate the entire extraction process and gather thousands of reviews in hours. This guide walks you through everything — from basic setup to production-ready scrapers that handle Google's anti-scraping defenses.

By the end, you'll understand how to build reliable review scrapers, avoid detection, and extract actionable customer insights at scale.


What is Google Reviews Scraping?

Google reviews scraping is the automated extraction of customer feedback data from Google Maps and Google Business profiles using Python scripts.

When you scrape Google reviews, you capture:

  • Star ratings (1–5 stars)
  • Review text (full customer comments)
  • Reviewer names and profiles
  • Review dates and timestamps
  • Business responses (if the owner replied)
  • Helpful vote counts

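Unlike Google's official API, which is paid and returns only a handful of reviews per place, scraping can reach every review that is publicly visible on the page. You trade API quotas for the practical limits imposed by Google's anti-bot defenses, covered later in this guide.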

Why Scrape Instead of Using Google's API?

Google's official Places API for reviews has serious limitations:

Feature             | Official API              | Web Scraping
Reviews per request | Up to 5 per place         | All publicly visible
Cost                | $7 per 1,000 requests     | Free to low-cost
Rate limits         | 100 requests/second max   | Configurable
Competitor reviews  | ❌ No                     | ✅ Yes
Setup complexity    | Moderate (authentication) | Higher (technical)

For competitive analysis, market research, or large-scale data collection, scraping is the practical choice.


Why Python is the Best Language for This Task

Python dominates web scraping for four core reasons:

1. Rich Scraping Ecosystem

Python has purpose-built libraries specifically for web scraping:

  • Playwright — Modern, fast browser automation
  • Selenium — Battle-tested, maximum compatibility
  • BeautifulSoup — Lightweight HTML parsing
  • Scrapy — Enterprise-grade scraping framework
  • Pandas — Data manipulation and export

You can go from raw HTML to cleaned CSV in one script.

2. JavaScript Execution

Google reviews load dynamically through JavaScript, not in the initial HTML. Python's browser automation tools (Playwright, Selenium) execute JavaScript just like a real browser, allowing you to:

  • Trigger infinite scroll
  • Click "Show more" buttons
  • Wait for content to appear
  • Interact with page elements

3. Built-In Anti-Detection Features

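Selenium and Playwright both let you pass Chrome flags and options that suppress the most obvious automation signals. With Selenium, for example:

# Hide automation indicators (Selenium)
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_experimental_option("excludeSwitches", ["enable-automation"])

These flags make the browser look less like an automated one. They lower the risk of detection but do not eliminate it.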

4. Seamless Data Processing Pipeline

With Python, you scrape → clean → analyze → export all in one script. No tool-switching required.
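As a rough illustration of the clean → analyze → export steps, here is a minimal sketch that uses only the standard library so it runs anywhere (the full scraper later in this guide uses pandas for the export step). The sample rows are made up, and the helper names `clean`, `summarize`, and `to_csv` are assumptions chosen for this example.

```python
import csv
import io
import re
from collections import Counter
from statistics import mean

# Hypothetical raw rows, standing in for what a scraper would return.
RAW_REVIEWS = [
    {"reviewer_name": "Ana", "rating": 5, "review_text": "  Great   coffee!\n"},
    {"reviewer_name": "Ben", "rating": 2, "review_text": "Slow service"},
    {"reviewer_name": "Cy", "rating": 4, "review_text": ""},  # no text
]


def clean(rows):
    """Collapse whitespace and drop rows without review text."""
    cleaned = []
    for row in rows:
        text = re.sub(r"\s+", " ", row["review_text"]).strip()
        if text:
            cleaned.append({**row, "review_text": text})
    return cleaned


def summarize(rows):
    """Return the average rating and a rating histogram."""
    ratings = [row["rating"] for row in rows]
    return {"average": mean(ratings), "histogram": Counter(ratings)}


def to_csv(rows):
    """Serialize rows to CSV text (write it to a file in real use)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["reviewer_name", "rating", "review_text"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = clean(RAW_REVIEWS)
    print(summarize(rows))
    print(to_csv(rows))
```

Keeping each stage as a small function makes it easy to swap the export step for a database insert or a pandas DataFrame later.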


The Challenge: Why Google Makes Scraping Difficult

Google employs multiple anti-scraping layers. Understanding these helps you build resilient scrapers.

1. Dynamic Content Loading

Google reviews don't load all at once. Instead:

  • Initial page load shows 10–20 reviews
  • Additional reviews load via AJAX as you scroll
  • Each batch requires separate network requests
  • The mechanism changes frequently

Solution: Use browser automation (Playwright/Selenium) that executes JavaScript and simulates scrolling.
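The scroll-and-wait loop can be sketched independently of any browser library. In this minimal sketch, `scroll_once` and `count_items` are hypothetical callables that you would wire to Playwright or Selenium, and the `patience` parameter is an assumption for illustration, not part of either library.

```python
from typing import Callable


def load_until_stable(
    scroll_once: Callable[[], None],
    count_items: Callable[[], int],
    target: int = 100,
    patience: int = 3,
) -> int:
    """Scroll until `target` items are loaded or the count stops growing.

    `patience` is how many consecutive scrolls may add nothing before we
    conclude that the list is exhausted.
    """
    loaded = count_items()
    stalled = 0
    while loaded < target and stalled < patience:
        scroll_once()
        current = count_items()
        if current > loaded:
            loaded, stalled = current, 0   # progress: reset the stall counter
        else:
            stalled += 1                   # no new items after this scroll
    return loaded


if __name__ == "__main__":
    # Stand-in for a real page: each scroll reveals 10 more items, up to 35.
    state = {"visible": 10}

    def fake_scroll() -> None:
        state["visible"] = min(state["visible"] + 10, 35)

    print(load_until_stable(fake_scroll, lambda: state["visible"], target=100))
```

Stopping on a stalled count rather than a fixed number of scrolls avoids both cutting off early on slow connections and looping forever once every review is loaded.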

2. Sophisticated Bot Detection

Google analyzes:

  • Browser fingerprinting — Screen resolution, installed fonts, timezone
  • Behavioral patterns — Mouse movements, scroll speed, click timing
  • Request frequency — Detecting non-human request patterns
  • IP reputation — Flagging suspicious addresses

Solution: Rotate proxies, add random delays, use realistic user agents.
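One way to combine these three measures is to draw a fresh "profile" for each browser session. The sketch below is an assumption-laden illustration: the proxy URLs are placeholders, the second user-agent string is an example, and `session_profiles` is a hypothetical helper name.

```python
import itertools
import random

# Placeholder values: substitute proxies you are authorized to use and
# user-agent strings that match the browser you actually launch.
PROXIES = ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def session_profiles(proxies, user_agents, rng=random):
    """Yield one (proxy, user_agent, delay_seconds) profile per session.

    Proxies are used round-robin; the user agent and the delay are random.
    """
    for proxy in itertools.cycle(proxies):
        yield proxy, rng.choice(user_agents), rng.uniform(1.0, 3.0)


if __name__ == "__main__":
    profiles = session_profiles(PROXIES, USER_AGENTS)
    for _ in range(3):
        proxy, agent, delay = next(profiles)
        print(f"{proxy}  wait {delay:.1f}s  {agent[:40]}")
```

With Playwright, the proxy would typically be passed as `chromium.launch(proxy={"server": proxy})` and the user agent as `browser.new_context(user_agent=agent)`.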

3. Rate Limiting and CAPTCHAs

Hit Google too hard, and you'll face:

  • Temporary IP blocks (24–48 hours)
  • CAPTCHA challenges
  • Complete access denial
  • Progressive throttling

Solution: Implement request throttling (1–3 second delays between actions).
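Fixed delays cover the normal case. When a block or CAPTCHA does appear, a common complement (not something the scraper in this guide implements) is exponential backoff with jitter. The sketch below assumes a hypothetical `Blocked` exception that your own code would raise when it detects a block page.

```python
import random
import time


class Blocked(Exception):
    """Hypothetical signal that the page showed a CAPTCHA or block notice."""


def with_backoff(action, max_attempts=4, base_delay=2.0,
                 sleep=time.sleep, rng=random):
    """Run `action`; on Blocked, wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return action()
        except Blocked:
            if attempt == max_attempts - 1:
                raise                       # out of retries: surface the block
            delay = base_delay * (2 ** attempt) + rng.uniform(0, 1)
            sleep(delay)


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        # Fails twice, then succeeds, to exercise the retry path.
        calls["n"] += 1
        if calls["n"] < 3:
            raise Blocked()
        return "ok"

    waits = []  # record the delays instead of actually sleeping
    print(with_backoff(flaky, sleep=waits.append), [round(w, 1) for w in waits])
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting.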

4. Constantly Evolving HTML Structure

Google updates its page structure regularly, which can break CSS selectors overnight.

Solution: Use multiple fallback selectors and attribute-based queries instead of fragile class names.
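A fallback chain can be written as a small helper that tries selectors in order. This is a minimal sketch: `first_match` is a hypothetical name, `query` stands in for whatever lookup your browser library offers, and the class-based selector in the list is illustrative rather than taken from Google's markup.

```python
from typing import Callable, Iterable, Optional


def first_match(
    query: Callable[[str], Optional[str]],
    selectors: Iterable[str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the first non-empty result of `query(selector)`, else `default`.

    `query` wraps the lookup your browser library provides and should
    return None (or raise) when a selector matches nothing.
    """
    for selector in selectors:
        try:
            value = query(selector)
        except Exception:
            continue            # treat a failed lookup like "no match"
        if value:
            return value
    return default


# Illustrative selectors only; verify them against the live page.
RATING_SELECTORS = [
    '[role="img"][aria-label*="star"]',   # attribute-based, preferred
    'span[class*="rating"]',              # class-based, more fragile
]

if __name__ == "__main__":
    # Fake page: only the attribute-based selector "matches".
    fake_dom = {'[role="img"][aria-label*="star"]': "5 stars"}
    print(first_match(fake_dom.get, RATING_SELECTORS, default="n/a"))
```

Listing the attribute-based selector first means the more stable query wins whenever it matches, and the fragile one is only a last resort.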


Method 1: Playwright — The Modern Approach

Playwright is the recommended choice for 2025. It is generally faster than Selenium, waits for elements automatically, and handles modern JavaScript-heavy sites well.

Setting Up Playwright

First, create a virtual environment and install dependencies:

# Create virtual environment
python -m venv google_scraper
source google_scraper/bin/activate  # On Windows: google_scraper\Scripts\activate

# Install required packages
pip install playwright pandas beautifulsoup4 lxml emoji

# Install browser
playwright install chromium

Complete Playwright Scraper Code

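Here's a complete scraper that handles the main complexities of Google reviews. Google changes its markup often, so verify the selectors against the live page before relying on them: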

from playwright.sync_api import sync_playwright
import pandas as pd
import re
import emoji
import logging
import time
import random
from datetime import datetime

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

class GoogleReviewsScraper:
    def __init__(self, headless=True):
        self.headless = headless
        self.reviews_data = []

    def clean_text(self, text):
        """Remove emojis and normalize text"""
        if not text:
            return ""
        text = emoji.replace_emoji(text, replace='')
        text = re.sub(r'\s+', ' ', text).strip()
        return text

    def random_delay(self, min_delay=1, max_delay=3):
        """Add random delays to mimic human behavior"""
        delay = random.uniform(min_delay, max_delay)
        time.sleep(delay)

    def initialize_browser(self):
        """Initialize Playwright with stealth settings"""
        playwright = sync_playwright().start()
        browser = playwright.chromium.launch(
            headless=self.headless,
            args=[
                '--disable-blink-features=AutomationControlled',
                '--disable-extensions',
                '--no-sandbox',
                '--disable-dev-shm-usage',
                '--disable-gpu'
            ]
        )

        context = browser.new_context(
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
            viewport={'width': 1366, 'height': 768}
        )

        page = context.new_page()

        # Hide webdriver property
        page.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined,
            });
        """)

        return playwright, browser, page

    def search_business(self, page, business_name):
        """Search for a business on Google Maps"""
        try:
            # "networkidle" can time out on Maps, which keeps fetching data in the background
            page.goto("https://www.google.com/maps", wait_until="domcontentloaded")
            self.random_delay(2, 4)

            # Find search box and enter business name
            search_box = page.locator("input[id='searchboxinput']")
            search_box.fill(business_name)
            search_box.press("Enter")

            # Wait for results to load
            page.wait_for_timeout(5000)
            logger.info(f"Searched for: {business_name}")
            return True

        except Exception as e:
            logger.error(f"Error searching for business: {e}")
            return False

    def navigate_to_reviews(self, page):
        """Navigate to the reviews section"""
        try:
            # Look for reviews tab (.first avoids a strict-mode error if several tabs match)
            reviews_tab = page.get_by_role("tab", name=re.compile("reviews", re.IGNORECASE)).first
            if reviews_tab.is_visible():
                reviews_tab.click()
                page.wait_for_timeout(3000)
                logger.info("Navigated to reviews section")
                return True
            else:
                logger.warning("Reviews tab not found")
                return False

        except Exception as e:
            logger.error(f"Error navigating to reviews: {e}")
            return False

    def scroll_and_load_reviews(self, page, max_reviews=100):
        """Scroll to load more reviews dynamically"""
        loaded_reviews = 0
        scroll_attempts = 0
        max_scroll_attempts = 20

        while loaded_reviews < max_reviews and scroll_attempts < max_scroll_attempts:
            try:
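                # Scroll the last loaded review into view. Reviews live in a
                # scrollable side panel, so scrolling the window has no effect.
                review_cards = page.locator('[data-review-id]')
                if review_cards.count() > 0:
                    review_cards.last.scroll_into_view_if_needed()
                self.random_delay(2, 4)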

                # Count current reviews
                current_reviews = page.locator('[data-review-id]').count()

                if current_reviews > loaded_reviews:
                    loaded_reviews = current_reviews
                    logger.info(f"Loaded {loaded_reviews} reviews...")
                    scroll_attempts = 0  # Reset counter
                else:
                    scroll_attempts += 1

                # Try to click "More reviews" button if available
                # (.first avoids a strict-mode error when several buttons match)
                try:
                    more_button = page.locator("button:has-text('More'), button:has-text('more')").first
                    if more_button.is_visible():
                        more_button.click()
                        self.random_delay(2, 3)
                except Exception:
                    pass

            except Exception as e:
                logger.error(f"Error during scrolling: {e}")
                break

        logger.info(f"Finished loading. Total reviews: {loaded_reviews}")
        return loaded_reviews

    def extract_review_data(self, page):
        """Extract individual review data from loaded page"""
        reviews = []

        try:
            review_elements = page.locator('[data-review-id]').all()

            for element in review_elements:
                try:
                    review_data = {}

                    # Extract reviewer name
                    try:
                        name_element = element.locator('div[class*="name"] span').first
                        review_data['reviewer_name'] = name_element.inner_text() if name_element.is_visible() else "Anonymous"
                    except Exception:
                        review_data['reviewer_name'] = "Anonymous"

                    # Extract rating
                    try:
                        rating_element = element.locator('[role="img"][aria-label*="star"]').first
                        if rating_element.is_visible():
                            rating_text = rating_element.get_attribute('aria-label')
                            rating_match = re.search(r'(\d+)', rating_text)
                            review_data['rating'] = int(rating_match.group(1)) if rating_match else None
                        else:
                            review_data['rating'] = None
                    except Exception:
                        review_data['rating'] = None

                    # Extract review text
                    try:
                        text_elements = element.locator('span[class*="review-text"], div[class*="review-text"]').all()
                        review_text = ""
                        for text_elem in text_elements:
                            if text_elem.is_visible():
                                review_text += text_elem.inner_text() + " "
                        review_data['review_text'] = self.clean_text(review_text.strip())
                    except Exception:
                        review_data['review_text'] = ""

                    # Extract date
                    try:
                        date_element = element.locator('span[class*="date"], div[class*="date"]').first
                        review_data['review_date'] = date_element.inner_text() if date_element.is_visible() else "Unknown"
                    except Exception:
                        review_data['review_date'] = "Unknown"

                    # Extract helpful count
                    try:
                        helpful_element = element.locator('[aria-label*="helpful"], [aria-label*="Helpful"]').first
                        if helpful_element.is_visible():
                            helpful_text = helpful_element.get_attribute('aria-label')
                            helpful_match = re.search(r'(\d+)', helpful_text)
                            review_data['helpful_count'] = int(helpful_match.group(1)) if helpful_match else 0
                        else:
                            review_data['helpful_count'] = 0
                    except Exception:
                        review_data['helpful_count'] = 0

                    # Only add reviews with actual text content
                    if review_data['review_text']:
                        reviews.append(review_data)

                except Exception as e:
                    logger.warning(f"Error extracting individual review: {e}")
                    continue

            logger.info(f"Successfully extracted {len(reviews)} reviews")
            return reviews

        except Exception as e:
            logger.error(f"Error extracting reviews: {e}")
            return []

    def scrape_reviews(self, business_name, max_reviews=100):
        """Main scraping method"""
        playwright, browser, page = self.initialize_browser()

        try:
            # Search for business
            if not self.search_business(page, business_name):
                return []

            # Navigate to reviews tab
            if not self.navigate_to_reviews(page):
                return []

            # Load reviews by scrolling
            self.scroll_and_load_reviews(page, max_reviews)

            # Extract review data
            reviews = self.extract_review_data(page)
            self.reviews_data = reviews
            return reviews

        except Exception as e:
            logger.error(f"Scraping failed: {e}")
            return []

        finally:
            browser.close()
            playwright.stop()

    def save_to_csv(self, filename="google_reviews.csv"):
        """Save reviews to CSV file"""
        if self.reviews_data:
            df = pd.DataFrame(self.reviews_data)
            df['extraction_date'] = datetime.now().isoformat()
            df.to_csv(filename, index=False, encoding='utf-8')
            logger.info(f"Reviews saved to {filename}")
            print(f"✅ Exported {len(self.reviews_data)} reviews to {filename}")
        else:
            logger.warning("No reviews to save")

# Usage example
if __name__ == "__main__":
    scraper = GoogleReviewsScraper(headless=False)  # Set to True for production

    business_name = "Starbucks Times Square New York"
    reviews = scraper.scrape_reviews(business_name, max_reviews=50)

    if reviews:
        scraper.save_to_csv(f"reviews_{business_name.replace(' ', '_')}.csv")
        print(f"✅ Successfully scraped {len(reviews)} reviews!")
    else:
        print("❌ No reviews were scraped.")

How This Scraper Works

1. Stealth Configuration

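The scraper hides automation indicators that Google uses to detect bots. It passes a Chromium flag at launch and overrides navigator.webdriver before any page script runs:

browser = playwright.chromium.launch(args=["--disable-blink-features=AutomationControlled"])
page.add_init_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")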

2. Random Delays

Instead of firing requests instantly, the scraper waits 1–3 seconds between actions to mimic human browsing.

3. Dynamic Scrolling

It scrolls to the bottom of the reviews section repeatedly, triggering Google's infinite scroll mechanism to load more reviews.

4. Robust Error Handling

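Each field is extracted inside its own try/except block. If a selector fails (because Google changed the HTML), the scraper records a default value such as "Anonymous" or "Unknown" for that field instead of crashing.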

5. Data Cleaning

Reviews are cleaned to remove emojis, normalize whitespace, and ensure quality data.
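One further cleaning step you may want, which the scraper above does not perform, is converting relative date labels into approximate calendar dates. The sketch below assumes the `review_date` field holds English labels such as "2 weeks ago"; the helper name `approximate_date` and the rough day counts for months and years are assumptions for illustration.

```python
import re
from datetime import date, timedelta

# Approximate day counts; months and years are deliberately rough.
_UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}
_PATTERN = re.compile(
    r"^(a|an|\d+)\s+(day|week|month|year)s?\s+ago", re.IGNORECASE
)


def approximate_date(label, today=None):
    """Convert a label such as '2 weeks ago' to an approximate date.

    Returns None when the label does not match the expected pattern.
    """
    today = today or date.today()
    match = _PATTERN.match(label.strip())
    if not match:
        return None
    quantity, unit = match.groups()
    count = 1 if quantity.lower() in ("a", "an") else int(quantity)
    return today - timedelta(days=count * _UNIT_DAYS[unit.lower()])


if __name__ == "__main__":
    reference = date(2026, 3, 26)
    for label in ["2 weeks ago", "a month ago", "Unknown"]:
        print(label, "->", approximate_date(label, today=reference))
```

Labels in other languages, or finer units such as hours, fall through to None, so store the original label alongside the converted date.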


Method 2: Selenium — The Battle-Tested Alternative

While Playwright is faster, Selenium remains the gold standard for compatibility and has a massive community. Use Selenium if you need maximum browser support or have existing Selenium infrastructure.

Selenium Installation

pip install selenium webdriver-manager pandas beautifulsoup4

# ChromeDriver is downloaded automatically: Selenium 4.6+ resolves it through
# Selenium Manager, and webdriver-manager covers older Selenium versions

Complete Selenium Implementation

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import TimeoutException, NoSuchElementException
import pandas as pd
import time
import random
import re
import logging
from datetime import datetime

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class SeleniumGoogleReviewsScraper:
    def __init__(self, headless=True):
        self.headless = headless
        self.driver = None
        self.wait = None
        self.reviews_data = []

    def setup_driver(self):
        """Configure and initialize Chrome driver"""
        options = Options()

        if self.headless:
            options.add_argument("--headless")

        # Anti-detection measures
        options.add_argument("--disable-blink-features=AutomationControlled")
        options.add_experimental_option("excludeSwitches", ["enable-automation"])
        options.add_experimental_option('useAutomationExtension', False)
        options.add_argument("--disable-extensions")
        options.add_argument("--no-sandbox")
        options.add_argument("--disable-dev-shm-usage")
        options.add_argument("--disable-gpu")
        options.add_argument("--window-size=1366,768")
        options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")

        self.driver = webdriver.Chrome(options=options)

        # Hide webdriver property on every new document. A plain execute_script
        # call would only affect the page that is loaded at that moment.
        self.driver.execute_cdp_cmd(
            "Page.addScriptToEvaluateOnNewDocument",
            {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"},
        )

        self.wait = WebDriverWait(self.driver, 20)
        logger.info("Chrome driver initialized")

    def random_delay(self, min_seconds=1, max_seconds=3):
        """Add random delays between actions"""
        delay = random.uniform(min_seconds, max_seconds)
        time.sleep(delay)

    def search_google_maps(self, business_name):
        """Search for business on Google Maps"""
        try:
            self.
```
