Scrape Google Maps Reviews with Python: 2025 Guide
If you want to scrape Google Maps reviews with Python, you're dealing with one of the trickier scraping targets on the web. Google loads reviews dynamically, rotates its HTML structure, and actively detects bots. This guide covers two working methods — Playwright and Selenium — with complete code, anti-detection techniques, and honest notes on what breaks and why.
No fluff. Just code that works.
What Is Google Reviews Scraping?
Google reviews scraping is the automated extraction of customer review data from Google Maps and Google Business listings. Instead of copying reviews manually, a script visits business pages and pulls the data for you.
Each review contains useful fields:
- Star rating (1–5)
- Review text
- Reviewer name
- Date posted
- Business response (if any)
That data has real value. Reputation monitoring, competitive analysis, sentiment tracking, lead qualification — all of it starts with raw review data.
Why Not Use the Official Google API?
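The Google Places API returns reviews, but with strict limits. You get at most 5 reviews per place, with no way to page back through older ones. The Business Profile API returns every review, but only for listings you manage, so competitor reviews are out of reach. Pricing also scales fast once you exceed the free tier.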
Web scraping gives you access to all public reviews, with no artificial cap. The tradeoff: you have to handle Google's anti-bot systems yourself.
Why Python for This Task?
Python has the best ecosystem for browser automation and data extraction. Three libraries do most of the heavy lifting:
- Playwright: modern, fast, with sync and async APIs and automatic waiting for elements
- Selenium: battle-tested, massive community, maximum compatibility
- BeautifulSoup: lightweight HTML parsing once you have the raw content
Google reviews load via JavaScript. Static scrapers (requests + BeautifulSoup alone) won't work here. You need a real browser that executes JS, scrolls the page, and clicks buttons — exactly what Playwright and Selenium do.
The Core Challenge: Why Google Fights Back
Before writing a single line of code, understand what you're up against.
Dynamic Content Loading
Google doesn't serve all reviews in the initial HTML. The first page load shows 10–20 reviews. More load as you scroll. Each batch triggers separate network requests. Your scraper must simulate scrolling to trigger those loads.
Bot Detection Layers
Google runs several detection systems simultaneously:
- Browser fingerprinting — screen resolution, fonts, timezone, language
- Behavioral analysis — mouse movement patterns, scroll speed, click timing
- Request pattern recognition — non-human request frequency
- IP reputation — flagging IPs that send too many requests
Hit any of these triggers and you'll see CAPTCHAs, empty results, or a full block.
Constantly Changing HTML Structure
Google updates its frontend regularly. A CSS selector that works today may return zero results next week. Robust scrapers use multiple fallback selectors for every field.
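One way to structure those fallbacks is a small helper that tries each selector in order and returns the first non-empty result. The sketch below is illustrative only: `extract_with_fallbacks`, the `find_text` callable, and the selectors are hypothetical placeholders, not Google's actual markup. A plain dictionary stands in for the page so the example runs without a browser.

```python
from typing import Callable, Iterable, Optional


def extract_with_fallbacks(
    find_text: Callable[[str], Optional[str]],
    selectors: Iterable[str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the text of the first selector that yields non-empty text."""
    for selector in selectors:
        try:
            text = find_text(selector)
        except Exception:
            # A selector that raises (timeout, stale element) counts as a miss.
            continue
        if text and text.strip():
            return text.strip()
    return default


# Stand-in for a real page lookup (assumption: selector -> text or None).
fake_dom = {"span.reviewer": "  Jane D.  "}
NAME_SELECTORS = ["div.name span", "span.reviewer", "[data-reviewer]"]

name = extract_with_fallbacks(fake_dom.get, NAME_SELECTORS, default="Anonymous")
print(name)  # Jane D.
```

In a real scraper, `find_text` would wrap whatever lookup your browser library provides for a single review element. Keeping the selector lists in one place means a frontend change usually costs one edit to a list rather than a hunt through the extraction code.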
Method 1: Playwright (Recommended for 2025)
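Playwright is the better starting point for new projects. It is generally faster than Selenium, ships both sync and async APIs, waits for elements automatically before acting on them, and downloads its own browser build with a single command. Stealth is not built in, though: the anti-detection settings in the code below are still added by hand.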
Setup
    python -m venv google_scraper_env
    source google_scraper_env/bin/activate   # Windows: google_scraper_env\Scripts\activate
    pip install playwright pandas emoji beautifulsoup4 lxml
    playwright install chromium
Complete Playwright Scraper
    from playwright.sync_api import sync_playwright
    import pandas as pd
    import re
    import emoji
    import logging
    import time
    import random

    logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
    logger = logging.getLogger(__name__)


    class GoogleReviewsScraper:
        def __init__(self, headless=True):
            self.headless = headless
            self.reviews_data = []

        def clean_text(self, text):
            # Strip emoji and collapse runs of whitespace
            text = emoji.replace_emoji(text, replace='')
            text = re.sub(r'\s+', ' ', text).strip()
            return text

        def random_delay(self, min_delay=1, max_delay=3):
            time.sleep(random.uniform(min_delay, max_delay))

        def initialize_browser(self):
            playwright = sync_playwright().start()
            browser = playwright.chromium.launch(
                headless=self.headless,
                args=[
                    '--disable-blink-features=AutomationControlled',
                    '--disable-extensions',
                    '--no-sandbox',
                    '--disable-setuid-sandbox',
                    '--disable-dev-shm-usage',
                    '--disable-gpu'
                ]
            )
            context = browser.new_context(
                user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
                viewport={'width': 1366, 'height': 768}
            )
            page = context.new_page()
            # Runs before every page script, so navigator.webdriver stays hidden after navigation
            page.add_init_script("""
                Object.defineProperty(navigator, 'webdriver', {
                    get: () => undefined,
                });
            """)
            return playwright, browser, page

        def search_business(self, page, business_name):
            try:
                # Maps keeps background requests open, so waiting for "networkidle" can time out
                page.goto("https://www.google.com/maps", wait_until="domcontentloaded")
                self.random_delay(2, 4)
                search_box = page.locator("input[id='searchboxinput']")
                search_box.fill(business_name)
                search_box.press("Enter")
                page.wait_for_timeout(5000)
                logger.info(f"Searched for: {business_name}")
                return True
            except Exception as e:
                logger.error(f"Error searching: {e}")
                return False

        def navigate_to_reviews(self, page):
            try:
                # .first avoids a strict-mode error if more than one tab matches
                reviews_tab = page.get_by_role("tab", name=re.compile("reviews", re.IGNORECASE)).first
                if reviews_tab.is_visible():
                    reviews_tab.click()
                    page.wait_for_timeout(3000)
                    logger.info("Navigated to reviews section")
                    return True
                logger.warning("Reviews tab not found")
                return False
            except Exception as e:
                logger.error(f"Error navigating to reviews: {e}")
                return False

        def scroll_and_load_reviews(self, page, max_reviews=100):
            loaded_reviews = 0
            scroll_attempts = 0
            max_scroll_attempts = 20
            review_cards = page.locator('[data-review-id]')
            while loaded_reviews < max_reviews and scroll_attempts < max_scroll_attempts:
                try:
                    # Reviews sit in their own scrollable panel, so scrolling the window
                    # has no effect. Bringing the last loaded review into view scrolls
                    # that panel and triggers the next batch.
                    if review_cards.count():
                        review_cards.last.scroll_into_view_if_needed()
                    self.random_delay(2, 4)
                    current_reviews = review_cards.count()
                    if current_reviews > loaded_reviews:
                        loaded_reviews = current_reviews
                        logger.info(f"Loaded {loaded_reviews} reviews...")
                        scroll_attempts = 0
                    else:
                        scroll_attempts += 1
                        try:
                            more_button = page.locator("button", has_text=re.compile("more", re.IGNORECASE)).first
                            if more_button.is_visible():
                                more_button.click()
                                self.random_delay(2, 3)
                        except Exception:
                            pass
                except Exception as e:
                    logger.error(f"Error during scrolling: {e}")
                    break
            logger.info(f"Total reviews found: {loaded_reviews}")
            return loaded_reviews

        def extract_review_data(self, page):
            # The class-name selectors below are best-effort guesses. Google's generated
            # class names change, so check them in DevTools before relying on them.
            reviews = []
            try:
                review_elements = page.locator('[data-review-id]').all()
                for element in review_elements:
                    try:
                        review_data = {'rating': None}

                        name_element = element.locator('div[class*="name"] span, div[class*="Name"] span').first
                        review_data['reviewer_name'] = name_element.inner_text() if name_element.is_visible() else "Anonymous"

                        rating_element = element.locator('[role="img"][aria-label*="star"]').first
                        if rating_element.is_visible():
                            rating_text = rating_element.get_attribute('aria-label') or ""
                            rating_match = re.search(r'(\d+)', rating_text)
                            review_data['rating'] = int(rating_match.group(1)) if rating_match else None

                        text_elements = element.locator('span[class*="review-text"], div[class*="review-text"]').all()
                        review_text = ""
                        for text_elem in text_elements:
                            if text_elem.is_visible():
                                review_text += text_elem.inner_text() + " "
                        review_data['review_text'] = self.clean_text(review_text)

                        date_element = element.locator('span[class*="date"], div[class*="date"]').first
                        review_data['review_date'] = date_element.inner_text() if date_element.is_visible() else "Unknown"

                        # Rating-only reviews (no text) are skipped
                        if review_data['review_text']:
                            reviews.append(review_data)
                    except Exception as e:
                        logger.warning(f"Error on individual review: {e}")
                        continue
                logger.info(f"Extracted {len(reviews)} reviews")
                return reviews
            except Exception as e:
                logger.error(f"Extraction error: {e}")
                return []

        def scrape_reviews(self, business_name, max_reviews=100):
            playwright, browser, page = self.initialize_browser()
            try:
                if not self.search_business(page, business_name):
                    return []
                if not self.navigate_to_reviews(page):
                    return []
                self.scroll_and_load_reviews(page, max_reviews)
                reviews = self.extract_review_data(page)
                self.reviews_data = reviews
                return reviews
            except Exception as e:
                logger.error(f"Scraping failed: {e}")
                return []
            finally:
                browser.close()
                playwright.stop()

        def save_to_csv(self, filename="google_reviews.csv"):
            if self.reviews_data:
                df = pd.DataFrame(self.reviews_data)
                df.to_csv(filename, index=False, encoding='utf-8')
                logger.info(f"Saved to {filename}")
            else:
                logger.warning("No reviews to save")


    if __name__ == "__main__":
        scraper = GoogleReviewsScraper(headless=False)
        business_name = "Starbucks Times Square New York"
        reviews = scraper.scrape_reviews(business_name, max_reviews=50)
        if reviews:
            scraper.save_to_csv(f"reviews_{business_name.replace(' ', '_')}.csv")
            print(f"Scraped {len(reviews)} reviews.")
        else:
            print("No reviews scraped.")
What This Code Does
- Stealth flags reduce the most obvious automation signals (they do not make the browser undetectable)
- Random delays of 1–4 seconds mimic human browsing rhythm
- Scroll loop keeps loading until it hits max_reviews or runs out of content
- Multiple fallback selectors handle Google's frequent HTML changes
- CSV export gives you a clean file ready for analysis or import into any tool
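One post-processing step is often needed before analysis: Google Maps usually shows review dates in relative form ("2 weeks ago", "a month ago"), so the review_date column holds labels rather than calendar dates. The sketch below converts such labels to approximate dates. It assumes an English-language page, and `approximate_review_date` is a hypothetical helper name, not part of the scraper above.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Rough day counts: months and years are rounded, so results are estimates.
_UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}
_PATTERN = re.compile(r"(a|an|\d+)\s+(day|week|month|year)s?\s+ago", re.IGNORECASE)


def approximate_review_date(label: str, today: Optional[date] = None) -> Optional[date]:
    """Turn a label like '2 weeks ago' into an approximate date, or None if unparseable."""
    today = today or date.today()
    match = _PATTERN.search(label)
    if not match:
        return None
    quantity, unit = match.groups()
    count = 1 if quantity.lower() in ("a", "an") else int(quantity)
    return today - timedelta(days=count * _UNIT_DAYS[unit.lower()])


reference = date(2025, 6, 15)
print(approximate_review_date("2 weeks ago", reference))  # 2025-06-01
print(approximate_review_date("a month ago", reference))  # 2025-05-16
print(approximate_review_date("Unknown", reference))      # None
```

Record the scrape date alongside each row: a relative label is only meaningful next to the day it was captured, and the precision drops quickly for older reviews ("3 years ago" could be off by many months).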
Method 2: Selenium (Reliable Alternative)
Selenium has been the standard for browser automation for over a decade. It's slower than Playwright but has a larger community and more documentation.
When to Pick Selenium
- You're working with legacy infrastructure that already uses it
- You need maximum compatibility across browser versions
- Your team has existing Selenium expertise
Setup
    pip install selenium pandas

Selenium 4.6 and later ship with Selenium Manager, which downloads a ChromeDriver matching your Chrome version automatically. On older Selenium versions you need to install a matching ChromeDriver yourself.
Complete Selenium Scraper
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.common.exceptions import TimeoutException, NoSuchElementException
    import pandas as pd
    import time
    import random
    import re
    import logging

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)


    class SeleniumGoogleReviewsScraper:
        def __init__(self, headless=True):
            self.headless = headless
            self.driver = None
            self.wait = None
            self.reviews_data = []

        def setup_driver(self):
            options = Options()
            if self.headless:
                options.add_argument("--headless=new")
            options.add_argument("--disable-blink-features=AutomationControlled")
            options.add_experimental_option("excludeSwitches", ["enable-automation"])
            options.add_experimental_option('useAutomationExtension', False)
            options.add_argument("--no-sandbox")
            options.add_argument("--disable-dev-shm-usage")
            options.add_argument("--window-size=1366,768")
            options.add_argument("--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
            self.driver = webdriver.Chrome(options=options)
            # execute_script would only patch the current blank page. Registering the
            # script through CDP re-applies it on every new document.
            self.driver.execute_cdp_cmd(
                "Page.addScriptToEvaluateOnNewDocument",
                {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"},
            )
            self.wait = WebDriverWait(self.driver, 20)
            logger.info("Driver initialized")

        def random_delay(self, min_s=1, max_s=3):
            time.sleep(random.uniform(min_s, max_s))

        def search_google_maps(self, business_name):
            try:
                self.driver.get("https://www.google.com/maps")
                self.random_delay(2, 4)
                search_box = self.wait.until(EC.presence_of_element_located((By.ID, "searchboxinput")))
                search_box.clear()
                # Type one character at a time with uneven pauses
                for char in business_name:
                    search_box.send_keys(char)
                    time.sleep(random.uniform(0.05, 0.15))
                # Press Enter instead of submit(), which depends on the input being inside a form
                search_box.send_keys(Keys.ENTER)
                self.wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "[data-value='Reviews']")))
                logger.info(f"Searched for: {business_name}")
                return True
            except TimeoutException:
                logger.error("Timeout on search")
                return False

        def click_reviews_tab(self):
            try:
                reviews_tab = self.wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "[data-value='Reviews']")))
                self.driver.execute_script("arguments[0].scrollIntoView(true);", reviews_tab)
                self.random_delay(1, 2)
                reviews_tab.click()
                self.wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "[data-review-id]")))
                logger.info("Reviews tab clicked")
                return True
            except TimeoutException:
                logger.error("Reviews tab not found")
                return False

        def scroll_to_load_reviews(self, target_reviews=100):
            reviews_loaded = 0
            scroll_attempts = 0
            while reviews_loaded < target_reviews and scroll_attempts < 30:
                # Reviews sit in their own scrollable panel, so scrolling the window has
                # no effect. Scrolling the last loaded review into view moves that panel.
                cards = self.driver.find_elements(By.CSS_SELECTOR, "[data-review-id]")
                if cards:
                    self.driver.execute_script("arguments[0].scrollIntoView(true);", cards[-1])
                self.random_delay(2, 4)
                try:
                    show_more = self.driver.find_element(By.XPATH, "//button[contains(text(), 'more') or contains(text(), 'More')]")
                    if show_more.is_displayed():
                        ActionChains(self.driver).move_to_element(show_more).click().perform()
                        self.random_delay(2, 3)
                except NoSuchElementException:
                    pass
                current_count = len(self.driver.find_elements(By.CSS_SELECTOR, "[data-review-id]"))
                if current_count > reviews_loaded:
                    reviews_loaded = current_count
                    logger.info(f"Loaded {reviews_loaded} reviews...")
                    scroll_attempts = 0
                else:
                    scroll_attempts += 1
            return reviews_loaded

        def extract_reviews(self):
            # The class-name selectors below are best-effort guesses. Google's generated
            # class names change, so check them in DevTools before relying on them.
            reviews = []
            review_elements = self.driver.find_elements(By.CSS_SELECTOR, "[data-review-id]")
            for element in review_elements:
                try:
                    review_data = {}
                    try:
                        review_data['reviewer_name'] = element.find_element(By.CSS_SELECTOR, "div[class*='name'] span").text.strip()
                    except NoSuchElementException:
                        review_data['reviewer_name'] = "Anonymous"
                    try:
                        aria_label = element.find_element(By.CSS_SELECTOR, "[role='img'][aria-label*='star']").get_attribute('aria-label') or ""
                        match = re.search(r'(\d+)', aria_label)
                        review_data['rating'] = int(match.group(1)) if match else None
                    except NoSuchElementException:
                        review_data['rating'] = None
                    # find_elements returns an empty list instead of raising
                    text_elems = element.find_elements(By.CSS_SELECTOR, "span[class*='review-text']")
                    review_data['review_text'] = " ".join(e.text for e in text_elems if e.text).strip()
                    try:
                        review_data['review_date'] = element.find_element(By.CSS_SELECTOR, "span[class*='date']").text.strip()
                    except NoSuchElementException:
                        review_data['review_date'] = "Unknown"
                    # Rating-only reviews (no text) are skipped
                    if review_data['review_text']:
                        reviews.append(review_data)
                except Exception as e:
                    logger.warning(f"Review extraction error: {e}")
                    continue
            logger.info(f"Extracted {len(reviews)} reviews")
            return reviews

        def scrape_business_reviews(self, business_name, max_reviews=100):
            try:
                self.setup_driver()
                if not self.search_google_maps(business_name):
                    return []
                if not self.click_reviews_tab():
                    return []
                self.scroll_to_load_reviews(max_reviews)
                reviews = self.extract_reviews()
                self.reviews_data = reviews
                return reviews
            except Exception as e:
                logger.error(f"Scraping failed: {e}")
                return []
            finally:
                if self.driver:
                    self.driver.quit()

        def save_to_csv(self, filename="selenium_reviews.csv"):
            if self.reviews_data:
                pd.DataFrame(self.reviews_data).to_csv(filename, index=False, encoding='utf-8')
                logger.info(f"Saved to {filename}")
            else:
                logger.warning("No reviews to save")


    if __name__ == "__main__":
        scraper = SeleniumGoogleReviewsScraper(headless=False)
        reviews = scraper.scrape_business_reviews("McDonald's Times Square", max_reviews=75)
        if reviews:
            scraper.save_to_csv("mcdonalds_times_square_reviews.csv")
            print(f"Scraped {len(reviews)} reviews.")
        else:
            print("No reviews scraped.")
Anti-Detection Techniques That Actually Work
Both scrapers above include basic stealth. Here's what to add when you need to go further.
Proxy Rotation
Single-IP scraping gets blocked fast. Rotate proxies to distribute requests:
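    import random

    # Playwright takes proxy credentials as separate fields, not embedded in the URL
    PROXY_LIST = [
        {"server": "http://proxy1:port", "username": "user", "password": "pass"},
        {"server": "http://proxy2:port", "username": "user", "password": "pass"},
        {"server": "http://proxy3:port", "username": "user", "password": "pass"},
    ]

    def get_random_proxy():
        return random.choice(PROXY_LIST)

    # In Playwright:
    context = browser.new_context(proxy=get_random_proxy())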
Residential proxies work better than datacenter proxies for Google specifically. Datacenter IPs get flagged faster.
User Agent Rotation
    import random

    # Keep to Chrome user agents: both scrapers drive Chromium, and a Firefox
    # user agent on a Chromium browser is an easy mismatch to detect.
    USER_AGENTS = [
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    ]

    def get_random_ua():
        return random.choice(USER_AGENTS)
Human-Like Typing
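    import random
    import time

    def human_type(element, text):
        # element is a Selenium WebElement; pause a random interval after each key
        for char in text:
            element.send_keys(char)
            time.sleep(random.uniform(0.05, 0.2))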
Typing at uniform speed is a bot signal. Variable delays per character look human.
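The Playwright scraper in this guide fills the search box in a single call, so the same idea needs a separate helper there. A minimal sketch, assuming a sync Playwright `page`; `human_type_playwright` is a hypothetical name, and the injectable `sleep` parameter exists only to make the function easy to test.

```python
import random
import time


def human_type_playwright(page, selector, text, sleep=time.sleep):
    """Type `text` into `selector` one key at a time with uneven pauses."""
    page.click(selector)  # focus the input before typing
    for char in text:
        page.keyboard.type(char)
        sleep(random.uniform(0.05, 0.2))


# Usage inside search_business (assumption: replaces the search_box.fill call):
# human_type_playwright(page, "input[id='searchboxinput']", business_name)
```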
Session Warm-Up
Don't go straight to Google Maps. Visit Google Search first, wait a few seconds, then navigate to Maps. Cold sessions that jump directly to scraping targets get flagged more often.
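A warm-up along those lines might look like the sketch below. It assumes a sync Playwright `page`; the function name `warm_up_session` and the pause lengths are illustrative choices, not measured thresholds.

```python
import random
import time


def warm_up_session(page, sleep=time.sleep):
    """Visit Google Search briefly, then open Maps, pausing after each step."""
    page.goto("https://www.google.com/")
    sleep(random.uniform(3, 6))
    page.goto("https://www.google.com/maps")
    sleep(random.uniform(2, 4))


# Usage (assumption: called right after the page is created, before searching):
# warm_up_session(page)
```

Depending on region, the first visit may land on a consent screen, which the scraper has to dismiss before continuing.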
Handling Dynamic Content
Google reviews use infinite scroll: no page numbers, no "next" button. Your scraper needs to keep scrolling until either:
- It hits your max_reviews target, or
- No new reviews load after several scroll attempts
Both scrapers above handle this with a scroll_attempts counter that resets whenever new reviews appear and stops the loop once it reaches its limit (20 in the Playwright version, 30 in the Selenium version). That's the right approach: don't loop forever.
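The stopping rule can be sketched independently of any browser, which also makes it easy to test. In the sketch below, `load_until_stalled`, `count_items`, and `scroll_once` are hypothetical names, and `FakeFeed` simulates a page that loads reviews in batches.

```python
from typing import Callable


def load_until_stalled(
    count_items: Callable[[], int],
    scroll_once: Callable[[], None],
    target: int,
    max_stalls: int = 5,
) -> int:
    """Scroll until `target` items are loaded or `max_stalls` scrolls in a row add nothing."""
    loaded = count_items()
    stalls = 0
    while loaded < target and stalls < max_stalls:
        scroll_once()
        current = count_items()
        if current > loaded:
            loaded = current
            stalls = 0  # progress resets the counter
        else:
            stalls += 1
    return loaded


class FakeFeed:
    """Simulated review feed: each scroll loads `batch` more items up to `total`."""

    def __init__(self, total, batch):
        self.total, self.batch, self.loaded = total, batch, 0

    def scroll(self):
        self.loaded = min(self.total, self.loaded + self.batch)

    def count(self):
        return self.loaded


feed = FakeFeed(total=35, batch=10)
print(load_until_stalled(feed.count, feed.scroll, target=100))  # 35
```

Resetting the counter on progress is the important detail: a slow network can produce one or two empty scrolls in the middle of a long list, and only a run of consecutive empty scrolls should end the loop.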
Expanding Truncated Reviews
Long reviews get cut off with a "More" link. To get full text:
    def expand_truncated_reviews(page, limit=100):
        expand_buttons = page.locator("button:has-text('More'), span:has-text('...')")
        count = expand_buttons.count()
        expanded = 0
        # Walk backwards: a clicked button may disappear, which would shift later indexes
        for i in range(count - 1, -1, -1):
            if expanded >= limit:
                break
            btn = expand_buttons.nth(i)
            if btn.is_visible():
                btn.click()
                page.wait_for_timeout(300)
                expanded += 1
        logger.info(f"Expanded {expanded} truncated reviews")
Run this after loading all reviews, before extraction.
Legal and Ethical Considerations
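Scraping publicly visible data is often lawful, but the rules vary by jurisdiction. In hiQ Labs v. LinkedIn (2022), the US Ninth Circuit held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act. That ruling does not settle contract claims: later in the same litigation, hiQ was found to have breached LinkedIn's user agreement.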
That said, a few rules apply:
- Don't overload servers. Keep requests under 10 per minute for casual use.
- Respect robots.txt. Google's robots.txt restricts some paths — check it.
- Don't republish scraped content verbatim. Aggregate and analyze, don't copy-paste.
- Avoid personal data. Reviewer names are public, but don't build profiles on individuals.
- Commercial use needs legal review. If you're selling scraped data, consult a lawyer.
The safest approach: scrape for internal analysis, not redistribution.
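A request ceiling like the one above is easier to respect when it is enforced in code rather than by scattered sleep calls. The sketch below is one simple way to do it. `MinIntervalLimiter` is a hypothetical helper, and the injectable `clock` and `sleep` parameters exist only so the behavior can be tested without waiting.

```python
import time


class MinIntervalLimiter:
    """Block so that successive wait() calls are at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute=10, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted call

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage (assumption: one wait() before each page load or business lookup):
# limiter = MinIntervalLimiter(per_minute=10)
# limiter.wait()
# page.goto(url)
```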
When Python Scraping Isn't the Right Tool
Writing and maintaining a Google Maps scraper takes real effort. Google changes its HTML structure regularly. Selectors break. Anti-bot measures evolve. You'll spend time debugging, not analyzing.
If you need Google Maps review data at scale — across hundreds or thousands of businesses — a pre-indexed database is faster and more reliable than a DIY scraper.
IBLead indexes 50M+ businesses across 37 countries, with up to 500 Google reviews per listing: full text, star rating, date, and reviewer name. The data is updated weekly and exports instantly to CSV. No scraping infrastructure to maintain, no proxies to manage, no selectors to fix when Google updates its frontend.
For one-off research on a handful of businesses, the Python approach in this guide works fine. For ongoing lead generation or reputation monitoring at scale, $52 for 10,000 leads is hard to beat.
FAQ
How many reviews can I scrape per day without getting blocked?
Start conservative: 100–500 reviews per day, across 5–10 businesses, with 2–3 second delays between actions. With proxy rotation and proper session management, you can push to 1,000–2,000 reviews per day. Aggressive scraping (5,000+ reviews/day) requires residential proxy networks and multiple browser sessions running in parallel.
Is Playwright or Selenium better for scraping Google Maps reviews with Python?
Playwright is the better choice for new projects in 2025. It is generally faster, ships both sync and async APIs, and waits for elements automatically, which removes much of the manual wait code. Neither tool hides automation out of the box, so the stealth settings are needed either way. Selenium is still valid if you have existing infrastructure or need maximum community support. Both methods work, and the code in this guide demonstrates both.
Why are my selectors returning empty results?
Google updates its frontend regularly. A selector that worked last month may return nothing today. The fix: use multiple fallback selectors for each field, and test with headless=False so you can see what the page actually looks like. A small helper that tries each selector in order and returns the first match handles this systematically.
Can I scrape Google reviews for competitor analysis?
Yes. Public review data is publicly accessible. Analyzing competitor sentiment, tracking rating trends, or identifying common complaints is a legitimate use case. Don't republish individual reviews verbatim or build personal profiles on reviewers. Focus on aggregate insights.
How do I handle CAPTCHAs?
Prevention beats solving. Slow down your request rate, use residential proxies, add realistic delays, and warm up sessions before scraping. When a CAPTCHA appears anyway: in development, run with headless=False and solve it manually. In production, either integrate a CAPTCHA-solving service or implement an exponential backoff that waits 5–10 minutes before retrying.
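The backoff option can be sketched as follows. The function names, the 5-minute base delay, and the `is_blocked` check are assumptions for illustration; how you detect a CAPTCHA page depends on your scraper.

```python
import random
import time


def backoff_delays(base_seconds=300, factor=2, max_seconds=3600, attempts=4, jitter=0.2):
    """Yield wait times that grow by `factor`, capped at `max_seconds`, with +/- jitter."""
    delay = base_seconds
    for _ in range(attempts):
        spread = delay * jitter
        yield min(max_seconds, delay + random.uniform(-spread, spread))
        delay = min(max_seconds, delay * factor)


def retry_after_captcha(attempt_scrape, is_blocked, sleep=time.sleep, **backoff):
    """Call attempt_scrape(); while is_blocked(result) is true, wait and try again.

    Returns the last result, which may still be blocked once attempts run out.
    """
    result = attempt_scrape()
    for delay in backoff_delays(**backoff):
        if not is_blocked(result):
            return result
        sleep(delay)
        result = attempt_scrape()
    return result


# Usage (assumption: scrape() returns [] when a CAPTCHA page was served):
# reviews = retry_after_captcha(scrape, is_blocked=lambda r: r == [])
```

The jitter matters when several sessions run in parallel: without it, every blocked session retries at the same moment, which recreates the traffic spike that triggered the block.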