Hi guys, it's been a long time since my last post, about half a year or so, and this is my first post of 2022. I hope this year is better than the last one and that we can recover from COVID.
This idea popped up in my mind a few weeks ago while I was browsing GitHub search. Since 2013 I've been writing my blog manually: finding a niche and topics to write about, writing news, and sometimes personal posts.
Since 2017 I've loved writing programs in PHP, and that made me too lazy to write articles, news, or personal posts by hand. I love mining data from websites or apps so I can use it as content on my Auto Generated Content (AGC) blog.
A few weeks ago I decided I should start a listing website, because in Indonesia many people still use Google to find information on anything: a place, a service, and so on. That makes it a good opportunity to get website traffic. The idea is simple: build a useful AGC website with listings and infographics about people's businesses.
After a few hours browsing GitHub search, I decided to write a script that scrapes Google Maps business data using a query like "xxxx near yyy", where x is the business type and y is a location name such as a district, village, or city.
I had two options: write it in JavaScript (Node.js) or in Python. I chose Python because I don't really like Node.
I used two main libraries to scrape the data automatically:
1. Selenium WebDriver (for automating the browsing process)
2. BeautifulSoup (for parsing HTML content)
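The post doesn't show how the "xxxx near yyy" query gets into the browser. A minimal sketch, assuming the query is placed in a `https://www.google.com/maps/search/...` URL; `build_search_url` is a hypothetical helper, and the `hl=id` parameter is borrowed from the sample link further down, which contains `hl=id` (keeping the interface in Indonesian so the Indonesian selectors match):

```python
from urllib.parse import quote_plus


def build_search_url(business_type, location, language="id"):
    """Return a Google Maps search URL for "<business_type> near <location>".

    Assumptions: Maps accepts the query in the /maps/search/ path, and the
    hl parameter sets the interface language (hl=id keeps it in Indonesian).
    """
    query = f"{business_type} near {location}"
    # quote_plus turns spaces into '+' and escapes other special characters
    return f"https://www.google.com/maps/search/{quote_plus(query)}?hl={language}"


print(build_search_url("bengkel mobil", "Pekanbaru"))
# https://www.google.com/maps/search/bengkel+mobil+near+Pekanbaru?hl=id
```

With Selenium, the result would be passed to `driver.get(...)`.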
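Here are the imports. Missing third-party packages are installed with pip before they are imported:

```python
import importlib
import json
import logging
import os
import subprocess
import sys
import time
from datetime import datetime


def ensure_package(module_name, pip_name):
    """Import module_name, installing pip_name with pip first if it is missing."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name])
        importlib.invalidate_caches()  # let this process see the new package


# Install third-party dependencies before importing them
ensure_package("selenium", "selenium")
ensure_package("bs4", "beautifulsoup4")
ensure_package("webdriver_manager", "webdriver-manager")
# Assumption: the clipboard.paste() call used later comes from pyperclip
ensure_package("pyperclip", "pyperclip")

import pyperclip as clipboard
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager

os.system('clear')  # clear the terminal ('cls' on Windows)
```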
I wrote a helper function for the script. It scrolls the results pane to the bottom so Google Maps loads more results:
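```python
# Selenium 4 locator API (find_element_by_xpath was removed in Selenium 4.3)
from selenium.webdriver.common.by import By


def scrolling(driver):
    """Scroll the results pane to the bottom so Google Maps loads more results."""
    try:
        # XPath of the scrollable results pane; it breaks if Google changes the layout
        scrollable_div = driver.find_element(
            By.XPATH, '//*[@id="pane"]/div/div[1]/div/div/div[2]/div[1]')
        driver.execute_script(
            'arguments[0].scrollTop = arguments[0].scrollHeight', scrollable_div)
        time.sleep(2)  # give Maps time to load the next batch
    except NoSuchElementException:
        print("Error: can't find scrollbar")
        print("")
```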
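Calling `scrolling()` once scrolls the pane once, so loading a long result list means repeating it. A sketch of a hypothetical `scroll_until_stable()` that keeps scrolling until the pane's `scrollHeight` stops growing, assuming an unchanged height means Maps has no more results to load. It takes the same `driver` and scrollable element; Selenium's `execute_script` returns whatever the script returns:

```python
import time

# Scroll to the bottom and report the new height in a single call
SCROLL_JS = ("arguments[0].scrollTop = arguments[0].scrollHeight;"
             " return arguments[0].scrollHeight;")


def scroll_until_stable(driver, scrollable_div, pause=2.0, max_rounds=20):
    """Scroll scrollable_div until its scrollHeight stops changing.

    Returns the number of scroll rounds performed (at most max_rounds).
    """
    last_height = None
    for round_number in range(1, max_rounds + 1):
        height = driver.execute_script(SCROLL_JS, scrollable_div)
        if height == last_height:
            return round_number  # nothing new loaded since the previous scroll
        last_height = height
        time.sleep(pause)  # let Maps fetch the next batch
    return max_rounds
```

`max_rounds` caps the loop in case the height never settles.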
AUTOMATED SCRAPING PROCESS
The basic info we need for each business:
```
{
  link      : business Maps URL, so it can be used later
  title     : business name
  thumbnail : business image
  category  : business category
  address   : business address
  phone     : business phone number
  plusCode  : business address as a Plus Code
  openHours : business opening hours
  rating    : business rating
  website   : business website
}
```
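The script stores each business as a plain dict. If you want the field list above in code, a dataclass is one option (a sketch, not what the script does); its defaults mirror the empty-string fallbacks used when an element is missing, and `asdict` turns it back into the same dict shape:

```python
from dataclasses import asdict, dataclass, field


@dataclass
class BusinessRecord:
    """One scraped business; field names mirror the keys listed above."""
    link: str = ""
    title: str = ""
    thumbnail: str = ""
    category: str = ""
    address: str = ""
    phone: str = ""
    plusCode: str = ""
    # default_factory gives each record its own dict instead of a shared one
    openHours: dict = field(default_factory=dict)
    rating: str = ""
    website: str = ""


record = BusinessRecord(title="Bengkel mobil", rating="4,5")
print(asdict(record)["title"])  # Bengkel mobil
```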
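Collect the URL of every search result link (`a.a4gq8e-aVTXAb-haAclf-jRmmHf-hSRGPd`) into `links`:

```python
from selenium.webdriver.common.by import By
links = [x.get_attribute('href') for x in driver.find_elements(By.CSS_SELECTOR, "a.a4gq8e-aVTXAb-haAclf-jRmmHf-hSRGPd")]
```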
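Selenium's `get_attribute('href')` returns `None` for an element without an `href`, and a place may appear more than once in the list (an assumption, not something reported here). A small hypothetical clean-up step before visiting the links:

```python
def unique_links(links):
    """Drop empty or missing hrefs and duplicates, keeping the original order."""
    # dict keys keep insertion order, so fromkeys deduplicates without reordering
    return list(dict.fromkeys(link for link in links if link))


print(unique_links(["a", None, "b", "a", ""]))  # ['a', 'b']
```

In the script this would be `links = unique_links(links)`.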
Then, for each item in `links`, open the page in the browser and scrape the data.
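The loop itself isn't shown in the post. A minimal sketch, assuming a Selenium-style `driver` (Selenium's WebDriver has `get()` and `current_url`) and a hypothetical `parse_place(driver)` callback that runs the extraction steps for one place and returns its result dict. The fixed `time.sleep` is the simplest possible wait; Selenium's explicit waits are more robust:

```python
import logging
import time


def scrape_links(driver, links, parse_place, delay=3.0):
    """Open each place URL and collect one result dict per place.

    parse_place is a hypothetical callback: it receives the driver (already
    on the place page) and returns the result dict, or raises on failure.
    """
    results = []
    for index, link in enumerate(links, start=1):
        logging.info("Scraping %d/%d: %s", index, len(links), link)
        try:
            driver.get(link)
            time.sleep(delay)  # crude wait for the place panel to render
            results.append(parse_place(driver))
        except Exception as error:  # keep going if a single place fails
            logging.warning("Skipping %s: %s", link, error)
    return results
```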
```python
from selenium.webdriver.common.by import By

# parser is a BeautifulSoup object of the current place page,
# e.g. parser = BeautifulSoup(driver.page_source, "html.parser")
title_tag = parser.select_one('h1')
title = title_tag.text.strip() if title_tag else ""

# Thumbnail: the image inside the header image button
hero_button = parser.find('button', {'jsaction': 'pane.heroHeaderImage.click'})
img = hero_button.img['src'] if hero_button and hero_button.img else ""

# Category: the button next to the rating
category_button = parser.find('button', jsaction="pane.rating.category")
category = category_button.text.strip() if category_button else ""

# Indonesian tooltips: 'Salin alamat' = Copy address,
# 'Salin nomor telepon' = Copy phone number, 'Salin Plus Codes' = Copy Plus Codes
address_button = parser.find('button', {'data-tooltip': 'Salin alamat'})
address = address_button.text.strip() if address_button else ""

phone_button = parser.find('button', {'data-tooltip': 'Salin nomor telepon'})
phone = phone_button.text.strip() if phone_button else ""

plus_code_button = parser.find('button', {'data-tooltip': 'Salin Plus Codes'})
plusCode = plus_code_button.text.strip() if plus_code_button else ""

# Open hours: the aria-label lists the days separated by '; '.
# 'hingga' = 'until'; 'Sembunyikan jam buka untuk seminggu' = 'Hide open hours for the week'
openHoursResults = {}
hours_div = parser.find('div', {'class': 'LJKBpe-open-R86cEd-haAclf'})
if hours_div:
    openHours = hours_div['aria-label']
    for days in openHours.split('; '):
        dayTime = days.replace('hingga', '-').replace(
            '. Sembunyikan jam buka untuk seminggu', '')
        # Split on the first comma only: day name before it, opening hours after it
        dayName, _, hours = dayTime.partition(',')
        openHoursResults[dayName.strip()] = hours.strip()

# Rating
rating_span = parser.find('span', {'class': 'aMPvhf-fI6EEc-KVuj8d'})
rating = rating_span.text.strip() if rating_span else ""

# Website: click 'Salin situs' (Copy website), then read the URL from the clipboard.
# Clear the clipboard first so a stale value is not returned if the copy fails,
# and fall back to "" when the place has no website button.
clipboard.copy("")
try:
    driver.find_element(By.XPATH, '//img[@alt="Salin situs"]').click()
    time.sleep(0.5)  # give the browser a moment to write to the clipboard
    website = clipboard.paste()
except NoSuchElementException:
    website = ""
```
After getting the data, put it into a dictionary and append it to the main results list for later use:
```python
result = {
    "link": driver.current_url,
    "title": title,
    "thumbnail": img,
    "category": category,
    "address": address,
    "phone": phone,
    "plusCode": plusCode,
    "openHours": openHoursResults,
    "rating": rating,
    "website": website
}
logging.info("Scraping done, append results...")
results.append(result)  # results is the main list; create it (results = []) before the loop
```
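To keep the scraped data for the listing site, one option is to dump `results` to a JSON file. A standard-library sketch; the function name and the file-name pattern are hypothetical:

```python
import json
import os
from datetime import datetime


def save_results(results, folder="."):
    """Write results to a timestamped JSON file and return the file path.

    The results-YYYYmmdd-HHMMSS.json naming is a hypothetical choice.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = os.path.join(folder, f"results-{stamp}.json")
    with open(path, "w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps non-ASCII characters readable in the file
        json.dump(results, handle, ensure_ascii=False, indent=4)
    return path
```

Usage: `print("Saved to", save_results(results))`.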
To run the script, just edit the file, find the `result` variable, and change it to your query. Then run:

```bash
python forhive.py
```
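Editing the file for every query gets tedious. As an alternative that is not part of the original script, the business type and location could come from the command line. A sketch with `argparse`; the argument names are hypothetical:

```python
import argparse


def parse_args(argv=None):
    """Read the business type and location from the command line."""
    arg_parser = argparse.ArgumentParser(
        description="Scrape Google Maps business listings")
    arg_parser.add_argument("business_type", help='e.g. "bengkel mobil"')
    arg_parser.add_argument("location", help="district, village or city name")
    args = arg_parser.parse_args(argv)
    # Build the same "xxxx near yyy" query used in this post
    args.query = f"{args.business_type} near {args.location}"
    return args
```

The script would then be run as `python forhive.py "bengkel mobil" Pekanbaru` and use `args.query`.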
Sample output for one business:

```json
{
    "link": "https://www.google.com/maps/place/Bengkel+mobil+%22DNF+Auto+Service+Pekanbaru%22/data=!4m5!3m4!1s0x31d5abe309bd1cdb:0xfe08771ea01b758a!8m2!3d0.4948262!4d101.4188712?authuser=0&hl=id&rclk=1",
    "title": "Bengkel mobil \"DNF Auto Service Pekanbaru\"",
    "thumbnail": "https://lh5.googleusercontent.com/p/AF1QipNveI4kZ01yc_ypg9pWc-ShMP-dQZxUrPDGrE67=w408-h510-k-no",
    "category": "Bengkel Mobil",
    "address": "Didepan Plaza Mebel, Jl. Soekarno - Hatta No.8, 9, Delima, Kec. Tampan, Kota Pekanbaru, Riau 28292",
    "phone": "0812-6178-1555",
    "plusCode": "FCV9+WG Delima, Kota Pekanbaru, Riau",
    "openHours": {
        "Senin": "08.30 - 17.00",
        "Selasa": "08.30 - 17.00",
        "Rabu": "08.30 - 17.00",
        "Kamis": "08.30 - 17.00",
        "Jumat": "08.30 - 17.00",
        "Sabtu": "08.30 - 15.00",
        "Minggu": "Tutup"
    },
    "rating": "4,5",
    "website": "log_level=0"
}
```
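The values in this sample are still Indonesian-formatted strings: the rating uses a decimal comma ("4,5") and closed days read "Tutup" (Closed). If the listing site needs to sort by rating or check whether a place is open, a small post-processing step helps. A sketch; the function names are hypothetical, and any hours format other than "HH.MM - HH.MM" or "Tutup" raises `ValueError`:

```python
from datetime import time


def parse_rating(text):
    """Turn a rating such as "4,5" (decimal comma) into 4.5; return None when empty."""
    return float(text.replace(",", ".")) if text else None


def _to_time(value):
    """Turn "08.30" into datetime.time(8, 30)."""
    hour, minute = value.strip().split(".")
    return time(int(hour), int(minute))


def parse_hours(text):
    """Turn "08.30 - 17.00" into (time(8, 30), time(17, 0)); "Tutup" into None."""
    if text.strip().lower() == "tutup":
        return None
    opening, closing = text.split("-")  # ValueError for any other shape
    return _to_time(opening), _to_time(closing)
```

For example, `parse_hours("08.30 - 15.00")` gives the Saturday hours from the sample as `(time(8, 30), time(15, 0))`.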
This scraper only works on Google Maps in Indonesian, because the selectors match Indonesian interface text such as the "Salin alamat" (Copy address) tooltip. I will move the selectors to a config file later so you can change them based on your Maps language.
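A sketch of what such a config file might look like, assuming JSON with one selector set per interface language. Only the Indonesian values used in this post are filled in; the key names and `load_selectors` are hypothetical:

```python
import json

# Hypothetical config layout: one selector set per Maps interface language.
# The "id" values are the ones used in this post; other languages would be added here.
DEFAULT_CONFIG = """
{
  "id": {
    "address_tooltip": "Salin alamat",
    "phone_tooltip": "Salin nomor telepon",
    "plus_code_tooltip": "Salin Plus Codes",
    "website_alt": "Salin situs",
    "hours_separator": "hingga"
  }
}
"""


def load_selectors(language, config_text=DEFAULT_CONFIG):
    """Return the selector set for a Maps language, failing loudly if it is missing."""
    config = json.loads(config_text)
    if language not in config:
        raise KeyError(f"No selectors configured for language {language!r}")
    return config[language]
```

The scraping code would then use e.g. `parser.find('button', {'data-tooltip': selectors['address_tooltip']})` instead of a hard-coded string.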
For the full code, you can download it here.