
How to Scrape Zomato Listings Using BeautifulSoup and Python?

July 19, 2021

One of the biggest applications of web scraping is scraping restaurant listings from different websites. You might want to build an aggregator, monitor prices, or provide a better UX on top of existing restaurant booking sites.

In this post we will see how a simple script can do exactly that. We will use BeautifulSoup to parse the pages and retrieve restaurant data from Zomato.

To begin with, here is the boilerplate code we need. It fetches a Zomato search results page and sets up BeautifulSoup so we can use CSS selectors to query the page for the data we want.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

# Pretend to be a real browser so the request is less likely to be blocked
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.zomato.com/ncr/restaurants/pizza'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

We pass a User-Agent header to simulate a browser call, which makes it less likely that the request gets blocked.
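Before parsing anything, it helps to verify that the headers actually worked. A minimal sketch of that check (the `looks_blocked` helper and its list of status codes are our own assumption about common anti-bot responses, not documented Zomato behaviour):

```python
import requests

HEADERS = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) '
                         'AppleWebKit/601.3.9 (KHTML, like Gecko) '
                         'Version/9.0.2 Safari/601.3.9'}

def looks_blocked(status_code):
    # Hypothetical helper: these status codes commonly signal anti-bot blocking
    return status_code in (403, 429, 503)

def fetch(url):
    """Fetch a page with browser-like headers; fail loudly if we seem blocked."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    if looks_blocked(response.status_code):
        raise RuntimeError('Possible block: HTTP %d' % response.status_code)
    return response
```

Failing early like this is much easier to debug than silently parsing an error page.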

Now it's time to analyze the Zomato search results for the location we want.

When we inspect the page, we find that the HTML for each listing is wrapped in a tag with the class search-result.

We can use that class to break the HTML document into pieces that each contain a single listing's data, like this.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.zomato.com/ncr/restaurants/pizza'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# Each restaurant card lives inside an element with the class search-result
for item in soup.select('.search-result'):
    try:
        print('----------------------------------------')
        print(item)
    except Exception:
        print('')

And once you run that…

python3 scrapeZomato.py

You can see that the code separates out the individual HTML cards.

On further inspection, you can see that the restaurant's name always has the class result-title. So let's try to retrieve that.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.zomato.com/ncr/restaurants/pizza'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('.search-result'):
    try:
        print('----------------------------------------')
        # Grab just the restaurant name from each card
        print(item.select('.result-title')[0].get_text())
    except Exception:
        print('')

This gives us all the restaurant names…

Hurrah!

Now, it’s time to get other data…

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.zomato.com/ncr/restaurants/pizza'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('.search-result'):
    try:
        print('----------------------------------------')
        print(item.select('.result-title')[0].get_text().strip())           # name
        print(item.select('.search_result_subzone')[0].get_text().strip())  # location
        print(item.select('.res-rating-nf')[0].get_text().strip())          # rating
        print(item.select('[class*=rating-votes-div]')[0].get_text().strip())  # votes
        print(item.select('.res-timing')[0].get_text().strip())             # timings
        print(item.select('.res-cost')[0].get_text().strip())               # cost
    except Exception:
        print('')

And once you run that, it prints all the details we need: the name, location, rating, vote count, timings, and cost.
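Printing is fine for exploration, but in practice you will usually want the results in a structured form. A sketch that collects the same fields into dictionaries and writes them to CSV (the field names and the `parse_cards`/`write_csv` helpers are our own; the selectors are the ones used above, and we use the built-in `html.parser` so no extra parser is needed):

```python
import csv
from bs4 import BeautifulSoup

FIELDS = ['name', 'subzone', 'rating', 'votes', 'timing', 'cost']

def parse_cards(html):
    """Extract one dict per .search-result card, using the same CSS selectors."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for item in soup.select('.search-result'):
        def text(selector):
            # Return the first match's text, or '' if the field is missing
            hits = item.select(selector)
            return hits[0].get_text().strip() if hits else ''
        rows.append({
            'name': text('.result-title'),
            'subzone': text('.search_result_subzone'),
            'rating': text('.res-rating-nf'),
            'votes': text('[class*=rating-votes-div]'),
            'timing': text('.res-timing'),
            'cost': text('.res-cost'),
        })
    return rows

def write_csv(rows, path):
    """Dump the parsed cards into a CSV file, one row per restaurant."""
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Falling back to `''` for missing fields also means one broken card no longer aborts the whole row, unlike the bare `try`/`except` above.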

In a more robust implementation, you would want to rotate the User-Agent string on each request, so Zomato cannot detect that every call comes from the same browser!
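A simple way to do that is to keep a small pool of real browser User-Agent strings and pick one at random per request. A sketch (the pool below is illustrative, not exhaustive):

```python
import random

# A few real browser User-Agent strings (illustrative; extend as needed)
USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 '
    '(KHTML, like Gecko) Version/9.0.2 Safari/601.3.9',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0',
]

def random_headers():
    """Build fresh request headers with a randomly chosen User-Agent."""
    return {'User-Agent': random.choice(USER_AGENTS)}
```

You would then call `requests.get(url, headers=random_headers())` instead of reusing the same `headers` dict every time.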

If we go a bit further, though, you will find that Zomato can simply block your IP address, which defeats all the other tricks. That is a letdown, and it is where most web scraping projects fail.

Overcoming IP Blocks

Investing in a private rotating proxy service such as Proxies API can often make the difference between a successful, headache-free scraping project that gets the job done consistently and one that never works at all.

Plus, with 1000 free API calls, you have nothing to lose by trying our rotating proxy and comparing notes. It takes only a single line of code to add and is barely disruptive.

Our rotating proxy server, Proxies API, provides a simple API that can solve your IP-blocking problems instantly:

  • millions of high-speed rotating proxies located around the world,
  • automatic IP rotation,
  • automatic User-Agent string rotation (simulating requests from different, genuine web browsers and browser versions),
  • automatic CAPTCHA-solving technology.
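With requests, routing traffic through a rotating proxy is a one-line change via the standard `proxies` mapping. A sketch, where the proxy URL is a placeholder, not a real endpoint; substitute the credentials your proxy provider gives you:

```python
import requests

# Placeholder endpoint: replace with your rotating-proxy provider's URL
PROXY = 'http://username:password@proxy.example.com:8080'

def fetch_via_proxy(url, headers=None):
    """Route the request through the proxy so each call can exit
    from a different IP address."""
    proxies = {'http': PROXY, 'https': PROXY}
    return requests.get(url, headers=headers, proxies=proxies, timeout=15)
```

Because only the `proxies` argument changes, the rest of the scraping code above stays exactly as it is.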

Hundreds of our clients have successfully solved the problem of IP blocking with our simple API.

The whole thing is also accessible through a simple API from Foodspark.

To know more about our Zomato Listings Scraper, contact us or ask for a free quote!
