How Web Scraping is Used to Search for the Cheapest Supermarket for your Groceries?

Scrimper is a tool that helps students and other individuals locate the most cost-effective supermarket for their grocery shopping. To accomplish this, the script web-scrapes the online inventories of numerous stores to extract the necessary information, which is then stored in a CSV file. This information is sorted to identify the cheapest supermarket along with the final price.

As a team comprised primarily of students, we are all aware that life is costly. We discovered that food and grocery shopping account for the majority of our variable expenses. It is inconvenient to visit many stores in search of the best deal on each item, or to save money by surviving on bread and water. This is why we set out to create a web scraping tool that finds the cheapest supermarket for everything on your shopping list. Our purpose is also to assist those in need who do not have the financial means to spend much money on meals.

Our first idea was to build a platform where users could add a single item to a shopping list; the tool would then scrape grocery data for that item from the internet.

The scraper should then look for the lowest offer on that product at each supermarket near the user. The results' names and prices are recorded in a Pandas DataFrame with the columns "product," "price," "supermarket," and "distance."
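For illustration, the per-product results might look like the following minimal sketch (the supermarket names, prices, and distances below are made up):

```python
import pandas as pd

# Hypothetical rows: the cheapest offer found for one product at each
# supermarket near the user. All names and values are illustrative.
results = pd.DataFrame(
    [
        {"product": "noodles", "price": 0.99, "supermarket": "Aldi", "distance": 1.2},
        {"product": "noodles", "price": 1.19, "supermarket": "Edeka", "distance": 0.4},
        {"product": "noodles", "price": 1.49, "supermarket": "Rewe", "distance": 0.8},
    ]
)
print(results)
```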

Following that, the program filters the "price" and "distance" columns and determines the optimal compromise between these two criteria. If the user enters a complete grocery list, the scraper obtains the lowest price for each product at every supermarket, then stores the total of those prices, together with the supermarket's name and distance, in a Pandas DataFrame. Finally, the list is sorted by overall cost and distance, and the optimal compromise between those two criteria is determined once more. This is the final output the platform user receives.
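The article does not spell out how the compromise between price and distance is computed, so the sketch below assumes a simple weighted score over normalized columns; the weights and the per-store totals are hypothetical:

```python
import pandas as pd

# Hypothetical per-store totals for a complete grocery list.
totals = pd.DataFrame(
    [
        {"supermarket": "Aldi", "price": 23.40, "distance": 1.2},
        {"supermarket": "Edeka", "price": 26.10, "distance": 0.4},
        {"supermarket": "Rewe", "price": 24.80, "distance": 0.8},
    ]
)

def best_compromise(df, price_weight=0.7, distance_weight=0.3):
    """Rank stores by a weighted sum of normalized price and distance."""
    scored = df.copy()
    scored["score"] = (
        price_weight * scored["price"] / scored["price"].max()
        + distance_weight * scored["distance"] / scored["distance"].max()
    )
    return scored.sort_values("score").iloc[0]

best = best_compromise(totals)
print(f"Best compromise: {best['supermarket']} ({best['price']}€, {best['distance']} km)")
```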

To begin, we searched for supermarkets that have internet stores or online offerings. As is often the case, our project's procedure and end result did not go as anticipated. As it turns out, only a few stores provide this kind of online selection, with many others displaying only their current discounts or non-edible offerings. For example, if you search for "bread" on the Lidl online store, the first result is a toaster. When we attempted to work with REWE, we ran into legal concerns due to a CAPTCHA, a tool that determines whether a user is a human or a robot.

Most websites also do not let you search a specific region's selection, which may vary somewhat from one location to the next. We would have had to use Google Maps for the location, which would quickly have become too complicated and hence outside the scope of this project. We would also have needed a team member with web programming skills to build the platform, and we would have had to run it through a third-party server, which would have been very complicated. Another difficulty we noticed was that certain data sources are rather untidy, and the HTML files varied significantly between sites.

We had to take a step back and simplify our idea as a result of all of these roadblocks. We went through all of the supermarkets whose websites could be scraped and created a script for each of them. To accomplish this, we scraped supermarket prices from the websites using the Python Selenium library and a Chrome driver. Initially, we created and ran a function named "choose_product," which prompts the user to select a product from a list. The user enters a product name, such as "noodles," and this input is stored as a variable. The first step after visiting a supermarket's website is to accept cookies using `driver.find_element_by_id("####").click()`.
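A minimal sketch of these first steps is shown below. The shop URL and the cookie-banner id are placeholders (the article only shows the masked id "####"), and the modern Selenium locator API is used in place of the legacy find_element_by_id call:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

def choose_product():
    """Prompt the user for the product to search for."""
    return input("Which product are you looking for? ")  # e.g. "noodles"

product = choose_product()

driver = webdriver.Chrome()  # assumes a Chrome driver is installed
driver.get("https://www.example-supermarket.de")  # placeholder shop URL
driver.find_element(By.ID, "cookie-accept").click()  # placeholder banner id
```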

After that, we created and invoked a function named "enter_website" that inserts the selected product into the website's search field. Then "price low to high" is selected from the site's filter drop-down list. Last but not least, the first displayed product's name and price are stored as variables. By copying the full XPath, we were able to locate those elements in the HTML. The item, its price, and the name of the associated store are then saved to a CSV file. Because we didn't want to run each supermarket's script one by one, we wrote some extra code to join the scrapers and store the information in the same CSV file.
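A sketch of this step, assuming placeholder element ids and XPaths (each real shop needs its own, found by copying the full XPath from the page source):

```python
import csv
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select

def enter_website(driver, product, shop="ExampleMarkt"):
    """Search for the product, sort by price, and save the cheapest hit."""
    search = driver.find_element(By.ID, "search-input")  # placeholder id
    search.send_keys(product)
    search.send_keys(Keys.RETURN)

    # Pick "price low to high" from the filter drop-down (placeholder id).
    Select(driver.find_element(By.ID, "sort-select")).select_by_visible_text(
        "price low to high"
    )

    # The first listed product is now the cheapest; XPaths are placeholders.
    name = driver.find_element(By.XPATH, "/html/body//h3[1]").text
    price = driver.find_element(By.XPATH, "/html/body//span[@class='price'][1]").text

    # Append the result to the shared CSV file.
    with open("results.csv", "a", newline="") as f:
        csv.writer(f).writerow([name, price, shop])
```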

To accomplish this, we created a function for each supermarket scraper and a for-loop that iterates over all of them. In general, the code is functional. There are just a few issues with locating some of the items on some of the websites, which worked fine whenever the scripts were executed separately. This could easily be fixed with a little additional time.
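The glue code might look like the following sketch; the per-store scraper names are hypothetical stubs, each of which would run the Selenium steps above and append its row to results.csv:

```python
def scrape_aldi(product): ...   # hypothetical per-store scrapers; each real
def scrape_edeka(product): ...  # one would run the Selenium steps above and
def scrape_rewe(product): ...   # append a row to results.csv

scrapers = [scrape_aldi, scrape_edeka, scrape_rewe]

product = "noodles"
for scrape in scrapers:
    try:
        scrape(product)
    except Exception as err:
        # Some items occasionally fail to resolve inside the loop (as noted
        # above), so a failing scraper is skipped instead of aborting the run.
        print(f"{scrape.__name__} failed: {err}")
```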

We now have all of the information we require. The final item on the shopping list is to obtain the cheapest supermarket and the ultimate lowest price for all the items on the list. Another script was written to filter the data in our newly created file. First, we created a function named min_sum() that groups the items in the data frame by supermarket. Then we created favorite_store(), which asks the user whether they have a favorite store. If they say yes, they have to select one of our scraped stores.
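A minimal sketch of min_sum(), assuming the CSV columns produced by the scrapers and already-numeric prices:

```python
import pandas as pd

def min_sum(df):
    """Group the scraped rows by supermarket and return the cheapest store."""
    totals = df.groupby("supermarket")["price"].sum()
    cheapest = totals.idxmin()
    return cheapest, totals[cheapest]

# Assumes the scrapers wrote [product, price, supermarket] rows and that
# the price column is already numeric (real scraped prices need cleaning).
df = pd.read_csv("results.csv", names=["product", "price", "supermarket"])
store, total = min_sum(df)
print(f"Cheapest store: {store} at {total:.2f}€")
```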

Once the user names their favorite retailer, they get the price total and a list of the product names they plan to purchase in that store. If they say no, the code executes the min_sum() function, which outputs the price total, store name, and product list. The main issue was that not all stores had all of their items listed online. As a result, every missing product was substituted with the name "Beispiel" and a price of 3€. This is something else to think about before the web scraper goes live.
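A sketch of favorite_store(), including the placeholder handling just described; the prompts and structure are assumptions, and it relies on the min_sum() sketch above:

```python
# Products a shop did not list online were stored with these placeholders.
PLACEHOLDER_NAME, PLACEHOLDER_PRICE = "Beispiel", 3.0

def favorite_store(df):
    """Ask for a favorite store, or fall back to the cheapest one."""
    answer = input("Do you have a favorite store? (yes/no) ")
    if answer.strip().lower() == "yes":
        store = input(f"Choose one of {sorted(df['supermarket'].unique())}: ")
        subset = df[df["supermarket"] == store]
        print(f"Total at {store}: {subset['price'].sum():.2f}€")
        print("Products:", list(subset["product"]))
    else:
        store, total = min_sum(df)  # cheapest store overall
        print(f"Cheapest store: {store} at {total:.2f}€")
```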

If you are looking to scrape the cheapest grocery store data, contact Foodspark today or request a quote!
