How to Scrape Website Data from Pages with Infinite Scrolling?
Suppose you are extracting products from Flipkart and want another 100 products from every category, but plain page scraping does not work because it only grabs the initial 15 products on the page.
Flipkart uses a feature called infinite scrolling, so there is no
pagination (such as ?page=2, ?page=3) in the URL. If there were, we could
have put the page value in a while loop and incremented it, as shown below.
page_count = 0
while page_count < 5:
    url = "http://example.com?page=%d" % page_count
    # scraping code...
    page_count += 1
Now, let's get back to infinite scrolling.
Ajax is what lets a website implement infinite scrolling. The Ajax
request has its own URL, from which further products are loaded onto the
same page as you scroll.
To find that URL:
- Open the page in Google Chrome.
- Go to the console, right-click in it, and enable "Log XMLHttpRequests".
- Reload the page and scroll down slowly. As new products are populated,
you will see various URLs logged after "XHR finished loading: GET".
Flipkart issues several kinds of URLs. The one you are looking for
begins with
“flipkart.com/lc/pr/pv1/spotList1/spot1/productList?p=blahblahblah&lots_of_crap”
- Click that URL and it will be highlighted in the Network tab of
Chrome DevTools. From there you can copy the URL or open it in a new
tab.
When you open the link in a new tab, you will see a page with
about 15-20 products.
1. You will notice that it is only 15 products again,
but we want all of them!
Look in the URL for a GET parameter called start= (any number).
For the first batch of products, set it to 0. For each following batch,
advance it by the number of products returned per response: roughly 0, 20, 40
with 20 products per response, or 0, 15, 30 with 15. Check the batch
boundaries for duplicated or skipped products to confirm the exact offsets
the site expects. Iterate over the URL in a while loop, as
shown before, and you are done.
2. Still stuck because you cannot find the
images?
Right-click and view the page source for
the URL. You will see a tag with a data-src="" attribute, which holds
your product's image.
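The two points above can be sketched together in Python. This is only an illustration under stated assumptions: the base URL is a placeholder, the start parameter is assumed to be a zero-based offset that advances by the page size, and the data-src attribute is assumed to sit on img tags. The helper names batch_urls and image_sources are made up for this example. Verify all of this against the real responses of the site you are scraping.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def batch_urls(base_url, page_size, total):
    """Build one URL per batch by stepping the `start` offset.

    Assumption: `start` is a zero-based offset that grows by page_size.
    """
    # Append to an existing query string if the base URL already has one.
    separator = "&" if "?" in base_url else "?"
    return [
        base_url + separator + urlencode({"start": start})
        for start in range(0, total, page_size)
    ]


class ImageSourceParser(HTMLParser):
    """Collect the data-src attribute of every <img> tag (assumed tag)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("data-src")
            if src:
                self.sources.append(src)


def image_sources(html):
    """Return all data-src values found in the given HTML string."""
    parser = ImageSourceParser()
    parser.feed(html)
    return parser.sources
```

Each URL from batch_urls would then be downloaded with whatever HTTP client you already use, and the returned HTML passed to image_sources.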
This example applies to Flipkart.com only.
Other websites may use different Ajax URLs and different GET parameters in
the URL.
Some websites might also return JSON responses from their Ajax URLs. In that
case, you don't have to scrape HTML at all; just read the JSON response
with any JSON library that you have used before.
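As a rough sketch of the JSON case, Python's standard json module is enough to read such a response. The field names "products" and "title" and the helper name product_titles are hypothetical; a real Ajax response will have its own structure, which you can see by opening the URL in the browser.

```python
import json


def product_titles(payload):
    """Extract product titles from a JSON response body.

    Assumption: the response looks like {"products": [{"title": ...}, ...]}.
    """
    data = json.loads(payload)
    # Skip entries that have no title instead of raising KeyError.
    return [item["title"] for item in data.get("products", []) if "title" in item]
```

The payload argument is the raw response text returned by the Ajax URL.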
If you have any questions, please leave a comment
in the section below, contact X-Byte Enterprise Crawling, or ask for a free
quote!
Happy Scraping!
For more, visit: https://www.xbyte.io/web-scraping-services.php