Proxies https://changedetection.io/ en-gb Using Bright Data's "Scraping Browser" to by-pass CAPTCHA's and other protection when monitoring pages https://changedetection.io/tutorial/using-bright-datas-scraping-browser-pass-captchas-and-other-protection-when-monitoring <span class="field field--name-title field--type-string field--label-hidden">Using Bright Data&#039;s &quot;Scraping Browser&quot; to by-pass CAPTCHA&#039;s and other protection when monitoring pages</span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." href="/tech-writer/stephen" class="username">Stephen</a></span> <span class="field field--name-created field--type-created field--label-hidden"><time datetime="2023-11-16T17:39:50+01:00" title="Thursday, November 16, 2023 - 17:39" class="datetime">Thu, 11/16/2023 - 17:39</time> </span> <div class="field field--name-field-topic field--type-entity-reference field--label-above"> <div class="field__label">Topic</div> <div class='field__items'> <div class="field__item"><a href="/topic/how" hreflang="en-gb">How-To</a></div> <div class="field__item"><a href="/topic/proxies" hreflang="en-gb">Proxies</a></div> </div> </div> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>For many websites, simply using a proxy is not enough: the website uses much more complex anti-robot software to detect the actual browser being used, not just the IP connection or headers (including common headers such as user-agent).</p><p>You need a better way to simulate a real browser. A scraping browser is different to a proxy: a proxy just tunnels your connection while the browser still looks the same, whereas with a scraping browser you are also simulating a real browser hidden away in Bright Data's infrastructure.</p><p>As it turns out, many websites probe much deeper into your browser; analysing the response from your 2D 
and 3D/GPU video card is just one of the tactics they use.</p><p>Unfortunately, this means many headless Chrome sessions stand out all too easily: their "fingerprint" is just too obvious, because they don't have anything that resembles a real video card or other hardware attached.</p><p>The result is that you get pushed to enter a CAPTCHA or face a similar anti-robot mechanism.</p><p><em><strong>But there is a solution. </strong></em>Whilst not guaranteed, it definitely helps a lot!&nbsp;</p><p>The clever people over at Bright Data have added a "Scraping Browser" to their offering, which simulates a real browser more precisely than just about anything you can try to run yourself.</p><p>(Note: this functionality in changedetection.io will be released late November 2023, but you can try it now from the current<em> master </em>branch on our <a href="https://github.com/dgtlmoon/changedetection.io">GitHub</a> or the <a href="https://hub.docker.com/layers/dgtlmoon/changedetection.io/dev/images/sha256-2397ad50a81527514492e859da685dada3863659b1e233f8e8f020eb08af0da0?context=explore">dev tag from our Docker Hub</a>.)<br>&nbsp;</p><p>More information about <a href="https://brightdata.com/products/scraping-browser">Bright Data's scraping browser can be found here.</a></p><h4><em><strong>Here's how to set up Bright Data's "Scraping Browser" with changedetection.io</strong></em></h4><p>&nbsp;</p><p>Head on over to your Bright Data <em><strong>control panel</strong></em>. When you sign up using this link, <a href="https://brightdata.grsm.io/n0r16zf7eivq">https://brightdata.grsm.io/n0r16zf7eivq</a>, Bright Data will match any first deposit up to $150.</p><p>Once logged in, click on <em><strong>"Scraping Browser"</strong></em>.</p><p>&nbsp;</p><img src="/sites/changedetection.io/files/inline-images/image_28.png" data-entity-uuid="30783e83-19fa-4607-ba8e-b03f3c5b7512" data-entity-type="file" alt="The Bright Data control panel - where to find the Scraping Browser link" 
width="690" height="650" loading="lazy"><p>&nbsp;</p><p>Now we will copy the special "Connection address" so that changedetection.io knows how to find the Scraping Browser. Click on <strong>Check out code and integration examples</strong> on the bottom right.</p><p>It's also worth considering here that you can add extra residential proxies, which also greatly increase your chance of skipping past any potential CAPTCHA issues, but for now we will continue with the default setup.</p><p><br><img src="/sites/changedetection.io/files/inline-images/image_33.png" data-entity-uuid="5e39f204-0e15-464b-af15-eeb285f708ce" data-entity-type="file" alt="The Bright Data control panel - where to find the Scraping Browser link" width="1153" height="720" loading="lazy"></p><p>&nbsp;</p><p>Copy the text from the next page into your clipboard. It will start with <strong>wss://</strong>. Be sure not to include any quotes or other text; copy the whole text that is marked in blue below.</p><p>&nbsp;</p><img src="/sites/changedetection.io/files/inline-images/image_34.png" data-entity-uuid="d90064d8-1c18-428b-a0f3-0dbc1af510f9" data-entity-type="file" alt="The Bright Data control panel - where to find the Scraping Browser wss:// link for changedetection" width="708" height="577" loading="lazy"><p>&nbsp;</p><p>Now for the fun part: jump over to your changedetection.io login and click on <strong>Settings &gt; CAPTCHA &amp; Proxies</strong>, scroll down to the <em><strong>"Extra Browsers"</strong></em> section, give your browser a name (in this case just <strong>BrightData Scraping Browser</strong>) and paste the <strong>wss://... 
</strong>type URL into the <strong>Connection URL</strong> box.</p><p>&nbsp;</p><img src="/sites/changedetection.io/files/inline-images/image_37.png" data-entity-uuid="36d00f7f-75c1-4ea3-8f69-932860d716ea" data-entity-type="file" alt="Adding a scraping browser to changedetection.io - step 1 adding the browser link" width="988" height="608" loading="lazy"><p>&nbsp;</p><p>Now, to use your new Scraping Browser with any website you are watching for changes, simply click <strong>Edit</strong> in your overview list and select the new browser.</p><p>&nbsp;</p><img src="/sites/changedetection.io/files/inline-images/image_39.png" data-entity-uuid="03bf5840-30f9-41ef-92dc-34770ba222da" data-entity-type="file" alt="list of websites with changes overview" width="860" height="155" loading="lazy"><p>&nbsp;</p><p>Choose the new browser <strong>Bright Data Scraping Browser</strong> that we set up in the previous step.</p><p>&nbsp;</p><img src="/sites/changedetection.io/files/inline-images/image_40.png" data-entity-uuid="36c2bac5-dc00-4125-8278-57e55d5cd53a" data-entity-type="file" alt="Setting a single website to use a particular scraping browser" width="814" height="386" loading="lazy"><p>&nbsp;</p><p>And there you have it - <em>how to set up a scraping browser to get a much better success rate (say goodbye to CAPTCHAs!) when watching websites for changes.</em></p><p>&nbsp;</p><p><em>Some extra tips -</em></p><p>When you set up additional Residential Proxies, Bright Data will present you with a new "wss://.." 
type connection URL, so you can then set up multiple proxy networks with multiple scraping browsers.</p><p>Sign up with Bright Data using <a href="https://brightdata.grsm.io/n0r16zf7eivq">https://brightdata.grsm.io/n0r16zf7eivq</a> and Bright Data will match any first deposit up to $150.</p><p>&nbsp;</p><h3>Troubleshooting</h3><p>If you see the error "<code>Overriding accept-language, user-agent headers forbidden</code>", you may need to enable "<em>Custom headers</em>". In your Bright Data dashboard, the option under <code>Proxy Settings</code> &gt; <code>Configuration</code> &gt; <code>Advanced Settings</code> &gt; <code>Custom headers &amp; cookies</code> needs to be activated (changedetection.io may send its own custom User-Agent and other headers).<br><br>If you receive an error like <code>"WebSocket error: wss://brd-customer..... 403 wrong_auth"</code>, try whitelisting the IP of our server (or your local connection) from inside the Bright Data control panel in the "Scraping Browser" settings, as an alternative to username+password (but still keep the username+password in the <code>wss://.. </code>connection URL).</p><p>&nbsp;</p><img src="/sites/changedetection.io/files/inline-images/image_52.png" data-entity-uuid="941c5e34-0be0-4861-8e65-fa61d518fb7e" data-entity-type="file" alt="Example of adding a white-list IP access in BrightData for the scraping browser" width="400" height="363" loading="lazy"><p>Happy changedetecting!</p><p>&nbsp;</p><p>&nbsp;</p></div> Thu, 16 Nov 2023 16:39:50 +0000 Stephen 24 at https://changedetection.io How to - Bright Data Proxies and changedetection.io https://changedetection.io/tutorial/how-bright-data-proxies-and-changedetectionio <span class="field field--name-title field--type-string field--label-hidden">How to - Bright Data Proxies and changedetection.io</span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." 
href="/tech-writer/stephen" class="username">Stephen</a></span> <span class="field field--name-created field--type-created field--label-hidden"><time datetime="2023-08-02T17:36:12+02:00" title="Wednesday, August 2, 2023 - 17:36" class="datetime">Wed, 08/02/2023 - 17:36</time> </span> <div class="field field--name-field-topic field--type-entity-reference field--label-above"> <div class="field__label">Topic</div> <div class='field__items'> <div class="field__item"><a href="/topic/how" hreflang="en-gb">How-To</a></div> <div class="field__item"><a href="/topic/proxies" hreflang="en-gb">Proxies</a></div> </div> </div> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><img src="/sites/changedetection.io/files/inline-images/image_4_0.png" data-entity-uuid="f5cb04c0-6de0-461b-b245-cde99e52b47a" data-entity-type="file" width="588" height="753" loading="lazy"><p><br>Using Bright Data proxies with changedetection.io is super beneficial for accessing web pages from different countries and bypassing blocking rules, for several reasons:</p><ol><li><strong>Geographical Location</strong>: Proxies allow you to route your web requests through servers located in different countries. By doing so, you can effectively appear as if you are accessing the web from that specific country. This is useful for accessing geo-restricted content or services that are only available to users in certain regions.</li><li><strong>Bypassing IP Restrictions</strong>: Some websites implement IP-based restrictions to limit access to their content or prevent excessive requests from the same IP address. Proxies help you overcome these restrictions by changing your IP address, so you can access the website as if you were coming from a different IP.</li><li><strong>Overcoming Censorship</strong>: In regions with strict internet censorship, certain websites and online services may be blocked by the government or internet service providers. 
Proxies can help users bypass such censorship and access blocked content.</li><li><strong>Anonymity and Privacy</strong>: Proxies provide an extra layer of anonymity by masking your original IP address. This can be valuable for users who want to protect their online privacy and avoid being tracked by websites or third parties.</li><li><strong>Load Distribution</strong>: When accessing websites through proxies, the load is distributed across multiple servers. This can help prevent server overload and improve the overall performance and speed of web scraping or data collection activities.</li><li><strong>Scalability</strong>: Proxies allow you to scale your web scraping or data collection efforts by making multiple requests from different IP addresses. This helps avoid getting blocked by websites that have rate-limiting or anti-scraping measures in place.</li><li><strong>Legal and Ethical Compliance</strong>: Proxies can be used to access websites in a manner that is compliant with their terms of service and legal regulations. By rotating IP addresses and not overloading servers, you can ensure responsible data collection practices.</li></ol><p>&nbsp;</p><p><em>It's super easy to connect Bright Data proxies to your changedetection.io account or installation. Head on over to </em><a href="https://brightdata.com/integration/changedetection"><em><strong>https://brightdata.com/integration/changedetection</strong></em></a><em> to find out more!</em></p><p>&nbsp;</p></div> Wed, 02 Aug 2023 15:36:12 +0000 Stephen 13 at https://changedetection.io