Scraping Web Data Up To 500% Faster
November 06, 2017
We are excited to announce an impressive new slate of updates to Mozenda. Building on important tools released in October, our update provides you with new features for Request Blocker and Job Sequencer to make your scraping experience even better. Coupled with a brand new tool, Run JavaScript Action, this release will make your agents more efficient, effective, and faster.
In doing this, we are demonstrating our continued commitment to bringing you the best and most effective web scraping software, and to building software that evolves to meet the challenges and needs you experience each day.
Let’s dive in.
Auto-Blocker (or Request Blocker Version 2)
The original version of Request Blocker, released in October, let you block the navigation requests that slowed down data gathering, making each agent faster. The new Auto-Blocker feature within Request Blocker gets you those results even faster and with less heavy lifting, so you can scrape the data you need more effectively.
On this new release, Shane Whitlock, a Mozenda developer who worked on the new Auto-Blocker feature, said, “With Request Blocker Version 2 users can compare every navigation request against a list of more than 500,000 domains that we have identified as potentially bad or unnecessary. This will automatically block many requests from ad servers that slow down an agent and the agent building process. There is a toggle button to turn this setting on or off. With this update, you can get some of the performance gains of request blocking without having to do any of the dirty work.”
You can turn the Auto-Blocker on or off with the toggle in the Navigation Request window. When matches with the Auto-Block list are found, the Request Blocker control lights up with a green background to let you know there are requests that Auto-Block suggests you block. When you click through to the Request Blocking editor, you are prompted to keep or deny each Auto-Block match. While additional gains can be achieved through manual and personalized request blocking, simply clicking a button to accept the recommended blocks already produces significant results.
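To make the idea of comparing each navigation request against a domain list concrete, here is a minimal sketch. It is not Mozenda's implementation: the list contents, the `shouldBlock` helper, and the rule of also matching parent domains are all assumptions made for illustration.

```javascript
// Hypothetical blocklist. The real Auto-Block list is maintained by Mozenda,
// and its contents and matching rules are not described in this post.
const blockedDomains = new Set(["ads.example.com", "tracker.example.net"]);

// Return true when the request's host, or any parent domain of it,
// appears in the blocklist. The bare top-level domain is never checked.
function shouldBlock(requestUrl, blocklist) {
  const labels = new URL(requestUrl).hostname.toLowerCase().split(".");
  for (let i = 0; i < labels.length - 1; i++) {
    if (blocklist.has(labels.slice(i).join("."))) {
      return true;
    }
  }
  return false;
}

console.log(shouldBlock("https://cdn.ads.example.com/banner.js", blockedDomains)); // true
console.log(shouldBlock("https://shop.example.org/luggage", blockedDomains)); // false
```

A `Set` keeps each lookup fast no matter how long the list is, which matters when the list holds hundreds of thousands of domains.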
[Figure: latest speed test results for the Agent Builder with and without the Request Blocker feature]
View our Help Center’s guided tutorial of Auto-Block.
Job Sequencer Version 2
Building upon the Job Sequencer release last month, the Mozenda team is excited to give you three new tools within Job Sequencer that remove roadblocks to greater efficiency:
- Delete a View
- Update Field Value
- Run a Sequence
Kenny Nielsen, an Account Manager in Mozenda’s Professional Services department who relies on the Job Sequencer functionality to gather millions of pieces of data for Mozenda clients, said, “The Professional Services team has used API functionality inside a sequencer environment for some time, and now that functionality is available in an automated step. These new steps allow you to do even more with the Job Sequencer tool, making it more versatile and customizable for the wide variety of ways our customers run their data.”
Be sure to check out our Help Center’s step-by-step tutorial of the new Job Sequencer features.
The first new step is Delete a View. It lets you select specific data from a scrape and delete it. For example, you may be scraping data about luggage but only want to compare the options that are $99 or less. You can set a view that includes anything $100 or more and then use Delete View Data to remove that data from the collection. This feature gives you the control to keep only the data you want.
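The luggage example can be modeled in a few lines. This is only a conceptual sketch, not the Job Sequencer's API: the sample data, the field names, and the `deleteViewData` function are hypothetical, and a view is represented here as a simple predicate.

```javascript
// Hypothetical scraped collection; the field names are illustrative only.
const luggage = [
  { name: "Carry-on", price: 79.99 },
  { name: "Hard-shell spinner", price: 149.0 },
  { name: "Duffel", price: 99.0 },
  { name: "Trunk", price: 100.0 },
];

// A view, modeled as a predicate that selects rows priced at $100 or more.
const hundredOrMore = (item) => item.price >= 100;

// Deleting a view's data keeps every row the view does NOT select.
function deleteViewData(collection, view) {
  return collection.filter((item) => !view(item));
}

const remaining = deleteViewData(luggage, hundredOrMore);
console.log(remaining.map((item) => item.name)); // [ 'Carry-on', 'Duffel' ]
```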
The second new feature is Update Field Value. It’s similar to the Delete a View feature, but instead of deleting the data it lets you change the data. This can be especially helpful when scraping large amounts of data, because it lets you process the data in batches. Simply choose a view, pick a field from that subset, and select a value; the feature then changes that value to whatever you specify.
For example, say you are creating a web data extraction project that involves 100 items, each with a file that needs to be downloaded. Previously, the only option for downloading those files was to wait until the scrape had completed all 100 items. Now, you can set a view that starts file downloads only for items that have a “Ready” status. You can use the Update Field Value feature to set 10 fields at a time to “Ready” and then download 10 files at a time, instead of 100 at once. You can then update the field to say “Done” after each download to know which items have been published.
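The batching pattern described above can be sketched as a small simulation. Everything here is an assumption made for illustration: the initial "Pending" status, the `updateFieldValue` and `nextBatch` helpers, and the in-memory rows stand in for the real Job Sequencer steps, whose interface is not shown in this post.

```javascript
// 100 hypothetical items, each with a file to download.
// The starting "Pending" status is an assumption for this sketch.
const items = Array.from({ length: 100 }, (_, i) => ({ id: i + 1, status: "Pending" }));

// Update Field Value, modeled: set `field` to `value` on every row a view selects.
function updateFieldValue(rows, view, field, value) {
  const selected = rows.filter(view);
  selected.forEach((row) => {
    row[field] = value;
  });
  return selected.length;
}

// Build a view that selects the next `size` pending rows.
function nextBatch(rows, size) {
  const ids = new Set(
    rows.filter((row) => row.status === "Pending").slice(0, size).map((row) => row.id)
  );
  return (row) => ids.has(row.id);
}

let batches = 0;
while (items.some((row) => row.status === "Pending")) {
  updateFieldValue(items, nextBatch(items, 10), "status", "Ready");
  // The file downloads for the "Ready" rows would run at this point.
  updateFieldValue(items, (row) => row.status === "Ready", "status", "Done");
  batches += 1;
}

console.log(batches); // 10
```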
The final feature is aptly named Run a Sequence. Now you can run another sequence in addition to the one currently running. This is important for three reasons:
- It breaks up the work. One sequence can start multiple other sequences to compartmentalize the scrape into smaller, easier-to-digest components.
- It allows two very similar sequences to start each other. For example, after a substantial amount of data has been collected, you can start another sequence to begin collecting more data right away while the large scrape publishes simultaneously.
- It lets you reuse repeated steps. If the same five steps appear in every sequence, you can reuse them each time instead of rebuilding them in every sequence.
Chris Curtis, Mozenda’s System Architect, said of these new Job Sequencer features, “If you’re doing difficult, hard projects you’ll recognize the value of these features because you understand the pain of doing these things in large projects. Each one of these options solves a major pain point.”
Run JavaScript Action
Not all websites are created equal, and that can make it very difficult to get the data you want from certain web pages. Now, with the Run JavaScript Action feature, Mozenda can help you get into the code of the websites you’re trying to pull data from and manipulate it to make the sites more conducive to scraping. This update opens doors to data and to techniques that just weren’t possible before.
Chris Curtis, Mozenda’s System Architect, said, “It’s an enabling feature that opens doors and paths to things you just couldn’t do before. With ability to inject JavaScript into webpages, you can completely transform how a website performs. There’s no better—and sometimes no other possible way—to do it.”
The possibilities with this feature are open-ended, and it can be applied to very diverse and specific situations. Here are just a few examples of how you can use this new feature:
- Currency conversion – Say you’re scraping product prices that are listed in Euros, but you need to compare them to your products, which are listed in US dollars. You can use a Capture Text action to collect the current conversion rate from website A, then navigate to website B (the site with the prices you want to convert) and execute Run JavaScript Action to calculate the US dollar price and replace the Euro price on the page with it. Then you can use a Capture Text action to scrape the data from the site, with the prices in dollars.
- Calculations – If your competitor is running a 20% off sale, you may want to scrape their prices with that discount applied to make comparison easier. Run JavaScript Action can make that easy by calculating and applying the discount to each price before you scrape the data.
- Perform right-click – Some sites (like Google Drive) have right-click context menus specific to the site. Using Run JavaScript Action to right-click on these elements allows you to access this context menu in your automated data scrape.
- DOM manipulation – Simply put, this allows you to add, modify, or remove any elements on the page. For example, this could enable you to add a new column or row to a table.
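As a rough sketch of the currency conversion and discount examples, the snippet below rewrites prices before they are captured. It assumes the script runs in the context of the loaded page; the `.price` selector, the exchange rate, and the helper names are hypothetical, and the parser assumes prices use "." as the decimal separator.

```javascript
// Parse a price such as "€1,299.50" into a number.
// Assumes "." is the decimal separator and "," only groups thousands.
function parsePrice(text) {
  return parseFloat(text.replace(/[^0-9.]/g, ""));
}

// Convert a Euro price string to a US dollar price string.
function eurToUsd(text, rate) {
  return "$" + (parsePrice(text) * rate).toFixed(2);
}

// Apply a percentage discount to a price string.
function applyDiscount(text, percentOff) {
  return "$" + (parsePrice(text) * (1 - percentOff / 100)).toFixed(2);
}

// Inside a page, rewrite every price element in place.
// The guard lets the sketch also run outside a browser, where it is skipped.
if (typeof document !== "undefined") {
  const rate = 1.16; // hypothetical rate, e.g. one captured earlier from another site
  document.querySelectorAll(".price").forEach((el) => {
    el.textContent = eurToUsd(el.textContent, rate);
  });
}

console.log(eurToUsd("€100.00", 1.5)); // $150.00
console.log(applyDiscount("$80.00", 20)); // $64.00
```

Keeping the arithmetic in small functions makes it easy to check the numbers on sample strings before pointing the script at a live page.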
Our new Run JavaScript Action feature is available in the “Add Actions” menu. Simply input the JavaScript code you’ve created and run it. We’ve used this feature internally for a number of clients, and it has often been the key to getting the data needed. Run JavaScript Action is very helpful in controlling a website and bending it to a format that works easily with the Mozenda agent, which enables you to get the data you need.
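As an illustration of the kind of code you might enter, here is a hedged sketch of the right-click case: it dispatches a `contextmenu` event, which is the event browsers fire on a right-click. The `rightClick` helper and the `#file-row` selector are hypothetical, the sketch assumes the script runs in the page context, and some sites ignore synthetic events, so it may not open every menu.

```javascript
// Simulate a right-click by dispatching a "contextmenu" event on the target.
function rightClick(target) {
  // MouseEvent exists in browsers. The plain Event fallback is only there so
  // this sketch can also run outside a page (for example under Node.js).
  const EventType = typeof MouseEvent !== "undefined" ? MouseEvent : Event;
  const event = new EventType("contextmenu", {
    bubbles: true, // let handlers on ancestor elements see the event
    cancelable: true, // let the site suppress the default browser menu
    button: 2, // secondary button; ignored by the plain Event fallback
  });
  // dispatchEvent returns false when a listener called preventDefault().
  return target.dispatchEvent(event);
}

// Inside a page, right-click a specific element if it exists.
if (typeof document !== "undefined") {
  const row = document.querySelector("#file-row"); // hypothetical selector
  if (row) {
    rightClick(row);
  }
}
```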
Need more details? Check out our Help Center’s detailed tutorial of the new Run JavaScript Action feature.
All of these features combine to make Mozenda an even more robust web scraping experience, bringing you the data you need faster than ever.
Demo Webinar Signup
We’ll also be hosting a live training webinar on Thursday, November 9 at 11 AM Mountain time. Space is limited so save your seat here right away.