Scraping Extension

Select, organize, preview and turn any website into an API / spreadsheet with ease.

Get Started

This guide walks you through creating your first task by scraping HackerNews. It shows how to use the scraping extension to select, organize and preview data, then launch a task that runs in the cloud.

Access Screpto's DevTools tab

Once you've installed the extension, right-click on any web page and select Inspect. This opens Chrome DevTools, where you'll find a tab called Screpto. You can drag the tab to the first position to make it easier to reach next time.

Alternatively, you can press cmd+option+j (macOS) / control+shift+j (Windows) to open Chrome DevTools.

Create your first task

After opening Screpto in DevTools, click the New task button to open the task creation page.

  1. Change the default name of the task to HackerNews (Edit task name)
  2. Change first level node to array instead of object (Change node type)
  3. Rename first level node from data to articles (Rename node)
  4. Add four arrays inside the articles array (Add array or object node)
  5. Rename the four arrays to title, index, points and link respectively (Rename node)
  6. Activate selection for title by clicking on the tag icon
    • Bulk select all similar titles from the page (Select elements)
    • Repeat the same procedure for the remaining arrays
  7. Select the link nodes and change their attribute from content to href (Select and update multiple nodes)
  8. Preview results (Preview results)
  9. Move the index array to the first position inside articles
  10. Merge the arrays into multiple objects, each containing four properties: title, index, points and link (Merge multiple arrays)
  11. Change schedule to run task every day (Schedule task)
  12. Set up pagination (Setup pagination)
  13. Launch the task to run in the cloud by clicking the Create Task button
  14. Export the task's results to JSON or CSV

Edit task name

To edit a task name, click the text next to the back button, type a custom name, then click outside the field to save.

The default task name is retrieved from the page title.

Add Array or Object node

To let you fully customize your task structure, Screpto supports several data types, including arrays and objects, so you can build complex data structures with ease.

To add a node of one of these types:

  1. Click on a node's settings menu icon
  2. Select which type you would like to add

Change node type

Each node displays a sign indicating the type of its content:

  • S: String
  • N: Number
  • {}: Object
  • []: Array

To change a node's type, click its sign.
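
As a rough illustration of the four signs, the Python sketch below maps a JSON-style value to the sign a node of that type would show. The function name node_sign is hypothetical and is not part of Screpto; it only mirrors the list above.

```python
def node_sign(value):
    """Return the sign (S, N, {} or []) matching a JSON-style value.

    Hypothetical helper for illustration only; not part of Screpto.
    """
    if isinstance(value, str):
        return "S"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "N"
    if isinstance(value, dict):
        return "{}"
    if isinstance(value, list):
        return "[]"
    raise TypeError(f"no sign for values of type {type(value).__name__}")


signs = [node_sign(v) for v in ("some text", 42, {"key": "value"}, [1, 2])]
print(signs)  # ['S', 'N', '{}', '[]']
```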

Organize nodes

Use the drag-and-drop feature to organize and restructure your task in real time.

Rename node

When a node is created, or an element is added as a node, it is automatically given a random string as its name, which you can change later. To rename a node:

  1. Click the node's name
  2. Type in the new name
  3. Click outside the input to save

Avoid duplicate node names inside one object: one of the duplicated nodes will be missing from the results.
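
One way to see why a node goes missing: an object can hold only one value per name. The Python sketch below is not Screpto code; it uses the standard json module, which keeps the last value when a name repeats, purely as an analogy. Which of the duplicates Screpto keeps is not stated in this guide, and the sample values are made up.

```python
import json

# Two entries share the name "title" (sample values are made up).
raw = '{"title": "first", "title": "second", "points": 10}'

# Python's json module keeps only the last value for a repeated name.
parsed = json.loads(raw)
print(parsed)  # {'title': 'second', 'points': 10}
```

Giving each node in an object a distinct name avoids the problem.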

Delete nodes

To remove a node, use the node settings menu.

Accidents can happen, so see Undo / Redo for a quick way to recover 😊

Duplicate nodes

This feature copies a node and its contents; the copy is independent of the original node. It is also available from the node settings menu.

This applies to all nodes except the first-level node.

Select elements from a page

Screpto lets you select any element on the page, such as images, paragraphs, headers, links and many more.

Single selection of an element

To select a single element from a page:

  1. Enable selection by clicking the tag icon of the node in which you'd like to insert the selected element
  2. Click on the element in the page
  3. Press enter or click on the tag icon again to finish extraction and disable selection

Multi-selection of elements

  1. Enable selection by clicking the tag icon of the node in which you'd like to insert the selected element
  2. Click on the element in the page
  3. While still on that element, press the Command (macOS) / Control (Windows) key to highlight similar elements, then click the same element again to select them all
  4. Press enter or click on the tag icon again to finish extraction and disable selection

To cancel the selection, press the Esc key.

Change node attribute

When you select an element, Screpto also extracts all of its attributes and automatically assigns the default one. You can change a node's attribute using the dropdown next to the settings menu icon.

The dropdown is not displayed if the element has no other attributes.

Select and update multiple nodes

To make bulk updates easy, Screpto lets you select multiple nodes by clicking and dragging a selection box over them, then change their types or attributes together.

💡 Currently, the selection size is limited to the height of your window.

Merge multiple arrays

This feature merges multiple arrays that contain the same number of children into a single array of objects, where each object holds one value from each of the original arrays.

The merge icon only appears when the following criteria are met:

  • The parent node and its descendant nodes are arrays
  • All of the parent node's descendants have the same number of child nodes
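
The effect of a merge can be sketched in a few lines of Python. This is only an illustration of the transformation described above, not Screpto's implementation; the function name merge_arrays and the sample values are made up.

```python
def merge_arrays(parent):
    """Turn {"name": [v1, v2, ...], ...} into [{"name": v1, ...}, {"name": v2, ...}].

    Hypothetical helper: every array must have the same number of
    children, matching the criteria listed above.
    """
    lengths = {len(values) for values in parent.values()}
    if len(lengths) > 1:
        raise ValueError("all arrays must have the same number of children")
    names = list(parent)
    # zip(*...) walks the arrays in parallel, one position at a time.
    return [dict(zip(names, row)) for row in zip(*parent.values())]


# Placeholder values standing in for scraped data.
articles = {
    "index": ["1.", "2."],
    "title": ["Example title A", "Example title B"],
    "points": ["120 points", "45 points"],
    "link": ["https://example.com/a", "https://example.com/b"],
}

merged = merge_arrays(articles)
print(merged[0])
```

Each object in merged carries the four properties index, title, points and link, taken from the same position in each original array.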

Preview results

You can preview results in JSON format either by clicking the eye icon or with a keyboard shortcut:

  • macOS: option+v
  • Windows: alt+v

Copy results

Whenever you're satisfied with your selection, you can instantly copy results to your clipboard as JSON by clicking the copy icon in the top bar.

Reset task

To discard all changes made to a task, click Reset in the first-level node's settings menu.

Export

To export your task's results, go to the task settings tab, select a format (CSV or JSON), then click download.
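
To give a feel for how the two formats relate, the Python sketch below writes an array of objects as CSV, with one column per property and one row per object. The exact layout of Screpto's CSV export is not described in this guide, so this is an assumption for illustration; to_csv and the sample values are hypothetical.

```python
import csv
import io


def to_csv(rows):
    """Serialize a list of flat objects to CSV text (hypothetical helper)."""
    if not rows:
        return ""
    buffer = io.StringIO()
    # Column order follows the property order of the first object.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder values standing in for exported JSON results.
articles = [
    {"index": "1.", "title": "Example title A", "points": "120 points", "link": "https://example.com/a"},
    {"index": "2.", "title": "Example title B", "points": "45 points", "link": "https://example.com/b"},
]

csv_text = to_csv(articles)
print(csv_text)
```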

Undo / Redo

Accidents can happen, which is why Screpto keeps a history of your changes 😀

  • To undo changes:
    • macOS: command+z
    • Windows: control+z
  • To redo changes:
    • macOS: command+shift+z
    • Windows: control+shift+z

Schedule task

Once your task is set up, you can schedule it to fetch fresh data in the cloud: open the task settings page and change the Schedule value from disabled to one of the following:

  • Every 30 minutes
  • Every hour
  • Every half a day
  • Every day
  • Every week
  • Every month

For a one-time run, leave it set to disabled.

Setup pagination

Setting up pagination for a task starts with getting the URL of the next page. Once you have it, replace the part that changes between pages with [pagination] and specify the number of pages you'd like to fetch.

Example:

  • Original URL: https://news.ycombinator.com/news?p=2
  • Pagination URL: https://news.ycombinator.com/news?p=[pagination]
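
Conceptually, the [pagination] placeholder is swapped for each page number in turn. The Python sketch below shows that expansion; it assumes pages are numbered consecutively starting from 1, which this guide does not confirm, and expand_pagination is a hypothetical name rather than part of Screpto.

```python
PLACEHOLDER = "[pagination]"


def expand_pagination(template, pages, start=1):
    """Return one URL per page by filling in the placeholder.

    Hypothetical helper; assumes consecutive page numbers from `start`.
    """
    if PLACEHOLDER not in template:
        raise ValueError(f"template must contain {PLACEHOLDER}")
    return [
        template.replace(PLACEHOLDER, str(number))
        for number in range(start, start + pages)
    ]


urls = expand_pagination("https://news.ycombinator.com/news?p=[pagination]", pages=3)
for url in urls:
    print(url)
```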

Pagination comes in many forms, but we currently only support Next page and numbered pagination.