How to Scrape Data and Save to Google Sheets with n8n
If you're still copying data from websites into spreadsheets by hand, you're leaving hours on the table every week. n8n gives you a visual way to build scraping pipelines that run on a schedule, handle pagination, and dump everything straight into Google Sheets — no custom server required, no glue code between tools. This guide walks you through exactly how to set that up.
What You'll Need Before You Start
Before building the workflow, make sure you have these pieces in place:
- A running n8n instance — either self-hosted or via n8n Cloud
- A Google account with Google Sheets API enabled and OAuth credentials configured in n8n
- The target URL you want to scrape (a public page works best to start)
- Basic understanding of CSS selectors or XPath to extract specific elements
The Google Sheets credential setup is the part that trips people up most. Go to your Google Cloud Console, create an OAuth 2.0 client ID, and add the Sheets and Drive scopes. In n8n, create a new Google credential using that client ID and secret, then authorize it. Once that's working, the rest is straightforward.
Building the Scraping Workflow
Start with an HTTP Request node. This is your data fetcher — point it at the URL, set the method to GET, and leave authentication empty for public pages. For pages behind a login, you'll use cookie-based auth or a session token in the headers.
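To make the difference between a public and a logged-in request concrete, here is a minimal sketch of the request options involved. The helper name and its `sessionCookie` and `bearerToken` parameters are made up for illustration; in n8n you would enter the same header values in the HTTP Request node rather than write code.

```javascript
// Hypothetical helper: builds fetch-style options for a GET request.
// `sessionCookie` and `bearerToken` are illustrative names, not n8n parameters.
function buildRequestOptions({ sessionCookie, bearerToken } = {}) {
  const headers = { Accept: "text/html" };
  // Cookie-based auth: send the session cookie copied from a logged-in browser.
  if (sessionCookie) headers.Cookie = sessionCookie;
  // Token-based auth: send the session token as a bearer token.
  if (bearerToken) headers.Authorization = `Bearer ${bearerToken}`;
  return { method: "GET", headers };
}

// Public page: no auth headers at all.
const publicOptions = buildRequestOptions();
// Page behind a login: same request plus a (placeholder) session cookie.
const privateOptions = buildRequestOptions({ sessionCookie: "session=abc123" });
```

The only thing that changes between the two cases is the headers, which is why switching a workflow from a public page to a logged-in one is usually a small edit.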
Once you have the raw HTML response, add an HTML node. This is where you extract the actual data using CSS selectors. Configure it like this:
- Set the operation to "Extract HTML Content" and point it at the response body from the previous node
- Define each field you want: name, price, URL, date — whatever the page contains
- Use selectors like `.product-title` or `table tr td:nth-child(2)` to target specific elements
- For lists of items, enable the "Return Array" option so each match comes back as its own entry
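Depending on how the extractor returns lists, you may end up with one object whose fields are parallel arrays rather than one object per product. The sketch below assumes that shape (the field names `name` and `price` are illustrative) and zips the arrays into row objects.

```javascript
// Assumed input shape: one object whose fields are parallel arrays,
// e.g. { name: ["A", "B"], price: ["$1", "$2"] }. Field names are illustrative.
function zipFields(extracted) {
  const keys = Object.keys(extracted);
  // Use the longest array so a missing value becomes null instead of dropping a row.
  const rowCount = Math.max(0, ...keys.map((key) => extracted[key].length));
  const rows = [];
  for (let i = 0; i < rowCount; i++) {
    const row = {};
    for (const key of keys) {
      row[key] = extracted[key][i] ?? null;
    }
    rows.push(row);
  }
  return rows;
}

const rows = zipFields({
  name: ["Laptop", "Phone"],
  price: ["$1,299.00", "$799.00"],
});
// rows[0] is { name: "Laptop", price: "$1,299.00" }
```

In a Code node you could wire this up with something like `return zipFields($input.first().json).map((json) => ({ json }));`, assuming the extracted fields sit on the first incoming item.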
If the site loads content dynamically via JavaScript, the HTTP Request node won't see it. In that case, you'll need to either find the underlying API endpoint the page calls (check the Network tab in DevTools) or use a browser automation approach via Puppeteer — though that adds complexity. Most data-heavy sites expose a JSON API that's actually easier to work with than scraping the HTML directly.
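If you do find a JSON endpoint, the extraction step shrinks to a field mapping. Here is a minimal sketch, assuming a made-up payload shape of `{ products: [{ id, title, price_cents }] }`; the real field names are whatever the Network tab shows for your site.

```javascript
// Hypothetical payload shape: { products: [{ id, title, price_cents }] }.
// Swap in the field names the real endpoint returns.
function rowsFromApiPayload(payload) {
  const products = Array.isArray(payload?.products) ? payload.products : [];
  return products.map((product) => ({
    id: product.id,
    name: product.title,
    price: product.price_cents / 100, // cents to currency units
  }));
}

const apiRows = rowsFromApiPayload({
  products: [{ id: "sku-1", title: "Laptop", price_cents: 129900 }],
});
// apiRows[0] is { id: "sku-1", name: "Laptop", price: 1299 }
```

There are no selectors to break here, which is the main reason the API route tends to be more stable than parsing HTML.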
Cleaning and Transforming the Data
Raw scraped data is almost never in the shape you want. Add a Code node after the HTML extractor to normalize it before it hits your sheet.
Common transformations you'll handle here:
- Stripping whitespace and newlines from extracted text with `.trim()`
- Parsing prices from strings like "$1,299.00" into numbers
- Converting date strings into a consistent ISO format
- Deduplicating items if the page has repeated elements
- Adding a scraped_at timestamp so you know when each row was collected
Keep this transformation logic in the Code node rather than trying to do it inside the Google Sheets node. It's easier to debug and easier to update when the source page changes its structure — which it will.
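Here is a minimal sketch of what that Code node logic might look like. It assumes rows with `name`, `price`, `url`, and `date` fields, US-style `MM/DD/YYYY` dates, and deduplication on URL; all of these are illustrative choices, so adjust them to your page.

```javascript
// Turn "$1,299.00" into 1299; return null when no number can be read.
function parsePrice(text) {
  const cleaned = String(text ?? "").replace(/[^0-9.-]/g, "");
  if (cleaned === "") return null;
  const value = Number(cleaned);
  return Number.isNaN(value) ? null : value;
}

// Assumes US-style "MM/DD/YYYY" input; returns "YYYY-MM-DD" or null.
function toIsoDate(text) {
  const match = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(String(text ?? "").trim());
  if (!match) return null;
  const [, month, day, year] = match;
  return `${year}-${month.padStart(2, "0")}-${day.padStart(2, "0")}`;
}

// `rows` is a plain array of { name, price, url, date } objects (illustrative names).
function normalizeRows(rows, now = new Date()) {
  const seen = new Set();
  const result = [];
  for (const row of rows) {
    const url = String(row.url ?? "").trim();
    if (seen.has(url)) continue; // deduplicate on URL
    seen.add(url);
    result.push({
      name: String(row.name ?? "").trim(),
      price: parsePrice(row.price),
      url,
      date: toIsoDate(row.date),
      scraped_at: now.toISOString(), // when this row was collected
    });
  }
  return result;
}
```

Inside a Code node, the last line could be something like `return normalizeRows($input.all().map((item) => item.json)).map((json) => ({ json }));`. Returning `null` for unparseable prices and dates keeps bad values visible in the sheet instead of silently turning them into zeros.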
Writing to Google Sheets
Add a Google Sheets node at the end of the chain. Set the operation to "Append or Update" if you want to avoid duplicates on repeat runs, or just "Append" if you're building a time-series log where duplicate rows are expected.
Map your transformed fields to the sheet columns explicitly. Don't rely on automatic column detection — it breaks when columns are reordered. Set up your sheet header row manually first, then map each n8n field to the exact column name.
- Use "Append or Update" with a key column (like a product ID or URL) to upsert records
- Use "Append" for logs where you want every run's data preserved
- Set the range to something like `Sheet1!A:Z` to give the node room to grow
- Set the value input option to USER_ENTERED if you're writing dates or numbers and want Sheets to interpret them as if you had typed them in; RAW stores the values exactly as sent, without parsing
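To see what explicit mapping and upserting amount to, here is a small model of the behavior in plain JavaScript. It is a sketch of the idea, not the node's actual implementation, and the function names and `url` key column are illustrative.

```javascript
// Build one sheet row (array of cells) in the exact order of the header row.
function toSheetRow(headers, record) {
  return headers.map((header) => record[header] ?? "");
}

// Model of "Append or Update": replace the row whose key matches, otherwise append.
function upsertRows(existingRows, incoming, headers, keyColumn) {
  const keyIndex = headers.indexOf(keyColumn);
  if (keyIndex === -1) throw new Error(`Unknown key column: ${keyColumn}`);
  const rows = existingRows.map((row) => [...row]); // copy, leave the input untouched
  for (const record of incoming) {
    const newRow = toSheetRow(headers, record);
    const at = rows.findIndex((row) => row[keyIndex] === newRow[keyIndex]);
    if (at === -1) rows.push(newRow);
    else rows[at] = newRow;
  }
  return rows;
}

const headers = ["url", "name", "price"];
const sheet = upsertRows(
  [["https://example.com/a", "Laptop", 1299]],
  [
    { url: "https://example.com/a", name: "Laptop", price: 1199 },
    { url: "https://example.com/b", name: "Phone", price: 799 },
  ],
  headers,
  "url"
);
// sheet now holds two rows: the Laptop row with the new price, then the Phone row.
```

Because each row is built from the header names rather than from positions, reordering the columns in the sheet only requires reordering `headers`.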
Finally, add a Schedule Trigger at the top of the workflow to run it automatically. For most scraping jobs, daily or hourly is enough. Set the trigger, activate the workflow, and your sheet populates itself.
Handling Pagination and Rate Limits
Single-page scraping is the easy case. If the data spans multiple pages, you'll build a loop in n8n to iterate through page numbers or cursor-based pagination, for example with the Loop Over Items node or an IF node that routes back to the HTTP Request node. The pattern is: fetch page → extract data → append to sheet → check if there's a next page → loop or stop.
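The same pattern, written out as a sketch in code. `fetchPage` is a hypothetical async function standing in for the fetch and extract steps; it is assumed to resolve to `{ items, hasNext }`. The `maxPages` cap is an added safety net so a broken "next page" check can't loop forever.

```javascript
// `fetchPage(pageNumber)` is a hypothetical async function that resolves to
// { items: [...], hasNext: boolean }. In n8n this is the fetch + extract step.
async function scrapeAllPages(fetchPage, { maxPages = 50 } = {}) {
  const all = [];
  for (let page = 1; page <= maxPages; page++) {
    const { items, hasNext } = await fetchPage(page);
    all.push(...items); // in the workflow, this is the "append to sheet" step
    if (!hasNext) break; // stop when there is no next page
  }
  return all;
}
```

For cursor-based pagination the structure stays the same; you would pass the cursor returned by the previous page instead of a page number.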
For rate limiting, add a Wait node between iterations — 1 to 2 seconds is usually enough to avoid getting blocked. If the site returns 429s, increase the wait time or randomize it slightly. For heavy scraping, consider adding a proxy rotation layer, but that's rarely necessary for smaller datasets.
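One way to combine "increase the wait time" and "randomize it slightly" is exponential backoff with jitter. The sketch below is one possible policy, not something n8n provides; the parameter names and defaults are illustrative.

```javascript
// Delay before the next request: the base wait, doubled for every consecutive 429,
// plus up to `jitterMs` of random extra time, capped at `maxMs`.
// All parameter names and defaults are illustrative.
function nextDelayMs(
  consecutive429s,
  { baseMs = 1000, jitterMs = 500, maxMs = 60000, random = Math.random } = {}
) {
  const backoff = baseMs * 2 ** consecutive429s;
  const jitter = Math.floor(random() * jitterMs);
  return Math.min(maxMs, backoff + jitter);
}
```

With the defaults, a normal iteration waits between 1 and 1.5 seconds, and each consecutive 429 doubles the base wait until it reaches the one-minute cap. You could compute this value in a Code node and feed it into the Wait node.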
If you'd rather skip the setup and start with a working pipeline, check out ready-made n8n templates — there are pre-built workflows for common scraping patterns that you can import and adapt in minutes instead of building from scratch.
Web scraping with n8n hits a sweet spot: it's flexible enough to handle real-world messiness, but visual enough that you can debug a broken selector without digging through logs. Once the workflow is running, the data just shows up in your sheet. That's the part worth automating.