diff --git a/docs/01_introduction/index.mdx b/docs/01_introduction/index.mdx
index 33feb04a..066687c4 100644
--- a/docs/01_introduction/index.mdx
+++ b/docs/01_introduction/index.mdx
@@ -6,20 +6,15 @@ slug: /overview
description: 'The official library for creating Apify Actors in Python, providing tools for web scraping, automation, and data storage integration.'
---
+import CodeBlock from '@theme/CodeBlock';
+
+import IntroductionExample from '!!raw-loader!./code/01_introduction.py';
+
The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides useful features like Actor lifecycle management, local storage emulation, and Actor event handling.
-```python
-from apify import Actor
-from bs4 import BeautifulSoup
-import requests
-
-async def main():
-    async with Actor:
-        input = await Actor.get_input()
-        response = requests.get(input['url'])
-        soup = BeautifulSoup(response.content, 'html.parser')
-        await Actor.push_data({ 'url': input['url'], 'title': soup.title.string })
-```
+
+ {IntroductionExample}
+
## What are Actors
diff --git a/docs/01_introduction/quick-start.mdx b/docs/01_introduction/quick-start.mdx
index 3c991045..1e568c5b 100644
--- a/docs/01_introduction/quick-start.mdx
+++ b/docs/01_introduction/quick-start.mdx
@@ -13,6 +13,9 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';
+import MainExample from '!!raw-loader!./code/actor_structure/main.py';
+import UnderscoreMainExample from '!!raw-loader!./code/actor_structure/__main__.py';
+
## Step 1: Create Actors
To create and run Actors in Apify Console, refer to the [Console documentation](/platform/actors/development/quick-start/web-ide).
@@ -61,33 +64,14 @@ The Actor's source code is in the `src` folder. This folder contains two importa
- {
-`from apify import Actor
-${''}
-async def main():
-    async with Actor:
-        Actor.log.info('Actor input:', await Actor.get_input())
-        await Actor.set_value('OUTPUT', 'Hello, world!')`
- }
+
+ {MainExample}
+
- {
-`import asyncio
-import logging
-${''}
-from apify.log import ActorLogFormatter
-${''}
-from .main import main
-${''}
-handler = logging.StreamHandler()
-handler.setFormatter(ActorLogFormatter())
-${''}
-apify_logger = logging.getLogger('apify')
-apify_logger.setLevel(logging.DEBUG)
-apify_logger.addHandler(handler)
-${''}
-asyncio.run(main())`
- }
+
+ {UnderscoreMainExample}
+
@@ -96,21 +80,30 @@ We recommend keeping the entrypoint for the Actor in the `src/__main__.py` file.
## Next steps
+### Concepts
+
+To learn more about the features of the Apify SDK and how to use them, check out the Concepts section in the sidebar:
+
+- [Actor lifecycle](../concepts/actor-lifecycle)
+- [Actor input](../concepts/actor-input)
+- [Working with storages](../concepts/storages)
+- [Actor events & state persistence](../concepts/actor-events)
+- [Proxy management](../concepts/proxy-management)
+- [Interacting with other Actors](../concepts/interacting-with-other-actors)
+- [Creating webhooks](../concepts/webhooks)
+- [Accessing Apify API](../concepts/access-apify-api)
+- [Logging](../concepts/logging)
+- [Actor configuration](../concepts/actor-configuration)
+- [Pay-per-event monetization](../concepts/pay-per-event)
+
### Guides
-To see how you can integrate the Apify SDK with some of the most popular web scraping libraries, check out our guides for working with:
+To see how you can integrate the Apify SDK with popular web scraping libraries, check out our guides:
-- [Requests or HTTPX](../guides/requests-and-httpx)
-- [Beautiful Soup](../guides/beautiful-soup)
+- [BeautifulSoup with HTTPX](../guides/beautifulsoup-httpx)
+- [Parsel with Impit](../guides/parsel-impit)
- [Playwright](../guides/playwright)
- [Selenium](../guides/selenium)
+- [Crawlee](../guides/crawlee)
- [Scrapy](../guides/scrapy)
-
-### Usage concepts
-
-To learn more about the features of the Apify SDK and how to use them, check out the Usage Concepts section in the sidebar, especially the guides for:
-
-- [Actor lifecycle](../concepts/actor-lifecycle)
-- [Working with storages](../concepts/storages)
-- [Handling Actor events](../concepts/actor-events)
-- [How to use proxies](../concepts/proxy-management)
+- [Running webserver](../guides/running-webserver)
diff --git a/docs/03_guides/01_beautifulsoup_httpx.mdx b/docs/03_guides/01_beautifulsoup_httpx.mdx
index 42452a2a..166261a0 100644
--- a/docs/03_guides/01_beautifulsoup_httpx.mdx
+++ b/docs/03_guides/01_beautifulsoup_httpx.mdx
@@ -28,3 +28,9 @@ Below is a simple Actor that recursively scrapes titles from all linked websites
## Conclusion
In this guide, you learned how to use [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) with [HTTPX](https://www.python-httpx.org/) in your Apify Actors. By combining these libraries, you can efficiently extract data from HTML or XML files, making it easy to build web scraping tasks in Python. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: BeautifulSoup](https://apify.com/templates/python-beautifulsoup)
+- [BeautifulSoup: Official documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
+- [HTTPX: Official documentation](https://www.python-httpx.org/)
diff --git a/docs/03_guides/02_parsel_impit.mdx b/docs/03_guides/02_parsel_impit.mdx
index 0b572bf8..b68efec4 100644
--- a/docs/03_guides/02_parsel_impit.mdx
+++ b/docs/03_guides/02_parsel_impit.mdx
@@ -26,3 +26,9 @@ The following example shows a simple Actor that recursively scrapes titles from
## Conclusion
In this guide, you learned how to use [Parsel](https://github.com/scrapy/parsel) with [Impit](https://github.com/apify/impit) in your Apify Actors. By combining these libraries, you get a powerful and efficient solution for web scraping: [Parsel](https://github.com/scrapy/parsel) provides excellent CSS selector and XPath support for data extraction, while [Impit](https://github.com/apify/impit) offers a fast and simple HTTP client built by Apify. This combination makes it easy to build scalable web scraping tasks in Python. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Crawlee + Parsel](https://apify.com/templates/python-crawlee-parsel)
+- [Parsel: GitHub repository](https://github.com/scrapy/parsel)
+- [Impit: GitHub repository](https://github.com/apify/impit)
diff --git a/docs/03_guides/03_playwright.mdx b/docs/03_guides/03_playwright.mdx
index 2c7428a5..16de8b67 100644
--- a/docs/03_guides/03_playwright.mdx
+++ b/docs/03_guides/03_playwright.mdx
@@ -10,6 +10,10 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
import PlaywrightExample from '!!raw-loader!roa-loader!./code/03_playwright.py';
+In this guide, you'll learn how to use [Playwright](https://playwright.dev) for web scraping in your Apify Actors.
+
+## Introduction
+
[Playwright](https://playwright.dev) is a tool for web automation and testing that can also be used for web scraping. It allows you to control a web browser programmatically and interact with web pages just as a human would.
Some of the key features of Playwright for web scraping include:
@@ -19,8 +23,6 @@ Some of the key features of Playwright for web scraping include:
- **Powerful selectors** - Playwright provides a variety of powerful selectors that allow you to target specific elements on a web page, including CSS selectors, XPath, and text matching.
- **Emulation of user interactions** - Playwright allows you to emulate user interactions like clicking, scrolling, filling out forms, and even typing in text, which can be useful for scraping websites that have dynamic content or require user input.
-## Using Playwright in Actors
-
To create Actors which use Playwright, start from the [Playwright & Python](https://apify.com/templates/categories/python) Actor template.
On the Apify platform, the Actor will already have Playwright and the necessary browsers preinstalled in its Docker image, including the tools and setup necessary to run browsers in headful mode.
@@ -55,3 +57,9 @@ It uses Playwright to open the pages in an automated Chrome browser, and to extr
## Conclusion
In this guide you learned how to create Actors that use Playwright to scrape websites. Playwright is a powerful tool that can be used to manage browser instances and scrape websites that require JavaScript execution. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Playwright + Chrome](https://apify.com/templates/python-playwright)
+- [Apify templates: Crawlee + Playwright + Chrome](https://apify.com/templates/python-crawlee-playwright)
+- [Playwright: Official documentation](https://playwright.dev/python/)
diff --git a/docs/03_guides/04_selenium.mdx b/docs/03_guides/04_selenium.mdx
index bbc6abe1..a7c9ed19 100644
--- a/docs/03_guides/04_selenium.mdx
+++ b/docs/03_guides/04_selenium.mdx
@@ -7,6 +7,10 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
import SeleniumExample from '!!raw-loader!roa-loader!./code/04_selenium.py';
+In this guide, you'll learn how to use [Selenium](https://www.selenium.dev/) for web scraping in your Apify Actors.
+
+## Introduction
+
[Selenium](https://www.selenium.dev/) is a tool for web automation and testing that can also be used for web scraping. It allows you to control a web browser programmatically and interact with web pages just as a human would.
Some of the key features of Selenium for web scraping include:
@@ -21,8 +25,6 @@ including CSS selectors, XPath, and text matching.
- **Emulation of user interactions** - Selenium allows you to emulate user interactions like clicking, scrolling, filling out forms,
and even typing in text, which can be useful for scraping websites that have dynamic content or require user input.
-## Using Selenium in Actors
-
To create Actors which use Selenium, start from the [Selenium & Python](https://apify.com/templates/categories/python) Actor template.
On the Apify platform, the Actor will already have Selenium and the necessary browsers preinstalled in its Docker image,
@@ -44,3 +46,8 @@ It uses Selenium ChromeDriver to open the pages in an automated Chrome browser,
## Conclusion
In this guide you learned how to use Selenium for web scraping in Apify Actors. You can now create your own Actors that use Selenium to scrape dynamic websites and interact with web pages just like a human would. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Selenium + Chrome](https://apify.com/templates/python-selenium)
+- [Selenium: Official documentation](https://www.selenium.dev/documentation/)
diff --git a/docs/03_guides/05_crawlee.mdx b/docs/03_guides/05_crawlee.mdx
index ed805dea..f6050654 100644
--- a/docs/03_guides/05_crawlee.mdx
+++ b/docs/03_guides/05_crawlee.mdx
@@ -44,3 +44,12 @@ The [`PlaywrightCrawler`](https://crawlee.dev/python/api/class/PlaywrightCrawler
## Conclusion
In this guide, you learned how to use the [Crawlee](https://crawlee.dev/python) library in your Apify Actors. By using the [`BeautifulSoupCrawler`](https://crawlee.dev/python/api/class/BeautifulSoupCrawler), [`ParselCrawler`](https://crawlee.dev/python/api/class/ParselCrawler), and [`PlaywrightCrawler`](https://crawlee.dev/python/api/class/PlaywrightCrawler) crawlers, you can efficiently scrape static or dynamic web pages, making it easy to build web scraping tasks in Python. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Crawlee + BeautifulSoup](https://apify.com/templates/python-crawlee-beautifulsoup)
+- [Apify templates: Crawlee + Parsel](https://apify.com/templates/python-crawlee-parsel)
+- [Apify templates: Crawlee + Playwright + Chrome](https://apify.com/templates/python-crawlee-playwright)
+- [Crawlee: Official website](https://crawlee.dev/python)
+- [Crawlee: Documentation](https://crawlee.dev/python/docs)
+- [Crawlee: GitHub repository](https://github.com/apify/crawlee-python)
diff --git a/docs/03_guides/06_scrapy.mdx b/docs/03_guides/06_scrapy.mdx
index 7d790b7d..ac9e5fa2 100644
--- a/docs/03_guides/06_scrapy.mdx
+++ b/docs/03_guides/06_scrapy.mdx
@@ -13,6 +13,10 @@ import ItemsExample from '!!raw-loader!./code/scrapy_project/src/items.py';
import SpidersExample from '!!raw-loader!./code/scrapy_project/src/spiders/title.py';
import SettingsExample from '!!raw-loader!./code/scrapy_project/src/settings.py';
+In this guide, you'll learn how to use the [Scrapy](https://scrapy.org/) framework in your Apify Actors.
+
+## Introduction
+
[Scrapy](https://scrapy.org/) is an open-source web scraping framework for Python. It provides tools for defining scrapers, extracting data from web pages, following links, and handling pagination. With the Apify SDK, Scrapy projects can be converted into Apify [Actors](https://docs.apify.com/platform/actors), integrated with Apify [storages](https://docs.apify.com/platform/storage), and executed on the Apify [platform](https://docs.apify.com/platform).
## Integrating Scrapy with the Apify platform
diff --git a/docs/03_guides/07_running_webserver.mdx b/docs/03_guides/07_running_webserver.mdx
index d9deedc1..9c9ef474 100644
--- a/docs/03_guides/07_running_webserver.mdx
+++ b/docs/03_guides/07_running_webserver.mdx
@@ -1,12 +1,16 @@
---
id: running-webserver
-title: Running webserver in your Actor
+title: Running webserver
---
import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
import WebserverExample from '!!raw-loader!roa-loader!./code/07_webserver.py';
+In this guide, you'll learn how to run a web server inside your Apify Actor. This is useful for monitoring Actor progress, creating custom APIs, or serving content during the Actor run.
+
+## Introduction
+
Each Actor run on the Apify platform is assigned a unique hard-to-guess URL (for example `https://8segt5i81sokzm.runs.apify.net`), which enables HTTP access to an optional web server running inside the Actor run's container.
The URL is available in the following places:
@@ -17,10 +21,18 @@ The URL is available in the following places:
The web server running inside the container must listen at the port defined by the `Actor.configuration.web_server_port` property. When running Actors locally, the port defaults to `4321`, so the web server will be accessible at `http://localhost:4321`.
-## Example
+## Example Actor
-The following example demonstrates how to start a simple web server in your Actor,which will respond to every GET request with the number of items that the Actor has processed so far:
+The following example demonstrates how to start a simple web server in your Actor, which will respond to every GET request with the number of items that the Actor has processed so far:
{WebserverExample}
+
+## Conclusion
+
+In this guide, you learned how to run a web server inside your Apify Actor. By leveraging the container URL and port provided by the platform, you can expose HTTP endpoints for monitoring, reporting, or serving content during Actor execution. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU).
+
+## Additional resources
+
+- [Apify templates: Standby Python project](https://apify.com/templates/python-standby)
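The progress-reporting web server described in the guide above can be sketched with only the standard library. This is an illustrative, stand-alone version: it binds an OS-assigned free port instead of `Actor.configuration.web_server_port`, so it runs outside the Apify SDK, and the `processed_items` counter is a hypothetical stand-in for whatever state your Actor tracks.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from threading import Thread
from urllib.request import urlopen

processed_items = 0  # in a real Actor, the scraping code would update this counter


class StatusHandler(BaseHTTPRequestHandler):
    """Respond to every GET request with the number of processed items."""

    def do_GET(self) -> None:  # the method must be named `do_GET` for dispatch to work
        body = f'Processed items: {processed_items}'.encode()
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args: object) -> None:
        pass  # keep the sketch quiet


# Port 0 asks the OS for a free port; an Actor would use the configured port instead.
server = ThreadingHTTPServer(('127.0.0.1', 0), StatusHandler)
Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

with urlopen(f'http://127.0.0.1:{port}/') as response:
    body_text = response.read().decode()

server.shutdown()
print(body_text)  # Processed items: 0
```

Running the server in a background thread, as here and in the guide's example, keeps the main coroutine free to do the actual scraping work.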
diff --git a/docs/03_guides/code/07_webserver.py b/docs/03_guides/code/07_webserver.py
index d4bc0655..66ecfe3c 100644
--- a/docs/03_guides/code/07_webserver.py
+++ b/docs/03_guides/code/07_webserver.py
@@ -7,9 +7,9 @@
http_server = None
-# Just a simple handler that will print the number of processed items so far
-# on every GET request.
class RequestHandler(BaseHTTPRequestHandler):
+ """A handler that prints the number of processed items on every GET request."""
+
def do_get(self) -> None:
self.log_request()
self.send_response(200)
@@ -18,8 +18,7 @@ def do_GET(self) -> None:
def run_server() -> None:
-    # Start the HTTP server on the provided port,
-    # and save a reference to the server.
+    """Start the HTTP server on the provided port, and save a reference to the server."""
    global http_server
    with ThreadingHTTPServer(
        ('', Actor.configuration.web_server_port), RequestHandler
diff --git a/website/versioned_docs/version-1.7/01-introduction/code/01_introduction.py b/website/versioned_docs/version-1.7/01-introduction/code/01_introduction.py
new file mode 100644
index 00000000..6b744b14
--- /dev/null
+++ b/website/versioned_docs/version-1.7/01-introduction/code/01_introduction.py
@@ -0,0 +1,12 @@
+import requests
+from bs4 import BeautifulSoup
+
+from apify import Actor
+
+
+async def main() -> None:
+    async with Actor:
+        actor_input = await Actor.get_input()
+        response = requests.get(actor_input['url'])
+        soup = BeautifulSoup(response.content, 'html.parser')
+        await Actor.push_data({'url': actor_input['url'], 'title': soup.title.string})
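For illustration only, the title-extraction half of the example above can be approximated with the standard library's `html.parser` — a simplified stand-in for BeautifulSoup, not what the SDK docs recommend:

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ''

    def handle_starttag(self, tag: str, attrs: list) -> None:
        if tag == 'title' and not self.title:
            self._in_title = True

    def handle_data(self, data: str) -> None:
        if self._in_title:
            self.title += data

    def handle_endtag(self, tag: str) -> None:
        if tag == 'title':
            self._in_title = False


html = '<html><head><title>Example Domain</title></head><body></body></html>'
parser = TitleParser()
parser.feed(html)
print(parser.title)  # Example Domain
```

BeautifulSoup's `soup.title.string` does the same lookup in one call, plus far more robust handling of malformed markup, which is why the example uses it.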
diff --git a/website/versioned_docs/version-1.7/01-introduction/code/actor_structure/__main__.py b/website/versioned_docs/version-1.7/01-introduction/code/actor_structure/__main__.py
new file mode 100644
index 00000000..2b2e3e7b
--- /dev/null
+++ b/website/versioned_docs/version-1.7/01-introduction/code/actor_structure/__main__.py
@@ -0,0 +1,15 @@
+import asyncio
+import logging
+
+from apify.log import ActorLogFormatter
+
+from .main import main
+
+handler = logging.StreamHandler()
+handler.setFormatter(ActorLogFormatter())
+
+apify_logger = logging.getLogger('apify')
+apify_logger.setLevel(logging.DEBUG)
+apify_logger.addHandler(handler)
+
+asyncio.run(main())
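The `__main__.py` above wires a `StreamHandler` with `ActorLogFormatter` onto the `apify` logger. The same stdlib pattern with a plain `logging.Formatter` as a hypothetical stand-in (the format string here is invented for illustration) looks like:

```python
import logging

# Attach a handler with a custom formatter, mirroring the __main__.py setup.
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('[%(name)s] %(levelname)s  %(message)s'))

logger = logging.getLogger('example')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

logger.info('Hello from the configured logger')
```

Setting the level to `DEBUG` on the logger (rather than the handler) is what makes verbose SDK messages visible during local development.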
diff --git a/website/versioned_docs/version-1.7/01-introduction/code/actor_structure/main.py b/website/versioned_docs/version-1.7/01-introduction/code/actor_structure/main.py
new file mode 100644
index 00000000..97bb5956
--- /dev/null
+++ b/website/versioned_docs/version-1.7/01-introduction/code/actor_structure/main.py
@@ -0,0 +1,7 @@
+from apify import Actor
+
+
+async def main() -> None:
+    async with Actor:
+        Actor.log.info(f'Actor input: {await Actor.get_input()}')
+        await Actor.set_value('OUTPUT', 'Hello, world!')
diff --git a/website/versioned_docs/version-1.7/01-introduction/index.mdx b/website/versioned_docs/version-1.7/01-introduction/index.mdx
index d4cc4cd3..575abdcf 100644
--- a/website/versioned_docs/version-1.7/01-introduction/index.mdx
+++ b/website/versioned_docs/version-1.7/01-introduction/index.mdx
@@ -6,20 +6,15 @@ slug: /overview
description: 'The official library for creating Apify Actors in Python, providing tools for web scraping, automation, and data storage integration.'
---
+import CodeBlock from '@theme/CodeBlock';
+
+import IntroductionExample from '!!raw-loader!./code/01_introduction.py';
+
The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python.
-```python
-from apify import Actor
-from bs4 import BeautifulSoup
-import requests
-
-async def main():
-    async with Actor:
-        actor_input = await Actor.get_input()
-        response = requests.get(actor_input['url'])
-        soup = BeautifulSoup(response.content, 'html.parser')
-        await Actor.push_data({ 'url': actor_input['url'], 'title': soup.title.string })
-```
+
+ {IntroductionExample}
+
## What are Actors?
diff --git a/website/versioned_docs/version-1.7/01-introduction/quick-start.mdx b/website/versioned_docs/version-1.7/01-introduction/quick-start.mdx
index 1d039fc9..7bed47f3 100644
--- a/website/versioned_docs/version-1.7/01-introduction/quick-start.mdx
+++ b/website/versioned_docs/version-1.7/01-introduction/quick-start.mdx
@@ -13,6 +13,9 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';
+import MainExample from '!!raw-loader!./code/actor_structure/main.py';
+import UnderscoreMainExample from '!!raw-loader!./code/actor_structure/__main__.py';
+
## Step 1: Create Actors
To create and run Actors through Apify Console, see the [Console documentation](https://docs.apify.com/academy/getting-started/creating-actors#choose-your-template).
@@ -59,33 +62,14 @@ The Actor's source code is in the `src` folder. This folder contains two importa
- {
-`from apify import Actor
-${''}
-async def main():
-    async with Actor:
-        Actor.log.info('Actor input:', await Actor.get_input())
-        await Actor.set_value('OUTPUT', 'Hello, world!')`
- }
+
+ {MainExample}
+
- {
-`import asyncio
-import logging
-${''}
-from apify.log import ActorLogFormatter
-${''}
-from .main import main
-${''}
-handler = logging.StreamHandler()
-handler.setFormatter(ActorLogFormatter())
-${''}
-apify_logger = logging.getLogger('apify')
-apify_logger.setLevel(logging.DEBUG)
-apify_logger.addHandler(handler)
-${''}
-asyncio.run(main())`
- }
+
+ {UnderscoreMainExample}
+
@@ -120,21 +104,28 @@ python -m pip install -r requirements.txt
## Next steps
+### Concepts
+
+To learn more about the features of the Apify SDK and how to use them, check out the Concepts section in the sidebar:
+
+- [Actor lifecycle](../concepts/actor-lifecycle)
+- [Actor input](../concepts/actor-input)
+- [Working with storages](../concepts/storages)
+- [Handling Actor events & persisting state](../concepts/actor-events)
+- [Proxy management](../concepts/proxy-management)
+- [Interacting with other Actors](../concepts/interacting-with-other-actors)
+- [Creating webhooks](../concepts/webhooks)
+- [Accessing the Apify API](../concepts/access-apify-api)
+- [Logging](../concepts/logging)
+- [Actor configuration and environment variables](../concepts/configuration)
+
### Guides
-To see how you can integrate the Apify SDK with some of the most popular web scraping libraries, check out our guides for working with:
+To see how you can integrate the Apify SDK with popular web scraping libraries, check out our guides:
-- [Requests or HTTPX](../guides/requests-and-httpx)
+- [Requests and HTTPX](../guides/requests-and-httpx)
- [Beautiful Soup](../guides/beautiful-soup)
- [Playwright](../guides/playwright)
- [Selenium](../guides/selenium)
- [Scrapy](../guides/scrapy)
-
-### Concepts
-
-To learn more about the features of the Apify SDK and how to use them, check out the Concepts section in the sidebar, especially the guides for:
-
-- [Actor lifecycle](../concepts/actor-lifecycle)
-- [Working with storages](../concepts/storages)
-- [Handling Actor events](../concepts/actor-events)
-- [How to use proxies](../concepts/proxy-management)
+- [Running webserver](../guides/running-webserver)
diff --git a/website/versioned_docs/version-1.7/02-guides/01-requests-and-httpx.mdx b/website/versioned_docs/version-1.7/02-guides/01-requests-and-httpx.mdx
index 21d72f4f..dbc31ad7 100644
--- a/website/versioned_docs/version-1.7/02-guides/01-requests-and-httpx.mdx
+++ b/website/versioned_docs/version-1.7/02-guides/01-requests-and-httpx.mdx
@@ -3,6 +3,10 @@ title: Using Requests and HTTPX
sidebar_label: Using Requests and HTTPX
---
+In this guide, you'll learn how to use the [Requests](https://requests.readthedocs.io) and [HTTPX](https://www.python-httpx.org/) libraries for making HTTP requests in your Apify Actors.
+
+## Introduction
+
To use either of the libraries mentioned below in your Actors,
you can start from the [Start with Python](https://apify.com/templates?category=python) Actor template.
@@ -98,3 +102,13 @@ async def main():
```
To learn more about using proxies in your Actor with `httpx`, check the [documentation for proxy management](../concepts/proxy-management).
+
+## Conclusion
+
+In this guide, you learned how to use the Requests and HTTPX libraries for making HTTP requests in your Apify Actors. These libraries provide flexible and efficient ways to fetch web content, with HTTPX offering asynchronous support for parallel requests. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Getting started with Python](https://apify.com/templates/python-start)
+- [Requests: Official documentation](https://requests.readthedocs.io)
+- [HTTPX: Official documentation](https://www.python-httpx.org/)
diff --git a/website/versioned_docs/version-1.7/02-guides/02-beautiful-soup.mdx b/website/versioned_docs/version-1.7/02-guides/02-beautiful-soup.mdx
index a625741f..f8b1c40b 100644
--- a/website/versioned_docs/version-1.7/02-guides/02-beautiful-soup.mdx
+++ b/website/versioned_docs/version-1.7/02-guides/02-beautiful-soup.mdx
@@ -3,12 +3,14 @@ title: Using Beautiful Soup
sidebar_label: Using Beautiful Soup
---
+In this guide, you'll learn how to use [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/) for web scraping in your Apify Actors.
+
+## Introduction
+
[Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/) is a Python library for pulling data out of HTML and XML files.
It provides simple methods and Pythonic idioms for navigating, searching, and modifying a website's element tree,
allowing you to quickly extract the data you need.
-## Using BeautifulSoup in Actors
-
To create Actors which use BeautifulSoup, start from the [BeautifulSoup & Python](https://apify.com/templates?category=python) Actor template.
This Actor template already contains the BeautifulSoup library preinstalled, which means you can start using it right away.
@@ -79,3 +81,12 @@ async def main():
# Mark the request as handled so it's not processed again
await default_queue.mark_request_as_handled(request)
```
+
+## Conclusion
+
+In this guide, you learned how to use Beautiful Soup for web scraping in your Apify Actors. Beautiful Soup makes it easy to parse HTML and XML documents and extract the data you need. See the [Actor templates](https://apify.com/templates?category=python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: BeautifulSoup](https://apify.com/templates/python-beautifulsoup)
+- [Beautiful Soup: Official documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
diff --git a/website/versioned_docs/version-1.7/02-guides/03-playwright.mdx b/website/versioned_docs/version-1.7/02-guides/03-playwright.mdx
index 8094e621..66958a6a 100644
--- a/website/versioned_docs/version-1.7/02-guides/03-playwright.mdx
+++ b/website/versioned_docs/version-1.7/02-guides/03-playwright.mdx
@@ -7,6 +7,10 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';
+In this guide, you'll learn how to use [Playwright](https://playwright.dev) for web scraping in your Apify Actors.
+
+## Introduction
+
[Playwright](https://playwright.dev) is a tool for web automation and testing that can also be used for web scraping.
It allows you to control a web browser programmatically and interact with web pages just as a human would.
@@ -22,8 +26,6 @@ including CSS selectors, XPath, and text matching.
- **Emulation of user interactions** - Playwright allows you to emulate user interactions like clicking, scrolling, filling out forms,
and even typing in text, which can be useful for scraping websites that have dynamic content or require user input.
-## Using Playwright in Actors
-
To create Actors which use Playwright, start from the [Playwright & Python](https://apify.com/templates?category=python) Actor template.
On the Apify platform, the Actor will already have Playwright and the necessary browsers preinstalled in its Docker image,
@@ -118,3 +120,12 @@ async def main():
await page.close()
await default_queue.mark_request_as_handled(request)
```
+
+## Conclusion
+
+In this guide, you learned how to create Actors that use Playwright to scrape websites. Playwright is a powerful tool that can be used to manage browser instances and scrape websites that require JavaScript execution. See the [Actor templates](https://apify.com/templates?category=python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Playwright + Chrome](https://apify.com/templates/python-playwright)
+- [Playwright: Official documentation](https://playwright.dev/python/)
diff --git a/website/versioned_docs/version-1.7/02-guides/04-selenium.mdx b/website/versioned_docs/version-1.7/02-guides/04-selenium.mdx
index 3efa5149..48cc3276 100644
--- a/website/versioned_docs/version-1.7/02-guides/04-selenium.mdx
+++ b/website/versioned_docs/version-1.7/02-guides/04-selenium.mdx
@@ -3,6 +3,10 @@ title: Using Selenium
sidebar_label: Using Selenium
---
+In this guide, you'll learn how to use [Selenium](https://www.selenium.dev/) for web scraping in your Apify Actors.
+
+## Introduction
+
[Selenium](https://www.selenium.dev/) is a tool for web automation and testing that can also be used for web scraping.
It allows you to control a web browser programmatically and interact with web pages just as a human would.
@@ -18,8 +22,6 @@ including CSS selectors, XPath, and text matching.
- **Emulation of user interactions** - Selenium allows you to emulate user interactions like clicking, scrolling, filling out forms,
and even typing in text, which can be useful for scraping websites that have dynamic content or require user input.
-## Using Selenium in Actors
-
To create Actors which use Selenium, start from the [Selenium & Python](https://apify.com/templates?category=python) Actor template.
On the Apify platform, the Actor will already have Selenium and the necessary browsers preinstalled in its Docker image,
@@ -108,3 +110,12 @@ async def main():
driver.quit()
```
+
+## Conclusion
+
+In this guide, you learned how to use Selenium for web scraping in Apify Actors. You can now create your own Actors that use Selenium to scrape dynamic websites and interact with web pages just like a human would. See the [Actor templates](https://apify.com/templates?category=python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Selenium + Chrome](https://apify.com/templates/python-selenium)
+- [Selenium: Official documentation](https://www.selenium.dev/documentation/)
diff --git a/website/versioned_docs/version-1.7/02-guides/05-scrapy.mdx b/website/versioned_docs/version-1.7/02-guides/05-scrapy.mdx
index f73c4a3c..21eedc41 100644
--- a/website/versioned_docs/version-1.7/02-guides/05-scrapy.mdx
+++ b/website/versioned_docs/version-1.7/02-guides/05-scrapy.mdx
@@ -3,6 +3,10 @@ title: Using Scrapy
sidebar_label: Using Scrapy
---
+In this guide, you'll learn how to use the [Scrapy](https://scrapy.org/) framework in your Apify Actors.
+
+## Introduction
+
:::tip
Our CLI now has native support for running Scrapy spiders on Apify! Check out the [Scrapy migration guide](https://docs.apify.com/cli/docs/integrating-scrapy) for more information.
@@ -22,8 +26,6 @@ including how to follow links, how to handle pagination, and how to parse the da
allowing you to easily extract data from HTML and XML documents.
- **Integration with other tools** - Scrapy can be integrated with other Python tools like BeautifulSoup and Selenium for more advanced scraping tasks.
-## Using Scrapy in Actors
-
To create Actors which use Scrapy, start from the [Scrapy & Python](https://apify.com/templates?category=python) Actor template.
This template already contains the structure and setup necessary to integrate Scrapy into your Actors,
@@ -109,3 +111,13 @@ async def main():
process.crawl(TitleSpider, start_urls=start_urls)
process.start()
```
+
+## Conclusion
+
+In this guide, you learned how to use Scrapy in Apify Actors. You can now start building your own web scraping projects using Scrapy and host them on the Apify platform. See the [Actor templates](https://apify.com/templates?category=python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Scrapy](https://apify.com/templates/python-scrapy)
+- [Apify CLI: Integrating Scrapy projects](https://docs.apify.com/cli/docs/integrating-scrapy)
+- [Scrapy: Official documentation](https://docs.scrapy.org/)
diff --git a/website/versioned_docs/version-1.7/02-guides/06-running-webserver.mdx b/website/versioned_docs/version-1.7/02-guides/06-running-webserver.mdx
new file mode 100644
index 00000000..dc048042
--- /dev/null
+++ b/website/versioned_docs/version-1.7/02-guides/06-running-webserver.mdx
@@ -0,0 +1,38 @@
+---
+title: Running a web server
+sidebar_label: Running a web server
+---
+
+import CodeBlock from '@theme/CodeBlock';
+
+import WebserverExample from '!!raw-loader!./code/06_webserver.py';
+
+In this guide, you'll learn how to run a web server inside your Apify Actor. This is useful for monitoring Actor progress, creating custom APIs, or serving content during the Actor run.
+
+## Introduction
+
+Each Actor run on the Apify platform is assigned a unique hard-to-guess URL (for example `https://8segt5i81sokzm.runs.apify.net`), which enables HTTP access to an optional web server running inside the Actor run's container.
+
+The URL is available in the following places:
+
+- In Apify Console, on the Actor run details page as the **Container URL** field.
+- In the API as the `containerUrl` property of the [Run object](https://docs.apify.com/api/v2#/reference/actors/run-object/get-run).
+- In the Actor as the `Actor.config.container_url` property.
+
+The web server running inside the container must listen at the port defined by the `Actor.config.container_port` property. When running Actors locally, the port defaults to `4321`, so the web server will be accessible at `http://localhost:4321`.
+
+## Example Actor
+
+The following example demonstrates how to start a simple web server in your Actor, which will respond to every GET request with the number of items that the Actor has processed so far:
+
+
+ {WebserverExample}
+
+
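To see the handler logic in isolation, here is a stdlib-only sketch (no Apify dependencies) that binds the same kind of `ThreadingHTTPServer` to an ephemeral port, queries it once, and shuts it down. The counter value and port choice are illustrative, not part of the Actor above:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

processed_items = 7  # stand-in for the counter the Actor increments


class RequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(f'Processed items: {processed_items}'.encode('utf-8'))

    def log_message(self, format, *args):
        pass  # keep the sketch's output quiet


# Port 0 lets the OS pick a free port, so the sketch never collides with a real Actor run.
server = ThreadingHTTPServer(('127.0.0.1', 0), RequestHandler)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

port = server.server_address[1]
with urllib.request.urlopen(f'http://127.0.0.1:{port}') as response:
    body = response.read().decode('utf-8')
print(body)  # Processed items: 7

server.shutdown()
thread.join()
```

In the real Actor, the server binds to `Actor.config.container_port` instead of an ephemeral port, so the platform can route requests from the container URL to it.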
+## Conclusion
+
+In this guide, you learned how to run a web server inside your Apify Actor. By leveraging the container URL and port provided by the platform, you can expose HTTP endpoints for monitoring, reporting, or serving content during Actor execution. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU).
+
+## Additional resources
+
+- [Apify templates: Standby Python project](https://apify.com/templates/python-standby)
diff --git a/website/versioned_docs/version-1.7/02-guides/code/06_webserver.py b/website/versioned_docs/version-1.7/02-guides/code/06_webserver.py
new file mode 100644
index 00000000..ac21f68e
--- /dev/null
+++ b/website/versioned_docs/version-1.7/02-guides/code/06_webserver.py
@@ -0,0 +1,43 @@
+import asyncio
+from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
+
+from apify import Actor
+
+processed_items = 0
+http_server = None
+
+
+class RequestHandler(BaseHTTPRequestHandler):
+ """A handler that prints the number of processed items on every GET request."""
+
+ def do_GET(self):
+ self.log_request()
+ self.send_response(200)
+ self.end_headers()
+ self.wfile.write(bytes(f'Processed items: {processed_items}', encoding='utf-8'))
+
+
+def run_server():
+ """Start the HTTP server on the provided port, and save a reference to the server."""
+ global http_server
+ with ThreadingHTTPServer(('', Actor.config.container_port), RequestHandler) as server:
+ Actor.log.info(f'Server running on {Actor.config.container_url}')
+ http_server = server
+ server.serve_forever()
+
+
+async def main():
+ global processed_items
+ async with Actor:
+ # Start the HTTP server in a separate thread
+ run_server_task = asyncio.get_running_loop().run_in_executor(None, run_server)
+
+ # Simulate doing some work
+ for _ in range(100):
+ await asyncio.sleep(1)
+ processed_items += 1
+ Actor.log.info(f'Processed items: {processed_items}')
+
+ # Signal the HTTP server to shut down, and wait for it to finish
+ http_server.shutdown()
+ await run_server_task
diff --git a/website/versioned_docs/version-1.7/03-concepts/10-logging.mdx b/website/versioned_docs/version-1.7/03-concepts/09-logging.mdx
similarity index 100%
rename from website/versioned_docs/version-1.7/03-concepts/10-logging.mdx
rename to website/versioned_docs/version-1.7/03-concepts/09-logging.mdx
diff --git a/website/versioned_docs/version-1.7/03-concepts/09-running-webserver.mdx b/website/versioned_docs/version-1.7/03-concepts/09-running-webserver.mdx
deleted file mode 100644
index d1ad9d90..00000000
--- a/website/versioned_docs/version-1.7/03-concepts/09-running-webserver.mdx
+++ /dev/null
@@ -1,66 +0,0 @@
----
-title: Running a webserver in your Actor
-sidebar_label: Running a webserver
----
-
-Each Actor run on the Apify platform is assigned a unique hard-to-guess URL (for example `https://8segt5i81sokzm.runs.apify.net`),
-which enables HTTP access to an optional web server running inside the Actor run's container.
-
-The URL is available in the following places:
-
-- In Apify Console, on the Actor run details page as the **Container URL** field.
-- In the API as the `containerUrl` property of the [Run object](https://docs.apify.com/api/v2#/reference/actors/run-object/get-run).
-- In the Actor as the `Actor.config.container_url` property.
-
-The web server running inside the container must listen at the port defined by the `Actor.config.container_port` property.
-When running Actors locally, the port defaults to `4321`,
-so the web server will be accessible at `http://localhost:4321`.
-
-## Example
-
-The following example demonstrates how to start a simple web server in your Actor,
-which will respond to every GET request with the number of items that the Actor has processed so far:
-
-```python title="src/main.py"
-import asyncio
-from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
-
-from apify import Actor
-
-processed_items = 0
-http_server = None
-
-# Just a simple handler that will print the number of processed items so far
-# on every GET request
-class RequestHandler(BaseHTTPRequestHandler):
- def do_GET(self):
- self.log_request()
- self.send_response(200)
- self.end_headers()
- self.wfile.write(bytes(f'Processed items: {processed_items}', encoding='utf-8'))
-
-def run_server():
- # Start the HTTP server on the provided port,
- # and save a reference to the server
- global http_server
- with ThreadingHTTPServer(('', Actor.config.container_port), RequestHandler) as server:
- Actor.log.info(f'Server running on {Actor.config.container_url}')
- http_server = server
- server.serve_forever()
-
-async def main():
- global processed_items
- async with Actor:
- # Start the HTTP server in a separate thread
- run_server_task = asyncio.get_running_loop().run_in_executor(None, run_server)
-
- # Simulate doing some work
- for _ in range(100):
- await asyncio.sleep(1)
- processed_items += 1
- Actor.log.info(f'Processed items: {processed_items}')
-
- # Signal the HTTP server to shut down, and wait for it to finish
- http_server.shutdown()
- await run_server_task
-```
diff --git a/website/versioned_docs/version-1.7/03-concepts/11-configuration.mdx b/website/versioned_docs/version-1.7/03-concepts/10-configuration.mdx
similarity index 100%
rename from website/versioned_docs/version-1.7/03-concepts/11-configuration.mdx
rename to website/versioned_docs/version-1.7/03-concepts/10-configuration.mdx
diff --git a/website/versioned_docs/version-2.7/01_introduction/quick-start.mdx b/website/versioned_docs/version-2.7/01_introduction/quick-start.mdx
index 3ea5a5d0..6864e89e 100644
--- a/website/versioned_docs/version-2.7/01_introduction/quick-start.mdx
+++ b/website/versioned_docs/version-2.7/01_introduction/quick-start.mdx
@@ -104,6 +104,22 @@ python -m pip install -r requirements.txt
## Next steps
+### Concepts
+
+For a deeper understanding of the Apify SDK's features, refer to the Concepts section in the sidebar:
+
+- [Actor lifecycle](../concepts/actor-lifecycle)
+- [Actor input](../concepts/actor-input)
+- [Working with storages](../concepts/storages)
+- [Actor events & state persistence](../concepts/actor-events)
+- [Proxy management](../concepts/proxy-management)
+- [Interacting with other Actors](../concepts/interacting-with-other-actors)
+- [Creating webhooks](../concepts/webhooks)
+- [Accessing Apify API](../concepts/access-apify-api)
+- [Logging](../concepts/logging)
+- [Actor configuration](../concepts/actor-configuration)
+- [Pay-per-event monetization](../concepts/pay-per-event)
+
### Guides
Integrate the Apify SDK with popular web scraping libraries by following these guides:
@@ -113,12 +129,4 @@ Integrate the Apify SDK with popular web scraping libraries by following these g
- [Playwright](../guides/playwright)
- [Selenium](../guides/selenium)
- [Scrapy](../guides/scrapy)
-
-### Concepts
-
-For a deeper understanding of the Apify SDK's features, refer to the Concepts section in the sidebar. Key topics include:
-
-- [Actor lifecycle](../concepts/actor-lifecycle)
-- [Working with storages](../concepts/storages)
-- [Handling Actor events](../concepts/actor-events)
-- [Using proxies](../concepts/proxy-management)
+- [Running a web server](../guides/running-webserver)
diff --git a/website/versioned_docs/version-2.7/02_guides/01_beautifulsoup_httpx.mdx b/website/versioned_docs/version-2.7/02_guides/01_beautifulsoup_httpx.mdx
index 4ecabd6e..bbb0f57b 100644
--- a/website/versioned_docs/version-2.7/02_guides/01_beautifulsoup_httpx.mdx
+++ b/website/versioned_docs/version-2.7/02_guides/01_beautifulsoup_httpx.mdx
@@ -28,3 +28,9 @@ Below is a simple Actor that recursively scrapes titles from all linked websites
## Conclusion
In this guide, you learned how to use `BeautifulSoup` with `HTTPX` in your Apify Actors. By combining these libraries, you can efficiently extract data from HTML or XML files, making it easy to build web scraping tasks in Python. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: BeautifulSoup](https://apify.com/templates/python-beautifulsoup)
+- [BeautifulSoup: Official documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
+- [HTTPX: Official documentation](https://www.python-httpx.org/)
diff --git a/website/versioned_docs/version-2.7/02_guides/02_crawlee.mdx b/website/versioned_docs/version-2.7/02_guides/02_crawlee.mdx
index b040cad2..a0371889 100644
--- a/website/versioned_docs/version-2.7/02_guides/02_crawlee.mdx
+++ b/website/versioned_docs/version-2.7/02_guides/02_crawlee.mdx
@@ -14,7 +14,7 @@ In this guide you'll learn how to use the [Crawlee](https://crawlee.dev/python)
`Crawlee` is a Python library for web scraping and browser automation that provides a robust and flexible framework for building web scraping tasks. It seamlessly integrates with the Apify platform and supports a variety of scraping techniques, from static HTML parsing to dynamic JavaScript-rendered content handling. Crawlee offers a range of crawlers, including HTTP-based crawlers like [`HttpCrawler`](https://crawlee.dev/python/api/class/HttpCrawler), [`BeautifulSoupCrawler`](https://crawlee.dev/python/api/class/BeautifulSoupCrawler) and [`ParselCrawler`](https://crawlee.dev/python/api/class/ParselCrawler), and browser-based crawlers like [`PlaywrightCrawler`](https://crawlee.dev/python/api/class/PlaywrightCrawler), to suit different scraping needs.
-In this guide, you'll learn how to use Crawlee with `BeautifulSoupCrawler` and `PlaywrightCrawler` to build Apify Actors for web scraping.
+In this guide, you'll learn how to use [Crawlee](https://crawlee.dev/python) with `BeautifulSoupCrawler` and `PlaywrightCrawler` to build Apify Actors for web scraping.
## Actor with BeautifulSoupCrawler
@@ -35,3 +35,11 @@ The `PlaywrightCrawler` is built for handling dynamic web pages that rely on Jav
## Conclusion
In this guide, you learned how to use the `Crawlee` library in your Apify Actors. By using the `BeautifulSoupCrawler` and `PlaywrightCrawler` crawlers, you can efficiently scrape static or dynamic web pages, making it easy to build web scraping tasks in Python. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Crawlee + BeautifulSoup](https://apify.com/templates/python-crawlee-beautifulsoup)
+- [Apify templates: Crawlee + Playwright + Chrome](https://apify.com/templates/python-crawlee-playwright)
+- [Crawlee: Official website](https://crawlee.dev/python)
+- [Crawlee: Documentation](https://crawlee.dev/python/docs)
+- [Crawlee: GitHub repository](https://github.com/apify/crawlee-python)
diff --git a/website/versioned_docs/version-2.7/02_guides/03_playwright.mdx b/website/versioned_docs/version-2.7/02_guides/03_playwright.mdx
index 8cada682..05409101 100644
--- a/website/versioned_docs/version-2.7/02_guides/03_playwright.mdx
+++ b/website/versioned_docs/version-2.7/02_guides/03_playwright.mdx
@@ -9,6 +9,10 @@ import CodeBlock from '@theme/CodeBlock';
import PlaywrightExample from '!!raw-loader!./code/03_playwright.py';
+In this guide, you'll learn how to use [Playwright](https://playwright.dev) for web scraping in your Apify Actors.
+
+## Introduction
+
[Playwright](https://playwright.dev) is a tool for web automation and testing that can also be used for web scraping. It allows you to control a web browser programmatically and interact with web pages just as a human would.
Some of the key features of Playwright for web scraping include:
@@ -18,8 +22,6 @@ Some of the key features of Playwright for web scraping include:
- **Powerful selectors** - Playwright provides a variety of powerful selectors that allow you to target specific elements on a web page, including CSS selectors, XPath, and text matching.
- **Emulation of user interactions** - Playwright allows you to emulate user interactions like clicking, scrolling, filling out forms, and even typing in text, which can be useful for scraping websites that have dynamic content or require user input.
-## Using Playwright in Actors
-
To create Actors which use Playwright, start from the [Playwright & Python](https://apify.com/templates/categories/python) Actor template.
On the Apify platform, the Actor will already have Playwright and the necessary browsers preinstalled in its Docker image, including the tools and setup necessary to run browsers in headful mode.
@@ -54,3 +56,9 @@ It uses Playwright to open the pages in an automated Chrome browser, and to extr
## Conclusion
In this guide, you learned how to create Actors that use Playwright to scrape websites. Playwright is a powerful tool that can be used to manage browser instances and scrape websites that require JavaScript execution. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Playwright + Chrome](https://apify.com/templates/python-playwright)
+- [Apify templates: Crawlee + Playwright + Chrome](https://apify.com/templates/python-crawlee-playwright)
+- [Playwright: Official documentation](https://playwright.dev/python/)
diff --git a/website/versioned_docs/version-2.7/02_guides/04_selenium.mdx b/website/versioned_docs/version-2.7/02_guides/04_selenium.mdx
index 834dc33c..95350439 100644
--- a/website/versioned_docs/version-2.7/02_guides/04_selenium.mdx
+++ b/website/versioned_docs/version-2.7/02_guides/04_selenium.mdx
@@ -7,6 +7,10 @@ import CodeBlock from '@theme/CodeBlock';
import SeleniumExample from '!!raw-loader!./code/04_selenium.py';
+In this guide, you'll learn how to use [Selenium](https://www.selenium.dev/) for web scraping in your Apify Actors.
+
+## Introduction
+
[Selenium](https://www.selenium.dev/) is a tool for web automation and testing that can also be used for web scraping. It allows you to control a web browser programmatically and interact with web pages just as a human would.
Some of the key features of Selenium for web scraping include:
@@ -21,8 +25,6 @@ including CSS selectors, XPath, and text matching.
- **Emulation of user interactions** - Selenium allows you to emulate user interactions like clicking, scrolling, filling out forms,
and even typing in text, which can be useful for scraping websites that have dynamic content or require user input.
-## Using Selenium in Actors
-
To create Actors which use Selenium, start from the [Selenium & Python](https://apify.com/templates/categories/python) Actor template.
On the Apify platform, the Actor will already have Selenium and the necessary browsers preinstalled in its Docker image,
@@ -44,3 +46,8 @@ It uses Selenium ChromeDriver to open the pages in an automated Chrome browser,
## Conclusion
In this guide, you learned how to use Selenium for web scraping in Apify Actors. You can now create your own Actors that use Selenium to scrape dynamic websites and interact with web pages just like a human would. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Selenium + Chrome](https://apify.com/templates/python-selenium)
+- [Selenium: Official documentation](https://www.selenium.dev/documentation/)
diff --git a/website/versioned_docs/version-2.7/02_guides/05_scrapy.mdx b/website/versioned_docs/version-2.7/02_guides/05_scrapy.mdx
index 95f34fae..feb858b9 100644
--- a/website/versioned_docs/version-2.7/02_guides/05_scrapy.mdx
+++ b/website/versioned_docs/version-2.7/02_guides/05_scrapy.mdx
@@ -13,6 +13,10 @@ import ItemsExample from '!!raw-loader!./code/scrapy_project/src/items.py';
import SpidersExample from '!!raw-loader!./code/scrapy_project/src/spiders/title.py';
import SettingsExample from '!!raw-loader!./code/scrapy_project/src/settings.py';
+In this guide, you'll learn how to use the [Scrapy](https://scrapy.org/) framework in your Apify Actors.
+
+## Introduction
+
[Scrapy](https://scrapy.org/) is an open-source web scraping framework for Python. It provides tools for defining scrapers, extracting data from web pages, following links, and handling pagination. With the Apify SDK, Scrapy projects can be converted into Apify [Actors](https://docs.apify.com/platform/actors), integrated with Apify [storages](https://docs.apify.com/platform/storage), and executed on the Apify [platform](https://docs.apify.com/platform).
## Integrating Scrapy with the Apify platform
diff --git a/website/versioned_docs/version-2.7/03_concepts/09_running_webserver.mdx b/website/versioned_docs/version-2.7/02_guides/06_running_webserver.mdx
similarity index 50%
rename from website/versioned_docs/version-2.7/03_concepts/09_running_webserver.mdx
rename to website/versioned_docs/version-2.7/02_guides/06_running_webserver.mdx
index 7d13a504..4a8ee87c 100644
--- a/website/versioned_docs/version-2.7/03_concepts/09_running_webserver.mdx
+++ b/website/versioned_docs/version-2.7/02_guides/06_running_webserver.mdx
@@ -1,11 +1,15 @@
---
id: running-webserver
-title: Running webserver in your Actor
+title: Running a web server
---
import CodeBlock from '@theme/CodeBlock';
-import WebserverExample from '!!raw-loader!./code/09_webserver.py';
+import WebserverExample from '!!raw-loader!./code/06_webserver.py';
+
+In this guide, you'll learn how to run a web server inside your Apify Actor. This is useful for monitoring Actor progress, creating custom APIs, or serving content during the Actor run.
+
+## Introduction
Each Actor run on the Apify platform is assigned a unique hard-to-guess URL (for example `https://8segt5i81sokzm.runs.apify.net`), which enables HTTP access to an optional web server running inside the Actor run's container.
@@ -17,10 +21,18 @@ The URL is available in the following places:
The web server running inside the container must listen at the port defined by the `Actor.config.container_port` property. When running Actors locally, the port defaults to `4321`, so the web server will be accessible at `http://localhost:4321`.
-## Example
+## Example Actor
-The following example demonstrates how to start a simple web server in your Actor,which will respond to every GET request with the number of items that the Actor has processed so far:
+The following example demonstrates how to start a simple web server in your Actor, which will respond to every GET request with the number of items that the Actor has processed so far:
{WebserverExample}
+
+## Conclusion
+
+In this guide, you learned how to run a web server inside your Apify Actor. By leveraging the container URL and port provided by the platform, you can expose HTTP endpoints for monitoring, reporting, or serving content during Actor execution. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU).
+
+## Additional resources
+
+- [Apify templates: Standby Python project](https://apify.com/templates/python-standby)
diff --git a/website/versioned_docs/version-2.7/03_concepts/code/09_webserver.py b/website/versioned_docs/version-2.7/02_guides/code/06_webserver.py
similarity index 87%
rename from website/versioned_docs/version-2.7/03_concepts/code/09_webserver.py
rename to website/versioned_docs/version-2.7/02_guides/code/06_webserver.py
index 48a5c10d..4cea7654 100644
--- a/website/versioned_docs/version-2.7/03_concepts/code/09_webserver.py
+++ b/website/versioned_docs/version-2.7/02_guides/code/06_webserver.py
@@ -7,9 +7,9 @@
http_server = None
-# Just a simple handler that will print the number of processed items so far
-# on every GET request.
class RequestHandler(BaseHTTPRequestHandler):
+ """A handler that prints the number of processed items on every GET request."""
+
     def do_GET(self) -> None:
self.log_request()
self.send_response(200)
@@ -18,8 +18,7 @@ def do_get(self) -> None:
def run_server() -> None:
- # Start the HTTP server on the provided port,
- # and save a reference to the server.
+ """Start the HTTP server on the provided port, and save a reference to the server."""
global http_server
with ThreadingHTTPServer(
('', Actor.config.web_server_port), RequestHandler
diff --git a/website/versioned_docs/version-2.7/03_concepts/10_logging.mdx b/website/versioned_docs/version-2.7/03_concepts/09_logging.mdx
similarity index 96%
rename from website/versioned_docs/version-2.7/03_concepts/10_logging.mdx
rename to website/versioned_docs/version-2.7/03_concepts/09_logging.mdx
index 69dea7dd..e1db8b53 100644
--- a/website/versioned_docs/version-2.7/03_concepts/10_logging.mdx
+++ b/website/versioned_docs/version-2.7/03_concepts/09_logging.mdx
@@ -5,8 +5,8 @@ title: Logging
import CodeBlock from '@theme/CodeBlock';
-import LogConfigExample from '!!raw-loader!./code/10_log_config.py';
-import LoggerUsageExample from '!!raw-loader!./code/10_logger_usage.py';
+import LogConfigExample from '!!raw-loader!./code/09_log_config.py';
+import LoggerUsageExample from '!!raw-loader!./code/09_logger_usage.py';
The Apify SDK logs useful information through the [`logging`](https://docs.python.org/3/library/logging.html) module from Python's standard library, using a logger named `apify`.
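Because the SDK uses a standard named logger, you can control its verbosity with the plain `logging` API — a minimal sketch:

```python
import logging

# All SDK messages flow through the logger named 'apify';
# raising its level to DEBUG surfaces the SDK's debug output as well.
apify_logger = logging.getLogger('apify')
apify_logger.setLevel(logging.DEBUG)
```

Handlers and formatters attach to this logger the same way as to any other `logging` logger, as shown in the configuration examples below.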
diff --git a/website/versioned_docs/version-2.7/03_concepts/11_configuration.mdx b/website/versioned_docs/version-2.7/03_concepts/10_configuration.mdx
similarity index 96%
rename from website/versioned_docs/version-2.7/03_concepts/11_configuration.mdx
rename to website/versioned_docs/version-2.7/03_concepts/10_configuration.mdx
index 36ea66b5..980324f7 100644
--- a/website/versioned_docs/version-2.7/03_concepts/11_configuration.mdx
+++ b/website/versioned_docs/version-2.7/03_concepts/10_configuration.mdx
@@ -5,7 +5,7 @@ title: Actor configuration
import CodeBlock from '@theme/CodeBlock';
-import ConfigExample from '!!raw-loader!./code/11_config.py';
+import ConfigExample from '!!raw-loader!./code/10_config.py';
The [`Actor`](../../reference/class/Actor) class gets configured using the [`Configuration`](../../reference/class/Configuration) class, which initializes itself based on the provided environment variables.
diff --git a/website/versioned_docs/version-2.7/03_concepts/12_pay_per_event.mdx b/website/versioned_docs/version-2.7/03_concepts/11_pay_per_event.mdx
similarity index 100%
rename from website/versioned_docs/version-2.7/03_concepts/12_pay_per_event.mdx
rename to website/versioned_docs/version-2.7/03_concepts/11_pay_per_event.mdx
diff --git a/website/versioned_docs/version-2.7/03_concepts/code/10_log_config.py b/website/versioned_docs/version-2.7/03_concepts/code/09_log_config.py
similarity index 100%
rename from website/versioned_docs/version-2.7/03_concepts/code/10_log_config.py
rename to website/versioned_docs/version-2.7/03_concepts/code/09_log_config.py
diff --git a/website/versioned_docs/version-2.7/03_concepts/code/10_logger_usage.py b/website/versioned_docs/version-2.7/03_concepts/code/09_logger_usage.py
similarity index 100%
rename from website/versioned_docs/version-2.7/03_concepts/code/10_logger_usage.py
rename to website/versioned_docs/version-2.7/03_concepts/code/09_logger_usage.py
diff --git a/website/versioned_docs/version-2.7/03_concepts/code/11_config.py b/website/versioned_docs/version-2.7/03_concepts/code/10_config.py
similarity index 100%
rename from website/versioned_docs/version-2.7/03_concepts/code/11_config.py
rename to website/versioned_docs/version-2.7/03_concepts/code/10_config.py
diff --git a/website/versioned_docs/version-3.3/01_introduction/index.mdx b/website/versioned_docs/version-3.3/01_introduction/index.mdx
index 33feb04a..066687c4 100644
--- a/website/versioned_docs/version-3.3/01_introduction/index.mdx
+++ b/website/versioned_docs/version-3.3/01_introduction/index.mdx
@@ -6,20 +6,15 @@ slug: /overview
description: 'The official library for creating Apify Actors in Python, providing tools for web scraping, automation, and data storage integration.'
---
+import CodeBlock from '@theme/CodeBlock';
+
+import IntroductionExample from '!!raw-loader!./code/01_introduction.py';
+
The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides useful features like Actor lifecycle management, local storage emulation, and Actor event handling.
-```python
-from apify import Actor
-from bs4 import BeautifulSoup
-import requests
-
-async def main():
- async with Actor:
- input = await Actor.get_input()
- response = requests.get(input['url'])
- soup = BeautifulSoup(response.content, 'html.parser')
- await Actor.push_data({ 'url': input['url'], 'title': soup.title.string })
-```
+
+ {IntroductionExample}
+
## What are Actors
diff --git a/website/versioned_docs/version-3.3/01_introduction/quick-start.mdx b/website/versioned_docs/version-3.3/01_introduction/quick-start.mdx
index 3c991045..1e568c5b 100644
--- a/website/versioned_docs/version-3.3/01_introduction/quick-start.mdx
+++ b/website/versioned_docs/version-3.3/01_introduction/quick-start.mdx
@@ -13,6 +13,9 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';
+import MainExample from '!!raw-loader!./code/actor_structure/main.py';
+import UnderscoreMainExample from '!!raw-loader!./code/actor_structure/__main__.py';
+
## Step 1: Create Actors
To create and run Actors in Apify Console, refer to the [Console documentation](/platform/actors/development/quick-start/web-ide).
@@ -61,33 +64,14 @@ The Actor's source code is in the `src` folder. This folder contains two importa
- {
-`from apify import Actor
-${''}
-async def main():
- async with Actor:
- Actor.log.info('Actor input:', await Actor.get_input())
- await Actor.set_value('OUTPUT', 'Hello, world!')`
- }
+
+ {MainExample}
+
- {
-`import asyncio
-import logging
-${''}
-from apify.log import ActorLogFormatter
-${''}
-from .main import main
-${''}
-handler = logging.StreamHandler()
-handler.setFormatter(ActorLogFormatter())
-${''}
-apify_logger = logging.getLogger('apify')
-apify_logger.setLevel(logging.DEBUG)
-apify_logger.addHandler(handler)
-${''}
-asyncio.run(main())`
- }
+
+ {UnderscoreMainExample}
+
@@ -96,21 +80,30 @@ We recommend keeping the entrypoint for the Actor in the `src/__main__.py` file.
## Next steps
+### Concepts
+
+To learn more about the features of the Apify SDK and how to use them, check out the Concepts section in the sidebar:
+
+- [Actor lifecycle](../concepts/actor-lifecycle)
+- [Actor input](../concepts/actor-input)
+- [Working with storages](../concepts/storages)
+- [Actor events & state persistence](../concepts/actor-events)
+- [Proxy management](../concepts/proxy-management)
+- [Interacting with other Actors](../concepts/interacting-with-other-actors)
+- [Creating webhooks](../concepts/webhooks)
+- [Accessing Apify API](../concepts/access-apify-api)
+- [Logging](../concepts/logging)
+- [Actor configuration](../concepts/actor-configuration)
+- [Pay-per-event monetization](../concepts/pay-per-event)
+
### Guides
-To see how you can integrate the Apify SDK with some of the most popular web scraping libraries, check out our guides for working with:
+To see how you can integrate the Apify SDK with popular web scraping libraries, check out our guides:
-- [Requests or HTTPX](../guides/requests-and-httpx)
-- [Beautiful Soup](../guides/beautiful-soup)
+- [BeautifulSoup with HTTPX](../guides/beautifulsoup-httpx)
+- [Parsel with Impit](../guides/parsel-impit)
- [Playwright](../guides/playwright)
- [Selenium](../guides/selenium)
+- [Crawlee](../guides/crawlee)
- [Scrapy](../guides/scrapy)
-
-### Usage concepts
-
-To learn more about the features of the Apify SDK and how to use them, check out the Usage Concepts section in the sidebar, especially the guides for:
-
-- [Actor lifecycle](../concepts/actor-lifecycle)
-- [Working with storages](../concepts/storages)
-- [Handling Actor events](../concepts/actor-events)
-- [How to use proxies](../concepts/proxy-management)
+- [Running webserver](../guides/running-webserver)
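The `__main__.py` snippet removed from this page above (now loaded from `code/actor_structure/__main__.py`) wires up logging before running the Actor. A minimal, stdlib-only sketch of that setup, with a plain `logging.Formatter` standing in for `ActorLogFormatter` (which requires the `apify` package and is assumed unavailable here):

```python
import logging

# Stand-in for apify.log.ActorLogFormatter, which colorizes log levels.
formatter = logging.Formatter('[%(name)s] %(levelname)s  %(message)s')

handler = logging.StreamHandler()
handler.setFormatter(formatter)

# Configure the 'apify' logger the same way the removed __main__.py did.
apify_logger = logging.getLogger('apify')
apify_logger.setLevel(logging.DEBUG)
apify_logger.addHandler(handler)

apify_logger.debug('Logger configured.')
```

In the real entrypoint, this setup runs before `asyncio.run(main())`, so all SDK log messages emitted during the Actor run are formatted consistently.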
diff --git a/website/versioned_docs/version-3.3/03_guides/01_beautifulsoup_httpx.mdx b/website/versioned_docs/version-3.3/03_guides/01_beautifulsoup_httpx.mdx
index 42452a2a..166261a0 100644
--- a/website/versioned_docs/version-3.3/03_guides/01_beautifulsoup_httpx.mdx
+++ b/website/versioned_docs/version-3.3/03_guides/01_beautifulsoup_httpx.mdx
@@ -28,3 +28,9 @@ Below is a simple Actor that recursively scrapes titles from all linked websites
## Conclusion
In this guide, you learned how to use [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) together with [HTTPX](https://www.python-httpx.org/) in your Apify Actors. By combining these libraries, you can efficiently extract data from HTML or XML files, making it easy to build web scraping tasks in Python. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: BeautifulSoup](https://apify.com/templates/python-beautifulsoup)
+- [BeautifulSoup: Official documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
+- [HTTPX: Official documentation](https://www.python-httpx.org/)
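As a minimal illustration of the parsing half of this combination (assuming `beautifulsoup4` is installed; the HTTPX request is replaced by an inline HTML string so the sketch needs no network access):

```python
from bs4 import BeautifulSoup

# In a real Actor, this HTML would come from an httpx.AsyncClient response body.
html = '<html><head><title>Example Domain</title></head><body><a href="/a">A</a></body></html>'

soup = BeautifulSoup(html, 'html.parser')
title = soup.title.string
links = [a['href'] for a in soup.find_all('a', href=True)]

print(title)   # -> Example Domain
print(links)   # -> ['/a']
```

The extracted title and links are exactly what the guide's recursive-scraping example pushes to the dataset and enqueues for further crawling.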
diff --git a/website/versioned_docs/version-3.3/03_guides/02_parsel_impit.mdx b/website/versioned_docs/version-3.3/03_guides/02_parsel_impit.mdx
index 0b572bf8..b68efec4 100644
--- a/website/versioned_docs/version-3.3/03_guides/02_parsel_impit.mdx
+++ b/website/versioned_docs/version-3.3/03_guides/02_parsel_impit.mdx
@@ -26,3 +26,9 @@ The following example shows a simple Actor that recursively scrapes titles from
## Conclusion
In this guide, you learned how to use [Parsel](https://github.com/scrapy/parsel) with [Impit](https://github.com/apify/impit) in your Apify Actors. By combining these libraries, you get a powerful and efficient solution for web scraping: [Parsel](https://github.com/scrapy/parsel) provides excellent CSS selector and XPath support for data extraction, while [Impit](https://github.com/apify/impit) offers a fast and simple HTTP client built by Apify. This combination makes it easy to build scalable web scraping tasks in Python. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Crawlee + Parsel](https://apify.com/templates/python-crawlee-parsel)
+- [Parsel: GitHub repository](https://github.com/scrapy/parsel)
+- [Impit: GitHub repository](https://github.com/apify/impit)
diff --git a/website/versioned_docs/version-3.3/03_guides/03_playwright.mdx b/website/versioned_docs/version-3.3/03_guides/03_playwright.mdx
index 2c7428a5..16de8b67 100644
--- a/website/versioned_docs/version-3.3/03_guides/03_playwright.mdx
+++ b/website/versioned_docs/version-3.3/03_guides/03_playwright.mdx
@@ -10,6 +10,10 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
import PlaywrightExample from '!!raw-loader!roa-loader!./code/03_playwright.py';
+In this guide, you'll learn how to use [Playwright](https://playwright.dev) for web scraping in your Apify Actors.
+
+## Introduction
+
[Playwright](https://playwright.dev) is a tool for web automation and testing that can also be used for web scraping. It allows you to control a web browser programmatically and interact with web pages just as a human would.
Some of the key features of Playwright for web scraping include:
@@ -19,8 +23,6 @@ Some of the key features of Playwright for web scraping include:
- **Powerful selectors** - Playwright provides a variety of powerful selectors that allow you to target specific elements on a web page, including CSS selectors, XPath, and text matching.
- **Emulation of user interactions** - Playwright allows you to emulate user interactions like clicking, scrolling, filling out forms, and even typing in text, which can be useful for scraping websites that have dynamic content or require user input.
-## Using Playwright in Actors
-
To create Actors which use Playwright, start from the [Playwright & Python](https://apify.com/templates/categories/python) Actor template.
On the Apify platform, the Actor will already have Playwright and the necessary browsers preinstalled in its Docker image, including the tools and setup necessary to run browsers in headful mode.
@@ -55,3 +57,9 @@ It uses Playwright to open the pages in an automated Chrome browser, and to extr
## Conclusion
In this guide you learned how to create Actors that use Playwright to scrape websites. Playwright is a powerful tool that can be used to manage browser instances and scrape websites that require JavaScript execution. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Playwright + Chrome](https://apify.com/templates/python-playwright)
+- [Apify templates: Crawlee + Playwright + Chrome](https://apify.com/templates/python-crawlee-playwright)
+- [Playwright: Official documentation](https://playwright.dev/python/)
diff --git a/website/versioned_docs/version-3.3/03_guides/04_selenium.mdx b/website/versioned_docs/version-3.3/03_guides/04_selenium.mdx
index bbc6abe1..a7c9ed19 100644
--- a/website/versioned_docs/version-3.3/03_guides/04_selenium.mdx
+++ b/website/versioned_docs/version-3.3/03_guides/04_selenium.mdx
@@ -7,6 +7,10 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
import SeleniumExample from '!!raw-loader!roa-loader!./code/04_selenium.py';
+In this guide, you'll learn how to use [Selenium](https://www.selenium.dev/) for web scraping in your Apify Actors.
+
+## Introduction
+
[Selenium](https://www.selenium.dev/) is a tool for web automation and testing that can also be used for web scraping. It allows you to control a web browser programmatically and interact with web pages just as a human would.
Some of the key features of Selenium for web scraping include:
@@ -21,8 +25,6 @@ including CSS selectors, XPath, and text matching.
- **Emulation of user interactions** - Selenium allows you to emulate user interactions like clicking, scrolling, filling out forms,
and even typing in text, which can be useful for scraping websites that have dynamic content or require user input.
-## Using Selenium in Actors
-
To create Actors which use Selenium, start from the [Selenium & Python](https://apify.com/templates/categories/python) Actor template.
On the Apify platform, the Actor will already have Selenium and the necessary browsers preinstalled in its Docker image,
@@ -44,3 +46,8 @@ It uses Selenium ChromeDriver to open the pages in an automated Chrome browser,
## Conclusion
In this guide you learned how to use Selenium for web scraping in Apify Actors. You can now create your own Actors that use Selenium to scrape dynamic websites and interact with web pages just like a human would. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Selenium + Chrome](https://apify.com/templates/python-selenium)
+- [Selenium: Official documentation](https://www.selenium.dev/documentation/)
diff --git a/website/versioned_docs/version-3.3/03_guides/05_crawlee.mdx b/website/versioned_docs/version-3.3/03_guides/05_crawlee.mdx
index ed805dea..f6050654 100644
--- a/website/versioned_docs/version-3.3/03_guides/05_crawlee.mdx
+++ b/website/versioned_docs/version-3.3/03_guides/05_crawlee.mdx
@@ -44,3 +44,12 @@ The [`PlaywrightCrawler`](https://crawlee.dev/python/api/class/PlaywrightCrawler
## Conclusion
In this guide, you learned how to use the [Crawlee](https://crawlee.dev/python) library in your Apify Actors. By using the [`BeautifulSoupCrawler`](https://crawlee.dev/python/api/class/BeautifulSoupCrawler), [`ParselCrawler`](https://crawlee.dev/python/api/class/ParselCrawler), and [`PlaywrightCrawler`](https://crawlee.dev/python/api/class/PlaywrightCrawler) crawlers, you can efficiently scrape static or dynamic web pages, making it easy to build web scraping tasks in Python. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Crawlee + BeautifulSoup](https://apify.com/templates/python-crawlee-beautifulsoup)
+- [Apify templates: Crawlee + Parsel](https://apify.com/templates/python-crawlee-parsel)
+- [Apify templates: Crawlee + Playwright + Chrome](https://apify.com/templates/python-crawlee-playwright)
+- [Crawlee: Official website](https://crawlee.dev/python)
+- [Crawlee: Documentation](https://crawlee.dev/python/docs)
+- [Crawlee: GitHub repository](https://github.com/apify/crawlee-python)
diff --git a/website/versioned_docs/version-3.3/03_guides/06_scrapy.mdx b/website/versioned_docs/version-3.3/03_guides/06_scrapy.mdx
index 7d790b7d..ac9e5fa2 100644
--- a/website/versioned_docs/version-3.3/03_guides/06_scrapy.mdx
+++ b/website/versioned_docs/version-3.3/03_guides/06_scrapy.mdx
@@ -13,6 +13,10 @@ import ItemsExample from '!!raw-loader!./code/scrapy_project/src/items.py';
import SpidersExample from '!!raw-loader!./code/scrapy_project/src/spiders/title.py';
import SettingsExample from '!!raw-loader!./code/scrapy_project/src/settings.py';
+In this guide, you'll learn how to use the [Scrapy](https://scrapy.org/) framework in your Apify Actors.
+
+## Introduction
+
[Scrapy](https://scrapy.org/) is an open-source web scraping framework for Python. It provides tools for defining scrapers, extracting data from web pages, following links, and handling pagination. With the Apify SDK, Scrapy projects can be converted into Apify [Actors](https://docs.apify.com/platform/actors), integrated with Apify [storages](https://docs.apify.com/platform/storage), and executed on the Apify [platform](https://docs.apify.com/platform).
## Integrating Scrapy with the Apify platform
diff --git a/website/versioned_docs/version-3.3/03_guides/07_running_webserver.mdx b/website/versioned_docs/version-3.3/03_guides/07_running_webserver.mdx
index d9deedc1..9c9ef474 100644
--- a/website/versioned_docs/version-3.3/03_guides/07_running_webserver.mdx
+++ b/website/versioned_docs/version-3.3/03_guides/07_running_webserver.mdx
@@ -1,12 +1,16 @@
---
id: running-webserver
-title: Running webserver in your Actor
+title: Running webserver
---
import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
import WebserverExample from '!!raw-loader!roa-loader!./code/07_webserver.py';
+In this guide, you'll learn how to run a web server inside your Apify Actor. This is useful for monitoring Actor progress, creating custom APIs, or serving content during the Actor run.
+
+## Introduction
+
Each Actor run on the Apify platform is assigned a unique hard-to-guess URL (for example `https://8segt5i81sokzm.runs.apify.net`), which enables HTTP access to an optional web server running inside the Actor run's container.
The URL is available in the following places:
@@ -17,10 +21,18 @@ The URL is available in the following places:
The web server running inside the container must listen at the port defined by the `Actor.configuration.web_server_port` property. When running Actors locally, the port defaults to `4321`, so the web server will be accessible at `http://localhost:4321`.
-## Example
+## Example Actor
-The following example demonstrates how to start a simple web server in your Actor,which will respond to every GET request with the number of items that the Actor has processed so far:
+The following example demonstrates how to start a simple web server in your Actor, which will respond to every GET request with the number of items that the Actor has processed so far:
{WebserverExample}
+
+## Conclusion
+
+In this guide, you learned how to run a web server inside your Apify Actor. By leveraging the container URL and port provided by the platform, you can expose HTTP endpoints for monitoring, reporting, or serving content during Actor execution. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU).
+
+## Additional resources
+
+- [Apify templates: Standby Python project](https://apify.com/templates/python-standby)
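Stripped of the Apify-specific parts, the server pattern used by the example file below can be sketched with only the standard library (port `0` lets the OS pick a free port here; a real Actor would bind to `Actor.configuration.web_server_port` instead):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

processed_items = 0

class RequestHandler(BaseHTTPRequestHandler):
    """Responds to every GET request with the number of processed items."""

    def do_GET(self) -> None:
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.end_headers()
        self.wfile.write(f'Processed items: {processed_items}'.encode())

    def log_message(self, *args) -> None:
        # Silence the default per-request stderr logging for this sketch.
        pass

# Run the server in a background thread so the "Actor work" can continue.
server = ThreadingHTTPServer(('127.0.0.1', 0), RequestHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

processed_items = 5  # simulate some work being done
with urllib.request.urlopen(f'http://127.0.0.1:{server.server_port}/') as resp:
    body = resp.read().decode()

server.shutdown()
print(body)  # -> Processed items: 5
```

Note that `BaseHTTPRequestHandler` dispatches by HTTP method name, so the handler must be called `do_GET` (uppercase) to be invoked for GET requests.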
diff --git a/website/versioned_docs/version-3.3/03_guides/code/07_webserver.py b/website/versioned_docs/version-3.3/03_guides/code/07_webserver.py
index d4bc0655..66ecfe3c 100644
--- a/website/versioned_docs/version-3.3/03_guides/code/07_webserver.py
+++ b/website/versioned_docs/version-3.3/03_guides/code/07_webserver.py
@@ -7,9 +7,9 @@
http_server = None
-# Just a simple handler that will print the number of processed items so far
-# on every GET request.
class RequestHandler(BaseHTTPRequestHandler):
+ """A handler that prints the number of processed items on every GET request."""
+
    def do_GET(self) -> None:
self.log_request()
self.send_response(200)
@@ -18,8 +18,7 @@ def do_GET(self) -> None:
def run_server() -> None:
- # Start the HTTP server on the provided port,
- # and save a reference to the server.
+ """Start the HTTP server on the provided port, and save a reference to the server."""
global http_server
with ThreadingHTTPServer(
('', Actor.configuration.web_server_port), RequestHandler