How to Use Boto3 to Update the Schedule of a Crawler in the AWS Glue Data Catalog

This article shows how to update the schedule of an AWS Glue crawler in your AWS account.

Example

Problem statement: Use the boto3 library in Python to update the schedule of a crawler.

Approach/algorithm to solve this problem

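Step 1: Import boto3 and the botocore exceptions so that errors can be handled.
Step 2: The function takes two parameters: crawler_name and scheduler.
scheduler must be in the form cron(cron_expression). A Glue cron expression has six fields (minutes, hours, day of month, month, day of week, year). For example, cron(15 12 * * ? *) runs the crawler every day at 12:15 UTC.
Step 3: Create an AWS session using the boto3 library. Make sure region_name is set in the default profile. If it is not, pass region_name explicitly when creating the session.
Step 4: Create an AWS client for glue.
Step 5: Call the update_crawler_schedule function, passing crawler_name as CrawlerName and scheduler as Schedule.
Step 6: The call updates the crawler's schedule and returns the response metadata.
Step 7: Handle the generic exception in case anything goes wrong while updating the crawler's schedule.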
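The six-field cron format in Step 2 is easy to get wrong, so here is a minimal sketch of a helper that builds a daily schedule string. The function name daily_glue_cron is hypothetical and not part of boto3; it only formats a string and makes no AWS call.

```python
def daily_glue_cron(hour: int, minute: int) -> str:
    """Build a Glue schedule string that fires once a day at hour:minute UTC.

    Hypothetical helper for illustration; it is not part of boto3.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if not 0 <= minute <= 59:
        raise ValueError("minute must be in 0..59")
    # Field order: minutes, hours, day-of-month, month, day-of-week, year.
    # Day-of-week is "?" (no specific value) because day-of-month is "*".
    return f"cron({minute} {hour} * * ? *)"


print(daily_glue_cron(12, 15))  # cron(15 12 * * ? *)
```

The resulting string can be passed as the scheduler argument described in Step 2.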

Example code

The following code updates the schedule of a crawler:

import boto3
from botocore.exceptions import ClientError


def update_scheduler_of_a_crawler(crawler_name, scheduler):
    """Set the schedule of an existing Glue crawler and return the API response."""
    # The region comes from the default profile; otherwise pass region_name to Session().
    session = boto3.session.Session()
    glue_client = session.client('glue')
    try:
        response = glue_client.update_crawler_schedule(
            CrawlerName=crawler_name, Schedule=scheduler)
        return response
    except ClientError as e:
        raise Exception("boto3 client error in update_scheduler_of_a_crawler: " + str(e)) from e
    except Exception as e:
        raise Exception("Unexpected error in update_scheduler_of_a_crawler: " + str(e)) from e


print(update_scheduler_of_a_crawler("Data Dimension", "cron(15 12 * * ? *)"))
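Two points from the steps are not exercised by the code above: passing region_name explicitly, and confirming that the new schedule took effect. The sketch below covers both under stated assumptions. The helper names make_glue_client and get_crawler_schedule are hypothetical, and the region value is whatever your account uses. It relies on the Glue get_crawler call, whose response carries the schedule under Crawler -> Schedule.

```python
def make_glue_client(region_name):
    """Create a Glue client for an explicit region (hypothetical helper)."""
    # boto3 is imported here so the helper below can be tried without AWS access.
    import boto3

    session = boto3.session.Session(region_name=region_name)
    return session.client("glue")


def get_crawler_schedule(glue_client, crawler_name):
    """Return (schedule_expression, state) for a crawler, or (None, None) if unscheduled."""
    response = glue_client.get_crawler(Name=crawler_name)
    # "Schedule" is absent when the crawler has no schedule attached.
    schedule = response["Crawler"].get("Schedule") or {}
    return schedule.get("ScheduleExpression"), schedule.get("State")
```

Taking the client as a parameter keeps the lookup independent of how the session was created, so the same function works with the default profile or an explicit region.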

Output

{'ResponseMetadata': {'RequestId': '73e50130-*****************8e', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sun, 28 Mar 2021 07:26:55 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2', 'connection': 'keep-alive', 'x-amzn-requestid': '73e50130-***************8e'}, 'RetryAttempts': 0}}
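The response contains only ResponseMetadata, so there is little to inspect beyond the status code and request ID. A minimal sketch of pulling those out follows; the function name summarize_response is hypothetical. boto3 normally raises ClientError for failed calls, so this is a convenience for logging rather than a required check.

```python
def summarize_response(response):
    """Extract the HTTP status, request ID, and retry count from a boto3 response dict."""
    metadata = response.get("ResponseMetadata", {})
    return {
        "ok": metadata.get("HTTPStatusCode") == 200,
        "request_id": metadata.get("RequestId"),
        "retries": metadata.get("RetryAttempts", 0),
    }
```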