How to scrape media files using Python?

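Introduction

In real-world enterprise environments, most data is probably not stored in text or Excel files. SQL-based relational databases such as Oracle, SQL Server, PostgreSQL, and MySQL are widely used, and many alternative databases have also become very popular.

The choice of database usually depends on the application's performance, data integrity, and scalability requirements.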

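How to do it

Loading data from SQL into a DataFrame is fairly straightforward, and Pandas provides a few functions that simplify the process.

In this example, we will learn how to create a sqlite3 database. SQLite ships with the standard Python installation and needs no further installation. If you are unsure whether it is available, try the following, which also imports Pandas.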

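import sqlite3
import pandas as pd
# sqlite3.version is the version of the Python sqlite3 module. It is deprecated
# since Python 3.12 and removed in 3.14; on those versions use
# sqlite3.sqlite_version, which reports the SQLite library version instead.
print(f"Output \n {sqlite3.version}")

Output

2.6.0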
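
If the goal is only to confirm that SQLite works on a given installation, one option is to open a throwaway in-memory database and ask the engine for its version. The sketch below is a minimal check and not part of the original walkthrough; the helper name `sqlite_is_available` is made up for illustration.

```python
import sqlite3


def sqlite_is_available() -> bool:
    """Return True if an in-memory SQLite database can be opened and queried."""
    try:
        conn = sqlite3.connect(":memory:")  # nothing is written to disk
    except sqlite3.Error:
        return False
    try:
        # sqlite_version() is a built-in SQL function of the SQLite engine.
        (engine_version,) = conn.execute("select sqlite_version()").fetchone()
        print(f"SQLite engine version: {engine_version}")
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


available = sqlite_is_available()
print(f"sqlite3 usable: {available}")
```

The number printed here is the version of the SQLite library itself, so it will generally differ from the `2.6.0` module version shown above.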

Next, create a connection object and a DataFrame of customer data.

# Connection object
conn = sqlite3.connect("example.db")
# Customer data
customers = pd.DataFrame({
    "customerID": ["a1", "b1", "c1", "d1"],
    "firstName": ["Person1", "Person2", "Person3", "Person4"],
    "state": ["VIC", "NSW", "QLD", "WA"]
})
print(f"Output \n *** Customers info -\n {customers}")

Output

*** Customers info -
   customerID  firstName  state
0          a1    Person1    VIC
1          b1    Person2    NSW
2          c1    Person3    QLD
3          d1    Person4     WA

Then build the orders data in the same way.

# Orders data
orders = pd.DataFrame({
    "customerID": ["a1", "a1", "a1", "d1", "c1", "c1"],
    "productName": ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"]
})

print(f"Output \n *** orders info -\n {orders}")

Output

*** orders info -
   customerID    productName
0          a1      road bike
1          a1  mountain bike
2          a1         helmet
3          d1         gloves
4          c1      road bike
5          c1        glasses

Write both DataFrames to the database as tables.

# Write to the database
customers.to_sql("customers", con=conn, if_exists="replace", index=False)
orders.to_sql("orders", con=conn, if_exists="replace", index=False)
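
To confirm that a write like the one above actually landed, one possible check is to list the tables in the database and read one back. The sketch below is an illustration under assumptions: it uses an in-memory database instead of `example.db` so that no file is left behind, and it only writes the customers table.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory database stands in for example.db in this sketch.
conn = sqlite3.connect(":memory:")

customers = pd.DataFrame({
    "customerID": ["a1", "b1", "c1", "d1"],
    "firstName": ["Person1", "Person2", "Person3", "Person4"],
    "state": ["VIC", "NSW", "QLD", "WA"],
})
customers.to_sql("customers", con=conn, if_exists="replace", index=False)

# sqlite_master is SQLite's built-in catalog of tables, indexes and views.
tables = pd.read_sql_query(
    "select name from sqlite_master where type = 'table' order by name;",
    con=conn,
)
table_names = tables["name"].tolist()
print(table_names)

# Read the table back and compare its values with the DataFrame that was written.
stored = pd.read_sql_query("select * from customers;", con=conn)
round_trip_ok = stored.to_dict("list") == customers.to_dict("list")
print(f"rows stored: {len(stored)}, round trip matches: {round_trip_ok}")

# if_exists="replace" drops the table before inserting, so rerunning the write
# does not duplicate rows; if_exists="append" adds the same rows a second time.
customers.to_sql("customers", con=conn, if_exists="append", index=False)
row_count = conn.execute("select count(*) from customers;").fetchone()[0]
print(f"rows after append: {row_count}")

conn.close()
```

Using `if_exists="replace"`, as the walkthrough does, keeps the script safe to rerun, at the cost of discarding whatever the table held before.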

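Define a query that counts the products ordered by each customer.

# Create the SQL to fetch the data.
# Grouping by both selected columns keeps the query valid standard SQL and
# keeps customers who share a first name apart.
q = """
select orders.customerID, customers.firstName, count(*) as productQuantity
from orders
left join customers
on orders.customerID = customers.customerID
group by orders.customerID, customers.firstName;
"""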

Run the query and load the result into a DataFrame.

# Execute the SQL.
pd.read_sql_query(q, con=conn)
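
The query above is a fixed string. When a value has to come from a variable, `pd.read_sql_query` accepts a `params` argument that is passed through to the database driver. The following is a minimal sketch under assumptions: it uses an in-memory database rather than `example.db`, and the names `q_one_customer` and `wanted_customer` are hypothetical.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory database replaces example.db for this sketch.
conn = sqlite3.connect(":memory:")

orders = pd.DataFrame({
    "customerID": ["a1", "a1", "a1", "d1", "c1", "c1"],
    "productName": ["road bike", "mountain bike", "helmet",
                    "gloves", "road bike", "glasses"],
})
orders.to_sql("orders", con=conn, if_exists="replace", index=False)

# The ? placeholder is filled in by the sqlite3 driver, which avoids
# building SQL through string formatting.
q_one_customer = """
select customerID, count(*) as productQuantity
from orders
where customerID = ?
group by customerID;
"""

wanted_customer = "c1"  # hypothetical filter value
result = pd.read_sql_query(q_one_customer, con=conn, params=(wanted_customer,))
print(result)

conn.close()
```

Keeping the value out of the SQL text means the same query string can be reused for any customer ID.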

Example

Putting it all together.

import sqlite3
import pandas as pd
# sqlite3.version is deprecated since Python 3.12.
print(f"Output \n {sqlite3.version}")
# Connection object
conn = sqlite3.connect("example.db")
# Customer data
customers = pd.DataFrame({
    "customerID": ["a1", "b1", "c1", "d1"],
    "firstName": ["Person1", "Person2", "Person3", "Person4"],
    "state": ["VIC", "NSW", "QLD", "WA"]
})

print(f"*** Customers info -\n {customers}")

# Orders data
orders = pd.DataFrame({
    "customerID": ["a1", "a1", "a1", "d1", "c1", "c1"],
    "productName": ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"]
})

print(f"*** orders info -\n {orders}")

# Write to the database
customers.to_sql("customers", con=conn, if_exists="replace", index=False)
orders.to_sql("orders", con=conn, if_exists="replace", index=False)

# Create the SQL to fetch the data.
# Grouping by both selected columns keeps customers with the same first name apart.
q = """
select orders.customerID, customers.firstName, count(*) as productQuantity
from orders
left join customers
on orders.customerID = customers.customerID
group by orders.customerID, customers.firstName;
"""

# Execute the SQL and print the result.
print(pd.read_sql_query(q, con=conn))

# Release the database connection.
conn.close()

Output

2.6.0
*** Customers info -
   customerID  firstName  state
0          a1    Person1    VIC
1          b1    Person2    NSW
2          c1    Person3    QLD
3          d1    Person4     WA
*** orders info -
   customerID    productName
0          a1      road bike
1          a1  mountain bike
2          a1         helmet
3          d1         gloves
4          c1      road bike
5          c1        glasses
   customerID  firstName  productQuantity
0          a1    Person1                3
1          c1    Person3                2
2          d1    Person4                1
