How to load data from a SQL database into a pandas DataFrame with Python?
Introduction
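In real-world enterprise environments, most data is usually not stored in text or Excel files. SQL-based relational databases such as Oracle, SQL Server, PostgreSQL, and MySQL are widely used, and many alternative databases have also become very popular.
The choice of database usually depends on the application's requirements for performance, data integrity, and scalability.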
How to do it
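In this example, we will learn how to create an sqlite3 database. The sqlite3 module ships with Python's standard library, so no further installation is needed. If you are unsure whether it is available, try the check below. We will also import pandas.
Loading data from SQL into a DataFrame is fairly simple, and pandas provides functions such as DataFrame.to_sql() and pd.read_sql_query() that streamline the process.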
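To make that concrete, here is a minimal sketch comparing the manual route (run the query with a sqlite3 cursor, then build the DataFrame from the rows and cursor.description) with a single pd.read_sql_query() call. The in-memory database and the trimmed two-row customers table are assumptions made only for illustration.

```python
import sqlite3

import pandas as pd

# In-memory database so the sketch leaves no file behind (assumption for illustration)
conn = sqlite3.connect(":memory:")
conn.execute("create table customers (customerID text, firstName text, state text)")
conn.executemany(
    "insert into customers values (?, ?, ?)",
    [("a1", "Person1", "VIC"), ("b1", "Person2", "NSW")],
)
conn.commit()

query = "select customerID, firstName, state from customers"

# Manual route: execute with a cursor, read column names from cursor.description
cursor = conn.execute(query)
columns = [col[0] for col in cursor.description]
manual_df = pd.DataFrame(cursor.fetchall(), columns=columns)

# pandas route: one call runs the query and builds the DataFrame
pandas_df = pd.read_sql_query(query, conn)

print(manual_df.values.tolist() == pandas_df.values.tolist())  # True
conn.close()
```

Both routes yield the same rows and column names; read_sql_query simply saves the cursor bookkeeping.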
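```python
import sqlite3
import pandas as pd

# sqlite3.version is the version of the sqlite3 (pysqlite) module itself.
# It is deprecated since Python 3.12 and scheduled for removal in 3.14;
# sqlite3.sqlite_version reports the version of the SQLite library instead.
print(sqlite3.version)
```
Output
```
2.6.0
```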
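On Python versions where sqlite3.version is deprecated or gone, a sketch of an equivalent availability check that relies only on the SQLite library version attributes of the standard sqlite3 module:

```python
import sqlite3

# Version string of the SQLite C library that Python's sqlite3 module is linked against
print(sqlite3.sqlite_version)

# The same version as a tuple of ints, convenient for comparisons
print(sqlite3.sqlite_version_info)
```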
```python
# Connection object; creates example.db in the current directory if it does not exist
conn = sqlite3.connect("example.db")

# Customer data
customers = pd.DataFrame({
    "customerID": ["a1", "b1", "c1", "d1"],
    "firstName": ["Person1", "Person2", "Person3", "Person4"],
    "state": ["VIC", "NSW", "QLD", "WA"],
})
print(f"*** Customers info -\n{customers}")
```
Output
```
*** Customers info -
  customerID firstName state
0         a1   Person1   VIC
1         b1   Person2   NSW
2         c1   Person3   QLD
3         d1   Person4    WA
```
```python
# Order data
orders = pd.DataFrame({
    "customerID": ["a1", "a1", "a1", "d1", "c1", "c1"],
    "productName": ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"],
})
print(f"*** orders info -\n{orders}")
```
Output
```
*** orders info -
  customerID   productName
0         a1     road bike
1         a1 mountain bike
2         a1        helmet
3         d1        gloves
4         c1     road bike
5         c1       glasses
```
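```python
# Write both DataFrames to the database as tables;
# if_exists="replace" drops any existing table with the same name first
customers.to_sql("customers", con=conn, if_exists="replace", index=False)
orders.to_sql("orders", con=conn, if_exists="replace", index=False)
```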
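A minimal sketch of how if_exists affects repeated writes, and how to confirm which tables exist by querying SQLite's built-in sqlite_master catalog. The in-memory database and two-row table are assumptions for illustration.

```python
import sqlite3

import pandas as pd

# In-memory database and a trimmed customers table, assumed for illustration only
conn = sqlite3.connect(":memory:")
customers = pd.DataFrame({"customerID": ["a1", "b1"], "firstName": ["Person1", "Person2"]})

# "replace" drops and recreates the table, so writing twice still leaves 2 rows
customers.to_sql("customers", con=conn, if_exists="replace", index=False)
customers.to_sql("customers", con=conn, if_exists="replace", index=False)
rows_after_replace = pd.read_sql_query("select count(*) as n from customers", conn)["n"].iloc[0]

# "append" inserts the rows again into the existing table
customers.to_sql("customers", con=conn, if_exists="append", index=False)
rows_after_append = pd.read_sql_query("select count(*) as n from customers", conn)["n"].iloc[0]

# sqlite_master lists the schema objects (tables, indexes, ...) in the database
tables = pd.read_sql_query(
    "select name from sqlite_master where type = 'table'", conn
)["name"].tolist()

print(tables)              # ['customers']
print(rows_after_replace)  # 2
print(rows_after_append)   # 4
conn.close()
```

The default, if_exists="fail", raises an error instead if the table already exists.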
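```python
# SQL that joins each order to its customer and counts the orders per customer.
# Grouping by both selected columns avoids relying on SQLite's permissive
# handling of non-aggregated columns; ORDER BY makes the row order deterministic.
q = """
select orders.customerID, customers.firstName, count(*) as productQuantity
from orders
left join customers
    on orders.customerID = customers.customerID
group by orders.customerID, customers.firstName
order by orders.customerID;
"""
```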
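As a cross-check of what the query computes, here is a sketch that reproduces the same LEFT JOIN and per-customer COUNT(*) in pandas with DataFrame.merge() and groupby(). It is an alternative written in pandas, not what read_sql_query does internally, and it redefines the two DataFrames so it runs on its own.

```python
import pandas as pd

customers = pd.DataFrame({
    "customerID": ["a1", "b1", "c1", "d1"],
    "firstName": ["Person1", "Person2", "Person3", "Person4"],
    "state": ["VIC", "NSW", "QLD", "WA"],
})
orders = pd.DataFrame({
    "customerID": ["a1", "a1", "a1", "d1", "c1", "c1"],
    "productName": ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"],
})

# LEFT JOIN: keep every order row and attach the matching customer columns
joined = orders.merge(customers, on="customerID", how="left")

# GROUP BY customer, then count the joined rows in each group (like COUNT(*))
result = (
    joined.groupby(["customerID", "firstName"], as_index=False)
    .size()
    .rename(columns={"size": "productQuantity"})
)
print(result)
```

One difference to keep in mind: groupby() drops groups whose key is missing (NaN) by default, whereas SQL groups NULLs together; pass dropna=False to groupby() to match SQL.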
```python
# Execute the SQL and load the result into a DataFrame
result = pd.read_sql_query(q, con=conn)
print(result)
```
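read_sql_query also accepts a params argument. A minimal sketch, assuming an in-memory database that holds only the orders table, that filters by customer using sqlite3's ? placeholder rather than formatting the value into the SQL string:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # in-memory database, assumed for illustration
orders = pd.DataFrame({
    "customerID": ["a1", "a1", "a1", "d1", "c1", "c1"],
    "productName": ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"],
})
orders.to_sql("orders", conn, index=False)

# "?" is sqlite3's placeholder style; the driver binds the value separately,
# so it is never pasted into the SQL text
q = "select productName from orders where customerID = ? order by rowid"
a1_orders = pd.read_sql_query(q, con=conn, params=("a1",))
print(a1_orders["productName"].tolist())  # ['road bike', 'mountain bike', 'helmet']
conn.close()
```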
Example
Putting it all together:
```python
import sqlite3
import pandas as pd

# Version of the sqlite3 module (deprecated since Python 3.12)
print(sqlite3.version)

# Connection object
conn = sqlite3.connect("example.db")

# Customer data
customers = pd.DataFrame({
    "customerID": ["a1", "b1", "c1", "d1"],
    "firstName": ["Person1", "Person2", "Person3", "Person4"],
    "state": ["VIC", "NSW", "QLD", "WA"],
})
print(f"*** Customers info -\n{customers}")

# Order data
orders = pd.DataFrame({
    "customerID": ["a1", "a1", "a1", "d1", "c1", "c1"],
    "productName": ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"],
})
print(f"*** orders info -\n{orders}")

# Write both DataFrames to the database, replacing any existing tables
customers.to_sql("customers", con=conn, if_exists="replace", index=False)
orders.to_sql("orders", con=conn, if_exists="replace", index=False)

# SQL that counts the ordered products per customer
q = """
select orders.customerID, customers.firstName, count(*) as productQuantity
from orders
left join customers
    on orders.customerID = customers.customerID
group by orders.customerID, customers.firstName
order by orders.customerID;
"""

# Execute the SQL and print the resulting DataFrame
print(pd.read_sql_query(q, con=conn))

conn.close()
```
Output
```
2.6.0
*** Customers info -
  customerID firstName state
0         a1   Person1   VIC
1         b1   Person2   NSW
2         c1   Person3   QLD
3         d1   Person4    WA
*** orders info -
  customerID   productName
0         a1     road bike
1         a1 mountain bike
2         a1        helmet
3         d1        gloves
4         c1     road bike
5         c1       glasses
  customerID firstName productQuantity
0         a1   Person1               3
1         c1   Person3               2
2         d1   Person4               1
```
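Closing the connection when you are done releases the database file. One way to guarantee that is contextlib.closing, sketched here with an in-memory database (an assumption for illustration). Note that using a sqlite3 connection directly in a with statement only commits or rolls back the transaction; it does not close the connection.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# closing() calls conn.close() when the block exits, even if an exception is raised
with closing(sqlite3.connect(":memory:")) as conn:
    customers = pd.DataFrame({"customerID": ["a1", "b1"], "firstName": ["Person1", "Person2"]})
    customers.to_sql("customers", conn, index=False)
    loaded = pd.read_sql_query("select * from customers", conn)

# The DataFrame is an in-memory copy, so it stays usable after the connection closes
print(loaded.shape)  # (2, 2)
```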