Connecting Pandas to a Database with SQLAlchemy

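In this article, we discuss how to connect pandas to a database and perform database operations using SQLAlchemy.

The first step is to establish a connection to your existing database with SQLAlchemy's create_engine() function.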

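Syntax:

from sqlalchemy import create_engine

engine = create_engine("dialect+driver://username:password@host:port/database")

Explanation:

dialect – the name of the DBMS (for example, postgresql)
driver – the name of the DB API that moves information between SQLAlchemy and the database
username, password – the database user's credentials
host:port – the host name and port number of the database server
database – the name of the database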
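The parts of the connection URL described above can be inspected without connecting to anything. The following is a minimal sketch using SQLAlchemy's make_url() helper; the user alice, the password secret and the database loans are hypothetical values chosen only for illustration.

```python
from sqlalchemy.engine import make_url

# Hypothetical credentials and database name, used only for illustration.
url = make_url("postgresql+psycopg2://alice:secret@localhost:5432/loans")

# Split the URL into the components listed in the explanation.
print(url.get_backend_name())  # the dialect part
print(url.get_driver_name())   # the driver part
print(url.username, url.host, url.port, url.database)
```

Parsing the URL this way is a quick check for a missing colon or slash before the string is handed to create_engine().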


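Syntax: pandas.DataFrame.to_sql(table_name, engine_name, if_exists, index)

Explanation:

table_name – the name of the table to write to (the parameter is called name in pandas)
engine_name – the engine connected to the database (the parameter is called con in pandas)
if_exists – by default ('fail'), pandas raises an error if table_name already exists. Use 'replace' to drop the existing table and write this dataset in its place, or 'append' to add the rows to the existing table.
index – (bool) whether to write the DataFrame index as a column in the table; defaults to True.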
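To make the if_exists options concrete, here is a minimal sketch that runs against an in-memory SQLite database instead of PostgreSQL, so it needs no server. The table name loan_demo and the sample rows are made up for illustration.

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for a real server (an assumption for this sketch).
engine = create_engine("sqlite:///:memory:")
df = pd.DataFrame({"Loan_ID": ["LP001", "LP002"], "LoanAmount": [128, 66]})

# First write creates the table.
df.to_sql("loan_demo", engine, index=False)

# Writing again with the default if_exists='fail' raises ValueError.
try:
    df.to_sql("loan_demo", engine, index=False)
    raised_on_existing = False
except ValueError:
    raised_on_existing = True

# 'append' adds the rows to the existing table.
df.to_sql("loan_demo", engine, if_exists="append", index=False)
rows_after_append = len(pd.read_sql("SELECT * FROM loan_demo", con=engine))

# 'replace' drops the table and writes the DataFrame afresh.
df.to_sql("loan_demo", engine, if_exists="replace", index=False)
rows_after_replace = len(pd.read_sql("SELECT * FROM loan_demo", con=engine))

print(raised_on_existing, rows_after_append, rows_after_replace)
```

Passing index=False keeps the DataFrame's default integer index out of the table, which is usually what you want when the data already has its own key column.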

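In this example we use a PostgreSQL database, which is one of the simplest options to work with; the procedure that follows is the same for every other database SQLAlchemy supports.

Let's first import the necessary packages. Next, we establish a connection to the PostgreSQL database, using the psycopg2 driver so that it can interact with Python. Then we load the DataFrame and push it to the PostgreSQL database with the to_sql() function, as shown below.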

# import the necessary packages
import pandas
import psycopg2  # PostgreSQL driver named in the connection URL
from sqlalchemy import create_engine

# establish a connection with the database
# (replace username, password, hostname, portnumber and databasename
# with your own values)
engine = create_engine(
    "postgresql+psycopg2://username:password@hostname:portnumber/databasename")

# read the dataset into a pandas dataframe
data = pandas.read_csv("path to dataset")

# write the pandas dataframe to a postgresql table
data.to_sql('loan_data', engine, if_exists='replace')

Output:

This creates a table named loan_data in the PostgreSQL database.
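One way to confirm that the table was created is SQLAlchemy's inspector. The sketch below assumes an in-memory SQLite database and a two-column stand-in for the loan data, so that it runs without a PostgreSQL server; against a real database only the connection URL would differ.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect

# In-memory SQLite stands in for PostgreSQL (an assumption for this sketch).
engine = create_engine("sqlite:///:memory:")

# A tiny made-up sample in place of the real dataset.
sample = pd.DataFrame({"Loan_ID": ["LP001"], "Loan_Status": ["Y"]})
sample.to_sql("loan_data", engine, if_exists="replace", index=False)

# Ask the database which tables and columns it now has.
inspector = inspect(engine)
table_exists = inspector.has_table("loan_data")
column_names = [col["name"] for col in inspector.get_columns("loan_data")]

print(table_exists, column_names)
```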


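Converting a PostgreSQL table to a pandas DataFrame

Just as we wrote a DataFrame to the database above, we can also convert a PostgreSQL table into a pandas DataFrame with the read_sql_table() function. Here, let's read the loan_data table, as shown below.

Syntax: pandas.read_sql_table(table_name, con = engine_name, columns)

table_name – the name of the table to read
con – the engine connected to the database
columns – the list of columns to read from the SQL table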

# import the necessary packages
import pandas as pd
import psycopg2  # PostgreSQL driver named in the connection URL
from sqlalchemy import create_engine

# establish a connection with the database
# (replace the placeholders with your own values)
engine = create_engine(
    "postgresql+psycopg2://username:password@hostname:portnumber/databasename")

# read the postgresql table, selecting the columns to load
table_df = pd.read_sql_table(
    "loan_data",
    con=engine,
    columns=['Loan_ID',
             'Gender',
             'Married',
             'Dependents',
             'Education',
             'Self_Employed',
             'ApplicantIncome',
             'CoapplicantIncome',
             'LoanAmount',
             'Loan_Amount_Term',
             'Credit_History',
             'Property_Area',
             'Loan_Status'],
)

# print the postgresql table loaded as
# pandas dataframe
print(table_df)

Output:

The loan_data table read from PostgreSQL, printed as a pandas DataFrame.
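The columns argument can also be used to load only part of a table. The following self-contained sketch assumes an in-memory SQLite database and three made-up rows in place of the real loan_data table, and reads back two of its three columns.

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for PostgreSQL (an assumption for this sketch).
engine = create_engine("sqlite:///:memory:")

# Made-up rows that reuse a few column names from the loan_data example.
source_df = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003"],
    "Gender": ["Male", "Female", "Male"],
    "LoanAmount": [128, 66, 120],
})
source_df.to_sql("loan_data", engine, index=False)

# Read the table back, keeping only the listed columns.
subset_df = pd.read_sql_table(
    "loan_data",
    con=engine,
    columns=["Loan_ID", "LoanAmount"],
)

print(subset_df)
```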

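Passing a SQL query to query table data

We can also pass a SQL query to the read_sql() function to read only specific columns or records from the PostgreSQL database. The procedure stays the same, and the query uses standard SQL syntax. The example below shows how to fetch all records of the loan_data table with a SQL query.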

# import the necessary packages
import pandas as pd
import psycopg2  # PostgreSQL driver named in the connection URL
from sqlalchemy import create_engine

# establish a connection with the database
# (replace the placeholders with your own values)
engine = create_engine(
    "postgresql+psycopg2://username:password@hostname:portnumber/databasename")

# read table data using a sql query
sql_df = pd.read_sql(
    "SELECT * FROM loan_data",
    con=engine
)

print(sql_df)

Output:

All records of the loan_data table, printed as a pandas DataFrame.
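To read only specific columns or records rather than the whole table, the query can name the columns and filter with a WHERE clause. The sketch below is one way to do that, assuming an in-memory SQLite database, made-up rows, and a hypothetical threshold of 100; it binds the threshold as a parameter through sqlalchemy.text() instead of formatting it into the SQL string.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for PostgreSQL (an assumption for this sketch).
engine = create_engine("sqlite:///:memory:")

# Made-up rows that reuse a few column names from the loan_data example.
pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003"],
    "Gender": ["Male", "Female", "Male"],
    "LoanAmount": [128, 66, 120],
}).to_sql("loan_data", engine, index=False)

# Select two columns and only the rows above the (hypothetical) threshold.
query = text(
    "SELECT Loan_ID, LoanAmount FROM loan_data "
    "WHERE LoanAmount > :min_amount ORDER BY Loan_ID"
)
filtered_df = pd.read_sql(query, con=engine, params={"min_amount": 100})

print(filtered_df)
```

Bound parameters let the database driver handle quoting, which avoids SQL injection when the value comes from user input.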