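Build a Fuel Price Tracker Using Python
In modern life, fuel has become a necessity for everyone; it underpins the way we live. So let's write a Python script that tracks fuel prices.
Modules needed
- bs4: Beautiful Soup (bs4) is a Python library for extracting data from HTML and XML files. This module is not built into Python. To install it, type the following command in the terminal.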
pip install bs4
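- requests: Requests lets you send HTTP/1.1 requests extremely easily. This module is not built into Python either. To install it, type the following command in the terminal.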
pip install requests
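The script also imports pandas to display the result. pandas is not built into Python either; if it is missing, install it with pip install pandas.
Let's walk through the script step by step.
Step 1: Import all dependencies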
# import module
import pandas as pd
import requests
from bs4 import BeautifulSoup
Step 2: Create a function that fetches a URL
# User-defined function that downloads a page
# and returns its HTML as a string
def getdata(url):
    r = requests.get(url)
    return r.text
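The function above returns whatever the server sends back, including error pages, and it waits indefinitely if the server never answers. A slightly more defensive variant might look like the sketch below. The timeout value of 10 seconds and the optional session parameter are assumptions added for illustration; they are not part of the original script.

```python
import requests


def getdata(url, timeout=10, session=None):
    """Fetch a page and return its HTML text, raising on HTTP errors."""
    # `session` is a hypothetical hook: pass a requests.Session (or a stub
    # in tests); when omitted, the module-level requests.get is used.
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of
    # silently handing an error page to the parser.
    response.raise_for_status()
    return response.text
```

It is called the same way as the original, for example getdata("https://www.goodreturns.in/petrol-price.html"), so the remaining steps do not need to change.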
Step 3: Pass the URL to the getdata() function and parse the returned HTML.
# URL of the page to scrape
htmldata = getdata("https://www.goodreturns.in/petrol-price.html")
soup = BeautifulSoup(htmldata, 'html.parser')
result = soup.find_all("div", class_="gold_silver_table")
print(result)
Output:
[<div class="gold_silver_table"> <table border="0" cellpadding="1" cellspacing="1" width="100%">
<tr class="first"> <td class="heading" width="200">City</td> <td class="heading" width="200">Today Price</td> <td class="heading" width="200">Yesterday's Price</td> </tr>
<tr class="even_row"> <td><a href="/petrol-price-in-new-delhi.html" title="New Delhi">New Delhi</a></td> <td> ₹ 82.08</td> <td> ₹ 82.03</td> </tr>
<tr class="odd_row"> <td><a href="/petrol-price-in-kolkata.html" title="Kolkata">Kolkata</a></td> <td> ₹ 83.57</td> <td> ₹ 83.52</td> </tr>
<tr class="even_row"> <td><a href="/petrol-price-in-mumbai.html" title="Mumbai">Mumbai</a></td> <td> ₹ 88.73</td> <td> ₹ 88.68</td> </tr>
<tr class="odd_row"> <td><a href="/petrol-price-in-chennai.html" title="Chennai">Chennai</a></td> <td> ₹ 85.04</td> <td> ₹ 85.00</td> </tr>
<tr class="even_row"> <td><a href="/petrol-price-in-gurgaon.html" title="Gurgaon">Gurgaon</a></td> <td> ₹ 79.92</td> <td> ₹ 79.84</td> </tr>
<tr class="odd_row"> <td><a href="/petrol-price-in-noida.html" title="Noida">Noida</a></td> <td> ₹ 82.23</td> <td> ₹ 82.30</td> </tr>
<tr class="even_row"> <td><a href="/petrol-price-in-bangalore.html" title="Bangalore">Bangalore</a></td> <td> ₹ 84.75</td> <td> ₹ 84.70</td> </tr>
<tr class="odd_row"> <td><a href="/petrol-price-in-bhubaneswar.html" title="Bhubaneswar">Bhubaneswar</a></td> <td> ₹ 82.47</td> <td> ₹ 82.59</td> </tr>
<tr class="even_row"> <td><a href="/petrol-price-in-chandigarh.html" title="Chandigarh">Chandigarh</a></td> <td> ₹ 78.96</td> <td> ₹ 78.92</td> </tr>
<tr class="odd_row"> <td><a href="/petrol-price-in-hyderabad.html" title="Hyderabad">Hyderabad</a></td> <td> ₹ 85.30</td> <td> ₹ 85.25</td> </tr>
<tr class="even_row"> <td><a href="/petrol-price-in-jaipur.html" title="Jaipur">Jaipur</a></td> <td> ₹ 90.08</td> <td> ₹ 89.24</td> </tr>
<tr class="odd_row"> <td><a href="/petrol-price-in-lucknow.html" title="Lucknow">Lucknow</a></td> <td> ₹ 82.20</td> <td> ₹ 82.09</td> </tr>
<tr class="even_row"> <td><a href="/petrol-price-in-patna.html" title="Patna">Patna</a></td> <td> ₹ 84.73</td> <td> ₹ 84.88</td> </tr>
<tr class="odd_row"> <td><a href="/petrol-price-in-trivandrum.html" title="Trivandrum">Trivandrum</a></td> <td> ₹ 83.91</td> <td> ₹ 84.03</td> </tr>
</table> </div>]
Note: This script only gives you the raw data as markup and strings; you have to extract and format the values to suit your needs.
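One way to turn that raw markup into usable values is to walk the table row by row and read the text of each cell. The sketch below is an alternative to the string-splitting approach in this article, not the article's own method. It runs on a shortened copy of the markup from the output above, so it needs no network access; the helper name table_rows is hypothetical.

```python
from bs4 import BeautifulSoup

# A shortened copy of the table markup shown in the output above
sample_html = """
<div class="gold_silver_table">
<table>
<tr class="first"><td class="heading">City</td><td class="heading">Today Price</td><td class="heading">Yesterday's Price</td></tr>
<tr class="even_row"><td><a href="/petrol-price-in-new-delhi.html" title="New Delhi">New Delhi</a></td><td> ₹ 82.08</td><td> ₹ 82.03</td></tr>
<tr class="odd_row"><td><a href="/petrol-price-in-kolkata.html" title="Kolkata">Kolkata</a></td><td> ₹ 83.57</td><td> ₹ 83.52</td></tr>
</table>
</div>
"""


def table_rows(html):
    """Return the price table as a list of rows, each a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_="gold_silver_table")
    if container is None:
        # The page layout changed or the table is missing
        return []
    rows = []
    for tr in container.find_all("tr"):
        # strip=True removes the whitespace around each cell's text
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)
    return rows


rows = table_rows(sample_html)
print(rows)
```

Because each row is read cell by cell, this version does not depend on where newlines happen to fall in the page source.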
Step 4: Now use soup.find_all() to collect the data you need into a string.
# Declare an empty string and an empty list
mydatastr = ''
result = []

# Find every <tr> in the HTML and
# append its text to one string
for table in soup.find_all('tr'):
    mydatastr += table.get_text()

# Adjust the slices below to fit your needs
mydatastr = mydatastr[1:]
itemlist = mydatastr.split("\n\n")

for item in itemlist[:-5]:
    result.append(item.split("\n"))

result
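The slicing and splitting in this step is easier to follow on a small example. The sketch below assumes that the text of each table row starts and ends with a newline and has one cell per line; that layout is an assumption about the page source, and the sample strings are built by hand from the values shown earlier. The trailing slices used in the article ([:-5] here and [:-8] in the next step) appear to trim extra rows specific to the live page, so they are left out of this sketch.

```python
# Hand-built stand-ins for what get_text() is assumed to return per <tr>
row_texts = [
    "\nCity\nToday Price\nYesterday's Price\n",
    "\nNew Delhi\n₹ 82.08\n₹ 82.03\n",
    "\nKolkata\n₹ 83.57\n₹ 83.52\n",
]

# Concatenating the rows leaves a blank line ("\n\n") between them
mydatastr = "".join(row_texts)

# Drop the single leading newline so the first row starts with "City"
mydatastr = mydatastr[1:]

# Each blank line marks a row boundary
itemlist = mydatastr.split("\n\n")

# Split every row into its cells; strip("\n") removes the newline
# left at the very end of the last row
result = [item.strip("\n").split("\n") for item in itemlist]
print(result)
```

If the live page changes its layout, the newline positions change too, which is why the slice offsets in the article need adjusting to the page you scrape.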
Step 5: Build a DataFrame to display your result.
# Calling DataFrame constructor on list
df = pd.DataFrame(result[:-8])
df
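For tracking purposes it can help to give the columns names and store the prices as numbers rather than strings. The sketch below assumes result holds a header row followed by rows of the form [city, today, yesterday], as in the table shown earlier; the sample rows are copied from that output, and the Change column is an illustrative addition, not part of the original script.

```python
import pandas as pd

# Sample rows copied from the output shown earlier in the article
result = [
    ["City", "Today Price", "Yesterday's Price"],
    ["New Delhi", "₹ 82.08", "₹ 82.03"],
    ["Kolkata", "₹ 83.57", "₹ 83.52"],
    ["Noida", "₹ 82.23", "₹ 82.30"],
]

# Use the first row as column names and the rest as data
df = pd.DataFrame(result[1:], columns=result[0])

# Remove the rupee sign and surrounding spaces, then convert to float
price_columns = ["Today Price", "Yesterday's Price"]
for column in price_columns:
    df[column] = (
        df[column]
        .str.replace("₹", "", regex=False)
        .str.strip()
        .astype(float)
    )

# Day-over-day difference, rounded to two decimals (illustrative)
df["Change"] = (df["Today Price"] - df["Yesterday's Price"]).round(2)
print(df)
```

With numeric columns, sorting by price or filtering for cities whose price rose becomes a one-line pandas operation.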
Complete code:
# Import modules
import requests
import pandas as pd
from bs4 import BeautifulSoup

# Download a page and return its HTML as a string
def getdata(url):
    r = requests.get(url)
    return r.text

# URL of the page to scrape
htmldata = getdata("https://www.goodreturns.in/petrol-price.html")
soup = BeautifulSoup(htmldata, 'html.parser')

# Declare an empty string and an empty list
mydatastr = ''
result = []

# Find every <tr> in the HTML and
# append its text to one string
for table in soup.find_all('tr'):
    mydatastr += table.get_text()

# Adjust the slices below to fit your needs
mydatastr = mydatastr[1:]
itemlist = mydatastr.split("\n\n")

for item in itemlist[:-5]:
    result.append(item.split("\n"))

# Call the DataFrame constructor on the list
df = pd.DataFrame(result[:-8])
df