Ruby HTTPClient tutorial

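In this tutorial, we show how to use the Ruby HTTPClient module. We fetch data, post data, work with cookies, and connect to a secure web page.

The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web.

Ruby HTTPClient provides methods for accessing web resources via HTTP. It brings the functionality of libwww-perl (LWP) to Ruby.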

$ sudo gem install httpclient

The module is installed with the sudo gem install httpclient command.

$ service nginx status
 * nginx is running

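We run the nginx web server on localhost. Some of our examples connect to PHP scripts on this locally running nginx server.

Version

The first program prints the version of the library and of the Ruby language.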

version.rb

#!/usr/bin/ruby

require 'httpclient'

puts HTTPClient::LIB_NAME
puts HTTPClient::RUBY_VERSION_STRING
puts HTTPClient::VERSION

These three constants provide the library name, the Ruby version string, and the library version.

$ ./version.rb 
(2.8.0, ruby 1.9.3 (2013-11-22))
ruby 1.9.3 (2013-11-22)
2.8.0

This is a sample output of the example.

The get_content method

get_content is a high-level method that fetches the document identified by the given URL.

get_content.rb

#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new
cont = client.get_content 'http://www.something.com'

puts cont

The script fetches the content of the www.something.com web page.

cont = client.get_content 'http://www.something.com'

The get_content method returns the result as a single string.

$ ./get_content.rb 
<html><head><title>Something.</title></head>
<body>Something.</body>
</html>

This is the output of the get_content.rb script.

The following program fetches a small web page and strips its HTML tags.

strip_tags.rb

#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

client.get_content('http://www.something.com') do |chunk|
    puts chunk.gsub(%r{</?[^>]+?>}, '')
end

The script strips the HTML tags of the www.something.com web page.

client.get_content('http://www.something.com') do |chunk|
    puts chunk.gsub(%r{</?[^>]+?>}, '')
end

A simple regular expression is used to strip the HTML tags. When a block is given, the get_content method passes the content to the block in string chunks.

$ ./strip_tags.rb 
Something.
Something.

The script prints the title and the body text of the web page.
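
The tag-stripping expression can also be tried without a network connection. The following is a minimal sketch that applies the same regular expression to a literal string; the strip_tags helper and the sample variable are names assumed for illustration and are not part of HTTPClient.

```ruby
#!/usr/bin/ruby

# The same pattern as in strip_tags.rb: an opening or a closing tag.
TAG_PATTERN = %r{</?[^>]+?>}

# Hypothetical helper (not part of HTTPClient): removes tags from a string.
def strip_tags(html)
    html.gsub(TAG_PATTERN, '')
end

# Sample markup resembling the page used in the examples above.
sample = "<html><head><title>Something.</title></head>\n" \
         "<body>Something.</body>\n</html>"

puts strip_tags(sample)
```

A pattern like this is only adequate for very simple pages. A tag that happens to be split across two chunks would not be matched, so a real HTML parser is the safer choice for anything more complex.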

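Request

An HTTP request is a message sent from the client to the server to retrieve some information or to perform some action.

HTTPClient's request method creates a new request. Note that the HTTPClient class also has methods such as get, post, or put, which save us some typing.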

create_request.rb

#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new
method = 'GET'
url = URI.parse 'http://www.something.com'

res = client.request method, url
puts res.body

The example creates a GET request and sends it to http://www.something.com.

method = 'GET'
url = URI.parse 'http://www.something.com'

We define the request method and the URL.

res = client.request method, url

The request is issued with the request method.

puts res.body

The body attribute of the response message contains the body of the message.

$ ./create_request.rb 
<html><head><title>Something.</title></head>
<body>Something.</body>
</html>

This is the output of the example.
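
To see how a method and a URL map onto the message that goes over the wire, here is a rough sketch that builds only the request line and the Host header. The request_head helper is a hypothetical name introduced for this illustration; a real client such as HTTPClient sends additional headers.

```ruby
#!/usr/bin/ruby

require 'uri'

# Hypothetical helper: builds the first two lines of an HTTP/1.1 request.
def request_head(method, url)
    uri = URI.parse url
    # An empty path means the root document.
    path = uri.path.empty? ? '/' : uri.path
    path += "?#{uri.query}" if uri.query
    "#{method} #{path} HTTP/1.1\r\nHost: #{uri.host}\r\n"
end

puts request_head('GET', 'http://www.something.com')
```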

Status

HTTP::Message represents an HTTP request or response. Its status method returns the HTTP status code of the response.

status.rb

#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

res = client.get 'http://www.something.com'
puts res.status
puts HTTP::Status::successful? res.status

res = client.get 'http://www.something.com/news/'
puts res.status
puts HTTP::Status::successful? res.status

res = client.get 'http://www.urbandicionary.com/define.php?term=Dog'
puts res.status
puts HTTP::Status::successful? res.status

We perform three HTTP requests with the get method and check the returned status.

puts HTTP::Status::successful? res.status

The HTTP::Status::successful? method tells whether the status code indicates success.

$ ./status.rb 
200
true
404
false
302
false

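200 is the standard response for a successful HTTP request, 404 indicates that the requested resource could not be found, and 302 indicates that the resource has been temporarily redirected.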
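
HTTP groups status codes into classes by their first digit. The sketch below names the class of each code seen in the output above. The status_class helper is a hypothetical name, and this is not necessarily how HTTP::Status::successful? is implemented inside the library.

```ruby
#!/usr/bin/ruby

# Hypothetical helper (not part of HTTPClient): names the class of a
# status code based on the range it falls into.
def status_class(code)
    case code
    when 100..199 then :informational
    when 200..299 then :successful
    when 300..399 then :redirection
    when 400..499 then :client_error
    when 500..599 then :server_error
    else :unknown
    end
end

[200, 404, 302].each do |code|
    puts "#{code}: #{status_class(code)}"
end
```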

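The `head` method

The head method retrieves the document headers. The headers consist of fields such as the date, server, content type, or last modification time.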

head.rb

#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

res = client.head 'http://www.something.com'

puts "Server: " + res.header['Server'][0]
puts "Last modified: " + res.header['Last-Modified'][0]
puts "Content type: " + res.header['Content-Type'][0]
puts "Content length: " + res.header['Content-Length'][0]

The example prints the server, the last modification time, the content type, and the content length of the www.something.com web page.

$ ./head.rb 
Server: Apache/2.4.12 (FreeBSD) OpenSSL/1.0.1l-freebsd mod_fastcgi/mod_fastcgi-SNAP-0910052141
Last modified: Mon, 25 Oct 1999 15:36:02 GMT
Content type: text/html
Content length: 77

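This is the output of the head.rb program.

The `get` method

The get method issues a GET request to the server. The GET method requests a representation of the specified resource.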

greet.php

<?php

echo "Hello " . htmlspecialchars($_GET['name']);

?>

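In the /usr/share/nginx/html/ directory, we have this greet.php file. The script returns the value of the name variable, which was received from the client. The htmlspecialchars() function converts special characters to HTML entities; for example, & becomes &amp;.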

mget.rb

#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

res = client.get 'http://localhost/greet.php?name=Jan'

puts res.body

The script sends a variable with a value to the PHP script on the server. The variable is specified directly in the URL.

$ ./mget.rb 
Hello Jan

This is the output of the example.

$ tail -1 /var/log/nginx/access.log
127.0.0.1 - - [08/May/2016:13:15:31 +0200] "GET /greet.php?name=Jan HTTP/1.1" 200 19 "-" 
    "HTTPClient/1.0 (2.8.0, ruby 1.9.3 (2013-11-22))"

We inspect the nginx access log.

The get method takes a second parameter, in which we can specify the query parameters.

mget2.rb

#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

query = {'name' => 'Jan'}
res = client.get 'http://localhost/greet.php', query

puts res.body

This example is essentially the same as the previous one.

$ ./mget2.rb 
Hello Jan
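
To illustrate what happens to the query hash, the following sketch uses Ruby's standard uri library to turn a hash into a query string. The build_url helper is a hypothetical name; HTTPClient performs its own encoding internally, which is assumed here to give an equivalent result for simple values.

```ruby
#!/usr/bin/ruby

require 'uri'

# Hypothetical helper: appends a hash of query parameters to a URL.
def build_url(base, query)
    uri = URI.parse base
    # encode_www_form escapes reserved characters in keys and values.
    uri.query = URI.encode_www_form query
    uri.to_s
end

puts build_url('http://localhost/greet.php', {'name' => 'Jan'})
puts build_url('http://localhost/greet.php', {'name' => 'Jan & Co'})
```

Passing a hash instead of concatenating strings avoids broken URLs when a value contains characters such as & or a space.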

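Redirection

Redirection is the process of forwarding one URL to another URL. The HTTP response status code 301 Moved Permanently is used for permanent URL redirection.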

location = /oldpage.html {

        return 301 /files/newpage.html;
}

Add these lines to the nginx configuration file, which is located at /etc/nginx/sites-available/default on Debian.

$ sudo service nginx restart

After editing the file, we must restart nginx to apply the changes.

newpage.html

<!DOCTYPE html>
<html>
<head>
<title>New page</title>
</head>
<body>
<p>
This is a new page
</p>
</body>
</html>

This is the newpage.html file located in the nginx document root.

redirect.rb

#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

res = client.get 'http://localhost/oldpage.html', :follow_redirect => true
puts res.body

The script accesses the old page and follows the redirect.

res = client.get 'http://localhost/oldpage.html', :follow_redirect => true

The :follow_redirect option is used to follow redirects.

$ ./redirect.rb 
<!DOCTYPE html>
<html>
<head>
<title>New page</title>
</head>
<body>
<p>
This is a new page
</p>
</body>
</html>

$ tail -2 /var/log/nginx/access.log
127.0.0.1 - - [09/May/2016:14:08:50 +0200] "GET /oldpage.html HTTP/1.1" 301 193 "-" 
    "HTTPClient/1.0 (2.8.0, ruby 1.9.3 (2013-11-22))"
127.0.0.1 - - [09/May/2016:14:08:50 +0200] "GET /files/newpage.html HTTP/1.1" 200 113 "-" 
    "HTTPClient/1.0 (2.8.0, ruby 1.9.3 (2013-11-22))"

As we can see from the access.log file, the request was redirected to the new file name. The communication consisted of two GET messages.
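
The two-request exchange can be sketched as a small loop. The example below does not contact a server: the RESPONSES table is assumed data that stands in for nginx, and the follow helper is a hypothetical name. It only illustrates the general idea of following a Location header and is not HTTPClient's own implementation.

```ruby
#!/usr/bin/ruby

require 'uri'

# Canned responses standing in for the nginx server (assumed data).
RESPONSES = {
    'http://localhost/oldpage.html' =>
        { status: 301, location: '/files/newpage.html' },
    'http://localhost/files/newpage.html' =>
        { status: 200, body: 'This is a new page' }
}

REDIRECT_CODES = [301, 302, 303, 307]

# Hypothetical helper: follows Location headers up to a hop limit.
def follow(url, limit = 10)
    limit.times do
        res = RESPONSES.fetch url
        return [url, res] unless REDIRECT_CODES.include? res[:status]
        # The Location value may be relative, so resolve it against the URL.
        url = URI.join(url, res[:location]).to_s
    end
    raise 'too many redirects'
end

final_url, res = follow 'http://localhost/oldpage.html'
puts final_url
puts res[:body]
```

The hop limit guards against redirect loops, where two URLs keep pointing at each other.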

User agent

In this section, we specify the name of the user agent.

agent.php

<?php 

echo $_SERVER['HTTP_USER_AGENT'];

?>

In the nginx document root, we have this simple PHP file. It returns the name of the user agent.

agent.rb

#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new default_header: {"User-Agent" => "Ruby script"}

res = client.get 'http://localhost/agent.php'
puts res.body

This script creates a simple GET request to the agent.php script.

client = HTTPClient.new default_header: {"User-Agent" => "Ruby script"}

In the constructor of HTTPClient, we specify the user agent.

$ ./agent.rb 
Ruby script

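The server responded with the agent name that we sent with the request.

The `post` method

The post method dispatches a POST request to the given URL, providing the key/value pairs for the fill-in form content.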

target.php

<?php

echo "Hello " . htmlspecialchars($_POST['name']);

?>

On the local web server, we have this target.php file. It simply prints the posted value back to the client.

post_value.rb

#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

query = {"name" => "Jan"}
res = client.post 'http://localhost/target.php', query

puts res.body

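The script sends a request with a name key that has the value Jan. The POST request is issued with the post method.

$ ./post_value.rb
Hello Jan

This is the output of the post_value.rb script.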

$ tail -1 /var/log/nginx/access.log
127.0.0.1 - - [08/May/2016:13:38:57 +0200] "POST /target.php HTTP/1.1" 200 19 "-" 
    "HTTPClient/1.0 (2.8.0, ruby 1.9.3 (2013-11-22))"

With the POST method, the value is not sent in the request URL.
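
Where the value travels instead can be shown with a request object that is built but never sent. This sketch uses Net::HTTP::Post from Ruby's standard library rather than HTTPClient, so that it runs without a server; the form_request helper is a hypothetical name, and HTTPClient is assumed to encode a hash body in a similar form-encoded way.

```ruby
#!/usr/bin/ruby

require 'net/http'

# Hypothetical helper: builds a form-encoded POST request without sending it.
def form_request(path, params)
    req = Net::HTTP::Post.new path
    # set_form_data fills in both the body and the Content-Type header.
    req.set_form_data params
    req
end

req = form_request '/target.php', {'name' => 'Jan'}

puts req.path
puts req['Content-Type']
puts req.body
```

The path stays free of parameters, which matches the access log entry above; the key/value pair is carried in the request body.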

Retrieving definitions from a dictionary

To parse HTML, we use the nokogiri gem. It can be installed with the sudo gem install nokogiri command.

get_term.rb

#!/usr/bin/ruby

require 'httpclient'
require 'nokogiri'

client = HTTPClient.new

term = 'dog'
res = client.get 'http://www.dictionary.com/browse/'+term

doc = Nokogiri::HTML res.body
doc.css("div.def-content").each do |node|
    # strip (not strip!) is used because strip! returns nil if nothing changes
    puts node.text.strip.gsub(/\s{3,}/, " ")
end

In this script, we look up the definitions of the term dog on www.dictionary.com. Nokogiri::HTML is used to parse the HTML code.

res = client.get 'http://www.dictionary.com/browse/'+term

To perform the search, we append the term to the end of the URL.

doc = Nokogiri::HTML res.body
doc.css("div.def-content").each do |node|
    puts node.text.strip.gsub(/\s{3,}/, " ")
end

We parse the content with Nokogiri::HTML. The definitions are located inside <div class="def-content"> tags. We improve the formatting by removing excessive whitespace.
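
The whitespace cleanup can be tried on a literal string without fetching a page. In this sketch, the tidy helper and the sample text are assumptions made for illustration. It uses strip rather than strip!, because strip! returns nil when the string has no surrounding whitespace, and calling gsub on nil would raise an error.

```ruby
#!/usr/bin/ruby

# Hypothetical helper: trims the text and collapses long whitespace runs.
def tidy(text)
    text.strip.gsub(/\s{3,}/, ' ')
end

# Assumed sample text with the kind of padding found in scraped HTML.
raw = "  a domesticated canid,\n        Canis familiaris.  "
puts tidy(raw)

# strip! returns nil when there is nothing to remove.
p 'dog'.dup.strip!
```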

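Cookies

An HTTP cookie is a small piece of data sent from a website and stored in the user's web browser, or in a program data subfolder, while the user is browsing. When the user accesses a web page, the browser or program sends the cookie back to the server to notify it of the user's previous activity. Cookies have an expiration date.

When receiving an HTTP request, a server can send a Set-Cookie header with the response. Afterwards, the cookie value is sent along with all requests made to the same server in the form of a Cookie HTTP header.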
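
The relationship between the two headers can be sketched with plain string handling. The header value below is an assumed example, and the cookie_pair helper is a hypothetical name; HTTPClient's cookie manager does this work for us, including checks of domain, path, and expiration that are left out here.

```ruby
#!/usr/bin/ruby

# Hypothetical helper: splits a Set-Cookie value into the name/value
# pair and the remaining attributes.
def cookie_pair(set_cookie)
    pair, *attributes = set_cookie.split(';').map(&:strip)
    name, value = pair.split('=', 2)
    { name: name, value: value, attributes: attributes }
end

# Assumed header value, as a server might send it.
header = 'theme=black-and-white; Max-Age=604800; path=/'
cookie = cookie_pair header

# Only the name and the value are sent back to the server.
puts "Cookie: #{cookie[:name]}=#{cookie[:value]}"
```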

cookies.php

<?php

$theme = $_COOKIE['theme'];

if (isset($theme)) {

    echo "Your theme is $theme";
} else {

    echo "You are using default theme";
    setcookie('theme', 'black-and-white', time() + (86400 * 7));
}

?>

This PHP file reads a cookie. If the cookie does not exist, it is created. The cookie stores the user's theme.

send_cookie.rb

#!/usr/bin/ruby

require 'httpclient'

url = URI.parse "http://localhost/cookies.php"

cookie = WebAgent::Cookie.new
cookie.name = "theme"
cookie.value = "green-and-black"
cookie.url = url

client = HTTPClient.new
client.cookie_manager.add cookie

res = client.get url
puts res.body

We create a custom cookie and send it to the cookies.php page.

cookie = WebAgent::Cookie.new
cookie.name = "theme"
cookie.value = "green-and-black"
cookie.url = url

A cookie is created with the WebAgent::Cookie class.

client = HTTPClient.new
client.cookie_manager.add cookie

The cookie is added to the cookie manager.

$ ./send_cookie.rb 
Your theme is green-and-black

Next, we read a cookie and store it locally in a file.

read_cookie.rb

#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

res = client.get 'http://localhost/cookies.php'

client.set_cookie_store 'cookie.dat'
p res.header["Set-Cookie"]

client.save_cookie_store

The script reads a cookie from the PHP file and stores it locally in the cookie.dat file.

Finally, we read the stored cookie and send it to the same PHP file.

send_cookie2.rb

#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

cm = HTTPClient::CookieManager.new 'cookie.dat'
cm.load_cookies
client.cookie_manager = cm

res = client.get 'http://localhost/cookies.php'
p res.body

HTTPClient::CookieManager is used to read the cookie.

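$ ./send_cookie2.rb
Unknown key: Max-Age = 604800
"You are using default theme"
$ ./read_cookie.rb
Unknown key: Max-Age = 604800
["theme=black-and-white; expires=Sun, 15-May-2016 16:00:08 GMT; Max-Age=604800"]
$ ./send_cookie2.rb
"Your theme is black-and-white"

We run the scripts. On the first run, no cookie has been stored yet, so the default theme is reported. After read_cookie.rb saves the cookie, send_cookie2.rb sends it back to the server.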

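Credentials

The client's set_auth method sets the user name and the password to be used for a realm. A security realm is a mechanism used for protecting web application resources.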

$ sudo apt-get install apache2-utils
$ sudo htpasswd -c /etc/nginx/.htpasswd user7
New password: 
Re-type new password: 
Adding password for user user7

We use the htpasswd tool to create a user name and a password for basic HTTP authentication.

location /secure {

        auth_basic "Restricted Area";
        auth_basic_user_file /etc/nginx/.htpasswd;
}

In the nginx /etc/nginx/sites-available/default configuration file, we create a secure page. The name of the realm is "Restricted Area".

index.html

<!DOCTYPE html>
<html lang="en">
<head>
<title>Secure page</title>
</head>

<body>

<p>
This is a secure page.
</p>

</body>

</html>

In the /usr/share/nginx/html/secure directory, we have this HTML file.

credentials.rb

#!/usr/bin/ruby

require 'httpclient'

user = 'user7'
passwd = '7user'

client = HTTPClient.new
client.set_auth 'http://localhost/secure/', user, passwd
cont = client.get_content 'http://localhost/secure/'

puts cont

The script connects to the secure web page; it provides the user name and the password necessary to access the page.

$ ./credentials.rb 
<!DOCTYPE html>
<html lang="en">
<head>
<title>Secure page</title>
</head>

<body>

<p>
This is a secure page.
</p>

</body>

</html>
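
With basic HTTP authentication, the credentials travel in an Authorization header as a Base64 token. The sketch below shows the format of that header only; the basic_auth_value helper is a hypothetical name, and HTTPClient builds and sends the header for us after set_auth is called.

```ruby
#!/usr/bin/ruby

# Hypothetical helper: builds the value of a Basic Authorization header.
def basic_auth_value(user, passwd)
    # 'm0' packs the string as Base64 without line breaks.
    'Basic ' + ["#{user}:#{passwd}"].pack('m0')
end

value = basic_auth_value 'user7', '7user'
puts "Authorization: #{value}"

# Decoding the token gives back the user name and the password.
puts value.split(' ', 2).last.unpack1('m0')
```

Base64 is an encoding, not encryption, so basic authentication should be combined with HTTPS when the credentials matter.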