Selenium+PhantomJS (Part 1: Setting the User-Agent)
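Selenium+PhantomJS tutorial series:
- Selenium+PhantomJS (Part 1: Setting the User-Agent)
- Selenium+PhantomJS (Part 2: Simulating a Taobao Login)
- Selenium+PhantomJS (Part 3: Simulating a Zhihu Login)
- Selenium+PhantomJS (Part 4: Simulating a Weibo Login)
- Selenium+PhantomJS (Part 5: Waits in Selenium)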
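Some web servers restrict access by User-Agent and may reject requests from a User-Agent they do not recognize. Web automation code may therefore need to disguise its User-Agent slightly to avoid being refused. This post briefly records how to set the User-Agent when driving PhantomJS through Selenium.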
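To make the filtering described above concrete, here is a minimal sketch that uses only the Python 3 standard library. The `PickyHandler` class, its "must mention Firefox" rule, and the helper names are all hypothetical; real sites apply their own rules. The bare string "PhantomJS" stands in for the default PhantomJS User-Agent.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

FIREFOX_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) "
              "Gecko/20100101 Firefox/25.0")


class PickyHandler(BaseHTTPRequestHandler):
    """Toy server (hypothetical rule): only serves clients that mention Firefox."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        self.send_response(200 if "Firefox" in agent else 403)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_status(url, user_agent):
    """Return the HTTP status code the server sends for this User-Agent."""
    # An empty ProxyHandler makes sure the local request bypasses any proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def run_demo():
    """Start the toy server and request it with both User-Agents."""
    server = HTTPServer(("127.0.0.1", 0), PickyHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    try:
        return fetch_status(url, "PhantomJS"), fetch_status(url, FIREFOX_UA)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

The first request is refused and the second is served, which is the situation a disguised User-Agent is meant to avoid.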
Selenium dependency for Python:

    sudo pip install selenium
By default, no User-Agent is set for you, and the default User-Agent identifies the browser as PhantomJS. To set the PhantomJS user-agent, set the "phantomjs.page.settings.userAgent" desired capability:
    '''
    Created on Dec 6, 2013

    @author: Jay

    @summary: Set user-agent before using PhantomJS to get a web page.
    '''
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    # Copy the defaults so the shared DesiredCapabilities dict is not modified.
    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap["phantomjs.page.settings.userAgent"] = (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"
    )

    driver = webdriver.PhantomJS(executable_path='./phantomjs', desired_capabilities=dcap)
    driver.get("http://dianping.com/")

    # Print the capabilities reported by the running session.
    cap_dict = driver.desired_capabilities
    for key in cap_dict:
        print('%s: %s' % (key, cap_dict[key]))
    print(driver.current_url)

    driver.quit()  # quit is a method; without the parentheses it is never called
Running it produces the following output:
    jay@Jay-Air:~/workspace/python_study/dp/qa/2013/12 $ python user_agent_phantomjs.py
    rotatable: False
    takesScreenshot: True
    acceptSslCerts: False
    browserConnectionEnabled: False
    javascriptEnabled: True
    driverVersion: 1.0.3
    databaseEnabled: False
    locationContextEnabled: False
    platform: mac-unknown-32bit
    browserName: phantomjs
    version: 1.9.1
    driverName: ghostdriver
    nativeEvents: True
    phantomjs.page.settings.userAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0
    applicationCacheEnabled: False
    webStorageEnabled: False
    proxy: {u'proxyType': u'direct'}
    handlesAlerts: False
    cssSelectorsEnabled: True
    http://www.dianping.com/citylist
The key part:
    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap["phantomjs.page.settings.userAgent"] = (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"
    )
    driver = webdriver.PhantomJS(executable_path='./phantomjs', desired_capabilities=dcap)
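The `dict(...)` call in the key part matters because it copies the defaults, so the override is not written into the shared dictionary. The sketch below shows the same pattern without needing Selenium installed. The `with_user_agent` helper and the `shared_defaults` dictionary are hypothetical stand-ins for illustration, not part of the Selenium API.

```python
USER_AGENT_KEY = "phantomjs.page.settings.userAgent"


def with_user_agent(base_caps, user_agent):
    """Return a copy of base_caps with the PhantomJS user-agent override added."""
    caps = dict(base_caps)  # shallow copy; base_caps itself is left untouched
    caps[USER_AGENT_KEY] = user_agent
    return caps


# Stand-in for DesiredCapabilities.PHANTOMJS (illustrative values only).
shared_defaults = {"browserName": "phantomjs", "javascriptEnabled": True}

caps = with_user_agent(
    shared_defaults,
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0",
)
```

With Selenium available, the same helper could presumably be called as `with_user_agent(DesiredCapabilities.PHANTOMJS, ...)` and the result passed as `desired_capabilities`.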
You can also add the following code to check whether the setting took effect:
    # Ask the page itself which User-Agent the browser reports.
    agent = driver.execute_script("return navigator.userAgent")
    print(agent)
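If you would rather have the script fail loudly than inspect printed output, the same check can be wrapped in a small helper. This is only a sketch: `assert_user_agent` is a hypothetical name, and it assumes nothing about the driver beyond an `execute_script` method like the one used above.

```python
def assert_user_agent(driver, expected):
    """Raise AssertionError unless the page reports the expected User-Agent."""
    actual = driver.execute_script("return navigator.userAgent")
    if actual != expected:
        raise AssertionError("User-Agent is %r, expected %r" % (actual, expected))
    return actual
```

Calling `assert_user_agent(driver, dcap["phantomjs.page.settings.userAgent"])` right after creating the driver would catch a misspelled capability key before any real page is fetched.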
Once this is set, the site can be accessed. Selenium makes it very convenient to render the pages you need.