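Selenium+PhantomJS (Part 4: Simulating a Weibo Login)
Selenium+PhantomJS tutorial series:
- Selenium+PhantomJS (Part 1: Setting the User-Agent)
- Selenium+PhantomJS (Part 2: Simulating a Taobao Login)
- Selenium+PhantomJS (Part 3: Simulating a Zhihu Login)
- Selenium+PhantomJS (Part 4: Simulating a Weibo Login)
- Selenium+PhantomJS (Part 5: Waits in Selenium)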
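Import the selenium package and create a webdriver object:

    import sys
    import time

    from selenium import webdriver

    # This example drives Chrome; sys and time are used in the later steps.
    sel = webdriver.Chrome()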

Open the target URL and wait for the response:

    loginurl = 'http://weibo.com/'
    # Open the login page, then pause so the login form can finish loading.
    sel.get(loginurl)
    time.sleep(10)
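
A fixed `time.sleep(10)` always waits the full ten seconds, even when the page is ready sooner, and can still be too short on a slow connection. The alternative is to poll for a condition with a timeout. The Python 3 sketch below shows that idea in plain Python. `wait_until` and its parameters are hypothetical names used only for illustration; they are not part of Selenium, which ships its own wait utilities.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or raises TimeoutError when time runs out.
    `wait_until` is a hypothetical helper for illustration, not a Selenium API.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)
```

With the Selenium API used in this article, a call might look like `wait_until(lambda: sel.find_elements_by_xpath("//div[@id='pl_login_form']"))`, which returns as soon as the login form is present instead of always sleeping for ten seconds.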

Locate the login fields with XPath, fill in the account name and password, and simulate a click on the login button:

    # Enter the username
    try:
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[1]/div[1]/input").send_keys('yourusername')
        print('user success!')
    except Exception:
        print('user error!')
    time.sleep(1)
    # Enter the password
    try:
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[2]/div[1]/input").send_keys('yourPW')
        print('pw success!')
    except Exception:
        print('pw error!')
    time.sleep(1)
    # Click the login button
    try:
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[6]/a").click()
        print('click success!')
    except Exception:
        print('click error!')
    time.sleep(3)
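
The XPath expressions above rely on positional steps such as `div[2]`, which are 1-based and count only sibling `div` elements, so they break as soon as the page layout changes. To show how such a path resolves, the Python 3 sketch below evaluates the same two paths against a small, hypothetical stand-in for the login form using the standard library's `xml.etree.ElementTree`. The markup is invented for illustration and is far simpler than the real Weibo page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup; the real page is much larger
# and its structure may differ.
PAGE = """
<html><body>
  <div id="pl_login_form">
    <div>
      <div>banner</div>
      <div>
        <div><div><input name="username"/></div></div>
        <div><div><input name="password"/></div></div>
      </div>
    </div>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE)

# ElementTree only accepts relative paths, hence the leading "." before "//".
form = ".//div[@id='pl_login_form']/div/div[2]"
user_input = root.find(form + "/div[1]/div[1]/input")
pw_input = root.find(form + "/div[2]/div[1]/input")

print(user_input.get("name"))  # username
print(pw_input.get("name"))    # password
```

If the "banner" element were removed from this sample, `div[2]` would no longer match anything, which is why shorter, attribute-based paths tend to be more robust when the page offers stable attributes to match on.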

Check whether the login succeeded. If current_url has changed, the login is considered successful; as long as it still equals the login URL, prompt for the verification code and try again:

    curpage_url = sel.current_url
    print(curpage_url)
    # Still on the login page: a verification code (captcha) is assumed to be required.
    while curpage_url == loginurl:
        print('please input the verify code:')
        # strip() drops the trailing newline so it is not typed into the field
        verifycode = sys.stdin.readline().strip()
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[3]/div[1]/input").send_keys(verifycode)
        try:
            sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[6]/a").click()
            print('click success!')
        except Exception:
            print('click error!')
        time.sleep(3)
        curpage_url = sel.current_url
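
An exact string comparison treats any difference as a change, including a redirect that only switches the scheme or adds a trailing slash. The Python 3 sketch below shows a more tolerant comparison, in case that matters for your target site. `same_page` is a hypothetical helper, not something the original code uses, and ignoring the query string is an assumption that may not suit every site.

```python
from urllib.parse import urlsplit


def same_page(url_a, url_b):
    """Return True when both URLs share host and path.

    Ignores the scheme, a trailing slash, the query string and the fragment.
    `same_page` is a hypothetical helper for illustration.
    """
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (a.hostname == b.hostname
            and a.path.rstrip("/") == b.path.rstrip("/"))


# The second URL in each pair is a made-up example value.
print(same_page("http://weibo.com/", "https://weibo.com"))
print(same_page("http://weibo.com/", "http://weibo.com/u/123/home"))
```

The loop condition could then read `while same_page(curpage_url, loginurl):`, so that only a real navigation away from the login page counts as success.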
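
Use the driver's get_cookies() method to collect the session cookies for the site currently being visited:

    # Build a "name=value" string for every cookie in the browser session
    cookie = [item["name"] + "=" + item["value"] for item in sel.get_cookies()]
    # Join them into a single Cookie header value
    cookiestr = '; '.join(cookie)
    print(cookiestr)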
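
`get_cookies()` returns a list of dictionaries, one per cookie, and only the `name` and `value` entries end up in the header string. The Python 3 sketch below runs the same transformation on sample data so the resulting format is visible without a browser. The cookie names and values are made up, and `cookie_header` is a hypothetical helper name.

```python
def cookie_header(cookies):
    """Join Selenium-style cookie dicts into a single Cookie header value."""
    return "; ".join("%s=%s" % (c["name"], c["value"]) for c in cookies)


# Hypothetical sample in the shape returned by get_cookies(); values are made up.
sample = [
    {"name": "session_id", "value": "abc123", "domain": ".weibo.com", "path": "/"},
    {"name": "token", "value": "xyz789", "domain": ".weibo.com", "path": "/"},
]

print(cookie_header(sample))  # session_id=abc123; token=xyz789
```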

Once you have the cookie, you can access the site through urllib2, scrapy, requests, and similar libraries:

    import urllib2  # Python 2 only; Python 3 provides urllib.request instead

    print('%%%using the urllib2 !!')
    homeurl = sel.current_url
    print('homeurl: %s' % homeurl)
    # Send the browser's cookies along with the request
    headers = {'cookie': cookiestr}
    req = urllib2.Request(homeurl, headers=headers)
    try:
        response = urllib2.urlopen(req)
        text = response.read()
        # Save the fetched page to a local file named "homepage"
        with open('homepage', 'w') as fd:
            fd.write(text)
        print('###get home page html success!!')
    except Exception:
        print('### get home page html error!!')

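
`urllib2` exists only in Python 2. As a rough Python 3 counterpart, the sketch below attaches the cookie string to a request with `urllib.request`. `build_request` and `fetch` are hypothetical helper names, and the cookie value is made up. Only the request construction runs here; `fetch` would perform the actual network call.

```python
import urllib.request


def build_request(url, cookiestr):
    """Create a Request that carries the browser's cookies; nothing is sent yet."""
    return urllib.request.Request(url, headers={"Cookie": cookiestr})


def fetch(url, cookiestr, timeout=10):
    """Send the request and return the response body as bytes."""
    req = build_request(url, cookiestr)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


# Made-up cookie string in the format produced in the previous step.
req = build_request("http://weibo.com/", "session_id=abc123; token=xyz789")
print(req.get_header("Cookie"))
```

Because `fetch` returns bytes, the page should be written with `open('homepage', 'wb')` when saving it to disk under Python 3.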
Reposted from: http://blog.csdn.net/warrior_zhang/article/details/50198699