Selenium?
Selenium introduction
Selenium installation
Selenium basics
Selenium exceptions
Selenium interaction methods
Selenium?
Example: use the site https://www.imdb.com/ to scrape information about the movie Iron Man.
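Selenium introduction
Selenium IDE: Selenium's integrated development environment, which can record and edit Selenium tests.
Selenium API: Selenium tests can be written in languages such as C#, Java, and Python; the tests use the Selenium API to communicate with the WebDriver.
Selenium WebDriver: controls the web browser according to the messages sent by the Selenium API. Supported browsers include Chrome, Firefox, IE, and Edge.
Selenium installation
Step 1: Install the Selenium API
Step 2: Download WebDriver
Step 1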
# Install
pip install selenium
# or
conda install selenium
# Usage
from selenium import webdriver
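selenium: opening and closing the WebDriver
from selenium.webdriver.chrome.service import Service
webdriver_variable = webdriver.Chrome(service=Service('driver path'))  # open the WebDriver
webdriver_variable.quit()  # close the WebDriver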
## Test whether selenium is installed correctly
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
# Passing the driver path positionally is deprecated in Selenium 4; wrap it in a Service object
driver = webdriver.Chrome(service=Service('./chromedriver'))
time.sleep(10)  # keep the browser window open for 10 seconds
driver.quit()
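If the script raises an error between opening the browser and calling `quit()`, the browser process is left running. The following is a minimal sketch of one way to guard against that. It assumes a hypothetical helper named `managed_driver` (not part of Selenium) that accepts any zero-argument factory returning a driver. The `FakeDriver` class is a stand-in used only so the sketch runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with `factory` and always call quit() on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()  # runs even if the body of the with-block raised


class FakeDriver:
    """Stand-in for a real WebDriver (illustration only)."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


fake = FakeDriver()
try:
    with managed_driver(lambda: fake) as driver:
        raise RuntimeError("simulated failure while scraping")
except RuntimeError:
    pass

print(fake.closed)  # True: quit() was still called
```

With a real browser, the factory could be something like `lambda: webdriver.Chrome(service=Service('./chromedriver'))`.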
Selenium basics
HTML content
driver_variable.implicitly_wait(integer)  # implicit wait, in seconds
driver_variable.get('web link')  # navigate to a URL
driver_variable.title  # get the page title
driver_variable.page_source  # get the page's HTML source
# Example
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get('https://www.imdb.com/title/tt0371746/?ref_=nv_sr_srsg_0')
print(driver.title)
#print(driver.page_source)
driver.quit()
鋼鐵人 (2008) - IMDb
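`driver.page_source` is an ordinary string of HTML, so it can be handed to any HTML parser. As a rough sketch, the snippet below pulls the `<title>` text out of such a string using only the standard library. The `sample_source` string is a made-up stand-in for what `page_source` might return, and `extract_title` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(page_source):
    """Return the stripped <title> text of an HTML string."""
    parser = TitleParser()
    parser.feed(page_source)
    return parser.title.strip()


# Made-up HTML standing in for driver.page_source
sample_source = "<html><head><title>Iron Man (2008) - IMDb</title></head><body></body></html>"
print(extract_title(sample_source))  # Iron Man (2008) - IMDb
```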
# Import
from selenium.webdriver.common.by import By
# Find a single element (returns the first match)
driver_variable.find_element(By.ID, "id")
driver_variable.find_element(By.NAME, "name")
driver_variable.find_element(By.XPATH, "xpath")
driver_variable.find_element(By.TAG_NAME, "tag name")
driver_variable.find_element(By.CLASS_NAME, "class name")
driver_variable.find_element(By.CSS_SELECTOR, "css selector")
# Find multiple elements (returns a list)
driver_variable.find_elements(By.ID, "id")
driver_variable.find_elements(By.NAME, "name")
driver_variable.find_elements(By.XPATH, "xpath")
driver_variable.find_elements(By.TAG_NAME, "tag name")
driver_variable.find_elements(By.CLASS_NAME, "class name")
driver_variable.find_elements(By.CSS_SELECTOR, "css selector")
## Example - tag name
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium import webdriver
import time
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get('https://www.digitimes.com.tw/col/article.asp?id=1300&cf=AI1')
time.sleep(10)  # give the page time to load
tag_h1 = driver.find_element(By.TAG_NAME, "h1")
print(tag_h1.text)
driver.quit()
半導體設備供應鏈的台灣角色
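A fixed `time.sleep(10)` always waits the full ten seconds, even when the page is ready sooner. Selenium ships its own `WebDriverWait` class (in `selenium.webdriver.support.ui`) for waiting on a condition. The sketch below is not that class. It is a stand-alone illustration of the underlying polling idea, written with the standard library so it runs without a browser. `wait_until` and `heading_loaded` are hypothetical names, and the "page" is simulated.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated page: the heading only "appears" on the third check.
attempts = {"count": 0}


def heading_loaded():
    attempts["count"] += 1
    return "h1 text" if attempts["count"] >= 3 else None


print(wait_until(heading_loaded, timeout=2.0, poll_interval=0.01))  # h1 text
```

With a real driver, the condition could be `lambda: driver.find_elements("tag name", "h1")`, since `find_elements` returns an empty (falsy) list until a match exists.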
## Example - css selector
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get('https://www.imdb.com/title/tt1877830/fullcredits?ref_=tt_cl_sm')
time.sleep(10)  # give the page time to load
cast_selector = driver.find_elements(By.CSS_SELECTOR, '.primary_photo+ td a')  # note the selector syntax
for i in cast_selector:
    print(i.text)
driver.quit()
Robert Pattinson Zoë Kravitz Jeffrey Wright Colin Farrell Paul Dano John Turturro Andy Serkis Peter Sarsgaard Barry Keoghan Jayme Lawson Gil Perez-Abraham Peter McDonald Con O'Neill Alex Ferns Rupert Penry-Jones Kosha Engler Archie Barnes Janine Harouni Hana Hrzic Joseph Walker Luke Roberts Oscar Novak Stella Stocker Sandra Dickinson Jack Bennett Andre Nightingale Richard James-Neale Lorraine Tai Joseph Balderrama James Eeles Angela Yeoh Leemore Marrett Jr. Ezra Elliott Itoya Osagiede Stewart Alexander Adam Rojko Vega Heider Ali Marcus Onilude Elena Saurel Ed Kear Sid Sagar Amanda Blake Todd Boyce Brandon Bassir Will Austin Chabris Napier-Lawrence Douglas Russell Charlie Carver Max Carver Phil Aizlewood Mark Killeen Philip Shaun McGuinness Lorna Brown Elliot Warren Jay Lycurgo Stefan Race Elijah Baker Craige Middleburg Akie Kotabe Spike Fearn Urielle Klein-Mekongo Bronson Webb Madeleine Gray Ste Johnston Arthur Lee Parry Glasspool Jordan Coulson Hadas Gold Pat Battle Bobby Cuza Dean Meminger Roma Torre Mike Capozzola Amanda Hurwitz Joshua Eldridge-Smith Daniel Rainford Nathalie Armin Jose Palma Kazeem Tosin Amore Jonathan Addis Adaeze Cornelia Anane Rodrig Andrisan Eduardo Arrufat-Reboso Kiran Asahan Diego Barraza Amy Clare Beales Nicholas Benjamin Scott Bennett Charlie Bentley Douglas Bunn Phil Campbell Tony Christian Ruth Clarson Bern Collaço Andreea Helen David Nick Davison Obie Dean Adria Dinev Viliyan Donchev Craig Douglas Evan A. Dunn Daniel Eghan Hayden Ellingworth Darcie Ellson Paul Fitchford Joseph L Geist Albert Giannitelli Susan Gillias Callum Gore Tamara Gough George Graham Rachel Handshaw Juke Hardy Metin Hassan Christopher James Healy Sarah Hussain Shenel Hussein Simon Jago Yasmin J. James Tobias James-Samuels Adnan Kundi Erran Lake Sophie Lamont Stuart D. 
Latham Mickey Lewis Eugene Lin Annishia Camilla Lunette Teresa Mahoney Ben Mansfield Tiago Martins Obie Matthew Nichola Jean Mazur Kenny-Lee Mbanefo Tony McCarthy Tremayne Miller Bharat Mistri Christopher Moore Sri Moorthy Ayse Muge Clément Osty Nick Owenford Andrew Paxton-Gray Richard Price Zoltan Rencsar Paul Riddell Will Rowlands Iana Saliuk Bernardo Santos Kemal Shah Eugene Shawn Sam Shoubber Amber Sienna Dave Simon James Snelling Gareth Snow Richard Stanley Jimmy Star Alfredo Tavares Michelle Thomas James Travis Peter Trevor Sahil Vaid Vic Waghorn Stuart Whelan Paul Whelligan Daniel Joseph Woolf
## Example - reading a tag attribute
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By  # required for By.TAG_NAME below
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get('https://www.imdb.com/title/tt1877830/?ref_=nv_sr_srsg_0')
tag_a = driver.find_element(By.TAG_NAME, 'a')  # first <a> element on the page
print(tag_a.get_attribute('href'))
driver.quit()
https://www.imdb.com/?ref_=nv_home
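The example above reads the `href` of only the first `<a>` element. As a sketch of how the same idea extends to every link on a page, the hypothetical helper `collect_links` below combines `find_elements` with `get_attribute`. It uses the plain string `"tag name"` as the locator so that it needs no Selenium import, and the `FakeDriver` and `FakeElement` classes are stand-ins so the sketch runs without a browser.

```python
def collect_links(driver):
    """Return the href of every <a> element that has one."""
    links = []
    for anchor in driver.find_elements("tag name", "a"):
        href = anchor.get_attribute("href")
        if href:  # skip anchors without an href attribute
            links.append(href)
    return links


class FakeElement:
    """Stand-in for a WebElement (illustration only)."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


class FakeDriver:
    """Stand-in for a WebDriver that 'contains' two anchors."""

    def find_elements(self, by, value):
        if (by, value) == ("tag name", "a"):
            return [FakeElement("https://www.imdb.com/?ref_=nv_home"), FakeElement(None)]
        return []


print(collect_links(FakeDriver()))  # ['https://www.imdb.com/?ref_=nv_home']
```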
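By.ID can be replaced by the string "id"
By.NAME can be replaced by the string "name"
By.XPATH can be replaced by the string "xpath"
By.TAG_NAME can be replaced by the string "tag name"
By.CLASS_NAME can be replaced by the string "class name"
By.CSS_SELECTOR can be replaced by the string "css selector"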
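The string equivalents listed above can be gathered into a small lookup table. This is only a sketch: `LOCATOR_STRINGS` and `locator` are hypothetical names, not part of Selenium, and the table covers just the six strategies named in this section.

```python
# String equivalents of the By constants listed in this section
LOCATOR_STRINGS = {
    "ID": "id",
    "NAME": "name",
    "XPATH": "xpath",
    "TAG_NAME": "tag name",
    "CLASS_NAME": "class name",
    "CSS_SELECTOR": "css selector",
}


def locator(strategy, value):
    """Build the (by, value) pair that find_element / find_elements accept."""
    try:
        return LOCATOR_STRINGS[strategy.upper()], value
    except KeyError:
        raise ValueError(f"unknown locator strategy: {strategy!r}") from None


by, value = locator("css_selector", ".primary_photo+ td a")
print(by, "|", value)  # css selector | .primary_photo+ td a
```

With a real driver, the pair can be unpacked straight into a lookup, for example `driver.find_elements(*locator("tag_name", "h1"))`.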
## Example - css selector (string locator)
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.implicitly_wait(2)  # wait up to 2 seconds when looking up elements
driver.get('https://www.imdb.com/title/tt1877830/fullcredits?ref_=tt_cl_sm')
time.sleep(10)
cast_selector = driver.find_elements("css selector", '.primary_photo+ td a')  # "css selector" stands in for By.CSS_SELECTOR
for i in cast_selector:
    print(i.text)
driver.quit()
# Exercise - the earlier approach (requests + BeautifulSoup)
import requests
from bs4 import BeautifulSoup
response = requests.get('https://www.digitimes.com.tw/col/article.asp?id=1300&cf=AI1')
bts = BeautifulSoup(response.text, 'lxml')
main = bts.find_all('p', class_='main_p')  # article body paragraphs
for i in main:
    print(i.get_text())
print("- - - - - - ")
caption_tag = bts.find(class_='caption')  # author name and job title
print(caption_tag.get_text(' ').replace('\t', '').replace('\n', ''))
caption_desc = bts.find(class_='thumbnail_desc')  # author bio
print("簡介:")  # "簡介" means "Bio"; kept as-is so the code matches the output below
print(caption_desc.string)
半導體,不同於其他產業,是一個非常技術密集且資本密集的產業。 在技術密集方面,半導體廠傾盡全力開發更先進的1奈米製程,也持續投入更大的資本在半導體設備上。在台積電及其他半導體IC大廠競相爭奪半導體市佔的同時,其實比較少被提到的半導體設備商,也扮演著產業鏈舉足輕重的角色。 在前AIoT的時代,90%所收集到的資料都已被分析利用,在AIoT時代,更巨大的資料量將被產生,而目前統計結果告訴我們只有50%的資料是能夠直接被機器使用的。 但不論是資料的生成以及分析都需要使用到大量的晶片。最近新聞報導不斷出現美國車廠對晶片供應短缺的擔憂,就是一個很好的例子。 在這個對自動化以及資料分析更重視的時代,IC晶片的需求的成長正朝著幾何級數方向發展。任何的產業最終都須達到供需的平衡,IC設計的進步以及製造製程朝向3奈米以及1奈米的發展確實讓技術以及製程上追上了市場應用對晶片規格的要求,但在供應鏈上,大量的資本投入加上對半導體設備的大力投資,則更進一步的幫助市場達到供需的平衡。 整體半導體的供應鏈極為分工精密,大致上可分為兩大區塊。第一個區塊是半導體的生產供應鏈,包含晶片設計、封裝、測試;第二個區塊是半導體設備的供應鏈,這部分其實就是提供設備以及原物料支援前述第一區塊的生產、測試、封裝各流程。相信大部分的讀者對半導體IC的生產供應鏈已經相當熟悉,也了解台灣重要的地位。但是在半導體設備的供應鏈上,台灣其實也佔了世界上舉足輕重的地位,但是這部分卻比較少在報章媒體的報導上被彰顯。 以半導體設備的產業趨勢而言,相對於之前30年聚焦於半導體設備的採購價格,最近3到5年,半導體設備的每單位晶圓成本(cost per wafer)以及總體擁有成本(cost of ownership)已經成為生產到場在談論交易時兩個重要指標。同時在半導體製造上,效能也不再是唯一,單位區域價格(cost per area)以及能耗(power consumption)都是半導體製程進步的非常重要指標。 防疫期間全球供應鏈的移轉以及不穩定性,考驗著半導體設備商在強勁的半導體設備需求上穩定供貨的能力。如同傳統產業如在數位轉型上的超前部署,則能在疫情期間穩定度過甚至得到的超預期的成果。這幾年產業對半導體需求的急速上升,如果沒有全球半導體設備廠和台灣的關鍵零組件供應商對全球供應鏈預先的超前部署以及台灣這一年來的穩定防疫成果,在COVID-19(新冠肺炎)期間我們可能早已看到全球高科技市場有更大的波動以及不穩定性。 至於台灣在未來半導體上的發展,筆者認為產官學界應持續關注自由貿易協定(FTA),如區域全面經濟夥伴協定(RCEP)、泛太平洋貿易協定(CPTPP),並全力推動跨太平洋夥伴全面進步協定(CPTPP)合作及海峽兩岸經濟合作架構協議(ECFA),才能在對半導體供應鏈能持續保有重要的地位,以及對各種風險能有更好的因應。另外每單位晶圓成本(cost per wafer)以及總體擁有成本(cost of ownership)的考慮也是台灣的廠商應該更加注意的。而關於這部分,下次專欄我們會再有更深入的討論。 - - - - - - 楊燿宏 台灣應材美國總公司AGS核心工程部資深總監 簡介: 楊燿宏為應用材料美國總公司工程部資深總監、前矽谷美臺高科技論壇(UTHF)大會召集人、2020美臺產業科技論壇單元主持人、北美台灣工程師協會NATEA矽谷分會2018會長及現任顧問、舊金山灣區台灣商會TCCSFBA任期顧問,亦為僑委會僑務促進委員,擁有30多項美國與國際專利。
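Selenium exceptions
ElementNotVisibleException  # the element exists but is not visible
ErrorInResponseException  # the server returned an error response
NoSuchElementException  # the selected element does not exist
TimeoutException  # the operation exceeded its time limit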
# Example
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.common.exceptions import NoSuchElementException
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.implicitly_wait(2)
driver.get('https://www.imdb.com/title/tt1877830/?ref_=nv_sr_srsg_0')
try:
    main_tag = driver.find_element("tag name", 'test')  # the page has no <test> tag, so this raises
    print(main_tag.text)
except NoSuchElementException:
    print("No such tag")
driver.quit()
No such tag
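An alternative to catching `NoSuchElementException` relies on the fact that `find_elements` (plural) returns an empty list when nothing matches instead of raising. The sketch below wraps that in a hypothetical helper, `first_or_none`. The `FakeDriver` class is a stand-in so the snippet runs without a browser.

```python
def first_or_none(driver, by, value):
    """Return the first matching element, or None when nothing matches."""
    matches = driver.find_elements(by, value)
    return matches[0] if matches else None


class FakeDriver:
    """Stand-in driver that only knows about one <h1> (illustration only)."""

    def find_elements(self, by, value):
        return ["<h1 element>"] if (by, value) == ("tag name", "h1") else []


driver = FakeDriver()
print(first_or_none(driver, "tag name", "h1"))    # <h1 element>
print(first_or_none(driver, "tag name", "test"))  # None
```

The try/except form remains the better fit when a missing element is a genuine error, while this form suits optional elements.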
Selenium interaction methods
# Import
from selenium.webdriver.common.keys import Keys
# Usage
send_keys()  # type text into an element
Keys.ENTER  # the Enter key
## Example
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from selenium import webdriver
import time
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get('https://www.google.com.tw/')
input_tag = driver.find_element("css selector", 'textarea.gLFyf')  # the search box
input_tag.send_keys('聯成電腦')  # type the search query
time.sleep(5)  # pause so the typed text is visible; implicitly_wait() only sets the element-lookup timeout and does not pause
input_tag.send_keys(Keys.ENTER)  # submit the search
time.sleep(5)  # pause so the results page is visible before closing
driver.quit()
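Selenium Action Chain
# Import
from selenium.webdriver.common.action_chains import ActionChains
# Usage
click()  # click
double_click()  # double-click
move_to_element()  # move the mouse over an element
key_down()  # press a key without releasing it
key_up()  # release a key
perform()  # execute all queued actions
send_keys()  # send keys to the currently focused element
release()  # release the held mouse button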
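The methods above only queue actions, and nothing is sent to the browser until `perform()` is called. The toy class below illustrates that queue-then-perform pattern. It is not Selenium's implementation: `MiniChain` is a hypothetical name, and it only records actions in a list instead of driving a browser.

```python
class MiniChain:
    """Toy model of a queue-then-perform action chain (not Selenium code)."""

    def __init__(self):
        self._queue = []  # actions waiting to be performed
        self.log = []     # actions that have been performed

    def click(self, target):
        self._queue.append(("click", target))
        return self  # returning self allows calls to be chained

    def send_keys(self, text):
        self._queue.append(("send_keys", text))
        return self

    def perform(self):
        for action in self._queue:
            self.log.append(action)  # a real chain would drive the browser here


chain = MiniChain()
chain.click("search box").send_keys("iron man")
print(chain.log)  # []: nothing has run yet
chain.perform()
print(chain.log)  # [('click', 'search box'), ('send_keys', 'iron man')]
```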
## Example - scrape the cast list of The Shawshank Redemption
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
import time
link = "https://m.imdb.com/chart/top/"
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get(link)
movie1995tag = driver.find_element("css selector", '.ipc-title-link-wrapper')  # first title link in the chart
#print(movie1995tag.text)
actions = ActionChains(driver)
actions.click(movie1995tag)  # queue a click on the link
actions.perform()  # execute the queued click
print('actions ok')
print('- - - - - -')
cast = driver.find_elements('css selector', 'a[data-testid="title-cast-item__actor"]')
for i in cast:
    print(i.text)
time.sleep(10)
driver.quit()
actions ok - - - - - - Tim Robbins Morgan Freeman Bob Gunton William Sadler Clancy Brown Gil Bellows Mark Rolston James Whitmore Jeffrey DeMunn Larry Brandenburg Neil Giuntoli Brian Libby David Proval Joseph Ragno Jude Ciccolella Paul McCrane Renee Blaine Scott Mann