Selector Gadget
Introducing Beautiful Soup
Using CSS Selectors
<ol>
<li>one</li>
<li>two</li>
<li>three</li>
<li>four</li>
</ol>
tag name {
property name: value;
}
li /* tag name (the selector) */
{
color: red; /* color is the property name, red is the value */
}
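The same tag-name selector can be tried from Python. The sketch below assumes Beautiful Soup is installed and uses the built-in `html.parser`; it only finds the matching elements and does not apply any styling.

```python
from bs4 import BeautifulSoup

# The ordered list from the example above
html = """
<ol>
<li>one</li>
<li>two</li>
<li>three</li>
<li>four</li>
</ol>
"""
soup = BeautifulSoup(html, "html.parser")

# "li" is a tag-name selector: it matches every <li> element
items = [li.get_text() for li in soup.select("li")]
print(items)
```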
<!-- HTML structure -->
<div>
<p>
CSS <span>Hello World</span>
</p>
</div>
/* CSS rules */
div {
color:red;
}
div -> p -> span
div { color:red; }
causes p and span to inherit the div's setting, so all of the text is red.
.classname (a dot followed by the class name)
.class1 {
color:red;
}
.class2 {
color:green;
}
Set the font size (font-size) to 16px;
#idname (a hash sign followed by the id name)
<h1>this is h1</h1>
<a id="title1" href="...">one</a>
<a id="title2" href="...">two</a>
<a id="title3" href="...">three</a>
<span>this is span</span>
h1, #title2, span {
color:red;
}
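A comma-separated selector list such as `h1, #title2, span` can be checked with Beautiful Soup. This is a minimal sketch, assuming `html.parser` as the parser; the `"..."` href values are kept exactly as elided in the example.

```python
from bs4 import BeautifulSoup

html = """
<h1>this is h1</h1>
<a id="title1" href="...">one</a>
<a id="title2" href="...">two</a>
<a id="title3" href="...">three</a>
<span>this is span</span>
"""
soup = BeautifulSoup(html, "html.parser")

# A comma groups selectors: tag h1, id title2, and tag span
matched = [el.get_text() for el in soup.select("h1, #title2, span")]
print(matched)
```

Only the link with `id="title2"` is picked up among the three `<a>` elements, because `#title2` targets a single id.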
<h1>this is h1</h1>
<a id="title1" class="a_class" href="...">one</a>
<a id="title2" class="a_class" href="...">two</a>
<a id="title3" class="a_class" href="...">three</a>
<span>this is span</span>
h1, #title2, span {
color:red;
}
.a_class{
color:green;
}
<h1>this is h1</h1>
<a id="title1" class="a_class" href="...">one</a>
<a id="title2" class="a_class t_class" href="...">two</a>
<a id="title3" class="a_class" href="...">three</a>
<span>this is span</span>
h1, span {
color:red;
}
.t_class{
color:red;
}
.a_class{
color:green;
}
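An element may carry several classes, and each class selector matches it independently. The sketch below (assuming `html.parser`) shows which elements each selector reaches. Beautiful Soup does not apply styles; in a browser, `.a_class` and `.t_class` have equal specificity, so the rule written later (`.a_class`, green) wins for `title2`.

```python
from bs4 import BeautifulSoup

html = """
<h1>this is h1</h1>
<a id="title1" class="a_class" href="...">one</a>
<a id="title2" class="a_class t_class" href="...">two</a>
<a id="title3" class="a_class" href="...">three</a>
<span>this is span</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Each class selector matches any element whose class list contains it
a_class = [a["id"] for a in soup.select(".a_class")]
t_class = [a["id"] for a in soup.select(".t_class")]
# Chaining two class selectors requires both classes on the same element
both = [a["id"] for a in soup.select(".a_class.t_class")]
print(a_class, t_class, both)
```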
parent tag child tag (descendant selector: matches at any depth)
<div>
<h1><span><a href="...">div h1 span a</a></span></h1>
<a href="...">div a</a>
<ul>
<li><a href="...">div ul li a</a></li>
</ul>
</div>
div a {
color:red;
}
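To see that the space combinator reaches links at every depth, the example can be run through Beautiful Soup. This is a sketch assuming `html.parser`; the comparison with `div > a` is added here only for contrast.

```python
from bs4 import BeautifulSoup

html = """
<div>
<h1><span><a href="...">div h1 span a</a></span></h1>
<a href="...">div a</a>
<ul>
<li><a href="...">div ul li a</a></li>
</ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# "div a": every <a> nested anywhere inside a <div>
descendants = [a.get_text() for a in soup.select("div a")]
# "div > a": only <a> elements that are direct children of a <div>
children = [a.get_text() for a in soup.select("div > a")]
print(descendants)
print(children)
```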
tag + tag (adjacent sibling selector: the element immediately after)
<span><a href="...">span a</a></span>
<a href="...">first a</a>
<a href="...">second a</a>
span+a {
color:red;
}
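The adjacent sibling combinator can be verified the same way. In this sketch the fragment is wrapped in a `<body>` element, which is an addition for illustration so that the three elements share an explicit parent.

```python
from bs4 import BeautifulSoup

# <body> wrapper added for illustration; the inner markup is the example above
html = """
<body>
<span><a href="...">span a</a></span>
<a href="...">first a</a>
<a href="...">second a</a>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# "span+a": an <a> whose immediately preceding sibling element is a <span>
adjacent = [a.get_text() for a in soup.select("span+a")]
print(adjacent)
```

The link inside the `<span>` is a child rather than a sibling, and the second link follows an `<a>`, so only the first link matches.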
tag > tag (child selector: direct children only)
<div class="div_box">
<p>.div_box p</p>
<div class="div_containbox">
<p>.div_containbox p</p>
</div>
<p>.div_box p</p>
</div>
.div_box>p{
color:red;
}
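A short check of the child combinator, assuming `html.parser`. The descendant form `.div_box p` is included only to show the difference.

```python
from bs4 import BeautifulSoup

html = """
<div class="div_box">
<p>.div_box p</p>
<div class="div_containbox">
<p>.div_containbox p</p>
</div>
<p>.div_box p</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# ".div_box>p": <p> elements that are direct children of .div_box
direct = [p.get_text() for p in soup.select(".div_box>p")]
# ".div_box p": also reaches the <p> nested inside .div_containbox
nested = [p.get_text() for p in soup.select(".div_box p")]
print(direct)
print(nested)
```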
tag ~ tag (general sibling selector: all later siblings)
<span><a href="...">span a</a></span>
<a href="...">first a</a>
<a href="...">second a</a>
span~a{
color:red;
}
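The general sibling combinator matches every later sibling, not just the next one. As a sketch, with a `<body>` wrapper added for illustration so the elements share an explicit parent:

```python
from bs4 import BeautifulSoup

# <body> wrapper added for illustration; the inner markup is the example above
html = """
<body>
<span><a href="...">span a</a></span>
<a href="...">first a</a>
<a href="...">second a</a>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# "span~a": every <a> that comes after a <span> under the same parent
siblings = [a.get_text() for a in soup.select("span~a")]
print(siblings)
```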
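tag[attribute] <!-- has the attribute -->
tag[attribute = "value"] <!-- has the attribute with exactly this value -->
tag[attribute ~= "value"] <!-- the attribute's whitespace-separated list of words contains this value -->
tag[attribute $= "value"] <!-- the attribute value ends with this value -->
tag[attribute *= "value"] <!-- the attribute value contains this value as a substring -->
tag[attribute ^= "value"] <!-- the attribute value starts with this value -->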
<span class="first_span"><a id="test" href="www.google.com.tw">google</a></span><br/>
<a href="www.yahoo.com" class="">yahoo</a><br>
<a href="www.pchome.com">pchome</a><br>
<span class="first_span">first span</span><br/>
<span class="second_span">second span</span><br>
<span title="this is span">this is span</span>
a[href ^= "www"] { color:red;}
a[href $= "com"] { color:pink;}
span[class*="span"] { color:peru;}
[title~="span"] { color:purple;}
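The four attribute rules above can be tested against the same markup with Beautiful Soup. This is a minimal sketch, assuming `html.parser`; it reports which elements each selector matches rather than any color.

```python
from bs4 import BeautifulSoup

html = """
<span class="first_span"><a id="test" href="www.google.com.tw">google</a></span><br/>
<a href="www.yahoo.com" class="">yahoo</a><br>
<a href="www.pchome.com">pchome</a><br>
<span class="first_span">first span</span><br/>
<span class="second_span">second span</span><br>
<span title="this is span">this is span</span>
"""
soup = BeautifulSoup(html, "html.parser")

def texts(selector):
    """Return the text of every element matched by the selector."""
    return [el.get_text() for el in soup.select(selector)]

starts_www = texts('a[href ^= "www"]')     # href starts with "www"
ends_com = texts('a[href $= "com"]')       # href ends with "com"
has_span = texts('span[class*="span"]')    # class contains "span"
title_word = texts('[title~="span"]')      # title has the word "span"
print(starts_www, ends_com, has_span, title_word)
```

The google link is excluded by `$= "com"` because its href ends in `.tw`, and the last `<span>` is excluded by `[class*="span"]` because it has no class attribute.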
<p>only p</p>
<div>
<p>div > first p</p>
<p>div > second p</p>
</div>
<div>
<p>div > only p</p>
</div>
<ol>
<li>one</li>
<li>two</li>
</ol>
p:first-child{ color:red;}
p:last-child{ color:blue;}
p:only-child{ color:orange;}
li:nth-child(2){ color:red;}
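The structural pseudo-classes can be explored in Python as well. The sketch below assumes a Beautiful Soup release of 4.7 or later, where these pseudo-classes are available in `select()`, and wraps the fragment in a `<body>` element (an addition for illustration) so that the first `<p>` has a well-defined parent.

```python
from bs4 import BeautifulSoup

# <body> wrapper added for illustration; the inner markup is the example above
html = """
<body>
<p>only p</p>
<div>
<p>div > first p</p>
<p>div > second p</p>
</div>
<div>
<p>div > only p</p>
</div>
<ol>
<li>one</li>
<li>two</li>
</ol>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

def texts(selector):
    """Return the text of every element matched by the selector."""
    return [el.get_text() for el in soup.select(selector)]

first = texts("p:first-child")    # <p> that is the first child of its parent
last = texts("p:last-child")      # <p> that is the last child of its parent
only = texts("p:only-child")      # <p> with no sibling elements
second = texts("li:nth-child(2)") # <li> that is the second child
print(first, last, only, second)
```

With the wrapper, "only p" is the first child of `<body>` but not the last one (the `<ol>` is), so it matches `:first-child` only.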
Selector Gadget + Chrome Console
Selector Gadget
Chrome Developer Tools Console
document.querySelector("css Selector")
document.querySelectorAll("css Selector")
document.querySelectorAll("css Selector")[index].firstChild.nodeValue
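For readers who want to try the same lookups outside the browser, a rough Python counterpart of the three console calls is sketched below. The HTML snippet and the variable names are hypothetical; `select_one()` and `select()` play the roles of `querySelector` and `querySelectorAll`, and `contents[0]` stands in for `firstChild`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration
html = "<ol><li>one</li><li>two</li><li>three</li></ol>"
soup = BeautifulSoup(html, "html.parser")

first_match = soup.select_one("li")  # like document.querySelector("li")
all_matches = soup.select("li")      # like document.querySelectorAll("li")

# Like querySelectorAll("li")[1].firstChild.nodeValue:
# contents[0] is the first child node, here a text node
index = 1
first_child_text = str(all_matches[index].contents[0])
print(first_match.get_text(), len(all_matches), first_child_text)
```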
BeautifulSoup
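Using CSS Selectors
Both BeautifulSoup objects and Tag objects can select elements with CSS selectors, using the following functions:
select() # returns every matching element
select_one() # returns only the first match
If the selector contains nth-..., change it to nth-of-type() (needed on Beautiful Soup releases before 4.7, which implement only that pseudo-class).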
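The difference between the two functions, and the fact that they work on a Tag as well as on the whole document, can be sketched as follows. The markup and variable names are hypothetical and chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration
html = """
<ul id="menu">
<li>one</li>
<li>two</li>
<li>three</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# select() always returns a list, possibly empty
all_items = soup.select("li")
# select_one() returns the first matching Tag, or None if nothing matches
first_item = soup.select_one("li")
missing = soup.select_one("table")

# A Tag object supports the same calls, scoped to its own descendants
menu = soup.select_one("#menu")
second_item = menu.select("li:nth-of-type(2)")
print(len(all_items), first_item.get_text(), missing, second_item[0].get_text())
```

Checking the result of `select_one()` against `None` before calling `get_text()` avoids an `AttributeError` when a page's layout changes.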
# Example
import requests
from bs4 import BeautifulSoup

# Fetch the IMDb full-credits page
response = requests.get('https://www.imdb.com/title/tt1877830/fullcredits?ref_=tt_cl_sm')
bts = BeautifulSoup(response.text, 'lxml')
# Select the title text
css_select = bts.select('.subpage_title_block__right-column .header , .parent a')
for i in css_select:
    print(i.get_text())
蝙蝠俠 Full Cast & Crew
{'玩命關頭9': [' Vin Diesel', ' Michelle Rodriguez', ' Jordana Brewster', ' Tyrese Gibson', ' Ludacris', ' Nathalie Emmanuel', ' Charlize Theron', ' John Cena', ' Finn Cole', ' Sung Kang', ' Anna Sawai', ' Helen Mirren', ' Kurt Russell', ' Lucas Black', ' Shad Moss', ' Thue Ersted Rasmussen', ' Don Omar', ' Shea Whigham', ' Vinnie Bennett', ' JD Pardo', ' Michael Rooker', ' Jim Parrack', ' Siena Agudong', ' Isaac Holtane', ' Immanuel Holtane', ' Azia Dinea Hale', ' Juju Zhang', ' Karson Kern', ' Igby Rigney', ' Sophia Tatum', ' Francis Ngannou', ' Martyn Ford', ' Bad Bunny', ' Jimmy Lin', ' Jason Tobin', ' Cardi B', ' Cered', ' Ozuna', ' Oqwe Lin', ' Bill Simmons', ' Vincent Sinclair Diesel', ' Luka Hays', ' Melanie Beiler', ' Dzenita Bijavica', ' Janice Blue', ' Sophia Bui', ' Miranda Chambers', ' Méghane De Croock', ' Jean Donnay', ' Patrick Doran', ' Lex Elle', ' Albert Giannitelli', ' Anthony A. Gonzalez', ' Miraj Grbic', ' Elizabeth Haley', ' Tony Holness', ' Rob Horrocks', ' Michelle Marie Jacquot', ' Joy A. Kennelly', ' Jae Kim', ' Mark Krenik', ' Lucas Krystek', ' Laurine Lambert', ' Jorge Leon', ' Lorin Alond Ly', ' Ryan James Mack', ' Johnny Mansbach', ' Humberto Martinez', ' Kenny-Lee Mbanefo', ' Terry McGinnis', ' Adrian Mozzi', ' Aaron Olatunjie', ' Amber Pauline', ' Donnie Saylor', ' Demitra Sealy', ' Ginta Sebre', ' Amber Sienna', ' Jimmy Star', ' Jason Statham', ' Brian Torres', ' Ella Walker', ' Valeria Zunzun']}