问题描述
我正在尝试抓取以下html:
<table>
<tr>
<td class="cellRight" style="cursor:pointer;">
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td class="cellRight" style="border:0;color:#0066CC;"
title="View summary" width="70%">92%</td>
<td class="cellRight" style="border:0;" width="30%">
</td>
</tr>
</table>
</td>
</tr>
<tr class="listroweven">
<td class="cellLeft" nowrap><span class="categorytab" onclick=
"showAssignmentsByMPAndCourse('08/03/2015','58100:6');" title=
"Display Assignments for Art 5 with Ms. Martinho"><span style=
"text-decoration: underline">58100/6 - Art 5 with Ms.
Martinho</span></span></td>
<td class="cellLeft" nowrap>
Martinho, Suzette<br>
<b>Email:</b> <a href="mailto:smartinho@mtsd.us" style=
"text-decoration:none"><img alt="" border="0" src=
"/genesis/images/labelIcon.png" title=
"Send e-mail to teacher"></a>
</td>
<td class="cellRight" onclick=
"window.location.href = '/genesis/parents?tab1=studentdata&tab2=gradebook&tab3=coursesummary&studentid=100916&action=form&courseCode=58100&courseSection=6&mp=MP4';"
style="cursor:pointer;">
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td class="cellCenter"><span style=
"font-style:italic;color:brown;font-size: 8pt;">No
Grades</span></td>
</tr>
</table>
</td>
</tr>
<tr class="listrowodd">
<td class="cellLeft" nowrap><span class="categorytab" onclick=
"showAssignmentsByMPAndCourse('08/03/2015','58200:10');" title=
"Display Assignments for Family and Consumer Sciences 5 with Sheerin">
<span style="text-decoration: underline">58200/10 - Family and
Consumer Sciences 5 with Sheerin</span></span></td>
<td class="cellLeft" nowrap>
Sheerin, Susan<br>
<b>Email:</b> <a href="mailto:ssheerin@mtsd.us" style=
"text-decoration:none"><img alt="" border="0" src=
"/genesis/images/labelIcon.png" title=
"Send e-mail to teacher"></a>
</td>
<td class="cellRight" style="cursor:pointer;">
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td class="cellCenter"><span style=
"font-style:italic;color:brown;font-size: 8pt;">No
Grades</span></td>
</tr>
</table>
</td>
</tr>
</table>
我正在尝试提取学生成绩的值,如果不存在任何成绩,则在这种情况下,将在html中显示值“ no grades”。 但是,当我执行以下选择请求时:
doc.select("[class=cellRight]")
我得到一个输出,其中所有等级值都列出了两次(因为它们嵌套在包含[class = cellRight]区分符的两个元素和正常数量的“无等级”列表中。所以我的问题是,我只能在包含区分符[class = cellRight]的文档中选择最里面的子项(我已经处理了空白值的问题)感谢所有帮助!
1楼
对此有很多可能性。
一种可能是:如果所有“ cellRight”元素的所有父类也都携带该类,则对其进行测试。 如果找到,则丢弃:
List<Element> keepList = new ArrayList<>();
Elements els = doc.select(".cellRight");
for (Element el : els){
boolean keep = true;
for (Element parentEl : el.parents()){
if (parentEl.hasClass("cellRight")){
//parent has class as well -> discard!
keep = false;
break;
}
}
if (keep){
keepList.add(el);
}
}
//keepList now contains inner most elements with your class
请注意,这是在没有编译器的情况下编写的,超出了我的脑海。 可能存在拼写/语法错误。
其他说明。
仅当存在单个类时,您对"[class=cellRight]"
才能很好地工作。
对于具有随机顺序的多个cls(这完全是可以预期的),最好使用点语法".cellRight"