[Web Crawling] Web Scraping with HTML Locators

Besides its name, an HTML tag can carry attributes of its own. A Locator is what identifies a particular tag element; the tag name, id, and class are the ones used here.

- tagname: the name of the tag

- id: a label that points to a single, unique tag

- class: a label that groups multiple tags

<p>This element has only tagname</p>
<p id="target">This element has tagname and id</p>
<p class="targets">This element has tagname and class</p>
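The three sample elements above can each be picked out with a different locator. The following is a minimal sketch, assuming BeautifulSoup (the `bs4` package) is installed; the variable names `by_tag`, `by_id`, and `by_class` are only illustrative.

```python
from bs4 import BeautifulSoup

# The three sample elements from above, wrapped in a string for parsing.
html = """
<p>This element has only tagname</p>
<p id="target">This element has tagname and id</p>
<p class="targets">This element has tagname and class</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate by tag name: find() returns the first matching <p>.
by_tag = soup.find("p")

# Locate by id: an id is expected to be unique within a document.
by_id = soup.find("p", id="target")

# Locate by class: the keyword is class_ (trailing underscore)
# because "class" is a reserved word in Python.
by_class = soup.find_all("p", class_="targets")

print(by_tag.text)
print(by_id.text)
print([p.text for p in by_class])
```

Since a class can be shared by many tags, `find_all` is the natural call for it, while `find` is enough for an id.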

Fetch the HTML document from the Naver News IT/Science section (https://news.naver.com/section/105), parse it, and store the result in a soup object.
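One way this step might look is sketched below. It assumes the `requests` library is used for the download; the helper names `parse_html` and `fetch_soup`, the User-Agent header, and the timeout value are illustrative assumptions rather than anything the page requires.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://news.naver.com/section/105"


def parse_html(html: str) -> BeautifulSoup:
    """Parse an HTML string with Python's built-in parser."""
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url: str) -> BeautifulSoup:
    """Download a page and return it as a parsed soup object."""
    # Assumption: a browser-like User-Agent; some sites reject the default one.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return parse_html(response.text)


# Usage (needs network access):
# soup = fetch_soup(URL)
# print(soup.title)
```

Keeping the parsing separate from the download makes it easy to try locators on a saved or hand-written HTML string without hitting the site.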

Let's find the div tag whose id is results.
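A lookup by id could be written as in the sketch below. The HTML string is a hypothetical stand-in, not the markup of the live page, so that the example runs on its own.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page structure may differ.
html = """
<div id="header">Menu</div>
<div id="results">
  <p>First result</p>
  <p>Second result</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Pass the id as a keyword argument next to the tag name.
results = soup.find("div", id="results")

# find() returns None when nothing matches, so check before using it.
items = []
if results is not None:
    items = [p.text for p in results.find_all("p")]

print(items)
```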

Let's find the strong tag whose class is "sa_text_strong".
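A lookup by class might look like the sketch below. The class name comes from the text above, but the surrounding markup is a hypothetical stand-in and the live page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup that reuses the class name from the text.
html = """
<ul>
  <li><a href="#"><strong class="sa_text_strong">First headline</strong></a></li>
  <li><a href="#"><strong class="sa_text_strong">Second headline</strong></a></li>
  <li><strong>Not a headline</strong></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# A string passed as the second positional argument is matched against
# the class attribute; class_="sa_text_strong" is the keyword equivalent.
headlines = soup.find_all("strong", "sa_text_strong")
titles = [h.text.strip() for h in headlines]

print(titles)
```

The third strong tag has no class, so it is left out of the result.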

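Using the class attribute as a locator, the find method can be chained, with each call searching inside the tag returned by the previous one.

See the example code below.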

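# Collect every <li> whose class is "question-list-item"
questions = soup.find_all("li", "question-list-item")

for question in questions:
    # Narrow down step by step: div.question -> div.top -> h4
    print(question.find("div", "question").find("div", "top").h4.text)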
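The chained lookup can be tried end to end with the sketch below. The HTML is a hypothetical sample built to match the class names in the code above, and the helper name `question_titles` is illustrative. Because find returns None when a step does not match, each step is checked before going deeper.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mirrors the class names used above.
html = """
<ul>
  <li class="question-list-item">
    <div class="question"><div class="top"><h4>How do I parse HTML?</h4></div></div>
  </li>
  <li class="question-list-item">
    <div class="question"><div class="top"><h4>What is a locator?</h4></div></div>
  </li>
  <li class="question-list-item">
    <div class="question">No heading here</div>
  </li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")


def question_titles(soup):
    """Return the h4 text of each question item, skipping incomplete ones."""
    titles = []
    for item in soup.find_all("li", "question-list-item"):
        question = item.find("div", "question")
        top = question.find("div", "top") if question is not None else None
        heading = top.h4 if top is not None else None
        if heading is not None:
            titles.append(heading.text.strip())
    return titles


titles = question_titles(soup)
print(titles)
```

The third item has no div with class top, so it is skipped instead of raising an AttributeError.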
