Web Scraping - Days 95-96 - Learning Python Online in 100 Days

September 16, 2023 #python学习

Notes

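Day 95: fetch 5 news stories from news, send them to openai to generate keywords, then pass the keywords to spotipy to get songs back. Because the exercise depends on openai, I skipped it.
Day 96: learn to fetch HTML content and parse it. I may have finally reached Python's most powerful feature: web scraping!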
Use response = requests.get(url) and then html = response.text to get a page's HTML content.
Use soup = BeautifulSoup(html, 'html.parser') to parse the HTML. The library has to be imported first: from bs4 import BeautifulSoup
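Use soup.find_all("span", {"class": "titleline"}) to pick out specific elements. span is the tag name; the dictionary after it maps the class attribute to the class name to match.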
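To try the parsing step without touching the network, here is a minimal sketch that runs find_all on a small inline HTML string. The HTML fragment and its example.com links are made up for illustration and only loosely resemble a Hacker News listing.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment (not real Hacker News markup)
html = """
<table>
  <tr><td><span class="titleline"><a href="https://example.com/a">Learning Python in 100 days</a></span></td></tr>
  <tr><td><span class="titleline"><a href="https://example.com/b">A tour of SQL joins</a></span></td></tr>
  <tr><td><span class="subline">42 points</span></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# The second argument is a dict: attribute name -> value to match
spans = soup.find_all("span", {"class": "titleline"})

# get_text(strip=True) returns the text inside each span without surrounding whitespace
titles = [span.get_text(strip=True) for span in spans]

print(len(titles))
for title in titles:
    print(title)
```

The span with class subline is skipped because only titleline is requested. soup.find_all("span", class_="titleline") is an equivalent spelling of the same search.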
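Today's exercise: fetch the story titles from hacker news and print any title that contains python or replit. While running it I found no titles containing either keyword, so I added a third one: SQL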

Code

main.py

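from bs4 import BeautifulSoup
import requests

url = "https://news.ycombinator.com"

# Download the front page and keep its HTML source
response = requests.get(url)
html = response.text

# Parse the HTML, then collect every <span class="titleline"> element
soup = BeautifulSoup(html, 'html.parser')
titles = soup.find_all("span", {"class": "titleline"})
print(len(titles))

# Print each title that contains any of the keywords (the match is case-sensitive)
for title in titles:
  if "python" in title.text or "replit" in title.text or "SQL" in title.text:
    print(title.text)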
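The in operator compares strings case-sensitively, so "python" does not match a title that spells it "Python". That may be one reason the first two keywords found nothing. Below is a rough sketch of a case-insensitive variant; matches_any and the sample titles are hypothetical names and data made up for illustration, not part of the original exercise.

```python
def matches_any(title, keywords):
    """Return True if any keyword occurs in title, ignoring case."""
    lowered = title.casefold()
    return any(keyword.casefold() in lowered for keyword in keywords)


keywords = ["python", "replit", "SQL"]

# Hypothetical titles standing in for the scraped ones
sample_titles = [
    "Show HN: A Python profiler",
    "Why PostgreSQL is enough",
    "Notes on compilers",
]

matched = [t for t in sample_titles if matches_any(t, keywords)]
print(matched)
```

In main.py the same check could be applied to each element's text inside the loop. Note that this is still a substring match, so SQL also matches PostgreSQL.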