Record
- Day 95's task is to fetch five news items from a news source, submit them to OpenAI to generate keywords, and then send the keywords to Spotipy to get back songs. Because of the dependency on OpenAI, that practice was skipped (a rough sketch of what the pipeline would look like follows this list).
- Day 96 is about learning how to fetch HTML content and parse it, and finally getting to the most powerful feature of Python: web scraping!
- Use `response = requests.get(url)` and `html = response.text` to get the HTML content of the webpage.
- Use `soup = BeautifulSoup(html, 'html.parser')` to parse the HTML. Before that, import the library: `from bs4 import BeautifulSoup`.
- Use `soup.find_all("span", {"class": "titleline"})` to get the specified content. `span` is the tag name, and the dictionary maps the attribute (`class`) to the class name to match; see the parsing sketch after this list.
- Today's practice: get the story titles from Hacker News and print any that contain "python" or "replit". While testing, no titles turned out to contain those two keywords, so another keyword, "SQL", was added.
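For reference, here is a rough sketch of what the skipped Day 95 pipeline might look like. It assumes the NewsAPI top-headlines endpoint, the current OpenAI Python client, and Spotipy's client-credentials flow; the endpoint, model name, prompt, and environment variable names are illustrative assumptions, not the original exercise code.

```python
import os
import requests
import spotipy
from openai import OpenAI
from spotipy.oauth2 import SpotifyClientCredentials

# Fetch five headlines (assumes a NewsAPI key in NEWS_API_KEY).
news = requests.get(
    "https://newsapi.org/v2/top-headlines",
    params={"country": "us", "pageSize": 5, "apiKey": os.environ["NEWS_API_KEY"]},
).json()
headlines = [article["title"] for article in news["articles"]]

# Ask OpenAI for one search keyword per headline (assumes OPENAI_API_KEY is set).
client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Give one music search keyword per headline, one per line:\n"
                   + "\n".join(headlines),
    }],
)
keywords = [line for line in completion.choices[0].message.content.splitlines() if line.strip()]

# Look up one track on Spotify for each keyword.
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id=os.environ["SPOTIFY_CLIENT_ID"],
    client_secret=os.environ["SPOTIFY_CLIENT_SECRET"],
))
for keyword in keywords:
    tracks = sp.search(q=keyword, type="track", limit=1)["tracks"]["items"]
    if tracks:
        print(keyword, "->", tracks[0]["name"], "by", tracks[0]["artists"][0]["name"])
```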
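To make the `find_all` step concrete before the full practice code, here is a minimal parsing sketch. It assumes the current Hacker News markup, where each `span.titleline` wraps an `<a>` tag holding the story title and link:

```python
from bs4 import BeautifulSoup
import requests

# Download and parse the Hacker News front page.
html = requests.get("https://news.ycombinator.com").text
soup = BeautifulSoup(html, "html.parser")

# Each result is a <span class="titleline"> containing an <a> with the title and URL.
for span in soup.find_all("span", {"class": "titleline"}):
    link = span.find("a")
    if link is not None:
        print(link.text, "->", link.get("href"))
```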
CODE
main.py
from bs4 import BeautifulSoup
import requests

url = "https://news.ycombinator.com"

# Download the Hacker News front page.
response = requests.get(url)
html = response.text

# Parse the HTML so tags can be searched.
soup = BeautifulSoup(html, 'html.parser')

# Every story title sits in a <span class="titleline"> element.
title = soup.find_all("span", {"class": "titleline"})
print(len(title))

# Print titles mentioning any of the keywords (case-sensitive match).
for txt in title:
    if "python" in txt.text or "replit" in txt.text or "SQL" in txt.text:
        print(txt.text)
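One note on the keyword check above: `in` performs a case-sensitive substring test, so a title written as "Python ..." would not match "python". A small self-contained variant that lower-cases the title text first (the keyword list is just the same three words from the practice):

```python
from bs4 import BeautifulSoup
import requests

soup = BeautifulSoup(requests.get("https://news.ycombinator.com").text, "html.parser")
keywords = ("python", "replit", "sql")

# Lower-case the title text before checking so "Python" also matches "python".
for span in soup.find_all("span", {"class": "titleline"}):
    if any(keyword in span.text.lower() for keyword in keywords):
        print(span.text)
```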