二手产品经理

THIS IS RENO

Scraping - Days 95~96 - Learn Python Online in 100 Days

Record#

  1. Day 95 is to fetch 5 headlines from a news source, send them to OpenAI to generate keywords, and then pass the keywords to spotipy to get songs back. Because it depends on OpenAI, I skipped that day's practice.
  2. Day 96 covers how to fetch HTML content and parse it, which finally gets to one of the most powerful uses of Python: web scraping!
  3. Use response = requests.get(url) and html = response.text to get the HTML content of the webpage.
  4. Use soup = BeautifulSoup(html, 'html.parser') to parse the HTML. Import the library first: from bs4 import BeautifulSoup.
  5. Use soup.find_all("span", {"class": "titleline"}) to get the specified content. span is the tag name, and the dictionary maps the class attribute to the class name to match.
  6. Today's practice: get the story titles from Hacker News and print the ones that contain python or replit. When I ran it, no titles contained either keyword, so I added a third keyword, SQL.
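The parsing steps in the notes above can be tried without any network access. The following is a minimal sketch, assuming a small hand-written HTML snippet that only imitates the structure of a listing page; the snippet, the example.com URLs and the titles are made up for illustration and are not real Hacker News content.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet that imitates the structure of a listing page.
html = """
<table>
  <tr><td><span class="titleline"><a href="https://example.com/a">Learning Python in 100 days</a></span></td></tr>
  <tr><td><span class="titleline"><a href="https://example.com/b">A short history of SQL</a></span></td></tr>
  <tr><td><span class="subline">42 points</span></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# attrs is a dictionary: attribute name -> value the attribute must have.
spans = soup.find_all("span", {"class": "titleline"})

# get_text(strip=True) returns the text inside each span without extra whitespace.
titles = [span.get_text(strip=True) for span in spans]

print(len(titles))
for title in titles:
    print(title)
```

Only the two spans with class titleline are returned; the span with class subline is ignored, which is the same filtering the practice relies on.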

CODE#

main.py#

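from bs4 import BeautifulSoup
import requests

url = "https://news.ycombinator.com"

# Download the page; stop on HTTP errors instead of parsing an error page.
response = requests.get(url, timeout=10)
response.raise_for_status()
html = response.text

# Parse the HTML and collect every <span class="titleline"> element.
soup = BeautifulSoup(html, "html.parser")
titles = soup.find_all("span", {"class": "titleline"})
print(len(titles))

# Compare in lower case so that "Python" and "python" both match.
keywords = ["python", "replit", "sql"]
for span in titles:
  text = span.text
  if any(keyword in text.lower() for keyword in keywords):
    print(text)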
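One possible reason for finding no matches is that the in operator is case-sensitive, so "python" does not match a title that says "Python". The sketch below shows a case-insensitive filter on its own, assuming a hypothetical helper named matching_titles and made-up sample titles; neither comes from the original exercise.

```python
def matching_titles(titles, keywords):
    """Return the titles that contain at least one keyword, ignoring case."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        title
        for title in titles
        if any(keyword in title.lower() for keyword in lowered)
    ]


# Made-up titles for illustration only.
sample_titles = [
    "Show HN: A Python tool for log analysis",
    "Why PostgreSQL keeps winning",
    "Building a bot on Replit",
    "Ask HN: What are you working on?",
]

hits = matching_titles(sample_titles, ["python", "replit", "SQL"])
for title in hits:
    print(title)
```

Because this is plain substring matching, "SQL" also matches "PostgreSQL". If only whole words should count, a regular expression with word boundaries would be one alternative.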