Try It Yourself

Applying the same method we used to extract a hextet, we can extract the URL from an HTML anchor tag. The steps are:

  • use find() to locate the index of the first anchor tag on the page, passing the string <a href= as the argument

  • in the same way, find the index of the first quote mark (") that follows the tag

  • then find the index of the second quote mark, which closes the URL

  • extract the URL by slicing the text between those two quote marks.
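The steps above can be sketched on a small, made-up string before you attempt the exercise. This is only an illustration: the variable `sample`, its markup, and the example.com address are assumptions chosen for the demo and are not the exercise data.

```python
# Hypothetical sample data: this markup and URL are invented for illustration.
sample = '<p>Intro</p><a href="https://example.com/search">Search</a>'

# Step 1: index where the first anchor tag begins
start_link = sample.find('<a href=')

# Step 2: index of the opening quote, searching from the tag onward
start_quote = sample.find('"', start_link)

# Step 3: index of the closing quote, searching just past the opening quote
end_quote = sample.find('"', start_quote + 1)

# Step 4: slice out everything strictly between the two quotes
url = sample[start_quote + 1:end_quote]
print(url)
```

Passing a start index as the second argument to find() keeps each search from matching an earlier quote mark. Note that find() returns -1 when the substring is absent, so a real page without links would need a check before slicing.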

# our data
page = """
    <h1>Lorem ipsum dolor sit amet.</h1>
        <a href="">Search</a>
      <li><a href="">Python docs</a></li>
"""

# find the index where the first link starts
start_link = None
# find the first quote mark after the start of the link
start_quote = None
# find the closing quote mark that comes just after the end of the url
end_quote = None

# now we use string slicing to extract the url between the two quote marks
url = None

Try it by launching the notebook below.
