Forum Discussion

DCadet
Bronze II
28 days ago
Solved

Help with Introduction to Python Scripting: Ep.7 – Demonstrate Your Skills

Hello all,

I am stuck on the last question of this Immersive lab. Here is the question:

Using Python, build a web scraper to scrape the website for 12-digit phone numbers beginning with + (e.g., +123456789012). The requests and BeautifulSoup4 (BS4) libraries are available to you. How many extracted phone numbers are returned?

I created the following Python script:

import requests
from bs4 import BeautifulSoup
import re

url = "http://10.102.35.108:4321" 
try:
    response = requests.get(url)
    response.raise_for_status()  # Raise an exception for bad status codes
except requests.exceptions.RequestException as e:
    print(f"Error fetching the page: {e}")
    exit()
soup = BeautifulSoup(response.text, 'html.parser')
phone_pattern = r"\+\d{12}" 
found_numbers = re.findall(phone_pattern, soup.get_text()) 
num_found = len(found_numbers)
print(f"Found {num_found} phone numbers:")
for number in found_numbers:
    print(number) 

The script returns 0, but the lab marks that answer as incorrect. Please help.

 

  • The code is fine. However, the answer field expects duplicates to be counted as well. I didn't notice this at first, because I just ran wget, grepped for "+", and counted manually.

    Generally, it's best to get a local copy first (one you can inspect manually) and then automate the analysis. If something goes wrong, you can always look at the local copy.
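
To illustrate the difference between counting every occurrence and counting distinct numbers, here is a minimal sketch. It reuses the pattern from the question; the `sample` text and the helper name `count_phone_numbers` are hypothetical and stand in for a locally saved copy of the pages.

```python
import re

# Same pattern as in the question: "+" followed by 12 digits.
PHONE_RE = re.compile(r"\+\d{12}")

def count_phone_numbers(text):
    """Return (matches including duplicates, distinct matches)."""
    matches = PHONE_RE.findall(text)
    return len(matches), len(set(matches))

# Hypothetical page text; in the lab this would be the saved HTML.
sample = "Call +123456789012 or +999999999999. Fax: +123456789012."
total, unique = count_phone_numbers(sample)
print(f"total={total}, unique={unique}")  # total=3, unique=2
```

Storing matches in a `set` yields the second figure; keeping them in a list yields the first, which is the one the answer field appears to expect.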


6 Replies

  • The URL I used was the following:
    url = "http://10.102.35.108:801" 

    • netcat
      Silver III

      You're supposed to download the website recursively, i.e. all linked pages as well, e.g. chariry.html etc.
      You're right that there are no phone numbers on the main page. The other pages contain a few.
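
A recursive download starts with finding the links on each page that stay on the same site. The following is a rough sketch of that step only, written with the standard library so it runs without extra packages (in the lab, BeautifulSoup would do the same job). The names `LinkCollector` and `same_site_links` and the `example.test` URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def same_site_links(page_url, html):
    """Return absolute links found in html that share page_url's host and port."""
    parser = LinkCollector()
    parser.feed(html)
    site = urlparse(page_url).netloc
    links = []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlparse(absolute).netloc == site and absolute not in links:
            links.append(absolute)
    return links

# Hypothetical page with two internal links and one external link.
html = ('<a href="about.html">A</a>'
        '<a href="http://other.test/x">B</a>'
        '<a href="/contact.html">C</a>')
print(same_site_links("http://example.test:801/", html))
```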

      • DCadet
        Bronze II

        Thank you for your insight. I got 14, but it's still wrong:

        import requests
        from bs4 import BeautifulSoup
        import re
        from urllib.parse import urljoin

        visited = set()
        phone_numbers = set()

        def scrape(url):
            if url in visited:
                return
            visited.add(url)
            print(f"Scraping {url}")
            try:
                r = requests.get(url)
                soup = BeautifulSoup(r.text, 'html.parser')

                # Find 12-digit numbers starting with +
                matches = re.findall(r'\+\d{12}', r.text)
                phone_numbers.update(matches)

                # Recurse into internal links
                for link in soup.find_all('a', href=True):
                    full_url = urljoin(url, link['href'])
                    if full_url.startswith(url):  # ensure we stay within the same site
                        scrape(full_url)
            except Exception as e:
                print(f"Error scraping {url}: {e}")

        start_url = f"http://10.102.50.198:{801}/"
        scrape(start_url)

        print("Extracted phone numbers:")
        for number in phone_numbers:
            print(number)

        print(f"Total unique phone numbers found: {len(phone_numbers)}")
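
Two details in the script above may explain the mismatch: `phone_numbers` is a set, so repeated numbers are counted once, and the `startswith(url)` check compares links against the current page rather than the start URL, which can skip pages reachable only from subpages. Below is a hedged sketch of one way to address both. The `fetch` parameter, the in-memory `pages` dictionary, and the regex-based `HREF_RE` link extraction are assumptions made so the example runs offline; they are not part of the lab.

```python
import re
from collections import deque
from urllib.parse import urljoin, urldefrag

PHONE_RE = re.compile(r"\+\d{12}")
# Simplified href extraction (assumption); BeautifulSoup is more robust.
HREF_RE = re.compile(r'href=["\']([^"\']+)["\']')

def crawl_phone_numbers(start_url, fetch):
    """Crawl pages under start_url and return every match, duplicates kept."""
    visited = set()
    queue = deque([start_url])
    numbers = []  # a list, not a set, so repeated numbers are counted
    while queue:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        html = fetch(url)
        numbers.extend(PHONE_RE.findall(html))
        for href in HREF_RE.findall(html):
            link, _ = urldefrag(urljoin(url, href))  # absolute URL, no #fragment
            # Compare against the start URL, not the current page's URL.
            if link.startswith(start_url) and link not in visited:
                queue.append(link)
    return numbers

# Hypothetical three-page site held in memory.
pages = {
    "http://site.test/": '<a href="a.html">A</a> <a href="b.html">B</a>',
    "http://site.test/a.html": 'Call +123456789012 <a href="b.html">B</a>',
    "http://site.test/b.html": '+123456789012 and +210987654321 <a href="/">home</a>',
}
found = crawl_phone_numbers("http://site.test/", pages.__getitem__)
print(len(found), len(set(found)))  # 3 2
```

Against the lab site, `fetch` could be something like `lambda u: requests.get(u, timeout=10).text`, and `len(found)` would be the count including duplicates.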