Forum Discussion
Help with Introduction to Python Scripting: Ep.7 – Demonstrate Your Skills
Hello all,
I am stuck on the last question of this Immersive lab. Here is the question:
Using Python, build a web scraper to scrape the website for 12-digit phone numbers beginning with + (e.g., +123456789012). The requests and BeautifulSoup4 (BS4) libraries are available to you. How many extracted phone numbers are returned?
I created the following Python script:

import requests
from bs4 import BeautifulSoup
import re

url = "http://10.102.35.108:4321"
try:
    response = requests.get(url)
    response.raise_for_status()  # Raise an exception for bad status codes
except requests.exceptions.RequestException as e:
    print(f"Error fetching the page: {e}")
    exit()

soup = BeautifulSoup(response.text, 'html.parser')
phone_pattern = r"\+\d{12}"
found_numbers = re.findall(phone_pattern, soup.get_text())
num_found = len(found_numbers)
print(f"Found {num_found} phone numbers:")
for number in found_numbers:
    print(number)
The script returns 0, but that answer is marked incorrect. Please help.
Perfect code. However, the answer field expects you to count duplicates as well. I didn't notice this at first, because I just ran wget, then grep "+", and counted manually.
Generally, it's best to get a local copy first (that can be analyzed manually), and then implement automatic analysis. If something goes wrong, you can always take a look at the local copy.
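To illustrate the point about duplicates, here is a minimal sketch that counts matches both ways. The sample string is made up for illustration and stands in for the text of a saved local copy; the `(?!\d)` lookahead is an added assumption that a longer digit run should not count as a 12-digit number.

```python
import re

# Hypothetical sample text standing in for a saved copy of a page.
sample = "Call +123456789012 or +123456789012, fax +210987654321."

# "+" followed by exactly 12 digits; (?!\d) rejects longer digit runs.
PHONE_RE = re.compile(r"\+\d{12}(?!\d)")

matches = PHONE_RE.findall(sample)  # a list, so duplicates are kept
total = len(matches)                # every occurrence
unique = len(set(matches))          # distinct numbers only

print(f"total={total} unique={unique}")
```

If the answer field wants every occurrence, `total` is the figure to submit; a set-based count would give `unique` instead.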
6 Replies
- DCadet
Bronze II
The URL I used was the following:
url = "http://10.102.35.108:801"
- netcat
Silver III
You're supposed to download the website recursively, i.e. the linked pages as well, e.g. chariry.html etc.
On the main page there are no phone numbers, that's right. On the other pages there are a few.
- DCadet
Bronze II
Thank you for your insight. I got 14, but it's still wrong.

import requests
from bs4 import BeautifulSoup
import re
from urllib.parse import urljoin

visited = set()
phone_numbers = set()

def scrape(url):
    if url in visited:
        return
    visited.add(url)
    print(f"Scraping {url}")
    try:
        r = requests.get(url)
        soup = BeautifulSoup(r.text, 'html.parser')
        # Find 12-digit numbers starting with +
        matches = re.findall(r'\+\d{12}', r.text)
        phone_numbers.update(matches)
        # Recurse into internal links
        for link in soup.find_all('a', href=True):
            full_url = urljoin(url, link['href'])
            if full_url.startswith(url):  # ensure we stay within the same site
                scrape(full_url)
    except Exception as e:
        print(f"Error scraping {url}: {e}")

start_url = f"http://10.102.50.198:{801}/"
scrape(start_url)
print("Extracted phone numbers:")
for number in phone_numbers:
    print(number)
print(f"Total unique phone numbers found: {len(phone_numbers)}")
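The script above stores matches in a set, which drops duplicates, and it compares each link against the current page's URL rather than the site's host, so links found on sub-pages may be skipped. Below is a rough sketch of one way to address both, not a confirmed solution to the lab. The names `crawl`, `LinkParser` and `fetch` are made up for this example, the standard library's `HTMLParser` is swapped in for BS4 only so the sketch runs without extra installs, and whether the lab expects matches from raw HTML or from visible text is not known.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

# "+" followed by exactly 12 digits (assumption: longer runs do not count).
PHONE_RE = re.compile(r"\+\d{12}(?!\d)")


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch):
    """Return all phone matches on the site, duplicates included.

    fetch is any callable that maps a URL to that page's HTML text.
    """
    site = urlparse(start_url).netloc
    visited = set()
    numbers = []          # a list, so repeated numbers are kept
    pending = [start_url]
    while pending:
        url = pending.pop()
        if url in visited:
            continue
        visited.add(url)
        html = fetch(url)
        numbers.extend(PHONE_RE.findall(html))
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            full_url, _fragment = urldefrag(urljoin(url, href))
            # Stay on the same host and port as the start URL.
            if urlparse(full_url).netloc == site:
                pending.append(full_url)
    return numbers
```

With requests installed, something like `crawl("http://10.102.50.198:801/", lambda u: requests.get(u, timeout=10).text)` would run it against the lab host, and `len(...)` of the result is the count with duplicates.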