Overview: This project is a Streamlit web application that automates data extraction from the Gobx Partner Portal. It combines web scraping, Excel data processing, and a simple user interface, so users can extract and organize portal data without copying it by hand. Because it automates repetitive lookups, the tool is most useful for businesses that process large volumes of records.
Key Features:
- Excel Integration: Reads input data from an uploaded Excel file (.xlsx) and updates it with extracted information.
- Secure Web Scraping: Uses Selenium with undetected ChromeDriver to interact with the Gobx Partner Portal. Users log in manually in the automated browser, so the tool does not need to store their credentials.
- Data Extraction: Extracts specific data points (e.g., customer email and phone number) from the portal based on provided CNIC numbers.
- Incremental Save: Periodically saves updated data to prevent loss during long processes.
- Streamlit UI: User-friendly interface with buttons, upload fields, and real-time progress updates.
- Error Handling: Provides detailed error messages for timeout and driver issues.
- Downloadable Output: Offers the updated Excel file for download after processing.
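Lookups are keyed on the CNIC numbers in the uploaded workbook, and a spreadsheet cell may hold a CNIC as a dashed string, a plain 13-digit string, or a number. The following is a minimal sketch of normalizing those values before each portal search. The helper name `normalize_cnic` is hypothetical, and it assumes the portal expects the dashed XXXXX-XXXXXXX-X form.

```python
import re
from typing import Optional

CNIC_DIGITS = 13  # a CNIC has 13 digits, printed as 5-7-1


def normalize_cnic(value: object) -> Optional[str]:
    """Return the CNIC as 'XXXXX-XXXXXXX-X', or None if it is not a valid CNIC.

    Hypothetical helper. Assumption: the portal search accepts the dashed form.
    """
    if value is None:
        return None
    if isinstance(value, float):
        # A numeric Excel cell can come back as a float such as 3520212345671.0.
        if not value.is_integer():
            return None
        value = int(value)
    # Drop dashes, spaces, and any other non-digit characters.
    digits = re.sub(r"\D", "", str(value))
    if len(digits) != CNIC_DIGITS:
        return None
    return f"{digits[:5]}-{digits[5:12]}-{digits[12]}"
```

Returning None for malformed values lets the processing loop skip and report a bad row instead of sending an invalid search to the portal.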
Applications:
- Automates data retrieval tasks for businesses, saving time and reducing manual effort.
- Writes results back into the uploaded Excel file, keeping extracted values alongside the input rows they belong to.
- Useful for data analysts and administrators handling customer or partner data.
Technologies Used:
- Python: Programming language.
- Streamlit: Framework for building interactive web applications.
- Selenium: For web scraping and browser automation.
- BeautifulSoup: For parsing HTML content.
- openpyxl: For Excel file manipulation.
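The project parses portal pages with BeautifulSoup, where pulling a single field typically looks like `soup.find(id=...).get_text(strip=True)`. The sketch below shows the same idea with the standard-library `html.parser` instead (a swapped-in technique, chosen so the example runs without third-party packages). The element ids `customer-email` and `customer-phone` are hypothetical; the real portal markup is not described here.

```python
from html.parser import HTMLParser
from typing import Dict, Iterable, List, Optional, Tuple


class FieldExtractor(HTMLParser):
    """Collect the whitespace-normalized text of elements whose id is in target_ids."""

    def __init__(self, target_ids: Iterable[str]) -> None:
        super().__init__()
        self.target_ids = set(target_ids)
        self.fields: Dict[str, str] = {}
        self._current_id: Optional[str] = None   # id of the target element being read
        self._current_tag: Optional[str] = None  # tag name of that element
        self._depth = 0                          # nesting depth of that tag name
        self._chunks: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if self._current_id is None:
            element_id = dict(attrs).get("id")
            if element_id in self.target_ids:
                self._current_id, self._current_tag = element_id, tag
                self._depth, self._chunks = 1, []
        elif tag == self._current_tag:
            self._depth += 1  # same tag nested inside the target element

    def handle_endtag(self, tag: str) -> None:
        if self._current_id is not None and tag == self._current_tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace, similar to get_text(strip=True).
                text = " ".join("".join(self._chunks).split())
                self.fields[self._current_id] = text
                self._current_id = None

    def handle_data(self, data: str) -> None:
        if self._current_id is not None:
            self._chunks.append(data)


def extract_fields(html: str, field_ids: Iterable[str]) -> Dict[str, str]:
    """Return {id: text} for each requested id found in the page."""
    parser = FieldExtractor(field_ids)
    parser.feed(html)
    parser.close()
    return parser.fields
```

Ids that are not on the page are simply absent from the returned dict, so the caller can leave the matching Excel cells empty rather than fail on a page that lacks a field.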
Challenges Solved:
- Eliminates the repetitive task of manual data entry and retrieval.
- Enhances productivity by automating incremental saves and offering real-time feedback.
- Works around scraping restrictions that can block a standard Selenium ChromeDriver session by using undetected ChromeDriver.
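The incremental-save behavior can be sketched as a loop that persists progress every few rows and once more on exit, even when a lookup raises (for example a Selenium `TimeoutException`). This is a hedged sketch, not the project's code: `lookup` stands in for the hypothetical scraping step, `save` for writing the workbook (for instance with openpyxl's `Workbook.save(path)`), and the `cnic`/`email`/`phone` keys and the `save_every` default are assumptions.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]


def process_rows(
    rows: List[Row],
    lookup: Callable[[str], Dict[str, str]],  # hypothetical scraping step
    save: Callable[[], None],                 # hypothetical workbook writer
    save_every: int = 10,
) -> int:
    """Fill each row's 'email' and 'phone' via lookup, saving every save_every rows.

    Returns the number of rows updated. A final save runs even if a lookup
    raises, so rows completed before the failure are not lost.
    """
    if save_every < 1:
        raise ValueError("save_every must be >= 1")
    updated = 0
    try:
        for row in rows:
            cnic = row.get("cnic")
            if not cnic or row.get("email"):
                continue  # skip blank CNICs and rows filled by an earlier run
            result = lookup(cnic)
            row["email"] = result.get("email")
            row["phone"] = result.get("phone")
            updated += 1
            if updated % save_every == 0:
                save()  # periodic checkpoint
    finally:
        save()  # always persist whatever was completed
    return updated
```

Skipping rows that already have an email lets a rerun resume where an interrupted run stopped, instead of querying the portal again for finished rows.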
Key Advantages:
- Streamlines the data extraction workflow, reducing manual errors.
- Handles sensitive customer data (CNICs, emails, phone numbers) with manual login rather than stored credentials.
- Provides a scalable solution for repetitive business processes.