Java crawler


Web PT Data

The main purpose of this project is to provide a clean way to get important data from Portuguese government public services and from other organizations of civic and public interest.

Open WEB Services for Portuguese public services data.

List of Services to be integrated

Documentation

SDMX

Scraper Scripts

Sometimes an API is overkill if you just want to tinker quickly with the data. With this in mind, a new folder called scripts was created to hold miscellaneous scripts that ease the process of scraping data.

The Python scripts require Python 3. It is also advisable to run them inside a virtual environment, created and activated the usual way:

virtualenv .venv; source .venv/bin/activate

Then confirm that the environment is working by checking the Python version:

python --version
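
The same check can also be made from inside a script, so that it fails early when run with the wrong interpreter. The following is only a minimal sketch and not part of the repository: the names `MIN_VERSION`, `is_supported` and `in_virtualenv` are hypothetical, and the minimum version is an assumption based on the "Python 3" requirement stated above.

```python
import sys

# Assumption: the scripts only need "Python 3"; raise this if a script needs more.
MIN_VERSION = (3, 0)


def is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # venv and recent virtualenv make sys.prefix differ from sys.base_prefix;
    # older virtualenv releases set sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base != sys.prefix


if __name__ == "__main__":
    if not is_supported():
        sys.exit("Python 3 is required, found " + sys.version.split()[0])
    if not in_virtualenv():
        print("warning: no virtual environment is active", file=sys.stderr)
    print("Python", sys.version.split()[0], "OK")
```

Running this file with the interpreter from the activated .venv should print the version and exit normally. Under another interpreter it either stops with an error or warns that no virtual environment is active.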

BaseGov

As a first iteration, a small Python script was developed to help you scrape BaseGov.pt data.