[Retroactive review] Campos dos Goytacazes-RJ #637
Comments
I haven't opened a PR yet, but I wanted to let you know that I am working on this issue.
As mentioned, I opened PR #702. I went beyond what was requested because some changes could be made to cover cases that may not yet have existed in 2020, the date of the version that preceded this PR's changes. Reference examples are included as comments to justify the handling chosen.
The way the spider was implemented assumed that there could only be a single file_url per day per is_extra_edition value, which was not always true. This refactoring gathers all the various files per day and is_extra_edition. The existing code did not address the text format for Saturday gazettes to be considered is_extra_edition. We also included the start_date and end_date handling. resolve okfn-brasil#637
The way the spider was implemented assumed that there could only be a single file_url per day per is_extra_edition value, which was not always true. This refactoring gathers all the various files per day and is_extra_edition. We addressed the text format for Saturday gazettes to be considered is_extra_edition. We also included the start_date and end_date handling, and edition_number when applicable. resolve okfn-brasil#637
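To illustrate the grouping described in the commit message, here is a minimal sketch that collects every file URL found for the same day and the same is_extra_edition value. It is not the code from the PR: the function name `group_file_urls`, the `(date, is_extra_edition, file_url)` tuple layout, and the example URLs are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def group_file_urls(entries):
    """Group file URLs by (date, is_extra_edition).

    `entries` is a hypothetical iterable of
    (gazette_date, is_extra_edition, file_url) tuples.
    URLs keep their first-seen order and repeats are dropped.
    """
    grouped = defaultdict(list)
    for gazette_date, is_extra_edition, file_url in entries:
        key = (gazette_date, is_extra_edition)
        if file_url not in grouped[key]:
            grouped[key].append(file_url)
    return dict(grouped)


# Hypothetical input: two regular files and one extra edition on the same day.
entries = [
    (date(2020, 1, 3), False, "https://example.com/a.pdf"),
    (date(2020, 1, 3), False, "https://example.com/b.pdf"),
    (date(2020, 1, 3), True, "https://example.com/c.pdf"),
    (date(2020, 1, 3), False, "https://example.com/a.pdf"),  # repeated link
]
grouped = group_file_urls(entries)
```

Each key could then produce a single gazette item carrying a list of file URLs instead of assuming one URL per day.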
I have finally rewritten the PR with the changes I wanted to make. Since the rewrite turned out quite different from the usual implementation of the spiders in this repo, I ask that the review be done without rush.
Repeating the comment I left on the PR:
Hello, I searched for the city of Campos dos Goytacazes and got no results.
@samueldsiqueira, this issue already has a linked PR, so there is nothing left to help with because it is "done"; it was awaiting review.
However, I am going to close the PR and the issue due to incompatibility. @ayharano's comment that part of the documents are in .rar format makes it impossible for us to add the scraper. I checked the period that had been mentioned (October 2012 to 2013) and it is still the same. I will open an issue so we can discuss whether it makes sense, or whether we have a way, to add a solution for this situation, and then we can resume the task from what has been accumulated so far.
@slfabio suggested that we ignore that interval and proceed with integrating the scraper. I am reopening the issue so we can discuss the idea. Fabio can of course argue his suggestion further, but in principle I don't quite agree, because it would build into the scraper (and, ultimately, into the project) a logic of deliberately leaving out certain stretches of official gazettes, hardcoding these workarounds. And we are committed to offering the database in a reliable and sequential way. However, provisionally, I think we can assume the [...] Forcing the "wrong" start_date would be a new decision in the project, but it would be similar in nature to what we do with discontinued sites: what matters is having the current interval first, and then expanding coverage toward the older gazettes. Then we could resume the PR that was closed... What do you think?
I also prefer your proposal, @trevineju. It will already bring the last 11 years to the platform. For now we have no intern, and I am not finding time to pick up any issue either. Thank you very much for reopening the issue; this is one of the largest municipalities in the state, and we are very interested in including it in QD.
The existing spider works, but it has no date filter (start_date and end_date) to reduce the number of requests and extract only the requested periods.
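As a rough sketch of the kind of date filter requested here, the function below keeps only entries between start_date and end_date and stops as soon as it passes start_date. It assumes the listing is ordered newest first, which has not been verified for this site; the name `select_in_window` and the `(gazette_date, url)` pair layout are hypothetical.

```python
from datetime import date


def select_in_window(listing, start_date, end_date):
    """Return the (gazette_date, url) pairs with start_date <= date <= end_date.

    Assumes `listing` is ordered newest first, so iteration can stop
    at the first entry older than start_date.
    """
    selected = []
    for gazette_date, url in listing:
        if gazette_date > end_date:
            continue  # newer than the requested window, skip it
        if gazette_date < start_date:
            break  # everything after this is older, stop early
        selected.append((gazette_date, url))
    return selected


# Hypothetical listing, newest first.
listing = [
    (date(2020, 1, day), f"https://example.com/{day}.pdf")
    for day in (5, 4, 3, 2, 1)
]
selected = select_in_window(listing, date(2020, 1, 2), date(2020, 1, 4))
```

In a paginated listing, the same early stop would also avoid requesting further pages once the window has been passed.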