Workload
Theory per week | Practice per week | Credits | Duration | Total |
4 | 2 | 8 | 8 weeks | 120 hours |
Responsible instructors
Luiz Carlos Estraviz Rodriguez
Objective
This course presents procedures and tools that allow researchers to handle data in an open and reproducible way. The goal is to enable students to produce scientific work whose analysis steps can be understood and replicated by third parties. Adopting open workflows facilitates collaboration, the sharing of analyses, and the publication of data, promoting more efficient dissemination of knowledge. The course covers the following core concepts and tools: FAIR Principles (Findable, Accessible, Interoperable, Reusable); Version Control: use of Git and GitHub; Technical Reproducibility: containers and virtual environments (Docker, Conda); Literate Programming: dynamic documentation with RMarkdown, Jupyter Notebooks, and Quarto.
Content
1. Artificial Intelligence (AI) in Open Science: ethical use of AI in workflows and coding assistance
2. Foundations of Open Science: Gezelter principles, FOSTER, and data governance
3. FAIR Principles: practices for findable, accessible, interoperable, and reusable data
4. Reproducible Workflows and Content Delivery: project structuring, publishing via DokuWiki, and analysis automation
5. Data Organization and Management: metadata standards and repository curation
6. Advanced Version Control: scientific collaboration via Git and GitHub
7. Computational Environments: use of Conda and containers (Docker) for technical portability
8. Literate Programming with Quarto: evolution from RMarkdown to multi-language publishing
9. Interactive Computing with Jupyter: data exploration and documentation in notebooks
10. Python Fundamentals for Data Science: logical structures, functions, and essential packages
11. R Fundamentals for Research: data structures, vectors, and programming logic
12. Data Manipulation with Tidyverse (R): efficient cleaning and transformation of large volumes of data
13. Data Visualization with ggplot2: visual design principles and high-quality graphics
14. Geospatial and Multidimensional Data Analysis: processing of complex scientific data
15. Communication and Dashboards with R Shiny: development of web applications for exploring results
16. Open Publishing: data repositories, preprints, and Creative Commons licenses
Bibliography
>> Foundational Open Science Resources
01. The Turing Way (https://book.the-turing-way.org/): Comprehensive open-source handbook covering reproducible research, ethical data science, and collaborative practices. Highly relevant for understanding reproducibility frameworks and best practices. Directly aligns with course foundations and FAIR principles
02. FOSTER Open Science Training Handbook (https://open-science-training-handbook.gitbook.io/book): Multi-module training handbook with methods, techniques, and practices for open science training. Includes practical exercises and case studies. Supports the FOSTER framework mentioned in the course justification.
03. Center for Open Science (COS) (https://www.cos.io/): Blog, YouTube channel, and resources on open research best practices. Includes TOP Guidelines (Transparency and Openness Promotion). Provides practical guidance on implementing open science practices.
>> FAIR Principles and Data Management
04. GO FAIR - FAIR Principles (https://www.go-fair.org/fair-principles/): Official FAIR principles documentation with detailed explanations of Findable, Accessible, Interoperable, and Reusable data standards. Core topic of the course.
05. The Turing Way - FAIR Data (https://book.the-turing-way.org/reproducible-research/rdm/rdm-fair/): Detailed guide on implementing FAIR principles in research data management. Bridges theory and practice for FAIR implementation
06. Enabling FAIR Data in Earth and Environmental Science (Scientific Data) (https://www.nature.com/articles/s41597-022-01606-w): Research article on applying FAIR principles specifically to environmental science research. Domain-specific application of FAIR principles
>> Version Control with Git and GitHub
07. Git & GitHub Tutorial for Scientists (https://gitbookdown.dallasdatascience.com/): Beginner-friendly tutorial designed for scientists with no formal programming background. Covers local version control and GitHub collaboration. Directly supports course module on Git and GitHub
08. Git/GitHub Guide: A Minimal Tutorial (https://kbroman.org/github_tutorial/): Concise, practical guide to Git and GitHub for statistical and computational scientists. Accessible introduction to version control concepts.
09. Git for Scientists (https://www.gitscientist.com/): Specialized training course with practical exercises designed specifically for scientists. Science-focused approach to Git workflows
Git for Data Analysis (LSE Blog) (https://blogs.lse.ac.uk/impactofsocialsciences/2016/12/15/git-for-data-analysis-why-version-control-is-essential-for-collaboration-and-for-gaining-public-trust/): Explains why version control is essential for collaborative research and reproducibility. Motivates the importance of Git in environmental research
>> Computational Environments: Docker and Conda
10. Containers in Research Workflows (Carpentries) (https://carpentries-incubator.github.io/docker-introduction/reproduciblity.html): Introduction to Docker containers for research reproducibility with practical examples. Supports course module on Docker
11. Docker for Study Reproducibility with R Markdown (http://library.virginia.edu/data/articles/how-to-use-docker-for-study-reproducibility-with-r-markdown): Tutorial on integrating Docker with R Markdown for reproducible research. Practical application of containers in data science workflows
12. Lab Teaching: Docker 101 for Reproducible Science (https://kordinglab.com/2022/10/28/LabTeaching-Docker-for-Science.html): Beginner-friendly introduction to Docker concepts and practical implementation. Accessible entry point for understanding containerization
13. Conda for Data Scientists (https://docs.conda.io/projects/conda/en/stable/user-guide/concepts/data-science.html): Official Conda documentation on environment management for data science. Supports course module on Conda virtual environments
14. Containers for Computational Reproducibility (Nature Reviews Methods Primers) (https://www.nature.com/articles/s43586-023-00244-9.pdf): Peer-reviewed article on using containers to ensure computational reproducibility. Scientific perspective on containerization benefits
>> Literate Programming and Dynamic Documentation
15. Quarto Documentation (https://quarto.org/docs/faq/rmarkdown.html): Official Quarto documentation with FAQ for R Markdown users. Explains evolution from RMarkdown to Quarto. Directly supports course module on Quarto
16. Introduction to Reproducible Publications with Quarto (Carpentries) (https://carpentries-incubator.github.io/reproducible-publications-quarto/): Workshop materials on using Quarto for scientific publications with RStudio. Practical guide to literate programming
17. R for Data Science - Quarto Chapter (https://r4ds.hadley.nz/quarto.html): Chapter on Quarto from the popular R for Data Science textbook. Integration of Quarto in data science workflows
18. Literate Programming and Reproducible Research (Duke) (https://www2.stat.duke.edu/~ar182/rr/LiterateProgramming.html): Overview of literate programming concepts and their application to reproducible research. Foundational concepts for dynamic documentation
19. Jupyter Notebook Official Site (https://jupyter.org/): Official Jupyter documentation and resources for interactive computing. Supports course module on Jupyter Notebooks
20. How to Use Jupyter Notebook: A Beginner's Tutorial (https://www.dataquest.io/blog/jupyter-notebook-tutorial/): Beginner-friendly tutorial on creating and sharing Jupyter Notebooks. Practical introduction to interactive computing
>> Python for Environmental Data Science
21. Introduction to Earth Data Science (https://earthdatascience.org/courses/intro-to-earth-data-science/): Comprehensive beginner course on analyzing and visualizing earth and environmental science data using Python. Domain-specific Python training aligned with course focus
22. Introduction to Earth and Environmental Data Science (GitHub) (https://earth-env-data-science.github.io/intro.html): Textbook introducing modern computing tools, programming, and best practices for environmental data science. Comprehensive resource for Python fundamentals in an environmental context
23. Geospatial Data Analysis with Python (https://uwgda-jupyterbook.readthedocs.io/): Course on geospatial data processing, analysis, and visualization using Python. Supports course module on geospatial and multidimensional data analysis
24. 12 Python Libraries for Geospatial Data Analysis (https://www.geoapify.com/python-geospatial-data-analysis/): Overview of essential Python libraries for geospatial analysis. Reference for geospatial tools and libraries
>> R Programming and Data Visualization
25. Data Analysis and Visualization in R for Ecologists (Data Carpentry) (https://datacarpentry.github.io/R-ecology-lesson/aio.html): Comprehensive lesson on R fundamentals and tidyverse for ecological data analysis. Supports course modules on R fundamentals and tidyverse
26. ggplot2 Official Documentation (https://ggplot2.tidyverse.org/): Official ggplot2 package documentation with grammar of graphics principles. Supports course module on data visualization with ggplot2
27. Chapter 11 ggplot2 - Tabular Data Analysis with R (https://static-bcrf.biochem.wisc.edu/courses/Tabular-data-analysis-with-R-and-Tidyverse/book/11-ggplot2chapter.html): Educational chapter on ggplot2 principles and applications. Practical guide to high-quality scientific plotting
28. R and Tidyverse Data Manipulation Tutorial (SERC) (https://serc.si.edu/coastalcarbon/r-tutorial): Tutorial on data manipulation with tidyverse for environmental data. Practical application of tidyverse in environmental research
>> Interactive Dashboards with R Shiny
29. Shiny Official Gallery (https://shiny.posit.co/r/gallery/): Gallery of Shiny applications with code examples and best practices. Supports course module on R Shiny dashboards
30. Getting Started with Shiny Dashboard (https://rstudio.github.io/shinydashboard/get_started.html): Official guide to building dashboards with the shinydashboard package. Practical introduction to dashboard development
31. Building an Interactive Data Exploration App with R Shiny (https://kenpyfin.medium.com/building-an-interactive-data-exploration-app-with-r-shiny-819153b164b0): Tutorial on creating interactive data exploration applications. Demonstrates communication and interactive result exploration
>> Open Publishing and Data Repositories
32. Zenodo (https://zenodo.org/): General-purpose open repository operated by CERN, launched with EU funding and open to research outputs from any source. Supports data, code, and publications. Practical platform for open publishing and data sharing
33. Figshare (https://figshare.com/): Generalist repository for datasets, figures, and research outputs with DOI assignment. Major platform for data publishing in environmental sciences
34. Dryad (https://datadryad.org/): Repository for research data underlying scientific publications. Supports open publishing module
35. Open Science Framework (OSF) (https://osf.io/): Broad interdisciplinary platform for project management, collaboration, and data sharing. Comprehensive platform for open science workflows
36. Open Preprints and Creative Commons Licensing (https://creativecommons.org/about/open-science/open-preprints/): Guide on publishing preprints with open licenses for maximum reuse. Supports open publishing module
>> AI Ethics in Research
37. The Ethics of Using Artificial Intelligence in Scientific Research (https://pmc.ncbi.nlm.nih.gov/articles/PMC12057767/): Comprehensive review of ethical issues related to AI in research, including reproducibility and transparency concerns. Supports course module on ethical use of AI in research workflows
38. AI All-Rounder: Ethical, Trustworthy, Reproducible (https://www.fz-juelich.de/en/inm/inm-7/research-focus/ai-all-rounder-ethical-trustworthy-reproducible): Discussion of trustworthy and reproducible AI methods for research. Emphasizes reproducibility in AI-assisted research
39. Responsible AI (Google Research) (https://research.google/research-areas/responsible-ai/): Overview of responsible AI principles and research directions. Framework for ethical AI implementation
40. Machine Learning in Environmental Science (https://saiwa.ai/sairone/blog/machine-learning-in-environmental-science/): Exploration of machine learning applications in environmental sustainability and monitoring. Domain-specific AI applications
>> Comprehensive Environmental Data Science Resources
41. Earth Lab - University of Colorado Boulder (https://earthlab.colorado.edu/): Center specializing in data-intensive open, reproducible environmental science. Offers workshops and courses. Mentioned in this course as pedagogical inspiration
42. Free Earth Data Science Courses & Textbooks (https://earthdatascience.org/courses/): Collection of free, open educational resources for earth data science. Comprehensive resource library for environmental data science
43. Developing Reproducible Workflows Collaboratively (NCEAS) (https://www.nceas.ucsb.edu/news/developing-reproducible-workflows-collaboratively): Module on collaborative development of reproducible workflows for multi-institutional teams. Supports collaborative aspects of open science
>> Research Data Management and Metadata Standards
44. Metadata and Describing Data (Cornell Data Services) (https://data.research.cornell.edu/data-management/storing-and-managing/metadata/): Guide on creating effective metadata for research data. Supports data organization and management module
45. Metadata for Data Management: A Tutorial (https://guides.lib.unc.edu/metadata/standards): Tutorial on metadata standards and schemas for research data. Practical guide to metadata implementation
46. Adding Metadata to Your Research Data (Cambridge) (https://www.data.cam.ac.uk/organising-storing/metadata): Concise guide on metadata as documentation for datasets. Emphasizes metadata importance in data management
>> Best Practices and Workflow Organization
47. Tips and Resources for Reproducible Workflow (BITSS) (https://bitss.github.io/ACRE/tips-and-resources-for-reproducible-workflow.html): Practical tips for implementing reproducible workflows in research projects. Supports reproducible workflows module
48. The Basic Reproducible Workflow Template (http://www.practicereproducibleresearch.org/core-chapters/3-basic.html): Foundational framework for understanding and organizing reproducible workflows. Provides structure for project organization
49. A Reproducible Data Analysis Workflow With R Markdown, Git, and Docker (https://qcmb.psychopen.eu/index.php/qcmb/article/view/3763/3763.html): Peer-reviewed tutorial integrating multiple tools for reproducible analysis. Demonstrates integration of course topics
50. Using Open Science Tools to Teach Environmental Sciences (https://pmc.ncbi.nlm.nih.gov/articles/PMC12283245/): Recent article on pedagogical approaches to teaching open science in environmental disciplines. Directly relevant to course teaching methodology