---
title: "FAQ and Practical Gotchas in ksTFL"
author: "ksTFL Team"
output:
  rmarkdown::html_vignette:
    dev: pdf
    css: ksTFL-vignette.css
vignette: >
  %\VignetteIndexEntry{FAQ and Practical Gotchas in ksTFL}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 80
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  comment = "",
  collapse = TRUE,
  eval = FALSE,
  dev = "png",
  dev.args = if (capabilities("cairo")) list(type = "cairo") else NULL
)
options(pkgdown.internet = FALSE)
```

## Overview

This vignette collects short answers to the ksTFL questions that usually appear
after the first successful output: hidden helper columns, width recalculation,
span header levels, replay metadata, template precedence, Table of Contents
behavior, and practical `define_cols()` / `compute_cols()` recipes. It is
intentionally practical: the audience is readers who have already built at least
one spec and now need to debug or sharpen a workflow.

Related reading:

- [Getting Started](Getting_Started_with_ksTFL.html) for the pipeline and object
  model.
- [Advanced StyleRows](Advanced_StyleRows.html),
  [Column Width Management](Column_Width_Management.html), and
  [Real Examples](Real_Examples_with_ksTFL.html) when a question turns into a
  deeper workflow problem.
- [Font Management](Font_Management.html) for font discovery and fallback.
- [Rendering Pipeline](Rendering_Pipeline.html) for renderer internals.

------------------------------------------------------------------------

## Data and spec behavior

### 1. Why does `cols` not drop the other columns from my data?

Because `cols` is a presentation lens, not a data-mutation step. ksTFL keeps the
full input data inside the spec's shadow data so later `compute_cols()` calls
can still reference helper fields that never appear in the document.

### 2. Can I hide a column and still use it in `compute_cols()`?

Yes.
This is the standard helper-column pattern: set `isVisible = FALSE` and keep
using that column in conditions or `value_from` arguments. The package examples
do this with fields such as `SECTION`, `SECTION_ID`, `MODELVAL`, and
`SOC_GROUP`.

```r
spec <- create_table(df) |>
  define_cols(FLAG, isVisible = FALSE) |>
  compute_cols(FLAG == "Y", c_style(c(PARAM, VALUE), styleRef = "flagged"))
```

### 3. Why can I not set `colWidth` on an invisible column?

Invisible columns are forced to width `"0.0cm"` and removed from width
recalculation entirely. If a column must reserve visual space, it is not truly
invisible and should stay visible.

### 4. Why did the other column widths change after I locked one column?

Setting `colWidth` locks the columns it is applied to. With
`autoColWidth = TRUE` (the default), ksTFL re-normalizes the remaining visible
unlocked columns so they fill the leftover width.

```r
spec <- create_table(df) |>
  define_cols(ID, colWidth = "20%")
```

After that call, `ID` stays fixed at `20%` and the remaining visible unlocked
columns are recalculated to fill the rest.

### 5. Why did `c_glue()` not modify a repeated value?

If a cell was already suppressed by `dedupe = TRUE`, `c_glue()` skips it on
purpose. The same skip happens for non-leader cells inside a merge, so glue the
leader column or turn deduplication off for that field.

------------------------------------------------------------------------

## Row actions and layout rules

### 6. Why does `compute_cols()` not like aggregate logic such as `mean(x)`?

`compute_cols()` conditions are captured lazily and evaluated row-wise. If you
need section-level or whole-table aggregates, calculate them upstream or write
them into a helper column before creating the spec.

```r
df$group_mean <- ave(df$AVAL, df$GROUP, FUN = mean)

spec <- create_table(df) |>
  compute_cols(group_mean > 10, c_style(AVAL, styleRef = "flagged"))
```

### 7. Can I nest `c_*()` actions inside each other?

No. Row actions are siblings, not nested verbs.
Either pass multiple actions to one `compute_cols()` call or use several
`compute_cols()` calls with the same condition.

### 8. Why does every `add_span_header()` call create a new row of headers?

Because `stubOrder` auto-increments when you omit it. Reuse the same `stubOrder`
for sibling span headers that belong on one header row, and only increase it
when you really want a new level.

```r
spec <- create_table(df) |>
  add_span_header(c(TRT_A_N, TRT_A_PCT), label = "Treatment A", stubOrder = 1) |>
  add_span_header(c(TRT_B_N, TRT_B_PCT), label = "Treatment B", stubOrder = 1)
```

### 9. Can span headers overlap?

Yes across different levels, no within the same level. Headers at the same
`stubOrder` must not share columns, but parent and child levels can overlap
freely.

### 10. How do I keep a small table under a figure on the same page?

Set `continuousSection = TRUE` on the following spec, not the first one. Keep
page size and margins compatible across both sections, and use this pattern for
short follow-on content because Word still handles overflow naturally.

```r
report <- create_report(
  # The first spec keeps its default section behavior
  create_figure(plot_obj),
  # Only the following spec is marked as continuous
  create_table(summary_tbl) |> set_document(continuousSection = TRUE)
)
```

### 11. When should I use `isGrouping`, `isPaging`, and `isColBreak`?

Use `isGrouping` when a value change defines a logical section, `isPaging` when
that value change should start a new vertical page group, and `isColBreak` when
a wide listing should split horizontally into segments while repeating ID
columns.

### 12. Why do my footnotes repeat on every page?

That is the default: `footnotePlace = "repeated"`. Switch to `"last_page"` when
you want a final note block only, or `"doc_footer"` when the note belongs in the
Word footer area.

```r
tfl_set_options(footnotePlace = "last_page")
```

------------------------------------------------------------------------

## Rendering, replay, and reproducibility

### 13. What is the practical difference between `write_doc()`, `save_report()`, and `replay_report()`?

`write_doc()` is the one-step path for everyday use. `save_report()` writes the
spec JSON plus table/figure payloads without rendering, while `replay_report()`
renders later from those saved artifacts and can also combine previously saved
outputs into one document.

```r
write_doc(report, name = "tables")

saved <- save_report(report, docFileName = "tables.docx", metaPath = "meta")
replay_report(saved$spec_file, meta_dir = "meta")
```

### 14. When do I need a persistent `metaPath` instead of `tempdir()`?

Use `tempdir()` when you only need the final DOCX right now. Use a persistent
`metaPath` when you want exact replays, QC comparison, report inventories, or a
later combined replay workflow. If you replay by DOCX name, ksTFL resolves the
latest saved spec in that meta folder; if you need an exact historical version,
replay by the saved JSON file name instead.

```r
replay_report("tables.docx", meta_dir = "meta")
replay_report("abc123def456.json", meta_dir = "meta")
```

### 15. Can I delete the original figure file after saving a report?

For replay-based workflows, yes after a successful save, because ksTFL copies
the figure into `metaPath` under its `dataRef`. The saved meta folder becomes
the durable rendering input.

### 16. Why did different sections of one report use different templates?

That is the default behavior for multi-spec reports. Each spec resolves its own
`docTemplate`, so a table can use one bundled template while a text or figure
section uses another.

```r
report <- create_report(
  create_table(adsl) |> set_page_style(docTemplate = "Navy_Pro"),
  create_text() |> set_page_style(docTemplate = "Carbon_Dark")
)
```

### 17. How do I force one template across every section?

Use `overrideTemplate` in `write_doc()` or `replay_report()`. That global
override wins over per-spec `docTemplate` values and is the cleanest way to
re-skin a finished bundle.
```r
write_doc(report, name = "tables", overrideTemplate = "Navy_Pro")
replay_report("tables.docx", meta_dir = "meta", overrideTemplate = "Navy_Pro")
```

------------------------------------------------------------------------

## TOC and report assembly

### 18. Why does a Table of Contents not appear even though I asked for one?

You need both parts of the contract: request a TOC (`toc = TRUE`,
`insertTOC = TRUE`, or the package option) and mark at least one title or
subtitle with `toclevel`. A TOC request with no `toclevel` entries has nothing
to index.

```r
spec <- create_table(df) |>
  add_title("Table 1", toclevel = 1)

write_doc(create_report(spec), name = "tables", toc = TRUE)
```

### 19. Why is the TOC still just a placeholder when I open the DOCX?

ksTFL writes a Word TOC field, not a pre-expanded static table. Open the file in
Word, click inside the TOC, and update fields with `F9` to populate it.

### 20. Can `create_report()` accept a named list of specs built in a loop?

Yes. `create_report()` accepts named lists of `TFL_spec` objects, which is
useful when specs are created dynamically or in separate program files. The list
names become the key prefixes inside the final `TFL_report`.

```r
specs <- list(
  demog = create_table(adsl),
  labs = create_table(adlb)
)

report <- create_report(specs)
```

------------------------------------------------------------------------

## Practical column and action recipes

These are short copy-paste patterns for the `define_cols()` and `compute_cols()`
cases that usually come up after the first working table.

### 21. How do I define several display columns in one place?

Use one `define_cols()` call when the columns share the same labels, widths, or
base value styles.

```r
spec <- create_table(adsl) |>
  define_cols(
    c(AGE, WEIGHT, HEIGHT),
    label = c("Age", "Weight\n(kg)", "Height\n(cm)"),
    colWidth = c("12%", "14%", "14%"),
    valueStyleRef = c("ar", "ar", "ar")
  )
```

This keeps aligned numeric columns easy to maintain.

### 22. How do I use `NA` to skip one column inside a batch `define_cols()` call?

Use `NA` at the position you want to leave unchanged. This is handy when most
columns share one update but one column should keep its existing definition.

```r
spec <- create_table(adsl) |>
  define_cols(
    c(USUBJID, AGE, TRT01A),
    label = c("Subject ID", NA, "Treatment"),
    colWidth = c("18%", NA, "20%"),
    valueStyleRef = c("mono", "ar", NA)
  )
```

Here `AGE` keeps its current label and width, and `TRT01A` keeps its current
value style. This also works well with hidden helper columns when you want to
skip `colWidth` because invisible columns are forced to `"0.0cm"`.

### 23. How do I hide a helper column but still use it to drive formatting?

Hide the helper with `isVisible = FALSE`, then refer to it in `compute_cols()`
as usual.

```r
spec <- create_table(df) |>
  define_cols(FLAG, isVisible = FALSE) |>
  define_cols(c(PARAM, VALUE), label = c("Parameter", "Value")) |>
  add_style("flagged", s_font(color = "#8B0000", bold = TRUE)) |>
  compute_cols(
    FLAG == "Y",
    c_style(c(PARAM, VALUE), styleRef = "flagged")
  )
```

This is the standard pattern for QC flags, section ids, and hidden totals.

### 24. How do I turn a hidden grouping column into a stub header?

Use `c_addrow()` on the first row of each group and pull the display text from
the hidden column.

```r
spec <- create_table(df) |>
  add_style(
    "section_header",
    s_font(bold = TRUE, color = "#FFFFFF"),
    s_table_style(background_color = "#4682B4")
  ) |>
  define_cols(REGION, isVisible = FALSE) |>
  define_cols(c(PRODUCT, REVENUE), label = c("Product", "Revenue")) |>
  compute_cols(
    firstOf(REGION),
    c_addrow(
      pos = "above",
      value_from = REGION,
      styleRef = "section_header"
    )
  )
```

This is usually cleaner than repeating the region on every detail row.

### 25. How do I insert subtotals from a hidden total column?
Precompute the subtotal upstream, hide that helper column, and insert it on the
last row of each group.

```r
spec <- create_table(df) |>
  add_style(
    "subtotal_row",
    s_font(bold = TRUE),
    s_table_style(background_color = "#D9D9D9")
  ) |>
  define_cols(TOTAL, isVisible = FALSE) |>
  compute_cols(
    lastOf(REGION),
    c_addrow(
      pos = "below",
      value_from = TOTAL,
      styleRef = f_combine("subtotal_row", "ar")
    )
  )
```

This works well when the display row is just a formatted version of stored
summary text.

### 26. How do I apply one condition to several visible columns at once?

Pass a column vector to `c_style()` instead of repeating the same condition in
separate calls.

```r
spec <- create_table(labs) |>
  add_style("out_of_range", s_font(color = "#FF4500", bold = TRUE)) |>
  compute_cols(
    VISIT == "Week 8" & AVAL > AVAL_ULN,
    c_style(c(PARAM, AVAL, UNIT), styleRef = "out_of_range")
  )
```

Use this when the flag belongs to the row but only a few columns should show it.

### 27. How do I combine font and background styles for one rule?

Compose styles with `f_combine()` instead of defining a new style for every
font-plus-fill pairing.

```r
spec <- create_table(df) |>
  add_style(
    "warn_bg",
    s_table_style(background_color = "#FFF4E5")
  ) |>
  compute_cols(
    CRITFL == "Y",
    c_style(c(PARAM, VALUE), styleRef = f_combine("b", "warn_bg"))
  )
```

This is a good fit for one-off emphasis rules.

### 28. How do I give columns a base style and still add row-level highlighting later?

Put default alignment or indentation in `define_cols()`, then add the
conditional layer in `compute_cols()`.

```r
spec <- create_table(df) |>
  add_style(
    "warn_row",
    s_table_style(background_color = "#FFF4E5")
  ) |>
  define_cols(PARAM, valueStyleRef = "indent_1") |>
  define_cols(VALUE, valueStyleRef = "ar") |>
  compute_cols(
    FLAG == "Y",
    c_style(everything(), styleRef = "warn_row")
  )
```

The base column styles stay in place; the row style adds on top.

### 29. How do I build a total line by combining `c_merge()`, `c_clear()`, and `c_glue()`?

Use one `compute_cols()` call when the same rows need several sibling actions.

```r
spec <- create_table(df) |>
  compute_cols(
    PRODUCT == "TOTAL",
    c_merge(c(PRODUCT, REVENUE), styleRef = f_combine("b", "ar")),
    c_clear(PRODUCT),
    c_glue(PRODUCT, "after", REGION),
    c_glue(PRODUCT, "after", text = " total: "),
    c_glue(PRODUCT, "after", REVENUE)
  )
```

This is useful when the display string does not exist as one input column.

### 30. How do I apply more than one action to the same condition without nested `c_*()` calls?

Keep the actions as separate arguments inside one `compute_cols()` call.

```r
spec <- create_table(df) |>
  add_style("boundary", s_font(bold = TRUE)) |>
  compute_cols(
    firstOf(GROUP),
    c_addrow(pos = "above", value_from = GROUP, styleRef = "boundary"),
    c_style(c(PARAM, VALUE), styleRef = "boundary")
  )
```

Row actions are siblings, not nested verbs.

### 31. How do I build a two-level stub with one hidden column and two style rules?

Insert the group header from the hidden column, then use separate style rules
for summary rows and detail rows.

```r
spec <- create_table(df) |>
  define_cols(REGION, isVisible = FALSE) |>
  define_cols(c(PRODUCT, REVENUE), label = c("Product", "Revenue")) |>
  compute_cols(
    firstOf(REGION),
    c_addrow(pos = "above", value_from = REGION, styleRef = "b")
  ) |>
  compute_cols(
    PRODUCT == "TOTAL",
    c_style(PRODUCT, styleRef = f_combine("i", "indent_1")),
    c_style(REVENUE, styleRef = "i")
  ) |>
  compute_cols(
    PRODUCT != "TOTAL",
    c_style(PRODUCT, styleRef = "indent_2")
  )
```

That pattern is handy when the output stub needs visible hierarchy even though
the source data is still flat.

### 32. Can I combine multiple actions of the same or different types, and how do they work together?

Yes, but as sibling actions, not nested calls. You can pass any mix of
`c_style()`, `c_addrow()`, `c_merge()`, `c_clear()`, `c_glue()`, and
`c_pageBreak()` in one `compute_cols()` call.
```r
spec <- create_table(df) |>
  compute_cols(
    firstOf(GROUP),
    c_addrow("above", value_from = GROUP, styleRef = "b"),
    c_style(c(PARAM, VALUE), styleRef = f_combine("b", "fc_navy"))
  ) |>
  compute_cols(
    PARAM == "TOTAL",
    c_merge(c(PARAM, VALUE), styleRef = "ar"),
    c_clear(PARAM),
    c_glue(PARAM, "after", text = "Total: "),
    c_glue(PARAM, "after", VALUE)
  )
```

Practical rule: when one action depends on the visual result of another, prefer
separate `compute_cols()` calls (as above) to keep intent explicit.

### 33. How can I create three- or four-level nested text in one column (for example Parameter/Visit/Statistic indentation)?

It depends on the input data shape. Two common patterns are shown below.

#### Pattern A: detail rows only, hierarchy injected with `c_addrow()`

```r
dt <- tibble::tribble(
  ~PARAM, ~VISIT,    ~STATISTICS, ~VALUE,
  "ALT",  "Visit 1", "Mean",      1L,
  "ALT",  "Visit 1", "Median",    2L,
  "ALT",  "Visit 2", "Mean",      1L,
  "ALT",  "Visit 2", "Median",    2L,
  "AST",  "Visit 1", "Mean",      1L,
  "AST",  "Visit 1", "Median",    2L,
  "AST",  "Visit 2", "Mean",      1L,
  "AST",  "Visit 2", "Median",    2L
)

spec <- create_table(dt) |>
  define_cols(c(PARAM, VISIT), isVisible = FALSE) |>
  define_cols(
    c(STATISTICS, VALUE),
    label = c("Parameter\nVisit\nStatistics", "Value"),
    valueStyleRef = c("indent_2", NA),
    labelStyleRef = c("al", NA)
  ) |>
  compute_cols(
    firstOf(PARAM),
    c_addrow("above", value_from = PARAM)
  ) |>
  compute_cols(
    firstOf(VISIT),
    c_addrow("above", value_from = VISIT, styleRef = "indent_1")
  )
```

What this does and why:

- Input contains only detail rows (`STATISTICS` + `VALUE`).
- `PARAM` and `VISIT` are hidden helper columns that drive layout.
- `c_addrow()` inserts visible hierarchy rows above first parameter/visit
  boundaries.
- This keeps source data tidy while producing a nested visual stub.

#### Pattern B: placeholder hierarchy rows in data, collapsed with `c_merge()`

```r
dt <- tibble::tribble(
  ~PARAM, ~VISIT,    ~STATISTICS, ~VALUE,
  "ALT",  "Visit 1", NA,          NA,
  "ALT",  "Visit 1", NA,          NA,
  "ALT",  "Visit 1", "Mean",      1L,
  "ALT",  "Visit 1", "Median",    2L,
  "ALT",  "Visit 2", NA,          NA,
  "ALT",  "Visit 2", NA,          NA,
  "ALT",  "Visit 2", "Mean",      1L,
  "ALT",  "Visit 2", "Median",    2L,
  "AST",  "Visit 1", NA,          NA,
  "AST",  "Visit 1", NA,          NA,
  "AST",  "Visit 1", "Mean",      1L,
  "AST",  "Visit 1", "Median",    2L,
  "AST",  "Visit 2", NA,          NA,
  "AST",  "Visit 2", NA,          NA,
  "AST",  "Visit 2", "Mean",      1L,
  "AST",  "Visit 2", "Median",    2L
)

spec <- create_table(dt) |>
  define_cols(c(PARAM, VISIT), isVisible = FALSE) |>
  define_cols(
    c(STATISTICS, VALUE),
    label = c("Parameter\nVisit\nStatistics", "Value"),
    valueStyleRef = c("indent_2", NA),
    labelStyleRef = c("al", NA)
  ) |>
  compute_cols(
    firstOf(PARAM, VISIT),
    c_merge(c(PARAM, VISIT, STATISTICS), styleRef = "indent_0")
  ) |>
  compute_cols(
    !firstOf(PARAM, VISIT) & is.na(STATISTICS),
    c_merge(c(VISIT, STATISTICS), styleRef = "indent_1")
  )
```

What this does and why:

- Input already contains placeholder hierarchy rows (`STATISTICS = NA`).
- `c_merge()` turns those rows into spanning hierarchy lines.
- First merge call builds the top level (`PARAM` + `VISIT` context).
- Second merge call handles lower placeholder rows (visit-level line).
- This pattern is useful when source extracts already contain structural rows
  and you want to preserve that model.

Both patterns are valid. Choose by source shape:

- Use Pattern A when hierarchy should be derived from boundaries.
- Use Pattern B when hierarchy rows already exist in incoming data.

### 34. How do I switch between continuous sections, repeating/not repeating headers, and row-break behavior across pages?

These controls come from different layers:

- Continuous sections between specs: use
  `set_document(continuousSection = TRUE)` on the following spec.
- Repeating title/subtitle groups across pages: controlled by `isContinues`
  (`FALSE` repeats, `TRUE` suppresses repeated title/subtitle output).
- Table header repetition and row splitting across pages are template layout
  settings (`repeat_header_on_each_page`, `allow_row_break_across_pages`).

```r
report <- create_report(
  create_table(tbl_a) |>
    set_document(isContinues = FALSE),
  create_table(tbl_b) |>
    set_document(continuousSection = TRUE, isContinues = TRUE)
)

write_doc(report, name = "layout_switch")
```

Important caveat: when `isColBreak` is active, ksTFL enforces
`repeat_header_on_each_page = TRUE` and `allow_row_break_across_pages = FALSE`
for correct horizontal pagination.

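
When specs are built in a loop, the "following spec, not the first one" rule for
`continuousSection` can be applied mechanically. The sketch below is plain base
R and assumes nothing about ksTFL internals: `apply_to_following()` is a
hypothetical helper (not part of ksTFL) that applies any function to every list
element except the first. The commented lines at the end show how it might be
combined with the `set_document()` and `create_report()` calls used in this
vignette.

```r
# Hypothetical helper (not part of ksTFL): apply `f` to every element of
# `specs` except the first, keeping list names and order intact.
apply_to_following <- function(specs, f) {
  stopifnot(is.list(specs), is.function(f))
  specs[-1] <- lapply(specs[-1], f)
  specs
}

# Stand-in demo with plain numbers instead of specs:
demo <- apply_to_following(list(a = 1, b = 2, c = 3), function(x) x * 10)
str(demo)

# With real specs, the call might look like this (assumes `specs` is a
# list of TFL_spec objects built earlier):
# specs <- apply_to_following(
#   specs,
#   function(s) set_document(s, continuousSection = TRUE)
# )
# report <- create_report(specs)
```
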

------------------------------------------------------------------------

## Metadata workflows: replay, combine, and validation

### 35. How do I replay a document from stored metadata without re-running R code?

Use `replay_report()` with either the DOCX filename (uses the latest saved spec)
or the exact spec JSON hash for a specific historical version. This replays from
the saved JSON and data files, not from R objects, so the original data frames
or ggplot objects are not needed.

```r
# Replay the latest version by DOCX name
replay_report("tables_01.docx", meta_dir = "meta")

# Replay an exact historical version by spec hash
replay_report("abc123def456.json", meta_dir = "meta")

# Override output location
replay_report(
  "tables_01.docx",
  meta_dir = "meta",
  output_path = "qc/tables_01_replay.docx"
)
```

Practical workflow: run production specs with `save_report()` instead of
`write_doc()` to preserve the metadata, then use `replay_report()` for QC
re-runs, template switches, or regulatory re-submissions without touching the
original R scripts.

### 36. How do I combine multiple documents into a single DOCX with a Table of Contents?

Pass a vector of spec references (DOCX names or JSON hashes) to
`replay_report()` along with a combined `output_path`. The function merges all
specs into one document and optionally inserts a TOC page at the front.

```r
# Combine two documents from the same meta folder
replay_report(
  spec_json = c("tables_01.docx", "listings_01.docx"),
  meta_dir = "meta",
  output_path = "output/combined_tables_listings.docx",
  insertTOC = TRUE,
  tocTitle = "Table of Contents"
)

# Combine documents from different meta folders
replay_report(
  spec_json = c(
    "meta_tables/abc123.json",
    "meta_figures/def456.json",
    "meta_listings/ghi789.json"
  ),
  output_path = "output/full_clinical_report.docx",
  insertTOC = TRUE,
  tocTitle = "Clinical Study Report - Contents"
)
```

This is the standard pattern for assembling final submission packages from
individually validated outputs.
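
When a combined replay references spec JSON files by path, as in the second call
above, a missing file is easier to diagnose before the replay starts. The
following is a minimal base R sketch: `check_spec_paths()` is a hypothetical
helper, not a ksTFL function, and it assumes the references are file paths that
resolve from the current working directory. References given as DOCX names are
resolved by ksTFL through the meta folder and are not covered by this check.

```r
# Hypothetical helper (not part of ksTFL): stop early when any spec JSON
# path does not exist, and list every missing path in the error message.
check_spec_paths <- function(spec_paths) {
  stopifnot(is.character(spec_paths))
  missing_paths <- spec_paths[!file.exists(spec_paths)]
  if (length(missing_paths) > 0) {
    stop(
      "Missing spec file(s): ", paste(missing_paths, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(spec_paths)
}

# Demo with a temporary file standing in for a saved spec JSON:
tmp_spec <- tempfile(fileext = ".json")
writeLines("{}", tmp_spec)
check_spec_paths(tmp_spec)  # passes silently
```

Because the helper returns its input invisibly, it can wrap the `spec_json`
argument directly, for example `spec_json = check_spec_paths(paths)`.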
### 37. How do I filter and combine only the latest versions of documents?

Use `list_reports()` to scan the meta folder, filter for `is_latest == TRUE`,
then pass the matched `spec_file` entries to `replay_report()`. This is useful
when you have many historical versions but only want to combine the current set.

```r
library(dplyr)

meta_index <- list_reports("meta", sort_by = "doc_file")

# Keep only latest entries
latest <- meta_index %>%
  filter(is_latest)

# Optional: filter by document name patterns
tables_and_figures <- latest %>%
  filter(grepl("table|figure", doc_file, ignore.case = TRUE))

# Combine into one document
replay_report(
  spec_json = tables_and_figures$spec_file,
  meta_dir = "meta",
  output_path = "output/final_report.docx",
  insertTOC = TRUE
)
```

This pattern is particularly useful for batch production workflows where
hundreds of outputs are generated separately and then assembled into themed
bundles (tables-only, figures-only, or full report).

### 38. How do I match saved metadata with actual DOCX files for QC validation?

Use `list_reports()` to get the metadata index, then cross-check with the actual
files on disk using an inner join. This ensures both the metadata and the
rendered output exist before attempting validation or replay.
```r
library(dplyr)
library(tibble)

# Read metadata index
meta_index <- list_reports("meta", sort_by = "doc_file")
latest <- meta_index %>%
  filter(is_latest)

# Scan output folder for actual DOCX files
docx_on_disk <- list.files(
  "output",
  pattern = "\\.docx$",
  full.names = FALSE
)
docx_on_disk <- docx_on_disk[!startsWith(docx_on_disk, "~$")] # Skip temp files

# Inner join - keep only entries with both metadata and file
matched <- latest %>%
  inner_join(
    tibble(doc_file = docx_on_disk),
    by = "doc_file"
  ) %>%
  arrange(doc_file, datetime)

cat(sprintf(
  "Matched: %d of %d latest entries have corresponding DOCX files\n",
  nrow(matched),
  nrow(latest)
))

# Use matched entries for validation workflow
for (i in seq_len(nrow(matched))) {
  cat(sprintf(
    "%2d. %s [%s] -> %s\n",
    i,
    matched$doc_file[i],
    matched$datetime[i],
    matched$spec_file[i]
  ))
}
```

This cross-reference pattern is the foundation of validation workflows:
programmers save metadata during production runs, QC reviewers scan the output
folder and metadata folder, then match and replay only the entries that exist in
both places.

### 39. How do I store metadata persistently for regulatory validation?

Use `save_report()` with a persistent `metaPath` (not `tempdir()`) to create a
durable metadata archive.
This archive contains:

- Spec JSON files (hash-named, one per save)
- Data JSON files (referenced by `dataRef` in specs)
- Figure image files (copied with original extensions preserved)
- `_index.json` (automatically maintained index of all specs)

```r
# Set persistent directories in options
tfl_set_options(
  output_directory = "output",
  meta_directory = "meta"
)

# Save report with metadata
# (native pipe, as elsewhere in this vignette, so no extra package is needed)
spec1 <- create_table(adsl) |>
  add_title("Table 1: Demographics", toclevel = 1) |>
  set_document(hasData = TRUE)

spec2 <- create_table(advs) |>
  add_title("Table 2: Vital Signs", toclevel = 1) |>
  set_document(hasData = TRUE)

report <- create_report(spec1, spec2)

result <- save_report(
  report,
  docFileName = "tables_demographics_vitals.docx",
  outDir = "output",
  metaPath = "meta",
  insertTOC = TRUE
)

# Metadata now available for:
# - QC replay: replay_report(result$spec_file, meta_dir = "meta")
# - Template switch: replay_report(..., overrideTemplate = "Navy_Pro")
# - Historical audit: list_reports("meta") shows all versions with timestamps
```

Validation workflow benefits:

- **Reproducibility**: Exact replay without re-running upstream data processing
- **Auditability**: Every save creates a timestamped entry in `_index.json`
- **Template flexibility**: Re-render with different templates without changing
  specs
- **QC independence**: Reviewers replay from metadata, not from live R sessions

### 40. How do I clean up obsolete metadata files while keeping the latest versions?

Use `clean_reports()` to remove old spec JSONs and orphaned data files while
preserving the most recent N versions per document. This keeps the metadata
folder manageable in long-running projects.
```r
# Keep only the 2 most recent versions of each document
clean_reports(meta_dir = "meta", keep_versions = 2)

# Keep only the latest version (most aggressive cleanup)
clean_reports(meta_dir = "meta", keep_versions = 1)
```

The function:

- Identifies obsolete spec JSONs (older than `keep_versions`)
- Deletes obsolete specs
- Scans surviving specs for referenced data/figure files
- Deletes orphaned data JSONs and images not referenced by any surviving spec
- Updates `_index.json` to reflect the cleaned state

Run this periodically in development to avoid accumulating hundreds of obsolete
metadata files, or use it before archiving a project to keep only the final
validated versions.
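
Because cleanup deletes files, it can be worth copying the meta folder first.
The sketch below uses only base R. `backup_meta_dir()` is a hypothetical helper
(not part of ksTFL), and the timestamped folder name is an arbitrary convention
chosen for illustration.

```r
# Hypothetical helper (not part of ksTFL): copy everything in `meta_dir`
# into a new timestamped folder and return that folder's path invisibly.
backup_meta_dir <- function(meta_dir, backup_root = dirname(meta_dir)) {
  stopifnot(dir.exists(meta_dir))
  stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
  target <- file.path(
    backup_root,
    paste0(basename(meta_dir), "_backup_", stamp)
  )
  dir.create(target, recursive = TRUE)
  # Include hidden files, but skip the "." and ".." entries
  items <- list.files(meta_dir, full.names = TRUE, all.files = TRUE, no.. = TRUE)
  copied <- file.copy(items, target, recursive = TRUE)
  if (!all(copied)) {
    stop("Backup incomplete: ", sum(!copied), " item(s) failed to copy")
  }
  invisible(target)
}

# Demo on a throwaway folder that stands in for a real meta directory:
demo_meta <- tempfile("meta_")
dir.create(demo_meta)
writeLines("{}", file.path(demo_meta, "_index.json"))
backup_path <- backup_meta_dir(demo_meta)
list.files(backup_path)
```

In a project, a call such as `backup_meta_dir("meta")` could run immediately
before `clean_reports(meta_dir = "meta", keep_versions = 1)`.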